Ramblings

I've written some things in the past as responses to email questions, and thought I'd share them with the world. Sometimes I find myself repeating these mini lessons, or running into the same issues infrequently.

Using Crontab
Setting process priority in Perl
Eight-oh-what?
Fitting data with Gnuplot
Array of pointers of functions
Sanitizing strings in Java
Using tar
Using zip, gz, and tar.gz files
Lost files after fsck
Max value of set of numbers

Using Crontab

Date: Tue, 6 Apr 2004 11:44:14 -0700 (PDT)

To edit your cron table with "crontab", as yourself,
type:

  crontab -e

This will put you in an editor (probably vi) for your
cron table. Type this line.

  */5 * * * * ~/somescript.sh

Save it and quit. crontab will use that line. It tells
cron to execute ~/somescript.sh "every five minutes".
You can read more about the format of these lines
with:

  man 5 crontab
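
As a rough illustration of that format, the five time fields and the
command can be pulled apart in the shell. This is only a sketch; the
function name describe_cron_line is made up for this example and is
not part of cron.

```shell
# Hypothetical helper: label the fields of one crontab line.
describe_cron_line() {
  # "read" splits on whitespace; the last variable receives the
  # rest of the line, so the command keeps its own arguments.
  printf '%s\n' "$1" | {
    read -r minute hour day_of_month month day_of_week command
    printf 'minute=%s\n' "$minute"
    printf 'hour=%s\n' "$hour"
    printf 'day_of_month=%s\n' "$day_of_month"
    printf 'month=%s\n' "$month"
    printf 'day_of_week=%s\n' "$day_of_week"
    printf 'command=%s\n' "$command"
  }
}

describe_cron_line '*/5 * * * * ~/somescript.sh'
```

The "*/5" in the minute field is what makes the line mean "every
five minutes"; the four "*" fields match any hour, day of month,
month, and day of week.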

Crontabs survive reboots, so as soon as the system is
running, your crontab is active.

Let me know if you have questions! :)

-Michal

PS. I've been asked several times, so in case you're
    wondering, cron is often said to stand for "Command
    Run ON", though the name more likely comes from
    "chronos", the Greek word for time.

Setting process priority in Perl

Date: Mon, 12 Apr 2004 10:57:12 -0700 (PDT)

I tried running setpriority from perl on one host of each
cluster "type" (fp1-32, 33-64, 65-96, 97-128) and
succeeded.  The code is:

niceme.pl:

  #!/usr/bin/perl
  while (1)
  {
    my $x = getpriority(0,0);
    print "Nice me! I am $$ and my priority is $x.\n";
    sleep(5);
  }

iwillniceyou.pl:

  #!/usr/bin/perl
  use strict;
  my $pid = 15932; # change this manually to the PID printed by niceme.pl
  print "Setting priority of process $pid to 4\n";
  # The first argument, 0, means "treat $pid as a process ID".
  setpriority (0, $pid, 4);

You'll find this code in ~mikeg/hong/.

First, run "niceme.pl". Then, adjust the $pid variable
in iwillniceyou.pl and run it on the same host. The
priority should be changed -- I verified this with "top"
as well.

You can of course run "setpriority" against the
calling process, by passing 0 as the process ID.
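
The same adjustment can be made from the shell with "renice". Here
is a small sketch, not taken from the Perl code above. It assumes a
Linux host (the nice value is read back from /proc) and a process
that starts at the default nice value of 0.

```shell
# Start a throwaway background process to stand in for niceme.pl.
sleep 30 &
pid=$!

# Raise its nice value to 4. Unprivileged users may only raise
# the value (that is, lower the priority), never reduce it.
renice -n 4 -p "$pid" >/dev/null

# Assumption: on Linux, field 19 of /proc/PID/stat is the nice value.
nice_value=$(awk '{print $19}' "/proc/$pid/stat")
echo "process $pid has nice value $nice_value"

# Clean up the stand-in process.
kill "$pid"
```

As with the Perl version, "top" shows the new value in its NI column.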

-Michal

Eight-oh-what?

Spawned by a debate, during a night out at a bar, over the order of Intel's processor releases and which one had how many bits.

Date: Fri, 16 Apr 2004 10:41:14 -0700 (PDT)

Now that some of us have recovered from our drunken
stupor, a snippet from http://tinyurl.com/2tmpr is in
order:

   The [8086] also featured a 16 bit data bus allowing
   a 16 bit value to be read or write in one clock
   pulse.  However, this first version was too
   expensive to implement in small business computers
   of the time, so Intel developed an 8 bit data bus
   compatible version, the 8088. This version was
   chosen by IBM for the first IBM PC and made Intel
   the leader of a multi-million dollar industry.

This says that the 8086 came before the 8088. More is
on the above page.

Another place for some more details on the inner
workings (and fuel for the 16/20/24/32bit debate) of
these marvelous inventions is here:

  http://tinyurl.com/2qydv

And finally, the Intel page about dates of
introduction is here:

  http://tinyurl.com/3exg8

Make of this what you will!

-Michal

Fitting data with Gnuplot

Date: Thu, 6 May 2004 12:26:08 -0700 (PDT)

assume output data can be fit to a*log(b*x+c)+d curve.
define f(x):

  gnuplot> f(x) = a*log(x*b+c)+d

fit f(x) to data in file "data4" using variables a, b,
c, and d:

  gnuplot> fit f(x) "data4" via a, b, c, d

This will output the coefficients to stdout, and also
to a file called fit.log.
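
To try the fit without real measurements, some sample data can be
generated first. This is only a sketch: the function name and the
coefficients (a=2, b=3, c=1, d=5) are made up for illustration.

```shell
# Hypothetical helper: print "x y" pairs that lie exactly on
# y = a*log(b*x+c)+d. awk's log(), like gnuplot's, is the
# natural logarithm.
make_sample_data() {
  awk 'BEGIN {
    a = 2; b = 3; c = 1; d = 5      # assumed coefficients
    for (x = 1; x <= 10; x++)
      printf "%d %.4f\n", x, a * log(b * x + c) + d
  }'
}

# Show the first few points.
make_sample_data | head -3
```

Redirect the output to a file named data4 (make_sample_data > data4)
to use it with the fit command above.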

see "help fit" in gnuplot.

-Michal

Array of pointers of functions

If you want to create a bunch of functions that take the same input arguments but perform different things, and you really want them to be different functions (and not blocks of code chosen by if/then conditions), then you can index them with a number by creating an array of pointers to functions.

Date: Thu, 5 Aug 2004 20:32:53 -0700 (PDT)

And, it compiles too!

--begin cut--

#include <stdio.h>
#include <stdlib.h>

int sum(int x, int y)     { return x + y; }
int product(int x, int y) { return x * y; }
int max(int x, int y)     { return x > y ? x : y; }
int min(int x, int y)     { return x < y ? x : y; }

int main(void)
{

  void **f = (void *)malloc(sizeof(void *) * 4);

  f[0] = (void *)&sum;
  f[1] = (void *)&product;
  f[2] = (void *)&max;
  f[3] = (void *)&min;


  int a = 5;
  int b = 6;
  int i;
  for (i = 0; i<4; i+=1)
  {
    int (*mystery_function) (int, int);
    mystery_function = (void *)(f[i]);
    printf("f[%d](%d, %d) = %d\n", i, a, b, mystery_function(a,b));
  }

  return 0;
}

--end cut--

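The call can be nested into one line by casting in place,
((int (*)(int, int))f[i])(a, b), though that is hard to read.
Declaring the array as int (*f[4])(int, int) avoids the casts
altogether, since f[i](a, b) then works directly. (Strictly
speaking, ISO C does not guarantee that a function pointer
survives a trip through void *, although POSIX systems allow
it.) You could also write a wrapper function, a sort of
metafunction. Fun!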

Good site to look thru:

  www.function-pointer.org

-Michal

Sanitizing strings in Java

I was asked for a good way to sanitize strings in Java. I wrote this function, which should work for most cases.

Date: Fri, 12 Nov 2004 15:16:18 -0800 (PST)

Anyway, here is some code you should try to fit into 
the UserExpUploadAction.java file, where it says to 
sanitize the filename:

  mikeg@ovid11:~/java >cat Sanitize.java
  class Sanitize {
      public static void main(String[] args) {
          String x = "Hello, world! Some bad characters are here: @!#$%^&()[]{}.,;'\":<>?";
          System.out.println("Original string: " + x);
  
          /* Characters that are OK in a file are described
             by regular expressions as:
  
              \w - alphanumeric and underscore (A-Za-z0-9_)
              \. - dot
              \- - dash
              \: - colon
              \; - semicolon
              \# - number sign
              \_ - underscore (redundant, \w already covers it)
  
            Each \ above must be escaped to allow javac to parse
            it correctly. That's why it looks so bad below.
  
            Since we want to replace things that are not the above,
            set negation ([^ and ]) is used.
          */  
          String y = x.replaceAll("[^\\w\\.\\-\\:\\;\\#\\_]", "_");
          System.out.println("Fixed string:    " + y);
      }
  }
  mikeg@ovid11:~/java >javac Sanitize.java && java Sanitize
  Original string: Hello, world! Some bad characters are here: @!#$%^&()[]{}.,;'":<>?
  Fixed string:    Hello__world__Some_bad_characters_are_here:___#__________._;__:___
  mikeg@ovid11:~/java >
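
The same substitution can be tried straight from the shell with sed.
This is a sketch that assumes the same set of allowed characters as
the Java code; the function name sanitize_filename is invented here.

```shell
# Hypothetical helper: make a string safe to use as a filename.
sanitize_filename() {
  # Replace every character outside A-Za-z0-9 _ . : ; # - with "_".
  # The dash goes last in the brackets so it is taken literally.
  # LC_ALL=C keeps the A-Z style ranges limited to plain ASCII.
  printf '%s\n' "$1" | LC_ALL=C sed 's/[^A-Za-z0-9_.:;#-]/_/g'
}

sanitize_filename 'my file (v2)!.txt'
```

The shell needs no doubled backslashes here, because the pattern is
inside single quotes and sed sees it exactly as written.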

What do you think?

-Michal

Using tar

Tar is an ancient program. It traditionally defaults to using tapes (it is called Tape ARchive for a reason...), which are a rarity on most desktop PCs, so you almost always have to tell it which file to work on. Anyway, here is a brief summary of how to use it.

Date: Tue, 16 Nov 2004 19:49:23 -0800 (PST)

Ok, to create a tar file:

Assuming you have your files in the directory 
/files/xyz/, you can put everything in xyz (including 
xyz) into a tar file like so:

  cd /files
  tar -cf xyz.tar xyz

(the -c means create, and -f means what follows is a 
filename; naming the directory itself, rather than 
xyz/*, also picks up hidden files)

That will create xyz.tar. To see what's inside it, do:

  tar -tf xyz.tar

(the -t means list, and -f again means what follows is 
a filename)

You will see things like

  xyz/foo
  xyz/bar
  xyz/abcd
  ...

The tar file is just a collection of files glued 
together.

To extract its contents, do:

  tar -xvf xyz.tar

(the -x means "extract", the -v means "verbose", and 
"-f" means what follows is a filename)

To compress the tar file, use gzip:

  gzip -9 xyz.tar

That will make xyz.tar.gz. It will usually be much 
smaller than xyz.tar. To uncompress it, do:

  gunzip xyz.tar.gz

which will create xyz.tar.

(You can use all these steps to convince yourself that 
you have correctly made the tar file, by copying the 
.tar.gz file to another directory and extracting it 
there.)
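
That check can be sketched as a short script. It works entirely in a
scratch directory, and the file names and contents are placeholders
made up for the example.

```shell
# Work in a scratch directory so nothing real is touched.
work=$(mktemp -d)
mkdir -p "$work/files/xyz" "$work/elsewhere"
echo "hello" > "$work/files/xyz/foo"
echo "world" > "$work/files/xyz/bar"

# Create and compress the archive.
cd "$work/files"
tar -cf xyz.tar xyz
gzip -9 xyz.tar                 # leaves xyz.tar.gz

# Copy it elsewhere, then uncompress and extract it there.
cp xyz.tar.gz "$work/elsewhere/"
cd "$work/elsewhere"
gunzip xyz.tar.gz               # leaves xyz.tar
tar -xf xyz.tar

# Compare the extracted copy against the original directory.
if diff -r "$work/files/xyz" "$work/elsewhere/xyz" >/dev/null; then
  echo "tar file checks out"
fi
```

If diff finds any difference, nothing is printed and the tar file
deserves a closer look before it gets emailed.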

Once you have this file, you can email it as an 
attachment. The "tar.gz" file (called a "tarball") is 
very common and should be understood by most users of 
UNIX. WinZip, the popular "zip" program for Windows, 
also understands .tar.gz files and uncompresses them.

-Michal

Using zip, gz, and tar.gz files

More fun with file formats.


Date: Tue, 17 Oct 2006 23:09:29 -0700 (PDT)

> How does one decompress *.zip files, *.gz, and *.tar.tar files in *nix?

.zip files are handled with "zip" and "unzip", .gz 
files are handled with "gzip" and "gunzip", .tar.tar 
files are unknown to me, so i'll guess you mean 
.tar.gz, which are just .gz files.

.zip is a container for many files, each of which is 
compressed individually.

.tar is a container for many files.

.gz is a container for one compressed file.

.tar.gz is a container for many files (.tar) that has 
been compressed (.gz). sometimes, this is abbreviated 
as .tgz.


zip:

if you have a .zip file, you can see its contents with 

  unzip -l foo.zip

and extract it with

  unzip foo.zip

(learn more with "unzip --help")


gz:

if you have a .gz file, you can uncompress it with

  gunzip foo.gz

which will create file foo and delete foo.gz.

if you have a file foo and want to compress it, use

  gzip foo

which will create file foo.gz and delete foo.


tar:

if you have a file foo.tar you can see its contents 
with:

  tar tf foo.tar

and extract its contents with

  tar xvf foo.tar

(the "v" is optional, but useful)


tar.gz:

if you have a file foo.tar.gz, you can uncompress it 
with:

  gunzip foo.tar.gz

which will create foo.tar and delete foo.tar.gz. then 
follow the tar instructions above to inspect or 
extract it. linux people conveniently anticipated this 
awful situation and gave tar the "z" flag to handle 
.tar.gz files automatically. so, if you have a .tar.gz 
file you can see its contents with:

  tar ztf foo.tar.gz

and extract its contents with:

  tar zxvf foo.tar.gz
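
a short sketch to see the gzip and "z" behaviour for yourself. it 
runs in a scratch directory, and the names foo and bar are just 
placeholders.

```shell
# Work in a scratch directory so nothing real is touched.
work=$(mktemp -d)
cd "$work"

# gzip replaces foo with foo.gz; gunzip reverses that.
echo "some text" > foo
gzip foo
ls                    # shows only foo.gz
gunzip foo.gz
ls                    # shows only foo again

# Build a .tar.gz in one step with the "z" flag, then list it.
mkdir bar
echo "a" > bar/a
echo "b" > bar/b
tar zcf bar.tar.gz bar
tar ztf bar.tar.gz
```

the "c" flag used here creates an archive, in the same way that 
"t" lists one and "x" extracts one.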

-Michal

Lost files after fsck

On ext2 file systems, files got lost very often after a hard reset. It can still happen on ext3, but it's less common now thanks to journaling. Here's what happens.

Date: Tue, 23 Nov 2004 14:26:09 -0800 (PST)

Whenever your computer is started after being abruptly 
turned off, it runs the filesystem check (fsck) at 
startup. Part of the file checking process is to 
recover files that have been lost. Such files are put 
in the "lost+found" directory of the filesystem's root 
directory. The files won't have their original names 
(fsck names them after their inode numbers, like 
#12345), but will have their original contents.

Check these directories on your system after every 
fsck run:

  /lost+found/
  /maxa/lost+found/
  /maxb/lost+found/
  /spare/lost+found/

You will need to be root to see files in there.
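
A quick way to check them all at once is a small shell function. 
This is only a sketch, and the name check_lost_found is made up 
for this example.

```shell
# Hypothetical helper: report how many entries each lost+found holds.
check_lost_found() {
  for dir in "$@"; do
    if [ -d "$dir" ]; then
      # ls -A lists everything except "." and "..".
      # tr strips the padding some versions of wc add.
      count=$(ls -A "$dir" | wc -l | tr -d ' ')
      echo "$dir: $count entries"
    else
      echo "$dir: no such directory"
    fi
  done
}

# Run as root on the real directories, for example:
#   check_lost_found /lost+found /maxa/lost+found /maxb/lost+found /spare/lost+found
```

Anything other than 0 entries means fsck recovered something that 
is worth a look.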

Repeated abrupt restarts will cause files to be lost 
from their original location and found in the 
lost+found directory. If you keep it up without being 
careful, eventually enough system files will be 
"lost+found" and your system may not boot. So it's 
important to check the lost+found directories after 
fsck runs.

-Michal

Max value of set of numbers

I've been dancing around this issue for a while now, and never got around to making the script until now. Well, wait no more.

Date: Mon, 8 May 2006 18:42:32 -0700 (PDT)

>I got another small question. trying find it with no success.
>   
>  "is there a simple command for finding maximum value in specified column of a file?"
>  OR i should just write it my own....

You can use awk, sort and head to get that.

Here's a file with five columns:

  mikeg@karma:~/tmp >cat values
  62 19 63 18 40
  22 37 15 43 45
  36 29 72 62 24
  90 46 84 48 10
  41 11 61 12 75
  52 46 68 34 78
  79 22 44 59 53
  94 74 53 37 98
  54 39 12 75 24
  93 33 97 45 50
  
Show only the fourth column:
  
  mikeg@karma:~/tmp >cat values | awk '{print $4}'
  18
  43
  62
  48
  12
  34
  59
  37
  75
  45
  
Sort that set of numbers numerically (2 before 11) 
and in reverse order (biggest first):
  
  mikeg@karma:~/tmp >cat values | awk '{print $4}' | sort -rn
  75
  62
  59
  48
  45
  43
  37
  34
  18
  12
  
And show only the first line of that list:
  
  mikeg@karma:~/tmp >cat values | awk '{print $4}' | sort -rn | head -1
  75

You can also use "sort -n | tail -1", but that is less 
efficient since the long list of numbers has to be 
seen by "tail" before it decides to show you the last 
one.

If your list of numbers is very long, or you are 
repeating this step frequently, then it is also 
inefficient to sort the list in memory. You can omit 
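this step by writing a small awk script. Instead of 
printing the fourth column, keep the largest value seen 
so far in a variable "m" and print it at the end. The 
first line's value is taken as the starting point, so 
that negative numbers work too:

  mikeg@karma:~/tmp >cat values | awk 'NR==1 || $4>m {m=$4} END {print m}'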
  75

This uses no more memory than the longest line in your 
file, and is faster.
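
To avoid retyping the awk program for every column, it can be 
wrapped in a shell function that takes the column number as an 
argument. This is a sketch; the name colmax is made up here.

```shell
# Hypothetical helper: print the largest value found in the
# given column of standard input.
colmax() {
  # -v passes the column number into awk as the variable "col".
  # The first line seeds max, so negative numbers work as well.
  awk -v col="$1" 'NR == 1 || $col > max { max = $col } END { if (NR > 0) print max }'
}

# Largest value in the second column of three sample lines.
printf '62 19\n22 37\n90 46\n' | colmax 2
```

With the file from above, "colmax 4 < values" gives the maximum 
of the fourth column.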

If you're crazy and motivated, you can also write a 
shell script that uses awk to find min, max and avg of 
the first column in one step:

  mikeg@karma:~/tmp >cat ~/bin/minmaxavg.sh 
  #!/bin/sh
  awk '
  {
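    # Seed min and max from the first line, so that zero and
    # negative values are handled correctly.
    if (NR == 1) { min = $1; max = $1; }
    if ($1<min) min = $1;
    if ($1>max) max = $1;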
    sum += $1;
  }
  
  END {
    print "min", min
    print "max", max
    print "avg", sum/NR
  }'
  
  mikeg@karma:~/tmp >cat values | awk '{print $4}' | ~/bin/minmaxavg.sh 
  min 12
  max 75
  avg 43.3

And then extract only the max value of the fourth 
column with:

  mikeg@karma:~/tmp >cat values | awk '{print $4}' | ~/bin/minmaxavg.sh | grep "max" | awk '{print $2}'
  75

Or simply:

  mikeg@karma:~/tmp >cat values | awk '{print $4}' | ~/bin/minmaxavg.sh | awk '/max/ {print $2}'
  75

(If you make such a program, don't forget to do
"chmod +x minmaxavg.sh" so it is executable.)

awk is very powerful, but the above is probably as far 
as anyone should go with awk. Anything more and you 
should write a perl or python script, for clarity and 
sanity.

-Michal

This is https://michal.guerquin.com/ramblings.html, updated 2006-10-18 01:51 EDT

Contact: michalg at domain where domain is gmail.com