The awk utility is used to make selections and
substitutions in text files. It is actually a highly sophisticated
C-like interpreted programming language. Our use of awk will
be very limited, and complete instructions will accompany each
case. We give a couple of common, useful examples here. For more
information about awk, see A. Aho, B.W. Kernighan, and P.J. Weinberger,
The AWK Programming Language (Addison-Wesley, Reading, MA, 1988).
Here is an example of where you might use awk.
Suppose you have a program that produces a list of values like this:
2.5 0.513
2.7 0.416
2.9 0.213
in a file called oldfile.
Suppose you decided that you wanted to multiply the second value by 2 on
every line and put the result in a file called newfile. The
awk command to do this is
awk '{print $1, 2*$2;}' oldfile > newfile
Inside the single quotes '' is the instruction to awk.
The instruction is applied to each line, one at a time. Awk
sees each line as a sequence of fields separated by "white" space
(blanks and tabs are white space), so each line in this case has two
fields. The single instruction says: for each line, print
the first field and twice the second field.
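As a quick check, the doubling command can be tried on the three sample lines shown above. This is only a sketch: it recreates oldfile with printf so that it can be run in an empty scratch directory.

```shell
# Recreate the three-line sample file from the text.
printf '2.5 0.513\n2.7 0.416\n2.9 0.213\n' > oldfile

# Print the first field and twice the second field of every line.
awk '{print $1, 2*$2;}' oldfile > newfile

# Show the result:
#   2.5 1.026
#   2.7 0.832
#   2.9 0.426
cat newfile
```

The comma in the print statement makes awk put a single blank between the two output values.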
Awk can print the third field on all lines containing the string
ans with the command
awk '/ans/{print $3;}' oldfile
This procedure is very handy for extracting numbers from complicated
output files for subsequent processing, as long as you had the
foresight to include a unique keyword ans on the line
containing the number you wanted to extract. Since we chose not to
redirect output in this example, it comes to the screen. (You could
redirect it to a file, of course.)
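Here is a small illustration of the keyword trick. The file name run.out and its contents are made up for this sketch; they stand in for the output of some longer program that tags its result lines with ans.

```shell
# A made-up output file: only the lines tagged "ans" carry results.
printf 'iteration 1 residual 0.5\nans energy 0.513\niteration 2 residual 0.1\nans energy 0.416\n' > run.out

# Print the third field of every line containing the string ans:
#   0.513
#   0.416
awk '/ans/{print $3;}' run.out
```

Note that /ans/ matches the string anywhere on the line, so a line containing a word such as answer would also be selected. That is why the keyword should be unique.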
Awk can average all the numbers in a field. Suppose in the above list
of numbers, you wanted to compute the average value of the second
field. The awk command is
awk 'BEGIN{s=0;}{s=s+$2;}END{print s/NR;}' oldfile
There are actually three instructions given to awk, each with its
action enclosed in braces {}. The first, BEGIN{s=0;}, is executed only once
before the file is read. It initializes the variable s that is
to hold the sum. This step is actually not necessary, since an awk
variable that has not been set has the value zero when used as a
number. The second instruction
is executed for each line and accumulates the sum. The
third instruction, END{print s/NR;}, is executed only after the last
line has been processed. It says to print the sum divided by NR, a
special awk variable that gives the number of lines
("records") read so far.
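To see that the BEGIN part really is optional, one can run the averaging command with and without it on the sample data. The sketch below recreates oldfile so that it is self-contained.

```shell
# Recreate the three-line sample file from the text.
printf '2.5 0.513\n2.7 0.416\n2.9 0.213\n' > oldfile

# With explicit initialization of s; prints 0.380667
awk 'BEGIN{s=0;}{s=s+$2;}END{print s/NR;}' oldfile

# Without BEGIN: s starts out as zero anyway; prints 0.380667
awk '{s=s+$2;}END{print s/NR;}' oldfile
```

The result is (0.513 + 0.416 + 0.213)/3, shown to six significant digits, which is the default precision of print.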
In these examples the awk scripts are short enough to put on the
command line, enclosed between single quotes. Longer scripts are
generally put in files. Scripts are easier to read if the commands
are separated by line breaks, just as in a C program. If the
file name extension is .awk, as in avg.awk, the emacs
editor recognizes the name and enters a useful awk editing mode
similar to its C and C++ modes. If we had put the
averaging script above in the file avg.awk, we would run it using
awk -f avg.awk oldfile
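One possible layout for avg.awk is sketched below. The spacing and the comment line are our own choices, not a required format. The file is written with a shell here-document only so that the example can be pasted into a terminal; normally one would create it in an editor.

```shell
# Recreate the three-line sample file from the text.
printf '2.5 0.513\n2.7 0.416\n2.9 0.213\n' > oldfile

# Write the averaging script to avg.awk, one instruction per line.
cat > avg.awk <<'EOF'
# avg.awk: average the second field over all lines
BEGIN { s = 0 }
      { s = s + $2 }
END   { print s/NR }
EOF

# Run the script on the sample file; prints 0.380667
awk -f avg.awk oldfile
```

In a script file the opening brace of an action must stay on the same line as its BEGIN, END, or pattern. Lines starting with # are comments.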
One can use awk to do quick calculator-style arithmetic. To evaluate
cos(3.5), for example, do
awk 'BEGIN{print cos(3.5);}'
The syntax for arithmetic expressions is the same as in C. In this
case awk is not working on any file, so our commands must be executed
under BEGIN.
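A few more calculator-style commands are sketched here. The use of printf to get more digits is our addition; it works as in C.

```shell
# Default output shows six significant digits; prints -0.936457
awk 'BEGIN{print cos(3.5);}'

# printf takes a C-style format for more digits; prints -0.9364566873
awk 'BEGIN{printf "%.10f\n", cos(3.5);}'

# Ordinary arithmetic with parentheses; prints 2.25
awk 'BEGIN{print (1+2)*3/4;}'
```

The argument of cos is in radians. Unlike print, printf does not add a line break, so the format string ends with \n.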
Sometimes we like to be able to change an awk variable each time we
run a script. Instead of editing the script each time, we specify
the variable on the command line. Here is an awk script for selecting
all lines for which the first field matches a specified key string:
# keyselect.awk
{ if($1==KEYSTRING)print; }
We run it with
awk -f keyselect.awk -v KEYSTRING=ans
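In this case the value of the variable KEYSTRING is set to
"ans" on the command line. As written, the command reads its input
from standard input; to process a file, append the file name to the
command.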
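The following sketch runs the key selection on a small file. The file name keyed.txt and its contents are invented for the illustration, and the script is written with a here-document only to keep the example self-contained.

```shell
# A made-up input file whose first field is a key.
printf 'ans 1 0.513\nstep 2 0.416\nans 3 0.213\n' > keyed.txt

# The script from the text.
cat > keyselect.awk <<'EOF'
# keyselect.awk
{ if($1==KEYSTRING)print; }
EOF

# Select the lines whose first field is exactly ans:
#   ans 1 0.513
#   ans 3 0.213
awk -v KEYSTRING=ans -f keyselect.awk keyed.txt
```

Changing the value after KEYSTRING= selects a different key without touching the script. The comparison with == requires the whole first field to equal the key, unlike the /ans/ pattern, which matches the string anywhere on the line.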
One can do much more with awk. Indeed, one could almost use awk as a
general programming language in place of C++, Fortran, or C, but its
strengths lie in record manipulation rather than number crunching, and
some of the most elegant applications are one-liners like
the foregoing examples.