awk vs. nawk vs. gawk vs. C

Dan Ross

Apr 14, 1993, 5:51:42 PM

Locally, we have three versions of awk: awk, nawk, and gawk. What are
the technical differences in these versions, with respect to:

* availability on every UNIX implementation (I suppose the source to gawk
is freely available)
* performance

Furthermore, is it worth rewriting relatively simple scripts in C, esp. if
they are operating on multi-megabyte files?

Is it generally true that C will be faster than even gawk?

Dan

Henry Spencer

Apr 15, 1993, 11:57:11 AM

In article <dross.7...@cs.wisc.edu> dr...@cambizola.cs.wisc.edu (Dan Ross) writes:
>Locally, we have three versions of awk: awk, nawk, and gawk. What are
>the technical differences in these versions, with respect to:
>
>* availability on every UNIX implementation (I suppose the source to gawk
> is freely available)
>* performance

Old awk is obsolete, although still somewhat more widely available than
the newer ones. Nawk (which many people now just call awk) used to be
scarce but is now widespread, although not universal yet. There are two
freely-redistributable awks: mawk and gawk.

Nawk, mawk, and gawk are all pretty much compatible.

Mawk generally has the best performance, followed (I believe) by gawk,
with nawk and old awk left in the dust.

>Furthermore, is it worth rewriting relatively simple scripts in C, esp. if
>they are operating on multi-megabyte files?

It depends. C *will* be faster, but it's also a lot more hassle. We
don't hesitate to feed multi-meg files to awk programs, provided that
it's not done too often and nobody is sitting there waiting for the output.

The usual rule is to write things in awk when possible, and rewrite in C
only when the awk version's performance is clearly unsatisfactory. The
awk version is typically a lot easier to maintain, so the rewrite isn't
worthwhile unless the awk version just won't do.
--
All work is one man's work. | Henry Spencer @ U of Toronto Zoology
- Kipling | he...@zoo.toronto.edu utzoo!henry

william E Davidsen

Apr 15, 1993, 3:23:52 PM

In article <dross.7...@cs.wisc.edu>, dr...@cambizola.cs.wisc.edu (Dan Ross) writes:
| Locally, we have three versions of awk: awk, nawk, and gawk. What are
| the technical differences in these versions, with respect to:
|
| * availability on every UNIX implementation (I suppose the source to gawk
| is freely available)

mawk is also widely available in source.

| * performance

I posted some benchmarks here a while ago; they indicate that on Sun,
MIPS, and Intel UNIX, for the problems I tested and believe to be
common, mawk is faster than gawk or nawk. I believe gawk is usually
faster than nawk.
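
For anyone who wants to repeat this kind of comparison locally, a rough
sketch follows. It assumes a POSIX shell and that `time` is available as
a shell keyword or utility; the function names `available_awks` and
`time_awks` are made up for illustration. Implementations that are not
installed are skipped, and the numbers will vary with the machine and
the workload.

```shell
#!/bin/sh
# Hypothetical helpers, not from the thread: list the awk implementations
# found on PATH, then time one awk program under each of them.

available_awks() {
    for impl in awk nawk mawk gawk; do
        # command -v fails quietly when the program is not installed
        if command -v "$impl" >/dev/null 2>&1; then
            echo "$impl"
        fi
    done
}

# usage: time_awks 'awk program text' datafile
time_awks() {
    prog=$1
    data=$2
    for impl in $(available_awks); do
        echo "== $impl" >&2
        # assumption: `time` exists here; its report goes to stderr
        time "$impl" "$prog" "$data" >/dev/null
    done
}
```

Something like `time_awks '{ n += length($0) } END { print n }' bigfile`
would then run the same job under each installed awk in turn.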

|
| Furthermore, is it worth rewriting relatively simple scripts in C, esp. if
| they are operating on multi-megabyte files?

Not that I've seen.

|
| Is it generally true that C will be faster than even gawk?

Good C will, but good C with pattern matching and string manipulation
is not casually written.
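
To make that concrete, here is a small sketch of a job where awk's
built-in field splitting, regular expressions, and associative arrays do
work that a C version would have to code by hand. The three-field input
format, the function name `bytes_per_host`, and the sample data are
invented for illustration.

```shell
#!/bin/sh
# Hypothetical example: for lines whose second field is exactly GET,
# total the byte counts (field 3) per host (field 1).
# usage: bytes_per_host [files]   (reads stdin when no files are given)
bytes_per_host() {
    awk '$2 ~ /^GET$/ { total[$1] += $3 }
         END { for (host in total) print host, total[host] }' "$@" | sort
}
```

Fed the lines `a GET 10`, `b PUT 5`, `a GET 7`, it reports a total of 17
for host `a` and nothing for `b`.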
|
| Dan

--
bill davidsen, GE Corp. R&D Center; Box 8; Schenectady NY 12345

Jerry Wieber

Apr 16, 1993, 12:12:19 AM

In article <dross.7...@cs.wisc.edu> dr...@cambizola.cs.wisc.edu (Dan Ross) writes:

>Is it generally true that C will be faster than even gawk?

Sure is. You would not want to use (g)awk if a fast runtime is
essential. However, for the majority of applications,
the 'critical time' is the time spent writing the program.

Coding and debugging in awk is an order of magnitude easier than in C.
Perl, by the way, is gradually replacing awk -- you should seriously
consider using perl instead of awk.

-Jerry
--
__________
UUCP: uunet!cs.umd.edu!jerryw SPOKEN: Jerry Wieber |/ `-. | U of Md
INTERNET: jer...@cs.umd.edu \_|.|-,
` -

Randal L. Schwartz

Apr 18, 1993, 2:33:52 PM

>>>>> In article <dross.7...@cs.wisc.edu>, dr...@cambizola.cs.wisc.edu (Dan Ross) writes:

Dan> Furthermore, is it worth rewriting relatively simple scripts in C, esp. if
Dan> they are operating on multi-megabyte files?

If you want something halfway between *awk and C, take a look at Perl.
There's even an awk-to-Perl converter included in the distribution
that works rather well.

And Perl is available under the GNU General Public License: essentially free.

Dan> Is it generally true that C will be faster than even gawk?

Almost certainly, unless you factor in the programmer debugging time. :-)

echo 'BEGIN { print "Just another Perl hacker," }' | a2p | perl
--
Randal L. Schwartz / Stonehenge Consulting Services (503)777-0095
mer...@ora.com (semi-permanent) mer...@agora.rain.com (for newsreading only)
phrase: "Welcome to Portland, Oregon ... home of the California Raisins!"

richard ryan

Apr 19, 1993, 4:32:35 AM

Hi,

I've got a problem with doing something in C that I would like to have
done in awk or gawk. We've got several different machines with our
home directories on a file server. If we compile a C program and put
it in $HOME/bin, it'll only run on the machine it was compiled on.
But if I use it, I would need it to run on several machines.

I have a C program called bend, similar to fold, that folds a line past
a given column and appends a backslash "\" where the line is folded.

It is used like:

bend -78 file

or

cat file | bend -78

Has anybody written a simple awk program to do this? Any help would
be appreciated. If you would like to see the C program, please let me
know. It takes tabs as a width of eight characters.

Thanks,
Richard.

p.s. Heck, I'll just throw the C program in for good measure.

#include <stdio.h>
#include <stdlib.h>
#include <stddef.h>
#define DEFAULT_LINE_LENGTH 77

int main(int argc, char* argv[])
{
    int ch = '\0';      /* int rather than char, so EOF compares reliably */
    int i;
    int tab_width = 7;  /* plus the loop's own i++, a tab counts as eight */
    int line_length;
    FILE *fptr;

    /* this switch statement is just a command line parser */
    switch(argc) {
    case 1:
        line_length = DEFAULT_LINE_LENGTH;
        fptr = stdin;
        break;
    case 2:
        if(argv[1][0] == '-') {
            line_length = -atoi(argv[1]);
            fptr = stdin;
        }
        else {
            line_length = DEFAULT_LINE_LENGTH;
            fptr = fopen(argv[1], "r");
            if(fptr == NULL) {
                fprintf(stderr, "File %s could not be opened for read.\n", argv[1]);
                exit(1);
            }
        }
        break;
    case 3:
        line_length = -atoi(argv[1]);
        fptr = fopen(argv[2], "r");
        if(fptr == NULL) {
            fprintf(stderr, "File %s could not be opened for read.\n", argv[2]);
            exit(1);
        }
        break;
    default:
        fprintf(stderr, "There are too many command line arguments.\n");
        exit(1);
    }

    /* this while loop reads in a line of text one character at a time */
    while(ch != EOF) {
        for(i = 0, ch = getc(fptr);
            (i < line_length) && (ch != '\n') && (ch != EOF);
            i++, ch = getc(fptr)) {
            if(ch == '\t') {
                /* a tab that would cross the fold is padded out with spaces */
                if((i += tab_width) >= line_length) {
                    for(i -= tab_width; i < line_length; i++)
                        putc(' ',stdout);
                    continue;
                }
            }
            putc(ch,stdout);
        }
        if(ch != '\n' && ch != EOF) {
            /* the line continues: emit the pending character and fold */
            if(ch != '\t')
                fprintf(stdout, "%c\\\n", ch);
            else
                fprintf(stdout, " \\\n");
        }
        else if(ch == '\n' || i > 0)   /* no spurious blank line at EOF */
            fprintf(stdout, "\n");
    }

    return 0;
}

J Lee Jaap

Apr 19, 1993, 4:14:04 PM

In article <RYANR.93A...@beethoven.CS.ColoState.EDU> ry...@beethoven.CS.ColoState.EDU (richard ryan) writes:
>I've got a problem with doing something in C that I would like to have
>done in awk or gawk. We've got several different machines with our
>home directories on a file server. If we compile a C program and put
>it in $HOME/bin, it'll only run on the machine it was compiled on.
>But if I use it, I would need it to run on several machines.

We have DECstations, Suns, IRISes, and HPs sharing a common user
environment, and our solution to this problem is to have a separate
bin subdir for each architecture needed:
$HOME/bin/{sun4,sun3,decstation,iris4d,hp9000s700}
where the subdir name is the output of arch (we created a one-liner
shell script for the OSs that don't have arch). PATH (and path)
are then set in the following way:

set path=( ... $HOME/bin/`arch` $HOME/bin ... )

which finds any architecture-specific programs first, then anything
else (such as shell, awk, or perl scripts) in the generic directory.

As an optimization, we

setenv ARCH `arch`

earlier in .cshrc, then use $ARCH instead of `arch`.
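
For Bourne-style shells, the same setup might look like the sketch
below. The function names `detect_arch` and `arch_path` are hypothetical,
and falling back to `uname -m` where `arch` is missing is an assumption;
the two commands do not necessarily print the same names, so the
directory names would have to match whichever one is used.

```shell
#!/bin/sh
# Sketch of the per-architecture PATH setup for sh, ksh, or bash.
# Assumption: `uname -m` stands in for `arch` where `arch` is unavailable.
detect_arch() {
    if command -v arch >/dev/null 2>&1; then
        arch
    else
        uname -m
    fi
}

# Print a PATH value with the architecture-specific bin directory first.
# usage: arch_path HOME_DIR ARCH OLD_PATH
arch_path() {
    echo "$1/bin/$2:$1/bin:$3"
}

ARCH=$(detect_arch)        # computed once, like the setenv ARCH `arch` above
PATH=$(arch_path "$HOME" "$ARCH" "$PATH")
export ARCH PATH
```

Putting `$HOME/bin/$ARCH` ahead of `$HOME/bin` keeps the lookup order
described above: compiled programs first, then portable scripts.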
--
J Lee Jaap <J.L....@LaRC.NASA.Gov> +1 804/864-2148
employed by, not speaking for, AS&M Inc, at
NASA LaRC, Hampton VA 23681-0001

Alvin Chia-Hua Shih

Apr 20, 1993, 4:55:10 AM

In <RYANR.93A...@beethoven.CS.ColoState.EDU> ry...@beethoven.CS.ColoState.EDU (richard ryan) writes:

>Hi,

>I've got a problem with doing something in C that I would like to have
>done in awk or gawk. We've got several different machines with our
>home directories on a file server. If we compile a C program and put
>it in $HOME/bin, it'll only run on the machine it was compiled on.
>But if I use it, I would need it to run on several machines.

>I have a C program called bend, similar to fold, that folds a line past
>a given column and appends a backslash "\" where the line is folded.

>It is used like:

> bend -78 file

>or

> cat file | bend -78

>Has anybody written a simple awk program to do this? Any help would
>be appreciated. If you would like to see the C program, please let me
>know. It takes tabs as a width of eight characters.

>Thanks,
>Richard.

>p.s. Heck, I'll just throw the C program in for good measure.

[ C source elided. ]

You can get the same effect with the *standard* UNIX utilities already
on your system(s).

There's:
expand file | fold -78
or:
cat file | expand | fold -78

Check the man pages for expand(1) and fold(1) on your system. You
don't need awk for this. Awk will also be considerably slower.
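
A note on spelling: `fold -78` is the historical form, and POSIX gives
the width with `-w`. A minimal wrapper, assuming POSIX `expand` and
`fold` are installed (the function name `fold_expanded` is made up for
illustration):

```shell
#!/bin/sh
# Sketch: expand tabs to spaces, then fold at WIDTH columns.
# usage: fold_expanded WIDTH [files]   (reads stdin when no files are given)
fold_expanded() {
    width=$1
    shift
    expand "$@" | fold -w "$width"
}
```

`fold_expanded 78 file` then corresponds to the first pipeline above.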

ACS
--
___ ___ ___ ______________________________________________________________
| | | __| "Maybe I'm paranoid, but remember, even paranoids have |
| - | --|__ | enemies."--Cal Pryluck, Temple University |
|_|_|___|___|______________________________________________________________|
Alvin_C._Shih_____...@csri.utoronto.ca______________________|

william E Davidsen

Apr 20, 1993, 10:20:48 AM

In article <JAAPJL.93A...@amb3.larc.nasa.gov>, jaa...@tab00.larc.nasa.gov (J Lee Jaap) writes:

| shell script for the OSs that don't have arch). PATH (and path)
| are then set in the following way:
|
| set path=( ... $HOME/bin/`arch` $HOME/bin ... )
|
| which finds any architecture-specific programs first, then anything
| else (such as shell, awk, or perl scripts) in the generic directory.

We do a similar thing, but put the ARCH first so people only have to
mount a single point. That is, if your server has sun3, sun4, hp700, and
cray code, it's easier for us to mount /common/hp700 for the server and
get the bin, lib, man and etc directories, than mount bin/ARCH,
etc/ARCH, man/ARCH... and to preserve sanity we do the same thing in
personal directories.

Mount points are not an expensive resource, but neither are they
totally without cost.

--
bill davidsen, GE Corp. R&D Center; Box 8; Schenectady NY 12345

Last year I worried that Bush would die and let Quayle take over.
This year I worry that Hillary will die and let Bill take over.

Steve Bacher

Apr 20, 1993, 2:04:00 PM

In article <RYANR.93A...@beethoven.CS.ColoState.EDU>,
ry...@beethoven.CS.ColoState.EDU (richard ryan) writes:

>I have a C program called bend, similar to fold, that folds a line past
>a given column and appends a backslash "\" where the line is folded.
>
>It is used like:
>
> bend -78 file
>
>or
>
> cat file | bend -78

>Has anybody written a simple awk program to do this? Any help would
>be appreciated. If you would like to see the C program, please let me
>know. It takes tabs as a width of eight characters.

In article <1993Apr20....@jarvis.csri.toronto.edu>,
a...@csri.toronto.edu (Alvin Chia-Hua Shih) writes:

>You can get the same effect with the *standard* UNIX utilities already
>on your system(s).
>
>There's:
> expand file | fold -78
>or:
> cat file | expand | fold -78
>
>Check the man pages for expand(1) and fold(1) on your system. You
>don't need awk for this. Awk will also be considerably slower.

But fold doesn't append a backslash to each split line, as Richard
was asking for.

Here, try this:

--- cut here ---
#!/bin/sh

if [ $# = 0 ]; then
    echo "Usage: $0 [-count] [files]" 1>&2
    exit 1
fi

count=80

case "$1" in
-[0-9]*) count=`echo "$1" | cut -c2-`; shift;;
esac

awk '
BEGIN {count='"$count"'}
length($0) == 0 {print; next}
{
    line = $0;
    while (length(line) >= count) {
        print substr(line,1,count-1) "\\";
        line = substr(line,count)
    }
    print line;
}
' "$@"
--- ereh tuc ---
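
The script above counts a tab as a single character, whereas Richard's
program counts it as eight. One possible variation is sketched below. It
assumes POSIX `expand` and an awk that accepts `-v` (nawk, mawk, or
gawk), and the function name `bend_awk` is made up for illustration.
Tabs are expanded to 8-column tab stops before folding, which is close
to but not identical to the fixed-width treatment in the C program, and
a line exactly `count` characters long is left unfolded.

```shell
#!/bin/sh
# Sketch: fold lines at COUNT columns, ending each folded piece with "\".
# Assumption: tabs are expanded to 8-column tab stops before folding.
# usage: bend_awk COUNT [files]   (reads stdin when no files are given)
bend_awk() {
    count=$1
    shift
    expand "$@" | awk -v count="$count" '
    {
        line = $0
        # each folded piece is count-1 characters plus the backslash
        while (length(line) > count) {
            print substr(line, 1, count - 1) "\\"
            line = substr(line, count)
        }
        print line
    }'
}
```

With this, `bend_awk 78 file` and `cat file | bend_awk 78` both work,
matching the two ways Richard said he calls bend.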

--
Steve Bacher (Batchman) Draper Laboratory
Internet: s...@draper.com Cambridge, MA, USA