
mawk and gawk: Comparing and Contrasting 2 fine programs

Chris Fearnley

Jan 29, 1996, 3:00:00 AM

I decided to spend a little bit of time comparing and contrasting gawk
and mawk, two GPL'd versions of the AWK programming language. I report
my results here and look forward to hearing what silly mistakes I made!

Where to get the software:
ftp://prep.ai.mit.edu/pub/gnu/gawk-3.0.0.tar.gz
ftp://oxy.edu/public/mawk1.2.2.tar.gz [Well, I just checked and only
an older version is there right now. But you can get my Debian port
(ftp site listed below) - the "debianized" source (.tar.gz) includes
everything necessary to build mawk for most platforms including DOS.]

I compiled the software on a Debian GNU/Linux system with gcc 2.7.2 and
-O2 optimization ("debianized" source and binaries are available from
ftp://ftp.netaxs.com/people/cjf/debian and soon from the Debian
project's main ftp site: ftp://ftp.debian.org). Compiling went without
any hitches.

Here are summaries of the 3 tests I ran (transcripts of some of the
tests are available from ftp://ftp.netaxs.com/people/cjf/debian/awk):

Test1: compare mawk and gawk in performing "make dep" on the Linux
kernel 1.3.43.

Conclusions: MAWK performed significantly faster than GAWK as determined
by the output of time(1). Here is the summary data (over 5 tests):
mawk: avg user = 29.17 secs; avg system = 15.50 secs; avg elapsed = 76.39 secs
gawk: avg user = 55.97 secs; avg system = 16.82 secs; avg elapsed = 105.45 secs
The output from both programs was identical.
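
For a rough sense of scale, the averages above can be turned into
gawk/mawk ratios. This is only arithmetic on the figures already quoted;
the ratio helper is a made-up name for illustration, and any POSIX awk
should do for the division.

```shell
# ratio A B: print A/B to two decimals (hypothetical helper, not part of
# either test suite). Uses awk only for floating-point division.
ratio() {
    awk -v a="$1" -v b="$2" 'BEGIN { printf "%.2f\n", a / b }'
}

# gawk average divided by mawk average, taken from the Test1 summary
echo "user:    $(ratio 55.97 29.17)"    # prints "user:    1.92"
echo "system:  $(ratio 16.82 15.50)"    # prints "system:  1.09"
echo "elapsed: $(ratio 105.45 76.39)"   # prints "elapsed: 1.38"
```

So on this test gawk used roughly 1.9 times the user CPU time and about
1.4 times the elapsed time of mawk, while system time was nearly equal.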

Test2: To test mawk and gawk under the mawk test suite

Conclusion: gawk fails to pass mawk's "function calls and general stress
test". The test it fails on is a script decl.awk. That file warns:
"some awks need double escapes on strings used as regular expressions.
If not run on mawk, use gdecl.awk". So I reran the tests with a copy of
the original mawktest program with the change to use the gdecl.awk script.
Both mawk and gawk passed this test, but mawk was much faster
(more than twice as fast!) as determined by the output of time(1).
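
As a general illustration of the "double escapes" issue (this is not
taken from decl.awk or gdecl.awk, and the function names are invented
for the example): a string used as a regular expression is first read as
a string and then as a regex, so the portable way to match a literal dot
is to double the backslash. What a single backslash before "." does in
a string differs between awk implementations.

```shell
# Replace every literal "." with "-", using a STRING as the regex.
# "\\." is the string \. which, read as a regex, matches a literal dot.
dots_to_dashes() {
    printf '%s\n' "$1" | awk '{ gsub("\\.", "-"); print }'
}

# The same substitution with a regex literal needs only one backslash,
# because a regex literal is not passed through string escape processing.
dots_to_dashes_literal() {
    printf '%s\n' "$1" | awk '{ gsub(/\./, "-"); print }'
}

dots_to_dashes a.b.c            # prints "a-b-c"
dots_to_dashes_literal a.b.c    # prints "a-b-c"
```

Where the pattern is known in advance, the regex-literal form sidesteps
the question entirely.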

Test3: to test gawk and mawk under the gawk test suite

Conclusions: The gawk test suite includes some 53 tests. In 27 of
these mawk and gawk gave identical output. Given that so many of these
tests showed up differences between gawk and mawk, it seems fruitful
to analyze each test separately. I did run one speed test (running
each of the 27 tests where both gawk and mawk gave identical results).
In this case mawk performed just a bit faster than gawk.

Here is a test-by-test Analysis (tests were conducted using bash
under Linux 1.2.13, mawk and gawk both compiled with gcc 2.7.2 and -O2
optimization):
1. The following tests were passed by both mawk and gawk: swaplns
longwrds getline fstabplus compare arrayref rs fsrs rand fsbs
negexp asgext splitargv nfset reparse convfmt resplit rswhite intprec
childin numsubstr pcntplus prmreuse math fflush fldchg posix
2. messages: mawk passes the test; gawk's results differ from what
the test expects, but the test considers this situation to be "ok".
$ make messages AWK=gawk
{ cmp ./out1.ok out1 && cmp ./out2.ok out2 && cmp ./out3.ok out3 && rm -f out1 out2 out3; } || { test -d /dev/fd && echo IT IS OK THAT THIS TEST FAILED; }
./out2.ok out2 differ: char 1, line 1
IT IS OK THAT THIS TEST FAILED
$ cat out2.ok
Normal print statement
This printed on stdout
$ cat out2
This printed on stdout
Normal print statement
$ cat messages.awk
# This is a demo of different ways of printing with gawk. Try it
# with and without -c (compatibility) flag, redirecting output
# from gawk to a file or not. Some results can be quite unexpected.
BEGIN {
print "Goes to a file out1" > "out1"
print "Normal print statement"
print "This printed on stdout" > "/dev/stdout"
print "You blew it!" > "/dev/stderr"
}
3. anchgsub: This test checks whether anchoring a regular expression
works in the gsub built-in. It is unclear to me if the gsub or the
anchor should take precedence (mawk takes the former position, gawk the
latter). Presumably the programmer should use the sub built-in if
they only want the "leftmost longest" string to be substituted.
Here is the test (I use GNU cat to show non-printing characters):
$ cat --show-all anchgsub.awk
{ gsub(/^[ ^I]*/, "", $0) ; print }$
$ cat --show-all anchgsub.in
^IThis is a test, this is only a test.$
$ gawk -f anchgsub.awk < anchgsub.in
This is a test, this is only a test.
$ mawk -f anchgsub.awk < anchgsub.in
Thisisatest,thisisonlyatest.
4. awkpath: mawk doesn't support the environment variable AWKPATH.
5. arrayparm, paramdup, prmarscl, sclforin, sclforin: mawk and gawk
give different error messages. I didn't bother trying to evaluate
each case. In general both gawk and mawk noticed a problem and
reported some clues as to what went wrong. The one case I looked
at suggested to me that both programs could improve their handling
of error conditions.
6. nonl, defref, nofmtch, noeffect: These tests use the --lint option,
which mawk doesn't support.
7. litoct: This test uses the --traditional option, which mawk doesn't
support.
8. fieldwdth: This test uses the special variable FIELDWIDTHS which mawk
doesn't support.
9. ignrcase, igncfs: These tests use the special variable IGNORECASE
which mawk doesn't support.
10. manyfiles: mawk limits the number of open files (gawk seems to
allow for an unlimited number of open files). When I modified the
test to open only 252 files, gawk outperformed mawk as indicated
by time(1). Here is the error message mawk gives when it dies:
mawk: cannot open "junk/253" for output (Too many open files)
11. argtest: mawk doesn't recognize the command line options -x or -y
which gawk adds to its ARGV array. I think this is non-standard
(though possibly useful) behavior on gawk's part.
12. badargs: mawk gives a more useful error message than gawk. In
response to the command 'mawk -f', mawk returns:
mawk: option -f lacks argument
whereas gawk spits out its complete usage message.
13. strftime, gensub: mawk doesn't support the strftime() and gensub()
built-in functions.
14. gnureops: mawk doesn't support the GNU regular expression operator
extensions. [Things like "\w", "\W", "\<", "\>", "\y", etc.]
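
Two of the differences above (items 3 and 10) can often be sidestepped
in portable awk. The sketch below is an illustration only, not part of
either test suite; the function names are invented, and the second one
assumes the job allows each output file to be closed before the next is
opened, which the manyfiles test itself is not designed to do.

```shell
# Item 3: strip only the leading run of blanks and tabs. With sub()
# there is at most one substitution, so the anchor question never arises.
strip_leading() {
    printf '%s\n' "$1" | awk '{ sub(/^[ \t]+/, ""); print }'
}

# Item 10: write N one-line files into DIR, closing each file before
# opening the next so that only one output file is open at a time.
write_many() {
    awk -v dir="$1" -v n="$2" 'BEGIN {
        for (i = 1; i <= n; i++) {
            f = dir "/" i
            print i > f
            close(f)    # release the descriptor before the next open
        }
    }'
}

strip_leading "$(printf '\tThis is a test, this is only a test.')"

tmp=$(mktemp -d) && write_many "$tmp" 300 && ls "$tmp" | wc -l
rm -rf "$tmp"
```

The first call prints the sentence with its inner spaces intact, and the
second creates more files than the 252 that worked in item 10 without
keeping them open at once.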

--
Christopher J. Fearnley | UNIX SIG Leader at PACS
c...@netaxs.com | (Philadelphia Area Computer Society)
http://www.netaxs.com/~cjf | Design Science Revolutionary
ftp://ftp.netaxs.com/people/cjf | Explorer in Universe
"Dare to be Naive" -- Bucky Fuller | Linux Advocate

bill davidsen

Jan 30, 1996, 3:00:00 AM
In article <4ehpf0$1...@netaxs.com>, Chris Fearnley <c...@netaxs.com> wrote:
|
| I decided to spend a little bit of time comparing and contrasting gawk
| and mawk, two GPL'd versions of the AWK programming language. I report
| my results here and await learning what sillinesses I made!

Thank you for a most useful and enlightening comparison. I may have
something to add in a few days; I'm hoping to get the latest mawk
beta, which may affect some of these results.
--
-bill davidsen (davi...@tmr.com)
"To start the new year, in the USA the government has laid off most
non-essential government personnel. In November the voters will get rid
of the rest." -me
