As for the speed testing:
I have found another comparison which seems quite well suited:
http://awka.sourceforge.net/compare.html
or even better, because it contains newer source files:
https://github.com/chadbrewbaker/awka/tree/master/benchmark
cast.awk (lots of string to number & back casts in a loop)
array.awk (tests arrays with string elements)
array2.awk (tests arrays with integer elements)
array3.awk (speed accessing multi-dimension arrays)
io.awk (writes out a large file, reads it in & splits $0)
split.awk (how fast is the split functionality?)
function.awk (many repeated function calls passing arguments)
expr.awk (speed evaluating a reasonably complex expression)
parsecsv.awk (a benchmarking example from comp.lang.awk)
loop.awk (heavily nested for loops and if statements)
recurse.awk (deep recursive function test)
regexp.awk (test regular expression speed)
I used the following AWK versions for Microsoft Windows:
GAWK 4.1.3:
https://sourceforge.net/projects/ezwinports/files/gawk-4.1.3-w32-bin.zip/download
MAWK 1.3.3:
http://www.klabaster.com/freeware.htm#dl
TAWK 5.0c:
http://www.tasoft.com/tawk.html
For the timer I used a Windows console program named timer.exe:
http://www.gammadyne.com/cmdline.htm#timer
e.g. 0:00:04.18 means 4.18 seconds (4 seconds and 18 hundredths, i.e. 180 milliseconds)
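To compare the timings numerically, the strings can be converted to plain seconds. A minimal sketch, assuming the timer output always has the form H:MM:SS.hh; the function name to_seconds is my own and not part of timer.exe:

```awk
# Convert a timer string of the form "H:MM:SS.hh" to seconds.
# to_seconds is a hypothetical helper name.
function to_seconds(t,    p) {
    split(t, p, ":")                      # p[1]=hours, p[2]=minutes, p[3]=seconds
    return p[1] * 3600 + p[2] * 60 + p[3]
}

BEGIN {
    printf "%.2f\n", to_seconds("0:00:04.18")   # prints 4.18
    printf "%.2f\n", to_seconds("0:01:02.50")   # prints 62.50
}
```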
Here are my results:
*** array.awk ***
1st Winner: MAWK time for array.awk:
0:00:04.18
GAWK time for array.awk:
0:00:06.40
2nd Winner: TAWK time for array.awk:
0:00:02.34
*** array2.awk ***
1st Winner: MAWK time for array2.awk:
0:00:00.68
2nd Winner: GAWK time for array2.awk:
0:00:01.01
TAWK time for array2.awk:
0:00:01.64
*** array3.awk ***
2nd Winner: MAWK time for array3.awk:
0:00:04.21
1st Winner: GAWK time for array3.awk:
0:00:02.42
TAWK time for array3.awk:
0:00:05.76
*** cast.awk ***
1st Winner: MAWK time for cast.awk:
0:00:05.37
2nd Winner: GAWK time for cast.awk:
0:00:09.37
TAWK time for cast.awk:
0:00:10.40
*** expr.awk ***
1st Winner: MAWK time for expr.awk:
0:00:02.07
GAWK time for expr.awk:
0:00:13.09
2nd Winner: TAWK time for expr.awk:
0:00:06.75
*** function.awk ***
MAWK time for function.awk:
0:00:02.57
1st Winner: GAWK time for function.awk:
0:00:01.93
2nd Winner: TAWK time for function.awk:
0:00:02.51
*** io.awk ***
1st Winner: MAWK time for io.awk:
0:00:02.56
2nd Winner: GAWK time for io.awk:
0:00:03.78
TAWK time for io.awk:
0:00:08.84
*** loop.awk ***
1st Winner: MAWK time for loop.awk:
0:00:02.82
GAWK time for loop.awk:
0:00:07.64
2nd Winner: TAWK time for loop.awk:
0:00:03.21
*** parsecsv.awk ***
1st Winner: MAWK time for parsecsv.awk:
0:00:01.73
2nd Winner: GAWK time for parsecsv.awk:
0:00:04.73
TAWK time for parsecsv.awk:
0:00:05.14
*** recurse.awk ***
1st Winner: MAWK time for recurse.awk:
0:00:00.15
GAWK time for recurse.awk:
0:00:00.35
2nd Winner: TAWK time for recurse.awk:
0:00:00.25
*** regexp.awk ***
1st Winner: MAWK time for regexp.awk:
0:00:00.48
GAWK time for regexp.awk:
0:00:27.59
2nd Winner: TAWK time for regexp.awk:
0:00:01.26
*** split.awk ***
1st Winner: MAWK time for split.awk:
0:00:02.00
2nd Winner: GAWK time for split.awk:
0:00:04.84
TAWK time for split.awk:
0:00:05.23
The result as a summary:
1st Winner: MAWK (array.awk)
1st Winner: MAWK (array2.awk)
1st Winner: GAWK (array3.awk)
1st Winner: MAWK (cast.awk)
1st Winner: MAWK (expr.awk)
1st Winner: GAWK (function.awk)
1st Winner: MAWK (io.awk)
1st Winner: MAWK (loop.awk)
1st Winner: MAWK (parsecsv.awk)
1st Winner: MAWK (recurse.awk)
1st Winner: MAWK (regexp.awk)
1st Winner: MAWK (split.awk)
2nd Winner: TAWK (array.awk)
2nd Winner: GAWK (array2.awk)
2nd Winner: MAWK (array3.awk)
2nd Winner: GAWK (cast.awk)
2nd Winner: TAWK (expr.awk)
2nd Winner: TAWK (function.awk)
2nd Winner: GAWK (io.awk)
2nd Winner: TAWK (loop.awk)
2nd Winner: GAWK (parsecsv.awk)
2nd Winner: TAWK (recurse.awk)
2nd Winner: TAWK (regexp.awk)
2nd Winner: GAWK (split.awk)
So MAWK is the "1st Winner" in 10 of the 12 tests.
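The win counts do not show how large the margins are. A small sketch that expresses each time as a multiple of the fastest time in that test, using the function.awk and regexp.awk timings from above (the helper name ratio_line is my own):

```awk
# ratio_line() is a hypothetical helper: it returns one line that shows
# each time as a multiple of the fastest of the three.
function ratio_line(test, m, g, t,    best) {
    best = m
    if (g < best) best = g
    if (t < best) best = t
    return sprintf("%-12s MAWK %.1fx  GAWK %.1fx  TAWK %.1fx", \
                   test, m / best, g / best, t / best)
}

BEGIN {
    # Times in seconds, copied from the results above (MAWK, GAWK, TAWK).
    print ratio_line("function.awk", 2.57, 1.93, 2.51)
    print ratio_line("regexp.awk",   0.48, 27.59, 1.26)
}
```

In function.awk all three are within about a factor of 1.3 of each other, whereas in regexp.awk GAWK needs roughly 57 times as long as MAWK.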
Some more comments about the test:
1. cast.awk doesn't work for TAWK.
https://github.com/chadbrewbaker/awka/blob/master/benchmark/cast.awk
I had to change:
sprintf(xx, "v1->ival = %d, v2->ptr = %s\n",v1,v2)
to:
sprintf("v1->ival = %d, v2->ptr = %s\n",v1,v2)
(not sure what 'xx' is meant to be).
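My guess, and it is only a guess, is that 'xx' is simply an unset variable. AWK's sprintf takes the format string as its first argument, so in the original line the format would be the empty string and the real format string would just be an unused extra argument. A small sketch to check this, with made-up values for v1 and v2:

```awk
BEGIN {
    v1 = 42; v2 = "abc"      # made-up sample values, not from cast.awk

    # Original form: xx is never assigned, so it acts as an empty format
    # (in interpreters that accept the call at all; TAWK does not).
    a = sprintf(xx, "v1->ival = %d, v2->ptr = %s\n", v1, v2)

    # Changed form: the format string is the first argument.
    b = sprintf("v1->ival = %d, v2->ptr = %s\n", v1, v2)

    print length(a)          # 0 if xx is treated as an empty format
    printf "%s", b
}
```

If that reading is right, the changed line does real formatting work that the original line skips, so the TAWK time for cast.awk may not be strictly comparable with the other two.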
2. Some of the AWK scripts contain Unix shell commands
like system("rm -f io.txt"). I haven't changed them: they
don't work on Windows, but they don't cause errors either.
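If the cleanup should also work on Windows, the command could be chosen at run time. A minimal sketch, assuming that the OS environment variable is set to "Windows_NT" on Windows and that system() hands the command to cmd.exe there; rm_cmd is a helper name I made up:

```awk
# rm_cmd() is a hypothetical helper that returns a delete command
# suited to the platform. Assumption: OS=Windows_NT on Windows.
function rm_cmd(file) {
    if (ENVIRON["OS"] == "Windows_NT")
        return "del /q \"" file "\" 2>nul"
    return "rm -f \"" file "\""
}

BEGIN {
    print rm_cmd("io.txt")        # show the command instead of running it
    # system(rm_cmd("io.txt"))    # uncomment to really delete the file
}
```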
3. Although the results for GAWK generally look good,
there is a case where GAWK is extremely slow compared
to MAWK and TAWK:
*** regexp.awk ***
1st Winner: MAWK time for regexp.awk:
0:00:00.48
GAWK time for regexp.awk:
0:00:27.59
2nd Winner: TAWK time for regexp.awk:
0:00:01.26
https://github.com/chadbrewbaker/awka/blob/master/benchmark/regexp.awk
At least one of these three cases seems to be very slow in GAWK:
# Manually doing a gsub
while (match(s1, j))
s1 = substr(s1, 1, RSTART-1) Switch[j] substr(s1, RSTART+RLENGTH)
# Use gsub
gsub(j, Switch[j], s2)
# gsub, and prevent RE recompile
gsub(Switch_R[j], Switch[j], s3)
GAWK developers might want to scrutinize this test file.
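To find out which of the three cases is the slow one, each could be run on its own and timed separately, in the same way as the benchmark scripts. A rough sketch only: the file name, the test string, the contents of Switch and the variables mode and n are all made up by me, and Switch_R is filled with a plain copy of the pattern as a stand-in, which may not be how regexp.awk does it:

```awk
# isolate.awk (hypothetical file name)
# usage: awk -v mode=1 -f isolate.awk     (mode = 1, 2 or 3)
BEGIN {
    if (mode == "") mode = 1
    if (n == "") n = 1000                # number of repetitions

    # Made-up patterns and replacements, NOT those used in regexp.awk.
    # The replacements must not match the patterns, or case 1 never ends.
    Switch["a+b"] = "X"
    Switch["c[0-9]"] = "Y"
    for (j in Switch) Switch_R[j] = j    # stand-in, see note above

    text = "xxaab--c7--aaab--c1yy"
    for (i = 1; i <= n; i++) {
        s = text
        for (j in Switch) {
            if (mode == 1) {             # manually doing a gsub
                while (match(s, j)) {
                    s = substr(s, 1, RSTART-1) Switch[j] substr(s, RSTART+RLENGTH)
                }
            } else if (mode == 2) {      # use gsub
                gsub(j, Switch[j], s)
            } else {                     # gsub via Switch_R
                gsub(Switch_R[j], Switch[j], s)
            }
        }
    }
    print s                              # same result expected for all modes
}
```

Running it once per mode under each AWK should show whether the slowdown comes from match(), from gsub() with a string pattern, or from all of them.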