Hi!
I reran the simulations using the suggested "LANG=C" locale, and the results are below (the values in parentheses indicate how much faster (-) or slower (+) each timing is compared with the timings in my previous post):
MAWK 1.3.3 (LANG=C):
Average Real Time: 3.138s (-0.137s) (Cumulated 313.771s (-13.749s))
Average User Time: 3.112s (-0.153s) (Cumulated 311.232s (-15.244s))
Average System Time: 0.003s (-0.000s) (Cumulated 000.336s (-00.004s))
MAWK 1.3.4 (LANG=C):
Average Real Time: 2.568s (-0.024s) (Cumulated 256.824s (-2.341s))
Average User Time: 2.546s (-0.038s) (Cumulated 254.640s (-3.712s))
Average System Time: 0.003s (+0.001s) (Cumulated 000.304s (+0.080s))
GAWK 4.1.0 (LANG=C):
Average Real Time: 3.154s (-0.124s) (Cumulated 315.426s (-12.381s))
Average User Time: 3.124s (-0.144s) (Cumulated 312.368s (-14.412s))
Average System Time: 0.006s (+0.003s) (Cumulated 000.608s (+00.260s))
NAWK (LANG=C):
Average Real Time: 7.504s (+0.084s) (Cumulated 750.387s (+8.385s))
Average User Time: 7.464s (+0.055s) (Cumulated 746.396s (+5.456s))
Average System Time: 0.005s (+0.002s) (Cumulated 000.452s (+0.200s))
The cumulated timings are for 100 runs each.
As you can see, setting "LANG=C" does improve the timings for "gawk-4.1.0", BUT it also improves the timings for both "mawk"s.
Both "mawk"s improved, with "mawk 1.3.3" being the one that reduced its timings the most.
The newest one, "mawk 1.3.4", lowered its timings even further: by more than 2 seconds of cumulated "real time" and more than 3 seconds of cumulated "user time".
The latest "gawk" flavor, "gawk-4.1.0", also reduced its timings, but by nowhere near a factor of 10.
The good old "nawk" got worse results than in its previous timings.
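For anyone repeating these runs, here is a minimal sketch of pinning the locale for a single command, assuming a POSIX shell. The wrapper name run_in_c_locale is made up for illustration and is not part of the original tests. It sets LC_ALL in addition to LANG because LC_ALL, when it is already set in the environment, takes precedence over LANG, so LANG=C alone would have no effect in that case.

```shell
#!/bin/sh
# run_in_c_locale: hypothetical helper (not from the original tests).
# Runs the given command with the C locale. LC_ALL is set as well as LANG
# because LC_ALL, when present, overrides LANG.
run_in_c_locale() {
    env LC_ALL=C LANG=C "$@"
}

# Small self-check: the awk child process sees the C locale.
# Prints: C C
run_in_c_locale awk 'BEGIN { print ENVIRON["LANG"], ENVIRON["LC_ALL"] }'
```

With this in place, a timed run would call run_in_c_locale awk -f code.awk instead of plain awk -f code.awk.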
Janis Papanagnou:
>I find it interesting. There already have been such comparisons in the past
>and one result had been that speed also significantly depends on the actual
>features used and the way test implementation was done. So my first comment
>would be a request to post your actual code.
The AWK code I made for the genetic algorithm can be seen here:
http://pastebin.com/LgHkZR82
Please take into account that I'm no AWK guru and I've just finished chapter 2 of the 1988 book "The AWK Programming Language" by Aho, Weinberger and Kernighan. So it's no surprise that the code is a quick and dirty genetic algorithm that I wrote in one afternoon. :-)
>Additional information about how you performed the test runs would also be
>interesting, just to be able to be sure about any "external effects"
The tests were performed using this command in Linux:
for((i = 0; i < 100; i++)); do (time awk -f code.awk > /dev/null) 2>> times.dat; done
I ran the same command for each of the AWKs I used, that is, "mawk 1.3.3", "mawk 1.3.4", "gawk-4.1.0", and "nawk".
Any comments on the advantages/disadvantages of this approach are welcome.
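One possible way to turn times.dat into averages and cumulated values is sketched below. It is an illustration, not necessarily the script used for the numbers above. It assumes bash's default "time" output (lines such as "real 0m3.138s") with a dot as the decimal separator, and the function name summarize_times is made up.

```shell
#!/bin/sh
# summarize_times: hypothetical helper that reads the output of the bash
# "time" keyword (default format) and prints the average and cumulated
# real/user/sys times. LC_ALL=C keeps number handling locale-independent.
summarize_times() {
    LC_ALL=C awk '
    # Convert a value such as "0m3.138s" to seconds.
    function to_seconds(t,    parts) {
        sub(/s$/, "", t)          # drop the trailing "s"
        split(t, parts, "m")      # parts[1] = minutes, parts[2] = seconds
        return parts[1] * 60 + parts[2]
    }
    # Only the three timing lines are counted; anything else is ignored.
    $1 == "real" || $1 == "user" || $1 == "sys" {
        total[$1] += to_seconds($2)
        count[$1]++
    }
    END {
        n = split("real user sys", keys, " ")
        for (i = 1; i <= n; i++) {
            k = keys[i]
            if (count[k] > 0)
                printf "%-4s average %.3fs (cumulated %.3fs over %d runs)\n",
                       k, total[k] / count[k], total[k], count[k]
        }
    }' "$@"
}
```

Calling summarize_times times.dat after the loop prints one line each for real, user and sys. Because unrelated lines are skipped, anything awk itself wrote to stderr during the runs does not disturb the totals.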
>Yeah, it's a pleasure. Though I wonder about your statement WRT being less
>error prone than C++. Not having strict typing and lacking compile time
>checks I always found, e.g., C++ (or any other strictly typed and compiled
>languages) to be better in that respect. [. . .]
I used templates in C++ and all the other "nice" stuff the language has to offer. Sometimes, when a bug/error occurred, the compiler spat out so many pages of errors/warnings/etc. that I didn't know what to do with them. :-)
If you want a quick and simple sense of what I mean by "AWK is *MUCH* easier and less bug-prone than coding in C/C++", see this:
http://bit.ly/1k2sYbR
You're less likely to make mistakes when dealing with 3 lines of code than when dealing with 50.
By the way, thank you very much for the comments! :-)
Best Regards!
Marcelo