
A Simple Run Time Comparison Of AWKs Running A Genetic Algorithm


Marcelo de Brito

Mar 7, 2014, 7:46:46 AM
Hi!

It may not be of interest to most of you, but I did a simple run-time comparison of AWKs (mawk 1.3.3, mawk 1.3.4, gawk 4.1.0, and nawk) running a simple genetic algorithm.

Each simulation/run used a population of 25 strings/chromosomes and 1000 generations/iterations. Timings were taken with the "time" command in Linux. Each AWK ran the same algorithm 100 times.

The genetic algorithm fitness function used was the Rosenbrock Function.
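
For reference, the two-variable Rosenbrock function is usually written f(x, y) = (1 - x)^2 + 100 (y - x^2)^2, with its global minimum f(1, 1) = 0. A minimal sketch in AWK, wrapped in a shell function, might look like this. The names "rosenbrock" and "rosenbrock_at" are made up for illustration, and the constants 1 and 100 are the textbook defaults, not necessarily what the posted code uses.

```shell
# Hypothetical helper: evaluate the 2-D Rosenbrock function at (x, y).
# Assumes the textbook constants a = 1 and b = 100.
rosenbrock_at() {
    awk -v x="$1" -v y="$2" '
        function rosenbrock(x, y) {
            return (1 - x) ^ 2 + 100 * (y - x ^ 2) ^ 2
        }
        BEGIN { printf "%g\n", rosenbrock(x, y) }
    '
}

rosenbrock_at 1 1    # the global minimum, prints 0
rosenbrock_at 0 0    # prints 1
```

A genetic algorithm minimising this function would typically decode each chromosome into an (x, y) pair and treat lower values as fitter.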

I found these results:

MAWK 1.3.3:

Average Real Time:   3.275s (Cumulated: 327.520s for 100 runs)
Average User Time:   3.265s (Cumulated: 326.476s for 100 runs)
Average System Time: 0.003s (Cumulated:   0.340s for 100 runs)

MAWK 1.3.4:

Average Real Time:   2.592s (Cumulated: 259.165s for 100 runs)
Average User Time:   2.584s (Cumulated: 258.352s for 100 runs)
Average System Time: 0.002s (Cumulated:   0.224s for 100 runs)

GAWK 4.1.0:

Average Real Time:   3.278s (Cumulated: 327.807s for 100 runs)
Average User Time:   3.268s (Cumulated: 326.780s for 100 runs)
Average System Time: 0.003s (Cumulated:   0.348s for 100 runs)

NAWK:

Average Real Time:   7.420s (Cumulated: 742.002s for 100 runs)
Average User Time:   7.409s (Cumulated: 740.940s for 100 runs)
Average System Time: 0.003s (Cumulated:   0.252s for 100 runs)

Some obvious observations:

01. The new mawk by Thomas E. Dickey is clearly faster than the other AWKs tested. That really matters if you're dealing with big data sets.

02. The old mawk by Michael D. Brennan, even though it's more than 17 years old (last update in November, 1996), performed as well as the newest gawk flavor (gawk-4.1.0).

03. nawk achieved the poorest results, being almost three times slower than mawk 1.3.4.
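
The ratios behind these observations can be checked with a few lines of AWK. This is only a sketch: "relative_times" is a made-up helper name, and the input is just the average real times listed above.

```shell
# Hypothetical helper: read "name seconds" pairs on stdin and print each
# time relative to the fastest one.
relative_times() {
    awk '
        { name[NR] = $1; t[NR] = $2 }
        NR == 1 || $2 < best { best = $2 }   # remember the fastest time
        END {
            for (i = 1; i <= NR; i++)
                printf "%-12s %.3fs  %.2fx\n", name[i], t[i], t[i] / best
        }
    '
}

# Average real times copied from the results above.
printf '%s\n' 'mawk-1.3.3 3.275' 'mawk-1.3.4 2.592' \
              'gawk-4.1.0 3.278' 'nawk 7.420' | relative_times
```

On these averages nawk comes out at roughly 2.86 times the mawk 1.3.4 time, and mawk 1.3.3 and gawk 4.1.0 both land at about 1.26 times.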

Now, not so obvious observations:

01. Implementing a genetic algorithm in AWK was a pleasure from beginning to end. *MUCH* easier and less bug-prone than coding it in C or C++.

02. AWK can even help the programmer verify the output of his own code. I did that in the terminal many times, for example, to check the string/chromosome length. I just piped the strings/chromosomes the genetic algorithm yielded into AWK and used the "length" function. It worked like a charm!
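
A check of that kind might look like the sketch below. It assumes the chromosomes are printed one per line; "check_length" and the 8-bit sample strings are made up for illustration.

```shell
# Hypothetical check: report every input line whose length differs from
# the expected chromosome length (passed as the first argument).
check_length() {
    awk -v n="$1" '
        length($0) != n { printf "line %d: length %d\n", NR, length($0); bad = 1 }
        END { exit bad + 0 }   # non-zero exit status if any line was wrong
    '
}

# Example with made-up 8-bit strings; the second one is one bit short.
printf '%s\n' 10110010 1011001 | check_length 8 || echo "length mismatch found"
```

The exit status makes the check usable in a script as well as interactively.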

03. The AWK-coded genetic algorithm is barely 280 lines. I used "awka" to translate it into C and the translated code is more than 1000 lines!

04. I'm thinking about coding a full set of artificial intelligence tools in AWK. The ease of coding in AWK and the raw speed of mawk are compelling.

If someone has anything to say, please, let me know! :-)

Best Regards!

Marcelo

admini...@mintywestapps.net

Mar 7, 2014, 11:15:44 AM
On Friday, March 7, 2014 8:46:46 PM UTC+8, Marcelo de Brito wrote:
> It may not be of interest for most of you, but I did a simple run time comparison of AWKs (mawk 1.3.3, mawk 1.3.4, gawk 4.1.0, and nawk) running a simple genetic algorithm.

I vote that:
mawk - is quick! (BUT)

Question.1.
Did you test the following conditions:

a.
http://stackoverflow.com/questions/12896590/gawk-ten-times-slower-with-lang-set-to-utf-8-compared-with-lang-c

"Definitely worth remembering - gawk was 10x slower with LANG set to UTF-8 compared with C"

I'm yet to test this fairly, but have a fair amount of faith in others' test results. I'm a big fan of implementations of 'python/java for date & time', perl for sort (+some), (g)awk for match & actions, which now has some amazing potential via extensions. Java for Beans/Interfaces (+you know) - but gawk4 includes... hmmm.

Just messing with you with this question, but I did test another awk:
1. V8, Nodejs awk running on a ram disk.
(My) Results were: Dog slow in comparison. But!!!! it is asynchronous AND for tens of thousands of pattern/match + actions/results - hmmm - imagination required for GPU adaptations.

2. gawk + mawk - not much in it if configured *fairly* (1 min + x secs on several million rows, tons of maths + matching result-set merges on unsophisticated classic spindle hard-disk infrastructure + RAM disk + compression + ssh privacy & remote filtering/offloading).

"Full credit to Linus for such good thread sharing of CPUs" - on the tests I've conducted:
1. for my limited result sets of 100s of millions of rows x fewer than 132 cols (typical), on simple second-hand PC hardware from a second-hand shop - less than AU$1000.

> If someone has anything to say, please, let me know! :-)

The gawk4 changes have interested me so much that I've regained interest in R&D in my personal time, with FreeBSD as an implementation test harness plus Linux for comparison.

We've enough Linux options in the cloud (now - wow). But I wish Google cloud infrastructure was more affordable for developers to implement "small developer scale" tests, to prove concepts - (Sorry, off thread now - but still in context of mawk vs gawk). Cost drives me to Google Apps + Droplets & VMs + some personal infrastructure.

So much testing to do and my comments are limited - at this time - to:
1. gawk's developments are exciting.
2. mawk is (still) stunning for performance and made a performance statement.

Where to now...

I'll be posting some questions as (g)awk grows. I think Arnold has chosen a fair path for the next 10yrs - a guess (of time).

Janis Papanagnou

Mar 7, 2014, 2:06:17 PM
On 07.03.2014 13:46, Marcelo de Brito wrote:
> Hi!
>
> It may not be of interest for most of you, but I did a simple run time
> comparison of AWKs (mawk 1.3.3, mawk 1.3.4, gawk 4.1.0, and nawk) running a
> simple genetic algorithm.

I find it interesting. There have already been such comparisons in the past,
and one result was that speed also depends significantly on the actual
features used and the way the test implementation was done. So my first
comment would be a request to post your actual code.

>
> Each one of the simulations/runs was made for a population of 25
> strings/chromosomes and 1000 generations/iterations. The timing was taken
> through the "time" command in Linux. Each AWK ran the same algorithm 100
> times.

Additional information about how you performed the test runs would also be
interesting, just to be able to be sure about any "external effects".

>
> The genetic algorithm fitness function used was the Rosenbrock Function.
>
> I found these results:
[snip]

The results look reasonable, and at first glance also seem to match with
similar tests published some years ago (modulo the newer versions of gawk
of course that were not available then).

>
> Some obvious observations:
>
> 01. The new mawk by Thomas E. Dickey is well faster than any other AWK. It
> does really matter if you're dealing with big data sets.
>
> 02. The old mawk by Michael D. Brennan, even though it's almost 18 years
> old (last update in November, 1996), performed as good as the newest gawk
> flavor (gawk-4.1.0).
>
> 03. nawk achieved the poorest results, being almost three times slower than
> mawk 1.3.4.
>
> Now, not so obvious observations:
>
> 01. Implementing a genetic algorithm in AWK was a pleasure from the
> beginning to the end. *MUCH* easier and less bug prone than coding it in C
> or C++.

Yeah, it's a pleasure. Though I wonder about your statement WRT being less
error prone than C++. Since awk has no strict typing and no compile-time
checks, I always found, e.g., C++ (or any other strictly typed and compiled
language) to be better in that respect. In awk you can very quickly write
compact code without much overhead. Though awk's features are very limited
and its expressiveness thus restricted to only very few concepts.

>
> 02. AWK can even help the programmer to verify the output of his own code. I
> did that in the terminal so many times, for example, to check the
> string/chromosome length. I just piped the strings/chromosomes the
> genetic algorithm yielded into AWK and used the "length" function. It
> worked like a charm!

Yep, the powerful one-liners that make awk what it is.

>
> 03. The AWK-coded genetic algorithm is barely 280 lines. I used "awka" to
> translate it into C and the translated code is more than 1000 lines!

The overhead of "normal" languages is not astonishing to me. (Sometimes
people here play golf and reduce even 200-line awk code to a fraction of
that. ;-)

>
> 04. I'm thinking about coding a full AWK set of artificial intelligence
> tools. The easiness of coding in AWK and the raw speed of mawk are
> compelling.

Certainly a welcome contribution.

>
> If someone has anything to say, please, let me know! :-)

Did so, as you wished. :-)

Janis

>
> Best Regards!
>
> Marcelo
>

Marcelo de Brito

Mar 8, 2014, 3:04:22 AM
Hi!

I reran the simulations using the suggested "LANG=C" locale, and these are the results (the values in parentheses indicate how much faster (-) or slower (+) the timings are compared to those in my previous post):

MAWK 1.3.3 (LANG=C):

Average Real Time:   3.138s (-0.137s) (Cumulated: 313.771s (-13.749s))
Average User Time:   3.112s (-0.153s) (Cumulated: 311.232s (-15.244s))
Average System Time: 0.003s (-0.000s) (Cumulated:   0.336s (-0.004s))

MAWK 1.3.4 (LANG=C):

Average Real Time:   2.568s (-0.024s) (Cumulated: 256.824s (-2.341s))
Average User Time:   2.546s (-0.038s) (Cumulated: 254.640s (-3.712s))
Average System Time: 0.003s (+0.001s) (Cumulated:   0.304s (+0.080s))

GAWK 4.1.0 (LANG=C):

Average Real Time:   3.154s (-0.124s) (Cumulated: 315.426s (-12.381s))
Average User Time:   3.124s (-0.144s) (Cumulated: 312.368s (-14.412s))
Average System Time: 0.006s (+0.003s) (Cumulated:   0.608s (+0.260s))

NAWK (LANG=C):

Average Real Time:   7.504s (+0.084s) (Cumulated: 750.387s (+8.385s))
Average User Time:   7.464s (+0.055s) (Cumulated: 746.396s (+5.456s))
Average System Time: 0.005s (+0.002s) (Cumulated:   0.452s (+0.200s))

The cumulated timings are for 100 runs each.

As you can see, setting "LANG=C" does improve the timings for "gawk-4.1.0", but it also improves the timings for both "mawk"s.

Both "mawk"s improved, with "mawk 1.3.3" showing the larger reduction of the two.

The newer "mawk 1.3.4" cut its cumulated timings by more than 2 seconds for "real time" and more than 3 seconds for "user time" over the 100 runs.

The latest "gawk" flavor, "gawk-4.1.0", also reduced its timings, but only by about 4%, nowhere near a 10-fold factor.

The good old "nawk" got slightly worse results than in its previous timings.

Janis Papanagnou:

>I find it interesting. There already have been such comparisons in the past
>and one result had been that speed also significantly depends on the actual
>features used and the way test implementation was done. So my first comment
>would be a request to post your actual code.

The AWK code I made for the genetic algorithm can be seen here: http://pastebin.com/LgHkZR82

Please take into account that I'm no AWK guru and I've just finished chapter 2 of the 1988 book "The AWK Programming Language" by Aho, Weinberger and Kernighan. So it's no surprise the code is a quick and dirty genetic algorithm that I wrote in one afternoon. :-)

>Additional information about how you performed the test runs would also be
>interesting, just to be able to be sure about any "external effects"

The tests were performed using this command in Linux:

for((i = 0; i < 100; i++)); do (time awk -f code.awk > /dev/null) 2>> times.dat; done

I ran the same command for each of the AWKs I used, that is, "mawk 1.3.3", "mawk 1.3.4", "gawk-4.1.0", and "nawk".

Any comments on the advantages/disadvantages of this approach are welcome.
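
For completeness, the averages can be pulled out of times.dat with AWK itself. This is a minimal sketch, not the exact script used: "summarise_times" is a made-up name, and it assumes bash's default "time" layout (lines such as "real", a tab, then "0m3.275s") with a decimal point rather than a comma.

```shell
# Hypothetical summariser for the times.dat produced by the loop above.
# Assumes bash's default "time" layout, e.g. "real<TAB>0m3.275s".
summarise_times() {
    awk '
        $1 == "real" || $1 == "user" || $1 == "sys" {
            split($2, part, "m")                  # part[1] = minutes, part[2] = "3.275s"
            total[$1] += part[1] * 60 + part[2]   # numeric conversion drops the trailing "s"
            count[$1]++
        }
        END {
            n = split("real user sys", key, " ")
            for (i = 1; i <= n; i++) {
                k = key[i]
                if (count[k])
                    printf "%-4s avg %.3fs  total %.3fs  runs %d\n",
                           k, total[k] / count[k], total[k], count[k]
            }
        }
    ' "$@"
}

# Usage: summarise_times times.dat
```

If a different TIMEFORMAT or a locale with a decimal comma is in effect, the parsing step would need adjusting.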

>Yeah, it's a pleasure. Though I wonder about your statement WRT being less
>error prone than C++. Not having strict typing and lacking compile time
>checks I always found, e.g,. C++ (or any other strictly typed and compiled
>languages) to be better in that respect. [. . .]

I used templates in C++ and all the other "nice" stuff the language has to offer. Sometimes, when a bug/error occurred, the compiler produced so many pages of errors/warnings/etc. that I didn't know what to do with them. :-)

If you want a simple and fast sense of what I mean by "AWK is *MUCH* easier and less bug prone than coding in C/C++", see this: http://bit.ly/1k2sYbR

You're less likely to make mistakes when dealing with 3 lines of code than when dealing with 50.

By the way, thank you very much for the comments! :-)

Best Regards!

Marcelo

Janis Papanagnou

Mar 8, 2014, 3:53:01 AM
On 08.03.2014 09:04, Marcelo de Brito wrote:
>
> The AWK code I made for the genetic algorithm can be seen here: http://pastebin.com/LgHkZR82

Is that a bug or only dead code...?

function convert(vet, bits, j, a, x, sum)
{
...
return sum;
sum = 0;
}
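
Either way, the statement after the return can never run, and it isn't needed: the extra parameters in an AWK function are locals that start out empty on every call. As a sketch only (this is a guess at what such a function might do, NOT the pastebin code; "bits_to_int" and the most-significant-bit-first layout are my assumptions):

```shell
# Hypothetical reconstruction, NOT the pastebin code: decode an array of
# 0/1 values (most significant bit first) into an integer.
bits_to_int() {
    awk -v s="$1" '
        # j and sum are locals; AWK resets them to empty on every call,
        # so no "sum = 0" is needed before or after the return.
        function convert(vet, bits,    j, sum) {
            for (j = 1; j <= bits; j++)
                sum = sum * 2 + vet[j]
            return sum + 0
        }
        BEGIN {
            bits = length(s)
            for (i = 1; i <= bits; i++)
                vet[i] = substr(s, i, 1)   # one bit per array element
            print convert(vet, bits)
        }
    '
}

bits_to_int 1011    # prints 11
```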

>
> Please, take into account I'm no AWK guru and I've just finished chapter 2 of the book 1988 "The AWK Programming Language" by Aho, Weinberger and Kernighan. So, it's no surprise the code is a quick and dirty genetic algorithm that I wrote in one afternoon. :-)

Don't worry. :-)

Janis

> [...]

