did gawk get faster in the last five years?

timm

29 Jan 2016, 19:31:49
used to be a gawk fanatic 5 years ago

but i gave that up when my timing trials showed gawk was ridiculously slower than, say, python's loops

but just today, i tried the following and found that gawk's loops compete with python

specifically, to add 100,000,000 nums, gawk took 10.9 secs and python took 11.0 secs (or 17.8 secs depending on the python code)

anyone know what has changed? anything else been optimized in gawk in the last 5 years?

ADVAthanxNCE

t

p.s. sample code

===========================
# add 100,000,000 nums in gawk
$ cat count.awk
# add 100,000,000 nums in gawk
BEGIN {for(i=1;i<10^8;i++) j+=i; print j}

$ time gawk -f count.awk
49999995000000

real 0m10.968s
user 0m10.891s
sys 0m0.030s

===========================
$ cat count1.py
# add 100,000,000 nums in python
j = 0
for i in xrange(10**8):
    j += i
print j

$ time python count1.py
49999995000000

real 0m11.022s
user 0m10.941s
sys 0m0.036s

$ cat count2.py
# add 100,000,000 nums in python with a while loop
j = i = 0
n=10**8
while i < n:
    j += i
    i += 1
print j

$ time python count2.py
49999995000000

real 0m17.830s
user 0m17.572s
sys 0m0.124s

BartC

30 Jan 2016, 16:44:33
On 30/01/2016 00:31, timm wrote:

> specifically, to add 100,000,000 nums, gawk took 10.9 secs and python took 11.0 secs (or 17.8 secs depending on the python code)

> ===========================
> # add 100,000,000 nums in gawk
> $ cat count.awk
> # add 100,000,000 nums in gawk
> BEGIN {for(i=1;i<10^8;i++) j+=i; print j}
>
> $ time gawk -f count.awk
> 49999995000000

> ===========================
> $ cat count1.py
> # add 100,000,000 nums in python
> j = 0
> for i in xrange(10**8):
>     j += i
> print j
>
> $ time python count1.py
> 49999995000000

That's odd; those results seem to be for sums to 10 million-1 not 100
million-1.
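
For reference, the expected totals follow from the closed form n(n-1)/2 for the sum 0 + 1 + ... + (n-1). Below is a minimal check in Python 3 (the thread's scripts are Python 2; the helper name `sum_below` is made up for this illustration):

```python
def sum_below(n):
    """Sum of the integers 0 .. n-1, using the closed form n*(n-1)/2."""
    return n * (n - 1) // 2

# Cross-check the closed form against a brute-force sum on a small n.
assert sum_below(1000) == sum(range(1000))

print(sum_below(10**7))  # 49999995000000, the value in the quoted output
print(sum_below(10**8))  # 4999999950000000, the value expected for 10^8
```

Starting the loop at 1 instead of 0, as the awk version does, leaves the total unchanged.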

Also Python starts from 0 not 1, but that's not significant. Python is
faster if the code is put inside a function (something to do with local
name lookups being faster).
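
As a rough illustration of the local-name point (a Python 3 sketch; the thread's code is Python 2, and exact opcode names vary between CPython versions): code at module level stores its variables by name in a dictionary, while code inside a function uses indexed local slots. The `dis` module makes the difference visible. `SOURCE`, `sum_in_function`, and `opnames` are names made up for this example.

```python
import dis

# The same loop, once as module-level source and once inside a function.
SOURCE = """
j = 0
for i in range(n):
    j += i
"""

def sum_in_function(n):
    j = 0
    for i in range(n):
        j += i
    return j

def opnames(code):
    """Return the set of bytecode instruction names used by a code object."""
    return {ins.opname for ins in dis.get_instructions(code)}

module_code = compile(SOURCE, "<module-level>", "exec")
module_ops = opnames(module_code)
function_ops = opnames(sum_in_function.__code__)

# Module-level variables go through name-based (dictionary) instructions.
print(sorted(op for op in module_ops if "NAME" in op))
# Function locals go through the "fast" indexed-slot instructions.
print(sorted(op for op in function_ops if "FAST" in op))

# Both variants compute the same total.
namespace = {"n": 1000}
exec(module_code, namespace)
print(namespace["j"] == sum_in_function(1000))
```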

(On my machine, not a fast one, that Python code ran in 31 seconds to
give 4999999950000000. Inside a function, it took 19 seconds. With PyPy,
about 8 or 9 seconds.

Your Awk code took 21 seconds on a GAWK.EXE 3.1.6 from 2008.)

--
Bartc

Janis Papanagnou

30 Jan 2016, 16:52:52
On 30.01.2016 22:44, BartC wrote:
> [...]
> Your Awk code took 21 seconds on a GAWK.EXE 3.1.6 from 2008.)

This is a very old version; meanwhile we have 4.1, and a lot has changed
in gawk in the past eight years. I suggest to get a newer version.

Janis

Hongyi Zhao

30 Jan 2016, 21:03:29
Here, the results are as follows:

$ time awk 'BEGIN {for(i=1;i<10^8;i++) j+=i; print j}'
4999999950000000

real 0m7.697s
user 0m7.696s
sys 0m0.004s
$ awk --version
GNU Awk 4.1.1, API: 1.1 (GNU MPFR 3.1.2-p3, GNU MP 6.0.0)
Copyright (C) 1989, 1991-2014 Free Software Foundation.


--
.: Hongyi Zhao [ hongyi.zhao AT gmail.com ] Free as in Freedom :.

BartC

31 Jan 2016, 10:09:10
I couldn't find a newer ready-built binary for Windows. I tried
something called MAWK, which ran in 10 seconds (GAWK was 21), but gave a
result of 5e+15, so it doesn't work the same way.
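
A possible explanation, not confirmed anywhere in this thread: the sum itself may be identical, and only the output conversion differs. awk's default OFMT is "%.6g", and an implementation that applies it to a large value prints 5e+15, whereas one that prints integral values as integers shows all the digits. The Python 3 sketch below only illustrates the two formats; it does not reproduce MAWK itself.

```python
total = 4999999950000000.0  # the sum, held as a double as awk does

# The value is below 2**53, so a double represents it exactly.
assert total == 4999999950000000 and total < 2**53

print("%.6g" % total)  # six significant digits -> 5e+15
print("%d" % total)    # printed as an integer  -> 4999999950000000
```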

--
bartc

Andrew Schorr

31 Jan 2016, 10:09:51
These tests were all done on a Fedora 21 linux system:

Using gawk-3.1.8 from May, 2010:

bash-4.3$ time ./gawk 'BEGIN {for(i=1;i<10^8;i++) j+=i; print j}'
4999999950000000

real 0m10.795s
user 0m10.798s
sys 0m0.000s

Using gawk-4.1.3:

bash-4.3$ time ./gawk 'BEGIN {for(i=1;i<10^8;i++) j+=i; print j}'
4999999950000000

real 0m12.367s
user 0m12.369s
sys 0m0.000s

Using current gawk master branch:

bash-4.3$ time ./gawk 'BEGIN {for(i=1;i<10^8;i++) j+=i; print j}'
4999999950000000

real 0m9.275s
user 0m9.278s
sys 0m0.000s

So I don't see any huge improvement.

Regards,
Andy

Luuk

31 Jan 2016, 10:10:53
31 compared to 8 or 9 seems odd.....

luuk@opensuse:~/tmp> cat count.py
# add 100,000,000 nums in python
j = 0
for i in xrange(10**8):
    j += i
print j

luuk@opensuse:~/tmp> cat count2.py
# add 100,000,000 nums in python
def count2():
    j = 0
    for i in xrange(10**8):
        j += i
    return j
print count2()

luuk@opensuse:~/tmp> time python count.py
4999999950000000

real 0m15.492s
user 0m15.484s
sys 0m0.000s
luuk@opensuse:~/tmp> time python count2.py
4999999950000000

real 0m6.022s
user 0m6.016s
sys 0m0.000s
luuk@opensuse:~/tmp> time gawk -f count.awk
4999999950000000

real 0m14.729s
user 0m14.720s
sys 0m0.000s
luuk@opensuse:~/tmp> cat count.awk
# add 100,000,000 nums in gawk
BEGIN {for(i=1;i<10^8;i++) j+=i; print j}

luuk@opensuse:~/tmp>

gawk 4.3.1
python 2.7.6

Kenny McCormack

31 Jan 2016, 10:19:43
I'm sure there are repos out there that maintain downloadable up-to-date
pre-compiled versions of GAWK for Windows, but, TBH, I couldn't find any in
a quick bit of Googling.

My recommendation to you, which I suspect you won't like, since you don't
seem to like to take initiative on these sorts of things, is to install
Cygwin and then to get the latest (4.1.3) sources from ftp.gnu.org and
compile it yourself. I know some people don't like Cygwin, for various
religious and/or dogmatic reasons, but I've always found that compiling
with Cygwin gives the best results for Gawk (and other pure-Unix-y things
as well) on Windows.

--
One of the best lines I've heard lately:

Obama could cure cancer tomorrow, and the Republicans would be
complaining that he had ruined the pharmaceutical business.

(Heard on Stephanie Miller = but the sad thing is that there is an awful lot
of direct truth in it. We've constructed an economy in which eliminating
cancer would be a horrible disaster. There are many other such examples.)

Luuk

31 Jan 2016, 10:31:45
On 31-01-16 16:19, Kenny McCormack wrote:
> In article <n8l7t9$m5l$1...@dont-email.me>, BartC <b...@freeuk.com> wrote:
>> On 30/01/2016 21:52, Janis Papanagnou wrote:
>>> On 30.01.2016 22:44, BartC wrote:
>>>> [...]
>>>> Your Awk code took 21 seconds on a GAWK.EXE 3.1.6 from 2008.)
>>>
>>> This is a very old version; meanwhile we have 4.1, and a lot has changed
>>> in gawk in the past eight years. I suggest to get a newer version.
>>
>> I couldn't find a newer ready-built binary for Windows. I tried
>> something called MAWK, which ran in 10 seconds (GAWK was 21), but gave a
>> result of 5e+15, so it doesn't work the same way.
>
> I'm sure there are repos out there that maintain downloadable up-to-date
> pre-compiled versions of GAWK for Windows, but, TBH, I couldn't find any in
> a quick bit of Googling.
>

http://sourceforge.net/projects/ezwinports/files/?source=navbar

it currently has 4.1.3 (binaries)

Janis Papanagnou

31 Jan 2016, 11:38:32
On 31.01.2016 03:03, Hongyi Zhao wrote:
> On Sat, 30 Jan 2016 22:52:50 +0100, Janis Papanagnou wrote:
>
>> On 30.01.2016 22:44, BartC wrote:
>>> [...]
>>> Your Awk code took 21 seconds on a GAWK.EXE 3.1.6 from 2008.)
>>
>> This is a very old version; meanwhile we have 4.1, and a lot has changed
>> in gawk in the past eight years. I suggest to get a newer version.
>
> Here, the results are as follows:
>
> $ time awk 'BEGIN {for(i=1;i<10^8;i++) j+=i; print j}'
> 4999999950000000
>
> real 0m7.697s
> user 0m7.696s
> sys 0m0.004s
> $ awk --version
> GNU Awk 4.1.1, API: 1.1 (GNU MPFR 3.1.2-p3, GNU MP 6.0.0)
> Copyright (C) 1989, 1991-2014 Free Software Foundation.

On my system the python code needs around 12sec and the awk code around 10sec
on average.

Interestingly modifying the awk code to

BEGIN {z=10^8; for(i=1;i<z;i++) j+=i; print j}

will gain another 1+ sec on average. This surprised me a bit; I thought that
gawk would optimize constant expressions.
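
For comparison only (this shows CPython, not gawk): constant folding means the compiler evaluates an expression such as `10**8` once, ahead of time, rather than on every pass through the loop test. Below is a small Python 3 sketch that inspects the compiled code; the variable names are illustrative.

```python
# Compile the loop condition and look at the constants the compiler stored.
condition = compile("i < 10**8", "<condition>", "eval")

# If the power was folded, its result is already present as a constant,
# so nothing has to be recomputed each time the condition is tested.
print(100000000 in condition.co_consts)

# Hoisting by hand, as in the awk rewrite above, gives the same answer
# regardless of what the compiler does.
limit = 10**8
print(eval(condition, {"i": 5}) == (5 < limit))
```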

Janis

pop

31 Jan 2016, 12:44:02
Kenny McCormack wrote on 1/31/2016 9:19 AM:
> In article <n8l7t9$m5l$1...@dont-email.me>, BartC <b...@freeuk.com> wrote:
>> On 30/01/2016 21:52, Janis Papanagnou wrote:
>>> On 30.01.2016 22:44, BartC wrote:
>>>> [...]
>>>> Your Awk code took 21 seconds on a GAWK.EXE 3.1.6 from 2008.)
>>>
>>> This is a very old version; meanwhile we have 4.1, and a lot has changed
>>> in gawk in the past eight years. I suggest to get a newer version.
>>
>> I couldn't find a newer ready-built binary for Windows. I tried
>> something called MAWK, which ran in 10 seconds (GAWK was 21), but gave a
>> result of 5e+15, so it doesn't work the same way.
>
> I'm sure there are repos out there that maintain downloadable up-to-date
> pre-compiled versions of GAWK for Windows, but, TBH, I couldn't find any in
> a quick bit of Googling.

The best windows version I use is at:
<http://www.klabaster.com/freeware.htm>
it is a single .exe and doesn't need the libs that the sourceforge
version requires.

BartC

31 Jan 2016, 15:41:27
On 31/01/2016 15:31, Luuk wrote:
> On 31-01-16 16:19, Kenny McCormack wrote:
>> In article <n8l7t9$m5l$1...@dont-email.me>, BartC <b...@freeuk.com> wrote:
>>> On 30/01/2016 21:52, Janis Papanagnou wrote:
>>>> On 30.01.2016 22:44, BartC wrote:
>>>>> [...]
>>>>> Your Awk code took 21 seconds on a GAWK.EXE 3.1.6 from 2008.)
>>>>
>>>> This is a very old version; meanwhile we have 4.1, and a lot has changed
>>>> in gawk in the past eight years. I suggest to get a newer version.
>>>
>>> I couldn't find a newer ready-built binary for Windows. I tried
>>> something called MAWK, which ran in 10 seconds (GAWK was 21), but gave a
>>> result of 5e+15, so it doesn't work the same way.
>>
>> I'm sure there are repos out there that maintain downloadable up-to-date
>> pre-compiled versions of GAWK for Windows, but, TBH, I couldn't find any in
>> a quick bit of Googling.
>>
>
> http://sourceforge.net/projects/ezwinports/files/?source=navbar
>
> it currently has 4.1.3 (binaries)

Thanks. That one takes 14 or 15 seconds.

--
Bartc

Hongyi Zhao

31 Jan 2016, 20:02:18
On Sun, 31 Jan 2016 17:38:30 +0100, Janis Papanagnou wrote:

> Interestingly modifying the awk code to
>
> BEGIN {z=10^8; for(i=1;i<z;i++) j+=i; print j}
>
> will gain another 1+ sec on average.

I also verified this on my Debian Jessie box:

$ time awk 'BEGIN {z=10^8; for(i=1;i<z;i++) j+=i; print j}'
4999999950000000

real 0m6.504s
user 0m6.504s
sys 0m0.004s

Regards

Luuk

1 Feb 2016, 14:26:50, to Hongyi Zhao
On 01-02-16 02:02, Hongyi Zhao wrote:
> On Sun, 31 Jan 2016 17:38:30 +0100, Janis Papanagnou wrote:
>
>> Interestingly modifying the awk code to
>>
>> BEGIN {z=10^8; for(i=1;i<z;i++) j+=i; print j}
>>
>> will gain another 1+ sec on average.
>
> I also verified this on my Debian Jessie box:
>
> $ time awk 'BEGIN {z=10^8; for(i=1;i<z;i++) j+=i; print j}'
> 4999999950000000
>
> real 0m6.504s
> user 0m6.504s
> sys 0m0.004s
>
> Regards
>

This does not say anything......

If i tell you i can run from my living room to my kitchen in 10 minutes,
you might think i have a big house.

But maybe it's just /me running (very) slow....

;)
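
One way to make such numbers comparable is to time both variants on the same machine and report the ratio rather than the absolute seconds. Below is a rough Python 3 sketch using the standard `timeit` module; the function names and the reduced loop count `N` are choices made for this example, not part of the thread's tests.

```python
import timeit

N = 10**5  # much smaller than 10^8 so the sketch finishes quickly

def for_loop():
    j = 0
    for i in range(N):
        j += i
    return j

def while_loop():
    j = i = 0
    while i < N:
        j += i
        i += 1
    return j

def best_time(func, repeat=5):
    """Run func several times and keep the fastest run,
    which is the one least disturbed by other load on the machine."""
    return min(timeit.repeat(func, number=1, repeat=repeat))

t_for, t_while = best_time(for_loop), best_time(while_loop)
print("for:   %.4f s" % t_for)
print("while: %.4f s" % t_while)
print("while/for ratio: %.2f" % (t_while / t_for))
```

The ratio is what carries over between machines; the absolute figures do not.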

Aharon Robbins

1 Feb 2016, 15:56:10
Hi Tim. Long time no see. Welcome back to gawk.

In article <2fece733-7976-46c0...@googlegroups.com>,
timm <tim.m...@gmail.com> wrote:
>but just today, i tried the following and found that gawk's loops
>compete with python
> ...
>anyone know what has changed? anything else been optimized in gawk in
>the last 5 years?

Gawk 4.0 came with a new execution engine based on byte codes instead
of recursively evaluating the parse tree. This also brings with it
a debugger in the style of GDB, which I find useful.
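
To make the distinction concrete, here is a toy sketch in Python 3. It is not gawk's implementation, and the tuple-based tree and the instruction names (`PUSH`, `ADD`, `MUL`) are invented for the example. The first function evaluates an expression by recursing over its parse tree; the other two flatten the tree once into a linear instruction list and then run that list on a small stack machine.

```python
# Expressions are nested tuples: ("num", 3), ("add", left, right), ("mul", left, right).

def eval_tree(node):
    """Evaluate by recursively walking the parse tree."""
    kind = node[0]
    if kind == "num":
        return node[1]
    left, right = eval_tree(node[1]), eval_tree(node[2])
    return left + right if kind == "add" else left * right

def compile_tree(node, code=None):
    """Flatten the tree once into a linear list of stack-machine instructions."""
    if code is None:
        code = []
    kind = node[0]
    if kind == "num":
        code.append(("PUSH", node[1]))
    else:
        compile_tree(node[1], code)
        compile_tree(node[2], code)
        code.append(("ADD" if kind == "add" else "MUL", None))
    return code

def run_bytecode(code):
    """Execute the flat instruction list with an explicit operand stack."""
    stack = []
    for op, arg in code:
        if op == "PUSH":
            stack.append(arg)
        else:
            right, left = stack.pop(), stack.pop()
            stack.append(left + right if op == "ADD" else left * right)
    return stack.pop()

expr = ("add", ("num", 2), ("mul", ("num", 3), ("num", 4)))  # 2 + 3 * 4
program = compile_tree(expr)
print(program)
print(eval_tree(expr), run_bytecode(program))  # 14 14
```

In a loop, the tree walker repeats the recursion on every iteration, while the flattened program is produced once and then executed by a simple dispatch loop.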

The -O option causes gawk to do constant folding. Here are my
timings:

$ time gawk-3.1.8 'BEGIN {for(i=1;i<10^8;i++) j+=i; print j}'
4999999950000000

real 0m12.094s
user 0m12.048s
sys 0m0.000s

$ time gawk-3.1.8 'BEGIN {k=10^8; for(i=1;i<k;i++) j+=i; print j}'
4999999950000000

real 0m10.484s
user 0m10.477s
sys 0m0.003s

$ time gawk-4.1.3 'BEGIN {for(i=1;i<10^8;i++) j+=i; print j}'
4999999950000000

real 0m11.924s
user 0m11.913s
sys 0m0.004s

$ time gawk-4.1.3 -O 'BEGIN {for(i=1;i<10^8;i++) j+=i; print j}'
4999999950000000

real 0m9.109s
user 0m9.103s
sys 0m0.003s

We will probably make -O the default in the next major release (4.2
or 5.0; not sure what I'll call it).

Arnold
--
Aharon (Arnold) Robbins arnold AT skeeve DOT com

Aharon Robbins

1 Feb 2016, 15:59:05
In article <56ae28e0$0$23778$e4fe...@news.xs4all.nl>,
Luuk <lu...@invalid.lan> wrote:
>http://sourceforge.net/projects/ezwinports/files/?source=navbar
>
>it currently has 4.1.3 (binaries)

This is the Windows port that I recommend. Eli Zaretskii, who
produces these binaries, is one of the gawk developers.

Luuk

2 Feb 2016, 14:18:24
On 01-02-16 21:56, Aharon Robbins wrote:
> In article <56ae28e0$0$23778$e4fe...@news.xs4all.nl>,
> Luuk <lu...@invalid.lan> wrote:
>> http://sourceforge.net/projects/ezwinports/files/?source=navbar
>>
>> it currently has 4.1.3 (binaries)
>
> This is the Windows port that I recommend. Eli Zaretskii, who
> produces these binaries, is one of the gawk developers.
>

I did not know that!
(it was just the first hit i found....)
Thanks for this extra info

Luuk

2 Feb 2016, 14:27:29
On 01-02-16 21:53, Aharon Robbins wrote:

>
> We will probably make -O the default in the next major release (4.2
> or 5.0; not sure what I'll call it).
>
> Arnold
>

A short test shows that this might not be a good idea for Windows users ;)

C:\temp\gawk\bin>(
More? echo %TIME%
More? gawk -O "BEGIN {for(i=1;i<10^8;i++) j+=i; print j}"
More? echo %TIME%
More? )
20:21:48.48
4999999950000000
20:21:56.00

C:\temp\gawk\bin>(
More? echo %TIME%
More? gawk -O "BEGIN {z=10^8;for(i=1;i<z;i++) j+=i; print j}"
More? echo %TIME%
More? )
20:22:14.98
4999999950000000
20:22:39.78

C:\temp\gawk\bin>(
More? echo %TIME%
More? gawk "BEGIN {z=10^8;for(i=1;i<z;i++) j+=i; print j}"
More? echo %TIME%
More? )
20:23:19.92
4999999950000000
20:23:25.87

C:\temp\gawk\bin>gaw --version
'gaw' is not recognized as an internal or external command,
operable program or batch file.

C:\temp\gawk\bin>gawk --version
GNU Awk 4.1.3, API: 1.1 (GNU MPFR 3.1.0-p8, GNU MP 5.0.2)
Copyright (C) 1989, 1991-2015 Free Software Foundation.

This program is free software; you can redistribute it

C:\temp\gawk\bin>ver

Microsoft Windows [Version 10.0.10586]
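
The elapsed times implied by the %TIME% stamps in the transcript above can be worked out as follows. This is a small Python 3 sketch; `elapsed_seconds` and the run labels are made up for this example, and it assumes both stamps fall on the same day (no midnight rollover).

```python
from datetime import datetime

def elapsed_seconds(start, end):
    """Seconds between two cmd.exe %TIME% stamps (HH:MM:SS.cc) on the same day."""
    fmt = "%H:%M:%S.%f"
    delta = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    return delta.total_seconds()

runs = [
    ("-O, 10^8 in the loop test", "20:21:48.48", "20:21:56.00"),
    ("-O, z=10^8 hoisted", "20:22:14.98", "20:22:39.78"),
    ("no -O, z=10^8 hoisted", "20:23:19.92", "20:23:25.87"),
]
for label, start, end in runs:
    print("%-26s %.2f s" % (label, elapsed_seconds(start, end)))
```

That gives 7.52 s, 24.80 s, and 5.95 s respectively.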

Marc de Bourget

3 Feb 2016, 16:12:42
With Mathematics GAWK seems to be really good! However, this is only a small topic if you want to compare speed between GAWK and Python. There are a lot of other topics like reading from files and pattern matching speed.