Awk vs grep (speed test)

Håkon Hægland

Jan 2, 2014, 6:07:57 AM
This question is based on the following question on SO:

http://stackoverflow.com/questions/20879442/awk-vs-grep-in-time

Just for fun I tested this now on my system (Ubuntu 12.04): First generate a large file:

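#! /bin/bash
linelen=80       # bytes per line, including the newline
pos=50           # column at which "New" is inserted
numlines=1000000 # a million lines -> gives 77 MB
bigFileName="file"
# Build one line of x's and splice "New" in at column pos.
str=$(awk -v N="$linelen" -v p="$pos" 'BEGIN{for (i=1;i<(N-3);i++) str=str "x"; printf "%s%s%s\n", substr(str,1,p-1),"New", substr(str,p)}')
# Write that same line numlines times.
awk -v N="$numlines" -v str="$str" 'BEGIN{for (i=1;i<=N;i++) print str}' > "$bigFileName"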

Then run speed test:

#! /bin/bash
echo -n "Grepping.."
time grep -c New file
echo
echo -n "Awk.."
time gawk 'BEGIN{i=0} /New/{i++} END{print i}' file


which gives output:

Grepping..1000000

real 0m1.106s
user 0m1.080s
sys 0m0.016s

Awk..1000000

real 0m0.298s
user 0m0.276s
sys 0m0.016s


This is using GNU Awk 3.1.8. If I use GNU Awk 4.1.0, I get

Awk..1000000

real 0m0.250s
user 0m0.224s
sys 0m0.020s

This indicates that awk is about 3 to 4 times faster than grep. However, the question on SO indicates the opposite (grep is 7 times faster than awk).

Does anybody know more about these things?

Janis Papanagnou

Jan 2, 2014, 9:03:34 AM
On 02.01.2014 12:07, Håkon Hægland wrote:
> This question is based on the following question on SO:
>
> http://stackoverflow.com/questions/20879442/awk-vs-grep-in-time
>
> Just for fun I tested this now on my system (Ubuntu 12.04): First generate a large file:
>
[snip code]
>
> This indicates that awk is about 3 to 4 times faster than grep. However,
> the question on SO indicates the opposite (grep is 7 times faster than
> awk)
>
> Anybody who knows more about these things?

It depends on the awk version. Between old awks and gawk there are
performance differences of an order of magnitude (on average, depending
on the functions performed). For some data see
http://awka.sourceforge.net/compare.html; I think this data is about
15 years old, but in principle it is still valid.

But WRT your posted shell programs and data I get different results...

Grepping..1000000

real 0m0.158s
user 0m0.120s
sys 0m0.040s

Awk..1000000

real 0m0.326s
user 0m0.300s
sys 0m0.030s

...which is more what I expected, i.e., awk being slower than grep.

Janis

Håkon Hægland

Jan 2, 2014, 5:02:03 PM
> But WRT your posted shell programs and data I get different results...
>
> Grepping..1000000
>
> real 0m0.158s
> user 0m0.120s
> sys 0m0.040s
>
> Awk..1000000
>
> real 0m0.326s
> user 0m0.300s
> sys 0m0.030s
>
> ...which is more what I expected, i.e., awk being slower than grep.

Thanks for the reply! What version of awk and grep are you running? On what system?

I ran the test on Ubuntu 12.04 with GNU grep 2.10 and GNU Awk 3.1.8.

Janis Papanagnou

Jan 2, 2014, 5:16:04 PM
On 02.01.2014 23:02, Håkon Hægland wrote:
>
> Thanks for the reply! What version of awk and grep are you running? On what system?
>
> I ran the test on Ubuntu 12.04 with GNU grep 2.10 and GNU Awk 3.1.8.
>

GNU grep 2.5.4
GNU Awk 4.1.0
Linux 2.6.32 (64 bit)

Janis

Janis Papanagnou

Jan 2, 2014, 5:18:34 PM
On 02.01.2014 23:02, Håkon Hægland wrote:
>
> I ran the test on Ubuntu 12.04 with GNU grep 2.10 and GNU Awk 3.1.8.

BTW, gawk 3.1.8 is very old, a lot has changed, and you should consider
updating to version 4.x if you can.

Janis

Aharon Robbins

Jan 2, 2014, 11:39:54 PM
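In article <la4ojq$tq7$3...@news.m-online.net>,
Janis Papanagnou <janis_pa...@hotmail.com> wrote:
> BTW, gawk 3.1.8 is very old, a lot has changed, and you should consider
> updating to version 4.x if you can.

This is true.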

The locale in use can also make a big difference, as well as the
awk version.

And there is a new grep version just released, 2.16, which is likely to
be faster than 2.10.

The default awk on Ubuntu is mawk, which works only with 8-bit characters
and ignores locales. It would not surprise me if it were very fast.

Setting LC_ALL=C in the environment will speed up both grep and gawk.

Also, you should throw away the first run's timings since the system
has to read the file from disk. The 2nd run will be faster since the
file's contents are in the buffer cache.

Timing things isn't as simple as it might look.

And in any case, you're comparing apples to oranges; the programs do
different things, and it's OK if awk is slower at the straight I/O and
pattern matching.
--
Aharon (Arnold) Robbins arnold AT skeeve DOT com
P.O. Box 354 Home Phone: +972 8 979-0381
Nof Ayalon
D.N. Shimshon 9978500 ISRAEL
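
The advice above (fix the locale, throw away the first run, repeat the measurement) can be sketched as a small bash harness. This is only an illustration, not a script anyone in the thread posted: the `bench` function and the `AWK` variable are made-up names, and `file` is assumed to be the output of the generator script from the first post.

```shell
#!/bin/bash
# Sketch only: `bench` and `AWK` are illustrative names, not from the thread.
export LC_ALL=C            # C locale, as suggested above
AWK=${AWK:-awk}            # e.g. AWK=gawk or AWK=mawk to pick an implementation
file=${1:-file}            # assumed: the file written by the generator script

bench() {
    local label="$1"
    local i
    shift
    # Untimed first run: it reads the file into the buffer cache and shows
    # the result, so the counts from the different tools can be compared.
    printf '%s: result=%s\n' "$label" "$("$@")"
    for i in 1 2 3; do
        time "$@" > /dev/null   # bash `time` keyword; the report goes to stderr
    done
}

if [ -r "$file" ]; then
    bench grep grep -c New "$file"
    bench awk "$AWK" '/New/{n++} END{print n+0}' "$file"
fi
```

Running it once with AWK=gawk and once with AWK=mawk (where both are installed) should show how much of the difference comes from the awk implementation rather than from awk as such.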

Janis Papanagnou

Jan 3, 2014, 7:31:35 AM
On 03.01.2014 05:39, Aharon Robbins wrote:
> In article <la4ojq$tq7$3...@news.m-online.net>,
> Janis Papanagnou <janis_pa...@hotmail.com> wrote:
>> On 02.01.2014 23:02, Håkon Hægland wrote:
>>>
>>> I ran the test on Ubuntu 12.04 with GNU grep 2.10 and GNU Awk 3.1.8.
>>
>> BTW, gawk 3.1.8 is very old, a lot has changed, and you should consider
>> updating to version 4.x if you can.
>
> This is true.
>
> The locale in use can also make a big difference, as well as the
> awk version.
>
> And there is a new grep version just released, 2.16, which is likely to
> be faster than 2.10.
>
> The default awk on Ubuntu is mawk, which only works in 8 bit characters
> and ignores locales. It would not surprise me if it were very fast.
>
> Setting LC_ALL=C in the environment will speed up both grep and gawk.
>
> Also, you should throw away the first run's timings since the system
> has to read the file from disk. The 2nd run will be faster since the
> file's contents are in the buffer cache.
>
> Timing things isn't as simple as it might look.
>
> And in any case, you're comparing apples to oranges; the programs do
> different things, and it's OK if awk is slower at the straight I/O and
> pattern matching.

All very true.

I just want to add that the timings I posted already take many of the
issues mentioned into account, specifically buffering and LC_ALL=C.
Adjusting the test so that awk is compared with a plain 'grep' (without
counting) also gives similar results, specifically a clear factor of 2.

Janis
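
The exact commands behind that last comparison were not posted. One way to set up a like-for-like test without counting might look like the sketch below, where both tools simply print the matching lines and the output is discarded. The function names and the `AWK` variable are made up for illustration, and `file` is again assumed to be the generated test file.

```shell
#!/bin/bash
# Sketch only: function names and the AWK variable are illustrative.
export LC_ALL=C
AWK=${AWK:-awk}            # e.g. AWK=gawk to test GNU Awk specifically
file=${1:-file}            # assumed: the file written by the generator script

# Both functions print every line containing "New" and nothing else.
grep_lines() { grep New "$1"; }
awk_lines()  { "$AWK" '/New/' "$1"; }

if [ -r "$file" ]; then
    grep_lines "$file" > /dev/null        # untimed warm-up run
    time grep_lines "$file" > /dev/null
    time awk_lines "$file" > /dev/null
fi
```

Sending the output to /dev/null keeps terminal speed out of the measurement, so the timings reflect only reading and matching.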