Performance of string functions

Vladimir Sibirov

unread,
Mar 13, 2011, 12:20:15 PM3/13/11
to golan...@googlegroups.com
I wrote a simple benchmark for goroutines used in asynchronous input processing, but then I decided to compare it to the same program without goroutines (just serial input). The results of that comparison might be interesting on their own (I'm going to continue it another time), but then I decided to write a similar program in Python. I compared a simple Go program that counts words in a file to a literal conversion of it in Python, and to my surprise the results were roughly these:

wordcount_serial.go (http://dl.dropbox.com/u/15014241/Go/wordcount_serial.go) ~ 1.1 sec
wordcount_serial.py (http://dl.dropbox.com/u/15014241/Go/wordcount_serial.py) ~ 0.54 sec

Both programs take a file path as their only argument. To test them I used this file, which is big enough: http://dl.dropbox.com/u/15014241/Go/bible.txt.gz (un-gzipped, of course).

Then I ran a dozen experiments and found that the bottleneck in the Go program is this line:
strings.Trim(word, cutSet) != ""

If I replace it with just word != "", the Go program takes only ~0.12 sec. So I tried alternative Trim() implementations. With the callbacks replaced by conventional loops I got something like ~0.9 sec, which was still too slow. Then I thought maybe string slicing was slow and, since the file is in Latin characters, tried converting the strings to []byte to operate on byte arrays directly. That was no faster than string slices, so I got stuck at this point. Another interesting fact is that strings.Split() does not hurt performance nearly as much.
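For reference, the loop-based variant looked roughly like this (a reconstructed sketch assuming an ASCII-only cut set, not the exact code from the experiment):

func trimLoop(s, cutset string) string {
	i, j := 0, len(s)
	for i < j && isCut(s[i], cutset) {
		i++
	}
	for j > i && isCut(s[j-1], cutset) {
		j--
	}
	return s[i:j] // re-slice only; no copying
}

func isCut(c byte, cutset string) bool {
	for k := 0; k < len(cutset); k++ {
		if cutset[k] == c {
			return true
		}
	}
	return false
}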

If anybody has solid ideas about this behavior of the string functions, it would be nice to hear them. The cause of this strings problem may also have an impact on Go's regexp performance problems (see for example http://shootout.alioth.debian.org/u64q/benchmark.php?test=all&lang=go&lang2=python3). I think nobody doubts that text-processing speed is important in the modern programming world.

John Asmuth

unread,
Mar 13, 2011, 1:02:55 PM3/13/11
to golang-nuts
One thing to keep in mind is that many of Python's commonly used
libraries and functions are written in C.


⚛

unread,
Mar 13, 2011, 1:12:10 PM3/13/11
to Vladimir Sibirov, golang-nuts
I was able to speed up the Go code by a factor of 2.6. See the attached diffs.

> I think nobody doubts that text processing speed is important in the modern 
> programming world.

I *do* doubt that. The "modern programming world", as you call it, is so full of inefficient code that it sometimes frightens me.
wordcount_serial1.go.diff
wordcount_serial2.go.diff

Brad Fitzpatrick

unread,
Mar 13, 2011, 1:35:27 PM3/13/11
to ⚛, Vladimir Sibirov, golang-nuts
And you get another 35% speed-up by not doing the hash lookup:

var cutSet_map = make([]bool, 256)

func init() {
	// Precompute a byte-indexed lookup table from the cut set.
	for _, rune := range cutSet {
		cutSet_map[rune] = true
	}
}

func cutSet_func(rune int) bool {
	return rune < 256 && cutSet_map[rune]
}
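Presumably this gets wired into the trim along these lines (my assumption of the call site; the actual changes are in the attached diffs):

strings.TrimFunc(word, cutSet_func) != ""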

Dmitry Chestnykh

unread,
Mar 13, 2011, 1:41:49 PM3/13/11
to golan...@googlegroups.com, ⚛, Vladimir Sibirov
And another speedup by using strings.FieldsFunc ;-)

func countWords(line string) int {
	return len(strings.FieldsFunc(line, cutSet_func))
}


-Dmitry

Dmitry Chestnykh

unread,
Mar 13, 2011, 1:46:25 PM3/13/11
to golan...@googlegroups.com, ⚛, Vladimir Sibirov
...though if you're seriously writing a wordcount function, you'd better not allocate useless storage for the words (FieldsFunc builds a slice of all the words just so we can take its length).
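A minimal sketch of that idea — count word boundaries directly instead of materializing the words (assuming the same cutSet_func delimiter test as above):

func countWords(line string) int {
	n := 0
	inWord := false
	for _, r := range line {
		if cutSet_func(r) {
			inWord = false // a delimiter ends any current word
		} else if !inWord {
			inWord = true // first rune of a new word
			n++
		}
	}
	return n
}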

-Dmitry

Vladimir Sibirov

unread,
Mar 13, 2011, 3:17:52 PM3/13/11
to golan...@googlegroups.com
Thank you for the replies, guys! So we ended up with a lookup-table optimization for my Trim call (the latest version takes ~0.335 sec vs. ~0.41 sec for Python, which, as we all know, uses a C implementation of its string functions) and with proposals for doing the word count a different way. What I actually asked is why the Trim() function is so slow and whether some improvements could be made to the strings package.

The original program is not focused on word counting anyway; it counts the words of each line of the file in a separate goroutine, as a trivial MapReduce-like example. The current code can be found here: http://dl.dropbox.com/u/15014241/Go/wordcount.go
Like my previous Ring benchmark, this one also takes a -p parameter for GOMAXPROCS. Here are some results, if anybody is interested:
wordcount_serial bible.txt ~ 0.334703 seconds
wordcount -p=1 bible.txt ~ 0.387062 seconds
wordcount -p=2 bible.txt ~ 0.288115 seconds
wordcount -p=3 bible.txt ~ 0.245873 seconds
wordcount -p=4 bible.txt ~ 0.253397 seconds
(Performed on a dual-core Intel processor under 64-bit openSUSE 11.4)


⚛

unread,
Mar 14, 2011, 10:02:33 AM3/14/11
to golang-nuts
On Mar 13, 7:17 pm, Vladimir Sibirov <trustmas...@kodigy.com> wrote:
> What I actually asked is why Trim()
> function is so slow and whether some improvements to strings package could
> be made.

The reasons why the original program was slow compared to the Python
version:

- Go's "strings.Trim" allocates some memory each time it is called (a micro-benchmark sketch follows the list)
- the Go compiler does not produce well-optimized code; it is unable to inline functions
- Go's type system prevents the programmer from loading 4 adjacent bytes (from a "[]byte") into a single 32-bit register with a single CPU instruction, which has the unfortunate consequence that highly optimized C string-processing code will always be substantially faster (e.g. 2x faster) than highly optimized Go string-processing code
- the method "strip" in Python2 and Python3 is implemented in C
- Python2 uses "memchr()" from glibc
- Python3 uses a kind of "bloom mask" to reduce the complexity by a linear factor; this "bloom filter" has a hit ratio of something like 50% in the case of "bible.txt"
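
For anyone who wants to verify the first claim, here is a minimal micro-benchmark sketch (a hypothetical wordcount_test.go, run with gotest; the sample word and cut set are made-up values, not taken from the original program):

package main

import (
	"strings"
	"testing"
)

const sampleWord = "...hello,;"
const sampleCutSet = " \t\n.,;:!?"

// Times the stock strings.Trim call that the original program relied on.
func BenchmarkStringsTrim(b *testing.B) {
	for i := 0; i < b.N; i++ {
		strings.Trim(sampleWord, sampleCutSet)
	}
}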

Serge Hulne

unread,
Mar 14, 2011, 4:51:53 PM3/14/11
to golang-nuts
>
> it on a file which is big enough: http://dl.dropbox.com/u/15014241/Go/bible.txt.gz (un-gzipped of course).
>

Two questions:

1) I tried this example, but it seems to me that (even when gunzipped) the file bible.txt is not a plain-text file. Is this correct?

2) I tried the example (the one with the goroutines) on a plain-text file, namely http://www.gutenberg.org/cache/epub/100/pg100.txt (Shakespeare's complete works in plain text, UTF-8 encoded).

Oddly enough, when run on that plain-text file (pg100.txt), the higher the value of the -p parameter, the longer the file takes to process.

Here is what I obtained (on a 64-bit dual-core Pentium running Ubuntu 10.04 and the latest version of Go):

serge2@serge2-desktop:~/development/workspace/go-src/word_count/bin$ time ./wc_goroutines -p=1 ../../../corpora/shakes.txt
Done: 900991 words in 0.875275 seconds

real 0m0.878s
user 0m0.850s
sys 0m0.020s
serge2@serge2-desktop:~/development/workspace/go-src/word_count/bin$ time ./wc_goroutines -p=2 ../../../corpora/shakes.txt
Done: 900991 words in 0.915319 seconds

real 0m0.918s
user 0m0.880s
sys 0m0.030s
serge2@serge2-desktop:~/development/workspace/go-src/word_count/bin$ time ./wc_goroutines -p=3 ../../../corpora/shakes.txt
Done: 900991 words in 2.201452 seconds

real 0m2.205s
user 0m1.070s
sys 0m0.950s
serge2@serge2-desktop:~/development/workspace/go-src/word_count/bin$

It does not seem to match your results.

I wonder why these results differ?

Serge.

Vladimir Sibirov

unread,
Mar 14, 2011, 7:23:08 PM3/14/11
to golang-nuts
1) The file is in UTF-8 with UNIX line endings. Make sure you used the
correct link, without the part that Gmail added after ".gz":
http://dl.dropbox.com/u/15014241/Go/bible.txt.gz

2) I ran it on the Shakespeare example provided and got this output:

trustmaster@linux-hple:~/workspace/wordcount/bin/linux_amd64> time ./wordcount ~/pg100.txt
Done: 900991 words in 0.672628 seconds

real 0m0.675s
user 0m0.664s
sys 0m0.016s
trustmaster@linux-hple:~/workspace/wordcount/bin/linux_amd64> time ./wordcount -p=2 ~/pg100.txt
Done: 900991 words in 0.583819 seconds

real 0m0.586s
user 0m0.744s
sys 0m0.148s
trustmaster@linux-hple:~/workspace/wordcount/bin/linux_amd64> time ./wordcount -p=3 ~/pg100.txt
Done: 900991 words in 0.547963 seconds

real 0m0.551s
user 0m0.752s
sys 0m0.272s
trustmaster@linux-hple:~/workspace/wordcount/bin/linux_amd64> time ./wordcount -p=4 ~/pg100.txt
Done: 900991 words in 0.595652 seconds

real 0m0.599s
user 0m0.748s
sys 0m0.364s

And I get similar numbers if I run them again. I can't see any obvious
reason for such a difference in behavior, except perhaps the kernel
(2.6.37.1). It would be great to see some results from other machines
too; unfortunately, this is the only one I have here until the end of
the week.


Dmitry Chestnykh

unread,
Mar 14, 2011, 7:28:42 PM3/14/11
to golan...@googlegroups.com
I get similar results to yours, but if I set a buffer size for the channel,

partCounts := make(chan int, 100)

the GOMAXPROCS=2 version runs faster for me on a Core 2 Duo (and -p=3 runs much slower):

$ ./wordcount -p=1 pg100.txt 
Done: 900991 words in 0.580931 seconds

$ ./wordcount -p=2 pg100.txt 
Done: 900991 words in 0.442415 seconds

Compared to non-buffered version:

$ ./wordcount -p=1 pg100.txt 
Done: 900991 words in 0.639964 seconds
$ ./wordcount -p=2 pg100.txt 
Done: 900991 words in 1.172734 seconds
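
For context, the pattern being tuned presumably looks something like this (a hypothetical sketch, not the actual wordcount.go; countWords is the per-line counter from the earlier posts):

// One goroutine per line sends its count into partCounts; the reducer
// receives exactly one result per line. The buffer lets senders finish
// without waiting for the reducer to catch up.
func countAll(lines []string) int {
	partCounts := make(chan int, 100)
	for _, line := range lines {
		go func(l string) { partCounts <- countWords(l) }(line)
	}
	total := 0
	for i := 0; i < len(lines); i++ {
		total += <-partCounts
	}
	return total
}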

-Dmitry

Dmitry Chestnykh

unread,
Mar 14, 2011, 7:29:49 PM3/14/11
to golan...@googlegroups.com
Forgot to mention that I'm on darwin-amd64.

Dave Wells

unread,
Mar 14, 2011, 8:57:54 PM3/14/11
to golang-nuts
Vladimir, thanks for the interesting post. Here are results from a
netbook.

/* Results on Acer Aspire One, N270 Atom, Ubuntu 9.10
wordcount_serial bible.txt in 1.451830 seconds
with the serial2 patch applied
wordcount -p 1 bible.txt in 2.102102 seconds
wordcount -p 2 bible.txt in 1.314629 seconds
*/

andrey mirtchovski

unread,
Mar 14, 2011, 9:24:57 PM3/14/11
to golan...@googlegroups.com
i don't know the purpose of this exercise, but i'll play. the
following is a run on a 12-cpu, 24-core node running centos. there was
too much noise with the regular bible, so i had to expand it a bit:
bible2 is just bible repeated 10 times.

$ ls -l bible*
-rw-rw-r-- 1 a a 44044450 Mar 14 17:09 bible2.txt
-rw-rw-r-- 1 a a 4404445 Mar 14 17:08 bible.txt
$ for i in `seq 1 24`; do echo -n "$i: "; ./wordcount -p $i bible.txt; done
1: Done: 820742 words in 0.458876 seconds
2: Done: 820742 words in 0.334103 seconds
3: Done: 820742 words in 0.267302 seconds
4: Done: 820742 words in 0.241759 seconds
5: Done: 820722 words in 0.224769 seconds
6: Done: 820742 words in 0.293846 seconds
7: Done: 820742 words in 0.365714 seconds
8: Done: 820742 words in 0.396215 seconds
9: Done: 820742 words in 0.340283 seconds
10: Done: 820742 words in 0.338947 seconds
11: Done: 820702 words in 0.430823 seconds
12: Done: 820742 words in 0.327284 seconds
13: Done: 820742 words in 0.483451 seconds
14: Done: 820742 words in 0.265444 seconds
15: Done: 820715 words in 0.337210 seconds
16: Done: 820742 words in 0.531193 seconds
17: Done: 820742 words in 0.457809 seconds
18: Done: 820742 words in 0.411667 seconds
19: Done: 820729 words in 0.302328 seconds
20: Done: 820697 words in 0.383019 seconds
21: Done: 820703 words in 0.534184 seconds
22: Done: 820742 words in 0.497216 seconds
23: Done: 820697 words in 0.342722 seconds
24: Done: 820715 words in 0.391646 seconds
$ for i in `seq 1 24`; do echo -n "$i: "; ./wordcount -p $i bible2.txt; done
1: Done: 8207420 words in 4.828265 seconds
2: Done: 8207420 words in 2.922690 seconds
3: Done: 8207420 words in 2.367519 seconds
4: Done: 8207386 words in 2.275688 seconds
5: Done: 8207420 words in 2.024210 seconds
6: Done: 8207420 words in 2.289805 seconds
7: Done: 8207420 words in 2.894400 seconds
8: Done: 8207420 words in 3.827817 seconds
9: Done: 8207407 words in 2.925551 seconds
10: Done: 8207420 words in 3.114902 seconds
11: Done: 8207379 words in 3.028556 seconds
12: Done: 8207407 words in 2.844268 seconds
13: Done: 8207420 words in 2.995080 seconds
14: Done: 8207420 words in 2.854469 seconds
15: Done: 8207420 words in 2.573538 seconds
16: Done: 8207407 words in 3.983480 seconds
17: Done: 8207420 words in 3.761097 seconds
18: Done: 8207420 words in 2.909477 seconds
19: Done: 8207407 words in 3.725225 seconds
20: Done: 8207420 words in 2.615825 seconds
21: Done: 8207403 words in 3.260078 seconds
22: Done: 8207420 words in 2.750152 seconds
23: Done: 8207407 words in 2.505290 seconds
24: Done: 8207420 words in 2.844379 seconds

Do the results look slightly off to you? They should -- the number of
words counted is not always the same, which means there's a race
condition in your code. You create a lot of goroutines, some 31103 in
all for bible.txt. This means that some goroutines may not have had a
chance to count their words before you terminate the reducer. You must
either keep track of how many you've created and how many have
finished their job or, better yet, use a fixed number of goroutines to
do the work and tie that to the number of processors.
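
a minimal sketch of that second fix (hypothetical code, not a patch against your program; countWords is the per-line counter from earlier in the thread):

// a fixed pool of workers tied to the processor count. the reducer
// drains exactly one result per worker, so no count can be lost when
// the program exits. assumes the producer closes the lines channel.
func countFile(lines <-chan string, nworkers int) int {
	results := make(chan int, nworkers)
	for i := 0; i < nworkers; i++ {
		go func() {
			n := 0
			for line := range lines {
				n += countWords(line)
			}
			results <- n
		}()
	}
	total := 0
	for i := 0; i < nworkers; i++ {
		total += <-results
	}
	return total
}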

Johann Höchtl

unread,
Mar 15, 2011, 3:38:57 AM3/15/11
to golang-nuts


On Mar 15, 2:24 am, andrey mirtchovski <mirtchov...@gmail.com> wrote:
>
> Do the results look slightly off to you? They should -- the number of
> words counted is not always the same, which means there's a race
> condition in your code. You create a lot of goroutines, some 31103 in
> all for bible.txt.

I think that unless the race condition is fixed, this benchmark is
questionable, as the executable may terminate without actually
finishing the work.

Vladimir Sibirov

unread,
Mar 15, 2011, 4:54:34 AM3/15/11
to golang-nuts
Thanks for pointing out my mistake. I have fixed it and started another
thread, "Lazy parallelism benchmark"
<http://groups.google.com/group/golang-nuts/browse_thread/thread/7b1daaea9a44277c>,
because the current topic is originally about the speed of string
functions in Go.