The Ring benchmark in Go

Vladimir Sibirov

Mar 9, 2011, 4:32:21 AM
to golan...@googlegroups.com
Hello Go World,

I followed in the footsteps of Christopher Stawarz and his Erlang vs. Stackless vs. Multitask Python comparison (http://pseudogreen.org/blog/erlang_vs_stackless_vs_multitask.html): I converted his Erlang ring benchmark to Go and compared the results.

A caveat first: the ring benchmark can only be used to compare the efficiency of message-passing implementations. It does not compare the languages in general, nor does it benchmark multicore and multiprocessor environments, because it is serial by nature (a single message is passed around a ring of coroutines, one hop at a time).
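For readers who have not seen the ring benchmark, here is a minimal sketch of the idea in Go. It is not the ring.go linked in this post; the name runRing and the use of channels with a buffer of one (loosely mimicking Erlang's asynchronous mailboxes) are assumptions made for illustration, and n is assumed to be at least 1.

```go
package main

import "fmt"

// runRing is a hypothetical helper, not the code from ring.go.
// It connects n goroutines in a ring and passes one token around m times,
// returning the number of hops the token made (n*m). Requires n >= 1.
func runRing(n, m int) int {
	// chans[i] is the inbox of node i; node i forwards to chans[(i+1)%n].
	// A buffer of 1 lets a ring of size 1 send to itself without blocking.
	chans := make([]chan int, n)
	for i := range chans {
		chans[i] = make(chan int, 1)
	}
	done := make(chan int)

	// Node 0 counts completed rounds and stops the ring after m of them.
	go func() {
		for round := 0; ; round++ {
			hops := <-chans[0]
			if round == m {
				done <- hops
				return
			}
			chans[1%n] <- hops + 1
		}
	}()

	// Nodes 1..n-1 only forward the token to their successor.
	for i := 1; i < n; i++ {
		go func(in <-chan int, out chan<- int) {
			for hops := range in {
				out <- hops + 1
			}
		}(chans[i], chans[(i+1)%n])
	}

	chans[0] <- 0 // inject the token
	total := <-done

	// The token is no longer in flight, so the forwarders can be shut down.
	for i := 1; i < n; i++ {
		close(chans[i])
	}
	return total
}

func main() {
	fmt.Println(runRing(1000, 100)) // hop count, equal to n*m
}
```

Only one goroutine holds the token at any moment, which is why extra cores cannot speed this workload up.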

The code follows the Erlang implementation quite literally, including some functions that are unnecessary in Go, but that makes the comparison fairer. You can obtain it here:
http://dl.dropbox.com/u/15014241/Go/ring.go
http://dl.dropbox.com/u/15014241/Go/ring.erl is a slightly modified Erlang ring, which runs in the Erlang shell rather than from the command line.

Apart from the two traditional parameters, -n (number of processes) and -m (number of message rounds), the Go version has a third one, -p, which sets the GOMAXPROCS runtime variable. It defaults to 1; a higher value lets us see how well Go deals with this serial benchmark when interprocess communication is involved on a multicore system.
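The three flags could be wired up roughly as follows. This is a sketch, not the parsing code from ring.go: the config type, the parseArgs helper and the defaults of 100 for -n and -m are hypothetical; only the default of 1 for -p comes from the description above. flag.NewFlagSet and runtime.GOMAXPROCS are standard library calls.

```go
package main

import (
	"flag"
	"fmt"
	"os"
	"runtime"
)

// config holds the three benchmark parameters (hypothetical type).
type config struct {
	n, m, p int
}

// parseArgs is a hypothetical helper that reads -n, -m and -p from args.
func parseArgs(args []string) (config, error) {
	var c config
	fs := flag.NewFlagSet("ring", flag.ContinueOnError)
	fs.IntVar(&c.n, "n", 100, "number of processes in the ring (default is an assumption)")
	fs.IntVar(&c.m, "m", 100, "number of message rounds (default is an assumption)")
	fs.IntVar(&c.p, "p", 1, "value passed to runtime.GOMAXPROCS")
	err := fs.Parse(args)
	return c, err
}

func main() {
	c, err := parseArgs(os.Args[1:])
	if err != nil {
		return // the flag package has already reported the problem on stderr
	}
	// GOMAXPROCS returns the previous setting; values below 1 leave it unchanged.
	prev := runtime.GOMAXPROCS(c.p)
	fmt.Printf("n=%d m=%d GOMAXPROCS=%d (was %d)\n", c.n, c.m, c.p, prev)
}
```

Note that Go releases since 1.5 default GOMAXPROCS to the number of available CPUs, so on a current toolchain -p 1 has to be set explicitly to reproduce the single-threaded column.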

The testing environment is dual core Intel Core 2 64...@2.13Ghz, 2GB RAM, running 64bit openSuSE 11.3 with kernel 2.6.34.7, Erlang/OTP R14B, Go release.2011-03-07. The results are represented by the following table:

N      M      Erlang (s)  Go (s)      Go x2 (s)   Go/Erlang (%)  Go x2/Go
100    100    0.011       0.007249    0.044173    65.9           6.0936680922
100    1000   0.099       0.062647    0.411897    63.2797979798  6.5748878637
100    10000  1.078       0.752017    4.248261    69.7603896104  5.6491555377
1000   100    0.095       0.070151    0.438409    73.8431578947  6.24950464
1000   1000   1.094       0.877429    4.482036    80.2037477148  5.1081466421
1000   10000  10.962      7.880481    44.842602   71.8890804598  5.6903381913
10000  100    1.482       0.921457    4.70202     62.176585695   5.1028100063
10000  1000   14.887      10.666996   46.764236   71.6530933029  4.3840117686
10000  10000  146.28      105.998892  458.785201  72.4630106645  4.3282075156


In this table only the run time is measured; the setup stage is omitted. Go x2 stands for Go with GOMAXPROCS=2 (for the 2 cores on this CPU). As you can see, with GOMAXPROCS=1 the Go implementation needs roughly 20% to 38% less time than Erlang to rotate the messages, about 30% on average. But when interprocess communication is involved (GOMAXPROCS=2), the same runs are about 4.3 to 6.6 times slower.
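The two derived columns are simple ratios of the measured times. As a sketch, they can be recomputed like this; the row type and the function names are made up for illustration, and the numbers are two rows copied from the results above.

```go
package main

import "fmt"

// row holds one measurement; all times are in seconds (hypothetical type).
type row struct {
	n, m   int
	erlang float64 // Erlang run time
	goOne  float64 // Go run time with GOMAXPROCS=1
	goTwo  float64 // Go run time with GOMAXPROCS=2
}

// goVsErlangPct is the "Go/Erlang (%)" column: Go time as a share of Erlang time.
func goVsErlangPct(r row) float64 { return r.goOne / r.erlang * 100 }

// slowdown is the "Go x2/Go" column: how much slower GOMAXPROCS=2 is.
func slowdown(r row) float64 { return r.goTwo / r.goOne }

func main() {
	rows := []row{
		{100, 10000, 1.078, 0.752017, 4.248261},
		{10000, 10000, 146.28, 105.998892, 458.785201},
	}
	for _, r := range rows {
		fmt.Printf("N=%d M=%d Go/Erlang=%.1f%% Go x2/Go=%.2f\n",
			r.n, r.m, goVsErlangPct(r), slowdown(r))
	}
	// Prints:
	// N=100 M=10000 Go/Erlang=69.8% Go x2/Go=5.65
	// N=10000 M=10000 Go/Erlang=72.5% Go x2/Go=4.33
}
```

A Go/Erlang value of about 70% means Go took about 30% less time, which is where the "30% faster" figure comes from.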

Previously I ran this test on 64-bit Ubuntu 10.10 with a quad-core AMD Phenom II and 4GB RAM; for N=10000, M=10000 it took about 75.6 seconds for Erlang and 74 seconds for Go. It was several times slower with GOMAXPROCS set to 2, 3 or 4, and it made almost no difference which of these values was picked.

I also ran Erlang with HiPE, but it made almost no difference.

Another thing to mention is that both Erlang and Go are very close and both have nearly linear scaling as the number of messages grows.

I hope you find this benchmark useful.

--
Regards,
Vladimir

Vladimir Sibirov

Mar 9, 2011, 4:39:10 AM
to golang-nuts
Seems like HTML table got screwed by plain text, here you can find the
results in TXT or ODS format:
http://dl.dropbox.com/u/15014241/Go/ringbench.txt
http://dl.dropbox.com/u/15014241/Go/ringbench.ods


Johann Höchtl

Mar 9, 2011, 9:24:37 AM
to golang-nuts


On Mar 9, 10:32 am, Vladimir Sibirov <trustmas...@kodigy.com> wrote:
> Hello Go World,
>
> I have followed the footsteps of Christopher Stawarz with his Erlang vs.
> Stackless vs. Multitask Python comparison (http://pseudogreen.org/blog/erlang_vs_stackless_vs_multitask.html),
> converted this Erlang benchmark to Go and compared results.
>
> I should make a remark here that the ring benchmark can only be used to
> compare efficiency of message passing implementation; it does not compare
> languages in general; neither it benchmarks multicore and multiprocessor
> environments because of its serial nature (the message is passed through a
> ring of coroutines one by one).
>
> The code follows Erlang implementation quite literally, including some
> functions unnecessary in Go, but that makes the comparison more fair. You
> can obtain it here: http://dl.dropbox.com/u/15014241/Go/ring.go
> http://dl.dropbox.com/u/15014241/Go/ring.erl is a slightly modified Erlang
> ring, which runs in Erlang shell rather than command line.
>
> Apart from 2 traditional parameters -n (number of processes) and -m (number
> of message rounds) it has a third one -p which sets GOMAXPROCS runtime
> variable. It defaults to 1, but if a higher value is set, it lets us see how
> well it deals with this serial benchmark when interprocess communication is
> involved on a multicore system.
>
> The testing environment is dual core Intel Core 2 6...@2.13Ghz, 2GB RAM,
So scheduling is fast, but mapping logical threads onto system threads
and communication in-between is slow.
Is the GC aware of keeping memory thread-local to prevent context
switches?

Kai Backman

Mar 9, 2011, 9:40:01 AM
to Vladimir Sibirov, golan...@googlegroups.com
On Wed, Mar 9, 2011 at 11:32 AM, Vladimir Sibirov
<trust...@kodigy.com> wrote:
> As you can see, Go implementation rotates these messages about 30% faster within one process. But when interprocess communication is involved, things go nearly 5 times slower.

We ran into the same issue with GC a while back and basically ended up
splitting our worker tasks into one process per core instead of
running a single process with GOMAXPROCS=4. We saw a >4x performance
increase. In our case the code and deployment changes were trivial, as
the system was already distributed and we just ended up increasing
the node count.

It seems prudent to improve performance for multicore Go
applications, but I'm starting to think that single-process multicore
is more important for desktop applications than for distributed servers.

Kai

--
Kai Backman, programmer
http://tinkercad.com - solid modeling for artists and makers
