4k goroutines performance vs. GOMAXPROCS

John Nogatch

Aug 1, 2011, 1:42:36 PM

to golang-nuts

While learning Go, I implemented Conway's Game of Life. A goroutine is
started for each cell of an array that is 126 columns by 37 rows, for a
total of 4662 goroutines, plus the main thread, which performs the display.
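The original program is not shown in this thread, so the following is only a minimal sketch of the "one goroutine per cell" idea. It assumes fresh goroutines are spawned for every generation and synchronized with a sync.WaitGroup; the names Grid, neighbors, and step are hypothetical, and the real program may well have used long-lived goroutines and channels instead.

```go
package main

import (
	"fmt"
	"sync"
)

const (
	rows = 37  // grid size taken from the post
	cols = 126
)

// Grid holds one generation; true means the cell is alive.
type Grid [rows][cols]bool

// neighbors counts the live neighbors of (r, c). Cells beyond the edge
// are treated as dead (an assumption; the post does not say).
func neighbors(g *Grid, r, c int) int {
	n := 0
	for dr := -1; dr <= 1; dr++ {
		for dc := -1; dc <= 1; dc++ {
			if dr == 0 && dc == 0 {
				continue
			}
			rr, cc := r+dr, c+dc
			if rr >= 0 && rr < rows && cc >= 0 && cc < cols && g[rr][cc] {
				n++
			}
		}
	}
	return n
}

// step computes the next generation using one goroutine per cell.
// Each goroutine only reads cur and writes its own element of next,
// so no locking is needed.
func step(cur *Grid) *Grid {
	next := new(Grid)
	var wg sync.WaitGroup
	for r := 0; r < rows; r++ {
		for c := 0; c < cols; c++ {
			wg.Add(1)
			go func(r, c int) {
				defer wg.Done()
				n := neighbors(cur, r, c)
				next[r][c] = n == 3 || (cur[r][c] && n == 2)
			}(r, c)
		}
	}
	wg.Wait() // all 4662 cells are done
	return next
}

func main() {
	g := new(Grid)
	// A horizontal "blinker" becomes vertical after one generation.
	g[10][10], g[10][11], g[10][12] = true, true, true
	g = step(g)
	fmt.Println(g[9][11], g[10][11], g[11][11], g[10][10])
}
```

With this layout the main goroutine is free to do the display between calls to step.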

The following shows GOMAXPROCS, mean time per generation (sec), and
standard deviation of time per generation (sec), when run on a 1 CPU
Intel Pentium III:
GOMAXPROCS  mean (sec)  std dev (sec)
1           8.27        0.202
2           7.02        0.272
3           1.79        1.85
4           1.64        1.99
5           1.07        0.879
6           0.584       0.684
7           0.35        0.405
8           0.311       0.0256
9           0.31        0.0237
10          0.302       0.0243
25          0.304       0.0249
50          0.308       0.0434
100         0.305       0.0264
200         0.305       0.0295
400         0.312       0.0239
1000        0.32        0.0693
2000        0.348       0.0485
4000        0.369       0.0533
8000        0.392       0.0676
16000       0.363       0.0356
32000       0.373       0.0565
64000       0.376       0.0471
128000      0.38        0.091
256000      0.375       0.0508
512000      0.369       0.0464
1024000     0.381       0.0463
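
For anyone who wants to collect numbers like these, here is a rough sketch of one way to do it. It is not the code used for the table above: the helper names meanStd and timeGenerations are made up, the workload is a stand-in, and the sample standard deviation (dividing by n-1) is an assumption, since the post does not say which formula was used.

```go
package main

import (
	"fmt"
	"math"
	"runtime"
	"time"
)

// meanStd returns the mean and the sample standard deviation of xs.
func meanStd(xs []float64) (mean, sd float64) {
	if len(xs) == 0 {
		return 0, 0
	}
	for _, x := range xs {
		mean += x
	}
	mean /= float64(len(xs))
	if len(xs) < 2 {
		return mean, 0
	}
	var ss float64
	for _, x := range xs {
		d := x - mean
		ss += d * d
	}
	return mean, math.Sqrt(ss / float64(len(xs)-1))
}

// timeGenerations sets GOMAXPROCS to procs, runs step gens times, and
// returns the elapsed seconds of each run. The previous GOMAXPROCS
// value is restored on return.
func timeGenerations(procs, gens int, step func()) []float64 {
	prev := runtime.GOMAXPROCS(procs)
	defer runtime.GOMAXPROCS(prev)
	times := make([]float64, 0, gens)
	for i := 0; i < gens; i++ {
		start := time.Now()
		step()
		times = append(times, time.Since(start).Seconds())
	}
	return times
}

func main() {
	// Stand-in workload; replace with one Game of Life generation.
	work := func() { time.Sleep(2 * time.Millisecond) }
	for _, procs := range []int{1, 2, 4, 8} {
		mean, sd := meanStd(timeGenerations(procs, 10, work))
		fmt.Printf("%d %.3g %.3g\n", procs, mean, sd)
	}
}
```

GOMAXPROCS can also be set through the environment variable of the same name instead of calling runtime.GOMAXPROCS.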

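Performance improved steadily as GOMAXPROCS was increased to about 10
(from 8.27 sec to 0.302 sec per generation), even though this is a 1 CPU
laptop.

Very large values of GOMAXPROCS degraded performance only slightly (to
roughly 0.35 to 0.39 sec per generation).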

Using the "H" option to "top" showed only 7 active threads, for
"GOMAXPROCS=100".

Florian Uekermann

Aug 1, 2011, 2:12:40 PM

to golang-nuts

I can imagine why there could be a performance benefit to running
only one goroutine at a time (no locking), but I never really
understood why there should be a real-life benefit from limiting
the number of goroutines depending on the number of processors. And I
think I read somewhere that that's planned for the future. Isn't the
operating system scheduler good enough to handle that task? Or is
there some benefit in having fewer threads and switching between
goroutines in one thread?

Muharem Hrnjadovic

Aug 1, 2011, 2:18:52 PM

to Florian Uekermann, golang-nuts

On 08/01/2011 08:12 PM, Florian Uekermann wrote:
> I can imagine why there could be a performance benefit for running
> only one goroutine at a time (no locking), but I never really
> understood why there should be a real life benefit caused by limiting
> the number of goroutines depending on the number of processors. And I
> think I read somewhere that thats planned for the future.
> Isn't the operating system scheduler good enough to handle that task?

As I understand it, the OS scheduler operates at the process level, whereas
goroutines all execute *inside* the same process, i.e. we are looking at a
different level of granularity.

> Or is there some benefit in having less threads and switching between
> goroutines in one thread?
>
> On Aug 1, 7:42 pm, John Nogatch <jnoga...@gmail.com> wrote:
>> While learning go, I implemented Conway's game of life. A goroutine is
>> started for each cell of an array that is 126 columns by 37 rows =
>> 4662 goroutines, plus the main thread which performs the display.
>>
>> The following shows GOMAXPROCS, mean time per generation (sec), and
>> standard deviation of time per generation (sec), when run on a 1 CPU
>> Intel Pentium III:

[..]

>> Performance improved when GOMAXPROCS was increased to 10, on a 1 CPU
>> laptop.
>>
>> Very large values of GOMAXPROCS degraded performance only slightly.
>>
>> Using the "H" option to "top" showed only 7 active threads, for
>> "GOMAXPROCS=100".

Best regards/Mit freundlichen Grüßen

--
Muharem Hrnjadovic <muh...@lbox.cc>
Public key id : B2BBFCFC
Key fingerprint : A5A3 CC67 2B87 D641 103F 5602 219F 6B60 B2BB FCFC

⚛

Aug 1, 2011, 2:49:18 PM

to golang-nuts

On Aug 1, 8:12 pm, Florian Uekermann <f...@uekermann-online.de> wrote:
> I can imagine why there could be a performance benefit for running
> only one goroutine at a time (no locking), but I never really
> understood why there should be a real life benefit caused by limiting
> the number of goroutines depending on the number of processors. And I
> think I read somewhere that thats planned for the future. Isn't the
> operating system scheduler good enough to handle that task?

The OS scheduler is not good enough for the task, when there are many
goroutines.

In addition, even if a program runs with GOMAXPROCS=1 there can be
measurable benefits of having many goroutines. For example the program
can initiate 100 network requests and use 100 goroutines to
concurrently wait for the results to arrive. Even with GOMAXPROCS=1
this can be much faster than sequential processing.
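
As a rough illustration of that point, the sketch below pins the program to GOMAXPROCS=1 and compares 100 simulated requests done one after another with 100 done in separate goroutines. fakeRequest is a hypothetical stand-in that only sleeps; a real network call would block in a similar way, but the timings here are not measurements of any real service.

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
	"time"
)

// fakeRequest stands in for a network call: it does no work, it only
// waits for the given latency.
func fakeRequest(latency time.Duration) {
	time.Sleep(latency)
}

// sequential issues n requests one after another.
func sequential(n int, latency time.Duration) time.Duration {
	start := time.Now()
	for i := 0; i < n; i++ {
		fakeRequest(latency)
	}
	return time.Since(start)
}

// concurrent issues n requests at once, one goroutine per request,
// and waits for all of them to finish.
func concurrent(n int, latency time.Duration) time.Duration {
	start := time.Now()
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			fakeRequest(latency)
		}()
	}
	wg.Wait()
	return time.Since(start)
}

func main() {
	runtime.GOMAXPROCS(1) // only one goroutine runs Go code at a time

	const n, latency = 100, 10 * time.Millisecond
	fmt.Println("sequential:", sequential(n, latency))
	fmt.Println("concurrent:", concurrent(n, latency))
}
```

The sequential run takes at least n times the latency, while the concurrent run takes roughly one latency, because waiting goroutines do not occupy the single processor slot.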

Florian Weimer

Aug 1, 2011, 5:04:50 PM

to Florian Uekermann, golang-nuts

* Florian Uekermann:

> I can imagine why there could be a performance benefit for running
> only one goroutine at a time (no locking), but I never really
> understood why there should be a real life benefit caused by limiting
> the number of goroutines depending on the number of processors. And I
> think I read somewhere that thats planned for the future. Isn't the
> operating system scheduler good enough to handle that task?

The scheduler should be good enough. But in a 1:1 model, each
goroutine needs a kernel stack, which takes up 8K. This means that
they are not really that lightweight, no matter what you do on the
userspace side.
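
To put a number on that, here is the arithmetic for the 4662-goroutine program from the first post, taking the 8K-per-kernel-stack figure above at face value. The function name kernelStackBytes is made up for this sketch, and the real per-thread cost depends on the OS and architecture.

```go
package main

import "fmt"

// kernelStackBytes returns the total kernel-stack memory needed if
// every goroutine were backed by its own kernel thread (a 1:1 model).
func kernelStackBytes(goroutines, stackBytes int) int {
	return goroutines * stackBytes
}

func main() {
	const (
		goroutines = 4662     // one per cell, from the original post
		stackBytes = 8 * 1024 // 8K per kernel stack, as stated above
	)
	total := kernelStackBytes(goroutines, stackBytes)
	fmt.Printf("%d bytes = %.1f MiB\n", total, float64(total)/(1024*1024))
	// prints "38191104 bytes = 36.4 MiB"
}
```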

Florian Uekermann

Aug 1, 2011, 5:29:04 PM

to golang-nuts

Ah that last one makes sense to me, thanks FW...

Is it correct to say reusing threads for short-lived goroutines
prevents thread creation overhead?
So for long-lived goroutines the internal scheduler has no performance
benefits if the OS scheduler is good?

I am not going to speculate wildly about performance drawbacks, but if
someone has something to say about this (Go devs?) I would be very
interested to read it.

Muharem Hrnjadovic

Aug 1, 2011, 2:12:20 PM

to John Nogatch, golang-nuts

On 08/01/2011 07:42 PM, John Nogatch wrote:
> While learning go, I implemented Conway's game of life. A goroutine is
> started for each cell of an array that is 126 columns by 37 rows =
> 4662 goroutines, plus the main thread which performs the display.
>
> The following shows GOMAXPROCS, mean time per generation (sec), and
> standard deviation of time per generation (sec), when run on a 1 CPU
> Intel Pentium III:

[..]

> Performance improved when GOMAXPROCS was increased to 10, on a 1 CPU
> laptop.
>
> Very large values of GOMAXPROCS degraded performance only slightly.

I made some similar observations running parallel Go code on a multicore
machine: http://tinyurl.com/3w4nrby. The sweet spot in my case
was GOMAXPROCS=8.

DisposaBoy

Aug 2, 2011, 3:28:23 AM

to golan...@googlegroups.com

Keep in mind that the most significant degradation of performance will come not from the number of goroutines you have running but from the number of *real* OS threads. E.g., you could run a thousand goroutines at the same time and notice little difference between that and the same operations done in sequence. This is all relative, of course; one possible reason is that those goroutines only used up to ten threads, because they completed quickly and the threads got reused.
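
One way to see the gap between goroutines and real OS threads from inside a program is sketched below. It parks 4662 goroutines on a channel and then compares runtime.NumGoroutine with the count of the runtime's "threadcreate" profile. The helper names spawnBlocked and threadsCreated are hypothetical, and the exact thread count will vary by machine and Go version.

```go
package main

import (
	"fmt"
	"runtime"
	"runtime/pprof"
	"sync"
)

// spawnBlocked starts n goroutines that block until release is closed.
// It returns once all of them are running; the returned WaitGroup
// completes when they have all exited.
func spawnBlocked(n int, release <-chan struct{}) *sync.WaitGroup {
	var started sync.WaitGroup
	done := new(sync.WaitGroup)
	started.Add(n)
	done.Add(n)
	for i := 0; i < n; i++ {
		go func() {
			defer done.Done()
			started.Done()
			<-release // parked here; no OS thread is held while waiting
		}()
	}
	started.Wait()
	return done
}

// threadsCreated reports how many OS threads the runtime has created
// so far, according to the built-in "threadcreate" profile.
func threadsCreated() int {
	return pprof.Lookup("threadcreate").Count()
}

func main() {
	release := make(chan struct{})
	done := spawnBlocked(4662, release)

	fmt.Println("goroutines:", runtime.NumGoroutine())
	fmt.Println("OS threads created:", threadsCreated())

	close(release)
	done.Wait()
}
```

The goroutine count is in the thousands while the thread count typically stays small, which is consistent with "top" showing only a handful of threads in the first post.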

John Nogatch

Aug 5, 2011, 10:14:24 AM

to golan...@googlegroups.com

After upgrading to "8g version release.r59 9276", the performance of my "game of life" application is greatly improved!

On a 1-CPU Pentium III, the time per generation has decreased to 0.1 sec, and is no longer sensitive to the setting of GOMAXPROCS. This is 3X faster than GOMAXPROCS=10 with the previous version of 8g.

On a 2-CPU system, GOMAXPROCS=2 is slightly faster than 1, but the difference is small.

The performance of the computing goroutines is so much better that displaying the result is now the limiting factor.
