Concurrency & performance


MKoistinen

Nov 13, 2009, 5:07:37 PM
to golang-nuts
I'm doing some tests with goroutines and concurrency.

My program breaks up 2048 otherwise linear tasks into 1 per goroutine
and uses channels to manage the process and collect the resulting
data.

All in all, the performance of the code is about the same as the
linear version of the same thing (very slightly longer, probably due
to the extra management required).

When running in goroutines, I notice that I only ever max out 1
core of my Core 2 Duo.

So, I set the env variable $GOMAXPROCS to 2 and re-ran it.

The processor utilisation seems to have gone up, but the performance
went down, taking 68% longer to finish.

Does anyone have any tips on improving performance with goroutines?

If it helps, I'm using Mac OS X and 8g to compile.
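
For anyone reproducing this, it can help to confirm from inside the program what the scheduler was actually given. The following is a minimal sketch, assuming a current Go toolchain rather than the 2009 8g compiler; `schedulerInfo` is a hypothetical helper name, while `runtime.GOMAXPROCS` and `runtime.NumCPU` are standard library calls.

```go
package main

import (
	"fmt"
	"runtime"
)

// schedulerInfo (hypothetical helper) reports the current GOMAXPROCS
// setting and the number of logical CPUs. Passing 0 to
// runtime.GOMAXPROCS queries the value without changing it.
func schedulerInfo() (procs, cpus int) {
	return runtime.GOMAXPROCS(0), runtime.NumCPU()
}

func main() {
	procs, cpus := schedulerInfo()
	fmt.Printf("GOMAXPROCS=%d, logical CPUs=%d\n", procs, cpus)
}
```

Running it with and without the GOMAXPROCS environment variable set shows whether the setting reached the runtime.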

Maradatscha

Nov 13, 2009, 5:28:56 PM
to golang-nuts

Adam Langley

Nov 13, 2009, 5:30:38 PM
to MKoistinen, golang-nuts
On Fri, Nov 13, 2009 at 2:07 PM, MKoistinen <mkois...@gmail.com> wrote:
> So, I set the env variable $GOMAXPROCS to 2 and re-ran it.
>
> The processor utilisation seems to have gone up, but the performance
> went down, taking 68% longer to finish.
>

I think the answer at the moment is that we first want to make the
code work, then make it work fast.

In time, suitable problems should be able to see a linear speedup in
number of cores, but we're not there yet.


AGL

Peter Bourgon

Nov 14, 2009, 5:37:43 AM
to Adam Langley, MKoistinen, golang-nuts
So, go-routine'd, parallelized computations taking in many cases
_longer_ than their single-threaded counterparts is a known issue at
this point?

MKoistinen

Nov 14, 2009, 5:56:30 AM
to golang-nuts
Given that this is a beta/experimental language, I can't see how that
is so unreasonable. And I'm pleased to have learnt about this now
during my own experimental phases rather than finding out in the
middle of a bigger project.

On Nov 14, 10:37 am, Peter Bourgon <peterbour...@gmail.com> wrote:
> So, go-routine'd, parallelized computations taking in many cases
> _longer_ than their single-threaded counterparts is a known issue at
> this point?
>
> On Sat, Nov 14, 2009 at 12:30 AM, Adam Langley <a...@golang.org> wrote:

Russ Cox

Nov 14, 2009, 10:09:54 AM
to peter....@gmail.com, golang-nuts
On Sat, Nov 14, 2009 at 02:37, Peter Bourgon <peterb...@gmail.com> wrote:
> So, go-routine'd, parallelized computations taking in many cases
> _longer_ than their single-threaded counterparts is a known issue at
> this point?

I don't think I'd go that far. It's possible to implement coarse
parallelization with few code changes. For example, in the
$GOROOT/test/bench directory, compare spectral-norm.go
and spectral-norm-parallel.go.

On the other hand, code that spends all its time switching
between goroutines rather than computing is going to get
slower when those goroutines run in different OS threads.
A smarter scheduler might be able to keep them in a single
OS thread, but that doesn't change the fact that the program
is doing more context switching than real work.

Goroutines certainly aren't a magic bullet. They make it
easy to manage parallelism, but you still have to decide
what level of parallelism will work best for the problem at hand.

Russ
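
The coarse-versus-fine distinction above can be illustrated with a small sketch. This is not the spectral-norm benchmark itself; it is a made-up workload (summing squares) with hypothetical function names, written for a current Go toolchain. The point is only that each goroutine gets one large chunk and sends a single value, so almost no time goes to channel traffic.

```go
package main

import (
	"fmt"
	"runtime"
)

// sumSquaresSerial is the single-goroutine baseline.
func sumSquaresSerial(n int) int {
	total := 0
	for i := 0; i < n; i++ {
		total += i * i
	}
	return total
}

// sumSquaresParallel splits [0, n) into `workers` contiguous chunks and
// runs one goroutine per chunk. Each goroutine computes a partial sum
// locally and sends exactly one value on the channel.
func sumSquaresParallel(n, workers int) int {
	if workers < 1 {
		workers = 1
	}
	results := make(chan int, workers)
	for w := 0; w < workers; w++ {
		lo := w * n / workers
		hi := (w + 1) * n / workers
		go func(lo, hi int) {
			partial := 0
			for i := lo; i < hi; i++ {
				partial += i * i
			}
			results <- partial
		}(lo, hi)
	}
	total := 0
	for w := 0; w < workers; w++ {
		total += <-results
	}
	return total
}

func main() {
	const n = 1 << 20
	workers := runtime.NumCPU() // assumption: one chunk per logical CPU
	fmt.Println(sumSquaresSerial(n) == sumSquaresParallel(n, workers))
}
```

Choosing one chunk per CPU is only a starting point; the right chunk size depends on how expensive each unit of work is relative to a channel operation.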

MKoistinen

Nov 14, 2009, 6:41:54 PM
to golang-nuts
One of the problems that I've found with large amounts of concurrency
is that Go will happily create new goroutines as requested, even if it
exhausts the available computing resources -- namely RAM.

This brings forth these questions for me:
1) How much RAM overhead is a goroutine? (8g compiler)
2) In the future, will there be any sort of goroutine management
features in the runtime?

I've managed to get mine under control by creating a 'throttle' worker
goroutine which hands out some maximum number of 'tickets' (bool) on a
published channel and won't hand out any more until the goroutines
that received them hand them back as they finish. It's crude, but it
really helped me stay within RAM and, independently, reduced the
run times to about 21% of what they were.

Sadly, setting GOMAXPROCS to anything > 1 still results in doubled
runtimes.
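
A ticket scheme like the one described can be sketched roughly as follows. This is not the poster's code: it is a simplification that pre-fills a buffered channel with tickets instead of using a separate throttle goroutine, it assumes a current Go toolchain, and `runThrottled` and its parameters are hypothetical names.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// runThrottled (hypothetical helper) runs task(i) for every i in [0, n)
// while allowing at most maxInFlight task goroutines to exist at once.
// It returns the highest number of simultaneously running tasks observed.
func runThrottled(n, maxInFlight int, task func(i int)) int32 {
	if maxInFlight < 1 {
		maxInFlight = 1
	}
	// Pre-fill the channel with one bool "ticket" per allowed goroutine.
	tickets := make(chan bool, maxInFlight)
	for i := 0; i < maxInFlight; i++ {
		tickets <- true
	}

	var wg sync.WaitGroup
	var inFlight, peak int32
	for i := 0; i < n; i++ {
		<-tickets // block until a ticket is free
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			defer func() { tickets <- true }() // hand the ticket back
			cur := atomic.AddInt32(&inFlight, 1)
			for { // record the peak concurrency seen so far
				old := atomic.LoadInt32(&peak)
				if cur <= old || atomic.CompareAndSwapInt32(&peak, old, cur) {
					break
				}
			}
			task(i)
			atomic.AddInt32(&inFlight, -1)
		}(i)
	}
	wg.Wait()
	return atomic.LoadInt32(&peak)
}

func main() {
	seen := make([]bool, 2048) // each task writes only its own slot
	peak := runThrottled(len(seen), 8, func(i int) { seen[i] = true })
	fmt.Println("peak concurrent tasks:", peak)
}
```

Because a goroutine is only started after a ticket has been taken, memory use is bounded by maxInFlight stacks rather than by the total number of tasks.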

Scott Mansell

Nov 14, 2009, 7:28:50 PM
to golang-nuts
One thing I noticed when playing with the prime sieve example and watching the CPU stats in top is that one thread results in 25% userspace time and 75% idle time.

When I set GOMAXPROCS to something higher, like 8, top reports 26-27% userspace time and 73-74% system time.

I'm not an expert on this, but I'm guessing that because sieve.go spawns a new goroutine for every prime it finds, and each goroutine contains only a modulus and an if statement, most of the total cost is in the channels, which are almost free in a single-processor program but involve an expensive system call for synchronization between multiple processors.

____________
Scott Mansell
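
For readers who have not seen it, the structure being described looks roughly like the sketch below. It follows the well-known concurrent prime sieve pattern but is written for a current Go toolchain and is not necessarily identical to the sieve.go that was run here; `firstPrimes` is a hypothetical wrapper added so the result can be collected.

```go
package main

import "fmt"

// generate sends 2, 3, 4, ... on ch forever.
func generate(ch chan<- int) {
	for i := 2; ; i++ {
		ch <- i
	}
}

// filter copies values from in to out, dropping multiples of prime.
// Per value it does one modulus and one comparison, so nearly all of
// its time is spent in channel operations.
func filter(in <-chan int, out chan<- int, prime int) {
	for {
		i := <-in
		if i%prime != 0 {
			out <- i
		}
	}
}

// firstPrimes (hypothetical wrapper) returns the first n primes by
// chaining one filter goroutine per prime found. The goroutines are
// left blocked when it returns, which is acceptable only in a short
// demonstration program.
func firstPrimes(n int) []int {
	primes := make([]int, 0, n)
	ch := make(chan int)
	go generate(ch)
	for len(primes) < n {
		prime := <-ch
		primes = append(primes, prime)
		next := make(chan int)
		go filter(ch, next, prime)
		ch = next
	}
	return primes
}

func main() {
	fmt.Println(firstPrimes(10))
}
```

Every candidate number passes through one unbuffered channel per prime found so far, which is why the work done between channel operations is so small.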

Ian Lance Taylor

Nov 15, 2009, 11:19:04 PM
to MKoistinen, golang-nuts
MKoistinen <mkois...@gmail.com> writes:

> 1) How much RAM overhead is a goroutine? (8g compiler)

Most of the overhead is the stack, which starts out at 4K. There is
also a data structure for the goroutine, which is smaller.


> 2) In the future, will there be any sort of goroutine management
> features in the runtime?

Probably, though I don't think we know what is required yet.

Ian