Running concurrent code in parallel


Brendan Tracey

unread,
Apr 12, 2013, 2:35:05 PM4/12/13
to golan...@googlegroups.com
Hi,

I have been experimenting with different ways of doing parallel for loops in Go. I have two different models at the moment.

The first launches a new set of goroutines on every run.

The second launches one set of goroutines that lasts for the duration of the program.

Both programs compute the same (correct) answer. However, on my computer (OS X 10.8.2), only the first actually uses multiple computation threads (though not on its first run); the second uses only one. The same behavior appears on both go1.0.3 and go1.1beta2. Could someone help me understand why? I would have thought the second program would be easier to schedule, since the same goroutines exist for the whole run.

Thanks

Brendan Tracey

unread,
Apr 12, 2013, 3:07:38 PM4/12/13
to golan...@googlegroups.com, ty...@torbit.com
D'oh! Thanks.

On Friday, April 12, 2013 11:45:28 AM UTC-7, ty...@torbit.com wrote:
Check out your use of GOMAXPROCS... you set it in one program and not the other.
Also, this is typically set right away in main.go, but you don't do that (you set it after your dispatcher...).

Put "runtime.GOMAXPROCS(runtime.NumCPU())" on the first line of main.go for each program.

Dave Cheney

unread,
Apr 12, 2013, 7:52:28 PM4/12/13
to ty...@torbit.com, golang-nuts
> Put "runtime.GOMAXPROCS(runtime.NumCPU())" on the first line of main.go for
> each program.

Please stop this cargo cult programming. There are at least three good
reasons why you should not add this line to the start of any main
function you write.

1. Obviously your program is CPU bound; therefore, if you run your
program more than once concurrently, it will be misconfigured and
there is nothing you can do about it.
2. Your program is not non deterministic. Sure it adapts to the number
of cores present, but that means when you move it from your high
powered production server to your laptop, it will have radically
different characteristics.
3. Your operations people will hate you. You've just robbed them of
the one mechanism they have to adjust up, or down, a Go program
deployed in their environment.

To repeat: export GOMAXPROCS=somenumber is the recommended way to
adjust GOMAXPROCS for an invocation of a program. The value of
`somenumber` should be determined by profiling and benchmarking,
rather than the number of cores in the host you have at the time.

Cheers

Dave

Jens Alfke

unread,
Apr 13, 2013, 12:26:07 AM4/13/13
to golan...@googlegroups.com, ty...@torbit.com


On Friday, April 12, 2013 4:52:28 PM UTC-7, Dave Cheney wrote:
Please stop this cargo cult programming. 

Please stop insulting people for making suggestions. I understand your intention, but your language is needlessly rude. I've moderated various mailing lists in my time and I try to stay away from this sort of language.
 
There are at least three good 
reasons why you should not add this line to the start of any main 
function you write. 
1. Obviously your program is CPU bound; therefore, if you run your
program more than once concurrently, it will be misconfigured and
there is nothing you can do about it.

That implies the program will be run more than once simultaneously. That's completely dependent on the type of program it is and the environment it runs in; for example, almost none of the code I write (primarily iOS and Mac application code and system frameworks) would ever be run this way.
 
2. Your program is not non deterministic.

I think you mean it *is* nondeterministic. But any code that uses parallelism is nondeterministic. Even if the number of CPUs is fixed, all sorts of other parameters like the timing of I/O operations or the frequency of task interrupts aren't. It's a *good* idea to exercise code in different environments instead of just assuming it will work in one.

For example, the docs for GOMAXPROCS point out that this mechanism will go away when the Go scheduler is more mature. At that point, presumably, CPU scheduling will be more dynamically based on the available CPU cores, leading to exactly the sort of nondeterminism you're decrying here.
 
3. Your operations people will hate you. You've just robbed them of
the one mechanism they have to adjust up, or down, a Go program
deployed in their environment.

Who says there are "operations people"? Who says this program runs in some server farm? How do you know what kind of environment this code is for?
 
To repeat: export GOMAXPROCS=somenumber is the recommended way to
adjust GOMAXPROCS for an invocation of a program.

Again, you're assuming an environment where it's convenient or just possible to manipulate environment variables around a program. I've worked in many environments where it's not (such as GUI applications.) Anyway, opinions vary — IMHO environment variables are a total mess to use, especially during development. If I want some kind of dynamic preference I'd prefer to use something else such as the CFPreferences API.

My 2¢? I understand that Go's scheduler is immature and not yet capable of adapting intelligently to the available CPU power. But it seems weird to me to have the default be to limit it to only a single core — it's restricting the program to using only 1/4 or 1/8 of the typical available CPU. As a first pass approximation before (or instead of) doing careful tuning, it seems to me more appropriate to allow the code to use all CPUs.
 
Cheers
Dave

I'm guessing this is a canned .sig since it clashes weirdly with the tone of the rest of the message. I always find that sort of jarring...

Hugs,

—Jens 

andrey mirtchovski

unread,
Apr 13, 2013, 1:01:40 AM4/13/13
to Jens Alfke, golang-nuts, ty...@torbit.com
> Please stop insulting people for making suggestions. I understand your
> intention, but your language is needlessly rude.

I don't think Dave was rude, unless "please" has now become a rude
word. The term "cargo cult programming" has an exact definition, which
fits the described solution perfectly.[1]

Besides, "Put "runtime.GOMAXPROCS(runtime.NumCPU())" on the first line
of main.go for each program" contradicts the FAQ directly.[2]

---
1: http://en.wikipedia.org/wiki/Cargo_cult_programming
2: http://golang.org/doc/faq#Why_GOMAXPROCS

Jens Alfke

unread,
Apr 13, 2013, 1:28:01 AM4/13/13
to andrey mirtchovski, golang-nuts, ty...@torbit.com
On Apr 12, 2013, at 10:01 PM, andrey mirtchovski <mirtc...@gmail.com> wrote:

I don't think Dave was rude, unless "please" has now become a rude
word. The term "cargo cult programming" has an exact definition, which
fits the described solution perfectly.[1]

I know its definition, and it’s fairly insulting to someone’s skills (as evident from reading the first paragraph of the Wikipedia article). I’m not saying there isn’t ever cause to use it — I definitely have — but the strong implication given was that tylor is incompetent or unskilled. I would think twice before using the term to someone’s face, literally or in a mailing list.

I also disagree that this case fits the definition — I personally find it rational to add that line to a program by default. It’s a sensible initial value that’s mostly better than 1 and can be tuned later on.

Besides, "Put "runtime.GOMAXPROCS(runtime.NumCPU())" on the first line
of main.go for each program" contradicts the FAQ directly.[2]

Not really. The text you reference says "Go's goroutine scheduler is not as good as it needs to be ... For now, GOMAXPROCS should be set on a per-application basis.” Tylor’s advice was to set GOMAXPROCS in the application. The FAQ does not weigh in on whether it’s better to do so in an environment variable or a runtime API.

I find the advice “you can’t hardcode this, you have to tune it for each program” kind of disingenuous since GOMAXPROCS already starts out hardcoded to a fixed value in each program. It just happens to be hardcoded to a value of 1, which is, for most programs, inefficient. I don’t see any harm in starting a program by changing it to a different hardwired value that’s likely to make it run faster.

(I’ve been writing some parallel code this week and was disappointed initially at how slowly it ran. Only later did I remember the obscure detail that Go 1’s scheduler only allows one core to run by default; then I added the exact line recommended in this thread, and my code immediately doubled in speed. There are probably people out there kicking the tires of Go and looking at performance who didn’t know about this detail and may have given up on it.)

—Jens

Jesse McNelis

unread,
Apr 13, 2013, 2:04:26 AM4/13/13
to ty...@torbit.com, golang-nuts, Jens Alfke
On Sat, Apr 13, 2013 at 3:30 PM, <ty...@torbit.com> wrote:
Calling my suggestion cargo cult programming felt rude because it assumes that 1. I was making a recommendation for all Go programs and 2. I do not understand what runtime.GOMAXPROCS(runtime.NumCPU()) does. I understand exactly what it does (and in fact I think it is what most people want by default; whether you think that is good for them is another matter).

runtime.GOMAXPROCS(runtime.NumCPU()) is cargo cult programming. It gets copy/pasted around as a magic incantation to "make Go programs faster". It's offered as a quick solution with little explanation, so its implications are often not understood by the people copy/pasting it.
I've been seeing it more and more in open source Go programs and it's a worrying trend.
 
The Go runtime doesn't currently scale well past GOMAXPROCS=16. If you happen to have 24 active threads, you'll see a decrease in performance compared with 16. If you can't change this value because it's hard-coded, then your newer hardware is actually making your program slower.
If you have a 512 CPU SSI cluster then this is a really big problem.

If I have 4 cores and intentionally tell my OS to allow the Go program to use only one of those cores, it will still spawn 4 threads, and they'll compete for that single core: GOMAXPROCS limits the parallelism of goroutines and has nothing to do with the number of CPUs your program can use.

Hard-coding GOMAXPROCS in a way that can't be easily changed by sysadmins will make them very sad.


=====================
http://jessta.id.au

ron minnich

unread,
Apr 15, 2013, 11:33:32 AM4/15/13
to Jens Alfke, golang-nuts, ty...@torbit.com
On Fri, Apr 12, 2013 at 9:26 PM, Jens Alfke <je...@mooseyard.com> wrote:
>But any code that uses
> parallelism is nondeterministic.

Actually, that's not really correct; it is highly dependent on many
factors. If enough effort is expended on software, architecture, and
hardware, as in e.g. the Blue Gene systems, parallel execution can be
as deterministic as you wish. But your statement is true in many cases
-- just not all.

ron

Kevin Gillette

unread,
Apr 15, 2013, 1:20:28 PM4/15/13
to golan...@googlegroups.com, Jens Alfke, ty...@torbit.com
Indeed, if any kind of synchronization occurs on shared-state access (and that shared state is modified such that it is consistent and valid on each access), then the system as a whole is deterministic, as well as each independent computation being deterministic. Unless you're trying to create a system that discovers the meaning of life by accident, or are using qubits, then you really want to use synchronization, therefore making your software, whether or not anything happens in parallel, fully deterministic.

Devon H. O'Dell

unread,
Apr 15, 2013, 2:46:49 PM4/15/13
to Kevin Gillette, golang-nuts, Jens Alfke, ty...@torbit.com
2013/4/15 Kevin Gillette <extempor...@gmail.com>:
> Indeed, if any kind of synchronization occurs on shared-state access (and
> that shared state is modified such that it is consistent and valid on each
> access), then the system as a whole is deterministic, as well as each
> independent computation being deterministic. Unless you're trying to create
> a system that discovers the meaning of life by accident, or are using
> qubits, then you really want to use synchronization, therefore making your
> software, whether or not anything happens in parallel, fully deterministic.

Synchronization doesn't guarantee determinism, although it might
appear to. The problem is that there exist many cases in which
expected deterministic behavior is impossible due to hardware,
operating system, operating environment, etc. Indeed, even fair
synchronization systems (for example, spinlocks like ticket, MCS,
queue locks, etc) don't guarantee a purely evenly distributed
workload. So even with synchronization, you don't know who is waiting
and who is synchronized. And if your synchronized path does anything
with syscalls, if your locks put you to sleep, or if pre-emption
exists, all bets are off.

Even assuming syscalls, sleep locks, and pre-emption all work fine
every time, synchronization by itself doesn't create determinism in a
non-deterministic system. I suspect you're using "deterministic" in a
way that differs from how I typically understand it to be defined. In
a deterministic system, a given input always results in
the same output and does so by traversing through the same sequence of
states. Synchronization does not guarantee sequence (implicitly or
explicitly). You can have a deterministic parallel system only when
you can guarantee a total order of the intermediate states. Simply
synchronizing around a machine that contains valid states does not do
this. Consider:

var (
	mu          sync.Mutex
	accumulator int
)

func threadAdd(n int, wg *sync.WaitGroup) {
	defer wg.Done()
	mu.Lock()
	accumulator += n
	mu.Unlock()
}

func add3(a, b, c int) {
	var wg sync.WaitGroup
	wg.Add(3)
	go threadAdd(a, &wg)
	go threadAdd(b, &wg)
	go threadAdd(c, &wg)
	wg.Wait()
}

In this system, add3(5, 7, 11) will always leave accumulator with the
value 23. But in between, it may have the states (11, 18), (5, 12),
(7, 12), etc. And this can be a problem if you're dealing with
problems that expect a total order. The above example is too simple to
demonstrate this, but it is easy to conceive of a system where a
consistent state isn't the same thing as a valid state. And an invalid
(but consistent) state breaks determinism if it can happen as a
side-effect of parallel scheduling. Simply synchronizing doesn't solve
this. You also have to schedule properly and you have to guarantee
total ordering of state sequence.

Maybe that's what you meant? Or maybe I'm just being too much of a
pedantic jerk.

--dho


ron minnich

unread,
Apr 18, 2013, 12:42:30 PM4/18/13
to Devon H. O'Dell, Kevin Gillette, golang-nuts, Jens Alfke, ty...@torbit.com
On Mon, Apr 15, 2013 at 11:46 AM, Devon H. O'Dell <devon...@gmail.com> wrote:
>
> Maybe that's what you meant? Or maybe I'm just being too much of a
> pedantic jerk.
>

I think you're pointing out that achieving determinism in parallel
codes requires a lot more than just synchronization.

Not simple, in other words :-)

ron

John Nagle

unread,
Apr 18, 2013, 12:54:02 PM4/18/13
to golan...@googlegroups.com
On 4/15/2013 11:46 AM, Devon H. O'Dell wrote:
> 2013/4/15 Kevin Gillette <extempor...@gmail.com>:
>> Indeed, if any kind of synchronization occurs on shared-state access (and
>> that shared state is modified such that it is consistent and valid on each
>> access), then the system as a whole is deterministic, as well as each
>> independent computation being deterministic. Unless you're trying to create
>> a system that discovers the meaning of life by accident, or are using
>> qubits, then you really want to use synchronization, therefore making your
>> software, whether or not anything happens in parallel, fully deterministic.
>
> Synchronization doesn't guarantee determinism, although it might
> appear to.

Right. It is, though, quite possible to write deterministic
parallel programs. One of the common cases is to fan out
parts of the work to parallel tasks which share no data.
Then wait for them all to finish, and collect the results from
each task. Each parallel task is deterministic, and the final
results are deterministic.

On the other hand, you can get the same results by spawning
several tasks, each of which then requests disjoint work from a critical
section that protects shared data. When all work is completed,
the results are collected. This is nondeterministic while running
but eventually consistent. This is a common way to organize
big crunching jobs, and can be more efficient than the
deterministic approach if not all blocks of work take the
same time.

Then there are true nondeterministic programs, where you
get different results each time.

John Nagle

Kevin Gillette

unread,
Apr 18, 2013, 3:34:42 PM4/18/13
to golan...@googlegroups.com, Kevin Gillette, Jens Alfke, ty...@torbit.com
I was treating it as: "In a deterministic system, a given input always results in the same output." Anything that is not synchronized with the system will not be able to observe the determinism (so an uncaught panic passes out of synchronization, and thus determinism, though catching the panic, synchronizing, and then exiting can provide a deterministic view of the system). Of course, a system can have layers of synchronization (such as you'd find in multi-stage concurrent batch processing or classical CSP data flows); even if they're part of the same eventually-synchronized whole, sibling subsystems will appear non-deterministic with respect to each other. "Same sequence of steps" with respect to an unsynchronized outside observer is very costly, and not useful in a correctly functioning system. Even in a buggy system, you often don't need information outside the crashed goroutine's own stack to find and fix bugs.