I realize that closures would give access to the variables that would
have been available in the for's scope, but that is not why I
suggested it. I did it so each iteration would have its *own copy*
of each of those variables. Think of a regular integer-counting
for loop.
for i := 0; i < 10; i++ {
	go func() {
		fmt.Printf("%d\n", i);
	}();
}
If that code segment actually prints out 0 through 9 then, well, you
got *lucky*.
Basically, each iteration of the loop does, in essence, have its own
parameters. It's just that in the normal for loop, they happen to be
stored in a reused variable. When the iterations are run in parallel,
you can't reuse that variable for obvious reasons. But we don't want
to make a copy of *every* variable, obviously. The
"<all-variables-declared-in-the-for-statement>" was my attempt to
tease out exactly which variables need to be copied.
This was exactly the problem that occurred to me. Within a range
statement, the variables would be shared across the different
closures. I think this is not what we want if the body of the
for loop is to be executed in parallel.
The next attempt:
gowait res := range db.Execute("SELECT * FROM MY_TABLE") {
	...
	mypkg.DoSomethingWithTheResult(res);
	...
}
gowait does it all: it makes a goroutine for each result of the range,
copies the result itself, and waits until all results have been
processed.
Then, a variant could be used if blocking is not wanted (for example,
the program doesn't care about the results anymore):
goasync res := range db.Execute....
We can still use channels if there is a need for communication after
all.
Sjoerd
sem := make(semaphore);
for i, xi := range x {
	go func(i int) {
		...
		sem.Signal();
	}(i);
}
sem.Wait(len(x));
Ryanne's semaphore
implementation uses space proportional to the semaphore count
(unsafe.Sizeof(struct{}) == 4).
and, perhaps more seriously, the kids won't
go away until they're waited for - it's easy to
imagine a situation where 100000 intermediate
goroutines are spawned but there are only 3
active routines left to wait for at the end.
Right, that was the idea behind chan struct {}, using a zero-size type rather than bool, since only the length of the channel matters. I still think it would be nice to have semaphores supported by the sync package tho.
- from my phone -
On Dec 8, 2009 11:51 AM, "atomly" <ato...@atomly.com> wrote:
I don't really see the need for a semaphore, but if you need one,
couldn't you easily build it in a go-like way:
type Semaphore struct {
	sig chan bool;
}

func (self *Semaphore) Signal() {
	self.sig <- true;
}

func (self *Semaphore) Wait(n int) {
	for i := 0; i < n; i++ {
		<-self.sig;
	}
}
Roger,
I'd argue that semaphores are the original signal... the simplest message you can send. I'd bet you've used them an awful lot without realizing it. Any time you send a message but don't actually inspect the message ( done <- true ) you are using the same kinda logic, right?
- from my phone -
On Dec 8, 2009 11:11 AM, "roger peppe" <rogp...@gmail.com> wrote:
2009/12/8 Ryanne Dolan <ryann...@gmail.com>:
> That is a shame! I hope that is a short-term problem. I don't see why > [N]struct{} should have ...
sometimes it's useful to divide by the size of a type
without checking for divide-by-zero.
i don't know if that's the case here though. (the
only place i could find was in the map code, which
is probably easy enough to change).
> I was gonna suggest to the group that semaphores be added > to the sync library, but then realize...
John,
I don't mean to pick on you (publicly? anonymously?). But your code is still strange to me. The stranger thing is that the golang.org examples seem to advocate your pattern. So maybe you are right, or at least more go-ish. I don't think your approach is optimal tho.
Consider in the matrix example that you have two input matrices and one output matrix. You never write to the inputs and the output writes are all independent. This is a data-parallel calculation. Channels aren't needed at all! All you need is to sync at the end (or some point in the future), which right now needs a channel (unless native semaphores are introduced at some point). Other than this one sync step, no sync on the data is required.
Assuming the stack is faster than channel communication, I'd say a better way is to use closures as in my example.
An even better way would be to return a closure to call when you need the result of the multiply. Other examples in this thread do just that. This thunk-based lazy eval is probably the most parallel way to do a for-loop, tho I think it might necessarily have more overhead.
- from my phone -
On Dec 8, 2009 12:25 PM, "John Asmuth" <jas...@gmail.com> wrote:
John,
You are very right about the load balancing problem... which is why golang.org's load balancing examples are so strange to me. I don't see why you'd try to implement load balancing in Go. I totally agree that channels are the way to go (the only way maybe) to do this in Go, but doesn't the runtime load balance already? Under the hood, the runtime uses traditional sync methods to solve this problem on a grander scale. I think it is impossible to solve the problem efficiently in Go the language, but Go the runtime does it magically.
That said, I think using N goroutines and letting the runtime multiplex them to X procs is the only way to write efficient parallelism in Go.
I think this is a good thing, but I still wonder why golang.org leads us to believe we need to load balance ourselves. Am I missing something or is the example just really bad?
- from my phone -
On Dec 8, 2009 1:06 PM, "John Asmuth" <jas...@gmail.com> wrote:
I know you didn't, I'm just teasing.
> But your code is still > strange to me. The stranger thing is that the golang.org examples seem t...
The channels are to ensure load balancing. Perhaps for parallel matrix
multiplication, it is unlikely one computation will take longer than
the other, since they have exactly the same number of steps. But in
general, the doStuff() method can take a varying amount of time to
complete. So if I want to split the work up into X (the number of
processors) threads, I can't just give each of those workers N/X
computations to do. I use the channel like a queue, so no worker is
left idle until there are no more jobs to pick up.
This is the kind of thing that I would expect a good implementation of
parallel for to do, and I don't think it is good to expect each
developer to write the same pattern each time. Either a "go for" or a
templated "parallel.For<T>(c chan T, func(t T))" is needed to allow
developers to write code that will load balance a for loop and not
have to worry about whether they're waiting on channels properly or
not.
> > Assuming stack is faster than channel communication, i'd say a better way is > to use closures...
I don't see using closures as sufficient for the load-balancing
question, unless I'm missing something.
> An even better way would be to return a closure to call when you need the > result of the multipl...
Note that c += ... is not threadsafe. You really, really want to
communicate c over a channel to an aggregator.
You can easily implement that without semaphores. Here is some
untested code to do so. Note that I am encapsulating the body of the
loop as an arbitrary function that neither expects to receive nor
return arguments.
John,
Do I understand you right?
You mean that launching N goroutines on X procs is a bad idea because each goroutine may require a ton of memory. You say it would be better to launch fewer goroutines if you know you can't run more than X at a time anyway.
I'd agree to a point. In the current release, Go allocates a big stack with each goroutine even if the goroutine doesn't need one (I think), so there is a memory penalty for using many goroutines even when they don't allocate 10^6 integers. In many cases, the process of allocating the goroutine's stack may be expensive compared to what the goroutine actually _does_. In that regard, it is bad to use goroutines when you don't need them, especially with the current implementation. Agreed.
But your example is a little preposterous...
for i := 0; i < 1e6; i++ {
	data := make([]huge, many);
	initialize(data);
	process(data, i);
	c += reduce(data);
}

No matter how you parallelize the above for-loop, it isn't a very good for-loop. Your proposed solution is to do the allocation only 20 times:

for j := 0; j < 20; j++ {
	data := make([]huge, many);
	initialize(data);
	for i := 0; i < 1e6; i++ {
		process(data);
		c += reduce(data);
	}
}
... but that is still 20x worse than it should be. You are again describing a data-parallel for-loop (the subject of this thread). Any data-parallel computation doesn't require synchronization, which means:

data := make([]huge, many);
initialize(data);
for i := 0; i < 1e6; i++ {
	process(data, i);
}
reduce(data);

... is the proper implementation. I think the mistake that you and others keep making is to assume that Go doesn't allow shared memory. It is easy to assume that goroutines can only communicate through channels, because the golang.org articles would have you believe that. But you can also communicate through closures, which are unsafe, fast, and on the stack. The unsafe part doesn't matter if you are doing data-parallel computation.
By the way, a smart compiler can figure out when a for-loop is actually type-2. Go's compiler isn't that smart yet (I don't think), but I'm sure it will be; if not, I volunteer for the patch! For example:
for _, xi := range array {
	...
}

Since the iteration index is never used, whatever is in '...' can very likely be thrown into a stackless goroutine automatically.
> No matter how you parallelize the above for-loop, it isn't a very good for-loop.
Right, this is what John's code does also. I still hold that it usually isn't what you mean when you write a for-loop.
- from my phone -
On Dec 8, 2009 6:32 PM, "Ben Tilly" <bti...@gmail.com> wrote:
Awesome thanks. I wonder where I got that impression. Sorry for the blunder.
I guess small and big are relative tho. I'd hope for stackless goroutines in some cases. Or maybe it doesn't matter.
Ryanne
- from my phone -
On Dec 8, 2009 7:39 PM, "Ian Lance Taylor" <ia...@google.com> wrote:
2009/12/10 R D <rd1...@gmail.com>:
no, sorry - i was talking about my implementation of start(), which is a race if start() is called outside of the main process.
your code has the unguarded:
kidsInAction++;
[...]
> Well - I think to write any code that is guaranteed to use space
> that's less than proportional to the number of goroutines spawned
> would require careful synchronization,
> it would need to wait for some to finish before spawning more.
that's not unusual.
it's also quite common for goroutines to be started according
to some sporadic event - if the goroutines are short lived
and the events aren't too fast, then they won't accumulate
indefinitely, but there might be many millions over time.
> sem := make(chan bool, maxKids);
this assumes that you know the maximum number
of kids in advance. it's sometimes not possible to do so.
did you look at my concurrent tree example?