parallel processing problems

Westley Hennigh

Apr 25, 2013, 2:41:27 PM
to juli...@googlegroups.com
I think I have an example that pretty clearly illustrates a problem I've been having with parallel computing. I need to get this working, so I'm going to be trying to figure out what's going on, but I would appreciate help from anyone who is similarly stuck :)

Start with an array of 100,000 random numbers.
a = rand(100000)

Cool. Now, on one core, I can blast through those and scale them to the range 0 to 100.
julia> @elapsed for x in a
         x = x * 100
       end
0.017558935

Just for sanity's sake, you can add a print statement in the loop if you want; that way you know it's not getting optimized out.
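For reference, here's a variant that writes the scaled values back into the array, so there's definitely nothing to optimize away (just a sketch, and it mutates a, so it isn't what I timed above):

@elapsed for i = 1:length(a)
    a[i] *= 100   # store the result so the work can't be elided
end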
Anyway, what happens when I add a second core? Let's do a pmap and compare.
julia> addprocs(1)
:ok

julia> @elapsed pmap((x)->x * 100, a)
19.985164725

What the heck? "Well," you say, "maybe 100,000 numbers isn't enough to see gains from using multiple cores."
Maybe so, but that's not the primary source of my consternation. Consider that I can generate a random array on core 2 and pull it over to core 1 lickety-split.
julia> @elapsed fetch(@spawnat 2 rand(100000))
0.425517213

Also, in an effort to compare apples to apples, we can run pmap with only one core.
julia> @elapsed pmap((x)->x * 100, a)
3.844994459

A little overhead, but not too bad.
I haven't done enough testing to say whether it's just pmap; I figured I would post first and see if anyone has any insight.

Thanks!




Jameson Nash

Apr 25, 2013, 3:05:15 PM
to juli...@googlegroups.com
It's not a matter of the size of the problem; it's an issue of breaking the problem up very poorly. With only one worker process, the second example is still doing the whole thing in serial, but now it also has to transmit and receive the code and data for each function call, instead of just multiplying the numbers.
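To give a sense of what a better breakup looks like, here is a rough sketch (the chunk size of 10,000 is arbitrary): hand pmap a few large slices instead of 100,000 one-element tasks, so each message carries real work rather than a single multiplication.

# split a into ten contiguous slices and scale each slice remotely
chunks = [a[i:min(i+9999, length(a))] for i = 1:10000:length(a)]
pmap(chunk -> chunk * 100, chunks)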

pmap is intended to be used for a small to moderate number of computationally expensive operations (as seen by the low overhead of combining everything in one fetch operation). There is an @parallel for-loop reduction construct for doing map-reduce style operations.
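For example, something like this (a sketch; the (+) just sums the scaled values so there is a reduction to perform):

s = @parallel (+) for i = 1:length(a)
        a[i] * 100   # each worker handles a block of iterations; results are summed
    end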

Alternatively, there is DArray for doing this example without any special code other than vector operations and @spawn.
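Roughly (a sketch, assuming drand and elementwise scaling by a scalar are available for DArray):

d = drand(100000)   # the random data is created in pieces on the workers
d2 = d * 100        # the scaling runs where each piece lives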

Adam Savitzky

Apr 25, 2013, 8:18:39 PM
to juli...@googlegroups.com
It seems like there is something fundamentally wrong with the way parallel computing is working right now. I can see things being a little slower because of overhead, but this is off by orders of magnitude.

Westley Hennigh

May 3, 2013, 12:06:08 AM
to juli...@googlegroups.com
Thanks Jameson, you're totally right; sorry for the trouble. Each different thing I tried was broken in one way or another, and I became overly suspicious.