Need help writing parallel code with @sync and @everywhere

Daniel Carrera

Jun 17, 2015, 3:23:13 AM
to julia...@googlegroups.com
Hello,

I have been having a lot of trouble figuring out how to write parallel code in Julia. Consider this toy example of a serial program:

    N = 5
    for i = 1:N
        for j = (i + 1):N
            println(i * j)
        end
    end


Now, suppose that "i * j" is an expensive operation, so I want to compute those values in parallel and print them later. Here is my (failed) attempt at doing that:

    N = 5
    tmp = SharedArray(Int, (N))
   
    for i = 1:N
        # ----------------------- #
        # Compute tmp in parallel #
        # ----------------------- #
        @sync begin
            @everywhere begin
                p = myid()
                np = nprocs()
                for j = (i + p):np:N
                    tmp[j] = i * j
                end
            end
        end
       
        # --------------------- #
        # Consume tmp in serial #
        # --------------------- #
        for j = (i + 1):N
            println(tmp[j])
        end
    end



So, my idea is to have a shared array with N elements where I store some of the "i * j" calculations in parallel. Once everyone is finished, I consume the results and repeat the loop. However, I get an error saying that "i" is not visible inside the @everywhere block:

julia> nprocs()
4

julia> foo_parallel()
exception on 1: ERROR: i not defined
 in anonymous at /home/sigrid/Daniel/Science/VENUS/venus.jl:303
 in eval at /build/buildd/julia-0.3.8-docfix/base/sysimg.jl:7
 in anonymous at multi.jl:1310
 in run_work_thunk at multi.jl:621
 in run_work_thunk at multi.jl:630
 in anonymous at task.jl:6
 ... many lines ...



So, apparently the workers in the @everywhere block cannot see outside variables. What can I do? ... One thing I do not want to do is turn "tmp" into an N x N matrix. That would make "tmp" very large for large N, and in my real problem there are other outside variables that I want to use.

Help?

Cheers,
Daniel.

Nils Gudat

Jun 17, 2015, 4:28:37 AM
to julia...@googlegroups.com
I haven't used @everywhere in combination with begin..end blocks, I usually pair @sync with @parallel - see an example here, where I've parallelized the entire nested loop ranging from lines 25 to 47.

Daniel Carrera

Jun 17, 2015, 5:49:58 AM
to julia...@googlegroups.com

On Wednesday, 17 June 2015 10:28:37 UTC+2, Nils Gudat wrote:
I haven't used @everywhere in combination with begin..end blocks, I usually pair @sync with @parallel - see an example here, where I've parallelized the entire nested loop ranging from lines 25 to 47.


Aha! Thanks. Copying your example I was able to produce this:

    N = 5
    tmp = SharedArray(Int, (N))
    
    for i = 1:N
        # Compute tmp in parallel #
        @sync @parallel for j = (i + 1):N
            tmp[j] = i * j
        end
        
        # Consume tmp in serial #
        for j = (i + 1):N
            println(tmp[j])
        end
    end


This seems to work correctly and gives the same answer as the serial code. Can you help me understand how it works? What does "@sync @parallel" do? I feel like I half-understand it, but the concept is not clear in my head.

Thanks.

Daniel.

David Gold

Jun 17, 2015, 8:22:08 AM
to julia...@googlegroups.com
Have you tried macroexpanding the expression? Doing so yields

julia> macroexpand(:( for i = 1:N
                          @sync @parallel for j = (i + 1):N
                              tmp[j] = i * j
                          end
                      end ))

:(for i = 1:N # line 2:
        begin  # task.jl, line 342:
            Base.sync_begin() # line 343:
            #6#v = begin  # multi.jl, line 1487:
                    Base.pfor($(Expr(:localize, :(()->begin  # expr.jl, line 113:
            begin  # multi.jl, line 1460:
                function (#7#lo::Base.Int,#8#hi::Base.Int) # multi.jl, line 1461:
                    for j = (i + 1:N)[#7#lo:#8#hi] # line 1462:
                        begin  # line 3:
                            tmp[j] = i * j
                        end
                    end
                end
            end
        end))),Base.length(i + 1:N))
                end # line 344:
            Base.sync_end() # line 345:
            #6#v
        end
    end)


It looks like @parallel does the work of setting up a properly formatted call to Base.pfor. In particular, it builds an Expr object with head :localize and argument a zero-arg anonymous function, and then passes the interpolation of that expression along with `Base.length(i + 1:N)` to Base.pfor. The body of the anonymous function declares another function with arguments `#7#lo`, `#8#hi`. The latter variables somehow annotate the delimiters of your inner loop, which gets reproduced inside the body of the declared function. I'm *guessing* that the anonymous function is used as a vehicle to pass the code of the annotated inner loop to Base.pfor without executing it beforehand. But I could be wrong.


Then @sync just wraps all the above between calls to `Base.sync_begin` and `Base.sync_end`.


I also should note I have zero experience with Julia's parallel machinery and am entirely unfamiliar with the internals of Base.pfor. I just enjoy trying to figure out macros.

David Gold

Jun 17, 2015, 8:25:56 AM
to julia...@googlegroups.com
Actually, it seems that @sync is also responsible for setting the variable #6#v equal to the return object of the call to Base.pfor and then returning #6#v after calling Base.sync_end().
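
To see the waiting behaviour of `@sync` in isolation, here is a minimal sketch that uses plain `@async` tasks instead of `@parallel`. The task count and the sleep durations are made up for illustration; the only point is that the line after the `@sync` block does not run until every enclosed task has finished.

```julia
results = Int[]

@sync begin
    for k in 1:3
        # Each @async starts a task and returns immediately.
        @async begin
            sleep(0.1 * k)        # stand-in for slow work (illustrative)
            push!(results, k)
        end
    end
end  # @sync blocks here until all three tasks are done

println(length(results))  # prints 3
```

Without the `@sync`, the `println` would most likely run before any task had pushed its value.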

Avik Sengupta

Jun 17, 2015, 8:42:24 AM
to julia...@googlegroups.com

So again, this is an informal description... In particular, my nomenclature is not precise... 

So basically, @parallel is a construct that takes the work to be done in each iteration of a for loop and farms it out to the available remote processors, all at once. This happens asynchronously, which means that all these jobs are started without waiting for any of them to finish. You then want to wait for all the jobs to complete before going on to the "Consume tmp" stage. Hence you put an @sync around this, to wait for all the parallel tasks to complete.
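
As a rough sketch of the same start-then-wait pattern in current Julia (1.x): there the macro is spelled `@distributed` and lives in the `Distributed` standard library, and shared arrays come from `SharedArrays`. The function name `products` is hypothetical, and the sketch assumes everything runs on one machine. With no workers added, the loop simply runs on process 1.

```julia
using Distributed, SharedArrays

# Assumption: for real parallelism, call addprocs(3) and then
# `@everywhere using SharedArrays` before the code below.

function products(i, N)
    tmp = SharedArray{Int}(N)
    # Without a reducer, @distributed returns right away;
    # @sync blocks until every chunk of the loop has finished.
    @sync @distributed for j = (i + 1):N
        tmp[j] = i * j
    end
    # Safe to read tmp here, because all writes are complete.
    return [tmp[j] for j = (i + 1):N]
end

println(products(2, 5))  # prints [6, 8, 10]
```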

Hope this makes it a little more understandable. I realise this does not help in designing a parallel system from scratch, but that is a much longer story. 

Note that with "tmp" being a shared array, this code will work only when all julia processes are in a single physical machine. 

Also, the @parallel construct is most useful when you combine a reduction operator with the for loop. 
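
A minimal sketch of the reduction form, written for current Julia (1.x), where the macro is named `@distributed`. The function name `sum_products` is hypothetical. With a reducer such as `(+)`, the value of the last expression in each iteration is combined across iterations, and the call blocks until the final result is available, so no `@sync` and no shared array are needed.

```julia
using Distributed

# Assumption: call addprocs(...) first for real parallelism;
# without workers the loop runs on process 1.

function sum_products(i, N)
    # Each iteration yields i * j; the (+) reducer sums the values.
    @distributed (+) for j = (i + 1):N
        i * j
    end
end

println(sum_products(2, 5))  # 6 + 8 + 10, prints 24
```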

Hope this helps
-
Avik

Daniel Carrera

Jun 17, 2015, 8:44:05 AM
to julia...@googlegroups.com
Thanks. I didn't know about macroexpand(). To me macros often feel like black magic.
--
When an engineer says that something can't be done, it's a code phrase that means it's not fun to do.

Daniel Carrera

Jun 17, 2015, 8:44:53 AM
to julia...@googlegroups.com
Wait.... #6#v is the name of a variable? How is that possible?

David Gold

Jun 17, 2015, 9:12:18 AM
to julia...@googlegroups.com
My wording above is not entirely accurate, since you yourself can't set `#6#v` as a variable. It's a product of how local variable names are parsed when macros are expanded:

julia> macro foo()
           quote
               v = 5
               return v
           end
       end

julia> macroexpand(:( @foo ))
quote  # none, line 3:
    #15#v = 5 # line 4:
    return #15#v
end


This prevents variables declared within the body of a macro definition from interacting with identically named variables declared outside the definition. Here is the definition of `@sync`, which is very similar to the toy example above: https://github.com/JuliaLang/julia/blob/e97588db65f590d473e7fbbb127f30c01ea94995/base/task.jl#L340 
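
To make the effect of that renaming concrete, here is a small sketch with two hypothetical macros, `@set_v_hygienic` and `@set_v_escaped`. The first assigns to `v` in the usual hygienic way, so the caller's `v` is untouched; the second wraps its body in `esc()`, which switches the renaming off, so the assignment reaches the caller's `v`.

```julia
macro set_v_hygienic()
    quote
        v = 5          # renamed on expansion, so this is not the caller's v
    end
end

macro set_v_escaped()
    esc(quote
        v = 5          # esc() disables the renaming: this is the caller's v
    end)
end

v = 1
@set_v_hygienic
after_hygienic = v     # still 1

@set_v_escaped
after_escaped = v      # now 5

println((after_hygienic, after_escaped))  # prints (1, 5)
```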

`gensym()` serves a similar purpose when one is building expressions within the body of a regular function:

julia> v = gensym("v")
symbol("##v#8970")

julia> v
symbol("##v#8970")
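
As an illustration of that use, here is a sketch of a regular function that builds a swap expression. The function name `make_swap` is hypothetical. The temporary variable gets its name from `gensym`, so it cannot collide with any variable the caller already has, even one called `tmp`.

```julia
function make_swap(a::Symbol, b::Symbol)
    tmp = gensym("tmp")    # unique symbol for the temporary
    quote
        $tmp = $a
        $a = $b
        $b = $tmp
    end
end

x, y = 1, 2
tmp = "untouched"          # a caller variable with the "obvious" temp name

eval(make_swap(:x, :y))    # evaluates the generated code at global scope

println((x, y, tmp))       # prints (2, 1, "untouched")
```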

