Runaway processes with multiple cores

Chris Fonnesbeck

Jun 6, 2012, 12:42:44 PM
to juli...@googlegroups.com
I've run into a problem with multiprocessing in Julia, using yesterday's build from GitHub master. If I start julia with 4 cores, three of the processes continue to run after the computations have finished.

http://cl.ly/2l1Q0p162a3r243z0d3I 

Even closing the julia session leaves them running, and I have to kill them manually. 

Chris Fonnesbeck

Jun 6, 2012, 12:56:06 PM
to juli...@googlegroups.com
Forgot to mention: I'm running on an 11" MacBook Air with OS X 10.7.3.

Miguel Bazdresch

Jun 6, 2012, 1:18:10 PM
to juli...@googlegroups.com
I tried with examples/queens.jl and can't reproduce this, on Ubuntu 10.04, x86_64.

Are you sure the program has finished running? When the "main" core (the one that owns the CLI) finishes execution, you will get the julia prompt back, even if the other cores have not finished executing yet.

-- mb

Chris Fonnesbeck

Jun 6, 2012, 2:06:06 PM
to juli...@googlegroups.com
On Wednesday, June 6, 2012 12:18:10 PM UTC-5, Miguel Bazdresch wrote:
I tried with examples/queens.jl and can't reproduce this, on Ubuntu 10.04, x86_64.

Are you sure the program has finished running? When the "main" core (the one that owns the CLI) finishes execution, you will get the julia prompt back, even if the other cores have not finished executing yet.

-- mb

The calculations have finished: the output from all of the computations has been printed. Also, when I exit the REPL, three julia processes keep running at full load. I have replicated this with two cores as well, in which case one julia-release-basic process continues to run.

Douglas Bates

Jun 6, 2012, 2:40:35 PM
to juli...@googlegroups.com
Are you running the simplegibbs.jl code from the useRshootout repository, Chris?

I just tried that code again myself on an Ubuntu 12.04 system and got some weird behavior.  Sometimes the parallel version apparently ran the sampler but didn't print the timings from @elapsed.

I also got different timings from the Gibbs2 and Gibbs3 samplers than I had with an earlier version of Julia.  The Gibbs2 sampler (using the Rmath gamma and normal samplers but calling them directly with ccall) was faster (8.5 seconds versus 13.6 seconds previously) but Gibbs3 was slower (9.5 seconds versus 6.6 seconds previously).  I hope that Gibbs3 wasn't slowed down by changes I made to randg. :-)

Douglas Bates

Jun 6, 2012, 2:58:23 PM
to juli...@googlegroups.com
On looking more closely at the code, I think that Gibbs3 was slowed down by my changes to randg although, in my defense, they were made because the previous version was wrong in some cases.  I think there is a way of avoiding that slowdown.

Of course, the main issue to address is why the parallel versions are not behaving as expected.
 

Miguel Bazdresch

Jun 6, 2012, 3:09:22 PM
to juli...@googlegroups.com
From my (very limited) experience, the cleanest way to run code in parallel is to wrap function calls in an isinteractive() check, and use @spawn. To clarify, simplegibbs.jl (really, any code meant to be loaded in parallel) should have the following structure:

function dowork(args)
    .....
end

if isinteractive()
    for i in 1:nprocs()
        @spawn dowork(args)
    end
end

or similar.

-- mb
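
For readers on a current Julia release, the structure described above might be sketched as follows. This is an illustration under stated assumptions, not the code from the thread: it relies on the Distributed standard library, which postdates this discussion; it uses a `myid() == 1` guard in place of the `isinteractive()` check; and `dowork` is a hypothetical stand-in workload. Fetching each result makes the main process wait for the workers, and `rmprocs` shuts them down explicitly so that none are left running.

```julia
using Distributed

# Start two local workers if none exist yet (assumption: two are enough for a demo).
nprocs() == 1 && addprocs(2)

# Hypothetical workload, defined on every process so that workers can call it.
@everywhere function dowork(n)
    total = 0.0
    for i in 1:n
        total += sqrt(i)
    end
    return total
end

# Only the main process (id 1) drives the computation.
if myid() == 1
    # One remote task per worker; each call returns a Future immediately.
    futures = [@spawnat w dowork(1_000_000) for w in workers()]
    # fetch blocks until the corresponding worker has finished.
    results = fetch.(futures)
    println("results from workers: ", results)
    # Shut the workers down explicitly instead of leaving them running.
    rmprocs(workers())
end
```

Without the `fetch` step the prompt can return while workers are still busy, which is the situation asked about earlier in the thread.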

Douglas Bates

Jun 6, 2012, 3:38:48 PM
to juli...@googlegroups.com
On Wednesday, June 6, 2012 2:09:22 PM UTC-5, Miguel Bazdresch wrote:
From my (very limited) experience, the cleanest way to run code in parallel is to wrap function calls in an isinteractive() check, and use @spawn. To clarify, simplegibbs.jl (really, any code meant to be loaded in parallel) should have the following structure:

function dowork(args)
    .....
end

if isinteractive()
    for i in 1:nprocs()
        @spawn dowork(args)
    end
end

or similar.
 
The code in question (described at http://dmbates.blogspot.com/) uses distributed arrays, so spawning processes is a side effect of generating the local array slices.

Miguel Bazdresch

Jun 6, 2012, 10:44:08 PM
to juli...@googlegroups.com
I'm going to really go out on a limb here, but my hunch is that each of the four julia processes created by Chris is running each sampler on its own set of arrays. So, you have four sets of distributed arrays.

In other words, the println() lines should be executed only once:
if isinteractive()
## Timings
println("JGibbs1: $([@elapsed JGibbs1(20000, 200) for i=1:10])")
println("JGibbs2: $([@elapsed JGibbs2(20000, 200) for i=1:10])")
println("JGibbs3: $([@elapsed JGibbs3(20000, 200) for i=1:10])")
println("dJGibbs3a: $([@elapsed dJGibbs3a(20000, 200) for i=1:10])")
end
My home machine crashes when running simplegibbs, so I can't test my hunch at the moment, but it may be worth a shot.

-- mb
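
If the hunch above is right, the goal is for the timing lines to execute on exactly one process. A minimal sketch of that idea in current Julia follows, with loud assumptions: the Distributed standard library postdates this thread, `myid() == 1` is used here as the guard instead of `isinteractive()`, and `toy_sampler` is a hypothetical stand-in, not any of the JGibbs samplers.

```julia
using Distributed

# Hypothetical stand-in for a sampler such as JGibbs1 (not the code from the thread).
function toy_sampler(n, thin)
    x = 0.0
    for i in 1:n, j in 1:thin
        x = 0.5 * x + randn()
    end
    return x
end

# Guard the timing code so that only the main process (id 1) runs it,
# even if every process loads this file.
if myid() == 1
    timings = [(@elapsed toy_sampler(2_000, 200)) for i in 1:10]
    println("toy_sampler: $timings")
end
```

Any process other than the main one that loads this file defines `toy_sampler` but skips the timing block, so the measurements are taken and printed once.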

Chris Fonnesbeck

Jun 6, 2012, 10:56:53 PM
to juli...@googlegroups.com
This is the simplegibbs.jl code from our repo. The code ran fine a few weeks ago; the problem began right after I updated Julia to the latest codebase, so I assumed it was caused by a change in Julia rather than a change in your model.

Viral Shah

Jun 10, 2012, 8:16:16 AM
to juli...@googlegroups.com
I guess it is time that we add some parallel tests, starting with simplegibbs.

-viral