[pymc] Parallelizing evaluation of deterministics?

John D. Chodera

May 15, 2010, 2:41:22 PM5/15/10
to PyMC
Hello!

First off, let me thank the PyMC developers and community for
producing such a wonderfully powerful tool. I've made use of it in
several projects already, and it's simplified my life greatly!

Now, a question:

I have a model consisting of ~ 1000 deterministics that depend on ~ 20
stochastics sampled by MCMC. These deterministics feed into some
observed stochastics through a normal error model. The problem is
that computing each deterministic for a new set of parent stochastics
is a nontrivial expense, consuming a few seconds of CPU time. As a
result, every iteration takes several-to-many minutes.

Would it be possible for me to make some relatively small changes that
distribute the updating of just the deterministics over multiple
processor cores? For example, I was considering the use of the
'multiprocessing' or 'parallel python' packages to do this.
Relatively little information is needed to evaluate the deterministic
-- just the parent stochastics and a small amount of auxiliary
information (different for each deterministic).
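For concreteness, here is roughly the kind of thing I have in mind -- an untested sketch using 'multiprocessing', where the model evaluation and auxiliary data are just placeholders standing in for the real per-deterministic computation:

    from multiprocessing import Pool
    import numpy as np

    def evaluate_one(task):
        # Placeholder for one expensive deterministic: a few seconds of work
        # that depends on the shared parent stochastics plus a small amount of
        # per-deterministic auxiliary data.
        stochastics, aux = task
        return np.dot(stochastics, aux)

    if __name__ == '__main__':
        stochastics = np.random.randn(20)                      # ~20 parent stochastics
        aux_data = [np.random.randn(20) for _ in range(1000)]  # one aux array per deterministic
        pool = Pool()                                          # one worker per core by default
        values = pool.map(evaluate_one, [(stochastics, a) for a in aux_data])
        pool.close()
        pool.join()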

I've poked around the code to see where the updating of the
deterministics is done. The model is contained in a dict() object, so
I'm guessing this updating happens during the calls to get_value() for
the associated container, but the looping over stored elements to call
get_value -- the part I presumably want to parallelize -- appears to
occur in DCValue.run() in Container_values.pyx. Is that correct? Has
anyone experimented with this sort of thing previously?

Thanks!

John

Chris Fonnesbeck

May 15, 2010, 10:44:29 PM5/15/10
to PyMC
Hi John,

Thank you for the feedback. One of our developers has begun looking at
at least one approach to parallelization, though I am not aware of the
current state of development in this regard. My recommendation to you,
if you want to tinker, is to go onto GitHub and create a fork of PyMC
of your own; it's really easy to do this on GitHub, if you are not
already familiar with the system. Then, if you end up doing something
particularly cool that we want to integrate into the main branch, we
can do this with minimal fuss.

Regards,
Chris

John D. Chodera

May 15, 2010, 11:19:56 PM5/15/10
to PyMC
Thanks, Chris!

Forking should be straightforward; I'm currently a bit more concerned
with figuring out exactly which part of the code I need to target for
parallelization, and whether this requires replacing part of a Pyrex
module (e.g. Container_values.pyx) with a pure Python module to take
advantage of 'multiprocessing' or 'parallel python'. I'm hoping
someone has a bit more information on which part of the PyMC codebase
I should look at in detail.

Thanks!

John

David Huard

May 16, 2010, 9:05:48 PM5/16/10
to py...@googlegroups.com
Hi John,

Anand and I have looked at parallelizing the code from two different
ends: Anand tried to parallelize at the log-probability level, and I at
the sampler level. For what you want to do, I think Anand's experience
will be more relevant, that is, splitting the set of log-probability
computations and scattering them across processes. I got a basic
parallel sampler working in the sandbox, but the database support is
incomplete.

Is it possible for you to merge all the deterministics into a vector and
compute its value in a single call? This might reduce the overhead.
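Something like this, roughly -- an untested sketch in PyMC 2 syntax, where the model function and auxiliary data are placeholders for whatever you are actually computing:

    import numpy as np
    import pymc

    theta = pymc.Normal('theta', mu=0.0, tau=1.0, size=20)       # the ~20 parent stochastics
    aux = [np.random.randn(20) for _ in range(1000)]              # per-deterministic auxiliary data

    @pymc.deterministic
    def predictions(theta=theta):
        # One vector-valued deterministic in place of ~1000 scalar ones;
        # np.dot stands in for the real few-seconds-per-element computation.
        return np.array([np.dot(theta, a) for a in aux])

    obs = pymc.Normal('obs', mu=predictions, tau=1.0,
                      value=np.zeros(1000), observed=True)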

In any case, please go ahead and give it a go, I think everyone here
would be thrilled if someone could get that done one way or another.

Cheers,

David

Anand Patil

May 17, 2010, 4:48:32 AM5/17/10
to py...@googlegroups.com
Hi all,

If each of the deterministics is taking several seconds, there's a very good chance you'll see a speedup by parallelizing them. However, you can probably get away with changing the deterministics' internal functions rather than the entire object model.

If you can write the functions in Fortran or C, then you can use http://github.com/pymc-devs/pymc/blob/master/pymc/gp/cov_funs/brownian.py as a template. The function fills in the entries of a matrix in stripes, and each stripe is handled by a different thread. Because the subroutine 'brownian' in http://github.com/pymc-devs/pymc/blob/master/pymc/gp/cov_funs/isotropic_cov_funs.f uses 'cf2py threadsafe', each thread releases the GIL on entry into Fortran and re-acquires it on exit. That means the threads can actually do their Fortran work at the same time.
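In pure-Python terms the pattern looks roughly like this. The 'fill_stripe' function below is only a stand-in for an f2py-wrapped routine whose source carries 'cf2py threadsafe'; with the real compiled routine the stripes genuinely run concurrently because the GIL is released, whereas this placeholder just shows the call structure:

    import threading
    import numpy as np

    def fill_stripe(out, theta, lo, hi):
        # Stand-in for the compiled, GIL-releasing routine; fills rows lo..hi.
        for i in range(lo, hi):
            out[i] = np.cos(i * theta)

    def parallel_fill(theta, n_rows, n_threads=4):
        out = np.empty((n_rows, len(theta)))
        bounds = np.linspace(0, n_rows, n_threads + 1).astype(int)
        threads = [threading.Thread(target=fill_stripe,
                                    args=(out, theta, bounds[i], bounds[i + 1]))
                   for i in range(n_threads)]
        for t in threads:
            t.start()
        for t in threads:
            t.join()
        return out

    result = parallel_fill(np.random.randn(20), 1000)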

Even if you can't write the internal functions in Fortran or C, you can probably mimic the same setup using processes in place of threads.

These relatively simple strategies work because your situation is bread-and-butter loop-level parallelism. I had previously tried to come up with something that would distribute any old analysis across multiple cores, which did require changing the object model. Unfortunately, it didn't work well because of all the overhead.

Anand

Anand Patil

May 17, 2010, 4:54:05 AM5/17/10
to py...@googlegroups.com
Oh, and the parallelization would probably have to be done after implementing David's suggestion of aggregating the deterministics. So you'd take the n deterministics that each return scalars and combine them into a single deterministic that returns a vector, and then you could think about distributing the n independent internal tasks.
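Concretely, something along these lines -- an untested sketch where the task function and auxiliary data are placeholders, and where on platforms that spawn rather than fork you would want to guard the pool creation with an "if __name__ == '__main__'" block:

    import numpy as np
    import pymc
    from multiprocessing import Pool

    aux = [np.random.randn(20) for _ in range(1000)]    # per-task auxiliary data (placeholder)

    def one_task(args):
        # Placeholder for one of the n independent few-seconds computations.
        theta, a = args
        return np.dot(theta, a)

    pool = Pool()                                        # created once, reused at every evaluation

    theta = pymc.Normal('theta', mu=0.0, tau=1.0, size=20)

    @pymc.deterministic
    def predictions(theta=theta):
        # A single vector-valued deterministic whose internal function
        # scatters the n independent tasks across the worker processes.
        return np.array(pool.map(one_task, [(theta, a) for a in aux]))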

Anand