configuring multiprocessing?


EBo

Aug 27, 2012, 1:53:23 AM
to Deap users
Sorry if this is a little off topic but...

I'm hunting around for documentation/examples on how to set up Python's
multiprocessing pool so that it runs my DEAP subtasks on Beowulf
node/cores and not on the headnode cores. I still of course want the
main DEAP program to run on the head node and collate the results.

I have not had any luck finding any documentation and/or examples so
far. Any suggestions would be most appreciated.

Thanks and best regards,

EBo --

Marc-André Gardner

Aug 27, 2012, 2:11:58 AM
to deap-...@googlegroups.com
Hi,

Multiprocessing does not seem to be the right tool for your use case. Although the multiprocessing module offers some remote capabilities, the introduction of its documentation clearly states that "the multiprocessing module allows the programmer to fully leverage multiple processors on a given machine." Its main goal is to sidestep the Global Interpreter Lock (GIL), which dramatically reduces the efficiency of threading for CPU-bound work in Python.

In your case, since multiprocessing.Pool cannot dispatch work to remote systems, I can see two solutions:

- Use multiprocessing.Manager and its networking capabilities (http://docs.python.org/library/multiprocessing.html#using-a-remote-manager) to send the data to a remote node, which evaluates it and returns the results through the same interface. However, this means you have to write your own wrapper code to receive the data and return the results on your compute nodes (multiprocessing.Manager only handles the data transfer, not the process spawning or the interpretation of the data).

- Use a more convenient tool, such as Scoop (http://code.google.com/p/scoop/). It is a framework that handles opening communications, spawning processes, routing errors, scaling up to hundreds of cores, and so on, while exposing an API very similar to that of multiprocessing (actually, it is the same as the new concurrent.futures in Python 3, but that is almost the same as the multiprocessing.Pool API). It is true that you have to install some dependencies (in this case, ZeroMQ) to make it work, but I think it is worth it if you plan to do a lot of work with DEAP, as it brings the complexity of remote parallelization down to something a ten-year-old could use. Also, its developers are from the same lab as the team that produces DEAP, so you should not experience integration problems with DEAP. They also have their own mailing list if you have any issues with Scoop, and they provide an example of DEAP integration.
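To make the "very similar API" point concrete, here is a minimal sketch of how the evaluation step can be written so that the map implementation is swappable. The function names `evaluate` and `evaluate_population` and the toy fitness are hypothetical, chosen only for illustration; the SCOOP and DEAP calls appear in comments because they require those packages and a SCOOP launch.

```python
def evaluate(individual):
    """Hypothetical fitness function standing in for one long calibration trial."""
    # DEAP fitness values are tuples, hence the trailing comma.
    return (sum(x * x for x in individual),)


def evaluate_population(map_func, population):
    """Evaluate every individual with whichever map implementation is supplied."""
    return list(map_func(evaluate, population))


if __name__ == "__main__":
    population = [[1.0, 2.0], [0.5, 0.5], [3.0, 0.0]]

    # Serial baseline using the built-in map.
    print(evaluate_population(map, population))

    # On a single machine, multiprocessing.Pool offers the same call shape:
    #     pool = multiprocessing.Pool()
    #     evaluate_population(pool.map, population)
    #
    # Across nodes with SCOOP (script launched via `python -m scoop script.py`):
    #     from scoop import futures
    #     evaluate_population(futures.map, population)
    #
    # In a DEAP program the equivalent swap is usually a one-liner:
    #     toolbox.register("map", futures.map)
```

Because only the map function changes, the evaluation code itself should not need to know whether it runs on one core, one machine, or a cluster.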

In the meantime, if you run into any issues with DEAP itself, do not hesitate to ask.

Marc-André

EBo

Aug 27, 2012, 3:21:00 AM
to deap-...@googlegroups.com
Marc-André,

Thank you for your most informative answer. I will read up on Scoop
and start playing with it as time allows. I looked briefly at
multiprocessing.Manager, but thought I would ask the list before jumping
into that...
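For reference, the remote-manager route would look roughly like the sketch below. It follows the pattern in the "Using a remote manager" section of the multiprocessing documentation, but everything specific here is an assumption for illustration: the `TrialManager` class, the `get_tasks`/`get_results` names, the auth key, and the placeholder "trial". To stay self-contained, the server runs in a background thread on the loopback interface; on a real cluster the server would run on the head node and `run_worker_once` on each compute node, and spawning those workers would still be up to you.

```python
import queue
import threading
from multiprocessing.managers import BaseManager

# Queues owned by the head-node process.
task_queue = queue.Queue()
result_queue = queue.Queue()

AUTHKEY = b"change-me"  # hypothetical shared secret


class TrialManager(BaseManager):
    """Hypothetical manager that exposes the two queues over the network."""


TrialManager.register("get_tasks", callable=lambda: task_queue)
TrialManager.register("get_results", callable=lambda: result_queue)


def start_server(host="127.0.0.1", port=0):
    """Serve the queues from a background thread; return the bound address."""
    manager = TrialManager(address=(host, port), authkey=AUTHKEY)
    server = manager.get_server()
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server.address


def run_worker_once(address):
    """What a compute node would do: fetch one task, evaluate it, return the result."""
    worker = TrialManager(address=address, authkey=AUTHKEY)
    worker.connect()
    params = worker.get_tasks().get(timeout=5)
    worker.get_results().put(sum(params))  # placeholder for the real trial


if __name__ == "__main__":
    address = start_server()
    task_queue.put([1.0, 2.0, 3.0])
    run_worker_once(address)
    print(result_queue.get(timeout=5))
```

The manager only moves the data; starting workers on the nodes, handling their failures, and balancing load are exactly the parts you would have to write yourself.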

Also, DEAP has been working wonderfully for us over the last 8 months.
Soon I should write a paper on what we are doing and why.

This whole exercise was brought about by automating a model's
calibration, which can take tens of thousands of trials, with each
trial creeping up on 1 to 2 hours. As is, DEAP has allowed me to fully
utilize a 24-core system, but I really need to scale that up by a
factor of at least 4, if not 20. There are also serious performance
hits because each trial reads in up to 100 GB of input data. There are
a number of things that can be done to make that more efficient, but
at the moment I am focused on getting it to run on the available
clusters.
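As a rough illustration of what those scale factors mean, the sketch below estimates ideal wall-clock time. The inputs are assumptions, not measurements: 10,000 trials (the low end of "tens of thousands"), 1.5 hours per trial, one trial per core, and perfect scaling with no I/O contention. The function name is hypothetical.

```python
import math


def wall_clock_days(trials, hours_per_trial, cores):
    """Ideal wall-clock time in days, assuming one trial per core and perfect scaling."""
    rounds = math.ceil(trials / cores)  # batches of `cores` simultaneous trials
    return rounds * hours_per_trial / 24.0


if __name__ == "__main__":
    # 24 cores today, then the 4x and 20x scale-ups mentioned above.
    for cores in (24, 24 * 4, 24 * 20):
        days = wall_clock_days(10_000, 1.5, cores)
        print(f"{cores:4d} cores: about {days:.1f} days")
```

In a real GA run the population size per generation caps how many trials can run at once, so actual scaling will be somewhat worse than this estimate.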

As a note, I broke the processing up so that the task returns about 20
parameters back to the originating program, so once the program is
spawned (whether on a separate core or across a network) the
communication is typically 100 to 250 byte blocks at the beginning and
end. So, the message passing part is quite efficient.
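A quick sanity check on that message size, as a sketch assuming the roughly 20 returned parameters are all double-precision floats (the post does not say what types they are). The helper names are hypothetical.

```python
import struct

N_PARAMS = 20  # "about 20 parameters"; the exact count is an assumption


def pack_result(values):
    """Pack a sequence of floats as little-endian 8-byte doubles."""
    return struct.pack(f"<{len(values)}d", *values)


def unpack_result(payload):
    """Inverse of pack_result: recover the list of floats."""
    count = len(payload) // 8
    return list(struct.unpack(f"<{count}d", payload))


if __name__ == "__main__":
    payload = pack_result([0.5] * N_PARAMS)
    # 20 doubles at 8 bytes each.
    print(len(payload), "bytes")
```

Serialization formats such as pickle add some framing overhead on top of the raw doubles, but the total stays tiny compared with hour-long trials.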

Thanks again,

EBo --

EBo

Aug 27, 2012, 9:55:28 AM
to deap-...@googlegroups.com
On Sun, 26 Aug 2012 23:11:58 -0700 (PDT), Marc-André Gardner wrote:
>
> ...
>
> - Use another more convenient tool, like Scoop
> (http://code.google.com/p/scoop/). It is a framework which handles all
> communication opening, process spawning, error routing, scaling up to
> hundreds of cores, etc., by exposing a very similar API than
> multiprocessing (actually, it is the same as the new concurrent.futures
> in Python 3, but it is almost the same as the multiprocessing.Pool API).
> So, it's true that you have to install some tools (in this case, ZeroMQ)
> to make it work, but I think it's worth it if you plan to do a lot of
> work with DEAP, as it clearly brings down the complexity of the remote
> parallelization to a toy usable by a ten years old kid. Also the
> developers are from the same lab than the team who produces DEAP, so you
> will not experience integration problems with DEAP... They also have
> their own mailing list if you have any issue with Scoop, and they
> provide an example of DEAP integration.

Marc-André,

I must say that that was by far the easiest upgrade I have probably
ever done. I still need to build and test it on the cluster, but on my
development machine it took about 10 minutes to upgrade and test.

Thanks again!

EBo --
