parallel testing of sympy with ipython


Ondrej Certik

Mar 19, 2009, 3:52:15 AM
to IPython Development list, sy...@googlegroups.com
Hi,

Brian helped me fix my problems with ipython, and it was then super
easy to create parallel testing for sympy. It's currently in my branch
here:

http://github.com/certik/sympy/tree/test2

E.g., start the cluster:

$ ipcluster local -n 8

Install sympy:

$ git clone git://github.com/certik/sympy.git
$ cd sympy
$ git checkout -b test2 origin/test2
$ python setup.py install --home=~/lib

(make sure ~/lib/lib/python is in your PYTHONPATH)

Test in parallel:

$ python t.py
number of tests: 1376
distributing jobs
collecting results
processor: 0
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . F . . . . .
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
. . . . . . . . . . . . . . . . . . . . . . . . . . exceptions
________________________________________________________________________________
File "/home/ondrej/lib/lib/python/sympy/mpmath/tests/test_basic_ops.py",
line 115, in test_complex_misc
assert mpc(2+1e-15j).ae(2)
File "/home/ondrej/lib/lib/python/sympy/mpmath/tests/test_basic_ops.py",
line 115, in test_complex_misc
assert mpc(2+1e-15j).ae(2)


processor: 1
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
. . . . . . . . . . . . . . . . . . . . . . . . . . . exceptions
processor: 2
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . exceptions
processor: 3
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
. . . . . . . . . . . . . . . . . . . . . . . F . . . . . exceptions
________________________________________________________________________________
File "/home/ondrej/lib/lib/python/sympy/utilities/tests/test_lambdify.py",
line 97, in test_mpmath_lambda
assert -prec < f(mpmath.mpf("0.2")) - sin02 < prec
File "/home/ondrej/lib/lib/python/sympy/utilities/tests/test_lambdify.py",
line 97, in test_mpmath_lambda
assert -prec < f(mpmath.mpf("0.2")) - sin02 < prec


processor: 4
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . exceptions
processor: 5
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
. . . . . . . . . . . . . . . . . . . . . . . . . . . . exceptions
processor: 6
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
. . . . . . . . . . . . . . . . . . . . . . . . . . . exceptions
processor: 7
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
. . . . . . . . . . . . . . . . . . . . . . . . . . . . exceptions

All tests pass except for 4 mpmath tests, but I suspect those could be
due to some bug in mpmath.

Compare the timing:

Sequential:

$ time bin/test sympy
[...]
real 1m4.295s
user 1m3.872s
sys 0m0.332s

Parallel:

$ time python t.py
[...]
real 0m10.206s
user 0m0.308s
sys 0m0.064s

And this doesn't do any load balancing yet, so in the late phase (the
last 2s) it is basically computing on one processor only. I am looking
forward to running this on our UNR cluster; I guess all sympy tests
could finish in just a couple of seconds.

Now some thoughts about load balancing:

It starts here:

http://github.com/certik/sympy/blob/55de47b6f0a7bee01249fc24c03e5567695c4569/t.py#L50

I create a tasks dictionary with processor ids as keys and lists of
sympy test numbers as values (the tests to be executed on that
processor), where one test is just one test function in a file (there
are almost 1400 tests).
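The tasks-dictionary step described above can be sketched like this (the function and variable names are my own, not the actual code in t.py): test indices are dealt out round-robin, so each processor gets a roughly equal share.

```python
def split_tests(n_tests, n_procs):
    """Map each processor id to the list of test indices it should run."""
    tasks = {p: [] for p in range(n_procs)}
    for i in range(n_tests):
        tasks[i % n_procs].append(i)  # deal tests out round-robin
    return tasks

# 1376 tests over 8 engines -> 172 tests per processor
tasks = split_tests(1376, 8)
print(len(tasks[0]))  # 172
```

With no load balancing, a processor that drew the slow tests becomes the straggler, which is exactly the "last 2s on one processor" effect mentioned above.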

Then I distribute the tests to the processors here, using the
nonblocking mec.execute:

http://github.com/certik/sympy/blob/55de47b6f0a7bee01249fc24c03e5567695c4569/t.py#L64

And finally I collect the results:

http://github.com/certik/sympy/blob/55de47b6f0a7bee01249fc24c03e5567695c4569/t.py#L69

and report the results to the user. This should of course be
integrated into our testing framework, so that it looks exactly the
same. The goal should be that one would just use

bin/test

or

bin/test -j8

but otherwise the output doesn't change; it will just be 8x faster.
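The distribute-then-collect pattern described above can be illustrated with the standard library's concurrent.futures as a stand-in for the nonblocking mec.execute call (helper names are hypothetical; running a real ipcluster is beyond a short sketch):

```python
from concurrent.futures import ThreadPoolExecutor

def run_tests(test_ids):
    # Stand-in for "run these sympy tests on one engine";
    # returns (test_id, passed) pairs.
    return [(i, True) for i in test_ids]

def parallel_test(tasks):
    with ThreadPoolExecutor(max_workers=len(tasks)) as pool:
        # "distributing jobs": submit returns immediately (nonblocking)
        futures = {p: pool.submit(run_tests, ids) for p, ids in tasks.items()}
        # "collecting results": .result() blocks until each worker finishes
        return {p: f.result() for p, f in futures.items()}

results = parallel_test({0: [0, 1, 2], 1: [3, 4]})
print(results[0])  # [(0, True), (1, True), (2, True)]
```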

Ondrej

Ondrej Certik

Mar 19, 2009, 3:56:44 AM
to IPython Development list, sy...@googlegroups.com
> Now some thoughts about load balancing:

[...]

> And finally I collect the results:
>
> http://github.com/certik/sympy/blob/55de47b6f0a7bee01249fc24c03e5567695c4569/t.py#L69
>
> and report the results to the user. This should of course be

I forgot to ask the question:

how could this be made more efficient using some native ipython parallel tools?

Ondrej

Ondrej Certik

Mar 19, 2009, 4:09:36 AM
to IPython Development list, sy...@googlegroups.com
On Thu, Mar 19, 2009 at 12:56 AM, Ondrej Certik <ond...@certik.cz> wrote:
[...]

> I forgot to ask the question:
>
> how could this be made more efficient using some native ipython parallel tools?

And last question:

Currently all the engines must be able to import sympy, and I noticed
it is not even enough to reinstall sympy after each change; the
ipcluster must be restarted as well, otherwise the old version of
sympy stays in memory and the tests fail.

What is the best way to transfer the library (sympy) and tests to the
engines automatically to make the edit/run/debug cycle easier?

And what about a big cluster --- wouldn't it be handy to just tell
ipython: take this library and make it available on the engines,
without requiring me to install it manually and restart the
ipcluster?

Ondrej

Fredrik Johansson

Mar 19, 2009, 4:27:11 AM
to sy...@googlegroups.com
On Thu, Mar 19, 2009 at 8:52 AM, Ondrej Certik <ond...@certik.cz> wrote:
> All tests pass, except 4 mpmath tests, but I suspect it could be some
> bug in mpmath.

I suspect this is just due to tests being run out of the expected
order. Probably, adding "mp.dps = 15" at the top of each failing test
will fix it. I should add a test decorator that does this
automatically.
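A sketch of the decorator Fredrik describes: reset the working precision before each test so test order doesn't matter. Note that `mp` below is a stand-in object, since the real mpmath precision context may differ; only the decorator pattern is the point.

```python
import functools

class MPContext:
    dps = 15  # decimal places of working precision

mp = MPContext()  # stand-in for mpmath's global precision state

def with_default_precision(test_func):
    """Run test_func with mp.dps reset to the default first."""
    @functools.wraps(test_func)
    def wrapper(*args, **kwargs):
        mp.dps = 15  # the "mp.dps = 15 at the top of each test" fix
        return test_func(*args, **kwargs)
    return wrapper

@with_default_precision
def test_something():
    return mp.dps

mp.dps = 30            # a previous test left high precision behind
print(test_something())  # 15 -- the decorator restored the default
```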

> Compare the timing:

Nice!

Fredrik

Ondrej Certik

Mar 19, 2009, 5:14:55 AM
to sy...@googlegroups.com

Yes, this is exactly it. Excellent, so the fix is easy.

Btw, in fact there were also a couple of failures in secondquant.py,
which I fixed with this patch:

http://code.google.com/p/sympy/issues/detail?id=1331

Please review.

Ondrej

Vinzent Steinberg

Mar 19, 2009, 9:47:14 AM
to sympy
You can tell Python to import/reload a module using an explicit path,
for example:

module = __import__(name)

So ideally you only pass the path to the engines.
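Vinzent's suggestion in practice: an engine can import a module given only a path, with no install step. A minimal runnable demonstration (the throwaway module `mymod` is invented here just for illustration):

```python
import os
import sys
import tempfile

# Create a throwaway module, standing in for the sympy checkout
# whose path would be sent to each engine.
libdir = tempfile.mkdtemp()
with open(os.path.join(libdir, "mymod.py"), "w") as f:
    f.write("answer = 42\n")

sys.path.insert(0, libdir)     # what each engine does with the passed path
module = __import__("mymod")   # the call from the message above
print(module.answer)           # 42
```

After editing the source, re-importing via `reload(module)` (Python 2) picks up the change without restarting the engine, which addresses the edit/run/debug-cycle question above.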

Vinzent

Ondrej Certik

Mar 19, 2009, 11:43:37 AM
to sy...@googlegroups.com
>> And what about a big cluster --- wouldn't it be handy to just tell
>> ipython: take this library and make it available on the engines,
>> without me requiring to install it manually and then restarting the
>> ipcluster?
>
> You can tell Python to import/reload a module using an explicit path,
> for example:
>
> module = __import__(name)
>
> So ideally you only pass the path to the engines.

You are right, that should actually work. Or we can set sys.path.

Ondrej

Brian Granger

Mar 19, 2009, 1:29:19 PM
to sy...@googlegroups.com
Yes, we do have built-in dynamic load balancing. Check out the following:

client.TaskClient
client.StringTask
client.MapTask

All of this is dynamically load balanced. Here is how I would approach this:

1. Use MultiEngineClient to setup all the basic imports and variables
that all the engines need.

2. Create a TaskClient and then submit many StringTasks/MapTasks to
the TaskClient.

The only thing to note is that, currently, the TaskClient has slightly
more overhead than the MultiEngineClient, so you want to make sure
that each task lasts long enough to be worth it. If your tasks are
really short and you are not seeing good speedup, you may want each
StringTask/MapTask instance to handle a small batch of actual tasks.
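The batching advice above amounts to chunking the test list before submission, so each task amortizes the TaskClient overhead over many small tests. A minimal chunking helper (the names and batch size are my own, not IPython API):

```python
def chunked(items, size):
    """Split items into consecutive batches of at most `size` elements."""
    return [items[i:i + size] for i in range(0, len(items), size)]

# ~1400 tests, 50 per StringTask/MapTask -> 28 tasks to load-balance
batches = chunked(list(range(1400)), 50)
print(len(batches))  # 28
```

Each batch would then become the payload of one task, and the scheduler load-balances the 28 batches instead of 1400 tiny jobs.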

But all of this should "Just Work"

Right now the best documentation for this is in the doc strings:

http://ipython.scipy.org/doc/nightly/html/api/generated/IPython.kernel.taskclient.html

http://ipython.scipy.org/doc/nightly/html/api/generated/IPython.kernel.task.html#IPython.kernel.task.StringTask

http://ipython.scipy.org/doc/nightly/html/api/generated/IPython.kernel.task.html#IPython.kernel.task.MapTask

If these task types don't work for you, it is also possible to define
new task types.

Cheers,

Brian

On Thu, Mar 19, 2009 at 12:56 AM, Ondrej Certik <ond...@certik.cz> wrote:

Ondrej Certik

Mar 19, 2009, 3:58:01 PM
to sy...@googlegroups.com
On Thu, Mar 19, 2009 at 10:29 AM, Brian Granger <elliso...@gmail.com> wrote:
[...]

> The only thing to note is that currently, the TaskClient has slightly
> more overhead than the MultiEngineClient, so you want to make sure
> that each task lasts long enough to be worth it.  If your tasks are
> really short and you are not seeing good speedup, you may want each
> StringTask/MapTask instance to handle a small batch of actual tasks.

Thanks Brian, that should do it. Yes, we have ~1400 small tests, so
I can group them into larger batches as you suggested.

One more beginner question, as I am new to cluster computing. :) After
I install all the necessary dependencies on our UNR cluster and then
submit a job that executes ipcluster (using mpirun), I will have an
ipcluster running. To connect to it, I can submit another job that
runs my script (e.g. it imports ipython and all should be fine). But
if I wanted to deal with the ipcluster interactively from an ipython
session --- do you think there is some way? That would really rock. I
guess it depends on how the cluster is built, which I won't know until
I try it. I thought that maybe if I ran ipython on the master node it
could interact with the ipcluster --- but maybe that is forbidden, in
which case interacting with it will not be possible. When you run
ipython on some big cluster, are you able to actually interact with
it?

Ondrej

Brian Granger

Mar 19, 2009, 4:13:18 PM
to sy...@googlegroups.com
Ondrej,

I need to take my kids somewhere, but the answer to your question is
*yes, absolutely*: you can interact with a controller and engines
running on a cluster somewhere from an ipython session on your laptop
sitting in Starbucks. This is *the* main focus of our parallel stuff
in ipython. More details to follow later.

Cheers,

Brian

Ondrej Certik

Mar 22, 2009, 2:01:37 AM
to sy...@googlegroups.com
On Thu, Mar 19, 2009 at 1:13 PM, Brian Granger <elliso...@gmail.com> wrote:
>
> Ondrej,
>
> I need to take my kids somewhere but the answer to your question is
> *yes absolutely*, you can interact with a controller and engine
> running on a cluster somewhere from an ipython session on your laptop
> sitting in starbucks.  This is *the* main focus of our parallel stuff
> in ipython.  More details to follow later.

I figured out the rest of the details:

http://mail.scipy.org/pipermail/ipython-dev/2009-March/005065.html

And indeed, it absolutely rocks. It just hadn't occurred to me that I
could play with the whole cluster interactively.

Ondrej
