Parallel Processing


Alan Bromborsky

Apr 25, 2014, 12:13:01 PM
to sympy
Has anyone had any experience using Parallel Python with SymPy? It
seems to me that there are probably a lot of loops with independent
operations.

F. B.

Apr 27, 2014, 6:48:58 AM
to sy...@googlegroups.com
I am not much of an expert on parallel processing in Python, but since no one else is answering, and supposing you mean multithreading, I'll point out the problem with multithreading in Python.

Python's main implementation, CPython, has a Global Interpreter Lock (GIL), which prevents more than one thread from executing Python bytecode at a time. When you use modules such as threading, CPython still runs only one thread at any given moment, interleaving them. I don't believe you will get any speedup; that threading support is more of a control-flow mechanism for the code.

http://en.wikipedia.org/wiki/Global_Interpreter_Lock

Multiprocessing can fork an independent process onto another CPU core, but unfortunately a new process gets a copy of the parent's memory space, which may be costly. IPython supports distributing computations across a cluster of many computers.

PyPy should have better support for multithreading, as far as I know.
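To make the multiprocessing route concrete, here is a minimal sketch using the standard-library multiprocessing.Pool; count_partitions is a hypothetical CPU-bound function standing in for an independent symbolic operation, not anything from SymPy itself:

```python
from multiprocessing import Pool

def count_partitions(n):
    """CPU-bound stand-in for an independent symbolic operation:
    counts the integer partitions of n by dynamic programming."""
    table = [1] + [0] * n
    for k in range(1, n + 1):       # allow parts of size k
        for i in range(k, n + 1):
            table[i] += table[i - k]
    return table[n]

if __name__ == "__main__":
    with Pool(processes=4) as pool:
        # Each input is handled in a separate worker process,
        # so the GIL in one worker does not block the others.
        results = pool.map(count_partitions, [10, 20, 30])
    print(results)  # [42, 627, 5604]
```

The inputs and results are pickled across the process boundary, which is exactly the cloning cost mentioned above; for work units that take much longer than the pickling, it pays off.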

Alan Bromborsky

Apr 27, 2014, 8:01:43 AM
to sy...@googlegroups.com
No, I mean Parallel Python (http://www.parallelpython.com/), which gets around the GIL problem.
--
You received this message because you are subscribed to the Google Groups "sympy" group.
To unsubscribe from this group and stop receiving emails from it, send an email to sympy+un...@googlegroups.com.
To post to this group, send email to sy...@googlegroups.com.
Visit this group at http://groups.google.com/group/sympy.
To view this discussion on the web visit https://groups.google.com/d/msgid/sympy/455f0bb6-5e0e-433c-81af-d68ab1da41ad%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Aaron Meurer

Apr 27, 2014, 8:37:25 PM
to sy...@googlegroups.com
I've never heard of that before. I guess you could try importing SymPy
and running its test suite under it; that will tell you what works and
what doesn't, at least as a start.

Aaron Meurer

Stefan Krastanov

Apr 29, 2014, 2:29:00 AM
to sy...@googlegroups.com
There is also the IPython cluster, which seems quite similar to Parallel Python.

Joachim Durchholz

Apr 29, 2014, 3:13:47 AM
to sy...@googlegroups.com
I'm unsure whether it's possible.

Multithreading becomes problematic as soon as you have shared updatable
data structures.

Parallelizing stuff within a SymPy computation would require explicitly
forking off threads in the various algorithms; I do not think SymPy does
that.
I'm not sure whether SymPy's expression transformation algorithms do
in-place updates or construct new trees. If it's the former, any attempt
at parallelization will crash horribly.

A potentially huge, potentially nonexistent problem is SymPy's value
caches. SymPy creates a considerable number of precalculated constants -
Pi, a prime sieve, known integrals, that kind of stuff. Creation happens
on demand; I do not know what happens if two threads try to create the
same constant at the same time. Possible outcomes:
- Just duplicate effort, everything works.
- Duplicate effort and the constant is stored twice (some constants are
large, so that's not so good but workable - this MIGHT break code that
compares against these constants by identity though).
- Things crash horribly (but at least you know that it failed).
- Silent errors, wrong results.

Since SymPy is CPU-bound, Python implementations that use the global
interpreter lock might see only a minimal speedup (thread A continues
while thread B waits for a file operation), so it's not worth it.
Other Python implementations might work better. Still, the common caches
are going to create lock contention between threads, at least initially
while SymPy is still creating constants; later on, when the constants
are read-only, this should become a nonissue, so use large-ish problems
when benchmarking.

(Sorry for jumping in so late, I missed the original question.)

Alan Bromborsky

Apr 29, 2014, 7:53:30 AM
to sy...@googlegroups.com
Would that be true for basic algebraic manipulation (the core),
simplification, and calculating derivatives?

Ondřej Čertík

Apr 29, 2014, 11:13:54 PM
to sympy
Parallelization is another area where CSymPy can make a difference ---
it's pure C++, so there are no issues with Python's GIL. So the underlying
basic algebraic manipulation (the core) can be parallelized.

If anybody is interested in trying this out, send us PRs, or ask for
help with this.
I suspect some of the algorithms are memory bound, so there might not be
much speedup, but I am sure there are many algorithms that can be parallelized
easily.

Ondrej


Vinzent Steinberg

Apr 30, 2014, 1:13:41 PM
to sy...@googlegroups.com
On Tuesday, April 29, 2014 3:13:47 AM UTC-4, Joachim Durchholz wrote:
On 25.04.2014 18:13, Alan Bromborsky wrote:
> Has anyone had any experience in using parallel python with sympy. It
> seems to me that there are probably a lot of loops that have independent
> operations.

I'm unsure whether it's possible.

Multithreading becomes problematic as soon as you have shared updatable
data structures.

Parallelizing stuff within a SymPy computation would require explicitly
forking off threads in the various algorithms; I do not think SymPy does
that.

Python does not support running threads in parallel because of the GIL. It can only run processes in parallel, which adds a large overhead, because memory cannot be shared. This means that all data shared between processes has to be serialized (usually using pickle). I don't think it is feasible to improve SymPy's performance by parallelizing it without escaping to languages that support native multithreading.

Vinzent

Joachim Durchholz

Apr 30, 2014, 3:52:49 PM
to sy...@googlegroups.com
On 30.04.2014 19:13, Vinzent Steinberg wrote:
> Python does not support running threads in parallel because of the GIL.

Not quite so - there are actually threading modules in Python.

The GIL does not prevent multithreading per se, it just prevents
parallel bytecode execution. For programs like SymPy which are
bytecode-heavy, this means that you don't gain performance (but there
can be improved program structure for some kinds of tasks).

Now the GIL isn't present in all Python implementations.
ipython and parallelpython were mentioned elsewhere, and I believe that
Stackless Python doesn't have it either.

Also, Python does come with multithreading facilities in the standard
library, and I doubt that they would be there if the GIL were really the
end-all of any hope of having multithreading in Python.

> It
> can only run processes in parallel, which adds a large overhead, because
> memory cannot be shared.

Read-only data should be sharable.
Though I guess that the extremely dynamic nature of Python makes it hard
for the interpreter to identify read-only data, so everything ever
shared across threads would undergo frequent locking and unlocking
operations.
I'm only guessing here though.

> This means that all data shared between processes
> has to be serialized (usually using pickle).

For really slow algorithms that can take hours to complete (Risch etc.),
even that may pay off.
SymPy's data structures aren't very large, the pickling overhead would
be negligible for slow algorithms.
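As a rough illustration of that cost, the sketch below pickles a nested-tuple tree standing in for a SymPy expression (this is a hypothetical representation, not SymPy's actual one) and checks it round-trips:

```python
import pickle

def build_tree(depth):
    """Build a binary nested-tuple tree, a stand-in for an expression tree."""
    if depth == 0:
        return ("sym", "x")
    child = build_tree(depth - 1)
    return ("add", child, child)   # shared subtree, as caches encourage

tree = build_tree(12)
data = pickle.dumps(tree, protocol=pickle.HIGHEST_PROTOCOL)
restored = pickle.loads(data)
print(len(data), restored == tree)
```

Note that pickle memoizes objects it has already serialized, so shared subtrees are stored once; for structures like this the serialized form stays small, and for algorithms that run for hours the round-trip time is noise.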

The larger problem would be to find out how reliable operation is
possible. I have little knowledge of how the various Python
implementations differ wrt. parallel execution; we might find that we
can't easily make SymPy work reliably on all relevant platforms. We
might also find that it's too much effort to get a parallel and a
sequential version of an algorithm to work equivalently, and keep them
that way while SymPy undergoes improvements and changes.

Vinzent Steinberg

May 3, 2014, 1:14:23 PM
to sy...@googlegroups.com
On Wednesday, April 30, 2014 3:52:49 PM UTC-4, Joachim Durchholz wrote:
On 30.04.2014 19:13, Vinzent Steinberg wrote:
> Python does not support running threads in parallel because of the GIL.

Not quite so - there's actually threading modules in Python.

Python's threading can only use one core. [1] It is useful if you want to avoid IO blocking. It cannot execute things in parallel.

The GIL does not prevent multithreading per se, it just prevents
parallel bytecode execution. For programs like SymPy which are
bytecode-heavy, this means that you don't gain performance (but there
can be improved program structure for some kinds of tasks).

I don't understand how that makes any difference. Don't we want to execute bytecode in parallel? I think the only way to release the GIL is by calling external code, which is what I meant by "escaping to other languages".
 
Now the GIL isn't present in all Python implementations.
ipython and parallelpython were mentioned elsewhere, and I believe that
Stackless Python doesn't have it either.

IPython and Parallel Python are Python implementations? I think they just use multiple processes, like CPython's multiprocessing.

Jython and IronPython don't have a GIL; Stackless Python does. However, SymPy is targeting CPython.
 
Also, Python does come with multithreading facilities in the standard
library, and I doubt that they would be there if the GIL were really the
end-all of any hope of having multithreading in Python.

Even with the GIL, threading can be useful (e.g. if you are waiting for IO or writing a GUI).
And, yes, I believe the GIL is the end of all hope to execute threads in parallel within the same Python process.
 
 > It
> can only run processes in parallel, which adds a large overhead, because
> memory cannot be shared.

Read-only data should be sharable.

Only among threads, not among processes. Your OS should not allow you to read another process's memory.
(In theory, os.fork() is often implemented efficiently using copy-on-write, but I imagine it is hard to share Python data types without triggering the copy. See [2] for a few options on how to do it for binary data.)
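For flat binary data there is a stdlib workaround in the spirit of [2]: multiprocessing.Array allocates a typed buffer in shared memory, so child processes operate on it directly and only the cheap handle crosses the process boundary. A sketch (square_slice is a hypothetical worker for illustration):

```python
from multiprocessing import Process, Array

def square_slice(shared, start, stop):
    # Writes directly into the shared buffer: the data itself is
    # never pickled, only the handle is passed to the child process.
    for i in range(start, stop):
        shared[i] = shared[i] * shared[i]

if __name__ == "__main__":
    shared = Array("d", [1.0, 2.0, 3.0, 4.0])  # "d" = C double
    p1 = Process(target=square_slice, args=(shared, 0, 2))
    p2 = Process(target=square_slice, args=(shared, 2, 4))
    p1.start(); p2.start()
    p1.join(); p2.join()
    print(list(shared))  # [1.0, 4.0, 9.0, 16.0]
```

This only helps for data that fits a C type array, which is exactly why it suits NumPy-style buffers and not Python object graphs like SymPy expressions.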
 
Though I guess that the extremely dynamic nature of Python makes it hard
for the interpreter to identify read-only data, so everything ever
shared across threads would undergo frequent locking and unlocking
operations.
I'm only guessing here though.

 > This means that all data shared between processes
> has to be serialized (usually using pickle).

For really slow algorithms that can take hours to complete (Risch etc.),
even that may pay off.
SymPy's data structures aren't very large, the pickling overhead would
be negligible for slow algorithms.

I think it depends on how parallelizable your algorithm is. Without knowing where the Risch algorithm spends most of the time, I doubt it is easily parallelizable.

Are there actually examples where Risch runs for hours and produces useful results?

The larger problem would be to find out how reliable operation is
possible. I have little knowledge of how the various Python
implementations differ wrt. parallel execution; we might find that we
can't easily make SymPy work reliably on all relevant platforms. We
might also find that it's too much effort to get a parallel and a
sequential version of an algorithm to work equivalently, and keep them
that way while SymPy undergoes improvements and changes.

Aaron Meurer

May 3, 2014, 2:49:10 PM
to sy...@googlegroups.com
The slow one is heurisch, not risch. It spends a lot of time solving a
very large sparse linear system. In most cases when it is slow it
doesn't produce an answer, although in theory it could.

Aaron Meurer

>
>
> Vinzent
>
>
> [1] https://docs.python.org/2/library/threading.html
> [2]
> https://stackoverflow.com/questions/17785275/share-large-read-only-numpy-array-between-multiprocessing-processes