
Multi-threading in Python vs Java


Peter Cacioppi

Oct 11, 2013, 2:01:25 AM
Could someone give me a brief thumbnail sketch of the differences between multi-threaded programming in Python and Java?

I have a fairly sophisticated algorithm that I developed as both a single-threaded and a multi-threaded Java application. The multi-threaded port was fairly simple, partly because Java has a rich library of thread-safe data structures (AtomicInteger, BlockingQueue, PriorityBlockingQueue, etc.).

There is quite a significant performance improvement when multithreading here.

I'd like to port the project to Python, partly because Python is a better language (IMHO) and partly because Python plays well with Amazon Web Services.

But I'm a little leery that things like the Global Interpreter Lock will limit the multithreading efficiency, or that a relative lack of concurrent off-the-shelf data structures will make things much harder.

Any advice much appreciated. Thanks.

Cameron Simpson

Oct 11, 2013, 2:53:02 AM
to Peter Cacioppi, pytho...@python.org
On 10Oct2013 23:01, Peter Cacioppi <peter.c...@gmail.com> wrote:
> Could someone give me a brief thumbnail sketch of the differences between multi-threaded programming in Python and Java?
>
> I have a fairly sophisticated algorithm that I developed as both a single-threaded and a multi-threaded Java application. The multi-threaded port was fairly simple, partly because Java has a rich library of thread-safe data structures (AtomicInteger, BlockingQueue, PriorityBlockingQueue, etc.).
>
> There is quite a significant performance improvement when multithreading here.
>
> I'd like to port the project to Python, [...]
> But I'm a little leery that things like the Global Interpreter Lock will limit the multithreading efficiency, or that a relative lack of concurrent off-the-shelf data structures will make things much harder.

A couple of random items:

A Java process will happily use multiple cores and hyperthreading.
It makes no thread-safety guarantees in the language itself,
though as you say it has a host of thread-safe tools to make all
this easy to do safely.

As you expect, CPython has the GIL and will only use one CPU-level
thread of execution _for the purely Python code_. No two Python
bytecode instructions run in parallel. Functions that block or call
thread-safe libraries can (and usually do) release the GIL, allowing
other Python code to execute while native non-Python code does
stuff; that will use multiple cores etc.

Other Python implementations may be more aggressive. I'd suppose
Jython could multithread like Java, but really I have no experience
with them.

The standard answer with CPython is that if you want to use multiple
cores to run Python code (versus using Python code to orchestrate
native code) you should use the multiprocessing stuff to fork the
interpreter, and then farm out jobs using queues.

Regarding "concurrent off the shelf data structures", I have a bunch
of Python multithreaded stuff and find the stdlib Queues and Locks
(and Semaphores and so on) sufficient. The Queue classes (and
collections.deque) are thread safe, so a lot of the coordination is
pretty easy.
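A tiny illustration of that coordination, with a doubling task standing
in for real work:

```python
# Sketch: worker threads coordinated entirely with the stdlib's
# thread-safe queue.Queue; doubling stands in for real work.
import queue
import threading

tasks = queue.Queue()
results = queue.Queue()

def worker():
    while True:
        item = tasks.get()
        if item is None:       # sentinel: time to shut down
            tasks.task_done()
            return
        results.put(item * 2)  # stand-in for real work
        tasks.task_done()

threads = [threading.Thread(target=worker) for _ in range(3)]
for t in threads:
    t.start()
for n in range(6):
    tasks.put(n)
tasks.join()                   # blocks until every task_done() call
for _ in threads:
    tasks.put(None)
for t in threads:
    t.join()
print(sorted(results.queue))   # [0, 2, 4, 6, 8, 10]
```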

And of course context managers make Locks and Semaphores very easy
and reliable to use:

L = Lock()
...
with L:
    ... do locked stuff ...
...

I'm sure you'll get longer and more nuanced replies too.

Cheers,
--
Cameron Simpson <c...@zip.com.au>

A squealing tire is a happy tire.
- Bruce MacInnes, Skip Barber Driving School instructor

Peter Cacioppi

Oct 11, 2013, 4:41:37 AM
I should add that the computational heavy lifting is done in a third party library. So a worker thread looks roughly like this (there is a subtle race condition I'm glossing over).

while len(jobs):
    job = jobs.pop()
    model = Model(job)       # Model is py interface for a lib written in C
    newJobs = model.solve()  # This will take a long time
    for newJob in newJobs:
        jobs.add(newJob)

Here jobs is a thread safe object that is shared across each worker thread. It holds a priority queue of jobs that can be solved in parallel.

Model is a py class that provides the API to a 3rd party library written in C. I know model.solve() will be the bottleneck operation for all but trivial problems.

So, my hope is that the GIL restrictions won't be problematic here. That is to say, I don't need **Python** code to ever run concurrently. I just need Python to allow a different Python worker thread to execute when all the other worker threads are blocking on the model.solve() task. Once the algorithm is in full swing, it is typical for all the worker threads to be blocking on model.solve() at the same time.

It's a nice algorithm for high level languages. Java worked well here; I'm hoping py can be nearly as fast with much more elegant and readable code.
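A quick way to sanity-check that hope (time.sleep stands in for a C call
that releases the GIL; the job payloads are made up):

```python
# Quick proof of concept: time.sleep releases the GIL, standing in for
# a solve() implemented in C. Four "solves" of 0.2s each should take
# about 0.2s of wall time, not 0.8s, if the threads really overlap.
import threading
import time

def solve(job):
    time.sleep(0.2)    # stand-in for a long C computation that frees the GIL
    return []          # no new jobs in this toy version

jobs = [1, 2, 3, 4]
lock = threading.Lock()

def worker():
    while True:
        with lock:                  # guard the shared job list
            if not jobs:
                return
            job = jobs.pop()
        for new_job in solve(job):  # solve() runs outside the lock
            with lock:
                jobs.append(new_job)

start = time.monotonic()
threads = [threading.Thread(target=worker) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
elapsed = time.monotonic() - start
print(round(elapsed, 1))   # roughly 0.2, not 0.8
```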





Chris Angelico

Oct 11, 2013, 4:48:31 AM
to pytho...@python.org
On Fri, Oct 11, 2013 at 7:41 PM, Peter Cacioppi
<peter.c...@gmail.com> wrote:
> So, my hope is that the GIL restrictions won't be problematic here. That is to say, I don't need **Python** code to ever run concurrently. I just need Python to allow a different Python worker thread to execute when all the other worker threads are blocking on the model.solve() task. Once the algorithm is in full swing, it is typical for all the worker threads to be blocking on model.solve() at the same time.

Sounds like Python will serve you just fine! Check out the threading
module, knock together a quick test, and spin it up!

ChrisA

Steven D'Aprano

Oct 11, 2013, 5:30:48 AM
On Fri, 11 Oct 2013 17:53:02 +1100, Cameron Simpson wrote:

> Other Python implementations may be more aggressive. I'd suppose Jython
> could multithread like Java, but really I have no experience with them.

Neither Jython nor IronPython have a GIL.


> The standard answer with CPython is that if you want to use multiple
> cores to run Python code (versus using Python code to orchestrate native
> code) you should use the multiprocessing stuff to fork the interpreter,
> and then farm out jobs using queues.

Note that this really only applies to CPU-bound tasks. For tasks that
depend on file IO (reading and writing files), CPython threads will
operate in parallel as independently and (almost) as efficiently as those
in other languages. That is to say, they will be constrained by the
underlying operating system's ability to do file IO, not by the number of
cores in your CPU.
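For example, with four scratch files read from four threads (the file
contents here are arbitrary):

```python
# The GIL is released during the underlying read()/write() calls, so
# these threads overlap on IO. Scratch files stand in for real data.
import os
import tempfile
import threading

paths = []
for i in range(4):
    fd, path = tempfile.mkstemp()
    with os.fdopen(fd, "w") as f:
        f.write("line\n" * 1000)
    paths.append(path)

sizes = {}
lock = threading.Lock()

def read_file(path):
    with open(path) as f:
        data = f.read()          # GIL released while the OS does the read
    with lock:                   # dict update guarded for clarity
        sizes[path] = len(data)

threads = [threading.Thread(target=read_file, args=(p,)) for p in paths]
for t in threads:
    t.start()
for t in threads:
    t.join()
for p in paths:
    os.remove(p)
print(all(n == 5000 for n in sizes.values()))   # True: 1000 * len("line\n")
```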


--
Steven

Piet van Oostrum

Oct 11, 2013, 10:55:44 AM
But it only works if the external C library has been written to release
the GIL around the long computations. If not, then the OP could try to
write a wrapper around them that does this.
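Note that ctypes already does this: functions loaded through
ctypes.CDLL release the GIL for the duration of each foreign call
(PyDLL does not). A POSIX-only sketch, using libc's usleep as the
stand-in for a long C computation:

```python
# POSIX-only sketch: ctypes.CDLL releases the GIL around each foreign
# call, so other Python threads run while the C code is busy.
import ctypes
import ctypes.util
import threading

libc = ctypes.CDLL(ctypes.util.find_library("c") or None)

ticks = 0
done = False

def counter():
    # Pure-Python busy loop; it only makes progress when the GIL is free.
    global ticks
    while not done:
        ticks += 1

t = threading.Thread(target=counter)
t.start()
libc.usleep(200_000)   # 0.2s spent inside C with the GIL released
done = True
t.join()
print(ticks > 0)       # the counter advanced during the C call
```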
--
Piet van Oostrum <pi...@vanoostrum.org>
WWW: http://pietvanoostrum.com/
PGP key: [8DAE142BE17999C4]

Terry Reedy

Oct 11, 2013, 3:53:41 PM
to pytho...@python.org
On 10/11/2013 4:41 AM, Peter Cacioppi wrote:

> I should add that the computational heavy lifting is done in a third party library. So a worker thread looks roughly like this (there is a subtle race condition I'm glossing over).
>
> while len(jobs):
>     job = jobs.pop()
>     model = Model(job)       # Model is py interface for a lib written in C
>     newJobs = model.solve()  # This will take a long time
>     for newJob in newJobs:
>         jobs.add(newJob)
>
> Here jobs is a thread safe object that is shared across each worker thread. It holds a priority queue of jobs that can be solved in parallel.
>
> Model is a py class that provides the API to a 3rd party library written in C. I know model.solve() will be the bottleneck operation for all but trivial problems.
>
> So, my hope is that the GIL restrictions won't be problematic here. That is to say, I don't need **Python** code to ever run concurrently. I just need Python to allow a different Python worker thread to execute when all the other worker threads are blocking on the model.solve() task. Once the algorithm is in full swing, it is typical for all the worker threads to be blocking on model.solve() at the same time.
>
> It's a nice algorithm for high level languages. Java worked well here, I'm hoping py can be nearly as fast with a much more elegant and readable code.

Given that model.solve takes a 'long time' (seconds, at least), the
extra time to start a process over the time to start a thread will be
inconsequential. I would therefore look at the multiprocessing module.

--
Terry Jan Reedy

Peter Cacioppi

Oct 11, 2013, 4:10:46 PM
On Friday, October 11, 2013, Chris Angelico wrote:
"Sounds like Python will serve you just fine! Check out the threading
module, knock together a quick test, and spin it up!"

Thanks, that was my assessment as well, just wanted a double check. At the time of posting I was mentally blocked on how to set up a quick proof of concept, but of course writing the post cleared that up ;)

Along with "batteries included" and "we're all adults", I think Python needs a pithy phrase summarizing how well thought out it is. That is to say, the major design decisions were all carefully considered, and as a result things that might appear to be problematic are actually not barriers in practice. My suggestion for this phrase is "Guido was here".

So in this case, I thought the GIL would be a fly in the ointment, but on reflection it turned out not to be the case. Guido was here.

Cameron Simpson

Oct 11, 2013, 5:35:26 PM
to pytho...@python.org
On 11Oct2013 15:53, Terry Reedy <tjr...@udel.edu> wrote:
> On 10/11/2013 4:41 AM, Peter Cacioppi wrote:
> >I should add that the computational heavy lifting is done in a third party library. So a worker thread looks roughly like this (there is a subtle race condition I'm glossing over).
> >
> >while len(jobs):
> >    job = jobs.pop()
> >    model = Model(job)       # Model is py interface for a lib written in C
> >    newJobs = model.solve()  # This will take a long time
> >    for newJob in newJobs:
> >        jobs.add(newJob)
> >
> >Here jobs is a thread safe object that is shared across each worker thread. It holds a priority queue of jobs that can be solved in parallel.
> >
> >Model is a py class that provides the API to a 3rd party library written in C. I know model.solve() will be the bottleneck operation for all but trivial problems.
[...]
> Given that model.solve takes a 'long time' (seconds, at least), the
> extra time to start a process over the time to start a thread will
> be inconsequential. I would therefore look at the multiprocessing
> module.

And, for contrast, I would not. Threads are my friends and Python
threads seem eminently suited to the above scenario.

Cheers,
--
Cameron Simpson <c...@zip.com.au>

[Alain] had been looking at his dashboard, and had not seen me, so I
ran into him. - Jean Alesi on his qualifying prang at Imola '93