multiprocessing vs thread performance

mk

unread,

Dec 29, 2008, 9:52:07 AM12/29/08

to pytho...@python.org

Hello everyone,

After reading http://www.python.org/dev/peps/pep-0371/ I was under
impression that performance of multiprocessing package is similar to
that of thread / threading. However, to familiarize myself with both
packages I wrote my own test of spawning and returning 100,000 empty
threads or processes (while maintaining at most 100 processes / threads
active at any one time), respectively.

The results I got are very different from the benchmark quoted in PEP
371. On twin Xeon machine the threaded version executed in 5.54 secs,
while multiprocessing version took over 222 secs to complete!

Am I doing smth wrong in code below? Or do I have to use
multiprocessing.Pool to get any decent results?

# multithreaded version

#!/usr/local/python2.6/bin/python

import thread
import time

class TCalc(object):

def __init__(self):
self.tactivnum = 0
self.reslist = []
self.tid = 0
self.tlock = thread.allocate_lock()

def testth(self, tid):
if tid % 1000 == 0:
print "== Thread %d working ==" % tid
self.tlock.acquire()
self.reslist.append(tid)
self.tactivnum -= 1
self.tlock.release()

def calc_100thousand(self):
tid = 1
while tid <= 100000:
while self.tactivnum > 99:
time.sleep(0.01)
self.tlock.acquire()
self.tactivnum += 1
self.tlock.release()
t = thread.start_new_thread(self.testth, (tid,))
tid += 1
while self.tactivnum > 0:
time.sleep(0.01)

if __name__ == "__main__":
tc = TCalc()
tstart = time.time()
tc.calc_100thousand()
tend = time.time()
print "Total time: ", tend-tstart

# multiprocessing version

#!/usr/local/python2.6/bin/python

import multiprocessing
import time

def testp(pid):
if pid % 1000 == 0:
print "== Process %d working ==" % pid

def palivelistlen(plist):
pll = 0
for p in plist:
if p.is_alive():
pll += 1
else:
plist.remove(p)
p.join()
return pll

def testp_100thousand():
pid = 1
proclist = []
while pid <= 100000:
while palivelistlen(proclist) > 99:
time.sleep(0.01)
p = multiprocessing.Process(target=testp, args=(pid,))
p.start()
proclist.append(p)
pid += 1
print "=== Main thread waiting for all processes to finish ==="
for p in proclist:
p.join()

if __name__ == "__main__":
tstart = time.time()
testp_100thousand()
tend = time.time()
print "Total time:", tend - tstart

janislaw

unread,

Dec 29, 2008, 9:59:18 AM12/29/08

to

On 29 Gru, 15:52, mk <mrk...@gmail.com> wrote:
> Hello everyone,
>
> After readinghttp://www.python.org/dev/peps/pep-0371/I was under

> impression that performance of multiprocessing package is similar to
> that of thread / threading. However, to familiarize myself with both
> packages I wrote my own test of spawning and returning 100,000 empty
> threads or processes (while maintaining at most 100 processes / threads
> active at any one time), respectively.
>
> The results I got are very different from the benchmark quoted in PEP
> 371. On twin Xeon machine the threaded version executed in 5.54 secs,
> while multiprocessing version took over 222 secs to complete!
>
> Am I doing smth wrong in code below? Or do I have to use
> multiprocessing.Pool to get any decent results?

Oooh, 100000 processes! You're fortunate that your OS handled them in
finite time.

[quick browsing through the code]

Ah, so there are 100 processes at time. 200secs still don't sound
strange.

JW

mk

unread,

Dec 29, 2008, 10:14:14 AM12/29/08

to pytho...@python.org

janislaw wrote:

> Ah, so there are 100 processes at time. 200secs still don't sound
> strange.

I ran the PEP 371 code on my system (Linux) on Python 2.6.1:

Linux SLES (9.156.44.174) [15:18] root ~/tmp/src # ./run_benchmarks.py
empty_func.py

Importing empty_func
Starting tests ...
non_threaded (1 iters) 0.000005 seconds
threaded (1 threads) 0.000235 seconds
processes (1 procs) 0.002607 seconds

non_threaded (2 iters) 0.000006 seconds
threaded (2 threads) 0.000461 seconds
processes (2 procs) 0.004514 seconds

non_threaded (4 iters) 0.000008 seconds
threaded (4 threads) 0.000897 seconds
processes (4 procs) 0.008557 seconds

non_threaded (8 iters) 0.000010 seconds
threaded (8 threads) 0.001821 seconds
processes (8 procs) 0.016950 seconds

This is very different from PEP 371. It appears that the PEP 371 code
was written on Mac OS X. The conclusion I get from comparing above costs
sis that OS X must have very low cost of creating the process, at least
when compared to Linux, not that multiprocessing is a viable alternative
to thread / threading module. :-(

Christian Heimes

unread,

Dec 29, 2008, 10:17:37 AM12/29/08

to pytho...@python.org

mk wrote:
> Am I doing smth wrong in code below? Or do I have to use
> multiprocessing.Pool to get any decent results?

You have missed an important point. A well designed application does
neither create so many threads nor processes. The creation of a thread
or forking of a process is an expensive operation. You should use a pool
of threads or processes.

The limiting factor is not the creation time but the communication and
synchronization overhead between multiple threads or processes.

Christian

mk

unread,

Dec 29, 2008, 10:29:43 AM12/29/08

to pytho...@python.org

Christian Heimes wrote:
> mk wrote:
>> Am I doing smth wrong in code below? Or do I have to use
>> multiprocessing.Pool to get any decent results?
>
> You have missed an important point. A well designed application does
> neither create so many threads nor processes.

Except I was not developing "well designed application" but writing the
test the goal of which was measuring the thread / process creation cost.

> The creation of a thread
> or forking of a process is an expensive operation.

Sure. The point is, how expensive? While still being relatively
expensive, it turns out that in Python creating a thread is much, much
cheaper than creating a process via multiprocessing on Linux, while this
seems to be not necessarily true on Mac OS X.

> You should use a pool
> of threads or processes.

Probably true, except, again, that was not quite the point of this
exercise..

> The limiting factor is not the creation time but the communication and
> synchronization overhead between multiple threads or processes.

Which I am probably going to test as well.

Roy Smith

unread,

Dec 29, 2008, 10:32:55 AM12/29/08

to

In article <mailman.6337.1230563...@python.org>,
Christian Heimes <li...@cheimes.de> wrote:

> You have missed an important point. A well designed application does
> neither create so many threads nor processes. The creation of a thread
> or forking of a process is an expensive operation. You should use a pool
> of threads or processes.

It's worth noting that forking a new process is usually a much more
expensive operation than creating a thread. Not that I would want to
create 100,000 of either!

Not everybody realizes it, but threads eat up a fair chunk of memory (you
get one stack per thread, which means you need to allocate a hunk of memory
for each stack). I did a quick look around; 256k seems like a common
default stack size. 1 meg wouldn't be unheard of.

Aaron Brady

unread,

Dec 29, 2008, 10:35:46 AM12/29/08

to

On Dec 29, 8:52 am, mk <mrk...@gmail.com> wrote:
> Hello everyone,
>

> After readinghttp://www.python.org/dev/peps/pep-0371/I was under

> impression that performance of multiprocessing package is similar to
> that of thread / threading. However, to familiarize myself with both
> packages I wrote my own test of spawning and returning 100,000 empty
> threads or processes (while maintaining at most 100 processes / threads
> active at any one time), respectively.
>
> The results I got are very different from the benchmark quoted in PEP
> 371. On twin Xeon machine the threaded version executed in 5.54 secs,
> while multiprocessing version took over 222 secs to complete!
>
> Am I doing smth wrong in code below? Or do I have to use
> multiprocessing.Pool to get any decent results?

I'm running a 1.6 GHz. I only ran 10000 empty threads and 10000 empty
processes. The threads were the ones you wrote. The processes were
empty executables written in a lower language, also run 100 at a time,
started with 'subprocess', not 'multiprocessing'. The threads took
1.2 seconds. The processes took 24 seconds.

The processes you wrote had only finished 3000 after several minutes.

Jarkko Torppa

unread,

Dec 29, 2008, 11:30:42 AM12/29/08

to

On 2008-12-29, mk <mrk...@gmail.com> wrote:
> janislaw wrote:
>
>> Ah, so there are 100 processes at time. 200secs still don't sound
>> strange.
>
> I ran the PEP 371 code on my system (Linux) on Python 2.6.1:
>
> Linux SLES (9.156.44.174) [15:18] root ~/tmp/src # ./run_benchmarks.py
> empty_func.py
>
> Importing empty_func
> Starting tests ...
> non_threaded (1 iters) 0.000005 seconds
> threaded (1 threads) 0.000235 seconds
> processes (1 procs) 0.002607 seconds
>
> non_threaded (2 iters) 0.000006 seconds
> threaded (2 threads) 0.000461 seconds
> processes (2 procs) 0.004514 seconds
>
> non_threaded (4 iters) 0.000008 seconds
> threaded (4 threads) 0.000897 seconds
> processes (4 procs) 0.008557 seconds
>
> non_threaded (8 iters) 0.000010 seconds
> threaded (8 threads) 0.001821 seconds
> processes (8 procs) 0.016950 seconds
>
> This is very different from PEP 371. It appears that the PEP 371 code
> was written on Mac OS X.

On the PEP371 it says "All benchmarks were run using the following:
Python 2.5.2 compiled on Gentoo Linux (kernel 2.6.18.6)"

On my iMac 2.3Ghz dualcore. python 2.6

iTaulu:src torppa$ python run_benchmarks.py empty_func.py

Importing empty_func
Starting tests ...

non_threaded (1 iters) 0.000002 seconds
threaded (1 threads) 0.000227 seconds
processes (1 procs) 0.002367 seconds

non_threaded (2 iters) 0.000003 seconds
threaded (2 threads) 0.000406 seconds
processes (2 procs) 0.003465 seconds

non_threaded (4 iters) 0.000004 seconds
threaded (4 threads) 0.000786 seconds
processes (4 procs) 0.006430 seconds

non_threaded (8 iters) 0.000006 seconds
threaded (8 threads) 0.001618 seconds
processes (8 procs) 0.012841 seconds

With python2.5 and pyProcessing-0.52

iTaulu:src torppa$ python2.5 run_benchmarks.py empty_func.py

Importing empty_func
Starting tests ...

non_threaded (1 iters) 0.000003 seconds
threaded (1 threads) 0.000143 seconds
processes (1 procs) 0.002794 seconds

non_threaded (2 iters) 0.000004 seconds
threaded (2 threads) 0.000277 seconds
processes (2 procs) 0.004046 seconds

non_threaded (4 iters) 0.000005 seconds
threaded (4 threads) 0.000598 seconds
processes (4 procs) 0.007816 seconds

non_threaded (8 iters) 0.000008 seconds
threaded (8 threads) 0.001173 seconds
processes (8 procs) 0.015504 seconds

--
Jarkko Torppa, Elisa

mk

unread,

Dec 29, 2008, 12:11:37 PM12/29/08

to pytho...@python.org

Jarkko Torppa wrote:

> On the PEP371 it says "All benchmarks were run using the following:
> Python 2.5.2 compiled on Gentoo Linux (kernel 2.6.18.6)"

Right... I overlooked that. My tests I quoted above were done on SLES
10, kernel 2.6.5.

> With python2.5 and pyProcessing-0.52
>
> iTaulu:src torppa$ python2.5 run_benchmarks.py empty_func.py
> Importing empty_func
> Starting tests ...
> non_threaded (1 iters) 0.000003 seconds
> threaded (1 threads) 0.000143 seconds
> processes (1 procs) 0.002794 seconds
>
> non_threaded (2 iters) 0.000004 seconds
> threaded (2 threads) 0.000277 seconds
> processes (2 procs) 0.004046 seconds
>
> non_threaded (4 iters) 0.000005 seconds
> threaded (4 threads) 0.000598 seconds
> processes (4 procs) 0.007816 seconds
>
> non_threaded (8 iters) 0.000008 seconds
> threaded (8 threads) 0.001173 seconds
> processes (8 procs) 0.015504 seconds

There's smth wrong with numbers posted in PEP. This is what I got on
4-socket Xeon (+ HT) with Python 2.6.1 on Debian (Etch), with kernel
upgraded to 2.6.22.14:

non_threaded (1 iters) 0.000004 seconds
threaded (1 threads) 0.000159 seconds
processes (1 procs) 0.001067 seconds

non_threaded (2 iters) 0.000005 seconds
threaded (2 threads) 0.000301 seconds
processes (2 procs) 0.001754 seconds

non_threaded (4 iters) 0.000006 seconds
threaded (4 threads) 0.000581 seconds
processes (4 procs) 0.003906 seconds

non_threaded (8 iters) 0.000009 seconds
threaded (8 threads) 0.001148 seconds
processes (8 procs) 0.008178 seconds

Hrvoje Niksic

unread,

Dec 29, 2008, 3:22:44 PM12/29/08

to

Roy Smith <r...@panix.com> writes:

> In article <mailman.6337.1230563...@python.org>,
> Christian Heimes <li...@cheimes.de> wrote:
>
>> You have missed an important point. A well designed application does
>> neither create so many threads nor processes. The creation of a thread
>> or forking of a process is an expensive operation. You should use a pool
>> of threads or processes.
>
> It's worth noting that forking a new process is usually a much more
> expensive operation than creating a thread.

If by "forking" you mean an actual fork() call, as opposed to invoking
a different executable, the difference is not necessarily that great.
Modern Unix systems tend to implement a 1:1 mapping between threads
and kernel processes, so creating a thread and forking a process
require similar amount of work.

On my system, as measured by timeit, spawning and joining a thread
takes 111 usecs, while forking and waiting for a process takes 260.
Slower, but not catastrophically so.

> Not that I would want to create 100,000 of either!

Agreed.

> Not everybody realizes it, but threads eat up a fair chunk of memory
> (you get one stack per thread, which means you need to allocate a
> hunk of memory for each stack). I did a quick look around; 256k
> seems like a common default stack size. 1 meg wouldn't be unheard
> of.

Note that this memory is virtual memory, so it doesn't use up the
physical RAM until actually used. I've seen systems running legacy
Java applications that create thousands of threads where *virtual*
memory was the bottleneck.

James Mills

unread,

Dec 29, 2008, 7:05:06 PM12/29/08

to mk, pytho...@python.org

On Tue, Dec 30, 2008 at 12:52 AM, mk <mrk...@gmail.com> wrote:
> Hello everyone,
>

> After reading http://www.python.org/dev/peps/pep-0371/ I was under

> impression that performance of multiprocessing package is similar to that of
> thread / threading. However, to familiarize myself with both packages I
> wrote my own test of spawning and returning 100,000 empty threads or
> processes (while maintaining at most 100 processes / threads active at any
> one time), respectively.
>
> The results I got are very different from the benchmark quoted in PEP 371.
> On twin Xeon machine the threaded version executed in 5.54 secs, while
> multiprocessing version took over 222 secs to complete!
>
> Am I doing smth wrong in code below? Or do I have to use
> multiprocessing.Pool to get any decent results?

The overhead in starting OS level processes
is quite high. This is why event-driven, single
process servers can perform far better than
ones that fork (spawn multiple processes)
per request.

As others have mentioned, it's not suprising
that spawning even 100 processes took some
time.

Bottom line: multiprocessing should not be used this way.
(nor should threading).

cheers
James

Aaron Brady

unread,

Dec 29, 2008, 8:34:46 PM12/29/08

to

On Dec 29, 6:05 pm, "James Mills" <prolo...@shortcircuit.net.au>
wrote:

> On Tue, Dec 30, 2008 at 12:52 AM, mk <mrk...@gmail.com> wrote:
> > Hello everyone,
>

> > After readinghttp://www.python.org/dev/peps/pep-0371/I was under

> > impression that performance of multiprocessing package is similar to that of
> > thread / threading. However, to familiarize myself with both packages I
> > wrote my own test of spawning and returning 100,000 empty threads or
> > processes (while maintaining at most 100 processes / threads active at any
> > one time), respectively.

snip

> As others have mentioned, it's not suprising
> that spawning even 100 processes took some
> time.
>
> Bottom line: multiprocessing should not be used this way.
> (nor should threading).

The OP may be interested in Erlang, which Wikipedia (end-all, be-all)
claims is a 'distribution oriented language'.

You might also find it interesting to examine a theoretical OS that is
optimized for process overhead. In other words, what is the minimum
overhead possible? Can processes be as small as threads? Can entire
threads be only a few bytes (words) big?

Also, could generators provide any of the things you need with your
multiple threads? You could, say, call 'next()' on many items in a
list, and just remove them on StopIteration.

James Mills

unread,

Dec 29, 2008, 8:40:10 PM12/29/08

to Aaron Brady, pytho...@python.org

On Tue, Dec 30, 2008 at 11:34 AM, Aaron Brady <casti...@gmail.com> wrote:
> The OP may be interested in Erlang, which Wikipedia (end-all, be-all)
> claims is a 'distribution oriented language'.

I would suggest to the OP that he take a look
at circuits (1) an event framework with a focus
on component architectures and distributed
processing.

I'm presently looking at Virtual Synchrony and
other distributed processing architectures - but
circuits is meant to be general purpose enough
to fit event-driven applications/systems.

cheers
James

1. http://trac.softcircuit.com.au/circuits/

Aaron Brady

unread,

Dec 29, 2008, 9:52:59 PM12/29/08

to

On Dec 29, 7:40 pm, "James Mills" <prolo...@shortcircuit.net.au>
wrote:

> On Tue, Dec 30, 2008 at 11:34 AM, Aaron Brady <castiro...@gmail.com> wrote:
> > The OP may be interested in Erlang, which Wikipedia (end-all, be-all)
> > claims is a 'distribution oriented language'.

snip

> I'm presently looking at Virtual Synchrony and
> other distributed processing architectures - but
> circuits is meant to be general purpose enough
> to fit event-driven applications/systems.

I noticed a while ago that threads can be used to simulate
generators. 'next', 'send', and 'yield' are merely replaced by
synchronizor calls (coining the term).

Not the other way around, though, unless the generator guarantees a
yield frequently. 'settrace' anyone?

James Mills

unread,

Dec 29, 2008, 10:08:16 PM12/29/08

to Aaron Brady, pytho...@python.org

Aaron, circuits doesn't use generators :)
What did your comment have to do with this ?

I have often seen generators used to
facilitate coroutine and coooperative
programming though.

cheers
James

Aaron Brady

unread,

Dec 30, 2008, 9:29:19 AM12/30/08

to

On Dec 29, 9:08 pm, "James Mills" <prolo...@shortcircuit.net.au>
wrote:

James, Hi. I'm glad you asked; I never know how "out there" my
comments are (but surmise that feedback is always a good thing). What
I was thinking was, I didn't know Virtual Synchrony, and I've never
used Erlang, but I'm interested in concurrency especially as it
pertains to units of work, division of labor, and division of context;
and generators are another way to divide context. So: I wanted to put
more of my background and interests on the table. What I said wasn't
directly relevant, I see. But it's not like I
"dissertated" (discussed) the Tibettan-Austrian spice trade. I think
I just want to say stuff about threading! Maybe I'm just excited to
meet people who share my interests... not unheard of.

In Economics, you can divide a market into vertical and horizontal
dimensions, vertical being the chain from raw resources to finished
products, and horizontal being market coverage. With a similar
division in tasks, horizontal units would handle incoming events,
prepare them, then pass them on to a next unit, which processes a
little, then passes it on, like an assembly line (bucket brigade/
alternating current); and vertical units would take one incoming
event, and see it through start to finish (direct current). You don't
have to use that depiction.

The terminology is transposed from that of a distributed matrix
multiplication task depiction, where horizontal works start-to-finish,
and vertical takes handoffs from its predecessor.

'Circuits' doesn't use generators. I think generators are an
underexplored technique. Is 'circuits' assembly line or start-to-
finish, if my analogy makes any sense? 'Circuits' is event-driven,
but I don't see any difference between 'event-driven' and
multithreaded in general. (I think contrast can create a good picture
and a clear understanding.) What is special about an 'event-driven'
architecture? Are you distinguishing blocking from polling?

James Mills

unread,

Dec 30, 2008, 5:42:26 PM12/30/08

to Aaron Brady, pytho...@python.org

On Wed, Dec 31, 2008 at 12:29 AM, Aaron Brady <casti...@gmail.com> wrote:
> James, Hi. I'm glad you asked; I never know how "out there" my
> comments are (but surmise that feedback is always a good thing). What
> I was thinking was, I didn't know Virtual Synchrony, and I've never
> used Erlang, but I'm interested in concurrency especially as it
> pertains to units of work, division of labor, and division of context;
> and generators are another way to divide context. So: I wanted to put
> more of my background and interests on the table. What I said wasn't
> directly relevant, I see. But it's not like I
> "dissertated" (discussed) the Tibettan-Austrian spice trade. I think
> I just want to say stuff about threading! Maybe I'm just excited to
> meet people who share my interests... not unheard of.

Glad to see others also interested in these topics :)

(snip)

> 'Circuits' doesn't use generators. I think generators are an
> underexplored technique. Is 'circuits' assembly line or start-to-
> finish, if my analogy makes any sense? 'Circuits' is event-driven,
> but I don't see any difference between 'event-driven' and
> multithreaded in general. (I think contrast can create a good picture
> and a clear understanding.) What is special about an 'event-driven'
> architecture? Are you distinguishing blocking from polling?

I'll shortly be releasing circuits-1.0 today hopefully.
To answer your question, circuits is inspired by a
software architecture that my most favoured
lecturer a few years back was teaching. That is:
* Behaviour Trees (design)
and consequently:
* The concept of "everything is a Component"
* Systems and Sub-Systems are built upon Components
and
* Everything is an event.
and
* An emergent property of such systems are "Behaviour".

That being said, circuits employs both an event-driven
approach as well as a Component architecture.
In your analogy it is both horizontal and vertical.

As I continue to develop circuits and improve it's
core design as well as building it's ever growing set
of Components, I try to keep it as general as
possible - my main aim though is distributed
processing and architectures. (See the primes example).

Thanks for sharing your interest :)

cheers
James

James Mills

unread,

Dec 30, 2008, 7:52:57 PM12/30/08

to Aaron Brady, pytho...@python.org

On Wed, Dec 31, 2008 at 8:42 AM, James Mills
<prol...@shortcircuit.net.au> wrote:
(snip)

> As I continue to develop circuits and improve it's
> core design as well as building it's ever growing set
> of Components, I try to keep it as general as
> possible - my main aim though is distributed
> processing and architectures. (See the primes example).

Aaron, just wanted to demonstrate
to you the example (primes) that I mentioned
above:

On Terminal A:

jmills@atomant:~/circuits/examples$ ./primes.py -o primes.txt -b
127.0.0.1:8000 -p 1000 -w

Total Primes: 1001 (23/s after 44.16s)
Total Events: 43373 (983/s after 44.16s)
Distribution:
c1096b40-7606-4ba7-9593-1385e14ef339: 348
8313b43f-d45d-4a0a-8d87-e6a93d3dfb0b: 653

On Terminal B:

jmills@atomant:~/other/circuits/examples$ ./primes.py -b
127.0.0.1:8001 -s 127.0.0.1:8000

The example uses circuits to distribute work
amongst any arbitrary number of nodes connected
to the system. This is all manually implemented
by simply taking advantage of the circuits framework
with my basic understanding ( so far ) of distributed
processing, virtual synchrony, and other techniques.

This is the same thing just run on a single node (no
distribution):

jmills@atomant:~/circuits/examples$ ./primes.py -o primes.txt -b
127.0.0.1:8000 -p 1000

Total Primes: 1001 (17/s after 62.13s)
Total Events: 28579 (460/s after 62.13s)
Distribution:
701be749-9833-40a4-9181-5ee18047b1ad: 1001

As you can see, running 2 instances almost halves
the time. If you do try this though, you'll note that
the CPU isn't being used very heavily at all - This
could be improved - but the example merely
demonstrates distributed event processing and
syncronization.

cheers
James

Tim Roberts

unread,

Dec 31, 2008, 2:43:04 AM12/31/08

to

Christian Heimes <li...@cheimes.de> wrote:
>
>You have missed an important point. A well designed application does
>neither create so many threads nor processes. The creation of a thread
>or forking of a process is an expensive operation.

That actually depends on the operating system. As one example, thread
creation on Windows is not all that expensive.

>You should use a pool of threads or processes.

Even so, this is good advise.
--
Tim Roberts, ti...@probo.com
Providenza & Boekelheide, Inc.

Aaron Brady

unread,

Dec 31, 2008, 6:22:13 PM12/31/08

to

I had an idea. You could use 'multiprocessing' for synchronization,
and just use an mmap for the actual communication. (I think it's a
good idea.)

Paul Rubin

unread,

Dec 31, 2008, 7:19:54 PM12/31/08

to

Aaron Brady <casti...@gmail.com> writes:
> I had an idea. You could use 'multiprocessing' for synchronization,
> and just use an mmap for the actual communication. (I think it's a
> good idea.)

Are you reinventing POSH? (http://poshmodule.sourceforge.net)

Aaron Brady

unread,

Dec 31, 2008, 7:26:42 PM12/31/08

to

On Dec 31, 6:19 pm, Paul Rubin <http://phr...@NOSPAM.invalid> wrote:

I thought the same thing! Paul Boddie introduced me to POSH some
months ago. I don't recall the implementation of synchro. objects in
POSH, but IIRC, 'multiprocessing' uses native OS handles opening in
multiple processes. It could be that there would be substantial
overlap, or the trade-offs could be significant. For one, the above
combination only permits strings to be shared, not objects.

Philip Semanchuk

unread,

Dec 31, 2008, 7:28:43 PM12/31/08

to Python (General)

Or sysv_ipc? Or posix_ipc?

http://semanchuk.com/philip/sysv_ipc/
http://semanchuk.com/philip/posix_ipc/

Bug fix version of the latter coming out in a day or two...

Nick Craig-Wood

unread,

Jan 3, 2009, 8:31:12 AM1/3/09

to

mk <mrk...@gmail.com> wrote:
> After reading http://www.python.org/dev/peps/pep-0371/ I was under
> impression that performance of multiprocessing package is similar to
> that of thread / threading. However, to familiarize myself with both
> packages I wrote my own test of spawning and returning 100,000 empty
> threads or processes (while maintaining at most 100 processes / threads
> active at any one time), respectively.
>
> The results I got are very different from the benchmark quoted in PEP
> 371. On twin Xeon machine the threaded version executed in 5.54 secs,
> while multiprocessing version took over 222 secs to complete!
>
> Am I doing smth wrong in code below?

Yes!

The problem with your code is that you never start more than one
process at once in the multiprocessing example. Just check ps when it
is running and you will see.

My conjecture is that this is due to the way fork() works under unix.
I think that when the parent forks it yields the CPU to the child.
Because you are giving the child effectively no work to do it returns
immediately, re-awakening the parent, thus serialising your jobs.

If you give the children some work to do you'll see a quite different
result. I gave each child time.sleep(1) to do and cut down the total
number to 10,000.

$ ./test_multiprocessing.py
== Process 1000 working ==
== Process 2000 working ==
== Process 3000 working ==
== Process 4000 working ==
== Process 5000 working ==
== Process 6000 working ==
== Process 7000 working ==
== Process 8000 working ==
== Process 9000 working ==
== Process 10000 working ==

=== Main thread waiting for all processes to finish ===

Total time: 101.382129192

$ ./test_threading.py
== Thread 1000 working ==
== Thread 2000 working ==
== Thread 3000 working ==
== Thread 4000 working ==
== Thread 5000 working ==
== Thread 6000 working ==
== Thread 7000 working ==
== Thread 8000 working ==
== Thread 9000 working ==
== Thread 10000 working ==
Total time: 100.659118176

So almost identical results and as expected - we ran 10,000 sleep(1)s
in 100 seconds so we must have been running 100 simultaneously.

If you replace the "time.sleep(1)" with "for _ in xrange(1000000):
pass" you get this much more interesting answer on my dual core linux
laptop, showing nicely the effect of the contention on the python
global interpreter lock and how multiprocessing avoids it.

$ ./test_multiprocessing.py
== Process 1000 working ==
== Process 2000 working ==
== Process 3000 working ==
== Process 4000 working ==
== Process 5000 working ==
== Process 6000 working ==
== Process 7000 working ==
== Process 8000 working ==
== Process 9000 working ==
== Process 10000 working ==

=== Main thread waiting for all processes to finish ===

Total time: 266.808327913

$ ./test_threading.py
== Thread 1000 working ==
== Thread 2000 working ==
== Thread 3000 working ==
== Thread 4000 working ==
== Thread 5000 working ==
== Thread 6000 working ==
== Thread 7000 working ==
== Thread 8000 working ==
== Thread 9000 working ==
== Thread 10000 working ==
Total time: 834.81882

--
Nick Craig-Wood <ni...@craig-wood.com> -- http://www.craig-wood.com/nick

Gabriel Genellina

unread,

Jan 5, 2009, 10:24:14 PM1/5/09

to pytho...@python.org

En Sat, 03 Jan 2009 11:31:12 -0200, Nick Craig-Wood <ni...@craig-wood.com>
escribió:
> mk <mrk...@gmail.com> wrote:

>> The results I got are very different from the benchmark quoted in PEP
>> 371. On twin Xeon machine the threaded version executed in 5.54 secs,
>> while multiprocessing version took over 222 secs to complete!
>>
>> Am I doing smth wrong in code below?
>
> Yes!
>
> The problem with your code is that you never start more than one
> process at once in the multiprocessing example. Just check ps when it
> is running and you will see.

Oh, very good analysis! Those results were worriying me a little.

--
Gabriel Genellina

Arash Arfaee

unread,

Jan 7, 2009, 7:55:51 PM1/7/09

to pytho...@python.org

Hi All ,

Does anybody know any tutorial for python 2.6 multiprocessing? Or bunch of good example for it? I am trying to break a loop to run it over multiple core in a system. And I need to return an integer value as the result of the process an accumulate all of them. the examples that I found there is no return for the process.

Thanks,
-Arash

On Mon, Jan 5, 2009 at 7:24 PM, Gabriel Genellina <gags...@yahoo.com.ar> wrote:

En Sat, 03 Jan 2009 11:31:12 -0200, Nick Craig-Wood <ni...@craig-wood.com> escribió:

mk <mrk...@gmail.com> wrote:

The results I got are very different from the benchmark quoted in PEP
371. On twin Xeon machine the threaded version executed in 5.54 secs,
while multiprocessing version took over 222 secs to complete!

Am I doing smth wrong in code below?

Yes!

The problem with your code is that you never start more than one
process at once in the multiprocessing example. Just check ps when it
is running and you will see.

Oh, very good analysis! Those results were worriying me a little.

--
Gabriel Genellina

--
http://mail.python.org/mailman/listinfo/python-list

James Mills

unread,

Jan 7, 2009, 8:05:53 PM1/7/09

to Arash Arfaee, pytho...@python.org

On Thu, Jan 8, 2009 at 10:55 AM, Arash Arfaee <ere...@gmail.com> wrote:
> Hi All ,

HI :)

> Does anybody know any tutorial for python 2.6 multiprocessing? Or bunch of
> good example for it? I am trying to break a loop to run it over multiple
> core in a system. And I need to return an integer value as the result of the
> process an accumulate all of them. the examples that I found there is no
> return for the process.

You communicate with the process in one of several
ways:
* Semaphores
* Locks
* PIpes

I prefer to use Pipes which act much like sockets.
(in fact they are).

Read the docs and let us know how you go :)
I'm actually implementing multiprocessing
support into circuits (1) right now...

Gabriel Genellina

unread,

Jan 9, 2009, 6:19:33 PM1/9/09

to pytho...@python.org

En Wed, 07 Jan 2009 23:05:53 -0200, James Mills
<prol...@shortcircuit.net.au> escribió:

>> Does anybody know any tutorial for python 2.6 multiprocessing? Or bunch
>> of
>> good example for it? I am trying to break a loop to run it over multiple
>> core in a system. And I need to return an integer value as the result
>> of the
>> process an accumulate all of them. the examples that I found there is no
>> return for the process.
>
> You communicate with the process in one of several
> ways:
> * Semaphores
> * Locks
> * PIpes

The Pool class provides a more abstract view that may be better suited in
this case. Just create a pool, and use map_async to collect and summarize
the results.

import string
import multiprocessing

def count(args):
(lineno, line) = args
print "This is %s, processing line %d\n" % (
multiprocessing.current_process().name, lineno),
result = dict(letters=0, digits=0, other=0)
for c in line:
if c in string.letters: result['letters'] += 1
elif c in string.digits: result['digits'] += 1
else: result['other'] += 1
# just to make some "random" delay
import time; time.sleep(len(line)/100.0)
return result

if __name__ == '__main__':

summary = dict(letters=0, digits=0, other=0)

def summary_add(results):
# this is called with a list of results
for result in results:
summary['letters'] += result['letters']
summary['digits'] += result['digits']
summary['other'] += result['other']

# count letters on this same script
f = open(__file__, 'r')

pool = multiprocessing.Pool(processes=6)
# invoke count((lineno, line)) for each line in the file
pool.map_async(count, enumerate(f), 10, summary_add)
pool.close() # no more jobs
pool.join() # wait until done
print summary

--
Gabriel Genellina