
Parallelization on multi-CPU hardware?


P.M.

Oct 5, 2004, 12:45:14 AM
Hi,

I'm working on a C++/embedded-Python application that will run on
multi-CPU hardware and I need to efficiently distribute the execution
of multiple scripts across the CPUs. I'm embedding Python using the
Boost Python library.

My question is, how can I best parallelize the running of separate
autonomous Python scripts within this app? Can I run multiple
interpreters in separate threads within a single process? In past
newsgroup messages I've seen advice that the only way to get
scalability, due to the GIL, is to use an IPC mechanism between
multiple distinct processes running separate interpreters. Is this
still true or are there better ways to accomplish this?

Thanks
Peter

Andrew Dalke

Oct 5, 2004, 12:55:50 AM
P.M. wrote:
> My question is, how can I best parallelize the running of separate
> autonomous Python scripts within this app? Can I run multiple
> interpreters in separate threads within a single process?

No. Python only supports one interpreter per process. There
may be several threads running, but only in the context of a
single interpreter. Why? Various data structures assume
global state; the table of imported modules is one example.

> In past
> newsgroup messages I've seen advice that the only way to get
> scalability, due to the GIL, is to use an IPC mechanism between
> multiple distinct processes running separate interpreters. Is this
> still true or are there better ways to accomplish this?

Hasn't changed. There are many ways to do it, ranging from
ZODB to Twisted to POSH to CORBA to shared memory to Pyro
to a few dozen more. I've been waiting for a time to
give Pyro a try.

If most of your time is spent in Boost calculations then
you can release the GIL when you leave Python. So long
as extensions do not call back into Python there won't
be a scaling problem due to the GIL.
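
A minimal sketch of that last point, with time.sleep() standing in for a
Boost-wrapped C++ call that releases the GIL (e.g. via
Py_BEGIN_ALLOW_THREADS): because the lock is released during the wait, the
two calls overlap and wall-clock time stays near one call's duration
instead of doubling.

```python
import threading
import time

def extension_call():
    # A real extension would release the GIL around its C++ number
    # crunching; sleep() releases it around a blocking wait.
    time.sleep(0.2)

start = time.perf_counter()
threads = [threading.Thread(target=extension_call) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
elapsed = time.perf_counter() - start
# Overlapping waits: elapsed is ~0.2s, not ~0.4s.
```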

Andrew
da...@dalkescientific.com

Sam G.

Oct 5, 2004, 9:23:15 AM
Given that all threads run on the same CPU (if I haven't misunderstood),
I'm wondering whether Python will suffer on future multicore CPUs. Won't
Python use only one core, and therefore only half or a quarter of the
CPU? It could be a serious problem for the future of Python...

I'm sorry if I'm asking a stupid almost-newbie question, but it's not
the first time I've asked it.

Sam.

Alan Kennedy

Oct 5, 2004, 10:38:51 AM
[Sam G.]

> Given that all threads run on the same CPU (if I haven't
> misunderstood), I'm wondering whether Python will suffer on future
> multicore CPUs. Won't Python use only one core, and therefore only
> half or a quarter of the CPU? It could be a serious problem for the
> future of Python...

I agree that it could potentially be a serious hindrance for cpython if
"multiple core" CPUs become commonplace. This is in contrast to jython
and ironpython, both of which support multiple-cpu parallelism.

Although I completely accept the usual arguments offered in defense of
the GIL, i.e. that it isn't a problem in the great majority of use
cases, I think that position will become more difficult to defend as
desktop CPUs sprout more and more execution pipelines.

I think that this also fits in with AM Kuchling's recent
musing/thesis/prediction that the existing cpython VM may no longer be
in use in 5 years, and that it may be superseded by python
"interpreters" running on top of other VMs, namely the JVM, the CLR,
Smalltalk VM, Parrot, etc, etc, etc.

http://www.amk.ca/diary/archives/cat_python.html#003382

I too agree with Andrew's basic position: the Python language needs a
period of library consolidation. There is so much duplication of
functionality out there, with the situation only getting worse as people
re-invent the wheel yet again using newer features such as generators,
gen-exps and decorators.

Just my €0,02.

--
alan kennedy
------------------------------------------------------
email alan: http://xhaus.com/contact/alan

Grant Edwards

Oct 5, 2004, 11:06:24 AM
On 2004-10-05, Sam G. <> wrote:

> Given that all threads run on the same CPU

I don't think anybody said that, and based on my understanding
of Python's threading, I don't see why it would be true.

> (if I haven't misunderstood), I'm wondering whether Python will
> suffer on future multicore CPUs. Won't Python use only one core,
> and therefore only half or a quarter of the CPU? It could be a
> serious problem for the future of Python...

AFAIK, the threads aren't all locked to a single CPU. However,
there is a global interpreter lock (a mutex), which somewhat
prevents simultaneous execution of threads when they _are_ on
different CPUs.

> I'm sorry if I'm asking a stupid almost-newbie question, but it's
> not the first time I've asked it.

--
Grant Edwards   grante at visi.com   Yow! What UNIVERSE is this, please??

Aahz

Oct 5, 2004, 12:48:14 PM
In article <fmy8d.32815$Z14....@news.indigo.ie>,

Alan Kennedy <ala...@hotmail.com> wrote:
>
>I agree that it could potentially be a serious hindrance for cpython if
>"multiple core" CPUs become commonplace. This is in contrast to jython
>and ironpython, both of which support multiple-cpu parallelism.
>
>Although I completely accept the usual arguments offered in defense of
>the GIL, i.e. that it isn't a problem in the great majority of use
>cases, I think that position will become more difficult to defend as
>desktop CPUs sprout more and more execution pipelines.

Perhaps. Then again, those pipelines will probably have their work cut
out running firewalls, spam filters, voice recognition, and so on. I
doubt the GIL will make much difference, still.
--
Aahz (aa...@pythoncraft.com) <*> http://www.pythoncraft.com/

WiFi is the SCSI of the 21st Century -- there are fundamental technical
reasons for sacrificing a goat. (with no apologies to John Woods)

Andreas Kostyrka

Oct 5, 2004, 5:36:16 PM
To: Aahz, pytho...@python.org
On Tue, Oct 05, 2004 at 12:48:14PM -0400, Aahz wrote:
> In article <fmy8d.32815$Z14....@news.indigo.ie>,
> Alan Kennedy <ala...@hotmail.com> wrote:
> >
> >I agree that it could potentially be a serious hindrance for cpython if
> >"multiple core" CPUs become commonplace. This is in contrast to jython
> >and ironpython, both of which support multiple-cpu parallelism.
> >
> >Although I completely accept the usual arguments offered in defense of
> >the GIL, i.e. that it isn't a problem in the great majority of use
> >cases, I think that position will become more difficult to defend as
> >desktop CPUs sprout more and more execution pipelines.
>
> Perhaps. Then again, those pipelines will probably have their work cut
> out running firewalls, spam filters, voice recognition, and so on. I
> doubt the GIL will make much difference, still.

Actually I doubt that losing the GIL would make that much performance
difference even on a real (not HT) 4-way box.
What the GIL buys us is no lock contention and no locking overhead
with Python.

And before somebody cries out, just think about how the following trivial
statement would have to be executed.

a = b + c
Lock a
Lock b
Lock c
a = b + c
Unlock c
Unlock b
Unlock a

So basically you either get a really huge number of locks (one per
object) with enough potential for conflicts, deadlocks and all the other
stuff to really slow down the ceval loop.

One could use less granularity, and lock say the class of the object
involved, but that wouldn't help that much either.

So basically the GIL is a design decision that makes sense, perhaps it
shouldn't be just called the GIL, call it the "very large locking
granularity design decision".

And before somebody points out that other languages can use locks too:
well, other languages usually have a much lower-level execution model
than Python, and they usually force the developer to deal with
synchronization primitives. Python, OTOH, has always had the "no
segmentation fault" policy, so locking would have to be "safe" as in
"not producing segfaults". That's not trivial to implement; for example,
reference counting isn't trivially implementable without locking (at
least portably).

So, IMHO, there are basically the following design decisions:
GIL: large granularity
MSL: (many small locks) would slow down the overall execution of Python
programs.
MSLu: (many small locks, unsafe) unacceptable because it would change
     Python experience ;)
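
As a rough Python-level sketch of that per-operation cost (a toy
illustration only; the function names and loop count are made up, and this
measures nothing about any real interpreter change): the locked loop below
computes the same result as the plain one, but pays a lock acquire/release
for every single increment.

```python
import threading
import time

N = 200_000  # arbitrary loop count for the toy comparison

def bump_plain(n):
    total = 0
    for _ in range(n):
        total += 1
    return total

def bump_locked(n, lock):
    total = 0
    for _ in range(n):
        with lock:      # one acquire/release per "statement"
            total += 1
    return total

lock = threading.Lock()

t0 = time.perf_counter()
plain = bump_plain(N)
t_plain = time.perf_counter() - t0

t0 = time.perf_counter()
locked = bump_locked(N, lock)
t_locked = time.perf_counter() - t0

# Same answer either way; the difference is pure locking overhead.
```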

Andreas


> --
> Aahz (aa...@pythoncraft.com) <*> http://www.pythoncraft.com/
>
> WiFi is the SCSI of the 21st Century -- there are fundamental technical
> reasons for sacrificing a goat. (with no apologies to John Woods)


Aahz

Oct 5, 2004, 5:07:12 PM
In article <mailman.4377.1097009...@python.org>,

Andreas Kostyrka <and...@kostyrka.org> wrote:
>
>So, IMHO, there are basically the following design decisions:
>GIL: large granularity
>MSL: (many small locks) would slow down the overall execution of Python
> programs.
>MSLu: (many small locks, unsafe) unacceptable because it would change
> Python experience ;)

That's a good summation. Not quite QOTW, but definitely something
everyone should remember.

Daniel Dittmar

Oct 5, 2004, 5:32:53 PM
Andreas Kostyrka wrote:
> So, IMHO, there are basically the following design decisions:
> GIL: large granularity
> MSL: (many small locks) would slow down the overall execution of Python
> programs.
> MSLu: (many small locks, unsafe) unacceptable because it would change
> Python experience ;)

You forgot GarbageCollection instead of reference counting.

Daniel

Jeff Shannon

Oct 5, 2004, 7:01:50 PM
Daniel Dittmar wrote:

> You forgot GarbageCollection instead of reference counting.

That's an orthogonal issue. GC/Refcounting may depend on the
data-locking scheme being thread-safe, but even if one used a GC scheme
that was completely thread-safe regardless of locking, you'd still need
data-locking to safely use objects in your own code. The fact that I
can be completely confident that the memory space pointed to by a, b,
and c won't be deallocated behind my back says nothing about whether
another worker thread may change b in between when I test it and when I
use it. This means that in a free-threaded environment (without GIL),
given

a = b + c

d = b + c
assert (a == d)

I cannot guarantee that the assertion is true, unless I explicitly lock
all the variables involved -- four locking operations plus four
unlocking operations. Locking the local namespace won't work, because
any of those could be a global reference or a cross-module reference or
whatever. One could, I suppose, explicitly acquire/release a
GIL-granularity lock (process-global "critical section", in essence),
but that leaves the possibility of a mistakenly unprotected variable
causing a segfault -- again, a violation of one of Python's design
constraints. And I don't think that it's practical to have the
interpreter infer where it'd be safe to release the GIL.
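
A small sketch of that point (the invariant and loop counts are invented
for the example): even today, the GIL does not make a multi-step sequence
like reading b and then c atomic, so the reader below must hold a lock
across both reads. The writer preserves the invariant b + c == 3, but only
atomically under the same lock.

```python
import threading

b, c = 1, 2
lock = threading.Lock()
mismatches = []

def writer():
    global b, c
    for _ in range(10_000):
        with lock:
            b += 1
            c -= 1          # invariant b + c == 3, kept under the lock

def reader():
    for _ in range(10_000):
        with lock:           # group both reads: no writer step in between
            a = b + c
            d = b + c
        if a != d or a != 3:
            mismatches.append((a, d))

w = threading.Thread(target=writer)
r = threading.Thread(target=reader)
w.start(); r.start()
w.join(); r.join()
# With the lock held across both reads, mismatches stays empty.
```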

Jeff Shannon
Technician/Programmer
Credit International

Bryan Olson

Oct 6, 2004, 12:04:56 AM
Andreas Kostyrka wrote:
> Aahz wrote:

>>Alan Kennedy wrote:
>>
>>>I agree that it could potentially be a serious hindrance for cpython if
>>>"multiple core" CPUs become commonplace. This is in contrast to jython
>>>and ironpython, both of which support multiple-cpu parallelism.
>>>
>>>Although I completely accept the usual arguments offered in defense of
>>>the GIL, i.e. that it isn't a problem in the great majority of use
>>>cases, I think that position will become more difficult to defend as
>>>desktop CPUs sprout more and more execution pipelines.

I have to agree with Alan. The major chip vendors are telling
us that the future is multi-processor.

>>Perhaps. Then again, those pipelines will probably have their work cut
>>out running firewalls, spam filters, voice recognition, and so on. I
>>doubt the GIL will make much difference, still.

I don't know of any quantitative demonstration either way, but
demands on computers tend to be bursty. Frequently, a single
process has a lot of tasks that suddenly demand computation,
while all other processes are quiet. There, a GIL is a killer.

> Actually I doubt that loosing the GIL would make that much performance
> difference even on a real (not HT) 4-way box ->
> What the GIL buys us is no lock contention and no locking overhead
> with Python.
>
> And before somebody cries out, just think how the following trivial
> statement would happen.
>
> a = b + c
> Lock a
> Lock b
> Lock c
> a = b + c
> Unlock c
> Unlock b
> Unlock a

If that happens frequently, the program has blown it rather
badly.

> So basically you either get a really huge number of locks (one per
> object) with enough potential for conflicts, deadlocks and all the other
> stuff to make it real slow down the ceval.
>
> One could use less granularity, and lock say the class of the object
> involved, but that wouldn't help that much either.

There's a significant point there in that threads do well with
coarse granularity, and badly with finely parallel algorithms.
But if we look at how sensible multi-threaded programs work, we
see that the threads share little writable data. Thread
contention is surprisingly rare, outside of identifiable hot-
spots around shared data-structures.

> So basically the GIL is a design decision that makes sense, perhaps it
> shouldn't be just called the GIL, call it the "very large locking
> granularity design decision".
>
> And before somebody points out that other languages can use locks too.
> Well, other languages have usually much lower level execution model than
> Python. And they usually force the developer to deal with
> synchronization primitives. Python OTOH had always the "no segmentation
> fault" policy -> so locking would have be "safe" as in "not producing
> segfaults". That's not trivial to implement, for example reference
> counting isn't trivially implementable without locking (at least
> portably).

It's not trivial to implement, but it is implementable, and is
so valuable that languages that do it are likely to antiquate
those that do not. For example, Java strictly specified what
the programmer can assume and what he cannot. Updates to 32-bit
integers are always atomic, updates to 64-bit integers are not.
The Java VM (with the Java libraries) does not crash because of
application-program threading mistakes; or at least if it does
everyone recognizes it as a major bug. As presently
implemented, Java both allows more thread-concurrency than
Python, and is more thread-safe.

> So, IMHO, there are basically the following design decisions:
> GIL: large granularity
> MSL: (many small locks) would slow down the overall execution of Python
> programs.
> MSLu: (many small locks, unsafe) unacceptable because it would change
> Python experience ;)

One-grain-at-a-time is quickly becoming unacceptable regardless
of the granularity.


--
--Bryan

P.M.

Oct 6, 2004, 1:00:40 AM
Alan Kennedy <ala...@hotmail.com> wrote in message news:<fmy8d.32815$Z14....@news.indigo.ie>...

>
> I agree that it could potentially be a serious hindrance for cpython if
> "multiple core" CPUs become commonplace. This is in contrast to jython
> and ironpython, both of which support multiple-cpu parallelism.
>
> Although I completely accept the usual arguments offered in defense of
> the GIL, i.e. that it isn't a problem in the great majority of use
> cases, I think that position will become more difficult to defend as
> desktop CPUs sprout more and more execution pipelines.

I wonder if Python could be changed to use thread local storage? That
might allow for multiple interpreter instances without the GIL (I've
never looked at the actual code so I'm just hypothesizing). I took a
quick look at Lua today and it has no problems with creating multiple
instances of the interpreter, so it definitely is a solvable problem.

Steve Holden

Oct 6, 2004, 4:49:50 AM
P.M. wrote:

> Alan Kennedy <ala...@hotmail.com> wrote in message news:<fmy8d.32815$Z14....@news.indigo.ie>...
>
>>I agree that it could potentially be a serious hindrance for cpython if
>>"multiple core" CPUs become commonplace. This is in contrast to jython
>>and ironpython, both of which support multiple-cpu parallelism.
>>
>>Although I completely accept the usual arguments offered in defense of
>>the GIL, i.e. that it isn't a problem in the great majority of use
>>cases, I think that position will become more difficult to defend as
>>desktop CPUs sprout more and more execution pipelines.
>

The GIL has been a problem to me (probably a "fairly typical" Python
user) precisely once, when I tried to run a lengthy database process on
my brand-new H-P desktop-replacement laptop. The task seemed to be
running a little slower than I would have expected from the quoted clock
speed, so I opened up the Windows task manager and lo and behold, the
performance monitor was showing two CPUs, one hitting the stop at 100%
and the other basically idle.

Until that time I hadn't realized that the laptop was built on a
multi-core processor. Fortunately the task was easily partitioned into
two independent processes, each dealing with a separate set of database
rows, so the run completed well under the necessary 24 hours.
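
That kind of partitioning can be sketched in a few lines. Here
`sys.executable -c` computing a sum stands in for the real per-row
database work; the row count and the summing are hypothetical stand-ins.

```python
import subprocess
import sys

def run_half(start, stop):
    # Each worker is a separate interpreter with its own GIL.
    code = "print(sum(range(%d, %d)))" % (start, stop)
    return subprocess.Popen([sys.executable, "-c", code],
                            stdout=subprocess.PIPE, text=True)

rows = 1000
p1 = run_half(0, rows // 2)        # first half of the rows
p2 = run_half(rows // 2, rows)     # second half, in parallel
total = sum(int(p.communicate()[0]) for p in (p1, p2))
# The two halves together give the same answer as one sequential pass.
```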


>
> I wonder if Python could be changed to use thread local storage? That
> might allow for multiple interpreter instances without the GIL (I've
> never looked at the actual code so I'm just hypothesizing). I took a
> quick look at Lua today and it has no problems with creating multiple
> instances of the interpreter, so it definitely is a solvable problem.

Thread synchronization through a common namespace would likely be the
problem. The last time this was discussed on c.l.py someone who knows
far more than I about the interpreter's structure (Tim Peters, unless I
mistake my guess) opined that removing the GIL from CPython was far from
being a simple undertaking. Whether that's changed since I dunno.

regards
Steve
--
http://www.holdenweb.com
http://pydish.holdenweb.com
Holden Web LLC +1 800 494 3119

Sam G.

Oct 6, 2004, 5:43:07 AM
OK, my mistake; I've read more about the GIL now. Thanks for
correcting me.

But if only one thread per interpreter can run at a time, then on a
multicore machine two (or more) threads can't run at the same time. The
performance result is almost the same: if the server does nothing but
run Python (not realistic, I know, just to keep it simple), only one
core works...

Not very good for Zope, as multicore is the near future...

Sam.


Grant Edwards wrote:
[...]


> AFAIK, the threads aren't all locked to a single CPU. However,
> there is a global interpreter lock (a mutex), which somewhat
> prevents simultaneous execution of threads when they _are_ on
> different CPUs.

[...]

Alex Martelli

Oct 6, 2004, 6:18:19 AM
"Sam G." <nospam:samu...@yahoo.fr> wrote:

> OK, my mistake; I've read more about the GIL now. Thanks for
> correcting me.
>
> But if only one thread per interpreter can run at a time, then on a
> multicore machine two (or more) threads can't run at the same time. The
> performance result is almost the same: if the server does nothing but
> run Python (not realistic, I know, just to keep it simple), only one
> core works...
>
> Not very good for Zope, as multicore is the near future...

Unless Zope can use multiple processes (I don't know whether it can or
not), the trouble's already there -- multi-CPU computers are already
widespread and have been for a while (all PowerMacs except the
entry-point ones, and I believe all Mac Servers, have been using
multiple CPUs for years now). It does, however, appear to me that
structuring a server application into multiple processes rather than
into multiple threads shouldn't be that big a deal anyway. And with
multiple processes you can exploit multiple CPUs or cores easily.


Alex

Neil Hodgson

Oct 6, 2004, 7:08:47 AM
Steve Holden:

> Until that time I hadn't realized that the laptop was built on a
> multi-core processor.

It is more likely you had a machine that featured 'hyperthreading',
which is much less than multiple cores: somewhere between two sets of
registers and two processors.

> Fortunately the task was easily partitioned into
> two independent processes, each dealing with a separate set of database
> rows, so the run completed well under the necessary 24 hours.

Did you measure a real performance increase, that is, elapsed time to
completion of job? Many benchmarks show minimal or even negative performance
improvements for hyperthreading. Relying on secondary indicators such as CPU
busyness can be misleading.

Neil


Andreas Kostyrka

Oct 6, 2004, 8:13:36 AM
To: dit...@snafu.de, pytho...@python.org
Well, "GarbageCollection" isn't really an option for a portable ANSI C
program, is it?

Andreas

>
> Daniel

Alan Kennedy

Oct 6, 2004, 7:38:17 AM
[Andreas Kostyrka]

> So basically you either get a really huge number of locks (one per
> object) with enough potential for conflicts, deadlocks and all the
> other stuff to make it real slow down the ceval.
>
> One could use less granularity, and lock say the class of the object
> involved, but that wouldn't help that much either.
>
> So basically the GIL is a design decision that makes sense, perhaps it
> shouldn't be just called the GIL, call it the "very large locking
> granularity design decision".

Reading the above, one might be tempted to conclude that presence of the
GIL in the cpython VM is actually a benefit, and perhaps that all
language interpreters should have one!

But the reality is that few other VMs have selected to employ such a
"very large locking granularity design decision". If one were to believe
the arguments about the problems of fine-grained locking, one might be
tempted to conclude that other VMs such as the JVM and the CLR are
incapable of competing with the cpython VM, in terms of performance. But
Jim Hugunin's pre-release figures for IronPython performance indicate
that it can, in some cases, outperform cpython while running on just a
single processor, let alone when multiple processors are available. And
that is for an "interpreter running on an interpreter".

CPython's GIL does give a *small* performance benefit, but only when
there is a single execution pipeline. Once there is more than one
pipeline, it degrades performance.

As I've already stated, I believe the benefits and trade-offs of the GIL
are arguable either way when there is a small number of processors
involved, e.g. 2 or less. But if chip makers are already producing chips
with 2 execution pipelines, then you can be sure it won't be too long
before they are shipping units with 4, 8, 16, 32, etc, execution
pipelines. As this number increases, the GIL will increasingly become a
restrictive bottleneck.

Contrarily, jython, ironpython, etc, will continue to benefit from the
enormous and massively-resourced optimisation efforts going into the JVM
and CLR respectively (e.g. JIT compilation), all without a single change
to the python code or the code interpreting it.

Lastly, as the number of execution pipelines in CPUs grow, what will
happen if/when they start talking back and forth to each other,
transputer-style, instead of executing mostly isolated and in parallel
as they do today? Transputers[1] had the lovely concept of high-speed
hardware communication channels cross-linking all CPUs in the "array".
The reason this model never really took off back in the 80's was because
there were no familiar high-level language models for exploiting it,
besides Occam[2].

IMHO, new python concepts such as generators are precisely the right
high-level concepts for enabling transputer style fine-granularity,
inter-execution-pipeline, on-demand, "pull" comms. This would put python
at an enormous conceptual advantage compared to other mainstream
languages, which generally don't have generator-style concepts. What a
shame that the GIL would restrict only one such CPU at a time to
actually be running python code.

regards,

--
alan kennedy
------------------------------------------------------
email alan: http://xhaus.com/contact/alan

[1] Transputer architecture
http://www.cmpe.boun.edu.tr/courses/cmpe511/fall2003/transputer.ppt

[2] Occam programming language
http://encyclopedia.thefreedictionary.com/Occam%20programming%20language

Daniel Dittmar

Oct 6, 2004, 7:30:23 AM
Jeff Shannon wrote:
> a = b + c
> d = b + c
> assert (a == d)

I don't think that anybody expects the assertion to be true if threads
and any global variables are involved.

> I cannot guarantee that the assertion is true, unless I explicitly
> lock all the variables involved -- four locking operations plus four
> unlocking operations.

Trying to lock more than one object automatically is a recipe for
deadlocks.

> causing a segfault -- again, a violation of one of Python's design
> constraints. And I don't think that it's practical to have the
> interpreter infer where it'd be safe to release the GIL.

Unlike Java (1), Python assignments are not atomic operations, but
rather complex operations (updating of a dictionary).

Dictionaries would be locked at the C-level, so every assignment to
module or instance variables would trigger at least one lock.

I'm not sure if locals () would have to be locked. As Pythons closures
can't assign new values to variables of the surrounding scope, I'm not
aware of a situation where a local variable could be updated (2) by a
second thread.

Although 'Garbage Collection' fits your 'MSL: (many small locks) would
slow down the overall execution of Python programs.' in name, it is
technically a very different solution from locking the reference counter,
probably with very different performance implications.

Daniel

(1) assignments of certain types are not guaranteed to be atomic in Java
(2) 'updated' is here meant as assigning a new value, not as calling a
method on a mutable object.

Nick Craig-Wood

Oct 6, 2004, 9:30:09 AM
Alan Kennedy <ala...@hotmail.com> wrote:
> [Andreas Kostyrka]
> > So basically you either get a really huge number of locks (one per
> > object) with enough potential for conflicts, deadlocks and all the
> > other stuff to make it real slow down the ceval.
> >
> > One could use less granularity, and lock say the class of the object
> > involved, but that wouldn't help that much either.
> >
> > So basically the GIL is a design decision that makes sense, perhaps it
> > shouldn't be just called the GIL, call it the "very large locking
> > granularity design decision".
>
> Reading the above, one might be tempted to conclude that presence of the
> GIL in the cpython VM is actually a benefit, and perhaps that all
> language interpreters should have one!

Interestingly the linux kernel has been through a similar evolution...

In linux 2.0 Multi-processing (SMP) was brought in. This was done
using something called the Big Kernel Lock (BKL). Sounds familiar,
doesn't it? This worked fine but had the problem that only one CPU
could be executing in the kernel at once and hence a performance loss
in certain situations. This is the identical situation to python now
- there can only be one thread in python core at once.

However the linux kernel has evolved to replace the BKL with a series
of smaller locks. This has been a gradual process from 2.2 to 2.6. The
BKL is still there, but it's only used by legacy code in 2.6.

Yes fine grained locking does have an overhead. More locks mean more
memory and more wasted time in locking and unlocking them. The linux
kernel has got round this by

1) The locks aren't compiled in at all for a uniprocessor kernel (this
would mean a non-threading python - probably not an option)

2) The linux kernel has many different forms of locking both heavy and
light-weight

3) care has been taken to properly align the locks on cache line
boundaries. This costs memory though.

4) the linux kernel now has Read-Copy-Update (RCU) which requires no
locking

Linux is aiming at the future when everyone has 4 or 8
cores/hyperthreads and I think that is the right decision. Fine
grained locking will come to python one day I'm sure.

--
Nick Craig-Wood <ni...@craig-wood.com> -- http://www.craig-wood.com/nick

Дамјан Георгиевски

Oct 6, 2004, 10:25:16 AM
>> Reading the above, one might be tempted to conclude that presence of the
>> GIL in the cpython VM is actually a benefit, and perhaps that all
>> language interpreters should have one!
>
> Linux is aiming at the future when everyone has 4 or 8
> cores/hyperthreads and I think that is the right decision. Fine
> grained locking will come to python one day I'm sure.

One of the biggest problems with the GIL for me is when Python is embedded
inside some multi-threaded program, like for example inside Apache2 with
the "worker MPM" (a multithreaded Apache process model).

But since some operations release the GIL, maybe it's not a big issue even
then.

--
дамјан

To boldly go where I surely don't belong.

Corey Coughlin

Oct 6, 2004, 2:32:57 PM
Speaking of multiprocessor architectures, here's a link I ran across
last month:

http://blogs.sun.com/roller/page/jonathan/20040910#the_difference_between_humans_and

It describes Sun's next chip, the Niagara project, a single chip with 8
processor cores, each capable of running 4 threads, for a total of 32
threads. And you can bet they're coming up with multichip servers.
This is actually technology from a company that Sun bought, and now
they're kind of staking the future of the company on it. (Especially
given the Millennium chip fiasco.) But they're not alone by any
stretch: AMD says it'll be shipping a multicore Opteron next year,
Intel already has its HT technology out, and they'll be doing
multicores soon too. The future is here; we should be prepared.

------ Corey

Alan Kennedy <ala...@hotmail.com> wrote in message news:<WOQ8d.32865$Z14....@news.indigo.ie>...

>
> As I've already stated, I believe the benefits and trade-offs of the GIL
> are arguable either way when there is a small number of processors
> involved, e.g. 2 or less. But if chip makers are already producing chips
> with 2 execution pipelines, then you can be sure it won't be too long
> before they are shipping units with 4, 8, 16, 32, etc, execution
> pipelines. As this number increases, the GIL will increasingly become a
> restrictive bottleneck.


Steve Holden

Oct 6, 2004, 6:40:26 PM
Neil Hodgson wrote:

> Steve Holden:
>
>
>>Until that time I hadn't realized that the laptop was built on a
>>multi-core processor.
>
>
> It is more likely you had a machine that featured 'hyperthreading' which
> is much less than multiple cores. Somewhere between two sets of registers
> and two processors.
>

Aah, the penny drops and I realize you are indeed correct. It's using
hyperthreading.


>
>>Fortunately the task was easily partitioned into
>>two independent processes, each dealing with a separate set of database
>>rows, so the run completed well under the necessary 24 hours.
>
>
> Did you measure a real performance increase, that is, elapsed time to
> completion of job? Many benchmarks show minimal or even negative performance
> improvements for hyperthreading. Relying on secondary indicators such as CPU
> busyness can be misleading.
>

No, I *was* actually measuring elapsed time to completion, and I was
surprised that the speedup was indeed just about linear. Not often you
come across a task that can be cleanly partitioned in that way.

Thomas Bellman

Oct 7, 2004, 6:33:56 AM
Andreas Kostyrka <and...@kostyrka.org> wrote:

> Well, "GarbageCollection" isn't really an option for a portable ANSI C
> program, is it?

Depends on what you mean. It is perfectly possible to write a
garbage collection system in portable ANSI C. It will of course
only handle memory that has been properly registered with the
garbage collector, not any plain memory just received from
malloc(). For an example of such a system, look at the Lisp
interpreter in GNU Emacs. Or why not the C-Python interpreter,
which has featured garbage collection since version 2.0?
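The cycle collector Thomas mentions is easy to see in action. A small sketch (using the modern `print()` spelling and the stdlib `gc` and `weakref` modules):

```python
import gc
import weakref

class Node:
    pass

# Build a reference cycle that reference counting alone can never free:
a, b = Node(), Node()
a.partner, b.partner = b, a
probe = weakref.ref(a)   # lets us observe when the object is reclaimed

del a, b        # refcounts stay nonzero because of the cycle
gc.collect()    # the cycle collector (added in Python 2.0) reclaims both
print(probe() is None)   # True
```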


--
Thomas Bellman, Lysator Computer Club, Linköping University, Sweden
"The one who says it cannot be done should ! bellman @ lysator.liu.se
never interrupt the one who is doing it." ! Make Love -- Nicht Wahr!

Aahz

unread,
Oct 6, 2004, 4:40:15 PM10/6/04
to
In article <ck0emf$1ar$1...@s5.feed.news.oleane.net>,

Sam G. <nospam:samu...@yahoo.fr> wrote:
>
>But if only one thread per interpreter can run at a time, then on a
>multicore machine two (or more) threads can't run at the same time.
>Performance-wise the result is almost the same: if the server does
>nothing but run Python (not realistic, I know, just to keep it
>simple), only one core works...
>
>Not very good for Zope, as multicore are for the near future...

Depends where it bogs down. If it's CPU, it may be an issue. If it's
I/O, CPython already releases the GIL and Zope benefits.
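That release is easy to observe from pure Python. In this sketch, four threads block in `time.sleep()` (during which CPython drops the GIL), so the waits overlap instead of serializing (the timing bound assumes a lightly loaded machine):

```python
import threading
import time

def wait():
    time.sleep(0.3)   # a blocking call: CPython releases the GIL here

start = time.time()
threads = [threading.Thread(target=wait) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
elapsed = time.time() - start
# The four 0.3s sleeps overlap rather than serializing to 1.2s.
print(round(elapsed, 1))
```

The same overlap happens with blocking socket reads, which is why I/O-bound servers like Zope benefit despite the GIL.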

Aahz

unread,
Oct 6, 2004, 4:39:13 PM10/6/04
to
In article <6db3f637.04100...@posting.google.com>,

P.M. <p...@cpugrid.net> wrote:
>
>I wonder if Python could be changed to use thread local storage? That
>might allow for multiple interpreter instances without the GIL (I've
>never looked at the actual code so I'm just hypothesizing). I took a
>quick look at Lua today and it has no problems with creating multiple
>instances of the interpreter, so it definitely is a solvable problem.

The problem is that CPython doesn't have thread-local storage. All
objects are created and stored on the heap, and any CPython code can
access any object at any time, thanks to the glories of introspection.
I'm not sure what compromises Jython and IronPython make.

Daniel Dittmar

unread,
Oct 7, 2004, 10:39:44 AM10/7/04
to
P.M. wrote:
> I wonder if Python could be changed to use thread local storage? That
> might allow for multiple interpreter instances without the GIL (I've
> never looked at the actual code so I'm just hypothesizing). I took a
> quick look at Lua today and it has no problems with creating multiple
> instances of the interpreter, so it definitely is a solvable problem.

Most VMs don't use thread local storage, but rather keep all state in
one object that is passed through all the functions. This is probably
solvable in Python.

But this would only allow running multiple Python VMs in the same
process. These VMs couldn't communicate through Python objects, but only
through IPC. Each VM would still require an ILL (Interpreter Local Lock).
It wouldn't be that different from actually having multiple processes.
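The pattern Daniel describes, keeping all state in one explicitly-passed object the way Lua threads a lua_State through its API, looks roughly like this toy sketch (not CPython's actual structure):

```python
class VMState:
    """All interpreter state lives in one object that is passed to
    every operation, instead of in module-level globals."""
    def __init__(self):
        self.stack = []     # the VM's evaluation stack
        self.globals = {}   # the VM's global variables

def run(vm, program):
    # A toy stack machine: every operation touches only the vm passed in,
    # so two VMState instances can run side by side in one process.
    for op, arg in program:
        if op == "push":
            vm.stack.append(arg)
        elif op == "add":
            vm.stack.append(vm.stack.pop() + vm.stack.pop())
        elif op == "store":
            vm.globals[arg] = vm.stack.pop()
    return vm

vm1 = run(VMState(), [("push", 2), ("push", 3), ("add", None), ("store", "x")])
vm2 = run(VMState(), [("push", 10), ("store", "x")])
print(vm1.globals["x"], vm2.globals["x"])   # 5 10
```

Note that the two VMs share no objects at all, which is exactly why they could only talk through IPC.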

Zope wouldn't benefit at all.
Apache with threads + mod_python could benefit, but caching session
state in Python variables becomes much harder.

Daniel

Aahz

unread,
Oct 7, 2004, 11:03:34 AM10/7/04
to
>My question is, how can I best parallelize the running of separate
>autonomous Python scripts within this app? Can I run multiple
>interpreters in separate threads within a single process? In past
>newsgroup messages I've seen advice that the only way to get
>scalability, due to the GIL, is to use an IPC mechanism between
>multiple distinct processes running separate interpreters. Is this
>still true or are there better ways to accomplish this?

I'm not sure where in the thread to hang this, so I went back to the
root to post this reminder:

One critical reason for the GIL is to support CPython's ability to call
random C libraries with little effort. Too many C libraries are not
thread-safe, let alone thread-hot. Forcing libraries that wish to
participate in threading to use Python's GIL-release mechanism is the
only safe approach.

Daniel Dittmar

unread,
Oct 7, 2004, 11:30:59 AM10/7/04
to
Aahz wrote:
> One critical reason for the GIL is to support CPython's ability to call
> random C libraries with little effort. Too many C libraries are not
> thread-safe, let alone thread-hot. Forcing libraries that wish to
> participate in threading to use Python's GIL-release mechanism is the
> only safe approach.

Most (or many) wrappers around C libs are generated by SWIG, boost, SIP
and what not. It can't be that difficult to generate code instead so
that entering extension code acquires a lock end leaving the code
releases it. And writing an extension by hand is verbose enough, having
to add the locking code wouldn't really multiply the effort.

One could even add a flag to definitions of native methods that this
method is reentrant. If this flag isn't set, then the interpreter would
acquire and release the lock.

Daniel

Alex Martelli

unread,
Oct 7, 2004, 12:26:11 PM10/7/04
to
Aahz <aa...@pythoncraft.com> wrote:
...

> The problem is that CPython doesn't have thread-local storage. All

In 2.4 it does -- see threading.local documentation at
<http://www.python.org/dev/doc/devel/lib/module-threading.html> (and
even better, the docstring of the new _threading_local module).


Alex

Daniel Dittmar

unread,
Oct 7, 2004, 12:37:46 PM10/7/04
to
Aahz wrote:
> P.M. <p...@cpugrid.net> wrote:
>
>>I wonder if Python could be changed to use thread local storage? That
>>might allow for multiple interpreter instances without the GIL (I've
>>never looked at the actual code so I'm just hypothesizing). I took a
>>quick look at Lua today and it has no problems with creating multiple
>>instances of the interpreter, so it definately is a solvable problem.
>
>
> The problem is that CPython doesn't have thread-local storage. All

I'm sure P.M. meant that the Python C API uses thread local storage
instead of global/static variables.

Daniel

Aahz

unread,
Oct 7, 2004, 1:02:17 PM10/7/04
to
In article <1glatga.8upk9brifr2qN%ale...@yahoo.com>,
Alex Martelli <ale...@yahoo.com> wrote:

>Aahz <aa...@pythoncraft.com> wrote:
>>
>> The problem is that CPython doesn't have thread-local storage.
>
>In 2.4 it does -- see threading.local documentation at
><http://www.python.org/dev/doc/devel/lib/module-threading.html> (and
>even better, the docstring of the new _threading_local module).

IIUC, that's not thread-local storage in the sense that I'm using the
term (and which I believe is standard usage). Values created with
thread-local storage module are still allocated on the heap, and it's
still possible to use introspection to access thread-local data in
another thread.

Don't get me wrong; I think it's a brilliant addition to Python.
Unfortunately, it doesn't help with the real issues with making the
Python core free-threaded (or anything more fine-grained than the GIL).

Alex Martelli

unread,
Oct 7, 2004, 1:56:10 PM10/7/04
to
Aahz <aa...@pythoncraft.com> wrote:

> In article <1glatga.8upk9brifr2qN%ale...@yahoo.com>,
> Alex Martelli <ale...@yahoo.com> wrote:
> >Aahz <aa...@pythoncraft.com> wrote:
> >>
> >> The problem is that CPython doesn't have thread-local storage.
> >
> >In 2.4 it does -- see threading.local documentation at
> ><http://www.python.org/dev/doc/devel/lib/module-threading.html> (and
> >even better, the docstring of the new _threading_local module).
>
> IIUC, that's not thread-local storage in the sense that I'm using the
> term (and which I believe is standard usage). Values created with
> thread-local storage module are still allocated on the heap, and it's
> still possible to use introspection to access thread-local data in
> another thread.

Is it...? Maybe I'm missing something...:

import threading

made_in_main = threading.local()
made_in_main.foo = 23
def f():
    print 'foo in made_in_main is', getattr(made_in_main, 'foo', None)

print 'in main thread:',
f()
t = threading.Thread(target=f)
print 'in subthread:',
t.start()
t.join()
print 'back in main thread:',
f()

What I see is:

kallisti:~/cb/little_neat_things alex$ python2.4 lots.py
in main thread: foo in made_in_main is 23
in subthread: foo in made_in_main is None
back in main thread: foo in made_in_main is 23

so how does a subthread introspect to 'break the rules'...? (preferably
in a platform-independent way rather than by taking advantage of quirks
of implementation or one or another platform)

> Don't get me wrong; I think it's a brilliant addition to Python.
> Unfortunately, it doesn't help with the real issues with making the
> Python core free-threaded (or anything more fine-grained than the GIL).

I'm not claiming it does, mind you! It's just that it DOES seem to me
to be a "true" implementation of the "thread-specific storage" design
pattern, and I don't know of any distinction between that DP and the
synonym "thread-local storage" (Schmidt et al used both interchangeably,
if I'm not mistaken).


Alex

Aahz

unread,
Oct 7, 2004, 2:57:37 PM10/7/04
to
In article <1glawed.1e83ntp1ck15pvN%ale...@yahoo.com>,

Alex Martelli <ale...@yahoo.com> wrote:
>Aahz <aa...@pythoncraft.com> wrote:
>> In article <1glatga.8upk9brifr2qN%ale...@yahoo.com>,
>> Alex Martelli <ale...@yahoo.com> wrote:
>>>Aahz <aa...@pythoncraft.com> wrote:
>>>>
>>>> The problem is that CPython doesn't have thread-local storage.
>>>
>>>In 2.4 it does -- see threading.local documentation at
>>><http://www.python.org/dev/doc/devel/lib/module-threading.html> (and
>>>even better, the docstring of the new _threading_local module).
>>
>> IIUC, that's not thread-local storage in the sense that I'm using the
>> term (and which I believe is standard usage). Values created with
>> thread-local storage module are still allocated on the heap, and it's
>> still possible to use introspection to access thread-local data in
>> another thread.
>
>Is it...? Maybe I'm missing something...:

If you look at the code for _threading_local, you'll see that it depends
on __getattribute__() to acquire a lock and patch in the current
thread's local state. IIRC from the various threads about securing
Python, there are ways to break __getattribute__(), but I don't remember
any off-hand (and don't have time to research).
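A toy version of that pure-Python approach (using the modern `threading.get_ident` spelling; the real `_threading_local` is considerably more careful about locking and `__dict__` patching) shows both the per-thread dispatch and why introspection can still cross threads:

```python
import threading

class MiniLocal:
    """Toy thread-local: attribute access is intercepted and redirected
    into a per-thread dictionary. Sketch only, not the stdlib code."""
    def __init__(self):
        object.__setattr__(self, "_storage", {})

    def _mine(self):
        # the current thread's slice of the storage
        storage = object.__getattribute__(self, "_storage")
        return storage.setdefault(threading.get_ident(), {})

    def __getattr__(self, name):
        try:
            return self._mine()[name]
        except KeyError:
            raise AttributeError(name)

    def __setattr__(self, name, value):
        self._mine()[name] = value

loc = MiniLocal()
loc.foo = 23
seen = []
t = threading.Thread(target=lambda: seen.append(getattr(loc, "foo", None)))
t.start(); t.join()
print(loc.foo, seen[0])   # 23 None: the subthread never sees main's value

# ...yet introspection still reaches every thread's data, per Aahz's point:
all_data = object.__getattribute__(loc, "_storage")
```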

Alex Martelli

unread,
Oct 7, 2004, 3:13:56 PM10/7/04
to
Aahz <aa...@pythoncraft.com> wrote:
...

> >> still possible to use introspection to access thread-local data in
> >> another thread.
> >
> >Is it...? Maybe I'm missing something...:
>
> If you look at the code for _threading_local, you'll see that it depends
> on __getattribute__() to acquire a lock and patch in the current
> thread's local state. IIRC from the various threads about securing
> Python, there are ways to break __getattribute__(), but I don't remember
> any off-hand (and don't have time to research).

I believe you're thinking of the minimal portable version: I believe
that for such systems as Windows and Linux (together probably 99% of
Python users, and i speak as a Mac fan...!-) there are specific
implementations that use faster and probably unbreakable approaches.


Alex

Bryan Olson

unread,
Oct 8, 2004, 1:47:12 AM10/8/04
to
Alex Martelli wrote:

> Aahz <aa...@pythoncraft.com> wrote:
>>If you look at the code for _threading_local, you'll see that it depends
>>on __getattribute__() to acquire a lock and patch in the current
>>thread's local state. IIRC from the various threads about securing

>>Python, there are ways to break __getattribute__() [...]


>
> I believe you're thinking of the minimal portable version: I believe
> that for such systems as Windows and Linux (together probably 99% of
> Python users, and i speak as a Mac fan...!-) there are specific
> implementations that use faster and probably unbreakable approaches.

Are we looking at the same question? A language that makes safe
threading logically possible is easy; C does that. 'Unbreakable'
is a high standard, but one that a high-level language should
meet. No matter how badly the Python programmer blows it,
Python itself should not crash, nor otherwise punt to arbitrary
behavior.

Now look at implementing Python without a global interpreter
lock (GIL). To work safely, before we rely on any condition,
we must check that the condition is true; for example we do not
access an array element until we've checked that the subscript is
within the current array size. Python does a reasonable job of
safety checking. In the presence of OS-level multi-tasking,
Python could still fail catastrophically. We might be suspended
after the check; then another thread might change the size of
the array; then the former thread might be restored before
accessing the element.
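The hazard just described can be sketched directly in Python (names like `safe_pop_last` are illustrative only, not stdlib API):

```python
import threading

items = list(range(100))
lock = threading.Lock()

def unsafe_last():
    # The check-then-act window: between the length check and the
    # subscript, another thread may shrink the list, so the access
    # can raise IndexError under concurrent mutation.
    if len(items) > 0:
        return items[len(items) - 1]

def safe_pop_last():
    # Holding one lock across check and access closes the window;
    # the GIL gives a similar guarantee, but only per bytecode.
    with lock:
        if items:
            return items.pop()
        return None

workers = [threading.Thread(target=lambda: [safe_pop_last() for _ in range(50)])
           for _ in range(4)]
for w in workers:
    w.start()
for w in workers:
    w.join()
print(len(items))   # 0: 200 locked pops drain all 100 items without error
```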

So what's the solution? A global-interpreter-lock works; that's
what Python has now; but we lose much of the power of multi-
processing systems. We could give every object its own lock,
but the locking overhead would defeat the purpose (and inter-
object conditions can be trickier than they might look).

Maybe every object, at any particular time, belongs to just one
thread. That thread can do what it wants with the object, but
any other has to coordinate before accessing the object. I
don't know the current results on that idea.

Functional programming languages point out a possible facility:
threads can safely share non-updatable values. In Python,
ensuring that immutable objects never contain references to
mutable objects is a bigger change than we might reasonably
expect to impose.


Guido did not impose the GIL lightly. This is a hard problem.


--
--Bryan

Paul Rubin

unread,
Oct 8, 2004, 2:16:36 AM10/8/04
to
Bryan Olson <fakea...@nowhere.org> writes:
> So what's the solution? A global-interpreter-lock works; that's
> what Python has now; but we lose much of the power of multi-
> processing systems. We could give every object its own lock,
> but the locking overhead would defeat the purpose (and inter-
> object conditions can be trickier than they might look).

Giving every object its own lock is basically what Java does. Is it
really so bad, if done right? Acquiring or releasing the lock can be
just one CPU instruction on most reasonable processors. I've been
wondering about this for a while.
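A rough Python analogue of Java's per-object monitor (the `Synchronized` base class is a hypothetical helper for illustration, not anything CPython provides):

```python
import threading

class Synchronized:
    """Every instance carries its own lock, and each 'synchronized'
    method takes it on entry, roughly what Java's object monitors do."""
    def __init__(self):
        self._lock = threading.RLock()

class Counter(Synchronized):
    def __init__(self):
        super().__init__()
        self.value = 0

    def increment(self):
        with self._lock:       # one acquire/release per call: cheap,
            self.value += 1    # but paid on *every* operation

c = Counter()
threads = [threading.Thread(target=lambda: [c.increment() for _ in range(1000)])
           for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(c.value)   # 4000: no lost updates
```

The per-operation cost is exactly the overhead the thread worries about when it is paid on every object access rather than on a few hot ones.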

Alex Martelli

unread,
Oct 8, 2004, 3:22:58 AM10/8/04
to
Bryan Olson <fakea...@nowhere.org> wrote:
...

> Are we looking at the same question? A language that makes safe

Probably not: I'm focusing on (and answering, I hope) the specific
assertion "CPython doesn't have thread-local storage", an assertion
which I think is wrong for 2.4. Even if we do agree it does, that
doesn't necessarily mean such TLS (which CPython exposes to Python
programs) is in the least helpful in implementing CPython itself, of
course; nevertheless it seems reasonable to me to try and answer
assertions which I believe are wrong, even when those assertions may not
be directly relevant to a thread's "Subject".

> Guido did not impose the GIL lightly. This is a hard problem.

Sure. I wonder what (e.g.) IronPython or Ruby do about it -- never
studied the internals of either, yet.


Alex

Aahz

unread,
Oct 8, 2004, 3:40:04 PM10/8/04
to
In article <1glbyrn.4jt08j144wg1dN%ale...@yahoo.com>,
Alex Martelli <ale...@yahoo.com> wrote:

>Bryan Olson <fakea...@nowhere.org> wrote:
>>
>> Guido did not impose the GIL lightly. This is a hard problem.
>
>Sure. I wonder what (e.g.) IronPython or Ruby do about it -- never
>studied the internals of either, yet.

Ruby doesn't have OS-level threading IIRC.

Bryan Olson

unread,
Oct 10, 2004, 12:45:52 AM10/10/04
to
Paul Rubin wrote:

I'm not really up-to-date on modern multi-processor support.
Back in grad school I read some papers on cache coherence, and I
don't know how well the problems have been solved. The issue
was that a single processor can support a one-instruction lock
(in the usual no-contention case) simply by supplying an
uninterruptible read-and-update instruction, but on a multi-
processor, all the processors have to respect the lock.


--
--Bryan

Piet van Oostrum

unread,
Oct 11, 2004, 6:13:53 AM10/11/04
to
>>>>> Bryan Olson <fakea...@nowhere.org> (BO) wrote:

BO> I'm not really up-to-date on modern multi-processor support.
BO> Back in grad school I read some papers on cache coherence, and I
BO> don't know how well the problems have been solved. The issue
BO> was that a single processor can support a one-instruction lock
BO> (in the usual no-contention case) simply by supplying an
BO> uninterruptible read-and-update instruction, but on a multi-
BO> processor, all the processors have to respect the lock.

For multiprocessor systems, uninterruptible isn't enough (depending on your
definition of uninterruptible). Memory operations from other processors
also shouldn't interleave with the instruction. Modern processors usually
have a 'test and set' or 'compare and swap' instruction for this purpose,
which lock memory for the duration of the operation.
--
Piet van Oostrum <pi...@cs.uu.nl>
URL: http://www.cs.uu.nl/~piet [PGP]
Private email: P.van....@hccnet.nl

Luis P Caamano

unread,
Oct 11, 2004, 4:25:23 PM10/11/04
to
aa...@pythoncraft.com (Aahz) wrote in message news:<ck3ls6$62e$1...@panix1.panix.com>...

>
> One critical reason for the GIL is to support CPython's ability to call
> random C libraries with little effort. Too many C libraries are not
> thread-safe, let alone thread-hot.

Very true and I think that's one of the main reasons we still have the
GIL.

> Forcing libraries that wish to
> participate in threading to use Python's GIL-release mechanism is the
> only safe approach.

The ONLY one? That's too strong a statement unless you qualify it.
One problem is that CPython tries to make C library development as
easy as possible to the point of being condescending. IOW, it seems
that CPython assumes that most C library writers are too stupid to
write thread-safe code. However, one could say that if it had always
forced all C libraries to be written thread-safe from scratch, most
library developers would've had a lot of practice by now.

Given enough time, familiarity is a good tool to tame apparently
complex issues.

Unfortunately, it's not that simple or maybe not even true. I suspect
that the real reason C libraries are not required to be thread-safe is
that there's already too many C libraries out there that are NOT
thread safe and too many people think (arguably mistakenly) that fixing
all those libraries to get rid of the GIL is not worth the trouble.

I speculate that the GIL "problem" might have grown to its current
state because CPython grew in parallel with thread programming.
CPython existed way before Posix 1003c was ratified or even before we
had good pthread libraries. In addition, CPython always wanted to be
OS agnostic. In 1995, it would've been possible to write a
CPosixPython with full support for the Posix thread library with
support for all those cool things like CPU affinity, options for
kernel or user threads, thread cancellation, thread local storage,
etc., but it would not have worked on any OS that didn't have a good,
conformant pthreads library, and at that time many didn't.

Therefore, fixing the GIL was always in competition with existing code
and libraries, and it always lost because it was (and still is)
considered "not worth the effort."

If we were going to write a CPython interpreter today, it would be a
lot easier to write the interpreter loop and its data in a manner that
would maximize the use of threads because today thread support is
widely spread among supported OSes.

As a kernel developer and experienced pthreads programmer, the first
time I saw CPython's main interpreter loop in ceval.c my jaw hit the
floor because I couldn't believe why anybody would write a threaded
interpreter that grabs ONE BIG mutex, runs 100 op codes (checkinterval
default), and then releases the mutex. It took a while to finally
understand that the reason is not technical but mostly historical
(IMHO).

I've said it before. One day enough people will think that the GIL is
a problem big enough to warrant a solution, e.g., when the majority of
systems where CPython runs have more than one CPU. Until then we have
to go back to early 90s programming and use IPC (interprocess
communication) to scale applications that want to run PURE python code
on more than one CPU. That's probably the main disagreement I have
with those that think that the GIL is not a big problem, IPC is not a
solution but a workaround.
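As an aside, the IPC route is much less painful today: the stdlib multiprocessing module (added in Python 2.6, well after this thread) gives each worker process its own interpreter and its own GIL, so pure-Python CPU-bound code really does use all cores. A minimal sketch:

```python
from multiprocessing import Pool

def burn(n):
    # CPU-bound pure-Python work; each worker process has its own
    # interpreter and GIL, so four of these run on four cores.
    total = 0
    for i in range(n):
        total += i * i
    return total

if __name__ == "__main__":
    with Pool(processes=4) as pool:
        results = pool.map(burn, [100_000] * 4)
    print(all(r == burn(100_000) for r in results))   # True
```

It is still IPC under the hood (the arguments and results are pickled between processes), so the "workaround" characterization stands; it is just a well-packaged one.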

This is not a problem unique to Python, either. Most major UNIX OSes,
e.g., HPUX, Sun, AIX, etc. went through the same thing when adding
support for SMP in the early through mid 90s. And that was a bigger
problem than CPython's GIL because they had a lot of drivers that were
not SMP safe.

I know the problem is complex and there are other non-technical issues
to consider. However, and I don't mean to oversimplify, here are a
few ideas that might help with the GIL problem:

- Detect multiple CPUs, if single CPU, do not lock globals. That's
what some OSes do to avoid unnecessary SMP penalties on single CPU
systems.

- Create python objects in thread local storage by default, which
don't need locking.

- Rewrite the interpreter loop so that it doesn't grab a BIG lock,
unless configured to do so.

- Let users lock, not the interpreter. If you use threads and you
access global objects without locking, you might get undefined
behaviors, just like in C/C++.

- Allow legacy behavior with a command line option to enable the GIL
and create global objects (or vice versa).

- Require C libraries to tell the interpreter if they are thread
safe or not. If they are not, the interpreter would not load them
unless running in legacy-mode.

- Augment the posix module to include full support for the pthreads
library, e.g., thread cancellation, CPU affinity.


Of course, that's easier said than done and I'm not saying that it can
or should be done now. The point is that getting rid of the GIL is
straightforward, it's been done before many times, but it will not
happen until it's widely viewed as a problem.

Unfortunately, those changes are big enough that I don't think they'll
happen under the CPython source tree even if we all wanted them. More
than likely it will require a separate project, PosixPython perhaps?

I hope one day I'll work for an employer that could afford to donate
some (or all) my time for python development so I could start working
on that. Now, that would be cool.

With a <sigh> and a <wink>,

--
Luis P Caamano

Neil Hodgson

unread,
Oct 11, 2004, 6:01:33 PM10/11/04
to
Luis P Caamano:

> Therefore, fixing the GIL was always in competition with existing code
> and libraries, and it always lost because it was (and still is)
> considered "not worth the effort."

Greg Stein implemented a "free-threaded" version of Python in 1996 and
had another look in 2000.

http://python.tpnet.pl/contrib-09-Dec-1999/System/threading.README
http://mail.python.org/pipermail/python-dev/2000-April/003605.html

So, it is not the initial implementation of a GIL-free Python that is
the stumbling block but the maintenance of the code. The poor performance of
free-threaded Python contributed to the lack of interest.

> That's probably the main disagreement I have
> with those that think that the GIL is not a big problem, IPC is not a
> solution but a workaround.

For me, the GIL works fine because it is released around I/O operations
which are the bottlenecks in the types of application I write.

> - Let users lock, not the interpreter. If you use threads and you
> access global objects without locking, you might get undefined
> behaviors, just like in C/C++.

That will cause problems for much existing code.

> Unfortunately, those changes are big enough that I don't think they'll
> happen under the CPython source tree even if we all wanted them.

It won't happen until the people that think this is a problem are
prepared to provide the effort to maintain free-threaded Python.

Neil


exa...@divmod.com

unread,
Oct 11, 2004, 9:43:17 PM10/11/04
to pytho...@python.org
On Mon, 11 Oct 2004 22:01:33 GMT, "Neil Hodgson" <nhod...@bigpond.net.au> wrote:
>Luis P Caamano:
>
> > Therefore, fixing the GIL was always in competition with existing code
> > and libraries, and it always lost because it was (and still is)
> > considered "not worth the effort."
>
> Greg Stein implemented a "free-threaded" version of Python in 1996 and
> had another look in 2000.
>
> http://python.tpnet.pl/contrib-09-Dec-1999/System/threading.README
> http://mail.python.org/pipermail/python-dev/2000-April/003605.html
>
> So, it is not the initial implementation of a GIL-free Python that is
> the stumbling block but the maintenance of the code. The poor performance of
> free-threaded Python contributed to the lack of interest.

"existing code" refers only partially to the current CPython implementation. Rather more importantly, it refers to the large number of third party extension modules with which it is desired to retain compatibility.

Recompiling CPython isn't a big deal. Recompiling, possibly after partially rewriting, N other packages is.

Jp

Bryan Olson

unread,
Oct 11, 2004, 11:46:15 PM10/11/04
to
Piet van Oostrum wrote:

>>>>>>Bryan Olson <fakea...@nowhere.org> (BO) wrote:
> BO> I'm not really up-to-date on modern multi-processor support.
> BO> Back in grad school I read some papers on cache coherence, and I
> BO> don't know how well the problems have been solved. The issue
> BO> was that a single processor can support a one-instruction lock
> BO> (in the usual no-contention case) simply by supplying an
BO> uninterruptible read-and-update instruction, but on a multi-
BO> processor, all the processors have to respect the lock.
>
> For multiprocessor systems, uninterruptible isn't enough (depending on your
> definition of uninterruptible). Memory operations from other processors
> also shouldn't interleave with the instruction. Modern processors usually
> have a 'test and set' or 'compare and swap' instruction for this purpose,
> which lock memory for the duration of the operation.

I'm with you that far. The reason I mentioned cache coherence
is that locking memory isn't enough on MP systems. Another
processor may have the value in local cache memory. For all the
processors to respect the lock, they have to communicate the
locking before any processor can update the address.

If a certain memory address is used as a lock, every thread can
access it only by instructions that go all the way to main memory.
But then locking every object can have a devastating effect on
speed.


--
--Bryan Olson
(who, incidentlly, hates his initials)

Bengt Richter

unread,
Oct 12, 2004, 1:14:25 AM10/12/04
to
I don't know if smarter instructions are available now, but if a lock
is in a locked state, there is no point in writing locked status value
through to memory, which is what a naive atomic swap would do. The
returned value would just leave you to try again. If n-1 out of n
processors were doing that to a queue lock waiting for something to
be dequeuable, that would be a huge waste of bus bandwidth, maybe
interfering with DMA i/o. I believe the strategy is (or was, if better
instructions are available) to read lock values passively in a short
spin, so that each processor so doing is just looking at its cache value.
Then when the lock is released, that is done by a write through, and
everyone's cache is invalidated and reloaded, and potentially many see the
lock as free. At that point, everyone tries their atomic swap instructions,
writing locked status through to memory, and only one succeeds in getting
the unlocked value back in the atomic swap. As soon as everyone sees locked
again, they go back to passive spinning. The spins don't go on forever, since
multiple threads in one CPU may be waiting for different things. That's a balance
between context switch (within CPU) overhead vs a short spin lock wait. Probably
lock-dependent. Maybe automatically tunable.

The big MP cache hammer comes if multiple CPUs naively dequeue thread work
first-come-first-serve, and they alternate reloading each other's previously
cached thread active memory content. I suspect all of the above is history
long ago by now. It's been a while ;-)
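The spin-then-swap strategy Bengt describes can be sketched in Python, with a Lock standing in for the hardware's atomic compare-and-swap (and a sleep(0) yield that real hardware would not need, only there to keep the GIL from starving the lock holder):

```python
import threading
import time

class AtomicCell:
    """Stands in for a memory word with an atomic compare-and-swap;
    pure Python exposes no CAS instruction, so a Lock plays that role."""
    def __init__(self, value):
        self.value = value
        self._guard = threading.Lock()

    def compare_and_swap(self, expect, new):
        with self._guard:
            if self.value == expect:
                self.value = new
                return True
            return False

class SpinLock:
    UNLOCKED, LOCKED = 0, 1

    def __init__(self):
        self.cell = AtomicCell(self.UNLOCKED)

    def acquire(self):
        while True:
            # Passive spin on plain reads, which on real hardware stay
            # in each processor's cache instead of hammering the bus.
            while self.cell.value == self.LOCKED:
                time.sleep(0)   # yield; a GIL artifact, not part of the idea
            # The lock looks free: everyone races on the atomic swap,
            # and exactly one winner sees True.
            if self.cell.compare_and_swap(self.UNLOCKED, self.LOCKED):
                return

    def release(self):
        self.cell.compare_and_swap(self.LOCKED, self.UNLOCKED)

counter = 0
lk = SpinLock()

def work():
    global counter
    for _ in range(200):
        lk.acquire()
        counter += 1
        lk.release()

threads = [threading.Thread(target=work) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter)   # 400: mutual exclusion held, no lost increments
```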

Regards,
Bengt Richter

Nicolas Lehuen

unread,
Oct 12, 2004, 4:40:45 AM10/12/04
to
ale...@yahoo.com (Alex Martelli) wrote in message news:<1glbyrn.4jt08j144wg1dN%ale...@yahoo.com>...

> > Guido did not impose the GIL lightly. This is a hard problem.
>
> Sure. I wonder what (e.g.) IronPython or Ruby do about it -- never
> studied the internals of either, yet.
>
>
> Alex

I've just asked the question on the IronPython mailing list - I'll
post the answer here. I've also asked about Jython since Jim Hugunin
worked on it.

Just as a note, coming from a Java background (I switched to Python
two years ago and nearly never looked back - my current job allows
this, I'm a happy guy), I must say that there is a certain deficit of
thread-awareness in the Python community at large (no offense meant).

The stdlib is actually quite poor on the thread-safe side, and whereas
I see people rush to implement Doug Lea's concurrent package
[http://gee.cs.oswego.edu/dl/classes/EDU/oswego/cs/dl/util/concurrent/intro.html]
on new languages like D, it seems that it is not a priority to the
Python community (maybe a point in the 'develop the stdlib vs develop
the interpreter and language' debate). A symptom of this is that
must-have classes like threading.RLock() are still implemented in
Python! This should definitely be written in C for the sake of
performance.
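For reference, the pure-Python flavor of RLock amounts to roughly this much bookkeeping per acquire/release. A bare sketch in the same spirit (using the modern threading.get_ident spelling, not the actual stdlib code; later CPython versions did grow a C implementation):

```python
import threading

class MiniRLock:
    """Minimal reentrant lock: a plain Lock plus owner/count bookkeeping
    done at the Python level, the overhead argued about above."""
    def __init__(self):
        self._block = threading.Lock()
        self._owner = None
        self._count = 0

    def acquire(self):
        me = threading.get_ident()
        if self._owner == me:      # already ours: just bump the count
            self._count += 1
            return
        self._block.acquire()
        self._owner = me
        self._count = 1

    def release(self):
        if self._owner != threading.get_ident():
            raise RuntimeError("cannot release un-owned lock")
        self._count -= 1
        if self._count == 0:
            self._owner = None
            self._block.release()

lock = MiniRLock()
lock.acquire(); lock.acquire()   # reentrant: the same thread may nest
lock.release(); lock.release()
```

Every one of those attribute lookups and comparisons is interpreted bytecode, which is exactly the per-call cost a C implementation removes.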

I ran into many multi-threading problems in mod_python as well, and the
problem was the same ; people expected mod_python to run in a
multi-process context, not a multi-threaded context (I guess this is
due to a Linux-centered mindset, forgetting about BSD, MacosX or Win32
OSes). When I asked questions and pointed out problems, the answer was 'Duh
- use Linux with the forking MPM, not Windows with the threading MPM'.
Well, this is not going to take us anywhere, especially with all the
multicore CPUs coming.

Java, C#, D and many other languages prove that threading is not
necessarily a complicated issue. It can be tricky, granted, but where
would be the fun if it was not? In a former life, I have
implemented an application server and had to write a big bunch of
code, always with thread awareness in mind. I guarantee you that
provided your language gives you the proper support (I was using
Java), it's a real treat :).

With all respect due to the Founding Fathers (and most of all Guido),
there are quite a few languages out there that do not require a GIL...
I understand that historical implementation issues meant that a GIL
was required (we talk about a language whose implementation began in
the early nineties), but maybe it is time to reconsider this in Python
3000 ? And while I'm on this iconoclast binge, why not forget about
reference counting and just switch to plain garbage collecting
everywhere, thus clearing a lot of the mess we have to deal with when
writing C extensions ? Plus, a full garbage collection model can
actually help with threading issues, and vice versa.

I know a lot of people who fled Python for two reasons. The first
reason is a bad one, it's the indentation-as-syntax reason. My answer
is 'just try it'. The second reason is concerns about performance and
scalability, especially from those who heard about the GIL. Well, all
I can answer is 'That will improve over time, look at Psyco, for
instance'. But as far as the GIL is concerned, well... I'm as worried
as them. Please, please, could we change this?

Regards,

Nicolas Lehuen

Nicolas Lehuen

unread,
Oct 12, 2004, 5:07:57 AM10/12/04
to

"Neil Hodgson" <nhod...@bigpond.net.au> a écrit dans le message de news:1pDad.22529$5O5....@news-server.bigpond.net.au...

> Luis P Caamano:
>
> > Therefore, fixing the GIL was always in competition with existing code
> > and libraries, and it always lost because it was (and still is)
> > considered "not worth the effort."
>
> Greg Stein implemented a "free-threaded" version of Python in 1996 and
> had another look in 2000.
>
> http://python.tpnet.pl/contrib-09-Dec-1999/System/threading.README
> http://mail.python.org/pipermail/python-dev/2000-April/003605.html
>
> So, it is not the initial implementation of a GIL-free Python that is
> the stumbling block but the maintenance of the code. The poor performance of
> free-threaded Python contributed to the lack of interest.

Yeah, and the poor performance of the ActiveState Python for .NET implementation led them to conclude that the CLR could not implement Python efficiently. Turns out that the problem was in the implementation, not the CLR, and that IronPython performs pretty well.

It's not because an implementation gives poor performance that the whole free-threaded concept is bad.

Plus, the benchmark is highly questionable. Compare a single-threaded CPU-intensive program with a multi-threaded CPU-intensive program, and the single-threaded program wins. Now compare IO-intensive (especially network IO) programs, and the multi-threaded program wins. Granted, you can build high-performing asynchronous frameworks (Twisted or Medusa come to mind), but sooner or later you'll have to use threads (see the DB adapter for Twisted).

Regards,

Nicolas Lehuen

Richie Hindle

unread,
Oct 12, 2004, 5:37:10 AM10/12/04
to pytho...@python.org

[Nicolas, arguing for a free-threading Python]

> compare IO intensive (especially network IO) programs, the multi-threaded
> program wins.

The GIL is released during network IO. Multithreaded network-bound
Python programs can take full advantage of multiple CPUs already.

(You may already know this and I'm missing a point somewhere, but it's
worth repeating for the benefit of those that don't know.)
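Richie's point is easy to see from pure Python. A sketch (using time.sleep as a stand-in for a blocking socket read, since it releases the GIL the same way while it waits):

```python
import threading
import time

def blocking_io():
    # time.sleep, like a blocking socket read, releases the GIL while
    # it waits, so the other threads keep running in the meantime.
    time.sleep(0.2)

start = time.time()
threads = [threading.Thread(target=blocking_io) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
elapsed = time.time() - start

# The four 0.2s waits overlap instead of serializing into 0.8s.
print("elapsed: %.2fs" % elapsed)
```

If the GIL were held across the blocking calls, the four waits would run one after another and take four times as long.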

--
Richie Hindle
ric...@entrian.com

Alex Martelli

unread,
Oct 12, 2004, 5:23:19 AM10/12/04
to
Nicolas Lehuen <nic...@lehuen.com> wrote:

> problem was the same ; people expected mod_python to run in a
> multi-process context, not a multi-threaded context (I guess this is
> due to a Linux-centered mindset, forgetting about BSD, MacosX or Win32
> OSes). When I asked questions and pointed problem, the answer was 'Duh
> - use Linux with the forking MPM, not Windows with the threading MPM'.

Sorry, I don't get your point. Sure, Windows makes process creation
hideously expensive and has no forking. But all kinds of BSD, including
MacOSX, are just great at forking. Why is a preference for multiple
processes over threads "forgetting about BSD, MacOSX", or any other
flavour of Unix for that matter?

> Well, this is not going to take us anywhere, especially with all the
> multicore CPUs coming.

Again, I don't get it. Why would multicore CPUs be any worse than
current multi-CPU machines at multiple processes, and forking?


Alex

Nicolas Lehuen

unread,
Oct 12, 2004, 9:05:03 AM10/12/04
to

"Alex Martelli" <ale...@yahoo.com> a écrit dans le message de news:1gljja1.1nxj82c1a25c1bN%ale...@yahoo.com...

> Nicolas Lehuen <nic...@lehuen.com> wrote:
>
> > problem was the same ; people expected mod_python to run in a
> > multi-process context, not a multi-threaded context (I guess this is
> > due to a Linux-centered mindset, forgetting about BSD, MacosX or Win32
> > OSes). When I asked questions and pointed problem, the answer was 'Duh
> > - use Linux with the forking MPM, not Windows with the threading MPM'.
>
> Sorry, I don't get your point. Sure, Windows makes process creation
> hideously expensive and has no forking. But all kinds of BSD, including
> MacOSX, are just great at forking. Why is a preference for multiple
> processes over threads "forgetting about BSD, MacOSX", or any other
> flavour of Unix for that matter?

Because when you have multithreaded programs, you can easily share objects between different threads, provided you carefully implement them. In the web framework I wrote, this means sharing and reusing the same DB connection pool, template cache, other caches and so on. This means a reduced memory footprint and increased performance. In a multi-process environment, you have to instantiate as many connections, caches, templates etc. as you have processes. This is a waste of time and memory.

BTW [shameless plug] here is the cookbook recipe I wrote about thread-safe caching.

http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/302997
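The general shape of such a thread-safe cache looks something like this (a sketch of the idea, not the recipe itself): hits on already-built entries take no lock at all, and a single lock with a re-check guards construction on a miss.

```python
import threading

class ThreadSafeCache(object):
    """Sketch of a thread-safe cache: one lock guards insertion, while
    hits on existing entries need no locking (dict reads are atomic
    under the GIL)."""

    def __init__(self, build):
        self._build = build          # factory called on a cache miss
        self._lock = threading.Lock()
        self._entries = {}

    def get(self, key):
        try:
            return self._entries[key]    # fast path: no lock on a hit
        except KeyError:
            pass
        with self._lock:
            # re-check: another thread may have built it while we waited
            if key not in self._entries:
                self._entries[key] = self._build(key)
            return self._entries[key]

calls = []
cache = ThreadSafeCache(lambda k: calls.append(k) or k.upper())
print(cache.get("spam"), cache.get("spam"), len(calls))  # SPAM SPAM 1
```

The re-check inside the lock is what makes it safe: two threads can both miss, but only the first to take the lock actually builds the entry.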



> > Well, this is not going to take us anywhere, especially with all the
> > multicore CPUs coming.
>
> Again, I don't get it. Why would multicore CPUs be any worse than
> current multi-CPU machines at multiple processes, and forking?

Obviously they won't be any worse. Well, to be precise, it still depends on the OS, because the scheduler must know the difference between 2 processors and a 2-core processor to efficiently balance the work, but anyway.

What I meant is that right now I'm writing this on a desktop PC with hyperthreading. This means that even on a desktop PC you can benefit from having multithreaded (or multi-processed) applications. With multicore and multiprocessor machines becoming more and more common, the pressure to have proper threading support in Python will grow and grow.

Regards,

Nicolas

Nicolas Lehuen

unread,
Oct 12, 2004, 9:08:46 AM10/12/04
to
"Richie Hindle" <ric...@entrian.com> a écrit dans le message de news:mailman.4742.1097573...@python.org...

Yeah I know this. My point was that when benchmarking single-threaded programs vs multithreaded ones, you have to choose the kind of program you use : CPU-intensive vs IO-intensive. On a single-CPU machine, a CPU-intensive program will always run better in a single-threaded model. But of course, if you have 2 procs and a nice threading model, you can even do 2 CPU-intensive tasks simultaneously, which Python cannot do if I've understood everything so far.

Regards,

Nicolas

Nicolas Lehuen

unread,
Oct 12, 2004, 10:59:17 AM10/12/04
to

"Alex Martelli" <ale...@yahoo.com> a écrit dans le message de news:1gljufr.9d1d5319us58iN%ale...@yahoo.com...

> Nicolas Lehuen <nicolas...@thecrmcompany.com> wrote:
>
> > "Alex Martelli" <ale...@yahoo.com> a écrit dans le message de
> news:1gljja1.1nxj82c1a25c1bN%ale...@yahoo.com...
> > > Nicolas Lehuen <nic...@lehuen.com> wrote:
> > >
> > > > problem was the same ; people expected mod_python to run in a
> > > > multi-process context, not a multi-threaded context (I guess this is
> > > > due to a Linux-centered mindset, forgetting about BSD, MacosX or Win32
> > > > OSes). When I asked questions and pointed problem, the answer was 'Duh
> > > > - use Linux with the forking MPM, not Windows with the threading MPM'.
> > >
> > > Sorry, I don't get your point. Sure, Windows makes process creation
> > > hideously expensive and has no forking. But all kinds of BSD, including
> > > MacOSX, are just great at forking. Why is a preference for multiple
> > > processes over threads "forgetting about BSD, MacOSX", or any other
> > > flavour of Unix for that matter?
> >
> > Because when you have multithreaded programs, you can easily share objects
> between different threads, provided you carefully implement them. On the
> web framework I wrote, this means sharing and reusing the same DB
> connection pool, template cache, other caches and so on. This means a
> reduced memory footprint and increased performance. In a multi-process
> environment, you have to instantiate as many connections, caches,
> templates etc. that you have processes. This is a waste of time and
> memory.
>
> I'm not particularly interested in debating the pros and cons of threads
> vs processes, right now, but in getting a clarification of your original
> assertion which I _still_ don't get. It's still quoted up there, so
> please DO clarify: how would "a Linux-centered mindset forgetting about
> BSD" (and MacOSX is a BSD in all that matters in this context, I'd say)
> bias one against multi-threading or towards multi-processing? In what
> ways are you claiming that Unix-like systems with BSD legacy are
> inferior in multi-processing, or superior in multi-threading, to ones
> with Linux kernels? I think I understand both families decently well
> and yet I _still_ don't understand whence this claim is coming. (Forget
> the red herring of Win32 -- nobody's disputing that _their_ process
> spawning is a horror of inefficiency -- let's focus about Unixen, since
> you did see fit to mention them so explicitly contrasted, hm?!).

Wow, I don't want to launch any OS flame war there. My point is just that I have noticed that the vast majority of people running mod_python are running it on Apache 2 on Linux with the forking MPM. Hence, multi-threading problems that you DO encounter if you use the threading MPM are often dismissed because, hey, everybody uses the forking MPM in which a single thread handles all the requests it is given by the parent process. Now, when I discussed this problem on the mod_python mailing list, I heard from people who would like to use the multithreading MPM under MacOS X (which is a BSD indeed). Just to say that people running Apache 2 on Win32 are not the only ones interested.

To sum up : people running mod_python under Linux don't have any multithreading issues. They represent 95% of mod_python's market (pure guess). So the multithreading issues don't have much chance of being fixed soon. That is why I said that a "Linux-centered mindset forgetting [other OSes, none in particular]" is hindering the bugfix process.

> _Then_, once that point is cleared, we may (e.g.) debate how the newest
> Linux VM development (presumably coming in 2.6.9) may make mmap quite as
> fast as sharing memory among threads (which, I gather from hearsay only,
> is essentially the case today already... but _only_ for machines with no
> more than 2GB, 3GB tops of physical memory being so shared -- the claim
> is that the newest developments will let you have upwards of 256 GB of
> physical memory shares with similar efficiency, by doing away with the
> current pagetables overhead of mmap), and what (if anything) is there in
> BSD-ish kernels to match that accomplishments. But until you clarify
> that (to me) strange and confusing assertion, to help me understand what
> point you were making there (and nothing in this "answer" of yours is at
> all addressing my doubts and my very specific question about it!), I see
> little point in trying to delve into such exoterica.

I do hope the point above is cleared.

Now, you propose to share objects between Python VMs using shared memory. Why not, if it is correctly implemented (I've seen it done in Gemstone for Java, IIRC); I'd be as happy with this as I am when sharing objects between threads. The trouble is that you'll have exactly the same problems, if not more. You'll have to implement the same locking primitives. You'll have to make sure that all the stdlib and extensions are ready to support objects that are shared this way. All the trouble we have now with multiple threads, you'll have with multiple processes. And you may have big portability issues on top of that. I don't see any benefit vs the work already done, even if tiny, on the multithreading support.

>
> > BTW [shameless plug] here is the cookbook recipe I wrote about thread-safe
> caching.
> >
> > http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/302997
>

> I do not see any relevance of that recipe (with which I'm quite
> familiar, since I'm preparing the 2nd Edition of the Cookbook) to your
> assertion with Linux on one side, and BSD derivatives grouped with
> Windows on the other, quoted, questioned and never clarified above.

Well, that was a shameless plug... But this recipe allows you to build, for example, module caches that can safely be shared between threads, with minimum locking. I encountered a bug in mod_python on the subject, which I fixed using it, so I thought it might be illustrative.

> > > Again, I don't get it. Why would multicore CPUs be any worse than
> > > current multi-CPU machines at multiple processes, and forking?
> >
> > Obviously they won't be any worse. Well, to be precise, it still depends
> on the OS, because the scheduler must know the difference between 2
> processors and a 2-core processor to efficiently balance the work, but
> anyway.
>

> As far as I know there is no special support in current kernels for
> multi-core CPUs as differentiated from multiple CPUs sharing external
> buses... (nor even yet, unless I'm mistaken, for such fundamental
> novelties as HyperTransport, aka DirectConnect, which _has_ been around
> for quite a while now -- and points to a completely different paradigm
> than shared memory as being potentially much-faster IPC... bandwidths of
> over 10 gigabytes/second, which poor overworked memory subsystems might
> well have some trouble reaching... how one would exploit hypertransport
> within multiple threads of a process, programmed on the basis of sharing
> memory, I dunno -- using it between separate processes which exchange
> messages appears conceptually simpler). Anyway, again to the limited
> amount of my current knowledge, this holds just as much for multiple
> threads as for multiple processes, no?

I am way beyond my competences here, but I've read some articles about hyperthreading-aware schedulers (in WinXP, Win2003, and patches for Linux). The idea is that on multi-core CPUs, threads from the same process should be run on the same core for maximum cache efficiency, whereas different processes can freely run on different cores. I've read that you can effectively have worse performance on multicore CPUs if the kernel scheduler does not know about HT. I cannot find the articles again, but there are a bunch referenced on Google :

http://www.google.com/search?q=hyperthread+scheduler

But apart from this caveat, yes, multi-threads and multi-processes application equally benefit from multi-core CPUs.



>
> > What I meant is that right now I'm writing this on a desktop PC with
> hyperthreading. This means that even on a desktop PC you can benefit
> from having multithreaded (or multi-processed) applications.
>

> I'm writing this on my laptop (uniprocessor, no quirks), because I'm on
> a trip, but at home I do have a dual-processor desktop (actually a
> minitower, but many powerful 'desktops' are that way), and it's a
> year-old model (and Apple was making dual processors for years before
> the one I own, though with 32-bit 'G4' chips rather than 64-bit 'G5'
> ones they're using now). So this is hardly news: I can run make on a
> substantial software system much faster with a -j switch to let it spawn
> multiple jobs (processes).


>
> > With multicore or multiprocessor machines being more and more current, the
> pressure to have proper threading support in Python will grow and grow.
>

> The pressure has been growing for a while and I concur it will keep
> growing, particularly since the OS by far most widespread on desktops
> has such horrible features for multiple-process spawning and control.
> But, again, before we go on to debate this, I would really appreciate it
> if you clarified your previous assertions, to help me understand why you
> believe that BSD derivatives, including Mac OS X, are to be grouped on
> the same side as Windows, while only Linux would favour processes over
> threads -- when, to _me_, it seems so obvious that the reasonable
> grouping is with all Unix-like systems on one side, Win on the other.
> You either know something I don't, about the internals of these systems,
> and it appears so obvious to you that you're not even explaining it now
> that I have so specifically requested you to explain; or there is
> something else going on that I really do not understand.

I hope the point is now clear : it's just that some people using MacOS X seemed as interested as me in having decent multi-threading support in mod_python. If more people used a multi-threading MPM on Linux, their reaction would be the same. I'm really not religious about OSes, no offence meant.

Best regards,

Nicolas

exa...@divmod.com

unread,
Oct 12, 2004, 9:50:03 AM10/12/04
to pytho...@python.org
On Tue, 12 Oct 2004 15:08:46 +0200, "Nicolas Lehuen" <nicolas...@thecrmcompany.com> wrote:
> [snip]

>
> Yeah I know this. My point was that when benchmarking single-threaded
> programs vs multithreaded ones, you have to choose the kind of program
> you use : CPU-intensive vs IO-intensive. On a single CPU machine,
> CPU-intensive program will always run better in a single-threaded model.
> But of course, if you have 2 procs and a nice threading model, you can
> even do 2 CPU-intensive tasks simultaneously, which Python cannot do if
> I've understood everything so far.

The way to do this is to write an extension module, implement your CPU-intensive tasks there, and release the GIL around them. You can argue that this is not "in Python", but it is possible, widely practiced, supported by the language, developer community, and existing APIs, and lets you adjust Python's threading behavior _specific to your application_, rather than requiring every Python program written by every Python programmer to comply with your wishes for the language.
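This is exactly what several stdlib extension modules already do. A sketch using zlib (whose C-level compress releases the GIL in CPython) as a stand-in for a custom CPU-intensive extension:

```python
import threading
import zlib

data = b"the quick brown fox " * 50000   # ~1 MB of compressible input

results = []

def compress_chunk():
    # zlib.compress is C code that drops the GIL while it runs,
    # so these threads can genuinely overlap on multiple CPUs.
    results.append(zlib.compress(data, 9))

threads = [threading.Thread(target=compress_chunk) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# Both threads produced a valid compressed copy of the input.
print(all(zlib.decompress(r) == data for r in results))
```

The same Py_BEGIN_ALLOW_THREADS / Py_END_ALLOW_THREADS pattern is available to any extension author, which is Jp's point: the GIL only serializes the time actually spent interpreting bytecode.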

Jp

exa...@divmod.com

unread,
Oct 12, 2004, 9:37:35 AM10/12/04
to pytho...@python.org
On Tue, 12 Oct 2004 11:07:57 +0200, "Nicolas Lehuen" <nicolas...@thecrmcompany.com> wrote:
>
> "Neil Hodgson" <nhod...@bigpond.net.au> a écrit dans le message de
> news:1pDad.22529$5O5....@news-server.bigpond.net.au...
> > Luis P Caamano:
> >
> > > Therefore, fixing the GIL was always in competition with existing code
> > > and libraries, and it always lost because it was (and still is)
> > > considered "not worth the effort."
> >
> > Greg Stein implemented a "free-threaded" version of Python in 1996 and
> > had another look in 2000.
> >
> > http://python.tpnet.pl/contrib-09-Dec-1999/System/threading.README
> > http://mail.python.org/pipermail/python-dev/2000-April/003605.html
> >
> > So, it is not the initial implementation of a GIL-free Python that is
> > the stumbling block but the maintenance of the code. The poor performance of
> > free-threaded Python contributed to the lack of interest.
>
> Yeah, and the poor performance of the ActiveState Python for .NET
> implementation led them to tell that the CLR could not be efficient in
> implementing Python. Turns out that the problem was in the
> implementation, not the CLR, and that IronPython gives pretty good
> performances.
>
> It's not because an implementation gives poor performance that the whole
> free-threaded concept is bad.
>
> Plus, the benchmarks is highly questionable. Compare a single threaded
> CPU intensive program with a multi-threaded CPU intensive program, the
> single threaded program wins. Now compare IO intensive (especially
> network IO) programs, the multi-threaded program wins. Granted, you can
> built high performing asynchronous frameworks (Twisted or medusa come to
> mind), but sooner or later you'll have to use threads (see the DB
> adapter for Twisted).

Uh, no. Threads are useful for supporting libraries with blocking APIs. Nothing about database access requires threads. Twisted uses threads because DB-API 2.0 specifies a blocking API, and it is useful to support DB-API 2.0 modules.

Several SQL modules provide an asynchronous API which can be used without threads.

When it comes to network-bound tasks, single-threaded wins the performance contest.

Jp

Nicolas Lehuen

unread,
Oct 12, 2004, 11:08:29 AM10/12/04
to
<exa...@divmod.com> a écrit dans le message de news:mailman.4751.1097589...@python.org...

And then I have to tackle all these ugly INCREFs and DECREFs and forget about using a marvellous language. I'm sorry but if I switched from Java to Python, it's not to write C again. I did, though, since I wrote a ternary search tree in C with a SWIG interface to Python, but I won't do it again if I can help it.

Building a language is about bringing a service to the developers. Every language has its limitations, but since it has been proved that other languages can handle threading more gracefully, I think the question can be opened without being deferred to another language which is NOT very kind to developers (putting the obligatory "I'm a C h4cker" snobbery aside).

I want to write Python extensions in D !

Regards,
Nicolas

Alex Martelli

unread,
Oct 12, 2004, 11:55:03 AM10/12/04
to
Nicolas Lehuen <nicolas...@thecrmcompany.com> wrote:
...

> > > > Sorry, I don't get your point. Sure, Windows makes process creation
> > > > hideously expensive and has no forking. But all kinds of BSD, including
> > > > MacOSX, are just great at forking. Why is a preference for multiple
> > > > processes over threads "forgetting about BSD, MacOSX", or any other
> > > > flavour of Unix for that matter?
...

> mod_python's market (pure guess). So the multithreading issues have not
> many chances of being fixed soon. That is why I said that a
> "Linux-centered mindset forgetting [other OSes, none in particular]" is
> hindering the bugfix process.
...

> I do hope the point above is cleared.

It's basically "retracted", as I see things, except that whatever's left
still doesn't make any sense to me. Why should a Linux user need to
"forget" another system, that's just as perfect for multiprocessing as
Linux, before deciding he's got better things to do with his or her time
than work on a problem which multiprocessing finesses?

> Now, you propose to share objects between Python VMs using shared memory.

Not necessarily -- as you say, if you need synchronization the problem
is just about as hard (maybe even worse). I was just trying to
understand the focus on multithreading vs multiprocessing, it now
appears it's more of a focus on the shared-memory paradigm of
multiprocessing in general.


> > > Obviously they won't be any worse. Well, to be precise, it still depends
> > on the OS, because the scheduler must know the difference between 2
> > processors and a 2-core processor to efficiently balance the work, but

...


> I am way beyond my competences here, but I've read some articles about
> hyperthreading-aware schedulers (in WinXP, Win2003, and patches for
> Linux). The idea is that on multi-core CPUs, threads from the same
> process should be ran on the same core for maximum cache efficiency,
> whereas different processes can freely run on different cores. I've read

And how is this a difference between two processors and two cores within
the same processor, which is what I quoted you above as saying? If two
CPUs do not share caches, the CPU-affinity issues (of processing units
that share address spaces vs ones that don't) would appear to be the
same. If two CPUs share some level of cache (as some multi-CPU designs
do), that's different from the case where the CPUs share no cache but to
share RAM.

> But apart from this caveat, yes, multi-threads and multi-processes
> application equally benefit from multi-core CPUs.

So it would seem to me, yes -- except that if CPUs share caches this may
help (perhaps) the use of shared memory (if the cache design is optimized
for that and the scheduler actively helps), even though even in that
case it doesn't seem to me you can reach the same bandwidth that
hypertransport promises for a more streaming/message-passing approach.


Alex

Nicolas Lehuen

unread,
Oct 12, 2004, 11:23:10 AM10/12/04
to
<exa...@divmod.com> a écrit dans le message de news:mailman.4747.1097588...@python.org...

> > Plus, the benchmarks is highly questionable. Compare a single threaded
> > CPU intensive program with a multi-threaded CPU intensive program, the
> > single threaded program wins. Now compare IO intensive (especially
> > network IO) programs, the multi-threaded program wins. Granted, you can
> > built high performing asynchronous frameworks (Twisted or medusa come to
> > mind), but sooner or later you'll have to use threads (see the DB
> > adapter for Twisted).
>
> Uh, no. Threads are useful for supporting libraries with blocking APIs. Nothing about database access requires threads. Twisted uses threads because DB-API 2.0 specifies a blocking API, and it is useful to support DB-API 2.0 modules.
>
> Several SQL modules provide an asynchronous API which can be used without threads.
>
> When it comes to network-bound tasks, single-threaded wins the performance contest.
>
> Jp

OK, how many SQL modules with an asynchronous API are there in Python ?

You are perfectly right : "Threads are useful for supporting libraries with blocking APIs." Blocking APIs are far more common in Python and other languages than asynchronous APIs.

Many people (including me) find it easier to write and debug synchronous code than asynchronous code. As I've written elsewhere, building a language is about bringing a service to developers. So either we make it easy to write asynchronous code (e.g. include Twisted's Deferred in the stdlib) and add functionality to debuggers to be able to track asynchronous code (because, well, the basic programming model in a debugger is synchronous call stacks), or we build better multi-threading support in Python. Or we could stick to assembly code, use "jump" all over the place and forget about building fancy languages.
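For illustration, the core of the Deferred idea fits in a few lines (a toy sketch, not Twisted's actual class): a placeholder for a result that does not exist yet, which queues callbacks and runs them in order once the result arrives.

```python
class Deferred(object):
    """Minimal stand-in for Twisted's Deferred: callbacks added before
    the result exists are queued; once fired, they run in order, each
    receiving the previous one's return value."""

    def __init__(self):
        self._callbacks = []
        self._fired = False
        self._result = None

    def add_callback(self, fn):
        if self._fired:
            self._result = fn(self._result)   # fire immediately
        else:
            self._callbacks.append(fn)        # queue for later
        return self

    def callback(self, result):
        self._fired = True
        self._result = result
        for fn in self._callbacks:
            self._result = fn(self._result)

d = Deferred()
d.add_callback(lambda rows: len(rows))
d.add_callback(lambda n: "%d rows" % n)
d.callback(["alice", "bob"])       # the "query" completes here
print(d._result)                   # 2 rows
```

The real Twisted class adds error handling (errbacks) and chaining, but this is the shape of the programming model being debated.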

Regards,
Nicolas

Nicolas Lehuen

unread,
Oct 12, 2004, 12:57:50 PM10/12/04
to

"Alex Martelli" <ale...@yahoo.com> a écrit dans le message de news:1glk0w7.wac0yl4laurwN%ale...@yahoo.com...

> Nicolas Lehuen <nicolas...@thecrmcompany.com> wrote:
> ...
> > > > > Sorry, I don't get your point. Sure, Windows makes process creation
> > > > > hideously expensive and has no forking. But all kinds of BSD, including
> > > > > MacOSX, are just great at forking. Why is a preference for multiple
> > > > > processes over threads "forgetting about BSD, MacOSX", or any other
> > > > > flavour of Unix for that matter?
> ...
> > mod_python's market (pure guess). So the multithreading issues have not
> > many chances of being fixed soon. That is why I said that a
> > "Linux-centered mindset forgetting [other OSes, none in particular]" is
> > hindering the bugfix process.
> ...
> > I do hope the point above is cleared.
>
> It's basically "retracted", as I see thing, except that whatever's left
> still doesn't make any sense to me. Why should a Linux user need to
> "forget" another system, that's just as perfect for multiprocessing as
> Linux, before deciding he's got better things to do with his or her time
> than work on a problem which multiprocessing finesses>

Because a Linux user could still be interested in sharing data amongst different workers - be it processes or threads. Things that are taken for granted in the Java world, like a good connection pool, are still not being used, even though they are really interesting from a performance and management point of view.



> > Now, you propose to share objects between Python VMs using shared memory.
>
> Not necessarily -- as you say, if you need synchronization the problem
> is just about as hard (maybe even worse). I was just trying to
> understand the focus on multithreading vs multiprocessing, it now
> appears it's more of a focus on the shared-memory paradigm of
> multiprocessing in general.

Exactly. I don't care about multithreading or multiprocessing; I want to share data between my workers. It's just more easily done with threads, as of today. The connection pool example is a good one : how can you easily share TCP connections to a DBMS server between processes ? With threads it's a matter of a few lines of Python code (I'll publish my connection pool code on the Python Cookbook soon).
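It really is a few lines. A minimal sketch of such a pool (not the code mentioned above), built on the stdlib queue module, which handles all the locking; the connect() stand-in is hypothetical, a real one would open a DB connection:

```python
import queue   # named Queue in the Python 2.x of this thread's era

class ConnectionPool(object):
    """Sketch of a thread-shared pool: Queue does all the locking, and
    threads block in acquire() when every connection is checked out."""

    def __init__(self, connect, size):
        self._pool = queue.Queue(size)
        for _ in range(size):
            self._pool.put(connect())

    def acquire(self):
        return self._pool.get()       # blocks until a connection is free

    def release(self, conn):
        self._pool.put(conn)

# Hypothetical connect() stand-in; a real one would open a DB socket.
pool = ConnectionPool(lambda: object(), size=2)
c1 = pool.acquire()
c2 = pool.acquire()
pool.release(c1)
c3 = pool.acquire()               # reuses c1 rather than opening a new one
print(c3 is c1)                   # True
```

Sharing the same thing between processes means either duplicating the connections per process or inventing an IPC protocol around them, which is Nicolas's point.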

> > > > Obviously they won't be any worse. Well, to be precise, it still depends
> > > on the OS, because the scheduler must know the difference between 2
> > > processors and a 2-core processor to efficiently balance the work, but
> ...
> > I am way beyond my competences here, but I've read some articles about
> > hyperthreading-aware schedulers (in WinXP, Win2003, and patches for
> > Linux). The idea is that on multi-core CPUs, threads from the same
> > process should be ran on the same core for maximum cache efficiency,
> > whereas different processes can freely run on different cores. I've read
>
> And how is this a difference between two processors and two cores within
> the same processor, which is what I quoted you above as saying? If two
> CPUs do not share caches, the CPU-affinity issues (of processing units
> that share address spaces vs ones that don't) would appear to be the
> same. If two CPUs share some level of cache (as some multi-CPU designs
> do), that's different from the case where the CPUs share no cache but to
> share RAM.
>
> > But apart from this caveat, yes, multi-threads and multi-processes
> > application equally benefit from multi-core CPUs.
>
> So it would seem to me, yes -- except that if CPUs share caches this may
> help (perhaps) used of shared memory (if the cache design is optimized
> for that and the scheduler actively helps), even though even in that
> case it doesn't seem to me you can reach the same bandwidth that
> hypertransport promises for a more streaming/message passing approach.
>
>
> Alex

The trick is that there are many levels of internal or external cache, and IIRC from my readings the two logical processors of a Pentium 4 with HT share some level of internal cache.

But again, I'm pretty much outsmarted on the subject, so I'll call it quits on the subject of hyperthreading and its compared impact on the threaded model vs the fork model :)

Best regards,

Nicolas

Nicolas Lehuen

unread,
Oct 12, 2004, 9:39:20 PM10/12/04
to
Aahz wrote:
> BTW, Nicolas, it'd be nice if you did not post quoted-printable. I'm
> leaving the crud in so you can see it.

I write non-ASCII emails, since I'm French (using accented characters
and all that); what encoding would you suggest I use ? AFAIK,
quoted-printable is a standard encoding (though a poor one). The MUA I
used may have chosen a bad encoding, since it was Outlook Express. At
home, I use Thunderbird 0.8; tell me if you feel better about the
encoding it uses.

> In article <416bd75f$0$10909$afc3...@news.easynet.fr>,
> Nicolas Lehuen <nicolas...@thecrmcompany.com> wrote:
>
>>Yeah I know this. My point was that when benchmarking single-threaded =
>>programs vs multithreaded ones, you have to choose the kind of program =
>>you use : CPU-intensive vs IO-intensive. On a single CPU machine, =


>>CPU-intensive program will always run better in a single-threaded

model. =
>>But of course, if you have 2 procs and a nice threading model, you can =
>>even do 2 CPU-intensive tasks simultaneously, which Python cannot do if =


>>I've understood everything so far.
>
>

> Python itself cannot, but there's nothing preventing anyone from writing
> numeric extensions that release the GIL.

There is nothing preventing anyone from writing a new language from
scratch, either. What I was suggesting was that we could try to make
Python a more thread-friendly language, so that we could write more
CPU-efficient code in the new context of multicore or multi-CPU machines
without having to drop down to a low-level language like C.

If only I could tell the Python Interpreter 'OK, this bit of Python code
is thread-safe, so you don't need to hold the GIL on it', maybe I could
be more efficient, but as of today, this is a thing that can only be
done in C, not in Python. Of course, being able to release the GIL from
Python code is not a true solution. A true solution would be to get rid
of the GIL itself.

Regards,

Nicolas


Jon Perez

unread,
Oct 25, 2004, 9:43:30 PM10/25/04
to
http://poshmodule.sf.net

POSH (Python Object sharing) allows Python objects
to be placed in shared memory and accessed transparently
between different Python processes.

This removes the need to futz around with IPC mechanisms
and lets you use Python objects (placed in shared memory)
between processes in much the same way that you use them
in threads.

"On multiprocessor architectures, multi-process applications
using POSH can significantly outperform similar multi-threaded
applications, since Python threads don't scale to take advantage
of multiple processors. Even so, POSH lends itself to a
programming model very similar to threads."
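[Editor's note: the low-level mechanism POSH builds on — memory shared across fork() — is available from the stdlib via an anonymous mmap. The toy sketch below is not POSH's API, just the underlying trick: an anonymous mapping created before fork() is visible to both parent and child, so the child's write shows up in the parent. POSIX-only.]

```python
import mmap
import os
import struct

# Anonymous shared mapping (MAP_SHARED on Unix): children created by
# fork() see the same 8 bytes of memory as the parent.
shared = mmap.mmap(-1, 8)

pid = os.fork()
if pid == 0:
    # Child: write a value into the shared region and exit.
    shared[0:8] = struct.pack("q", 42)
    os._exit(0)

os.waitpid(pid, 0)
value, = struct.unpack("q", shared[0:8])
print(value)  # 42: the child's write is visible in the parent
```

POSH's contribution is everything layered on top of this: a shared-memory allocator plus proxy objects, so you manipulate ordinary-looking Python objects instead of raw bytes.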

POSH has actually been around for quite a while, so I wonder
why it has not been mentioned more often.


P.M. wrote:

> ... I've seen advice that the only way to get
> scalability, due to the GIL, is to use an IPC mechanism between
> multiple distinct processes running seperate interpreters. Is this
> still true or are there better ways to accomplish this?
>
> Thanks
> Peter
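[Editor's note: the "IPC mechanism between multiple distinct processes" that the quoted question mentions needs nothing beyond the stdlib. A minimal POSIX-only sketch: fork one child interpreter per job (each with its own GIL, so they genuinely run on separate CPUs) and ship each result back through a pipe as a pickle. The `busy`/`spawn`/`collect` names are illustrative, not from any library.]

```python
import os
import pickle

def busy(n):
    # Pure-Python CPU-bound work; each process has its own GIL.
    total = 0
    for i in range(n):
        total += i * i
    return total

def spawn(func, arg):
    """Fork a child interpreter to run func(arg); returns (pid, read_fd)."""
    r, w = os.pipe()
    pid = os.fork()
    if pid == 0:                      # child: compute, pickle result, exit
        os.close(r)
        with os.fdopen(w, "wb") as out:
            pickle.dump(func(arg), out)
        os._exit(0)
    os.close(w)                       # parent keeps only the read end
    return pid, r

def collect(pid, r):
    """Read the child's pickled result off the pipe and reap the child."""
    with os.fdopen(r, "rb") as inp:
        result = pickle.load(inp)
    os.waitpid(pid, 0)
    return result

# Start both children first so they run concurrently, then gather results.
jobs = [spawn(busy, 200_000) for _ in range(2)]
results = [collect(pid, r) for pid, r in jobs]
print(results)
```

The obvious cost, and POSH's selling point, is that everything crossing the pipe must be serialized; shared-memory objects avoid that copy.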

Andrew Dalke

unread,
Oct 26, 2004, 1:46:51 AM10/26/04
to
Jon Perez wrote:
> POSH (Python Object sharing) allows Python objects
> to be placed in shared memory and accessed transparently
> between different Python processes.
...

> POSH has actually been around for quite a while, so I wonder
> why it has not been mentioned more often.

How often should it be mentioned? I mentioned it twice
early this month. The previous mentions were in August,
then April, then January, then ...

Huh. But I see I mention it the most, so I'm biased.

The major problem I have with it is its use of Intel-specific
assembly, generated (as I recall) through gcc-specific inline
code. I run OS X.

So the only people who would use it are Python developers
running on a multi-proc Intel machine with Linux/*BSD
who have compute-intensive jobs and aren't interested
in portability. And don't prefer the dozen+ other more
mature/robust solutions.

I suspect that might be a small number.

Andrew
da...@dalkescientific.com

Alex Martelli

unread,
Oct 26, 2004, 3:22:36 AM10/26/04
to
Andrew Dalke <ada...@mindspring.com> wrote:
...

> The major problem I have with it is its use of Intel
> specific assembly, generated (I as I recall) through
> gcc-specific inline code. I run OS X.

Ouch -- and here I was, all rarin' to download and try it on my dual G5
powermac. Ah well! Tx for saving me the trouble, Andrew.

While intel-like CPUs are surely >10 times more widespread than ppc-like
ones _in general_, I think that's not the ratio for SMPs, given that
dual-G4 and dual-G5 powermacs have been sold in such large numbers over
the years. Add to that the general popularity of Macs in the Python
community, and anybody interested in multi-CPU use of Python might want
to consider how much wider an audience he'd get if he supported MacOSX
as well as intelloids. Andrew and I are just two examples of
Pythonistas who'd plunge to try out a POSH-like approach _iff_ it
supported our powermacs, but may not bother as it doesn't (I confess
I've never gone into enough depth on G4/G5 machine code/assembly, yet,
to be of much help _porting_ some intelloid-specific solution... my
[dated] assembly expertise is on intelloids and very different
architectures such as 6502, S/370, and VAX...).


Alex
