
The Future of Python Threading


Justin T.

Aug 10, 2007, 6:01:51 AM
Hello,

While I don't pretend to be an authority on the subject, a few days of
research has led me to believe that a discussion needs to be started
(or continued) on the state and direction of multi-threading in Python.

Python is not multi-threading friendly. Any code that deals with the
Python interpreter must hold the global interpreter lock (GIL). This
has the effect of serializing (to a certain extent) all Python-
specific operations; i.e., any thread that is written purely in Python
will release the GIL only at particular (and possibly non-optimal)
times. Currently that's the rather arbitrary quantum of 100 bytecode
instructions. Since the OS can only schedule a Python thread when that
thread is able to acquire the lock, Python threads do not benefit from
a good scheduler in the same way that real OS threads do, even though
Python threads are supposed to be a thin wrapper around real OS
threads [1].
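A concrete aside: the 100-instruction quantum mentioned above is a tunable interpreter setting, not a constant. A quick probe (hedging: modern CPython later replaced the bytecode count with a time slice, so this checks for both spellings):

```python
import sys

# The thread-switch quantum is adjustable. In the CPython of this era it
# was a bytecode count (default 100, via sys.setcheckinterval); later
# versions replaced it with a time slice (sys.setswitchinterval).
if hasattr(sys, "getcheckinterval"):      # Python 2.x / early 3.x
    print("check interval (bytecodes):", sys.getcheckinterval())
else:                                     # CPython 3.2 and later
    print("switch interval (seconds):", sys.getswitchinterval())
```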

The detrimental effects of the GIL have been discussed several times,
and nobody has ever done anything about it. This is because the GIL
isn't really that bad right now. The GIL isn't held that much, and
pthreads spawned by Python-C interactions (i.e., those that reside in
extensions) can do all their processing concurrently as long as they
aren't dealing with Python data. What this means is that Python
multithreading isn't really broken as long as Python is thought of as
a convenient way of manipulating C. After all, 100 bytecode
instructions go by pretty quickly, so the GIL isn't really THAT
invasive.

Python, however, is much more than a convenient way of manipulating
C. Python provides a simple language which can be implemented in any
way, so long as its promised behaviors are preserved. We should take
advantage of that.

The truth is that the future (and present reality) of almost every
form of computing is multi-core, and there is currently no effective
way of dealing with concurrency. We still worry about setting up
threads, synchronization of message queues, synchronization of shared
memory regions, dealing with asynchronous behaviors, and, most
importantly, how threaded an application should be. All of this is
possible to do manually in C, but it's hardly optimal. For instance,
at compile time you have no idea if your library is going to be
running on a machine with 1 processor or 100. Knowing that makes a
huge difference in architecture: 200 threads might run fine on the
100-core machine but thrash the single processor to death. Thread
pools help, but they need to be set up and initialized, and there are
very few good thread pool implementations meant for generic use.
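To make the thread-pool point concrete, here is a hand-rolled minimal pool; it is a sketch, not a recommended library, and the `queue` module spelling is the modern one (the Python of this era spelled it `Queue`):

```python
import threading
import queue

def worker(tasks, results):
    # Pull callables off the shared queue until a sentinel says stop.
    while True:
        job = tasks.get()
        if job is None:
            break
        results.put(job())

def run_pool(jobs, n_threads=4):
    tasks, results = queue.Queue(), queue.Queue()
    threads = [threading.Thread(target=worker, args=(tasks, results))
               for _ in range(n_threads)]
    for t in threads:
        t.start()
    for job in jobs:
        tasks.put(job)
    for _ in threads:
        tasks.put(None)          # one shutdown sentinel per worker
    for t in threads:
        t.join()
    return [results.get() for _ in jobs]

out = run_pool([lambda i=i: i * i for i in range(10)])
print(sorted(out))  # -> [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
```

Even this toy needs explicit sizing (`n_threads`), startup, and shutdown, which is exactly the setup burden being complained about.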

It is my feeling that there is no better way of dealing with dynamic
threading than to use a dynamic language. Stackless Python has shown
that clever manipulation of the stack can dramatically improve
concurrent performance in a single thread. Stackless revolves around
tasklets, which are a nearly universal concept.

For those who don't follow experimental Python implementations,
Stackless essentially provides an integrated scheduler for "green
threads" (tasklets): extremely lightweight snippets of code that can
be run concurrently. It even provides a nice way of messaging between
the tasklets.
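Stackless itself is not importable in stock CPython, but the tasklet idea can be sketched with plain generators: each generator is a green thread that yields at its cooperative switch points, and a trivial round-robin scheduler interleaves them. This is only an analogy, not the Stackless API:

```python
from collections import deque

def scheduler(tasklets):
    # Round-robin over "tasklets": generators that yield at cooperative
    # switch points, much as Stackless tasklets call schedule().
    ready = deque(tasklets)
    trace = []
    while ready:
        task = ready.popleft()
        try:
            trace.append(next(task))
            ready.append(task)       # still alive: requeue at the back
        except StopIteration:
            pass                     # finished: drop it
    return trace

def tasklet(name, steps):
    for i in range(steps):
        yield (name, i)              # yield == cooperative switch point

trace = scheduler([tasklet("a", 2), tasklet("b", 2)])
print(trace)  # -> [('a', 0), ('b', 0), ('a', 1), ('b', 1)]
```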

When you think about it, lots of object-oriented code can be
organized as tasklets. After all, encapsulation provides an
environment where the side effects of running functions can be
minimized, and such code is thus somewhat easily parallelized (with
respect to other objects). Functional programming is, of course,
ideal, but it's hardly the trendy thing these days. Maybe that will
change when people realize how much easier it is to test and
parallelize.

What these seemingly unrelated thoughts come down to is a perfect
opportunity for Python to become THE next-generation language. It is
already far more advanced than almost every other language out there.
By integrating Stackless into an architecture where tasklets can be
divided over several parallel threads, Python will be able to
capitalize on performance gains that will have people using it just
for its performance, rather than that being the excuse not to use it.

The nice thing is that this requires a fairly doable amount of work.
First, Stackless should be integrated into the core. Then there should
be an effort to remove Python threading's reliance on the GIL. After
that, advanced features like moving tasklets among threads should be
explored. I can imagine a world where a single Python web application
is able to redistribute its millions of requests among thousands of
threads without the developer ever having had to plan for that scale.
An efficient and natively multi-threaded implementation of Python will
be invaluable as cores continue to multiply like rabbits.

There has been much discussion on this in the past [2]. Those
discussions, I feel, were premature. Now that Stackless is mature (and
continuation-free!), Py3k is in full swing, and parallel programming
has been fully recognized as THE next big problem for computer
science, the time is ripe for discussing how we will approach
multi-threading in the future.

Justin

[1] I haven't actually looked at the GIL code. It's possible that it
creates a bunch of wait queues for each nice level that a python
thread is running at and just wakes up the higher priority threads
first, thus maintaining the nice values determined by the scheduler,
or something. I highly doubt it. I bet every python thread gets an
equal chance of getting that lock despite whatever patterns the
scheduler may have noticed.

[2]
http://groups.google.com/group/comp.lang.python/browse_thread/thread/7d16083cf34a706f/858e64a2a6a5d976?q=stackless%2C+thread+paradigm&lnk=ol&
http://www.stackless.com/pipermail/stackless/2003-June/000742.html
... More that I lost, just search this group.

Jean-Paul Calderone

Aug 10, 2007, 6:52:13 AM
to pytho...@python.org
On Fri, 10 Aug 2007 10:01:51 -0000, "Justin T." <jmtu...@gmail.com> wrote:
>Hello,
>
>While I don't pretend to be an authority on the subject, a few days of
>research has led me to believe that a discussion needs to be started
>(or continued) on the state and direction of multi-threading python.
>
> [snip - threading in Python doesn't exploit hardware level parallelism
> well, we should incorporate stackless and remove the GIL to fix this]
>

I think you have a misunderstanding of what greenlets are. Greenlets are
essentially a non-preemptive user-space threading mechanism. They do not
allow hardware level parallelism to be exploited.

In a hypothetical Python interpreter, equivalent to current CPython except
for the removal of the GIL, you could certainly have greenlets running in
different pthreads, and thus take care of hardware parallelism, but you
could do the same without greenlets too.

So Stackless Python is an unrelated matter.

>There has been much discussion on this in the past [2]. Those
>discussions, I feel, were premature. Now that stackless is mature (and
>continuation free!), Py3k is in full swing, and parallel programming
>has been fully realized as THE next big problem for computer science,
>the time is ripe for discussing how we will approach multi-threading
>in the future.

Many of the discussions rehash the same issues as previous ones. Many
of them are started based on false assumptions or are discussions between
people who don't have a firm grasp of the relevant issues.

I don't intend to suggest that no improvements can be made in this area of
Python interpreter development, but it is a complex issue and cheerleading
will only advance the cause so far. At some point, someone needs to write
some code. Stackless is great, but it's not the code that will solve this
problem.

In the meantime, you might consider some multi-process solutions. There
are a number of tools for getting concurrency like that.
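One hedged sketch of the multi-process route (the `multiprocessing` module shown here postdates this thread; in 2007 the same idea was served by third-party packages such as Pyro and Parallel Python):

```python
import multiprocessing

def square(n):
    # Runs in a separate worker process, so the parent interpreter's
    # GIL never serializes these computations.
    return n * n

if __name__ == "__main__":
    with multiprocessing.Pool(processes=4) as pool:
        print(pool.map(square, range(10)))
```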

Jean-Paul

Steve Holden

Aug 10, 2007, 6:57:20 AM
to pytho...@python.org
Justin T. wrote:
> Hello,
>
> While I don't pretend to be an authority on the subject, a few days of
> research has led me to believe that a discussion needs to be started
> (or continued) on the state and direction of multi-threading python.
[...]

> What these seemingly unrelated thoughts come down to is a perfect
> opportunity to become THE next generation language. It is already far
> more advanced than almost every other language out there. By
> integrating stackless into an architecture where tasklets can be
> divided over several parallelizable threads, it will be able to
> capitalize on performance gains that will have people using python
> just for its performance, rather than that being the excuse not to use
> it.
>
Aah, the path to world domination. You know you don't *have* to use
Python for *everything*.

> The nice thing is that this requires a fairly doable amount of work.
> First, stackless should be integrated into the core. Then there should
> be an effort to remove the reliance on the GIL for python threading.
> After that, advanced features like moving tasklets amongst threads
> should be explored. I can imagine a world where a single python web
> application is able to redistribute its millions of requests amongst
> thousands of threads without the developer ever being aware that the
> application would eventually scale. An efficient and natively multi-
> threaded implementation of python will be invaluable as cores continue
> to multiply like rabbits.
>

Be my guest, if it's so simple.

> There has been much discussion on this in the past [2]. Those
> discussions, I feel, were premature. Now that stackless is mature (and
> continuation free!), Py3k is in full swing, and parallel programming
> has been fully realized as THE next big problem for computer science,
> the time is ripe for discussing how we will approach multi-threading
> in the future.
>

I doubt that a thread on c.l.py is going to change much. It's the
python-dev and py3k lists where you'll need to take up the cudgels,
because I can almost guarantee nobody is going to take the GIL out of
2.6 or 2.7.

>
> [1] I haven't actually looked at the GIL code. It's possible that it
> creates a bunch of wait queues for each nice level that a python
> thread is running at and just wakes up the higher priority threads
> first, thus maintaining the nice values determined by the scheduler,
> or something. I highly doubt it. I bet every python thread gets an
> equal chance of getting that lock despite whatever patterns the
> scheduler may have noticed.
>

It's possible that a little imp tosses a coin and decides which thread
gets to run next. The point of open source is that it's easy to dispel
ignorance and make your own changes if you are competent to do so. Talk
is cheap, code is costly. Your bet is worth nothing. Is it even possible
to run threads of the same process at different priority levels on all
platforms?

Anyone who's been around Python for a while is familiar with the issues.
Have you actually asked Chris Tismer whether he's happy to see Stackless
go into the core? He was far from certain that would be a good idea at
PyCon earlier this year. He probably has his reasons ...

regards
Steve
--
Steve Holden +1 571 484 6266 +1 800 494 3119
Holden Web LLC/Ltd http://www.holdenweb.com
Skype: holdenweb http://del.icio.us/steve.holden
--------------- Asciimercial ------------------
Get on the web: Blog, lens and tag the Internet
Many services currently offer free registration
----------- Thank You for Reading -------------

Paul Boddie

Aug 10, 2007, 7:26:24 AM
On 10 Aug, 12:01, "Justin T." <jmtull...@gmail.com> wrote:
>
> While I don't pretend to be an authority on the subject, a few days of
> research has led me to believe that a discussion needs to be started
> (or continued) on the state and direction of multi-threading python.
>
> Python is not multi-threading friendly.

Yes it is: Jython and IronPython support free threading; CPython,
however, does not. Meanwhile, take a look at this page for different
flavours of alternative solutions:

http://wiki.python.org/moin/ParallelProcessing

Paul

Bjoern Schliessmann

Aug 10, 2007, 7:59:58 AM
Justin T. wrote:

> The detrimental effects of the GIL have been discussed several
> times and nobody has ever done anything about it.

Also it has been discussed that dropping the GIL concept requires
very fine-grained locking mechanisms inside the interpreter to keep
data serialised. The overhead of managing those fine-grained locks
properly is not at all negligible, and I think it's greater than the
overhead of managing the GIL.

> The truth is that the future (and present reality) of almost every
> form of computing is multi-core,

Is it? 8)

The question is: If it really was, how much of useful performance
gain would you get?

Regards,


Björn

--
BOFH excuse #336:

the xy axis in the trackball is coordinated with the summer solstice

kyos...@gmail.com

Aug 10, 2007, 9:20:31 AM
On Aug 10, 5:57 am, Steve Holden <st...@holdenweb.com> wrote:
> Justin T. wrote:
> > Hello,

>
> > The nice thing is that this requires a fairly doable amount of work.
> > First, stackless should be integrated into the core. Then there should
> > be an effort to remove the reliance on the GIL for python threading.
> > After that, advanced features like moving tasklets amongst threads
> > should be explored. I can imagine a world where a single python web
> > application is able to redistribute its millions of requests amongst
> > thousands of threads without the developer ever being aware that the
> > application would eventually scale. An efficient and natively multi-
> > threaded implementation of python will be invaluable as cores continue
> > to multiply like rabbits.
>
> Be my guest, if it's so simple.
>
>
>
> regards
> Steve

I've been watching this Intel TBB thing and it sounds like it might be
useful for threading. This fellow claims that he's going to play with
SWIG and may use it to create bindings to TBB for Python, although
he's starting with Perl:

http://softwareblogs.intel.com/2007/07/26/threading-building-blocks-and-c/

Dunno what the license ramifications are though.

Mike

Aahz

Aug 10, 2007, 9:36:23 AM
In article <mailman.1826.1186743...@python.org>,

Steve Holden <st...@holdenweb.com> wrote:
>
>I doubt that a thread on c.l.py is going to change much. It's the
>python-dev and py3k lists where you'll need to take up the cudgels,
>because I can almost guarantee nobody is going to take the GIL out of
>2.6 or 2.7.

Actually, python-ideas is the right place for this. Threads have been
hashed and rehashed over and over and over and over and over again
through the years; python-dev and python-3000 should not get cluttered
because another Quixote rides up.
--
Aahz (aa...@pythoncraft.com) <*> http://www.pythoncraft.com/

"And if that makes me an elitist...I couldn't be happier." --JMS

Ben Finney

Aug 10, 2007, 10:38:39 AM
"Justin T." <jmtu...@gmail.com> writes:

> The truth is that the future (and present reality) of almost every
> form of computing is multi-core, and there currently is no effective
> way of dealing with concurrency.

Your post seems to take threading as the *only* way to write code for
multi-core systems, which certainly isn't so.

Last I checked, multiple processes can run concurrently on multi-core
systems. That's a well-established way of structuring a program.

> We still worry about setting up threads, synchronization of message
> queues, synchronization of shared memory regions, dealing with
> asynchronous behaviors, and most importantly, how threaded an
> application should be.

All of which is avoided by designing the program to operate as
discrete processes communicating via well-defined IPC mechanisms.
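A minimal sketch of that style, using nothing but a POSIX fork and a pipe (so it assumes a Unix-like system): the two processes share no memory at all and exchange bytes over a well-defined channel.

```python
import os

# Two processes sharing nothing but a pipe: the child computes, the
# parent reads the result as bytes. No locks, no shared state.
r, w = os.pipe()
pid = os.fork()
if pid == 0:                      # child process
    os.close(r)
    os.write(w, str(sum(range(100))).encode())
    os._exit(0)
else:                             # parent process
    os.close(w)
    result = int(os.read(r, 64))
    os.waitpid(pid, 0)
    print(result)                 # -> 4950
```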

--
\ "I can picture in my mind a world without war, a world without |
`\ hate. And I can picture us attacking that world, because they'd |
_o__) never expect it." -- Jack Handey |
Ben Finney

king kikapu

Aug 10, 2007, 11:04:07 AM
> All of which is avoided by designing the program to operate as
> discrete processes communicating via well-defined IPC mechanisms.

Hi Ben,

I would like to learn more about this. Have you got any links to give
me so I can have a look?

Cameron Laird

Aug 10, 2007, 10:14:26 AM
In article <5i329uF...@mid.individual.net>,

Bjoern Schliessmann <usenet-mail-03...@spamgourmet.com> wrote:
>Justin T. wrote:
>
>> The detrimental effects of the GIL have been discussed several
>> times and nobody has ever done anything about it.
>
>Also it has been discussed that dropping the GIL concept requires
>very fine locking mechanisms inside the interpreter to keep data
>serialised. The overhead managing those fine ones properly is not
>at all negligible, and I think it's more than managing GIL.
>
>> The truth is that the future (and present reality) of almost every
>> form of computing is multi-core,
>
>Is it? 8)
.
.
.
I reinforce some of these points slightly:
A. An effective change to the GIL impacts not just
Python in the sense of the Reference Manual, but
also all its extensions. That has the potential
to be a big, big cost; and
B. At least part of the attention given multi-core
processors--or, perhaps more accurately, the
claims that multi-cores deserve threading re-
writes--smells like vendor manipulation.

Nick Craig-Wood

Aug 10, 2007, 11:30:06 AM
Bjoern Schliessmann <usenet-mail-03...@spamgourmet.com> wrote:
> Justin T. wrote:
>
> > The detrimental effects of the GIL have been discussed several
> > times and nobody has ever done anything about it.
>
> Also it has been discussed that dropping the GIL concept requires
> very fine locking mechanisms inside the interpreter to keep data
> serialised. The overhead managing those fine ones properly is not
> at all negligible, and I think it's more than managing GIL.

That is certainly true. However, the point is that running on 2 CPUs
at once at 95% efficiency is much better than running on only 1 at
99%...

> > The truth is that the future (and present reality) of almost every
> > form of computing is multi-core,
>
> Is it? 8)

Intel, AMD and Sun would have you believe that yes!

> The question is: If it really was, how much of useful performance
> gain would you get?

The Linux kernel has been through these growing pains already... SMP
support was initially done with the Big Kernel Lock (BKL), which is
exactly equivalent to the GIL.

The Linux kernel has since moved on to finer and finer grained locking.

I think one important point to learn from the Linux experience is to
make the locking a compile-time choice. You can build a uniprocessor
Linux kernel with all the locking taken out, or you can build a
multi-processor kernel with fine-grained locking.

I'd like to see a python build as it is at the moment and a python-mt
build which has the GIL broken down into a lock on each object.
python-mt would certainly be slower for non-threaded tasks, but it
would be quicker for threaded tasks on multiple-CPU computers.

The user could then choose which python to run.

This would of course make C extensions more complicated...
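What "a lock on each object" might look like from Python, as a rough illustration only (the real change would live inside the interpreter, in C): each instance carries its own lock, so threads touching different objects never contend with each other.

```python
import threading

class FineGrained:
    # Each instance carries its own lock -- the "lock on each object"
    # idea. Threads mutating different objects take different locks.
    def __init__(self):
        self._lock = threading.Lock()
        self.value = 0

    def incr(self, n):
        for _ in range(n):
            with self._lock:
                self.value += 1

a, b = FineGrained(), FineGrained()
ts = [threading.Thread(target=obj.incr, args=(5000,))
      for obj in (a, b) for _ in range(2)]
for t in ts:
    t.start()
for t in ts:
    t.join()
print(a.value, b.value)  # -> 10000 10000
```

The per-mutation lock acquisition is also where the overhead Bjoern mentions comes from: every `+=` now pays for a lock round-trip.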

--
Nick Craig-Wood <ni...@craig-wood.com> -- http://www.craig-wood.com/nick

Ben Sizer

Aug 10, 2007, 11:37:34 AM
On 10 Aug, 15:38, Ben Finney <bignose+hates-s...@benfinney.id.au>
wrote:

> "Justin T." <jmtull...@gmail.com> writes:
> > The truth is that the future (and present reality) of almost every
> > form of computing is multi-core, and there currently is no effective
> > way of dealing with concurrency.
>
> Your post seems to take threading as the *only* way to write code for
> multi-core systems, which certainly isn't so.
>
> Last I checked, multiple processes can run concurrently on multi-core
> systems. That's a well-established way of structuring a program.

It is, however, almost always more complex and slower-performing.

Plus, it's underdocumented. Most academic study of concurrent
programming, while referring to the separately executing units as
'processes', almost always assumes a shared memory space and the
associated primitives that go along with it.

> > We still worry about setting up threads, synchronization of message
> > queues, synchronization of shared memory regions, dealing with
> > asynchronous behaviors, and most importantly, how threaded an
> > application should be.
>
> All of which is avoided by designing the program to operate as
> discrete processes communicating via well-defined IPC mechanisms.

Hardly. Sure, so you don't have to worry about contention over objects
in memory, but it's still completely asynchronous, and there will
still be a large degree of waiting for the other processes to respond,
and you have to develop the protocols to communicate. Apart from
convenient serialisation, Python doesn't exactly make IPC easy, unlike
Java's RMI for example.

--
Ben Sizer

brad

Aug 10, 2007, 12:08:32 PM
Justin T. wrote:
> Hello,
>
> While I don't pretend to be an authority on the subject, a few days of
> research has led me to believe that a discussion needs to be started
> (or continued) on the state and direction of multi-threading python.

This is all anecdotal... threads in Python work great for me. I like
Ruby's green threads too, but I find py threads to be more robust. We've
written a tcp scanner (threaded connects) and can process twenty ports
on two /16s (roughly 171K hosts) in about twenty minutes. I'm happy with
that. However, that's all I've ever really used threads for, so I'm
probably less of an expert than you are :) I guess it comes down to what
you're doing with them.
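A stripped-down sketch of that kind of scanner (illustrative only, not brad's actual code): one thread per probe, a plain TCP connect with a timeout, results collected under a lock.

```python
import socket
import threading

def check_port(host, port, open_ports, lock):
    # A plain TCP connect probe; connect_ex returns 0 on success
    # instead of raising, which keeps the worker simple.
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.settimeout(1.0)
    if s.connect_ex((host, port)) == 0:
        with lock:
            open_ports.append(port)
    s.close()

def scan(host, ports):
    open_ports, lock = [], threading.Lock()
    threads = [threading.Thread(target=check_port,
                                args=(host, p, open_ports, lock))
               for p in ports]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sorted(open_ports)
```

A real scanner would bound the number of live threads and rate-limit; this omits both for brevity.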

Brad

brad

Aug 10, 2007, 12:12:17 PM
brad wrote:

> This is all anecdotal... threads in Python work great for me. I like
> Ruby's green threads too,

I forgot to mention that Ruby is moving to a GIL over green threads in v2.0

Aahz

Aug 10, 2007, 12:12:53 PM
In article <slrnfbou9...@irishsea.home.craig-wood.com>,

Nick Craig-Wood <ni...@craig-wood.com> wrote:
>
>This would of course make C extensions more complicated...

It's even worse than that. One of the goals for Python is to make it
easy to call into random libraries, and there are still plenty around
that aren't thread-safe (not even talking about thread-hot).

Chris Mellon

Aug 10, 2007, 12:13:30 PM
to pytho...@python.org
On 8/10/07, Ben Sizer <kyl...@gmail.com> wrote:
> On 10 Aug, 15:38, Ben Finney <bignose+hates-s...@benfinney.id.au>
> wrote:
> > "Justin T." <jmtull...@gmail.com> writes:
> > > The truth is that the future (and present reality) of almost every
> > > form of computing is multi-core, and there currently is no effective
> > > way of dealing with concurrency.
> >
> > Your post seems to take threading as the *only* way to write code for
> > multi-core systems, which certainly isn't so.
> >
> > Last I checked, multiple processes can run concurrently on multi-core
> > systems. That's a well-established way of structuring a program.
>
> It is, however, almost always more complex and slower-performing.
>
> Plus, it's underdocumented. Most academic study of concurrent
> programming, while referring to the separately executing units as
> 'processes', almost always assume a shared memory space and the
> associated primitives that go along with that.
>

This is simply not true. Firstly, there's a well defined difference
between 'process' and a 'thread' and that is that processes have
private memory spaces. Nobody says "process" when they mean threads of
execution within a shared memory space and if they do they're wrong.

And no, "most" academic study isn't limited to shared memory spaces.
In fact, almost every improvement in concurrency has been moving
*away* from simple shared memory - the closest thing to it is
transactional memory, which is like shared memory but with
transactional semantics instead of simple sharing. Message passing and
"shared-nothing" concurrency are very popular and extremely effective,
both in performance and reliability.

There's nothing "undocumented" about IPC. It's been around as a
technique for decades. Message passing is as old as the hills.

> > > We still worry about setting up threads, synchronization of message
> > > queues, synchronization of shared memory regions, dealing with
> > > asynchronous behaviors, and most importantly, how threaded an
> > > application should be.
> >
> > All of which is avoided by designing the program to operate as
> > discrete processes communicating via well-defined IPC mechanisms.
>
> Hardly. Sure, so you don't have to worry about contention over objects
> in memory, but it's still completely asynchronous, and there will
> still be a large degree of waiting for the other processes to respond,
> and you have to develop the protocols to communicate. Apart from
> convenient serialisation, Python doesn't exactly make IPC easy, unlike
> Java's RMI for example.
>

There's nothing that Python does to make IPC hard, either. There's
nothing in the standard library yet, but you may be interested in Pyro
(http://pyro.sf.net) or Parallel Python
(http://www.parallelpython.com/). It's not erlang, but it's not hard
either. At least, it's not any harder than using threads and locks.

Justin T.

Aug 10, 2007, 12:37:19 PM
On Aug 10, 3:52 am, Jean-Paul Calderone <exar...@divmod.com> wrote:

> On Fri, 10 Aug 2007 10:01:51 -0000, "Justin T." <jmtull...@gmail.com> wrote:
> >Hello,
>
> >While I don't pretend to be an authority on the subject, a few days of
> >research has led me to believe that a discussion needs to be started
> >(or continued) on the state and direction of multi-threading python.
>
> > [snip - threading in Python doesn't exploit hardware level parallelism
> > well, we should incorporate stackless and remove the GIL to fix this]
>
> I think you have a misunderstanding of what greenlets are. Greenlets are
> essentially a non-preemptive user-space threading mechanism. They do not
> allow hardware level parallelism to be exploited.

I'm not an expert, but I understand that much. What greenlets do is
force the programmer to think about concurrent programming. It doesn't
force them to think about real threads, which is good, because a
computer should take care of that for you. Greenlets are nice because
they can run concurrently, but they don't have to. This means you can
safely divide them up among many threads. You could not safely do this
with just any old Python program.

>
> >There has been much discussion on this in the past [2]. Those
> >discussions, I feel, were premature. Now that stackless is mature (and
> >continuation free!), Py3k is in full swing, and parallel programming
> >has been fully realized as THE next big problem for computer science,
> >the time is ripe for discussing how we will approach multi-threading
> >in the future.
>
> Many of the discussions rehash the same issues as previous ones. Many
> of them are started based on false assumptions or are discussions between
> people who don't have a firm grasp of the relevant issues.

That's true, but there are actually a lot of good ideas in there as
well.

>
> I don't intend to suggest that no improvements can be made in this area of
> Python interpreter development, but it is a complex issue and cheerleading
> will only advance the cause so far. At some point, someone needs to write
> some code. Stackless is great, but it's not the code that will solve this
> problem.

Why not? It doesn't solve it on its own, but its a pretty good start
towards something that could.

Justin T.

Aug 10, 2007, 12:52:23 PM
On Aug 10, 3:57 am, Steve Holden <st...@holdenweb.com> wrote:
> Justin T. wrote:
> > Hello,
>
> > While I don't pretend to be an authority on the subject, a few days of
> > research has led me to believe that a discussion needs to be started
> > (or continued) on the state and direction of multi-threading python.
> [...]
> > What these seemingly unrelated thoughts come down to is a perfect
> > opportunity to become THE next generation language. It is already far
> > more advanced than almost every other language out there. By
> > integrating stackless into an architecture where tasklets can be
> > divided over several parallelizable threads, it will be able to
> > capitalize on performance gains that will have people using python
> > just for its performance, rather than that being the excuse not to use
> > it.
>
> Aah, the path to world domination. You know you don't *have* to use
> Python for *everything*.
>
True, but Python seems to be the *best* place to tackle this problem,
at least to me. It has a large pool of developers, a large standard
library, it's evolving, and it's a language I like :). Languages that
seamlessly support multi-threaded programming are coming, as are
extensions that make it easier on every existent platform. Python has
the opportunity to lead that change.

>
> Be my guest, if it's so simple.
>

I knew somebody was going to say that! I'm pretty busy, but I'll see
if I can find some time to look into it.

>
> I doubt that a thread on c.l.py is going to change much. It's the
> python-dev and py3k lists where you'll need to take up the cudgels,
> because I can almost guarantee nobody is going to take the GIL out of
> 2.6 or 2.7.
>

I was hoping to get a constructive conversation on what the structure
of a multi-threaded python would look like. It would appear that this
was not the place for that.

> Is it even possible
> to run threads of the same process at different priority levels on all
> platforms?

No, it's not, and even fewer allow the scheduler to change the
priority dynamically. Linux, however, is one that does.

Jean-Paul Calderone

Aug 10, 2007, 1:34:47 PM
to pytho...@python.org
On Fri, 10 Aug 2007 16:37:19 -0000, "Justin T." <jmtu...@gmail.com> wrote:
>On Aug 10, 3:52 am, Jean-Paul Calderone <exar...@divmod.com> wrote:
>> On Fri, 10 Aug 2007 10:01:51 -0000, "Justin T." <jmtull...@gmail.com> wrote:
>> >Hello,
>>
>> >While I don't pretend to be an authority on the subject, a few days of
>> >research has led me to believe that a discussion needs to be started
>> >(or continued) on the state and direction of multi-threading python.
>>
>> > [snip - threading in Python doesn't exploit hardware level parallelism
>> > well, we should incorporate stackless and remove the GIL to fix this]
>>
>> I think you have a misunderstanding of what greenlets are. Greenlets are
>> essentially a non-preemptive user-space threading mechanism. They do not
>> allow hardware level parallelism to be exploited.
>
>I'm not an expert, but I understand that much. What greenlets do is
>force the programmer to think about concurrent programming. It doesn't
>force them to think about real threads, which is good, because a
>computer should take care of that for you. Greenlets are nice because
>they can run concurrently, but they don't have to. This means you can
>safely divide them up among many threads. You could not safely do this
>with just any old python program.

There may be something to this. On the other hand, there's no _guarantee_
that code written with greenlets will work with pre-emptive threading instead
of cooperative threading. There might be a tendency on the part of developers
to try to write code which will work with pre-emptive threading, but it's just
that - a mild pressure towards a particular behavior. That's not sufficient
to successfully write correct software (where "correct" in this context means
"works when used with pre-emptive threads", of course).

One also needs to consider the tasks necessary to really get this integration
done. It won't change very much if you just add greenlets to the standard
library. For there to be real consequences for real programmers, you'd
probably want to replace all of the modules which do I/O (and maybe some
that do computationally intensive things) with versions implemented using
greenlets. Otherwise you end up with a pretty hard barrier between greenlets
and all existing software that will probably prevent most people from changing
how they program.

Then you have to worry about the other issues greenlets introduce, like
invisible context switches, which can make your code which _doesn't_ use
pre-emptive threading broken.

All in all, it seems like a wash to me. There probably isn't sufficient
evidence to answer the question definitively either way, though. And trying
to make it work is certainly one way to come up with such evidence. :)

Jean-Paul

Luc Heinrich

unread,
Aug 10, 2007, 5:02:50 PM8/10/07
to
Justin T. <jmtu...@gmail.com> wrote:

> What these seemingly unrelated thoughts come down to is a perfect
> opportunity to become THE next generation language.

Too late: <http://www.erlang.org/>

:)

--
Luc Heinrich

Justin T.

unread,
Aug 10, 2007, 5:51:50 PM8/10/07
to
On Aug 10, 2:02 pm, l...@honk-honk.com (Luc Heinrich) wrote:

Uh oh, my ulterior motives have been discovered!

I'm aware of Erlang, but I don't think it's there yet. For one thing,
it's not pretty enough. It also doesn't have the community support
that a mainstream language needs. I'm not saying it'll never be
adequate, but I think that making python into an Erlang competitor
while maintaining backwards compatibility with the huge amount of
already written python software will make python a very formidable
choice as languages adopt more and more multi-core support. Python is
in a unique position, as it's actually a flexible enough language to
adapt to a multi-threaded environment without resorting to terrible
hacks.

Justin

Bjoern Schliessmann

unread,
Aug 10, 2007, 6:03:41 PM8/10/07
to
Nick Craig-Wood wrote:
[GIL]

> That is certainly true. However the point being is that running
> on 2 CPUs at once at 95% efficiency is much better than running on
> only 1 at 99%...

How do you define this percent efficiency?



>>> The truth is that the future (and present reality) of almost
>>> every form of computing is multi-core,
>>
>> Is it? 8)
>
> Intel, AMD and Sun would have you believe that yes!

Strange, in my programs, I don't need any "real" concurrency (they
are network servers and scripts). Or do you mean "the future of
computing hardware is multi-core"? That indeed may be true.

>> The question is: If it really was, how much of useful
>> performance gain would you get?
>
> The linux kernel has been through these growing pains already...
> SMP support was initially done with the Big Kernel Lock (BKL)
> which is exactly equivalent to the GIL.

So, how much performance gain would you get? Again, managing
fine-grained locking can be much more work than one simple lock.

> The linux kernel has moved onwards to finer and finer grained
> locking.

How do you compare a byte code interpreter to a monolithic OS
kernel?

> I'd like to see a python build as it is at the moment and a
> python-mt build which has the GIL broken down into a lock on each
> object. python-mt would certainly be slower for non threaded
> tasks, but it would certainly be quicker for threaded tasks on
> multiple CPU computers.

From where do you take this certainty? For example, if the program
in question involves mostly IO access, there will be virtually no
gain. Multithreading is not Performance.



> The user could then choose which python to run.
>
> This would of course make C extensions more complicated...

Also, C extensions can release the GIL for long-running
computations.

Regards,


Björn

--
BOFH excuse #391:

We already sent around a notice about that.

Justin T.

unread,
Aug 10, 2007, 7:17:52 PM8/10/07
to
On Aug 10, 10:34 am, Jean-Paul Calderone <exar...@divmod.com> wrote:

> >I'm not an expert, but I understand that much. What greenlets do is
> >force the programmer to think about concurrent programming. It doesn't
> >force them to think about real threads, which is good, because a
> >computer should take care of that for you. Greenlets are nice because
> >they can run concurrently, but they don't have to. This means you can
> >safely divide them up among many threads. You could not safely do this
> >with just any old python program.
>
> There may be something to this. On the other hand, there's no _guarantee_
> that code written with greenlets will work with pre-emptive threading instead
> of cooperative threading. There might be a tendency on the part of developers
> to try to write code which will work with pre-emptive threading, but it's just
> that - a mild pressure towards a particular behavior. That's not sufficient
> to successfully write correct software (where "correct" in this context means
> "works when used with pre-emptive threads", of course).

Agreed. Stackless does include a preemptive mode, but if you don't use
it, then you don't need to worry about locking at all. It would be
quite tricky to get around this, but I don't think it's impossible.
For instance, you could just automatically lock anything that was not
a local variable. Or, if you required all tasklets in one object to
run in one thread, then you would only have to auto-lock globals.
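A library-level caricature of what I mean by auto-locking, with entirely hypothetical names (a real version would have to live in the interpreter, not in a wrapper class):

```python
# Sketch: guard every attribute read and write of a shared object with
# that object's own lock. AutoLocked and Shared are made-up names for
# illustration only.
import threading

class AutoLocked:
    def __init__(self, obj):
        object.__setattr__(self, "_obj", obj)
        object.__setattr__(self, "_lock", threading.RLock())

    def __getattr__(self, name):
        # Called for attributes not found on the wrapper itself.
        with object.__getattribute__(self, "_lock"):
            return getattr(object.__getattribute__(self, "_obj"), name)

    def __setattr__(self, name, value):
        with object.__getattribute__(self, "_lock"):
            setattr(object.__getattribute__(self, "_obj"), name, value)

class Shared:
    pass

shared = AutoLocked(Shared())
shared.x = 0      # takes and releases the lock
print(shared.x)   # 0
```

Note that even with this, `shared.x += 1` is still two separately-locked operations with a gap between them, which hints at why auto-locking alone doesn't make a program correct.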

>
> One also needs to consider the tasks necessary to really get this integration
> done. It won't change very much if you just add greenlets to the standard
> library. For there to be real consequences for real programmers, you'd
> probably want to replace all of the modules which do I/O (and maybe some
> that do computationally intensive things) with versions implemented using
> greenlets. Otherwise you end up with a pretty hard barrier between greenlets
> and all existing software that will probably prevent most people from changing
> how they program.

If the framework exists to efficiently multi-thread python, I assume
that the module maintainers will slowly migrate over if there is a
performance benefit there.


>
> Then you have to worry about the other issues greenlets introduce, like
> invisible context switches, which can make your code which _doesn't_ use
> pre-emptive threading broken.

Not breaking standard python code would definitely be priority #1 in
an experiment like this. I think that by making the changes at the
core we could achieve it. A standard program, after all, is just 1
giant tasklet.


>
> All in all, it seems like a wash to me. There probably isn't sufficient
> evidence to answer the question definitively either way, though. And trying
> to make it work is certainly one way to come up with such evidence. :)

::Sigh:: I honestly don't see myself having time to really do anything
more than experiment with this. Perhaps I will try to do that though.
Sometimes I do grow bored of my other projects. :)

Justin

Beorn

unread,
Aug 10, 2007, 11:16:28 PM8/10/07
to
Btw, although overly simple (single CPU system!), this benchmark is
pretty interesting:

http://muharem.wordpress.com/2007/07/31/erlang-vs-stackless-python-a-first-benchmark/


About the GIL:

I think I've heard Guido say the last attempt at removing the Global
Interpreter Lock (GIL) resulted in a Python that was much slower...
which kind of defeats the purpose. I don't think it's feasible to
remove the GIL in CPython; the best hope of a GIL-free Python might
be PyPy.

The general trend seems to be that it's hard enough to program single-
thread programs correctly, adding to that extreme concurrency
awareness as required when you're programming with threads means it's
practically impossible for most programmers to get the programs
right. The shared-nothing model seems to be a very workable way to
scale for many programs (at least web apps/services).

Seun Osewa

unread,
Aug 11, 2007, 12:43:16 AM8/11/07
to
> I think I've heard Guido say the last attempt at removing the Global
> Interpreter Lock (GIL) resulted in a Python that was much slower...

What is it about Python that makes a thread-safe CPython version much
slower? Why doesn't true threading slow down other languages like Perl
and Java?

I'm thinking it might be the reference counting approach to memory
management... but what do you guys think is the reason?

Paul Rubin

unread,
Aug 11, 2007, 1:06:10 AM8/11/07
to
Seun Osewa <seun....@gmail.com> writes:
> What is it about Python that makes a thread-safe CPython version much
> slower?...

> I'm thinking it might be the reference counting approach to memory
> management...

Yes. In the implementation that was benchmarked, if I understand
correctly, every refcount had its own lock, that was acquired and
released every time the refcount was modified.
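You can get a feel for the cost from Python itself by comparing a bare increment with one wrapped in an uncontended acquire/release. The real patch paid this price in C on every refcount change, so the ratio here is only suggestive, not a measurement of the patch:

```python
# Rough illustration of per-refcount locking overhead: time a plain
# increment against the same increment bracketed by an uncontended
# lock acquire/release.
import threading
import timeit

lock = threading.Lock()
n = 100000

plain = timeit.timeit("x += 1", setup="x = 0", number=n)
locked = timeit.timeit("lock.acquire(); x += 1; lock.release()",
                       setup="x = 0", globals={"lock": lock}, number=n)

print("plain  : %.4fs" % plain)
print("locked : %.4fs" % locked)
# Every refcount change in the free-threading patch paid something like
# the second cost instead of the first, on one of the hottest paths in
# the interpreter.
```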

Bryan Olson

unread,
Aug 11, 2007, 4:21:18 AM8/11/07
to
Justin T. wrote:
> True, but Python seems to be the *best* place to tackle this problem,
> at least to me. It has a large pool of developers, a large standard
> library, it's evolving, and it's a language I like :). Languages that
> seamlessly support multi-threaded programming are coming, as are
> extensions that make it easier on every existent platform. Python has
> the opportunity to lead that change.

I have to disagree. A dynamic scripting language, even a great
one such as Python, is not the vehicle to lead an advance of
pervasive threading. The features that draw people to Python
do not play nicely with optimizing multi-core efficiency; they
didn't play all that well with single-core efficiency either.
Python is about making good use of our time, not our machines'
time.

Finer-grain locking sounds good, but realize how many items
need concurrency control. Python offers excellent facilities
for abstracting away complexity, so we programmers do not
mentally track all the possible object interactions at once.
In a real application, objects are more intertwingled than
we realize. Whatever mistakes a Python programmer makes,
the interpreter must protect its own data structures
against all possible race errors.

Run-time flexibility is a key Python feature, and it
necessarily implies that most bindings are subject to
change. Looking at a Python function, we tend to focus on
the named data objects, and forget how many names Python
is looking up at run time. How often do we take and release
locks on all the name-spaces that might affect execution?

The GIL both sucks and rocks. Obviously cores are becoming
plentiful and the GIL limits how we can exploit them. On
the other hand, correctness must dominate efficiency. We
lack exact reasoning on what we must lock; with the GIL,
we err on the side of caution and correctness. Instead of
trying to figure out what we need to lock, we default to
locking everything, and try to figure out what we can
safely release.

[Steve Holden:]


>> Be my guest, if it's so simple.
>>
> I knew somebody was going to say that! I'm pretty busy, but I'll see
> if I can find some time to look into it.

If you want to lead fixing Python's threading, consider
first delivering a not-so-grand but definite
improvement. Have you seen how the 'threading' module
in Python's standard library implements timeouts?
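For reference, the timeout logic in question is a polling loop, roughly this shape (paraphrased from the 2.x standard library, not a verbatim copy):

```python
# How threading timeouts worked at the time: no native timed wait, just
# repeated non-blocking attempts with exponentially growing sleeps,
# capped at 50 ms.
import threading
import time

def timed_acquire(lock, timeout):
    endtime = time.time() + timeout
    delay = 0.0005  # 500 microseconds to start
    while True:
        if lock.acquire(False):       # non-blocking attempt
            return True
        remaining = endtime - time.time()
        if remaining <= 0:
            return False
        # Sleep too little and we burn CPU; too much and we overshoot
        # the deadline by up to 50 ms.
        delay = min(delay * 2, remaining, 0.05)
        time.sleep(delay)

held = threading.Lock()
held.acquire()
print(timed_acquire(held, 0.05))  # False, after ~50 ms of polling
```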


--
--Bryan

Cameron Laird

unread,
Aug 11, 2007, 6:50:27 AM8/11/07
to
In article <1186807396.9...@b79g2000hse.googlegroups.com>,

Crudely, Perl threading is fragile, and Java requires
coding at a lower level.

Memory management is indeed important, and arguably
deserves at least as much attention as multi-core
accommodations. I know of no easy gains from here
on, just engineering trade-offs.

Cameron Laird

unread,
Aug 11, 2007, 6:52:55 AM8/11/07
to
In article <mailman.1836.1186762...@python.org>,
Chris Mellon <ark...@gmail.com> wrote:
.
.

.
>There's nothing "undocumented" about IPC. It's been around as a
>technique for decades. Message passing is as old as the hills.
.
.
.
... and has significant successes to boast, including
the highly-reliable and high-performing QNX real-time
operating system, and the already-mentioned language
Erlang.

fdu.x...@gmail.com

unread,
Aug 11, 2007, 2:55:24 AM8/11/07
to pytho...@python.org
Multi-core or multi-CPU computers are more and more popular, especially in
scientific computation. It would be a great thing if multi-core support
could be added to Python.

Kay Schluehr

unread,
Aug 11, 2007, 7:59:29 AM8/11/07
to
Have you checked out the processing [1] package? My current impression
is that people want to change the whole language before they check out
a new package. It would be nice to read a review.

[1] http://cheeseshop.python.org/pypi/processing

Nick Craig-Wood

unread,
Aug 11, 2007, 10:30:05 AM8/11/07
to
Bjoern Schliessmann <usenet-mail-03...@spamgourmet.com> wrote:
> Nick Craig-Wood wrote:
> [GIL]
> > That is certainly true. However the point being is that running
> > on 2 CPUs at once at 95% efficiency is much better than running on
> > only 1 at 99%...
>
> How do you define this percent efficiency?

Those are hypothetical numbers. I guess that a finely locked python
will spend a lot more time locking and unlocking individual objects
than it currently does locking and unlocking the GIL. This is for two
reasons

1) the GIL will be in cache at all times and therefore "hot" and quick
to access

2) much more locking and unlocking of each object will need to be
done.

> >>> The truth is that the future (and present reality) of almost
> >>> every form of computing is multi-core,
> >>
> >> Is it? 8)
> >
> > Intel, AMD and Sun would have you believe that yes!
>
> Strange, in my programs, I don't need any "real" concurrency (they
> are network servers and scripts). Or do you mean "the future of
> computing hardware is multi-core"? That indeed may be true.

I meant the latter. I agree with you though that not all programs
need to be multi-threaded. Hence my proposal for two python binaries.

> >> The question is: If it really was, how much of useful
> >> performance gain would you get?
> >
> > The linux kernel has been through these growing pains already...
> > SMP support was initially done with the Big Kernel Lock (BKL)
> > which is exactly equivalent to the GIL.
>
> So, how much performance gain would you get? Again, managing
> fine-grained locking can be much more work than one simple lock.

Assuming that you are not IO bound, but compute bound and that compute
is being done in python then you'll get a speed up proportional to how
many processors you have, minus a small amount for locking overhead.
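That "minus a small amount" is just Amdahl's law. With an assumed 5% of the work serialized (locking overhead, GIL-held sections; the figure is illustrative, echoing the 95% efficiency number earlier in the thread), the arithmetic looks like this:

```python
# Amdahl's law: speedup on N cores when a fraction "serial" of the work
# cannot be parallelized.
def speedup(cores, serial):
    return 1.0 / (serial + (1.0 - serial) / cores)

for cores in (1, 2, 4, 8):
    print(cores, round(speedup(cores, 0.05), 2))
# Even 5% serialized work caps the speedup well below the core count as
# N grows: the limit is 1 / 0.05 = 20x no matter how many cores you add.
```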

> > The linux kernel has moved onwards to finer and finer grained
> > locking.
>
> How do you compare a byte code interpreter to a monolithic OS
> kernel?

In this (locking) respect they are quite similar actually. You can
think of kernel code as the python interpreter (BKL vs GIL), and user
space as C extensions running with the GIL unlocked, calling back into
the python interpreter / kernel.

> > I'd like to see a python build as it is at the moment and a
> > python-mt build which has the GIL broken down into a lock on each
> > object. python-mt would certainly be slower for non threaded
> > tasks, but it would certainly be quicker for threaded tasks on
> > multiple CPU computers.
>
> From where do you take this certainty? For example, if the program
> in question involves mostly IO access, there will be virtually no
> gain. Multithreading is not Performance.

Yes you are right of course. IO bound tasks don't benefit from
multi-threading. In fact usually the reverse. Twisted covers this
ground extremely well in my experience. However IO bound tasks
probably aren't taxing your quad core chip either...

> > The user could then choose which python to run.
> >
> > This would of course make C extensions more complicated...
>
> Also, C extensions can release the GIL for long-running
> computations.

Provided they stay in C. If they call any python stuff then they need
to take it again.

Bjoern Schliessmann

unread,
Aug 11, 2007, 5:46:05 PM8/11/07
to
Nick Craig-Wood wrote:
> Bjoern Schliessmann <usenet-mail-03...@spamgourmet.com>

>> So, how much performance gain would you get? Again, managing
>> fine-grained locking can be much more work than one simple lock.
>
> Assuming that you are not IO bound, but compute bound and that
> compute is being done in python then you'll get a speed up
> proportional to how many processors you have, minus a small amount
> for locking overhead.

Assuming this: Agreed.

Regards,


Björn

--
BOFH excuse #348:

We're on Token Ring, and it looks like the token got loose.

Ben Sizer

unread,
Aug 11, 2007, 5:53:31 PM8/11/07
to
On Aug 10, 5:13 pm, "Chris Mellon" <arka...@gmail.com> wrote:

> On 8/10/07, Ben Sizer <kylo...@gmail.com> wrote:
>
> > On 10 Aug, 15:38, Ben Finney <bignose+hates-s...@benfinney.id.au>
> > wrote:
> > > Last I checked, multiple processes can run concurrently on multi-core
> > > systems. That's a well-established way of structuring a program.
>
> > It is, however, almost always more complex and slower-performing.
>
> > Plus, it's underdocumented. Most academic study of concurrent
> > programming, while referring to the separately executing units as
> > 'processes', almost always assume a shared memory space and the
> > associated primitives that go along with that.
>
> This is simply not true. Firstly, there's a well defined difference
> between 'process' and a 'thread' and that is that processes have
> private memory spaces. Nobody says "process" when they mean threads of
> execution within a shared memory space and if they do they're wrong.

I'm afraid that a lot of what students will be taught does exactly
this, because the typical study of concurrency is in relation to
contention for shared resources, whether that be memory, a file, a
peripheral, a queue, etc. One example I have close to hand is
'Principles of Concurrent and Distributed Programming', which has no
mention of the term 'thread'. It does have many examples of several
processes accessing shared objects, which is typically the focus of
most concurrent programming considerations.

The idea that processes have memory spaces completely isolated from
other processes is both relatively recent and not universal across all
platforms. It also requires you to start treating memory as
arbitrarily different from other resources which are typically
shared.

> And no, "most" academic study isn't limited to shared memory spaces.
> In fact, almost every improvement in concurrency has been moving
> *away* from simple shared memory - the closest thing to it is
> transactional memory, which is like shared memory but with
> transactional semantics instead of simple sharing.

I think I wasn't sufficiently clear; research may well be moving in
that direction, but you can bet that the typical student with their
computer science or software engineering degree will have been taught
far more about how to use synchronisation primitives within a program
than how to communicate between arbitrary processes.

> There's nothing "undocumented" about IPC. It's been around as a
> technique for decades. Message passing is as old as the hills.

I didn't say undocumented, I said underdocumented. The typical
programmer these days comes educated in at least how to use a mutex or
semaphore, and will probably look for that capability in any language
they use. They won't be thinking about creating an arbitrary message
passing system and separating their project out into separate
programs, even if that has been what UNIX programmers have chosen to
do since 1969. There are a multitude of different ways to fit IPC into
a system, but only a few approaches to threading, which also happen to
coincide quite closely to how low-level OS functionality handles
processes meaning you tend to get taught the latter. That's why it's
useful for Python to have good support for it.

> > Hardly. Sure, so you don't have to worry about contention over objects
> > in memory, but it's still completely asynchronous, and there will
> > still be a large degree of waiting for the other processes to respond,
> > and you have to develop the protocols to communicate. Apart from
> > convenient serialisation, Python doesn't exactly make IPC easy, unlike
> > Java's RMI for example.
>
> There's nothing that Python does to make IPC hard, either. There's
> nothing in the standard library yet, but you may be interested in Pyro
> (http://pyro.sf.net) or Parallel Python
> (http://www.parallelpython.com/). It's not erlang, but it's not hard
> either. At least, it's not any harder than using threads and locks.

Although Pyro is good in what it does, simple RPC alone doesn't solve
most of the problems that typical threading usage does. IPC is useful
for the idea of submitting jobs in the background but it doesn't fit
so well to situations where there are parallel loops both acting on a
shared resource. Say you have a main thread and a network reading
thread - given a shared queue for the data, you can safely do this by
adding just 5 lines of code: 2 locks, 2 unlocks, and a call to start
the networking thread. Implementing that using RPC will be more
complex, or less efficient, or probably both.
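For concreteness, here is that pattern sketched with the standard library (the network source is faked with a list of canned packets, a stand-in for a real socket). The shared Queue does its locking internally, so the buffer itself needs no explicit lock calls at all:

```python
# Main thread + reader thread sharing a synchronized queue. In a real
# program reader() would block on socket.recv() instead of iterating a
# canned list.
import queue
import threading

packets = queue.Queue()
FAKE_NETWORK_DATA = [b"hello", b"world", None]  # None = connection closed

def reader():
    for chunk in FAKE_NETWORK_DATA:
        packets.put(chunk)

t = threading.Thread(target=reader)
t.start()

# Main loop: consume until the reader signals end-of-stream.
received = []
while True:
    chunk = packets.get()   # blocks until data is available
    if chunk is None:
        break
    received.append(chunk)
t.join()
print(received)
```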

--
Ben Sizer

Ant

unread,
Aug 11, 2007, 6:25:28 PM8/11/07
to
On Aug 11, 5:43 am, Seun Osewa <seun.os...@gmail.com> wrote:
> > I think I've heard Guido say the last attempt at removing the Global
> > Interpreter Lock (GIL) resulted in a Python that was much slower...
>
> What is it about Python that makes a thread-safe CPython version much
> slower? Why doesn't true threading slow down other languages like Perl
> and Java?

I have no idea about Perl - but there's probably some hideous black
magic going on ;-)

As for Java, making code thread safe *does* slow down the code. It is
the very reason that the language designers made the collections API
non-thread safe by default (you have to wrap the standard collections
in a synchronised wrapper to make them thread safe).

--
Ant...


Message has been deleted

Seun Osewa

unread,
Aug 22, 2007, 8:58:53 PM8/22/07
to
Yes, but if you reduce the coupling between threads in Java (by using
the recommended Python approach of communicating with Queues) you get
the full speed of all the cores in your CPU. I wonder why we can't
have this in Python; it would be very good for servers!