
Thread State and the Global Interpreter Lock


Afanasiy

Jun 6, 2003, 4:25:15 PM

Are there plans to make the Python interpreter
"fully thread safe" in a future version?

http://www.python.org/doc/current/api/threads.html

Skip Montanaro

Jun 6, 2003, 4:40:25 PM

>> Are there plans to make the Python interpreter "fully thread safe" in
>> a future version?

In what way is it not "fully thread safe" today?

Skip

Afanasiy

Jun 6, 2003, 5:18:29 PM

> In what way is it not "fully thread safe" today?

http://www.python.org/doc/current/api/threads.html

Michael Chermside

Jun 6, 2003, 5:05:41 PM

Afanasiy writes:
> Are there plans to make the Python interpreter
> "fully thread safe" in a future version?
>
> http://www.python.org/doc/current/api/threads.html

No, the Global Interpreter Lock will not be going away in
any foreseeable version of Python. (Well, of cPython anyhow.)
But the document you referred to is somewhat misleading.
It is written from the point of view of Guido or others
who are writing Python itself (the interpreter). The code
that these people write is not "fully thread safe", and
because of that, the Global Interpreter Lock must exist.

But if you are writing programs IN Python, then they ARE
fully thread safe. You can launch threads and do whatever
you like in them. The only issue is that because of the
Global Interpreter Lock, only one thread will be executing
new Python bytecodes at a time. If you have only one CPU,
then that is no restriction at all... after all, ALL
multithreading must be done by taking turns. If most of
your threads are blocked waiting for things like file or
socket IO (a very common use for threads) then that's no
problem... the threads that are waiting won't hold up any
others. And if your threads are busy calling out to some
lengthy computation written in C (the best place for a CPU
intensive calculation), then _IF_ the author of the C
extension used the Global Interpreter Lock (GIL for short)
properly, it won't hold up your program. And if you DO
happen to have a multi-processor machine, aren't running
anything else but this Python program, and are doing
lengthy computations (written in Python) in multiple
threads, well then only one of your CPUs will be working
at a time.
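
Just to make the C-extension part concrete, here is a rough
sketch of what "using the GIL properly" looks like (the module
and function names below are made up; the Py_BEGIN_ALLOW_THREADS
and Py_END_ALLOW_THREADS macros are the real C API for this):

#include <Python.h>

/* A made-up CPU-bound routine that touches no Python objects. */
static double long_computation(double x)
{
    double acc = 0.0;
    long i;
    for (i = 0; i < 100000000L; i++)   /* pretend this is real work */
        acc += x / (i + 1.0);
    return acc;
}

static PyObject *
crunch(PyObject *self, PyObject *args)
{
    double x, result;
    if (!PyArg_ParseTuple(args, "d", &x))
        return NULL;

    Py_BEGIN_ALLOW_THREADS      /* release the GIL; no Python calls here */
    result = long_computation(x);
    Py_END_ALLOW_THREADS        /* reacquire it before touching Python again */

    return Py_BuildValue("d", result);
}

static PyMethodDef crunch_methods[] = {
    {"crunch", crunch, METH_VARARGS, "CPU-heavy work with the GIL released."},
    {NULL, NULL, 0, NULL}
};

void initcrunchmod(void)        /* Python 2.x style module init */
{
    Py_InitModule("crunchmod", crunch_methods);
}

While the GIL is released inside that block, all your other
Python threads keep running; that's what keeps a well-behaved
extension from holding up the rest of the program.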

It's a wart (if only a small, special case one), and I'm
sure that the Python maintainers would love to fix it.
But they've considered it and decided that it would require
some VERY deep changes in cPython... rewriting the very
core of it. So they're not planning to do it. (Although
the new version coming out very soon now has some
improvements to make it easier for C programmers to make
their extensions cooperate like they ought to.)

If you'd like to work around it, there are a few options. I think
the best option is to simply try it and see... in all
likelihood you'll find that Python is "fast enough" as it
is, and if not, there are lots of other ways to achieve
large speedups. Threading behavior is NOT likely to be
the biggest contributor. But if you're doing specialty
work and already know that your compute threads must make
simultaneous use of multiple CPUs, then you have a couple
of alternatives. Jython, the Java-based implementation of
Python, does not have this restriction (assuming that the
JVM you use doesn't have a similar limitation), although
it's usually slower than cPython. Or you could take a
look at the source to Python and see if you can figure
out a clever solution that others have missed. If you DID
contribute a "fix", I'm sure it would be gleefully
accepted (after being vigorously tested).

-- Michael Chermside


Scott David Daniels

Jun 6, 2003, 6:19:23 PM

I suspect not. It would be incredibly hard to program.
The rub is that, as a programmer, you think in
statements (or perhaps expressions) of the language
being executed, but a "fully threadsafe" system
interleaves execution at a resolution finer than those
elements. In fact, the interleaving can happen during
the execution of a single Python opcode.
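
To see how much finer-grained that interleaving is than your
source code, here is a little stand-alone C sketch (plain POSIX
threads, nothing Python-specific, names made up): two threads
bump a shared counter with no lock, and updates get lost because
counter++ is really a load, an add, and a store.

#include <stdio.h>
#include <pthread.h>

#define N 1000000L

static long counter = 0;

static void *bump(void *arg)
{
    long i;
    for (i = 0; i < N; i++)
        counter++;              /* not one step: load, add 1, store */
    return NULL;
}

int main(void)
{
    pthread_t a, b;
    pthread_create(&a, NULL, bump, NULL);
    pthread_create(&b, NULL, bump, NULL);
    pthread_join(a, NULL);
    pthread_join(b, NULL);
    /* expected 2000000, but lost updates usually leave it short */
    printf("expected %ld, got %ld\n", 2 * N, counter);
    return 0;
}

Compile with cc -pthread and run it a few times; the total is
rarely what you'd expect. The GIL is what spares pure Python
code from exactly this.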

Data structures written by one thread will be seen as
half-updated by another thread. Modern processors put
in a lot of effort to make some of their optimizations
such as out-of-order execution appear as if a simple
march of instructions is processed. They try to make
each operation look like it has either already happened
or has not yet happened. Python would have to put in
double effort: first to pik and respect orders of
operations in the C code that might allow safe
interleaving (possible, but very hard to do). Then it
would have to "decorate" the C code with volatility
declarations in order to prevent a C compiler from
optimizing away some of that careful ordering.

Stupid little sample (in fake C):
...
lock = 1; /* A */
ds[0] = 5;
ds[1] = -3;
lock = 0; /* B */

If lock is not "declared volatile", a C compiler is free
to observe that the assignment at A is not read before
the assignment at B. Therefore, it can simply skip the
assignment at A, since B will set the final value anyway.
Compilers do this all the time. So the Python interpreter
would have to not only get the parts of Python opcodes to
interact nicely at the "C" level, it would also have to get
the code to run correctly on all the C compilers that Python
is built with. The only way to do _that_ is to get very
conservative about what code you write, and I'd expect an
easily measurable drop in the speed of Python code that
reflects a lot of disabled optimizations. As a very simple
example, imagine one thread writing into a dictionary and
another thread reading that dictionary. That won't happen
in a single C statement by a long shot, but it must appear
to the other thread as if it were either yet-to-happen or
had already happened (no intermediate states).
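
For what it's worth, the "decoration" for the toy snippet above
would look something like this (still a sketch; volatile only
keeps the compiler from throwing away or reordering the stores
to lock, it says nothing about the ds[] stores or about what the
hardware reorders):

volatile int lock;      /* compiler must perform every store to it */
int ds[2];

void update(void)
{
    lock = 1;           /* A: no longer optimized away */
    ds[0] = 5;
    ds[1] = -3;
    lock = 0;           /* B */
}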

If you want to write thread code that interacts safely,
each thread mostly plays in its own yard anyway. Why not
just bite the bullet and put two processes up and allow
them to communicate the little bit they must through some
system-mediated communication facility such as sockets, or
shared memory, or .... You may even get a bigger speed
advantage that way.
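
A bare-bones sketch of that shape, in POSIX C with made-up
details: fork a worker process and pass the little that must be
shared back through a pipe (a socket or a shared memory segment
would follow the same pattern).

#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/wait.h>

int main(void)
{
    int fds[2];
    pid_t pid;

    if (pipe(fds) != 0)
        return 1;

    pid = fork();
    if (pid == 0) {                     /* child: does the heavy work */
        const char *result = "42\n";    /* pretend this took a while */
        close(fds[0]);
        write(fds[1], result, strlen(result));
        close(fds[1]);
        return 0;
    }

    close(fds[1]);                      /* parent: wait for the answer */
    {
        char buf[64];
        ssize_t n = read(fds[0], buf, sizeof buf - 1);
        if (n > 0) {
            buf[n] = '\0';
            printf("worker says: %s", buf);
        }
    }
    close(fds[0]);
    waitpid(pid, NULL, 0);
    return 0;
}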

In short, Python has to keep C code interleaving and
compiler optimization of its code from appearing to break
the indivisibility of its instructions.

-Scott David Daniels
Scott....@Acm.Org

Peter Hansen

Jun 6, 2003, 10:37:59 PM

I suspect Skip meant to elicit from you the answer to this
question: What problems have *you* encountered as a result of
the "not fully thread-safe" issue you are concerned about?

-Peter

Martin v. Löwis

Jun 7, 2003, 1:40:24 PM

Afanasiy <abeli...@hotmail.com> writes:

> >In what way is it not "fully thread safe" today?
>
> http://www.python.org/doc/current/api/threads.html

Ah, that. I think it would be best fixed by removing the first
sentence of the documentation. Python is fully thread safe.

Regards,
Martin

Ryjek

Jun 24, 2003, 4:51:58 PM

I was looking for an answer to the same question and came across this
thread.

Our application currently consists of two processes: one is a database
and one is a windowing program with a user interface. Both embed a
Python interpreter for customization purposes (i.e. you can write your
own GUI extensions, and your own database procedures).

I would like to merge both processes into one process with two threads
(mostly for performance purposes), but keep both threads' data separated
the same way as they are now. Python practically does not allow me to do
it because it insists that only one thread can execute Python bytecodes
at a time. I would like to be able to run both interpreters in
completely separate environments, where no state and no objects are
shared. I believe the restriction of "one thread at a time" is too
strict in such cases. What is the purpose of Py_NewInterpreter() if you
cannot run it concurrently ..?
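
For concreteness, the embedding pattern I have in mind is roughly
this (simplified sketch; Py_NewInterpreter, PyThreadState_Swap and
Py_EndInterpreter are the real C API calls, the rest is made up):

#include <Python.h>

int main(void)
{
    PyThreadState *main_ts;
    PyThreadState *sub_ts;

    Py_Initialize();                      /* main interpreter, GIL held */
    main_ts = PyThreadState_Get();
    PyRun_SimpleString("x = 'main interpreter'");

    /* A second, isolated interpreter: its own modules and globals,
       but it still shares the single GIL with the main one. */
    sub_ts = Py_NewInterpreter();
    PyRun_SimpleString("x = 'sub interpreter'");

    /* Back in the main interpreter: its x is untouched. */
    PyThreadState_Swap(main_ts);
    PyRun_SimpleString("print x");        /* prints: main interpreter */

    /* Tear down the sub-interpreter, then everything else. */
    PyThreadState_Swap(sub_ts);
    Py_EndInterpreter(sub_ts);
    PyThreadState_Swap(main_ts);
    Py_Finalize();
    return 0;
}

The two interpreters really do keep separate globals and modules,
but they still take turns on the one GIL, which is exactly the
restriction I would like to relax.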

I understand that making Python completely thread safe with respect to
native threads may not be appropriate, but it shouldn't be that
difficult to allow for real concurrent execution in isolated
environments. At least, this is what they did in Perl:
http://www.icewalkers.com/Perl/5.8.0/lib/threads.html

r

Aahz

Jun 28, 2003, 9:20:54 AM

In article <3ef8b9b2$1...@news.hks.com>, Ryjek <ry...@findmeifyou.can> wrote:
>
>Our application currently consists of two processes: one is a database
>and one is a windowing program with a user interface. Both embed a
>Python interpreter for customization purposes (i.e. you can write your
>own GUI extensions, and your own database procedures).
>
>I would like to merge both processes into one process with two threads
>(mostly for performance purposes), but keep both threads' data separated
>the same way as they are now. Python practically does not allow me to do
>it because it insists that only one thread can execute Python bytecodes
>at a time. I would like to be able to run both interpreters in
>completely separate environments, where no state and no objects are
>shared. I believe the restriction of "one thread at a time" is too
>strict in such cases. What is the purpose of Py_NewInterpreter() if you
>cannot run it concurrently ..?

It was a mistake, essentially, kept around for the rare occasions when
the caveats below don't apply.

I would say that if performance is your primary goal, multi-threading
your application would be a poor idea. Threads work best in two
contexts: breaking up similar types of work for performance (and in the
case of Python, that pretty much needs to be I/O work), and increasing
the responsiveness of user-centered applications. Unless you're sharing
huge amounts of data, it's quite likely that even outside of Python you
wouldn't see much (if any) performance increase.

>I understand that making Python completely thread safe w.respect to
>native threads may not be appropriate, but it shouldn't be that
>difficult to allow for real concurrent execution in isolated
>environments. At least, this is what they did in Perl:
>http://www.icewalkers.com/Perl/5.8.0/lib/threads.html

The problem is that Python relies heavily on DLLs, both internally and
third-party libraries. It's pretty much impossible to set things up so
that random DLLs can be correctly loaded by multiple interpreters in a
single process.

Because threading has existed in core Python for many years, a lot of
bugs and issues have been hashed out -- it's not at all clear to me the
extent to which Perl has done the same. One reason why Python sticks
with the GIL model is that multi-threading does diddly-squat for
computational threads on a single-CPU box. It's actually *more*
efficient the way Python does it.
--
Aahz (aa...@pythoncraft.com) <*> http://www.pythoncraft.com/

Usenet is not a democracy. It is a weird cross between an anarchy and a
dictatorship.
