Thread-safety in Pylons (Python?)

kgs

Mar 4, 2009, 4:58:58 PM
to pylons-discuss
Hi,

For the last couple of days I have been writing a simple WSGI app based
on Paste components. Pylons is also based on Paste, so I started reading
the Pylons sources to learn something new :)

I've found something that seems strange to me, maybe because I don't
fully understand how threading works in Python. If I understand
correctly, the PylonsApp object (from the wsgiapp.py module) is created
only once per application (it is shared between all worker threads in
a WSGI server like Paste#http), and its __call__ method is called from
many threads. Am I right?

Inside the __call__ method, after some processing, we end up in the
find_controller method, and here we are *accessing and modifying* the
dictionary self.controller_classes. But this dictionary is shared
between many threads, so is it thread-safe?
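To make the question concrete, here is a minimal sketch of the check-then-store caching pattern being described, written for Python 3. It is not the actual Pylons code: `ControllerCache` and the `load_controller` callable are hypothetical stand-ins. Leaving it unlocked is only harmless under the assumption that loading is idempotent. If two threads both miss the cache, both load the same class, and the second store overwrites the first with an identical value.

```python
import threading


class ControllerCache:
    """Hypothetical find_controller-style cache (not Pylons' real code)."""

    def __init__(self, load_controller):
        # load_controller: assumed idempotent, e.g. an import/registry lookup
        self._load = load_controller
        self.controller_classes = {}

    def find_controller(self, name):
        # A single dict lookup; no lock is taken here
        cls = self.controller_classes.get(name)
        if cls is None:
            # Two threads may both get here; each loads the same class,
            # so whichever store lands last leaves the same value behind
            cls = self._load(name)
            self.controller_classes[name] = cls
        return cls


class HelloController:
    pass


registry = {"hello": HelloController}
cache = ControllerCache(registry.__getitem__)
results = []


def worker():
    results.append(cache.find_controller("hello"))


threads = [threading.Thread(target=worker) for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

If loading had side effects that must happen exactly once, this race would no longer be benign, and the miss path would need a lock.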

I could not find an unambiguous answer anywhere as to whether accessing
Python primitives from many threads is safe or not. To me it looks like
it might not be safe (because modifying/iterating/accessing, e.g., a
dictionary may result in context switches).

So if it is not safe, we need some RWLock here, right? I am also
asking because in my application I want to store some "global" data
as a dict (shared between all worker threads), and I am wondering
whether I have to use locking primitives.

Thanks in advance for all explanations.

--
Cheers,
Kamil Gorlo

Ben Bangert

Mar 4, 2009, 8:18:03 PM
to pylons-...@googlegroups.com
On Mar 4, 2009, at 1:58 PM, kgs wrote:

> Inside the __call__ method, after some processing, we end up in the
> find_controller method, and here we are *accessing and modifying* the
> dictionary self.controller_classes. But this dictionary is shared
> between many threads, so is it thread-safe?

Access to Python objects is thread-safe in the sense that Python itself
has a Global Interpreter Lock (GIL) that prevents, say, two threads from
updating the exact same key at the exact same time. The GIL is held for
the duration of the dict access/setting.
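Here is a minimal sketch of what this means in practice. It assumes a standard (GIL) CPython 3 build and dict keys (tuples of ints) whose hashing and comparison do not call back into Python code. Four threads store distinct keys into one shared dict without a lock, and no store is lost. Compare this with the `x += 1` test later in the thread, where a read-modify-write on a shared value does lose updates.

```python
import threading

shared = {}


def writer(thread_id, n):
    # Each d[k] = v below is a single store into the dict; a GIL build of
    # CPython holds the GIL for the whole store, so stores never interleave
    for i in range(n):
        shared[(thread_id, i)] = i


threads = [threading.Thread(target=writer, args=(t, 10000)) for t in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(len(shared))  # 40000
```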

The GIL is, however, generally released during many operations that
happen at the C layer, such as I/O and waiting on database access.

Searching for "Python GIL" should turn up quite a few threads about it.
It's for this reason that, to use a multi-core processor to its full
potential, you should run a Pylons process for every core. This is what
I do for all the sites I run.

Cheers,
Ben

Philip Jenvey

Mar 4, 2009, 9:46:11 PM
to pylons-...@googlegroups.com

On Mar 4, 2009, at 1:58 PM, kgs wrote:

>
> I could not find an unambiguous answer anywhere as to whether accessing
> Python primitives from many threads is safe or not. To me it looks like
> it might not be safe (because modifying/iterating/accessing, e.g., a
> dictionary may result in context switches).

This is the most authoritative page:

http://effbot.org/pyfaq/what-kinds-of-global-value-mutation-are-thread-safe.htm

It's still a bit ambiguous about what is really safe, but I can
guarantee that the two basic dict operations in question (a getitem and
a setitem on a steady dict value) are in fact safe.

--
Philip Jenvey

Noah Gift

Mar 4, 2009, 10:50:17 PM
to pylons-...@googlegroups.com
Although, if I read this correctly, you are counting on the current
implementation of the GIL always remaining exactly this way for the
getitem and setitem operations. That seems to be a fairly safe bet,
though, right?

--
Cheers,

Noah

Ben Bangert

Mar 5, 2009, 1:50:50 AM
to pylons-...@googlegroups.com
On Mar 4, 2009, at 7:50 PM, Noah Gift wrote:

> Although, if I read this correctly, you are counting on the current
> implementation of the GIL always remaining exactly this way for the
> getitem and setitem operations. That seems to be a fairly safe bet,
> though, right?

If it stops working in this way, there'll be significantly larger
problems in Python code than just Pylons. :)

- Ben

Wichert Akkerman

Mar 5, 2009, 2:07:36 AM
to Ben Bangert, pylons-...@googlegroups.com

I know CPython has the GIL, but I am not sure whether all Python
implementations have it. If I remember correctly, Stackless Python did
not suffer from that particular wart.

Wichert.


--
Wichert Akkerman <wic...@wiggy.net> It is simple to make things.
http://www.wiggy.net/ It is hard to make things simple.

Kamil Gorlo

Mar 5, 2009, 2:53:51 AM
to pylons-...@googlegroups.com

Yeah, this site is a bit ambiguous, e.g.

"Operations that replace other objects may invoke those other objects’
__del__ method when their reference count reaches zero, and that can
affect things. This is especially true for the mass updates to
dictionaries and lists. When in doubt, use a mutex!"

But before that statement the author says that the following operations are atomic:

D[x] = y
D1.update(D2)

where x and y are objects and D1 and D2 are dicts. So how can this be true?

But even if we assume that getitem and setitem on a dict are atomic
(only on a 'steady' value? does that mean primitives?), how do I solve
the problem I am facing now:

I want to start another thread T1 in my Pylons app (manually, not a
Paste#http worker) and have some global dict which all HTTP workers
will read. T1 will periodically update this dictionary (in fact, all
it wants to do is swap this global dict with a local dict that it
prepared during its work). This dict will hold Python primitives
(other dicts too) and simple classes which act only as 'structs'. Do
I need a lock?

Cheers,
Kamil Gorlo

Chris Miles

Mar 5, 2009, 3:07:26 AM
to pylons-...@googlegroups.com, Chris Miles

On 05/03/2009, at 6:53 PM, Kamil Gorlo wrote:
> I want to start another thread T1 in my Pylons app (manually, not a
> Paste#http worker) and have some global dict which all HTTP workers
> will read. T1 will periodically update this dictionary (in fact, all
> it wants to do is swap this global dict with a local dict that it
> prepared during its work). This dict will hold Python primitives
> (other dicts too) and simple classes which act only as 'structs'. Do
> I need a lock?

If unsure, I'd suggest using a lock. What are you saving by skipping
the lock? If the dict swap happens infrequently, the lock will be
insignificant.

You also need to ensure that modifications to any of the objects that
the dict points to are protected by locks.
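Here is a minimal Python 3 sketch of Chris's suggestion. It assumes the refresh thread only ever publishes a brand-new dict and never mutates one that readers may already hold. The names `get_data` and `replace_data` are hypothetical. In CPython the rebinding is already atomic on its own, so the lock here mainly makes the intent explicit to other readers of the code, which is the reason Kamil gives later in the thread for keeping it.

```python
import threading

_data_lock = threading.Lock()
_data = {}  # the published dict; treat it as read-only once published


def get_data():
    """Return the current dict; callers should keep this one reference."""
    with _data_lock:
        return _data


def replace_data(new_data):
    """Publish a freshly built dict in place of the old one."""
    global _data
    with _data_lock:
        _data = new_data
```

Chris's second point still applies. If anything later mutates an object stored inside a published dict, that mutation needs its own locking, because readers may be using the same object at that moment.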

Cheers,
Chris Miles

Lawrence Oluyede

Mar 5, 2009, 7:17:11 AM
to pylons-...@googlegroups.com, Ben Bangert
On Thu, Mar 5, 2009 at 8:07 AM, Wichert Akkerman <wic...@wiggy.net> wrote:
> I know CPython has the GIL, but I am not sure whether all Python
> implementations have it. If I remember correctly, Stackless Python did
> not suffer from that particular wart.

Stackless has the GIL, IronPython does not, Jython does not
<http://fisheye3.atlassian.com/browse/jython/trunk/jython/src/org/python/compiler/Future.java?r1=4203&r2=4202>.

--
Lawrence Oluyede
[eng] http://oluyede.org - http://twitter.com/lawrenceoluyede
[ita] http://neropercaso.it - http://twitter.com/rhymes

Kamil Gorlo

Mar 5, 2009, 8:52:33 AM
to pylons-...@googlegroups.com
On Thu, Mar 5, 2009 at 9:07 AM, Chris Miles <miles...@gmail.com> wrote:
>
>
> On 05/03/2009, at 6:53 PM, Kamil Gorlo wrote:
>> I want to start another thread T1 in my Pylons app (manually, not a
>> Paste#http worker) and have some global dict which all HTTP workers
>> will read. T1 will periodically update this dictionary (in fact, all
>> it wants to do is swap this global dict with a local dict that it
>> prepared during its work). This dict will hold Python primitives
>> (other dicts too) and simple classes which act only as 'structs'. Do
>> I need a lock?
>
> If unsure, I'd suggest using a lock. What are you saving by skipping
> the lock? If the dict swap happens infrequently, the lock will be
> insignificant.

I will probably use a lock, but even if it is a self-implemented RWLock
(because there is no ReadWriteLock in the Python standard library),
readers also have to acquire it, not only the writer thread. I was
asking because I want to know what I can assume when using Python
primitives in a multithreaded environment, and it looks like nobody
knows for sure, which is quite strange.
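The standard library does indeed lack a readers-writer lock. Here is a minimal Python 3 sketch of one, built on `threading.Condition`. `RWLock`, `read_locked` and `write_locked` are hypothetical names. This version favours readers, so a steady stream of overlapping readers can starve a writer, and it is not reentrant.

```python
import threading
from contextlib import contextmanager


class RWLock:
    """Many concurrent readers, or one writer (reader-preferring sketch)."""

    def __init__(self):
        self._cond = threading.Condition(threading.Lock())
        self._readers = 0  # threads currently inside read_locked()

    @contextmanager
    def read_locked(self):
        with self._cond:
            self._readers += 1
        try:
            yield
        finally:
            with self._cond:
                self._readers -= 1
                if self._readers == 0:
                    # Wake any writer waiting for the readers to drain
                    self._cond.notify_all()

    @contextmanager
    def write_locked(self):
        # Holding the condition's lock for the whole write keeps new
        # readers (and other writers) out until the block finishes
        with self._cond:
            while self._readers > 0:
                self._cond.wait()
            yield


# Usage: readers share the lock, writers get it exclusively
config = {"version": 0}
rw = RWLock()

with rw.write_locked():
    config["version"] += 1

with rw.read_locked():
    current = config["version"]
```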

However, first tests on my Ubuntu machine with Python 2.5 show that
using a lock is sometimes faster than not using it. Could anybody
explain this?

Here is a simple example showing that x += 1 is not atomic:

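>>> test1.py
import threading
gv = 0

class T(threading.Thread):
    def run(self):
        global gv
        for i in xrange(10000000):
            # gv += 1 is a separate load, add and store; a thread switch
            # between them lets one thread overwrite the other's update
            gv += 1

x = T()
y = T()
x.start()
y.start()
x.join()
y.join()
print gv
<<< test1.py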

Here is another example, but this time with a lock:

>>> test2.py
import threading
gv = 0
gl = threading.Lock()

class T(threading.Thread):
    def __init__(self, lock):
        threading.Thread.__init__(self)
        self.lock = lock

    def run(self):
        global gv
        # The lock is held for the whole loop, so the two threads
        # run their loops one after the other, not interleaved
        self.lock.acquire()
        try:
            for i in xrange(10000000):
                gv += 1
        finally:
            self.lock.release()

x = T(gl)
y = T(gl)
x.start()
y.start()
x.join()
y.join()

print gv
<<< test2.py

And here are my results:

$ time python test1.py
14224533

real 0m6.630s
user 0m5.616s
sys 0m2.560s


$ time python test2.py
20000000

real 0m4.446s
user 0m4.380s
sys 0m0.036s
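Note that test2.py acquires the lock once around the entire loop, so the two threads do not interleave at all: one finishes its ten million increments before the other starts. One plausible reading of the timings is that this also avoids the cost of the two threads constantly handing the GIL back and forth. Below is a minimal Python 3 sketch of the more usual pattern, which locks only the read-modify-write so that the threads can still interleave. It gets the exact total but pays for one acquire/release per increment. The iteration count is reduced here only to keep it quick.

```python
import threading

counter = 0
counter_lock = threading.Lock()


def increment(n):
    global counter
    for _ in range(n):
        # Lock only the read-modify-write, so the threads still interleave
        with counter_lock:
            counter += 1


threads = [threading.Thread(target=increment, args=(100000,)) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(counter)  # 200000
```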


Cheers,
Kamil

Peter Hansen

Mar 5, 2009, 1:28:43 PM
to pylons-...@googlegroups.com
Kamil Gorlo wrote:
> On Thu, Mar 5, 2009 at 3:46 AM, Philip Jenvey <pje...@underboss.org> wrote:
>> http://effbot.org/pyfaq/what-kinds-of-global-value-mutation-are-thread-safe.htm

>>
> Yeah, this site is a bit ambiguous, e.g.
>
> "Operations that replace other objects may invoke those other objects’
> __del__ method when their reference count reaches zero, and that can
> affect things. This is especially true for the mass updates to
> dictionaries and lists. When in doubt, use a mutex!"
>
> But before that statement the author says that the following operations are atomic:
>
> D[x] = y
> D1.update(D2)
>
> where x and y are objects and D1 and D2 are dicts. So how can this be true?

I believe the adjective "atomic" applies only to the effect on D or
D1 in this case, and given the comment about __del__, I think it
probably doesn't apply even to the .update() case (unless D2 has
length 1).
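Here is a small CPython-specific sketch of the `__del__` caveat quoted above. Storing into `d["k"]` drops the last reference to the old value. Reference counting then finalizes that value immediately, running arbitrary Python code in the middle of what looks like one atomic assignment, and the interpreter may switch threads while that code runs. `Noisy` is a hypothetical class for illustration. On Jython or IronPython, which rely on a garbage collector rather than reference counting, the finalizer would run at some later, unspecified time instead.

```python
class Noisy:
    """Hypothetical value type that records when it is finalized."""

    log = []

    def __init__(self, name):
        self.name = name

    def __del__(self):
        # Arbitrary Python code runs here, triggered by the store below
        Noisy.log.append(self.name)


d = {"k": Noisy("old")}
d["k"] = Noisy("new")  # the old value's refcount drops to zero here
print(Noisy.log)  # ['old'] on CPython
```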

> But even if we assume that getitem and setitem on a dict are atomic
> (only on a 'steady' value? does that mean primitives?), how do I solve
> the problem I am facing now:
>
> I want to start another thread T1 in my Pylons app (manually, not a
> Paste#http worker) and have some global dict which all HTTP workers
> will read. T1 will periodically update this dictionary (in fact, all
> it wants to do is swap this global dict with a local dict that it
> prepared during its work). This dict will hold Python primitives
> (other dicts too) and simple classes which act only as 'structs'. Do
> I need a lock?

If by "swap" you simply mean you will be rebinding the global name to a
new dictionary, then the rebinding itself will certainly be atomic. On
the other hand, if the various worker threads access this global name
more than once in a request (or session, or whatever), then you will
certainly have problems, as they would use the old dictionary for part
of the request and then the new one.

If they simply grab a local reference to the global name (possibly
storing it in their request object, or session, depending on what you
need) when first needed, and thereafter access it only from there, I
can't see how you'd have any problems (ignoring issues involving changes
to various contained items, which might be shared if you haven't ensured
they are not).

I think if you aren't sure yet what will work, post some pseudo-code
showing what you would do if you were to ignore thread-safety issues.

Also note that using the Queue module is a very common way of dealing
with thread-safety issues, and it generally makes most or all potential
problems just go away. If that could work for you, it's probably the
best bet.
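Here is a minimal sketch of that Queue pattern in Python 3, where the module is named `queue` (it was `Queue` in the Python 2 used in this thread). The producer and consumer never touch a shared structure directly. They only hand items through the queue, and a `None` sentinel (an arbitrary choice for this sketch) tells the consumer to stop.

```python
import queue
import threading


def producer(q, items):
    for item in items:
        q.put(item)
    q.put(None)  # sentinel: tells the consumer to stop


def consumer(q, results):
    # Only this thread touches 'results' until it is joined
    while True:
        item = q.get()
        if item is None:
            break
        results.append(item * 2)


q = queue.Queue()
results = []
t_cons = threading.Thread(target=consumer, args=(q, results))
t_prod = threading.Thread(target=producer, args=(q, range(5)))
t_cons.start()
t_prod.start()
t_prod.join()
t_cons.join()
print(results)  # [0, 2, 4, 6, 8]
```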

--
Peter Hansen, P.Eng.
Engenuity Corporation
416-617-1499

Noah Gift

Mar 5, 2009, 2:01:16 PM
to pylons-...@googlegroups.com
I concur, in fact I wrote an article advocating just that :)

http://www.ibm.com/developerworks/aix/library/au-threadingpython/

--
Cheers,

Noah

Kamil Gorlo

Mar 5, 2009, 2:16:55 PM
to pylons-...@googlegroups.com
On Thu, Mar 5, 2009 at 7:28 PM, Peter Hansen <pe...@engcorp.com> wrote:
>
> Kamil Gorlo wrote:
>> On Thu, Mar 5, 2009 at 3:46 AM, Philip Jenvey <pje...@underboss.org> wrote:
>>> http://effbot.org/pyfaq/what-kinds-of-global-value-mutation-are-thread-safe.htm
>>>
>> Yeah, this site is a bit ambiguous, e.g.
>>
>> "Operations that replace other objects may invoke those other objects’
>> __del__ method when their reference count reaches zero, and that can
>> affect things. This is especially true for the mass updates to
>> dictionaries and lists. When in doubt, use a mutex!"
>>
>> But before that statement the author says that the following operations are atomic:
>>
>> D[x] = y
>> D1.update(D2)
>>
>> where x and y are objects and D1 and D2 are dicts. So how can this be true?
>
> I believe the adjective "atomic" applies only to the effect on D or
> D1 in this case, and given the comment about __del__, I think it
> probably doesn't apply even to the .update() case (unless D2 has
> length 1).

Yes, of course, we should consider "atomic" separately for each line.

>> But even if we assume that getitem and setitem on a dict are atomic
>> (only on a 'steady' value? does that mean primitives?), how do I solve
>> the problem I am facing now:
>>
>> I want to start another thread T1 in my Pylons app (manually, not a
>> Paste#http worker) and have some global dict which all HTTP workers
>> will read. T1 will periodically update this dictionary (in fact, all
>> it wants to do is swap this global dict with a local dict that it
>> prepared during its work). This dict will hold Python primitives
>> (other dicts too) and simple classes which act only as 'structs'. Do
>> I need a lock?
>
> If by "swap" you simply mean you will be rebinding the global name to a
> new dictionary, then the rebinding itself will certainly be atomic. On
> the other hand, if the various worker threads access this global name
> more than once in a request (or session, or whatever), then you will
> certainly have problems, as they would use the old dictionary for part
> of the request and then the new one.
>
> If they simply grab a local reference to the global name (possibly
> storing it in their request object, or session, depending on what you
> need) when first needed, and thereafter access it only from there, I
> can't see how you'd have any problems (ignoring issues involving changes
> to various contained items, which might be shared if you haven't ensured
> they are not).

Yes, that is the case. I have a global variable, and 'swap' means
'rebinding' it to a new dictionary.

So there is a global dict G and two types of threads:

1. HTTP Worker (many instances), which does this in each request:
- local_G = G
- from this moment on, operate only on 'local_G'

2. Refresh Worker (only one instance), which from time to time does:
- tmp_G = {'bla' : 42, .... }
- G = tmp_G
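Kamil's two workers can be sketched in Python 3 as follows. This is a minimal sketch under his stated assumption that published dicts are never mutated, only replaced. `refresh_worker`, `handle_request`, and the `version` key are hypothetical additions for illustration. The only shared step is the single rebinding `G = tmp_G`, and each request reads `G` exactly once.

```python
import threading

# The shared global: only ever rebound to a new dict, never mutated in place
G = {"bla": 42, "version": 0}


def refresh_worker(stop, interval=0.01):
    """Refresh Worker: build a new dict privately, then publish it."""
    global G
    version = 0
    while not stop.is_set():
        version += 1
        tmp_G = {"bla": 42, "version": version}
        G = tmp_G  # one rebinding; readers see either the old or the new dict
        stop.wait(interval)


def handle_request():
    """HTTP Worker: take one reference to G and use only that."""
    local_G = G
    return local_G["bla"], local_G["version"]


stop = threading.Event()
refresher = threading.Thread(target=refresh_worker, args=(stop,))
refresher.start()
seen = [handle_request() for _ in range(1000)]
stop.set()
refresher.join()
```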

So everything will probably be OK, and this solution is thread-safe in
Python. But I will probably use locks even if they *might* not be
necessary: this code is also for people other than me, and it will
probably be easier for them to read and understand that everything
works OK, especially since there is probably no performance trade-off.

> I think if you aren't sure yet what will work, post some pseudo-code
> showing what you would do if you were to ignore thread-safety issues.
>
> Also note that using the Queue module is a very common way of dealing
> with thread-safety issues, and it generally makes most or all potential
> problems just go away. If that could work for you, it's probably the
> best bet.

I know this Queue pattern, and I also know that it is probably the best
solution for synchronization problems, and not only in Python :) I was
just wondering how much we can get from Python primitives :)

However, big thanks to all of you guys for the posts; I've learned
something new (as every day :)).

Cheers,
Kamil

Philip Jenvey

Mar 5, 2009, 4:32:27 PM
to pylons-...@googlegroups.com

On Mar 5, 2009, at 4:17 AM, Lawrence Oluyede wrote:

>
> On Thu, Mar 5, 2009 at 8:07 AM, Wichert Akkerman <wic...@wiggy.net>
> wrote:
>> I know CPython has the GIL, but I am not sure whether all Python
>> implementations have it. If I remember correctly, Stackless Python did
>> not suffer from that particular wart.
>
> Stackless has the GIL, IronPython does not, Jython does not
> <http://fisheye3.atlassian.com/browse/jython/trunk/jython/src/org/python/compiler/Future.java?r1=4203&r2=4202>.

Though Jython and IronPython lack a GIL, they ensure the methods we
expect to be thread safe on the core data structures are in fact
thread safe, for compatibility with CPython.

--
Philip Jenvey

Kamil Gorlo

Mar 6, 2009, 3:58:45 PM
to pylons-...@googlegroups.com
On Thu, Mar 5, 2009 at 10:32 PM, Philip Jenvey <pje...@underboss.org> wrote:
> Though Jython and IronPython lack a GIL, they ensure the methods we
> expect to be thread safe on the core data structures are in fact
> thread safe, for compatibility with CPython.

So, is there any place where I can read what is thread-safe in
IronPython or Jython (meaning: what has to be done to be compatible
with CPython, and what CPython can guarantee in terms of thread
safety)?

Cheers,
Kamil

Philip Jenvey

Mar 6, 2009, 6:20:39 PM
to pylons-...@googlegroups.com

Unfortunately, no, not one that I know of.

Which is a little unfortunate for Jython/IronPython: their collection
classes would be a little faster if they didn't do thread safety. Even
so, you should be able to utilize multiple threads better on these
implementations. They're essentially doing fine-grained locking instead
of one giant lock.

--
Philip Jenvey

Wyatt Baldwin

Mar 6, 2009, 6:21:36 PM
to pylons-discuss
On Mar 6, 12:58 pm, Kamil Gorlo <kgo...@gmail.com> wrote:
> On Thu, Mar 5, 2009 at 10:32 PM, Philip Jenvey <pjen...@underboss.org> wrote:
> > Though Jython and IronPython lack a GIL, they ensure the methods we
> > expect to be thread safe on the core data structures are in fact
> > thread safe, for compatibility with CPython.
>
> So, is there any place where I can read what is thread-safe in
> IronPython or Jython (meaning: what has to be done to be compatible
> with CPython, and what CPython can guarantee in terms of thread
> safety)?

My reading is that there is no guarantee and that you should use locks
when you need to ensure thread safety.