thread safe dict manipulation

1,500 views
Skip to first unread message

Bernhard

unread,
May 29, 2012, 1:51:54 PM5/29/12
to python-...@googlegroups.com
Hello together,
i have a dict that holds all channels for my chat application. If a user joins a channel, i check if this channel exists in my dict, if not => create.

[...]
channels = dict()
def join_user(request, channel)
  if channel not in self.channels:
    self.channels[channel] = set()
  self.channels[channel].add(request)

I wonder who to get this thread safe in tornado. Locking seems not be be a good solution under high load?
Same problem exists if i want to remove a channel if the last user quits.

Thank you for help and best regards,
Boerni


Claudio Freire

unread,
May 29, 2012, 2:11:37 PM5/29/12
to python-...@googlegroups.com
On Tue, May 29, 2012 at 2:51 PM, Bernhard <boe...@gmail.com> wrote:
> I wonder who to get this thread safe in tornado. Locking seems not be be a
> good solution under high load?
> Same problem exists if i want to remove a channel if the last user quits.

Do you use threads?

Tornado is async, which means no threads unless you start some threads
explicitly.

Anyway, CPython's GIL guarantees that any single operation on dicts is
atomic (though "single" is usually less than a line of code). With
care, you might be able to manipulate the dict using only atomic
operations, which would give you thread safety without explicit
locking (only CPython's implicit GIL locking).

One such way:

self.channels.setdefault(channel, set()).add(request)

It works since the setdefault call will be atomic, so all threads will
be guaranteed to be operating on the same set.

Bernhard

unread,
May 29, 2012, 3:06:28 PM5/29/12
to python-...@googlegroups.com
Am Dienstag, 29. Mai 2012 20:11:37 UTC+2 schrieb Klauss:
> self.channels.setdefault(channel, set()).add(request)

Thank you very much for help! setdefault method does what i want.
I'll consult the python docs to look for an "equivalent" to safe remove a dict key.

Best regards,
Boerrni

 

Andrew Fort

unread,
May 29, 2012, 3:08:06 PM5/29/12
to python-...@googlegroups.com
On Tue, May 29, 2012 at 11:11 AM, Claudio Freire <klauss...@gmail.com> wrote:

> self.channels.setdefault(channel, set()).add(request)
>
> It works since the setdefault call will be atomic, so all threads will
> be guaranteed to be operating on the same set.

Note that it's only atomic recently, and it looks like only in 2.7 or later.

http://bugs.python.org/issue13521

Claudio Freire

unread,
May 29, 2012, 3:08:34 PM5/29/12
to python-...@googlegroups.com
On Tue, May 29, 2012 at 4:06 PM, Bernhard <boe...@gmail.com> wrote:
> Thank you very much for help! setdefault method does what i want.
> I'll consult the python docs to look for an "equivalent" to safe remove a
> dict key.

try:
del dict[key]
except KeyError:
pass

should work just fine (del is atomic)

Claudio Freire

unread,
May 29, 2012, 3:10:14 PM5/29/12
to python-...@googlegroups.com
On Tue, May 29, 2012 at 4:08 PM, Andrew Fort <af...@choqolat.org> wrote:
> Note that it's only atomic recently, and it looks like only in 2.7 or later.
>
> http://bugs.python.org/issue13521

Nice one, so it should still be atomic in 2.6 and 2.5, if the key
types don't implement __hash__ in python.

Bernhard

unread,
May 29, 2012, 3:22:12 PM5/29/12
to python-...@googlegroups.com
Am Dienstag, 29. Mai 2012 21:08:34 UTC+2 schrieb Klauss:
try:
   del dict[key]
except KeyError:
   pass

should work just fine (del is atomic)

Thanks again :) 
There is a special problem in this case. I have to check if len(dict[key]) == 0:

try:
  if len(dict[key]) == 0:
    del dict[key]
except KeyError: 
   pass 

In this case the operation is not atomic?

Best regards,
Bernhard

Claudio Freire

unread,
May 29, 2012, 3:24:40 PM5/29/12
to python-...@googlegroups.com
On Tue, May 29, 2012 at 4:22 PM, Bernhard <boe...@gmail.com> wrote:
> try:
>   if len(dict[key]) == 0:
>     del dict[key]
> except KeyError:
>    pass
>
> In this case the operation is not atomic?

Nope.

I think you may need locks in that case.

Ben Darnell

unread,
May 29, 2012, 3:32:53 PM5/29/12
to python-...@googlegroups.com
On Tue, May 29, 2012 at 10:51 AM, Bernhard <boe...@gmail.com> wrote:
> Hello together,
> i have a dict that holds all channels for my chat application. If a user
> joins a channel, i check if this channel exists in my dict, if not =>
> create.
>
> [...]
> channels = dict()
> def join_user(request, channel)
>   if channel not in self.channels:
>     self.channels[channel] = set()
>   self.channels[channel].add(request)
>
> I wonder who to get this thread safe in tornado. Locking seems not be be a
> good solution under high load?

Why not? If you need a lock, use a lock. You can sometimes take
advantage of atomic dict operations, but those operations are made
atomic via the GIL so you're not avoiding any of the fundamental
issues that can arise with locks (and the GIL ensures that contention
on your own fine-grained locks is low so they're pretty cheap). I
recommend not relying too much on atomic dict operations since they
are so subtle (is the key's __hash__ method defined in C or python?
Does defaultdict have the same atomicity properties as a regular dict?
etc...).

Of course all of this may be moot since as Claudio pointed out Tornado
doesn't create any threads, so threading isn't an issue unless you
create the threads yourself.

-Ben

Eric O'Connor

unread,
May 29, 2012, 3:33:35 PM5/29/12
to python-...@googlegroups.com
There be dragons...

The fact that even the people recommending this abuse of implementation
details of the GIL are having difficulties on the mailing list may hint
against its use (for projects that may, in the future, be modified by
mortals, run against different versions of the compiler, etc. etc. etc.)

Also, the alternative isn't that bad:

$ python3 test.py
> no lock: 0.00768291711807251, lock: 0.019561283588409424, speedup:
2.5460750503731786

test.py << EOF
# this is not a good test
import threading,time

def test(x):
ret={}
for y in range(0,x):
ret[y]=x*2
for y in range(0,x,2):
del ret[y]
return ret

def test_lock(x):
ret={}
lock=threading.Lock()
for y in range(0,x):
with lock:
ret[y]=x*2
for y in range(0,x,2):
with lock:
del ret[y]
return ret

def time_it(c,fn,args):
total=0.0
for i in range(c):
start=time.time()
fn(*args)
end=time.time()
total+= end-start
return total/c

if __name__=='__main__':
no_lock=time_it(100,test,[10000])
lock=time_it(100,test_lock,[10000])
print('no lock: {}, lock: {}, speedup: {}'.format(no_lock, lock,
lock/no_lock))
EOF
--
Eric O'Connor
Reply all
Reply to author
Forward
0 new messages