Google Groups no longer supports new Usenet posts or subscriptions. Historical content remains viewable.
Dismiss

deque and thread-safety

29 views
Skip to first unread message

Christophe Vandeplas

unread,
Oct 12, 2012, 8:40:21 AM10/12/12
to pytho...@python.org
Hello,

I have a question about deque and thread-safety.

My application has multiple threads running concurrently and doing the
same action (downloading pages)
To know what has already been downloaded I created the variable:
seen = deque('', 1000) (keeps list of max 1000 urls in memory)

In one place of the code I do: seen.append(url)

And another place:
def seenPage()
if url in seen:
return True
return False

>From the documentation I understand that deques are thread-safe:
> Deques are a generalization of stacks and queues (the name is pronounced “deck”
> and is short for “double-ended queue”). Deques support thread-safe, memory
> efficient appends and pops from either side of the deque with approximately the
> same O(1) performance in either direction.

It seems that appending to deques is indeed thread-safe, but not
iterating over them.
I've seen a similar, but different, situation here:
http://comments.gmane.org/gmane.comp.python.devel/85487

Forgetting the above url, and considering my situation this behavior
screws up the concept I need:
- Keeping a thread-safe collection of seen urls,
- Being able to check if something is in that collection
- No need to clean the collection to prevent the memory from filling up

So I know I could work around this problem by using a lock.
But then I don't only need to use the lock around the iterator, but
also around the append(), but that defeats the purpose of deque being
thread-safe.

In short, what's your advice:

1/ build a lock around the .append() and the iterator. Using the
already-existing lock in the deque. But HOW?

1/ simply build a lock around the .append() and the iterator.
Defeating the build-in thread-safety.

2/ use another collection that does what I need


Thanks for your expertise.

Christophe

Christophe Vandeplas

unread,
Oct 12, 2012, 8:59:16 AM10/12/12
to pytho...@python.org
Ok sorry for the mail,
I found the solution by using deque.count(url) that's available
starting from python 2.7

Antoine Pitrou

unread,
Oct 12, 2012, 9:36:24 AM10/12/12
to pytho...@python.org

Hello,

Christophe Vandeplas <christophe <at> vandeplas.com> writes:
>
> From the documentation I understand that deques are thread-safe:
> > Deques are a generalization of stacks and queues (the name is pronounced
“deck”
> > and is short for “double-ended queue”). Deques support thread-safe, memory
> > efficient appends and pops from either side of the deque with approximately
the
> > same O(1) performance in either direction.
>
> It seems that appending to deques is indeed thread-safe, but not
> iterating over them.

Indeed, if you read the above sentence, deques only support "thread-safe [...]
appends and pops". Other operations are not necessarily thread-safe.
Moreover, any operation which will call back into Python (such as iterating
several times) are *not* thread-safe.

Regardless, deques are not the right container for containment tests
("url in seen"). Like with lists or tuples, a containment test in a deque will
be O(n). So if you want efficient containment tests, you should use a set or a
dict.

Regards

Antoine.


Dieter Maurer

unread,
Oct 13, 2012, 2:35:19 AM10/13/12
to pytho...@python.org
Christophe Vandeplas <chris...@vandeplas.com> writes:

> ...
> From the documentation I understand that deques are thread-safe:
>> Deques are a generalization of stacks and queues (the name is pronounced “deck”
>> and is short for “double-ended queue”). Deques support thread-safe, memory
>> efficient appends and pops from either side of the deque with approximately the
>> same O(1) performance in either direction.
>
> It seems that appending to deques is indeed thread-safe, but not
> iterating over them.

You are right.

And when you think about it, then there is not much point in striving
for thread safety for iteration (alone).
Iteration is (by nature) a non atomic operation: you iterate because
you want to do something with the intermediate results; this "doing"
is not part of the iteration itself.
Thus, you are looking for thread safety not for only the iteration
but for the iteration combined with additional operations (which
may well extend beyond the duration of the iteration).

Almost surely, the "deque" implementation is using locks
to ensure thread safety for its "append" and "pop". Check whether
this lock is exposed to the application. In this case, use
it to protect you atomic sections involving iteration.

0 new messages