Google Groups no longer supports new Usenet posts or subscriptions. Historical content remains viewable.
Dismiss

sorted list vs dictionnary

2 views
Skip to first unread message

Alexandre Fayolle

unread,
May 16, 2002, 7:45:08 AM5/16/02
to
Hello everyone,

I have a class that is able to process messages. I want to avoid
processing the same message twice, so a solution is to maintain a set
of processed messages IDs, and to check if for the presence of the ID
before processing a document.

I've found two ways (without using anything outside the standard
library) of doing this:

* use a dictionnary with the ID as the key and None as the value, and
check for the key with cache.has_key(msg.ID)
* use a sorted list and the bisect module to keep it sorted, and check
for the key with
i = bisect.bisect_left(l,e)
if i<len(l) and l[i] == e:
pass

I'd expect the first one to be faster, since the lookup of a key in a
dictionnary is O(1), and insertion is a dictionnary is roughly O(1) too,
whereas bisect_left is O(log(N)), and inserting in the middle of a list
is O(N).

Could anyone confirm the speed issue, and maybe share some insights about
the memory efficiency of both solutions, especially when the number of
IDs get big (the order of magnitude is 50000)? I'm not too familiar with the
implementation of lists and dictionnary in the interpreter.

TIA

Alexandre Fayolle
--
LOGILAB, Paris (France).
http://www.logilab.com http://www.logilab.fr http://www.logilab.org
Narval, the first software agent available as free software (GPL).

Terry Reedy

unread,
May 16, 2002, 8:44:05 AM5/16/02
to

"Alexandre Fayolle" <a...@logilab.fr> wrote in message
news:slrnae76u...@virgo.logilab.fr...

For the reasons you discerned, a dictionary with a dummy value is
currently the standard way to maintain a set. Currently, using 1
instead of None as the dummy is faster since it avoids the lookup of
the name 'None' in the module and builtin dicts, but this will change
when 'None' is made a keyword (maybe in 2.3). I believe internal
memory consumption should be near enough the same to not be a worry.

Terry J. Reedy

0 new messages