Google Groups no longer supports new Usenet posts or subscriptions. Historical content remains viewable.
Dismiss

Sorted dictionary

4 views
Skip to first unread message

Jan Kaliszewski

unread,
Jan 20, 2010, 8:02:02 PM1/20/10
to pytho...@python.org
Hello,

Inspired by some my needs as well as some discussions in the net, I've
implemented a sorted dictionary (i.e. a dict that returns keys, values and
items always sorted by keys):

http://code.activestate.com/recipes/576998/

Maybe it'll appear to be useful for somebody... And I'm curious about your
opinions.

Regards,
*j

--
Jan Kaliszewski (zuo) <z...@chopin.edu.pl>

Steven D'Aprano

unread,
Jan 20, 2010, 8:50:58 PM1/20/10
to
On Thu, 21 Jan 2010 02:02:02 +0100, Jan Kaliszewski wrote:

> Hello,
>
> Inspired by some my needs as well as some discussions in the net, I've
> implemented a sorted dictionary (i.e. a dict that returns keys, values
> and items always sorted by keys):
>
> http://code.activestate.com/recipes/576998/


What's the advantage of that over sorting the keys as needed?


E.g.

for key in sorted(dict):
print key


works.


--
Steven

Message has been deleted

Steven D'Aprano

unread,
Jan 21, 2010, 1:54:35 AM1/21/10
to
On Wed, 20 Jan 2010 22:21:22 -0800, Dennis Lee Bieber wrote:

> On Thu, 21 Jan 2010 02:02:02 +0100, "Jan Kaliszewski"

> <z...@chopin.edu.pl> declaimed the following in
> gmane.comp.python.general:


>
>> Hello,
>>
>> Inspired by some my needs as well as some discussions in the net, I've
>> implemented a sorted dictionary (i.e. a dict that returns keys, values
>> and items always sorted by keys):
>>

> How does this differ from all the other "ordered dictionary"
> schemes out there (and maybe even included in 2.7? I'm still on 2.5)


Ordered dicts remember the insertion order of keys.

Sorted dicts return the keys in sorted order.


--
Steven

Chris Rebert

unread,
Jan 21, 2010, 2:49:22 AM1/21/10
to Steven D'Aprano, pytho...@python.org

Well, it does spread out the cost of sorting the key list over each of
the N distinct key assignments, versus sorting the entire list all at
once, on-demand at the end of the process. And you avoid having to
traverse the M buckets in the dictionary to locate the N keys in the
first place. So it has different performance characteristics, which
could theoretically matter depending on the use case; e.g. it could
avoid a GUI application hanging for several seconds while it sorts a
large list of keys.

Cheers,
Chris
--
http://blog.rebertia.com

Raymond Hettinger

unread,
Jan 21, 2010, 3:27:52 AM1/21/10
to
On Jan 20, 5:02 pm, "Jan Kaliszewski" <z...@chopin.edu.pl> wrote:
> Hello,
>
> Inspired by some my needs as well as some discussions in the net, I've  
> implemented a sorted dictionary (i.e. a dict that returns keys, values and  
> items always sorted by keys):
>
> http://code.activestate.com/recipes/576998/
>
> Maybe it'll appear to be useful for somebody... And I'm curious about your  
> opinions.

Using an underlying list to track sorted items
means that insertion and deletion take O(n) time.
That could be reduced to O(log n) time by using
a blist or skiplist as the underlying structure
for maintaining a sort.

The other alternative is to just append items to the list
and sort the list on demand. Since the Timsort takes advantage
of existing order, the process will approach O(n) if sorting
after every few updates. This is close the O(n) time it takes
to iterate the list in the first place:

s = SortedDict(items)
list(s) # O(n log n) to sort and O(n) to iterate
s[newkey] = newval
list(s) # close to O(n)

my two cents,


Raymond

Bearophile

unread,
Jan 21, 2010, 4:24:25 AM1/21/10
to
Raymond Hettinger:

> Using an underlying list to track sorted items
> means that insertion and deletion take O(n) time.
> That could be reduced to O(log n) time by using
> a blist or skiplist as the underlying structure
> for maintaining a sort.

In the collections module it can be useful to have ordered dicts and
sets based on search trees.
I'm thinking about B+ trees to use CPU cache locality better. It can
be fun :-)

Bye,
bearophile

Jan Kaliszewski

unread,
Jan 21, 2010, 1:08:45 PM1/21/10
to pytho...@python.org
21-01-2010, 07:21:22 Dennis Lee Bieber <wlf...@ix.netcom.com> wrote:

> How does this differ from all the other "ordered dictionary" schemes
> out there (and maybe even included in 2.7? I'm still on 2.5)

It's completely different, please read first paragraph of page I linked
(http://code.activestate.com/recipes/576998/).

Jan Kaliszewski

unread,
Jan 21, 2010, 1:43:45 PM1/21/10
to pytho...@python.org
Dnia 21-01-2010 o 08:49:22 Chris Rebert <cl...@rebertia.com> napisał(a):

> On Wed, Jan 20, 2010 at 5:50 PM, Steven D'Aprano
> <ste...@remove.this.cybersource.com.au> wrote:

>>> http://code.activestate.com/recipes/576998/

> Well, it does spread out the cost of sorting the key list over each of
> the N distinct key assignments, versus sorting the entire list all at
> once, on-demand at the end of the process. And you avoid having to
> traverse the M buckets in the dictionary to locate the N keys in the
> first place. So it has different performance characteristics, which
> could theoretically matter depending on the use case; e.g. it could
> avoid a GUI application hanging for several seconds while it sorts a
> large list of keys.

Generally, I see 3 possible approaches:

1. What Steven wrote about: simply sort all keys when you need to get them
sorted (it needs iteration over whole a dict). In many cases it's good
enough (so is the best, because of simplicity) -- e.g. when a dict is not
very long (very common case), or when that sorting it is a rare operation.
Sort always when a key is added/deleted, maintaining an auxiliary list of
sorted keys (sufficient especially when you modify a dict rarely and get
from it often).

2. Sort all keys on-demand -- but only if a dict is marked as "modified".
Mark a dict as "modified" when you add/delete any key; and mark a dict as
"unmodified" when sorting. (Of course, it can be automatized, see e.g.:
http://pypi.python.org/pypi/sorteddict). Nice, if your use-pattern is:
add/del a lot, *then* only get/iter a lot...

3. Continually (at each set/del of a key) maintain auxiliary sorted list
of keys -- using binary search (as I did), heap queue (see:
http://pypi.python.org/pypi/HeapDict) or such an algorithm. I think this
approach is the best when you have quite long dict, and you add/del fairly
often and get/iter very often.

Jan Kaliszewski

unread,
Jan 21, 2010, 1:45:56 PM1/21/10
to pytho...@python.org
Dnia 21-01-2010 o 09:27:52 Raymond Hettinger <pyt...@rcn.com> napisał(a):

> On Jan 20, 5:02 pm, "Jan Kaliszewski" <z...@chopin.edu.pl> wrote:

>> http://code.activestate.com/recipes/576998/

> Using an underlying list to track sorted items
> means that insertion and deletion take O(n) time.
> That could be reduced to O(log n) time by using
> a blist or skiplist as the underlying structure
> for maintaining a sort.

Please note that I used funcions from bisect, that use binary search.

Doesn't it take O(log n) time?

Arnaud Delobelle

unread,
Jan 21, 2010, 2:41:53 PM1/21/10
to
"Jan Kaliszewski" <z...@chopin.edu.pl> writes:

Insertion and deletion are still O(n) as all items to the right of the
inserted/deleted one have to be shifted by one place.

--
Arnaud

0 new messages