Google Groups no longer supports new Usenet posts or subscriptions. Historical content remains viewable.
Dismiss

Ordered Sets

2 views
Skip to first unread message

Raymond Hettinger

unread,
Mar 19, 2009, 5:26:06 PM3/19/09
to
Here's a new, fun recipe for you guys:

http://code.activestate.com/recipes/576694/

Enjoy,


Raymond

Aahz

unread,
Mar 19, 2009, 7:19:08 PM3/19/09
to
In article <9a5d59e1-2798-4864...@s9g2000prg.googlegroups.com>,

Raymond Hettinger <pyt...@rcn.com> wrote:
>
>Here's a new, fun recipe for you guys:
>
>http://code.activestate.com/recipes/576694/

That is *sick* and perverted.
--
Aahz (aa...@pythoncraft.com) <*> http://www.pythoncraft.com/

"Programming language design is not a rational science. Most reasoning
about it is at best rationalization of gut feelings, and at worst plain
wrong." --GvR, python-ideas, 2009-3-1

Nigel Rantor

unread,
Mar 20, 2009, 7:33:15 AM3/20/09
to Aahz, pytho...@python.org
Aahz wrote:
> In article <9a5d59e1-2798-4864...@s9g2000prg.googlegroups.com>,
> Raymond Hettinger <pyt...@rcn.com> wrote:
>> Here's a new, fun recipe for you guys:
>>
>> http://code.activestate.com/recipes/576694/
>
> That is *sick* and perverted.

I'm not sure why.

Would it be less sick if it had been called "UniqueList" ?

n

Aahz

unread,
Mar 20, 2009, 9:59:59 AM3/20/09
to
In article <mailman.2269.1237548...@python.org>,

The doubly-linked list part is what's sick and perverted.

Raymond Hettinger

unread,
Mar 20, 2009, 1:08:48 PM3/20/09
to
[Aahz]

> The doubly-linked list part is what's sick and perverted.

The doubly-linked list part is what gives it big-oh running
times the same as regular sets. If the underlying sequence
is stored as a list, then deletion goes from O(1) to O(n).
If the insertion times are recorded in a dictionary, then
the insertion order has to be sorted prior to iteration
which makes iteration go from O(n) to O(n log n).


Raymond

Scott David Daniels

unread,
Mar 20, 2009, 2:50:28 PM3/20/09
to

The double-linked list part could be done with weakrefs and
perhaps Aahz would relax a bit.

--Scott David Daniels
Scott....@Acm.Org

Raymond Hettinger

unread,
Mar 20, 2009, 3:42:01 PM3/20/09
to
[Scott David Daniels]

> The double-linked list part could be done with weakrefs and
> perhaps Aahz would relax a bit.

Am thinking that the __del__() takes care of business.
The clear() operation removes the links and all references
to them (including circular).


Raymond

Aaron Brady

unread,
Mar 20, 2009, 8:02:18 PM3/20/09
to
> Scott.Dani...@Acm.Org

Whether the structure holds one reference to its "containees" or three
doesn't matter. But you're right, weakrefs are relaxing.

I pronounce it the perfect data structure. Tell your CS 200 professor
immediately. Shout it from the rooftops.

IINM if I'm not mistaken, an O(1) 'multiadd' method wouldn't be hard
either. If the item exists already, insert it in place in the linked
list.

Steven D'Aprano

unread,
Mar 20, 2009, 11:46:07 PM3/20/09
to

I'm not sure whether Aahz's description is meant to be taken seriously,
or if there is a smiley hidden in his post. After all, doubly-linked
lists are a standard building block in data structure design.

But assuming it is meant as a serious criticism, I'm not sure that I like
the idea of using weakrefs. I think it is a feature, not a bug, that
sets, lists and dicts keep a reference to the items in them. I think it
would be bizarre if putting an item into an OrderSet meant that you
needed to put it somewhere else as well to prevent it being garbage
collected.

Or have I misunderstood the consequences of using weakrefs?

--
Steven

Terry Reedy

unread,
Mar 21, 2009, 12:29:17 AM3/21/09
to pytho...@python.org

I think the idea was 2 weakrefs and 1 normal ref instead of 3 normal refs.

Raymond Hettinger

unread,
Mar 21, 2009, 1:39:27 AM3/21/09
to
[Terry Reedy]

> I think the idea was 2 weakrefs and 1 normal ref instead of 3 normal refs.

Right. Here's a link to a weakref version (though I think the
previous __del__ version also does the trick):

http://code.activestate.com/recipes/576696/


Raymond

pataphor

unread,
Mar 26, 2009, 6:29:43 AM3/26/09
to
Raymond Hettinger wrote:

> Right. Here's a link to a weakref version (though I think the
> previous __del__ version also does the trick):
>
> http://code.activestate.com/recipes/576696/

Interesting. But how about this one? Does it also have the reference
problem? By the way collections.MutableSet needs python >= 2.6

P.

import collections

class OrderedSet(collections.MutableSet):
'Set that remembers the order elements were added'
# Big-O running times for all methods are the same as for regular sets.

def __init__(self, iterable=None):
self.__map = {} # key --> [prev,key,next]
if iterable is not None:
self |= iterable

def __len__(self):
return len(self.__map)

def __contains__(self, key):
return key in self.__map

def add(self, key):
if not self.__map:
self.start = key
self.__map = {key: [key,key,key]}
elif key not in self.__map:
a,b,c = self.__map[self.start]
self.__map[a][2] = key
self.__map[b][0] = key
self.__map[key] = [a,key,b]

def discard(self, key):
if key in self.__map:
a,b,c = self.__map.pop(key)
if self.__map:
self.__map[a][2] = c
self.__map[c][0] = a
if b == self.start:
self.start = c

def __iter__(self):
if self.__map:
a,b,c = self.__map[self.start]
while c is not self.start:
yield b
a,b,c = self.__map[c]
yield b

def __reversed__(self):
if self.__map:
a,b,c = self.__map[self.start]
while a is not self.start:
yield a
a,b,c = self.__map[a]
yield a

def pop(self, last=True):
if not self:
raise KeyError('set is empty')
key = next(reversed(self)) if last else next(iter(self))
self.discard(key)
return key

def __repr__(self):
if not self:
return '%s()' % (self.__class__.__name__,)
return '%s(%r)' % (self.__class__.__name__, list(self))

def __eq__(self, other):
if isinstance(other, OrderedSet):
return len(self) == len(other) and list(self) == list(other)
return not self.isdisjoint(other)

def test():
D = OrderedSet('abcd')
R = OrderedSet()
for x in list(D):
print(D,R)
R.add(D.pop(last = False))
print(D,R)

if __name__ == '__main__':
test()

Scott David Daniels

unread,
Mar 26, 2009, 11:20:17 AM3/26/09
to
pataphor wrote:
> Raymond Hettinger wrote:
>
>> Right. Here's a link to a weakref version (though I think the
>> previous __del__ version also does the trick):
>>
>> http://code.activestate.com/recipes/576696/
>
> Interesting. But how about this one? Does it also have the reference
> problem? By the way collections.MutableSet needs python >= 2.6
...

> class OrderedSet(collections.MutableSet):
> 'Set that remembers the order elements were added'
> # Big-O running times for all methods are the same as for regular sets.
>
> def __init__(self, iterable=None):
> self.__map = {} # key --> [prev,key,next]
> if iterable is not None:
> self |= iterable

...


> def add(self, key):
> if not self.__map:
> self.start = key
> self.__map = {key: [key,key,key]}

...
Two responses:
(1) No, it doesn't have the reference problem, but it does replace
a reference with map lookup (at a performance cost).
and
(2) Why, oh why, do people feel so comforted adding double_underscores
to data structures? If I want to inherit from your mapping in order
to log the attempts to discard a key not actually in the set, I
have to either us the nasty name mangling to get at self.__map, or
name my subclass OrderedSet and take advantage of a not-guaranteed
name mangling. What on earth makes you unwilling to go with "_map"
and credit your code's consumers with a bit of common sense?

Sorry, this latter rant has been building up for a while (spurred on by
a sojourn into the unittest code where they have the private disease in
spades.

--Scott David Daniels
Scott....@Acm.Org

Gabriel Genellina

unread,
Mar 28, 2009, 3:33:55 AM3/28/09
to pytho...@python.org
En Thu, 26 Mar 2009 12:20:17 -0300, Scott David Daniels
<Scott....@acm.org> escribió:

> (2) Why, oh why, do people feel so comforted adding double_underscores
> to data structures? If I want to inherit from your mapping in order
> to log the attempts to discard a key not actually in the set, I
> have to either us the nasty name mangling to get at self.__map, or
> name my subclass OrderedSet and take advantage of a not-guaranteed
> name mangling. What on earth makes you unwilling to go with "_map"
> and credit your code's consumers with a bit of common sense?
>
> Sorry, this latter rant has been building up for a while (spurred on by
> a sojourn into the unittest code where they have the private disease in
> spades.

My commiserations.

--
Gabriel Genellina

Aahz

unread,
Mar 29, 2009, 12:22:42 PM3/29/09
to
In article <01d457aa$0$17208$c3e...@news.astraweb.com>,

Steven D'Aprano <st...@REMOVE-THIS-cybersource.com.au> wrote:
>On Fri, 20 Mar 2009 11:50:28 -0700, Scott David Daniels wrote:
>> Raymond Hettinger wrote:
>>> [Aahz]
>>>>
>>>> The doubly-linked list part is what's sick and perverted.
>>>
>>> The doubly-linked list part is what gives it big-oh running times the
>>> same as regular sets. If the underlying sequence is stored as a list,
>>> then deletion goes from O(1) to O(n). If the insertion times are
>>> recorded in a dictionary, then the insertion order has to be sorted
>>> prior to iteration which makes iteration go from O(n) to O(n log n).
>>
>> The double-linked list part could be done with weakrefs and perhaps Aahz
>> would relax a bit.
>
>I'm not sure whether Aahz's description is meant to be taken seriously,
>or if there is a smiley hidden in his post. After all, doubly-linked
>lists are a standard building block in data structure design.

It's a half-smiley. I find the trick of using a Python list to store the
doubly-linked list difficult to understand (as opposed to the usual
mechanism of a node class). I understand why it was done (time and space
efficiency), but I also still feel emotionally that it's somewhat sick
and perverted. I probably would feel more comfortable if the
doubly-linked list were abstracted out and commented. Heck, the whole
thing needs unit tests.

I needed to look up the code again, so I might as well repeat the URL to
make it handy for everyone else:

http://code.activestate.com/recipes/576694/

"Debugging is twice as hard as writing the code in the first place.
Therefore, if you write the code as cleverly as possible, you are, by
definition, not smart enough to debug it." --Brian W. Kernighan

CTO

unread,
Mar 29, 2009, 2:29:49 PM3/29/09
to
On Mar 28, 3:33 am, "Gabriel Genellina" <gagsl-...@yahoo.com.ar>
wrote:

> En Thu, 26 Mar 2009 12:20:17 -0300, Scott David Daniels  
> <Scott.Dani...@acm.org> escribió:

>
> > (2) Why, oh why, do people feel so comforted adding double_underscores
> >      to data structures?

Probably because other authors feel too comfortable using single
underscores for
things the end user should mess with, as in namedtuple.

Nick Craig-Wood

unread,
Mar 30, 2009, 4:30:04 AM3/30/09
to
Aahz <aa...@pythoncraft.com> wrote:
> I find the trick of using a Python list to store the doubly-linked
> list difficult to understand (as opposed to the usual mechanism of a
> node class). I understand why it was done (time and space
> efficiency), but I also still feel emotionally that it's somewhat
> sick and perverted. I probably would feel more comfortable if the
> doubly-linked list were abstracted out and commented. Heck, the
> whole thing needs unit tests.

Hmmm... Lets compare the two.

Here is the length three list

$ python
Python 2.5.2 (r252:60911, Jan 4 2009, 17:40:26)
[GCC 4.3.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> L = []
>>> for i in xrange(1000000):
... L.append([1,2,3])
...
>>> import os
>>> os.getpid()
28134
>>>

(from top)
28134 ncw 20 0 58860 53m 1900 S 0 2.6 0:02.62 python

vs a Node class with __slots__ (for efficiency)

$ python
Python 2.5.2 (r252:60911, Jan 4 2009, 17:40:26)
[GCC 4.3.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> class Node(object):
... __slots__ = ["prev", "next", "this"]
... def __init__(self, prev, next, this):
... self.prev = prev
... self.next = next
... self.this = this
...
>>> Node(1,2,3)
<__main__.Node object at 0xb7e897cc>
>>> Node(1,2,3).prev
1
>>> L = []
>>> for i in xrange(1000000):
... L.append(Node(1,2,3))
...
>>> import os
>>> os.getpid()
28203
>>>

(from top)
28203 ncw 20 0 43364 38m 1900 S 0 1.9 0:04.41 python

So the Node class actually takes less memory 38 Mbytes vs 53 Mbytes for
the list.

--
Nick Craig-Wood <ni...@craig-wood.com> -- http://www.craig-wood.com/nick

pataphor

unread,
Mar 30, 2009, 12:27:11 PM3/30/09
to
On Mon, 30 Mar 2009 03:30:04 -0500
Nick Craig-Wood <ni...@craig-wood.com> wrote:

> >>> class Node(object):
> ... __slots__ = ["prev", "next", "this"]
> ... def __init__(self, prev, next, this):
> ... self.prev = prev
> ... self.next = next
> ... self.this = this

[...]

> So the Node class actually takes less memory 38 Mbytes vs 53 Mbytes
> for the list.

Well OK, but if we're going to start optimizing, it seems we don't need
to store the key itself in the Node or the list. Here's a version of my
script that stores only prev and next. Another twist is that it is now
possible to arbitrarily set the starting point for the iteration. (Just
in case someone needed a cue for further ranting)

P.

import collections

class OrderedSet(collections.MutableSet):
'Set that remembers the order elements were added'
# Big-O running times for all methods are the same as for regular
sets.
def __init__(self, iterable=None):

self.links = {} # key --> [prev,next]


if iterable is not None:
self |= iterable

def __len__(self):
return len(self.links)

def __contains__(self, key):
return key in self.links

def add(self, key):
if not self:
self.start = key
self.links = {key: [key,key]}
elif key not in self.links:
links = self.links
start = self.start
prev, next = links[start]
links[prev][1] = key
links[start][0] = key
links[key] = [prev,start]

def discard(self, key):
links = self.links
if key in links:
prev,next = links.pop(key)
if self.links:
links[prev][1] = next
links[next][0] = prev
if key == self.start:
self.start = next
else:
del self.start

def __iter__(self):
links = self.links
start = self.start
if links:
yield start
prev,next = links[start]
while next != start:
yield next
prev,next = links[next]

def __reversed__(self):
links = self.links
start = self.start
if links:
prev,next = links[start]
while prev != start:
yield prev
prev,next = self.links[prev]
yield start

def pop(self, last=True):
if not self:
raise KeyError('set is empty')
key = next(reversed(self)) if last else next(iter(self))
self.discard(key)
return key

def __repr__(self):
if not self:
return '%s()' % (self.__class__.__name__,)
return '%s(%r)' % (self.__class__.__name__, list(self))

def __eq__(self, other):
if isinstance(other, OrderedSet):
return len(self) == len(other) and list(self) == list(other)
return not self.isdisjoint(other)

def test():
D = OrderedSet('abcd')
R = OrderedSet()
for x in list(D):
print(D,R)
R.add(D.pop(last = False))
print(D,R)

print()
R.start = 'c'
print(list(reversed(R)))

Alex_Gaynor

unread,
Mar 30, 2009, 10:57:39 PM3/30/09
to
On Mar 30, 12:27 pm, pataphor <patap...@gmail.com> wrote:
> On Mon, 30 Mar 2009 03:30:04 -0500
>

I really like the Ordered Set class(I've been thinking about one ever
since ordered dict hit the std lib), is there any argument against
adding one to the collections module? I'd be willing to write a PEP
up for it.

Alex

pataphor

unread,
Mar 31, 2009, 4:33:26 AM3/31/09
to
On Mon, 30 Mar 2009 19:57:39 -0700 (PDT)
Alex_Gaynor <alex....@gmail.com> wrote:

> I really like the Ordered Set class(I've been thinking about one ever
> since ordered dict hit the std lib), is there any argument against
> adding one to the collections module? I'd be willing to write a PEP
> up for it.

Suppose the implementation would use a circular linked list. Then the
iteration could start from any point, the thing is symmetric after all.
But this would also mean we could add items to the left of that new
starting point, since that would now be the 'end' of the circle. This
is something different from merely remembering insertion order. How do
you feel about that?

P.

pataphor

unread,
Mar 31, 2009, 5:52:45 AM3/31/09
to

And in case that didn't confuse you enough, how about this method?

def move(self,key1,key2):
#self ==> key1,(key2 ... end), (key1+1... key2-1)
links = self.links
if set([key1,key2]) and self :
start = self.start
a = links[key1][1]
b = links[key2][0]
c = links[start][0]
links[key1][1] = key2
links[key2][0] = key1
links[a][0] = c
links[c][1] = a
links[b][1] = start
links[start][0] = b

This takes [key2:] (isn't pseudo slice notation wonderful?) and
inserts it after key1.

for example:

R = OrderedSet(range(10))
print(list(R))
R.move(3,7)
print(list(R))

gives:

[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
[0, 1, 2, 3, 7, 8, 9, 4, 5, 6]

All in O(1) of course.

P.

Alex_Gaynor

unread,
Mar 31, 2009, 9:03:13 AM3/31/09
to
On Mar 31, 5:52 am, pataphor <patap...@gmail.com> wrote:
> On Tue, 31 Mar 2009 10:33:26 +0200
>

My inclination would be to more or less *just* have it implement the
set API, the way ordered dict does in 2.7/3.1.

Alex

pataphor

unread,
Mar 31, 2009, 11:06:22 AM3/31/09
to
On Tue, 31 Mar 2009 06:03:13 -0700 (PDT)
Alex_Gaynor <alex....@gmail.com> wrote:

> My inclination would be to more or less *just* have it implement the
> set API, the way ordered dict does in 2.7/3.1.

As far as I can tell all that would be needed is read/write access to
two key variables: The iterator start position and the underlying map.
There is no need for more than basic set API since people can use
those two variables to subclass their own iterators.

P.

Alex_Gaynor

unread,
Mar 31, 2009, 1:33:07 PM3/31/09
to
On Mar 31, 11:06 am, pataphor <patap...@gmail.com> wrote:
> On Tue, 31 Mar 2009 06:03:13 -0700 (PDT)
>
> Alex_Gaynor <alex.gay...@gmail.com> wrote:
> > My inclination would be to more or less *just* have it implement the
> > set API, the way ordered dict does in 2.7/3.1.
>
> As far as I can tell all that would be needed is read/write access to
> two key variables: The iterator start position and the underlying map.
> There is no need for more than basic set API since people can use
> those two variables to subclass their own iterators.
>
> P.

The only issue with that is if we ever moved it to a C implementation
we'd probably use a more conventional linked list.

Alex

0 new messages