I find such data structures v. useful in C++. I know that in Python
the sort function is v. fast, but often I prefer never to sort but
simply to use an ordered data structure in the first place.
(I'm aware that for ordered lists I can use the bisect module, but I
want an ordered key-value data structure.)
I think other people must find such things useful. There are three
implementations on the Python Cookbook site, and one on PyPI, all in
pure Python (plus I have my own implementation, also pure Python).
I would suppose that it would be better if it was implemented in C---
for example, my own pure Python ordered dict loads data about eight
times slower than the built-in dict. Nonetheless, I still find it
worth using for the convenience it offers.
Do other Python programmers feel this lack? Is this worth a PEP?
[I originally asked about this on the P3K mailing list, but then
realised that it isn't version-specific really.]
Yes, this is a serious lack in the standard library.
Michele Simionato
cheers,
Stef Mientki
I usually make a distinction between a sorted dict, where iteration
(and potentially positional indexing) happens in sorted key order; and
an ordered dict where items maintain insertion order. I use the latter
all the time, and e.g. Django's model metaclass does some minor magic
to overcome the fact that field-order is lost by the time your
metaclass gets control, since the attributes are passed as a regular
dict.
-- bjorn
> I feel that Python lacks one useful data structure: an ordered
> dictionary.
>
> I find such data structures v. useful in C++.
[snip]
Personally, I've never missed an ordered dict. What do people use them
for?
In fact, I'm not sure what people mean by ordered dicts. I assume they
mean the keys are ordered, not the values. But ordered by what? Insertion
order? Modification order? Alphabetical order?
--
Steven.
Insertion order.
M.S.
> If you're going to extend the dictionary, there's one other flag I'm
> continuously missing: "case-insensitive" key.
Completely untested and probably buggy as anything, but here's an
absolutely minimal case-insensitive dictionary.
class CaselessDict(dict):
def __setitem__(self, key, value):
super(CaselessDict, self).__setitem__(key.lower(), value)
def __getitem__(self, key):
return super(CaselessDict, self).__getitem__(key.lower())
--
Steven.
Actually I meant by key order, so insertion order doesn't matter at
all. If you need a dictionary-like data structure that respects
insertion order you could use a list of (key, value) tuples.
Another respondent asked about use cases.
I have found time and again that I needed (key, value) pairs where the
key is some string that provides a representation for human readers
and the value is some internal key (e.g., database ID) that the system
uses internally. In these cases I often need to present the user with
a list of items from which to choose and want to present them in
sorted order. Naturally, I could use an ordinary dict and then do
this:
for string in sorted(d.keys()):
process(string)
But what happens when I need to do this a *lot* and when the number of
items is hundreds or a few thousands? I'm having to sort again and
again, since it is often the case that the items in the list changes
during the runtime of the application. So my solution in C++ is to use
an ordered dictionary (a map in C++ jargon), which in Python means I
can simply write:
for string in od.keys():
process(string)
Another situation where this is useful is if you need to hold lists of
strings preserving case but want them to be presented case-
insensitively. Here, you populate the ordered dictionary like this:
for string in somelist:
od[string.lower()] = string
Now you can iterate over od in case-insensitive order and yet still
have access to the case-sensitive original, and never have to
explicitly sort the strings.
I think it comes down to the nail and hammer issue: if you have a
hammer all problems look nail shaped. In Python we have dict and list
so everything looks suitable for those data structures. But having had
a wrench (ordered dictionaries) in C++, I find that some problems need
a wrench. I think that once Python programmers have access to an
ordered dictionary some proportion of them will find it painful to
program without it, so I really think it belongs in the standard
library.
For your use case I would wrap a list [(key, value)] with a dict-like
object and I would use the bisect module in the standard library to
keep
the inner list ordered.
M.S.
> In fact, I'm not sure what people mean by ordered dicts. I assume they
> mean the keys are ordered, not the values. But ordered by what?
> Insertion order? Modification order? Alphabetical order?
All of the above at different times. Usually I think they mean original
insertion order, I'd more often like modification order though.
Insertion order can certainly be useful: any situation where you currently
keep items in a list because the original order is important but also copy
them into a dictionary or set because you need fast lookup.
e.g. if you want only unique items out of a list but need to preserve order
of first occurrence.
Another example would be uml->code round tripping: you parse in an existing
file containing method definitions, then update the method definitions from
a UML model and write the file out again: ArchGenXML does exactly that, but
last I looked the methods were written out in whatever order the dictionary
stored them which really messes up version control. It really should
preserve the order that methods appeared in the file.
Modification order would be great for a cache. Touch items whenever they
are referenced and then whenever the cache gets too big just pop the oldest
until you get back to the desired size.
Alphabetical or other sorted order when you want to get smallest/largest or
next smaller/larger then a specific item. I don't agree you necessarily
mean ordered by key rather than value, I think you could specify any
sortkey and expect updating the value to restore the ordering. Maybe if you
are recording some stats and want to regularly output the top 10?
Mark referred to C++ and the sort function, so I assume he meant
alphabetical order (or the equivalent for types other than strings).
--
If I have been able to see further, it was only because I stood
on the shoulders of giants. -- Isaac Newton
Roel Schroeven
Or subclass dict to carry along a sorted list of keys with it and return
that when dict.keys() is called. Even better, only have .keys() sort
the keys list when a key has been added to it since the last call.
--
-Bill Hamilton
These sort of disagreements about what such a beast should "obviously"
do are a good part of the reason why one doesn't exist in the standard
library. A sorted dict is so trivial that it's not really seen as
necessary, since all you have to do is sort the keys(). An ordered
dict is generally better satisfied with a list, as you've seen.
There's implementations of all these for people who want something
wrapped up in the cookbook.
>But what happens when I need to do this a *lot* and when the number of
>items is hundreds or a few thousands?
keys = sorted(d.keys())
for k in key:
...
You've got two choices: either you insert more than you iterate, in
which case sorting at the point of iteration is the most efficient
option, or you iterate more than you insert, in which case sorting
after you insert is easy. There's no way that an ordered dict
implementation could satisfy both constraints. Can you honestly think
of a use case where the performance of your sorting is a bottleneck
*and* it's a suitably general case that a standard lib class should
optimize for that specific case? Even in this short thread we've seen
3 different ideas of what the dict should do.
That would make insertion and deletion would be quite expensive.
Which might be ok for the OP, but it's often better to go with a
btree.
Which, you know, would be a reasonable candidate for collections
module.
Carl Banks
I once had a GUI that displayed the contents of a dict as a tree. The
user could add entries, and the entries would display at the end.
Hence, the ordered dict.
I could have kept the order separately, but what was the point? I
just whipped up a homemade ordered dict class, it worked fine, and I
didn't have to worry about keeping them synchronized.
Carl Banks
A mapping with sorted keys could theoretically be convenient when
your keys are comparable but not hashable.
--
Neil Cerutti
Actually, in my own ordereddict I wrap a dict and keep the keys in
order using the bisect module.
This works fine, but I imagine that a C-based implementation based on
B*trees or skiplists would be a lot faster.
In response to some of the following emails, I note one responder says
such a class is trivial to implement. Yes, and so presumably is
defaultdict, but the latter is in the standard library. One of the
"problems" of Python is that the range of skills of its users varies
from casual "amateur" programmers to guru professionals and everywhere
inbetween. Sure at somewhere on that continuum implementing *anything*
is "trivial", but for some users it is worthwhile to have it out of
the box.
What I like about ordered dictionaries is the ability to have sorted
data without sorting. It lets me have multiple sort orders. For
example, in some programs I give the user the choice of how they want
their data ordered and can provide such orderings without sorting,
simply by maintaining a set of parallel data structures with keys
being ordered and values being pointers (object references in Python)
to the relevant values. This costs in memory (but not much since no
values are duplicated and the values are usually large the keys
usually small).
Psuedocode example:
class Person:
def __init__(self, name, payrollno, grade, dept):
self.name = name
# etc
For each person created:
personFromName["%s\t%012d" % (person.name.lower(),
person.payrollno)] = person
personFromPayrollNo[person.payrollno] = person
personFromGrade["%s\t%012d" % (person.grade, person.payrollno)] =
person
personFromDept["%s\t%012d" % (person.dept, person.payrollno)] =
person
So now I have four orderings with no sorting. (I have to disambiguate
to avoid duplicates and to provide subordering.)
If I have 10,000 people I would potentially have to resort 10,000
instances whenever the user chose a new sort order: this way I just
have to iterate over the right ordered dictionary.
So I do feel that there are good use cases, and I think that for some
Python users having this in the library would be a genuine benefit.
Most of the use cases I envisage involve lots of insertions at start
up, relatively few during runtime, and lots of iterations over the
data as the user changes their view of it.
Of course Python does have an ordered dictionary, it is just not
necessarily always installed. You can use the bsdbd3 module's btopen()
method and pass None as filename to get an in-memory Btree, but I
believe (not sure) that there may be length limits on the keys.
Opinions vary, but I think of it this way:
- A sorted dict keeps its keys sorted. This has an obvious alternative
just sort the keys later (though I can imagine cases where it would be
nicer not to).
- An ordered dict remembers the order of insertion. This is useful when
you want to retain the order of insertion yet you also want fast lookup
of individual items. Unlike a sorted dict, there is no trivial
alternative (though you can argue about "trivial") and I really wish
there was. There are several free versions available.
-- Russell
Why it should be a dict. With it you can only maintain the order
x1<x2<x3... But by definition the order only requires x1<=x2<=x3...
There are many use cases where keys are not unique. I'd prefer a list
implementation, which would maintain the user defined order (cmp, key).
So you could map your objects with any order.
TV
Yes. It should use a functional data structure.
Could you elaborate?
Here's my impression of the postings so far: there are 3 categories of
response:
(A) There is no need for such a thing.
(B) It is useful, but so easy to implement that it needn't be in the
library.
(C) It is worth having an ordered data structure of some kind.
Regarding (A) and (B): I remember Python before it had the sets
module. "There's no need for sets, you can do them with dicts".
Thankfully sets are in the language and I for one use them
extensively.
For (C) what I had in mind was:
An API that is identical to a dict, with the few additional methods/
behaviours listed below:
- If an item is inserted it is put in the right place (because the
underlying data structure, b*tree, skiplist or whatever is
intrinsically ordered by < on the key)
- Extra methods
key_at(index : int) -> key
item_at(index : int) -> (key, value)
value_at(index : int) -> value
set_value_at(index : int, value) # no return value necessary but
maybe key if convenient
od[x:y:z] -> list of values ordered by key # can read a slice but
not assign a slice
and maybe
keys_for(fromindex : int, uptobutexcludingindex : int) -> ordered
list of keys
I've given examples of the use cases I envisage (and that I use in
both Python with my own ordered dict and in C++ with the map class) in
a previous posting, and I've shown the API I envisage above. I know
that some people might prefer ordering by value or the option of case-
insensitive ordering, but both these can be achieved using the API
above, e.g.,
od[value.lower()] = value # case-insensitive ordering of list of
values
And as for duplicates there are two approaches I use: (1) make each
string unique as I showed in my previous post, or (2) use a value that
itself is a list, dict or set.
(As for those who have suggested an ordered data structure that isn't
a dict, I don't see a conflict, I'd be happy to see more data
structures in the collections module.)
Is what I've suggested sufficient (and sufficiently general) for the
standard library? If it is not, what should be done, or is there some
other approach and API that would better provide the functionality
envisaged?
If you want a dictionary that iterates over his keys and does so
in a sorted manner you probably want a tree.
One posible implemenation can be found here:
http://www.pardon-sleeuwaegen.be/antoon/avltree.html
I hope it is usefull to you.
--
Antoon Pardon
Is the index the insertion order? The only use I have an ordered
dictionary is to quickly determine that a sequence value is the
first duplicate found (the structure can have millions of numbers)
and then play back (so they don't have to be re-calculated) the
sequence from the first duplicate to the end.
So I would get the index when I find the duplicate and then
iterate data[index:] to get the sequence in insertion order?
+1 for all your suggestions below, but -1 for the above. You suggest that
myOrderedDict['key'] = value
would place it in the dictionary in sorted order depending on 'key'.
More natural to some of us (or maybe its just me) would be to append the
key/value pair to the end of items.
My Argument: The problem with inserting at sorted position is the
assumption that the dict is already sorted, but maybe I want the key
ordering to reflect something arbitrary, like a database, etc (e.g.
'Last', 'First', 'MI'). So now I assign to an OrderedDict that already
has these latter 3 keys:
myOrderedDict['Favorite Color'] = 'chartruse'
Where does it go? Answer: it probably depends on how the underlying
sorter works and (even worse) the internal state of the sorter's data
structure. So, my intuition is that SortedDict behavior should be kept
distinct from OrderedDict behavior.
> - Extra methods
> key_at(index : int) -> key
> item_at(index : int) -> (key, value)
> value_at(index : int) -> value
> set_value_at(index : int, value) # no return value necessary but
> maybe key if convenient
> od[x:y:z] -> list of values ordered by key # can read a slice but
> not assign a slice
I do not see why assigning a slice is a bad idea here. The behavior
could be such:
od = OrderedDict((('Last':'Smith'), ('First':'John'), ('MI':'J'))
od[:1] = ['Brown']
od[:2] = ['Glotz', 'Joe']
od[:2] = ['Williams', 'Mary', 'Francis'] # <== ValueError
The latter would be a phenomenon similar to unpacking a tuple of the
wrong length.
> and maybe
> keys_for(fromindex : int, uptobutexcludingindex : int) -> ordered
> list of keys
Of course the naming is redundant for these in my opinion. E.g.:
key_at -> key
item_at -> item
value_at -> value
set_value_at -> set_value (not simply set)
keys_for -> keys (takes optional argument(s))
James
I mean use a data structure like an AVL tree, that can make a new dict
object quickly, whose contents are the same as the old object except
for one element (i.e. most of the contents are shared with the old
dict). You should also have ordered versions of both lists and sets.
Consider the situation with lists:
a = some 1000 element list
b = a + [23]
Now a has 1000 elements and b has 1001 elements (23 followed by a's
elements), but constructing b required copying all of a's elements.
It's sort of the same with sets:
a = 1000 element set
b = a + set((23,))
b had to copy the contents of a. By using a tree structure, you can
construct b in O(log(1000)) operations so that most of the elements
are shared with a, while at the same time you don't mutate a. That
makes it a lot easier to program in a style where you have a bunch of
slightly different versions of some set or dict flying around. For
an example of where this came up, see this thread:
For sample implemetations and API's, you could look at the Hedgehog
Lisp version of AVL trees (including a Python dict-like interface):
and Haskell's Data.Map library:
http://www.haskell.org/ghc/docs/latest/html/libraries/base/Data-Map.html
I think there is something similar in the ML Basis library but I'm not
familiar with that.
For more about functional data structures, see some of Chris Okasaki's
articles:
http://www.eecs.usma.edu/webs/people/okasaki/pubs.html
Dr. Okasaki also has written a book about the topic, called "Purely
Functional Data Structures", which is very good.
That said, in Python maybe this stuff would be easier to use
improperly because Python coding style uses mutation so much. It
might be best to limit sharing to the case of frozen dicts and frozen
sets.
Actually, 'set' seems better than set_value.
>> d = {'a': 1, 'b': 2, 'c': 3}
>> for key in d:
>> # do something that relies on order of keys as specified in the constructor
It's a bit tireing having to type
>> for key in sorted(d.keys()):
>> do_somethig_with(d[key])
instead of a trustfully
>> for value in d.values():
>> do_somethig_with(value)
As far as I can see much cleaner. No black magic needed ++ sort the
dict
to another order and rely on the sort order being stable would be a
really
a nice thing to have.
My 2 cents, Jürgen
Or, maybe, like, you know, you could have two different types that
maintain two different orders?
Carl Banks
Do you mean, like, a SortedDict and an OrderedDict like I mentioned in
the post you are replying to?
What I envisage is:
d = ordereddict(a=1, x=20, b=35, m=4)
# some time later
d["e"] = 15
# later still
d["b"] = 70
d.keys() # returns ['a', 'b', 'e', 'm', 'x']
d.values() # returns [1, 70, 15, 4, 20]
So no matter _when_ you add because the underlying data structure is
ordered (by key) you can always get access in key order. Also, you
can't "sort" the ordereddict; it is in key order and that's it. The
whole point is that you can use these things and never have to sort;
if you need different orders you create extra ordereddicts with
suitably munged keys and with values that are object references to the
objects you are interested in.
In reply to James Stroud:
- The reason you can't assign a slice is that the index positions
depend on the keys, so if you do:
od[newkey] = newvalue
where newkey goes (i.e., its index position) depends on how it orders
in relation to the rest, so it could go "anywhere".
I now feel that offering a slice is wrong because the [] syntax is
used for access by key, whereas a slice would be for access by index,
so using [] for both would be v. confusing.
- James proposed better method names which I've adopted (see later),
but I don't think set_value() should be set() because that would make
it seem to be the partner of get(), whereas get() uses a key and
set_value() uses an index for lookup.
- James also suggested a name and behaviour change:
keys(fromindex : int = None, uptobutexcludingindex : int = None) -
> ordered list of keys
So keys() called on its own behaves just like dict.keys() (except that
all the keys are returned in order), but with one argument returns the
keys with index positions from the given index to the end, and with
two arguments returns the keys in the given range of indexes. (Note:
in an ordereddict values() returns all the values in key order.)
So from the earlier example:
d.key(2) # returns 'e'
d.item(3) # returns ('m', 4)
d.value(4) # returns 20
d.set_value(3, 99) # maybe returns 'm'; but anyway d.value(3) and
d['m'] now both return 99
- James (and some others) also felt that insertions should just go as
key/value pairs at the end. But that is not what I'm proposing (that's
a completely different data structure).
The whole point of ordereddict is that whatever you insert and
whenever you insert it, the ordereddict is always in key order.
Paul Rubin and at least one other person mentioned the use of an AVL
tree. At the moment I just want to see if I can get enough consensus
on an API and behaviour so that I could put in a PEP for just one
ordered data structure.
So to clarify, here's the entire API I'm proposing for ordereddict. In
all cases the ordereddict is always in (and kept in) key order, and
any method that returns a list or iterator always returns that list or
iterator (whether of keys or values) in key order:
ordereddict(dictionary=None)
The argument can be a dict or another ordereddict; all the dict()
initializer
approaches should also be supported.
ordereddict.update(dictonary=None, **kwargs)
Same as dict.update()---except that key order is preserved (a
point I won't
repeat in the others when I say "same as dict", but which is true
throughout)
@classmethod
ordereddict.fromkeys(cls, iterable, value=None) # Same as dict
ordereddict.key(index : int) -> key
Returns the index-th item's key
ordereddict.item(index : int) -> (key, value)
Returns the index-th item
ordereddict.value(index : int) -> value
Returns the index-th item's value
ordereddict.set_value(index : int, value)
Sets the index-th item's value to value; raises IndexError if
index is out of
range. If not expensive, maybe return the key.
ordereddict.copy() # Same as dict.
ordereddict.clear() # Same as dict.
ordereddict.get(key, value=None) # Same as dict
ordereddict.setdefault(key, value) # Same as dict
ordereddict.pop(key, value) # Same as dict
ordereddict.popitem() # Same as dict
ordereddict.keys(fromindex : int = None, uptoindex : int : None) ->
list of keys
Returns an ordered list of keys, or a slice of keys if one or two
indexes is given
ordereddict.values() # Same as dict
ordereddict.items() # Same as dict
ordereddict.iterkeys() # Same as dict
ordereddict.itervalues() # Same as dict
ordereddict.iteritems() # Same as dict
ordereddict.has_key() # Same as dict
Also the same as dict (and as always, working in key order):
for key in d: pass
if key in d: pass
len(d)
del d[key]
d[key]
d[key] = value
What this means is that users could drop in an ordereddict in place of
a plain dict and get the same behaviour (maybe not the same
performance!), but would now find that they could rely on the ordering
of keys, as well as having just four additional methods and one
existing method with additional optional arguments.
ordereddict.delete(index : int)
Also, the API for keys() should be
ordereddict.keys(firstindex : int = None, secondindex : int = None)
If called with no args, returns list of all keys (in key order of
course); if one arg is given, returns keys with indexes in range(0,
firstindex); if two args are given, returns keys with indexes in
range(firstindex, secondindex).
Below is a stripped-down implementation in pure python that is just 84
lines long.
(I have a proper version with blank lines and doctests which is 482
lines but I thought that was a bit long to post.)
It should work as a drop-in replacement for dict (apart from
performance), but with the keys ordered by <, so that every list or
iterator that it returns is always in key order.
The get(), has_key(), __contains__(), len(), and __getitem__() methods
are not reimplemented because the base class versions work fine.
I'm only posting it to give a better feel for the API---if someone did
a better (faster) implementation (e.g., in C), that'd be great, but
best to get consensus on an API first I think (if consensus is
possible at all!).
import bisect
class ordereddict(dict):
def __init__(self, *args, **kwargs):
dict.__init__(self, *args, **kwargs)
self.__keys = sorted(dict.keys(self))
def update(self, *args, **kwargs):
dict.update(self, *args, **kwargs)
self.__keys = sorted(dict.keys(self))
@classmethod
def fromkeys(cls, iterable, value=None):
dictionary = cls()
for key in iterable:
dictionary[key] = value
return dictionary
def key(self, index):
return self.__keys[index]
def item(self, index):
key = self.__keys[index]
return key, self[key]
def value(self, index):
return self[self.__keys[index]]
def set_value(self, index, value):
self[self.__keys[index]] = value
def delete(self, index):
key = self.__keys[index]
del self.__keys[index]
dict.__delitem__(self, key)
def copy(self):
dictionary = ordereddict(dict.copy(self))
dictionary.__keys = self.__keys[:]
return dictionary
def clear(self):
self.__keys = []
dict.clear(self)
def setdefault(self, key, value):
if key not in self:
bisect.insort_left(self.__keys, key)
return dict.setdefault(self, key, value)
def pop(self, key, value=None):
if key not in self:
return value
i = bisect.bisect_left(self.__keys, key)
del self.__keys[i]
return dict.pop(self, key, value)
def popitem(self):
item = dict.popitem(self)
i = bisect.bisect_left(self.__keys, item[0])
del self.__keys[i]
return item
def keys(self, firstindex=None, secondindex=None):
if firstindex is not None and secondindex is None:
secondindex = firstindex
firstindex = 0
else:
if firstindex is None:
firstindex = 0
if secondindex is None:
secondindex = len(self)
return self.__keys[firstindex:secondindex]
def values(self):
return [self[key] for key in self.__keys]
def items(self):
return [(key, self[key]) for key in self.__keys]
def __iter__(self):
return iter(self.__keys)
def iterkeys(self):
return iter(self.__keys)
def itervalues(self):
for key in self.__keys:
yield self[key]
def iteritems(self):
for key in self.__keys:
yield key, self[key]
def __delitem__(self, key):
i = bisect.bisect_left(self.__keys, key)
del self.__keys[i]
dict.__delitem__(self, key)
def __setitem__(self, key, value):
if key not in self:
bisect.insort_left(self.__keys, key)
dict.__setitem__(self, key, value)
def __repr__(self):
return "ordereddict({%s})" % ", ".join(
["%r: %r" % (key, self[key]) for key in self.__keys])
which seems to be exactly what my avltree module mentioned earlier
provides.
>>> from avltree import Tree
>>> d=Tree(a=1, x=20, b=35, m=4)
>>> d["e"] = 15
>>> d["b"] = 70
>>> d.keys()
['a', 'b', 'e', 'm', 'x']
>>> d.values()
[1, 70, 15, 4, 20]
--
Antoon Pardon
Antoon,
Your AVL tree looks impressive (although it has five times the lines
of my ordereddict), but it does not appear to support dict.update()'s
API (which was extended in 2.4), so you can't do: avl.update({'g':
20}, a=9, b=22).
Also, it does not provide the key(), value(), and item() methods that
the API I proposed can support (because in an ordereddict, index
positions make sense).
If there was consensus on an API and you, me, and others had different
implementations, we could always come up with some tests to compare
their relative performance, and put the fastest one(s) forward in a
PEP. (I don't care if my own code is used or not, it is the
functionality in Python's standard library that I want, whoever's code
it is.)
It is true that I didn't follow up the API difference made to dict
since I wrote the module, but I think these are rather minor.
I don't think it would take a lot of work to correct these.
> Also, it does not provide the key(), value(), and item() methods that
> the API I proposed can support (because in an ordereddict, index
> positions make sense).
At the time I wrote my module I never had a need for these. Do you have
a use case or is it a consideration of completeness that makes you want
these? Maybe I can take a look in how to implement this, but at this
moment it doesn't sound that usefull to me.
On the other hand your API doesn't seem to allow for iterating over only
a part of the keys. Supposing all keys are strings, I can easily iterate
over all keys that start with an 'n' or with any arbitrary prefix.
That IMO seems more usefull.
> If there was consensus on an API and you, me, and others had different
> implementations, we could always come up with some tests to compare
> their relative performance, and put the fastest one(s) forward in a
> PEP. (I don't care if my own code is used or not, it is the
> functionality in Python's standard library that I want, whoever's code
> it is.)
Good luck on finding that consensus. If you really want this I fear you
will have to start writing that PEP before a consensus is reached and
hope to find a proposal that will be acceptable to the majority and
especially the BDFL.
--
Antoon Pardon
I'm sure you're right.
> > Also, it does not provide the key(), value(), and item() methods that
> > the API I proposed can support (because in an ordereddict, index
> > positions make sense).
>
> At the time I wrote my module I never had a need for these. Do you have
> a use case or is it a consideration of completeness that makes you want
> these? Maybe I can take a look in how to implement this, but at this
> moment it doesn't sound that usefull to me.
I put them in for completeness, although in some contexts I have found
the ability to ask for the n-th item to be v. useful.
> On the other hand your API doesn't seem to allow for iterating over only
> a part of the keys. Supposing all keys are strings, I can easily iterate
> over all keys that start with an 'n' or with any arbitrary prefix.
> That IMO seems more usefull.
That is an appealing feature---but I don't want to make any assumption
about keys (they could be ints, id()s, strs, or anything that is
acceptable to a dict.
There's nothing to stop you creating a PEP for your AVL tree---I'd
certainly be glad for one to be in the collections module. I'm not
advocating "only one" ordered data structure, but rather one
particular one---and I certainly hope the collections module will have
several eventually, and that other people will propose PEPs for other
data structures, such as AVL trees, B*Trees, skiplists, etc., since
all have something to offer.
> > If there was consensus on an API and you, me, and others had different
> > implementations, we could always come up with some tests to compare
> > their relative performance, and put the fastest one(s) forward in a
> > PEP. (I don't care if my own code is used or not, it is the
> > functionality in Python's standard library that I want, whoever's code
> > it is.)
>
> Good luck on finding that consensus. If you really want this I fear you
> will have to start writing that PEP before a consensus is reached and
> hope to find a proposal that will be acceptable to the majority and
> especially the BDFL.
I don't expect my API to satisfy everyone, but by making it as close
to what exists already, i.e., a dict, yet with keys that "happen" to
be ordered (and offering a few extra methods to help users exploit
that if they want), I am hoping this will make it more likely to be
acceptable.
If you don't have a use case, I advise you to drop them. As far as I
understand people's sentiments, including a feature without a use case
illustrating its usefullness will decrease your chances.
>> On the other hand your API doesn't seem to allow for iterating over only
>> a part of the keys. Supposing all keys are strings, I can easily iterate
>> over all keys that start with an 'n' or with any arbitrary prefix.
>> That IMO seems more usefull.
>
> That is an appealing feature---but I don't want to make any assumption
> about keys (they could be ints, id()s, strs, or anything that is
> acceptable to a dict.
The feature doesn't depend on any assumption. if your keys are integers
you can iterate over all keys between 121 and 8264. Iterating over all
keys that start with an 'n' just depends on the fact that all such
strings lie between the strings 'n' and 'o'.
However not all keys acceptable to a dict, will be acceptable to
a SortedDict. Some types are hashable but not completly comparable.
Those objects will not be usable as a key in a SortedDict although
they can be used as a key in an normal dict.
> There's nothing to stop you creating a PEP for your AVL tree---I'd
> certainly be glad for one to be in the collections module. I'm not
> advocating "only one" ordered data structure, but rather one
> particular one---and I certainly hope the collections module will have
> several eventually, and that other people will propose PEPs for other
> data structures, such as AVL trees, B*Trees, skiplists, etc., since
> all have something to offer.
I'm not interrested in writing a PEP. My impression from asking around
is that is too much work for too little chance to get it accepted.
That is more a personal evaluation of how I value my time and how much I
would prefer it to have my module included than about the PEP process
in itself.
If someone would like to use my avl module as a starting point for a PEP,
I may consider allowing that, but writing the PEP myself is going to
take too much time from other projects.
>> > If there was consensus on an API and you, me, and others had different
>> > implementations, we could always come up with some tests to compare
>> > their relative performance, and put the fastest one(s) forward in a
>> > PEP. (I don't care if my own code is used or not, it is the
>> > functionality in Python's standard library that I want, whoever's code
>> > it is.)
>>
>> Good luck on finding that consensus. If you really want this I fear you
>> will have to start writing that PEP before a consensus is reached and
>> hope to find a proposal that will be acceptable to the majority and
>> especially the BDFL.
>
> I don't expect my API to satisfy everyone, but by making it as close
> to what exists already, i.e., a dict, yet with keys that "happen" to
> be ordered (and offering a few extra methods to help users exploit
> that if they want), I am hoping this will make it more likely to be
> acceptable.
I wish you all the luck you can get. Maybe if you succeed I'll change
my mind about writing a PEP myself.
However I think your chances will increase if you write your module
and have it available in the cheese shop. If people start using your
module regularly, your chances of it being included in the standard
library will increase greatly.
--
Antoon Pardon
--
Antoon Pardon
I've taken your advice and put it on PyPI. PyPI isn't as easy to use
as CPAN, and the classifiers don't include "algorithms" or "data
structures" which I find surprising.
May I also make one more suggestion, to call it a "sort_ordered_dict"
(or "sortordereddict", or even better a "sorteddict"--where the "ed"
comes from "ordered")? Its hard for me to move past the established
definition of "order", as we think of tuples being ordered--as in the
first sentence of http://en.wikipedia.org/wiki/Tuple--to something that
is preserving an order according to a comparison. The distinction is so
firmly ingrained in my head that it took me a while to wake up to the
fact that you were describing something completely different than an
ordered dictionary (e.g. http://www.voidspace.org.uk/python/odict.html)
even though you were being very unambiguous with your description.
And I also think the ability to drop it in for a built-in dict is very
valuable.
James
It seems that the correct Python terminology for this is indeed
sorteddict (ordered by key), with ordereddict meaning (in insertion
order).
I guess I'll have to rename my module (although unfortunately, my book
has just gone into production and I have a (simpler) example of what I
considered to be an ordered dict, so I will be adding to the
terminology confusion). That notwithstanding, given that it is a
sorteddict, how is the API?
I must think the API good because I have been implementing, in parallel
with this discussion, my own "OrderedDict" with a very similar API (this
is part of a larger project where I recently found the need to have a
well-implemented ordered dict). The only real omission I see is to allow
instantiating a "sorted dict" with an optional cmp function--to allow
the generalization of such features as case-independence, etc.
James
I tend to create different orderings by munging the keys rather than
by having optional cmp functions (as you'll see in the sorteddict.py
docstring). I've now put sorteddict on PyPI (with docs & tests):
http://pypi.python.org/pypi/sorteddict
I'm going offline for a week, so I'll see if there's any consensus or
progress when I'm back online, and then decide whether to do a PEP or
not.
Like, you know, maybe I should read the whole article I reply to.
*sigh*
Carl Banks
The advantage of a passed cmp-function is that the context of usage
hasn't to be aware of any change in semantics. Yet for better
performance, the cmp-function might be a tad to slow, better use a
key-function like for sorted/sort.
Additionally, I'd change the parameter names of the keys()-method to
start/end and make more clear if end is inclusive or not. The
first/secondindex-names are pretty horrible IMHO, because the
essentially enumerate parameters. Who wants to work with a method
foo(arg1, arg2, arg3)
? :)
Diez