[Python-ideas] OrderedCounter and OrderedDefaultDict

325 views
Skip to first unread message

Ian Foote

unread,
Oct 16, 2015, 6:37:49 PM10/16/15
to Python-Ideas
I recently wanted to use an OrderedCounter and an OrderedDefaultDict. After a bit of googling I discovered that OrderedCounter is quite easy to implement:

from collections import Counter, OrderedDict

class OrderedCounter(Counter, OrderedDict):
     'Counter that remembers the order elements are first seen'
     def __repr__(self):
         return '%s(%r)' % (self.__class__.__name__,
                            OrderedDict(self))
     def __reduce__(self):
         return self.__class__, (OrderedDict(self),)

from https://rhettinger.wordpress.com/2011/05/26/super-considered-super/

Unfortunately an OrderedDefaultDict did not seem so easy. I did find http://stackoverflow.com/questions/6190331/can-i-do-an-ordered-default-dict-in-python which suggests:

from collections import OrderedDict, Callable

class DefaultOrderedDict(OrderedDict):
    # Source: http://stackoverflow.com/a/6190500/562769
    def __init__(self, default_factory=None, *a, **kw):
        if (default_factory is not None and
           not isinstance(default_factory, Callable)):
            raise TypeError('first argument must be callable')
        OrderedDict.__init__(self, *a, **kw)
        self.default_factory = default_factory

    def __getitem__(self, key):
        try:
            return OrderedDict.__getitem__(self, key)
        except KeyError:
            return self.__missing__(key)

    def __missing__(self, key):
        if self.default_factory is None:
            raise KeyError(key)
        self[key] = value = self.default_factory()
        return value

    def __reduce__(self):
        if self.default_factory is None:
            args = tuple()
        else:
            args = self.default_factory,
        return type(self), args, None, None, self.items()

    def copy(self):
        return self.__copy__()

    def __copy__(self):
        return type(self)(self.default_factory, self)

    def __deepcopy__(self, memo):
        import copy
        return type(self)(self.default_factory,
                          copy.deepcopy(self.items()))

    def __repr__(self):
        return 'OrderedDefaultDict(%s, %s)' % (self.default_factory,
                                               OrderedDict.__repr__(self))

This seems to me both easy to get wrong and hard to discover. It also risks getting out of sync with updates to collections.
I'm wondering if this is sufficient justification to add OrderedCounter and OrderedDict to collections, either directly or as recipes.

Thanks,
Ian F

Andrew Barnert via Python-ideas

unread,
Oct 16, 2015, 9:39:54 PM10/16/15
to Ian Foote, Python-Ideas
On Oct 16, 2015, at 03:36, Ian Foote <i...@feete.org> wrote:

I recently wanted to use an OrderedCounter and an OrderedDefaultDict. After a bit of googling I discovered that OrderedCounter is quite easy to implement:

This is already in the docs for OrderedDict, so it shouldn't have taken any googling.
from collections import Counter, OrderedDict

class OrderedCounter(Counter, OrderedDict):
     'Counter that remembers the order elements are first seen'
     def __repr__(self):
         return '%s(%r)' % (self.__class__.__name__,
                            OrderedDict(self))
     def __reduce__(self):
         return self.__class__, (OrderedDict(self),)

from https://rhettinger.wordpress.com/2011/05/26/super-considered-super/

Unfortunately an OrderedDefaultDict did not seem so easy. I did find http://stackoverflow.com/questions/6190331/can-i-do-an-ordered-default-dict-in-python which suggests:

Most of the trickiness here is handling None as a factory, which is only useful for subclasses that implement a custom __missing__. To make something that works like you'd expect except for that part is a lot simpler. And I'm not sure a docs recipe that's mostly code that's unnecessary for most uses, hiding the important and simple part, would be a great idea.
Since one of them is already a recipe and the other one would probably not be useful as one, I don't think that's a great idea.

Adding OrderedDefaultDict to the module itself might be useful. Or, alternatively, put it in PyPI or ActiveState and add a link to it from the docs? (I'm pretty sure there are already implementations in both places, actually. Also, it would be nice to be able to point at a third-party implementation that works in 2.6+/3.2+ or whatever even if it only appears in the 3.6 docs.)

Andrew Barnert via Python-ideas

unread,
Oct 16, 2015, 10:01:34 PM10/16/15
to Andrew Barnert, Python-Ideas
Sorry, sent too soon…

Sent from my iPhone

On Oct 16, 2015, at 18:39, Andrew Barnert via Python-ideas <python...@python.org> wrote:

On Oct 16, 2015, at 03:36, Ian Foote <i...@feete.org> wrote:

I recently wanted to use an OrderedCounter and an OrderedDefaultDict. After a bit of googling I discovered that OrderedCounter is quite easy to implement:

This is already in the docs for OrderedDict, so it shouldn't have taken any googling.
from collections import Counter, OrderedDict

class OrderedCounter(Counter, OrderedDict):
     'Counter that remembers the order elements are first seen'
     def __repr__(self):
         return '%s(%r)' % (self.__class__.__name__,
                            OrderedDict(self))
     def __reduce__(self):
         return self.__class__, (OrderedDict(self),)

from https://rhettinger.wordpress.com/2011/05/26/super-considered-super/

Unfortunately an OrderedDefaultDict did not seem so easy. I did find http://stackoverflow.com/questions/6190331/can-i-do-an-ordered-default-dict-in-python which suggests:

Most of the trickiness here is handling None as a factory, which is only useful for subclasses that implement a custom __missing__

… or that requires two-stage initialization, where it can initialize the base with None and then assign the factory later. Still not all that common, but it is a separate reason for the extra complexity besides a custom __missing__ that doesn't need the factory.

To make something that works like you'd expect except for that part is a lot simpler.

Off the top of my head (untested, since I'm on my phone):

class DefaultOrderedDict(OrderedDict):
    def __init__(self, default_factory, *a, **kw):
        super().__init__(*a, **kw)
        self.default_factory = default_factory
    def __getitem__(self, key):
        try:
            return super().__getitem__(key)
        except KeyError:
            return self.__missing__(key)
    def __missing__(self, key):
        self[key] = value = self.default_factory()
        return value
    def __repr__(self):
        return '{}({}, {})'.format(type(self).__name__, self.default_factory, dict.__repr__(self))

In fact, if you aren't worried about subclassing, or about being able to directly call or pass around a bound__missing__, you can make this even simpler.

I may be wrong about this being picklable/copyable without any extra help; if so, it still doesn't need the whole complicated "treat a None factory as if it doesn't exist" part.

Also, I think the SO recipe works in Python 2.4+, or whichever first added OrderedDict, while this requires 3.1 (although it should be obvious how to backport it, make sure that OrderedDict isn't an old-style class…).

Anyway, other than those issues, it seems pretty obvious, and I don't think there's any part of that functionality that's easy to get wrong. Handling a None factory in __missing__ and pickle/copy is easy to get wrong, but if you aren't going to subclass this and therefore aren't going to try to build that part, you're not going to build it wrong.

Andrew Barnert via Python-ideas

unread,
Oct 16, 2015, 10:12:10 PM10/16/15
to Andrew Barnert, Python-Ideas
Actually, forget all that; it's even simpler.

At least in recent 3.x, the only thing wrong with inheriting from both types, assuming you put OrderedDict first, is the __init__ signature. So:

    class OrderedDefaultDict(OrderedDict, defaultdict):
        def __init__(self, default_factory=None, *a, **kw):
            OrderedDict.__init__(self, *a, **kw)
            self.default_factory = default_factory

More importantly, because __missing__ support is built into dict, despite the confusing docs for defaultdict, you don't really need defaultdict at all here:

    class OrderedDefaultDict(OrderedDict):
        def __init__(self, default_factory=None, *a, **kw):
            OrderedDict.__init__(self, *a, **kw)
            self.default_factory = default_factory
        def __missing__(self, key):
            self[key] = value = default_factory()
            return value

And either of these should work with 2.5+ (according to https://docs.python.org/2/library/stdtypes.html#dict that's when dict.__missing__ was added).

Sent from my iPhone

Sven R. Kunze

unread,
Oct 20, 2015, 2:15:08 PM10/20/15
to python...@python.org
On 17.10.2015 04:08, Andrew Barnert via Python-ideas wrote:
Actually, forget all that; it's even simpler.

At least in recent 3.x, the only thing wrong with inheriting from both types, assuming you put OrderedDict first, is the __init__ signature. So:

    class OrderedDefaultDict(OrderedDict, defaultdict):
        def __init__(self, default_factory=None, *a, **kw):
            OrderedDict.__init__(self, *a, **kw)
            self.default_factory = default_factory

Not saying that inheritance is a bad thing but to me It seems to me that ordering and default values should be orthogonal aspects of the standard dict.

Just as Sandi described it here: https://www.youtube.com/watch?v=29MAL8pJImQ

Best,
Sven

Chris Angelico

unread,
Oct 20, 2015, 5:40:43 PM10/20/15
to python-ideas
On Wed, Oct 21, 2015 at 5:14 AM, Sven R. Kunze <srk...@mail.de> wrote:
> Not saying that inheritance is a bad thing but to me It seems to me that
> ordering and default values should be orthogonal aspects of the standard
> dict.
>
> Just as Sandi described it here: https://www.youtube.com/watch?v=29MAL8pJImQ

Yeah... that video is absolutely correct... for Ruby. In Python, you
can use multiple inheritance to do that in the exactly obvious way.

class EchoRandomHouse(EchoHouse, RandomHouse): pass

When I got to the bit in that video where she says that inheritance
paints you into a corner, I went and reimplemented her example code in
Python, and it's flawless. It doesn't even matter which order you put
the two superclasses, as there's no conflict.

ChrisA
_______________________________________________
Python-ideas mailing list
Python...@python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/

Carl Meyer

unread,
Oct 20, 2015, 6:35:11 PM10/20/15
to python...@python.org
On 10/20/2015 03:40 PM, Chris Angelico wrote:
> On Wed, Oct 21, 2015 at 5:14 AM, Sven R. Kunze <srk...@mail.de> wrote:
>> Not saying that inheritance is a bad thing but to me It seems to me that
>> ordering and default values should be orthogonal aspects of the standard
>> dict.
>>
>> Just as Sandi described it here: https://www.youtube.com/watch?v=29MAL8pJImQ
>
> Yeah... that video is absolutely correct... for Ruby. In Python, you
> can use multiple inheritance to do that in the exactly obvious way.
>
> class EchoRandomHouse(EchoHouse, RandomHouse): pass
>
> When I got to the bit in that video where she says that inheritance
> paints you into a corner, I went and reimplemented her example code in
> Python, and it's flawless. It doesn't even matter which order you put
> the two superclasses, as there's no conflict.

I think you missed her real point, which applies both to Python and
Ruby. In her presentation, it's obscured a bit by too much discussion of
"one side or the other" code duplication (which can be "solved" with
multiple inheritance). It takes her a couple minutes more to get to the
real point, which starts at the slide "inheritance is for
specialization, not for sharing code."

One symptom of the problem is that using multiple inheritance this way
doesn't scale: the number of leaf subclasses you need grows
geometrically with the number of behavior knobs. Composition with
strategy objects doesn't have this issue.

Carl

signature.asc

Chris Angelico

unread,
Oct 20, 2015, 6:50:53 PM10/20/15
to python-ideas
On Wed, Oct 21, 2015 at 8:56 AM, Carl Meyer <ca...@oddbird.net> wrote:
> I think you missed her real point, which applies both to Python and
> Ruby. In her presentation, it's obscured a bit by too much discussion of
> "one side or the other" code duplication (which can be "solved" with
> multiple inheritance). It takes her a couple minutes more to get to the
> real point, which starts at the slide "inheritance is for
> specialization, not for sharing code."

Sure, but a dictionary with a default handler _is_ a form of
specialization - as is a dictionary that preserves order. Both of them
behave absolutely identically to a regular dict when you set
something, retrieve it, iterate over them, etc, etc, etc. She
recommends a massive superclass that's capable of any form of
injection; using inheritance allows the dict type to be broadly
unaware of the modified dictionaries that exist.

Suppose I write a SortedDict that behaves exactly the way any other
dictionary does, but when you iterate over its keys, you get them in
order. I should be able to do this without tampering with the original
dict type. In the composition model, I would need to appeal for an
"IterationOrder" feature to be added to the base dict; using
inheritance, it's as simple as:

class SortedDict(dict):
def __iter__(self):
yield from sorted(self.keys())

To me, this is the biggest benefit of inheritance: you do NOT have to
predict what someone might want to do. I can subclass someone else's
object and change how it works.

> One symptom of the problem is that using multiple inheritance this way
> doesn't scale: the number of leaf subclasses you need grows
> geometrically with the number of behavior knobs. Composition with
> strategy objects doesn't have this issue.

Sure it does... but nobody needs to know about _all_ the leaf
subclasses. How many subclasses of object are there in Python?

>>> len(object.__subclasses__())
122

And that in a bare interactive Py3, without importing anything fancy.
('import decimal' raises that to 159, for instance.)

Composition has its place, don't get me wrong. But inheritance isn't
the attractive nuisance she makes it out to be. It works just fine.

Chris Barker - NOAA Federal

unread,
Oct 20, 2015, 7:20:25 PM10/20/15
to Chris Angelico, python-ideas
It seems to me that
>> ordering and default values should be orthogonal aspects of the standard
>> dict.

Exactly -- which makes it a perfect candidate for multiple
inheritance, I.e. "Mixins"

-CHB

Steven D'Aprano

unread,
Oct 20, 2015, 9:51:24 PM10/20/15
to python...@python.org
On Wed, Oct 21, 2015 at 09:50:21AM +1100, Chris Angelico wrote:

> To me, this is the biggest benefit of inheritance: you do NOT have to
> predict what someone might want to do. I can subclass someone else's
> object and change how it works.

I think this is wrong. I think that in general you need to have a fairly
good idea of the internal workings of the class before you can inherit
from it. Otherwise, how do you know what methods need to be overwritten?

Take dict. One nuisance with inheriting from dict is that it isn't
sufficient to override __setitem__, you also have to override update and
clear as well. And possibly others -- the exact set of which methods
depend on which other methods are not documented in the dict API. Given
an arbitrary class, how can you possibly tell which methods you need to
override, or even which methods are *safe* to override?


[...]
> Composition has its place, don't get me wrong. But inheritance isn't
> the attractive nuisance she makes it out to be. It works just fine.

It doesn't work "just fine", it has risks and annoyances, and multiple
inheritance even more so. Classes have problems.

See Jack Diederich's talk "Stop Writing Classes", or at least stop
writing *stupid* classes:

http://eev.ee/blog/2013/03/03/the-controller-pattern-is-awful-and-other-oo-heresy/

As far as multiple inheritance, there are real issues with MI that
aren't solved by coming up with a nifty solution to the diamond problem.
See Michele Simionato's series of essays on super, multiple inheritance,
mixins and traits:

http://www.artima.com/weblogs/viewpost.jsp?thread=246488


Clearly classes are useful, but they aren't an unalloyed good thing, and
inheriting from them has its issues.


--
Steve

Random832

unread,
Oct 20, 2015, 10:59:23 PM10/20/15
to python...@python.org
Chris Angelico <ros...@gmail.com> writes:

> On Wed, Oct 21, 2015 at 8:56 AM, Carl Meyer <ca...@oddbird.net> wrote:
> Sure, but a dictionary with a default handler _is_ a form of
> specialization - as is a dictionary that preserves order. Both of them
> behave absolutely identically to a regular dict when you set
> something, retrieve it, iterate over them, etc, etc, etc. She
> recommends a massive superclass that's capable of any form of
> injection; using inheritance allows the dict type to be broadly
> unaware of the modified dictionaries that exist.

Even considering that OrderedDict (or a _proper_ SortedDict - other
languages' equivalent class doesn't require hashable keys because it
stores items as a sorted list) requires a complete overhaul of how the
dictionary stores items?

Really, from an implementation standpoint, it seems like a bad idea to
even inherit in this case.

Guido van Rossum

unread,
Oct 20, 2015, 11:37:26 PM10/20/15
to Steven D'Aprano, Python-Ideas
On Tue, Oct 20, 2015 at 6:50 PM, Steven D'Aprano <st...@pearwood.info> wrote:
On Wed, Oct 21, 2015 at 09:50:21AM +1100, Chris Angelico wrote:

> To me, this is the biggest benefit of inheritance: you do NOT have to
> predict what someone might want to do. I can subclass someone else's
> object and change how it works.

I think this is wrong. I think that in general you need to have a fairly
good idea of the internal workings of the class before you can inherit
from it. Otherwise, how do you know what methods need to be overwritten?
 
Right. Chris's thinking recalls the reason why inheritance at some became popular (I guess it was in the '90s). Steven explains (in the part that I've cut) why many experts have soured on it quite a bit.

Personally, I happen to think that inheritance is often useful when a number of classes are designed together (typically all at once and belonging to the same library) and also occasionally when a base class is explicitly and carefully designed to be inherited (there are beautiful things you can do with the template pattern, for example). But inheriting from an implementation that wasn't designed with your use case in mind is often asking for trouble -- if not now, then in the future when a refactored implementation is released.

You might interject, that's the fault of the implementation refactoring -- they didn't properly think about interface compatibility. But while it's usually easy enough to keep an interface compatible where it's just the user calling methods on the implementation, the "interface" presented by subclassing is much more complex -- you would have to specify exactly which method's implementation calls which other method, and you'd also have to ensure that the object is in a sane state when it calls that other method, because it *could* be the case that the latter is overridden by a subclass. It's terribly fragile, and better avoided.

--
--Guido van Rossum (python.org/~guido)

Brendan Barnwell

unread,
Oct 21, 2015, 1:55:10 AM10/21/15
to python...@python.org
On 2015-10-20 18:50, Steven D'Aprano wrote:
> Take dict. One nuisance with inheriting from dict is that it isn't
> sufficient to override __setitem__, you also have to override update and
> clear as well. And possibly others -- the exact set of which methods
> depend on which other methods are not documented in the dict API. Given
> an arbitrary class, how can you possibly tell which methods you need to
> override, or even which methods are*safe* to override?

I've always considered that (and related problems) to be one of
Python's warts. And, as you say, it's a documentation flaw.

It's true that it's easy to write classes with poorly documented APIs,
which makes them hard to extend because you don't know how they work.
But I don't think that means subclassing is not a good idea. It means
writing classes with clearly-specified APIs is a good idea.

--
Brendan Barnwell
"Do not follow where the path may lead. Go, instead, where there is no
path, and leave a trail."
--author unknown

Sven R. Kunze

unread,
Oct 21, 2015, 1:24:11 PM10/21/15
to python...@python.org
On 21.10.2015 05:36, Guido van Rossum wrote:
> Right. Chris's thinking recalls the reason why inheritance at some
> became popular (I guess it was in the '90s). Steven explains (in the
> part that I've cut) why many experts have soured on it quite a bit.
>

Yep. It's because experience shows that the usefulness is limited when
it comes to orthogonal aspects.

Furthermore, I can remember changing an algorithm's design **because**
of the lack of such built-in datastructure. That's always a bad sign.

> Personally, I happen to think that inheritance is often useful when a
> number of classes are designed together (typically all at once and
> belonging to the same library) and also occasionally when a base class
> is explicitly and carefully designed to be inherited (there are
> beautiful things you can do with the template pattern, for example).
> But inheriting from an implementation that wasn't designed with your
> use case in mind is often asking for trouble -- if not now, then in
> the future when a refactored implementation is released.

That's a pretty good observation.

> You might interject, that's the fault of the implementation
> refactoring -- they didn't properly think about interface
> compatibility. But while it's usually easy enough to keep an interface
> compatible where it's just the user calling methods on the
> implementation, the "interface" presented by subclassing is much more
> complex -- you would have to specify exactly which method's
> implementation calls which other method, and you'd also have to ensure
> that the object is in a sane state when it calls that other method,
> because it *could* be the case that the latter is overridden by a
> subclass. It's terribly fragile, and better avoided.

Maybe, that's one reason why people hesitate to write their own
OrderDefaultDict or DefaultOrderedDict. It's just not their responsibility.

Best,
Sven

Sven R. Kunze

unread,
Oct 21, 2015, 1:52:51 PM10/21/15
to python...@python.org
On 21.10.2015 00:50, Chris Angelico wrote:
She recommends a massive superclass that's capable of any form of injection

Nope. She does not.

The "superclass" is not "massive" at all. It is even slimmer as orthogonal aspects are refactored out into separate entities. In fact, it makes it even easier to test and maintain these separate aspects (the core dev should be interested in that). Furthermore, it's, of course, up to debate which aspects should be injectable and which are not.


Just imagine, I would need to use several orderings and/or a default value for missing keys:

normal_dict = dict()
ordered_dict = dict(order=dict.order_by_insert)
sorted_dict = dict(order=sorted)
sorted_default_dict = dict(order=sorted, default=int)


How many subclasses am I supposed to write, maintain and upgrade (in case Guido rewrites his precious dict implementation)? I would even allege that for sufficiently large teams and projects, there are multiple implementations with the same intent.


Please, no.


Best,
Sven


PS: the instances above are real-world examples, I remember requiring during the course of the last year.

Jonathan Slenders

unread,
Oct 21, 2015, 2:26:35 PM10/21/15
to Sven R. Kunze, python-ideas
Just want to say that I'm happy to see that lately the disadvantages of inheritance (which are already known for a very long time) are getting more attention.
It's not bad by definition, but there's so much truth in Sandy her talk and I think for many Python projects, we went way too far into "abusing" inheritance.
Actually, it's a bit unfortunate that we made inheritance so user friendly and powerful in Python that for many people it became the logical way to extend or reuse some code.

Jonathan







Andrew Barnert via Python-ideas

unread,
Oct 21, 2015, 3:31:31 PM10/21/15
to Jonathan Slenders, python-ideas
On Oct 21, 2015, at 11:25, Jonathan Slenders <jona...@slenders.be> wrote:
>
> Just want to say that I'm happy to see that lately the disadvantages of inheritance (which are already known for a very long time) are getting more attention.
> It's not bad by definition, but there's so much truth in Sandy her talk and I think for many Python projects, we went way too far into "abusing" inheritance.
> Actually, it's a bit unfortunate that we made inheritance so user friendly and powerful in Python that for many people it became the logical way to extend or reuse some code.

One of the most important things people have learned about OO over the past two decades is that subtyping, implementation extension, and implementation mixins, and interface-extending mixins (think collections.abc.Sequence adding count for you) are all different things. Compared to the other languages in existence at the time of Python 1.x or even the 2.2/2.3 changeover, it's hard to fault Python. The fact that it can easily be used to write bad code is a little unfortunate, but the fact that it can also easily be used to write good code, when other languages either don't allow some things to be expressed, force them to be expressed in clumsy or limited ways, or force you to misuse inappropriate features instead more than makes up for it. In particular, any language that has fixed structure layouts and vtables makes is much more unfriendly to both kinds of mixins, and makes interface subtyping clumsy; Python has no such problems. Yeah, maybe we could design something better in 2015,
but based on the knowledge people had at the time?

Sven R. Kunze

unread,
Oct 21, 2015, 4:57:11 PM10/21/15
to Andrew Barnert, Jonathan Slenders, python-ideas
On 21.10.2015 21:27, Andrew Barnert wrote:
On Oct 21, 2015, at 11:25, Jonathan Slenders <jona...@slenders.be> wrote:
Just want to say that I'm happy to see that lately the disadvantages of inheritance (which are already known for a very long time) are getting more attention.
It's not bad by definition, but there's so much truth in Sandy her talk and I think for many Python projects, we went way too far into "abusing" inheritance.
Actually, it's a bit unfortunate that we made inheritance so user friendly and powerful in Python that for many people it became the logical way to extend or reuse some code.
One of the most important things people have learned about OO over the past two decades is that subtyping, implementation extension, and implementation mixins, and interface-extending mixins (think collections.abc.Sequence adding count for you) are all different things. Compared to the other languages in existence at the time of Python 1.x or even the 2.2/2.3 changeover, it's hard to fault Python. The fact that it can easily be used to write bad code is a little unfortunate, but the fact that it can also easily be used to write good code, when other languages either don't allow some things to be expressed, force them to be expressed in clumsy or limited ways, or force you to misuse inappropriate features instead more than makes up for it. In particular, any language that has fixed structure layouts and vtables makes is much more unfriendly to both kinds of mixins, and makes interface subtyping clumsy; Python has no such problems. Yeah, maybe we could design something better in 2015, 
 b
ut based on the knowledge people had at the time?

True. Just because everything is possible does not necessarily mean one should do it.


@Jonathan
Inheritance has its disadvantages. The bigger the class hierarchy, the uglier it becomes. (We have this with the django view class system.)

I see the same coming for the dict discussion if people insist on having separate leaf classes for all possible orthogonal aspects. That does not scale and it's hard to test ( --> @core-devs ).


Something that's become clearer and clearer to me is that nobody could have known this in advance. It seemed like a good idea to start with several different base classes for each aspect (not knowing that they could be combine later). But now, people start to realize they might need these different aspects in a single implementation.

So, it's the time to think about how to solve it in a scalable and maintainable way. Multiple inheritance might be a solution, composition is another. This time I think composition is the better one.


Best,
Sven


PS: Funny coincidence, Sandi uses the aspect "order" for the house. The same as a dict could have an aspect "order".

Terry Reedy

unread,
Oct 21, 2015, 9:12:37 PM10/21/15
to python...@python.org
On 10/21/2015 1:41 PM, Sven R. Kunze wrote:
> On 21.10.2015 00:50, Chris Angelico wrote:
>> She recommends a massive superclass that's capable of any form of injection
>
> Nope. She does not.
>
> The "superclass" is not "massive" at all. It is even slimmer as
> orthogonal aspects are refactored out into separate entities. In fact,
> it makes it even easier to test and maintain these separate aspects (the
> core dev should be interested in that). Furthermore, it's, of course, up
> to debate which aspects should be injectable and which are not.

The dict class itself is, in a sense, a poor example for this
discussion. It is a critical part of Python's infrastructure, involved
in a large fraction of executed statements. It therefore needs to be as
fast as possible. For CPython, this means a heavily optimized C
implementation that disallows injection and that takes shortcuts like
bypassing method lookups. This makes the usefulness of subclassing limited.

Of course, before 2.2, dict (and other built-in types) could not even be
subclassed. UserDict is the 'user base dict class', meant to be
subclassed by users. A solution to UserDict being too slow could be a C
accelerator that did not bypass method lookups. A revised UserDict
could be designed for injection in the way being discussed.

--
Terry Jan Reedy

Greg Ewing

unread,
Oct 21, 2015, 11:49:38 PM10/21/15
to python...@python.org
Carl Meyer wrote:
> It takes her a couple minutes more to get to the
> real point, which starts at the slide "inheritance is for
> specialization, not for sharing code."

I'm not sure there's any content in that statement.
In a duck-typed language, sharing code is the *only*
reason to use inheritance.

Also, what is "specialisation" anyway? Any time
you have two objects with the same interface, and
mostly-similar but slightly different behaviours,
you could regard one as being a specialisation of
the other. Whether they share any code or not is
an implementation detail.

I think the real issue is that if you specialise
something by inheriting and overriding random
methods, you make your class fragile with respect
to changes in the base class. If the author of
the base class changes something undocumented about
the way its methods interact with each other, your
class may break.

With pluggable behaviours, there is a well-defined
interface between the main class and the classes
representing the behaviours, so that sort of thing
is much less likely to happen.

You could get the same result with inheritance by
designating some methods as "designed to be
overridden", and documenting how they interact with
the rest of the class. Effectively you are then
defining an interface *within* the class and
between the class and its subclasses. This is
really just another implementation of pluggable
behaviours where the plugging mechanism consists
of overriding one or more methods.

So to summarise, I would say that the issue here
isn't really composition vs. inheritance, but
structured vs. ad-hoc behaviour modification.

I wouldn't say that one is always better than the
other. Sometimes you need to do things that the
author of a class didn't anticipate, and then the
ability to override things in an ad-hoc manner is
very useful. But I think the speaker is right to
point out that doing *everything* that way can
lead to a lot of trouble.

--
Greg

Sven R. Kunze

unread,
Oct 22, 2015, 12:09:33 PM10/22/15
to python...@python.org
On 22.10.2015 03:11, Terry Reedy wrote:
On 10/21/2015 1:41 PM, Sven R. Kunze wrote:
The "superclass" is not "massive" at all. It is even slimmer as
orthogonal aspects are refactored out into separate entities. In fact,
it makes it even easier to test and maintain these separate aspects (the
core dev should be interested in that). Furthermore, it's, of course, up
to debate which aspects should be injectable and which are not.

The dict class itself is, in a sense, a poor example for this discussion.  It is a critical part of Python's infrastructure, involved in a large fraction of executed statements.  It therefore needs to be as fast as possible.  For CPython, this means a heavily optimized C implementation that disallows injection and that takes shortcuts like bypassing method lookups.  This makes the usefulness of subclassing limited.

The discussion is about dict:


normal_dict = dict()
ordered_dict = dict(order=dict.order_by_insert)
sorted_dict = dict(order=sorted)
sorted_default_dict = dict(order=sorted, default=int)

Why couldn't dict() or {} redirect to the super-fast built-in C-implementation whereas dict(order=sorted, default=int) redirects to some more feature rich one?

As you see, I for one don't see a contradiction between performance and features.

Best,
Sven

Wes Turner

unread,
Nov 13, 2015, 11:47:11 PM11/13/15
to Andrew Barnert, Python-Ideas
On Fri, Oct 16, 2015 at 9:08 PM, Andrew Barnert via Python-ideas <python...@python.org> wrote:
Actually, forget all that; it's even simpler.

At least in recent 3.x, the only thing wrong with inheriting from both types, assuming you put OrderedDict first, is the __init__ signature. So:

    class OrderedDefaultDict(OrderedDict, defaultdict):
        def __init__(self, default_factory=None, *a, **kw):
            OrderedDict.__init__(self, *a, **kw)
            self.default_factory = default_factory

More importantly, because __missing__ support is built into dict, despite the confusing docs for defaultdict, you don't really need defaultdict at all here:

    class OrderedDefaultDict(OrderedDict):
        def __init__(self, default_factory=None, *a, **kw):
            OrderedDict.__init__(self, *a, **kw)
            self.default_factory = default_factory
        def __missing__(self, key):
            self[key] = value = default_factory()
            return value

And either of these should work with 2.5+ (according to https://docs.python.org/2/library/stdtypes.html#dict that's when dict.__missing__ was added).


Thanks!

- [ ] Could/should maybe either of these make it into the standard library,
that would save a fair amount of copying.

.. Great in combination w/ dict views:



 

Andrew Barnert via Python-ideas

unread,
Nov 14, 2015, 12:20:11 AM11/14/15
to Wes Turner, Python-Ideas
On Nov 13, 2015, at 20:46, Wes Turner <wes.t...@gmail.com> wrote:

On Fri, Oct 16, 2015 at 9:08 PM, Andrew Barnert via Python-ideas <python...@python.org> wrote:
Actually, forget all that; it's even simpler.

At least in recent 3.x, the only thing wrong with inheriting from both types, assuming you put OrderedDict first, is the __init__ signature. So:

    class OrderedDefaultDict(OrderedDict, defaultdict):
        def __init__(self, default_factory=None, *a, **kw):
            OrderedDict.__init__(self, *a, **kw)
            self.default_factory = default_factory

More importantly, because __missing__ support is built into dict, despite the confusing docs for defaultdict, you don't really need defaultdict at all here:

    class OrderedDefaultDict(OrderedDict):
        def __init__(self, default_factory=None, *a, **kw):
            OrderedDict.__init__(self, *a, **kw)
            self.default_factory = default_factory
        def __missing__(self, key):
            self[key] = value = default_factory()
            return value

And either of these should work with 2.5+ (according to https://docs.python.org/2/library/stdtypes.html#dict that's when dict.__missing__ was added).


Thanks!

- [ ] Could/should maybe either of these make it into the standard library,
that would save a fair amount of copying.

You could say the same about everything in the recipes, every one-liner like "def identity(x): return x", and so on. But most such things aren't used that often, and there's a cost to putting them in the stdlib—more for people to learn and remember, more code to be maintained, bigger downloads, etc. So just saying "maybe it would save some copying" isn't an argument for adding it to the stdlib.

Adding OrderedDefaultDict as a docs recipe (as OrderedCounter already is) might be worth doing. 
Reorganizing the docs a bit to make it more obvious (by better highlighting __missing__, and making it clear that it's a method of dict rather than something special about defaultdict) seems even more likely to be worth it. If someone has an idea of how it should read, just file a docs bug and submit a patch and see if whoever's in charge of that area says it needs to come back here.
Of course, and that works out of the box.

Wes Turner

unread,
Feb 12, 2016, 5:20:32 PM2/12/16
to Andrew Barnert, Python-Ideas
This seems to keep a consistent __init__ signature with OrderedDict (by .pop()-ing 'default_factory' from kwargs instead of specifying as a positionalkwarg):

class OrderedDefaultDict(OrderedDict):

    def __init__(self, *a, **kw):
        default_factory = kw.pop('default_factory', self.__class__)
        OrderedDict.__init__(self, *a, **kw)
        self.default_factory = default_factory

    def __missing__(self, key):
        self[key] = value = self.default_factory()
        return value

I've added a few tests (as well as to_json, and _repr_json_
https://gist.github.com/westurner/be22dba8110be099a35e/c1a3a7394e401d4742df0617900bde6ab2643300#file-ordereddefaultdict-py-L120-L122

(Without this fix, 
json.loads(output_json, object_pairs_hook=OrderedDefaultDict)
doesn't seem to work).

Reply all
Reply to author
Forward
0 new messages