How to properly override the default factory of defaultdict?

Herman

Feb 14, 2016, 7:18:42 PM
I want to pass the key to the default_factory of defaultdict, and I found
that defaultdict can somehow intercept my call to dict.__getitem__(self,
key), so my class's __getitem__ has to catch a TypeError instead
of a KeyError. The following class is my code:

from collections import defaultdict

class DefaultDictWithEnhancedFactory(defaultdict):
    """Just like the standard python collections.defaultdict,
    but the default_factory takes the missing key as argument.

    Args:
        default_factory: A function that takes the missing key as the
            argument and returns a value for the missing key.
        *a: arguments passed to the defaultdict constructor
        **kw: keyword arguments passed to the defaultdict constructor
    """
    def __init__(self, default_factory, *a, **kw):
        defaultdict.__init__(self, default_factory, *a, **kw)

    def __getitem__(self, key):
        try:
            return dict.__getitem__(self, key)
        except KeyError:
            # Normally, you would expect this line to be
            # called for missing keys...
            return self.default_factory(key)
        except TypeError as ex:
            # However, this is actually getting called
            # because for some reason, defaultdict still
            # intercepts the __getitem__ call and raises:
            # TypeError: <lambda>() takes exactly 1 argument (0 given)
            # So we have to catch that instead...
            if "lambda" in str(ex):
                return self.default_factory(key)

Ben Finney

Feb 14, 2016, 7:37:18 PM
Herman <sors...@gmail.com> writes:

> I want to pass the key to the default_factory of defaultdict, and I
> found that defaultdict can somehow intercept my call to
> dict.__getitem__(self, key), so my class's __getitem__ has to catch a
> TypeError instead of a KeyError. The following class is my code:

I don't see an example of using that code, so that we can use it the
same way you are.

So I'll comment on some things that may not be relevant but nevertheless
should be addressed:

> class DefaultDictWithEnhancedFactory(defaultdict):
>     """Just like the standard python collections.defaultdict,
>     but the default_factory takes the missing key as argument.

Your doc string should have only a brief (about 50–60 characters) single
line synopsis. If it's longer, you may be writing something too complex.
(See PEP 257.)

> Args:

The class itself doesn't take arguments; its numerous methods do. I
think you mean these descriptions to be in the ‘__init__’ method's doc
string.

>     def __init__(self, default_factory, *a, **kw):
>         defaultdict.__init__(self, default_factory, *a, **kw)
>
>     def __getitem__(self, key):
>         try:
>             return dict.__getitem__(self, key)

You are using the inheritance hierarchy but thwarting it by not using
‘super’. Instead::

    super().__init__(default_factory, *a, **kw)

and::

    super().__getitem__(key)

If you're not using Python 3 (and you should, for new code), ‘super’ is
a little more complex. Migrating to Python 3 has this advantage among
many others.

--
\ "Those who will not reason, are bigots, those who cannot, are |
`\ fools, and those who dare not, are slaves." —“Lord” George |
_o__) Gordon Noel Byron |
Ben Finney

Chris Angelico

Feb 14, 2016, 8:04:44 PM
On Mon, Feb 15, 2016 at 11:17 AM, Herman <sors...@gmail.com> wrote:
> I want to pass the key to the default_factory of defaultdict, and I found
> that defaultdict can somehow intercept my call to dict.__getitem__(self,
> key), so my class's __getitem__ has to catch a TypeError instead
> of a KeyError. The following class is my code:

Save yourself a lot of trouble, and just override __missing__:

import collections

class DefaultDictWithEnhancedFactory(collections.defaultdict):
    def __missing__(self, key):
        return self.default_factory(key)
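
A quick sketch of how the override above behaves in practice. One detail worth knowing: the stock defaultdict.__missing__ stores the generated value under the key, whereas the short override only returns it. The second class, StoringKeyDefaultDict, is a hypothetical name and not something from this thread; it is just one way to keep the storing behaviour if that is wanted.

```python
import collections


class DefaultDictWithEnhancedFactory(collections.defaultdict):
    def __missing__(self, key):
        # Returns the default but does not store it in the dict.
        return self.default_factory(key)


class StoringKeyDefaultDict(collections.defaultdict):
    # Hypothetical variant: also stores the value, as defaultdict does.
    def __missing__(self, key):
        if self.default_factory is None:
            raise KeyError(key)
        value = self[key] = self.default_factory(key)
        return value


d = DefaultDictWithEnhancedFactory(lambda k: k.upper())
print(d["apple"])    # APPLE
print("apple" in d)  # False: nothing was stored

s = StoringKeyDefaultDict(lambda k: k.upper())
print(s["apple"])    # APPLE
print("apple" in s)  # True: the value was stored
```

Whether to store the value depends on the use case: storing avoids recomputing the default, while not storing keeps lookups from growing the dict.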

ChrisA

Steven D'Aprano

Feb 14, 2016, 10:56:50 PM
On Monday 15 February 2016 11:17, Herman wrote:

> I want to pass in the key to the default_factory of defaultdict

Just use a regular dict and define __missing__:

class MyDefaultDict(dict):
    def __missing__(self, key):
        return "We gotcha key %r right here!" % key


If you want a per-instance __missing__, do something like this:


class MyDefaultDict(dict):
    def __init__(self, factory):
        self._factory = factory
    def __missing__(self, key):
        return self._factory(self, key)


d = MyDefaultDict(lambda self, key: ...)


--
Steve

Gregory Ewing

Feb 15, 2016, 3:28:42 AM
Herman wrote:
> I want to pass in the key to the default_factory of defaultdict and I found
> that defaultdict somehow can intercept my call to dict.__getitem__(self,
> key),

What's happening here is that defaultdict doesn't actually
override __getitem__ at all. Instead, it overrides __missing__,
which gets called by the standard dict's __getitem__ for a
missing key.

As Steven said, you don't need a defaultdict here at all,
just a dict subclass that defines __missing__ the way you
want.
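
The mechanism described above can be seen in a small experiment. This is only an illustrative sketch: the Traced class and the calls list are made-up names. It shows that calling dict.__getitem__ directly on a subclass instance still routes missing keys to __missing__, which is why a one-argument factory on a defaultdict ends in a TypeError rather than a KeyError.

```python
from collections import defaultdict

calls = []


class Traced(dict):
    # Hypothetical dict subclass that records every __missing__ call.
    def __missing__(self, key):
        calls.append(key)
        return "default"


t = Traced(a=1)
# dict.__getitem__ itself hands missing keys to __missing__ on subclasses.
print(dict.__getitem__(t, "a"))  # 1
print(dict.__getitem__(t, "b"))  # default
print(calls)                     # ['b']

# defaultdict.__missing__ calls default_factory with no arguments,
# so a factory that requires the key fails with TypeError.
d = defaultdict(lambda key: key)
try:
    dict.__getitem__(d, "apple")
except TypeError as exc:
    print("TypeError:", exc)
```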

--
Greg

Herman

Feb 18, 2016, 12:42:28 PM
d = dictutil.DefaultDictWithEnhancedFactory(lambda k: k)
self.assertEqual("apple", d['apple'])

From: Ben Finney <ben+p...@benfinney.id.au>
>
> You are using the inheritance hierarchy but thwarting it by not using
> ‘super’. Instead::
>
>     super().__init__(default_factory, *a, **kw)
>
> and::
>
>     super().__getitem__(key)

super does not work for defaultdict. I am using python 2.7. If I use
super(defaultdict, self).__init__(default_factory, *a, **kw), I get the
error:

super(defaultdict, self).__init__(default_factory, *a, **kw)
TypeError: 'function' object is not iterable

My use case is:
d = dictutil.DefaultDictWithEnhancedFactory(lambda k: k)
self.assertEqual("apple", d['apple'])


> From: Chris Angelico <ros...@gmail.com>
>
> Save yourself a lot of trouble, and just override __missing__:
>
> class DefaultDictWithEnhancedFactory(collections.defaultdict):
>     def __missing__(self, key):
>         return self.default_factory(key)
>
> ChrisA
>
This works! Thanks.

From: "Steven D'Aprano" <steve+comp....@pearwood.info>
> > I want to pass in the key to the default_factory of defaultdict
>
> Just use a regular dict and define __missing__:
>
> class MyDefaultDict(dict):
>     def __missing__(self, key):
>         return "We gotcha key %r right here!" % key
>
>
> If you want a per-instance __missing__, do something like this:
>
>
> class MyDefaultDict(dict):
>     def __init__(self, factory):
>         self._factory = factory
>     def __missing__(self, key):
>         return self._factory(self, key)
>
>
> d = MyDefaultDict(lambda self, key: ...)
>
>
> --
> Steve
>
Looks like inheriting from defaultdict is easier. I don't even have to
override the constructor, as suggested by Chris Angelico above. Thanks.

Ian Kelly

Feb 19, 2016, 7:25:39 PM
On Thu, Feb 18, 2016 at 10:41 AM, Herman <sors...@gmail.com> wrote:
> From: Ben Finney <ben+p...@benfinney.id.au>
>>
>> You are using the inheritance hierarchy but thwarting it by not using
>> ‘super’. Instead::
>>
>>     super().__init__(default_factory, *a, **kw)
>>
>> and::
>>
>>     super().__getitem__(key)
>
> super does not work for defaultdict. I am using python 2.7. If I use
> super(defaultdict, self).__init__(default_factory, *a, **kw), I get the
> error:
>
> super(defaultdict, self).__init__(default_factory, *a, **kw)
> TypeError: 'function' object is not iterable

You're using it incorrectly. If your class is named
DefaultDictWithEnhancedFactory, then the super call would be:

super(DefaultDictWithEnhancedFactory, self).__init__(default_factory, *a, **kw)

You pass in the current class so that super can look up the next
class. If you pass defaultdict instead, super will think that it's
being called *by* defaultdict and call the __init__ method on its own
superclass, dict, which has a different signature.
defaultdict.__init__ is effectively skipped.
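
To make the difference concrete, here is a small sketch comparing the two calls. The class names WrongSuper and RightSuper are made up for illustration, and the two-argument form of super is used so the code reads the same on Python 2.7 and Python 3.

```python
from collections import defaultdict


class WrongSuper(defaultdict):
    def __init__(self, default_factory, *a, **kw):
        # Starts the lookup *after* defaultdict, so dict.__init__ runs
        # and tries to treat the factory as an iterable of pairs.
        super(defaultdict, self).__init__(default_factory, *a, **kw)


class RightSuper(defaultdict):
    def __init__(self, default_factory, *a, **kw):
        # Starts the lookup after RightSuper, so defaultdict.__init__ runs.
        super(RightSuper, self).__init__(default_factory, *a, **kw)


try:
    WrongSuper(lambda: 0)
except TypeError as exc:
    print("TypeError:", exc)

d = RightSuper(lambda: 0)
print(d["missing"])  # 0
```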

> Looks like inheriting from defaultdict is easier. I don't even have to
> override the constructor, as suggested by Chris Angelico above. Thanks.

True, although there's a faint code smell as this technically violates
the Liskov Substitution Principle; the default_factory attribute on
defaultdict instances is expected to be a function of zero arguments,
not one.
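
A sketch of where that could bite, assuming some other code was written against plain defaultdict. Both KeyFactoryDict and fresh_default are hypothetical names invented for this illustration.

```python
from collections import defaultdict


class KeyFactoryDict(defaultdict):  # hypothetical name
    def __missing__(self, key):
        return self.default_factory(key)


def fresh_default(dd):
    # Hypothetical helper written against plain defaultdict:
    # it assumes default_factory takes no arguments.
    return dd.default_factory()


print(fresh_default(defaultdict(list)))  # []

try:
    fresh_default(KeyFactoryDict(lambda key: key))
except TypeError as exc:
    # The subclass cannot stand in for a defaultdict here.
    print("TypeError:", exc)
```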

Chris Angelico

Feb 19, 2016, 9:19:39 PM
On Sat, Feb 20, 2016 at 11:24 AM, Ian Kelly <ian.g...@gmail.com> wrote:
>> Looks like inheriting from defaultdict is easier. I don't even have to
>> override the constructor, as suggested by Chris Angelico above. Thanks.
>
> True, although there's a faint code smell as this technically violates
> the Liskov Substitution Principle; the default_factory attribute on
> defaultdict instances is expected to be a function of zero arguments,
> not one.

My suggestion was a minimal change against his own code, but really,
the right way to do it is to not subclass defaultdict at all - just
subclass dict and define __missing__. Then there's no violation of
LSP, and probably no super() call either.
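
For illustration, one possible shape of such a dict subclass (Python 3). The name KeyDefaultDict, storing the computed value, and forwarding initial data to dict are all assumptions of this sketch; forwarding initial data is the one thing here that does need a super() call.

```python
class KeyDefaultDict(dict):
    """dict whose missing keys are computed from the key itself."""

    def __init__(self, factory, *args, **kwargs):
        # Forward any initial data to dict; keep the factory privately,
        # so there is no default_factory attribute to mislead anyone.
        super().__init__(*args, **kwargs)
        self._factory = factory

    def __missing__(self, key):
        # Assumption of this sketch: store the value, as defaultdict does.
        value = self[key] = self._factory(key)
        return value


d = KeyDefaultDict(lambda key: key, pear="fruit")
print(d["apple"])  # apple
print(d["pear"])   # fruit
print(sorted(d))   # ['apple', 'pear']
```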

And the correct way to use super() is with no arguments and Python 3.
Much less fiddling around, no repetition of the class name, and you're
not depending on a global name lookup that might fail:

Python 2.7:

>>> class Foo(object):
...     def method(self): print("Base method")
...
>>> class Bar(Foo):
...     def method(self):
...         super(Bar,self).method()
...         print("Derived method")
...
>>> b = Bar()
>>> del Bar
>>> b.method()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 3, in method
NameError: global name 'Bar' is not defined

Python 3.6:
>>> class Foo(object):
...     def method(self): print("Base method")
...
>>> class Bar(Foo):
...     def method(self):
...         super().method()
...         print("Derived method")
...
>>> b = Bar()
>>> del Bar
>>> b.method()
Base method
Derived method

Yes, that's a little contrived. But imagine the mess if you use a
class decorator that returns a function, or accidentally reuse a name
at the interactive prompt, or something. Needing to look something up
just to find your own parent seems unnecessary.

(That said, though, we depend on a name lookup to perform a recursive
function call, and that's a lot more likely to be rebound than a
class. But the no-arg super is still better for the other reasons.)
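
The recursive-call case can be shown the same way. This is only a contrived sketch; factorial and fact are arbitrary names chosen for the example.

```python
def factorial(n):
    # The recursive call looks up the global name "factorial" at call time.
    return 1 if n <= 1 else n * factorial(n - 1)


fact = factorial
print(fact(5))  # 120

factorial = None  # rebind the global name

try:
    fact(5)
except TypeError as exc:
    # The function body now finds None under the name "factorial".
    print("TypeError:", exc)
```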

ChrisA