
Channelling Guido - dict subclasses


Steven D'Aprano

Jan 14, 2014, 8:27:36 PM
Over on the Python-Dev mailing list, there is an ENORMOUS multi-thread
discussion involving at least two PEPs, about bytes/str compatibility.
But I don't want to talk about that. (Oh gods, I *really* don't want to
talk about that...)

In the midst of that discussion, Guido van Rossum made a comment about
subclassing dicts:

[quote]
From: Guido van Rossum <gu...@python.org>
Date: Tue, 14 Jan 2014 12:06:32 -0800
Subject: Re: [Python-Dev] PEP 460 reboot

Personally I wouldn't add any words suggesting or referring
to the option of creation another class for this purpose. You
wouldn't recommend subclassing dict for constraining the
types of keys or values, would you?
[end quote]

https://mail.python.org/pipermail/python-dev/2014-January/131537.html

This surprises me, and rather than bother Python-Dev (where it will
likely be lost in the noise, and certainly will be off-topic), I'm hoping
there may be someone here who is willing to attempt to channel GvR. I
would have thought that subclassing dict for the purpose of constraining
the type of keys or values would be precisely an excellent use of
subclassing.


class TextOnlyDict(dict):
    def __setitem__(self, key, value):
        if not isinstance(key, str):
            raise TypeError
        super().__setitem__(key, value)
    # need to override more methods too


But reading Guido, I think he's saying that wouldn't be a good idea. I
don't get it -- it's not a violation of the Liskov Substitution
Principle, because it's more restrictive, not less. What am I missing?


--
Steven

Ned Batchelder

Jan 14, 2014, 9:04:22 PM
to pytho...@python.org
One problem with it is that there are lots of ways of setting values in
the dict, and they don't use your __setitem__:

>>> tod = TextOnlyDict()
>>> tod.update({1: "haha"})
>>>

This is what you're getting at with your "need to override more methods
too", but it turns out to be a pain to override enough methods.
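
To give a feel for how much overriding is involved, here is a rough sketch, not a complete solution. The `_check` helper and the choice of which methods to cover are my own assumptions for illustration; only `TextOnlyDict` and its `__setitem__` come from the original post.

```python
class TextOnlyDict(dict):
    """dict subclass that rejects non-str keys on the write paths covered here."""

    @staticmethod
    def _check(key):
        # Hypothetical helper: one place for the key test.
        if not isinstance(key, str):
            raise TypeError("keys must be str, not " + type(key).__name__)

    def __init__(self, *args, **kwargs):
        super().__init__()
        # Route construction through our own update() so keys are checked.
        self.update(*args, **kwargs)

    def __setitem__(self, key, value):
        self._check(key)
        super().__setitem__(key, value)

    def update(self, *args, **kwargs):
        # dict(...) normalises mappings, iterables of pairs and keyword args.
        for key, value in dict(*args, **kwargs).items():
            self[key] = value

    def setdefault(self, key, default=None):
        self._check(key)
        return super().setdefault(key, default)


tod = TextOnlyDict(a=1)
tod.update(b=2)
try:
    tod.update({1: "haha"})
except TypeError as exc:
    print("rejected:", exc)
```

Even this much may leave gaps (other in-place update paths, such as the `|=` operator in newer Pythons, would need the same treatment), which is the "pain" part.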

I don't know if that is what Guido was getting at; I suspect he was
talking at a more refined "principles of object design" level rather
than a "dicts don't happen to work that way" level.

Also, I've never done it, but I understand that deriving from
collections.MutableMapping avoids this problem.
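
For reference, a minimal sketch of what that could look like. In current Python the ABC is spelled `collections.abc.MutableMapping`; the class name `TextOnlyMapping` and the `_data` attribute are made up for this example. The mixin methods (`update`, `setdefault` and friends) are written in terms of the five methods defined here, so the key check only has to live in one place.

```python
from collections.abc import MutableMapping


class TextOnlyMapping(MutableMapping):
    """Mapping with str-only keys, backed by a plain dict (hypothetical name)."""

    def __init__(self, *args, **kwargs):
        self._data = {}
        self.update(*args, **kwargs)  # mixin update() calls our __setitem__

    def __getitem__(self, key):
        return self._data[key]

    def __setitem__(self, key, value):
        if not isinstance(key, str):
            raise TypeError("keys must be str")
        self._data[key] = value

    def __delitem__(self, key):
        del self._data[key]

    def __iter__(self):
        return iter(self._data)

    def __len__(self):
        return len(self._data)


tom = TextOnlyMapping(a=1)
try:
    tom.update({1: "haha"})
except TypeError as exc:
    print("rejected:", exc)
```

The trade-off is that an instance is no longer a `dict` as far as `isinstance(obj, dict)` is concerned.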

--
Ned Batchelder, http://nedbatchelder.com

Terry Reedy

Jan 14, 2014, 10:48:09 PM
to pytho...@python.org
On 1/14/2014 8:27 PM, Steven D'Aprano wrote:

> In the midst of that discussion, Guido van Rossum made a comment about
> subclassing dicts:
>
> [quote]
> From: Guido van Rossum <gu...@python.org>
> Date: Tue, 14 Jan 2014 12:06:32 -0800
> Subject: Re: [Python-Dev] PEP 460 reboot
>
> Personally I wouldn't add any words suggesting or referring
> to the option of creation another class for this purpose. You
> wouldn't recommend subclassing dict for constraining the
> types of keys or values, would you?
> [end quote]
>
> https://mail.python.org/pipermail/python-dev/2014-January/131537.html
>
> This surprises me,

I was slightly surprised too. I understand not wanting to add a subclass
to stdlib, but I believe this was about adding words to the doc. Perhaps
he did not want to over-emphasize one particular possible subclass by
putting the words in the doc.

--
Terry Jan Reedy

F

Jan 15, 2014, 2:00:48 AM
I can't speak for Guido but I think it is messy and unnatural and will lead to user frustration.
As a user, I would expect a dict to take any hashable as key and any object as value when using one. I would probably just provide a __getitem__ method in a normal class in your case.

That said, I have overridden dict before, but my child class only added to dict; I didn't change its underlying behaviour, so you can use my class(es) as a vanilla dict everywhere, which enforcing types would have destroyed.

Peter Otten

Jan 15, 2014, 3:40:33 AM
to pytho...@python.org
Steven D'Aprano wrote:

> In the midst of that discussion, Guido van Rossum made a comment about
> subclassing dicts:
>
> [quote]

> Personally I wouldn't add any words suggesting or referring
> to the option of creation another class for this purpose. You
> wouldn't recommend subclassing dict for constraining the
> types of keys or values, would you?
> [end quote]

> This surprises me, and rather than bother Python-Dev (where it will
> likely be lost in the noise, and certainly will be off-topic), I'm hoping
> there may be someone here who is willing to attempt to channel GvR. I
> would have thought that subclassing dict for the purpose of constraining
> the type of keys or values would be precisely an excellent use of
> subclassing.
>
>
> class TextOnlyDict(dict):
>     def __setitem__(self, key, value):
>         if not isinstance(key, str):
>             raise TypeError

Personally I feel dirty whenever I write Python code that defeats duck-
typing -- so I would not /recommend/ any isinstance() check.
I realize that this is not an argument...

PS: I tried to read GvR's remark in context, but failed. It's about time to
revolt and temporarily install the FLUFL as our leader, long enough to
revoke Guido's top-posting license, but not long enough to reintroduce the
<> operator...

Mark Lawrence

Jan 15, 2014, 4:10:38 AM
to pytho...@python.org
On 15/01/2014 01:27, Steven D'Aprano wrote:
> Over on the Python-Dev mailing list, there is an ENORMOUS multi-thread
> discussion involving at least two PEPs, about bytes/str compatibility.
> But I don't want to talk about that. (Oh gods, I *really* don't want to
> talk about that...)

+ trillions

>
> In the midst of that discussion, Guido van Rossum made a comment about
> subclassing dicts:
>
> [quote]
> From: Guido van Rossum <gu...@python.org>
> Date: Tue, 14 Jan 2014 12:06:32 -0800
> Subject: Re: [Python-Dev] PEP 460 reboot
>
> Personally I wouldn't add any words suggesting or referring
> to the option of creation another class for this purpose. You
> wouldn't recommend subclassing dict for constraining the
> types of keys or values, would you?
> [end quote]
>
> https://mail.python.org/pipermail/python-dev/2014-January/131537.html
>
> This surprises me, and rather than bother Python-Dev (where it will
> likely be lost in the noise, and certainly will be off-topic), I'm hoping
> there may be someone here who is willing to attempt to channel GvR. I
> would have thought that subclassing dict for the purpose of constraining
> the type of keys or values would be precisely an excellent use of
> subclassing.

Exactly what I was thinking.

>
>
> class TextOnlyDict(dict):
>     def __setitem__(self, key, value):
>         if not isinstance(key, str):
>             raise TypeError
>         super().__setitem__(key, value)
>     # need to override more methods too
>
>
> But reading Guido, I think he's saying that wouldn't be a good idea. I
> don't get it -- it's not a violation of the Liskov Substitution
> Principle, because it's more restrictive, not less. What am I missing?
>
>

Couple of replies I noted from Ned Batchelder and Terry Reedy. Smacked
bottom for Peter Otten, how dare he? :)

--
My fellow Pythonistas, ask not what our language can do for you, ask
what you can do for our language.

Mark Lawrence

Tim Chase

Jan 15, 2014, 6:03:25 AM
to pytho...@python.org
On 2014-01-15 01:27, Steven D'Aprano wrote:
> class TextOnlyDict(dict):
>     def __setitem__(self, key, value):
>         if not isinstance(key, str):
>             raise TypeError
>         super().__setitem__(key, value)
>     # need to override more methods too
>
>
> But reading Guido, I think he's saying that wouldn't be a good
> idea. I don't get it -- it's not a violation of the Liskov
> Substitution Principle, because it's more restrictive, not less.
> What am I missing?

Just as an observation, this seems almost exactly what anydbm does,
behaving like a dict (whether it inherits from dict, or just
duck-types like a dict), but with the limitation that keys/values need
to be strings.

-tkc


John Ladasky

Jan 15, 2014, 11:51:28 AM
On Wednesday, January 15, 2014 12:40:33 AM UTC-8, Peter Otten wrote:
> Personally I feel dirty whenever I write Python code that defeats duck-
> typing -- so I would not /recommend/ any isinstance() check.

While I am inclined to agree, I have yet to see a solution to the problem of flattening nested lists/tuples which avoids isinstance(). If anyone has written one, I would like to see it, and consider its merits.

Peter Otten

Jan 15, 2014, 1:35:51 PM
to pytho...@python.org
Well, you should always be able to find some property that discriminates
what you want to treat as sequences from what you want to treat as atoms.

(flatten() adapted from a nine-year-old post by Nick Craig-Wood
<https://mail.python.org/pipermail/python-list/2004-December/288112.html>)

>>> def flatten(items, check):
...     if check(items):
...         for item in items:
...             yield from flatten(item, check)
...     else:
...         yield items
...
>>> items = [1, 2, (3, 4), [5, [6, (7,)]]]
>>> print(list(flatten(items, check=lambda o: hasattr(o, "sort"))))
[1, 2, (3, 4), 5, 6, (7,)]
>>> print(list(flatten(items, check=lambda o: hasattr(o, "count"))))
[1, 2, 3, 4, 5, 6, 7]

The approach can of course break

>>> items = ["foo", 1, 2, (3, 4), [5, [6, (7,)]]]
>>> print(list(flatten(items, check=lambda o: hasattr(o, "count"))))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 4, in flatten
  File "<stdin>", line 4, in flatten
  File "<stdin>", line 4, in flatten
  File "<stdin>", line 4, in flatten
  File "<stdin>", line 4, in flatten
  File "<stdin>", line 4, in flatten
  File "<stdin>", line 4, in flatten
  File "<stdin>", line 2, in flatten
RuntimeError: maximum recursion depth exceeded

and I'm the first to admit that the fix below looks really odd:

>>> print(list(flatten(items, check=lambda o: hasattr(o, "count") and not
... hasattr(o, "split"))))
['foo', 1, 2, 3, 4, 5, 6, 7]

In fact all of the following examples look more natural...

>>> print(list(flatten(items, check=lambda o: isinstance(o, list))))
['foo', 1, 2, (3, 4), 5, 6, (7,)]
>>> print(list(flatten(items, check=lambda o: isinstance(o, (list,
... tuple)))))
['foo', 1, 2, 3, 4, 5, 6, 7]
>>> print(list(flatten(items, check=lambda o: isinstance(o, (list, tuple))
... or (isinstance(o, str) and len(o) > 1))))
['f', 'o', 'o', 1, 2, 3, 4, 5, 6, 7]

... than the duck-typed variants because it doesn't matter for the problem
of flattening whether an object can be sorted or not. But in a real-world
application the "atoms" are more likely to have something in common that is
required for the problem at hand, and the check for it with

def check(obj):
    return not (obj is an atom) # pseudo-code

may look more plausible.
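
To make that a little more concrete, here is a toy sketch. The `Item` type, its `price` attribute and the `is_container` name are all invented for the illustration; the point is only that the atoms share a property the application needs anyway. Note that `Item` is itself a tuple, so an `isinstance(o, (list, tuple))` check would wrongly take it apart, while the attribute check leaves it whole.

```python
from collections import namedtuple

# Hypothetical atom type: anything with a .price counts as an atom.
Item = namedtuple("Item", "name price")


def flatten(items, check):
    # Same generator as above: recurse into whatever check() calls a container.
    if check(items):
        for item in items:
            yield from flatten(item, check)
    else:
        yield items


def is_container(obj):
    # The application only cares about prices, so "has a price" marks an atom.
    return not hasattr(obj, "price")


basket = [Item("tea", 3), (Item("milk", 1), [Item("jam", 4)])]
flat = list(flatten(basket, is_container))
total = sum(item.price for item in flat)
print(total)
```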

Cameron Simpson

Jan 15, 2014, 4:42:42 PM
to pytho...@python.org
I would expect anydbm to be duck typing: just implementing the
mapping interface and directing the various methods directly to the
DBM libraries.

The comment in question was specifically about subclassing dict.

There is a rule of thumb amongst the core devs and elder Python
programmers that it is a bad idea to subclass the basic types; I have
seen it stated many times, but not explained in depth.

Naively, I would have thought subclassing dict to constrain the
key types for some special purpose seems like a fine idea. You'd need
to override .update() as well and also the initialiser. Maybe it
is harder than it seems.

The other pitfall I can see is code that does an isinstance(..,
dict) check for some reason; having discovered that it has a dict
it may behave specially. Pickle? Who knows? Personally, if it is
using isinstance instead of a direct type() check then I think it
should expect to cope with subclasses.

I've subclassed str() a number of times, most extensively as a URL
object that is a str with a bunch of utility methods, and it seems
to work well.

I've subclassed dict a few times, most extensively as the in-memory
representation of a record in a multibackend data store which I use
for a bunch of things. That is also working quite well.

The benefit of subclassing dict is getting a heap of methods like
iterkeys et al for free. A from-scratch mapping has a surprising
number of methods involved.

Cheers,
--
Cameron Simpson <c...@zip.com.au>

BTW, don't bother flaming me. I can't read.
- afd...@lims03.lerc.nasa.gov (Stephen Dennison)

Daniel da Silva

Jan 15, 2014, 7:50:53 PM
to Steven D'Aprano, Python
On Tue, Jan 14, 2014 at 8:27 PM, Steven D'Aprano <steve+comp....@pearwood.info> wrote:
> But reading Guido, I think he's saying that wouldn't be a good idea. I
> don't get it -- it's not a violation of the Liskov Substitution
> Principle, because it's more restrictive, not less. What am I missing?

Just to be pedantic, this is a violation of the Liskov Substitution Principle. According to Wikipedia, the principle states:

 if S is a subtype of T, then objects of type T may be replaced with objects of type S (i.e., objects of type S may be substituted for objects of type T) without altering any of the desirable properties of that program (correctness, task performed, etc.) [0]

 Since S (TextOnlyDict) is more restrictive, it cannot be substituted for T (dict), because the program may be using non-string keys.
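
A small sketch of what that looks like in practice. The `count_lengths` function is a made-up caller, written against plain dict, that happens to use int keys; `TextOnlyDict` is the class from the original post with a message added to the exception.

```python
class TextOnlyDict(dict):
    def __setitem__(self, key, value):
        if not isinstance(key, str):
            raise TypeError("keys must be str")
        super().__setitem__(key, value)


def count_lengths(words, counts):
    """Hypothetical caller: tally word lengths into a dict keyed by int."""
    for word in words:
        n = len(word)
        counts[n] = counts.get(n, 0) + 1
    return counts


plain = count_lengths(["a", "bb", "cc"], {})
print(plain)  # works with a plain dict

try:
    count_lengths(["a", "bb", "cc"], TextOnlyDict())
except TypeError as exc:
    print("TextOnlyDict breaks the caller:", exc)
```

Whether that counts as a violation in a given program depends on whether any caller actually relies on non-string keys.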


Daniel

Gregory Ewing

Jan 15, 2014, 11:17:28 PM
Daniel da Silva wrote:

> Just to be pedantic, this /is/ a violation of the Liskov Substitution
> Principle. According to Wikipedia, the principle states:
>
> if S is a subtype <http://en.wikipedia.org/wiki/Subtype> of T, then
> objects of type <http://en.wikipedia.org/wiki/Datatype> T may be
> replaced with objects of type S (i.e., objects of type S may
> be /substituted/ for objects of type T) without altering any of the
> desirable properties of that program

Something everyone seems to miss when they quote the LSP
is that what the "desirable properties of the program" are
*depends on the program*.

Whenever you create a subclass, there is always *some*
difference between the behaviour of the subclass and
the base class, otherwise there would be no point in
having the subclass. Whether that difference has any
bad consequences for the program depends on what the
program does with the objects.

So you can't just look at S and T in isolation and
decide whether they satisfy the LSP or not. You need
to consider them in context.

In Python, there's a special problem with subclassing
dicts in particular: some of the core interpreter code
assumes a plain dict and bypasses the lookup of
__getitem__ and __setitem__, going straight to the
C-level implementations. If you tried to use a dict
subclass in that context that overrode those methods,
your overridden versions wouldn't get called.

But if you never use your dict subclass in that way,
there is no problem. Or if you don't override those
particular methods, there's no problem either.
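
A related effect that is easy to reproduce, sketched here on CPython: dict's own C-implemented methods also skip an overridden `__getitem__`. This is not necessarily the exact interpreter code path meant above, and the `Shouty` class is invented for the example.

```python
class Shouty(dict):
    """Hypothetical subclass that overrides only __getitem__."""

    def __getitem__(self, key):
        return str(super().__getitem__(key)).upper()


d = Shouty(greeting="hello")

via_subscript = d["greeting"]   # goes through the override
via_get = d.get("greeting")     # dict.get does not call __getitem__
via_copy = dict(d)              # copying reads the stored values directly

print(via_subscript, via_get, via_copy)
```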

If you're giving advice to someone who isn't aware
of all the fine details, "don't subclass dict" is
probably the safest thing to say. But there are
legitimate use cases for it if you know what you're
doing.

The other issue is that people are often tempted to
subclass dict in order to implement what isn't really
a dict at all, but just a custom mapping type. The
downside to that is that you end up inheriting a
bunch of dict-specific methods that don't really
make sense for your type. In that case it's usually
better to start with a fresh class that *uses* a
dict as part of its implementation, and only
exposes the methods that are really needed.
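
As a rough sketch of that last approach (the `Registry` name, its `register` method and the choice of exposed operations are assumptions made up for the example):

```python
class Registry:
    """Hypothetical str-keyed store that *uses* a dict rather than being one."""

    def __init__(self):
        self._items = {}

    def register(self, name, obj):
        # The only write path, so the key check cannot be bypassed.
        if not isinstance(name, str):
            raise TypeError("name must be str")
        self._items[name] = obj

    def __getitem__(self, name):
        return self._items[name]

    def __contains__(self, name):
        return name in self._items

    def __len__(self):
        return len(self._items)


reg = Registry()
reg.register("spam", 42)
print(reg["spam"], "spam" in reg, len(reg))
```

Nothing like `update()` or `setdefault()` exists on the class unless you decide to add it.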

--
Greg

Devin Jeanpierre

Jan 16, 2014, 1:30:53 AM
to John Ladasky, comp.lang.python
As long as you're the one that created the nested list structure, you
can choose to create a different structure instead, one which doesn't
require typechecking values inside your structure.

For example, os.walk has a similar kind of problem; it uses separate
lists for the subdirectories and the rest of the files, rather than
requiring you to check each child to see if it is a directory. It can
do it this way because it doesn't need to preserve the interleaved
order of directories and files, but there are other solutions for you if
you do want to preserve that order. (Although they won't be as clean
as they would be in a language with ADTs)
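
A minimal sketch of the os.walk-style idea, assuming you control the data structure. The `Node` class and its `leaves`/`children` attributes are hypothetical names; like os.walk, this layout gives up the interleaved order of atoms and sub-structures.

```python
class Node:
    """Hypothetical tree node: atoms and sub-nodes live in separate lists."""

    def __init__(self, leaves=(), children=()):
        self.leaves = list(leaves)      # atoms, never inspected
        self.children = list(children)  # always Node instances


def flatten(node):
    # No type checks needed: the structure itself says what is what.
    yield from node.leaves
    for child in node.children:
        yield from flatten(child)


tree = Node([1, 2], [Node([3, 4]), Node([5], [Node([6, 7])])])
flat = list(flatten(tree))
print(flat)
```

Strings, tuples or anything else can be stored as leaves without being mistaken for containers.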

-- Devin