[Python-Dev] Dataclasses and correct hashability

Elvis Pranskevichus

unread,

Feb 1, 2018, 7:36:08 PM2/1/18

to pytho...@python.org

There appears to be a critical omission from the current dataclass
implementation: it does not make hash=True fields immutable.

Per Python spec:

"the implementation of hashable collections requires that a key’s hash
value is immutable (if the object’s hash value changes, it will be in
the wrong hash bucket)"

Yet:

import dataclasses

@dataclasses.dataclass(hash=True)
class A:
foo: int = dataclasses.field(hash=True, compare=True)

a = A(foo=1)

s = set()
s.add(a) # s == {a}
a.foo = 2

print(a in s)
print({a} == s}
print(s == s)

# prints False False True

This looks to me like a clearly wrong behavior.

Elvis

_______________________________________________
Python-Dev mailing list
Pytho...@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: https://mail.python.org/mailman/options/python-dev/dev-python%2Bgarchive-30976%40googlegroups.com

Eric V. Smith

unread,

Feb 1, 2018, 8:18:55 PM2/1/18

to Elvis Pranskevichus, pytho...@python.org

On 2/1/2018 7:34 PM, Elvis Pranskevichus wrote:
> There appears to be a critical omission from the current dataclass
> implementation: it does not make hash=True fields immutable.
>
> Per Python spec:
>
> "the implementation of hashable collections requires that a key’s hash
> value is immutable (if the object’s hash value changes, it will be in
> the wrong hash bucket)"
>
> Yet:
>
> import dataclasses
>
> @dataclasses.dataclass(hash=True)
> class A:
> foo: int = dataclasses.field(hash=True, compare=True)
>
> a = A(foo=1)
>
> s = set()
> s.add(a) # s == {a}
> a.foo = 2
>
> print(a in s)
> print({a} == s}
> print(s == s)
>
> # prints False False True
>
>
> This looks to me like a clearly wrong behavior.
>
>
> Elvis

Data classes do not protect you from doing the wrong thing. This is the
same as writing:

class A:
def __init__(self, foo):
self.foo = foo
def __hash__(self):
return hash((self.foo,))

You're allowed to override the parameters to dataclasses.dataclass for
cases where you know what you're doing. Consenting adults, and all.

Eric.

Eric V. Smith

unread,

Feb 1, 2018, 8:22:54 PM2/1/18

to Elvis Pranskevichus, pytho...@python.org

I should add: This is why you shouldn't override the default (hash=None)
unless you know what the consequences are. Can I ask why you want to
specify hash=True?

Eric

Elvis Pranskevichus

unread,

Feb 1, 2018, 8:31:29 PM2/1/18

to Eric V. Smith, pytho...@python.org

On Thursday, February 1, 2018 8:21:03 PM EST Eric V. Smith wrote:
> I should add: This is why you shouldn't override the default
> (hash=None) unless you know what the consequences are. Can I ask
> why you want to specify hash=True?

hash=None and compare=True leads to the same result, which, I think is
even worse.

> You're allowed to override the parameters to dataclasses.dataclass
> for cases where you know what you're doing. Consenting adults,
> and all.

I don't agree with this. You are comparing implicit dataclass
behavior with an explicit shoot-in-the-foot __hash__() definition.

Elvis

Eric V. Smith

unread,

Feb 1, 2018, 8:39:35 PM2/1/18

to Elvis Pranskevichus, pytho...@python.org

On 2/1/2018 8:29 PM, Elvis Pranskevichus wrote:
> On Thursday, February 1, 2018 8:21:03 PM EST Eric V. Smith wrote:
>> I should add: This is why you shouldn't override the default
>> (hash=None) unless you know what the consequences are. Can I ask
>> why you want to specify hash=True?
>
> hash=None and compare=True leads to the same result, which, I think is
> even worse.

Have you actually tried that?

>>> @dataclass(hash=None)
... class A:
... foo: int = field(hash=True, compare=True)
...
>>> hash(A(2))
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: unhashable type: 'A'

I believe the default hash=None on the class decorator does right thing.
Please provide a counter-example.

>> You're allowed to override the parameters to dataclasses.dataclass
>> for cases where you know what you're doing. Consenting adults,
>> and all.
>
> I don't agree with this. You are comparing implicit dataclass
> behavior with an explicit shoot-in-the-foot __hash__() definition.

I don't recommend ever specifying the decorator hash= parameter unless
you have an unusual use case, in which case it's on you to get it right.
There was recently a long python-dev discussion on this issue. I need to
update the PEP to reflect it, but the advice still stands: you almost
always want to use the default hash=None.

Do you have a use case for specifying a hash function on a class with
mutable instances? Maybe you want frozen=True?

Eric

Elvis Pranskevichus

unread,

Feb 1, 2018, 8:51:32 PM2/1/18

to pytho...@python.org, Eric V. Smith

On Thursday, February 1, 2018 8:37:41 PM EST Eric V. Smith wrote:
> > hash=None and compare=True leads to the same result, which, I
> > think is even worse.
>
> Have you actually tried that?

I meant this:

@dataclasses.dataclass(hash=True)
class A:
foo: int = dataclasses.field(compare=True)

> I don't recommend ever specifying the decorator hash= parameter
> unless you have an unusual use case, in which case it's on you to
> get it right.

In my experience this type of breakage is so subtle that people will
happily write code lots of code like this without noticing. My main
objection here is that the dataclass does not go far enough to prevent
obviously wrong behaviour. Or it goes too far with the whole hash/
frozen distinction.

Elvis

Nick Coghlan

unread,

Feb 2, 2018, 12:34:47 AM2/2/18

to Elvis Pranskevichus, Eric V. Smith, python-dev

On 2 February 2018 at 11:49, Elvis Pranskevichus <elp...@gmail.com> wrote:
> In my experience this type of breakage is so subtle that people will
> happily write code lots of code like this without noticing. My main
> objection here is that the dataclass does not go far enough to prevent
> obviously wrong behaviour. Or it goes too far with the whole hash/
> frozen distinction.

For 3.7, I think we should seriously considered just straight up
disallowing the "hash=True, frozen=False" combination, and instead
require folks to provide their own hash function in that case.
"Accidentally hashable" (whether by identity or field hash) isn't a
thing that data classes should be allowing to happen.

If we did that, then the public "hash" parameter could potentially be
dropped entirely for the time being - the replacement for "hash=True"
would be a "def __hash__: ..." in the body of the class definition,
and the replacement for "hash=False" would be "__hash__ = None" in the
class body.

Cheers,
Nick.

--
Nick Coghlan | ncog...@gmail.com | Brisbane, Australia

Elvis Pranskevichus

unread,

Feb 2, 2018, 8:57:08 AM2/2/18

to pytho...@python.org, Nick Coghlan, Eric V. Smith

On Friday, February 2, 2018 12:33:04 AM EST Nick Coghlan wrote:
> For 3.7, I think we should seriously considered just straight up
> disallowing the "hash=True, frozen=False" combination, and instead
> require folks to provide their own hash function in that case.
> "Accidentally hashable" (whether by identity or field hash) isn't a
> thing that data classes should be allowing to happen.
>
> If we did that, then the public "hash" parameter could potentially
> be dropped entirely for the time being - the replacement for
> "hash=True" would be a "def __hash__: ..." in the body of the class
> definition, and the replacement for "hash=False" would be "__hash__
> = None" in the class body.

I think "frozen=True" should just imply hashability (by fields). You
can always do "__hash__ = None" to opt out. I don't see the default
hashability of an immutable object as a problem. tuples and
frozensets are hashable after all.

Elvis

Eric V. Smith

unread,

Feb 2, 2018, 10:11:22 AM2/2/18

to Nick Coghlan, python-dev

On 2/2/2018 12:33 AM, Nick Coghlan wrote:
> For 3.7, I think we should seriously considered just straight up
> disallowing the "hash=True, frozen=False" combination, and instead
> require folks to provide their own hash function in that case.
> "Accidentally hashable" (whether by identity or field hash) isn't a
> thing that data classes should be allowing to happen.
>
> If we did that, then the public "hash" parameter could potentially be
> dropped entirely for the time being - the replacement for "hash=True"
> would be a "def __hash__: ..." in the body of the class definition,
> and the replacement for "hash=False" would be "__hash__ = None" in the
> class body.

attrs has the same behavior (if you ignore how dataclasses handles the
cases where __hash__ or __eq__ already exist in the class definition).
Here's what attrs says about adding __hash__ via hash=True:

"Although not recommended, you can decide for yourself and force attrs
to create one (e.g. if the class is immutable even though you didn’t
freeze it programmatically) by passing True or not. Both of these cases
are rather special and should be used carefully."

The problem with dropping hash=True is: how would you write __hash__
yourself? It seems like a bug magnet if you're adding fields to the
class and forget to update __hash__, especially in the presence of
per-field hash=False and eq=False settings. And you'd need to make sure
it matches the generated __eq__ (if 2 objects are equal, they need to
have the same hash value).

If we're going to start disallowing things, how about the per-field
hash=True, eq=False case?

However, I don't feel very strongly about this. As I've said, I expect
the use cases for hash=True to be very, very rare. And now that we allow
overriding __hash__ in the class body without setting hash=False, there
aren't a lot of uses for hash=False, either. But we would need to think
through how you'd get the behavior of hash=False with multiple
inheritance, if that's what you wanted. Again, a very, very rare case.

In all, I think we're better off documenting best practices and making
them the default, like attrs does, and leave it to the programmer to
follow them. I realize we're handing out footguns, the alternatives seem
even more complex and are limiting.

Eric

Elvis Pranskevichus

unread,

Feb 2, 2018, 10:40:11 AM2/2/18

to pytho...@python.org, Nick Coghlan, Eric V. Smith

On Friday, February 2, 2018 10:08:43 AM EST Eric V. Smith wrote:
> However, I don't feel very strongly about this. As I've said, I expect
> the use cases for hash=True to be very, very rare.

Why do you think that the requirement to make a dataclass hashable is a
"very, very rare" requirement? The moment you want to use a dataclass a
a dict key, or put it in a set, you need it to be hashable.

Just put yourself in the shoes of an average Python developer. You try
to put a dataclass in a set, you get a TypeError. Your immediate
reaction is to add "hash=True". Things appear to work. Then, you, or
someone else, decides to mutate the dataclass object and then you are
looking at a very frustrating debug session.

> In all, I think we're better off documenting best practices and making
> them the default, like attrs does, and leave it to the programmer to
> follow them. I realize we're handing out footguns

I don't think attrs doing the same thing is a valid justification. This
is a major footgun that is very easy to trigger, and there's really no
precedent in standard data types.

> the alternatives seem even more complex and are limiting.

The alternative is simple and follows the design of other standard
containers: immutable containers are hashable, mutable containers are
not. @dataclass(frozen=False) gives you a SimpleNamespace-like and
@dataclass(frozen=True) gives you a namedtuple-like. If you _really_
know what you are doing, then you can always declare an explicit
__hash__.

> The problem with dropping hash=True is: how would you write __hash__
> yourself?

Is "def __hash__(self): return hash((self.field1, self.field2))" that
hard? It is explicit, and you made a concious choice, i.e you
understand how __hash__ works. IMO, the danger of
"@dataclass(hash=True)" far overweighs whatever convenience it might
provide.

Elvis

Eric V. Smith

unread,

Feb 2, 2018, 10:53:05 AM2/2/18

to Elvis Pranskevichus, pytho...@python.org, Nick Coghlan

On 2/2/2018 10:38 AM, Elvis Pranskevichus wrote:
> On Friday, February 2, 2018 10:08:43 AM EST Eric V. Smith wrote:
>> However, I don't feel very strongly about this. As I've said, I expect
>> the use cases for hash=True to be very, very rare.
>
> Why do you think that the requirement to make a dataclass hashable is a
> "very, very rare" requirement? The moment you want to use a dataclass a
> a dict key, or put it in a set, you need it to be hashable.

I was specifically talking about the case of a non-frozen, hashable
class. If you want to make a class frozen and hashable, then:

@dataclass(frozen=True)

will do just that.

The case you brought up initially is the non-frozen, hashable class.
It's that case that I think is very rare. I'll ask again: what's your
use case for wanting a non-frozen, hashable class? I'm genuinely interested.

You seem to think that hash=True means "make the class hashable". That's
not true. It means "add a __hash__" to this class". There are other,
better ways to make the class hashable, specifically frozen=True. You
might want to read all of https://bugs.python.org/issue32513 for the
background on the current behavior.

> Just put yourself in the shoes of an average Python developer. You try
> to put a dataclass in a set, you get a TypeError. Your immediate
> reaction is to add "hash=True". Things appear to work. Then, you, or
> someone else, decides to mutate the dataclass object and then you are
> looking at a very frustrating debug session.

I will be documented that the correct way to do this is frozen=True.

>> In all, I think we're better off documenting best practices and making
>> them the default, like attrs does, and leave it to the programmer to
>> follow them. I realize we're handing out footguns
>
> I don't think attrs doing the same thing is a valid justification. This
> is a major footgun that is very easy to trigger, and there's really no
> precedent in standard data types.
>
>> the alternatives seem even more complex and are limiting.
>
> The alternative is simple and follows the design of other standard
> containers: immutable containers are hashable, mutable containers are
> not. @dataclass(frozen=False) gives you a SimpleNamespace-like and
> @dataclass(frozen=True) gives you a namedtuple-like. If you _really_
> know what you are doing, then you can always declare an explicit
> __hash__.

I'm not sure what you're arguing for here. This is how dataclasses work.

>> The problem with dropping hash=True is: how would you write __hash__
>> yourself?
>
> Is "def __hash__(self): return hash((self.field1, self.field2))" that
> hard? It is explicit, and you made a concious choice, i.e you
> understand how __hash__ works.

I didn't say it was hard, I said it needed to be kept up to date as you
add fields. That is, you'd have to duplicate the field list. dataclasses
is trying to prevent you from repeating the field list anywhere.

> IMO, the danger of
> "@dataclass(hash=True)" far overweighs whatever convenience it might
> provide.

We'll just have to disagree about this. As I said, I don't feel very
strongly about it, but I lean toward leaving it in and documenting it
for what it is and does.

Eric

Paul Moore

unread,

Feb 2, 2018, 10:53:32 AM2/2/18

to Elvis Pranskevichus, Nick Coghlan, Eric V. Smith, Python Dev

On 2 February 2018 at 15:38, Elvis Pranskevichus <elp...@gmail.com> wrote:
> On Friday, February 2, 2018 10:08:43 AM EST Eric V. Smith wrote:
>> However, I don't feel very strongly about this. As I've said, I expect
>> the use cases for hash=True to be very, very rare.
>
> Why do you think that the requirement to make a dataclass hashable is a
> "very, very rare" requirement? The moment you want to use a dataclass a
> a dict key, or put it in a set, you need it to be hashable.
>
> Just put yourself in the shoes of an average Python developer. You try
> to put a dataclass in a set, you get a TypeError. Your immediate
> reaction is to add "hash=True". Things appear to work. Then, you, or
> someone else, decides to mutate the dataclass object and then you are
> looking at a very frustrating debug session.

If I saw someone try to put a dataclass into a set, I'd point out that
dataclasses are *mutable*, and if they want immutable values they
should use "frozen=True". If it were me in that situation, that's what
I'd do as well. Adding hashability to a mutable object would *never*
be my immediate reaction.

To put it another way, using your words above, "The moment you want to

use a dataclass a a dict key, or put it in a set, you need it to be

*immutable*" (not hashable, unless you really know what you're doing).

Paul

Elvis Pranskevichus

unread,

Feb 2, 2018, 10:58:36 AM2/2/18

to pytho...@python.org, Nick Coghlan, Eric V. Smith

On Friday, February 2, 2018 10:51:11 AM EST Eric V. Smith wrote:
> I was specifically talking about the case of a non-frozen, hashable
> class. If you want to make a class frozen and hashable, then:
>
> @dataclass(frozen=True)
>
> will do just that.
>
> The case you brought up initially is the non-frozen, hashable class.
> It's that case that I think is very rare. I'll ask again: what's your
> use case for wanting a non-frozen, hashable class? I'm genuinely
> interested.

My point is exactly that there is _no_ valid use case, so (hash=True,
frozen=False) should not be a thing! Why are you so insistent on adding
a dangerous option which you admit is nearly useless?

Elvis

Yury Selivanov

unread,

Feb 2, 2018, 11:16:37 AM2/2/18

to Paul Moore, Nick Coghlan, Eric V. Smith, Python Dev

On Fri, Feb 2, 2018 at 10:51 AM, Paul Moore <p.f....@gmail.com> wrote:
[..]

> To put it another way, using your words above, "The moment you want to
> use a dataclass a a dict key, or put it in a set, you need it to be
> *immutable*" (not hashable, unless you really know what you're doing).

Can someone clarify what is the actual use case of someone *knowingly*
making a mutable collection hashable? Why can't that advanced user
write their own __hash__ implementation? It's easy to do so.

For what it's worth I think this argument is being blindly used to
justify the current questionable design of "dataclass(hash=True)"
being the same as "dataclass(hash=True, frozen=False) case. At least
a few other core developers are concerned with this, but all I see is
"attrs does the same thing".

Eric, in my opinion we shouldn't copy attrs. It was designed as an
external package with its own backwards-compatibility story. At some
point it was realized that "attrs(hash=True, frozen=False)" is an
anti-pattern, but it couldn't be removed at that point. Hence the
warning in the documentation. We can do better.

We are designing a new API that is going to be hugely popular. Why
can't we ship it with dangerous options prohibited in 3.7 (it's easy
to do that!) and then enable them in 3.8 when there's an actual clear
use case?

Yury

Elvis Pranskevichus

unread,

Feb 2, 2018, 11:49:59 AM2/2/18

to pytho...@python.org, Nick Coghlan, Eric V. Smith

> Because it's not the default, it will be documented as being an
> advanced use case, and it's useful in rare instances.
>
> And as I've said a number of times, both here and in other
> discussions, I'm not arguing strenuously for this. I just think that,
> given that it's not the default and it's not recommended and is
> useful in advanced cases, I would prefer to leave it in. I understand
> that you disagree with me.

Is there a real world example of such an "advanced case"?

Eric, have you read https://github.com/python-attrs/attrs/issues/136 ?
Specifically this comment from Hynek [1]:

"I never really thought about it, but yeah mutable objects shouldn’t
have a __hash__ at all."

It is clear from that thread that "hash=True" was an early design
mistake, which was left in for compatibility reasons. Why are we
copying bad design to the standard library?

Elvis

[1] https://github.com/python-attrs/attrs/issues/
136#issuecomment-277425421

Eric V. Smith

unread,

Feb 2, 2018, 12:05:03 PM2/2/18

to Elvis Pranskevichus, pytho...@python.org, Nick Coghlan

On 2/2/2018 10:56 AM, Elvis Pranskevichus wrote:
> On Friday, February 2, 2018 10:51:11 AM EST Eric V. Smith wrote:
>> I was specifically talking about the case of a non-frozen, hashable
>> class. If you want to make a class frozen and hashable, then:
>>
>> @dataclass(frozen=True)
>>
>> will do just that.
>>
>> The case you brought up initially is the non-frozen, hashable class.
>> It's that case that I think is very rare. I'll ask again: what's your
>> use case for wanting a non-frozen, hashable class? I'm genuinely
>> interested.
>
> My point is exactly that there is _no_ valid use case, so (hash=True,
> frozen=False) should not be a thing! Why are you so insistent on adding
> a dangerous option which you admit is nearly useless?

Because it's not the default, it will be documented as being an advanced
use case, and it's useful in rare instances.

And as I've said a number of times, both here and in other discussions,
I'm not arguing strenuously for this. I just think that, given that it's
not the default and it's not recommended and is useful in advanced
cases, I would prefer to leave it in. I understand that you disagree
with me.

Eric

Chris Barker

unread,

Feb 2, 2018, 3:04:57 PM2/2/18

to Elvis Pranskevichus, Nick Coghlan, Eric V. Smith, Python Dev

On Fri, Feb 2, 2018 at 7:38 AM, Elvis Pranskevichus <elp...@gmail.com> wrote:

On Friday, February 2, 2018 10:08:43 AM EST Eric V. Smith wrote:
> However, I don't feel very strongly about this. As I've said, I expect
> the use cases for hash=True to be very, very rare.

Why do you think that the requirement to make a dataclass hashable is a
"very, very rare" requirement?

I think what's rare is wanting hashability without it being frozen.

Just put yourself in the shoes of an average Python developer. You try
to put a dataclass in a set, you get a TypeError. Your immediate
reaction is to add "hash=True". Things appear to work.

agreed, the easy and obvious way should be to make it frozen -- if it's hard to make it hashable and not frozen, then that's good.

But it is nice to have the __hash__ generated more you.... so maybe a flag for "unfrozen_hashable" -- really klunky, but if that is a rare need, then there you go.

Or maybe:

If either hash or frozen is specified, it become both frozen and hashable.

If both are specified, then it does what the user is asking for.

-CHB

--

Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA 98115   (206) 526-6317   main reception

Chris....@noaa.gov

Ethan Furman

unread,

Feb 2, 2018, 5:09:26 PM2/2/18

to pytho...@python.org

On 02/02/2018 08:14 AM, Yury Selivanov wrote:

> Eric, in my opinion we shouldn't copy attrs. [...]

> We are designing a new API that is going to be hugely popular. Why
> can't we ship it with dangerous options prohibited in 3.7 (it's easy
> to do that!) and then enable them in 3.8 when there's an actual clear
> use case?

+1

--
~Ethan~

Ethan Furman

unread,

Feb 2, 2018, 5:12:14 PM2/2/18

to pytho...@python.org

On 02/02/2018 08:09 AM, Eric V. Smith wrote:
> On 2/2/2018 10:56 AM, Elvis Pranskevichus wrote:

>> My point is exactly that there is _no_ valid use case, so (hash=True,
>> frozen=False) should not be a thing! Why are you so insistent on adding
>> a dangerous option which you admit is nearly useless?
>
> Because it's not the default, it will be documented as being an advanced use case, and it's useful in rare instances.

Personally, I don't think advanced use-cases need to be supported by flags as they can be supported by just writing the
__dunder__ methods.

--
~Ethan~

David Mertz

unread,

Feb 2, 2018, 7:06:11 PM2/2/18

to Ethan Furman, Python-Dev

I agree with Ethan, Elvis, and a few others. I think 'hash=True, frozen=False' should be disabled in 3.7. It's an attractive nuisance. Maybe not so attractive because its obscurity, but still with no clear reason to exist.

If many users of of dataclass find themselves defining '__hash__' with mutable dataclass, it's perfectly possible to allow the switch combination later. But taking it out after previously allowing it—even if every use in the wild is actually a bug in waiting—is harder.

Unsubscribe: https://mail.python.org/mailman/options/python-dev/mertz%40gnosis.cx

Nick Coghlan

unread,

Feb 3, 2018, 1:27:02 AM2/3/18

to Eric Smith, python-dev

On 3 Feb. 2018 1:09 am, "Eric V. Smith" <er...@trueblade.com> wrote:

The problem with dropping hash=True is: how would you write __hash__ yourself? It seems like a bug magnet if you're adding fields to the class and forget to update __hash__, especially in the presence of per-field hash=False and eq=False settings. And you'd need to make sure it matches the generated __eq__ (if 2 objects are equal, they need to have the same hash value).

I think anyone that does this needs to think *very* carefully about how they do it, and offering both "hash=True" and "frozen=True" is an attractive nuisance that means people will write "hash=True" when what they wanted was "frozen=True".

In particular, having to work out how write a maintainable "__hash__" will encourage folks to separate out the hashed fields as a separate frozen subrecord or base class.

If this proves to be an intolerable burden then the short hand spelling could be added back in 3.8, but once we ship it we're going to be stuck with explaining the interactions indefinitely.

Cheers,

Nick.

Guido van Rossum

unread,

Feb 3, 2018, 1:46:31 AM2/3/18

to Nick Coghlan, Eric Smith, python-dev

It appears Eric and I are the only ones in favor of keeping the current behavior. But I still am not convinced by all the worries about "attractive nuisances" and all the other bad names this feature has been called. We don't know that any of the doomsday scenarios will happen. In my experience, usually once something is rolled out the set of issues that are *actually* raised is entirely different from the things its designers were worried about.

Please don't commit a change to roll this back without checking in with me; I have some misgivings about the problem being raised here that I still need to think through more carefully. In the meantime, please try to use dataclasses with 3.7b1!

_______________________________________________
Python-Dev mailing list
Pytho...@python.org
https://mail.python.org/mailman/listinfo/python-dev

Unsubscribe: https://mail.python.org/mailman/options/python-dev/guido%40python.org

--

--Guido van Rossum (python.org/~guido)

Gregory P. Smith

unread,

Feb 3, 2018, 3:07:02 PM2/3/18

to Nick Coghlan, Eric Smith, python-dev

+1 Nick put words to my chief concerns.

It is easy for an author see hash=True in existing code somewhere (cargo culting) and assume it does what they want, or quickly glance at the the API and see "hash=True" without actually taking the time to understand the implications of that to see that the parameter named "frozen" is the one they are supposed to want that _safely_ makes their dataclass properly hashable, not the more attractive parameter named "hash" that enables dangerous behavior.

Forcing people who need a __hash__ method to write it explicitly sounds like a good thing to me. I am not at all worried about someone forgetting to add a new field to an implementation of the __hash__ method when adding a new field, the fields and __hash__ method are all defined in the same place in the code.

I expect someone with a common need for always having a __hash__ method will produce a library on top of dataclasses that implements something like our current hash=True behavior. If that kind of thing turns out to be widely used, we can reintroduce the feature in dataclasses in 3.8 or later, informed by what we see practical uses actually doing.

In my practical experience, people writing Python code do not need to learn and understand the intricacies of what it means to have a __hash__ method, be hashable, or "frozen". We intentionally warn people against writing dunder methods other than __init__ in their code as they are often power features with less obvious semantics than it may seem at first glance making such code harder to maintain.

Even calling the parameter "hash=" and saying it adds a __hash__ method as the PEP currently does seems to launder the danger, washing away the "dunder smell" that adding a special considerations __hash__ method carries.

The PEP (and presumably forthcoming dataclasses module documentation) says "This is a specialized use case and should be considered carefully" which I agree with. But any time we suggest that in an API, how about having the API name make it clear that this is special and not to be done lightly? I guess i'm arguing against using "hash=" as the arg name in favor of "danger_there_be_vorpal_rabbits_hash_me_maybe=" or something more usefully similar if we're going to have it.

-gps

Ethan Furman

unread,

Feb 3, 2018, 5:03:06 PM2/3/18

to pytho...@python.org

On 02/02/2018 10:44 PM, Guido van Rossum wrote:

> It appears Eric and I are the only ones in favor of keeping the current behavior. But I still am not convinced by all
> the worries about "attractive nuisances" and all the other bad names this feature has been called. We don't know that
> any of the doomsday scenarios will happen. In my experience, usually once something is rolled out the set of issues that
> are *actually* raised is entirely different from the things its designers were worried about.

This may all be true, but consider how many times we have asked, "How does attrs handle that?"

It would be wise to also ask, "What pitfalls have attrs discovered, and what would they do different if they could?"

--
~Ethan~

_______________________________________________
Python-Dev mailing list
Pytho...@python.org
https://mail.python.org/mailman/listinfo/python-dev

Unsubscribe: https://mail.python.org/mailman/options/python-dev/dev-python%2Bgarchive-30976%40googlegroups.com

Steve Holden

unread,

Feb 3, 2018, 9:19:58 PM2/3/18

to Guido van Rossum, Nick Coghlan, Eric Smith, python-dev

As a Bear of Relatively Little Brain, I've grown up understanding, and teaching, that mutable things aren't to be used as dict keys. I'm aware that immutability isn't strictly the required condition, but it for most people, that's the primary reason for using frozen sets and tuples, for example, and immutability serves as a practical and comprehensible first approximation. So I'm at a loss to understand why I am being offered a feature that (especially during maintenance by a different developer) might be prone to bizarre errors caused by a change in hash.

I realise that this won't happen very often, but the difficulty of the debug task should surely merit at least some warning for us bears - you know, the ones that take your work and use it to do mundane things with.

On a slightly tangential note, us bears are very glad that such questions are taken seriously and discussed in such depth. Thank you all.

Steve Holden

Chris Barker - NOAA Federal

unread,

Feb 4, 2018, 3:07:59 PM2/4/18

to Eric V. Smith, Nick Coghlan, pytho...@python.org

>> IMO, the danger of
>> "@dataclass(hash=True)" far overweighs whatever convenience it might
>> provide.

Is there any reason specifying has=True could set frozen=True unless
the user specifically sets frozen=False?

Or is that already the case?

I think the folks that are concerned about this issue are quite right
— most Python users equate immutable and hashable—so the dataclass API
should reflect that.

And this would still make it easy and clear to specify the unusual
(and arguably dangerous) case of:

hash=True, frozen=False

-CHB

Guido van Rossum

unread,

Feb 5, 2018, 12:51:53 AM2/5/18

to Eric V. Smith, Python-Dev

Looks like this is turning into a major flamewar regardless of what I say. :-(

I really don't want to lose the ability to add a hash function to a mutable dataclass by flipping a flag in the decorator. I'll explain below. But I am fine if this flag has a name that clearly signals it's an unsafe thing to do.

I propose to replace the existing (as of 3.7.0b1) hash= keyword for the @dataclass decorator with a simpler flag named unsafe_hash=. This would be a simple bool (not a tri-state flag like the current hash=None|False|True). The default would be False, and the behavior then would be to add a hash function automatically only if it's safe (using the same rules as for hash=None currently). With unsafe_hash=True, a hash function would always be generated that takes all fields into account except those declared using field(hash=False). If there's already a `def __hash__` in the function I don't care what it does, maybe it should raise rather than quietly doing nothing or quietly overwriting it.

Here's my use case.

A frozen class requires a lot of discipline, since you have to compute the values of all fields before calling the constructor. A mutable class allows other initialization patterns, e.g. manually setting some fields after the instance has been constructed, or having a separate non-dunder init() method. There may be good reasons for using these patterns, e.g. the object may be part of a cycle (e.g. parent/child links in a tree). Or you may just use one of these patterns because you're a pretty casual coder. Or you're modeling something external.

My point is that once you have one of those patterns in place, changing your code to avoid them may be difficult. And yet your code may treat the objects as essentially immutable after the initialization phase (e.g. a parse tree). So if you create a dataclass and start coding like that for a while, and much later you need to put one of these into a set or use it as a dict key, switching to frozen=True may not be a quick option. And writing a __hash__ method by hand may feel like a lot of busywork. So this is where [unsafe_]hash=True would come in handy.

I think naming the flag unsafe_hash should take away most objections, since it will be clear that this is not a safe thing to do. People who don't understand the danger are likely to copy a worse solution from StackOverflow anyway. The docs can point to frozen=True and explain the danger.

Gregory P. Smith

unread,

Feb 5, 2018, 12:59:01 AM2/5/18

to gu...@python.org, Eric V. Smith, Python-Dev

+1 using unsafe_hash as a name addresses my concern. It's a good signal that there are caveats worth considering.

-gps

_______________________________________________
Python-Dev mailing list
Pytho...@python.org
https://mail.python.org/mailman/listinfo/python-dev

Unsubscribe: https://mail.python.org/mailman/options/python-dev/greg%40krypto.org

Chris Barker

unread,

Feb 5, 2018, 1:44:27 AM2/5/18

to Gregory P. Smith, Eric V. Smith, Python-Dev

On Sun, Feb 4, 2018 at 11:57 PM, Gregory P. Smith <gr...@krypto.org> wrote:

+1 using unsafe_hash as a name addresses my concern.

mine too -- anyone surprised by using this deserves what they get :-)

-CHB

Unsubscribe: https://mail.python.org/mailman/options/python-dev/chris.barker%40noaa.gov

Glenn Linderman

unread,

Feb 5, 2018, 2:52:50 AM2/5/18

to pytho...@python.org

On 2/4/2018 9:49 PM, Guido van Rossum wrote:

A frozen class requires a lot of discipline, since you have to compute the values of all fields before calling the constructor. A mutable class allows other initialization patterns, e.g. manually setting some fields after the instance has been constructed, or having a separate non-dunder init() method. There may be good reasons for using these patterns, e.g. the object may be part of a cycle (e.g. parent/child links in a tree). Or you may just use one of these patterns because you're a pretty casual coder. Or you're modeling something external.

My point is that once you have one of those patterns in place, changing your code to avoid them may be difficult. And yet your code may treat the objects as essentially immutable after the initialization phase (e.g. a parse tree). So if you create a dataclass and start coding like that for a while, and much later you need to put one of these into a set or use it as a dict key, switching to frozen=True may not be a quick option. And writing a __hash__ method by hand may feel like a lot of busywork. So this is where [unsafe_]hash=True would come in handy.

I think naming the flag unsafe_hash should take away most objections, since it will be clear that this is not a safe thing to do. People who don't understand the danger are likely to copy a worse solution from StackOverflow anyway. The docs can point to frozen=True and explain the danger.

This is an interesting use case. I haven't got the internals knowledge to know just how just different mutable and immutable classes and objects are under the hood. But this use case makes me wonder if, even at the cost of some performance that "normal" immutable classes and objects might obtain, if it would be possible to use the various undisciplined initialization patterns as desired, followed by as declaration "This OBJECT is now immutable" which would calculate its HASH value, and prevent future mutations of the object?

Yes, I'm aware that the decision for immutability has historically been done at the class level, not the object level, but in my ignorance of the internals, I wonder if that is necessary, for performance or more importantly, for other reasons.

And perhaps the implementation is internally almost like two classes, one mutable, and the other immutable, and the declaration would convert the object from one to the other. But if I say more, I'd just be babbling.

Nathaniel Smith

unread,

Feb 5, 2018, 3:13:17 AM2/5/18

to Glenn Linderman, Python Dev

On Sun, Feb 4, 2018 at 11:28 PM, Glenn Linderman <v+py...@g.nevcal.com> wrote:
> This is an interesting use case. I haven't got the internals knowledge to
> know just how just different mutable and immutable classes and objects are
> under the hood. But this use case makes me wonder if, even at the cost of
> some performance that "normal" immutable classes and objects might obtain,
> if it would be possible to use the various undisciplined initialization
> patterns as desired, followed by as declaration "This OBJECT is now
> immutable" which would calculate its HASH value, and prevent future
> mutations of the object?

It would be technically possible to support something like

@dataclass(freezable=True)
class Foo:
blah: int

foo = Foo()
# Initially, object is mutable, and hash(foo) raises an error
foo.blah = 1
assertRaises(hash, foo)

# This method is automatically generated for classes with freezable=True
foo.freeze()

# Now object is immutable, and hash(foo) is allowed
assertRaises(foo.__setattr__, "blah", 2)
hash(foo)

I don't know if it's worth the complexity, but I guess it would cover
at least some of the use cases Guido raised.

-n

--
Nathaniel J. Smith -- https://vorpus.org

Glenn Linderman

unread,

Feb 5, 2018, 4:17:49 AM2/5/18

to Nathaniel Smith, Python Dev

Thanks, Nathaniel, for confirming that what I was suggesting is not impossible, even if it turns out to be undesirable for some reason, or unwanted by anyone else. But I have encountered a subset of the use cases Guido mentioned, and had to make a 2nd class to gather/hold the values of the eventual immutable class, before I could make it, because pieces of the data for the class values were obtained from different sources at different times. Once all collected, then the immutability could be obtained, the rest of the processing performed. Thrashes the allocator pretty well doing it that way, but the job got done.

Terry Reedy

unread,

Feb 5, 2018, 4:27:23 AM2/5/18

to pytho...@python.org

On 2/5/2018 2:28 AM, Glenn Linderman wrote:

> This is an interesting use case. I haven't got the internals knowledge
> to know just how just different mutable and immutable classes and
> objects are under the hood.

I believe there is no internal difference. An object is immutable if
there is not way to mutate it with Python code that not poke into
internals, such as one can do with ctypes or 3rd party extensions.
Numbers and strings have no mutation methods, including no .__init__.

A tuple is a fixed sequence of objects and has no .__init__. But if any
object in a tuple is mutable, then the tuple is. But the tuple does not
know its status, and there is no 'is_mutable' function. However,
tuple.__hash__ calls the .__hash__ method of each object and if that is
missing for one, tuple.__hash raises.

>>> hash((1, 'a', []))
Traceback (most recent call last):
File "<pyshell#0>", line 1, in <module>
hash((1, 'a', []))
TypeError: unhashable type: 'list'

The built-in immutable objects are mutated from their initial blank
values in the C code of their .__new__ methods. So they are only
'immutable' once constructed. Guido pointed out that users constructing
objects in Python code might reasonably do so other than only with
.__new__, but still want to treat the object as immutable once constructed.

In Lisp, for instance, lists are actually trees. To be immutable, they
can only be singly linked and must be constructed from leaf nodes to the
root (or head). Python programmers should be able to link in both
directions and start from the root, and still treat the result as frozen
and hashable.

> But this use case makes me wonder if, even
> at the cost of some performance that "normal" immutable classes and
> objects might obtain, if it would be possible to use the various
> undisciplined initialization patterns as desired, followed by as
> declaration "This OBJECT is now immutable" which would calculate its
> HASH value, and prevent future mutations of the object?

Something like this has been proposed, at least for dicts, and rejected.

--
Terry Jan Reedy

Nick Coghlan

unread,

Feb 5, 2018, 9:36:11 AM2/5/18

to Guido van Rossum, Eric V. Smith, Python-Dev

On 5 February 2018 at 15:49, Guido van Rossum <gu...@python.org> wrote:
> My point is that once you have one of those patterns in place, changing your
> code to avoid them may be difficult. And yet your code may treat the objects
> as essentially immutable after the initialization phase (e.g. a parse tree).
> So if you create a dataclass and start coding like that for a while, and
> much later you need to put one of these into a set or use it as a dict key,
> switching to frozen=True may not be a quick option. And writing a __hash__
> method by hand may feel like a lot of busywork. So this is where
> [unsafe_]hash=True would come in handy.
>
> I think naming the flag unsafe_hash should take away most objections, since
> it will be clear that this is not a safe thing to do. People who don't
> understand the danger are likely to copy a worse solution from StackOverflow
> anyway. The docs can point to frozen=True and explain the danger.

Aye, calling the flag unsafe_hash would convert me from -1 to -0.

The remaining -0 is because I think there's a different and more
robust way to tackle your example use case:

# Mutable initialization phase
>>> from dataclasses import dataclass
>>> @dataclass
... class Example:
... a: int
... b: int
...
>>> c = Example(None, None)
>>> c
Example(a=None, b=None)
>>> c.a = 1
>>> c.b = 2
>>> c
Example(a=1, b=2)

# Frozen usage phase
>>> @dataclass(frozen=True)
... class LockedExample(Example):
... pass
...
>>> c.__class__ = LockedExample
>>> c.a = 1

Traceback (most recent call last):

File "<stdin>", line 1, in <module>
File "/home/ncoghlan/devel/cpython/Lib/dataclasses.py", line 448,
in _frozen_setattr
raise FrozenInstanceError(f'cannot assign to field {name!r}')
dataclasses.FrozenInstanceError: cannot assign to field 'a'
>>> c.b = 2

Traceback (most recent call last):

File "<stdin>", line 1, in <module>
File "/home/ncoghlan/devel/cpython/Lib/dataclasses.py", line 448,
in _frozen_setattr
raise FrozenInstanceError(f'cannot assign to field {name!r}')
dataclasses.FrozenInstanceError: cannot assign to field 'b'
>>> hash(c)
3713081631934410656

The gist of that approach is to assume that there will be *somewhere*
in the code where it's possible to declare the construction of the
instance "complete", and flip the nominal class over to the frozen
subclass to make further mutation unlikely, even though the true
underlying type is still the mutable version.

That said, if we do provide "unsafe_hash", then the documentation for
that flag becomes a place where we can explicitly suggest using a
frozen subclass instead.

Cheers,
Nick.

--
Nick Coghlan | ncog...@gmail.com | Brisbane, Australia

Kirill Balunov

unread,

Feb 5, 2018, 11:03:20 AM2/5/18

to pytho...@python.org

On Sun, Feb 4, 2018, 9:50 PM Guido van Rossum <guido at python.org> wrote:

Looks like this is turning into a major flamewar regardless of what I say.
:-(
I really don't want to lose the ability to add a hash function to a
mutable dataclass by flipping a flag in the decorator. I'll explain below.
But I am fine if this flag has a name that clearly signals it's an unsafe
thing to do.

I propose to replace the existing (as of 3.7.0b1) hash= keyword for the
@dataclass decorator with a simpler flag named unsafe_hash=. This would be
a simple bool (not a tri-state flag like the current hash=None|False|True).
The default would be False, and the behavior then would be to add a hash
function automatically only if it's safe (using the same rules as for
hash=None currently). With unsafe_hash=True, a hash function would always
be generated that takes all fields into account except those declared using
field(hash=False). If there's already a `def __hash__` in the function I
don't care what it does, maybe it should raise rather than quietly doing
nothing or quietly overwriting it.

Here's my use case.

May be it is better to provide a special purpose function `make_unsafe_hash` in
dataclass module which will patch a dataclass, instead of to clutter @dataclass
API with arguments which are rather special case.

This `unsafe_hash` argument will constantly raise questions among ordinary users
like me, and will be possibly considered as a non-obvious design - there is a
public API but it is somehow unsafe. On the other hand, with a function, when
the user asks how to make a `frozen=False` dataclass hashable, you can suggest
to use this `make_unsafe_hash` function with all its cautions in its docs or to try to
implement __hash__ by yourself.

Also taking into account the Python approach for backward compatibility it is
better to stick with function and if it will be usefull to add a `unsafe_hash`
argument in Python 3.8. It is easier to add later than to deprecate in the future.

With kind regards,

-gdg

Guido van Rossum

unread,

Feb 5, 2018, 12:48:14 PM2/5/18

to Nick Coghlan, Eric V. Smith, Python-Dev

I'm sorry, but a solution that requires __class__ assignment is way too fragile for my taste.

Guido van Rossum

unread,

Feb 5, 2018, 12:51:06 PM2/5/18

to Kirill Balunov, Python-Dev

If there's going to be an API for it, it should be in the class, not something that mutates the class afterwards.

_______________________________________________
Python-Dev mailing list
Pytho...@python.org
https://mail.python.org/mailman/listinfo/python-dev

Unsubscribe: https://mail.python.org/mailman/options/python-dev/guido%40python.org

David Mertz

unread,

Feb 5, 2018, 2:00:59 PM2/5/18

to Chris Barker, Eric V. Smith, Python-Dev

Absolutely I agree. 'unsafe_hash' as a name is clear warning to users.

Unsubscribe: https://mail.python.org/mailman/options/python-dev/mertz%40gnosis.cx

Kirill Balunov

unread,

Feb 5, 2018, 2:17:51 PM2/5/18

to Guido van Rossum, Python-Dev

2018-02-05 20:47 GMT+03:00 Guido van Rossum <gu...@python.org>:

If there's going to be an API for it, it should be in the class, not something that mutates the class afterwards.

I apologize and don't want to make unnecessary noise. But the already selected design with decorator @dataclass implies that it will mutate the freshly created class (which in its turn already limits some possibilities), or I've missed something? If you meant that everything should be defined in one place, then I basically understand your desire as the least of two evils.

With kind regards,

-gdg

Guido van Rossum

unread,

Feb 5, 2018, 2:21:16 PM2/5/18

to Kirill Balunov, Python-Dev

Yes, that's what I meant -- "afterwards" meaning after the @dataclass decorator is applied.

Paul G

unread,

Feb 5, 2018, 4:31:13 PM2/5/18

to pytho...@python.org

I don't think it matters so much whether you are stacking two decorators or a single decorator, but would an @add_unsafe_hash decorator be useful for anything *except* data classes? If not, then there's no point in having a *second* decorator that can *only* modify the first one - particularly considering @dataclass actually takes arguments.

> _______________________________________________
> Python-Dev mailing list
> Pytho...@python.org
> https://mail.python.org/mailman/listinfo/python-dev

> Unsubscribe: https://mail.python.org/mailman/options/python-dev/paul%40ganssle.io
>

signature.asc

Nick Coghlan

unread,

Feb 5, 2018, 5:58:21 PM2/5/18

to Guido van Rossum, Kirill Balunov, Python-Dev

On 6 February 2018 at 03:47, Guido van Rossum <gu...@python.org> wrote:
> If there's going to be an API for it, it should be in the class, not
> something that mutates the class afterwards.

Something I realised after posting the __class__ setting idea is that
you can actually use a comparable trick to inject an unsafe hash from
the frozen version into the mutable version:

>>> from dataclasses import dataclass
>>> @dataclass
... class Example:
... a: int
... b: int
...

>>> c = Example(1, 2)
>>> hash(c)

Traceback (most recent call last):
File "<stdin>", line 1, in <module>

TypeError: unhashable type: 'Example'

>>> @dataclass(frozen=True)
... class LockedExample(Example):
... pass
...

>>> Example.__hash__ = LockedExample.__hash__
>>> hash(c)
3713081631934410656

So "unsafe_hash=True" would just be a shorthand spelling of that which
skips creating the full frozen version of the class (and with the
explicit parameter, we can better document the risks of making
something hashable without also freezing it post-creation).

Cheers,
Nick.

--
Nick Coghlan | ncog...@gmail.com | Brisbane, Australia

_______________________________________________
Python-Dev mailing list
Pytho...@python.org
https://mail.python.org/mailman/listinfo/python-dev

Unsubscribe: https://mail.python.org/mailman/options/python-dev/dev-python%2Bgarchive-30976%40googlegroups.com

Ivan Levkivskyi

unread,

Feb 5, 2018, 6:55:55 PM2/5/18

to Nick Coghlan, Kirill Balunov, Python-Dev

Just wanted to add my 5 cents here. I am a bit surprised how people are scared by adding `__hash__` to mutable classes.

From my experience it is quite normal, I was always thinking about `hash()` as hashing a _value_,

and never as hashing _identity_, if I would need the latter, there is a different function for this, `id()`.

Moreover, I often did this in situations where dataclasses would be useful: a class with several fields,

necessary dunders, and few other methods (record-like classes).

My motivation was mostly speed-up by memorization.

To be fair my use cases were mostly about some tree-like strictures, but this is probably a coincidence.

FWIW it is not a super-safe practice, but neither super-dangerous.

--

Ivan

Unsubscribe: https://mail.python.org/mailman/options/python-dev/levkivskyi%40gmail.com

Lukasz Langa

unread,

Feb 6, 2018, 10:35:21 AM2/6/18

to Ivan Levkivskyi, Nick Coghlan, Kirill Balunov, Python-Dev

To add a counter-example that I'm painfully familiar with: the old Thrift for Python makes its (mutable) structures hashable. This is "useful" because you can memoize functions that take Thrift structures as arguments. You can key dictionaries with them. And so on.

Unfortunately, more often then not this same structure is passed around and inevitably gets mutated in place. Everything breaks. What should be memoized isn't (and in pathological cases, what shouldn't be memoized is), dictionary access *using the same object* raises unexpected key errors but iterating on the dictionary reveals the key.

It's clearly clowntown and in hindsight we shouldn't have enabled this behavior. But we can't go back since too much code relies on hashability now and it's "mostly" fine.

As a solution, the new asyncio-only Thrift implementation for Python uses C-level structures to make sure they are truly immutable in Python. We hash them and cache them like there's no tomorrow.

- Ł

Unsubscribe: https://mail.python.org/mailman/options/python-dev/lukasz%40langa.pl

signature.asc

Steven D'Aprano

unread,

Feb 6, 2018, 12:28:14 PM2/6/18

to pytho...@python.org

On Mon, Feb 05, 2018 at 10:50:21AM -0800, David Mertz wrote:

> Absolutely I agree. 'unsafe_hash' as a name is clear warning to users.

(I don't mean to pick on David specifically, I had to reply to some
message in this thread and I just picked his.)

I'm rather gobsmacked at the attitudes of many people here about hashing
data classes. I thought *I* was the cynical pessimist who didn't have a
high opinion of the quality of the average programmer, but according to
this thread apparently I'm positively Pollyanna-esque for believing that
most people will realise that if an API offers separate switches for
hashable and frozen, you need to set *both* if you want both.

Greg Smith even says that writing dunders apart from __init__ is a code
smell, and warns people not to write dunders. Seriously? I get that
__hash__ is hard to write correctly, which is why we have a hash=True to
do the hard work for us, but I can't help feeling that at the point
we're saying "don't write dunders, any dunder, you'll only do it wrong"
we have crossed over to the wrong side of the pessimist/optimist line.

But here we are: talking about naming a perfectly reasonable argument
"unsafe_hash". Why are we trying to frighten people?

There is nothing unsafe about a DataClass with hash=True, frozen=True,
but this scheme means that even people who know what they're doing will
write unsafe_hash=True, frozen=True, as if hashability was some sort of
hand grenade waiting to go off.

Perhaps we ought to deprecate __hash__ and start calling it
__danger_danger_hash__ too? No, I don't think so.

In the past, we've (rightly!) rejected proposals to call things like
eval "unsafe_eval", and that really is dangerously unsafe when used
naively with untrusted, unsanitised data. Hashing mutable objects by
accident might be annoyingly difficult and frustrating to debug, but
code injection attacks can lead to identity theft and worse, serious
consequences for real people.

I'm 100% in favour of programmer education, but I think this label is
*miseducation*. We're suggesting that hashability is unsafe, regardless
of whether the object is frozen or not.

I'd far prefer to get a runtime warning:

"Are you sure you want hash=True without frozen=True?"

(or words to that extent) rather than burden all uses of the hash
parameter, good and bad, with the unsafe label.

--
Steve

_______________________________________________
Python-Dev mailing list
Pytho...@python.org
https://mail.python.org/mailman/listinfo/python-dev

Unsubscribe: https://mail.python.org/mailman/options/python-dev/dev-python%2Bgarchive-30976%40googlegroups.com

Guido van Rossum

unread,

Feb 6, 2018, 12:40:46 PM2/6/18

to Steven D'Aprano, Python-Dev

Where do you get the impression that one would have to explicitly request __hash__ if frozen=True is set? To the contrary, my proposal is for @dataclass to automatically add a __hash__ method when frozen=True is set. This is what the code currently released as 3.7.0b1 does if hash=None (the default).

Unsubscribe: https://mail.python.org/mailman/options/python-dev/guido%40python.org

Ethan Furman

unread,

Feb 6, 2018, 1:12:36 PM2/6/18

to pytho...@python.org

On 02/06/2018 09:38 AM, Guido van Rossum wrote:

> Where do you get the impression that one would have to explicitly request __hash__ if frozen=True is set? To the
> contrary, my proposal is for @dataclass to automatically add a __hash__ method when frozen=True is set. This is what the
> code currently released as 3.7.0b1 does if hash=None (the default).

Which is my issue with the naming -- although, really, it's more with the parameter/argument: in a hand-written class,

__hash__ = None

means the object in is not hashable, but with the decorator:

@dataclass(..., hash=None, ...)

it means something else.

My preference for "fixing" the issue:

1) make the default be a custom object (not None), so that `hash=None`
means disable hashing

2) change the param name -- maybe to `add_hash` (I agree with D'Aprano
that `unsafe_hash` can be misleading)

--
~Ethan~

_______________________________________________
Python-Dev mailing list
Pytho...@python.org
https://mail.python.org/mailman/listinfo/python-dev

Unsubscribe: https://mail.python.org/mailman/options/python-dev/dev-python%2Bgarchive-30976%40googlegroups.com

Guido van Rossum

unread,

Feb 6, 2018, 2:27:35 PM2/6/18

to Ethan Furman, Python-Dev

We may be in violent agreement.

I propose *not* to add a way to *disable* hashing when the rest of the flags to @dataclass() would indicate that it's safe to add a __hash__ method.

I propose that with @dataclass(unsafe_hash=False) (the default), a __hash__ method is added when the following conditions are true:

- frozen=True (not the default)

- compare=True (the default)

- no __hash__ method is defined in the class

If we instead use @dataclass(unsafe_hash=True), a __hash__ will be added regardless of the other flags, but if a __hash__ method is present, an exception is raised.

Other values (e.g. unsafe_hash=None) are illegal for this flag.

Note that the the hash= flag to the field() function is unchanged from what's currently in PEP 557 or in the implementation in 3.7.0b1. In particular, the generated __hash__ method will disregard fields declared using field(hash=False). It will also disregard fields declared using field(compare=False, hash=False|None).

Unsubscribe: https://mail.python.org/mailman/options/python-dev/guido%40python.org

Ethan Furman

unread,

Feb 6, 2018, 2:41:33 PM2/6/18

to Python-Dev

On 02/06/2018 11:18 AM, Guido van Rossum wrote:

> We may be in violent agreement.
>
> I propose *not* to add a way to *disable* hashing when the rest of the flags to @dataclass() would indicate that it's
> safe to add a __hash__ method.

Okay.

> I propose that with @dataclass(unsafe_hash=False) (the default), a __hash__ method is added when the following
> conditions are true:
>
> - frozen=True (not the default)
> - compare=True (the default)
> - no __hash__ method is defined in the class
>
> If we instead use @dataclass(unsafe_hash=True), a __hash__ will be added regardless of the other flags, but if a
> __hash__ method is present, an exception is raised.
>
> Other values (e.g. unsafe_hash=None) are illegal for this flag.

Ah! Excellent, that greatly allays my worries.

> Note that the the hash= flag to the field() function is unchanged from what's currently in PEP 557 or in the
> implementation in 3.7.0b1. In particular, the generated __hash__ method will disregard fields declared using
> field(hash=False). It will also disregard fields declared using field(compare=False, hash=False|None).

It sounds like `unsafe_hash=True` indicates a truly unsafe hash (that is, mutable data is involved in the hash
calculation), but there still seems to be one possibility for an "unsafe_hash" to actually be safe -- that is, if only
immutable fields are used in __eq__, then dataclass could safely generate a hash for us.

Do we have a way to know if the equality fields are hashable? I suppose we could check each one for a for a non-None
__hash__. Then we could modify that first condition from

- frozen=True

to

- frozen=True or all(getattr(eq_fld, '__hash__', None) is not None for eq_field in equality_fields)

Thoughts?

On a different note, should the PEP be updated with the current signature? It still talks about hash=None being the
default.

--
~Ethan~
_______________________________________________
Python-Dev mailing list
Pytho...@python.org
https://mail.python.org/mailman/listinfo/python-dev

Unsubscribe: https://mail.python.org/mailman/options/python-dev/dev-python%2Bgarchive-30976%40googlegroups.com

David Mertz

unread,

Feb 6, 2018, 2:59:09 PM2/6/18

to Ethan Furman, Python-Dev

Honestly, the name I would most want for the keyword argument is '_hash'. That carries the semantics I desire.

Unsubscribe: https://mail.python.org/mailman/options/python-dev/mertz%40gnosis.cx

Eric V. Smith

unread,

Feb 6, 2018, 3:05:16 PM2/6/18

to Ethan Furman, Python-Dev

On 2/6/18 2:40 PM, Ethan Furman wrote:
> On a different note, should the PEP be updated with the current
> signature? It still talks about hash=None being the default.

Once we've reached an agreement, I'll update the PEP. I don't think
we're there quite yet.

Eric

Guido van Rossum

unread,

Feb 6, 2018, 3:28:58 PM2/6/18

to David Mertz, Python-Dev

That's much less self-descriptive and harder to search Google or StackOverflow for. It's also easier to overlook. We really want to send the signal that this is unsafe and requires serious consideration before it is turned on.

Unsubscribe: https://mail.python.org/mailman/options/python-dev/guido%40python.org

Guido van Rossum

unread,

Feb 6, 2018, 3:33:05 PM2/6/18

to Ethan Furman, Python-Dev

On Tue, Feb 6, 2018 at 11:40 AM, Ethan Furman <et...@stoneleaf.us> wrote:

It sounds like `unsafe_hash=True` indicates a truly unsafe hash (that is, mutable data is involved in the hash calculation), but there still seems to be one possibility for an "unsafe_hash" to actually be safe -- that is, if only immutable fields are used in __eq__, then dataclass could safely generate a hash for us.

Do we have a way to know if the equality fields are hashable? I suppose we could check each one for a for a non-None __hash__. Then we could modify that first condition from

- frozen=True

to

- frozen=True or all(getattr(eq_fld, '__hash__', None) is not None for eq_field in equality_fields)

There seems to be a misunderstanding underlying these questions. Even if all fields have an immutable type (e.g. all ints, supporting __eq__ and __hash__), if the containing class isn't frozen, they can be assigned to. E.g.

@dataclass()

class Point:

x: int

y: int

p = Point(1, 1)

p.x = 2 # This is legal

The only way to make that assignment to p.x illegal is to make the *class* frozen (using @dataclass(frozen=True)) -- nothing we can do about the *field* will change this.

Of course if you use @dataclass(frozen=True, unsafe_hash=True) you may still get a safe hash. :-)

Ethan Furman

unread,

Feb 6, 2018, 3:45:38 PM2/6/18

to Python-Dev

On 02/06/2018 12:24 PM, Guido van Rossum wrote:

> On Tue, Feb 6, 2018 at 11:40 AM, Ethan Furman wrote:

>> It sounds like `unsafe_hash=True` indicates a truly unsafe hash (that is,
>> mutable data is involved in the hash calculation), but there still seems
>> to be one possibility for an "unsafe_hash" to actually be safe -- that is,
>> if only immutable fields are used in __eq__, then dataclass could safely
>> generate a hash for us.
>>

>> Do we have a way to know if the equality fields are hashable?Â I suppose
>> we could check each one for a for a non-None __hash__.Â Then we could

>> modify that first condition from
>>
>> - frozen=True
>>
>> to
>>
>> - frozen=True or all(getattr(eq_fld, '__hash__', None) is not None for
>> eq_field in equality_fields)
>
> There seems to be a misunderstanding underlying these questions. Even if
> all fields have an immutable type (e.g. all ints, supporting __eq__ and
> __hash__), if the containing class isn't frozen, they can be assigned to.
> E.g.
>
> @dataclass()
> class Point:
> x: int
> y: int
>
> p = Point(1, 1)
> p.x = 2 # This is legal
>
> The only way to make that assignment to p.x illegal is to make the *class*
> frozen (using @dataclass(frozen=True)) -- nothing we can do about the *field*
> will change this.

Oh, right. When I was thinking this I thought a field could be frozen individually, didn't find the option at the field
level when I checked the PEP, and then promptly forgot and suggested it anyway.

Although, couldn't we add a field-level frozen attribute (using property for the implementation), and check that all
equality fields are properties as well as hashable?

Guido van Rossum

unread,

Feb 6, 2018, 4:08:20 PM2/6/18

to Ethan Furman, Python-Dev

On Tue, Feb 6, 2018 at 12:44 PM, Ethan Furman <et...@stoneleaf.us> wrote:

Although, couldn't we add a field-level frozen attribute (using property for the implementation), and check that all equality fields are properties as well as hashable?

That would be a totally unrelated feature request. Let's wait whether it's actually needed a lot before designing it.

Eric V. Smith

unread,

Feb 6, 2018, 8:40:41 PM2/6/18

to gu...@python.org, Python-Dev

Sorry for the late reply. Still recovering from a computer failure.

My only concern with this approach is: what if you don’t want any __hash__ added? Say you want to use your base class’s hashing. I guess you could always “del cls.__hash__” after the class is created, but it’s not elegant.

That’s what we got from the tri-state option: never add (False), always add (True), or add if it’s safe (None).

--

Eric

On Feb 5, 2018, at 12:49 AM, Guido van Rossum <gu...@python.org> wrote:

Looks like this is turning into a major flamewar regardless of what I say. :-(

I really don't want to lose the ability to add a hash function to a mutable dataclass by flipping a flag in the decorator. I'll explain below. But I am fine if this flag has a name that clearly signals it's an unsafe thing to do.

I propose to replace the existing (as of 3.7.0b1) hash= keyword for the @dataclass decorator with a simpler flag named unsafe_hash=. This would be a simple bool (not a tri-state flag like the current hash=None|False|True). The default would be False, and the behavior then would be to add a hash function automatically only if it's safe (using the same rules as for hash=None currently). With unsafe_hash=True, a hash function would always be generated that takes all fields into account except those declared using field(hash=False). If there's already a `def __hash__` in the function I don't care what it does, maybe it should raise rather than quietly doing nothing or quietly overwriting it.

Here's my use case.

A frozen class requires a lot of discipline, since you have to compute the values of all fields before calling the constructor. A mutable class allows other initialization patterns, e.g. manually setting some fields after the instance has been constructed, or having a separate non-dunder init() method. There may be good reasons for using these patterns, e.g. the object may be part of a cycle (e.g. parent/child links in a tree). Or you may just use one of these patterns because you're a pretty casual coder. Or you're modeling something external.

My point is that once you have one of those patterns in place, changing your code to avoid them may be difficult. And yet your code may treat the objects as essentially immutable after the initialization phase (e.g. a parse tree). So if you create a dataclass and start coding like that for a while, and much later you need to put one of these into a set or use it as a dict key, switching to frozen=True may not be a quick option. And writing a __hash__ method by hand may feel like a lot of busywork. So this is where [unsafe_]hash=True would come in handy.

I think naming the flag unsafe_hash should take away most objections, since it will be clear that this is not a safe thing to do. People who don't understand the danger are likely to copy a worse solution from StackOverflow anyway. The docs can point to frozen=True and explain the danger.

--
--Guido van Rossum (python.org/~guido)

_______________________________________________
Python-Dev mailing list
Pytho...@python.org
https://mail.python.org/mailman/listinfo/python-dev

Unsubscribe: https://mail.python.org/mailman/options/python-dev/eric%2Ba-python-dev%40trueblade.com

Guido van Rossum

unread,

Feb 6, 2018, 9:51:08 PM2/6/18

to Eric V. Smith, Python-Dev

That seems a rare case (though I hadn't thought of it). I had thought of the use case where you want a frozen type without a hash; that you can presumably implement using

def __hash__(self): raise TypeError("not hashable")

We can do a similar thing to preserve the superclass __hash__ if it's rare enough:

def __hash__(self): return super().__hash__()

If at all possible I'd like to kill the tri-state hash= flag -- the amount of time spent creating and discussing the huge table in the bpo issue are an indication of how much effort it would take other people to understand it.

If either of those use cases becomes annoyingly common we'll have to think of something else.

Ethan Furman

unread,

Feb 6, 2018, 10:41:53 PM2/6/18

to pytho...@python.org

On 02/06/2018 06:48 PM, Guido van Rossum wrote:

> That seems a rare case (though I hadn't thought of it). I had thought of the use case where you want a frozen type
> without a hash; that you can presumably implement using
>
> def __hash__(self): raise TypeError("not hashable")
>
> We can do a similar thing to preserve the superclass __hash__ if it's rare enough:
>
> def __hash__(self): return super().__hash__()
>
> If at all possible I'd like to kill the tri-state hash= flag -- the amount of time spent creating and discussing the
> huge table in the bpo issue are an indication of how much effort it would take other people to understand it.

I think the biggest reason this has become so complicated is because we are refusing to use an Enum:

class Hashable(Enum):
IF_SAFE = 1
ADD = 2
DEFER = 3
NONE = 4

IF_SAFE is currently the False value.
ADD is currently the True value
DEFER means don't add one
NONE means set __hash__ to None

The only thing missing now is a setting to indicate that dataclass should do nothing if the class already has a __hash__
method -- possibly DEFER, although I think IF_SAFE can include "the class already has one, it's not safe to override it".

> If either of those use cases becomes annoyingly common we'll have to think of something else.

Or we could solve it now and not have to deal with backwards-compatibility issues in the future.

--
~Ethan~

_______________________________________________
Python-Dev mailing list
Pytho...@python.org
https://mail.python.org/mailman/listinfo/python-dev

Unsubscribe: https://mail.python.org/mailman/options/python-dev/dev-python%2Bgarchive-30976%40googlegroups.com

Guido van Rossum

unread,

Feb 6, 2018, 11:23:41 PM2/6/18

to Ethan Furman, Python-Dev

I'm not a fan, sorry.

Unsubscribe: https://mail.python.org/mailman/options/python-dev/guido%40python.org

Nick Coghlan

unread,

Feb 7, 2018, 3:04:57 AM2/7/18

to Guido van Rossum, Eric V. Smith, Python-Dev

On 7 February 2018 at 12:48, Guido van Rossum <gu...@python.org> wrote:
> That seems a rare case (though I hadn't thought of it). I had thought of the
> use case where you want a frozen type without a hash; that you can
> presumably implement using
>
> def __hash__(self): raise TypeError("not hashable")

Now that attributes in the class dict pre-empt the generated versions,
"__hash__ = None" in the class body will also turn off hash generation
without enabling hashing.

> We can do a similar thing to preserve the superclass __hash__ if it's rare
> enough:
>
> def __hash__(self): return super().__hash__()

Similarly for this variant, you can do "__hash__ =
BaseClassWithDesiredHashAlgorithm.__hash__". (That may be more
appropriate than dynamic lookup in the MRO if you're extending a
frozen class with a subclass that adds more fields, but you want to
keep using the base class hash definition for some reason)

> If at all possible I'd like to kill the tri-state hash= flag -- the amount
> of time spent creating and discussing the huge table in the bpo issue are an
> indication of how much effort it would take other people to understand it.

+1 - the nice thing about "unsafe_hash=True" is that you *only* need
it if adding the hash would be unsafe, and you haven't set __hash__
explicitly in the class body.

While "frozen=True, unsafe_hash=True" is redundant (since the hash is
safe when instances are frozen), it isn't a surprising defect waiting
to happen the way "frozen=False, hash=True" is in the current
interface.

Cheers,
Nick.

--
Nick Coghlan | ncog...@gmail.com | Brisbane, Australia

_______________________________________________
Python-Dev mailing list
Pytho...@python.org
https://mail.python.org/mailman/listinfo/python-dev

Unsubscribe: https://mail.python.org/mailman/options/python-dev/dev-python%2Bgarchive-30976%40googlegroups.com

Reply all

Reply to author

Forward