Per Python spec:
"the implementation of hashable collections requires that a key’s hash
value is immutable (if the object’s hash value changes, it will be in
the wrong hash bucket)"
Yet:
import dataclasses
@dataclasses.dataclass(hash=True)
class A:
foo: int = dataclasses.field(hash=True, compare=True)
a = A(foo=1)
s = set()
s.add(a) # s == {a}
a.foo = 2
print(a in s)
print({a} == s}
print(s == s)
# prints False False True
This looks to me like a clearly wrong behavior.
Elvis
_______________________________________________
Python-Dev mailing list
Pytho...@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: https://mail.python.org/mailman/options/python-dev/dev-python%2Bgarchive-30976%40googlegroups.com
Data classes do not protect you from doing the wrong thing. This is the
same as writing:
class A:
def __init__(self, foo):
self.foo = foo
def __hash__(self):
return hash((self.foo,))
You're allowed to override the parameters to dataclasses.dataclass for
cases where you know what you're doing. Consenting adults, and all.
Eric.
I should add: This is why you shouldn't override the default (hash=None)
unless you know what the consequences are. Can I ask why you want to
specify hash=True?
Eric
attrs has the same behavior (if you ignore how dataclasses handles the
cases where __hash__ or __eq__ already exist in the class definition).
Here's what attrs says about adding __hash__ via hash=True:
"Although not recommended, you can decide for yourself and force attrs
to create one (e.g. if the class is immutable even though you didn’t
freeze it programmatically) by passing True or not. Both of these cases
are rather special and should be used carefully."
The problem with dropping hash=True is: how would you write __hash__
yourself? It seems like a bug magnet if you're adding fields to the
class and forget to update __hash__, especially in the presence of
per-field hash=False and eq=False settings. And you'd need to make sure
it matches the generated __eq__ (if 2 objects are equal, they need to
have the same hash value).
If we're going to start disallowing things, how about the per-field
hash=True, eq=False case?
However, I don't feel very strongly about this. As I've said, I expect
the use cases for hash=True to be very, very rare. And now that we allow
overriding __hash__ in the class body without setting hash=False, there
aren't a lot of uses for hash=False, either. But we would need to think
through how you'd get the behavior of hash=False with multiple
inheritance, if that's what you wanted. Again, a very, very rare case.
In all, I think we're better off documenting best practices and making
them the default, like attrs does, and leave it to the programmer to
follow them. I realize we're handing out footguns, the alternatives seem
even more complex and are limiting.
Eric
Is there a real world example of such an "advanced case"?
Eric, have you read https://github.com/python-attrs/attrs/issues/136 ?
Specifically this comment from Hynek [1]:
"I never really thought about it, but yeah mutable objects shouldn’t
have a __hash__ at all."
It is clear from that thread that "hash=True" was an early design
mistake, which was left in for compatibility reasons. Why are we
copying bad design to the standard library?
Elvis
[1] https://github.com/python-attrs/attrs/issues/
136#issuecomment-277425421
On Friday, February 2, 2018 10:08:43 AM EST Eric V. Smith wrote:
> However, I don't feel very strongly about this. As I've said, I expect
> the use cases for hash=True to be very, very rare.
Why do you think that the requirement to make a dataclass hashable is a
"very, very rare" requirement?
Just put yourself in the shoes of an average Python developer. You try
to put a dataclass in a set, you get a TypeError. Your immediate
reaction is to add "hash=True". Things appear to work.
The problem with dropping hash=True is: how would you write __hash__ yourself? It seems like a bug magnet if you're adding fields to the class and forget to update __hash__, especially in the presence of per-field hash=False and eq=False settings. And you'd need to make sure it matches the generated __eq__ (if 2 objects are equal, they need to have the same hash value).
_______________________________________________
Python-Dev mailing list
Pytho...@python.org
https://mail.python.org/mailman/listinfo/python-dev
Is there any reason specifying has=True could set frozen=True unless
the user specifically sets frozen=False?
Or is that already the case?
I think the folks that are concerned about this issue are quite right
— most Python users equate immutable and hashable—so the dataclass API
should reflect that.
And this would still make it easy and clear to specify the unusual
(and arguably dangerous) case of:
hash=True, frozen=False
-CHB
+1 using unsafe_hash as a name addresses my concern. It's a good signal that there are caveats worth considering.
-gps
_______________________________________________
Python-Dev mailing list
Pytho...@python.org
https://mail.python.org/mailman/listinfo/python-dev
+1 using unsafe_hash as a name addresses my concern.
A frozen class requires a lot of discipline, since you have to compute the values of all fields before calling the constructor. A mutable class allows other initialization patterns, e.g. manually setting some fields after the instance has been constructed, or having a separate non-dunder init() method. There may be good reasons for using these patterns, e.g. the object may be part of a cycle (e.g. parent/child links in a tree). Or you may just use one of these patterns because you're a pretty casual coder. Or you're modeling something external.My point is that once you have one of those patterns in place, changing your code to avoid them may be difficult. And yet your code may treat the objects as essentially immutable after the initialization phase (e.g. a parse tree). So if you create a dataclass and start coding like that for a while, and much later you need to put one of these into a set or use it as a dict key, switching to frozen=True may not be a quick option. And writing a __hash__ method by hand may feel like a lot of busywork. So this is where [unsafe_]hash=True would come in handy.
I think naming the flag unsafe_hash should take away most objections, since it will be clear that this is not a safe thing to do. People who don't understand the danger are likely to copy a worse solution from StackOverflow anyway. The docs can point to frozen=True and explain the danger.
On Sun, Feb 4, 2018, 9:50 PM Guido van Rossum <guido at python.org> wrote:
Looks like this is turning into a major flamewar regardless of what I say.
:-(
I really don't want to lose the ability to add a hash function to a
mutable dataclass by flipping a flag in the decorator. I'll explain below.
But I am fine if this flag has a name that clearly signals it's an unsafe
thing to do.
I propose to replace the existing (as of 3.7.0b1) hash= keyword for the
@dataclass decorator with a simpler flag named unsafe_hash=. This would be
a simple bool (not a tri-state flag like the current hash=None|False|True).
The default would be False, and the behavior then would be to add a hash
function automatically only if it's safe (using the same rules as for
hash=None currently). With unsafe_hash=True, a hash function would always
be generated that takes all fields into account except those declared using
field(hash=False). If there's already a `def __hash__` in the function I
don't care what it does, maybe it should raise rather than quietly doing
nothing or quietly overwriting it.
Here's my use case.
_______________________________________________
Python-Dev mailing list
Pytho...@python.org
https://mail.python.org/mailman/listinfo/python-dev
If there's going to be an API for it, it should be in the class, not something that mutates the class afterwards.
It sounds like `unsafe_hash=True` indicates a truly unsafe hash (that is, mutable data is involved in the hash calculation), but there still seems to be one possibility for an "unsafe_hash" to actually be safe -- that is, if only immutable fields are used in __eq__, then dataclass could safely generate a hash for us.
Do we have a way to know if the equality fields are hashable? I suppose we could check each one for a for a non-None __hash__. Then we could modify that first condition from
- frozen=True
to
- frozen=True or all(getattr(eq_fld, '__hash__', None) is not None for eq_field in equality_fields)
Although, couldn't we add a field-level frozen attribute (using property for the implementation), and check that all equality fields are properties as well as hashable?
My point is that once you have one of those patterns in place, changing your code to avoid them may be difficult. And yet your code may treat the objects as essentially immutable after the initialization phase (e.g. a parse tree). So if you create a dataclass and start coding like that for a while, and much later you need to put one of these into a set or use it as a dict key, switching to frozen=True may not be a quick option. And writing a __hash__ method by hand may feel like a lot of busywork. So this is where [unsafe_]hash=True would come in handy.Here's my use case.I propose to replace the existing (as of 3.7.0b1) hash= keyword for the @dataclass decorator with a simpler flag named unsafe_hash=. This would be a simple bool (not a tri-state flag like the current hash=None|False|True). The default would be False, and the behavior then would be to add a hash function automatically only if it's safe (using the same rules as for hash=None currently). With unsafe_hash=True, a hash function would always be generated that takes all fields into account except those declared using field(hash=False). If there's already a `def __hash__` in the function I don't care what it does, maybe it should raise rather than quietly doing nothing or quietly overwriting it.Looks like this is turning into a major flamewar regardless of what I say. :-(I really don't want to lose the ability to add a hash function to a mutable dataclass by flipping a flag in the decorator. I'll explain below. But I am fine if this flag has a name that clearly signals it's an unsafe thing to do.
A frozen class requires a lot of discipline, since you have to compute the values of all fields before calling the constructor. A mutable class allows other initialization patterns, e.g. manually setting some fields after the instance has been constructed, or having a separate non-dunder init() method. There may be good reasons for using these patterns, e.g. the object may be part of a cycle (e.g. parent/child links in a tree). Or you may just use one of these patterns because you're a pretty casual coder. Or you're modeling something external.
I think naming the flag unsafe_hash should take away most objections, since it will be clear that this is not a safe thing to do. People who don't understand the danger are likely to copy a worse solution from StackOverflow anyway. The docs can point to frozen=True and explain the danger.
----Guido van Rossum (python.org/~guido)
_______________________________________________
Python-Dev mailing list
Pytho...@python.org
https://mail.python.org/mailman/listinfo/python-dev