Issue 306 in protobuf: Python: Message objects should not be hashable


prot...@googlecode.com

Jun 29, 2011, 11:53:18 PM
to prot...@googlegroups.com
Status: New
Owner: ----
Labels: Type-Defect Priority-Medium

New issue 306 by matt.gi...@gmail.com: Python: Message objects should not
be hashable
http://code.google.com/p/protobuf/issues/detail?id=306

What steps will reproduce the problem?
1. Create a simple .proto file; anything will do:

package test;
message Person {
  required string name = 1;
}

2. Create two Message objects and set their fields identically:

>>> import test_pb2
>>> p = test_pb2.Person()
>>> q = test_pb2.Person()
>>> p.name = "Fred"
>>> q.name = "Fred"

3. Note that the two objects compare equal, but their hashes are
different:

>>> p == q
True
>>> hash(p) == hash(q)
False

What is the expected output?

>>> hash(p) == hash(q)
TypeError: unhashable type: 'Person'

Rationale

The specification for hashing in Python
(http://docs.python.org/reference/datamodel.html#object.__hash__) specifies
that "The only required property is that objects which compare equal have
the same hash value". Therefore, it is a violation of Python's semantics to
have p and q not hash to the same value. Practical consequences of this are
that if p and q are both inserted into a set or dictionary keys, it will be
undefined whether they will both be stored, or whether one will overwrite
the other (depending on the hash buckets used).
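The mismatch can be reproduced without protobuf. The sketch below uses a
hypothetical stand-in class, FakePerson (not part of the protobuf API), that
compares by value but keeps the default identity-based hash. That is only an
assumption about how the 2.3.0 classes behaved, based on the output in step
3. It is written for Python 3, where __hash__ has to be restored explicitly
once __eq__ is defined.

```python
class FakePerson:
    """Hypothetical stand-in for a generated Message class (not protobuf API)."""

    def __init__(self, name):
        self.name = name

    def __eq__(self, other):
        # Value-based equality, comparing field contents.
        return isinstance(other, FakePerson) and self.name == other.name

    # Assumption: mimic the reported behaviour by keeping the identity-based
    # hash. (Python 3 would otherwise set __hash__ to None here.)
    __hash__ = object.__hash__


p = FakePerson("Fred")
q = FakePerson("Fred")

equal = (p == q)                      # same field values
same_hash = (hash(p) == hash(q))      # hashes come from object identity
people = {p, q}                       # a set that "should" hold one Fred
found = FakePerson("Fred") in people  # membership test with an equal object
```

On CPython, equal is True and same_hash is False; people ends up holding
both objects, and found is False even though the probe compares equal to
everything in the set.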

Unfortunately, it is not appropriate to override __hash__ so that two
Message objects hash equal whenever they compare equal, because Message
objects are mutable. The above specification continues, "If a class defines mutable
objects and implements a __cmp__() or __eq__() method, it should not
implement __hash__(), since hashable collection implementations require
that a object’s hash value is immutable (if the object’s hash value
changes, it will be in the wrong hash bucket)."
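A minimal sketch of that failure mode, using a hypothetical class
ValueHashedPerson (again, not protobuf code) whose hash is derived from its
contents. The object is mutated after being placed in a set, and the set can
then no longer find it by value.

```python
class ValueHashedPerson:
    """Hypothetical mutable class whose hash follows its contents."""

    def __init__(self, name):
        self.name = name

    def __eq__(self, other):
        return isinstance(other, ValueHashedPerson) and self.name == other.name

    def __hash__(self):
        # Consistent with __eq__, but changes whenever name changes.
        return hash(self.name)


p = ValueHashedPerson("Fred")
people = {p}      # stored under hash("Fred")
p.name = "Bob"    # mutated after insertion; the set is not told

size = len(people)                               # the entry is still there
finds_old = ValueHashedPerson("Fred") in people  # hash matches, equality fails
finds_new = ValueHashedPerson("Bob") in people   # equality would match, hash does not
```

The set still reports one element, yet neither the old nor the new value
locates it: the entry sits in the bucket chosen for "Fred" while comparing
equal only to "Bob".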

The only valid solution is for Message objects to be unhashable (which can
be accomplished by setting __hash__ = None in the Message class). This is
the approach taken by all mutable built-in types in the Python standard
library (e.g., list, set and dict).
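As an illustration of the proposed behaviour, the sketch below sets
__hash__ = None on a hypothetical stand-in class, UnhashablePerson, and
compares it with the built-in containers. The helper is_hashable is
introduced here for the example only.

```python
class UnhashablePerson:
    """Hypothetical stand-in for a Message class (not protobuf API)."""

    def __init__(self, name):
        self.name = name

    def __eq__(self, other):
        return isinstance(other, UnhashablePerson) and self.name == other.name

    # Explicitly mark instances as unhashable, as list, set and dict are.
    __hash__ = None


def is_hashable(obj):
    """Return True if hash(obj) succeeds, False if it raises TypeError."""
    try:
        hash(obj)
    except TypeError:
        return False
    return True


p = UnhashablePerson("Fred")
results = {
    "UnhashablePerson": is_hashable(p),
    "list": is_hashable([]),
    "set": is_hashable(set()),
    "dict": is_hashable({}),
    "tuple": is_hashable(()),  # immutable, so hashable
}
```

Only the tuple is reported as hashable. In Python 3 a class that defines
__eq__ without __hash__ gets __hash__ = None automatically; in Python 2,
which this report targets, the assignment has to be written out.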

This may break existing code, so perhaps it could be introduced as an
option in protoc (which would set __hash__ = None on all of the generated
classes). This would be a useful option, since all code which relies on the
hashability of Message objects is potentially buggy, due to the undefined
behaviour when inserting Messages into hash tables described above.

prot...@googlecode.com

Jun 30, 2011, 11:17:15 AM
to prot...@googlegroups.com
Updates:
Status: Fixed
Labels: FixedIn-2.4.0

Comment #1 on issue 306 by jas...@google.com: Python: Message objects

This was already fixed in 2.4.0 - are you using an older version of the
library?

prot...@googlecode.com

Jun 30, 2011, 9:59:33 PM
to prot...@googlegroups.com

Comment #2 on issue 306 by matt.gi...@gmail.com: Python: Message objects

Ah, my apologies. I was using 2.3.0. It appears to be fixed for me
(Messages are now unhashable).
