Google Groups no longer supports new Usenet posts or subscriptions. Historical content remains viewable.
Dismiss

Python 3.x and bytes

123 views
Skip to first unread message

Ethan Furman

unread,
May 17, 2011, 2:47:01 PM5/17/11
to Python
In Python 3 one can say

--> huh = bytes(5)

Since the bytes type is actually a list of integers, I would have
expected this to have huh being a bytestring with one element -- the
integer 5. Actually, what you get is:

--> huh
b'\x00\x00\x00\x00\x00'

or five null bytes. Note that this is an immutable type, so you cannot
go in later and say

--> huh[3] = 9
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: 'bytes' object does not support item assignment


So, out of curiosity, does anyone actually use this, um, feature?

~Ethan~

Ian Kelly

unread,
May 17, 2011, 3:20:32 PM5/17/11
to Python

I suppose it's for interoperability with the mutable bytearray type,
which takes the same parameters in the constructor.

Ian Kelly

unread,
May 17, 2011, 3:25:35 PM5/17/11
to Python
On Tue, May 17, 2011 at 1:20 PM, Ian Kelly <ian.g...@gmail.com> wrote:
> I suppose it's for interoperability with the mutable bytearray type,
> which takes the same parameters in the constructor.

http://www.python.org/dev/peps/pep-3137/#constructors

MRAB

unread,
May 17, 2011, 3:39:12 PM5/17/11
to pytho...@python.org
On 17/05/2011 19:47, Ethan Furman wrote:
> In Python 3 one can say
>
> --> huh = bytes(5)
>
> Since the bytes type is actually a list of integers, I would have
> expected this to have huh being a bytestring with one element -- the
> integer 5. Actually, what you get is:
>
> --> huh
> b'\x00\x00\x00\x00\x00'
>
> or five null bytes. Note that this is an immutable type, so you cannot
> go in later and say
>
> --> huh[3] = 9
> Traceback (most recent call last):
> File "<stdin>", line 1, in <module>
> TypeError: 'bytes' object does not support item assignment
>
>
> So, out of curiosity, does anyone actually use this, um, feature?
>
I suppose it follows the example of 'list' and 'tuple' in accepting an
iterable.

Producing a bytestring of zero bytes might have its uses, but because
Python lets me do coding at a high level (lists, dicts, etc), I've
never used that feature.

BTW, help(bytes) doesn't seem to mention it!

Corey Richardson

unread,
May 17, 2011, 3:50:13 PM5/17/11
to pytho...@python.org
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On 05/17/2011 02:47 PM, Ethan Furman wrote:
> In Python 3 one can say
>
> --> huh = bytes(5)
>
> Since the bytes type is actually a list of integers, I would have
> expected this to have huh being a bytestring with one element -- the
> integer 5. Actually, what you get is:
>
> --> huh
> b'\x00\x00\x00\x00\x00'
>
> or five null bytes. Note that this is an immutable type, so you cannot
> go in later and say

For the bytes to actually be a 'list of integers', you need to pass it
an iterable, ex:
>>> bytes([5, 6, 1, 3])
b'\x05\x06\x01\x03'

- From help(bytes):
| bytes(iterable_of_ints) -> bytes
| bytes(string, encoding[, errors]) -> bytes
| bytes(bytes_or_buffer) -> immutable copy of bytes_or_buffer
| bytes(memory_view) -> bytes

Looks like you're using the fourth when you want the first, possibly?

- --
Corey Richardson
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.17 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iQEcBAEBAgAGBQJN0tF1AAoJEAFAbo/KNFvp41AH/1l2zR6XVOJ0xM7s2P+PDYZX
OAhmi19hFEP0zQoWiW3TiMEVPlaqgtipPCp1t+jTeNNN3F+H4NG2DHJJZ3dPDr2J
CpABQKyS4MJQTUxhCIlXqAaA2I1pejzAv6fwsF66/zPFmyaTAJLDP+3WMQvCUUoZ
5A3qHgHNp6vBHXd13RNdQStLeprfQptA+z6XdiJPos348ecRj/u9id7v28dwxxsm
d9WA6oYwJ+Y/NcG2OP0Flyp3Zc3hymVsv5vhmhG2+EiIrxMn95k8ImsKLEhvUW3a
72CxlE6EaOMD4MuWyeGMS33c0vHwtAvEIE7M56R2FAl8EsUFwP2swaij0tEiemg=
=8MRV
-----END PGP SIGNATURE-----

Ethan Furman

unread,
May 17, 2011, 4:20:45 PM5/17/11
to Felipe Bastos Nunes, python-list
Felipe Bastos Nunes wrote:

> 2011/5/17 Ethan Furman wrote:
>>
>> In Python 3 one can say
>>
>> --> huh = bytes(5)
>>
>> Since the bytes type is actually a list of integers, I would have
>> expected this to have huh being a bytestring with one element -- the
>> integer 5. Actually, what you get is:
>>
>> --> huh
>> b'\x00\x00\x00\x00\x00'
>>
>> or five null bytes. Note that this is an immutable type, so you
>> cannot go in later and say
>>
>> --> huh[3] = 9
>> Traceback (most recent call last):
>> File "<stdin>", line 1, in <module>
>> TypeError: 'bytes' object does not support item assignment
>>
>>
>> So, out of curiosity, does anyone actually use this, um, feature?
>
> They accept .replace(b"00", b"12") for example.

So they do. Although that particular example doesn't work since b'0' is
the integer 48...

--> huh.replace(b'00',b'12')


b'\x00\x00\x00\x00\x00'


The big question, though, is would you do it this way:

some_var = bytes(23).replace(b'\x00', b'a')

or this way?

some_var = bytes(b'a' * 23)

~Ethan~

Ethan Furman

unread,
May 17, 2011, 4:55:36 PM5/17/11
to pytho...@python.org
Corey Richardson wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> On 05/17/2011 02:47 PM, Ethan Furman wrote:
>> In Python 3 one can say
>>
>> --> huh = bytes(5)
>>
>> Since the bytes type is actually a list of integers, I would have
>> expected this to have huh being a bytestring with one element -- the
>> integer 5. Actually, what you get is:
>>
>> --> huh
>> b'\x00\x00\x00\x00\x00'
>>
>> or five null bytes. Note that this is an immutable type, so you cannot
>> go in later and say
>
> For the bytes to actually be a 'list of integers', you need to pass it
> an iterable, ex:
>>>> bytes([5, 6, 1, 3])
> b'\x05\x06\x01\x03'

Not so.

--> huh = b'abcedfg'
--> huh[3]
101

It's a list of int's.

> - From help(bytes):
> | bytes(iterable_of_ints) -> bytes
> | bytes(string, encoding[, errors]) -> bytes
> | bytes(bytes_or_buffer) -> immutable copy of bytes_or_buffer
> | bytes(memory_view) -> bytes
>
> Looks like you're using the fourth when you want the first, possibly?

Nope. Apparently, it's not well documented. If you check PEP 358
you'll find it.

~Ethan~

Ian Kelly

unread,
May 17, 2011, 4:51:31 PM5/17/11
to pytho...@python.org
On Tue, May 17, 2011 at 1:50 PM, Corey Richardson <kb1...@aim.com> wrote:
> - From help(bytes):
>  |  bytes(iterable_of_ints) -> bytes
>  |  bytes(string, encoding[, errors]) -> bytes
>  |  bytes(bytes_or_buffer) -> immutable copy of bytes_or_buffer
>  |  bytes(memory_view) -> bytes
>
> Looks like you're using the fourth when you want the first, possibly?

Nope, he's using the fifth form, bytes(int), which is listed in the
PEP but not in the help.

Ian Kelly

unread,
May 17, 2011, 4:52:43 PM5/17/11
to Python
On Tue, May 17, 2011 at 2:20 PM, Ethan Furman <et...@stoneleaf.us> wrote:
> The big question, though, is would you do it this way:
>
> some_var = bytes(23).replace(b'\x00', b'a')
>
> or this way?
>
> some_var = bytes(b'a' * 23)

Actually, I would just do it this way:

some_var = b'a' * 23

That's already a bytes object. Passing it into the constructor is redundant.

Corey Richardson

unread,
May 17, 2011, 5:27:04 PM5/17/11
to pytho...@python.org
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On 05/17/2011 04:55 PM, Ethan Furman wrote:
> Apparently, it's not well documented. If you check PEP 358
> you'll find it.
>
> ~Ethan~

Agreed, it looks like it should be mentioned in bytes.__doc__ about the
single-integer argument.

- --
Corey Richardson
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.17 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iQEcBAEBAgAGBQJN0ugoAAoJEAFAbo/KNFvphIsH/3LOzN8+D98D0+nm6m8sfCUC
f+KfgTLITAmecYOuOBym1snl6qj2JnZkGYRW6M2O5NV8arNJ1dHty3dPbwMeKdfH
67m2a0UgHcwqv5M5VGNQQYTQ03Mzqy+A84MMvBKWUQ0nxZRCkPMtdxm2T4/UEVLx
uelDPOdOWB1PDmc3sNUDPovXeOFlTKmcQ5yfolyrdLFU/KmbamgRSltpBFEbyInO
4KI3hoGka4PVaaBLf9QPjFC6tBu4QdQ4UTnWD3sy78LA3KPsa5MEpXFXctwJkJ+O
q2Y7SWOPJDz19V+MT87Aeu69YpzxWwkp4fBflNxYaQUoJqNlzIfRkavUzZ0zfMQ=
=E2qm
-----END PGP SIGNATURE-----

Ethan Furman

unread,
May 17, 2011, 6:28:23 PM5/17/11
to Python

Heh, good point.

~Ethan~

Terry Reedy

unread,
May 17, 2011, 10:28:08 PM5/17/11
to pytho...@python.org
On 5/17/2011 3:39 PM, MRAB wrote:

> On 17/05/2011 19:47, Ethan Furman wrote:
>> In Python 3 one can say
>>
>> --> huh = bytes(5)

> BTW, help(bytes) doesn't seem to mention it!

I believe I mentioned that on some tracker issue.

--
Terry Jan Reedy

Terry Reedy

unread,
May 17, 2011, 10:30:45 PM5/17/11
to pytho...@python.org
On 5/17/2011 5:27 PM, Corey Richardson wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> On 05/17/2011 04:55 PM, Ethan Furman wrote:
>> Apparently, it's not well documented. If you check PEP 358
>> you'll find it.
>>
>> ~Ethan~
>
> Agreed, it looks like it should be mentioned in bytes.__doc__ about the
> single-integer argument.

It will be when some volunteer writes a patch.
http://bugs.python.org/issue11231

--
Terry Jan Reedy

Ethan Furman

unread,
May 18, 2011, 12:40:40 PM5/18/11
to Python

However, as I just discovered, it works well when dealing with a
bytearray object:

some_var = bytearray(b' ' * size) # want space initialized, not null

~Ethan~

Ethan Furman

unread,
May 20, 2011, 1:43:43 AM5/20/11
to python list
Several folk have said that objects that compare equal must hash equal,
and the docs also state this
http://docs.python.org/dev/reference/datamodel.html#object.__hash__

I'm hoping somebody can tell me what horrible thing will happen if this
isn't the case? Here's a toy example of a class I'm thinking of writing
that will compare equal with int's, but hash differently:

--> class Wierd():
... def __init__(self, value):
... self.value = value
... def __eq__(self, other):
... return self.value == other
... def __hash__(self):
... return hash((self.value + 13) ** 3)
...
--> one = Wierd(1)
--> two = Wierd(2)
--> three = Wierd(3)
--> one
<Wierd object at 0x00BFE710>
--> one == 1
True
--> one == 2
False
--> two == 2
True
--> three == 3
True
--> d = dict()
--> d[one] = '1'
--> d[two] = '2'
--> d[three] = '3'
--> d
{<Wierd object at 0x00BFE710>: '1',
<Wierd object at 0x00BFE870>: '3',
<Wierd object at 0x00BFE830>: '2'}
--> d[1] = '1.0'
--> d[2] = '2.0'
--> d[3] = '3.0'
--> d
{<Wierd object at 0x00BFE870>: '3',
1: '1.0',
2: '2.0',
3: '3.0',
<Wierd object at 0x00BFE830>: '2',
<Wierd object at 0x00BFE710>: '1'}
--> d[2]
'2.0'
--> d[two]
'2'

All information greatly appreciated!

~Ethan~

Peter Otten

unread,
May 20, 2011, 2:38:30 AM5/20/11
to
Ethan Furman wrote:

> Several folk have said that objects that compare equal must hash equal,
> and the docs also state this
> http://docs.python.org/dev/reference/datamodel.html#object.__hash__
>
> I'm hoping somebody can tell me what horrible thing will happen if this
> isn't the case? Here's a toy example of a class I'm thinking of writing
> that will compare equal with int's, but hash differently:
>
> --> class Wierd():
> ... def __init__(self, value):
> ... self.value = value
> ... def __eq__(self, other):
> ... return self.value == other
> ... def __hash__(self):
> ... return hash((self.value + 13) ** 3)
> ...

Try this:

>>> d = {Wierd(1): 0}
>>> 1 in d
False
>>> 1 in d.keys()
True

Chris Rebert

unread,
May 20, 2011, 2:44:59 AM5/20/11
to Ethan Furman, python list
On Thu, May 19, 2011 at 10:43 PM, Ethan Furman <et...@stoneleaf.us> wrote:
> Several folk have said that objects that compare equal must hash equal, and
> the docs also state this
> http://docs.python.org/dev/reference/datamodel.html#object.__hash__
>
> I'm hoping somebody can tell me what horrible thing will happen if this
> isn't the case?  Here's a toy example of a class I'm thinking of writing
> that will compare equal with int's, but hash differently:
>
> --> class Wierd():
> ...     def __init__(self, value):
> ...         self.value = value
> ...     def __eq__(self, other):
> ...         return self.value == other
> ...     def __hash__(self):
> ...         return hash((self.value + 13) ** 3)
> ...
> --> one = Wierd(1)
> --> two = Wierd(2)
> --> three = Wierd(3)
> --> one
> <Wierd object at 0x00BFE710>
> --> one == 1
> True
> --> one == 2
> False
> --> two == 2
> True
> --> three == 3
> True
> --> d = dict()
> --> d[one] = '1'
> --> d[two] = '2'
> --> d[three] = '3'
> --> d
> {<Wierd object at 0x00BFE710>: '1',
>  <Wierd object at 0x00BFE870>: '3',
>  <Wierd object at 0x00BFE830>: '2'}
> --> d[1] = '1.0'
> --> d[2] = '2.0'
> --> d[3] = '3.0'

This is the part considered "horrible":


> --> d
> {<Wierd object at 0x00BFE870>: '3',
>  1: '1.0',
>  2: '2.0',
>  3: '3.0',
>  <Wierd object at 0x00BFE830>: '2',
>  <Wierd object at 0x00BFE710>: '1'}

Compare:
>>> x = {5.0 : 'foo'}
>>> x[5]
'foo'

Here's a more common/plausible "horrible" case closer to what the docs
writers had in mind:
>>> class Naughty(object):
... def __init__(self, n):
... self.n = n
... def __eq__(self, other):
... return self.n == other.n
...
>>> Naughty(5) == Naughty(5)
True
>>> Naughty(5) is Naughty(5)
False
>>> bad = Naughty(3)
>>> y = {bad : 'foo'}
>>> y[bad] # just happens to work
'foo'
>>> del bad
>>> # ok, how do we get to 'foo' now?
>>> y[Naughty(3)] # try the obvious way


Traceback (most recent call last):
File "<stdin>", line 1, in <module>

KeyError: <__main__.Naughty object at 0x2a1cb0>
>>> # We're screwed.

Naughty instances (and similar) can't be used sensibly as hash keys
(unless you /only/ care about object identity; this is often not the
case).

Cheers,
Chris
--
http://rebertia.com

Ulrich Eckhardt

unread,
May 20, 2011, 2:33:46 AM5/20/11
to
Ethan Furman wrote:
> Several folk have said that objects that compare equal must hash equal,
> and the docs also state this
> http://docs.python.org/dev/reference/datamodel.html#object.__hash__
>
> I'm hoping somebody can tell me what horrible thing will happen if this
> isn't the case?

If you were familiar with what a hash map is, you wouldn't ask. The thing is
that the hash is used to look up the place in the map where the thing is
stored. If two equal objects have different hashes, they will be stored in
different places in the hash map. Looking for object1 will then not turn up
with object2, even though they are equal. If this is something you don't
care about, and all you care about is identity, then I'd derive the hash
from each object's ID.

This ID has another property which is something that is assumed for hashes,
and your code seems a bit to get that wrong, too, and that is that the hash
must not change. Again, the reason is that if the hash changes, the position
in the hash map changes, too. If you then try to look up the changed object,
it will look for it in the new place, but it won't be found because it is in
the old place.

For that reason, it is generally useful to use immutable types like
integers, floats, strings and tuples thereof as keys. Since you can't change
them, you basically have the guarantee that they hash the same.

Uli

--
Domino Laser GmbH
Geschäftsführer: Thorsten Föcking, Amtsgericht Hamburg HR B62 932

MRAB

unread,
May 20, 2011, 11:50:02 AM5/20/11
to pytho...@python.org
On 20/05/2011 07:33, Ulrich Eckhardt wrote:
> Ethan Furman wrote:
>> Several folk have said that objects that compare equal must hash equal,
>> and the docs also state this
>> http://docs.python.org/dev/reference/datamodel.html#object.__hash__
>>
>> I'm hoping somebody can tell me what horrible thing will happen if this
>> isn't the case?
>
> If you were familiar with what a hash map is, you wouldn't ask. The thing is
> that the hash is used to look up the place in the map where the thing is
> stored. If two equal objects have different hashes, they will be stored in
> different places in the hash map.
[snip]
Is this strictly true? I thought that the hash value, an integer, is
moduloed (Is that how you spell it? Looks weird!) with the number of
array elements to give an index into the array, so different hashes
could give the same index, and objects with different hashes could be
stored in the same 'bucket'.

Chris Angelico

unread,
May 20, 2011, 12:20:04 PM5/20/11
to pytho...@python.org
On Sat, May 21, 2011 at 1:50 AM, MRAB <pyt...@mrabarnett.plus.com> wrote:
> [snip]
> Is this strictly true? I thought that the hash value, an integer, is
> moduloed (Is that how you spell it? Looks weird!) with the number of
> array elements to give an index into the array, so different hashes
> could give the same index, and objects with different hashes could be
> stored in the same 'bucket'.

There can always be hash collisions between different objects, but the
assumption is that two identical objects will _always_ "collide".

Chris Angelico

Ethan Furman

unread,
May 20, 2011, 1:56:02 PM5/20/11
to Chris Rebert, python list
Chris Rebert wrote:

> On Thu, May 19, 2011 at 10:43 PM, Ethan Furman <et...@stoneleaf.us> wrote:
>> Several folk have said that objects that compare equal must hash equal, and
>> the docs also state this
>> http://docs.python.org/dev/reference/datamodel.html#object.__hash__
>>
>> I'm hoping somebody can tell me what horrible thing will happen if this
>> isn't the case? Here's a toy example of a class I'm thinking of writing
>> that will compare equal with int's, but hash differently:
>
> This is the part considered "horrible":
>> --> d
>> {<Wierd object at 0x00BFE870>: '3',
>> 1: '1.0',
>> 2: '2.0',
>> 3: '3.0',
>> <Wierd object at 0x00BFE830>: '2',
>> <Wierd object at 0x00BFE710>: '1'}
>
> Compare:
>>>> x = {5.0 : 'foo'}
>>>> x[5]
> 'foo'
>
> Here's a more common/plausible "horrible" case closer to what the docs
> writers had in mind:
>--> class Naughty(object):

> ... def __init__(self, n):
> ... self.n = n
> ... def __eq__(self, other):
> ... return self.n == other.n
> ...
>--> Naughty(5) == Naughty(5)
> True
>--> Naughty(5) is Naughty(5)
> False
>--> bad = Naughty(3)
>--> y = {bad : 'foo'}
>--> y[bad] # just happens to work
> 'foo'
>--> del bad
>--> # ok, how do we get to 'foo' now?
>--> y[Naughty(3)] # try the obvious way

> Traceback (most recent call last):
> File "<stdin>", line 1, in <module>
> KeyError: <__main__.Naughty object at 0x2a1cb0>
>--> # We're screwed.

>
> Naughty instances (and similar) can't be used sensibly as hash keys
> (unless you /only/ care about object identity; this is often not the
> case).

I tried this sequence (using Python 3, BTW -- forgot to mention that
little tidbit -- sorry!):

--> del two
--> two


Traceback (most recent call last):
File "<stdin>", line 1, in <module>

NameError: name 'two' is not defined
--> d
{<__main__.Wierd object at 0x00C0C950>: '3',


1: '1.0',
2: '2.0',
3: '3.0',

<__main__.Wierd object at 0x00B3AC10>: '2',
<__main__.Wierd object at 0x00B32E90>: '1'}
--> new_two = Wierd(2)
--> d[new_two]
'2'

~Ethan~

Ethan Furman

unread,
May 20, 2011, 1:57:42 PM5/20/11
to Peter Otten, pytho...@python.org

My apologies -- I'm trying this in Python3:

--> two in d
True
--> two in d.keys()
True
-->
--> 2 in d
True
--> 2 in d.keys()
True

~Ethan~

Ian Kelly

unread,
May 20, 2011, 1:59:44 PM5/20/11
to Python
On Fri, May 20, 2011 at 10:36 AM, Chris Kaynor <cka...@zindagigames.com> wrote:
> I think the question was: can this dummy code ever produce a set containing
> less then itemCount items (for 0 < itemCount < 2**32)?

In CPython, no. Even when you get a hash collision, the code checks
to see whether the hashes are actually equal before it calls the rich
comparison, the former check being a much faster operation since the
hash values are cached.

I'm not sure whether this can be counted on for all Python
implementations, though.

Ethan Furman

unread,
May 20, 2011, 2:38:41 PM5/20/11
to Ulrich Eckhardt, pytho...@python.org
Ulrich Eckhardt wrote:
> Ethan Furman wrote:
>> Several folk have said that objects that compare equal must hash equal,
>> and the docs also state this
>> http://docs.python.org/dev/reference/datamodel.html#object.__hash__
>>
>> I'm hoping somebody can tell me what horrible thing will happen if this
>> isn't the case?
>
> If you were familiar with what a hash map is, you wouldn't ask. The thing is
> that the hash is used to look up the place in the map where the thing is
> stored. If two equal objects have different hashes, they will be stored in
> different places in the hash map. Looking for object1 will then not turn up
> with object2, even though they are equal.

In this case this is the behavior I want.

> If this is something you don't
> care about, and all you care about is identity, then I'd derive the hash
> from each object's ID.

This won't work, as objects of the same type that compare equal should
(and do, in my code) hash equal.

> This ID has another property which is something that is assumed for hashes,
> and your code seems a bit to get that wrong, too, and that is that the hash
> must not change.

The hash does not change on the instances, and is the same for all
instances of my type that compare equal.

~Ethan~

Ethan Furman

unread,
May 20, 2011, 2:46:08 PM5/20/11
to python list
Ethan Furman wrote:
> Several folk have said that objects that compare equal must hash equal,
> and the docs also state this
> http://docs.python.org/dev/reference/datamodel.html#object.__hash__


Two things I didn't make clear originally:

I'm using Python3.

My objects (of type Wierd) obey the premise of comparing equal also
meaning hashing equal (with other objects of type Wierd).

Perhaps my question could be narrowed down to:

Should the docs actually say that "objects of the same *type* that
compare equal must hash equal", or is there an underlying reason that
objects of *different types* that happen to compare equal *must not*
have different hashes?

In other words, is the fact that everything tried so far in Python3 to
break my toy code has failed to do so just an implementation detail of
Python3?

~Ethan~

Chris Rebert

unread,
May 20, 2011, 3:00:43 PM5/20/11
to Ethan Furman, python list
On Fri, May 20, 2011 at 10:56 AM, Ethan Furman <et...@stoneleaf.us> wrote:
> Chris Rebert wrote:
>> On Thu, May 19, 2011 at 10:43 PM, Ethan Furman <et...@stoneleaf.us> wrote:
>>> Several folk have said that objects that compare equal must hash equal,
>>> and
>>> the docs also state this
>>> http://docs.python.org/dev/reference/datamodel.html#object.__hash__
>>>
>>> I'm hoping somebody can tell me what horrible thing will happen if this
>>> isn't the case?
<snip>

>> Here's a more common/plausible "horrible" case closer to what the docs
>> writers had in mind:
>> --> class Naughty(object):
>> ...     def __init__(self, n):
>> ...         self.n = n
>> ...     def __eq__(self, other):

>> ...         return self.n == other.n
>> ...
>> --> Naughty(5) == Naughty(5)
>> True
>> --> Naughty(5) is Naughty(5)
>> False
>> --> bad = Naughty(3)
>> --> y = {bad : 'foo'}
>> --> y[bad] # just happens to work
>> 'foo'
>> --> del bad
>> --> # ok, how do we get to 'foo' now?
>> --> y[Naughty(3)] # try the obvious way
>> Traceback (most recent call last):
>>  File "<stdin>", line 1, in <module>
>> KeyError: <__main__.Naughty object at 0x2a1cb0>
>> --> # We're screwed.
>>
>> Naughty instances (and similar) can't be used sensibly as hash keys
>> (unless you /only/ care about object identity; this is often not the
>> case).
>
> I tried this sequence (using Python 3, BTW -- forgot to mention that little
> tidbit -- sorry!):

Doesn't matter anyway.

> --> del two
> --> two
> Traceback (most recent call last):
>  File "<stdin>", line 1, in <module>
> NameError: name 'two' is not defined
> --> d
> {<__main__.Wierd object at 0x00C0C950>: '3',
>  1: '1.0',
>  2: '2.0',
>  3: '3.0',
>  <__main__.Wierd object at 0x00B3AC10>: '2',
>  <__main__.Wierd object at 0x00B32E90>: '1'}
> --> new_two = Wierd(2)
> --> d[new_two]
> '2'

Right, this is why I went to the trouble of writing Naughty instead of
using Wierd. Wierd's exact problem is less common and less severe. The
"equality implies identical hash" rule is not a universal one; some
other languages instead impose the lesser requirement of "equality and
same (or related) types implies identical hash". In Ruby for instance:
irb(main):001:0> 1 == 1.0
=> true
irb(main):002:0> a = {1.0 => 'hi'} # float key
=> {1.0=>"hi"}
irb(main):003:0> a[1] = 'bye' # int key
=> "bye"
irb(main):004:0> a # notice how they don't collide:
=> {1=>"bye", 1.0=>"hi"}
(Contrast this with my earlier analogous Python example.)

Basically, Naughty is fundamentally broken [hash(Naughty(2)) !=
hash(Naughty(2))], whereas Wierd merely defies convention [hash(2) !=
hash(Wierd(2)) but hash(Wierd(2)) == hash(Wierd(2))].

Christian Heimes

unread,
May 20, 2011, 3:01:22 PM5/20/11
to pytho...@python.org
Am 20.05.2011 17:50, schrieb MRAB:
> Is this strictly true? I thought that the hash value, an integer, is
> moduloed (Is that how you spell it? Looks weird!) with the number of
> array elements to give an index into the array, so different hashes
> could give the same index, and objects with different hashes could be
> stored in the same 'bucket'.

I don't think 'moduloed' is an existing word but your description is
mostly correct. The hash of the object and length of the hash table are
used to calculate the position in the hash table. However Python's
implementation doesn't use buckets to reduce memory usage and pointer
dereferencing. If a slot in the hash table is already filled with an
object that is not equal to the new object (a collision), the hash is
shifted and the new slot is checked. The implementation detail is well
described in Modules/dictobject.c.

Christian

MRAB

unread,
May 20, 2011, 4:17:29 PM5/20/11
to pytho...@python.org
A brief search on the web found a use of the word in 1982.

Peter Otten

unread,
May 20, 2011, 4:25:27 PM5/20/11
to pytho...@python.org
Ethan Furman wrote:

> My apologies -- I'm trying this in Python3:

Then you have to convert the keys to a list explicitly:

>>> class Weird:


... def __init__(self, value):
... self.value = value
... def __eq__(self, other):
... return self.value == other
... def __hash__(self):
... return hash((self.value + 13) **3)
...

>>> d = {Weird(1): 0}


>>> 1 in d
False
>>> 1 in d.keys()

False
>>> 1 in list(d.keys())
True


Ethan Furman

unread,
May 20, 2011, 5:48:27 PM5/20/11
to Peter Otten, pytho...@python.org
Peter Otten wrote:
> Ethan Furman wrote:
>
>> Peter Otten wrote:
>>> Ethan Furman wrote:
>>>
>>>> Several folk have said that objects that compare equal must hash equal,
>>>> and the docs also state this
>>>> http://docs.python.org/dev/reference/datamodel.html#object.__hash__
>>>>
>>>> --> class Wierd():
>>>> ... def __init__(self, value):
>>>> ... self.value = value
>>>> ... def __eq__(self, other):
>>>> ... return self.value == other
>>>> ... def __hash__(self):
>>>> ... return hash((self.value + 13) ** 3)
>>>> ...
>>> Try this:
>>>
>>>>>> d = {Wierd(1): 0}
>>>>>> 1 in d
>>> False
>>>>>> 1 in d.keys()
>>> True
>>>
>> My apologies -- I'm trying this in Python3:
>
> Then you have to convert the keys to a list explicitly:
>
>>>> class Weird:
> ... def __init__(self, value):
> ... self.value = value
> ... def __eq__(self, other):
> ... return self.value == other
> ... def __hash__(self):
> ... return hash((self.value + 13) **3)
> ...
>>>> d = {Weird(1): 0}

>>>> 1 in d
> False
>>>> 1 in d.keys()
> False
>>>> 1 in list(d.keys())
> True

Ah!! The light finally dawned! Many thanks for everyone's input.

So if Wierd has a need to compare equal to some other type, it should
implement a .equals() method. Gotcha.

Likewise, if two different type's instances can compare equal, then for
the most part they should be interchangeable.

~Ethan~

Steven D'Aprano

unread,
May 20, 2011, 8:47:56 PM5/20/11
to
On Fri, 20 May 2011 21:17:29 +0100, MRAB wrote:

> On 20/05/2011 20:01, Christian Heimes wrote:
>> Am 20.05.2011 17:50, schrieb MRAB:
>>> Is this strictly true? I thought that the hash value, an integer, is

>>> moduloed (Is that how you spell it? Looks weird!) ...


>>
>> I don't think 'moduloed' is an existing word but your description is

>> mostly correct. ...


>>
> A brief search on the web found a use of the word in 1982.

All that means is that two people, three decades apart, used the same non-
word :)

I think you are treating "modulo" as a verb, equivalent to division,
hence:

a/b => a is divided by b
a%b => a is "moduloed" by b

But modulo is not a verb. It is a preposition, a modifier word. Just as
you might say "the cat sat on the mat" (cat on mat) or "the Princess
found a pea underneath her mattress" (pea underneath mattress) so
mathematicians will say "a is taken modulo b" (a modulo b).

English verbs nouns at the drop of a hat, but I've never heard of it
verbing propositions:

"The princess underneathed the pea."

No, I don't think so.

English does use "remainder" as a verb, although not in the mathematical
sense; I think that:

a%b => a is remaindered by b

is at least grammatical, although still ugly and awkward. I'm afraid that
in English, the best way to say what you are trying to say is moderately
verbose:

"the hash value, an integer, is taken modulo ..."


--
Steven

MRAB

unread,
May 20, 2011, 9:02:48 PM5/20/11
to pytho...@python.org
On 21/05/2011 01:47, Steven D'Aprano wrote:
> On Fri, 20 May 2011 21:17:29 +0100, MRAB wrote:
>
>> On 20/05/2011 20:01, Christian Heimes wrote:
>>> Am 20.05.2011 17:50, schrieb MRAB:
>>>> Is this strictly true? I thought that the hash value, an integer, is
>>>> moduloed (Is that how you spell it? Looks weird!) ...
>>>
>>> I don't think 'moduloed' is an existing word but your description is
>>> mostly correct. ...
>>>
>> A brief search on the web found a use of the word in 1982.
>
> All that means is that two people, three decades apart, used the same non-
> word :)
>
[snip]
There were other uses. That's just the earliest one I found.

Steven D'Aprano

unread,
May 20, 2011, 9:55:18 PM5/20/11
to


Nevertheless, it is still ungrammatical and incorrect usage. I'm not a
prescriptivist, but not everything people write down is a word, otherwise
we'd be forcefied to say evert typlo and mystake wsa an actul wrd.

--
Steven

Gregory Ewing

unread,
May 21, 2011, 4:24:21 AM5/21/11
to
Ethan Furman wrote:

> Ulrich Eckhardt wrote:
>
>> If two equal objects have different hashes, they
>> will be stored in different places in the hash map. Looking for
>> object1 will then not turn up with object2, even though they are equal.
>
> In this case this is the behavior I want.

You can't rely on it, though. The hash value gets reduced
modulo the size of the dict, so even if two objects have
different hashes, in some cases they will land on the same
dict slot anyway.

So an object such as you're postulating would behave
unpredictably when used as a dict key. Sometimes a lookup
using a different but equal object would find it, and
sometimes not, seemingly at random.

--
Greg

Peter Otten

unread,
May 21, 2011, 5:24:40 AM5/21/11
to
Gregory Ewing wrote:

I think for every potential match the current dict implementation checks
identity, then hashes -- and equality only if the hash values are equal. The
relevant function is lookdict() in dictobject.c.

While that means that the problem you describe cannot occur a simple
approach that avoids relying on an implementation detail (?) would be to use
(hash(obj), obj) tuples instead of just obj as the key.

John Nagle

unread,
May 21, 2011, 6:55:56 PM5/21/11
to
On 5/19/2011 11:33 PM, Ulrich Eckhardt wrote:

> For that reason, it is generally useful to use immutable types like
> integers, floats, strings and tuples thereof as keys. Since you can't change
> them, you basically have the guarantee that they hash the same.

Right. It's something of a lack that Python doesn't
have user-defined immutable objects.

John Nagle

Irmen de Jong

unread,
May 21, 2011, 7:07:28 PM5/21/11
to

collections.namedtuple is rather powerful though... and can be used as key AFAIK.

Irmen

Steven D'Aprano

unread,
May 21, 2011, 8:24:35 PM5/21/11
to


Although it's not that hard to write your own immutable (-ish) objects.

http://northernplanets.blogspot.com/2007/01/immutable-instances-in-python.html


Or see the Decimal and Fraction classes in the standard library.


--
Steven

0 new messages