--> huh = bytes(5)
Since the bytes type is actually a list of integers, I would have
expected this to have huh being a bytestring with one element -- the
integer 5. Actually, what you get is:
--> huh
b'\x00\x00\x00\x00\x00'
or five null bytes. Note that this is an immutable type, so you cannot
go in later and say
--> huh[3] = 9
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: 'bytes' object does not support item assignment
So, out of curiosity, does anyone actually use this, um, feature?
~Ethan~
I suppose it's for interoperability with the mutable bytearray type,
which takes the same parameters in the constructor.
Producing a bytestring of zero bytes might have its uses, but because
Python lets me do coding at a high level (lists, dicts, etc), I've
never used that feature.
BTW, help(bytes) doesn't seem to mention it!
On 05/17/2011 02:47 PM, Ethan Furman wrote:
> In Python 3 one can say
>
> --> huh = bytes(5)
>
> Since the bytes type is actually a list of integers, I would have
> expected this to have huh being a bytestring with one element -- the
> integer 5. Actually, what you get is:
>
> --> huh
> b'\x00\x00\x00\x00\x00'
>
> or five null bytes. Note that this is an immutable type, so you cannot
> go in later and say
For the bytes to actually be a 'list of integers', you need to pass it
an iterable, ex:
>>> bytes([5, 6, 1, 3])
b'\x05\x06\x01\x03'
- From help(bytes):
| bytes(iterable_of_ints) -> bytes
| bytes(string, encoding[, errors]) -> bytes
| bytes(bytes_or_buffer) -> immutable copy of bytes_or_buffer
| bytes(memory_view) -> bytes
Looks like you're using the fourth when you want the first, possibly?
- --
Corey Richardson
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.17 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/
iQEcBAEBAgAGBQJN0tF1AAoJEAFAbo/KNFvp41AH/1l2zR6XVOJ0xM7s2P+PDYZX
OAhmi19hFEP0zQoWiW3TiMEVPlaqgtipPCp1t+jTeNNN3F+H4NG2DHJJZ3dPDr2J
CpABQKyS4MJQTUxhCIlXqAaA2I1pejzAv6fwsF66/zPFmyaTAJLDP+3WMQvCUUoZ
5A3qHgHNp6vBHXd13RNdQStLeprfQptA+z6XdiJPos348ecRj/u9id7v28dwxxsm
d9WA6oYwJ+Y/NcG2OP0Flyp3Zc3hymVsv5vhmhG2+EiIrxMn95k8ImsKLEhvUW3a
72CxlE6EaOMD4MuWyeGMS33c0vHwtAvEIE7M56R2FAl8EsUFwP2swaij0tEiemg=
=8MRV
-----END PGP SIGNATURE-----
So they do. Although that particular example doesn't work since b'0' is
the integer 48...
--> huh.replace(b'00',b'12')
b'\x00\x00\x00\x00\x00'
The big question, though, is would you do it this way:
some_var = bytes(23).replace(b'\x00', b'a')
or this way?
some_var = bytes(b'a' * 23)
~Ethan~
Not so.
--> huh = b'abcedfg'
--> huh[3]
101
It's a list of int's.
> - From help(bytes):
> | bytes(iterable_of_ints) -> bytes
> | bytes(string, encoding[, errors]) -> bytes
> | bytes(bytes_or_buffer) -> immutable copy of bytes_or_buffer
> | bytes(memory_view) -> bytes
>
> Looks like you're using the fourth when you want the first, possibly?
Nope. Apparently, it's not well documented. If you check PEP 358
you'll find it.
~Ethan~
Nope, he's using the fifth form, bytes(int), which is listed in the
PEP but not in the help.
Actually, I would just do it this way:
some_var = b'a' * 23
That's already a bytes object. Passing it into the constructor is redundant.
On 05/17/2011 04:55 PM, Ethan Furman wrote:
> Apparently, it's not well documented. If you check PEP 358
> you'll find it.
>
> ~Ethan~
Agreed, it looks like it should be mentioned in bytes.__doc__ about the
single-integer argument.
- --
Corey Richardson
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.17 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/
iQEcBAEBAgAGBQJN0ugoAAoJEAFAbo/KNFvphIsH/3LOzN8+D98D0+nm6m8sfCUC
f+KfgTLITAmecYOuOBym1snl6qj2JnZkGYRW6M2O5NV8arNJ1dHty3dPbwMeKdfH
67m2a0UgHcwqv5M5VGNQQYTQ03Mzqy+A84MMvBKWUQ0nxZRCkPMtdxm2T4/UEVLx
uelDPOdOWB1PDmc3sNUDPovXeOFlTKmcQ5yfolyrdLFU/KmbamgRSltpBFEbyInO
4KI3hoGka4PVaaBLf9QPjFC6tBu4QdQ4UTnWD3sy78LA3KPsa5MEpXFXctwJkJ+O
q2Y7SWOPJDz19V+MT87Aeu69YpzxWwkp4fBflNxYaQUoJqNlzIfRkavUzZ0zfMQ=
=E2qm
-----END PGP SIGNATURE-----
Heh, good point.
~Ethan~
> BTW, help(bytes) doesn't seem to mention it!
I believe I mentioned that on some tracker issue.
--
Terry Jan Reedy
It will be when some volunteer writes a patch.
http://bugs.python.org/issue11231
--
Terry Jan Reedy
However, as I just discovered, it works well when dealing with a
bytearray object:
some_var = bytearray(b' ' * size) # want space initialized, not null
~Ethan~
I'm hoping somebody can tell me what horrible thing will happen if this
isn't the case? Here's a toy example of a class I'm thinking of writing
that will compare equal with int's, but hash differently:
--> class Wierd():
... def __init__(self, value):
... self.value = value
... def __eq__(self, other):
... return self.value == other
... def __hash__(self):
... return hash((self.value + 13) ** 3)
...
--> one = Wierd(1)
--> two = Wierd(2)
--> three = Wierd(3)
--> one
<Wierd object at 0x00BFE710>
--> one == 1
True
--> one == 2
False
--> two == 2
True
--> three == 3
True
--> d = dict()
--> d[one] = '1'
--> d[two] = '2'
--> d[three] = '3'
--> d
{<Wierd object at 0x00BFE710>: '1',
<Wierd object at 0x00BFE870>: '3',
<Wierd object at 0x00BFE830>: '2'}
--> d[1] = '1.0'
--> d[2] = '2.0'
--> d[3] = '3.0'
--> d
{<Wierd object at 0x00BFE870>: '3',
1: '1.0',
2: '2.0',
3: '3.0',
<Wierd object at 0x00BFE830>: '2',
<Wierd object at 0x00BFE710>: '1'}
--> d[2]
'2.0'
--> d[two]
'2'
All information greatly appreciated!
~Ethan~
> Several folk have said that objects that compare equal must hash equal,
> and the docs also state this
> http://docs.python.org/dev/reference/datamodel.html#object.__hash__
>
> I'm hoping somebody can tell me what horrible thing will happen if this
> isn't the case? Here's a toy example of a class I'm thinking of writing
> that will compare equal with int's, but hash differently:
>
> --> class Wierd():
> ... def __init__(self, value):
> ... self.value = value
> ... def __eq__(self, other):
> ... return self.value == other
> ... def __hash__(self):
> ... return hash((self.value + 13) ** 3)
> ...
Try this:
>>> d = {Wierd(1): 0}
>>> 1 in d
False
>>> 1 in d.keys()
True
This is the part considered "horrible":
> --> d
> {<Wierd object at 0x00BFE870>: '3',
> 1: '1.0',
> 2: '2.0',
> 3: '3.0',
> <Wierd object at 0x00BFE830>: '2',
> <Wierd object at 0x00BFE710>: '1'}
Compare:
>>> x = {5.0 : 'foo'}
>>> x[5]
'foo'
Here's a more common/plausible "horrible" case closer to what the docs
writers had in mind:
>>> class Naughty(object):
... def __init__(self, n):
... self.n = n
... def __eq__(self, other):
... return self.n == other.n
...
>>> Naughty(5) == Naughty(5)
True
>>> Naughty(5) is Naughty(5)
False
>>> bad = Naughty(3)
>>> y = {bad : 'foo'}
>>> y[bad] # just happens to work
'foo'
>>> del bad
>>> # ok, how do we get to 'foo' now?
>>> y[Naughty(3)] # try the obvious way
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
KeyError: <__main__.Naughty object at 0x2a1cb0>
>>> # We're screwed.
Naughty instances (and similar) can't be used sensibly as hash keys
(unless you /only/ care about object identity; this is often not the
case).
Cheers,
Chris
--
http://rebertia.com
If you were familiar with what a hash map is, you wouldn't ask. The thing is
that the hash is used to look up the place in the map where the thing is
stored. If two equal objects have different hashes, they will be stored in
different places in the hash map. Looking for object1 will then not turn up
with object2, even though they are equal. If this is something you don't
care about, and all you care about is identity, then I'd derive the hash
from each object's ID.
This ID has another property which is something that is assumed for hashes,
and your code seems a bit to get that wrong, too, and that is that the hash
must not change. Again, the reason is that if the hash changes, the position
in the hash map changes, too. If you then try to look up the changed object,
it will look for it in the new place, but it won't be found because it is in
the old place.
For that reason, it is generally useful to use immutable types like
integers, floats, strings and tuples thereof as keys. Since you can't change
them, you basically have the guarantee that they hash the same.
Uli
--
Domino Laser GmbH
Geschäftsführer: Thorsten Föcking, Amtsgericht Hamburg HR B62 932
There can always be hash collisions between different objects, but the
assumption is that two identical objects will _always_ "collide".
Chris Angelico
I tried this sequence (using Python 3, BTW -- forgot to mention that
little tidbit -- sorry!):
--> del two
--> two
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
NameError: name 'two' is not defined
--> d
{<__main__.Wierd object at 0x00C0C950>: '3',
1: '1.0',
2: '2.0',
3: '3.0',
<__main__.Wierd object at 0x00B3AC10>: '2',
<__main__.Wierd object at 0x00B32E90>: '1'}
--> new_two = Wierd(2)
--> d[new_two]
'2'
~Ethan~
My apologies -- I'm trying this in Python3:
--> two in d
True
--> two in d.keys()
True
-->
--> 2 in d
True
--> 2 in d.keys()
True
~Ethan~
In CPython, no. Even when you get a hash collision, the code checks
to see whether the hashes are actually equal before it calls the rich
comparison, the former check being a much faster operation since the
hash values are cached.
I'm not sure whether this can be counted on for all Python
implementations, though.
In this case this is the behavior I want.
> If this is something you don't
> care about, and all you care about is identity, then I'd derive the hash
> from each object's ID.
This won't work, as objects of the same type that compare equal should
(and do, in my code) hash equal.
> This ID has another property which is something that is assumed for hashes,
> and your code seems a bit to get that wrong, too, and that is that the hash
> must not change.
The hash does not change on the instances, and is the same for all
instances of my type that compare equal.
~Ethan~
Two things I didn't make clear originally:
I'm using Python3.
My objects (of type Wierd) obey the premise of comparing equal also
meaning hashing equal (with other objects of type Wierd).
Perhaps my question could be narrowed down to:
Should the docs actually say that "objects of the same *type* that
compare equal must hash equal", or is there an underlying reason that
objects of *different types* that happen to compare equal *must not*
have different hashes?
In other words, is the fact that everything tried so far in Python3 to
break my toy code has failed to do so just an implementation detail of
Python3?
~Ethan~
Doesn't matter anyway.
> --> del two
> --> two
> Traceback (most recent call last):
> File "<stdin>", line 1, in <module>
> NameError: name 'two' is not defined
> --> d
> {<__main__.Wierd object at 0x00C0C950>: '3',
> 1: '1.0',
> 2: '2.0',
> 3: '3.0',
> <__main__.Wierd object at 0x00B3AC10>: '2',
> <__main__.Wierd object at 0x00B32E90>: '1'}
> --> new_two = Wierd(2)
> --> d[new_two]
> '2'
Right, this is why I went to the trouble of writing Naughty instead of
using Wierd. Wierd's exact problem is less common and less severe. The
"equality implies identical hash" rule is not a universal one; some
other languages instead impose the lesser requirement of "equality and
same (or related) types implies identical hash". In Ruby for instance:
irb(main):001:0> 1 == 1.0
=> true
irb(main):002:0> a = {1.0 => 'hi'} # float key
=> {1.0=>"hi"}
irb(main):003:0> a[1] = 'bye' # int key
=> "bye"
irb(main):004:0> a # notice how they don't collide:
=> {1=>"bye", 1.0=>"hi"}
(Contrast this with my earlier analogous Python example.)
Basically, Naughty is fundamentally broken [hash(Naughty(2)) !=
hash(Naughty(2))], whereas Wierd merely defies convention [hash(2) !=
hash(Wierd(2)) but hash(Wierd(2)) == hash(Wierd(2))].
I don't think 'moduloed' is an existing word but your description is
mostly correct. The hash of the object and length of the hash table are
used to calculate the position in the hash table. However Python's
implementation doesn't use buckets to reduce memory usage and pointer
dereferencing. If a slot in the hash table is already filled with an
object that is not equal to the new object (a collision), the hash is
shifted and the new slot is checked. The implementation detail is well
described in Modules/dictobject.c.
Christian
> My apologies -- I'm trying this in Python3:
Then you have to convert the keys to a list explicitly:
>>> class Weird:
... def __init__(self, value):
... self.value = value
... def __eq__(self, other):
... return self.value == other
... def __hash__(self):
... return hash((self.value + 13) **3)
...
>>> d = {Weird(1): 0}
>>> 1 in d
False
>>> 1 in d.keys()
False
>>> 1 in list(d.keys())
True
Ah!! The light finally dawned! Many thanks for everyone's input.
So if Wierd has a need to compare equal to some other type, it should
implement a .equals() method. Gotcha.
Likewise, if two different type's instances can compare equal, then for
the most part they should be interchangeable.
~Ethan~
> On 20/05/2011 20:01, Christian Heimes wrote:
>> Am 20.05.2011 17:50, schrieb MRAB:
>>> Is this strictly true? I thought that the hash value, an integer, is
>>> moduloed (Is that how you spell it? Looks weird!) ...
>>
>> I don't think 'moduloed' is an existing word but your description is
>> mostly correct. ...
>>
> A brief search on the web found a use of the word in 1982.
All that means is that two people, three decades apart, used the same non-
word :)
I think you are treating "modulo" as a verb, equivalent to division,
hence:
a/b => a is divided by b
a%b => a is "moduloed" by b
But modulo is not a verb. It is a preposition, a modifier word. Just as
you might say "the cat sat on the mat" (cat on mat) or "the Princess
found a pea underneath her mattress" (pea underneath mattress) so
mathematicians will say "a is taken modulo b" (a modulo b).
English verbs nouns at the drop of a hat, but I've never heard of it
verbing propositions:
"The princess underneathed the pea."
No, I don't think so.
English does use "remainder" as a verb, although not in the mathematical
sense; I think that:
a%b => a is remaindered by b
is at least grammatical, although still ugly and awkward. I'm afraid that
in English, the best way to say what you are trying to say is moderately
verbose:
"the hash value, an integer, is taken modulo ..."
--
Steven
Nevertheless, it is still ungrammatical and incorrect usage. I'm not a
prescriptivist, but not everything people write down is a word, otherwise
we'd be forcefied to say evert typlo and mystake wsa an actul wrd.
--
Steven
You can't rely on it, though. The hash value gets reduced
modulo the size of the dict, so even if two objects have
different hashes, in some cases they will land on the same
dict slot anyway.
So an object such as you're postulating would behave
unpredictably when used as a dict key. Sometimes a lookup
using a different but equal object would find it, and
sometimes not, seemingly at random.
--
Greg
I think for every potential match the current dict implementation checks
identity, then hashes -- and equality only if the hash values are equal. The
relevant function is lookdict() in dictobject.c.
While that means that the problem you describe cannot occur a simple
approach that avoids relying on an implementation detail (?) would be to use
(hash(obj), obj) tuples instead of just obj as the key.
> For that reason, it is generally useful to use immutable types like
> integers, floats, strings and tuples thereof as keys. Since you can't change
> them, you basically have the guarantee that they hash the same.
Right. It's something of a lack that Python doesn't
have user-defined immutable objects.
John Nagle
collections.namedtuple is rather powerful though... and can be used as key AFAIK.
Irmen
Although it's not that hard to write your own immutable (-ish) objects.
http://northernplanets.blogspot.com/2007/01/immutable-instances-in-python.html
Or see the Decimal and Fraction classes in the standard library.
--
Steven