1 view

Skip to first unread message

Sep 5, 2008, 10:38:41 AM9/5/08

to

Hi,

I need to hash arrays of integers (from the hash module).

So, I have to derive from array and supply a __hash__ method.

But how to hash an array (of fixed length, say 25)?

What I need is a function which maps a tuple of 25 integers

into 1 integer with good hashing properties.

Does anybody know such a thing?

Many thanks for a hint,

Helmut Jarausch

Lehrstuhl fuer Numerische Mathematik

RWTH - Aachen University

D 52056 Aachen, Germany

Sep 5, 2008, 10:57:12 AM9/5/08

to

Helmut Jarausch wrote:

> I need to hash arrays of integers (from the hash module).

>

> So, I have to derive from array and supply a __hash__ method.

> But how to hash an array (of fixed length, say 25)?

> What I need is a function which maps a tuple of 25 integers

> into 1 integer with good hashing properties.

>

> Does anybody know such a thing?

Have you tried this already?

def __hash__(self):

return hash(self.tostring())

Peter

Sep 5, 2008, 11:18:52 AM9/5/08

to

Helmut Jarausch:

> I need to hash arrays of integers (from the hash module).

> I need to hash arrays of integers (from the hash module).

One of the possible solutions is to hash the equivalent tuple, but it

requires some memory (your sequence must not be tuples already):

assert not isinstance(somelist, tuple)

hash(tuple(somelist))

This is an alternative solution, it doesn't use much memory, but I am

not sure it works correctly:

from operator import xor

hash(reduce(xor, somelist))

Bye,

bearophile

Sep 5, 2008, 11:49:28 AM9/5/08

to

On Sep 5, 11:18 am, bearophileH...@lycos.com wrote:

> Helmut Jarausch:

>

> > I need to hash arrays of integers (from the hash module).

>

> One of the possible solutions is to hash the equivalent tuple, but it

> requires some memory (your sequence must not be tuples already):

> Helmut Jarausch:

>

> > I need to hash arrays of integers (from the hash module).

>

> One of the possible solutions is to hash the equivalent tuple, but it

> requires some memory (your sequence must not be tuples already):

why can't it be tuple already? Doesn't matter:

>>> from numpy import arange

>>> a=arange(5)

>>> a

array([0, 1, 2, 3, 4])

>>> hash(a)

Traceback (most recent call last):

File "<stdin>", line 1, in ?

TypeError: unhashable type

>>> b=tuple(a)

>>> b

(0, 1, 2, 3, 4)

>>> c=tuple(b)

>>> c

(0, 1, 2, 3, 4)

>>> hash(c)

1286958229

you can discard the tuple, so the memory requirement is transient.

Sep 5, 2008, 12:45:37 PM9/5/08

to

Michael Palmer:

> why can't it be tuple already?

> why can't it be tuple already?

Because if the input list L has tuples and lists, they end having the

same hash value:

>>> L = [[1,2,3], (1,2,3)]

>>> hash(tuple(L[0])), hash(tuple(L[1]))

(-378539185, -378539185)

But it's a not much common situation, and few hash collision pairs

can't damage much, so I agree with you that my assert was useless.

This may solve that problem anyway:

hash(type(L)) ^ hash(tuple(L))

Generally a good hashing functions uses all the input information. If

you use tuple() you ignore part of the input information, that is the

type of L. So xor-ing hash(type(L)) you use that information too.

> you can discard the tuple, so the memory requirement is transient.

Right, but there's lot of GC action, it may slow down the code. So you

can start using hash(tuple(L)), but if later the code performance

comes out bad, you may try a different version that creates less

intermediate garbage.

Bye,

bearophile

Sep 5, 2008, 4:58:00 PM9/5/08

to

On Sep 6, 2:45 am, bearophileH...@lycos.com wrote:

> Michael Palmer:

>

> > why can't it be tuple already?

>

> Because if the input list L has tuples and lists, they end having the

> same hash value:

> Michael Palmer:

>

> > why can't it be tuple already?

>

> Because if the input list L has tuples and lists, they end having the

> same hash value:

Perhaps the OP shouldn't be constructing the hash of a mutable object.

Perhaps he would be grateful if his hash function raised an exception

instead of laboriously masking the problem.

>

> >>> L = [[1,2,3], (1,2,3)]

> >>> hash(tuple(L[0])), hash(tuple(L[1]))

>

> (-378539185, -378539185)

>

> But it's a not much common situation, and few hash collision pairs

> can't damage much, so I agree with you that my assert was useless.

> This may solve that problem anyway:

>

> hash(type(L)) ^ hash(tuple(L))

Consider this:

>>> hash(123) == hash(123.0) == hash(123L)

True

Perhaps the OP (who hasn't stated what he is going to use the hash

results for) needs to use only the values in his hash, and would be if

not highly delighted then at least blithely unconcerned if it turned

out that [1, 2, 3] and (1, 2, 3) had the same hash.

>

> Generally a good hashing functions uses all the input information. If

> you use tuple() you ignore part of the input information, that is the

> type of L. So xor-ing hash(type(L)) you use that information too.

>

Try "uses all the information that is relevant to the task".

Your alternative solution using reduce and xor may have suboptimal

characteristics ... xor_hash((1, 2, 3)) == xor_hash((1, 3, 2)) ==

xor_hash((2, 1, 3)) etc. While the docs for __hash__ say "it is

advised to somehow mix together (e.g., using exclusive or) the hash

values for the components of the object", in practice "somehow" is

rather more elaborate than xor. Have a look at the tuplehash function

in .../Objects/tupleobject.c. If the order of the values in the tuple

doesn't matter, then perhaps the OP really should be using a set (or a

bag).

Cheers,

John

Sep 5, 2008, 5:49:21 PM9/5/08

to

John Machin:

> Consider this:>>> hash(123) == hash(123.0) == hash(123L)

> True

> Consider this:>>> hash(123) == hash(123.0) == hash(123L)

> True

Right... Can you explain me why Python designers have chosen to build

a hash() like that?

> Try "uses all the information that is relevant to the task".

My knowledge of hash data structures seems not enough to understand

why.

> Your alternative solution using reduce and xor may have suboptimal

> characteristics ...

Right, sorry.

Bye,

bearophile

Sep 5, 2008, 7:30:35 PM9/5/08

to

On Sep 6, 7:49 am, bearophileH...@lycos.com wrote:

> John Machin:

>

> > Consider this:>>> hash(123) == hash(123.0) == hash(123L)

> > True

>

> Right... Can you explain me why Python designers have chosen to build

> a hash() like that?

> John Machin:

>

> > Consider this:>>> hash(123) == hash(123.0) == hash(123L)

> > True

>

> Right... Can you explain me why Python designers have chosen to build

> a hash() like that?

I can't channel them; my rationalisation is this:

Following the Law of Least Astonishment,

>> 123 == 123.0 == 123L

True

Consequently if x == y, then adict[x] and adict[y] should give the

same result.

Cheers,

John

Sep 5, 2008, 7:55:27 PM9/5/08

to

Another reason for not folding in the type of the object is this:

>>> type([])

<type 'list'>

>>> hash(type([]))

505252536

>>> id(type([]))

505252536

IOW hash(T) == id(T) where T is a type. id(obj) is just a memory

address which can vary between executions of the same Python binary on

the same machine ... not very reproducible. There is no guarantee in

the docs for hash about under what circumstances hash(x) != hash(x) of

course; I'm just relying on the least astonishment law again :-)

And, again, we don't know what the OP's full requirements are ...

Sep 8, 2008, 4:53:02 PM9/8/08

to pytho...@python.org

bearoph...@lycos.com wrote:

> John Machin:

>> Consider this:>>> hash(123) == hash(123.0) == hash(123L)

>> True

>

> Right... Can you explain me why Python designers have chosen to build

> a hash() like that?

> John Machin:

>> Consider this:>>> hash(123) == hash(123.0) == hash(123L)

>> True

>

> Right... Can you explain me why Python designers have chosen to build

> a hash() like that?

Because that's the kind of hash that dicts expect. If two objects are equal

(i.e. (x==y) is True), they need to have their hash values equal as well. In

order to do a lookup into a dict, it will hash the key and search the table for

a that hash value. If there are multiple keys with the same hash value, then the

dict will compare the keys by value to find a match. Since (123==123.0==123L),

they must also have the same hash value such that

{123.0: 'got it'}[123] == 'got it'

--

Robert Kern

"I have come to believe that the whole world is an enigma, a harmless enigma

that is made terrible by our own mad attempt to interpret it as though it had

an underlying truth."

-- Umberto Eco

Sep 12, 2008, 2:05:56 AM9/12/08

to

On Sep 5, 9:38 am, Helmut Jarausch <jarau...@skynet.be> wrote:

> Hi,

>

> I need to hash arrays of integers (from the hash module).

>

> So, I have to derive from array and supply a __hash__ method.

> Hi,

>

> I need to hash arrays of integers (from the hash module).

>

> So, I have to derive from array and supply a __hash__ method.

I don't believe you need to derive from an array.

Here are two simple and well known hash functions you can use readily:

def djbhash(a):

"""Hash function from D J Bernstein"""

h = 5381L

for i in a:

t = (h * 33) & 0xffffffffL

h = t ^ i

return h

def fnvhash(a):

"""Fowler, Noll, Vo Hash function"""

h = 2166136261

for i in a:

t = (h * 16777619) & 0xffffffffL

h = t ^ i

return h

if __name__ == '__main__':

arr = [1001, 3001, 5001, 9001, 10011, 10013, 10015, 10017, 10019,

20011, 23001]

print djbhash(arr)

print fnvhash(arr)

And finally, here is an excellent page that explains hash functions:

http://eternallyconfuzzled.com/tuts/algorithms/jsw_tut_hashing.aspx

Here is Noll's page where he explains the FNV Hash:

http://www.isthe.com/chongo/tech/comp/fnv/

Hope this helps,

--

Sudhi

Reply all

Reply to author

Forward

0 new messages

Search

Clear search

Close search

Google apps

Main menu