Hash function: migrating from Python 2.5 to Python 2.7

148 views
Skip to first unread message

Andrin von Rechenberg

unread,
Feb 20, 2012, 4:50:57 AM2/20/12
to appengin...@googlegroups.com, google-a...@googlegroups.com
The built-in hash function differs from Python2.5 to Python2.7.

If you are planning to migrate from Python 2.5 to Python 2.7 and
you were dumb enough to store the value of python 2.5's hash
function, just like I did, you might be interested in the following
code snippet.

This is the hash function that Google uses in production for python 2.5:


def c_mul(a, b):
 
return eval(hex((long(a) * b) & 0xFFFFFFFFFFFFFFFFL)[:-1])

def py25hash(self):
 
if not self:
   
return 0 # empty
  value
= ord(self[0]) << 7
 
for char in self:
    value
= c_mul(1000003, value) ^ ord(char)
  value
= value ^ len(self)
 
if value == -1:
    value
= -2
 
return value

I wonder why Google is not using 64bit hashes in Python 2.7

Cheers,
-Andrin

Anand Mistry

unread,
Feb 20, 2012, 6:15:20 AM2/20/12
to google-a...@googlegroups.com, appengin...@googlegroups.com
Where is this hash function coming from and what is it used for? For python27, we haven't made any changes to the interpreter/standard library to change the hash function. What you see on python.org is what you get.

Andrin von Rechenberg

unread,
Feb 21, 2012, 5:55:55 AM2/21/12
to google-a...@googlegroups.com, appengin...@googlegroups.com
Actually it's signed:

This is the hash function that Google uses in production for python 2.5:


def c_mul(a, b):
 
return eval(hex((long(a) * b) & (2**64 - 1))[:-1])


def py25hash(self):
 
if not self:
   
return 0 # empty
  value
= ord(self[0]) << 7
 
for char in self:
    value
= c_mul(1000003, value) ^ ord(char)
  value
= value ^ len(self)
 
if value == -1:
    value
= -2

 
if value > 2**63:
    value
-= 2**64
 
return value


On Mon, Feb 20, 2012 at 12:15 PM, Anand Mistry <ami...@google.com> wrote:
Where is this hash function coming from and what is it used for? For python27, we haven't made any changes to the interpreter/standard library to change the hash function. What you see on python.org is what you get.

--
You received this message because you are subscribed to the Google Groups "Google App Engine" group.
To view this discussion on the web visit https://groups.google.com/d/msg/google-appengine/-/aDME7qOnZsYJ.
To post to this group, send email to google-a...@googlegroups.com.
To unsubscribe from this group, send email to google-appengi...@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/google-appengine?hl=en.

Andrin von Rechenberg

unread,
Feb 21, 2012, 9:12:08 AM2/21/12
to google-appe...@googlegroups.com, google-a...@googlegroups.com
Ups, there was a small bug (again):


def c_mul(a, b):
 
return eval(hex((long(a) * b) & (2**64 - 1))[:-1])


def py25hash(self):
 
if not self:
   
return 0 # empty
  value
= ord(self[0]) << 7
 
for char in self:
    value
= c_mul(1000003, value) ^ ord(char)
  value
= value ^ len(self)
 
if value == -1:
    value
= -2

 
if value >= 2**63:

    value
-= 2**64
 
return value
Reply all
Reply to author
Forward
0 new messages