Account Options

  1. Sign in
The old Google Groups will be going away soon.
Switch to the new Google Groups.
Google Groups Home
« Groups Home
range of hash function
There are currently too many topics in this group that display first. To make this topic appear first, remove this option from another topic.
There was an error processing your request. Please try again.
flag
  5 messages - Collapse all  -  Translate all to Translated (View all originals)
The group you are posting to is a Usenet group. Messages posted to this group will make your email address visible to anyone on the Internet.
Your reply message has not been sent.
Your post was successful
 
From:
To:
Cc:
Followup To:
Add Cc | Add Followup-to | Edit Subject
Subject:
Validation:
For verification purposes please type the characters you see in the picture below or the numbers you hear by clicking the accessibility icon. Listen and type the numbers you hear
 
Grégoire Seux  
View profile  
 More options Feb 20 2010, 3:54 am
Newsgroups: fa.caml
From: Grégoire Seux <kamaradclim...@gmail.com>
Date: Sat, 20 Feb 2010 08:54:34 UTC
Local: Sat, Feb 20 2010 3:54 am
Subject: [Caml-list] range of hash function

hello !

i would like to use the polymorhpic hash function on strings. But i would
like to know what is the probability of a collision between two hashes.

my first question is about the range of the Hashtbl.hash function: what is
its range ? ( string -> [1..N] ?)

the second question is : can i assume that the result is a uniform
distribution over [1..N] ? (for 10⁶ words which is an estimation of the
english vocabulary size)

the third one is : is it possible to predict which will be the collision ? I
mean collisions are between words which are very 'similar' (for ex: "boy"
and "boys") or are completely unpredictable.

thanks !

--
Grégoire Seux

_______________________________________________
Caml-list mailing list. Subscription management:
http://yquem.inria.fr/cgi-bin/mailman/listinfo/caml-list
Archives: http://caml.inria.fr
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners
Bug reports: http://caml.inria.fr/bin/caml-bugs


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Goswin von Brederlow  
View profile  
 More options Feb 20 2010, 5:52 am
Newsgroups: fa.caml
From: Goswin von Brederlow <goswin-...@web.de>
Date: Sat, 20 Feb 2010 10:52:18 UTC
Local: Sat, Feb 20 2010 5:52 am
Subject: Re: [Caml-list] range of hash function

Grégoire Seux <kamaradclim...@gmail.com> writes:
> hello !

> i would like to use the polymorhpic hash function on strings. But i would like
> to know what is the probability of a collision between two hashes. 

> my first question is about the range of the Hashtbl.hash function: what is its
> range ? ( string -> [1..N] ?)

It is range [min_int..max_int]. Meaning 31 or 63 bits.

> the second question is : can i assume that the result is a uniform distribution
> over [1..N] ? (for 10? words which is an estimation of the english vocabulary
> size)

If the function is any good it will be reasonably uniform. And if it is
crap then I bet someone have replaced it already.

> the third one is : is it possible to predict which will be the collision ? I
> mean collisions are between words which are very 'similar' (for ex: "boy" and
> "boys") or are completely unpredictable.

You can read the source, analyze the math and try to figure out a
collision. From past experience every widely used hash function so far
has been shown to be vulnerable. Like md5 is now considered compromised
due to the ease of creating a collision. Sha1 is deemed unsafe too. It
is just a matter of how much brainpower and cpu time you want to throw
at it. Or pure bad luck.

Is the Hashtbl.hash the same function used to hash polymorphic variant
types? If so then there are collisions between words which are verry
similar. I knew of an example once but google isn't verry helpfull
finding it again.

MfG
        Goswin

_______________________________________________
Caml-list mailing list. Subscription management:
http://yquem.inria.fr/cgi-bin/mailman/listinfo/caml-list
Archives: http://caml.inria.fr
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners
Bug reports: http://caml.inria.fr/bin/caml-bugs


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
ygrek  
View profile  
 More options Feb 20 2010, 7:40 am
Newsgroups: fa.caml
From: ygrek <ygrekhere...@gmail.com>
Date: Sat, 20 Feb 2010 12:40:25 UTC
Local: Sat, Feb 20 2010 7:40 am
Subject: Re: [Caml-list] range of hash function
On Sat, 20 Feb 2010 11:52:05 +0100
Goswin von Brederlow <goswin-...@web.de> wrote:

> Grégoire Seux <kamaradclim...@gmail.com> writes:

> > hello !

> > i would like to use the polymorhpic hash function on strings. But i would like
> > to know what is the probability of a collision between two hashes. 

> > my first question is about the range of the Hashtbl.hash function: what is its
> > range ? ( string -> [1..N] ?)

> It is range [min_int..max_int]. Meaning 31 or 63 bits.

> > the second question is : can i assume that the result is a uniform distribution
> > over [1..N] ? (for 10? words which is an estimation of the english vocabulary
> > size)

The best way is to test yourself. Or rely on someone else's tests :)

http://alaska-kamtchatka.blogspot.com/2009/06/hash-n-mix.html

> You can read the source, analyze the math and try to figure out a
> collision. From past experience every widely used hash function so far
> has been shown to be vulnerable. Like md5 is now considered compromised
> due to the ease of creating a collision. Sha1 is deemed unsafe too. It
> is just a matter of how much brainpower and cpu time you want to throw
> at it. Or pure bad luck.

Hashtbl.hash function is not required to be cryptographically strong, but rather
uniformly random on "usual" input data and fast.

--
 ygrek
 http://ygrek.org.ua

_______________________________________________
Caml-list mailing list. Subscription management:
http://yquem.inria.fr/cgi-bin/mailman/listinfo/caml-list
Archives: http://caml.inria.fr
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners
Bug reports: http://caml.inria.fr/bin/caml-bugs


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Jacques Garrigue  
View profile  
 More options Feb 20 2010, 8:54 am
Newsgroups: fa.caml
From: Jacques Garrigue <garri...@math.nagoya-u.ac.jp>
Date: Sat, 20 Feb 2010 13:54:27 UTC
Local: Sat, Feb 20 2010 8:54 am
Subject: Re: [Caml-list] range of hash function

From: Grégoire Seux <kamaradclim...@gmail.com>

> i would like to use the polymorhpic hash function on strings. But i would
> like to know what is the probability of a collision between two hashes.

> my first question is about the range of the Hashtbl.hash function: what is
> its range ? ( string -> [1..N] ?)

Just to get things straight: this is 0..2^30-1 (0..0x3fffffff).
The result of the hash function is the same on 32-bit and 64-bit
architectures.

> the second question is : can i assume that the result is a uniform
> distribution over [1..N] ? (for 10⁶ words which is an estimation of the
> english vocabulary size)

The algorithm for strings is as follows:

      i = caml_string_length(obj);
      for (p = &Byte_u(obj, 0); i > 0; i--, p++)
        hash_accu = hash_accu * 19 + *p;

So you can assume an uniform distribution for sufficiently long
strings.

> the third one is : is it possible to predict which will be the collision ? I
> mean collisions are between words which are very 'similar' (for ex: "boy"
> and "boys") or are completely unpredictable.

Since you have the algorithm, you can predict collisions. Basically
shifting character n by 1 is equivalent to shifting character n+1 by
19, so you have lots of collisions. But this hash function being
intended for hashtables, collisions are not a problem, uniform
distribution matters more.

By the way, for polymorphic variants collisions matter, and the hash
function is different. The range is 31-bits rather than 30-bits, and
the factor is 243, so that names of no more than 4 characters are
guaranteed to be different. You still have collisions, but they are
going to be less similar.

Both hash functions are defined in byterun/hash.c.

Hope this helps,

Jacques Garrigue

_______________________________________________
Caml-list mailing list. Subscription management:
http://yquem.inria.fr/cgi-bin/mailman/listinfo/caml-list
Archives: http://caml.inria.fr
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners
Bug reports: http://caml.inria.fr/bin/caml-bugs


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Richard Jones  
View profile  
 More options Feb 21 2010, 6:10 pm
Newsgroups: fa.caml
From: Richard Jones <r...@annexia.org>
Date: Sun, 21 Feb 2010 23:10:16 UTC
Local: Sun, Feb 21 2010 6:10 pm
Subject: Re: [Caml-list] range of hash function

On Sat, Feb 20, 2010 at 10:54:12PM +0900, Jacques Garrigue wrote:
> Since you have the algorithm, you can predict collisions. Basically
> shifting character n by 1 is equivalent to shifting character n+1 by
> 19, so you have lots of collisions. But this hash function being
> intended for hashtables, collisions are not a problem, uniform
> distribution matters more.

I would slightly dispute your assertion that collisions are not a
problem, because of algorithmic complexity attacks:

http://www.cs.rice.edu/~scrosby/hash/CrosbyWallach_UsenixSec2003/inde...

The hash implementation used by Perl was changed to avoid this attack:

http://perl5.git.perl.org/perl.git/blob/HEAD:/perl.c#l1465

Rich.

--
Richard Jones
Red Hat

_______________________________________________
Caml-list mailing list. Subscription management:
http://yquem.inria.fr/cgi-bin/mailman/listinfo/caml-list
Archives: http://caml.inria.fr
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners
Bug reports: http://caml.inria.fr/bin/caml-bugs


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
End of messages
« Back to Discussions « Newer topic     Older topic »