Google Groups no longer supports new Usenet posts or subscriptions. Historical content remains viewable.
Dismiss

Hashing algorithm

0 views
Skip to first unread message

David

unread,
Sep 10, 2001, 4:56:26 PM9/10/01
to
 
Hi every one,
 
Does anyone know if there are some "best" hashing algorithm out there that converts a string into a number and guarantees the uniqueness of the number?
 
Thanks all.
 

Rufus V. Smith

unread,
Sep 10, 2001, 5:40:54 PM9/10/01
to
Just take your character code bits and treat them  as one long number.
 
Actually, I'm serious about that because there is really no practical
hashing code that does what you want.  In fact, what you are looking
for is against the whole philosophy of hashing.
 
Any "hash code" that "guaranteed uniqueness" would likely take more
bits than the original string (if you use all 256 possible characters in your
string).
 
The point of hashing is not to guarantee uniqueness, but to more evenly
distribute "hits" of, in this case strings, to entries in a lookup table.  It
helps speed database access and can reduce storage requirements and
possibly even out loads by distributing requests among resources.
 
Duplicate hash values for different strings are called collisions (I think),
and there are various ways to deal with them.
 
And there are no single "best" hashing algorithms.  It depends upon the
purpose.
 
Get a computer science book on databases.  Read about hashing.
 
" David" <dn...@netiq.com> wrote in message news:eaSF3rjOBHA.1552@tkmsftngp03...

Carl Daniel

unread,
Sep 10, 2001, 7:05:32 PM9/10/01
to
Depending on how serious you are about hashing:

1) Perl uses this one: h = 33*h+c where h = current hash and c = current
character. Start h at any value you want & iterate over your strings. This
one has the advantage of speed, the disadvantage that permutations of a
string will all hash to the same value.
2) CRC-32 - nearly as fast as the Perl hash, but more effective - fewer
undesirable properties. You can find code for CRC-32 all over the net -
just look.
3) SHA(1) or MD5 - Cryptographically strong hashes. The code for these is
not trivial, and the hashes they generate are large (160 bits or more).
Again, search the net for code. Crypto++ has implementations of both.

No hashing algorithm is "best" for every application - choose one which
meets your needs. No hashing algorithm will guarantee uniqueness - that's
anathema to the whole idea of hashing. Better hashes simply increase the
chance of uniqueness.

-cd


" David" <dn...@netiq.com> wrote in message
news:eaSF3rjOBHA.1552@tkmsftngp03...

Hi every one,

Bill Davy

unread,
Sep 11, 2001, 4:13:59 AM9/11/01
to
If (and only if) you know the set of strings you have to hash (as a compiler knows what its keywords are, or a calendar processor knows the names of the days of the week, and months of the year), it is possible to determine a "perfect" hash function.  Search for "optimal hash"
" David" <dn...@netiq.com> wrote in message news:eaSF3rjOBHA.1552@tkmsftngp03...
 

Rufus V. Smith

unread,
Sep 11, 2001, 10:23:00 AM9/11/01
to
Excellent point, Bill!  I hadn't considered the times when you
might know in advance all the strings to be hashed!  You
could certainly create a "guaranteed unique" hash.  Then the
optimizations are:  hash value bit-size, hashing function execution
speed,  hash code footprint.
 
0 new messages