Google Groups no longer supports new Usenet posts or subscriptions. Historical content remains viewable.

Dismiss

333 views

Skip to first unread message

Dec 3, 1990, 9:46:12 AM12/3/90

to inews

in <8...@compnect.UUCP>, john...@compnect.UUCP (John Core ) writes:

> > av...@netcom.UUCP (Avery Colter) writes:

> >

> > >I'm seeing all this talk of "Hash Functions".

> >

> College programming courses seem to never deal with the REAL (ie buisness)

> world. ...

>

> --just another old fart telling a war story...

> > av...@netcom.UUCP (Avery Colter) writes:

> >

> > >I'm seeing all this talk of "Hash Functions".

> >

> College programming courses seem to never deal with the REAL (ie buisness)

> world. ...

>

> --just another old fart telling a war story...

Well, I never learned much about hashing when I went to school (more years

ago than I want to think about ;-). I know the concept of hashing, but have

never had a use for it in the applications I have worked on. My question

is, are there any recomendations out there for a good, fairly basic book on

hashing? I don't need to know all the gory details and the latest hot-shot

methods, just the basics.

--

Chuck Tryon

<bi...@bisco.kodak.com>

USmail: 46 Post Ave.;Roch. NY 14619 B. Baggins

<<...include standard disclamer...>> At Your Service

"Swagger knows no upper bound, but the laws of physics remain unimpressed."

(D. Mocsny)

Dec 3, 1990, 4:54:43 PM12/3/90

to

In article <901203144...@bisco.kodak.COM> bi...@bisco.kodak.COM (Charles Tryon) writes:

> My question

> is, are there any recomendations out there for a good, fairly basic book on

> hashing? I don't need to know all the gory details and the latest hot-shot

> methods, just the basics.

> My question

> is, are there any recomendations out there for a good, fairly basic book on

> hashing? I don't need to know all the gory details and the latest hot-shot

> methods, just the basics.

Knuth spends a section on hashing. The presentation is good and has all

the basics, but doesn't go into the latest hot-shot methods.

---Dan

Dec 4, 1990, 9:20:17 AM12/4/90

to

Thanks for the reference. Can someone recommend a book that does go into

all the latest "hot-shot" methods?

Tim Lynch

Dec 4, 1990, 9:48:09 PM12/4/90

to

> ... My question

> is, are there any recomendations out there for a good, fairly basic book on

> hashing? I don't need to know all the gory details and the latest hot-shot

> methods, just the basics.

> is, are there any recomendations out there for a good, fairly basic book on

> hashing? I don't need to know all the gory details and the latest hot-shot

> methods, just the basics.

Almost any university text on data structures will have a section on hash

searching. See for instance:

"Fundamentals of Data Structures" by Horowitz and Sahni

"Data Structures Techniques" by Standish

"Data Structures" by Reingold & Hansen

The first is the most readable of the three, and the last is the most

complete. Hashing can be used on almost any computerized search

problem, not just string searching problems, and usually performs very

well. I have even used hashing to search for matching states in a

parsing automaton. But it may take some imagination and skill to apply

it to your specific problem.

--

Dan Salomon -- sal...@ccu.UManitoba.CA

Dept. of Computer Science / University of Manitoba

Winnipeg, Manitoba, Canada R3T 2N2 / (204) 275-6682

Dec 4, 1990, 7:37:28 PM12/4/90

to

In article <1990Dec4.1...@batcomputer.tn.cornell.edu> ly...@batcomputer.tn.cornell.edu (Tim Lynch) writes:

> Thanks for the reference. Can someone recommend a book that does go into

> all the latest "hot-shot" methods?

> Thanks for the reference. Can someone recommend a book that does go into

> all the latest "hot-shot" methods?

Have there been any fundamentally new hashing methods in the last twenty

years? The only one I know of is Pearson's.

What are you really looking for? There are lots of string functions that

are both fast and, in practice, better than random hashing. These days I

start from h = 5381, and set h = h << 5 + h + c mod any power of 2 for

each new character c. Apparently Chris Torek prefers a multiplier of 31:

h << 5 - h + c. These are reliable and extremely fast. You can make them

even faster if you want to hash an entire string at once: just

precompute powers of 31 or 33, etc.

---Dan

Dec 15, 1990, 7:14:05 PM12/15/90

to

> Well, I never learned much about hashing when I went to school (more years

> ago than I want to think about ;-). I know the concept of hashing, but have

> never had a use for it in the applications I have worked on. My question

> is, are there any recomendations out there for a good, fairly basic book on

> hashing? I don't need to know all the gory details and the latest hot-shot

> methods, just the basics.

> ago than I want to think about ;-). I know the concept of hashing, but have

> never had a use for it in the applications I have worked on. My question

> is, are there any recomendations out there for a good, fairly basic book on

> hashing? I don't need to know all the gory details and the latest hot-shot

> methods, just the basics.

Hashing: taking a long key (often a string) and returning a short key (often

an integer in some range) that tries to be different for every long key.

This is obviously impossible in general, and you get "collisions" where

different long keys map to identical short keys.

For a good hash function, collisions are no more likely than if you used random

numbers. If you know all the input keys ahead of time, you can work out a hash

function that has no collisions. If every short key is used (the N possible

long keys hash to 0..N-1), you have a "perfect hash."

A common application of hash functions is hash tables, where you want to store

some records indexed by long key and get access to them quickly. So you keep

an array of records and index it by the hash function to find the record you

want. Handling collisions can be done in various ways. I like "chaining"

where you make each entry a linked list and do linear search along them.

There's also extensible hashing, where you copy everything to a bigger table

and arrange things so there are no collisions. Then there's linear probing,

useful for tables that never need deletion. You search forwards from the

hash position until you find an empty slot. If you're writing, put it here.

If you're reading, this means that the value in question has never been

written. Simple, but its performance gets Very Bad as the table fills up.

Then there's double hashing, where you search forwards in steps of a second

hashing function, and otherwise works the same (this works very well), uniform

hashing (where the hash function produces a permutation of the table buckets

to search in - double hashing is a very close approximation), and other tricks.

More unusual applications involve traditional indexing techniques (you don't

fill up the space as much), but you store a hash of the key because it's more

compact and quicker to compare. Diff uses this to compare lines... it hashes

a line of text and compares them first, then double-checks with a full text

compare if the lines are equal.

Another interesting variant was used in a version of spell(1), although I don't

know if it's the current one or not. The dictionary of correctly spelled words

was hashed into a large hash space (27 bits?) and the result stored as a

bitmap. No, not as 2^24 = 16 Megabytes, but the gaps between adjacent bits,

with some higher-level indices. that let you check if bit #4392859 is set

fairly quickly. This has the possibility of missing a misspelled word, but

it's very rare and a "stop list" of words it fails to catch was added to

deal with exceptions that actually were noticed in practice.

For integers to integers, input%outputrange works fairly well if you arrange

for outputrange to be prime. For strings, hash = fiddle(hash)+*p++ is popular.

Someone just posted fiddle(x) = x*31 and x*33. (You range reduce mod 2^n

for suitable n eventually.)

One that was published in CACM recently was to set up a table with a random

permutation of 0..255 and loop over hash = table[hash^*p++];. If you set up

the table more methodically, you get an 8-bit CRC of the input string, but

any random permutation works well.

Knuth (section 6.4) mentions multiplicative hashing (take some of the middle

bits of constant*input).

That was quick, but that's the guts of the thing.

0 new messages

Search

Clear search

Close search

Google apps

Main menu