Google Groups no longer supports new Usenet posts or subscriptions. Historical content remains viewable.
Dismiss

README and test files for the "perfect hashing" code re-posted some months ago

37 views
Skip to first unread message

Michael Herman

unread,
May 4, 1987, 8:47:31 AM5/4/87
to
This is a re-posting of the README, foo and foo.c files for the code
originally posted by keith@seismo quite some time ago (I had reposted
the code itself a few months ago).


# This is a shell archive. Remove anything before this line, then
# unpack it by saving it in a file and typing "sh file". (Files
# unpacked will be owned by you and have default permissions.)
#
# This archive contains:
# README foo foo.example

echo x - README
cat > "README" << '//E*O*F README//'
History:
On 10/7/83 C. Havener (cha...@genrad.UUCP) put a perfect hash
function finding program on the net. On 11/4/84 R. Dietrich
(rob...@tektronix.UUCP) put a Pascal version of the same program
on the net. Both of these programs were based on "More on Minimal
Perfect Hash Tables," Colorado State University Technical Report,
April 1981, by Cook, Curtis R. and Oldehoeft, R. R., and "Minimal
Perfect Hash Functions Made Simple" by Richard J. Cichelli - Comm.
of ACM Jan 1980.

These programs had the following limitations, some of which they shared,
some of which they didn't:
-- limited number of strings (40 or 110) per hash table.
-- slow to extremely slow run-time (large amounts of both
recursion and parallel array processing).
-- a requirement that the hash strings be in a certain order.
-- a fixed hashing algorithm.
-- inability to handle strings that hashed to identical values.

In a recent program I was doing, I had to hash approximately 500 different
keywords, preferably in one access. To do this, I ended up rewriting the
above programs. The attached version has fixed some of the limitations.
-- no keyword limit; I've used it to hash up to about 2400 entries.
-- it's several orders of magnitude faster.
-- it's fairly indifferent to the order the strings are presented in.
-- you can specify the keys to be hashed.
-- it handles strings that hash to identical values.

The basic algorithm is:

hash = the length of the string plus the sum of all of the
associated values of the key characters.

The program assigns values to the characters it is using for
hashing until a set of the values gives each keyword a unique
value.

For example: If the string was "thekid" and the key characters were
't' and 'd', with 't' having a value of 5 and 'd' having a value of
8, the hash value of "thekid" would be 19; 8 + 5 + strlen("thekid");

It takes two arguments --
-d -- print out some interesting debugging information.
-k -- use the following characters of the string in the hashing algorithm.
If the program was called as "perf -k137" and the string "abcdefgh"
was entered, the characters 'a', 'c', and 'g' would be used by the
hashing scheme. To make this easier, the program prints out a routine
at the end of its run that, when passed a string, will return the
correct hash value. Also, the special character '$' tells the program
to use the last character of every string. Default, if the k flag
isn't specified, is "-k1$", i.e. the first and last character of every
string.

followed by a filename containing the keywords.

Ex: suppose there's a file "foo" containing the keywords:
======================================================================
//E*O*F README//

echo x - foo
cat > "foo" << '//E*O*F foo//'
political
man
can
have
as
his
aim
the
realization
of
freedom
//E*O*F foo//

echo x - foo.example
cat > "foo.example" << '//E*O*F foo.example//'
======================================================================
the command "perf foo" produces the following:
======================================================================
52 -- political
44 -- man ** the keywords are listed
45 -- can with their hash values.
33 -- have
19 -- as
39 -- his
14 -- aim
26 -- the
61 -- realization
50 -- of
43 -- freedom

perf: 11 keywords.

******

static long ** followed by a routine that
hash(str) will return that value if
register char *str; passed the keyword as an argument.
{
static int cval[] = {
0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 12,
0, 10, 25, 0, 19, 0, 0, 0, 27, 11,
30, 23, 16, 0, 20, 17, 13, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0,
};
register long hval;
register int len;

switch(hval = (long)(len = strlen(str))) {
default:
hval += cval[(int)str[0]];
hval += cval[(int)str[len - 1]];
case 0:
return(hval);
}
}
//E*O*F foo.example//

exit 0

0 new messages