Google Groups no longer supports new Usenet posts or subscriptions. Historical content remains viewable.

Dismiss

Stupid old brain question, how to measure a hash function's quality

14 views

Skip to first unread message

Earl_Colby_Pottinger

unread,

Feb 8, 2012, 10:38:45 PM2/8/12

Okay, after a long time I am starting my work on a compressing VIRTUAL-
RAM-DRIVE again.

The main reason is to test BFS ability to handle large hard drives and
large files (ie. Drives/Files 4TB and larger).

The non-compressing version I presently am using uses a sparse model
that only needs memory when a data_track has data written into it.
However, my 4GB machine is starting to have problems once the virtual
drive is greater than 2TB. I need to compress the data_tracks in
memory to model larger drives.

To do that I have looked at a number of compression schemes, but I
have gotten interested in the compression schemes that are modeling
disk drives and in-memory compression in particular. See:

http://www.cs.umass.edu/~yannis/compression.pdf
http://robertdick.org/publications/yang10feb.pdf
http://www.usenix.org/event/usenix05/tech/general/full_papers/tuduce/tuduce.pdf
http://www.usenix.org/event/usenix99/full_papers/wilson/wilson.pdf
http://web.mit.edu/6.033/2000/www/reports/dp1-kenlu/
http://www.aicit.org/jcit/ppl/vol2_no4_07.pdf
http://www.jormas.com/~vesuri/xpk/
http://code.google.com/p/compcache/

Anyway the question: How do I test the quality of a hash function? The
way I am thinking of doing it is to feed the 149GB test image (this is
my base work image I use to test functions) into the hash function and
then increment each hashed position by one. Presently I am using a
2^24 hash_table. If the hash function works well like I hope
(unlikely) there should be a count of 149*2^30/2^24 = 149*2^6 = 9536
in each hash cell.

http://eternallyconfuzzled.com/tuts/algorithms/jsw_tut_hashing.aspx
uses this simple test.

What I think I need to do to get a simple value as a measure is to add
up the absolute difference between values in the cells vs the expected
average value (in this case 9536). This will probably be a very large
number so I would divide it by the number of cells in the hash_table.
Good idea?

I also tried reading http://burtleburtle.net/bob/hash/doobs.html and
while I understand the code of the hash functions, I just can't seem
to understand how his testing score work. Can someone give me a clue?

0 new messages