experimental sparsehash library

3 views

Skip to first unread message

MrMan

unread,

Dec 23, 2009, 2:39:02 PM12/23/09

to google-sparsehash

Since I am not too familiar with C++, I started using the experimental
C version of the google sparsehash library. Initially I was quite
happy with the performance when tested against large datasets because
of the low memory overhead. When I did further testing, anecdotally I
find that when there are a large number of unique strings in the
hashtable, it becomes unbearably slow. 10-100x slower than the
corresponding legacy hash table code that we use.

The relevant part of the code is as follows:

char key[100];
char record[MAX_BYTES_PER_RECORD];

analyses = AllocateHashTable(0, 1);

strncpy(key,record + (startcol - 1),length);
strcat(key, "\0");

bck = HashFindOrInsert(analyses, PTR_KEY(analyses,key), 0);
bck->data++;

Is there something that I'm doing wrong? Is this a known issue with
the implementation? Any help will be greatly appreciated.

Regards,
Abhikesh

Craig Silverstein

unread,

Dec 28, 2009, 6:07:56 PM12/28/09

to google-s...@googlegroups.com

} Is there something that I'm doing wrong? Is this a known issue with
} the implementation? Any help will be greatly appreciated.

Hmm, I doubt much is known about the experimental C implementation
anymore at all. (I wrote it, but haven't used it in over 10 years.)

But the behavior your see seems surprising to me. You might try
instrumenting the code to see how many probes it's doing during lookup
(how many times it iterates through its loop). Or maybe profile it
and see if the time is all being spent in memcmp or memcpy or
something.