dense hashmap benchmark

540 views
Skip to first unread message

Victor Toledo

unread,
Feb 19, 2011, 6:56:52 PM2/19/11
to google-sparsehash

Hi!
I'm using the dense hashmap with a string key for searching in a
300.000 set. In order to find an element with the [] or find() method
it takes about 500 milisecs.

Is anyone having the same times? I was not able to find a benchmark on
the net considering a string key and search times.

Best regards,
Victor.

Craig Silverstein

unread,
Feb 22, 2011, 3:09:42 AM2/22/11
to google-s...@googlegroups.com
} I'm using the dense hashmap with a string key for searching in a
} 300.000 set. In order to find an element with the [] or find()
} method it takes about 500 milisecs.

What hash function are you using? The performance of dense_hash_map
is particularly sensitive to the choice of hash function. If you're
using the gcc default hash function, try using something like
mumurhash (fastest) or fnv (least code) instead.

craig

vito...@gmail.com

unread,
Feb 22, 2011, 6:25:11 PM2/22/11
to google-s...@googlegroups.com
Hi Craig!
I'm using the standard one and also the fnv with no significant improvement. I'm using this for my Msc thesis so any help is very appreciated.

Sent from my BlackBerry® wireless device
--
You received this message because you are subscribed to the Google Groups "google-sparsehash" group.
To post to this group, send email to google-s...@googlegroups.com.
To unsubscribe from this group, send email to google-sparseh...@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/google-sparsehash?hl=en.

Craig Silverstein

unread,
Feb 22, 2011, 7:04:45 PM2/22/11
to google-s...@googlegroups.com
} I'm using the dense hashmap with a string key for searching in a
} 300.000 set. In order to find an element with the [] or find()
} method it takes about 500 milisecs.

Hmm, may need more data to diagnose, then. If you look in
time_hash_map, in the sparsehash src directory, you can see a
structure that keeps track of how many hashes and key-comparisons are
being done. You could try adding diagnostics like this to your key,
or else just instrumenting your version of densehashtable.h directly,
to see if the problem is that it's requiring lots of lookups or
hashes.

If you can post some source code that illustrates the problem, that
might be helpful too. It would need to be a minimal, but runnable,
program, though.

craig

Victor Toledo

unread,
Feb 23, 2011, 10:55:49 AM2/23/11
to google-s...@googlegroups.com, Craig Silverstein
Excelent,

Working on the alternatives. As soon as I have result i will send you the outputs.


Regards!

Victor Toledo

unread,
Feb 23, 2011, 5:10:43 PM2/23/11
to google-s...@googlegroups.com
Hi Craig!
I found the leak. The method that is used to compare the keys is extremelly slow. I'm attaching the gprof file that i used to find this out. I implemented a version of a densehashtable.h (to put messages and check everything) and therefore i was able also to check that the problem is not in the lookups.

I tested all the string compare method available to make it faster, with no result. Do you have one that could accelerate the comparision?

Best regards,
Victor.


On 22 February 2011 21:04, Craig Silverstein <csil...@google.com> wrote:
gprof.data.5

Craig Silverstein

unread,
Feb 23, 2011, 6:26:59 PM2/23/11
to google-s...@googlegroups.com
} I tested all the string compare method available to make it faster,
} with no result. Do you have one that could accelerate the
} comparision?

Are you just using strcmp() to compare keys, or something else?

One method you could use, if it's just that your keys are very big
strings, so comparing them is very slow, is that you could fingerprint
your keys, and store the fingerprints in your set. (You may also need
a map to get from fingerprint back to original string.) Then
comparisons will be fast, though you'll have some risk of two strings
having the same fingerprint. A good fingerprint function can reduce
the odds.

craig

Victor Toledo

unread,
Feb 23, 2011, 6:37:28 PM2/23/11
to google-s...@googlegroups.com
Will work on that, thnkx!


craig

Amund Tveit

unread,
Jul 27, 2011, 3:30:18 AM7/27/11
to google-s...@googlegroups.com
On Tuesday, February 22, 2011 9:09:42 AM UTC+1, Craig Silverstein wrote:
} I'm using the dense hashmap with a string key for searching in a
} 300.000 set. In order to find an element with the [] or find()
} method it takes about 500 milisecs.

What hash function are you using?  The performance of dense_hash_map
is particularly sensitive to the choice of hash function.

Is sparse_hash_map as performance sensitive as dense_hash_map wrt selection of hash function?

Amund Tveit

Craig Silverstein

unread,
Jul 27, 2011, 3:51:46 AM7/27/11
to google-s...@googlegroups.com
} Is sparse_hash_map as performance sensitive as dense_hash_map wrt
} selection of hash function?

Yes, though since sparse_hash_map is optimized for size, not speed, it
tends not to matter as much. In both cases, though, I'd be wary of
using the identity hash without thinking about whether it will be ok.

craig

Reply all
Reply to author
Forward
0 new messages