Replace SuperFashHash with MurmurHash3 or CityHash?

824 views
Skip to first unread message

Albert J. Wong (王重傑)

unread,
Jun 11, 2013, 2:50:58 PM6/11/13
to Chromium-dev
Hey all,

Nasko and I were poking around our hash functions.  We were a bit surprised to find that we have the smhasher library in third_party, but we don't use either MurmurHash3 or CityHash for  base::Hash().

My hazy memory recalls that MurmurHash and CityHash are the current "state of the art" hash functions in terms of speed on modern CPUs and hash distribution.

Here's a few related posts:

Has anyone looked into replacing SuperFashHash with either MurmurHash or CityHash, either in Chromium or in Blink/Webkit? My feeling is that even if it's net-neutral perfwise, reducing the number of hash functions we have in the code base is a good thing.

-Albert

Albert J. Wong (王重傑)

unread,
Jun 11, 2013, 3:05:30 PM6/11/13
to Chromium-dev
Nico pinged me off thread wondering about alignment issues. Just found a post from Josh Haberman which talks a bit about it:


A particularly interesting bit is the discussion of CityHash on ARM.

-Albert

Scott Violet

unread,
Jun 11, 2013, 3:27:38 PM6/11/13
to Albert J. Wong (王重傑), Chromium-dev
It's my understanding from internal documents that CityHash is the
currently recommended choice. Of course I have no idea if the internal
documents I've seen are up to date.

-Scott

On Tue, Jun 11, 2013 at 11:50 AM, Albert J. Wong (王重傑)
<ajw...@chromium.org> wrote:
> --
> --
> Chromium Developers mailing list: chromi...@chromium.org
> View archives, change email options, or unsubscribe:
> http://groups.google.com/a/chromium.org/group/chromium-dev
>
>
>

Albert J. Wong (王重傑)

unread,
Jun 11, 2013, 3:43:41 PM6/11/13
to Scott Violet, Chromium-dev
Yes, that is what I saw as well.  Josh's analysis is interesting though because it discusses the perf of the implementations on a less uniform set of architectures and compilers which is why we can see CityHash doing silly stuff on ARM making MurMurHash possibly faster.  We need to do benchmarks.

If no one thinks this idea is dead-on-arrival, I'll probably get to exploring this at some point (unless someone beats me to it).

-Albert

Ricardo Vargas

unread,
Jun 11, 2013, 4:05:02 PM6/11/13
to WongJ.Albert(王重傑), Scott Violet, Chromium-dev
I have entertained the idea of changing the hash a few times but have not looked into it.

Note that currently base::Hash() is used by the disk cache so it cannot change without breaking current users (as in forcing the current caches to be dropped). (it moved to base from net). It also has the desired property of producing the same numerical value on multiple platforms so that a cache can be inspected on a different machine (architecture) than the one that generated it. Not a deal breaker but nice to have (there are some unit tests that depend on it)

That said, we are making non-backwards compatible changes on the cache code, so it is a good time to upgrade the hash :)

Albert J. Wong (王重傑)

unread,
Jun 11, 2013, 4:08:45 PM6/11/13
to Ricardo Vargas, Scott Violet, Chromium-dev
I was wondering if we were persisting hash values somewhere. :-/

For this usage, I propose we change the APIs slightly to do what Google3 does which is provide a Fingerprint function and a Hash function.  Hash should be for transient values (eg., hashtables) and Fingerprint can have a stricter stability guarantee.

-Albert

Alexei Svitkine

unread,
Jun 11, 2013, 4:42:17 PM6/11/13
to ajw...@chromium.org, Ricardo Vargas, Scott Violet, Chromium-dev
Is there a way we can measure the benefits of changing the implementation?

On Tue, Jun 11, 2013 at 4:08 PM, Albert J. Wong (王重傑)

Albert J. Wong (王重傑)

unread,
Jun 11, 2013, 4:50:25 PM6/11/13
to Alexei Svitkine, Ricardo Vargas, Scott Violet, Chromium-dev
On Tue, Jun 11, 2013 at 1:42 PM, Alexei Svitkine <asvi...@chromium.org> wrote:
Is there a way we can measure the benefits of changing the implementation?

Off the cuff, I would expect we do microbenchmarks on our major platforms and compilers.  Assuming those look sane, we could push the change out to ToT and watch the perf bots to get a rough feel.

Dana Jansens

unread,
Jun 11, 2013, 5:56:09 PM6/11/13
to Albert J. Wong (王重傑), Alexei Svitkine, Ricardo Vargas, Scott Violet, Chromium-dev
I would suggest using a provably good hash function, rather than a heuristics-suggest-its-good-for-some-data-sets function. http://opendatastructures.org/ods-cpp/5_3_Hash_Codes.html#SECTION00933000000000000000 describes such a hash function for strings.

On Tue, Jun 11, 2013 at 4:50 PM, Albert J. Wong (王重傑) <ajw...@chromium.org> wrote:
On Tue, Jun 11, 2013 at 1:42 PM, Alexei Svitkine <asvi...@chromium.org> wrote:
Is there a way we can measure the benefits of changing the implementation?

Off the cuff, I would expect we do microbenchmarks on our major platforms and compilers.  Assuming those look sane, we could push the change out to ToT and watch the perf bots to get a rough feel.

-Albert
 

On Tue, Jun 11, 2013 at 4:08 PM, Albert J. Wong (王重傑)
<ajw...@chromium.org> wrote:
> I was wondering if we were persisting hash values somewhere. :-/
>
> For this usage, I propose we change the APIs slightly to do what Google3
> does which is provide a Fingerprint function and a Hash function.  Hash
> should be for transient values (eg., hashtables) and Fingerprint can have a
> stricter stability guarantee.

+1 I like the sound of this, as it allows more freedom for tweaking in the future.
Reply all
Reply to author
Forward
0 new messages