base::Hash(), how robust is it supposed to be?


Primiano Tucci

Jun 23, 2015, 8:46:23 AM
to Chromium-dev
Hey chromium-dev,
after some hours of debugging I found out that the cause of my bug was a collision in base::Hash().
I am aware that its comment says "// WARNING: This hash function should not be used for any cryptographic purpose" (I am using it for tracing purposes), but I was a bit surprised to find that it is extremely easy to collide and that it lacks the basic properties I'd expect from a hash function (changing a few chars in a string should not end up with the same hash).

My question now is: is this WAI and am I expecting too much from base::Hash, or should it be strengthened?

Specific case below
LOG(ERROR)
    << base::Hash("27081:partition_alloc/thread_27081/buffer_partition/bucket_1536")
    << "  "
    << base::Hash("27081:partition_alloc/thread_27081/buffer_partition/bucket_2560");



Chris Hamilton

Jun 23, 2015, 9:04:58 AM
to prim...@chromium.org, Chromium-dev
Maybe you've just gotten colossally unlucky? The hash in base is Paul Hsieh's SuperFastHash, and is *very* extensively tested and profiled. Although not cryptographically strong, it performs very well in many load balancing tests.



Chris Hamilton

Jun 23, 2015, 9:08:09 AM
to prim...@chromium.org, Chromium-dev
(I've contacted Paul Hsieh with the collision to see if this is a design flaw or truly just bad luck.)

Primiano Tucci

Jun 23, 2015, 9:13:03 AM
to Chris Hamilton, Chromium-dev
Hmm, not sure I'd call it "colossally unlucky". The scenario is the following: each trace point has around 200 entries, and they should not end up having the same hash (the uniqueness is required only in the scope of a single trace point; it is totally fine if they collide over time).
I can hit a collision every 2-3 traces, where each trace is ~10 seconds long, with a trace point every 250 ms.
In other words, doing the math, I can hit a collision within a set of 200 strings in just (3 traces x 40 trace points) = 120 runs.
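
To put numbers on that (a back-of-envelope sketch, not part of the original message): for an ideal 32-bit hash, the birthday bound says a collision among 200 strings should be vanishingly rare, and seeing even one within 120 trace points should happen well under 0.1% of the time:

#include <cmath>
#include <cstdio>

int main() {
  // C(200, 2) = 19900 candidate pairs per trace point.
  const double pairs = 200.0 * 199.0 / 2.0;
  // Chance of any collision at one trace point with an ideal 32-bit hash.
  const double p_one_set = pairs / std::pow(2.0, 32);                // ~4.6e-6
  // Chance of seeing at least one collision across 120 trace points.
  const double p_within_120 = 1.0 - std::pow(1.0 - p_one_set, 120);  // ~5.6e-4
  std::printf("per trace point: %.2e, within 120 trace points: %.2e\n",
              p_one_set, p_within_120);
  return 0;
}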

Ian Clelland

Jun 23, 2015, 10:24:24 AM
to prim...@chromium.org, Chris Hamilton, Chromium-dev
Yeah, it's unlucky, but not colossally so -- this is exactly why this algorithm has that huge warning -- single-bit changes in the input, especially late in the string, have very little effect on the final output (the avalanche effect is very small in the last two rounds).

In your example, you have pretty much the worst-case scenario for collisions: two strings of the same length, which are identical up until the last two 32-bit blocks ("et_1" and "536" vs "et_2" and "560"). Even worse, the difference in the second-to-last block is only in the lowest two bits, so the hashes at that point only differ by 4 bits out of 32.

The unlucky part is that one of the differences in the final block (the "3" vs "6") exactly cancels out two bits of that, and then the last difference ("6" vs "0") takes care of the rest, so the hashes become identical again. I'm not sure what the probability of that is, but I'm sure it's far greater than you'd expect from hashing a random pair of uncorrelated strings.

You might even improve your odds against this by reversing the string before hashing it: making the strings differ as early as possible might help, but that's just a guess. You might end up with the same situation, just with the hashes becoming identical earlier in the algorithm and then staying in lockstep from there on out.
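
A minimal sketch of that workaround, assuming base/hash.h exposes uint32_t Hash(const std::string&) (hypothetical, and as said it may just move the problem rather than fix it):

#include <algorithm>
#include <cstdint>
#include <string>

#include "base/hash.h"

uint32_t HashReversed(std::string s) {
  // Reverse the input so the differing suffix lands early in the hashed data.
  std::reverse(s.begin(), s.end());
  return base::Hash(s);
}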

Primiano Tucci

Jun 23, 2015, 10:44:31 AM
to Ian Clelland, Chris Hamilton, Chromium-dev
Yeah, figured it out. Apparently it is very easy to get collisions between two strings that differ only in roughly the last 4 characters.
Did a quick experiment here: http://pastebin.com/wifa6Puu
I just switched to using SHA1 in tracing; not worth my time to take the risk again for that specific purpose.
But I see that there are many other clients of base::Hash in the codebase, and I wonder whether they can hit similar patterns at this point.
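
Roughly the kind of change that means (a sketch only, not necessarily what the actual tracing change did), assuming base/sha1.h's base::SHA1HashString():

#include <cstdint>
#include <cstring>
#include <string>

#include "base/sha1.h"

uint32_t TraceIdForString(const std::string& s) {
  const std::string digest = base::SHA1HashString(s);  // 20-byte digest.
  uint32_t id;
  std::memcpy(&id, digest.data(), sizeof(id));  // Take the first 4 bytes.
  return id;
}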

Daniel Murphy

Jun 23, 2015, 2:37:46 PM
to prim...@chromium.org, Ian Clelland, Chris Hamilton, Chromium-dev
This is because of the "avalanching" of SuperFastHash:
/* Force "avalanching" of final 127 bits */
    hash ^= hash << 3;
    hash += hash >> 5;
    hash ^= hash << 4;
    hash += hash >> 17;
    hash ^= hash << 25;
    hash += hash >> 6;


This is one of the rare times that stackoverflow has a rather complete answer:
http://programmers.stackexchange.com/questions/49550/which-hashing-algorithm-is-best-for-uniqueness-and-speed
(also commented on the bug)
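
A standalone way to poke at just that finalizer (an illustrative sketch, not from the original message): feed it two intermediate values that differ only in the lowest bits and count how many output bits end up differing; a well-mixing finalizer would average around 16 out of 32.

#include <bitset>
#include <cstdint>
#include <cstdio>

static uint32_t Avalanche(uint32_t hash) {
  // Final mixing step quoted above.
  hash ^= hash << 3;
  hash += hash >> 5;
  hash ^= hash << 4;
  hash += hash >> 17;
  hash ^= hash << 25;
  hash += hash >> 6;
  return hash;
}

int main() {
  const uint32_t a = 0x12345678;
  const uint32_t b = a ^ 0x3;  // Differs only in the lowest two bits.
  const uint32_t diff = Avalanche(a) ^ Avalanche(b);
  std::printf("output bits differing: %zu\n", std::bitset<32>(diff).count());
  return 0;
}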

Daniel Bratell

Sep 7, 2015, 12:07:20 PM
to prim...@chromium.org, Ian Clelland, Daniel Murphy, Chris Hamilton, Chromium-dev
On Tue, 23 Jun 2015 20:36:41 +0200, Daniel Murphy <dmu...@chromium.org> wrote:

> This is because of the "avalanching" of SuperFastHash:
> /* Force "avalanching" of final 127 bits */
>     hash ^= hash << 3;
>     hash += hash >> 5;
>     hash ^= hash << 4;
>     hash += hash >> 17;
>     hash ^= hash << 25;
>     hash += hash >> 6;
>
> This is one of the rare times that stackoverflow has a rather complete answer:
> http://programmers.stackexchange.com/questions/49550/which-hashing-algorithm-is-best-for-uniqueness-and-speed
> (also commented on the bug)

In that description SuperFastHash doesn't come out on top in either performance or collisions, so replacing SuperFastHash with one of the others (Murmur2?) might be a fun experiment. I wouldn't expect any measurable change, but using the best algorithm available can't be bad.

/Daniel

--
/* Opera Software, Linköping, Sweden: CEST (UTC+2) */

Alexei Svitkine

Sep 8, 2015, 11:22:30 AM
to Daniel Bratell, prim...@chromium.org, Ian Clelland, Daniel Murphy, Chris Hamilton, Chromium-dev
We'd have to be careful not to break things that have started to depend on the base::Hash() implementation. For example, it seems a few places have been using it for sparse histogram logging, so those would need to be updated to use SuperFastHash() explicitly to preserve their behavior if base::Hash() were to change. There are probably other similar cases, so one would have to audit its use.

(It is unfortunate that they ended up using base::Hash() in those places, but we shouldn't just break them.)
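
The kind of fix that implies, sketched hypothetically (this assumes base exposes SuperFastHash(const char*, size_t) directly; the exact header and signature would need checking against the tree):

#include <cstdint>
#include <string>

#include "base/hash.h"

// Pin the histogram key to the current algorithm rather than to base::Hash(),
// whose implementation might change underneath it.
uint32_t StableHistogramKey(const std::string& name) {
  return base::SuperFastHash(name.data(), name.size());
}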


Daniel Bratell

Sep 8, 2015, 11:25:54 AM
to Alexei Svitkine, prim...@chromium.org, Ian Clelland, Daniel Murphy, Chris Hamilton, Chromium-dev
On Tue, 08 Sep 2015 17:20:42 +0200, Alexei Svitkine <asvi...@chromium.org> wrote:

> We'd have to be careful not to break things that have started to depend on the base::Hash() implementation. For example, it seems a few places have been using it for sparse histogram logging, so those would need to be updated to use SuperFastHash() explicitly to preserve their behavior if base::Hash() were to change. There are probably other similar cases, so one would have to audit its use.
>
> (It is unfortunate that they ended up using base::Hash() in those places, but we shouldn't just break them.)

Doh.

Scott Hess

Sep 8, 2015, 12:43:47 PM
to Alexei Svitkine, Daniel Bratell, Primiano Tucci, Ian Clelland, Daniel Murphy, Chris Hamilton, Chromium-dev
If they have dependencies on a specific hashing implementation, they should be updated to depend on that specific hashing implementation _now_!

Maybe we should have random hash variation in base::Hash() to prevent this kind of dependency from creeping in in the future.

-scott
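
One way that suggestion could look (a sketch of the idea only, not actual Chromium code), assuming base/rand_util.h's base::RandUint64():

#include <cstdint>
#include <string>

#include "base/hash.h"
#include "base/rand_util.h"

// Mix a per-process random seed into the hash so callers cannot accidentally
// depend on values being stable across runs.
uint32_t SeededHash(const std::string& str) {
  static const uint64_t seed = base::RandUint64();  // Chosen once per process.
  const std::string keyed = std::to_string(seed) + ":" + str;
  return base::Hash(keyed);
}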
