Hey Lucas, I'm not opposed to this in principle, but I'm a little leery
of this hash, purely because it lacks a track record and proven
collision resistance. Even being generous and assuming it cuts export
times by 10%, a one-hour export with the current hashing function would
still take 54 minutes with MurmurHash. I'm not
convinced that the decreased confidence in the hash is worth that
marginal gain. I appreciate that it doesn't seem to collide with the
assets you've tried out, but that's anecdotal evidence.
If we can rigorously show that MurmurHash is as collision-resistant as
MD5 (within, say, 10%), then I'm totally on board. Otherwise, I'm not
convinced that this is a bottleneck worth widening by changing the hash
function.
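
To make "rigorously show" a bit more concrete, here's a rough sketch of
the kind of check I have in mind. It's Python, the asset names are made
up, and md5_64 is only my assumption about how we'd cut MD5 down to 64
bits. MurmurHash would plug in as another hash_fn; I've left it out
rather than guess at a binding.

```python
import hashlib


def md5_64(data: bytes) -> int:
    """First 64 bits of the MD5 digest, read as a big-endian unsigned int."""
    return int.from_bytes(hashlib.md5(data).digest()[:8], "big")


def count_collisions(hash_fn, keys, bits=64):
    """Count distinct keys whose masked hash was already produced by another key."""
    mask = (1 << bits) - 1
    seen = set()
    collisions = 0
    for key in set(keys):  # de-duplicate so identical keys don't count
        h = hash_fn(key) & mask
        if h in seen:
            collisions += 1
        else:
            seen.add(h)
    return collisions


# Hypothetical asset names, purely for illustration.
keys = [f"assets/texture_{i}.png".encode() for i in range(100_000)]

print(count_collisions(md5_64, keys))           # full 64-bit output
print(count_collisions(md5_64, keys, bits=16))  # narrow output, forces collisions
```

Running the same keys through both hash functions at the same width and
comparing the counts would give us something better than anecdote.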
-Joe
Hmm, that test suite looks pretty good, and if the results from it are
encouraging, that would go a long way towards me getting on board with this.
And doing some more reading, I see that it's being used in things like
memcached and other large-dataset applications. I'm starting to come
around to this idea :)
-Joe
Yeah, I'll be convinced if we get good numbers for it out of that test.
Thanks, Lucas!
OK, I'm down with this. The only remaining hurdles to clear are:
- is this as robust as MD5 with a 64-bit hash returned (that is, is
taking the first 64 bits as valid as MD5)?
- does this return the same hash for the same value on 32 vs. 64-bit
machines?
If the answer to both is yes, then let's DO THIS SHIT.
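
For a sense of scale on the 64-bit question, here's a back-of-the-envelope
sketch using the standard birthday approximation. It assumes the hash
output is uniformly distributed, which is exactly the property we'd still
need to verify for MurmurHash, and the key counts are made-up examples,
not our real asset counts.

```python
import math


def collision_probability(n_keys: int, bits: int) -> float:
    """Birthday approximation: chance that at least two of n_keys
    collide under a uniformly distributed hash with `bits` output bits."""
    pairs = n_keys * (n_keys - 1) / 2
    # 1 - exp(-pairs / 2^bits), computed with expm1 to keep precision
    # when the probability is tiny.
    return -math.expm1(-pairs / 2**bits)


# Hypothetical key counts, compared at 64 bits and at MD5's full 128 bits.
for n in (10**6, 10**9):
    print(n, collision_probability(n, 64), collision_probability(n, 128))
```

If the 64-bit numbers are acceptable at our asset counts, then any
well-distributed 64-bit hash is in the same ballpark, and the first
question comes down to whether MurmurHash really is well distributed.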