CityHash in Guava


cod...@gmail.com

Apr 26, 2014, 3:43:49 PM
to guava-...@googlegroups.com
Hi,
It seems that CityHash was removed from Guava. I am curious as to why it was removed. Is there a chance it will be added back?

Thanks.

Colin Decker

Apr 26, 2014, 5:53:24 PM
to cod...@gmail.com, guava-...@googlegroups.com
Here's the issue relating to CityHash: https://code.google.com/p/guava-libraries/issues/detail?id=1232

cod...@gmail.com

Apr 26, 2014, 8:35:51 PM
to guava-...@googlegroups.com, cod...@gmail.com
Thanks for pointing this out. I was thinking about using 128-bit CityHash to generate an ID (hash) for billions of documents to be stored in a DB. The intent is content deduplication with a very low chance of accidental collision. The SipHash implementation in Guava produces 64-bit output, so it has a relatively high probability of collision for my corpus size. Is there a 128-bit implementation of SipHash? Otherwise, 128-bit Murmur3 seems to be the better choice for now.
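
To put rough numbers on the 64-bit versus 128-bit concern, here is a minimal sketch of the standard birthday-bound approximation, p ≈ 1 - exp(-n^2 / 2^(b+1)), for n uniformly random b-bit hashes. The class and method names are made up for illustration and are not part of Guava; the corpus sizes are assumed stand-ins for "billions of documents".

```java
/** Birthday-bound estimate of accidental collisions (hypothetical helper, not Guava code). */
final class BirthdayBound {

    /**
     * Approximates the probability that at least two of n uniformly random
     * hashes of the given bit width collide: p ≈ 1 - exp(-n^2 / 2^(bits+1)).
     */
    static double collisionProbability(double n, int bits) {
        double exponent = (n * n) / Math.pow(2.0, bits + 1);
        // -expm1(-x) evaluates 1 - exp(-x) without losing precision when x is tiny.
        return -Math.expm1(-exponent);
    }

    public static void main(String[] args) {
        // Assumed corpus sizes: one billion and ten billion documents.
        double[] corpusSizes = {1e9, 1e10};
        for (double n : corpusSizes) {
            System.out.printf("n=%.0e  64-bit: %.3e  128-bit: %.3e%n",
                    n, collisionProbability(n, 64), collisionProbability(n, 128));
        }
    }
}
```

Under this approximation, a 64-bit hash over one billion documents already has a collision chance of a few percent, while a 128-bit hash stays negligibly small at both sizes. This only models accidental collisions for a well-distributed hash, not collisions constructed on purpose.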

Martin Grajcar

Apr 27, 2014, 8:03:16 AM
to cod...@gmail.com, guava-discuss
How bad are collisions in your case? Note that it's pretty easy to deliberately generate any number of collisions for 128-bit Murmur3. OTOH, accidental collisions should not happen.

If speed is not very important, I'd go for SHA-1, as Git does. I guess that storing the hash in the DB takes much more time than the hashing, so there's not much point in optimizing hard.
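
As a rough illustration of the SHA-1 suggestion, the sketch below derives a 160-bit content ID from a document's bytes. It uses the JDK's `MessageDigest` so that it runs without extra dependencies; the class and method names are hypothetical. Guava's `Hashing.sha1()` and `Hashing.murmur3_128()` should be usable in much the same way, though that is not shown here.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/** Hypothetical helper that derives a deduplication key from document content. */
final class ContentId {

    /** Returns the SHA-1 digest of the content as a 40-character lowercase hex string. */
    static String sha1Hex(byte[] content) {
        try {
            MessageDigest digest = MessageDigest.getInstance("SHA-1");
            byte[] hash = digest.digest(content);
            StringBuilder hex = new StringBuilder(hash.length * 2);
            for (byte b : hash) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // Every conforming Java platform is required to provide SHA-1.
            throw new IllegalStateException("SHA-1 not available", e);
        }
    }

    public static void main(String[] args) {
        byte[] first = "same document body".getBytes(StandardCharsets.UTF_8);
        byte[] second = "same document body".getBytes(StandardCharsets.UTF_8);
        // Identical content yields identical IDs, which is what deduplication relies on.
        System.out.println(sha1Hex(first).equals(sha1Hex(second)));
    }
}
```

The hex string (or the raw 20 bytes) can then serve as the unique key in the database, so a second insert of the same content is detected as a duplicate.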
