CityHash64 for Strings


mil...@adfin.com

Feb 12, 2014, 4:42:44 PM
to supersonic-...@googlegroups.com
I just wanted to give you guys a heads up that I was able to get a small (but nice) speedup from using CityHash64 for the string hashing. Using a dataset of 81M rows (one segment in our dataset) with 273k distinct string values (domain names) being grouped, I was able to see a 5% reduction in runtime. This is a 5% reduction of the total time, including decoding / decompressing input columns / network traffic, so a nice and easy gain.
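For anyone who wants to try the same kind of swap outside Supersonic, here is a minimal sketch of a group-by count with a pluggable string hash. It is not Supersonic's code: `StringHash64` and `CountByKey` are hypothetical names made up for illustration, and the hash body is an FNV-1a stand-in so the snippet compiles without external dependencies. With the CityHash library installed, the functor body could call `CityHash64(s.data(), s.size())` from `city.h` instead.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <string>
#include <unordered_map>
#include <vector>

// Stand-in 64-bit string hash (FNV-1a), used only so this sketch builds
// without extra libraries. Assumption: with CityHash available, the body
// could instead be: return CityHash64(s.data(), s.size());  // <city.h>
struct StringHash64 {
  std::size_t operator()(const std::string& s) const {
    std::uint64_t h = 14695981039346656037ULL;  // FNV-1a offset basis
    for (unsigned char c : s) {
      h ^= c;
      h *= 1099511628211ULL;  // FNV-1a prime
    }
    return static_cast<std::size_t>(h);
  }
};

// Hypothetical helper: counts rows per distinct string key. The hash is a
// template parameter so different hash functions can be swapped in and timed.
template <typename Hash = StringHash64>
std::unordered_map<std::string, std::uint64_t, Hash> CountByKey(
    const std::vector<std::string>& keys) {
  std::unordered_map<std::string, std::uint64_t, Hash> counts;
  for (const std::string& k : keys) {
    ++counts[k];  // operator[] value-initializes a missing count to 0
  }
  return counts;
}
```

Calling `CountByKey<std::hash<std::string>>(keys)` and `CountByKey<StringHash64>(keys)` on the same input gives the same counts, so the two can be timed against each other. Any speedup will depend on key length and the number of distinct values.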

I imagine it's a pretty simple switch for you guys since Google invented CityHash.

- Milosz