HelioSearch - Actual Unique vs. Estimated Unique


Terrance Snyder

Dec 14, 2014, 5:20:59 PM
I know this has come up a few times, but to support uniques over massively large datasets (unique user IDs, unique sessions, etc.) it would be very nice to include stream-lib or a similar library for estimated cardinality sketching. Something like HLL or HLL+, so that we don't need to keep a Set<Object> of every value.

I don't mind adding this myself and submitting a patch, but I'd rather keep both options available: compute a 100% correct unique count or a 98% correct one.
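To make the trade-off concrete, here is a rough, stdlib-only sketch of the two options side by side. It is not Heliosearch or stream-lib code: the class names ExactUnique and HllSketch, the FNV-1a plus finalizer hash, and the precision parameter p are all assumptions chosen for illustration, and a real patch would more likely wrap an existing library.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashSet;
import java.util.Set;

/** Exact option (hypothetical name): memory grows with the number of distinct values. */
final class ExactUnique {
    private final Set<String> seen = new HashSet<>();

    void add(String value) { seen.add(value); }

    long count() { return seen.size(); }
}

/** Estimated option (hypothetical name): a minimal HyperLogLog with 2^p one-byte registers. */
final class HllSketch {
    private final int p;            // number of hash bits used to pick a register
    private final int m;            // number of registers, 2^p
    private final byte[] registers; // each holds the largest rank seen for its bucket

    HllSketch(int p) {
        // Restricted to m >= 128 so the single alpha formula in count() applies.
        if (p < 7 || p > 16) throw new IllegalArgumentException("p must be in [7, 16]");
        this.p = p;
        this.m = 1 << p;
        this.registers = new byte[m];
    }

    void add(String value) {
        long h = hash64(value);
        int idx = (int) (h >>> (64 - p));   // top p bits select the register
        long rest = h << p;                 // remaining 64 - p bits, left-aligned
        // Rank = position of the first 1 bit in the remaining bits (1-based).
        int rank = Math.min(Long.numberOfLeadingZeros(rest), 64 - p) + 1;
        if (rank > registers[idx]) registers[idx] = (byte) rank;
    }

    long count() {
        double sum = 0.0;
        int zeros = 0;
        for (byte r : registers) {
            sum += Math.pow(2.0, -r);
            if (r == 0) zeros++;
        }
        double alpha = 0.7213 / (1.0 + 1.079 / m);
        double estimate = alpha * m * m / sum;
        // Small-range correction: fall back to linear counting while registers are still empty.
        if (estimate <= 2.5 * m && zeros > 0) {
            estimate = m * Math.log((double) m / zeros);
        }
        return Math.round(estimate);
    }

    /** Assumed hash for this sketch: FNV-1a over UTF-8 bytes, then a 64-bit mixing finalizer. */
    private static long hash64(String value) {
        long h = 0xcbf29ce484222325L;
        for (byte b : value.getBytes(StandardCharsets.UTF_8)) {
            h ^= (b & 0xff);
            h *= 0x100000001b3L;
        }
        h ^= h >>> 33;
        h *= 0xff51afd7ed558ccdL;
        h ^= h >>> 33;
        h *= 0xc4ceb9fe1a85ec53L;
        h ^= h >>> 33;
        return h;
    }
}

final class UniqueDemo {
    public static void main(String[] args) {
        ExactUnique exact = new ExactUnique();
        HllSketch hll = new HllSketch(14);
        for (int i = 0; i < 100_000; i++) {
            String userId = "user-" + i;
            exact.add(userId);
            hll.add(userId);
            hll.add(userId); // duplicates do not change the estimate
        }
        System.out.println("exact=" + exact.count() + " estimated=" + hll.count());
    }
}
```

With p = 14 the sketch holds 16,384 one-byte registers (16 KiB) no matter how many values are added, and the expected relative error of HyperLogLog is roughly 1.04 / sqrt(2^p), so about 0.8% here. The exact counter has no error but keeps every distinct value in memory, which is why having both options selectable seems worthwhile.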

HLL background

Some libraries

Yonik, can you let me know (PM me if you want) what guidance you can give when adding something like this?