I know this has come up a few times, but to support very large datasets for unique counts (unique user IDs, unique sessions, etc.) it would be very nice to include stream-lib or a similar library for estimating cardinality with sketches. Something like HLL or HLL+ would mean we don't have to hold a Set<Object> in memory.
I don't mind adding this myself and submitting a patch, but I'd rather keep both options available: an exact (100% correct) unique count and an approximate (roughly 98% correct) one.
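To make the trade-off concrete, here is a rough sketch of the two options side by side. This is a toy HyperLogLog written only for illustration; it is not stream-lib's implementation or API, and the class name HllSketch, the precision parameter p, and the SplitMix64-style hash mixer are all my own assumptions. It leaves out the bias corrections that HLL+ adds.

```java
import java.util.HashSet;
import java.util.Set;

/** Toy HyperLogLog for illustration only (hypothetical class, not stream-lib). */
final class HllSketch {
    private final int p;            // number of bits used to pick a register
    private final int m;            // number of registers, 2^p
    private final byte[] registers; // max rank seen per register

    HllSketch(int p) {
        // Restricting p keeps the alpha constant below valid (needs m >= 128).
        if (p < 7 || p > 16) throw new IllegalArgumentException("p must be in [7, 16]");
        this.p = p;
        this.m = 1 << p;
        this.registers = new byte[m];
    }

    // 64-bit mixer (SplitMix64 finalizer) to spread the 32-bit hashCode.
    // Assumption: a real implementation would hash the full value (e.g. MurmurHash).
    private static long mix64(long z) {
        z += 0x9E3779B97F4A7C15L;
        z = (z ^ (z >>> 30)) * 0xBF58476D1CE4E5B9L;
        z = (z ^ (z >>> 27)) * 0x94D049BB133111EBL;
        return z ^ (z >>> 31);
    }

    void add(Object value) {
        long h = mix64(value.hashCode());
        int idx = (int) (h >>> (64 - p));   // top p bits select the register
        long rest = h << p;                 // remaining bits, left-aligned
        // Rank = position of the first 1 bit in the remaining bits, capped.
        int rank = Math.min(Long.numberOfLeadingZeros(rest) + 1, 64 - p + 1);
        if (rank > registers[idx]) registers[idx] = (byte) rank;
    }

    long estimate() {
        double sum = 0.0;
        int zeros = 0;
        for (byte r : registers) {
            sum += Math.pow(2.0, -r);
            if (r == 0) zeros++;
        }
        double alpha = 0.7213 / (1.0 + 1.079 / m);
        double raw = alpha * m * (double) m / sum;
        // Small-range correction: fall back to linear counting.
        if (raw <= 2.5 * m && zeros > 0) {
            return Math.round(m * Math.log((double) m / zeros));
        }
        return Math.round(raw);
    }

    public static void main(String[] args) {
        Set<Object> exact = new HashSet<>();   // exact, memory grows with uniques
        HllSketch approx = new HllSketch(14);  // fixed 16 KB of registers
        for (int pass = 0; pass < 2; pass++) { // every id is seen twice
            for (int userId = 0; userId < 100_000; userId++) {
                exact.add(userId);
                approx.add(userId);
            }
        }
        System.out.println("exact uniques:     " + exact.size());
        System.out.println("estimated uniques: " + approx.estimate());
    }
}
```

The exact set keeps every distinct value, while the sketch stays at a fixed 2^p bytes no matter how many values are added; with p = 14 the theoretical standard error is about 1.04 / sqrt(16384), or roughly 0.8%. Exposing both would let the user pick per request.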
HLL background
Some libraries
Yonik, can you let me know (PM me if you want) what guidance you can give on adding something like this?