HyperLogLog to check existence


breakbild

May 18, 2014, 6:17:29 PM
to redi...@googlegroups.com
I'm experimenting with the HyperLogLog feature, and it doesn't seem to work the way I expected.

As far as I understand, HyperLogLog can count unique items. I want to use it to check whether an item exists in a stream (let's say item X in stream S).

Every time an item is added to the stream, I PFADD S .. X (the stream and item concatenated as one string, from a Lua script) to a key.

Every time I want to check whether an item is already in the stream, I do the following:
1) PFMERGE the existing HLL key into a temporary key. (I also tried reading the HLL string, running PFADD, and putting the old string back; a PFEXISTS command would be nice.)
2) PFADD S .. X to the temporary key.
3) If the result is 0, the count was not updated, so I assume the combination already existed. If the result is 1, I assume the item/stream combination is new.

I get far too many "exists" results, even after a few thousand items. Around 75% are reported as already existing, which is far too high for my dataset (my earlier ZSET-based version found only a small percentage already existing).
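
A rough way to see where results like this can come from is to simulate the check outside Redis. The sketch below is a toy HyperLogLog in Python, not the Redis implementation: it assumes 16384 registers (the count Redis uses), substitutes truncated SHA-1 for Redis's own hash function, and the names pfadd and false_exists_rate are made up for illustration. It fills the registers with n_seen items, then runs the "copy, PFADD, look at the reply" check on items that were never added and reports how often the reply is 0 anyway.

```python
import hashlib

P = 14            # index bits; 2**14 = 16384 registers, the count Redis uses
M = 1 << P

def hash64(item: str) -> int:
    # Assumption: SHA-1 truncated to 64 bits stands in for Redis's own hash.
    return int.from_bytes(hashlib.sha1(item.encode()).digest()[:8], "big")

def pfadd(registers: list, item: str) -> int:
    """Toy PFADD: return 1 if a register changed, otherwise 0."""
    h = hash64(item)
    idx = h & (M - 1)          # low P bits pick the register
    rest = h >> P              # remaining 50 bits feed the rank
    rank = 1                   # 1-based position of the first set bit
    while rank <= 50 and not (rest & 1):
        rest >>= 1
        rank += 1
    if rank > registers[idx]:
        registers[idx] = rank
        return 1
    return 0

def false_exists_rate(n_seen: int, n_probe: int = 1000) -> float:
    """Fraction of never-seen items for which the toy PFADD returns 0."""
    registers = [0] * M
    for i in range(n_seen):
        pfadd(registers, f"S:seen-{i}")
    false_hits = 0
    for i in range(n_probe):
        temp = registers.copy()            # stand-in for PFMERGE to a temp key
        if pfadd(temp, f"S:new-{i}") == 0:
            false_hits += 1
    return false_hits / n_probe

if __name__ == "__main__":
    for n in (1000, 5000, 20000, 100000):
        print(n, false_exists_rate(n))
```

In this toy model a new item only changes the sketch when its rank beats the value already stored in its register, so the share of false "exists" replies grows as more registers fill up. The exact numbers from Redis may differ, but the trend should be the same.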

So, am I missing something, or is HLL not intended for this kind of stuff?

Thanks,
Mark

Nikolay Mihaylov

Sep 12, 2014, 5:28:57 PM
to redi...@googlegroups.com
I was about to ask for similar functionality.

Why not add a PF-IS-MEMBER command?

However, to answer you: checking existence (e.g. with PF-IS-MEMBER) is not 100% reliable, which is probably why such a command was not introduced in the first place.
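
To make the "not 100% reliable" point concrete, here is a minimal sketch of what a membership test over HyperLogLog registers could and could not promise. It is a hypothetical illustration in Python, not a Redis command or API: the register count of 16384 matches Redis, but the SHA-1 based hash and the names index_and_rank, add and maybe_member are assumptions made up for the example.

```python
import hashlib

M = 1 << 14   # register count (assumed to match the 16384 Redis uses)

def index_and_rank(item: str):
    # Assumption: truncated SHA-1 stands in for the real hash function.
    h = int.from_bytes(hashlib.sha1(item.encode()).digest()[:8], "big")
    idx, rest = h & (M - 1), h >> 14
    rank = 1                          # 1-based position of the first set bit
    while rank <= 50 and not (rest & 1):
        rest >>= 1
        rank += 1
    return idx, rank

def add(registers, item):
    idx, rank = index_and_rank(item)
    registers[idx] = max(registers[idx], rank)

def maybe_member(registers, item) -> bool:
    """False means 'definitely never added'; True only means 'cannot rule it out'."""
    idx, rank = index_and_rank(item)
    return registers[idx] >= rank
```

An item that was added always yields True, so there are no false negatives. An item that was never added also yields True whenever some other item has already pushed its register to the same or a higher rank, and that becomes more likely as the sketch fills up.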

>> I get way too many exists, even after a few thousand items.
I ran similar tests with my own library ( https://github.com/nmmmnu/CubicHyperLogLog ). The Redis HLL is probably not configured correctly, e.g. hll-sparse-max-bytes.

I will run some tests over the next few days and post the results.