I'm experimenting with the HyperLogLog feature, and it doesn't seem to work the way I expected.
As far as I understand, HyperLogLog can count unique items. I want to use it to check whether an item already exists in a stream (say, item X in stream S).
Every time an item is added to the stream, I PFADD S .. X (the stream and item concatenated into one string, from a Lua script) to a key.
Every time I want to check whether an item is already in the stream, I do the following:
1) PFMERGE the existing HLL key into a temporary key (I also tried reading the HLL string, doing the PFADD, and writing the old string back; a PFEXISTS command would be nice)
2) PFADD S .. X to the temporary key
3) If the result is 0, the count was not updated, so I assume the combination already existed. If the result is 1, I assume the item/stream combination is new.
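
In case it helps, here is a minimal sketch of steps 1-3 in Python. It uses a toy in-memory HLL rather than Redis itself, so `ToyHLL`, `seems_to_exist`, the SHA-1 hash and the 14 index bits are all assumptions made for illustration; it is not my actual Lua script and not how Redis encodes things internally.

```python
import hashlib


class ToyHLL:
    """Minimal dense HyperLogLog register array (illustrative only)."""

    def __init__(self, p=14):
        self.p = p                        # number of index bits (assumed value)
        self.registers = [0] * (1 << p)   # one small counter per slot

    def _slot_and_rank(self, item):
        # 64-bit hash taken from SHA-1; an assumption, Redis hashes differently.
        h = int.from_bytes(hashlib.sha1(item.encode()).digest()[:8], "big")
        slot = h & ((1 << self.p) - 1)    # low p bits pick the register
        rest = h >> self.p                # remaining 64 - p bits
        rank = 1                          # trailing zeros in `rest`, plus one
        while rest & 1 == 0 and rank <= 64 - self.p:
            rest >>= 1
            rank += 1
        return slot, rank

    def pfadd(self, item):
        """Return 1 if a register was raised, else 0 (mirrors PFADD's reply)."""
        slot, rank = self._slot_and_rank(item)
        if rank > self.registers[slot]:
            self.registers[slot] = rank
            return 1
        return 0

    def copy(self):
        """Stand-in for PFMERGE into a fresh temporary key."""
        clone = ToyHLL(self.p)
        clone.registers = list(self.registers)
        return clone


def seems_to_exist(hll, stream, item):
    tmp = hll.copy()                      # step 1: work on a temporary copy
    changed = tmp.pfadd(stream + item)    # step 2: add S .. X to the copy
    return changed == 0                   # step 3: 0 is read as "already there"
```

The check returns True whenever the temporary copy's registers are left unchanged, which is the condition I'm treating as "already exists". The original HLL is never modified by the check.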
I get far too many "already exists" results, even after only a few thousand items. Around 75% of items are reported as already existing, which is far too high for my dataset (when I first did this with a ZSET, only a small percentage already existed).
So, am I missing something, or is HLL not intended for this kind of stuff?
Thanks,
Mark