On Thu, Apr 17, 2014 at 5:53 PM, Ghais Issa <ghais...@gmail.com> wrote:
> Thanks Salvatore
>
> We currently have 3065251 HLLs in production, stored in approximately 35GB.
> We are less sensitive to count/estimation latency when counting multiple
> keys, however we would certainly benefit from the speedup when looking up a
> single HLL (we do that in our real time analytics views)
Yes, this is exactly how it works: PFCOUNT is very fast with a single
key, and a lot slower with N keys, when merging has to be performed.
However, the good news is that for sparsely populated HLLs I found (and
just committed) a very big speedup.
As far as raw speed is concerned, we have:
PFCOUNT at 800k ops/sec when it can use the cached value (almost
always possible in practice).
PFCOUNT with multiple *very* sparsely populated keys at ~ 100k - 200k ops/sec
PFCOUNT with multiple very populated HLLs is very slow at ~ 2000 ops/sec.
This is terrible, since it requires accessing the 16k registers of each
of the different HLLs while working with floating point numbers...
Given these numbers, is it worth working with this implementation, in
your opinion and for your use case?
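
To make the cost difference concrete, here is a rough toy model in Python. It is only a sketch under stated assumptions, not the Redis code: the names ToyHLL, estimate and count_union are made up for illustration, SHA-1 stands in for the real hash function, and the estimator is the textbook raw HLL estimate with linear counting for small cardinalities. It shows the two paths: a single-key count can return a cached value, while a multi-key count has to merge the 16k registers of every HLL and then redo the floating point pass.

```python
import hashlib
import math

P = 14        # index bits: 2**14 = 16384 registers (the "16k" mentioned above)
M = 1 << P


def _hash64(item: str) -> int:
    # Assumption: SHA-1 truncated to 64 bits is only a stand-in hash.
    return int.from_bytes(hashlib.sha1(item.encode()).digest()[:8], "big")


def estimate(registers) -> int:
    # Floating point pass over all M registers (the expensive part).
    alpha = 0.7213 / (1 + 1.079 / M)
    total = 0.0
    zeros = 0
    for r in registers:
        total += 2.0 ** -r
        if r == 0:
            zeros += 1
    raw = alpha * M * M / total
    if raw <= 2.5 * M and zeros:
        # Linear counting is more accurate for small cardinalities.
        return round(M * math.log(M / zeros))
    return round(raw)


class ToyHLL:
    def __init__(self):
        self.registers = bytearray(M)
        self.cached = None  # cached cardinality, dropped when a register changes

    def add(self, item: str) -> None:
        h = _hash64(item)
        idx = h & (M - 1)   # low P bits select the register
        rest = h >> P       # remaining 64 - P bits
        rank = 1            # 1-based position of the first set bit in rest
        while rank <= 64 - P and not (rest & 1):
            rest >>= 1
            rank += 1
        if rank > self.registers[idx]:
            self.registers[idx] = rank
            self.cached = None  # the estimate is now stale

    def count(self) -> int:
        # Single-key path: reuse the cached value when it is still valid.
        if self.cached is None:
            self.cached = estimate(self.registers)
        return self.cached


def count_union(hlls) -> int:
    # Multi-key path: merge by taking the max of each register across
    # all HLLs, then estimate from scratch. Nothing here can be cached.
    merged = bytearray(M)
    for h in hlls:
        for i, r in enumerate(h.registers):
            if r > merged[i]:
                merged[i] = r
    return estimate(merged)
```

With two ToyHLL objects a and b, a.count() touches the registers only when something changed since the last call, while count_union([a, b]) always walks 2 x 16384 registers before estimating.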
> I plan to start persisting new data in Redis 2.8.9 in the next couple of
> hours as well as load about 60 days worth of historic data soon after. I
> hope to get something measurable in the next day or 2. If I run into any
> issues I will certainly report them back.
>
> Again, thanks a lot.
You are welcome.
WARNING: All the new stuff we are talking about is only in the
*unstable* branch currently! :-)
I'll apply it to 2.8 after some more testing tomorrow.
The commits will cherry-pick into 2.8 without issues if you want them right now.