HyperLogLog with TailCut space reduction

73 views
Skip to first unread message

Matt Abrams

unread,
Jul 7, 2017, 4:42:11 PM7/7/17
to stream-...@googlegroups.com
There is a relatively new paper and sample (Go) implementation of
HyperLogLog with TailCut space reduction making the rounds. I'd love
to see a Java implementation added to stream-lib. Any takers?

https://github.com/axiomhq/hyperloglog

http://cse.seu.edu.cn/PersonalPage/csqjxiao/csqjxiao_files/papers/INFOCOM17.pdf


Thanks,
matt

seif...@gmail.com

unread,
Jul 17, 2017, 5:40:21 PM7/17/17
to stream-lib-user
Author here... I am about to take a stab later on this month. But feel free to beat me to it :D

Matt Abrams

unread,
Jul 21, 2017, 1:02:27 PM7/21/17
to stream-...@googlegroups.com
That would be great! I may work on it during an upcoming hackathon
but I'm not sure how far I'll get.

Great paper BTW.

matt
> --
> You received this message because you are subscribed to the Google Groups
> "stream-lib-user" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to stream-lib-us...@googlegroups.com.
> For more options, visit https://groups.google.com/d/optout.

Nate Natarajan

unread,
Jul 25, 2017, 11:34:53 AM7/25/17
to stream-lib-user
Hi,

There are two algorithms proposed in the paper HLL-TC and HLL-TC+. HLL-TC has the same error rate as HLL but provides 20% savings memory whereas HLL-TC+ reduces the error and also saves 45% in memory compared to vanilla HLL.

A couple of questions here,

HLL-TC seems to be very close to HLL with regards to estimation, so it does seem like it can support unions (this is also validated in the paper) whereas HLL-TC+ seems to be using an interesting MLE calculation each time the base register changes, I can't see how this would allow unions.

The second question is how duplicates are handled. It looks to me duplicates will not affect the result if they show up before base register is increased for an offset, but what if a value v1 is duplicated like this v1 -> something else -> base register updated and offsets adjusted -> v1 shows up again. Does this matter?

Regards,
Nat
P.S. Thanks Matt for pointing this paper to me

Seif Lotfy

unread,
Jul 25, 2017, 12:03:13 PM7/25/17
to stream-...@googlegroups.com
Oh nice… I am the author of the Go library not the paper… but I agree nice paper :D
You received this message because you are subscribed to a topic in the Google Groups "stream-lib-user" group.
To unsubscribe from this topic, visit https://groups.google.com/d/topic/stream-lib-user/FkRR44EO6-o/unsubscribe.
To unsubscribe from this group and all its topics, send an email to stream-lib-us...@googlegroups.com.
Reply all
Reply to author
Forward
0 new messages