Why does Kylin choose HyperLogLog over Bloom filter for cardinality estimation?


聪樊

Oct 28, 2014, 11:49:13 PM
to kylin...@googlegroups.com
Dear All

Recently, we have been trying to solve the distinct count problem (e.g. UV, DAU). I have investigated HyperLogLog and Bloom filters. I am wondering why Kylin uses HLL. Have you done any comparison?

Thanks,
Lucas

Antonios Chalkiopoulos

Oct 29, 2014, 5:27:42 AM
to kylin...@googlegroups.com
HyperLogLog is suited for cardinality estimation (counting the distinct elements in a set).

A Bloom filter is for membership checks!

So an HLL can be something like a 10 KB file, while Bloom filters can be hundreds of MB (they serve different purposes).
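
To put rough numbers on that size difference, here is a back-of-the-envelope sketch using the standard textbook sizing formulas. The helper names, the precision of 14, the 6-bit registers, and the 100 million item / 1% false-positive example are all illustrative assumptions, not Kylin's actual settings.

```python
import math


def hll_bytes(precision: int, bits_per_register: int = 6) -> int:
    """Memory of a dense HyperLogLog sketch with 2**precision registers.

    6 bits per register is a common choice; it is an assumption here.
    """
    registers = 1 << precision
    return registers * bits_per_register // 8


def hll_std_error(precision: int) -> float:
    """Typical relative standard error of HyperLogLog: 1.04 / sqrt(m)."""
    return 1.04 / math.sqrt(1 << precision)


def bloom_bytes(n_items: int, false_positive_rate: float) -> int:
    """Optimal Bloom filter size: m = -n * ln(p) / (ln 2)^2 bits."""
    bits = -n_items * math.log(false_positive_rate) / (math.log(2) ** 2)
    return math.ceil(bits / 8)


if __name__ == "__main__":
    print("HLL bytes:", hll_bytes(14))
    print("HLL relative error:", hll_std_error(14))
    print("Bloom bytes:", bloom_bytes(100_000_000, 0.01))
```

Under these assumptions the HLL sketch is about 12 KiB with a relative error below 1%, and its size does not depend on how many items are counted. The Bloom filter grows linearly with the number of items and comes out at roughly 120 MB for 100 million of them.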

聪樊

Oct 30, 2014, 3:30:13 AM
to kylin...@googlegroups.com
Thanks Antonios

But I think both HyperLogLog and Bloom filters could be used for membership checks and cardinality estimation; HyperLogLog just has better performance.
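
One half of this claim can be illustrated: a Bloom filter's fill ratio gives a rough distinct count through the well-known estimate n ≈ -(m/k) · ln(1 - X/m), where X is the number of set bits. The toy class below is a minimal sketch of that idea, with a hypothetical `BloomFilter` class and a SHA-256 double-hashing scheme chosen only for illustration. The reverse direction is weaker: a standard HyperLogLog sketch keeps only per-register maxima, so it does not answer membership queries in the usual sense.

```python
import hashlib
import math


class BloomFilter:
    """Toy Bloom filter (illustrative only, not tuned for production use)."""

    def __init__(self, num_bits: int, num_hashes: int):
        self.m = num_bits
        self.k = num_hashes
        self.bits = bytearray(num_bits)  # one byte per bit, for clarity

    def _positions(self, item: str):
        # Derive k positions from one SHA-256 digest via double hashing.
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # keep the stride odd
        return [(h1 + i * h2) % self.m for i in range(self.k)]

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos] = 1

    def __contains__(self, item: str) -> bool:
        # May return a false positive, never a false negative.
        return all(self.bits[pos] for pos in self._positions(item))

    def estimate_cardinality(self) -> float:
        # n ~= -(m / k) * ln(1 - X / m), with X the number of set bits.
        set_bits = sum(self.bits)
        if set_bits == self.m:
            return float("inf")  # filter is saturated
        return -(self.m / self.k) * math.log(1 - set_bits / self.m)


if __name__ == "__main__":
    bf = BloomFilter(num_bits=16384, num_hashes=4)
    for i in range(1000):
        bf.add(f"user-{i}")
        bf.add(f"user-{i}")  # duplicates do not change the filter
    print("user-1" in bf, bf.estimate_cardinality())
```

The estimate stays close to the true count only while the filter is sparsely filled, so a Bloom filter used this way still needs memory proportional to the number of distinct items, which is where HLL has the advantage for pure counting.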