Why does Kylin choose HyperLogLog over Bloom filter for estimation calculation?
聪樊
Oct 28, 2014, 11:49:13 PM
to kylin...@googlegroups.com
Dear All
Recently, we have been trying to solve the distinct count problem (e.g. UV, DAU). I have investigated HyperLogLog and Bloom filters. I am wondering why Kylin uses HLL. Have you done any comparison?
Thanks, Lucas
Antonios Chalkiopoulos
Oct 29, 2014, 5:27:42 AM
to kylin...@googlegroups.com
HyperLogLog is suited for cardinality estimation (counting the distinct elements in a set).
A Bloom filter is for membership checks!
So an HLL can be something like a 10 KB file, but Bloom filters can be hundreds of MB (they serve different purposes).
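To make the fixed-size point concrete, here is a minimal HyperLogLog sketch in Python. It is a toy written for illustration, not Kylin's implementation: the class name, the choice of `p=14`, the SHA-1 based hash, and the one-byte-per-register layout are all assumptions. With `p=14` it keeps 16384 registers (16 KiB here), the same order of magnitude as the 10 KB figure above, no matter how many elements are added.

```python
import hashlib
import math


class HyperLogLog:
    """Toy HyperLogLog with 2**p one-byte registers (illustrative only)."""

    def __init__(self, p=14):
        self.p = p
        self.m = 1 << p                     # number of registers
        self.registers = bytearray(self.m)  # memory stays fixed at m bytes

    def add(self, item):
        # Derive a 64-bit hash from the item (SHA-1 is just a convenient choice).
        h = int.from_bytes(hashlib.sha1(str(item).encode()).digest()[:8], "big")
        idx = h >> (64 - self.p)                 # first p bits select a register
        rest = h & ((1 << (64 - self.p)) - 1)    # remaining 64 - p bits
        # Rank = position of the leftmost 1-bit in the remaining bits.
        rank = (64 - self.p) - rest.bit_length() + 1
        if rank > self.registers[idx]:
            self.registers[idx] = rank

    def count(self):
        alpha = 0.7213 / (1 + 1.079 / self.m)    # bias constant for m >= 128
        raw = alpha * self.m ** 2 / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if raw <= 2.5 * self.m and zeros:
            # Small-range correction (linear counting on empty registers).
            return self.m * math.log(self.m / zeros)
        return raw
```

Adding the same element twice does not change any register, so duplicates are ignored for free. The expected relative error is roughly 1.04 / sqrt(m), which is under 1% for `p=14`.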
聪樊
Oct 30, 2014, 3:30:13 AM
to kylin...@googlegroups.com
Thanks Antonios
But I think both HyperLogLog and Bloom filters could be used for membership checks and cardinality estimation; it is just that HyperLogLog has better performance.
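For the Bloom filter side of that comparison, here is a rough Python sketch of a filter that answers membership queries and also estimates how many distinct items were inserted from the fraction of bits that are set, using the known estimate n ≈ -(m/k) ln(1 - X/m), where X is the number of set bits. The class name, the default sizes, and the SHA-256 based double hashing are assumptions made for the example. An HLL, by contrast, keeps only per-register maxima, so it has no comparable membership query.

```python
import hashlib
import math


class BloomFilter:
    """Toy Bloom filter with m bits and k hash positions (illustrative only)."""

    def __init__(self, m_bits=1 << 20, k=7):
        self.m = m_bits
        self.k = k
        self.bits = bytearray(m_bits // 8)

    def _positions(self, item):
        # Double hashing: derive k bit positions from two 64-bit hash values.
        d = hashlib.sha256(str(item).encode()).digest()
        h1 = int.from_bytes(d[:8], "big")
        h2 = int.from_bytes(d[8:16], "big") | 1  # force odd so positions differ
        return [(h1 + i * h2) % self.m for i in range(self.k)]

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos >> 3] |= 1 << (pos & 7)

    def __contains__(self, item):
        # False positives are possible, false negatives are not.
        return all(self.bits[pos >> 3] & (1 << (pos & 7))
                   for pos in self._positions(item))

    def estimate_count(self):
        # Cardinality estimate from the number of set bits.
        # Breaks down (log of zero) if every bit is set.
        set_bits = sum(bin(b).count("1") for b in self.bits)
        return -(self.m / self.k) * math.log(1 - set_bits / self.m)
```

The catch is sizing: the filter above uses 128 KiB, and it has to grow roughly in proportion to the number of distinct elements to keep the false positive rate and the count estimate usable, whereas an HLL stays at a fixed size.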