Cardinality Estimation for last n minutes

72 views
Skip to first unread message

uij...@gmail.com

unread,
Aug 3, 2017, 9:01:40 AM8/3/17
to stream-lib-user
Hi,

Thanks for the wonderful work being done here. I am working on an application where I need to estimate the cardinality of users in last 60 minutes. I think I cannot use hyperloglog for that. There is this sliding hyperloglog which does exactly that but I could not find any implementation in java or scala. Do you guys have suggestions? Is there any other data structure to do this? 

Matt Abrams

unread,
Aug 3, 2017, 10:06:34 AM8/3/17
to stream-...@googlegroups.com
Your welcome! I'm not sure why you can't use HLL for this? Is the
idea you need to count the number of users in any arbitrary 60 minutes
(not on hourly boundaries)? Then if so create a HLL for each minute
and then aggregate the appropriate set of HLLs for the 60 minutes of
interest to you.

I'm not familiar with a sliding HLL. Do you have a reference paper
you can point us to?

Matt
> --
> You received this message because you are subscribed to the Google Groups
> "stream-lib-user" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to stream-lib-us...@googlegroups.com.
> For more options, visit https://groups.google.com/d/optout.

usman ijaz

unread,
Aug 4, 2017, 9:08:45 AM8/4/17
to stream-lib-user
I actually simplified the problem. The objective is to estimate the cardinality at any time t over the the last w unit of time, where w is smaller then the time window W. With regular hyperloglog as you suggested, the number of HLL counters will increase significantly based on time granularity. If the window refresh rate is one minute, I will need 60 counters and if the refresh rate is one second, I will need 60*60 HLL counters.

My app is written in scala and that's the reason I am looking for a scala or java implementation. 

Matt Abrams

unread,
Aug 4, 2017, 9:15:15 AM8/4/17
to stream-...@googlegroups.com
Thanks for sharing the paper and sample implementations. We do not
have one in stream-lib but it would make a great addition.

Any volunteers?

Matt
Reply all
Reply to author
Forward
0 new messages