rolling time windows for StreamSummary?

67 views
Skip to first unread message

Robert Doherty

unread,
Feb 20, 2015, 8:59:06 AM2/20/15
to stream-...@googlegroups.com
Hi,

Just getting started with stream-lib and am very impressed with the performance.

For my application, I am using StreamSummary, and need to keep track of rolling counts of objects. For example, a rolling 5min window that is updated every, say 15 seconds.

One idea would be to replace the integer representing counts with an data structure of length totalWindowLength/updateFreq. 

Has anyone looked into implementing this and have any suggestions for me? I browsed through previous comments and nothing jumped out at me.

Thank you!


Matt Abrams

unread,
Feb 20, 2015, 9:38:37 AM2/20/15
to stream-...@googlegroups.com
Hey Robert -

Glad you are checking out Stream-Lib.  I'm not 100% I understand the question so if I'm off please point me in the right direction.  Short of modifying StreamSummary directly can you just can you create a wrapper object that meats your needs? For example

Create a StreamSummary for each time bucket (offering all events for the most recent updateFreq seconds).  To get rolling sum combine most recent n StreamSummary objects and then take topK from that?

Matt







--
You received this message because you are subscribed to the Google Groups "stream-lib-user" group.
To unsubscribe from this group and stop receiving emails from it, send an email to stream-lib-us...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Robert Doherty

unread,
Feb 23, 2015, 12:32:30 PM2/23/15
to stream-...@googlegroups.com
Thanks for the response, Matt.

To give a little context, I am implementing a count estimate of events streaming through a Storm topology. The peak total throughput of the system is about 12k events/second.
I need to have a rolling count of unique items for five and sixty minutes, updated at, say, 15 second intervals.

I had been thinking that I would need to make changes to the StreamSummary class to allow storing a rolling queue of counts for each item, where old counts expire every 15s (e.g. for the 5-minute window, there would be a queue of 300s/15s=20 counts for each item).

I ended up going with the simpler approach you suggested of creating a StreamSummary for each 15s interval, and in current testing I found that the memory requirements are still well within acceptable levels.

I'm still experimenting with different configurations, but again, fantastic, well-documented library!

Thanks-
Reply all
Reply to author
Forward
0 new messages