redis: how to calculate average values


Oleg Ruchovets

Jul 10, 2013, 6:08:35 AM
to redi...@googlegroups.com
Hi,
    I get 50-100 events every second.

The event structure:    event_id | event_type | event_value | time-stamp

For example:            ev_id_1  | type1      | 50          | 12341234324
                        ev_id_2  | type2      | 70          | 12341235324


I need to calculate averages of each event based on event_type and a time window.
For example: the average of type1 over 15 minutes, 1 hour, 4 hours, 1 day.
     

Question:
   1) I didn't find that Redis supports average calculation out of the box. If I missed something, please share a link or example.
   2) What is the most efficient way to model and implement average calculation in my case? (I use Jedis -- the Java client.)

Thanks in advance.
Oleg.


Rodrigo Ribeiro

Jul 10, 2013, 11:14:37 AM
to redi...@googlegroups.com

Hi Oleg,

Receiving 100/s you can have 8,640,000 events a day, and depending on the event_type cardinality, aggregating it all at query time can be a problem.

One solution is to use a coarser resolution (like 1 minute).


One way to do it is to use two temporary counters for each event_type: one for the number of events in that period of time, the other for the sum of the event values.


Then each minute you dump the value of each into two sorted sets: again, one with the counters, the other with the sums.

The timestamp would be the sorted-set score in both zsets, and the value (counter/sum) the member.


Then you can easily query a time window using both sorted sets, and they would be much smaller and faster to retrieve.

This way the number of events per second isn't a problem at query time.


With sorted sets you can easily discard old data too.


* You could probably use only the sorted sets (avoiding the temporary counters) with ZINCRBY, but its time complexity is higher (O(log(n)) vs. O(1)).
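A minimal in-memory sketch of this scheme (the class and key layout are my own illustration, not code from the thread; in Redis the per-minute counters would be INCRBY keys like "count:type1:<minute>" and "sum:type1:<minute>", and each TreeMap below would be a sorted set written with ZADD via Jedis):

```java
import java.util.TreeMap;

public class MinuteAverages {
    // One "sorted set" per metric: minute-timestamp (the score) -> value.
    // Note: in a real zset the member must be unique, so storing
    // "minute:value" as the member avoids collisions between equal values.
    private final TreeMap<Long, Long> counts = new TreeMap<>();
    private final TreeMap<Long, Long> sums = new TreeMap<>();

    // Called once per closed minute with that minute's totals
    // (in Redis: read the two temporary counters, ZADD into both zsets).
    public void flushMinute(long minuteTs, long count, long sum) {
        counts.put(minuteTs, count);
        sums.put(minuteTs, sum);
    }

    // Average over [from, to] -- mirrors ZRANGEBYSCORE on both zsets.
    public double average(long from, long to) {
        long c = counts.subMap(from, true, to, true).values()
                       .stream().mapToLong(Long::longValue).sum();
        long s = sums.subMap(from, true, to, true).values()
                     .stream().mapToLong(Long::longValue).sum();
        return c == 0 ? 0.0 : (double) s / c;
    }

    public static void main(String[] args) {
        MinuteAverages avg = new MinuteAverages();
        avg.flushMinute(0,  100, 5000);  // minute 0: 100 events, sum 5000
        avg.flushMinute(60, 200, 7000);  // minute 1: 200 events, sum 7000
        System.out.println(avg.average(0, 60)); // prints 40.0
    }
}
```

Any window (15 min, 1 h, 4 h, 1 day) then becomes one range query per zset plus a division, regardless of the raw event rate.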

-- 
Rodrigo Pereira Ribeiro
Software Developer

Felix Gallo

Jul 10, 2013, 12:11:11 PM
to redi...@googlegroups.com
One way to do it would be to consider time as a series of periods of size {your smallest desirable query granularity}, e.g. 15 minutes.  There is always one open period: the one we're currently in.  All the others are closed periods (no more events will be added, as they are in the past).  Whenever a period closes, start a new period, and also sum up whatever important values (e.g. event_value) were in the last period.

As periods close, add the summed values to lists representing appropriately-sized larger buckets of periods (e.g. '10/15/2012' would contain 96 15-minute period sums).  You can make those hierarchical if you need to cache for added speed (e.g. '10/2012' might contain 2976 15-minute period sums).  If you feel especially frisky, you can precompute all your math at this time as well, per bucket, and store the results (e.g. '10/15/2012:event_count', '10/15/2012:event_value:highwater_mark', '10/15/2012:event_value:average') in case people want to look at those bucket values directly.
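A rough sketch of this period/bucket idea (the class, day-key format, and field names are my own illustration under the assumptions above; in Redis each map below would become a list or hash written as each period closes):

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class PeriodBuckets {
    static final long PERIOD = 15 * 60; // 15-minute periods, in seconds

    private long openPeriodStart = 0;
    private long openCount = 0, openSum = 0;

    // day key -> closed 15-minute period sums (96 entries for a full day)
    final Map<String, List<Long>> daySums = new HashMap<>();
    // precomputed per-bucket stats, e.g. "day-N:event_count"
    final Map<String, Long> stats = new HashMap<>();

    // Record one event; closes the open period when time moves past it.
    void record(long ts, long value) {
        long periodStart = (ts / PERIOD) * PERIOD;
        if (periodStart != openPeriodStart) closePeriod();
        openPeriodStart = periodStart;
        openCount++;
        openSum += value;
    }

    // Close the open period: roll its sum into the day bucket and
    // update the precomputed per-day aggregates.
    void closePeriod() {
        if (openCount == 0) return;
        String day = dayKey(openPeriodStart);
        daySums.computeIfAbsent(day, k -> new ArrayList<>()).add(openSum);
        stats.merge(day + ":event_count", openCount, Long::sum);
        stats.merge(day + ":event_sum", openSum, Long::sum);
        openCount = 0;
        openSum = 0;
    }

    static String dayKey(long ts) {
        return "day-" + (ts / 86400); // days since epoch; a real key would be a date
    }
}
```

A day-level average is then stats("day-N:event_sum") / stats("day-N:event_count"), with no per-event work at query time.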

Redis is a bare metal database construction kit.  Orient your mind that way and you will be well served.

F.




