Using Redis for Real-Time Statistics

Kimo

Apr 30, 2010, 12:14:51 PM
to Redis DB
I'm working on a project where we need to offer real-time statistics
for visitors of product pages.
Here's an example of what we're looking to do: http://bit.ly/info/9Dyf63


The idea is to show the live visitor count per page.

A few numbers that describe our web site:

Content pages (unique URLs): 5 million
Unique visitors per day: 1 million
PIs (page impressions): ~7 million

Each product page looks roughly like this: /cellphone/iphone

Tabs similar to http://bit.ly/info/9Dyf63 with a histogram containing:
- minute-aggregated visits (for the last 12 hours)
- hourly aggregated visits (for the last week)
- daily aggregated visits (all time - filter by date range)
- weekly aggregated visits (all time - filter by week range)
- monthly aggregated visits (all time - filter by month range)

Questions:
1. How can I store such data in Redis?
2. How can I query this data?
Some pseudocode or Redis command examples would be very helpful.

thanks,

Kimo



Makoto

May 1, 2010, 5:19:36 AM
to Redis DB
Hi,

I am not a Redis expert (I only attended Simon Willison's Redis
crash course last week), but the project I am involved in does something
similar, so I am happy to share our experience. To be honest, I am not
happy with our current approach, so I would love to hear feedback from
experts.

We first increment the counter in Redis using INCRBY:

incrby "iphone:counter" 1

Then every 10 minutes (you could do it every minute in your case), we
store the result in MySQL:

insert into counter_stats(product_name, counter, created_at)
values('iphone', 1, '2010-05-01 23:10:10')
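
Roughly, the two pieces look like this in Python (just a sketch,
assuming redis-py and MySQLdb; the key, table and connection details
mirror the example above, and flush_to_mysql would be run from cron
every 10 minutes):

import datetime
import MySQLdb
import redis

r = redis.Redis(host='localhost', port=6379)

def record_visit(product):
    # called from the page view handler
    r.incrby('%s:counter' % product, 1)

def flush_to_mysql(product, db):
    # run periodically: snapshot the current counter value into MySQL
    count = int(r.get('%s:counter' % product) or 0)
    cur = db.cursor()
    cur.execute(
        "insert into counter_stats(product_name, counter, created_at) "
        "values(%s, %s, %s)",
        (product, count, datetime.datetime.now()))
    db.commit()

db = MySQLdb.connect(host='localhost', user='stats', passwd='stats', db='stats')
flush_to_mysql('iphone', db)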

This generates tons of duplicate rows even when there is no counter
increment:

+--------------+---------+---------------------+
| product_name | counter | created_at          |
+--------------+---------+---------------------+
| iphone       |       2 | 2010-04-28 16:30:04 |
| iphone       |       3 | 2010-04-28 16:30:14 |
| iphone       |       3 | 2010-04-28 16:30:24 |
| iphone       |       3 | 2010-04-28 16:30:34 |
| iphone       |       3 | 2010-04-28 16:30:45 |
| iphone       |       3 | 2010-04-28 16:30:55 |
| iphone       |       3 | 2010-04-28 16:31:05 |
| iphone       |       3 | 2010-04-28 16:31:15 |
| iphone       |       3 | 2010-04-28 16:31:25 |
| iphone       |       3 | 2010-04-28 16:31:35 |
| iphone       |       3 | 2010-04-28 16:31:45 |
| iphone       |       3 | 2010-04-28 16:31:55 |
| iphone       |       3 | 2010-04-28 16:32:05 |
| iphone       |       3 | 2010-04-28 16:32:15 |
| iphone       |       3 | 2010-04-28 16:32:25 |
| iphone       |       3 | 2010-04-28 16:32:35 |
| iphone       |       5 | 2010-04-28 16:32:45 |
+--------------+---------+---------------------+

I assume we took this approach because recording the counter data
requires a massive number of writes and MySQL may not cope with them.


If that's the case, I think we should just write the counter logs to
MySQL, but via some queueing (e.g. AMQP, beanstalkd) to serialise the
writes and offload the concurrent writes.
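
One way to sketch that idea (using a Redis list as the queue here
instead of AMQP/beanstalkd, purely for illustration; names are made
up):

import json
import redis

r = redis.Redis()

def log_visit(product, timestamp):
    # producers: web processes just push a log entry and move on
    r.lpush('counter_log_queue',
            json.dumps({'product': product, 'ts': timestamp}))

def drain_queue(write_to_mysql):
    # single consumer: pops entries one at a time, so MySQL only ever
    # sees serialised writes coming from this one process
    while True:
        _key, raw = r.brpop('counter_log_queue')
        write_to_mysql(json.loads(raw))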

In your scenario, I don't think storing all the historical data plays
to Redis's strengths, because it may soon exceed your total memory
size (you could go beyond physical memory with Redis's Virtual Memory
feature, but I am not sure how fast it is).

If displaying the minute aggregates has to be real time, how about
incrementing a counter per product:timestamp and setting a TTL like
this (while also writing to MySQL and generating the aggregate data
with a cron job)?

incrby "iphone:counter:2010-04-28 16:32" 1
expire "iphone:counter:2010-04-28 16:32" 43200   (60 sec * 60 min * 12 hr)

When displaying, you do something like this:

keys "iphone:counter*"
mget (specify all the keys you just got above)
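
Wrapped up in Python it might look like this (a sketch with redis-py;
note that KEYS scans the whole keyspace, so it can get slow once you
have many keys):

import datetime
import redis

r = redis.Redis()

def record_minute_visit(product):
    # one counter per product per minute, expiring after 12 hours
    key = '%s:counter:%s' % (product,
                             datetime.datetime.now().strftime('%Y-%m-%d %H:%M'))
    r.incrby(key, 1)
    r.expire(key, 43200)  # 60 sec * 60 min * 12 hr

def minute_counts(product):
    # fetch whatever per-minute counters are still alive
    keys = r.keys('%s:counter:*' % product)
    if not keys:
        return {}
    return dict(zip(keys, r.mget(keys)))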

It might be more efficient if you use sets/lists, but I haven't used
them much, so advice from more experienced users is appreciated. ;-)

If you don't want to store data in two different data stores (Redis
and MySQL), then you might want to look into alternatives like MongoDB
or Tokyo Cabinet. Mongo has a quite rich query language, and Tokyo is
known for fast reads/writes to disk (disclaimer: I am heavily involved
in the TC community). I am not sure how other NoSQL products fit this
problem, but I would love to hear opinions about other products, too.

Thanks.

Makoto

Gleicon Moraes

May 1, 2010, 11:36:21 AM
to redi...@googlegroups.com
I use stats like this in this PoC: http://github.com/gleicon/uurl
MongoDB stores the docs and stats, and I use Redis for atomic operations (in this case only a counter, but I modified it to store tags and act as an analytics tool).

For Redis, you could use the same approach as Simon's throttle control: use a key like iphone:YYYYMMDDHHMM and increment it. When you first create this key, add it to a set or sorted set called iphone::YYMMDD, or whatever time resolution you might want. You can also cascade it like RRDtool, accumulating results in other sets and so on.
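
A rough sketch of that layout with redis-py (key formats are just
illustrative, and this is untested):

import datetime
import redis

r = redis.Redis()

def record_hit(product, now=None):
    now = now or datetime.datetime.now()
    minute_key = '%s:%s' % (product, now.strftime('%Y%m%d%H%M'))
    day_index = '%s::%s' % (product, now.strftime('%Y%m%d'))
    # increment the per-minute counter...
    if r.incrby(minute_key, 1) == 1:
        # ...and on first creation, remember it in that day's index set
        r.sadd(day_index, minute_key)

def counts_for_day(product, day):
    # day is e.g. '20100501'; fetch every minute counter indexed for that day
    keys = list(r.smembers('%s::%s' % (product, day)))
    if not keys:
        return {}
    return dict(zip(keys, r.mget(keys)))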

Hope it helps, will try to code a recipe for it later.

[]s

gm
--
More cowbell, please !