bitmapist: Powerful realtime analytics with Redis 2.6's bitmaps and Python

amix

Oct 25, 2012, 2:53:14 PM
to redi...@googlegroups.com
Hi group

I just released a Python library that uses Redis 2.6's bitmaps to enable powerful realtime analytics that can answer the following questions:
  • Has user 123 been online today? This week? This month?
  • Has user 123 performed action "X"?
  • How many users have been active this month? This hour?
  • How many unique users have performed action "X" this week?
  • What percentage of users who were active last week are still active?
  • What percentage of users who were active last month are still active this month?
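To make the questions above concrete, here is a minimal in-memory sketch of the underlying bitmap technique, assuming one bitmap per event per period with the user ID used as the bit offset. It models SETBIT, GETBIT, BITCOUNT and BITOP AND with Python ints so it runs without a Redis server; the class, the `mark_event` helper and the key names are hypothetical and are not bitmapist's actual API.

```python
from datetime import date


class BitmapStore:
    """In-memory stand-in for Redis bitmaps: each key holds a Python int,
    and bit N is 1 when user N triggered the event in that period.
    Redis treats offset 0 as the most significant bit of byte 0, while this
    model uses the int's least significant bit; counts and ANDs are unaffected."""

    def __init__(self):
        self.bitmaps = {}

    def setbit(self, key, offset):
        # Mirrors SETBIT key offset 1
        self.bitmaps[key] = self.bitmaps.get(key, 0) | (1 << offset)

    def getbit(self, key, offset):
        # Mirrors GETBIT key offset
        return (self.bitmaps.get(key, 0) >> offset) & 1

    def bitcount(self, key):
        # Mirrors BITCOUNT key
        return bin(self.bitmaps.get(key, 0)).count("1")

    def bitop_and(self, dest, *keys):
        # Mirrors BITOP AND dest key1 key2 ...
        result = self.bitmaps.get(keys[0], 0)
        for key in keys[1:]:
            result &= self.bitmaps.get(key, 0)
        self.bitmaps[dest] = result


def mark_event(store, event, user_id, day):
    # Hypothetical key scheme: one bitmap per event per day and per month
    store.setbit(f"{event}:day:{day:%Y-%m-%d}", user_id)
    store.setbit(f"{event}:month:{day:%Y-%m}", user_id)


store = BitmapStore()
mark_event(store, "active", 123, date(2012, 9, 3))
mark_event(store, "active", 7, date(2012, 9, 4))
mark_event(store, "active", 123, date(2012, 10, 25))
mark_event(store, "active", 42, date(2012, 10, 25))

# Has user 123 been active on 2012-10-25?
print(store.getbit("active:day:2012-10-25", 123))  # 1
# How many unique users were active in October?
print(store.bitcount("active:month:2012-10"))  # 2
# What fraction of September's active users are still active in October?
store.bitop_and("retained", "active:month:2012-09", "active:month:2012-10")
print(store.bitcount("retained") / store.bitcount("active:month:2012-09"))  # 0.5
```

Retention is just an AND of two period bitmaps followed by a population count, which is why these questions stay cheap even for many users.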
You can read more about it on my blog:

Or fetch the code on GitHub:

Happy hacking!

Best regards,
Amir

Salvatore Sanfilippo

Oct 25, 2012, 4:31:41 PM
to redi...@googlegroups.com
Well done Amir! Thank you

Salvatore

--
You received this message because you are subscribed to the Google Groups "Redis DB" group.
To view this discussion on the web visit https://groups.google.com/d/msg/redis-db/-/tIquE9TWzwAJ.
To post to this group, send email to redi...@googlegroups.com.
To unsubscribe from this group, send email to redis-db+u...@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/redis-db?hl=en.



--
Salvatore 'antirez' Sanfilippo
open source developer - VMware
http://invece.org

Beauty is more important in computing than anywhere else in technology because software is so complicated. Beauty is the ultimate defence against complexity.
       — David Gelernter

Sergei Tulentsev

Oct 25, 2012, 5:03:08 PM
to redi...@googlegroups.com
I believe this library assumes that user ids are (not large) integers. This rules out analytics for facebook games, doesn't it? :)
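Sergei's concern comes down to arithmetic: Redis allocates a bitmap up to its highest set offset, so the memory cost depends on the largest user ID rather than on the number of users. The sketch below computes that size; the helper name is hypothetical, and the limit reflects SETBIT's documented requirement that offsets be smaller than 2^32 (bitmaps capped at 512 MB).

```python
# SETBIT rejects offsets >= 2^32, which caps a bitmap at 512 MB
REDIS_MAX_BIT_OFFSET = 2**32 - 1


def bitmap_bytes(max_user_id):
    """Bytes Redis must allocate for a bitmap whose highest set bit is max_user_id."""
    if max_user_id > REDIS_MAX_BIT_OFFSET:
        raise ValueError("offset out of range for a Redis bitmap")
    return max_user_id // 8 + 1


print(bitmap_bytes(123))         # 16
print(bitmap_bytes(10_000_000))  # 1250001, even if only this one user is set

try:
    bitmap_bytes(100_000_000_000_000)  # a hypothetical very large external ID
except ValueError as err:
    print(err)
```

A single user with a large ID therefore costs as much as every user below it, and IDs beyond 2^32 - 1 cannot be used as offsets at all.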

--
Best regards,
Sergei Tulentsev

Salvatore Sanfilippo

Oct 25, 2012, 5:14:57 PM
to redi...@googlegroups.com
On Thu, Oct 25, 2012 at 11:03 PM, Sergei Tulentsev
<sergei.t...@gmail.com> wrote:
>
> I believe this library assumes that user ids are (not large) integers. This rules out analytics for facebook games, doesn't it? :)

It is probably straightforward to attach an intermediate layer that
translates any newly seen ID (in any arbitrary long form) into a
successive numerical zero-based ID. This of course will consume
another key (or hash field at least) per ID seen.
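A minimal sketch of such an intermediate layer, assuming a plain dict and counter in place of a Redis hash and an INCR counter key. The class name and the Redis commands in the comments are illustrative assumptions, not an existing library's API.

```python
class IdMapper:
    """Hypothetical intermediate layer: gives each newly seen external ID
    (any string) a successive zero-based integer usable as a bitmap offset."""

    def __init__(self):
        self.ids = {}     # stands in for a hash such as "user_ids"
        self.next_id = 0  # stands in for a counter key bumped with INCR

    def to_offset(self, external_id):
        # Assumed Redis equivalent: HGET user_ids <external_id>
        existing = self.ids.get(external_id)
        if existing is not None:
            return existing
        # Assumed Redis equivalent: n = INCR user_ids:counter - 1, then
        # HSETNX user_ids <external_id> n. With concurrent clients, HSETNX
        # returning 0 means another client won, so read its value back.
        offset = self.next_id
        self.next_id += 1
        self.ids[external_id] = offset
        return offset


mapper = IdMapper()
print(mapper.to_offset("fb:100004000000001"))  # 0
print(mapper.to_offset("alice@example.com"))   # 1
print(mapper.to_offset("fb:100004000000001"))  # 0 (already mapped)
```

The bitmaps then only grow with the number of distinct users seen, at the cost of one extra hash field per ID.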

Cheers,
Salvatore

Sergei Tulentsev

Oct 25, 2012, 5:29:46 PM
to redi...@googlegroups.com
But if we had probabilistic sets (proposed a few days ago), we wouldn't need any intermediate layer, and the memory savings would also be substantial. Just saying :)
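The thread doesn't spell out the probabilistic-set proposal, so as one illustration of the idea, here is a sketch of linear counting: a probabilistic technique that hashes arbitrary IDs into a fixed-size bitmap and estimates the number of distinct IDs from the fraction of bits still zero, n ≈ -m ln(zeros/m). The class name and the 2^16-bit default are assumptions, and this is not necessarily what was proposed; note that it estimates counts only and cannot reliably answer "has user 123 done X?".

```python
import hashlib
import math


class LinearCounter:
    """Linear counting: a fixed-size bitmap that estimates distinct IDs,
    with no ID-to-integer mapping needed."""

    def __init__(self, m_bits=1 << 16):
        self.m = m_bits                      # bitmap size in bits (multiple of 8)
        self.bits = bytearray(m_bits // 8)

    def _offset(self, item):
        # Deterministic hash (Python's built-in hash() is randomized per process)
        digest = hashlib.sha1(item.encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big") % self.m

    def add(self, item):
        off = self._offset(item)
        self.bits[off // 8] |= 1 << (off % 8)

    def estimate(self):
        set_bits = sum(bin(b).count("1") for b in self.bits)
        zeros = self.m - set_bits
        if zeros == 0:
            return float("inf")  # bitmap saturated: m was too small
        return -self.m * math.log(zeros / self.m)


counter = LinearCounter()
for i in range(10_000):
    counter.add(f"user-{i}")
    counter.add(f"user-{i}")  # duplicates set the same bit again
print(round(counter.estimate()))  # close to 10000
```

Here 10,000 arbitrary string IDs are counted in 8 KB regardless of how large or non-numeric they are, which is the kind of saving the probabilistic approach offers in exchange for an approximate answer.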
Best regards,
Sergei Tulentsev

Dvir Volk

Oct 25, 2012, 5:32:36 PM
to redi...@googlegroups.com
I've done just that in my Python bitmap-based analytics lib.
We use it to map userIds that are not even numeric.

example usage:

the lib:

Dvir Volk
Chief Architect, Everything.me

Sergei Tulentsev

Oct 25, 2012, 5:35:14 PM
to redi...@googlegroups.com
Yeah, I did it too, but in Ruby, and I can't show the code; it's internal :)
Best regards,
Sergei Tulentsev

bugant

Oct 30, 2012, 4:48:37 AM
to redi...@googlegroups.com
Hi,

On Thu, Oct 25, 2012 at 11:32 PM, Dvir Volk <dvi...@gmail.com> wrote:
> I've done just that in my python bitmaps based analytics lib.
> we use it to map userIds which are not even numeric.

Thank you very much for sharing your code!
If I understand correctly, you're using a hash to map real user IDs to
sequential ones. So, for every user you have a key in the hash, right?

How often do you reset the hash?

Dvir Volk

Oct 30, 2012, 6:06:19 AM
to redi...@googlegroups.com
> Thank you very much for sharing your code!
with pleasure...

> If I get it right, you're using an hash to map real user ID to
> sequential ones. So, for every user you have a key in the hash right?
>
> How often do you reset the hash?

Currently, never :)
In theory you could use first-class keys instead of hashes and expire
them, so frequent visitors will never be reset and less frequent ones will
be garbage collected automatically. It's a fairly small change that
won't add much memory overhead if you're careful.
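A sketch of the expiring-key variant, assuming each mapping lives in its own key with a TTL that is refreshed on every hit. It is modeled in memory with an injectable clock so it can be tested; the class name and the SET/EXPIRE sequence in the comments are assumptions. Note that the counter is never decremented, so an ID that returns after its key expired gets a fresh, higher offset.

```python
import time


class ExpiringIdMapper:
    """Hypothetical ID mapper where each mapping is its own key with a TTL
    (assumed Redis equivalent: SET map:<id> <n> EX <ttl>, plus EXPIRE on
    every hit), so rarely seen IDs are garbage collected automatically."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self.entries = {}  # external_id -> (offset, expires_at)
        self.next_id = 0   # never decremented, so offsets are not reused

    def to_offset(self, external_id):
        now = self.clock()
        entry = self.entries.get(external_id)
        if entry is not None and entry[1] > now:
            offset = entry[0]      # still mapped: frequent visitor
        else:
            offset = self.next_id  # new or expired: allocate a fresh offset
            self.next_id += 1
        # Refresh the TTL on every access
        self.entries[external_id] = (offset, now + self.ttl)
        return offset

    def purge_expired(self):
        # Redis expires keys itself; this model drops stale entries explicitly
        now = self.clock()
        self.entries = {k: v for k, v in self.entries.items() if v[1] > now}


fake_now = [0.0]
mapper = ExpiringIdMapper(ttl_seconds=86400, clock=lambda: fake_now[0])
print(mapper.to_offset("alice"))  # 0
fake_now[0] = 3600.0
print(mapper.to_offset("bob"))    # 1
fake_now[0] = 86410.0             # alice's mapping has expired, bob's has not
print(mapper.to_offset("alice"))  # 2
print(mapper.to_offset("bob"))    # 1
```

Frequent visitors keep their offsets indefinitely, while idle IDs disappear from the mapping without any manual reset.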

bugant

Oct 30, 2012, 7:20:03 PM
to redi...@googlegroups.com
On Tue, Oct 30, 2012 at 11:06 AM, Dvir Volk <dvi...@gmail.com> wrote:
> in theory you could use first class keys and not hashes and expire
> them, so frequent visitors will never be reset and less frequent will
> be automatically garbage collected. It's a fairly small change that
> won't add much memory overhead if you're careful.

Hmm... what I was thinking of was the case where some users were
active in a first period and then became inactive.
In this case new users, who have no sequential ID yet, will keep
getting larger ones, so the high-order bits end up used far more than
the low-order ones. Is that the case?

This would lead to suboptimal memory usage.

Bitmaps are used here because they perform better than sets, right? So
what I was thinking of was dumping those bitmaps to sets on a (say)
daily basis... this way it would be possible to reset the mapping and
keep the bitmap memory under control.
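A rough sketch of that daily compaction, assuming bitmaps are modeled as Python ints (bit N = offset N) and that a reverse mapping from offset to real user ID is available. Both helper names are hypothetical. Decoding to real IDs first matters: once the mapping is reset, bitmaps encoded with the old offsets can no longer be intersected with new ones.

```python
def bitmap_to_set(bitmap, offset_to_user):
    """Decode a bitmap (bit N = offset N) into a set of real user IDs."""
    users = set()
    offset = 0
    while bitmap:
        if bitmap & 1:
            users.add(offset_to_user[offset])
        bitmap >>= 1
        offset += 1
    return users


def rebuild_mapping(active_users):
    """Assign fresh zero-based offsets to only the users that are still active."""
    return {user: i for i, user in enumerate(sorted(active_users))}


def byte_len(bitmap):
    # Size a Redis bitmap would need: highest set offset // 8 + 1
    return (bitmap.bit_length() + 7) // 8


offset_to_user = {i: f"u{i}" for i in range(1000)}
today = (1 << 998) | (1 << 999)  # only the two newest users were active

active = bitmap_to_set(today, offset_to_user)
new_mapping = rebuild_mapping(active)
compacted = sum(1 << new_mapping[user] for user in active)

print(sorted(active))       # ['u998', 'u999']
print(byte_len(today))      # 125
print(byte_len(compacted))  # 1
```

Here the two users who happened to hold offsets 998 and 999 cost 125 bytes of bitmap; after remapping they fit in a single byte.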

What do you think?

Cheers,
matteo.