Range histograms in Redis - proposal/POC

273 views
Skip to first unread message

Yevgeny Kurliandchick

unread,
Jan 28, 2015, 1:00:44 PM1/28/15
to redi...@googlegroups.com

Josiah Carlson

unread,
Jan 29, 2015, 11:50:42 PM1/29/15
to redi...@googlegroups.com
I can see the usefulness of this, but histograms are pretty easy to implement in Lua, which would make them usable and backwards compatible to Redis 2.6. Heck, you can get a decent-ish implementation right now with a small modification to my windowed rate limiting code [1].

There are other more space-efficient ways of doing it in Lua (a little slower, which is why I used the implementation I did in [1]).

 - Josiah



On Wed, Jan 28, 2015 at 10:00 AM, Yevgeny Kurliandchick <kyev...@gmail.com> wrote:

--
You received this message because you are subscribed to the Google Groups "Redis DB" group.
To unsubscribe from this group and stop receiving emails from it, send an email to redis-db+u...@googlegroups.com.
To post to this group, send email to redi...@googlegroups.com.
Visit this group at http://groups.google.com/group/redis-db.
For more options, visit https://groups.google.com/d/optout.

Yevgeny Kurliandchick

unread,
Jan 30, 2015, 6:59:00 PM1/30/15
to redi...@googlegroups.com
Hi Josiah, thanks for the response !

Like I wrote in the "Motivation" section of the github page, its is definitely possible to do histograms over Redis today by a variety of means with LUA scripts being one of them, however I believe it would benefit the user base to have something like this standardized and added to the core feature set of Redis which can then in turn be relied upon by the client libraries to provide a high-level histogram API to applications and services.

I also think that having these commands implemented natively is optimal from performance, memory usage and complexity standpoints - no LUA overhead,  one key per histogram, the entire histogram is of fixed size once created and contiguous in memory, multiple samples can be added with one command.

Regarding having this in 2.x, I don't see why this would be a problem as adding the proposed commands will leave the API fully backward-compatible and the histograms are implemented over string values so no data compatibility issues are to be expected either, I will add a 2.6/2.8 based branch to the repo if there is interest in this.


On Wednesday, January 28, 2015 at 8:00:44 PM UTC+2, Yevgeny Kurliandchick wrote:

Itamar Haber

unread,
Jan 31, 2015, 2:16:27 PM1/31/15
to redi...@googlegroups.com
Definitely interesting IMO - I'd like to see more statistical tools and data structures make their way into Redis.

There's a balance between adding more commands/DTs to Redis and implementing them with Lua. I find efficiency and optimization arguments oddly compelling but some may have different views on the matter. The current way to go about discussing these types of suggestion, IIUC, is at the redis-dev mailing list so you may want to move this discussion there. 

--
You received this message because you are subscribed to the Google Groups "Redis DB" group.
To unsubscribe from this group and stop receiving emails from it, send an email to redis-db+u...@googlegroups.com.
To post to this group, send email to redi...@googlegroups.com.
Visit this group at http://groups.google.com/group/redis-db.
For more options, visit https://groups.google.com/d/optout.



--

Itamar Haber | Chief Developers Advocate
Redis Watch Newsletter - Editor and Janitor
Redis Labs - Enterprise-Class Redis for Developers

Mobile: +1 (415) 688 2443
Mobile (IL): +972 (54) 567 9692
Email: ita...@redislabs.com
Skype: itamar.haber

Blog  |  Twitter  |  LinkedIn


Yevgeny Kurliandchick

unread,
Jan 31, 2015, 5:17:33 PM1/31/15
to redi...@googlegroups.com
thanks, the "mailing list" links on http://redis.io/community point to this group but now I see there is a https://groups.google.com/forum/#!forum/redis-dev, so will do

Josiah Carlson

unread,
Feb 2, 2015, 12:57:25 PM2/2/15
to redi...@googlegroups.com
I am by no means the arbiter of what does and doesn't make it into Redis, I'm just expressing my thoughts here, my replies inline.

On Fri, Jan 30, 2015 at 3:59 PM, Yevgeny Kurliandchick <kyev...@gmail.com> wrote:
Hi Josiah, thanks for the response !

Like I wrote in the "Motivation" section of the github page, its is definitely possible to do histograms over Redis today by a variety of means with LUA scripts being one of them, however I believe it would benefit the user base to have something like this standardized and added to the core feature set of Redis which can then in turn be relied upon by the client libraries to provide a high-level histogram API to applications and services.

I read that section, and I didn't find it motivated me. The reason why it didn't motivate me is simple: there are countless concepts that can be built on top of the existing Redis data structures, and binned-counters (or histograms, if you want to call them that) are just one of literally thousands. To elevate this particular use-case to the point of getting commands that are *different* from standard Redis commands (which I will describe in a moment), without showing a large number of users out in the wild with this as a need... and I'm just not seeing this as being compelling.

What do I mean by "different"? All other existing Redis commands can operate on an empty key, no initialization required. Specifically, if you want to increment a number stored in a string, you call INCR. That works whether the key existed or not. Ditto with RPUSH, HSET, HINCRBY, SADD, ZADD, ZINCRBY, PFADD, and others. Note that aside from commands that *read*, all structures are able to be added to and manipulated without explicit initialization. But because this histogram *doesn't* know what to do with data until you've initialized it and stored configuration data, you can't use it until it is initialized. This makes it fundamentally different from all other structures in Redis today.

I would even argue that because of that initialization step to set bin sizes and total range of data to be collected, this really feels like configuration is a higher-level concept, and should be embedded in the using language, and arguably just implemented on top of a hash. Something like:

class Histogram(object):
    def __init__(self, conn, key, start, stop, step):
        self.conn, self.key, self.start, self.stop, self.step = conn, key, start, stop, step
    def incr(self, index, value, conn=None):
        if index < self.start or index > self.stop:
            return
        (conn or self.conn).hincrby(key, (index - self.start)//self.step, value)
    def values(self, conn=None):
        out = {}
        for k, v in (conn or self.conn).hgetall(self.key).items():
            start = int(k) * self.step + self.start
            out[start,start+self.step-1] = int(v)
        return out

hist_a = Histogram(conn, 'histogram-1', 0, 99, 5)
hist_a.incr(34, 3)
hist_a.values() # would return {(30, 34): 3}

Incidentally, that's a version that leaves configuration and implementation of binning data to the client language, and keeps all data in a single hash key. Is it as time/space-efficient as the C version you built, or as space-efficient as Lua equivalent to what you built? No. But it is available and usable by everyone using Redis 2.0+, and almost can't be used wrong. No initialization necessary (because configuration is built into the client-side implementation), no additional commands necessary, and no additional maintenance load for Salvatore.

I also think that having these commands implemented natively is optimal from performance, memory usage and complexity standpoints - no LUA overhead,  one key per histogram, the entire histogram is of fixed size once created and contiguous in memory, multiple samples can be added with one command.

None of what you have described requires C. As I just replied to another user, GETRANGE, SETRANGE, and the struct library are sufficient to build literally anything overlaid on top of a Redis string. In this case (and in the other thread I just replied to), it's an array with some metadata. It's about 5-10 lines of Lua to read or write individual indexes in an array, with 5-10 more for increment, and another 5-10 to read the entire histogram.

Performance-wise, of course C is going to be faster, use less memory, etc. But not every bit of functionality should be included in Redis, especially when the argument for implementing it in Redis is "because it would be faster if it was implemented inside Redis". That's circular reasoning.

Most of the time when trying to solve a specific problem, people have a specific request rate that they need to hit. Say 100 qps, 1000qps, etc. The hash-based version that I listed above will be able to run at the speed of any other individual hash increment command, doesn't require Lua scripting, and doesn't require changes to Redis itself. The only drawback? It may use up to ~2x as much memory per histogram compared to the C version you implemented if all bins are being used.

Regarding having this in 2.x, I don't see why this would be a problem as adding the proposed commands will leave the API fully backward-compatible and the histograms are implemented over string values so no data compatibility issues are to be expected either, I will add a 2.6/2.8 based branch to the repo if there is interest in this.

You don't understand what I was saying. The C implementation has no business being backported to 2.6 or 2.8, and I don't think it should even be in 3.0. But a Lua script version is *trivially* executable in 2.6, 2.8, and 3.0, without changes. Why did I mention this? Because when you write it in Lua, not only do you save Salvatore the maintenance burden, you make the code available to everyone running Redis *immediately*.

 - Josiah

On Wednesday, January 28, 2015 at 8:00:44 PM UTC+2, Yevgeny Kurliandchick wrote:

Yevgeny Kurliandchick

unread,
Feb 2, 2015, 5:49:56 PM2/2/15
to redi...@googlegroups.com
Regarding the creation semantics - you are absolutely right about this being different from other redis commands and that's been bugging me as well
- I plan to fix that by having HISTADD or a variation of it take range+buckets as it's first arguments and create a histogram with
those settings if one does not yet exist.

Regarding reasoning - there's no circularity here, the reason for *implementing it in C inside redis*
is that it's the most efficient way to do it, the reason for having this feature in the first place is because I believe
it will be useful to many existing redis users by and by having it part of the core command set will make it easier to bring it further
up-stack for tools and applications, I agree that there needs to be *high* support and expected adoption for this for it to be worth
Salvatore's time.

I understand that loadable modules support was considered and rejected in principle for redis which is too bad imo as it would
have been the most natural form for this and many similar features
Reply all
Reply to author
Forward
0 new messages