Two Feature Proposals re ZSETs

44 views
Skip to first unread message

Dvir Volk

unread,
Oct 25, 2012, 10:37:37 AM10/25/12
to redi...@googlegroups.com
Hi all
I know at least one of my proposals was discussed before, but I wanted to re-raise the them, as these are very common use cases for me, working with statistical data in sorted sets.

1. Product aggregation function in ZINTERSTORE. I know this can be done using sum of logs, and when doing ZUNIONSTORE it's undefined how to treat members that are not in both sets - as 0 or 1.
But for ZINTERSTORE it would be very useful to be able to multiply the matching scores. I wrote a (very very small) patch for this long ago, and I don't see a reason why not to do this.

2. ZSUM. This is of course useful for turning counts into probabilities, without having to fetch the entire set to the client side, or keeping a sum as one of the members or a separate key, which adds overhead to insertion.
I know it can be done in Lua. But if the sorted set itself tracks its sum, it can be done in O(1) time with hardly any time overhead for insertion.

I will gladly implement both.
What do you guys think?


--
Dvir Volk
Chief Architect, Everything.me

Josiah Carlson

unread,
Oct 25, 2012, 12:33:30 PM10/25/12
to redi...@googlegroups.com
On Thu, Oct 25, 2012 at 7:37 AM, Dvir Volk <dvi...@gmail.com> wrote:
> Hi all
> I know at least one of my proposals was discussed before, but I wanted to
> re-raise the them, as these are very common use cases for me, working with
> statistical data in sorted sets.
>
> 1. Product aggregation function in ZINTERSTORE. I know this can be done
> using sum of logs, and when doing ZUNIONSTORE it's undefined how to treat
> members that are not in both sets - as 0 or 1.
> But for ZINTERSTORE it would be very useful to be able to multiply the
> matching scores. I wrote a (very very small) patch for this long ago, and I
> don't see a reason why not to do this.

At Redisconf I implied that I'd personally like to see a collection of
arithmetic operations implemented here. I mentioned 10, which was just
an estimate, which is about what I've just thought up now. I'd be
happy to describe them, their semantics, and even add them and
document them if they are accepted.

> 2. ZSUM. This is of course useful for turning counts into probabilities,
> without having to fetch the entire set to the client side, or keeping a sum
> as one of the members or a separate key, which adds overhead to insertion.
> I know it can be done in Lua. But if the sorted set itself tracks its sum,
> it can be done in O(1) time with hardly any time overhead for insertion.

One of the issues with this is that the vagaries of floating point
math means that summing the values in the ZSET may get a different
result than incrementally adjusting the total value for the ZSET.
Also, summing the values in different orders can also get different
results. This may or may not matter in practice, it depends on the
application.

- Josiah

Dvir Volk

unread,
Oct 25, 2012, 12:41:40 PM10/25/12
to redi...@googlegroups.com
On Thu, Oct 25, 2012 at 6:33 PM, Josiah Carlson <josiah....@gmail.com> wrote:
On Thu, Oct 25, 2012 at 7:37 AM, Dvir Volk <dvi...@gmail.com> wrote:
 
At Redisconf I implied that I'd personally like to see a collection of
arithmetic operations implemented here. I mentioned 10, which was just
an estimate, which is about what I've just thought up now. I'd be
happy to describe them, their semantics, and even add them and
document them if they are accepted.

please do. I was just stating my own need here.
what if you could just pass a lambda expression in Lua that takes S1, S1 and returns whatever you want? :)
 

> 2. ZSUM. This is of course useful for turning counts into probabilities,One of the issues with this is that the vagaries of floating point
math means that summing the values in the ZSET may get a different
result than incrementally adjusting the total value for the ZSET.
Also, summing the values in different orders can also get different
results. This may or may not matter in practice, it depends on the
application.


Well, you're right, but it doesn't matter if you do it in your app on in redis - you have floating point numbers and it's fragile. obviously any app where this matters won't use this feature anyhow, but for most cases it should be perfectly fine IMHO. Especially in the use case of turning integer counts into fractions of the total. 
 

 - Josiah

--
You received this message because you are subscribed to the Google Groups "Redis DB" group.
To post to this group, send email to redi...@googlegroups.com.
To unsubscribe from this group, send email to redis-db+u...@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/redis-db?hl=en.

Josiah Carlson

unread,
Oct 25, 2012, 1:52:32 PM10/25/12
to redi...@googlegroups.com
On Thu, Oct 25, 2012 at 9:41 AM, Dvir Volk <dvi...@gmail.com> wrote:
> please do. I was just stating my own need here.
> what if you could just pass a lambda expression in Lua that takes S1, S1 and
> returns whatever you want? :)

Okay, I'll post them later. But I can just about assure you that you
really don't want this done in Lua unless your ZSETs are small or
unless you've already swapped Lua out for LuaJIT.

> Well, you're right, but it doesn't matter if you do it in your app on in
> redis - you have floating point numbers and it's fragile. obviously any app
> where this matters won't use this feature anyhow, but for most cases it
> should be perfectly fine IMHO. Especially in the use case of turning integer
> counts into fractions of the total.

Generally speaking, I've been a big fan of keeping total count
information as part of tree structures themselves. ZSET skiplists
should include "order statistic" information for pulling the Xth item,
adding running totals isn't really that bad in practice (I've done it
myself for other structures I've built). If done as part of the
structure, any arbitrary operation can get flushed back into the
structure without much difficulty, though binary trees can make this
easier. It can increase memory use (depending on whether there is
internal fragmentation for the nodes already), but but would address
many of the issues with precision in the majority of cases.

Then again, maybe just keeping a total is simpler and solves most of the issues.

- Josiah

M. Edward (Ed) Borasky

unread,
Oct 25, 2012, 4:19:43 PM10/25/12
to redi...@googlegroups.com
Keep counts, sums, sums of squares, sums of X*Y products and you've
got variances, standard deviations, correlations, covariances and
linear regression with little effort. PostgreSQL has had that built in
for years. ;-)
> --
> You received this message because you are subscribed to the Google Groups "Redis DB" group.
> To post to this group, send email to redi...@googlegroups.com.
> To unsubscribe from this group, send email to redis-db+u...@googlegroups.com.
> For more options, visit this group at http://groups.google.com/group/redis-db?hl=en.
>



--
Twitter: http://twitter.com/znmeb; Computational Journalism Publishers
Workbench: http://znmeb.github.com/Computational-Journalism-Publishers-Workbench/

How the Hell can the lion sleep with all those people singing "A weem
oh way!" at the top of their lungs?
Reply all
Reply to author
Forward
0 new messages