Filtering zset by hash field value

376 views
Skip to first unread message

Bobby Sciacchitano

unread,
Mar 27, 2012, 10:49:38 PM3/27/12
to redi...@googlegroups.com
Hello,

I've got a leader board set containing member ranks (roughly 2 million records)

Each key has a corresponding hash set containing the member id, name, gender and country.

I'd like to be able to sort the leader board and also filter it by a field in the hash set (e.g.: get scores for members where gender is male).

What is the best approach to achieving this?

Thanks

Andy McCurdy

unread,
Mar 27, 2012, 11:01:55 PM3/27/12
to redi...@googlegroups.com
Maintain multiple ZSETs for each leader board you'd like to display.

LEADERBOARD:ALL-PLAYERS
LEADERBOARD:MALES
LEADERBOARD:FEMALES
...

Whenever you updating someone's score, you need a way to figure out which zset's to ZADD their new score to.
--
You received this message because you are subscribed to the Google Groups "Redis DB" group.
To view this discussion on the web visit https://groups.google.com/d/msg/redis-db/-/GWoIbdh1gZkJ.
To post to this group, send email to redi...@googlegroups.com.
To unsubscribe from this group, send email to redis-db+u...@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/redis-db?hl=en.

Felix Gallo

unread,
Mar 27, 2012, 11:11:15 PM3/27/12
to redi...@googlegroups.com
zadd scores 1.0 mary
zadd scores 2.0 sue
zadd scores 3.0 bob
zadd scores 4.0 bruce
zadd scores 5.0 maggie

sadd females mary sue maggie

zinterstore femscores 2 scores females

zrange femscores 0 50 withscores



--

Chris Barnett

unread,
Apr 13, 2012, 5:25:20 PM4/13/12
to redi...@googlegroups.com
This method takes as much time as re-sorting the whole filtered set of members because inserting n entries into a new zset (which zinterstore does) is O(n log n), just like sorting. If you only want the top 50 members of the leader-board, though, it would be more efficient to do your set operations on normal sets and then sort and return only the first 50 members of the result set using SORT (which uses the partial quicksort algorithm to efficiently sort only a subset of an unsorted collection).

Unfortunately you can't currently SORT BY a score looked up in a sorted set (this would be a useful feature to add for exactly this use case though btw) so you would need to store your scores in string keys or hashes so you can SORT BY them.

This model makes most sense if you have lots of overlapping categories (sets) to query over (like Female, American, John etc.) and you will usually only want the first k highest ranked members of some arbitrary intersection of those sets. In this case it is inefficient to always sort the whole result set of the ad-hoc query so you shouldn't use ZINTERSTORE.

I fear I have written this as though I think I'm an expert at Redis, which I'm not. I am actually just beginning a project in Redis which includes almost this exact use case, though, so I have put a lot of thought and research into it lately.

I would very much welcome feedback about my reasoning here.
To unsubscribe from this group, send email to redis-db+unsubscribe@googlegroups.com.

Felix Gallo

unread,
Apr 13, 2012, 6:40:54 PM4/13/12
to redi...@googlegroups.com
You're not quite right, if I understand you correctly. A lot depends
on the relative size of your sets.  The time complexity of zinterstore
(from the docs):

O(N*K)+O(M*log(M)) worst case with N being the smallest input sorted
set, K being the number of input sorted sets and M being the number of
elements in the resulting sorted set.

Note the word 'smallest'. So if you have, e.g., 1mm scores, and 1000
players, and the players are split into male and female sets equally,
then zinterstoring the males with the scores will run super fast as
the 500 player set is used as a key. On my horrible old tiny Xen
eighth-fragment of an AMD 8 processor box, doing that interstore takes
.00215 seconds.

There are several designs depending on how much data you have, how
many different types of filters you have, and how much memory you're
willing to trade for speed. The off-the-cuff one I threw out there is
fine for most purposes.

F.

>>> To unsubscribe from this group, send email to redis-db+u...@googlegroups.com.


>>> For more options, visit this group at http://groups.google.com/group/redis-db?hl=en.
>>
>>

> --
> You received this message because you are subscribed to the Google Groups "Redis DB" group.

> To view this discussion on the web visit https://groups.google.com/d/msg/redis-db/-/tvuPhp2yG2UJ.


>
> To post to this group, send email to redi...@googlegroups.com.

> To unsubscribe from this group, send email to redis-db+u...@googlegroups.com.

Chris Barnett

unread,
Apr 13, 2012, 7:09:49 PM4/13/12
to redi...@googlegroups.com
I think what you're saying is that if the smallest set in the zinterstore command is insignificant (like 500) then the overhead of zinterstore is insignificant because it only sorts a small result set. Is that what you mean?

I was thinking about a scenario - which I think would be fairly common - where the total number of candidates for the leaderboard, after filtering, is very large (say 1 million females out of the total 2 million users). In this case, if you only want the top 50 ranked females of these 1 million then sorting all 1 million (by putting them into a sorted set) would be wasteful: O(N*K)+O(M*log(M)) where N=1 million, K=2 and M=1 million. It would be more efficient to sort only the top 50 entries in the females set because SORT is O(N+M*log(M)) where N=1 million and M=50 in this case.

Have I rebutted you successfully or did I misunderstand you :) ?

P.S. There is one thing at least that I didn't understand and that is why you distinguished between the number of scores (1 million) and the number of players (1000) in your example.

>>> To unsubscribe from this group, send email to redis-db+unsubscribe@googlegroups.com.


>>> For more options, visit this group at http://groups.google.com/group/redis-db?hl=en.
>>
>>
> --
> You received this message because you are subscribed to the Google Groups "Redis DB" group.
> To view this discussion on the web visit https://groups.google.com/d/msg/redis-db/-/tvuPhp2yG2UJ.
>
> To post to this group, send email to redi...@googlegroups.com.

> To unsubscribe from this group, send email to redis-db+unsubscribe@googlegroups.com.

Felix Gallo

unread,
Apr 13, 2012, 7:17:29 PM4/13/12
to redi...@googlegroups.com
Yes, you misunderstood. You don't need to put the women into a sorted
set. You just intersect them (in their regular set) with the sorted
set (which works).

F.

>> >>> redis-db+u...@googlegroups.com.


>> >>> For more options, visit this group at
>> >>> http://groups.google.com/group/redis-db?hl=en.
>> >>
>> >>
>> > --
>> > You received this message because you are subscribed to the Google
>> > Groups "Redis DB" group.
>> > To view this discussion on the web visit
>> > https://groups.google.com/d/msg/redis-db/-/tvuPhp2yG2UJ.
>> >
>> > To post to this group, send email to redi...@googlegroups.com.
>> > To unsubscribe from this group, send email to

>> > redis-db+u...@googlegroups.com.


>> > For more options, visit this group at
>> > http://groups.google.com/group/redis-db?hl=en.
>
> --
> You received this message because you are subscribed to the Google Groups
> "Redis DB" group.
> To view this discussion on the web visit

> https://groups.google.com/d/msg/redis-db/-/nexHu_JvNEYJ.


>
> To post to this group, send email to redi...@googlegroups.com.
> To unsubscribe from this group, send email to

> redis-db+u...@googlegroups.com.

Chris Barnett

unread,
Apr 13, 2012, 7:28:13 PM4/13/12
to redi...@googlegroups.com
But the intersection of the all-female-users set with the all-users zset is a zset containing all female users. A new zset has to be created at the new key you specify in the ZINTERSTORE command and filling that new zset takes O(M*log(M)) time - that's the reason the ZINTERSTORE time complexity is  O(N*K)+O(M*log(M)). 

>> >>> redis-db+unsubscribe@googlegroups.com.


>> >>> For more options, visit this group at
>> >>> http://groups.google.com/group/redis-db?hl=en.
>> >>
>> >>
>> > --
>> > You received this message because you are subscribed to the Google
>> > Groups "Redis DB" group.
>> > To view this discussion on the web visit
>> > https://groups.google.com/d/msg/redis-db/-/tvuPhp2yG2UJ.
>> >
>> > To post to this group, send email to redi...@googlegroups.com.
>> > To unsubscribe from this group, send email to

>> > redis-db+unsubscribe@googlegroups.com.


>> > For more options, visit this group at
>> > http://groups.google.com/group/redis-db?hl=en.
>
> --
> You received this message because you are subscribed to the Google Groups
> "Redis DB" group.
> To view this discussion on the web visit
> https://groups.google.com/d/msg/redis-db/-/nexHu_JvNEYJ.
>
> To post to this group, send email to redi...@googlegroups.com.
> To unsubscribe from this group, send email to

> redis-db+unsubscribe@googlegroups.com.

Reply all
Reply to author
Forward
0 new messages