Is it possible to model a leaderboard in Redis?

258 views
Skip to first unread message

Kirk Chen

unread,
Apr 22, 2014, 2:16:07 PM4/22/14
to redi...@googlegroups.com
I've got a really difficult problem. It's to do a leaderboard, and it's not just your typical all time leaderboard. That would be too easy. This leaderboard needs to be able to rank by user with the top score in any given time period.

Imagine you have users submitting scores at different times, and you've got millions of rows of user_id, score_id, score, time.

I can easily get it to store all time top score. That's just

zadd score.high score user_id

whenever a score is submitted. However, I need to be able to get the user with the high score for any time period.

I tried to add zset for storing when score is submitted table

zadd score.time time score_id

This enables me to filter out scores with score_id. I also have to change the sorting of score to use score_id instead of user_id so I can filter by time and intersect the two sets

Sorting of score_id:

zadd score.high score score_id

Filter by time (and store it for later intersect). This example uses a week time frame:

local t = redis.call('rangebyscore', 'score.time', '1398189776', '1397584990', 'withscores')
for i=1, #t-1, 2 do
t[i],t[i+1] = t[i+1],t[i]
end
if #t > 0 then redis.call('zadd', score.last.week, unpack(t)) end 

Intersect 2 sets

zinterstore score.last.week.high 2 score.last.week score.high weights 0 1

That gives me all the scores last week sorted by rank. Almost there. These are score_id, but I need the rank by user_id

This is where I am completely stuck. I need to further reduce this list down to just 1 score_id entry per user_id. How to do this in redis?


Josiah Carlson

unread,
Apr 22, 2014, 8:55:14 PM4/22/14
to redi...@googlegroups.com
In Lua, assuming you have a mapping of score_ids to user_ids available in a hash somewhere...

-- gets scores from lowest to highest
local scores = redis.call('zrange', 'score.last.week.high', 0, -1, 'withscores')
for i=1, #scores-1, 2 do
    -- look up the user from the score_id
    local user = redis.call('hget', 'score.to.user', scores[i])
    -- add the looked-up user to the result ZSET
    redis.call('zadd', 'score.last.week.users.high', scores[i+1], user)
end

Because we do this from lowest to highest score, the lower scores for a user in a week are overwritten by the recent user.


As an alternate schema... If you have a typical time period, pre-shard by that time period. Like "last week Monday - Sunday" gets one key. Then you add individual users and their high scores to the ZSET (if it's higher for that user), and can eliminate all of this score_id stuff. If you want to offer a more "varied" start/end day, you can have daily ZSETs. Then for any particular time period, you just union the days you want to include with an aggregate of MAX to get the highest scores from those users in the required time period.

I don't know the requirements for your leaderboards, but if you have a lot of users and score submissions, you'll probably want a data cleanup semantic. The awesome thing about the daily ZSETs? Perfect for expiration-based eviction, Redis-nds flushing old data to disk, or even just explicit deletion.

 - Josiah



--
You received this message because you are subscribed to the Google Groups "Redis DB" group.
To unsubscribe from this group and stop receiving emails from it, send an email to redis-db+u...@googlegroups.com.
To post to this group, send email to redi...@googlegroups.com.
Visit this group at http://groups.google.com/group/redis-db.
For more options, visit https://groups.google.com/d/optout.

Kirk Chen

unread,
Apr 24, 2014, 2:34:46 PM4/24/14
to redi...@googlegroups.com
I am leaning more towards alternate daily schema more. Using Lua script feels less scalable.

The requirements are quite open right now. Only thing is it has to scale well (10 ~ 25 millions of scores / month). Ideally I'd like a varied start/end day (and it has to support few hundred thousand players per month). The daily one actually works very well, as I can archive old scores or even delete them. It'll probably be fast enough too. 

One step further to the alternative schema, maybe I can do this in hourly instead of daily.Then I could do last 24 hours instead of UTC 0:00 ~ 23:59. Although I do have to wonder the impact of this on performance.

Alternatively to the alternative schema, someone suggested that I could do a zset of highest score last week, but to keep it up to date, keep another zset of when score expires. When a score expires, look up the next highest score of that user. The plus side is I don't need to union things when I need the leaderboard. The down side is There's a lot of expiration happening and doing all those updates will not scale well.

Salvatore Sanfilippo

unread,
Apr 24, 2014, 3:37:52 PM4/24/14
to Redis DB
I'm not totally sure I understand the problem. My guess is that you
want to say how a given user ranked over time.
For instance today it is the Top-1023 but yesterday it was just the Top-4982.

Assuming I got it, to make this memory efficient, if you have a given
sampling rate (like 1 sample per day), you could do the following:

1) Just take a single leaderboard, a single zset key.
2) Iterate over all the elements of the leaderboard with a script
every 24 hours, and populate a per-user list where you PUSH the
timestamp+current_rank.
3) Make sure to split the list by different key by, like, 3 months, so
that you have few elements per list to make sure the list will be
encoded as a ziplist.

This way you can go back in time and say how a given user ranked, but
all your computations are performed in a single sorted set when a new
rank is submitted.

Cheers,
Salvatore
> --
> You received this message because you are subscribed to the Google Groups
> "Redis DB" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to redis-db+u...@googlegroups.com.
> To post to this group, send email to redi...@googlegroups.com.
> Visit this group at http://groups.google.com/group/redis-db.
> For more options, visit https://groups.google.com/d/optout.



--
Salvatore 'antirez' Sanfilippo
open source developer - GoPivotal
http://invece.org

To "attack a straw man" is to create the illusion of having refuted a
proposition by replacing it with a superficially similar yet
unequivalent proposition (the "straw man"), and to refute it
— Wikipedia (Straw man page)

Josiah Carlson

unread,
Apr 24, 2014, 5:38:18 PM4/24/14
to redi...@googlegroups.com
Here's the thing: you don't need to do exclusively daily or hourly, you can have both. Write to both the hourly and daily ZSETs at the same time, then delete the older data as it gets stale (or use expiration to get rid of it automatically).

The speed of the union for the 24 hour list might not be great, but you can do that on an hourly schedule. The union for the 7-day toplist also won't be very fast, but you should also only need to do that once/day.

Depending on your desires for archived data, there are several methods if you want to keep multiple months worth of rankings. I'm sure that one of us will be able to point you in a good direction or confirm that you are doing the right thing if/when you make a decision to keep archived data.

 - Josiah

Reply all
Reply to author
Forward
0 new messages