Scalability with multiple large leaderboards


Francis

Apr 20, 2012, 11:22:24 AM
to Redis DB
I work for a game company and we're considering Redis as a solution
for our leaderboards. I have some questions about scalability.

First of all, here is a typical scenario:

- 5M players
- Hundreds of stats for every player, stored in a relational DB
- Leaderboards for some or all of these stats.
- Filtered leaderboards, based on country, age, etc.

After reading Redis documentation and some articles online, I think
that we'll run out of memory really quickly, because everything is
stored in RAM. I know that virtual memory was possible but is now
deprecated. Why?

Do you think Redis can still be helpful in this scenario? Otherwise,
are there other solutions implementing sorted sets, but on a physical
media?

I also made some tests on a Xeon 3.2GHz and I get the following
figures when filling 1M entries in a sorted set:
60000 op/s
100M RAM used
The key is the player id (0..1000000) and I'm running the x64 version
of the server.
Are these numbers normal? If so, we would definitely run out of
memory, even with a small number of leaderboards.

NailK

Apr 20, 2012, 1:23:31 PM
to redi...@googlegroups.com
Hi,

Are you sure you really need 5M in every sorted set?
The only benefit of a sorted set here is that you can tell a user his exact rank.
There is little point in telling a user "you are #4,932,743". This could be replaced with "you are among the best 90% of players", and the zset size could be capped at something like 50K.
If you want range queries, you could consider a disk-based B-Tree, like MySQL over HandlerSocket.
100M RAM for 1M zset entries is okay.
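To make the capped-board idea above concrete, here is a minimal pure-Python sketch of the logic. On a real Redis sorted set this would be a ZADD followed by ZREMRANGEBYRANK board 0 -(cap+1); the function names and the cap value here are purely illustrative:

```python
def add_score(board, player, score, cap=50_000):
    """board is a dict player -> score, trimmed to the best `cap` entries."""
    board[player] = score                      # ZADD board score player
    if len(board) > cap:                       # ZREMRANGEBYRANK board 0 -(cap+1)
        worst = min(board, key=board.get)
        del board[worst]

def approx_rank_message(board, player, total_players):
    """Exact rank for capped-board members; a rough percentile for everyone else."""
    if player in board:
        rank = sorted(board.values(), reverse=True).index(board[player]) + 1
        return f"you are #{rank}"
    top_pct = 100.0 * len(board) / total_players
    return f"you are below the top {top_pct:.1f}% of players"
```

With 5M players and a 50K cap, memory per board drops by two orders of magnitude, at the price of only percentile-grade answers for the long tail.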

I agree with you about the possibility of storing data larger than RAM; I think it's one of the most important aspects.
Usually the working set (hot data) is much smaller than the total data size. So if the data is 100G, the working set could be around 20G.
Disk-based databases have an advantage here because they can keep that working set in memory.

And what's limiting here is that almost every time you work with Redis, you have to think about what you will do when your data grows so much that keeping it in RAM becomes expensive.
At that point you start to think of Redis as only a cache for disk-based data, which 1) limits the features you can use, and 2) more importantly, makes coding more time-consuming with a bigger risk of bugs.

I think the ability to work with larger-than-RAM data should be given priority, since app development is usually more expensive than, for example, db operations.

Didier Spezia

Apr 20, 2012, 1:24:20 PM
to redi...@googlegroups.com
Hi,

>>   I know that virtual memory was possible but is now  deprecated.  Why?  

Redis is an in-memory data store, and it is tuned for this purpose.

It is not possible to have very good performance and pay
for virtual memory I/O at the same time. Nor is it possible to keep
O(1) algorithmic complexity for most operations while storing
things on disk, which is why the diskstore project has been put on
hold as well.

>> Do you think Redis can still be helpful in this scenario?

If you have RAM, yes. If not, you will be better served by using
something else.

>> Are these numbers normal? 

It basically depends on your pipelining factor and how the
client is coded. Contrary to the other data structures, inserting
in a zset is a O(log n) operation (so slightly slower).
60,000 op/s seems fine to me.
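The pipelining factor Didier mentions is usually the dominant term: one round trip per command caps throughput near the network latency, while batching amortizes it. A hedged sketch, assuming a redis-py-style `client.pipeline()` / `zadd(key, mapping)` interface (any object exposing those two methods behaves the same; older redis-py versions used a different zadd signature):

```python
def fill_leaderboard(client, key, scores, batch=1000):
    """Insert {player_id: score} into a sorted set, one round trip per `batch` ZADDs."""
    pipe = client.pipeline()
    pending = 0
    for player, score in scores.items():
        pipe.zadd(key, {player: score})    # queued client-side, not sent yet
        pending += 1
        if pending == batch:
            pipe.execute()                 # single round trip for the whole batch
            pending = 0
    if pending:
        pipe.execute()                     # flush the tail
```

Unpipelined, 1M inserts pay 1M round trips; with batch=1000 they pay about 1,000, which is typically the difference between a few thousand and tens of thousands of op/s.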

Regarding memory consumption, it looks fine as well.
A zset is actually a dictionary plus a skip list. It takes
memory, especially on a 64-bit platform.

100M for 1M entries means 100 bytes per entry, which is
not that much for this kind of data structure.

If you had to code an equivalent container in C/C++,
you would get very similar memory consumption.

Regards,
Didier.

Josiah Carlson

Apr 20, 2012, 3:45:42 PM
to redi...@googlegroups.com
On Fri, Apr 20, 2012 at 8:22 AM, Francis <franci...@gmail.com> wrote:
> I work for a game company and we're considering Redis as a solution
> for our leaderboards.  I have some questions about scalability.
>
> First of all, here is a typical scenario:
>
> - 5M players
> - Hundreds of stats for every player, stored in a relational DB
> - Leaderboards for some or all of these stats.
> - Filtered leaderboards, based on country, age, etc.
>
> After reading Redis documentation and some articles online, I think
> that we'll run out of memory really quickly, because everything is
> stored in RAM.  I know that virtual memory was possible but is now
> deprecated.  Why?

It caused stalls and slowdowns at inopportune times.

> Do you think Redis can still be helpful in this scenario?  Otherwise,
> are there other solutions implementing sorted sets, but on a physical
> media?
>
> I also made some tests on a Xeon 3.2GHz and I get the following
> figures when filling 1M entries in a sorted set:
> 60000 op/s
> 100M RAM used
> The key is the player id (0..1000000) and I'm running the x64 version
> of the server.
> Are these numbers normal?  If so, we would definitely run out of
> memory, even with a small number of leaderboards.

How many leaderboards are you looking to have? How much memory do you
have to work with? Can you do what Didier recommended and keep
50k users/board?

If you want a machine with 500 gigs of memory for roughly $2k
US/month, I know a provider that can help you out. I won't claim that
500 gigs should be enough memory for anyone, but it would easily get
you 500-750 leaderboards of 5M users each.

Regards,
- Josiah

> --
> You received this message because you are subscribed to the Google Groups "Redis DB" group.
> To post to this group, send email to redi...@googlegroups.com.
> To unsubscribe from this group, send email to redis-db+u...@googlegroups.com.
> For more options, visit this group at http://groups.google.com/group/redis-db?hl=en.
>

Bret A. Barker

Apr 20, 2012, 4:23:09 PM
to redi...@googlegroups.com
FWIW, we've had great success using Redis to power social game leaderboards with millions of users. We don't slice by hundreds of stats or filter on demographics like that, but for some games we do per-board leaderboards, and we always do "friend leaderboards" where you are ranked only amongst your list of game-playing friends; the latter is done with a simple set intersection.

You definitely need to do the math, or better yet, actually simulate your load and check the real memory usage. As long as a single list won't take more RAM than you can allocate per machine, you can always partition by list.
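The friend-leaderboard intersection can be sketched as follows; in Redis it would be a ZINTERSTORE of the global board with the friends set, followed by ZREVRANK. A pure-Python stand-in with illustrative names:

```python
def friend_rank(board, friends, player):
    """board: player -> score; friends: the player's friend ids.
    Returns the player's 1-based rank among friends (plus themselves)."""
    members = set(friends) | {player}
    # ZINTERSTORE tmp 2 board friends  -- keep only the friends' scores
    sub = {p: s for p, s in board.items() if p in members}
    # ZREVRANK tmp player              -- rank by descending score
    ordered = sorted(sub, key=sub.get, reverse=True)
    return ordered.index(player) + 1
```

Because the intersection is tiny (a friend list, not 5M players), this stays cheap even when the global board is large.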

-bret

Javier Guerra Giraldez

Apr 20, 2012, 5:35:36 PM
to redi...@googlegroups.com
On Fri, Apr 20, 2012 at 2:45 PM, Josiah Carlson
<josiah....@gmail.com> wrote:
> If you want a machine with 500 gigs of memory for roughly $2k
> US/month, I know a provider that can help you out. I won't claim that
> 500 gigs should be enough memory for anyone, but it would get you
> easily 500-750 5M user leaderboards.

that sounds like a sweet piece of serverporn. can you share a link?
not that i'd buy something like that, i just like to... um... look at
specs

--
Javier

Josiah Carlson

Apr 20, 2012, 6:07:43 PM
to redi...@googlegroups.com

Each of their machines is custom built, and they don't list prices on
their main page. You have to contact their sales team to get a price
on a machine, but they are peakhosting.com , and you should email
Steve Auerbach (saue...@peakhosting.com) with your questions. Tell
him that Josiah Carlson sent you, and they will take care of you.
They've got east coast, west coast, and european data centers. But
yeah, serverporn. I ate lunch with Steve and Jeff (the founder and CEO
of Peak) last week, and after they told me about that box, I couldn't
concentrate on our conversation for 10 minutes. Just the price for the
machine could have saved me 3 months of engineering effort to just
move the data out of Amazon and into their DC.

The aforementioned box has 48 cores of love, and can be outfitted with
6x SSDs (which add to the monthly cost, and are more expensive than the
box itself). You can also get smaller machines (8, 16, 24 cores) with
less memory (from 48 gigs all the way up to the 500 I mentioned) for
prices that will make you wonder why you even bother with any VPS
provider (they seem to charge roughly 1/8-1/4 of what you would pay
for "equivalent" specs from just Amazon, Linode, etc., without any VM
slowdown).

- Josiah

Javier Guerra Giraldez

Apr 20, 2012, 6:34:28 PM
to redi...@googlegroups.com
On Fri, Apr 20, 2012 at 5:07 PM, Josiah Carlson
<josiah....@gmail.com> wrote:
> Each of their machines is custom built, and they don't list prices on
> their main page.

tks, i've bookmarked it, and when i get my redis-backed startup
online, they'll be the first i'll call for non-VPS hosting.


--
Javier

Francis

Apr 24, 2012, 8:21:27 AM
to Redis DB
Thanks for your answers.

NailK, I like your idea of not having the exact rank, but I'm not sure
how to make it work correctly. Also, we could implement this idea in
the database and the cost would be acceptable, making Redis
unnecessary. But a solution like this would depend on our clients'
use cases, and as we have multiple clients with different needs, it
may not be adequate.

Josiah, those servers are impressive! But I think a cluster of
smaller machines would be cheaper. The leaderboards could easily be
distributed on multiple machines (well, I think).
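Partitioning the way Francis suggests can be as simple as routing each whole leaderboard key to one instance via a stable hash, so every ZADD/ZRANGE for a given board hits a single server. A sketch (the host names are hypothetical):

```python
import zlib

def shard_for(board_key, hosts):
    """Stable board -> host mapping: all commands for one board go to one instance."""
    return hosts[zlib.crc32(board_key.encode("utf-8")) % len(hosts)]
```

Hashing per board (not per player) keeps each sorted set intact on one machine; the trade-off is that adding or removing hosts remaps boards wholesale, which consistent hashing would soften.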

Redis is nice but it's not an option on our current servers. We would
have to change our hardware or find a disk-based solution similar
to Redis (does anyone know of one? I didn't find anything...)

Francis

Didier Spezia

Apr 24, 2012, 8:52:14 AM
to redi...@googlegroups.com

>> We would 
have to change our hardware or to find a disk based solution similar 
to Redis (anyone knows one?  I didn't find anything...) 

... and you will not find any. Redis data structures (linked list, hash tables,
skip lists, etc ...) are oriented towards memory storage. They do not
enforce any specific memory locality. They would generate too many
random I/Os if they were used with a rotational disk.

What you can find however are other storage engines based on classical
btrees, LSM-trees, fractal trees, etc ... whose good data locality can
be exploited to sustain good performance on rotational disks.

If you need something close to Redis and featuring btree indexes,
you could take one of these engines and use it with some virtual memory.

Regards,
Didier.

Josiah Carlson

Apr 24, 2012, 11:39:46 AM
to redi...@googlegroups.com
There is also edis, which implements Redis in Erlang, with disk-backed
data storage: https://github.com/inaka/edis . It won't be as fast as
Redis, but it may offer the OP something similar to what they are
looking for.

Regards,
- Josiah


David Czarnecki

Apr 27, 2012, 9:42:40 PM
to Redis DB
Francis-

You might be interested in some of the work I've done in creating a
library specifically geared towards leaderboards with Redis. Namely,
https://github.com/agoragames/leaderboard, which is a Ruby gem and the
reference implementation for other languages that I've worked on.
Python - https://github.com/agoragames/python-leaderboard, PHP -
https://github.com/agoragames/php-leaderboard, Java -
https://github.com/agoragames/java-leaderboard, Scala -
https://github.com/agoragames/scala-leaderboard, and a few others that
are works in progress.

I've got a section in the README specifically geared towards
performance metrics. I'd have to re-run the metrics to take a look at
memory consumption before and after.

Hope this helps. I can only say that we're using Redis for
leaderboards (among other uses) and have had great success. I'm happy
to answer any questions you might have.

-David