Redis pub/sub scalability question

Chaoran Yang

Mar 20, 2013, 1:43:14 PM
to redi...@googlegroups.com
Dear all,

I have a question about the scalability of the pub/sub system in Redis. My use case is basically a social network where each user has many friends. Friendship is bi-directional. I want to use Redis pub/sub to implement a messaging system. I can think of two ways to do it (both sketched below):

1. Each user has an outgoing channel. When a user logs in, he subscribes to every outgoing channel of his friends. A user publishes his status posts to his own outgoing channel.

2. Each user has an incoming channel. When a user logs in, he subscribes to his own incoming channel. A user publishes his status updates to the incoming channel of every friend.
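
A minimal sketch of the two options using redis-py; the channel naming ("out:<user_id>", "in:<user_id>") and the friends() helper are illustrative assumptions, not part of the original post:

import redis

r = redis.Redis()

def friends(user_id):
    # Placeholder: look up the user's friend ids in your own data store.
    return []

# Option 1: publish to your own outgoing channel; on login, subscribe
# to every friend's outgoing channel.
def option1_on_login(user_id):
    p = r.pubsub()
    p.subscribe(*[f"out:{fid}" for fid in friends(user_id)])
    return p

def option1_post(user_id, message):
    r.publish(f"out:{user_id}", message)

# Option 2: subscribe only to your own incoming channel; a post is
# published to every friend's incoming channel.
def option2_on_login(user_id):
    p = r.pubsub()
    p.subscribe(f"in:{user_id}")
    return p

def option2_post(user_id, message):
    for fid in friends(user_id):
        r.publish(f"in:{fid}", message)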

I want to know the costs of these two approaches with respect to both memory and computation. Which is more scalable? Is there a more scalable way to use pub/sub in Redis to implement this?

Thanks,

Chaoran
Sent from my iPhone

Josiah Carlson

Mar 20, 2013, 3:15:37 PM
to redi...@googlegroups.com
Questions:
Do you care about the messages being stored even when a user isn't online?
How many friends can you expect a user to have? (average, median, 95th percentile are useful numbers to know)
How many friends can you expect to be online at any given time?

 - Josiah


Chaoran Yang

Mar 20, 2013, 3:54:00 PM
to redi...@googlegroups.com
Hi Josiah,

I replied inline.

> Questions:
> Do you care about the messages being stored even when a user isn't online?

I persist each message in another DB (MySQL or MongoDB) when a status is posted. I use Redis so that online users get an immediate notification about the new message and don't need to actively refresh, which is an expensive SQL query (it has to join the friendship and messages tables); this lowers the load on the SQL server.

When a user logs in, an SQL query finds all messages his friends posted while he was offline. After login, he doesn't need to query again because new messages are pushed to him directly with pub/sub in Redis. This is my goal.
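
A rough sketch of that write path, assuming redis-py, an "in:<user_id>" channel per user, and a persist_status() placeholder standing in for the SQL/MongoDB layer described above:

import json
import redis

r = redis.Redis()

def persist_status(author_id, text):
    # Placeholder: INSERT into the messages table and return the new row's id.
    return 0

def post_status(author_id, text, friend_ids):
    message_id = persist_status(author_id, text)
    payload = json.dumps({"id": message_id, "author": author_id, "text": text})
    # Online friends subscribed to their incoming channels receive this
    # immediately; offline friends catch up via the SQL query at login.
    for fid in friend_ids:
        r.publish(f"in:{fid}", payload)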

> How many friends can you expect a user to have? (average, median, 95th percentile are useful numbers to know)
> How many friends can you expect to be online at any given time?

Since I am building a new app and don't have users yet, I can't give exact numbers for these questions. But since it's a bi-directional social network like Facebook, I hope my architectural design can scale up to what Facebook currently has without fundamental changes. By "without fundamental change", I mean I can simply replace a Redis server with a Redis cluster as the user base grows.

-Chaoran

Josiah Carlson

Mar 20, 2013, 4:32:10 PM
to redi...@googlegroups.com
Getting users on a social network is difficult. Scaling a social network is even more difficult. The architecture you have for 1k users will be different from the one for 100k users, and different again for 100m users. Don't assume that methods which seem to scale now will necessarily scale to infinity (especially on the message store side of things*).

Estimate that you can move somewhere in the neighborhood of 20k-200k messages per second through a Redis server.

Your #1 option will use more memory (each subscription is at least a string plus some other related information), and its memory use will grow with online_users * average_friends_per_user. For a small number of users (i.e., now), this is the right answer due to its simplicity.

Your #2 option will use less memory, at the cost of executing average_friends_online_per_user times as many publish commands (if you also keep a set of currently online friends). This will become the right solution when you start having memory pressure and when you start overloading a single Redis server. Why? Because *this* version is naturally shardable, and you can push the "tough" work of fanning out your writes to your web servers.
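
A sketch of option #2's fan-out limited to online friends, assuming a Redis SET named "online_users" is maintained elsewhere (the key and channel names are illustrative):

import redis

r = redis.Redis()

def publish_to_online_friends(author_id, message, friend_ids):
    # Check which friends are currently online in one round trip.
    pipe = r.pipeline()
    for fid in friend_ids:
        pipe.sismember("online_users", fid)
    online_flags = pipe.execute()
    # Publish only to the friends who are online.
    for fid, online in zip(friend_ids, online_flags):
        if online:
            r.publish(f"in:{fid}", message)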

One thing to note: for every online user, you will need a persistent connection to Redis from your web server (or some other gateway server; don't ever put Redis on the bare internet). That might be your limiting factor right off, as 100k online users is close to the reasonable connection limit for Redis, but those 100k users may only move 5k-10k messages/second (times however many online friends they have on average).

Regards,
 - Josiah

* On the message store side of things, my only recommendation is to use a database that has composite indices, that you can tune easily, and that offers reasonable backups. I'd go with Postgres, personally, because it's fairly easy to tune it to support 20k-50k inserts/second. Combine that with replication features available in Postgres 9.2 and WAL-E incremental backups, and you have a database that might be able to support you up to 100k-1M users without serious difficulty.

Beyond that, you'll definitely have to shard. The sort of "fetch recent messages from my friends" query would actually be pretty reasonable using Lucene, AWS Cloud Search, or Elasticsearch (if you are on AWS, I'd go with Cloud Search for its simplicity in setup/scaling), though you'd probably want to fetch the actual message content from a high-IO, pre-sharded, easily-scaled NoSQL store (I'd recommend Riak for this). You can contact me off-list and I can describe how this would all work.


Jesús Gabriel y Galán

Mar 21, 2013, 4:20:00 AM
to redi...@googlegroups.com
If nobody objects, I would appreciate it if you kept this discussion on-list, as it's quite interesting.

Thanks,

Jesus.

Josiah Carlson

Mar 21, 2013, 4:49:17 AM
to redi...@googlegroups.com
We went through a deeper explanation of my "PS" about the message store/initial load, and talked about scaling option #2.

 - Josiah


Chaoran Yang

Mar 21, 2013, 5:27:03 AM
to redi...@googlegroups.com
Here's our discussion:

Hello Josiah,

First of all, thank you very much. Your comments are always very valuable. 

Yes, I agree that the first limiting factor I will hit is the max connection limit for Redis. But I think that should be easy to solve: I can have multiple web servers, each connecting to multiple Redis servers, and distribute the online users among the web servers. When a user posts a message, the web server issues the same list of publish commands to all Redis servers.

Then you will incur a cost of redis_servers * friends any time someone does something, which is *not* scalable. I'd suggest running a group of Redis servers (whose number is known), sharding based on user_id, then starting a task that knows which machines to publish to for each user (you should have a task server for out-of-band requests that shouldn't happen while a user is waiting for a response).
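
A sketch of that sharding scheme, assuming a fixed pool of Redis servers with each user's incoming channel assigned by user_id; the host names and the modulo rule are illustrative assumptions:

import redis

SHARDS = [redis.Redis(host=h) for h in ("redis-0", "redis-1", "redis-2")]

def shard_for(user_id):
    # Every user's incoming channel lives on exactly one server.
    return user_id % len(SHARDS)

def fan_out(message, friend_ids):
    # Group friends by shard so each server receives a single pipeline of publishes.
    by_shard = {}
    for fid in friend_ids:
        by_shard.setdefault(shard_for(fid), []).append(fid)
    for idx, fids in by_shard.items():
        pipe = SHARDS[idx].pipeline()
        for fid in fids:
            pipe.publish(f"in:{fid}", message)
        pipe.execute()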

One more thing to ask: if I publish to every friend rather than to online friends only, does it incur an extra memory cost in Redis? I suppose publishing to a channel that has no subscribers should not increase Redis's memory footprint. If that is true, I don't need to keep a list of currently online friends.

It doesn't cost anything except one more command to execute. But the thing to remember is that friend count will tend to be 10x online friend count (at least), so you get a factor of 10 speedup if you're willing to keep a little more information. More specifically, you could use a single ZSET until you have > 1 million users online at any time, then you could switch to a sharded ZSET using the same sharding method as you do for the publish part of notifications.
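
A sketch of the single-ZSET approach for tracking online users, scored by last-seen time; the key name and the 5-minute window are assumptions:

import time
import redis

r = redis.Redis()

def heartbeat(user_id):
    # Refresh the user's last-seen timestamp whenever their connection is active.
    r.zadd("online", {str(user_id): time.time()})

def online_friends(friend_ids, window=300):
    # Expire entries older than the window, then check which friends remain.
    r.zremrangebyscore("online", "-inf", time.time() - window)
    pipe = r.pipeline()
    for fid in friend_ids:
        pipe.zscore("online", str(fid))
    scores = pipe.execute()
    return [fid for fid, s in zip(friend_ids, scores) if s is not None]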
 
About storing messages, what I planned is to use MongoDB. I can store a message collection with an index on user_id, first fetch all my friends' user_ids, then fetch recent messages using that list of friends' user_ids. Since MongoDB can be easily sharded, and using Redis's pub/sub system should greatly reduce the number of "recent posts by my friends" queries, I suppose this plan can get me well beyond millions of users. Am I right? What do you think?

I wouldn't trust MongoDB with any data that I want to get back. As I've said before on the list, MongoDB isn't magic (it was actually written by people who knew close to nothing about how actual databases work, which is why it's had significant difficulties in working the way people expect a database to work). Just about everything that you want to do with MongoDB at scale can be done with other databases with less work, less worry, better performance, and with better guarantees about whether you're going to get your data back later.

I mentioned Riak, which is amazing at "add another machine for more throughput, storage, and optionally more resiliency", and supports secondary indices. Its secondary indices aren't terribly fast, and don't quite work the way most people expect, which is also why I suggested a search engine for getting the message ids from a list of friends (search engines also handle pagination, ordering by recency, and ordering by other options too).

If I'm going to use Riak or AWS Cloud Search/Lucene in the future, what kind of architecture should I use for now?

My advice is actually just to stick with a separate Postgres or MySQL server for your message store for now, because it's more important to have a viable product before you worry about scaling.

When you get to needing to scale it up beyond what a single Postgres or MySQL server can handle, set up a Riak cluster along with Lucene/Cloud Search, and migrate your data over. On the scaling side of things for this, I'd also recommend only keeping the most recent 1k messages from any individual user in the search engine, as that will greatly reduce long-term performance issues.

 - Josiah

arindam chakraborty

Mar 25, 2013, 3:38:53 AM
to redi...@googlegroups.com
Well, thanks to both of you, Chaoran & Josiah.

I have been grappling with a similar situation, and Redis is my first choice while the scale remains manageable, with an eventual migration to Riak.
Hearing this detailed discussion of Lucene with Riak makes me happy that my choices were somewhat on the right path.

The corroboration of my understanding is highly appreciated.

Arindam