It depends on:
1. the number of writes your system receives (and from which types of people)
3. the number of reads your system receives (and by which types of people)
3. the fan-out of the system (how many people should see each message)
4. the fan-in of the system (how many people are followed by each individual)
5. the performance of your subscribed clients
etc.
My recommendation is that if you really want to implement something
like Twitter, you not use Redis publish/subscribe. Unless your fan-out
is trivially small for all users (and you have a tiny number of
users), the moment you get a few writes from the big users, it's going
to slow down the system, generate a lot of outgoing buffers,
potentially crash, etc.
Generally, the systems that tend to scale in a situation like this are
the ones with workers that manually distribute these messages out.
Basically you end up with a task queue, and each task item is
something of the form "distribute message X to all of the users that
should see it". Here is some sample code that should work for as long
as you have memory (pushing old messages off to a database should
eventually happen, and this would work fairly well with sharding).
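The snippets below assume a redis-py client and a handful of shared key
names; something along these lines works (the exact names are just what
I use here, pick whatever you like):

import json
import redis

conn = redis.Redis()              # client passed into the functions below

COUNTER = 'message:counter'       # INCR source for message ids
WORK_ITEMS = 'message:items'      # hash of message id -> JSON body
WORK_QUEUE = 'message:queue'      # list of message ids awaiting syndication
HOME_TIMELINE = 'home:'           # prefix for per-user timeline zsets
QUIT = False                      # set to True to stop the workers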
To publish a message to be syndicated out to users:
def post_message(conn, message):
    # Assign a unique id, store the message body, and queue the id
    # for the workers to syndicate.
    message['id'] = id = conn.incr(COUNTER)
    conn.hset(WORK_ITEMS, id, json.dumps(message))
    conn.rpush(WORK_QUEUE, id)
Run as many workers as you can reasonably get away with. The more you
run, the higher the throughput you will get, though latency may
increase once you max out Redis' throughput and/or the available
processor time on your workers.
def process_work(conn):
    while not QUIT:
        # BLPOP returns a (queue name, value) pair, or None on timeout.
        item = conn.blpop(WORK_QUEUE, 1)
        if not item:
            continue
        item = conn.hget(WORK_ITEMS, item[1])
        if not item:
            continue
        message = json.loads(item)
        followers = get_followers(message['author_id'])
        for follower_id in followers:
            # Score the entry by the message id so timelines stay ordered.
            conn.zadd(HOME_TIMELINE + str(follower_id),
                      {message['id']: message['id']})
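If you want a quick way to run several of those workers in one process,
a thread-based sketch like this would do it (one process per worker
works just as well; the count is whatever your Redis and CPU headroom
will tolerate):

import threading
import redis

def run_workers(count=4):
    # Each worker runs the process_work() loop with its own client.
    threads = []
    for _ in range(count):
        t = threading.Thread(target=process_work, args=(redis.Redis(),))
        t.daemon = True
        t.start()
        threads.append(t)
    return threads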
The above code will add items to a bunch of home timelines, but it
won't clean out the home timelines, or clean out the mapping of
'WORK_ITEMS'. If you add a little bit of pipelining, you should be
able to handle syndicating to a few tens of thousands of timelines
each second, with 100-150k possible on a single Redis instance.
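As a rough sketch of what that pipelining (plus the cleanup I
mentioned) could look like; the batch size and timeline length here are
arbitrary:

def syndicate(conn, message, followers, timeline_size=1000, batch=500):
    # Batch the timeline writes to cut down on round trips, and trim
    # each timeline so it doesn't grow without bound.
    pipe = conn.pipeline(transaction=False)
    for i, follower_id in enumerate(followers, 1):
        key = HOME_TIMELINE + str(follower_id)
        pipe.zadd(key, {message['id']: message['id']})
        pipe.zremrangebyrank(key, 0, -timeline_size - 1)
        if i % batch == 0:
            pipe.execute()
    pipe.execute()
    # Once the message is fully syndicated, the work item can go away.
    conn.hdel(WORK_ITEMS, message['id'])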
In terms of the "get_followers()" function, my recommendation is
actually to use a database. I know, I'm a horrible person. But
individual follow actions should be fairly low volume writes,
databases have very fast reads (especially when you structure your
index properly), and you could get away with having a single table
with 3 indexes (2 if you don't care about keeping the order in which
people started "following" someone else). The reason not to use Redis
is that 1. everyone's going to have a followers list, 2. followers
lists grow practically without bound, 3. see reason #2 again.
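To make that concrete, here is a rough sketch of the follows table and
a get_followers() built on it, using sqlite3 just to keep the example
self-contained (any relational database would do, and the table, column,
and index names are mine; wrap the db handle however you like so the
workers can call it with just the author id):

import sqlite3

def create_follows_schema(db):
    # One row per (follower, followed) pair; the primary key doubles as
    # the index for "who does X follow?".
    db.execute('''CREATE TABLE IF NOT EXISTS follows (
                      follower_id INTEGER NOT NULL,
                      followed_id INTEGER NOT NULL,
                      created_at  REAL    NOT NULL,
                      PRIMARY KEY (follower_id, followed_id))''')
    # Index for "who follows X?", which is what the workers need.
    db.execute('''CREATE INDEX IF NOT EXISTS follows_by_followed
                      ON follows (followed_id, follower_id)''')
    # Third index only if you care about the order people followed in.
    db.execute('''CREATE INDEX IF NOT EXISTS follows_by_time
                      ON follows (followed_id, created_at)''')

def get_followers(db, author_id):
    cur = db.execute('SELECT follower_id FROM follows WHERE followed_id = ?',
                     (author_id,))
    return [row[0] for row in cur]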
If your network is relatively small, say <100k users, and you clean
out old items so they never return, then you could probably host it
all in Redis. Once you break the million-user barrier, get substantial
volume, large numbers of followers, etc., that is when you break out
the database.
Regards,
- Josiah
How many channels are going to be listened to, and by how many listeners?
How many messages/second are you going to publish, and how many
listeners are going to receive those messages?
Based on what you have said, I believe that the answer to the first
part is 1 channel per listener, maybe a few hundred listeners. At that
level, assuming relatively short messages, you could move tens of
thousands of messages/second through Redis pubsub without a problem.
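For that shape of workload, plain redis-py pubsub is enough; a minimal
sketch, assuming one channel per listener (the channel naming is mine):

def publish_to_listener(conn, listener_id, payload):
    # One channel per listener; a few hundred of these is not a problem.
    conn.publish('listener:%s' % listener_id, payload)

def listen(conn, listener_id):
    # Blocks and yields message payloads as they arrive.
    pubsub = conn.pubsub(ignore_subscribe_messages=True)
    pubsub.subscribe('listener:%s' % listener_id)
    for message in pubsub.listen():
        yield message['data']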
Regards,
- Josiah