PubSub multiple channel design


goldalworming

Jun 7, 2011, 5:34:43 AM
to Redis DB
Hello, I'm new to the Redis world...

I'm very happy with Redis' pub/sub functionality. But which design is
better: a user subscribing to multiple channels, or a poster publishing to
multiple channels, in a case like -let's say- a Twitter-like web app?

Which is better in memory usage, speed, etc.?

Regards

Josiah Carlson

Jun 7, 2011, 12:15:58 PM
to redi...@googlegroups.com

It depends on:
1. the number of writes your system receives (and from which types of people)
2. the number of reads your system receives (and to which types of people)
3. the fan-out of the system (how many people should see each message)
4. the fan-in of the system (how many people are followed by each individual)
5. the performance of your subscribed clients
etc.

My recommendation is that if you really want to implement something
like twitter, you not use Redis publish/subscribe. Unless your fan-out
is trivially small for all users (and you have a tiny number of
users), the moment you get a few writes to the big users, it's going
to slow down the system, generate a lot of outgoing buffers,
potentially crash, etc.

Generally, the systems that tend to scale in a situation like this are
the ones with workers that manually distribute these messages out.
Basically you end up with a task queue, and each task item is
something of the form "distribute message X to all of the users that
should see it". Here is some sample code that should work for as long
as you have memory (pushing old messages off to a database should
eventually happen, and this would work fairly well with sharding).

To publish a message to be syndicated out to users:

import json

def post_message(conn, message):
    # Assign a unique, increasing message id from a Redis counter.
    message['id'] = id = conn.incr(COUNTER)
    # Store the message body, then enqueue its id for the workers.
    conn.hset(WORK_ITEMS, id, json.dumps(message))
    conn.rpush(WORK_QUEUE, id)

Run as many workers as you can reasonably get away with. The more of them
that run, the higher the throughput you will get (if only slightly), though
latency may increase once you max out Redis' throughput and/or the
processors available to your workers.

def process_work(conn):
    while not QUIT:
        # BLPOP returns a (queue, value) pair, or None on timeout.
        item = conn.blpop(WORK_QUEUE, 1)
        if not item:
            continue
        raw = conn.hget(WORK_ITEMS, item[1])
        if not raw:
            continue
        message = json.loads(raw)
        followers = get_followers(message['author_id'])
        for follower_id in followers:
            # The message id doubles as the score, so each home timeline
            # stays sorted by post order (redis-py 3+ zadd signature).
            conn.zadd(HOME_TIMELINE + str(follower_id),
                      {message['id']: message['id']})

The above code will add items to a bunch of home timelines, but it
won't clean out the home timelines, or clean out the mapping of
'WORK_ITEMS'. If you add a little bit of pipelining, you should be
able to handle syndicating to a few tens of thousands of timelines
each second, with 100-150k possible on a single Redis instance.
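
By "a little bit of pipelining" I mean batching the ZADDs so that many
timeline writes share one round trip. A minimal sketch, assuming redis-py;
the syndicate() name and the batch size of 1000 are placeholders, not part
of the code above:

def syndicate(conn, message, followers):
    # Buffer ZADDs client-side and flush in batches (1000 is arbitrary),
    # so thousands of timeline writes share a handful of round trips.
    pipe = conn.pipeline(transaction=False)
    for i, follower_id in enumerate(followers, 1):
        pipe.zadd(HOME_TIMELINE + str(follower_id),
                  {message['id']: message['id']})
        if i % 1000 == 0:
            pipe.execute()
    pipe.execute()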

In terms of the "get_followers()" function, my recommendation is
actually to use a database. I know, I'm a horrible person. But
individual follow actions should be fairly low volume writes,
databases have very fast reads (especially when you structure your
index properly), and you could get away with having a single table
with 3 indexes (2 if you don't care about keeping the order in which
people started "following" someone else). The reason not to use Redis
is that 1. everyone's going to have a followers list, 2. followers
lists grow practically without bound, 3. see reason #2 again.
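
Purely for illustration of what such a table could look like (the table,
column, and index names are made up, and sqlite3 just stands in for
whichever database you actually use), get_followers() would be something
along these lines:

import sqlite3

# One row per (follower, followed) pair. The primary key answers "who does
# X follow"; the extra index answers "who follows X", in follow order.
SCHEMA = """
CREATE TABLE IF NOT EXISTS follows (
    follower_id INTEGER NOT NULL,
    followed_id INTEGER NOT NULL,
    created_at  INTEGER NOT NULL,
    PRIMARY KEY (follower_id, followed_id)
);
CREATE INDEX IF NOT EXISTS follows_by_followed
    ON follows (followed_id, created_at);
"""

def get_followers(author_id, db_path='follows.db'):
    db = sqlite3.connect(db_path)
    try:
        db.executescript(SCHEMA)
        rows = db.execute(
            "SELECT follower_id FROM follows WHERE followed_id = ?",
            (author_id,))
        return [row[0] for row in rows]
    finally:
        db.close()

That sketch has two indexes; the third (on follower_id plus created_at) is
only needed if you care about the order in which someone followed people.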

If your network is relatively small, like say <100k users, and you
clean out old items never to return, then you could probably host it
all in Redis. Once you start breaking the million user barrier, get
substantial volume, large numbers of followers, etc., then that is
when you break out the database, etc.

Regards,
- Josiah

goldalworming

Jun 8, 2011, 3:57:05 PM
to Redis DB
I think the problem I want to solve with Redis is the long-polling
system, not storing all the data.
I use Tornado to create the long polling, but I don't know how to build
something like multi-channel chat, because if I use more than one
process I don't know how to call another process's callback. So I
decided to use Redis.

User A sends an update to machineA and then long-polls, waiting for a
callback. User B sends a message for A to machineB, but B cannot invoke
A's callback.

That's what I'm trying to solve with Redis pub/sub, but I don't know how
many users and channels Redis pub/sub can handle.

regards,
Arief


Josiah Carlson

Jun 8, 2011, 5:32:00 PM
to redi...@googlegroups.com
On Wed, Jun 8, 2011 at 12:57 PM, goldalworming <ariefnu...@gmail.com> wrote:
> I think the problem I want to solve with Redis is the long-polling
> system, not storing all the data.
> I use Tornado to create the long polling, but I don't know how to build
> something like multi-channel chat, because if I use more than one
> process I don't know how to call another process's callback. So I
> decided to use Redis.
>
> User A sends an update to machineA and then long-polls, waiting for a
> callback. User B sends a message for A to machineB, but B cannot invoke
> A's callback.
>
> That's what I'm trying to solve with Redis pub/sub, but I don't know how
> many users and channels Redis pub/sub can handle.

How many channels are going to be listened to, and by how many listeners?
How many messages/second are you going to publish, and how many
listeners are going to receive those messages?

Based on what you have said, I believe that the answer to the first
part is 1 channel per listener, maybe a few hundred listeners. At that
level, assuming relatively short messages, you could move tens of
thousands of messages/second through Redis pubsub without a problem.
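
If it helps to see the shape of it: one channel per connected user, where
any process can publish, and the process holding that user's long-poll
connection subscribes. The function and channel names below are just
placeholders (assuming redis-py, with the Tornado wiring left out):

import redis

r = redis.Redis()

def notify_user(user_id, payload):
    # Publisher side: any process on any machine can publish to the
    # target user's channel.
    r.publish('user:%s' % user_id, payload)

def wait_for_messages(user_id):
    # Subscriber side: listen on that user's channel and hand back
    # whatever arrives.
    p = r.pubsub()
    p.subscribe('user:%s' % user_id)
    for msg in p.listen():
        if msg['type'] == 'message':
            yield msg['data']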

Regards,
- Josiah

