Redis Cluster Pub/Sub - Scalability Issues


Felix Gessert

Jul 13, 2015, 1:06:19 PM
to redi...@googlegroups.com
Hi all,

According to [1] and [2], PubSub works by broadcasting every publish to every other Redis Cluster node. This limits PubSub throughput to the bisection bandwidth of the underlying network infrastructure divided by the number of nodes times the message size. So if a typical message is 1 KB, the cluster has 10 nodes, and bandwidth is 1 GBit/s, throughput is already limited to 12.5K RPS. If we increase the message size to 5 KB and the number of nodes to 50, we only get 500 RPS - much less than a single Redis instance could service (>100K RPS), while putting maximum pressure on the network. PubSub thus scales linearly with cluster size, but in the negative direction!
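The back-of-the-envelope numbers above can be reproduced with a few lines (a sketch; the formula just divides the usable bandwidth by nodes times message size, using 1 KB = 1,000 bytes):

```python
# Back-of-the-envelope limit on cluster-wide PubSub throughput when
# every PUBLISH is re-broadcast to the whole cluster.

def pubsub_throughput_limit(bandwidth_bps: float, n_nodes: int, msg_bytes: int) -> float:
    """Upper bound on publishes per second before the network saturates."""
    bandwidth_bytes = bandwidth_bps / 8  # bits/s -> bytes/s
    return bandwidth_bytes / (n_nodes * msg_bytes)

print(pubsub_throughput_limit(1e9, 10, 1_000))  # -> 12500.0
print(pubsub_throughput_limit(1e9, 50, 5_000))  # -> 500.0
```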

This leads me to my question: why not just redirect clients to the node that owns the PubSub channel's hash slot, similar to normal data operations? Cluster nodes would then only have to publish and notify locally, similar to keyspace notifications [3], and PubSub would be perfectly scalable. Sure, this would break PSUBSCRIBE for the cluster, but PSUBSCRIBE could be limited to patterns containing a {hash-tag}, meaning that the subscription only pertains to channels that contain that hash tag and match the pattern (i.e. one specific node). As PSUBSCRIBE is semantically a multi-key operation, this would make perfect sense and be consistent with the rest of Redis Cluster.
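To illustrate the routing I have in mind: a client could map a channel to a hash slot exactly the way it maps a key, including {hash-tag} handling, using the CRC16 (XMODEM) function from the Redis Cluster spec. A rough sketch:

```python
# Sketch: map a PubSub channel to a hash slot exactly like a key,
# honoring {hash-tag} semantics (CRC16/XMODEM mod 16384, as in the
# Redis Cluster specification).

def crc16(data: bytes) -> int:
    """CRC-16/XMODEM, the checksum Redis Cluster uses for key hashing."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            crc = ((crc << 1) ^ 0x1021 if crc & 0x8000 else crc << 1) & 0xFFFF
    return crc

def channel_slot(channel: str) -> int:
    """Hash slot for a channel; a non-empty {tag} is hashed on its own."""
    start = channel.find('{')
    if start != -1:
        end = channel.find('}', start + 1)
        if end != -1 and end != start + 1:
            channel = channel[start + 1:end]
    return crc16(channel.encode()) % 16384

# Channels sharing a hash tag land on the same slot (and thus node):
assert channel_slot('news:{user42}') == channel_slot('chat:{user42}')
```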

In summary, I think the assumption that clients may publish and subscribe to any node is a dangerous guarantee that kills scalability. What do you think - could the above be the way to handle PubSub in Redis Cluster? Are there currently any workarounds to have PubSub scale in a Redis Cluster deployment?

Best,
Felix

P.S. Redis Cluster is a great project and I highly value all the effort that goes into it!

Josiah Carlson

Jul 13, 2015, 1:40:20 PM
to redi...@googlegroups.com
The negative scalability of Redis Cluster when it comes to pubsub is annoying, yes.

While I don't disagree with your conclusion, there are several issues with changing Redis Cluster to work the way you describe.

The first is that Redis Cluster currently operates differently, and people rely on its existing behavior as a guarantee for their applications to be correct. So if this were to come to pass, it would need to be a configuration option that defaults to off, or somehow use the keys themselves to determine behavior.

The second issue is that now you have to start sending error messages/redirects to *established* pubsub clients to tell them to reconnect to a different server to get their subscriptions. This hasn't happened before, so you're going to need to get client authors to fix the pubsub behavior in their clients, *and* deal with the fact that now pubsub is substantially less reliable due to the need to reconnect (to a different server), combined with various race conditions surrounding key/notification movement.

And finally, you're going to need to get signoff from Salvatore. He's spent a lot of time getting Cluster the way he thinks it should be, and obviously had opportunities to limit pubsub in the manner that you describe. I suspect part of the reason why he chose to build pubsub as it is in Cluster is because coordination between cluster nodes is done via pubsub, as pre-Cluster Sentinel coordination was also done via pubsub.

Personally, if I had a need for pubsub and was using Redis Cluster, I'd just run a 3-node classic Redis setup with Sentinel on a few of the same machines, and rename all non-pubsub commands to the empty string to prevent writing to the pubsub-only machines.
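A pubsub-only instance as described could be locked down in redis.conf roughly like this (a sketch; the exact command list is illustrative, not exhaustive):

```
# redis.conf sketch for a pubsub-only node: disable data commands
# by renaming them to the empty string.
rename-command SET ""
rename-command GET ""
rename-command DEL ""
rename-command EXPIRE ""
# ...and so on for the remaining non-pubsub commands
```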

 - Josiah



Felix Gessert

Jul 14, 2015, 5:09:17 AM
to redi...@googlegroups.com
You are right. However, I think there is a reasonable solution for both problems you mention:

1. (Breaking existing apps) The SUBSCRIBE and PUBLISH commands could be extended with a new optional parameter "broadcast" which defaults to true. This parameter indicates whether the receiving node should broadcast the PUBLISH command over the cluster bus. Existing apps would behave as before, while client drivers (e.g. Jedis) could add the feature of treating non-broadcast PUBLISH calls with hash slot semantics. This part is rather easy, because clients essentially just have to treat PUBLISH like any other data command.

2. (Subscriptions during hash slot migration) This is harder but totally possible. This is what I would propose:
- When migration starts, PUBLISH calls are redirected with -ASK to the new hash slot owner.
- The node taking over the hash slot keeps a temporary log of PUBLISH messages for some configurable time (SUBSCRIPTION_MIGRATION_TIMEOUT).
- Atomically with the start of -ASK redirections, subscribed clients receive a message indicating that they should reconnect to the new node.
- On connecting to the new node, clients initially receive (either by default or by option) all messages the node accumulated since the migration started and before SUBSCRIPTION_MIGRATION_TIMEOUT has passed.

Using this scheme, no messages will be lost, as long as subscribed clients are able to reconnect to the new node within SUBSCRIPTION_MIGRATION_TIMEOUT. In Redis Cluster, this requires implementing the temporary PubSub migration log. Clients need to extend their subscribe-connections to handle reconnection to a new node. This part can be combined with 1. by having a SUBSCRIBE that indicates whether a client is capable of a handover and defaults to false.
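To make the buffer-and-replay idea concrete, here is a minimal Python sketch of the proposed migration log; MigrationLog and SUBSCRIPTION_MIGRATION_TIMEOUT are hypothetical names, and this only illustrates the mechanism, not Redis internals:

```python
# Minimal sketch of the temporary PUBLISH log a node taking over a
# hash slot would keep, so that subscribers reconnecting within the
# migration window lose no messages.

import time
from collections import deque

SUBSCRIPTION_MIGRATION_TIMEOUT = 5.0  # seconds (configurable)

class MigrationLog:
    def __init__(self):
        self.started = time.monotonic()
        self.messages = deque()

    def record(self, channel: str, payload: str) -> None:
        # Only buffer while the migration window is open.
        if time.monotonic() - self.started < SUBSCRIPTION_MIGRATION_TIMEOUT:
            self.messages.append((channel, payload))

    def replay(self, channels: set) -> list:
        # Delivered to a subscriber that reconnects within the window.
        return [m for m in self.messages if m[0] in channels]

log = MigrationLog()
log.record("news", "hello")
log.record("other", "ignored")
print(log.replay({"news"}))  # -> [('news', 'hello')]
```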

Also, there is a rich body of recent research on live migration in database systems (e.g. Slacker, Albatross, Zephyr [1]) that we can learn from.

Do you have a suggestion on how to get Salvatore to take a look at this problem and proposal?

As of now, I think that in terms of reliability, functionality, and scalability, Apache Kafka [2] is far superior to Redis Cluster PubSub - but there is no reason to leave it at that. In terms of operational simplicity and single-server performance, there is still a large niche for a scalable Redis Cluster PubSub.

Josiah Carlson

Jul 16, 2015, 2:13:04 AM
to redi...@googlegroups.com
Replies inline.

On Tue, Jul 14, 2015 at 2:09 AM, Felix Gessert <felix....@gmail.com> wrote:
You are right. However I think there is a reasonable solution for both problems you mention:

Of course there are; I was pointing them out so that you (or someone else with similar desires) would know some of the issues that would need to be addressed before or during development of the behavior change.

1. (Breaking existing apps) The SUBSCRIBE and PUBLISH commands could be extended with a new optional parameter "broadcast" which defaults to true. This parameter indicates whether the receiving node should broadcast the PUBLISH command over the cluster bus. Existing apps would behave as before, while client drivers (e.g. Jedis) could add the feature of treating non-broadcast PUBLISH calls with hash slot semantics. This part is rather easy, because clients essentially just have to treat PUBLISH like any other data command.

I feel like the language is backwards here. There shouldn't be a "broadcast" argument that defaults to true, there should be a "shard" argument that defaults to false. If you want sharded pubsub, you *must* provide SHARD as a final argument, otherwise it will be broadcast across the entire cluster.

2. (Subscriptions during hash slot migration) This is harder but totally possible. This is what I would propose:
- When migration starts, PUBLISH calls are redirected with -ASK to the new hash slot owner.
- The node taking over the hash slot keeps a temporary log of PUBLISH messages for some configurable time (SUBSCRIPTION_MIGRATION_TIMEOUT).
- Atomically with the start of -ASK redirections, subscribed clients receive a message indicating that they should reconnect to the new node.
- On connecting to the new node, clients initially receive (either by default or by option) all messages the node accumulated since the migration started and before SUBSCRIPTION_MIGRATION_TIMEOUT has passed.

Using this scheme, no messages will be lost, as long as subscribed clients are able to reconnect to the new node within SUBSCRIPTION_MIGRATION_TIMEOUT. In Redis Cluster, this requires implementing the temporary PubSub migration log. Clients need to extend their subscribe-connections to handle reconnection to a new node. This part can be combined with 1. by having a SUBSCRIBE that indicates whether a client is capable of a handover and defaults to false.

The problem with adding arguments to SUBSCRIBE and PSUBSCRIBE is that they are both defined such that all arguments are channels or channel patterns. This would either need to be some sort of client option (like CLIENT SETNAME, only this would be CLIENT SHARDED_SUBSCRIBE or something shorter/better), or would necessitate a new command.

And I hadn't previously considered it, but I don't know how Cluster handles keyspace notifications right now. Have you tested that out yet? Does it also broadcast keyspace changes across the cluster, or does it limit itself to the local node? That might be useful to put into the configuration file as a way of keeping the entire cluster from falling over with keyspace notifications. Having a second configuration option for defaulting pubsub to sharded behavior might also be a good option and/or replacement for adding arguments to commands or adding a new CLIENT subcommand.

Also, there is a rich body of recent research on live migration in database systems (e.g. Slacker, Albatross, Zephyr [1]) that we can learn from.

Do you have a suggestion on how to get Salvatore to take a look at this problem and proposal?

Making it interesting to him is the best advice I can give. Writing code, talking about it on redis-dev, and submitting a pull request are, in my experience, some of the ways you can make it interesting.

As of now, I think that in terms of reliability, functionality, and scalability, Apache Kafka [2] is far superior to Redis Cluster PubSub

Pubsub exists as a very useful feature to have on top of the existing Redis system, and from what I understand, keeping it viable for a variety of use-cases is a reasonable expectation/priority. But as far as I know (someone please correct me if I am mistaken), beating Kafka as a distributed message bus has never been in the list of priorities or plans. Heck, Redis itself beating most any other software as a distributed message bus hasn't been a priority.
 
- but there is no reason to leave it at that. In terms of operational simplicity and single-server performance, there is still a large niche for a scalable Redis Cluster PubSub.

There is certainly an opportunity to improve the performance of Redis pubsub in cluster mode, assuming that you can convince Salvatore that it is a big enough problem to address, or that the pain of a change is minimized. One way of reducing the pain of such a change is to reduce the number of changes that need to happen.

For example, consider the following alternate changes:
* new cluster-wide config option sharded-pubsub defaulting to 'no', if provided with 'yes', subscription messages on a channel X will only be sent to nodes that are the master or slaves of the shard that would contain that channel if it were a key
* publishing to a channel not assigned to the node published to will get the message forwarded to the proper master (which is then replicated to the slaves)
* if a subscriber wants to be notified of their channel moving to another server, they must subscribe to a special __cluster__:slots channel (which is broadcast cluster-wide, only cluster nodes can publish, like __sentinel__:hello in sentinels) to be notified of the progress of slot migrations
* immediately upon slot migration start, both the old and new slot master/slaves start receiving published messages for sending to subscribers, and on completion of migration (and the slot migration completed message), only the new master/slaves for that slot receive messages for sending to subscribers

Pros to this variant: no command changes/additions, the necessary client changes don't need to be made in the client library, useful cluster informational messages become more readily available to the cluster, ...
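The node-side forwarding decision under the hypothetical sharded-pubsub option could be sketched like this (the data structures are illustrative, not Redis internals):

```python
# Sketch of the node-side forwarding decision under a hypothetical
# "sharded-pubsub" option: broadcast to everyone when off, forward only
# to the nodes owning the channel's slot when on.

def forward_targets(channel, nodes, slot_of, sharded_pubsub):
    """nodes: {node_id: set of owned slots}; slot_of: channel -> slot."""
    if not sharded_pubsub:
        return set(nodes)  # current behavior: cluster-wide broadcast
    slot = slot_of(channel)
    return {n for n, slots in nodes.items() if slot in slots}

nodes = {"A": {0, 1}, "B": {2, 3}}
print(forward_targets("x", nodes, lambda c: 2, sharded_pubsub=True))   # -> {'B'}
print(sorted(forward_targets("x", nodes, lambda c: 2, sharded_pubsub=False)))  # -> ['A', 'B']
```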

 - Josiah

Felix Gessert

Jul 16, 2015, 6:56:46 AM
to redi...@googlegroups.com
I think that is a very good proposal; I'll post it to redis-dev. The only thing I would additionally suggest is a small grace period during migration/double-publishing, so that in a non-failure scenario clients don't lose messages even if migration is fast.

I am not familiar enough with the internals of how point-to-point cluster communication works (e.g. to selectively forward publishes), so I'm curious whether this requires major changes in Redis Cluster.

As for your question on keyspace notifications, there is a comment from Salvatore on GitHub [1] stating that they are always local. That implies that a scalable solution for one channel could be faked using this (anti-)pattern:
# Connection 1
$ redis-cli config set notify-keyspace-events E$
$ redis-cli --csv subscribe __keyevent@0__:set
# Connection 2
$ redis-cli set mymsg ""
$ redis-cli set myothermsg ""

Of course, this is not really an option.

Best,
Felix

Josiah Carlson

Jul 17, 2015, 1:02:42 AM
to redi...@googlegroups.com
On Thu, Jul 16, 2015 at 3:56 AM, Felix Gessert <felix....@gmail.com> wrote:
I think that is a very good proposal, I'll post it to redis-dev. The only thing I would additionally suggest is a small grace period during migration/double-publishing, so in a non-failure scenario clients don't lose messages, even if migration is fast.

Yeah, it makes sense in a "low data, high publish" scenario.

I am not familiar enough with the internals of how point-to-point cluster communication works (e.g. to selectively forward publishes), so I'm curious whether this requires major changes in Redis Cluster.

Standard publish messages are broadcast cluster-wide, and local keyspace (internal publish) notifications are per-node (as you reference below). All of the plumbing is there, so it shouldn't be too bad to plug the right parts together.

 - Josiah

Thomas Sieverding

Aug 4, 2015, 3:37:01 AM
to Redis DB
I was under the impression from previous discussions by Salvatore that he was looking into doing this using Bloom filters. If I recall correctly, the concept was that subscription channels would be added to a Bloom filter, which would then be the basis for pub/sub propagation. If we're just talking about publish/subscribe without patterns, then this kind of system could scale out if, on an application level, a given client was only used for a subset of subscription channels.

If the app then utilized consistent hashing to route connections to the servers handling the respective key slots, it should be possible to reduce cross-chatter. Reducing cross-chatter would then be a matter of adding more masters, and increasing slot capacity would be a matter of adding more slaves to fan out data.

Obviously, in this scenario there would be a tremendous amount of application-level responsibility to be intelligent about subscribing; however, such an approach could possibly be added transparently without requiring new configuration or syntax. This approach might also be slightly more palatable simply because you could build around eventual consistency without creating inherent race conditions. If a given instance ends up with subscribers it shouldn't have after remapping, the cost of a small degree of unnecessary cross-chatter while the application updates is far more acceptable than losing events.
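The application-level consistent hashing of subscription channels described above might look roughly like this sketch (the node names, hash function, and virtual-node count are all illustrative assumptions):

```python
# Sketch of application-level consistent hashing of subscription
# channels to nodes: each node gets several virtual points on a ring,
# and a channel is routed to the first point clockwise from its hash.

import bisect
import hashlib

class ChannelRing:
    def __init__(self, nodes, vnodes=64):
        # Place several virtual points per node on the ring.
        self._ring = sorted(
            (self._h(f"{node}#{i}"), node) for node in nodes for i in range(vnodes)
        )
        self._keys = [k for k, _ in self._ring]

    @staticmethod
    def _h(s: str) -> int:
        return int.from_bytes(hashlib.md5(s.encode()).digest()[:8], "big")

    def node_for(self, channel: str) -> str:
        # First ring point at or past the channel's hash, with wrap-around.
        i = bisect.bisect(self._keys, self._h(channel)) % len(self._ring)
        return self._ring[i][1]

ring = ChannelRing(["redis-1", "redis-2", "redis-3"])
print(ring.node_for("news:sports"))  # deterministic choice among the three nodes
```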

Felix Gessert

Aug 4, 2015, 8:18:42 AM
to Redis DB
I saw the older discussion on using Bloom filters, too. Though I am fond of Bloom filters and probabilistic data structures, I think they do not really solve the problem here: the Bloom filter would only compress the storage requirements for known subscriber channels on other nodes. However, in any typical scenario this is not the bottleneck. Redis masters could easily store a complete mapping of channels to nodes in order to selectively forward publish messages. In particular, the more clients learn the smarter partitioning logic, the less overhead the mapping takes. Hence, I do not see any benefit in using Bloom filters (or rather counting Bloom filters, since subscribers come and go).

Best,
Felix

Thomas Sieverding

Aug 4, 2015, 4:32:39 PM
to Redis DB
That's fair. What was important about this scheme was that it gave a mechanism by which a given node could be aware of which nodes are subscribing to a given event, and could then reduce chatter. Yes, it's not as attractive as consistent hashing; however, it wouldn't break existing apps, and would still accomplish the desired benefits when complemented with application-level consistent hashing of subscription channels.

With the current state of affairs, keyspace notifications don't propagate, correct? In other words, someone could use the somewhat hacky approach of listening to local keyspace events and using the respective keyspace patterns for set/push/zadd.

Aabhas Bhatia

Nov 23, 2017, 4:44:45 PM
to Redis DB
Hey Felix,

I was wondering: in the keyspace approach, how would you handle the case where a subscriber wants to access channels that are spread throughout the cluster, i.e. on nodes other than the one the subscriber (client) is connected to?
For keys, a client issues a "get key" command, and if the key is not in the hash slots of the connected server, the server redirects the client to the right server.
In pub/sub, the client (subscriber) is in listening mode; it won't issue a "get new messages" sort of command, so how will it be notified when a new message has been published on a channel on a different node?