scaling redis while using lettuce


Tuco

May 12, 2016, 6:19:26 AM
to lettuce-redis-client-users
Hi All, 

We have been using lettuce for a while now, and it has always worked like a charm, but as we try to scale Redis we seem to be running into some issues.

Below is the scenario.

We have the following:
  1. A 70-master-node cluster.
  2. One slave per master, i.e. 70 masters + 70 slaves.
  3. ~60 application servers which push data into the Redis nodes.
The 60 application servers are expected to increase to 100.

However, when the cluster topology changes, lettuce does not detect the change unless we refresh the topology by calling RedisClusterClient.reloadPartitions() or enable the background refresh job via ClusterClientOptions.
Considering the cluster has a large number of nodes, there is a real possibility that some node temporarily fails and its slave is promoted to master, so we decided to run the built-in job configured through ClusterClientOptions (roughly as sketched below).
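For reference, this is roughly how we enable the built-in refresh job (a minimal sketch; refreshClusterView/refreshPeriod are my reading of the 4.x-era builder API and may be named differently in other versions):

    import java.util.concurrent.TimeUnit;

    import com.lambdaworks.redis.RedisURI;
    import com.lambdaworks.redis.cluster.ClusterClientOptions;
    import com.lambdaworks.redis.cluster.RedisClusterClient;

    public class RefreshJobSetup {

        public static RedisClusterClient create() {
            // Seed node; host and port are placeholders.
            RedisClusterClient clusterClient = RedisClusterClient.create(
                    RedisURI.create("redis://redis-node-1:6379"));

            // Enable the built-in ClusterTopologyRefreshTask every 60 seconds.
            clusterClient.setOptions(ClusterClientOptions.builder()
                    .refreshClusterView(true)
                    .refreshPeriod(60, TimeUnit.SECONDS)
                    .build());

            return clusterClient;
        }
    }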

We run the job configured through ClusterClientOptions every 60 seconds, but with 140 nodes in total the "cluster nodes" command takes around 30 ms (it is by far the slowest command; the other commands complete in microseconds), whereas it took about 100 microseconds on a 30-node cluster.
Also, ClusterTopologyRefreshTask hits each of the 140 nodes on every refresh.

Considering all 60 servers hit all 140 nodes every 60 seconds, each Redis node receives on average one "cluster nodes" request per second, and each request takes 30 ms.
That means we are losing about 3% of each node's time to the "cluster nodes" command.
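To spell out the arithmetic with the figures above (our own rough numbers, not measurements taken by lettuce itself):

    requests per node = 60 app servers x 1 refresh / 60 s = 1 "cluster nodes" call per second
    time per call     = ~30 ms
    overhead per node = 30 ms / 1000 ms                   = ~3% of each node's time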

We are planning to double the size of the Redis cluster, and the number of application servers will also increase.
The former will make the "cluster nodes" command even slower, and the latter will result in more than one "cluster nodes" command per second on each node.
The combined effect will definitely have a bad impact on Redis, considering the "cluster nodes" command is already around 1000 times slower than the other commands.

We dug into the lettuce code and found that when ClusterTopologyRefreshTask refreshes its partitions, it executes the "cluster nodes" command on all partitions every 60 seconds, in our case on all 140 nodes, using TimedAsyncCommand.

One solution we came up with is to iterate over the set of seed RedisURIs, make a sync call to get the "cluster nodes" output, and, as soon as one call succeeds, use that response to refresh the partitions, so that only a single Redis node is queried for the cluster topology. This would solve our case.

The above issue only becomes noticeable when the cluster is very large, because that is when the "cluster nodes" command becomes very slow.

Suggestions / possible solutions are welcome.

Thanks

Mark Paluch

May 12, 2016, 7:48:02 AM
to lettuce-redis-client-users
Hi, 

Great to hear that lettuce is used with more than just a handful of servers. Topology change discovery in Redis Cluster has its challenges. While Redis Sentinel publishes messages on topology changes, Redis Cluster requires some sort of proactive handling.

One way to discover changes is polling, as provided by lettuce. I understand that polling the topology of 100 to 200 nodes, even when done in parallel, is costly.
The other way is listening to client-side events such as disconnects (persistent disconnects) and "MOVED"/"ASK" redirections. That functionality is not yet part of the client, as these events are only indicators of a possible topology change.

You could implement your own polling, as the executors (for scheduling) and the partition refresh API (triggering a refresh/setting partitions) are public; see the sketch below.
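For example, a rough sketch of such custom polling (not lettuce's own refresh code; calls like ClusterPartitionParser.parse() and setPartitions() reflect my reading of the 4.x API and should be double-checked against your version):

    import java.util.concurrent.Executors;
    import java.util.concurrent.ScheduledExecutorService;
    import java.util.concurrent.TimeUnit;

    import com.lambdaworks.redis.RedisClient;
    import com.lambdaworks.redis.RedisURI;
    import com.lambdaworks.redis.api.StatefulRedisConnection;
    import com.lambdaworks.redis.cluster.RedisClusterClient;
    import com.lambdaworks.redis.cluster.models.partitions.ClusterPartitionParser;
    import com.lambdaworks.redis.cluster.models.partitions.Partitions;

    public class SeedNodeTopologyRefresh {

        // Poll CLUSTER NODES on a single seed node and push the parsed partitions
        // into the cluster client, instead of letting the built-in job query every node.
        public static void schedule(RedisClusterClient clusterClient, RedisURI seed) {
            ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
            RedisClient seedClient = RedisClient.create(seed);

            scheduler.scheduleAtFixedRate(() -> {
                try (StatefulRedisConnection<String, String> connection = seedClient.connect()) {
                    String clusterNodes = connection.sync().clusterNodes();
                    Partitions partitions = ClusterPartitionParser.parse(clusterNodes);
                    clusterClient.setPartitions(partitions);
                } catch (Exception e) {
                    // Seed node unreachable: keep the previous topology view.
                }
            }, 0, 60, TimeUnit.SECONDS);
        }
    }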

The reason for the refresh job's design is to eliminate cluster split issues. When a particular node is separated from the cluster, it starts maintaining its own view of the cluster. If the refresh hits just that one node, the client's topology view ends up totally different from what it should be, and the only way to fix this is to obtain the topology from the seed nodes (assuming the seed nodes are still part of the cluster).

I'd love to continue this discussion and derive an improvement for the topology refresh.

I have no solution yet as I'm currently gathering ideas on how to approach this issue. Maybe using a small set of randomly picked nodes in the refresh job, combined with listening to client-side events and triggering the refresh upon such an event, could do the job.

Cheers, Mark

Tuco

May 12, 2016, 11:47:50 AM
to lettuce-redis-client-users
Hi Mark,

Yes, indeed. lettuce is a very nice Redis client, and I don't know anyone who knows about it and is not impressed by it.

I think "Ask/Moved" redirections, or a timeout exception from the node which was supposed to serve a particular slot, are fair indicators that the topology has changed(slots are moved, or a node is down), so that the "cluster slots" or "cluster nodes" command can be used for getting the info to rebuild the cache.

Regarding which node should be used for getting the info, the users should give a list of Redis URIs without which the cluster will not work. E.g. with 5 masters and 10 slaves, the users should provide the IPs and ports of one master and its two slaves, because at least one of them should always be working if the cluster is up (although there is a partial-cluster use case where the cluster does not fail even though some of its slots are not served). Giving only one Redis URI is problematic anyway: if users give a single URI and depend on auto-discovery of the other nodes, it is possible that this node fails while its slave stays up, leaving the cluster in a good state, but when the application server is restarted it will only try to connect to the failed node.

So maybe the onus should be on the application developer to give the correct set of nodes.
If they give too few nodes, there is a chance the cluster will not be discovered on application restart.
If they give too many nodes, we should choose the first node that reports the cluster state as ok in the "cluster info" command and use "cluster nodes" from it (see the sketch below).
In the worst case, if the cluster is split and there are two separately running clusters (split-brain scenario), all bets are off whether the state is ok or not, and maybe we should just return the first one.
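To make that selection rule concrete, something along these lines (just a sketch of the idea; the connect()/clusterInfo()/clusterNodes() calls reflect how I would expect the standalone sync API to be used, not tested code):

    import java.util.List;

    import com.lambdaworks.redis.RedisClient;
    import com.lambdaworks.redis.RedisURI;
    import com.lambdaworks.redis.api.StatefulRedisConnection;

    public class TopologySource {

        // Return the CLUSTER NODES output of the first seed node whose CLUSTER INFO
        // reports cluster_state:ok, or null if no seed node qualifies.
        public static String firstHealthyClusterNodes(List<RedisURI> seedNodes) {
            for (RedisURI seed : seedNodes) {
                RedisClient client = RedisClient.create(seed);
                try (StatefulRedisConnection<String, String> connection = client.connect()) {
                    if (connection.sync().clusterInfo().contains("cluster_state:ok")) {
                        return connection.sync().clusterNodes();
                    }
                } catch (Exception e) {
                    // Node unreachable: try the next seed.
                } finally {
                    client.shutdown();
                }
            }
            return null;
        }
    }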

Please let me know your thoughts.

Thanks

Mark Paluch

May 12, 2016, 4:25:11 PM
to lettuce-redis-client-users
Putting the user in charge of a stable topology discovery sounds key to me. The user is the only one who can reliably say which parts of the cluster are the "core" components. So defining the seed nodes as the source of the topology is a good idea. Combined with signals from the client, e.g. a disconnect that persists for a couple of seconds, this could then trigger a topology refresh using the specified seed nodes.

I created https://github.com/mp911de/lettuce/issues/240 to track the future efforts on this topic.

Tuco

May 12, 2016, 8:44:36 PM
to lettuce-redis-client-users
Since Redis servers are normally very stable, running the job every minute is probably overkill, particularly for very large clusters where "cluster nodes" is costly; but a cluster failure going undetected for more than a minute is not acceptable either. Also, in our case, if we have 60+ application servers querying the 1 master + 1 slave that we list as the reliable seed nodes, those nodes will end up receiving all the "cluster nodes" commands (1+ call per second), which will degrade their performance, unless we make them dummy nodes with no slots.

The best signals are the "Moved/Ask" redirections coupled with client disconnects/timeouts, which can be used as triggers for a topology refresh. If that is done and we have an event-based cluster topology change notifier, then we can get rid of the cluster refresh job (or make it optional), and the refresh will be done only when needed, with a maximum delay of the "Moved/Ask"/timeout detection plus the time to refresh the topology from the seed nodes. A configuration along these lines might end up looking like the sketch below.
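(Purely hypothetical at this point: ClusterTopologyRefreshOptions, the trigger names and the builder methods below are my guess at how such an option set could be shaped, not an API that exists in lettuce today.)

    import java.util.concurrent.TimeUnit;

    import com.lambdaworks.redis.cluster.ClusterClientOptions;
    import com.lambdaworks.redis.cluster.ClusterTopologyRefreshOptions;
    import com.lambdaworks.redis.cluster.ClusterTopologyRefreshOptions.RefreshTrigger;

    public class AdaptiveRefreshConfig {

        public static ClusterClientOptions build() {
            // Hypothetical option set: keep a slow periodic poll as a safety net,
            // refresh adaptively on MOVED redirects and persistent reconnects,
            // and query only the configured seed nodes for the topology.
            ClusterTopologyRefreshOptions refreshOptions = ClusterTopologyRefreshOptions.builder()
                    .enablePeriodicRefresh(60, TimeUnit.SECONDS)
                    .enableAdaptiveRefreshTrigger(RefreshTrigger.MOVED_REDIRECT,
                            RefreshTrigger.PERSISTENT_RECONNECTS)
                    .dynamicRefreshSources(false)
                    .build();

            return ClusterClientOptions.builder()
                    .topologyRefreshOptions(refreshOptions)
                    .build();
        }
    }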

Thanks 