How to handle a Redis Cluster MOVED response with vertx-redis-client v4.1.2?

Felix Putera

Jul 22, 2021, 12:39:50 AM
to vert.x
Hi, I have a question about vertx-redis-client v4.1.2:

I noticed that, since vertx-redis-client 3.9+, a MOVED response from Redis Cluster no longer triggers a re-fetch of the slots. Instead, the client immediately returns a failed future.

Are we supposed to recreate the Redis client (by calling Redis.createClient) when this happens?

Thanks,
Felix

Paulo Lopes

Jul 23, 2021, 9:51:01 AM
to vert.x
Hi Felix,

Yes, the safest option is to recreate the client object. The reason is that if there is some load on the client and many requests get "MOVED" as a response, we would issue a node-discovery request and reconnect for each of them, which in turn can create a snowball effect and overload the Redis cluster.

Failing fast has the benefit that any in-flight request is terminated and you can decide how to handle the situation, for example by queueing requests while reconnecting, or failing them until the connection is back.
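
For illustration, a minimal sketch of that fail-fast handling (just a sketch: `vertx` and `redisOptions` are assumed to be in scope, and checking the failure message for "MOVED" is an assumption on my side):

Redis client = Redis.createClient(vertx, redisOptions);

client.send(Request.cmd(Command.GET).arg("some-key"))
  .onFailure(err -> {
    // assumption: the one-shot send fails with the raw "MOVED ..." message
    if (err.getMessage() != null && err.getMessage().startsWith("MOVED")) {
      client.close();
      // a fresh client re-discovers the slots when it connects again
      Redis fresh = Redis.createClient(vertx, redisOptions);
      // from here: retry the command, queue it, or propagate the failure
    }
  });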

Given that we now have a single interface for all the connection modes (single, sentinel, cluster and replication), perhaps a good contribution would be a wrapper helper that implements opinionated recovery semantics: wrap the client, register an exception handler that catches recoverable errors and, in such events, queue all calls to "send" and "batch" until the underlying client is replaced with a new one.
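
Roughly along these lines (just a sketch, none of these names exist in the client today, and detecting recoverable errors via the message prefix is an assumption):

import io.vertx.core.Future;
import io.vertx.core.Promise;
import io.vertx.core.Vertx;
import io.vertx.redis.client.*;

import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;

// Hypothetical wrapper, not an existing API: recoverable failures trigger a
// replacement of the underlying client, and calls made in the meantime are
// queued and replayed once the new client answers a PING. A "batch" method
// would follow the same pattern as "send".
public class RecoveringRedis {

  private final Vertx vertx;
  private final RedisOptions options;
  private final Queue<Runnable> pending = new ConcurrentLinkedQueue<>();
  private volatile Redis delegate;
  private volatile boolean recovering;

  public RecoveringRedis(Vertx vertx, RedisOptions options) {
    this.vertx = vertx;
    this.options = options;
    this.delegate = Redis.createClient(vertx, options);
  }

  public Future<Response> send(Request req) {
    if (recovering) {
      // queue the call until the underlying client has been replaced
      Promise<Response> promise = Promise.promise();
      pending.add(() -> delegate.send(req).onComplete(promise));
      return promise.future();
    }
    return delegate.send(req).recover(err -> {
      if (!isRecoverable(err)) {
        return Future.failedFuture(err);
      }
      recover();
      Promise<Response> promise = Promise.promise();
      pending.add(() -> delegate.send(req).onComplete(promise));
      return promise.future();
    });
  }

  private boolean isRecoverable(Throwable err) {
    // assumption: the failure message carries the raw redis error
    String msg = err.getMessage();
    return msg != null && (msg.startsWith("MOVED") || msg.startsWith("CLUSTERDOWN"));
  }

  private synchronized void recover() {
    if (recovering) {
      return;
    }
    recovering = true;
    delegate.close();
    delegate = Redis.createClient(vertx, options);
    // probe the new client and flush the queued calls once it responds
    delegate.send(Request.cmd(Command.PING)).onComplete(ping -> {
      recovering = false;
      Runnable queued;
      while ((queued = pending.poll()) != null) {
        queued.run();
      }
    });
  }
}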

I think this isn't very complex to implement and would be a good first contribution if you're interested ;-)

Felix Putera

Jul 25, 2021, 9:41:55 PM
to vert.x
Hi Paulo,

Thanks a lot for your answer! As for your suggestion of creating a wrapper for the Redis client, I'll open a GitHub issue first and see if I find the time to pick it up at my own pace.

Felix Putera

Jul 27, 2021, 7:23:22 AM
to vert.x
Hi Paulo,

After experimenting and reading the implementation code again, I realized that we don't need to recreate the Redis client object to retry after a MOVED error: the send method always calls the connect method on RedisClusterClient, which in turn invokes GET SLOTS.

I thought the slots were cached in the Redis client object, but apparently they are not. This behavior seems to contradict your point about not wanting to issue too many node-discovery requests. Is this observed behavior intended?

Thanks, 
Felix

Paulo Lopes

Jul 29, 2021, 10:36:58 AM
to vert.x
Hi,

In cluster mode, at connection time, all slots are read and connections to all the participating nodes are made. This allows us to avoid delaying commands: once the object is ready, they can flow directly to the Redis servers.

What you seem to be observing is how one-shot commands work. In one-shot mode you don't manage the connection, so for each command the following happens:

1. get a connection
2. run the command
3. return the connection

This is fine for single mode, but it can be very expensive in cluster mode, because a lot of preparation is done at connection time. Cluster mode users should manage their own connection.
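
For completeness, the managed-connection approach in cluster mode looks roughly like this (the endpoints and the key are just placeholders):

import io.vertx.core.Vertx;
import io.vertx.redis.client.*;

public class ClusterConnectionExample {
  public static void main(String[] args) {
    Vertx vertx = Vertx.vertx();

    Redis client = Redis.createClient(vertx, new RedisOptions()
      .setType(RedisClientType.CLUSTER)
      // any subset of the cluster nodes works as seed endpoints
      .addConnectionString("redis://127.0.0.1:7000")
      .addConnectionString("redis://127.0.0.1:7001"));

    // connect once: the slots are read and the per-node connections are set up here
    client.connect()
      .onSuccess(conn -> {
        // reuse `conn` for all commands instead of the one-shot client.send(...)
        conn.send(Request.cmd(Command.GET).arg("some-key"))
          .onSuccess(resp -> System.out.println("value: " + resp))
          .onFailure(Throwable::printStackTrace);
      })
      .onFailure(Throwable::printStackTrace);
  }
}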

Now, during execution, if there's a "MOVED" error, you can see this in the code:

if (cause.is("MOVED")) {
  // cluster is unbalanced, need to reconnect
  handler.handle(Future.failedFuture(cause));
  return;
}

I think what you would like to see instead is that the connection holds a semaphore while the slots are being refreshed, and that commands sent while the semaphore is locked are delayed with some back-off timeout, like we do for errors of type "TRYAGAIN".
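
The back-off part of that idea, sketched at the application level just to illustrate (the real change would live inside the cluster connection; `vertx` is assumed to be in scope and the helper name is made up):

Future<Response> sendWithBackoff(Redis client, Request req, int attempt) {
  return client.send(req).recover(err -> {
    String msg = err.getMessage();
    boolean retriable = msg != null && (msg.startsWith("MOVED") || msg.startsWith("TRYAGAIN"));
    if (!retriable || attempt >= 5) {
      return Future.failedFuture(err);
    }
    Promise<Response> delayed = Promise.promise();
    // double the delay on every retry: 16ms, 32ms, 64ms, ...
    vertx.setTimer(16L << attempt, id ->
      sendWithBackoff(client, req, attempt + 1).onComplete(delayed));
    return delayed.future();
  });
}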

I think this could easily be added as an improvement: it would not affect any of the public APIs, it would just make the internals more resilient.

A second option, which could be complex to implement, is to keep a cache of slots in the top-level client. But, as with anything cache related, the trick is keeping it consistent: on any change we would still need to refresh the slot connections in every running instance of the client, so the complexity adds up.

Felix Putera

Aug 2, 2021, 1:12:08 AM
to vert.x
Hi Paulo,

As always, thanks for your response!

Looking back, I think my second question is quite unrelated to the first one: my concern is with using the cluster-mode one-shot / connection-pooled client, because the GET SLOTS command is invoked for every command sent.

You suggested that I use the connection mode of the Redis client. Yet I think there are legitimate cases where one might not want a single connection, but rather a pool of connections. Consider this: with only a single connection per application, we could never use Redis blocking commands (https://redis.io/topics/modules-blocking-ops), because they would also block the processing of every other in-flight Redis request.
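
For example (just to illustrate, the key names are placeholders and `client` is a cluster client created as above), a blocking consumer needs its own connection so it doesn't stall other traffic:

// dedicated connection for the blocking consumer
client.connect().onSuccess(blockingConn -> {
  // BLPOP with timeout 0 blocks this connection until an element arrives,
  // so nothing else should be multiplexed onto it
  blockingConn.send(Request.cmd(Command.BLPOP).arg("jobs").arg(0))
    .onSuccess(resp -> System.out.println("job: " + resp));
});

// meanwhile, regular traffic goes through other connections (e.g. a pool or
// the one-shot API)
client.send(Request.cmd(Command.GET).arg("some-key"))
  .onSuccess(resp -> System.out.println("value: " + resp));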

Ultimately, what I'm suggesting is that vert.x should provide a connection-pooled Redis Cluster client that doesn't send GET SLOTS on every request. It's more complicated, I agree, but Jedis has this functionality implemented. As a vert.x newbie, I assumed the vert.x Redis client would have 'core feature' parity with Jedis, but apparently that's not the case yet.

Based on our discussion here, do you think I should open a feature request in the GitHub repository?

Thanks,
Felix