Hi,
In cluster mode, the client reads all slots at connection time and opens connections to every participating node. That way commands are not delayed: once the connection object is ready, they flow directly to the Redis servers.
What you seem to be observing is how one-shot commands work. In one-shot mode you don't manage the connection, so for each command the following happens:
1. get a connection
2. run the command
3. return the connection
This is convenient in single-node mode, but it can be very expensive in cluster mode, because a lot of preparation is done at connection time. Cluster mode users should manage their own connection.
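To make the cost difference concrete, here is a toy model in plain Java. It is only a sketch: `FakeClusterConnection` and the helper methods are made-up names, not the client's API, and it assumes that every "get a connection" step repeats the full cluster setup (reading slots, connecting to every node).

```java
import java.util.concurrent.atomic.AtomicInteger;

// Toy model only: counts how often the expensive cluster setup would run.
// None of these names exist in the real client.
class ConnectionCostSketch {
    static final AtomicInteger setups = new AtomicInteger();

    // Stands in for a cluster connection; the constructor represents
    // "read all slots and connect to every node".
    static final class FakeClusterConnection implements AutoCloseable {
        FakeClusterConnection() { setups.incrementAndGet(); }
        String send(String command) { return "OK " + command; }
        @Override public void close() { /* return the connection */ }
    }

    // One-shot style: get a connection, run the command, return the connection.
    static String oneShot(String command) {
        try (FakeClusterConnection conn = new FakeClusterConnection()) {
            return conn.send(command);
        }
    }

    // Runs n one-shot commands and reports how many setups that needed.
    static int setupsForOneShot(int n) {
        setups.set(0);
        for (int i = 0; i < n; i++) {
            oneShot("PING");
        }
        return setups.get();
    }

    // Runs n commands on one user-managed connection and reports the setups.
    static int setupsForManaged(int n) {
        setups.set(0);
        try (FakeClusterConnection conn = new FakeClusterConnection()) {
            for (int i = 0; i < n; i++) {
                conn.send("PING");
            }
        }
        return setups.get();
    }

    public static void main(String[] args) {
        System.out.println("one-shot setups: " + setupsForOneShot(100));
        System.out.println("managed setups:  " + setupsForManaged(100));
    }
}
```

In this model the one-shot path pays the setup once per command, while the managed connection pays it once in total.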
During execution, if a "MOVED" error comes back, you can see in the code how it is handled:
    if (cause.is("MOVED")) {
      // cluster is unbalanced, need to reconnect
      handler.handle(Future.failedFuture(cause));
      return;
    }
I think what you would like to see instead is that the connection holds a semaphore while the slots are being refreshed, and that commands sent while the semaphore is locked are delayed with a back-off timeout, like we already do for errors of type "TRYAGAIN".
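A rough sketch of that idea, in plain Java with no client dependency. Everything here is hypothetical: `SlotRefreshGateSketch`, `beginRefresh`/`endRefresh`, the retry limit and the doubling back-off are my assumptions for illustration, not what the client does today. A boolean flag plays the role of the semaphore.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

// Hypothetical sketch, not the client's internals: all names are made up.
class SlotRefreshGateSketch {
    // true while the slot layout is being re-read (e.g. after a MOVED error)
    private final AtomicBoolean refreshing = new AtomicBoolean(false);
    // how many attempts were postponed; kept only for illustration
    final AtomicInteger delayedAttempts = new AtomicInteger();
    // daemon timer thread, so the sketch never keeps the JVM alive
    private final ScheduledExecutorService timer =
        Executors.newSingleThreadScheduledExecutor(r -> {
            Thread t = new Thread(r, "slot-refresh-backoff");
            t.setDaemon(true);
            return t;
        });

    void beginRefresh() { refreshing.set(true); }
    void endRefresh() { refreshing.set(false); }

    // Runs the command now, or retries later while a refresh is in progress.
    CompletableFuture<String> send(Supplier<String> command, int retries, long backoffMs) {
        CompletableFuture<String> result = new CompletableFuture<>();
        attempt(command, retries, backoffMs, result);
        return result;
    }

    private void attempt(Supplier<String> command, int retries, long backoffMs,
                         CompletableFuture<String> result) {
        if (!refreshing.get()) {
            try {
                result.complete(command.get());
            } catch (RuntimeException e) {
                result.completeExceptionally(e);
            }
            return;
        }
        if (retries <= 0) {
            result.completeExceptionally(
                new IllegalStateException("slots are still being refreshed"));
            return;
        }
        // Gate is closed: postpone and double the delay (assumed policy).
        delayedAttempts.incrementAndGet();
        timer.schedule(() -> attempt(command, retries - 1, backoffMs * 2, result),
            backoffMs, TimeUnit.MILLISECONDS);
    }

    public static void main(String[] args) {
        SlotRefreshGateSketch gate = new SlotRefreshGateSketch();
        gate.beginRefresh();
        CompletableFuture<String> reply = gate.send(() -> "PONG", 5, 10);
        gate.endRefresh();
        System.out.println(reply.join() + ", delayed attempts: " + gate.delayedAttempts.get());
    }
}
```

The caller still gets a single future back, so nothing changes from the outside; only the timing of the actual send is affected.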
I think this could be added fairly easily as an improvement: it would not affect any of the public APIs, it would just make the internals more resilient.
A second option, which can be complex to implement, is to keep a cache of slots in the top-level client. As with anything cache-related, the hard part is keeping it consistent: on any change we still need to refresh the slot connections of every running instance of the client, so the complexity adds up.
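One possible shape for such a cache, again only as a sketch in plain Java: the layout is an immutable snapshot that gets swapped atomically, and every running client registers a listener so it can refresh its connections on change. `SlotCacheSketch`, `SlotTable`, `publish` and `onChange` are hypothetical names, and the node addresses are placeholders.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Consumer;

// Hypothetical sketch of a shared slot cache; not part of the client.
class SlotCacheSketch {
    static final int SLOT_COUNT = 16384; // number of hash slots in Redis Cluster

    // Immutable snapshot: owner[slot] is the node serving that slot.
    static final class SlotTable {
        final long version;
        private final String[] owner;

        SlotTable(long version, String[] owner) {
            this.version = version;
            this.owner = owner.clone(); // defensive copy keeps the snapshot immutable
        }

        String nodeFor(int slot) { return owner[slot]; }
    }

    private final AtomicReference<SlotTable> current =
        new AtomicReference<>(new SlotTable(0, new String[SLOT_COUNT]));
    private final List<Consumer<SlotTable>> listeners = new CopyOnWriteArrayList<>();

    // Readers always see one complete layout, never a half-updated one.
    SlotTable snapshot() { return current.get(); }

    // Each running client instance registers here to learn about changes.
    void onChange(Consumer<SlotTable> listener) { listeners.add(listener); }

    // Swap in a freshly read layout, then tell every client to refresh.
    void publish(String[] owner) {
        SlotTable next = current.updateAndGet(
            prev -> new SlotTable(prev.version + 1, owner));
        listeners.forEach(l -> l.accept(next));
    }

    public static void main(String[] args) {
        SlotCacheSketch cache = new SlotCacheSketch();
        cache.onChange(t -> System.out.println("refresh connections for layout v" + t.version));
        String[] layout = new String[SLOT_COUNT];
        Arrays.fill(layout, 0, 8192, "node-a:7000");          // placeholder address
        Arrays.fill(layout, 8192, SLOT_COUNT, "node-b:7001"); // placeholder address
        cache.publish(layout);
        System.out.println("slot 12182 -> " + cache.snapshot().nodeFor(12182));
    }
}
```

The swap itself is the easy part. The sketch says nothing about when to call `publish` or how each client rebuilds its connections, which is where the complexity mentioned above comes from.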