CLIENT PAUSE

Salvatore Sanfilippo

Feb 4, 2014, 10:25:23 AM

to Redis DB

Hi all,

one of the building blocks of the manual failover I'm implementing in
Redis Cluster (there are times when you want to upgrade a given
master manually, and killing it to trigger a normal failover is not
cool, so there is a safer procedure for manual failover) is the
ability to pause clients of a given master for some time, while still
serving the slaves as usual.

Now instead of just providing an API for Redis Cluster to use, I
exposed it as a new sub-command of the CLIENT command, that is, CLIENT
PAUSE <milliseconds>.

The exact semantics of the command are documented here:
https://github.com/antirez/redis/commit/4919a13f503ab4ac5ad5611987996c4432c8de08

I believe there is some value in backporting this to 2.8, since it
allows a safer master upgrade procedure. You can CLIENT PAUSE a master
for a few seconds, stop it after the slave has processed the latest
commands in the stream (a matter of milliseconds usually), and change
the clients' configuration in a race-free way.
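
As a rough illustration of what a client would put on the wire for the new
sub-command, here is a minimal sketch assuming the standard Redis protocol
(an array of bulk strings). The helper names encode_command and client_pause
are made up for this example and are not part of Redis or of any client
library; the 5000 ms value is arbitrary.

```python
def encode_command(*args: str) -> bytes:
    """Encode a command as a Redis protocol array of bulk strings."""
    parts = [b"*%d\r\n" % len(args)]
    for arg in args:
        data = arg.encode("utf-8")
        parts.append(b"$%d\r\n%s\r\n" % (len(data), data))
    return b"".join(parts)


def client_pause(milliseconds: int) -> bytes:
    """Build the bytes for CLIENT PAUSE <milliseconds> (hypothetical helper)."""
    if milliseconds < 0:
        raise ValueError("timeout must be non-negative")
    return encode_command("CLIENT", "PAUSE", str(milliseconds))


# Pause the clients of a master for 5 seconds (arbitrary example value).
payload = client_pause(5000)
```

The resulting bytes could be written to a socket connected to the master; an
existing client library would normally do this encoding for you.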

I wonder if there is interest in this command in general, and in having it
in 2.8 specifically. Thanks!

Salvatore

--
Salvatore 'antirez' Sanfilippo
open source developer - GoPivotal
http://invece.org

To "attack a straw man" is to create the illusion of having refuted a
proposition by replacing it with a superficially similar yet
unequivalent proposition (the "straw man"), and to refute it
-- Wikipedia (Straw man page)

Yiftach Shoolman

Feb 4, 2014, 4:56:26 PM

to redi...@googlegroups.com

Hi Salvatore,

Why not just add a manual failover command, and then:

1. Upgrade the slave

2. Manually fail over the master --> it becomes the new slave

3. Upgrade the new slave

IMO this should be smoother. Did I miss something?

--
You received this message because you are subscribed to the Google Groups "Redis DB" group.
To unsubscribe from this group and stop receiving emails from it, send an email to redis-db+u...@googlegroups.com.
To post to this group, send email to redi...@googlegroups.com.
Visit this group at http://groups.google.com/group/redis-db.
For more options, visit https://groups.google.com/groups/opt_out.

--

Yiftach Shoolman
+972-54-7634621

Salvatore Sanfilippo

Feb 4, 2014, 5:00:48 PM

to Redis DB

Hello Yiftach,

CLIENT PAUSE is never used in Redis Cluster by the user directly; it
is just a building block (the API, not the command) used by the
internals. However, it was also exposed as a raw feature.

So this is how Redis Cluster handles that (designed; implementation due tomorrow):

* The user just connects to a slave and sends "CLUSTER FAILOVER" to
force a failover.
* The master announces its replication offset.
* The slave waits for the offset to match on its side, and starts an
election similar to the normal failover election, but with different
flags to force masters to give their vote even if the slave's master is
not seen as failing.

So this is totally transparent for the Redis Cluster user.

However, given that there is this new semantic, I thought it would be
good to expose it via a command, since it may be useful in other non-cluster
scenarios.

Salvatore

Salvatore Sanfilippo

Feb 4, 2014, 5:02:59 PM

to Redis DB

Sorry I forgot a step :-)

The full procedure is:

* The user just connects to a slave and sends "CLUSTER FAILOVER" to
force a failover.
* The slave sends a REQUEST_PAUSE packet via the cluster bus to the master.
* The master announces its replication offset to the slave.
* The slave waits for the offset to match on its side, and starts an
election similar to the normal failover election, but with different
flags to force masters to give their vote even if the slave's master is
not seen as failing.

The request for pause is for X time, while the slave has X/2 time to
complete the failover (get voted) or it will give up.
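
The timing rule above can be sketched as a small slave-side state check.
This is only a toy model under stated assumptions, not the Redis Cluster
code: the class name ManualFailover, its fields, and the returned state
strings are all hypothetical, and the 10000 ms pause is an arbitrary value
for X.

```python
from dataclasses import dataclass


@dataclass
class ManualFailover:
    """Slave-side view of a manual failover (simplified, hypothetical model)."""
    pause_ms: int             # X: how long the master is asked to pause clients
    started_at_ms: int        # when the slave sent the pause request
    slave_offset: int = 0     # replication offset processed by the slave so far
    master_offset: int = -1   # offset announced by the paused master (-1 = unknown)

    @property
    def deadline_ms(self) -> int:
        # The slave only has X/2 to get elected; the master stays paused for X.
        return self.started_at_ms + self.pause_ms // 2

    def step(self, now_ms: int) -> str:
        """Return what the slave should do at time now_ms."""
        if now_ms >= self.deadline_ms:
            return "give-up"
        if self.master_offset < 0:
            return "wait-for-master-offset"
        if self.slave_offset < self.master_offset:
            return "wait-for-replication"
        return "start-election"


mf = ManualFailover(pause_ms=10000, started_at_ms=0)
state_before_offset = mf.step(10)    # master offset not announced yet
mf.master_offset = 15
mf.slave_offset = 15
state_when_synced = mf.step(100)     # offsets match, election can start
```

Because the slave gives up at X/2 while the master stays paused until X, a
failed attempt is abandoned by the slave before the master resumes serving
its clients.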

Salvatore

Yiftach Shoolman

Feb 4, 2014, 5:09:25 PM

to redi...@googlegroups.com, Redis DB

OK, got it. Does REQUEST_PAUSE pause everything at the master, or only writes?

Sent from my iPhone

Yiftach Shoolman

Feb 5, 2014, 1:16:44 AM

to redi...@googlegroups.com

Hi Salvatore,

With this approach there is still an issue with the outstanding requests that were sent to the old master before the new master election took place. With the speed of Redis, this can be a significant number of requests.

This can be solved with the following 'drain' mechanism (similar to what you suggested, but with a few minor changes):

* The user just connects to a slave and sends "CLUSTER FAILOVER" to
force a failover.
* The slave sends a REQUEST_DRAIN packet via the cluster bus to the master.
* The master announces its replication offset to the slave, and starts buffering new requests.
* The slave waits for the offset to match on its side, and starts an
election similar to the normal failover election, but with different
flags to force masters to give their vote even if the slave's master is
not seen as failing.
* Once a new master is elected, the old master starts redirecting the buffered requests to the new master. AFAIR there is such a concept in Redis Cluster.

Notes:

1. Not sure a failover timer is needed, but it can easily be added.

2. The redirection at the old master can continue for a configurable period, to avoid a situation where some of the clients were late to get the updated cluster map.

--

Yiftach Shoolman
+972-54-7634621

Salvatore Sanfilippo

Feb 5, 2014, 4:45:13 AM

to Redis DB

Hello Yiftach,

you are basically referring to the fact that when the clients are
stopped, there are still things to process in the client buffers? It should
already be working as expected, because this is what happens:

1) A client already sent "SET foo bar". It is in the client buffer to
be processed.
2) The master receives the PAUSE command via the cluster bus from the slave.
3) The master announces its replication offset to the slave via PING
(when a master is stopped for failover, it pings the slave that is
failing over manually 10 times per second, and flags the packet in a
special way).
4) The slave reaches the replication offset of the master, and
performs the failover.
5) When a slave performs the failover (either manually or on failure) it
broadcasts a PING with the new configuration to all instances (if this
best-effort packet is lost, there are the UPDATE messages that will
reconfigure nodes).

So basically the master will be reconfigured as a slave *before* the
clients are unblocked. When the clients are finally unblocked, the
master is already configured as a slave, so the commands, instead of being
executed, will be redirected to the new instance. It sounds like
everything is fine.
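
The ordering argument above can be sketched with a toy model. Everything
here is hypothetical (the Node class, its fields, and the "redirect to"
reply string); real cluster redirections also carry the hash slot and the
address of the target node, which this sketch leaves out.

```python
from collections import deque


class Node:
    """Toy model of a cluster node; not the real Redis data structures."""

    def __init__(self, name: str, role: str = "master"):
        self.name = name
        self.role = role
        self.paused = False
        self.pending = deque()    # commands sitting in client buffers
        self.data = {}
        self.master_name = None   # set when this node becomes a slave

    def receive(self, key: str, value: str):
        """A client sends SET key value; while paused it just stays buffered."""
        self.pending.append((key, value))
        return [] if self.paused else self.process_pending()

    def become_slave_of(self, new_master: "Node"):
        self.role = "slave"
        self.master_name = new_master.name

    def unpause(self):
        self.paused = False
        return self.process_pending()

    def process_pending(self):
        replies = []
        while self.pending:
            key, value = self.pending.popleft()
            if self.role == "master":
                self.data[key] = value
                replies.append("OK")
            else:
                # Simplified redirection instead of executing the command.
                replies.append(f"redirect to {self.master_name}")
        return replies


old_master = Node("A")
slave = Node("B", role="slave")
old_master.paused = True
old_master.receive("foo", "bar")     # buffered, not executed
slave.role = "master"                # the slave wins the election
old_master.become_slave_of(slave)    # reconfigured before the unpause
replies = old_master.unpause()
```

Since the role change happens while the command is still buffered, the old
master never applies the write locally once the pause ends.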

Salvatore

Yiftach Shoolman

Feb 5, 2014, 6:12:15 AM

to redi...@googlegroups.com

Agree, missed that

Salvatore Sanfilippo

Feb 5, 2014, 6:15:05 AM

to Redis DB

On Wed, Feb 5, 2014 at 12:12 PM, Yiftach Shoolman
<yiftach....@gmail.com> wrote:

> Agree, missed that

Thanks for giving me a chance to review the logic! Implementing it
right now, most code already written...

Salvatore Sanfilippo

Feb 5, 2014, 9:57:48 AM

to Redis DB

The manual failover implementation was just pushed to GitHub; this is
what you see logged when a slave performs a manual failover:

[56571] 05 Feb 15:55:53.189 # Manual failover user request accepted.
[56571] 05 Feb 15:55:53.195 # Start of election delayed for 0 milliseconds (rank #0, offset 15).
[56571] 05 Feb 15:55:53.195 # Received replication offset for paused master manual failover: 15
[56571] 05 Feb 15:55:53.296 # All master replication stream processed, manual failover can start.
[56571] 05 Feb 15:55:53.296 # Starting a failover election for epoch 7456.
[56571] 05 Feb 15:55:53.297 # Failover election won: I'm the new master.
[56571] 05 Feb 15:55:53.297 # Connection with master lost.
[56571] 05 Feb 15:55:53.297 * Caching the disconnected master state.
[56571] 05 Feb 15:55:53.297 * Discarding previously cached master state.
[56571] 05 Feb 15:55:54.098 * Slave asks for synchronization

Yiftach Shoolman

Feb 5, 2014, 10:07:08 AM

to redi...@googlegroups.com

Great, now it's the time to test it...

--

Yiftach Shoolman
+972-54-7634621

Salvatore Sanfilippo

Feb 5, 2014, 10:08:14 AM

to Redis DB

On Wed, Feb 5, 2014 at 4:07 PM, Yiftach Shoolman
<yiftach....@gmail.com> wrote:
> Great, now it's the time to test it...

Doing it right now ;-)

Salvatore Sanfilippo

Feb 5, 2014, 11:19:35 AM

to Redis DB

Tests went very well so far. This was my setup:

The cluster consistency test was running (this detects consistency
failures using atomic counters that are kept both in Redis Cluster
and "logically" on the client side).
At the same time, sending CLUSTER FAILOVER messages around not only
does not trigger inconsistencies, as expected, but also does not
let the client (which writes in a busy loop) detect any write errors,
since pending requests are redirected transparently.
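
The idea behind such a counter-based check can be sketched as follows. This
is a guess at the general technique, not the actual test program: FakeStore
stands in for the cluster, and all class, method, and field names
(ConsistencyChecker, drop_next, lost_writes, unacked_writes) are
hypothetical.

```python
class FakeStore:
    """In-memory stand-in for the cluster; drop_next simulates a lost write."""

    def __init__(self):
        self.counters = {}
        self.drop_next = False

    def incr(self, key: str) -> int:
        if self.drop_next:
            self.drop_next = False
            # Acknowledged to the client but never stored.
            return self.counters.get(key, 0) + 1
        self.counters[key] = self.counters.get(key, 0) + 1
        return self.counters[key]

    def get(self, key: str) -> int:
        return self.counters.get(key, 0)


class ConsistencyChecker:
    """Keeps a client-side ("logical") copy of every counter it increments."""

    def __init__(self, store):
        self.store = store
        self.expected = {}
        self.lost_writes = 0      # acknowledged increments missing in the store
        self.unacked_writes = 0   # increments in the store the client never saw

    def write(self, key: str):
        self.store.incr(key)
        self.expected[key] = self.expected.get(key, 0) + 1

    def check(self, key: str):
        actual = self.store.get(key)
        expected = self.expected.get(key, 0)
        if actual < expected:
            self.lost_writes += expected - actual
        elif actual > expected:
            self.unacked_writes += actual - expected
        self.expected[key] = actual   # resync so each discrepancy counts once


store = FakeStore()
checker = ConsistencyChecker(store)
for _ in range(3):
    checker.write("counter:1")
checker.check("counter:1")
```

In a run without inconsistencies, both lost_writes and unacked_writes stay
at zero no matter how many failovers happen in between.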

More tests in the next days. This was one of the last missing
features in the core; most of the remaining work is on the tools side
(redis-trib especially).

Salvatore
