Master SHUTDOWN and replication to slaves


tomk...@googlemail.com

Nov 9, 2015, 10:22:40 AM
to Redis DB, tom.k...@sap.com
Hello fellow Redis users,
We are wondering whether triggering a master SHUTDOWN ensures that the slaves are in sync with the master.
Is the last acknowledged value guaranteed to be replicated before the master is closed?
Unfortunately nothing is mentioned about this in the command documentation (http://redis.io/commands/shutdown).
Sergei Bobovich already asked the same question on the redis.io page in 2013, but did not get an answer.
Thanks and best regards, Tom

Salvatore Sanfilippo

Nov 9, 2015, 11:18:43 AM
to Redis DB, tom.k...@sap.com
Hello Tom,

Currently there is no such guarantee; the suggested procedure is to use:

MULTI
CLIENT PAUSE <milliseconds>
ROLE
EXEC

This gets the current replication offset and stops all the clients; then
wait for the slaves to reach the same offset (by calling ROLE on the
slaves too). When you are done you can kill the server.
However, while the server is paused you can't issue SHUTDOWN. You have
to kill the pid, or issue SHUTDOWN once the server accepts commands
again after the "pause" time expires.
What you can do is send the above transaction and SHUTDOWN immediately
after, so that it will be processed ASAP.
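
For illustration, a minimal sketch of this procedure in Python with
redis-py might look as follows; the host names, ports, slave list and
the 10 second pause window are assumptions, not part of the recipe
above. Note that CLIENT PAUSE only affects clients of the master, so
the slaves keep answering ROLE during the pause.

# Sketch of the pause + offset check + shutdown procedure (assumed hosts).
import time
import redis

master = redis.StrictRedis(host='redis-master', port=6379)
slaves = [redis.StrictRedis(host='redis-slave-1', port=6379),
          redis.StrictRedis(host='redis-slave-2', port=6379)]

# MULTI / CLIENT PAUSE / ROLE / EXEC: atomically pause normal clients
# and read the master's replication offset.
pipe = master.pipeline(transaction=True)
pipe.execute_command('CLIENT', 'PAUSE', '10000')
pipe.execute_command('ROLE')
_, role = pipe.execute()
master_offset = role[1]  # master ROLE reply: ['master', offset, [...]]

# Wait for every slave to acknowledge at least the master's offset.
# Slave ROLE reply: ['slave', master_ip, master_port, state, offset].
for slave in slaves:
    while slave.execute_command('ROLE')[4] < master_offset:
        time.sleep(0.1)

# SHUTDOWN is queued while the pause is active and processed as soon
# as it expires; the dropped connection afterwards is expected.
master.shutdown()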

Why can't SHUTDOWN ensure the slaves are properly aligned? Because that
could take time, or the slaves could be resynchronizing at that moment,
or disconnected because of a netsplit or whatever, so SHUTDOWN is not a
good way to avoid race conditions during failovers. Sentinel manual
failovers can take care of this better, and so can Redis Cluster, which
performs the CLIENT PAUSE + offset check on the slaves for you.
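
For example, a manual failover can be requested from one of the
Sentinels; a minimal sketch with redis-py, where the Sentinel address
and the master name 'mymaster' are assumptions:

# Ask a Sentinel for a manual failover, then wait until it reports a
# new master address.
import time
import redis

sentinel = redis.StrictRedis(host='redis-sentinel', port=26379)
old_addr = sentinel.execute_command('SENTINEL', 'GET-MASTER-ADDR-BY-NAME', 'mymaster')
sentinel.execute_command('SENTINEL', 'FAILOVER', 'mymaster')
while sentinel.execute_command('SENTINEL', 'GET-MASTER-ADDR-BY-NAME', 'mymaster') == old_addr:
    time.sleep(1)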

What SHUTDOWN could do that it currently does not is at least try to
write pending data to the slave sockets if there is some room in the
kernel buffers. This looks like a good idea but does not provide any
guarantee at all, so it would just be an implementation improvement;
semantically it would be the same.

Regards,
Salvatore



--
Salvatore 'antirez' Sanfilippo
open source developer - Redis Labs https://redislabs.com

"If a system is to have conceptual integrity, someone must control the
concepts."
— Fred Brooks, "The Mythical Man-Month", 1975.

Salvatore Sanfilippo

Nov 9, 2015, 11:28:46 AM
to Redis DB, tom.k...@sap.com
On Mon, Nov 9, 2015 at 5:18 PM, Salvatore Sanfilippo <ant...@gmail.com> wrote:
> What SHUTDOWN could do that it currently does not is at least try to
> write pending data to the slave sockets if there is some room in the
> kernel buffers. This looks like a good idea but does not provide any
> guarantee at all, so it would just be an implementation improvement;
> semantically it would be the same.

I implemented the above in unstable / testing (3.2).

Salvatore

tomk...@googlemail.com

Nov 18, 2015, 7:51:50 AM
to Redis DB, tom.k...@sap.com
Hi Salvatore,
thanks for your clear statement and the recommendation on how to do a clean master shutdown in a replication setup.
I figured out that we can neither use the CLIENT PAUSE command, since it was only introduced in 2.9.50, nor Redis Cluster from 3.0, as we are running Redis 2.8.
But we do have Sentinels in place anyway, which is why I focused on the manual failover.

I guess I did something wrong, because it showed really bad behavior, way worse than the theoretical data loss of the manual master shutdown + automatic failover scenario.
Of course a new master was successfully elected. Our test application is Sentinel-aware (Jedis): it gets the master via the Sentinels, writes data until the connection is lost, asks for the new master and continues with the next value in line.
Theoretically this should work without data loss, and it really does work fine when shutting down the master manually and letting the automatic failover happen; so far we have never observed any data loss that way. But with the manual failover, data is lost.
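
For reference, a rough Python equivalent of that test loop (the original
uses Jedis; the Sentinel addresses, master name and key are assumptions):

# Sentinel-aware write loop: on connection loss, rediscover the master
# via the Sentinels and retry the same value.
import redis
from redis.sentinel import Sentinel

sentinels = Sentinel([('sentinel-1', 26379),
                      ('sentinel-2', 26379),
                      ('sentinel-3', 26379)], socket_timeout=0.5)

value = 0
while True:
    try:
        # master_for() resolves the current master through the Sentinels,
        # so after a failover the next attempt targets the new master.
        master = sentinels.master_for('mymaster', socket_timeout=0.5)
        master.set('counter', value)
        value += 1  # advance only after the write was acknowledged
    except (redis.ConnectionError, redis.TimeoutError):
        pass  # connection lost: retry the same value on the new master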

Do you have a good solution in place, maybe even some scripts, for updating a three-availability-zone setup with three nodes and three Sentinels, one in each zone?
We want to make sure that we do not lose data when we update minor versions, of course.
Our approach is to shut down one node, independently of its role, update it and start it again, then continue with the next node until the last one is updated; a sketch of the per-node step follows below.
This is all based on BOSH, as we are using it for the VM provisioning.
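
To make the question concrete, the per-node step I have in mind looks
roughly like this (a hypothetical sketch: the Sentinel address, master
name and host names are assumptions, the actual update and restart
would be done by BOSH, and we'd still need to wait for the restarted
node to resynchronize before moving on):

# Hypothetical per-node step: demote the node first if it is the
# current master, then shut it down so BOSH can update and restart it.
import time
import redis

sentinel = redis.StrictRedis(host='redis-sentinel', port=26379)

def prepare_and_stop(node_host, node_port=6379):
    # Assumes node_host matches the address the Sentinels report.
    addr = sentinel.execute_command('SENTINEL', 'GET-MASTER-ADDR-BY-NAME', 'mymaster')
    if addr[0].decode() == node_host:
        # The node is the master: trigger a manual failover and wait
        # until the Sentinels report a different master address.
        sentinel.execute_command('SENTINEL', 'FAILOVER', 'mymaster')
        while sentinel.execute_command('SENTINEL', 'GET-MASTER-ADDR-BY-NAME', 'mymaster') == addr:
            time.sleep(1)
    redis.StrictRedis(host=node_host, port=node_port).shutdown()

for host in ['redis-az1', 'redis-az2', 'redis-az3']:
    prepare_and_stop(host)
    # ... BOSH updates and restarts the node here; wait for it to
    # reconnect and resynchronize before continuing with the next one.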

Best regards, Tom
