New Clients-Sentinel interaction: ROLE and CLIENT KILL

Salvatore Sanfilippo

Jun 16, 2014, 9:51:46 AM
to Redis DB
Hi all,

while in recent months Redis Sentinel has made a lot of progress as
an HA solution for Redis, we are still left with the old protocol
between Sentinel clients and Redis+Sentinel as a system.
The old protocol was something as simple as: on reconnection, ask
Sentinel. The documentation about client interaction is old and even
mentions things that no longer exist after the Redis Sentinel rewrite,
like the IDONTKNOW error.

The current ask-on-reconnection system is fragile, because it is not
always true that master unavailability results in every client
connection being lost.
Moreover, clients may be connected to slaves and not notice a
slave->master switch, adding load to the master instance that should
instead be scaled via slaves.
The other (more serious) problem of clients remaining attached to an
old master is mitigated by the fact that when Sentinel reconfigures
the old master as a slave, the client gets errors back on writes.
However, this is not enough to call the current solution acceptable.

So some time ago it was proposed that, to make this system more
reliable, every Redis instance governed by Sentinel should make sure
to disconnect its clients after configuration changes.
Because of the ask-on-reconnection rule, this should help to refresh
the configuration. However this alone is not enough, because there is
an inherent race condition: a client may, for example, ask a Sentinel
that has not yet received the updated configuration.

All this led to the design of a new system for Clients - Sentinel
interaction. The idea was to retain most of the simplicity of the old
interaction but make it safe, and at the same time retain an
interesting property: it should stay as simple as it is today to
decouple Sentinel clients from actual client library
implementations, so that Sentinel clients can easily be written as
"wrappers" of the actual library clients. The other design
constraint was to only add things which are generally useful for
Redis, instead of modeling an ad-hoc solution for Sentinel without
providing the primitives as general tools for Redis.

The final design, which I propose in this email, is composed of a
small set of changes to Sentinel, plus the introduction of two
general purpose commands:

1) The ROLE command is added as a fast way for clients to fetch
information about the role of an instance in the context of Redis
replication.
2) A new, alternative form of the CLIENT KILL command is provided,
with changes designed both to make the command saner and to
help Sentinel perform the disconnection work.

The new Client - Sentinel interaction is now, roughly, the following:

1) On disconnection (or at the time of the first connection), a random
Sentinel is contacted to ask for the address of the current master (or
the list of slaves, if the client aims to address a slave for a
read-only workload).
2) The master (or the slave) is contacted, and its role is verified,
using the ROLE command, to match the one provided by Sentinel.
3) If the check fails, and the instance is a slave instead of a
master, or the other way around, steps 1 and 2 are performed again
after a small delay of a few hundred milliseconds.
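
The three steps above can be sketched in Python as a small retry loop. This is only an illustration under stated assumptions: `ask_sentinel` and `fetch_role` are hypothetical callables standing in for a real Sentinel query and a real ROLE call, and the `max_attempts` bound is an addition for the sketch (the procedure above does not limit the number of retries).

```python
import time
from typing import Callable, Tuple

Address = Tuple[str, int]


def connect_with_role_check(
    ask_sentinel: Callable[[], Address],
    fetch_role: Callable[[Address], str],
    expected_role: str = "master",
    retry_delay: float = 0.3,
    max_attempts: int = 10,
) -> Address:
    """Return an address whose reported role matches what Sentinel said.

    ask_sentinel and fetch_role are hypothetical helpers supplied by the
    caller; max_attempts is an assumption of this sketch.
    """
    for _ in range(max_attempts):
        address = ask_sentinel()        # step 1: ask a Sentinel for the address
        role = fetch_role(address)      # step 2: verify the role via ROLE
        if role == expected_role:
            return address
        time.sleep(retry_delay)         # step 3: short delay, then start over
    raise RuntimeError("no instance with role %r found" % expected_role)
```

Passing the two helpers in as callables keeps the loop independent of any particular client library, in line with the "wrapper" goal described earlier.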

The ROLE command is designed to be easy to parse (unlike INFO), and to
provide additional useful information to clients that are approaching
a new Redis instance.
The following is an example, using Ruby, of its output (the redis-cli
output format is not ideal to show the ROLE command output):

irb(main):003:0> r.role
=> ["master", 29, [["127.0.0.1", "6380", "29"], ["127.0.0.1", "6381", "29"]]]

The first element is the role of the instance: master or slave. The
second element is the master replication offset.
If the first element is "master", the third element is an array of
slave instances, each described by host, port, and currently
acknowledged replication offset.

A client executing the ROLE command in the context of a master has
enough information to also contact the slaves and to estimate the level
of synchronization between the master and its slaves.
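
As a rough illustration of that estimate, the Python sketch below takes a master ROLE reply shaped like the Ruby example and computes how far each slave's acknowledged offset is behind the master's. The function and type names are made up for this example, and it assumes the reply has already been decoded into Python strings and integers.

```python
from typing import Dict, List, NamedTuple, Tuple


class SlaveInfo(NamedTuple):
    host: str
    port: int
    offset: int


def parse_master_role(reply: list) -> Tuple[int, List[SlaveInfo]]:
    """Split a master ROLE reply into (master_offset, slaves)."""
    role, master_offset, slaves = reply
    if role != "master":
        raise ValueError("expected a master reply, got %r" % role)
    # Port and offset arrive as strings in the example reply, so convert them.
    parsed = [SlaveInfo(host, int(port), int(offset)) for host, port, offset in slaves]
    return int(master_offset), parsed


def replication_lag(reply: list) -> Dict[str, int]:
    """Map "host:port" to the number of bytes the slave is behind the master."""
    master_offset, slaves = parse_master_role(reply)
    return {"%s:%d" % (s.host, s.port): master_offset - s.offset for s in slaves}
```

With the reply shown above, both slaves have acknowledged offset 29, the same as the master, so the computed lag is 0 for each.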

The following is instead the output of ROLE when called in the context
of a slave:

irb(main):008:0> r.role
=> ["slave", "127.0.0.1", 6379, "connected"]

This time we have the address and port of the master as additional
elements, followed by the state of the replication link (connect,
connecting, sync, connected). Note: "connect" means "must connect", while
"connecting" means that we already have a non-blocking connect in
progress. "sync" and "connected" should be more obvious.
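
A slave reply can be unpacked in the same spirit. The following Python sketch is an assumption-laden illustration, not part of any client library: the function name and the returned dictionary layout are invented here, and only the first four elements are read so that extra trailing elements would not break it.

```python
from typing import Any, Dict

# The four replication states listed above.
SLAVE_STATES = ("connect", "connecting", "sync", "connected")


def parse_slave_role(reply: list) -> Dict[str, Any]:
    """Extract the master address and link state from a slave ROLE reply."""
    role, master_host, master_port, state = reply[:4]
    if role != "slave":
        raise ValueError("expected a slave reply, got %r" % role)
    if state not in SLAVE_STATES:
        raise ValueError("unknown replication state %r" % state)
    return {
        "master": (master_host, int(master_port)),
        "state": state,
        # Only "connected" means the slave is currently linked to its master.
        "link_up": state == "connected",
    }
```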

The new CLIENT KILL command is backward compatible because the old
three-argument form (CLIENT KILL <addr>) cannot be confused with the
new format, which uses an "option value" argument style instead. By
default the new command does not kill the client calling the command.
The clients to kill are specified by filters that are combined via
logical AND (however it rarely makes sense to use more than one filter).

Examples:

CLIENT KILL type normal -> kills all the normal clients (including
monitors), but not the client calling the command.
CLIENT KILL type normal skipme no -> kills all the normal clients
including the current client. SKIPME is "yes" by default.
CLIENT KILL type pubsub
CLIENT KILL type slave
CLIENT KILL addr 127.0.0.1:6379 (the same as the old behavior, kill-by-addr).

An interesting thing is the new kill-by-ID feature:

CLIENT KILL id 23904023

Now every connected Redis client has a unique incremental 64-bit ID
assigned, which is provided via CLIENT LIST and can be used to
kill a specific client without the risk of killing something else.
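
To make the filter semantics concrete, here is a small Python model of how the filters combine. It is not how Redis implements the command: the client records, field names, and function name are invented for illustration, and the model only mirrors the rules stated above (filters are ANDed, and the caller is skipped unless skipme is turned off).

```python
from typing import Dict, List, Optional


def select_clients_to_kill(
    clients: List[Dict],
    caller_id: int,
    client_type: Optional[str] = None,
    addr: Optional[str] = None,
    client_id: Optional[int] = None,
    skipme: bool = True,
) -> List[int]:
    """Return the ids of the clients that match every filter that was given."""
    victims = []
    for client in clients:
        # Each provided filter must match: this is the logical AND.
        if client_type is not None and client["type"] != client_type:
            continue
        if addr is not None and client["addr"] != addr:
            continue
        if client_id is not None and client["id"] != client_id:
            continue
        # By default the calling client is never selected.
        if skipme and client["id"] == caller_id:
            continue
        victims.append(client["id"])
    return victims
```

In this model, combining a type filter with an address that belongs to a client of a different type selects nothing, which shows why more than one filter is rarely useful.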

So Sentinel will make use of CLIENT KILL, and will always send "SLAVEOF"
commands inside a MULTI/EXEC block together with CLIENT KILL type normal
and CLIENT KILL type pubsub commands.

Something like:

MULTI
CLIENT KILL type normal
CLIENT KILL type pubsub
SLAVEOF ...
EXEC

Disconnected clients will try to reconnect and check the role via the
ROLE command, trying again if the check fails.
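
For readers who want to see the shape of that transaction as data, the Python sketch below assembles the command sequence in the same order as the example. It is an illustration only: the function name is hypothetical, the master host and port are supplied by the caller, and nothing here claims to match the exact commands Sentinel sends.

```python
from typing import List


def build_reconfig_transaction(master_host: str, master_port: int) -> List[List[str]]:
    """Return the commands of the MULTI/EXEC block, one argument list each."""
    return [
        ["MULTI"],
        ["CLIENT", "KILL", "type", "normal"],   # drop normal clients
        ["CLIENT", "KILL", "type", "pubsub"],   # drop Pub/Sub clients
        ["SLAVEOF", master_host, str(master_port)],
        ["EXEC"],
    ]
```

Wrapping everything in MULTI/EXEC means the disconnections and the role change are applied together, so a client that reconnects does not find the instance still in its old role.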

All this is already implemented in the unstable branch, but not yet
merged into 3.0 and 2.8, in order to give the community some time
to provide feedback about what should be improved in the above
scheme.
As long as things are only in unstable, we can change them. So if you
have feedback, it is welcome.

Thanks,
Salvatore

--
Salvatore 'antirez' Sanfilippo
open source developer - GoPivotal
http://invece.org

To "attack a straw man" is to create the illusion of having refuted a
proposition by replacing it with a superficially similar yet
unequivalent proposition (the "straw man"), and to refute it
— Wikipedia (Straw man page)

Salvatore Sanfilippo

Jun 17, 2014, 6:45:09 PM
to Redis DB
Hello,

in the unstable branch there is an update to Sentinel that handles the
role change via a MULTI/EXEC block that includes the disconnection
of the currently connected clients.
So now the missing part is support from clients. I'm undertaking an
effort to modify redis-rb in order to add Sentinel support.
I tried to design it as a wrapper around redis-rb, but apparently this
does not allow writing a correct implementation, since after the first
disconnection the client will try to reconnect on its own (which per se
is a good behavior).
So the idea is to add a way to specify a new connection with a
Sentinel URL, like sentinel://10.0.0.1:26379/10.0.0.2:26379. This
way the reconnection logic can be left untouched, but at every
reconnection the client will always contact Sentinel.
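
A URL of that shape could be split into a list of Sentinel addresses along the following lines. This Python sketch infers the format from the single example above (a `sentinel://` prefix followed by `host:port` entries separated by `/`); it is an assumption, not the syntax redis-rb ended up with, and the function name is hypothetical.

```python
from typing import List, Tuple


def parse_sentinel_url(url: str) -> List[Tuple[str, int]]:
    """Turn "sentinel://h1:p1/h2:p2" into [(h1, p1), (h2, p2)]."""
    prefix = "sentinel://"
    if not url.startswith(prefix):
        raise ValueError("not a sentinel URL: %r" % url)
    endpoints = []
    for part in url[len(prefix):].split("/"):
        if not part:
            continue  # tolerate a trailing or doubled slash
        host, sep, port = part.rpartition(":")
        if not sep or not host:
            raise ValueError("expected host:port, got %r" % part)
        endpoints.append((host, int(port)))
    if not endpoints:
        raise ValueError("no Sentinel addresses in %r" % url)
    return endpoints
```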

Once I have a working redis-rb implementation, I'll implement some tests
very similar to the consistency test in the redis-rb-cluster
examples. This way I can verify the new protocol first-hand.
If everything looks good, I'll port everything to Redis 2.8 and
release an update (that is completely backward compatible).

Salvatore
"One would never undertake such a thing if one were not driven on by
some demon whom one can neither resist nor understand."
— George Orwell

Dvir Volk

Jun 18, 2014, 4:03:33 AM
to redi...@googlegroups.com
Will this get into 2.8?



Salvatore Sanfilippo

Jun 18, 2014, 4:29:34 AM
to Redis DB
On Wed, Jun 18, 2014 at 10:03 AM, Dvir Volk <dv...@everything.me> wrote:

> Will this get into 2.8?

Yes, definitely, in a matter of weeks at most.
That's why I was looking for feedback, but in any case, before merging,
this is going to be validated by tests with the redis-rb implementation.

Salvatore


Dvir Volk

Jun 18, 2014, 4:42:55 AM
to redi...@googlegroups.com
We'll try to give it a spin soon, although we're not using 3.0 and it will probably be too late for you anyway...


Salvatore Sanfilippo

Jun 18, 2014, 5:38:26 AM
to Redis DB
Thanks, testing is always welcomed :-)

sudhanshu srivastav

Feb 4, 2017, 6:20:47 AM
to Redis DB
Hi,
I am trying to set up the following configuration:
1) 1 master Redis node
2) 1 slave Redis node
3) 1 Redis client writing to the master
4) 2 Sentinels

When the Redis client comes up, it contacts a Sentinel, gets the master node's IP and port, and starts executing commands.
If I then kill the master, I think that, as per the redis.io page, Sentinel will send CLIENT KILL type normal to the Redis client.
How do I handle this event in the Redis client and resync with the new master?

My client dumps core as soon as I kill the master.

hva...@gmail.com

Feb 4, 2017, 5:49:40 PM
to Redis DB
My reading of antirez's description seems to be different from yours.  Sentinel does not send any commands to Redis clients.  CLIENT KILL is a command Sentinel sends to a Redis server.  It causes the Redis server to disconnect the connections from the clients to that Redis server.

If your client code is getting a core dump, it's most likely your client is not properly catching and handling the exception that occurs when it tries to send a command to the Redis server and that connection no longer exists.
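
As a language-neutral illustration of catching that failure, here is a minimal Python sketch. The names are hypothetical: `connect` stands for whatever the application does to obtain a fresh connection (for example asking Sentinel for the current master), `command` is the operation to run on it, and the sketch assumes the client library reports a dropped connection by raising Python's built-in `ConnectionError`.

```python
from typing import Any, Callable


def run_with_reconnect(
    command: Callable[[Any], Any],
    connect: Callable[[], Any],
    max_retries: int = 3,
) -> Any:
    """Run command(conn), reconnecting when the connection has been dropped."""
    conn = connect()
    for attempt in range(max_retries + 1):
        try:
            return command(conn)
        except ConnectionError:
            if attempt == max_retries:
                raise  # give up and let the caller decide what to do
            # The server closed the connection: obtain a fresh one and retry.
            conn = connect()
```

The point is only that the error is handled instead of being left to crash the process; the retry limit is an arbitrary choice for the sketch.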

  -Greg

sudhanshu srivastav

Feb 5, 2017, 3:04:20 AM
to Redis DB
Thanks for the reply, Greg.

I got your point. But I have killed the Redis server, so to whom will Sentinel send the command?

I have added a check: if the Redis command reply is null, terminate gracefully. Is there any specific way to handle "It causes the Redis server to disconnect the connections from the clients to that Redis server."
on the client side? How do I do it?

hva...@gmail.com

Feb 5, 2017, 11:39:40 PM
to Redis DB
When the master Redis server becomes unavailable, the Sentinels detect this, vote among themselves, and decide which of the other Redis servers should become the master.  They send commands to that Redis server to configure it as the master, and send commands to the other Redis servers to make them replicate data from the new master.  The next time your client code asks the Sentinels for the master (and slaves), the Sentinels will answer and your client can reconnect to the new master.

You've said you use two Sentinels, which is an even number.  The recommendation is to use an odd number of Sentinels so there is always a majority when they vote.  With an even number, they can have trouble achieving a majority and making the decision on the new master Redis.
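
The arithmetic behind that recommendation can be written down in a few lines of Python. This is plain majority counting, with function names chosen for the example; it says nothing about Sentinel's configuration options.

```python
def majority(sentinels: int) -> int:
    """Smallest number of votes that is more than half of the Sentinels."""
    return sentinels // 2 + 1


def tolerated_failures(sentinels: int) -> int:
    """How many Sentinels can be unreachable while a majority still exists."""
    return sentinels - majority(sentinels)


for n in (2, 3, 4, 5):
    print(n, "sentinels: majority", majority(n), "- tolerates", tolerated_failures(n), "failure(s)")
```

With two Sentinels the majority is two, so losing either one leaves no majority. Three Sentinels tolerate one failure, and a fourth adds nothing: the majority rises to three and still only one failure is tolerated.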

There is no universal method for handling a broken TCP connection.  The method you need to use depends on the language you're using, but you haven't described this.

sudhanshu srivastav

Feb 6, 2017, 6:31:19 AM
to Redis DB
I am using C.

Currently I handle a broken TCP connection as follows:
      if redisCommand returns NULL, find the master from Sentinel again and resume writing. If you know a better way, let me know.

1) 1 master DB (A) and a corresponding Redis client writing to the master.
2) 2 slave DBs (B & C) and only one client reading from slave DB B.
3) 2 Sentinels.
4) I kill the master. I see that B is chosen as master, and the writing Redis client learns the new master and starts writing to it.
    But I see that the reading Redis client on slave DB C is not working: a GET key command issued via redisCommand returns NULL. Do we need to re-establish the TCP connection? If yes, why?

The Real Bill

Feb 10, 2017, 12:04:29 PM
to Redis DB


On Monday, February 6, 2017 at 5:31:19 AM UTC-6, sudhanshu srivastav wrote:

> I am using C.
>
> Currently I handle a broken TCP connection as follows:
>       if redisCommand returns NULL, find the master from Sentinel again and resume writing. If you know a better way, let me know.
>
> 1) 1 master DB (A) and a corresponding Redis client writing to the master.
> 2) 2 slave DBs (B & C) and only one client reading from slave DB B.
> 3) 2 Sentinels.
> 4) I kill the master. I see that B is chosen as master, and the writing Redis client learns the new master and starts writing to it.
>     But I see that the reading Redis client on slave DB C is not working: a GET key command issued via redisCommand returns NULL. Do we need to re-establish the TCP connection? If yes, why?

When a master is promoted, Sentinel instructs the remaining Redis instances in the pod to kill all "normal" client connections. See https://redis.io/topics/sentinel-clients for details and reasoning.


Cheers,
Bill