Hi all,
while Redis Sentinel has made a lot of progress over the last few
months as an HA solution for Redis, we are still left with the old
protocol between Sentinel clients and Redis+Sentinel as a system.
The old protocol was something as simple as: on reconnections, ask
Sentinel. The documentation about client interaction is old and even
mentions things that no longer exist after the Redis Sentinel rewrite,
like the IDONTKNOW error.
The current ask-on-reconnection system is fragile, because it is not
always true that master unavailability will result in all the client
connections being lost.
Moreover, clients may be connected to slaves and not notice a
slave->master switch, adding load to the master instance that should
instead be scaled via slaves.
The other (more serious) problem of clients remaining attached to an
old master is mitigated by the fact that when Sentinel reconfigures
the old master as a slave, the client gets errors back on writes.
However, this is not enough to call the current solution acceptable.
So some time ago it was proposed that, to make this system more
reliable, every Redis instance governed by Sentinel should make sure
to disconnect its clients after configuration changes.
Because of the ask-on-reconnection rule, this should help to refresh
the configuration. However this alone is not enough, because there is
an inherent race condition. A client may ask a Sentinel that has not
yet received the updated configuration, for example.
All this led to the design of a new system for client-Sentinel
interaction. The idea was to retain most of the simplicity of the old
interaction while making it safe, and at the same time to keep an
interesting property: Sentinel clients stay as decoupled from the
actual client library implementations as they are today, so that they
can easily be written as "wrappers" around the actual library clients.
The other design constraint was to only add things that are generally
useful for Redis, instead of modeling an ad-hoc solution for Sentinel
without providing the primitives as general tools for Redis.
The final design, which I propose in this email, is composed of a
small set of changes to Sentinel, plus the introduction of two
general purpose commands:
1) The ROLE command is added as a fast way for clients to fetch
information about the role of an instance in the context of Redis
replication.
2) A new, alternative form of the CLIENT KILL command is provided,
with changes that are both designed to make the command saner, and to
help Sentinel perform the disconnection work.
The new client-Sentinel interaction is now, roughly, the following:
1) On disconnection (or at the time of the first connection), a random
Sentinel is contacted to ask the address of the current master (or the
list of slaves, if the client aims to address a slave for read-only
workload).
2) The master (or the slave) is contacted, and its role is verified,
using the ROLE command, to match the one provided by Sentinel.
3) If the check fails, and the instance is a slave instead of a
master, or the other way around, steps 1 and 2 are performed again,
after a small delay of a few hundred milliseconds.
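As a rough illustration of the three steps above, here is a minimal
Ruby sketch. It is not taken from any existing client: the
connect_with_role_check method and its three lambdas (ask_sentinel,
connect, fetch_role) are hypothetical hooks that a wrapper would
implement on top of a real client library, and the retry limit is an
assumption. The simulated failover at the bottom only exists to make
the sketch runnable without a Redis server.

```ruby
# Sketch of the three-step handshake. The three lambdas are hypothetical
# hooks; a real wrapper would back them with an actual client library.
def connect_with_role_check(ask_sentinel:, connect:, fetch_role:,
                            wanted: "master", max_attempts: 10, delay: 0.3)
  max_attempts.times do
    # Step 1: ask a Sentinel for the address of the wanted instance.
    host, port = ask_sentinel.call(wanted)
    # Step 2: connect and verify the role the instance itself reports.
    conn = connect.call(host, port)
    return conn if fetch_role.call(conn).first == wanted
    # Step 3: mismatch, so wait a little and start over from step 1.
    sleep delay
  end
  raise "no #{wanted} found after #{max_attempts} attempts"
end

# Simulated failover: the first Sentinel answer is stale and points at
# an instance that has already been turned into a slave.
answers = [["127.0.0.1", 6379], ["127.0.0.1", 6380]]
roles   = { 6379 => ["slave", "127.0.0.1", 6380, "connected"],
            6380 => ["master", 29, []] }

conn = connect_with_role_check(
  ask_sentinel: ->(_role) { answers.shift },
  connect:      ->(host, port) { { host: host, port: port } },
  fetch_role:   ->(c) { roles[c[:port]] },
  delay:        0.01
)
puts conn[:port]
```

Keeping the three hooks separate from the loop is what lets the check
live in a thin wrapper instead of inside the client library itself.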
The ROLE command is designed to be easy to parse (unlike INFO), and to
provide additional useful information to clients that are approaching
a new Redis instance.
The following is an example, using Ruby, of its output (the redis-cli
output format is not ideal to show the ROLE command output):
irb(main):003:0> r.role
=> ["master", 29, [["127.0.0.1", "6380", "29"], ["127.0.0.1", "6381", "29"]]]
The first element is the role of the instance: master or slave. For a
master, the second element is the master replication offset, and the
third element is an array of slave instances, each defined as host,
port, and currently acknowledged replication offset.
A client executing the ROLE command in the context of a master has
enough information to also contact slaves and to estimate the level of
synchronization between the master and its slaves.
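One possible way to estimate that level of synchronization is to
subtract each slave's acknowledged offset from the master offset. The
sketch below does this on the reply shown above; the slave_lag helper
name is made up for the example, and the conversion with Integer()
assumes the slave offsets arrive as strings, as they do in that reply.

```ruby
# Reply copied from the master example above.
reply = ["master", 29, [["127.0.0.1", "6380", "29"], ["127.0.0.1", "6381", "29"]]]

# Returns { "host:port" => offset difference between master and slave }.
def slave_lag(master_reply)
  _role, master_offset, slaves = master_reply
  slaves.map do |host, port, acked_offset|
    ["#{host}:#{port}", master_offset - Integer(acked_offset)]
  end.to_h
end

lag = slave_lag(reply)
lag.each { |addr, diff| puts "#{addr} offset difference: #{diff}" }
```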
The following is instead the output of ROLE when called in the context
of a slave:
irb(main):008:0> r.role
=> ["slave", "127.0.0.1", 6379, "connected"]
This time we have the address and port of the master as additional
elements, followed by the state of the replication (connect,
connecting, sync, connected). Note: "connect" means "must connect",
while "connecting" means that we already have a non-blocking connect
in progress. "sync" and "connected" should be more obvious.
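A client that wants to read from slaves could use the state field to
skip slaves whose link with the master is not yet established. The
following sketch unpacks the slave reply shown above; the slave_link
helper and the keys of the hash it returns are illustrative names, and
treating only "connected" as "in sync" is an assumption of this
example, not something the command mandates.

```ruby
# The four replication states listed above.
STATES = %w[connect connecting sync connected].freeze

def slave_link(reply)
  role, master_host, master_port, state = reply
  raise ArgumentError, "not a slave reply" unless role == "slave"
  raise ArgumentError, "unknown state #{state.inspect}" unless STATES.include?(state)
  # Assumption: only a fully connected slave is considered in sync.
  { master: "#{master_host}:#{master_port}", state: state,
    in_sync: state == "connected" }
end

link = slave_link(["slave", "127.0.0.1", 6379, "connected"])
puts "#{link[:master]} #{link[:state]} in_sync=#{link[:in_sync]}"
```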
The new CLIENT KILL command is backward compatible, because the new
format never uses the old three-argument form (CLIENT KILL <addr>).
Instead, an "option value" argument style is used. By default the new
command does not kill the client calling the command. The clients to
kill are specified by filters that are combined via logical AND
(however it rarely makes sense to use more than one filter).
Examples:
CLIENT KILL type normal -> kills all the normal clients (including
monitors), but not the client calling the command.
CLIENT KILL type normal skipme no -> kills all the normal clients
including the current client. SKIPME defaults to "yes".
CLIENT KILL type pubsub
CLIENT KILL type slave
CLIENT KILL addr 127.0.0.1:6379 (the same as the old behavior, kill-by-addr).
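To make the filter semantics concrete, here is a toy Ruby model of how
the filters and SKIPME select the clients to kill. It is not Redis
code: the Client struct, the select_victims helper, and the client
table are all invented for the illustration, and monitors are not
modeled.

```ruby
# Toy model of the filter semantics, not Redis code.
Client = Struct.new(:id, :addr, :type)

def select_victims(clients, caller_id:, filters:, skipme: true)
  clients.select do |c|
    # With skipme enabled the calling client is never selected.
    next false if skipme && c.id == caller_id
    # Every given filter must match (logical AND).
    filters.all? { |field, value| c[field] == value }
  end
end

clients = [
  Client.new(1, "127.0.0.1:50001", "normal"), # the caller
  Client.new(2, "127.0.0.1:50002", "normal"),
  Client.new(3, "127.0.0.1:50003", "pubsub"),
  Client.new(4, "127.0.0.1:50004", "slave")
]

# CLIENT KILL type normal
p select_victims(clients, caller_id: 1, filters: { type: "normal" }).map(&:id)
# CLIENT KILL type normal skipme no
p select_victims(clients, caller_id: 1, filters: { type: "normal" }, skipme: false).map(&:id)
# Two filters at once: both must match the same client.
p select_victims(clients, caller_id: 1,
                 filters: { type: "normal", addr: "127.0.0.1:50003" }).map(&:id)
```

The last call selects nothing, because the client at that address is a
pubsub client: this is why combining filters is rarely useful.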
An interesting thing is the new kill-by-ID feature:
CLIENT KILL id 23904023
Every connected Redis client now has a unique, incremental 64-bit ID
assigned, which is provided via CLIENT LIST and can be used in order
to kill a specific client without the risk of killing something else.
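For example, a tool could look up the ID in the CLIENT LIST output and
then build the kill command from it. The sketch below assumes the
usual one-line-per-client, space-separated key=value layout of CLIENT
LIST; the sample text is invented and trimmed to a few fields, and
parse_client_list is a hypothetical helper name.

```ruby
# Hypothetical CLIENT LIST output, trimmed to a few fields.
sample = <<~LIST
  id=23904022 addr=127.0.0.1:50002 name= cmd=get
  id=23904023 addr=127.0.0.1:50003 name=worker cmd=subscribe
LIST

# Turns each line into a hash of field name => value (all strings).
def parse_client_list(text)
  text.each_line.map do |line|
    line.split(" ").map { |pair| pair.split("=", 2) }.to_h
  end
end

target = parse_client_list(sample).find { |c| c["name"] == "worker" }
command = "CLIENT KILL id #{target.fetch('id')}"
puts command
```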
So Sentinel will make use of CLIENT KILL, and will always send
"SLAVEOF" commands inside a MULTI/EXEC block together with CLIENT KILL
commands for the normal and pubsub client types.
Something like:
MULTI
CLIENT KILL type normal
CLIENT KILL type pubsub
SLAVEOF ...
EXEC
Disconnected clients will ask Sentinel again, reconnect, and check the
role via the ROLE command, trying again if the check fails.
All this is already implemented in the unstable branch, but not yet
merged into 3.0 and 2.8, in order to give the community some time to
provide feedback about what should be improved in the above scheme.
As long as things are only in unstable, we can change them. So if you
have feedback, it is welcome.
Thanks,
Salvatore
--
Salvatore 'antirez' Sanfilippo
open source developer - GoPivotal
http://invece.org
To "attack a straw man" is to create the illusion of having refuted a
proposition by replacing it with a superficially similar yet
unequivalent proposition (the "straw man"), and to refute it
— Wikipedia (Straw man page)