Hello Jonas,
replying to your email here, but actually taking arguments from the
whole thread.
In general quorum-based systems are not exactly close friends of
changing majorities. Things can be made to work, but it is surely
more complex compared to a static configuration.
Let's start with things that appear to fix the problem superficially
but are actually not solutions:
1) Auto-cleanup on shutdown, like sending a remove-me message, is
unfortunately not a solution: what do you do on network partitions,
for example? If the message can't be delivered, or can't be delivered
to all nodes, it is not possible to wait forever to shut down a node.
Making this reliable involves adding more complexity compared to
other solutions.
2) Auto-eject after timeout, another solution that comes
spontaneously to mind, is broken as well. You may end up with an
unreachable quorum for the whole timeout period, for example, or an
instance may just be down accidentally for too long, changing the
quorum to a value that violates the semantics the user expected from
a given Sentinel configuration.
So there are a few solutions left. One is to just run the Sentinels
outside the auto scaling group. This is fine if Sentinel is only used
as a system that handles the case of a single master having issues.
However, in a setup that is more resistant to data loss on network
partitions (where the old master gets partitioned away together with
clients), we need to run a Sentinel on every node where there is a
Redis instance that can be promoted to master, and to configure
replication so that masters stop accepting writes after some time
without acks from their slaves. In this setup a master isolated in a
minority with clients stops accepting writes, which is a desirable
feature if you want to protect against partitions.
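For reference, a minimal sketch of the replication safety knobs I'm
referring to, in the master's redis.conf (the exact values here are
just examples):

    # Stop accepting writes if fewer than 1 slave is connected, or
    # if no connected slave has acked within the last 10 seconds.
    min-slaves-to-write 1
    min-slaves-max-lag 10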
So in this setup, if you auto-scale with nodes that include
Redis+Sentinel pairs, you are required, when scaling in, to make sure
the remaining Sentinels forget about the Sentinels that are going to
be killed.
Currently the only way to do this is to use SENTINEL RESET. However
there is a problem if a failure happens exactly when we send the
RESET command to all the instances: right after the reset no Sentinel
knows the other Sentinels, so no failover can be performed.
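Concretely, the reset today means running something like this
against every Sentinel (the master name "mymaster" and the default
port 26379 are just examples):

    # Drop the state about slaves and other Sentinels for the
    # monitored master "mymaster"; discovery then starts over.
    redis-cli -p 26379 SENTINEL RESET mymaster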
To avoid this problem we probably need to implement a new command,
similar to Redis Cluster's CLUSTER FORGET, that allows a Sentinel to
unregister a single instance.
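For comparison, this is how a single node is removed from the tables
of a Redis Cluster node; the Sentinel form below is purely
hypothetical, just a sketch of the idea, it does not exist today:

    # Redis Cluster (real): forget the node with the given node ID.
    redis-cli -p 7000 CLUSTER FORGET 07c37dfeb235213a872192d90877d0cd55635b91

    # Sentinel (hypothetical, not implemented): forget one instance
    # identified by address and port.
    redis-cli -p 26379 SENTINEL FORGET 10.0.0.5 26379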
The good news is that in your setup the auto scaling group contains
the app nodes, C and D, and there is no reason for you to run
additional Sentinels when those nodes are automatically scaled.
A setup which could be perfect for you is to run Sentinels on A, B,
C, D, making sure that when C and D are auto-scaled no additional
Sentinels are run on E, F, ...
Note: running Sentinel on an even number of nodes is not an issue in
this setup, because you don't have at least three nodes with
Redis+Sentinel, so you are anyway not able to protect against
partitions that leave your master partitioned away with a client
while on the other side the slave is promoted.
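As a sketch, each of the four Sentinels would monitor the master
with something like this in sentinel.conf (master name, address and
timings are just example values):

    # "2" is the quorum needed to flag the master as objectively down;
    # the failover itself still needs the authorization of a majority
    # of all the Sentinels (3 out of 4 in this setup).
    sentinel monitor mymaster 10.0.0.1 6379 2
    sentinel down-after-milliseconds mymaster 5000
    sentinel failover-timeout mymaster 60000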
I hope this makes things a bit clearer. Basically Sentinel + Redis
as a distributed system has semantics that change depending on the
exact configuration of both.
If you want to protect against single node failures, it is trivial
to configure, but if you want to bound data loss on arbitrary network
partitions, you need a more complex setup with an odd number of
servers and a Redis+Sentinel combo on every node.
Salvatore
--
Salvatore 'antirez' Sanfilippo
open source developer - GoPivotal
http://invece.org
To "attack a straw man" is to create the illusion of having refuted a
proposition by replacing it with a superficially similar yet
unequivalent proposition (the "straw man"), and to refute it
— Wikipedia (Straw man page)