Sentinel Key Server

Lola Bergo

Aug 5, 2024, 3:05:03 AM
Sentinel itself is designed to run in a configuration where there are multiple Sentinel processes cooperating together. Having multiple cooperating Sentinel processes has the following advantages:

* Failure detection is performed only when multiple Sentinels agree that a given master is no longer available, which lowers the probability of false positives.
* Sentinel keeps working even if some of the Sentinel processes fail, making the system robust against failures: a failover system that is itself a single point of failure would not be very useful.

The sum of Sentinels, Redis instances (masters and replicas) and clients connecting to Sentinel and Redis forms a larger distributed system with specific properties. In this document, concepts are introduced gradually, starting from the basic information needed to understand the fundamental properties of Sentinel, and moving to more complex (optional) information needed to understand exactly how Sentinel works.


The current version of Sentinel is called Sentinel 2. It is a rewrite of the initial Sentinel implementation using stronger and simpler-to-predict algorithms (which are explained in this documentation).


However, it is mandatory to use a configuration file when running Sentinel, as this file is used by the system to save the current state, which is reloaded in case of restart. Sentinel will simply refuse to start if no configuration file is given or if the configuration file path is not writable.


Sentinels by default listen for connections on TCP port 26379, so for Sentinels to work, port 26379 of your servers must be open to receive connections from the IP addresses of the other Sentinel instances. Otherwise Sentinels can't talk and can't agree about what to do, so failover will never be performed.


The Redis source distribution contains a file called sentinel.conf that is a self-documented example configuration file you can use to configure Sentinel. However, a typical minimal configuration file looks like the following:
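The example file itself did not survive in this thread; a minimal sketch consistent with the names used below (mymaster at 127.0.0.1:6379 with quorum 2, plus a second set called resque, whose address and quorum here are illustrative) would be:

```
sentinel monitor mymaster 127.0.0.1 6379 2
sentinel down-after-milliseconds mymaster 60000
sentinel failover-timeout mymaster 180000
sentinel parallel-syncs mymaster 1

sentinel monitor resque 192.168.1.3 6380 4
sentinel down-after-milliseconds resque 10000
sentinel failover-timeout resque 180000
sentinel parallel-syncs resque 5
```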


You only need to specify the masters to monitor, giving each master (which may have any number of replicas) a different name. There is no need to specify replicas, which are auto-discovered. Sentinel will update the configuration automatically with additional information about replicas (in order to retain the information in case of restart). The configuration is also rewritten every time a replica is promoted to master during a failover, and every time a new Sentinel is discovered.


The example configuration above monitors two sets of Redis instances, each composed of a master and an undefined number of replicas. One set of instances is called mymaster, and the other resque.


The first line is used to tell Redis to monitor a master called mymaster, at address 127.0.0.1 and port 6379, with a quorum of 2. Everything is fairly obvious except for the quorum argument: the quorum is the number of Sentinels that need to agree that the master is not reachable in order to really mark it as failing, and eventually start a failover procedure if possible.


Now that you know the basic information about Sentinel, you may wonder where you should place your Sentinel processes, how many Sentinel processes you need, and so forth. This section shows a few example deployments.


Note that a majority is needed in order to authorize different failovers, and later propagate the latest configuration to all the Sentinels. Also note that the ability to fail over on a single side of the above setup, without any agreement, would be very dangerous:


In the above configuration we created two masters (assuming S2 could fail over without authorization) in a perfectly symmetrical way. Clients may write indefinitely to both sides, and there is no way to know, when the partition heals, which configuration is the right one, in order to prevent a permanent split-brain condition.


In every Sentinel setup, as Redis uses asynchronous replication, there is always the risk of losing some writes, because a given acknowledged write may not reach the replica that is promoted to master. However, in the above setup there is a higher risk due to clients being partitioned away with an old master, as in the following picture:


In this case a network partition isolated the old master M1, so the replica R2 is promoted to master. However, clients like C1 that are in the same partition as the old master may continue to write data to the old master. This data will be lost forever, since when the partition heals the master will be reconfigured as a replica of the new master, discarding its data set.


This problem can be mitigated using the following Redis replication feature, which allows a master to stop accepting writes if it detects that it is no longer able to transfer its writes to the specified number of replicas.
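The feature is enabled in redis.conf on the master via two directives; the values below are illustrative and match the 1-replica, 10-second scenario discussed next:

```
min-replicas-to-write 1
min-replicas-max-lag 10
```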


With the above configuration (please see the self-commented redis.conf example in the Redis distribution for more information), a Redis instance, when acting as a master, will stop accepting writes if it can't write to at least 1 replica. Since replication is asynchronous, not being able to write actually means that the replica is either disconnected or has not been sending asynchronous acknowledgments for more than the specified max-lag number of seconds.


Using this configuration, the old Redis master M1 in the above example, will become unavailable after 10 seconds. When the partition heals, the Sentinel configuration will converge to the new one, the client C1 will be able to fetch a valid configuration and will continue with the new master.


Sometimes we have only two Redis boxes available, one for the master and one for the replica. The configuration in Example 2 is not viable in that case, so we can resort to the following, where Sentinels are placed where clients are:


In this setup, the point of view of the Sentinels is the same as that of the clients: if a master is reachable by the majority of the clients, it is fine. C1, C2, C3 here are generic clients; it does not mean that C1 identifies a single client connected to Redis. It is more likely something like an application server, a Rails app, or similar.


If the box where M1 and S1 are running fails, the failover will happen without issues. However, it is easy to see that different network partitions will result in different behaviors. For example, Sentinel will not be able to operate if the network between the clients and the Redis servers is disconnected, since the Redis master and replica will both be unavailable.


Note that if C3 gets partitioned with M1 (hardly possible with the network described above, but more likely with different layouts, or because of failures at the software layer), we have a similar issue as described in Example 2, with the difference that here we have no way to break the symmetry, since there is just one replica and one master: the master can't stop accepting queries when it is disconnected from its replica, otherwise the master would never be available during replica failures.


So this is a valid setup, but the setup in Example 2 has advantages, such as the HA system of Redis running in the same boxes as Redis itself, which may be simpler to manage, and the ability to put a bound on the amount of time a master in the minority partition can receive writes.


The setup described in Example 3 cannot be used if there are fewer than three boxes on the client side (for example three web servers). In this case we need to resort to a mixed setup like the following:


In theory this setup works if we remove the box where C2 and S4 are running and set the quorum to 2. However, it is unlikely that we would want HA on the Redis side without having high availability in our application layer.


Docker uses a technique called port mapping: programs running inside Docker containers may be exposed on a different port compared to the one the program believes it is using. This is useful in order to run multiple containers using the same ports, at the same time, on the same server.


Since Sentinels auto-detect replicas using the master's INFO output, the detected replicas will not be reachable, and Sentinel will never be able to fail over the master, since there are no good replicas from the point of view of the system. So there is currently no way to monitor with Sentinel a set of master and replica instances deployed with Docker, unless you instruct Docker to map ports 1:1.


For the first problem, in case you want to run a set of Sentinel instances using Docker with forwarded ports (or any other NAT setup where ports are remapped), you can use the following two Sentinel configuration directives in order to force Sentinel to announce a specific IP and port:
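The two directives are shown below; the IP and port are placeholders for the address actually reachable by the other Sentinels and clients:

```
sentinel announce-ip 1.2.3.4
sentinel announce-port 26379
```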


Enabling the announce-hostnames global configuration makes Sentinel use host names instead. This affects replies to clients, values written in configuration files, the REPLICAOF command issued to replicas, etc.
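As a sketch, the relevant sentinel.conf lines look like this; hostname resolution must also be enabled for Sentinel to work with host names at all:

```
sentinel resolve-hostnames yes
sentinel announce-hostnames yes
```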


In the next sections of this document, all the details about the Sentinel API, configuration and semantics will be covered incrementally. However, for people who want to play with the system ASAP, this section is a tutorial that shows how to configure and interact with 3 Sentinel instances.


Here we assume that the instances are executed on ports 5000, 5001, 5002. We also assume that you have a running Redis master at port 6379 with a replica running at port 6380. We will use the IPv4 loopback address 127.0.0.1 everywhere during the tutorial, assuming you are running the simulation on your personal computer.
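One way to set this up, as a sketch: create three copies of a minimal configuration file differing only in the port directive, then start each one with redis-sentinel. The down-after and failover-timeout values below are deliberately short, purely for local experimentation:

```
port 5000
sentinel monitor mymaster 127.0.0.1 6379 2
sentinel down-after-milliseconds mymaster 5000
sentinel failover-timeout mymaster 60000
```

Launch each instance with, for example, redis-sentinel /path/to/sentinel1.conf, using port 5001 and 5002 in the other two files.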


As we already specified, Sentinel also acts as a configuration provider for clients that want to connect to a set of masters and replicas. Because of possible failovers or reconfigurations, clients have no idea about who is the currently active master for a given set of instances, so Sentinel exports an API to ask this question:
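That API is the SENTINEL get-master-addr-by-name command. Assuming the tutorial setup above, and that no failover has happened yet, a session would look like:

```
127.0.0.1:5000> SENTINEL get-master-addr-by-name mymaster
1) "127.0.0.1"
2) "6379"
```

The reply is the IP and port of the master Sentinel currently believes is active for that named set; after a failover it would return the promoted replica's address instead.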


Sentinel provides an API to inspect its state, check the health of monitored masters and replicas, subscribe in order to receive specific notifications, and change the Sentinel configuration at run time.


By default Sentinel runs using TCP port 26379 (note that 6379 is the normal Redis port). Sentinels accept commands using the Redis protocol, so you can use redis-cli or any other unmodified Redis client in order to talk with Sentinel.
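For example, to talk to a Sentinel running on the default port (in the tutorial setup above you would use -p 5000 instead):

```
$ redis-cli -p 26379
127.0.0.1:26379> PING
PONG
127.0.0.1:26379> SENTINEL masters
```

SENTINEL masters returns the list of monitored masters and their state as seen by that Sentinel instance.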


It is possible to directly query a Sentinel to check the state of the monitored Redis instances from its point of view, to see what other Sentinels it knows, and so forth. Alternatively, using Pub/Sub, it is possible to receive push-style notifications from Sentinels every time some event happens, such as a failover, or an instance entering an error condition, and so forth.
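As a sketch of the Pub/Sub side: events are published on channels named after the event, so you can subscribe to a specific channel such as +switch-master (published when a failover completes), or use a pattern subscription to observe every event:

```
127.0.0.1:26379> PSUBSCRIBE *
```

Each message received names the event channel and carries details about the instance involved, which is an easy way to watch a failover unfold during the tutorial.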
