Redis instance configuration via Sentinel

Salvatore Sanfilippo

May 21, 2013, 5:42:00 AM

to Redis DB

Hello, I just finished the first draft of the proposal. Comments welcome:

Introduction

===

Sentinel already acts as a configuration device for clients. This
proposal further extends its role as a configuration device for Redis
instances after an instance restart.

This has a number of benefits, such as better handling of reboots
and simpler configuration of instances.

Basic operations

===

A new configuration option is added

manage-role-with-sentinels mymaster 192.168.1.1 2679 192.168.1.2 2679 … …

The configuration has two elements: the master group name, and a
number of ip / port pairs that are the entry points to the Sentinel
system. As usual, the list does not need to be exhaustive, but it
should contain enough addresses that we'll likely be able to connect
to at least one Sentinel.
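
As a rough illustration of the format, the directive could be parsed
along these lines. This is only a sketch in Python, not the actual
implementation: the function name parse_manage_role_directive and the
port range check are assumptions made for the example.

```python
def parse_manage_role_directive(line):
    """Parse 'manage-role-with-sentinels <group> <ip> <port> [<ip> <port> ...]'.

    Returns (group_name, [(ip, port), ...]). Hypothetical helper, for
    illustration only.
    """
    tokens = line.split()
    if len(tokens) < 4 or tokens[0] != "manage-role-with-sentinels":
        raise ValueError(
            "expected: manage-role-with-sentinels <group> <ip> <port> ...")
    group, rest = tokens[1], tokens[2:]
    if len(rest) % 2 != 0:
        raise ValueError("every Sentinel address needs both an ip and a port")
    sentinels = []
    for ip, port in zip(rest[0::2], rest[1::2]):
        port = int(port)  # raises ValueError on a non-numeric port
        if not 0 < port < 65536:
            raise ValueError("port out of range: %d" % port)
        sentinels.append((ip, port))
    return group, sentinels
```

The result is the group name plus the list of Sentinel entry points to
try at startup.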

As long as we don't have a role, we reply "-NOROLE trying to get a
role via Sentinel".

Note that during the NOROLE state, we reply even to PING with the
-NOROLE error, so the instance is basically down for monitoring
devices (Sentinel itself & so forth).

However we reply to INFO correctly.
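
The NOROLE behavior described above can be sketched as a small
dispatch rule. This is a Python illustration under assumptions: the
names reply_while_norole and run_command are hypothetical, and only
the error string and the INFO exception come from the proposal.

```python
NOROLE_ERROR = "-NOROLE trying to get a role via Sentinel"


def reply_while_norole(command_name, run_command):
    """Return the reply for a command received while the instance has no role.

    run_command is a hypothetical callable that executes the command
    normally; it is only invoked for INFO.
    """
    if command_name.upper() == "INFO":
        return run_command()
    # Everything else, PING included, gets the error, so monitoring
    # systems see the instance as down.
    return NOROLE_ERROR
```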

How does the instance advertise itself during that time?

===

When an instance is in NOROLE state, it shows "obtaining" as its role
in the INFO replication section, and adds an additional field with the
master group name. Sentinels already monitoring this instance will
not process the INFO output while the role is "obtaining".

Example of INFO output:

role:obtaining
sentinel-group:mymaster
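
On the Sentinel side, the check could look roughly like this. It is a
Python sketch, not Sentinel's real code: parse_info and
should_process_info are hypothetical names, and the only facts taken
from the proposal are the role:obtaining and sentinel-group fields.

```python
def parse_info(text):
    """Turn 'key:value' INFO lines into a dict, skipping '# Section' headers."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition(":")
        if sep:
            fields[key] = value
    return fields


def should_process_info(text):
    """False while the instance is still obtaining its role from Sentinel."""
    return parse_info(text).get("role") != "obtaining"
```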

Startup phase

===

When a Redis instance configured to auto-configure via Sentinel
starts, it asks every Sentinel in the list, one after the other (until
a working one is found), what role it should use, using the
following command:

SENTINEL GETROLE <groupname> <port>

The Sentinel will obtain the IP address from the socket peer, while
the port is specified directly by the asking instance, so the Sentinel
knows the ip:port pair that identifies the Redis instance.
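
The startup loop could be sketched as follows, in Python. Several
things here are assumptions: fetch_role and the injected ask callable
are hypothetical, the reply is treated as opaque because its format is
not fixed yet, and handling of -TRYAGAIN is left to the caller.

```python
def fetch_role(sentinels, group, my_port, ask):
    """Ask each Sentinel in turn and return the first reply obtained.

    sentinels: list of (ip, port) pairs from the configuration.
    ask: hypothetical callable (ip, port, argv) -> reply that sends the
         command to one Sentinel and raises OSError if it is unreachable.
    Returns None if no Sentinel in the list could be reached.
    """
    argv = ["SENTINEL", "GETROLE", group, str(my_port)]
    for ip, port in sentinels:
        try:
            return ask(ip, port, argv)
        except OSError:
            continue  # this Sentinel is not working, try the next one
    return None
```

Only the port is sent, since the Sentinel reads the IP address from
the connection itself.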

We handle the following conditions:

1) If the instance is not known, it is assigned as a slave of the
current master.

2) If the instance is known and is a slave, we reply with the master
addr/port coordinates.

3) If the instance is known and is a master, we reply with a port of
zero (and any address) to tell the instance to turn into a master.

4) If we don't have a valid master, we reply -TRYAGAIN

If a failover is in progress, the elected slave is always used as the "master".

Also if the current master is in ODOWN or SDOWN state, we always reply
with -TRYAGAIN, with the exception of case "3".
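
The conditions above can be condensed into one decision function.
This is a Python sketch under assumptions: getrole_reply and its
parameters are hypothetical, addresses are modeled as (ip, port)
tuples, "0.0.0.0" is an arbitrary placeholder for "any address", and
current_is_down stands for the SDOWN/ODOWN state of whichever instance
is currently considered the master.

```python
TRYAGAIN = "-TRYAGAIN"


def getrole_reply(asker, master, current_is_down, promoted_slave=None):
    """Decide the reply to SENTINEL GETROLE for the instance `asker`."""
    # During a failover the elected slave is treated as the master.
    current = promoted_slave if promoted_slave is not None else master
    if current is None:
        return TRYAGAIN           # case 4: no valid master
    if asker == current:
        return ("0.0.0.0", 0)     # case 3: port zero means "be a master"
    if current_is_down:
        return TRYAGAIN           # master in SDOWN or ODOWN state
    return current                # cases 1 and 2: slave of the master
```

Cases 1 and 2 produce the same reply; they differ only in what the
Sentinel records about the instance.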

Race conditions handling

===

If a failover is in progress when an instance connects to a Sentinel
to get the role, it may happen that it gets assigned as a slave of the
old master, and there is not enough time for it to be detected and
switched to the new master when slaves are reconfigured.

For this reason, the Sentinel that replies to a SENTINEL GETROLE
query with the "slave" role is in charge of:

1) If the instance is known, to clear the SDOWN state and update the
ping time accordingly.

2) If the instance is not known, to add it to the list of slaves of
the current master.
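
The two side effects could be sketched like this, in Python. The
table layout is an assumption made for the example: slaves is a
hypothetical dict keyed by (ip, port), and the field names sdown and
last_ping are invented, not Sentinel's real structures.

```python
import time


def register_slave_reply(slaves, asker, now=None):
    """Record that `asker` was just told to be a slave of the current master.

    slaves maps (ip, port) -> {"sdown": bool, "last_ping": float}.
    """
    now = time.time() if now is None else now
    entry = slaves.get(asker)
    if entry is None:
        # Unknown instance: add it to the slaves of the current master.
        entry = {"sdown": False, "last_ping": now}
        slaves[asker] = entry
    else:
        # Known instance: clear SDOWN and refresh the ping time.
        entry["sdown"] = False
        entry["last_ping"] = now
    return entry
```

Either way the instance ends up in the slave table, so it is included
when slaves are reconfigured after the failover.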

--
Salvatore 'antirez' Sanfilippo
open source developer - VMware
http://invece.org

Beauty is more important in computing than anywhere else in technology
because software is so complicated. Beauty is the ultimate defence
against complexity.
— David Gelernter

dvirsky

May 21, 2013, 6:20:32 AM

to redi...@googlegroups.com

Hi Salvatore.

Thanks for sharing. All in all it sounds great and would definitely improve things for cloud clusters.

A few notes inline

On Tuesday, May 21, 2013 12:42:00 PM UTC+3, Salvatore Sanfilippo wrote:

> A new configuration option is added
> manage-role-with-sentinels mymaster 192.168.1.1 2679 192.168.1.2 2679 … …

Wouldn't it be more aesthetic, and easier to read, to split this into two directives?

managed-sentinel-group mymaster

sentinels 192.168.1.1 2679 192.168.1.2 2679 … …

Also, what happens if we live-switch the group while the instance is running? Will it start from scratch?

> Note that during the NOROLE state, we reply even to PING with the
> -NOROLE error, so the instance is basically down for monitoring
> devices (Sentinel itself & so forth).

What happens if the instance was restarted but has data? Can it be in "serve stale data" mode like slaves? Read only?

> When a Redis instance configured to auto-configure via Sentinel
> starts, it asks every instance in the list, one after the other (until
> one working is found), what is the role it should use, using the
> following command:
>
> SENTINEL GETROLE <groupname> <port>

Maybe something like "SENTINEL JOIN <group> <port>" would be more expressive for this action?

> We handle the following conditions:
>
> 1) If the instance is not known, it is assigned as a slave of the
> current master.
>
> 2) If the instance is known and is a slave, we reply with the master
> addr/port coordinates.
>
> 3) If the instance is known and is a master, we reply with a port of
> zero (and any address) to tell the instance to turn into a master.
>
> 4) If we don't have a valid master, we reply -TRYAGAIN

How would you add the first master automatically to a group then?

Also, major issue: what about dynamically creating new groups?

Ideally I'd like to launch a new group by simply firing up some instances, naming the group without telling anything to the sentinels, and let them handle master election.

Or alternatively have an API in sentinel to add empty groups.

Salvatore Sanfilippo

May 24, 2013, 4:11:44 AM

to Redis DB

Hello Dvir,

> On Tuesday, May 21, 2013 12:42:00 PM UTC+3, Salvatore Sanfilippo wrote:
>>
>>
>> A new configuration option is added
>> manage-role-with-sentinels mymaster 192.168.1.1 2679 192.168.1.2 2679 … …
>
>
> wouldn't it be more aesthetic and easy to read, to make this into two
> directives?
> managed-sentinel-group mymaster
> sentinels 192.168.1.1 2679 192.168.1.2 2679 … …

Not a big difference after all; I'll consider both options when
actually coding this.

> Also, what happens if we live-switch the group while the instance is
> running? it will start from scratch?

I'm not sure what you mean here, so I'll try to reply with more
details: the state in which the Redis instance gets its role from
Sentinel only happens at startup. Later on it will always use the
current master ip/port to reconnect, which may be either the original
one provided by Sentinel or a different one, because Sentinel
reconfigures the instance with SLAVEOF after a failover.

> What happens if the instance was restarted but has data? can we be in
> "server stale data" mode like slaves? read only?

Since it only needs to fetch the role, which is a matter of
milliseconds most of the time, I would not serve stale data at all
during that phase, and start serving only after the role is fetched.

>> SENTINEL GETROLE <groupname> <port>
>
>
> Maybe something like "SENTINEL JOIN <group> <port>" be more expressive for
> this action?

Maybe it is a good idea as it reflects that the command has a side
effect of adding the instance to the table of the Sentinel if the
instance is not known...

> How would you add the first master automatically to a group then?

It is simply whatever is configured as the master in the Sentinels'
sentinel.conf. If the Redis instance trying to join matches the
ip:port of the master, it will be configured as a master. At this
point Sentinels will start to reply to the other instances as well,
with the slave role and the ip:port of the master instance, so a cold
start is definitely possible.

Here we have the general problem that Sentinel configurations are not
stored on disk, so if all Sentinels reboot there is definitely a
problem.
I think that with the new CONFIG REWRITE it makes sense to rewrite
Sentinel configurations every time there is a master switch.
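
To make the idea concrete, here is a rough Python sketch of what
rewriting the configuration after a master switch might do. It is not
how CONFIG REWRITE works internally: rewrite_monitor_line is a
hypothetical helper that only updates the address in the existing
"sentinel monitor <name> <ip> <port> <quorum>" line of sentinel.conf.

```python
def rewrite_monitor_line(conf_text, group, new_ip, new_port):
    """Return conf_text with the monitored master address of `group` replaced."""
    out = []
    for line in conf_text.splitlines():
        tokens = line.split()
        # sentinel monitor <name> <ip> <port> <quorum>
        if len(tokens) == 6 and tokens[:3] == ["sentinel", "monitor", group]:
            tokens[3], tokens[4] = new_ip, str(new_port)
            line = " ".join(tokens)
        out.append(line)  # every other line is kept untouched
    return "\n".join(out) + "\n"
```

With the new master persisted this way, a Sentinel that restarts
would come back pointing at the promoted instance rather than the old
master.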

> Also, major issue: what about dynamically creating new groups?
> Ideally I'd like to launch a new group by simply firing up some instances,
> naming the group without telling anything to the sentinels, and let them
> handle master election.
>
> Or alternatively have an API in sentinel to add empty groups.

I would favor the latter, and again, it would rewrite the
configuration after the group is created.

Another open problem is to provide a Sentinel api (via more sentinel
subcommands) to do online operations, like we do with CONFIG SET in
Redis instances.

Cheers,
Salvatore
