Re: Sentinel Operations

Salvatore Sanfilippo

Sep 28, 2012, 1:05:32 PM

to redi...@googlegroups.com

On Fri, Sep 28, 2012 at 6:06 PM, Tom Coupland <tcou...@gmail.com> wrote:
> Afternoon,

Hello Tom,

> I'm trying to get my head around the Sentinel and just hoping someone could
> clear up/confirm my idea of what it does. Will have to resort to a list i'm
> afraid:
>
> 1. Sentinels should be configured to point at Masters only, they will
> discover the Slaves. Otherwise you risk something like the following, may
> not be a problem, but certainly looks weird:
> # Sentinel
> sentinel_masters:3
> master0:name=slave1,status=ok,address=.32:6379,slaves=1,sentinels=2
> master1:name=slave2,status=ok,address=.32:6379,slaves=1,sentinels=2
> master2:name=mymaster,status=ok,address=.32:6379,slaves=1,sentinels=2

I'm not sure about this, how did you obtain such an effect?

If a Sentinel is configured with the address of a slave, it should be
redirected and start monitoring the master instead.
Btw, was your output modified in some way? I see ".32", which is not a
valid address.

> 2. When a master fails (been using 'debug segfault' for this) the sentinels
> have a chat and promote one of the slaves to master. However if the master
> is rebooted it is not promoted back to master, nor is it reconfigured to a
> slave, at least that's what i'm seeing when testing this. Is it therefore a
> requirement that any failed master needs to be reconfigured to a slaveof
> before rebooting? Of course any clients must keep their distance.

That's correct: right now Redis does not try to fix instances that were
marked as having issues. What to do in this case is currently not
clear. There are several strategies. One is to do nothing, like today,
and wait for the system administrator to fix it (the administrator was
notified, in theory). Another is to actively disable the instance by
sending it a SHUTDOWN in the event it comes back for some reason, so
that clients will not be able to talk to it.

There is also another option that looks promising: what we are going
to have ASAP is a way to configure Redis to be a slave of a master
that is not a hardcoded ip:port, but is discovered via Sentinel.
Something like this:

slaveof sentinel://192.168.1.50:2679,192.168.1.51:2679 mymaster

With such a configuration, Redis asks the listed Sentinels (one after
the other, until one replies) for the address of the master, and uses
this address to configure replication.
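
A minimal sketch of that lookup, in Python, for illustration only. The
sentinel:// syntax is the proposal above, not an existing option, and
the names parse_sentinel_url, resolve_master and ask are hypothetical.
The ask callable stands in for whatever actually talks to a Sentinel.

```python
from typing import Callable, List, Optional, Tuple

Address = Tuple[str, int]


def parse_sentinel_url(url: str) -> List[Address]:
    """Split 'sentinel://host:port,host:port' into (host, port) pairs."""
    prefix = "sentinel://"
    if not url.startswith(prefix):
        raise ValueError("expected a sentinel:// URL")
    addresses = []
    for item in url[len(prefix):].split(","):
        host, _, port = item.rpartition(":")
        addresses.append((host, int(port)))
    return addresses


def resolve_master(
    sentinels: List[Address],
    master_name: str,
    ask: Callable[[Address, str], Optional[Address]],
) -> Address:
    """Ask each Sentinel in turn; return the first master address obtained."""
    for sentinel in sentinels:
        try:
            address = ask(sentinel, master_name)
        except OSError:
            continue  # this Sentinel is unreachable: try the next one
        if address is not None:
            return address
    raise LookupError(f"no Sentinel returned an address for {master_name!r}")
```

In a real client, ask would presumably send SENTINEL
get-master-addr-by-name to the given Sentinel and return the reply as
a (host, port) pair.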

Once this is available, you configure all the Redis instances like
this, and manually switch one to master only the first time (SLAVEOF
NO ONE). So basically if an instance is rebooted it will always try to
be the slave of whatever the Sentinels tell it.

You also mentioned the clients in the context of an intermittent
failure / reboot. For instance, a master does not work properly,
Sentinel promotes a slave, then the old master comes back online for
some reason, possibly after a reboot.
When this happens the clients should avoid engaging with it.

There are different solutions in this regard:

1) With the new SLAVEOF sentinel:// ... thing, the instance will
basically start as a slave.
2) If the client is Sentinel-aware, when the master is failing and it
asks Sentinel for info about the new master, it should probably update
its table of working masters / slaves, so the old one is no longer in
it. This basically means that every time there is a connection error
a client should refresh its config according to what the Sentinels
are telling it.
3) In the failover script, to be safe, it is possible to add commands
to filter the IP of the failing master at layer 3.
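
Option 2 could look roughly like the following Python sketch. It is an
illustration under assumptions, not an existing client API: the class
SentinelAwareClient and the lookup_master callable are hypothetical
names, and lookup_master is assumed to return the current master
address as reported by the Sentinels.

```python
from typing import Callable, Tuple, TypeVar

Address = Tuple[str, int]
T = TypeVar("T")


class SentinelAwareClient:
    """Caches the master address and refreshes it after connection errors."""

    def __init__(self, lookup_master: Callable[[], Address]) -> None:
        self._lookup_master = lookup_master
        self.master = lookup_master()

    def run(self, command: Callable[[Address], T], retries: int = 1) -> T:
        """Run command against the cached master, re-resolving on failure."""
        for attempt in range(retries + 1):
            try:
                return command(self.master)
            except ConnectionError:
                if attempt == retries:
                    raise
                # Drop the stale entry and ask the Sentinels again.
                self.master = self._lookup_master()
        raise ValueError("retries must be >= 0")
```

The point of the design is that the old master never stays in the
client's table after an error: the next attempt always goes to
whatever address the Sentinels report at that moment.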

> 3. When a master fails clients possible client reactions are:
> - Receive a message from the sentinels via configured scripts and rebuild
> connection pools.
> - Store the replication information on connection to master, which they
> can then use to find the new master should it pop.

Yes, my idea is that in the long term having "Sentinel aware" clients
is better than going for the configured script.
Every time a new Redis connection is created (new connection, link
reconnection after error or timeout) Sentinels should be queried.

So the client configuration should no longer be a list of Redis
instances, but a list of Sentinel addresses.

> I know sentinel is very much a work in progress, so I guess i'm really
> asking if these are design decisions or aspects that are on the list for
> completion.
>
> Keep thinking up places where Redis would be a great solution in the
> platform i'm a part of (its a great tool kit), just need to get my head
> around the high availability aspects before trying for a production use
> case.

Your email is much appreciated, because Redis Sentinel is currently a
working system but is far from providing everything. The idea is to
collectively start using it for what it can do, and to find together
the direction for further development to make it more useful and
simpler / safer to operate.

Cheers,
Salvatore

>
> Tom

--
Salvatore 'antirez' Sanfilippo
open source developer - VMware
http://invece.org

Beauty is more important in computing than anywhere else in technology
because software is so complicated. Beauty is the ultimate defence
against complexity.
— David Gelernter

Tom Coupland

Oct 1, 2012, 6:55:40 AM

to redi...@googlegroups.com

Hi Salvatore,

I think I edited that first output for some unknown Friday-related reason... Essentially the IPs were all the same.

Like the idea of configuring instances with the sentinel addresses to solve the intermittent master problem, that would make the whole problem a non-issue.

I forked the Jedis client over the weekend (https://github.com/mantree/jedis) and shoved in the idea of Sentinel-driven new connections. There's obviously a bit more to do, but I'm thinking of the following rules:

1) A new connection should check the sentinels for a quorum-approved master; 'get-master-addr-by-name' does this, right?
2) Connection validity checks should test that the instance is a master; is there an 'is-master' command?
3) On connection failure the connection pool should be cleared.

The first two are quite easy, the last (in jedis) would require a bit more code, but is probably a good performance enhancement to have.
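
For rule 2, one possibility is to read the role field that Redis reports in its INFO reply (role:master or role:slave) rather than rely on a dedicated command. A minimal Python sketch, for illustration only; parse_info and is_master are hypothetical helper names, and the INFO text is assumed to have been fetched already:

```python
def parse_info(info_text: str) -> dict:
    """Turn the text of an INFO reply into a dict of field -> value."""
    fields = {}
    for line in info_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and section headers
        key, sep, value = line.partition(":")
        if sep:
            fields[key] = value
    return fields


def is_master(info_text: str) -> bool:
    """True when the instance that produced this INFO reply reports role:master."""
    return parse_info(info_text).get("role") == "master"
```

A validity check in a connection pool could call is_master on each borrowed connection and discard the connection when it returns False.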

That would produce a simple sentinel client. What would be nice is if 'get-master-addr-by-name' had a 'wait' param, so that during failover clients would queue up waiting for the process to complete instead of just failing. This could be done client side, but seems like a nice feature for the sentinels to provide.
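
The client-side version of that wait might be sketched as a polling loop with a deadline. This is an illustration in Python under assumptions: wait_for_master is a hypothetical helper, and lookup is assumed to return None while the Sentinels have no master to report.

```python
import time
from typing import Callable, Optional, Tuple

Address = Tuple[str, int]


def wait_for_master(
    lookup: Callable[[], Optional[Address]],
    timeout: float = 5.0,
    interval: float = 0.1,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> Address:
    """Poll lookup() until it returns an address or timeout seconds pass."""
    deadline = clock() + timeout
    while True:
        address = lookup()
        if address is not None:
            return address
        if clock() >= deadline:
            raise TimeoutError("no master available before the timeout")
        sleep(interval)
```

The clock and sleep parameters exist only so the loop can be exercised without real delays; a caller would normally leave them at their defaults.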

Hopefully should be getting some of this into a working system soon, which should produce more ideas.

Cheers,

Tom
