Redis/Sentinel in AWS autoscaling environment

Jonas Nilsson

May 16, 2014, 5:06:54 AM
to redi...@googlegroups.com
Hi,

We have a Redis setup in the Amazon cloud.

The setup is like this:

 - Servers A and B each run a Redis server, configured as master and slave. Each server also runs a Sentinel monitoring the master.

 - Servers C and D run our application, which uses Redis to store and cache data. C and D belong to an autoscaling group that can grow to include up to 4 servers. These servers also run a Sentinel to monitor the master.

So far so good. We normally have 4 sentinels monitoring the Redis master, and failover works like a charm even if the server running the master dies completely. The problem starts when we autoscale. If we add two instances to the auto-scaling group we have 6 sentinels monitoring the master, and when we later terminate those two instances we are back to 4 sentinels monitoring the master.
The problem is that the remaining sentinels still know about 6 sentinels (4 running, 2 in state sdown). If the Redis master server goes down in this situation we only have 3 running sentinels, and a failover can't take place because we can't reach a majority. The remaining 3 sentinels vote for a failover, but it is not enough.
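
You can see the stale entries with redis-cli against any surviving sentinel (the master name "mymaster" and the default Sentinel port 26379 below are just placeholders for our real values):

  $ redis-cli -p 26379 SENTINEL master mymaster      # num-other-sentinels never drops back to 3
  $ redis-cli -p 26379 SENTINEL sentinels mymaster   # lists every sentinel ever seen,
                                                     # the dead ones flagged s_down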

I have been trying to remove the sentinels on the auto-scaling instances before shutting them down, using
  SENTINEL REMOVE <master name>
The only difference that makes is that the remaining sentinels notice that the shut-down sentinel is disconnected. It is still present in their list, and it still seems to count toward the majority, which prevents a failover.
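
Concretely, what I run on the instance that is about to be terminated is roughly this (again, "mymaster" is a placeholder for our master name):

  $ redis-cli -p 26379 SENTINEL REMOVE mymaster   # stop monitoring the master,
                                                  # then stop the sentinel process
                                                  # and terminate the instance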

How do I make the remaining sentinels forget sentinels that have been part of the monitoring cluster?

Any help is appreciated.

BR
Jonas



 

Jonas Nilsson

May 16, 2014, 9:21:32 AM
to redi...@googlegroups.com
Forgot to mention: we are running Redis 2.8.8.

/Jonas

Johan Grönvall

May 16, 2014, 9:58:58 AM
to redi...@googlegroups.com
I have a similar problem. I would like to know how to manage Redis and Sentinel in an AWS ASG.

Dvir Volk

May 16, 2014, 3:42:13 PM
to redi...@googlegroups.com
Generally speaking, I think putting sentinels in the same autoscaling group is a bad idea. 3-4 sentinels should be enough for a cluster that could scale up to dozens of machines, and adding more redundant sentinels will just make things less stable. Also, AFAIR you're better off with an odd number of sentinels, but I might be wrong.

Just put the sentinels in their own ASG (you can use small instances for them, so the price hit is not that significant). This group shouldn't grow automatically; it's there mostly so you can replace a sentinel if one dies, etc. - which is rare.
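
A minimal sentinel.conf for such a dedicated sentinel box could look something like this (the master name, address and quorum value below are just example values, not a recommendation):

  port 26379
  sentinel monitor mymaster 10.0.0.10 6379 2
  sentinel down-after-milliseconds mymaster 30000
  sentinel failover-timeout mymaster 180000
  sentinel parallel-syncs mymaster 1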

This does not directly answer your question, I know, but it will make the whole situation less complicated.

Jonas Nilsson

May 20, 2014, 4:26:24 AM
to redi...@googlegroups.com, dv...@everything.me
Thanks for the advice. I will probably move our sentinels to a fixed group of instances; it seems like the only feasible solution at the moment.

But I would still like to know how to remove a sentinel from the cluster in a clean manner. Adding and removing instances in an auto-scaling group is a typical scenario in a cloud environment. We use it to update/upgrade our software seamlessly: we bring up instances with the updated software and, when the new ones are live, we terminate the old ones. The instances that were terminated have ceased to exist and will never exist again. How can I get the remaining sentinels to recognize this and not count them toward the majority when voting for a new master?

If this isn't an existing feature, we would like to have it. Why can't there be a "-sentinel" event on the pub/sub channel when a sentinel is gracefully shut down, or when a sentinel is told not to monitor a master anymore?
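
Today a sentinel that joins is announced to its peers with a "+sentinel" event, which you can watch over the Sentinel pub/sub interface (26379 being the usual Sentinel port):

  $ redis-cli -p 26379 PSUBSCRIBE '*'
  # prints events such as +sentinel, +sdown, +odown and +switch-master as
  # sentinels and instances appear or become unreachable; there is currently
  # no matching "-sentinel" event when a sentinel leaves on purpose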

BR
Jonas    

Dvir Volk

May 20, 2014, 8:29:18 AM
to Jonas Nilsson, redi...@googlegroups.com
On Tue, May 20, 2014 at 11:26 AM, Jonas Nilsson <jon...@gmail.com> wrote:

> If this isn't an existing feature, we would like to have it. Why can't there be a "-sentinel" event on the pub/sub channel when a sentinel is gracefully shut down, or when a sentinel is told not to monitor a master anymore?

I agree, and AFAIK there is no way to remove a sentinel safely. I don't think doing SENTINEL RESET is such a good idea here. But I'm not sure it does any real damage to have dead sentinels, if they really are dead and you have enough live ones to reach a majority.

Knowing AWS, if you have 3 sentinels you'll probably have 2-3 such "deaths" over the course of a year. Version updates will require a restart anyway. Perhaps at that frequency, resetting or restarting them is okay.

Salvatore Sanfilippo

May 20, 2014, 8:56:16 AM
to Redis DB
Hello Jonas,

replying to your email here, but actually taking arguments from the
whole thread.

In general, quorum-based systems are not exactly close friends with changing majorities. Things can be made to work, but it is surely more complex than a static configuration.
Let's start with things that appear to fix the problem superficially but are actually not solutions:

1) Auto-cleanup on shutdown, like sending a remove-me message, is unfortunately not a solution: what do you do on network partitions, for example? If the message can't be delivered, or can't be delivered to all nodes, it is not possible to wait forever to shut down a node. Making this reliable involves adding more complexity than the other solutions.

2) Auto-eject after a timeout, which is another solution that comes spontaneously to mind, is broken as well. You may end up with an unreachable quorum for the whole "timeout" period, for example, or an instance may simply be down accidentally for too long, changing the quorum to a value that violates the semantics the user expected from a given Sentinel configuration.

So there are a few solutions left. One is to just run Sentinels outside the auto-scaling group. This is fine if Sentinel is only used as a system that should work when a single master has issues.
However, in a setup that is more resistant to data loss on network partitions (where the old master gets partitioned away together with clients), we need to run a Sentinel on every node where there is a Redis instance that can be promoted to master, and to configure replication so that masters stop accepting writes after they stop receiving acks from their slaves for some time. In this setup, a master isolated in a minority together with clients stops accepting writes, which is a desirable property if you want to protect against partitions.
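
The "stop accepting writes" part is the min-slaves options in redis.conf on the master; something along these lines (the exact numbers depend on how much write loss you can tolerate):

  # redis.conf on the master (Redis 2.8)
  min-slaves-to-write 1    # refuse writes if fewer than 1 slave is connected...
  min-slaves-max-lag 10    # ...or if its last replication ack is older than 10 seconds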

So in this setup, if you auto-scale by adding nodes that include Redis+Sentinel pairs, you are required, when instances are removed, to make sure the other Sentinels will forget about the Sentinels that are not going to be killed.
Currently the only way to do this is to use SENTINEL RESET. However, there is a problem if a failure happens exactly when we send the RESET command to all the instances: no Sentinel knows the other Sentinels, so no failover can be performed.
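
In practice that means something like the following, run against every remaining sentinel once the scaled-in instances are gone (better one at a time than all at once, for the reason above):

  $ redis-cli -h <sentinel-ip> -p 26379 SENTINEL RESET mymaster
  # the sentinel drops its list of known sentinels and slaves for that master
  # and rediscovers the live ones shortly afterwards

(<sentinel-ip> and "mymaster" are placeholders; the argument to RESET is a glob-style pattern matched against master names.)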

To avoid this problem we probably need to implement a new command, similar to Redis Cluster's CLUSTER FORGET, that makes Sentinels able to unregister a single instance.

The good news is that in your setup the auto-scaling group contains the app nodes, C and D, and there is no reason for you to run additional Sentinels when those nodes are automatically scaled.

A setup that could be perfect for you is to run Sentinels on A, B, C and D, making sure that when C and D are auto-scaled no additional Sentinels are run on E, F, ...
Note: running Sentinel on an even number of nodes is not an issue in this setup, because you don't have at least three nodes with Redis+Sentinel anyway, so you are not able to protect against partitions that leave your master partitioned away with a client while, on the other side, the slave is promoted.

I hope this makes things a bit clearer. Basically, Sentinel + Redis as a distributed system has semantics that change depending on the exact configuration of both.
If you want to protect against single-node failures, it is trivial to configure; but if you want to bound data loss under arbitrary network partitions, you need a more complex setup, with an odd number of servers and a Redis+Sentinel combo on every node.

Salvatore



--
Salvatore 'antirez' Sanfilippo
open source developer - GoPivotal
http://invece.org

To "attack a straw man" is to create the illusion of having refuted a
proposition by replacing it with a superficially similar yet
unequivalent proposition (the "straw man"), and to refute it
— Wikipedia (Straw man page)

Salvatore Sanfilippo

May 20, 2014, 8:58:41 AM
to Redis DB
On Tue, May 20, 2014 at 2:56 PM, Salvatore Sanfilippo <ant...@gmail.com> wrote:
> about the Sentinels that are not going
> to be killed.

I mean forget about the Sentinels that ARE going to be killed.

Jonas Nilsson

May 20, 2014, 9:13:54 AM
to redi...@googlegroups.com
Thank you, this gave me a lot to think about :-)

/Jonas 

Dvir Volk

May 20, 2014, 4:52:09 PM
to redi...@googlegroups.com
> So there are a few solutions left. One is to just run Sentinels outside the auto-scaling group. This is fine if Sentinel is only used as a system that should work when a single master has issues.
> However, in a setup that is more resistant to data loss on network partitions (where the old master gets partitioned away together with clients), we need to run a Sentinel on every node where there is a


Just my 2 cents: in 4 years of running clusters of Redis instances on AWS, real network partitions almost never happen (I can't think of a single case where this has happened without a major disruption of the service altogether).
 
But your master dying on you out of the blue? That happens all the time, and in some AWS datacenters more than others: I just lost 3 servers (none of which ran Redis, BTW) in a single day in the AWS Brazil datacenter last week.

So in the real world, if you want my 2 cents: keep your configuration simple, optimize for availability, and prepare for your master dying on you without notice. It will happen.
 

Yiftach Shoolman

May 21, 2014, 4:09:45 PM
to redi...@googlegroups.com
I agree with Dvir that a network partition is less frequent in the cloud than a node failure event, but I wish I could say it almost never happens. We have been experiencing too many network partition events on AWS, and on other clouds as well. This is one of the most difficult problems, if not the most difficult, that a distributed database has to deal with.
Be aware that not having the right deployment architecture for network partitions practically means that you can lose your entire dataset in one moment.





--

Yiftach Shoolman
+972-54-7634621

Dvir Volk

May 21, 2014, 4:20:43 PM
to redi...@googlegroups.com
Maybe it happens more in multi-AZ deployments, but within the same AZ it's very rare (or at least it has been so far for us). We're probably running far fewer servers than you, though.

And of course it's also a matter of your use case. We simply don't use Redis in patterns where network partitions can do any harm.

Josiah Carlson

May 21, 2014, 5:05:52 PM
to redi...@googlegroups.com
One of the things that I like to do in AWS is to periodically have every host ping every other host. When bringing up new VMs, it is one of the first things I do from the new VM and from all of the other VMs we are using. It was born from an experience we had in the early summer of 2010, where replication between a Redis master and a slave would periodically fail, or the master wouldn't recognize that the slave had disconnected. Pinging between all pairs of hosts revealed a connectivity issue between the master and that one slave (we had 4 slaves at the time), so we replaced both instances.
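
Nothing fancy is needed for that; a cron'd script along these lines on each host (the HOSTS list is a placeholder for however you enumerate your peers) is enough to catch a flaky path between two otherwise healthy boxes:

  #!/bin/sh
  # ping every peer a few times and log anything unreachable
  HOSTS="10.0.0.10 10.0.0.11 10.0.0.12"
  for h in $HOSTS; do
      ping -c 5 -q "$h" > /dev/null 2>&1 || echo "$(date) unreachable: $h" >> /var/log/peer-ping.log
  done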

 - Josiah