How to get a notification when a cluster failover happens?

Protoss Hu
Oct 10, 2016, 9:39:47 PM
to Redis DB
In Sentinel, we can set "sentinel notification-script" in sentinel.conf to be notified about master-slave failovers. But there seem to be no cluster commands for setting a notification script. Do I have to periodically analyze the output of the CLUSTER NODES command to determine whether a failover has happened in the cluster?
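
A minimal sketch of that periodic approach, assuming you already fetch CLUSTER NODES output on a schedule (for example with `redis-cli cluster nodes`) and keep the previous snapshot: it maps node IDs to roles and reports nodes that were replicas before and are masters now, which is how a failover looks from outside. The names `parse_cluster_nodes` and `detect_failovers` are hypothetical; the parser reads only the node ID and flags, so it does not care whether the address field is `ip:port` (older versions) or `ip:port@cport` (newer ones).

```python
from typing import Dict, List


def parse_cluster_nodes(text: str) -> Dict[str, str]:
    """Map each node ID in CLUSTER NODES output to 'master' or 'replica'."""
    roles: Dict[str, str] = {}
    for line in text.strip().splitlines():
        parts = line.split()
        # Every line has at least 8 fixed fields: id, addr, flags, master,
        # ping-sent, pong-recv, config-epoch, link-state (slots are optional).
        if len(parts) < 8:
            continue
        node_id, flags = parts[0], parts[2].split(",")
        if "master" in flags:
            roles[node_id] = "master"
        elif "slave" in flags or "replica" in flags:
            roles[node_id] = "replica"
    return roles


def detect_failovers(before: Dict[str, str], after: Dict[str, str]) -> List[str]:
    """Return IDs of nodes that were replicas in `before` and are masters in `after`."""
    return sorted(
        node_id
        for node_id, role in after.items()
        if role == "master" and before.get(node_id) == "replica"
    )
```

A cron job could alert whenever `detect_failovers` returns a non-empty list and then store the new snapshot for the next run.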

Tuco
Oct 12, 2016, 2:58:46 AM
to Redis DB
I think it is a very good idea to have a script that periodically checks the output of the "cluster info" command from multiple machines and verifies that everything is fine.
Let's say you have a Sentinel notification script running on your N Sentinels: if all your Sentinel machines are down, you will not get a notification.
It's better to have a simple script that runs on your application servers and periodically checks the output of the "cluster info" command.
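
A minimal sketch of such a check, assuming `redis-cli` is installed on the application servers: it runs `redis-cli -h HOST -p PORT cluster info`, parses the `key:value` lines, and reports a problem if `cluster_state` is not `ok` or any slots are in the `pfail` or `fail` state. The helper names are hypothetical, and the alerting side (mail, pager) is left out.

```python
import subprocess
from typing import Dict, List


def fetch_cluster_info(host: str, port: int) -> str:
    """Return the raw text of CLUSTER INFO as printed by redis-cli."""
    result = subprocess.run(
        ["redis-cli", "-h", host, "-p", str(port), "cluster", "info"],
        capture_output=True, text=True, timeout=10,
    )
    # If the node is unreachable, stdout will not contain a cluster_state
    # line, and cluster_problems() below reports that as a problem.
    return result.stdout


def parse_cluster_info(text: str) -> Dict[str, str]:
    """Turn the 'key:value' lines of CLUSTER INFO into a dict of strings."""
    info: Dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if ":" in line:
            key, value = line.split(":", 1)
            info[key] = value
    return info


def cluster_problems(info: Dict[str, str]) -> List[str]:
    """List problems seen in parsed CLUSTER INFO; an empty list means healthy."""
    problems = []
    if info.get("cluster_state") != "ok":
        problems.append(f"cluster_state is {info.get('cluster_state')!r}")
    for key in ("cluster_slots_pfail", "cluster_slots_fail"):
        if int(info.get(key, "0")) > 0:
            problems.append(f"{key}={info[key]}")
    return problems
```

Calling `cluster_problems(parse_cluster_info(fetch_cluster_info(host, port)))` from cron on several servers avoids depending on a single monitoring host.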

Sergey
Oct 15, 2016, 12:25:54 AM
to Redis DB
Yes, we considered that option too (checking cluster info). The only problem is that if the script is run by cron or something similar, it might be a single point of failure.
I have another (call it crazy) idea in mind, which I haven't tried, so I don't know whether it's good or bad. If it is possible to set up Sentinels just to monitor all masters without attempting any failover, the group of Sentinels might be a good monitoring network that issues alerts via script execution. Also, as far as I understand, Sentinels can publish pub/sub messages, which could be delivered to clients.
Do you think this is viable or not?
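
For the pub/sub part: Sentinel publishes its events on channels named after the event, and the `+switch-master` payload is `<master name> <oldip> <oldport> <newip> <newport>`. Below is a minimal sketch of a client that listens for these events with redis-py, assuming a Sentinel on the default port 26379 watching an ordinary master-slave group. Whether Sentinel can usefully watch Cluster masters without failing them over is the untested part of the idea above, and this sketch does not settle it. `parse_switch_master` and `watch_sentinel` are hypothetical names.

```python
from typing import NamedTuple, Optional


class SwitchMaster(NamedTuple):
    master_name: str
    old_addr: str
    new_addr: str


def parse_switch_master(payload: str) -> Optional[SwitchMaster]:
    """Parse a +switch-master payload: '<name> <oldip> <oldport> <newip> <newport>'."""
    parts = payload.split()
    if len(parts) != 5:
        return None
    name, old_ip, old_port, new_ip, new_port = parts
    return SwitchMaster(name, f"{old_ip}:{old_port}", f"{new_ip}:{new_port}")


def watch_sentinel(host: str, port: int = 26379) -> None:
    """Print Sentinel failover-related events until interrupted (needs redis-py)."""
    import redis  # imported here so the parser above works without redis-py

    conn = redis.Redis(host=host, port=port, decode_responses=True)
    pubsub = conn.pubsub()
    pubsub.subscribe("+switch-master", "+sdown", "+odown")
    for message in pubsub.listen():
        if message["type"] != "message":
            continue  # skip the subscribe confirmations
        if message["channel"] == "+switch-master":
            print("failover:", parse_switch_master(message["data"]))
        else:
            print(message["channel"], message["data"])
```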

Rahul Babbar
Oct 15, 2016, 1:38:52 AM
to redi...@googlegroups.com
Hi Sergey, 

The cluster info script is quite lightweight, so you can run it every minute on the same instances where you were planning to run Sentinel. These scripts will then notify you whenever the cluster state is not ok. Isn't this what you are planning to achieve with a set of Sentinels that only issue alerts?

Also, I have only worked with Sentinel in 2.8.9 and was not aware of the Sentinel notification script you mentioned. I think it would be easier to have a monitoring script that checks the servers according to our requirements, instead of altering the behaviour of Sentinel.
With my limited knowledge of Sentinel, I'd say a Sentinel setup is preferable to Cluster only if we want to use Redis for pub/sub, because pub/sub in Cluster is not recommended.

What we have is a few large cluster groups, a master-slave group, and a set of notification scripts:
1) A script running on each physical machine that hosts multiple cluster nodes. It checks whether all the redis-server processes are running; if one is not, it sends an email and starts it.
2) A script that runs every 10 minutes on a few (3) machines. It checks cluster info, verifies that the status is ok, and inserts a predefined key. It does the same for the master-slave instance. If it gets an error, it retries after 2 minutes, and if it fails again, it sends an email. This script runs on three separate physical machines; since the replication factor of the system is 1 and we accept that the cluster can fail if two or more machines are down, this is acceptable.
3) Since each physical machine runs many masters as well as replicas of other machines' masters, a script runs every hour and checks whether every physical machine hosts the same number of masters. If the numbers are unequal, it sends an email. (This is needed to avoid the case I described earlier, where two machines fail one after the other and the cluster fails. This may not apply to you, depending on how your VMs are created from physical machines.)
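
Scripts 2 and 3 can be sketched as follows, under stated assumptions. `check_with_retry` implements the "retry once after 2 minutes, then email" policy around a caller-supplied `check` callable (for example one that reads CLUSTER INFO and writes a predefined canary key) and an `alert` callable that does the mailing. `masters_per_host` counts non-failed masters per IP in CLUSTER NODES output, treating each IP as one physical machine, which may not hold for your VM layout. All names here are hypothetical.

```python
import time
from typing import Callable, Dict


def check_with_retry(
    check: Callable[[], bool],
    alert: Callable[[str], None],
    retry_delay: float = 120.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Script 2's policy: on failure wait `retry_delay` seconds, retry once, then alert."""
    for attempt in range(2):
        try:
            if check():
                return True
        except Exception:
            pass  # any exception (e.g. a connection error) counts as a failed check
        if attempt == 0:
            sleep(retry_delay)
    alert("cluster check failed twice in a row")
    return False


def masters_per_host(nodes_text: str) -> Dict[str, int]:
    """Script 3's input: number of non-failed masters per host in CLUSTER NODES output."""
    counts: Dict[str, int] = {}
    for line in nodes_text.strip().splitlines():
        parts = line.split()
        if len(parts) < 8:
            continue
        # Address is ip:port, ip:port@cport or ip:port@cport,hostname depending on version.
        host = parts[1].split(",")[0].split("@")[0].rsplit(":", 1)[0]
        counts.setdefault(host, 0)  # hosts holding only replicas still appear, with 0
        flags = parts[2].split(",")
        if "master" in flags and "fail" not in flags:
            counts[host] += 1
    return counts


def is_balanced(counts: Dict[str, int]) -> bool:
    """True when every host runs the same number of masters."""
    return len(set(counts.values())) <= 1
```

Running `is_balanced(masters_per_host(...))` hourly and sending an email when it returns False would mirror script 3.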

In case of a hardware failure (mostly a machine restarting), the Redis processes are started again automatically (script 1). Script 2 almost never fails, because the cluster is back to ok within a minute even after a hardware failure. When script 3 sends mails, we have to rebalance the cluster manually.

Apart from that, we have daily mailers that analyse the clusters and point out nodes where memory usage is above 70% or the number of connections exceeds some limit, and a basic dashboard built with Graphite and Grafana to analyse cluster stats over time.
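
The memory and connection checks of such a mailer can be sketched against INFO output. This assumes that "memory above 70%" means `used_memory` relative to `maxmemory` (nodes with `maxmemory` 0, i.e. no limit, are skipped) and that the connection limit is a number you choose; `max_clients=5000` below is an arbitrary placeholder, and the function names are hypothetical.

```python
from typing import Dict, List


def parse_info(text: str) -> Dict[str, str]:
    """Parse INFO output into a dict, skipping '# Section' headers and blank lines."""
    info: Dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or ":" not in line:
            continue
        key, value = line.split(":", 1)
        info[key] = value
    return info


def node_warnings(info: Dict[str, str], memory_ratio: float = 0.70,
                  max_clients: int = 5000) -> List[str]:
    """Return warnings for a node whose memory or client count is over the limits."""
    warnings = []
    used = int(info.get("used_memory", 0))
    limit = int(info.get("maxmemory", 0))
    if limit > 0 and used / limit > memory_ratio:
        warnings.append(f"memory {used / limit:.0%} of maxmemory")
    clients = int(info.get("connected_clients", 0))
    if clients > max_clients:
        warnings.append(f"{clients} connected clients")
    return warnings
```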

Since machine restarts are not that frequent (once every few months or so), it has all been very stable and manageable.
Hope it helps.

--
You received this message because you are subscribed to a topic in the Google Groups "Redis DB" group.
Visit this group at https://groups.google.com/group/redis-db.