redis-sentinel master-to-slave resurrection is failing in kubernetes statefulset

Chris Hiestand

Jul 25, 2017, 5:42:55 PM
to Redis DB
I've set up two k8s (Kubernetes) StatefulSets, one for redis and the other for redis-sentinel (so they are not colocated), using this docker image based on redis 4: https://hub.docker.com/r/zestyio/redis-k8s-statefulset/. It is basically a bash wrapper around redis:4-alpine that copies the configuration file from a template if no configuration file exists yet - namely, the first time the container is started.

Here is the base config for one of the sentinel nodes (only the ordinal changes between nodes):

protected-mode no
sentinel monitor sessions sessions-redis-0.sessions-redis 6379 2
sentinel down-after-milliseconds sessions 2000
sentinel failover-timeout sessions 30000
sentinel parallel-syncs sessions 5
sentinel auth-pass sessions password
sentinel announce-ip redis-sentinel-0.redis-sentinel

Here is the base config of the redis master:

requirepass "password"
masterauth "password"
slave-announce-ip sessions-redis-0.sessions-redis

Here is the base config of the redis slave:

requirepass "password"
masterauth "password"
slaveof sessions-redis-0.sessions-redis 6379
slave-announce-ip sessions-redis-1.sessions-redis


Question 1 - why am I getting a constant stream of these in my logs every second? I think this is because I am using sentinel announce-ip with an f.q.d.n. instead of an IP address. This seems like a bug?

redis-sentinel-1 redis 1:X 25 Jul 20:05:22.584 * +sentinel-address-switch master sessions 10.4.3.36 6379 ip redis-sentinel-0.redis-sentinel port 26379 for 3dd16834421b9efffb4000cf027121f6e1b03f77
redis-sentinel-2 redis 1:X 25 Jul 20:05:22.584 * +sentinel-address-switch master sessions 10.4.3.36 6379 ip redis-sentinel-0.redis-sentinel port 26379 for 3dd16834421b9efffb4000cf027121f6e1b03f77
redis-sentinel-2 redis 1:X 25 Jul 20:05:22.587 * +sentinel-address-switch master sessions 10.4.3.36 6379 ip redis-sentinel-0.redis-sentinel port 26379 for 3dd16834421b9efffb4000cf027121f6e1b03f77
redis-sentinel-1 redis 1:X 25 Jul 20:05:22.591 * +sentinel-address-switch master sessions 10.4.3.36 6379 ip redis-sentinel-0.redis-sentinel port 26379 for 3dd16834421b9efffb4000cf027121f6e1b03f77
redis-sentinel-2 redis 1:X 25 Jul 20:05:22.610 * +sentinel-address-switch master sessions 10.4.3.36 6379 ip redis-sentinel-1.redis-sentinel port 26379 for cde260d1986681f04621f871e1fa1824cdffbb22
redis-sentinel-2 redis 1:X 25 Jul 20:05:22.950 * +sentinel-address-switch master sessions 10.4.3.36 6379 ip redis-sentinel-1.redis-sentinel port 26379 for cde260d1986681f04621f871e1fa1824cdffbb22
redis-sentinel-1 redis 1:X 25 Jul 20:05:22.843 * +sentinel-address-switch master sessions 10.4.3.36 6379 ip redis-sentinel-2.redis-sentinel port 26379 for fadf63246bad445a104b8359cc76ba85f639d1ae
redis-sentinel-2 redis 1:X 25 Jul 20:05:22.958 * +sentinel-address-switch master sessions 10.4.3.36 6379 ip redis-sentinel-1.redis-sentinel port 26379 for cde260d1986681f04621f871e1fa1824cdffbb22
redis-sentinel-1 redis 1:X 25 Jul 20:05:23.927 * +sentinel-address-switch master sessions 10.4.3.36 6379 ip redis-sentinel-2.redis-sentinel port 26379 for fadf63246bad445a104b8359cc76ba85f639d1ae
redis-sentinel-1 redis 1:X 25 Jul 20:05:23.931 * +sentinel-address-switch master sessions 10.4.3.36 6379 ip redis-sentinel-2.redis-sentinel port 26379 for fadf63246bad445a104b8359cc76ba85f639d1ae
redis-sentinel-1 redis 1:X 25 Jul 20:05:24.463 * +sentinel-address-switch master sessions 10.4.3.36 6379 ip redis-sentinel-0.redis-sentinel port 26379 for 3dd16834421b9efffb4000cf027121f6e1b03f77
redis-sentinel-2 redis 1:X 25 Jul 20:05:24.463 * +sentinel-address-switch master sessions 10.4.3.36 6379 ip redis-sentinel-0.redis-sentinel port 26379 for 3dd16834421b9efffb4000cf027121f6e1b03f77
redis-sentinel-2 redis 1:X 25 Jul 20:05:24.613 * +sentinel-address-switch master sessions 10.4.3.36 6379 ip redis-sentinel-1.redis-sentinel port 26379 for cde260d1986681f04621f871e1fa1824cdffbb22
redis-sentinel-1 redis 1:X 25 Jul 20:05:24.658 * +sentinel-address-switch master sessions 10.4.3.36 6379 ip redis-sentinel-0.redis-sentinel port 26379 for 3dd16834421b9efffb4000cf027121f6e1b03f77
 

Question 2 - Is there any way to fix the resurrection of the master with this setup? How do we tell redis-sentinel to use the master's hostname, not its IP address?

When I delete the redis master node, redis-sentinel correctly promotes the slave to master. But when the new master comes back online it stays in standalone master mode. I believe this is because in a Kubernetes StatefulSet the IP address is not static, only the hostname is <https://github.com/kubernetes/kubernetes/issues/28969>. So from the redis-sentinel perspective, the original master never comes back online. The natural solution would be for redis-sentinel to use the hostname, not the IP address. Is there a way to do this?


Wishlist:
Operations would be much simpler if redis and sentinel stored addresses in the form they were given - either hostname or IP. I assume IPs are preferred for performance, but I think we can let the ops/OS people worry about e.g. DNS caching, if that's how they choose to configure their nodes.


Many thanks for reading this!

Chris Hiestand

Oct 4, 2017, 4:32:19 PM
to Redis DB
I still don't know the answers to these, so I thought I'd try to revive this thread. Chief among the questions: can someone confirm that after the initial hostname resolution, sentinel always uses IP addresses, not hostnames? I'd prefer that hostnames, when given in the config, be used for starting or restarting inter-host connections, because the redis server/sentinel IPs will change while the hostnames are constant.

Chris Hiestand

Oct 5, 2017, 1:24:23 AM
to Redis DB
So the chief problem I'm hitting right now is that sentinel stores redis master/slave connection information as IPs instead of hostnames.

127.0.0.1:26379> sentinel slaves sessions
1)  1) "name"
    2) "10.4.3.25:6379"
    3) "ip"
    4) "10.4.3.25"
    5) "port"
    6) "6379"

I would need line 4 of the output above to read "sessions-redis-0.sessions-redis" - i.e., what has been set in "slave-announce-ip". But there is no "sentinel-announce-ip" configuration option in redis, so I think I'm out of luck here.

There may be other problems of a similar nature wherever IPs are stored instead of hostnames.
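(Newer Redis - 6.2 and later - did eventually add Sentinel hostname support for exactly this case. A minimal sentinel.conf sketch using those directives, untested against this particular StatefulSet setup:)

```
sentinel resolve-hostnames yes
sentinel announce-hostnames yes
sentinel monitor sessions sessions-redis-0.sessions-redis 6379 2
```

With announce-hostnames enabled, Sentinel reports the value of replica-announce-ip verbatim (hostname or IP) instead of the resolved address.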

Vikas Yadav

Apr 23, 2018, 8:38:53 AM
to Redis DB

Hi Chris, 

I have also come up with a similar issue. Were you able to find a solution to this? 

Chris Hiestand

Apr 23, 2018, 2:20:18 PM
to Redis DB
Not exactly. I think this type of configuration isn't supported by redis, nor by kubernetes. Though I might have missed something.

I haven't had a chance to revisit this issue, but take a look at this helm chart: https://github.com/kubernetes/charts/tree/master/stable/redis-ha

It seems to allow HA redis in kubernetes, though I'm not sure what the difference is. One thing that is different is the non-official redis image:

redis_image: quay.io/smile/redis:4.0.8r0


If you get a chance to figure out why that chart works and why our approach doesn't could you update this thread? Thanks! 

Laurens Martin

Nov 6, 2020, 1:32:45 PM
to Redis DB
Hi,

I know this is an old thread, but did anyone ever manage to get this working? I'm having the exact same problem with the master not rejoining as a slave.
It seems to me that the helm charts available online (the one linked by Chris, the bitnami chart) use a sentinel container in the same pod as the redis instance. I would like the sentinels to oversee multiple sets, so I'm looking for a setup that uses separate pods for the redis instances and the sentinels.

Kind regards,
Laurens

Steve Lipinski

Nov 10, 2020, 9:30:50 AM
to Redis DB
We create a per-pod service for each pod in the redis-server statefulset, and then use that K8s service IP as the slave announce address for each server. This provides a way to maintain a static IP assignment for each pod.
We still have the challenge of a pod (formerly master) coming up and re-joining as a slave. For that we have a shell/python entrypoint that attempts to discover the "new" master, both through another K8s service and by querying the sentinels. If that entrypoint can find an active master, it starts the redis-server process as a replica of that pod. Otherwise, it goes into a fail-loop where it sleeps and retries until some point, then gives up and just starts as master.
This works fairly well, and we use some other aspects to "hint" at the sleep/fail values used (e.g., whether there is a replicaof directive in the config file, and the local pod's ordinal index). If formerly a replica, wait longer; if the ordinal index is higher, wait longer. With this, the former master *should* try to come back up first, but not always, and if not, pod -0 will wait less than, say, pod -3.
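A minimal sketch of that ordinal-based wait hint (the delay values, function name, and config path are invented for illustration):

```shell
#!/usr/bin/env bash
# Sketch of the startup-delay "hints" described above; base/step values are made up.
startup_delay() {
    local host=$1 conf=$2
    local ordinal=${host##*-}          # StatefulSet ordinal from the pod hostname
    local delay=$(( ordinal * 5 ))     # higher ordinal index -> wait longer
    # a pod that was formerly a replica waits longer still,
    # so the former master tends to come back up first
    if grep -Eq '^(replicaof|slaveof) ' "$conf" 2>/dev/null; then
        delay=$(( delay + 30 ))
    fi
    echo "$delay"
}

startup_delay sessions-redis-2 /conf/redis.conf  # prints 10 if that file is
                                                 # absent or has no replicaof line
```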

The only "problem" we have with this approach is that it is not 100% bulletproof when everything restarts at the same time - e.g., all sentinel and server pods restart (K8s cluster failure or whatever).
In such scenarios we are very likely to come up clean, but there is always the edge case (based on pod startup times) where we can encounter a dual-master scenario.
This can be mitigated with an external STONITH-like mechanism that detects multiple masters, or even by setting the min-replicas settings in the conf.

Another consideration is some type of leader-elector sidecar to help by using a K8s resource as a timed-lock to prevent multiple pods from obtaining the master role.
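The min-replicas mitigation mentioned above looks roughly like this in redis.conf (values are examples): a master that cannot see at least one replica acking within the lag window stops accepting writes, which limits the damage a second "master" can do.

```
min-replicas-to-write 1
min-replicas-max-lag 10
```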

John Fak

Nov 10, 2020, 2:08:42 PM
to redi...@googlegroups.com
Funny, Steve - this is sort of where we got to also.
The issue we see is that even when you register the slave redis with the k8s static service IP, under the hood it communicates with/knows the real IP. So you get a clean first failover and then can no longer fail over cleanly.
So we use an internal API call and python to reconcile under certain scenarios - but that doesn't handle an entire k8s cluster crash (everything coming up together), as we couldn't find a way to determine who was the last "true master" without custom logging.

I think the true answer is to write a Redis operator for your setup. Something we plan to do unless we go EE.




Steve Lipinski

Nov 11, 2020, 9:37:40 AM
to Redis DB
The trick for us is to set slave-announce-ip and slave-announce-port to the service IP (they may be called replica-* now...). Once we set that to the per-pod K8s service, we have not had any problems with failovers.
Our bind is set to both the pod IP and localhost. Our replicaof and sentinel monitor directives only use K8s service IPs. It seems to work and keep everything fairly clean.
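Assuming per-pod Service IPs, the per-pod redis.conf additions described above would look roughly like this (all addresses are placeholders; replica-announce-* are the Redis 5+ names for the slave-announce-* directives used earlier in the thread):

```
bind 10.4.3.25 127.0.0.1          # this pod's IP plus localhost
replica-announce-ip 10.96.12.34   # this pod's per-pod K8s Service IP, not the pod IP
replica-announce-port 6379
replicaof 10.96.12.30 6379        # the master's per-pod K8s Service IP
```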

John Fak

Nov 12, 2020, 11:32:22 AM
to redi...@googlegroups.com, stevel...@gmail.com
@steve
How do you do that?
Here is our code where we sort of do that:



if [[ $(hostname) = "redis-clusterss-0" ]]; then
    echo "master"
    redis-server "/conf/redis.conf"
else
    echo "slaaaaaave"
    redis-server "/conf/redis.conf" slaveof redis-clusterss-0.redis-clustersvc 6379
fi

So basically, on the initial build, ordinal pod [0] is configured as master.
Then the other pod [1] is registered as a slave through the service of pod [0].

That works for the initial failover... but after that the old master gets a new IP, and the new master (the original slave) still tries to talk to it via the old IP.

Can you share part of your config here ...



Steve Lipinski

Nov 13, 2020, 11:30:41 AM
to Redis DB
Not sure if you're missing the part about setting slave-announce-ip/port to the K8s svc. That is crucial. We do that in our entrypoint and resolve the name to an IP via the K8s env (Redis doesn't like to store hostnames anywhere, apparently). I couldn't copy/paste the code directly without a lot of extra explanation of what's going on, but for all intents and purposes we're doing the equivalent of this, with some of your code mixed in. Disclaimer: I did this semi-freehand, so excuse any syntax or other errors.

entrypoint.sh:

...
# function to find the current master and output its svc IP
find_master() {
    #<code to connect to our master K8s service and get the slave-announce from current master's config>
    #<if ^ fails, check sentinels via K8s service to get current master>
}

# function that gets the passed host's service IP via K8s env
get_svc_ip() {
    typeset -u service_env=${1}-service-host
    service_env=${service_env//-/_}
    echo ${!service_env}
}

# Find the local pod's Service IP
my_svc_ip=$(get_svc_ip ${HOSTNAME})

# Add announce to config
grep slave-announce-ip /conf/redis.conf || {
    echo slave-announce-ip $my_svc_ip >> /conf/redis.conf
    echo slave-announce-port 6379 >> /conf/redis.conf
}

unset master
if [[ ! -f /conf/.started ]]; then
    # Only do this on the very first bringup of the system/cluster
    master=$(get_svc_ip ${HOSTNAME%-*}-0)
else
    master=$(find_master)
fi

unset slaveof_arg
[[ -n $master ]] && [[ $master != $my_svc_ip ]] && slaveof_arg="slaveof $master 6379"

touch /conf/.started
redis-server /conf/redis.conf $slaveof_arg

...
 

Hope that helps...
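To make the get_svc_ip lookup above concrete, here is the same mapping in isolation (requires bash for typeset -u and indirect expansion; the exported variable simulates what Kubernetes injects into every pod for each Service):

```shell
# Standalone demo of the hostname -> K8s service env-var lookup used above.
get_svc_ip() {
    typeset -u service_env=${1}-service-host   # uppercase the derived name
    service_env=${service_env//-/_}            # dashes become underscores
    echo "${!service_env}"                     # indirect expansion to the value
}

export SESSIONS_REDIS_0_SERVICE_HOST=10.96.12.30   # simulated K8s-injected env var
get_svc_ip sessions-redis-0                        # prints 10.96.12.30
```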

John Fak

Nov 13, 2020, 12:59:11 PM
to redi...@googlegroups.com
It does.
But I think the core logic is here:

find_master() {
    #<code to connect to our master K8s service and get the slave-announce from current master's config>
    #<if ^ fails, check sentinels via K8s service to get current master>
}


The problem is: if the entire stack goes down, sentinel can't talk to the master as both pods got re-IP'd... so you basically end up with two masters, each listening on a new IP.


