Slave has flag disconnected but master link status is up

Michi Lehr

Apr 12, 2021, 9:58:54 AM
to Redis DB
Hi,

I get the following error when I try to force a failover:

(error) NOGOODSLAVE No suitable replica to promote

I have the following setup:

1x Redis Master
2x Redis Replica
3x Redis Sentinel

Sentinel reports status "ok":
master0:name=mymaster,status=ok,address=xxx.xxx.xxx.xxx:6379,slaves=2,sentinels=3

I can also see in the replica's log that the sync was successful:
31497:S 12 Apr 2021 13:03:05.164 * Connecting to MASTER xxx.xxx.xxx.xxx:6379
31497:S 12 Apr 2021 13:03:05.164 * MASTER <-> REPLICA sync started
31497:S 12 Apr 2021 13:03:05.164 * Non blocking connect for SYNC fired the event.
31497:S 12 Apr 2021 13:03:05.165 * Master replied to PING, replication can continue...
31497:S 12 Apr 2021 13:03:05.165 * Trying a partial resynchronization (request xxxxxx:1376629081).
31497:S 12 Apr 2021 13:03:05.166 * Successful partial resynchronization with master.
31497:S 12 Apr 2021 13:03:05.166 * MASTER <-> REPLICA sync: Master accepted a Partial Resynchronization.

The state of both replicas is also online (from INFO replication on the master):
slave0:ip=xxx.xxx.xxx.xxx,port=6379,state=online,offset=1407160366,lag=0
slave1:ip=xxx.xxx.xxx.xxx,port=6379,state=online,offset=1407167545,lag=0

However, both replicas have the flags "slave,disconnected" when I call SENTINEL replicas mymaster.
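
For anyone who wants to script this check, here is a minimal Python sketch. It assumes the SENTINEL replicas reply has already been fetched as a list of flat field/value lists; the sample addresses are made up. Treating the flags disconnected, s_down, and o_down as the ones that make a replica a poor failover candidate is an assumption that fits the NOGOODSLAVE error above, not a full model of Sentinel's selection logic.

```python
from typing import Dict, List


def parse_replica(flat: List[str]) -> Dict[str, str]:
    """Turn one flat [field, value, field, value, ...] reply into a dict."""
    return dict(zip(flat[0::2], flat[1::2]))


def unhealthy_flags(replica: Dict[str, str]) -> List[str]:
    """Return the flags assumed here to rule a replica out for promotion."""
    bad = {"disconnected", "s_down", "o_down"}
    flags = replica.get("flags", "").split(",")
    return [flag for flag in flags if flag in bad]


# Sample data shaped like a SENTINEL replicas reply (hypothetical values).
sample = [
    ["name", "10.0.0.2:6379", "flags", "slave,disconnected", "master-link-status", "ok"],
    ["name", "10.0.0.3:6379", "flags", "slave", "master-link-status", "ok"],
]

report = {r["name"]: unhealthy_flags(r) for r in map(parse_replica, sample)}
print(report)
```

Note that the first sample replica shows master-link-status ok and is still flagged disconnected, which is the combination described in this post.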

I do not see any errors in the logs.

Where could the problem be?
Did I miss something?

Thank you

Michi

Michi Lehr

Apr 12, 2021, 7:03:59 PM
to Redis DB
Fixed it.

The replicas had the following config:

user default on sanitize-payload #XXXXXXXXXXXX ~* +@all

With this config, the default user on the replica was not allowed to subscribe to any Pub/Sub channels, so no connection message was received.

Changed to:

user default on sanitize-payload #XXXXXXXXXXXX ~* &* +@all

Maybe this is a migration error?
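
To spot the same problem in other config or ACL files, something like the following Python sketch could help. It only looks for channel rules written explicitly on a "user" line (&<pattern>, allchannels, resetchannels) and reads them left to right. It does not model the acl-pubsub-default setting or anything else Redis does when loading users, so treat it as a rough lint rather than a statement of what Redis will actually allow. The function name is hypothetical.

```python
from typing import List


def explicit_channel_patterns(acl_line: str) -> List[str]:
    """Collect Pub/Sub channel patterns granted explicitly on one 'user' line.

    '&<pattern>' adds a pattern, 'allchannels' is treated as '&*', and
    'resetchannels' clears whatever was granted earlier on the line.
    """
    tokens = acl_line.split()
    if len(tokens) < 2 or tokens[0] != "user":
        raise ValueError("expected a line of the form: user <name> <rules...>")
    patterns: List[str] = []
    for rule in tokens[2:]:
        if rule == "resetchannels":
            patterns = []
        elif rule == "allchannels":
            patterns = ["*"]
        elif rule.startswith("&") and len(rule) > 1:
            patterns.append(rule[1:])
    return patterns


# The two lines from this post: the original config and the changed one.
original_line = "user default on sanitize-payload #XXXXXXXXXXXX ~* +@all"
fixed_line = "user default on sanitize-payload #XXXXXXXXXXXX ~* &* +@all"

print(explicit_channel_patterns(original_line))
print(explicit_channel_patterns(fixed_line))
```

An empty result for the original line means no channel pattern was granted explicitly, which matches the behaviour I saw on the replicas.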

Itamar Haber

Apr 13, 2021, 8:11:37 AM
to Redis DB
Hi Michi,

Thanks for raising this issue and providing the workaround. It is indeed an error, though not so much one of migration as of the stupid developer who wrote that specific piece of code (disclaimer: author here). ACL Pub/Sub is new in v6.2. Due to my own incompetence, I broke the world by forgetting to handle the default user's persistence mechanism appropriately.

The good news is that the fix is ready - see https://github.com/redis/redis/pull/8723. It should be included in the upcoming version 6.2.2 (ETA next week).

If it is possible for you to check the fix and provide your thumbs up/down on it, that would be amazing.

Cheers,
Itamar

Michi Lehr

Apr 13, 2021, 9:05:14 AM
to Redis DB
Hi Itamar,

Thank you for clearing that up! The fix looks good to me. I will test it later.

Since other users may be affected, do you think we should add an issue at GitHub with the workaround and a reference to your fix?

For me, GitHub is the first place to look, but I did not find an issue describing my problem (no errors in the logs, but the flag "disconnected").

Cheers,

Michi 

Itamar Haber

Apr 13, 2021, 11:37:36 AM
to Redis DB
Dear Michi,

> thank you

My pleasure (and mea culpa).

>  do you think we should add an issue at GitHub

While we aren't usually strict about matching issues to PRs, it makes sense to do so with this if only for the sake of your workaround :) Please feel free to open (& close) an issue.

Also, note that this affects v6.2.0 and v6.2.1, so hopefully the fix (which is somewhat breaking) won't hurt too many people.
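
A quick way to check whether an instance runs one of those two versions, sketched in Python. It assumes you pass in the redis_version string reported by INFO server; the helper names are hypothetical.

```python
from typing import Tuple

# The two releases named above as affected.
AFFECTED = {(6, 2, 0), (6, 2, 1)}


def parse_version(version: str) -> Tuple[int, ...]:
    """Split a dotted version string such as '6.2.1' into a tuple of ints."""
    return tuple(int(part) for part in version.strip().split("."))


def is_affected(version: str) -> bool:
    """True if the version is one of the releases listed in AFFECTED."""
    return parse_version(version) in AFFECTED


for candidate in ("6.0.9", "6.2.0", "6.2.1", "6.2.2"):
    print(candidate, is_affected(candidate))
```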

Cheers,
Itamar
