Are All Redis Replicas Supposed to Become Unavailable During Master Failover?

Kevin Johnson

Apr 20, 2017, 7:26:57 PM

to Redis DB

Hello,

I’ve been evaluating Redis as a caching / in-memory DB solution for our platform and I’ve been largely impressed as it appears to be a good option for us.

However, when using Redis replication coupled with Redis sentinel I am troubled by the behavior I’m seeing during master failover events.

Our dataset is approximately 35GB in size and it takes almost 10 minutes for a Redis replica to flush its own DB and reload the new DB synced from the new master.

During this time, our Redis clients experience in essence a complete outage due to response timeouts trying to reach the Redis replicas, which are all simultaneously blocking while they flush and reload their DBs.

I have configured Redis Sentinel with “parallel-syncs” at its default of “1”, yet all replicas sync at more or less the same time. PSYNC does not seem to initiate either.

Is this a known/valid problem or is it likely something with my configuration?

Any guidance you might be able to provide would be most appreciated.

-Kevin
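One way to confirm the symptom described above is to look at the replication section of each replica's `INFO` output. The following is a minimal sketch, assuming the text of `INFO replication` has already been fetched by some client; the helper names and the sample values (host, port) are made up for illustration, while `role`, `master_link_status`, and `master_sync_in_progress` are fields Redis reports for replicas.

```python
def parse_info(text: str) -> dict:
    """Parse the key:value lines of an INFO section into a dict of strings."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and section headers such as "# Replication"
        key, _, value = line.partition(":")
        fields[key] = value
    return fields


def replica_state(info: dict) -> str:
    """Classify a replica from its parsed INFO replication fields (hypothetical labels)."""
    if info.get("role") != "slave":
        return "not-a-replica"
    if info.get("master_sync_in_progress") == "1":
        return "full-sync-in-progress"
    if info.get("master_link_status") == "up":
        return "in-sync"
    return "link-down"


# Made-up sample resembling a replica that is in the middle of a full resync.
SAMPLE_INFO = """# Replication
role:slave
master_host:10.0.0.1
master_port:6379
master_link_status:down
master_sync_in_progress:1
"""

state = replica_state(parse_info(SAMPLE_INFO))
print(state)
```

Polling this on every replica during a test failover would show whether they really all enter a full sync at the same moment.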

andyh

Apr 20, 2017, 9:15:54 PM

to redi...@googlegroups.com

I don't think you misconfigured anything. It simply takes that long to flush the old data and load an RDB file into memory for a 35GB dataset.

You could consider Redis Cluster, which lets you shard the key space across many machines so that each Redis instance is much smaller and takes less time to recover from a failover.
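For readers unfamiliar with how Redis Cluster spreads keys, here is a small sketch of the key-to-slot mapping described in the Redis Cluster specification: CRC16 (XMODEM variant) of the key, modulo 16384, with only the `{...}` hash tag hashed when a non-empty one is present. The function names are my own, and this is an illustration rather than the code Redis itself runs.

```python
def crc16_xmodem(data: bytes) -> int:
    """CRC16-CCITT (XMODEM): polynomial 0x1021, initial value 0, no reflection."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ 0x1021) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc


def key_slot(key: bytes) -> int:
    """Map a key to one of the 16384 cluster hash slots."""
    start = key.find(b"{")
    if start != -1:
        end = key.find(b"}", start + 1)
        # Only a non-empty tag between the first "{" and the next "}" is used.
        if end != -1 and end != start + 1:
            key = key[start + 1:end]
    return crc16_xmodem(key) % 16384


# Keys sharing a hash tag land in the same slot, hence on the same shard.
print(key_slot(b"{user1000}.following") == key_slot(b"{user1000}.followers"))
```

With the 35GB dataset split across several masters this way, each replica would only have to flush and reload its own share after a failover.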

--
You received this message because you are subscribed to the Google Groups "Redis DB" group.
To unsubscribe from this group and stop receiving emails from it, send an email to redis-db+unsubscribe@googlegroups.com.
To post to this group, send email to redi...@googlegroups.com.
Visit this group at https://groups.google.com/group/redis-db.
For more options, visit https://groups.google.com/d/optout.

--

andyh

Andy Huang (Huangkejun)

hva...@gmail.com

Apr 21, 2017, 2:31:12 AM

to Redis DB

How are your slaves set for the 'slave-serve-stale-data' parameter? How is your master set for the 'repl-backlog-size' parameter?

Or are you saying your slaves have their network interfaces so saturated by the sync with the master that clients can't complete a TCP connection to them?

Kevin Johnson

Apr 21, 2017, 10:43:07 AM

to Redis DB

Thanks, we are considering sharding.

Kevin Johnson

Apr 21, 2017, 10:51:00 AM

to Redis DB

I have "slave-serve-stale-data" set to yes and the "repl-backlog-size" is set to 100MB. However, during my testing I'm not writing any new data to the master.
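As a rough sanity check on the backlog size, the usual rule of thumb is that the backlog must hold at least as many bytes as the master writes while a replica is disconnected. The sketch below encodes only that arithmetic; the function name, the write rate, and the outage length are hypothetical numbers chosen for illustration, and only the 100MB figure comes from the setup described above.

```python
def backlog_covers_outage(backlog_bytes: int,
                          write_bytes_per_sec: float,
                          outage_seconds: float) -> bool:
    """True if the bytes written during the outage still fit in the backlog."""
    return write_bytes_per_sec * outage_seconds <= backlog_bytes


BACKLOG = 100 * 1024 * 1024  # repl-backlog-size of 100MB, as in the post

# No writes at all during the test: the backlog cannot be the limiting factor.
print(backlog_covers_outage(BACKLOG, 0, 600))

# Assumed 1 MiB/s of writes and a 60 s disconnect: 60 MiB fits in 100 MiB.
print(backlog_covers_outage(BACKLOG, 1024 * 1024, 60))

# Same assumed write rate over a 10 minute disconnect: 600 MiB does not fit.
print(backlog_covers_outage(BACKLOG, 1024 * 1024, 600))
```

Since the test involved no writes, a full resync here points at something other than backlog size.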

Meanwhile, I've been informed that this is indeed the default behavior during failover. The larger your data set, the longer it takes to sync, flush, and reload it, and unfortunately all replicas perform these activities at more or less the same time. Also, in Redis 3.x and earlier, PSYNC cannot continue across a master change, so a failover forces a full resync.

Apparently Redis version 4.0 offers a more robust implementation of PSYNC. I will give it a try.

And finally FWIW, apparently AWS ElastiCache has implemented enhanced failover support such that replicas never become unavailable (for reads) during master failover.

Salvatore Sanfilippo

Apr 21, 2017, 11:04:45 AM

to redi...@googlegroups.com

Hello, yep the TLDR is:

1) 3.2: After a failover, all the slaves need to resync with the
master, so they become unavailable for some time.
2) 4.0: After a failover, normally, the slaves will resynchronize
immediately. But if there is not enough backlog a full resync may be
needed.

The 4.0 replication is in general much much better. For instance
slaves are often able to resync immediately also after a SHUTDOWN and
an immediate restart, so that slave upgrades no longer mean a full
resync. And so forth.
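The difference between the two versions can be sketched as a small decision function. This is a simplified model, not Redis source code: the class and field names are hypothetical, and it assumes the 4.0 behavior in which a promoted replica remembers its old master's replication ID up to the offset at which it was promoted. A 3.2-style master is modeled by leaving that previous ID empty, and a backlog is assumed to exist.

```python
from dataclasses import dataclass


@dataclass
class MasterState:
    replid: str                # current replication ID
    replid2: str               # previous replication ID, "" if none is remembered
    second_replid_offset: int  # replid2 is valid only up to this offset
    backlog_first_offset: int  # offset of the oldest byte still in the backlog
    backlog_histlen: int       # number of bytes currently held in the backlog


def can_partial_resync(m: MasterState, replica_replid: str, psync_offset: int) -> bool:
    """Return True if the master could continue the stream, False for a full resync."""
    same_history = replica_replid == m.replid or (
        m.replid2 != ""
        and replica_replid == m.replid2
        and psync_offset <= m.second_replid_offset
    )
    if not same_history:
        return False
    # The requested offset must still be inside the backlog window.
    window_end = m.backlog_first_offset + m.backlog_histlen
    return m.backlog_first_offset <= psync_offset <= window_end


# A replica promoted at offset 1000 whose backlog covers offsets 500..1100.
promoted = MasterState("new-id", "old-id", 1000, 500, 600)
print(can_partial_resync(promoted, "old-id", 1000))  # sibling replica, enough backlog
print(can_partial_resync(promoted, "old-id", 400))   # sibling replica, too far behind

# Same situation, but with no memory of the old ID (the 3.2-style case).
legacy = MasterState("new-id", "", -1, 500, 600)
print(can_partial_resync(legacy, "old-id", 1000))
```

In this model the first call succeeds, the second fails for lack of backlog, and the third fails because the new master does not recognize the old replication history.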

A new 4.0 RC is going to be released in the next few days.

Cheers,
Salvatore

--

Salvatore 'antirez' Sanfilippo
open source developer - Redis Labs https://redislabs.com

"If a system is to have conceptual integrity, someone must control the
concepts."
— Fred Brooks, "The Mythical Man-Month", 1975.
