Frequent switching of Redis master in a Redis-Sentinel Environment


Mehedi Hasan

Jul 20, 2019, 11:40:58 AM
to Redis DB
Hi Good People,
We have been using a Redis Sentinel cluster for almost 2 years. We have 14 clients (a Sentinel runs on each client), and the Sentinel configuration is as follows:


bind 0.0.0.0
port 26379
sentinel myid ***************************************
sentinel monitor redis-cluster IP Port 11
sentinel down-after-milliseconds redis-cluster 10000
sentinel failover-timeout redis-cluster 10000
sentinel auth-pass redis-cluster *******************************************
logfile "/path/to/redis-sentinel.log"

# Generated by CONFIG REWRITE
daemonize yes
pidfile "/var/run/redis/sentinel.pid"
dir "/"
sentinel config-epoch redis-cluster 635
sentinel leader-epoch redis-cluster 635


Redis has AOF enabled (appendfsync everysec). It is used as a key-value database, and we also maintain a queue through it. For the last month we have been facing a master switch almost every day. Initially we saw the error below:

12646:M 04 Jul 01:21:24.082 * Asynchronous AOF fsync is taking too long (disk is busy?). Writing the AOF buffer without waiting for fsync to complete, this may slow down Redis.
12646:M 04 Jul 01:21:31.100 * Asynchronous AOF fsync is taking too long (disk is busy?). Writing the AOF buffer without waiting for fsync to complete, this may slow down Redis.
12646:M 04 Jul 01:21:40.046 * Asynchronous AOF fsync is taking too long (disk is busy?). Writing the AOF buffer without waiting for fsync to complete, this may slow down Redis.
12646:M 04 Jul 01:21:48.095 * Asynchronous AOF fsync is taking too long (disk is busy?). Writing the AOF buffer without waiting for fsync to complete, this may slow down Redis.
12646:M 04 Jul 01:21:57.039 * Asynchronous AOF fsync is taking too long (disk is busy?). Writing the AOF buffer without waiting for fsync to complete, this may slow down Redis.
12646:M 04 Jul 01:22:07.094 * Asynchronous AOF fsync is taking too long (disk is busy?). Writing the AOF buffer without waiting for fsync to complete, this may slow down Redis.
12646:M 04 Jul 01:22:52.005 * Asynchronous AOF fsync is taking too long (disk is busy?). Writing the AOF buffer without waiting for fsync to complete, this may slow down Redis.
12646:M 04 Jul 01:24:42.066 * Asynchronous AOF fsync is taking too long (disk is busy?). Writing the AOF buffer without waiting for fsync to complete, this may slow down Redis.
12646:M 04 Jul 01:24:57.047 * Asynchronous AOF fsync is taking too long (disk is busy?). Writing the AOF buffer without waiting for fsync to complete, this may slow down Redis.
12646:M 04 Jul 01:25:39.076 * Asynchronous AOF fsync is taking too long (disk is busy?). Writing the AOF buffer without waiting for fsync to complete, this may slow down Redis.
12646:M 04 Jul 01:25:57.022 * Asynchronous AOF fsync is taking too long (disk is busy?). Writing the AOF buffer without waiting for fsync to complete, this may slow down Redis.
 
Our Redis servers are VMs in our private data center and were sharing disk with other servers. After observing those errors, we migrated both master and slave to a separate LUN with no other VM mounted on it, but we still saw the same errors. We checked network, memory, disk, and everything else. The disk is HP 3PAR with SSDs; a disk performance graph is attached.
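
For reference, disk latency and utilization of the kind shown in the attachment can be sampled with standard tooling (a sketch; column names vary by sysstat version):

iostat -x 1   # extended per-device stats every second; watch await/w_await and %util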

Current configuration of each server is:
CPU: 12 vCPU
Memory: 72GB
Disk: 300GB (SSD)
Redis Max_memory: 20GB
Redis over-commit: False
THP: Enabled
vNIC capacity: 10G

We have not enabled over-commit, as the virtual machine has sufficient memory. The latest logs before the last switch look like this:

Master Log:
6308:C 19 Jul 22:22:00.040 * RDB: 1153 MB of memory used by copy-on-write
28654:M 19 Jul 22:22:00.857 * Background saving terminated with success
28654:M 19 Jul 22:23:01.055 * 10000 changes in 60 seconds. Saving...
28654:M 19 Jul 22:23:01.740 * Background saving started by pid 6760
28654:M 19 Jul 22:23:38.081 * Asynchronous AOF fsync is taking too long (disk is busy?). Writing the AOF buffer without waiting for fsync to complete, this may slow down Redis.
6760:C 19 Jul 22:25:59.717 * DB saved on disk
6760:C 19 Jul 22:26:00.205 * RDB: 1159 MB of memory used by copy-on-write
28654:M 19 Jul 22:26:01.017 * Background saving terminated with success
28654:M 19 Jul 22:27:02.016 * 10000 changes in 60 seconds. Saving...
28654:M 19 Jul 22:27:02.679 * Background saving started by pid 7270
28654:S 19 Jul 22:27:12.875 # Connection with slave IP:6379 lost.
28654:S 19 Jul 22:27:12.998 * Before turning into a slave, using my master parameters to synthesize a cached master: I may be able to synchronize with the new master with just a partial transfer.
28654:S 19 Jul 22:27:12.998 * SLAVE OF IP:6379 enabled (user request from 'id=75176616 addr=IP:23816 fd=626 name=sentinel-5c7721e9-cmd age=10 idle=0 flags=x db=0 sub=0 psub=0 multi=3 qbuf=0 qbuf-free=32768 obl=36 oll=0 omem=0 events=r cmd=exec')
28654:S 19 Jul 22:27:13.000 # CONFIG REWRITE executed with success.
28654:S 19 Jul 22:27:13.482 * Connecting to MASTER IP:6379
28654:S 19 Jul 22:27:13.482 * MASTER <-> SLAVE sync started
28654:S 19 Jul 22:27:13.482 * Non blocking connect for SYNC fired the event.
28654:S 19 Jul 22:27:13.482 * Master replied to PING, replication can continue...
28654:S 19 Jul 22:27:13.483 * Trying a partial resynchronization (request ****************).
28654:S 19 Jul 22:27:38.429 * Full resync from master: ********************
28654:S 19 Jul 22:27:38.429 * Discarding previously cached master state.
7270:C 19 Jul 22:29:51.022 * DB saved on disk
7270:C 19 Jul 22:29:51.477 * RDB: 3642 MB of memory used by copy-on-write
28654:S 19 Jul 22:29:52.462 * Background saving terminated with success

Slave Log:
17194:C 19 Jul 22:23:49.183 * RDB: 1339 MB of memory used by copy-on-write
11795:S 19 Jul 22:23:49.942 * Background saving terminated with success
11795:S 19 Jul 22:24:50.093 * 10000 changes in 60 seconds. Saving...
11795:S 19 Jul 22:24:50.651 * Background saving started by pid 17634
11795:M 19 Jul 22:26:16.869 # Setting secondary replication ID to ce5d1d734a007a2b96183d48ede899c174e83f1b, valid up to offset: 162712239579. New replication ID is 15793e555c6080779d18e1c8f876bf0c1b2d5bfd
11795:M 19 Jul 22:26:16.869 # Connection with master lost.
11795:M 19 Jul 22:26:16.869 * Caching the disconnected master state.
11795:M 19 Jul 22:26:16.869 * Discarding previously cached master state.
11795:M 19 Jul 22:26:16.869 * MASTER MODE enabled (user request from 'id=1432442 addr=IP:38994 fd=28 name=sentinel-e1402af9-cmd age=69213 idle=0 flags=x db=0 sub=0 psub=0 multi=3 qbuf=0 qbuf-free=32768 obl=36 oll=0 omem=0 events=r cmd=exec')
11795:M 19 Jul 22:26:16.870 # CONFIG REWRITE executed with success.
11795:M 19 Jul 22:26:16.923 * Replication backlog freed after 3600 seconds without connected slaves.
11795:M 19 Jul 22:27:13.483 * Slave IP:6379 asks for synchronization
11795:M 19 Jul 22:27:13.483 * Partial resynchronization not accepted: Replication ID mismatch (Slave asked for 'ce5d1d734a007a2b96183d48ede899c174e83f1b', my replication IDs are '68b45810acaf15a79c8d4bd6ee98d2759f1bb74b' and '0000000000000000000000000000000000000000')
11795:M 19 Jul 22:27:13.483 * Can't attach the slave to the current BGSAVE. Waiting for next BGSAVE for SYNC
17634:C 19 Jul 22:27:36.683 * DB saved on disk
17634:C 19 Jul 22:27:37.116 * RDB: 1361 MB of memory used by copy-on-write
11795:M 19 Jul 22:27:37.840 * Background saving terminated with success
11795:M 19 Jul 22:27:37.840 * Starting BGSAVE for SYNC with target: disk
11795:M 19 Jul 22:27:38.428 * Background saving started by pid 18056
11795:M 19 Jul 22:28:11.012 * Asynchronous AOF fsync is taking too long (disk is busy?). Writing the AOF buffer without waiting for fsync to complete, this may slow down Redis.
11795:M 19 Jul 22:29:21.001 * Asynchronous AOF fsync is taking too long (disk is busy?). Writing the AOF buffer without waiting for fsync to complete, this may slow down Redis.
18056:C 19 Jul 22:30:27.528 * DB saved on disk
18056:C 19 Jul 22:30:27.958 * RDB: 1350 MB of memory used by copy-on-write
11795:M 19 Jul 22:30:28.729 * Background saving terminated with success
11795:M 19 Jul 22:31:29.034 * 10000 changes in 60 seconds. Saving...
11795:M 19 Jul 22:31:29.619 * Background saving started by pid 18583
18583:C 19 Jul 22:34:16.692 * DB saved on disk
18583:C 19 Jul 22:34:17.160 * RDB: 1283 MB of memory used by copy-on-write
11795:M 19 Jul 22:34:17.883 * Background saving terminated with success
11795:M 19 Jul 22:35:18.074 * 10000 changes in 60 seconds. Saving...
11795:M 19 Jul 22:35:18.657 * Background saving started by pid 19043
19043:C 19 Jul 22:38:04.105 * DB saved on disk
19043:C 19 Jul 22:38:04.546 * RDB: 1305 MB of memory used by copy-on-write
11795:M 19 Jul 22:38:05.276 * Background saving terminated with success
11795:M 19 Jul 22:39:06.056 * 10000 changes in 60 seconds. Saving...
11795:M 19 Jul 22:39:06.642 * Background saving started by pid 19501
19501:C 19 Jul 22:41:52.293 * DB saved on disk
19501:C 19 Jul 22:41:52.735 * RDB: 2424 MB of memory used by copy-on-write
11795:M 19 Jul 22:41:53.639 * Background saving terminated with success
11795:M 19 Jul 22:42:54.074 * 10000 changes in 60 seconds. Saving...
11795:M 19 Jul 22:42:54.656 * Background saving started by pid 19983
19983:C 19 Jul 22:45:37.980 * DB saved on disk


The Redis INFO output is also attached. CPU and memory utilization are below 50% during the incidents. Can anyone help me figure out what to do?


Disk Performance.png
redis_info.txt

Greg Andrews

Jul 20, 2019, 4:05:20 PM
to Redis DB
So you have Sentinels indicating the master stops responding to them, and the master's activity log around those times consistently indicates slow disk write performance.  You also seem to be saving to disk on all your Redis instances - master and slaves.

The first thing I would do is test whether this really is linked to disk writes:  Turn off the AOF on your master and see if this makes your Sentinels happier with the master's responsiveness.  You have AOF on your slaves, so you are not endangering your ability to recover data after a crash.  Then turn AOF back on and see if the problem re-appears.  Then turn it off again and ensure it disappears.
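
For what it's worth, that test needs no restart; AOF can be toggled at runtime (a sketch; <master-ip> and <password> are placeholders, and the change is not persisted to the config file unless you also run CONFIG REWRITE):

# Turn off the AOF on the master (takes effect immediately):
redis-cli -h <master-ip> -p 6379 -a <password> CONFIG SET appendonly no

# Turn it back on later; note this kicks off a background AOF rewrite,
# which is itself a burst of disk writes:
redis-cli -h <master-ip> -p 6379 -a <password> CONFIG SET appendonly yes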

If you reach this point and have established that the AOF write troubles are triggering the failovers from the Sentinels, you can dig deeper into your disk subsystem to learn how/why and find a fix or workaround.  If the failovers are not being triggered by disk, then you can start monitoring the master redis-server process more closely to learn what's causing the lack of responsiveness at certain times.

One such non-disk possibility: as the warning on the https://redis.io/commands/keys page indicates, certain commands can ruin the performance of a redis-server process.  If a client or a script is sending such a command to your master redis-server process, there's a distinct possibility the command makes the process slow to respond at just the time several Sentinels want it to answer, and triggers a failover.  The slow log should reveal this particular kind of trouble, so it doesn't seem likely in your case.  I'm just giving it as an example of non-disk-related performance troubles.
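
For reference, pulling the slow log is a one-liner (the entry count is arbitrary; the threshold is in microseconds):

# Ten most recent slow-log entries:
redis-cli -h <master-ip> -a <password> SLOWLOG GET 10

# The threshold above which commands are logged (default 10000 = 10 ms):
redis-cli -h <master-ip> -a <password> CONFIG GET slowlog-log-slower-than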

One other thing I noticed about your Redis info: the instance had been up for 242777 seconds (2.8 days) and in that time received 19,133,592 connections, but at the time of the INFO command only 33 clients were connected.  That means clients connect to and disconnect from that Redis server at an average rate of 79 per second, and the peak rate is very likely significantly higher.  See this post of mine from a year ago, which has a link to a Trivago blog post about their Redis scaling adventures - one of the early ones being the performance impact of their Redis clients constantly connecting/disconnecting rather than keeping connections open.  That is another possible source of poor responsiveness from the master to the Sentinels.
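
Those figures come straight out of INFO; a quick way to redo the arithmetic on a live instance (a sketch; field names as of Redis 4.x):

redis-cli -a <password> INFO server  | grep uptime_in_seconds
redis-cli -a <password> INFO stats   | grep total_connections_received
redis-cli -a <password> INFO clients | grep connected_clients
# 19133592 connections / 242777 seconds = ~79 new connections per second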

Greg Andrews

Jul 20, 2019, 4:15:57 PM
to Redis DB
Oh, I just noticed.  You have two different types of saving to disk enabled:  RDB, aka background snapshots, and AOF.  These are somewhat redundant.

The AOF write complaints in the Redis logfile occur as the RDB dump is being performed.  So your process is dumping the entire 17 GB database to a file, and apparently using up the available disk I/O bandwidth as it does so, producing complaints by the thread that's writing the small amount of AOF data.  Your disk graphs should show this spike in write activity, but you will likely have to zoom in to a 30-minute or 60-minute view rather than an 18-hour view, and you would prefer to see the peak or max numbers rather than the average.
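
One way to watch the two mechanisms collide from the server side (a sketch; field names from INFO persistence - aof_delayed_fsync counts exactly the "fsync is taking too long" events in your log):

redis-cli -a <password> INFO persistence | egrep 'rdb_bgsave_in_progress|rdb_last_bgsave_status|aof_last_write_status|aof_delayed_fsync'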

I would still turn off disk saving, but start with the background snapshots (the config file "save" lines) and leave the AOF going.  This is the most likely source of your trouble.
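
Concretely (a sketch; the empty string clears all "save" rules until restart, and CONFIG REWRITE makes it stick):

# Disable RDB background snapshots at runtime:
redis-cli -h <master-ip> -a <password> CONFIG SET save ""
# Persist the change into the config file so a restart doesn't bring them back:
redis-cli -h <master-ip> -a <password> CONFIG REWRITE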



Mehedi Hasan

Jul 20, 2019, 6:49:33 PM
to redi...@googlegroups.com
Hi Andrews,
Thanks for going into the details. Actually, the Redis documentation recommends not using AOF alone (https://redis.io/topics/persistence).

Also, I had assumed that RDB only writes the last 'N' changes over 'X' time, as set by the 'save' parameter - in this case, 10K changes every minute. But you are saying the whole 17 GB dataset is flushed to disk whenever a snapshot is triggered. So my understanding was wrong?


Anyway, I am taking 3 recommendations from you:

1. Disabling RDB snapshotting on the master
2. Keeping persistent (keep-alive) connections open to Redis
3. Checking for blocking commands



Greg Andrews

Jul 21, 2019, 2:38:59 AM
to Redis DB
Unfortunately, the documentation is a little obscure on this point.  The AOF performs incremental writes to its file.  Snapshots do not perform incremental writes.  They save the entire dataset (database).  As the page you linked says in the "Snapshotting" section (https://redis.io/topics/persistence#snapshotting):

How it works

Whenever Redis needs to dump the dataset to disk, this is what happens:

  • Redis forks. We now have a child and a parent process.

  • The child starts to write the dataset to a temporary RDB file.

  • When the child is done writing the new RDB file, it replaces the old one.

In every place the page talks about snapshots, it refers to the data being saved as "the dataset".  The plain meaning is "the whole dataset", and nothing in that Persistence page, nor the blog post explanation, nor the SAVE / BGSAVE command pages indicates something different from the plain meaning.  The only time incremental updates are discussed is in relation to the AOF file, which is a different file from the snapshot/RDB file.

The value of the 'save' parameter that appears (from your logfile entries) to be operative is 'save 60 10000'.  The meaning is: "If at least 10000 changes have been made in the last 60 seconds, save a snapshot of the database to the RDB file."
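
For reference, the stock redis.conf ships three such rules, each an independent trigger; your logs show the third one firing:

save 900 1      # snapshot if at least 1 change in the last 900 seconds
save 300 10     # snapshot if at least 10 changes in the last 300 seconds
save 60 10000   # snapshot if at least 10000 changes in the last 60 seconds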

Of the three action items you listed, please first try disabling the RDB/snapshot saves.  The available evidence indicates this is the most likely reason your master is not responding to the Sentinels quickly enough.

Mehedi Hasan

Jul 22, 2019, 11:38:26 AM
to Redis DB
Hi Again,
We will turn off RDB in production tonight. However, we got some more information after further investigation. We found that we have certain slow log entries; for example, even under normal conditions we are getting slow responses from the INFO command:

 8) 1) (integer) 41141
    2) (integer) 1563809001
    3) (integer) 1200
    4) 1) "INFO"
    5) "127.0.0.1:50924"
    6) ""
 9) 1) (integer) 41140
    2) (integer) 1563808999
    3) (integer) 1176
    4) 1) "INFO"
    5) "REMOTE_IP:37389"
    6) "sentinel-61ed1d3d-cmd"
10) 1) (integer) 41139
    2) (integer) 1563808996
    3) (integer) 1455
    4) 1) "INFO"
    5) "REMOTE_IP:64862"
    6) "sentinel-1ac515bd-cmd"

It seems that when Predis runs INFO, it gets a slow response. Further digging showed that we are constantly facing a huge number of key expirations. According to the Redis documentation, active expiry happens in a probabilistic manner, and at most around 200 expirations per second can happen. This can induce a significant amount of latency, but no forum makes it clear how to avoid it.

Source of the expiry: we are caching some information with a long TTL (60 days). There are around 15M keys of this type. This business feature launched 2.5 months ago, and we suspect those keys have started to expire, causing the latency.
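
For reference, the expiry rate can be estimated from the cumulative expired_keys counter in INFO stats (a sketch; the 10-second window is arbitrary):

redis-cli -a <password> INFO stats | grep expired_keys
sleep 10
redis-cli -a <password> INFO stats | grep expired_keys
# (second value - first value) / 10 = expirations per second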

Any help or suggestion in this regard?