IO blocks during RDB persistence

George Chilumbu

Jul 13, 2016, 12:44:27 AM
to Redis DB
From our Grafana monitoring system, we have noticed several IO blocks during RDB background saving. It appears that the main Redis process is blocked by the forked background process, causing read timeout errors on our API servers, as shown below:

[12-Jul-2016 07:00:06 Asia/Taipei] PHP Warning:  Uncaught exception
'Predis\Connection\ConnectionException': Error while reading line from the server.
[tcp://master.redis-general.service.sanchong.consul:6379]

I have also attached the Grafana dashboard showing these IO blocks.


Our Redis cluster has three masters and three slaves (one for each master). AOF is disabled on all Redis instances, and only RDB is enabled, with the following config:

save 3600 100000

##Other relevant config
daemonize yes
timeout 0
tcp-keepalive 60
stop-writes-on-bgsave-error no
rdbchecksum yes
cluster-require-full-coverage no
latency-monitor-threshold 0
dbfilename "dump.rdb"
tcp-backlog 65535
maxmemory 52gb  #total Mem = 64G
maxmemory-policy allkeys-lru
appendonly no


With the current RDB save settings, background saving happens approximately once an hour, as the log data below shows:

1241:S 11 Jul 10:41:00.856 * Background saving started by pid 15292
15292:C 11 Jul 10:47:25.143 * DB saved on disk
15292:C 11 Jul 10:47:25.924 * RDB: 424 MB of memory used by copy-on-write
1241:S 11 Jul 10:47:26.912 * Background saving terminated with success
1241:S 11 Jul 11:47:27.059 * 100000 changes in 3600 seconds. Saving...
1241:S 11 Jul 11:47:27.912 * Background saving started by pid 21377
21377:C 11 Jul 11:53:48.990 * DB saved on disk
21377:C 11 Jul 11:53:49.783 * RDB: 297 MB of memory used by copy-on-write
1241:S 11 Jul 11:53:50.703 * Background saving terminated with success
1241:S 11 Jul 12:53:51.027 * 100000 changes in 3600 seconds. Saving...
1241:S 11 Jul 12:53:51.878 * Background saving started by pid 27275
27275:C 11 Jul 13:00:14.418 * DB saved on disk
27275:C 11 Jul 13:00:15.176 * RDB: 56 MB of memory used by copy-on-write
1241:S 11 Jul 13:00:16.018 * Background saving terminated with success
1241:S 11 Jul 14:00:17.058 * 100000 changes in 3600 seconds. Saving...
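
The log above can also be used to measure how long each background save runs. The following is a minimal sketch, not part of the original post: the function name `bgsave_durations` is hypothetical, and the year is an assumption because the log lines do not include one.

```python
from datetime import datetime

# Start/finish lines copied from the log excerpt above.
LOG = """\
1241:S 11 Jul 10:41:00.856 * Background saving started by pid 15292
1241:S 11 Jul 10:47:26.912 * Background saving terminated with success
1241:S 11 Jul 11:47:27.059 * 100000 changes in 3600 seconds. Saving...
1241:S 11 Jul 11:47:27.912 * Background saving started by pid 21377
1241:S 11 Jul 11:53:50.703 * Background saving terminated with success
1241:S 11 Jul 12:53:51.878 * Background saving started by pid 27275
1241:S 11 Jul 13:00:16.018 * Background saving terminated with success
"""


def bgsave_durations(log_text, year=2016):
    """Return the seconds between each 'started' and 'terminated' line."""
    durations = []
    started = None
    for line in log_text.splitlines():
        parts = line.split(" * ", 1)
        if len(parts) != 2:
            continue
        header, message = parts
        # header looks like "1241:S 11 Jul 10:41:00.856"; drop the pid:role part
        stamp = header.split(" ", 1)[1]
        # The year is assumed; the log format does not record it.
        when = datetime.strptime(f"{year} {stamp}", "%Y %d %b %H:%M:%S.%f")
        if message.startswith("Background saving started"):
            started = when
        elif message.startswith("Background saving terminated") and started:
            durations.append((when - started).total_seconds())
            started = None
    return durations


print([round(d, 3) for d in bgsave_durations(LOG)])
# prints [386.056, 382.791, 384.14]
```

Each save in this excerpt runs for a little over six minutes, which is the window in which the extra disk IO would be expected to appear on the dashboard.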

I understand that the IO blocks happen during the fork() call when the data set is big, causing Redis to stop serving clients for some milliseconds or up to a second. My question is: would adding multiple save rules help with the IO block situation? Are there any other ways I can resolve this issue?
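
For reference on the multiple-save question, here is a simplified sketch of how save rules are evaluated, as far as I understand Redis's periodic check: a background save starts as soon as any one rule has both its change count reached and its time window elapsed. The function name `should_bgsave` is hypothetical, the extra rule `(300, 10)` is only an illustration, and details such as the retry delay after a failed save are left out.

```python
def should_bgsave(save_points, dirty, seconds_since_last_save):
    """Simplified model: any single matching rule triggers a background save.

    save_points: list of (seconds, changes) pairs, as in "save 3600 100000".
    dirty: number of changes since the last successful save.
    """
    return any(
        dirty >= changes and seconds_since_last_save > seconds
        for seconds, changes in save_points
    )


current = [(3600, 100000)]
with_extra_rule = [(3600, 100000), (300, 10)]  # hypothetical additional rule

# 5000 changes, 10 minutes after the last save:
print(should_bgsave(current, 5000, 600))          # prints False
print(should_bgsave(with_extra_rule, 5000, 600))  # prints True
```

Under this model, adding rules can only make saves start more often; it does not change how much work each fork and dump has to do.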




Screen Shot 2016-07-13 at 12.18.28 PM.png

Yuri Paes Leme

Jul 13, 2016, 9:33:44 AM
to Redis DB
Have you already considered using the append-only file (AOF) instead of RDB?


George Chilumbu

Jul 13, 2016, 10:46:23 PM
to Redis DB
Initially, we were using both AOF and RDB. But since both of them fork(), we had too many IO interruptions. And since we wanted to take advantage of snapshots, we went with RDB only.

I am thinking of disabling both RDB and AOF on the master servers and enabling them only on the slaves, which are read-only. That way, the masters can continue to function normally without these background saving interruptions. After a failover following a failure or crash, I will manually disable RDB and AOF on the newly promoted master (the former slave). This way, RDB and AOF persistence is performed only on slaves at all times.

Any advice, suggestions, or criticism of this idea?
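
The manual step after a failover could look roughly like the sketch below. This is only an illustration of the idea described above: the function name `persistence_commands` is hypothetical, and the returned tuples are meant to be sent with whatever Redis client is in use. `CONFIG SET save ""` removes all save rules at runtime; the change is not written back to redis.conf unless `CONFIG REWRITE` is also issued.

```python
def persistence_commands(role, save_rule="3600 100000"):
    """Return the CONFIG SET commands for a node, based on its current role.

    role: "master" or "slave" (for example, as reported by INFO replication).
    save_rule: the RDB rule to apply on slaves, taken from this thread.
    """
    if role == "master":
        # No RDB snapshots and no AOF on masters.
        return [
            ("CONFIG", "SET", "save", ""),
            ("CONFIG", "SET", "appendonly", "no"),
        ]
    if role == "slave":
        # Slaves carry the RDB persistence.
        return [("CONFIG", "SET", "save", save_rule)]
    raise ValueError(f"unexpected role: {role!r}")


for command in persistence_commands("master"):
    print(" ".join(arg if arg else '""' for arg in command))
```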

CharSyam

Jul 13, 2016, 10:52:16 PM
to redi...@googlegroups.com
What are your SAVE parameters in redis.conf?
RDB uses a lot of IO.

George Chilumbu

Jul 13, 2016, 11:01:21 PM
to Redis DB
SAVE 3600 100000

CharSyam

Jul 14, 2016, 1:03:21 AM
to redi...@googlegroups.com
From your graph, I think RDB is causing the high disk IO.
So, if you need RDB, I think running RDB on the slaves is better.
Or just using AOF is also good :)

Tuco

Jul 14, 2016, 2:11:21 AM
to Redis DB
During the RDB save, the load average is expected to increase, but even a load average of 3 is fine, depending on the number of cores your machine has. A load average of 3 means that roughly 3 CPU cores' worth of work is in use; assuming your system has more cores than that, it is fine.

IO activity and context switches are also expected to increase. 

A better metric for understanding whether it is affecting your application is whether you see a decrease in the number of requests Redis is able to serve.
You should try to create a graph in Grafana of the requests served and other attributes, sampled every 5-10 minutes from the Redis "INFO ALL" command. If it shows a decrease while the RDB save is happening, then you have something to worry about; otherwise not.
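
One way to collect such samples is to parse the text returned by INFO and keep a few fields. The sketch below is a minimal example under stated assumptions: the helper names `parse_info` and `sample` are hypothetical, the sample values are made up for illustration, and fetching the INFO text and shipping the result to Grafana are left to whatever client and collector are already in place. The field names `instantaneous_ops_per_sec`, `rdb_bgsave_in_progress`, and `latest_fork_usec` are standard INFO fields.

```python
def parse_info(text):
    """Turn raw INFO output ("key:value" lines, "# Section" headers) into a dict."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition(":")
        fields[key] = value
    return fields


def sample(info_text):
    """Pick out the fields relevant to 'is the RDB save hurting throughput?'."""
    info = parse_info(info_text)
    return {
        "ops_per_sec": int(info["instantaneous_ops_per_sec"]),
        "bgsave_in_progress": info["rdb_bgsave_in_progress"] == "1",
        "latest_fork_usec": int(info["latest_fork_usec"]),
    }


# Illustrative INFO excerpt; the numbers are invented, not from this cluster.
EXAMPLE = (
    "# Persistence\r\n"
    "rdb_bgsave_in_progress:1\r\n"
    "# Stats\r\n"
    "instantaneous_ops_per_sec:1200\r\n"
    "latest_fork_usec:850000\r\n"
)

print(sample(EXAMPLE))
```

Plotting `ops_per_sec` next to `bgsave_in_progress` makes it easy to see whether throughput actually drops while a save is running, and `latest_fork_usec` shows how long the last fork() took.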