Redis background-saving problem

chris

Jan 9, 2015, 6:15:32 AM
to redi...@googlegroups.com
Dear all,

We are using Redis as an image cache, and a few minutes ago we ran into a strange problem/behaviour with background saving of the DB.

It was temporarily fixed by setting vm.overcommit_memory = 1.
The problem has happened twice on different Redis servers in the last 3 months.

I tried to kill (SIGTERM) the forks, but without any effect; maybe they were hanging?
Why do we have two forks/PIDs here (17524 and 17525)?

As far as I know, background saving creates only one fork of the parent process.

Facts:
*********************************************
Redis Version: 2.8.17
RHEL 6.5
Kernel: 2.6.32-431.20.5.el6.x86_64

logfile says:
[17520] 09 Jan 11:12:59.059 * 10 changes in 300 seconds. Saving...
[17520] 09 Jan 11:12:59.734 * Background saving started by pid 21723
[21723] 09 Jan 11:15:56.376 * DB saved on disk
[21723] 09 Jan 11:15:56.866 * RDB: 46 MB of memory used by copy-on-write
[17520] 09 Jan 11:15:57.624 * Background saving terminated with success
[17520] 09 Jan 11:20:58.085 * 10 changes in 300 seconds. Saving...
[17520] 09 Jan 11:20:58.085 # Can't save in background: fork: Cannot allocate memory

redis.conf: maxmemory > 11 GB
Hardware RAM= 32 GB

kernel-setting: vm.overcommit_memory = 0

pstree
  |-redis-server,17520                                                            
  |   |-{redis-server},17524
  |   `-{redis-server},17525

top
  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND                                                                                                     
17520 redis     20   0 16.3g  11g  756 R 100.2 37.4   1890:45 redis-server
 2794 tomcat    20   0 12.2g 4.1g  16m S 32.7 13.1   1157:16 java  

free -m
             total       used       free     shared    buffers     cached
Mem:         32219      31904        314          0        229      14206
-/+ buffers/cache:      17468      14750
Swap:         2047          0       2047

*************************************
Does anyone have an idea? Maybe more RAM?

Thanks in advance
Regards
Chris

Josiah Carlson

Jan 9, 2015, 1:29:46 PM
to redi...@googlegroups.com
Your system thinks it is out of memory, and it may be right. Setting vm.overcommit_memory to 1 gives you more of an operational ceiling. On servers where Redis is running, leave it set to 1 unless you have a really good reason not to (does Tomcat tell you not to?).
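A minimal sketch of how you might make that change, assuming a RHEL 6-style /etc/sysctl.conf for persistence (these are the standard sysctl paths and flags):

  # apply to the running kernel immediately
  sudo sysctl -w vm.overcommit_memory=1

  # persist across reboots
  echo "vm.overcommit_memory = 1" | sudo tee -a /etc/sysctl.conf
  sudo sysctl -p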

That said, it's not entirely clear to me why you're snapshotting 11 gigs of cached images to disk every 5 minutes unless a lot of images are being changed/updated. It seems to me that you could eliminate the vast majority of your snapshotting by switching to AOF-based persistence (syncing to disk every second, or not bothering to fsync at all). Then you are only rewriting the AOF when many images have been changed (at least 50% with the default settings, but this can easily be tuned up or down). If you are paying for data storage on a per-IO basis (like EBS on AWS), you could measurably cut your costs as well as minimize Redis forking.
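A sketch of the corresponding redis.conf changes; these are Redis's standard persistence directives, and the rewrite thresholds shown are just the defaults for illustration:

  # turn off RDB snapshotting entirely
  save ""

  # enable the append-only file, fsync once per second
  # (use "appendfsync no" to leave fsyncs to the OS)
  appendonly yes
  appendfsync everysec

  # rewrite the AOF once it has doubled since the last rewrite,
  # but not before it reaches 64 MB
  auto-aof-rewrite-percentage 100
  auto-aof-rewrite-min-size 64mb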

Going a little bit deeper and extrapolating from your logs (assuming they are representative), roughly 1.1% of the data in your snapshot is being changed every 8 minutes or so (the actual snapshot cycle time: 5 minutes for the delay plus 3 minutes for the snapshot itself). Put another way, for every 1 byte written to Redis, 89 bytes are written to disk. Ouch. Switching to AOF with no fsyncs and otherwise default AOF rewriting settings, you could cut your disk IO down to roughly 2 bytes written to disk for every 1 byte written to Redis. It would also reduce how often Redis forks to perform background snapshotting/AOF rewriting to roughly 1/45 of its current rate. What does this mean in practice? AOF rewriting would be triggered automatically roughly once every 6 hours, instead of the snapshot every 8 minutes you are currently seeing, and instead of writing 89 gigs of data to disk for every 1 gig written to Redis, you are down to 2 gigs.
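To spell out the arithmetic (taking the ~11 GB resident size from the top output in your message and treating the 1.1% figure as representative):

  changed per 8-minute cycle  ~ 1.1% of 11 GB      ~ 124 MB
  written to disk per cycle   = one full snapshot  ~ 11 GB
  disk-to-Redis write ratio   ~ 11 GB / 124 MB     ~ 89 : 1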

Unless you've got something else going on too, I would suspect that much of your 14+ gigs of system cache is holding the 11 gig snapshot file that's also on disk and in Redis, so you're caching the same data twice, with no way to keep it out of the page cache while it is being refilled every 8 minutes (switch to AOF now; yesterday would have been better ;) ). I wouldn't necessarily recommend it*, but you may want to consider running something like [1] to clear your system cache after you've updated your Redis configuration (use the 'CONFIG SET' command to add AOF and disable snapshotting, then make the same change to your on-disk configuration; a minimal sketch follows below).

[1] sudo sh -c "sync; echo 3 > /proc/sys/vm/drop_caches"
* The above can and does force disk syncs, which can cause your machine to stutter if you have a lot of pending writes
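A minimal sketch of the live change via redis-cli (assumes Redis is listening on the default port; mirror the same two settings in redis.conf afterwards so they survive a restart):

  redis-cli CONFIG SET appendonly yes   # starts the AOF (kicks off an initial rewrite to seed it)
  redis-cli CONFIG SET save ""          # disables RDB snapshotting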

In terms of PIDs, Redis uses background threads to handle a few mundane tasks, and those two child PIDs 17524 and 17525 are those background threads. Nothing to worry about. I don't know if Redis sets the signal mask on those threads to ignore SIGTERM, but I wouldn't be surprised if it did. And if it wasn't clear enough: you shouldn't be trying to kill them anyway, as they aren't related in any way to the problem you are experiencing.
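If you want to confirm for yourself that those are threads rather than forked children, something like the following (standard ps flags; substitute your own redis-server PID) will list them:

  ps -L -o pid,lwp,comm -p 17520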

Regards,
 - Josiah

