Your system thinks it is out of memory, and it may be right. Setting vm.overcommit_memory to 1 gives you a higher operational ceiling. On servers where Redis is running, leave it set to 1 unless you have a really good reason not to (does Tomcat tell you not to?).
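For reference, here is a minimal sketch of checking and changing that setting on a typical Linux box. It assumes you have sudo rights and that /etc/sysctl.conf is where your distribution keeps persistent sysctl settings (some prefer a file under /etc/sysctl.d/ instead), so adjust to taste.

```shell
# Show the current value (0 = heuristic overcommit, 1 = always allow, 2 = strict accounting).
sysctl vm.overcommit_memory

# Apply the change to the running kernel immediately.
sudo sysctl -w vm.overcommit_memory=1

# Persist it across reboots. The file location is an assumption about your distro.
echo "vm.overcommit_memory = 1" | sudo tee -a /etc/sysctl.conf
```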
That said, it's not entirely clear to me why you're snapshotting 11 gigs of cached images to disk every 5 minutes unless a lot of images are being changed/updated. It seems to me that you could eliminate the vast majority of your snapshotting by switching to AOF-based persistence (syncing to disk every second, or not bothering to fsync at all). At least then you are only rewriting the AOF when many images have changed (at least 50% with the default settings, but this can easily be tuned up or down). If you are paying for data storage on a per-IO basis (like EBS on AWS), you could measurably cut your costs as well as minimize Redis forking.
Going a little deeper and extrapolating from your logs (assuming they are representative), roughly 1.1% of the data in your snapshot changes every 8 minutes or so (the actual snapshot cycle time: 5 minutes for the delay, 3 minutes for the snapshot itself). Put another way, for every 1 byte written to Redis, 89 bytes are written to disk. Ouch. If you switch to AOF with no fsyncs and otherwise default AOF rewrite settings, you could cut your disk IO to roughly 2 bytes written to disk for every 1 byte written to Redis. It would also reduce the number of times Redis forks for background snapshotting/AOF rewriting to roughly 1/45 of what it is now. What does this mean in practice? An AOF rewrite would be triggered automatically roughly once every 6 hours, instead of the snapshot you currently see once every 8 minutes, and instead of writing 89 gigs of data to disk for every 1 gig written to Redis, you would be down to 2 gigs.
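If you want to redo that back-of-the-envelope math with your own numbers, here is a rough sketch. The three inputs are the estimates quoted above (89 bytes to disk per byte written with snapshots, 2 with AOF, an 8 minute cycle); the function names are made up for illustration, and it assumes a POSIX shell with awk available.

```shell
#!/bin/sh
# Inputs: estimates taken from the discussion above, not measured values.
rdb_bytes_per_byte=89   # bytes written to disk per byte written to Redis (snapshots)
aof_bytes_per_byte=2    # same ratio with AOF and default rewrite settings
cycle_minutes=8         # 5 minute delay + roughly 3 minutes to write the snapshot

# Percentage of the dataset that changes per snapshot cycle: 100 / amplification.
changed_pct() {
    LC_ALL=C awk -v a="$1" 'BEGIN { printf "%.1f\n", 100 / a }'
}

# How many times fewer forks (and full rewrites) AOF needs compared to snapshots.
fork_ratio() {
    LC_ALL=C awk -v r="$1" -v a="$2" 'BEGIN { printf "%.1f\n", r / a }'
}

# Approximate hours between automatic AOF rewrites.
rewrite_hours() {
    LC_ALL=C awk -v r="$1" -v a="$2" -v m="$3" 'BEGIN { printf "%.1f\n", (r / a) * m / 60 }'
}

echo "changed per cycle: $(changed_pct "$rdb_bytes_per_byte")%"
echo "fork reduction:    $(fork_ratio "$rdb_bytes_per_byte" "$aof_bytes_per_byte")x"
echo "hours per rewrite: $(rewrite_hours "$rdb_bytes_per_byte" "$aof_bytes_per_byte" "$cycle_minutes")"
```

With the inputs above this works out to about 1.1% changed per cycle, a 44.5x reduction in forks (the "roughly 1/45"), and an AOF rewrite about every 5.9 hours (the "roughly 6 hours").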
Unless you've got something else going on too, I suspect that much of your 14+ gigs of system cache is holding the 11 gig snapshot file, which is also on disk and in Redis, so you're getting double coverage there. You also have no way to keep it out of cache, since it gets refilled every 8 minutes (switch to AOF now, yesterday would have been better ;) ). I wouldn't necessarily recommend it*, but you may want to consider running something like [1] to clear your system cache after you've updated your Redis configuration (use the 'CONFIG SET' command to enable AOF and disable snapshotting, then make the same changes to your on-disk configuration).
[1] sudo sh -c "sync; echo 3 > /proc/sys/vm/drop_caches"
* The above can and does force disk syncs, which can cause your machine to stutter if you have a lot of pending writes.
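For the configuration change itself, something along these lines should do it. This is only a sketch: it assumes redis-cli can reach your server on the default host and port with no password, and it uses 'appendfsync no' to match the no-fsync numbers above (use 'everysec' if you want the once-per-second sync instead).

```shell
# Turn on the append-only file at runtime. Redis forks once to write the
# initial AOF, so expect one more background rewrite right after this.
redis-cli CONFIG SET appendonly yes

# Leave flushing to the OS ("everysec" is the once-per-second alternative).
redis-cli CONFIG SET appendfsync no

# Watch for aof_rewrite_in_progress:0 before turning snapshots off.
redis-cli INFO persistence | grep aof_rewrite_in_progress

# Disable RDB snapshotting by clearing all save points.
redis-cli CONFIG SET save ""

# Sanity check the running values.
redis-cli CONFIG GET appendonly
redis-cli CONFIG GET save

# Matching lines for the on-disk redis.conf, so a restart keeps the same behavior:
#   appendonly yes
#   appendfsync no
#   save ""
```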
In terms of PIDs, Redis uses background threads to handle a few mundane tasks, and those two child PIDs, 17523 and 17524, are those background threads. Nothing to worry about. I don't know if Redis sets the signal mask on those threads to ignore SIGTERM, but I wouldn't be surprised if it did. And in case it wasn't clear enough, you shouldn't be trying to kill them anyway; they aren't related in any way to any problem you are experiencing.
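If you want to confirm for yourself that those IDs are threads rather than separate processes, here is a quick sketch. It assumes Linux with procps-style ps and pidof installed, and a single redis-server instance on the box.

```shell
# Find the main Redis process (assumes exactly one redis-server is running).
REDIS_PID=$(pidof redis-server)

# List every thread (LWP) that belongs to that one process.
ps -L -p "$REDIS_PID" -o pid,lwp,nlwp,comm

# The same thread IDs also show up as directories here.
ls "/proc/$REDIS_PID/task"
```

If the extra IDs appear in the LWP column or under the task directory of the main Redis PID, they are threads of that process, which some tools display as if they were separate PIDs.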
Regards,
- Josiah