Is it possible to terminate a BGSAVE already in process?

Hikeonpast

Apr 16, 2012, 1:59:01 PM
to Redis DB
Hi,

We recently had an issue where the physical drive to which Redis was
saving its RDB file started failing. Occasionally, the drive would freak
out and stop accepting writes (100% utilization, lots of iowait,
etc.). Looking at INFO, Redis was in the middle of a BGSAVE on the
failed drive. This appears to be a nearly unrecoverable scenario --
while we can configure the "dir" parameter to point to a different
drive, there doesn't appear to be a way to make the current BGSAVE
fail or abort. Issuing BGSAVE again does not start a new BGSAVE on a
new "dir" setting, because it's still working on the prior BGSAVE.
Attaching a slave to this instance won't help either, since the master
has to BGSAVE as part of replication. Fortunately, we were able to
enable AOF pointing to an alternate drive to save the data before
killing the instance.
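A minimal sketch of the client-side check described above, assuming a POSIX shell with redis-cli on the PATH. The helper name `bgsave_in_progress` is hypothetical; it reads INFO output on stdin and succeeds when a background save is still running, matching both `bgsave_in_progress` (the 2.4-era field name) and `rdb_bgsave_in_progress` (the name used by later releases). It cannot abort the save; it only tells you whether one is still pending.

```shell
#!/bin/sh
# Hypothetical helper: succeeds (exit status 0) when the INFO text on stdin
# reports a background RDB save in progress. INFO lines end in "\r\n", so
# carriage returns are stripped before matching.
bgsave_in_progress() {
    tr -d '\r' | grep -Eq '^(rdb_)?bgsave_in_progress:1$'
}

# Example usage against a live server (/mnt/spare-disk is a hypothetical path):
#   redis-cli INFO | bgsave_in_progress && echo "BGSAVE still running"
#   redis-cli CONFIG SET dir /mnt/spare-disk
#   redis-cli CONFIG SET appendonly yes
```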

Is there a way to abort a (blocked/failed) BGSAVE short of killing the
Redis instance, ideally from a client (redis-cli)?

Cheers,

Dean

Josiah Carlson

Apr 16, 2012, 2:16:23 PM
to redi...@googlegroups.com
Not from the client, no.

But if you are on the server itself, you can find the child process
ID and kill that. Something like...

$ ps aux | grep redis
root 21831 0.0 0.6 76788 38736 ? Ssl Apr11 6:37 /usr/local/bin/redis-server /etc/redis/6379.conf
root 29482 0.0 0.6 77248 38732 ? R 11:17 0:00 /usr/local/bin/redis-server /etc/redis/6379.conf
josiah 29484 0.0 0.0 7624 936 pts/2 S+ 11:17 0:00 grep --color=auto redis

The newer process (started at 11:17AM and not on April 11) is
definitely the child, but if you want to confirm...

$ pstree -pa 21831
redis-server,21831 /etc/redis/6379.conf
  ├─redis-server,29482 /etc/redis/6379.conf
  ├─{redis-server},21836
  └─{redis-server},21837

29482 is the child BGSAVE process. The entries in braces, without
command-line arguments, are threads of the parent used for AOF.
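Rather than comparing start times by eye, the parent/child relationship can be read directly from the PPID column. A minimal sketch, assuming Linux procps `ps` and that the child still reports the command name `redis-server` (as in the output above); the function name `find_redis_children` is hypothetical. It prints the PID of every redis-server process whose parent is also a redis-server, which covers a BGSAVE (or BGREWRITEAOF) child.

```shell
#!/bin/sh
# Hypothetical helper: reads "PID PPID COMMAND" lines, as printed by
# `ps -eo pid=,ppid=,comm=`, and prints the PID of each redis-server
# process whose parent is also a redis-server (i.e. a forked child).
find_redis_children() {
    awk '
        $3 == "redis-server" { is_redis[$1] = 1; parent[$1] = $2 }
        END {
            for (pid in parent)
                if (parent[pid] in is_redis)
                    print pid
        }
    '
}

# Usage on the server (sends SIGTERM; escalate to `kill -9` only if needed):
#   for pid in $(ps -eo pid=,ppid=,comm= | find_redis_children); do
#       kill "$pid"
#   done
```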

Regards,
- Josiah

Hikeonpast

Apr 17, 2012, 12:42:16 AM
to Redis DB
Hi Josiah,

Thanks for your reply. I believe we tried both kill and kill -9 on
the saving process (it was an exciting morning, but not in a good way,
so apologies if I'm a bit short on specifics). Since the saving
process was blocked on a hung I/O device, the kill was unsuccessful.
I'm guessing that if the OS can't kill a process that is blocked on
I/O, there's nothing the Redis parent process could do either.
We hadn't experienced this particular issue before, and it caught us a
bit flat-footed. More than anything, I was curious whether there was an
easy trick that we didn't try.
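One way to confirm this diagnosis is to look at the child's scheduler state: on Linux, a process in uninterruptible sleep (state `D`) generally cannot act on any signal, SIGKILL included, until the I/O it is waiting on completes, which would explain the failed `kill -9`. A minimal sketch, assuming Linux `/proc`; the function name `proc_state` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper (Linux only): prints the one-letter state of a PID
# from /proc/<pid>/stat, e.g. R (running), S (sleeping), D (uninterruptible).
# The command name in that file is wrapped in parentheses and may contain
# spaces, so everything up to the last ") " is dropped before reading the state.
proc_state() {
    sed 's/.*) //' "/proc/$1/stat" | cut -d ' ' -f 1
}

# Usage: check the BGSAVE child (29482 in Josiah's ps output above):
#   [ "$(proc_state 29482)" = D ] && echo "stuck in uninterruptible I/O"
```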

Cheers,

Dean

Josiah Carlson

Apr 17, 2012, 3:04:33 AM
to redi...@googlegroups.com
Did you try unmounting the device?
Do you have access to the hardware? (If you do, and the device is
SATA, you can just pull the cable and it shouldn't damage your
hardware; SATA was designed for hot-swap.)
Maybe use gdb on the running process to close the file handle?
That would take some preparation time, but this is exactly the kind of
situation where there really needs to be a kernel-level call to abort
the blocked I/O. I'm sure there are some black-hats around (maybe not
here) who could use an exploit to inject something that does just about
anything, but I have no sense of who could do it, or how long it would take...
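The gdb idea needs the number of the file descriptor the child has open on the failing disk. A minimal sketch for finding it, assuming Linux `/proc` (and root, to read another user's descriptors) and that the dump file's name ends in `.rdb`; the function name `rdb_fds` is hypothetical. Note that gdb attaches via ptrace, and attaching to a process stuck in uninterruptible sleep may itself hang until the I/O returns, so this remains a long shot.

```shell
#!/bin/sh
# Hypothetical helper (Linux only): lists "FD TARGET" for every descriptor of
# the given PID whose target path ends in ".rdb".
rdb_fds() {
    for link in /proc/"$1"/fd/*; do
        target=$(readlink "$link") || continue
        case $target in
            *.rdb) printf '%s %s\n' "${link##*/}" "$target" ;;
        esac
    done
}

# Usage: find the descriptor, then ask gdb to close it.
# FD is a placeholder for the number printed by rdb_fds:
#   rdb_fds 29482
#   gdb -p 29482 -batch -ex 'call (int)close(FD)'
```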

This is the kind of thing that makes FUSE useful; you can always kill the FUSE process.

- Josiah
