> If your disk is slow during AOF re-write operation (again, you can prove
> that by looking at your iostat avgqu-sz and await/w_await params and see
> that they are growing),
Will look at that ASAP and report back. But if my disk write speed
is 50 MB/s and my AOF file size (when rewritten) is 500 MB, then I've
got 10 seconds of writing no matter what, which is consistent with what
I see in the logs. (See also below.)
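To spell that arithmetic out, here is a tiny Python sketch. The 50 MB/s and 500 MB figures are just my rough numbers from above, the function name is made up, and it assumes the rewrite gets the whole disk write bandwidth to itself (so it is a best case):

```python
def rewrite_seconds(aof_size_mb: float, disk_write_mb_per_s: float) -> float:
    """Best-case time to write the rewritten AOF, ignoring all other disk I/O."""
    return aof_size_mb / disk_write_mb_per_s


# Rough figures from this message: 500 MB file, 50 MB/s sequential writes.
print(rewrite_seconds(500, 50))  # 10.0
```

Anything else writing to the same disk at the same time would only stretch this further.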
> then there is a chance that your Redis hasn't
> completed its fsync operations within 2 secs and the entire Redis operation
> is blocked until fsync is completed.
> See more details here
Here is what I found:
> You may wonder what happens to data that is written to the server while the rewrite is in progress. This new data is simply also written to the old (current) AOF
> file, and at the same time queued into an in-memory buffer, so that when the new AOF is ready we can write this missing part inside it, and finally replace the
> old AOF file with the new one.
Looks like I have to watch memory as well. But Redis has far more
memory available than the AOF file would grow in those 10 seconds.
$ free -m
             total       used       free     shared    buffers     cached
Mem:          4012       2518       1493          0        141       1391
-/+ buffers/cache:        985       3026
Swap:         5362          0       5362
$ du -s /var/lib/redis/appendonly.aof; sleep 10; du -s /var/lib/redis/appendonly.aof
827152 /var/lib/redis/appendonly.aof
827488 /var/lib/redis/appendonly.aof
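A quick sketch of what those two du readings mean for the rewrite buffer, in Python. It assumes du -s reports 1 KiB blocks (the GNU du default, I have not checked my settings), that disk usage tracks the data actually appended, and that the rewrite takes the 10 seconds estimated above; the names are made up for illustration:

```python
BLOCK_KIB = 1  # assumption: du -s counts 1 KiB blocks (GNU du default)


def growth_kib_per_s(before_blocks: int, after_blocks: int, interval_s: float) -> float:
    """Average AOF growth rate between two du -s readings."""
    return (after_blocks - before_blocks) * BLOCK_KIB / interval_s


def rewrite_buffer_kib(before_blocks: int, after_blocks: int,
                       interval_s: float, rewrite_s: float) -> float:
    """Data expected to pile up in the in-memory buffer during a rewrite."""
    return (after_blocks - before_blocks) * BLOCK_KIB * rewrite_s / interval_s


# The two readings above, taken 10 seconds apart.
print(growth_kib_per_s(827152, 827488, 10))        # 33.6
print(rewrite_buffer_kib(827152, 827488, 10, 10))  # 336.0
```

So roughly 336 KiB would be buffered during a 10-second rewrite, which is tiny compared to the 1493 MB that free reports.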
But, I assume that you're pointing me at this quote:
> However if the disk can't cope with the write speed, and the background fsync(2) call is taking longer than 1 second,
> Redis may delay the write up to an additional second (in order to avoid that the write will block the main thread
> because of an fsync(2) running in the background thread against the same file descriptor). If a total of two seconds
> elapsed without that fsync(2) was able to terminate, Redis finally performs a (likely blocking) write(2) to transfer data
> to the disk at any cost.
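If I read that paragraph right, the decision looks roughly like the Python sketch below. This is only my model of the quoted text, not taken from the Redis source; the names (Action, flush_decision, MAX_POSTPONE_S) are made up, and the only number used is the two-second total from the quote:

```python
import enum


class Action(enum.Enum):
    WRITE = "write"                    # no background fsync in flight
    POSTPONE = "postpone"              # fsync still running, hold the write back
    BLOCKING_WRITE = "blocking write"  # waited long enough, write at any cost


MAX_POSTPONE_S = 2.0  # "a total of two seconds" from the quoted text


def flush_decision(fsync_in_progress: bool, fsync_elapsed_s: float) -> Action:
    """Decide what to do with the pending AOF data.

    fsync_elapsed_s is how long the background fsync has been running.
    """
    if not fsync_in_progress:
        return Action.WRITE
    if fsync_elapsed_s < MAX_POSTPONE_S:
        return Action.POSTPONE
    return Action.BLOCKING_WRITE


print(flush_decision(False, 0.0))  # Action.WRITE
print(flush_decision(True, 1.5))   # Action.POSTPONE
print(flush_decision(True, 2.5))   # Action.BLOCKING_WRITE
```

The last case is the one that would stall the main thread.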
So, the theory is that the forked background process doing the AOF
rewrite consumes all the disk write bandwidth, and the parent Redis
process simply can't flush its data in time.
Did I get that right?
OK, will experiment with iostat now, so we'll have some hard data to discuss.
Thanks,
Alexander.