AOF with "appendfsync no" still slow


Daniel Mezzatto

Apr 16, 2012, 7:07:03 PM
to redi...@googlegroups.com
Last week we switched our production Redis instances from RDB to AOF. After that, we started seeing timeouts that had never occurred before. Our client application has a 300-millisecond timeout for Redis replies. We are using the "appendfsync no" config.

After some debugging, we concluded that the write() call inside flushAppendOnlyFile() takes a long time when the OS buffer is full and needs to be flushed. When this happens, Redis is blocked inside this write() call, the request queue fills up, and some requests cannot get a response in less than 300 milliseconds.
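
A stall of this kind can be confirmed by timing individual write() calls on a file opened for appending. The following is a minimal sketch in Python, not Redis code: the helper names `timed_append` and `find_slow_writes`, the probe file, and the payload size are all assumptions made for illustration, and the threshold simply mirrors the 300 ms client timeout mentioned above.

```python
import os
import tempfile
import time

SLOW_WRITE_SECONDS = 0.300  # assumed threshold, mirroring the 300 ms client timeout


def timed_append(fd: int, payload: bytes) -> float:
    """Write payload to fd and return how long the write took, in seconds."""
    start = time.monotonic()
    view = memoryview(payload)
    while view:
        # os.write may write fewer bytes than requested, so loop until done.
        written = os.write(fd, view)
        view = view[written:]
    return time.monotonic() - start


def find_slow_writes(path: str, payload: bytes, count: int) -> list:
    """Append payload `count` times and return the durations above the threshold."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o644)
    try:
        durations = [timed_append(fd, payload) for _ in range(count)]
    finally:
        os.close(fd)
    return [d for d in durations if d > SLOW_WRITE_SECONDS]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        probe = os.path.join(tmp, "probe.aof")  # hypothetical probe file
        slow = find_slow_writes(probe, b"x" * 4096, 1000)
        print(f"{len(slow)} of 1000 writes took longer than 300 ms")
```

On an idle machine this will usually report no slow writes; the stalls described above only show up once the kernel's dirty-page buffers are under pressure.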

Wouldn't this scenario be faster if Redis had a "writing thread"? Instead of calling write(), flushAppendOnlyFile() could append to a circular buffer. This buffer would be read by this "writing thread" that calls write(). This way I think Redis would not block.

Any thoughts?

Pieter Noordhuis

Apr 16, 2012, 7:21:42 PM
to redi...@googlegroups.com
Redis would block if that circular buffer fills up. Moving writes to a
thread does not change the fundamental problem, that when your disk
buffers are backed up, you may end up blocking until they free up
again. Because your kernel already applies some sort of write caching,
adding another layer for writes is just moving the problem away from
the kernel to Redis. Also (maybe even more importantly), using a
thread for writes adds tons of complexity when you DO need fsync
guarantees.
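
The first point can be illustrated with a small simulation. This is only a sketch under stated assumptions: a bounded `queue.Queue` stands in for the proposed circular buffer, `time.sleep` stands in for a slow write() to disk, and the function name `run_producer` and all sizes and delays are made up for the example.

```python
import queue
import threading
import time


def run_producer(items: int, buffer_slots: int, write_delay: float) -> float:
    """Return the longest time the producer spent blocked on a single put()."""
    buf = queue.Queue(maxsize=buffer_slots)  # stands in for the circular buffer

    def writer() -> None:
        while True:
            chunk = buf.get()
            if chunk is None:  # sentinel: the producer is finished
                return
            time.sleep(write_delay)  # simulated slow write() to disk

    thread = threading.Thread(target=writer)
    thread.start()

    worst = 0.0
    for _ in range(items):
        start = time.monotonic()
        buf.put(b"entry")  # blocks whenever the buffer is full
        worst = max(worst, time.monotonic() - start)

    buf.put(None)
    thread.join()
    return worst


if __name__ == "__main__":
    worst = run_producer(items=50, buffer_slots=4, write_delay=0.02)
    print(f"longest put() stall: {worst * 1000:.1f} ms")
```

Once the producer outpaces the writer, each put() ends up waiting roughly one simulated disk write, so the main thread is blocked again, just in a different place.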

Can you provide some information about the infrastructure you run
Redis on? It seems that either you have a huge write spike that
overflows the kernel's write buffers, causing Redis to block on write,
or unpredictable/high latency for your disks.

Cheers,
Pieter

> --
> You received this message because you are subscribed to the Google Groups
> "Redis DB" group.
> To view this discussion on the web visit
> https://groups.google.com/d/msg/redis-db/-/GiX8ReM74TsJ.
> To post to this group, send email to redi...@googlegroups.com.
> To unsubscribe from this group, send email to
> redis-db+u...@googlegroups.com.
> For more options, visit this group at
> http://groups.google.com/group/redis-db?hl=en.

Daniel Mezzatto

Apr 16, 2012, 8:06:31 PM
to redi...@googlegroups.com
Hi Pieter,

I have just remembered an old post from Salvatore's blog (http://antirez.com/post/fsync-different-thread-useless.html). He had already explained this behavior, but I had forgotten it.

There is indeed a write spike (something like 100k HMSET commands, each setting 4 fields on a hash) every 2 hours. This is our update application running.
We have 128 Redis instances running on 12 quad-core Intel Xeon 2.6 GHz machines. The writes are more or less equally distributed between these instances (we use CRC16 sharding, which is good enough for our needs).
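
For readers unfamiliar with this kind of sharding, a minimal sketch follows. The post does not say which CRC16 variant or key format is used, so the choice of `binascii.crc_hqx` (CRC-CCITT, polynomial 0x1021), the function name `shard_for`, and the sample key are assumptions made for illustration only.

```python
import binascii

NUM_INSTANCES = 128  # instance count quoted in the post


def shard_for(key: bytes, instances: int = NUM_INSTANCES) -> int:
    """Map a key to an instance index using a 16-bit CRC."""
    # binascii.crc_hqx computes a CRC-CCITT (polynomial 0x1021) checksum.
    # The variant actually used by the poster is not stated; this is an assumption.
    return binascii.crc_hqx(key, 0) % instances


if __name__ == "__main__":
    key = b"user:1001"  # hypothetical key
    print(f"{key!r} -> instance {shard_for(key)}")
```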

Can you think of anything that might improve our performance?

Jeremy Zawodny

Apr 16, 2012, 8:09:46 PM
to redi...@googlegroups.com
What about the disk(s) involved?

Jeremy


Salvatore Sanfilippo

Apr 17, 2012, 4:01:04 AM
to redi...@googlegroups.com
On Tue, Apr 17, 2012 at 2:06 AM, Daniel Mezzatto
<daniel....@gmail.com> wrote:
> I have just remembered an old post from Salvatore's blog
> (http://antirez.com/post/fsync-different-thread-useless.html). He had
> already explained about this behavior but I did not remember it.

Hi Daniel,

I don't think that is what Pieter is trying to tell you. What Pieter is
saying is the following: most DB systems have a fundamental rule, that
the disk should be, even with fsync disabled, fast enough to accept all
the writes that clients are performing. If the disk is not fast enough
to accept the writes, you can write to a buffer, but because every
second you write more to the buffer than you can write to the disk, the
buffer gets bigger and bigger, and sooner or later you'll have something
like: if (buffer > MAX_BUFFER) { block_and_write(); }. That is exactly
what the kernel is *already doing*. So this has nothing to do with fsync
blocking. It's a fundamental rule.
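
The arithmetic behind this rule is short enough to write down. The sketch below is illustrative only: the function name `seconds_until_block` and the throughput and buffer figures are invented for the example, not measurements from this thread.

```python
def seconds_until_block(incoming_bps: float, disk_bps: float, max_buffer: float) -> float:
    """Time until a buffer of max_buffer bytes fills, or infinity if the disk keeps up."""
    surplus = incoming_bps - disk_bps  # bytes per second the buffer grows by
    if surplus <= 0:
        return float("inf")  # the disk absorbs the load, so the buffer never fills
    return max_buffer / surplus


if __name__ == "__main__":
    # Hypothetical load: 60 MB/s of writes, a disk sustaining 40 MB/s, 1 GB of buffer.
    print(seconds_until_block(60_000_000, 40_000_000, 1_000_000_000))  # 50.0
    # Doubling the buffer only postpones the stall.
    print(seconds_until_block(60_000_000, 40_000_000, 2_000_000_000))  # 100.0
```

A larger buffer changes when the writer blocks, not whether it blocks; only a disk at least as fast as the sustained write rate removes the stall.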

This rule is escaped only by RDB persistence, because it is not bound
to the write load. This is quite an exception in the database world, but
the cost is that the latest writes are not very durable. (RDB is
different because it does not write individual writes to disk; it
performs a copy of the DB at specified intervals, so a key "x" can be
updated 1 trillion times per second but still we write it to disk only
every 5 minutes.)
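
A rough way to see the difference is to count bytes. This is a deliberately simplified model with assumed names (`aof_bytes`, `rdb_bytes`) and made-up sizes; it ignores protocol overhead, AOF rewrites, and everything else a real instance does.

```python
def aof_bytes(updates: int, entry_size: int) -> int:
    """AOF appends one entry per write command, so volume scales with the write load."""
    return updates * entry_size


def rdb_bytes(distinct_keys: int, value_size: int, snapshots: int) -> int:
    """An RDB snapshot writes each key once, however many times it was updated."""
    return distinct_keys * value_size * snapshots


if __name__ == "__main__":
    # Hypothetical case: one key updated a million times between two snapshots,
    # with 50 bytes written per entry or value.
    print(aof_bytes(updates=1_000_000, entry_size=50))           # 50000000
    print(rdb_bytes(distinct_keys=1, value_size=50, snapshots=1))  # 50
```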

Chances are that you are using a disk that is too slow compared to the
write load you are generating.

Cheers,
Salvatore

--
Salvatore 'antirez' Sanfilippo
open source developer - VMware

http://invece.org
"We are what we repeatedly do. Excellence, therefore, is not an act,
but a habit." -- Aristotele

Hampus Wessman

Apr 17, 2012, 4:46:20 AM
to redi...@googlegroups.com
Hi Daniel,

I assume that you're running Linux. You can configure how much data Linux buffers in memory before it starts to write data to disk and before it blocks writers. It might be useful for you to tweak those settings. Have a look at dirty_ratio, dirty_background_ratio and possibly dirty_expire_centisecs. I think some Linux distributions have very conservative values as defaults. You could try to lower dirty_background_ratio (to start flushing buffers to disk earlier) and increase dirty_ratio (to allow more dirty data before writers are blocked). Does that solve the problem?
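
Before changing anything, it helps to see what the machine is currently using. On Linux these settings are exposed as files under /proc/sys/vm (the kernel names the expiry setting dirty_expire_centisecs). The sketch below only reads them; the function name `read_dirty_settings` is an assumption, and a setting that cannot be read (for example on a non-Linux system) is reported as None.

```python
from pathlib import Path

VM_DIR = Path("/proc/sys/vm")
SETTINGS = ("dirty_background_ratio", "dirty_ratio", "dirty_expire_centisecs")


def read_dirty_settings(vm_dir: Path = VM_DIR) -> dict:
    """Return the current value of each writeback setting, or None if unreadable."""
    values = {}
    for name in SETTINGS:
        try:
            values[name] = int((vm_dir / name).read_text().strip())
        except (OSError, ValueError):
            values[name] = None  # not Linux, or the file is missing or unreadable
    return values


if __name__ == "__main__":
    for name, value in read_dirty_settings().items():
        print(f"vm.{name} = {value}")
```

Changing the values requires root, for example with `sysctl -w vm.dirty_background_ratio=<value>`; suitable values depend on the amount of RAM and the disk, so none are suggested here.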

This page seems to have some good info about the subject:
http://www.westnet.com/~gsmith/content/linux-pdflush.htm

Regards,
Hampus

Daniel Mezzatto

Apr 17, 2012, 9:48:17 AM
to redi...@googlegroups.com
The disks are indeed pretty old. These 12 machines had been running Apache httpd since 2007 and were repurposed for Redis last year.

Don't think we'll be able to upgrade the disks to faster ones...

I'll try changing the settings that Hampus pointed out to see whether they improve performance.

Thanks for the insights.

Salvatore Sanfilippo

Apr 17, 2012, 9:50:30 AM
to redi...@googlegroups.com
On Tue, Apr 17, 2012 at 3:48 PM, Daniel Mezzatto
<daniel....@gmail.com> wrote:

> Don't think we'll be able to upgrade the disks to faster ones...

You may want to try a different filesystem or different ext3/ext4
mount flags; they can make a pretty big difference...
