Recently I started using the Xapian-based notmuch mail client for everyday
use. One of the things I was quite surprised by after the switch was the
incredible hit in interactive performance that is observed during database
updates. Things are particularly bad during runs of 'notmuch new,' which scans
the file system looking for new messages and adds them to the database.
Specifically, the worst of the performance hit appears to occur when the
database is being updated.
During these periods, even small chunks of I/O can become minute-long ordeals.
It is common for latencytop to show 30 second long latencies for page faults
and writing pages. Interactive performance is absolutely abysmal, with other
unrelated processes feeling horrible latencies, causing media players,
editors, and even terminals to grind to a halt.
Despite the system being clearly I/O bound, iostat shows pitiful disk
throughput (700kByte/second read, 300 kByte/second write). Certainly this poor
performance can, at least to some degree, be attributed to the fact that
Xapian uses fdatasync() to ensure data consistency. That being said, it seems
like Xapian's page usage causes horrible thrashing, hence the performance hit
on unrelated processes. Moreover, the hit on unrelated processes is so bad
that I would almost suspect that swap I/O is being serialized by fsync() as
well, despite being on a separate swap partition beyond the control of the
filesystem.
Xapian, however, is far from the first time I have seen this sort of
performance cliff. Rsync, which also uses fsync(), can also trigger this sort
of thrashing during system backups, as can rdiff. slocate's updatedb
absolutely kills interactive performance as well.
Issues similar to this have been widely reported[1-5] in the past, and despite
many attempts[5-8] within both the I/O and memory management subsystems to fix
it, the problem certainly remains. I have tried reducing swappiness from 60 to
40, with some small improvement, and it has been reported[20] that these sorts
of symptoms can be negated through use of memory control groups to prevent
interactive process pages from being evicted.
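For concreteness, the mitigations I have been experimenting with look roughly
like the following (a sketch only; the cgroup mount point, group name, and
limit are placeholders, and I am not claiming these are the right knobs):

  # lower swappiness (I went from the default 60 to 40)
  sysctl -w vm.swappiness=40

  # confine the bulk-I/O job to a memory cgroup so its page cache cannot
  # evict everything else (paths and limit are hypothetical)
  mount -t cgroup -o memory none /cgroup
  mkdir /cgroup/bulk-io
  echo $((512 * 1024 * 1024)) > /cgroup/bulk-io/memory.limit_in_bytes
  echo $$ > /cgroup/bulk-io/tasks    # then run 'notmuch new' from this shell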
I would really like to see this issue finally fixed. I have tried
several[2][3] times to organize the known data about this bug, although in all
cases the discussion has stopped with claims of insufficient data (which is
fair; admittedly, it's a very difficult issue to tackle). However, I do think
that _something_ has to be done to alleviate the thrashing and poor interactive
performance that these workloads cause.
Thanks,
- Ben
[1] http://bugzilla.kernel.org/show_bug.cgi?id=5900
[2] http://bugzilla.kernel.org/show_bug.cgi?id=7372
[3] http://bugzilla.kernel.org/show_bug.cgi?id=12309
[4] http://lkml.org/lkml/2009/4/28/24
[5] http://lkml.org/lkml/2009/3/26/72
[6] http://notmuchmail.org/pipermail/notmuch/2010/001868.html
[10] http://lkml.org/lkml/2009/5/16/225
[11] http://lkml.org/lkml/2007/7/21/219
[12] http://lwn.net/Articles/328363/
[13] http://lkml.org/lkml/2009/4/6/114
[20] http://lkml.org/lkml/2009/4/28/68
What kernel version are you using; what distribution and what version
of that distro are you running; what file system are you using and
what, if any, mount options are you using? And what kind of hard drives
do you have?
I'm going to assume you're running into the standard ext3
"data=ordered" entagled writes problem. There are solutions, such as
switching to using ext4, mounting with data=writeback mode, but they
have various shortcomings.
A number of improvements have been made in ext3 and ext4 since some of
the discussions you quoted, but since you didn't tell us what
distribution version and/or what kernel version you are using, we
can't tell whether you are using those newer improvements yet.
- Ted
On Tue, 16 Mar 2010 21:24:39 -0400, ty...@mit.edu wrote:
> What kernel version are you using; what distribution and what version
> of that distro are you running; what file system are you using and
> what, if any, mount options are you using? And what kind of hard drives
> do you have?
While this problem has been around for some time, my current configuration
is the following:
Kernel 2.6.32 (although also reproducible with kernels at least as early as 2.6.28)
Filesystem: Now Btrfs (was ext4 less than a week ago), default mount options
Hard drive: Seagate Momentus 7200.4 (ST9500420AS)
Distribution: Ubuntu 9.10 (Karmic)
>
> I'm going to assume you're running into the standard ext3
> "data=ordered" entagled writes problem. There are solutions, such as
> switching to using ext4, mounting with data=writeback mode, but they
> have various shortcomings.
>
Unfortunately several people have continued to encounter unacceptable
latency, even with ext4 and data=writeback.
> A number of improvements have been made in ext3 and ext4 since some of
> the discussions you quoted, but since you didn't tell us what
> distribution version and/or what kernel version you are using, we
> can't tell whether you are using those newer improvements yet.
>
Sorry about that. I should know better by now.
- Ben
.... so did switching to Btrfs solve your latency issues, or are you
still having problems?
- Ted
Still having trouble, although I'm now running 2.6.34-rc1 and things seem
mildly better. I'll try doing a backup tonight and report back.
- Ben
On Tue, Mar 16, 2010 at 08:31:12AM -0700, Ben Gamari wrote:
> Hey all,
>
> Recently I started using the Xapian-based notmuch mail client for everyday
> use. One of the things I was quite surprised by after the switch was the
> incredible hit in interactive performance that is observed during database
> updates. Things are particularly bad during runs of 'notmuch new,' which scans
> the file system looking for new messages and adds them to the database.
> Specifically, the worst of the performance hit appears to occur when the
> database is being updated.
>
> During these periods, even small chunks of I/O can become minute-long ordeals.
> It is common for latencytop to show 30 second long latencies for page faults
> and writing pages. Interactive performance is absolutely abysmal, with other
> unrelated processes feeling horrible latencies, causing media players,
> editors, and even terminals to grind to a halt.
>
> Despite the system being clearly I/O bound, iostat shows pitiful disk
> throughput (700kByte/second read, 300 kByte/second write). Certainly this poor
> performance can, at least to some degree, be attributed to the fact that
> Xapian uses fdatasync() to ensure data consistency. That being said, it seems
> like Xapian's page usage causes horrible thrashing, hence the performance hit
> on unrelated processes.
Where are the unrelated processes waiting? Can you get a sample of
several backtraces? (/proc/<pid>/stack should do it)
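Something along these lines, run while a stall is in progress, should do it
(just a sketch; it dumps the kernel stack of every task in uninterruptible
sleep, and needs /proc/<pid>/stack support in your kernel config):

  for pid in $(ps -eo pid=,stat= | awk '$2 ~ /^D/ {print $1}'); do
      echo "=== $(ps -o pid=,comm= -p $pid) ==="
      cat /proc/$pid/stack
  done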
> Moreover, the hit on unrelated processes is so bad
> that I would almost suspect that swap I/O is being serialized by fsync() as
> well, despite being on a separate swap partition beyond the control of the
> filesystem.
It shouldn't be, until it reaches the bio layer. If it is on the same
block device, it will still fight for access. It could also be blocking
on dirty data thresholds, or page reclaim though -- writeback and
reclaim could easily be getting slowed down by the fsync activity.
Swapping tends to cause fairly nasty disk access patterns; combined with
fsync, it could be pretty unavoidable.
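If you want to poke at the dirty thresholds while testing, they are runtime
tunables; the values below are only examples, not recommendations:

  sysctl vm.dirty_background_ratio   # % of memory dirty before background writeback starts
  sysctl vm.dirty_ratio              # % of memory dirty before writers are throttled
  sysctl -w vm.dirty_background_ratio=5 vm.dirty_ratio=10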
>
> Xapian, however, is far from the first time I have seen this sort of
> performance cliff. Rsync, which also uses fsync(), can also trigger this sort
> of thrashing during system backups, as can rdiff. slocate's updatedb
> absolutely kills interactive performance as well.
>
> Issues similar to this have been widely reported[1-5] in the past, and despite
> many attempts[5-8] within both the I/O and memory management subsystems to fix
> it, the problem certainly remains. I have tried reducing swappiness from 60 to
> 40, with some small improvement and it has been reported[20] that these sorts
> of symptoms can be negated through use of memory control groups to prevent
> interactive process pages from being evicted.
So the workload is causing quite a lot of swapping as well? How much
pagecache do you have? It could be that you have too much pagecache and
it is pushing out anonymous memory too easily, or you might have too
little pagecache causing suboptimal writeout patterns (possibly writeout
from page reclaim rather than asynchronous dirty page cleaner threads,
which can really hurt).
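A snapshot of /proc/meminfo taken during one of the stalls would help answer
that, e.g. something like:

  grep -E '^(MemFree|Cached|Dirty|Writeback|Active|Inactive|Swap)' /proc/meminfo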
Thanks,
Nick
> Hi,
>
> On Tue, Mar 16, 2010 at 08:31:12AM -0700, Ben Gamari wrote:
> > Hey all,
> >
> > Recently I started using the Xapian-based notmuch mail client for everyday
> > use. One of the things I was quite surprised by after the switch was the
> > incredible hit in interactive performance that is observed during database
> > updates. Things are particularly bad during runs of 'notmuch new,' which scans
> > the file system looking for new messages and adds them to the database.
> > Specifically, the worst of the performance hit appears to occur when the
> > database is being updated.
> >
> > During these periods, even small chunks of I/O can become minute-long ordeals.
> > It is common for latencytop to show 30 second long latencies for page faults
> > and writing pages. Interactive performance is absolutely abysmal, with other
> > unrelated processes feeling horrible latencies, causing media players,
> > editors, and even terminals to grind to a halt.
> >
> > Despite the system being clearly I/O bound, iostat shows pitiful disk
> > throughput (700kByte/second read, 300 kByte/second write). Certainly this poor
> > performance can, at least to some degree, be attributed to the fact that
> > Xapian uses fdatasync() to ensure data consistency. That being said, it seems
> > like Xapian's page usage causes horrible thrashing, hence the performance hit
> > on unrelated processes.
>
> Where are the unrelated processes waiting? Can you get a sample of several
> backtraces? (/proc/<pid>/stack should do it)
A call-graph profile will show the precise reason for IO latencies, and their
relative likelihood.
It's really simple to do it with a recent kernel. Firstly, enable
CONFIG_BLK_DEV_IO_TRACE=y, CONFIG_EVENT_PROFILE=y:
Kernel performance events and counters (PERF_EVENTS) [Y/?] y
Tracepoint profiling sources (EVENT_PROFILE) [Y/n/?] y
Support for tracing block IO actions (BLK_DEV_IO_TRACE) [N/y/?] y
(boot into this kernel)
Then build perf via:
cd tools/perf/
make -j install
and then capture 10 seconds of the DB workload:
perf record -f -g -a -e block:block_rq_issue -c 1 sleep 10
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.251 MB perf.data (~10977 samples) ]
and look at the call-graph output:
perf report
# Samples: 5
#
# Overhead Command Shared Object Symbol
# ........ ............... ................. ......
#
80.00% kjournald [kernel.kallsyms] [k] perf_trace_block_rq_issue
|
--- perf_trace_block_rq_issue
scsi_request_fn
|
|--50.00%-- __blk_run_queue
| cfq_insert_request
| elv_insert
| __elv_add_request
| __make_request
| generic_make_request
| submit_bio
| submit_bh
| sync_dirty_buffer
| journal_commit_transaction
| kjournald
| kthread
| kernel_thread_helper
|
--50.00%-- __generic_unplug_device
generic_unplug_device
blk_unplug
blk_backing_dev_unplug
sync_buffer
__wait_on_bit
out_of_line_wait_on_bit
__wait_on_buffer
wait_on_buffer
journal_commit_transaction
kjournald
kthread
kernel_thread_helper
20.00% as [kernel.kallsyms] [k] perf_trace_block_rq_issue
|
--- perf_trace_block_rq_issue
scsi_request_fn
__generic_unplug_device
generic_unplug_device
blk_unplug
blk_backing_dev_unplug
page_cache_async_readahead
generic_file_aio_read
do_sync_read
vfs_read
sys_read
system_call_fastpath
0x39f8ad4930
This (very simple) example had 80% of the IO in kjournald and 20% of it in
'as'. The precise call-paths of IO issues are visible.
For general scheduler context-switch events you can use:
perf record -f -g -a -e context-switches -c 1 sleep 10
see 'perf list' for all events.
Thanks,
Ingo
I am experiencing a very similar issue. My system is a regular desktop
PC and it suffers from very high I/O latencies (sometimes the desktop
"hangs" for eight seconds or more) when copying large files. I tried
kernels up to 2.6.34-rc2, but without luck. This issue was raised on the
Phoronix forums, and Arjan (from Intel) noted that it could be VM related:
http://www.phoronix.com/forums/showpost.php?p=114975&postcount=51
Here is my perf timechart, where you can see I/O "stealing" CPU from
the other tasks:
http://hotfile.com/dl/30596827/ebe566b/output.svg.gz.html
Regards!
P.S. If there is some way I can help more, please just let me know.
It's also been my sneaking suspicion that swap is involved. I have lots
of RAM in everything I use, even the laptop and workstation. I'll try to
run some tests with less memory to force it into swap; I've seen nasty
hangs that way.
--
Jens Axboe
I would suggest that you include a 2.6.31 kernel in your testing. I have
seen something that looks like "huge" stalls in 2.6.32, but I haven't been
able to "dig into it" to find out more.
In 2.6.32 I have seen IO-wait numbers around 80% on a 16-core machine
with 128GB of memory, and load numbers over 120, under workloads that
didn't make 2.6.31 sweat at all.
Filesystems are a mixture of ext3 and ext4 (so it could be the barriers?).
--
Jesper
I apologize for my extreme tardiness in replying to your responses. I was
hoping to have more time during Spring break to deal with this issue than I
did (as always). Nevertheless, I'll hopefully be able to keep up with things
from this point on. Specific replies will follow.
- Ben
>
> > Moreover, the hit on unrelated processes is so bad
> > that I would almost suspect that swap I/O is being serialized by fsync() as
> > well, despite being on a separate swap partition beyond the control of the
> > filesystem.
>
> It shouldn't be, until it reaches the bio layer. If it is on the same
> block device, it will still fight for access. It could also be blocking
> on dirty data thresholds, or page reclaim though -- writeback and
> reclaim could easily be getting slowed down by the fsync activity.
>
Hmm, this sounds interesting. Is there a way to monitor writeback throughput?
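(I suppose I could just poll the Dirty/Writeback counters in /proc/meminfo,
e.g. something like

  watch -n1 'grep -E "^(Dirty|Writeback)" /proc/meminfo'

which would at least show how quickly dirty pages are being cleaned, but if
there is a better instrument for this I would love to hear about it.)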
> Swapping tends to cause fairly nasty disk access patterns, combined with
> fsync it could be pretty unavoidable.
>
This is definitely a possibility. However, it seems to me that swapping should
be at least mildly favored over other I/O by the I/O scheduler. That being
said, I can certainly see how it would be difficult to implement such a
heuristic in a fair way so as not to block out standard filesystem access
during a thrashing spree.
> >
> > Xapian, however, is far from the first time I have seen this sort of
> > performance cliff. Rsync, which also uses fsync(), can also trigger this sort
> > of thrashing during system backups, as can rdiff. slocate's updatedb
> > absolutely kills interactive performance as well.
> >
> > Issues similar to this have been widely reported[1-5] in the past, and despite
> > many attempts[5-8] within both the I/O and memory management subsystems to fix
> > it, the problem certainly remains. I have tried reducing swappiness from 60 to
> > 40, with some small improvement and it has been reported[20] that these sorts
> > of symptoms can be negated through use of memory control groups to prevent
> > interactive process pages from being evicted.
>
> So the workload is causing quite a lot of swapping as well? How much
> pagecache do you have? It could be that you have too much pagecache and
> it is pushing out anonymous memory too easily, or you might have too
> little pagecache causing suboptimal writeout patterns (possibly writeout
> from page reclaim rather than asynchronous dirty page cleaner threads,
> which can really hurt).
>
As far as I can tell, the workload should fit in memory without a problem. This
machine has 4 gigabytes of memory, of which 2.8GB is currently page cache.
That seems high, perhaps? I've included meminfo below. I can completely see how
an overly aggressive page cache would result in this sort of behavior.
- Ben
MemTotal: 4048068 kB
MemFree: 47232 kB
Buffers: 48 kB
Cached: 2774648 kB
SwapCached: 1148 kB
Active: 2353572 kB
Inactive: 1355980 kB
Active(anon): 1343176 kB
Inactive(anon): 342644 kB
Active(file): 1010396 kB
Inactive(file): 1013336 kB
Unevictable: 0 kB
Mlocked: 0 kB
SwapTotal: 4883756 kB
SwapFree: 4882532 kB
Dirty: 24736 kB
Writeback: 0 kB
AnonPages: 933820 kB
Mapped: 88840 kB
Shmem: 750948 kB
Slab: 150752 kB
SReclaimable: 121404 kB
SUnreclaim: 29348 kB
KernelStack: 2672 kB
PageTables: 31312 kB
NFS_Unstable: 0 kB
Bounce: 0 kB
WritebackTmp: 0 kB
CommitLimit: 6907788 kB
Committed_AS: 2773672 kB
VmallocTotal: 34359738367 kB
VmallocUsed: 364080 kB
VmallocChunk: 34359299100 kB
HugePages_Total: 0
HugePages_Free: 0
HugePages_Rsvd: 0
HugePages_Surp: 0
Hugepagesize: 2048 kB
DirectMap4k: 8552 kB
DirectMap2M: 4175872 kB
- Ben
I have posted another profile[1] from an incident yesterday. As you can see,
both swapper and init (strange?) show up prominently in the profile. Moreover,
most processes seem to be in blk_peek_request for a disturbingly large percentage
of the time. Both of these profiles were taken with 2.6.34-rc kernels.
Anyone have any ideas on how to proceed? Is more profile data necessary? Are
the existing profiles at all useful? Thanks,
- Ben
Apparently my initial email announcing my first set of profiles never made it
out. Sorry for the confusion. I've included it below.
From: Ben Gamari <bgamar...@gmail.com>
Subject: Re: Poor interactive performance with I/O loads with fsync()ing
To: Ingo Molnar <mi...@elte.hu>, Nick Piggin <npi...@suse.de>
Cc: ty...@mit.edu, linux-...@vger.kernel.org, Olly Betts
<ol...@survex.com>, martin f krafft <mad...@madduck.net>
Bcc: bga...@gmail.com
In-Reply-To: <20100317093...@elte.hu>
References: <4b9fa440.12135e...@mx.google.com>
<20100317045350.GA2869@laptop> <20100317093...@elte.hu>
On Wed, 17 Mar 2010 10:37:04 +0100, Ingo Molnar <mi...@elte.hu> wrote:
> A call-graph profile will show the precise reason for IO latencies, and their
> relative likelihood.
Well, here is something for now. I'm not sure how valid the reproduction
workload is (git pull, rsync, and 'notmuch new' all running at once), but I
certainly did produce a few stalls, and swapper is highest in the profile.
This was on 2.6.34-rc2. I've included part of the profile below, although a more
complete set of data is available at [1].
Thanks,
- Ben
[1] http://mw0.mooo.com/~ben/latency-2010-03-25-a/
# Samples: 25295
#
# Overhead Command Shared Object Symbol
# ........ ............... ................. ......
#
24.50% swapper [kernel.kallsyms] [k] blk_peek_request
|
--- blk_peek_request
scsi_request_fn
__blk_run_queue
|
|--98.32%-- blk_run_queue
| scsi_run_queue
| scsi_next_command
| scsi_io_completion
| scsi_finish_command
| scsi_softirq_done
| blk_done_softirq
| __do_softirq
| call_softirq
| do_softirq
| irq_exit
| |
| |--99.56%-- do_IRQ
| | ret_from_intr
| | |
| | |--98.02%-- cpuidle_idle_call
| | | cpu_idle
| | | rest_init
| | | start_kernel
| | | x86_64_start_reservations
| | | x86_64_start_kernel
| | |
| | |--0.91%-- clockevents_notify
| | | lapic_timer_state_broadcast
| | | |
| | | |--83.64%-- acpi_idle_enter_bm
| | | | cpuidle_idle_call
| | | | cpu_idle
| | | | rest_init
| | | | start_kernel
| | | | x86_64_start_reservations
| | | | x86_64_start_kernel
| | | |
| | | --16.36%-- acpi_idle_enter_simple
| | | cpuidle_idle_call
| | | cpu_idle
| | | rest_init
| | | start_kernel
| | | x86_64_start_reservations
| | | x86_64_start_kernel
| | |
| | |--0.81%-- cpu_idle
| | | rest_init
| | | start_kernel
| | | x86_64_start_reservations
| | | x86_64_start_kernel
| | --0.26%-- [...]
| --0.44%-- [...]
|
--1.68%-- elv_completed_request
__blk_put_request
blk_finish_request
blk_end_bidi_request
blk_end_request
scsi_io_completion
scsi_finish_command
scsi_softirq_done
blk_done_softirq
__do_softirq
call_softirq
do_softirq
irq_exit
do_IRQ
ret_from_intr
|
|--96.15%-- cpuidle_idle_call
| cpu_idle
| rest_init
| start_kernel
| x86_64_start_reservations
| x86_64_start_kernel
|
|--1.92%-- cpu_idle
| rest_init
| start_kernel
| x86_64_start_reservations
| x86_64_start_kernel
|
|--0.96%-- schedule
| cpu_idle
| rest_init
| start_kernel
| x86_64_start_reservations
| x86_64_start_kernel
|
--0.96%-- clockevents_notify
lapic_timer_state_broadcast
acpi_idle_enter_bm
cpuidle_idle_call
cpu_idle
rest_init
start_kernel
x86_64_start_reservations
x86_64_start_kernel
23.74% init [kernel.kallsyms] [k] blk_peek_request
|
--- blk_peek_request
scsi_request_fn
__blk_run_queue
|
|--98.77%-- blk_run_queue
| scsi_run_queue
| scsi_next_command
| scsi_io_completion
| scsi_finish_command
| scsi_softirq_done
| blk_done_softirq
| __do_softirq
| call_softirq
| do_softirq
| irq_exit
| |
| |--99.87%-- do_IRQ
| | ret_from_intr
| | |
| | |--98.38%-- cpuidle_idle_call
| | | cpu_idle
| | | start_secondary
| | |
| | |--0.81%-- schedule
| | | cpu_idle
| | | start_secondary
| | |
| | |--0.56%-- cpu_idle
| | | start_secondary
| | --0.25%-- [...]
| --0.13%-- [...]
|
--1.23%-- elv_completed_request
__blk_put_request
blk_finish_request
blk_end_bidi_request
blk_end_request
scsi_io_completion
scsi_finish_command
scsi_softirq_done
blk_done_softirq
__do_softirq
call_softirq
do_softirq
irq_exit
do_IRQ
ret_from_intr
cpuidle_idle_call
cpu_idle
start_secondary
5.85% chromium-browse [kernel.kallsyms] [k] blk_peek_request
|
--- blk_peek_request
scsi_request_fn
__blk_run_queue
blk_run_queue
scsi_run_queue
scsi_next_command
scsi_io_completion
scsi_finish_command
scsi_softirq_done
blk_done_softirq
__do_softirq
call_softirq
do_softirq
irq_exit
do_IRQ
ret_from_intr
|
|--50.00%-- check_match.8653
|
--50.00%-- unlink_anon_vmas
free_pgtables
exit_mmap
mmput
exit_mm
do_exit
do_group_exit
sys_exit_group
system_call
...
> Hey all,
>
> I have posted another profile[1] from an incident yesterday. As you
> can see, both swapper and init (strange?) show up prominently in the
> profile. Moreover, most processes seem to be in blk_peek_request for a
> disturbingly large percentage of the time. Both of these profiles
> were taken with 2.6.34-rc kernels.
>
> Anyone have any ideas on how to proceed? Is more profile data
> necessary? Are the existing profiles at all useful? Thanks,
profiles tend to be about cpu usage... and are rather poor at dealing with
anything IO related.
latencytop might get closer to giving useful information....
(btw, a general suggestion: make sure you're using noatime or
relatime as a mount option)
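For example, an fstab entry along these lines -- the device, mount point and
filesystem type here are just placeholders:

  /dev/sdXn  /home  ext4  defaults,relatime  0  2

or, to apply it without a reboot: mount -o remount,relatime /home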
--
Arjan van de Ven Intel Open Source Technology Centre
For development, discussion and tips for power savings,
visit http://www.lesswatts.org
perf record -f -g -a -e block:block_rq_issue -c 1
Which I believe measures block requests issued, not CPU usage (correct me if
I'm wrong).
> profiles tend to be about cpu usage... and are rather poor to deal with
> anything IO related.
>
See above.
> latencytop might get closer in giving useful information....
>
Latencytop generally shows a large amount of time handling page faults.
> (btw some general suggestion.. make sure you're using noatime or
> relatime as mount option)
Thanks for the suggestion. I had actually forgotten relatime in my fstab, so
we'll see if there's any improvement now. That being said, I/O loads over small
numbers of files (e.g. xapian) are just as bad as loads over large numbers of
files. To me that weakly suggests that atime updates aren't the issue (I
could be horribly wrong though).
- Ben
You don't say which file system you use, but ext3 and the file systems
with similar journal design (like reiserfs) all have known fsync starvation
issues. The problem is that any fsync has to wait for all transactions
to commit, and this might take a long time depending on how busy
the disk is.
ext4/XFS/JFS/btrfs should be better in this regard
-Andi
--
a...@linux.intel.com -- Speaking for myself only.
> It's also been my sneaking suspicion that swap is involved. I had lots
> of RAM in anything I use, even the laptop and workstation. I'll try and
> run some tests with lower memory and force it into swap, I've seen nasty
> hangs that way.
I am not sure that swapping is the cause here. According to the KDE
System Monitor, swap is not even touched when copying files. I also
noticed similar responsiveness problems when extracting some rar files
(twelve parts, each about 100MB): the system becomes unresponsive
for some time within the first seconds of the operation, then it behaves
normally, and then the problem comes back. I have 2GB of RAM. I am using the
ext4 file system with the noatime mount option.
P.S. It seems it is a little better when I have AHCI mode set in BIOS
(at least when extracting archives).
P.S.2 I would be glad to provide more useful data. I created a perf
timechart, but if this is not enough, please just tell me what I should do
next.
Regards
Pawel
As I've said in the past, I am very interested in seeing this problem looked at and
would love to contribute whatever I can to that effort. However, without knowing what
information is necessary, I can be of only very limited use in my own debugging
efforts. Thanks,
- Ben
btrfs is known to perform poorly under fsync.
--
error compiling committee.c: too many arguments to function
By design, a copy-on-write tree filesystem would need to flush a whole
tree hierarchy on a sync. btrfs avoids this by using a special
log for fsync, but that causes more overhead if you have that
log on the same disk, so the IO subsystem will do more work.
It's a bit like JBD data journaling.
However, it should not have the stalls inherent in ext3's journaling.
-Andi
--
a...@linux.intel.com -- Speaking for myself only.
> On 04/09/2010 05:56 PM, Ben Gamari wrote:
> > On Mon, 29 Mar 2010 00:08:58 +0200, Andi Kleen<an...@firstfloor.org> wrote:
> >
> > > Ben Gamari<bgamar...@gmail.com> writes:
> > > ext4/XFS/JFS/btrfs should be better in this regard
> > >
> > >
> > I am using btrfs, so yes, I was expecting things to be better.
> > Unfortunately,
> > the improvement seems to be non-existent under high IO/fsync load.
> >
> >
>
> btrfs is known to perform poorly under fsync.
XFS does not do much better. Just moved my VM images back to ext for
that reason.
Thanks,
tglx
Did you move from XFS to ext3? ext3 defaults to barriers off, XFS on,
which can make a big difference depending on the disk. You can
disable them on XFS too of course, with the known drawbacks.
XFS also typically needs some tuning to get reasonable log sizes.
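For example (only illustrative numbers; the right values depend on the
workload, and the device name is a placeholder):

  mkfs.xfs -l size=128m /dev/sdXn                    # larger log at mkfs time
  mount -o logbufs=8,logbsize=256k /dev/sdXn /mnt    # more/larger in-core log buffers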
My point was merely (before people chime in with counter examples)
that XFS/btrfs/jfs don't suffer from the "need to sync all transactions for
every fsync" issue. There can (and will be) still other issues.
-Andi
--
a...@linux.intel.com -- Speaking for myself only.
> > XFS does not do much better. Just moved my VM images back to ext for
> > that reason.
>
> Did you move from XFS to ext3? ext3 defaults to barriers off, XFS on,
> which can make a big difference depending on the disk. You can
> disable them on XFS too of course, with the known drawbacks.
>
> XFS also typically needs some tuning to get reasonable log sizes.
>
> My point was merely (before people chime in with counter examples)
> that XFS/btrfs/jfs don't suffer from the "need to sync all transactions for
> every fsync" issue. There can (and will be) still other issues.
Yes, I moved them back from XFS to ext3 simply because moving them
from ext3 to XFS turned out to be a completely unusable disaster.
I know that I can tweak knobs on XFS (or any other file system), but I
would not have expected it to suck that much for KVM with the
default settings, which are perfectly fine for the other use cases
that made us move to XFS.
Thanks,
tglx
Thomas, what Andi was merely pointing out is that xfs has a rather
consequential difference in defaults: barriers, which hurt with fsync().
In order to make a fair comparison of the two, you may want to mount xfs
with nobarrier or ext3 with the barrier option set, and _then_ check which one
sucks less.
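Concretely, something along these lines (the mount points being whatever you
actually use):

  mount -o remount,nobarrier /path/to/xfs-mount      # XFS without barriers
  mount -o remount,barrier=1 /path/to/ext3-mount     # ext3 with barriers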
I guess that outcome will be interesting for quite a bunch of people in the
audience (including me¹).
Pete
¹) while in transition to getting rid of even suckier technology junk like
VMware-Server - but digging out a current², yet _stable_, kernel release
seems harder than ever nowadays.
²) with operational VT-d support for kvm
Numbers? Workload description? Mount options? I hate it when all I
hear is "XFS sucked, so I went back to extN" reports without any
more details - it's hard to improve anything without any details
of the problems.
Also worth remembering is that XFS defaults to slow-but-safe
options, but ext3 defaults to fast-and-I-don't-give-a-damn-about-
data-safety, so there's a world of difference between the
filesystem defaults....
And FWIW, I run all my VMs on XFS using default mkfs and mount options,
and I can't say that I've noticed any performance problems at all
despite hammering the IO subsystems all the time. The only thing
I've ever done is occasionally run xfs_fsr across permanent qcow2
VM images to defrag them as they grow slowly over time...
Cheers,
Dave.
--
Dave Chinner
da...@fromorbit.com
And if you are asking for details, the type of storage you use is also
quite interesting.
Thanks!
Ric