On Tue, 2 Mar 2010 16:31:03 GMT
bugzill...@bugzilla.kernel.org wrote:
> http://bugzilla.kernel.org/show_bug.cgi?id=15426
>
> Summary: Running many copies of bonnie++ on different
> filesystems seems to deadlock in sync
> Product: File System
> Version: 2.5
> Kernel Version: 2.6.32
> Platform: All
> OS/Version: Linux
> Tree: Mainline
> Status: NEW
> Severity: normal
> Priority: P1
> Component: VFS
> AssignedTo: fs_...@kernel-bugs.osdl.org
> ReportedBy: jo...@calva.com
> Regression: No
>
>
> I create 8 ext3 filesystems on 8 disks and run a copy of bonnie++ on each
> filesystem. All but two or three of the Bonnie++ processes seem to hang in
> sync and I get log messages like:
>
> [ 1557.377626] INFO: task bonnie:2503 blocked for more than 120 seconds.
> [ 1557.377693] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this
> message.
> [ 1557.377775] bonnie D 0000000000000000 0 2503 2489 0x00000000
> [ 1557.377778] ffffffff8144b1f0 0000000000000082 0000000000000000
> ffff88034ac155c0
> [ 1557.377780] ffff88033c079880 ffffffff81047f43 000000000000f8a0
> ffff88063b1e9fd8
> [ 1557.377783] 00000000000155c0 00000000000155c0 ffff88063bbe3f90
> ffff88063bbe4288
> [ 1557.377785] Call Trace:
> [ 1557.377787] [<ffffffff81047f43>] ? finish_task_switch+0x3a/0xa7
> [ 1557.377790] [<ffffffff812ee653>] ? __mutex_lock_common+0x10b/0x17b
> [ 1557.377792] [<ffffffff812ee77b>] ? mutex_lock+0x1a/0x31
> [ 1557.377794] [<ffffffff8110a3cd>] ? sync_filesystems+0x13/0xe3
> [ 1557.377796] [<ffffffff8110a4d8>] ? sys_sync+0x12/0x2e
> [ 1557.377799] [<ffffffff81010b42>] ? system_call_fastpath+0x16/0x1b
>
When the system is stuck, please run
echo w > /proc/sysrq-trigger
or, equivalently, hit `alt-sysrq-w'.
This will emit huge amounts of stack traces into the system logs and
onto the console. There's no need to send all of them - please
identify typical-looking ones and then send them in reply to this
email. Email clients like to wordwrap these traces - please try to
prevent that.
Thanks.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majo...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
[ 1128.964403] kjournald D 0000000000000000 0 2196 2 0x00000000
[ 1128.964407] ffffffff8144b1f0 0000000000000046 0000000000000000 ffffffff811837bb
[ 1128.964409] ffff88063d1762e0 ffff880340002c80 000000000000f8a0 ffff88063c6e9fd8
[ 1128.964412] 00000000000155c0 00000000000155c0 ffff88063d1762e0 ffff88063d1765d8
[ 1128.964415] Call Trace:
[ 1128.964423] [<ffffffff811837bb>] ? cfq_set_request+0x2af/0x319
[ 1128.964427] [<ffffffff811780d7>] ? get_request+0x7e/0x2ba
[ 1128.964431] [<ffffffff812edf36>] ? io_schedule+0x73/0xb7
[ 1128.964433] [<ffffffff81178403>] ? get_request_wait+0xf0/0x188
[ 1128.964437] [<ffffffff81064a56>] ? autoremove_wake_function+0x0/0x2e
[ 1128.964441] [<ffffffff8117b28d>] ? blk_recount_segments+0x17/0x27
[ 1128.964443] [<ffffffff81178792>] ? __make_request+0x2f7/0x428
[ 1128.964447] [<ffffffff81176f6f>] ? generic_make_request+0x299/0x2f9
[ 1128.964450] [<ffffffff811770a5>] ? submit_bio+0xd6/0xf2
[ 1128.964455] [<ffffffff8110b50b>] ? submit_bh+0xf5/0x115
[ 1128.964463] [<ffffffffa017f5a2>] ? journal_commit_transaction+0x8d9/0xe2b [jbd]
[ 1128.964467] [<ffffffffa01823fb>] ? kjournald+0xdf/0x226 [jbd]
[ 1128.964469] [<ffffffff81064a56>] ? autoremove_wake_function+0x0/0x2e
[ 1128.964472] [<ffffffffa018231c>] ? kjournald+0x0/0x226 [jbd]
[ 1128.964474] [<ffffffff81064789>] ? kthread+0x79/0x81
[ 1128.964479] [<ffffffff81011baa>] ? child_rip+0xa/0x20
[ 1128.964481] [<ffffffff81064710>] ? kthread+0x0/0x81
[ 1128.964483] [<ffffffff81011ba0>] ? child_rip+0x0/0x20
And three of (flush-8:32, flush-8:48 and flush-8:64):
[ 1128.964485] flush-8:32 D 0000000000000000 0 2221 2 0x00000000
[ 1128.964488] ffff88033e44b880 0000000000000046 0000000000000000 ffffffff811837bb
[ 1128.964490] ffff88063d177100 0000001000022c80 000000000000f8a0 ffff88063b66dfd8
[ 1128.964493] 00000000000155c0 00000000000155c0 ffff88063d177100 ffff88063d1773f8
[ 1128.964495] Call Trace:
[ 1128.964497] [<ffffffff811837bb>] ? cfq_set_request+0x2af/0x319
[ 1128.964500] [<ffffffff811780d7>] ? get_request+0x7e/0x2ba
[ 1128.964502] [<ffffffff812edf36>] ? io_schedule+0x73/0xb7
[ 1128.964504] [<ffffffff81178403>] ? get_request_wait+0xf0/0x188
[ 1128.964506] [<ffffffff81064a56>] ? autoremove_wake_function+0x0/0x2e
[ 1128.964508] [<ffffffff81178792>] ? __make_request+0x2f7/0x428
[ 1128.964511] [<ffffffff81176f6f>] ? generic_make_request+0x299/0x2f9
[ 1128.964514] [<ffffffff810114ce>] ? common_interrupt+0xe/0x13
[ 1128.964516] [<ffffffff811770a5>] ? submit_bio+0xd6/0xf2
[ 1128.964518] [<ffffffff8110b50b>] ? submit_bh+0xf5/0x115
[ 1128.964522] [<ffffffff8110da7c>] ? __block_write_full_page+0x1d6/0x2ac
[ 1128.964525] [<ffffffff8110c801>] ? end_buffer_async_write+0x0/0x13b
[ 1128.964528] [<ffffffff8110fabc>] ? blkdev_get_block+0x0/0x57
[ 1128.964532] [<ffffffff810b9572>] ? __writepage+0xa/0x25
[ 1128.964534] [<ffffffff810b9bf9>] ? write_cache_pages+0x20b/0x327
[ 1128.964536] [<ffffffff810b9568>] ? __writepage+0x0/0x25
[ 1128.964541] [<ffffffff8110642e>] ? writeback_single_inode+0xe7/0x2da
[ 1128.964543] [<ffffffff81107107>] ? writeback_inodes_wb+0x423/0x4fe
[ 1128.964546] [<ffffffff8110730e>] ? wb_writeback+0x12c/0x1ab
[ 1128.964549] [<ffffffff81107574>] ? wb_do_writeback+0x145/0x15b
[ 1128.964551] [<ffffffff811075bb>] ? bdi_writeback_task+0x31/0x9d
[ 1128.964556] [<ffffffff810c7cc6>] ? bdi_start_fn+0x0/0xca
[ 1128.964558] [<ffffffff810c7d36>] ? bdi_start_fn+0x70/0xca
[ 1128.964560] [<ffffffff810c7cc6>] ? bdi_start_fn+0x0/0xca
[ 1128.964562] [<ffffffff81064789>] ? kthread+0x79/0x81
[ 1128.964564] [<ffffffff81011baa>] ? child_rip+0xa/0x20
[ 1128.964566] [<ffffffff81064710>] ? kthread+0x0/0x81
[ 1128.964568] [<ffffffff81011ba0>] ? child_rip+0x0/0x20
And a few copies of bonnie (14 in total):
Two like this:
[ 1128.964735] bonnie D 0000000000000000 0 2240 2236 0x00000000
[ 1128.964737] ffff88033e44b880 0000000000000086 0000000000000000 00000000000009ef
[ 1128.964740] ffff8802bac52000 0000000000430bf5 000000000000f8a0 ffff88033d031fd8
[ 1128.964743] 00000000000155c0 00000000000155c0 ffff88033d811c40 ffff88033d811f38
[ 1128.964745] Call Trace:
[ 1128.964747] [<ffffffff8110c637>] ? sync_buffer+0x0/0x40
[ 1128.964749] [<ffffffff812edf36>] ? io_schedule+0x73/0xb7
[ 1128.964752] [<ffffffff8110c672>] ? sync_buffer+0x3b/0x40
[ 1128.964754] [<ffffffff812ee339>] ? __wait_on_bit_lock+0x3f/0x84
[ 1128.964756] [<ffffffff8110c637>] ? sync_buffer+0x0/0x40
[ 1128.964758] [<ffffffff812ee3e9>] ? out_of_line_wait_on_bit_lock+0x6b/0x77
[ 1128.964760] [<ffffffff81064a84>] ? wake_bit_function+0x0/0x23
[ 1128.964764] [<ffffffffa017e068>] ? do_get_write_access+0x82/0x441 [jbd]
[ 1128.964766] [<ffffffff8110bbed>] ? __getblk+0x26/0x29a
[ 1128.964769] [<ffffffffa017e449>] ? journal_get_write_access+0x22/0x33 [jbd]
[ 1128.964774] [<ffffffffa01a040f>] ? __ext3_journal_get_write_access+0x1e/0x47 [ext3]
[ 1128.964778] [<ffffffffa019418e>] ? ext3_reserve_inode_write+0x3e/0x75 [ext3]
[ 1128.964782] [<ffffffffa0199d86>] ? ext3_orphan_add+0x94/0x15b [ext3]
[ 1128.964786] [<ffffffffa01941f8>] ? ext3_mark_inode_dirty+0x33/0x3c [ext3]
[ 1128.964790] [<ffffffffa019a6cb>] ? ext3_unlink+0x166/0x1b4 [ext3]
[ 1128.964793] [<ffffffff810f645d>] ? vfs_unlink+0x72/0xa6
[ 1128.964795] [<ffffffff810f7cb5>] ? do_unlinkat+0xad/0x134
[ 1128.964797] [<ffffffff810f9b11>] ? vfs_readdir+0x92/0xa7
[ 1128.964800] [<ffffffff810f9c9d>] ? sys_getdents+0xb9/0xc7
[ 1128.964803] [<ffffffff81010b42>] ? system_call_fastpath+0x16/0x1b
One like this:
[ 1128.964804] bonnie D 0000000000000000 0 2241 2236 0x00000000
[ 1128.964807] ffff88033e44b880 0000000000000082 0000000000000000 ffffffff810b90a9
[ 1128.964809] ffff880000022c80 0000000000000000 000000000000f8a0 ffff88033c3abfd8
[ 1128.964812] 00000000000155c0 00000000000155c0 ffff88033d813f90 ffff88033d814288
[ 1128.964814] Call Trace:
[ 1128.964816] [<ffffffff810b90a9>] ? __alloc_pages_nodemask+0x10f/0x5ce
[ 1128.964819] [<ffffffff81064977>] ? bit_waitqueue+0x10/0xa0
[ 1128.964822] [<ffffffffa017e212>] ? do_get_write_access+0x22c/0x441 [jbd]
[ 1128.964824] [<ffffffff81064a84>] ? wake_bit_function+0x0/0x23
[ 1128.964826] [<ffffffff8110bbed>] ? __getblk+0x26/0x29a
[ 1128.964829] [<ffffffff810170cd>] ? read_tsc+0xa/0x20
[ 1128.964832] [<ffffffffa017e449>] ? journal_get_write_access+0x22/0x33 [jbd]
[ 1128.964836] [<ffffffffa01a040f>] ? __ext3_journal_get_write_access+0x1e/0x47 [ext3]
[ 1128.964840] [<ffffffffa019418e>] ? ext3_reserve_inode_write+0x3e/0x75 [ext3]
[ 1128.964844] [<ffffffffa01941e6>] ? ext3_mark_inode_dirty+0x21/0x3c [ext3]
[ 1128.964847] [<ffffffffa019a6b4>] ? ext3_unlink+0x14f/0x1b4 [ext3]
[ 1128.964850] [<ffffffff810f645d>] ? vfs_unlink+0x72/0xa6
[ 1128.964852] [<ffffffff810f7cb5>] ? do_unlinkat+0xad/0x134
[ 1128.964854] [<ffffffff810f9b11>] ? vfs_readdir+0x92/0xa7
[ 1128.964856] [<ffffffff810f9c9d>] ? sys_getdents+0xb9/0xc7
[ 1128.964859] [<ffffffff81010b42>] ? system_call_fastpath+0x16/0x1b
One like this:
[ 1128.964955] bonnie D 0000000000000000 0 2244 2236 0x00000000
[ 1128.964958] ffffffff8144b1f0 0000000000000082 0000000000000000 ffffffff811837bb
[ 1128.964960] ffff88033d813170 00000010a00fc7ab 000000000000f8a0 ffff88033ceadfd8
[ 1128.964963] 00000000000155c0 00000000000155c0 ffff88033d813170 ffff88033d813468
[ 1128.964965] Call Trace:
[ 1128.964967] [<ffffffff811837bb>] ? cfq_set_request+0x2af/0x319
[ 1128.964969] [<ffffffff811780d7>] ? get_request+0x7e/0x2ba
[ 1128.964971] [<ffffffff812edf36>] ? io_schedule+0x73/0xb7
[ 1128.964973] [<ffffffff81178403>] ? get_request_wait+0xf0/0x188
[ 1128.964976] [<ffffffff81064a56>] ? autoremove_wake_function+0x0/0x2e
[ 1128.964978] [<ffffffff81178792>] ? __make_request+0x2f7/0x428
[ 1128.964980] [<ffffffff81176f6f>] ? generic_make_request+0x299/0x2f9
[ 1128.964983] [<ffffffff811770a5>] ? submit_bio+0xd6/0xf2
[ 1128.964985] [<ffffffff8110b50b>] ? submit_bh+0xf5/0x115
[ 1128.964987] [<ffffffff8110da7c>] ? __block_write_full_page+0x1d6/0x2ac
[ 1128.964990] [<ffffffff8110c801>] ? end_buffer_async_write+0x0/0x13b
[ 1128.964992] [<ffffffff8110fabc>] ? blkdev_get_block+0x0/0x57
[ 1128.964994] [<ffffffff810b9572>] ? __writepage+0xa/0x25
[ 1128.964997] [<ffffffff810b9bf9>] ? write_cache_pages+0x20b/0x327
[ 1128.964999] [<ffffffff810b9568>] ? __writepage+0x0/0x25
[ 1128.965003] [<ffffffff810b3cda>] ? __filemap_fdatawrite_range+0x4b/0x54
[ 1128.965007] [<ffffffffa019b5bb>] ? ext3_sync_fs+0x42/0x4b [ext3]
[ 1128.965009] [<ffffffff810b4998>] ? filemap_write_and_wait+0x17/0x32
[ 1128.965013] [<ffffffff8112d015>] ? sync_quota_sb+0x5d/0xf6
[ 1128.965015] [<ffffffff8110a38d>] ? __sync_filesystem+0x43/0x70
[ 1128.965017] [<ffffffff8110a454>] ? sync_filesystems+0x9a/0xe3
[ 1128.965019] [<ffffffff8110a4e2>] ? sys_sync+0x1c/0x2e
[ 1128.965021] [<ffffffff81010b42>] ? system_call_fastpath+0x16/0x1b
Seven bonnies like this:
[ 1128.965231] bonnie D 0000000000000000 0 2253 2236 0x00000000
[ 1128.965233] ffffffff8144b1f0 0000000000000082 0000000000000000 ffff88034ac155c0
[ 1128.965236] ffff88033ce7f700 ffffffff81047f43 000000000000f8a0 ffff88033c249fd8
[ 1128.965239] 00000000000155c0 00000000000155c0 ffff88033d814db0 ffff88033d8150a8
[ 1128.965241] Call Trace:
[ 1128.965243] [<ffffffff81047f43>] ? finish_task_switch+0x3a/0xa7
[ 1128.965246] [<ffffffff812ee653>] ? __mutex_lock_common+0x10b/0x17b
[ 1128.965248] [<ffffffff812ee77b>] ? mutex_lock+0x1a/0x31
[ 1128.965250] [<ffffffff8110a3cd>] ? sync_filesystems+0x13/0xe3
[ 1128.965252] [<ffffffff8110a4d8>] ? sys_sync+0x12/0x2e
[ 1128.965255] [<ffffffff81010b42>] ? system_call_fastpath+0x16/0x1b
And three bonnies like this:
[ 1128.965023] bonnie D 0000000000000000 0 2245 2236 0x00000000
[ 1128.965026] ffff88033e44b880 0000000000000082 0000000000000000 00000000000155c0
[ 1128.965028] 0000000000000005 00000000000155c0 000000000000f8a0 ffff88033cfc3fd8
[ 1128.965031] 00000000000155c0 00000000000155c0 ffff88033d817100 ffff88033d8173f8
[ 1128.965033] Call Trace:
[ 1128.965035] [<ffffffff812ee653>] ? __mutex_lock_common+0x10b/0x17b
[ 1128.965038] [<ffffffff8104a196>] ? try_to_wake_up+0x249/0x259
[ 1128.965040] [<ffffffff812ee77b>] ? mutex_lock+0x1a/0x31
[ 1128.965042] [<ffffffff8110a3cd>] ? sync_filesystems+0x13/0xe3
[ 1128.965044] [<ffffffff8110a4d8>] ? sys_sync+0x12/0x2e
[ 1128.965047] [<ffffffff81010b42>] ? system_call_fastpath+0x16/0x1b
Is IO still going on, or does it appear to be stuck? From the traces
below, we have various procs caught in waiting for a request. So if
things are totally stuck, it could be some race in there.
--
Jens Axboe
Just a thought. What exactly is sync(2) supposed to do - block until
there are no more dirty pages, or block until all pages that were dirty
when the sync was done are clean? In other words is the problem simply
that pages are being dirtied faster than the sync is writing them out?
Our sync is currently broken in that regard, since it'll wait for too
long. We have a debated patch going, I have included it below. Any
chance you could give it a whirl?
The semantics of sync are supposed to be 'wait for dirty IO generated
BEFORE this sync call'.
--
Jens Axboe
Looking at your bug report, I saw that you were on 2.6.32. If you are
testing 2.6.33, let me know, the patch wont apply (I have another
variant for that).
--
Jens Axboe
> Our sync is currently broken in that regard, since it'll wait for too
> long. We have a debated patch going, I have included it below. Any
> chance you could give it a whirl?
That's the same patch you posted two weeks ago and on which Linus
commented he thinks it is broken. He also proposed another, simpler
patch (included below) which I'm running for a few days and which
fixes the sync problem for me.
However, it is not clear at all whether that simpler patch is
correct. Any comments on that?
Thanks
Andre
---
fs/fs-writeback.c | 15 ++++++++++-----
1 files changed, 10 insertions(+), 5 deletions(-)
diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
index 1a7c42c..a0a8424 100644
--- a/fs/fs-writeback.c
+++ b/fs/fs-writeback.c
@@ -738,11 +738,16 @@ static long wb_writeback(struct bdi_writeback *wb,
long wrote = 0;
struct inode *inode;
- if (wbc.for_kupdate) {
- wbc.older_than_this = &oldest_jif;
- oldest_jif = jiffies -
- msecs_to_jiffies(dirty_expire_interval * 10);
- }
+ /*
+ * We never write back data that was dirtied _after_ we
+ * started writeback. But kupdate doesn't even want to
+ * write back recently dirtied stuff, only older data.
+ */
+ oldest_jif = jiffies-1;
+ wbc.older_than_this = &oldest_jif;
+ if (wbc.for_kupdate)
+ oldest_jif -= msecs_to_jiffies(dirty_expire_interval * 10);
+
if (!wbc.range_cyclic) {
wbc.range_start = 0;
wbc.range_end = LLONG_MAX;
--
The only person who always got his work done by Friday was Robinson Crusoe
I'll try the "simpler" patch.
Does it make no difference to Jens' patch, or no difference to an
unpatched kernel?
BTW, switching from cfq to noop or deadline also improved things a
bit for me.
Thanks
Andre