INFO: task hung in mpage_prepare_extent_to_map


syzbot

Oct 28, 2019, 3:52:10 PM
to ak...@linux-foundation.org, amir...@gmail.com, darric...@oracle.com, han...@cmpxchg.org, hu...@google.com, ja...@suse.cz, jgl...@redhat.com, jo...@toxicpanda.com, kirill....@linux.intel.com, linux-...@vger.kernel.org, linu...@kvack.org, s...@canb.auug.org.au, songliu...@fb.com, syzkall...@googlegroups.com, william....@oracle.com, wi...@infradead.org
Hello,

syzbot found the following crash on:

HEAD commit: 12d61c69 Add linux-next specific files for 20191024
git tree: linux-next
console output: https://syzkaller.appspot.com/x/log.txt?x=15a0fa97600000
kernel config: https://syzkaller.appspot.com/x/.config?x=afb75fd8c9fd5ed8
dashboard link: https://syzkaller.appspot.com/bug?extid=efb9e48b9fbdc49bb34a
compiler: gcc (GCC) 9.0.0 20181231 (experimental)
syz repro: https://syzkaller.appspot.com/x/repro.syz?x=13a63dc4e00000

The bug was bisected to:

commit 9c61acffe2b8833152041f7b6a02d1d0a17fd378
Author: Song Liu <songliu...@fb.com>
Date: Wed Oct 23 00:24:28 2019 +0000

mm,thp: recheck each page before collapsing file THP

bisection log: https://syzkaller.appspot.com/x/bisect.txt?x=13eb6ec0e00000
final crash: https://syzkaller.appspot.com/x/report.txt?x=101b6ec0e00000
console output: https://syzkaller.appspot.com/x/log.txt?x=17eb6ec0e00000

IMPORTANT: if you fix the bug, please add the following tag to the commit:
Reported-by: syzbot+efb9e4...@syzkaller.appspotmail.com
Fixes: 9c61acffe2b8 ("mm,thp: recheck each page before collapsing file THP")

INFO: task khugepaged:1084 blocked for more than 143 seconds.
Not tainted 5.4.0-rc4-next-20191024 #0
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
khugepaged D27568 1084 2 0x80004000
Call Trace:
context_switch kernel/sched/core.c:3384 [inline]
__schedule+0x94a/0x1e70 kernel/sched/core.c:4069
schedule+0xd9/0x260 kernel/sched/core.c:4136
io_schedule+0x1c/0x70 kernel/sched/core.c:5780
wait_on_page_bit_common mm/filemap.c:1175 [inline]
__lock_page+0x422/0xab0 mm/filemap.c:1383
lock_page include/linux/pagemap.h:480 [inline]
mpage_prepare_extent_to_map+0xb3f/0xf90 fs/ext4/inode.c:2668
ext4_writepages+0xb6a/0x2e70 fs/ext4/inode.c:2866
? 0xffffffff81000000
do_writepages+0xfa/0x2a0 mm/page-writeback.c:2344
__filemap_fdatawrite_range+0x2bc/0x3b0 mm/filemap.c:421
__filemap_fdatawrite mm/filemap.c:429 [inline]
filemap_flush+0x24/0x30 mm/filemap.c:456
collapse_file+0x36b1/0x41a0 mm/khugepaged.c:1652
khugepaged_scan_file mm/khugepaged.c:1890 [inline]
khugepaged_scan_mm_slot mm/khugepaged.c:1988 [inline]
khugepaged_do_scan mm/khugepaged.c:2072 [inline]
khugepaged+0x2da9/0x4360 mm/khugepaged.c:2117
kthread+0x361/0x430 kernel/kthread.c:255
ret_from_fork+0x24/0x30 arch/x86/entry/entry_64.S:352

Showing all locks held in the system:
4 locks held by kworker/u4:0/7:
#0: ffff8880a8284d28 ((wq_completion)writeback){+.+.}, at:
__write_once_size include/linux/compiler.h:226 [inline]
#0: ffff8880a8284d28 ((wq_completion)writeback){+.+.}, at:
arch_atomic64_set arch/x86/include/asm/atomic64_64.h:34 [inline]
#0: ffff8880a8284d28 ((wq_completion)writeback){+.+.}, at: atomic64_set
include/asm-generic/atomic-instrumented.h:855 [inline]
#0: ffff8880a8284d28 ((wq_completion)writeback){+.+.}, at: atomic_long_set
include/asm-generic/atomic-long.h:40 [inline]
#0: ffff8880a8284d28 ((wq_completion)writeback){+.+.}, at: set_work_data
kernel/workqueue.c:620 [inline]
#0: ffff8880a8284d28 ((wq_completion)writeback){+.+.}, at:
set_work_pool_and_clear_pending kernel/workqueue.c:647 [inline]
#0: ffff8880a8284d28 ((wq_completion)writeback){+.+.}, at:
process_one_work+0x88b/0x1740 kernel/workqueue.c:2240
#1: ffff8880a988fdc0 ((work_completion)(&(&wb->dwork)->work)){+.+.}, at:
process_one_work+0x8c1/0x1740 kernel/workqueue.c:2244
#2: ffff88809b03a0d8 (&type->s_umount_key#32){++++}, at:
trylock_super+0x22/0x110 fs/super.c:418
#3: ffff88809b03c990 (&sbi->s_journal_flag_rwsem){.+.+}, at:
do_writepages+0xfa/0x2a0 mm/page-writeback.c:2344
1 lock held by khungtaskd/1077:
#0: ffffffff88faba80 (rcu_read_lock){....}, at:
debug_show_all_locks+0x5f/0x279 kernel/locking/lockdep.c:5336
1 lock held by khugepaged/1084:
#0: ffff88809b03c990 (&sbi->s_journal_flag_rwsem){.+.+}, at:
do_writepages+0xfa/0x2a0 mm/page-writeback.c:2344
1 lock held by rsyslogd/8623:
#0: ffff888098f5d860 (&f->f_pos_lock){+.+.}, at: __fdget_pos+0xee/0x110
fs/file.c:801
2 locks held by getty/8713:
#0: ffff8880a1ca4090 (&tty->ldisc_sem){++++}, at:
ldsem_down_read+0x33/0x40 drivers/tty/tty_ldsem.c:340
#1: ffffc90005f392e0 (&ldata->atomic_read_lock){+.+.}, at:
n_tty_read+0x220/0x1bf0 drivers/tty/n_tty.c:2156
2 locks held by getty/8714:
#0: ffff8880a0faf090 (&tty->ldisc_sem){++++}, at:
ldsem_down_read+0x33/0x40 drivers/tty/tty_ldsem.c:340
#1: ffffc90005f352e0 (&ldata->atomic_read_lock){+.+.}, at:
n_tty_read+0x220/0x1bf0 drivers/tty/n_tty.c:2156
2 locks held by getty/8715:
#0: ffff8880a0b1e090 (&tty->ldisc_sem){++++}, at:
ldsem_down_read+0x33/0x40 drivers/tty/tty_ldsem.c:340
#1: ffffc90005f192e0 (&ldata->atomic_read_lock){+.+.}, at:
n_tty_read+0x220/0x1bf0 drivers/tty/n_tty.c:2156
2 locks held by getty/8716:
#0: ffff8880a9486090 (&tty->ldisc_sem){++++}, at:
ldsem_down_read+0x33/0x40 drivers/tty/tty_ldsem.c:340
#1: ffffc90005f112e0 (&ldata->atomic_read_lock){+.+.}, at:
n_tty_read+0x220/0x1bf0 drivers/tty/n_tty.c:2156
2 locks held by getty/8717:
#0: ffff8880a161a090 (&tty->ldisc_sem){++++}, at:
ldsem_down_read+0x33/0x40 drivers/tty/tty_ldsem.c:340
#1: ffffc90005f312e0 (&ldata->atomic_read_lock){+.+.}, at:
n_tty_read+0x220/0x1bf0 drivers/tty/n_tty.c:2156
2 locks held by getty/8718:
#0: ffff888095db3090 (&tty->ldisc_sem){++++}, at:
ldsem_down_read+0x33/0x40 drivers/tty/tty_ldsem.c:340
#1: ffffc90005f212e0 (&ldata->atomic_read_lock){+.+.}, at:
n_tty_read+0x220/0x1bf0 drivers/tty/n_tty.c:2156
2 locks held by getty/8719:
#0: ffff88809b09e090 (&tty->ldisc_sem){++++}, at:
ldsem_down_read+0x33/0x40 drivers/tty/tty_ldsem.c:340
#1: ffffc90005f092e0 (&ldata->atomic_read_lock){+.+.}, at:
n_tty_read+0x220/0x1bf0 drivers/tty/n_tty.c:2156
1 lock held by syz-execprog/8740:
#0: ffff8880a657d3e8 (&ei->i_mmap_sem){++++}, at:
ext4_filemap_fault+0x7e/0xb2 fs/ext4/inode.c:6291

=============================================

NMI backtrace for cpu 1
CPU: 1 PID: 1077 Comm: khungtaskd Not tainted 5.4.0-rc4-next-20191024 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
Google 01/01/2011
Call Trace:
__dump_stack lib/dump_stack.c:77 [inline]
dump_stack+0x172/0x1f0 lib/dump_stack.c:113
nmi_cpu_backtrace.cold+0x70/0xb2 lib/nmi_backtrace.c:101
nmi_trigger_cpumask_backtrace+0x23b/0x28b lib/nmi_backtrace.c:62
arch_trigger_cpumask_backtrace+0x14/0x20 arch/x86/kernel/apic/hw_nmi.c:38
trigger_all_cpu_backtrace include/linux/nmi.h:146 [inline]
check_hung_uninterruptible_tasks kernel/hung_task.c:269 [inline]
watchdog+0xc8f/0x1350 kernel/hung_task.c:353
kthread+0x361/0x430 kernel/kthread.c:255
ret_from_fork+0x24/0x30 arch/x86/entry/entry_64.S:352
Sending NMI from CPU 1 to CPUs 0:
NMI backtrace for cpu 0
CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.4.0-rc4-next-20191024 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
Google 01/01/2011
RIP: 0010:update_group_capacity+0xa/0x920 kernel/sched/fair.c:7647
Code: e8 0b 84 5b 00 e9 57 fd ff ff 48 8b 7d c8 e8 1d 84 5b 00 e9 49 fe ff
ff 0f 1f 84 00 00 00 00 00 48 b8 00 00 00 00 00 fc ff df <55> 48 89 e5 41
57 41 56 49 89 fe 48 83 c7 08 48 89 fa 41 55 48 c1
RSP: 0018:ffff8880ae809878 EFLAGS: 00000246
RAX: dffffc0000000000 RBX: ffff8880ae809a50 RCX: ffffffff8153db8d
RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff8880a9813000
RBP: ffff8880ae809b20 R08: 1ffff110153038a4 R09: ffffed10153038a5
R10: ffffed10153038a4 R11: ffff8880a981c527 R12: dffffc0000000000
R13: ffff8880a981c520 R14: ffff8880ae809af8 R15: ffff8880ae809c30
FS: 0000000000000000(0000) GS:ffff8880ae800000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 000000c420000240 CR3: 00000000a5df3000 CR4: 00000000001406f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
<IRQ>
load_balance+0x389/0x2a90 kernel/sched/fair.c:8987
rebalance_domains+0x66a/0xba0 kernel/sched/fair.c:9413
_nohz_idle_balance+0x336/0x3f0 kernel/sched/fair.c:9826
nohz_idle_balance kernel/sched/fair.c:9872 [inline]
run_rebalance_domains+0x1c6/0x2d0 kernel/sched/fair.c:10056
__do_softirq+0x262/0x98c kernel/softirq.c:292
invoke_softirq kernel/softirq.c:373 [inline]
irq_exit+0x19b/0x1e0 kernel/softirq.c:413
scheduler_ipi+0x38c/0x610 kernel/sched/core.c:2347
smp_reschedule_interrupt+0x78/0x4c0 arch/x86/kernel/smp.c:244
reschedule_interrupt+0xf/0x20 arch/x86/entry/entry_64.S:853
</IRQ>
RIP: 0010:native_safe_halt+0xe/0x10 arch/x86/include/asm/irqflags.h:61
Code: a8 95 5a fa eb 8a 90 90 90 90 90 90 e9 07 00 00 00 0f 00 2d 84 a7 53
00 f4 c3 66 90 e9 07 00 00 00 0f 00 2d 74 a7 53 00 fb f4 <c3> 90 55 48 89
e5 41 57 41 56 41 55 41 54 53 e8 be 75 0c fa e8 79
RSP: 0018:ffffffff88e07ce8 EFLAGS: 00000282 ORIG_RAX: ffffffffffffff02
RAX: 1ffffffff11e643f RBX: ffffffff88e7a1c0 RCX: 0000000000000000
RDX: dffffc0000000000 RSI: 0000000000000006 RDI: ffffffff88e7aa5c
RBP: ffffffff88e07d18 R08: ffffffff88e7a1c0 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000000 R12: dffffc0000000000
R13: ffffffff89c81680 R14: 0000000000000000 R15: 0000000000000000
arch_cpu_idle+0xa/0x10 arch/x86/kernel/process.c:571
default_idle_call+0x84/0xb0 kernel/sched/idle.c:94
cpuidle_idle_call kernel/sched/idle.c:154 [inline]
do_idle+0x3b7/0x6e0 kernel/sched/idle.c:263
cpu_startup_entry+0x1b/0x20 kernel/sched/idle.c:355
rest_init+0x23b/0x371 init/main.c:451
arch_call_rest_init+0xe/0x1b
start_kernel+0x904/0x943 init/main.c:784
x86_64_start_reservations+0x29/0x2b arch/x86/kernel/head64.c:490
x86_64_start_kernel+0x77/0x7b arch/x86/kernel/head64.c:471
secondary_startup_64+0xa4/0xb0 arch/x86/kernel/head_64.S:242


---
This bug is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at syzk...@googlegroups.com.

syzbot will keep track of this bug report. See:
https://goo.gl/tpsmEJ#status for how to communicate with syzbot.
For information about bisection process see: https://goo.gl/tpsmEJ#bisection
syzbot can test patches for this bug, for details see:
https://goo.gl/tpsmEJ#testing-patches

Johannes Weiner

Oct 28, 2019, 4:14:50 PM
to syzbot, ak...@linux-foundation.org, amir...@gmail.com, darric...@oracle.com, hu...@google.com, ja...@suse.cz, jgl...@redhat.com, jo...@toxicpanda.com, kirill....@linux.intel.com, linux-...@vger.kernel.org, linu...@kvack.org, s...@canb.auug.org.au, songliu...@fb.com, syzkall...@googlegroups.com, william....@oracle.com, wi...@infradead.org
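This is a double-locking deadlock. The page lock is already held when
we call into filemap_flush() here, and the writeback path then does
another lock_page() on the same page in write_cache_pages() (for ext4,
mpage_prepare_extent_to_map(), as the trace shows).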

To fix it, we have to either initiate flushing before acquiring the
page lock, or simply skip over dirty pages.

Maybe doing vfs_fsync_range() from the madvise(HUGEPAGE) call isn't a
bad idea after all? (I had discussed this with Song off-list before.)
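
The deadlock and the two fixes proposed above can be sketched in user space. This is only a toy model, assuming a non-reentrant threading.Lock as a stand-in for the page lock; Page, flush() and the collapse_* helpers are hypothetical names for illustration, not kernel APIs. A timeout replaces the indefinite block that the kernel task would hit.

```python
import threading


class Page:
    """Toy stand-in for a page-cache page (hypothetical, not a kernel type)."""

    def __init__(self, index, dirty):
        self.index = index
        self.dirty = dirty
        self.lock = threading.Lock()  # non-reentrant, like the page lock


def flush(pages, timeout=0.1):
    """Toy writeback: lock each dirty page, mark it clean, unlock it.

    Returns the indices of pages that could not be locked in time.
    The kernel would block forever at that point instead of timing out.
    """
    stuck = []
    for page in pages:
        if not page.dirty:
            continue
        if not page.lock.acquire(timeout=timeout):
            stuck.append(page.index)
            continue
        try:
            page.dirty = False
        finally:
            page.lock.release()
    return stuck


def collapse_buggy(pages):
    """Flush while holding a page lock: flush() waits on our own lock."""
    with pages[0].lock:
        return flush(pages)


def collapse_flush_first(pages):
    """Fix 1: initiate flushing before acquiring the page lock."""
    stuck = flush(pages)
    with pages[0].lock:
        pass  # the collapse work would happen here
    return stuck


def collapse_skip_dirty(pages):
    """Fix 2: skip over a dirty page instead of flushing under the lock."""
    target = pages[0]
    with target.lock:
        return "skipped" if target.dirty else "collapsed"
```

In the buggy variant, page 0 is reported as stuck because the caller already holds its lock; the other two variants never try to take a lock they already hold.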

Song Liu

Oct 28, 2019, 6:16:31 PM
to Johannes Weiner, syzbot, Andrew Morton, amir...@gmail.com, darric...@oracle.com, hu...@google.com, ja...@suse.cz, jgl...@redhat.com, Josef Bacik, kirill....@linux.intel.com, linux-...@vger.kernel.org, linu...@kvack.org, s...@canb.auug.org.au, syzkall...@googlegroups.com, william....@oracle.com, wi...@infradead.org


Thanks syzbot and Johannes!

I just sent a quick fix that simply removes the filemap_flush() call.

I will work on a better mechanism to flush the file.

Thanks,
Song

syzbot

Oct 29, 2019, 1:29:27 PM
to Song Liu, han...@cmpxchg.org, songliu...@fb.com, syzkall...@googlegroups.com


> #syz test: g...@github.com:liu-song-6/linux.git thp-fix-20191028

"g...@github.com:liu-song-6/linux.git" does not look like a valid git repo
address.



Song Liu

Oct 29, 2019, 1:54:23 PM
to syzbot, han...@cmpxchg.org, syzkall...@googlegroups.com
#syz test: https://github.com/liu-song-6/linux.git thp-fix-20191028

syzbot

Oct 29, 2019, 3:09:02 PM
to han...@cmpxchg.org, songliu...@fb.com, syzkall...@googlegroups.com
Hello,

syzbot has tested the proposed patch, but the reproducer still triggered a
crash:
possible deadlock in __filemap_fdatawrite_range

=====================================================
WARNING: SOFTIRQ-safe -> SOFTIRQ-unsafe lock order detected
5.4.0-rc5-next-20191028+ #0 Not tainted
-----------------------------------------------------
khugepaged/1083 [HC0[0]:SC0[0]:HE0:SE1] is trying to acquire:
ffff888090de7928 (&sb->s_type->i_lock_key#23){+.+.}, at: spin_lock
include/linux/spinlock.h:338 [inline]
ffff888090de7928 (&sb->s_type->i_lock_key#23){+.+.}, at:
wbc_attach_fdatawrite_inode include/linux/writeback.h:266 [inline]
ffff888090de7928 (&sb->s_type->i_lock_key#23){+.+.}, at:
__filemap_fdatawrite_range+0x26e/0x3b0 mm/filemap.c:420

and this task is already holding:
ffff888090de7ab0 (&(&xa->xa_lock)->rlock#4){..-.}, at: spin_lock_irq
include/linux/spinlock.h:363 [inline]
ffff888090de7ab0 (&(&xa->xa_lock)->rlock#4){..-.}, at:
collapse_file+0x24d/0x4580 mm/khugepaged.c:1524
which would create a new lock dependency:
(&(&xa->xa_lock)->rlock#4){..-.} -> (&sb->s_type->i_lock_key#23){+.+.}

but this new dependency connects a SOFTIRQ-irq-safe lock:
(&(&xa->xa_lock)->rlock#4){..-.}

... which became SOFTIRQ-irq-safe at:
lock_acquire+0x190/0x410 kernel/locking/lockdep.c:4487
__raw_spin_lock_irqsave include/linux/spinlock_api_smp.h:110 [inline]
_raw_spin_lock_irqsave+0x95/0xcd kernel/locking/spinlock.c:159
test_clear_page_writeback+0x1da/0x11b0 mm/page-writeback.c:2728
end_page_writeback+0x244/0x530 mm/filemap.c:1339
end_buffer_async_write+0x679/0x980 fs/buffer.c:349
end_bio_bh_io_sync+0xed/0x140 fs/buffer.c:3015
bio_endio+0x609/0xaf0 block/bio.c:1818
req_bio_endio block/blk-core.c:271 [inline]
blk_update_request+0x49e/0x10d0 block/blk-core.c:1491
scsi_end_request+0x7f/0x830 drivers/scsi/scsi_lib.c:579
scsi_io_completion+0x20a/0x1420 drivers/scsi/scsi_lib.c:963
scsi_finish_command+0x3b7/0x670 drivers/scsi/scsi.c:228
scsi_softirq_done+0x326/0x3b0 drivers/scsi/scsi_lib.c:1477
blk_done_softirq+0x2fe/0x4d0 block/blk-softirq.c:37
__do_softirq+0x262/0x98c kernel/softirq.c:292
invoke_softirq kernel/softirq.c:373 [inline]
irq_exit+0x19b/0x1e0 kernel/softirq.c:413
exiting_irq arch/x86/include/asm/apic.h:536 [inline]
do_IRQ+0xe3/0x280 arch/x86/kernel/irq.c:263
ret_from_intr+0x0/0x36
native_safe_halt+0xe/0x10 arch/x86/include/asm/irqflags.h:60
arch_cpu_idle+0xa/0x10 arch/x86/kernel/process.c:571
default_idle_call+0x84/0xb0 kernel/sched/idle.c:94
cpuidle_idle_call kernel/sched/idle.c:154 [inline]
do_idle+0x3b7/0x6e0 kernel/sched/idle.c:263
cpu_startup_entry+0x1b/0x20 kernel/sched/idle.c:355
start_secondary+0x2f4/0x410 arch/x86/kernel/smpboot.c:264
secondary_startup_64+0xa4/0xb0 arch/x86/kernel/head_64.S:242

to a SOFTIRQ-irq-unsafe lock:
(&sb->s_type->i_lock_key#23){+.+.}

... which became SOFTIRQ-irq-unsafe at:
...
lock_acquire+0x190/0x410 kernel/locking/lockdep.c:4487
__raw_spin_lock include/linux/spinlock_api_smp.h:142 [inline]
_raw_spin_lock+0x2f/0x40 kernel/locking/spinlock.c:151
spin_lock include/linux/spinlock.h:338 [inline]
iget_locked+0x320/0x4b0 fs/inode.c:1193
__ext4_iget+0x265/0x3e20 fs/ext4/inode.c:4835
ext4_fill_super+0x7b11/0xcdb0 fs/ext4/super.c:4489
mount_bdev+0x304/0x3c0 fs/super.c:1415
ext4_mount+0x35/0x40 fs/ext4/super.c:6039
legacy_get_tree+0x108/0x220 fs/fs_context.c:647
vfs_get_tree+0x8e/0x300 fs/super.c:1545
do_new_mount fs/namespace.c:2822 [inline]
do_mount+0x135a/0x1b50 fs/namespace.c:3142
ksys_mount+0xdb/0x150 fs/namespace.c:3351
do_mount_root+0x35/0x1d3 init/do_mounts.c:393
mount_block_root+0x353/0x61d init/do_mounts.c:422
mount_root+0x283/0x2cd init/do_mounts.c:612
prepare_namespace+0x26f/0x2ae init/do_mounts.c:671
kernel_init_freeable+0x5a0/0x5b9 init/main.c:1210
kernel_init+0x12/0x1bf init/main.c:1109
ret_from_fork+0x24/0x30 arch/x86/entry/entry_64.S:352

other info that might help us debug this:

Possible interrupt unsafe locking scenario:

       CPU0                    CPU1
       ----                    ----
  lock(&sb->s_type->i_lock_key#23);
                               local_irq_disable();
                               lock(&(&xa->xa_lock)->rlock#4);
                               lock(&sb->s_type->i_lock_key#23);
  <Interrupt>
    lock(&(&xa->xa_lock)->rlock#4);

*** DEADLOCK ***
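
The rule behind this warning can be modelled in a few lines. This is a heavily simplified sketch, not lockdep's actual algorithm: the usage flags "in-softirq" and "softirq-on" and the short names xa_lock and i_lock are assumptions chosen to mirror the two lock classes in the report.

```python
def bad_irq_dependency(held, acquiring, usage):
    """Return True if taking `acquiring` while holding `held` would link a
    softirq-safe lock to a softirq-unsafe one.

    `usage` maps a lock name to the set of contexts it has been seen in
    (hypothetical flags: "in-softirq", "softirq-on").
    """
    held_is_safe = "in-softirq" in usage.get(held, set())
    acquiring_is_unsafe = "softirq-on" in usage.get(acquiring, set())
    return held_is_safe and acquiring_is_unsafe


# Usage recorded for the two lock classes named in the report above.
usage = {
    "xa_lock": {"in-softirq"},  # taken from softirq via end_page_writeback()
    "i_lock": {"softirq-on"},   # taken with softirqs enabled in iget_locked()
}
```

Holding xa_lock and then taking i_lock is the flagged order: another CPU can hold i_lock with softirqs enabled, get interrupted, and spin on xa_lock in softirq context while the first CPU spins on i_lock.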

1 lock held by khugepaged/1083:
#0: ffff888090de7ab0 (&(&xa->xa_lock)->rlock#4){..-.}, at: spin_lock_irq
include/linux/spinlock.h:363 [inline]
#0: ffff888090de7ab0 (&(&xa->xa_lock)->rlock#4){..-.}, at:
collapse_file+0x24d/0x4580 mm/khugepaged.c:1524

the dependencies between SOFTIRQ-irq-safe lock and the holding lock:
-> (&(&xa->xa_lock)->rlock#4){..-.} {
IN-SOFTIRQ-W at:
lock_acquire+0x190/0x410 kernel/locking/lockdep.c:4487
__raw_spin_lock_irqsave
include/linux/spinlock_api_smp.h:110 [inline]
_raw_spin_lock_irqsave+0x95/0xcd
kernel/locking/spinlock.c:159
test_clear_page_writeback+0x1da/0x11b0
mm/page-writeback.c:2728
end_page_writeback+0x244/0x530 mm/filemap.c:1339
end_buffer_async_write+0x679/0x980 fs/buffer.c:349
end_bio_bh_io_sync+0xed/0x140 fs/buffer.c:3015
bio_endio+0x609/0xaf0 block/bio.c:1818
req_bio_endio block/blk-core.c:271 [inline]
blk_update_request+0x49e/0x10d0 block/blk-core.c:1491
scsi_end_request+0x7f/0x830 drivers/scsi/scsi_lib.c:579
scsi_io_completion+0x20a/0x1420
drivers/scsi/scsi_lib.c:963
scsi_finish_command+0x3b7/0x670 drivers/scsi/scsi.c:228
scsi_softirq_done+0x326/0x3b0
drivers/scsi/scsi_lib.c:1477
blk_done_softirq+0x2fe/0x4d0 block/blk-softirq.c:37
__do_softirq+0x262/0x98c kernel/softirq.c:292
invoke_softirq kernel/softirq.c:373 [inline]
irq_exit+0x19b/0x1e0 kernel/softirq.c:413
exiting_irq arch/x86/include/asm/apic.h:536 [inline]
do_IRQ+0xe3/0x280 arch/x86/kernel/irq.c:263
ret_from_intr+0x0/0x36
native_safe_halt+0xe/0x10
arch/x86/include/asm/irqflags.h:60
arch_cpu_idle+0xa/0x10 arch/x86/kernel/process.c:571
default_idle_call+0x84/0xb0 kernel/sched/idle.c:94
cpuidle_idle_call kernel/sched/idle.c:154 [inline]
do_idle+0x3b7/0x6e0 kernel/sched/idle.c:263
cpu_startup_entry+0x1b/0x20 kernel/sched/idle.c:355
start_secondary+0x2f4/0x410
arch/x86/kernel/smpboot.c:264
secondary_startup_64+0xa4/0xb0
arch/x86/kernel/head_64.S:242
INITIAL USE at:
lock_acquire+0x190/0x410 kernel/locking/lockdep.c:4487
__raw_spin_lock_irq include/linux/spinlock_api_smp.h:128
[inline]
_raw_spin_lock_irq+0x60/0x80
kernel/locking/spinlock.c:167
spin_lock_irq include/linux/spinlock.h:363 [inline]
__add_to_page_cache_locked+0x677/0xec0 mm/filemap.c:877
add_to_page_cache_lru+0x1d8/0x790 mm/filemap.c:943
do_read_cache_page+0x9fc/0x2140 mm/filemap.c:2770
read_cache_page+0x5e/0x70 mm/filemap.c:2874
read_mapping_page include/linux/pagemap.h:396 [inline]
read_dev_sector+0x71/0x310 block/partition-generic.c:667
read_part_sector block/partitions/check.h:38 [inline]
adfspart_check_ICS+0x12d/0xc90
block/partitions/acorn.c:361
check_partition+0x3bc/0x6ce block/partitions/check.c:167
rescan_partitions+0x230/0xa30
block/partition-generic.c:531
__blkdev_get+0xbae/0x1600 fs/block_dev.c:1599
blkdev_get+0x47/0x2c0 fs/block_dev.c:1707
register_disk block/genhd.c:655 [inline]
__device_add_disk+0xabf/0x1230 block/genhd.c:745
device_add_disk+0x2b/0x40 block/genhd.c:763
add_disk include/linux/genhd.h:429 [inline]
brd_init+0x237/0x41c drivers/block/brd.c:514
do_one_initcall+0x120/0x81a init/main.c:938
do_initcall_level init/main.c:1006 [inline]
do_initcalls init/main.c:1014 [inline]
do_basic_setup init/main.c:1031 [inline]
kernel_init_freeable+0x4ca/0x5b9 init/main.c:1191
kernel_init+0x12/0x1bf init/main.c:1109
ret_from_fork+0x24/0x30 arch/x86/entry/entry_64.S:352
}
... key at: [<ffffffff8ab268c0>] __key.18618+0x0/0x40
... acquired at:
lock_acquire+0x190/0x410 kernel/locking/lockdep.c:4487
__raw_spin_lock include/linux/spinlock_api_smp.h:142 [inline]
_raw_spin_lock+0x2f/0x40 kernel/locking/spinlock.c:151
spin_lock include/linux/spinlock.h:338 [inline]
wbc_attach_fdatawrite_inode include/linux/writeback.h:266 [inline]
__filemap_fdatawrite_range+0x26e/0x3b0 mm/filemap.c:420
__filemap_fdatawrite mm/filemap.c:429 [inline]
filemap_flush+0x24/0x30 mm/filemap.c:456
collapse_file+0x3b28/0x4580 mm/khugepaged.c:1609
khugepaged_scan_file mm/khugepaged.c:1890 [inline]
khugepaged_scan_mm_slot mm/khugepaged.c:1988 [inline]
khugepaged_do_scan mm/khugepaged.c:2072 [inline]
khugepaged+0x2da9/0x4360 mm/khugepaged.c:2117
kthread+0x361/0x430 kernel/kthread.c:255
ret_from_fork+0x24/0x30 arch/x86/entry/entry_64.S:352


the dependencies between the lock to be acquired
and SOFTIRQ-irq-unsafe lock:
-> (&sb->s_type->i_lock_key#23){+.+.} {
HARDIRQ-ON-W at:
lock_acquire+0x190/0x410 kernel/locking/lockdep.c:4487
__raw_spin_lock include/linux/spinlock_api_smp.h:142
[inline]
_raw_spin_lock+0x2f/0x40 kernel/locking/spinlock.c:151
spin_lock include/linux/spinlock.h:338 [inline]
iget_locked+0x320/0x4b0 fs/inode.c:1193
__ext4_iget+0x265/0x3e20 fs/ext4/inode.c:4835
ext4_fill_super+0x7b11/0xcdb0 fs/ext4/super.c:4489
mount_bdev+0x304/0x3c0 fs/super.c:1415
ext4_mount+0x35/0x40 fs/ext4/super.c:6039
legacy_get_tree+0x108/0x220 fs/fs_context.c:647
vfs_get_tree+0x8e/0x300 fs/super.c:1545
do_new_mount fs/namespace.c:2822 [inline]
do_mount+0x135a/0x1b50 fs/namespace.c:3142
ksys_mount+0xdb/0x150 fs/namespace.c:3351
do_mount_root+0x35/0x1d3 init/do_mounts.c:393
mount_block_root+0x353/0x61d init/do_mounts.c:422
mount_root+0x283/0x2cd init/do_mounts.c:612
prepare_namespace+0x26f/0x2ae init/do_mounts.c:671
kernel_init_freeable+0x5a0/0x5b9 init/main.c:1210
kernel_init+0x12/0x1bf init/main.c:1109
ret_from_fork+0x24/0x30 arch/x86/entry/entry_64.S:352
SOFTIRQ-ON-W at:
lock_acquire+0x190/0x410 kernel/locking/lockdep.c:4487
__raw_spin_lock include/linux/spinlock_api_smp.h:142
[inline]
_raw_spin_lock+0x2f/0x40 kernel/locking/spinlock.c:151
spin_lock include/linux/spinlock.h:338 [inline]
iget_locked+0x320/0x4b0 fs/inode.c:1193
__ext4_iget+0x265/0x3e20 fs/ext4/inode.c:4835
ext4_fill_super+0x7b11/0xcdb0 fs/ext4/super.c:4489
mount_bdev+0x304/0x3c0 fs/super.c:1415
ext4_mount+0x35/0x40 fs/ext4/super.c:6039
legacy_get_tree+0x108/0x220 fs/fs_context.c:647
vfs_get_tree+0x8e/0x300 fs/super.c:1545
do_new_mount fs/namespace.c:2822 [inline]
do_mount+0x135a/0x1b50 fs/namespace.c:3142
ksys_mount+0xdb/0x150 fs/namespace.c:3351
do_mount_root+0x35/0x1d3 init/do_mounts.c:393
mount_block_root+0x353/0x61d init/do_mounts.c:422
mount_root+0x283/0x2cd init/do_mounts.c:612
prepare_namespace+0x26f/0x2ae init/do_mounts.c:671
kernel_init_freeable+0x5a0/0x5b9 init/main.c:1210
kernel_init+0x12/0x1bf init/main.c:1109
ret_from_fork+0x24/0x30 arch/x86/entry/entry_64.S:352
INITIAL USE at:
lock_acquire+0x190/0x410 kernel/locking/lockdep.c:4487
__raw_spin_lock include/linux/spinlock_api_smp.h:142
[inline]
_raw_spin_lock+0x2f/0x40 kernel/locking/spinlock.c:151
spin_lock include/linux/spinlock.h:338 [inline]
iget_locked+0x320/0x4b0 fs/inode.c:1193
__ext4_iget+0x265/0x3e20 fs/ext4/inode.c:4835
ext4_fill_super+0x7b11/0xcdb0 fs/ext4/super.c:4489
mount_bdev+0x304/0x3c0 fs/super.c:1415
ext4_mount+0x35/0x40 fs/ext4/super.c:6039
legacy_get_tree+0x108/0x220 fs/fs_context.c:647
vfs_get_tree+0x8e/0x300 fs/super.c:1545
do_new_mount fs/namespace.c:2822 [inline]
do_mount+0x135a/0x1b50 fs/namespace.c:3142
ksys_mount+0xdb/0x150 fs/namespace.c:3351
do_mount_root+0x35/0x1d3 init/do_mounts.c:393
mount_block_root+0x353/0x61d init/do_mounts.c:422
mount_root+0x283/0x2cd init/do_mounts.c:612
prepare_namespace+0x26f/0x2ae init/do_mounts.c:671
kernel_init_freeable+0x5a0/0x5b9 init/main.c:1210
kernel_init+0x12/0x1bf init/main.c:1109
ret_from_fork+0x24/0x30 arch/x86/entry/entry_64.S:352
}
... key at: [<ffffffff890ded28>] ext4_fs_type+0xa8/0x100
... acquired at:
lock_acquire+0x190/0x410 kernel/locking/lockdep.c:4487
__raw_spin_lock include/linux/spinlock_api_smp.h:142 [inline]
_raw_spin_lock+0x2f/0x40 kernel/locking/spinlock.c:151
spin_lock include/linux/spinlock.h:338 [inline]
wbc_attach_fdatawrite_inode include/linux/writeback.h:266 [inline]
__filemap_fdatawrite_range+0x26e/0x3b0 mm/filemap.c:420
__filemap_fdatawrite mm/filemap.c:429 [inline]
filemap_flush+0x24/0x30 mm/filemap.c:456
collapse_file+0x3b28/0x4580 mm/khugepaged.c:1609
khugepaged_scan_file mm/khugepaged.c:1890 [inline]
khugepaged_scan_mm_slot mm/khugepaged.c:1988 [inline]
khugepaged_do_scan mm/khugepaged.c:2072 [inline]
khugepaged+0x2da9/0x4360 mm/khugepaged.c:2117
kthread+0x361/0x430 kernel/kthread.c:255
ret_from_fork+0x24/0x30 arch/x86/entry/entry_64.S:352


stack backtrace:
CPU: 0 PID: 1083 Comm: khugepaged Not tainted 5.4.0-rc5-next-20191028+ #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
__dump_stack lib/dump_stack.c:77 [inline]
dump_stack+0x172/0x1f0 lib/dump_stack.c:113
print_bad_irq_dependency kernel/locking/lockdep.c:2095 [inline]
check_irq_usage.cold+0x586/0x6fe kernel/locking/lockdep.c:2293
check_prev_add kernel/locking/lockdep.c:2480 [inline]
check_prevs_add kernel/locking/lockdep.c:2581 [inline]
validate_chain kernel/locking/lockdep.c:2971 [inline]
__lock_acquire+0x25b4/0x4a00 kernel/locking/lockdep.c:3955
lock_acquire+0x190/0x410 kernel/locking/lockdep.c:4487
__raw_spin_lock include/linux/spinlock_api_smp.h:142 [inline]
_raw_spin_lock+0x2f/0x40 kernel/locking/spinlock.c:151
spin_lock include/linux/spinlock.h:338 [inline]
wbc_attach_fdatawrite_inode include/linux/writeback.h:266 [inline]
__filemap_fdatawrite_range+0x26e/0x3b0 mm/filemap.c:420
__filemap_fdatawrite mm/filemap.c:429 [inline]
filemap_flush+0x24/0x30 mm/filemap.c:456
collapse_file+0x3b28/0x4580 mm/khugepaged.c:1609
khugepaged_scan_file mm/khugepaged.c:1890 [inline]
khugepaged_scan_mm_slot mm/khugepaged.c:1988 [inline]
khugepaged_do_scan mm/khugepaged.c:2072 [inline]
khugepaged+0x2da9/0x4360 mm/khugepaged.c:2117
kthread+0x361/0x430 kernel/kthread.c:255
ret_from_fork+0x24/0x30 arch/x86/entry/entry_64.S:352
BUG: sleeping function called from invalid context at include/linux/percpu-rwsem.h:38
in_atomic(): 1, irqs_disabled(): 1, non_block: 0, pid: 1083, name: khugepaged
INFO: lockdep is turned off.
irq event stamp: 58
hardirqs last enabled at (57): [<ffffffff81a6f157>] rmqueue mm/page_alloc.c:3304 [inline]
hardirqs last enabled at (57): [<ffffffff81a6f157>] get_page_from_freelist+0x3437/0x4330 mm/page_alloc.c:3692
hardirqs last disabled at (58): [<ffffffff8755523a>] __raw_spin_lock_irq include/linux/spinlock_api_smp.h:126 [inline]
hardirqs last disabled at (58): [<ffffffff8755523a>] _raw_spin_lock_irq+0x3a/0x80 kernel/locking/spinlock.c:167
softirqs last enabled at (0): [<ffffffff8143e8a2>] copy_process+0x1822/0x6880 kernel/fork.c:2019
softirqs last disabled at (0): [<0000000000000000>] 0x0
Preemption disabled at:
[<ffffffff81b2c5ad>] spin_lock_irq include/linux/spinlock.h:363 [inline]
[<ffffffff81b2c5ad>] collapse_file+0x24d/0x4580 mm/khugepaged.c:1524
CPU: 0 PID: 1083 Comm: khugepaged Not tainted 5.4.0-rc5-next-20191028+ #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
__dump_stack lib/dump_stack.c:77 [inline]
dump_stack+0x172/0x1f0 lib/dump_stack.c:113
___might_sleep.cold+0x1fb/0x23e kernel/sched/core.c:6788
__might_sleep+0x95/0x190 kernel/sched/core.c:6741
percpu_down_read include/linux/percpu-rwsem.h:38 [inline]
ext4_writepages+0x1cb/0x2e70 fs/ext4/inode.c:2728
do_writepages+0xfa/0x2a0 mm/page-writeback.c:2344
__filemap_fdatawrite_range+0x2bc/0x3b0 mm/filemap.c:421
__filemap_fdatawrite mm/filemap.c:429 [inline]
filemap_flush+0x24/0x30 mm/filemap.c:456
collapse_file+0x3b28/0x4580 mm/khugepaged.c:1609
khugepaged_scan_file mm/khugepaged.c:1890 [inline]
khugepaged_scan_mm_slot mm/khugepaged.c:1988 [inline]
khugepaged_do_scan mm/khugepaged.c:2072 [inline]
khugepaged+0x2da9/0x4360 mm/khugepaged.c:2117
kthread+0x361/0x430 kernel/kthread.c:255
ret_from_fork+0x24/0x30 arch/x86/entry/entry_64.S:352


Tested on:

commit: 8de721f7 thp: proper flush
git tree: https://github.com/liu-song-6/linux.git thp-fix-20191028
console output: https://syzkaller.appspot.com/x/log.txt?x=12fa8b74e00000
kernel config: https://syzkaller.appspot.com/x/.config?x=cb86688f30db053d
dashboard link: https://syzkaller.appspot.com/bug?extid=efb9e48b9fbdc49bb34a

syzbot

Oct 29, 2019, 4:33:01 PM
to han...@cmpxchg.org, songliu...@fb.com, syzkall...@googlegroups.com
Hello,

syzbot has tested the proposed patch and the reproducer did not trigger
crash:

Reported-and-tested-by: syzbot+efb9e4...@syzkaller.appspotmail.com

Tested on:

commit: a318b5ba thp: proper flush
git tree: git://git.kernel.org/pub/scm/linux/kernel/git/song/md.git thp-fix-20191028
compiler: gcc (GCC) 9.0.0 20181231 (experimental)

Note: testing is done by a robot and is best-effort only.

Dmitry Vyukov

Nov 4, 2019, 3:04:18 AM
to Song Liu, Johannes Weiner, syzbot, Andrew Morton, amir...@gmail.com, darric...@oracle.com, hu...@google.com, ja...@suse.cz, jgl...@redhat.com, Josef Bacik, kirill....@linux.intel.com, linux-...@vger.kernel.org, linu...@kvack.org, s...@canb.auug.org.au, syzkall...@googlegroups.com, william....@oracle.com, wi...@infradead.org
Is this expected to reach linux-next soon?
It's still not there, and in recent days this crash has happened 17K+
times and effectively stalled linux-next testing:
https://syzkaller.appspot.com/bug?id=4a3b0ba28ec7d0277338be02e1331068504dc228