[syzbot] INFO: task hung in nilfs_segctor_thread

16 views
Skip to first unread message

syzbot

unread,
Nov 9, 2022, 3:32:47 PM11/9/22
to konishi...@gmail.com, linux-...@vger.kernel.org, linux...@vger.kernel.org, syzkall...@googlegroups.com
Hello,

syzbot found the following issue on:

HEAD commit: 089d1c31224e Merge tag 'for-linus' of git://git.kernel.org..
git tree: upstream
console output: https://syzkaller.appspot.com/x/log.txt?x=163a050e880000
kernel config: https://syzkaller.appspot.com/x/.config?x=f7e100ed8aaa828e
dashboard link: https://syzkaller.appspot.com/bug?extid=f0c4082ce5ebebdac63b
compiler: Debian clang version 13.0.1-++20220126092033+75e33f71c2da-1~exp1~20220126212112.63, GNU ld (GNU Binutils for Debian) 2.35.2

Unfortunately, I don't have any reproducer for this issue yet.

Downloadable assets:
disk image: https://storage.googleapis.com/syzbot-assets/a02b3b8ebb13/disk-089d1c31.raw.xz
vmlinux: https://storage.googleapis.com/syzbot-assets/52a3c8951f42/vmlinux-089d1c31.xz
kernel image: https://storage.googleapis.com/syzbot-assets/d370e2467349/bzImage-089d1c31.xz

IMPORTANT: if you fix the issue, please add the following tag to the commit:
Reported-by: syzbot+f0c408...@syzkaller.appspotmail.com

INFO: task segctord:5652 blocked for more than 143 seconds.
Not tainted 6.1.0-rc3-syzkaller-00332-g089d1c31224e #0
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
task:segctord state:D stack:23704 pid:5652 ppid:2 flags:0x00004000
Call Trace:
<TASK>
context_switch kernel/sched/core.c:5191 [inline]
__schedule+0x8c9/0xd70 kernel/sched/core.c:6503
schedule+0xcb/0x190 kernel/sched/core.c:6579
rwsem_down_write_slowpath+0xfc1/0x1480 kernel/locking/rwsem.c:1190
__down_write_common kernel/locking/rwsem.c:1305 [inline]
__down_write kernel/locking/rwsem.c:1314 [inline]
down_write+0x231/0x270 kernel/locking/rwsem.c:1563
nilfs_transaction_lock+0x246/0x4b0 fs/nilfs2/segment.c:357
nilfs_segctor_thread_construct fs/nilfs2/segment.c:2486 [inline]
nilfs_segctor_thread+0x593/0x11c0 fs/nilfs2/segment.c:2570
kthread+0x266/0x300 kernel/kthread.c:376
ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:306
</TASK>

Showing all locks held in the system:
1 lock held by rcu_tasks_kthre/12:
#0: ffffffff8cb22630 (rcu_tasks.tasks_gp_mutex){+.+.}-{3:3}, at: rcu_tasks_one_gp+0x30/0xd00 kernel/rcu/tasks.h:507
1 lock held by rcu_tasks_trace/13:
#0: ffffffff8cb22e30 (rcu_tasks_trace.tasks_gp_mutex){+.+.}-{3:3}, at: rcu_tasks_one_gp+0x30/0xd00 kernel/rcu/tasks.h:507
1 lock held by khungtaskd/28:
#0: ffffffff8cb22460 (rcu_read_lock){....}-{1:2}, at: rcu_lock_acquire+0x0/0x30
3 locks held by udevd/2974:
2 locks held by getty/3289:
#0: ffff888027ad0098 (&tty->ldisc_sem){++++}-{0:0}, at: tty_ldisc_ref_wait+0x21/0x70 drivers/tty/tty_ldisc.c:244
#1: ffffc900031262f0 (&ldata->atomic_read_lock){+.+.}-{3:3}, at: n_tty_read+0x53b/0x1650 drivers/tty/n_tty.c:2177
7 locks held by syz-executor.5/5653:
1 lock held by segctord/5652:
#0: ffff8880369932a0 (&nilfs->ns_segctor_sem){++++}-{3:3}, at: nilfs_transaction_lock+0x246/0x4b0 fs/nilfs2/segment.c:357
1 lock held by syz-executor.0/12093:
#0: ffff88807d6b9080 (&iint->mutex){+.+.}-{3:3}, at: ima_check_last_writer security/integrity/ima/ima_main.c:164 [inline]
#0: ffff88807d6b9080 (&iint->mutex){+.+.}-{3:3}, at: ima_file_free+0x109/0x3a0 security/integrity/ima/ima_main.c:198
1 lock held by syz-executor.0/12095:

=============================================

NMI backtrace for cpu 0
CPU: 0 PID: 28 Comm: khungtaskd Not tainted 6.1.0-rc3-syzkaller-00332-g089d1c31224e #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 10/26/2022
Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0x1b1/0x28e lib/dump_stack.c:106
nmi_cpu_backtrace+0x46f/0x4f0 lib/nmi_backtrace.c:111
nmi_trigger_cpumask_backtrace+0x1ba/0x420 lib/nmi_backtrace.c:62
trigger_all_cpu_backtrace include/linux/nmi.h:148 [inline]
check_hung_uninterruptible_tasks kernel/hung_task.c:220 [inline]
watchdog+0xcf5/0xd40 kernel/hung_task.c:377
kthread+0x266/0x300 kernel/kthread.c:376
ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:306
</TASK>
Sending NMI from CPU 0 to CPUs 1:
NMI backtrace for cpu 1
CPU: 1 PID: 4152 Comm: syz-fuzzer Not tainted 6.1.0-rc3-syzkaller-00332-g089d1c31224e #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 10/26/2022
RIP: 0033:0x42f560
Code: b4 d0 d0 16 00 00 0f 1f 40 00 48 81 fe 00 10 00 00 72 df 48 89 54 24 28 48 89 5c 24 20 48 89 f0 31 db 48 89 d9 0f 1f 44 00 00 <e8> 7b 46 fe ff 48 85 c0 75 18 48 8b 44 24 48 48 8b 4c 24 30 48 8b
RSP: 002b:000000c036133f60 EFLAGS: 00000246
RAX: 0000000001314a80 RBX: 0000000000000000 RCX: 0000000000000000
RDX: 0000000000000184 RSI: 0000000001314a80 RDI: 000000000045e5a0
RBP: 000000c036133f98 R08: 0000000000000002 R09: 0000000000000000
R10: 0000000000cdd5b8 R11: 000000c01a5937a0 R12: 000000c000042ed0
R13: 000000000186b680 R14: 000000c0009ee680 R15: 00007fd6f58ddc68
FS: 000000c0254c0890 GS: 0000000000000000


---
This report is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at syzk...@googlegroups.com.

syzbot will keep track of this issue. See:
https://goo.gl/tpsmEJ#status for how to communicate with syzbot.

syzbot

unread,
Feb 14, 2023, 3:14:43 AM2/14/23
to konishi...@gmail.com, linux-...@vger.kernel.org, linux...@vger.kernel.org, syzkall...@googlegroups.com
syzbot has found a reproducer for the following issue on:

HEAD commit: f6feea56f66d Merge tag 'mm-hotfixes-stable-2023-02-13-13-5..
git tree: upstream
console+strace: https://syzkaller.appspot.com/x/log.txt?x=165ee62b480000
kernel config: https://syzkaller.appspot.com/x/.config?x=42ba4da8e1e6af9f
dashboard link: https://syzkaller.appspot.com/bug?extid=f0c4082ce5ebebdac63b
compiler: Debian clang version 15.0.7, GNU ld (GNU Binutils for Debian) 2.35.2
syz repro: https://syzkaller.appspot.com/x/repro.syz?x=14ba7207480000
C reproducer: https://syzkaller.appspot.com/x/repro.c?x=15fd30d0c80000

Downloadable assets:
disk image: https://storage.googleapis.com/syzbot-assets/1ae0143f08d5/disk-f6feea56.raw.xz
vmlinux: https://storage.googleapis.com/syzbot-assets/18b8a23fa0cb/vmlinux-f6feea56.xz
kernel image: https://storage.googleapis.com/syzbot-assets/d915f4c5c8c0/bzImage-f6feea56.xz
mounted in repro: https://storage.googleapis.com/syzbot-assets/1acd3b288433/mount_0.gz

IMPORTANT: if you fix the issue, please add the following tag to the commit:
Reported-by: syzbot+f0c408...@syzkaller.appspotmail.com

INFO: task segctord:5067 blocked for more than 143 seconds.
Not tainted 6.2.0-rc8-syzkaller-00015-gf6feea56f66d #0
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
task:segctord state:D stack:23456 pid:5067 ppid:2 flags:0x00004000
Call Trace:
<TASK>
context_switch kernel/sched/core.c:5293 [inline]
__schedule+0x1409/0x43f0 kernel/sched/core.c:6606
schedule+0xc3/0x190 kernel/sched/core.c:6682
rwsem_down_write_slowpath+0xfcf/0x14a0 kernel/locking/rwsem.c:1190
nilfs_transaction_lock+0x25c/0x4f0 fs/nilfs2/segment.c:357
nilfs_segctor_thread_construct fs/nilfs2/segment.c:2486 [inline]
nilfs_segctor_thread+0x52f/0x1140 fs/nilfs2/segment.c:2570
kthread+0x270/0x300 kernel/kthread.c:376
ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:308
</TASK>

Showing all locks held in the system:
1 lock held by rcu_tasks_kthre/12:
#0: ffffffff8cf258d0 (rcu_tasks.tasks_gp_mutex){+.+.}-{3:3}, at: rcu_tasks_one_gp+0x26/0xce0 kernel/rcu/tasks.h:507
1 lock held by rcu_tasks_trace/13:
#0: ffffffff8cf260d0 (rcu_tasks_trace.tasks_gp_mutex){+.+.}-{3:3}, at: rcu_tasks_one_gp+0x26/0xce0 kernel/rcu/tasks.h:507
1 lock held by khungtaskd/28:
#0: ffffffff8cf25700 (rcu_read_lock){....}-{1:2}, at: rcu_lock_acquire+0x0/0x30
2 locks held by getty/4745:
#0: ffff88802c2eb098 (&tty->ldisc_sem){++++}-{0:0}, at: tty_ldisc_ref_wait+0x25/0x70 drivers/tty/tty_ldisc.c:244
#1: ffffc900015b02f0 (&ldata->atomic_read_lock){+.+.}-{3:3}, at: n_tty_read+0x6ab/0x1db0 drivers/tty/n_tty.c:2177
3 locks held by syz-executor996/5065:
1 lock held by segctord/5067:
#0: ffff888017ce92a0 (&nilfs->ns_segctor_sem){++++}-{3:3}, at: nilfs_transaction_lock+0x25c/0x4f0 fs/nilfs2/segment.c:357

=============================================

NMI backtrace for cpu 1
CPU: 1 PID: 28 Comm: khungtaskd Not tainted 6.2.0-rc8-syzkaller-00015-gf6feea56f66d #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/21/2023
Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0x1e7/0x2d0 lib/dump_stack.c:106
nmi_cpu_backtrace+0x4e5/0x560 lib/nmi_backtrace.c:111
nmi_trigger_cpumask_backtrace+0x1b4/0x3f0 lib/nmi_backtrace.c:62
trigger_all_cpu_backtrace include/linux/nmi.h:148 [inline]
check_hung_uninterruptible_tasks kernel/hung_task.c:220 [inline]
watchdog+0xf70/0xfb0 kernel/hung_task.c:377
kthread+0x270/0x300 kernel/kthread.c:376
ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:308
</TASK>
Sending NMI from CPU 1 to CPUs 0:
NMI backtrace for cpu 0
CPU: 0 PID: 5065 Comm: syz-executor996 Not tainted 6.2.0-rc8-syzkaller-00015-gf6feea56f66d #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/21/2023
RIP: 0010:__lock_release kernel/locking/lockdep.c:5372 [inline]
RIP: 0010:lock_release+0x333/0xaa0 kernel/locking/lockdep.c:5688
Code: 00 f0 ff 42 0f b6 04 3b 84 c0 0f 85 7e 05 00 00 45 89 2e 41 81 fd ff ff 0f 00 0f 87 ff 02 00 00 48 8b 44 24 40 42 0f b6 04 38 <84> c0 0f 85 34 05 00 00 89 16 4c 89 e0 48 c1 e8 03 42 80 3c 38 00
RSP: 0018:ffffc90003def1c0 EFLAGS: 00000087
RAX: 0000000000000000 RBX: 1ffff11004d808aa RCX: ffffc90003def203
RDX: 0000000000000003 RSI: ffff888026c044b0 RDI: ffff888026c04530
RBP: ffffc90003def2f0 R08: dffffc0000000000 R09: fffffbfff1ca4ece
R10: 0000000000000000 R11: dffffc0000000001 R12: ffff888026c04530
R13: 0000000000020021 R14: ffff888026c04550 R15: dffffc0000000000
FS: 0000555556f2e300(0000) GS:ffff8880b9800000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 000056435c9b6680 CR3: 000000001e10a000 CR4: 00000000003506f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
<TASK>
folio_mark_accessed+0x51c/0xf00 mm/swap.c:515
__nilfs_get_page_block fs/nilfs2/page.c:42 [inline]
nilfs_grab_buffer+0x3d3/0x540 fs/nilfs2/page.c:61
nilfs_mdt_submit_block+0xd7/0x8f0 fs/nilfs2/mdt.c:121
nilfs_mdt_read_block+0xeb/0x430 fs/nilfs2/mdt.c:176
nilfs_mdt_get_block+0x12d/0xbb0 fs/nilfs2/mdt.c:251
nilfs_sufile_get_segment_usage_block fs/nilfs2/sufile.c:92 [inline]
nilfs_sufile_truncate_range fs/nilfs2/sufile.c:679 [inline]
nilfs_sufile_resize+0x7a3/0x12b0 fs/nilfs2/sufile.c:777
nilfs_resize_fs+0x20c/0xed0 fs/nilfs2/super.c:422
nilfs_ioctl_resize fs/nilfs2/ioctl.c:1033 [inline]
nilfs_ioctl+0x137c/0x2440 fs/nilfs2/ioctl.c:1301
vfs_ioctl fs/ioctl.c:51 [inline]
__do_sys_ioctl fs/ioctl.c:870 [inline]
__se_sys_ioctl+0xf1/0x160 fs/ioctl.c:856
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x41/0xc0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd
RIP: 0033:0x7fada4f355f9
Code: Unable to access opcode bytes at 0x7fada4f355cf.
RSP: 002b:00007ffdc80a3908 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007fada4f355f9
RDX: 0000000020000040 RSI: 0000000040086e8b RDI: 0000000000000006
RBP: 0000000000000000 R08: 00007fada4fa3ec0 R09: 00007fada4fa3ec0
R10: 00007fada4fa3ec0 R11: 0000000000000246 R12: 00007ffdc80a3930
R13: 0000000000000000 R14: 431bde82d7b634db R15: 0000000000000000
</TASK>
INFO: NMI handler (nmi_cpu_backtrace_handler) took too long to run: 1.549 msecs

Ryusuke Konishi

unread,
Feb 14, 2023, 4:11:42 AM2/14/23
to syzbot, linux-...@vger.kernel.org, linux...@vger.kernel.org, syzkall...@googlegroups.com
It looks like the resize ioctl is holding r/w semaphore ns_segctor_sem
for too long and hangs the segment constructor thread. I'll take a
closer look.

Ryusuke Konishi

Ryusuke Konishi

unread,
Feb 14, 2023, 5:40:49 PM2/14/23
to Andrew Morton, linux-nilfs, syzbot, syzkall...@googlegroups.com, LKML
Macro NILFS_SB2_OFFSET_BYTES, which computes the position of the second
superblock, underflows when the argument device size is less than 4096
bytes. Therefore, when using this macro, it is necessary to check in
advance that the device size is not less than a lower limit, or at least
that underflow does not occur.

The current nilfs2 implementation lacks this check, causing
out-of-bound block access when mounting devices smaller than 4096 bytes:

I/O error, dev loop0, sector 36028797018963960 op 0x0:(READ) flags 0x0
phys_seg 1 prio class 2
NILFS (loop0): unable to read secondary superblock (blocksize = 1024)

In addition, when trying to resize the filesystem to a size below 4096
bytes, this underflow occurs in nilfs_resize_fs(), passing a huge number
of segments to nilfs_sufile_resize(), corrupting parameters such as the
number of segments in superblocks. This causes excessive loop iterations
in nilfs_sufile_resize() during a subsequent resize ioctl, causing
semaphore ns_segctor_sem to block for a long time and hang the writer
thread:

INFO: task segctord:5067 blocked for more than 143 seconds.
Not tainted 6.2.0-rc8-syzkaller-00015-gf6feea56f66d #0
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
task:segctord state:D stack:23456 pid:5067 ppid:2
flags:0x00004000
Call Trace:
<TASK>
context_switch kernel/sched/core.c:5293 [inline]
__schedule+0x1409/0x43f0 kernel/sched/core.c:6606
schedule+0xc3/0x190 kernel/sched/core.c:6682
rwsem_down_write_slowpath+0xfcf/0x14a0 kernel/locking/rwsem.c:1190
nilfs_transaction_lock+0x25c/0x4f0 fs/nilfs2/segment.c:357
nilfs_segctor_thread_construct fs/nilfs2/segment.c:2486 [inline]
nilfs_segctor_thread+0x52f/0x1140 fs/nilfs2/segment.c:2570
kthread+0x270/0x300 kernel/kthread.c:376
ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:308
</TASK>
...
Call Trace:
<TASK>
folio_mark_accessed+0x51c/0xf00 mm/swap.c:515
__nilfs_get_page_block fs/nilfs2/page.c:42 [inline]
nilfs_grab_buffer+0x3d3/0x540 fs/nilfs2/page.c:61
nilfs_mdt_submit_block+0xd7/0x8f0 fs/nilfs2/mdt.c:121
nilfs_mdt_read_block+0xeb/0x430 fs/nilfs2/mdt.c:176
nilfs_mdt_get_block+0x12d/0xbb0 fs/nilfs2/mdt.c:251
nilfs_sufile_get_segment_usage_block fs/nilfs2/sufile.c:92 [inline]
nilfs_sufile_truncate_range fs/nilfs2/sufile.c:679 [inline]
nilfs_sufile_resize+0x7a3/0x12b0 fs/nilfs2/sufile.c:777
nilfs_resize_fs+0x20c/0xed0 fs/nilfs2/super.c:422
nilfs_ioctl_resize fs/nilfs2/ioctl.c:1033 [inline]
nilfs_ioctl+0x137c/0x2440 fs/nilfs2/ioctl.c:1301
...

This fixes these issues by inserting appropriate minimum device size
checks or anti-underflow checks, depending on where the macro is used.

Link: https://lkml.kernel.org/r/0000000000004e...@google.com
Signed-off-by: Ryusuke Konishi <konishi...@gmail.com>
Reported-by: syzbot+f0c408...@syzkaller.appspotmail.com
Tested-by: Ryusuke Konishi <konishi...@gmail.com>
Cc: sta...@vger.kernel.org
---
fs/nilfs2/ioctl.c | 7 +++++++
fs/nilfs2/super.c | 9 +++++++++
fs/nilfs2/the_nilfs.c | 8 +++++++-
3 files changed, 23 insertions(+), 1 deletion(-)

diff --git a/fs/nilfs2/ioctl.c b/fs/nilfs2/ioctl.c
index 87e1004b606d..b4041d0566a9 100644
--- a/fs/nilfs2/ioctl.c
+++ b/fs/nilfs2/ioctl.c
@@ -1114,7 +1114,14 @@ static int nilfs_ioctl_set_alloc_range(struct inode *inode, void __user *argp)

minseg = range[0] + segbytes - 1;
do_div(minseg, segbytes);
+
+ if (range[1] < 4096)
+ goto out;
+
maxseg = NILFS_SB2_OFFSET_BYTES(range[1]);
+ if (maxseg < segbytes)
+ goto out;
+
do_div(maxseg, segbytes);
maxseg--;

diff --git a/fs/nilfs2/super.c b/fs/nilfs2/super.c
index 6edb6e0dd61f..1422b8ba24ed 100644
--- a/fs/nilfs2/super.c
+++ b/fs/nilfs2/super.c
@@ -408,6 +408,15 @@ int nilfs_resize_fs(struct super_block *sb, __u64 newsize)
if (newsize > devsize)
goto out;

+ /*
+ * Prevent underflow in second superblock position calculation.
+ * The exact minimum size check is done in nilfs_sufile_resize().
+ */
+ if (newsize < 4096) {
+ ret = -ENOSPC;
+ goto out;
+ }
+
/*
* Write lock is required to protect some functions depending
* on the number of segments, the number of reserved segments,
diff --git a/fs/nilfs2/the_nilfs.c b/fs/nilfs2/the_nilfs.c
index 2064e6473d30..3a4c9c150cbf 100644
--- a/fs/nilfs2/the_nilfs.c
+++ b/fs/nilfs2/the_nilfs.c
@@ -544,9 +544,15 @@ static int nilfs_load_super_block(struct the_nilfs *nilfs,
{
struct nilfs_super_block **sbp = nilfs->ns_sbp;
struct buffer_head **sbh = nilfs->ns_sbh;
- u64 sb2off = NILFS_SB2_OFFSET_BYTES(bdev_nr_bytes(nilfs->ns_bdev));
+ u64 sb2off, devsize = bdev_nr_bytes(nilfs->ns_bdev);
int valid[2], swp = 0;

+ if (devsize < NILFS_SEG_MIN_BLOCKS * NILFS_MIN_BLOCK_SIZE + 4096) {
+ nilfs_err(sb, "device size too small");
+ return -EINVAL;
+ }
+ sb2off = NILFS_SB2_OFFSET_BYTES(devsize);
+
sbp[0] = nilfs_read_super_block(sb, NILFS_SB_OFFSET_BYTES, blocksize,
&sbh[0]);
sbp[1] = nilfs_read_super_block(sb, sb2off, blocksize, &sbh[1]);
--
2.34.1

Hillf Danton

unread,
Feb 14, 2023, 8:52:46 PM2/14/23
to Dmitry Vyukov, konishi...@gmail.com, linux-...@vger.kernel.org, Yu Zhao, Andrew Morton, syzbot, linu...@kvack.org, syzkall...@googlegroups.com
On Tue, 14 Feb 2023 00:14:42 -0800
Syzbot was launched without MGLRU enabled [1].
Dmitry could you turn it on by default?

Thanks
Hillf

[1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/mm/swap.c?id=f6feea56f66d#n490
Reply all
Reply to author
Forward
0 new messages