[syzbot] INFO: task hung in nilfs_detach_log_writer

21 views
Skip to first unread message

syzbot

unread,
Oct 24, 2022, 6:07:39 AM10/24/22
to bra...@kernel.org, bro...@kernel.org, catalin...@arm.com, ebie...@xmission.com, linux-ar...@lists.infradead.org, linux-...@vger.kernel.org, madv...@linux.microsoft.com, mark.r...@arm.com, sc...@os.amperecomputing.com, syzkall...@googlegroups.com, wi...@kernel.org
Hello,

syzbot found the following issue on:

HEAD commit: bbed346d5a96 Merge branch 'for-next/core' into for-kernelci
git tree: git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux.git for-kernelci
console output: https://syzkaller.appspot.com/x/log.txt?x=12c63b3c880000
kernel config: https://syzkaller.appspot.com/x/.config?x=3a4a45d2d827c1e
dashboard link: https://syzkaller.appspot.com/bug?extid=e3973c409251e136fdd0
compiler: Debian clang version 13.0.1-++20220126092033+75e33f71c2da-1~exp1~20220126212112.63, GNU ld (GNU Binutils for Debian) 2.35.2
userspace arch: arm64
syz repro: https://syzkaller.appspot.com/x/repro.syz?x=13010d3c880000
C reproducer: https://syzkaller.appspot.com/x/repro.c?x=16c58e6e880000

Downloadable assets:
disk image: https://storage.googleapis.com/syzbot-assets/e8e91bc79312/disk-bbed346d.raw.xz
vmlinux: https://storage.googleapis.com/syzbot-assets/c1cb3fb3b77e/vmlinux-bbed346d.xz
mounted in repro: https://storage.googleapis.com/syzbot-assets/ef6217a5c5b6/mount_0.gz

IMPORTANT: if you fix the issue, please add the following tag to the commit:
Reported-by: syzbot+e3973c...@syzkaller.appspotmail.com

INFO: task syz-executor217:3091 blocked for more than 143 seconds.
Not tainted 6.0.0-rc7-syzkaller-18095-gbbed346d5a96 #0
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
task:syz-executor217 state:D stack: 0 pid: 3091 ppid: 3089 flags:0x00000008
Call trace:
__switch_to+0x180/0x298 arch/arm64/kernel/process.c:557
context_switch kernel/sched/core.c:5182 [inline]
__schedule+0x414/0x5a0 kernel/sched/core.c:6494
schedule+0x64/0xa4 kernel/sched/core.c:6570
schedule_timeout+0x64/0x1b4 kernel/time/timer.c:1911
do_wait_for_common+0xf4/0x184 kernel/sched/completion.c:85
__wait_for_common kernel/sched/completion.c:106 [inline]
wait_for_common kernel/sched/completion.c:117 [inline]
wait_for_completion+0x48/0x60 kernel/sched/completion.c:138
__flush_work+0xd4/0x144 kernel/workqueue.c:3073
flush_work+0x24/0x38 kernel/workqueue.c:3094
nilfs_segctor_destroy fs/nilfs2/segment.c:2726 [inline]
nilfs_detach_log_writer+0x1e0/0x4d0 fs/nilfs2/segment.c:2810
nilfs_put_super+0x28/0x9c fs/nilfs2/super.c:468
generic_shutdown_super+0x8c/0x190 fs/super.c:491
kill_block_super+0x30/0x78 fs/super.c:1427
deactivate_locked_super+0x70/0xe8 fs/super.c:332
deactivate_super+0xd0/0xd4 fs/super.c:363
cleanup_mnt+0x1f8/0x234 fs/namespace.c:1186
__cleanup_mnt+0x20/0x30 fs/namespace.c:1193
task_work_run+0xc4/0x14c kernel/task_work.c:177
resume_user_mode_work include/linux/resume_user_mode.h:49 [inline]
do_notify_resume+0x174/0x1f0 arch/arm64/kernel/signal.c:1127
prepare_exit_to_user_mode arch/arm64/kernel/entry-common.c:137 [inline]
exit_to_user_mode arch/arm64/kernel/entry-common.c:142 [inline]
el0_svc+0x9c/0x150 arch/arm64/kernel/entry-common.c:637
el0t_64_sync_handler+0x84/0xf0 arch/arm64/kernel/entry-common.c:654
el0t_64_sync+0x18c/0x190 arch/arm64/kernel/entry.S:581

Showing all locks held in the system:
1 lock held by rcu_tasks_kthre/11:
#0: ffff80000d433568 (rcu_tasks.tasks_gp_mutex){+.+.}-{3:3}, at: rcu_tasks_one_gp+0x3c/0x450 kernel/rcu/tasks.h:507
1 lock held by rcu_tasks_trace/12:
#0: ffff80000d433bb8 (rcu_tasks_trace.tasks_gp_mutex){+.+.}-{3:3}, at: rcu_tasks_one_gp+0x3c/0x450 kernel/rcu/tasks.h:507
1 lock held by khungtaskd/27:
#0: ffff80000d433440 (rcu_read_lock){....}-{1:2}, at: rcu_lock_acquire+0x4/0x48 include/linux/rcupdate.h:279
2 locks held by getty/2727:
#0: ffff0000c7906098 (&tty->ldisc_sem){++++}-{0:0}, at: tty_ldisc_ref_wait+0x28/0x58 drivers/tty/tty_ldisc.c:244
#1: ffff80000f6162f0 (&ldata->atomic_read_lock){+.+.}-{3:3}, at: n_tty_read+0x19c/0x89c drivers/tty/n_tty.c:2177
1 lock held by syz-executor217/3091:
#0: ffff0000c90930e0 (&type->s_umount_key#41){+.+.}-{3:3}, at: deactivate_super+0xc8/0xd4 fs/super.c:362
3 locks held by kworker/1:0/3113:
#0: ffff0000c0010738 ((wq_completion)events){+.+.}-{0:0}, at: process_one_work+0x270/0x504 kernel/workqueue.c:2262
#1: ffff800012703d80 ((work_completion)(&sci->sc_iput_work)){+.+.}-{0:0}, at: process_one_work+0x29c/0x504 kernel/workqueue.c:2264
#2: ffff0000c9093650 (sb_internal#2){.+.+}-{0:0}, at: nilfs_evict_inode+0x94/0x1cc fs/nilfs2/inode.c:911

=============================================



---
This report is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at syzk...@googlegroups.com.

syzbot will keep track of this issue. See:
https://goo.gl/tpsmEJ#status for how to communicate with syzbot.
syzbot can test patches for this issue, for details see:
https://goo.gl/tpsmEJ#testing-patches

Ryusuke Konishi

unread,
May 20, 2024, 9:26:54 AMMay 20
to Andrew Morton, linux...@vger.kernel.org, syzbot, syzkall...@googlegroups.com, linux-...@vger.kernel.org, sjb...@psu.edu
Hello Andrew,

Please apply this series as bug fixes.

This bug fix series covers three nilfs2 log writer-related issues,
including a timer use-after-free issue and potential deadlock issue on
unmount, and a potential freeze issue in event synchronization found
during their analysis. Details are described in each commit log.

thank you always.

Ryusuke Konishi

Ryusuke Konishi (3):
nilfs2: fix use-after-free of timer for log writer thread
nilfs2: fix unexpected freezing of nilfs_segctor_sync()
nilfs2: fix potential hang in nilfs_detach_log_writer()

fs/nilfs2/segment.c | 63 +++++++++++++++++++++++++++++++++++----------
1 file changed, 50 insertions(+), 13 deletions(-)

--
2.34.1

Ryusuke Konishi

unread,
May 20, 2024, 9:26:56 AMMay 20
to Andrew Morton, linux...@vger.kernel.org, syzbot, syzkall...@googlegroups.com, linux-...@vger.kernel.org, sjb...@psu.edu
A use-after-free issue has been reported regarding the timer sc_timer on
the nilfs_sc_info structure.

The problem is that even though it is used to wake up a sleeping log
writer thread, sc_timer is not shut down until the nilfs_sc_info
structure is about to be freed, and is used regardless of the thread's
lifetime.

Fix this issue by limiting the use of sc_timer only while the log writer
thread is alive.

Reported-by: "Bai, Shuangpeng" <sjb...@psu.edu>
Closes: https://groups.google.com/g/syzkaller/c/MK_LYqtt8ko/m/8rgdWeseAwAJ
Signed-off-by: Ryusuke Konishi <konishi...@gmail.com>
Fixes: fdce895ea5dd ("nilfs2: change sc_timer from a pointer to an embedded one in struct nilfs_sc_info")
Tested-by: Ryusuke Konishi <konishi...@gmail.com>
Cc: sta...@vger.kernel.org
---
fs/nilfs2/segment.c | 25 +++++++++++++++++++------
1 file changed, 19 insertions(+), 6 deletions(-)

diff --git a/fs/nilfs2/segment.c b/fs/nilfs2/segment.c
index 8654ab8ad534..4e274bc8eb79 100644
--- a/fs/nilfs2/segment.c
+++ b/fs/nilfs2/segment.c
@@ -2118,8 +2118,10 @@ static void nilfs_segctor_start_timer(struct nilfs_sc_info *sci)
{
spin_lock(&sci->sc_state_lock);
if (!(sci->sc_state & NILFS_SEGCTOR_COMMIT)) {
- sci->sc_timer.expires = jiffies + sci->sc_interval;
- add_timer(&sci->sc_timer);
+ if (sci->sc_task) {
+ sci->sc_timer.expires = jiffies + sci->sc_interval;
+ add_timer(&sci->sc_timer);
+ }
sci->sc_state |= NILFS_SEGCTOR_COMMIT;
}
spin_unlock(&sci->sc_state_lock);
@@ -2320,10 +2322,21 @@ int nilfs_construct_dsync_segment(struct super_block *sb, struct inode *inode,
*/
static void nilfs_segctor_accept(struct nilfs_sc_info *sci)
{
+ bool thread_is_alive;
+
spin_lock(&sci->sc_state_lock);
sci->sc_seq_accepted = sci->sc_seq_request;
+ thread_is_alive = (bool)sci->sc_task;
spin_unlock(&sci->sc_state_lock);
- del_timer_sync(&sci->sc_timer);
+
+ /*
+ * This function does not race with the log writer thread's
+ * termination. Therefore, deleting sc_timer, which should not be
+ * done after the log writer thread exits, can be done safely outside
+ * the area protected by sc_state_lock.
+ */
+ if (thread_is_alive)
+ del_timer_sync(&sci->sc_timer);
}

/**
@@ -2349,7 +2362,7 @@ static void nilfs_segctor_notify(struct nilfs_sc_info *sci, int mode, int err)
sci->sc_flush_request &= ~FLUSH_DAT_BIT;

/* re-enable timer if checkpoint creation was not done */
- if ((sci->sc_state & NILFS_SEGCTOR_COMMIT) &&
+ if ((sci->sc_state & NILFS_SEGCTOR_COMMIT) && sci->sc_task &&
time_before(jiffies, sci->sc_timer.expires))
add_timer(&sci->sc_timer);
}
@@ -2539,6 +2552,7 @@ static int nilfs_segctor_thread(void *arg)
int timeout = 0;

sci->sc_timer_task = current;
+ timer_setup(&sci->sc_timer, nilfs_construction_timeout, 0);

/* start sync. */
sci->sc_task = current;
@@ -2606,6 +2620,7 @@ static int nilfs_segctor_thread(void *arg)
end_thread:
/* end sync. */
sci->sc_task = NULL;
+ timer_shutdown_sync(&sci->sc_timer);
wake_up(&sci->sc_wait_task); /* for nilfs_segctor_kill_thread() */
spin_unlock(&sci->sc_state_lock);
return 0;
@@ -2669,7 +2684,6 @@ static struct nilfs_sc_info *nilfs_segctor_new(struct super_block *sb,
INIT_LIST_HEAD(&sci->sc_gc_inodes);
INIT_LIST_HEAD(&sci->sc_iput_queue);
INIT_WORK(&sci->sc_iput_work, nilfs_iput_work_func);
- timer_setup(&sci->sc_timer, nilfs_construction_timeout, 0);

sci->sc_interval = HZ * NILFS_SC_DEFAULT_TIMEOUT;
sci->sc_mjcp_freq = HZ * NILFS_SC_DEFAULT_SR_FREQ;
@@ -2748,7 +2762,6 @@ static void nilfs_segctor_destroy(struct nilfs_sc_info *sci)

down_write(&nilfs->ns_segctor_sem);

- timer_shutdown_sync(&sci->sc_timer);
kfree(sci);
}

--
2.34.1

Ryusuke Konishi

unread,
May 20, 2024, 9:26:59 AMMay 20
to Andrew Morton, linux...@vger.kernel.org, syzbot, syzkall...@googlegroups.com, linux-...@vger.kernel.org, sjb...@psu.edu
A potential and reproducible race issue has been identified where
nilfs_segctor_sync() would block even after the log writer thread
writes a checkpoint, unless there is an interrupt or other trigger to
resume log writing.

This turned out to be because, depending on the execution timing
of the log writer thread running in parallel, the log writer thread
may skip responding to nilfs_segctor_sync(), which causes a call to
schedule() waiting for completion within nilfs_segctor_sync() to lose
the opportunity to wake up.

The reason why waking up the task waiting in nilfs_segctor_sync() may
be skipped is that updating the request generation issued using a
shared sequence counter and adding an wait queue entry to the request
wait queue to the log writer, are not done atomically. There is a
possibility that log writing and request completion notification by
nilfs_segctor_wakeup() may occur between the two operations, and in
that case, the wait queue entry is not yet visible to
nilfs_segctor_wakeup() and the wake-up of nilfs_segctor_sync() will be
carried over until the next request occurs.

Fix this issue by performing these two operations simultaneously
within the lock section of sc_state_lock. Also, following the memory
barrier guidelines for event waiting loops, move the call to
set_current_state() in the same location into the event waiting loop
to ensure that a memory barrier is inserted just before the event
condition determination.

Signed-off-by: Ryusuke Konishi <konishi...@gmail.com>
Fixes: 9ff05123e3bf ("nilfs2: segment constructor")
Tested-by: Ryusuke Konishi <konishi...@gmail.com>
Cc: sta...@vger.kernel.org
---
fs/nilfs2/segment.c | 17 +++++++++++++----
1 file changed, 13 insertions(+), 4 deletions(-)

diff --git a/fs/nilfs2/segment.c b/fs/nilfs2/segment.c
index 4e274bc8eb79..99c78a49e432 100644
--- a/fs/nilfs2/segment.c
+++ b/fs/nilfs2/segment.c
@@ -2168,19 +2168,28 @@ static int nilfs_segctor_sync(struct nilfs_sc_info *sci)
struct nilfs_segctor_wait_request wait_req;
int err = 0;

- spin_lock(&sci->sc_state_lock);
init_wait(&wait_req.wq);
wait_req.err = 0;
atomic_set(&wait_req.done, 0);
+ init_waitqueue_entry(&wait_req.wq, current);
+
+ /*
+ * To prevent a race issue where completion notifications from the
+ * log writer thread are missed, increment the request sequence count
+ * "sc_seq_request" and insert a wait queue entry using the current
+ * sequence number into the "sc_wait_request" queue at the same time
+ * within the lock section of "sc_state_lock".
+ */
+ spin_lock(&sci->sc_state_lock);
wait_req.seq = ++sci->sc_seq_request;
+ add_wait_queue(&sci->sc_wait_request, &wait_req.wq);
spin_unlock(&sci->sc_state_lock);

- init_waitqueue_entry(&wait_req.wq, current);
- add_wait_queue(&sci->sc_wait_request, &wait_req.wq);
- set_current_state(TASK_INTERRUPTIBLE);
wake_up(&sci->sc_wait_daemon);

for (;;) {
+ set_current_state(TASK_INTERRUPTIBLE);
+
if (atomic_read(&wait_req.done)) {
err = wait_req.err;
break;
--
2.34.1

Ryusuke Konishi

unread,
May 20, 2024, 9:27:03 AMMay 20
to Andrew Morton, linux...@vger.kernel.org, syzbot, syzkall...@googlegroups.com, linux-...@vger.kernel.org, sjb...@psu.edu
Syzbot has reported a potential hang in nilfs_detach_log_writer()
called during nilfs2 unmount.

Analysis revealed that this is because nilfs_segctor_sync(), which
synchronizes with the log writer thread, can be called after
nilfs_segctor_destroy() terminates that thread, as shown in the call
trace below:

nilfs_detach_log_writer
nilfs_segctor_destroy
nilfs_segctor_kill_thread --> Shut down log writer thread
flush_work
nilfs_iput_work_func
nilfs_dispose_list
iput
nilfs_evict_inode
nilfs_transaction_commit
nilfs_construct_segment (if inode needs sync)
nilfs_segctor_sync --> Attempt to synchronize with
log writer thread
*** DEADLOCK ***

Fix this issue by changing nilfs_segctor_sync() so that the log writer
thread returns normally without synchronizing after it terminates, and
by forcing tasks that are already waiting to complete once after the
thread terminates.

The skipped inode metadata flushout will then be processed together in
the subsequent cleanup work in nilfs_segctor_destroy().

Signed-off-by: Ryusuke Konishi <konishi...@gmail.com>
Reported-by: syzbot+e3973c...@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=e3973c409251e136fdd0
Tested-by: Ryusuke Konishi <konishi...@gmail.com>
Cc: sta...@vger.kernel.org
---
fs/nilfs2/segment.c | 21 ++++++++++++++++++---
1 file changed, 18 insertions(+), 3 deletions(-)

diff --git a/fs/nilfs2/segment.c b/fs/nilfs2/segment.c
index 99c78a49e432..c27f0daec9af 100644
--- a/fs/nilfs2/segment.c
+++ b/fs/nilfs2/segment.c
@@ -2190,6 +2190,14 @@ static int nilfs_segctor_sync(struct nilfs_sc_info *sci)
for (;;) {
set_current_state(TASK_INTERRUPTIBLE);

+ /*
+ * Synchronize only while the log writer thread is alive.
+ * Leave flushing out after the log writer thread exits to
+ * the cleanup work in nilfs_segctor_destroy().
+ */
+ if (!sci->sc_task)
+ break;
+
if (atomic_read(&wait_req.done)) {
err = wait_req.err;
break;
@@ -2205,7 +2213,7 @@ static int nilfs_segctor_sync(struct nilfs_sc_info *sci)
return err;
}

-static void nilfs_segctor_wakeup(struct nilfs_sc_info *sci, int err)
+static void nilfs_segctor_wakeup(struct nilfs_sc_info *sci, int err, bool force)
{
struct nilfs_segctor_wait_request *wrq, *n;
unsigned long flags;
@@ -2213,7 +2221,7 @@ static void nilfs_segctor_wakeup(struct nilfs_sc_info *sci, int err)
spin_lock_irqsave(&sci->sc_wait_request.lock, flags);
list_for_each_entry_safe(wrq, n, &sci->sc_wait_request.head, wq.entry) {
if (!atomic_read(&wrq->done) &&
- nilfs_cnt32_ge(sci->sc_seq_done, wrq->seq)) {
+ (force || nilfs_cnt32_ge(sci->sc_seq_done, wrq->seq))) {
wrq->err = err;
atomic_set(&wrq->done, 1);
}
@@ -2362,7 +2370,7 @@ static void nilfs_segctor_notify(struct nilfs_sc_info *sci, int mode, int err)
if (mode == SC_LSEG_SR) {
sci->sc_state &= ~NILFS_SEGCTOR_COMMIT;
sci->sc_seq_done = sci->sc_seq_accepted;
- nilfs_segctor_wakeup(sci, err);
+ nilfs_segctor_wakeup(sci, err, false);
sci->sc_flush_request = 0;
} else {
if (mode == SC_FLUSH_FILE)
@@ -2746,6 +2754,13 @@ static void nilfs_segctor_destroy(struct nilfs_sc_info *sci)
|| sci->sc_seq_request != sci->sc_seq_done);
spin_unlock(&sci->sc_state_lock);

+ /*
+ * Forcibly wake up tasks waiting in nilfs_segctor_sync(), which can
+ * be called from delayed iput() via nilfs_evict_inode() and can race
+ * with the above log writer thread termination.
+ */
+ nilfs_segctor_wakeup(sci, 0, true);
+
if (flush_work(&sci->sc_iput_work))
flag = true;

--
2.34.1

Reply all
Reply to author
Forward
0 new messages