[syzbot] [fs?] INFO: task hung in eventpoll_release_file

20 views
Skip to first unread message

syzbot

unread,
Mar 20, 2023, 5:06:44 PM3/20/23
to bra...@kernel.org, linux-...@vger.kernel.org, linux-...@vger.kernel.org, syzkall...@googlegroups.com, vi...@zeniv.linux.org.uk
Hello,

syzbot found the following issue on:

HEAD commit: 6f72958a49f6 Add linux-next specific files for 20230316
git tree: linux-next
console output: https://syzkaller.appspot.com/x/log.txt?x=178faee6c80000
kernel config: https://syzkaller.appspot.com/x/.config?x=dd64232230d3a0d8
dashboard link: https://syzkaller.appspot.com/bug?extid=e6dab35a08df7f7aa260
compiler: gcc (Debian 10.2.1-6) 10.2.1 20210110, GNU ld (GNU Binutils for Debian) 2.35.2
syz repro: https://syzkaller.appspot.com/x/repro.syz?x=1156a6ecc80000
C reproducer: https://syzkaller.appspot.com/x/repro.c?x=179e9648c80000

Downloadable assets:
disk image: https://storage.googleapis.com/syzbot-assets/3fcac83e3d53/disk-6f72958a.raw.xz
vmlinux: https://storage.googleapis.com/syzbot-assets/37b457d3103a/vmlinux-6f72958a.xz
kernel image: https://storage.googleapis.com/syzbot-assets/b82735625cce/bzImage-6f72958a.xz

IMPORTANT: if you fix the issue, please add the following tag to the commit:
Reported-by: syzbot+e6dab3...@syzkaller.appspotmail.com

INFO: task syz-executor397:5874 blocked for more than 143 seconds.
Not tainted 6.3.0-rc2-next-20230316-syzkaller #0
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
task:syz-executor397 state:D stack:27072 pid:5874 ppid:5147 flags:0x00004004
Call Trace:
<TASK>
context_switch kernel/sched/core.c:5305 [inline]
__schedule+0x1d23/0x5650 kernel/sched/core.c:6623
schedule+0xde/0x1a0 kernel/sched/core.c:6699
schedule_preempt_disabled+0x13/0x20 kernel/sched/core.c:6758
__mutex_lock_common kernel/locking/mutex.c:679 [inline]
__mutex_lock+0xa3b/0x1350 kernel/locking/mutex.c:747
eventpoll_release_file+0xe2/0x1d0 fs/eventpoll.c:962
eventpoll_release include/linux/eventpoll.h:53 [inline]
__fput+0x7da/0xa90 fs/file_table.c:312
task_work_run+0x16f/0x270 kernel/task_work.c:179
resume_user_mode_work include/linux/resume_user_mode.h:49 [inline]
exit_to_user_mode_loop kernel/entry/common.c:171 [inline]
exit_to_user_mode_prepare+0x210/0x240 kernel/entry/common.c:204
__syscall_exit_to_user_mode_work kernel/entry/common.c:286 [inline]
syscall_exit_to_user_mode+0x1d/0x50 kernel/entry/common.c:297
do_syscall_64+0x46/0xb0 arch/x86/entry/common.c:86
entry_SYSCALL_64_after_hwframe+0x63/0xcd
RIP: 0033:0x7f0183a1b0b9
RSP: 002b:00007f01831bf208 EFLAGS: 00000246 ORIG_RAX: 0000000000000133
RAX: 0000000000000116 RBX: 00007f0183a9d4a8 RCX: 00007f0183a1b0b9
RDX: 0000000000000318 RSI: 00000000200bd000 RDI: 0000000000000004
RBP: 00007f0183a9d4a0 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 00007f0183a9d4ac
R13: 00007fff8a59281f R14: 00007f01831bf300 R15: 0000000000022000
</TASK>

Showing all locks held in the system:
1 lock held by rcu_tasks_kthre/13:
#0: ffffffff8c795830 (rcu_tasks.tasks_gp_mutex){+.+.}-{3:3}, at: rcu_tasks_one_gp+0x31/0xd80 kernel/rcu/tasks.h:516
1 lock held by rcu_tasks_trace/14:
#0: ffffffff8c795530 (rcu_tasks_trace.tasks_gp_mutex){+.+.}-{3:3}, at: rcu_tasks_one_gp+0x31/0xd80 kernel/rcu/tasks.h:516
1 lock held by khungtaskd/28:
#0: ffffffff8c796400 (rcu_read_lock){....}-{1:2}, at: debug_show_all_locks+0x51/0x390 kernel/locking/lockdep.c:6545
2 locks held by getty/4761:
#0: ffff88814a34b098 (&tty->ldisc_sem){++++}-{0:0}, at: tty_ldisc_ref_wait+0x26/0x80 drivers/tty/tty_ldisc.c:243
#1: ffffc900015902f0 (&ldata->atomic_read_lock){+.+.}-{3:3}, at: n_tty_read+0xef4/0x13e0 drivers/tty/n_tty.c:2177
1 lock held by syz-executor397/5873:
1 lock held by syz-executor397/5874:
#0: ffff88801e3d5c68 (&ep->mtx){+.+.}-{3:3}, at: eventpoll_release_file+0xe2/0x1d0 fs/eventpoll.c:962

=============================================

NMI backtrace for cpu 0
CPU: 0 PID: 28 Comm: khungtaskd Not tainted 6.3.0-rc2-next-20230316-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 03/02/2023
Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0xd9/0x150 lib/dump_stack.c:106
nmi_cpu_backtrace+0x29c/0x350 lib/nmi_backtrace.c:113
nmi_trigger_cpumask_backtrace+0x2a4/0x300 lib/nmi_backtrace.c:62
trigger_all_cpu_backtrace include/linux/nmi.h:148 [inline]
check_hung_uninterruptible_tasks kernel/hung_task.c:222 [inline]
watchdog+0xe16/0x1090 kernel/hung_task.c:379
kthread+0x33e/0x440 kernel/kthread.c:379
ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:308
</TASK>
Sending NMI from CPU 0 to CPUs 1:
NMI backtrace for cpu 1
CPU: 1 PID: 5873 Comm: syz-executor397 Not tainted 6.3.0-rc2-next-20230316-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 03/02/2023
RIP: 0010:arch_atomic_read arch/x86/include/asm/atomic.h:29 [inline]
RIP: 0010:rcu_dynticks_curr_cpu_in_eqs include/linux/context_tracking.h:121 [inline]
RIP: 0010:rcu_is_watching+0x47/0xb0 kernel/rcu/tree.c:695
Code: c5 77 7a 48 8d 3c ed 00 fa 10 8c 48 b8 00 00 00 00 00 fc ff df 48 89 fa 48 c1 ea 03 80 3c 02 00 75 54 48 03 1c ed 00 fa 10 8c <48> b8 00 00 00 00 00 fc ff df 48 89 da 48 c1 ea 03 0f b6 14 02 48
RSP: 0018:ffffc90004c9fc58 EFLAGS: 00000282
RAX: dffffc0000000000 RBX: ffff8880b9936c68 RCX: ffffffff8164cf65
RDX: 1ffffffff1821f41 RSI: 0000000000000002 RDI: ffffffff8c10fa08
RBP: 0000000000000001 R08: 0000000000000000 R09: ffffffff8e784717
R10: fffffbfff1cf08e2 R11: 0000000000000000 R12: ffff8880772cf1c8
R13: ffff88801e3d5c00 R14: ffff8880772cf180 R15: 0000000000000000
FS: 0000555556477400(0000) GS:ffff8880b9900000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00005642a1c9c5c0 CR3: 0000000025100000 CR4: 00000000003506e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
<TASK>
trace_lock_release include/trace/events/lock.h:69 [inline]
lock_release+0x4fb/0x670 kernel/locking/lockdep.c:5702
__raw_spin_unlock include/linux/spinlock_api_smp.h:141 [inline]
_raw_spin_unlock+0x16/0x40 kernel/locking/spinlock.c:186
spin_unlock include/linux/spinlock.h:390 [inline]
__ep_remove+0x6ce/0xa50 fs/eventpoll.c:737
ep_remove_safe fs/eventpoll.c:782 [inline]
ep_clear_and_put+0x21c/0x3b0 fs/eventpoll.c:815
ep_eventpoll_release+0x45/0x60 fs/eventpoll.c:831
__fput+0x27c/0xa90 fs/file_table.c:321
task_work_run+0x16f/0x270 kernel/task_work.c:179
resume_user_mode_work include/linux/resume_user_mode.h:49 [inline]
exit_to_user_mode_loop kernel/entry/common.c:171 [inline]
exit_to_user_mode_prepare+0x210/0x240 kernel/entry/common.c:204
__syscall_exit_to_user_mode_work kernel/entry/common.c:286 [inline]
syscall_exit_to_user_mode+0x1d/0x50 kernel/entry/common.c:297
do_syscall_64+0x46/0xb0 arch/x86/entry/common.c:86
entry_SYSCALL_64_after_hwframe+0x63/0xcd
RIP: 0033:0x7f01839d3dfb
Code: 0f 05 48 3d 00 f0 ff ff 77 45 c3 0f 1f 40 00 48 83 ec 18 89 7c 24 0c e8 63 fc ff ff 8b 7c 24 0c 41 89 c0 b8 03 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 35 44 89 c7 89 44 24 0c e8 a1 fc ff ff 8b 44
RSP: 002b:00007fff8a592880 EFLAGS: 00000293 ORIG_RAX: 0000000000000003
RAX: 0000000000000000 RBX: 0000000000000006 RCX: 00007f01839d3dfb
RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000005
RBP: 0000000000152fad R08: 0000000000000000 R09: 00007fff8a5a7080
R10: 0000000000000000 R11: 0000000000000293 R12: 00007f0183a9d4bc
R13: 00007fff8a592900 R14: 00007fff8a5929a0 R15: 00007f0183a9d4a0
</TASK>


---
This report is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at syzk...@googlegroups.com.

syzbot will keep track of this issue. See:
https://goo.gl/tpsmEJ#status for how to communicate with syzbot.
syzbot can test patches for this issue, for details see:
https://goo.gl/tpsmEJ#testing-patches

Hillf Danton

unread,
Mar 21, 2023, 2:19:27 AM3/21/23
to syzbot, linux-...@vger.kernel.org, syzkall...@googlegroups.com
On 20 Mar 2023 14:06:42 -0700
> HEAD commit: 6f72958a49f6 Add linux-next specific files for 20230316
> git tree: linux-next
> C reproducer: https://syzkaller.appspot.com/x/repro.c?x=179e9648c80000

Check if ep item is dying.

#syz test https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git 6f72958a49f6

--- x/fs/eventpoll.c
+++ y/fs/eventpoll.c
@@ -718,7 +718,7 @@ static void ep_free(struct eventpoll *ep
* while running concurrently with eventpoll_release_file().
* Returns true if the eventpoll can be disposed.
*/
-static bool __ep_remove(struct eventpoll *ep, struct epitem *epi, bool force)
+static bool __ep_remove(struct eventpoll *ep, struct epitem *epi, bool force, int *del)
{
struct file *file = epi->ffd.file;
struct epitems_head *to_free;
@@ -726,6 +726,8 @@ static bool __ep_remove(struct eventpoll

lockdep_assert_irqs_enabled();

+ if (del)
+ *del = 0;
/*
* Removes poll wait queue hooks.
*/
@@ -754,6 +756,8 @@ static bool __ep_remove(struct eventpoll
free_ephead(to_free);

rb_erase_cached(&epi->rbn, &ep->rbr);
+ if (del)
+ *del = 1;

write_lock_irq(&ep->lock);
if (ep_is_linked(epi))
@@ -779,7 +783,7 @@ static bool __ep_remove(struct eventpoll
*/
static void ep_remove_safe(struct eventpoll *ep, struct epitem *epi)
{
- WARN_ON_ONCE(__ep_remove(ep, epi, false));
+ WARN_ON_ONCE(__ep_remove(ep, epi, false, NULL));
}

static void ep_clear_and_put(struct eventpoll *ep)
@@ -811,9 +815,16 @@ static void ep_clear_and_put(struct even
* dispose it.
*/
while ((rbp = rb_first_cached(&ep->rbr)) != NULL) {
+ int del = 0;
+
epi = rb_entry(rbp, struct epitem, rbn);
- ep_remove_safe(ep, epi);
+ __ep_remove(ep, epi, false, &del);
cond_resched();
+ if (!del) { /* epi is dying, lock and try again */
+ mutex_unlock(&ep->mtx);
+ udelay(1);
+ mutex_lock(&ep->mtx);
+ }
}

dispose = ep_refcount_dec_and_test(ep);
@@ -960,7 +971,7 @@ again:
*/
ep = epi->ep;
mutex_lock(&ep->mtx);
- dispose = __ep_remove(ep, epi, true);
+ dispose = __ep_remove(ep, epi, true, NULL);
mutex_unlock(&ep->mtx);

if (dispose)
--

syzbot

unread,
Mar 21, 2023, 2:42:28 AM3/21/23
to hda...@sina.com, linux-...@vger.kernel.org, syzkall...@googlegroups.com
Hello,

syzbot has tested the proposed patch and the reproducer did not trigger any issue:

Reported-and-tested-by: syzbot+e6dab3...@syzkaller.appspotmail.com

Tested on:

commit: 6f72958a Add linux-next specific files for 20230316
git tree: https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git
console output: https://syzkaller.appspot.com/x/log.txt?x=1719c87ac80000
kernel config: https://syzkaller.appspot.com/x/.config?x=dd64232230d3a0d8
dashboard link: https://syzkaller.appspot.com/bug?extid=e6dab35a08df7f7aa260
compiler: gcc (Debian 10.2.1-6) 10.2.1 20210110, GNU ld (GNU Binutils for Debian) 2.35.2
patch: https://syzkaller.appspot.com/x/patch.diff?x=178213a6c80000

Note: testing is done by a robot and is best-effort only.

syzbot

unread,
Mar 21, 2023, 5:34:23 AM3/21/23
to pab...@redhat.com, syzkall...@googlegroups.com
Hello,

syzbot tried to test the proposed patch but the build/boot failed:

failed to apply patch:
checking file fs/eventpoll.c
patch: **** unexpected end of file in patch



Tested on:

commit: 6f72958a Add linux-next specific files for 20230316
git tree: git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git
patch: https://syzkaller.appspot.com/x/patch.diff?x=12fb4091c80000

Paolo Abeni

unread,
Mar 21, 2023, 5:45:19 AM3/21/23
to syzbot, syzkall...@googlegroups.com
#syz test: git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git 6f72958a49f6
---
diff --git a/fs/eventpoll.c b/fs/eventpoll.c
index 25a59640748a0fd22a84a5aecb90815fbbca9cef..1db56c6175aab5af7bc637a452b68ed8bc11fd7f
100644
--- a/fs/eventpoll.c
+++ b/fs/eventpoll.c
@@ -784,7 +784,7 @@ static void ep_remove_safe(struct eventpoll *ep, struct epitem *epi)

static void ep_clear_and_put(struct eventpoll *ep)
{
- struct rb_node *rbp;
+ struct rb_node *rbp, *next;
struct epitem *epi;
bool dispose;

@@ -810,7 +810,8 @@ static void ep_clear_and_put(struct eventpoll *ep)
* Since we still own a reference to the eventpoll struct, the loop can't
* dispose it.
*/
- while ((rbp = rb_first_cached(&ep->rbr)) != NULL) {
+ for (rbp = rb_first_cached(&ep->rbr); rbp; rbp = next) {
+ next = rb_next(rbp);
epi = rb_entry(rbp, struct epitem, rbn);
ep_remove_safe(ep, epi);
cond_resched();


Paolo Abeni

unread,
Mar 21, 2023, 5:45:19 AM3/21/23
to syzbot, syzkall...@googlegroups.com
whoops, the patch was mangled in the previous attempt

---
diff --git a/fs/eventpoll.c b/fs/eventpoll.c
index 2b01fc05c307..afd69849796c 100644

syzbot

unread,
Mar 21, 2023, 6:00:23 AM3/21/23
to pab...@redhat.com, syzkall...@googlegroups.com
Hello,

syzbot has tested the proposed patch and the reproducer did not trigger any issue:

Reported-and-tested-by: syzbot+e6dab3...@syzkaller.appspotmail.com

Tested on:

commit: 6f72958a Add linux-next specific files for 20230316
git tree: git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git
console output: https://syzkaller.appspot.com/x/log.txt?x=16ca0a4ec80000
kernel config: https://syzkaller.appspot.com/x/.config?x=dd64232230d3a0d8
dashboard link: https://syzkaller.appspot.com/bug?extid=e6dab35a08df7f7aa260
compiler: gcc (Debian 10.2.1-6) 10.2.1 20210110, GNU ld (GNU Binutils for Debian) 2.35.2
patch: https://syzkaller.appspot.com/x/patch.diff?x=148386a6c80000

Theodore Ts'o

unread,
Apr 30, 2023, 2:34:40 AM4/30/23
to syzbot, bra...@kernel.org, linux-...@vger.kernel.org, linux-...@vger.kernel.org, syzkall...@googlegroups.com, vi...@zeniv.linux.org.uk
#syz set subsystem: fs

This somehow got tagged with the ext4 label, and not the fs label.
(And this is not the first one I've noticed). I'm beginning to
suspect there may have been some syzbot database hiccup? Anyway,
fixing...

- Ted

syzbot

unread,
Apr 30, 2023, 2:34:49 AM4/30/23
to ty...@mit.edu, bra...@kernel.org, linux-...@vger.kernel.org, linux-...@vger.kernel.org, syzkall...@googlegroups.com, ty...@mit.edu, vi...@zeniv.linux.org.uk
> #syz set subsystem: fs

The specified label "subsystem" is unknown.
Please use one of the supported labels.

The following labels are suported:
no-reminders, prio: {low, normal, high}, subsystems: {.. see below ..}
The list of subsystems: https://syzkaller.appspot.com/upstream/subsystems?all=true

Theodore Ts'o

unread,
Apr 30, 2023, 12:11:39 PM4/30/23
to syzbot, bra...@kernel.org, linux-...@vger.kernel.org, linux-...@vger.kernel.org, syzkall...@googlegroups.com, vi...@zeniv.linux.org.uk
#syz set subsystems: fs

Aleksandr Nogikh

unread,
May 2, 2023, 4:08:56 PM5/2/23
to Theodore Ts'o, syzbot, bra...@kernel.org, linux-...@vger.kernel.org, linux-...@vger.kernel.org, syzkall...@googlegroups.com, vi...@zeniv.linux.org.uk
Hi Ted,
FWIW one of this bug's crashes was attributed to ext4 [1] and syzbot's
logic in this case was to prefer a more specific subsystem (ext4) to a
more generic one (fs), even if it's not mentioned in the majority of
crashes.

[1] https://syzkaller.appspot.com/text?tag=CrashReport&x=171abfaac80000

--
Aleksandr
>
> - Ted
>
> --

Theodore Ts'o

unread,
May 3, 2023, 1:04:50 AM5/3/23
to Aleksandr Nogikh, syzbot, bra...@kernel.org, linux-...@vger.kernel.org, linux-...@vger.kernel.org, syzkall...@googlegroups.com, vi...@zeniv.linux.org.uk
One of the challenges is that the attribution is not necessasrily
accurate. One of the CPU's was running an ext4 workqueue task (which
was apparntly making forward progress) at the time of the crash.

It should also be noted that apparently there is a potential patch
which seems to fix the problem, and it's solely in the fs/eventpoll.c.
Unfortunately, it was not in the lore.kernel.org archives, since
apparently it wasn't cc'ed there. It's in the syzkaller-bugs Google
Groups archive, though, since Pauolo Abeni cc'ed the
syzkall...@googlegroups.com, but not the lore archive, on his
test:

https://groups.google.com/g/syzkaller-bugs/c/oiBUmGsqz_Q/m/Xi5iOeJNAgAJ

- Ted

syzbot

unread,
Sep 6, 2023, 6:42:48 AM9/6/23
to syzkall...@googlegroups.com
Auto-closing this bug as obsolete.
No recent activity, existing reproducers are no longer triggering the issue.
Reply all
Reply to author
Forward
0 new messages