INFO: task hung in p9_fd_close

34 views
Skip to first unread message

syzbot

unread,
Aug 30, 2019, 3:28:08 PM8/30/19
to asma...@codewreck.org, da...@davemloft.net, eri...@gmail.com, linux-...@vger.kernel.org, lu...@ionkov.net, net...@vger.kernel.org, syzkall...@googlegroups.com, v9fs-de...@lists.sourceforge.net
Hello,

syzbot found the following crash on:

HEAD commit: 6525771f Merge tag 'arc-5.3-rc7' of git://git.kernel.org/p..
git tree: upstream
console output: https://syzkaller.appspot.com/x/log.txt?x=1118a71e600000
kernel config: https://syzkaller.appspot.com/x/.config?x=58485246ad14eafe
dashboard link: https://syzkaller.appspot.com/bug?extid=8b41a1365f1106fd0f33
compiler: clang version 9.0.0 (/home/glider/llvm/clang
80fee25776c2fb61e74c1ecb1a523375c2500b69)
syz repro: https://syzkaller.appspot.com/x/repro.syz?x=1125ee12600000

IMPORTANT: if you fix the bug, please add the following tag to the commit:
Reported-by: syzbot+8b41a1...@syzkaller.appspotmail.com

INFO: task syz-executor.1:13699 blocked for more than 143 seconds.
Not tainted 5.3.0-rc6+ #94
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
syz-executor.1 D28888 13699 9148 0x00004004
Call Trace:
context_switch kernel/sched/core.c:3254 [inline]
__schedule+0x877/0xc50 kernel/sched/core.c:3880
schedule+0x131/0x1e0 kernel/sched/core.c:3947
schedule_timeout+0x46/0x240 kernel/time/timer.c:1783
do_wait_for_common+0x2e7/0x4d0 kernel/sched/completion.c:83
__wait_for_common kernel/sched/completion.c:104 [inline]
wait_for_common kernel/sched/completion.c:115 [inline]
wait_for_completion+0x47/0x60 kernel/sched/completion.c:136
__flush_work+0xd4/0x150 kernel/workqueue.c:3040
__cancel_work_timer+0x420/0x570 kernel/workqueue.c:3127
cancel_work_sync+0x17/0x20 kernel/workqueue.c:3163
p9_conn_destroy net/9p/trans_fd.c:868 [inline]
p9_fd_close+0x297/0x3c0 net/9p/trans_fd.c:898
p9_client_create+0x974/0xee0 net/9p/client.c:1068
v9fs_session_init+0x192/0x18e0 fs/9p/v9fs.c:406
v9fs_mount+0x82/0x810 fs/9p/vfs_super.c:120
legacy_get_tree+0xf9/0x1a0 fs/fs_context.c:661
vfs_get_tree+0x8f/0x380 fs/super.c:1413
do_new_mount fs/namespace.c:2791 [inline]
do_mount+0x169d/0x2490 fs/namespace.c:3111
ksys_mount+0xcc/0x100 fs/namespace.c:3320
__do_sys_mount fs/namespace.c:3334 [inline]
__se_sys_mount fs/namespace.c:3331 [inline]
__x64_sys_mount+0xbf/0xd0 fs/namespace.c:3331
do_syscall_64+0xfe/0x140 arch/x86/entry/common.c:296
entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x459879
Code: 8b 44 24 18 48 8b 4c 24 30 48 83 c1 08 48 89 0c 24 48 89 44 24 08 48
c7 44 24 10 10 00 00 00 e8 0d da fa ff 48 8b 44 24 18 48 <89> 44 24 40 48
8b 6c 24 20 48 83 c4 28 c3 e8 14 b9 ff ff eb 82 cc
RSP: 002b:00007f6b4dda7c78 EFLAGS: 00000246 ORIG_RAX: 00000000000000a5
RAX: ffffffffffffffda RBX: 0000000000000005 RCX: 0000000000459879
RDX: 0000000020000140 RSI: 0000000020000000 RDI: 0000000000000000
RBP: 000000000075c118 R08: 0000000020000480 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 00007f6b4dda86d4
R13: 00000000004c5e2f R14: 00000000004da930 R15: 00000000ffffffff
INFO: lockdep is turned off.
NMI backtrace for cpu 0
CPU: 0 PID: 1057 Comm: khungtaskd Not tainted 5.3.0-rc6+ #94
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
Google 01/01/2011
Call Trace:
__dump_stack lib/dump_stack.c:77 [inline]
dump_stack+0x1d8/0x2f8 lib/dump_stack.c:113
nmi_cpu_backtrace+0xaf/0x1a0 lib/nmi_backtrace.c:101
nmi_trigger_cpumask_backtrace+0x174/0x290 lib/nmi_backtrace.c:62
arch_trigger_cpumask_backtrace+0x10/0x20 arch/x86/kernel/apic/hw_nmi.c:38
trigger_all_cpu_backtrace+0x17/0x20 include/linux/nmi.h:146
check_hung_uninterruptible_tasks kernel/hung_task.c:205 [inline]
watchdog+0xbb9/0xbd0 kernel/hung_task.c:289
kthread+0x332/0x350 kernel/kthread.c:255
ret_from_fork+0x24/0x30 arch/x86/entry/entry_64.S:352
Sending NMI from CPU 0 to CPUs 1:
NMI backtrace for cpu 1 skipped: idling at native_safe_halt+0xe/0x10
arch/x86/include/asm/irqflags.h:60


---
This bug is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at syzk...@googlegroups.com.

syzbot will keep track of this bug report. See:
https://goo.gl/tpsmEJ#status for how to communicate with syzbot.
syzbot can test patches for this bug, for details see:
https://goo.gl/tpsmEJ#testing-patches

syzbot

unread,
Sep 21, 2019, 12:19:07 PM9/21/19
to asma...@codewreck.org, da...@davemloft.net, eri...@gmail.com, linux-...@vger.kernel.org, lu...@ionkov.net, net...@vger.kernel.org, syzkall...@googlegroups.com, v9fs-de...@lists.sourceforge.net
syzbot has found a reproducer for the following crash on:

HEAD commit: f97c81dc Merge tag 'armsoc-late' of git://git.kernel.org/p..
git tree: upstream
console output: https://syzkaller.appspot.com/x/log.txt?x=171a1555600000
kernel config: https://syzkaller.appspot.com/x/.config?x=61f948934213449f
dashboard link: https://syzkaller.appspot.com/bug?extid=8b41a1365f1106fd0f33
compiler: clang version 9.0.0 (/home/glider/llvm/clang
80fee25776c2fb61e74c1ecb1a523375c2500b69)
syz repro: https://syzkaller.appspot.com/x/repro.syz?x=1641d429600000
C reproducer: https://syzkaller.appspot.com/x/repro.c?x=11eff9ad600000

IMPORTANT: if you fix the bug, please add the following tag to the commit:
Reported-by: syzbot+8b41a1...@syzkaller.appspotmail.com

INFO: task syz-executor635:8085 blocked for more than 143 seconds.
Not tainted 5.3.0+ #0
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
syz-executor635 D27848 8085 8070 0x00004004
Call Trace:
context_switch kernel/sched/core.c:3384 [inline]
__schedule+0x82e/0xc50 kernel/sched/core.c:4056
schedule+0x131/0x1e0 kernel/sched/core.c:4123
schedule_timeout+0x46/0x240 kernel/time/timer.c:1869
do_wait_for_common+0x2e7/0x4d0 kernel/sched/completion.c:83
__wait_for_common kernel/sched/completion.c:104 [inline]
wait_for_common kernel/sched/completion.c:115 [inline]
wait_for_completion+0x47/0x60 kernel/sched/completion.c:136
__flush_work+0xd4/0x150 kernel/workqueue.c:3040
__cancel_work_timer+0x420/0x570 kernel/workqueue.c:3127
cancel_work_sync+0x17/0x20 kernel/workqueue.c:3163
p9_conn_destroy net/9p/trans_fd.c:868 [inline]
p9_fd_close+0x297/0x3c0 net/9p/trans_fd.c:898
p9_client_create+0x974/0xee0 net/9p/client.c:1068
v9fs_session_init+0x192/0x18e0 fs/9p/v9fs.c:406
v9fs_mount+0x82/0x860 fs/9p/vfs_super.c:124
legacy_get_tree+0xf9/0x1a0 fs/fs_context.c:659
vfs_get_tree+0x8f/0x380 fs/super.c:1542
do_new_mount fs/namespace.c:2825 [inline]
do_mount+0x16c7/0x2510 fs/namespace.c:3145
ksys_mount+0xcc/0x100 fs/namespace.c:3354
__do_sys_mount fs/namespace.c:3368 [inline]
__se_sys_mount fs/namespace.c:3365 [inline]
__x64_sys_mount+0xbf/0xd0 fs/namespace.c:3365
do_syscall_64+0xf7/0x1c0 arch/x86/entry/common.c:290
entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x447909
Code: e8 5c 14 03 00 48 83 c4 18 c3 0f 1f 80 00 00 00 00 48 89 f8 48 89 f7
48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff
ff 0f 83 9b 0c fc ff c3 66 2e 0f 1f 84 00 00 00 00
RSP: 002b:00007eff1aea0db8 EFLAGS: 00000246 ORIG_RAX: 00000000000000a5
RAX: ffffffffffffffda RBX: 00000000006ddc78 RCX: 0000000000447909
RDX: 00000000200005c0 RSI: 0000000020000540 RDI: 0000000000000000
RBP: 00000000006ddc70 R08: 0000000020000680 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 00000000006ddc7c
R13: 00007ffc4e6377ef R14: 00007eff1aea19c0 R15: 000000000000002d

Showing all locks held in the system:
2 locks held by kworker/1:1/22:
#0: ffff8880aa4278e8 ((wq_completion)events){+.+.}, at: spin_unlock_irq
include/linux/spinlock.h:388 [inline]
#0: ffff8880aa4278e8 ((wq_completion)events){+.+.}, at:
process_one_work+0x75d/0x10e0 kernel/workqueue.c:2242
#1: ffff8880a9a3fd78 ((work_completion)(&m->wq)){+.+.}, at:
process_one_work+0x79f/0x10e0 kernel/workqueue.c:2244
1 lock held by khungtaskd/1053:
#0: ffffffff888d3900 (rcu_read_lock){....}, at: rcu_lock_acquire+0x4/0x30
include/linux/rcupdate.h:207
1 lock held by rsyslogd/7959:
#0: ffff8880a99f1e20 (&f->f_pos_lock){+.+.}, at: __fdget_pos+0x243/0x2e0
fs/file.c:801
2 locks held by getty/8049:
#0: ffff8880a602d450 (&tty->ldisc_sem){++++}, at:
tty_ldisc_ref_wait+0x25/0x70 drivers/tty/tty_ldisc.c:272
#1: ffffc90005f212e0 (&ldata->atomic_read_lock){+.+.}, at:
n_tty_read+0x221/0x1b00 drivers/tty/n_tty.c:2156
2 locks held by getty/8050:
#0: ffff8880a902ac50 (&tty->ldisc_sem){++++}, at:
tty_ldisc_ref_wait+0x25/0x70 drivers/tty/tty_ldisc.c:272
#1: ffffc90005f092e0 (&ldata->atomic_read_lock){+.+.}, at:
n_tty_read+0x221/0x1b00 drivers/tty/n_tty.c:2156
2 locks held by getty/8051:
#0: ffff88809c0f1750 (&tty->ldisc_sem){++++}, at:
tty_ldisc_ref_wait+0x25/0x70 drivers/tty/tty_ldisc.c:272
#1: ffffc90005f312e0 (&ldata->atomic_read_lock){+.+.}, at:
n_tty_read+0x221/0x1b00 drivers/tty/n_tty.c:2156
2 locks held by getty/8052:
#0: ffff88809e344b90 (&tty->ldisc_sem){++++}, at:
tty_ldisc_ref_wait+0x25/0x70 drivers/tty/tty_ldisc.c:272
#1: ffffc90005f152e0 (&ldata->atomic_read_lock){+.+.}, at:
n_tty_read+0x221/0x1b00 drivers/tty/n_tty.c:2156
2 locks held by getty/8053:
#0: ffff88809e344310 (&tty->ldisc_sem){++++}, at:
tty_ldisc_ref_wait+0x25/0x70 drivers/tty/tty_ldisc.c:272
#1: ffffc90005f1d2e0 (&ldata->atomic_read_lock){+.+.}, at:
n_tty_read+0x221/0x1b00 drivers/tty/n_tty.c:2156
2 locks held by getty/8054:
#0: ffff88809c8aa410 (&tty->ldisc_sem){++++}, at:
tty_ldisc_ref_wait+0x25/0x70 drivers/tty/tty_ldisc.c:272
#1: ffffc90005f2d2e0 (&ldata->atomic_read_lock){+.+.}, at:
n_tty_read+0x221/0x1b00 drivers/tty/n_tty.c:2156
2 locks held by getty/8055:
#0: ffff8880a0267310 (&tty->ldisc_sem){++++}, at:
tty_ldisc_ref_wait+0x25/0x70 drivers/tty/tty_ldisc.c:272
#1: ffffc90005f012e0 (&ldata->atomic_read_lock){+.+.}, at:
n_tty_read+0x221/0x1b00 drivers/tty/n_tty.c:2156

=============================================

NMI backtrace for cpu 0
CPU: 0 PID: 1053 Comm: khungtaskd Not tainted 5.3.0+ #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
Google 01/01/2011
Call Trace:
__dump_stack lib/dump_stack.c:77 [inline]
dump_stack+0x1d8/0x2f8 lib/dump_stack.c:113
nmi_cpu_backtrace+0xaf/0x1a0 lib/nmi_backtrace.c:101
nmi_trigger_cpumask_backtrace+0x174/0x290 lib/nmi_backtrace.c:62
arch_trigger_cpumask_backtrace+0x10/0x20 arch/x86/kernel/apic/hw_nmi.c:38
trigger_all_cpu_backtrace+0x17/0x20 include/linux/nmi.h:146
check_hung_uninterruptible_tasks kernel/hung_task.c:205 [inline]
watchdog+0xbb9/0xbd0 kernel/hung_task.c:289
kthread+0x332/0x350 kernel/kthread.c:255
ret_from_fork+0x24/0x30 arch/x86/entry/entry_64.S:352
Sending NMI from CPU 0 to CPUs 1:
NMI backtrace for cpu 1
CPU: 1 PID: 0 Comm: swapper/1 Not tainted 5.3.0+ #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
Google 01/01/2011
RIP: 0010:check_preemption_disabled+0x6/0x2a0 lib/smp_processor_id.c:13
Code: 90 90 90 90 55 48 89 e5 e8 a7 b2 2e fe 48 c7 c7 79 91 38 88 48 c7 c6
10 72 4e 88 e8 04 00 00 00 5d c3 66 90 55 48 89 e5 41 57 <41> 56 41 55 41
54 53 48 83 ec 10 49 89 f6 49 89 ff e8 74 b2 2e fe
RSP: 0018:ffff8880aeb09ee0 EFLAGS: 00000093
RAX: ffffffff83448fc9 RBX: 0000000000000004 RCX: ffff8880a98c2340
RDX: 0000000000000000 RSI: ffffffff884e7210 RDI: ffffffff88389179
RBP: ffff8880aeb09ee8 R08: ffffffff816701e0 R09: fffffbfff117c3cd
R10: fffffbfff117c3cd R11: 0000000000000000 R12: 1ffff11015d64eaf
R13: ffff8880aeb00000 R14: dffffc0000000000 R15: ffff8880aeb2757c
FS: 0000000000000000(0000) GS:ffff8880aeb00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: ffffffffff600400 CR3: 00000000a3d38000 CR4: 00000000001406e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
<IRQ>
debug_smp_processor_id+0x1c/0x20 lib/smp_processor_id.c:57
tick_nohz_stop_idle kernel/time/tick-sched.c:535 [inline]
tick_nohz_irq_enter kernel/time/tick-sched.c:1255 [inline]
tick_irq_enter+0xbd/0x3e0 kernel/time/tick-sched.c:1274
irq_enter+0x52/0xc0 kernel/softirq.c:354
scheduler_ipi+0xb3/0x4a0 kernel/sched/core.c:2337
smp_reschedule_interrupt+0x7a/0xa0 arch/x86/kernel/smp.c:244
reschedule_interrupt+0xf/0x20 arch/x86/entry/entry_64.S:854
</IRQ>
RIP: 0010:native_safe_halt+0xe/0x10 arch/x86/include/asm/irqflags.h:61
Code: 3c fa eb ae 89 d9 80 e1 07 80 c1 03 38 c1 7c ba 48 89 df e8 94 ab 3c
fa eb b0 90 90 e9 07 00 00 00 0f 00 2d b6 99 52 00 fb f4 <c3> 90 e9 07 00
00 00 0f 00 2d a6 99 52 00 f4 c3 90 90 55 48 89 e5
RSP: 0018:ffff8880a98cfe10 EFLAGS: 00000286 ORIG_RAX: ffffffffffffff02
RAX: 1ffffffff1115179 RBX: ffff8880a98c2340 RCX: dffffc0000000000
RDX: 0000000000000000 RSI: ffffffff812ba81a RDI: ffff8880a98c2b80
RBP: ffff8880a98cfe18 R08: ffff8880a98c2b98 R09: ffffed1015318469
R10: ffffed1015318469 R11: 0000000000000000 R12: dffffc0000000000
R13: 1ffff11015318468 R14: dffffc0000000000 R15: 1ffffffff1115177
arch_cpu_idle+0xa/0x10 arch/x86/kernel/process.c:571
default_idle_call+0x59/0xa0 kernel/sched/idle.c:94
cpuidle_idle_call kernel/sched/idle.c:154 [inline]
do_idle+0x140/0x670 kernel/sched/idle.c:264
cpu_startup_entry+0x25/0x30 kernel/sched/idle.c:356
start_secondary+0x384/0x410 arch/x86/kernel/smpboot.c:264
secondary_startup_64+0xa4/0xb0 arch/x86/kernel/head_64.S:241

Hillf Danton

unread,
Sep 22, 2019, 9:45:33 AM9/22/19
to syzbot, asma...@codewreck.org, da...@davemloft.net, eri...@gmail.com, linux-...@vger.kernel.org, lu...@ionkov.net, net...@vger.kernel.org, syzkall...@googlegroups.com, Hillf Danton, v9fs-de...@lists.sourceforge.net

On Sat, 21 Sep 2019 09:19:06 -0700
>
> syzbot has found a reproducer for the following crash on:
>
> HEAD commit: f97c81dc Merge tag 'armsoc-late' of git://git.kernel.org/p..
> git tree: upstream
> dashboard link: https://syzkaller.appspot.com/bug?extid=8b41a1365f1106fd0f33
> compiler: clang version 9.0.0 (/home/glider/llvm/clang
> 80fee25776c2fb61e74c1ecb1a523375c2500b69)
> IMPORTANT: if you fix the bug, please add the following tag to the commit:
> Reported-by: syzbot+8b41a1...@syzkaller.appspotmail.com
>
> INFO: task syz-executor635:8085 blocked for more than 143 seconds.
> Not tainted 5.3.0+ #0
> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> syz-executor635 D27848 8085 8070 0x00004004
> Call Trace:
> context_switch kernel/sched/core.c:3384 [inline]
> __schedule+0x82e/0xc50 kernel/sched/core.c:4056
> schedule+0x131/0x1e0 kernel/sched/core.c:4123
> schedule_timeout+0x46/0x240 kernel/time/timer.c:1869
> do_wait_for_common+0x2e7/0x4d0 kernel/sched/completion.c:83
> __wait_for_common kernel/sched/completion.c:104 [inline]
> wait_for_common kernel/sched/completion.c:115 [inline]
> wait_for_completion+0x47/0x60 kernel/sched/completion.c:136
> __flush_work+0xd4/0x150 kernel/workqueue.c:3040
> __cancel_work_timer+0x420/0x570 kernel/workqueue.c:3127
> cancel_work_sync+0x17/0x20 kernel/workqueue.c:3163
> p9_conn_destroy net/9p/trans_fd.c:868 [inline]
> p9_fd_close+0x297/0x3c0 net/9p/trans_fd.c:898
> p9_client_create+0x974/0xee0 net/9p/client.c:1068
> v9fs_session_init+0x192/0x18e0 fs/9p/v9fs.c:406
> v9fs_mount+0x82/0x860 fs/9p/vfs_super.c:124
> legacy_get_tree+0xf9/0x1a0 fs/fs_context.c:659
> vfs_get_tree+0x8f/0x380 fs/super.c:1542
> do_new_mount fs/namespace.c:2825 [inline]
> do_mount+0x16c7/0x2510 fs/namespace.c:3145
> ksys_mount+0xcc/0x100 fs/namespace.c:3354
> __do_sys_mount fs/namespace.c:3368 [inline]
> __se_sys_mount fs/namespace.c:3365 [inline]
> __x64_sys_mount+0xbf/0xd0 fs/namespace.c:3365
> do_syscall_64+0xf7/0x1c0 arch/x86/entry/common.c:290
> entry_SYSCALL_64_after_hwframe+0x49/0xbe


On canceling a work, no barrier is needed if we did stole the work's
PENDING bit. An impetuous diff.

--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -3123,8 +3123,11 @@ static bool __cancel_work_timer(struct w
* This allows canceling during early boot. We know that @work
* isn't executing.
*/
- if (wq_online)
- __flush_work(work, true);
+ if (wq_online) {
+ /* no need to insert barr if we stole PENDING */
+ if (!ret)
+ __flush_work(work, true);
+ }

clear_work_data(work);

--

And a saggy one.

--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -2996,6 +2996,10 @@ static bool start_flush_work(struct work

check_flush_dependency(pwq->wq, work);

+ /* no need to insert barr if we stole work's PENDING */
+ if (from_cancel && !worker && list_empty(&work->entry))
+ goto already_gone;
+
insert_wq_barrier(pwq, barr, work, worker);
spin_unlock_irq(&pool->lock);

--

Tetsuo Handa

unread,
Aug 26, 2022, 11:28:03 AM8/26/22
to Eric Van Hensbergen, Latchesar Ionkov, Dominique Martinet, Christian Schoenebeck, syzbot, v9fs-de...@lists.sourceforge.net, syzkall...@googlegroups.com, net...@vger.kernel.org
syzbot is reporting hung task at p9_fd_close() [1], for p9_mux_poll_stop()
from p9_conn_destroy() from p9_fd_close() is failing to interrupt already
started kernel_read() from p9_fd_read() from p9_read_work() and/or
kernel_write() from p9_fd_write() from p9_write_work() requests.

Since p9_socket_open() sets O_NONBLOCK flag, p9_mux_poll_stop() does not
need to interrupt kernel_read()/kernel_write(). However, since p9_fd_open()
does not set O_NONBLOCK flag, but pipe blocks unless signal is pending,
p9_mux_poll_stop() needs to interrupt kernel_read()/kernel_write() when
the file descriptor refers to a pipe. In other words, pipe file descriptor
needs to be handled as if socket file descriptor.

We somehow need to interrupt kernel_read()/kernel_write() on pipes.

A minimal change, which this patch is doing, is to set O_NONBLOCK flag
from p9_fd_open(), for O_NONBLOCK flag does not affect reading/writing
of regular files. But this approach changes O_NONBLOCK flag on userspace-
supplied file descriptors (which might break userspace programs), and
O_NONBLOCK flag could be changed by userspace. It would be possible to set
O_NONBLOCK flag every time p9_fd_read()/p9_fd_write() is invoked, but still
remains small race window for clearing O_NONBLOCK flag.

If we don't want to manipulate O_NONBLOCK flag, we might be able to
surround kernel_read()/kernel_write() with set_thread_flag(TIF_SIGPENDING)
and recalc_sigpending(). Since p9_read_work()/p9_write_work() works are
processed by kernel threads which process global system_wq workqueue,
signals could not be delivered from remote threads when p9_mux_poll_stop()
from p9_conn_destroy() from p9_fd_close() is called. Therefore, calling
set_thread_flag(TIF_SIGPENDING)/recalc_sigpending() every time would be
needed if we count on signals for making kernel_read()/kernel_write()
non-blocking.

Link: https://syzkaller.appspot.com/bug?extid=8b41a1365f1106fd0f33 [1]
Reported-by: syzbot <syzbot+8b41a1...@syzkaller.appspotmail.com>
Signed-off-by: Tetsuo Handa <penguin...@I-love.SAKURA.ne.jp>
Tested-by: syzbot <syzbot+8b41a1...@syzkaller.appspotmail.com>
---
Although syzbot tested that this patch solves hung task problem, syzbot
cannot verify that this patch will not break functionality of p9 users.
Please test before applying this patch.

net/9p/trans_fd.c | 2 ++
1 file changed, 2 insertions(+)

diff --git a/net/9p/trans_fd.c b/net/9p/trans_fd.c
index e758978b44be..9870597da583 100644
--- a/net/9p/trans_fd.c
+++ b/net/9p/trans_fd.c
@@ -821,11 +821,13 @@ static int p9_fd_open(struct p9_client *client, int rfd, int wfd)
goto out_free_ts;
if (!(ts->rd->f_mode & FMODE_READ))
goto out_put_rd;
+ ts->rd->f_flags |= O_NONBLOCK;
ts->wr = fget(wfd);
if (!ts->wr)
goto out_put_rd;
if (!(ts->wr->f_mode & FMODE_WRITE))
goto out_put_wr;
+ ts->wr->f_flags |= O_NONBLOCK;

client->trans = ts;
client->status = Connected;
--
2.18.4

Tetsuo Handa

unread,
Aug 27, 2022, 2:12:06 AM8/27/22
to Eric Van Hensbergen, Latchesar Ionkov, Dominique Martinet, Christian Schoenebeck, syzbot, v9fs-de...@lists.sourceforge.net, syzkall...@googlegroups.com, net...@vger.kernel.org
syzbot is reporting hung task at p9_fd_close() [1], for p9_mux_poll_stop()
from p9_conn_destroy() from p9_fd_close() is failing to interrupt already
started kernel_read() from p9_fd_read() from p9_read_work() and/or
kernel_write() from p9_fd_write() from p9_write_work() requests.

Since p9_socket_open() sets O_NONBLOCK flag, p9_mux_poll_stop() does not
need to interrupt kernel_{read,write}(). However, since p9_fd_open() does
not set O_NONBLOCK flag, but pipe blocks unless signal is pending,
p9_mux_poll_stop() needs to interrupt kernel_{read,write}() when the file
descriptor refers to a pipe. In other words, pipe file descriptor needs
to be handled as if socket file descriptor. We somehow need to interrupt
kernel_{read,write}() on pipes.

If we can tolerate "possibility of breaking userspace program by setting
O_NONBLOCK flag on userspace-supplied file descriptors" and "possibility
of race window that userspace program clears O_NONBLOCK flag between after
automatically setting O_NONBLOCK flag and before calling
kernel_{read,write}()", we could automatically set O_NONBLOCK flag
immediately before calling kernel_{read,write}().

A different approach, which this patch is doing, is to surround
kernel_{read,write}() with set_thread_flag(TIF_SIGPENDING) and
recalc_sigpending(). This might be ugly and bit costly, but does not
touch userspace-supplied file descriptors.

Link: https://syzkaller.appspot.com/bug?extid=8b41a1365f1106fd0f33 [1]
Reported-by: syzbot <syzbot+8b41a1...@syzkaller.appspotmail.com>
Signed-off-by: Tetsuo Handa <penguin...@I-love.SAKURA.ne.jp>
Tested-by: syzbot <syzbot+8b41a1...@syzkaller.appspotmail.com>
---
Although syzbot tested that this patch solves hung task problem, syzbot
cannot verify that this patch will not break functionality of p9 users.
Please test before applying this patch.

net/9p/trans_fd.c | 16 ++++++++++------
1 file changed, 10 insertions(+), 6 deletions(-)

diff --git a/net/9p/trans_fd.c b/net/9p/trans_fd.c
index e758978b44be..e2f4e3245a80 100644
--- a/net/9p/trans_fd.c
+++ b/net/9p/trans_fd.c
@@ -256,11 +256,13 @@ static int p9_fd_read(struct p9_client *client, void *v, int len)
if (!ts)
return -EREMOTEIO;

- if (!(ts->rd->f_flags & O_NONBLOCK))
- p9_debug(P9_DEBUG_ERROR, "blocking read ...\n");
-
pos = ts->rd->f_pos;
+ /* Force non-blocking read() even without O_NONBLOCK. */
+ set_thread_flag(TIF_SIGPENDING);
ret = kernel_read(ts->rd, v, len, &pos);
+ spin_lock_irq(&current->sighand->siglock);
+ recalc_sigpending();
+ spin_unlock_irq(&current->sighand->siglock);
if (ret <= 0 && ret != -ERESTARTSYS && ret != -EAGAIN)
client->status = Disconnected;
return ret;
@@ -423,10 +425,12 @@ static int p9_fd_write(struct p9_client *client, void *v, int len)
if (!ts)
return -EREMOTEIO;

- if (!(ts->wr->f_flags & O_NONBLOCK))
- p9_debug(P9_DEBUG_ERROR, "blocking write ...\n");
-
+ /* Force non-blocking write() even without O_NONBLOCK. */
+ set_thread_flag(TIF_SIGPENDING);
ret = kernel_write(ts->wr, v, len, &ts->wr->f_pos);
+ spin_lock_irq(&current->sighand->siglock);
+ recalc_sigpending();
+ spin_unlock_irq(&current->sighand->siglock);
if (ret <= 0 && ret != -ERESTARTSYS && ret != -EAGAIN)
client->status = Disconnected;
return ret;
--
2.18.4

Christian Schoenebeck

unread,
Sep 1, 2022, 11:23:32 AM9/1/22
to Eric Van Hensbergen, Latchesar Ionkov, Dominique Martinet, Tetsuo Handa, syzbot, v9fs-de...@lists.sourceforge.net, syzkall...@googlegroups.com, net...@vger.kernel.org
On Samstag, 27. August 2022 08:11:48 CEST Tetsuo Handa wrote:
> syzbot is reporting hung task at p9_fd_close() [1], for p9_mux_poll_stop()
> from p9_conn_destroy() from p9_fd_close() is failing to interrupt already
> started kernel_read() from p9_fd_read() from p9_read_work() and/or
> kernel_write() from p9_fd_write() from p9_write_work() requests.
>
> Since p9_socket_open() sets O_NONBLOCK flag, p9_mux_poll_stop() does not
> need to interrupt kernel_{read,write}(). However, since p9_fd_open() does
> not set O_NONBLOCK flag, but pipe blocks unless signal is pending,
> p9_mux_poll_stop() needs to interrupt kernel_{read,write}() when the file
> descriptor refers to a pipe. In other words, pipe file descriptor needs
> to be handled as if socket file descriptor. We somehow need to interrupt
> kernel_{read,write}() on pipes.
>
> If we can tolerate "possibility of breaking userspace program by setting
> O_NONBLOCK flag on userspace-supplied file descriptors" and "possibility
> of race window that userspace program clears O_NONBLOCK flag between after
> automatically setting O_NONBLOCK flag and before calling
> kernel_{read,write}()", we could automatically set O_NONBLOCK flag
> immediately before calling kernel_{read,write}().
>
> A different approach, which this patch is doing, is to surround
> kernel_{read,write}() with set_thread_flag(TIF_SIGPENDING) and
> recalc_sigpending(). This might be ugly and bit costly, but does not
> touch userspace-supplied file descriptors.

So the intention in this alternative approach is to allow user space apps
still being able to perform blocking I/O, while at the same time making the
kernel thread interruptible to fix this hung task issue, correct?
Is the recalc_sigpending() block here actually needed? The TIF_SIGPENDING flag
is already cleared by net/9p/client.c, no?

Tetsuo Handa

unread,
Sep 1, 2022, 6:27:07 PM9/1/22
to Christian Schoenebeck, Eric Van Hensbergen, Latchesar Ionkov, Dominique Martinet, syzbot, v9fs-de...@lists.sourceforge.net, syzkall...@googlegroups.com, net...@vger.kernel.org, linux-fsdevel
On 2022/09/02 0:23, Christian Schoenebeck wrote:
> So the intention in this alternative approach is to allow user space apps
> still being able to perform blocking I/O, while at the same time making the
> kernel thread interruptible to fix this hung task issue, correct?

Making the kernel thread "non-blocking" (rather than "interruptible") in order
not to be blocked on I/O on pipes.

Since kernel threads by default do not receive signals, being "interruptible"
or "killable" does not help (except for silencing khungtaskd warning). Being
"non-blocking" like I/O on sockets helps.

>> --- a/net/9p/trans_fd.c
>> +++ b/net/9p/trans_fd.c
>> @@ -256,11 +256,13 @@ static int p9_fd_read(struct p9_client *client, void
>> *v, int len) if (!ts)
>> return -EREMOTEIO;
>>
>> - if (!(ts->rd->f_flags & O_NONBLOCK))
>> - p9_debug(P9_DEBUG_ERROR, "blocking read ...\n");
>> -
>> pos = ts->rd->f_pos;
>> + /* Force non-blocking read() even without O_NONBLOCK. */
>> + set_thread_flag(TIF_SIGPENDING);
>> ret = kernel_read(ts->rd, v, len, &pos);
>> + spin_lock_irq(&current->sighand->siglock);
>> + recalc_sigpending();
>> + spin_unlock_irq(&current->sighand->siglock);
>
> Is the recalc_sigpending() block here actually needed? The TIF_SIGPENDING flag
> is already cleared by net/9p/client.c, no?

This is actually needed.

The thread which processes this function is a kernel workqueue thread which
is supposed to process other functions (which might call "interruptible"
functions even if signals are not received by default).

The thread which currently clearing the TIF_SIGPENDING flag is a user process
(which are calling "killable" functions from syscall context but effectively
"uninterruptible" due to clearing the TIF_SIGPENDING flag and retrying).
By the way, clearing the TIF_SIGPENDING flag before retrying "killable" functions
(like p9_client_rpc() does) is very bad and needs to be avoided...

Dominique Martinet

unread,
Sep 3, 2022, 7:40:23 PM9/3/22
to Tetsuo Handa, Christian Schoenebeck, Eric Van Hensbergen, Latchesar Ionkov, syzbot, v9fs-de...@lists.sourceforge.net, syzkall...@googlegroups.com, net...@vger.kernel.org, linux-fsdevel
Thanks for the patch and sorry for the slow reply

v1 vs v2: my take is that I think v1 is easier to understand, and if you
pass a fd to be used as kernel end for 9p you shouldn't also be messing
with it so it's fair game to make it O_NONBLOCK -- we're reading and
writing to these things, the fds shouldn't be used by the caller after
the mount syscall.

Is there any reason you spent time working on v2, or is that just
theorical for not messing with userland fd ?

unless there's any reason I'll try to find time to test v1 and queue it
for 6.1

Tetsuo Handa wrote on Fri, Sep 02, 2022 at 07:25:30AM +0900:
> On 2022/09/02 0:23, Christian Schoenebeck wrote:
> > So the intention in this alternative approach is to allow user space apps
> > still being able to perform blocking I/O, while at the same time making the
> > kernel thread interruptible to fix this hung task issue, correct?
>
> Making the kernel thread "non-blocking" (rather than "interruptible") in order
> not to be blocked on I/O on pipes.
>
> Since kernel threads by default do not receive signals, being "interruptible"
> or "killable" does not help (except for silencing khungtaskd warning). Being
> "non-blocking" like I/O on sockets helps.

I'm still not 100% sure I understand the root of the deadlock, but I can
agree the worker thread shouldn't block.

We seem to check for EAGAIN where kernel_read/write end up being called
and there's a poll for scheduling so it -should- work, but I assume this
hasn't been tested much and might take a bit of time to test.


> The thread which currently clearing the TIF_SIGPENDING flag is a user process
> (which are calling "killable" functions from syscall context but effectively
> "uninterruptible" due to clearing the TIF_SIGPENDING flag and retrying).
> By the way, clearing the TIF_SIGPENDING flag before retrying "killable" functions
> (like p9_client_rpc() does) is very bad and needs to be avoided...

Yes, I really wish we could make this go away.
I started work to make the cancel (flush) asynchronous, but it
introduced a regression I never had (and still don't have) time to
figure out... If you have motivation to take over, the patches I sent
are here:
https://lore.kernel.org/all/20181217110111.GB17466@nautica/T/
(unfortunately some refactoring happened and they no longer apply, but
the logic should be mostly sane)


Thanks,
--
Dominique

Tetsuo Handa

unread,
Sep 3, 2022, 8:27:43 PM9/3/22
to Dominique Martinet, Christian Schoenebeck, Eric Van Hensbergen, Latchesar Ionkov, syzbot, v9fs-de...@lists.sourceforge.net, syzkall...@googlegroups.com, net...@vger.kernel.org, linux-fsdevel
On 2022/09/04 8:39, Dominique Martinet wrote:
> Is there any reason you spent time working on v2, or is that just
> theorical for not messing with userland fd ?

Just theoretical for not messing with userland fd, for programs generated
by fuzzers might use fds passed to the mount() syscall. I imagined that
syzbot again reports this problem when it started playing with fcntl().

For robustness, not messing with userland fd is the better. ;-)

>
> unless there's any reason I'll try to find time to test v1 and queue it
> for 6.1

OK.

> We seem to check for EAGAIN where kernel_read/write end up being called
> and there's a poll for scheduling so it -should- work, but I assume this
> hasn't been tested much and might take a bit of time to test.

Right. Since the I/O in kernel side is poll based multiplexing, forcing
non-blocking I/O -should- work. (But I can't test e.g. changes in CPU time
usage because I don't have environment to test. I assume that poll based
multiplexing saves us from doing busy looping.)

We are currently checking for ERESTARTSYS and EAGAIN. The former is for
non-socket fds which do not have O_NONBLOCK flag, and the latter is for
socket fds which have O_NONBLOCK flag. If we enforce O_NONBLOCK flag,
the former will become redundant. I think we can remove the former check
after you tested that setting O_NONBLOCK flag on non-socket fds does not break.

Christian Schoenebeck

unread,
Oct 6, 2022, 10:56:21 AM10/6/22
to Dominique Martinet, Tetsuo Handa, Eric Van Hensbergen, Latchesar Ionkov, syzbot, v9fs-de...@lists.sourceforge.net, syzkall...@googlegroups.com, net...@vger.kernel.org
I would also prefer this simpler v1 instead of v2 for now. One nitpicking ...

> diff --git a/net/9p/trans_fd.c b/net/9p/trans_fd.c
> index e758978b44be..9870597da583 100644
> --- a/net/9p/trans_fd.c
> +++ b/net/9p/trans_fd.c
> @@ -821,11 +821,13 @@ static int p9_fd_open(struct p9_client *client, int
> rfd, int wfd) goto out_free_ts;
> if (!(ts->rd->f_mode & FMODE_READ))
> goto out_put_rd;
> + ts->rd->f_flags |= O_NONBLOCK;

... I think this deserves a short comment like:

/* prevent hung task with pipes */

Anyway,

Reviewed-by: Christian Schoenebeck <linu...@crudebyte.com>

Dominique Martinet

unread,
Oct 6, 2022, 9:04:23 PM10/6/22
to Christian Schoenebeck, Tetsuo Handa, Eric Van Hensbergen, Latchesar Ionkov, syzbot, v9fs-de...@lists.sourceforge.net, syzkall...@googlegroups.com, net...@vger.kernel.org
Christian Schoenebeck wrote on Thu, Oct 06, 2022 at 04:55:23PM +0200:
> > diff --git a/net/9p/trans_fd.c b/net/9p/trans_fd.c
> > index e758978b44be..9870597da583 100644
> > --- a/net/9p/trans_fd.c
> > +++ b/net/9p/trans_fd.c
> > @@ -821,11 +821,13 @@ static int p9_fd_open(struct p9_client *client, int
> > rfd, int wfd) goto out_free_ts;
> > if (!(ts->rd->f_mode & FMODE_READ))
> > goto out_put_rd;
> > + ts->rd->f_flags |= O_NONBLOCK;
>
> ... I think this deserves a short comment like:
>
> /* prevent hung task with pipes */

Good point, I've sneaked in this comment:
/* prevent workers from hanging on IO when fd is a pipe */

https://github.com/martinetd/linux/commit/ef575281b21e9a34dfae544a187c6aac2ae424a9


> Reviewed-by: Christian Schoenebeck <linu...@crudebyte.com>

Thank you!

--
Dominique

Dominique Martinet

unread,
Oct 6, 2022, 9:40:51 PM10/6/22
to Tetsuo Handa, Christian Schoenebeck, Eric Van Hensbergen, Latchesar Ionkov, syzbot, v9fs-de...@lists.sourceforge.net, syzkall...@googlegroups.com, net...@vger.kernel.org, linux-fsdevel
Tetsuo Handa wrote on Sun, Sep 04, 2022 at 09:27:22AM +0900:
> On 2022/09/04 8:39, Dominique Martinet wrote:
> > Is there any reason you spent time working on v2, or is that just
> > theorical for not messing with userland fd ?
>
> Just theoretical for not messing with userland fd, for programs generated
> by fuzzers might use fds passed to the mount() syscall. I imagined that
> syzbot again reports this problem when it started playing with fcntl().
>
> For robustness, not messing with userland fd is the better. ;-)

By the way digging this back made me think about this a bit again.
My opinion hasn't really changed that if you want to shoot yourself in
the foot I don't think we're crossing any priviledge boundary here, but
we could probably prevent it by saying the mount call with close that fd
and somehow steal it? (drop the fget, close_fd after get_file perhaps?)

That should address your concern about robustess and syzbot will no
longer be able to play with fcntl on "our" end of the pipe. I think it's
fair to say that once you pass it to the kernel all bets are off, so
closing it for the userspace application could make sense, and the mount
already survives when short processes do the mount call and immediately
exit so it's not like we need that fd to be open...


What do you think?

(either way would be for 6.2, the patch is already good enough as is for
me)
--
Dominique

Tetsuo Handa

unread,
Oct 7, 2022, 7:53:14 AM10/7/22
to Dominique Martinet, Alexander Viro, Christian Schoenebeck, Eric Van Hensbergen, Latchesar Ionkov, syzbot, v9fs-de...@lists.sourceforge.net, syzkall...@googlegroups.com, net...@vger.kernel.org, linux-fsdevel
I found that pipe is using alloc_file_clone() which allocates "struct file"
instead of just incrementing "struct file"->f_count.

Then, can we add EXPORT_SYMBOL_GPL(alloc_file_clone) to fs/file_table.c and
use it like

struct file *f;

ts->rd = fget(rfd);
if (!ts->rd)
goto out_free_ts;
if (!(ts->rd->f_mode & FMODE_READ))
goto out_put_rd;
f = alloc_file_clone(ts->rd, ts->rd->f_flags | O_NONBLOCK, ts->rd->f_op);
if (IS_ERR(f))
goto out_put_rd;
fput(ts->rd);
ts->rd = f;

ts->wr = fget(wfd);
if (!ts->wr)
goto out_put_rd;
if (!(ts->wr->f_mode & FMODE_WRITE))
goto out_put_wr;
f = alloc_file_clone(ts->wr, ts->wr->f_flags | O_NONBLOCK, ts->wr->f_op);
if (IS_ERR(f))
goto out_put_wr;
fput(ts->wr);
ts->wr = f;

from p9_fd_open() for cloning "struct file" with O_NONBLOCK flag added?
Just an idea. I don't know whether alloc_file_clone() arguments are correct...

Reply all
Reply to author
Forward
0 new messages