INFO: rcu detected stall in n_tty_receive_char_special

125 views
Skip to first unread message

syzbot

unread,
Apr 6, 2018, 3:12:01 AM4/6/18
to gre...@linuxfoundation.org, jsl...@suse.com, linux-...@vger.kernel.org, syzkall...@googlegroups.com
Hello,

syzbot hit the following crash on upstream commit
3c8ba0d61d04ced9f8d9ff93977995a9e4e96e91 (Sat Mar 31 01:52:36 2018 +0000)
kernel.h: Retain constant expression output for max()/min()
syzbot dashboard link:
https://syzkaller.appspot.com/bug?extid=18df353d7540aa6b5467

Unfortunately, I don't have any reproducer for this crash yet.
Raw console output:
https://syzkaller.appspot.com/x/log.txt?id=5836679554269184
Kernel config:
https://syzkaller.appspot.com/x/.config?id=-1647968177339044852
compiler: gcc (GCC) 8.0.1 20180301 (experimental)

IMPORTANT: if you fix the bug, please add the following tag to the commit:
Reported-by: syzbot+18df35...@syzkaller.appspotmail.com
It will help syzbot understand when the bug is fixed. See footer for
details.
If you forward the report, please keep this part and the footer.

INFO: rcu_sched detected stalls on CPUs/tasks:
(detected by 1, t=125007 jiffies, g=42488, c=42487, q=11)
All QSes seen, last rcu_sched kthread activity 125014
(4295022441-4294897427), jiffies_till_next_fqs=3, root ->qsmask 0x0
kworker/u4:5 R running task 15272 8806 2 0x80000008
Workqueue: events_unbound flush_to_ldisc
Call Trace:
<IRQ>
sched_show_task.cold.87+0x27a/0x301 kernel/sched/core.c:5325
print_other_cpu_stall.cold.79+0x92f/0x9d2 kernel/rcu/tree.c:1481
check_cpu_stall.isra.61+0x706/0xf50 kernel/rcu/tree.c:1599
__rcu_pending kernel/rcu/tree.c:3356 [inline]
rcu_pending kernel/rcu/tree.c:3401 [inline]
rcu_check_callbacks+0x21b/0xad0 kernel/rcu/tree.c:2763
update_process_times+0x2d/0x70 kernel/time/timer.c:1636
tick_sched_handle+0xa0/0x180 kernel/time/tick-sched.c:171
tick_sched_timer+0x42/0x130 kernel/time/tick-sched.c:1179
__run_hrtimer kernel/time/hrtimer.c:1337 [inline]
__hrtimer_run_queues+0x3e3/0x10a0 kernel/time/hrtimer.c:1399
hrtimer_interrupt+0x286/0x650 kernel/time/hrtimer.c:1457
local_apic_timer_interrupt arch/x86/kernel/apic/apic.c:1025 [inline]
smp_apic_timer_interrupt+0x15d/0x710 arch/x86/kernel/apic/apic.c:1050
apic_timer_interrupt+0xf/0x20 arch/x86/entry/entry_64.S:862
</IRQ>
RIP: 0010:echo_char+0xae/0x2e0 drivers/tty/n_tty.c:915
RSP: 0018:ffff8801d33e71e0 EFLAGS: 00000a07 ORIG_RAX: ffffffffffffff13
RAX: dffffc0000000000 RBX: ffffc90013158000 RCX: ffffffff8375b1b7
RDX: 1ffff1003ad87636 RSI: ffffffff8375b1c6 RDI: ffff8801d6c3b1b4
RBP: ffff8801d33e7210 R08: ffff8801cf482540 R09: fffff5200262b460
R10: fffff5200262b460 R11: ffffc9001315a307 R12: 00000000000000cb
R13: ffff8801d6c3ae00 R14: 00000000c240f0bb R15: 00000000000000bb
n_tty_receive_char_special+0x13b3/0x31c0 drivers/tty/n_tty.c:1306
n_tty_receive_buf_fast drivers/tty/n_tty.c:1577 [inline]
__receive_buf drivers/tty/n_tty.c:1611 [inline]
n_tty_receive_buf_common+0x20ca/0x2c50 drivers/tty/n_tty.c:1709
n_tty_receive_buf2+0x33/0x40 drivers/tty/n_tty.c:1744
tty_ldisc_receive_buf+0xb0/0x190 drivers/tty/tty_buffer.c:456
tty_port_default_receive_buf+0x110/0x170 drivers/tty/tty_port.c:38
receive_buf drivers/tty/tty_buffer.c:475 [inline]
flush_to_ldisc+0x3e9/0x560 drivers/tty/tty_buffer.c:524
process_one_work+0xc1e/0x1b50 kernel/workqueue.c:2145
worker_thread+0x1cc/0x1440 kernel/workqueue.c:2279
kthread+0x345/0x410 kernel/kthread.c:238
ret_from_fork+0x3a/0x50 arch/x86/entry/entry_64.S:411
rcu_sched kthread starved for 125626 jiffies! g42488 c42487 f0x2
RCU_GP_WAIT_FQS(3) ->state=0x0 ->cpu=0
RCU grace-period kthread stack dump:
rcu_sched R running task 23592 9 2 0x80000000
Call Trace:
context_switch kernel/sched/core.c:2848 [inline]
__schedule+0x807/0x1e40 kernel/sched/core.c:3490
schedule+0xef/0x430 kernel/sched/core.c:3549
schedule_timeout+0x138/0x240 kernel/time/timer.c:1801
rcu_gp_kthread+0x6b5/0x1940 kernel/rcu/tree.c:2231
kthread+0x345/0x410 kernel/kthread.c:238
ret_from_fork+0x3a/0x50 arch/x86/entry/entry_64.S:411


---
This bug is generated by a dumb bot. It may contain errors.
See https://goo.gl/tpsmEJ for details.
Direct all questions to syzk...@googlegroups.com.

syzbot will keep track of this bug report.
If you forgot to add the Reported-by tag, once the fix for this bug is
merged
into any tree, please reply to this email with:
#syz fix: exact-commit-title
To mark this as a duplicate of another syzbot report, please reply with:
#syz dup: exact-subject-of-another-report
If it's a one-off invalid bug report, please reply with:
#syz invalid
Note: if the crash happens again, it will cause creation of a new bug
report.
Note: all commands must start from beginning of the line in the email body.

Dmitry Vyukov

unread,
Apr 6, 2018, 3:13:31 AM4/6/18
to syzbot, Greg Kroah-Hartman, Jiri Slaby, LKML, syzkall...@googlegroups.com
On Fri, Apr 6, 2018 at 9:12 AM, syzbot
<syzbot+18df35...@syzkaller.appspotmail.com> wrote:
> Hello,
>
> syzbot hit the following crash on upstream commit
> 3c8ba0d61d04ced9f8d9ff93977995a9e4e96e91 (Sat Mar 31 01:52:36 2018 +0000)
> kernel.h: Retain constant expression output for max()/min()
> syzbot dashboard link:
> https://syzkaller.appspot.com/bug?extid=18df353d7540aa6b5467
>
> Unfortunately, I don't have any reproducer for this crash yet.
> Raw console output:
> https://syzkaller.appspot.com/x/log.txt?id=5836679554269184
> Kernel config:
> https://syzkaller.appspot.com/x/.config?id=-1647968177339044852
> compiler: gcc (GCC) 8.0.1 20180301 (experimental)
>
> IMPORTANT: if you fix the bug, please add the following tag to the commit:
> Reported-by: syzbot+18df35...@syzkaller.appspotmail.com
> It will help syzbot understand when the bug is fixed. See footer for
> details.
> If you forward the report, please keep this part and the footer.


This looks somewhat similar to "INFO: rcu detected stall in __process_echoes":
https://syzkaller.appspot.com/bug?id=17f23b094cd80df750e5b0f8982c521ee6bcbf40
But I am not sure because stall stacks are somewhat different.
> --
> You received this message because you are subscribed to the Google Groups
> "syzkaller-bugs" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to syzkaller-bug...@googlegroups.com.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/syzkaller-bugs/883d24f7d4ccc52e19056928c5be%40google.com.
> For more options, visit https://groups.google.com/d/optout.

syzbot

unread,
Apr 15, 2018, 9:43:01 AM4/15/18
to dvy...@google.com, gre...@linuxfoundation.org, jsl...@suse.com, linux-...@vger.kernel.org, syzkall...@googlegroups.com
syzbot has found reproducer for the following crash on upstream commit
18b7fd1c93e5204355ddbf2608a097d64df81b88 (Sat Apr 14 15:50:50 2018 +0000)
Merge branch 'akpm' (patches from Andrew)
So far this crash happened 2 times on upstream.
C reproducer: https://syzkaller.appspot.com/x/repro.c?id=5283196245639168
syzkaller reproducer:
https://syzkaller.appspot.com/x/repro.syz?id=5978502260064256
Raw console output:
https://syzkaller.appspot.com/x/log.txt?id=4993708470566912
Kernel config:
https://syzkaller.appspot.com/x/.config?id=-8852471259444315113
compiler: gcc (GCC) 8.0.1 20180413 (experimental)

IMPORTANT: if you fix the bug, please add the following tag to the commit:
Reported-by: syzbot+18df35...@syzkaller.appspotmail.com
It will help syzbot understand when the bug is fixed.

INFO: rcu_sched self-detected stall on CPU
1-....: (124999 ticks this GP) idle=792/1/4611686018427387906
softirq=9920/9920 fqs=31243
(t=125000 jiffies g=4830 c=4829 q=20)
NMI backtrace for cpu 1
CPU: 1 PID: 4467 Comm: syzkaller756606 Not tainted 4.16.0+ #3
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
Google 01/01/2011
Call Trace:
<IRQ>
__dump_stack lib/dump_stack.c:77 [inline]
dump_stack+0x1b9/0x294 lib/dump_stack.c:113
nmi_cpu_backtrace.cold.4+0x19/0xce lib/nmi_backtrace.c:103
nmi_trigger_cpumask_backtrace+0x151/0x192 lib/nmi_backtrace.c:62
arch_trigger_cpumask_backtrace+0x14/0x20 arch/x86/kernel/apic/hw_nmi.c:38
trigger_single_cpu_backtrace include/linux/nmi.h:156 [inline]
rcu_dump_cpu_stacks+0x175/0x1c2 kernel/rcu/tree.c:1376
print_cpu_stall kernel/rcu/tree.c:1525 [inline]
check_cpu_stall.isra.61.cold.80+0x36c/0x59a kernel/rcu/tree.c:1593
__rcu_pending kernel/rcu/tree.c:3356 [inline]
rcu_pending kernel/rcu/tree.c:3401 [inline]
rcu_check_callbacks+0x21b/0xad0 kernel/rcu/tree.c:2763
update_process_times+0x2d/0x70 kernel/time/timer.c:1636
tick_sched_handle+0x9f/0x180 kernel/time/tick-sched.c:173
tick_sched_timer+0x45/0x130 kernel/time/tick-sched.c:1283
__run_hrtimer kernel/time/hrtimer.c:1386 [inline]
__hrtimer_run_queues+0x3e3/0x10a0 kernel/time/hrtimer.c:1448
hrtimer_interrupt+0x286/0x650 kernel/time/hrtimer.c:1506
local_apic_timer_interrupt arch/x86/kernel/apic/apic.c:1025 [inline]
smp_apic_timer_interrupt+0x15d/0x710 arch/x86/kernel/apic/apic.c:1050
apic_timer_interrupt+0xf/0x20 arch/x86/entry/entry_64.S:862
</IRQ>
RIP: 0010:echo_char+0x96/0x2e0 drivers/tty/n_tty.c:915
RSP: 0018:ffff8801b079f6f8 EFLAGS: 00000293 ORIG_RAX: ffffffffffffff13
RAX: ffff8801b06ee1c0 RBX: ffffc90001e64000 RCX: ffffffff837ab4a7
RDX: 0000000000000000 RSI: ffffffff837ab4b6 RDI: 0000000000000001
RBP: ffff8801b079f728 R08: ffff8801b06ee1c0 R09: 0000000000000001
R10: ffff8801b06eea20 R11: ffff8801b06ee1c0 R12: 000000000000000f
R13: ffff8801b02b0dc0 R14: 000000012b1bcc47 R15: 0000000000000c47
n_tty_receive_char_special+0x13b3/0x31c0 drivers/tty/n_tty.c:1306
n_tty_receive_buf_fast drivers/tty/n_tty.c:1577 [inline]
__receive_buf drivers/tty/n_tty.c:1611 [inline]
n_tty_receive_buf_common+0x20ca/0x2c50 drivers/tty/n_tty.c:1709
n_tty_receive_buf+0x30/0x40 drivers/tty/n_tty.c:1738
tiocsti drivers/tty/tty_io.c:2171 [inline]
tty_ioctl+0x7e7/0x1870 drivers/tty/tty_io.c:2557
vfs_ioctl fs/ioctl.c:46 [inline]
file_ioctl fs/ioctl.c:500 [inline]
do_vfs_ioctl+0x1cf/0x16a0 fs/ioctl.c:684
ksys_ioctl+0xa9/0xd0 fs/ioctl.c:701
SYSC_ioctl fs/ioctl.c:708 [inline]
SyS_ioctl+0x24/0x30 fs/ioctl.c:706
do_syscall_64+0x29e/0x9d0 arch/x86/entry/common.c:287
entry_SYSCALL_64_after_hwframe+0x42/0xb7
RIP: 0033:0x4404b9
RSP: 002b:00007ffee94bd7f8 EFLAGS: 00000213 ORIG_RAX: 0000000000000010
RAX: ffffffffffffffda RBX: 00000000200006c0 RCX: 00000000004404b9
RDX: 00000000200000c0 RSI: 0000000000005412 RDI: 0000000000000012
RBP: 6d74702f7665642f R08: 000000000000000a R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000213 R12: 0000000000401c30
R13: 0000000000401cc0 R14: 0000000000000000 R15: 0000000000000000

Tetsuo Handa

unread,
Apr 21, 2018, 11:35:13 AM4/21/18
to dvy...@google.com, gre...@linuxfoundation.org, jsl...@suse.com, syzbot, linux-...@vger.kernel.org, syzkall...@googlegroups.com
Dmitry Vyukov wrote:
> This looks somewhat similar to "INFO: rcu detected stall in __process_echoes":
> https://syzkaller.appspot.com/bug?id=17f23b094cd80df750e5b0f8982c521ee6bcbf40
> But I am not sure because stall stacks are somewhat different.

Yes, this will be the same one which can be reproduced using same reproducer
provided by syzbot. You can try below one like "while :; do echo -n .; ./a.out; done".

----------------------------------------
#include <fcntl.h>
#include <stdio.h>
#include <sys/ioctl.h>
#include <sys/stat.h>
#include <unistd.h>

int main(int argc, char *argv[])
{
int fd[2];
unsigned int zero = 0;
char buf[128];
int ptyno = 0;
char c = 0;
char data[64] = { };
while (1) {
fd[0] = open("/dev/ptmx", O_RDWR);
ioctl(fd[0], TIOCSPTLCK, &zero);
write(fd[0], data, sizeof(data));
if (ioctl(fd[0], TIOCGPTN, &ptyno))
break;
sprintf(buf, "/dev/pts/%d", ptyno);
fd[1] = open(buf, O_RDONLY);
if (fd[1] == EOF)
break;
ioctl(fd[1], TIOCSTI, &c);
}
return 1;
}
----------------------------------------

Since tty subsystem is prone to race condition, same reproducer can hit
various different results.

[ 524.349009] a.out R running task 0 4052 1 0x80000084
[ 524.350879] Call Trace:
[ 524.351816] <IRQ>
[ 524.352672] sched_show_task+0x16b/0x230
[ 524.353864] show_state_filter+0x9d/0x190
[ 524.355098] sysrq_handle_showstate+0x7/0x10
[ 524.356342] __handle_sysrq+0x123/0x200
[ 524.357566] sysrq_filter+0x356/0x390
[ 524.358710] input_to_handler+0x4b/0xf0
[ 524.359911] input_pass_values.part.2+0x1ba/0x260
[ 524.361267] input_handle_event+0x11a/0x4f0
[ 524.362504] input_event+0x45/0x60
[ 524.363601] atkbd_interrupt+0x41a/0x6f0
[ 524.364815] serio_interrupt+0x3b/0x80
[ 524.365975] i8042_interrupt+0x1e8/0x3a0
[ 524.367197] ? _raw_spin_unlock_irq+0x24/0x40
[ 524.368456] __handle_irq_event_percpu+0x5b/0xc0
[ 524.369797] handle_irq_event_percpu+0x1b/0x50
[ 524.371102] handle_irq_event+0x2f/0x50
[ 524.372245] handle_edge_irq+0xcc/0x1a0
[ 524.373433] handle_irq+0x11/0x20
[ 524.374519] do_IRQ+0x52/0x100
[ 524.375519] common_interrupt+0xf/0xf
[ 524.376648] </IRQ>
[ 524.377473] RIP: 0010:n_tty_receive_char_special+0x4d1/0xad0
[ 524.379032] RSP: 0018:ffffc90006d2bd08 EFLAGS: 00000293 ORIG_RAX: ffffffffffffffd2
[ 524.380957] RAX: ffffc90010f7d000 RBX: ffffc90010f7d000 RCX: 00000011c8d443f0
[ 524.382783] RDX: 00000000000003ef RSI: ffff880126ce3800 RDI: 0000000000000000
[ 524.384644] RBP: ffff880126ce3800 R08: 0000000000000012 R09: 0000000000000000
[ 524.386479] R10: 0000000000000000 R11: ffffffff814184f3 R12: 0000000000000012
[ 524.388300] R13: ffff880126ce3800 R14: 0000000a13a9e288 R15: 0000000000000012
[ 524.390158] ? n_tty_receive_buf_common+0x43/0xd00
[ 524.391567] ? n_tty_receive_char_special+0x4ea/0xad0
[ 524.393052] ? n_tty_receive_buf_common+0x43/0xd00
[ 524.394464] n_tty_receive_buf_common+0xa43/0xd00
[ 524.395832] tty_ioctl+0x3e7/0x910
[ 524.397041] ? trace_hardirqs_on_caller+0xe7/0x180
[ 524.398512] do_vfs_ioctl+0x8a/0x650
[ 524.399666] ? syscall_trace_enter+0x19e/0x250
[ 524.400977] ksys_ioctl+0x6b/0x80
[ 524.402102] __x64_sys_ioctl+0x11/0x20
[ 524.403327] do_syscall_64+0x4f/0x1f0
[ 524.404494] entry_SYSCALL_64_after_hwframe+0x49/0xbe

[ 524.660246] 7 locks held by a.out/4052:
[ 524.661533] #0: 00000000eb1b6dd2 (&tty->ldisc_sem){++++}, at: tty_ldisc_ref_wait+0x1f/0x50
[ 524.663752] #1: 000000005920aabb (&o_tty->termios_rwsem/1){++++}, at: n_tty_receive_buf_common+0x43/0xd00
[ 524.666284] #2: 00000000fc85438a (&serio->lock){-.-.}, at: serio_interrupt+0x1f/0x80
[ 524.668422] #3: 00000000806da945 (&(&dev->event_lock)->rlock){-.-.}, at: input_event+0x32/0x60
[ 524.670712] #4: 000000005906d89a (rcu_read_lock){....}, at: input_pass_values.part.2+0x0/0x260
[ 524.673017] #5: 000000005906d89a (rcu_read_lock){....}, at: __handle_sysrq+0x0/0x200
[ 524.675144] #6: 00000000ed0ed6ec (tasklist_lock){.?.+}, at: debug_show_all_locks+0x37/0x190



[ 122.170234] a.out R running task 0 2818 1 0x80000084
[ 122.172442] Call Trace:
[ 122.173402] <IRQ>
[ 122.174271] sched_show_task+0x16b/0x230
[ 122.175512] show_state_filter+0x9d/0x190
[ 122.176693] sysrq_handle_showstate+0x7/0x10
[ 122.177924] __handle_sysrq+0x123/0x200
[ 122.179122] sysrq_filter+0x356/0x390
[ 122.180237] input_to_handler+0x4b/0xf0
[ 122.181385] input_pass_values.part.2+0x1ba/0x260
[ 122.182769] input_handle_event+0x11a/0x4f0
[ 122.184131] input_event+0x45/0x60
[ 122.185191] atkbd_interrupt+0x41a/0x6f0
[ 122.186350] serio_interrupt+0x3b/0x80
[ 122.187502] i8042_interrupt+0x1e8/0x3a0
[ 122.188668] __handle_irq_event_percpu+0x5b/0xc0
[ 122.189966] handle_irq_event_percpu+0x1b/0x50
[ 122.191384] handle_irq_event+0x2f/0x50
[ 122.192522] handle_edge_irq+0xcc/0x1a0
[ 122.193679] handle_irq+0x11/0x20
[ 122.194754] do_IRQ+0x52/0x100
[ 122.195764] common_interrupt+0xf/0xf
[ 122.196875] RIP: 0010:__do_softirq+0xa0/0x233
[ 122.198102] RSP: 0018:ffff88013fc03f68 EFLAGS: 00000202 ORIG_RAX: ffffffffffffffd3
[ 122.200212] RAX: ffff8801379bad80 RBX: 00000000ffffffff RCX: 0000000000000000
[ 122.202019] RDX: 0000000000000d84 RSI: ffff8801379bb5b8 RDI: ffff8801379bad80
[ 122.203886] RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
[ 122.205670] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
[ 122.207536] R13: 0000000000000000 R14: 0000000000000002 R15: 0000000000000000
[ 122.209307] ? common_interrupt+0xa/0xf
[ 122.210533] ? __do_softirq+0x9a/0x233
[ 122.211711] ? clockevents_program_event+0x76/0xe0
[ 122.213047] irq_exit+0xdc/0xe0
[ 122.214156] smp_apic_timer_interrupt+0x4d/0x80
[ 122.215446] apic_timer_interrupt+0xf/0x20
[ 122.216671] </IRQ>
[ 122.217497] RIP: 0010:echo_char+0x29/0x90
[ 122.218744] RSP: 0018:ffffc90006787d00 EFLAGS: 00000293 ORIG_RAX: ffffffffffffff13
[ 122.220616] RAX: ffffc90006959000 RBX: ffffc90006959000 RCX: 000000019b8c0c19
[ 122.222449] RDX: 0000000000000000 RSI: ffff880138ee1800 RDI: 0000000000000000
[ 122.224247] RBP: ffff880138ee1800 R08: 0000000000000012 R09: 0000000000000000
[ 122.226116] R10: 0000000000000000 R11: ffffffff814184f3 R12: 0000000000000012
[ 122.227921] R13: ffff880138ee1800 R14: 00000000d59f6933 R15: 0000000000000012
[ 122.229862] ? n_tty_receive_buf_common+0x43/0xd00
[ 122.231235] n_tty_receive_char_special+0x4ea/0xad0
[ 122.232622] ? n_tty_receive_buf_common+0x43/0xd00
[ 122.234084] n_tty_receive_buf_common+0xa43/0xd00
[ 122.235449] tty_ioctl+0x3e7/0x910
[ 122.236565] ? trace_hardirqs_on_caller+0xe7/0x180
[ 122.237993] do_vfs_ioctl+0x8a/0x650
[ 122.239157] ? syscall_trace_enter+0x19e/0x250
[ 122.240467] ksys_ioctl+0x6b/0x80
[ 122.241633] __x64_sys_ioctl+0x11/0x20
[ 122.242809] do_syscall_64+0x4f/0x1f0
[ 122.243969] entry_SYSCALL_64_after_hwframe+0x49/0xbe

[ 122.624858] 7 locks held by a.out/2818:
[ 122.626110] #0: 00000000d2e12ac0 (&tty->ldisc_sem){++++}, at: tty_ldisc_ref_wait+0x1f/0x50
[ 122.628238] #1: 00000000c1852a2e (&o_tty->termios_rwsem/1){++++}, at: n_tty_receive_buf_common+0x43/0xd00
[ 122.630579] #2: 00000000019e0dd4 (&serio->lock){-.-.}, at: serio_interrupt+0x1f/0x80
[ 122.632634] #3: 00000000165952ed (&(&dev->event_lock)->rlock){-.-.}, at: input_event+0x32/0x60
[ 122.634823] #4: 0000000013aedeb2 (rcu_read_lock){....}, at: input_pass_values.part.2+0x0/0x260
[ 122.637122] #5: 0000000013aedeb2 (rcu_read_lock){....}, at: __handle_sysrq+0x0/0x200
[ 122.639133] #6: 0000000094121093 (tasklist_lock){.?.+}, at: debug_show_all_locks+0x37/0x190



[ 701.099036] a.out D 0 4365 1 0x80000082
[ 701.100533] Call Trace:
[ 701.101735] ? __schedule+0x278/0x750
[ 701.103178] ? _raw_spin_unlock_irq+0x24/0x40
[ 701.104611] schedule+0x34/0x80
[ 701.105755] rwsem_down_write_failed+0x10a/0x230
[ 701.107228] ? call_rwsem_down_write_failed+0x13/0x20
[ 701.108781] call_rwsem_down_write_failed+0x13/0x20
[ 701.110544] ? n_tty_flush_buffer+0x14/0x50
[ 701.111905] down_write+0x3b/0x50
[ 701.113148] ? n_tty_flush_buffer+0x14/0x50
[ 701.114550] n_tty_flush_buffer+0x14/0x50
[ 701.115962] tty_ldisc_hangup+0x2a/0x1d0
[ 701.117270] __tty_hangup+0x13f/0x2b0
[ 701.118820] tty_release+0xf0/0x500
[ 701.120450] ? lock_acquire+0x43/0x60
[ 701.122099] __fput+0x95/0x1d0
[ 701.123377] task_work_run+0x84/0xa0
[ 701.124632] do_exit+0x2cc/0xac0
[ 701.125713] ? syscall_trace_enter+0x19e/0x250
[ 701.127039] do_group_exit+0x38/0xb0
[ 701.128469] __x64_sys_exit_group+0xf/0x10
[ 701.129795] do_syscall_64+0x4f/0x1f0
[ 701.130946] entry_SYSCALL_64_after_hwframe+0x49/0xbe

[ 701.635816] 4 locks held by a.out/4365:
[ 701.637551] #0: 0000000051b314cc (&tty->legacy_mutex){+.+.}, at: tty_release+0x4d/0x500
[ 701.639958] #1: 00000000fcddaa23 (&tty->legacy_mutex/1){+.+.}, at: __tty_hangup+0x5c/0x2b0
[ 701.642259] #2: 000000005a4dd91b (&tty->ldisc_sem){++++}, at: tty_ldisc_ref+0x15/0x50
[ 701.645185] #3: 00000000997a5bc4 (&o_tty->termios_rwsem/1){++++}, at: n_tty_flush_buffer+0x14/0x50



[ 264.159739] a.out R running task 0 8464 1 0x80000084
[ 264.159739] Call Trace:
[ 264.159739] <IRQ>
[ 264.159739] sched_show_task+0x16b/0x230
[ 264.159739] show_state_filter+0x9d/0x190
[ 264.159739] sysrq_handle_showstate+0x7/0x10
[ 264.159739] __handle_sysrq+0x123/0x200
[ 264.159739] sysrq_filter+0x356/0x390
[ 264.159739] input_to_handler+0x4b/0xf0
[ 264.159739] input_pass_values.part.2+0x1ba/0x260
[ 264.159739] input_handle_event+0x11a/0x4f0
[ 264.159739] input_event+0x45/0x60
[ 264.159739] atkbd_interrupt+0x41a/0x6f0
[ 264.159739] serio_interrupt+0x3b/0x80
[ 264.159739] i8042_interrupt+0x1e8/0x3a0
[ 264.159739] ? _raw_spin_unlock_irq+0x24/0x40
[ 264.159739] __handle_irq_event_percpu+0x5b/0xc0
[ 264.159739] handle_irq_event_percpu+0x1b/0x50
[ 264.159739] handle_irq_event+0x2f/0x50
[ 264.159739] handle_edge_irq+0xcc/0x1a0
[ 264.159739] handle_irq+0x11/0x20
[ 264.159739] do_IRQ+0x52/0x100
[ 264.159739] common_interrupt+0xf/0xf
[ 264.159739] </IRQ>
[ 264.159739] RIP: 0010:__process_echoes+0xc4/0x260
[ 264.159739] RSP: 0018:ffffc90003257d18 EFLAGS: 00000296 ORIG_RAX: ffffffffffffffd3
[ 264.159739] RAX: 0000000000000004 RBX: 00000009abab4dd2 RCX: 00000009abab4dce
[ 264.159739] RDX: fffffff65454b232 RSI: ffff880137fed800 RDI: 0000000000000000
[ 264.159739] RBP: ffffc90003305000 R08: 0000000000000000 R09: 0000000000000001
[ 264.159739] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
[ 264.159739] R13: ffff880137fed800 R14: 0000000000001f00 R15: 0000000000000000
[ 264.159739] ? __process_echoes+0x14c/0x260
[ 264.159739] n_tty_receive_buf_common+0x4b6/0xd00
[ 264.159739] tty_ioctl+0x3e7/0x910
[ 264.159739] ? trace_hardirqs_on_caller+0xe7/0x180
[ 264.159739] do_vfs_ioctl+0x8a/0x650
[ 264.159739] ? syscall_trace_enter+0x19e/0x250
[ 264.159739] ksys_ioctl+0x6b/0x80
[ 264.159739] __x64_sys_ioctl+0x11/0x20
[ 264.159739] do_syscall_64+0x4f/0x1f0
[ 264.159739] entry_SYSCALL_64_after_hwframe+0x49/0xbe

[ 264.159739] 8 locks held by a.out/8464:
[ 264.159739] #0: 000000000a54adcf (&tty->ldisc_sem){++++}, at: tty_ldisc_ref_wait+0x1f/0x50
[ 264.159739] #1: 000000002abad3ab (&o_tty->termios_rwsem/1){++++}, at: n_tty_receive_buf_common+0x43/0xd00
[ 264.159739] #2: 00000000705a196b (&ldata->output_lock){+.+.}, at: n_tty_receive_buf_common+0x4a4/0xd00
[ 264.159739] #3: 000000008434776a (&serio->lock){-.-.}, at: serio_interrupt+0x1f/0x80
[ 264.159739] #4: 00000000710258a9 (&(&dev->event_lock)->rlock){-.-.}, at: input_event+0x32/0x60
[ 264.159739] #5: 000000004b5299a2 (rcu_read_lock){....}, at: input_pass_values.part.2+0x0/0x260
[ 264.159739] #6: 000000004b5299a2 (rcu_read_lock){....}, at: __handle_sysrq+0x0/0x200
[ 264.159739] #7: 00000000082c04ee (tasklist_lock){.?.+}, at: debug_show_all_locks+0x37/0x190

Tetsuo Handa

unread,
Apr 23, 2018, 12:01:28 PM4/23/18
to gre...@linuxfoundation.org, jsl...@suse.com, Peter Hurley, dvy...@google.com, syzbot, linux-...@vger.kernel.org, syzkall...@googlegroups.com
Bisection reached to commit e052c6d15c61cc4c ("tty: Use unbound workqueue
for all input workers") added in 4.4-rc1.

Reverting that commit on top of 4.4 solved the lockup. But unfortunately,
reverting that commit on top of 4.16 only helped surviving the test longer
than vanilla 4.16; still stalls at __process_echoes(). There might be similar
changes which need to be reverted as well.



# bad: [8005c49d9aea74d382f474ce11afbbc7d7130bec] Linux 4.4-rc1
# good: [6a13feb9c82803e2b815eca72fa7a9f5561d7861] Linux 4.3
# good: [64291f7db5bd8150a74ad2036f1037e6a0428df2] Linux 4.2
# good: [b953c0d234bc72e8489d3bf51a276c5c4ec85345] Linux 4.1
# good: [39a8804455fb23f09157341d3ba7db6d7ae6ee76] Linux 4.0
# good: [bfa76d49576599a4b9f9b7a71f23d73d6dcff735] Linux 3.19
# good: [b2776bf7149bddd1f4161f14f79520f17fc1d71d] Linux 3.18
# good: [bfe01a5ba2490f299e1d2d5508cbbbadd897bbe9] Linux 3.17
# good: [19583ca584d6f574384e17fe7613dfaeadcdc4a6] Linux 3.16
# good: [1860e379875dfe7271c649058aeddffe5afd9d0d] Linux 3.15
# good: [455c6fdbd219161bd09b1165f11699d6d73de11c] Linux 3.14
# good: [d8ec26d7f8287f5788a494f56e8814210f0e64be] Linux 3.13
# good: [5e01dc7b26d9f24f39abace5da98ccbd6a5ceb52] Linux 3.12
git bisect start 'HEAD' 'v4.3' 'v4.2' 'v4.1' 'v4.0' 'v3.19' 'v3.18' 'v3.17' 'v3.16' 'v3.15' 'v3.14' 'v3.13' 'v3.12' '--' 'drivers/tty/'
# good: [cc32382d9fd22dc8eebba4a245f50417267bda8e] serial: imx: make setup_ufcr more useful
git bisect good cc32382d9fd22dc8eebba4a245f50417267bda8e
# good: [32ede4a51754cb62b0d43d91cb7c4e3c57069a9c] serial: tegra: Add helper function for handling RX buffer
git bisect good 32ede4a51754cb62b0d43d91cb7c4e3c57069a9c
# good: [4b41b9539a1e9531f942ededfcdcff372317d2e7] tty: Prevent tty teardown during tty_write_message()
git bisect good 4b41b9539a1e9531f942ededfcdcff372317d2e7
# bad: [fd0d351de7bbd718bc2b34d5846854831aa2b88c] Merge tag 'tty-4.4-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty
git bisect bad fd0d351de7bbd718bc2b34d5846854831aa2b88c
# good: [14c79092909a52b6fd6394b6ad5e7756c4f9565e] Merge branch 'parisc-4.3-2' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux
git bisect good 14c79092909a52b6fd6394b6ad5e7756c4f9565e
# good: [41ecf1404b34d9975eb97f5005d9e4274eaeb76a] Merge tag 'for-linus-4.4-rc0-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip
git bisect good 41ecf1404b34d9975eb97f5005d9e4274eaeb76a
# bad: [e052c6d15c61cc4caff2f06cbca72b183da9f15e] tty: Use unbound workqueue for all input workers
git bisect bad e052c6d15c61cc4caff2f06cbca72b183da9f15e
# good: [e176058f0de53c2346734e5254835e0045364001] tty: Abstract tty buffer work
git bisect good e176058f0de53c2346734e5254835e0045364001
# first bad commit: [e052c6d15c61cc4caff2f06cbca72b183da9f15e] tty: Use unbound workqueue for all input workers

Tetsuo Handa

unread,
Apr 24, 2018, 8:25:24 AM4/24/18
to gre...@linuxfoundation.org, jsl...@suse.com, Peter Hurley, dvy...@google.com, syzbot, linux-...@vger.kernel.org, syzkall...@googlegroups.com
On 2018/04/24 1:01, Tetsuo Handa wrote:
> Bisection reached to commit e052c6d15c61cc4c ("tty: Use unbound workqueue
> for all input workers") added in 4.4-rc1.
>
> Reverting that commit on top of 4.4 solved the lockup. But unfortunately,
> reverting that commit on top of 4.16 only helped surviving the test longer
> than vanilla 4.16; still stalls at __process_echoes(). There might be similar
> changes which need to be reverted as well.

I noticed that when stall inside __process_echoes() occurs, it is looping at
discard loop added by commit cbfd0340ae199337 ("n_tty: Process echoes in blocks").
Since ldata->echo_commit < ldata->echo_tail is for some reason true upon entry,
the discard loop serves as almost infinite loop.

--- a/drivers/tty/n_tty.c
+++ b/drivers/tty/n_tty.c
@@ -613,12 +613,21 @@ static size_t __process_echoes(struct tty_struct *tty)
int space, old_space;
size_t tail;
unsigned char c;
+ unsigned long start = jiffies;

old_space = space = tty_write_room(tty);

tail = ldata->echo_tail;
+ if (ldata->echo_commit < tail)
+ printk("BUG1: ldata->echo_commit=%ld tail=%ld space=%d old_space=%d\n",
+ ldata->echo_commit, tail, space, old_space);
while (ldata->echo_commit != tail) {
c = echo_buf(ldata, tail);
+ if (time_after(jiffies, start + 5 * HZ)) {
+ printk("ldata->echo_commit=%ld tail=%ld c=%c space=%d old_space=%d\n",
+ ldata->echo_commit, tail, c, space, old_space);
+ start = jiffies;
+ }
if (c == ECHO_OP_START) {
unsigned char op;
int no_space_left = 0;
@@ -729,6 +738,11 @@ static size_t __process_echoes(struct tty_struct *tty)
* of echo overrun before the next commit), then discard enough
* data at the tail to prevent a subsequent overrun */
while (ldata->echo_commit - tail >= ECHO_DISCARD_WATERMARK) {
+ if (time_after(jiffies, start + 5 * HZ)) {
+ printk("ldata->echo_commit=%ld ldata->echo_tail=%ld tail=%ld space=%d old_space=%d\n",
+ ldata->echo_commit, ldata->echo_tail, tail, space, old_space);
+ start = jiffies;
+ }
if (echo_buf(ldata, tail) == ECHO_OP_START) {
if (echo_buf(ldata, tail + 1) == ECHO_OP_ERASE_TAB)
tail += 3;
@@ -739,6 +753,9 @@ static size_t __process_echoes(struct tty_struct *tty)
}

ldata->echo_tail = tail;
+ if (ldata->echo_commit < tail)
+ printk("BUG2: ldata->echo_commit=%ld tail=%ld space=%d old_space=%d\n",
+ ldata->echo_commit, tail, space, old_space);
return old_space - space;
}



[ 5.890788] input: PC Speaker as /devices/platform/pcspkr/input/input4
[ 27.357324] BUG1: ldata->echo_commit=120 tail=130 space=7936 old_space=7936
[ 32.358527] ldata->echo_commit=120 ldata->echo_tail=130 tail=2994833619 space=0 old_space=7936
[ 37.362404] ldata->echo_commit=120 ldata->echo_tail=130 tail=5930715049 space=0 old_space=7936
[ 42.366124] ldata->echo_commit=120 ldata->echo_tail=130 tail=8943875147 space=0 old_space=7936
[ 47.370508] ldata->echo_commit=120 ldata->echo_tail=130 tail=11950164125 space=0 old_space=7936
[ 52.374224] ldata->echo_commit=120 ldata->echo_tail=130 tail=14880807667 space=0 old_space=7936
[ 57.378597] ldata->echo_commit=120 ldata->echo_tail=130 tail=17821338400 space=0 old_space=7936
[ 62.383638] ldata->echo_commit=120 ldata->echo_tail=130 tail=20811717629 space=0 old_space=7936
[ 63.203807] sysrq: SysRq : Resetting

[ 5.843667] input: PC Speaker as /devices/platform/pcspkr/input/input4
[ 28.306346] BUG1: ldata->echo_commit=128 tail=130 space=7936 old_space=7936
[ 33.307133] ldata->echo_commit=128 ldata->echo_tail=130 tail=2892537244 space=0 old_space=7936
[ 38.311478] ldata->echo_commit=128 ldata->echo_tail=130 tail=5766457276 space=0 old_space=7936
[ 43.315319] ldata->echo_commit=128 ldata->echo_tail=130 tail=8686538551 space=0 old_space=7936
[ 48.319164] ldata->echo_commit=128 ldata->echo_tail=130 tail=11556458920 space=0 old_space=7936
[ 50.122017] sysrq: SysRq : Resetting

[ 6.026322] input: PC Speaker as /devices/platform/pcspkr/input/input4
[ 38.149864] BUG1: ldata->echo_commit=118 tail=130 space=7936 old_space=7936
[ 43.150167] ldata->echo_commit=118 ldata->echo_tail=130 tail=2789490540 space=0 old_space=7936
[ 48.154012] ldata->echo_commit=118 ldata->echo_tail=130 tail=5592467000 space=0 old_space=7936
[ 53.158365] ldata->echo_commit=118 ldata->echo_tail=130 tail=8376906144 space=0 old_space=7936
[ 58.162211] ldata->echo_commit=118 ldata->echo_tail=130 tail=11158984120 space=0 old_space=7936
[ 60.487338] sysrq: SysRq : Resetting

[ 9.170831] input: PC Speaker as /devices/platform/pcspkr/input/input4
[ 25.929908] BUG1: ldata->echo_commit=120 tail=130 space=7936 old_space=7936
[ 30.930553] ldata->echo_commit=120 ldata->echo_tail=130 tail=3008469343 space=0 old_space=7936
[ 35.935475] ldata->echo_commit=120 ldata->echo_tail=130 tail=5857403750 space=0 old_space=7936
[ 40.940153] ldata->echo_commit=120 ldata->echo_tail=130 tail=8858176994 space=0 old_space=7936
[ 45.944527] ldata->echo_commit=120 ldata->echo_tail=130 tail=11838898685 space=0 old_space=7936
[ 48.948625] sysrq: SysRq : Resetting

[ 6.071991] input: PC Speaker as /devices/platform/pcspkr/input/input4
[ 22.890428] BUG1: ldata->echo_commit=128 tail=130 space=7936 old_space=7936
[ 27.891372] ldata->echo_commit=128 ldata->echo_tail=130 tail=3064270209 space=0 old_space=7936
[ 32.895212] ldata->echo_commit=128 ldata->echo_tail=130 tail=6085270166 space=0 old_space=7936
[ 37.899047] ldata->echo_commit=128 ldata->echo_tail=130 tail=9102650663 space=0 old_space=7936
[ 42.903405] ldata->echo_commit=128 ldata->echo_tail=130 tail=12195198062 space=0 old_space=7936
[ 44.497876] sysrq: SysRq : Resetting

[ 5.804688] input: PC Speaker as /devices/platform/pcspkr/input/input4
[ 23.632487] BUG1: ldata->echo_commit=102 tail=130 space=7936 old_space=7936
[ 28.633878] ldata->echo_commit=102 ldata->echo_tail=130 tail=2930081035 space=0 old_space=7936
[ 33.638168] ldata->echo_commit=102 ldata->echo_tail=130 tail=5867627875 space=0 old_space=7936
[ 38.654062] ldata->echo_commit=102 ldata->echo_tail=130 tail=8804236495 space=0 old_space=7936
[ 40.296804] sysrq: SysRq : Resetting

Tetsuo Handa

unread,
May 17, 2018, 8:19:21 AM5/17/18
to gre...@linuxfoundation.org, jsl...@suse.com, Peter Hurley, dvy...@google.com, syzbot, linux-...@vger.kernel.org, syzkall...@googlegroups.com
#syz test: git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git master

From ada7279f7f034c5fd79fc04e1120069ea5f6cef2 Mon Sep 17 00:00:00 2001
From: Tetsuo Handa <penguin...@I-love.SAKURA.ne.jp>
Date: Thu, 17 May 2018 20:42:50 +0900
Subject: [PATCH] n_tty: Access echo_* variables carefully.

syzbot is reporting stalls at __process_echoes() [1]. This is because
since ldata->echo_commit < ldata->echo_tail becomes true for some reason,
the discard loop is serving as almost infinite loop. This patch tries to
avoid falling into ldata->echo_commit < ldata->echo_tail situation by
making access to echo_* variables more carefully.

Since reset_buffer_flags() is called without output_lock held, it should
not touch echo_* variables. And omit a call to reset_buffer_flags() from
n_tty_open() by using vzalloc().

Since add_echo_byte() is called without output_lock held, it needs memory
barrier between storing into echo_buf[] and incrementing echo_head counter.
echo_buf() needs corresponding memory barrier before reading echo_buf[].
Lack of handling the possibility of not-yet-stored multi-byte operation
might be the reason of falling into ldata->echo_commit < ldata->echo_tail
situation, for if I do WARN_ON(ldata->echo_commit == tail + 1) prior to
echo_buf(ldata, tail + 1), the WARN_ON() fires.

The echo_* variables are now updated with output_lock held. Thus, I guess
that ldata->echo_commit < ldata->echo_tail is no longer possible. But to
become more carefully, use ldata->echo_commit > ldata->echo_tail as
the preventive condition to continue the "while" loops.

[1] https://syzkaller.appspot.com/bug?id=3d7481a346958d9469bebbeb0537d5f056bdd6e8

Signed-off-by: Tetsuo Handa <penguin...@I-love.SAKURA.ne.jp>
Reported-by: syzbot <syzbot+18df35...@syzkaller.appspotmail.com>
Cc: Peter Hurley <pe...@hurleysoftware.com>
---
drivers/tty/n_tty.c | 51 ++++++++++++++++++++++++++++++---------------------
1 file changed, 30 insertions(+), 21 deletions(-)

diff --git a/drivers/tty/n_tty.c b/drivers/tty/n_tty.c
index cbe98bc..060ef91 100644
--- a/drivers/tty/n_tty.c
+++ b/drivers/tty/n_tty.c
@@ -90,9 +90,6 @@ struct n_tty_data {
size_t read_head;
size_t commit_head;
size_t canon_head;
- size_t echo_head;
- size_t echo_commit;
- size_t echo_mark;
DECLARE_BITMAP(char_map, 256);

/* private to n_tty_receive_overrun (single-threaded) */
@@ -119,6 +116,9 @@ struct n_tty_data {
unsigned int column;
unsigned int canon_column;
size_t echo_tail;
+ size_t echo_head;
+ size_t echo_commit;
+ size_t echo_mark;

struct mutex atomic_read_lock;
struct mutex output_lock;
@@ -141,6 +141,7 @@ static inline unsigned char *read_buf_addr(struct n_tty_data *ldata, size_t i)

static inline unsigned char echo_buf(struct n_tty_data *ldata, size_t i)
{
+ smp_rmb(); /* Matches smp_wmb() in add_echo_byte(). */
return ldata->echo_buf[i & (N_TTY_BUF_SIZE - 1)];
}

@@ -316,9 +317,7 @@ static inline void put_tty_queue(unsigned char c, struct n_tty_data *ldata)
static void reset_buffer_flags(struct n_tty_data *ldata)
{
ldata->read_head = ldata->canon_head = ldata->read_tail = 0;
- ldata->echo_head = ldata->echo_tail = ldata->echo_commit = 0;
ldata->commit_head = 0;
- ldata->echo_mark = 0;
ldata->line_start = 0;

ldata->erasing = 0;
@@ -617,13 +616,22 @@ static size_t __process_echoes(struct tty_struct *tty)
old_space = space = tty_write_room(tty);

tail = ldata->echo_tail;
- while (ldata->echo_commit != tail) {
+ if (WARN_ON(ldata->echo_commit < tail))
+ return 0;
+ while (ldata->echo_commit > tail) {
c = echo_buf(ldata, tail);
if (c == ECHO_OP_START) {
unsigned char op;
int no_space_left = 0;

/*
+ * Since add_echo_byte() is called without holding
+ * output_lock, we might see only portion of multi-byte
+ * operation.
+ */
+ if (ldata->echo_commit == tail + 1)
+ goto not_yet_stored;
+ /*
* If the buffer byte is the start of a multi-byte
* operation, get the next byte, which is either the
* op code or a control character value.
@@ -634,6 +642,8 @@ static size_t __process_echoes(struct tty_struct *tty)
unsigned int num_chars, num_bs;

case ECHO_OP_ERASE_TAB:
+ if (ldata->echo_commit == tail + 2)
+ goto not_yet_stored;
num_chars = echo_buf(ldata, tail + 2);

/*
@@ -728,7 +738,9 @@ static size_t __process_echoes(struct tty_struct *tty)
/* If the echo buffer is nearly full (so that the possibility exists
* of echo overrun before the next commit), then discard enough
* data at the tail to prevent a subsequent overrun */
- while (ldata->echo_commit - tail >= ECHO_DISCARD_WATERMARK) {
+ WARN_ON(ldata->echo_commit < tail);
+ while (ldata->echo_commit > tail &&
+ ldata->echo_commit - tail >= ECHO_DISCARD_WATERMARK) {
if (echo_buf(ldata, tail) == ECHO_OP_START) {
if (echo_buf(ldata, tail + 1) == ECHO_OP_ERASE_TAB)
tail += 3;
@@ -738,6 +750,7 @@ static size_t __process_echoes(struct tty_struct *tty)
tail++;
}

+ not_yet_stored:
ldata->echo_tail = tail;
return old_space - space;
}
@@ -748,6 +761,7 @@ static void commit_echoes(struct tty_struct *tty)
size_t nr, old, echoed;
size_t head;

+ mutex_lock(&ldata->output_lock);
head = ldata->echo_head;
ldata->echo_mark = head;
old = ldata->echo_commit - ldata->echo_tail;
@@ -756,10 +770,12 @@ static void commit_echoes(struct tty_struct *tty)
* is over the threshold (and try again each time another
* block is accumulated) */
nr = head - ldata->echo_tail;
- if (nr < ECHO_COMMIT_WATERMARK || (nr % ECHO_BLOCK > old % ECHO_BLOCK))
+ if (nr < ECHO_COMMIT_WATERMARK ||
+ (nr % ECHO_BLOCK > old % ECHO_BLOCK)) {
+ mutex_unlock(&ldata->output_lock);
return;
+ }

- mutex_lock(&ldata->output_lock);
ldata->echo_commit = head;
echoed = __process_echoes(tty);
mutex_unlock(&ldata->output_lock);
@@ -810,7 +826,9 @@ static void flush_echoes(struct tty_struct *tty)

static inline void add_echo_byte(unsigned char c, struct n_tty_data *ldata)
{
- *echo_buf_addr(ldata, ldata->echo_head++) = c;
+ *echo_buf_addr(ldata, ldata->echo_head) = c;
+ smp_wmb(); /* Matches smp_rmb() in echo_buf(). */
+ ldata->echo_head++;
}

/**
@@ -1878,30 +1896,21 @@ static int n_tty_open(struct tty_struct *tty)
struct n_tty_data *ldata;

/* Currently a malloc failure here can panic */
- ldata = vmalloc(sizeof(*ldata));
+ ldata = vzalloc(sizeof(*ldata));
if (!ldata)
- goto err;
+ return -ENOMEM;

ldata->overrun_time = jiffies;
mutex_init(&ldata->atomic_read_lock);
mutex_init(&ldata->output_lock);

tty->disc_data = ldata;
- reset_buffer_flags(tty->disc_data);
- ldata->column = 0;
- ldata->canon_column = 0;
- ldata->num_overrun = 0;
- ldata->no_room = 0;
- ldata->lnext = 0;
tty->closing = 0;
/* indicate buffer work may resume */
clear_bit(TTY_LDISC_HALTED, &tty->flags);
n_tty_set_termios(tty, NULL);
tty_unthrottle(tty);
-
return 0;
-err:
- return -ENOMEM;
}

static inline int input_available_p(struct tty_struct *tty, int poll)
--
1.8.3.1


syzbot

unread,
May 17, 2018, 8:40:03 AM5/17/18
to dvy...@google.com, gre...@linuxfoundation.org, jsl...@suse.com, linux-...@vger.kernel.org, penguin...@i-love.sakura.ne.jp, pe...@hurleysoftware.com, syzkall...@googlegroups.com
Hello,

syzbot has tested the proposed patch but the reproducer still triggered
crash:
INFO: rcu detected stall in corrupted

IPVS: ftp: loaded support on port[0] = 21
IPVS: ftp: loaded support on port[0] = 21
IPVS: ftp: loaded support on port[0] = 21
IPVS: ftp: loaded support on port[0] = 21
IPVS: ftp: loaded support on port[0] = 21
INFO: rcu_sched self-detected stall on CPU
INFO: rcu_sched detected stalls on CPUs/tasks:
1-....: (124998 ticks this GP) idle=da2/1/4611686018427387906
softirq=12308/12308 fqs=30969

1-....: (124998 ticks this GP) idle=da2/1/4611686018427387906
softirq=12308/12308 fqs=30969

(t=125002 jiffies g=6046 c=6045 q=1179)
(detected by 0, t=125002 jiffies, g=6046, c=6045, q=1179)
Sending NMI from CPU 0 to CPUs 1:
NMI backtrace for cpu 1
CPU: 1 PID: 5134 Comm: syz-executor3 Not tainted 4.17.0-rc5+ #1
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
Google 01/01/2011
RIP: 0010:rep_nop arch/x86/include/asm/processor.h:667 [inline]
RIP: 0010:cpu_relax arch/x86/include/asm/processor.h:672 [inline]
RIP: 0010:virt_spin_lock arch/x86/include/asm/qspinlock.h:69 [inline]
RIP: 0010:native_queued_spin_lock_slowpath+0x204/0xde0
kernel/locking/qspinlock.c:305
RSP: 0018:ffff8801db107030 EFLAGS: 00000002
RAX: 0000000000000000 RBX: ffffed003b620e1f RCX: 0000000000000004
RDX: 0000000000000000 RSI: 0000000000000004 RDI: ffffffff88b911c0
RBP: ffff8801db1073a0 R08: fffffbfff1172239 R09: fffffbfff1172238
R10: fffffbfff1172238 R11: ffffffff88b911c3 R12: ffffffff88b911c0
R13: ffff8801db107378 R14: 0000000000000001 R15: dffffc0000000000
FS: 00007fd0bf7b5700(0000) GS:ffff8801db100000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00000000004d91e4 CR3: 00000001c68eb000 CR4: 00000000001406e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
<IRQ>
pv_queued_spin_lock_slowpath arch/x86/include/asm/paravirt.h:674 [inline]
queued_spin_lock_slowpath arch/x86/include/asm/qspinlock.h:30 [inline]
queued_spin_lock include/asm-generic/qspinlock.h:90 [inline]
do_raw_spin_lock+0x1a7/0x200 kernel/locking/spinlock_debug.c:113
__raw_spin_lock_irqsave include/linux/spinlock_api_smp.h:117 [inline]
_raw_spin_lock_irqsave+0x9e/0xc0 kernel/locking/spinlock.c:152
rcu_dump_cpu_stacks+0x9d/0x1c2 kernel/rcu/tree.c:1373
print_cpu_stall kernel/rcu/tree.c:1525 [inline]
check_cpu_stall.isra.61.cold.80+0x36c/0x59a kernel/rcu/tree.c:1593
__rcu_pending kernel/rcu/tree.c:3356 [inline]
rcu_pending kernel/rcu/tree.c:3401 [inline]
rcu_check_callbacks+0x21b/0xad0 kernel/rcu/tree.c:2763
update_process_times+0x2d/0x70 kernel/time/timer.c:1636
tick_sched_handle+0x9f/0x180 kernel/time/tick-sched.c:164
tick_sched_timer+0x45/0x130 kernel/time/tick-sched.c:1274
__run_hrtimer kernel/time/hrtimer.c:1398 [inline]
__hrtimer_run_queues+0x3e3/0x10a0 kernel/time/hrtimer.c:1460
hrtimer_interrupt+0x2f3/0x750 kernel/time/hrtimer.c:1518
local_apic_timer_interrupt arch/x86/kernel/apic/apic.c:1025 [inline]
smp_apic_timer_interrupt+0x15d/0x710 arch/x86/kernel/apic/apic.c:1050
apic_timer_interrupt+0xf/0x20 arch/x86/entry/entry_64.S:863
</IRQ>
RIP: 0010:add_echo_byte drivers/tty/n_tty.c:829 [inline]
RIP: 0010:echo_char+0x152/0x340 drivers/tty/n_tty.c:934
RSP: 0018:ffff8801aa1ef720 EFLAGS: 00000293 ORIG_RAX: ffffffffffffff13
RAX: ffff8801a9a5e180 RBX: ffffc90002bb4000 RCX: ffffffff837ff667
RDX: 0000000000000000 RSI: ffffffff837ff672 RDI: 0000000000000001
RBP: ffff8801aa1ef748 R08: ffff8801a9a5e180 R09: 0000000000000001
R10: ffff8801a9a5e9e0 R11: ffff8801a9a5e180 R12: 00000000000008df
R13: 0000000000000000 R14: 0000000000000008 R15: ffffc90002bb6268
n_tty_receive_char_special+0x1451/0x3510 drivers/tty/n_tty.c:1324
n_tty_receive_buf_fast drivers/tty/n_tty.c:1595 [inline]
__receive_buf drivers/tty/n_tty.c:1629 [inline]
n_tty_receive_buf_common+0x21bd/0x2ca0 drivers/tty/n_tty.c:1727
n_tty_receive_buf+0x30/0x40 drivers/tty/n_tty.c:1756
tiocsti drivers/tty/tty_io.c:2171 [inline]
tty_ioctl+0x7e7/0x1870 drivers/tty/tty_io.c:2557
vfs_ioctl fs/ioctl.c:46 [inline]
file_ioctl fs/ioctl.c:500 [inline]
do_vfs_ioctl+0x1cf/0x16a0 fs/ioctl.c:684
ksys_ioctl+0xa9/0xd0 fs/ioctl.c:701
__do_sys_ioctl fs/ioctl.c:708 [inline]
__se_sys_ioctl fs/ioctl.c:706 [inline]
__x64_sys_ioctl+0x73/0xb0 fs/ioctl.c:706
do_syscall_64+0x1b1/0x800 arch/x86/entry/common.c:287
entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x455319
RSP: 002b:00007fd0bf7b4c68 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
RAX: ffffffffffffffda RBX: 00007fd0bf7b56d4 RCX: 0000000000455319
RDX: 00000000200000c0 RSI: 0000000000005412 RDI: 0000000000000004
RBP: 000000000072bea0 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 00000000ffffffff
R13: 00000000000002f9 R14: 00000000006f77f8 R15: 0000000000000000
Code: 5c 00 00 00 00 48 83 e7 f8 49 29 fc 41 8d 4c 24 64 c1 e9 03 f3 48 ab
48 81 c4 48 03 00 00 5b 41 5c 41 5d 41 5e 41 5f 5d c3 f3 90 <e9> 3b ff ff
ff 89 f1 81 fe 00 01 00 00 0f 84 9e 03 00 00 41 be
NMI backtrace for cpu 1
CPU: 1 PID: 5134 Comm: syz-executor3 Not tainted 4.17.0-rc5+ #1
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
Google 01/01/2011
Call Trace:
<IRQ>
__dump_stack lib/dump_stack.c:77 [inline]
dump_stack+0x1b9/0x294 lib/dump_stack.c:113
nmi_cpu_backtrace.cold.4+0x19/0xce lib/nmi_backtrace.c:103
nmi_trigger_cpumask_backtrace+0x151/0x192 lib/nmi_backtrace.c:62
arch_trigger_cpumask_backtrace+0x14/0x20 arch/x86/kernel/apic/hw_nmi.c:38
trigger_single_cpu_backtrace include/linux/nmi.h:156 [inline]
rcu_dump_cpu_stacks+0x175/0x1c2 kernel/rcu/tree.c:1376
print_cpu_stall kernel/rcu/tree.c:1525 [inline]
check_cpu_stall.isra.61.cold.80+0x36c/0x59a kernel/rcu/tree.c:1593
__rcu_pending kernel/rcu/tree.c:3356 [inline]
rcu_pending kernel/rcu/tree.c:3401 [inline]
rcu_check_callbacks+0x21b/0xad0 kernel/rcu/tree.c:2763
update_process_times+0x2d/0x70 kernel/time/timer.c:1636
tick_sched_handle+0x9f/0x180 kernel/time/tick-sched.c:164
tick_sched_timer+0x45/0x130 kernel/time/tick-sched.c:1274
__run_hrtimer kernel/time/hrtimer.c:1398 [inline]
__hrtimer_run_queues+0x3e3/0x10a0 kernel/time/hrtimer.c:1460
hrtimer_interrupt+0x2f3/0x750 kernel/time/hrtimer.c:1518
local_apic_timer_interrupt arch/x86/kernel/apic/apic.c:1025 [inline]
smp_apic_timer_interrupt+0x15d/0x710 arch/x86/kernel/apic/apic.c:1050
apic_timer_interrupt+0xf/0x20 arch/x86/entry/entry_64.S:863
</IRQ>
RIP: 0010:add_echo_byte drivers/tty/n_tty.c:829 [inline]
RIP: 0010:echo_char+0x152/0x340 drivers/tty/n_tty.c:934
RSP: 0018:ffff8801aa1ef720 EFLAGS: 00000293 ORIG_RAX: ffffffffffffff13
RAX: ffff8801a9a5e180 RBX: ffffc90002bb4000 RCX: ffffffff837ff667
RDX: 0000000000000000 RSI: ffffffff837ff672 RDI: 0000000000000001
RBP: ffff8801aa1ef748 R08: ffff8801a9a5e180 R09: 0000000000000001
R10: ffff8801a9a5e9e0 R11: ffff8801a9a5e180 R12: 00000000000008df
R13: 0000000000000000 R14: 0000000000000008 R15: ffffc90002bb6268
n_tty_receive_char_special+0x1451/0x3510 drivers/tty/n_tty.c:1324
n_tty_receive_buf_fast drivers/tty/n_tty.c:1595 [inline]
__receive_buf drivers/tty/n_tty.c:1629 [inline]
n_tty_receive_buf_common+0x21bd/0x2ca0 drivers/tty/n_tty.c:1727
n_tty_receive_buf+0x30/0x40 drivers/tty/n_tty.c:1756
tiocsti drivers/tty/tty_io.c:2171 [inline]
tty_ioctl+0x7e7/0x1870 drivers/tty/tty_io.c:2557
vfs_ioctl fs/ioctl.c:46 [inline]
file_ioctl fs/ioctl.c:500 [inline]
do_vfs_ioctl+0x1cf/0x16a0 fs/ioctl.c:684
ksys_ioctl+0xa9/0xd0 fs/ioctl.c:701
__do_sys_ioctl fs/ioctl.c:708 [inline]
__se_sys_ioctl fs/ioctl.c:706 [inline]
__x64_sys_ioctl+0x73/0xb0 fs/ioctl.c:706
do_syscall_64+0x1b1/0x800 arch/x86/entry/common.c:287
entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x455319
RSP: 002b:00007fd0bf7b4c68 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
RAX: ffffffffffffffda RBX: 00007fd0bf7b56d4 RCX: 0000000000455319
RDX: 00000000200000c0 RSI: 0000000000005412 RDI: 0000000000000004
RBP: 000000000072bea0 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 00000000ffffffff
R13: 00000000000002f9 R14: 00000000006f77f8 R15: 0000000000000000
INFO: NMI handler (nmi_cpu_backtrace_handler) took too long to run: 1.375
msecs


Tested on:

commit: e6506eb24187 Merge tag 'trace-v4.17-rc4-2' of git://git.ke..
git tree: upstream
console output: https://syzkaller.appspot.com/x/log.txt?x=15804957800000
kernel config: https://syzkaller.appspot.com/x/.config?x=d57d2358c5d82736
compiler: gcc (GCC) 8.0.1 20180413 (experimental)
patch: https://syzkaller.appspot.com/x/patch.diff?x=11608417800000

Tetsuo Handa

unread,
May 18, 2018, 6:35:31 AM5/18/18
to syzbot+18df35...@syzkaller.appspotmail.com, gre...@linuxfoundation.org, jsl...@suse.com, pe...@hurleysoftware.com, dvy...@google.com, syzkall...@googlegroups.com
I checked the values using debug printk() shown below, and found that the cause
of stall at n_tty_receive_char_special() is comparison against unstable
ldata->red_head variable. This problem became visible by commit bc5a5e3f45d04784
("n_tty: Don't wrap input buffer indices at buffer size").

@@ -1316,11 +1316,23 @@ n_tty_receive_char_special(struct tty_struct *tty, unsigned char c)
}
if (c == REPRINT_CHAR(tty) && L_ECHO(tty) && L_IEXTEN(tty)) {
size_t tail = ldata->canon_head;
+ unsigned long start = jiffies;
+ size_t tail0 = ldata->canon_head;
+ size_t head0 = ldata->read_head;
+ size_t tail1;
+ size_t head1;

finish_erasing(ldata);
echo_char(c, tty);
echo_char_raw('\n', ldata);
+ tail1 = ldata->canon_head;
+ head1 = ldata->read_head;
while (tail != ldata->read_head) {
+ if (time_after(jiffies, start + 3 * HZ)) {
+ printk("tail=%lu tail0=%lu tail1=%lu ldata->canon_head=%lu head0=%lu head1=%lu ldata->read_head=%lu\n",
+ tail, tail0, tail1, ldata->canon_head, ldata->read_head, head0, head1);
+ start = jiffies;
+ }
echo_char(read_buf(ldata, tail), tty);
tail++;
}

[ 37.196685] tail=516174078 tail0=0 tail1=0 ldata->canon_head=0 head0=32 head1=1 ldata->read_head=1
[ 40.200463] tail=1011287392 tail0=0 tail1=0 ldata->canon_head=0 head0=32 head1=1 ldata->read_head=1
[ 43.204063] tail=1505209552 tail0=0 tail1=0 ldata->canon_head=0 head0=32 head1=1 ldata->read_head=1
[ 46.207323] tail=1994835138 tail0=0 tail1=0 ldata->canon_head=0 head0=32 head1=1 ldata->read_head=1
[ 49.211247] tail=2477844767 tail0=0 tail1=0 ldata->canon_head=0 head0=32 head1=1 ldata->read_head=1
[ 50.091019] sysrq: SysRq : Resetting

[ 37.554852] tail=529327752 tail0=0 tail1=0 ldata->canon_head=0 head0=32 head1=50 ldata->read_head=48
[ 40.558663] tail=1009217532 tail0=0 tail1=0 ldata->canon_head=0 head0=32 head1=50 ldata->read_head=48
[ 43.562033] tail=1482211174 tail0=0 tail1=0 ldata->canon_head=0 head0=32 head1=50 ldata->read_head=48
[ 46.565079] tail=1968306955 tail0=0 tail1=0 ldata->canon_head=0 head0=32 head1=50 ldata->read_head=48
[ 47.353144] sysrq: SysRq : Resetting

[ 38.636502] tail=543672971 tail0=0 tail1=0 ldata->canon_head=0 head0=32 head1=0 ldata->read_head=1
[ 41.640314] tail=1037657899 tail0=0 tail1=0 ldata->canon_head=0 head0=32 head1=0 ldata->read_head=1
[ 44.644304] tail=1487178794 tail0=0 tail1=0 ldata->canon_head=0 head0=32 head1=0 ldata->read_head=1
[ 47.648043] tail=1946458931 tail0=0 tail1=0 ldata->canon_head=0 head0=32 head1=0 ldata->read_head=1
[ 50.517004] sysrq: SysRq : Resetting

Thus, here is the patch.
>From f9e5c69beb42fd4910d7da0da04f5c527a03e469 Mon Sep 17 00:00:00 2001
From: Tetsuo Handa <penguin...@I-love.SAKURA.ne.jp>
Date: Fri, 18 May 2018 12:50:04 +0900
Subject: [PATCH 1/2] n_tty: Fix stall at n_tty_receive_char_special().

syzbot is reporting stalls at n_tty_receive_char_special() [1]. This is
because comparison is not working as expected since ldata->read_head can
change at any moment. Mitigate this by explicitly masking with buffer size
when checking condition for "while" loops.
Fixes: bc5a5e3f45d04784 ("n_tty: Don't wrap input buffer indices at buffer size")
Cc: Peter Hurley <pe...@hurleysoftware.com>
---
drivers/tty/n_tty.c | 13 ++++++++-----
1 file changed, 8 insertions(+), 5 deletions(-)

diff --git a/drivers/tty/n_tty.c b/drivers/tty/n_tty.c
index cbe98bc..b279f873 100644
--- a/drivers/tty/n_tty.c
+++ b/drivers/tty/n_tty.c
@@ -124,6 +124,8 @@ struct n_tty_data {
struct mutex output_lock;
};

+#define MASK(x) ((x) & (N_TTY_BUF_SIZE - 1))
+
static inline size_t read_cnt(struct n_tty_data *ldata)
{
return ldata->read_head - ldata->read_tail;
@@ -978,14 +980,15 @@ static void eraser(unsigned char c, struct tty_struct *tty)
}

seen_alnums = 0;
- while (ldata->read_head != ldata->canon_head) {
+ while (MASK(ldata->read_head) != MASK(ldata->canon_head)) {
head = ldata->read_head;

/* erase a single possibly multibyte character */
do {
head--;
c = read_buf(ldata, head);
- } while (is_continuation(c, tty) && head != ldata->canon_head);
+ } while (is_continuation(c, tty) &&
+ MASK(head) != MASK(ldata->canon_head));

/* do not partially erase */
if (is_continuation(c, tty))
@@ -1027,7 +1030,7 @@ static void eraser(unsigned char c, struct tty_struct *tty)
* This info is used to go back the correct
* number of columns.
*/
- while (tail != ldata->canon_head) {
+ while (MASK(tail) != MASK(ldata->canon_head)) {
tail--;
c = read_buf(ldata, tail);
if (c == '\t') {
@@ -1302,7 +1305,7 @@ static void n_tty_receive_parity_error(struct tty_struct *tty, unsigned char c)
finish_erasing(ldata);
echo_char(c, tty);
echo_char_raw('\n', ldata);
- while (tail != ldata->read_head) {
+ while (MASK(tail) != MASK(ldata->read_head)) {
echo_char(read_buf(ldata, tail), tty);
tail++;
}
@@ -2411,7 +2414,7 @@ static unsigned long inq_canon(struct n_tty_data *ldata)
tail = ldata->read_tail;
nr = head - tail;
/* Skip EOF-chars.. */
- while (head != tail) {
+ while (MASK(head) != MASK(tail)) {
if (test_bit(tail & (N_TTY_BUF_SIZE - 1), ldata->read_flags) &&
read_buf(ldata, tail) == __DISABLED_CHAR)
nr--;
--
1.8.3.1

>From 2d50a00f35a65d52b4d7776be7d81b036f31e35b Mon Sep 17 00:00:00 2001
From: Tetsuo Handa <penguin...@I-love.SAKURA.ne.jp>
Date: Fri, 18 May 2018 12:52:42 +0900
Subject: [PATCH 2/2] n_tty: Access echo_* variables carefully.

syzbot is reporting stalls at __process_echoes() [1]. This is because
since ldata->echo_commit < ldata->echo_tail becomes true for some reason,
the discard loop is serving as almost infinite loop. This patch tries to
avoid falling into ldata->echo_commit < ldata->echo_tail situation by
making access to echo_* variables more carefully.

Since reset_buffer_flags() is called without output_lock held, it should
not touch echo_* variables. And omit a call to reset_buffer_flags() from
n_tty_open() by using vzalloc().

Since add_echo_byte() is called without output_lock held, it needs memory
barrier between storing into echo_buf[] and incrementing echo_head counter.
echo_buf() needs corresponding memory barrier before reading echo_buf[].
Lack of handling the possibility of not-yet-stored multi-byte operation
might be the reason of falling into ldata->echo_commit < ldata->echo_tail
situation, for if I do WARN_ON(ldata->echo_commit == tail + 1) prior to
echo_buf(ldata, tail + 1), the WARN_ON() fires.

Also, explicitly masking with buffer for the former "while" loop, and
use ldata->echo_commit > tail for the latter "while" loop.

[1] https://syzkaller.appspot.com/bug?id=17f23b094cd80df750e5b0f8982c521ee6bcbf40

Signed-off-by: Tetsuo Handa <penguin...@I-love.SAKURA.ne.jp>
Reported-by: syzbot <syzbot+108696...@syzkaller.appspotmail.com>
Cc: Peter Hurley <pe...@hurleysoftware.com>
---
drivers/tty/n_tty.c | 42 ++++++++++++++++++++++++------------------
1 file changed, 24 insertions(+), 18 deletions(-)

diff --git a/drivers/tty/n_tty.c b/drivers/tty/n_tty.c
index b279f873..4317422 100644
--- a/drivers/tty/n_tty.c
+++ b/drivers/tty/n_tty.c
@@ -143,6 +143,7 @@ static inline unsigned char *read_buf_addr(struct n_tty_data *ldata, size_t i)

static inline unsigned char echo_buf(struct n_tty_data *ldata, size_t i)
{
+ smp_rmb(); /* Matches smp_wmb() in add_echo_byte(). */
return ldata->echo_buf[i & (N_TTY_BUF_SIZE - 1)];
}

@@ -318,9 +319,7 @@ static inline void put_tty_queue(unsigned char c, struct n_tty_data *ldata)
static void reset_buffer_flags(struct n_tty_data *ldata)
{
ldata->read_head = ldata->canon_head = ldata->read_tail = 0;
- ldata->echo_head = ldata->echo_tail = ldata->echo_commit = 0;
ldata->commit_head = 0;
- ldata->echo_mark = 0;
ldata->line_start = 0;

ldata->erasing = 0;
@@ -619,13 +618,20 @@ static size_t __process_echoes(struct tty_struct *tty)
old_space = space = tty_write_room(tty);

tail = ldata->echo_tail;
- while (ldata->echo_commit != tail) {
+ while (MASK(ldata->echo_commit) != MASK(tail)) {
c = echo_buf(ldata, tail);
if (c == ECHO_OP_START) {
unsigned char op;
int no_space_left = 0;

/*
+ * Since add_echo_byte() is called without holding
+ * output_lock, we might see only portion of multi-byte
+ * operation.
+ */
+ if (MASK(ldata->echo_commit) == MASK(tail + 1))
+ goto not_yet_stored;
+ /*
* If the buffer byte is the start of a multi-byte
* operation, get the next byte, which is either the
* op code or a control character value.
@@ -636,6 +642,8 @@ static size_t __process_echoes(struct tty_struct *tty)
unsigned int num_chars, num_bs;

case ECHO_OP_ERASE_TAB:
+ if (MASK(ldata->echo_commit) == MASK(tail + 2))
+ goto not_yet_stored;
num_chars = echo_buf(ldata, tail + 2);

/*
@@ -730,7 +738,8 @@ static size_t __process_echoes(struct tty_struct *tty)
/* If the echo buffer is nearly full (so that the possibility exists
* of echo overrun before the next commit), then discard enough
* data at the tail to prevent a subsequent overrun */
- while (ldata->echo_commit - tail >= ECHO_DISCARD_WATERMARK) {
+ while (ldata->echo_commit > tail &&
+ ldata->echo_commit - tail >= ECHO_DISCARD_WATERMARK) {
if (echo_buf(ldata, tail) == ECHO_OP_START) {
if (echo_buf(ldata, tail + 1) == ECHO_OP_ERASE_TAB)
tail += 3;
@@ -740,6 +749,7 @@ static size_t __process_echoes(struct tty_struct *tty)
tail++;
}

+ not_yet_stored:
ldata->echo_tail = tail;
return old_space - space;
}
@@ -750,6 +760,7 @@ static void commit_echoes(struct tty_struct *tty)
size_t nr, old, echoed;
size_t head;

+ mutex_lock(&ldata->output_lock);
head = ldata->echo_head;
ldata->echo_mark = head;
old = ldata->echo_commit - ldata->echo_tail;
@@ -758,10 +769,12 @@ static void commit_echoes(struct tty_struct *tty)
* is over the threshold (and try again each time another
* block is accumulated) */
nr = head - ldata->echo_tail;
- if (nr < ECHO_COMMIT_WATERMARK || (nr % ECHO_BLOCK > old % ECHO_BLOCK))
+ if (nr < ECHO_COMMIT_WATERMARK ||
+ (nr % ECHO_BLOCK > old % ECHO_BLOCK)) {
+ mutex_unlock(&ldata->output_lock);
return;
+ }

- mutex_lock(&ldata->output_lock);
ldata->echo_commit = head;
echoed = __process_echoes(tty);
mutex_unlock(&ldata->output_lock);
@@ -812,7 +825,9 @@ static void flush_echoes(struct tty_struct *tty)

static inline void add_echo_byte(unsigned char c, struct n_tty_data *ldata)
{
- *echo_buf_addr(ldata, ldata->echo_head++) = c;
+ *echo_buf_addr(ldata, ldata->echo_head) = c;
+ smp_wmb(); /* Matches smp_rmb() in echo_buf(). */
+ ldata->echo_head++;
}

/**
@@ -1881,30 +1896,21 @@ static int n_tty_open(struct tty_struct *tty)

syzbot

unread,
May 18, 2018, 7:03:03 AM5/18/18
to dvy...@google.com, gre...@linuxfoundation.org, jsl...@suse.com, penguin...@i-love.sakura.ne.jp, pe...@hurleysoftware.com, syzkall...@googlegroups.com
Hello,

syzbot has tested the proposed patch but the reproducer still triggered
crash:
BUG: soft lockup in n_tty_receive_buf_common

IPVS: ftp: loaded support on port[0] = 21
IPVS: ftp: loaded support on port[0] = 21
IPVS: ftp: loaded support on port[0] = 21
IPVS: ftp: loaded support on port[0] = 21
IPVS: ftp: loaded support on port[0] = 21
watchdog: BUG: soft lockup - CPU#1 stuck for 23s! [syz-executor5:7720]
Modules linked in:
irq event stamp: 139954
hardirqs last enabled at (139953): [<ffffffff876009d5>]
restore_regs_and_return_to_kernel+0x0/0x2b
hardirqs last disabled at (139954): [<ffffffff87600905>]
interrupt_entry+0xb5/0xf0 arch/x86/entry/entry_64.S:625
softirqs last enabled at (139952): [<ffffffff87800778>]
__do_softirq+0x778/0xaf5 kernel/softirq.c:311
softirqs last disabled at (139945): [<ffffffff81475281>] invoke_softirq
kernel/softirq.c:365 [inline]
softirqs last disabled at (139945): [<ffffffff81475281>]
irq_exit+0x1d1/0x200 kernel/softirq.c:405
CPU: 1 PID: 7720 Comm: syz-executor5 Not tainted 4.17.0-rc5+ #1
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
Google 01/01/2011
RIP: 0010:__process_echoes+0x277/0x8d0 drivers/tty/n_tty.c:733
RSP: 0018:ffff8801c418f7e0 EFLAGS: 00000293 ORIG_RAX: ffffffffffffff13
RAX: 0000000000000000 RBX: 00000000688de2a3 RCX: ffffffff838004b7
RDX: ffffffff97721dc3 RSI: 0000000000000edf RDI: 0000000000000007
RBP: ffff8801c418f848 R08: ffff8801c817a740 R09: 0000000000000006
R10: ffff8801c817a740 R11: 0000000000000000 R12: ffffc90004d16000
R13: 0000000000000066 R14: ffffffff97721dc3 R15: 0000000000000000
FS: 00007ff449837700(0000) GS:ffff8801db100000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00000000004d91e4 CR3: 00000001d9f9f000 CR4: 00000000001406e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
flush_echoes drivers/tty/n_tty.c:801 [inline]
__receive_buf drivers/tty/n_tty.c:1618 [inline]
n_tty_receive_buf_common+0x11d5/0x2c50 drivers/tty/n_tty.c:1712
n_tty_receive_buf+0x30/0x40 drivers/tty/n_tty.c:1741
tiocsti drivers/tty/tty_io.c:2171 [inline]
tty_ioctl+0x7e7/0x1870 drivers/tty/tty_io.c:2557
vfs_ioctl fs/ioctl.c:46 [inline]
file_ioctl fs/ioctl.c:500 [inline]
do_vfs_ioctl+0x1cf/0x16a0 fs/ioctl.c:684
ksys_ioctl+0xa9/0xd0 fs/ioctl.c:701
__do_sys_ioctl fs/ioctl.c:708 [inline]
__se_sys_ioctl fs/ioctl.c:706 [inline]
__x64_sys_ioctl+0x73/0xb0 fs/ioctl.c:706
do_syscall_64+0x1b1/0x800 arch/x86/entry/common.c:287
entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x455319
RSP: 002b:00007ff449836c68 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
RAX: ffffffffffffffda RBX: 00007ff4498376d4 RCX: 0000000000455319
RDX: 00000000200000c0 RSI: 0000000000005412 RDI: 0000000000000004
RBP: 000000000072bea0 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 00000000ffffffff
R13: 00000000000002f9 R14: 00000000006f77f8 R15: 0000000000000000
Code: 82 0f 84 4f 05 00 00 e8 c8 29 f9 fd 4c 8d 73 02 4c 89 f3 e8 bc 29 f9
fd 4d 89 ee bf df 0e 00 00 49 29 de 4c 89 f6 e8 d9 2a f9 fd <49> 81 fe df
0e 00 00 0f 86 e2 04 00 00 e8 97 29 f9 fd 49 89 de


Tested on:

commit: 3acf4e395260 Merge tag 'hwmon-for-linus-v4.17-rc6' of git:..
git tree: upstream
console output: https://syzkaller.appspot.com/x/log.txt?x=1585ba77800000
kernel config: https://syzkaller.appspot.com/x/.config?x=d57d2358c5d82736
compiler: gcc (GCC) 8.0.1 20180413 (experimental)
patch: https://syzkaller.appspot.com/x/patch.diff?x=12064c0f800000

Tetsuo Handa

unread,
May 18, 2018, 9:24:25 AM5/18/18
to syzbot+18df35...@syzkaller.appspotmail.com, gre...@linuxfoundation.org, jsl...@suse.com, pe...@hurleysoftware.com, dvy...@google.com, syzkall...@googlegroups.com
syzbot wrote:
> Hello,
>
> syzbot has tested the proposed patch but the reproducer still triggered
> crash:
> BUG: soft lockup in n_tty_receive_buf_common
> patch: https://syzkaller.appspot.com/x/patch.diff?x=12064c0f800000

I posted two patches, but syzbot tested with only the first patch applied.
That's why syzbot hit a different crash which is addressed by the second patch.

Thus, I repost with two patches combined.
n_tty.c | 55 ++++++++++++++++++++++++++++++++-----------------------
1 file changed, 32 insertions(+), 23 deletions(-)

diff --git a/drivers/tty/n_tty.c b/drivers/tty/n_tty.c
index cbe98bc..4317422 100644
--- a/drivers/tty/n_tty.c
+++ b/drivers/tty/n_tty.c
@@ -124,6 +124,8 @@ struct n_tty_data {
struct mutex output_lock;
};

+#define MASK(x) ((x) & (N_TTY_BUF_SIZE - 1))
+
static inline size_t read_cnt(struct n_tty_data *ldata)
{
return ldata->read_head - ldata->read_tail;
@@ -141,6 +143,7 @@ static inline unsigned char *read_buf_addr(struct n_tty_data *ldata, size_t i)

static inline unsigned char echo_buf(struct n_tty_data *ldata, size_t i)
{
+ smp_rmb(); /* Matches smp_wmb() in add_echo_byte(). */
return ldata->echo_buf[i & (N_TTY_BUF_SIZE - 1)];
}

@@ -316,9 +319,7 @@ static inline void put_tty_queue(unsigned char c, struct n_tty_data *ldata)
static void reset_buffer_flags(struct n_tty_data *ldata)
{
ldata->read_head = ldata->canon_head = ldata->read_tail = 0;
- ldata->echo_head = ldata->echo_tail = ldata->echo_commit = 0;
ldata->commit_head = 0;
- ldata->echo_mark = 0;
ldata->line_start = 0;

ldata->erasing = 0;
@@ -617,13 +618,20 @@ static size_t __process_echoes(struct tty_struct *tty)
old_space = space = tty_write_room(tty);

tail = ldata->echo_tail;
- while (ldata->echo_commit != tail) {
+ while (MASK(ldata->echo_commit) != MASK(tail)) {
c = echo_buf(ldata, tail);
if (c == ECHO_OP_START) {
unsigned char op;
int no_space_left = 0;

/*
+ * Since add_echo_byte() is called without holding
+ * output_lock, we might see only portion of multi-byte
+ * operation.
+ */
+ if (MASK(ldata->echo_commit) == MASK(tail + 1))
+ goto not_yet_stored;
+ /*
* If the buffer byte is the start of a multi-byte
* operation, get the next byte, which is either the
* op code or a control character value.
@@ -634,6 +642,8 @@ static size_t __process_echoes(struct tty_struct *tty)
unsigned int num_chars, num_bs;

case ECHO_OP_ERASE_TAB:
+ if (MASK(ldata->echo_commit) == MASK(tail + 2))
+ goto not_yet_stored;
num_chars = echo_buf(ldata, tail + 2);

/*
@@ -728,7 +738,8 @@ static size_t __process_echoes(struct tty_struct *tty)
/* If the echo buffer is nearly full (so that the possibility exists
* of echo overrun before the next commit), then discard enough
* data at the tail to prevent a subsequent overrun */
- while (ldata->echo_commit - tail >= ECHO_DISCARD_WATERMARK) {
+ while (ldata->echo_commit > tail &&
+ ldata->echo_commit - tail >= ECHO_DISCARD_WATERMARK) {
if (echo_buf(ldata, tail) == ECHO_OP_START) {
if (echo_buf(ldata, tail + 1) == ECHO_OP_ERASE_TAB)
tail += 3;
@@ -738,6 +749,7 @@ static size_t __process_echoes(struct tty_struct *tty)
tail++;
}

+ not_yet_stored:
ldata->echo_tail = tail;
return old_space - space;
}
@@ -748,6 +760,7 @@ static void commit_echoes(struct tty_struct *tty)
size_t nr, old, echoed;
size_t head;

+ mutex_lock(&ldata->output_lock);
head = ldata->echo_head;
ldata->echo_mark = head;
old = ldata->echo_commit - ldata->echo_tail;
@@ -756,10 +769,12 @@ static void commit_echoes(struct tty_struct *tty)
* is over the threshold (and try again each time another
* block is accumulated) */
nr = head - ldata->echo_tail;
- if (nr < ECHO_COMMIT_WATERMARK || (nr % ECHO_BLOCK > old % ECHO_BLOCK))
+ if (nr < ECHO_COMMIT_WATERMARK ||
+ (nr % ECHO_BLOCK > old % ECHO_BLOCK)) {
+ mutex_unlock(&ldata->output_lock);
return;
+ }

- mutex_lock(&ldata->output_lock);
ldata->echo_commit = head;
echoed = __process_echoes(tty);
mutex_unlock(&ldata->output_lock);
@@ -810,7 +825,9 @@ static void flush_echoes(struct tty_struct *tty)

static inline void add_echo_byte(unsigned char c, struct n_tty_data *ldata)
{
- *echo_buf_addr(ldata, ldata->echo_head++) = c;
+ *echo_buf_addr(ldata, ldata->echo_head) = c;
+ smp_wmb(); /* Matches smp_rmb() in echo_buf(). */
+ ldata->echo_head++;
}

/**
@@ -978,14 +995,15 @@ static void eraser(unsigned char c, struct tty_struct *tty)
}

seen_alnums = 0;
- while (ldata->read_head != ldata->canon_head) {
+ while (MASK(ldata->read_head) != MASK(ldata->canon_head)) {
head = ldata->read_head;

/* erase a single possibly multibyte character */
do {
head--;
c = read_buf(ldata, head);
- } while (is_continuation(c, tty) && head != ldata->canon_head);
+ } while (is_continuation(c, tty) &&
+ MASK(head) != MASK(ldata->canon_head));

/* do not partially erase */
if (is_continuation(c, tty))
@@ -1027,7 +1045,7 @@ static void eraser(unsigned char c, struct tty_struct *tty)
* This info is used to go back the correct
* number of columns.
*/
- while (tail != ldata->canon_head) {
+ while (MASK(tail) != MASK(ldata->canon_head)) {
tail--;
c = read_buf(ldata, tail);
if (c == '\t') {
@@ -1302,7 +1320,7 @@ static void n_tty_receive_parity_error(struct tty_struct *tty, unsigned char c)
finish_erasing(ldata);
echo_char(c, tty);
echo_char_raw('\n', ldata);
- while (tail != ldata->read_head) {
+ while (MASK(tail) != MASK(ldata->read_head)) {
echo_char(read_buf(ldata, tail), tty);
tail++;
}
@@ -1878,30 +1896,21 @@ static int n_tty_open(struct tty_struct *tty)
@@ -2411,7 +2420,7 @@ static unsigned long inq_canon(struct n_tty_data *ldata)

syzbot

unread,
May 18, 2018, 9:45:03 AM5/18/18
to dvy...@google.com, gre...@linuxfoundation.org, jsl...@suse.com, penguin...@i-love.sakura.ne.jp, pe...@hurleysoftware.com, syzkall...@googlegroups.com
Hello,

syzbot has tested the proposed patch and the reproducer did not trigger
crash:

Reported-and-tested-by:
syzbot+18df35...@syzkaller.appspotmail.com

Tested on:

commit: 3acf4e395260 Merge tag 'hwmon-for-linus-v4.17-rc6' of git:..
git tree: upstream
kernel config: https://syzkaller.appspot.com/x/.config?x=d57d2358c5d82736
compiler: gcc (GCC) 8.0.1 20180413 (experimental)
patch: https://syzkaller.appspot.com/x/patch.diff?x=1647084f800000

Note: testing is done by a robot and is best-effort only.
Reply all
Reply to author
Forward
0 new messages