tty: panic in tty_ldisc_restore

Dmitry Vyukov

Feb 2, 2017, 12:49:09 PM
to Greg Kroah-Hartman, Jiri Slaby, LKML, Peter Hurley, One Thousand Gnomes, syzkaller
Hello,

The syzkaller fuzzer started crashing the kernel with the following panics:

Kernel panic - not syncing: Couldn't open N_TTY ldisc for ircomm0 --- error -12.
CPU: 0 PID: 5637 Comm: syz-executor3 Not tainted 4.9.0 #6
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
ffff8801d4ba7a18 ffffffff8234d0df ffffffff00000000 1ffff1003a974ed6
ffffed003a974ece 0000000041b58ab3 ffffffff84b38180 ffffffff8234cdf1
0000000000000000 0000000000000000 ffff8801d4ba76a8 00000000dabb4fad
Call Trace:
[<ffffffff8234d0df>] __dump_stack lib/dump_stack.c:15 [inline]
[<ffffffff8234d0df>] dump_stack+0x2ee/0x3ef lib/dump_stack.c:51
[<ffffffff818280d4>] panic+0x1fb/0x412 kernel/panic.c:179
[<ffffffff826bb0d4>] tty_ldisc_restore drivers/tty/tty_ldisc.c:520 [inline]
[<ffffffff826bb0d4>] tty_set_ldisc+0x704/0x8b0 drivers/tty/tty_ldisc.c:579
[<ffffffff826a3a93>] tiocsetd drivers/tty/tty_io.c:2667 [inline]
[<ffffffff826a3a93>] tty_ioctl+0xc63/0x2370 drivers/tty/tty_io.c:2924
[<ffffffff81a7a22f>] vfs_ioctl fs/ioctl.c:43 [inline]
[<ffffffff81a7a22f>] do_vfs_ioctl+0x1bf/0x1630 fs/ioctl.c:679
[<ffffffff81a7b72f>] SYSC_ioctl fs/ioctl.c:694 [inline]
[<ffffffff81a7b72f>] SyS_ioctl+0x8f/0xc0 fs/ioctl.c:685
[<ffffffff84377941>] entry_SYSCALL_64_fastpath+0x1f/0xc2

Kernel panic - not syncing: Couldn't open N_TTY ldisc for ptm2 --- error -12.
CPU: 0 PID: 7844 Comm: syz-executor0 Not tainted 4.9.0 #6
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
ffff8801c3307a18 ffffffff8234d0df ffffffff00000000 1ffff10038660ed6
ffffed0038660ece 0000000041b58ab3 ffffffff84b38180 ffffffff8234cdf1
0000000000000000 0000000000000000 ffff8801c33076a8 00000000dabb4fad
Call Trace:
[<ffffffff8234d0df>] __dump_stack lib/dump_stack.c:15 [inline]
[<ffffffff8234d0df>] dump_stack+0x2ee/0x3ef lib/dump_stack.c:51
[<ffffffff818280d4>] panic+0x1fb/0x412 kernel/panic.c:179
[<ffffffff826bb0d4>] tty_ldisc_restore drivers/tty/tty_ldisc.c:520 [inline]
[<ffffffff826bb0d4>] tty_set_ldisc+0x704/0x8b0 drivers/tty/tty_ldisc.c:579
[<ffffffff826a3a93>] tiocsetd drivers/tty/tty_io.c:2667 [inline]
[<ffffffff826a3a93>] tty_ioctl+0xc63/0x2370 drivers/tty/tty_io.c:2924
[<ffffffff81a7a22f>] vfs_ioctl fs/ioctl.c:43 [inline]
[<ffffffff81a7a22f>] do_vfs_ioctl+0x1bf/0x1630 fs/ioctl.c:679
[<ffffffff81a7b72f>] SYSC_ioctl fs/ioctl.c:694 [inline]
[<ffffffff81a7b72f>] SyS_ioctl+0x8f/0xc0 fs/ioctl.c:685
[<ffffffff84377941>] entry_SYSCALL_64_fastpath+0x1f/0xc2


In all cases there is a vmalloc failure right before that:

syz-executor4: vmalloc: allocation failure, allocated 0 of 16384 bytes, mode:0x14000c2(GFP_KERNEL|__GFP_HIGHMEM), nodemask=(null)
syz-executor4 cpuset=/ mems_allowed=0
CPU: 1 PID: 4852 Comm: syz-executor4 Not tainted 4.9.0 #6
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
ffff8801c41df898 ffffffff8234d0df ffffffff00000001 1ffff1003883bea6
ffffed003883be9e 0000000041b58ab3 ffffffff84b38180 ffffffff8234cdf1
0000000000000282 ffffffff84fd53c0 ffff8801dae65b38 ffff8801c41df4d0
Call Trace:
[< inline >] __dump_stack lib/dump_stack.c:15
[<ffffffff8234d0df>] dump_stack+0x2ee/0x3ef lib/dump_stack.c:51
[<ffffffff8186530f>] warn_alloc+0x21f/0x360
[<ffffffff819792c9>] __vmalloc_node_range+0x4e9/0x770
[< inline >] __vmalloc_node mm/vmalloc.c:1749
[< inline >] __vmalloc_node_flags mm/vmalloc.c:1763
[<ffffffff8197961b>] vmalloc+0x5b/0x70 mm/vmalloc.c:1778
[<ffffffff826ad77b>] n_tty_open+0x1b/0x470 drivers/tty/n_tty.c:1883
[<ffffffff826ba973>] tty_ldisc_open.isra.3+0x73/0xd0 drivers/tty/tty_ldisc.c:463
[< inline >] tty_ldisc_restore drivers/tty/tty_ldisc.c:510
[<ffffffff826bafb4>] tty_set_ldisc+0x5e4/0x8b0 drivers/tty/tty_ldisc.c:579
[< inline >] tiocsetd drivers/tty/tty_io.c:2667
[<ffffffff826a3a93>] tty_ioctl+0xc63/0x2370 drivers/tty/tty_io.c:2924
[<ffffffff81a7a22f>] do_vfs_ioctl+0x1bf/0x1630
[< inline >] SYSC_ioctl fs/ioctl.c:698
[<ffffffff81a7b72f>] SyS_ioctl+0x8f/0xc0 fs/ioctl.c:689
[<ffffffff84377941>] entry_SYSCALL_64_fastpath+0x1f/0xc2 arch/x86/entry/entry_64.S:204


I've found that this is even documented in the source code, but a panic
does not look like a good failure mode for an allocation failure:

static int n_tty_open(struct tty_struct *tty)
{
	struct n_tty_data *ldata;

	/* Currently a malloc failure here can panic */
	ldata = vmalloc(sizeof(*ldata));
	if (!ldata)
		goto err;


On commit 510948533b059f4f5033464f9f4a0c32d4ab0c08 of mmotm tree.

Greg Kroah-Hartman

Feb 2, 2017, 12:56:03 PM
to Dmitry Vyukov, Jiri Slaby, LKML, Peter Hurley, One Thousand Gnomes, syzkaller
How are you running out of vmalloc() memory?

thanks,

greg k-h

Dmitry Vyukov

Feb 2, 2017, 1:04:02 PM
to Greg Kroah-Hartman, Jiri Slaby, LKML, Peter Hurley, One Thousand Gnomes, syzkaller
I don't know exactly, but it does not seem to be a problem for
the fuzzer.
Is it meant to be very hard to do?

Greg Kroah-Hartman

Feb 2, 2017, 1:23:38 PM
to Dmitry Vyukov, Jiri Slaby, LKML, Peter Hurley, One Thousand Gnomes, syzkaller
Yes, do you know of any normal way to cause it to fail?

thanks,

greg k-h

Dmitry Vyukov

Feb 7, 2017, 5:24:35 AM
to Greg Kroah-Hartman, Jiri Slaby, LKML, Peter Hurley, One Thousand Gnomes, syzkaller
On Thu, Feb 2, 2017 at 7:23 PM, Greg Kroah-Hartman wrote:
> Yes, do you know of any normal way to cause it to fail?

I don't. But that means approximately nothing.
Do you mean that it is not possible to trigger?
Wouldn't simply creating lots of kernel resources (files, sockets,
pipes) do the trick? Or just paging in lots of memory? Even if the
process itself is chosen as the OOM kill target, it will still take
the machine down with it due to the panic while returning from the
syscall, no?

Greg Kroah-Hartman

Feb 7, 2017, 5:43:12 AM
to Dmitry Vyukov, Jiri Slaby, LKML, Peter Hurley, One Thousand Gnomes, syzkaller
I'm not saying that it's impossible, just an "almost" impossible thing
to hit. Obviously you have hit it, so it can happen :)

But, how to fix it? I really don't know. Unwinding a failure at this
point in time is very tough, as that comment shows. Any suggestions of
how it could be resolved are greatly appreciated.

thanks,

greg k-h

Dmitry Vyukov

Feb 7, 2017, 5:51:46 AM
to Greg Kroah-Hartman, Jiri Slaby, LKML, Peter Hurley, One Thousand Gnomes, syzkaller
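On Tue, Feb 7, 2017 at 11:43 AM, Greg Kroah-Hartman wrote:
> But, how to fix it? I really don't know. Unwinding a failure at this
> point in time is very tough, as that comment shows. Any suggestions of
> how it could be resolved are greatly appreciated.

Is it possible for tty_set_ldisc to not shut down the old discipline
until we have prepared everything for the new one? That is, to postpone
this: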

/* Shutdown the old discipline. */
tty_ldisc_close(tty, old_ldisc);

Currently it does:

close(old)
if (open(new))
	open(old) // assume never fails

It looks inherently problematic.
Couldn't we do:

if (open(new))
	return -ESOMETHING
close(old)

?
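
A minimal userspace sketch of the two orderings above, not kernel code. The toy_tty structure, the toy_open helper and the opens_to_fail failure injection are hypothetical stand-ins for the tty, tty_ldisc_open() and a failing vmalloc(). They only model which discipline is left installed after a failed switch.

```c
#include <assert.h>
#include <errno.h>

#define NO_LDISC (-1) /* hypothetical marker: no discipline is open */

/* Hypothetical toy model of a tty, for illustration only. */
struct toy_tty {
	int ldisc;         /* number of the open discipline, or NO_LDISC */
	int opens_to_fail; /* failure injection: the next N opens fail */
};

/* Pretend to open a discipline; fails while opens_to_fail > 0. */
static int toy_open(struct toy_tty *tty)
{
	if (tty->opens_to_fail > 0) {
		tty->opens_to_fail--;
		return -ENOMEM;
	}
	return 0;
}

/* Current ordering: close(old); if (open(new)) open(old). */
int set_ldisc_close_first(struct toy_tty *tty, int disc)
{
	int old = tty->ldisc;
	int err;

	tty->ldisc = NO_LDISC;          /* close(old) */
	err = toy_open(tty);            /* open(new) */
	if (!err) {
		tty->ldisc = disc;
		return 0;
	}
	if (toy_open(tty) == 0)         /* open(old), assumed never to fail */
		tty->ldisc = old;
	return err;
}

/* Proposed ordering: if (open(new)) return error; close(old). */
int set_ldisc_open_first(struct toy_tty *tty, int disc)
{
	int err = toy_open(tty);        /* open(new) */

	if (err)
		return err;             /* old discipline is untouched */
	tty->ldisc = disc;              /* close(old), switch to new */
	return 0;
}
```

With two consecutive open failures, the close-first variant ends with no discipline at all (the situation in which the kernel currently panics), while the open-first variant returns the error and keeps the old discipline.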

Dmitry Vyukov

Feb 17, 2017, 4:51:23 PM
to Greg Kroah-Hartman, Jiri Slaby, LKML, Peter Hurley, One Thousand Gnomes, syzkaller
Or can we just kill the task? Still better than kernel panic.

Dmitry Vyukov

Feb 28, 2017, 1:11:55 PM
to Greg Kroah-Hartman, Jiri Slaby, LKML, Peter Hurley, One Thousand Gnomes, syzkaller
I guess we can't get away with killing the task, as the tty would be left
in an inconsistent state and it is accessible to other tasks.
But what about creating the new ldisc first and then, if that succeeds,
destroying the old one?

Dmitry Vyukov

Mar 2, 2017, 1:27:56 PM
to Greg Kroah-Hartman, Jiri Slaby, LKML, Peter Hurley, One Thousand Gnomes, syzkaller
This is hurting us badly.

Opening the new ldisc before closing the old one turned out to be hard
(too much state is saved in the tty).
How about this one? It reuses the existing tty_ldisc_reinit helper. If
opening both the old ldisc and N_TTY fails, it leaves tty->ldisc == NULL.
But that is already possible in tty_ldisc_hangup, and the code seems to be
prepared for it.


diff --git a/drivers/tty/tty_ldisc.c b/drivers/tty/tty_ldisc.c
index 68947f6de5ad..eafb55570f6e 100644
--- a/drivers/tty/tty_ldisc.c
+++ b/drivers/tty/tty_ldisc.c
@@ -489,41 +489,6 @@ static void tty_ldisc_close(struct tty_struct *tty, struct tty_ldisc *ld)
 }

 /**
- * tty_ldisc_restore - helper for tty ldisc change
- * @tty: tty to recover
- * @old: previous ldisc
- *
- * Restore the previous line discipline or N_TTY when a line discipline
- * change fails due to an open error
- */
-
-static void tty_ldisc_restore(struct tty_struct *tty, struct tty_ldisc *old)
-{
-	struct tty_ldisc *new_ldisc;
-	int r;
-
-	/* There is an outstanding reference here so this is safe */
-	old = tty_ldisc_get(tty, old->ops->num);
-	WARN_ON(IS_ERR(old));
-	tty->ldisc = old;
-	tty_set_termios_ldisc(tty, old->ops->num);
-	if (tty_ldisc_open(tty, old) < 0) {
-		tty_ldisc_put(old);
-		/* This driver is always present */
-		new_ldisc = tty_ldisc_get(tty, N_TTY);
-		if (IS_ERR(new_ldisc))
-			panic("n_tty: get");
-		tty->ldisc = new_ldisc;
-		tty_set_termios_ldisc(tty, N_TTY);
-		r = tty_ldisc_open(tty, new_ldisc);
-		if (r < 0)
-			panic("Couldn't open N_TTY ldisc for "
-			      "%s --- error %d.",
-			      tty_name(tty), r);
-	}
-}
-
-/**
  * tty_set_ldisc - set line discipline
  * @tty: the terminal to set
  * @ldisc: the line discipline
@@ -536,12 +501,7 @@ static void tty_ldisc_restore(struct tty_struct *tty, struct tty_ldisc *old)

 int tty_set_ldisc(struct tty_struct *tty, int disc)
 {
-	int retval;
-	struct tty_ldisc *old_ldisc, *new_ldisc;
-
-	new_ldisc = tty_ldisc_get(tty, disc);
-	if (IS_ERR(new_ldisc))
-		return PTR_ERR(new_ldisc);
+	int retval, old_disc;

 	tty_lock(tty);
 	retval = tty_ldisc_lock(tty, 5 * HZ);
@@ -554,7 +514,8 @@ int tty_set_ldisc(struct tty_struct *tty, int disc)
 	}

 	/* Check the no-op case */
-	if (tty->ldisc->ops->num == disc)
+	old_disc = tty->ldisc->ops->num;
+	if (old_disc == disc)
 		goto out;

 	if (test_bit(TTY_HUPPED, &tty->flags)) {
@@ -563,42 +524,32 @@ int tty_set_ldisc(struct tty_struct *tty, int disc)
 		goto out;
 	}

-	old_ldisc = tty->ldisc;
-
-	/* Shutdown the old discipline. */
-	tty_ldisc_close(tty, old_ldisc);
-
-	/* Now set up the new line discipline. */
-	tty->ldisc = new_ldisc;
-	tty_set_termios_ldisc(tty, disc);
-
-	retval = tty_ldisc_open(tty, new_ldisc);
-	if (retval < 0) {
+	if (tty_ldisc_reinit(tty, disc) < 0) {
 		/* Back to the old one or N_TTY if we can't */
-		tty_ldisc_put(new_ldisc);
-		tty_ldisc_restore(tty, old_ldisc);
+		if (tty_ldisc_reinit(tty, old_disc) < 0) {
+			pr_err("tty: TIOCSETD failed, reinitializing N_TTY\n");
+			if (tty_ldisc_reinit(tty, N_TTY) < 0) {
+				/* At this point we have tty->ldisc == NULL. */
+				pr_err("tty: reinitializing N_TTY failed\n");
+			}
+		}
 	}

-	if (tty->ldisc->ops->num != old_ldisc->ops->num && tty->ops->set_ldisc) {
+	if (tty->ldisc && tty->ldisc->ops->num != old_disc &&
+	    tty->ops->set_ldisc) {
 		down_read(&tty->termios_rwsem);
 		tty->ops->set_ldisc(tty);
 		up_read(&tty->termios_rwsem);
 	}

-	/* At this point we hold a reference to the new ldisc and a
-	   reference to the old ldisc, or we hold two references to
-	   the old ldisc (if it was restored as part of error cleanup
-	   above). In either case, releasing a single reference from
-	   the old ldisc is correct. */
-	new_ldisc = old_ldisc;
 out:
 	tty_ldisc_unlock(tty);

 	/* Restart the work queue in case no characters kick it off. Safe if
 	   already running */
-	tty_buffer_restart_work(tty->port);
+	if (tty->ldisc)
+		tty_buffer_restart_work(tty->port);
 err:
-	tty_ldisc_put(new_ldisc); /* drop the extra reference */
 	tty_unlock(tty);
 	return retval;
 }
@@ -659,10 +610,8 @@ int tty_ldisc_reinit(struct tty_struct *tty, int disc)
 	int retval;

 	ld = tty_ldisc_get(tty, disc);
-	if (IS_ERR(ld)) {
-		BUG_ON(disc == N_TTY);
+	if (IS_ERR(ld))
 		return PTR_ERR(ld);
-	}

 	if (tty->ldisc) {
 		tty_ldisc_close(tty, tty->ldisc);
@@ -674,10 +623,8 @@ int tty_ldisc_reinit(struct tty_struct *tty, int disc)
 	tty_set_termios_ldisc(tty, disc);
 	retval = tty_ldisc_open(tty, tty->ldisc);
 	if (retval) {
-		if (!WARN_ON(disc == N_TTY)) {
-			tty_ldisc_put(tty->ldisc);
-			tty->ldisc = NULL;
-		}
+		tty_ldisc_put(tty->ldisc);
+		tty->ldisc = NULL;
 	}
 	return retval;
 }



Here is the whole function as the diff is somewhat difficult to read:

int tty_set_ldisc(struct tty_struct *tty, int disc)
{
	int retval, old_disc;

	tty_lock(tty);
	retval = tty_ldisc_lock(tty, 5 * HZ);
	if (retval)
		goto err;

	if (!tty->ldisc) {
		retval = -EIO;
		goto out;
	}

	/* Check the no-op case */
	old_disc = tty->ldisc->ops->num;
	if (old_disc == disc)
		goto out;

	if (test_bit(TTY_HUPPED, &tty->flags)) {
		/* We were raced by hangup */
		retval = -EIO;
		goto out;
	}

	if (tty_ldisc_reinit(tty, disc) < 0) {
		/* Back to the old one or N_TTY if we can't */
		if (tty_ldisc_reinit(tty, old_disc) < 0) {
			pr_err("tty: TIOCSETD failed, reinitializing N_TTY\n");
			if (tty_ldisc_reinit(tty, N_TTY) < 0) {
				/* At this point we have tty->ldisc == NULL. */
				pr_err("tty: reinitializing N_TTY failed\n");
			}
		}
	}

	if (tty->ldisc && tty->ldisc->ops->num != old_disc &&
	    tty->ops->set_ldisc) {
		down_read(&tty->termios_rwsem);
		tty->ops->set_ldisc(tty);
		up_read(&tty->termios_rwsem);
	}

out:
	tty_ldisc_unlock(tty);

	/* Restart the work queue in case no characters kick it off. Safe if
	   already running */
	if (tty->ldisc)
		tty_buffer_restart_work(tty->port);
err:
	tty_unlock(tty);
	return retval;
}
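
The fallback chain in the function above can be modelled in userspace roughly as follows. This is only a sketch under stated assumptions: model_tty, model_reinit and the opens_to_fail failure injection are hypothetical names, MODEL_NO_LDISC stands for tty->ldisc == NULL, and locking, hangup handling and the driver callback are left out. Returning the first error to the caller is also an assumption of the sketch.

```c
#include <assert.h>
#include <errno.h>

#define MODEL_NO_LDISC (-1) /* models tty->ldisc == NULL */
#define MODEL_N_TTY    0    /* models N_TTY */

/* Hypothetical toy model of a tty, for illustration only. */
struct model_tty {
	int ldisc;         /* number of the open discipline */
	int opens_to_fail; /* failure injection: the next N opens fail */
};

/* Models tty_ldisc_reinit(): close what is open, then try to open disc. */
static int model_reinit(struct model_tty *tty, int disc)
{
	tty->ldisc = MODEL_NO_LDISC;
	if (tty->opens_to_fail > 0) {
		tty->opens_to_fail--;
		return -ENOMEM;
	}
	tty->ldisc = disc;
	return 0;
}

/* Try the new discipline, then the old one, then N_TTY, else leave none. */
int model_set_ldisc(struct model_tty *tty, int disc)
{
	int old_disc = tty->ldisc;
	int retval;

	if (old_disc == MODEL_NO_LDISC)
		return -EIO;
	if (old_disc == disc)
		return 0; /* no-op case */

	retval = model_reinit(tty, disc);
	if (retval < 0) {
		/* Back to the old one or N_TTY if we can't */
		if (model_reinit(tty, old_disc) < 0)
			model_reinit(tty, MODEL_N_TTY);
	}
	return retval;
}
```

After one failed open the model is back on the old discipline, after two it is on N_TTY, and after three it has no discipline at all, so a later switch attempt fails with -EIO instead of taking the machine down.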

Greg Kroah-Hartman

Mar 2, 2017, 2:27:40 PM
to Dmitry Vyukov, Jiri Slaby, LKML, Peter Hurley, One Thousand Gnomes, syzkaller
Really? How? Are you hitting this a lot? Why now and never before?
Are you really out of memory?

> Opening the new ldisc before closing the old one turned out to be hard
> (too much state is saved in the tty).
> How about this one? It reuses the existing tty_ldisc_reinit helper. If
> opening both the old ldisc and N_TTY fails, it leaves tty->ldisc == NULL.
> But that is already possible in tty_ldisc_hangup, and the code seems to be
> prepared for it.

<snip>

I'll look at this after -rc1 is out, thanks.

greg k-h

Dmitry Vyukov

Mar 2, 2017, 2:31:10 PM
to Greg Kroah-Hartman, Jiri Slaby, LKML, Peter Hurley, One Thousand Gnomes, syzkaller
This crashes our test bots a lot.
Why now? I don't have an exact answer. Probably a combination of the
fuzzer figuring out some magic sequences of syscalls and increased memory
consumption due to something (again, maybe the fuzzer figuring out
how to eat more memory).

Greg Kroah-Hartman

Mar 2, 2017, 2:37:53 PM
to Dmitry Vyukov, Jiri Slaby, LKML, Peter Hurley, One Thousand Gnomes, syzkaller
If the fuzzer is suddenly eating more memory, you should be seeing lots
of other problems, right? This can't be the only thing that has issues
with memory allocation failures?

thanks,

greg k-h

Dmitry Vyukov

Mar 3, 2017, 2:36:34 AM
to Greg Kroah-Hartman, Jiri Slaby, LKML, Peter Hurley, One Thousand Gnomes, syzkaller
I remember 2 or 3 that started happening at roughly the same time:
https://groups.google.com/forum/#!msg/syzkaller/tIx42qCVklk/fh0qjUboBgAJ
https://groups.google.com/forum/#!msg/syzkaller/vp1neyeoA8A/Is8aPdrpBgAJ
Both were quickly fixed.
There is a strong bias towards failing larger, multi-page allocations,
so it's not that we are failing _all_ allocations in kernel code.

Dmitry Vyukov

Mar 4, 2017, 8:05:07 AM
to Greg Kroah-Hartman, Jiri Slaby, LKML, Peter Hurley, One Thousand Gnomes, syzkaller
Mailed the patch officially.
Once I fixed it, the fuzzer was able to uncover another race leading to
crashes in tty_ldisc_ref_wait:
https://groups.google.com/d/msg/syzkaller/ZTsV9qLIzGA/opsLjyoEEAAJ

Dmitry Vyukov

Mar 13, 2017, 6:15:50 AM
to Greg Kroah-Hartman, Jiri Slaby, LKML, Peter Hurley, One Thousand Gnomes, syzkaller
On Thu, Feb 2, 2017 at 7:23 PM, Greg Kroah-Hartman wrote:
> Yes, do you know of any normal way to cause it to fail?

Yes. And it turns out that it's actually super-easy to trigger, and we
are not running out of memory.
vmalloc has a check for fatal_signal_pending(). So you just issue the
TIOCSETD ioctl and kill the process concurrently, and the machine is down.
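
A rough userspace sketch of that race, for illustration only. The helper name race_setd_with_kill, the 2 ms delay and the use of a pty master are assumptions rather than the actual syzkaller program, and whether the second discipline is available depends on the kernel configuration. On an affected kernel this may panic the machine, so it should only be run in a disposable VM.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <signal.h>
#include <sys/ioctl.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/*
 * Hypothetical helper: fork a child that keeps switching the line
 * discipline of a fresh pty master, then SIGKILL it while it is likely
 * to be inside the ioctl. Returns 0 once all rounds have run, or -1 if
 * fork() or waitpid() failed.
 */
int race_setd_with_kill(int other_ldisc, int rounds)
{
	const struct timespec delay = { .tv_sec = 0, .tv_nsec = 2000000 };
	int i;

	assert(rounds >= 0);
	for (i = 0; i < rounds; i++) {
		pid_t pid = fork();

		if (pid < 0)
			return -1;
		if (pid == 0) {
			int n_tty = 0; /* N_TTY */
			int fd = open("/dev/ptmx", O_RDWR | O_NOCTTY);

			if (fd < 0)
				_exit(1);
			for (;;) {
				/* Errors are ignored on purpose. */
				ioctl(fd, TIOCSETD, &other_ldisc);
				ioctl(fd, TIOCSETD, &n_tty);
			}
		}
		/* Give the child a moment to enter the loop, then kill it. */
		nanosleep(&delay, NULL);
		kill(pid, SIGKILL);
		if (waitpid(pid, NULL, 0) != pid)
			return -1;
	}
	return 0;
}
```

A call such as race_setd_with_kill(1, 1000) would alternate between N_TTY and discipline number 1 (N_SLIP in <linux/tty.h>). On a kernel with the fix applied it is expected to simply return 0.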

Greg Kroah-Hartman

Mar 17, 2017, 1:08:19 AM
to Dmitry Vyukov, Jiri Slaby, LKML, Peter Hurley, One Thousand Gnomes, syzkaller
Sorry for the delay, patches now applied, many thanks for them.

greg k-h