[syzbot] [dri?] BUG: scheduling while atomic in drm_atomic_helper_wait_for_flip_done

19 views
Skip to first unread message

syzbot

unread,
Jan 18, 2024, 4:51:23ā€ÆAMJan 18
to air...@gmail.com, dan...@ffwll.ch, dri-...@lists.freedesktop.org, hamoha...@gmail.com, linux-...@vger.kernel.org, maarten....@linux.intel.com, maira...@riseup.net, melis...@gmail.com, mri...@kernel.org, rodrigosi...@gmail.com, syzkall...@googlegroups.com, tzimm...@suse.de
Hello,

syzbot found the following issue on:

HEAD commit: 1b1934dbbdcf Merge tag 'docs-6.8-2' of git://git.lwn.net/l..
git tree: upstream
console output: https://syzkaller.appspot.com/x/log.txt?x=1029adbde80000
kernel config: https://syzkaller.appspot.com/x/.config?x=68ea41b98043e6e8
dashboard link: https://syzkaller.appspot.com/bug?extid=06fa1063cca8163ea541
compiler: aarch64-linux-gnu-gcc (Debian 12.2.0-14) 12.2.0, GNU ld (GNU Binutils for Debian) 2.40
userspace arch: arm64

Unfortunately, I don't have any reproducer for this issue yet.

Downloadable assets:
disk image (non-bootable): https://storage.googleapis.com/syzbot-assets/384ffdcca292/non_bootable_disk-1b1934db.raw.xz
vmlinux: https://storage.googleapis.com/syzbot-assets/00b728a4f3de/vmlinux-1b1934db.xz
kernel image: https://storage.googleapis.com/syzbot-assets/5a3fe8452d59/Image-1b1934db.gz.xz

IMPORTANT: if you fix the issue, please add the following tag to the commit:
Reported-by: syzbot+06fa10...@syzkaller.appspotmail.com

BUG: scheduling while atomic: syz-executor.0/29225/0x00000002
Modules linked in:
CPU: 1 PID: 29225 Comm: syz-executor.0 Not tainted 6.7.0-syzkaller-10085-g1b1934dbbdcf #0
Hardware name: linux,dummy-virt (DT)
Call trace:
dump_backtrace+0x94/0xec arch/arm64/kernel/stacktrace.c:291
show_stack+0x18/0x24 arch/arm64/kernel/stacktrace.c:298
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0x48/0x60 lib/dump_stack.c:106
dump_stack+0x18/0x24 lib/dump_stack.c:113
__schedule_bug+0x50/0x68 kernel/sched/core.c:5943
schedule_debug kernel/sched/core.c:5970 [inline]
__schedule+0x7f4/0x8a8 kernel/sched/core.c:6620
__schedule_loop kernel/sched/core.c:6802 [inline]
schedule+0x34/0xc8 kernel/sched/core.c:6817
schedule_timeout+0x8c/0x100 kernel/time/timer.c:2183
do_wait_for_common kernel/sched/completion.c:95 [inline]
__wait_for_common kernel/sched/completion.c:116 [inline]
wait_for_common kernel/sched/completion.c:127 [inline]
wait_for_completion_timeout+0x74/0x16c kernel/sched/completion.c:167
drm_atomic_helper_wait_for_flip_done+0x6c/0xc4 drivers/gpu/drm/drm_atomic_helper.c:1719
vkms_atomic_commit_tail+0x60/0xd0 drivers/gpu/drm/vkms/vkms_drv.c:81
commit_tail+0xa4/0x18c drivers/gpu/drm/drm_atomic_helper.c:1832
drm_atomic_helper_commit+0x164/0x178 drivers/gpu/drm/drm_atomic_helper.c:2072
drm_atomic_commit+0xa8/0xe0 drivers/gpu/drm/drm_atomic.c:1514
drm_client_modeset_commit_atomic+0x210/0x270 drivers/gpu/drm/drm_client_modeset.c:1051
drm_client_modeset_commit_locked+0x5c/0x188 drivers/gpu/drm/drm_client_modeset.c:1154
drm_client_modeset_commit+0x30/0x58 drivers/gpu/drm/drm_client_modeset.c:1180
__drm_fb_helper_restore_fbdev_mode_unlocked drivers/gpu/drm/drm_fb_helper.c:251 [inline]
__drm_fb_helper_restore_fbdev_mode_unlocked+0xa8/0xe8 drivers/gpu/drm/drm_fb_helper.c:230
drm_fb_helper_set_par+0x30/0x4c drivers/gpu/drm/drm_fb_helper.c:1344
fb_set_var+0x21c/0x488 drivers/video/fbdev/core/fbmem.c:312
fbcon_switch+0x214/0x4d0 drivers/video/fbdev/core/fbcon.c:2110
flush_scrollback drivers/tty/vt/vt.c:912 [inline]
csi_J+0x254/0x260 drivers/tty/vt/vt.c:1527
do_con_trol drivers/tty/vt/vt.c:2408 [inline]
do_con_write+0x1a30/0x1e2c drivers/tty/vt/vt.c:2905
con_write+0x18/0x68 drivers/tty/vt/vt.c:3251
gsmld_write+0x64/0xd0 drivers/tty/n_gsm.c:3724
iterate_tty_write drivers/tty/tty_io.c:1021 [inline]
file_tty_write.constprop.0+0x134/0x28c drivers/tty/tty_io.c:1092
tty_write+0x14/0x20 drivers/tty/tty_io.c:1113
call_write_iter include/linux/fs.h:2085 [inline]
new_sync_write fs/read_write.c:497 [inline]
vfs_write+0x1dc/0x2f4 fs/read_write.c:590
ksys_write+0x70/0x104 fs/read_write.c:643
__do_sys_write fs/read_write.c:655 [inline]
__se_sys_write fs/read_write.c:652 [inline]
__arm64_sys_write+0x1c/0x28 fs/read_write.c:652
__invoke_syscall arch/arm64/kernel/syscall.c:37 [inline]
invoke_syscall+0x48/0x114 arch/arm64/kernel/syscall.c:51
el0_svc_common.constprop.0+0x40/0xe0 arch/arm64/kernel/syscall.c:136
do_el0_svc+0x1c/0x28 arch/arm64/kernel/syscall.c:155
el0_svc+0x34/0xd8 arch/arm64/kernel/entry-common.c:678
el0t_64_sync_handler+0x100/0x12c arch/arm64/kernel/entry-common.c:696
el0t_64_sync+0x19c/0x1a0 arch/arm64/kernel/entry.S:595
BUG: scheduling while atomic: syz-executor.0/29225/0x00000000
Modules linked in:
CPU: 0 PID: 29225 Comm: syz-executor.0 Tainted: G W 6.7.0-syzkaller-10085-g1b1934dbbdcf #0
Hardware name: linux,dummy-virt (DT)
Call trace:
dump_backtrace+0x94/0xec arch/arm64/kernel/stacktrace.c:291
show_stack+0x18/0x24 arch/arm64/kernel/stacktrace.c:298
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0x48/0x60 lib/dump_stack.c:106
dump_stack+0x18/0x24 lib/dump_stack.c:113
__schedule_bug+0x50/0x68 kernel/sched/core.c:5943
schedule_debug kernel/sched/core.c:5970 [inline]
__schedule+0x7f4/0x8a8 kernel/sched/core.c:6620
__schedule_loop kernel/sched/core.c:6802 [inline]
schedule+0x34/0xc8 kernel/sched/core.c:6817
futex_wait_queue+0x70/0x9c kernel/futex/waitwake.c:370
__futex_wait+0xc8/0x15c kernel/futex/waitwake.c:669
futex_wait+0x84/0x108 kernel/futex/waitwake.c:697
do_futex+0xf8/0x1a0 kernel/futex/syscalls.c:102
__do_sys_futex kernel/futex/syscalls.c:179 [inline]
__se_sys_futex kernel/futex/syscalls.c:160 [inline]
__arm64_sys_futex+0x7c/0x1a4 kernel/futex/syscalls.c:160
__invoke_syscall arch/arm64/kernel/syscall.c:37 [inline]
invoke_syscall+0x48/0x114 arch/arm64/kernel/syscall.c:51
el0_svc_common.constprop.0+0x40/0xe0 arch/arm64/kernel/syscall.c:136
do_el0_svc+0x1c/0x28 arch/arm64/kernel/syscall.c:155
el0_svc+0x34/0xd8 arch/arm64/kernel/entry-common.c:678
el0t_64_sync_handler+0x100/0x12c arch/arm64/kernel/entry-common.c:696
el0t_64_sync+0x19c/0x1a0 arch/arm64/kernel/entry.S:595


---
This report is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at syzk...@googlegroups.com.

syzbot will keep track of this issue. See:
https://goo.gl/tpsmEJ#status for how to communicate with syzbot.

If the report is already addressed, let syzbot know by replying with:
#syz fix: exact-commit-title

If you want to overwrite report's subsystems, reply with:
#syz set subsystems: new-subsystem
(See the list of subsystem names on the web dashboard)

If the report is a duplicate of another one, reply with:
#syz dup: exact-subject-of-another-report

If you want to undo deduplication, reply with:
#syz undup

Tetsuo Handa

unread,
Jan 18, 2024, 9:18:21ā€ÆAMJan 18
to syzbot, syzkall...@googlegroups.com, linux-serial, linux-...@vger.kernel.org
#syz set subsystems: serial

include/linux/tty_ldisc.h says "struct tty_ldisc_ops"->write is allowed to sleep.

include/linux/tty_driver.h says "struct tty_operations"->write is not allowed to sleep.

drivers/tty/vt/vt.c implements do_con_write() from con_write() sleeping, violating what
include/linux/tty_driver.h says. But how to fix?

- if (in_interrupt())
+ if (in_interrupt() || in_atomic())
return count;

in do_con_write() and con_flush_chars() ? But include/linux/preempt.h says
in_atomic() cannot know about held spinlocks in non-preemptible kernels.

Is there a way to detect spin_lock_irqsave(&gsm->tx_lock, flags) from gsmld_write() ?
Something like whether irq is disabled?

Tetsuo Handa

unread,
Jan 20, 2024, 5:34:25ā€ÆAMJan 20
to syzbot, syzkall...@googlegroups.com, Greg Kroah-Hartman, Jiri Slaby, Andrew Morton, linux-...@vger.kernel.org, linux-serial
syzbot is reporting sleep in atomic context, for gsmld_write() is calling
con_write() with spinlock held and IRQs disabled.

Since include/linux/tty_ldisc.h says that "struct tty_ldisc_ops"->write
(e.g. gsmld_write()) is allowed to sleep and include/linux/tty_driver.h
says that "struct tty_operations"->write (e.g. con_write()) is not
allowed to sleep, we should handle this problem on the con_write() side.

It seems that "Andrew Morton: console locking merge" in 2.4.10-pre11 added
in_interrupt() check to do_con_write()/con_put_char()/con_flush_chars()
in order to handle exceptional caller.

Since include/linux/preempt.h says that in_atomic() cannot know about held
spinlocks in non-preemptible kernels, but gsmld_write() is calling
con_write() with IRQs disabled, we can add irqs_disabled() check to
do_con_write()/con_flush_chars() in order to handle this case. Though,
I'm not sure whether returning the bytes to write is appropriate behavior
when do_con_write() can't work...

Reported-by: syzbot+06fa10...@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=06fa1063cca8163ea541
Signed-off-by: Tetsuo Handa <penguin...@I-love.SAKURA.ne.jp>
---
drivers/tty/vt/vt.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/tty/vt/vt.c b/drivers/tty/vt/vt.c
index 156efda7c80d..0d3d602ae147 100644
--- a/drivers/tty/vt/vt.c
+++ b/drivers/tty/vt/vt.c
@@ -2856,7 +2856,7 @@ static int do_con_write(struct tty_struct *tty, const u8 *buf, int count)
struct vt_notifier_param param;
bool rescan;

- if (in_interrupt())
+ if (in_interrupt() || irqs_disabled())
return count;

console_lock();
@@ -3314,7 +3314,7 @@ static void con_flush_chars(struct tty_struct *tty)
{
struct vc_data *vc;

- if (in_interrupt()) /* from flush_to_ldisc */
+ if (in_interrupt() || irqs_disabled()) /* from flush_to_ldisc */
return;

/* if we race with con_close(), vt may be null */
--
2.18.4

Hillf Danton

unread,
Jan 20, 2024, 10:48:38ā€ÆPMJan 20
to Tetsuo Handa, syzbot, syzkall...@googlegroups.com, Greg Kroah-Hartman, Jiri Slaby, Andrew Morton, linux-...@vger.kernel.org, linux-serial
On Sat, 20 Jan 2024 19:34:02 +0900 Tetsuo Handa <penguin...@I-love.SAKURA.ne.jp>
> syzbot is reporting sleep in atomic context, for gsmld_write() is calling
> con_write() with spinlock held and IRQs disabled.

...

> --- a/drivers/tty/vt/vt.c
> +++ b/drivers/tty/vt/vt.c
> @@ -2856,7 +2856,7 @@ static int do_con_write(struct tty_struct *tty, const u8 *buf, int count)
> struct vt_notifier_param param;
> bool rescan;
>
> - if (in_interrupt())
> + if (in_interrupt() || irqs_disabled())
> return count;
>
> console_lock();

Given console_lock(), no sense could be made by calling do_con_write()
with spin lock held at the first place, regardless irq.

Tetsuo Handa

unread,
Jan 21, 2024, 6:35:28ā€ÆAMJan 21
to Hillf Danton, syzbot, syzkall...@googlegroups.com, Greg Kroah-Hartman, Jiri Slaby, Andrew Morton, linux-...@vger.kernel.org, linux-serial
On 2024/01/21 12:48, Hillf Danton wrote:
> On Sat, 20 Jan 2024 19:34:02 +0900 Tetsuo Handa <penguin...@I-love.SAKURA.ne.jp>
>> syzbot is reporting sleep in atomic context, for gsmld_write() is calling
>> con_write() with spinlock held and IRQs disabled.
>
> ...
>
>> --- a/drivers/tty/vt/vt.c
>> +++ b/drivers/tty/vt/vt.c
>> @@ -2856,7 +2856,7 @@ static int do_con_write(struct tty_struct *tty, const u8 *buf, int count)
>> struct vt_notifier_param param;
>> bool rescan;
>>
>> - if (in_interrupt())
>> + if (in_interrupt() || irqs_disabled())
>> return count;
>>
>> console_lock();
>
> Given console_lock(), no sense could be made by calling do_con_write()
> with spin lock held at the first place, regardless irq.

The question was how to detect it. Since in_atomic() is not a reliable method for
detecting that a spin lock is held, this patch instead chose irqs_disabled(), for
gsmld_write() is using spin_lock_irqsave(&gsm->tx_lock, flags).

Jiri Slaby

unread,
Jan 22, 2024, 1:48:57ā€ÆAMJan 22
to Tetsuo Handa, syzbot, syzkall...@googlegroups.com, Greg Kroah-Hartman, Andrew Morton, linux-...@vger.kernel.org, linux-serial
On 20. 01. 24, 11:34, Tetsuo Handa wrote:
> syzbot is reporting sleep in atomic context, for gsmld_write() is calling
> con_write() with spinlock held and IRQs disabled.

gsm should never be bound to a console in the first place.

Noone has sent a patch to deny that yet.

Follow:
https://lore.kernel.org/all/49453ebd-b321-4f34...@kernel.org/

And feel free to patch that ;).

thanks,
--
js
suse labs

Tetsuo Handa

unread,
Jan 22, 2024, 9:09:39ā€ÆAMJan 22
to Jiri Slaby, syzbot, syzkall...@googlegroups.com, Greg Kroah-Hartman, Andrew Morton, Starke, Daniel, Lee Jones, Fedor Pchelkin, linux-...@vger.kernel.org, linux-serial
OK. Here is a deny-listing based filter using device number of sysfs entry.
(We don't want to compare with the function address of con_write().
Thus, this patch is comparing with device major/minor numbers.)

----------
diff --git a/drivers/tty/n_gsm.c b/drivers/tty/n_gsm.c
index 4036566febcb..6f9730dce5aa 100644
--- a/drivers/tty/n_gsm.c
+++ b/drivers/tty/n_gsm.c
@@ -3630,6 +3630,10 @@ static int gsmld_open(struct tty_struct *tty)
if (tty->ops->write == NULL)
return -EINVAL;

+ /* Can't be attached to virtual consoles. */
+ if (tty->dev && MAJOR(tty->dev->devt) == 4 && MINOR(tty->dev->devt) < 64)
+ return -EINVAL;
+
/* Attach our ldisc data */
gsm = gsm_alloc_mux();
if (gsm == NULL)
----------

Is it possible to use allow-listing based filtering?
(Attaching on /dev/tty (major=5, minor=0) causes current ssh session
to be closed. Unexpectedly loosing connection might be a problem for
fuzz testing...)

----------
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <sys/ioctl.h>
#include <linux/tty.h>

int main(int argc, char *argv[]) {
int ldisc = N_GSM0710;

return ioctl(open(argv[1], O_RDWR | O_NOCTTY | O_NDELAY), TIOCSETD, &ldisc) == 0;
}
----------

Tetsuo Handa

unread,
Jan 24, 2024, 5:07:17ā€ÆAMJan 24
to Jiri Slaby, syzbot, syzkall...@googlegroups.com, Greg Kroah-Hartman, Andrew Morton, Starke, Daniel, Lee Jones, Fedor Pchelkin, linux-...@vger.kernel.org, linux-serial
syzbot is reporting sleep in atomic context, for gsmld_write() is calling
con_write() with spinlock held and IRQs disabled.

Since n_gsm is designed to be used for serial port, reject attaching to
virtual consoles and PTY devices, by checking tty's device major/minor
numbers at gsmld_open().

Link: https://www.kernel.org/doc/html/v6.7/driver-api/tty/n_gsm.html
drivers/tty/n_gsm.c | 12 ++++++++++++
1 file changed, 12 insertions(+)

diff --git a/drivers/tty/n_gsm.c b/drivers/tty/n_gsm.c
index 4036566febcb..14581483af78 100644
--- a/drivers/tty/n_gsm.c
+++ b/drivers/tty/n_gsm.c
@@ -3623,6 +3623,7 @@ static void gsmld_close(struct tty_struct *tty)
static int gsmld_open(struct tty_struct *tty)
{
struct gsm_mux *gsm;
+ int major;

if (!capable(CAP_NET_ADMIN))
return -EPERM;
@@ -3630,6 +3631,17 @@ static int gsmld_open(struct tty_struct *tty)
if (tty->ops->write == NULL)
return -EINVAL;

+ major = tty->driver->major;
+ /* Reject Virtual consoles */
+ if (major == 4 && tty->driver->minor_start == 1)
+ return -EINVAL;
+ /* Reject Unix98 PTY masters/slaves */
+ if (major >= 128 && major <= 143)
+ return -EINVAL;
+ /* Reject BSD PTY masters/slaves */
+ if (major >= 2 && major <= 3)
+ return -EINVAL;
+
/* Attach our ldisc data */
gsm = gsm_alloc_mux();
if (gsm == NULL)
--
2.18.4


Greg Kroah-Hartman

unread,
Jan 24, 2024, 8:14:36ā€ÆAMJan 24
to Tetsuo Handa, Jiri Slaby, syzbot, syzkall...@googlegroups.com, Andrew Morton, Starke, Daniel, Lee Jones, Fedor Pchelkin, linux-...@vger.kernel.org, linux-serial
That is a lot of hard-coded magic numbers, why aren't these defined
anywhere?

But really, this is only for fuzz testers, why can't they just not ever
bind this to a console? Why do we have to have these checks in the
kernel to prevent userspace from doing dumb things that no real
userspace will ever do?

thanks,

greg k-h

Tetsuo Handa

unread,
Feb 3, 2024, 8:46:51ā€ÆAMFeb 3
to Greg Kroah-Hartman, Jiri Slaby, syzbot, syzkall...@googlegroups.com, Andrew Morton, Starke, Daniel, Lee Jones, Fedor Pchelkin, linux-...@vger.kernel.org, linux-serial, Linus Torvalds
On 2024/01/24 22:14, Greg Kroah-Hartman wrote:
>> @@ -3630,6 +3631,17 @@ static int gsmld_open(struct tty_struct *tty)
>> if (tty->ops->write == NULL)
>> return -EINVAL;
>>
>> + major = tty->driver->major;
>> + /* Reject Virtual consoles */
>> + if (major == 4 && tty->driver->minor_start == 1)
>> + return -EINVAL;
>> + /* Reject Unix98 PTY masters/slaves */
>> + if (major >= 128 && major <= 143)
>> + return -EINVAL;
>> + /* Reject BSD PTY masters/slaves */
>> + if (major >= 2 && major <= 3)
>> + return -EINVAL;
>
> That is a lot of hard-coded magic numbers, why aren't these defined
> anywhere?

Well, include/uapi/linux/major.h defines

#define TTY_MAJOR 4
#define UNIX98_PTY_MASTER_MAJOR 128
#define UNIX98_PTY_MAJOR_COUNT 8
#define UNIX98_PTY_SLAVE_MAJOR (UNIX98_PTY_MASTER_MAJOR+UNIX98_PTY_MAJOR_COUNT)
#define PTY_MASTER_MAJOR 2
#define PTY_SLAVE_MAJOR 3

but does not define end of UNIX98_PTY_SLAVE_MAJOR range, and
no file defines start of minor number for virtual consoles.

Does fixing this bug worth updating include/uapi/linux/major.h and adding
include/uapi/linux/minor.h ? Since these numbers won't change, I feel that
a line of comment is sufficient.

>
> But really, this is only for fuzz testers, why can't they just not ever
> bind this to a console? Why do we have to have these checks in the
> kernel to prevent userspace from doing dumb things that no real
> userspace will ever do?

Fuzz testing is for testing unexpected arguments. This bug is nothing but
missing validation that should be fixed on the kernel side rather than
trying to tune fuzzers.

>
> thanks,
>
> greg k-h

Greg Kroah-Hartman

unread,
Feb 6, 2024, 9:28:27ā€ÆAMFeb 6
to Tetsuo Handa, Jiri Slaby, syzbot, syzkall...@googlegroups.com, Andrew Morton, Starke, Daniel, Lee Jones, Fedor Pchelkin, linux-...@vger.kernel.org, linux-serial, Linus Torvalds
Then use the ones you have, don't make us be forced to figure out what
is going on please.

> Does fixing this bug worth updating include/uapi/linux/major.h and adding
> include/uapi/linux/minor.h ? Since these numbers won't change, I feel that
> a line of comment is sufficient.

No, don't add new ones where not needed.

> > But really, this is only for fuzz testers, why can't they just not ever
> > bind this to a console? Why do we have to have these checks in the
> > kernel to prevent userspace from doing dumb things that no real
> > userspace will ever do?
>
> Fuzz testing is for testing unexpected arguments. This bug is nothing but
> missing validation that should be fixed on the kernel side rather than
> trying to tune fuzzers.

I'll push back on this, fuzzers, running as root, can easily crash the
kernel as root can crash the kernel easily. So trying to keep the
kernel from hurting itself like this is odd to me.

Again, just tell the fuzzers to not do this. I don't want random
hard-coded numbers in here, that's adding maintance requirements on us
in the kernel for random userspace tools doing random things :(

thanks,

greg k-h

Tetsuo Handa

unread,
Apr 20, 2024, 7:17:44ā€ÆAMĀ (7 days ago)Ā Apr 20
to syzbot, linux-...@vger.kernel.org, syzkall...@googlegroups.com
#syz dup: BUG: sleeping function called from invalid context in console_lock (2)

Reply all
Reply to author
Forward
0 new messages