INFO: rcu detected stall in io_playback_transfer

16 views
Skip to first unread message

syzbot

unread,
Apr 6, 2018, 3:09:02 AM4/6/18
to alsa-...@alsa-project.org, dan.ca...@oracle.com, linux-...@vger.kernel.org, mi...@kernel.org, o-ta...@sakamocchi.jp, pe...@perex.cz, syzkall...@googlegroups.com, ti...@suse.com, vi...@zeniv.linux.org.uk
Hello,

syzbot hit the following crash on upstream commit
e02d37bf55a9a36f22427fd6dd733fe104d817b6 (Thu Apr 5 17:42:07 2018 +0000)
Merge tag 'sound-4.17-rc1' of
git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound
syzbot dashboard link:
https://syzkaller.appspot.com/bug?extid=4f2016cf5185da7759dc

Unfortunately, I don't have any reproducer for this crash yet.
Raw console output:
https://syzkaller.appspot.com/x/log.txt?id=6609942245015552
Kernel config:
https://syzkaller.appspot.com/x/.config?id=-4805825610197092128
compiler: gcc (GCC) 8.0.1 20180301 (experimental)

IMPORTANT: if you fix the bug, please add the following tag to the commit:
Reported-by: syzbot+4f2016...@syzkaller.appspotmail.com
It will help syzbot understand when the bug is fixed. See footer for
details.
If you forward the report, please keep this part and the footer.

INFO: rcu_sched self-detected stall on CPU
1-....: (124999 ticks this GP) idle=212/1/4611686018427387906
softirq=57947/57947 fqs=31225
(t=125000 jiffies g=30827 c=30826 q=171)
NMI backtrace for cpu 1
CPU: 1 PID: 13330 Comm: syz-executor4 Not tainted 4.16.0+ #2
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
Google 01/01/2011
Call Trace:
<IRQ>
__dump_stack lib/dump_stack.c:17 [inline]
dump_stack+0x1b9/0x29f lib/dump_stack.c:53
nmi_cpu_backtrace.cold.4+0x19/0xce lib/nmi_backtrace.c:103
nmi_trigger_cpumask_backtrace+0x151/0x192 lib/nmi_backtrace.c:62
arch_trigger_cpumask_backtrace+0x14/0x20 arch/x86/kernel/apic/hw_nmi.c:38
trigger_single_cpu_backtrace include/linux/nmi.h:156 [inline]
rcu_dump_cpu_stacks+0x175/0x1c2 kernel/rcu/tree.c:1376
print_cpu_stall kernel/rcu/tree.c:1525 [inline]
check_cpu_stall.isra.61.cold.80+0x36c/0x59a kernel/rcu/tree.c:1593
__rcu_pending kernel/rcu/tree.c:3356 [inline]
rcu_pending kernel/rcu/tree.c:3401 [inline]
rcu_check_callbacks+0x21b/0xad0 kernel/rcu/tree.c:2763
update_process_times+0x2d/0x70 kernel/time/timer.c:1636
tick_sched_handle+0xa0/0x180 kernel/time/tick-sched.c:171
tick_sched_timer+0x42/0x130 kernel/time/tick-sched.c:1179
__run_hrtimer kernel/time/hrtimer.c:1337 [inline]
__hrtimer_run_queues+0x3e3/0x10a0 kernel/time/hrtimer.c:1399
hrtimer_interrupt+0x286/0x650 kernel/time/hrtimer.c:1457
local_apic_timer_interrupt arch/x86/kernel/apic/apic.c:1025 [inline]
smp_apic_timer_interrupt+0x15d/0x710 arch/x86/kernel/apic/apic.c:1050
apic_timer_interrupt+0xf/0x20 arch/x86/entry/entry_64.S:862
</IRQ>
RIP: 0010:snd_pcm_oss_write3+0xf6/0x220 sound/core/oss/pcm_oss.c:1236
RSP: 0018:ffff88018f57ed00 EFLAGS: 00000282 ORIG_RAX: ffffffffffffff13
RAX: ffffffffffffffe0 RBX: ffffffffffffffe0 RCX: 0000000000000000
RDX: 0000000000000000 RSI: 00000000ffffffe0 RDI: 00000000ffffffe0
RBP: ffff88018f57ed48 R08: ffff8801b29daaf8 R09: 0000000000000006
R10: ffff8801b29da280 R11: 0000000000000000 R12: 0000000000000001
R13: ffff8801ce8c56c0 R14: ffff8801aa45a300 R15: ffffffffffffffe0
io_playback_transfer+0x274/0x310 sound/core/oss/io.c:47
snd_pcm_plug_write_transfer+0x36c/0x470 sound/core/oss/pcm_plugin.c:619
snd_pcm_oss_write2+0x25c/0x460 sound/core/oss/pcm_oss.c:1365
snd_pcm_oss_sync1+0x332/0x5a0 sound/core/oss/pcm_oss.c:1606
snd_pcm_oss_sync.isra.29+0x790/0x980 sound/core/oss/pcm_oss.c:1682
snd_pcm_oss_release+0x214/0x290 sound/core/oss/pcm_oss.c:2559
__fput+0x34d/0x890 fs/file_table.c:209
____fput+0x15/0x20 fs/file_table.c:243
task_work_run+0x1e4/0x290 kernel/task_work.c:113
exit_task_work include/linux/task_work.h:22 [inline]
do_exit+0x1aee/0x2730 kernel/exit.c:865
do_group_exit+0x16f/0x430 kernel/exit.c:968
get_signal+0x886/0x1960 kernel/signal.c:2469
do_signal+0x90/0x2020 arch/x86/kernel/signal.c:810
exit_to_usermode_loop+0x28a/0x310 arch/x86/entry/common.c:162
prepare_exit_to_usermode arch/x86/entry/common.c:196 [inline]
syscall_return_slowpath arch/x86/entry/common.c:265 [inline]
do_syscall_64+0x792/0x9d0 arch/x86/entry/common.c:292
entry_SYSCALL_64_after_hwframe+0x42/0xb7
RIP: 0033:0x4552d9
RSP: 002b:00007f5c2a0e9c68 EFLAGS: 00000246 ORIG_RAX: 0000000000000023
RAX: fffffffffffffdfc RBX: 00007f5c2a0ea6d4 RCX: 00000000004552d9
RDX: 0000000000000000 RSI: 00000000200001c0 RDI: 0000000020000240
RBP: 000000000072c010 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 00000000ffffffff
R13: 0000000000000413 R14: 00000000006f9268 R15: 0000000000000002


---
This bug is generated by a dumb bot. It may contain errors.
See https://goo.gl/tpsmEJ for details.
Direct all questions to syzk...@googlegroups.com.

syzbot will keep track of this bug report.
If you forgot to add the Reported-by tag, once the fix for this bug is
merged
into any tree, please reply to this email with:
#syz fix: exact-commit-title
To mark this as a duplicate of another syzbot report, please reply with:
#syz dup: exact-subject-of-another-report
If it's a one-off invalid bug report, please reply with:
#syz invalid
Note: if the crash happens again, it will cause creation of a new bug
report.
Note: all commands must start from beginning of the line in the email body.

Tetsuo Handa

unread,
Apr 8, 2018, 8:59:30 AM4/8/18
to ti...@suse.com, vi...@zeniv.linux.org.uk, syzbot, alsa-...@alsa-project.org, dan.ca...@oracle.com, linux-...@vger.kernel.org, mi...@kernel.org, o-ta...@sakamocchi.jp, pe...@perex.cz, syzkall...@googlegroups.com
I manually simplified the reproducer.

It is quite strange that removing unshare() hides this lockup bug.
Also, explicitly closing by "close()" hides this lockup bug.
Triggering "fput() from do_exit()" from "different namespace"
somehow affects this lockup bug?

----------------------------------------
#define _GNU_SOURCE
#include <fcntl.h>
#include <sched.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <linux/soundcard.h>

int main(int argc, char *argv[])
{
const int fd = open("/dev/dsp1", O_RDWR);
int frag = (0 << 16) | 0;
char buf[48] = { };
unshare(CLONE_NEWNS);
ioctl(fd, SNDCTL_DSP_SETFRAGMENT, &frag);
write(fd, buf, sizeof(buf));
return 0;
}
----------------------------------------

Dmitry Vyukov

unread,
Apr 8, 2018, 9:10:17 AM4/8/18
to Tetsuo Handa, Takashi Iwai, Al Viro, syzbot, alsa-...@alsa-project.org, Dan Carpenter, LKML, Ingo Molnar, Takashi Sakamoto, Jaroslav Kysela, syzkall...@googlegroups.com
Takashi has already fixed this. See this thread:
https://groups.google.com/forum/#!searchin/syzkaller-bugs/%22INFO$3A$20rcu$20detected$20stall$20in$20snd_pcm_oss_write3$20(2)%22%7Csort:date/syzkaller-bugs/D2xWV7WTRDk/sCw_gImmCAAJ

The fix commit includes tag this bug:
Reported-by: syzbot+4f2016...@syzkaller.appspotmail.com
But it's just that it's mentioned in the thread for another bug.

Tetsuo Handa

unread,
Apr 8, 2018, 9:26:55 AM4/8/18
to dvy...@google.com, ti...@suse.com, vi...@zeniv.linux.org.uk, syzbot+4f2016...@syzkaller.appspotmail.com, alsa-...@alsa-project.org, dan.ca...@oracle.com, linux-...@vger.kernel.org, mi...@kernel.org, o-ta...@sakamocchi.jp, pe...@perex.cz, syzkall...@googlegroups.com
Yes, I noticed it just before I post this. But the reason I posted this anyway
is that I feel that there is some other bug revealed by this reproducer.
How can Takashi' patch explain my observation (with linux-next-20180406) ?

/* Hits the stall */
int main(int argc, char *argv[])
{
const int fd = open("/dev/dsp1", O_RDWR);
int frag = (0 << 16) | 0;
char buf[48] = { };
unshare(CLONE_NEWNS);
ioctl(fd, SNDCTL_DSP_SETFRAGMENT, &frag);
write(fd, buf, sizeof(buf));
return 0;
}

/* Does not hit the stall */
int main(int argc, char *argv[])
{
const int fd = open("/dev/dsp1", O_RDWR);
int frag = (0 << 16) | 0;
char buf[48] = { };
ioctl(fd, SNDCTL_DSP_SETFRAGMENT, &frag);
write(fd, buf, sizeof(buf));
return 0;
}

/* Does not hit the stall */
int main(int argc, char *argv[])
{
const int fd = open("/dev/dsp1", O_RDWR);
int frag = (0 << 16) | 0;
char buf[48] = { };
unshare(CLONE_NEWNS);
ioctl(fd, SNDCTL_DSP_SETFRAGMENT, &frag);
write(fd, buf, sizeof(buf));
close(fd);
return 0;
}

Takashi Iwai

unread,
Apr 9, 2018, 8:59:45 AM4/9/18
to Tetsuo Handa, dvy...@google.com, alsa-...@alsa-project.org, syzkall...@googlegroups.com, mi...@kernel.org, dan.ca...@oracle.com, pe...@perex.cz, o-ta...@sakamocchi.jp, syzbot+4f2016...@syzkaller.appspotmail.com, linux-...@vger.kernel.org, vi...@zeniv.linux.org.uk
On Sun, 08 Apr 2018 15:26:21 +0200,
Tetsuo Handa wrote:
>
> Dmitry Vyukov wrote:
> > Takashi has already fixed this. See this thread:
> > https://groups.google.com/forum/#!searchin/syzkaller-bugs/%22INFO$3A$20rcu$20detected$20stall$20in$20snd_pcm_oss_write3$20(2)%22%7Csort:date/syzkaller-bugs/D2xWV7WTRDk/sCw_gImmCAAJ
>
> Yes, I noticed it just before I post this. But the reason I posted this anyway
> is that I feel that there is some other bug revealed by this reproducer.
> How can Takashi' patch explain my observation (with linux-next-20180406) ?

The bug is triggered by any buffer underrun of a PCM stream via OSS
emulation, so the condition can vary in any form. That is, if an app
doesn't give the enough amount in time, the PCM goes to XRUN state.
Then at closing, the driver tries to sync the pending data and hits
the bug.


Takashi
Reply all
Reply to author
Forward
0 new messages