[syzbot] INFO: task hung in do_proc

syzbot

unread,

Aug 28, 2021, 11:52:19 AM8/28/21

to gre...@linuxfoundation.org, jo...@kernel.org, linux-...@vger.kernel.org, linu...@vger.kernel.org, mathia...@linux.intel.com, st...@rowland.harvard.edu, syzkall...@googlegroups.com

Hello,

syzbot found the following issue on:

HEAD commit: d5ae8d7f85b7 Revert "media: dvb header files: move some he..
git tree: upstream
console output: https://syzkaller.appspot.com/x/log.txt?x=155fa21e300000
kernel config: https://syzkaller.appspot.com/x/.config?x=2fd902af77ff1e56
dashboard link: https://syzkaller.appspot.com/bug?extid=ada0f7d3d9fd2016d927
compiler: Debian clang version 11.0.1-2, GNU ld (GNU Binutils for Debian) 2.35.1
syz repro: https://syzkaller.appspot.com/x/repro.syz?x=11a1ab0e300000

Bisection is inconclusive: the issue happens on the oldest tested release.

bisection log: https://syzkaller.appspot.com/x/bisect.txt?x=15662ee1300000
final oops: https://syzkaller.appspot.com/x/report.txt?x=17662ee1300000
console output: https://syzkaller.appspot.com/x/log.txt?x=13662ee1300000

IMPORTANT: if you fix the issue, please add the following tag to the commit:
Reported-by: syzbot+ada0f7...@syzkaller.appspotmail.com

INFO: task syz-executor.0:8700 blocked for more than 143 seconds.
Not tainted 5.14.0-rc7-syzkaller #0
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
task:syz-executor.0 state:D stack:23192 pid: 8700 ppid: 8455 flags:0x00004004
Call Trace:
context_switch kernel/sched/core.c:4681 [inline]
__schedule+0xc07/0x11f0 kernel/sched/core.c:5938
schedule+0x14b/0x210 kernel/sched/core.c:6017
schedule_timeout+0x98/0x2f0 kernel/time/timer.c:1857
do_wait_for_common+0x2da/0x480 kernel/sched/completion.c:85
__wait_for_common kernel/sched/completion.c:106 [inline]
wait_for_common kernel/sched/completion.c:117 [inline]
wait_for_completion_timeout+0x46/0x60 kernel/sched/completion.c:157
usb_start_wait_urb+0x167/0x550 drivers/usb/core/message.c:63
do_proc_bulk+0x978/0x1080 drivers/usb/core/devio.c:1236
proc_bulk drivers/usb/core/devio.c:1273 [inline]
usbdev_do_ioctl drivers/usb/core/devio.c:2547 [inline]
usbdev_ioctl+0x3441/0x6b10 drivers/usb/core/devio.c:2713
vfs_ioctl fs/ioctl.c:51 [inline]
__do_sys_ioctl fs/ioctl.c:1069 [inline]
__se_sys_ioctl+0xfb/0x170 fs/ioctl.c:1055
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3d/0xb0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x44/0xae
RIP: 0033:0x4665e9
RSP: 002b:00007f15a90dc188 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
RAX: ffffffffffffffda RBX: 000000000056bf80 RCX: 00000000004665e9
RDX: 0000000020000340 RSI: 00000000c0185502 RDI: 0000000000000004
RBP: 00000000004bfcc4 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 000000000056bf80
R13: 00007ffcc552e1bf R14: 00007f15a90dc300 R15: 0000000000022000

Showing all locks held in the system:
1 lock held by khungtaskd/1610:
#0: ffffffff8c717ec0 (rcu_read_lock){....}-{1:2}, at: rcu_lock_acquire+0x0/0x30 arch/x86/pci/mmconfig_64.c:151
1 lock held by in:imklog/8130:
#0: ffff888035998870 (&f->f_pos_lock){+.+.}-{3:3}, at: __fdget_pos+0x24e/0x2f0 fs/file.c:974

=============================================

NMI backtrace for cpu 1
CPU: 1 PID: 1610 Comm: khungtaskd Not tainted 5.14.0-rc7-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0x1d3/0x29f lib/dump_stack.c:105
nmi_cpu_backtrace+0x16c/0x190 lib/nmi_backtrace.c:105
nmi_trigger_cpumask_backtrace+0x191/0x2f0 lib/nmi_backtrace.c:62
trigger_all_cpu_backtrace include/linux/nmi.h:146 [inline]
check_hung_uninterruptible_tasks kernel/hung_task.c:210 [inline]
watchdog+0xd06/0xd50 kernel/hung_task.c:295
kthread+0x453/0x480 kernel/kthread.c:319
ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:295
Sending NMI from CPU 1 to CPUs 0:
NMI backtrace for cpu 0
CPU: 0 PID: 10 Comm: kworker/u4:1 Not tainted 5.14.0-rc7-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Workqueue: bat_events batadv_nc_worker
RIP: 0010:lock_is_held_type+0x1/0x180 kernel/locking/lockdep.c:5653
Code: 74 df 83 3d d8 22 e1 03 00 75 d6 48 c7 c7 80 c3 2e 8a 48 c7 c6 c0 c3 2e 8a 31 c0 e8 c9 8d 72 f7 0f 0b eb bd e8 90 fd ff ff 55 <41> 57 41 56 41 55 41 54 53 48 83 ec 10 65 48 8b 04 25 28 00 00 00
RSP: 0018:ffffc90000cf7990 EFLAGS: 00000202
RAX: 1ffffffff18e3801 RBX: 1ffff9200019ef34 RCX: 0000000080000001
RDX: 0000000000000000 RSI: 00000000ffffffff RDI: ffffffff8c717e20
RBP: ffffc90000cf7a28 R08: dffffc0000000000 R09: fffffbfff1b74ee6
R10: fffffbfff1b74ee6 R11: 0000000000000000 R12: dffffc0000000000
R13: 1ffff9200019ef58 R14: dffffc0000000000 R15: 0000000000000000
FS: 0000000000000000(0000) GS:ffff8880b9c00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f0a1e997000 CR3: 000000000c48e000 CR4: 00000000001506f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
lock_is_held include/linux/lockdep.h:283 [inline]
rcu_read_lock_sched_held+0x87/0x110 kernel/rcu/update.c:125
trace_lock_acquire+0x59/0x190 include/trace/events/lock.h:13
lock_acquire+0xa4/0x4a0 kernel/locking/lockdep.c:5596
rcu_lock_acquire+0x2a/0x30 include/linux/rcupdate.h:267
rcu_read_lock include/linux/rcupdate.h:687 [inline]
batadv_nc_purge_orig_hash net/batman-adv/network-coding.c:404 [inline]
batadv_nc_worker+0xc8/0x5b0 net/batman-adv/network-coding.c:715
process_one_work+0x833/0x10c0 kernel/workqueue.c:2276
worker_thread+0xac1/0x1320 kernel/workqueue.c:2422
kthread+0x453/0x480 kernel/kthread.c:319
ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:295
----------------
Code disassembly (best guess):
0: 74 df je 0xffffffe1
2: 83 3d d8 22 e1 03 00 cmpl $0x0,0x3e122d8(%rip) # 0x3e122e1
9: 75 d6 jne 0xffffffe1
b: 48 c7 c7 80 c3 2e 8a mov $0xffffffff8a2ec380,%rdi
12: 48 c7 c6 c0 c3 2e 8a mov $0xffffffff8a2ec3c0,%rsi
19: 31 c0 xor %eax,%eax
1b: e8 c9 8d 72 f7 callq 0xf7728de9
20: 0f 0b ud2
22: eb bd jmp 0xffffffe1
24: e8 90 fd ff ff callq 0xfffffdb9
29: 55 push %rbp
* 2a: 41 57 push %r15 <-- trapping instruction
2c: 41 56 push %r14
2e: 41 55 push %r13
30: 41 54 push %r12
32: 53 push %rbx
33: 48 83 ec 10 sub $0x10,%rsp
37: 65 48 8b 04 25 28 00 mov %gs:0x28,%rax
3e: 00 00

---
This report is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at syzk...@googlegroups.com.

syzbot will keep track of this issue. See:
https://goo.gl/tpsmEJ#status for how to communicate with syzbot.
For information about bisection process see: https://goo.gl/tpsmEJ#bisection
syzbot can test patches for this issue, for details see:
https://goo.gl/tpsmEJ#testing-patches

Alan Stern

unread,

Aug 28, 2021, 2:04:01 PM8/28/21

to syzbot, gre...@linuxfoundation.org, jo...@kernel.org, linux-...@vger.kernel.org, linu...@vger.kernel.org, mathia...@linux.intel.com, syzkall...@googlegroups.com

Looks like the wait needs to be interruptible.

Alan Stern

#syz test: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git d5ae8d7f85b7

Index: usb-devel/drivers/usb/core/message.c
===================================================================
--- usb-devel.orig/drivers/usb/core/message.c
+++ usb-devel/drivers/usb/core/message.c
@@ -60,12 +60,18 @@ static int usb_start_wait_urb(struct urb
goto out;

expire = timeout ? msecs_to_jiffies(timeout) : MAX_SCHEDULE_TIMEOUT;
- if (!wait_for_completion_timeout(&ctx.done, expire)) {
+ retval = wait_for_completion_interruptible_timeout(&ctx.done, expire);
+ if (retval <= 0) {
usb_kill_urb(urb);
- retval = (ctx.status == -ENOENT ? -ETIMEDOUT : ctx.status);
+ if (retval < 0)
+ retval = -EINTR;
+ else if (ctx.status == -ENOENT)
+ retval = -ETIMEDOUT;
+ else
+ retval = ctx.status;

dev_dbg(&urb->dev->dev,
- "%s timed out on ep%d%s len=%u/%u\n",
+ "%s timed out or interrupted on ep%d%s len=%u/%u\n",
current->comm,
usb_endpoint_num(&urb->ep->desc),
usb_urb_dir_in(urb) ? "in" : "out",

syzbot

unread,

Aug 28, 2021, 4:05:12 PM8/28/21

to gre...@linuxfoundation.org, jo...@kernel.org, linux-...@vger.kernel.org, linu...@vger.kernel.org, mathia...@linux.intel.com, st...@rowland.harvard.edu, syzkall...@googlegroups.com

Hello,

syzbot has tested the proposed patch and the reproducer did not trigger any issue:

Reported-and-tested-by: syzbot+ada0f7...@syzkaller.appspotmail.com

Tested on:

commit: d5ae8d7f Revert "media: dvb header files: move some he..
git tree: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

kernel config: https://syzkaller.appspot.com/x/.config?x=2fd902af77ff1e56
dashboard link: https://syzkaller.appspot.com/bug?extid=ada0f7d3d9fd2016d927
compiler: Debian clang version 11.0.1-2, GNU ld (GNU Binutils for Debian) 2.35.1

patch: https://syzkaller.appspot.com/x/patch.diff?x=16c2799d300000

Note: testing is done by a robot and is best-effort only.

Alan Stern

unread,

Aug 28, 2021, 9:58:26 PM8/28/21

to Greg KH, syzbot, syzkall...@googlegroups.com, USB mailing list

usb_start_wait_urb() can be called from user processes by means of the
USBDEVFS_BULK and USBDEVFS_CONTROL ioctls in usbfs. Consequently it
should not contain an uninterruptible wait of arbitrarily long length
(the timeout value is specified here by the user, so it can be
practically anything). Doing so leads the kernel to complain about
"Task X blocked for more than N seconds", as found in testing by
syzbot:

INFO: task syz-executor.0:8700 blocked for more than 143 seconds.
Not tainted 5.14.0-rc7-syzkaller #0
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
task:syz-executor.0 state:D stack:23192 pid: 8700 ppid: 8455 flags:0x00004004
Call Trace:
context_switch kernel/sched/core.c:4681 [inline]
__schedule+0xc07/0x11f0 kernel/sched/core.c:5938
schedule+0x14b/0x210 kernel/sched/core.c:6017
schedule_timeout+0x98/0x2f0 kernel/time/timer.c:1857
do_wait_for_common+0x2da/0x480 kernel/sched/completion.c:85
__wait_for_common kernel/sched/completion.c:106 [inline]
wait_for_common kernel/sched/completion.c:117 [inline]
wait_for_completion_timeout+0x46/0x60 kernel/sched/completion.c:157
usb_start_wait_urb+0x167/0x550 drivers/usb/core/message.c:63
do_proc_bulk+0x978/0x1080 drivers/usb/core/devio.c:1236

proc_bulk drivers/usb/core/devio.c:1273 [inline]
usbdev_do_ioctl drivers/usb/core/devio.c:2547 [inline]
usbdev_ioctl+0x3441/0x6b10 drivers/usb/core/devio.c:2713

...

This patch fixes the problem by converting the uninterruptible wait to
an interruptible one. For the most part this won't affect calls to
usb_start_wait_urb(), because they are made by kernel threads and so
can't receive most signals.

But in some cases such calls may occur in user threads in contexts
other than usbfs ioctls. A signal in these circumstances could cause
a USB transfer to fail when otherwise it wouldn't. The outcome
wouldn't be too dreadful, since USB transfers can fail at any time and
the system is prepared to handle these failures gracefully. In
theory, for example, a signal might cause a driver's probe routine to
fail; in practice if the user doesn't want a probe to fail then he
shouldn't send interrupt signals to the probing process.

Overall, then, making these delays interruptible seems to be an
acceptable risk.

Signed-off-by: Alan Stern <st...@rowland.harvard.edu>
Reported-and-tested-by: syzbot+ada0f7...@syzkaller.appspotmail.com
CC: sta...@vger.kernel.org

---

[as1964]

drivers/usb/core/message.c | 12 +++++++++---
1 file changed, 9 insertions(+), 3 deletions(-)

Index: usb-devel/drivers/usb/core/message.c
===================================================================
--- usb-devel.orig/drivers/usb/core/message.c
+++ usb-devel/drivers/usb/core/message.c
@@ -60,12 +60,18 @@ static int usb_start_wait_urb(struct urb
goto out;

expire = timeout ? msecs_to_jiffies(timeout) : MAX_SCHEDULE_TIMEOUT;
- if (!wait_for_completion_timeout(&ctx.done, expire)) {
+ retval = wait_for_completion_interruptible_timeout(&ctx.done, expire);
+ if (retval <= 0) {
usb_kill_urb(urb);
- retval = (ctx.status == -ENOENT ? -ETIMEDOUT : ctx.status);

+ if (ctx.status != -ENOENT) /* URB already completed */
+ retval = ctx.status;
+ else if (retval == 0)

+ retval = -ETIMEDOUT;
+ else

+ retval = -EINTR;

Johan Hovold

unread,

Aug 30, 2021, 3:57:01 AM8/30/21

to Alan Stern, Greg KH, syzbot, syzkall...@googlegroups.com, USB mailing list

While probe() triggered through sysfs or by module loading is one
example, the USB msg helpers are also called in a lot of other
user-thread contexts such as open() calls etc. It might even be that the
majority of these calls can be done from user threads (post
enumeration).

> Overall, then, making these delays interruptible seems to be an
> acceptable risk.

Possibly, but changing the API like this to fix the usbfs ioctls seems
like using a bit of a too big hammer to me, especially when backporting
to stable.

> Signed-off-by: Alan Stern <st...@rowland.harvard.edu>
> Reported-and-tested-by: syzbot+ada0f7...@syzkaller.appspotmail.com
> CC: sta...@vger.kernel.org

Johan

Alan Stern

unread,

Aug 30, 2021, 10:46:15 AM8/30/21

to Johan Hovold, Greg KH, syzbot, syzkall...@googlegroups.com, USB mailing list

On Mon, Aug 30, 2021 at 09:56:50AM +0200, Johan Hovold wrote:
> On Sat, Aug 28, 2021 at 09:58:25PM -0400, Alan Stern wrote:
> > This patch fixes the problem by converting the uninterruptible wait to
> > an interruptible one. For the most part this won't affect calls to
> > usb_start_wait_urb(), because they are made by kernel threads and so
> > can't receive most signals.
> >
> > But in some cases such calls may occur in user threads in contexts
> > other than usbfs ioctls. A signal in these circumstances could cause
> > a USB transfer to fail when otherwise it wouldn't. The outcome
> > wouldn't be too dreadful, since USB transfers can fail at any time and
> > the system is prepared to handle these failures gracefully. In
> > theory, for example, a signal might cause a driver's probe routine to
> > fail; in practice if the user doesn't want a probe to fail then he
> > shouldn't send interrupt signals to the probing process.
>
> While probe() triggered through sysfs or by module loading is one
> example, the USB msg helpers are also called in a lot of other
> user-thread contexts such as open() calls etc. It might even be that the
> majority of these calls can be done from user threads (post
> enumeration).

Could be. It's not a well defined matter; it depends on what drivers
are in use and how they are used.

Consider that a control message in a driver is likely to use the
default USB_CTRL_[GS]ET_TIMEOUT value of 5 seconds. Does it make sense
to allow uninterruptible wait states to last as long as that?

And to what extent does it matter if we make these delays
interruptible? A signal delivered during a system call will be fielded
when the call returns if not earlier; the only difference will be that
now some USB messages may be aborted. For things like SIGINT or
SIGTERM this seems reasonable. (I'm not so sure about things like
SIGALRM, SIGIO, or SIGSTOP, though.)

> > Overall, then, making these delays interruptible seems to be an
> > acceptable risk.
>
> Possibly, but changing the API like this to fix the usbfs ioctls seems
> like using a bit of a too big hammer to me, especially when backporting
> to stable.

Perhaps the stable backport could be delayed for a while (say, one
release cycle).

Do you have alternative suggestions? I don't think we want special
interruptible versions of usb_control_msg() and usb_bulk_msg() just for
use by usbfs.

Alan Stern

Oliver Neukum

unread,

Aug 30, 2021, 11:12:00 AM8/30/21

to Alan Stern, Johan Hovold, Greg KH, syzbot, syzkall...@googlegroups.com, USB mailing list

On 30.08.21 16:46, Alan Stern wrote:
> Do you have alternative suggestions? I don't think we want special
> interruptible versions of usb_control_msg() and usb_bulk_msg() just for
> use by usbfs.

Hi,

why not just pass a flag?

Regards
Oliver

Alan Stern

unread,

Aug 30, 2021, 12:09:16 PM8/30/21

to Oliver Neukum, Johan Hovold, Greg KH, syzbot, syzkall...@googlegroups.com, USB mailing list

We could. But you're ignoring the question I asked earlier in that
email: Is a 5-second uninterruptible delay (the default USB control
timeout) acceptable in general?

Alan Stern

Oliver Neukum

unread,

Aug 31, 2021, 4:53:08 AM8/31/21

to Alan Stern, Johan Hovold, Greg KH, syzbot, syzkall...@googlegroups.com, USB mailing list

We cannot change the five seconds, so this boils down to whether
we can always handle signals when we need to send control messages.
The answer to that is negative.

Suspend/resume, block IO, probe, ioctl(). This list will be rather long.

Regards
Oliver

Johan Hovold

unread,

Aug 31, 2021, 5:14:08 AM8/31/21

to Alan Stern, Greg KH, syzbot, syzkall...@googlegroups.com, USB mailing list

On Mon, Aug 30, 2021 at 10:46:13AM -0400, Alan Stern wrote:
> On Mon, Aug 30, 2021 at 09:56:50AM +0200, Johan Hovold wrote:
> > On Sat, Aug 28, 2021 at 09:58:25PM -0400, Alan Stern wrote:
> > > This patch fixes the problem by converting the uninterruptible wait to
> > > an interruptible one. For the most part this won't affect calls to
> > > usb_start_wait_urb(), because they are made by kernel threads and so
> > > can't receive most signals.
> > >
> > > But in some cases such calls may occur in user threads in contexts
> > > other than usbfs ioctls. A signal in these circumstances could cause
> > > a USB transfer to fail when otherwise it wouldn't. The outcome
> > > wouldn't be too dreadful, since USB transfers can fail at any time and
> > > the system is prepared to handle these failures gracefully. In
> > > theory, for example, a signal might cause a driver's probe routine to
> > > fail; in practice if the user doesn't want a probe to fail then he
> > > shouldn't send interrupt signals to the probing process.
> >
> > While probe() triggered through sysfs or by module loading is one
> > example, the USB msg helpers are also called in a lot of other
> > user-thread contexts such as open() calls etc. It might even be that the
> > majority of these calls can be done from user threads (post
> > enumeration).
>
> Could be. It's not a well defined matter; it depends on what drivers
> are in use and how they are used.

Right, but the commit message seemed to suggest that these helpers being
run from interruptible threads was the exception.

> Consider that a control message in a driver is likely to use the
> default USB_CTRL_[GS]ET_TIMEOUT value of 5 seconds. Does it make sense
> to allow uninterruptible wait states to last as long as that?

Perhaps sometimes? I don't have a use case at hand, but I haven't
reviewed all drivers either.

The comment above usb_start_wait_urb() (which also needs to be updated,
by the way) even suggests that drivers should "implement their own
interruptible routines" so perhaps this has just gone unnoticed for 20
odd years. And the question then becomes, why didn't we use
interruptible sleep from the start?

And trying to answer that I find that that's precisely what we did, but
for some reason it was changed to uninterruptible sleep in v2.4.11
without a motivation (that I could easily find spelled out).

> And to what extent does it matter if we make these delays
> interruptible? A signal delivered during a system call will be fielded
> when the call returns if not earlier; the only difference will be that
> now some USB messages may be aborted. For things like SIGINT or
> SIGTERM this seems reasonable. (I'm not so sure about things like
> SIGALRM, SIGIO, or SIGSTOP, though.)

I'm not saying I'm necessarily against the change. It just seemed a bit
rushed to change the (stable) API like this while claiming it won't
affect most call sites.

> > > Overall, then, making these delays interruptible seems to be an
> > > acceptable risk.
> >
> > Possibly, but changing the API like this to fix the usbfs ioctls seems
> > like using a bit of a too big hammer to me, especially when backporting
> > to stable.
>
> Perhaps the stable backport could be delayed for a while (say, one
> release cycle).

That might work.

> Do you have alternative suggestions? I don't think we want special
> interruptible versions of usb_control_msg() and usb_bulk_msg() just for
> use by usbfs.

usbfs could carry a temporary local implementation as the documentation
for usb_start_wait_urb() currently suggests. I assume we can't limit the
usbfs timeouts.

Johan

Oliver Neukum

unread,

Aug 31, 2021, 6:47:57 AM8/31/21

to Johan Hovold, Alan Stern, Greg KH, syzbot, syzkall...@googlegroups.com, USB mailing list

On 31.08.21 11:13, Johan Hovold wrote:
> The comment above usb_start_wait_urb() (which also needs to be updated,
> by the way) even suggests that drivers should "implement their own
> interruptible routines" so perhaps this has just gone unnoticed for 20
> odd years. And the question then becomes, why didn't we use
> interruptible sleep from the start?
>
> And trying to answer that I find that that's precisely what we did, but
> for some reason it was changed to uninterruptible sleep in v2.4.11
> without a motivation (that I could easily find spelled out).

I must admit that I do not remember. But there are a lot of situations
requiring control messages that do not allow signal delivery.

Take for example a device that is HID and storage. We are doing
HID error handling, which can involve a device reset. You absolutely
cannot deliver a signal right now, as you have a device that is in an
undefined
state in the block layer.

It looks to me very much like we need both versions and as a rule of thumb,
while you would use GFP_NOIO you must also use the uninterruptible
messaging.

Regards
Oliver

Oliver Neukum

unread,

Aug 31, 2021, 7:02:17 AM8/31/21

to Johan Hovold, Alan Stern, Greg KH, syzbot, syzkall...@googlegroups.com, USB mailing list

Upon further considerations forcing user space to handle signals also
breaks the API, albeit in a more subtle manner. I'd suggest that we use
wait_event_killable_timeout(). And do it the way Alan initially disliked,
that is code a version for use by usbfs.

Thus we'd avoid the issue of having an unkillable process, but we do
not impose a need to handle signals.

Regards
Oliver

Johan Hovold

unread,

Aug 31, 2021, 7:10:43 AM8/31/21

to Alan Stern, Greg KH, syzbot, syzkall...@googlegroups.com, USB mailing list

On Tue, Aug 31, 2021 at 11:13:59AM +0200, Johan Hovold wrote:
> On Mon, Aug 30, 2021 at 10:46:13AM -0400, Alan Stern wrote:

> > Consider that a control message in a driver is likely to use the
> > default USB_CTRL_[GS]ET_TIMEOUT value of 5 seconds. Does it make sense
> > to allow uninterruptible wait states to last as long as that?
>
> Perhaps sometimes? I don't have a use case at hand, but I haven't
> reviewed all drivers either.
>
> The comment above usb_start_wait_urb() (which also needs to be updated,
> by the way) even suggests that drivers should "implement their own
> interruptible routines" so perhaps this has just gone unnoticed for 20
> odd years. And the question then becomes, why didn't we use
> interruptible sleep from the start?
>
> And trying to answer that I find that that's precisely what we did, but
> for some reason it was changed to uninterruptible sleep in v2.4.11
> without a motivation (that I could easily find spelled out).

Here it is:

https://lore.kernel.org/lkml/2001081801...@devserv.devel.redhat.com/

It's rationale does not seem valid anymore (i.e. the NULL deref), but
the example is still instructive.

If you close for example a v4l application you still expect the device
to be shut down orderly but with interruptible sleep all control
requests during shutdown will be aborted immediately.

Similar for USB serial devices which would for example fail to deassert
DTR/RTS, etc. (I just verified with the patch applied.)

Johan

Alan Stern

unread,

Aug 31, 2021, 1:03:40 PM8/31/21

to Johan Hovold, Oliver Neukum, Greg KH, syzbot, syzkall...@googlegroups.com, USB mailing list

On Tue, Aug 31, 2021 at 01:02:11PM +0200, Oliver Neukum wrote:
> Upon further considerations forcing user space to handle signals also
> breaks the API, albeit in a more subtle manner. I'd suggest that we use
> wait_event_killable_timeout(). And do it the way Alan initially disliked,
> that is code a version for use by usbfs.
>
> Thus we'd avoid the issue of having an unkillable process, but we do
> not impose a need to handle signals.

Okay, I'll play it safe and rewrite the patch, adding special-purpose
routines just for usbfs.

Will wait_event_killable_timeout() prevent complaints about tasks being
blocked for too long (the reason syzbot reported this in the first
place)?

Alan Stern

Oliver Neukum

unread,

Sep 1, 2021, 4:16:29 AM9/1/21

to Alan Stern, Johan Hovold, Greg KH, syzbot, syzkall...@googlegroups.com, USB mailing list

On 31.08.21 19:03, Alan Stern wrote:
>
> Will wait_event_killable_timeout() prevent complaints about tasks being
> blocked for too long (the reason syzbot reported this in the first
> place)?

Very good question. TASK_KILLABLE is its own task state, so I'd say
if it doesn't the test suite needs to be fixed.

Regards
Oliver

Alan Stern

unread,

Sep 2, 2021, 4:04:09 PM9/2/21

to syzbot, linux-...@vger.kernel.org, linu...@vger.kernel.org, syzkall...@googlegroups.com

Let's see if making the wait killable rather than interruptible fixes the
issue. This patch avoids modifying the regular usb_start_wait_urb(),
creating a separate usbfs_start_wait_urb() just for this purpose.

Alan Stern

#syz test: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git d5ae8d7f85b7

Index: usb-devel/drivers/usb/core/devio.c
===================================================================
--- usb-devel.orig/drivers/usb/core/devio.c
+++ usb-devel/drivers/usb/core/devio.c
@@ -32,6 +32,7 @@
#include <linux/usb.h>
#include <linux/usbdevice_fs.h>
#include <linux/usb/hcd.h> /* for usbcore internals */
+#include <linux/usb/quirks.h>
#include <linux/cdev.h>
#include <linux/notifier.h>
#include <linux/security.h>
@@ -1102,14 +1103,55 @@ static int usbdev_release(struct inode *
return 0;
}

+static void usbfs_blocking_completion(struct urb *urb)
+{
+ complete((struct completion *) urb->context);
+}
+
+/*
+ * Much like usb_start_wait_urb, but returns status separately from
+ * actual_length and uses a killable wait.
+ */
+static int usbfs_start_wait_urb(struct urb *urb, int timeout,
+ unsigned int *actlen)
+{
+ DECLARE_COMPLETION_ONSTACK(ctx);
+ unsigned long expire;
+ int rc;
+
+ urb->context = &ctx;
+ urb->complete = usbfs_blocking_completion;
+ *actlen = 0;
+ rc = usb_submit_urb(urb, GFP_KERNEL);
+ if (unlikely(rc))
+ return rc;
+
+ expire = (timeout ? msecs_to_jiffies(timeout) : MAX_SCHEDULE_TIMEOUT);
+ rc = wait_for_completion_killable_timeout(&ctx, expire);
+ if (rc <= 0) {
+ usb_kill_urb(urb);
+ *actlen = urb->actual_length;
+ if (urb->status != -ENOENT)
+ ; /* Completed before it was killed */
+ else if (rc < 0)
+ return -EINTR;
+ else
+ return -ETIMEDOUT;
+ }
+ *actlen = urb->actual_length;
+ return urb->status;
+}
+
static int do_proc_control(struct usb_dev_state *ps,
struct usbdevfs_ctrltransfer *ctrl)
{
struct usb_device *dev = ps->dev;
unsigned int tmo;
unsigned char *tbuf;
- unsigned wLength;
+ unsigned wLength, actlen;
int i, pipe, ret;
+ struct urb *urb = NULL;
+ struct usb_ctrlrequest *dr = NULL;

ret = check_ctrlrecip(ps, ctrl->bRequestType, ctrl->bRequest,
ctrl->wIndex);
@@ -1122,51 +1164,63 @@ static int do_proc_control(struct usb_de
sizeof(struct usb_ctrlrequest));
if (ret)
return ret;
+
+ ret = -ENOMEM;
tbuf = (unsigned char *)__get_free_page(GFP_KERNEL);
- if (!tbuf) {
- ret = -ENOMEM;
+ if (!tbuf)
goto done;
- }
+ urb = usb_alloc_urb(0, GFP_NOIO);
+ if (!urb)
+ goto done;
+ dr = kmalloc(sizeof(struct usb_ctrlrequest), GFP_NOIO);
+ if (!dr)
+ goto done;
+
+ dr->bRequestType = ctrl->bRequestType;
+ dr->bRequest = ctrl->bRequest;
+ dr->wValue = cpu_to_le16(ctrl->wValue);
+ dr->wIndex = cpu_to_le16(ctrl->wIndex);
+ dr->wLength = cpu_to_le16(ctrl->wLength);
+
tmo = ctrl->timeout;
snoop(&dev->dev, "control urb: bRequestType=%02x "
"bRequest=%02x wValue=%04x "
"wIndex=%04x wLength=%04x\n",
ctrl->bRequestType, ctrl->bRequest, ctrl->wValue,
ctrl->wIndex, ctrl->wLength);
- if ((ctrl->bRequestType & USB_DIR_IN) && ctrl->wLength) {
+
+ if ((ctrl->bRequestType & USB_DIR_IN) && wLength) {
pipe = usb_rcvctrlpipe(dev, 0);
- snoop_urb(dev, NULL, pipe, ctrl->wLength, tmo, SUBMIT, NULL, 0);
+ usb_fill_control_urb(urb, dev, pipe, (unsigned char *) dr, tbuf,
+ wLength, NULL, NULL);
+ snoop_urb(dev, NULL, pipe, wLength, tmo, SUBMIT, NULL, 0);

usb_unlock_device(dev);
- i = usb_control_msg(dev, pipe, ctrl->bRequest,
- ctrl->bRequestType, ctrl->wValue, ctrl->wIndex,
- tbuf, ctrl->wLength, tmo);
+ i = usbfs_start_wait_urb(urb, tmo, &actlen);
usb_lock_device(dev);
- snoop_urb(dev, NULL, pipe, max(i, 0), min(i, 0), COMPLETE,
- tbuf, max(i, 0));
- if ((i > 0) && ctrl->wLength) {
- if (copy_to_user(ctrl->data, tbuf, i)) {
+ snoop_urb(dev, NULL, pipe, actlen, i, COMPLETE, tbuf, actlen);
+ if (!i && actlen) {
+ if (copy_to_user(ctrl->data, tbuf, actlen)) {
ret = -EFAULT;
- goto done;
+ goto recv_fault;
}
}
} else {
- if (ctrl->wLength) {
- if (copy_from_user(tbuf, ctrl->data, ctrl->wLength)) {
+ if (wLength) {
+ if (copy_from_user(tbuf, ctrl->data, wLength)) {
ret = -EFAULT;
goto done;
}
}
pipe = usb_sndctrlpipe(dev, 0);
- snoop_urb(dev, NULL, pipe, ctrl->wLength, tmo, SUBMIT,
- tbuf, ctrl->wLength);
+ usb_fill_control_urb(urb, dev, pipe, (unsigned char *) dr, tbuf,
+ wLength, NULL, NULL);
+ snoop_urb(dev, NULL, pipe, wLength, tmo, SUBMIT, tbuf, wLength);

usb_unlock_device(dev);
- i = usb_control_msg(dev, pipe, ctrl->bRequest,
- ctrl->bRequestType, ctrl->wValue, ctrl->wIndex,
- tbuf, ctrl->wLength, tmo);
+ i = usbfs_start_wait_urb(urb, tmo, &actlen);
usb_lock_device(dev);
- snoop_urb(dev, NULL, pipe, max(i, 0), min(i, 0), COMPLETE, NULL, 0);
+ snoop_urb(dev, NULL, pipe, actlen, i, COMPLETE, NULL, 0);
}
if (i < 0 && i != -EPIPE) {
dev_printk(KERN_DEBUG, &dev->dev, "usbfs: USBDEVFS_CONTROL "
@@ -1174,8 +1228,15 @@ static int do_proc_control(struct usb_de
current->comm, ctrl->bRequestType, ctrl->bRequest,
ctrl->wLength, i);
}
- ret = i;
+ ret = (i < 0 ? i : actlen);
+
+ recv_fault:
+ /* Linger a bit, prior to the next control message. */
+ if (dev->quirks & USB_QUIRK_DELAY_CTRL_MSG)
+ msleep(200);
done:
+ kfree(dr);
+ usb_free_urb(urb);
free_page((unsigned long) tbuf);
usbfs_decrease_memory_usage(PAGE_SIZE + sizeof(struct urb) +
sizeof(struct usb_ctrlrequest));
@@ -1195,10 +1256,11 @@ static int do_proc_bulk(struct usb_dev_s
struct usbdevfs_bulktransfer *bulk)
{
struct usb_device *dev = ps->dev;
- unsigned int tmo, len1, pipe;
- int len2;
+ unsigned int tmo, len1, len2, pipe;
unsigned char *tbuf;
int i, ret;
+ struct urb *urb = NULL;
+ struct usb_host_endpoint *ep;

ret = findintfep(ps->dev, bulk->ep);
if (ret < 0)
@@ -1206,14 +1268,17 @@ static int do_proc_bulk(struct usb_dev_s
ret = checkintf(ps, ret);
if (ret)
return ret;
+
+ len1 = bulk->len;
+ if (len1 < 0 || len1 >= (INT_MAX - sizeof(struct urb)))
+ return -EINVAL;
+
if (bulk->ep & USB_DIR_IN)
pipe = usb_rcvbulkpipe(dev, bulk->ep & 0x7f);
else
pipe = usb_sndbulkpipe(dev, bulk->ep & 0x7f);
- if (!usb_maxpacket(dev, pipe, !(bulk->ep & USB_DIR_IN)))
- return -EINVAL;
- len1 = bulk->len;
- if (len1 >= (INT_MAX - sizeof(struct urb)))
+ ep = usb_pipe_endpoint(dev, pipe);
+ if (!ep || !usb_endpoint_maxp(&ep->desc))
return -EINVAL;
ret = usbfs_increase_memory_usage(len1 + sizeof(struct urb));
if (ret)
@@ -1223,17 +1288,29 @@ static int do_proc_bulk(struct usb_dev_s
* len1 can be almost arbitrarily large. Don't WARN if it's
* too big, just fail the request.
*/
+ ret = -ENOMEM;
tbuf = kmalloc(len1, GFP_KERNEL | __GFP_NOWARN);
- if (!tbuf) {
- ret = -ENOMEM;
+ if (!tbuf)
goto done;
+ urb = usb_alloc_urb(0, GFP_KERNEL);
+ if (!urb)
+ goto done;
+
+ if ((ep->desc.bmAttributes & USB_ENDPOINT_XFERTYPE_MASK) ==
+ USB_ENDPOINT_XFER_INT) {
+ pipe = (pipe & ~(3 << 30)) | (PIPE_INTERRUPT << 30);
+ usb_fill_int_urb(urb, dev, pipe, tbuf, len1,
+ NULL, NULL, ep->desc.bInterval);
+ } else {
+ usb_fill_bulk_urb(urb, dev, pipe, tbuf, len1, NULL, NULL);
}
+
tmo = bulk->timeout;
if (bulk->ep & 0x80) {
snoop_urb(dev, NULL, pipe, len1, tmo, SUBMIT, NULL, 0);

usb_unlock_device(dev);
- i = usb_bulk_msg(dev, pipe, tbuf, len1, &len2, tmo);
+ i = usbfs_start_wait_urb(urb, tmo, &len2);
usb_lock_device(dev);
snoop_urb(dev, NULL, pipe, len2, i, COMPLETE, tbuf, len2);

@@ -1253,12 +1330,13 @@ static int do_proc_bulk(struct usb_dev_s
snoop_urb(dev, NULL, pipe, len1, tmo, SUBMIT, tbuf, len1);

usb_unlock_device(dev);
- i = usb_bulk_msg(dev, pipe, tbuf, len1, &len2, tmo);
+ i = usbfs_start_wait_urb(urb, tmo, &len2);
usb_lock_device(dev);
snoop_urb(dev, NULL, pipe, len2, i, COMPLETE, NULL, 0);
}
ret = (i < 0 ? i : len2);
done:
+ usb_free_urb(urb);
kfree(tbuf);
usbfs_decrease_memory_usage(len1 + sizeof(struct urb));
return ret;

syzbot

unread,

Sep 2, 2021, 4:46:15 PM9/2/21

to linux-...@vger.kernel.org, linu...@vger.kernel.org, st...@rowland.harvard.edu, syzkall...@googlegroups.com

Hello,

syzbot has tested the proposed patch and the reproducer did not trigger any issue:

Reported-and-tested-by: syzbot+ada0f7...@syzkaller.appspotmail.com

Tested on:

commit: d5ae8d7f Revert "media: dvb header files: move some he..
git tree: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
kernel config: https://syzkaller.appspot.com/x/.config?x=2fd902af77ff1e56
dashboard link: https://syzkaller.appspot.com/bug?extid=ada0f7d3d9fd2016d927
compiler: Debian clang version 11.0.1-2, GNU ld (GNU Binutils for Debian) 2.35.1

patch: https://syzkaller.appspot.com/x/patch.diff?x=112280a3300000

Reply all

Reply to author

Forward

[syzbot] INFO: task hung in do_proc_bulk

syzbot

Alan Stern

syzbot

Alan Stern

Johan Hovold

Alan Stern

Oliver Neukum

Alan Stern

Oliver Neukum

Johan Hovold

Oliver Neukum

Oliver Neukum

Johan Hovold

Alan Stern

Oliver Neukum

Alan Stern

syzbot