Google Groups no longer supports new Usenet posts or subscriptions. Historical content remains viewable.
Dismiss

[PATCH] USB: fix deadlock in HCD code

0 views
Skip to first unread message

Jiri Kosina

unread,
May 21, 2008, 8:20:13 AM5/21/08
to
hcd_urb_list_lock is used for synchronization between IRQ and non-IRQ
contexts, so the non-IRQ context has to disable IRQs in order to prevent
deadlocking with IRQ context.

Signed-off-by: Jiri Kosina <jko...@suse.cz>

drivers/usb/core/hcd.c | 10 ++++++----
1 files changed, 6 insertions(+), 4 deletions(-)

diff --git a/drivers/usb/core/hcd.c b/drivers/usb/core/hcd.c
index bf10e9c..19279ed 100644
--- a/drivers/usb/core/hcd.c
+++ b/drivers/usb/core/hcd.c
@@ -1000,8 +1000,9 @@ EXPORT_SYMBOL_GPL(usb_calc_bus_time);
int usb_hcd_link_urb_to_ep(struct usb_hcd *hcd, struct urb *urb)
{
int rc = 0;
+ unsigned long flags;

- spin_lock(&hcd_urb_list_lock);
+ spin_lock_irqsave(&hcd_urb_list_lock, flags);

/* Check that the URB isn't being killed */
if (unlikely(urb->reject)) {
@@ -1034,7 +1035,7 @@ int usb_hcd_link_urb_to_ep(struct usb_hcd *hcd, struct urb *urb)
goto done;
}
done:
- spin_unlock(&hcd_urb_list_lock);
+ spin_unlock_irqrestore(&hcd_urb_list_lock, flags);
return rc;
}
EXPORT_SYMBOL_GPL(usb_hcd_link_urb_to_ep);
@@ -1106,10 +1107,11 @@ EXPORT_SYMBOL_GPL(usb_hcd_check_unlink_urb);
*/
void usb_hcd_unlink_urb_from_ep(struct usb_hcd *hcd, struct urb *urb)
{
+ unsigned long flags;
/* clear all state linking urb to this dev (and hcd) */
- spin_lock(&hcd_urb_list_lock);
+ spin_lock_irqsave(&hcd_urb_list_lock, flags);
list_del_init(&urb->urb_list);
- spin_unlock(&hcd_urb_list_lock);
+ spin_unlock_irqrestore(&hcd_urb_list_lock, flags);
}
EXPORT_SYMBOL_GPL(usb_hcd_unlink_urb_from_ep);

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majo...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

Jiri Kosina

unread,
May 21, 2008, 9:30:06 AM5/21/08
to
On Wed, 21 May 2008, Oliver Neukum wrote:

> > hcd_urb_list_lock is used for synchronization between IRQ and non-IRQ
> > contexts, so the non-IRQ context has to disable IRQs in order to prevent
> > deadlocking with IRQ context.

> Which non-irq context is that?

One example -- assume usb_submit_urb() called from non-IRQ context. Then

usb_hcd_submit_urb() -> rh_urb_enqueue() -> rh_queue_status() ->
usb_hcd_link_urb_to_ep().

--
Jiri Kosina
SUSE Labs

Oliver Neukum

unread,
May 21, 2008, 9:30:16 AM5/21/08
to
Am Mittwoch 21 Mai 2008 14:09:43 schrieb Jiri Kosina:
> hcd_urb_list_lock is used for synchronization between IRQ and non-IRQ
> contexts, so the non-IRQ context has to disable IRQs in order to prevent
> deadlocking with IRQ context.

Which non-irq context is that?

Regards
Oliver

Jiri Kosina

unread,
May 21, 2008, 9:40:10 AM5/21/08
to
On Wed, 21 May 2008, Oliver Neukum wrote:

> This turns out not to be the case. Interrupts are disabled.

You're right, this callchain can't cause the deadlock indeed. I'll go
through the other possibilities.

To give some background to others reading this thread -- Leonardo (in CC)
reported [1] that this patch fixes hard kernel hangs he has been seeing
when using USB CDMA modem.

Still, we don't currently seem to see the place where the interrupts get
enabled that makes the deadlock to trigger.

[1] https://bugzilla.novell.com/show_bug.cgi?id=378509

Oliver Neukum

unread,
May 21, 2008, 9:40:07 AM5/21/08
to
Am Mittwoch 21 Mai 2008 15:27:50 schrieb Jiri Kosina:
> On Wed, 21 May 2008, Oliver Neukum wrote:
>
> > > hcd_urb_list_lock is used for synchronization between IRQ and non-IRQ
> > > contexts, so the non-IRQ context has to disable IRQs in order to prevent
> > > deadlocking with IRQ context.
> > Which non-irq context is that?
>
> One example -- assume usb_submit_urb() called from non-IRQ context. Then
>
> usb_hcd_submit_urb() -> rh_urb_enqueue() -> rh_queue_status() ->
> usb_hcd_link_urb_to_ep().
>

This turns out not to be the case. Interrupts are disabled.

static int rh_queue_status (struct usb_hcd *hcd, struct urb *urb)
{
int retval;
unsigned long flags;
int len = 1 + (urb->dev->maxchild / 8);

spin_lock_irqsave (&hcd_root_hub_lock, flags);
if (hcd->status_urb || urb->transfer_buffer_length < len) {
dev_dbg (hcd->self.controller, "not queuing rh status urb\n");
retval = -EINVAL;
goto done;
}

retval = usb_hcd_link_urb_to_ep(hcd, urb);

I'll investigate.

Regards
Oliver

Oliver Neukum

unread,
May 21, 2008, 9:50:12 AM5/21/08
to
Am Mittwoch 21 Mai 2008 15:36:49 schrieb Jiri Kosina:
> On Wed, 21 May 2008, Oliver Neukum wrote:
>
> > This turns out not to be the case. Interrupts are disabled.
>
> You're right, this callchain can't cause the deadlock indeed. I'll go
> through the other possibilities.

usb_hcd_flush_endpoint() is an obvious case. Used in the suspend code.
Your patch is indeed correct, but I fear there might be a second bug caused
by wrong calling conditions.

Regards
Oliver

Oliver Neukum

unread,
May 21, 2008, 9:50:17 AM5/21/08
to
Am Mittwoch 21 Mai 2008 15:27:50 schrieb Jiri Kosina:
> On Wed, 21 May 2008, Oliver Neukum wrote:
>
> > > hcd_urb_list_lock is used for synchronization between IRQ and non-IRQ
> > > contexts, so the non-IRQ context has to disable IRQs in order to prevent
> > > deadlocking with IRQ context.
> > Which non-irq context is that?
>
> One example -- assume usb_submit_urb() called from non-IRQ context. Then
>
> usb_hcd_submit_urb() -> rh_urb_enqueue() -> rh_queue_status() ->
> usb_hcd_link_urb_to_ep().
>

Sorry, yes other code paths take it with enabled interrupts.
The patch is good and necessary for 2.6.26.

Regards
Oliver

Alan Stern

unread,
May 21, 2008, 10:20:18 AM5/21/08
to
On Wed, 21 May 2008, Oliver Neukum wrote:

> Am Mittwoch 21 Mai 2008 15:36:49 schrieb Jiri Kosina:
> > On Wed, 21 May 2008, Oliver Neukum wrote:
> >
> > > This turns out not to be the case. Interrupts are disabled.
> >
> > You're right, this callchain can't cause the deadlock indeed. I'll go
> > through the other possibilities.

The functions you are worried about (usb_hcd_link_urb_to_ep and
usb_hcd_unlink_urb_from_ep) are documented as requiring that interrupts
be disabled by their callers. This patch isn't needed.

(Of course, other patches may be needed to insure that the callers do
indeed disable interrupts. But I'm not aware of any such need.)

> usb_hcd_flush_endpoint() is an obvious case.

It's not obvious to me. Where does usb_hcd_flush_endpoint() acquire
hcd_urb_list_lock with interrupts enabled?

> Used in the suspend code.
> Your patch is indeed correct, but I fear there might be a second bug caused
> by wrong calling conditions.

The problem in the Novell bugzilla entry was caused by the fact that
the OHCI irq routine was invoked with interrupts enabled, owing to a
missing IRQF_DISABLED flag. That bug has already been fixed in 2.6.25.

Alan Stern

Jiri Kosina

unread,
May 21, 2008, 10:30:24 AM5/21/08
to
On Wed, 21 May 2008, Alan Stern wrote:

> > Used in the suspend code. Your patch is indeed correct, but I fear
> > there might be a second bug caused by wrong calling conditions.
> The problem in the Novell bugzilla entry was caused by the fact that the
> OHCI irq routine was invoked with interrupts enabled, owing to a missing
> IRQF_DISABLED flag. That bug has already been fixed in 2.6.25.

That indeed is 2.6.25 kernel. I guess you are talking about commit
442258e2ff69 here. If so, the reporter is definitely using the kernel
containing this commit, and the lockups still trigger.

Seems that my patch is papering over the real bug (someone enabling
interrupts somewhere) indeed, but I can't seem to be able to find such
codepath.

Thanks,

--
Jiri Kosina
SUSE Labs

Oliver Neukum

unread,
May 21, 2008, 10:40:16 AM5/21/08
to
Am Mittwoch 21 Mai 2008 16:13:45 schrieb Alan Stern:
> > usb_hcd_flush_endpoint() is an obvious case.
>
> It's not obvious to me.  Where does usb_hcd_flush_endpoint() acquire
> hcd_urb_list_lock with interrupts enabled?

Sorry, yes looking again, it does complicated stuff with dropping the
lock but leaving interrupts disabled.

Sorry
Oliver

Alan Stern

unread,
May 21, 2008, 10:50:07 AM5/21/08
to
On Wed, 21 May 2008, Jiri Kosina wrote:

> On Wed, 21 May 2008, Alan Stern wrote:
>
> > > Used in the suspend code. Your patch is indeed correct, but I fear
> > > there might be a second bug caused by wrong calling conditions.
> > The problem in the Novell bugzilla entry was caused by the fact that the
> > OHCI irq routine was invoked with interrupts enabled, owing to a missing
> > IRQF_DISABLED flag. That bug has already been fixed in 2.6.25.
>
> That indeed is 2.6.25 kernel. I guess you are talking about commit
> 442258e2ff69 here.

Yes.

> If so, the reporter is definitely using the kernel
> containing this commit, and the lockups still trigger.
>
> Seems that my patch is papering over the real bug (someone enabling
> interrupts somewhere) indeed, but I can't seem to be able to find such
> codepath.

You could try testing the interrupt-enable flag at various places
in ohci-hcd (start with finish_urb) and printing an error message if
interrupts are enabled.

One possibility is that in an earlier call to finish_urb,
usb_hcd_giveback_urb was called with interrupts disabled and returned
with interrupts enabled. In other words, some driver's callback
routine may have enabled interrupts incorrectly.

Alan Stern

Alan Stern

unread,
May 21, 2008, 10:50:22 AM5/21/08
to
On Wed, 21 May 2008, David Vrabel wrote:

> Alan Stern wrote:
> >
> > The functions you are worried about (usb_hcd_link_urb_to_ep and
> > usb_hcd_unlink_urb_from_ep) are documented as requiring that interrupts
> > be disabled by their callers. This patch isn't needed.
>

> This requirement is the only reason the whci-hcd driver disables
> interrupts. Removing this requirement would reduce the time that
> interrupts are disabled in the whci-hcd driver.

That doesn't sound like a valid approach. If you don't disable
interrupts then you aren't protected against an interrupt handler
submitting an URB and accessing your data structures while whci-hcd is
in the middle of updating them.

David Vrabel

unread,
May 21, 2008, 11:00:24 AM5/21/08
to
Alan Stern wrote:
> On Wed, 21 May 2008, David Vrabel wrote:
>
>> Alan Stern wrote:
>>> The functions you are worried about (usb_hcd_link_urb_to_ep and
>>> usb_hcd_unlink_urb_from_ep) are documented as requiring that interrupts
>>> be disabled by their callers. This patch isn't needed.
>> This requirement is the only reason the whci-hcd driver disables
>> interrupts. Removing this requirement would reduce the time that
>> interrupts are disabled in the whci-hcd driver.
>
> That doesn't sound like a valid approach. If you don't disable
> interrupts then you aren't protected against an interrupt handler
> submitting an URB and accessing your data structures while whci-hcd is
> in the middle of updating them.

I can't see how urbs can be submitted to whci-hcd from an interrupt
handler (urb callbacks are always called from a workqueue thread) but
they could be submitted from a timer, so you are correct.

David
--
David Vrabel, Senior Software Engineer, Drivers
CSR, Churchill House, Cambridge Business Park, Tel: +44 (0)1223 692562
Cowley Road, Cambridge, CB4 0WZ http://www.csr.com/

Alan Stern

unread,
May 21, 2008, 11:00:22 AM5/21/08
to
On Wed, 21 May 2008, David Vrabel wrote:

> Alan Stern wrote:
> > On Wed, 21 May 2008, David Vrabel wrote:
> >
> >> Alan Stern wrote:
> >>> The functions you are worried about (usb_hcd_link_urb_to_ep and
> >>> usb_hcd_unlink_urb_from_ep) are documented as requiring that interrupts
> >>> be disabled by their callers. This patch isn't needed.
> >> This requirement is the only reason the whci-hcd driver disables
> >> interrupts. Removing this requirement would reduce the time that
> >> interrupts are disabled in the whci-hcd driver.
> >
> > That doesn't sound like a valid approach. If you don't disable
> > interrupts then you aren't protected against an interrupt handler
> > submitting an URB and accessing your data structures while whci-hcd is
> > in the middle of updating them.
>
> I can't see how urbs can be submitted to whci-hcd from an interrupt
> handler (urb callbacks are always called from a workqueue thread)

_Your_ callbacks are always called from a workqueue thread. Another
driver's callbacks might not be, and that other driver might submit an
URB. Or unlink one.

> but
> they could be submitted from a timer, so you are correct.

Alan Stern

David Vrabel

unread,
May 21, 2008, 11:30:24 AM5/21/08
to
Alan Stern wrote:
>
> The functions you are worried about (usb_hcd_link_urb_to_ep and
> usb_hcd_unlink_urb_from_ep) are documented as requiring that interrupts
> be disabled by their callers. This patch isn't needed.

This requirement is the only reason the whci-hcd driver disables


interrupts. Removing this requirement would reduce the time that
interrupts are disabled in the whci-hcd driver.

David


--
David Vrabel, Senior Software Engineer, Drivers
CSR, Churchill House, Cambridge Business Park, Tel: +44 (0)1223 692562
Cowley Road, Cambridge, CB4 0WZ http://www.csr.com/

Leonardo Chiquitto

unread,
May 21, 2008, 4:00:22 PM5/21/08
to

>> > > Used in the suspend code. Your patch is indeed correct, but I fear
>> > > there might be a second bug caused by wrong calling conditions.
>> > The problem in the Novell bugzilla entry was caused by the fact that the
>> > OHCI irq routine was invoked with interrupts enabled, owing to a missing
>> > IRQF_DISABLED flag. That bug has already been fixed in 2.6.25.
>>
>> That indeed is 2.6.25 kernel. I guess you are talking about commit
>> 442258e2ff69 here.
>
> Yes.
>
>> If so, the reporter is definitely using the kernel
>> containing this commit, and the lockups still trigger.
>>
>> Seems that my patch is papering over the real bug (someone enabling
>> interrupts somewhere) indeed, but I can't seem to be able to find such
>> codepath.
>
> You could try testing the interrupt-enable flag at various places
> in ohci-hcd (start with finish_urb) and printing an error message if
> interrupts are enabled.
>
> One possibility is that in an earlier call to finish_urb,
> usb_hcd_giveback_urb was called with interrupts disabled and returned
> with interrupts enabled. In other words, some driver's callback
> routine may have enabled interrupts incorrectly.

Alan,

I am trying your suggestion right now. Lets see if it finds something.

Thanks,
Leonardo

Oliver Neukum

unread,
May 21, 2008, 4:10:18 PM5/21/08
to
Am Mittwoch 21 Mai 2008 16:46:45 schrieb Alan Stern:
> One possibility is that in an earlier call to finish_urb,
> usb_hcd_giveback_urb was called with interrupts disabled and returned
> with interrupts enabled.  In other words, some driver's callback
> routine may have enabled interrupts incorrectly.

Are we sure the interrupt line is not shared with another driver that calls
spin_unlock_irq() in its interrupt handler?

Regards
Oliver

Peter Zijlstra

unread,
May 21, 2008, 6:30:23 PM5/21/08
to
On Wed, 2008-05-21 at 14:09 +0200, Jiri Kosina wrote:
> hcd_urb_list_lock is used for synchronization between IRQ and non-IRQ
> contexts, so the non-IRQ context has to disable IRQs in order to prevent
> deadlocking with IRQ context.

Just wondering,... doesn't lockdep say anything about the situation?

0 new messages