Signed-off-by: Jiri Kosina <jko...@suse.cz>
drivers/usb/core/hcd.c | 10 ++++++----
1 files changed, 6 insertions(+), 4 deletions(-)
diff --git a/drivers/usb/core/hcd.c b/drivers/usb/core/hcd.c
index bf10e9c..19279ed 100644
--- a/drivers/usb/core/hcd.c
+++ b/drivers/usb/core/hcd.c
@@ -1000,8 +1000,9 @@ EXPORT_SYMBOL_GPL(usb_calc_bus_time);
int usb_hcd_link_urb_to_ep(struct usb_hcd *hcd, struct urb *urb)
{
int rc = 0;
+ unsigned long flags;
- spin_lock(&hcd_urb_list_lock);
+ spin_lock_irqsave(&hcd_urb_list_lock, flags);
/* Check that the URB isn't being killed */
if (unlikely(urb->reject)) {
@@ -1034,7 +1035,7 @@ int usb_hcd_link_urb_to_ep(struct usb_hcd *hcd, struct urb *urb)
goto done;
}
done:
- spin_unlock(&hcd_urb_list_lock);
+ spin_unlock_irqrestore(&hcd_urb_list_lock, flags);
return rc;
}
EXPORT_SYMBOL_GPL(usb_hcd_link_urb_to_ep);
@@ -1106,10 +1107,11 @@ EXPORT_SYMBOL_GPL(usb_hcd_check_unlink_urb);
*/
void usb_hcd_unlink_urb_from_ep(struct usb_hcd *hcd, struct urb *urb)
{
+ unsigned long flags;
/* clear all state linking urb to this dev (and hcd) */
- spin_lock(&hcd_urb_list_lock);
+ spin_lock_irqsave(&hcd_urb_list_lock, flags);
list_del_init(&urb->urb_list);
- spin_unlock(&hcd_urb_list_lock);
+ spin_unlock_irqrestore(&hcd_urb_list_lock, flags);
}
EXPORT_SYMBOL_GPL(usb_hcd_unlink_urb_from_ep);
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majo...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
> > hcd_urb_list_lock is used for synchronization between IRQ and non-IRQ
> > contexts, so the non-IRQ context has to disable IRQs in order to prevent
> > deadlocking with IRQ context.
> Which non-irq context is that?
One example -- assume usb_submit_urb() called from non-IRQ context. Then
usb_hcd_submit_urb() -> rh_urb_enqueue() -> rh_queue_status() ->
usb_hcd_link_urb_to_ep().
--
Jiri Kosina
SUSE Labs
Which non-irq context is that?
Regards
Oliver
> This turns out not to be the case. Interrupts are disabled.
You're right, this callchain can't cause the deadlock indeed. I'll go
through the other possibilities.
To give some background to others reading this thread -- Leonardo (in CC)
reported [1] that this patch fixes hard kernel hangs he has been seeing
when using USB CDMA modem.
Still, we don't currently seem to see the place where the interrupts get
enabled that makes the deadlock to trigger.
[1] https://bugzilla.novell.com/show_bug.cgi?id=378509
This turns out not to be the case. Interrupts are disabled.
static int rh_queue_status (struct usb_hcd *hcd, struct urb *urb)
{
int retval;
unsigned long flags;
int len = 1 + (urb->dev->maxchild / 8);
spin_lock_irqsave (&hcd_root_hub_lock, flags);
if (hcd->status_urb || urb->transfer_buffer_length < len) {
dev_dbg (hcd->self.controller, "not queuing rh status urb\n");
retval = -EINVAL;
goto done;
}
retval = usb_hcd_link_urb_to_ep(hcd, urb);
I'll investigate.
Regards
Oliver
usb_hcd_flush_endpoint() is an obvious case. Used in the suspend code.
Your patch is indeed correct, but I fear there might be a second bug caused
by wrong calling conditions.
Regards
Oliver
Sorry, yes other code paths take it with enabled interrupts.
The patch is good and necessary for 2.6.26.
Regards
Oliver
> Am Mittwoch 21 Mai 2008 15:36:49 schrieb Jiri Kosina:
> > On Wed, 21 May 2008, Oliver Neukum wrote:
> >
> > > This turns out not to be the case. Interrupts are disabled.
> >
> > You're right, this callchain can't cause the deadlock indeed. I'll go
> > through the other possibilities.
The functions you are worried about (usb_hcd_link_urb_to_ep and
usb_hcd_unlink_urb_from_ep) are documented as requiring that interrupts
be disabled by their callers. This patch isn't needed.
(Of course, other patches may be needed to insure that the callers do
indeed disable interrupts. But I'm not aware of any such need.)
> usb_hcd_flush_endpoint() is an obvious case.
It's not obvious to me. Where does usb_hcd_flush_endpoint() acquire
hcd_urb_list_lock with interrupts enabled?
> Used in the suspend code.
> Your patch is indeed correct, but I fear there might be a second bug caused
> by wrong calling conditions.
The problem in the Novell bugzilla entry was caused by the fact that
the OHCI irq routine was invoked with interrupts enabled, owing to a
missing IRQF_DISABLED flag. That bug has already been fixed in 2.6.25.
Alan Stern
> > Used in the suspend code. Your patch is indeed correct, but I fear
> > there might be a second bug caused by wrong calling conditions.
> The problem in the Novell bugzilla entry was caused by the fact that the
> OHCI irq routine was invoked with interrupts enabled, owing to a missing
> IRQF_DISABLED flag. That bug has already been fixed in 2.6.25.
That indeed is 2.6.25 kernel. I guess you are talking about commit
442258e2ff69 here. If so, the reporter is definitely using the kernel
containing this commit, and the lockups still trigger.
Seems that my patch is papering over the real bug (someone enabling
interrupts somewhere) indeed, but I can't seem to be able to find such
codepath.
Thanks,
--
Jiri Kosina
SUSE Labs
Sorry, yes looking again, it does complicated stuff with dropping the
lock but leaving interrupts disabled.
Sorry
Oliver
> On Wed, 21 May 2008, Alan Stern wrote:
>
> > > Used in the suspend code. Your patch is indeed correct, but I fear
> > > there might be a second bug caused by wrong calling conditions.
> > The problem in the Novell bugzilla entry was caused by the fact that the
> > OHCI irq routine was invoked with interrupts enabled, owing to a missing
> > IRQF_DISABLED flag. That bug has already been fixed in 2.6.25.
>
> That indeed is 2.6.25 kernel. I guess you are talking about commit
> 442258e2ff69 here.
Yes.
> If so, the reporter is definitely using the kernel
> containing this commit, and the lockups still trigger.
>
> Seems that my patch is papering over the real bug (someone enabling
> interrupts somewhere) indeed, but I can't seem to be able to find such
> codepath.
You could try testing the interrupt-enable flag at various places
in ohci-hcd (start with finish_urb) and printing an error message if
interrupts are enabled.
One possibility is that in an earlier call to finish_urb,
usb_hcd_giveback_urb was called with interrupts disabled and returned
with interrupts enabled. In other words, some driver's callback
routine may have enabled interrupts incorrectly.
Alan Stern
> Alan Stern wrote:
> >
> > The functions you are worried about (usb_hcd_link_urb_to_ep and
> > usb_hcd_unlink_urb_from_ep) are documented as requiring that interrupts
> > be disabled by their callers. This patch isn't needed.
>
> This requirement is the only reason the whci-hcd driver disables
> interrupts. Removing this requirement would reduce the time that
> interrupts are disabled in the whci-hcd driver.
That doesn't sound like a valid approach. If you don't disable
interrupts then you aren't protected against an interrupt handler
submitting an URB and accessing your data structures while whci-hcd is
in the middle of updating them.
I can't see how urbs can be submitted to whci-hcd from an interrupt
handler (urb callbacks are always called from a workqueue thread) but
they could be submitted from a timer, so you are correct.
David
--
David Vrabel, Senior Software Engineer, Drivers
CSR, Churchill House, Cambridge Business Park, Tel: +44 (0)1223 692562
Cowley Road, Cambridge, CB4 0WZ http://www.csr.com/
> Alan Stern wrote:
> > On Wed, 21 May 2008, David Vrabel wrote:
> >
> >> Alan Stern wrote:
> >>> The functions you are worried about (usb_hcd_link_urb_to_ep and
> >>> usb_hcd_unlink_urb_from_ep) are documented as requiring that interrupts
> >>> be disabled by their callers. This patch isn't needed.
> >> This requirement is the only reason the whci-hcd driver disables
> >> interrupts. Removing this requirement would reduce the time that
> >> interrupts are disabled in the whci-hcd driver.
> >
> > That doesn't sound like a valid approach. If you don't disable
> > interrupts then you aren't protected against an interrupt handler
> > submitting an URB and accessing your data structures while whci-hcd is
> > in the middle of updating them.
>
> I can't see how urbs can be submitted to whci-hcd from an interrupt
> handler (urb callbacks are always called from a workqueue thread)
_Your_ callbacks are always called from a workqueue thread. Another
driver's callbacks might not be, and that other driver might submit an
URB. Or unlink one.
> but
> they could be submitted from a timer, so you are correct.
Alan Stern
This requirement is the only reason the whci-hcd driver disables
interrupts. Removing this requirement would reduce the time that
interrupts are disabled in the whci-hcd driver.
David
--
David Vrabel, Senior Software Engineer, Drivers
CSR, Churchill House, Cambridge Business Park, Tel: +44 (0)1223 692562
Cowley Road, Cambridge, CB4 0WZ http://www.csr.com/
Alan,
I am trying your suggestion right now. Lets see if it finds something.
Thanks,
Leonardo
Are we sure the interrupt line is not shared with another driver that calls
spin_unlock_irq() in its interrupt handler?
Regards
Oliver
Just wondering,... doesn't lockdep say anything about the situation?