
[PATCH] ASPM: Fix pcie devices with non-pcie children


Matthew Garrett

Mar 27, 2012, 10:20:02 AM
Commit 4949be16822e92a18ea0cc1616319926628092ee changed the behaviour of
pcie_aspm_sanity_check() to always return 0 if aspm is disabled, in order
to avoid cases where we changed ASPM state on pre-PCIe 1.1 devices. This
skipped the secondary function of pcie_aspm_sanity_check which was to avoid
us enabling ASPM on devices that had non-PCIe children, causing us to hit
a BUG_ON later on. Move the aspm_disabled check so we continue to honour
that scenario.

Signed-off-by: Matthew Garrett <m...@redhat.com>
Cc: sta...@vger.kernel.org
---
drivers/pci/pcie/aspm.c | 13 ++++++++++---
1 files changed, 10 insertions(+), 3 deletions(-)

diff --git a/drivers/pci/pcie/aspm.c b/drivers/pci/pcie/aspm.c
index 86111d9..41e367b 100644
--- a/drivers/pci/pcie/aspm.c
+++ b/drivers/pci/pcie/aspm.c
@@ -521,9 +521,6 @@ static int pcie_aspm_sanity_check(struct pci_dev *pdev)
 	int pos;
 	u32 reg32;

-	if (aspm_disabled)
-		return 0;
-
 	/*
 	 * Some functions in a slot might not all be PCIe functions,
 	 * very strange. Disable ASPM for the whole slot
@@ -532,6 +529,16 @@ static int pcie_aspm_sanity_check(struct pci_dev *pdev)
 		pos = pci_pcie_cap(child);
 		if (!pos)
 			return -EINVAL;
+
+		/*
+		 * If ASPM is disabled then we're not going to change
+		 * the BIOS state. It's safe to continue even if it's a
+		 * pre-1.1 device
+		 */
+
+		if (aspm_disabled)
+			continue;
+
 		/*
 		 * Disable ASPM for pre-1.1 PCIe device, we follow MS to use
 		 * RBER bit to determine if a function is 1.1 version device
--
1.7.7.6
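
For readers following the control flow, here is a minimal user-space sketch of the two orderings the commit message describes. It is not the kernel code: struct fake_child, sanity_check_before(), sanity_check_after() and aspm_model_demo() are hypothetical names made up for illustration, and the two boolean fields merely stand in for "pci_pcie_cap(child) returned non-zero" and "the RBER bit is set".

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for one child function below the bridge. */
struct fake_child {
	bool has_pcie_cap;	/* models pci_pcie_cap(child) != 0 */
	bool has_rber;		/* models the RBER bit (1.1 or later device) */
};

/* Model the two module-level switches mentioned in the thread. */
static bool aspm_disabled;
static bool aspm_force;

/* Ordering before the fix: the early return skips every per-child check. */
static int sanity_check_before(const struct fake_child *c, size_t n)
{
	size_t i;

	if (aspm_disabled)
		return 0;

	for (i = 0; i < n; i++) {
		if (!c[i].has_pcie_cap)
			return -EINVAL;
		if (!c[i].has_rber && !aspm_force)
			return -EINVAL;
	}
	return 0;
}

/* Ordering after the fix: non-PCIe children are always rejected. */
static int sanity_check_after(const struct fake_child *c, size_t n)
{
	size_t i;

	for (i = 0; i < n; i++) {
		if (!c[i].has_pcie_cap)
			return -EINVAL;
		/* Only the pre-1.1 check is skipped when ASPM is disabled. */
		if (aspm_disabled)
			continue;
		if (!c[i].has_rber && !aspm_force)
			return -EINVAL;
	}
	return 0;
}

/* Prints both results for a slot that mixes PCIe and non-PCIe functions. */
void aspm_model_demo(void)
{
	const struct fake_child mixed[] = {
		{ .has_pcie_cap = true,  .has_rber = true  },
		{ .has_pcie_cap = false, .has_rber = false },
	};

	aspm_disabled = true;
	aspm_force = false;
	printf("before: %d\n", sanity_check_before(mixed, 2));
	printf("after:  %d\n", sanity_check_after(mixed, 2));
}
```

Calling aspm_model_demo() from a small main() should show the "before" variant accepting the mixed slot (return 0) while the "after" variant rejects it with a negative errno, which is the case the patch restores.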


Colin Ian King

Mar 27, 2012, 1:00:02 PM
We got a user who's now verified this fixes the BUG_ON() during boot,
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/961482/comments/51

.. so this does the trick. Thanks Matthew.

Jonathan Nieder

Mar 28, 2012, 5:20:02 PM
Hi Matthew,

Matthew Garrett wrote:

> Commit 4949be16822e92a18ea0cc1616319926628092ee changed the behaviour of
> pcie_aspm_sanity_check() to always return 0 if aspm is disabled, in order
> to avoid cases where we changed ASPM state on pre-PCIe 1.1 devices. This
> skipped the secondary function of pcie_aspm_sanity_check which was to avoid
> us enabling ASPM on devices that had non-PCIe children, causing us to hit
> a BUG_ON later on. Move the aspm_disabled check so we continue to honour
> that scenario.

janek (cc-ed) never experienced the BUG_ON. Instead, starting with
v3.3 and v3.2.12 his hard disk using the pata_jmicron driver was not
detected during boot-up, resulting in the message "gave up waiting for
root device" and a failed boot.

Found in

Debian kernel 3.2.12-1
Debian kernel 3.3-1~experimental.1
Upstream 3.3
Linus's "master" as of 2012-03-28

Based on the thread [1] we blamed 4949be16822. janek tried the patch
above on top of Linus's "master". The result:

> Thanks. This patch fixes the problem.

In other words, this gets the pata_jmicron driver to enumerate its
drives again, a positive effect that wasn't even advertised in the
commit message. ;-) Thanks for writing it.

Sincerely,
Jonathan

[1] http://thread.gmane.org/gmane.linux.kernel/1271264/focus=1271785

Jonathan Nieder

Mar 29, 2012, 12:40:02 PM
From: Matthew Garrett <m...@redhat.com>
Date: Tue, 27 Mar 2012 10:17:41 -0400

Since 3.2.12 and 3.3, some systems are failing to boot with a BUG_ON.
Some other systems using the pata_jmicron driver fail to boot because
no disks are detected. Passing pcie_aspm=force on the kernel command
line works around it.

The cause: commit 4949be16822e ("PCI: ignore pre-1.1 ASPM quirking
when ASPM is disabled") changed the behaviour of
pcie_aspm_sanity_check() to always return 0 if aspm is disabled, in
order to avoid cases where we changed ASPM state on pre-PCIe 1.1
devices. This skipped the secondary function of
pcie_aspm_sanity_check which was to avoid us enabling ASPM on devices
that had non-PCIe children, causing trouble later on. Move the
aspm_disabled check so we continue to honour that scenario.

Addresses https://bugzilla.kernel.org/show_bug.cgi?id=42979 and
http://bugs.debian.org/665420

[jn: with more symptoms in log message]

Reported-by: Romain Francoise <rom...@orebokech.com> # kernel panic
Reported-by: Chris Holland <bandido...@gmail.com> # disk detection trouble
Signed-off-by: Matthew Garrett <m...@redhat.com>
Cc: sta...@vger.kernel.org
Tested-by: Hatem Masmoudi <hatem.m...@gmail.com> # Dell Latitude E5520
Tested-by: janek <jan...@gmail.com> # pata_jmicron with JMB362/JMB363
Signed-off-by: Jonathan Nieder <jrni...@gmail.com>
---
Hi Andrew,

This patch only appeared a couple of days ago[1], but it fixes a
noticeable regression so I would like to make sure the patch becomes
part of mainline and the 3.2.y- and 3.3.y-stable trees soon. Could
you pick it up for linux-next until it makes its way to the PCI tree?

Regression was introduced between 3.3-rc7 and 3.3 and between 3.2.11
and 3.2.12. Prevents boot on affected machines, though there is a
workaround. Details about the symptoms and fix are above.

Thanks,
Jonathan

[1] http://thread.gmane.org/gmane.linux.kernel.pci/14503

drivers/pci/pcie/aspm.c | 13 ++++++++++---
1 file changed, 10 insertions(+), 3 deletions(-)

diff --git a/drivers/pci/pcie/aspm.c b/drivers/pci/pcie/aspm.c
index 4bdef24cd412..b500840a143b 100644
--- a/drivers/pci/pcie/aspm.c
+++ b/drivers/pci/pcie/aspm.c
@@ -508,9 +508,6 @@ static int pcie_aspm_sanity_check(struct pci_dev *pdev)
 	int pos;
 	u32 reg32;

-	if (aspm_disabled)
-		return 0;
-
 	/*
 	 * Some functions in a slot might not all be PCIe functions,
 	 * very strange. Disable ASPM for the whole slot
@@ -519,6 +516,16 @@ static int pcie_aspm_sanity_check(struct pci_dev *pdev)
 		pos = pci_pcie_cap(child);
 		if (!pos)
 			return -EINVAL;
+
+		/*
+		 * If ASPM is disabled then we're not going to change
+		 * the BIOS state. It's safe to continue even if it's a
+		 * pre-1.1 device
+		 */
+
+		if (aspm_disabled)
+			continue;
+
 		/*
 		 * Disable ASPM for pre-1.1 PCIe device, we follow MS to use
 		 * RBER bit to determine if a function is 1.1 version device
--
1.7.10.rc1

Andrew Morton

Mar 29, 2012, 4:50:03 PM
Just about the only person who wasn't copied on this email is, umm, the
PCI maintainer!

Matthew Garrett

Mar 29, 2012, 5:00:02 PM
On Thu, Mar 29, 2012 at 01:46:14PM -0700, Andrew Morton wrote:

> Just about the only person who wasn't copied on this email is, umm, the
> PCI maintainer!

Jesse just handed that off to Bjorn…

--
Matthew Garrett | mj...@srcf.ucam.org

Jonathan Nieder

Mar 29, 2012, 5:10:01 PM
Andrew Morton wrote:
> Jonathan Nieder <jrni...@gmail.com> wrote:

>> From: Matthew Garrett <m...@redhat.com>
>> Date: Tue, 27 Mar 2012 10:17:41 -0400
[...]
>> commit 4949be16822e ("PCI: ignore pre-1.1 ASPM quirking
>> when ASPM is disabled") changed the behaviour of
>> pcie_aspm_sanity_check() to always return 0 if aspm is disabled, in
>> order to avoid cases where we changed ASPM state on pre-PCIe 1.1
>> devices. This skipped the secondary function of
>> pcie_aspm_sanity_check which was to avoid us enabling ASPM on devices
>> that had non-PCIe children, causing trouble later on.
[...]
>> Could
>> you pick it up for linux-next until it makes its way to the PCI tree?
[...]
> Just about the only person who wasn't copied on this email is, umm, the
> PCI maintainer!

Well spotted. Thanks for catching it.

Andrew Morton

Mar 29, 2012, 5:10:02 PM
On Thu, 29 Mar 2012 21:56:35 +0100
Matthew Garrett <mj...@srcf.ucam.org> wrote:

> On Thu, Mar 29, 2012 at 01:46:14PM -0700, Andrew Morton wrote:
>
> > Just about the only person who wasn't copied on this email is, umm, the
> > PCI maintainer!
>
> Jesse just handed that off to Bjorn…

Oh. So he did. My search of MAINTAINERS turned up the probably wrong

T: git git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci.git

Bjorn Helgaas

Mar 29, 2012, 5:50:02 PM
On Thu, Mar 29, 2012 at 2:59 PM, Andrew Morton
<ak...@linux-foundation.org> wrote:
> On Thu, 29 Mar 2012 21:56:35 +0100
> Matthew Garrett <mj...@srcf.ucam.org> wrote:
>
>> On Thu, Mar 29, 2012 at 01:46:14PM -0700, Andrew Morton wrote:
>>
>> > Just about the only person who wasn't copied on this email is, umm, the
>> > PCI maintainer!
>>
>> Jesse just handed that off to Bjorn…
>
> Oh.  So he did.   My search of MAINTAINERS turned up the probably wrong
>
> T:      git git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci.git

Yep, I'll update MAINTAINERS as soon as I set up a kernel.org tree.

The patch itself looks fine to me, so in case anybody wants to pick it
up earlier:

Acked-by: Bjorn Helgaas <bhel...@google.com>

I do think that ASPM path is disappointingly hard to follow, which
likely contributed to the bug in the first place.
pcie_aspm_sanity_check() is a terrible name for something that returns
0/errno (which is treated as a bool meaning something like "do ASPM on
this device"). And the idea that we save this blacklist information
in the form of "link->aspm_enabled = ASPM_STATE_ALL" is weird.

But obviously, I'm ignorant of ASPM in general.

Bjorn
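
To make the naming complaint above concrete, here is a rough user-space sketch of the pattern Bjorn describes: a 0/errno return value consumed as a boolean "blacklist" flag, with the result recorded by setting aspm_enabled to ASPM_STATE_ALL. Everything here is an assumption for illustration: struct link_model, link_is_blacklisted() and cap_init_model() are hypothetical names, and the value given to ASPM_STATE_ALL is a placeholder rather than a quote from the kernel source.

```c
#include <errno.h>
#include <stdbool.h>

/* Placeholder bitmask; the real definition lives in drivers/pci/pcie/aspm.c. */
#define ASPM_STATE_ALL 0x7u

/* Hypothetical, cut-down model of the per-link state. */
struct link_model {
	unsigned int aspm_enabled;
};

/* Hypothetical wrapper that gives the 0/errno result a boolean name. */
static bool link_is_blacklisted(int sanity_rc)
{
	return sanity_rc != 0;
}

/*
 * Models the recording step mentioned in the thread: a blacklisted link
 * is marked as if every ASPM state were enabled. Presumably a later pass
 * then sees those states as enabled and turns them off, but that later
 * step is not modelled here.
 */
static void cap_init_model(struct link_model *link, int sanity_rc)
{
	if (link_is_blacklisted(sanity_rc)) {
		link->aspm_enabled = ASPM_STATE_ALL;
		return;
	}
	/* Not blacklisted: leave the state as it was found. */
}
```

A predicate-style name such as the hypothetical link_is_blacklisted() at least tells the reader which way round the boolean goes, which is the part the current name leaves to guesswork.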