The commit 5877960869333e42ebeb733e8d9d5630ff96d350 (included since 2.6.32.4) crashes (locks up) the 82915G/GV/910GL Controller when intel-agp.c tries to access nonexistent GTT entries at:
- for (i = intel_private.gtt_entries; i < current_size->num_entries; i++) {
+ for (i = intel_private.gtt_entries; i < intel_private.gtt_total_size; i++) {
Rationale: I915 (gma900) has 128 MB of video memory (maximum), as per intel.com ( http://www.intel.com/support/graphics/intel915g/sb/CS-012579.htm ) and lscpi:
00:02.0 VGA compatible controller: Intel Corporation 82915G/GV/910GL Integrated Graphics Controller (rev 04) (prog-if 00 [VGA controller])
Subsystem: Intel Corporation Device 4147
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
Status: Cap+ 66MHz- UDF- FastB2B+ ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
Latency: 0
Interrupt: pin A routed to IRQ 11
Region 0: Memory at ff480000 (32-bit, non-prefetchable) [size=512K]
Region 1: I/O ports at ec00 [size=8]
Region 2: Memory at d8000000 (32-bit, prefetchable) [size=128M]
Region 3: Memory at ff440000 (32-bit, non-prefetchable) [size=256K]
Capabilities: <access denied>
AFAIK, that implies that its gtt_total_size (in pages) should be 32K (as num_entries showed before the commit) instead of 64K.
Note: The IS_I915 macro includes 945; however, only GMA900 (I915) had 128 MB as the maximum AFAIK. Therefore, I divided the IS_I915 macro. I do not know about the "E7221" (please check).
How to reproduce: Access kernel.org in iceweasel (Debian Lenny) and the X server will crash. Sometimes, the kernel freezes.
Please review. The fix should be applied to stable series, as well as 2.6.33 and 2.6.34-rc1.
Signed-off-by: Miguel Ojeda <miguel.oje...@gmail.com>
---
--- linux-2.6.32.stable/drivers/char/agp/intel-agp.c.old 2010-03-10 15:32:36.000000000 +0100
+++ linux-2.6.32.stable/drivers/char/agp/intel-agp.c 2010-03-10 22:38:23.000000000 +0100
@@ -65,11 +65,11 @@
#define PCI_DEVICE_ID_INTEL_IGDNG_MC2_HB 0x006a
#define PCI_DEVICE_ID_INTEL_IGDNG_M_IG 0x0046
-/* cover 915 and 945 variants */
#define IS_I915 (agp_bridge->dev->device == PCI_DEVICE_ID_INTEL_E7221_HB || \
agp_bridge->dev->device == PCI_DEVICE_ID_INTEL_82915G_HB || \
- agp_bridge->dev->device == PCI_DEVICE_ID_INTEL_82915GM_HB || \
- agp_bridge->dev->device == PCI_DEVICE_ID_INTEL_82945G_HB || \
+ agp_bridge->dev->device == PCI_DEVICE_ID_INTEL_82915GM_HB)
+
+#define IS_I945 (agp_bridge->dev->device == PCI_DEVICE_ID_INTEL_82945G_HB || \
agp_bridge->dev->device == PCI_DEVICE_ID_INTEL_82945GM_HB || \
agp_bridge->dev->device == PCI_DEVICE_ID_INTEL_82945GME_HB)
@@ -724,14 +724,14 @@ static void intel_i830_init_gtt_entries(
break;
case I915_GMCH_GMS_STOLEN_48M:
/* Check it's really I915G */
- if (IS_I915 || IS_I965 || IS_G33 || IS_G4X)
+ if (IS_I915 || IS_I945 || IS_I965 || IS_G33 || IS_G4X)
gtt_entries = MB(48) - KB(size);
else
gtt_entries = 0;
break;
case I915_GMCH_GMS_STOLEN_64M:
/* Check it's really I915G */
- if (IS_I915 || IS_I965 || IS_G33 || IS_G4X)
+ if (IS_I915 || IS_I945 || IS_I965 || IS_G33 || IS_G4X)
gtt_entries = MB(64) - KB(size);
else
gtt_entries = 0;
@@ -1305,6 +1305,8 @@ static int intel_i915_create_gatt_table(
if (IS_G33)
gtt_map_size = 1024 * 1024; /* 1M on G33 */
+ else if (IS_I915)
+ gtt_map_size = 128 * 1024; /* 128K on I915 */
intel_private.gtt = ioremap(temp2, gtt_map_size);
if (!intel_private.gtt)
return -ENOMEM;
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majo...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Cc'ing the original committers.
I think that page is wrong, and http://www.intel.com/design/chipsets/datashts/301467.htm
has info that 256K is for GTT bar, so max video memory size is 256M. On my 915G
board, I can choose 128M/256M in BIOS setup.
> >
> > 00:02.0 VGA compatible controller: Intel Corporation 82915G/GV/910GL Integrated Graphics Controller (rev 04) (prog-if 00 [VGA controller])
> > Subsystem: Intel Corporation Device 4147
> > Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
> > Status: Cap+ 66MHz- UDF- FastB2B+ ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
> > Latency: 0
> > Interrupt: pin A routed to IRQ 11
> > Region 0: Memory at ff480000 (32-bit, non-prefetchable) [size=512K]
> > Region 1: I/O ports at ec00 [size=8]
> > Region 2: Memory at d8000000 (32-bit, prefetchable) [size=128M]
> > Region 3: Memory at ff440000 (32-bit, non-prefetchable) [size=256K]
This also tells bar 3 for GTT has 256K.
> > Capabilities: <access denied>
> >
> >
> > AFAIK, that implies that its gtt_total_size (in pages) should be 32K (as num_entries showed before the commit) instead of 64K.
> >
> > Note: The IS_I915 macro includes 945; however, only GMA900 (I915) had 128 MB as the maximum AFAIK. Therefore, I divided the IS_I915 macro. I do not know about the "E7221" (please check).
> >
> > How to reproduce: Access kernel.org in iceweasel (Debian Lenny) and the X server will crash. Sometimes, the kernel freezes.
> >
I can't produce this on my 915G board with 128M or 256M memory config. Could you
paste dmesg in your failure or just hang?
--
Open Source Technology Center, Intel ltd.
$gpg --keyserver wwwkeys.pgp.net --recv-keys 4D781827
> Attached dmesg, lspci -vv, config and xorg.
>
> When the X server crashes, the kernel does not report anything
This seems to have gone all quiet?
As this was a 2.6.32->2.6.32.4 regression, I assume that it's also a
2.6.32->2.6.33 regression?
It seems so, I was about to ping Zhenyu.
>
> As this was a 2.6.32->2.6.32.4 regression, I assume that it's also a
> 2.6.32->2.6.33 regression?
Yep. All kernels I tested since 2.6.32.4 crash, including 2.6.32.10,
2.6.33 and 2.6.34-rc1. See my original message for more details about
the crash.
Could you try recent X.org intel driver release? Your failure X log
showed some pretty old UMS driver's render flush time out message,
I think whose version never has been well tested with new kms/gem stuff.
>
> >
> > As this was a 2.6.32->2.6.32.4 regression, I assume that it's also a
> > 2.6.32->2.6.33 regression?
>
> Yep. All kernels I tested since 2.6.32.4 crash, including 2.6.32.10,
> 2.6.33 and 2.6.34-rc1. See my original message for more details about
> the crash.
>
Could you bisect? I doubt it's caused by David's patch, as if it is, you
will have trouble at early agp init time, instead of current problem looks
like something causing rendering hang..and I can't think of how mapping GTT
to a scratch page could cause problem in case GTT bar is truely 256K.
Sure. Debian Stable got version 2.3.
>>
>> >
>> > As this was a 2.6.32->2.6.32.4 regression, I assume that it's also a
>> > 2.6.32->2.6.33 regression?
>>
>> Yep. All kernels I tested since 2.6.32.4 crash, including 2.6.32.10,
>> 2.6.33 and 2.6.34-rc1. See my original message for more details about
>> the crash.
>>
>
> Could you bisect? I doubt it's caused by David's patch, as if it is, you
> will have trouble at early agp init time, instead of current problem looks
> like something causing rendering hang..and I can't think of how mapping GTT
> to a scratch page could cause problem in case GTT bar is truely 256K.
I bisected in order to find the commit 5877960869333e42ebeb733e8d9d5630ff96d350.
I don't know if such commit is really the true problem (does not seem
so if you are right); however, it is the commit that breaks the X
server as I stated in the original message.
>
> --
> Open Source Technology Center, Intel ltd.
>
> $gpg --keyserver wwwkeys.pgp.net --recv-keys 4D781827
>
> -----BEGIN PGP SIGNATURE-----
> Version: GnuPG v1.4.10 (GNU/Linux)
>
> iEYEARECAAYFAkumJgwACgkQsQQaM014GCe2cgCbB8LJnqn7n7J++lA8Pd+TfBQb
> 2BQAn2sYv0DZuSniLTcf9F3cX+cGIgVl
> =gQHY
> -----END PGP SIGNATURE-----
I have no 5877960869333e42ebeb733e8d9d5630ff96d350 here. This is why
we ask that people always identify patches by both the hash and the
full title..
> I don't know if such commit is really the true problem (does not seem
> so if you are right); however, it is the commit that breaks the X
> server as I stated in the original message.
--
I believe this[0] is fc61901373987ad61851ed001fe971f3ee8d96a3 upstream:
--------
agp/intel-agp: Clear entire GTT on startup
Some BIOSes fail to initialise the GTT, which will cause DMA faults when
the IOMMU is enabled. We need to clear the whole thing to point at the
scratch page, not just the part that Linux is going to use.
Signed-off-by: David Woodhouse <David.W...@intel.com>
[anholt: Note that this may also help with stability in the presence of
driver bugs, by not drawing to memory we don't own]
Signed-off-by: Eric Anholt <er...@anholt.net>
--------
Christian.
[0] http://github.com/pfactum/pf-kernel/commit/5877960869333e42ebeb733e8d9d5630ff96d350
--
BOFH excuse #384:
it's an ID-10-T error
Sorry, I thought that saying 2.6.32.x and the commit would be enough:.
This is the one I meant:
Indeed. Also in
> On Tue, Mar 23, 2010 at 5:14 AM, Christian Kujau <li...@nerdbynature.de> wrote:
> > On Mon, 22 Mar 2010 at 20:57, Andrew Morton wrote:
> >> On Sun, 21 Mar 2010 16:30:20 +0100 Miguel Ojeda <miguel.oje...@gmail.com> wrote:
> >> > I bisected in order to find the commit 5877960869333e42ebeb733e8d9d5630ff96d350.
> >
> > I believe this[0] is fc61901373987ad61851ed001fe971f3ee8d96a3 upstream:
>
> Indeed. Also in
>
> http://git.kernel.org/?p=linux/kernel/git/stable/linux-2.6.32.y.git;a=commit;h=fc61901373987ad61851ed001fe971f3ee8d96a3
Does reverting that patch from the current code fix the crash?
--- a/drivers/char/agp/intel-agp.c~revert-1
+++ a/drivers/char/agp/intel-agp.c
@@ -207,7 +207,6 @@ static struct _intel_private {
* popup and for the GTT.
*/
int gtt_entries; /* i830+ */
- int gtt_total_size;
union {
void __iomem *i9xx_flush_page;
void *i8xx_flush_page;
@@ -1239,7 +1238,7 @@ static int intel_i915_configure(void)
readl(intel_private.registers+I810_PGETBL_CTL); /* PCI Posting. */
if (agp_bridge->driver->needs_scratch_page) {
- for (i = intel_private.gtt_entries; i < intel_private.gtt_total_size; i++) {
+ for (i = intel_private.gtt_entries; i < current_size->num_entries; i++) {
writel(agp_bridge->scratch_page, intel_private.gtt+i);
}
readl(intel_private.gtt+i-1); /* PCI Posting. */
@@ -1394,8 +1393,6 @@ static int intel_i915_create_gatt_table(
if (!intel_private.gtt)
return -ENOMEM;
- intel_private.gtt_total_size = gtt_map_size / 4;
-
temp &= 0xfff80000;
intel_private.registers = ioremap(temp, 128 * 4096);
@@ -1485,8 +1482,6 @@ static int intel_i965_create_gatt_table(
if (!intel_private.gtt)
return -ENOMEM;
- intel_private.gtt_total_size = gtt_size / 4;
-
intel_private.registers = ioremap(temp, 128 * 4096);
if (!intel_private.registers) {
iounmap(intel_private.gtt);
_
I will recheck tomorrow but I think that it does, if
intel_private.gtt_total_size is still 32768 in current code instead of
65536 (I think I tried the patch I sent in 2.6.34-rc1 too).
Yes. In addition, applying the patch I provided also fixes it in current code.
Current code:
http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=01e77706cdde7c0b47e5ca1f4284a795504c7c40
> On Wed, Mar 24, 2010 at 7:14 PM, Andrew Morton
> <ak...@linux-foundation.org> wrote:
> > On Tue, 23 Mar 2010 12:40:05 +0100
> > Miguel Ojeda <miguel.oje...@gmail.com> wrote:
> >
> >> On Tue, Mar 23, 2010 at 5:14 AM, Christian Kujau <li...@nerdbynature.de> wrote:
> >> > On Mon, 22 Mar 2010 at 20:57, Andrew Morton wrote:
> >> >> On Sun, 21 Mar 2010 16:30:20 +0100 Miguel Ojeda <miguel.oje...@gmail.com> wrote:
> >> >> > I bisected in order to find the commit 5877960869333e42ebeb733e8d9d5630ff96d350.
> >> >
> >> > I believe this[0] is fc61901373987ad61851ed001fe971f3ee8d96a3 upstream:
> >>
> >> Indeed. Also in
> >>
> >> http://git.kernel.org/?p=linux/kernel/git/stable/linux-2.6.32.y.git;a=commit;h=fc61901373987ad61851ed001fe971f3ee8d96a3
> >
> > Does reverting that patch from the current code fix the crash?
>
> Yes. In addition, applying the patch I provided also fixes it in current code.
>
Well great. A whole pile of new stuff has turned up in linux-next's
drivers/char/agp/intel-agp.c. As far as I can tell none of it
address the regression which you've reported and your patch no longer
applies at all so I have to drop the patch.
Perhaps "agp/intel: put back check that we have a driver for the
bridge" fixes it, but it isn't tagged for -stable backporting.
Rafael, Maciej: if you're not already tracking this as a 2.6.32->2.6.33
regression then please do so.
David, can you please help us to get this sorted out in both 2.6.33.x
and in mainline?
From: Miguel Ojeda <miguel.oje...@gmail.com>
Commit fc61901373987ad61851ed001fe971f3ee8d96a3 ("agp/intel-agp: Clear
entire GTT on startup") (included since 2.6.32.4) crashes (locks up) the
The fix should be applied to stable series, as well as 2.6.33 and
2.6.34-rc1.
Signed-off-by: Miguel Ojeda <miguel.oje...@gmail.com>
Cc: David Woodhouse <David.W...@intel.com>
Cc: Eric Anholt <er...@anholt.net>
Cc: Zhenyu Wang <zhe...@linux.intel.com>
Cc: Dave Airlie <air...@linux.ie>
Cc: <sta...@kernel.org>
Signed-off-by: Andrew Morton <ak...@linux-foundation.org>
---
drivers/char/agp/intel-agp.c | 12 +++++++-----
1 file changed, 7 insertions(+), 5 deletions(-)
diff -puN drivers/char/agp/intel-agp.c~intel-agpc-fix-crash-when-accessing-nonexistent-gtt-entries-in-i915 drivers/char/agp/intel-agp.c
--- a/drivers/char/agp/intel-agp.c~intel-agpc-fix-crash-when-accessing-nonexistent-gtt-entries-in-i915
+++ a/drivers/char/agp/intel-agp.c
@@ -74,11 +74,11 @@ EXPORT_SYMBOL(intel_agp_enabled);
#define PCI_DEVICE_ID_INTEL_SANDYBRIDGE_M_HB 0x0104
#define PCI_DEVICE_ID_INTEL_SANDYBRIDGE_M_IG 0x0106
-/* cover 915 and 945 variants */
#define IS_I915 (agp_bridge->dev->device == PCI_DEVICE_ID_INTEL_E7221_HB || \
agp_bridge->dev->device == PCI_DEVICE_ID_INTEL_82915G_HB || \
- agp_bridge->dev->device == PCI_DEVICE_ID_INTEL_82915GM_HB || \
- agp_bridge->dev->device == PCI_DEVICE_ID_INTEL_82945G_HB || \
+ agp_bridge->dev->device == PCI_DEVICE_ID_INTEL_82915GM_HB)
+
+#define IS_I945 (agp_bridge->dev->device == PCI_DEVICE_ID_INTEL_82945G_HB || \
agp_bridge->dev->device == PCI_DEVICE_ID_INTEL_82945GM_HB || \
agp_bridge->dev->device == PCI_DEVICE_ID_INTEL_82945GME_HB)
@@ -824,14 +824,14 @@ static void intel_i830_init_gtt_entries(
break;
case I915_GMCH_GMS_STOLEN_48M:
/* Check it's really I915G */
- if (IS_I915 || IS_I965 || IS_G33 || IS_G4X)
+ if (IS_I915 || IS_I945 || IS_I965 || IS_G33 || IS_G4X)
gtt_entries = MB(48) - KB(size);
else
gtt_entries = 0;
break;
case I915_GMCH_GMS_STOLEN_64M:
/* Check it's really I915G */
- if (IS_I915 || IS_I965 || IS_G33 || IS_G4X)
+ if (IS_I915 || IS_I945 || IS_I965 || IS_G33 || IS_G4X)
gtt_entries = MB(64) - KB(size);
else
gtt_entries = 0;
@@ -1400,6 +1400,8 @@ static int intel_i915_create_gatt_table(
if (IS_G33)
gtt_map_size = 1024 * 1024; /* 1M on G33 */
+ else if (IS_I915)
+ gtt_map_size = 128 * 1024; /* 128K on I915 */
intel_private.gtt = ioremap(temp2, gtt_map_size);
if (!intel_private.gtt)
return -ENOMEM;
diff -puN /dev/null /dev/null
I can try linux-next and see if it works again.
In addition, I have to test the regression with some newer X version
yet, as Zhenyu told me.
> On Tue, Apr 27, 2010 at 7:57 PM, Andrew Morton
> <ak...@linux-foundation.org> wrote:
> > On Thu, 25 Mar 2010 17:55:56 +0100
> > Miguel Ojeda <miguel.oje...@gmail.com> wrote:
> >
> >> On Wed, Mar 24, 2010 at 7:14 PM, Andrew Morton
> >> <ak...@linux-foundation.org> wrote:
> >> > On Tue, 23 Mar 2010 12:40:05 +0100
> >> > Miguel Ojeda <miguel.oje...@gmail.com> wrote:
> >> >
> >> >> On Tue, Mar 23, 2010 at 5:14 AM, Christian Kujau <li...@nerdbynature.de> wrote:
> >> >> > On Mon, 22 Mar 2010 at 20:57, Andrew Morton wrote:
> >> >> >> On Sun, 21 Mar 2010 16:30:20 +0100 Miguel Ojeda <miguel.oje...@gmail.com> wrote:
> >> >> >> > I bisected in order to find the commit 5877960869333e42ebeb733e8d9d5630ff96d350.
> >> >> >
> >> >> > I believe this[0] is fc61901373987ad61851ed001fe971f3ee8d96a3 upstream:
> >> >>
> >> >> Indeed. Also in
> >> >>
> >> >> http://git.kernel.org/?p=linux/kernel/git/stable/linux-2.6.32.y.git;a=commit;h=fc61901373987ad61851ed001fe971f3ee8d96a3
> >> >
> >> > Does reverting that patch from the current code fix the crash?
> >>
> >> Yes. In addition, applying the patch I provided also fixes it in current code.
> >>
> >
> > Well great. __A whole pile of new stuff has turned up in linux-next's
> > drivers/char/agp/intel-agp.c. __As far as I can tell none of it
> > address the regression which you've reported and your patch no longer
> > applies at all so I have to drop the patch.
> >
> > Perhaps "agp/intel: put back check that we have a driver for the
> > bridge" fixes it, but it isn't tagged for -stable backporting.
>
> I can try linux-next and see if it works again.
Thanks.
> In addition, I have to test the regression with some newer X version
> yet, as Zhenyu told me.
That would seem to be counter-productive. If you install a newer X and
the bug goes away, you've just gone and made it harder for yourself to
reproduce the bug.
Oh, of course. I meant compiling the latest version somewhere and have
both versions ready (the Debian's one and the one from x.org).
Is there any difference in the X org log files between booting
with/without this patch.
I suspect its one of those crappy cases where an old userspace driver
directly hits the hardware and the kernel AGP driver hits it
different,
I don't think reverting this patch can help anyone, as its clearly not
wrong, we also have no way to know in advance what userspace driver is
going to come along.
The only other way I can think to do this, is when the KMS drm driver
comes up it sets up the extra GTT entries and we leave the AGP driver
only touching what is always has.
In that case then the KMS driver will just work, and anyone running
UMS can stay in the 200xs.
Dave.