[PATCH] intel-agp.c: Fix crash when accessing nonexistent GTT entries in i915

Miguel Ojeda

unread,

Mar 10, 2010, 5:20:03 PM3/10/10

to

Hi,

The commit 5877960869333e42ebeb733e8d9d5630ff96d350 (included since 2.6.32.4) crashes (locks up) the 82915G/GV/910GL Controller when intel-agp.c tries to access nonexistent GTT entries at:

- for (i = intel_private.gtt_entries; i < current_size->num_entries; i++) {
+ for (i = intel_private.gtt_entries; i < intel_private.gtt_total_size; i++) {

Rationale: I915 (gma900) has 128 MB of video memory (maximum), as per intel.com ( http://www.intel.com/support/graphics/intel915g/sb/CS-012579.htm ) and lscpi:

00:02.0 VGA compatible controller: Intel Corporation 82915G/GV/910GL Integrated Graphics Controller (rev 04) (prog-if 00 [VGA controller])
Subsystem: Intel Corporation Device 4147
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
Status: Cap+ 66MHz- UDF- FastB2B+ ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
Latency: 0
Interrupt: pin A routed to IRQ 11
Region 0: Memory at ff480000 (32-bit, non-prefetchable) [size=512K]
Region 1: I/O ports at ec00 [size=8]
Region 2: Memory at d8000000 (32-bit, prefetchable) [size=128M]
Region 3: Memory at ff440000 (32-bit, non-prefetchable) [size=256K]
Capabilities: <access denied>

AFAIK, that implies that its gtt_total_size (in pages) should be 32K (as num_entries showed before the commit) instead of 64K.

Note: The IS_I915 macro includes 945; however, only GMA900 (I915) had 128 MB as the maximum AFAIK. Therefore, I divided the IS_I915 macro. I do not know about the "E7221" (please check).

How to reproduce: Access kernel.org in iceweasel (Debian Lenny) and the X server will crash. Sometimes, the kernel freezes.

Please review. The fix should be applied to stable series, as well as 2.6.33 and 2.6.34-rc1.

Signed-off-by: Miguel Ojeda <miguel.oje...@gmail.com>
---
--- linux-2.6.32.stable/drivers/char/agp/intel-agp.c.old 2010-03-10 15:32:36.000000000 +0100
+++ linux-2.6.32.stable/drivers/char/agp/intel-agp.c 2010-03-10 22:38:23.000000000 +0100
@@ -65,11 +65,11 @@
#define PCI_DEVICE_ID_INTEL_IGDNG_MC2_HB 0x006a
#define PCI_DEVICE_ID_INTEL_IGDNG_M_IG 0x0046

-/* cover 915 and 945 variants */
#define IS_I915 (agp_bridge->dev->device == PCI_DEVICE_ID_INTEL_E7221_HB || \
agp_bridge->dev->device == PCI_DEVICE_ID_INTEL_82915G_HB || \
- agp_bridge->dev->device == PCI_DEVICE_ID_INTEL_82915GM_HB || \
- agp_bridge->dev->device == PCI_DEVICE_ID_INTEL_82945G_HB || \
+ agp_bridge->dev->device == PCI_DEVICE_ID_INTEL_82915GM_HB)
+
+#define IS_I945 (agp_bridge->dev->device == PCI_DEVICE_ID_INTEL_82945G_HB || \
agp_bridge->dev->device == PCI_DEVICE_ID_INTEL_82945GM_HB || \
agp_bridge->dev->device == PCI_DEVICE_ID_INTEL_82945GME_HB)

@@ -724,14 +724,14 @@ static void intel_i830_init_gtt_entries(
break;
case I915_GMCH_GMS_STOLEN_48M:
/* Check it's really I915G */
- if (IS_I915 || IS_I965 || IS_G33 || IS_G4X)
+ if (IS_I915 || IS_I945 || IS_I965 || IS_G33 || IS_G4X)
gtt_entries = MB(48) - KB(size);
else
gtt_entries = 0;
break;
case I915_GMCH_GMS_STOLEN_64M:
/* Check it's really I915G */
- if (IS_I915 || IS_I965 || IS_G33 || IS_G4X)
+ if (IS_I915 || IS_I945 || IS_I965 || IS_G33 || IS_G4X)
gtt_entries = MB(64) - KB(size);
else
gtt_entries = 0;
@@ -1305,6 +1305,8 @@ static int intel_i915_create_gatt_table(

if (IS_G33)
gtt_map_size = 1024 * 1024; /* 1M on G33 */
+ else if (IS_I915)
+ gtt_map_size = 128 * 1024; /* 128K on I915 */
intel_private.gtt = ioremap(temp2, gtt_map_size);
if (!intel_private.gtt)
return -ENOMEM;

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majo...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

Miguel Ojeda

unread,

Mar 11, 2010, 2:40:02 AM3/11/10

to

Cc'ing the original committers.

Zhenyu Wang

unread,

Mar 11, 2010, 3:40:01 AM3/11/10

to

On 2010.03.11 08:31:57 +0100, Miguel Ojeda wrote:
> On Wed, Mar 10, 2010 at 11:09 PM, Miguel Ojeda
> <miguel.oje...@gmail.com> wrote:
> > Hi,
> >
> > The commit 5877960869333e42ebeb733e8d9d5630ff96d350 (included since 2.6.32.4) crashes (locks up) the 82915G/GV/910GL Controller when intel-agp.c tries to access nonexistent GTT entries at:
> >
> > - for (i = intel_private.gtt_entries; i < current_size->num_entries; i++) {
> > + for (i = intel_private.gtt_entries; i < intel_private.gtt_total_size; i++) {
> >
> > Rationale: I915 (gma900) has 128 MB of video memory (maximum), as per intel.com ( http://www.intel.com/support/graphics/intel915g/sb/CS-012579.htm ) and lscpi:

I think that page is wrong, and http://www.intel.com/design/chipsets/datashts/301467.htm
has info that 256K is for GTT bar, so max video memory size is 256M. On my 915G
board, I can choose 128M/256M in BIOS setup.

> >
> > 00:02.0 VGA compatible controller: Intel Corporation 82915G/GV/910GL Integrated Graphics Controller (rev 04) (prog-if 00 [VGA controller])
> > Subsystem: Intel Corporation Device 4147
> > Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
> > Status: Cap+ 66MHz- UDF- FastB2B+ ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
> > Latency: 0
> > Interrupt: pin A routed to IRQ 11
> > Region 0: Memory at ff480000 (32-bit, non-prefetchable) [size=512K]
> > Region 1: I/O ports at ec00 [size=8]
> > Region 2: Memory at d8000000 (32-bit, prefetchable) [size=128M]
> > Region 3: Memory at ff440000 (32-bit, non-prefetchable) [size=256K]

This also tells bar 3 for GTT has 256K.

> > Capabilities: <access denied>
> >
> >
> > AFAIK, that implies that its gtt_total_size (in pages) should be 32K (as num_entries showed before the commit) instead of 64K.
> >
> > Note: The IS_I915 macro includes 945; however, only GMA900 (I915) had 128 MB as the maximum AFAIK. Therefore, I divided the IS_I915 macro. I do not know about the "E7221" (please check).
> >
> > How to reproduce: Access kernel.org in iceweasel (Debian Lenny) and the X server will crash. Sometimes, the kernel freezes.
> >

I can't produce this on my 915G board with 128M or 256M memory config. Could you
paste dmesg in your failure or just hang?

--
Open Source Technology Center, Intel ltd.

$gpg --keyserver wwwkeys.pgp.net --recv-keys 4D781827

signature.asc

Andrew Morton

unread,

Mar 19, 2010, 4:30:02 PM3/19/10

to

On Thu, 11 Mar 2010 16:54:26 +0100
Miguel Ojeda <miguel.oje...@gmail.com> wrote:

> Attached dmesg, lspci -vv, config and xorg.
>
> When the X server crashes, the kernel does not report anything

This seems to have gone all quiet?

As this was a 2.6.32->2.6.32.4 regression, I assume that it's also a
2.6.32->2.6.33 regression?

Miguel Ojeda

unread,

Mar 20, 2010, 9:10:01 AM3/20/10

to

On Fri, Mar 19, 2010 at 9:27 PM, Andrew Morton
<ak...@linux-foundation.org> wrote:
> On Thu, 11 Mar 2010 16:54:26 +0100
> Miguel Ojeda <miguel.oje...@gmail.com> wrote:
>
>> Attached dmesg, lspci -vv, config and xorg.
>>
>> When the X server crashes, the kernel does not report anything
>
> This seems to have gone all quiet?

It seems so, I was about to ping Zhenyu.

>
> As this was a 2.6.32->2.6.32.4 regression, I assume that it's also a
> 2.6.32->2.6.33 regression?

Yep. All kernels I tested since 2.6.32.4 crash, including 2.6.32.10,
2.6.33 and 2.6.34-rc1. See my original message for more details about
the crash.

Zhenyu Wang

unread,

Mar 21, 2010, 10:00:03 AM3/21/10

to

On 2010.03.20 14:04:56 +0100, Miguel Ojeda wrote:
> On Fri, Mar 19, 2010 at 9:27 PM, Andrew Morton
> <ak...@linux-foundation.org> wrote:
> > On Thu, 11 Mar 2010 16:54:26 +0100
> > Miguel Ojeda <miguel.oje...@gmail.com> wrote:
> >
> >> Attached dmesg, lspci -vv, config and xorg.
> >>
> >> When the X server crashes, the kernel does not report anything
> >
> > This seems to have gone all quiet?
>
> It seems so, I was about to ping Zhenyu.

Could you try recent X.org intel driver release? Your failure X log
showed some pretty old UMS driver's render flush time out message,
I think whose version never has been well tested with new kms/gem stuff.

>
> >
> > As this was a 2.6.32->2.6.32.4 regression, I assume that it's also a
> > 2.6.32->2.6.33 regression?
>
> Yep. All kernels I tested since 2.6.32.4 crash, including 2.6.32.10,
> 2.6.33 and 2.6.34-rc1. See my original message for more details about
> the crash.
>

Could you bisect? I doubt it's caused by David's patch, as if it is, you
will have trouble at early agp init time, instead of current problem looks
like something causing rendering hang..and I can't think of how mapping GTT
to a scratch page could cause problem in case GTT bar is truely 256K.

signature.asc

Miguel Ojeda

unread,

Mar 21, 2010, 11:40:01 AM3/21/10

to

On Sun, Mar 21, 2010 at 2:58 PM, Zhenyu Wang <zhe...@linux.intel.com> wrote:
> On 2010.03.20 14:04:56 +0100, Miguel Ojeda wrote:
>> On Fri, Mar 19, 2010 at 9:27 PM, Andrew Morton
>> <ak...@linux-foundation.org> wrote:
>> > On Thu, 11 Mar 2010 16:54:26 +0100
>> > Miguel Ojeda <miguel.oje...@gmail.com> wrote:
>> >
>> >> Attached dmesg, lspci -vv, config and xorg.
>> >>
>> >> When the X server crashes, the kernel does not report anything
>> >
>> > This seems to have gone all quiet?
>>
>> It seems so, I was about to ping Zhenyu.
>
> Could you try recent X.org intel driver release? Your failure X log
> showed some pretty old UMS driver's render flush time out message,
> I think whose version never has been well tested with new kms/gem stuff.
>

Sure. Debian Stable got version 2.3.

>>
>> >
>> > As this was a 2.6.32->2.6.32.4 regression, I assume that it's also a
>> > 2.6.32->2.6.33 regression?
>>
>> Yep. All kernels I tested since 2.6.32.4 crash, including 2.6.32.10,
>> 2.6.33 and 2.6.34-rc1. See my original message for more details about
>> the crash.
>>
>
> Could you bisect? I doubt it's caused by David's patch, as if it is, you
> will have trouble at early agp init time, instead of current problem looks
> like something causing rendering hang..and I can't think of how mapping GTT
> to a scratch page could cause problem in case GTT bar is truely 256K.

I bisected in order to find the commit 5877960869333e42ebeb733e8d9d5630ff96d350.

I don't know if such commit is really the true problem (does not seem
so if you are right); however, it is the commit that breaks the X
server as I stated in the original message.

>
> --
> Open Source Technology Center, Intel ltd.
>
> $gpg --keyserver wwwkeys.pgp.net --recv-keys 4D781827
>

> -----BEGIN PGP SIGNATURE-----
> Version: GnuPG v1.4.10 (GNU/Linux)
>
> iEYEARECAAYFAkumJgwACgkQsQQaM014GCe2cgCbB8LJnqn7n7J++lA8Pd+TfBQb
> 2BQAn2sYv0DZuSniLTcf9F3cX+cGIgVl
> =gQHY
> -----END PGP SIGNATURE-----

Andrew Morton

unread,

Mar 23, 2010, 12:00:02 AM3/23/10

to

I have no 5877960869333e42ebeb733e8d9d5630ff96d350 here. This is why
we ask that people always identify patches by both the hash and the
full title..

> I don't know if such commit is really the true problem (does not seem
> so if you are right); however, it is the commit that breaks the X
> server as I stated in the original message.

--

Christian Kujau

unread,

Mar 23, 2010, 12:20:01 AM3/23/10

to

On Mon, 22 Mar 2010 at 20:57, Andrew Morton wrote:
> On Sun, 21 Mar 2010 16:30:20 +0100 Miguel Ojeda <miguel.oje...@gmail.com> wrote:
> > I bisected in order to find the commit 5877960869333e42ebeb733e8d9d5630ff96d350.

I believe this[0] is fc61901373987ad61851ed001fe971f3ee8d96a3 upstream:

--------
agp/intel-agp: Clear entire GTT on startup

Some BIOSes fail to initialise the GTT, which will cause DMA faults when
the IOMMU is enabled. We need to clear the whole thing to point at the
scratch page, not just the part that Linux is going to use.

Signed-off-by: David Woodhouse <David.W...@intel.com>
[anholt: Note that this may also help with stability in the presence of
driver bugs, by not drawing to memory we don't own]
Signed-off-by: Eric Anholt <er...@anholt.net>
--------

Christian.

[0] http://github.com/pfactum/pf-kernel/commit/5877960869333e42ebeb733e8d9d5630ff96d350
--
BOFH excuse #384:

it's an ID-10-T error

Miguel Ojeda

unread,

Mar 23, 2010, 6:10:02 AM3/23/10

to

On Tue, Mar 23, 2010 at 1:57 AM, Andrew Morton

Sorry, I thought that saying 2.6.32.x and the commit would be enough:.
This is the one I meant:

http://git.kernel.org/?p=linux/kernel/git/stable/linux-2.6.32.y.git;a=commit;h=5877960869333e42ebeb733e8d9d5630ff96d350

Miguel Ojeda

unread,

Mar 23, 2010, 7:50:02 AM3/23/10

to

On Tue, Mar 23, 2010 at 5:14 AM, Christian Kujau <li...@nerdbynature.de> wrote:
> On Mon, 22 Mar 2010 at 20:57, Andrew Morton wrote:
>> On Sun, 21 Mar 2010 16:30:20 +0100 Miguel Ojeda <miguel.oje...@gmail.com> wrote:
>> > I bisected in order to find the commit 5877960869333e42ebeb733e8d9d5630ff96d350.
>
> I believe this[0] is fc61901373987ad61851ed001fe971f3ee8d96a3 upstream:

Indeed. Also in

http://git.kernel.org/?p=linux/kernel/git/stable/linux-2.6.32.y.git;a=commit;h=fc61901373987ad61851ed001fe971f3ee8d96a3

Andrew Morton

unread,

Mar 24, 2010, 2:20:01 PM3/24/10

to

On Tue, 23 Mar 2010 12:40:05 +0100
Miguel Ojeda <miguel.oje...@gmail.com> wrote:

> On Tue, Mar 23, 2010 at 5:14 AM, Christian Kujau <li...@nerdbynature.de> wrote:
> > On Mon, 22 Mar 2010 at 20:57, Andrew Morton wrote:
> >> On Sun, 21 Mar 2010 16:30:20 +0100 Miguel Ojeda <miguel.oje...@gmail.com> wrote:
> >> > I bisected in order to find the commit 5877960869333e42ebeb733e8d9d5630ff96d350.
> >
> > I believe this[0] is fc61901373987ad61851ed001fe971f3ee8d96a3 upstream:
>
> Indeed. Also in
>
> http://git.kernel.org/?p=linux/kernel/git/stable/linux-2.6.32.y.git;a=commit;h=fc61901373987ad61851ed001fe971f3ee8d96a3

Does reverting that patch from the current code fix the crash?

--- a/drivers/char/agp/intel-agp.c~revert-1
+++ a/drivers/char/agp/intel-agp.c
@@ -207,7 +207,6 @@ static struct _intel_private {
* popup and for the GTT.
*/
int gtt_entries; /* i830+ */
- int gtt_total_size;
union {
void __iomem *i9xx_flush_page;
void *i8xx_flush_page;
@@ -1239,7 +1238,7 @@ static int intel_i915_configure(void)
readl(intel_private.registers+I810_PGETBL_CTL); /* PCI Posting. */

if (agp_bridge->driver->needs_scratch_page) {
- for (i = intel_private.gtt_entries; i < intel_private.gtt_total_size; i++) {
+ for (i = intel_private.gtt_entries; i < current_size->num_entries; i++) {
writel(agp_bridge->scratch_page, intel_private.gtt+i);
}
readl(intel_private.gtt+i-1); /* PCI Posting. */
@@ -1394,8 +1393,6 @@ static int intel_i915_create_gatt_table(
if (!intel_private.gtt)
return -ENOMEM;

- intel_private.gtt_total_size = gtt_map_size / 4;
-
temp &= 0xfff80000;

intel_private.registers = ioremap(temp, 128 * 4096);
@@ -1485,8 +1482,6 @@ static int intel_i965_create_gatt_table(
if (!intel_private.gtt)
return -ENOMEM;

- intel_private.gtt_total_size = gtt_size / 4;
-
intel_private.registers = ioremap(temp, 128 * 4096);
if (!intel_private.registers) {
iounmap(intel_private.gtt);
_

Miguel Ojeda

unread,

Mar 24, 2010, 3:10:02 PM3/24/10

to

On Wed, Mar 24, 2010 at 7:14 PM, Andrew Morton
<ak...@linux-foundation.org> wrote:
> On Tue, 23 Mar 2010 12:40:05 +0100
> Miguel Ojeda <miguel.oje...@gmail.com> wrote:
>
>> On Tue, Mar 23, 2010 at 5:14 AM, Christian Kujau <li...@nerdbynature.de> wrote:
>> > On Mon, 22 Mar 2010 at 20:57, Andrew Morton wrote:
>> >> On Sun, 21 Mar 2010 16:30:20 +0100 Miguel Ojeda <miguel.oje...@gmail.com> wrote:
>> >> > I bisected in order to find the commit 5877960869333e42ebeb733e8d9d5630ff96d350.
>> >
>> > I believe this[0] is fc61901373987ad61851ed001fe971f3ee8d96a3 upstream:
>>
>> Indeed. Also in
>>
>> http://git.kernel.org/?p=linux/kernel/git/stable/linux-2.6.32.y.git;a=commit;h=fc61901373987ad61851ed001fe971f3ee8d96a3
>
> Does reverting that patch from the current code fix the crash?
>

I will recheck tomorrow but I think that it does, if
intel_private.gtt_total_size is still 32768 in current code instead of
65536 (I think I tried the patch I sent in 2.6.34-rc1 too).

Miguel Ojeda

unread,

Mar 25, 2010, 1:00:03 PM3/25/10

to

On Wed, Mar 24, 2010 at 7:14 PM, Andrew Morton
<ak...@linux-foundation.org> wrote:
> On Tue, 23 Mar 2010 12:40:05 +0100
> Miguel Ojeda <miguel.oje...@gmail.com> wrote:
>
>> On Tue, Mar 23, 2010 at 5:14 AM, Christian Kujau <li...@nerdbynature.de> wrote:
>> > On Mon, 22 Mar 2010 at 20:57, Andrew Morton wrote:
>> >> On Sun, 21 Mar 2010 16:30:20 +0100 Miguel Ojeda <miguel.oje...@gmail.com> wrote:
>> >> > I bisected in order to find the commit 5877960869333e42ebeb733e8d9d5630ff96d350.
>> >
>> > I believe this[0] is fc61901373987ad61851ed001fe971f3ee8d96a3 upstream:
>>
>> Indeed. Also in
>>
>> http://git.kernel.org/?p=linux/kernel/git/stable/linux-2.6.32.y.git;a=commit;h=fc61901373987ad61851ed001fe971f3ee8d96a3
>
> Does reverting that patch from the current code fix the crash?

Yes. In addition, applying the patch I provided also fixes it in current code.

Current code:
http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=01e77706cdde7c0b47e5ca1f4284a795504c7c40

Andrew Morton

unread,

Apr 27, 2010, 2:10:03 PM4/27/10

to

On Thu, 25 Mar 2010 17:55:56 +0100
Miguel Ojeda <miguel.oje...@gmail.com> wrote:

> On Wed, Mar 24, 2010 at 7:14 PM, Andrew Morton
> <ak...@linux-foundation.org> wrote:
> > On Tue, 23 Mar 2010 12:40:05 +0100
> > Miguel Ojeda <miguel.oje...@gmail.com> wrote:
> >
> >> On Tue, Mar 23, 2010 at 5:14 AM, Christian Kujau <li...@nerdbynature.de> wrote:
> >> > On Mon, 22 Mar 2010 at 20:57, Andrew Morton wrote:
> >> >> On Sun, 21 Mar 2010 16:30:20 +0100 Miguel Ojeda <miguel.oje...@gmail.com> wrote:
> >> >> > I bisected in order to find the commit 5877960869333e42ebeb733e8d9d5630ff96d350.
> >> >
> >> > I believe this[0] is fc61901373987ad61851ed001fe971f3ee8d96a3 upstream:
> >>
> >> Indeed. Also in
> >>
> >> http://git.kernel.org/?p=linux/kernel/git/stable/linux-2.6.32.y.git;a=commit;h=fc61901373987ad61851ed001fe971f3ee8d96a3
> >
> > Does reverting that patch from the current code fix the crash?
>
> Yes. In addition, applying the patch I provided also fixes it in current code.
>

Well great. A whole pile of new stuff has turned up in linux-next's
drivers/char/agp/intel-agp.c. As far as I can tell none of it
address the regression which you've reported and your patch no longer
applies at all so I have to drop the patch.

Perhaps "agp/intel: put back check that we have a driver for the
bridge" fixes it, but it isn't tagged for -stable backporting.

Rafael, Maciej: if you're not already tracking this as a 2.6.32->2.6.33
regression then please do so.

David, can you please help us to get this sorted out in both 2.6.33.x
and in mainline?

From: Miguel Ojeda <miguel.oje...@gmail.com>

Commit fc61901373987ad61851ed001fe971f3ee8d96a3 ("agp/intel-agp: Clear
entire GTT on startup") (included since 2.6.32.4) crashes (locks up) the

The fix should be applied to stable series, as well as 2.6.33 and
2.6.34-rc1.

Signed-off-by: Miguel Ojeda <miguel.oje...@gmail.com>
Cc: David Woodhouse <David.W...@intel.com>
Cc: Eric Anholt <er...@anholt.net>
Cc: Zhenyu Wang <zhe...@linux.intel.com>
Cc: Dave Airlie <air...@linux.ie>
Cc: <sta...@kernel.org>
Signed-off-by: Andrew Morton <ak...@linux-foundation.org>
---

drivers/char/agp/intel-agp.c | 12 +++++++-----
1 file changed, 7 insertions(+), 5 deletions(-)

diff -puN drivers/char/agp/intel-agp.c~intel-agpc-fix-crash-when-accessing-nonexistent-gtt-entries-in-i915 drivers/char/agp/intel-agp.c
--- a/drivers/char/agp/intel-agp.c~intel-agpc-fix-crash-when-accessing-nonexistent-gtt-entries-in-i915
+++ a/drivers/char/agp/intel-agp.c
@@ -74,11 +74,11 @@ EXPORT_SYMBOL(intel_agp_enabled);
#define PCI_DEVICE_ID_INTEL_SANDYBRIDGE_M_HB 0x0104
#define PCI_DEVICE_ID_INTEL_SANDYBRIDGE_M_IG 0x0106

-/* cover 915 and 945 variants */
#define IS_I915 (agp_bridge->dev->device == PCI_DEVICE_ID_INTEL_E7221_HB || \
agp_bridge->dev->device == PCI_DEVICE_ID_INTEL_82915G_HB || \
- agp_bridge->dev->device == PCI_DEVICE_ID_INTEL_82915GM_HB || \
- agp_bridge->dev->device == PCI_DEVICE_ID_INTEL_82945G_HB || \
+ agp_bridge->dev->device == PCI_DEVICE_ID_INTEL_82915GM_HB)
+
+#define IS_I945 (agp_bridge->dev->device == PCI_DEVICE_ID_INTEL_82945G_HB || \
agp_bridge->dev->device == PCI_DEVICE_ID_INTEL_82945GM_HB || \
agp_bridge->dev->device == PCI_DEVICE_ID_INTEL_82945GME_HB)

@@ -824,14 +824,14 @@ static void intel_i830_init_gtt_entries(

break;
case I915_GMCH_GMS_STOLEN_48M:
/* Check it's really I915G */
- if (IS_I915 || IS_I965 || IS_G33 || IS_G4X)
+ if (IS_I915 || IS_I945 || IS_I965 || IS_G33 || IS_G4X)
gtt_entries = MB(48) - KB(size);
else
gtt_entries = 0;
break;
case I915_GMCH_GMS_STOLEN_64M:
/* Check it's really I915G */
- if (IS_I915 || IS_I965 || IS_G33 || IS_G4X)
+ if (IS_I915 || IS_I945 || IS_I965 || IS_G33 || IS_G4X)
gtt_entries = MB(64) - KB(size);
else
gtt_entries = 0;

@@ -1400,6 +1400,8 @@ static int intel_i915_create_gatt_table(

if (IS_G33)
gtt_map_size = 1024 * 1024; /* 1M on G33 */
+ else if (IS_I915)
+ gtt_map_size = 128 * 1024; /* 128K on I915 */
intel_private.gtt = ioremap(temp2, gtt_map_size);
if (!intel_private.gtt)
return -ENOMEM;

diff -puN /dev/null /dev/null

Miguel Ojeda

unread,

Apr 27, 2010, 4:20:02 PM4/27/10

to

On Tue, Apr 27, 2010 at 7:57 PM, Andrew Morton

<ak...@linux-foundation.org> wrote:
> On Thu, 25 Mar 2010 17:55:56 +0100
> Miguel Ojeda <miguel.oje...@gmail.com> wrote:
>
>> On Wed, Mar 24, 2010 at 7:14 PM, Andrew Morton
>> <ak...@linux-foundation.org> wrote:
>> > On Tue, 23 Mar 2010 12:40:05 +0100
>> > Miguel Ojeda <miguel.oje...@gmail.com> wrote:
>> >
>> >> On Tue, Mar 23, 2010 at 5:14 AM, Christian Kujau <li...@nerdbynature.de> wrote:
>> >> > On Mon, 22 Mar 2010 at 20:57, Andrew Morton wrote:
>> >> >> On Sun, 21 Mar 2010 16:30:20 +0100 Miguel Ojeda <miguel.oje...@gmail.com> wrote:
>> >> >> > I bisected in order to find the commit 5877960869333e42ebeb733e8d9d5630ff96d350.
>> >> >
>> >> > I believe this[0] is fc61901373987ad61851ed001fe971f3ee8d96a3 upstream:
>> >>
>> >> Indeed. Also in
>> >>
>> >> http://git.kernel.org/?p=linux/kernel/git/stable/linux-2.6.32.y.git;a=commit;h=fc61901373987ad61851ed001fe971f3ee8d96a3
>> >
>> > Does reverting that patch from the current code fix the crash?
>>
>> Yes. In addition, applying the patch I provided also fixes it in current code.
>>
>
> Well great. �A whole pile of new stuff has turned up in linux-next's
> drivers/char/agp/intel-agp.c. �As far as I can tell none of it
> address the regression which you've reported and your patch no longer
> applies at all so I have to drop the patch.
>
> Perhaps "agp/intel: put back check that we have a driver for the
> bridge" fixes it, but it isn't tagged for -stable backporting.

I can try linux-next and see if it works again.

In addition, I have to test the regression with some newer X version
yet, as Zhenyu told me.

Andrew Morton

unread,

Apr 27, 2010, 4:30:02 PM4/27/10

to

On Tue, 27 Apr 2010 22:06:21 +0200
Miguel Ojeda <miguel.oje...@gmail.com> wrote:

> On Tue, Apr 27, 2010 at 7:57 PM, Andrew Morton
> <ak...@linux-foundation.org> wrote:
> > On Thu, 25 Mar 2010 17:55:56 +0100
> > Miguel Ojeda <miguel.oje...@gmail.com> wrote:
> >
> >> On Wed, Mar 24, 2010 at 7:14 PM, Andrew Morton
> >> <ak...@linux-foundation.org> wrote:
> >> > On Tue, 23 Mar 2010 12:40:05 +0100
> >> > Miguel Ojeda <miguel.oje...@gmail.com> wrote:
> >> >
> >> >> On Tue, Mar 23, 2010 at 5:14 AM, Christian Kujau <li...@nerdbynature.de> wrote:
> >> >> > On Mon, 22 Mar 2010 at 20:57, Andrew Morton wrote:
> >> >> >> On Sun, 21 Mar 2010 16:30:20 +0100 Miguel Ojeda <miguel.oje...@gmail.com> wrote:
> >> >> >> > I bisected in order to find the commit 5877960869333e42ebeb733e8d9d5630ff96d350.
> >> >> >
> >> >> > I believe this[0] is fc61901373987ad61851ed001fe971f3ee8d96a3 upstream:
> >> >>
> >> >> Indeed. Also in
> >> >>
> >> >> http://git.kernel.org/?p=linux/kernel/git/stable/linux-2.6.32.y.git;a=commit;h=fc61901373987ad61851ed001fe971f3ee8d96a3
> >> >
> >> > Does reverting that patch from the current code fix the crash?
> >>
> >> Yes. In addition, applying the patch I provided also fixes it in current code.
> >>
> >

> > Well great. __A whole pile of new stuff has turned up in linux-next's
> > drivers/char/agp/intel-agp.c. __As far as I can tell none of it

> > address the regression which you've reported and your patch no longer
> > applies at all so I have to drop the patch.
> >
> > Perhaps "agp/intel: put back check that we have a driver for the
> > bridge" fixes it, but it isn't tagged for -stable backporting.
>
> I can try linux-next and see if it works again.

Thanks.

> In addition, I have to test the regression with some newer X version
> yet, as Zhenyu told me.

That would seem to be counter-productive. If you install a newer X and
the bug goes away, you've just gone and made it harder for yourself to
reproduce the bug.

Miguel Ojeda

unread,

Apr 27, 2010, 5:10:02 PM4/27/10

to

On Tue, Apr 27, 2010 at 10:26 PM, Andrew Morton

Oh, of course. I meant compiling the latest version somewhere and have
both versions ready (the Debian's one and the one from x.org).

Dave Airlie

unread,

Apr 28, 2010, 3:50:01 AM4/28/10

to

On Fri, Mar 12, 2010 at 1:54 AM, Miguel Ojeda
<miguel.oje...@gmail.com> wrote:

> On Thu, Mar 11, 2010 at 9:34 AM, Zhenyu Wang <zhe...@linux.intel.com> wrote:
>> On 2010.03.11 08:31:57 +0100, Miguel Ojeda wrote:
>>> On Wed, Mar 10, 2010 at 11:09 PM, Miguel Ojeda
>>> <miguel.oje...@gmail.com> wrote:
>>> > Hi,
>>> >
>>> > The commit 5877960869333e42ebeb733e8d9d5630ff96d350 (included since 2.6.32.4) crashes (locks up) the 82915G/GV/910GL Controller when intel-agp.c tries to access nonexistent GTT entries at:
>>> >
>>> > - for (i = intel_private.gtt_entries; i < current_size->num_entries; i++) {
>>> > + for (i = intel_private.gtt_entries; i < intel_private.gtt_total_size; i++) {
>>> >
>>> > Rationale: I915 (gma900) has 128 MB of video memory (maximum), as per intel.com ( http://www.intel.com/support/graphics/intel915g/sb/CS-012579.htm ) and lscpi:
>>
>> I think that page is wrong, and http://www.intel.com/design/chipsets/datashts/301467.htm
>> has info that 256K is for GTT bar, so max video memory size is 256M. On my 915G
>> board, I can choose 128M/256M in BIOS setup.
>

> You are right, I can choose between 128M/256M in BIOS too. The BIOS
> config is the following:
>
> IGD Aperture Size: 128 MB
> DVMT MODE: FIXED
> IGD DVMT/FIXED MEMORY: 32 MB

>
>>
>>> >
>>> > 00:02.0 VGA compatible controller: Intel Corporation 82915G/GV/910GL Integrated Graphics Controller (rev 04) (prog-if 00 [VGA controller])
>>> > Subsystem: Intel Corporation Device 4147
>>> > Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
>>> > Status: Cap+ 66MHz- UDF- FastB2B+ ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
>>> > Latency: 0
>>> > Interrupt: pin A routed to IRQ 11
>>> > Region 0: Memory at ff480000 (32-bit, non-prefetchable) [size=512K]
>>> > Region 1: I/O ports at ec00 [size=8]
>>> > Region 2: Memory at d8000000 (32-bit, prefetchable) [size=128M]
>>> > Region 3: Memory at ff440000 (32-bit, non-prefetchable) [size=256K]
>>
>> This also tells bar 3 for GTT has 256K.
>>
>

> I see. I just guessed by seeing that gtt_total_size is the double than
> older num_entries and that "default" gtt_map_size is 256 instead of
> 128. Then I checked the webpage and lspci, I wrote the fix and it
> worked, so I thought it was 128 MB in fact (so GTT double in size).

>
>>> > Capabilities: <access denied>
>>> >
>>> >
>>> > AFAIK, that implies that its gtt_total_size (in pages) should be 32K (as num_entries showed before the commit) instead of 64K.
>>> >
>>> > Note: The IS_I915 macro includes 945; however, only GMA900 (I915) had 128 MB as the maximum AFAIK. Therefore, I divided the IS_I915 macro. I do not know about the "E7221" (please check).
>>> >
>>> > How to reproduce: Access kernel.org in iceweasel (Debian Lenny) and the X server will crash. Sometimes, the kernel freezes.
>>> >
>>
>> I can't produce this on my 915G board with 128M or 256M memory config.
>

> It also occurs in other applications/ways. Maybe you could try to do
> some scrolling (I recall it also triggered it) or try to open other
> browsers (konqueror is another application that seems to trigger it
> easily in this box).

>
>> Could you
>> paste dmesg in your failure or just hang?
>

> Attached dmesg, lspci -vv, config and xorg.
>

> When the X server crashes, the kernel does not report anything:
>
> Linux agpgart interface v0.103
> agpgart-intel 0000:00:00.0: Intel 915G Chipset
> DEBUG i915 num_entries = 32768
> DEBUG i915 intel_private.gtt_total_size = 65536
> agpgart-intel 0000:00:00.0: detected 8060K stolen memory
> agpgart-intel 0000:00:00.0: AGP aperture is 128M @ 0xd8000000
>
> I added a couple of printk's to see the value of num_entries (older
> loop limit) and gtt_total_size (newer).
>
> However, xorg reports the error:
>
> Error in I830WaitLpRing(), timeout for 2 seconds
> pgetbl_ctl: 0x7ffe0001 getbl_err: 0x00000000
> ipeir: 0x00000000 iphdr: 0x15000000
> LP ring tail: 0x00013aa8 head: 0x00013ae0 len: 0x0001f001 start 0x00000000
> eir: 0x0000 esr: 0x0000 emr: 0xffff
> instdone: 0xffc1 instpm: 0x0000
> memmode: 0x00000306 instps: 0x800f00ca
> hwstam: 0xffff ier: 0x0000 imr: 0xffff iir: 0x0000
> Ring at virtual 0xaf1c6000 head 0x13ae0 tail 0x13aa8 count 32754
> 00013a60: 00004820
> 00013a64: 00000000
> ...
> 00013adc: 00000001
> 00013ae0: 02001910
> Ring end
> space: 48 wanted 56
>
> Fatal server error:
> lockup
>
>
> FatalError re-entered, aborting
> I830Sync: BEGIN_LP_RING called without closing ADVANCE_LP_RING
>
> I do not know about output/reports when the video/kernel freezes
> completely (or seems so).
>

Is there any difference in the X org log files between booting
with/without this patch.

I suspect its one of those crappy cases where an old userspace driver
directly hits the hardware and the kernel AGP driver hits it
different,

I don't think reverting this patch can help anyone, as its clearly not
wrong, we also have no way to know in advance what userspace driver is
going to come along.

The only other way I can think to do this, is when the KMS drm driver
comes up it sets up the extra GTT entries and we leave the AGP driver
only touching what is always has.

In that case then the KMS driver will just work, and anyone running
UMS can stay in the 200xs.

Dave.