[BUG] unable to handle kernel paging request in next-20080516

Sitsofe Wheeler

May 17, 2008, 8:51:17 AM

to linux-...@vger.kernel.org

Sometimes when booting next-20080516 on Ubuntu Gutsy an oops then a panic
will occur. At first I thought it might be provoked by vga=0x164 but this
does not appear to be the case and the issue is seemingly random. I've
hand transcribed the oops so there may be errors in it but hopefully it
will still help:

BUG: unable to handle kernel paging request at e6f17fac
IP: [<c02604d6>] scsi_bus_uevent+0x1/0x17
*pde = 2714b163 *pte = 26f17160
Oops: 0000 [#1] DEBUG_PAGEALLOC
last sysfs file:

Pid: 1, comm: swapper Not tainted (2.6.26-rc2-next-20080516skw #30)
EIP: 0060:[<c02604d6>] EFLAGS: 00010282 CPU: 0
EIP is at scsi_bus_uevent+0x1/0x17
EAX: e6f18014 EBX: e6f18014 ECX: c02604d5 EDX: e7173000
ESI: e7173000 EDI: e7173000 EBP: e7851ca0 ESP: e7851c90
DS: 007b ES: 007b FS: 0000 GS: 0000 SS: 0068
Process swapper (pid: 1, ti=e7850000 task=e7848000 task.ti=e7850000)
Stack: e7851ca0 c0237f3a c0237eac 00000000 e7851ce4 c01da36d 00000000 e6f180fc
e7835000 c03ebf42 e7163240 c03af631 c040b050 c040b598 00000000 e6f18014
00000000 e7851cdc 00000000 e6f18014 00000000 e7851cec c01da52a e7851d2c
Call Trace:
[<c0237f3a>] ? dev_uevent+0x8e/0xca
[<c0237eac>] ? dev_uevent+0x0/0xca
[<c01da36d>] ? kobject_uevent_env+0x14c/0x2ff
[<c01da52a>] ? kobject_uevent_env+0xa/0xc
[<c023884b>] ? device_add+0x2bf/0x3f0
[<c0321905>] ? mutex_unlock+0x8/0xa
[<c02607b4>] ? scsi_sysfs_add_sdev+0x39/0x1d3
[<c025f037>] ? scsi_probe_and_add_lun+0x714/0x08
[<c025f9ef>] ? __scsi_add_device+0x85/0xab
[<c026a70c>] ? ata_scsi_scan_host+0x7f/0x15e
[<c0267ec8>] ? ata_host_register+0x1c8/0x1e5
[<c026ec75>] ? ata_pci_sff_activate_host+0x179/0x19f
[<c0270b61>] ? ata_sff_interrupt+0x0/0x1d7
[<c026f076>] ? ata_pci_sff_init_one+0x97/0xe1
[<c027219c>] ? via_init_one+0x1da/0x1e3
[<c01e5670>] ? pci_device_probe+0x39/0x59
[<c023a0a1>] ? driver_probe_device+0x9f/0x119
[<c023a158>] ? __driver_attach+0x3d/0x5f
[<c023990a>] ? bus_for_each_dev+0x3e/0x60
[<c0239f39>] ? driver_attach+0x14/0x16
[<c023a11b>] ? __driver_attach+0x0/0x5f
[<c0239c9d>] ? bus_add_driver+0x99/0x1a0
[<c023a2d6>] ? driver_register+0x71/0xcd
[<c01e5852>] ? __pci_register_driver+0x53/0x81
[<c04205b1>] ? kernel_init+0x0/0xc4
[<c04378fc>] ? via_init+0x14/0x16
[<c0132800>] ? trace_softirqs_on+0x78/0x7e
[<c01dd90c>] ? trace_hardirqs_on_thunk+0xc/0x10
[<c0102c3a>] ? restore_nocheck_notrace+0x0/0xe
[<c04205b1>] ? kernel_init+0x0/0x1c4
[<c04205b1>] ? kernel_init+0x0/0x1c4
[<c010373f>] ? kernel_thread_helper+0x7/0x10
=======================

--
Sitsofe | http://sucs.org/~sits/


Andrew Morton

May 18, 2008, 5:14:49 AM

to Sitsofe Wheeler, linux-...@vger.kernel.org, linux...@vger.kernel.org, Greg KH

(cc's added)

I thought we'd already fixed this?

Sitsofe Wheeler

May 18, 2008, 7:22:32 AM

to linux-...@vger.kernel.org, linux...@vger.kernel.org

On Sun, 18 May 2008 02:14:23 -0700, Andrew Morton wrote:

> (cc's added)
>
> On Sat, 17 May 2008 12:50:24 +0000 (UTC) Sitsofe Wheeler wrote:
>
>> Sometimes when booting next-20080516 on Ubuntu Gutsy an oops then a
>> panic will occur. At first I thought it might be provoked by vga=0x164
>> but this does not appear to be the case and the issue is seemingly
>> random. I've hand transcribed the oops so there may be errors in it but
>> hopefully it will still help:
>>
>> BUG: unable to handle kernel paging request at e6f17fac
>> IP: [<c02604d6>] scsi_bus_uevent+0x1/0x17
>> *pde = 2714b163 *pte = 26f17160
>> Oops: 0000 [#1] DEBUG_PAGEALLOC
>> last sysfs file:
>>

> I thought we'd already fixed this?

Thanks to your tip-off I've found that this bug is already in bugzilla
(complete with the commit that caused the regression):
http://bugzilla.kernel.org/show_bug.cgi?id=10711 . There's nothing there
that says it has been fixed though. I'll look harder before reporting
problems next time.

--
Sitsofe | http://sucs.org/~sits/

Sitsofe Wheeler

May 18, 2008, 12:08:46 PM

to linux-...@vger.kernel.org, linux...@vger.kernel.org

Andrew Morton wrote:

> (cc's added)
>
> On Sat, 17 May 2008 12:50:24 +0000 (UTC) Sitsofe Wheeler
> <sit...@yahoo.com> wrote:
>
>> Sometimes when booting next-20080516 on Ubuntu Gutsy an oops then a panic
>> will occur. At first I thought it might be provoked by vga=0x164 but this
>> does not appear to be the case and the issue is seemingly random. I've
>> hand transcribed the oops so there may be errors in it but hopefully it
>> will still help:
>>
>> BUG: unable to handle kernel paging request at e6f17fac
>> IP: [<c02604d6>] scsi_bus_uevent+0x1/0x17
>> *pde = 2714b163 *pte = 26f17160
>> Oops: 0000 [#1] DEBUG_PAGEALLOC
>> last sysfs file:
>>
>

> I thought we'd already fixed this?

If it hasn't yet been fixed I think it can be narrowed down to
[dc16f5f2ede8cc2acf8ac22857a7fecf3a4296c2] PNP: make generic pnp_add_dma_resource() .
Be aware that the problem also seems to go away if an initrd file is present. I
struggled to revert this commit against the latest linux-next due to conflicts.

Here's the commit message:

Author: Bjorn Helgaas <bjorn....@hp.com> 2008-04-28 23:34:35
Committer: Len Brown <len....@intel.com> 2008-04-29 08:22:28
Child: cc8c2e308194f0997c718c7c735550ff06754d20 (PNP: make generic pnp_add_io_resource())
Branches: v2.6.26rc1, remotes/origin/master, remotes/linux-next/stable, remotes/linux-next/master, remotes/linux-next/history, master, linux-next, bisect
Follows: v2.6.25
Precedes: v2.6.26-rc1, next-20080502, next-20080501, next-20080430

PNP: make generic pnp_add_dma_resource()

Add a pnp_add_dma_resource() that can be used by all the PNP
backends. This consolidates a little more pnp_resource_table
knowledge into one place.

Signed-off-by: Bjorn Helgaas <bjorn....@hp.com>
Signed-off-by: Len Brown <len....@intel.com>

Here's the git-bisect log:

# bad: [2ddcca36c8bcfa251724fe342c8327451988be0d] Linux 2.6.26-rc1
# good: [4b119e21d0c66c22e8ca03df05d9de623d0eb50f] Linux 2.6.25
git-bisect start 'v2.6.26-rc1' 'v2.6.25'
# good: [7ae44cfa7ab29b277691327e8de790d7b880722f] [ALSA] snd-powermac: style awacs.s and awacs.h
git-bisect good 7ae44cfa7ab29b277691327e8de790d7b880722f
# good: [c60264c494a119cd3a716a22edc0137b11de6d1e] smack: fix integer as NULL pointer warning in smack_lsm.c
git-bisect good c60264c494a119cd3a716a22edc0137b11de6d1e
# good: [3977c965ec35ce1a7eac988ad313f0fc9aee9660] ext4: zero out small extents when writing to prealloc area.
git-bisect good 3977c965ec35ce1a7eac988ad313f0fc9aee9660
# good: [ccf2779544eecfcc5447e2028d1029b6d4ff7bb6] Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound-2.6
git-bisect good ccf2779544eecfcc5447e2028d1029b6d4ff7bb6
# bad: [55e462b05b5df4fd113c4a304c4f487d44b0898e] memcg: simple stats for memory resource controller
git-bisect bad 55e462b05b5df4fd113c4a304c4f487d44b0898e
# good: [96916090f488986a4ebb8e9ffa6a3b50881d5ccd] Merge branches 'release', 'acpica', 'bugzilla-10224', 'bugzilla-9772', 'bugzilla-9916', 'ec', 'eeepc', 'idle', 'misc', 'pm-legacy', 'sysfs-links-2.6.26', 'thermal', 'thinkpad' and 'video' into release
git-bisect good 96916090f488986a4ebb8e9ffa6a3b50881d5ccd
# bad: [6de3d58dcfbab516dbe9aff36ea9542f40cd1bf2] Merge git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-2.6
git-bisect bad 6de3d58dcfbab516dbe9aff36ea9542f40cd1bf2
# good: [f5d94ff014cb7e6212f40fc6644f3fd68507df33] PNP: pass resources, not indexes, to pnp_check_port(), et al
git-bisect good f5d94ff014cb7e6212f40fc6644f3fd68507df33
# bad: [d152cf5d0c3325979e71ee53b425fdd51a1a285a] PNPACPI: move _CRS/_PRS warnings closer to the action
git-bisect bad d152cf5d0c3325979e71ee53b425fdd51a1a285a
# good: [784f01d5bdeae7d7005ede17305306b042ba2617] PNP: add struct pnp_resource
git-bisect good 784f01d5bdeae7d7005ede17305306b042ba2617
# good: [dbddd0383c59d588f8db5e773b062756e39117ec] PNP: make generic pnp_add_irq_resource()
git-bisect good dbddd0383c59d588f8db5e773b062756e39117ec
# bad: [cc8c2e308194f0997c718c7c735550ff06754d20] PNP: make generic pnp_add_io_resource()
git-bisect bad cc8c2e308194f0997c718c7c735550ff06754d20
# bad: [dc16f5f2ede8cc2acf8ac22857a7fecf3a4296c2] PNP: make generic pnp_add_dma_resource()
git-bisect bad dc16f5f2ede8cc2acf8ac22857a7fecf3a4296c2

--
Sitsofe | http://sucs.org/~sits/

Greg KH

May 18, 2008, 1:50:13 PM

to Andrew Morton, Sitsofe Wheeler, linux-...@vger.kernel.org, linux...@vger.kernel.org

I have a patch for it, posted to lkml on Friday (or was it Thursday...).
Then on Friday I went and audited all users of device_create and found 5
other places where this same problem will occur (or something almost
like it) and fixed them up and Cc:ed the subsystem maintainers that were
affected.

I wanted a round of tests in linux-next to happen before sending them
all to Linus. I'll do that on Monday as they missed the last linux-next
release.

If you want to test them out yourself, the patches are this one first:
http://www.kernel.org/pub/linux/kernel/people/gregkh/gregkh-2.6/gregkh-01-driver-core.current/driver-core-add-device_create_vargs-and-device_create_drvdata.patch
and then add any one of the rest of the patches in the directory at:
http://www.kernel.org/pub/linux/kernel/people/gregkh/gregkh-2.6/gregkh-01-driver-core.current/
depending on the subsystem you are having problems with. There are 12
different ones in there.

hope this helps,

greg k-h

Sitsofe Wheeler

May 18, 2008, 4:31:21 PM

to linux-...@vger.kernel.org, linux...@vger.kernel.org

(I've dropped akpm because the mail server doesn't like where I'm sending
from with this address)

Greg KH wrote:

> On Sun, May 18, 2008 at 02:14:23AM -0700, Andrew Morton wrote:
>>
>> (cc's added)
>>
>> On Sat, 17 May 2008 12:50:24 +0000 (UTC) Sitsofe Wheeler
>> <sit...@yahoo.com> wrote:
>>
>> > BUG: unable to handle kernel paging request at e6f17fac
>> > IP: [<c02604d6>] scsi_bus_uevent+0x1/0x17
>> > *pde = 2714b163 *pte = 26f17160
>> > Oops: 0000 [#1] DEBUG_PAGEALLOC
>> > last sysfs file:
>>

>> I thought we'd already fixed this?
>

> If you want to test them out yourself, the patches are this one first:
> http://www.kernel.org/pub/linux/kernel/people/gregkh/gregkh-2.6/gregkh-01-driver-core.current/driver-core-add-device_create_vargs-and-device_create_drvdata.patch
> and then add any one of the rest of the patches in the directory at:
> http://www.kernel.org/pub/linux/kernel/people/gregkh/gregkh-2.6/gregkh-01-driver-core.current/
> depending on the subsystem you are having problems with. There are 12
> different ones in there.
>
> hope this helps,

Bad news - the patches all applied to 2.6.26-rc2 / current HEAD but the
problem remained.

The trace at the end seems slightly different though (alas I have to
transcribe):

BUG: unable to handle kernel paging request at e725ffac
IP: [<c025fdb6>] scsi_bus_uevent+0x1/0x17
*pde = 27845163 *pte = 2725f160
Oops: 0000 [#1] DEBUG_PAGEALLOC

[...]

dev_uevent
dev_uevent
kobject_uevent_env
mutex_unlock
kobject_uevent
device_add
mutex_unlock
scsi_sysfs_add_sdev
scsi_probe_and_add_lun
mark_held_locks
__scsi_add_device
ata_scsi_scan_host
ata_host_register
ata_pci_sff_activate_host
ata_sff_interrupt
ata_pci_sff_init_one
pci_device_probe
driver_probe_device
__driver_attach
bus_for_each_dev
driver_attach
__driver_attach
bus_add_driver
driver_register
__pci_register_driver
kernel_init
via_init
kernel_init
kernel_init
kernel_init
kernel_thread_helper

--
Sitsofe | http://sucs.org/~sits/

James Bottomley

May 22, 2008, 7:46:44 PM

to Andrew Morton, Sitsofe Wheeler, linux-...@vger.kernel.org, linux...@vger.kernel.org, Greg KH

Actually, I think this is a very subtle bug; what I think is happening
is that after Hannes' sysfs changes, we now add scsi_bus_type to the
target device. However, scsi_bus_uevent() unconditionally casts from
dev to a struct scsi_device and then looks at the type entry. My theory
is that in this particular config going from struct scsi_target to
struct device and back to struct scsi_device actually tips us over into
unmapped space for the ->type deref.

Hopefully this should fix it by checking the device type before doing
the deref.

James

---

diff --git a/drivers/scsi/scsi_sysfs.c b/drivers/scsi/scsi_sysfs.c
index 049103f..93d2b67 100644
--- a/drivers/scsi/scsi_sysfs.c
+++ b/drivers/scsi/scsi_sysfs.c
@@ -359,7 +359,12 @@ static int scsi_bus_match(struct device *dev, struct device_driver *gendrv)
 
 static int scsi_bus_uevent(struct device *dev, struct kobj_uevent_env *env)
 {
-	struct scsi_device *sdev = to_scsi_device(dev);
+	struct scsi_device *sdev;
+
+	if (dev->type != &scsi_dev_type)
+		return 0;
+
+	sdev = to_scsi_device(dev);
 
 	add_uevent_var(env, "MODALIAS=" SCSI_DEVICE_MODALIAS_FMT, sdev->type);
 	return 0;

Sitsofe Wheeler

May 23, 2008, 3:40:27 PM

to linux-...@vger.kernel.org, linux...@vger.kernel.org

James Bottomley wrote:

> Actually, I think this is a very subtle bug; what I think is happening
> is that after Hannes sysfs changes, we now add scsi_bus_type to the
> target device. However, scsi_bus_uevent() unconditionally casts from
> dev to a struct scsi_device and then looks at the type entry. My theory
> is that in this particular config going from struct scsi_target to
> struct device and back to struct scsi_device actually tips us over into
> unmapped space for the -> type deref.
>
> Hopefully this should fix it by checking the device type before doing
> the deref.

This fixed the problem for me (it was horribly intermittent but I've done
10+ consecutive reboots without seeing an oops). I changed the patch to
printk every time the condition was hit and it seems to happen twice per
PATA device - once after each scsi?: pata_via message and then again after
each scsi 0:0:0:0: Direct-Access ATA DISKID etc : 0 ANSI: 5 .

The thing I don't understand about your explanation is that it sounds like
the device struct is being round-tripped (but is just being cast to
different things along the way). If this is the case why would this problem
ever arise? Surely if it is really a struct scsi_device underneath there
should be no problem?

--
Sitsofe | http://sucs.org/~sits/

James Bottomley

May 23, 2008, 4:27:05 PM

to Sitsofe Wheeler, linux...@vger.kernel.org, linux-...@vger.kernel.org

On Fri, 2008-05-23 at 20:34 +0100, Sitsofe Wheeler wrote:
> <posted & mailed>
>
> James Bottomley wrote:
>
> > Actually, I think this is a very subtle bug; what I think is happening
> > is that after Hannes sysfs changes, we now add scsi_bus_type to the
> > target device. However, scsi_bus_uevent() unconditionally casts from
> > dev to a struct scsi_device and then looks at the type entry. My theory
> > is that in this particular config going from struct scsi_target to
> > struct device and back to struct scsi_device actually tips us over into
> > unmapped space for the -> type deref.
> >
> > Hopefully this should fix it by checking the device type before doing
> > the deref.
>
> This fixed the problem for me (it was horribly intermittent but I've done
> 10+ consecutive reboots without seeing an oops). I changed the patch to
> printk every time the condition was hit and it seems to happen twice per
> PATA device - once after each scsi?: pata_via message and then again after
> each scsi 0:0:0:0: Direct-Access ATA DISKID etc : 0 ANSI: 5 .
>
> The thing I don't understand about your explanation is that it sounds like
> the device struct is being round-tripped (but is just being cast to
> different things along the way). If this is the case why would this problem
> ever arise? Surely if it is really a struct scsi_device underneath there
> should be no problem?

The event is called for all generic device objects belonging to the
scsi_bus_type. That means both struct scsi_device and struct
scsi_target objects. When it's called for struct scsi_target objects,
casting out to struct scsi_device does the wrong thing.

James