BUG: unable to handle kernel paging request at e6f17fac
IP: [<c02604d6>] scsi_bus_uevent+0x1/0x17
*pde = 2714b163 *pte = 26f17160
Oops: 0000 [#1] DEBUG_PAGEALLOC
last sysfs file:
Pid: 1, comm: swapper Not tainted (2.6.26-rc2-next-20080516skw #30)
EIP: 0060:[<c02604d6>] EFLAGS: 00010282 CPU: 0
EIP is at scsi_bus_uevent+0x1/0x17
EAX: e6f18014 EBX: e6f18014 ECX: c02604d5 EDX: e7173000
ESI: e7173000 EDI: e7173000 EBP: e7851ca0 ESP: e7851c90
DS: 007b ES: 007b FS: 0000 GS: 0000 SS: 0068
Process swapper (pid: 1, ti=e7850000 task=e7848000 task.ti=e7850000)
Stack: e7851ca0 c0237f3a c0237eac 00000000 e7851ce4 c01da36d 00000000 e6f180fc
e7835000 c03ebf42 e7163240 c03af631 c040b050 c040b598 00000000 e6f18014
00000000 e7851cdc 00000000 e6f18014 00000000 e7851cec c01da52a e7851d2c
Call Trace:
[<c0237f3a>] ? dev_uevent+0x8e/0xca
[<c0237eac>] ? dev_uevent+0x0/0xca
[<c01da36d>] ? kobject_uevent_env+0x14c/0x2ff
[<c01da52a>] ? kobject_uevent_env+0xa/0xc
[<c023884b>] ? device_add+0x2bf/0x3f0
[<c0321905>] ? mutex_unlock+0x8/0xa
[<c02607b4>] ? scsi_sysfs_add_sdev+0x39/0x1d3
[<c025f037>] ? scsi_probe_and_add_lun+0x714/0x08
[<c025f9ef>] ? __scsi_add_device+0x85/0xab
[<c026a70c>] ? ata_scsi_scan_host+0x7f/0x15e
[<c0267ec8>] ? ata_host_register+0x1c8/0x1e5
[<c026ec75>] ? ata_pci_sff_activate_host+0x179/0x19f
[<c0270b61>] ? ata_sff_interrupt+0x0/0x1d7
[<c026f076>] ? ata_pci_sff_init_one+0x97/0xe1
[<c027219c>] ? via_init_one+0x1da/0x1e3
[<c01e5670>] ? pci_device_probe+0x39/0x59
[<c023a0a1>] ? driver_probe_device+0x9f/0x119
[<c023a158>] ? __driver_attach+0x3d/0x5f
[<c023990a>] ? bus_for_each_dev+0x3e/0x60
[<c0239f39>] ? driver_attach+0x14/0x16
[<c023a11b>] ? __driver_attach+0x0/0x5f
[<c0239c9d>] ? bus_add_driver+0x99/0x1a0
[<c023a2d6>] ? driver_register+0x71/0xcd
[<c01e5852>] ? __pci_register_driver+0x53/0x81
[<c04205b1>] ? kernel_init+0x0/0xc4
[<c04378fc>] ? via_init+0x14/0x16
[<c0132800>] ? trace_softirqs_on+0x78/0x7e
[<c01dd90c>] ? trace_hardirqs_on_thunk+0xc/0x10
[<c0102c3a>] ? restore_nocheck_notrace+0x0/0xe
[<c04205b1>] ? kernel_init+0x0/0x1c4
[<c04205b1>] ? kernel_init+0x0/0x1c4
[<c010373f>] ? kernel_thread_helper+0x7/0x10
=======================
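
As a cross-check on the numbers in the oops, the short C sketch below redoes the arithmetic on three values copied from the dump: the faulting address, EAX, and the PTE. The helper names are made up for illustration, and it assumes 4 KiB pages with the present flag in bit 0 of the PTE (the usual layout on 32-bit x86). That EAX holds the struct device pointer handed to scsi_bus_uevent() is only a guess from the register dump, not something the trace states.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT  12u   /* assumption: 4 KiB pages on 32-bit x86 */
#define PTE_PRESENT 0x1u  /* assumption: bit 0 of an x86 PTE is "present" */

/* Values copied from the oops above. */
static const uint32_t fault_addr = 0xe6f17facu; /* "paging request at" */
static const uint32_t eax_value  = 0xe6f18014u; /* EAX (EBX is the same) */
static const uint32_t pte_value  = 0x26f17160u; /* "*pte =" */

/* Virtual page number of an address. */
static uint32_t page_number(uint32_t addr)
{
	return addr >> PAGE_SHIFT;
}

/* How many bytes the faulting address lies below the value in EAX. */
static uint32_t bytes_below_eax(void)
{
	return eax_value - fault_addr;
}

/* Non-zero if the PTE marks its page as present. */
static int pte_is_present(uint32_t pte)
{
	return (pte & PTE_PRESENT) != 0;
}
```

With these values the faulting address works out to 0x68 bytes below EAX, in the page immediately before the one EAX points into, and the PTE for that page has its present bit clear. That is at least consistent with a read that stepped backwards off the start of an object into a page DEBUG_PAGEALLOC had unmapped.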
--
Sitsofe | http://sucs.org/~sits/
I thought we'd already fixed this?
> (cc's added)
>
> On Sat, 17 May 2008 12:50:24 +0000 (UTC) Sitsofe Wheeler wrote:
>
>> Sometimes when booting next-20080516 on Ubuntu Gutsy an oops then a
>> panic will occur. At first I thought it might be provoked by vga=0x164
>> but this does not appear to be the case and the issue is seemingly
>> random. I've hand transcribed the oops so there may be errors in it but
>> hopefully it will still help:
>>
>> BUG: unable to handle kernel paging request at e6f17fac IP:
>> [<c02604d6>] scsi_bus_uevent+0x1/0x17 *pde = 2714b163 *pte = 26f17160
>> Oops: 0000 [#1] DEBUG_PAGEALLOC
>> last sysfs file:
>>
> I thought we'd already fixed this?
Thanks to your tip off I've found that this bug is already in bugzilla
(complete with the commit that caused the regression) -
http://bugzilla.kernel.org/show_bug.cgi?id=10711 . There's nothing there that
says it has been fixed though. I'll look harder before reporting problems
next time.
--
Sitsofe | http://sucs.org/~sits/
> (cc's added)
>
> On Sat, 17 May 2008 12:50:24 +0000 (UTC) Sitsofe Wheeler
> <sit...@yahoo.com> wrote:
>
>> Sometimes when booting next-20080516 on Ubuntu Gutsy an oops then a panic
>> will occur. At first I thought it might be provoked by vga=0x164 but this
>> does not appear to be the case and the issue is seemingly random. I've
>> hand transcribed the oops so there may be errors in it but hopefully it
>> will still help:
>>
>> BUG: unable to handle kernel paging request at e6f17fac
>> IP: [<c02604d6>] scsi_bus_uevent+0x1/0x17
>> *pde = 2714b163 *pte = 26f17160
>> Oops: 0000 [#1] DEBUG_PAGEALLOC
>> last sysfs file:
>>
>
> I thought we'd already fixed this?
If it hasn't yet been fixed I think it can be narrowed down to
[dc16f5f2ede8cc2acf8ac22857a7fecf3a4296c2] PNP: make generic pnp_add_dma_resource() .
Be aware that the problem also seems to go away if an initrd file is present. I
struggled to revert this commit against the latest linux-next due to conflicts.
Here's the commit message:
Author: Bjorn Helgaas <bjorn....@hp.com> 2008-04-28 23:34:35
Committer: Len Brown <len....@intel.com> 2008-04-29 08:22:28
Child: cc8c2e308194f0997c718c7c735550ff06754d20 (PNP: make generic pnp_add_io_resource())
Branches: v2.6.26rc1, remotes/origin/master, remotes/linux-next/stable, remotes/linux-next/master, remotes/linux-next/history, master, linux-next, bisect
Follows: v2.6.25
Precedes: v2.6.26-rc1, next-20080502, next-20080501, next-20080430
PNP: make generic pnp_add_dma_resource()
Add a pnp_add_dma_resource() that can be used by all the PNP
backends. This consolidates a little more pnp_resource_table
knowledge into one place.
Signed-off-by: Bjorn Helgaas <bjorn....@hp.com>
Signed-off-by: Len Brown <len....@intel.com>
Here's the git-bisect log:
# bad: [2ddcca36c8bcfa251724fe342c8327451988be0d] Linux 2.6.26-rc1
# good: [4b119e21d0c66c22e8ca03df05d9de623d0eb50f] Linux 2.6.25
git-bisect start 'v2.6.26-rc1' 'v2.6.25'
# good: [7ae44cfa7ab29b277691327e8de790d7b880722f] [ALSA] snd-powermac: style awacs.s and awacs.h
git-bisect good 7ae44cfa7ab29b277691327e8de790d7b880722f
# good: [c60264c494a119cd3a716a22edc0137b11de6d1e] smack: fix integer as NULL pointer warning in smack_lsm.c
git-bisect good c60264c494a119cd3a716a22edc0137b11de6d1e
# good: [3977c965ec35ce1a7eac988ad313f0fc9aee9660] ext4: zero out small extents when writing to prealloc area.
git-bisect good 3977c965ec35ce1a7eac988ad313f0fc9aee9660
# good: [ccf2779544eecfcc5447e2028d1029b6d4ff7bb6] Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound-2.6
git-bisect good ccf2779544eecfcc5447e2028d1029b6d4ff7bb6
# bad: [55e462b05b5df4fd113c4a304c4f487d44b0898e] memcg: simple stats for memory resource controller
git-bisect bad 55e462b05b5df4fd113c4a304c4f487d44b0898e
# good: [96916090f488986a4ebb8e9ffa6a3b50881d5ccd] Merge branches 'release', 'acpica', 'bugzilla-10224', 'bugzilla-9772', 'bugzilla-9916', 'ec', 'eeepc', 'idle', 'misc', 'pm-legacy', 'sysfs-links-2.6.26', 'thermal', 'thinkpad' and 'video' into release
git-bisect good 96916090f488986a4ebb8e9ffa6a3b50881d5ccd
# bad: [6de3d58dcfbab516dbe9aff36ea9542f40cd1bf2] Merge git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-2.6
git-bisect bad 6de3d58dcfbab516dbe9aff36ea9542f40cd1bf2
# good: [f5d94ff014cb7e6212f40fc6644f3fd68507df33] PNP: pass resources, not indexes, to pnp_check_port(), et al
git-bisect good f5d94ff014cb7e6212f40fc6644f3fd68507df33
# bad: [d152cf5d0c3325979e71ee53b425fdd51a1a285a] PNPACPI: move _CRS/_PRS warnings closer to the action
git-bisect bad d152cf5d0c3325979e71ee53b425fdd51a1a285a
# good: [784f01d5bdeae7d7005ede17305306b042ba2617] PNP: add struct pnp_resource
git-bisect good 784f01d5bdeae7d7005ede17305306b042ba2617
# good: [dbddd0383c59d588f8db5e773b062756e39117ec] PNP: make generic pnp_add_irq_resource()
git-bisect good dbddd0383c59d588f8db5e773b062756e39117ec
# bad: [cc8c2e308194f0997c718c7c735550ff06754d20] PNP: make generic pnp_add_io_resource()
git-bisect bad cc8c2e308194f0997c718c7c735550ff06754d20
# bad: [dc16f5f2ede8cc2acf8ac22857a7fecf3a4296c2] PNP: make generic pnp_add_dma_resource()
git-bisect bad dc16f5f2ede8cc2acf8ac22857a7fecf3a4296c2
--
Sitsofe | http://sucs.org/~sits/
I have a patch for it, posted to lkml on Friday (or was it Thursday...)
Then on Friday I went and audited all users of device_create and found 5
other places where this same problem will occur (or something almost
like it) and fixed them up and Cc:ed the subsystem maintainers that were
affected.
I wanted a round of tests in linux-next to happen before sending them
all to Linus. I'll do that on Monday as they missed the last linux-next
release.
If you want to test them out yourself, the patches are this one first:
http://www.kernel.org/pub/linux/kernel/people/gregkh/gregkh-2.6/gregkh-01-driver-core.current/driver-core-add-device_create_vargs-and-device_create_drvdata.patch
and then add any one of the rest of the patches in the directory at:
http://www.kernel.org/pub/linux/kernel/people/gregkh/gregkh-2.6/gregkh-01-driver-core.current/
depending on the subsystem you are having problems with. There are 12
different ones in there.
hope this helps,
greg k-h
(I've dropped akpm because the mail server doesn't like where I'm sending
from with this address)
Greg KH wrote:
> On Sun, May 18, 2008 at 02:14:23AM -0700, Andrew Morton wrote:
>>
>> (cc's added)
>>
>> On Sat, 17 May 2008 12:50:24 +0000 (UTC) Sitsofe Wheeler
>> <sit...@yahoo.com> wrote:
>>
>> > BUG: unable to handle kernel paging request at e6f17fac
>> > IP: [<c02604d6>] scsi_bus_uevent+0x1/0x17
>> > *pde = 2714b163 *pte = 26f17160
>> > Oops: 0000 [#1] DEBUG_PAGEALLOC
>> > last sysfs file:
>>
>> I thought we'd already fixed this?
>
> If you want to test them out yourself, the patches are this one first:
>
> http://www.kernel.org/pub/linux/kernel/people/gregkh/gregkh-2.6/gregkh-01-driver-core.current/driver-core-add-device_create_vargs-and-device_create_drvdata.patch
> and then add any one of the rest of the patches in the directory at:
>
> http://www.kernel.org/pub/linux/kernel/people/gregkh/gregkh-2.6/gregkh-01-driver-core.current/
> depending on the subsystem you are having problems with. There are 12
> different ones in there.
>
> hope this helps,
Bad news - the patches all applied to 2.6.26-rc2 / current HEAD but the
problem remained.
The trace at the end seems slightly different though (alas I have to
transcribe):
BUG: unable to handle kernel paging request at e725ffac
IP: [<c025fdb6>] scsi_bus_uevent+0x1/0x17
*pde = 27845163 *pte = 2725f160
Oops: 0000 [#1] DEBUG_PAGEALLOC
[...]
dev_uevent
dev_uevent
kobject_uevent_env
mutex_unlock
kobject_uevent
device_add
mutex_unlock
scsi_sysfs_add_sdev
scsi_probe_and_add_lun
mark_held_locks
__scsi_add_device
ata_scsi_scan_host
ata_host_register
ata_pci_sff_activate_host
ata_sff_interrupt
ata_pci_sff_init_one
pci_device_probe
driver_probe_device
__driver_attach
bus_for_each_dev
driver_attach
__driver_attach
bus_add_driver
driver_register
__pci_register_driver
kernel_init
via_init
kernel_init
kernel_init
kernel_init
kernel_thread_helper
--
Sitsofe | http://sucs.org/~sits/
Actually, I think this is a very subtle bug; what I think is happening
is that after Hannes' sysfs changes, we now add scsi_bus_type to the
target device. However, scsi_bus_uevent() unconditionally casts from
dev to a struct scsi_device and then looks at the type entry. My theory
is that in this particular config going from struct scsi_target to
struct device and back to struct scsi_device actually tips us over into
unmapped space for the ->type deref.
Hopefully this should fix it by checking the device type before doing
the deref.
James
---
diff --git a/drivers/scsi/scsi_sysfs.c b/drivers/scsi/scsi_sysfs.c
index 049103f..93d2b67 100644
--- a/drivers/scsi/scsi_sysfs.c
+++ b/drivers/scsi/scsi_sysfs.c
@@ -359,7 +359,12 @@ static int scsi_bus_match(struct device *dev, struct device_driver *gendrv)
 
 static int scsi_bus_uevent(struct device *dev, struct kobj_uevent_env *env)
 {
-	struct scsi_device *sdev = to_scsi_device(dev);
+	struct scsi_device *sdev;
+
+	if (dev->type != &scsi_dev_type)
+		return 0;
+
+	sdev = to_scsi_device(dev);
 
 	add_uevent_var(env, "MODALIAS=" SCSI_DEVICE_MODALIAS_FMT, sdev->type);
 	return 0;
James Bottomley wrote:
> Actually, I think this is a very subtle bug; what I think is happening
> is that after Hannes sysfs changes, we now add scsi_bus_type to the
> target device. However, scsi_bus_uevent() unconditionally casts from
> dev to a struct scsi_device and then looks at the type entry. My theory
> is that in this particular config going from struct scsi_target to
> struct device and back to struct scsi_device actually tips us over into
> unmapped space for the -> type deref.
>
> Hopefully this should fix it by checking the device type before doing
> the deref.
This fixed the problem for me (it was horribly intermittent but I've done
10+ consecutive reboots without seeing an oops). I changed the patch to
printk every time the condition was hit and it seems to happen twice per
PATA device - once after each scsi?: pata_via message and then again after
each scsi 0:0:0:0: Direct-Access ATA DISKID etc : 0 ANSI: 5 .
The thing I don't understand about your explanation is that it sounds like
the device struct is being round-tripped (but is just being cast to
different things along the way). If this is the case why would this problem
ever arise? Surely if it is really a struct scsi_device underneath there
should be no problem?
--
Sitsofe | http://sucs.org/~sits/
The event is called for all generic device objects belonging to the
scsi_bus_type. That means both struct scsi_device and struct
scsi_target objects. When it's called for struct scsi_target objects,
casting out to struct scsi_device does the wrong thing.
James
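
To make the explanation above concrete, here is a minimal userspace sketch of the mis-cast. It is not the kernel code: the structures are tiny hypothetical stand-ins whose only purpose is to embed a struct device at different offsets, and the helper names miscast_type_addr() and modalias_type() are invented for illustration. The first helper only computes, in integer arithmetic, the address an unconditional cast would read from, so nothing out of bounds is ever dereferenced. The second applies the same type check as the patch before converting back to the containing structure.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins; the real kernel structures are far larger. */
struct device_type { const char *name; };
struct device { const struct device_type *type; };

struct scsi_target {             /* embeds its struct device near the start */
	int id;
	struct device dev;
};

struct scsi_device {             /* embeds its struct device much further in */
	char type;                   /* SCSI peripheral type */
	char other_fields[96];       /* padding standing in for other members */
	struct device sdev_gendev;
};

static const struct device_type scsi_dev_type = { "scsi_device" };
static const struct device_type scsi_target_type = { "scsi_target" };

/*
 * Address that "to_scsi_device(dev)->type" would read from if the cast
 * were applied unconditionally. Integer arithmetic only: no dereference.
 */
static uintptr_t miscast_type_addr(const struct device *dev)
{
	uintptr_t sdev = (uintptr_t)dev - offsetof(struct scsi_device, sdev_gendev);

	return sdev + offsetof(struct scsi_device, type);
}

/*
 * Guarded lookup in the spirit of the patch: returns the peripheral type
 * for a real scsi_device, or -1 when the device is something else.
 */
static int modalias_type(const struct device *dev)
{
	const struct scsi_device *sdev;

	if (dev->type != &scsi_dev_type)
		return -1;               /* e.g. a scsi_target: do not cast */

	sdev = (const struct scsi_device *)
		((const char *)dev - offsetof(struct scsi_device, sdev_gendev));
	return sdev->type;
}
```

In this model, calling miscast_type_addr() on the struct device embedded in a scsi_target yields an address below the start of the target itself, because the subtraction uses the larger scsi_device offset. If the target happens to sit at the start of a page and the page before it is unmapped, the read faults, which would also explain why the failure was intermittent. modalias_type() returns -1 for the target and the stored type for a genuine scsi_device.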