We tried to upgrade a couple of our ProLiant servers from 2.6.31.6 to
2.6.32.1.
On two of our DL385g1 servers we had problems booting 2.6.32.1, as they
panicked.
One of them eventually booted correctly once we decided to log its
serial console output; that strategy proved unsuccessful with the second
box.
[ 5.304749] BUG: unable to handle kernel NULL pointer dereference at 000000000000001f
..
[ 5.308739] Call Trace:
[ 5.308739] [<ffffffff810c3840>] kstrdup+0x40/0x70
[ 5.308739] [<ffffffff81150d77>] sysfs_new_dirent+0xf7/0x110
[ 5.308739] [<ffffffff8115121d>] create_dir+0x3d/0xc0
[ 5.308739] [<ffffffff81090af1>] ? autoremove_wake_function+0x11/0x40
[ 5.308739] [<ffffffff811512d4>] sysfs_create_dir+0x34/0x50
[ 5.308739] [<ffffffff8138e7ea>] ? kobject_get+0x1a/0x30
[ 5.308739] [<ffffffff8138e961>] kobject_add_internal+0xe1/0x1e0
[ 5.308739] [<ffffffff8138eb78>] kobject_add_varg+0x38/0x60
[ 5.308739] [<ffffffff8138ec15>] kobject_init_and_add+0x75/0x90
[ 5.308739] [<ffffffff81150560>] ? sysfs_ilookup_test+0x0/0x20
[ 5.308739] [<ffffffff8115082d>] ? sysfs_find_dirent+0x2d/0x40
[ 5.308739] [<ffffffff81150ec1>] ? sysfs_addrm_finish+0x21/0x250
[ 5.308739] [<ffffffff8138e7ea>] ? kobject_get+0x1a/0x30
[ 5.308739] [<ffffffff810e6fe4>] ? kmem_cache_alloc+0x84/0xc0
[ 5.308739] [<ffffffff814238d4>] bus_add_driver+0x94/0x260
[ 5.308739] [<ffffffff81424ed9>] driver_register+0x79/0x160
[ 5.308739] [<ffffffff815a28a3>] __hid_register_driver+0x43/0x80
[ 5.308739] [<ffffffff81a3d7ff>] ? gyration_init+0x0/0x1b
[ 5.308739] [<ffffffff81a3d818>] gyration_init+0x19/0x1b
[ 5.308739] [<ffffffff81009048>] do_one_initcall+0x38/0x1a0
[ 5.308739] [<ffffffff81a0e6b5>] kernel_init+0x172/0x1ca
[ 5.308739] [<ffffffff81036a0a>] child_rip+0xa/0x20
[ 5.308739] [<ffffffff81a0e543>] ? kernel_init+0x0/0x1ca
[ 5.308739] [<ffffffff81036a00>] ? child_rip+0x0/0x20
is from the machine that reliably fails to boot.
http://asteria.noreply.org/~weasel/volatile/2009-12-15-1VAB84BxJzE/ravel
hosts the complete serial console output.
What I caught on the second box, which eventually decided to boot, is
similar, but not identical:
[ 19.028333] Call Trace:
[ 19.028333] [<ffffffff81150560>] ? sysfs_ilookup_test+0x0/0x20
[ 19.028333] [<ffffffff810c3840>] kstrdup+0x40/0x70
[ 19.028333] [<ffffffff81150d77>] sysfs_new_dirent+0xf7/0x110
[ 19.028333] [<ffffffff81150b17>] ? sysfs_add_one+0x27/0xd0
[ 19.028333] [<ffffffff81151bf7>] sysfs_do_create_link+0x87/0x160
[ 19.028333] [<ffffffff81151cee>] sysfs_create_link+0xe/0x10
[ 19.028333] [<ffffffff81422072>] device_add+0x272/0x730
[ 19.028333] [<ffffffff8139779e>] ? kvasprintf+0x6e/0x90
[ 19.028333] [<ffffffff81422549>] device_register+0x19/0x20
[ 19.028333] [<ffffffff8142262c>] device_create_vargs+0xdc/0xf0
[ 19.028333] [<ffffffff8142268b>] device_create+0x4b/0x50
[ 19.028333] [<ffffffff813e9702>] ? extract_entropy+0xe2/0x140
[ 19.028333] [<ffffffff813f573f>] misc_register+0xbf/0x180
[ 19.028333] [<ffffffff8107a4e0>] ? init_oops_id+0x0/0x40
[ 19.028333] [<ffffffff81a2626b>] ? pm_qos_power_init+0x0/0xe1
[ 19.028333] [<ffffffff81a262a3>] pm_qos_power_init+0x38/0xe1
[ 19.028333] [<ffffffff81009048>] do_one_initcall+0x38/0x1a0
[ 19.028333] [<ffffffff81a0e6b5>] kernel_init+0x172/0x1ca
[ 19.028333] [<ffffffff81036a0a>] child_rip+0xa/0x20
[ 19.028333] [<ffffffff81a0e543>] ? kernel_init+0x0/0x1ca
[ 19.028333] [<ffffffff81036a00>] ? child_rip+0x0/0x20
http://asteria.noreply.org/~weasel/volatile/2009-12-15-1VAB84BxJzE/klecker-bad
has the full output of that failed boot, see
http://asteria.noreply.org/~weasel/volatile/2009-12-15-1VAB84BxJzE/klecker-good
for the output during a successful boot.
The config file can be found at
http://asteria.noreply.org/~weasel/volatile/2009-12-15-1VAB84BxJzE/config-2.6.32.1-dsa-amd64
Cheers,
Peter
--
| .''`. ** Debian GNU/Linux **
Peter Palfrader | : :' : The universal
http://www.palfrader.org/ | `. `' Operating System
| `- http://www.debian.org/
> We tried to upgrade a couple of our ProLiant servers from 2.6.31.6 to
> 2.6.32.1.
>
> On two of our DL385g1 servers we had problems booting 2.6.32.1, as they
> panicked.
Several more do not boot .32 reliably. Anything I can try?
> [ 5.304749] BUG: unable to handle kernel NULL pointer dereference at 000000000000001f
> ..
> [ 5.308739] Call Trace:
> [ 5.308739] [<ffffffff810c3840>] kstrdup+0x40/0x70
> [ 5.308739] [<ffffffff81150d77>] sysfs_new_dirent+0xf7/0x110
> [ 5.308739] [<ffffffff8115121d>] create_dir+0x3d/0xc0
> [ 5.308739] [<ffffffff81090af1>] ? autoremove_wake_function+0x11/0x40
> [ 5.308739] [<ffffffff811512d4>] sysfs_create_dir+0x34/0x50
> [ 5.308739] [<ffffffff8138e7ea>] ? kobject_get+0x1a/0x30
> [ 5.308739] [<ffffffff8138e961>] kobject_add_internal+0xe1/0x1e0
> [ 5.308739] [<ffffffff8138eb78>] kobject_add_varg+0x38/0x60
> [ 5.308739] [<ffffffff8138ec15>] kobject_init_and_add+0x75/0x90
> [ 5.308739] [<ffffffff81150560>] ? sysfs_ilookup_test+0x0/0x20
> [ 5.308739] [<ffffffff8115082d>] ? sysfs_find_dirent+0x2d/0x40
> [ 5.308739] [<ffffffff81150ec1>] ? sysfs_addrm_finish+0x21/0x250
> [ 5.308739] [<ffffffff8138e7ea>] ? kobject_get+0x1a/0x30
> [ 5.308739] [<ffffffff810e6fe4>] ? kmem_cache_alloc+0x84/0xc0
> [ 5.308739] [<ffffffff814238d4>] bus_add_driver+0x94/0x260
> [ 5.308739] [<ffffffff81424ed9>] driver_register+0x79/0x160
> [ 5.308739] [<ffffffff815a28a3>] __hid_register_driver+0x43/0x80
> [ 5.308739] [<ffffffff81a3d7ff>] ? gyration_init+0x0/0x1b
> [ 5.308739] [<ffffffff81a3d818>] gyration_init+0x19/0x1b
Seems to be caused by the "gyration driver", whatever that is. Do you
have such a USB device?
It could be some module mismatch. It looks suspicious,
and from a quick look the gyration driver does nothing bad
in that init path. Try a make clean and remove/rebuild/reinstall all the modules
on the target system.
If that doesn't help perhaps disable CONFIG_HID_GYRATION,
but from your other oops something more seems to be broken anyways.
> [ 5.308739] [<ffffffff81009048>] do_one_initcall+0x38/0x1a0
> [ 5.308739] [<ffffffff81a0e6b5>] kernel_init+0x172/0x1ca
> [ 5.308739] [<ffffffff81036a0a>] child_rip+0xa/0x20
> [ 5.308739] [<ffffffff81a0e543>] ? kernel_init+0x0/0x1ca
> [ 5.308739] [<ffffffff81036a00>] ? child_rip+0x0/0x20
-Andi
--
a...@linux.intel.com -- Speaking for myself only.
Doubtful.
> It could be some module mismatch. It looks suspicious,
> and from a quick look the gyration driver does nothing bad
> in that init path. Try a make clean and remove/rebuild/reinstall all the modules
> on the target system.
>
> If that doesn't help perhaps disable CONFIG_HID_GYRATION,
> but from your other oops something more seems to be broken anyways.
This is a static kernel - no module support. Anyway, I also tried
without CONFIG_USB_HID (which pulls in all the other HID_* things) but
no luck.
--
| .''`. ** Debian GNU/Linux **
Peter Palfrader | : :' : The universal
http://www.palfrader.org/ | `. `' Operating System
| `- http://www.debian.org/
Try a make distclean + rebuild anyways.
-Andi
--
a...@linux.intel.com -- Speaking for myself only.
> > If that doesn't help perhaps disable CONFIG_HID_GYRATION,
> > but from your other oops something more seems to be broken anyways.
>
> This is a static kernel - no module support. Anyway, I also tried
> without CONFIG_USB_HID (which pulls in all the other HID_* things) but
> no luck.
However, disabling all of HID (CONFIG_HID_SUPPORT=n) makes the system
boot (Previously HID, HIDRAW and HID_SUPPORT were still enabled).
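
To double-check which HID-related options a given build really has enabled, a small script along these lines could be run against the config file. This is only a sketch: the option list is just the set of names mentioned in this thread, SAMPLE is a made-up fragment rather than the real config, and reporting "absent" for options that do not appear in the file at all is an assumption about how one wants to treat them.

```python
import re

# Options named in this thread; illustrative, not an exhaustive list.
HID_OPTIONS = [
    "CONFIG_HID_SUPPORT",
    "CONFIG_HID",
    "CONFIG_HIDRAW",
    "CONFIG_USB_HID",
    "CONFIG_HID_GYRATION",
]


def parse_kconfig(text):
    """Map option name -> value for the text of a kernel .config file.

    Lines of the form '# CONFIG_FOO is not set' are recorded as 'n'.
    """
    values = {}
    for line in text.splitlines():
        line = line.strip()
        m = re.fullmatch(r"# (CONFIG_\w+) is not set", line)
        if m:
            values[m.group(1)] = "n"
            continue
        m = re.fullmatch(r"(CONFIG_\w+)=(.*)", line)
        if m:
            values[m.group(1)] = m.group(2)
    return values


def hid_report(text, options=HID_OPTIONS):
    """Return the state of each option; 'absent' if the file never mentions it."""
    values = parse_kconfig(text)
    return {opt: values.get(opt, "absent") for opt in options}


# Hypothetical fragment for demonstration only, not the real config.
SAMPLE = """\
# CONFIG_HID_SUPPORT is not set
CONFIG_USB=y
# CONFIG_USB_HID is not set
"""

if __name__ == "__main__":
    for name, state in hid_report(SAMPLE).items():
        print(f"{name}: {state}")
```

For a real check, the text passed to hid_report() would be the contents of the config file used for the build.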
> > This is a static kernel - no module support. Anyway, I also tried
> > without CONFIG_USB_HID (which pulls in all the other HID_* things) but
> > no luck.
>
> Try a make distclean + rebuild anyways.
I usually do. make-kpkg doesn't really like building from dirty
directories all that much.
--
| .''`. ** Debian GNU/Linux **
Peter Palfrader | : :' : The universal
http://www.palfrader.org/ | `. `' Operating System
| `- http://www.debian.org/
> On Tue, 22 Dec 2009, Peter Palfrader wrote:
>
> > > If that doesn't help perhaps disable CONFIG_HID_GYRATION,
> > > but from your other oops something more seems to be broken anyways.
> >
> > This is a static kernel - no module support. Anyway, I also tried
> > without CONFIG_USB_HID (which pulls in all the other HID_* things) but
> > no luck.
>
> However, disabling all of HID (CONFIG_HID_SUPPORT=n) makes the system
> boot (Previously HID, HIDRAW and HID_SUPPORT were still enabled).
However, I still see panics on boot occasionally, though not as often or as
reproducibly. So far only on DL385 (Opteron) systems.
And all of the backtraces go through sysfs_new_dirent() near the top.
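
The observation that the backtraces share frames near the top can be checked mechanically. The following is a minimal sketch, not a full oops parser: it assumes the "[<address>] function+0xoff/0xsize" frame format seen in the traces in this thread, skips frames marked with "?" (which the kernel prints for stack entries it is not sure about), and uses shortened excerpts of the two traces as sample input. The function names are made up for the illustration.

```python
import re

# One call-trace frame, e.g.
#   [ 5.308739] [<ffffffff810c3840>] ? kstrdup+0x40/0x70
FRAME_RE = re.compile(
    r"\[<[0-9a-f]+>\]\s+(\?\s+)?([A-Za-z_][\w.]*)\+0x[0-9a-f]+/0x[0-9a-f]+"
)


def reliable_frames(trace_text):
    """Function names of frames not marked '?', in order of appearance."""
    frames = []
    for line in trace_text.splitlines():
        m = FRAME_RE.search(line)
        if m and m.group(1) is None:
            frames.append(m.group(2))
    return frames


def common_frames(traces):
    """Frames present in every trace, listed in the order of the first trace."""
    lists = [reliable_frames(t) for t in traces]
    if not lists:
        return []
    shared = set(lists[0]).intersection(*lists[1:])
    return [f for f in lists[0] if f in shared]


# Shortened excerpts of the two traces posted earlier in the thread.
TRACE_A = """
[ 5.308739] [<ffffffff810c3840>] kstrdup+0x40/0x70
[ 5.308739] [<ffffffff81150d77>] sysfs_new_dirent+0xf7/0x110
[ 5.308739] [<ffffffff8115121d>] create_dir+0x3d/0xc0
[ 5.308739] [<ffffffff81090af1>] ? autoremove_wake_function+0x11/0x40
[ 5.308739] [<ffffffff81009048>] do_one_initcall+0x38/0x1a0
"""

TRACE_B = """
[ 19.028333] [<ffffffff81150560>] ? sysfs_ilookup_test+0x0/0x20
[ 19.028333] [<ffffffff810c3840>] kstrdup+0x40/0x70
[ 19.028333] [<ffffffff81150d77>] sysfs_new_dirent+0xf7/0x110
[ 19.028333] [<ffffffff81151bf7>] sysfs_do_create_link+0x87/0x160
[ 19.028333] [<ffffffff81009048>] do_one_initcall+0x38/0x1a0
"""

if __name__ == "__main__":
    print(common_frames([TRACE_A, TRACE_B]))
```

Fed with the full serial console logs instead of these excerpts, the same comparison would show whether every oops really passes through kstrdup() and sysfs_new_dirent().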
It's suspicious if you don't have such devices; that would
point to something being confused in the driver probing
layer.
>
> However, I still see panics on boot occasionally, though not as often or as
> reproducibly. So far only on DL385 (Opteron) systems.
Multiple systems and the same oopses?
>
> And all of the backtraces go through sysfs_new_dirent() near the top.
Please post full oopses.
-Andi
--
a...@linux.intel.com -- Speaking for myself only.