This is with freshly compiled 1.8.2 patchless clients, vanilla 2.6.27.39
kernel on OpenSuse 10.3. The servers are currently running 1.6.6 on
RHEL5.3. However, they are seen as soon as the module is loaded and
before any file system is mounted.
It doesn't seem to affect the clients ability to mount and write to the
file system. Should we be worried?
GREG
BUG: using smp_processor_id() in preemptible [00000000] code: modprobe/6024
caller is set_ptldebug_header+0x41/0xa0 [libcfs]
Pid: 6024, comm: modprobe Not tainted 2.6.27.39-default #2
Call Trace:
[<ffffffff80313c2b>] debug_smp_processor_id+0xd3/0xe8
[<ffffffffa0591171>] set_ptldebug_header+0x41/0xa0 [libcfs]
[<ffffffffa0599bb0>] libcfs_debug_vmsg2+0x70/0x990 [libcfs]
[<ffffffff802514c2>] smp_call_function+0x3f/0x5e
[<ffffffff80254e9a>] load_module+0x166f/0x176e
[<ffffffffa014c000>] init_obdclass+0x0/0x3e4 [obdclass]
[<ffffffffa014c04b>] init_obdclass+0x4b/0x3e4 [obdclass]
[<ffffffff80209041>] _stext+0x41/0x110
[<ffffffff80255037>] sys_init_module+0x9e/0x1ab
[<ffffffff8020bf8b>] system_call_fastpath+0x16/0x1b
Lustre: OBD class driver, http://www.lustre.org/
BUG: using smp_processor_id() in preemptible [00000000] code: modprobe/6024
caller is set_ptldebug_header+0x41/0xa0 [libcfs]
Pid: 6024, comm: modprobe Not tainted 2.6.27.39-default #2
Call Trace:
[<ffffffff80313c2b>] debug_smp_processor_id+0xd3/0xe8
[<ffffffffa0591171>] set_ptldebug_header+0x41/0xa0 [libcfs]
[<ffffffffa0599bb0>] libcfs_debug_vmsg2+0x70/0x990 [libcfs]
[<ffffffff80254e9a>] load_module+0x166f/0x176e
[<ffffffffa014c000>] init_obdclass+0x0/0x3e4 [obdclass]
[<ffffffffa014c091>] init_obdclass+0x91/0x3e4 [obdclass]
[<ffffffff80209041>] _stext+0x41/0x110
[<ffffffff80255037>] sys_init_module+0x9e/0x1ab
[<ffffffff8020bf8b>] system_call_fastpath+0x16/0x1b
Lustre: Lustre Version: 1.8.2
BUG: using smp_processor_id() in preemptible [00000000] code: modprobe/6024
caller is set_ptldebug_header+0x41/0xa0 [libcfs]
Pid: 6024, comm: modprobe Not tainted 2.6.27.39-default #2
Call Trace:
[<ffffffff80313c2b>] debug_smp_processor_id+0xd3/0xe8
[<ffffffffa0591171>] set_ptldebug_header+0x41/0xa0 [libcfs]
[<ffffffffa0599bb0>] libcfs_debug_vmsg2+0x70/0x990 [libcfs]
[<ffffffff80254e9a>] load_module+0x166f/0x176e
[<ffffffffa014c000>] init_obdclass+0x0/0x3e4 [obdclass]
[<ffffffffa014c0d7>] init_obdclass+0xd7/0x3e4 [obdclass]
[<ffffffff80209041>] _stext+0x41/0x110
[<ffffffff80255037>] sys_init_module+0x9e/0x1ab
[<ffffffff8020bf8b>] system_call_fastpath+0x16/0x1b
Lustre: Build Version: 1.8.2-20100123020848-PRISTINE-2.6.27.39-default
BUG: using smp_processor_id() in preemptible [00000000] code: modprobe/6024
caller is set_ptldebug_header+0x41/0xa0 [libcfs]
Pid: 6024, comm: modprobe Not tainted 2.6.27.39-default #2
Call Trace:
[<ffffffff80313c2b>] debug_smp_processor_id+0xd3/0xe8
[<ffffffffa0591171>] set_ptldebug_header+0x41/0xa0 [libcfs]
[<ffffffffa0599bb0>] libcfs_debug_vmsg2+0x70/0x990 [libcfs]
[<ffffffffa059331e>] libcfs_ipif_query+0x12e/0x2e0 [libcfs]
[<ffffffffa0592e9d>] libcfs_ipif_free_enumeration+0x3d/0x210 [libcfs]
[<ffffffff8030fdb9>] snprintf+0x44/0x4c
[<ffffffff80211af8>] read_tsc+0x9/0x1c
[<ffffffffa076529d>] ksocknal_startup+0x30d/0x4f0 [ksocklnd]
[<ffffffff8030ac8e>] ida_get_new_above+0xfd/0x188
[<ffffffff80424a04>] _spin_lock_irqsave+0x18/0x34
[<ffffffff80424d0b>] _spin_unlock_irqrestore+0x12/0x2c
[<ffffffffa0596bc8>] libcfs_ip_addr2str+0x38/0x40 [libcfs]
[<ffffffffa05d0e4c>] lnet_startup_lndnis+0x1dc/0x6d0 [lnet]
[<ffffffffa05d144d>] LNetNIInit+0x10d/0x1d0 [lnet]
[<ffffffffa014c130>] init_module+0x0/0x4c4 [ptlrpc]
[<ffffffffa06efbd5>] ptlrpc_ni_init+0x25/0x160 [ptlrpc]
[<ffffffffa06effaa>] ptlrpc_init_portals+0xa/0xc0 [ptlrpc]
[<ffffffffa06d4b99>] ptlrpc_init_xid+0x9/0x70 [ptlrpc]
[<ffffffffa014c1cd>] init_module+0x9d/0x4c4 [ptlrpc]
[<ffffffffa062bae0>] class_disconnect+0x0/0x490 [obdclass]
[<ffffffff80209041>] _stext+0x41/0x110
[<ffffffff80255037>] sys_init_module+0x9e/0x1ab
[<ffffffff8020bf8b>] system_call_fastpath+0x16/0x1b
Lustre: Added LNI 172.23.115.206@tcp [8/256/0/180]
BUG: using smp_processor_id() in preemptible [00000000] code:
acceptor_988/6055
caller is set_ptldebug_header+0x41/0xa0 [libcfs]
Pid: 6055, comm: acceptor_988 Not tainted 2.6.27.39-default #2
Call Trace:
[<ffffffff80313c2b>] debug_smp_processor_id+0xd3/0xe8
[<ffffffffa0591171>] set_ptldebug_header+0x41/0xa0 [libcfs]
[<ffffffffa0599bb0>] libcfs_debug_vmsg2+0x70/0x990 [libcfs]
[<ffffffff80361798>] account+0xd8/0xe8
[<ffffffff80229682>] __wake_up+0x38/0x4f
[<ffffffff80239020>] local_bh_disable+0xe/0x10
[<ffffffff80424a58>] _spin_lock_bh+0x13/0x29
[<ffffffff803aac17>] release_sock+0x19/0xaa
[<ffffffff80238b30>] local_bh_enable_ip+0x82/0x9c
[<ffffffff803f9ea6>] inet_listen+0x71/0x78
[<ffffffffa0592dce>] libcfs_sock_listen+0xbe/0x130 [libcfs]
[<ffffffff80424a04>] _spin_lock_irqsave+0x18/0x34
[<ffffffffa05e97bd>] lnet_acceptor+0x19d/0x6a0 [lnet]
[<ffffffff8020cf19>] child_rip+0xa/0x11
[<ffffffffa05e9620>] lnet_acceptor+0x0/0x6a0 [lnet]
[<ffffffff8020cf0f>] child_rip+0x0/0x11
Lustre: Accept secure, port 988
Lustre: Lustre Client File System; http://www.lustre.org/
--
Greg Matthews 01235 778658
Senior Computer Systems Administrator
Diamond Light Source, Oxfordshire, UK
_______________________________________________
Lustre-discuss mailing list
Lustre-...@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss
Gregory Matthews wrote:
> BUG: using smp_processor_id() in preemptible [00000000] code: modprobe/6024
> caller is set_ptldebug_header+0x41/0xa0 [libcfs]
> Pid: 6024, comm: modprobe Not tainted 2.6.27.39-default #2
>
> Call Trace:
> [<ffffffff80313c2b>] debug_smp_processor_id+0xd3/0xe8
> [<ffffffffa0591171>] set_ptldebug_header+0x41/0xa0 [libcfs]
> [<ffffffffa0599bb0>] libcfs_debug_vmsg2+0x70/0x990 [libcfs]
> [<ffffffff802514c2>] smp_call_function+0x3f/0x5e
> [<ffffffff80254e9a>] load_module+0x166f/0x176e
> [<ffffffffa014c000>] init_obdclass+0x0/0x3e4 [obdclass]
> [<ffffffffa014c04b>] init_obdclass+0x4b/0x3e4 [obdclass]
> [<ffffffff80209041>] _stext+0x41/0x110
> [<ffffffff80255037>] sys_init_module+0x9e/0x1ab
> [<ffffffff8020bf8b>] system_call_fastpath+0x16/0x1b
this looks a lot like this bug:
http://bugzilla.kernel.org/show_bug.cgi?id=12518
Just need to figure out if the fix can be backported to 2.6.27.39
G
I think that is a different bug.
In lnet/libcfs/tracefile.c::ibcfs_debug_vmsg2() you could try moving
set_ptldebug_header() after the call to trace_get_tcd(), which should
pin the thread to the CPU by disabling preempt and stop the warning.
Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.
Andreas Dilger wrote:
> On 2010-03-17, at 02:59, Gregory Matthews wrote:
>> Gregory Matthews wrote:
>>> BUG: using smp_processor_id() in preemptible [00000000] code:
>>> modprobe/6024
>>> caller is set_ptldebug_header+0x41/0xa0 [libcfs]
>>> Pid: 6024, comm: modprobe Not tainted 2.6.27.39-default #2
>>>
>>> Call Trace:
>>> [<ffffffff80313c2b>] debug_smp_processor_id+0xd3/0xe8
>>> [<ffffffffa0591171>] set_ptldebug_header+0x41/0xa0 [libcfs]
>>> [<ffffffffa0599bb0>] libcfs_debug_vmsg2+0x70/0x990 [libcfs]
>>> [<ffffffff802514c2>] smp_call_function+0x3f/0x5e
>>> [<ffffffff80254e9a>] load_module+0x166f/0x176e
>>> [<ffffffffa014c000>] init_obdclass+0x0/0x3e4 [obdclass]
>>> [<ffffffffa014c04b>] init_obdclass+0x4b/0x3e4 [obdclass]
>>> [<ffffffff80209041>] _stext+0x41/0x110
>>> [<ffffffff80255037>] sys_init_module+0x9e/0x1ab
>>> [<ffffffff8020bf8b>] system_call_fastpath+0x16/0x1b
> In lnet/libcfs/tracefile.c::ibcfs_debug_vmsg2() you could try moving
> set_ptldebug_header() after the call to trace_get_tcd(), which should
> pin the thread to the CPU by disabling preempt and stop the warning.
thanks for the advice. I've compiled new lustre packages with the
suggested fix and am currently running racer. So far, the above bug has
not reappeared but there is something similar:
Mar 18 10:34:57 i15-pilatus1 kernel: BUG: using smp_processor_id() in
preemptible [00000000] code: syslog-ng/2332
Mar 18 10:34:57 i15-pilatus1 kernel: caller is wake_up_klogd+0x27/0x3d
Mar 18 10:34:57 i15-pilatus1 kernel: Pid: 2332, comm: syslog-ng Not
tainted 2.6.27.39-default #2
Mar 18 10:34:57 i15-pilatus1 kernel:
Mar 18 10:34:57 i15-pilatus1 kernel: Call Trace:
Mar 18 10:34:57 i15-pilatus1 kernel: [<ffffffff80313c2b>]
debug_smp_processor_id+0xd3/0xe8
Mar 18 10:34:57 i15-pilatus1 kernel: [<ffffffff80234af4>]
wake_up_klogd+0x27/0x3d
Mar 18 10:34:57 i15-pilatus1 kernel: [<ffffffff80366a55>]
write_chan+0x25a/0x2ef
Mar 18 10:34:57 i15-pilatus1 kernel: [<ffffffff8022bf45>]
default_wake_function+0x0/0xe
Mar 18 10:34:57 i15-pilatus1 kernel: [<ffffffff80364bfa>]
tty_write+0x191/0x227
Mar 18 10:34:57 i15-pilatus1 kernel: [<ffffffff803667fb>]
write_chan+0x0/0x2ef
Mar 18 10:34:57 i15-pilatus1 kernel: [<ffffffff80294b00>]
vfs_write+0xad/0x136
Mar 18 10:34:57 i15-pilatus1 kernel: [<ffffffff80294f86>]
sys_write+0x45/0x6e
Mar 18 10:34:57 i15-pilatus1 kernel: [<ffffffff8020bf8b>]
system_call_fastpath+0x16/0x1b
wake_up_klogd looks like a kernel function from kernel/printk.c so
perhaps this is /now/ kernel bug 12518. In the meantime, lustre just got
more compatible with the preempt kernel. Will this fix make it into
mainline?
My diff is:
--- lnet/libcfs/tracefile.c.orig 2010-03-18 08:45:53.000000000 +0000
+++ lnet/libcfs/tracefile.c 2010-03-18 08:50:09.000000000 +0000
@@ -258,10 +258,13 @@
if (strchr(file, '/'))
file = strrchr(file, '/') + 1;
+ tcd = trace_get_tcd();
+ /* suggested fix for stack traces in syslog, proposed by Andreas
+ *
http://groups.google.com/group/lustre-discuss-list/browse_thread/thread/903cbf9ac251db33
+ */
set_ptldebug_header(&header, subsys, mask, line, CDEBUG_STACK());
- tcd = trace_get_tcd();
if (tcd == NULL) /* arch may not log in IRQ
context */
goto console;
GREG
>
> Cheers, Andreas
> --
> Andreas Dilger
> Sr. Staff Engineer, Lustre Group
> Sun Microsystems of Canada, Inc.
>
>
--
Greg Matthews 01235 778658
Senior Computer Systems Administrator
Diamond Light Source, Oxfordshire, UK
I've submitted bug 22409 with this patch, though I've updated the
comment. I can't say for sure which release it will be in, but I
don't see a big barrier to accepting it in short order.
Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.
_______________________________________________
Andreas Dilger wrote:
> I've submitted bug 22409 with this patch, though I've updated the
> comment. I can't say for sure which release it will be in, but I don't
> see a big barrier to accepting it in short order.
after running racer over the weekend on some test clients I see a lot of
these, is this lustre or cat? There are lustre functions listed in the
stack trace.
GREG
Mar 22 04:04:13 dec055 kernel: BUG: scheduling while atomic:
cat/4439/0x00000002
Mar 22 04:04:13 dec055 kernel: Modules linked in: mgc lustre lov mdc
lquota osc ksocklnd ptlrpc obdclass lnet lvfs libcfs gsd autofs4 nfs
lockd nfs_acl sunrpc iptable_filter ip_tables ip6table_filter ip6_tables
x_tables ipv6 microcode loop dm_mod rtc_cmos usb_storage iTCO_wdt
rtc_core iTCO_vendor_support bnx2 sr_mod rtc_lib cdrom serio_raw shpchp
pci_hotplug button joydev dcdbas sg usbhid hid ff_memless ehci_hcd
uhci_hcd usbcore sd_mod mptsas mptscsih mptbase scsi_transport_sas edd
ext3 mbcache jbd fan megaraid_sas ata_piix libata scsi_mod dock thermal
processor thermal_sys [last unloaded: libcfs]
Mar 22 04:04:13 dec055 kernel: Pid: 4439, comm: cat Not tainted
2.6.27.39-default #2
Mar 22 04:04:13 dec055 kernel:
Mar 22 04:04:13 dec055 kernel: Call Trace:
Mar 22 04:04:13 dec055 kernel: [<ffffffff80422c6a>] schedule+0xf7/0x7bd
Mar 22 04:04:13 dec055 kernel: [<ffffffff80424c84>] _spin_unlock+0x10/0x2b
Mar 22 04:04:13 dec055 kernel: [<ffffffffa062e08b>]
osc_queue_async_io+0x63b/0x10c0 [osc]
Mar 22 04:04:13 dec055 kernel: [<ffffffff804236a9>]
schedule_timeout+0x1e/0xad
Mar 22 04:04:13 dec055 kernel: [<ffffffff80423e63>] __down+0x62/0x8f
Mar 22 04:04:13 dec055 kernel: [<ffffffff802495ca>] down+0x27/0x36
Mar 22 04:04:13 dec055 kernel: [<ffffffffa06a8b57>]
lov_putref+0x37/0xf90 [lov]
Mar 22 04:04:13 dec055 kernel: [<ffffffffa07319ce>]
llap_from_page_with_lockh+0x42e/0x2670 [lustre]
Mar 22 04:04:13 dec055 kernel: [<ffffffffa06add05>]
lov_queue_async_io+0x165/0x4b0 [lov]
Mar 22 04:04:13 dec055 kernel: [<ffffffffa06c9113>]
lov_stripe_number+0x213/0x280 [lov]
Mar 22 04:04:13 dec055 kernel: [<ffffffff80424a04>]
_spin_lock_irqsave+0x18/0x34
Mar 22 04:04:13 dec055 kernel: [<ffffffffa06bf0e1>]
lov_get_info+0x151/0x2370 [lov]
Mar 22 04:04:13 dec055 kernel: [<ffffffffa04255e2>] cfs_alloc+0x52/0xb0
[libcfs]
Mar 22 04:04:13 dec055 kernel: [<ffffffffa04552e1>]
lprocfs_counter_add+0xb1/0x120 [lvfs]
Mar 22 04:04:13 dec055 kernel: [<ffffffffa04c1893>] oig_init+0xa3/0x2c0
[obdclass]
Mar 22 04:04:13 dec055 kernel: [<ffffffffa073d2e3>]
ll_readpage+0xd63/0x1f60 [lustre]
Mar 22 04:04:13 dec055 kernel: [<ffffffffa06cd3dc>]
lov_fini_cancel_set+0x1ac/0x290 [lov]
Mar 22 04:04:13 dec055 kernel: [<ffffffff80424c84>] _spin_unlock+0x10/0x2b
Mar 22 04:04:13 dec055 kernel: [<ffffffffa0534e54>]
ldlm_lock_remove_from_lru+0x44/0x100 [ptlrpc]
Mar 22 04:04:13 dec055 kernel: [<ffffffff80424c84>] _spin_unlock+0x10/0x2b
Mar 22 04:04:13 dec055 kernel: [<ffffffff8026b087>]
generic_file_aio_read+0x3c9/0x551
Mar 22 04:04:13 dec055 kernel: [<ffffffffa070d1fa>]
ll_file_aio_read+0xf1a/0x2350 [lustre]
Mar 22 04:04:13 dec055 kernel: [<ffffffff80424a04>]
_spin_lock_irqsave+0x18/0x34
Mar 22 04:04:13 dec055 kernel: [<ffffffffa04552e1>]
lprocfs_counter_add+0xb1/0x120 [lvfs]
Mar 22 04:04:13 dec055 kernel: [<ffffffffa0712e79>]
ll_file_read+0xb9/0xd0 [lustre]
Mar 22 04:04:13 dec055 kernel: [<ffffffff80246516>]
autoremove_wake_function+0x0/0x2e
Mar 22 04:04:13 dec055 kernel: [<ffffffff8020c636>] retint_kernel+0x26/0x30
Mar 22 04:04:13 dec055 kernel: [<ffffffff80294570>]
rw_verify_area+0x7f/0x9f
Mar 22 04:04:13 dec055 kernel: [<ffffffff80294c33>] vfs_read+0xaa/0x133
Mar 22 04:04:13 dec055 kernel: [<ffffffff80294f18>] sys_read+0x45/0x6e
Mar 22 04:04:13 dec055 kernel: [<ffffffff8020bf8b>]
system_call_fastpath+0x16/0x1b
Mar 22 04:04:13 dec055 kernel:
>
> Cheers, Andreas
> --
> Andreas Dilger
> Sr. Staff Engineer, Lustre Group
> Sun Microsystems of Canada, Inc.
>
>
--
Greg Matthews 01235 778658
Senior Computer Systems Administrator
Diamond Light Source, Oxfordshire, UK
It definitely wouldn't have anything to do with cat itself.
Unfortunately, I can't see anywhere in that call stack where we are
scheduling while atomic. It appears (from what I can make of the
stack trace) we are in osc_queue_async_io() and the only place we grab
a spinlock is in a very isolated piece of code.
> BUG: scheduling while atomic: cat/4439/0x00000002
> Call Trace:
> [<ffffffff80422c6a>] schedule+0xf7/0x7bd
> [<ffffffff80424c84>] _spin_unlock+0x10/0x2b
> [<ffffffffa062e08b>] osc_queue_async_io+0x63b/0x10c0 [osc]
> [<ffffffffa06add05>] lov_queue_async_io+0x165/0x4b0 [lov]
> [<ffffffffa073d2e3>] ll_readpage+0xd63/0x1f60 [lustre]
> [<ffffffff8026b087>] generic_file_aio_read+0x3c9/0x551
> [<ffffffffa070d1fa>] ll_file_aio_read+0xf1a/0x2350 [lustre]
> [<ffffffffa0712e79>] ll_file_read+0xb9/0xd0 [lustre]
> [<ffffffff80294c33>] vfs_read+0xaa/0x133
> [<ffffffff80294f18>] sys_read+0x45/0x6e
Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.
_______________________________________________
Mar 21 05:31:42 dec055 kernel: BUG: scheduling while atomic:
cat/4439/0x00000002
Mar 21 05:31:42 dec055 kernel: Modules linked in:<3>BUG: scheduling
while atomic: cat/4508/0x00000002
Mar 21 05:31:42 dec055 kernel: mgc lustre lov mdc lquotaModules linked
in: mgc lustre lov mdc lquota osc ksocklnd osc ksocklnd ptlrpc obdclass
lnet ptlrpc obdclass lnet lvfs libcfs lvfs libcfs gsd gsd autofs4 nfs
autofs4 nfs lockd lockd nfs_acl nfs_acl sunrpc iptable_filter ip_tables
ip6table_filter sunrpc ip6_tables x_tables ipv6 microcode iptable_filter
ip_tables ip6table_filter ip6_tables x_tables ipv6 microcode loop loop
dm_mod dm_mod rtc_cmos rtc_cmos usb_storage iTCO_wdt rtc_core
iTCO_vendor_support bnx2 sr_mod rtc_lib cdrom serio_raw shpchp
usb_storage pci_hotplug button iTCO_wdt rtc_core iTCO_vendor_support
bnx2 sr_mod rtc_lib cdrom serio_raw shpchp pci_hotplug button joydev
dcdbas sg usbhid hid ff_memless ehci_hcd uhci_hcd usbcore sd_mod mptsas
mptscsih mptbase joydev dcdbas sg usbhid hid ff_memless ehci_hcd
uhci_hcd usbcore sd_mod mptsas mptscsih mptbase scsi_transport_sas edd
ext3 mbcache jbd fan megaraid_sas ata_piix libata scsi_mod dock
scsi_transport_sas edd ext3 mbcache jbd fan megaraid_sas ata_p
Mar 21 05:31:42 dec055 kernel: iix libata scsi_mod dock thermal
processor thermal_sys [last unloaded: libcfs]
Mar 21 05:31:42 dec055 kernel: Pid: 4439, comm: cat Not tainted
2.6.27.39-default #2
Mar 21 05:31:42 dec055 kernel: thermal processor thermal_sys [last
unloaded: libcfs]
Mar 21 05:31:42 dec055 kernel: Pid: 4508, comm: cat Not tainted
2.6.27.39-default #2
Mar 21 05:31:42 dec055 kernel:
Mar 21 05:31:42 dec055 kernel: Call Trace:
Mar 21 05:31:42 dec055 kernel:
Mar 21 05:31:42 dec055 kernel: Call Trace:
Mar 21 05:31:42 dec055 kernel: [<ffffffff80422c6a>] schedule+0xf7/0x7bd
Mar 21 05:31:42 dec055 kernel: [<ffffffff80424c84>] _spin_unlock+0x10/0x2b
Mar 21 05:31:42 dec055 kernel: [<ffffffffa062e08b>]
osc_queue_async_io+0x63b/0x10c0 [osc]
Mar 21 05:31:42 dec055 kernel: [<ffffffff804236a9>]
schedule_timeout+0x1e/0xad
Mar 21 05:31:42 dec055 kernel: [<ffffffff80423e63>] __down+0x62/0x8f
Mar 21 05:31:42 dec055 kernel: [<ffffffff802495ca>] down+0x27/0x36
Mar 21 05:31:42 dec055 kernel: [<ffffffffa06a8b57>]
lov_putref+0x37/0xf90 [lov]
Mar 21 05:31:42 dec055 kernel: [<ffffffff8042490f>] _spin_lock+0xe/0x24
Mar 21 05:31:42 dec055 kernel: [<ffffffff80229776>] task_rq_lock+0x40/0x79
Mar 21 05:31:42 dec055 kernel: [<ffffffff8022bf33>]
try_to_wake_up+0x188/0x19a
Mar 21 05:31:42 dec055 kernel: [<ffffffffa06c9113>]
lov_stripe_number+0x213/0x280 [lov]
Mar 21 05:31:42 dec055 kernel: [<ffffffff80422c6a>] schedule+0xf7/0x7bd
Mar 21 05:31:42 dec055 kernel: [<ffffffffa06bf0e1>]
lov_get_info+0x151/0x2370 [lov]
Mar 21 05:31:42 dec055 kernel: [<ffffffffa04255e2>] cfs_alloc+0x52/0xb0
[libcfs]
Mar 21 05:31:42 dec055 kernel: [<ffffffff80316a65>]
__percpu_counter_add+0x74/0x9b
Mar 21 05:31:42 dec055 kernel: [<ffffffff804236a9>]
schedule_timeout+0x1e/0xad
Mar 21 05:31:42 dec055 kernel: [<ffffffff80424c84>] _spin_unlock+0x10/0x2b
Mar 21 05:31:42 dec055 kernel: [<ffffffffa07319ce>]
llap_from_page_with_lockh+0x42e/0x2670 [lustre]
Mar 21 05:31:42 dec055 kernel: [<ffffffff80423e63>] __down+0x62/0x8f
Mar 21 05:31:42 dec055 kernel: [<ffffffff802495ca>] down+0x27/0x36
Mar 21 05:31:42 dec055 kernel: [<ffffffffa06a88c0>]
lov_getref+0x20/0x40 [lov]
Mar 21 05:31:42 dec055 kernel: [<ffffffffa06bf046>]
lov_get_info+0xb6/0x2370 [lov]
Mar 21 05:31:42 dec055 kernel: [<ffffffff80424c84>] _spin_unlock+0x10/0x2b
Mar 21 05:31:42 dec055 kernel: [<ffffffffa04552e1>]
lprocfs_counter_add+0xb1/0x120 [lvfs]
Mar 21 05:31:42 dec055 kernel: [<ffffffffa04c1893>] oig_init+0xa3/0x2c0
[obdclass]
Mar 21 05:31:42 dec055 kernel: [<ffffffff8022982f>]
set_next_entity+0x18/0x36
Mar 21 05:31:42 dec055 kernel: [<ffffffffa073d2e3>]
ll_readpage+0xd63/0x1f60 [lustre]
Mar 21 05:31:42 dec055 kernel: [<ffffffff804233cf>] thread_return+0x9f/0xc7
Mar 21 05:31:42 dec055 kernel: [<ffffffffa06cd3dc>]
lov_fini_cancel_set+0x1ac/0x290 [lov]
Mar 21 05:31:42 dec055 kernel: [<ffffffff80424c84>] _spin_unlock+0x10/0x2b
Mar 21 05:31:42 dec055 kernel: [<ffffffffa0534e54>]
ldlm_lock_remove_from_lru+0x44/0x100 [ptlrpc]
Mar 21 05:31:42 dec055 kernel: [<ffffffff80424c84>] _spin_unlock+0x10/0x2b
Mar 21 05:31:42 dec055 kernel: [<ffffffff8026b087>]
generic_file_aio_read+0x3c9/0x551
Mar 21 05:31:42 dec055 kernel: [<ffffffffa070d1fa>]
ll_file_aio_read+0xf1a/0x2350 [lustre]
Mar 21 05:31:42 dec055 kernel: [<ffffffff80424a04>]
_spin_lock_irqsave+0x18/0x34
Mar 21 05:31:42 dec055 kernel: [<ffffffffa04552e1>]
lprocfs_counter_add+0xb1/0x120 [lvfs]
Mar 21 05:31:42 dec055 kernel: [<ffffffffa0712e79>]
ll_file_read+0xb9/0xd0 [lustre]
Mar 21 05:31:42 dec055 kernel: [<ffffffffa04255e2>] cfs_alloc+0x52/0xb0
[libcfs]
Mar 21 05:31:42 dec055 kernel: [<ffffffffa04552e1>]
lprocfs_counter_add+0xb1/0x120 [lvfs]
Mar 21 05:31:42 dec055 kernel: [<ffffffffa04c1893>] oig_init+0xa3/0x2c0
[obdclass]
Mar 21 05:31:42 dec055 kernel: [<ffffffff8022982f>]
set_next_entity+0x18/0x36
Mar 21 05:31:42 dec055 kernel: [<ffffffffa073d2e3>]
ll_readpage+0xd63/0x1f60 [lustre]
Mar 21 05:31:42 dec055 kernel: [<ffffffff804233cf>] thread_return+0x9f/0xc7
Mar 21 05:31:42 dec055 kernel: [<ffffffffa06cd3dc>]
lov_fini_cancel_set+0x1ac/0x290 [lov]
Mar 21 05:31:42 dec055 kernel: [<ffffffff80424c84>] _spin_unlock+0x10/0x2b
Mar 21 05:31:42 dec055 kernel: [<ffffffffa0534e54>]
ldlm_lock_remove_from_lru+0x44/0x100 [ptlrpc]
Mar 21 05:31:42 dec055 kernel: [<ffffffff80424c84>] _spin_unlock+0x10/0x2b
Mar 21 05:31:42 dec055 kernel: [<ffffffff8026b087>]
generic_file_aio_read+0x3c9/0x551
Mar 21 05:31:42 dec055 kernel: [<ffffffffa070d1fa>]
ll_file_aio_read+0xf1a/0x2350 [lustre]
Mar 21 05:31:42 dec055 kernel: [<ffffffff80424a04>]
_spin_lock_irqsave+0x18/0x34
Mar 21 05:31:42 dec055 kernel: [<ffffffffa04552e1>]
lprocfs_counter_add+0xb1/0x120 [lvfs]
Mar 21 05:31:42 dec055 kernel: [<ffffffffa0712e79>]
ll_file_read+0xb9/0xd0 [lustre]
Mar 21 05:31:42 dec055 kernel: [<ffffffff80246516>]
autoremove_wake_function+0x0/0x2e
Mar 21 05:31:42 dec055 kernel: [<ffffffff80294570>]
rw_verify_area+0x7f/0x9f
Mar 21 05:31:42 dec055 kernel: [<ffffffff80294c33>] vfs_read+0xaa/0x133
Mar 21 05:31:42 dec055 kernel: [<ffffffff80294f18>] sys_read+0x45/0x6e
Mar 21 05:31:42 dec055 kernel: [<ffffffff8020bf8b>]
system_call_fastpath+0x16/0x1b
Mar 21 05:31:42 dec055 kernel:
Mar 21 05:31:42 dec055 kernel: [<ffffffff804233cf>] thread_return+0x9f/0xc7
Mar 21 05:31:42 dec055 kernel: [<ffffffff80246516>]
autoremove_wake_function+0x0/0x2e
Mar 21 05:31:42 dec055 kernel: [<ffffffff80294570>]
rw_verify_area+0x7f/0x9f
Mar 21 05:31:42 dec055 kernel: [<ffffffff80294c33>] vfs_read+0xaa/0x133
Mar 21 05:31:42 dec055 kernel: [<ffffffff80294f00>] sys_read+0x2d/0x6e
Mar 21 05:31:42 dec055 kernel: [<ffffffff80294f18>] sys_read+0x45/0x6e
Mar 21 05:31:42 dec055 kernel: [<ffffffff8020bf8b>]
system_call_fastpath+0x16/0x1b
Mar 21 05:31:42 dec055 kernel:
Mar 21 05:31:42 dec055 kernel: BUG: scheduling while atomic:
cat/4439/0x00000002
Mar 21 05:31:42 dec055 kernel: Modules linked in: mgc lustre lov mdc
lquota osc ksocklnd ptlrpc obdclass lnet lvfs libcfs gsd autofs4 nfs
lockd nfs_acl sunrpc iptable_filter ip_tables ip6table_filter ip6_tables
x_tables ipv6 microcode loop dm_mod rtc_cmos usb_storage iTCO_wdt
rtc_core iTCO_vendor_support bnx2 sr_mod rtc_lib cdrom serio_raw shpchp
pci_hotplug button joydev dcdbas sg usbhid hid ff_memless ehci_hcd
uhci_hcd usbcore sd_mod mptsas mptscsih mptbase scsi_transport_sas edd
ext3 mbcache jbd fan megaraid_sas ata_piix libata scsi_mod dock thermal
processor thermal_sys [last unloaded: libcfs]
Mar 21 05:31:42 dec055 kernel: Pid: 4439, comm: cat Not tainted
2.6.27.39-default #2
Mar 21 05:31:42 dec055 kernel:
Mar 21 05:31:42 dec055 kernel: Call Trace:
Mar 21 05:31:42 dec055 kernel: [<ffffffff80422c6a>] schedule+0xf7/0x7bd
Mar 21 05:31:42 dec055 kernel: [<ffffffff804233cf>] thread_return+0x9f/0xc7
Mar 21 05:31:42 dec055 kernel: [<ffffffff804236a9>]
schedule_timeout+0x1e/0xad
Mar 21 05:31:42 dec055 kernel: [<ffffffff80423e63>] __down+0x62/0x8f
Mar 21 05:31:42 dec055 kernel: [<ffffffff802495ca>] down+0x27/0x36
Mar 21 05:31:42 dec055 kernel: [<ffffffffa06a8b57>]
lov_putref+0x37/0xf90 [lov]
Mar 21 05:31:42 dec055 kernel: [<ffffffff80231b1b>]
check_preempt_wakeup+0x190/0x19d
Mar 21 05:31:42 dec055 kernel: [<ffffffff8022bf33>]
try_to_wake_up+0x188/0x19a
Mar 21 05:31:42 dec055 kernel: [<ffffffffa06c9113>]
lov_stripe_number+0x213/0x280 [lov]
Mar 21 05:31:42 dec055 kernel: [<ffffffffa06bf0e1>]
lov_get_info+0x151/0x2370 [lov]
Mar 21 05:31:42 dec055 kernel: [<ffffffffa04255e2>] cfs_alloc+0x52/0xb0
[libcfs]
Mar 21 05:31:42 dec055 kernel: [<ffffffffa04552e1>]
lprocfs_counter_add+0xb1/0x120 [lvfs]
Mar 21 05:31:42 dec055 kernel: [<ffffffffa04c1893>] oig_init+0xa3/0x2c0
[obdclass]
Mar 21 05:31:42 dec055 kernel: [<ffffffff8022982f>]
set_next_entity+0x18/0x36
Mar 21 05:31:42 dec055 kernel: [<ffffffffa073d2e3>]
ll_readpage+0xd63/0x1f60 [lustre]
Mar 21 05:31:42 dec055 kernel: [<ffffffff804233cf>] thread_return+0x9f/0xc7
Mar 21 05:31:42 dec055 kernel: [<ffffffffa06cd3dc>]
lov_fini_cancel_set+0x1ac/0x290 [lov]
Mar 21 05:31:42 dec055 kernel: [<ffffffff80424c84>] _spin_unlock+0x10/0x2b
Mar 21 05:31:42 dec055 kernel: [<ffffffffa0534e54>]
ldlm_lock_remove_from_lru+0x44/0x100 [ptlrpc]
Mar 21 05:31:42 dec055 kernel: [<ffffffff80424c84>] _spin_unlock+0x10/0x2b
Mar 21 05:31:42 dec055 kernel: [<ffffffff8026b087>]
generic_file_aio_read+0x3c9/0x551
Mar 21 05:31:42 dec055 kernel: [<ffffffffa070d1fa>]
ll_file_aio_read+0xf1a/0x2350 [lustre]
Mar 21 05:31:42 dec055 kernel: [<ffffffff80424a04>]
_spin_lock_irqsave+0x18/0x34
Mar 21 05:31:42 dec055 kernel: [<ffffffffa04552e1>]
lprocfs_counter_add+0xb1/0x120 [lvfs]
Mar 21 05:31:42 dec055 kernel: [<ffffffffa0712e79>]
ll_file_read+0xb9/0xd0 [lustre]
Mar 21 05:31:42 dec055 kernel: [<ffffffff804233cf>] thread_return+0x9f/0xc7
Mar 21 05:31:42 dec055 kernel: [<ffffffff80246516>]
autoremove_wake_function+0x0/0x2e
Mar 21 05:31:42 dec055 kernel: [<ffffffff80294570>]
rw_verify_area+0x7f/0x9f
Mar 21 05:31:42 dec055 kernel: [<ffffffff80294c33>] vfs_read+0xaa/0x133
Mar 21 05:31:42 dec055 kernel: [<ffffffff80294f00>] sys_read+0x2d/0x6e
Mar 21 05:31:42 dec055 kernel: [<ffffffff80294f18>] sys_read+0x45/0x6e
Mar 21 05:31:42 dec055 kernel: [<ffffffff8020bf8b>]
system_call_fastpath+0x16/0x1b
Mar 21 05:31:42 dec055 kernel:
Andreas Dilger wrote:
> It definitely wouldn't have anything to do with cat itself.
> Unfortunately, I can't see anywhere in that call stack where we are
> scheduling while atomic. It appears (from what I can make of the stack
> trace) we are in osc_queue_async_io() and the only place we grab a
> spinlock is in a very isolated piece of code.
>
>> BUG: scheduling while atomic: cat/4439/0x00000002
>> Call Trace:
>> [<ffffffff80422c6a>] schedule+0xf7/0x7bd
>> [<ffffffff80424c84>] _spin_unlock+0x10/0x2b
>> [<ffffffffa062e08b>] osc_queue_async_io+0x63b/0x10c0 [osc]
>> [<ffffffffa06add05>] lov_queue_async_io+0x165/0x4b0 [lov]
>> [<ffffffffa073d2e3>] ll_readpage+0xd63/0x1f60 [lustre]
>> [<ffffffff8026b087>] generic_file_aio_read+0x3c9/0x551
>> [<ffffffffa070d1fa>] ll_file_aio_read+0xf1a/0x2350 [lustre]
>> [<ffffffffa0712e79>] ll_file_read+0xb9/0xd0 [lustre]
>> [<ffffffff80294c33>] vfs_read+0xaa/0x133
>> [<ffffffff80294f18>] sys_read+0x45/0x6e
>
>
> Cheers, Andreas
> --
> Andreas Dilger
> Sr. Staff Engineer, Lustre Group
> Sun Microsystems of Canada, Inc.
>
>
--
Greg Matthews 01235 778658
Senior Computer Systems Administrator
Diamond Light Source, Oxfordshire, UK