This is seen on Supermicro H8DM3-2 and H8DMi-2 motherboards with 16 GB RAM
and Opteron 2222 SE or 2216 processors.
It is not seen on a Uniwide 3546ES, or on a Supermicro H8DM8 with 8222 CPUs.
kswapd sits at (according to KDB)
__spinlock_irqsave + 0x15
prepare_to_wait + 0x15
kswapd + 0xe3
kthread + 0x47
berkley
This is the pgdat->kswapd_wait waitqueue lock, by the looks of it?
It should be basically impossible to get this lock wrong, so
perhaps what happens is some memory corruption which gets papered
over when spinlock debugging pads the structure (or makes some
other random layout change)?
Is this very reproducible? Does your dmesg show any odd
messages after performing this heavy write load with
CONFIG_DEBUG_SPINLOCK set?
No errors, nothing in dmesg, nothing with CONFIG_DEBUG_SPINLOCK set.
Sysrq shows a lot (when it works), but by then too many things are
locked up for it to be of much use. I can get into KDB and look around
(2.6.22 with KDB; it hangs there too). Even access to the local disk is
blocked.
Processes already in core and running stay there (iostat, top, ...).
I personally suspect the BIOS on the PCIe side, as the symptoms change
with the BIOS revision. I did a major paper on SAS performance with one
H8DMi, but it got a BIOS update and now crashes. A missed interrupt? The
APIC sending an interrupt to multiple CPUs? I don't know.
Tell me what to look at, and I can get you the info. It usually takes 20
seconds to go bang, using either the LSI 8888ELP or the RocketRAID 2340;
other controllers are too slow. 2.6.20 does not lock up. It is also
200 MB/s slower at writing :-)
Thanks for the response.
berkley
OK, it does sound suspiciously like a hardware bug, or some
unrelated software bug that is causing memory scribbles...
A few things you could do.
One is to verify that it really is the kswapd_wait spinlock it is
spinning on, and then, when you see the lockup, to verify that no
other task is holding the lock. (It is quite an inner lock, so you
shouldn't have to wade through call chains...) That would confirm
corruption. Dumping the lock contents and the fields in the
structure around the lock might give a clue.
You could move the spinlock somewhere else and see what happens
(shift it around in the structure, or get even more creative...),
or do something like keeping two spinlocks and, when you hit the
lockup, verifying whether or not they agree.
(It sounds like you're pretty capable, but if you want me to have
a look at doing a patch or two to help, let me know.)
Another is to bisect the problem. However, as you say, the older
kernels are slower, so you may just bisect to the point where the
kernel sustains enough load to trigger the bug, so this may not be
worth your time just yet.
You could _try_ turning on slab debugging. If there is random
corruption, it might get caught. Then again, it might just change
things enough to hide the problem.
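For reference, the slab-debugging knobs of that era look roughly like
this (option names from the 2.6.2x Kconfig; worth double-checking
against your tree):

```
# .config — pick the option matching your allocator:
CONFIG_DEBUG_SLAB=y     # classic SLAB allocator
CONFIG_SLUB_DEBUG=y     # SLUB (new in 2.6.22)

# SLUB debugging is then switched on at boot, e.g.:
#   slub_debug=FZP      # F=sanity checks, Z=red-zoning, P=poisoning
```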
Thanks for reporting!
[ 626.028180] eth0: too many iterations (6) in nv_nic_irq.
[ 626.167583] eth0: too many iterations (6) in nv_nic_irq.
[ 626.206729] eth0: too many iterations (6) in nv_nic_irq.
[ 626.400171] eth0: too many iterations (6) in nv_nic_irq.
[ 626.446374] eth0: too many iterations (6) in nv_nic_irq.
followed by the crash at
forcedeth:nv_nic_irq_optimized+0x89
handle_IRQ_event+0x25
handle_edge_IRQ+0xe3
do_IRQ
berkley
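(Aside: the "too many iterations" message is the forcedeth interrupt
handler bailing out after its per-interrupt work limit, which is a
module parameter. If it helps to rule that out, something like the
following should raise it; the parameter name is taken from the
forcedeth source of that era, so confirm with `modinfo` first:)

```
modinfo forcedeth | grep max_interrupt_work   # confirm the parameter exists
rmmod forcedeth
modprobe forcedeth max_interrupt_work=20
```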
OK, thanks. We should probably try to fix this problem before
chasing the more elusive-looking corruption.
CC'ed forcedeth maintainer and netdev.
Can you reiterate what your hardware is (including lspci)?