
unexplained high load report


Keith Keller

Nov 22, 2012, 12:21:56 AM
Hi all,

I recently put a new fileserver into service, and am a bit confused by
an unexplained high-ish load. I am wondering what sorts of things I
should look for next.

Some background: this is a newly installed CentOS 6.3 box running nfsd
and smbd. It was updated last week from yum, so should have all the
major updates available, including the latest kernel. It has a 3ware
controller that supports an external disk array, currently with an
11-disk RAID6 under lvm. During pre-deploy burn-in I didn't find any
problems.

While under seemingly no load, the actual load is reported at around 4:

20:55:03 up 4 days, 23:14, 1 user, load average: 3.95, 3.96, 3.91

The previous server would not get that high unless there was significant
disk activity. But at the time I took this reading, there was little disk
activity on the filesystem, and both nfsd and smbd were almost always
in S state. And, somewhat surprisingly, I can notice no performance
degradation in either reads or writes despite the modestly high load;
on the old fileserver I would see a slight speed hit on disk accesses
when the load got this high.

top doesn't show anything unusual, and no process is generally using
more than 2% of the CPU. One example (sorted by CPU, first few lines):

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
1759 root 20 0 0 0 0 D 1.5 0.0 6:42.60 xfsaild/dm-3
1 root 20 0 19352 1600 1284 S 0.0 0.0 0:01.58 init
2 root 20 0 0 0 0 S 0.0 0.0 0:00.09 kthreadd
3 root RT 0 0 0 0 S 0.0 0.0 0:02.08 migration/0
4 root 20 0 0 0 0 S 0.0 0.0 0:33.47 ksoftirqd/0

The xfsaild daemons (four of them, one for each XFS-mounted filesystem,
I am guessing) are generally in D state, but web searches on this have
usually pointed to a concurrent issue (e.g., other XFS daemons in R
state instead of S), which this box isn't experiencing. I can't find
any other reports of xfsaild in D state being a problem.
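
For what it's worth, one thing I can try is looking at where those
threads are parked in the kernel. A rough sketch (assuming
/proc/<pid>/stack is available on this kernel):

for pid in $(pgrep xfsaild); do
    echo "== xfsaild pid $pid =="
    cat /proc/$pid/stack     # kernel stack: shows what the thread is sleeping in
done

# alternatively, dump all blocked (D-state) tasks to the kernel log:
echo w > /proc/sysrq-trigger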

powertop reports the following; nothing seems out of the ordinary:

Cn Avg residency
C0 (cpu running) ( 1.6%)
polling 0.0ms ( 0.0%)
C1 mwait 0.3ms ( 0.0%)
C2 mwait 0.9ms ( 0.1%)
C3 mwait 0.6ms ( 0.0%)
C4 mwait 22.0ms (98.4%)
P-states (frequencies)
2.40 Ghz 0.0%
2.31 Ghz 0.0%
2.21 Ghz 0.0%
2.10 Ghz 0.0%
2.00 Ghz 0.0%
1.91 Ghz 0.0%
1.80 Ghz 0.0%
1.71 Ghz 0.0%
1.60 Ghz 0.0%
1500 Mhz 0.0%
1400 Mhz 0.0%
1300 Mhz 0.0%
1200 Mhz 100.0%
Wakeups-from-idle per second : 45.9 interval: 15.0s
no ACPI power usage estimate available
Top causes for wakeups:
45.0% (641.7) <interrupt> : extra timer interrupt
21.6% (307.3) <kernel core> : hrtimer_start (tick_sched_timer)
7.0% (100.0) xfsaild/dm-1 : xfsaild (process_timeout)
7.0% (100.0) xfsaild/dm-2 : xfsaild (process_timeout)
7.0% ( 99.7) xfsaild/dm-0 : xfsaild (process_timeout)
7.0% ( 99.5) xfsaild/dm-3 : xfsaild (process_timeout)
1.3% ( 18.3) usbhid-ups : hrtimer_start_range_ns (hrtimer_wakeup)
1.2% ( 17.1) <kernel core> : hrtimer_start_range_ns (tick_sched_timer)
0.5% ( 6.9) <interrupt> : ahci
0.3% ( 4.7) <interrupt> : ehci_hcd:usb2
0.3% ( 4.0) USB device 2-1.1 : Smart-UPS 3000 RM FW:666.6.D USB FW:2.4 (American Power Conversion)

[snip]

The only difference I can find so far is the number of nfsd processes:
the new box is currently running only 8, whereas the old box ran 32. I
do plan on raising this, and perhaps that will resolve the issue, but
I'd expect to see more nfsd processes in D state if that were the
problem; every time I look they are almost always in S. (Even when I'm
writing a large file, I can only rarely catch an nfsd moving into D
state.)
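
For the record, on CentOS 6 I believe the nfsd thread count is set via
RPCNFSDCOUNT in /etc/sysconfig/nfs, so the change would look roughly
like this:

# in /etc/sysconfig/nfs
RPCNFSDCOUNT=32

# then either restart the service:
service nfs restart
# or bump the thread count on the fly:
rpc.nfsd 32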

So: what have I missed, and what else can I check to give me more
information? Or should I not be so worried about a high load that's not
materially impacting the system?

--keith

--
kkeller...@wombat.san-francisco.ca.us
(try just my userid to email me)
AOLSFAQ=http://www.therockgarden.ca/aolsfaq.txt
see X- headers for PGP signature information

Richard Kettlewell

Nov 22, 2012, 5:19:39 AM
Keith Keller <kkeller...@wombat.san-francisco.ca.us> writes:
> While under seemingly no load, the actual load is reported at around 4:
>
> 20:55:03 up 4 days, 23:14, 1 user, load average: 3.95, 3.96, 3.91
>
[...]
> The xfsaild daemons (four of them, one for each XFS-mounted filesystem,
> I am guessing) are generally in D state, but web searches on this have
> usually pointed to a concurrent issue (e.g., other XFS daemons in R
> state instead of S), which this box isn't experiencing. I can't find
> any other reports of xfsaild in D state being a problem.

Four processes (actually kernel threads in this case, I think) in ‘D’
certainly explains a load average of nearly four. Why they’re in that
state I couldn’t say.
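
A quick way to see the correspondence is to count the runnable and
uninterruptible tasks and compare with the load average, e.g. (just a
sketch):

ps -eo stat,comm | awk '$1 ~ /^[RD]/' | sort | uniq -c   # tasks counted into the load average
cat /proc/loadavg                                        # 1/5/15-minute averages

If the four xfsaild threads are the only D entries, the ~4 follows
directly.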

--
http://www.greenend.org.uk/rjk/

Sylvain Robitaille

Nov 22, 2012, 11:09:33 AM
On Thu, 22 Nov 2012 10:19:39 +0000, Richard Kettlewell wrote:

> Four processes (actually kernel threads in this case, I think)
> in ‘D’ certainly explains a load average of nearly four.

I agree. Keith, I'm reasonably certain you want to be looking more
closely there. Although I don't use XFS and so can't claim to have any
expertise with it, it doesn't seem to me as though this condition would
be considered "normal".

--
----------------------------------------------------------------------
Sylvain Robitaille s...@encs.concordia.ca

Systems analyst / AITS Concordia University
Faculty of Engineering and Computer Science Montreal, Quebec, Canada
----------------------------------------------------------------------

Keith Keller

Nov 22, 2012, 9:15:27 PM
On 2012-11-22, Sylvain Robitaille <s...@alcor.concordia.ca> wrote:
> On Thu, 22 Nov 2012 10:19:39 +0000, Richard Kettlewell wrote:
>
>> Four processes (actually kernel threads in this case, I think)
>> in ‘D’ certainly explains a load average of nearly four.
>
> I agree. Keith, I'm reasonably certain you want to be looking more
> closely there. Although I don't use XFS and so can't claim to have any
> expertise with it, it doesn't seem to me as though this condition would
> be considered "normal".

Thanks Richard and Sylvain--that was my hunch as well. I will check
with the XFS mailing list.

sergei.j...@gmail.com

Nov 23, 2012, 4:42:41 AM
I'm experiencing the same issue after a kernel upgrade to 2.6.32-279.14.1.el6.x86_64:
high load in uptime and top, but the server works fine. iostat and vmstat help confirm that.
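
For example, something along these lines:

vmstat 5 3      # "b" column = tasks blocked in uninterruptible sleep
iostat -x 5 3   # %util shows whether the disks are actually busy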

Keith Keller

Nov 23, 2012, 3:11:58 PM
On 2012-11-23, sergei.j...@gmail.com <sergei.j...@gmail.com> wrote:
> I'm experiencing the same issue after a kernel upgrade to 2.6.32-279.14.1.el6.x86_64:
> high load in uptime and top, but the server works fine. iostat and vmstat help confirm that.

My understanding is that a patch in recent RHEL/CentOS kernels fixed a
bug but introduced this behavior as a side effect. There is a proper
fix in the mainline kernel that hasn't made it into the RHEL/CentOS
kernels yet.

See here:

http://oss.sgi.com/pipermail/xfs/2012-November/022767.html

It's not totally clear to me whether there is actually a problem, or
whether it's safe to ignore the higher load until there's a patch. You
could probably downgrade if you wanted, but unless I hear otherwise I
plan on waiting, since the patches did fix some fairly serious bugs.
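
If you did want to go back, the older kernels are normally still
installed alongside the new one; a quick sketch (assuming the stock
grub setup on CentOS 6):

rpm -q kernel                      # list installed kernel packages
grep ^title /boot/grub/grub.conf   # boot entries; set "default=" to the older one, or pick it at boot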

sergei.j...@gmail.com

Nov 25, 2012, 6:16:15 AM
My personal choice was to migrate to ext4