Our server (Sun Netra 1280, Solaris 2.8) always has more than 20% iowait. It
looks really bad. I am trying to find a way to identify what causes the high
iowait.
Any idea will be greatly appreciated. Thanks in advance!
Evan
> Our server (Sun Netra 1280, Solaris 2.8) always has more than 20% iowait. It
> looks really bad.
I/O wait is not always a problem. Why do you think it is bad in this
case?
> I am trying to find a way to identify what causes the high iowait.
Whenever your CPU has some idle time, and at least one thread has an
outstanding I/O call, you'll accumulate I/O wait time.
If you have very little CPU needs but a lot of I/O needs (think of a
database serving lots of simple queries), then you'd probably see lots
of iowait time.
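If you want to see it happen, here is a sketch (adjust the device path
for your box): start one synchronous reader and watch mpstat's wt column
climb while usr/sys stay near zero:

    dd if=/dev/rdsk/c0t0d0s0 of=/dev/null bs=512 &    # one slow reader
    mpstat 5                                          # wt rises, usr/sys ~0

The machine is doing almost nothing, yet the idle time shows up as wait
simply because a read is outstanding.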
--
Darren Dunham ddu...@taos.com
Senior Technical Consultant TAOS http://www.taos.com/
Got some Dr Pepper? San Francisco, CA bay area
< This line left intentionally blank to confuse you. >
Why is it that folks *always* assume that IOWait is bad?
I wrote a doc on this a bit over a year ago, have a read of
http://sunsolve.sun.com/search/document.do?assetkey=1-9-75659-1
Because of all of the misunderstanding associated with IOwait, it is
defined to be 0 in Solaris 10.
The really important thing to take away from that document is that
iowait is a subset of idle. You only get iowait time if there is nothing
else ready to run on the dispatch queues.
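If you want to see the four buckets side by side, sar reports them
directly (the intervals here are arbitrary):

    sar -u 5 12     # columns: %usr %sys %wio %idle

Per the above, %wio + %idle together is the time the CPU had nothing
ready to run; %wio is merely the slice of it with an I/O outstanding.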
alan.
--
Alan Hargreaves - http://blogs.sun.com/tpenta
Kernel/VOSJEC/Performance Engineer
Product Technical Support (APAC)
Sun Microsystems
That's the pinnacle of wrong answers. A hard-coded zero, I mean.
Hey, scan rate doesn't mean crap either, even though I read plenty of
posts in this group from folks who say *any* scan rate is bad.
Should I assume Solaris 11 will have that hard-coded to zero too?
A loose cough doesn't mean you have pneumonia. But that doesn't mean
you should ignore it either.
Rich
If I run NCPU cpu-bound threads on a machine, I will never see
any iowait regardless of how many thousands of other threads block
waiting for I/O... and if I write a program that spawns NCPU
threads on an otherwise idle machine and those threads do nothing
but random reads from a DVD, I'll see 100% iowait on every CPU.
Because this statistic has virtually no meaning on an MP system,
iowait per CPU is no longer reported.
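A quick Bourne shell sketch of the first case, if you want to reproduce
it on a test box (remember to kill the spinners afterwards):

    N=`psrinfo | wc -l`               # number of CPUs
    i=0
    while [ $i -lt $N ]; do
        while :; do :; done &         # one cpu-bound spinner per CPU
        i=`expr $i + 1`
    done
    mpstat 5                          # watch the wt column

The moment the spinners start, wt collapses to 0, even though the
I/O-bound load never changed.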
- Bart
Have you tried "iostat" or "sar" to see which disks are handling a huge
number of I/O operations? "iostat -P" can even report on each disk
partition. Then you may be able to decide which application program causes
that, based on your knowledge of the applications on your server.
Use the service time ("svc_t" in the iostat -x report; in fact it is
response time) as an indicator of disk I/O performance. If it stays greater
than 30 ms, you may think about distributing I/O operations across
different disks, using disk mirroring, etc.
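For example (a sketch; check iostat(1M) on your release for the exact
option mix):

    iostat -x 5      # extended statistics, svc_t column per device
    iostat -xP 5     # the same, broken out per partition

Sample over a long enough interval to see the steady state rather than a
single burst.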
Regards,
Michael
"music4" <mus...@163.net> wrote in message
news:cveml7$c...@netnews.proxy.lucent.com...
Alan,
Thanks for the article. Now I understand what iowait means. But when the
CPU is occupied by a thread that is waiting for I/O, can the CPU be used
by other threads?
If not, then although the CPU is idle (doing nothing but waiting), I would
consider that wait time busy: high iowait would mean a lot of CPU time is
idle but cannot be used to run other threads. That is why I feel high
iowait is bad, and why I want to analyze why iowait is high and try to
reduce it so that more CPU time is available for other threads.
Correct me if I am wrong, please.
Evan
"music4" <mus...@163.net> wrote in message
news:cvh4mf$r...@netnews.proxy.lucent.com...
Hello?!?
Iowait means precisely that the CPU is available.
Is English your mother tongue? If not, you should
focus your efforts on learning the language
so you can read the manuals proficiently ;-)
dk
I am not a native English speaker, and I do need to improve my English. But
have you read Alan's article about how iowait is calculated? My
understanding was based on Alan's article rather than on the word "iowait".
According to Alan's article, there are four values in the CPU statistics:
idle, kernel, user and wait. When a thread is waiting for I/O, the wait
counter is incremented. But if the CPU can be occupied by another thread,
the kernel or user counter will be increased instead. Is that true?
>That's the pinnacle of wrong answers. A hard-coded zero, I mean.
The placeholder value is left there because so many tools depend
on looking at it. "0" is about as meaningful as the current value.
If you want to know about I/O, use iostat. I/O wait is really
only a measure of the relative time needed to process the data
versus the time needed to get it off disk; it is a characteristic
of the workload.
Also, when you start a CPU bound job, your I/O wait suddenly
drops to zero; yet the jobs which want the I/O are still
waiting just as much. That doesn't strike me as useful.
>Hey, scan rate doesn't mean crap either, even though I read plenty of
>posts in this group from folks who say *any* scan rate is bad.
>Should I assume Solaris 11 will have that hard-coded to zero too?
No, a scan rate is a meaningful indicator of the system being low
on memory. (Though having just seen a 2-way Opteron system stressed
to a scan rate of 5 million, I wouldn't say the raw number is all
that meaningful.)
Casper
--
Expressed in this posting are my opinions. They are in no way related
to opinions held by my employer, Sun Microsystems.
Statements on Sun products included here are not gospel and may
be fiction rather than truth.
Whatever. It's a no-win. Both values are useless.
> If you want to know about I/O, use iostat. I/O wait is really
> only a measure of the relative time needed to process the data
> versus the time needed to get it off disk; it is a characteristic
> of the workload.
If I want to know about I/O, I'll use my own tools that sort
processes by which one is generating the most I/O. Then I'll know which
process is creating the problem. iostat paints in broad strokes and
is the next-best alternative. I would, however, like to see better
per-process I/O metrics.
> Also, when you start a CPU bound job, your I/O wait suddenly
> drops to zero; yet the jobs which want the I/O are still
> waiting just as much. That doesn't strike me as useful.
If you're that CPU bound, you have a bigger issue than the I/O wait, and
you should be looking for the CPU-bound processes and what their problem
is anyway.
> No, a scan rate is a meaningful indicator of the system being low
> on memory. (Though having just seen a 2-way Opteron system stressed
> to a scan rate of 5 million, I wouldn't say the raw number is all
> that meaningful.)
>
> Casper
Scan rate is useful in the same way that I/O wait is: in the absence of
better metrics, it must suffice. A per-process average page residency
time (APRT) would be better; then the volatility of the working set of
each process could be examined. Any current measurement of APRT, at the
process or system level, is ad hoc and cannot be taken seriously.
Many think that if they see a blip in the sr column of vmstat they have
a memory shortage. First, the VM system scratching an itch does not
qualify as a shortfall. Second, the CPU power, bus bandwidth and disk
speeds of modern computers allow for a much higher scan rate, and for
longer bursts, than many will allow for. A continuous high rate
(and that's a relative measure) is indicative of memory contention. An
extremely high spike over a short period, if it's an aberration, can be
noted but not acted on. But when such high spikes occur frequently, with
the VM system settling back to quiescence in between, they can be
disruptive and should be treated with more physical memory.
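If you want to watch it, a sketch:

    vmstat 5         # sr column = pages scanned per second

And per the above, look at sr over hours, not seconds: sustained non-zero
scanning is the signal; an isolated spike is noise.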
>If I want to know about I/O, I'll use my own tools that sort
>processes by which one is generating the most I/O. Then I'll know which
>process is creating the problem. iostat paints in broad strokes and
>is the next-best alternative. I would, however, like to see better
>per-process I/O metrics.
So use dtrace; it allows you to do exactly that in S10.
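Something along these lines (a sketch using the io provider; run as
root) sums the bytes of physical I/O issued per process name and prints
the aggregation on Ctrl-C:

    dtrace -n 'io:::start { @bytes[execname] = sum(args[0]->b_bcount); }'

Add args[2]->fi_pathname to the key if you also want to see which files
each process is hitting.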