
Processor Queue Length...


Holger Thiesemann

Nov 1, 1999
Hi,

while looking for information about performance monitoring on Windows NT
computers, I found the following note (www.conitech.com/windows):

"A special note - Performance Monitor cannot display an accurate count of
Processor Queue Length unless another thread-related counter is being
watched. So, whenever you watch Processor Queue Length, be sure to watch one
of the counters listed under the Thread header. "

The obvious question: what is going on with Performance Monitor that I need
to watch another thread-related counter to get meaningful numbers?

Thanks for your input!!!

Holger

QuestionExchange

Nov 2, 1999
Hello, there!
I'm not 100% sure that this is correct, but I have done a
little research on this.
If Performance Monitor is watching the current thread to see how it is
running, then it is also monitoring itself. Performance Monitor is not part
of a system's normal, everyday load, so watching a system while Performance
Monitor is running means you are measuring the normal load on the system
PLUS Performance Monitor itself, which gives you inaccurate results.
The best way I have been told to monitor these kinds of counters is to do it
from a different NT system, so that running Performance Monitor does not
affect the performance figures of the machine being measured.
I'm sorry if this seems convoluted. It's kinda tough to
describe.
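If it helps, here is roughly what the counter paths look like; \\PRIMARYNT
below is a made-up name for the machine being measured, and you would add
the remote path from Performance Monitor (or the PDH library) running on
the other box:

  /* Counter paths, local vs. remote (\\PRIMARYNT is a placeholder for
     the machine being measured; run the monitor on a different system). */
  const char *local_path  = "\\System\\Processor Queue Length";
  const char *remote_path = "\\\\PRIMARYNT\\System\\Processor Queue Length";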
Scott

--
This answer is courtesy of QuestionExchange.com
http://www.questionexchange.com/showUsenetGuest.jhtml?ans_id=6881&cus_id=USENET&qtn_id=6883

James C. Owens

Nov 2, 1999
On Mon, 1 Nov 1999 14:30:34 +0100, "Holger Thiesemann"
<holger.t...@dzsh.landsh.de> wrote:

>[snip]
>
>"A special note - Performance Monitor cannot display an accurate count of
>Processor Queue Length unless another thread-related counter is being
>watched. So, whenever you watch Processor Queue Length, be sure to watch one
>of the counters listed under the Thread header."
>
>The obvious question: what is going on with Performance Monitor that I need
>to watch another thread-related counter to get meaningful numbers?
>
>[snip]

This is not exactly true: some of the counters under the "Objects" heading
will suffice as well, such as "Threads", which displays the number of
threads currently on the system.

BTW, it's easy to tell whether the processor queue length counter is
working correctly or not. If it is inactive, it will always display a
value of zero. A non-zero value indicates the counter is active and
working properly.

I believe the reason you have to monitor a thread-related counter to get
the processor queue length has to do with the slight kernel overhead
required to keep track of it. The queue length is constructed from the
thread states in the processor queue, and unless you are monitoring some
thread-related counter, the kernel is not going to spend the resources to
calculate it.
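For anyone who wants to watch this programmatically rather than in the
Perfmon GUI, here is a rough C sketch using the PDH library (pdh.dll, which
is redistributable for NT 4). Counter paths assume English counter names,
and most error handling is omitted:

  #include <windows.h>
  #include <stdio.h>
  #include <pdh.h>

  /* Link with pdh.lib. Adds a thread-related counter (Objects\Threads)
     alongside Processor Queue Length, per the workaround quoted above. */
  int main(void)
  {
      HQUERY   query;
      HCOUNTER queueLen, threads;
      PDH_FMT_COUNTERVALUE qv, tv;

      if (PdhOpenQuery(NULL, 0, &query) != ERROR_SUCCESS)
          return 1;

      /* Watched alone, Processor Queue Length may read a constant zero;
         the extra counter keeps thread data collection active. */
      PdhAddCounter(query, TEXT("\\System\\Processor Queue Length"), 0, &queueLen);
      PdhAddCounter(query, TEXT("\\Objects\\Threads"), 0, &threads);

      if (PdhCollectQueryData(query) == ERROR_SUCCESS) {
          PdhGetFormattedCounterValue(queueLen, PDH_FMT_LONG, NULL, &qv);
          PdhGetFormattedCounterValue(threads,  PDH_FMT_LONG, NULL, &tv);
          printf("Processor Queue Length: %ld (threads on system: %ld)\n",
                 qv.longValue, tv.longValue);
      }

      PdhCloseQuery(query);
      return 0;
  }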

To be honest, I think the overhead required on a modern PII or PIII
machine amounts to less than 0.1% processor time, so the overhead
issue is a moot point now.

Any heavy NT kernel experts out there to expand on this?

Some of my own thoughts below. (Sorry for the length, but I have quite a
bit to say on this subject.) Many of the concepts below apply to other SMP
operating systems, not just to NT.

There are varying opinions on what value of the processor queue length
constitutes a "processor bottlenecked" system, and, unfortunately, the
answer is not straightforward and involves more than the processor(s).
I use a general rule *to start* that the average processor queue length
should not exceed 2*N at a processor utilization of 80%, where N is the
number of processors in the system. Remember that the NT scheduler uses a
single processor queue to feed all processors, which is why the rule scales
with N. A tiny sketch of the rule follows; some further discussion is
warranted after that.
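In C, treating the thresholds as nothing more than the rule of thumb just
stated:

  /* Back-of-the-envelope check of the 2*N-at-80% starting rule.
     Illustration only - the thresholds are rules of thumb, not limits. */
  int looks_processor_bound(double avgQueueLen, double cpuUtilPct, int nProcs)
  {
      return cpuUtilPct >= 80.0 && avgQueueLen > 2.0 * (double)nProcs;
  }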

The average processor queue length has some correlation to processor
utilization, but it measures a different aspect: thread scheduler
bottlenecking. Consider two different dual-processor SMP systems running at
75% processor utilization. The first system has two active threads eating
up almost all of that 75%. This system will probably show an average queue
length of about 1.0, indicating that the NT scheduler is coping well with
scheduling threads to processors (as it should with only two active threads
to deal with). The second machine is running a highly threaded set of
applications with 20 active threads all competing for the processors. A
measurement of this system's processor queue length would probably show a
value of about eight, yet processor utilization is not maxed out! This
indicates that there are waiting threads, yet they are not fully utilizing
processor capability. Why?! The reason is that the kernel scheduler has
become a bottleneck to the system: with so many preemptions and so much
other overhead associated with the large number of active threads, the
scheduler cannot keep up... i.e. it can't fill all of the processors' time
with threads to execute, so the processors sit idle some of the time
waiting on threads to run.

This indicates several interrelated issues:

(1) Monitoring the "processor queue length" counter becomes ever more
important the more processors there are in an SMP system, because the
scheduler has to work harder and harder as the number of processors
increases to keep them all busy. The counter is less important than simple
processor utilization in a uniprocessor system for this same reason - the
scheduler is really not stressed hard in a single-processor system.

(2) The bandwidth available to the scheduler for the thread queuing
structures is almost always limited by L2 cache bandwidth, because the
structures are singular in nature (one common queue for all processors in
the system) and must be kept in a consistent state system-wide. If the
queue lives in L2 cache, we are talking about significant snooping on PII
and PIII systems, which have a separate L2 cache per chip. In any case,
significant pressure lands in the end not only on the backside bus but on
the frontside GTL+ bus, which carries the inter-processor traffic and
main-memory traffic needed to keep the kernel scheduling structures
coherent. The geometric increase in inter-processor bus bandwidth required
for cache coherency, coupled with the N-processor "queue velocity" of the
common processor queue (which requires ever more bandwidth to support) and
the limits on frontside bus and main memory bandwidth, shows why SMP
scalability is limited without clever inter-processor bus enhancements and
bandwidth increases. As a side note, L2 and main memory bandwidth are even
more constrained on the older Socket 7 architecture, with its common,
lower-speed, direct-mapped L2 shared by two processors and slower EDO or
FPM memory. Fortunately, very few Socket 7 SMP implementations use more
than two processors.

(3) Another related counter to look at is "context switches per second",
as this is also a measure of how hard the scheduler is working. A high rate
of context switches per second almost always corresponds to a relatively
high processor queue length.

(4) Applications *can* be over-threaded. A properly programmed
multi-threaded application creates an appropriate number of threads based
on the number of processors found in the system (see the sketch after this
list). Using *more* threads than is necessary to fully exploit processor
parallelism eats up resources, over-stresses the kernel scheduler, and can
actually *reduce* performance due to scheduler bottlenecking. This
condition is especially important to watch for on application/middle-tier
servers, which typically run highly multithreaded database engines,
applications, and web servers.
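On that last point, a hedged C sketch of sizing a worker pool from the
machine rather than hard-coding it (choose_worker_count is a made-up name):

  #include <windows.h>

  /* Sketch: derive the worker thread count from the processor count
     instead of hard-coding it, to avoid over-threading. */
  DWORD choose_worker_count(void)
  {
      SYSTEM_INFO si;
      GetSystemInfo(&si);
      /* One compute-bound worker per processor is a reasonable default;
         I/O-bound work may justify somewhat more. */
      return si.dwNumberOfProcessors;
  }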

In summary:

Situation 1: High values for context switches per second (consistently
>~5000/sec on a typical PII/PIII SMP system with two or more processors)
*and* relatively high numbers for processor queue length (consistently
greater than 2 times the number of processors) occurring *with* processor
utilization *less* than 100% indicate a kernel scheduler bottleneck. This
would most likely be resolved by reducing the number of threads contending
for processor time, or by using a motherboard with improved
inter-processor/main memory bandwidth. (Reducing the number of contending
threads is obviously a software optimization issue. A good example of
improving the MB would be to go from a 440NX server MB to the newer
Profusion-based MBs.)

Situation 2: Processor time is at 100%, with moderate values of context
switches per second and processor queue length (i.e. at or below 2*N on
average). Here the processors themselves are the bottleneck, and direct
system improvements can be achieved by using *faster* processors. If the
processor queue length is moderate, additional performance can also be
gained by adding processors to the MB (observing the usual licensing and
stepping issues), as long as the applications auto-tune, or can be manually
tuned, to make optimal use of the new number of processors.

Situation 3: Processor time is at 100%, but context switches per
second and processor queue length are also high. This situation is a
mix of 1 and 2. In this case, an increase in processor speed *may*
help. An increase in the number of processors most likely will *not*
improve performance, as this will place a higher load on the scheduler
and result in situation 1.
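To pull the three situations together, a rough C sketch of the decision
procedure; the thresholds (~5000 switches/sec, 2*N queue length) are the
rules of thumb above, not hard limits, and the function name is made up:

  /* Classify a set of readings per situations 1-3 above. Treat the
     output as a hint, not a verdict. */
  const char *diagnose(double cpuPct, double ctxSwitchesPerSec,
                       double avgQueueLen, int nProcs)
  {
      int highSwitches = ctxSwitchesPerSec > 5000.0;
      int highQueue    = avgQueueLen > 2.0 * (double)nProcs;
      int cpuMaxed     = cpuPct >= 99.5;  /* "at 100%", allowing jitter */

      if (!cpuMaxed && highSwitches && highQueue)
          return "Situation 1: scheduler bottleneck - reduce thread count "
                 "or improve inter-processor/memory bandwidth";
      if (cpuMaxed && !highSwitches && !highQueue)
          return "Situation 2: processor bound - faster (or more) processors";
      if (cpuMaxed && highSwitches && highQueue)
          return "Situation 3: mixed - faster processors *may* help; "
                 "adding processors likely will not";
      return "No clear processor bottleneck by these rules";
  }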

Sorry for the long-winded post. I hope some of this is helpful.

Regards,

James C. Owens
owe...@bellatlantic.net
james...@earthlink.net
WinNT 4.0 Server (Bld 1381: SP 6) Super P6DBU 2xPII 400 MHz 256 MB RAM
SuSE Linux 6.1 (kernel 2.2.7) Tyan S1563D 2xP55C 166 MHz 192 MB RAM
