How to minimize interrupt latency using interrupt affinity in Vista

pgruebele

Nov 11, 2008, 9:31:00 AM

I have a PCI device for which I wrote a standard PnP WDM driver. This is a
timer device, and on each timer interrupt the ISR needs to do only a very short
amount of processing: no DPCs need to be queued, and only the PCI board itself
is read/written in the ISR.

My requirement is that the ISR be called within 0.2 ms of the hardware
interrupt. However, I am seeing jitter of up to at least 1 ms (the ISR gets
called up to 1 ms after the hardware interrupt). I understand that Vista (with
the latest updates) is not an RTOS, that other ISRs can take up to 0.25 ms
(according to tracelog), and that under load several of these other ISRs may
be serviced before my driver's ISR. But I did the following to make sure that
my ISR runs on a processor core that is never used by other ISRs:

1. I used the interrupt affinity policy tool to change the processor
affinity for all device drivers (except mine) to 0x3 (this is on a Q6600
4-core system).

2. I used the affinity policy tool to set my driver's affinity to 0x4

3. Using tracelog, I verified that under system load, only cores 0 & 1 are
used by other drivers' interrupts and that only core 2 is used by my ISR.
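For reference, an interrupt affinity mask is a bit mask in which bit n selects zero-based processor n, so 0x3 selects processors 0 and 1 and 0x4 selects processor 2 (the third core). A minimal portable C sketch of that decoding follows; the helper names `affinity_allows` and `print_affinity` are hypothetical and not part of any Windows API.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: nonzero if zero-based processor `cpu` is set in `mask`. */
int affinity_allows(uint64_t mask, unsigned cpu)
{
    assert(cpu < 64);                 /* a 64-bit mask covers CPUs 0..63 */
    return (int)((mask >> cpu) & 1u);
}

/* Hypothetical helper: prints the processors selected by `mask`. */
void print_affinity(const char *label, uint64_t mask, unsigned ncpus)
{
    printf("%s 0x%llx ->", label, (unsigned long long)mask);
    for (unsigned cpu = 0; cpu < ncpus && cpu < 64; ++cpu)
        if (affinity_allows(mask, cpu))
            printf(" CPU%u", cpu);
    printf("\n");
}
```

On the 4-core Q6600, `print_affinity("other drivers", 0x3, 4)` prints `other drivers 0x3 -> CPU0 CPU1` and `print_affinity("timer driver", 0x4, 4)` prints `timer driver 0x4 -> CPU2`.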

In spite of this, I am getting large jitter in ISR latency for my driver (up
to more than 1 ms) under system load (using HD, Direct3D, etc.), even though my
device is the only device generating interrupts on core 2!

I have the following questions:

1. Does anyone have any idea why this might be the case?

2. On another note, even though I set affinity to 0x3 for all drivers except
mine, they only generate interrupts on core 0 (instead of cores 0 & 1). Why
is that?

3. How can one change Irql and SynchronizeIrql for the interrupt? The
documentation only states that these values are supposed to be taken from
CM_PARTIAL_RESOURCE_DESCRIPTOR. What if I want my IRQL to be very high so
that my driver's ISR will preempt other ISRs even if they are running on the
same CPU core? If this is possible, then I would not have to prevent all
devices except mine from generating interrupts on core 2.

Thanks

Philip

Eliyas Yakub [MSFT]

Nov 12, 2008, 2:54:38 PM

1) I suggest using the new Xperf tool from Microsoft
(http://msdn.microsoft.com/en-us/library/cc305187.aspx) to analyze perf
issues.
2) You cannot configure IRQL value of your interrupt. You have to work with
whatever system assigns for your device.
3) I have forwarded this message to some people I know. I will let you know if
they have any ideas.

-Eliyas

pgruebele

Nov 12, 2008, 11:56:01 PM

Thanks Eliyas.

Xperf is pretty nice, but it is missing lots of the DPC/ISR details that
tracelog provides. Actually, it would be VERY useful if the CPU or ISR usage
in Xperf included context switches due to interrupts and DPCs (so that
one can see nested interrupts, etc.).

I'm looking forward to getting more information from you once your sources
get back to you.

Regards

Philip

Eliyas Yakub [MSFT]

Nov 13, 2008, 10:55:36 PM

Xperf has both the DPC and ISR info and is a superset of what tracelog
provides. It also exposes every single context switch including those due to
DPCs/ISRs.

This is the official forum for discussing windows performance tools:
http://social.msdn.microsoft.com/Forums/en-US/wptk_v4/threads

-Eliyas

pgruebele

Nov 14, 2008, 10:51:03 AM

Thanks.

I moved my xperf post to that forum...

Still looking forward to hearing from you and your sources :-)

Eliyas Yakub [MSFT]

Nov 14, 2008, 3:12:35 PM

I will summarize the discussion that I witnessed between two engineers.

-----------

We are not able to clearly explain the reason for this jitter. If your
device is the only interrupting device on processor 3, it should have very
low interrupt latency. Setting affinity to 0x3 allows the hardware to
select between core 0 and core 1 at the time of each interrupt. The
hardware is supposed to choose the “lowest priority” processor but the
algorithms it uses differ from machine to machine. In many cases, core 0 is
the tie-breaker and lots of interrupts end up on core 0.
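To make the tie-breaking point concrete, here is a deliberately simplified model of lowest-priority delivery as a C sketch: among the processors allowed by the mask, pick the one with the lowest current priority, and break ties in favor of the lowest-numbered processor. This is an assumption for illustration only; as stated above, the real arbitration differs from machine to machine, and `pick_lowest_priority_cpu` is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>

/* Simplified, hypothetical model of "lowest priority" delivery.
 * task_priority[i] is the current priority of zero-based processor i.
 * Returns the chosen processor, or -1 if `mask` selects none. */
int pick_lowest_priority_cpu(uint64_t mask, const unsigned *task_priority,
                             unsigned ncpus)
{
    int best = -1;

    assert(ncpus <= 64);
    for (unsigned cpu = 0; cpu < ncpus; ++cpu) {
        if (!((mask >> cpu) & 1u))
            continue;                       /* not in the affinity mask */
        /* Strict '<' keeps the lower-numbered processor on a tie. */
        if (best < 0 || task_priority[cpu] < task_priority[best])
            best = (int)cpu;
    }
    return best;
}
```

With all processors idle and a mask of 0x3, this model always returns processor 0, which matches the pattern Philip observed; processor 1 is chosen only when processor 0 is busier.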

The best answer we can come up with for why this is happening is that there
is some other kernel activity going on – kernel mode workers that are
disabling interrupts, significant IPI traffic, etc. Maybe from some HD
software? Xperf traces are a good place for you to start. If you are
running on x86, the latency impact of this kind of activity is probably
higher than if you were running 64-bit.

----------

So my suggestion would be that you use Xperf to figure out what's going on.
If you can't do that yourself, contact MS DDK support and give them the Xperf
log file; they will be able to work with the kernel engineer to analyze
the trace and identify the problem.

Good Luck.

-Eliyas

pgruebele

Nov 14, 2008, 3:34:04 PM

Thanks for the info.

The thing is that all interrupts appear to go to CPU0 and most hardware
device DPCs appear to go to CPU0 as well, so there should not be a whole lot
of IPIs, right?

The only DPCs that go to all cores are those from tcpip.sys and usbport.sys,
both of which are very short and infrequent and should not cause this type of
latency even if they cause IPIs.

There must be something else going on... any ideas would be appreciated...

Thanks

Philip Gruebele

pgruebele

Nov 18, 2008, 1:48:02 PM

Hi Eliyas.

I just want to confirm some assumptions related to this interrupt
latency issue I am having:

1. Each processor/core handles interrupts independently of the others.

2. If my device is the only device generating interrupts on core 2 (other
devices are on core 0), then no matter what my device's IRQL is set to, its
interrupt should always be serviced immediately, since no other interrupts of
higher IRQL should ever be running on core 2.

3. If a device driver ISR or DPC temporarily disables interrupts or performs
other similar actions, it will only disable them for the core that it is
running on (core 0), and should have no effect on my ISR running on core 2.

4. After my device generates an interrupt but before the kernel actually
calls my ISR, does the kernel try to acquire any spin locks that could
explain this latency?

Thanks in advance

Philip Gruebele

pgruebele

Nov 18, 2008, 4:28:00 PM

I have some further data which shows a concrete example of unexplainable
incidents of excessive interrupt jitter.

Running Xperf gives me the exact sequence of all system interrupts. With it
and Excel, I am able to look at the time delta between each of my interrupts
(nirlpk driver, the only one interrupting on CPU core 2). I ran a lot of
disk-intensive file searches and managed to make my interrupt get skipped
altogether. The table below shows this. nirlpk is my driver and it is the
ONLY one generating interrupts on core 2. Most of the nirlpk interrupts (not
shown in table) are almost exactly 1.25ms apart as they should be (this is
the period at which the hardware generates interrupts). However, the 2
nirlpk ISR calls in this part of the xperf table are 2.86ms apart. Note that
there is no ISR activity between 28409.61188 and 28411.48736, so there is
absolutely no reason why my interrupt should have been called so late.

DRIVER          Core   ISR enter (ms)   ISR exit (ms)
_____________________________________________________
nirlpk.sys        2    28408.6198       28408.63592
USBPORT.SYS       0    28408.66316      28408.66808
HDAudBus.sys      0    28408.68044      28408.69992
USBPORT.SYS       0    28409.1872       28409.1902
HDAudBus.sys      0    28409.19404      28409.2018
ubohci.sys        0    28409.21444      28409.21864
ubohci.sys        0    28409.22256      28409.22848
ubohci.sys        0    28409.2426       28409.24764
USBPORT.SYS       0    28409.32928      28409.33412
HDAudBus.sys      0    28409.33704      28409.34732
ubohci.sys        0    28409.59304      28409.59644
ubohci.sys        0    28409.5994       28409.6024
ubohci.sys        0    28409.605        28409.61188
nirlpk.sys        2    28411.48736      28411.50836
ubohci.sys        0    28411.48976      28411.50056
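The gap analysis described above (subtracting consecutive nirlpk.sys ISR entry times) can also be scripted instead of done in Excel. A minimal C sketch, assuming the entry times have already been exported from the trace into an array of milliseconds; `count_late_isrs` is a hypothetical helper, and the tolerance is a parameter you choose:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: counts consecutive ISR entry times (ms) whose gap
 * exceeds period_ms + tolerance_ms, and stores the largest gap seen in
 * *max_gap_ms if that pointer is non-NULL. */
size_t count_late_isrs(const double *enter_ms, size_t n, double period_ms,
                       double tolerance_ms, double *max_gap_ms)
{
    size_t late = 0;
    double max_gap = 0.0;

    assert(enter_ms != NULL || n == 0);
    for (size_t i = 1; i < n; ++i) {
        double gap = enter_ms[i] - enter_ms[i - 1];
        if (gap > max_gap)
            max_gap = gap;
        if (gap > period_ms + tolerance_ms)
            ++late;            /* later than one period plus tolerance */
    }
    if (max_gap_ms != NULL)
        *max_gap_ms = max_gap;
    return late;
}
```

Fed the two nirlpk.sys entries from the table (28408.6198 and 28411.48736) with a 1.25 ms period and a 0.2 ms tolerance, it reports one late ISR and a maximum gap of about 2.87 ms, more than two timer periods, which is consistent with at least one interrupt having been skipped.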

How is this explained? These skipped or delayed ISR calls happen much more
frequently with system activity. Yet the table above shows that there was no
ISR activity before my delayed ISR call at 28411.48736. So, even if my ISR
were being called on core 0 like the other ISRs, there would be no reasonable
explanation for why my ISR is called so late...

Why is the kernel taking so long to call my ISR? Once again, the ISR does
not do any serious processing. All it does is acknowledge the interrupt and
check for interrupt overrun (which it got in the example above). I can
therefore say with some certainty that the problem must lie with either the
kernel or interrupt hardware (modern Q6600 680Sli ACPI machine).

Regards

Philip Gruebele

Pavel A.

Nov 18, 2008, 4:39:44 PM

pgruebele wrote:
> I have some further data which shows a concrete example of unexplainable
> incidents of excessive interrupt jitter.
>
> Running Xperf gives me the exact sequence of all system interrupts. With it
> and excel, I am able to look at the time delta between each of my interrupts
> (nirlpk driver, the only one interrupting on cpu core 2). I ran a lot of
> disk intensive file searches and managed to make my interrupt get skipped
> altogether. The table below shows this. nirlpk is my driver and it is the
> ONLY one generating interrupts on core 2. Most of the nirlpk interrupts (not
> shown in table) are almost exactly 1.25ms apart as they should be (this is
> the period at which the hardware generates interrupts). However, the 2
> nirlpk ISR calls in this part of the xperf table are 2.86ms apart. Note that
> there is no ISR activity between 28409.61188 and 28411.48736, so there is
> absolutely no reason why my interrupt should have been called so late.

>.........

Xperf doesn't see SMIs, right?

--PA

Scott Noone

Nov 18, 2008, 4:58:27 PM

"I can therefore say with some certainty that the problem must lie with
either the kernel or interrupt hardware "

Or with some other piece of software. Don't forget that there is the CLI
instruction that will disable maskable interrupts on the processor. This is
used in various parts of the kernel and in some exported APIs (the
ExInterlockedXxx package comes to mind). I'd find it slightly unusual for a
driver to disable interrupts on the processor with any kind of regular
frequency, though it wouldn't surprise me.

Also, the xperf output shown doesn't seem to take into account system
management interrupts (SMIs) or the clock interrupt. These would also delay
your ISR from running.

This doesn't really help get you a solution, of course. But I guess the
moral is that Windows is not real time and even though you've affinitized
all your interrupts to a particular processor you still don't own that
processor. Other things can (and will) thwart your attempts to get
consistent results.

-scott

--
Scott Noone
Software Engineer
OSR Open Systems Resources, Inc.
http://www.osronline.com

pgruebele

Nov 18, 2008, 5:25:01 PM

Thanks

The thing is that:

1. Doing disk activity such as searching for files greatly increases this
interrupt jitter, but that would not cause an increase in SMIs, right? So SMIs
don't appear to be the culprit. Other things, like TCP traffic, seem to have a
similar effect to disk activity.

2. CLI/IF is processor/core specific, right? If all ISRs except for mine run
on core 0, then they can only disable interrupts for that core... So other
drivers can't really be causing this (I have verified 100% that only my
driver raises interrupts on core 2).

3. The jitter can reach more than 1.6 ms, which seems like an awfully long
time. Since it happens mainly with system load (not just CPU load, since
interrupts have priority over all threads...), device driver ISR/DPC
activity must somehow be causing this indirectly. The question is why?

My application is actually soft-realtime so it can cope with these timing
errors OK. The problem is that when the system gets loaded, this interrupt
jitter becomes so large and frequent that it causes me to have to re-measure
too much data. I don't expect hard realtime performance. I just want to
understand why things are behaving as badly as they are given the
configuration I created...

Thanks

Philip

Scott Noone

Nov 18, 2008, 6:57:16 PM

>So other
> drivers can't really be causing this (I have verified 100% that only my
> driver raises interrupt on core 2).

As long as threads can still be scheduled on the processor then there can be
activity on that processor. Are you also changing the affinity of all
threads so that nothing is scheduled on that proc?

Out of curiosity, which O/S is this?

-scott

pgruebele

Nov 18, 2008, 8:11:00 PM

This is Vista with the latest bits.

It should not be necessary to change thread/process affinity, since
interrupts preempt all threads, including realtime priority threads. It is
my understanding that no matter what threads are running on the system,
interrupts are serviced at a level that is higher than any scheduler-managed
thread...

Philip

"Scott Noone" wrote:

> As long as threads can still be scheduled on the processor then there can be
> activity on that processor. Are you also changing the affinity of all
> threads so that nothing is scheduled on that proc?
>
> Out of curiosity, which O/S is this?

Scott Noone

Nov 18, 2008, 8:27:32 PM

You're correct, your interrupt will interrupt any thread executing on the
processor. However, the overhead of scheduling and dispatching on the
processor could delay your interrupt from being delivered.

-scott

pgruebele

Nov 18, 2008, 9:16:00 PM

Could it delay it by 1.5ms though? All the interrupt does is store the
processor context on the stack and call the interrupt vector - no matter how
many threads are running. Even if the system idle thread is running the
result should be the same as when there are many high priority threads
scheduled to run...

Philip

Pavel A.

Nov 18, 2008, 10:00:16 PM

How do you measure the time of the actual hardware interrupt
and when the ISR runs: with some external timer or with the Windows time
(whatever it is)?
--PA

pgruebele

Nov 18, 2008, 10:21:01 PM

The board that is generating the interrupt is an NI6602 PCI timer board. So,
the ISR simply reads the timer register to see how long ago the timer
actually rolled over and caused an interrupt. This means that the only
overhead in the ISR is a few 32-bit PCI bus reads. I played around with the
maximum PCI Express transfer length (reducing it from the default 4096), but
that made no difference. Also, xperf seems to be seeing the same jitter as I
am, and my ISR is always very short (<20 us, ~5 us), so that rules out PCI bus
contention problems. Originally I thought PCI bus contention was causing
my ISR's PCI bus reads to take a really long time, but that's not the case.
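The measurement Philip describes (reading the timer register in the ISR to see how long ago the counter rolled over) reduces to simple arithmetic. A C sketch under stated assumptions: the counter is assumed to count down from a reload value at a fixed timebase frequency and to interrupt when it reloads. The NI6602's actual register layout and counting direction are not reproduced here, and `isr_latency_ns` is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper. Assumes a down-counter that starts at `reload`,
 * ticks at `timebase_hz`, and raises the interrupt when it reloads.
 * `current` is the value read inside the ISR. Returns the time between
 * the rollover and the register read, in nanoseconds. */
uint64_t isr_latency_ns(uint32_t reload, uint32_t current, uint32_t timebase_hz)
{
    assert(timebase_hz != 0);
    assert(current <= reload);        /* otherwise not a down-counter read */
    uint64_t elapsed_counts = (uint64_t)(reload - current);
    /* counts * 1e9 / Hz fits in 64 bits for any 32-bit count. */
    return elapsed_counts * 1000000000ull / timebase_hz;
}
```

For example, with an assumed 80 MHz timebase, a 1.25 ms period corresponds to a reload value of 100000 counts; reading 84000 in the ISR would mean 16000 counts, or 200 us, had elapsed since the rollover. Note that a single read cannot tell whether the counter has already reloaded a second time, which is why an overrun check like the one Philip's ISR performs is still needed.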

Anyway, I'm pretty sure that xperf and my ISR timer data are reporting
correct data to me... I just can't find a reasonable explanation for why
this is happening...

I also believe that about 2-3 months ago this was all working perfectly, with
jitter <100 us. I don't know why this is happening now. I can't be 100% sure
that this was working, but I have a funny feeling that perhaps some Vista
"reliability" fixes that MS released in the last few months might have
changed something that causes this jitter... This is just a hunch and may
well be wrong. Food for thought...

Philip

pgruebele

Nov 18, 2008, 10:29:01 PM

I also tried rolling back all driver installations to see if any driver could
be misbehaving. This did not help. The thing is that pretty much any
activity on the computer (network, disk, graphics, even CPU) seems to increase
this jitter considerably. It does not seem to be tied to any individual
driver/hardware.

Philip

pgruebele

Nov 18, 2008, 10:33:01 PM

One difference is that the timer board is now on interrupt 18 instead of 10
(I think that's where it used to be). I don't think the IRQ should have any
effect on what is happening...

Pavel A.

Nov 19, 2008, 6:40:54 AM

pgruebele wrote:
> I also tried rolling back all driver installations to see if any driver could
> be misbehaving. This did not help. The thing is that pretty much any
> activity on the computer (network, disk, graphics, even cpu) seem to increase
> this jitter considerably. It does not seem to be tied to any individual
> driver/hardware.
>

Then maybe this is the effect of IPIs or something like that.
xperf won't see these.

--PA

Scott Noone

Nov 19, 2008, 9:51:52 AM

> Could it delay it by 1.5ms though?

I would say it could delay it N units. The system is non-deterministic,
there are way too many factors here.

>All the interrupt does is store the processor context on the stack and call
>the interrupt vector - no matter how many threads are running.

I guess I'm not making the point clear on this one. If there is a thread
actively executing on the processor at an IRQL < your device IRQL, then
sure, the thread is interrupted and your ISR runs. However, during that
thread's time slice it could do things to prevent your ISR from being
delivered, notably CLI or raising the IRQL.

You also have the clock, SMIs (which are used for all kinds of weird
purposes, including fixing platform issues), IPIs (cross processor
scheduling, TLB shootdown), and whatever overhead xperf has (meant to be
minimal, but it's there).

If it were my job to track down this latency, I'd try this across more
systems with different processors and see if there was a difference. Also, I
think Eliyas recommended trying this on x64, which would also be
interesting.

If this is business critical, I'd recommend ditching the software-based
efforts and getting an Arium: http://www.arium.com/. That should help get a
much better picture as to how the processor is spending its time during the
delay.

HTH and good luck!

-scott

pgruebele

Nov 20, 2008, 1:16:05 PM

Only kernel threads could use CLI or raise IRQL high enough to mask my
interrupt, right? And they would have to run on the same CPU core in order to
have any effect as well, right?

Are there any software tools that will measure SMI, IPI, clock, and TLB
shootdown occurrences? I can't find any Windows-related info on TLB
shootdown...

Anyway, I am doing a clean Vista install with SP1 but no further updates to
see if this problem really did start with some of the newer updates or
perhaps some driver update that I am not aware of.

I will report back on my findings...

-Philip

pgruebele

Nov 20, 2008, 1:34:06 PM

An easy solution would be to hijack a logical processor myself and just use
busy polling. This would eat an entire logical processor, but I would think
that it would work.

If I use a CPU with Hyper-Threading, such as the new 4-core i7, each core has
2 logical processors. Each of those logical processors has its own interrupt
registers and related hardware. If I hijack one of those logical processors
and use it to busy-poll my timer, so that I remove most of the jitter problem,
I wonder whether this busy polling will consume the entire core or whether it
will be able to coexist peacefully with another thread running on the same
processor core... I guess this is an Intel architecture question. The
busy-polling thread would loop around a few simple integer instructions, so it
seems like this might work...

Just a thought. With core and Hyper-Threading counts increasing the way they
are, this seems like a better solution than adding an RTOS kernel to Vista or
trying to figure out how to resolve the interrupt jitter problem, right?
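The busy-polling idea can be sketched as a loop that spins on the counter and treats an upward jump of a down-counter as a rollover. This is a minimal C sketch under assumptions: `read_counter` stands in for whatever reads the board's counter register, the simulated counter exists only so the loop can be tried without hardware, and pinning the thread to a dedicated logical processor is OS-specific and not shown. All names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical callback returning the board's current down-counter value. */
typedef uint32_t (*read_counter_fn)(void *ctx);

/* Spins until `rollovers` counter reloads have been seen. A down-counter
 * that reads higher than on the previous poll has reloaded, i.e. one timer
 * period has elapsed (assumes a nonzero reload value and that the poll
 * never misses a whole period). Returns the number of reads performed. */
unsigned long poll_for_rollovers(read_counter_fn read_counter, void *ctx,
                                 unsigned rollovers)
{
    assert(read_counter);
    uint32_t prev = read_counter(ctx);
    unsigned long reads = 1;

    while (rollovers > 0) {
        uint32_t now = read_counter(ctx);
        ++reads;
        if (now > prev)
            --rollovers;       /* a real loop would handle the tick here */
        prev = now;
    }
    return reads;
}

/* Simulated down-counter, only for trying the loop without hardware:
 * each read decrements by one and reloads to `reload` after reaching 0. */
struct sim_counter { uint32_t value, reload; };

uint32_t sim_read(void *ctx)
{
    struct sim_counter *c = ctx;
    c->value = (c->value == 0) ? c->reload : c->value - 1;
    return c->value;
}
```

On the coexistence question: Intel recommends putting a PAUSE instruction (`_mm_pause()`) in spin-wait loops so that the spinning logical processor gives up execution resources to its Hyper-Threading sibling; it is left out here to keep the sketch portable.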

-philip

Scott Noone

Nov 20, 2008, 3:25:13 PM

>Only Kernel threads could use CLI or raise IRQL high enough to mask my
>interrupt right?

When a thread is running in user mode, it can invoke a system service. That
will transition the mode of the thread from the least privileged (user) to the
most privileged (kernel) and begin executing inside the O/S. Various device
drivers may be called while processing the system service, and they may do
either of the above. The O/S itself may also do either of the above during
this process.

Page faults can come into play here too, the O/S may invoke the storage
stack to bring pages in and that will involve everything from the file
system all the way down to the storage controller (and any filters above or
in between).

> Are there any software tools that will measure SMI, IPI, CLOCK, and TLB
> shootdown occurences?

Not that I'm aware of, though I've never had to so I've never gone looking.

-scott

pgruebele

Nov 20, 2008, 8:01:03 PM

On a general note, is it reasonable to have over 1.5ms latency for an
interrupt call? Wouldn't some hardware devices have problems with so much
latency? We are talking millions of CPU instructions here, an eternity in
hardware terms...
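To put the numbers in this thread in hardware terms: the Q6600 runs at a stock 2.4 GHz, i.e. 2400 cycles per microsecond, so a latency converts to core clock cycles with a single multiplication. A trivial C sketch (`latency_to_cycles` is a hypothetical helper; clock changes from power management are ignored):

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: converts a latency in microseconds to core clock
 * cycles at a fixed clock of `clock_mhz` (1 MHz = 1 cycle per microsecond). */
uint64_t latency_to_cycles(uint64_t latency_us, uint64_t clock_mhz)
{
    /* Guard against 64-bit overflow of the product. */
    assert(clock_mhz == 0 || latency_us <= UINT64_MAX / clock_mhz);
    return latency_us * clock_mhz;
}
```

`latency_to_cycles(1500, 2400)` gives 3,600,000 cycles for the 1.5 ms worst case, against 480,000 cycles for the 0.2 ms budget from the first post.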

-philip

Scott Noone

Nov 21, 2008, 11:00:04 AM

I could say what I *feel*, but that wouldn't be an engineering answer since
I've never taken the time to measure this across a variety of platforms. I'm
sure someone else here has at some point, so maybe someone can give a more
informed opinion. If not, I'm sure your findings will be interesting.

-scott