Cycle counter?

Steven Schulze

Jul 19, 1998

Hi!

I need to be able to time very short loops accurately (As in "see how long
it takes to execute". I DON'T want to have a short-interval timer. There's
a big difference). This is important in my code, because I'm writing a DLL
that's a plug-in to a real-time multitrack HD audio recorder. The plug-in
provides audio processing functions to the main program. A single file
sometimes reaches 50MB, and this program works with multiple such files.
You can see how a slow loop that operates on each sample can severely
bog down the system.

Anyway, I tried various ways to time the loops accurately, but haven't found
anything accurate enough. The best I found is to use
QueryPerformanceCounter(), which gives me a timing resolution of 838ns,
quite small steps. The problem is not with the timer's resolution, but with
the fact that every time I do the timing, I get different results. This
varies from day to day, and appears to be connected with what mood the OS is
in on that day, and also the position of the moon.

I try to make the timing intervals small, so that my code isn't pre-empted
before the timing period is over. I also tried putting a Sleep(0), or
Sleep(10), call just before I start timing, to make sure the likelihood of
my code being pre-empted is small. I also boost the PriorityClass and
ThreadPriority to their max levels just before the timing starts, and
restore them just after. These techniques do help to some extent, but I
still get results that vary as much as 35% from one run to the next, and
from day to day. Obviously I can't trust this to tell me whether one
version of my code is 10% faster than another.

It appears to me that some low-level hardware interrupts are happening
that I have no control over.

I also looked at the "Zen Timer", but it appears as if that's for DOS only.

So, my simple question is: How can I actually know how many cycles MY code
takes to complete? If I knew that, it wouldn't matter if my code was
interrupted, because the cycle count would be for my code only.

Imagine this feature in VC++:

You compile the release build, but with debug info on. Then you
start debugging, and place a breakpoint just before the code you want to
time. When you start single-stepping, you open a debug window (just like
the Watch, Memory, etc), and there you have some options as to what type of
CPU you want to simulate, and you can also reset the cycle counter. Then,
as you single-step (or run up to the next breakpoint), the debugger adds
cycles to the counter, depending on the selected CPU. I see no
reason why it can't do this. It already knows the assembly instructions, it
just needs to be taught how long each one takes, as well as about pairing,
overlapped instructions, etc. It would then be SO easy to time critical
code, because you would know EXACTLY how many cycles a piece of code will
take to execute, and you could simulate running it on different CPUs.

When can we expect such a cool feature in VC++? Anyone have any idea? And
why would it NOT be possible to do it? Would it be possible to add a
third-party plug-in to VC++ to do this, and are there any available at this
point?

Any insight, advice, comments welcome.

Steven Schulze
Concord, CA

Felix Kasza [MVP]

Jul 20, 1998

Steven,

> Then, as you single-step [...], the debugger adds
> cycles to the counter

You have correctly summarized the problems inherent in timing small
sections of code, but I don't think your fix will work.

The time to access a *single* aligned DWORD may vary from an apparent
zero cycles (if run in parallel to, say, an FP instruction, with a
memory barrier following) to some 10 million (!) cycles (cache miss, TLB
miss, PTE faulted in from disk, data page faulted in from disk). Unless
you run this on a box with no VM, you are out of luck. Oh, and I hope
you unplugged the network card.

Even if we ignore all this, the debugger won't be able to accurately
keep track, as a single-step or other sort of interrupted flow of
execution means that the instruction queue is empty when your code is
resumed.

On the other hand, in real life, your DLL will also run in a real, busy,
system ... so maybe it would make more sense to run a test series under
heavy load and _then_ measure the total execution time, to arrive at the
slack time left on that config under such and such a load. In that case,
just run the code long enough to average out the asynchronous nature of
interrupts and the like.

--

Cheers,

Felix.

If you post a reply, kindly refrain from emailing it, too.
Note to spammers: fel...@mvps.org is my real email address.
No anti-spam address here. Just one comment: IN YOUR FACE!

Dimitris Staikos

Jul 20, 1998

Steven Schulze wrote in message ...

>I need to be able to time very short loops accurately

Did you try GetThreadTimes? This should give you the amount of time that the
thread spent executing, no matter how many times it was preempted by other
threads. Now, I'm not sure who gets charged by NT for time spent in hardware
interrupts, or exactly how and when NT charges the time (it should be doing
it either at the end of a quantum, or when the thread goes to a sleep/wait
state), but I think you ought to try this function out and see if the
results are more stable.
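
A minimal sketch of how such a measurement might look. It assumes NT for
the GetThreadTimes path; the non-Windows branch is only a stand-in that
uses the standard clock(), which reports CPU time for the whole process.
The names thread_cpu_100ns, cpu_time_of and sum_to are made up for
illustration.

```c
#include <stdint.h>
#ifdef _WIN32
#include <windows.h>
#else
#include <time.h>
#endif

/* Hypothetical helper: CPU time charged to the calling thread so far,
   in 100 ns units (the unit FILETIME uses). Returns 0 on failure. */
uint64_t thread_cpu_100ns(void)
{
#ifdef _WIN32
    FILETIME created, exited, kernel, user;
    if (!GetThreadTimes(GetCurrentThread(), &created, &exited, &kernel, &user))
        return 0;
    uint64_t k = ((uint64_t)kernel.dwHighDateTime << 32) | kernel.dwLowDateTime;
    uint64_t u = ((uint64_t)user.dwHighDateTime << 32) | user.dwLowDateTime;
    return k + u;
#else
    /* Stand-in only: clock() covers the whole process, not one thread. */
    clock_t c = clock();
    if (c == (clock_t)-1)
        return 0;
    return (uint64_t)((double)c * 1e7 / CLOCKS_PER_SEC);
#endif
}

/* CPU time consumed while work(arg) runs, in 100 ns units. */
uint64_t cpu_time_of(void (*work)(void *), void *arg)
{
    uint64_t before = thread_cpu_100ns();
    work(arg);
    return thread_cpu_100ns() - before;
}

/* Example workload (hypothetical): replaces *arg = n with 0+1+...+(n-1). */
void sum_to(void *arg)
{
    volatile uint64_t total = 0;
    uint64_t n = *(uint64_t *)arg;
    for (uint64_t i = 0; i < n; i++)
        total += i;
    *(uint64_t *)arg = total;
}
```

These counters are usually updated in fairly coarse steps, so the work
probably has to run many iterations before the figure means much.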

Hope that helps.

---------------------------------------------------------------------------
Dimitris Staikos
dsta...@unibrain.com (Business mail only please),
dsta...@softlab.ece.ntua.gr (Personal mail), ICQ UIN 169876
Software Systems & Applications Chief Engineer, UniBrain, www.unibrain.com
---------------------------------------------------------------------------
Any sufficiently advanced bug is indistinguishable from a feature.
Some make it happen, some watch it happen, and some say, "what happened?"
How can we remember our ignorance, which our progress requires,
when we are using our knowledge all the time?
How a man plays a game shows something of his character,
how he loses shows all of it.
---------------------------------------------------------------------------

Steven Schulze

Jul 20, 1998

Felix Kasza [MVP] wrote in message <35b2e04f....@207.68.144.15>...

>Steven,
>
> > Then, as you single-step [...], the debugger adds
> > cycles to the counter
>
>You have correctly summarized the problems inherent in timing small
>sections of code, but I don't think your fix will work.
>
>The time to access a *single* aligned DWORD may vary from an apparent
>zero cycles (if run in parallel to, say, an FP instruction, with a
>memory barrier following) to some 10 million (!) cycles (cache miss, TLB
>miss, PTE faulted in from disk, data page faulted in from disk). Unless
>you run this on a box with no VM, you are out of luck. Oh, and I hope
>you unplugged the network card.
>
>Even if we ignore all this, the debugger won't be able to accurately
>keep track, as a single-step or other sort of interrupted flow of
>execution means that the instruction queue is empty when your code is
>resumed.

Yes, I understand all this, but the idea is to simulate a "perfect" machine
so that what you see is the best possible performance of your code under the
best possible situation. This way, you can concentrate on your code to get
ITS cycle count as low as possible. There's nothing you can do in your
code to guard against a noisy system, but if you can get YOUR OWN code
optimized as well as possible, then you've done all you can.

BTW, I have a Beta copy of Intel's VTune 3.0, and it actually has a similar
feature, where it will show you in detail a selected range of code's cycle
time, pairing, penalties, etc, etc. It makes some basic assumptions, as in
that the data is already in the cache, etc, etc, and then does a simulation
on the CPU of your choice. Unfortunately it's a little buggy, and it's
REALLY slow. And it's a pain to switch between VC++ and VTune for every
change you make in your code. (About 10 minutes turnaround on my computer).
The fact that it's able to give me this kind of info in the first place
tells me it's definitely possible.

Also, you can give a listing of your disassembled code to an assembly
programming guru, and in a short time he can tell you: "Under perfect
conditions, this code will take xxx cycles to execute". Why can't the
debugger do the same for me?

Steven Schulze
Concord, CA

Steven Schulze

Jul 20, 1998

Dimitris Staikos wrote in message <6ov56p$lss$1...@ulysses.noc.ntua.gr>...

>
>Steven Schulze wrote in message ...
>
>>I need to be able to time very short loops accurately
>
>Did you try GetThreadTimes? This should give you the amount of time that
>the thread spent executing, no matter how many times it was preempted by
>other threads. Now, I'm not sure who gets charged by NT for time spent in
>hardware interrupts, or exactly how and when NT charges the time (it should
>be doing it either at the end of a quantum, or when the thread goes to
>sleep/wait state), but I think you ought to try this function out and see
>if the results are more stable.
>
>Hope that helps.

Excellent suggestion! While this is definitely a good way to go,
unfortunately, I'm using W98, and, of course, it's not available under W98
:(

Oh well...

But thanks for telling me about this function. I was unaware of it.
Sometime in the near future, I should switch to NT, and then I can start
playing with it. Until then, I guess I'm stuck in W98-land...

Steven Schulze
Concord, CA

Felix Kasza [MVP]

Jul 20, 1998

Steven,

> and in a short time he can tell you: "Under perfect
> conditions, this code will take xxx cycles to execute".
> Why can't the debugger do the same for me?

The debugger can't do it because there are not enough people asking for
just that. And while I do not claim any honorific (except "slob",
maybe), I can tell you that I hate cycle-counting. Passionately. I'd
rather have a real result, with imprecision, from a profiler than a
manual count from me. :-\

Felix Kasza [MVP]

Jul 20, 1998

Dimitris,

> Now, I'm not sure who gets charged by NT for time spent in hardware
> interrupts

I fear this won't help, as the time spent in ISRs and such is charged to
the thread currently running. (Not 100% sure, though.)

For a definitive answer, we lean back and wait to see whether Jamie H.
notices us. :-)

bobby sawhney

Jul 20, 1998

Hi Stephen,

If you are using a Pentium or later chip, you may want to use an opcode
that does just what you need: it reads a cycle count from the chip. Look
up RDTSC in a Pentium assembly coding guide.

Here is some code that emits the correct instruction bytes. It compiles
with VC 5.0, but I am sure you can modify it to work with any other
compiler.

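/* cl -W3 -O2 -Ox rdtsc1.c */
#include <stdio.h>

/* RDTSC is opcode 0F 31; emit the raw bytes in case the inline
   assembler does not know the mnemonic. */
#define RDTSC __asm _emit 0x0f __asm _emit 0x31

/* Returns the 64-bit time-stamp counter (EDX:EAX after RDTSC). */
static __inline unsigned __int64 get_clock(void)
{
    unsigned long lo;
    unsigned long hi;
    __asm {
        RDTSC
        mov lo, eax
        mov hi, edx
    }
    return (((unsigned __int64) hi) << 32) + lo;
}

int main(void)
{
    unsigned __int64 t0, t1;
    t0 = get_clock();
    t1 = get_clock();
    /* The difference is unsigned, so print it with %I64u. */
    printf("Cycles elapsed = %I64u\n", t1 - t0);
    return 0;
}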

-bobby


Steven Schulze

Jul 20, 1998

Felix Kasza [MVP] wrote in message <35b36338....@207.68.144.15>...

>Steven,
>
> > and in a short time he can tell you: "Under perfect
> > conditions, this code will take xxx cycles to execute".
> > Why can't the debugger do the same for me?
>
>The debugger can't do it because there are not enough people asking for
>just that. And while I do not claim any honorific (except "slob",
>maybe), I can tell you that I hate cycle-counting.

So do I, which is why I'd love such a feature.

>Passionately. I'd
>rather have a real result, with imprecision, from a profiler than a
>manual count from me. :-\

Yes, but while it might be ok for some people, other people might need more
precise info, since their projects might require it.

BTW, I was thinking - the compiler DEFINITELY already knows this info, since
it needs it to do the optimizations. Why can't we be privy to this info as
well, if we need it?

Steven Schulze
Concord, CA

Steven Schulze

Jul 20, 1998

bobby sawhney wrote in message ...

>Hi Stephen,
>
>If u r using a pentium or above chip, then you may want to use
>an opcode which just does what you need, it gets a cycle count
>from the chip. Lookup on a RDTSC in a pentium assembly coding
>guide.
>
>Here is some code which emits the correct assembly code.
>It compiles with VC 5.0 but i am sure you can modify it to
>work with any other compiler.

Thanks, I'll DEFINITELY look into it.

BTW, does the RDTSC use the same clock as QueryPerformanceCounter()?

Steven Schulze
Concord, CA

Ian MacDonald

Jul 21, 1998

In article <#U7t9OGt...@uppssnewspub05.moswest.msn.net>,
Capta...@msn.com says...

I believe you will still have some accuracy problems using this (RDTSC)
instruction due to the context switches that can occur while you are
timing code.

I'm not sure how much this helps, but I recall reading in one of the
programming journals (WDJ, MSJ or WinTech) about a VxD someone wrote to
monitor context switches so that you could account for the number of CPU
cycles executed out of context from your timed code. The VxD allowed you
to register your RDTSC counter variable and thread ID with it. The VxD
would then subtract from your counter variable the number of CPU cycles
executed outside of your thread's context. Perhaps someone else remembers
this article more specifically.
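
The bookkeeping such a VxD would do can be sketched in plain C. This is
not the article's code; the away_span record and in_context_cycles are
hypothetical names, and the sketch assumes the out-of-context spans do
not overlap each other.

```c
#include <stddef.h>
#include <stdint.h>

/* One stretch during which some other thread held the CPU
   (hypothetical record; a driver would have to collect these). */
struct away_span {
    uint64_t switched_out;  /* cycle count when our thread lost the CPU */
    uint64_t switched_in;   /* cycle count when it got the CPU back */
};

/* Cycles spent inside our own thread between start and stop:
   the raw difference minus every span spent out of context. */
uint64_t in_context_cycles(uint64_t start, uint64_t stop,
                           const struct away_span *spans, size_t count)
{
    uint64_t total = stop - start;
    for (size_t i = 0; i < count; i++) {
        uint64_t out = spans[i].switched_out;
        uint64_t in = spans[i].switched_in;
        /* Clip each span to the timed window. */
        if (out < start)
            out = start;
        if (in > stop)
            in = stop;
        if (in > out)
            total -= in - out;
    }
    return total;
}
```

With a 1000-cycle window and 400 cycles spent switched out, the function
reports 600 cycles for the timed code.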

You might also want to take a look at
http://developer.intel.com/drg/pentiumII/appnotes/RDTSCPM1.HTM
for other issues that can affect accuracy of this instruction.

Hope this helps,

-- Ian MacDonald

Steven Schulze

Jul 21, 1998

Ian MacDonald wrote in message ...

>In article <#U7t9OGt...@uppssnewspub05.moswest.msn.net>,
>Capta...@msn.com says...
>>
>> bobby sawhney wrote in message ...
>> >Hi Stephen,
>> >
>> >If u r using a pentium or above chip, then you may want to use
>> >an opcode which just does what you need, it gets a cycle count
>> >from the chip. Lookup on a RDTSC in a pentium assembly coding
>> >guide.
>> >
>> >Here is some code which emits the correct assembly code.
>> >It compiles with VC 5.0 but i am sure you can modify it to
>> >work with any other compiler.
>>
>>
>> Thanks, I'll DEFINITELY look into.
>>
>> BTW, does the RDTSC use the same clock as QueryPerformanceCounter()?
>>
>> Steven Schulze
>> Concord, CA
>>
>>
>
>I believe you will still have some accuracy problems using this (RDTSC)
>instruction due to the context switches that can occur while you are
>timing code.

I have somewhat of an answer to this problem. What I did (using
QueryPerformanceCounter()) was write a class that has functions called
Reset(), Start(), Stop(), and Show(). First you call Reset(), which
resets all members, then you run your code multiple times (I run it up to
1000 times), and every time you first call Start(), then do the code, then
call Stop(). The class then adds this time to the total time, as well as
keeping track of the fastest and the slowest run.

When you then call Show(), it shows the fastest, slowest and average times.
This gives pretty good results, because there's GOT to be at least one run
where the code wasn't pre-empted. Also, I try to keep the segments of code
I time pretty short, so that it's possible to get through it without
pre-empting at least once.
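
A rough C rendering of that Reset()/Start()/Stop() bookkeeping, for
readers who want to try it. This is a sketch and not the class from this
post: the names are invented, and the caller passes in the current tick
value so that any counter (QueryPerformanceCounter, RDTSC, ...) can be
used.

```c
#include <stdint.h>

/* Running statistics over many timed runs (names are illustrative). */
struct run_stats {
    uint64_t fastest, slowest, total, runs, started_at;
};

/* Equivalent of Reset(): clear everything before a series of runs. */
void stats_reset(struct run_stats *s)
{
    s->fastest = UINT64_MAX;
    s->slowest = 0;
    s->total = 0;
    s->runs = 0;
    s->started_at = 0;
}

/* Equivalent of Start(): remember the tick value at the start of a run. */
void stats_start(struct run_stats *s, uint64_t now)
{
    s->started_at = now;
}

/* Equivalent of Stop(): fold the elapsed ticks into the statistics. */
void stats_stop(struct run_stats *s, uint64_t now)
{
    uint64_t elapsed = now - s->started_at;
    if (elapsed < s->fastest)
        s->fastest = elapsed;
    if (elapsed > s->slowest)
        s->slowest = elapsed;
    s->total += elapsed;
    s->runs++;
}

/* Average ticks per run, or 0 if nothing has been recorded yet. */
double stats_average(const struct run_stats *s)
{
    return s->runs ? (double)s->total / (double)s->runs : 0.0;
}
```

Show() would then just print fastest, slowest and stats_average(); the
fastest value is the one most likely to come from an uninterrupted run.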

The problem is simply that I think QueryPerformanceCounter is flaky.
Sometimes I get a time of 0 (even WITH my code in-between Start() and
Stop()), which should be impossible, given that the resolution is so small
(838ns). It's almost as if the counter itself is updated from software that
needs to be pre-empted first, although the info I have on it suggests it's a
hardware item.

But by combining the RDTSC with the method I describe above, I might get
satisfactory results.

Steven Schulze
Concord, CA

Bob Rubendunst

Jul 21, 1998

Felix Kasza [MVP] wrote:
>
> Steven,
>
> > and in a short time he can tell you: "Under perfect
> > conditions, this code will take xxx cycles to execute".
> > Why can't the debugger do the same for me?

Note that perfect conditions are:

1. No other code running while the test code is running.
2. No caches enabled.
3. No other hardware process like DMA or refresh working.
4. No weird programming tricks being done in interrupt service routines.

In other words, it's not a real number, but just a guess. The only way
to make these decisions is to start with the cycle counts, program up a
test case, and TEST IT! This will still be only an approximation of what
happens when it gets out in the field.

bobby sawhney

Jul 21, 1998

>>
>> BTW, does the RDTSC use the same clock as QueryPerformanceCounter()?
>>

RDTSC counts at the clock speed of the processor, so if you are
using a 400MHz processor, each cycle would be 2.5 nanoseconds.
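
The conversion is simple enough to write down. A small sketch, where
cycles_to_ns is an invented name and the clock speed is something the
caller has to know:

```c
#include <stdint.h>

/* Convert a cycle count to nanoseconds for a CPU clocked at cpu_mhz.
   One cycle lasts 1000 / cpu_mhz nanoseconds. */
double cycles_to_ns(uint64_t cycles, double cpu_mhz)
{
    return (double)cycles * 1000.0 / cpu_mhz;
}
```

By the same arithmetic, one 838ns QueryPerformanceCounter tick spans
roughly 335 cycles on a 400MHz machine.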

What I use this most for is measuring cycles of assembly code.
Also, you may want to subtract the overhead of the RDTSC call itself
so that it does not affect your measurements. To do this, call RDTSC
twice in a row and make a note of the cycles elapsed; this would be
the overhead.
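
A sketch of that overhead measurement. It uses the compiler's __rdtsc()
intrinsic instead of inline assembly and, as an assumption beyond what is
described here, repeats the back-to-back read several times and keeps the
smallest gap. On non-x86 machines it falls back to clock() just so the
code still builds. All function names are invented.

```c
#include <stdint.h>
#if defined(_MSC_VER) && (defined(_M_IX86) || defined(_M_X64))
#include <intrin.h>
#define HAVE_TSC 1
#elif defined(__i386__) || defined(__x86_64__)
#include <x86intrin.h>
#define HAVE_TSC 1
#else
#include <time.h>
#define HAVE_TSC 0
#endif

/* Read the time-stamp counter; falls back to clock() off x86. */
uint64_t read_ticks(void)
{
#if HAVE_TSC
    return (uint64_t)__rdtsc();
#else
    return (uint64_t)clock();
#endif
}

/* Smallest gap seen between two back-to-back reads, i.e. the cost of
   the measurement itself. Taking the minimum discards interrupted
   samples. Returns UINT64_MAX if samples is 0. */
uint64_t measure_overhead(unsigned samples)
{
    uint64_t best = UINT64_MAX;
    for (unsigned i = 0; i < samples; i++) {
        uint64_t t0 = read_ticks();
        uint64_t t1 = read_ticks();
        if (t1 - t0 < best)
            best = t1 - t0;
    }
    return best;
}

/* Remove the overhead from a raw reading without wrapping below zero. */
uint64_t net_ticks(uint64_t raw, uint64_t overhead)
{
    return raw > overhead ? raw - overhead : 0;
}
```

Measure the overhead once, then pass each raw reading through
net_ticks() before comparing versions of the code.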

>I believe you will still have some accuracy problems using this (RDTSC)
>instruction due to the context switches that can occur while you are
>timing code.
>

To avoid context switches as much as possible, do a yield before
you start timing an assembly code section. This ensures that you
have a new timeslice from the OS when you wake up.
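
One way the yield-then-time idea might be written. Assumptions: Sleep(0)
is used as the yield on Windows and sched_yield() elsewhere, and the C11
timespec_get() wall clock stands in for RDTSC so the sketch stays
portable. The function names are invented.

```c
#include <stdint.h>
#include <time.h>
#ifdef _WIN32
#include <windows.h>
#else
#include <sched.h>
#endif

/* Give up the rest of the current timeslice. */
void yield_timeslice(void)
{
#ifdef _WIN32
    Sleep(0);
#else
    sched_yield();
#endif
}

/* Wall-clock nanoseconds; a stand-in for a cycle counter. */
uint64_t now_ns(void)
{
    struct timespec ts;
    timespec_get(&ts, TIME_UTC);
    return (uint64_t)ts.tv_sec * 1000000000u + (uint64_t)ts.tv_nsec;
}

/* Yield first, then time work(arg) at the start of a fresh slice. */
uint64_t time_after_yield(void (*work)(void *), void *arg)
{
    yield_timeslice();
    uint64_t t0 = now_ns();
    work(arg);
    return now_ns() - t0;
}

/* Example workload (hypothetical): counts how often it was called. */
void count_call(void *arg)
{
    *(int *)arg += 1;
}
```

The yield only lowers the odds of being pre-empted mid-measurement; it
does nothing about hardware interrupts.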

-bobby

Steven Schulze

Jul 21, 1998

Bob Rubendunst wrote in message <35B4BC03...@softm.com>...

I think my original point is being lost here...

Here's my main point.

Suppose the Cycle Guru tells me: "This code of yours will execute in 755
cycles under perfect conditions". I then re-write my code and ask him
again, and he says "Now your code will execute in 621 cycles under perfect
conditions".

Now, that kind of info can help me immensely to speed up my code. See, I
don't need to worry about DMA interrupts, etc, etc, because if I can get my
code to perform as fast as possible under "perfect" conditions, I know it'll
also perform better than my original code under a noisy system.

The problem with "TEST IT!" as you put it, is that I can't get an accurate
reading of how long MY code takes to execute, because the results vary too
much. I can't make informed decisions as to whether version A or version B
of my code is faster for that very reason.

BTW, Intel has an "Optimizing Tutor" that shows you how to optimize code,
and shows how many cycles a certain version of code will take to execute vs
another version. Obviously this is a legitimate way to analyze code, but
you insist that it is not. I wish I had a way to see my code in the same
way as Intel shows in its examples. That's what I'm saying.

Also, tell me, how do the people that hand-optimize code figure out which
version of code will run faster?

Steven Schulze
Concord, CA

Gus Gustafson

Jul 22, 1998

Steven Schulze wrote:

<snip>

> Also, tell me, how does the people that hand-optimize code figure out which
> version of code will run faster?

Steve, there is an ancient rule of thumb that suggests that 10% of an
application's code will take 90% of the CPU time. My rule for optimization is
to wait until the end of the development, then if the code executes too slowly
for comfortable use, instrument the code with some performance tool; find the
10%; hand optimize it; repeat the process until the code executes "fast
enough". If there is no 10% (or 20%) then there may be some basic problem in
the design.

I'm afraid that the thread is simply saying you can't get exactly what you want
in an absolute sense. You can get what you want in a relative sense.

IMO
Gus Gustafson
gus...@gte.net

Steven Schulze

Jul 22, 1998

Gus Gustafson wrote in message <35B5EE44...@gte.net>...

>Steven Schulze wrote:
>
><snip>
>
>> Also, tell me, how does the people that hand-optimize code figure out
>> which version of code will run faster?
>
>Steve, there is an ancient rule of thumb that suggests that 10% of an
>application's code will take 90% of the CPU time. My rule for optimization
>is to wait until the end of the development, then if the code executes too
>slowly for comfortable use, instrument the code with some performance tool;
>find the 10%; hand optimize it; repeat the process until the code executes
>"fast enough". If there is no 10% (or 20%) then there may be some basic
>problem in the design.

I know EXACTLY where time is spent in my code. It's the loop that executes
50 million times for a 50 million byte file. I can pinpoint it down to 7
lines in my code.

Also, FOR THIS SPECIFIC application, there's no "fast enough", there's only
"faster is better". It's an application that processes multiple audio files
(up to 44), each with a size of up to about 50MB for 5 minutes of audio (that's
the extreme case, though). It's a program that tries to do stuff in
real-time. Anytime you have the slightest performance bottleneck, you could
lose the ability to process one or two more of those 44 files in real-time.

So, this isn't a simple case of having the spell-checker take 5 instead of 6
seconds to finish (how nice). This is a REAL case where speed makes a
difference in how the program can be applied.

>I'm afraid that the thread is simply saying you can't get exactly what you
>want in a absolute sense. You can get what you want in a relative sense.

BTW, as I asked before, how does the compiler make its decisions as to
which version of your code will execute faster when it does its
optimizations, since it's not really executing your code to make a
"relative" decision, as you say? How on earth does it do it, since any
cycle counting on assembly code is being shot down as being unrealistic?

Why should I not be able to look at my code and make conclusions based on
information about cycle times, just as the compiler does (yes, I know about
pairing, penalties, etc, etc)? Or are we as programmers simply not able to
work down to this level anymore? I guess so.

Steven Schulze
Concord, CA

Bob Rubendunst

Jul 22, 1998

Steven Schulze wrote:

> I know EXACTLY where time is spent in my code. It's the loop that executes
> 50 million times for a 50 million byte file. I can pinpoint it down to 7
> lines in my code.

Steven,

Why don't you share those 7 lines of code with us? We can trade messages
forever, but your code won't get any faster this way!

If you are shy, or those seven lines are very dear, try the book "Zen of
Code Optimization" by Michael Abrash.

William DePalo [MVP]

Jul 22, 1998

Steven Schulze wrote in message
>

>I know EXACTLY where time is spent in my code. It's the loop that executes
>50 million times for a 50 million byte file. I can pinpoint it down to 7
>lines in my code.
>

Consider this an FYI which I offer in case any of this is new to you. If
not, please ignore it, I'm not trying to heighten your exasperation.

You can get a listing of the machine instructions generated by the compiler
with the /FA switch. There used to be a time when this listing didn't take
into account optimizations, I don't know if that is still true. If so, you
should be able to look at the machine instructions with a debugger. Intel's
processor manual lists the number of clock cycles an instruction takes. MS
reprinted that info with the copy of MASM I bought several years ago. As it
is now fashionable to exclude printed docs under the guise of saving forests
I'm not sure that the assembler includes the info any more.

The trouble with the info is that it is not at all simple. The number of clock
cycles an instruction takes depends on the processor, the addressing mode
and perhaps even the value of the arguments. For example, if I am reading
the table correctly, the integer multiply instruction (IMUL) takes from 13
to 42 clocks on a 486 using double word operands.
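
Because each instruction comes with a range like that, a manual count
ends up as a range too. A small sketch of the bookkeeping; the only
timing figure in it is the IMUL one quoted above, and the type and
function names are invented. Figures for any other instruction would
have to come from the processor manual.

```c
#include <stddef.h>

/* Best-case and worst-case clocks for one instruction. */
struct clock_range {
    unsigned best, worst;
};

/* The IMUL figure quoted above: 486, double word operands. */
const struct clock_range IMUL_486_DWORD = { 13, 42 };

/* Add up per-instruction ranges for a straight-line sequence. The
   result ignores pairing, cache misses and stalls, so it is a rough
   bound on the count rather than a measured timing. */
struct clock_range sum_clocks(const struct clock_range *seq, size_t count)
{
    struct clock_range total = { 0, 0 };
    for (size_t i = 0; i < count; i++) {
        total.best += seq[i].best;
        total.worst += seq[i].worst;
    }
    return total;
}
```

Three such multiplies in a row already span 39 to 126 clocks, which shows
how quickly the uncertainty grows.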

Regards,
Will

David Lowndes

Jul 23, 1998

>You can get a listing of the machine instructions generated by the compiler
>with the /FA switch.

Also, I'm sure later versions of MASM can generate the processor
timing information on a listing, so I guess that by assembling the
compiler generated assembler output with MASM, you could get the
timing information on a listing. It's a bit long-winded, so it's not
something you'd want to do very often.

Dave
----
Address is altered to discourage junk mail.
Please post responses to the newsgroup thread,
there's no need for follow up email copies.
http://www.bj.co.uk

Jan K. Avlasov

Jul 27, 1998

Steven Schulze wrote:
> if I can get my
> code to perform as fast as possible under "perfect" conditions, I know it'll
> also perform better than my original code under a noisy system.
>

IMHO that is not true. If you get your code to run in 100 fewer cycles, it
will not matter in a system where all the other threads may take several
million cycles to execute. It is just a drop in the ocean.

I think you had better develop your code in such a way that no other
applications are permitted to run (in case you do it under win95). This
will save you some valuable time.

Yan

Steven Schulze

Jul 27, 1998

Bob Rubendunst wrote in message <35B60B39...@softm.com>...

>Steven Schulze wrote:
>
>> I know EXACTLY where time is spent in my code. It's the loop that
>> executes 50 million times for a 50 million byte file. I can pinpoint it
>> down to 7 lines in my code.
>

>Steven,
>
>Why don't you share those 7 lines of code with us? We can trade messages
>forever, but your code won't get any faster this way!
>
>If you are shy, or those seven lines are very dear, try the book "Zen of
>Code Optimization" by Michael Abrash.

Actually, that was a generalization. I have about 40 - 50 such small
functions, varying from about 4 to 100 lines of code.

BTW, I've been able to write a class now that uses the RDTSC to measure
elapsed cycles. While it's not perfect, it's pretty good. I can get
repeatable results (which is what I need) down to less than 0.1% (after
doing multiple runs and taking the minimum result). Not bad.
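
The minimum-over-many-runs approach can be sketched as follows. This is
not the class from this post: the tick source is passed in as a function
pointer so that RDTSC or any other counter can be plugged in, and every
name here, including the scripted demo tick source, is made up for
illustration.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint64_t (*tick_fn)(void);

/* Run work(arg) 'runs' times and keep the fewest ticks seen. */
uint64_t min_ticks(tick_fn ticks, void (*work)(void *), void *arg,
                   unsigned runs)
{
    uint64_t best = UINT64_MAX;
    for (unsigned i = 0; i < runs; i++) {
        uint64_t t0 = ticks();
        work(arg);
        uint64_t elapsed = ticks() - t0;
        if (elapsed < best)
            best = elapsed;
    }
    return best;
}

/* How far apart two minimums are, as a percentage of the smaller. */
double spread_percent(uint64_t a, uint64_t b)
{
    uint64_t lo = a < b ? a : b;
    uint64_t hi = a < b ? b : a;
    return lo ? 100.0 * (double)(hi - lo) / (double)lo : 0.0;
}

/* Scripted tick source and no-op workload, for demonstration only:
   three runs that appear to take 10, 7 and 15 ticks. */
static const uint64_t demo_script[] = { 0, 10, 20, 27, 40, 55 };
static unsigned demo_pos;
uint64_t demo_ticks(void) { return demo_script[demo_pos++ % 6]; }
void demo_work(void *arg) { (void)arg; }
```

Calling min_ticks() twice on the same code and feeding both results to
spread_percent() gives a repeatability figure comparable to the 0.1%
mentioned above.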

Steven Schulze
Concord, CA

Steven Schulze

Jul 27, 1998

Jan K. Avlasov wrote in message <35BC83EE...@inp.nsk.su>...

>Steven Schulze wrote:
>> if I can get my
>> code to perform as fast as possible under "perfect" conditions, I know
>> it'll also perform better than my original code under a noisy system.
>>
>
> IMHO it is not true. If you get your code run 100 cycles less it will
>not be a point in the system there all other threads may take a several
>millions cycles to execute. It is just a drop in the ocean.

No, it's not. If the LOOP takes 700 cycles for one version of the code, and
the loop takes 600 cycles for a different version of code (but doing the
same thing), then that's about 14% fewer cycles (a 17% speedup) for that
specific loop. Now, if my program spends an awful lot of time in that loop
(as my code does, processing a large file of audio data), then it's a
SIGNIFICANT improvement.
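
The 700-versus-600-cycle comparison can be expressed two ways, and the
two give different percentages. A tiny sketch with invented function
names:

```c
/* Percentage of cycles saved going from 'before' to 'after'. */
double percent_saved(double before, double after)
{
    return 100.0 * (before - after) / before;
}

/* How many times faster the new version is (throughput ratio). */
double speedup(double before, double after)
{
    return before / after;
}
```

For 700 and 600 cycles, percent_saved() gives about 14.3 and speedup()
about 1.17. If the loop body really ran 50 million times at that cost,
the 100 cycles saved per pass would add up to 5 billion cycles, or about
12.5 seconds at the 400MHz clock mentioned earlier in the thread (an
assumed clock speed, not a figure from this post).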

I'm pretty aware of the fact that trying to optimize the WHOLE program is
fruitless, but since my program does what it does, it can benefit a lot from
optimizing the small sections of code that the program probably spends 95%
or more of its time in (while processing).

Boaz Tamir

Jul 28, 1998

The article you recall is "Extend Your Application with Dynamically Loaded
VxDs Under Windows 95". MSJ, May 1995.

Boaz Tamir.

Ian MacDonald wrote:
>
> In article <#U7t9OGt...@uppssnewspub05.moswest.msn.net>,

>
> I believe you will still have some accuracy problems using this (RDTSC)
> instruction due to the context switches that can occur while you are
> timing code.
>

Ian MacDonald

Jul 28, 1998

In article <35BD6F12...@iil.intel.com>, bo...@iil.intel.com says...

> The article you recall is "Extend Your Application with Dynamically Loaded
> VxDs Under Windows 95". MSJ, May 1995.
>
> Boaz Tamir.
>

May '95 !?

Holy smokes. It seems like just yesterday.
I guess it just goes to show that you should never throw away any of your
old magazines.

Thanks for remembering.

-- Ian