Time in Microseconds !!!

Hugo Bérubé

unread,

Jun 15, 2000, 3:00:00 AM6/15/00

to

Hello,

Can I retreive the time in time unit little than millisecond ?

I search a way to get the time difference beetween to events. Those events
are generated approximatly at a 10kHz rate (every 100 microseconds).

Thanks,

Hugo

Dan Evens

unread,

Jun 15, 2000, 3:00:00 AM6/15/00

to

Cross posted to four news groups, two of which are clearly
off topic.

No soup for you!
--
Dan Evens
Standard disclaimers etc. No spam please.

Hugo Bérubé <Hugo....@drev.dnd.ca> wrote in article
<ussNE6u1$GA.281@cppssbbsa04>...

George Lozovoi

unread,

Jun 15, 2000, 3:00:00 AM6/15/00

to

Hi,
AFAIK you cannot. You need to use special embedded system to have guarantied
synchronisation for so short time pieces.

--
Best regards,
George

Victor Bazarov

unread,

Jun 15, 2000, 3:00:00 AM6/15/00

to

"Hugo Bérubé" <Hugo....@drev.dnd.ca> wrote...

> Hello,
>
> Can I retreive the time in time unit little than millisecond ?
>
> I search a way to get the time difference beetween to events. Those
events
> are generated approximatly at a 10kHz rate (every 100 microseconds).

OK, it's not my place to judge your cross-posting style, so I am just
following up.

Take a look at QueryPerformanceCounter in your Windows API
documentation.

Victor
--
Please remove capital A's from my address when replying by mail

Michael D. Ober

unread,

Jun 15, 2000, 3:00:00 AM6/15/00

to

Take a look at GetTickCount().

Mike Ober.

"Victor Bazarov" <vAba...@dAnai.com> wrote in message
news:#$l3VYv1$GA.259@cppssbbsa05...

Henrik E. Rasmussen

unread,

Jun 15, 2000, 3:00:00 AM6/15/00

to

Windows 9x has a timer resolution of 55 ms and cannot be more precise than
that. Windows NT (and 2000 I think) has a timer resolution of 10 ms. You can
get an apx. value using GetTickCount().

Regards
Henrik

"Hugo Bérubé" <Hugo....@drev.dnd.ca> wrote in message
news:ussNE6u1$GA.281@cppssbbsa04...

> Hello,
>
> Can I retreive the time in time unit little than millisecond ?
>
> I search a way to get the time difference beetween to events. Those
events
> are generated approximatly at a 10kHz rate (every 100 microseconds).
>

> Thanks,
>
> Hugo
>
>

Bobby Sawhney

unread,

Jun 15, 2000, 3:00:00 AM6/15/00

to

On x86 chips (Pentium & higher, as far as I remember ...)
You can use assembly to get the time stamp counter, this counter
is inremented for each clock cycle of the CPU and offers the best time
resolution available. (for a 1GHz chip, resolution = 1nanosec)

One use of this is for measuring small portions of assembly code, and can
also be used for your needs.
The assembly instruction to use is RDTSC.
Here is an informative link on the intel site
http://developer.intel.com/software/idap/resources/technical_collateral/pent
iumii/RDTSCPM1.HTM

-bobby

Error Coad

unread,

Jun 15, 2000, 3:00:00 AM6/15/00

to

> Take a look at GetTickCount().

And then forget about it, because it is of no help whatsoever here.

1) Its precision is only 1 msec (not one microsecond)
2) Even worse, its resolution is 55 msec!

QueryPerformanceCounter() is the correct answer.

Michael D. Ober <mdo.@.wakeassoc.com.nospam> wrote in message
news:8e925.1938$T13....@newsread2.prod.itd.earthlink.net...

> Take a look at GetTickCount().
>
> Mike Ober.
>
> "Victor Bazarov" <vAba...@dAnai.com> wrote in message
> news:#$l3VYv1$GA.259@cppssbbsa05...
> > "Hugo Bérubé" <Hugo....@drev.dnd.ca> wrote...

> > > Hello,
> > >
> > > Can I retreive the time in time unit little than millisecond ?
> > >
> > > I search a way to get the time difference beetween to events. Those
> > events
> > > are generated approximatly at a 10kHz rate (every 100 microseconds).
> >

Bobby Sawhney

unread,

Jun 15, 2000, 3:00:00 AM6/15/00

to

On x86 chips (Pentium & higher, as far as I remember ...)
You can use assembly to get the time stamp counter, this counter
is inremented for each clock cycle of the CPU and offers the best time
resolution available. (for a 1GHz chip, resolution = 1nanosec)

One use of this is for measuring small portions of assembly code, and can
also be used for your needs.
The assembly instruction to use is RDTSC.
Here is an informative link on the intel site
http://developer.intel.com/software/idap/resources/technical_collateral/pent
iumii/RDTSCPM1.HTM

-bobby

"George Lozovoi" <geo...@datagistics.com> wrote in message
news:u9YulRv1$GA.274@cppssbbsa04...

Joe O'Leary

unread,

Jun 15, 2000, 3:00:00 AM6/15/00

to

Look up multimedia timers. I've never actually needed to use them,
but I know they support a much higher resolution than GetTickCount

Joe O'

> Hugo Bérubé wrote

Slava M. Usov

unread,

Jun 16, 2000, 3:00:00 AM6/16/00

to

Error Coad <ec...@squat.com> wrote in message
news:qY925.520$Qb1....@monger.newsread.com...

> > Take a look at GetTickCount().
>
> And then forget about it, because it is of no help whatsoever here.
>
> 1) Its precision is only 1 msec (not one microsecond)
> 2) Even worse, its resolution is 55 msec!
>
> QueryPerformanceCounter() is the correct answer.

Nope. QPC() takes around 500 CPU ticks to execute, which on typical CPUs is
around 2 us. This is of course better than GetTickCount(). The correct
answer is the RDTSC instruction, but you should be prepared to do the
bookkeeping Real Fast (TM).

And I don't even mention that QPC() works properly only with particular NT
HALs, typically the SMP ones, and nevermind Win9x, which will welcome you
with the good ol' fuzzy 55 ms.

--

Slava

Please send any replies to this newsgroup.
microsoft.public.win32.programmer.kernel

Richard Norman

unread,

Jun 16, 2000, 3:00:00 AM6/16/00

to

Yes, the CPU can count clock cycles with the resolution you specify.
The problem is with the operating system. There is no guarantee that
you can execute that instruction at the specific time you want to collect
a measurement.

Suppose you want to collect the time right NOW! But between "NOW"
and the execution of your RDTSC instruction, a task switch can occur
so that many milliseconds elapse before you actually read the "current"
clock time.

Bobby Sawhney <bo...@publishone.com> wrote in message
news:ev8gS0w1$GA.295@cppssbbsa04...

Rufus...@guntherintl.com

unread,

Jun 16, 2000, 3:00:00 AM6/16/00

to

If you are doing nothing else but timing events, you could continuously poll
the timer until the event occurs, then get an additional timing value. This
will
at least give you a window within which the event occurred...

Richard Norman wrote in message ...

Michael D. Ober

unread,

Jun 16, 2000, 3:00:00 AM6/16/00

to

The question of timing occurs on a recurring basis in these groups. You
MUST remember that NO version of Microsoft Windows is real time, so getting
timer resolution lower than a time slice is 1) very difficult to do, and 2)
unreliable. On Win9x, the time slice is 55 milliseconds, on NT/2000, it's
10 milliseconds. For reliable timing, you should assume that on average,
one-half a time slice has occurred when you performed your last time check.
This means that Win9x isn't reliable for timing under about 83 milliseconds
and NT/2000 under 15 milliseconds.

That said, there is a real time extension to NT. I don't recall the URL for
it, but it does exist. Also, some hardware supports much better resolution
and you might be able to take advantage of that either through the
multimedia interface or through a hardware driver.

Mike Ober.

<Rufus...@GuntherIntl.com> wrote in message
news:urZ8xP61$GA.198@cppssbbsa05...

Error Coad

unread,

Jun 16, 2000, 3:00:00 AM6/16/00

to

No kidding. Well, that's why I like the newsgroups: in between the
smartasses like me are people like you with new, useful info. I had thought
that QueryPerformanceCounter was a tight wrapper of some kind for the RDTSC
instruction. Guess not.

Slava M. Usov <stripit...@usa.net> wrote in message
news:ekFyn5x1$GA.245@cppssbbsa04...

Error Coad

unread,

Jun 16, 2000, 3:00:00 AM6/16/00

to

> unreliable. On Win9x, the time slice is 55 milliseconds, on NT/2000, it's
> 10 milliseconds. For reliable timing, you should assume that on average,

This simply is untrue. Using the multimedia timers it is possible to get
resolution down in the millisecond range, using 16-bit code, even on 95 and
98. I can say this with some assurance because I have written music software
that, to the musician's ear, is very stable in its timings at all tempos. I
have not measured the accuracy of the timing, but even a reasonably decent
musician can detect timing problems on the order of a few milliseconds.

Michael D. Ober <mdo.@.wakeassoc.com.nospam> wrote in message

news:vur25.1107$FC6....@newsread1.prod.itd.earthlink.net...

Rufus...@guntherintl.com

unread,

Jun 16, 2000, 3:00:00 AM6/16/00

to

You're not alone. I also thought that until about 1 month or so ago.

Error Coad wrote in message ...

Michael D. Ober

unread,

Jun 16, 2000, 3:00:00 AM6/16/00

to

Please read before writing. If you had, you would have seen that I
specifically mentioned multimedia timers as a method for getting better
resolution. Even with multimedia timers, you may still have to deal with
latency issues since Win32 doesn't guarantee response times, just that
everything will get to run.

Mike Ober.

"Error Coad" <ec...@squat.com> wrote in message

news:Ets25.705$pg7....@newshog.newsread.com...

Slava M. Usov

unread,

Jun 16, 2000, 3:00:00 AM6/16/00

to

Error Coad <ec...@squat.com> wrote in message

news:mbs25.702$pg7....@newshog.newsread.com...

> No kidding. Well, that's why I like the newsgroups: in between the
> smartasses like me are people like you with new, useful info. I had
thought
> that QueryPerformanceCounter was a tight wrapper of some kind for the
RDTSC
> instruction. Guess not.

If you're into it, you can use the dumpbin to produce the list of exports of
your hal.dll, lookup the address for KeQueryPerformanceCounter, and then use
the dumpbin to disasm hal.dll, and then go to the address of
KeQueryPerformanceCounter. This routine is eventually called when
QueryPerformanceCounter() is executed. In the standard uniprocessor HAL, the
core of KeQueryPerformanceCounter() is

80014E27: 9C pushfd
80014E28: FA cli
80014E29: 8B 1D 20 56 01 80 mov ebx,dword ptr ds:[80015620h]
80014E2F: 8B 35 24 56 01 80 mov esi,dword ptr ds:[80015624h]
80014E35: B0 00 mov al,0
80014E37: E6 43 out 43h,al
80014E39: EB 00 jmp 80014E3B
80014E3B: E4 40 in al,40h
80014E3D: EB 00 jmp 80014E3F
80014E3F: 0F B6 C8 movzx ecx,al
80014E42: E4 40 in al,40h
80014E44: 8A E8 mov ch,al
80014E46: 9D popfd

which reads the good old timer, with its usual frequency of 18.2 ticks per
second. This code uses in/out which are very slow. Plus the usual overhead
of getting into kernel mode. Plus there is about twice as many additional
code in this routine which does other things.

The 500 ticks I mentioned in my previous message were related to the SMP
HAL, the uniprocessor HAL makes everything about three times slower. The
core of KeQueryPerformanceCounter() of the SMP HAL, BTW, is:

80012C7C: 0F 31 rdtsc
80012C7E: C2 04 00 ret 4

No interrupts disabled, no ins, no outs. Just one RDTSC. The only overhead
is the user mode/kernel mode transition.

I don't know how QPC() is implemented in Win9x, but I hear it has terrible
resolution.

The best way and which is almost as portable as Win32 code in general, is
just pure RDTSC instruction.

Slava M. Usov

unread,

Jun 16, 2000, 3:00:00 AM6/16/00

to

<Rufus...@GuntherIntl.com> wrote in message
news:urZ8xP61$GA.198@cppssbbsa05...
> If you are doing nothing else but timing events, you could continuously
poll
> the timer until the event occurs, then get an additional timing value.
This
> will
> at least give you a window within which the event occurred...

Another way is to collect both GetTickCount() values and RDTSC values.
Everytime you see GetTickCount() changed from the previous value, you may
skip this measurement.

The theory behind it, the value that GetTickCount() reports is updated only
on timer interrupts, at the infamous 55 ms frequency. A context switch
normally occurs every three or so ticks, but handling of the timer interrupt
*alone* distorts the picture reported by RDTSC.

Everything depends on what you're doing with timings. If you want to obtain
timings of various code paths, then you should correlate RDTSC with timer
interrupts in the manner specified above. If you're polling RDTSC for some
other reason, then you decide.

Error Coad

unread,

Jun 16, 2000, 3:00:00 AM6/16/00

to

Whhopps, I obviously missed that detail. Sorree. However, I think I've seen
someone claiming that MM timers have this deficiency as well. That's
certainly true for 32-bit timers, but not for 16-bit timers in Win9x.

I find it ironic that MS is pushing us towards a 64 bit future, when IMHO
they still haven't finished the conversion from 16 to 32-bits!

Michael D. Ober <mdo.@.wakeassoc.com.nospam> wrote in message

news:STs25.1866$ds.4...@newsread2.prod.itd.earthlink.net...

> Please read before writing. If you had, you would have seen that I
> specifically mentioned multimedia timers as a method for getting better
> resolution. Even with multimedia timers, you may still have to deal with
> latency issues since Win32 doesn't guarantee response times, just that
> everything will get to run.
>
> Mike Ober.
>

> "Error Coad" <ec...@squat.com> wrote in message

> > > <Rufus...@GuntherIntl.com> wrote in message
> > > news:urZ8xP61$GA.198@cppssbbsa05...
> > > > If you are doing nothing else but timing events, you could
> continuously
> > > poll
> > > > the timer until the event occurs, then get an additional timing
value.
> > > This
> > > > will
> > > > at least give you a window within which the event occurred...
> > > >

Raymond Chen

unread,

Jun 16, 2000, 3:00:00 AM6/16/00

to

The reason why QPC doesn't always use RDTSC is twofold.

1. On multiprocessor systems, you have to make sure that if you
execute RDTSC on both processors simultaneously, they return the
same answer back. Otherwise, an app that gets migrated between
processors is going to see time go backwards.

If a HAL cannot guarantee this behavior, it cannot use RDTSC for
QPC.

2. On systems where the CPU runs at multiple speeds (laptops with
power management, the new intel speedstep stuff, etc.), you
cannot use RDTSC because it counts cycles, not time. When the
CPU changes speed, the cycles per second changes, which
invalidates the result of QueryPerformanceFrequency.

So if a system supports dynamic CPU speeds, it cannot use RDTSC
for QPC.
--
(My return address is intentionally invalid to foil spammers. Delete the
".---" to get my real address. I do this on my own time with my own money;
my responses are not to be considered official technical support or advice.
Personal requests for assistance will be ignored.)

J. Wesley Cleveland

unread,

Jun 16, 2000, 3:00:00 AM6/16/00

to

Raymond Chen wrote:
>
> The reason why QPC doesn't always use RDTSC is twofold.
>
> 1. On multiprocessor systems, you have to make sure that if you
> execute RDTSC on both processors simultaneously, they return the
> same answer back. Otherwise, an app that gets migrated between
> processors is going to see time go backwards.
>
> If a HAL cannot guarantee this behavior, it cannot use RDTSC for
> QPC.

Then why does the SMP version of NT use RTDSC, and not the uniprocessor
?

J. Wesley Cleveland

unread,

Jun 16, 2000, 3:00:00 AM6/16/00

to

"Slava M. Usov" wrote:
>
> <Rufus...@GuntherIntl.com> wrote in message
> news:urZ8xP61$GA.198@cppssbbsa05...
> > If you are doing nothing else but timing events, you could continuously
> poll
> > the timer until the event occurs, then get an additional timing value.
> This
> > will
> > at least give you a window within which the event occurred...
>

> Another way is to collect both GetTickCount() values and RDTSC values.
> Everytime you see GetTickCount() changed from the previous value, you may
> skip this measurement.
>
> The theory behind it, the value that GetTickCount() reports is updated only
> on timer interrupts, at the infamous 55 ms frequency.

This does not seem to be correct. We use logfiles for debugging
purposes, log the tick count, and frequently get differences of 4 or 5
ms on Win9x. Perhaps GetTickCount() reads the RTC or timer chip.

Gary Chanson

unread,

Jun 16, 2000, 3:00:00 AM6/16/00

to

Error Coad <ec...@squat.com> wrote in message

news:iFu25.765$Qb1....@monger.newsread.com...

> Whhopps, I obviously missed that detail. Sorree. However, I think I've
seen
> someone claiming that MM timers have this deficiency as well. That's
> certainly true for 32-bit timers, but not for 16-bit timers in Win9x.
>
> I find it ironic that MS is pushing us towards a 64 bit future, when IMHO
> they still haven't finished the conversion from 16 to 32-bits!

They figure that the 32 bit versions are already obsolete so why fix
them? They'll just get it right "next time"...

--

-GJC
-gcha...@shore.net

Slava M. Usov

unread,

Jun 17, 2000, 3:00:00 AM6/17/00

to

J. Wesley Cleveland <jw...@oro.net> wrote in message
news:394A9CA4...@oro.net...
> "Slava M. Usov" wrote:

[...]

> > The theory behind it, the value that GetTickCount() reports is updated
only
> > on timer interrupts, at the infamous 55 ms frequency.
>
> This does not seem to be correct. We use logfiles for debugging
> purposes, log the tick count, and frequently get differences of 4 or 5
> ms on Win9x. Perhaps GetTickCount() reads the RTC or timer chip.

Ugh, sorry, last time I touched Win95 was some early 1996 [note, not 9x, but
95, I have never touched anything 9x that was not 95 :-)], and I'm
constantly forgetting to add "tested on NT, extrapolated to 9x". So it looks
like the GetTickCount() on 9x provides greater accuracy than QPC(). Kinda
cool :-) I wonder, why they always end up having things *backwards*?

To that matter, I may add that on NT GetTickCount() is a just a handful of
instructions which read some variable in the memory shared between the
kernel and the user modes, typically taking 7-9 CPU clocks to execute. I
guess it is not the case on Win9x. Sigh.

Slava M. Usov

unread,

Jun 17, 2000, 3:00:00 AM6/17/00

to

Raymond Chen <raym...@microsoft.com.---> wrote in message
news:334lks0e28gi0bbac...@4ax.com...

> The reason why QPC doesn't always use RDTSC is twofold.
>
> 1. On multiprocessor systems, you have to make sure that if you
> execute RDTSC on both processors simultaneously, they return the
> same answer back. Otherwise, an app that gets migrated between
> processors is going to see time go backwards.
>
> If a HAL cannot guarantee this behavior, it cannot use RDTSC for
> QPC.

Provided that the hardware asserts the RESET pin on all CPUs simultaneously,
all RDTSCs will be synchronized. [I have posted a few articles about that,
so all whom it may concern, please do a Deja search.] I'd bet that all
hardware designs around do that. Can you say "hot-swap CPUs"? Maybe, but
they'd require a non-standard HAL [even assuming that NT/2000 might actually
run on them], and we're talking about the standard uniprocessor/SMP HALs
supplied with NT/2000 anyway.

At any rate, the uniprocessor HAL does not ever have to assume anything like
that.

> 2. On systems where the CPU runs at multiple speeds (laptops with
> power management, the new intel speedstep stuff, etc.), you
> cannot use RDTSC because it counts cycles, not time. When the
> CPU changes speed, the cycles per second changes, which
> invalidates the result of QueryPerformanceFrequency.
>

> So if a system supports dynamic CPU speeds, it cannot use RDTSC
> for QPC.

Non sequitur. *Any* modern system out there can do that, even those that can
happily run the SMP HAL [I'm typing this on one such machine, with this
nonsense disabled, though]. Following your logic, no machine may ever use
RDTSC. If I have to choose, a high-resolution time source which may fail
[and mind you, QPC() does have a BOOL return value, so it may fail, and no
PSDK document says it cannot fail after some successful runs] versus an
always low-resolution source that cannot fail, I'd choose the first option,
especially having GetTickCount() for low resolution timing always available.

But, what exactly computers are configured to change CPU frequency on the
fly? Notebooks? I can see two
main applications for precise timing, debugging/profiling and real-time
process control. I don't see how notebooks may be useful in either endeavor,
unless they run off a UPS plugged into a wall socket nearby, at full speed.
I can imagine that users of such applications are qualified enough to follow
the guidance to never change CPU frequency when the application is running.

Richard Norman

unread,

Jun 17, 2000, 3:00:00 AM6/17/00

to

There is a subtle misconception in this line of thought. It is true that the
32
bit operating systems can't do a lot of very low level hardware-oriented
procedures like the 16 bit systems could. But that is because the 32 bit
OS's are "more advanced". They are protected multi-threaded, potentially
multi-user systems which protect one user or process from another. The
16 bit systems assumed that the current user "owned" the entire machine
and could do anything with the hardware.

If you want to own the hardware of your machine and control things yourself,
which includes guaranteeing yourself response latency and timing resolution,
then get a true real-time OS. But don't blame Microsoft for having the same
"problems" (features?) that, say, Unix has had all along.

Win95 can do things that NT can't because it really sits on top of a
bastardized 16 bit core. Win95 also doesn't protect my application from
the stupid things yours does so I can't use it. But it is necessary for
things like games.

Gary Chanson <gcha...@no.spam.shore.net> wrote in message
news:VIx25.251$sd6....@news.shore.net...

Gary Chanson

unread,

Jun 17, 2000, 3:00:00 AM6/17/00

to

Slava M. Usov <stripit...@usa.net> wrote in message

news:#udhaG$1$GA....@cppssbbsa02.microsoft.com...

As far as I know, this is exactly how Win9x is implemented. There's no
way that anyone can see "4 or 5 ms" via GetTickCount() since it's resolution
is 55 ms.

--

-GJC
-gcha...@shore.net

Slava M. Usov

unread,

Jun 17, 2000, 3:00:00 AM6/17/00

to

Gary Chanson <gcha...@no.spam.shore.net> wrote in message

news:aSD25.3$m_3...@news.shore.net...

>
> Slava M. Usov <stripit...@usa.net> wrote in message

[...]

> > To that matter, I may add that on NT GetTickCount() is a just a handful
of
> > instructions which read some variable in the memory shared between the
> > kernel and the user modes, typically taking 7-9 CPU clocks to execute. I
> > guess it is not the case on Win9x. Sigh.
>
> As far as I know, this is exactly how Win9x is implemented. There's
no
> way that anyone can see "4 or 5 ms" via GetTickCount() since it's
resolution
> is 55 ms.

Hm. Strange. Perhaps some Win9x releases do it differently? Can anybody post
the actual code of GetTickCount() in kernel32.dll of Win9x? The NT version
is:

77F05F8D: BA 00 00 FE 7F mov edx,7FFE0000h
77F05F92: 8B 02 mov eax,dword ptr [edx]
77F05F94: F7 62 04 mul eax,dword ptr [edx+4]
77F05F97: 0F AC D0 18 shrd eax,edx,18h
77F05F9B: C3 ret

This code reads the 32 bit tick count at 0x7FFE0000, multiples by the "tick
duration" at 0x7FFE0004, then divides the resultant by 2^24 [by shifting to
right]. The final step is necessary because the multiplier is stored in
1/(2^24) ms units [roughly 1/16 nanos] and on my machine equals to
0x0FA00000, which is 15.625 ms. Different from the known and loved 55 ms.
This is consistent with sysinternals information at
http://www.sysinternals.com/nt5.htm

Just for fun, below is the code for a simple utility that determines the
tick granularity by monitoring the value of GetTickCount() and thus should
work on all Win32 systems.

#include <windows.h>
#include <iostream>

int main()
{
typedef unsigned int u;

u c1 = GetTickCount();
for( u samples = 0; samples < 10; samples++ )
{
u c2;
while( (c2 = GetTickCount()) == c1 );
std::cout << " " << c2 - c1;
c1 = c2;
}

std::cout << std::endl;

return 0;
}

A typical output on my machine was:

16 15 16 16 15 16 15 16 16 15

confirming the 15.625 tick duration.

Slava M. Usov

unread,

Jun 17, 2000, 3:00:00 AM6/17/00

to

Richard Norman <rsno...@mediaone.net> wrote in message
news:5zC25.250$j92....@typhoon.mw.mediaone.net...

> There is a subtle misconception in this line of thought. It is true that
the
> 32
> bit operating systems can't do a lot of very low level hardware-oriented
> procedures like the 16 bit systems could. But that is because the 32 bit
> OS's are "more advanced". They are protected multi-threaded, potentially
> multi-user systems which protect one user or process from another. The
> 16 bit systems assumed that the current user "owned" the entire machine
> and could do anything with the hardware.

I don't think so. In bare DOS environments, perhaps. As soon as the user
loaded a few TSRs [SideKick, anyone?], you would have to be very careful and
don't assume you're in the total command of the system, unless you reset all
interrupt vectors and basically prevented the TSR code from running. When
you loaded any "memory extender" or "expander", or both, you could *no
longer* relinquish the control of the machine [unless you did some very
destructive actions to the memory manager], because you would be a happy
inhabitant of the V86 machine. Win3.x and Win9x are in fact very "advanced"
versions of memory managers and unless you have spent years tracing through
system code, you better keep away of "doing anything with hardware", for the
most likely result will be a crash of the whole system.

Neither Win3.x nor Win9x did at any time assume that the applications would
access the hardware directly. They assumed that the applications would *not*
do that, and *asked* the programmers not to. These "16 bit systems" did not
however protected themselves very well, because they simply could not
without a major redesign. Some programmers simply took as a given that if
they could do things the way they had been doing since 1982 [or 1981?], and
the "16 bit system" did not make that impossible, they should by all means
continue doing so. Unfortunately, some of these programmers were Microsoft
employees, so many people regarded this practice as "unofficially blessed"
by Microsoft, even though Microsoft officially urged them not to. The sheer
mass of these programmers and their applications makes Microsoft keep this
crap around for decades. [No, that's not a money issue. They could have
earned more money forcing all Win95 users to upgrade to an NT-based system
for home sector.]

> If you want to own the hardware of your machine and control things
yourself,
> which includes guaranteeing yourself response latency and timing
resolution,
> then get a true real-time OS. But don't blame Microsoft for having the
same
> "problems" (features?) that, say, Unix has had all along.
>
> Win95 can do things that NT can't because it really sits on top of a
> bastardized 16 bit core. Win95 also doesn't protect my application from
> the stupid things yours does so I can't use it. But it is necessary for
> things like games.

Which exactly things can Win95 do that NT cannot? Crash an neighbor
application in a separate address space and take the system with it whenever
the phase of the Moon permits?

Or provide better response latency and timing resolution? Sorry, we just
have discussed the timing resolution question, and for timing latency you
might go to the USENIX site and read an online article that compares the
latencies of Win9x and NT. You will be surprised.

That it can run applications from 80's? As of 17-Jun-2000, I don't care, do
you?

Games? As far as I can tell [I'm not much of a gamer, if at all, mind you]
all major games run under NT, and most die-hard gamers I know prefer NT, due
to its stability and performance, especially for Open GL games they say.
True, some cheap video cards do not perform well under NT, but spending a
few bucks for a better gaming experience is never a question. Personally, I
could care less for this whole game issue.

Tomislav Canic

unread,

Jun 17, 2000, 3:00:00 AM6/17/00

to

Why not try hardware solution? Microcontroller PIC12C508
at a price of <$1 and near 1 MIPS is good choice. Let it
count the events and use RS232 to communicate with.
Assembler is for free, and simple programmer costs
about $3.

--

Best Regards,
Tomislav Canic
Kninska 7
22320 DRNIS
CROATIA

+385(0)21-347-829
+385(0)91-502-92-71
+385(0)22-887-596
tomisla...@NOSPAMsi.tel.hr
Remove NOSPAM from address above.

Microcontrollers, analog & digital electronics,
external AD & DA PC converters, LCD displays,
emulators, EPROM & AT89C2051 programmers, AVR,
hardware programming with Deplhi & Visual Basic.
Documentation for your convenience: Word, Protel,
AutoCAD, Electronics WorkBench.

J. Wesley Cleveland wrote in message <394A9CA4...@oro.net>...

>
>
>"Slava M. Usov" wrote:
>>
>> <Rufus...@GuntherIntl.com> wrote in message
>> news:urZ8xP61$GA.198@cppssbbsa05...
>> > If you are doing nothing else but timing events, you could continuously
>> poll
>> > the timer until the event occurs, then get an additional timing value.
>> This
>> > will
>> > at least give you a window within which the event occurred...
>>
>> Another way is to collect both GetTickCount() values and RDTSC values.
>> Everytime you see GetTickCount() changed from the previous value, you may
>> skip this measurement.
>>

Tomislav Canic

unread,

Jun 17, 2000, 3:00:00 AM6/17/00

to

Slava M. Usov wrote in message
<#vt2Nc71$GA....@cppssbbsa02.microsoft.com>...

>Error Coad <ec...@squat.com> wrote in message

>news:mbs25.702$pg7....@newshog.newsread.com...

>The 500 ticks I mentioned in my previous message were related to the SMP
>HAL, the uniprocessor HAL makes everything about three times slower. The
>core of KeQueryPerformanceCounter() of the SMP HAL, BTW, is:
>
> 80012C7C: 0F 31 rdtsc
> 80012C7E: C2 04 00 ret 4
>
>No interrupts disabled, no ins, no outs. Just one RDTSC. The only overhead
>is the user mode/kernel mode transition.
>
>I don't know how QPC() is implemented in Win9x, but I hear it has terrible
>resolution.
>
>The best way and which is almost as portable as Win32 code in general, is
>just pure RDTSC instruction.
>

>Slava
>

Somewhere (MMX Instructions Manual?) I read that counters
for a cache hit statistic may be used for tiny precise loops
- do somebody knows more about it?
And, about 500 ticks for the RDTSC - are you 100% shure that
500 is correct number? Somewhere I read that TSC counter is
incremented every CPU cycle (not 100% shure).

Dirk Schreib

unread,

Jun 18, 2000, 3:00:00 AM6/18/00

to

Slava M. Usov <stripit...@usa.net> wrote in message

news:uDbu7iH2$GA.197@cppssbbsa04...

> Gary Chanson <gcha...@no.spam.shore.net> wrote in message
> news:aSD25.3$m_3...@news.shore.net...
> >
> > Slava M. Usov <stripit...@usa.net> wrote in message
>
> [...]

just use the following:

__int64 clock64()
{
_asm rdtsc
// ignore warning about missing return statement
};

it is as accurate as your processors clock.
On my notebook it gets incremented 600 000 000 times a second.
(It only works if your processor supports the RDTSC instruction.)

Ciao
Dirk

Ziv Caspi

unread,

Jun 18, 2000, 3:00:00 AM6/18/00

to

Raymond Chen <raym...@microsoft.com.---> wrote in message
news:334lks0e28gi0bbac...@4ax.com...
> The reason why QPC doesn't always use RDTSC is twofold.
>

There's a third reason, I believe. RDTSC appeared only on Pentium and later
processors. Supposedly, NT4 was meant to run on i486 (it doesn't run on
i386 because of some interlocked operations it uses).

Ziv.

Error Coad

unread,

Jun 19, 2000, 3:00:00 AM6/19/00

to

It's not just things like games - it's also professional tools like musical
editors, sound design tools, etc.

> then get a true real-time OS. But don't blame Microsoft for having the
same
> "problems" (features?) that, say, Unix has had all along.

Why not? Just because the computer science students that developed Unix made
mistakes, is no reason for MS to fall into the same trap!

Richard Norman <rsno...@mediaone.net> wrote in message
news:5zC25.250$j92....@typhoon.mw.mediaone.net...
> There is a subtle misconception in this line of thought. It is true that
the
> 32
> bit operating systems can't do a lot of very low level hardware-oriented
> procedures like the 16 bit systems could. But that is because the 32 bit
> OS's are "more advanced". They are protected multi-threaded, potentially
> multi-user systems which protect one user or process from another. The
> 16 bit systems assumed that the current user "owned" the entire machine
> and could do anything with the hardware.
>

> If you want to own the hardware of your machine and control things
yourself,
> which includes guaranteeing yourself response latency and timing
resolution,
> then get a true real-time OS. But don't blame Microsoft for having the
same
> "problems" (features?) that, say, Unix has had all along.
>
> Win95 can do things that NT can't because it really sits on top of a
> bastardized 16 bit core. Win95 also doesn't protect my application from
> the stupid things yours does so I can't use it. But it is necessary for
> things like games.
>

> Gary Chanson <gcha...@no.spam.shore.net> wrote in message

> news:VIx25.251$sd6....@news.shore.net...

> >
> > Error Coad <ec...@squat.com> wrote in message

J. Wesley Cleveland

unread,

Jun 19, 2000, 3:00:00 AM6/19/00

to

"Slava M. Usov" wrote:
>
> Gary Chanson <gcha...@no.spam.shore.net> wrote in message

> news:aSD25.3$m_3...@news.shore.net...
> >
> > Slava M. Usov <stripit...@usa.net> wrote in message
>
> [...]
>

> > > To that matter, I may add that on NT GetTickCount() is a just a handful
> of
> > > instructions which read some variable in the memory shared between the
> > > kernel and the user modes, typically taking 7-9 CPU clocks to execute. I
> > > guess it is not the case on Win9x. Sigh.
> >
> > As far as I know, this is exactly how Win9x is implemented. There's
> no
> > way that anyone can see "4 or 5 ms" via GetTickCount() since it's
> resolution
> > is 55 ms.

I just ran the following program and got

int main(int argc, char* argv[])
{
for (int i=0;i<25;i++)
{
printf("%d\n", GetTickCount());
}
return 0;
}

1463076
1463076
1463076
1463076
1463076
1463077
1463077
1463077
1463077
1463077
1463077
1463077
1463077
1463077
1463077
1463078
1463078
1463078
1463078
1463078
1463080
1463080
1463080
1463080

>
> Hm. Strange. Perhaps some Win9x releases do it differently? Can anybody post
> the actual code of GetTickCount() in kernel32.dll of Win9x?

Sure

mov gs,word ptr ds:[0BFFBC2B4h] ; 4 bytes at 0xc000ea30 base
mov eax,gs:[00000000]
sub edx,edx
mov gs,dx
ret

Slava M. Usov

unread,

Jun 20, 2000, 3:00:00 AM6/20/00

to

Tomislav Canic <tomisla...@si.tel.hr> wrote in message
news:8igsh5$ob2$1...@as102.tel.hr...

> Somewhere (MMX Instructions Manual?) I read that counters
> for a cache hit statistic may be used for tiny precise loops
> - do somebody knows more about it?

I believe you're talking about Pentium and P6 families "performance events".
In a nutshell, these CPUs have two performance monitoring counters, and they
may be programmed to count "performance events". Some of the events are
cache hits/misses. The counters only maintain the ever-increasing sum of the
events, and you in fact should use RDTSC to get their rate. For more
details, download and read "The Intel Architecture Software Developer's
Manual, Volume 3: System Programming Guide (Order Number 243192)."

> And, about 500 ticks for the RDTSC - are you 100% shure that
> 500 is correct number? Somewhere I read that TSC counter is
> incremented every CPU cycle (not 100% shure).

500 ticks was for the execution time of QueryPerformanceCounter(), not for
RDTSC. That was one of the reasons I said "use RDTSC, not QPC()".

Slava M. Usov

unread,

Jun 20, 2000, 3:00:00 AM6/20/00

to

J. Wesley Cleveland <jw...@oro.net> wrote in message

news:394E7DC9...@oro.net...

[...]

> 1463076

[...]

> 1463077

[...]

> 1463078

[...]

> mov gs,word ptr ds:[0BFFBC2B4h] ; 4 bytes at 0xc000ea30 base
> mov eax,gs:[00000000]
> sub edx,edx
> mov gs,dx
> ret

Hmm... perhaps some application/VxD needs higher frequency and reprograms
the timer/RTC?

Slava M. Usov

unread,

Jun 20, 2000, 3:00:00 AM6/20/00

to

Tomislav Canic <tomisla...@si.tel.hr> wrote in message

news:8igsh6$ob2$2...@as102.tel.hr...

> Why not try hardware solution? Microcontroller PIC12C508
> at a price of <$1 and near 1 MIPS is good choice. Let it
> count the events and use RS232 to communicate with.
> Assembler is for free, and simple programmer costs
> about $3.

The only problem is that I cannot get that serial bytestream go directly
into my application. The UART's FIFO will buffer them, then the serial
driver will buffer them, and I'll end up getting runs of timer events every
55 ms :-) [or :-( ]

Tomislav Canic

unread,

Jun 20, 2000, 3:00:00 AM6/20/00

to

Slava M. Usov wrote in message ...

>Tomislav Canic <tomisla...@si.tel.hr> wrote in message
>news:8igsh6$ob2$2...@as102.tel.hr...
>> Why not try hardware solution? Microcontroller PIC12C508
>> at a price of <$1 and near 1 MIPS is good choice. Let it
>> count the events and use RS232 to communicate with.
>> Assembler is for free, and simple programmer costs
>> about $3.
>
>The only problem is that I cannot get that serial bytestream go directly
>into my application. The UART's FIFO will buffer them, then the serial

^^^^^^^^^^^^^^
Hi, Slava! You can turn-off the FIFO.

>driver will buffer them, and I'll end up getting runs of timer events every

^^^^^^
Turn it off and get RS232 chars via interrupt for every char.

>55 ms :-) [or :-( ]

Read the original post:
"Can I retreive the time in time unit little than millisecond ?
I search a way to get the time difference beetween to events. Those
events
are generated approximatly at a 10kHz rate (every 100 microseconds)."

Maybe original poster can says more about his application?
Does it need to capture _every_ event? If those are periodical events with
const.
period for many events, than it is not necesary to capture every event, but
one in every _n_ events.
I think that HW solution is not bad one.

Best Rgds,
Tomislav Canic

Slava M. Usov

unread,

Jun 21, 2000, 3:00:00 AM6/21/00

to

Tomislav Canic <tomisla...@si.tel.hr> wrote in message

news:8ioucm$1sa$4...@as102.tel.hr...

> Slava M. Usov wrote in message ...
> >Tomislav Canic <tomisla...@si.tel.hr> wrote in message
> >news:8igsh6$ob2$2...@as102.tel.hr...
> >> Why not try hardware solution? Microcontroller PIC12C508
> >> at a price of <$1 and near 1 MIPS is good choice. Let it
> >> count the events and use RS232 to communicate with.
> >> Assembler is for free, and simple programmer costs
> >> about $3.
> >
> >The only problem is that I cannot get that serial bytestream go directly
> >into my application. The UART's FIFO will buffer them, then the serial
> ^^^^^^^^^^^^^^
> Hi, Slava! You can turn-off the FIFO.

Tomislav,

in general, I cannot turn off the FIFO because

1. The system may not keep up with the serial port handling every single
byte. For one ms rate most systems probably can, but "that depends".

2. See below.

> >driver will buffer them, and I'll end up getting runs of timer events
every
>
> ^^^^^^
> Turn it off and get RS232 chars via interrupt for every char.

We're are in user land and cannot do it. If you mean that a special driver
is required, this greatly reduce the range of installations your application
+ driver can run on. If say the user wants to use the serial port for your
application and for some external modem, this requires a reboot on most
systems. Personally, I'd not like to reboot just to check my email.

[...]

> I think that HW solution is not bad one.

No it is not, it is just more pain in the neck for users and administrators,
and is not acceptable in every situation.

Vince

unread,

Jun 28, 2000, 3:00:00 AM6/28/00

to

> Why not try hardware solution? Microcontroller PIC12C508
> at a price of <$1 and near 1 MIPS is good choice. Let it
> count the events and use RS232 to communicate with.
> Assembler is for free, and simple programmer costs
> about $3.

I would not use serial port to get real-time results for precision
issues: what amount of time has elapsed between the moment
the time was computed and the one I actually get the figures ?
The best is to access a RTC chip directly.

But, especially under MS OS's, you still haven't any guarantee
that no time has elapsed between the moment you get the value
and the one it is computed just because of possible task
switching. You may even not assume the delay is always the
same for the same reasons. Of course it statistically has a mean
value but you have to compute it. Besides it might be not
significant at all if the delay varies a lot around that mean value.

Anyway MS said since the very beginning that they would not
offer any guarantee for [strictly-speaking] real-time computing
with their OS's.

Note I'm talking about words that were said a couple of years ago
and things might have changed in between but I don't think so.

If you want to perform real-time computing, it is best to let this
task to a so-designed device like automation or microcontroller-
oriented devices, as you said. Note the microcontroller should
be triggered directly by the 'thing' to measure and the OS
machine should only perform some kind of monitoring.

The whole time-measurement process should take place in the
microcontroller. I presume that's what you though of.

If you can bear some lack of precision I think you can measure
times that reach the millisecond. Beyond that limit it becomes
statistically useless IMHO.

Vincent

Vince

unread,

Jun 28, 2000, 3:00:00 AM6/28/00

to

I personally think this rate is too high to keep enough precision
in the measurement (see my previous reply in this group about
MS OS and real-time).

Could you detail the context ?

Vincent

------

"Joe O'Leary" <jol...@artisoft.com> a écrit dans le message news:
Zoc25.34230$hp4.8...@newsread1.prod.itd.earthlink.net...
> Look up multimedia timers. I've never actually needed to use them,
> but I know they support a much higher resolution than GetTickCount
>
> Joe O'
>
> > Hugo Bérubé wrote
> > > Hello,

Ken Reneris

unread,

Jun 28, 2000, 3:00:00 AM6/28/00

to

The multimedia timers are for waking up a thread at a more precise time.

To simple read the time at a higher resolution use QueryPerformanceCounter()
and QueryPerformanceFrequancy(). That is the best the OS can provide as a
standard feature of the platform. (E.g., without an addin card).

It should be noted that the crystal the platform uses for
QueryPerformanceCounter may not be the same crystal that's in use for
keeping the time of day. So the times reported by QueryPerformanceCounter
may drift from GetTickCount() if monitored over a long period of time. But
for short duration timings QueryPerformanceCounter() is what to use. Also
note that reading the performance counter requires a reading the timer
device - so the function itself as a tiny bit of overhead that you may need
to take into account.

- Ken

"Vince" <none@nowhere> wrote in message news:962181131.24692@callisto...

Slava M. Usov

unread,

Jun 29, 2000, 3:00:00 AM6/29/00

to

Ken Reneris <k...@reneris.com> wrote in message
news:slkgb3...@corp.supernews.com...

[...]

> To simple read the time at a higher resolution use
QueryPerformanceCounter()
> and QueryPerformanceFrequancy(). That is the best the OS can provide as
a
> standard feature of the platform. (E.g., without an addin card).

That's not true. It would help if you read the thread before posting.
Nothing personal, and I do believe that you'd benefit from reading.

Ken Reneris

unread,

Jun 29, 2000, 3:00:00 AM6/29/00

to

The thread wasn't available on my news reader.

- Ken

"Slava M. Usov" <stripit...@usa.net> wrote in message

news:#qYNNwb4$GA.1596@cpmsftngp04...

ken_ren...@my-deja.com

unread,

Jun 29, 2000, 3:00:00 AM6/29/00

to

The problem is with RDTSC... it's been
implemented in different ways through the course
of the processors and sometimes does not count
what one thinks. E.g., on some older systems it
stops when the machine hits the idle loop or
slows when the power managment h/w throttles the
cpu to a slowers speed.

On some MP systems the rdtsc counter doesn't
track across processors because the processors
may be using different crystals on each processor
(or in the case of some systems, may support
different speed processors) causing the counts
between processors to "drift". On most MP
systems this does not occur because the
processors are using the same clock signal as
their reference and the counters therefore keep
in lock step.

In the MPS table the bios produces there is a
flag which informs the OS if the OS can rely on
the RDTSC for counts. If so, that is what NT
uses. At boot it determines the frequency of the
counter (by watching it against the RTC) and
resets the count on all processors at the same
time. It should be noted that on W2K I beleive
the count needs to get a base adjustment in case
the system was a sleep or hibernated.

--

Using rdtsc is fine in a application for dev work
as you can control the platform. However, trying
to ship that random users in the world is a
different story, as I suspect it will behave
differently on some machines (portables the most
notably). QueryPerformanceCounter() is the
attempt by the OS to use what is knows to be a
stable source.

What really needs to happen is another flag added
to the firmware to inform the OS when it's OK to
rely on rdtsc for the UP case.

- Ken

In article <394A9AE6...@oro.net>,
"J. Wesley Cleveland" <jw...@oro.net> wrote:

>
>
> Raymond Chen wrote:
> >
> > The reason why QPC doesn't always use RDTSC
is twofold.
> >

> > 1. On multiprocessor systems, you have to
make sure that if you
> > execute RDTSC on both processors
simultaneously, they return the
> > same answer back. Otherwise, an app that
gets migrated between
> > processors is going to see time go backwards.
> >
> > If a HAL cannot guarantee this behavior, it
cannot use RDTSC for
> > QPC.
>

> Then why does the SMP version of NT use RTDSC,
and not the uniprocessor
> ?
>

Sent via Deja.com http://www.deja.com/
Before you buy.

k...@reneris.com

unread,

Jun 29, 2000, 3:00:00 AM6/29/00

to

Slava M. Usov

unread,

Jun 29, 2000, 3:00:00 AM6/29/00

to

<k...@reneris.com> wrote in message news:8jfoh5$c5k$1...@nnrp1.deja.com...

[...]

> On some MP systems the rdtsc counter doesn't track across processors
> because the processors may be using different crystals on each
> processor (or in the case of some systems, may support different speed
> processors) causing the counts between processors to "drift". On most
> MP systems this does not occur because the processors are using the
> same clock signal as their reference and the counters therefore keep in
> lock step.

Well, I can buy different crystals [although I can't help myself thinking
about a possible engineering reason for that], but never different speed
processors. A system with different speed processors is not SMP by the
definition, and NT will only run on true SMP systems.

Steven Schulze

unread,

Jun 29, 2000, 3:00:00 AM6/29/00

to

"Slava M. Usov" <stripit...@usa.net> wrote in message

news:ekFyn5x1$GA.245@cppssbbsa04...

> Error Coad <ec...@squat.com> wrote in message

> news:qY925.520$Qb1....@monger.newsread.com...
> > > Take a look at GetTickCount().
> >
> > And then forget about it, because it is of no help whatsoever here.
> >
> > 1) Its precision is only 1 msec (not one microsecond)
> > 2) Even worse, its resolution is 55 msec!
> >
> > QueryPerformanceCounter() is the correct answer.
>
> Nope. QPC() takes around 500 CPU ticks to execute, which on typical CPUs
is
> around 2 us. This is of course better than GetTickCount(). The correct
> answer is the RDTSC instruction, but you should be prepared to do the
> bookkeeping Real Fast (TM).
>
> And I don't even mention that QPC() works properly only with particular NT
> HALs, typically the SMP ones, and nevermind Win9x, which will welcome you
> with the good ol' fuzzy 55 ms.

I don't think that is correct. I have written a timer class that uses
QueryPerformanceCounter(). By using a calibrating function, I'm able to get
extremely accurate readings. After the class is calibrated (in the
constructor), if I call my class' Start() and Stop() function right after
each other, I get a result of only a few elapsed clock cycles (the class
compensated for time spent in the Start() and Stop() functions). I would
say that under 9x on my PII450, the results are within 10ns, if not better,
for non-pre-empted runs.

I have used this class on 9x, NT4, and W2K with very similar results. I
have not seen the 55ms resolution you talk about. In fact, I originally
wrote the class on W95 on a PentiumI 150, so all my testing was done on it.
I used it to optimize loops in an audio processing application.

I did also try it on different hardware, not just different OSes, and so far
they all give me good results. I am aware that QueryPerformanceCounter() is
not supported on all hardware, but so far I haven't come accross any that
doesn't support it.

Steven Schulze

unread,

Jun 29, 2000, 3:00:00 AM6/29/00

to

"Steven Schulze" <steven....@ktd-kyocera.com> wrote in message
news:OwVCH8f4$GA.246@cppssbbsa04...

It's been a while since I looked at that code, so I made a mistake in my
message. I wasn't using QueryPerformanceCounter(), but the "RDTSC" assembly
command. So just replace QueryPerformanceCounter() with RDTSC in my
previous message.

The same still applies to both NT, W2K and 9x as far as resolution and
accuracy of the RDTSC method of timing.

Steven Schulze

unread,

Jun 29, 2000, 3:00:00 AM6/29/00

to

"Slava M. Usov" <stripit...@usa.net> wrote in message

news:e#rzo2g4$GA....@cppssbbsa02.microsoft.com...

> Steven Schulze <steven....@ktd-kyocera.com> wrote in message

> news:#TxXzEg4$GA.245@cppssbbsa04...

>
> > It's been a while since I looked at that code, so I made a mistake in my
> > message. I wasn't using QueryPerformanceCounter(), but the "RDTSC"
> assembly
> > command. So just replace QueryPerformanceCounter() with RDTSC in my
> > previous message.
>

> Sorry, and what is that going to reveal? A part of my message? The message
> that you said was not correct? Would you like to tell us why you decided
to
> ditch QPC() in favor of RDTSC? Please do, this will certainly reconstruct
> the other part of my message.

Relax, will you! If anyone bothers to go back a few messages, they'll see
you are correct and everybody else is incorrect, so just take it easy....

Whatever, some people seem to insist that it's impossible to get good
resolution and precision on 9X, which is not true. Using RDTSC will indeed
give you very good results (yes, as you pointed out). So my previous
message is more directed to the people who say otherwise, not you in
particular.

Steven Schulze

unread,

Jun 29, 2000, 3:00:00 AM6/29/00

to

"Slava M. Usov" <stripit...@usa.net> wrote in message

news:ekFyn5x1$GA.245@cppssbbsa04...
> Error Coad <ec...@squat.com> wrote in message
> news:qY925.520$Qb1....@monger.newsread.com...

> > > Take a look at GetTickCount().
> >
> > And then forget about it, because it is of no help whatsoever here.
> >
> > 1) Its precision is only 1 msec (not one microsecond)
> > 2) Even worse, its resolution is 55 msec!
> >
> > QueryPerformanceCounter() is the correct answer.
>
> Nope. QPC() takes around 500 CPU ticks to execute, which on typical CPUs
is
> around 2 us. This is of course better than GetTickCount(). The correct
> answer is the RDTSC instruction, but you should be prepared to do the
> bookkeeping Real Fast (TM).
>
> And I don't even mention that QPC() works properly only with particular NT
> HALs, typically the SMP ones, and nevermind Win9x, which will welcome you
> with the good ol' fuzzy 55 ms.

Actually, I do think you are wrong. Here is an excerpt from MSDN regarding
QPC():

"...The Win32 API QueryPerformanceCounter() returns the resolution of a
high- resolution performance counter if the hardware supports one. For x86,
the resolution is about 0.8 microseconds (0.0008 ms). You need to call
QueryPerformanceFrequency() to get the frequency of the high-resolution
performance counter...."

It seems this resolution is hardware dependent, not OS dependent. This has
been my experience on NT and 9X. I get the same results. And 0.0008 ms is
nowhere NEAR your claimed 55ms on 9X.

I think GetTickCount() is the one to avoid, not QPC().

Steven Schulze

Woodrow_Stool

unread,

Jun 29, 2000, 3:00:00 AM6/29/00

to

I'd like to chime in here. There's been a bit of spew posted in these
microsecond timing threads that I'm surprised occured on a Microsoft
sponsored news server.

QueryPerformanceCounter() works correctly as advertised on any version of
Win32 - 95, 98, NT4, 2000. To use it properly you first call
QueryPerformanceFrequency() to get the clock rate. On an x86 box this is
the 8253/8254 timer controller, which has a tick rate of ~838 ns. Slava
went to alot of trouble to extract this code from the kernel in another
thread but came to the wrong conclusion - seeing that the 8253/8254 was
being read, he concluded it only returned "ticks" with 55ms resolution -
BZZT! Not true - this thing is ticking away at ~1.19 Mhz, and rolls over
every ~55 ms - but the individual tick count read (which is what the
extracted code does) is accurate to ~838 nanoseconds, which is exactly what
QueryPerformanceFrequency() tells you.

We publish a precision timing library that works across MSDOS, Win16, Win32,
and shortly Linux. I assure all of you that QueryPerformanceCounter() on
Win32 is the Real Deal.

>
> I think GetTickCount() is the one to avoid, not QPC().
>

Correct!

Steven Schulze

unread,

Jun 29, 2000, 3:00:00 AM6/29/00

to

Slava M. Usov <stripit...@usa.net> wrote in message

news:eON6WFj4$GA....@cppssbbsa02.microsoft.com...

> Steven Schulze <steven....@ktd-kyocera.com> wrote in message

> news:#GAwRjh4$GA.196@cppssbbsa04...

>
> > Actually, I do think you are wrong. Here is an excerpt from MSDN
> regarding
> > QPC():
> >
> > "...The Win32 API QueryPerformanceCounter() returns the resolution of a
> > high- resolution performance counter if the hardware supports one. For
> x86,
> > the resolution is about 0.8 microseconds (0.0008 ms). You need to call
> > QueryPerformanceFrequency() to get the frequency of the high-resolution
> > performance counter...."
>

> Very interesting. What was the version of MSDN library you got that from?
> The online library says nothing even remotely close to that. I looked both
> at the general description of high-res counters,

Actually, it's very close to that, in fact it says EXACTLY that. Try
Article ID: Q115232, 3rd paragraph. From that it's pretty clear that QPC()
is hardware dependent, not OS dependent as you keep insisting on...

I'm using a pretty old MSDN version - Jan 1999. I'm sure they pretty much
all have it in there, you just have to look for it.

Steven Schulze

unread,

Jun 29, 2000, 3:00:00 AM6/29/00

to

Slava M. Usov <stripit...@usa.net> wrote in message

news:unXwaFj4$GA....@cppssbbsa02.microsoft.com...

> Steven Schulze <steven....@ktd-kyocera.com> wrote in message

> news:OE2nsEh4$GA.65@cppssbbsa04...

>
> > Relax, will you! If anyone bothers to go back a few messages, they'll
see
> > you are correct and everybody else is incorrect, so just take it
easy....
>

> I'm quite relaxed, thank you. Perhaps, it is just me, but I check my
sources
> before posting, especially if that contradicts to some other people's
> claims. And for some unknown reason I expect that from others. A question
of
> attitude, I guess.

Really? As it turns out you were wrong about QPC(), so I would not be too
quick about pointing the finger to other people's postings. QPC() has a
resolution of 838ns on most x86 hardware, irrespective of OS. This is a far
cry from your claimed 55ms on 9X. In another post you admitted you "don't
really care for Win9x, and just mechanically repeat what I[you] have
gathered from this [.kernel] forum". Yet you shoot down other people that
actually have EXPERIENCE using QPC() on 9X.

I'm telling you now that my own, actual hands-on experience confirms that
I'm getting the advertised resolution out of QPC() on 9X. I actually did
dig up my old code today, and confirmed the resolution - I'm seeing a
repeatable 10-12 microseconds or so on a short code run. That's pretty
high-res.

What results did YOU experience on 9X?

Steven Schulze

Guillaume

unread,

Jun 29, 2000, 3:00:00 AM6/29/00

to

stev...@pacbell.net (Steven Schulze) wrote in
<ObZY6Ok4$GA.880@cpmsftngp05>:

>Actually, it's very close to that, in fact it says EXACTLY that. Try
>Article ID: Q115232, 3rd paragraph. From that it's pretty clear that
>QPC() is hardware dependent, not OS dependent as you keep insisting
>on...

I looked it up, and it's indeed in the knowledge base.
Apparently it's only valid for Win NT/x86 platforms though.

--
___________________________________________________________________
XS Software
Your software solutions at http://lightning.prohosting.com/~xssoft/

Slava M. Usov

unread,

Jun 30, 2000, 3:00:00 AM6/30/00

to

Steven Schulze <steven....@ktd-kyocera.com> wrote in message

news:#TxXzEg4$GA.245@cppssbbsa04...

> It's been a while since I looked at that code, so I made a mistake in my
> message. I wasn't using QueryPerformanceCounter(), but the "RDTSC"
assembly
> command. So just replace QueryPerformanceCounter() with RDTSC in my
> previous message.

Sorry, and what is that going to reveal? A part of my message? The message
that you said was not correct? Would you like to tell us why you decided to
ditch QPC() in favor of RDTSC? Please do, this will certainly reconstruct
the other part of my message.

--

Slava M. Usov

unread,

Jun 30, 2000, 3:00:00 AM6/30/00

to

Steven Schulze <steven....@ktd-kyocera.com> wrote in message

news:#GAwRjh4$GA.196@cppssbbsa04...

> Actually, I do think you are wrong. Here is an excerpt from MSDN
regarding
> QPC():
>
> "...The Win32 API QueryPerformanceCounter() returns the resolution of a
> high- resolution performance counter if the hardware supports one. For
x86,
> the resolution is about 0.8 microseconds (0.0008 ms). You need to call
> QueryPerformanceFrequency() to get the frequency of the high-resolution
> performance counter...."

Very interesting. What was the version of MSDN library you got that from?
The online library says nothing even remotely close to that. I looked both
at the general description of high-res counters,

http://msdn.microsoft.com/library/default.asp?URL=/library/psdk/winui/timers
_827m.htm , then at the article on QPF(),
http://msdn.microsoft.com/library/default.asp?URL=/library/psdk/winui/timers
_6mk9.htm , and at QPC(),
http://msdn.microsoft.com/library/default.asp?URL=/library/psdk/winui/timers
_4z76.htm , nothing similar.

> It seems this resolution is hardware dependent, not OS dependent.

It's both. This depends on the source of the timing signal the OS chooses,
and that's OS dependent, on how the signal is obtained and used, and that's
again OS dependent, and on the resoltion of the signal, which is hardware
dependent. Two standard NT HALs, the SMP and the non-SMP ones, use different
timing sources. The SMP HAL uses the RDTSC instruction, and it's effective
resolution is the CPU_tick_duration*500 = 500 / CPU_frequency, which amounts
to ~2 us for 300 MHz, which I think is still typical. The non-SMP HAL uses
the PIT counter, normally programmed for the minimal frequency, and that
results in 55 ms. To the best of my knowledge, Win9x uses the PIT counter as
well, so it should have 55 ms, too.

I do admit that there were posts indicating higher resolutions on Win9x,
there were even a post recently claiming that GetTickCount() on Win9x() had
1 ms resolution. But the general agreement is that Win9x has 55 ms
resolution for everything. If you want, you may query Deja and see for
yourself.

> This has
> been my experience on NT and 9X. I get the same results. And 0.0008 ms
is

> nowhere NEAR your claimed 55ms on 9X.

It is this 0.8 us figure that is nowhere close to the resolutions observed
in reality. In the NT case, you're welcome to disassemble the relevant HAL
routines and see why 0.8 us is totally bogus. I don't really care for Win9x,
and just mechanically repeat what I have gathered from this [.kernel] forum.

> I think GetTickCount() is the one to avoid, not QPC().

Depends. I would use RDTSC in any case.

Slava M. Usov

unread,

Jun 30, 2000, 3:00:00 AM6/30/00

to

Steven Schulze <steven....@ktd-kyocera.com> wrote in message

news:OE2nsEh4$GA.65@cppssbbsa04...

> Relax, will you! If anyone bothers to go back a few messages, they'll see
> you are correct and everybody else is incorrect, so just take it easy....

I'm quite relaxed, thank you. Perhaps, it is just me, but I check my sources
before posting, especially if that contradicts to some other people's
claims. And for some unknown reason I expect that from others. A question of
attitude, I guess.

--

Darth Dyson

unread,

Jun 30, 2000, 3:00:00 AM6/30/00

to

In brief - i think these guys are saying yes!
"Hugo Bérubé" <Hugo....@drev.dnd.ca> wrote in message
news:ussNE6u1$GA.281@cppssbbsa04...

> Hello,
>
> Can I retreive the time in time unit little than millisecond ?
>
> I search a way to get the time difference beetween to events. Those
events
> are generated approximatly at a 10kHz rate (every 100 microseconds).
>

> Thanks,
>
> Hugo
>
>

Slava M. Usov

unread,

Jun 30, 2000, 3:00:00 AM6/30/00

to

Woodrow_Stool <inco...@nospam.com> wrote in message
news:OrzEiwi4$GA....@cppssbbsa02.microsoft.com...

> I'd like to chime in here. There's been a bit of spew posted in these
> microsecond timing threads that I'm surprised occured on a Microsoft
> sponsored news server.
>
> QueryPerformanceCounter() works correctly as advertised on any version of
> Win32 - 95, 98, NT4, 2000. To use it properly you first call
> QueryPerformanceFrequency() to get the clock rate. On an x86 box this is
> the 8253/8254 timer controller, which has a tick rate of ~838 ns. Slava
> went to alot of trouble to extract this code from the kernel in another
> thread but came to the wrong conclusion - seeing that the 8253/8254 was
> being read, he concluded it only returned "ticks" with 55ms resolution -
> BZZT! Not true - this thing is ticking away at ~1.19 Mhz, and rolls over
> every ~55 ms - but the individual tick count read (which is what the
> extracted code does) is accurate to ~838 nanoseconds, which is exactly
what
> QueryPerformanceFrequency() tells you.

Hmm. I did not claim that NT's QPC() had 55 ms resolution. I simply pointed
out that it would read the PIT counter, which is slow and hurts overall
performance. The in/out instructions usually lock the bus, take a lot of
cycles to execute, etc. This means it is not very suitable for real
heavy-duty profiling. Its actual resolution is much worse than the
theoretical 0.8 us, see http://x63.deja.com/getdoc.xp?AN=492115793 for an
independent review.

I did said QPC() had 55 ms resolution on Win9x, but that's what I was myself
told. If you maintain that Win9x does have a better resolution for QPC(),
then from now on I'll be repeating that after you.

> We publish a precision timing library that works across MSDOS, Win16,
Win32,
> and shortly Linux. I assure all of you that QueryPerformanceCounter() on
> Win32 is the Real Deal.

I don't think so, but not because the resolution is bad. It's just too much
overhead.

> > I think GetTickCount() is the one to avoid, not QPC().
> >
>

> Correct!

GetTickCount(), or, better, GetSystemTimeAsFileTime(), is to use for "rough"
timing, RDTSC is to use for fine timing, and QPC() is to avoid. The time it
takes to execute QPC() on a uniprocessor NT machine is enough to enter and
leave mutex FOUR TIMES and then spin some. And all this time with the bus
locked and interrupts disabled.

Will Dean

unread,

Jun 30, 2000, 3:00:00 AM6/30/00

to

"Slava M. Usov" <stripit...@usa.net> wrote in message
news:unXwaFj4$GA....@cppssbbsa02.microsoft.com...
>

> I'm quite relaxed, thank you. Perhaps, it is just me, but I check my
sources
> before posting, especially if that contradicts to some other people's
> claims. And for some unknown reason I expect that from others. A question
of
> attitude, I guess.

But you shouldn't cite a disassembly (your own?) wrongly and then use the
term 'bogus' about someone else's information. That's not a good attitude
either.

Will

Will Dean

unread,

Jun 30, 2000, 3:00:00 AM6/30/00

to

"Slava M. Usov" <stripit...@usa.net> wrote in message

news:eON6WFj4$GA....@cppssbbsa02.microsoft.com...

> to ~2 us for 300 MHz, which I think is still typical. The non-SMP HAL uses
> the PIT counter, normally programmed for the minimal frequency, and that
> results in 55 ms. To the best of my knowledge, Win9x uses the PIT counter
as
> well, so it should have 55 ms, too.
>

> It is this 0.8 us figure that is nowhere close to the resolutions observed
> in reality. In the NT case, you're welcome to disassemble the relevant HAL
> routines and see why 0.8 us is totally bogus. I don't really care for
Win9x,
> and just mechanically repeat what I have gathered from this [.kernel]
forum.

Bogus?

Under DOS/9x, the PIT generates ~55ms by generating an interrupt at
over/under-flow (can't remember which) of a 16bit counter.

It's clocked at about 1.17 MHz, so the slowest interupt rate it can generate
is about 55ms. Each tick of the counter is 65536 times smaller or about
840us. On non-SMP machines, I've always assumed that it's programmed to
wrap sooner, giving the 10ms rate.

On non SMP HAL's and 95, QPC reads the timer directly to get the resolution
of approximately 0.8us - it is presumably the hairy business of combinining
this with the overflow counters that involves all the interrupt disabling
and other bad things.

Will

Slava M. Usov

unread,

Jun 30, 2000, 3:00:00 AM6/30/00

to

Will Dean <{newsdump}@industrial.demon.co.uk> wrote in message
news:962377901.10298.0...@news.demon.co.uk...

> On non SMP HAL's and 95, QPC reads the timer directly to get the
resolution
> of approximately 0.8us - it is presumably the hairy business of
combinining
> this with the overflow counters that involves all the interrupt disabling
> and other bad things.

Yes, and exactly for this reason the resolution is far from 0.8 us [speaking
NT here]. As indicated in a Deja URL I referenced in another message, the
actual resolution is around 30 us. Even if the hardware timer ticked at 1
GHz, I would say "don't use QPC() it's slow and terrible." And I say just
that. Besides, I always suggest a valid and better alternative, so I don't
understand why you [seemingly] and a few other folks want to use QPC() so
badly.

Slava M. Usov

unread,

Jun 30, 2000, 3:00:00 AM6/30/00

to

Steven Schulze <stev...@pacbell.net> wrote in message
news:uvwSZRk4$GA.464@cpmsftngp05...

> Really? As it turns out you were wrong about QPC(), so I would not be too
> quick about pointing the finger to other people's postings. QPC() has a
> resolution of 838ns on most x86 hardware, irrespective of OS. This is a
far
> cry from your claimed 55ms on 9X. In another post you admitted you "don't
> really care for Win9x, and just mechanically repeat what I[you] have
> gathered from this [.kernel] forum". Yet you shoot down other people that
> actually have EXPERIENCE using QPC() on 9X.

1. Show me how exactly I was wrong about QPC(). I admit that I was wrong
about QPC() on a particular platform, but that does not mean that my general
"don't use QPC() use RDTC" idea was flawed in any way. Even your own posts
admit that you preferred RDTSC over QPC(). For some reason, you're not
willing to explain why. And you followed up my message where I stated that
general idea, and you told that the general idea was wrong.

2. I did not shoot people with or without experience. I was a little bit
irritated that you first, "disproved" me, then realized you were wrong, then
"corrected" your claim, basically reproducing my general principle, still
pretending that I was wrong. Did you ever admit that QPC() was actually a
lot worse than RDTSC? No, and you used it just because you like inline
assembly, correct?

3. I don't care a thing ;-) about the "resolution" of QPC() so long as it is
one of the slowest Win32 API, and there is a better timing source.

> I'm telling you now that my own, actual hands-on experience confirms that
> I'm getting the advertised resolution out of QPC() on 9X. I actually did
> dig up my old code today, and confirmed the resolution - I'm seeing a
> repeatable 10-12 microseconds or so on a short code run. That's pretty
> high-res.

Sure. Notice the general execution time of the profiled code with QPC(), and
then do the same with RDTSC. QPC() is especially helpful when one tries to p
rofile a high-performance synchronization mechanism, when the amount of time
spent in WaitForXXX(), ReleaseMutex(), SetEvent(), etc are negligible to the
amount of time it takes execute QPC() twice.

Apart from that, QPC() is a nice thing with a handy long name and superb
resolution. Yeah, *right*.

Slava M. Usov

unread,

Jun 30, 2000, 3:00:00 AM6/30/00

to

Will Dean <{newsdump}@industrial.demon.co.uk> wrote in message

news:962377902.10298.1...@news.demon.co.uk...

> But you shouldn't cite a disassembly (your own?) wrongly and then use the
> term 'bogus' about someone else's information. That's not a good attitude
> either.

I repeat. I cited the disassembly, and you could look up my message with
Deja. The message with disassemble did not say anything about 55 ms. It
simple showed that the code that reads the PIT counter, regardless of its
speed, is very slow and locks down the machine completely for the duration
it runs. And yes, 0.8 us is bogus, since you cannot get two successive
readings even at 8 us. If that is not bogus, what is bogus then?

Slava M. Usov

unread,

Jun 30, 2000, 3:00:00 AM6/30/00

to

Steven Schulze <stev...@pacbell.net> wrote in message

news:ObZY6Ok4$GA.880@cpmsftngp05...

> Actually, it's very close to that, in fact it says EXACTLY that. Try
> Article ID: Q115232, 3rd paragraph. From that it's pretty clear that
QPC()
> is hardware dependent, not OS dependent as you keep insisting on...

Sure. Except that the article says *strictly* about NT. If that is not OS
dependent, I don't know what you mean by OS dependent.

> I'm using a pretty old MSDN version - Jan 1999. I'm sure they pretty much
> all have it in there, you just have to look for it.

http://support.microsoft.com/support/kb/articles/q115/2/32.asp , and the
title of it is: "INFO: Timer Resolution in Windows NT and Windows 2000".
*Certainly* relevant for beating me on Win9x, for which I, admittedly, don't
have any sources except anecdotical stories. Do YOU have any sources
different from anecdotical stories?

Slava M. Usov

unread,

Jun 30, 2000, 3:00:00 AM6/30/00

to

Darth Dyson <feel...@hotmail.com> wrote in message
news:OuysdVk4$GA.464@cpmsftngp05...

> In brief - i think these guys are saying yes!

Yes. Via RDTSC. :-)

Joe O'Leary

unread,

Jun 30, 2000, 3:00:00 AM6/30/00

to

How does one determine at runtime if the RDTSC instruction is even
available? I know that on WinNT/2000 I can call the Win32
IsProcessorFeatureAvailable() function to determine of the CPU
supports RDTSC. But what do you do it on a Win9x machine? I need to
be able to do that before I can change my code to use it.

Joe O'

"Slava M. Usov" wrote
> Steven Schulze wrote

Slava M. Usov

unread,

Jun 30, 2000, 3:00:00 AM6/30/00

to

Joe O'Leary <jol...@artisoft.com> wrote in message
news:Ib475.22097$_b3.4...@newsread1.prod.itd.earthlink.net...

> How does one determine at runtime if the RDTSC instruction is even
> available? I know that on WinNT/2000 I can call the Win32
> IsProcessorFeatureAvailable() function to determine of the CPU
> supports RDTSC. But what do you do it on a Win9x machine? I need to
> be able to do that before I can change my code to use it.

Below is a section of the Intel Architecture Software Developer's Manual,
Volume 3. Sorry about wrapping.

[begin quote]

15.5. TIME-STAMP COUNTER
The Intel Architecture (beginning with the Pentium R processor) defines a
time-stamp counter
mechanism that can be used to monitor and identify the relative time of
occurrence of processor
events. The time-stamp counter architecture includes an instruction for
reading the time-stamp
counter (RDTSC), a feature bit (TCS flag) that can be read with the CPUID
instruction, a time-stamp
counter disable bit (TSD flag) in control register CR4, and a model-specific
time-stamp
counter.
Following execution of the CPUID instruction, the TSC flag in register EDX
(bit 4) indicates
(when set) that the time-stamp counter is present in a particular Intel
Architecture processor
implementation. (Refer to "CPUID-CPU Identification" in Chapter 3 of the
Intel Architecture
Software Developer's Manual, Volume 2.)
The time-stamp counter (as implemented in the Pentium R and P6 family
processors) is a 64-bit
counter that is set to 0 following the hardware reset of the processor.
Following reset, the counter
is incremented every processor clock cycle, even when the processor is
halted by the HLT
instruction or the external STPCLK# pin.
The RDTSC instruction reads the time-stamp counter and is guaranteed to
return a monotoni-cally
increasing unique value whenever executed, except for 64-bit counter
wraparound. Intel
guarantees, architecturally, that the time-stamp counter frequency and
configuration will be such
that it will not wraparound within 10 years after being reset to 0. The
period for counter wrap is
several thousands of years in the Pentium R and P6 family processors.
Normally, the RDTSC instruction can be executed by programs and procedures
running at any
privilege level and in virtual-8086 mode. The TSD flag in control register
CR4 (bit 2) allows
use of this instruction to be restricted to only programs and procedures
running at privilege level
0. A secure operating system would set the TSD flag during system
initialization to disable user
access to the time-stamp counter. An operating system that disables user
access to the time-stamp
counter should emulate the instruction through a user-accessible programming
interface.
The RDTSC instruction is not serializing or ordered with other instructions.
Thus, it does not
necessarily wait until all previous instructions have been executed before
reading the counter.
Similarly, subsequent instructions may begin execution before the RDTSC
instruction operation
is performed.
The RDMSR and WRMSR instructions can read and write the time-stamp counter,
respectively,
as a model-specific register (TSC). The ability to read and write the
time-stamp counter with the
RDMSR and WRMSR instructions is not an architectural feature, and may not be
supported by
future Intel Architecture processors. Writing to the time-stamp counter with
the WRMSR
instruction resets the count. Only the low order 32-bits of the time-stamp
counter can be written
to; the high-order 32 bits are 0 extended (cleared to all 0s).

[end quote]

Steven Schulze

unread,

Jun 30, 2000, 3:00:00 AM6/30/00

to

"Slava M. Usov" <stripit...@usa.net> wrote in message

news:eKEZv8q4$GA....@cppssbbsa02.microsoft.com...

> Steven Schulze <stev...@pacbell.net> wrote in message

> news:uvwSZRk4$GA.464@cpmsftngp05...
>
> > Really? As it turns out you were wrong about QPC(), so I would not be
too
> > quick about pointing the finger to other people's postings. QPC() has a
> > resolution of 838ns on most x86 hardware, irrespective of OS. This is a
> far
> > cry from your claimed 55ms on 9X. In another post you admitted you
"don't
> > really care for Win9x, and just mechanically repeat what I[you] have
> > gathered from this [.kernel] forum". Yet you shoot down other people
that
> > actually have EXPERIENCE using QPC() on 9X.
>
> 1. Show me how exactly I was wrong about QPC(). I admit that I was wrong
> about QPC() on a particular platform, but that does not mean that my
general
> "don't use QPC() use RDTC" idea was flawed in any way. Even your own posts
> admit that you preferred RDTSC over QPC(). For some reason, you're not
> willing to explain why. And you followed up my message where I stated that
> general idea, and you told that the general idea was wrong.

The reason I preferred RDTSC was simply because I liked the idea of 1 cycle
resolution vs 838ns resolution. On my PII 450, that's 2.22ns vs 838ns.

> 2. I did not shoot people with or without experience. I was a little bit
> irritated that you first, "disproved" me, then realized you were wrong,
then
> "corrected" your claim, basically reproducing my general principle, still
> pretending that I was wrong. Did you ever admit that QPC() was actually a
> lot worse than RDTSC? No, and you used it just because you like inline
> assembly, correct?

Yes, but as it turned out, the reason why I used RDTSC instead of QPC() was
not because QPC() had a 55ms resolution, as you claimed. Essentially, my
first message is still correct, save for me initially writing QPC() instead
of RDTSC. And if you really want to get into the details, when I went back
to my old code yesterday, I found that I had two different timer classes,
one using QPC(), and the other using both QPC() and RDTSC. The second class
needs QPC() to initially determine the CPU speed, because by just using
RDTSC by itself, you can't get to an actual elapsed time is seconds - you
need the CPU speed.

So, essentially, the only difference between my two timing classes is
resolution. One has 838ns, the other has (depending on CPU speed), around
1ns to 2ns (yes, I know there's overhead, but see below...).

> 3. I don't care a thing ;-) about the "resolution" of QPC() so long as it
is
> one of the slowest Win32 API, and there is a better timing source.

Yes, true, but with a little clever programming and calibrating before
actually doing timing tests, you can compensate for that. I found that the
time to call QPC(), while relatively long, is pretty consistant. Keeping
this in mind, it's easy to subtract the time spent in the class wrapper
functions from the time returned from QPC(). One thing I always do before
doing the calibration is to call Sleep(0) to reduce the chances of my
calibrating function being pre-empted mid-test. Works pretty good. When I
do a Start() and Stop() right after each other once my timing class has been
calibrated, I usually get 0, 0.8 or 1.7us between multiple runs. So it's
pretty safe to assume that for non-pre-empted code runs, the results are
within, say, 5us. Not too shabby. The results from my RDTSC class is
similar, but at an even better resolution, of course.

> > I'm telling you now that my own, actual hands-on experience confirms
that
> > I'm getting the advertised resolution out of QPC() on 9X. I actually
did
> > dig up my old code today, and confirmed the resolution - I'm seeing a
> > repeatable 10-12 microseconds or so on a short code run. That's pretty
> > high-res.
>

> Sure. Notice the general execution time of the profiled code with QPC(),
and
> then do the same with RDTSC. QPC() is especially helpful when one tries to
p
> rofile a high-performance synchronization mechanism, when the amount of
time
> spent in WaitForXXX(), ReleaseMutex(), SetEvent(), etc are negligible to
the
> amount of time it takes execute QPC() twice.
>
> Apart from that, QPC() is a nice thing with a handy long name and superb
> resolution. Yeah, *right*.

Yes, but as I said, you can compensate for that, plus 838ns is pretty high
res, and yes, this is also true on 9X. High enough for MOST types of test,
and again, still WAY higher than your initial 55ms claim...

Steven Schulze

Joe O'Leary

unread,

Jun 30, 2000, 3:00:00 AM6/30/00

to

Thanks!

Joe O'

"Slava M. Usov" <stripit...@usa.net>

> Joe O'Leary <jol...@artisoft.com> wrote i

> > How does one determine at runtime if the RDTSC instruction is even
> > available?
>

> Below is a section of the Intel Architecture Software Developer's
Manual,
> Volume 3. Sorry about wrapping.
>
> [begin quote]

(snip)

Ken Reneris

unread,

Jun 30, 2000, 3:00:00 AM6/30/00

to

I made a different post via Deja news but I think it eat it.

The RDTSC function is fine if you control the platform you use it on because
on almost all platforms it works as one expects. However, there are a
couple of expcetions where the count does not work like you expect which is
why NT doesn't use it on all systems (because people file bugs for these
corner systems and the only solution was to drop RDTSC support).

On MP systems there is a flag in the MPS 1.4d BIOS table which informs the
MP OS if the RDTSC is safe to use. If so, the OS can reset/calibrate QPC()
to use RDTSC. If the flag is not set, then NT will use some other timer
even on MP systems (either the 8254 or the ACPI timer). In an MP case, some
high end MP machines support processors of non-matched speeds. If that has
been what is installed, the BIOS will tell the OS not to use the RDTSC.
Also, in most MP systems are designed to a signal crystal to derive bus and
cpu clocks from; however, some might use different crystals at different
locations. If different crystals are used to derive a processors' clocking
then the processors will drift apart by the acceptable error margin of a
non-compensated crystal (50ppm). This doesn't sound like much, but after
you've been running your system for a week the times between the drift
between then can add up to seconds. So if you take time on one processor,
then later on another you would get something that doesn't correlate very
well. Again in this case NT will revert to using a different timer as the
timing source.

On non-MP machines, there's not similar BIOS flag and there are corner cases
where RDTSC doesn't work as expected. Since the OS has no decent way to
tell if RDTSC will track time it ends up not using it. Most of these
systems are older, but when RDTSC was first introduced the stl-halt
instruction and STPCLK# signal would cause the counter to stop on some
steppings and not others. (STPCLK# is used in portables to save power /
lower temperature.. the h/w attempts to slow down the processor while the
machine is running.) In addition, there were some attempts at actually
altering the input frequency to the processor to drop the voltage which
saves even more power. These things would cause the speed of the counter
(as compared to a wall clock) to change.

It's important to note that RDTSC is a cycle counter, and for the most part
I think it succeeds at being that. However, trying to map the cycle
counter to real time is where the issue occurs. If you only want to count
how long something is in cycles, rdtsc is a good choice. The smaller the
timing the better it is... if you can capture the context switch and
interrupt count on both sides you can even make sure your only timing your
code. If you want to time something that takes a long time (e.g., involves
context switches and interrupts) I might use QFP() counter instead. And if
you want to relate the captured timing to real time (esp over a large
duration), I wouldn't use RDTSC unless your a vertical market solution where
you can control the platform as well. For example, if you use RDTSC to
determine the time to put frames of a movie up.. you might notice that by
the time the movie ended that on some systems your not near the right time.

Please send all flames elsewhere - I only offer what I known about the
topic. My personal opinion is that I think UP systems should add a similar
flag to what MP systems have to allow the OS to use RDTSC when possible as
the QFP().

- Ken

"Slava M. Usov" <stripit...@usa.net> wrote in message

news:u3K5q8q4$GA....@cppssbbsa02.microsoft.com...

> Will Dean <{newsdump}@industrial.demon.co.uk> wrote in message

> news:962377901.10298.0...@news.demon.co.uk...
>
> > On non SMP HAL's and 95, QPC reads the timer directly to get the
> resolution
> > of approximately 0.8us - it is presumably the hairy business of
> combinining
> > this with the overflow counters that involves all the interrupt
disabling
> > and other bad things.
>
> Yes, and exactly for this reason the resolution is far from 0.8 us
[speaking
> NT here]. As indicated in a Deja URL I referenced in another message, the
> actual resolution is around 30 us. Even if the hardware timer ticked at 1
> GHz, I would say "don't use QPC() it's slow and terrible." And I say just
> that. Besides, I always suggest a valid and better alternative, so I don't
> understand why you [seemingly] and a few other folks want to use QPC() so
> badly.
>

Steven Schulze

unread,

Jun 30, 2000, 3:00:00 AM6/30/00

to

"Slava M. Usov" <stripit...@usa.net> wrote in message

news:uwFxxLu4$GA...@cppssbbsa02.microsoft.com...

> Steven Schulze <steven....@ktd-kyocera.com> wrote in message

> news:OTQgdns4$GA.468@cpmsftngp03...

>
> > The reason I preferred RDTSC was simply because I liked the idea of 1
> cycle
> > resolution vs 838ns resolution. On my PII 450, that's 2.22ns vs 838ns.
>

> OK, I can buy that. More to the point, I had a few occasions to profile
code
> spans that would not exceed 200-500 CPU ticks. Needless to say, QPC() is
> useless, and no calibrating can compensate for that. I don't think this
> happens very often, but that happens. And why have two different things
when
> one of them is simply the best?

Simple. If you want an easy method of doing performance tests, but you
don't need extremely high resolution, then QPC() is a good choice. It's
much easier to use than RDTSC. If you want more accuracy, use RDTSC,
keeping in mind it's a little bit more involved to code.

Also, in my case I simply have the two methods because initially I was
unaware of the RDTSC method. Once I found out about it, I implimented it in
a new class. There's no other reason why I have two seperate timing
methods.

But you are right, once you have some sort of RDTSC implimentation (like a
class, etc), there's no reason to use QPC() anymore.

Steven Schulze

Slava M. Usov

unread,

Jul 1, 2000, 3:00:00 AM7/1/00

to

Steven Schulze <steven....@ktd-kyocera.com> wrote in message
news:OTQgdns4$GA.468@cpmsftngp03...

> The reason I preferred RDTSC was simply because I liked the idea of 1
cycle
> resolution vs 838ns resolution. On my PII 450, that's 2.22ns vs 838ns.

OK, I can buy that. More to the point, I had a few occasions to profile code
spans that would not exceed 200-500 CPU ticks. Needless to say, QPC() is
useless, and no calibrating can compensate for that. I don't think this
happens very often, but that happens. And why have two different things when
one of them is simply the best?

> Yes, but as it turned out, the reason why I used RDTSC instead of QPC()
was
> not because QPC() had a 55ms resolution, as you claimed. Essentially, my
> first message is still correct, save for me initially writing QPC()
instead
> of RDTSC.

BTW, I never said your message was incorrect. My reply [a day later, I think
that was a bit too quick of me, and I apologize for being rude] was for
another reason. You are still raving about that 55 ms, while not noticing
that the message of mine had a slightly different focus. It is still
correct, regardless of that. For the modern and the soon-to-be-over-GHz
CPUs, ~1 MHz is not really high resolution. And even if it were 1 GHz now,
QPC() would still be too much overhead and a performance drain. You may not
care about that just like I don't care about Win9x, but everybody's happy
when using RDTSC.

[...]

> Yes, but as I said, you can compensate for that, plus 838ns is pretty high
> res, and yes, this is also true on 9X. High enough for MOST types of
test,
> and again, still WAY higher than your initial 55ms claim...

OK, my 55-ms-QPC()-on-Win9x claim is wrong. Happy? Now tell me, why should I
use QPC()? Why not RDTSC? Why should I put up with QPC() inefficiencies?
Just why? Will you then want to say that with some clever coding the
new-style security APIs can be made usable? How about MFC sockets?

Just listen to your reasoning: "You [i.e., I] said that QPC() on Win9x had
55 ms resolution. That's wrong. So you whole message is wrong. So QPC() is
just as good as RDTSC, except it has a little lower resolution." So any
person who happened to read just that would think: "Hey, QPC() is a good
thing. I don't need to mess with my old compiler to emit the RDTSC
instruction, I will just use QPC()."

What's your point? To make me say "QPC() on Win9x has a resolution better
than 55 ms"? Come on, I have admitted that for a few times now. I will not
say that its source has 0.8 us resolution, since you have failed to
reference anything to confirm that, but I have said and am saying it again:
"some people report that QPC() on Win9x has sub microsecond resolution". Now
what?

Me thesis still holds: "Forget about QPC() and use RDTSC. You have nothing
to lose and everything to gain." Want to add anything?

Ken Reneris

unread,

Jul 1, 2000, 3:00:00 AM7/1/00

to

"Slava M. Usov" <stripit...@usa.net> wrote in message

news:uztd7Q74$GA.244@cppssbbsa05...
> Ken Reneris <k...@reneris.com> wrote in message
> news:slqct06...@corp.supernews.com...
>
> I strongly believe that NT is not designed to run on such systems. For
> example, its NtQuerySystemInformation() returns the number of processors
but
> it does not return frequency of each CPU; rather, it returns the CPU
family,
> stepping, and frequency, one for all CPUs.

Last I knew Microsoft does not directly support it, but NT will run on
different speed processors and some OEMs do do it and support it (e.g.,
Microsoft will refer support on such systems to the OEM). It is moslty in
larger iron where owners of such machines buy upgrades for the machines but
don't want to replace the whole machine at once, so what ends up happening
is the machine has some previous year processors and when they need more
throughput ends up with some current year processors being added.

Note that NtQuerySystemInformation() is not an exposed API -
QuerySystemInformation() is, and you are right that QuerySystemInformation()
does not expose the per-processor information.

But you can find the per-processor information in the registery at:
LOCAL_MACHINE\Hardware\Description\System\CentralProcessor\N

where 'N' is the processor number (e.g, 0, 1, 2 ... 31). It includes
processor-vendor, stepping, frequency and other info for each processor.

QuerySystemInformation() returns the common/lowest processor level being
supported on the platform.

> Anyway, are you aware of any such system, which runs NT? No flames, I
really
> am curious about that.

I think most of the larger OEMs will allow it. The more-or-less standard 4
processor chip set can deal with it, so the OEMs like to say it's possible
but the test matrix gets to be very complex so I don't know what's actually
supported. And, of course, most anything with more then 4 processors
supports mixed speeds as the investment in the machines is so large.

> OK... I agree there is a theoretical possibility. But are there any
machines
> with separate crystals that run NT? All I can think of are a few high-end
> Sun and IBM servers that do hot-swap CPU modules. Perhaps, HP as well. But
> to the best of my knowledge there are no such machines running NT. And I'm
> not even sure that said machines have separates crystals.

Some of the older MP machines, and many of the machines that support more
then 4 processors - and yes NT runs on them. It is why the flag got added
into the MPS spec if the first place. The first NT 4.0 betas used RDTSC on
all MP boxes and then there was the drift problem that was noticed on some
larger systems by (I believe NCR) and the flag needed to be added. Some
other systems turned up afterwords as well.

> [...]
>
> For short timings, RDTSC will do just fine. For real paranoid code, it's
> possible to determine if there is a TSC shift between CPUs, I believe down
> to ten CPU clocks or so.

> To detect if there was a context switch between two measurements, if
that's
> necessary for some reason, GetTickCount() will do fine.

> Frequency shift... well, I can't imagine somebody's going to run software
> that does accurate timing on a portable or anything like that. For
example,
> I've seen quite a few commodity computers used as routers, all sort of
> gateways, servers, process control hosts, etc, all with power management
> disabled. Personal computers and notebooks don't count here, for the human
> operator may and should ensure that a time-critical process won't be
> subjected to a frequency shift.

It all really depends on what and how the timing is being used for, and for
some people portables can not be ruled out. There are many usage models
where RDTSC will work, others where QFP() is a better choice, and yet others
where something else is better.

I should point out that disabling power management does not necessarily
disable thermal management. Some systems rely on slowing the processor(s)
down in thermal situations. Given that damage might be caused, these
features typical are not turn offable. (Most attempt to control thermal
by first using a fan, then slowing the processor, then if possible
disconnecting
the power).

> It's too late, I guess. The old systems, which are likely to have faulty
> CPUs, will not have it. The new systems will not need it. Besides, I don't
> think that it's totally impossible for the OS vendor to compile the list
of
> CPUs that do RDTSC incorrectly.

"incorrectly" is a matter of prespective. RDTSC is a cycle counter. In
all casesit does that. So from the processor's point of view RDTSC works
as advertised. It is not spec'ed that the count will count in a uniform
time, just that it will count processor cycles. Actual cycles the that
processor ran. The subtlty is that cycles may not be created equal. Cycles
might be different on one processor or another in an MP system or cycles
might change for power or thermal management. In fact, if you built a
processor you could argue that you don't want RDTSC to count real-time, that
the interesting thing is to know the number of processor cycles (e.g., work)
that has occured. It is this count that is used for messuring and tuning
code
(either directly or sometimes as part of the performance counters, which
some of which use the same cycle count).

Perhaps the OS could do more - I do not know. But I do know that its very
difficult for the OS to determine, before it occurs, if the platforms RDTSC
counter is going to track with real-time into the future. I still think
the
best answer is to have the platform inform the OS if it will or not.

---

Below is from the Intel web site. Note the third sentence about portable
software. I do not know which Pentium processor they are refering too, but
they mean the internal processor cycle clock. And on some portables that
means it will not count when the h/w slows the processor down, and on at
least some of the earlier pentiums it will not count when the OS hits the
idle loop if the idle loop if the OS uses "sti-hlt" to halt the processor
while waiting. (NT uses sti-hlt on UP systems and some MP systems in W2K
but not normally (and not on NT4), Win95 does not use sti-hlt because of a
different h/w issue).

- Ken

http://developer.intel.com/design/intarch/techinfo/Pentium/mdelregs.htm

Time Stamp Counter (TSC)

A dedicated, free-running, 64-bit time stamp counter is provided on chip.
Note that on the Pentium processor, this counter increments on every clock
cycle, although it is not guaranteed that this will be true on future
processors. As a time stamp counter, the RDTSC instruction reports values
that are guaranteed to be unique and monotonically increasing. >>>Portable
software should not expect that the counter reports absolute time or clock
counts.<<< The user level RDTSC (Read Time Stamp Counter) instruction is
provided to allow a program of any privilege level to sample its value. A
bit in CR4, TSD (Time Stamp Disable) is provided to disable this instruction
in secure environments. Supervisor mode programs may sample this counter
using the RDMSR instruction or reset/preset this counter with a WRMSR
instruction. The counter is cleared after reset.

While the user level RDTSC instruction and a corresponding 64-bit time stamp
counter will be provided in all future Pentium processor compatible
processors, access to this counter via the RDMSR/WRMSR instructions is
dependent upon the particular implementation.

---

Slava M. Usov

unread,

Jul 2, 2000, 3:00:00 AM7/2/00

to

Ken Reneris <k...@reneris.com> wrote in message
news:slqct06...@corp.supernews.com...

No flames below, just a few questions.

[...]

> On MP systems there is a flag in the MPS 1.4d BIOS table which informs the
> MP OS if the RDTSC is safe to use. If so, the OS can reset/calibrate
QPC()
> to use RDTSC. If the flag is not set, then NT will use some other timer
> even on MP systems (either the 8254 or the ACPI timer). In an MP case,
some
> high end MP machines support processors of non-matched speeds. If that
has
> been what is installed, the BIOS will tell the OS not to use the RDTSC.

I strongly believe that NT is not designed to run on such systems. For

example, its NtQuerySystemInformation() returns the number of processors but
it does not return frequency of each CPU; rather, it returns the CPU family,
stepping, and frequency, one for all CPUs.

Anyway, are you aware of any such system, which runs NT? No flames, I really
am curious about that.

> Also, in most MP systems are designed to a signal crystal to derive bus

and
> cpu clocks from; however, some might use different crystals at different
> locations. If different crystals are used to derive a processors'
clocking
> then the processors will drift apart by the acceptable error margin of a
> non-compensated crystal (50ppm). This doesn't sound like much, but after
> you've been running your system for a week the times between the drift
> between then can add up to seconds. So if you take time on one
processor,
> then later on another you would get something that doesn't correlate very
> well. Again in this case NT will revert to using a different timer as the
> timing source.

OK... I agree there is a theoretical possibility. But are there any machines

with separate crystals that run NT? All I can think of are a few high-end
Sun and IBM servers that do hot-swap CPU modules. Perhaps, HP as well. But
to the best of my knowledge there are no such machines running NT. And I'm
not even sure that said machines have separates crystals.

[...]

For short timings, RDTSC will do just fine. For real paranoid code, it's
possible to determine if there is a TSC shift between CPUs, I believe down
to ten CPU clocks or so.

To detect if there was a context switch between two measurements, if that's
necessary for some reason, GetTickCount() will do fine.

Frequency shift... well, I can't imagine somebody's going to run software
that does accurate timing on a portable or anything like that. For example,
I've seen quite a few commodity computers used as routers, all sort of
gateways, servers, process control hosts, etc, all with power management
disabled. Personal computers and notebooks don't count here, for the human
operator may and should ensure that a time-critical process won't be
subjected to a frequency shift.

> Please send all flames elsewhere - I only offer what I known about the

> topic. My personal opinion is that I think UP systems should add a
similar
> flag to what MP systems have to allow the OS to use RDTSC when possible as
> the QFP().

It's too late, I guess. The old systems, which are likely to have faulty

CPUs, will not have it. The new systems will not need it. Besides, I don't
think that it's totally impossible for the OS vendor to compile the list of
CPUs that do RDTSC incorrectly.

--

Slava M. Usov

unread,

Jul 2, 2000, 3:00:00 AM7/2/00

to

Ken Reneris <k...@reneris.com> wrote in message

news:sltbqe...@corp.supernews.com...

[...]

> Some of the older MP machines, and many of the machines that support more
> then 4 processors - and yes NT runs on them. It is why the flag got added
> into the MPS spec if the first place. The first NT 4.0 betas used RDTSC
on
> all MP boxes and then there was the drift problem that was noticed on some
> larger systems by (I believe NCR) and the flag needed to be added. Some
> other systems turned up afterwords as well.

Interesting. The revision of the MP spec that included that bit is dated
05/12/97, which is after [IMHO] the public release of NT 4.0. AFAIK, NCR
crafted [and still does?] special HALs for their hardware anyway.

[...]

> "incorrectly" is a matter of prespective. RDTSC is a cycle counter. In
> all casesit does that. So from the processor's point of view RDTSC works
> as advertised.

Incorrectly as implied by the following: "... incremented every processor

clock cycle, even when the processor is halted by the HLT instruction or the

external STPCLK# pin" [The Intel Architecture Software Developer's Manual,
Volume 3: System Programing Guide]. Incidentally, this manual does not even
mention differences in RDTSC implementations among different steppings of
early CPUs, [clearly for me] implying those were bugs.

[...]

> Perhaps the OS could do more - I do not know. But I do know that its very
> difficult for the OS to determine, before it occurs, if the platforms
RDTSC
> counter is going to track with real-time into the future. I still think
> the
> best answer is to have the platform inform the OS if it will or not.

I still think that clock speed-ups and slow-downs don't occur as a course of
normal running. Not even on notebooks.

What I'd like to see instead of both RDTSC and QPC() is an independent
hardware counter mapped into user-space. Always running at the same speed,
say 1 THz. Note that it does not have to run faster than the actual CPU: all
it has to do is to use a multiplier. Or better yet, map a "page of
counters", all "running" at a few pre-defined speeds. All counters may be
again derived from the single one, running at the current CPU speed.

Gary Chanson

unread,

Jul 3, 2000, 3:00:00 AM7/3/00

to

Will Dean <{newsdump}@industrial.demon.co.uk> wrote in message
news:962377901.10298.0...@news.demon.co.uk...
>
> On non SMP HAL's and 95, QPC reads the timer directly to get the
resolution
> of approximately 0.8us - it is presumably the hairy business of
combinining
> this with the overflow counters that involves all the interrupt disabling
> and other bad things.

Only because they're not very smart. I have a neat routine which does
this without disabling interrupts. All it takes is an extra read of the
overflow data. Can anyone figure out how it's done?

--

-GJC
-gcha...@shore.net

Ken Reneris

unread,

Jul 3, 2000, 3:00:00 AM7/3/00

to

"Slava M. Usov" <stripit...@usa.net> wrote in message

news:OE5YFSE5$GA.1420@cpmsftngp05...

> Ken Reneris <k...@reneris.com> wrote in message
>

> Interesting. The revision of the MP spec that included that bit is dated
> 05/12/97, which is after [IMHO] the public release of NT 4.0. AFAIK, NCR
> crafted [and still does?] special HALs for their hardware anyway.

It was added for halmps (the one that's not specific to a particular OEM).

>
> Incorrectly as implied by the following: "... incremented every processor
> clock cycle, even when the processor is halted by the HLT instruction or
the
> external STPCLK# pin" [The Intel Architecture Software Developer's Manual,
> Volume 3: System Programing Guide]. Incidentally, this manual does not
even
> mention differences in RDTSC implementations among different steppings of
> early CPUs, [clearly for me] implying those were bugs.

That's interesting, but it still doesn't say that rdtsc is going to count
real time. I don't have my old pentium level vol 3, but I suspect the
wording got more precise in the pentium-pro volume to disallow the flip-flop
behaviour that happened on the pentium. (and it wouldn't have been an
errata on the pentium if it's exact behaviour in these cases wasn't defined
at the time).

> I still think that clock speed-ups and slow-downs don't occur as a course
of
> normal running. Not even on notebooks.

They do, but the stpclk# doesn't effect the current processors and
"speedstep" was just recently announced so I don't think it's in any
available portables yet. However, platforms that use speedstep will
actually change the input frequency to the processor so that it can lower
the voltage as well as slowing down the rate of the chip cycling. This
yields significantly more power (and thermal) savings then just slowing the
chip down. I would assume that this is going to effect the rate of the
rdtsc counter. Also transmeta has been making good claims on very low power
and it's due to changing the frequncy and power while running. I suspect
there are still corner cases out there, with more to come, where rdtsc is
not tied to real-time in all cases.

> What I'd like to see instead of both RDTSC and QPC() is an independent
> hardware counter mapped into user-space.

You would want some API to either read the counter or locate it anyway... it
might as well just be QPC(). If there was a good counter that could be
determined from userspace, QFP() could just wrap it and return the answer
without a kernel mode transition. And who knows maybe someday that will
happen.

- Ken

Ken Reneris

unread,

Jul 4, 2000, 3:00:00 AM7/4/00

to

"Slava M. Usov" <stripit...@usa.net> wrote in message

news:OM3A8Ug5$GA.2232@cpmsftngp03...
>
> Talking on time in microseconds, I screwed my system time in months...
>
> >
> > I understand that, but this is as problematic for RDTSC users as it is
for
> > QPC() users. The SMP HAL uses RDTSC for QPC(), so if anything changes
> > or stops the clock rate, both are affected.

The SMP will not be effected. It only uses RDTSC if the BIOS tells it that
it can use it for real-time type time. If the BIOS doesn't tell the SMP hal
this, the HAL will use a different time source.

- Ken

Slava M. Usov

unread,

Jul 5, 2000, 3:00:00 AM7/5/00

to

Talking on time in microseconds, I screwed my system time in months...

--S

Slava M. Usov <stripit...@usa.net> wrote in message

news:uxVH8xR5$GA.292@cppssbbsa04...

> Ken Reneris <k...@reneris.com> wrote in message

> news:sm09u21...@corp.supernews.com...
>
> [...]

>
> > They do, but the stpclk# doesn't effect the current processors and
> > "speedstep" was just recently announced so I don't think it's in any
> > available portables yet. However, platforms that use speedstep will
> > actually change the input frequency to the processor so that it can
lower
> > the voltage as well as slowing down the rate of the chip cycling. This
> > yields significantly more power (and thermal) savings then just slowing
> the
> > chip down. I would assume that this is going to effect the rate of the
> > rdtsc counter. Also transmeta has been making good claims on very low
> power
> > and it's due to changing the frequncy and power while running. I
suspect
> > there are still corner cases out there, with more to come, where rdtsc
is
> > not tied to real-time in all cases.
>

> I understand that, but this is as problematic for RDTSC users as it is for
> QPC() users. The SMP HAL uses RDTSC for QPC(), so if anything changes
> or stops the clock rate, both are affected.
>

> > You would want some API to either read the counter or locate it
anyway...
> it
> > might as well just be QPC(). If there was a good counter that could be
> > determined from userspace, QFP() could just wrap it and return the
answer
> > without a kernel mode transition. And who knows maybe someday that will
> > happen.
>

> Sure. But not before that happens :-)

Slava M. Usov

unread,

Jul 5, 2000, 3:00:00 AM7/5/00

to

Ken Reneris <k...@reneris.com> wrote in message

news:sm595jl...@corp.supernews.com...

>
> "Slava M. Usov" <stripit...@usa.net> wrote in message

> news:OM3A8Ug5$GA.2232@cpmsftngp03...

> >
> > Talking on time in microseconds, I screwed my system time in months...
> >
> > >

> > > I understand that, but this is as problematic for RDTSC users as it is
> for
> > > QPC() users. The SMP HAL uses RDTSC for QPC(), so if anything changes
> > > or stops the clock rate, both are affected.
>

> The SMP will not be effected. It only uses RDTSC if the BIOS tells it
that
> it can use it for real-time type time. If the BIOS doesn't tell the SMP
hal
> this, the HAL will use a different time source.

That's wrong. The machine that I'm writing this on, has built-in facilities
to reduce CPU clocking in case a CPU temperature exceeds some preset value.
And that's an SMP machine. And it does use RDTSC for
KeQueryPerformanceCounter(). And yes, the clock slow down is enabled, albeit
the temperature is set to a rather high value, but that should have no
difference for BIOS/HAL.

And who knows, may be one day I'll shove a hair drier into the case and see
everything running slower, and KeQueryPerformanceCounter() reporting bogus
samples.

--S

Ken Reneris

unread,

Jul 5, 2000, 3:00:00 AM7/5/00

to

"Slava M. Usov" <stripit...@usa.net> wrote in message

news:#ckJ$tj5$GA.292@cppssbbsa04...

> Ken Reneris <k...@reneris.com> wrote in message

> news:sm595jl...@corp.supernews.com...

> >
> > "Slava M. Usov" <stripit...@usa.net> wrote in message

> > news:OM3A8Ug5$GA.2232@cpmsftngp03...

>
> That's wrong. The machine that I'm writing this on, has built-in
facilities
> to reduce CPU clocking in case a CPU temperature exceeds some preset
value.
> And that's an SMP machine. And it does use RDTSC for
> KeQueryPerformanceCounter(). And yes, the clock slow down is enabled,
albeit
> the temperature is set to a rather high value, but that should have no
> difference for BIOS/HAL.

The MB you have uses STPCLK# to "slow" the CPU down, and the CPU you have
doesn't slow RDTSC when STPCLK is used. So the system is OK.

And *even if* you have a CPU where STPCLK# effected the RDTSC, it would be a
BIOS bug if it reported that the OS could use RDTSC as QPC().

- Ken

Slava M. Usov

unread,

Jul 5, 2000, 3:00:00 AM7/5/00

to

Ken Reneris <k...@reneris.com> wrote in message

news:sm6f5u...@corp.supernews.com...

> The MB you have uses STPCLK# to "slow" the CPU down, and the CPU you have
> doesn't slow RDTSC when STPCLK is used. So the system is OK.
>
> And *even if* you have a CPU where STPCLK# effected the RDTSC, it would be
a
> BIOS bug if it reported that the OS could use RDTSC as QPC().

OK... we have made the full circle now. Now you're telling that the CPU slow
down will not affect the RDTSC in any way. And if if it did, that would be a
BIOS/MB bug. What's the problem with RDTSC, then? And which BIOS flag tells
the HAL to use or not to use RDTSC?

Bruce Dawson

unread,

Jul 5, 2000, 3:00:00 AM7/5/00

to

> OK... we have made the full circle now. Now you're telling that the CPU slow
> down will not affect the RDTSC in any way. And if if it did, that would be a
> BIOS/MB bug. What's the problem with RDTSC, then?

No - that's not what he's saying. He's saying that _if_ the BIOS says that the
OS
can use RDTSC as QPC() _then_ it is a bug if RDTSC is affected by a CPU
slowdown. That means that using RDTSC in production code is dangerous
because you don't know whether the BIOS will qualify it for use with QPC().

Personally I avoid RDTSC in production code because it can cause problems
on older CPUs - '486 chips and, I believe, Cyrix chips - that don't support it.
I use it for performance testing in house only.

"Slava M. Usov" wrote:

> Ken Reneris <k...@reneris.com> wrote in message

> news:sm6f5u...@corp.supernews.com...
>
> > The MB you have uses STPCLK# to "slow" the CPU down, and the CPU you have
> > doesn't slow RDTSC when STPCLK is used. So the system is OK.
> >
> > And *even if* you have a CPU where STPCLK# effected the RDTSC, it would be
> a
> > BIOS bug if it reported that the OS could use RDTSC as QPC().
>
> OK... we have made the full circle now. Now you're telling that the CPU slow
> down will not affect the RDTSC in any way. And if if it did, that would be a
> BIOS/MB bug. What's the problem with RDTSC, then? And which BIOS flag tells
> the HAL to use or not to use RDTSC?
>

> --
>
> Slava
>
> Please send any replies to this newsgroup.
> microsoft.public.win32.programmer.kernel

--
.Bruce Dawson, Humongous Entertainment.
http://www.humongous.com/

Slava M. Usov

unread,

Jul 6, 2000, 3:00:00 AM7/6/00

to

Bruce Dawson <bru...@humongous.com> wrote in message
news:3963C0B5...@humongous.com...

> > OK... we have made the full circle now. Now you're telling that the CPU
slow
> > down will not affect the RDTSC in any way. And if if it did, that would
be a
> > BIOS/MB bug. What's the problem with RDTSC, then?
>
> No - that's not what he's saying. He's saying that _if_ the BIOS says that
the
> OS
> can use RDTSC as QPC() _then_ it is a bug if RDTSC is affected by a CPU
> slowdown. That means that using RDTSC in production code is dangerous
> because you don't know whether the BIOS will qualify it for use with
QPC().

Look: there is a bit in an MPS table that says that CPUs are running off
different clocks. This bit, if set, says that RDTSC may be unsafe. Even in
this case, it's possible to use RDTSC, provided you can distinguish between
CPUs. Or if the time between two RDTSC invokations is so small that a thread
switch is unlikely and you don't care about RDTSC differences.

But to the best of my knowledge there are no other bits in the MPS tables
that deal with RDTSC, and specifically there are no bits that specify how
the slow down is implemented.

If there are no such bits, and the HAL decides to use RDTSC [on many SMP
machines it does] to implement QPC(), then QPC() is as good as RDTSC w.r.t.
CPU slowdown.

Now, is there such a bit?

> Personally I avoid RDTSC in production code because it can cause problems
> on older CPUs - '486 chips and, I believe, Cyrix chips - that don't
support it.
> I use it for performance testing in house only.

There is a class of scenarios where QPC(), even if reporting the results of
an RDTSC executed is kernel, is too slow. There's absolutely no way to use
it. If QPC() is used on non-SMP machines, it has huge overhead: the
execution time it takes is enough to bitblt a few kilobytes of an image. In
this forum, people often ask questions like: "I'm digitizing video input,
and I have to read a serial port as well. But I can't seem to perform both,
the CPU usage is just too high." Using QPC() instead of RDTSC in such
situations is prohibitively costly. And the slow older CPUs are usually
ruled out, together with anything that must slow the CPU down in the middle
of processing.

V-man

unread,

Aug 6, 2000, 3:00:00 AM8/6/00

to

Is this instruction linked to the same unsupported higher precision RTC (or
whatever it is) of BIOS's that has existed for the past decade or so?

I heard that in a hardware group. Sounds like this instruction has some ties
to the processor design itself from what some of you have said.

Anyway, is it safe to use, meaning is it reliable on all PC's through QPC()
or inline assembly code?

V-man

Slava M. Usov wrote in message ...
>Error Coad <ec...@squat.com> wrote in message
>news:qY925.520$Qb1....@monger.newsread.com...
>> > Take a look at GetTickCount().
>>
>> And then forget about it, because it is of no help whatsoever here.
>>
>> 1) Its precision is only 1 msec (not one microsecond)
>> 2) Even worse, its resolution is 55 msec!
>>
>> QueryPerformanceCounter() is the correct answer.
>
>Nope. QPC() takes around 500 CPU ticks to execute, which on typical CPUs is
>around 2 us. This is of course better than GetTickCount(). The correct
>answer is the RDTSC instruction, but you should be prepared to do the
>bookkeeping Real Fast (TM).
>
>And I don't even mention that QPC() works properly only with particular NT
>HALs, typically the SMP ones, and nevermind Win9x, which will welcome you
>with the good ol' fuzzy 55 ms.

Rob Barris

unread,

Aug 8, 2000, 3:00:00 AM8/8/00

to

In article <ewJxTB3$$GA.277@cppssbbsa04>, "V-man" <j_ca...@excie.com>
wrote:

We've used QPC on Win9x and it does a lot better than 1mS/55mS.

Rob

Corey Cooper

unread,

Aug 14, 2000, 3:00:00 AM8/14/00

to

I have used Ryle Design's ExacTicks product for years (and before that their
DOS timing tool product). It can be found at
http://www.RyleDesign.com/timetools.html
it's shareware, easy to use, and quite stable.
C.

"Hugo Bérubé" wrote:

> Hello,
>
> Can I retreive the time in time unit little than millisecond ?
>
> I search a way to get the time difference beetween to events. Those events
> are generated approximatly at a 10kHz rate (every 100 microseconds).
>
> Thanks,
>
> Hugo

Emmanuel TULOUP

unread,

Aug 24, 2000, 3:00:00 AM8/24/00

to

There is an absolute thing to have The Best precision:

If you read the number of CPU cycles passed since the boot, I guess you got
the precision you want !

----------------------------------------------------------------------------
-----------------------
#include <windows.h>

#include "TimerCpu.h"

__int64 FrequenceCPU()
{
__int64 a,b;

a = LireCycle();
Sleep(1000);
b = LireCycle();
return (b-a);
}

__int64 LireCycle()
{
unsigned int startH,
startL;

_asm
{
_emit 0x0F
_emit 0x31

mov startH,edx
mov startL,eax
}

return ( (((__int64)startH) << 32) | (__int64)startL );
}
----------------------------------------------------------------------------
-----------------------

good precision !

"Hugo Bérubé" <Hugo....@drev.dnd.ca> a écrit dans le message news:
ussNE6u1$GA.281@cppssbbsa04...

J. Wesley Cleveland

unread,

Aug 24, 2000, 3:00:00 AM8/24/00

to

If you are using VC6, this compiles down to an inline rdtsc instruction

#pragma warning( disable : 4035 )
inline __int64 Timer_ReadTSC_sys(void)
{
__asm rdtsc

Bruce Dawson

unread,

Aug 24, 2000, 3:00:00 AM8/24/00

to

I used frequency determination code like that for a while, but I
found it was fairly inaccurate, because the Sleep()
call is not guaranteed to return in exactly one second.
If you divide the result by the actual elapsed time, and
run the whole thing at a temporarily raised priority (so
that control doesn't switch between the getWallClockTime()
functions and LireCycle() functions) then you get much
greater accuracy, even with a shorter delay.

Emmanuel TULOUP wrote:

--

Emmanuel TULOUP

unread,

Aug 25, 2000, 3:00:00 AM8/25/00

to

Yes, the Sleep causes a little error, about 0.25 %,
but we can have a precision of micro-seconde, and less.

"Bruce Dawson" <bru...@humongous.com> a écrit dans le message news:
39A5A2C1...@humongous.com...

Bruce Dawson

unread,

Aug 25, 2000, 3:00:00 AM8/25/00

to

The error in Sleep is completely dependent on what is going on
in your system and can be arbitrarily large. It is quite possible
for your code to be swapped out during the Sleep() - or
worse.

The innaccuracies and inconsistencies I saw without measuring
the actual elapsed time were, under reasonable conditions,
substantially greater than 0.25%

Jim Conyngham

unread,

Sep 1, 2000, 6:05:33 PM9/1/00

to

I have seen problems in similar code. The best I've been able to do is about 1%
error, unless you're will to do the measurement over much longer than 1 second.

Jim Conyngham