On Thu, 20 Mar 2014 16:43:10 -0400, Albert van der Horst <alb...@spenarnc.xs4all.nl> wrote:
> In article <op.xcuu9nka6zenlw@localhost>,
> Rod Pemberton <dont_us...@xnothavet.cqm> wrote:
>> On Sat, 15 Mar 2014 09:48:00 -0400, Assad <assad....@alum.swarthmore.edu> wrote:
>>> So a practical question:
>>>
>>> How to measure the execution speed of an algorithm in Forth?
>>>
>>
>> A few people mentioned the RDTSC instruction for x86 processors.
>> As mentioned in the past here, RDTSC is only for single-core
>> processors, Pentium or later. RDTSC doesn't work correctly on
>
> Party line/folklore/urban legend.
Really? ...
> Read my post. We Forthers care about whether it actually works.
>
> Just try it.
> TICKS 1000 MS TICKS
>
...
>> multiple core processors. RDTSCP is for multiple core processors.
>> You're also supposed to issue a serializing instruction prior to
>> executing RDTSC or RDTSCP. One such instruction is CPUID which
>> can be used to determine if the RDTSC and/or RDTSCP instructions
>> are available. Even when using RDTSC and/or RDTSCP some processors:
>> 1) fail to update the TSC (Time Stamp Counter) using the actual
>> clock speed of the processor
>> 2) suffer from TSC drifts
>> 3) have TSCs which are affected by power management events
>
> Have you actually tried any of this out?
>
I do have sample C code in DOS for calling RDTSC, but not RDTSCP.
Except for the sample, I don't have any code that uses either.
I do need to implement it for my Forth and my stalled OS project.
> Can you point to anywhere, even the obscurest corner of the Internet,
> with a story about a multicore processor that fails to have
> the RDTSC?
>
Fails to *have* an RDTSC instruction, or fails to have a correctly
*working* RDTSC instruction?
Every x86 processor since the Pentium _should_ have the RDTSC instruction.
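Rather than assume, the feature flags can be read through CPUID. A minimal sketch in C, assuming GCC or Clang and their <cpuid.h> header. The function names are mine, made up for illustration. The bit positions are the documented feature flags: leaf 1 EDX bit 4 for RDTSC, leaf 0x80000001 EDX bit 27 for RDTSCP. On a non-x86 build the query functions just report "unknown".

```c
#include <stdint.h>

#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
#include <cpuid.h>
#define HAVE_X86_CPUID 1
#else
#define HAVE_X86_CPUID 0
#endif

/* CPUID leaf 1, EDX bit 4: RDTSC is supported. */
static int edx_has_rdtsc(uint32_t leaf1_edx)
{
    return (int)((leaf1_edx >> 4) & 1u);
}

/* CPUID leaf 0x80000001, EDX bit 27: RDTSCP is supported. */
static int edx_has_rdtscp(uint32_t ext1_edx)
{
    return (int)((ext1_edx >> 27) & 1u);
}

/* Query one CPUID leaf and hand its EDX to a bit test.
   Returns 1 or 0 from the test, or -1 if the leaf cannot be read here. */
static int query_edx(unsigned int leaf, int (*test)(uint32_t))
{
#if HAVE_X86_CPUID
    unsigned int a, b, c, d;
    if (!__get_cpuid(leaf, &a, &b, &c, &d))
        return -1;              /* leaf not supported by this CPU */
    return test((uint32_t)d);
#else
    (void)leaf;
    (void)test;
    return -1;                  /* not an x86 build */
#endif
}

static int cpu_has_rdtsc(void)  { return query_edx(1u, edx_has_rdtsc); }
static int cpu_has_rdtscp(void) { return query_edx(0x80000001u, edx_has_rdtscp); }
```

A caller would treat -1 as "don't know" and fall back to some other timer.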
As for pointing to stories about multicore processors that fail to
have a correctly working RDTSC and/or RDTSCP anywhere in the obscurest
corners of the Internet, yes, I can point to a few dark corners:
"TSC drift can occur on K8 AMD multi-processor platforms and
single-processor dual-core platforms as they do not provide frequency
independent TSC. This drift does not occur on single-processor single-core
platforms for obvious reasons."
http://web.archive.org/web/20101129154633/http://developer.amd.com/documentation/articles/Pages/1214200692.aspx
"...efficeon updates RDTSC independently of the actual CPU clock speed and
at a rate that corresponds to the maximum CPU frequency."
http://www.vanshardware.com/articles/2004/05/040517_efficeonFreeze/040517_efficeonFreeze.htm
"The behavior of the RDTSCP instruction is implementation dependent. The
TSC counts at a constant rate, but may be affected by power management
events (such as frequency changes), depending on the processor
implementation.
If CPUID 8000_0007.edx[8] = 1, then the TSC rate is ensured to be invariant
across all P-States, C-States, and stop-grant transitions (such as STPCLK
Throttling); therefore, the TSC is suitable for use as a source of time."
-from AMD64 Architecture Programmer's Manual Volume 3: General-Purpose
and System Instructions, AMD64 24594 r3.14
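The invariant-TSC bit quoted above can be tested the same way. A hedged sketch in C, again assuming GCC or Clang with <cpuid.h>. The function names are my own. The only fact taken from the manual is the bit itself, CPUID 8000_0007 EDX[8].

```c
#include <stdint.h>

#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
#include <cpuid.h>
#define HAVE_X86_CPUID 1
#else
#define HAVE_X86_CPUID 0
#endif

/* CPUID 8000_0007 EDX bit 8: TSC rate is invariant across
   P-states, C-states and stop-grant transitions. */
static int edx_tsc_invariant(uint32_t edx)
{
    return (int)((edx >> 8) & 1u);
}

/* Returns 1 if the TSC is reported invariant, 0 if not,
   -1 if leaf 0x80000007 cannot be read on this machine/build. */
static int tsc_is_invariant(void)
{
#if HAVE_X86_CPUID
    unsigned int a, b, c, d;
    if (!__get_cpuid(0x80000007u, &a, &b, &c, &d))
        return -1;
    return edx_tsc_invariant((uint32_t)d);
#else
    return -1;
#endif
}
```

If this returns anything other than 1, the TSC should not be trusted as a wall-clock source on that machine.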
The following .pdf says that out-of-order-execution CPUs (PPro, PII)
need to use CPUID and RDTSC:
1) together, with CPUID serializing execution ahead of RDTSC
2) together three or four times, as a warm-up, before using them to
time code
"Using the RDTSC Instruction for Performance Monitoring"
http://www.ccsl.carleton.ca/~jamuir/rdtscpm1.pdf
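Roughly, that recipe looks like the following in C. This is my own sketch, not code from the paper. It assumes GCC or Clang inline assembly on x86 and falls back to clock() elsewhere, in which case the units are clock() ticks rather than TSC ticks. The names serialized_ticks, time_ticks, and sample_work are made up for illustration.

```c
#include <stdint.h>
#include <time.h>

/* Read the TSC with a CPUID in front of it so that earlier
   instructions have completed before the counter is sampled. */
static uint64_t serialized_ticks(void)
{
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    uint32_t a = 0, b, c = 0, d, lo, hi;
    __asm__ __volatile__("cpuid"
                         : "+a"(a), "=b"(b), "+c"(c), "=d"(d)
                         :
                         : "memory");
    __asm__ __volatile__("rdtsc" : "=a"(lo), "=d"(hi));
    (void)a; (void)b; (void)c; (void)d;
    return ((uint64_t)hi << 32) | lo;
#else
    return (uint64_t)clock();   /* fallback: not the TSC */
#endif
}

/* Time fn(): run the CPUID+RDTSC pair three times as a warm-up,
   keep the last back-to-back reading as the cost of the pair itself,
   then subtract that cost from the real measurement. */
static uint64_t time_ticks(void (*fn)(void))
{
    uint64_t start, elapsed, overhead = 0;
    int i;

    for (i = 0; i < 3; i++) {
        start = serialized_ticks();
        overhead = serialized_ticks() - start;
    }

    start = serialized_ticks();
    fn();
    elapsed = serialized_ticks() - start;

    return elapsed > overhead ? elapsed - overhead : 0;
}

/* Hypothetical workload, only here so there is something to time. */
static volatile uint32_t sample_counter;

static void sample_work(void)
{
    uint32_t i;
    sample_counter = 0;
    for (i = 0; i < 1000000u; i++)
        sample_counter++;
}
```

Calling time_ticks(sample_work) gives one sample. Since interrupts and core migration can still disturb a single reading, taking the minimum over several runs is the usual precaution.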
FYI, list of serializing instructions via Sandpile.org
http://www.sandpile.org/x86/coherent.htm
Most of that was harvested from other posts of mine to comp.lang.asm.x86 in
'06 and '08. I updated one link to the Wayback Machine's archived copy of
an AMD page.
Rod Pemberton