wrote in message
news:3949f35f-fb8c-4f97...@googlegroups.com...
> On Friday, June 23, 2017 at 10:13:19 PM UTC+10, James Van Buskirk wrote:
> > wrote in message
> > news:2e8d296d-09c1-4f98...@googlegroups.com...
> > > I'd be interested in 64bit gFortran (Windows) code for RDTSC, if you
> > > have
> > > it available.
> > > It is my impression that QueryPerformanceCounter = RDTSC/1024, ie a
> > > 10-bit
> > > shift
> > The code is just
> > RDTSC
> > SHL RDX, 32
> > OR RAX, RDX
> > RET
> > But gfortran doesn't have an inline assembler so I wrote code that
> > pokes the appropriate machine code into memory and creates
> > procedure pointers to it. Posted at
> >
https://groups.google.com/d/msg/comp.lang.fortran/G5B-O3tvNGE/D-xgSru6KrUJ
> Thanks for your suggestion, although I find the reference a bit complex.
> I was hoping for code for: INTEGER*8 FUNCTION RDTSC_tick ()
> I would like to run it with 64-bit gFortran from
http://www.equation.com,
> which
> I have found to be a fairly robust windows version of the gFortran
> compiler.
Some of the complexity is there because there are multiple functions
included
in the example some some of it due is due use of ambivalent machine code
that includes a branch to select a 32- or 64-bit version as well as both
versions.
I pared it down a bit to include only RDTSC, and did the selection between
32- or 64-bit code in the Fortran part, so the machine code is now much more
readable. Also I used an initializer for my function pointer so that on the
first invocation the code is poked into memory and the procedure pointer
is pointed at the poked code which is then invoked so that the user doesn't
have to initiate that step himself. Unfortunately ifort 16.0 can't handle
that
initializer so I was only able to test with gfortran (both 32- and 64-bits).
Can
someone confirm that it works with more recent ifort?
D:\gfortran\clf\rdtsc>type hello.f90
module rdtsc_mod
use ISO_C_BINDING
implicit none
! We will not export anything but the pointer to the rdtsc function
private
! Interface for rdtsc function
abstract interface
function rdtsc_iface() bind(C)
import
implicit none
integer(C_INT64_T) rdtsc_iface
end function rdtsc_iface
end interface
! Define pointer to rdtsc function and initialize to point
! at initialization function
procedure(rdtsc_iface), pointer, public :: rdtsc => rdtsc_init
! Constants required for VirtualAlloc and VirtualProtect
integer(C_INT32_T), parameter :: &
MEM_COMMIT = int(Z'00001000',C_INT32_T), &
MEM_RESERVE = int(Z'00002000',C_INT32_T), &
PAGE_READWRITE = int(Z'04',C_INT32_T), &
PAGE_EXECUTE = int(Z'10',C_INT32_T)
! Interfaces for VirtualAlloc, GetLastError, and VirtualProtect
interface
function VirtualAlloc(lpAddress,dwSize,flAllocationType, &
flProtect) bind(C,name='VirtualAlloc')
import
implicit none
!DEC$ ATTRIBUTES STDCALL :: VirtualAlloc
!GCC$ ATTRIBUTES STDCALL :: VirtualAlloc
type(C_PTR) VirtualAlloc
type(C_PTR), value :: lpAddress
integer(C_SIZE_T), value :: dwSize
integer(C_INT32_T), value :: flAllocationType
integer(C_INT32_T), value :: flProtect
end function VirtualAlloc
function GetLastError() bind(C,name='GetLastError')
import
implicit none
!DEC$ ATTRIBUTES STDCALL :: GetLastError
!GCC$ ATTRIBUTES STDCALL :: GetLastError
integer(C_INT32_T) GetLastError
end function GetLastError
function VirtualProtect(lpAddress,dwSize,flNewProtect, &
lpflOldProtect) bind(C,name='VirtualProtect')
import
implicit none
!DEC$ ATTRIBUTES STDCALL :: VirtualProtect
!GCC$ ATTRIBUTES STDCALL :: VirtualProtect
integer(C_INT32_T) VirtualProtect
type(C_PTR), value :: lpAddress
integer(C_SIZE_T), value :: dwSize
integer(C_INT32_T), value :: flNewProtect
integer(C_INT32_T) :: lpflOldProtect
end function VirtualProtect
end interface
contains
! Initialization procedure for rdtsc. It will be called on the
! first invocation of rdtsc and sets up our real rdtsc function
function rdtsc_init() bind(C)
integer(C_INT64_T) rdtsc_init
! Machine code for 32-bit function
integer(C_INT8_T), target :: BAD_STUFF_32(3)
data BAD_STUFF_32 / &
Z'0F', Z'31', & ! rdtsc
Z'C3' / ! ret
! Machine code for 64-bit function
integer(C_INT8_T), target :: BAD_STUFF_64(10)
data BAD_STUFF_64 / &
Z'0F', Z'31', & ! rdtsc
Z'48', Z'C1', Z'E2', Z'20', & ! shl rdx, 32
Z'48', Z'09', Z'D0', & ! or rax, rdx
Z'C3' / ! ret
! Address the OS allocates for our function via VirtualAlloc
type(C_PTR) rdtsc_address
! Last error code
integer(C_INT32_T) last
! Fortran pointer to write our function to
integer(C_INT8_T), pointer :: rdtsc_code(:)
! Error status from VirtualProtect
integer(C_INT32_T) status
! Variable to store old memory protection code
integer(C_INT32_T) OldProtect
! Get writable address from OS to put our function in
! Need different sizes for 32- and 64-bit modes
if(bit_size(0_C_INTPTR_T) == 32) then
rdtsc_address = VirtualAlloc( &
lpAddress = C_NULL_PTR, &
dwSize = size(BAD_STUFF_32,kind=C_SIZE_T), &
flAllocationType = ior(MEM_COMMIT,MEM_RESERVE), &
flProtect = PAGE_READWRITE)
else
rdtsc_address = VirtualAlloc( &
lpAddress = C_NULL_PTR, &
dwSize = size(BAD_STUFF_64,kind=C_SIZE_T), &
flAllocationType = ior(MEM_COMMIT,MEM_RESERVE), &
flProtect = PAGE_READWRITE)
end if
! If something goes wrong, print out error code and abort
if(.NOT.C_ASSOCIATED(rdtsc_address)) then
last = GetLastError()
write(*,'(*(g0))') &
'rdtsc_init failed in VirtualAlloc with code ', last
stop
end if
! Get Fortran pointer to allocated memory and poke our
! function into it. Then mark it as executable. Need
! to poke in code appropriate to address size.
if(bit_size(0_C_INTPTR_T) == 32) then
call C_F_POINTER(rdtsc_address,rdtsc_code, &
[size(BAD_STUFF_32)])
rdtsc_code = BAD_STUFF_32
status = VirtualProtect( &
lpAddress = rdtsc_address, &
dwSize = size(BAD_STUFF_32,kind=C_SIZE_T), &
flNewProtect = PAGE_EXECUTE, &
lpflOldProtect = OldProtect)
else
call C_F_POINTER(rdtsc_address,rdtsc_code, &
[size(BAD_STUFF_64)])
rdtsc_code = BAD_STUFF_64
status = VirtualProtect( &
lpAddress = rdtsc_address, &
dwSize = size(BAD_STUFF_64,kind=C_SIZE_T), &
flNewProtect = PAGE_EXECUTE, &
lpflOldProtect = OldProtect)
end if
! If something goes wrong, print out error code and abort
if(status == 0) then
last = GetLastError()
write(*,'(*(g0))') &
'rdtsc_init failed in VirtualProtect with code ', last
stop
end if
! Point the function pointer at the function we just poked into memory
call C_F_PROCPOINTER(transfer(rdtsc_address,C_NULL_FUNPTR), &
rdtsc)
! We still have to return the TSC value for transparency
rdtsc_init = rdtsc()
end function rdtsc_init
end module rdtsc_mod
program hello
use rdtsc_mod
use ISO_C_BINDING, only: C_INT64_T
implicit none
integer(C_INT64_T) t0, tf
t0 = rdtsc()
write(*,'(*(g0))') 'Hello, world'
tf = rdtsc()
write(*,'(*(g0))') 'Time = ',tf-t0
t0 = rdtsc()
tf = rdtsc()
write(*,'(*(g0))') 'Time = ',tf-t0
endprogram hello
D:\gfortran\clf\rdtsc>gfortran -fno-range-check hello.f90 -ohello
D:\gfortran\clf\rdtsc>hello
Hello, world
Time = 438947
Time = 36
As can be seen in the above example (note copious comments) all
the user need do is USE the rdtsc_mod module and then the
rdtsc function works transparently.
> I suspect that it is contained in the code you referenced, but
> I would prefer something more concise.
Check
> There are two other questions you may be able to reply to:
> 1) Does RDTSC need some initialisation ? A version of RDTSC
> interface I have for ifort (2013) does use an initialisation code.
> I also use QueryPerformanceCounter so suspect RDTSC is being
> initialised via these calls.
No, RDTSC needs no initialization. It's possible to write to the TSC,
but that's a privileged instruction so mostly one just uses the
difference between two values read and subtracts. Since my
example doesn’t link to a *.LIB or *.OBJ file it needs some
preliminaries where, with the help of the OS, Fortran can set
up a function which can subsequently be invoked.
> 2) I am puzzled by the speed up of the processor in my i7-6700HQ
> or i7-4790K. ( as reported by task manager) SYSTEM_CLOCK
> appears to provide accurate clock times, but I am wondering
> about RDTSC and also QueryPerformanceCounter, which appear
> to be RDTSC / 1024.
Yeah, I haven't played around with the consequences of the
processor varying its speed. You are sort of on your own here.
> The cycle rate of RDTSC is also a problem, which I approximate
> by using QueryPerformanceFrequency * 1024
I normally use RDTSC only for timing short chunks of code where
other timers don't have enough precision. Relative times are what
are important to me, so I'm not so interested in wall-clock time.
> For Silverfrost FTN95 /64, RDTSC_VAL@() is a 64 bit replacement
> for REAL*10 CPU_CLOCK@ and appears to work for 32 bit also.
> As INTEGER*8, I find this is a better option.
> RDTSC rate is not documented and appears to be the processor
> rate. My approach for RDTSC_tick_rate works on all pc's I have
> available to test.
RDTSC is documented in 325462.pdf. The rate is the processor
rate. It should be noted that RDTSC advances in chunks of the
bus pace in units of the processor pace, so you may find that all
timings have a common divisor like 5 or 13 or something.
Check