Hi,
> Would you know why hb_threadReleaseCPU is hardcoded to 20 milliseconds under windows?
It was chosen a long time ago, and not by me.
Smaller values are probably ignored by some Windows kernels because of
the internal system timer resolution. See the thread about kernel timer
resolution in Windows.
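The effect of timer resolution on short sleeps can be checked directly. What follows is only a minimal sketch, assuming a POSIX system: it uses nanosleep() and clock_gettime(), not the Windows Sleep() call discussed here, and the helper names now_ns() and measured_sleep_ns() are made up for illustration. It asks for a short sleep and reports how long the call really took.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdint.h>
#include <time.h>

/* Current monotonic time in nanoseconds. */
static int64_t now_ns( void )
{
   struct timespec ts;

   clock_gettime( CLOCK_MONOTONIC, &ts );
   return ( int64_t ) ts.tv_sec * 1000000000LL + ts.tv_nsec;
}

/* Hypothetical helper: request a sleep of requested_ns nanoseconds and
 * return the time the call really took. Any excess over the request
 * comes from the timer and scheduler granularity of the system. */
static int64_t measured_sleep_ns( int64_t requested_ns )
{
   struct timespec req;
   int64_t start;

   req.tv_sec = ( time_t ) ( requested_ns / 1000000000LL );
   req.tv_nsec = ( long ) ( requested_ns % 1000000000LL );

   start = now_ns();
   nanosleep( &req, NULL );
   return now_ns() - start;
}
```

Calling measured_sleep_ns( 1000000 ) asks for 1 millisecond. On a system with a coarse timer tick the returned value can be noticeably larger than the request.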
> Below I am sending a self-contained example that shows how a Harbour application
> can gain significant (100%) performance by calling WAPI Sleep() directly with the
> appropriate resolution instead of hb_idlesleep(), which releases the CPU for at
> least 20 milliseconds.
Code where performance is important should not use any nonzero Sleep()
calls. It should do its job as fast as possible and then freeze for a
longer period. Otherwise most of the CPU time is consumed by process
switching, which is a very expensive task. Programs that monitor process
time will show that the process uses very little, but in fact the overhead
for the whole system is very high.
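The two patterns can be put side by side in a minimal sketch. It assumes a POSIX system (nanosleep() instead of Sleep()), and process_item(), pause_ms() and both run_* functions are hypothetical names used only for illustration; none of them is Harbour code.

```c
#define _POSIX_C_SOURCE 200809L
#include <stddef.h>
#include <time.h>

/* Hypothetical unit of work, standing in for the real job. */
static long process_item( long item )
{
   return item * 2;
}

/* Sleep for ms milliseconds. */
static void pause_ms( long ms )
{
   struct timespec ts;

   ts.tv_sec = ms / 1000;
   ts.tv_nsec = ( ms % 1000 ) * 1000000L;
   nanosleep( &ts, NULL );
}

/* Discouraged: give up the CPU after every item, which forces one
 * process switch per item. */
static long run_sleep_per_item( const long * items, size_t n, long ms,
                                size_t * sleeps )
{
   long total = 0;
   size_t i;

   *sleeps = 0;
   for( i = 0; i < n; ++i )
   {
      total += process_item( items[ i ] );
      pause_ms( ms );
      ++*sleeps;
   }
   return total;
}

/* Suggested: finish all pending work first, then freeze once for a
 * longer period. */
static long run_batch_then_sleep( const long * items, size_t n, long ms,
                                  size_t * sleeps )
{
   long total = 0;
   size_t i;

   *sleeps = 0;
   for( i = 0; i < n; ++i )
      total += process_item( items[ i ] );
   pause_ms( ms );
   ++*sleeps;
   return total;
}
```

Both functions compute the same result. The first one sleeps n times, the second only once, so the number of process switches no longer grows with the amount of work.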
> I saw that hb_idleState() also runs the garbage collector and scheduled hb_idleAdd
> tasks beyond releasing the cpu, but maybe the minimum sleep slice can be less than
> 20 milliseconds to improve performance.
As I said, this function should not be used in any time-critical functions.
It will reduce total system performance.
best regards,
Przemek
Hi Leonardo,
> >As I said, this function should not be used in any time-critical functions.
> >It will reduce total system performance.
> The attached .diff from harbour\src\vm\thread.c adds a new function
> hb_threadSleep() adapted (basically cut and pasted) from
> hb_threadReleaseCPU() to allow simple thread sleeping without any
> hb_idle* handling on all supported platforms.
I know this xHarbour code and I intentionally did not implement it
in Harbour. When I said these functions should not be used in
time-critical code, I was also thinking about such modified
versions.
I also do not think it's a good idea to introduce into the core code
many functions that do the same job.
> Though I only tested it under Windows 7, using hb_threadSleep()
> instead of hb_idleSleep() increased the performance to 20 times
> faster!
You have probably now reached the effect of a pure yield() function
without any timeout, and this is what you need.
Look at Mindaugas' test code too and check your kernel interrupts.
Anyhow, even a gain of 1 million times would not change anything. The
only problem is that you are using such a function or a modified
version of it.
Why do you want to change Harbour core code instead of fixing your code?
BTW, you restored old code we eliminated in the past because on at least
a few platforms it completely disabled process freezing.
> If you see it as a good addition to Harbour pls commit it to harbour SVN.
Probably the only things missing in Harbour are a wrapper for a pure
yield() function without any timeout and a real event loop, but that is
a much bigger modification and I do not have enough spare time for it now.
I'll add optional support for higher precision in hb_idleSleep()
in the near future, but many systems will not respect it and will round
small values down. As I said, in some cases a simple yield() function
can be useful.
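A pure yield without any timeout could look roughly like the sketch below. It is not Harbour code and assumes a POSIX system with C11 atomics: sched_yield() is the POSIX call, and on Windows the closest counterparts would be SwitchToThread() or Sleep( 0 ). The helper wait_ready_yielding() and its parameters are hypothetical.

```c
#include <sched.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical helper: wait until *ready becomes nonzero. Between checks
 * the rest of the time slice is handed to other runnable threads instead
 * of sleeping for a fixed period. Gives up after max_yields checks. */
static bool wait_ready_yielding( atomic_int * ready, long max_yields )
{
   long i;

   for( i = 0; i < max_yields; ++i )
   {
      if( atomic_load( ready ) != 0 )
         return true;
      sched_yield();   /* no timeout: returns as soon as we are rescheduled */
   }
   return atomic_load( ready ) != 0;
}
```

A loop like this reacts quickly but keeps the CPU busy when no other thread is runnable, so it only suits short waits. For longer waits a real event or condition variable is the better tool.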
best regards,
Przemek
> Hi Leonardo,
My name is Leandro but some friends call me Leo, so you almost hit it!
>I know this xHarbour code and I intentionally did not implement it
>in Harbour. When I said these functions should not be used in
>time-critical code, I was also thinking about such modified
>versions.
>I also do not think it's a good idea to introduce into the core code
>many functions that do the same job.
I sent a second .diff version that moves the function wrapping to
hb_threadSleep and keeps the definition of the fixed minimal time slices
in hb_threadReleaseCPU.
Is it any better?
>You have probably now reached the effect of a pure yield() function
>without any timeout, and this is what you need.
>Look at Mindaugas' test code too and check your kernel interrupts.
Where can I get Mindaugas' test code?
>Anyhow, even a gain of 1 million times would not change anything. The
>only problem is that you are using such a function or a modified
>version of it.
>Why do you want to change Harbour core code instead of fixing your code?
It is not one thing instead of the other. I fixed my code, and hb_threadSleep
could be useful to someone else.
I want to help. :) But I understand this sometimes may require knowledge
that I don't have.
>BTW, you restored old code we eliminated in the past because on at least
>a few platforms it completely disabled process freezing.
Well, I was sure I was reproducing the previous logic and keeping disabled
what was disabled. Would you mind telling me where I messed it up?
>Probably the only things missing in Harbour are a wrapper for a pure
>yield() function without any timeout and a real event loop, but that is
>a much bigger modification and I do not have enough spare time for it now.
OK. This will be very useful for speed-critical applications. I hope you
can find time to do it.
>I'll add optional support for higher precision in hb_idleSleep()
>in the near future, but many systems will not respect it and will round
>small values down. As I said, in some cases a simple yield() function
>can be useful.
ok Przemek, thank you.
Best regards,
Leandro
Hi Leandro,
> >Hi Leonardo,
> My name is Leandro but some friends call me Leo, so you almost hit it!
I'm really sorry - I was too tired.
> Where can I get Mindaugas test code?
He sent it to this list, with test results, a few days ago.
best regards,
Przemek
On 2011.12.07 14:28, Leandro Damasio - 2D Info wrote:
> In the results, where it says "epsilon_prg= 2 0.0000009057 s", is it
> correct to understand that it took two processor cycles between two calls
> to hptimer_counter() from prg level?
Not 2 CPU cycles, but 2 high precision timer cycles. The timer does not
necessarily use the CPU RDTSC instruction. The Power Management Timer, the
High Precision Event Timer (HPET), or even the Programmable Interval Timer
can also be used. The update rate of the timer is also indicated in the test.
The interesting thing is that even when RDTSC (the CPU timer) is used,
epsilon_c and epsilon_prg are quite similar. This means
QueryPerformanceCounter() has a big overhead and is much more complex than
a simple RDTSC instruction:
freq= 3000150000 Hz 3000.15 MHz
epsilon_c= 945 0.0000003150 s
epsilon_prg= 1305 0.0000004350 s
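The test code itself is not reproduced in this thread, so the following is only a sketch of what such a measurement could look like. It assumes that epsilon means the smallest nonzero step observed between two consecutive timer reads, and it uses POSIX clock_gettime() rather than QueryPerformanceCounter() or hptimer_counter(). The names timer_read_ns(), timer_epsilon_ns() and ticks_to_seconds() are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdint.h>
#include <time.h>

/* Read the monotonic clock in nanoseconds. */
static int64_t timer_read_ns( void )
{
   struct timespec ts;

   clock_gettime( CLOCK_MONOTONIC, &ts );
   return ( int64_t ) ts.tv_sec * 1000000000LL + ts.tv_nsec;
}

/* Smallest nonzero difference between two consecutive reads, taken over
 * the given number of tries. Returns 0 when tries is not positive. */
static int64_t timer_epsilon_ns( int tries )
{
   int64_t best = 0;
   int i;

   for( i = 0; i < tries; ++i )
   {
      int64_t t0 = timer_read_ns();
      int64_t t1 = timer_read_ns();

      while( t1 == t0 )            /* wait until the timer moves */
         t1 = timer_read_ns();
      if( best == 0 || t1 - t0 < best )
         best = t1 - t0;
   }
   return best;
}

/* Convert a tick count to seconds for a timer running at freq_hz. */
static double ticks_to_seconds( double ticks, double freq_hz )
{
   return ticks / freq_hz;
}
```

With the figures above, ticks_to_seconds( 945, 3000150000.0 ) gives about 0.000000315 s and ticks_to_seconds( 1305, 3000150000.0 ) about 0.000000435 s, which matches the reported epsilon_c and epsilon_prg values.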
More info about timers and possible usage in Windows OS:
http://en.wikipedia.org/wiki/Programmable_Interval_Timer
http://en.wikipedia.org/wiki/Time_Stamp_Counter
http://en.wikipedia.org/wiki/High_Precision_Event_Timer#cite_note-6
http://en.wikipedia.org/wiki/NTLDR search for "USEPMTIMER"
http://www.baldwin.cx/~phoenix/reference/docs/acpi.pdf sections:
4.7.2.1 Power Management Timer, 4.7.3.3 Power Management Timer (PM_TMR)
http://www.intel.com/hardwaredesign/hpetspec_1.pdf
http://wiki.osdev.org/Programmable_Interval_Timer#Frequency_Dividers
http://forum.slysoft.com/archive/index.php/t-22236.html
Regards,
Mindaugas
>Not 2 CPU cycles, but 2 high precision timer cycles. The timer does not
>necessarily use the CPU RDTSC instruction. The Power Management Timer, the
>High Precision Event Timer (HPET), or even the Programmable Interval Timer
>can also be used. The update rate of the timer is also indicated in the test.
Ah, ok.
>The interesting thing is that even when RDTSC (the CPU timer) is used,
>epsilon_c and epsilon_prg are quite similar. This means
>QueryPerformanceCounter() has a big overhead and is much more complex than
>a simple RDTSC instruction:
> freq= 3000150000 Hz 3000.15 MHz
> epsilon_c= 945 0.0000003150 s
> epsilon_prg= 1305 0.0000004350 s
I didn't analyze your test results well enough before, but now I see your
point.
>More info about timers and possible usage in Windows OS:
Thank you Mindaugas, I'll surely check it out.
Best regards,
Leandro