hb_threadReleaseCPU minimum time slice


Leandro Damasio - 2D Info

Dec 5, 2011, 9:44:15 AM
to Harbour Project
Hi Przemek,
Do you know why hb_threadReleaseCPU() is hardcoded to 20 milliseconds under Windows?
Below I am sending a self-contained example showing that a Harbour application can gain significant performance (100%) by calling the Windows API Sleep() directly with the appropriate resolution instead of hb_idleSleep(), which releases the CPU for at least 20 milliseconds.
I saw that, besides releasing the CPU, hb_idleState() also runs the garbage collector and the tasks scheduled with hb_idleAdd(), but maybe the minimum sleep slice could be less than 20 milliseconds to improve performance.
Best regards,
Leandro
 
<code>
 
procedure main
local pThread1
local pThread2
local nDuration  :=15// seconds
local nCount1    :=0
local nCount2    :=0
local nResolution:=GetTickResolution()   // minimum timer period in ms

// both threads run for nDuration seconds and count loop iterations
pThread1:=hb_threadStart({|rrr,ddd,ccc|sleep1(rrr,ddd,@ccc)},nResolution,nDuration*1000,@nCount1)
pThread2:=hb_threadStart({|rrr,ddd,ccc|sleep2(rrr,ddd,@ccc)},nResolution,nDuration*1000,@nCount2)

hb_threadWaitForAll()

? "minimum resolution    :", nResolution,"milliseconds."
? "using hb_idleSleep    :", nCount1/nDuration, "cycles/sec."
? "using Sleep           :", nCount2/nDuration, "cycles/sec."
? "hb_idleSleep makes it :", nCount2/nCount1, "times slower!"
return

// hb_idleSleep() takes seconds, so the resolution is converted from ms
static function sleep1(nResolution,nDuration,nCount1)
local tStart:=hb_milliseconds()
local nSleep:=nResolution/1000
do while hb_milliseconds()-tStart<nDuration
   hb_idleSleep(nSleep)
   nCount1++
enddo
return nil

// JustSleep() calls the Windows API Sleep(), which takes milliseconds
static function sleep2(nResolution,nDuration,nCount2)
local tStart:=hb_milliseconds()
local nSleep:=nResolution
do while hb_milliseconds()-tStart<nDuration
   JustSleep(nSleep)
   nCount2++
enddo
return nil

#pragma BEGINDUMP

#include <windows.h>
#include <mmsystem.h>   /* timeGetDevCaps(); link with winmm.lib */
#include "hbapi.h"

/* JustSleep( nMilliseconds ): plain Sleep(), no idle tasks, no GC */
HB_FUNC_STATIC(JUSTSLEEP)
{
   Sleep((DWORD) hb_parnl(1));
}

/* GetTickResolution(): minimum supported timer period in ms
   (assumes 1 ms if the query fails) */
HB_FUNC_STATIC(GETTICKRESOLUTION)
{
   TIMECAPS tc;
   if( timeGetDevCaps(&tc, sizeof(tc)) == MMSYSERR_NOERROR )
      hb_retni((int) tc.wPeriodMin);
   else
      hb_retni(1);
}
#pragma ENDDUMP
</code>

Przemysław Czerpak

Dec 5, 2011, 10:35:04 AM
to harbou...@googlegroups.com
On Mon, 05 Dec 2011, Leandro Damasio - 2D Info wrote:

Hi,

> Would you know why hb_threadReleaseCPU is hardcoded to 20 milliseconds under windows?

It was chosen a long time ago, and not by me.
Probably smaller values may be ignored by some Windows kernels due to
the internal system timer resolution. See the thread about kernel timer
resolution in Windows.
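The effect described above can be checked directly: request a short sleep and time how long the thread is really away. This is a minimal POSIX sketch using nanosleep() and clock_gettime() rather than the Windows calls discussed here; the helper names are hypothetical. On a kernel with a coarse timer, a 1 ms request comes back noticeably later than 1 ms; the exact numbers depend on the system, so none are quoted here.

```c
#if !defined(_POSIX_C_SOURCE)
#  define _POSIX_C_SOURCE 199309L   /* nanosleep(), clock_gettime() */
#endif
#include <errno.h>
#include <time.h>

/* Hypothetical helper: monotonic clock reading in nanoseconds. */
static long long monotonic_ns(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (long long) ts.tv_sec * 1000000000LL + ts.tv_nsec;
}

/* Hypothetical helper: requests a sleep of `ms` milliseconds and returns
   how long the thread was actually away, in microseconds. The result is
   never shorter than the request but may be stretched by the timer and
   scheduler granularity. */
static long long actual_sleep_us(long ms)
{
    struct timespec req;
    long long start;

    req.tv_sec  = ms / 1000;
    req.tv_nsec = (ms % 1000) * 1000000L;
    start = monotonic_ns();
    while (nanosleep(&req, &req) == -1 && errno == EINTR)
        ;  /* a signal interrupted us: sleep for the remaining time */
    return (monotonic_ns() - start) / 1000;
}
```

Calling actual_sleep_us(1), actual_sleep_us(5) and actual_sleep_us(20) in turn and printing the results shows how far the system rounds small requests.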

> Below I am sending a self contained example that shows how harbour application performance
> can result in significant (100%) performance gain using directly WAPI Sleep() with the
> appropriate resolution instead of hb_idlesleep() that releases the cpu for at least 20
> milliseconds.

Code where performance is important should not use any nonzero Sleep()
calls. It should do its job as fast as possible and then freeze for a
longer time period. Otherwise most of the CPU time is consumed by process
switching, which is a very expensive task. Programs that monitor process
time may show that it is very low, but in fact the whole system
overhead is very high.
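The pattern recommended above (do the work quickly, then block until there is more) is usually built on an event or condition variable rather than a polling loop with short sleeps. Below is a minimal POSIX threads sketch, assuming a simple "job counter" signal; all names are hypothetical and not part of Harbour. The worker wakes only when a job is posted, so there is no stream of context switches while it is idle. On Windows the same idea maps to WaitForSingleObject() on an event or to SleepConditionVariableCS().

```c
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical "work available" signal shared by a producer and a worker. */
typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t  ready;
    int             pending;    /* jobs posted but not yet taken */
    bool            stop;       /* set once no more jobs will come */
    int             processed;  /* written only by the worker thread */
} job_signal;

static void job_signal_init(job_signal *s)
{
    pthread_mutex_init(&s->lock, NULL);
    pthread_cond_init(&s->ready, NULL);
    s->pending = 0;
    s->stop = false;
    s->processed = 0;
}

/* Producer side: queue one job and wake the worker. */
static void job_post(job_signal *s)
{
    pthread_mutex_lock(&s->lock);
    s->pending++;
    pthread_cond_signal(&s->ready);
    pthread_mutex_unlock(&s->lock);
}

/* Producer side: no more jobs; the worker drains what is left and exits. */
static void job_stop(job_signal *s)
{
    pthread_mutex_lock(&s->lock);
    s->stop = true;
    pthread_cond_broadcast(&s->ready);
    pthread_mutex_unlock(&s->lock);
}

/* Worker side: block (no CPU use, no periodic wake-ups) until a job is
   available or stop is requested. Returns true if a job was taken. */
static bool job_wait(job_signal *s)
{
    bool got = false;
    pthread_mutex_lock(&s->lock);
    while (s->pending == 0 && !s->stop)
        pthread_cond_wait(&s->ready, &s->lock);
    if (s->pending > 0) {
        s->pending--;
        got = true;
    }
    pthread_mutex_unlock(&s->lock);
    return got;
}

/* Worker thread: do each job as fast as possible, then sleep until the
   next one instead of polling with short Sleep() calls. */
static void *job_worker(void *arg)
{
    job_signal *s = arg;
    while (job_wait(s))
        s->processed++;   /* real work would go here */
    return NULL;
}
```

Start job_worker() with pthread_create(), call job_post() for each job and job_stop() at the end, then pthread_join(). Build with -pthread.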

> I saw that hb_idleState() also runs the garbage collector and scheduled hb_idleAdd
> tasks beyond releasing the cpu, but maybe the minimum sleep slice can be less than
> 20 milliseconds to improve performance.

As I said, this function should not be used in any time-critical functions.
It will reduce total system performance.

best regards,
Przemek

Leandro Damasio - 2D Info

Dec 5, 2011, 1:26:21 PM
to harbou...@googlegroups.com
>As I said this function should not be used in any time critical functions.
>It will reduce total system performance.
Przemek,
The attached .diff against harbour\src\vm\thread.c adds a new function,
hb_threadSleep(), adapted (basically cut and pasted) from
hb_threadReleaseCPU() to allow simple thread sleeping without any hb_idle*
handling on all supported platforms.
Although I only tested it under Windows 7, using hb_threadSleep() instead of
hb_idleSleep() made it 20 times faster!
If you see it as a good addition to Harbour, please commit it to the Harbour SVN.
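A back-of-the-envelope bound is consistent with that 20x figure, assuming each sleep lasts exactly its nominal length and the loop body itself costs nothing: a loop that sleeps t_sleep milliseconds per iteration can run at most N_max iterations per second.

```latex
N_{\max} = \frac{1000\ \text{ms/s}}{t_{\text{sleep}}}, \qquad
\frac{N_{\max}(1\ \text{ms})}{N_{\max}(20\ \text{ms})}
  = \frac{1000/1}{1000/20} = \frac{1000}{50} = 20
```

Here 1 ms is assumed to be the minimum period reported by GetTickResolution() and 20 ms the fixed hb_threadReleaseCPU() slice; on a system whose timer does not honour 1 ms requests, the real ratio would be smaller.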
Best regards,
Leandro
thread.diff

Leandro Damasio - 2D Info

Dec 5, 2011, 5:45:36 PM
to harbou...@googlegroups.com
Viktor,
I'm still somewhat confused about how to do it, but maybe this .diff version
fits the Harbour rules better than the first one. Does it?
It delegates the OS-specific "sleep" function selection from
hb_threadReleaseCPU() to hb_threadSleep() and keeps the relevant comments in
one place so they don't appear twice.
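The delegation described above can be sketched in C as follows. This is not the contents of thread.diff (which is not reproduced in this thread); the names thread_sleep_ms(), thread_release_cpu() and RELEASE_CPU_SLICE_MS are hypothetical, and only Sleep() on Windows and nanosleep() on POSIX are real APIs. The release function keeps only the fixed slice, while the OS-specific choice lives in one place.

```c
#if !defined(_WIN32) && !defined(_POSIX_C_SOURCE)
#  define _POSIX_C_SOURCE 199309L   /* for nanosleep(); must precede headers */
#endif

#if defined(_WIN32)
#  include <windows.h>
#else
#  include <errno.h>
#  include <time.h>
#endif

/* Hypothetical plain sleep: gives up the CPU for about `ms` milliseconds
   with the OS-specific primitive and nothing else (no idle tasks, no GC). */
static void thread_sleep_ms(unsigned long ms)
{
#if defined(_WIN32)
    Sleep((DWORD) ms);
#else
    struct timespec req;
    req.tv_sec  = (time_t) (ms / 1000);
    req.tv_nsec = (long) (ms % 1000) * 1000000L;
    /* nanosleep() stores the unslept time back into req if a signal
       interrupts it, so the loop sleeps only for what is left. */
    while (nanosleep(&req, &req) == -1 && errno == EINTR)
        ;
#endif
}

/* Hypothetical "release CPU" call: it only owns the fixed minimum slice
   and delegates the OS-specific choice to thread_sleep_ms(). */
#define RELEASE_CPU_SLICE_MS 20UL

static void thread_release_cpu(void)
{
    thread_sleep_ms(RELEASE_CPU_SLICE_MS);
}
```

With this split, changing the slice touches one constant, and porting to a new platform touches one function.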
Regards,
Leandro
thread.diff

Przemysław Czerpak

Dec 6, 2011, 4:01:23 AM
to harbou...@googlegroups.com
On Mon, 05 Dec 2011, Leandro Damasio - 2D Info wrote:

Hi Leonardo,

> >As I said this function should not be used in any time critical functions.
> >It will reduce total system performance.

> The attached .diff from harbour\src\vm\thread.c adds a new function
> hb_threadSleep() adapted (basically cut and pasted) from
> hb_threadReleaseCPU() to allow simple thread sleeping without any
> hb_idle* handling on all supported platforms.

I know this xHarbour code and I intentionally didn't implement it
in Harbour. When I said these functions should not be used in
time-critical code, I was also thinking about such modified
versions.
I also do not think it's a good idea to introduce into the core code many
functions doing the same job.

> Though I only tested it under Windows 7, using hb_threadSleep()
> instead of hb_idleSleep() increased the performance to 20 times
> faster!

Probably you have now reached the effect of a pure yield() function without
any timeout, and this is what you need.
Look at Mindaugas' test code and check your kernel interrupts.
Anyhow, even a million times faster would not change anything. The problem is
only the fact that you are using such a function or its modified
version.
Why do you want to change the Harbour core code instead of fixing your code?
BTW, you restored old code we eliminated in the past because, at least
on a few platforms, it was totally disabling process freezing.
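A "pure yield() without any timeout" can be sketched in C with POSIX sched_yield(): the waiting thread offers its time slice to any other ready thread and, if none wants the CPU, runs again at once, so no timer rounding is involved. This is an illustration only; the helper name is hypothetical, and on Windows SwitchToThread() or Sleep(0) play a similar role. The catch is the one described above: a thread that yields in a loop still uses CPU while it waits, so it suits only very short waits.

```c
#include <sched.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical helper: wait until *flag becomes true, yielding the CPU on
   every check instead of sleeping for a fixed slice. Returns how many times
   it yielded (0 if the flag was already set). */
static unsigned long yield_until_set(atomic_bool *flag)
{
    unsigned long spins = 0;
    while (!atomic_load_explicit(flag, memory_order_acquire)) {
        sched_yield();   /* let another ready thread run, no timeout */
        spins++;
    }
    return spins;
}
```

The flag would normally be set with atomic_store_explicit(..., memory_order_release) by the thread that produces the result.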

> If you see it as a good addition to Harbour pls commit it to harbour SVN.

Probably the only things missing in Harbour are a wrapper for some pure
yield() functions without any timeout and a real event loop, but that is a
much bigger modification and right now I do not have enough spare time for it.
I'll add optional support for higher precision in hb_idleSleep()
in the near future, but many systems will not respect it, rounding
small values down. As I said, in some cases a simple yield() function
can be useful.

best regards,
Przemek

Leandro Damasio - 2D Info

Dec 6, 2011, 6:29:13 AM
to harbou...@googlegroups.com
Hi Przemek,

> Hi Leonardo,

My name is Leandro but some friends call me Leo, so you almost hit it!

>I know this xHarbour code and I intentionally didn't implemented it
>in Harbour. When I said this functions should not be used in
>time critical code then I was thinking also about such modified
>versions.
>I also do not think it's good idea to introduce to core code many
>functions making the same job.

I sent a second .diff version that moves the function wrapping to hb_threadSleep
and keeps the definition of the fixed minimal time slices in
hb_threadReleaseCPU.
Is it any better?

>Probably now you reached the effect of pure yield() function without
>any timeout and this is what you need.
>Look at Mindaugas test code to and check your kernel interrupts.

Where can I get Mindaugas test code?

>Anyhow even 1 million times does not change anything. The problem is
>only in the fact that you are using such function or its modified
>version.
>Why you want to change Harbour core code instead of fixing your code?

It is not one thing instead of the other. I fixed my code, and hb_threadSleep
could be useful to someone else.
I want to help. :) But I understand this may sometimes require knowledge
that I don't have.

>BTW you restored old code we eliminated in the past because at least
>for few platforms it was totally disabling process freezing.

Well, I was sure I was reproducing the previous logic and keeping disabled
what was disabled. Would you mind telling me where I messed it up?

>Probably the only missing thing in Harbour is wrapper for some pure
>yield() functions without any timeout and real event loop but it's
>much bigger modification and now I do not have enough spare time for it.

OK. This will be very useful for speed-critical applications. I hope you
can find time to do it.

>I'll add optional support for bigger precision in hb_idleSleep()
>in the nearest future but many systems will not respect it rounding
>small values down but as I said in some cases simple yield() function
>can be useful.

OK, Przemek, thank you.

Best regards,
Leandro

Przemysław Czerpak

Dec 6, 2011, 2:48:52 PM
to harbou...@googlegroups.com
On Tue, 06 Dec 2011, Leandro Damasio - 2D Info wrote:

Hi Leandro,

> >Hi Leonardo,
> My name is Leandro but some friends call me Leo, so you almost hit it!

I'm really sorry - I was too tired.

> Where can I get Mindaugas test code?

He sent it to this list with test results a few days ago.

best regards,
Przemek

Leandro Damasio - 2D Info

Dec 7, 2011, 7:28:53 AM
to harbou...@googlegroups.com
Hi Przemek,
I looked into Mindaugas' test code and it is a really interesting test!
Some of the results seem so fast that I don't know if I'm reading them
correctly. :)
Where the results say "epsilon_prg= 2 0.0000009057 s",
is it correct to understand that it took two processor cycles between two calls
to hptimer_counter() from the PRG level?
Wouldn't that be super fast, or maybe impossibly fast?!
About the lack of pure yield functions in Harbour, don't you think it is
also very important to have hb_threadSuspend(pThread) and
hb_threadResume(pThread)? IMO remote activation and deactivation of threads
would greatly increase the power of Harbour applications in some cases.
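Suspending a thread remotely at an arbitrary point is risky: if it is stopped while holding a lock (for example the heap lock), other threads can deadlock, which is why the Win32 SuspendThread() documentation says it is intended mainly for debuggers. A common alternative, sketched here with POSIX threads, is a cooperative pause gate that the worker checks at safe points. All names are hypothetical; neither hb_threadSuspend() nor hb_threadResume() is assumed to exist.

```c
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical cooperative pause/resume gate shared by a controller
   thread and a worker thread. */
typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t  changed;
    bool            paused;
} thread_gate;

static void gate_init(thread_gate *g)
{
    pthread_mutex_init(&g->lock, NULL);
    pthread_cond_init(&g->changed, NULL);
    g->paused = false;
}

/* Controller: ask the worker to stop at its next checkpoint. */
static void gate_pause(thread_gate *g)
{
    pthread_mutex_lock(&g->lock);
    g->paused = true;
    pthread_mutex_unlock(&g->lock);
}

/* Controller: let a paused worker continue. */
static void gate_resume(thread_gate *g)
{
    pthread_mutex_lock(&g->lock);
    g->paused = false;
    pthread_cond_broadcast(&g->changed);
    pthread_mutex_unlock(&g->lock);
}

/* Worker: call at safe points (no locks held, state consistent).
   Blocks without using CPU while the gate is paused. */
static void gate_checkpoint(thread_gate *g)
{
    pthread_mutex_lock(&g->lock);
    while (g->paused)
        pthread_cond_wait(&g->changed, &g->lock);
    pthread_mutex_unlock(&g->lock);
}

static bool gate_is_paused(thread_gate *g)
{
    bool p;
    pthread_mutex_lock(&g->lock);
    p = g->paused;
    pthread_mutex_unlock(&g->lock);
    return p;
}
```

The trade-off is latency: the worker only stops when it next reaches gate_checkpoint(), but it never stops while holding a resource other threads need.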
Best regards,
Leandro

Mindaugas Kavaliauskas

Dec 7, 2011, 9:37:38 AM
to harbou...@googlegroups.com
Hi,


On 2011.12.07 14:28, Leandro Damasio - 2D Info wrote:
> In the results, where is written "epsilon_prg= 2 0.0000009057 s", is it
> correct to understand it took two processor cycles between two calls to
> hptimer_counter() from prg level?

Not 2 CPU cycles, but 2 high-precision timer cycles. The timer does not
necessarily use the CPU RDTSC instruction. The Power Management Timer, the High
Precision Event Timer (HPET), or even the Programmable Interval Timer can
also be used. The update rate of the timer is also indicated in the test.

Interestingly, even when RDTSC (the CPU timer) is used, epsilon_c
and epsilon_prg are quite similar. This means QueryPerformanceCounter()
has a big overhead and is much more complex than a simple RDTSC instruction:
freq= 3000150000 Hz 3000.15 MHz
epsilon_c= 945 0.0000003150 s
epsilon_prg= 1305 0.0000004350 s
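A rough C analogue of an epsilon measurement can be written with the POSIX monotonic clock: read the clock back-to-back until the value changes and keep the smallest step. This is only a sketch of the idea, assuming a POSIX system; it is not Mindaugas' test code (which is not reproduced in this thread), it uses clock_gettime() rather than QueryPerformanceCounter(), and the helper names are hypothetical.

```c
#if !defined(_POSIX_C_SOURCE)
#  define _POSIX_C_SOURCE 199309L   /* clock_gettime(); must precede headers */
#endif
#include <time.h>

/* Difference b - a in nanoseconds. */
static long long diff_ns(const struct timespec *a, const struct timespec *b)
{
    return (long long) (b->tv_sec - a->tv_sec) * 1000000000LL
         + (b->tv_nsec - a->tv_nsec);
}

/* Smallest observed step of CLOCK_MONOTONIC, in nanoseconds, over `tries`
   attempts: read the clock, spin until the value changes, keep the minimum.
   The result mixes the clock's update rate with the cost of the call
   itself. Returns -1 if tries <= 0. */
static long long clock_epsilon_ns(int tries)
{
    long long best = -1;
    int i;

    for (i = 0; i < tries; i++) {
        struct timespec a, b;
        long long d;
        clock_gettime(CLOCK_MONOTONIC, &a);
        do {
            clock_gettime(CLOCK_MONOTONIC, &b);
        } while (b.tv_sec == a.tv_sec && b.tv_nsec == a.tv_nsec);
        d = diff_ns(&a, &b);
        if (best < 0 || d < best)
            best = d;
    }
    return best;
}
```

Comparing the result with the value reported by clock_getres(CLOCK_MONOTONIC, ...) gives a hint of how much of the epsilon is call overhead rather than timer resolution.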

More info about timers and possible usage in Windows OS:
http://en.wikipedia.org/wiki/Programmable_Interval_Timer
http://en.wikipedia.org/wiki/Time_Stamp_Counter
http://en.wikipedia.org/wiki/High_Precision_Event_Timer#cite_note-6
http://en.wikipedia.org/wiki/NTLDR (search for "USEPMTIMER")
http://www.baldwin.cx/~phoenix/reference/docs/acpi.pdf sections:
4.7.2.1 Power Management Timer, 4.7.3.3 Power Management Timer (PM_TMR)
http://www.intel.com/hardwaredesign/hpetspec_1.pdf
http://wiki.osdev.org/Programmable_Interval_Timer#Frequency_Dividers
http://forum.slysoft.com/archive/index.php/t-22236.html


Regards,
Mindaugas

Leandro Damasio - 2D Info

Dec 7, 2011, 10:52:02 AM
to harbou...@googlegroups.com
Hi Mindaugas,

>Not 2 CPU cycles, but 2 high precision timer cycles. Timer does not
>necessary use CPU RDTSC instruction. Power management timer, High precision
>event timer (HPET), or even Programmable interrupt timer can also be used.
>Update rate of timer is also indicated in test.

Ah, ok.

>The interesting thing is that even RDTSC (CPU timer) is used. epsilon_c and
>epsilon_prg are quite similar. This means QueryPerformanceCounter() has a
>big overhead and is much more complex than simple RDTSC instruction:
> freq= 3000150000 Hz 3000.15 MHz
> epsilon_c= 945 0.0000003150 s
> epsilon_prg= 1305 0.0000004350 s

I didn't analyse your test results carefully enough before, but now I see your
point.

>More info about timers and possible usage in Windows OS:

Thank you, Mindaugas, I'll surely check it out.

Best regards,
Leandro

Leandro Damasio - 2D Info

Dec 8, 2011, 1:07:27 PM
to harbou...@googlegroups.com
Hi Przemek,
 
>> Would you know why hb_threadReleaseCPU is hardcoded to 20 milliseconds under windows?
>
>It was chosen long time ago not by me.
>Probably smaller values may be ignored by some windows kernels due to
>internal system timer resolution. See the subject about kernel timer
>resolution in Windows.
 
There is something that may be related to the 20 ms that the Harbour developers originally defined as the minimum thread sleep time under Windows:
apparently 20 milliseconds is considered the average time slice that the Windows scheduler allocates to each thread.
MSDN mentions it here, and Google also finds several articles and discussions saying the same.
Best regards,
Leandro