
Precision of MS delay


Krishna Myneni

Aug 30, 2021, 11:30:00 PM
There can be a significant difference in the precision (and accuracy) of
the delay provided by the standard word MS in different implementations.
We recognized that using an OS sleep function was not suitable for our
precision delay needs early on in the development of kForth (32-bit
version), and switched to using a polling loop to implement MS. To
demonstrate the difference in precision, I performed some tests
comparing kForth-64 (ver. 0.2.2) and Gforth (ver. 0.7.9_20210506), which
have different implementations of MS. Test code is given below the results.

\ Test the precision of 100 millisecond MS delay

\ kForth-64
100 ms-precision

Mean difference (microseconds) = 0.017
St. deviation (microseconds) = 0.33886
ok

\ Gforth 64-bit version
100 ms-precision
Mean difference (microseconds) = 388.364
St. deviation (microseconds) = 86.7326700849227

In 1000 measurements, the mean difference between 100 MS delay and the
elapsed time, measured by a microsecond resolution timer, is within one
microsecond in kForth-64 (results should be similar for kForth-32), but
is close to 400 microseconds on average in Gforth. The standard
deviations of the measured difference distributions likewise show a
distribution width of better than one microsecond for the kForth
MS implementation, while Gforth's shows a standard deviation of 87
microseconds.

The difference is attributable to Gforth's implementation of MS using a
nanosleep system call. Gforth's implementation has better than 1
millisecond resolution, based on the above measurements. The
demonstration simply illustrates that MS can be implemented with higher
precision and accuracy if one needs such performance from delays.

--
Krishna Myneni

----------
\ ms-precision.4th
\
\ Measure the precision of MS using a microsecond timer.
\

\ include ans-words

false value utimerFound?

[DEFINED] US2@ [IF]
true to utimerFound?
: trial ( udelay_ms -- delta_us )
dup us2@ rot ms us2@ 2swap d- rot 1000 m* d- d>s ;
[ELSE]

[DEFINED] UTIME [IF]
true to utimerFound?
: trial ( udelay_ms -- delta_us )
dup utime rot ms utime 2swap d- rot 1000 m* d- d>s ;
[THEN]

[THEN]

utimerFound? 0= [IF]
cr .( Sorry, recognized microsecond timer not found! ) cr
quit
[THEN]

: fsquare fdup f* ;

1000 constant NTRIALS

create data[ NTRIALS cells allot
: ]! ( n a idx -- ) cells + ! ;
: ]@ ( a idx -- n ) cells + @ ;


: collect ( udelay_ms -- )
NTRIALS 0 DO
dup trial data[ I ]!
LOOP drop ;

: stats ( -- ) ( F: -- mean stdev )
0.0e0 NTRIALS 0 DO data[ I ]@ s>f f+ LOOP
NTRIALS s>f f/
fdup 0e
NTRIALS 0 DO
fover data[ I ]@ s>f f- fsquare f+
LOOP fswap fdrop
NTRIALS 1- s>f f/ fsqrt ;

: ms-precision ( udelay_ms -- )
collect stats fswap
cr ." Mean difference (microseconds) = " f.
cr ." St. deviation (microseconds) = " f.
cr ;
----------

Marcel Hendrix

Aug 31, 2021, 4:11:39 AM
On Tuesday, August 31, 2021 at 5:30:00 AM UTC+2, Krishna Myneni wrote:
> There can be a significant difference in the precision (and accuracy) of
> the delay provided by the standard word MS in different implementations.
> We recognized that using an OS sleep function was not suitable for our
> precision delay needs early on in the development of kForth (32-bit
> version), and switched to using a polling loop to implement MS.

Actually, this is quite hard to do because of modern processors' power
management. A simple busy-wait won't work. What are you polling?

-marcel

Anton Ertl

Aug 31, 2021, 7:13:03 AM
On IA-32 and AMD64 CPUs since the Core Solo and K10 (Phenom), the rdtsc
counter runs at a constant rate, so you can poll rdtsc in a busy loop
until the deadline is reached.

And then the OS decides that your thread has occupied enough CPU time
and gives the CPU to a different thread or process right before the
deadline:-) If you cannot get high precision from the OS, one
alternative is to use the OS nanosleep(), pselect(), and the like for
n-e ms (where e depends on the imprecision of the OS call), and use
busy waiting for the rest. This does not guarantee that the scheduler
will not interrupt your thread at an inconvenient time, but it lowers
the probability.

- anton
--
M. Anton Ertl http://www.complang.tuwien.ac.at/anton/home.html
comp.lang.forth FAQs: http://www.complang.tuwien.ac.at/forth/faq/toc.html
New standard: http://www.forth200x.org/forth200x.html
EuroForth 2021: https://euro.theforth.net/2021

Krishna Myneni

Aug 31, 2021, 8:20:31 AM
The system clock. See the source file vmc.c, function C_usec() from any
of the Linux kForth packages. MS scales the argument by 1000 and calls
C_usec(). The relevant polling loop is

do
{
gettimeofday (&tv2, NULL);
} while (timercmp(&tv1, &tv2, >)) ;

tv2 is the current time, and tv1 contains the computed end time. The
times have a field with microsecond resolution.

--
Krishna

Krishna Myneni

Aug 31, 2021, 10:16:21 AM
On 8/31/21 5:27 AM, Anton Ertl wrote:
...
> If you cannot get high precision from the OS, one
> alternative is to use the OS nanosleep(), pselect(), and the like for
> n-e ms (where e depends on the imprecision of the OS call), and use
> busy waiting for the rest. This does not guarantee that the scheduler
> will not interrupt your thread at an inconvenient time, but it lowers
> the probability.
>

All approaches to implementing a delay on a system with interrupts will
result in a probability distribution for the deltas, where delta is the
actual elapsed time minus the requested delay time. However, the polling
implementation dramatically reduces the width of the distribution
compared to calling a sleep function. Representative statistics for the
distributions as a function of delay are given below for the two
approaches. The mu and sigma are the mean and standard deviation, in
microseconds, of the distribution of deltas over 1000 trials of the word MS.

Gforth 64-bit (nanosleep call)
-------
Delay (ms), mu (us), sigma (us)
1, 118, 31.4
10, 300, 110.
100, 388, 86.7
500, 791, 114.
1000, 1223, 253.

kForth-64 (polling system clock)
---------
Delay (ms), mu (us), sigma (us)
1, 0.009, 0.226
10, 0.037, 0.553
100, 0.017, 0.339
500, 0.021, 0.216
1000, 0.035, 0.382

The nanosleep call used by Gforth leads to a clear systematic drift in
the mean toward larger positive delta with increasing delay, and this is
to be expected. At a 1000 ms delay, the average delay produced by MS is
more than 1 ms too long. The width of the distribution for this approach
also drifts systematically upward with increasing delay.

It is difficult to tell, over this range of delays, whether there is a
similar but considerably smaller systematic drift in the mean vs. delay
for the polling approach -- I expect a systematic error to appear with
increasing delay. There is also no clear indication that the width of
the distribution increases with delay, over this range, for the polling
approach. The ratio of the sigmas vs. delay is given below.

Delay (ms) sigma(nanosleep)/sigma(polling)
1, 3490
10, 199
100, 255
500, 528
1000, 663

The above results are probably best-case results on my system: nothing
beyond system processes was running while the measurements were
performed, although several applications were open.

kForth provides the following timing words:

Precision Delays: MS US
Non-Time-Critical Delays: USLEEP
Fetch Time: MS@ US2@ TIME&DATE

--
Krishna


anti...@math.uni.wroc.pl

Sep 1, 2021, 7:11:21 AM
What about running your test on a heavily loaded machine? That is,
when you have 100% load on each core? Or having several instances
of Forth doing MS in parallel (but with different setpoints)?

--
Waldek Hebisch

Krishna Myneni

Sep 1, 2021, 8:41:27 AM
Yes, it needs to be tested to see how quickly the delay precision and
accuracy degrade as the system load is gradually increased. In typical
use we run Forth applications on a system in which no other applications
are running concurrently, i.e. specialized data acquisition systems.
Linux, like most modern OSes, is efficient at distributing load across
multiple cores.

--
Krishna

Mike

Sep 1, 2021, 10:48:02 AM
On Wednesday, September 1, 2021 at 3:41:27 PM UTC+3, Krishna Myneni wrote:

> Yes, it needs to be tested to see how quickly the delay precision and
> accuracy degrade as the system load is gradually increased. In typical
> use we run Forth applications on a system in which no other applications
> are running concurrently, i.e. specialized data acquisition systems.
> Linux, like most modern OSes, is efficient at distributing load across
> multiple cores.
>
> --
> Krishna

In order to get real-time performance you need to run the Forth process
with a real-time schedule.

Assigning a real-time schedule to Forth would make its performance
independent of the load caused by the fairly scheduled processes.
Fair scheduling is what Linux uses by default if nothing else is
requested.

There are many scheduling possibilities in Linux:
https://www.kernel.org/doc/html/latest/scheduler/index.html
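For example, a process can request the SCHED_FIFO real-time policy via
sched_setscheduler() (a sketch; `make_realtime` is a hypothetical helper,
and the call succeeds only with root privileges or CAP_SYS_NICE):

```c
#include <sched.h>

/* Request SCHED_FIFO real-time scheduling for the calling process.
   Returns 0 on success, -1 on failure (typically EPERM without
   root or CAP_SYS_NICE). */
static int make_realtime(int priority)
{
    struct sched_param sp = { .sched_priority = priority };
    return sched_setscheduler(0, SCHED_FIFO, &sp);
}
```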

Mike


anti...@math.uni.wroc.pl

Sep 1, 2021, 5:44:48 PM
Have you considered what happens when there are several "real time"
processes and each tries to use 100% of the CPU time?

--
Waldek Hebisch

Krishna Myneni

Sep 1, 2021, 9:55:25 PM
Thanks for the link. I'm not looking to approximate a real-time Forth,
but only to minimize the width of the probability distribution
(improving the precision) for the difference between a requested delay
and the actual delay. This can be useful, for example, if I want to
generate a trigger (e.g. set a digital line high), or to respond to an
external trigger and wait for some fixed delay before reading an input.

--
Krishna

Krishna Myneni

Sep 1, 2021, 10:03:53 PM
On 9/1/21 7:41 AM, Krishna Myneni wrote:
> On 9/1/21 6:11 AM, anti...@math.uni.wroc.pl wrote:
...
>> What about running your test on a heavily loaded machine? That is,
>> when you have 100% load on each core? Or having several instances
>> of Forth doing MS in parallel (but with different setpoints)?
>>
>
> Yes, it needs to be tested to see how quickly the delay precision and
> accuracy degrade as the system load is gradually increased. In typical
> use we run Forth applications on a system in which no other applications
> are running concurrently, i.e. specialized data acquisition systems.
> Linux, like most modern OSes, is efficient at distributing load across
> multiple cores.

A simple way to test this might be to fork the Forth process, similar to
the way that the program parallel-mm.4th (link below) works, to generate
multiple processes. It should then be possible to examine the precision
vs. number of forks for a given delay. All forked processes should begin
the precision measurements at nearly the same time, however -- perhaps by
using a start time set by the initial process.
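In C, that forking pattern looks roughly like this (a sketch only;
`run_forked` and `example_worker` are hypothetical names -- parallel-mm.4th
itself does this in Forth):

```c
#include <sys/wait.h>
#include <unistd.h>

/* Fork `n` worker processes that each run fn() and exit, then
   wait for all of them -- the same pattern parallel-mm.4th uses
   to generate parallel load. */
static void run_forked(int n, void (*fn)(void))
{
    for (int i = 0; i < n; i++) {
        pid_t pid = fork();
        if (pid == 0) {   /* child: do the work, then exit */
            fn();
            _exit(0);
        }
    }
    for (int i = 0; i < n; i++)  /* parent: reap all children */
        wait(NULL);
}

/* Example worker: a brief delay standing in for one timing trial. */
static void example_worker(void)
{
    usleep(2000);
}
```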

--
Krishna

parallel-mm.4th:
https://github.com/mynenik/kForth-64/blob/master/forth-src/parallel-mm.4th


koo

Sep 2, 2021, 2:12:59 AM
I've been wondering about implementing a precision timing function
embedded in the north bridge. It might be easier, since all data will be
stored in DDR RAM.

Regards, all