RFC: unify timer callback handling on all platforms


Albrecht Schlosser

Jul 5, 2021, 4:02:45 PM
to fltk.coredev
I'm writing this because ...

On 7/5/21 3:09 PM Manolo wrote in thread "About mixing Fl::awake(Fl_Awake_Handler, void*) and Fl::awake(void*)":

I'm asking because I'm exploring the use of POSIX timers to implement Fl::add_timeout()
for the Wayland platform (see the timer_create() and timer_settime() functions).
These functions allow the timer to be triggered either via a signal or by starting a thread.
I've chosen the thread option and have the thread call Fl::awake(cb, data)
so that the timer's callback is then called by the main thread in its event loop.

@Manolo: Before you dive too deep into a specific implementation for Wayland I'd like to share some thoughts I've been having for some time now about unifying the timer handling on all platforms. I believe that the Linux timer implementation is superior to the Windows and maybe also the macOS implementation. The Linux timer implementation works like this (maybe over-simplified):

(1) Every call to Fl::add_timeout() or Fl::repeat_timeout() adds a timer entry to the internal timer queue. This queue is sorted by the timer's due time.

(2) There's only one system timer, using the smallest delta value, i.e. the time of the first timer in the queue.

(3) Whenever the timer triggers (or maybe more often) the event handling decrements the delta time of all timers.

(4) The callbacks of all expired timers are called.

(5) A new timer with the shortest delay (which is always the first timer in the queue) is scheduled.

(6) Wait for timer events...

This is AFAICT done because the standard Unix timers can be interrupted and need to be re-scheduled whenever such interrupts occur.
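In pseudo C++, the core of this queue logic could look like the following. This is only a rough sketch with illustrative names (Timeout, queue), not the actual FLTK internals:

#include <vector>
#include <algorithm>

struct Timeout {
  double delta;          // seconds until due
  void (*cb)(void *);    // user callback
  void *data;
};

static std::vector<Timeout> queue;   // sorted by due time, front = next due

void add_timeout(double delay, void (*cb)(void *), void *data) {
  Timeout t = { delay, cb, data };
  queue.insert(std::upper_bound(queue.begin(), queue.end(), t,
      [](const Timeout &a, const Timeout &b) { return a.delta < b.delta; }), t);
}

// Called when the single system timer fires (or the event loop wakes up):
void elapse_timeouts(double elapsed) {
  for (Timeout &t : queue) t.delta -= elapsed;         // step (3)
  while (!queue.empty() && queue.front().delta <= 0) { // step (4)
    Timeout due = queue.front();
    queue.erase(queue.begin());
    due.cb(due.data);  // callback may safely add new timeouts here
  }
  // step (5): re-arm the single timer with queue.front().delta, if any
}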

The benefit of this approach is as described in the Fl::repeat_timeout() docs: if Fl::repeat_timeout() is called "late", the delay can be corrected accordingly, and the overall sequence of repeated timers is more accurate than on other platforms.

On the Windows platform we're (AFAICT) using one system timer per Fl::add/repeat_timeout() call. The Windows timer events are less accurate anyway, but a change along the lines of the Unix/Linux design could probably make repeated timer events more accurate, because the timer delay correction as done on Unix/Linux could work better (it currently does not work on Windows).

I know less about the macOS platform, but I know for sure that the timer handling is different. There are inconsistencies WRT Unix/Linux/Windows at the user-visible level (which I intend to demonstrate with a test program anyway), but these are too involved to cover here (and off-topic now).

That all said: I hope that the Wayland implementation would be basically like the Unix/Linux timer queue handling so we can easily unify all platforms.

More about the unification: I'm thinking of a platform independent timer queue where Fl::add_timeout() and friends would be platform independent. They would add an Fl_Timeout_XX object to the timer queue which may contain platform specific timer data (or not?). Triggering the timeout would then, as always, be done by the system; the timer queue handling would still be platform independent, as well as calling the callbacks etc. The more I think about it, the more I believe that only the scheduling of this single timer event would be a platform dependent (i.e. system driver) function.

Comments welcome, particularly from Manolo but also from other devs. If I'm not missing anything (are there any drawbacks?) I could start soon to develop a new timeout handling model which does all of this as described.

Manolo

Jul 6, 2021, 1:22:42 AM
to fltk.coredev
On Monday, July 5, 2021 at 22:02:45 UTC+2, Albrecht Schlosser wrote:

@Manolo: Before you dive too deep into a specific implementation for Wayland I'd like to share some thoughts I've been having for some time now about unifying the timer handling on all platforms. I believe that the Linux timer implementation is superior to the Windows and maybe also the macOS implementation. The Linux timer implementation works like this (maybe over-simplified):
I believe this means the timer implementation for the X11 FLTK platform (which covers Linux but also Unix and Darwin).


(1) Every call to Fl::add_timeout() or Fl::repeat_timeout() adds a timer entry to the internal timer queue. This queue is sorted by the timer's due time.

(2) There's only one system timer, using the smallest delta value, i.e. the time of the first timer in the queue.
In my view, there's no system timer at all. FLTK sets the max length of the next select/poll call to the smallest delta value,
which has the effect of breaking the event loop at the desired time. This setup is possible because with X11 (and with Wayland too)
the event loop is built using a select/poll call that returns when data arrives on an fd or when the waiting delay expires.
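In outline it works like this (a sketch only, not the real fl_wait() code):

#include <sys/select.h>

// Wait for display data on 'fd' for at most 'max_wait' seconds, where
// max_wait is the delta of the first entry in FLTK's timer queue:
void wait_for_events(int fd, double max_wait) {
  fd_set rfds;
  FD_ZERO(&rfds);
  FD_SET(fd, &rfds);
  struct timeval tv;
  tv.tv_sec  = (long)max_wait;
  tv.tv_usec = (long)((max_wait - tv.tv_sec) * 1e6);
  // select() returns early when data arrives; a return value of 0 means
  // the delay expired, which is when FLTK runs the due timer callbacks:
  select(fd + 1, &rfds, NULL, NULL, &tv);
}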


(3) Whenever the timer triggers (or maybe more often) the event handling decrements the delta time of all timers.
I find this procedure awkward, even though it's correct.


(4) The callbacks of all expired timers are called.

(5) A new timer with the shortest delay (which is always the first timer in the queue) is scheduled.

(6) Wait for timer events...
In my view, there are no real timer events: the poll/select call expires.


This is AFAICT done because the standard Unix timers can be interrupted and need to be re-scheduled whenever such interrupts occur.

The benefit of this approach is as described in the Fl::repeat_timeout() docs: if Fl::repeat_timeout() is called "late", the delay can be corrected accordingly, and the overall sequence of repeated timers is more accurate than on other platforms.

On the Windows platform we're (AFAICT) using one system timer per Fl::add/repeat_timeout() call. The Windows timer events are less accurate anyway, but a change along the lines of the Unix/Linux design could probably make repeated timer events more accurate, because the timer delay correction as done on Unix/Linux could work better (it currently does not work on Windows).

I know less about the macOS platform, but I know for sure that the timer handling is different. There are inconsistencies WRT Unix/Linux/Windows at the user-visible level (which I intend to demonstrate with a test program anyway), but these are too involved to cover here (and off-topic now).
The macOS FLTK platform uses a system timer: the event loop is built by calling a function that does "wait until an event arrives",
and Fl::add_timeout creates a system object that makes the waiting function run the timer callback when the delay has expired.
My idea was to also use a true system timer for the Wayland platform (but that could be for all of Linux). POSIX timers do that:
they trigger either a signal or a thread after a specified delay. With the thread approach, having the child thread call Fl::awake(cb, data)
allows the main thread to stop waiting and process the timeout callback.
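A minimal sketch of that idea (one-shot only; error checking and timer_delete() omitted; the function name posix_add_timeout is illustrative; Fl_Awake_Handler and Fl_Timeout_Handler share the signature void (*)(void *)):

#include <signal.h>
#include <time.h>
#include <FL/Fl.H>

struct TimerData {
  Fl_Awake_Handler cb;
  void *arg;
};

// Runs in a short-lived thread that the system starts when the timer expires:
static void timer_fired(union sigval sv) {
  TimerData *td = (TimerData *)sv.sival_ptr;
  Fl::awake(td->cb, td->arg);  // main thread calls td->cb(td->arg) in its loop
  delete td;
}

void posix_add_timeout(double delay, Fl_Awake_Handler cb, void *arg) {
  struct sigevent sev = {};
  sev.sigev_notify = SIGEV_THREAD;
  sev.sigev_notify_function = timer_fired;
  sev.sigev_value.sival_ptr = new TimerData{ cb, arg };
  timer_t tid;
  timer_create(CLOCK_MONOTONIC, &sev, &tid);
  struct itimerspec its = {};
  its.it_value.tv_sec  = (time_t)delay;
  its.it_value.tv_nsec = (long)((delay - (double)its.it_value.tv_sec) * 1e9);
  timer_settime(tid, 0, &its, NULL);
}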


That all said: I hope that the Wayland implementation would be basically like the Unix/Linux timer queue handling so we can easily unify all platforms.

More about the unification: I'm thinking of a platform independent timer queue where Fl::add_timeout() and friends would be platform independent. They would add an Fl_Timeout_XX object to the timer queue which may contain platform specific timer data (or not?). Triggering the timeout would then, as always, be done by the system; the timer queue handling would still be platform independent, as well as calling the callbacks etc. The more I think about it, the more I believe that only the scheduling of this single timer event would be a platform dependent (i.e. system driver) function.

As written above, the X11 approach uses the fd through which all X11 data arrives and the poll/select call on this fd to simulate
timeout events: it reduces the max waiting time of the poll/select call. Is your idea to change the organization of the event loop
of other platforms (namely macOS) and have it wait for GUI events for a time determined by the next scheduled timeout?

Albrecht Schlosser

Jul 6, 2021, 10:04:32 AM
to fltkc...@googlegroups.com
On 7/6/21 7:22 AM Manolo wrote:

On Monday, July 5, 2021 at 22:02:45 UTC+2, Albrecht Schlosser wrote:

@Manolo: Before you dive too deep into a specific implementation for Wayland I'd like to share some thoughts I've been having for some time now about unifying the timer handling on all platforms. I believe that the Linux timer implementation is superior to the Windows and maybe also the macOS implementation. The Linux timer implementation works like this (maybe over-simplified):
I believe this means the timer implementation for the X11 FLTK platform (which covers Linux but also Unix and Darwin).


(1) Every call to Fl::add_timeout() or Fl::repeat_timeout() adds a timer entry to the internal timer queue. This queue is sorted by the timer's due time.

(2) There's only one system timer, using the smallest delta value, i.e. the time of the first timer in the queue.
In my view, there's no system timer at all. FLTK sets the max length of the next select/poll call to the smallest delta value,
which has the effect of breaking the event loop at the desired time. This setup is possible because with X11 (and with Wayland too)
the event loop is built using a select/poll call that returns when data arrives on an fd or when the waiting delay expires.

Yes, that's true. What I wanted to say is that FLTK does not schedule a different (system) timer for each entry of the timer queue, as opposed to the Windows implementation (IIRC).


(3) Whenever the timer triggers (or maybe more often) the event handling decrements the delta time of all timers.
I find this procedure awkward, even though it's correct.

Awkward or not, it's more precise than the Windows timers. My intention is to simplify and unify the timer event handling on all platforms. We have the documented "delta adjustment" of Fl::repeat_timeout(), which definitely doesn't work well (or at all) on Windows, which is why timers on Windows tend to drift. One source of inaccuracy is the Windows timer granularity, which has limits (this affects each single timer). The overall timer accuracy of Fl::repeat_timeout() could, however, be improved (less or no overall timer drift). Let's say: if you have n repeated timers, then the average timer interval should be as accurate as possible.


(4) The callbacks of all expired timers are called.

(5) A new timer with the shortest delay (which is always the first timer in the queue) is scheduled.

(6) Wait for timer events...
In my view, there are no real timer events: the poll/select call expires.

Yep, I agree. As said above...

This is AFAICT done because the standard Unix timers can be interrupted and need to be re-scheduled whenever such interrupts occur.


The benefit of this approach is as described in the Fl::repeat_timeout() docs: if Fl::repeat_timeout() is called "late", the delay can be corrected accordingly, and the overall sequence of repeated timers is more accurate than on other platforms.

On the Windows platform we're (AFAICT) using one system timer per Fl::add/repeat_timeout() call. The Windows timer events are less accurate anyway, but a change along the lines of the Unix/Linux design could probably make repeated timer events more accurate, because the timer delay correction as done on Unix/Linux could work better (it currently does not work on Windows).

I know less about the macOS platform, but I know for sure that the timer handling is different. There are inconsistencies WRT Unix/Linux/Windows at the user-visible level (which I intend to demonstrate with a test program anyway), but these are too involved to cover here (and off-topic now).

The macOS FLTK platform uses a system timer: the event loop is built by calling a function that does "wait until an event arrives",
and Fl::add_timeout creates a system object that makes the waiting function run the timer callback when the delay has expired.

Is it correct that this is one distinct system timer per timer queue entry?


My idea was to also use a true system timer for the Wayland platform (but that could be for all of Linux). POSIX timers do that:
they trigger either a signal or a thread after a specified delay. With the thread approach, having the child thread call Fl::awake(cb, data)
allows the main thread to stop waiting and process the timeout callback.

Hmm, is this a necessary change for the Wayland platform, or do you want to do it because you find the current implementation "awkward"?

Did you consider the possibly new requirements, platform dependencies, and such? If we extended this to the normal Unix platform ("all Linux", as you wrote), would this affect compatibility?

Note: I don't consider the current Unix/Linux implementation awkward (it has its advantages), and if you want to change it to POSIX timers we should discuss this here in general. I would rather change all other implementations to the "manage our own timer event queue" approach like it is in our current Unix/Linux implementation. But this is not a contradiction because the timer event scheduling and the timer event processing can be independent - see my description below.


That all said: I hope that the Wayland implementation would be basically like the Unix/Linux timer queue handling so we can easily unify all platforms.

More about the unification: I'm thinking of a platform independent timer queue where Fl::add_timeout() and friends would be platform independent. They would add an Fl_Timeout_XX object to the timer queue which may contain platform specific timer data (or not?). Triggering the timeout would then, as always, be done by the system; the timer queue handling would still be platform independent, as well as calling the callbacks etc. The more I think about it, the more I believe that only the scheduling of this single timer event would be a platform dependent (i.e. system driver) function.

As written above, the X11 approach uses the fd through which all X11 data arrives and the poll/select call on this fd to simulate
timeout events: it reduces the max waiting time of the poll/select call. Is your idea to change the organization of the event loop
of other platforms (namely macOS) and have it wait for GUI events for a time determined by the next scheduled timeout?

I can't answer this question with yes or no.

My basic idea is to unify (and therefore simplify) the timer event handling on all platforms. I've seen IMHO too much platform specific code to handle timer events. The more platforms we add, the more platform specific code we need to maintain.

My goal is to do as much as possible in platform independent code. This platform independent code would schedule timer events by adding them to the timer event queue - in my model the Unix/Linux code would be a valid implementation. This could be done on all current and future platforms in a platform independent way. There would also be only one platform independent timer event processing function. This could be (like) the current Unix/Linux implementation which decrements the timer delay of all timer events in the queue.

The only platform dependent code should be the scheduling of the timer event. This could be done as it is now on Unix/Linux, by reducing the select/poll timeout to the next timer delay, or by scheduling exactly one system timer on each platform, set to the delta time of the next (first) timer event. The only requirement is that the resulting timer event calls the platform independent process_timer_event() function.

For further clarification: my model would allow using the fd approach we're using now as well as POSIX timers and Windows or macOS timers, as long as we're only scheduling one timer for all systems and we're doing the timer event processing in only one system independent function. This is basically what I want to achieve.

This would allow us to get the same behavior on all current and future platforms including the optimal "repeated timer delay correction" with the minimum of platform specific code. A nice side effect would be that porting to another platform would be simplified.
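To make the intended split concrete, it might look roughly like this (Fl_Timeout_XX and process_timer_event() are the placeholder names from the text above; insert_timeout() and system_schedule_timer() are equally hypothetical, not existing API):

#include <FL/Fl.H>  // for Fl_Timeout_Handler

class Fl_Timeout_XX {            // platform independent queue entry
public:
  double delta;                  // time until due
  Fl_Timeout_Handler callback;
  void *data;
  // ... possibly platform specific timer data
};

// Shared, platform independent code:
void insert_timeout(Fl_Timeout_XX *t);  // sorted insert, used by
                                        // Fl::add_timeout() and friends
void process_timer_event();             // decrement deltas, run expired
                                        // callbacks, schedule the next event

// The ONLY platform dependent function: arrange for process_timer_event()
// to be called after 'delta' seconds, e.g. by shortening the select/poll
// timeout (X11/Wayland) or by arming one system timer (Windows/macOS):
void system_schedule_timer(double delta);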

Manolo

Jul 6, 2021, 1:10:46 PM
to fltk.coredev
On Tuesday, July 6, 2021 at 16:04:32 UTC+2, Albrecht Schlosser wrote:
I know less about the macOS platform, but I know for sure that the timer handling is different. There are inconsistencies WRT Unix/Linux/Windows at the user-visible level (which I intend to demonstrate with a test program anyway), but these are too involved to cover here (and off-topic now).

The macOS FLTK platform uses a system timer: the event loop is built by calling a function that does "wait until an event arrives",
and Fl::add_timeout creates a system object that makes the waiting function run the timer callback when the delay has expired.

Is it correct that this is one distinct system timer per timer queue entry?
Yes, one system timer per queue entry.


My idea was to also use a true system timer for the Wayland platform (but that could be for all of Linux). POSIX timers do that:
they trigger either a signal or a thread after a specified delay. With the thread approach, having the child thread call Fl::awake(cb, data)
allows the main thread to stop waiting and process the timeout callback.

Hmm, is this a necessary change for the Wayland platform, or do you want to do it because you find the current implementation "awkward"?
No. The X11 implementation works with Wayland just as well as with X11.

More about the unification: I'm thinking of a platform independent timer queue where Fl::add_timeout() and friends would be platform independent. They would add an Fl_Timeout_XX object to the timer queue which may contain platform specific timer data (or not?). Triggering the timeout would then, as always, be done by the system; the timer queue handling would still be platform independent, as well as calling the callbacks etc. The more I think about it, the more I believe that only the scheduling of this single timer event would be a platform dependent (i.e. system driver) function.

As written above, the X11 approach uses the fd through which all X11 data arrives and the poll/select call on this fd to simulate
timeout events: it reduces the max waiting time of the poll/select call. Is your idea to change the organization of the event loop
of other platforms (namely macOS) and have it wait for GUI events for a time determined by the next scheduled timeout?

I can't answer this question with yes or no.

My basic idea is to unify (and therefore simplify) the timer event handling on all platforms. I've seen IMHO too much platform specific code to handle timer events. The more platforms we add, the more platform specific code we need to maintain.

My goal is to do as much as possible in platform independent code. This platform independent code would schedule timer events by adding them to the timer event queue - in my model the Unix/Linux code would be a valid implementation. This could be done on all current and future platforms in a platform independent way. There would also be only one platform independent timer event processing function. This could be (like) the current Unix/Linux implementation which decrements the timer delay of all timer events in the queue.

The only platform dependent code should be the scheduling of the timer event. This could be done as it is now on Unix/Linux, by reducing the select/poll timeout to the next timer delay, or by scheduling exactly one system timer on each platform, set to the delta time of the next (first) timer event. The only requirement is that the resulting timer event calls the platform independent process_timer_event() function.

For further clarification: my model would allow using the fd approach we're using now as well as POSIX timers and Windows or macOS timers, as long as we're only scheduling one timer for all systems and we're doing the timer event processing in only one system independent function. This is basically what I want to achieve.

This would allow us to get the same behavior on all current and future platforms including the optimal "repeated timer delay correction" with the minimum of platform specific code. A nice side effect would be that porting to another platform would be simplified.
Good. I think I'm beginning to understand your proposal more clearly. It requires distinguishing between timer queue entries and system timers,
and having only one system timer. I had not envisaged this before. That seems to allow the platform-specific part of timer support to be reduced considerably.
I look forward to your proposal.

Albrecht Schlosser

Jul 6, 2021, 3:15:12 PM
to fltkc...@googlegroups.com
Thanks for your comments. I'm going to look closer into the code and I'll work on a proposal...

Ian MacArthur

Jul 6, 2021, 3:16:23 PM
to coredev fltk
On 5 Jul 2021, at 21:02, Albrecht Schlosser wrote:
>
> (1) Every call to Fl::add_timeout() or Fl::repeat_timeout() adds a timer entry to the internal timer queue. This queue is sorted by the timer's due time.
>
> (2) There's only one system timer, using the smallest delta value, i.e. the time of the first timer in the queue.
>
> (3) Whenever the timer triggers (or maybe more often) the event handling decrements the delta time of all timers.
>
> (4) The callbacks of all expired timers are called.
>
> (5) A new timer with the shortest delay (which is always the first timer in the queue) is scheduled.
>
> (6) Wait for timer events...
>
> This is AFAICT done because the standard Unix timers can be interrupted and need to be re-scheduled whenever such interrupts occur.


Also, since that design was created, it is possibly worth considering that the major desktop OSes now tend to apply timer coalescing, at least to some extent, which would not have been the case in times past...

What this means is that the OS tries to shuffle as many timer-related events as possible into a smaller set of time slots, so that at each time slot it can execute “all” the pending events then sleep the CPU (or at least drop the CPU into a lower power state) and thus minimise power/heat and maximise battery life etc.

I don't know if that would affect how we think about this issue or not, though.



Bill Spitzak

Jul 6, 2021, 4:30:53 PM
to fltkc...@googlegroups.com
I absolutely agree that the simpler approach where code is shared is right. Most of the FLTK timer code is just there to provide a convenient API. Internally, the moment Fl::wait() is called it knows exactly how long it is until the next timeout. All it needs to do in system-specific code is wait at most that amount of time for an event to come from the user interface. I think most systems have a timeout on the "wait for an event" function; it would be implemented using that. It would not use any "timer events" provided by the system.
There is some more difficulty with also waiting for feedback from other fds, which I think does require some variation between platforms. Last I remember, this was just punted on Windows: adding an fd to listen to in FLTK did not work, or at least the response was delayed until a UI event came in. This may not be as big a problem as it sounds, as the only reason fd monitoring is in FLTK at all is that it was a trivial addition to the X11 version.




Albrecht Schlosser

Jul 7, 2021, 10:45:59 AM
to fltkc...@googlegroups.com
On 7/6/21 9:15 PM Albrecht Schlosser wrote:
>
>  I'm going to look closer into the code and I'll work on a proposal...

Before actually working on the new proposal I tried to fix two
inconsistencies of the macOS platform. I committed one fix
(87475c20d6cc81912e) and created PR #248. Manolo, could you please
review the PR? I'm pretty confident that it's correct (it fixes the
issue) but I may have missed something. Looking forward to your review at:

https://github.com/fltk/fltk/pull/248

Manolo

Jul 9, 2021, 5:11:58 AM
to fltk.coredev
On Wednesday, July 7, 2021 at 16:45:59 UTC+2, Albrecht Schlosser wrote:

Before actually working on the new proposal I tried to fix two
inconsistencies of the macOS platform. I committed one fix
(87475c20d6cc81912e) and created PR #248. Manolo, could you please
review the PR?
OK with this change.

I'm uncertain about what is expected of Fl::repeat_timeout().
Its general goal is to schedule a new timeout at a given delay (δ) after the previous timeout
(last_t) was scheduled. My question is: what should Fl::repeat_timeout() do when it runs after this
delay has expired (after last_t + δ)? The current implementation for the macOS platform prioritizes the regularity
of timeouts, and schedules a new timeout for last_t + n * δ, where n is the smallest integer value that puts this date in the future.
I believe the Unix platform implementation prioritizes the running of timeout callbacks and
has the callback run several times without delay in between.

What do FLTK developers believe should be the priority of Fl::repeat_timeout()?

Albrecht Schlosser

Jul 9, 2021, 8:31:35 AM
to fltkc...@googlegroups.com
On 7/9/21 11:11 AM Manolo wrote:
On Wednesday, July 7, 2021 at 16:45:59 UTC+2, Albrecht Schlosser wrote:

Before actually working on the new proposal I tried to fix two
inconsistencies of the macOS platform. I committed one fix
(87475c20d6cc81912e) and created PR #248. Manolo, could you please
review the PR?
OK with this change.

Thanks for confirmation. As you may have noticed I merged and closed this PR.


I'm uncertain about what is expected by Fl::repeat_timeout().

Fl::repeat_timeout() is exactly the same as Fl::add_timeout() with a "correction" that takes into account whether the last timeout (i.e. the current timeout callback) expired (a little) too late. In theory this should allow "for more accurate timing", as the docs say. The example code that "will print 'TICK' each second ... with a fair degree of accuracy" demonstrates its intention. The Unix implementation calculates how much later the timer expired than it was expected to expire (variable: missed_timeout_by) and corrects the delay by that amount, i.e. if the current timer was "missed" by 10 ms, then Fl::repeat_timeout(1.0) should schedule a new timeout at 1.0 s - 10 ms from "now". "Expired" above means the time when FLTK got control of the timeout, i.e. somewhere in the function elapse_timeouts(). If the current delta is negative, then we missed the scheduled timeout by that amount.
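In code, the Unix/X11 version boils down to this (simplified):

void Fl::repeat_timeout(double time, Fl_Timeout_Handler cb, void *argp) {
  time += missed_timeout_by;  // negative if the current timeout fired late
  if (time < -.05) time = 0;  // far too late: schedule as soon as possible
  add_timeout(time, cb, argp);
}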

A good measure of this behavior would be to calculate the average delay of a sequence of, say, 50 timeouts of 1.0 seconds. If this algorithm works well, the average would be pretty exactly 1.0 s.

This "correction" is done on the Unix/Linux platform but IIRC not on Windows. I'm not sure about macOS, something I need to investigate further (with your help).

The overall (platform independent) behavior of Fl::repeat_timeout() is something I'd like to address in the new, platform independent, timeout code.


Its general goal is to schedule a new timeout at a given delay (δ) after the previous timeout
(last_t) was scheduled. My question is "what should Fl::repeat_timeout() do when it runs after this
delay expired (after last_t + δ) ?". The current implementation for the macOS platform prioritizes the regularity
of timeouts, and schedules a new timeout for last_t + n * δ where n is the smallest integer value that puts this date in the future.

As you describe it, I believe the goal is the same, but the implementation might be slightly different.


I believe the Unix platform implementation prioritizes the running of timeout callbacks and
has the callback run several times without delay.

Hmm, I don't understand what you mean by this sentence. There's no repetition in Fl::repeat_timeout() itself; it's just one (next) timeout that is to be scheduled.

Let's look at an example. To be more realistic, let the delay (δ) be 0.1 sec, i.e. Fl::repeat_timeout(0.1, ...), and let the first timeout be triggered at time 10.0 sec. I'll list the sequence below line by line:

10.00  Fl::repeat_timeout(0.1) -> delay 0.10 for next timeout

timer code runs a little late due to system load:

10.12  (missed_timeout_by = 0.02), Fl::repeat_timeout(0.1) -> delay 0.08 for next timeout (should be 10.20)

Now, is your question what happens if the next timeout expiration is run by FLTK much later, at:

10.31 (missed_timeout_by = 0.11), Fl::repeat_timeout(0.1), correction by -0.11 yields -0.01 delay, i.e. the next timer schedule has already passed?

I think in this case the Unix/Linux code would schedule the next timeout immediately so we don't miss a timeout.

Do you say that the macOS code would schedule the next timeout at 10.40? Is this the difference you're talking about?

If this were the case, then the average of timer delays would suffer significantly (by 0.1/n), whereas the Unix implementation would only "drift" once by 0.01 seconds and the average would increase by only 0.01/n.


What do FLTK developers believe should be the priority of Fl::repeat_timeout()?

My personal opinion is that the next timeout should be scheduled as soon as possible if the calculated "next" timeout has already passed (if I understood your question).

Fl::repeat_timeout() should be triggered as exactly as possible at the point in time where the last (current) timeout should have been triggered plus the delay given as the argument, to "allow for more accurate timing", as the docs express it.

In other words: the above described sequence of n timeouts should not "drift away" as it probably does on Windows in our current implementation because there's no correction applied. I had planned to write such a demo program anyway. I'll do this shortly and post it here.

Manolo

Jul 9, 2021, 9:25:49 AM
to fltk.coredev
Yes. That's exactly my question. What to do in that situation?

I think in this case the Unix/Linux code would schedule the next timeout immediately so we don't miss a timeout.
That's what I believe Unix does too.


Do you say that the macOS code would schedule the next timeout at 10.40? Is this the difference you're talking about?
Yes.

If this were the case, then the average of timer delays would suffer significantly (by 0.1/n), whereas the Unix implementation would only "drift" once by 0.01 seconds and the average would increase by only 0.01/n.
In terms of average delay, I agree skipping is bad.
But in terms of rhythm, having 2 iterations without delay in between is very bad.


What do FLTK developers believe should be the priority of Fl::repeat_timeout()?

My personal opinion is that the next timeout should be scheduled as soon as possible if the calculated "next" timeout has already passed (if I understood your question).

Fl::repeat_timeout() should be triggered as exactly as possible at the point in time where the last (current) timeout should have been triggered plus the delay given as the argument, to "allow for more accurate timing", as the docs express it.

In other words: the above described sequence of n timeouts should not "drift away" as it probably does on Windows in our current implementation because there's no correction applied. I had planned to write such a demo program anyway. I'll do this shortly and post it here.

When the timeout is just a little bit late, say by delta, the correct solution is clear: schedule the next timeout at now + delay - delta.
My question arises when the timeout is very late, by more than the delay between successive timeouts. What to do in that situation?
Either:
- skip one iteration, because its time is over, and schedule for the next iteration; or
- play two iterations without delay in between.
Both choices avoid the drift seen with the Windows implementation.
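Expressed as code, the two choices are (illustrative helper names; 'late' is how much too late we are, with late > delay):

// macOS-style: aim for last_t + n*δ, the first such date in the future
double next_delay_skip(double delay, double late) {
  double d = delay - late;
  while (d <= 0) d += delay;   // skip the iterations whose time is over
  return d;
}

// Unix-style: never skip; run the missed iteration as soon as possible
double next_delay_catch_up(double delay, double late) {
  double d = delay - late;
  return (d > 0) ? d : 0;
}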

Albrecht Schlosser

Jul 9, 2021, 10:14:34 AM
to fltkc...@googlegroups.com
On 7/9/21 3:25 PM Manolo wrote:
On Friday, July 9, 2021 at 14:31:35 UTC+2, Albrecht Schlosser wrote:
What do FLTK developers believe should be the priority of Fl::repeat_timeout()?

My personal opinion is that the next timeout should be scheduled as soon as possible if the calculated "next" timeout has already passed (if I understood your question).


Fl::repeat_timeout() should be triggered as exactly as possible at the point in time where the last (current) timeout should have been triggered plus the delay given as the argument, to "allow for more accurate timing", as the docs express it.

In other words: the above described sequence of n timeouts should not "drift away" as it probably does on Windows in our current implementation because there's no correction applied. I had planned to write such a demo program anyway. I'll do this shortly and post it here.

When the timeout is just a little bit late, say by delta, the correct solution is clear: schedule the next timeout at now + delay - delta.
My question arises when the timeout is very late, by more than the delay between successive timeouts. What to do in that situation?
Either:
- skip one iteration, because its time is over, and schedule for the next iteration; or
- play two iterations without delay in between.

We do not know what exactly happened in such a case, i.e. where in the application the system spent so much time that the next timer iteration has already passed by "now" (when Fl::repeat_timeout() is called). To see what I'm contemplating, consider this timer callback pseudo code:

void timer_cb(void *data) {
    some_stuff();
    Fl::repeat_timeout(delta, timer_cb, data);
}


First question: is this order useful/correct, or should repeat_timeout() be called early in the callback?

Maybe the user program can only determine whether to call repeat_timeout() after doing some calculations which cost some time. That said, I believe this is legitimate user code.

Second question: did some_stuff() spend too much time or has it been other system load (or event processing) that prevented the timer callback from running in time?

If it was some other event handling or another thread, and if some_stuff() is really short, and if the next iteration doesn't suffer from a similar system load, then some_stuff() may indeed be called twice with little or practically no delay. But how can we know this, and how can we know what the author of the application anticipated?

OTOH, if the application code is in the opposite order:

void timer_cb(void *data) {
  Fl::repeat_timeout(delta, timer_cb, data);
  some_stuff();
}


What would happen if timer_cb() is called too late by 0.6*delta and some_stuff() needs 0.5*delta to execute? We'd have scheduled the timer already with delay 0.4*delta but it will be executed after a minimum of 1.1*delta. And it will be executed (not skipped). The next iteration would then be 0.9*delta later.

If the same timing happens in the first callback model, the delay would already be at 1.1*delta when Fl::repeat_timeout() gets called and the macOS code would silently skip one iteration, whereas the Unix code would execute the next iteration as soon as possible.

I hope you (all) can follow me, it's difficult to describe what I'm thinking. Given these two examples I believe that the Unix approach is more consistent.

That all said, however, we're talking about real border cases with very high timeout frequencies, when the overall application and system load is higher than the application can process correctly. In that case we're in the realm of undefined behavior and every solution would be possible. In the Unix code the normal timer loop would turn into something like an idle callback, and that's likely what should be expected. If this happens, there must be something wrong with the application anyway. Skipping timer callbacks would IMHO not be a proper solution.

Bill Spitzak

Jul 9, 2021, 12:54:29 PM
to fltkc...@googlegroups.com
I vaguely remember that repeat_timeout, if the calculated remaining time was zero or negative, would punt and instead act like add_timeout. My feeling was that if a program was too slow, just calling the callback immediately would have it running the timeouts continuously. There was certainly no testing as to whether this was the correct solution or not.



Ian MacArthur

Jul 9, 2021, 1:25:58 PM
to coredev fltk
On 9 Jul 2021, at 17:54, Bill Spitzak wrote:
>
> I vaguely remember that repeat_timeout, if the calculated remaining time was zero or negative, would punt and instead act like add_timeout. My feeling was that if a program was too slow, just calling the callback immediately would have it running the timeouts continuously. There was certainly no testing as to whether this was the correct solution or not.

But, to me at least, it sounds like it probably is.

The crux, like Bill said, is that if you are running so slowly that you miss the timeout, then trying to "fill in" all the missing timeouts is only going to make matters worse, I imagine...


Albrecht Schlosser

Jul 9, 2021, 4:56:20 PM
to fltkc...@googlegroups.com
On 7/9/21 7:25 PM Ian MacArthur wrote:
On 9 Jul 2021, at 17:54, Bill Spitzak wrote:
I vaguely remember that repeat_timeout, if the calculated remaining time was zero or negative, would punt and instead act like add_timeout. My feeling was that if a program was too slow, just calling the callback immediately would have it running the timeouts continuously. There was certainly no testing as to whether this was the correct solution or not.

The current solution is clearly to deliver the next timeout as soon as possible:

time += missed_timeout_by; if (time < -.05) time = 0;

After this statement the timeout is queued with delta == time (0).

However, my tests seem to indicate that something's going awry with the calculation of missed_timeout_by. There are some strange effects which I'm still investigating.


But, to me at least, it sounds like it probably is.

I don't know yet what exactly is happening with the current code. But I imagine that a user program that's not able to process the timeouts fast enough would enter a loop, similar to an idle callback. Skipping single repeat_timeouts would lead to non-deterministic behavior.

And it does (in parts), see my test program logs below.


The crux, like Bill said, is that is you are running so slowly that you miss the timeout, then trying to “fill in” all the missing timeouts is only going to make matters worse, I imagine...

Sure, as I said above, that's what I'd expect. This is a program error, but wouldn't it be easier to diagnose if FLTK did not "try to help" by skipping particular timer callbacks?

I'm attaching a test program as announced: timer2.cxx.

The following constants describe the test case:

const double delay = 0.500; // timer delay
const int timeouts = 8; // number of timeouts to be tested
const int load1 = 600; // simulated workload in ms before Fl::repeat_timeout()

Fl::repeat_timeout(delay) is called in a timer callback after a simulated workload of 600 ms, which is clearly longer than the timer delay. The test is repeated 8 times. The outcome is "interesting": deterministic, but different on all three major platforms. Note that I tested Windows in cross-compiler mode on my Linux box, but hopefully this does not matter.
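For orientation, the core of such a test might look like the following hypothetical reconstruction (the real program is the attached timer2.cxx):

#include <FL/Fl.H>
#include <cstdio>
#include <ctime>
#include <unistd.h>

static const double delay = 0.500;  // timer delay
static const int timeouts = 8;      // number of timeouts to be tested
static const int load1 = 600;       // simulated workload in ms

static double now() {               // monotonic clock in seconds
  struct timespec ts;
  clock_gettime(CLOCK_MONOTONIC, &ts);
  return ts.tv_sec + ts.tv_nsec / 1e9;
}

static int tick = 0;

static void timer_cb(void *) {
  std::printf("Tick %2d at %8.4f\n", tick, now());
  if (++tick > timeouts) return;    // done, stop repeating
  usleep(load1 * 1000);             // simulate work LONGER than the delay
  Fl::repeat_timeout(delay, timer_cb);
}

int main() {
  Fl::add_timeout(delay, timer_cb);
  while (tick <= timeouts) Fl::wait();
  std::printf("Done.\n");
  return 0;
}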

(1) Linux:

$ bin/test/timer2
Tick -2 at 50.5994
Tick -1 at 51.1001
Tick  0 at 51.6000, delay =  0.5000
Tick  1 at 52.7004, delta =  1.1004, total =  1.1004, average = 1.100448
Tick  2 at 53.3009, delta =  0.6004, total =  1.7009, average = 0.850435
Tick  3 at 54.4017, delta =  1.1009, total =  2.8017, average = 0.933907
Tick  4 at 55.0020, delta =  0.6003, total =  3.4020, average = 0.850496
Tick  5 at 56.1030, delta =  1.1010, total =  4.5030, average = 0.900599
Tick  6 at 56.7033, delta =  0.6003, total =  5.1033, average = 0.850555
Tick  7 at 57.8042, delta =  1.1009, total =  6.2042, average = 0.886321
Tick  8 at 58.4046, delta =  0.6003, total =  6.8046, average = 0.850572
Done.

(2) Windows:

$ wine bin/test/timer2.exe
Tick -2 at  4.7490
Tick -1 at  5.2510
Tick  0 at  5.7520, delay =  0.5000
Tick  1 at  6.8550, delta =  1.1030, total =  1.1030, average = 1.103000
Tick  2 at  7.9570, delta =  1.1020, total =  2.2050, average = 1.102500
Tick  3 at  9.0600, delta =  1.1030, total =  3.3080, average = 1.102667
Tick  4 at 10.1630, delta =  1.1030, total =  4.4110, average = 1.102750
Tick  5 at 11.2660, delta =  1.1030, total =  5.5140, average = 1.102800
Tick  6 at 12.3700, delta =  1.1040, total =  6.6180, average = 1.103000
Tick  7 at 13.4730, delta =  1.1030, total =  7.7210, average = 1.103000
Tick  8 at 14.5760, delta =  1.1030, total =  8.8240, average = 1.103000
Done.

(3) macOS:

$ bin/test/timer2
Tick -2 at  0.4407
Tick -1 at  0.9397
Tick  0 at  1.4412, delay =  0.5000
Tick  1 at  2.4445, delta =  1.0032, total =  1.0032, average = 1.003243
Tick  2 at  3.4420, delta =  0.9975, total =  2.0008, average = 1.000387
Tick  3 at  4.4445, delta =  1.0025, total =  3.0033, average = 1.001087
Tick  4 at  5.4402, delta =  0.9957, total =  3.9990, average = 0.999742
Tick  5 at  6.4444, delta =  1.0042, total =  5.0032, average = 1.000643
Tick  6 at  7.4398, delta =  0.9953, total =  5.9986, average = 0.999760
Tick  7 at  8.4445, delta =  1.0047, total =  7.0032, average = 1.000461
Tick  8 at  9.4445, delta =  1.0000, total =  8.0033, average = 1.000408
Done.


(1) Linux: The effective delay alternates between 1.1 and 0.6 seconds (reproducibly). This is certainly not as designed and very likely a bug in the calculation and handling ((not) resetting?) of missed_timeout_by. I'm investigating...

The average is closest to the intended delay: ~0.85 sec.

(2) Windows: there's reproducibly no correction, the effective delay is always ~1.1 + x seconds, hence the average is also ~1.1 sec.

(3) macOS: the effective delay is ~1.0 seconds, as Manolo described: 2 * 0.5 = 1.0 sec. The average of 1.0 is twice the intended delay.


That is: three platforms -- three different implementations -- three different results.

Sure, these are border cases, but I see that Wayland and Android are other candidates for having their own implementations. This should be avoided!


What we IMHO need to do is:

(a) define and describe and eventually document the "correct behavior"
(b) unify all platforms by providing a platform-agnostic common algorithm

The discussion here is good to solve (a) and I'm striving to do (b) which should use an algorithm defined by (a) and can be modified in one place for all current and future platforms.

Please feel free to use my test program with other cases and report your findings. The constants at the top of the program may be modified as you need. A better test program would have a GUI to modify the test params, but I don't know whether I'll ever do that. And there will likely be completely different test scenarios...

Attachment: timer2.cxx