Rare crash on repeat_timeout on macOS


Gonzalo Garramuno

Dec 7, 2021, 11:10:58 AM
to fltkc...@googlegroups.com
I am getting sporadic crashes on macOS with the repeat_timeout function. Here’s the stack trace:

Crashed Thread: 0 Dispatch queue: com.apple.main-thread

Exception Type: EXC_BAD_ACCESS (SIGSEGV)
Exception Codes: KERN_INVALID_ADDRESS at 0x0000000000000008
Exception Note: EXC_CORPSE_NOTIFY

Termination Signal: Segmentation fault: 11
Termination Reason: Namespace SIGNAL, Code 0xb
Terminating Process: exc handler [2775]

VM Regions Near 0x8:
-->
__TEXT 1063af000-1067c7000 [ 4192K] r-x/r-x SM=COW /Users/*

0 com.apple.CoreFoundation 0x00007fff204eab2e _CFGetNonObjCTypeID + 10
1 com.apple.CoreFoundation 0x00007fff2045c6fe CFRunLoopTimerSetNextFireDate + 58
2 mrv 0x00000001067088a6 Fl_Cocoa_Screen_Driver::repeat_timeout(double, void (*)(void*), void*) + 102
3 mrv 0x00000001065852b2 mrv::ImageView::timeout() + 1122
4 mrv 0x000000010659692d mrv::ImageView::handle_timeout() + 941
5 mrv 0x00000001067087b5 do_timer(__CFRunLoopTimer*, void*) + 53
6 com.apple.CoreFoundation 0x00007fff20438be9 __CFRUNLOOP_IS_CALLING_OUT_TO_A_TIMER_CALLBACK_FUNCTION__ + 20
7 com.apple.CoreFoundation 0x00007fff204386dd __CFRunLoopDoTimer + 927
8 com.apple.CoreFoundation 0x00007fff2043823a __CFRunLoopDoTimers + 307
9 com.apple.CoreFoundation 0x00007fff2041ee13 __CFRunLoopRun + 1988
10 com.apple.CoreFoundation 0x00007fff2041df8c CFRunLoopRunSpecific + 563
11 com.apple.HIToolbox 0x00007fff286661f3 RunCurrentEventLoopInMode + 292
12 com.apple.HIToolbox 0x00007fff28665f55 ReceiveNextEventCommon + 587
13 com.apple.HIToolbox 0x00007fff28665cf3 _BlockUntilNextEventMatchingListInModeWithFilter + 70
14 com.apple.AppKit 0x00007fff22c27172 _DPSNextEvent + 864
15 com.apple.AppKit 0x00007fff22c25945 -[NSApplication(NSEvent) _nextEventMatchingEventMask:untilDate:inMode:dequeue:] + 1364
16 mrv 0x000000010670c4a6 Fl_Cocoa_Screen_Driver::wait(double) + 582
17 mrv 0x000000010668c9ed Fl::run() + 29
18 mrv 0x0000000106680833 main + 10387
19 libdyld.dylib 0x00007fff20343f3d start + 1

Any ideas?



Gonzalo Garramuno
ggar...@gmail.com




Manolo

Dec 7, 2021, 11:33:26 AM
to fltk.coredev
I would suggest you run your program under a debugger and make sure the pointer you pass
as the 3rd argument to Fl::repeat_timeout() remains valid until the moment when the timeout runs.

Gonzalo Garramuno

Dec 7, 2021, 11:34:52 AM
to fltkc...@googlegroups.com
Here’s the crash with line numbers (not that it helps much):

0 com.apple.CoreFoundation 0x00007fff204eab2e _CFGetNonObjCTypeID + 10
1 com.apple.CoreFoundation 0x00007fff2045c6fe CFRunLoopTimerSetNextFireDate + 58
2 mrv-dbg 0x0000000109a581d9 Fl_Cocoa_Screen_Driver::repeat_timeout(double, void (*)(void*), void*) + 201 (Fl_Cocoa_Screen_Driver.cxx:518)
3 mrv-dbg 0x00000001099b3d13 Fl::repeat_timeout(double, void (*)(void*), void*) + 51 (Fl.cxx:241)
4 mrv-dbg 0x00000001097bd2c5 mrv::ImageView::timeout() + 1941 (mrvImageView.cpp:4019)
5 mrv-dbg 0x00000001097d39e6 mrv::ImageView::handle_timeout() + 1126 (mrvImageView.cpp:8489)
6 mrv-dbg 0x00000001097aa675 mrv::static_timeout(mrv::ImageView*) + 21 (mrvImageView.cpp:1792)
7 mrv-dbg 0x0000000109a580bc do_timer(__CFRunLoopTimer*, void*) + 92 (Fl_Cocoa_Screen_Driver.cxx:457)
8 com.apple.CoreFoundation 0x00007fff20438be9 __CFRUNLOOP_IS_CALLING_OUT_TO_A_TIMER_CALLBACK_FUNCTION__ + 20
9 com.apple.CoreFoundation 0x00007fff204386dd __CFRunLoopDoTimer + 927
10 com.apple.CoreFoundation 0x00007fff2043823a __CFRunLoopDoTimers + 307
11 com.apple.CoreFoundation 0x00007fff2041ee13 __CFRunLoopRun + 1988
12 com.apple.CoreFoundation 0x00007fff2041df8c CFRunLoopRunSpecific + 563
13 com.apple.HIToolbox 0x00007fff286661f3 RunCurrentEventLoopInMode + 292
14 com.apple.HIToolbox 0x00007fff28665f55 ReceiveNextEventCommon + 587
15 com.apple.HIToolbox 0x00007fff28665cf3 _BlockUntilNextEventMatchingListInModeWithFilter + 70
16 com.apple.AppKit 0x00007fff22c27172 _DPSNextEvent + 864
17 com.apple.AppKit 0x00007fff22c25945 -[NSApplication(NSEvent) _nextEventMatchingEventMask:untilDate:inMode:dequeue:] + 1364
18 mrv-dbg 0x0000000109a5de6b do_queued_events(double) + 187 (Fl_cocoa.mm:756)
19 mrv-dbg 0x0000000109a5dc64 Fl_Cocoa_Screen_Driver::wait(double) + 244 (Fl_cocoa.mm:796)
20 mrv-dbg 0x00000001099b41a8 Fl::wait(double) + 40 (Fl.cxx:449)
21 mrv-dbg 0x00000001099b4254 Fl::run() + 36 (Fl.cxx:469)
22 mrv-dbg 0x000000010999de79 main + 15641 (main.cpp:552)
23 libdyld.dylib 0x00007fff20343f3d start + 1

Gonzalo Garramuno
ggar...@gmail.com




Gonzalo Garramuno

Dec 7, 2021, 4:29:20 PM
to fltkc...@googlegroups.com


> On Dec 7, 2021, at 13:33, Manolo <manol...@gmail.com> wrote:
>
> I would suggest you run your program under a debugger and make sure the pointer you pass
> as the 3rd argument to Fl::repeat_timeout() remains valid until the moment when the timeout runs.
>

I debugged the program with printfs and found that the dangling pointer is not the one passed as a parameter to repeat_timeout (which is the "this" of the class), but current_timer->timer. There seems to be a timing issue with the list of mac_timers: they are not always ready when repeat_timeout is called, which leaves them uninitialized. I am unsure about this Fl_Cocoa_Screen_Driver.cxx code, as it seems to run on one thread yet does things that several threads would do.


void Fl_Cocoa_Screen_Driver::repeat_timeout(double time, Fl_Timeout_Handler cb, void* data)
{
  if (!current_timer) {
    add_timeout(time, cb, data);
    return;
  }

  // ADDED: fixes my crashes, but I consider this a hack and not a true solution.
  if ( current_timer->timer == nullptr ) {
    return;
  }

  // k = how many times 'time' seconds after the last scheduled timeout until the future
  double k = ceil( (CFAbsoluteTimeGetCurrent() - current_timer->next_timeout) / time);
  if (k < 1) k = 1;
  current_timer->next_timeout += k * time;
  CFRunLoopTimerSetNextFireDate(current_timer->timer, current_timer->next_timeout );
  current_timer->callback = cb;
  current_timer->data = data;
  current_timer->pending = 1;
}
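
To make the catch-up arithmetic in the function above concrete, here is a minimal standalone sketch of just the 'k' computation. The helper name next_fire_time and its parameters are hypothetical and not part of FLTK; it only mirrors the ceil/clamp logic with plain doubles in place of CFAbsoluteTimeGetCurrent().

```cpp
#include <cmath>

// Hypothetical helper (not FLTK API): computes the next fire time of a
// repeating timer using the same arithmetic as the snippet above.
//   last_scheduled: time at which the timer was last scheduled to fire (s)
//   now:            current time (s)
//   interval:       the 'time' argument given to repeat_timeout() (s)
double next_fire_time(double last_scheduled, double now, double interval) {
    // k = number of whole intervals needed to reach or pass 'now'.
    double k = std::ceil((now - last_scheduled) / interval);
    if (k < 1) k = 1;  // always advance by at least one interval
    return last_scheduled + k * interval;
}
```

For example, with a timer last scheduled at 10.0 s, an interval of 0.25 s and a current time of 10.625 s, k is 3 and the next fire time is 10.75 s, so the two missed intervals are skipped rather than fired in a burst.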


Gonzalo Garramuno
ggar...@gmail.com




Albrecht Schlosser

Dec 7, 2021, 5:22:05 PM
to fltkc...@googlegroups.com
On 12/7/21 10:29 PM Gonzalo Garramuno wrote:
> I debugged the program with printfs and found that the dangling pointer is not the one passed as a parameter to repeat_timeout (which is the "this" of the class), but current_timer->timer. There seems to be a timing issue with the list of mac_timers: they are not always ready when repeat_timeout is called, which leaves them uninitialized. I am unsure about this Fl_Cocoa_Screen_Driver.cxx code, as it seems to run on one thread yet does things that several threads would do.
>
>
> void Fl_Cocoa_Screen_Driver::repeat_timeout(double time, Fl_Timeout_Handler cb, void* data)
> {
>   if (!current_timer) {
>     add_timeout(time, cb, data);
>     return;
>   }
>
>   // ADDED: fixes my crashes, but I consider this a hack and not a true solution.
>   if ( current_timer->timer == nullptr ) {
>     return;
>   }


Hmm, OK, this observation rang a bell somewhere in my brain (faint memory). I believe there are two issues, one in the FLTK code and one likely in your program, that "work together":

(1) the code in FLTK is (IMHO) incorrect, your added check works around this particular issue. I have this on my todo list and I'll probably file an issue so this won't be forgotten.

(2) the code in your program *might* call Fl::repeat_timeout() outside a timer callback which causes the 'current_timer' variable to be null. You can work around it in your program if you check that you call Fl::repeat_timeout() only in a timer callback that was triggered for this particular timer you want to repeat. Otherwise you must call Fl::add_timeout().

The (false) assumption in FLTK on macOS that the user code calls Fl::repeat_timeout() *only* in the timer callback of the same timer it wants to *repeat* is the culprit (IIRC). This is in platform specific code only on macOS and should be fixed because the timer code on other platforms works differently.
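
The user-side workaround described in (2) could be sketched roughly as follows. This is a standalone model, not FLTK code: the class SafeRepeater and its members are hypothetical, and the two std::function objects merely stand in for Fl::repeat_timeout() and Fl::add_timeout() so the logic can be followed without the library.

```cpp
#include <functional>
#include <utility>

// Hypothetical wrapper (illustrative names, not FLTK API). It remembers
// whether its own timer callback is currently running and picks the
// scheduling call accordingly.
class SafeRepeater {
public:
    using Scheduler = std::function<void(double)>;

    // 'repeat' stands in for Fl::repeat_timeout(), 'add' for Fl::add_timeout().
    SafeRepeater(Scheduler repeat, Scheduler add)
        : repeat_(std::move(repeat)), add_(std::move(add)) {}

    // Call this from the timer callback; 'body' is the user's work.
    void on_timeout(const std::function<void()>& body) {
        CallbackScope scope(in_callback_);  // flag is cleared even if body throws
        body();
    }

    // Uses the repeat call only while inside our own callback,
    // otherwise falls back to the add call.
    void reschedule(double seconds) {
        if (in_callback_) repeat_(seconds);
        else              add_(seconds);
    }

private:
    // Sets the flag for the lifetime of the scope object.
    struct CallbackScope {
        bool& flag;
        explicit CallbackScope(bool& f) : flag(f) { flag = true; }
        ~CallbackScope() { flag = false; }
    };

    Scheduler repeat_;
    Scheduler add_;
    bool in_callback_ = false;
};
```

In a real program the two schedulers would presumably be small lambdas forwarding to the FLTK calls with the callback and user data; that wiring is an assumption and is left out here.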

Thanks for the reminder, I need to check my todo list and must not forget it. You can help if you file a GitHub issue...

Albrecht Schlosser

Dec 7, 2021, 5:47:21 PM
to fltkc...@googlegroups.com
On 12/7/21 11:22 PM Albrecht Schlosser wrote:
> The (false) assumption in FLTK on macOS that the user code calls
> Fl::repeat_timeout() *only* in the timer callback of the same timer it
> wants to *repeat* is the culprit (IIRC). This is in platform specific
> code only on macOS and should be fixed because the timer code on other
> platforms works differently.
>
> ... I need to check my todo list and must not forget it. ...

Note: I filed GitHub Issue #306 and added this to the "Release FLTK
1.4.0" Milestone so it won't be forgotten.
https://github.com/fltk/fltk/issues/306

Greg Ercolano

Dec 7, 2021, 5:47:37 PM
to fltkc...@googlegroups.com

On 12/7/21 2:22 PM, Albrecht Schlosser wrote:

> [..]
>
> The (false) assumption in FLTK on macOS that the user code calls Fl::repeat_timeout() *only* in the timer callback of the same timer it wants to *repeat* is the culprit (IIRC). This is in platform specific code only on macOS and should be fixed because the timer code on other platforms works differently.


    Hmm, just reacting to the "(false) assumption in fltk"..

    Perhaps I'm missing something, but hasn't it always been the case
    calling Fl::repeat_timeout() is only valid within the timeout's callback?
    The docs seem to have always been clear about this:

1.1.x docs: "You may only call this method inside a timeout callback."
1.3.8 docs: "You may only call this method inside a timeout callback."
1.4.x docs: "You may only call this method inside a timeout callback of the same timer or at least a closely related timer [..]"

    I actually can't think of a valid reason to call it outside of that context..?

Gonzalo Garramuno

Dec 7, 2021, 5:55:28 PM
to fltkc...@googlegroups.com
Note that in my case I am not calling repeat_timeout from any other place or with different parameters (other than the delay). It is being called from the same timeout with the same pointer.


Gonzalo Garramuno
ggar...@gmail.com




Albrecht Schlosser

Dec 7, 2021, 6:10:40 PM
to fltkc...@googlegroups.com
OK, this may be a different issue than I thought. Looking at your
report, your "fix" is:

  if ( current_timer->timer == nullptr ) {
    return;
  }

If this no longer crashes, then 'current_timer' itself can't be NULL
(otherwise it would still crash), but that was the case I had in mind.

The case that (current_timer != nullptr) but (current_timer->timer ==
nullptr) would definitely be different.

Thanks for the clarification.

Is it possible that you write a small reproducer? I'm afraid not, but I
had to ask...

Albrecht Schlosser

Dec 7, 2021, 6:13:47 PM
to fltkc...@googlegroups.com
Sure, you are right, that's the intention and it's documented this way. The 1.4.0 docs have been extended by me to make sure we documented that the user code is only allowed to do it this way, i.e. FLTK is on the safe side.

But the reality is different, and the same method works differently on *nix and Windows platforms, basically falling back to Fl::add_timeout() if ... yes, what? I don't know. On these platforms the code does at least start a timer and does not crash. It's a kind of error tolerance.

The comment I added was more about the accuracy of Fl::repeat_timeout(), basically in the *nix (Posix) implementation. This uses an internal "delta time" (variable name: 'missed_timeout_by') which is only defined inside the timer callback.

My goal is to remove this constraint and to be more error tolerant with a well-defined behavior and the fallback to Fl::add_timeout() if the context is not within a timer callback.

Once this is unified on all platforms we can document the new behavior which shouldn't break any correct program.
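
The accuracy point can be illustrated with a small simulation. This is only a model under stated assumptions, not the FLTK implementation: the function last_due_time is hypothetical, every callback is assumed to be dispatched the same fixed amount late, and the local variable missed_by merely plays the role described above for 'missed_timeout_by'.

```cpp
// Simulates 'ticks' firings of a periodic timer whose callback is always
// dispatched 'lateness' seconds after its due time, and returns the due
// time of the last firing.
//   compensate == false: next due = actual firing time + interval
//                        (what rescheduling with a plain new timeout does)
//   compensate == true : the interval is shortened by the amount the
//                        dispatch was late, so due times stay on the grid
double last_due_time(int ticks, double interval, double lateness, bool compensate) {
    double due = interval;  // first firing is due one interval after start
    for (int i = 1; i < ticks; ++i) {
        double fired = due + lateness;   // when the callback really ran
        double missed_by = fired - due;  // how late the dispatch was
        double delay = compensate ? interval - missed_by : interval;
        due = fired + delay;
    }
    return due;
}
```

With 4 ticks, a 1.0 s interval and 0.125 s of lateness, the compensated schedule ends at 4.0 s while the uncompensated one has drifted to 4.375 s. Outside a timer callback there is no meaningful "missed by" value, which is why a fallback to plain add-style scheduling is the natural well-defined behavior in that case.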

Manolo

Dec 8, 2021, 2:40:50 AM
to fltk.coredev
The documentation of Fl::repeat_timeout() states:
"You may only call this method inside a timeout callback of the same timer or at least a closely related timer, otherwise the timing accuracy can't be improved and the behavior is undefined."

In your code, the timeout callback function is mrv::static_timeout(), which calls at some point ImageView::timeout() which does Fl::repeat_timeout().

The question is: what happens between the start of mrv::static_timeout() and the start of ImageView::timeout()?
If the event loop runs in between, then another timeout event can be triggered, and chaos ensues.
Do you call Fl::wait(), Fl::flush(), or Fl::check() within mrv::static_timeout() or the functions it calls?
If yes, that would contradict the correct use of Fl::repeat_timeout().

Manolo

Dec 8, 2021, 2:46:30 AM
to fltk.coredev
@Gonzalo: Also, could you please replace the Fl::repeat_timeout() calls with Fl::add_timeout()
and report whether that fixes the problem?


Gonzalo Garramuno

Dec 8, 2021, 11:39:21 AM
to fltkc...@googlegroups.com


> On Dec 8, 2021, at 04:46, Manolo <manol...@gmail.com> wrote:
>
> @Gonzalo: Also, could you please replace the Fl::repeat_timeout() calls with Fl::add_timeout()
> and report whether that fixes the problem?
>

That also seems to fix the problem.


Gonzalo Garramuno
ggar...@gmail.com




Gonzalo Garramuno

Dec 8, 2021, 11:40:39 AM
to fltkc...@googlegroups.com


> On Dec 8, 2021, at 04:40, Manolo <manol...@gmail.com> wrote:
>
> The documentation of Fl::repeat_timeout() states:
> "You may only call this method inside a timeout callback of the same timer or at least a closely related timer, otherwise the timing accuracy can't be improved and the behavior is undefined."
>
> In your code, the timeout callback function is mrv::static_timeout(), which calls at some point ImageView::timeout() which does Fl::repeat_timeout().
>
> The question is: what happens between the start of mrv::static_timeout() and the start of ImageView::timeout()?

Nothing. static_timeout() just calls view->timeout() on the ImageView class.

> If the event loop runs in between, then another timeout event can be triggered, and chaos ensues.
> Do you call Fl::wait(), Fl::flush(), or Fl::check() within mrv::static_timeout() or the functions it calls?
> If yes, that would contradict the correct use of Fl::repeat_timeout().
>

No.


Gonzalo Garramuno
ggar...@gmail.com




Gonzalo Garramuno

Dec 8, 2021, 1:39:44 PM
to fltkc...@googlegroups.com


> On Dec 7, 2021, at 20:10, Albrecht Schlosser <Albrech...@online.de> wrote:
>
> The case that (current_timer != nullptr) but (current_timer->timer == nullptr) would definitely be different.

I think the issue is that current_timer->timer is not an atomic variable and that it is changed (deleted and recreated) in between callbacks. This is a guess, as I am not sure how macOS triggers timeouts, but if so, it would also explain why changing the call to add_timeout solves the issue.


Gonzalo Garramuno
ggar...@gmail.com




Manolo

Dec 8, 2021, 2:59:24 PM
to fltk.coredev
On Tuesday, December 7, 2021 at 22:29:20 UTC+1, ggar...@gmail.com wrote:

> > On Dec 7, 2021, at 13:33, Manolo <manol...@gmail.com> wrote:
> >
> > I would suggest you run your program under a debugger and make sure the pointer you pass
> > as the 3rd argument to Fl::repeat_timeout() remains valid until the moment when the timeout runs.
>
> I debugged the program with printfs and found that the dangling pointer is not the one passed as a parameter to repeat_timeout (which is the "this" of the class), but current_timer->timer. There seems to be a timing issue with the list of mac_timers: they are not always ready when repeat_timeout is called, which leaves them uninitialized. I am unsure about this Fl_Cocoa_Screen_Driver.cxx code, as it seems to run on one thread yet does things that several threads would do.
>
>
> void Fl_Cocoa_Screen_Driver::repeat_timeout(double time, Fl_Timeout_Handler cb, void* data)
> {
>   if (!current_timer) {
>     add_timeout(time, cb, data);
>     return;
>   }
>
>   // ADDED: fixes my crashes, but I consider this a hack and not a true solution.
>   if ( current_timer->timer == nullptr ) {
>     return;
>   }
 
The fact that you get current_timer->timer == nullptr most probably means that delete_timer()
was called before repeat_timeout() was called.
This means that control did not stay continuously inside your callback function,
but was returned to Cocoa at some point by your callback, which contradicts what is required to use repeat_timeout().
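
The three situations discussed in this thread (no current timer, a current timer whose handle has been deleted, and the normal case) can be told apart as in the following standalone sketch. It is a model only: MacTimer, RepeatAction and decide are hypothetical names, the void* merely stands in for the CFRunLoopTimerRef, and delete_timer() here only models the assumption that the handle is cleared when a timer is removed.

```cpp
// Hypothetical stand-ins for the macOS timer bookkeeping discussed above;
// the field names follow the quoted snippet, everything else is illustrative.
struct MacTimer {
    void* timer = nullptr;  // stands in for the CFRunLoopTimerRef
    bool pending = false;
};

enum class RepeatAction { AddNewTimer, RearmCurrent, Skip };

// Models delete_timer(): the OS timer handle is released and cleared.
void delete_timer(MacTimer& t) {
    t.timer = nullptr;
    t.pending = false;
}

// Decides what a defensive repeat_timeout() could do in each state.
RepeatAction decide(const MacTimer* current_timer) {
    if (!current_timer) return RepeatAction::AddNewTimer;  // not inside a timer callback
    if (!current_timer->timer) return RepeatAction::Skip;  // timer was removed meanwhile
    return RepeatAction::RearmCurrent;                     // normal case
}
```

The Skip branch corresponds to the early return in the workaround quoted above. Falling back to a fresh timer in that state would be an alternative design, but whether that is desirable depends on why the timer was removed.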

Gonzalo Garramuño

Dec 8, 2021, 5:00:01 PM
to fltkc...@googlegroups.com


On Dec 8, 2021, at 16:59, Manolo wrote:

> The fact that you get current_timer->timer == nullptr most probably means that delete_timer()
> was called before repeat_timeout() was called.

Yes, that was the cause.

> This means that control did not stay continuously inside your callback function,
> but was returned to Cocoa at some point by your callback, which contradicts what is required to use repeat_timeout().

No, it actually meant that a stray thread on my side was calling Fl::remove_timeout, breaking both the contract with FLTK (not calling such functions from outside the main thread) and Fl::repeat_timeout.

Thanks, Manolo.  Your advice is always helpful.
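
One common way to avoid this kind of bug is to have worker threads hand their requests to the main thread instead of touching timers directly. The following is a minimal standalone sketch of that idea using only the C++ standard library; MainThreadQueue is a hypothetical helper, not part of FLTK.

```cpp
#include <functional>
#include <mutex>
#include <queue>
#include <utility>

// Hypothetical helper: worker threads post closures, the main thread runs them.
class MainThreadQueue {
public:
    // Callable from any thread.
    void post(std::function<void()> task) {
        std::lock_guard<std::mutex> lock(mutex_);
        tasks_.push(std::move(task));
    }

    // Call only from the main (GUI) thread, e.g. once per event-loop pass.
    // Returns the number of tasks executed.
    int drain() {
        int executed = 0;
        for (;;) {
            std::function<void()> task;
            {
                std::lock_guard<std::mutex> lock(mutex_);
                if (tasks_.empty()) break;
                task = std::move(tasks_.front());
                tasks_.pop();
            }
            task();  // run outside the lock so a task may post() again
            ++executed;
        }
        return executed;
    }

private:
    std::mutex mutex_;
    std::queue<std::function<void()>> tasks_;
};
```

A worker would then post a closure that removes the timeout, and the closure would run on the main thread between callbacks rather than in the middle of one. FLTK itself offers Fl::awake() with a handler argument for asking the main thread to run a function; whether that or a queue like this fits better depends on the program.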
