A user of my program reported random crashes when working, which
I cannot reproduce. I managed to get a stack trace from him, but
not much more information yet. The relevant information follows.
Any help is appreciated.
6: KiUserExceptionDispatcher - 0x140711815885776
7: Fl_Window::handle - 0x140709297099744
8: Fl_Window::y_root - 0x140709297101632
9: fl_win32_xid - 0x140709297228272
10: Fl_Tiled_Image::draw - 0x140709297052528
11: fl_control_modifier - 0x140709296916000
12: Fl_Window::icons - 0x140709297290000
13: Fl::run - 0x140709296628080
It is a weird stack trace, as I don't have any Fl_Tiled_Image in
my program. Also, I would like to know what fl_control_modifier()
is and when it is called. If it is called when you use CTRL, then I
have a clue where to start checking.
-- Gonzalo Garramuño ggar...@gmail.com
In FLTK's headers, fl_control_modifier() is what the FL_CONTROL macro expands to, so it is called wherever FL_CONTROL is evaluated, not only when CTRL is pressed:

    FL/platform_types.h-136-# define FL_COMMAND fl_command_modifier()
    FL/platform_types.h:137:# define FL_CONTROL fl_control_modifier()

One last note: assuming there is a bug somewhere in the program (either in your code or in a library you use), it may be a so-called "Heisenbug" that goes away when you rebuild your program, when data changes, or when you build in Debug rather than Release mode. IMHO the best you can do is to use a memory analyzer like Address Sanitizer (ASAN) or valgrind to check for access violations in thorough tests. IIRC Visual Studio supports ASAN, hence you should be able to build your program with ASAN support on Windows, and I'd guess that MSYS2 also supports (clang and) ASAN.
Note: a stack trace without debug info is not very helpful. If this issue persists, I recommend deploying an executable from a debug build (assuming that you build the executable, not the end user).
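KiUserExceptionDispatcher is not defined by FLTK; it is the Windows (ntdll) routine that dispatches exceptions. This indicates that an exception, such as an access violation, was raised in the handle() method of your derived window class (or in Fl_Window::handle() itself).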
7: Fl_Window::handle - 0x140709297099744
I have no idea why Fl_Tiled_Image::draw() would (indirectly) call Fl_Window::handle(). There's some clipping done, hence fl_win32_xid() might be plausible.
8: Fl_Window::y_root - 0x140709297101632
9: fl_win32_xid - 0x140709297228272
10: Fl_Tiled_Image::draw - 0x140709297052528
Fl_Tiled_Image::draw() may be called (IIRC) when the 'plastic' scheme is used to draw the background image.
I hope this helps. Good luck.
-- Gonzalo Garramuño ggar...@gmail.com
On 4/24/24 19:14 Gonzalo Garramuño wrote:
A user of my program reported random crashes when working, which I cannot reproduce. I managed to get a stack trace from him, but not much more information yet. The relevant information follows. Any help is appreciated.
Okay. Here's the real stack trace that points to a bug in FLTK:
0: mrv::callback - 0x140701647072160 (C:\code\mrv2\mrv2\lib\mrvCore\win32\mrvSignalHandler.cpp:37)
1: log2f - 0x140705404926352 (line number unavailable)
2: `__scrt_common_main_seh'::`1'::filt$0 - 0x140701651910942 (D:\a\_work\1\s\src\vctools\crt\vcstartup\src\startup\exe_common.inl:304)
3: _C_specific_handler - 0x140705005169712 (line number unavailable)
4: _chkstk - 0x140705442906848 (line number unavailable)
5: RtlFindCharInUnicodeString - 0x140705442355664 (line number unavailable)
6: KiUserExceptionDispatcher - 0x140705442902992 (line number unavailable)
7: Fl_Window::handle - 0x140701644897184 (C:\code\mrv2\BUILD-Msys-amd64-small\Release\FLTK-prefix\src\FLTK\src\Fl_Window.cxx:595)
8: Fl_Window_Driver::hide_common - 0x140701644960032 (C:\code\mrv2\BUILD-Msys-amd64-small\Release\FLTK-prefix\src\FLTK\src\Fl_Window_Driver.cxx:173)
9: Fl_WinAPI_Window_Driver::hide - 0x140701645080560 (C:\code\mrv2\BUILD-Msys-amd64-small\Release\FLTK-prefix\src\FLTK\src\drivers\WinAPI\Fl_WinAPI_Window_Driver.cxx:470)
10: Fl_Timeout::do_timeouts - 0x140701644961840 (C:\code\mrv2\BUILD-Msys-amd64-small\Release\FLTK-prefix\src\FLTK\src\Fl_Timeout.cxx:490)
11: Fl_System_Driver::wait - 0x140701645030560 (C:\code\mrv2\BUILD-Msys-amd64-small\Release\FLTK-prefix\src\FLTK\src\Fl_System_Driver.cxx:360)
12: Fl_WinAPI_System_Driver::wait - 0x140701645075376 (C:\code\mrv2\BUILD-Msys-amd64-small\Release\FLTK-prefix\src\FLTK\src\Fl_win32.cxx:362)
13: Fl::run - 0x140701644884672 (C:\code\mrv2\BUILD-Msys-amd64-small\Release\FLTK-prefix\src\FLTK\src\Fl.cxx:605)
14: main - 0x140701644700128 (C:\code\mrv2\mrv2\src\main.cpp:70)
15: WinMain - 0x140701644699808 (C:\code\mrv2\mrv2\src\main.cpp:125)
16: __scrt_common_main_seh - 0x140701651483248 (D:\a\_work\1\s\src\vctools\crt\vcstartup\src\startup\exe_common.inl:288)
17: BaseThreadInitThunk - 0x140705406854496 (line number unavailable)
18: RtlUserThreadStart - 0x140705442605600 (line number unavailable)

GIT TAG is eeed39524606f1717dd2634ffe52e4640a606841
C:\code\mrv2\BUILD-Msys-amd64-small\Release\FLTK-prefix\src\FLTK\src\Fl_Window.cxx:595

is:

    Fl_Widget* p = parent();
    for (;p->visible();p = p->parent()) {}

it should be:

    Fl_Widget* p = parent();
    for (;p && p->visible();p = p->parent()) {}
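To illustrate what the added `p &&` test guards against, here is a small standalone C++ model of that loop. It is a sketch under assumptions: `Node` and `first_invisible_ancestor()` are hypothetical names, not FLTK API, and `Node` only models `parent()` (nullptr for a top-level window) and a widget's own `visible()` flag.

```cpp
#include <cassert>

// Hypothetical stand-in for Fl_Widget (not FLTK API): it models only
// parent() and the widget's own visible() flag.
struct Node {
  Node* parent = nullptr;  // nullptr for a top-level window
  bool visible = true;     // cleared when hide() is called on this node
};

// Guarded form of the loop at Fl_Window.cxx:595: starting at w's parent,
// walk up until an ancestor with a cleared visible flag is found.
// The original loop lacked the `p &&` test, so when every ancestor is
// visible it stepped past the top-level window and dereferenced nullptr.
const Node* first_invisible_ancestor(const Node* w) {
  const Node* p = w->parent;
  for (; p && p->visible; p = p->parent) {}
  return p;  // nullptr means no ancestor has been hidden
}
```

With a chain top window -> group -> subwindow where nothing has been hidden, the guarded loop returns nullptr; the unguarded original would dereference nullptr at that point instead.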
-- Gonzalo Garramuño ggar...@gmail.com
Actually, the fix should be these two lines:

    Fl_Widget* p = parent(); for (p && ;p->visible();p = p->parent()) {}
    if (!p || p->type() >= FL_WINDOW) break; // don't do the unmap
-- Gonzalo Garramuño ggar...@gmail.com
Gonzalo: I believe we must know exactly the scenario that produces the crash, all the more because it happens in 22-year-old code. The code in question searches for "what really turned invisible". This means that it's expected to run only for a window below something in the widget tree that has previously been made invisible. In that situation p is never NULL, because the loop stops at an ancestor whose visible() is false before p->parent() can return NULL. So what exactly is the scenario in which this code is run by this particular client app? Isn't something going wrong earlier?
Manolo
The problem happens when calling hide on a window from within an FL_LEAVE event. I was working around the issue with a timeout but it would still sometimes fail.
Here's a video of the issue (with the console printing out the Fl_Window.cxx nullptr):
https://mega.nz/file/7DhlCLiS#yWd1k-n10M86QQBO0gbjx-mDZNk99P-gP4ZnZhWApxs
As you move over or enter the timeline, a thumbnail appears, like on Netflix. The problem is calling hide() when exiting the timeline (i.e. on FL_LEAVE of the timeline widget). Note that I never destroy the thumbnail; I just hide it.
-- Gonzalo Garramuño ggar...@gmail.com
And here's a video with the patch applied (no crash):
https://mega.nz/file/OCAD3Tob#8gZiKPyV1gwDW8sf67CEmfMGwhJg08kVJWBjEuV7jRY
-- Gonzalo Garramuño ggar...@gmail.com
NOTE: while I was almost ready to post this message a new reply from Gonzalo appeared. I'm going to look into the other reply soon. My questions are similar to what Manolo asked. I'm posting my reply anyway as-is ...
On 4/28/24 10:50 Gonzalo Garramuño wrote:
On 4/28/2024 5:36 AM, Gonzalo Garramuño wrote:
Actually, the fix should be these two lines:

    Fl_Widget* p = parent(); for (p && ;p->visible();p = p->parent()) {}
    if (!p || p->type() >= FL_WINDOW) break; // don't do the unmap

Make that instead:

    Fl_Widget* p = parent(); for (p && ;p->visible();p = p->parent()) {}
    if (p && p->type() >= FL_WINDOW) break; // don't do the unmap
Gonzalo, I agree that the existing code is broken, and IMHO it has been broken since it was introduced in the year 2000, as Manolo mentioned.
However, there seems to be a typo in your suggested code in `for (p && ;p->visible();`. I propose the following:
    if (visible()) {
      Fl_Widget* p = parent();
      for (; p && p->visible(); p = p->parent()) {}
      if (p && p->as_window()) break; // don't do the unmap
    }
    pWindowDriver->unmap();
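A rough way to compare the proposals is to model the FL_HIDE branch outside FLTK. In this hypothetical sketch (not FLTK code), `Node` stands in for `Fl_Widget`, `is_window` models `as_window() != nullptr`, and `should_unmap()` returns false where the proposed code would `break` and true where it would reach `pWindowDriver->unmap()`.

```cpp
#include <cassert>

// Hypothetical stand-in for Fl_Widget: parent(), the widget's own
// visible() flag, and whether as_window() would return non-null.
struct Node {
  Node* parent = nullptr;  // nullptr for a top-level window
  bool visible = true;
  bool is_window = false;
};

// Models the FL_HIDE branch of the proposed patch for a subwindow w:
// false where the patch would `break` (skip the unmap), true where it
// would fall through to pWindowDriver->unmap().
bool should_unmap(const Node* w) {
  if (w->visible) {
    const Node* p = w->parent;
    for (; p && p->visible; p = p->parent) {}
    if (p && p->is_window) return false;  // an enclosing window was hidden
  }
  return true;  // w itself, a non-window ancestor, or nothing found: unmap
}
```

The `!p ||` and `p &&` variants quoted earlier differ only when p ends up as nullptr: with `p &&` the subwindow is unmapped in that case, while with `!p ||` the unmap would be skipped.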
I have these questions:
(1) After seeing your last proposal I considered changing it to the first proposal above, wondering which one was correct. Why did you change the first proposal to the later one? I came to the same conclusion, but can you elaborate why you changed it?
(2) Did you have a chance to test this code with the patch or let your user test it? Did it change the behavior?
(3) I agree with Manolo that we should investigate and understand why such old code triggers a bug (crash) in your code. Understanding what happens is important to know *what* exactly needs to be fixed. Maybe there's another bug elsewhere (in FLTK or in your application) that causes this unexpected behavior. As Manolo wrote, it is possible that some code accesses a deleted widget or window which causes undefined behavior. Do you have any insights what may trigger the bug in your program?
The more info you can provide, the better we can fix the bug.
According to the stack trace posted elsewhere I assume that your program hides either a window or another widget, maybe one that contains a subwindow, inside a timer callback. Does this sound plausible (possible)?
Maybe the crash is triggered because the timer callback is called while the user has a menu open in your program. If we could confirm this it might give us more clues. Can you please ask the user if the crash happens while a menu is open?
It would be great if you could provide us with some more information about the program's state when the crash is triggered. Thanks in advance.
-- Gonzalo Garramuño ggar...@gmail.com
On 4/28/2024 11:16 AM, 'Albrecht Schlosser' via fltk.coredev wrote:
Gonzalo, I agree that the existing code is broken, and IMHO it has been broken since it was introduced in the year 2000, as Manolo mentioned.

Agreed. The typo is because I did not copy the code; I wrote it manually. Sorry.
(3) I agree with Manolo that we should investigate and understand why such old code triggers a bug (crash) in your code. Understanding what happens is important to know *what* exactly needs to be fixed. Maybe there's another bug elsewhere (in FLTK or in your application) that causes this unexpected behavior. As Manolo wrote, it is possible that some code accesses a deleted widget or window which causes undefined behavior. Do you have any insights what may trigger the bug in your program?

The problem seems to be that the window is hidden from within an FL_LEAVE event of another window.
The timeout was a work-around (which randomly failed too) to hide the window.
It would be great if you could provide us with some more information about the program's state when the crash is triggered. Thanks in advance.

I think indeed it was a Windows issue (bug). Sadly, when I wrote the timeout code I did not document *why* I wrote it, and I had not delved into FLTK's code.
OK, let's recap and try to simplify the scenario, maybe to create a small example program that triggers the issue, so we can be sure and test more easily:
(1) You have a (main) window A (subclassed from an FLTK window class)
(2) You open another window B (the small preview window)
(3) Window B is a modal (or non-modal) window (stays on top of A)
(4) You drag some "object" (in the timeline) inside your window A
(5) Your window A gets an FL_LEAVE event when the mouse cursor leaves window A while dragging
(6) You hide window B while handling FL_LEAVE in window A's handle method
Is all of this true? If not, please correct my wrong assumptions.
I believe so. Here's a self-contained example showing the crash with MSVC, compiled using MSys64 as my terminal, as usual:
https://mega.nz/file/OaJ0XKwa#qgxJK2q9E9fzqO3Ltf5wUA2HR4ts4cCHWLc9lvoRIU4
More questions: which of the windows (A, B, or both) are OpenGL windows?
The timeout was a work-around (which randomly failed too) to hide the window.
According to your video with the crash you removed the timeout and did it as described above. Is this correct?
Now that you can reproduce the issue, can you also reproduce it on Linux (X11 or Wayland) or macOS?
Note: I'm asking these questions because I wonder why this bug is triggered in your code and was not reported earlier. Meanwhile I'm pretty sure that the fix given above is correct and necessary but I'm still wondering what the root cause of the issue may be.
-- Gonzalo Garramuño ggar...@gmail.com
On 28/4/24 12:13, 'Albrecht Schlosser' via fltk.coredev wrote:
OK, thanks for the confirmation. Can you try my patch (code above)?
Tried it. Seems to fix Windows, Wayland and XWayland (did not try macOS and X11 yet but I am guessing they should be fine too).
On 28/4/24 at 19:45, 'Albrecht Schlosser' via fltk.coredev wrote:
Summary:
(1) You should fix your program (see above) and ...
See this new version of the program that no longer crashes:
https://mega.nz/file/aHYWkabR#EUFPKmDA_87NSLKekWjH8fHb4kmpaKz8kht_KjNsyZc
To avoid the crashes, besides applying your suggestions, I cleaned up the code and added clamping of the thumbnail subwindow to the parent window's borders.
(2) I will fix the potential NULL pointer dereference.
Yes, do so just in case, as it is still a bug. Thanks!
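The clamping of the thumbnail subwindow mentioned above could look roughly like the following. This is a hypothetical sketch, not the code from the linked program: `Pos` and `clamp_to_parent()` are made-up names, and it assumes subwindow coordinates relative to the parent window, which is how FLTK positions subwindows.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical result type: top-left corner of the clamped child.
struct Pos {
  int x;
  int y;
};

// Clamp a child rectangle (x, y, w, h), given in parent coordinates,
// so that it stays inside a parent of size pw x ph. If the child is
// larger than the parent in one dimension, it is pinned to 0 there.
Pos clamp_to_parent(int x, int y, int w, int h, int pw, int ph) {
  Pos r;
  r.x = std::max(0, std::min(x, pw - w));
  r.y = std::max(0, std::min(y, ph - h));
  return r;
}
```

The result would then be applied with something like position() before showing the thumbnail. Using std::min/std::max rather than std::clamp avoids undefined behavior when the thumbnail is larger than the parent (std::clamp requires lo <= hi).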