Windows 11 crash (still investigating)


Gonzalo Garramuño

Apr 24, 2024, 1:14:35 PM
To: fltkc...@googlegroups.com

A user of my program reported random crashes when working, which I cannot reproduce.  I managed to get a stack trace from him, but not much more information yet.  The relevant information follows.  Any help is appreciated.

6: KiUserExceptionDispatcher - 0x140711815885776
7: Fl_Window::handle - 0x140709297099744
8: Fl_Window::y_root - 0x140709297101632
9: fl_win32_xid - 0x140709297228272
10: Fl_Tiled_Image::draw - 0x140709297052528
11: fl_control_modifier - 0x140709296916000
12: Fl_Window::icons - 0x140709297290000
13: Fl::run - 0x140709296628080

It is a weird stack trace as I don't have any Fl_Tiled_Image in my program.  Also, I would like to know what fl_control_modifier() is and when it is called.  If it is called when you press CTRL, then I have a clue where to start checking.

-- 
Gonzalo Garramuño
ggar...@gmail.com

Gonzalo Garramuño

Apr 24, 2024, 1:16:10 PM
To: fltkc...@googlegroups.com
I forgot to mention.  I am using FLTK 1.4 git tag:

d2bd3c62408945227bb13133ad6ce270851b4872
-- 
Gonzalo Garramuño
ggar...@gmail.com

Albrecht Schlosser

Apr 24, 2024, 1:52:45 PM
To: fltkc...@googlegroups.com
On 4/24/24 19:14 Gonzalo Garramuño wrote:

A user of my program reported random crashes when working, which I cannot reproduce.  I managed to get a stack trace from him, but not much more information yet.  The relevant information follows.  Any help is appreciated.


Note: a stack trace w/o debug info is not very helpful. If this issue persists I recommend to deploy an executable from a debug build (assuming that you build the executable and not the end user).

6: KiUserExceptionDispatcher - 0x140711815885776
 
KiUserExceptionDispatcher is not defined by FLTK. This indicates that the handle() method of your derived window class calls KiUserExceptionDispatcher().

7: Fl_Window::handle - 0x140709297099744


I have no idea why Fl_Tiled_Image::draw() would (indirectly) call Fl_Window::handle(). There's some clipping done, hence fl_win32_xid() might be plausible.

8: Fl_Window::y_root - 0x140709297101632
9: fl_win32_xid - 0x140709297228272
10: Fl_Tiled_Image::draw - 0x140709297052528

Fl_Tiled_Image::draw() may be called (IIRC) when the 'plastic' scheme is used to draw the background image.

11: fl_control_modifier - 0x140709296916000
12: Fl_Window::icons - 0x140709297290000
13: Fl::run - 0x140709296628080

It is a weird stack trace as I don't have any Fl_Tiled_Image in my program.


See above.


Also, I would like to know what fl_control_modifier() is and when it is called.  If it is called when you press CTRL, then I have a clue where to start checking.


I'd guess it's used to check whether a particular modifier (FL_COMMAND or FL_CONTROL) is set in a key event, or to check for a shortcut:
FL/platform_types.h-136-#  define FL_COMMAND    fl_command_modifier()
FL/platform_types.h:137:#  define FL_CONTROL    fl_control_modifier()

One last note: assuming there is a bug somewhere in the program (either your code or a used library) it can be something called a "Heisenbug" that goes away when you rebuild your program, when data is changed, or when you build it in Debug rather than Release mode. IMHO the best you can do is to use a memory analyzer like Address Sanitizer (ASAN) or valgrind to check for access violations in thorough tests. IIRC Visual Studio supports ASAN, hence you should be able to build your program with ASAN support on Windows, and I'd guess that MSYS2 also supports (clang and) ASAN.

I hope this helps. Good luck.

Gonzalo Garramuño

Apr 25, 2024, 4:48:51 AM
To: fltkc...@googlegroups.com


On 4/24/2024 2:52 PM, 'Albrecht Schlosser' via fltk.coredev wrote:
Note: a stack trace w/o debug info is not very helpful. If this issue persists I recommend to deploy an executable from a debug build (assuming that you build the executable and not the end user).
I have now sent a debug build to my user on github.  I am waiting for him to test it.
KiUserExceptionDispatcher is not defined by FLTK. This indicates that the handle() method of your derived window class calls KiUserExceptionDispatcher().
This is a common stack trace step when a SEGV or similar exception happens on MSVC.

7: Fl_Window::handle - 0x140709297099744


I have no idea why Fl_Tiled_Image::draw() would (indirectly) call Fl_Window::handle(). There's some clipping done, hence fl_win32_xid() might be plausible.
8: Fl_Window::y_root - 0x140709297101632
9: fl_win32_xid - 0x140709297228272
10: Fl_Tiled_Image::draw - 0x140709297052528

Fl_Tiled_Image::draw() may be called (IIRC) when the 'plastic' scheme is used to draw the background image
That's interesting.


       IMHO the best you can do is to use a memory analyzer like Address
      Sanitizer (ASAN) or valgrind to check for access violations in
      thorough tests. IIRC Visual Studio supports ASAN, hence you should
      be able to build your program with ASAN support on Windows, and
      I'd guess that MSYS2 also supports (clang and) ASAN.
Sadly, I believe AddressSanitizer is still in beta in Visual Studio, and valgrind is not available on Windows.


I hope this helps. Good luck.
Yes, it helped.  But I am still waiting on my user to get back to me.
-- 
Gonzalo Garramuño
ggar...@gmail.com

Gonzalo Garramuño

Apr 28, 2024, 4:30:31 AM
To: fltkc...@googlegroups.com


On 4/24/2024 2:52 PM, 'Albrecht Schlosser' via fltk.coredev wrote:
On 4/24/24 19:14 Gonzalo Garramuño wrote:

A user of my program reported random crashes when working, which I cannot reproduce.  I managed to get a stack trace from him, but not much more information yet.  The relevant information follows.  Any help is appreciated.

Okay.  Here's the real stack trace that points to a bug in FLTK:

0: mrv::callback - 0x140701647072160 (C:\code\mrv2\mrv2\lib\mrvCore\win32\mrvSignalHandler.cpp:37)
1: log2f - 0x140705404926352 (line number unavailable)
2: `__scrt_common_main_seh'::`1'::filt$0 - 0x140701651910942 (D:\a\_work\1\s\src\vctools\crt\vcstartup\src\startup\exe_common.inl:304)
3: _C_specific_handler - 0x140705005169712 (line number unavailable)
4: _chkstk - 0x140705442906848 (line number unavailable)
5: RtlFindCharInUnicodeString - 0x140705442355664 (line number unavailable)
6: KiUserExceptionDispatcher - 0x140705442902992 (line number unavailable)
7: Fl_Window::handle - 0x140701644897184 (C:\code\mrv2\BUILD-Msys-amd64-small\Release\FLTK-prefix\src\FLTK\src\Fl_Window.cxx:595)
8: Fl_Window_Driver::hide_common - 0x140701644960032 (C:\code\mrv2\BUILD-Msys-amd64-small\Release\FLTK-prefix\src\FLTK\src\Fl_Window_Driver.cxx:173)
9: Fl_WinAPI_Window_Driver::hide - 0x140701645080560 (C:\code\mrv2\BUILD-Msys-amd64-small\Release\FLTK-prefix\src\FLTK\src\drivers\WinAPI\Fl_WinAPI_Window_Driver.cxx:470)
10: Fl_Timeout::do_timeouts - 0x140701644961840 (C:\code\mrv2\BUILD-Msys-amd64-small\Release\FLTK-prefix\src\FLTK\src\Fl_Timeout.cxx:490)
11: Fl_System_Driver::wait - 0x140701645030560 (C:\code\mrv2\BUILD-Msys-amd64-small\Release\FLTK-prefix\src\FLTK\src\Fl_System_Driver.cxx:360)
12: Fl_WinAPI_System_Driver::wait - 0x140701645075376 (C:\code\mrv2\BUILD-Msys-amd64-small\Release\FLTK-prefix\src\FLTK\src\Fl_win32.cxx:362)
13: Fl::run - 0x140701644884672 (C:\code\mrv2\BUILD-Msys-amd64-small\Release\FLTK-prefix\src\FLTK\src\Fl.cxx:605)
14: main - 0x140701644700128 (C:\code\mrv2\mrv2\src\main.cpp:70)
15: WinMain - 0x140701644699808 (C:\code\mrv2\mrv2\src\main.cpp:125)
16: __scrt_common_main_seh - 0x140701651483248 (D:\a\_work\1\s\src\vctools\crt\vcstartup\src\startup\exe_common.inl:288)
17: BaseThreadInitThunk - 0x140705406854496 (line number unavailable)
18: RtlUserThreadStart - 0x140705442605600 (line number unavailable)


GIT TAG is eeed39524606f1717dd2634ffe52e4640a606841

C:\code\mrv2\BUILD-Msys-amd64-small\Release\FLTK-prefix\src\FLTK\src\Fl_Window.cxx:595

is:
            Fl_Widget* p = parent(); for (;p->visible();p = p->parent()) {}

it should be:

            Fl_Widget* p = parent(); for (;p && p->visible();p = p->parent()) {}


-- 
Gonzalo Garramuño
ggar...@gmail.com

Gonzalo Garramuño

Apr 28, 2024, 4:36:55 AM
To: fltkc...@googlegroups.com

Actually, the fix should be these two lines:

            Fl_Widget* p = parent(); for (p && ;p->visible();p = p->parent()) {}
            if (!p || p->type() >= FL_WINDOW) break; // don't do the unmap

-- 
Gonzalo Garramuño
ggar...@gmail.com

Gonzalo Garramuño

Apr 28, 2024, 4:50:55 AM
To: fltkc...@googlegroups.com

On 4/28/2024 5:36 AM, Gonzalo Garramuño wrote:
>
> Actually, the fix should be these two lines:
>
>             Fl_Widget* p = parent(); for (p && ;p->visible();p = p->parent()) {}
>             if (!p || p->type() >= FL_WINDOW) break; // don't do the unmap
>
Make that instead:

            Fl_Widget* p = parent(); for (p && ;p->visible();p = p->parent()) {}
            if (p && p->type() >= FL_WINDOW) break; // don't do the unmap

--
Gonzalo Garramuño
ggar...@gmail.com

Manolo

Apr 28, 2024, 5:29:38 AM
To: fltk.coredev
Fl_Window.cxx line 595
  Fl_Widget* p = parent(); for (;p->visible();p = p->parent()) {}
has been unchanged since at least 30 October 2002 (22 years). So it's surprising that a null pointer dereference
pops up today. We should know exactly whether the crash is caused by p being null and p->visible()
being attempted. It could be that p has been deleted before, which would be another kind of
error, more likely in the client program.

Gonzalo Garramuño

Apr 28, 2024, 6:34:06 AM
To: fltkc...@googlegroups.com
Now that I know where to look, I printed out the pointers with:

          if (visible()) {
            printf("FL_HIDE:\n");
            Fl_Widget* p = parent();
            for (; p->visible(); p = p->parent())
            {
              printf("\t%p\n", (void*)p);
              printf("\tparent=%p\n", (void*)p->parent());
            }
            if (p->type() >= FL_WINDOW) break; // don't do the unmap
          }


FL_HIDE:
        0000016A4D2ED8B0
        parent=0000000000000000

I don't know what more proof you want.

--
Gonzalo Garramuño
ggar...@gmail.com

Gonzalo Garramuño

Apr 28, 2024, 10:10:49 AM
To: fltkc...@googlegroups.com
Manolo wrote me privately, but he wanted me to share it.


On 4/28/2024 10:33 AM, Manolo Gouy wrote:
Gonzalo: I believe we must know exactly the scenario that produces the crash,
even more because it happens in 22 years old code. The code in question
searches "what really turned invisible". This means that it’s expected to run
only for a window below something in a widget tree that has been turned
invisible before. In that situation p is never NULL because p->visible() is
false before p gets replaced by p->parent(). So what exactly is the scenario
in which this code is run by this particular client app? Isn’t there something wrong before?

Manolo

The problem happens when calling hide on a window from within an FL_LEAVE event. I was working around the issue with a timeout but it would still sometimes fail.

Here's a video of the issue (with the console printing out the Fl_Window.cxx nullptr):

https://mega.nz/file/7DhlCLiS#yWd1k-n10M86QQBO0gbjx-mDZNk99P-gP4ZnZhWApxs

As you move over or enter the timeline, a thumbnail appears, like on Netflix. The problem is calling hide() when exiting the timeline (i.e., in the FL_LEAVE handling of the timeline widget). Note that I never destroy the thumbnail, I just hide it.

-- 
Gonzalo Garramuño
ggar...@gmail.com

Gonzalo Garramuño

Apr 28, 2024, 10:12:29 AM
To: fltkc...@googlegroups.com


On 4/28/2024 11:10 AM, Gonzalo Garramuño wrote:
The problem happens when calling hide on a window from within an FL_LEAVE event. I was working around the issue with a timeout but it would still sometimes fail.

Here's a video of the issue (with the console printing out the Fl_Window.cxx nullptr):

https://mega.nz/file/7DhlCLiS#yWd1k-n10M86QQBO0gbjx-mDZNk99P-gP4ZnZhWApxs

And here's a video with the patch applied (no crash):

https://mega.nz/file/OCAD3Tob#8gZiKPyV1gwDW8sf67CEmfMGwhJg08kVJWBjEuV7jRY

-- 
Gonzalo Garramuño
ggar...@gmail.com

Albrecht Schlosser

Apr 28, 2024, 10:16:08 AM
To: fltkc...@googlegroups.com
NOTE: while I was almost ready to post this message a new reply from Gonzalo appeared. I'm going to look into the other reply soon. My questions are similar to what Manolo asked. I'm posting my reply anyway as-is ...
Gonzalo, I agree that the existing code is broken, and IMHO it has been broken since it was introduced in the year 2000, as Manolo mentioned.

However, your suggested code seems to contain a typo in `for (p && ;p->visible();`. I propose the following:

if (visible()) {
  Fl_Widget* p = parent();
  for (; p && p->visible(); p = p->parent()) {}
  if (p && p->as_window()) break; // don't do the unmap
}
pWindowDriver->unmap();

I have these questions:

(1) After seeing your last proposal I considered changing it to the first proposal above, wondering which one was correct. Why did you change the first proposal to the latter one? I came to the same conclusion, but can you elaborate why you changed it?

(2) Did you have a chance to test this code with the patch or let your user test it? Did it change the behavior?

(3) I agree with Manolo that we should investigate and understand why such old code triggers a bug (crash) in your code. Understanding what happens is important to know *what* exactly needs to be fixed. Maybe there's another bug elsewhere (in FLTK or in your application) that causes this unexpected behavior. As Manolo wrote, it is possible that some code accesses a deleted widget or window which causes undefined behavior. Do you have any insights what may trigger the bug in your program?

The more info you can provide, the better we can fix the bug.

According to the stack trace posted elsewhere I assume that your program hides either a window or another widget, maybe one that contains a subwindow, inside a timer callback. Does this sound plausible (possible)?

Maybe the crash is triggered because the timer callback is called while the user has a menu open in your program. If we could confirm this it might give us more clues. Can you please ask the user if the crash happens while a menu is open?

If my assumption (timeout that hides a widget or window) is true, then this could be a Windows specific issue triggered by a particular Windows behavior that hides a modal (menu) window when its "parent window" is hidden - where "parent window" is a Windows specific relation and not FLTK's parent() relation of subwindows.

It would be great if you could provide us with some more information about the program's state when the crash is triggered. Thanks in advance.


Gonzalo Garramuño

Apr 28, 2024, 10:25:31 AM
To: fltkc...@googlegroups.com


On 4/28/2024 11:16 AM, 'Albrecht Schlosser' via fltk.coredev wrote:
NOTE: while I was almost ready to post this message a new reply from Gonzalo appeared. I'm going to look into the other reply soon. My questions are similar to what Manolo asked. I'm posting my reply anyway as-is ...

On 4/28/24 10:50 Gonzalo Garramuño wrote:

On 4/28/2024 5:36 AM, Gonzalo Garramuño wrote:

Actually, the fix should be these two lines:

            Fl_Widget* p = parent(); for (p && ;p->visible();p = p->parent()) {}
            if (!p || p->type() >= FL_WINDOW) break; // don't do the unmap

Make that instead:

            Fl_Widget* p = parent(); for (p && ;p->visible();p = p->parent()) {}
            if (p && p->type() >= FL_WINDOW) break; // don't do the unmap

Gonzalo, I agree that the existing code is broken, and it was IMHO broken since it was introduced in the year 2000 as Manolo mentioned.

However, a typo seems to be in your suggested code in `for (p && ;p->visible();` . I propose the following:

if (visible()) {
  Fl_Widget* p = parent();
  for (; p && p->visible(); p = p->parent()) {}
  if (p && p->as_window()) break; // don't do the unmap
}
pWindowDriver->unmap();
Agreed.  The typo is because I did not copy the code, I wrote it manually.  Sorry.

I have these questions:

(1) After seeing your last proposal I considered changing it to the first proposal above, wondering which one was correct. Why did you change the first proposal to the later one? I came to the same conclusion, but can you elaborate why you changed it?
Because I think the window receiving FL_HIDE should still get unmapped, as it is currently shown.  Calling break; would skip the unmap.


(2) Did you have a chance to test this code with the patch or let your user test it? Did it change the behavior?

I sent it to my tester (have not heard back).  But in the meantime I was able to reproduce it myself.  See my videos.

(3) I agree with Manolo that we should investigate and understand why such old code triggers a bug (crash) in your code. Understanding what happens is important to know *what* exactly needs to be fixed. Maybe there's another bug elsewhere (in FLTK or in your application) that causes this unexpected behavior. As Manolo wrote, it is possible that some code accesses a deleted widget or window which causes undefined behavior. Do you have any insights what may trigger the bug in your program?

The problem seems to be that the Window is hidden while being in an FL_LEAVE event of another window.

The more info you can provide, the better we can fix the bug.

According to the stack trace posted elsewhere I assume that your program hides either a window or another widget, maybe one that contains a subwindow, inside a timer callback. Does this sound plausible (possible)?
The timeout was a work-around (which randomly failed too) to hide the window.

Maybe the crash is triggered because the timer callback is called while the user has a menu open in your program. If we could confirm this it might give us more clues. Can you please ask the user if the crash happens while a menu is open?

No.  There is no open menu; the program calls hide() on one window from within the handle() method of another.

It would be great if you could provide us with some more information about the program's state when the crash is triggered. Thanks in advance.

I think indeed it was a Windows issue (bug).  Sadly when I wrote the timeout code, I did not document *why* I wrote it and had not delved into FLTK's code.
-- 
Gonzalo Garramuño
ggar...@gmail.com

Albrecht Schlosser

Apr 28, 2024, 11:13:14 AM
To: fltkc...@googlegroups.com
On 4/28/24 16:25 Gonzalo Garramuño wrote:
On 4/28/2024 11:16 AM, 'Albrecht Schlosser' via fltk.coredev wrote:
Gonzalo, I agree that the existing code is broken, and it was IMHO broken since it was introduced in the year 2000 as Manolo mentioned.

However, a typo seems to be in your suggested code in `for (p && ;p->visible();` . I propose the following:

if (visible()) {
  Fl_Widget* p = parent();
  for (; p && p->visible(); p = p->parent()) {}
  if (p && p->as_window()) break; // don't do the unmap
}
pWindowDriver->unmap();
Agreed.  The typo is because I did not copy the code, I wrote it manually.  Sorry.

OK, thanks for confirmation. Can you try my patch (code above) ?

Note that I also changed the type check to `p->as_window()`, but that shouldn't affect the behavior; it's the "modern" way to determine whether a widget is a window.


(3) I agree with Manolo that we should investigate and understand why such old code triggers a bug (crash) in your code. Understanding what happens is important to know *what* exactly needs to be fixed. Maybe there's another bug elsewhere (in FLTK or in your application) that causes this unexpected behavior. As Manolo wrote, it is possible that some code accesses a deleted widget or window which causes undefined behavior. Do you have any insights what may trigger the bug in your program?

The problem seems to be that the Window is hidden while being in an FL_LEAVE event of another window.

OK, let's recap and try to simplify the scenario, maybe to create a small example program that triggers the issue so we can be sure and test more easily:

(1) You have a (main) window A (subclassed from an FLTK window class)
(2) You open another window B (the small preview window)
(3) Window B is a modal (or non-modal) window (stays on top of A)
(4) You drag some "object" (in the timeline) inside your window A
(5) Your window A gets an FL_LEAVE event when the mouse cursor leaves window A while dragging
(6) You hide window B while handling FL_LEAVE in window A's handle method

Is all of this true? If not, please correct my wrong assumptions.

More questions: which of the windows (A, B, or both) are OpenGL windows?


The timeout was a work-around (which randomly failed too) to hide the window.

According to your video with the crash you removed the timeout and did it as described above. Is this correct?


It would be great if you could provide us with some more information about the program's state when the crash is triggered. Thanks in advance.
I think indeed it was a Windows issue (bug).  Sadly when I wrote the timeout code, I did not document *why* I wrote it and had not delved into FLTK's code.

Now that you can reproduce the issue, can you also reproduce it on Linux (X11 or Wayland) or macOS?

Note: I'm asking these questions because I wonder why this bug is triggered in your code and was not reported earlier. Meanwhile I'm pretty sure that the fix given above is correct and necessary but I'm still wondering what the root cause of the issue may be.

Thanks for all your testing and for finding the issue.

Gonzalo Garramuño

Apr 28, 2024, 11:27:05 AM
To: fltkc...@googlegroups.com


On 4/28/2024 12:13 PM, 'Albrecht Schlosser' via fltk.coredev wrote:


(3) I agree with Manolo that we should investigate and understand why such old code triggers a bug (crash) in your code. Understanding what happens is important to know *what* exactly needs to be fixed. Maybe there's another bug elsewhere (in FLTK or in your application) that causes this unexpected behavior. As Manolo wrote, it is possible that some code accesses a deleted widget or window which causes undefined behavior. Do you have any insights what may trigger the bug in your program?

The problem seems to be that the Window is hidden while being in an FL_LEAVE event of another window.

OK, let's recap and try to simplify the scenario, maybe to create a small example program that triggers the issue so we are sure and can test easier:

(1) You have a (main) window A (subclassed from an FLTK window class)
(2) You open another window B (the small preview window)
(3) Window B is a modal (or non-modal) window (stays on top of A)
(4) You drag some "object" (in the timeline) inside your window A
(5) Your window A gets an FL_LEAVE event when the mouse cursor leaves window A while dragging
(6) You hide window B while handling FL_LEAVE in window A's handle method

Is all of this true? If not, please correct my wrong assumptions.

I believe so.  Here's a self-contained example that shows the crash with MSVC, compiled as usual from my MSys64 terminal:

https://mega.nz/file/OaJ0XKwa#qgxJK2q9E9fzqO3Ltf5wUA2HR4ts4cCHWLc9lvoRIU4


More questions: which of the windows (A, B, or both) are OpenGL windows?
One is; the other is not.


The timeout was a work-around (which randomly failed too) to hide the window.

According to your video with the crash you removed the timeout and did it as described above. Is this correct?
Yes.


It would be great if you could provide us with some more information about the program's state when the crash is triggered. Thanks in advance.
I think indeed it was a Windows issue (bug).  Sadly when I wrote the timeout code, I did not document *why* I wrote it and had not delved into FLTK's code.

Now that you can reproduce the issue, can you also reproduce it on Linux (X11 or Wayland) or macOS?
Haven't tried yet.  Will try it after I take a shower :D


Note: I'm asking these questions because I wonder why this bug is triggered in your code and was not reported earlier. Meanwhile I'm pretty sure that the fix given above is correct and necessary but I'm still wondering what the root cause of the issue may be.


I am
-- 
Gonzalo Garramuño
ggar...@gmail.com

Gonzalo Garramuño

Apr 28, 2024, 1:05:00 PM
To: fltkc...@googlegroups.com

On 28/4/24 12:13, 'Albrecht Schlosser' via fltk.coredev wrote:
>
> Now that you can reproduce the issue, can you also reproduce it on
> Linux (X11 or Wayland) or macOS?

I've updated my test for Linux and macOS

https://mega.nz/file/6SwV1IQT#0EYlDESaiX9x1m3vUtBandlUT9D15HMcmhsLWfLcKKM

I'm sad to report it is not only a Windows issue.  It happens on Linux
(and likely the other OSes).

--
Gonzalo Garramuño
ggar...@gmail.com

Gonzalo Garramuño

Apr 28, 2024, 2:18:32 PM
To: fltkc...@googlegroups.com

On 28/4/24 12:13, 'Albrecht Schlosser' via fltk.coredev wrote:
>
> OK, thanks for confirmation. Can you try my patch (code above) ?

Tried it.  Seems to fix Windows, Wayland and XWayland (did not try macOS
and X11 yet but I am guessing they should be fine too).

--
Gonzalo Garramuño
ggar...@gmail.com

Albrecht Schlosser

Apr 28, 2024, 6:45:59 PM
To: fltkc...@googlegroups.com
On 4/28/24 20:18 Gonzalo Garramuño wrote:

On 28/4/24 12:13, 'Albrecht Schlosser' via fltk.coredev wrote:

OK, thanks for confirmation. Can you try my patch (code above) ?

Tried it.  Seems to fix Windows, Wayland and XWayland (did not try macOS and X11 yet but I am guessing they should be fine too).


Thanks for testing it. It's basically your "patch" with minor modifications.

I'll give it another review tomorrow and I will very likely commit it. I'm pretty sure that the patch itself is correct but I still don't understand everything that leads to this situation. Your example code is a good reproducer and was very helpful in analyzing the issue (more to come tomorrow).

One bug of your program (and I call it a bug because it causes a kind of undefined behavior) can be seen in a comment in your example code:

Timeline(int X, int Y, int W, int H, const char* L = 0) :
  Fl_Gl_Window(X, Y, W, H, L)
{
  //Fl_Group::current(0); // adding this seems to make it not crash [1]
  thumbnailWindow = new Fl_Double_Window(120, Y - 100, 128, 80); // [2]
  thumbnailWindow->clear_border();
  thumbnailWindow->set_non_modal(); // [3]
  thumbnailWindow->begin();
  Fl_Box* box = new Fl_Box(2, 2, 120 - 2, 80 - 2);
  box->box(FL_FLAT_BOX);
  box->color(FL_BLACK);
  thumbnailWindow->end(); // [4]
  end(); // [5]
}

[1] You should really enable this statement to make sure that `thumbnailWindow` does not become a child (subwindow) of the Timeline (Fl_Gl_)Window.

[2] Since the constructor of Fl_Gl_Window ends with `begin();`, as the constructors of all group widgets do, the following `thumbnailWindow` will become a child if you don't enable the statement above. Alternative code after the c'tor of thumbnailWindow:
  if (thumbnailWindow->parent()) thumbnailWindow->parent()->remove(thumbnailWindow);
but this is IMHO the less elegant solution.

[3] set_non_modal() should only be used on top-level windows and makes the window stay above its base window (in Z order). In FLTK this relation is determined when the modal or non-modal window is shown: the active window becomes its base (or "parent") window as far as the system is concerned. Unfortunately we can't affect this order by any other means.

[4] This is correct but superfluous, because it is followed by `end();` [5]

[5] This end() ends the `Timeline` window, which does not really do what everybody expects. What it really does is Fl_Group::current([this->]parent()), i.e. it makes the parent of the Timeline widget the new "current group". In your code environment (in the test program) this would be correct if ... you didn't enable the statement in [1]. Therefore you would have to use `win.begin()` in your main program if you intended to add other widgets after the timeline. This is not the case in your demo program but may well be in your real program.


As you wrote yourself "adding this seems to make it not crash", and as I wrote above, this would be better because it prevents making the "non-modal" `thumbnailWindow` a subwindow of your main window. I'm not sure if it is documented that subwindows and { modal | non-modal } are mutually exclusive (IMHO this must be the case).

However, there are still some things I do not yet fully understand when it comes to event delivery of FL_LEAVE events and hiding the non-modal subwindow. In my tests I could only trigger the NULL pointer dereference (the patched code in Fl_Window) when I dragged the mouse (timeline) to the right out of the window. It does not happen when the mouse leaves the timeline to any other direction (up, down, and including to the left side). But that is internal stuff and should not affect you and your program.

Summary:
(1) You should fix your program (see above) and ...
(2) I will fix the potential NULL pointer dereference.

Good night from here

Albrecht

Gonzalo Garramulo

Apr 29, 2024, 5:14:47 AM
To: fltkc...@googlegroups.com

El 28/4/24 a las 19:45, 'Albrecht Schlosser' via fltk.coredev escribió:
>
> Summary:
> (1) You should fix your program (see above) and ...

See this new version of the program that no longer crashes:

https://mega.nz/file/aHYWkabR#EUFPKmDA_87NSLKekWjH8fHb4kmpaKz8kht_KjNsyZc

To avoid the crashes, I followed your suggestions, cleaned up the code, and
added clamping of the thumbnail subwindow to the parent window's borders.

> (2) I will fix the potential NULL pointer dereference.

Yes, do so just in case, as it is still a bug.   Thanks!

--
Gonzalo Garramuño
ggar...@gmail.com

Albrecht Schlosser

Apr 29, 2024, 3:38:02 PM
To: fltkc...@googlegroups.com
On 4/29/24 11:14 Gonzalo Garramulo wrote:

El 28/4/24 a las 19:45, 'Albrecht Schlosser' via fltk.coredev escribió:

Summary:
(1) You should fix your program (see above) and ...

See this new version of the program that no longer crashes:

https://mega.nz/file/aHYWkabR#EUFPKmDA_87NSLKekWjH8fHb4kmpaKz8kht_KjNsyZc

To avoid the crashes besides your suggestions I cleaned up the code and added clamping of the thumbnail subwindow to the parent windows' borders.

Yeah, looks good (much better than before) - but unfortunately it doesn't work on Wayland (as expected). The problem of Wayland is that you can't position windows absolutely on the screen and you don't get real (screen) coordinates of windows and events.

Good news: an even simpler version works fine on all current FLTK platforms. See attached file 'timeline.diff'.

Note to Gonzalo and others: subwindows don't have borders anyway, hence 'clear_border();' could be removed.

Although this particular solution to a user's issue is going off-topic now, I decided to post it here for others who see this later. The attached file 'timeline2.cxx' demonstrates how a /subwindow/ (called 'thumbnailWindow') can be opened (shown) and moved above the main window while moving (or dragging) the mouse. Thanks to Gonzalo for the working example, BTW. The point here is that 'thumbnailWindow' is a subwindow and not another independent window, which can't be positioned easily on Wayland (we have a working solution for menus etc. but there's no API for user programs).

The mechanism used in this example is that the subwindow is the last child of the main window (thus always drawn on top). It overlaps other widgets of the main window but this is tolerable in this case.

The code in
```
    thumbnailWindow->parent()->redraw(); // needed for Wayland
```
is not optimal because it may draw more than needed but I'm leaving this as an exercise for the reader.


(2) I will fix the potential NULL pointer dereference.

Yes, do so just in case, as it is still a bug.   Thanks!

Welcome, and done in commit b402b6a8397f9fc13157813d39d505ea9ead00f0.

timeline.diff
timeline2.cxx