Wayland's wl_display_dispatch blocking

Gonzalo Garramuño

Dec 23, 2022, 6:07:28 AM
to fltkc...@googlegroups.com
I found the problem call in the Wayland code.  The
wl_display_dispatch(display) function sometimes blocks in my
application.

The issue in my application is in the following function, in
Fl_Wayland_Screen_Driver.cxx:


static void fd_callback(int fd, struct wl_display *display) {
  struct pollfd fds = (struct pollfd) { fd, POLLIN, 0 };
  do {
    if (wl_display_dispatch(display) == -1) {
      Fl::fatal("Fatal error while communicating with the Wayland server: %s",
                strerror(errno));
    }
  }
  while (poll(&fds, 1, 0) > 0);
}


The documentation comments at:

https://chromium.googlesource.com/external/wayland/wayland/+/refs/heads/master/src/wayland-client.c

indicate, among other cases:

"If the default event queue is empty, this function blocks until there are
events to be read from the display fd. Events are read and queued on
the appropriate event queues. Finally, events on the default event queue
are dispatched.

 \note It is not possible to check if there are events on the queue
       or not. For dispatching default queue events without blocking,
       see \ref wl_display_dispatch_pending()."

Manolo

Dec 24, 2022, 6:39:58 AM
to fltk.coredev
Function fd_callback is associated to the file descriptor over which all communication
between the app and the Wayland compositor occurs. This association is established by
  Fl::add_fd(wl_display_get_fd(wl_display), FL_READ, (Fl_FD_Handler)fd_callback, wl_display);
in function Fl_Wayland_Screen_Driver::open_display_platform(), that is, when the display is open.
Therefore this function is called only when the OS detects that there is something to be read
on that file descriptor.
Then, function fd_callback also recursively calls itself in a do / while loop, but the condition
for the loop to iterate is   while (poll(&fds, 1, 0) > 0);  which ensures again that there is
something to be read.

Therefore, the call to wl_display_dispatch(display) should not block, unless some child thread
also calls functions such as Fl::wait().
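
To illustrate the argument above outside of FLTK, here is a minimal sketch that assumes an ordinary POSIX pipe stands in for the Wayland socket. The names readable_now() and drain_guarded() are hypothetical and belong to neither FLTK nor libwayland. The loop has the same do / while shape as fd_callback(): one blocking read, repeated only while a zero-timeout poll() reports more data.

```cpp
#include <cassert>
#include <poll.h>
#include <unistd.h>

// True if a read() on fd would return immediately (zero-timeout poll).
bool readable_now(int fd) {
  assert(fd >= 0);
  struct pollfd p = { fd, POLLIN, 0 };
  return poll(&p, 1, 0) > 0;
}

// Same shape as fd_callback(): the caller is assumed to have been told
// that fd is readable, so the first blocking read() returns at once.
// Later iterations run only while poll() reports more data.
// Returns the total number of bytes consumed.
long drain_guarded(int fd) {
  char buf[64];
  long total = 0;
  do {
    ssize_t n = read(fd, buf, sizeof buf);  // blocking call, in the role of wl_display_dispatch()
    if (n <= 0) break;                      // end of file or error
    total += n;
  } while (readable_now(fd));
  return total;
}
```

If drain_guarded() is entered only after select/poll has reported the descriptor readable, the first read() returns immediately. It would block only if something else had consumed the data in between.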


Manolo

Dec 24, 2022, 6:59:57 AM
to fltk.coredev
On Saturday, December 24, 2022 at 12:39:58 UTC+1, Manolo wrote:
> Function fd_callback is associated to the file descriptor over which all communication
> between the app and the Wayland compositor occurs. This association is established by
>   Fl::add_fd(wl_display_get_fd(wl_display), FL_READ, (Fl_FD_Handler)fd_callback, wl_display);
> in function Fl_Wayland_Screen_Driver::open_display_platform(), that is, when the display is open.
> Therefore this function is called only when the OS detects that there is something to be read
> on that file descriptor.
> Then, function fd_callback also recursively calls itself in a do / while loop,

There is an error in that sentence. It should be
Then, function fd_callback calls function wl_display_dispatch() in a do / while loop,

> but the condition
> for the loop to iterate is   while (poll(&fds, 1, 0) > 0);  which ensures again that there is
> something to be read.
>
> Therefore, the call to wl_display_dispatch(display) should not block, unless some child thread
> also calls functions such as Fl::wait().

This conclusion remains unchanged.
 

imm

Dec 24, 2022, 7:55:08 AM
to fltkc...@googlegroups.com
On Sat, 24 Dec 2022 at 11:59, Manolo wrote:
>
>> Therefore, the call to wl_display_dispatch(display) should not block, unless some child thread
>> also calls functions such as Fl::wait().
>
> This conclusion remains unchanged.

Nevertheless, as Gonzalo says, there is something weird happening at
times: Whilst testing the Wayland port, but only using the fltk tests
and examples, I have seen scenarios with animation (e.g. the cube or
fractal demos, etc.) get into a "stuck" state where they would only
update whilst I was wiggling the mouse over them.

That said, I have been unable to reproduce this effect today at all,
and TBH I don't actually recall what setup I saw the issues on - I
tested with Weston and with mutter, on my Ubuntu hosts (AMD64 and Pi
ARM64 both) and with the WSL/WSLg setup hosted on Win11 and *today*
they all work. That has not always been the case.

As an aside, the behaviour is somewhat reminiscent of one of the
effects we saw when the Win32 event queue got clogged up, which was
when we added the queue flushing mechanism for Win32.
Is it possible that the Wayland scheme also needs a similar
event-queue-flushing scheme, or is this something else entirely?

Manolo

Dec 24, 2022, 9:21:47 AM
to fltk.coredev
On Saturday, December 24, 2022 at 13:55:08 UTC+1, imacarthur wrote:
> On Sat, 24 Dec 2022 at 11:59, Manolo wrote:
>>
>>> Therefore, the call to wl_display_dispatch(display) should not block, unless some child thread
>>> also calls functions such as Fl::wait().
>>
>> This conclusion remains unchanged.
>
> Nevertheless, as Gonzalo says, there is something weird happening at
> times: Whilst testing the Wayland port, but only using the fltk tests
> and examples, I have seen scenarios with animation (e.g. the cube or
> fractal demos, etc.) get into a "stuck" state where they would only
> update whilst I was wiggling the mouse over them.
>
> That said, I have been unable to reproduce this effect today at all,
> and TBH I don't actually recall what setup I saw the issues on - I
> tested with Weston and with mutter, on my Ubuntu hosts (AMD64 and Pi
> ARM64 both) and with the WSL/WSLg setup hosted on Win11 and *today*
> they all work. That has not always been the case.

Yes. Function fd_callback() got its present form relatively recently
precisely to avoid blocking situations that occurred at times previously.
I believe only the behaviour of test apps as of today is to be considered.

Gonzalo Garramuño

Dec 24, 2022, 12:19:33 PM
to fltkc...@googlegroups.com


On 24/12/22 at 11:21, Manolo wrote:

>> Nevertheless, as Gonzalo says, there is something weird happening at
>> times:
>
> Yes. Function fd_callback() got its present form relatively recently
> precisely to avoid blocking situations that occurred at times previously.
> I believe only the behaviour of test apps as of today is to be considered.

The problem does not seem to be with the fd_callback function itself, but with whatever happens beforehand that makes wl_display_dispatch() block in my application.  Here is some printf output that shows what goes on in my program:

...etc...
=======================================================
Entering fd_callback with fd=3 display=0x2f41120
Call wl_display_dispatch
Called wl_display_dispatch
Called poll returning=0
Returned from fd_callback
=======================================================
Entering fd_callback with fd=3 display=0x2f41120
Call wl_display_dispatch
****BLOCKS****

With current HEAD of fltk1.4, the cube demo stops the animation when you do the following:

* You bring the cube window to the front by clicking on the top bar of it.
* You click on the terminal window (the one you fired the cube demo from, for example) to bring it to the front.

The cube starts animating once again after a while or after you enter the cube window again.

Note, however, this is a different bug than the one I am referring to. 

For my bug, neither the cube nor the fractals demo is a good example, as they call fd_callback only when there's no redrawing:

- With the cube demo, when speed is 0 and you move the mouse.
- With the fractals demo, when you stop the animation.

Gonzalo Garramuño

Dec 24, 2022, 1:03:15 PM
to fltkc...@googlegroups.com


On 24/12/22 at 08:59, Manolo wrote:
>> Then, function fd_callback also recursively calls itself in a do / while loop,
> There is an error in that sentence. It should be
> Then, function fd_callback calls function wl_display_dispatch() in a do / while loop,

I replaced wl_display_dispatch() with wl_display_dispatch_pending(), and that removed the blocking in that function.  Indeed, the playback improved.  However, I still get some blocking somewhere else (I still don't know where).

The documentation for that function mentions that I should call wl_display_flush() afterwards, too.  It also mentions my exact use case as an example: a video player.  However, the explanation of how to use the functions went over my head.


Manolo

Dec 25, 2022, 1:52:22 PM
to fltk.coredev
On Saturday, December 24, 2022 at 18:19:33 UTC+1, ggar...@gmail.com wrote:

> With current HEAD of fltk1.4, the cube demo stops the animation when you do the following:
>
> * You bring the cube window to the front by clicking on the top bar of it.
> * You click on the terminal window (the one you fired the cube demo from, for example) to bring it to the front.
>
> The cube starts animating once again after a while or after you enter the cube window again.


I don't see that.  The cube never stops moving when I do that.

Manolo

Dec 25, 2022, 2:01:07 PM
to fltk.coredev
On Saturday, December 24, 2022 at 19:03:15 UTC+1, ggar...@gmail.com wrote:
> I changed wl_display_dispatch for wl_display_dispatch_pending() and that removed the blocking on that function.  Indeed the playback improved.  However, I still get some blocking somewhere else (still don't know where).

Replacing wl_display_dispatch() by wl_display_dispatch_pending() in a single threaded FLTK app
produces an app that never opens any window. That is to be expected because in that situation
the app will never read data arriving on the Wayland file descriptor.
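
The difference between the two calls can be pictured with a toy display object. This is a sketch under stated assumptions, not libwayland: ToyDisplay, toy_dispatch() and toy_dispatch_pending() are hypothetical names, a pipe stands in for the Wayland socket, and each byte counts as one event.

```cpp
#include <cassert>
#include <deque>
#include <poll.h>
#include <unistd.h>

// Toy stand-in for a Wayland display: a descriptor plus a queue of
// events that have been read but not yet dispatched. NOT libwayland.
struct ToyDisplay {
  int fd;
  std::deque<char> queue;
  int dispatched;
  explicit ToyDisplay(int f) : fd(f), dispatched(0) {}
};

// Analogue of wl_display_dispatch_pending(): handles only what is
// already queued and never touches the descriptor, so it cannot block.
int toy_dispatch_pending(ToyDisplay &d) {
  int n = static_cast<int>(d.queue.size());
  d.dispatched += n;
  d.queue.clear();
  return n;
}

// Analogue of wl_display_dispatch(): if nothing is queued, read from the
// descriptor first (this read() blocks when no data is available),
// then dispatch whatever was queued. Returns -1 on a read error.
int toy_dispatch(ToyDisplay &d) {
  assert(d.fd >= 0);
  if (d.queue.empty()) {
    char buf[64];
    ssize_t n = read(d.fd, buf, sizeof buf);
    if (n < 0) return -1;
    d.queue.insert(d.queue.end(), buf, buf + n);
  }
  return toy_dispatch_pending(d);
}
```

With bytes waiting on the pipe, toy_dispatch_pending() returns 0 because nothing has moved them into the queue, while toy_dispatch() reads and dispatches them. The real libwayland functions are more involved (several queues, coordination between threads), so treat this only as a picture of why an event loop built solely on the pending variant would see no events.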

I have detailed previously why a single threaded app cannot block in function fd_callback(). In short, that's because
all read attempts are done immediately after a call to select/poll that guarantees the presence of data to be read.

I therefore more and more believe that your app makes forbidden calls to functions such as Fl::wait(), Fl::check()
or Fl::ready() from a child thread. That's the only explanation I can think of for the blocking you report.

Gonzalo Garramuño

Dec 25, 2022, 3:53:10 PM
to fltkc...@googlegroups.com


On 25/12/22 at 15:52, Manolo wrote:

That's weird.  I am running it on Ubuntu on Wayland (I don't know which compositor that is), and the behavior is always the same.

Gonzalo Garramuño

Dec 25, 2022, 4:40:52 PM
to fltkc...@googlegroups.com


On 25/12/22 at 16:01, Manolo wrote:

> On Saturday, December 24, 2022 at 19:03:15 UTC+1, ggar...@gmail.com wrote:
>> I changed wl_display_dispatch for wl_display_dispatch_pending() and that removed the blocking on that function.  Indeed the playback improved.  However, I still get some blocking somewhere else (still don't know where).
>
> Replacing wl_display_dispatch() by wl_display_dispatch_pending() in a single threaded FLTK app
> produces an app that never opens any window. That is to be expected because in that situation
> the app will never read data arriving on the Wayland file descriptor.

Well, in my application, before the hanging occurs, I noticed that the playback stutters.  I found that this happens in:

Fl_Wayland_Gl_Window_Driver::swap_buffers() when it calls:
wl_display_read_events(fl_wl_display()); // this stops the event loop for some seconds (or less) and then continues.

In that function, I saw an event loop (weird) with a special OpenGL queue, whose purpose I could not understand.  The hang I see in wl_display_dispatch() happens a little later.  I am not sure the two are related, but they might be.  Read below.


> I have detailed previously why a single threaded app cannot block in function fd_callback(). In short, that's because
> all read attempts are done immediately after a call to select/poll that guarantees the presence of data to be read.

I understood it, and I believe you are right.  But I think this is blocking not because of the select/poll but because there's some other issue going on with the wl_* calls.

> I therefore more and more believe that your app makes forbidden calls to functions such as Fl::wait(), Fl::check()
> or Fl::ready() from a child thread. That's the only explanation I can think of for the blocking you report.

I re-checked my code for all Fl::check() calls and none are in a thread or in the event loop. One thing that is different from the demo programs, though, is that I *do* call Fl::check() *before* I enter the Fl::run() loop, to open the window and have my window resize to the video size.

As a test, I commented out the wl_* functions in Fl_Wayland_Gl_Window_Driver::swap_buffers(), leaving just the eglSwapBuffers() call, and left the call to wl_display_dispatch() in fd_callback as you suggested.

My application then played the video correctly and responded to my mouse movements, without any stuttering or blocking, as long as my mouse was in the window of the viewer.  When it left the window, the refresh of my opengl window stopped (but it did not block, as the video playback continued in the background --it just was not shown).

This of course is not a solution, just a test: my application eventually crashed when my mouse left the main window and entered it again, and the demo programs did not respond to mouse dragging.  But to me, it pointed to an issue with the wl_* calls.


Manolo

Dec 26, 2022, 12:46:54 AM
to fltk.coredev
I confirm that under Ubuntu 22.10 the cube doesn't stop moving in my tests.
 

Manolo

Dec 26, 2022, 11:34:58 AM
to fltk.coredev
I attach a small FLTK-wayland program which is my attempt to draw a constantly changing GL3 scene
and to show in parallel time, observed fps, and a responsive FLTK widget, a text editor panel in that case.

In my hands, when the GL scene is asked to change 24 times per second, the observed fps is indeed 24.0.
The GL3 scene is visually constantly changing without stuttering.
The editor panel at right is always responsive.
The window can be made fullscreen and stays at 24 fps.

With a considerably higher rate of GL3 redraw (80 / sec), the observed fps is about 80.0 when the window is as initially created,
but falls to 40-50 fps when the window is made fullscreen, on my virtual Linux machine. This shows the limit of
what GL can handle on this virtual hardware.

ToyGL3player.cxx

Bill Spitzak

Dec 26, 2022, 4:17:57 PM
to fltkc...@googlegroups.com
Nothing should be attempting to read events from Wayland other than Fl::wait(). Anything else that does so will cause the fd to become not-ready unexpectedly.
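
The effect described here can be reproduced in miniature. The following sketch assumes a POSIX pipe in place of the Wayland socket; fd_ready() and steal_pending() are hypothetical helpers, not FLTK or libwayland functions. It shows a descriptor that polls as readable becoming not-ready once another piece of code has read from it.

```cpp
#include <cassert>
#include <poll.h>
#include <unistd.h>

// Zero-timeout poll: would a read() on fd return immediately?
bool fd_ready(int fd) {
  assert(fd >= 0);
  struct pollfd p = { fd, POLLIN, 0 };
  return poll(&p, 1, 0) > 0;
}

// Hypothetical "other reader": consumes whatever is currently available
// on fd, the way a stray event-reading call outside the main loop might.
// Returns the number of bytes it took.
long steal_pending(int fd) {
  char buf[64];
  long total = 0;
  while (fd_ready(fd)) {
    ssize_t n = read(fd, buf, sizeof buf);
    if (n <= 0) break;  // end of file or error
    total += n;
  }
  return total;
}
```

A main loop that saw the descriptor as readable and then issued a blocking read after steal_pending() had run would wait until new data arrived, which resembles the kind of stall discussed in this thread.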



Gonzalo Garramuño

Dec 26, 2022, 5:39:37 PM
to fltkc...@googlegroups.com
Okay.  Thank you for being so thorough.  I tried your program and it did
not refresh properly.

I did a fresh install of Ubuntu and using the Mesa drivers I was able to
see the correct behavior you report.  The ToyGL3player refreshed
properly at 24 FPS even when going fullscreen.  I then tried my player
and it played back without stuttering but with poor performance (which
was expected from the Mesa --or is it nouveau-- drivers).

I installed the Nvidia drivers (525) from the Ubuntu repositories and
went and tried the ToyGL3player and once again it did not refresh
properly except if I was moving the mouse in the window. I tried my
player and it was stuttering once again.

My Nvidia card is an old GeForce GTX 960.

It seems there's an issue with the NVidia drivers and Wayland.

Greg Ercolano

Dec 26, 2022, 8:04:15 PM
to fltkc...@googlegroups.com

On 12/26/22 14:39, Gonzalo Garramuño wrote:

> It seems there's an issue with the NVidia drivers and Wayland.

    I wonder if that's because of this infamous comment:
    https://youtu.be/MShbP3OpASA?t=2927


Manolo

Dec 27, 2022, 9:05:23 AM
to fltk.coredev
On Monday, December 26, 2022 at 23:39:37 UTC+1, ggar...@gmail.com wrote:

> It seems there's an issue with the NVidia drivers and Wayland.

OK. Let's try to make something that would also work with the NVidia driver.

Please, try ToyGL3player with a version of libfltk_gl using the attached modified version of file
src/drivers/Wayland/Fl_Wayland_Gl_Window_Driver.cxx on your system with the NVidia driver.
If it still blocks, don't lose hope and try to make this change in the attached file:
change line #368 from
#if 0 // OTHER_WAY
to
#if 1 // OTHER_WAY
then build libfltk_gl again and relink ToyGL3player.

Please report what happens in these 2 cases with ToyGL3player.

If ToyGL3player works, try also the modified source file with your app.

Fl_Wayland_Gl_Window_Driver.cxx

Gonzalo Garramuño

Dec 27, 2022, 12:45:35 PM
to fltkc...@googlegroups.com

On 27/12/22 11:05, Manolo wrote:
>
> Please report what happens in these 2 cases with ToyGL3player.

The results were rather similar, except for the stuttering.

Without OTHER_WAY, the square would resize when the mouse was outside
the window, but with stuttering (at around 4.7 fps).  Then after a
while, it would stop (block?).  Reentering the window with the mouse
would start the resizing again with stuttering.  Moving the mouse in the
text window would make the square resize continuously (eventually
reaching 24fps if moving fast enough).

With OTHER_WAY, the square would *not* resize when the mouse was outside
the window.  When the mouse was in the window, the square would resize
faster than without OTHER_WAY, but it would still stop when the mouse
stopped moving.  There would be no stuttering in this case.  As in the
other case, moving the mouse in the text window fast enough, would make
the square resize continuously (up to 24 fps).

Gonzalo Garramuño

Dec 27, 2022, 12:58:35 PM
to fltkc...@googlegroups.com

On 27/12/22 11:05, Manolo wrote:
>
> If ToyGL3player works, try also the modified source file with your app.

OK.  I also tried my application with OTHER_WAY on and, as long as my
mouse was in the window, it played back the movie correctly.  As soon as
the mouse left the window, no playback continued.

Manolo

Dec 28, 2022, 2:06:33 AM
to fltk.coredev
On Tuesday, December 27, 2022 at 18:58:35 UTC+1, Gonzalo wrote:

> OK.  I also tried my application with OTHER_WAY on and, as long as my
> mouse was in the window, it played back the movie correctly.  As soon as
> the mouse left the window, no playback continued.

Is it necessary to constantly move the mouse and keep it inside the window for the movie to play?
Or is it enough to put the mouse in the window and let it stay still there?

Gonzalo Garramuño

Dec 28, 2022, 5:28:45 AM
to fltkc...@googlegroups.com
It's enough to put it inside the OpenGL window and leave it there.  Note that if I put the mouse anywhere else in my GUI instead of the GL window (timeline, etc.), the playback also stops.