Possible GSource leak in wxGTK, especially when using a lot of timers (Issue #23364)

34 views
Skip to first unread message

Eevee

unread,
Mar 20, 2023, 4:35:58 PM3/20/23
to wx-...@googlegroups.com, Subscribed

Apologies if this is a little half-baked — C++ isn't my native language, I don't know much about GTK, and the project I'm trying to fix isn't even mine. :) But something funny is going on here and at the very least I could use some insight from someone who knows what they're doing.

The project is SLADE, a Doom editor (specifically issue 1016). A few years ago it started having a problem where, once the map editor had been opened, it would consume more and more CPU and become less and less responsive until it was completely unusable.

Subsequent on-and-off investigation revealed the following:

  • The cause seems to be an accumulation of GSources, to the point that the glib event loop cannot iterate over them fast enough. This ends up visible in perf top as g_source_ref eating most of the CPU time (called via g_main_loop_rung_main_context_iterateg_main_context_checkg_source_iter_next).

  • Armed with a debug copy of glib, I found that the extraneous GSources are all timeouts. SLADE's map editor has a wxTimer that goes off every 15ms to trigger its GL canvas to redraw (by calling Refresh), and sometimes the associated GSource gets incref'd one too many times and sits in the GMainContext, in a dead state, forever.

    The timer is a oneshot that's restarted from its own event handler. I don't know why that's the case, but the obvious next step is:

  • I tried making SLADE's timer continuous, and that partially fixed the problem since it would only ever create one GSource... but then I would still accumulate timeout GSources, this time from GTK itself — they're named [gtk+] gdk_frame_clock_paint_idle and can be seen being set here.

This doesn't make any sense — timeouts are handled entirely internally by glib, and nothing outside has any reason to alter their refcount at all (save for decrementing it once). And yet, they persist.

As far as I can tell, what's actually happening is this:

  1. wxGUIEventLoop::DoRun calls gtk_main, which in turn calls g_main_loop_run.

  2. g_main_loop_run calls g_main_context_iterate, which assembles a list of GSources that are ready to be dispatched. Because the list holds copies of pointers to the GSources, their refcounts are all incremented.

    This list is internal and attached to the GMainContext, so to be safe, the function that actually assembles it (g_main_context_prepare) first clears it out and decrements the refcount of everything in it.

    Notably, there are only three places this list is cleared: when populating the list anew; when dispatching to everything in the list; and when destroying the context.

  3. g_main_loop_run now iterates over that list and dispatches events, via g_main_dispatch.

  4. Sometimes, one of those is an idle event, which calls wxApp::DoIdle.

  5. That calls gtk_events_pending, which calls g_main_context_pending, which calls... g_main_context_iterate. This, again, clears out the list of pending events and repopulates it.

  6. wxApp::DoIdle returns. Now we have a problem. We are back to the dispatch loop from step 3, except that the list has changed out from under us, and we've skipped part of it. The dispatch loop does deref every source it iterates over, but because we've already processed at least one source (the one that triggered the idle event), if more sources became ready in the time between the initial check of sources and the call to gtk_events_pending, we will fail to deref at least one of them. It will then have an incorrectly high refcount and never ever be destroyed.

I'm guessing this is happening in the first place because SLADE calls Refresh from its timer handler, which then makes GTK want to repaint, which adds a new GSource, which sometimes becomes ready immediately, all while we're still in the middle of the glib dispatch loop.

This raises two questions.

  • Whose fault is this? These are pretty low-level details, and I'm not sure whether wxGTK is incorrect for calling g_main_context_pending from within an event handler, or glib is incorrect for not handling this case correctly, or SLADE is incorrect for calling Refresh via a timer (but I couldn't find any documentation suggesting that's a bad idea).

  • How has no one ever noticed this before? It seems like a conspicuous problem arising from an obvious pattern, but it only seems to impact SLADE, and even then, only after the map editor has opened.

Arch Linux x64, GTK 3.24.36, glib 2.74.5 (as well as git main), X11, wxWidgets 3.2.2.


Reply to this email directly, view it on GitHub, or unsubscribe.
You are receiving this because you are subscribed to this thread.Message ID: <wxWidgets/wxWidgets/issues/23364@github.com>

VZ

unread,
Mar 20, 2023, 6:49:34 PM3/20/23
to wx-...@googlegroups.com, Subscribed

Thanks for looking at this!

It does look like a bug in wxGTK, but I won't be able to look at this in the near future because my short-term TODO queue is already overflowing right now (huh, looks like the same issue), so for now all I can suggest is to use a simple workaround and avoid calling Refresh() if the window hasn't been refreshed yet since it was called the last time: e.g. have some repaint_requested flag, set it to true in the timer handler and reset it to false at the end of your wxEVT_PAINT handler. AFAICS this ought to fix the problem without any real bad side effects.

As for actually fixing the problem, it would, of course, be invaluable to have a simple example reproducing it, so if you could please make one, it would be really great. TIA!


Reply to this email directly, view it on GitHub, or unsubscribe.

You are receiving this because you are subscribed to this thread.Message ID: <wxWidgets/wxWidgets/issues/23364/1477045914@github.com>

Eevee

unread,
Mar 20, 2023, 9:49:07 PM3/20/23
to wx-...@googlegroups.com, Subscribed

Further adventures:

from src/app/gtk.cpp:

    bool needMore;
    do {
        ProcessPendingEvents();

        needMore = ProcessIdle();
    } while (needMore && gtk_events_pending() == 0);

So if needMore is false, then gtk_events_pending is never called, and the problem can never happen. That might explain its relative rarity, at least.

The only thing that sets needMore is some window calling wxIdleEvent::RequestMore — and indeed, doing this in my example does replicate the problem. In SLADE's case it seems to be coming from Scintilla, which likes idle events for wrapping lines. (Incidentally I caught a glimpse of it calling WrapLines very frequently and eating more CPU than seemed necessary, even for text that hadn't been changed at all.)

And SLADE uses a wxStyledTextCtrl for... its console. So this was happening every single time something printed to the console, which the map editor does quite frequently. And finally, finally, I have a full explanation.


This little wxPython script should repro. Run it (or port it to C++ first if you must), watch your favorite top variant, and its cpu usage should steadily climb. perf top -t PID should blame g_source_ref.

import time

import wx


class MainFrame(wx.Frame):
    def __init__(self):
        super().__init__(None, title="glib leak test")

        self.timer = wx.Timer(self)
        self.timer.Start(10, True)

        self.Bind(wx.EVT_TIMER, self.ontimer)
        self.Bind(wx.EVT_IDLE, self.onidle)

    def ontimer(self, evt):
        self.timer.Start(10, True)

    def onidle(self, evt):
        time.sleep(0.005)
        evt.RequestMore()


app = wx.App()
frame = MainFrame()
frame.Show()
app.MainLoop()

The sleep is just so the app isn't constantly doing things, or it would eat a full core spinning its wheels. :)

It's difficult to directly demonstrate that leaking sources are the problem, because it's occurring way down in private struct fields, but adding debug code into a local copy confirms it.


Reply to this email directly, view it on GitHub, or unsubscribe.

You are receiving this because you are subscribed to this thread.Message ID: <wxWidgets/wxWidgets/issues/23364/1477171081@github.com>

Eevee

unread,
Mar 20, 2023, 11:44:12 PM3/20/23
to wx-...@googlegroups.com, Subscribed

Also: I'm far from an expert here, but I'm not sure the gtk_events_pending check is even necessary? If anyone wants more idle events, then the block at the end of wxApp::DoIdle adds an idle source to the GLib context. That source should respond right away on the next iteration of the event loop, unless other events have queued in the meantime... which is basically what DoIdle is doing already.

First introduced in 5375a1f in 2002, and the surrounding code has changed a lot since. The block at the end that reinstalls an idle source is certainly much newer.


Reply to this email directly, view it on GitHub, or unsubscribe.

You are receiving this because you are subscribed to this thread.Message ID: <wxWidgets/wxWidgets/issues/23364/1477240112@github.com>

VZ

unread,
Mar 21, 2023, 7:43:45 AM3/21/23
to wx-...@googlegroups.com, Subscribed

Thanks a lot for the test case, this is really helpful!

Unfortunately I think gtk_events_pending() is necessary, without it we wouldn't process any user-generated events as long as the wxEVT_IDLE handler kept requesting more time, which wouldn't be right, so the fix is not as simple as just removing it.


Reply to this email directly, view it on GitHub, or unsubscribe.

You are receiving this because you are subscribed to this thread.Message ID: <wxWidgets/wxWidgets/issues/23364/1477692030@github.com>

paulcor

unread,
Mar 21, 2023, 1:32:35 PM3/21/23
to wx-...@googlegroups.com, Subscribed

Patch against the minimal sample that will reproduce the issue:

diff --git a/samples/minimal/minimal.cpp b/samples/minimal/minimal.cpp
index 501caf9096..0c185da9f3 100644
--- a/samples/minimal/minimal.cpp
+++ b/samples/minimal/minimal.cpp
@@ -65,6 +65,9 @@ public:
     void OnAbout(wxCommandEvent& event);
 
 private:
+    void OnIdle(wxIdleEvent&);
+    void OnTimer(wxTimerEvent&);
+    wxTimer m_timer;
     // any class wishing to process wxWidgets events must use this macro
     wxDECLARE_EVENT_TABLE();
 };
@@ -137,10 +140,25 @@ bool MyApp::OnInit()
 // main frame
 // ----------------------------------------------------------------------------
 
+void MyFrame::OnIdle(wxIdleEvent& event)
+{
+    wxMicroSleep(5000);
+    event.RequestMore();
+}
+
+void MyFrame::OnTimer(wxTimerEvent& event)
+{
+    event.GetTimer().Start(10, true);
+}
+
 // frame constructor
 MyFrame::MyFrame(const wxString& title)
        : wxFrame(NULL, wxID_ANY, title)
+       , m_timer(this)
 {
+    Bind(wxEVT_IDLE, &MyFrame::OnIdle, this);
+    Bind(wxEVT_TIMER, &MyFrame::OnTimer, this);
+    m_timer.Start(10, true);
     // set the frame icon
     SetIcon(wxICON(sample));
 


Reply to this email directly, view it on GitHub, or unsubscribe.

You are receiving this because you are subscribed to this thread.Message ID: <wxWidgets/wxWidgets/issues/23364/1478316928@github.com>

VZ

unread,
Mar 21, 2023, 5:12:02 PM3/21/23
to wx-...@googlegroups.com, Subscribed

Sorry for my misleading comment above, I didn't realize that GTK would still call our existing idle hook, so, apparently, the fix is as simple as that.

Could you please check if the linked PR works correctly for you? TIA!


Reply to this email directly, view it on GitHub, or unsubscribe.

You are receiving this because you are subscribed to this thread.Message ID: <wxWidgets/wxWidgets/issues/23364/1478585798@github.com>

VZ

unread,
Mar 24, 2023, 1:10:55 PM3/24/23
to wx-...@googlegroups.com, Subscribed

Closed #23364 as completed via 2d32b8c.


Reply to this email directly, view it on GitHub, or unsubscribe.

You are receiving this because you are subscribed to this thread.Message ID: <wxWidgets/wxWidgets/issue/23364/issue_event/8841081596@github.com>

Reply all
Reply to author
Forward
0 new messages