Double free of groups

10 views
Skip to first unread message

Rob McDonald

unread,
Mar 14, 2023, 8:29:04 PM3/14/23
to fltk.coredev
a09b0e2357f2  Remove reversal of children in Fl_Group::clear()  (Albrecht Schlosser)

Causes my program to attempt to double-free a widget on shutdown.

My program subclasses Fl_Group.  My specialized version only implements a constructor and a "int handle(int event)" method.  (I do not implement a destructor or clear() method).

The particular group that is crashing (they don't all seem to) includes widgets from Cartesian, a now apparently defunct 2D plotting library for FLTK developed by Roman Kantor.

The first entity that tries to double-free is a Ca_Y_Axis, which comes from Ca_Axis, which is derived from a Fl_Box.

The stack trace looks something like this...
  • 0x103acdf20 Ca_Y_Axis::~Ca_Y_Axis
  • 0x103c8b8d0 Fl_Group::clear
  • 0x103c8ba84 Fl_Group::~Fl_Group
  • 0x10263efe4 Vsp_Group::~Vsp_Group
  • 0x103c8b8d0 Fl_Group::clear
  • 0x103c8bb4c Fl_Group::~Fl_Group
  • 0x103c8b8d0 Fl_Group::clear
  • 0x103c8ba84 Fl_Group::~Fl_Group
  • 0x10263efe4 Vsp_Group::~Vsp_Group  

Is this a bug with the new Fl_Group::clear()?

if not, does anyone have suggestions for fixing it?  Do you expect it to be a problem with my subclass of Fl_Group, or a problem with Cartesian?

thanks for any ideas.

Albrecht Schlosser

unread,
Mar 15, 2023, 10:12:16 AM3/15/23
to fltkc...@googlegroups.com
On 3/15/23 01:29 Rob McDonald wrote:
a09b0e2357f2  Remove reversal of children in Fl_Group::clear()  (Albrecht Schlosser)

Causes my program to attempt to double-free a widget on shutdown.

Sigh. I was afraid that this could happen but did it anyway (to improve speed) in the hope it wouldn't. It's difficult...


My program subclasses Fl_Group.  My specialized version only implements a constructor and a "int handle(int event)" method.  (I do not implement a destructor or clear() method).

Given your stack trace below, which one is your subclass? I assume it is Vsp_Group, is this correct?


The particular group that is crashing (they don't all seem to) includes widgets from Cartesian, a now apparently defunct 2D plotting library for FLTK developed by Roman Kantor.

OK, for my understanding, what do you mean with "now apparently defunct" ? Do you say this because your application crashes (with double free) or for any other reason?


The first entity that tries to double-free is a Ca_Y_Axis, which comes from Ca_Axis, which is derived from a Fl_Box.

The stack trace looks something like this...
  • 0x103acdf20 Ca_Y_Axis::~Ca_Y_Axis
  • 0x103c8b8d0 Fl_Group::clear
  • 0x103c8ba84 Fl_Group::~Fl_Group
  • 0x10263efe4 Vsp_Group::~Vsp_Group
  • 0x103c8b8d0 Fl_Group::clear
  • 0x103c8bb4c Fl_Group::~Fl_Group
  • 0x103c8b8d0 Fl_Group::clear
  • 0x103c8ba84 Fl_Group::~Fl_Group
  • 0x10263efe4 Vsp_Group::~Vsp_Group  

Is this a bug with the new Fl_Group::clear()?

Hard to say, but without seeing your code I suspect that your Vsp_Group *can* cause the issue. As you know, I'm responsible for the change that appears to cause it, and I have some ideas how this can happen. But to explain it I need more details, so ...

Is your code public, can I see it somewhere? Although it's maybe not a good idea to investigate in someone else's full code, I have some ideas what to look for, hence I could maybe try to do it. I would also be interested in testing the code so I could analyze it better. However, there shouldn't be too many dependencies.

That said, I wonder why I see 'Vsp_Group::~Vsp_Group' twice in the stack trace. Just to be sure, do you have nested Vsp_Group's, or is this maybe a recursion? Nested groups would be OK, a recursion probably not.

How do you use the Cartesian widget(s) in your Vsp_Group? Are they somehow embedded as subwidgets (like the scrollbars in Fl_Scroll [2]), or are are they allocated with 'new'? This might make a difference. [1]


if not, does anyone have suggestions for fixing it?

Not yet, please answer my questions above and/or show at least some code or point me to the full code somewhere.


Do you expect it to be a problem with my subclass of Fl_Group, or a problem with Cartesian?

I have an old version of Fl_Cartesian (Fl_Cartesian_0.9.2.tgz) which says in its header and implementation:
==> Fl_Cartesian.H <==
// Cartesian.H,v 1.0
//
// Copyright 2000-2005 by Roman Kantor.

==> Fl_Cartesian.cxx <==
// Cartesian.cpp,v 1.0
//
// Copyright 2000-2005 by Roman Kantor.

Which version are you using, and if it's different (maybe newer), do you have a current URL for downloading it? I didn't find a newer version.

In my version of Fl_Cartesian I didn't find anything that would explain the issue, but OTOH I couldn't study every instruction. I would need something I can test.

thanks for any ideas.

I have some ideas, but again, without seeing more code, this would be just guessing.

However, I'm really interested in this "test case" because I need to understand what's happening, if it is the mentioned commit that changes the behavior. Thanks in advance for any insights (particularly code).

PS:

[1] As you can maybe conclude from my statement above: if you are *embedding* the Cartesian widget in your Vsp_Group, could you test allocating it with operator `new` instead and report if this really makes a difference? This is just a guess...

[2] https://github.com/fltk/fltk/blob/0f41797b7ab0ee010e31659f60cafd38affac8f5/FL/Fl_Scroll.H#L157

Albrecht Schlosser

unread,
Mar 15, 2023, 11:46:15 AM3/15/23
to fltkc...@googlegroups.com
On 3/15/23 15:12 Albrecht Schlosser wrote:
On 3/15/23 01:29 Rob McDonald wrote:
a09b0e2357f2  Remove reversal of children in Fl_Group::clear()  (Albrecht Schlosser)

Causes my program to attempt to double-free a widget on shutdown.
[...]
The particular group that is crashing ... includes widgets from Cartesian, a [...] 2D plotting library for FLTK developed by Roman Kantor.

OK, after a little research I found OpenVSP on GitHub (which appears to be the project you mentioned) and a newer version of the Cartesian library here: https://sourceforge.net/p/rfltk/code/HEAD/tree/cartesian/

Meanwhile I also found out that this library adds more than one FLTK widget to its parent group (or window) - which may cause the double free under certain circumstances - and I could indeed make the provided example program crash with double free if I only delete the main window after `Fl::run()`.

Source code: https://sourceforge.net/p/rfltk/code/HEAD/tree/cartesian/test/example.cpp

Modified like this (see comment):
```
Fl_Double_Window *w = new Fl_Double_Window(580, 380, "Cartesian graphics example");
// ... more code here ...
w->end();
w->show(argc, argv);
Fl::add_timeout(0, next_freq);
Fl::run();
delete w; // ADDED only this statement
return 0;
```

    
This is an example I can work with, and I'll investigate this issue. Please don't expect quick results though, today my time is very much limited. I hope I can find something more tomorrow.


Current status of investigation:

1. the modified Cartesian example crashes with double free when built with FLTK 1.4.0 (git)
2. it does not crash with FLTK 1.3.x (not sure which git commit exactly), i.e. it exits silently

Note that this doesn't mean (yet) that the program doesn't *attempt* a double free in FLTK 1.3. So far it only means that this is *not diagnosed* by my current build of 1.3.x. I'll need to use a memory checker to verify this and I need more time to debug it.

I'll let you know when I have more info.

Rob McDonald

unread,
Mar 15, 2023, 12:27:39 PM3/15/23
to fltk.coredev
On Wednesday, March 15, 2023 at 8:46:15 AM UTC-7 Albrecht Schlosser wrote:
OK, after a little research I found OpenVSP on GitHub (which appears to be the project you mentioned) and a newer version of the Cartesian library here: https://sourceforge.net/p/rfltk/code/HEAD/tree/cartesian/

Great.  You are correct -- that is my program.

I saw your first email.  I figured I would respond to the second (this) one first and then go back and try to answer your questions in the first message too.

OpenVSP is not exactly a MWE of any particular bug.  I'll try to point you at the right spots. 

 
Meanwhile I also found out that this library adds more than one FLTK widget to its parent group (or window) - which may cause the double free under certain circumstances - and I could indeed make the provided example program crash with double free if I only delete the main window after `Fl::run()`.

Source code: https://sourceforge.net/p/rfltk/code/HEAD/tree/cartesian/test/example.cpp

Modified like this (see comment):
```
Fl_Double_Window *w = new Fl_Double_Window(580, 380, "Cartesian graphics example");
// ... more code here ...
w->end();
w->show(argc, argv);
Fl::add_timeout(0, next_freq);
Fl::run();
delete w; // ADDED only this statement
return 0;
```

This is an example I can work with, and I'll investigate this issue. Please don't expect quick results though, today my time is very much limited. I hope I can find something more tomorrow.


Great -- I agree that a Cartesian example will be a much better MWE.

No hurry, my 'release' versions will still ship with an older version of FLTK.  I'm trying to occasionally keep up with the tip of master so I'll be ready for 1.4 -- when I find something awry, I try to bring it up so it can get fixed.

I.e. this is not urgent to me, but hopefully it will get fixed on the 1.4.0 timescale. 

 
Current status of investigation:

1. the modified Cartesian example crashes with double free when built with FLTK 1.4.0 (git)
2. it does not crash with FLTK 1.3.x (not sure which git commit exactly), i.e. it exits silently

Note that this doesn't mean (yet) that the program doesn't *attempt* a double free in FLTK 1.3. So far it only means that this is *not diagnosed* by my current build of 1.3.x. I'll need to use a memory checker to verify this and I need more time to debug it.

I was running with the address sanitizer and I bisected the git commits down before I posted the message blaming one specific commit.  In my testing, the immediately previous commit exited cleanly from an Address Sanitizer point of view.
 
I'll let you know when I have more info.

Thanks much,

Rob

 

Rob McDonald

unread,
Mar 15, 2023, 12:34:40 PM3/15/23
to fltk.coredev
On Wednesday, March 15, 2023 at 7:12:16 AM UTC-7 Albrecht Schlosser wrote:
Sigh. I was afraid that this could happen but did it anyway (to improve speed) in the hope it wouldn't. It's difficult...

Understood.  Sometimes we get lucky, sometimes we don't. 

Given your stack trace below, which one is your subclass? I assume it is Vsp_Group, is this correct?

Correct.

The definition of VspGroup is here.  The implementation is here

The particular group that is crashing (they don't all seem to) includes widgets from Cartesian, a now apparently defunct 2D plotting library for FLTK developed by Roman Kantor.

OK, for my understanding, what do you mean with "now apparently defunct" ? Do you say this because your application crashes (with double free) or for any other reason?

By 'now apparently defunct', I just meant that Cartesian is no longer a maintained / supported project.  Its website is down and I believe the only references to it on the internet are some abandoned SourceForge pages.  We have occasionally had to make small changes to Cartesian to keep it going.

 
Hard to say, but without seeing your code I suspect that your Vsp_Group *can* cause the issue. As you know, I'm responsible for the change that appears to cause it, and I have some ideas how this can happen. But to explain it I need more details, so ...

Is your code public, can I see it somewhere? Although it's maybe not a good idea to investigate in someone else's full code, I have some ideas what to look for, hence I could maybe try to do it. I would also be interested in testing the code so I could analyze it better. However, there shouldn't be too many dependencies.

I can certainly help you look through OpenVSP, but you probably don't want to compile it and use it as a debugging platform.  You've found the Cartesian project and can replicate the issue with its test case -- that should be a better way to go.
 
 
That said, I wonder why I see 'Vsp_Group::~Vsp_Group' twice in the stack trace. Just to be sure, do you have nested Vsp_Group's, or is this maybe a recursion? Nested groups would be OK, a recursion probably not.

I believe it is recursing. 

I think the rest of the questions are answered / superseded by your followup.

Thanks a ton for your attention to this and for all you do with FLTK.

Rob



Albrecht Schlosser

unread,
Mar 16, 2023, 1:53:01 PM3/16/23
to fltkc...@googlegroups.com
On 3/15/23 16:46 Albrecht Schlosser wrote:

... I found ... a newer version of the Cartesian library here: https://sourceforge.net/p/rfltk/code/HEAD/tree/cartesian/


Meanwhile I also found out that this library adds more than one FLTK widget to its parent group (or window) - which may cause the double free under certain circumstances - and I could indeed make the provided example program crash with double free if I only delete the main window after `Fl::run()`.

See below, this was not the case, i.e. it was not FLTK's fault.

Source code: https://sourceforge.net/p/rfltk/code/HEAD/tree/cartesian/test/example.cpp

Modified like this (see comment):
```
Fl_Double_Window *w = new Fl_Double_Window(580, 380, "Cartesian graphics example");
// ... more code here ...
w->end();
w->show(argc, argv);
Fl::add_timeout(0, next_freq);
Fl::run();
delete w; // ADDED only this statement
return 0;
```

This is an example I can work with, and I'll investigate this issue. ...


I'll let you know when I have more info.

I have good news: I found the culprit in the Cartesian software and it is easy to fix. Fortunately it has nothing to do with the changes in FLTK - at least as far as I could find out. I concentrated on finding the bug and didn't investigate why the older FLTK version did not exhibit the bug whereas the newer version does because this is moot. A bug is a bug. ;-)


The double free bug is (potentially) caused by two identical statements in `Ca_X_Axis::~Ca_X_Axis()` and `Ca_Y_Axis::~Ca_Y_Axis()`. In my tests it was only triggered by `~Ca_Y_Axis()` but since both d'tors are structured identically I fixed both of them in my version.

If you comment out the two statements as shown below in context (marked with '<<<<<<<') the bug should be gone. Hopefully.

Note that this doesn't fix all memory leaks in the demo program, but neither valgrind nor ASAN report any errors after this fix.

Please let me know if this fixes the issue for you.


That said, I changed a lot more code in my version, fixed compiler warnings, removed unused variables, and fixed usage of uninitialized variables (!). I'm thinking about publishing my fixes somewhere, maybe on GitHub, but this is not yet possible - it would need some more work to clean up everything.

Rob, you wrote before that you also had to fix some issues. Can you tell me what you fixed, and/or post a link to your fixed version of the Cartesian library? Is it maybe included in your OpenVSP project? If yes, I don't remember to have seen it (only a link to the sourceforge repo, IIRC). I'd appreciate if we could synchronize our efforts, but this is now probably OT here and we can check this off-list if you agree. Let me know...


Back on topic:

My proposed changes: since I'm not sure that we are using the same version of the Cartesian project, following is the full code of the two mentioned d'tors with fixes.

```
Cartesian.cpp-728-Ca_X_Axis::~Ca_X_Axis(){
Cartesian.cpp-729-  if(canvas_){
Cartesian.cpp-730-    Ca_ObjectChain *ochain=canvas_->first_object_;
Cartesian.cpp-731-    Ca_ObjectChain *next;
Cartesian.cpp-732-    Ca_ObjectChain *previous=0;
Cartesian.cpp-733-    while (ochain){
Cartesian.cpp-734-      next=ochain->next;
Cartesian.cpp-735-      if(ochain->object->x_axis_==this){
Cartesian.cpp-736-        delete ochain->object;
Cartesian.cpp-737-        if(previous)
Cartesian.cpp-738-          previous->next=next;
Cartesian.cpp-739-        else
Cartesian.cpp-740-          canvas_->first_object_=next;
Cartesian.cpp:741:        // delete ochain; // would cause double free        <<<<<<<
Cartesian.cpp-742-      }
Cartesian.cpp-743-      ochain=next;
Cartesian.cpp-744-    }
Cartesian.cpp-745-  }
Cartesian.cpp-746-}
--
Cartesian.cpp-1051-Ca_Y_Axis::~Ca_Y_Axis(){
Cartesian.cpp-1052-  if(canvas_){
Cartesian.cpp-1053-    Ca_ObjectChain *ochain=canvas_->first_object_;
Cartesian.cpp-1054-    Ca_ObjectChain *next;
Cartesian.cpp-1055-    Ca_ObjectChain *previous=0;
Cartesian.cpp-1056-    while (ochain){
Cartesian.cpp-1057-      next=ochain->next;
Cartesian.cpp-1058-      if(ochain->object->y_axis_==this){
Cartesian.cpp-1059-        delete ochain->object;
Cartesian.cpp-1060-        if(previous)
Cartesian.cpp-1061-          previous->next=next;
Cartesian.cpp-1062-        else
Cartesian.cpp-1063-          canvas_->first_object_=next;
Cartesian.cpp:1064:        // delete ochain; // would cause double free        <<<<<<<
Cartesian.cpp-1065-      }
Cartesian.cpp-1066-      ochain=next;
Cartesian.cpp-1067-    }
Cartesian.cpp-1068-  }
Cartesian.cpp-1069-}
```

Albrecht Schlosser

unread,
Mar 16, 2023, 2:16:40 PM3/16/23
to fltkc...@googlegroups.com
On 3/16/23 18:52 Albrecht Schlosser wrote:
> Rob, you wrote before that you also had to fix some issues. Can you
> tell me what you fixed, and/or post a link to your fixed version of
> the Cartesian library? Is it maybe included in your OpenVSP project?


Never mind, I found it in your OpenVSP project. Should have double
checked before posting this.

Albrecht Schlosser

unread,
Mar 16, 2023, 2:28:23 PM3/16/23
to fltkc...@googlegroups.com
On 3/15/23 17:34 Rob McDonald wrote:
On Wednesday, March 15, 2023 at 7:12:16 AM UTC-7 Albrecht Schlosser wrote:
Sigh. I was afraid that this could happen but did it anyway (to improve speed) in the hope it wouldn't. It's difficult...

Understood.  Sometimes we get lucky, sometimes we don't.

Yep, I was and am still convinced that my change was good and correct, but you never know how users use the library. I remember that I once had an issue with the order of deletion of child widgets in an old program - which I'm no longer maintaining.



OK, for my understanding, what do you mean with "now apparently defunct" ? Do you say this because your application crashes (with double free) or for any other reason?

By 'now apparently defunct', I just meant that Cartesian is no longer a maintained / supported project.  Its website is down and I believe the only references to it on the internet are some abandoned SourceForge pages.

OK, I understand. I obviously had a translation problem. ;-)


We have occasionally had to make small changes to Cartesian to keep it going.

Yes, I also found some compiler warnings, uninitialized variables, and more. And I hope I fixed the main bug as written before in this thread.


I think the rest of the questions are answered / superseded by your followup.

Yes, indeed.


Thanks a ton for your attention to this and for all you do with FLTK.

You're welcome.

Thank you for testing FLTK 1.4 with your real world program and for reporting bugs and other issues. This is very much appreciated because real programs may show issues we'd never find otherwise. Even if this particular issue was fortunately not a FLTK bug. AFAICT.

Rob McDonald

unread,
Mar 16, 2023, 3:19:08 PM3/16/23
to fltk.coredev
Thanks a ton for running this to ground.  Much appreciated.

Rob
 

Rob McDonald

unread,
Mar 16, 2023, 3:23:04 PM3/16/23
to fltk.coredev
On Thursday, March 16, 2023 at 10:53:01 AM UTC-7 Albrecht Schlosser wrote:
Back on topic:

My proposed changes: since I'm not sure that we are using the same version of the Cartesian project, following is the full code of the two mentioned d'tors with fixes.

```
Cartesian.cpp-728-Ca_X_Axis::~Ca_X_Axis(){ Cartesian.cpp-729- if(canvas_){ Cartesian.cpp-730- Ca_ObjectChain *ochain=canvas_->first_object_; Cartesian.cpp-731- Ca_ObjectChain *next; Cartesian.cpp-732- Ca_ObjectChain *previous=0; Cartesian.cpp-733- while (ochain){ Cartesian.cpp-734- next=ochain->next; Cartesian.cpp-735- if(ochain->object->x_axis_==this){ Cartesian.cpp-736- delete ochain->object; Cartesian.cpp-737- if(previous) Cartesian.cpp-738- previous->next=next; Cartesian.cpp-739- else Cartesian.cpp-740- canvas_->first_object_=next; Cartesian.cpp:741: // delete ochain; // would cause double free <<<<<<< Cartesian.cpp-742- } Cartesian.cpp-743- ochain=next; Cartesian.cpp-744- } Cartesian.cpp-745- } Cartesian.cpp-746-} -- Cartesian.cpp-1051-Ca_Y_Axis::~Ca_Y_Axis(){ Cartesian.cpp-1052- if(canvas_){ Cartesian.cpp-1053- Ca_ObjectChain *ochain=canvas_->first_object_; Cartesian.cpp-1054- Ca_ObjectChain *next; Cartesian.cpp-1055- Ca_ObjectChain *previous=0; Cartesian.cpp-1056- while (ochain){ Cartesian.cpp-1057- next=ochain->next; Cartesian.cpp-1058- if(ochain->object->y_axis_==this){ Cartesian.cpp-1059- delete ochain->object; Cartesian.cpp-1060- if(previous) Cartesian.cpp-1061- previous->next=next; Cartesian.cpp-1062- else Cartesian.cpp-1063- canvas_->first_object_=next; Cartesian.cpp:1064: // delete ochain; // would cause double free <<<<<<< Cartesian.cpp-1065- } Cartesian.cpp-1066- ochain=next; Cartesian.cpp-1067- } Cartesian.cpp-1068- } Cartesian.cpp-1069-} ```


This fix works for me.

It took a while for me to figure out what he was doing -- it feels much more complex than a singly linked list should be.  It took even longer to convince myself that the fix is right.

As you say -- it is hard to understand why the FLTK change would trigger this.  

I also removed everything to do with previous -- it looks like a remnant of a doubly linked list, but it isn't used now.

Thanks again,

Rob


 

Albrecht Schlosser

unread,
Mar 16, 2023, 3:51:51 PM3/16/23
to fltkc...@googlegroups.com
On 3/16/23 20:23 Rob McDonald wrote:
On Thursday, March 16, 2023 at 10:53:01 AM UTC-7 Albrecht Schlosser wrote:
Back on topic:

My proposed changes: since I'm not sure that we are using the same version of the Cartesian project, following is the full code of the two mentioned d'tors with fixes.


This fix works for me.

It took a while for me to figure out what he was doing -- it feels much more complex than a singly linked list should be.  It took even longer to convince myself that the fix is right.

Yes, I also had a hard time to find this out. I added a lot of debug statements in destructors to see when the first occurrence of the double free happened exactly. ASAN was a big help, but finally you need to understand what he was doing. I was surprised that he didn't include pointers in the objects but used the linked list

As you say -- it is hard to understand why the FLTK change would trigger this.

Yes, I have no idea, but I won't bother finding it out.



I also removed everything to do with previous -- it looks like a remnant of a doubly linked list, but it isn't used now.

Hmm, I'm not sure. In my version class 'Ca_ObjectChain' has only a 'next' and an 'object' pointer. The (local) variable 'previous' is AFAICT needed to close the linked list whenever you remove an entry from the middle of the list.

But maybe you meant something else...

Albrecht Schlosser

unread,
Mar 16, 2023, 3:56:24 PM3/16/23
to fltkc...@googlegroups.com
On 3/16/23 20:51 Albrecht Schlosser wrote:
> On 3/16/23 20:23 Rob McDonald wrote:
>> It took a while for me to figure out what he was doing -- it feels
>> much more complex than a singly linked list should be.  It took even
>> longer to convince myself that the fix is right.
>
> Yes, I also had a hard time to find this out. I added a lot of debug
> statements in destructors to see when the first occurrence of the
> double free happened exactly. ASAN was a big help, but finally you
> need to understand what he was doing. I was surprised that he didn't
> include pointers in the objects but used the linked list

[hit send too early, should continue with]


... as separate objects. But I didn't pursue this further, I just wanted
to fix the bug, be it in Cartesian or FLTK.

I'm glad that this fixes the issue for you, thanks for confirmation.
Reply all
Reply to author
Forward
0 new messages