FLTK logo

STR #1894

FLTK matrix user chat room
(using Element browser app)   FLTK gitter user chat room   GitHub FLTK Project   FLTK News RSS Feed  
  FLTK Apps      FLTK Library      Forums      Links     Login 
 Home  |  Articles & FAQs  |  Bugs & Features  |  Documentation  |  Download  |  Screenshots  ]
 

Return to Bugs & Features | Roadmap 1.3 | SVN ⇄ GIT ]

STR #1894

Application:FLTK Library
Status:1 - Closed w/Resolution
Priority:4 - High, e.g. key functionality not working
Scope:3 - Applies to all machines and operating systems
Subsystem:Core Library
Summary:Fl_Group::clear() accesses potentially deleted widget
Version:1.3-current
Created By:AlbrechtS
Assigned To:AlbrechtS
Fix Version:1.3.0 (SVN: v6623)
Fix Commit:eec7f80e949dd78874741a6c17fe66ec74cec3a5
Update Notification:

Receive EMails Don't Receive EMails

Trouble Report Files:


Name/Time/Date Filename/Size  
 
#1 AlbrechtS
09:24 Mar 12, 2008
crash.cxx
4k
 
 
#2 AlbrechtS
09:25 Mar 12, 2008
crash_1.cxx
8k
 
 
#3 AlbrechtS
08:38 Dec 29, 2008
del_widget-r_6608.patch
4k
 
     

Trouble Report Comments:


Name/Time/Date Text  
 
#1 AlbrechtS
09:24 Mar 12, 2008
This is a long standing bug. Maybe since version 1.0.

The docs of Fl_Group state: "The destructor also deletes all the children. This allows a whole tree to be deleted at once, without having to keep a pointer to all the children in the user code. A kludge has been done so the Fl_Group and all of it's children can be automatic (local) variables, but you must declare the Fl_Group first, so that it is destroyed last."

Obviously, the kludge is this statement in Fl_Group::clear() at line #357:

  if (o->parent() == this) delete o;

together with the Fl_Widget destructor setting its parent_ to 0 (without removing itself from the parent group).

Most of the time, this works...

But sometimes, it can happen, that the memory of a deleted child widget has already been "returned to the system" and thus isn't accessible any more. Then, the program segfaults in o->parent().

I will attach a test program (crash.cxx) that demonstrates this behavior, together with comments, how to make it "work" or "crash". The values given in the head of "crash.cxx" are from my test systems: Windows XP (SP2) and Linux (debian/Knoppix based, kernel 2.6.17). They may vary on different systems, or the program may not crash at all, dependent on the memory allocation implementation.

The test program adds two widgets (My_Input) to a group, and each My_Input adds two other box widgets (green and red) to the same group.

There are two buttons to remove (1) My_Input #1 from the group and (2) the complete group from the window.

IMHO, this is all perfectly correct code WRT FLTK docs and implementation details, but if I'm wrong, please correct me.

I'll attach another, slightly modified source file with a documented debugging session in the next post.

Albrecht
 
 
#2 AlbrechtS
09:29 Mar 12, 2008
crash_1.cxx is a modified version of crash.cxx with more printf statements and with an attached debugging session output. Please note, that the embedded array in My_Box is declared as int instead of char, as it is in crash.cxx. Therefore, you may need to modify KBS accordingly.  
 
#3 AlbrechtS
09:49 Mar 12, 2008
IMHO, the code of Fl_Group::clear(), which is called from Fl_Group's destructor, is too much "optimized", because it sets the children_ member variable to 0 before deleting all the children. Then, it deletes all children that _appear_ to be members of the group.

This effectively prevents child widgets from remove()ing themselves or any related widgets from the group, when they are destroyed from Fl_Group::clear(), because Fl_Group::remove() wouldn't find() the widget in its list, because there are 0 children().

In svn -r 1878, it has already been tried to remove a widget from its parent group and to remove the check in Fl_Group::clear():

-    // test the parent to see if child already destructed:
-    if (o->parent() == this) delete o;
+    delete o;

and in Fl_Widget.cxx:

-  parent_ = 0; // kludge to prevent ~Fl_Group from destroying again
+  if (parent_) parent_->remove(this);

(please note the comments that have been deleted),

but it has been reversed in svn -r 1883. The log says:

Revert Fl_Group/Fl_Widget destructor change - it doesn't work for
statically initialized widgets (like the widgets in a color chooser...)

I tried an improved solution that doesn't set children_ to 0, and uses a while loop, something like:

  while (children_) {
    int old_children = children();
    Fl_Widget* o = child(0);  // or (children_-1);
    if (o->parent() == this) {
      remove(o); // remove widget from group
      delete o; // ... and delete it
    }
    else {
      remove(o); // remove only
    }
  }

to allow deleted widgets to delete and remove their dependent widgets in their destructor, but this still has problems with statically initialized widgets that are added to a group. I could see bad effects with fluid and with menus.

I'm still testing, but what I found so far, seems to mean that correcting this bug might have consequences for other (maybe user) code.

Albrecht
 
 
#4 matt
12:14 Mar 12, 2008
I do agree with all your findings and I am very thankful that you tested them out so thoroughly. However, as you say yourself, this issue has been around since FLTK1.0 and seems to be more a conceptual problem than a killer bug.

I suggest that we attack this issue when we have the ability to change binary compatibility and besic concepts of FLTK. Upgrading to 1.2 (which currently stands for "whatever comes after 1.1").
 
 
#5 AlbrechtS
12:58 Mar 12, 2008
Okay, I agree to moving this to 1.2, but this problem really hit me in real life code (it's not an academic one). Maybe others have unexplained and not reproducible crashes, too.

I'll keep looking for a solution. Maybe there should be a note in the README file or somewhere else?

Albrecht
 
 
#6 matt
13:11 Mar 12, 2008
Is there a specific sentence I could add to the Fl_Group documenttion? Along the lines of "Don't mix locally allocated widgest with ... or Fl_Group may get confused and crash when freeing your widget." ?  
 
#7 AlbrechtS
14:07 Mar 12, 2008
"Don't mix ..." is a good recommendation, but I need some time to think about it and do more tests, and it's already late here ...

As it seems now, it should be taken care that all static and/or automatic objects (this does also include embedded widgets in derived groups like the scrollbars in Fl_Scroll) _must_ be removed from their parent groups before the parent group will be deleted. Otherwise it could happen that Fl_Scroll might access deleted widgets, or - even worse - widgets could be deleted twice.

I didn't look into the code of menus yet, but from what I saw with some printf's in Fl_Group::clear(), I'd assume that static Fl_Menu_Items would be added to a menu window to display the menu hierarchie. They should be removed before deleting the menu window, too.

I'll keep you updated, if I find more.
 
 
#8 rgarcia
04:47 May 13, 2008
You can use valgrind to check whether it is using memory after free() independently of the OS reclaiming the memory.

If this is the case, fltk's behaviour is undefined (as per the C++ standard), and should be fixed.
 
 
#9 fabien
02:23 Aug 26, 2008
Wouldn't it be better to solve such problems to add in the the Widget destructor a parent remove() call so that the widget smoothly detach from its parent before it destroys ?
This implies that children should always be destroyed before their parents, and should be documented but this make sense to me.

A radical fix would be the use of a new is_valid(Fl_Widget*) static method that would associate a couple(unique_id,pointer) dictionary to any created widget during its life cycle.
This has been discussed before, and I demonstrated the feasability of such implementation by efficient use of hash tables for the dictionary impl.
For it to work, any Fl_Widget constructor would 'register' in the dictionary, and its (virtual) destructor would 'unregister' himself from the dict.
Then the destructor should be protected and the unique way to delete a Widget would be to call a static 'destroy' method.
Before removing a parent link, the child would also validate the parent reference before calling its parent remove method.
 
 
#10 AlbrechtS
03:56 Aug 26, 2008
Fabien, thanks for your comments. IMHO, your suggestion to remove a widget from its parent group in its destructor is the proper way to go. Meanwhile I have a working patch for our local FLTK version, that is in production code for some time now. However, I didn't find the time to propose an official patch - and I think that there is still at least one problem that needs to be solved for an official patch. See below.

However, I don't like the idea of using a dictionary of valid widgets, as you propose. This seems to be contrary to the "Fast Light" principle of FLTK (think of using hundreds or thousands of widgets in an application). And I think that it wouldn't solve the problem shown below.

Matt proposed "smart pointers" for some critical code paths, where widgets might be deleted during callbacks. IMHO, this should be used in FLTK's do_callback() logic, but that's another STR.

The problem with fluid, that I mentioned at Mar 12, 2008, turned out to be another "kludge" in Fl_Value_Input. I could fix that, too. It's still a kludge, but maybe someone can find a better solution.

The *unsolved* problem is:

Because Fl_Group's destructor deletes all its children (recursively), there *_must_not_be_* static widgets as children in an Fl_Group or any of its children, when an Fl_Group is deleted, because these static widgets would be deleted, too. But then, at program exit, the static widgets' destructors would be called, and these static widgets would be deleted a second time. Hence, there would be a high probability of a crash at program exit.

Thus, the "don't mix" rule (cf. Mar 12, 2008) would apply here. This would be something like: "The programmer is responsible for removing all static widgets that are children of an Fl_Group (or any of its children - that's recursive), before deleting an Fl_Group (or any Fl_Group derived widget)".

This would be a new constraint, that is not documented in FLTK 1.1: If you add static widgets to a group, then remove them before deleting the group or any parent group. It's simple, but it has some potential to break existing code. OTOH, this would be good proramming practice, anyway.

This rule would *not* apply to *automatic* (widget) variables that are destructed before their parent group, as documented already for FLTK 1.x. It works with composed widgets, too, because the child widgets would usually be deleted first.

I'll try to create a patch for FLTK 1.3 soon. Or maybe for FLTK 1.1.9, if someone wants to test this with FLTK 1.1 code (because this is stable). The patch could be ported to FLTK 1.3 later, if confirmed (I assume that we wouldn't want to apply it to FLTK 1.1 ?).
 
 
#11 fabien
04:20 Aug 26, 2008
I also think less (modifications) is better, especially in the fltk1.x branch.
This is why I suggested first this idea of unregistering the parent before deletion first.
As far as I recall, the auto pointers can solve invalid deletion problems at the cost of adding more ref. counting hassle, to handle them and still may not prevent the need of taking care of some parent/children side effects manually.
IHMO, it could indeed be sufficient to document the side effects occuring when static widgets are used in groups, and it makes sense to give the developer the responsibility of removing them before the group does it as there is no way to detect/distinguish them from dynamically created widgets.
 
 
#12 StanS
08:04 Aug 26, 2008
If modifications to fltk's memory management are being contemplated,
I'd suggest that it would also be nice if this program didn't
crash (#include's omitted for brevity):

int main()
{
    Fl_Double_Window* win(300, 300);
    Fl_Box bx(50, 50, 30, 30);
    win->end();
    win->show();
    int r = Fl::run();
    delete win;
    return r;
}
 
 
#13 fabien
08:38 Aug 26, 2008
Please don't spam this specific STR with such considerations.
Instead, discuss what you may not understand about fltk1.x memory management policies in the ftlk.general discussion forum first.
Thanks.
 
 
#14 fabien
00:32 Dec 04, 2008
We should apply what albrecht did in his branch, or at least ; the part consisting of giving the responsibility to the widget to remove itself before destruction which was agreed and is IMHO a much more reliable approach.  
 
#15 AlbrechtS
00:46 Dec 04, 2008
Agreed. I intended to start over with this part (removing widgets from their parent group when deleted) as a separate modification, anyway.

I can extract this part from my branch or from my (long) working private FLTK 1.1 extensions, and add it to svn, maybe at the coming weekend.
 
 
#16 AlbrechtS
08:38 Dec 29, 2008
Attached is my proposed patch (del_widget-r_6608.patch).

The changes are:

 - Fl_Widget's destructor removes the widget from its parent group

 - Fl_Group::clear() removes widgets one by one [1] from the group and deletes them

 - Fl_Value_Input.H: added destructor

 - Fl_Value_Input.cxx: destructor "repairs" the kludge done in the constructor [2].

A similar patch works in my private copy in production code for a long time now (several months). The earlier mentioned problem(s) with static widgets turned out to be the Fl_Value_Input kludge which can't easily be removed without bigger code changes.

However, it is still useful to remove _static_ widgets from groups _before_ deleting the group itself, because otherwise the static widget's destructor would be called twice (once when deleting the group and once at program exit, when the system calls destructors of static widgets).

Please test this patch with your application code, I'm fairly sure that there are no problems with FLTK itself and with my own code, although my code heavily creates and deletes widgets.

I suggest to commit this to svn for FLTK 1.3 ASAP, hopefully after some positive feedback :-) , so that we can have more tests when it is in svn.

------
Notes:

[1] This enables a widget to deletes other widgets in its destructor and thus also to remove them from their parent groups, even if this would be the same group (this failed before and was one reason for multiple destructor calls).

[2] This kludge is currently necessary for correct event handling (and has always been there), but make the destructor fail, if it tries to remove a widget from its parent, because this parent widget is _not_ a Fl_Group.
 
 
#17 fabien
03:23 Jan 03, 2009
Tested that one on my mac 10.5 box, looks fine to me (as it did in your branch last time I also tested).
Please go ahead and apply this deletion and kludge cast patch ASAP :-)
 
 
#18 AlbrechtS
09:14 Jan 08, 2009
Fixed in Subversion repository.  
 
#19 fabien
06:13 Jan 12, 2009
works fine thanks, please don't forget to close this STR asap.  
 
#20 AlbrechtS
09:26 Feb 08, 2009
Updated comments (SVN: r6652).

Closing this STR now.
 
 
#21 AlbrechtS
17:12 Jan 14, 2023
For the record: fix version =
svn r6623 = git commit eec7f80e949dd78874741a6c17fe66ec74cec3a5

Documentation updates =
svn r6652 = git commit 3ebc315ad21eb626b3e2d7eccd46c0cb76031246
 
     

Return to Bugs & Features ]

 
 

Comments are owned by the poster. All other content is copyright 1998-2024 by Bill Spitzak and others. This project is hosted by The FLTK Team. Please report site problems to 'erco@seriss.com'.