In any case, I do have the feeling that the reason for the changes done
in my PR (and the changes itself) are not fully understood. This is why
I want to encourage question to the changes itself as I think those were
not (only) meant to fix an issue in some circumstances but actually
correct the code with test cases to prove this where possible (e.g. the
is_destroyed flag). Other testcases just try to run into the problem I
described.
Best, Alex
--
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Alexander Grund
Interdisziplinäre Anwendungsunterstützung und Koordination (IAK)
Technische Universität Dresden
Zentrum für Informationsdienste und Hochleistungsrechnen (ZIH)
Chemnitzer Str. 46b, Raum 010 01062 Dresden
Tel.: +49 (351) 463-35982
E-Mail: alexand...@tu-dresden.de
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
I confirm that the 1.66 version never ever invokes the singleton
destructor declared and defined in boost/serialization/singleton.hpp,
approximately line 175. It does not matter whether it is a debug or
release build, 32-bit, or 64-bit Microsoft or Intel C++ on Windows. A
DLL wrapper around Boost.Serialization is used at run time. The actual
Boost.Serialization library is statically linked to the DLL. The
roughly 20 line main function, test executable then uses the DLL to
invoke the serialization.
I verify the behavior using an old school "printf" approach, with
std::cout messages and static ptrdiff_t counting variables inside the
get_instance() and ~singleton() methods. I observe eleven independent
instances of the singleton class, with ZERO calls to the singleton
destructor. The eleven singletons are created with ONE instance of the
serialization object, regardless of invoking the serialize method.
The main function:
int main()
{
{
HINSTANCE hGetProcIDDLL = LoadLibrary(L"BoostSerializeDLL.dll");
}
/**/
std::cout << "Beginning unit test...";
std::shared_ptr<thislibrary::myclass> myc = thislibrary::create_myclass();
{
std::shared_ptr<std::ofstream> ostream =
std::make_shared<std::ofstream>();
ostream->open("C:\\temp\\leak_test.xml");
{
boost::archive::xml_oarchive xml(*ostream);
myc->serialize(xml, 1); // comment this line out and the leak remains.
}
ostream->close();
}
/* */
return 0;
}
Fundamentally, there is something between 300 to 400 bytes memory leak.
Yes, it is not much with today's world of GB RAM. However, when you
have production code that leaks, then ANY memory leak is unacceptable.
Essentially, the leak is due to the lack of the eleven singleton
instances NOT invoking the respective destructor. Plus, the group I am
part of spends tens of hours tracking down why the production product
has memory leaks. The final convincing point that must be clearly
understood is this: due to the leak, Boost.Serialization has been
replaced. If that does not convince a reader that there is a real
problem, then nothing will.
Sincerely,
Robert
>
> Best, Alex
To add on this: It is not only a memory leak. The ctor is not called. If
one relies on side effects by that, it can cause all sorts of havoc.
Thanks, Alex
I believe this has been fixed for 1.68 which can be found in the current
develop and master branch. What are the results of your test when
applied to the latest develop or release version?
By general definition, if you have two shared libraries that internally
statically link the "same" library, they must *completely hide* that
internal usage. Exactly zero of the internal library's symbols are
permitted to exist in the shared library's export list, and absolutely
never can any of its types appear in the shared library's own API.
This includes usage by inheritance or by convention (such as being
streamable to a serialization archive via templated operators).
Violation of that rule occurs frequently (because it's hard to enforce
and most of the time goes unnoticed) but it is still an ironclad rule if
you don't want weird and undefined behaviour.
Templates are a frequent cause of issues since they're very leaky by
nature, and highly susceptible to ODR issues.
In the context of Boost.Serialization, this means that you can pass the
serialized representations across shared library module boundaries but
you cannot pass archives or the serializable objects themselves across
those boundaries, unless both shared libraries also use
Boost.Serialization itself as a shared library.
In general the simplest thing to do is to either keep everything static
or everything shared; mixing leads to headaches unless a great deal of
care is taken.
Possible use case: You use Boost.Serialization in your library and use
another shared library which uses Boost.Serialization itself, but
completely independently. Nothing about it in the interface, but it
would still crash.
>>This is due to the bug I described and fixed in
https://github.com/boostorg/serialization/pull/105.
>>An extract from there only showing the bug with `is_destroyed` is
https://github.com/boostorg/serialization/pull/110
>>
>>In the current state I'm strongly against including the latest master
from Boost.Serialization. It would be better, to include the Boost 1.67
version of it which "only" contains a memory leak.
>
>We differ on the cause, the fix and the work around. Objection noted.
Ok let me put this differently and lets see how far we can agree:
1. Destruction order may or may not be as intended (For single binaries
destruction order is inverse of construction, but my test shows, that
this does not hold for this shared library of static library mix)
2. Singletons of Boost.Serialization have a method `is_destroyed` which
returns whether one can still access the singleton or not
3. `!is_destroyed` is asserted by the destructors of singletons
referring to other singletons
4. 3. will fail if 1. does not hold, the program will then crash (or UB)
5. if 1. is true, the crash can be avoided by checking `is_destroyed`
before accessing the singleton (not doing so if destroyed)
6. `is_destroyed` is broken, hence the assertion is unreliable and we
get crashes where we would first get assertion failures in debug builds
Hence the steps to fix this:
First and foremost: Test and fix `is_destroyed`
Second: implement 5.
While 1. is an assumption that is disputable I think the rest is not.
Correct me here, if I'm wrong. Hence
https://github.com/boostorg/serialization/pull/110 which adds the test
for `is_destroyed`.
Even if there is another explanation for 1. what is wrong with
implementing 5.? It will always be safe and code like
`if(!foo.is_destroyed()) foo.use()` is quite idiomatic C (`if(foo)
foo->use()`)
Alex Grund
Although crashing is not observed with the slightly older Boost releases
(e.g. 1.66, 1.65.1, 1.64, etc.), intentionally isolating
Boost.Serialization as Gavin describes fails, with the already described
lack of destructor invocations and memory leak.
I completely agree with your approach. I also agree that having
crashing and/or undefined behavior (UB) is not acceptable for any
release, let alone the upcoming 1.68.
--Robert
But even without this: Isn't it enough, that `is_destroyed` is broken,
to investigate in my fix? Fixing that function is the major part of the
PR. The rest is the simple check-before-use stuff.
I do acknowledge your lack of time and because the fix is so
logical/straight forward I asked for another reviewer to relieve you
from that or help out.
> The whole issue of dynamically loaded DLLS/shared libraries has been a
> festering sore for at least 20 years now. It's undefined by the
> standard and the committee has not seen fit to address it. Sorry it
> just takes time to get this right.
I also do understand this. But I have proof that your latest change
causes a (possible) crash instead of a (certain) memory leak and hence
ask you to revert that change for 1.68 or the library will be unusable
for some users. Once there is more time, the issue can be fully resolved.
Alex Grund
it's test_dll_export. It's the simplest test I could figure out.
There's another one that tests dynamic loading of DLL test_dll_plugin.
I don't run that test as I would have to dive a lot more into bjam than
I can aford to do.
> Please check the very basic singleton interface tests I even added as a
> separate PR (https://github.com/boostorg/serialization/pull/110) which
> clearly shows, that the current implementation is flawed.
I think I did that once. On this basis I added a test_singleton to the
test suite. I'm pretty
https://github.com/boostorg/serialization/blob/master/test/test_singleton.cpp
. I does check that all singletons are destroyed in the reverse order.
Note: this test doesn't appear in any of the "teeks99" tests. I have no
idea why that might be.
Also note that the current singleton implementation does no allocation
from the heap. If there is a memory leak - it's not obvious why
singleton might be causing it.
OK - i've looked again at your pr. It replaces my test with your own.
Rather than doing that, it's been my practice to add new tests. This
helps prevent turning into a process of whack-a-mole. I'll add these
two new tests and see if I can re-produce the results on my own machine.
> Additionally I have (empiric) evidence that destruction order is
> unreliable (see the test I sent around in this ML).
I looked as one or two of your tests and as a result updated the test
suite as above. So I think things are covered. I'll double check though.
> So what my proposed change is, is simply:
> a) fix the bug in the implementation shown by #110 (should be
> indisputably the right thing to do) and
LOL - sorry it's not indiputable to me.
> b) check if the singleton accessed by a singleton-destructor is
> still alive (equivalent of `if(foo) foo->bar()`)
I will check this.
> I don't see how this fixes a symptom only. It tackles the root causes: A
> defect in the code and a use-after-free caused by the unexpected
> destruction order
I do not believe there is any freeing going on here.
(although the cause for that is unclear and only know
> to be related to shared libraries).
Right. I do not think its possible to avoid problems if libs are not
unloaded in the reverse order that they are loaded. I believe that this
will be something that has to be enforced at the application level. It
would be nice if it were possible to detect this from the library, but
without any standard defined behavior, this is going to be difficult.
> I do acknowledge your lack of time and because the fix is so
> logical/straight forward I asked for another reviewer to relieve you
> from that or help out.
I don't always agree with the PRs and the analysis of other
participants. I try to give all such suggestions the attention they
deserve and recognize the hard work that they entail - even if I decline
to include them. On the other hand. Most of the suggestions do
eventually get incorporated - perhaps after some back and forth to
refine them, add/update tests, etc. It is only because of efforts by
persons such as yourself, that I have been able to maintain the library
after all these years. And it only because of these efforts that I keep
motivated to keep it as perfect as possible. You make me a better man.
>> The whole issue of dynamically loaded DLLS/shared libraries has been a
>> festering sore for at least 20 years now. It's undefined by the
>> standard and the committee has not seen fit to address it. Sorry it
>> just takes time to get this right.
> I also do understand this. But I have proof that your latest change
> causes a (possible) crash instead of a (certain) memory leak and hence
> ask you to revert that change for 1.68 or the library will be unusable
> for some users. Once there is more time, the issue can be fully resolved.
This is the crux. I do not this can be resolved definitively without a
real understanding of the source of the problem.
If you have some time we can discuss this in IRC or another platform
directly. But please test what I sent around first. Otherwise we are
running around in circles as you don't seem to believe me that there is
a crash in a valid use case without anything a user can do.