[boost] [Boost.Serialization] Help with PR fixing a memory leak in the current version

136 views
Skip to first unread message

Alexander Grund via Boost

unread,
Jun 25, 2018, 3:37:10 AM6/25/18
to bo...@lists.boost.org, Alexander Grund
Hi,

TLDR: I would like https://github.com/boostorg/serialization/pull/105 to
be merged for the next Boost release to fix a memory leak and need
someone to review this or help getting it merge-ready.

Details:
Some time ago I made an analysis of a crash introduced in Boost 1.65.0
in the serialization part. The details are quite complicated as the
cause is static initialization/destruction order in shared libraries
which may even depend on compilers (could not confirm) or standard
libraries (dito, observed in all tested versions).
It boils down to a violation of the expected destruction order in the
case of shared libraries. I observed how 2 global instances of the same
type but from different shared libs are destroyed together although the
2nd should not yet be which causes a use-after-free from another global
instance.
The Singletons involved did have methods to detect, whether they are
active or destroyed but they were kinda optimized away in Boost 1.65
leading to a crash which makes the whole library unusable. Not
understanding the root cause of the crash lead to changing the singleton
to use dynamic allocation, but not freeing it which leads to the memory
leak that is currently in Boost.
The current state of the develop-branch changed this back to the crash
situation we had before making it unsuitable for the next release.

Another pair of eyes would be great to check the PR and get this finally
fixed.

Thanks, Alex

_______________________________________________
Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost

Robert Ramey via Boost

unread,
Jun 25, 2018, 3:38:51 PM6/25/18
to Alexander Grund via Boost, Robert Ramey
I much appreciate your willingness to invest effort in this issue. But
my analysis and conclusions regarding the issue are totally different.
Motivated your efforts I added new tests for the singleton and spent
time enhancing tests for dll. One test was just to complicated for me
to make work with b2 so I suppress it. Only one test is failing on some
platforms - test_export. This is related to DLLs. I added windows +
mingw and msvc++ to my machine just to try to replicate the failure in
my own environment. I have been unable to do so.

I believe that this is due to some issue in libc++ or libstdc++ related
to some combination of circumstances related to dynamic loading and
possible visibility. Both of these areas are undefined by the standard
so it's natural that there might be ambiguities here.

Late last year I tried to address this dll/export problem by merging in
a fix. This fix entailed dynamically allocated a data structure in the
singleton. This fixed this failing test, but unbeknownst to me, resulted
in the memory leak. At this point there are two paths: plow forward and
make the code evermore complex to navigate around the circumstances
which make the test fail or step back and backout the fix which resulted
in the memory leak. I've chosen the latter.

I realize you've invested a lot of effort and it's disheartening not to
have it appreciated. Don't think this way. It IS appreciated. The
serialization library is still going strong only because persons such as
yourself have made these sorts of efforts. I incorporate a significant
portion of PRs submitted.

The way forward is to add some more tests which isolate the problem. I'm
aware that you've submitted one test which fails. It seems like its on
the right track. But (IIRC), it dynamically loads/unloads DLLS
containing user level serialization code in the same sequence. I think
its reasonable to insist that if a user is going to do that that such
DLLS be unloaded in reverse order. I admit that setting up these sorts
of tests is difficult and time consuming in any build environment. But
that's where I am.

Robert Ramey

Alexander Grund via Boost

unread,
Jun 26, 2018, 4:39:21 AM6/26/18
to bo...@lists.boost.org, Alexander Grund
Hi,

> I much appreciate your willingness to invest effort in this issue. 
> But my analysis and conclusions regarding the issue are totally
> different. Motivated your efforts I added new tests for the singleton
> and spent time enhancing tests for dll.  One test was just to
> complicated for me to make work with b2 so I suppress it. Only one
> test is failing on some platforms - test_export.  This is related to
> DLLs.  I added windows + mingw and msvc++ to my machine just to try to
> replicate the failure in my own environment.  I have been unable to do
> so.
I do  have (Makefile) test, which does show the behaviour described. I'm
pretty sure, if executed on another machine, it will fail too. I tried
to add it to the b2-Testsuite but failed due to having static boost in
dynamic libraries (which is the way I found where it crashes for sure).
And there is 1 testcase currently in the Boost.Serialization testsuite
that DOES fail and I'm pretty sure the issue described is the reason but
of course I cannot really confirm this as the root cause is undefined
behaviour (or ambiquities with shared libs)

> Late last year I tried to address this dll/export problem by merging
> in a fix.  This fix entailed dynamically allocated a data structure in
> the singleton. This fixed this failing test, but unbeknownst to me,
> resulted in the memory leak.
This is what I tried to describe too in the issue and MR: The leak is
simply due to no one freeing the dynamically allocated memory. This is
even deterministic and easily reproducible. I think that oversight might
be caused by you being so used to the code, that you overlook things
that might be visible to fresh eyes. This is why I would like to also
have some one else comment on the PR too.

> The way forward is to add some more tests which isolate the problem.
> I'm aware that you've submitted one test which fails.  It seems like
> its on the right track.  But (IIRC), it dynamically loads/unloads DLLS
> containing user level serialization code in the same sequence.  I
> think its reasonable to insist that if a user is going to do that that
> such DLLS be unloaded in reverse order.  I admit that setting up these
> sorts of tests is difficult and time consuming in any build
> environment.  But that's where I am.
I can provide a test for b2 (the same as I have the Makefile for) if the
boost test suite is build with `-fPIC`. This does not involve unloading
DLLs or so but simply links an executable against 2 shared libraries
which are linked against static boost. This crashes reproducibly. If
anyone can tell me how to do the fPIC stuff for Boost.Build I'm happy to
add this test to show that it fails with a crash in current code and
that my approach fixes this.

In any case, I do have the feeling that the reason for the changes done
in my PR (and the changes itself) are not fully understood. This is why
I want to encourage question to the changes itself as I think those were
not (only) meant to fix an issue in some circumstances but actually
correct the code with test cases to prove this where possible (e.g. the
is_destroyed flag). Other testcases just try to run into the problem I
described.

Best, Alex

--
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Alexander Grund
Interdisziplinäre Anwendungsunterstützung und Koordination (IAK)

Technische Universität Dresden
Zentrum für Informationsdienste und Hochleistungsrechnen (ZIH)
Chemnitzer Str. 46b, Raum 010 01062 Dresden
Tel.: +49 (351) 463-35982
E-Mail: alexand...@tu-dresden.de
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Robert via Boost

unread,
Jun 27, 2018, 5:19:41 PM6/27/18
to bo...@lists.boost.org, Robert

I confirm that the 1.66 version never ever invokes the singleton
destructor declared and defined in boost/serialization/singleton.hpp,
approximately line 175. It does not matter whether it is a debug or
release build, 32-bit, or 64-bit Microsoft or Intel C++ on Windows. A
DLL wrapper around Boost.Serialization is used at run time. The actual
Boost.Serialization library is statically linked to the DLL. The
roughly 20 line main function, test executable then uses the DLL to
invoke the serialization.

I verify the behavior using an old school "printf" approach, with
std::cout messages and static ptrdiff_t counting variables inside the
get_instance() and ~singleton() methods. I observe eleven independent
instances of the singleton class, with ZERO calls to the singleton
destructor. The eleven singletons are created with ONE instance of the
serialization object, regardless of invoking the serialize method.

The main function:

int main()
{
{
HINSTANCE hGetProcIDDLL = LoadLibrary(L"BoostSerializeDLL.dll");
}
/**/
std::cout << "Beginning unit test...";
std::shared_ptr<thislibrary::myclass> myc = thislibrary::create_myclass();
{
std::shared_ptr<std::ofstream> ostream =
std::make_shared<std::ofstream>();
ostream->open("C:\\temp\\leak_test.xml");
{
boost::archive::xml_oarchive xml(*ostream);
myc->serialize(xml, 1); // comment this line out and the leak remains.
}
ostream->close();
}
/* */
return 0;
}

Fundamentally, there is something between 300 to 400 bytes memory leak.
Yes, it is not much with today's world of GB RAM. However, when you
have production code that leaks, then ANY memory leak is unacceptable.
Essentially, the leak is due to the lack of the eleven singleton
instances NOT invoking the respective destructor. Plus, the group I am
part of spends tens of hours tracking down why the production product
has memory leaks. The final convincing point that must be clearly
understood is this: due to the leak, Boost.Serialization has been
replaced. If that does not convince a reader that there is a real
problem, then nothing will.

Sincerely,

Robert


>
> Best, Alex

Alexander Grund via Boost

unread,
Jun 28, 2018, 3:14:40 AM6/28/18
to bo...@lists.boost.org, Alexander Grund

> Fundamentally, there is something between 300 to 400 bytes memory
> leak. Yes, it is not much with today's world of GB RAM.  However, when
> you have production code that leaks, then ANY memory leak is
> unacceptable. Essentially, the leak is due to the lack of the eleven
> singleton instances NOT invoking the respective destructor.  Plus, the
> group I am part of spends tens of hours tracking down why the
> production product has memory leaks.  The final convincing point that
> must be clearly understood is this: due to the leak,
> Boost.Serialization has been replaced.  If that does not convince a
> reader that there is a real problem, then nothing will.
Thank you Robert. I have also added another PR that does something very
similar to your manual test. It tests that singletons are actually
destructed and that the is_destructed flag is set. Note that this is an
extract from my bigger PR and just contains the tests to prove this
point and tests are failing obviously in both 1.66 and current dev
(different parts though). Maybe we can get this one in first and then
fix it with my other PR. Would you maybe add a vote/thumbs-up for the
PRs also on Github to increase awareness?

To add on this: It is not only a memory leak. The ctor is not called. If
one relies on side effects by that, it can cause all sorts of havoc.

Thanks, Alex

Robert Ramey via Boost

unread,
Jun 28, 2018, 7:45:45 AM6/28/18
to Robert via Boost, Robert Ramey

I believe this has been fixed for 1.68 which can be found in the current
develop and master branch. What are the results of your test when
applied to the latest develop or release version?

Alexander Grund via Boost

unread,
Jun 29, 2018, 5:08:06 AM6/29/18
to bo...@lists.boost.org, Alexander Grund
Hi all again,

I have to stress the importance of a swift action/review here as the
current master now contains the flawed commit again that leads to a
certain crash when using shared libraries linked against static boost.

Reproduce with:
git clone g...@github.com:boostorg/boost.git && cd boost && git submodule
update --recursive --init
./bootstrap.sh && ./b2 variant=release link=static
--with-libraries=serialization cxxflags=-fPIC cflags=-fPIC -j2
mkdir mytest && cd mytest
Create 3 files:

|test_multi_singleton.cpp:intf();intg();intmain(intargc,char**){// Make
sure symbols are
usedif(argc==8)returnf();if(argc==9)returng();}multi_singleton1.cpp:#include<boost/serialization/extended_type_info_typeid.hpp>intf(){return0!=boost::serialization::extended_type_info_typeid<int>::get_const_instance().get_key();}multi_singleton2.cpp:#include<boost/serialization/extended_type_info_typeid.hpp>intg(){//
Use different(!)
typereturn0!=boost::serialization::extended_type_info_typeid<float>::get_const_instance().get_key();}|

export CPATH=$PWD/..
export LIBRARY_PATH=$PWD/../stage/lib/
g++ multi_singleton1.cpp -lboost_serialization -fPIC -shared
-olibmulti_singleton1.so
g++ multi_singleton2.cpp -lboost_serialization -fPIC -shared
-olibmulti_singleton2.so
g++ test_multi_singleton.cpp -L. -lmulti_singleton1 -lmulti_singleton2
LD_LIBRARY_PATH=. ./a.out

Result: *** Error in `./a.out': double free or corruption (fasttop):
0x0000000001accc20 ***

This is due to the bug I described and fixed in
https://github.com/boostorg/serialization/pull/105.
An extract from there only showing the bug with `is_destroyed` is
https://github.com/boostorg/serialization/pull/110

In the current state I'm strongly against including the latest master
from Boost.Serialization. It would be better, to include the Boost 1.67
version of it which "only" contains a memory leak.

Regards, Alex

Robert Ramey via Boost

unread,
Jun 29, 2018, 11:21:14 AM6/29/18
to bo...@lists.boost.org, Robert Ramey
On 6/29/18 2:07 AM, Alexander Grund via Boost wrote:
> Hi all again,
>
> I have to stress the importance of a swift action/review here as the
> current master now contains the flawed commit again that leads to a
> certain crash when using shared libraries linked against static boost.

Right now I'm bogged down in other stuff, but I did find a little time
to investigate this some more. I reviewed the information on the
failures if test_dll_export along with the corresponding config.info
output. I've found some useful things. e.g. doesn't occur on windows.
Seems to occur only on linux platforms. I've found it occurs on both
lib++ and libstdc++ evironments. I can't figure out from the test
results whether the tests are linking from static or shared standard
libraries.

I didn't realize that this occurs (only on?) configurations which create
shared libraries which link against static boost. Mixing shared
libraries is a known cause of difficulties. It many cases it seems to
work, but in many cases it also leads to unfixable failures.

I'm still looking into this.

>
> Reproduce with:
> git clone g...@github.com:boostorg/boost.git && cd boost && git submodule
> update --recursive --init
> ./bootstrap.sh && ./b2 variant=release link=static
> --with-libraries=serialization cxxflags=-fPIC cflags=-fPIC -j2
> mkdir mytest && cd mytest
> Create 3 files:
>
> |test_multi_singleton.cpp:intf();intg();intmain(intargc,char**){// Make
> sure symbols are
> usedif(argc==8)returnf();if(argc==9)returng();}multi_singleton1.cpp:#include<boost/serialization/extended_type_info_typeid.hpp>intf(){return0!=boost::serialization::extended_type_info_typeid<int>::get_const_instance().get_key();}multi_singleton2.cpp:#include<boost/serialization/extended_type_info_typeid.hpp>intg(){//
> Use different(!)
> typereturn0!=boost::serialization::extended_type_info_typeid<float>::get_const_instance().get_key();}|
>
>
> export CPATH=$PWD/..
> export LIBRARY_PATH=$PWD/../stage/lib/
> g++ multi_singleton1.cpp -lboost_serialization -fPIC -shared
> -olibmulti_singleton1.so
> g++ multi_singleton2.cpp -lboost_serialization -fPIC -shared
> -olibmulti_singleton2.so
> g++ test_multi_singleton.cpp -L. -lmulti_singleton1 -lmulti_singleton2
> LD_LIBRARY_PATH=. ./a.out
>
> Result: *** Error in `./a.out': double free or corruption (fasttop):
> 0x0000000001accc20 ***

I'll look into your test.

>
> This is due to the bug I described and fixed in
> https://github.com/boostorg/serialization/pull/105.
> An extract from there only showing the bug with `is_destroyed` is
> https://github.com/boostorg/serialization/pull/110
>
> In the current state I'm strongly against including the latest master
> from Boost.Serialization. It would be better, to include the Boost 1.67
> version of it which "only" contains a memory leak.

We differ on the cause, the fix and the work around. Objection noted.

Robert Ramey

Gavin Lambert via Boost

unread,
Jul 1, 2018, 8:49:01 PM7/1/18
to bo...@lists.boost.org, Gavin Lambert
On 30/06/2018 03:20, Robert Ramey wrote:
> I didn't realize that this occurs (only on?) configurations which create
> shared libraries which link against static boost.  Mixing shared
> libraries is a known cause of difficulties.  It many cases it seems to
> work, but in many cases it also leads to unfixable failures.

By general definition, if you have two shared libraries that internally
statically link the "same" library, they must *completely hide* that
internal usage. Exactly zero of the internal library's symbols are
permitted to exist in the shared library's export list, and absolutely
never can any of its types appear in the shared library's own API.

This includes usage by inheritance or by convention (such as being
streamable to a serialization archive via templated operators).

Violation of that rule occurs frequently (because it's hard to enforce
and most of the time goes unnoticed) but it is still an ironclad rule if
you don't want weird and undefined behaviour.

Templates are a frequent cause of issues since they're very leaky by
nature, and highly susceptible to ODR issues.


In the context of Boost.Serialization, this means that you can pass the
serialized representations across shared library module boundaries but
you cannot pass archives or the serializable objects themselves across
those boundaries, unless both shared libraries also use
Boost.Serialization itself as a shared library.

In general the simplest thing to do is to either keep everything static
or everything shared; mixing leads to headaches unless a great deal of
care is taken.

Gavin Lambert via Boost

unread,
Jul 1, 2018, 9:01:45 PM7/1/18
to bo...@lists.boost.org, Gavin Lambert
Mere moments ago, quoth I:
> In the context of Boost.Serialization, this means that you can pass the
> serialized representations across shared library module boundaries but
> you cannot pass archives or the serializable objects themselves across
> those boundaries, unless both shared libraries also use
> Boost.Serialization itself as a shared library.

To put this a simpler way:

If you want to be able to use Boost.Serialization as a static library
within shared libraries, then you must make absolutely certain that
there are exactly zero #includes or references of any kind to
Boost.Serialization's header files in any of the public header files of
the shared library.

If everything compiles after that, then you should be ok. This usually
requires making use of PIMPL techniques.

degski via Boost

unread,
Jul 1, 2018, 11:26:50 PM7/1/18
to boost, degski
On 2 July 2018 at 03:48, Gavin Lambert via Boost <bo...@lists.boost.org>
wrote:
This is very useful information, in any context, be it Boost or not.

degski
--
*"If something cannot go on forever, it will stop" - Herbert Stein*

Alexander Grund via Boost

unread,
Jul 2, 2018, 3:22:32 AM7/2/18
to bo...@lists.boost.org, Alexander Grund
> To put this a simpler way:
>
> If you want to be able to use Boost.Serialization as a static library
> within shared libraries, then you must make absolutely certain that
> there are exactly zero #includes or references of any kind to
> Boost.Serialization's header files in any of the public header files
> of the shared library.
>
> If everything compiles after that, then you should be ok.  This
> usually requires making use of PIMPL techniques.
Please have a look at my test: The interfaces of the shared libraries do
NOT contain anything boost related. Only very simple C functions that
are called to make sure, the singletons within Boost are instantiated.
So your preconditions are met, yet the failure occurs.

Possible use case: You use Boost.Serialization in your library and use
another shared library which uses Boost.Serialization itself, but
completely independently. Nothing about it in the interface, but it
would still crash.

>>This is due to the bug I described and fixed in
https://github.com/boostorg/serialization/pull/105.
>>An extract from there only showing the bug with `is_destroyed` is
https://github.com/boostorg/serialization/pull/110
>>
>>In the current state I'm strongly against including the latest master
from Boost.Serialization. It would be better, to include the Boost 1.67
version of it which "only" contains a memory leak.
>
>We differ on the cause, the fix and the work around.  Objection noted.

Ok let me put this differently and lets see how far we can agree:

1. Destruction order may or may not be as intended (For single binaries
destruction order is inverse of construction, but my test shows, that
this does not hold for this shared library of static library mix)
2. Singletons of Boost.Serialization have a method `is_destroyed` which
returns whether one can still access the singleton or not
3. `!is_destroyed` is asserted by the destructors of singletons
referring to other singletons
4. 3. will fail if 1. does not hold, the program will then crash (or UB)
5. if 1. is true, the crash can be avoided by checking `is_destroyed`
before accessing the singleton (not doing so if destroyed)
6. `is_destroyed` is broken, hence the assertion is unreliable and we
get crashes where we would first get assertion failures in debug builds

Hence the steps to fix this:
First and foremost: Test and fix `is_destroyed`
Second: implement 5.

While 1. is an assumption that is disputable I think the rest is not.
Correct me here, if I'm wrong. Hence
https://github.com/boostorg/serialization/pull/110 which adds the test
for `is_destroyed`.
Even if there is another explanation for 1. what is wrong with
implementing 5.? It will always be safe and code like
`if(!foo.is_destroyed()) foo.use()` is quite idiomatic C (`if(foo)
foo->use()`)

Alex Grund

Robert via Boost

unread,
Jul 2, 2018, 9:31:25 AM7/2/18
to bo...@lists.boost.org, Robert
On 7/2/2018 2:22 AM, Alexander Grund via Boost wrote:
>> To put this a simpler way:
>>
>> If you want to be able to use Boost.Serialization as a static library
>> within shared libraries, then you must make absolutely certain that
>> there are exactly zero #includes or references of any kind to
>> Boost.Serialization's header files in any of the public header files
>> of the shared library.
>>
>> If everything compiles after that, then you should be ok.  This
>> usually requires making use of PIMPL techniques.
> Please have a look at my test: The interfaces of the shared libraries do
> NOT contain anything boost related. Only very simple C functions that
> are called to make sure, the singletons within Boost are instantiated.
> So your preconditions are met, yet the failure occurs.
>
> Possible use case: You use Boost.Serialization in your library and use
> another shared library which uses Boost.Serialization itself, but
> completely independently. Nothing about it in the interface, but it
> would still crash.

Although crashing is not observed with the slightly older Boost releases
(e.g. 1.66, 1.65.1, 1.64, etc.), intentionally isolating
Boost.Serialization as Gavin describes fails, with the already described
lack of destructor invocations and memory leak.

I completely agree with your approach. I also agree that having
crashing and/or undefined behavior (UB) is not acceptable for any
release, let alone the upcoming 1.68.

--Robert

Alexander Grund via Boost

unread,
Jul 2, 2018, 9:40:45 AM7/2/18
to bo...@lists.boost.org, Alexander Grund

> Although crashing is not observed with the slightly older Boost
> releases (e.g. 1.66, 1.65.1, 1.64, etc.), intentionally isolating
> Boost.Serialization as Gavin describes fails, with the already
> described lack of destructor invocations and memory leak.
Did you test the code I sent around with static boost and those
versions? I expect a crash in 1.68 (if the commit is not reverted) and
in 1.65.x, so if you don't experience one I'd be interested in details
(OS, stdlib, compiler etc.). Also: 1.64 should be fine and 1.65 should
not leak, but crash. 1.66/1.67 have the leak.

Alex

Robert Ramey via Boost

unread,
Jul 2, 2018, 11:25:14 AM7/2/18
to Alexander Grund via Boost, Robert Ramey
On 7/2/18 6:40 AM, Alexander Grund via Boost wrote:
>
>> Although crashing is not observed with the slightly older Boost
>> releases (e.g. 1.66, 1.65.1, 1.64, etc.), intentionally isolating
>> Boost.Serialization as Gavin describes fails, with the already
>> described lack of destructor invocations and memory leak.

> Did you test the code I sent around with static boost and those
> versions? I expect a crash in 1.68 (if the commit is not reverted) and
> in 1.65.x, so if you don't experience one I'd be interested in details
> (OS, stdlib, compiler etc.). Also: 1.64 should be fine and 1.65 should
> not leak, but crash. 1.66/1.67 have the leak.

Here's what I did. I looked at the serialization test matrix:

https://www.boost.org/development/tests/develop/developer/serialization.html

The interesting test is test_dll_exported. I compared this table
against the config.info:

https://www.boost.org/development/tests/develop/developer/config.html

And I also looked at the test command line.

test results to see if I could find a commonality between the test
failures and the configuration being tested. I found that:

a) Test failed only on linux platforms. But the test doesn't fail on
all linux platforms.
b) Test never fails on windows platforms.
c) tests would fail with both clang and gcc compilers
d) Test failure seems pretty reliable. Exception: the one test which
passes uses gcc 7.3
e) I can't discern what the build switch settings are from the python
command line. That is I would like to know if it was built with
link=shared/static and runtime-link=shared/static. I believe that this
information would be significant.
f) a couple of ARM tests pass - one fails.

Much has been made of a memory leak. The current version 1.68 on
develop and master branches removed the dynamic allocation in the
singleton module. So if there is a memory leak it would be more subtle
than first meets the eye. I believe this memory leak was introduced in
efforts to make this test pass for 1.67 (or maybe 1.66 I don't remember).

I believe the correct way to approach this is to analyze your new test
and also test_dll_export in light of Gavin Lambert's comments above. If
this works out, I would then add this information to the serialization
library documentation.

Unfortunately, I'm bogged down in other stuff so I can't address it
instantaneously, but it I do consider it important to investigate further.

Robert Ramey

Robert via Boost

unread,
Jul 2, 2018, 11:40:00 AM7/2/18
to bo...@lists.boost.org, Robert
On 7/2/2018 8:40 AM, Alexander Grund via Boost wrote:
>
>> Although crashing is not observed with the slightly older Boost
>> releases (e.g. 1.66, 1.65.1, 1.64, etc.), intentionally isolating
>> Boost.Serialization as Gavin describes fails, with the already
>> described lack of destructor invocations and memory leak.
> Did you test the code I sent around with static boost and those
> versions?

No, I have not tested the 1.68 revision. If my day goes well enough
with my primary tasks, I will build 1.68 early this evening or possibly
July 5th. This will be strictly on Windows with both Microsoft 15.6.4
and Intel C++ 18.0, Update 3.

--Robert

Alexander Grund via Boost

unread,
Jul 2, 2018, 11:46:22 AM7/2/18
to bo...@lists.boost.org, Alexander Grund

> test results to see if I could find a commonality between the test
> failures and the configuration being tested.  I found that:
>
> a) Test failed only on linux platforms.  But the test doesn't fail on
> all linux platforms.
> b) Test never fails on windows platforms.
> c) tests would fail with both clang and gcc compilers
> d) Test failure seems pretty reliable.  Exception: the one test which
> passes uses gcc 7.3
> e) I can't discern what the build switch settings are from the python
> command line.  That is I would like to know if it was built with
> link=shared/static and runtime-link=shared/static.  I believe that
> this information would be significant.
> f) a couple of ARM tests pass - one fails.
Great! I found at least for
https://www.boost.org/development/tests/develop/developer/output/BI1-Debian-Stretch-boost-bin-v2-libs-serialization-test-test_dll_exported-test-gcc-4-9-4-debug.html
that boost was build with link=shared. I don't think runtime-link is
tested? At least I'd expect a reference to that in the link command. But
I don't think it matters much, if the test I sent is considered formally
valid (see below)

> Much has been made of a memory leak.  The current version 1.68 on
> develop and master branches removed the dynamic allocation in the
> singleton module.  So if there is a memory leak it would be more
> subtle than first meets the eye.  I believe this memory leak was
> introduced in efforts to make this test pass for 1.67 (or maybe 1.66 I
> don't remember).
FYI: The leak was introduced in 1.66, the crash in 1.65.0 and again in
current master.

> I believe the correct way to approach this is to analyze your new test
> and also test_dll_export in light of Gavin Lambert's comments above. 
> If this works out, I would then add this information to the
> serialization library documentation.
To make it simple I'd suggest you concentrate on my new test. I'm sure
it meets Gavin Lambert's requirements of not leaking Boost-"internals"
outside of the shared libs used and the crash is easy to reproduce (at
least on my side`on every linux machine). If you can reproduce this even
on a single machine on your side, then this is all argument it needs
that there is something in need of fixing.

But even without this: Isn't it enough, that `is_destroyed` is broken,
to investigate in my fix? Fixing that function is the major part of the
PR. The rest is the simple check-before-use stuff.

degski via Boost

unread,
Jul 2, 2018, 11:54:02 AM7/2/18
to boost, degski
On 2 July 2018 at 18:39, Robert via Boost <bo...@lists.boost.org> wrote:

> This will be strictly on Windows with both Microsoft 15.6.4 ...


With the current level of flux in VC development, you really should
consider upgrading to 15.7.4 (unless the 6 is a typo and should read 7), or
better, move to 15.8 Preview 3 (that's the advice STL gave me in another
issue).

Have a good day,

degski
--
*"If something cannot go on forever, it will stop" - Herbert Stein*

Robert Ramey via Boost

unread,
Jul 2, 2018, 12:22:07 PM7/2/18
to bo...@lists.boost.org, Robert Ramey
On 7/2/18 8:46 AM, Alexander Grund via Boost wrote:
> But even without this: Isn't it enough, that `is_destroyed` is broken,
> to investigate in my fix? Fixing that function is the major part of the
> PR. The rest is the simple check-before-use stuff

I will also investigate "is_destroyed"

I agree that everything has to come up clean and correct. I'm shooting
for "no loose ends". But the issues are complicated for, among other
things:

a) DLL/Shared libraries are not defined by the standard.
b) different vendors have different keywords/requirements
c) visibility is not defined by the standard either - but it's almost
essential.

We're navigating a mine field here.

Robert Ramey

Robert Ramey via Boost

unread,
Jul 2, 2018, 12:27:28 PM7/2/18
to bo...@lists.boost.org, Robert Ramey
On 7/2/18 8:53 AM, degski via Boost wrote:
> On 2 July 2018 at 18:39, Robert via Boost <bo...@lists.boost.org> wrote:
>
>> This will be strictly on Windows with both Microsoft 15.6.4 ...
>
>
> With the current level of flux in VC development, you really should
> consider upgrading to 15.7.4 (unless the 6 is a typo and should read 7), or
> better, move to 15.8 Preview 3 (that's the advice STL gave me in another
> issue).

But many/most users don't have the latest MSVC version - and the library
has to work on at least the most recent ones.

>
> Have a good day,
>
> degski
>



degski via Boost

unread,
Jul 2, 2018, 12:52:39 PM7/2/18
to boost, degski
On 2 July 2018 at 19:22, Robert Ramey via Boost <bo...@lists.boost.org>
wrote:

> But many/most users don't have the latest MSVC version


Given how easy the upgrade process has become, I suspect, you're wrong
here. Also, users (not corporate ones, but they are not going to adopt
Boost 1.68 either at release time anyway, so that's irrelevant) are looking
forward now to the (C++17) stuff that will work (and doesn't yet work with
their current version, a minor version difference can and will make all the
difference).


> - and the library has to work on at least the most recent ones.
>

Today's Preview is tomorrow's Stable, and will be outdated 2 weeks later,
that's how things are if you really "wonna make some progress" (you should
see the number of bugs and fixes on vcpkg, it's impressive), it's a wild
ride, but it's fun. As you are a dev of an important library, your stable
is today's Preview.

Robert via Boost

unread,
Jul 2, 2018, 12:54:49 PM7/2/18
to bo...@lists.boost.org, Robert
On 7/2/2018 10:53 AM, degski via Boost wrote:
> On 2 July 2018 at 18:39, Robert via Boost <bo...@lists.boost.org> wrote:
>
>> This will be strictly on Windows with both Microsoft 15.6.4 ...
>
>
> With the current level of flux in VC development, you really should
> consider upgrading to 15.7.4 (unless the 6 is a typo and should read 7), or
> better, move to 15.8 Preview 3 (that's the advice STL gave me in another
> issue).

I would love to, but am intentionally delaying for work reasons. Some
of the team have updated to 15.7.4. Because of my using the older
Microsoft version, it takes only a few minutes to track down the root
cause of a recent issue.

This may not be known to news group members here. But, the Windows
Intel C++ compiler is heavily dependent on the Microsoft supplied
standard library header files (e.g. from 2015:
https://software.intel.com/en-us/forums/intel-c-compiler/topic/596318).
Similarly, in a Mac oriented post, there is a dependency on the gcc
header files (e.g. from 2013:
https://software.intel.com/en-us/forums/intel-c-compiler/topic/366102).

I have already submitted an issue with Intel that the Visual Studio
15.7.4 headers cause Intel 18.0, Update 3 to produce thousands of
spurious errors on the std::variant. The 15.6.4 std::variant version
does not cause the invalid error output from Intel. Plus, only the 19.0
Intel C++ is planned to address the problem(s).

--Robert

>
> Have a good day,
>
> degski
>


degski via Boost

unread,
Jul 2, 2018, 1:10:50 PM7/2/18
to boost, degski
On 2 July 2018 at 19:54, Robert via Boost <bo...@lists.boost.org> wrote:

> Some of the team have updated to 15.7.4. Because of my using the older
> Microsoft version ...
>

You can install the preview side-by-side with the stable version, you'd
have BOBW (another advice from STL).

This may not be known to news group members here. But, the Windows Intel
> C++ compiler is heavily dependent on the Microsoft supplied standard
> library header files (e.g. from 2015: https://software.intel.com/en-
> us/forums/intel-c-compiler/topic/596318).


You said it, it's a compiler, not an implementation of a STL.


> Similarly, in a Mac oriented post, there is a dependency on the gcc header
> files (e.g. from 2013: https://software.intel.com/en-
> us/forums/intel-c-compiler/topic/366102).
>

Luckily I know nothing about Mac (in the same category as fuckbook.com,
twitter (I don't own a phone) and smartphones in general (fat fingers, but
my wife keeps asking me to debug her bloody Android phone, shit)).


> I have already submitted an issue with Intel that the Visual Studio 15.7.4
> headers cause Intel 18.0, Update 3 to produce thousands of spurious errors
> on the std::variant.
>

Yeah, I know from experience that they are very slow in reacting (if they
react at all, which is usually not the case). F.e. the way they detect the
compiler in tbb/mkl does not work with Clang/LLVM on windows, because they
do the detection (if it should work) in the wrong order (they should check
for VC first and Clang later, can't go wrong on linux/Mac), seems pretty
simple, I filed it, but it is still as it used to be.

degski
--
*"If something cannot go on forever, it will stop" - Herbert Stein*

Robert via Boost

unread,
Jul 2, 2018, 3:04:14 PM7/2/18
to bo...@lists.boost.org, Robert
On 7/2/2018 12:10 PM, degski via Boost wrote:
> On 2 July 2018 at 19:54, Robert via Boost <bo...@lists.boost.org> wrote:
>
>> Some of the team have updated to 15.7.4. Because of my using the older
>> Microsoft version ...
>>
>
> You can install the preview side-by-side with the stable version, you'd
> have BOBW (another advice from STL).

Yeah, I am not too excited about that on my primary desktop. I might
install the preview on a VM though. Presently, the team continues to
find newly released "features" that break stuff. Hence, my now four
decades (and growing) distrust...

>
> This may not be known to news group members here. But, the Windows Intel
>> C++ compiler is heavily dependent on the Microsoft supplied standard
>> library header files (e.g. from 2015: https://software.intel.com/en-
>> us/forums/intel-c-compiler/topic/596318).
>
>
> You said it, it's a compiler, not an implementation of a STL.
>
>
>> Similarly, in a Mac oriented post, there is a dependency on the gcc header
>> files (e.g. from 2013: https://software.intel.com/en-
>> us/forums/intel-c-compiler/topic/366102).
>>
>
> Luckily I know nothing about Mac (in the same category as fuckbook.com,
> twitter (I don't own a phone) and smartphones in general (fat fingers, but
> my wife keeps asking me to debug her bloody Android phone, shit)).
>
>
>> I have already submitted an issue with Intel that the Visual Studio 15.7.4
>> headers cause Intel 18.0, Update 3 to produce thousands of spurious errors
>> on the std::variant.
>>
>
> Yeah, I know from experience that they are very slow in reacting (if they
> react at all, which is usually not the case). F.e. the way they detect the
> compiler in tbb/mkl does not work with Clang/LLVM on windows, because they
> do the detection (if it should work) in the wrong order (they should check
> for VC first and Clang later, can't go wrong on linux/Mac), seems pretty
> simple, I filed it, but it is still as it used to be.

I have them fixing a couple other items too. As far as speedily
resolving something, that all depends on what the Intel group determines
is important. Tea leaf interpretation or flipping a coin 50 times, is
about the same prediction accuracy. :)

--Robert

>
> degski

degski via Boost

unread,
Jul 2, 2018, 11:38:40 PM7/2/18
to boost, degski
On 2 July 2018 at 22:03, Robert via Boost <bo...@lists.boost.org> wrote:

> Yeah, I am not too excited about that on my primary desktop. I might
> install the preview on a VM though. Presently, the team continues to find
> newly released "features" that break stuff. Hence, my now four decades
> (and growing) distrust...
>

You are not wrong here.

... Tea leaf interpretation or flipping a coin 50 times, is about the same
> prediction accuracy. :)
>

LOL

degski
--
*"If something cannot go on forever, it will stop" - Herbert Stein*

Olaf van der Spek via Boost

unread,
Jul 3, 2018, 4:15:19 AM7/3/18
to bo...@lists.boost.org, Olaf van der Spek, Robert
On Mon, Jul 2, 2018 at 9:03 PM, Robert via Boost <bo...@lists.boost.org> wrote:
> Yeah, I am not too excited about that on my primary desktop. I might
> install the preview on a VM though. Presently, the team continues to find
> newly released "features" that break stuff. Hence, my now four decades (and
> growing) distrust...

In VC or in their code?
Either way, the sooner it's found the sooner it can be fixed.


--
Olaf

Robert via Boost

unread,
Jul 3, 2018, 9:34:47 AM7/3/18
to bo...@lists.boost.org, Robert
On 7/3/2018 3:14 AM, Olaf van der Spek via Boost wrote:
> On Mon, Jul 2, 2018 at 9:03 PM, Robert via Boost <bo...@lists.boost.org> wrote:
>> Yeah, I am not too excited about that on my primary desktop. I might
>> install the preview on a VM though. Presently, the team continues to find
>> newly released "features" that break stuff. Hence, my now four decades (and
>> growing) distrust...
>
> In VC or in their code?
Windows SDK, ATL, COM, C++ compiler conformance, standard library header
files, ODBC driver to SQL Server, etc.

> Either way, the sooner it's found the sooner it can be fixed.
>
>
In the group I am part of, as soon as the root cause is determined, an
issue is filed. How soon the Visual Studio C++ team responds with a
resolution, to something that previously is not broken, is non
deterministic, however. The tea leaf and coin flipping prediction
statement applies here too. :)

--Robert

Alexander Grund via Boost

unread,
Jul 11, 2018, 3:11:46 AM7/11/18
to bo...@lists.boost.org, Alexander Grund
Hi all,

with the release of 1.68 beta I want to push this again. Have you tried
the example I sent around? If so, did it crash on (probably) every linux
system too?
For me it did which makes me consider this as a showstopper and I would
like to propose again to simply revert the last change to the version
with the memory leak which is way less serious than a crash. Of course
merging my fix would be preferred as it fixes all problems but it seems
that the review of that might take to long for 1.68.

Regards, Alex

Robert via Boost

unread,
Jul 11, 2018, 10:04:48 AM7/11/18
to bo...@lists.boost.org, Robert
On 7/11/2018 2:11 AM, Alexander Grund via Boost wrote:
> Hi all,
>
> with the release of 1.68 beta I want to push this again. Have you tried
> the example I sent around? If so, did it crash on (probably) every linux
> system too?
> For me it did which makes me consider this as a showstopper and I would
> like to propose again to simply revert the last change to the version
> with the memory leak which is way less serious than a crash. Of course
> merging my fix would be preferred as it fixes all problems but it seems
> that the review of that might take to long for 1.68.

I agree with you. IMHO, the PR should be merged. At a minimum, the
1.68 release notes should list where to find the PR. Then, the users
decide whether to use it. I am only one concerned professional user
though. I am sure others will voice their thoughts too.

Sincerely,

Robert

Robert Ramey via Boost

unread,
Jul 11, 2018, 12:15:27 PM7/11/18
to Alexander Grund via Boost, Robert Ramey
On 7/11/18 12:11 AM, Alexander Grund via Boost wrote:
> Hi all,
>
> with the release of 1.68 beta I want to push this again. Have you tried
> the example I sent around?
Sorry, I haven't had time to investigate this.

> For me it did which makes me consider this as a showstopper and I would
> like to propose again to simply revert the last change to the version
> with the memory leak which is way less serious than a crash. Of course
> merging my fix would be preferred as it fixes all problems but it seems
> that the review of that might take to long for 1.68.
The problem is that I don't see any connection between your proposed
change and the failed test. I'm not disputing that your change makes a
problematic symptom go away. But I haven't been able to invest enough
time to really understand why that might be so.

The whole issue of dynamically loaded DLLS/shared libraries has been a
festering sore for at least 20 years now. It's undefined by the
standard and the committee has not seen fit to address it. Sorry it just
takes time to get this right.

Robert Ramey

Alexander Grund via Boost

unread,
Jul 11, 2018, 12:32:34 PM7/11/18
to bo...@lists.boost.org, Alexander Grund

> The problem is that I don't see any connection between your proposed
> change and the failed test.  I'm not disputing that your change makes
> a problematic symptom go away.  But I haven't been able to invest
> enough time to really understand why that might be so.
Please do not focus on that currently checked in test (test_dll_load or
what it was) because that one is to complex to reason about.
Please check the very basic singleton interface tests I even added as a
separate PR (https://github.com/boostorg/serialization/pull/110) which
clearly shows, that the current implementation is flawed.
Additionally I have (empiric) evidence that destruction order is
unreliable (see the test I sent around in this ML).
So what my proposed change is, is simply:
     a) fix the bug in the implementation shown by #110 (should be
indisputably the right thing to do) and
     b) check if the singleton accessed by a singleton-destructor is
still alive (equivalent of `if(foo) foo->bar()`)
I don't see how this fixes a symptom only. It tackles the root causes: A
defect in the code and a use-after-free caused by the unexpected
destruction order (although the cause for that is unclear and only know
to be related to shared libraries).

I do acknowledge your lack of time and because the fix is so
logical/straight forward I asked for another reviewer to relieve you
from that or help out.


> The whole issue of dynamically loaded DLLS/shared libraries has been a
> festering sore for at least 20 years now.  It's undefined by the
> standard and the committee has not seen fit to address it. Sorry it
> just takes time to get this right.

I also do understand this. But I have proof that your latest change
causes a (possible) crash instead of a (certain) memory leak and hence
ask you to revert that change for 1.68 or the library will be unusable
for some users. Once there is more time, the issue can be fully resolved.

Alex Grund

Robert Ramey via Boost

unread,
Jul 11, 2018, 4:53:45 PM7/11/18
to Alexander Grund via Boost, Robert Ramey
On 7/11/18 9:32 AM, Alexander Grund via Boost wrote:
>
>> The problem is that I don't see any connection between your proposed
>> change and the failed test.  I'm not disputing that your change makes
>> a problematic symptom go away.  But I haven't been able to invest
>> enough time to really understand why that might be so.
> Please do not focus on that currently checked in test (test_dll_load or
> what it was) because that one is to complex to reason about.

it's test_dll_export. It's the simplest test I could figure out.
There's another one that tests dynamic loading of DLL test_dll_plugin.
I don't run that test as I would have to dive a lot more into bjam than
I can aford to do.

> Please check the very basic singleton interface tests I even added as a
> separate PR (https://github.com/boostorg/serialization/pull/110) which
> clearly shows, that the current implementation is flawed.

I think I did that once. On this basis I added a test_singleton to the
test suite. I'm pretty
https://github.com/boostorg/serialization/blob/master/test/test_singleton.cpp
. I does check that all singletons are destroyed in the reverse order.
Note: this test doesn't appear in any of the "teeks99" tests. I have no
idea why that might be.

Also note that the current singleton implementation does no allocation
from the heap. If there is a memory leak - it's not obvious why
singleton might be causing it.

OK - i've looked again at your pr. It replaces my test with your own.
Rather than doing that, it's been my practice to add new tests. This
helps prevent turning into a process of whack-a-mole. I'll add these
two new tests and see if I can re-produce the results on my own machine.

> Additionally I have (empiric) evidence that destruction order is
> unreliable (see the test I sent around in this ML).

I looked as one or two of your tests and as a result updated the test
suite as above. So I think things are covered. I'll double check though.

> So what my proposed change is, is simply:
>      a) fix the bug in the implementation shown by #110 (should be
> indisputably the right thing to do) and

LOL - sorry it's not indiputable to me.

>      b) check if the singleton accessed by a singleton-destructor is
> still alive (equivalent of `if(foo) foo->bar()`)

I will check this.

> I don't see how this fixes a symptom only. It tackles the root causes: A
> defect in the code and a use-after-free caused by the unexpected
> destruction order

I do not believe there is any freeing going on here.

(although the cause for that is unclear and only know
> to be related to shared libraries).

Right. I do not think its possible to avoid problems if libs are not
unloaded in the reverse order that they are loaded. I believe that this
will be something that has to be enforced at the application level. It
would be nice if it were possible to detect this from the library, but
without any standard defined behavior, this is going to be difficult.

> I do acknowledge your lack of time and because the fix is so
> logical/straight forward I asked for another reviewer to relieve you
> from that or help out.

I don't always agree with the PRs and the analysis of other
participants. I try to give all such suggestions the attention they
deserve and recognize the hard work that they entail - even if I decline
to include them. On the other hand. Most of the suggestions do
eventually get incorporated - perhaps after some back and forth to
refine them, add/update tests, etc. It is only because of efforts by
persons such as yourself, that I have been able to maintain the library
after all these years. And it only because of these efforts that I keep
motivated to keep it as perfect as possible. You make me a better man.

>> The whole issue of dynamically loaded DLLS/shared libraries has been a
>> festering sore for at least 20 years now.  It's undefined by the
>> standard and the committee has not seen fit to address it. Sorry it
>> just takes time to get this right.

> I also do understand this. But I have proof that your latest change
> causes a (possible) crash instead of a (certain) memory leak and hence
> ask you to revert that change for 1.68 or the library will be unusable
> for some users. Once there is more time, the issue can be fully resolved.

This is the crux. I do not this can be resolved definitively without a
real understanding of the source of the problem.

Alexander Grund via Boost

unread,
Jul 12, 2018, 3:45:52 AM7/12/18
to bo...@lists.boost.org, Alexander Grund

> it's test_dll_export.  It's the simplest test I could figure out.
> There's another one that tests dynamic loading of DLL test_dll_plugin.
> I don't run that test as I would have to dive a lot more into bjam
> than I can aford to do.
Did you test what I sent around in this ML? As already explained I could
not get it into bjam either, but it does show the crash(!) on all linux
systems without further fiddling highlighting the need of an action.

> Also note that the current singleton implementation does no allocation
> from the heap.  If there is a memory leak - it's not obvious why
> singleton might be causing it.
The current implementation does NOT have a leak but a CRASH. Thats why I
want you to reverse that as the former is less severe (see above).

> OK - i've looked again at your pr.  It replaces my test with your own.
> Rather than doing that, it's been my practice to add new tests.  This
> helps prevent turning into a process of whack-a-mole.  I'll add these
> two new tests and see if I can re-produce the results on my own machine.
My test was developed before yours, so I had a merge conflict. Looking
at your test I found that it tests the wrong thing: Instead of testing
that the interface of singleton is correct, it tests the
construction/destruction order. That is guaranteed by the compiler,
hence you actually test the compiler. As that order only changes in the
context of shared libraries I did not see any added value of the test,
hence the complete overwrite (yeah, sorry, but was the easiest and as
explained a test of the interface is better than a test of the compiler)

>> Additionally I have (empiric) evidence that destruction order is
>> unreliable (see the test I sent around in this ML).
> I looked as one or two of your tests and as a result updated the test
> suite as above. So I think things are covered.  I'll double check though.
If you had run the test I sent around you'd see that this is not
covered. Destruction order is guaranteed except in the case of shared
libraries which is hard to get into bjam (well actually not, but it
requires building with fPIC. If that's possible/allowed, I'll add a test
to bjam that will crash on travis although it is very valid.

>> So what my proposed change is, is simply:
>>       a) fix the bug in the implementation shown by #110 (should be
>> indisputably the right thing to do) and
>
> LOL - sorry it's not indiputable to me.
Can you explain that a bit more? #110 shows that `is_destroyed` does
return a wrong value. Why would it be not right to fix that?

>> I don't see how this fixes a symptom only. It tackles the root
>> causes: A defect in the code and a use-after-free caused by the
>> unexpected destruction order
>
> I do not believe there is any freeing going on here.
An object is used after it is destroyed. That object has a map which is
freed on destruction but accessed afterwards. Try valgrind on the code I
sent around and it will show you the use-after-free.

> Right.  I do not think its possible to avoid problems if libs are not
> unloaded in the reverse order that they are loaded. I believe that
> this will be something that has to be enforced at the application
> level.  It would be nice if it were possible to detect this from the
> library, but without any standard defined behavior, this is going to
> be difficult.
How can this be enforced if there is no (manual) dynamic loading going
on? You simply link 2 shared libraries and the application crashes on
exit. Nothing a user can do about that.

>> I also do understand this. But I have proof that your latest change
>> causes a (possible) crash instead of a (certain) memory leak and
>> hence ask you to revert that change for 1.68 or the library will be
>> unusable for some users. Once there is more time, the issue can be
>> fully resolved.
>
> This is the crux.  I do not this can be resolved definitively without
> a real understanding of the source of the problem.
Well the source is indeterminate destruction order. You could try to
find the source for that or take it as given and handle it properly
which is easily possible.

If you have some time we can discuss this in IRC or another platform
directly. But please test what I sent around first. Otherwise we are
running around in circles as you don't seem to believe me that there is
a crash in a valid use case without anything a user can do.

Alexander Grund via Boost

unread,
Jul 12, 2018, 6:30:24 AM7/12/18
to bo...@lists.boost.org, Alexander Grund
FWIW I opened another PR to show the crash also on travis:
https://github.com/boostorg/serialization/pull/111. See the (expected as
described) failures with LINK=static on Linux.
I hope #111 is enough to show, that the latest change needs to be
reverted for 1.68.

#110 together with #111 should show clearly, where the implementation is
broken.
Reply all
Reply to author
Forward
0 new messages