cereal serialization and polymorphism

David Bond

Apr 28, 2015, 7:47:02 PM
to cere...@googlegroups.com
Hello All,

I'm having trouble with my use of cereal and I'm hoping someone here can spot it easily. I'm missing something simple...

I posted this last night on stackoverflow as well:


In an abstract sense I have a large graph which I am serializing with lots of shared pointers connecting the edges and vertices. Edges (and vertices) also have attributes attached to them.

One of these attributes (Attribute is the base class) is an account (Account is a child class). Account also inherits from Idable, which is also serializable. Here are some pertinent snippets of code that show some of my cereal usage. I'll explain the issue after this context:

Attribute.hpp/cpp

    class Attribute {
    ...

    template<class Archive> void serialize(Archive&)
    {   
    }   

    friend class cereal::access;
    ...

    CEREAL_REGISTER_TYPE(mgraph::Attribute)

Idable.hpp/cpp

    class Idable {
    ...

    Id id;
    
    template<class Archive> void serialize(Archive& archive)
    {
        archive(cereal::make_nvp("id", id)); 
    }

    template<class Archive> static void load_and_construct(Archive& ar, cereal::construct<mcommon::Idable>& construct)
    {
        mcommon::Id id;
        ar(id);
        construct(id);
    }

    friend class cereal::access;
    ...

    CEREAL_REGISTER_TYPE(mcommon::Idable)

Position.hpp/cpp

    class Position
    : public mgraph::Attribute
      , public mcommon::Displayable {

    template<class Archive> void serialize(Archive& archive)
    {   
        archive(cereal::make_nvp("Attribute",
                                 cereal::base_class<mgraph::Attribute>(this)));
    }   

    friend class cereal::access;
    ...

    CEREAL_REGISTER_TYPE(mfin::Position)

Account.hpp/cpp

    class Account
    : public mcommon::Idable
      , public Position {
    ...
    Currency balance;

    template<class Archive> void serialize(Archive& archive)
    {   
        archive(cereal::make_nvp("Idable",
                                 cereal::base_class<mcommon::Idable>(this)),
                cereal::make_nvp("Position",
                                 cereal::base_class<mfin::Position>(this)),
                cereal::make_nvp("balance", balance));
    }

    template<class Archive> static void load_and_construct(Archive& ar, cereal::construct<Account>& construct)
    {
        mcommon::Id iden;
        Currency::Code code;
        ar(iden, code);
        construct(iden, code);
    }

    friend class cereal::access;
    ...

    CEREAL_REGISTER_TYPE(mfin::Account)

The problem comes when an mfin::Account is being serialized. The mfin::Account belongs to a std::list<std::shared_ptr<mgraph::Attribute>>. By the time we get down into the serialize function for Idable, the object is invalid.

In gdb, which halts on a segfault, I go up a few stack frames to this line: /usr/include/cereal/types/polymorphic.hpp:341. Which is:

    (gdb) list
    336
    337    auto binding = bindingMap.find(std::type_index(ptrinfo));
    338    if(binding == bindingMap.end())
    339      UNREGISTERED_POLYMORPHIC_EXCEPTION(save, cereal::util::demangle(ptrinfo.name()))
    340
    341    binding->second.shared_ptr(&ar, ptr.get());
    342  }
    343
    344  //! Loading std::shared_ptr for polymorphic types
    345  template <class Archive, class T> inline

Here is what ptr is:

    (gdb) print *((mfin::Account*)(ptr.get()))
    $10 = {<mcommon::Idable> = {_vptr.Idable = 0x4f0d50 <vtable for mfin::Account+16>, id = "bank"}, <mfin::Position> = {<mgraph::Attribute> = {
          _vptr.Attribute = 0x4f0d78 <vtable for mfin::Account+56>}, <mcommon::Displayable> = {_vptr.Displayable = 0x4f0da0 <vtable for mfin::Account+96>}, <No data fields>}, balance = {<mcommon::Displayable> = {
          _vptr.Displayable = 0x4f0570 <vtable for mfin::Currency+16>}, amount = 0, code = mfin::Currency::USD}}
    (gdb) print ptr
    $11 = std::shared_ptr (count 3, weak 0) 0x758ad0

Everything looks good. But notice what happens when I cast it to a void* first:

    $11 = std::shared_ptr (count 3, weak 0) 0x758ad0
    (gdb) print *((mfin::Account*)((void*)ptr.get()))
    $12 = {<mcommon::Idable> = {_vptr.Idable = 0x4f0d78 <vtable for mfin::Account+56>, 
        id = "\363aL\000\000\000\000\000PbL\000\000\000\000\000\304\031L\000\000\000\000\000\021#L", '\000' <repeats 13 times>, " \232N", '\000' <repeats 21 times>, "P\251@\000\000\000\000\000\370\377\377\377\377\377\377\377 \232N", '\000' <repeats 21 times>, "\304\031L\000\000\000\000\000P\251@", '\000' <repeats 45 times>, "St19_Sp_counted_deleterIPN4mfin7AccountE"...}, <mfin::Position> = {<mgraph::Attribute> = {
          _vptr.Attribute = 0x4f0570 <vtable for mfin::Currency+16>}, <mcommon::Displayable> = {_vptr.Displayable = 0x0}, <No data fields>}, balance = {<mcommon::Displayable> = {_vptr.Displayable = 0x0}, amount = 49, 
        code = (unknown: 7702648)}}

This is, of course, what happens in binding->second.shared_ptr (seen below), which takes a const void*.

    (gdb) list
    295            writeMetadata(ar);
    296
    297            #ifdef _MSC_VER
    298            savePolymorphicSharedPtr( ar, dptr, ::cereal::traits::has_shared_from_this<T>::type() ); // MSVC doesn't like typename here
    299            #else // not _MSC_VER
    300            savePolymorphicSharedPtr( ar, dptr, typename ::cereal::traits::has_shared_from_this<T>::type() );
    301            #endif // _MSC_VER
    302          };
    303
    304        serializers.unique_ptr =

What is wrong in my usage of cereal that would cause this?

Program received signal SIGSEGV, Segmentation fault.
0x000000000040f7cd in rapidjson::Writer<rapidjson::GenericWriteStream, rapidjson::UTF8<char>, rapidjson::MemoryPoolAllocator<rapidjson::CrtAllocator> >::WriteString (this=0x7fffffffd358, 
    str=0x4f1ae0 <vtable for mfin::Account+96> "\363aL", length=4989722) at /usr/include/cereal/external/rapidjson/writer.h:276
276 if ((sizeof(Ch) == 1 || characterOk(*p)) && escape[(unsigned char)*p])  {
Missing separate debuginfos, use: debuginfo-install boost-date-time-1.55.0-8.fc21.x86_64 boost-filesystem-1.55.0-8.fc21.x86_64 boost-program-options-1.55.0-8.fc21.x86_64 boost-system-1.55.0-8.fc21.x86_64 boost-thread-1.55.0-8.fc21.x86_64 fcgi-2.4.0-24.fc21.x86_64 glog-0.3.3-3.128tech.x86_64 libgcc-4.9.2-1.fc21.x86_64 libstdc++-4.9.2-1.fc21.x86_64

Thanks,
David

David Bond

Apr 29, 2015, 3:07:19 AM
to cere...@googlegroups.com
OK, after much investigation I believe I have the answer to my problem, and I believe this is a bug in the library.

I have produced a simple program below which demonstrates this problem. The issue stems from multiple inheritance, polymorphism, and casting. In the program below, notice where we create a Derived object. When laid out in memory, the Derived object will have approximately this format:

Derived:
  Base2::vtable
  Base2::var
  Base::vtable

Consider:
(gdb) print ptr
$2 = std::shared_ptr (count 1, weak 0) 0x63c580
(gdb) print *ptr
$3 = (Derived &) @0x63c580: {<Base2> = {_vptr.Base2 = 0x421f90 <vtable for Derived+16>, var = ""}, <Base> = {_vptr.Base = 0x421fa8 <vtable for Derived+40>}, <No data fields>}

Now when we dynamic_pointer_cast it to Base we have:

(gdb) print ptr
$8 = std::shared_ptr (count 2, weak 0) 0x63c590
(gdb) print *ptr
$9 = (Base &) @0x63c590: {_vptr.Base = 0x421fa8 <vtable for Derived+40>}

This is where the problem begins. At /usr/include/cereal/types/polymorphic.hpp, line 341, we have this ptr to Base and the following call:
    binding->second.shared_ptr(&ar, ptr.get());

This ends up being a cast to a const void*. Later on, however, based on the type info, we cast it to the registered polymorphic type. Since the shared_ptr points to an object of the Derived type, this means a Derived*, as seen below:

272      static inline void savePolymorphicSharedPtr( Archive & ar, void const * dptr, std::false_type /* has_shared_from_this */ )
273      {
274        PolymorphicSharedPointerWrapper psptr( dptr );
275        ar( CEREAL_NVP_("ptr_wrapper", memory_detail::make_ptr_wrapper( psptr() ) ) );
276      }

This means that, down the stack, ptr, which is a Base*, was cast to void* and then cast to Derived*. This cast chain results in an invalid object. As seen below, ptr is now invalid:

(gdb) print *ptr
$7 = (const Derived &) @0x63c590: {<Base2> = {_vptr.Base2 = 0x421fa8 <vtable for Derived+40>, var = <error reading variable: Cannot access memory at address 0x49>}, <Base> = {_vptr.Base = 0x0}, <No data fields>}

The pointer is pointing to the vtable for Base and not to Derived/Base2 as it should be, so the program crashes:

{
    "ptr": {
        "polymorphic_id": 2147483649,
        "polymorphic_name": "Derived",
        "ptr_wrapper": {
            "id": 2147483649,
            "data": {
                "Base2": {

Program received signal SIGSEGV, Segmentation fault.
0x00007ffff7b8e9e3 in std::string::size() const () from /lib64/libstdc++.so.6

Below is a sample program which reproduces this:

// g++ test.cpp -std=c++11 -ggdb -o test && gdb ./test
#include <cereal/archives/json.hpp>
#include <cereal/types/polymorphic.hpp>
#include <iostream>
#include <memory>
#include <string>

struct Base {
    virtual void foo() { } 
    template<class Archive> void serialize(Archive& archive) { } 
};

struct Base2 {
    virtual void foo() { } 
    std::string var;
    template<class Archive> void serialize(Archive& archive) {   
        archive(cereal::make_nvp("var", var));
    }   
};

struct Derived : public Base2, public Base {
    template<class Archive> void serialize(Archive& archive) {   
        archive(cereal::make_nvp("Base2",
                                 cereal::base_class<Base2>(this)),
                cereal::make_nvp("Base",
                                 cereal::base_class<Base>(this)));
    }   
};

CEREAL_REGISTER_TYPE(Base);
CEREAL_REGISTER_TYPE(Base2);
CEREAL_REGISTER_TYPE(Derived);

int main() {
    auto ptr = std::make_shared<Derived>();
    cereal::JSONOutputArchive ar(std::cout);
    ar(cereal::make_nvp("ptr", std::dynamic_pointer_cast<Base>(ptr)));

    return 0;
}

Rand Voorhies

Apr 29, 2015, 4:09:01 PM
to cere...@googlegroups.com
Hi David,
I haven't had time to look into your issue in depth, but I just compiled your test code on my Mac and it ran just fine with no segfaults. Could you try compiling with a newer/different compiler to see if that changes things? Regardless, it looks like we don't have any unit tests for multiple polymorphic inheritance, so I'll add a GitHub issue for this now.

rand@Randolphs-MBP:~/Desktop$ g++ --version
Configured with: --prefix=/Applications/Xcode.app/Contents/Developer/usr --with-gxx-include-dir=/usr/include/c++/4.2.1
Apple LLVM version 6.0 (clang-600.0.56) (based on LLVM 3.5svn)
Target: x86_64-apple-darwin14.3.0
Thread model: posix
rand@Randolphs-MBP:~/Desktop$ g++ test-davidbond.cpp -std=c++11 -I/Users/rand/workspace/cereal/include/
rand@Randolphs-MBP:~/Desktop$ ./a.out
{
    "ptr": {
        "polymorphic_id": 2147483649,
        "polymorphic_name": "Derived",
        "ptr_wrapper": {
            "id": 2147483649,
            "data": {
                "Base2": {
                    "var": ""
                },
                "Base": {}
            }
        }
    }
}rand@Randolphs-MBP:~/Desktop$

-- Rand

--
You received this message because you are subscribed to the Google Groups "cereal serialization library" group.
To unsubscribe from this group and stop receiving emails from it, send an email to cerealcpp+...@googlegroups.com.
To post to this group, send email to cere...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/cerealcpp/3fe2a995-820f-4abf-8350-c4f341008fca%40googlegroups.com.

For more options, visit https://groups.google.com/d/optout.

Rand Voorhies

Apr 29, 2015, 4:17:39 PM
to cere...@googlegroups.com
FYI, I added a GitHub issue here: https://github.com/USCiLab/cereal/issues/187

-- Rand

w.shan...@gmail.com

Apr 29, 2015, 5:05:42 PM
to cere...@googlegroups.com
Crashes for me on Ubuntu 14.10 with all flavors of g++ and clang.  See new bug report: https://github.com/USCiLab/cereal/issues/188

FYI David, you don't need to register base classes, only the derived ones.


David Bond

Apr 29, 2015, 8:15:16 PM
to w.shan...@gmail.com, cere...@googlegroups.com
Thanks. Yeah, I'm seeing this with both gcc and clang on Fedora 21 (see below).

The essential error is that a shared_ptr to a base class is passed in and internally cast to a void*. Later, the void* is cast to a derived pointer. Note that the void* pointed to the base vtable, which isn't where a pointer to the derived type should point. If you cast directly from base to derived, that's fine, since the correct type of cast is chosen and the pointer value is adjusted to point to the correct location. If you cast from base to void* to derived, however, the pointer is never adjusted.

One fix would be to always convert the ptr from the given type to the derived type before casting to a void* (or, even better, store it as a derived ptr). I'll see if I can come up with a patch for this, but as I'm not familiar with the code base I'm sure it won't be 100% right.

Also, I suspect it's not crashing on your computer because it depends on what the std::string var in Base2 contains. In my case that's where it's hitting the segfault. Originally I was using an int var; since an int is fixed-size, it's a lot less likely to be an illegal read.

Thanks,
David

i:g++ --version
g++ (GCC) 4.9.2 20141101 (Red Hat 4.9.2-1)
Copyright (C) 2014 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

i:clang++ --version
clang version 3.5.0 (tags/RELEASE_350/final)
Target: x86_64-redhat-linux-gnu
Thread model: posix

i:uname -a
Linux --- 3.18.3-201.fc21.x86_64 #1 SMP Mon Jan 19 15:59:31 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux


