Multiple different serialization definitions for the same type and archive in one binary


Nikolaus Demmel

Jul 31, 2019, 5:53:30 AM
to cereal serialization library
Hi,

I would like to use cereal to interface between two projects that both use cereal on their own.

In project A, I'd like to include headers from project B that implement the serialization of some data types, so that I can load such files in project A.

Now the issue is that both projects define serialization for some third-party library type, let's say foo::bar (in reality, it's Eigen::Matrix<...>), but unfortunately the serialization format is slightly different. It would be simplest if I could make them compatible, but for legacy reasons I cannot change it in either project.

So what happens when I include two different definitions like

template <class Archive>
inline void serialize(Archive& ar, foo::bar& baz) {
  ...
}

in the same binary? I can isolate the two use cases into separate translation units, but if the same template function is instantiated multiple times, wouldn't the linker in the end just pick one and discard the other, assuming that they are all equivalent, even if here they are actually not? So it seems like doing this may work, or not, depending on what gets inlined where...

I tried using https://uscilab.github.io/cereal/archive_specialization.html together with a custom derived class like

class MyJSONInputArchive : public cereal::JSONInputArchive {
 public:
  MyJSONInputArchive(std::istream& stream)
      : cereal::JSONInputArchive(stream) {}
};

and then using SFINAE or directly specializing the templates like

template <>
inline void serialize(MyJSONInputArchive& ar, foo::bar& baz) {
  ...
}

but that doesn't seem to work for existing common types like std::map<int, foo::bar>, because in the implementation of std::map serialization for JSONInputArchive, the static archive type is JSONInputArchive even if the dynamic type is MyJSONInputArchive, and therefore it doesn't pick up the specialization. So the only solution I can see here is to also provide a specialization for std::map for MyJSONInputArchive.
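
The static-type effect described above can be reproduced without cereal. The following is a minimal sketch under stated assumptions: `Bar`, `BaseArchive`, `MyArchive`, `direct_call` and `through_map` are hypothetical stand-ins, not cereal names, and plain overloads are used in place of cereal's real dispatch machinery.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical stand-ins; none of these are cereal types.
struct Bar { int value = 0; };
struct BaseArchive { std::string log; };
struct MyArchive : BaseArchive {};

// Default format, written against the base archive.
inline void serialize(BaseArchive& ar, Bar&) { ar.log += "base;"; }

// Custom format, intended for the derived archive only.
inline void serialize(MyArchive& ar, Bar&) { ar.log += "custom;"; }

// Container support written once against the base archive. Inside this
// function the static type of `ar` is BaseArchive, so the MyArchive
// overload above is not viable for the nested call.
inline void serialize(BaseArchive& ar, std::map<int, Bar>& m) {
  for (auto& kv : m) serialize(ar, kv.second);
}

// Serializes one Bar directly through the derived archive.
inline std::string direct_call() {
  MyArchive ar;
  Bar b;
  serialize(ar, b);
  return ar.log;
}

// Serializes two Bar values through the map overload.
inline std::string through_map() {
  MyArchive ar;
  std::map<int, Bar> m;
  m[1];  // default-constructs an entry
  m[2];
  serialize(ar, m);
  return ar.log;
}
```

In this sketch the direct call reaches the custom overload, while the call that goes through the map falls back to the base format, which mirrors the behaviour reported for std::map<int, foo::bar>.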

Long story short: Is there a way to support different serialization for custom types within the same binary, depending on the Archive type, by some simple derivation of the existing archive types, without having to reimplement serialization for any common types other than the ones that I need to be different?

Best,
Nikolaus

Julius Rapp

Jul 31, 2019, 6:44:35 AM
to cereal serialization library
Hi Nikolaus,

I can not think of a good solution following your archive-specialization approach. Let me just share another idea with you:

> In one project A, I'd like to include headers from project B that implement the serialization of some datatypes such than I can load such files in project A.
>
> Now the issue is, that both projects define serialization for some thirdparty library type, lets say foo::bar (in reality, its Eigen::Matrix<...>), but unfortunately the serialization format is slightly different. It would be the simplest, if I could make them compatible, but for legacy reasons I cannot change it in either project.

I assume you can change the source code of project A, but not its serialization format for Eigen::Matrix.

Have you considered introducing a tiny wrapper class around Eigen::Matrix in the namespace of project A? That way, project A does not need to define serialization functions for Eigen::Matrix in the cereal namespace, because the serialization functions for the wrapper can be put into the namespace of project A. As a result, serialization functions for the two different formats (project A and project B) can coexist (in the namespace of project A and in namespace cereal, respectively). If some piece of code in project A needs the project-A format, the wrapper is used. If the project-B format is required, the wrapper is omitted.
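
A rough sketch of the wrapper idea, assuming a toy archive rather than cereal itself: `ToyOutputArchive`, `third_party::Matrix` (standing in for Eigen::Matrix), `project_a::MatrixWrapper`, `wrap`, and both formats are hypothetical and only illustrate how two `save` overloads in different namespaces can coexist.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Toy archive (hypothetical, not cereal): forwards every value to a free
// function `save` that is found by argument-dependent lookup.
struct ToyOutputArchive {
  std::ostringstream out;
  template <class T> void operator()(const T& value) { save(*this, value); }
  template <class T> void write(const T& v) { out << v << ' '; }
};

namespace third_party {  // stands in for Eigen
struct Matrix {
  int rows;
  int cols;
  std::vector<double> data;
};

// "Project B" format (assumed): rows, cols, then the coefficients.
template <class Archive>
void save(Archive& ar, const Matrix& m) {
  ar.write(m.rows);
  ar.write(m.cols);
  for (double d : m.data) ar.write(d);
}
}  // namespace third_party

namespace project_a {
// Serialization-only wrapper: holds a reference, copies nothing.
struct MatrixWrapper {
  const third_party::Matrix& m;
};
inline MatrixWrapper wrap(const third_party::Matrix& m) { return MatrixWrapper{m}; }

// "Project A" format (assumed): coefficient count, then the coefficients.
template <class Archive>
void save(Archive& ar, const MatrixWrapper& w) {
  ar.write(w.m.data.size());
  for (double d : w.m.data) ar.write(d);
}
}  // namespace project_a

// Writes the matrix without the wrapper, so the project-B overload is used.
inline std::string format_b() {
  third_party::Matrix m{1, 2, {3.5, 4.5}};
  ToyOutputArchive ar;
  ar(m);
  return ar.out.str();
}

// Writes the matrix through the wrapper, so the project-A overload is used.
inline std::string format_a() {
  third_party::Matrix m{1, 2, {3.5, 4.5}};
  ToyOutputArchive ar;
  ar(project_a::wrap(m));
  return ar.out.str();
}
```

Because the wrapper is a distinct type in its own namespace, the two overloads never compete for the same signature, so no function is defined twice for the same type.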

With some luck, a tiny serialization-only wrapper around Eigen::Matrix can be introduced with minimal changes to existing code in project A. The Eigen documentation even allows you to derive from Eigen::Matrix, but that is probably overkill: http://eigen.tuxfamily.org/dox-3.2/TopicCustomizingEigen.html#InheritingFromMatrix .

Best,
Julius

Nikolaus Demmel

Aug 1, 2019, 5:22:00 AM
to cereal serialization library

Hi Julius,

thanks for your quick response.

On Wednesday, July 31, 2019 at 12:44:35 PM UTC+2, Julius Rapp wrote:
> Hi Nikolaus,
>
> I can not think of a good solution following your archive-specialization approach.

Do you share my assessment that having different definitions of "serialize" with the same signature but different definition in different translation units that get linked into the same binary is problematic?
Thanks for the suggestion, this may indeed work quite nicely in some cases.

By serialization-only wrapper you mean that the data structures are unchanged, but the project A serialization functions do something like

archive(cereal::make_nvp("foo", wrap_eigen(foo)));

for a member `foo` that is of type Eigen::Matrix?

For my own nested types this should work, since I would just put the "wrap_eigen" everywhere there is an Eigen member, but using e.g. STL types like std::map<int, Eigen::Matrix<...>> would require me to also implement wrap_eigen() for STL containers, right? Or am I missing a simpler way for that case?

I'll give it a try.

Best,
Nikolaus

Julius Rapp

Aug 1, 2019, 7:37:26 AM
to cereal serialization library
> Do you share my assessment that having different definitions of "serialize" with the same signature but different definition in different translation units that get linked into the same binary is problematic?



Yes, my intuition tells me that such a situation is problematic. However, I am not an expert in this: It seems to me that the one-definition rule (ODR) applies to inline functions with external linkage only. I am not sure if it is legal to declare and define *static* inline functions in two different cpp files. Alternatively, I wonder if anonymous namespaces in the cpp files can help.

In conclusion: I can not give you a definitive answer here. All I want to share is the idea of a workaround (wrapper around Eigen::Matrix in namespace of project A).



> By serialization-only wrapper you mean that the datastructures are unchanged, but the project A serialization functions do something like
>
> archive(cereal::make_nvp("foo", wrap_eigen(foo)));
>
> for a member `foo` that is of type Eigen::Matrix?



Yes, that is one possibility I had in mind, and that is what I described as a serialization-only wrapper. You found the disadvantage of that:



> [...] using e.g. stl types like std::map<int, Eigen::Matrix<...>> would require me to also implement wrap_eigen() for stl containers, right? Or am I missing a simpler way for that case?



Yes, you are right. It took me a while to understand that you have two small wrappers in mind: One for Eigen::Matrix and one for std::map<int, Eigen::Matrix>. I can imagine that it is a useful solution if there are only a couple of different containers around Eigen::Matrix that shall be serialized.

I also considered adding an overload/"better match" for std::map serialization (if the mapped_type matches Eigen::Matrix) in project A, but inside the ::cereal namespace. However, that feels fragile because the default std::map serialization is part of another project. It also asks for more trouble of the original kind by adding stuff to the cereal namespace.

If there are more and more use cases of Eigen::Matrix serialization inside different containers then at some point the ProjectA::EigenMatrixWrapper (publicly derived from Eigen::Matrix) may become simpler. The latter solves the std::map issue for free because it becomes std::map<int, ProjectA::EigenMatrixWrapper>.
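
To illustrate why a derived wrapper type handles the std::map case without extra container code, here is a small sketch under the same kind of assumptions as before: `ToyOutputArchive`, `third_party::Matrix`, `project_a::MatrixA`, the generic map `save`, and both formats are hypothetical and do not use cereal.

```cpp
#include <cassert>
#include <map>
#include <sstream>
#include <string>
#include <vector>

// Toy archive (hypothetical, not cereal); `save` is found by ADL.
struct ToyOutputArchive {
  std::ostringstream out;
  template <class T> void operator()(const T& value) { save(*this, value); }
  template <class T> void write(const T& v) { out << v << ' '; }
};

// Generic map support, written once and unaware of any matrix type.
template <class Archive, class K, class V>
void save(Archive& ar, const std::map<K, V>& m) {
  ar.write(m.size());
  for (const auto& kv : m) {
    ar.write(kv.first);
    ar(kv.second);  // dispatches on the static type V
  }
}

namespace third_party {  // stands in for Eigen
struct Matrix {
  int rows = 0;
  int cols = 0;
  std::vector<double> data;
};
// "Project B" format (assumed): rows, cols, coefficients.
template <class Archive>
void save(Archive& ar, const Matrix& m) {
  ar.write(m.rows);
  ar.write(m.cols);
  for (double d : m.data) ar.write(d);
}
}  // namespace third_party

namespace project_a {
// Derived type used wherever project A wants its own format.
struct MatrixA : third_party::Matrix {};
// "Project A" format (assumed): coefficient count, coefficients.
// An exact match on MatrixA beats the derived-to-base conversion above.
template <class Archive>
void save(Archive& ar, const MatrixA& m) {
  ar.write(m.data.size());
  for (double d : m.data) ar.write(d);
}
}  // namespace project_a

// Fills a one-entry map with the given matrix type and serializes it.
template <class M>
std::string save_map_of() {
  M m;
  m.rows = 1;
  m.cols = 2;
  m.data = {3.5, 4.5};
  std::map<int, M> matrices;
  matrices[7] = m;
  ToyOutputArchive ar;
  ar(matrices);
  return ar.out.str();
}
```

Here the same generic map code produces either format, depending only on the mapped type. The cost noted elsewhere in the thread remains: every container in project A has to name the derived type.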



I believe this is all I can contribute here. I am interested to follow your progress, so feel free to share your findings.

Best,
Julius

Nikolaus Demmel

Aug 1, 2019, 11:00:29 AM
to cereal serialization library


On Thursday, August 1, 2019 at 1:37:26 PM UTC+2, Julius Rapp wrote:
>> Do you share my assessment that having different definitions of "serialize" with the same signature but different definition in different translation units that get linked into the same binary is problematic?
>
> Yes, my intuition tells me that such a situation is problematic. However, I am not an expert in this: It seems to me that the one-definition rule (ODR) applies to inline functions with external linkage only. I am not sure if it is legal to declare and define *static* inline functions in two different cpp files. Alternatively, I wonder if anonymous namespaces in the cpp files can help.
>
> In conclusion: I can not give you a definitive answer here. All I want to share is the idea of a workaround (wrapper around Eigen::Matrix in namespace of project A).

Ah, I think you might be right about external linkage. I think two identical functions with internal linkage should be allowed, so if I can make both definitions of the serializers have internal linkage (using an anonymous namespace or static), and strictly separate their usage into distinct translation units, I think it should be ok. I just haven't tried whether "cereal::serialize" can be made static (or put into an anonymous namespace inside namespace cereal) and still participate in the template specialization in the same way. But maybe it just works. I'll give that a try. That would overall maybe be the easiest solution (for cases like mine, where you can change the source code of both projects, but don't want to change the serialization format in either project).
 



>> By serialization-only wrapper you mean that the datastructures are unchanged, but the project A serialization functions do something like
>>
>> archive(cereal::make_nvp("foo", wrap_eigen(foo)));
>>
>> for a member `foo` that is of type Eigen::Matrix?
>
> Yes, that is one possibility I had in mind, and that is what I described as a serialization-only wrapper. You found the disadvantage of that:
>
>> [...] using e.g. stl types like std::map<int, Eigen::Matrix<...>> would require me to also implement wrap_eigen() for stl containers, right? Or am I missing a simpler way for that case?
>
> Yes, you are right. It took me a while to understand that you have two small wrappers in mind: One for Eigen::Matrix and one for std::map<int, Eigen::Matrix>. I can imagine that it is a useful solution if there are only a couple of different containers around Eigen::Matrix that shall be serialized.

Yes, it would be a separate wrapper for every container, so this can become cumbersome if you have many, and slow if you have large containers, since you need to make a full copy for both loading and saving.
 

> I also considered adding an overload/"better match" for std::map serialization (if the mapped_type matches Eigen::Matrix) in project A, but inside the ::cereal namespace. However, that feels fragile because the default std::map serialization is part of another project. It also asks for more trouble of the original kind by adding stuff to the cereal namespace.

Ah ok, but then if I go back to adding custom specializations for STL containers, maybe the less fragile option is my original proposal of having a MyJSONInputArchive, whose main downside was that it requires providing specializations for the containers in use as well (to propagate the custom MyJSONInputArchive type down to where the Eigen::Matrix options are).
 

> If there are more and more use cases of Eigen::Matrix serialization inside different containers then at some point the ProjectA::EigenMatrixWrapper (publicly derived from Eigen::Matrix) may become simpler. The latter solves the std::map issue for free because it becomes std::map<int, ProjectA::EigenMatrixWrapper>.

Yes, that would be simpler. However, it would also be quite intrusive, since Eigen::Matrix is used heavily throughout both projects, and we also interface with other third-party libraries that have Eigen members which we cannot change or wrap like that.
 
 
> I believe this is all I can contribute here. I am interested to follow your progress, so feel free to share your findings.


Thanks, these are very helpful pointers. I'll report back how it goes. 

Nikolaus Demmel

Aug 1, 2019, 1:03:00 PM
to cereal serialization library


On Thursday, August 1, 2019 at 5:00:29 PM UTC+2, Nikolaus Demmel wrote:



> Ah, I think you might be right about external linkage. I think two identical functions with internal linkage should be allowed, so if I can make both definitions of the serializers have internal linkage (using an anonymous namespace or static), and strictly separate their usage into distinct translation units, I think it should be ok. I just haven't tried whether "cereal::serialize" can be made static (or put into an anonymous namespace inside namespace cereal) and still participate in the template specialization in the same way. But maybe it just works. I'll give that a try. That would overall maybe be the easiest solution (for cases like mine, where you can change the source code of both projects, but don't want to change the serialization format in either project).
 

Just a follow-up on this idea. It turns out that this is probably not possible, since it looks like you cannot specialize a template that has external linkage with a static function, and a template in an anonymous namespace is not a specialization at all, see https://godbolt.org/z/TvlxKT

Does `serialize(...)` have to be in `cereal` namespace, or could it also be somewhere else, maybe in the Eigen namespace?

Nikolaus Demmel

Aug 5, 2019, 6:31:59 PM
to cereal serialization library
Turns out I was testing the wrong thing. `serialize` functions for different types are in general not template specializations, but overloads. Therefore, adding `static` to some of them works just fine, i.e. in my case to the one for Eigen::Matrix.

As the documentation mentions (https://uscilab.github.io/cereal/serialization_functions.html), you can indeed have serialize live in the cereal namespace or, in my case, in the Eigen namespace (it seems ADL is used). For both variants, adding static works fine (which makes sense).

So I think this solves my issue. I just have to make sure that for those types I use static functions in both projects, which guarantees they have internal linkage, and then I can have different definitions in different translation units. The only thing to be careful about is not to include both headers with the conflicting definitions in the same translation unit. But I would assume the compiler then complains about a function redefinition.

Thanks again for the input. You put me on the right track!

Nikolaus Demmel

Aug 26, 2019, 10:57:40 AM
to cereal serialization library
Actually, I have to take my statement back. This unfortunately does not solve the issue. In Debug mode in particular, the wrong serialize function from another translation unit gets invoked. In Release mode it doesn't happen, probably due to inlining / optimization. But it probably still is UB, so it might break at any moment.

My analysis is the following: I think the multiple static serialize (or save / load) functions are fine by themselves, but serialization also involves other template functions, in particular InputArchive::operator(), InputArchive::process and InputArchive::processImpl. Having different serialize / load / save implementations for the same types in different translation units breaks the ODR for those functions, and in Debug mode the linker actually picks the wrong one in my specific case.

I don't see how this could be fixable, so unfortunately this means that we're back to the wrapper class solution.

Nikolaus Demmel

Aug 27, 2019, 1:02:23 PM
to cereal serialization library
I actually now solved my issue by changing the code of both projects after all and breaking compatibility with existing files.

However, maybe this is still relevant for others or in the future. A solution came to mind: cereal could allow users to define a macro like CEREAL_NAMESPACE before the first cereal header is included, which would move all of cereal into a custom namespace and thus avoid the symbol clash or ODR violations when using it in two or more translation units. RapidJSON actually uses this approach, apparently for the same reason: https://github.com/Tencent/rapidjson/blob/2648a732dbb8c7fa211e41616071202bfbd09c77/include/rapidjson/rapidjson.h#L81-L125
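
The namespace-macro idea can be sketched in a single file as follows. This is only an illustration under stated assumptions: `TOYLIB_NAMESPACE`, `toylib`, `project_a_toylib`, `describe` and `use_it` are hypothetical names, and cereal itself does not provide such a macro according to this thread.

```cpp
#include <cassert>
#include <string>

// Would normally be set by the including project (for example on the
// compiler command line) before the library header is first included.
#define TOYLIB_NAMESPACE project_a_toylib

// ---- contents of a hypothetical header "toylib.hpp" ----
#ifndef TOYLIB_NAMESPACE
#define TOYLIB_NAMESPACE toylib  // default when the user defines nothing
#endif

namespace TOYLIB_NAMESPACE {
// Every library symbol lives in the configurable namespace.
template <class T>
std::string describe(const T&) {
  return "toylib in a private namespace";
}
}  // namespace TOYLIB_NAMESPACE
// ---- end of header ----

inline std::string use_it() {
  // The instantiated symbol is project_a_toylib::describe<int>, which is a
  // different entity from another project's toylib::describe<int>.
  return project_a_toylib::describe(42);
}
```

With such a macro, each project's instantiations of the library templates would get distinct names, so differing `serialize` overloads would no longer lead to conflicting definitions of the same function template specialization.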