Faster than Boost, Cereal and Protobuf

woodb...@gmail.com

Aug 21, 2017, 12:34:19 AM

I'm happy to report that the C++ Middleware Writer (CMW)
is faster than the serialization library in Boost, Cereal
and Protobuf in this benchmark:
https://github.com/thekvs/cpp-serializers

The CMW produced a smaller serialized size than
Capnproto or Cereal:

Capnproto 17,768
Cereal 17,416
CMW 16,712

I'm happy to give demos of the software. If you have
a 2017 C++ compiler, it normally takes about ten minutes.
The first step is to download/clone this:
https://github.com/Ebenezer-group/onwards

Brian
Ebenezer Enterprises - In G-d we trust.
http://webEbenezer.net

Daniel

Aug 21, 2017, 12:43:27 AM

On Monday, August 21, 2017 at 12:34:19 AM UTC-4, woodb...@gmail.com wrote:
> I'm happy to report that the C++ Middleware Writer (CMW)
> is faster than the serialization library in Boost, Cereal
> and Protobuf in this benchmark:
> https://github.com/thekvs/cpp-serializers
>

Am I missing something? I don't see CMW in that benchmark.

Daniel
https://github.com/danielaparker/jsoncons

woodb...@gmail.com

Aug 21, 2017, 1:14:38 AM

On Sunday, August 20, 2017 at 11:43:27 PM UTC-5, Daniel wrote:
> Am I missing something? I don't see CMW in that benchmark.
>

No, it's not listed there. I ran the benchmark locally.

David Brown

Aug 21, 2017, 3:27:42 AM

Then how about giving a list of the results here? Otherwise it looks
like you are just cherry-picking - saying you are faster than Boost,
Cereal and Protobuf but "forgetting" to mention yas, thrift, msgpack and
the others listed on that site. Benchmarks can give an idea of relative
speeds and sizes, but you have to provide the numbers - not your
conclusions, which will be highly biased (or at least assumed to be
highly biased) since you are the producer of one of the competing libraries.

Of course, there are all sorts of feature and requirements differences
between these libraries which are usually far more important than speed
or size. It would be helpful to have a comparison there too (the github
project is missing this information, and is basically useless for anyone
trying to consider choosing a serialisation library).

Daniel

Aug 21, 2017, 7:24:41 AM

I would suggest cloning the cpp-serializers master branch, adding CMW, and submitting a pull request.

If your project is not accepted, you can still send a link to the cloned github project here, so people can see if they can reproduce your results, should they wish to do so.

Daniel
https://github.com/danielaparker/jsoncons

Öö Tiib

Aug 21, 2017, 8:43:05 AM

It looks like half of a test anyway. What I would expect is a serialization
speed comparison, a size comparison on the wire (need for bandwidth) and a
deserialization speed comparison. A thing has to be tested from end to end,
otherwise the results are likely meaningless, with gaps for cheating.

woodb...@gmail.com

Aug 21, 2017, 11:09:13 AM

On Monday, August 21, 2017 at 2:27:42 AM UTC-5, David Brown wrote:
>
> Then how about giving a list of the results here? Otherwise it looks
> like you are just cherry-picking - saying you are faster than Boost,
> Cereal and Protobuf but "forgetting" to mention yas, thrift, msgpack and

If it's faster than Cereal, it's faster than thrift and msgpack.

> the others listed on that site. Benchmarks can give an idea of relative
> speeds and sizes, but you have to provide the numbers - not your
> conclusions, which will be highly biased (or at least assumed to be
> highly biased) since you are the producer of one of the competing libraries.

The size I provided is a number.

>
> Of course, there are all sorts of feature and requirements differences
> between these libraries which are usually far more important than speed
> or size. It would be helpful to have a comparison there too (the github
> project is missing this information, and is basically useless for anyone
> trying to consider choosing a serialisation library).

The CMW automates the creation of serialization functions.
Here's another serialization library:
https://github.com/eliasdaler/MetaStuff

His approach requires you to maintain functions like this:

template <>
inline auto registerMembers<Person>()
{
    return members(
        member("age", &Person::getAge, &Person::setAge),
        member("name", &Person::getName, &Person::setName),
        member("salary", &Person::salary),
        member("favouriteMovies", &Person::favouriteMovies)
    );
}

With the CMW you don't have to write code like that.

Other than the CMW, I'm not aware of other libraries that have
support for plf::colony or std::string_view.

Brian
Ebenezer Enterprises
http://webEbenezer.net

woodb...@gmail.com

Aug 21, 2017, 11:23:25 AM

I did send an email to the author of the benchmark telling him
my serialized size and how it did on the timing. He hasn't
replied.

What I could do is publish the code I used in my repo.

woodb...@gmail.com

Aug 21, 2017, 11:27:43 AM

On Monday, August 21, 2017 at 7:43:05 AM UTC-5, Öö Tiib wrote:
>
> It looks like half of a test anyway. What I would expect is a serialization
> speed comparison, a size comparison on the wire (need for bandwidth) and a
> deserialization speed comparison. A thing has to be tested from end to end,
> otherwise the results are likely meaningless, with gaps for cheating.

I don't see a big difference between his benchmark and what
you wrote. He provides the serialized sizes and the combined
(serialization and deserialization) times. You want to see
it broken down more? In my opinion it's an OK benchmark.

Daniel

Aug 21, 2017, 11:32:52 AM

Why would he? Either you send him a pull request, which is what people do if they want to be included in somebody else's benchmarks, or there's nothing to reply to.

Daniel

woodb...@gmail.com

Aug 21, 2017, 11:35:39 AM

On Monday, August 21, 2017 at 10:09:13 AM UTC-5, woodb...@gmail.com wrote:
> On Monday, August 21, 2017 at 2:27:42 AM UTC-5, David Brown wrote:
> >
> > Then how about giving a list of the results here? Otherwise it looks
> > like you are just cherry-picking - saying you are faster than Boost,
> > Cereal and Protobuf but "forgetting" to mention yas, thrift, msgpack and
>
> If it's faster than Cereal, it's faster than thrift and msgpack.
>
> > the others listed on that site. Benchmarks can give an idea of relative
> > speeds and sizes, but you have to provide the numbers - not your
> > conclusions, which will be highly biased (or at least assumed to be
> > highly biased) since you are the producer of one of the competing libraries.
>
> The size I provided is a number.

Cereal 17,416
CMW 16,712

One difference is probably due to my using a variable-length integer
for the string lengths. There's a vector of 100 strings in the test.
I'm not sure if Cereal is using 4 byte or 8 byte string lengths. If
it's using 8 bytes and I only need 1 byte for each string, that's 700
bytes which is close to the difference in sizes between my approach
and Cereal.

I use 4 byte integers for the lengths of the vectors. Some of the
others may use 8. In that sense they are more general than my approach,
but I'm not sure how often that matters. In this test there are two
vectors, so it would be an 8 byte difference.
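
To make the arithmetic above concrete: 17,416 - 16,712 = 704 bytes, and 100 strings times (8 - 1) bytes of length prefix is 700 bytes. The sketch below shows one common variable-length scheme (LEB128-style, 7 payload bits per byte). It is an assumption for illustration only; the thread does not say which encoding the CMW actually uses, and the names putVarint, varintSize and prefixSavings are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Append `value` to `out` as a LEB128-style variable-length integer:
// 7 payload bits per byte, high bit set on every byte except the last.
// Generic scheme for illustration, not necessarily the CMW's format.
inline void putVarint(std::vector<std::uint8_t>& out, std::uint64_t value) {
    while (value >= 0x80) {
        out.push_back(static_cast<std::uint8_t>((value & 0x7F) | 0x80));
        value >>= 7;
    }
    out.push_back(static_cast<std::uint8_t>(value));
}

// Number of bytes putVarint emits for `value`.
inline std::size_t varintSize(std::uint64_t value) {
    std::size_t n = 1;
    while (value >= 0x80) {
        value >>= 7;
        ++n;
    }
    return n;
}

// Bytes saved over `count` strings when a fixed-width length prefix of
// `fixedWidth` bytes is replaced by a varint, assuming every string has
// the same length `len`.
inline std::size_t prefixSavings(std::size_t count, std::size_t fixedWidth,
                                 std::uint64_t len) {
    assert(fixedWidth >= varintSize(len));
    return count * (fixedWidth - varintSize(len));
}
```

With 100 strings that are each shorter than 128 bytes, prefixSavings(100, 8, len) gives the 700 bytes mentioned above, and prefixSavings(100, 4, len) gives 300 if the other library used 4 byte lengths instead.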

Daniel

Aug 21, 2017, 11:54:07 AM

On Monday, August 21, 2017 at 11:35:39 AM UTC-4, woodb...@gmail.com wrote:
>
> One difference is probably due to my using a variable-length integer
> for the string lengths. There's a vector of 100 strings in the test.
> I'm not sure if Cereal is using 4 byte or 8 byte string lengths. If
> it's using 8 bytes and I only need 1 byte for each string, that's 700
> bytes which is close to the difference in sizes between my approach
> and Cereal.
>

Don't know about Cereal, but most binary representations use variable length
encodings for the lengths of strings, arrays or objects, see for example
MessagePack or cbor. For short strings or small integers, they typically
combine the data type code and the length into one byte.
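
As a rough illustration of that last point, the sketch below builds the single header byte that CBOR and MessagePack use for a short string. The bit layouts follow the published formats (CBOR major type 3 with lengths 0 to 23 in the low five bits, MessagePack fixstr with lengths 0 to 31); the function names are hypothetical and do not come from any library.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// CBOR: major type 3 (text string) in the top 3 bits, length 0..23 in the
// low 5 bits. Longer strings need additional length bytes, which this
// sketch does not cover.
inline std::uint8_t cborShortTextHeader(std::size_t len) {
    assert(len <= 23);
    return static_cast<std::uint8_t>((3u << 5) | len);
}

// MessagePack fixstr: bit pattern 101xxxxx, length 0..31 in the low 5 bits.
inline std::uint8_t msgpackFixstrHeader(std::size_t len) {
    assert(len <= 31);
    return static_cast<std::uint8_t>(0xA0u | len);
}
```

For a five-character string the header is 0x65 in CBOR and 0xA5 in MessagePack, so type and length together cost one byte.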

A big obstacle you'll face in getting users for CMW is the fact that you're
using a proprietary data format that is known only to you. For example, if
you were to use cbor instead, somebody could create binary data encodings
with your software and read them in a python application with no additional
work.

Daniel

woodb...@gmail.com

Aug 21, 2017, 3:06:56 PM

On Monday, August 21, 2017 at 10:54:07 AM UTC-5, Daniel wrote:
> Don't know about Cereal, but most binary representations use variable length
> encodings for the lengths of strings, arrays or objects, see for example
> MessagePack or cbor. For short strings or small integers, they typically
> combine the data type code and the length into one byte.

I don't need data type codes. At least not in general.

>
> A big obstacle you'll face in getting users for CMW is the fact that you're
> using a proprietary data format that is known only to you.

The format is not a secret. Others like the serialization
library in Boost or Cereal don't use cbor.

> For example, if
> you were to use cbor instead, somebody could create binary data encodings
> with your software and read them in a python application with no additional
> work.
>

I hope for something like that in the future, but will let
things shake out a little before working on that.

Brian

Öö Tiib

Aug 21, 2017, 5:05:35 PM

Why combine the times of serializing and deserializing, which often
happen on different hosts and sometimes even on different hardware?

woodb...@gmail.com

Aug 21, 2017, 5:33:57 PM

Yes, but doing it the way he does simplifies things and
makes it so you can get some results with just one machine.

Öö Tiib

Aug 21, 2017, 6:43:23 PM

So measuring serializing and deserializing separately still somehow
makes things too complicated and also can't be done on just one machine?
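
For what it's worth, a minimal sketch of timing the two phases separately on one machine could look like the following. The toySerialize and toyDeserialize functions are hypothetical stand-ins, not the API of any library discussed here, and the harness is an assumption about how one might structure such a test rather than how cpp-serializers does it.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>
#include <cstring>
#include <vector>

struct PhaseTimes {
    std::chrono::nanoseconds serialize{0};
    std::chrono::nanoseconds deserialize{0};
};

// Hypothetical stand-in for a library's serialize call: copies a vector
// of 64-bit integers into a byte buffer.
inline std::vector<std::uint8_t> toySerialize(const std::vector<std::int64_t>& v) {
    std::vector<std::uint8_t> buf(v.size() * sizeof(std::int64_t));
    if (!buf.empty()) std::memcpy(buf.data(), v.data(), buf.size());
    return buf;
}

// Hypothetical stand-in for the matching deserialize call.
inline std::vector<std::int64_t> toyDeserialize(const std::vector<std::uint8_t>& buf) {
    std::vector<std::int64_t> v(buf.size() / sizeof(std::int64_t));
    if (!v.empty()) std::memcpy(v.data(), buf.data(), v.size() * sizeof(std::int64_t));
    return v;
}

// Run `iterations` round trips, accumulating each phase in its own counter
// so the two costs can be reported separately.
inline PhaseTimes timePhases(const std::vector<std::int64_t>& input, int iterations) {
    using Clock = std::chrono::steady_clock;
    using std::chrono::duration_cast;
    using std::chrono::nanoseconds;
    PhaseTimes t;
    for (int i = 0; i < iterations; ++i) {
        const auto t0 = Clock::now();
        const auto bytes = toySerialize(input);
        const auto t1 = Clock::now();
        const auto output = toyDeserialize(bytes);
        const auto t2 = Clock::now();
        t.serialize += duration_cast<nanoseconds>(t1 - t0);
        t.deserialize += duration_cast<nanoseconds>(t2 - t1);
        assert(output == input);  // round trip must reproduce the input
    }
    return t;
}
```

Reporting t.serialize and t.deserialize side by side adds two extra columns to the results but needs no second host.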

woodb...@gmail.com

Aug 21, 2017, 11:51:23 PM

No. I'm not opposed to this idea, but it makes for
more work for the benchmark author in terms of coding
and presenting the results.

Dombo

Aug 30, 2017, 2:16:35 PM

On 21-Aug-17 at 6:43, Daniel wrote:

> https://github.com/danielaparker/jsoncons

Slightly off-topic; a year ago I was looking for a C++ JSON library for
a quick proof-of-concept. Though there are plenty to choose from, I
didn't find any I liked. So I wrote my own and considered putting it on
GitHub if I ever got around to polishing it a bit.

The link above made me curious what you came up with. For some reason
this library hasn't popped up on my radar before. I must say I really
liked what I saw. In many ways your library is very similar to what I
created, it also has a couple of things that I planned to do if I had
the time (or need) and things I hadn't even considered. If I had found
your library a year ago I most likely wouldn't have bothered to create
my own, but if I ever need a JSON library again I know where to look
now. Nice work!

woodb...@gmail.com

Sep 3, 2017, 12:32:26 AM

On Monday, August 21, 2017 at 6:24:41 AM UTC-5, Daniel wrote:

> I would suggest cloning the cpp-serializers master branch, adding CMW, and submitting a pull request.

I don't think I'm going to do that. One of the weaknesses
of the benchmark in my opinion is how it lumps everything
together.
https://github.com/thekvs/cpp-serializers/blob/master/benchmark.cpp

One of the hallmarks of the C++ Middleware Writer over the
years has been smaller test programs than parallel competitor
programs. While this benchmark is helpful to me in terms of
the run times, it would be more helpful if there were separate
programs for each of the libraries being tested. So rather
than creating a pull request, I'm thinking about creating an
issue with the benchmark.

Brian
Ebenezer Enterprises - Enjoying programming again.
http://webEbenezer.net

woodb...@gmail.com

Sep 3, 2017, 1:20:11 AM

If K. Sorokin accepts the issue I've created now and
creates separate programs for each library, I'd be more
inclined to create a pull request as you suggested.

woodb...@gmail.com

Sep 3, 2017, 1:34:54 AM

On Saturday, September 2, 2017 at 11:32:26 PM UTC-5, woodb...@gmail.com wrote:

> On Monday, August 21, 2017 at 6:24:41 AM UTC-5, Daniel wrote:
>
> > I would suggest cloning the cpp-serializers master branch, adding CMW, and submitting a pull request.
>
> I don't think I'm going to do that. One of the weaknesses
> of the benchmark in my opinion is how it lumps everything
> together.
> https://github.com/thekvs/cpp-serializers/blob/master/benchmark.cpp
>
> One of the hallmarks of the C++ Middleware Writer over the
> years has been smaller test programs than parallel competitor

By smaller I'm primarily talking about the sizes of the
text segments of the programs.

Ian Collins

Sep 3, 2017, 4:31:46 AM

Something no one other than you worries about...

--
Ian

woodb...@gmail.com

Sep 3, 2017, 11:58:43 AM

Something few have cared about.