How much faster than protobuf?


Igor Gatis

Sep 9, 2013, 9:56:09 AM
to capn...@googlegroups.com
Hi Kenton,

I wonder how much faster than protobuf capnproto really is. I'm thinking of the scenario where I want to write a struct to a file or send it over the network. IO will be the bottleneck, especially for large structs. Since capnproto serialization takes more bytes (while unpacked) than protobuf serialization, will it still be faster? What happens when packing is used? Is the time spent packing and writing still less than the time spent writing a protobuf?

For the scenario where shared memory is used for IPC, I guess there is no performance difference between capnproto and protobuf. Right?

-Gatis

Kenton Varda

Sep 9, 2013, 2:41:04 PM
to Igor Gatis, capnproto
Hi Igor,

On Mon, Sep 9, 2013 at 6:56 AM, Igor Gatis <igor...@gmail.com> wrote:
Hi Kenton,

I wonder how much faster than protobuf capnproto really is. I'm thinking of the scenario where I want to write a struct to a file or send it over the network. IO will be the bottleneck, especially for large structs. Since capnproto serialization takes more bytes (while unpacked) than protobuf serialization, will it still be faster? What happens when packing is used? Is the time spent packing and writing still less than the time spent writing a protobuf?

Actually, I/O is not the bottleneck as often as you might think.  In a modern datacenter, machines are likely to have 10 Gbit NICs.  It's actually pretty hard for a CPU to saturate such a pipe if it's doing any significant amount of work.  There are servers at Google, for instance, that spend a third of their CPU time encoding or decoding protobufs while their NICs sit mostly idle.

Of course, when transmitting over a lower-bandwidth pipe (like the general internet), I/O will become an issue.  This is where packing comes in.  Yes, Cap'n Proto packing is faster than Protobuf encoding.  It is faster because it is implemented as a tight loop that inputs arbitrary bytes and outputs arbitrary bytes, branching only once per (unpacked) word, whereas Protobuf encoding involves generating specialized code for every type with lots of branching, function call overhead, and instruction cache pressure.
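
To make the "one tag byte per word" idea concrete, here is a minimal Python sketch of the packing scheme as I understand it from the published Cap'n Proto encoding description. The function names `pack` and `unpack` are hypothetical, and this is not the library's implementation. It only illustrates the byte format, not the speed of the real C++ loop. One simplification is labeled in the code: after an all-nonzero word it always reports zero verbatim words, which should still be valid output but gives up an optimization for incompressible data.

```python
WORD = 8  # Cap'n Proto messages are sequences of 8-byte words


def pack(data: bytes) -> bytes:
    """Pack whole 8-byte words: one tag byte per word, then its nonzero bytes."""
    if len(data) % WORD != 0:
        raise ValueError("input must be a whole number of 8-byte words")
    words = [data[i:i + WORD] for i in range(0, len(data), WORD)]
    out = bytearray()
    i = 0
    while i < len(words):
        word = words[i]
        i += 1
        # Tag byte: bit n is set when byte n of the word is nonzero.
        tag = 0
        for n, b in enumerate(word):
            if b:
                tag |= 1 << n
        out.append(tag)
        out.extend(b for b in word if b)
        if tag == 0x00:
            # An all-zero word is followed by a count of further zero words.
            run = 0
            while i < len(words) and run < 255 and not any(words[i]):
                run += 1
                i += 1
            out.append(run)
        elif tag == 0xFF:
            # Simplification (assumption): always say "0 verbatim words follow".
            out.append(0)
    return bytes(out)


def unpack(packed: bytes) -> bytes:
    """Inverse of pack(); also accepts verbatim runs after a 0xFF tag."""
    out = bytearray()
    pos = 0
    while pos < len(packed):
        tag = packed[pos]
        pos += 1
        word = bytearray(WORD)
        for n in range(WORD):
            if tag & (1 << n):
                word[n] = packed[pos]
                pos += 1
        out.extend(word)
        if tag == 0x00:
            run = packed[pos]
            pos += 1
            out.extend(bytes(WORD * run))
        elif tag == 0xFF:
            count = packed[pos]
            pos += 1
            out.extend(packed[pos:pos + WORD * count])
            pos += WORD * count
    return bytes(out)


sample = bytes.fromhex("080000000300020019000000aa010000")
print(pack(sample).hex())  # 510803023119aa01
```

Note that neither function knows anything about the message's schema: the same loop handles every type, which is the contrast with per-type generated encoders drawn above.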

The exact speed difference depends on the use case, and you can certainly construct use cases where either system wins.  But in general my benchmarks seem to indicate that transmitting Cap'n Proto messages over a local pipe with packing is ~30% faster than transmitting Protobufs over a pipe, with the message sizes being roughly comparable (each system wins some cases).  And this is without much optimization effort having been applied yet on the Cap'n Proto side, while Protobuf has been intensely micro-optimized.
 
For the scenario where shared memory is used for IPC, I guess there is no performance difference between capnproto and protobuf. Right?

Completely the opposite!  This is exactly the case where Cap'n Proto is actually infinity times faster.  The "bandwidth" of shared memory is effectively infinite, because you don't have to copy the bytes at all.  But to send a protobuf through shared memory, you must encode it on the sending end and decode it again on the receiving end, because protobuf's in-memory objects are neither contiguous nor relocatable.  With Cap'n Proto, the in-memory representation is contiguous and relocatable, so it can be shared directly, with no copying.
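
A minimal Python sketch of what "contiguous and relocatable" buys you, assuming a toy record layout that is NOT Cap'n Proto's real encoding. The layout, `HEADER`, `write_record`, `read_id`, and `read_name` are all hypothetical, and an anonymous `mmap` stands in for a real shared-memory segment. The record is built in place, fields are read straight out of the buffer with no decode step, and because the internal offset is relative to the record start, the same bytes remain readable at a different address.

```python
import mmap
import struct

# Hypothetical toy layout (not Cap'n Proto's actual wire format):
#   bytes 0..7   : int64 id, little-endian
#   bytes 8..11  : uint32 offset of the name, relative to the record start
#   bytes 12..15 : uint32 length of the name in bytes
HEADER = struct.Struct("<qII")


def write_record(buf, base, record_id, name):
    """Build the record in place inside buf at byte offset base; return its size."""
    raw = name.encode("utf-8")
    HEADER.pack_into(buf, base, record_id, HEADER.size, len(raw))
    start = base + HEADER.size
    buf[start:start + len(raw)] = raw
    return HEADER.size + len(raw)


def read_id(buf, base):
    """Read one field directly from the buffer; nothing else is parsed."""
    return struct.unpack_from("<q", buf, base)[0]


def read_name(buf, base):
    """Follow the relative offset to the name bytes."""
    _, offset, length = HEADER.unpack_from(buf, base)
    return bytes(buf[base + offset:base + offset + length]).decode("utf-8")


# Anonymous mapping standing in for a segment shared between two processes.
shared = mmap.mmap(-1, 4096)
size = write_record(shared, 0, 42, "capnp")

# A reader that maps the same memory needs no decode step.
print(read_id(shared, 0), read_name(shared, 0))

# Relocatable: the same bytes at a different position still read correctly,
# because the name offset is relative to the record, not an absolute address.
elsewhere = bytearray(128)
elsewhere[24:24 + size] = shared[0:size]
print(read_id(elsewhere, 24), read_name(elsewhere, 24))
```

An object graph that uses absolute pointers into a heap could not be handed over this way, which is why the protobuf case needs an encode and a decode on either side.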

-Kenton

Igor Gatis

Sep 10, 2013, 11:02:30 AM
to Kenton Varda, capnproto

Makes sense. It's exciting.

Reading the encoding documentation, I could not really picture what capnproto objects look like in memory. Perhaps an example would help (like the protobuf docs have).

Also, browsing the source code, I could not find examples of generated code. I think that would help people willing to add support for other languages.

Andreas Stenius

Sep 10, 2013, 2:19:34 PM
to Igor Gatis, capnproto
2013/9/10 Igor Gatis <igor...@gmail.com>

Makes sense. It's exciting.

Reading the encoding documentation, I could not really picture what capnproto objects look like in memory. Perhaps an example would help (like the protobuf docs have).

Also, browsing the source code, I could not find examples of generated code. I think that would help people willing to add support for other languages.

In the capnproto sources, you can look at the non-trivial generated C++ code for schema.capnp here:

For sample messages in different formats (flat, packed, text, etc..) take a look at the test data here:

They're not annotated, of course, which would've been helpful ;)

If you have the 010-Editor [1] btw, you can use a binary template when looking at the unpacked binary message format (I've attached a version which ought to work for most cases, except that I've not implemented far pointers).
It shouldn't be too hard to implement a template for the packed version as well.


capnp.bt

Igor Gatis

Sep 10, 2013, 5:18:09 PM
to Andreas Stenius, capnproto
Thanks, Andreas. I've got to confess that looking at the generated code made me more confused and more curious.

Kenton Varda

Sep 11, 2013, 8:47:33 PM
to Igor Gatis, capnproto
Thanks for the suggestions, Igor.  I'll try to improve the docs for the next release.