What does zero-copy mean?


Wes Peters

Aug 7, 2014, 9:09:35 PM
to capn...@googlegroups.com
I've just finished reading your article comparing features of Cap'n Proto with protobufs, SBE, and flatbuffers.

I was involved in a selection trial for a serialization protocol two years ago, and we went with the wrong tool for all the "right" reasons: the boss had already made a decision that we should be using XML to exchange data. As you probably know, XML serialization in C++ is generally an abysmal mess; I managed to make the best of it by discovering the quite good XSD compiler. Eventually we switched from XML to Boost encoding for most uses, because the XML encoding is just way too slow, so now we have all the lovely overhead of describing messages in XSD without even getting the questionable benefits of XML. I wanted to use protobufs, but got shouted down because it wasn't XML.

So, this experience has left me with a quandary for the next project. I get to pick this time, and I say screw XML. So I'm out looking for serializers again.

One thing that leaped out at me in your chart: you point out that the "zero copy" serialization libraries don't allow you to use the protocol-compiler generated objects as mutable state. Doesn't this mean that they are in fact not zero-copy then? If you have to maintain the state in another object, then create an object from that for serialization, you have in fact copied the data, probably in a constructor. So what we've actually managed to do is complicate the copying process... or have I misread what you wrote?

Kenton Varda

Aug 8, 2014, 4:32:37 PM
to Wes Peters, capnproto
Hi Wes,

Yes, you can argue that certain use cases might require a copy where with protobufs they could be designed differently and avoid that copy.

In my experience, most real-world uses of protobufs construct the whole message just before sending. Whether or not this construction makes sense to call a "copy" depends on the use case, but in any case that's not the copy that Cap'n Proto is claiming to avoid.

If you are trying to store some in-memory state that changes over time, and you want to occasionally dump that state to disk / network, and you would normally have been happy keeping the state in a protobuf object, then, yes, with Cap'n Proto you now probably need to keep the state in some other structure and do an extra copy every time you dump it. However, this copy will still be a lot faster than protobuf encoding, since it's just a lot of loads and stores with few branches.
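To make the "loads and stores with few branches" point a little more concrete, here is a rough Python sketch contrasting a fixed-width copy with a protobuf-style varint encode. It is only an illustration under stated assumptions: the three-field record layout and the function names are hypothetical, and the fixed layout is not Cap'n Proto's actual wire format.

```python
import struct

# Hypothetical telemetry record: three unsigned 64-bit fields.
# Assumed layout for illustration only, not Cap'n Proto's real wire format.
FIXED_LAYOUT = struct.Struct("<QQQ")


def copy_fixed(sensor_id, timestamp, reading):
    """Fixed-width copy: each field goes to a known offset, no per-byte decisions."""
    return FIXED_LAYOUT.pack(sensor_id, timestamp, reading)


def encode_varint(value):
    """Base-128 varint (the integer encoding protobuf uses): branches every 7 bits."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # high bit set: more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def encode_varints(sensor_id, timestamp, reading):
    """Variable-width encode of the same three fields (field tags omitted)."""
    return b"".join(encode_varint(v) for v in (sensor_id, timestamp, reading))
```

The fixed-width version always produces 24 bytes and does no data-dependent branching; the varint version is more compact for small values but has to inspect each value as it goes.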

Actually, though, if you're careful, you can in fact store your state in a Cap'n Proto structure. You just have to understand how Cap'n Proto manages memory. If the changes you're making to your state don't ever involve removing whole objects from the message tree, then there's no problem. Or, if you're willing to occasionally do a sort of "garbage collection" pass by making a copy of the whole message into a new MessageBuilder, then that can also solve the problem -- and may in fact turn out to be a lot more efficient than Protobuf memory management. For instance, you could do this "GC" pass every time the space occupied by the message doubles. This would give you amortized performance that's likely better than relying on malloc(). But you have to understand what you're doing, which is why we don't usually recommend it. :)
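The "GC" idea described above can be sketched with a toy append-only arena in Python. This is not the Cap'n Proto API: `ArenaState`, its methods, and the 64-byte minimum threshold are all hypothetical, chosen only to show why compacting each time the space doubles keeps the amortized cost low.

```python
class ArenaState:
    """Toy append-only arena: replacing a value leaves the old bytes behind as garbage."""

    MIN_THRESHOLD = 64  # assumed floor so tiny arenas are not compacted constantly

    def __init__(self):
        self.buf = bytearray()   # stands in for the message's segment memory
        self.live = {}           # key -> (offset, length) of the current value
        self.size_after_gc = 0   # arena size right after the last compaction

    def set(self, key, value):
        self.live[key] = (len(self.buf), len(value))
        self.buf += value        # any previous value for key is now unreachable
        self._maybe_compact()

    def remove(self, key):
        self.live.pop(key, None)  # bytes stay in the arena until the next compaction

    def get(self, key):
        off, n = self.live[key]
        return bytes(self.buf[off:off + n])

    def _maybe_compact(self):
        # Run the "GC" pass once the arena has doubled since the last pass.
        if len(self.buf) >= 2 * max(self.size_after_gc, self.MIN_THRESHOLD):
            self.compact()

    def compact(self):
        """Copy only the live values into a fresh buffer, dropping the garbage."""
        fresh = bytearray()
        moved = {}
        for key, (off, n) in self.live.items():
            moved[key] = (len(fresh), n)
            fresh += self.buf[off:off + n]
        self.buf, self.live = fresh, moved
        self.size_after_gc = len(fresh)
```

Because each compaction copies only live data and is triggered only after the arena has doubled, the copying work is spread over many updates, much like growing a dynamic array.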

-Kenton




Wes Peters

Aug 9, 2014, 4:53:03 PM
to capn...@googlegroups.com, barna...@gmail.com


On Friday, August 8, 2014 1:32:37 PM UTC-7, Kenton Varda wrote:
Hi Wes,

Yes, you can argue that certain use cases might require a copy where with protobufs they could be designed differently and avoid that copy.

In my experience, most real-world uses of protobufs construct the whole message just before sending. Whether or not this construction makes sense to call a "copy" depends on the use case, but in any case that's not the copy that Cap'n Proto is claiming to avoid.

If you are trying to store some in-memory state that changes over time, and you want to occasionally dump that state to disk / network, and you would normally have been happy keeping the state in a protobuf object, then, yes, with Cap'n Proto you now probably need to keep the state in some other structure and do an extra copy every time you dump it. However, this copy will still be a lot faster than protobuf encoding, since it's just a lot of loads and stores with few branches.

That's our normal use case.  We do telemetry tracking and recording, so we spend a lot of time caching the current state of something and dumping it to a log file or database on time slices.
 
Actually, though, if you're careful, you can in fact store your state in a Cap'n Proto structure. You just have to understand how Cap'n Proto manages memory. If the changes you're making to your state don't ever involve removing whole objects from the message tree, then there's no problem. Or, if you're willing to occasionally do a sort of "garbage collection" pass by making a copy of the whole message into a new MessageBuilder, then that can also solve the problem -- and may in fact turn out to be a lot more efficient than Protobuf memory management. For instance, you could do this "GC" pass every time the space occupied by the message doubles. This would give you amortized performance that's likely better than relying on malloc(). But you have to understand what you're doing, which is why we don't usually recommend it. :)

This might actually work out pretty well. Our storage tends to grow slowly, and only occasionally is part of the tree pruned, usually when a sensor goes offline, so we would know when it's time to regenerate the tree. But, as you point out, the copy-out for serialization is probably still less load than just serializing an existing protobuf object.

In the current incarnation, we're using objects generated by the CodeSynthesis XSD compiler.  It's probably the best way to go if you "have" to use XML, but the serialization cost is astonishing.

Thanks for the info.

nys...@gmail.com

May 24, 2016, 3:16:19 PM
to Cap'n Proto, barna...@gmail.com
I know this post is fairly old, but I'm curious: are you aware of any serialization protocols that are more focused on catering to mutable data?

Fahrzin Hemmati

May 24, 2016, 3:44:04 PM
to nys...@gmail.com, barna...@gmail.com, capnproto

Protocol buffers are much better with mutable data.

Kenton Varda

May 24, 2016, 3:44:36 PM
to nys...@gmail.com, Cap'n Proto, Wes Peters
If you mean being able to manipulate objects in-memory, protobuf does a fine job of that. But note that this is really orthogonal to the core goal of providing serialization. Also, I'm planning to add better support for this to Cap'n Proto in the near future -- see the mailing list discussions about "POCS" support.

If you mean being able to modify data on disk without rewriting the whole data set, then you want sqlite or a database. :)

-Kenton

nys...@gmail.com

May 24, 2016, 7:58:23 PM
to Cap'n Proto, nys...@gmail.com, barna...@gmail.com
Yes, in-memory is what I had in mind - similar to the use case offered by the OP.

I have experimented with protobuf in this way but it felt like a hack. Isn't this double duty - holding state within the generated serialization code - actually discouraged? I can see that protobuf may be better relative to Cap'n Proto, but I guess what I'm actually asking is if anyone is aware of anything that actually intends to be used in this way.

By the way, I'm not passive-aggressively criticizing Cap'n Proto! If it isn't good for this use case, that is totally OK - don't bloat your project on account of a few edge cases. :-)

Kenton Varda

May 24, 2016, 9:34:46 PM
to Aaron Swearingen, Cap'n Proto, Wes Peters
Hmm, I guess I'm not sure what you're looking for here. If holding state in the generated classes is "a hack", then what else can you do?

(FWIW, using protobuf objects as in-memory state is decried as a hack by some and used as an incredible time-saving/boilerplate-avoiding technique by others... I've been on both sides myself.)

-Kenton

nys...@gmail.com

May 24, 2016, 10:46:40 PM
to Cap'n Proto, nys...@gmail.com, barna...@gmail.com
Perhaps it isn't as 'unconventional' an approach as it had seemed to me at the time.

I read through some of the POCS thread as you suggested and it does sound interesting. Looking forward to seeing how that plays out.