Trying to understand how to use CapnProto


Software Developer

Nov 9, 2023, 7:28:31 AM
to capn...@googlegroups.com
Hi,

I'm pretty new to (de-)serialisation, at least the kind that revolves around frameworks
like CnP, Protobuf, Msgpack, etc. I'm quite familiar with the concept itself, especially hand-rolling
binary (de-)serialisation using a format such as TLV or XDR.

I have an application in C++ that loads a YAML config, performs some validation and pre-processing,
then parses N config items to set up the state for a running daemon loop.
When N=10000, startup can take minutes.

It has been suggested that we look into binary (de-)serialisation. After the YAML config has been
parsed and processed, we would serialise the state of the Config object (the root object of all compiled
configuration) and its transitive dependencies into a file. Future invocations of the program would then load
this file, skip the whole YAML dance, deserialise the Config object, and go directly to the daemon loop.

My understanding of schema based (de-)serialisation frameworks is that you write a schema to indicate what
will be the format on the wire, and then using that schema you generate classes/structs that will represent
that schema.

How are these generated classes meant to be used exactly? Can I retrofit my existing Config class with CnP
and just supply a (de-)serialise() function? Going by the usage examples on the CnP website, this doesn't seem to
be the case.

So this is my current thinking: if CnP generates a class called CnP_Config (distinct from the concrete Config class)
which represents the schema/wire format, then it will be used as an intermediate class for (de-)serialisation.
Specifically, to serialise the Config class, I would write a Config::serialise() function that constructs and returns
a completely populated CnP_Config object. I would then write that CnP_Config object out to the filesystem
using the CnP utility functions. Similarly, to deserialise, I would load the file from disk
into a CnP_Config object using the CnP utility functions, and then have a function like
Config::deserialise(CnP_Config &other) which converts the CnP_Config object into a concrete Config object.

Is this the right approach? Or should I be using these CnP generated classes as first-class classes and replace the
Config class in my program with CnP generated ones? The problem with doing this is that these objects or sub-objects
may use inheritance and are already quite entangled in the existing code base; supplanting them with CnP generated
classes would be a herculean effort.
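
For illustration, the intermediate-class idea described above could be sketched roughly as follows. This is only a sketch under assumptions: WireConfig is a hypothetical hand-written stand-in for a generated class such as CnP_Config, and writeWire()/readWire() are toy length-prefixed helpers standing in for the framework's utility functions. None of it is the real Cap'n Proto API or encoding.

```cpp
#include <cassert>
#include <cstdint>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical stand-in for a generated wire class (the post's CnP_Config).
// A real framework would generate this type from a schema.
struct WireConfig {
    std::vector<std::string> itemNames;
};

// Stand-in for the existing application class, which keeps its own shape.
class Config {
public:
    void addItem(const std::string& name) { items_.push_back(name); }
    const std::vector<std::string>& items() const { return items_; }

    // Application object -> fully populated wire object.
    WireConfig serialise() const { return WireConfig{items_}; }

    // Wire object -> application object. This copies every item up front.
    static Config deserialise(const WireConfig& wire) {
        Config c;
        c.items_ = wire.itemNames;
        return c;
    }

private:
    std::vector<std::string> items_;
};

// Toy encoding: uint32 count, then (uint32 length, bytes) per item.
void writeWire(std::ostream& out, const WireConfig& wire) {
    std::uint32_t count = static_cast<std::uint32_t>(wire.itemNames.size());
    out.write(reinterpret_cast<const char*>(&count), sizeof count);
    for (const std::string& name : wire.itemNames) {
        std::uint32_t len = static_cast<std::uint32_t>(name.size());
        out.write(reinterpret_cast<const char*>(&len), sizeof len);
        out.write(name.data(), len);
    }
}

WireConfig readWire(std::istream& in) {
    WireConfig wire;
    std::uint32_t count = 0;
    in.read(reinterpret_cast<char*>(&count), sizeof count);
    for (std::uint32_t i = 0; i < count; ++i) {
        std::uint32_t len = 0;
        in.read(reinterpret_cast<char*>(&len), sizeof len);
        if (!in) break;  // stop on a truncated stream
        std::string name(len, '\0');
        in.read(&name[0], len);
        wire.itemNames.push_back(name);
    }
    return wire;
}
```

Usage would be along the lines of writeWire(file, config.serialise()) after the YAML pass, and Config::deserialise(readWire(file)) on later startups. The existing Config class and its inheritance hierarchy stay untouched; only the two conversion functions know about the wire type.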

Thanks,
Software Developer

Kenton Varda

Nov 15, 2023, 11:00:58 AM
to Software Developer, capn...@googlegroups.com
You basically understand correctly, although what you describe is a little bit closer to the Protobuf model than the Cap'n Proto model. They are very similar, but the difference with capnp is that the generated classes do not create a self-contained copy of the message content, but rather act as cursors into the underlying message buffer. The setter methods write directly into the underlying buffer, and the getter methods read directly from it (or return pointers pointing into it). This differs from Protobuf where the buffer is parsed into an object tree upfront, and the generated class represents that object tree -- once you have that object tree, the byte buffer can be discarded.
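
To make the cursor-versus-object-tree distinction concrete, here is a minimal self-contained sketch. It assumes a toy fixed layout (a uint32 port, a uint32 name length, then the name bytes), which is not Cap'n Proto's actual encoding, and the names ParsedConfig, ConfigCursor, encode() and parse() are all hypothetical. Real generated code also splits reading and writing into separate Reader and Builder types, which this sketch folds into one class for brevity.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>
#include <vector>

// Toy layout (an assumption, not Cap'n Proto's real encoding):
//   bytes 0..3: uint32 port, bytes 4..7: uint32 name length, bytes 8..: name.
using Buffer = std::vector<unsigned char>;

std::uint32_t loadU32(const unsigned char* p) {
    std::uint32_t v;
    std::memcpy(&v, p, sizeof v);
    return v;
}

Buffer encode(std::uint32_t port, const std::string& name) {
    Buffer buf(8 + name.size());
    std::uint32_t len = static_cast<std::uint32_t>(name.size());
    std::memcpy(buf.data(), &port, sizeof port);
    std::memcpy(buf.data() + 4, &len, sizeof len);
    std::memcpy(buf.data() + 8, name.data(), name.size());
    return buf;
}

// Protobuf-like model: parse once into an object that owns its data.
// After parse() returns, the buffer could be discarded.
struct ParsedConfig {
    std::uint32_t port;
    std::string name;  // owns a copy of the name bytes
};

ParsedConfig parse(const Buffer& buf) {
    assert(buf.size() >= 8);
    std::uint32_t len = loadU32(buf.data() + 4);
    assert(buf.size() - 8 >= len);
    ParsedConfig out;
    out.port = loadU32(buf.data());
    out.name.assign(reinterpret_cast<const char*>(buf.data()) + 8, len);
    return out;
}

// Cap'n Proto-like model: a cursor that holds no message data of its own.
// Getters read from the buffer, setters write into it.
class ConfigCursor {
public:
    explicit ConfigCursor(Buffer& buf) : buf_(buf) { assert(buf.size() >= 8); }

    std::uint32_t port() const { return loadU32(buf_.data()); }
    void setPort(std::uint32_t p) { std::memcpy(buf_.data(), &p, sizeof p); }

    // Returns a pointer into the buffer; no bytes are copied.
    const char* nameData() const {
        return reinterpret_cast<const char*>(buf_.data()) + 8;
    }
    std::size_t nameSize() const { return loadU32(buf_.data() + 4); }

private:
    Buffer& buf_;  // the buffer must outlive the cursor
};
```

The practical consequence is the lifetime rule in the last comment: a cursor is only valid while its buffer is alive, whereas the parsed object is independent of it.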

The reason this matters is, if you use a design where you convert CnP_Config into your traditional Config class upfront, you lose this zero-copy benefit. At that point in time, you are presumably making copies of all the message contents into an in-memory object tree. If you can avoid that, you will get a performance benefit. One way to avoid it would be to use CnP_Config::Reader directly, replacing your old Config class. Another approach might be to refactor the Config class so that it can be backed by a CnP_Config::Reader under the hood, to avoid the upfront copy.
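
As a rough sketch of the second option (keeping the Config class but backing it with a reader), something like the following shape might work. ItemListReader is a hypothetical stand-in for a generated Reader such as CnP_Config::Reader, the buffer layout (a uint32 count followed by that many uint32 values) is a toy assumption, and itemTimeout() is an invented accessor used only for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <stdexcept>
#include <vector>

// Hypothetical stand-in for a generated Reader: a view over a buffer that
// holds a uint32 count followed by that many uint32 values.
class ItemListReader {
public:
    ItemListReader(const unsigned char* data, std::size_t size) : data_(data) {
        if (size < 4 || (size - 4) / 4 < count())
            throw std::invalid_argument("buffer too small for item list");
    }

    std::uint32_t count() const { return load(0); }

    std::uint32_t value(std::uint32_t i) const {
        assert(i < count());
        return load(4 + 4 * static_cast<std::size_t>(i));
    }

private:
    std::uint32_t load(std::size_t offset) const {
        std::uint32_t v;
        std::memcpy(&v, data_ + offset, sizeof v);
        return v;
    }
    const unsigned char* data_;
};

// The existing class keeps its public interface, but each accessor delegates
// to the reader instead of to members populated at load time.
class Config {
public:
    explicit Config(ItemListReader reader) : reader_(reader) {}

    std::size_t itemCount() const { return reader_.count(); }

    // Decoded on demand: an item that is never queried is never touched.
    std::uint32_t itemTimeout(std::size_t i) const {
        return reader_.value(static_cast<std::uint32_t>(i));
    }

private:
    ItemListReader reader_;  // the underlying buffer must outlive this Config
};

// Test helper that builds a buffer in the toy layout.
std::vector<unsigned char> encodeItems(const std::vector<std::uint32_t>& values) {
    std::vector<unsigned char> buf(4 + 4 * values.size());
    std::uint32_t count = static_cast<std::uint32_t>(values.size());
    std::memcpy(buf.data(), &count, sizeof count);
    for (std::size_t i = 0; i < values.size(); ++i)
        std::memcpy(buf.data() + 4 + 4 * i, &values[i], sizeof values[i]);
    return buf;
}
```

Callers keep using Config as before, so the refactor stays inside the class. The trade-off is that Config no longer owns its data, so whatever holds the message buffer has to live at least as long as the Config does.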

That said, even with the upfront copy, you should see a performance benefit vs. YAML, which is much more expensive to parse.

-Kenton
