How to deserialize a message from file (beginner) (c++)

Benjamin Valpey

Aug 12, 2019, 12:38:04 PM

to Cap'n Proto

Hello,

I am trying to deserialize a message from a file that was serialized using Python's pycapnp. I have a schema file and a binary file that adheres to the schema. I want to read the binary file in C++ using a filestream. Is this sort of thing possible? If so, how do I do it? I tried searching for how to deserialize a message from a file but ended up getting a bit confused and figured it would be more effective to just ask here.

I am just using Cap'n Proto to serialize data and then read the file into a C++ class that I have created. Once I have opened the file into a filestream, how would I go about reading the message using the schema?

Thanks!

Kenton Varda

Aug 13, 2019, 11:59:14 AM

to Benjamin Valpey, Cap'n Proto

Hi Benjamin,

By "filestream" do you mean the C++ std::fstream class?

Cap'n Proto doesn't directly support C++ iostreams, because, frankly, iostreams are slow and poorly-designed. If you want to read from an std::fstream, you'll need to write a custom subclass of kj::InputStream (from kj/io.h) which wraps the fstream. You can then use capnp::InputStreamMessageReader to read from that.

However, it's much more efficient to use raw file descriptors instead (or HANDLEs on Windows). On Unix-ish systems, use open() to open a file, then pass the returned file descriptor (an integer) to capnp::StreamFdMessageReader. You can also use file descriptors on Windows (make sure to pass the O_BINARY flag to open()), but it may be more efficient to use Windows' CreateFile() function, which returns a HANDLE, then create a kj::HandleInputStream on top of that, and a capnp::InputStreamMessageReader on top of that.

Note that for large files (more than a megabyte), you might want to consider memory mapping instead of streaming reads, especially if you only need to process a few pieces of the overall data. Memory mapping places the whole file into memory in a way that lets the operating system load pages of the file on demand when first accessed by the program, rather than reading the whole file from disk upfront. Probably the easiest way to do memory mapping is to use the KJ filesystem API in kj/filesystem.h; it wraps the fiddly details in a platform-independent way, although the KJ filesystem API itself is admittedly fairly complex.

-Kenton

--
You received this message because you are subscribed to the Google Groups "Cap'n Proto" group.
To unsubscribe from this group and stop receiving emails from it, send an email to capnproto+...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/capnproto/da81eb36-f4d4-4db3-8e35-4d3b2c642f2f%40googlegroups.com.

Benjamin Valpey

Aug 13, 2019, 3:26:24 PM

to Cap'n Proto

Hi Kenton,

Yes, that is exactly what I meant by filestream. However, as per your suggestion, perhaps this is not the most efficient way to do this. FYI I am using a few distributions of Linux (Debian, Fedora, and Ubuntu).

My files are a bit large (around 5M). However, I do need to process the entire file. I will try it without using mmap, but if I run into performance issues I will look into that.

Thanks
