Transferring large arrays

codie

Aug 16, 2018, 6:08:01 PM

to Cap'n Proto

Hello,
I'm currently considering using Cap'n Proto for inter-process communication and could use a hint before investing too much time in it. I have to send large vectors and matrices (~300MB) to another process. I know protobuf is not designed to transfer that much data. How about Cap'n Proto? I also considered simply using sockets, for example. Thanks.

Kenton Varda

Aug 16, 2018, 6:17:45 PM

to codie, Cap'n Proto

Hi,

Yes, Cap'n Proto should handle this use case much better than Protobufs. By default there is a message size limit of 64MB, but this is for security purposes, not performance, and you can easily configure a much larger limit as needed using `ReaderOptions`.
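As a rough illustration of sizing that limit: `ReaderOptions` has a `traversalLimitInWords` field, counted in 8-byte words, and the 64MB default corresponds to 8 * 1024 * 1024 words. The sketch below only does the arithmetic. The helper name `wordsForBytes` and the 2x headroom factor are assumptions made for illustration, not part of Cap'n Proto; the commented lines show where the value would be plugged in.

```cpp
#include <cstdint>

// A Cap'n Proto word is 8 bytes.
constexpr uint64_t kBytesPerWord = 8;

// Hypothetical helper: number of words needed to hold `bytes`, rounded up.
constexpr uint64_t wordsForBytes(uint64_t bytes) {
  return (bytes + kBytesPerWord - 1) / kBytesPerWord;
}

// Default traversal limit: 8 Mi words, i.e. 64 MiB.
constexpr uint64_t kDefaultLimitWords = 8ull * 1024 * 1024;

// Payload size from the question (~300MB), taken here as 300 MiB.
constexpr uint64_t kPayloadBytes = 300ull * 1024 * 1024;

// Assumed headroom factor of 2, so that reading some data more than once
// does not exhaust the limit. Tune this for the real access pattern.
constexpr uint64_t kLimitWords = 2 * wordsForBytes(kPayloadBytes);

// With Cap'n Proto available, the value would be used roughly like this:
//
//   capnp::ReaderOptions options;
//   options.traversalLimitInWords = kLimitWords;
//   capnp::FlatArrayMessageReader reader(words, options);
```

Here `kLimitWords` comes out well above `kDefaultLimitWords`, which is why a 300MB message needs the limit raised explicitly.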

You could write your messages to a socket, but for the IPC use case you can get even better performance using shared memory. If you set up a shared memory region and build your Cap'n Proto message directly in it on the sending end, then the receiving end can read it directly without ever copying the bytes at all. With a socket, the bytes have to be copied at least twice -- into and out of the kernel.

-Kenton

--
You received this message because you are subscribed to the Google Groups "Cap'n Proto" group.
To unsubscribe from this group and stop receiving emails from it, send an email to capnproto+unsubscribe@googlegroups.com.
Visit this group at https://groups.google.com/group/capnproto.

qd.eng....@gmail.com

Aug 28, 2018, 7:55:24 PM

to Cap'n Proto

Thanks for the answer. I investigated the topic and looked into the codebase. As described in the docs, I found the mmap reader. What's still left, though, is an architectural question I'm not sure how to handle. I have two applications communicating via Cap'n Proto RPC and want to pass the mmap'ed memory over this connection.

(1) Does it make more sense to just hand over the file identifier and read the message again on the other side, or would it be more reasonable to transfer the memory via RPC and glue everything together with something like an mmap client/server?

(2) If I need to estimate memory beforehand for mmap, I haven't yet found a routine that provides this. Did I simply miss it, or is there no message memory estimator? If there isn't one, maybe this is worth a PR.

- Codie

Kenton Varda

Aug 28, 2018, 8:38:19 PM

to qd.eng....@gmail.com, Cap'n Proto

Hi Codie,

While Cap'n Proto is compatible with shared memory, it currently does not provide much in the way of helpers for using it. You'll have to do a number of things yourself.

For how to set up shared memory, take a look at: http://man7.org/linux/man-pages/man7/shm_overview.7.html

You can use ftruncate() to set the size extremely large, and then mmap() in the whole thing. Only the pages that you actually write to will be allocated (I think), so this way you don't have to worry about guessing in advance how much to allocate -- just allocate something that's far more than enough.
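A minimal sketch of the ftruncate() plus mmap() step, under stated assumptions: a temporary file under /tmp stands in for the object that shm_open() would return, so that the sketch builds without extra link flags, and the names `SparseRegion`, `createSparseRegion`, `destroySparseRegion` and `allocatedBytes` are hypothetical. Whether untouched pages really stay unallocated depends on the filesystem, so `allocatedBytes` is only a rough way to check.

```cpp
#include <cstddef>
#include <cstdlib>
#include <stdexcept>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

// Hypothetical holder for one large, sparsely backed shared mapping.
struct SparseRegion {
  int fd;
  void* base;
  size_t size;
};

SparseRegion createSparseRegion(size_t size) {
  // Assumption: a temp file replaces shm_open("/some-name", O_CREAT | O_RDWR, 0600).
  char path[] = "/tmp/capnp-region-XXXXXX";
  int fd = mkstemp(path);
  if (fd < 0) throw std::runtime_error("mkstemp failed");
  unlink(path);  // the open descriptor keeps the object alive

  // Set the logical size only; no data is written here.
  if (ftruncate(fd, static_cast<off_t>(size)) != 0) {
    close(fd);
    throw std::runtime_error("ftruncate failed");
  }

  // Map the whole region. MAP_SHARED makes writes visible to other mappings.
  void* base = mmap(nullptr, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
  if (base == MAP_FAILED) {
    close(fd);
    throw std::runtime_error("mmap failed");
  }
  return SparseRegion{fd, base, size};
}

void destroySparseRegion(SparseRegion& region) {
  munmap(region.base, region.size);
  close(region.fd);
  region = SparseRegion{-1, nullptr, 0};
}

// Approximate backing store in use, assuming the Linux convention that
// st_blocks counts 512-byte units. The value may lag behind recent writes.
size_t allocatedBytes(const SparseRegion& region) {
  struct stat st;
  if (fstat(region.fd, &st) != 0) throw std::runtime_error("fstat failed");
  return static_cast<size_t>(st.st_blocks) * 512;
}
```

Calling `createSparseRegion(size_t{1} << 30)` would reserve 1 GiB of address space; bytes that have never been written read back as zero.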

Once you have a shared memory region set up, you'll need to implement a custom subclass of capnp::MessageBuilder which allocates segments in this space. When you want to transmit to the receiving process, you'll probably need to call builder.getSegmentsForOutput(), construct a segment table based on the offset of each segment from the start of the shared memory space, and transmit that table over a socket or pipe to the receiving process. The receiving process will then need a custom subclass of capnp::MessageReader which can accept this table and read the segments from shared memory. Eventually, the receiving process will need to communicate back when it is done with the message, so that the sending process can free up the segments for reuse.
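The bookkeeping described above can be sketched without the Cap'n Proto headers. Everything here is a stand-in and an assumption: `Word` replaces `capnp::word`, `Segment` replaces the `kj::ArrayPtr<const capnp::word>` entries returned by `getSegmentsForOutput()`, `RegionAllocator` is the kind of bump allocator a custom `MessageBuilder::allocateSegment()` override could delegate to, and the `SegmentEntry` layout is an invented wire format.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

using Word = uint64_t;  // stand-in for capnp::word (8 bytes)

// Stand-in for one entry of getSegmentsForOutput().
struct Segment {
  const Word* begin;
  size_t words;
};

// Position-independent description of a segment (hypothetical format).
struct SegmentEntry {
  uint64_t offsetWords;  // offset from the start of the shared region
  uint64_t sizeWords;
};

// Hands out consecutive chunks of the shared region. Cap'n Proto expects
// new segments to be zeroed; this sketch relies on the region never having
// been written before, and never reuses memory.
class RegionAllocator {
public:
  RegionAllocator(Word* base, size_t capacityWords)
      : base_(base), capacity_(capacityWords), used_(0) {}

  Word* allocate(size_t words) {
    if (words > capacity_ - used_) throw std::length_error("region exhausted");
    Word* out = base_ + used_;
    used_ += words;
    return out;
  }

private:
  Word* base_;
  size_t capacity_;
  size_t used_;
};

// Sender side: turn absolute pointers into offsets from the region base.
std::vector<SegmentEntry> makeSegmentTable(const Word* base,
                                           const std::vector<Segment>& segments) {
  std::vector<SegmentEntry> table;
  table.reserve(segments.size());
  for (const Segment& s : segments) {
    table.push_back({static_cast<uint64_t>(s.begin - base),
                     static_cast<uint64_t>(s.words)});
  }
  return table;
}

// Receiver side: turn offsets back into pointers within its own mapping,
// rejecting entries that fall outside the region.
std::vector<Segment> resolveSegmentTable(const Word* base, size_t regionWords,
                                         const std::vector<SegmentEntry>& table) {
  std::vector<Segment> segments;
  segments.reserve(table.size());
  for (const SegmentEntry& e : table) {
    if (e.offsetWords > regionWords || e.sizeWords > regionWords - e.offsetWords) {
      throw std::out_of_range("segment outside shared region");
    }
    segments.push_back({base + e.offsetWords, static_cast<size_t>(e.sizeWords)});
  }
  return segments;
}
```

Offsets are used instead of pointers because the two processes will generally map the region at different addresses. A custom `MessageReader` on the receiving side would then hand out the resolved segments by index.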

If you want to transmit messages in both directions, you'll probably want to create two shared memory regions, one for each direction. This way the sending side is always responsible for keeping track of which parts of the region are currently in-use. You can even map the memory read-only on the receiving side, for extra safety.
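A small sketch of the read-only receiving side, assuming the same descriptor is mapped twice: once writable for the sender and once with PROT_READ for the receiver. Both mappings live in one process here purely to keep the example self-contained; in practice the receiver would be a separate process that obtains the object by name or by descriptor passing. The temporary file again stands in for shm_open(), and `createRegionFd` and `mapRegion` are hypothetical names.

```cpp
#include <cstddef>
#include <cstdlib>
#include <stdexcept>
#include <sys/mman.h>
#include <unistd.h>

// Create the backing object for one direction of traffic.
// Assumption: a temp file replaces a shm_open() object.
int createRegionFd(size_t size) {
  char path[] = "/tmp/capnp-oneway-XXXXXX";
  int fd = mkstemp(path);
  if (fd < 0) throw std::runtime_error("mkstemp failed");
  unlink(path);
  if (ftruncate(fd, static_cast<off_t>(size)) != 0) {
    close(fd);
    throw std::runtime_error("ftruncate failed");
  }
  return fd;
}

// Map the region. The sender passes writable = true; the receiver passes
// false, so a stray write on the receiving side faults instead of
// corrupting a message.
void* mapRegion(int fd, size_t size, bool writable) {
  int prot = writable ? (PROT_READ | PROT_WRITE) : PROT_READ;
  void* base = mmap(nullptr, size, prot, MAP_SHARED, fd, 0);
  if (base == MAP_FAILED) throw std::runtime_error("mmap failed");
  return base;
}
```

For two-way traffic, each process would call `createRegionFd` once for its outgoing region, map it writable, and map the peer's region read-only.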

This design still calls for a socket used to transmit segment tables and other synchronization messages, though the main payload is in shared memory. If you want to avoid a socket altogether, you could devise a protocol that uses POSIX semaphores (or even futexes) located in shared memory for signaling. But personally I'd go with the socket, since it's much easier for the receiving end to listen for events on multiple sockets at the same time.
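The socket side of this design might look roughly like the following. It is a sketch under assumptions: the `SegmentEntry` layout and the "done" notification carrying a message id are invented for illustration, and a Unix datagram socket is used so that each table arrives as one packet. Sending raw structs is only reasonable because both ends run on the same machine.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>
#include <vector>

// Hypothetical wire format: one entry per segment, offsets in 8-byte words.
struct SegmentEntry {
  uint64_t offsetWords;
  uint64_t sizeWords;
};

// Sender: announce a finished message by sending its segment table.
void sendTable(int sock, const std::vector<SegmentEntry>& table) {
  size_t bytes = table.size() * sizeof(SegmentEntry);
  ssize_t n = send(sock, table.data(), bytes, 0);
  if (n < 0 || static_cast<size_t>(n) != bytes) {
    throw std::runtime_error("send failed");
  }
}

// Receiver: read one table of at most maxSegments entries.
std::vector<SegmentEntry> receiveTable(int sock, size_t maxSegments) {
  std::vector<SegmentEntry> table(maxSegments);
  ssize_t n = recv(sock, table.data(), maxSegments * sizeof(SegmentEntry), 0);
  if (n < 0 || static_cast<size_t>(n) % sizeof(SegmentEntry) != 0) {
    throw std::runtime_error("recv failed or packet malformed");
  }
  table.resize(static_cast<size_t>(n) / sizeof(SegmentEntry));
  return table;
}

// Receiver: tell the sender that a message's segments can be reused.
void sendDone(int sock, uint64_t messageId) {
  if (send(sock, &messageId, sizeof messageId, 0) !=
      static_cast<ssize_t>(sizeof messageId)) {
    throw std::runtime_error("send failed");
  }
}

// Sender: wait for the next "done" notification.
uint64_t receiveDone(int sock) {
  uint64_t messageId = 0;
  if (recv(sock, &messageId, sizeof messageId, 0) !=
      static_cast<ssize_t>(sizeof messageId)) {
    throw std::runtime_error("recv failed");
  }
  return messageId;
}
```

A pair of connected descriptors for this can be created with `socketpair(AF_UNIX, SOCK_DGRAM, 0, fds)` before forking. Because these are ordinary file descriptors, the receiver can wait on several of them at once with poll() or a similar call.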

-Kenton
