Reading a multi-segment message with zero-copy


Sune Sash

Jul 20, 2019, 1:43:43 PM
to Cap'n Proto

Hello

I am new to Cap'n Proto and came across this comment in serialize.h:

"A multi-segment message can be read entirely in three system calls with no buffering."

What are the three system calls involved? Also, I would like to understand whether this statement holds under zero-copy semantics.

Thanks
Shweta

Ian Denhardt

Jul 20, 2019, 2:28:52 PM
to Cap'n Proto, Sune Sash
Haven't looked at the code for the C++ implementation, but based on my
knowledge of the wire format[1] I would assume:

1. read() 4 bytes to get the number of segments
2. read() the list of segment sizes
3. readv() to read in all the segments

[1]: https://capnproto.org/encoding.html#serialization-over-a-stream

Kenton Varda

Jul 20, 2019, 2:40:22 PM
to Ian Denhardt, Cap'n Proto, Sune Sash
Ian is almost right. It's:

1. read() first 8 bytes, which contains the number of segments and size of the first segment.
2. (Only if more than 1 segment) read() the rest of the segment table.
3. read() the entire message content (all segments) into one big array.

So in the case of a single-segment message, it's actually two syscalls.
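To make the three steps concrete, here is a minimal sketch of a reader for the stream framing from the encoding spec: a little-endian uint32 holding (segment count minus one), one uint32 size in 8-byte words per segment, zero padding to an 8-byte boundary, then all segment contents back to back. This is not the capnp::InputStreamMessageReader code; `readFramedMessage`, `FramedMessage`, and both size limits are hypothetical names and values chosen for illustration, and it works on any plain POSIX file descriptor.

```cpp
#include <cassert>
#include <cerrno>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <stdexcept>
#include <vector>
#include <unistd.h>

// Hypothetical sanity limits for this sketch (not Cap'n Proto's real values).
constexpr uint32_t kMaxSegments = 512;
constexpr size_t kMaxWords = size_t(1) << 24;  // 128 MiB of content

// Read exactly `size` bytes. When the data is already in the kernel buffer a
// single read() returns all of it; the loop only handles short reads.
static void readExactly(int fd, void* buf, size_t size) {
  auto* p = static_cast<unsigned char*>(buf);
  while (size > 0) {
    ssize_t n = ::read(fd, p, size);
    if (n < 0 && errno == EINTR) continue;
    if (n <= 0) throw std::runtime_error("read failed or unexpected EOF");
    p += n;
    size -= static_cast<size_t>(n);
  }
}

// Decode a little-endian uint32 independent of host byte order.
static uint32_t loadLE32(const unsigned char* p) {
  return uint32_t(p[0]) | uint32_t(p[1]) << 8 | uint32_t(p[2]) << 16 |
         uint32_t(p[3]) << 24;
}

struct FramedMessage {
  std::vector<uint32_t> segmentWords;  // size of each segment, in words
  std::vector<uint64_t> content;       // all segments, contiguous and word-aligned
};

FramedMessage readFramedMessage(int fd) {
  // Step 1: first 8 bytes = (segment count - 1) and the size of segment 0.
  unsigned char head[8];
  readExactly(fd, head, sizeof(head));
  uint32_t countMinusOne = loadLE32(head);
  if (countMinusOne >= kMaxSegments) throw std::runtime_error("too many segments");
  uint32_t segmentCount = countMinusOne + 1;

  FramedMessage msg;
  msg.segmentWords.push_back(loadLE32(head + 4));

  // Step 2 (only if more than one segment): the rest of the segment table.
  // The table is 4 + 4*N bytes rounded up to a multiple of 8; 8 are already read.
  if (segmentCount > 1) {
    size_t tableBytes = (4 + 4 * size_t(segmentCount) + 7) & ~size_t(7);
    std::vector<unsigned char> rest(tableBytes - 8);
    readExactly(fd, rest.data(), rest.size());
    for (uint32_t i = 1; i < segmentCount; ++i)
      msg.segmentWords.push_back(loadLE32(rest.data() + 4 * (i - 1)));
  }

  // Step 3: read the content of every segment into one big array.
  size_t totalWords = 0;
  for (uint32_t w : msg.segmentWords) totalWords += w;
  if (totalWords > kMaxWords) throw std::runtime_error("message too large");
  msg.content.resize(totalWords);
  if (totalWords > 0) readExactly(fd, msg.content.data(), totalWords * 8);
  return msg;
}
```

If the whole message is already sitting in the kernel buffer, each `readExactly()` typically completes with one `read()`, which gives the three system calls (two for a single-segment message). Every one of them still copies from kernel to userspace, as noted above.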

Of course, read() implies a copy -- from kernel buffers to userspace. So this is not truly zero-copy in that sense. However, once the data is read in from the kernel, it can then be operated on with no further copies.

For true zero-copy, you need to use mmap() (for files) or shared memory (for inter-process communication).
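For the file case, a sketch of what zero-copy input can look like: map the file with mmap() and record a pointer to each segment inside the mapping instead of copying anything out. `MappedMessage` and `Segment` are hypothetical names, and the segment-table parsing follows the stream framing from the encoding spec. In real Cap'n Proto code one would more likely hand the mapped words to capnp::FlatArrayMessageReader than parse the table by hand.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <cstring>
#include <stdexcept>
#include <vector>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

struct Segment {
  const uint64_t* words;  // points into the mapping; nothing is copied
  size_t wordCount;
};

class MappedMessage {
 public:
  explicit MappedMessage(const char* path) {
    int fd = ::open(path, O_RDONLY);
    if (fd < 0) throw std::runtime_error("open failed");
    struct stat st;
    if (::fstat(fd, &st) != 0 || st.st_size < 8) {
      ::close(fd);
      throw std::runtime_error("fstat failed or file too small");
    }
    size_ = static_cast<size_t>(st.st_size);
    void* p = ::mmap(nullptr, size_, PROT_READ, MAP_PRIVATE, fd, 0);
    ::close(fd);  // the mapping stays valid after the descriptor is closed
    if (p == MAP_FAILED) throw std::runtime_error("mmap failed");
    base_ = static_cast<const unsigned char*>(p);
    try {
      parseSegmentTable();
    } catch (...) {
      ::munmap(const_cast<unsigned char*>(base_), size_);
      throw;
    }
  }
  ~MappedMessage() { ::munmap(const_cast<unsigned char*>(base_), size_); }
  MappedMessage(const MappedMessage&) = delete;
  MappedMessage& operator=(const MappedMessage&) = delete;

  const std::vector<Segment>& segments() const { return segments_; }

 private:
  static uint32_t loadLE32(const unsigned char* p) {
    return uint32_t(p[0]) | uint32_t(p[1]) << 8 | uint32_t(p[2]) << 16 |
           uint32_t(p[3]) << 24;
  }

  void parseSegmentTable() {
    size_t count = size_t(loadLE32(base_)) + 1;
    if (count > 512) throw std::runtime_error("too many segments");  // arbitrary limit
    size_t tableBytes = (4 + 4 * count + 7) & ~size_t(7);
    if (tableBytes > size_) throw std::runtime_error("truncated segment table");
    // The table size is a multiple of 8 and mmap() returns a page-aligned
    // address, so every segment start below is word-aligned.
    size_t offset = tableBytes;
    for (size_t i = 0; i < count; ++i) {
      size_t words = loadLE32(base_ + 4 + 4 * i);
      if (words > (size_ - offset) / 8) throw std::runtime_error("truncated segment");
      segments_.push_back({reinterpret_cast<const uint64_t*>(base_ + offset), words});
      offset += words * 8;
    }
  }

  const unsigned char* base_ = nullptr;
  size_t size_ = 0;
  std::vector<Segment> segments_;
};
```

Pages are faulted in from the page cache on first access, so the message bytes are never copied into a separate userspace buffer.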

Over a normal IP network, zero-copy input is probably impossible, because the individual packets need to land in a temporary buffer in order for the kernel to be able to inspect their headers and find out which socket they are destined for. There's typically no way for the network card to deliver TCP packets directly to the final buffer. If you have high-end RDMA network hardware, that might be a different story.

-Kenton

Sune Sash

Jul 20, 2019, 3:02:51 PM
to Cap'n Proto

Thanks to both of you. 

I see that writeMessage() in serialize.h creates a segment table and copies over the individual segments. So I presume that, in order to send a multi-segment message without incurring a copy, the application would have to forgo the interfaces in serialize.h and frame the segments with a segment table on its own, similar to what writeMessage() does.

Is there a way for the application to send all segments as one message over, say, a socket, or would the application need to send multiple messages and reconstruct them at the receiving end? I presume that, as Kenton's response indicates for reads, a write cannot be truly zero-copy if the message ultimately has to be sent over a socket or queue. The copy that is saved by not using writeMessage() would then be the coalescing of multiple segments into one long segment. Is that an accurate understanding?

Thanks

Kenton Varda

Jul 20, 2019, 3:27:01 PM
to Sune Sash, Cap'n Proto
Hi Sune,

On Sat, Jul 20, 2019 at 12:02 PM Sune Sash <snmg...@gmail.com> wrote:

> Thanks to both of you.
>
> I see that writeMessage() in serialize.h creates a segment table and copies over the individual segments. So I presume that, in order to send a multi-segment message without incurring a copy, the application would have to forgo the interfaces in serialize.h and frame the segments with a segment table on its own, similar to what writeMessage() does.

No, writeMessage() does *not* make copies of the segments. It passes pointers to the original segment memory locations down into writev().

The writev() call itself makes a copy of the data into kernel buffers, but no copies are made in userspace.
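A minimal sketch of the gather-write pattern described here: only the small segment table is built in a local buffer, and it is passed to a single writev() together with a pointer to each existing segment. This mirrors the idea, not the actual writeMessage() source; `SegmentRef` and `writeSegments` are hypothetical names.

```cpp
#include <cassert>
#include <cerrno>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <stdexcept>
#include <vector>
#include <sys/uio.h>
#include <unistd.h>

struct SegmentRef {
  const void* data;  // existing segment memory, owned by the caller
  size_t words;      // segment size in 8-byte words
};

static void storeLE32(unsigned char* p, uint32_t v) {
  for (int i = 0; i < 4; ++i) p[i] = static_cast<unsigned char>(v >> (8 * i));
}

void writeSegments(int fd, const std::vector<SegmentRef>& segments) {
  if (segments.empty()) throw std::invalid_argument("need at least one segment");

  // Segment table: (count - 1), then each size in words, zero-padded to 8 bytes.
  std::vector<unsigned char> table((4 + 4 * segments.size() + 7) & ~size_t(7));
  storeLE32(table.data(), static_cast<uint32_t>(segments.size() - 1));
  for (size_t i = 0; i < segments.size(); ++i)
    storeLE32(table.data() + 4 + 4 * i, static_cast<uint32_t>(segments[i].words));

  // One iovec for the table plus one per segment, pointing at the original
  // memory: no userspace copy. (Very large segment counts would hit IOV_MAX.)
  std::vector<iovec> iov;
  iov.push_back({table.data(), table.size()});
  for (const SegmentRef& s : segments)
    iov.push_back({const_cast<void*>(s.data), s.words * 8});

  // writev() may write only part of the data (e.g. a full socket buffer),
  // so skip past what was written and retry with the remainder.
  size_t first = 0;
  while (first < iov.size()) {
    ssize_t n = ::writev(fd, iov.data() + first, static_cast<int>(iov.size() - first));
    if (n < 0) {
      if (errno == EINTR) continue;
      throw std::runtime_error("writev failed");
    }
    size_t left = static_cast<size_t>(n);
    while (first < iov.size() && left >= iov[first].iov_len) {
      left -= iov[first].iov_len;
      ++first;
    }
    if (left > 0) {
      iov[first].iov_base = static_cast<char*>(iov[first].iov_base) + left;
      iov[first].iov_len -= left;
    }
  }
}
```

The only copy left is the one the kernel makes when writev() moves the bytes into its own buffers.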

> Is there a way for the application to send all segments as one message over, say, a socket, or would the application need to send multiple messages and reconstruct them at the receiving end? I presume that, as Kenton's response indicates for reads, a write cannot be truly zero-copy if the message ultimately has to be sent over a socket or queue. The copy that is saved by not using writeMessage() would then be the coalescing of multiple segments into one long segment. Is that an accurate understanding?

If you want true end-to-end zero-copy -- even in the kernel -- then you need to map a shared memory segment into both the sending and receiving processes. In this case you would not use writeMessage(); you would use a MessageBuilder that allocates segments directly in the shared memory area.
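A rough sketch of the shared-memory side, assuming the two processes are a parent and a child created with fork(): a bump allocator carves word-aligned segments out of a single MAP_SHARED | MAP_ANONYMOUS mapping, so words written by one process are visible to the other without any copy. `SharedSegmentArena` is hypothetical. In a real program this logic would sit inside a custom capnp::MessageBuilder subclass (whose allocateSegment() is expected to return zeroed memory), which this sketch does not implement; unrelated processes would use shm_open() or a mapped file instead of an anonymous mapping.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <sys/mman.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical arena: hands out segment memory from one shared mapping.
class SharedSegmentArena {
 public:
  explicit SharedSegmentArena(size_t words) : capacityWords_(words) {
    void* p = ::mmap(nullptr, words * sizeof(uint64_t), PROT_READ | PROT_WRITE,
                     MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED) throw std::runtime_error("mmap failed");
    base_ = static_cast<uint64_t*>(p);
  }
  ~SharedSegmentArena() { ::munmap(base_, capacityWords_ * sizeof(uint64_t)); }
  SharedSegmentArena(const SharedSegmentArena&) = delete;
  SharedSegmentArena& operator=(const SharedSegmentArena&) = delete;

  // Bump-allocate `words` words. Anonymous mappings start zero-filled and the
  // arena never reuses memory, so every returned segment is zeroed.
  // Returns nullptr when the arena is exhausted.
  uint64_t* allocateSegment(size_t words) {
    if (words > capacityWords_ - usedWords_) return nullptr;
    uint64_t* seg = base_ + usedWords_;
    usedWords_ += words;
    return seg;
  }

  // Offset of a segment from the start of the mapping. Another process that
  // maps the region at a different address would locate segments this way.
  size_t offsetOf(const uint64_t* seg) const { return static_cast<size_t>(seg - base_); }

 private:
  uint64_t* base_ = nullptr;
  size_t capacityWords_;
  size_t usedWords_ = 0;
};
```

After fork() the mapping sits at the same address in both processes, so plain pointers work. Otherwise the receiver still needs to learn where each segment starts and how long it is, which is where a small segment table (for example, offsets plus sizes) comes back in.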

But, assuming you don't want to use shared memory and want to stick with sockets, then writeMessage() is optimal.

-Kenton
 
Sune Sash

Jul 20, 2019, 5:04:31 PM
to Cap'n Proto

Thanks, Kenton. I missed that writeMessage() simply passes raw pointers to the segments in a call to writev().

If the application uses an IPC mechanism such as message queues instead of sockets, it would not be able to use the writeMessageToFd() interface. Is that a scenario where the application should fall back to its own framing and reassembly for sending and receiving multi-segment messages?

Thanks

Kenton Varda

Jul 20, 2019, 6:03:26 PM
to Sune Sash, Cap'n Proto
Hi Sune,

I guess SysV / POSIX message queues do not have any gather-write interface (like writev()), therefore it is probably impossible to send a multi-segment message over a message queue without performing an upfront copy to concatenate the segments. I suppose you could send each segment as a separate message, but that doesn't work if you have multiple concurrent senders, which is one of the main reasons to use message queues in the first place.
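For transports such as POSIX message queues that accept only a single contiguous buffer, a sketch of that upfront concatenation: copy the segment table and every segment into one word-aligned buffer. In Cap'n Proto itself, capnp::messageToFlatArray() produces this kind of flat buffer; `flattenSegments` and `SegmentRef` here are hypothetical stand-ins that show what the copy involves.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <stdexcept>
#include <vector>

struct SegmentRef {
  const void* data;  // segment memory, owned by the caller
  size_t words;      // segment size in 8-byte words
};

static void storeLE32(unsigned char* p, uint32_t v) {
  for (int i = 0; i < 4; ++i) p[i] = static_cast<unsigned char>(v >> (8 * i));
}

// Returns segment table + all segments as one contiguous, word-aligned buffer.
std::vector<uint64_t> flattenSegments(const std::vector<SegmentRef>& segments) {
  if (segments.empty()) throw std::invalid_argument("need at least one segment");

  // Table is 4 + 4*N bytes rounded up to whole 8-byte words.
  size_t tableWords = (4 + 4 * segments.size() + 7) / 8;
  size_t totalWords = tableWords;
  for (const SegmentRef& s : segments) totalWords += s.words;

  std::vector<uint64_t> flat(totalWords);  // zero-initialized, so padding is 0
  auto* bytes = reinterpret_cast<unsigned char*>(flat.data());
  storeLE32(bytes, static_cast<uint32_t>(segments.size() - 1));
  for (size_t i = 0; i < segments.size(); ++i)
    storeLE32(bytes + 4 + 4 * i, static_cast<uint32_t>(segments[i].words));

  // This is the extra userspace copy a gather-write API would have avoided.
  size_t offset = tableWords;
  for (const SegmentRef& s : segments) {
    if (s.words > 0) std::memcpy(flat.data() + offset, s.data, s.words * 8);
    offset += s.words;
  }
  return flat;
}
```

The result could then be sent as one queue message, e.g. `mq_send(queue, reinterpret_cast<const char*>(flat.data()), flat.size() * 8, 0)`, provided the queue's `mq_msgsize` is large enough; because the buffer is self-describing, concurrent senders do not interleave segments.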

In general, the best way to transmit a message probably depends on the transmission API. In some cases you will want to do your own framing, yes. In other cases it might work to implement custom subclasses of kj::InputStream and kj::OutputStream, then use capnp::writeMessage() and capnp::InputStreamMessageReader on top of those.

-Kenton

Sune Sash

Jul 20, 2019, 6:39:33 PM
to Cap'n Proto
Thanks for the response.

> I guess SysV / POSIX message queues do not have any gather-write interface (like writev()), therefore it is probably impossible to send a multi-segment message over a message queue without performing an upfront copy to concatenate the segments.

[S] In that case, using an interface like messageToFlatArray() seems like a good choice, even though it incurs one additional copy.

> I suppose you could send each segment as a separate message, but that doesn't work if you have multiple concurrent senders, which is one of the main reasons to use message queues in the first place.
>
> In general, the best way to transmit a message probably depends on the transmission API. In some cases you will want to do your own framing, yes.

[S] Given this discussion, it sounds like the only reason to do one's own framing would be to generate the segment table, if one chooses to implement a custom builder and reader that allocate and read segments in shared memory.

Thanks
Shweta

 