On Mon, Sep 9, 2013 at 1:37 PM, Kenton Varda <temp...@gmail.com> wrote:
> On Mon, Sep 9, 2013 at 10:04 AM, Andrew Lutomirski <an...@luto.us> wrote:
>>
>> Looking at it, though, I have another question: I bet it's possible to
>> shave 16 bytes off of single-segment serialized messages. A
>> multiple-segment message starts with a count of segments. Assuming
>> that the only allowable message roots are structs (is this true? can
>> I serialize an Int16?), the first segment of a message will always
>> start with a struct pointer. So, if the format changed such that the
>> struct pointer tag wasn't zero, then as long as a multi-segment
>> message could have no more than, say, 2^62 - 1 segments, a segment
>> count could never look like a struct pointer. Then single-segment
>> messages could omit the segment count and the length of the first
>> segment.
>
>
> Interesting line of thought. But note that the segment count is 4 bytes,
> and each segment size is also 4 bytes, so the segment table for a
> single-segment message is actually only 8 bytes. Also, since the "tag" bits
> of a pointer are the least-significant bits, I'm not sure how having a
> non-zero tag for structs would help distinguish them from a segment count.
Whoops -- I read that backwards.
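To make the sizes concrete, here is a rough sketch of the segment table as described above: a 4-byte count followed by one 4-byte size per segment. It assumes the fields are little-endian, that the count is stored as N-1, and that the table is padded to a word boundary; the function name `segment_table` is made up for illustration.

```python
import struct

WORD_BYTES = 8  # a Cap'n Proto word is 8 bytes


def segment_table(segment_sizes_in_words):
    """Build the table that precedes the segments (hypothetical helper).

    Assumed layout: uint32 (segment count - 1), then one uint32 size in
    words per segment, then zero padding up to the next word boundary.
    """
    n = len(segment_sizes_in_words)
    if n == 0:
        raise ValueError("a message needs at least one segment")
    table = struct.pack("<I", n - 1)
    table += struct.pack("<%dI" % n, *segment_sizes_in_words)
    remainder = len(table) % WORD_BYTES
    if remainder:
        table += b"\x00" * (WORD_BYTES - remainder)
    return table
```

Under those assumptions a single-segment message carries 8 bytes of table, not 16, and a two-segment message carries 16 (12 bytes of fields plus 4 of padding).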
>
> It would probably be possible to make a rule that the segment count must
> be less than 2^31, so if the top bit is set, then we can interpret the
> segment table in a completely different way. We could, for instance,
> interpret bytes 4-7 as the upper bytes of a struct pointer (indicating the
> section sizes) and assume the lower bytes of that pointer to be all-zero
> (indicating a struct pointer pointing to the very next word, which is the
> norm for a message root).
>
> I'm not really sure if this is worth the complication, though.
A simpler approach might be to send the number of segments as ((N-1)
<< 1) | 1 -- that is, ensure that the low bit is always set. That
also makes sure that struct pointers never look like segment tables.
In the simplest case, doing just this means that trying to read a
segment table as a struct or vice versa is guaranteed to fail cleanly.
Your fancier thing has the benefit that this kind of struct is still
self-delimiting, which is nice (given the below).
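A minimal sketch of that low-bit encoding, assuming the count field is a 32-bit little-endian value at the start of the message (so its low bit is also the low bit of the first 64-bit word) and that a struct pointer is a word whose two low tag bits are zero. The function names are hypothetical.

```python
MAX_SEGMENTS = 1 << 31  # largest N whose encoding still fits in 32 bits


def encode_segment_count(n):
    """Encode N segments as ((N-1) << 1) | 1 so the low bit is always set."""
    if not 1 <= n <= MAX_SEGMENTS:
        raise ValueError("segment count out of range")
    return ((n - 1) << 1) | 1


def decode_segment_count(field):
    """Invert encode_segment_count; reject fields whose low bit is clear."""
    if field & 1 == 0:
        raise ValueError("not a segment count (low bit is clear)")
    return (field >> 1) + 1


def looks_like_struct_pointer(word):
    """Assumption: a struct pointer has both low tag bits equal to zero."""
    return word & 0b11 == 0
```

With this, `encode_segment_count(n)` is odd for every valid `n`, so `looks_like_struct_pointer` rejects it, and `decode_segment_count` raises on anything that starts with a struct pointer. It does not by itself distinguish a segment count from pointers with a non-zero tag.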
>
>>
>> Another possible tweak would be to always omit the length of the last
>> segment, since I doubt that anyone will use the segment table as a way
>> to make the messages self-delimiting.
>
>
> Actually, it's very much intended for such use. It currently works just
> fine to write multiple Cap'n Proto messages to a stream without any
> additional delimiting.
>
> In cases where the transport provides its own framing, people are welcome to
> skip the "standard" serialization and instead call getSegmentsForOutput()
> and frame them however they want.
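For illustration, a rough sketch of why the segment table is enough to delimit back-to-back messages on a stream. It assumes the current layout (little-endian uint32 count stored as N-1, one uint32 size in words per segment, table padded to a word boundary) and a file-like stream whose `read(n)` returns all `n` bytes unless it hits end of stream. `write_message` and `read_message` are made-up names, not part of the Cap'n Proto API.

```python
import io
import struct

WORD_BYTES = 8


def _read_exact(stream, n):
    """Read exactly n bytes or fail; assumes read() does not return short."""
    data = stream.read(n)
    if len(data) != n:
        raise EOFError("stream ended in the middle of a message")
    return data


def write_message(stream, segments):
    """Write one framed message; each segment is bytes, a multiple of 8 long."""
    if not segments or any(len(s) % WORD_BYTES for s in segments):
        raise ValueError("need at least one segment, each a whole number of words")
    n = len(segments)
    table = struct.pack("<I", n - 1)
    table += struct.pack("<%dI" % n, *(len(s) // WORD_BYTES for s in segments))
    if len(table) % WORD_BYTES:
        table += b"\x00" * 4  # fields are 4 bytes each, so padding is 0 or 4
    stream.write(table)
    for segment in segments:
        stream.write(segment)


def read_message(stream):
    """Return the next message as a list of segments, or None at a clean EOF."""
    head = stream.read(4)
    if not head:
        return None
    if len(head) != 4:
        raise EOFError("stream ended in the middle of a segment count")
    n = struct.unpack("<I", head)[0] + 1
    sizes = struct.unpack("<%dI" % n, _read_exact(stream, 4 * n))
    if n % 2 == 0:
        _read_exact(stream, 4)  # skip the padding that aligns the table
    return [_read_exact(stream, size * WORD_BYTES) for size in sizes]
```

Calling `read_message` in a loop until it returns `None` recovers each message without any outer length prefix, which is the property that dropping the last segment's length would give up.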
--Andy
>
> -Kenton