Representing fixed-size arrays and blobs

453 views
Skip to first unread message

Jan Huwald

unread,
Aug 22, 2013, 4:20:56 PM8/22/13
to capn...@googlegroups.com
Hi,

when reading the encoding spec I missed the ability to store blobs of a
fixed size that are larger than 64 bit in the same way primitive data
types are stored in a struct, that is: inline in the structure,
continuous in memory, and without additional pointer or tag.

In my use case some messages are soly composed of several non-optional
IPv6 addresses and cryptographic hashes. Right now I could represent them as
1. Blob/List
2. Struct of named UInts

But none of those values is ever going to change in size or get another
field. So the overhead of putting them in a separate structure or of
having a seperate variable-length blob for each of them seems quite
wasteful to me. In addition, the above implementation forces checks for
existence or correct length and thus introduces a sources of bugs. Those
checks are not necessary for types of fixed length that are non-optional.

To mitigate I see three options:
1. Add UIntX for all powers of two, or better all multiples of 64 to
accommodate weirdos like SHA384.
2. Add a fixed size array type to schema definitions. Something like:
struct Msg {
src_ip @0 :UInt8[16];
hash @1 :Uint64[4];
}
3. Add the ability to add fixed size/unextendable types that can contain
only other primitive or unextendable types and are always embedded
into the host structure (never freestanding).

The obvious disadvantages are:
- lack of host language support (1)
- byte order problems (1, 2 if incorrectly used)
- implementation complexity (3, maybe 2)
- inability to take raw pointer (depending on impl: 2, 3)
- non-obvious extension of alignment: has UInt128 a 128b alignment? (1)
- feature creep (1,2,3)

The last reason is probably the most pressing. But is there a technical
reason not to allow fixed-sized blobs the are stored like primitive
types? Is there a schema definition trick I have overlooked?

With best regards,
Jan Huwald

signature.asc

Kenton Varda

unread,
Aug 23, 2013, 6:00:44 PM8/23/13
to Jan Huwald, capnproto
Hi Jan,

Sorry this was delayed; Google Group apparently thought it was spam and didn't notify me right away.

This is actually something we've struggled with.  In fact, at one point early in Cap'n Proto's development, I actually implemented "inline" fixed-length structs and list.  However, in the process of doing so, it became clear that they added quite a bit of complexity to the Cap'n Proto implementation.  It's hard to explain without going into a lot of detail about how everything works internally, but some parts of the code got really ugly, and there were a lot of weird quirks and pitfalls that came up.  Ultimately, other developers encouraged me to drop this feature, and I decided to do so, even though I had spent a couple weeks getting it working.

That said, one of the arguments against the feature was that it didn't actually buy very much.  Yes, it seems super ugly and inefficient to store a fixed-width array outside the struct.  But in practice, the overhead is only 8 bytes of memory and negligible CPU.  We're not talking about a call to malloc() here with associated heap bookkeeping.  Since Cap'n Proto uses arena allocation, the pointer itself is the only bookkeeping needed, and allocating space is just a matter of incrementing the "end of segment" pointer.

I do agree, though, that the need for the receiver to verify the list size is problematic.  This could be a common source of DoS attacks.  If we found a narrow solution for this problem, do you at least agree that the efficiency isn't a big deal?

-Kenton

Kenton Varda

unread,
Aug 28, 2013, 1:52:20 PM8/28/13
to Jan Huwald, capnproto
I had an interesting thought about this:  I don't think the list elements actually need to be located together in memory.

Consider that Cap'n Proto lists today do not give you a direct pointer to the underlying data -- you must use operator[] on the list object to access elements.  This is actually important, not just for handling endianness, but to allow protocol evolution -- a list of non-structs can be upgraded to a list of structs, and the structs can gain new fields over time.

So about inline lists:  We don't actually need the fields to appear sequentially in memory, we just need to generate accessor methods that know how to map numeric indices to elements.  So if you have an inline list of four int32s, we might as well allocate it as if we were allocating four separate int32 fields.

This has a couple neat advantages, implementation-wise:
- The layout algorithm doesn't have to become any more complicated, and can continue to guarantee no more than 63 bits lost to padding.
- Reading the values can be implemented in terms of the existing field accessor implementation, with straightforward support for default values.  I had been thinking before that it wouldn't be possible to support default values for these inline lists, and that the implementation would have to keep around a buffer of zeros to use to back a default instance of the list in cases where the parent struct was written using an older version of the schema that didn't yet include the inline list field.  Not having to do that really makes me feel a lot more comfortable about this.

I am thinking that the syntax for declaring an inline list would be like so:

    struct Foo {
      bar[4] @0 :Int32;  # list of four Int32s
    }

This syntax is intended to make it clear that "fixed list" is not an independent type, but an aspect of struct layout.  E.g. you can't have a list of fixed lists (something I found painful when trying to implement inline lists the first time).

What do people think?  Have I gone off the deep end?

-Kenton


On Thu, Aug 22, 2013 at 1:20 PM, Jan Huwald <j...@sotun.de> wrote:

Andrew Lutomirski

unread,
Aug 28, 2013, 2:01:34 PM8/28/13
to Kenton Varda, Jan Huwald, capnproto
Yes :)

I can't actually think of anything wrong with it other than the
awkward code you'll generate when you try to copy the thing to a flat
array.

--Andy

Andreas Stenius

unread,
Aug 28, 2013, 4:41:47 PM8/28/13
to Kenton Varda, Jan Huwald, capnproto
2013/8/28 Kenton Varda <temp...@gmail.com>
[...]
I am thinking that the syntax for declaring an inline list would be like so:

    struct Foo {
      bar[4] @0 :Int32;  # list of four Int32s
    }


I like it :)
 
This syntax is intended to make it clear that "fixed list" is not an independent type, but an aspect of struct layout.  E.g. you can't have a list of fixed lists (something I found painful when trying to implement inline lists the first time).

What do people think?  Have I gone off the deep end?

No, I think it plays out well..

<hijack thread>
Speaking of default values; it occurred to me that changing the default value is a incompatible change. Reading up on the docs, it certainly isn't mentioned as safe, thus should be considered unsafe, which it is..
yet, it feels like a trivial enough change that it ought to be safe (until you know how default values are treated)..
Maybe have a informal section in there with a few examples of breaking changes, such as changing the default values could be helpful..
</hijack thread>

Cheers

Kenton Varda

unread,
Aug 28, 2013, 4:52:02 PM8/28/13
to Andreas Stenius, Jan Huwald, capnproto
On Wed, Aug 28, 2013 at 1:41 PM, Andreas Stenius <g...@astekk.se> wrote:
<hijack thread>
Speaking of default values; it occurred to me that changing the default value is a incompatible change. Reading up on the docs, it certainly isn't mentioned as safe, thus should be considered unsafe, which it is..
yet, it feels like a trivial enough change that it ought to be safe (until you know how default values are treated)..
Maybe have a informal section in there with a few examples of breaking changes, such as changing the default values could be helpful..
</hijack thread>

Good idea, added to my todo list for the next release.

Jan Huwald

unread,
Aug 29, 2013, 5:25:35 AM8/29/13
to capn...@googlegroups.com
On 08/28/2013 08:01 PM, Andrew Lutomirski wrote:
> On Wed, Aug 28, 2013 at 10:52 AM, Kenton Varda <temp...@gmail.com> wrote:
>> I am thinking that the syntax for declaring an inline list would be like so:
>>
>> struct Foo {
>> bar[4] @0 :Int32; # list of four Int32s
>> }
>>
>> [...]
>>
>> What do people think?

Suits me, especially the possible gain in representational compactness.

For default values, one could use the same syntax as for non-inline lists.

It is also close to being extendable in a backward-compatible fashion:
increasing the list size would be possible, if one could put a version
number for the time of the increase somewhere. Like

struct Foo {
bar [4]@0, [8]@2, [12]@3 :Int8; # list of four, later eight and
foo @1 : Bool; # finally twelve bytes
}

Only that this looks quite ugly and thus should not be done until it is
_really_ necessary.


> I can't actually think of anything wrong with it other than the
> awkward code you'll generate when you try to copy the thing to a flat
> array.

The code should be equivalent to a completely unrolled loop with the
offsets encoded as immediates. Given that the (/my) use case is
relatively small lists (<= 8 elements) this is probably optimal. For
lists of types smaller than 64b reads of adjacent values might even be
coalesced.

Jan

signature.asc
Reply all
Reply to author
Forward
0 new messages