struct Foo fixed(2 words, 3 pointers) {# ... some fields that come to less than 2 words, 3 pointers ...}struct Bar {foo @0 :Inline(Foo);fourMoreFoos @1 : InlineList(Foo, 4);}
> struct Foo fixed(2 words, 3 pointers) {How about using a different term then struct? Though I presume it
# ... some fields that come to less than 2 words, 3 pointers ...
}
doesn't always need to be used Inline and normal usage is still
supported, in which case keeping struct might be wise.
> InlineData would make sense
Is InlineList(UInt8) sufficient or would you still want the typedef?
> In a somewhat related development, it occurred to me that when you are encoding a list of structs where the struct type is less than one word in size, you might as well just encode it like a primitive-value list. E.g. if you have a pair of Int16s in a struct, why not encode it like a list of 32-bit data values?This has an interesting side effect. It means a List(UInt8) can be
upgraded to a List(UInt16) where the first byte is taken.
Another interesting twist. In my API I return an inner pointer to the
struct data for a composite list. This way composite lists and pointer
lists look the same for readers that don't care. Now I would return an
inner pointer to the list data for any non-pointer list. Supporting
List(Bool) for this is kind of annoying. My pointers at the moment are
byte offset based.
--
> What we really want to know is, are there legitimate use cases where the savings of inline fields would be a large percentage of the overall message size?Do you care about heavily optimizing the in memory size or is the
compressed format more important?
> Note that manual inlining is not always feasible. Consider the case of a union which you really want to be reusable. You really can't just copy the whole union around, but most unions are 1 or 2 words so embedding it by pointer means 50%-100% overhead.I was wondering about this. If the struct is less than 7 bytes, could
we embed it in the pointer value itself. IE use a pointer tag of 2 +
something in first byte, and then the next 7 bytes are the actual
struct value. Would save you a word, but seems a bit hackish ... But
it allows an implementation to use it and still have the ability to
switch to a normal pointer when the struct becomes larger.
The whole inline data thing feels like you're requiring the user to
know and guess the max size of the struct in the future, which would
necessitate that they know the packing format, alignment, etc and can
manually figure out how much space might be needed.
I'm not sure if thats the case. If we start out withList(UInt8)
Sometime down the road this gets updated to
struct Foo {
a @0 :UInt8;
}
List(Foo)
New users of List(Foo) will still serialize it as a List(U8).
Further down the line this gets updated to
struct Foo {
a @0 :UInt8;
b @1 :UInt8;
}
Newer users will create it as a List(U16) via the same rule.
All three users should be able to read data from the new users.
I've handled this quite cleanly by divorcing the API data size from
the serialized data size, even if the data is serialized as a
composite type. As long as the API size <= serialized size, I just
return the first x bytes.
For my C code atm I don't allow the upgrade, so to read a List(Bool)
it must also be serialized as a List(Bool). This vastly simplifies all
the other cases and means I can use pointers rather than bit offsets.
The go code uses bit offsets.
Further thinking: the Bool type itself is violating the Liskov substitution principle. If you have a List(T), where T can be "any type", and "bool is a type", then List(Bool) should be valid, and should behave exactly as all other List(T). Instead there are special restrictions on List(Bool).
Here's a third idea: Eliminate Booleans as a base type. It sounds crazy, but I don't think it's totally crazy.- As suggested earlier, offer a BoolList or VectorBool type for variable-length bool lists.- Offer a "Bitset" type with symbolic names for the bits. Without thinking too deeply, I think it may be possible to offer an upgrade path from bitset to struct { bitset@0, <other fields>}
This is, I think, just complicated enough to inspire a featurerequest: can we have something like capnpc
--are-definitions-compatible?
--
You received this message because you are subscribed to the Google Groups "Cap'n Proto" group.
To unsubscribe from this group and stop receiving emails from it, send an email to capnproto+...@googlegroups.com.
Visit this group at http://groups.google.com/group/capnproto?hl=en-US.
The "same direction" constraint is surprising. Is this because you don't allow holes in the numbering? If you did, and you never reused an old number for something else, would the "same direction" constraint go away? Are you disallowing this only to avoid a minor representation cost?
On Tue, Apr 30, 2013 at 8:51 PM, Mark S. Miller <eri...@google.com> wrote:
The "same direction" constraint is surprising. Is this because you don't allow holes in the numbering? If you did, and you never reused an old number for something else, would the "same direction" constraint go away? Are you disallowing this only to avoid a minor representation cost?So, JSON, Protobufs, and everything else all have the same directionality constraint, but only in that you can't reuse the same field identifier (name in JSON, number in protobufs) for a differently-typed field. I assume you only find the constraint surprising in the case where reuse isn't happening, right? Otherwise, if compatibility were fully transitive and commutative, then every struct would be compatible with every other struct, since every struct is clearly compatible with the empty struct.I think what you're missing about Cap'n Proto is that the field numbers do not directly translate to positions. Different fields have different widths, and pointers are actually segregated from flat data. Therefore, the position of any particular field necessarily depends on what fields were declared "before" it. The purpose of the field numbers in Cap'n Proto is only to provide a stable ordering -- one which developers are unlikely to "mess up" by adding new fields somewhere other than at the end of the struct.The restriction that field numbers cannot have "holes" follows from this. Removing an earlier field would cause all later fields to shift forward, so this cannot be allowed.In practice, no one ever removes fields from protobufs, because doing so leads to the risk that the field number will be accidentally reused. Instead, they rename the field to something like "OBSOLETE". So, Cap'n Proto is merely enforcing what was already best practice.
message AnOldAndCruftyMessage {optional int32 a_field = 3;// ...many more fields// Deleted fields: 10, 17, 22, 3.}
struct Person {name @0 :Text;birthdate @3 :Date;
email @1 :Text;phones @2 :List(PhoneNumber);DELETED_crufty_extras_here @10 :DELETED_CruftyStruct;}struct DELETED_CruftyStruct {// This used to have lots of crufty fields, but is deleted.}
struct Person {name @0 :Text;birthdate @3 :Date;email @1 :Text;phones @2 :List(PhoneNumber);Deleted(10 :Struct, 5 :Int32);}
On Mon, Apr 29, 2013 at 6:09 PM, Kenton Varda <temp...@gmail.com> wrote:
I am sympathetic to this argument, particularly after implementing inlines and finding them more complicated and quirky than I expected. E.g. the fact that inline fields cannot have default values is weird, but implementing default values would have excessive overhead. And the fact that fixed-width structs explicitly abandon the extensibility principles that are considered a key feature of the whole protocol is... worrying.I think that measuring the absolute number of bytes saved by an inline field doesn't tell us much. What we really want to know is, are there legitimate use cases where the savings of inline fields would be a large percentage of the overall message size?
Agreed.
Note that manual inlining is not always feasible. Consider the case of a union which you really want to be reusable. You really can't just copy the whole union around, but most unions are 1 or 2 words so embedding it by pointer means 50%-100% overhead. On the other hand, my argument against making unions reusable in the first place was explicitly because you might later decide you want to attach some extra information to them, which you can't do with a fixed-width struct.
Good point. How about allowing users to trade processing time for memory efficiency, and get reusable unions to boot? Encode the maximum size of the union in the first three bits of the discriminant* (I'm sure users will be fine with only 8192 possible union fields), but rather than having the value immediately follow, place it at the end of the data section, and place the values of unions with higher ordinals in reverse order from there. For example:
union A {
one @0 :Boolean;
two @1 :UInt8;
}
union B {
one @0 :Int16;
two @1 :Object;
}
struct Foo {
x @0 :A;
bar @1 :UInt8;
y @2 :B;
baz @3 :Int16;
qux @4 :Text;
}
+--------+---------------+-----------+------------+
| x_s(3) | x_disc(13) | bar(8) | pad(8) | // normal, absolutely-positioned fields
+--------+---------------+-----------+------------+
| y_s(3) | y_disc(13) | baz(16) |
+--------+---------------+------------------------+
| pad(32) | // pad to word boundary
+------------------------+-----------+------------+
| y_data(16) | pad(8) | x_data(8) | // union data fields in reverse order
+------------------------+-----------+------------+
| qux(64) | // here be the pointer section
| |
+-------------------------------------------------+
| y_pointer(64) | // union pointer fields in reverse order
| |
+-------------------------------------------------+
This allows unions to evolve independently of one another, making them reusable (i.e. typed), while consuming no extra space. The downside, however, is that since the size of a typed union is no longer known beforehand, its position must be calculated relative to all typed unions before it - in other words, worst-case access time is linear in the number of typed unions in the struct (although presumably the information will be cached on successive lookups). Since I expect the number of unions (especially typed unions) per struct to be quite small, I doubt this will be an issue in practice.
Since typed unions may grow freely, extra information can be attached by adding a new union field pointing to a struct with the data, although the other union fields might have to be duplicated in the referenced struct, and the client API would be clumsy. Instead, by stealing another bit from the discriminant, we can indicate whether there is an "extra information" struct pointer following the normal pointer section entry for the union (if present). This makes handling default values for versions missing the extra information a snap, and takes up the exact same amount of space as it would if the user had had the foresight to wrap the union in a struct beforehand. First-class support in the schema language could look like one of the following, and would also be applicable to enums:
union Foo {
one @0 :Boolean;
two @1 :UInt8;
data {
bar @0 :Float32;
qux @1 :Text;
}
}
# or
struct Extra {
bar @0 :Float32;
qux @1 :Text;
}
union Foo data(Extra) {
one @0 :Boolean;
two @1 :UInt8;
}
Thoughts?
* Similar in spirit to the C section of list pointers, but instead divided into two bits for the size in bytes (1, 2, 4, or 8; 0 is already covered by enums), and one for whether a pointer value is possible.
Anyone else have an opinion on this? Did I just waste the last week working on a misfeature? (Well, I did fix a bunch of small things at the same time, so not a complete waste, but...)
<snip>
What about making List(Bool) a separate toplevel type? BoolVector? BoolList? Or even consider omitting specialized list of bools entirely from the first version or two of capnproto until a clear need is demonstrated.
What are the expected use cases for lists of bools? Expanding bitfields? Wouldn't those be better served by regressing them to boolean types?In my experience violating the liskov substitution principle is bad, and List(Bool) feels like it's doing exactly that.
Under this scheme, calculating the offset of a union value would be complicated. Consider the case where a 1-byte union is followed by a 2-byte union. The 2-byte union has to be properly-aligned, so simple addition is not good enough. Also, these unions cannot be located in space otherwise wasted to padding. A 1-byte union followed by a 4-byte union followed by another 1-byte union would end up taking 16 bytes. Seems too complicated.
I'm also not confident at all in your expectation that few structs would include more than one union. For example, just in schema.capnp I have Type and Value unions which usually come in pairs. I could easily see someone used to algebraic typing writing lots of union type and using them extensively.