Slightly changing layout rules for unions... to support "groups"

Kenton Varda

Aug 1, 2013, 8:52:48 PM

to capnproto

I've decided to make a small change to the rules governing how union members are laid out within a struct. Imagine the following case:

struct S {
  u @0 union {
    foo @1 :Int32;
    bar @2 :Int64;
    baz @3 :Int8;
  }
}

As you know, the offset of each field in the struct depends only on members with lower ordinals, never members with higher ordinals, so that adding new members never changes the offset of existing members.

In this case, the fields are laid out like so:

- Since the union itself has ordinal zero, its discriminant is placed first, at offset zero.

- foo is placed at the lowest available properly-aligned offset, namely 32 bits.

- bar cannot be placed to overlap foo, because it is larger than foo and would not be aligned at an offset of 32 bits. So it is placed at an offset of 64 bits.

- Under the existing rules, baz is placed to overlap the largest existing union member, namely bar, at an offset of 64 bits.

- Under the new rules, baz is placed to overlap the smallest existing union member which is at least as large as baz. So, it overlaps foo, at 32 bits.
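To make the difference concrete, here is a rough Python sketch of the two rules applied to the struct above. It is a simplified model and not the actual compiler logic: it assumes a 16-bit discriminant at offset zero, only handles primitive members, and the names layout_union and first_free_offset are made up for illustration.

```python
def first_free_offset(used, size):
    """Lowest offset that is a multiple of `size` and overlaps no used range."""
    offset = 0
    while any(offset < end and start < offset + size for start, end in used):
        offset += size
    return offset


def layout_union(members, rule):
    """Return {name: bit offset} for union members given in ordinal order.

    members: list of (name, size_in_bits)
    rule:    "old" (overlap the largest existing member) or
             "new" (overlap the smallest existing member that is big enough)
    """
    used = [(0, 16)]   # assumption: 16-bit discriminant at offset zero
    placed = []        # (name, offset, size) of members placed so far
    for name, size in members:
        if rule == "old":
            largest = max(placed, key=lambda m: m[2], default=None)
            host = largest if largest and largest[2] >= size else None
        else:
            fits = [m for m in placed if m[2] >= size]
            host = min(fits, key=lambda m: m[2], default=None)
        if host is not None:
            offset = host[1]                        # overlap an existing member
        else:
            offset = first_free_offset(used, size)  # claim new space
            used.append((offset, offset + size))
        placed.append((name, offset, size))
    return {name: offset for name, offset, _ in placed}


members = [("foo", 32), ("bar", 64), ("baz", 8)]
print(layout_union(members, "old"))  # {'foo': 32, 'bar': 64, 'baz': 64}
print(layout_union(members, "new"))  # {'foo': 32, 'bar': 64, 'baz': 32}
```

Only baz moves between the two rules. foo and bar land at 32 and 64 bits either way, because each member is placed using only the members with lower ordinals.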

This change has no advantages or disadvantages on its own. The reason I want to introduce it is to support a concept of "groups", which are members of a union that actually have multiple fields:

struct S {
  u @0 union {
    foo @1 :Int32;
    bar @2 :Int64;
    corge group {
      baz @3 :Int8;
      qux @4 :Int64;
    }
  }
}

Here, baz and qux can be present at the same time, because they are both part of group "corge".

I believe groups are necessary to solve a common problem: You've created a union containing some primitive values and then, later on, you realize that one of the union members should have some extra data associated with it. You can't change that member to a struct, since that would be backwards-incompatible. So probably what you end up doing is adding a new field outside the union and saying "This is only valid if the union's field baz is set". That's ugly, error-prone, and wasteful. Instead, the language should allow you to declare exactly what you mean -- that the new field comes together with baz.

In any case, since the whole point of groups is to be introduced retroactively, it is important that the above two example declarations be compatible (baz should have the same offset in both).

But where does that leave qux? Under the old rules, baz would be overlapping bar, and thus the only option for qux would be to allocate all-new space within the struct, placing it at an offset of 128 bits. Under the new rules, baz overlaps foo, and therefore qux can overlap bar, and the struct size stays fixed.
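The effect on qux can be checked with a small Python sketch. This is a simplified model under stated assumptions, not the real layout algorithm: foo is taken as Int32 as in the first declaration, the discriminant is assumed to be 16 bits at offset zero, a whole slot is treated as taken once any field of a group sits in it, and the function names are hypothetical.

```python
def first_free_offset(used, size):
    """Lowest multiple of `size` whose range overlaps nothing in `used`."""
    offset = 0
    while any(offset < end and start < offset + size for start, end in used):
        offset += size
    return offset


def layout(members, rule):
    """Place union members in ordinal order.

    members: list of (name, size_in_bits, group); group is None for a
             plain union member.
    rule:    "old" or "new".
    Returns ({name: bit offset}, data size in bits rounded up to 64).
    """
    used = [(0, 16)]   # assumption: 16-bit discriminant at offset zero
    slots = []         # space already owned by the union
    offsets = {}
    for name, size, group in members:
        if rule == "old":
            # Only the largest existing slot is a candidate.
            candidates = sorted(slots, key=lambda s: -s["size"])[:1]
        else:
            # Try the smallest slots first.
            candidates = sorted(slots, key=lambda s: s["size"])
        host = None
        for slot in candidates:
            # Fields of one group coexist, so they may not share a slot.
            taken = group is not None and group in slot["groups"]
            if slot["size"] >= size and not taken:
                host = slot
                break
        if host is None:
            offset = first_free_offset(used, size)
            used.append((offset, offset + size))
            host = {"offset": offset, "size": size, "groups": set()}
            slots.append(host)
        if group is not None:
            host["groups"].add(group)
        offsets[name] = host["offset"]
    data_bits = max(end for _, end in used)
    return offsets, -(-data_bits // 64) * 64   # round up to whole words


members = [
    ("foo", 32, None),
    ("bar", 64, None),
    ("baz", 8, "corge"),
    ("qux", 64, "corge"),
]
old_offsets, old_size = layout(members, "old")
new_offsets, new_size = layout(members, "new")
print(old_offsets, old_size)  # {'foo': 32, 'bar': 64, 'baz': 64, 'qux': 128} 192
print(new_offsets, new_size)  # {'foo': 32, 'bar': 64, 'baz': 32, 'qux': 64} 128
```

In this model the old rule grows the data section from two words to three as soon as qux is added, while the new rule keeps it at two.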

This rule change is, unfortunately, backwards-incompatible on the wire. However, it's a fairly obscure corner case, and I'm not aware of anyone actually using this in production yet, so this should not be a big deal, right? In the worst case, if you had an existing struct declared like the first version of S above, you could simply change baz's type from Int8 to Int64 to force its offset back to what it was before.
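As a quick check of that workaround, here is a minimal Python sketch of the new rule alone. It assumes foo (Int32) and bar (Int64) are already placed at 32 and 64 bits; the helper name overlap_offset is hypothetical.

```python
def overlap_offset(placed, size):
    """New rule: offset of the smallest already-placed member whose size is
    at least `size`, or None if no member is large enough."""
    fits = [(s, off) for off, s in placed.values() if s >= size]
    return min(fits)[1] if fits else None


placed = {"foo": (32, 32), "bar": (64, 64)}   # name -> (bit offset, size in bits)

as_int8 = overlap_offset(placed, 8)     # baz declared as Int8
as_int64 = overlap_offset(placed, 64)   # baz redeclared as Int64
print(as_int8, as_int64)  # 32 64
```

Declared as Int64, baz no longer fits in foo, so the smallest member that can hold it is bar, which puts it back at 64 bits.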

(I wonder if I should have just had people specify bit offsets rather than ordinals in the first place...)