Safety of retroactive unionization


Yaron Minsky

May 14, 2019, 4:06:42 PM
to capn...@googlegroups.com
Retroactive unionization is only backwards compatible, not forward
compatible, right? So, if I start with this struct:

struct Person {
  name @0 :Text;
  email @1 :Text;
}

And decide that I want to evolve it to this one:

struct Person {
  name @0 :Text;
  union {
    email @1 :Text;
    age @2 :Float64;
  }
}

(I know it's not a very meaningful example).

If I write something in the new spec that uses the age branch of the
union, the old struct can try to read it, and get very confused. In
particular, if someone tries to read the email for a struct that
actually populates age, they'll end up reading a Float64 as if it were
a pointer to a text block.

Am I understanding the issue correctly? If so, how do people handle
these kinds of protocol changes? Do people use retroactive
unionization in practice? Do people use schema validation of some
kind to detect when someone makes a potentially unsafe change like
this one?

y

Ian Denhardt

May 14, 2019, 4:47:01 PM
to Yaron Minsky, capn...@googlegroups.com
Quoting Yaron Minsky (2019-05-14 16:06:29)

> If I write something in the new spec that uses the age branch of the
> union, the old struct can try to read it, and get very confused.
> In particular, if someone tries to read the email for a struct that
> actually populates age, they'll end up reading a Float64 as if it were
> a pointer to a text block.

In this case, the failure mode is a bit different, because the wire
encoding separates pointers and basic types (like Float64) into two
sections:

https://capnproto.org/encoding.html#structs

For this example, the old code would read the email as a null pointer,
ignoring the union tag saying that email is not set. If you added more
pointer fields to the union, like:

struct Person {
  name @0 :Text;
  union {
    email @1 :Text;
    age @2 :Float64;
    emergencyContact @3 :Person;
    mailingAddress @4 :Text;
  }
}

Then it would read the pointer back without realizing it was a different
field. In the above, if mailingAddress were set you might run into some
amusing bugs where code tried to send email to street addresses. If
emergencyContact were set, I know the Haskell implementation will throw
an exception when it reads it, since it's a struct pointer instead of a
text (list) pointer. I believe the C++ implementation can be configured
to either do that or return a default value? Not sure about others.

There are also some docs on protocol evolution:

https://capnproto.org/language.html#evolving-your-protocol

> Am I understanding the issue correctly? If so, how do people handle
> these kinds of protocol changes? Do people use retroactive
> unionization in practice?

If you have the foresight to know that you *might* add other variants
later, you can do something like this:

https://github.com/capnproto/capnproto/blob/master/c%2B%2B/src/capnp/rpc.capnp#L1016
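
For the Person example, that day-one pattern might look something like
this (a sketch; the group name and the Void placeholder are
illustrative):

  struct Person {
    name @0 :Text;
    contact :union {
      # Declared up front, with a Void placeholder, so that new variants
      # can be added later without retroactive unionization.
      unset @1 :Void;
      email @2 :Text;
    }
  }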

...but obviously that's not really "retroactive." I don't think there's
a general solution, and I haven't seen it done in practice.

> Do people use schema validation of some kind to detect when someone
> makes a potentially unsafe change like this one?

Tools that do this sort of thing have been discussed before, and the
consensus is that they would be useful, but I don't think anyone has
written any.

Hope this helps,

-Ian

Kenton Varda

May 14, 2019, 5:13:00 PM
to Yaron Minsky, Cap'n Proto
Hi Yaron,

Ian already answered the question, but I thought I'd add:

For protocols that are published publicly and used by arbitrary parties that you don't control, retroactive unionization may indeed be too unsafe to really use.

Many protocols, though, are used privately between components of a system. In this case, forwards- and backwards-compatibility may be important in order to allow components to be updated independently, but compatibility only needs to extend between all components that are currently in production. In that case, it's quite common to do something like:

1) Retroactively unionize a field, but don't actually use the new variant yet.
2) Update each component that receives messages of the modified type, so that they are aware of the union.
3) Now, start setting the new variant where desired.

-Kenton


Yaron Minsky

May 19, 2019, 3:14:58 PM
to Kenton Varda, Cap'n Proto
Thanks to both of you. That all makes a ton of sense.

I'm thinking about the use of capnp in an environment where the
systems producing and consuming messages are at least sometimes under
enough control to make this kind of thing possible/desirable.

There are some cases where the native capnp versioning behavior seems
highly congenial, and others where I'm less certain. If it's not too
much of a bore, here's another case I've been thinking of that I'm not
sure how to handle.

Imagine I have an RPC protocol where the request has this form:

struct ListMatchingPeople {
  age @0 :Text;
  emailDomain @1 :Text;
}

Here, the implied semantics of the RPC is that it should return a list
of all people who match the listed criteria. Now, let's say I decide
that I want to extend the RPC to allow people to also filter by
occupation, so I add a new field.

struct ListMatchingPeople {
  age @0 :Text;
  emailDomain @1 :Text;
  occupation @2 :Text = "any";
}

Note that this has the nice property that the default value of the
field has the same semantics as just omitting the field, so if an old
client sends a message to a new RPC server, it will get the behavior
that would be expected.

The reverse versioning story doesn't work out as well, though. If I
send a message from a new client to an old server, then any occupation
specified by the new client will be unceremoniously ignored. You
might prefer the behavior of having the new message be rejected when a
non-default value for occupation is sent, but I think there's no way
to implement that within capnp.

Again, I'm curious how people deal with this kind of issue in
practice. Maybe the approach is simply, as before, to be aware of
this kind of problem and roll the server before you roll the clients.

You could also imagine some kind of dynamic exchange and validation of
schema that could detect this problem in advance, but since there's no
schema compatibility validator at present, I imagine no one is doing
that...

y

Kenton Varda

May 19, 2019, 3:42:00 PM
to Yaron Minsky, Cap'n Proto
Hi Yaron,

There actually is a compatibility validator library in C++. If you load the old and new schemas into the same SchemaLoader object, it will throw an exception if they aren't compatible, according to the documented compatibility rules.

However, I don't think this is as useful as people imagine. It wouldn't solve the case you describe, because the validator has no way of understanding the implications of the "occupation" field being ignored. I think this is an AI-complete problem: you need an actual understanding of the semantics of each field to fully analyze its backwards-compatibility properties.

In practice, application developers still need to think about all of the combinations of new and old servers and clients, and how the introduction of a new field will affect them. You'll need to design an upgrade strategy on a case-by-case basis. What Cap'n Proto (like Protobuf, JSON, etc.) provides is some tools so that you can potentially design strategies that don't require upfront version/schema negotiation, but can instead be handled with a few lines of code in the changed method's implementation or caller. But like any tool, the developer has to consider how to use it properly in each situation.

Now, maybe there is room for a more powerful framework for detecting higher-level incompatibilities. For example, maybe in the use case you describe, we could imagine an annotation:

    occupation @2: Text = "any" $nonDefaultMustBeKnown;

Then you could develop some sort of protocol, on top of Cap'n Proto, that does an upfront exchange of schemas, and detects that the "occupation" field is missing from the server's schema. It could then check any message sent to the server to see if `occupation` is not set to the default, and if so, generate an error. This could all be built on top of Cap'n Proto. I'm not aware of any other serialization system building something like this, though. It seems complex and I'm not sure if it's really worth it.
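
To make that concrete, the hypothetical annotation could be declared in
the schema language like this (the name and ID are made up, and nothing
in Cap'n Proto enforces it today):

  # Hypothetical annotation; enforcement would live in a layer above capnp.
  annotation nonDefaultMustBeKnown @0xd4c9e1f2a3b58670 (field) :Void;

  struct ListMatchingPeople {
    age @0 :Text;
    emailDomain @1 :Text;
    occupation @2 :Text = "any" $nonDefaultMustBeKnown;
  }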

I prefer instead to do this sort of thing at the application layer. For example, you could have a boolean field in the response that indicates if the server recognized the `occupation` field, and the client could then discard results that it knows to be bad because this field is missing. Or you could define a simple application-level version number exchange that happens before making any calls, and use the version number in the specific places where needed to detect these problems. Or, you can make sure to update your server before your client. I find the best answer varies from case to case.
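
As a sketch of the first option (struct and field names are
illustrative), the response could carry a flag that old servers never
set:

  struct ListMatchingPeopleResults {
    people @0 :List(Person);
    # An old server's schema lacks this field, so it comes back as the
    # default (false) and the client knows occupation was ignored.
    occupationFilterApplied @1 :Bool;
  }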

-Kenton

Yaron Minsky

May 19, 2019, 4:38:10 PM
to Kenton Varda, Cap'n Proto
On Sun, May 19, 2019 at 3:41 PM Kenton Varda <ken...@cloudflare.com> wrote:
>
> Hi Yaron,
>
> There actually is a compatibility validator library in C++. If you
> load the old and new schemas into the same SchemaLoader object, it
> will throw an exception if they aren't compatible, according to the
> documented compatibility rules.

Yeah, I've heard that mentioned before. I'm curious what notion of
compatibility it checks. Forwards, backwards, or both? E.g., does it
flag the lack of forwards compatibility for retroactive unionization?

Makes sense. It's exactly this kind of design choice that I'm curious
about.

For what it's worth, much of my own experience is using a messaging
system with no built-in cross-version compatibility. In this system,
you build compatibility between different versions by writing explicit
upgrade and downgrade functions, along with protocols for negotiating
to the best shared version. Such a system requires the user to be
very explicit about the semantics of interactions across versions,
thus bypassing the AI-complete problem of figuring out how to approach
version changes. It's not the most efficient thing, and it requires a
decent amount of boilerplate for the conversions. But the chief
virtue is that the resulting behavior is pretty easy to reason about.

But, the boilerplate issues are enough to make us want to support
capnp-style versioning, which is why we're thinking about all this.
And the application-layer approaches you're describing are similar to
things we're considering, which is comforting.

y

Kenton Varda

May 19, 2019, 6:59:34 PM
to Yaron Minsky, Cap'n Proto
On Sun, May 19, 2019 at 1:38 PM Yaron Minsky <ymi...@janestreet.com> wrote:
> Yeah, I've heard that mentioned before. I'm curious what notion of
> compatibility it checks. Forwards, backwards, or both? e.g., does it
> flag the lack of forwards compatibility for retroactive unionization?

SchemaLoader only tries to determine whether one schema is a valid "upgrade" from another, essentially checking the rules documented at: https://capnproto.org/language.html#evolving-your-protocol

I actually don't remember if it allows retroactive unionization. There are definitely some types of changes that are situationally safe and which I've made in real systems which SchemaLoader would reject.

-Kenton

Ian Denhardt

May 19, 2019, 7:08:02 PM
to Kenton Varda, Yaron Minsky, Cap'n Proto
As Kenton mentioned, capnp doesn't entirely lift the burden of thinking
about the high-level semantics of these things. One possible update
strategy for this particular case:

Suppose your old interface definition looks like this:


interface Server {
  listMatchingPeople @0 ListMatchingPeople -> (people :List(Person));
}

Rather than just adding an extra field to the existing request type, you
could add a new method, which will eventually replace the old one
entirely:

interface Server {
  listMatchingPeople @0 ListMatchingPeople -> (people :List(Person));
  newListMatchingPeople @1 NewListMatchingPeople -> (people :List(Person));
}

New clients will try to use newListMatchingPeople, and at first will
have to decide what to do if the server throws an unimplemented
exception -- maybe call the old method and do the filtering themselves.
As you roll out servers that support the new method, the two
implementations could initially just be aliases for one another.

At some point, when everything supports the new method, you can drop
the fallback code. It's possible to shuffle the names around to keep
things ergonomic; a common convention is to rename the old method to
something like obsoleteFoo/deprecatedFoo. (As a side note, it would be
nice if there were a standard annotation to tell code generators that a
method should be considered "deleted", so the stubs for the old method
could be removed.)
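
Concretely, the renamed interface might end up looking like this (a
sketch; wire compatibility depends only on the ordinals, not the
names):

  interface Server {
    # Same ordinal as before, just renamed; old clients are unaffected.
    obsoleteListMatchingPeople @0 ListMatchingPeople -> (people :List(Person));
    listMatchingPeople @1 NewListMatchingPeople -> (people :List(Person));
  }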

In this particular case, I might also ask whether the interface should
be redesigned for extensibility, and instead maybe do something like:

struct Filter {
  union {
    age @0 :Text;
    emailDomain @1 :Text;
    occupation @2 :Text;
  }
}

interface Server {
  deprecatedListMatchingPeople @0 ...;
  listMatchingPeople @1 (filters :List(Filter)) -> (people :List(Person));
}


Here, servers can reject any filters they don't recognize. It would be
easy enough to add 'and' and 'or' filters later if desired. Hopefully
this would avoid the need for finicky staged upgrades as often in the
future.
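
For example, the 'and'/'or' extension could later be sketched as
(purely illustrative):

  struct Filter {
    union {
      age @0 :Text;
      emailDomain @1 :Text;
      occupation @2 :Text;
      # Added later; servers that don't recognize these reject the request.
      allOf @3 :List(Filter);
      anyOf @4 :List(Filter);
    }
  }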

-Ian