On canonicalisation, upgrades and defaults

David Buckley

unread,

Dec 16, 2015, 10:25:00 AM12/16/15

to Cap'n Proto

So I decided to have a play at writing a canonicalisation routine as I couldn't find anyone else's.

It should, in principle, be easy, as it's just a bunch of rules, but here's a few increasingly hard questions. I'll ignore the envelope as generating that part is trivial.

Suppose I encode this:

struct Null {
}

I should presumably output a null pointer (ie. one word, equal to zero). Unambiguous? I notice the official encoder uses (-1 << 2) to represent it as a root, and (0) when it's a field.

Now suppose it's not so null:

struct AnInt {
i @0 :Int64 = 0;
}

If I encode (i = 0) should I again write just a single null pointer? Any value in offset would violate a bounds check.

Let's try for something harder, now.

struct FunkyStruct {
willBeNull @0 :AnInt;
willNotBeNull @1 :AnInt;
}

I encode (willBeNull = (i=0), willNotBeNull = (i=1)). Now we're getting ambiguous. willBeNull, in pre-order, must start after the end of the root message. So do I encode this as (2<<48, 1<<2, 1<<32, 1) or as (2<<48, 0, 1<<32, 1)? That is, should my empty struct become a null pointer or a pointer of length 0 to the area just after the root message? Both of these are valid according to the canonical spec, and as far as I can tell are valid encodings, too (the former being output by the official 'capnp' tool when I encode '(willNotBeNull=(i=1))').

To be clear, my complaint is that the following both appear to be canonical and equal messages:

0000 0000 0000 0200
0400 0000 0000 0000
0000 0000 0100 0000
0100 0000 0000 0000

and:

0000 0000 0000 0200
0000 0000 0000 0000
0000 0000 0100 0000
0100 0000 0000 0000

OK, this is just ambiguity. I would guess the intention is to encode the null pointer in this case (new rule: if struct length is zero, offset must be zero also). BUT!

Suppose I upgrade to:

struct AnInt {
    i @0 :Int64 = 0;
    j @1 :Int64 = 1;
}

struct FunkyStruct {
   willBeNull @0 :AnInt = (j=0);
   willNotBeNull @1 :AnInt = (j=0);
}

Now the two encodings listed above actually have different values for willBeNull.j (the first has j=0, the second j=1)! So while it seems obvious to delete empty structs, it can fundamentally alter the message as seen by an upgraded client -- and we actually need a way to encode zero-length structs with an offset to disambiguate the two cases.

NEXT!

struct AnInt {
    i @0 :Int64 = 0;
}

struct DefaultInt {
    i @0 :Int64 = 1;
}

struct DefaultStruct {
    willBeNull @0 :DefaultInt;
    willNotBeNull @1 :AnInt = (i=1);
}

Now I encode (willBeNull = (i=1), willNotBeNull = (i=1)). The official encoder produces (2<<48, 1<<2+1<<32, 1<<2+1<<32, 0, 1) as we expect. Canonicalisation says I must encode willBeNull as a zero-length struct (ie. null?). Fine. What about willNotBeNull? Without reference to the schema, we must encode this as-is, so (2<<48, 0, 1<<32, 1) is canonical. BUT. This message is equal to (2<<48, 0, 0) which is equal to (0). In fact, I can get the official encoder to produce the first of these two by encoding '()'. Which of these encodings are canonical?

To be clear, my complaint is that the following encodings are canonical, but equal as messages:

0000 0000 0000 0000

0000 0000 0000 0200
0000 0000 0000 0000
0000 0000 0100 0000
0100 0000 0000 0000

Since both are valid encodes, and both are canonical according to the spec, does defined-ness of a field alter the canonical encoding of a message, even though any attempt to directly read any field from either message will yield the same answer?

This obviously affects upgraded clients in the same way as above, too.

There are parallel issues with respect to lists. We can't null an empty list (even though null is a valid encoding of it, if it's the default), and upgrading from "List(T) = [(i=0)]" to "List(T) = [(i=0, j=1)]" where j is a new field is undefined unless j=1 is default on T also.

The upgrade stuff is of particular concern; the spec doesn't mention that adding new composite fields with defaults might see the defaults violated in upgraded messages, though knowing the encoding makes it fairly obvious. I believe the canonicalisation stuff is fixable provided it is made clear that assigned-ness is preserved, and there is get a special value for "pointer to a zero length struct with offset zero" (which must be generated by implementations when they realise empty structs like Null above, also).

David Renshaw

unread,

Dec 16, 2015, 11:21:55 AM12/16/15

to David Buckley, Cap'n Proto

Hi!

In Cap'n Proto, pointer fields are always optional. A null pointer is never equal to a non-null pointer, even if the thing pointed to has all default values in its fields, and even if it has zero size. (See https://capnproto.org/faq.html#how-do-i-make-a-field-optional .)

Inline responses follow.

On Wed, Dec 16, 2015 at 10:24 AM, David Buckley <isreal-g...@bucko.me.uk> wrote:

So I decided to have a play at writing a canonicalisation routine as I couldn't find anyone else's.

It should, in principle, be easy, as it's just a bunch of rules, but here's a few increasingly hard questions. I'll ignore the envelope as generating that part is trivial.

Suppose I encode this:

struct Null {
}

I should presumably output a null pointer (ie. one word, equal to zero). Unambiguous? I notice the official encoder uses (-1 << 2) to represent it as a root, and (0) when it's a field.

A field of type `Null` has two possible values: null, encoded as eight bytes of zeroes, and non-null, encoded as a struct pointer with an offset of -1. These values are not equal.

Now suppose it's not so null:

struct AnInt {
i @0 :Int64 = 0;
}

If I encode (i = 0) should I again write just a single null pointer? Any value in offset would violate a bounds check.

Let's try for something harder, now.

struct FunkyStruct {
willBeNull @0 :AnInt;
willNotBeNull @1 :AnInt;
}

I encode (willBeNull = (i=0), willNotBeNull = (i=1)). Now we're getting ambiguous. willBeNull, in pre-order, must start after the end of the root message. So do I encode this as (2<<48, 1<<2, 1<<32, 1) or as (2<<48, 0, 1<<32, 1)? That is, should my empty struct become a null pointer or a pointer of length 0 to the area just after the root message? Both of these are valid according to the canonical spec, and as far as I can tell are valid encodings, too (the former being output by the official 'capnp' tool when I encode '(willNotBeNull=(i=1))').

To be clear, my complaint is that the following both appear to be canonical and equal messages:

0000 0000 0000 0200
0400 0000 0000 0000
0000 0000 0100 0000
0100 0000 0000 0000

and:

0000 0000 0000 0200
0000 0000 0000 0000
0000 0000 0100 0000
0100 0000 0000 0000

These messages are not equal. In the first one, the `willBeNull` field is non-null. In the second, the `willBeNull` field is null.

The first message canonicalizes to

0000 0000 0000 0200
FCFF FFFF 0000 0000

0000 0000 0100 0000
0100 0000 0000 0000

The second message is already canonical.

OK, this is just ambiguity. I would guess the intention is to encode the null pointer in this case (new rule: if struct length is zero, offset must be zero also). BUT!

Suppose I upgrade to:

struct AnInt {
    i @0 :Int64 = 0;
    j @1 :Int64 = 1;
}

struct FunkyStruct {
   willBeNull @0 :AnInt = (j=0);
   willNotBeNull @1 :AnInt = (j=0);
}

Now the two encodings listed above actually have different values for willBeNull.j (the first has j=0, the second j=1)! So while it seems obvious to delete empty structs, it can fundamentally alter the message as seen by an upgraded client -- and we actually need a way to encode zero-length structs with an offset to disambiguate the two cases.

Yes, and that's exactly what we do. (See https://github.com/sandstorm-io/capnproto/blob/6f26adbeef8e124d0f3d66fb02f932d5ae20d7fa/c%2B%2B/src/capnp/layout.c%2B%2B#L139-L146 .)

NEXT!

struct AnInt {
    i @0 :Int64 = 0;
}

struct DefaultInt {
    i @0 :Int64 = 1;
}

struct DefaultStruct {
    willBeNull @0 :DefaultInt;
    willNotBeNull @1 :AnInt = (i=1);
}

Now I encode (willBeNull = (i=1), willNotBeNull = (i=1)). The official encoder produces (2<<48, 1<<2+1<<32, 1<<2+1<<32, 0, 1) as we expect. Canonicalisation says I must encode willBeNull as a zero-length struct (ie. null?).

It should be zero-length, but non-null.

Fine. What about willNotBeNull? Without reference to the schema, we must encode this as-is, so (2<<48, 0, 1<<32, 1) is canonical. BUT. This message is equal to (2<<48, 0, 0)

I don't follow this step. These messages look unequal to me. Did you mean to say that `willNotBeNull` is of type `DefaultInt`?

which is equal to (0). In fact, I can get the official encoder to produce the first of these two by encoding '()'. Which of these encodings are canonical?

To be clear, my complaint is that the following encodings are canonical, but equal as messages:

0000 0000 0000 0000

0000 0000 0000 0200
0000 0000 0000 0000
0000 0000 0100 0000
0100 0000 0000 0000

Since both are valid encodes, and both are canonical according to the spec, does defined-ness of a field alter the canonical encoding of a message, even though any attempt to directly read any field from either message will yield the same answer?

This obviously affects upgraded clients in the same way as above, too.

There are parallel issues with respect to lists. We can't null an empty list (even though null is a valid encoding of it, if it's the default), and upgrading from "List(T) = [(i=0)]" to "List(T) = [(i=0, j=1)]" where j is a new field is undefined unless j=1 is default on T also.

With lists too, a null pointer can never equal a non-null pointer.

The upgrade stuff is of particular concern; the spec doesn't mention that adding new composite fields with defaults might see the defaults violated in upgraded messages, though knowing the encoding makes it fairly obvious. I believe the canonicalisation stuff is fixable provided it is made clear that assigned-ness is preserved, and there is get a special value for "pointer to a zero length struct with offset zero" (which must be generated by implementations when they realise empty structs like Null above, also).

Yep, that's what we do. The special value is "struct pointer with size 0 and offset -1".

- David

Kenton Varda

unread,

Dec 16, 2015, 2:24:04 PM12/16/15

to David Renshaw, Andrew Lutomirski, Matthew Maurer, David Buckley, Cap'n Proto

(cc'ing people who I think mentioned interest in writing a canonicalizer in the past)

I think there's actually an open question here:

Is a non-null pointer to an empty struct subject to the "preorder" rule?

David (Renshaw) seems to be saying "no": A non-null pointer to an empty struct should always have offset = -1. (Since the offset is relative to the *end* of the pointer, an offset of -1 effectively points back to the beginning of the pointer, so is always in-bounds.)

However, I think I in the past have answered "yes": A pointer to an empty struct (or list) is subject to the same preorder constraints as pointers to non-empty objects. This means that it points to the same memory location as the object that comes after it -- because the pointer points to a zero-size object. As a special exception, if this rule would result in a struct pointer having an offset of zero, then instead the offset is replaced with -1, in order to avoid ambiguity with a null pointer.

I initially thought the latter rule would make implementations simpler in practice because I thought it meant fewer special cases in practice. However, thinking about it now, I'm not sure that's the case. E.g. when validating a canonical message by checking that the pointers are in order, you'd still have to account for the possibility that the last pointer in the struct could have a -1 offset after other pointers had positive offsets.

So maybe we should adopt David's version for simplicity? I'm not sure if anyone has actually implemented it either way yet.

-Kenton

--
You received this message because you are subscribed to the Google Groups "Cap'n Proto" group.
To unsubscribe from this group and stop receiving emails from it, send an email to capnproto+...@googlegroups.com.
Visit this group at https://groups.google.com/group/capnproto.

David Buckley

unread,

Dec 16, 2015, 4:02:39 PM12/16/15

to Cap'n Proto, isreal-g...@bucko.me.uk, da...@sandstorm.io

Right, so it looks like I can make a well-defined canonicalisation, and the issues are simply with documentation:

The -1 offset trick is only mentioned in the code. This ought to be specifically mentioned in the canonicalisation section of the encoding spec.
That null is not equal to empty is also absent from the documentation I saw (it doesn't have any section on equality but if we define "equals" as "has the same canonical representation", adding a note to this effect in the canonicalisation section ought to cover it).
In the upgrade page, there is no mention of the trap where you upgrade a struct S and assign a default value D to the new field that is different to the default value D2 of the field as a sub-field of a second struct S2. I guess the easy way to say this is "any default value for a field with struct/list type T must not assign a value to a new field of T which is different to its default" or some-such. It's not at all obvious from the spec that this is the case, and I'll bet it'll catch someone out!

WRT when to encode as zero, I think you're arguing between "a pointer which is not null and would pre-order encode to 0 must have offset -1" and "a struct pointer which has zero length must have offset -1". The second rule seems shorter, and I'm a big fan of golf. :)

The official implementation does not implement these rules, by the way:

$ capnp --version
Cap'n Proto version 0.5.3
$ tail test.capnp -n 6
struct Null {
}
struct NuNull {
i @0 :Null;
j @1 :Null;
}

$ echo '(i=(), j=())' | capnp encode test.capnp NuNull | xxd
00000000: 0000 0000 0300 0000 0000 0000 0000 0200 ................
00000010: fcff ffff 0000 0000 0000 0000 0000 0000 ................
$ echo '(i=(), j=())' | capnp encode test.capnp NuNull | capnp decode test.capnp NuNull
(i = ())

Seems like it's implementing non-null -1 pointers exactly when they /aren't/ needed, though I didn't check the source!

Andrew Lutomirski

unread,

Dec 16, 2015, 4:40:00 PM12/16/15

to Kenton Varda, David Renshaw, Matthew Maurer, David Buckley, Cap'n Proto

On Wed, Dec 16, 2015 at 11:23 AM, Kenton Varda <ken...@sandstorm.io> wrote:
> (cc'ing people who I think mentioned interest in writing a canonicalizer in
> the past)
>
> I think there's actually an open question here:
>
> Is a non-null pointer to an empty struct subject to the "preorder" rule?
>
> David (Renshaw) seems to be saying "no": A non-null pointer to an empty
> struct should always have offset = -1. (Since the offset is relative to the
> *end* of the pointer, an offset of -1 effectively points back to the
> beginning of the pointer, so is always in-bounds.)
>
> However, I think I in the past have answered "yes": A pointer to an empty
> struct (or list) is subject to the same preorder constraints as pointers to
> non-empty objects. This means that it points to the same memory location as
> the object that comes after it -- because the pointer points to a zero-size
> object. As a special exception, if this rule would result in a struct
> pointer having an offset of zero, then instead the offset is replaced with
> -1, in order to avoid ambiguity with a null pointer.
>
> I initially thought the latter rule would make implementations simpler in
> practice because I thought it meant fewer special cases in practice.
> However, thinking about it now, I'm not sure that's the case. E.g. when
> validating a canonical message by checking that the pointers are in order,
> you'd still have to account for the possibility that the last pointer in the
> struct could have a -1 offset after other pointers had positive offsets.
>
> So maybe we should adopt David's version for simplicity? I'm not sure if
> anyone has actually implemented it either way yet.

I don't have a strong preference here. Thinking about it, I think I
agree that using a fixed -1 offset for pointers to empty structs may
make more sense.

--Andy

David Renshaw

unread,

Dec 16, 2015, 5:38:23 PM12/16/15

to David Buckley, Cap'n Proto

On Wed, Dec 16, 2015 at 4:02 PM, David Buckley <isreal-g...@bucko.me.uk> wrote:

The official implementation does not implement these rules, by the way:

$ capnp --version
Cap'n Proto version 0.5.3
$ tail test.capnp -n 6
struct Null {
}
struct NuNull {
i @0 :Null;
j @1 :Null;
}
$ echo '(i=(), j=())' | capnp encode test.capnp NuNull | xxd
00000000: 0000 0000 0300 0000 0000 0000 0000 0200 ................
00000010: fcff ffff 0000 0000 0000 0000 0000 0000 ................
$ echo '(i=(), j=())' | capnp encode test.capnp NuNull | capnp decode test.capnp NuNull
(i = ())

Seems like it's implementing non-null -1 pointers exactly when they /aren't/ needed, though I didn't check the source!

Good find! This is a bug. That message should look like this:

0000 0000 0300 0000

0000 0000 0000 0200
fcff ffff 0000 0000

fcff ffff 0000 0000

The problem is that orphans (https://capnproto.org/cxx.html#orphans ) don't correctly handle the case where the orphan is a zero-sized struct.

I have a working fix that I plan to clean up and then submit as a pull request.

- David

David Buckley

unread,

Dec 16, 2015, 7:15:46 PM12/16/15

to Cap'n Proto, isreal-g...@bucko.me.uk, da...@sandstorm.io

https://github.com/bucko909/capnp-canonicalize

I've implemented both versions of zero-width struct encoding ("-1 always" is the default). The code is a bit rough and ready, but seems to be the identity on:

capnpc -o/bin/cat /usr/include/capnp/schema.capnp

according to:

capnp decode /usr/include/capnp/schema.capnp CodeGeneratorRequest

while taking about 1/3 from the file size (output is 2093 words, down from 2962). Which I take to mean that it's done some canonicalisation. :)

There's a bug where it won't handle the 2-byte far pointers properly if the content is a list of structs, due to the stupid way I've implemented it, but other than that I believe the result is sound.

Kenton Varda

unread,

Dec 16, 2015, 7:30:15 PM12/16/15

to David Buckley, Cap'n Proto, David Renshaw

On Wed, Dec 16, 2015 at 1:02 PM, David Buckley <isreal-g...@bucko.me.uk> wrote:

In the upgrade page, there is no mention of the trap where you upgrade a struct S and assign a default value D to the new field that is different to the default value D2 of the field as a sub-field of a second struct S2. I guess the easy way to say this is "any default value for a field with struct/list type T must not assign a value to a new field of T which is different to its default" or some-such. It's not at all obvious from the spec that this is the case, and I'll bet it'll catch someone out!

Maybe I'm misunderstanding, but isn't this covered by:

> You cannot change a field or method parameter’s type or default value.

(from the docs at https://capnproto.org/language.html#evolving-your-protocol)

-Kenton

David Buckley

unread,

Dec 16, 2015, 7:58:53 PM12/16/15

to Cap'n Proto, isreal-g...@bucko.me.uk, da...@sandstorm.io

On Thursday, 17 December 2015 00:30:15 UTC, Kenton Varda wrote:

Maybe I'm misunderstanding, but isn't this covered by:

> You cannot change a field or method parameter’s type or default value.

(from the docs at https://capnproto.org/language.html#evolving-your-protocol)

I realise that there might be some technical truth to that, but it doesn't read as covering the evolution from:

struct Simple {
}
struct Composite {
a @0 :Simple = ();
}

to:

struct Simple {
b @0 :Int64;
}
struct Composite {
a @0 :Simple = (b = -1);
}

After all, at no point here have I changed a default value of an existing field. The default for a.b has always been -1. I suppose you might argue that a.b was previously equal to default_if_exists(Simple.b) or somesuch, but to me, this is a property of the underlying encoding. A similar argument (again, valid without knowing the encoding) might say that just adding a field to Simple (with a default?) changes the default value of Composite.a, on account of it having the new field!

When I read the above schema, I naturally think of Composite.a as having a special subtype of Simple which has different defaults, rather than being a Simple, which by default is initialised differently, if you can see the distinction there. I know this is wrong, but the fact that I naturally think it and mentally correct myself due to being acquainted with the underlying wire format means that I suspect many people just reading the tutorial and not thinking too hard would believe this without realising their error. I'm pretty sure I've discussed this part of the spec with people and proved the behaviour by experiment before now. Being specific here in the documentation might save future people-like-me this task!

lcy03...@gmail.com

unread,

Dec 22, 2015, 2:40:17 AM12/22/15

to Cap'n Proto, isreal-g...@bucko.me.uk, da...@sandstorm.io

So I suggest remove defaults completely, or in other words, all values are forced to default to zero and all pointers to null. Also in C/C++, defaults are an indication of ill-formed interfaces. They bring troubles in an unpredictable way. And their benefits can always be achieved by overloaded functions or carefully defined enums.

And things will be even worse in capnp. No one will read your capnp files carefully because they are for the capnp compiler. And no one will read the capnp.h's ever because they are nonsense of the capnp compiler. So when you claims "all Simple.b defaults to 0 but this one here defaults to -1" you are effectively trapping other programmers. For example, a simple nut may say 'default is default' and construct a default Simple and adopt it to Composite.a, who knows what default does he expect?

Reply all

Reply to author

Forward