A critique of the schema language.


Ian Denhardt

Jun 25, 2019, 1:22:08 AM
to capn...@googlegroups.com
Hey all,

A few weeks ago in the thread about the Elm implementation, I mentioned
that I had a longer critique of the schema language that I'd been
meaning to write up. I finally got around to it; the blog post is here:

https://zenhack.net/2019/06/25/a-critique-of-the-capnproto-schema-language.html

Cheers,

-Ian

Kenton Varda

Jun 26, 2019, 7:13:41 AM
to Ian Denhardt, Cap'n Proto
Nice write-up, and good feedback.

I'd push back a little bit on the idea of "design for the call site", i.e. that the ergonomics of application code take precedence over those of the schema language. My main concern with that notion is that schemas represent an interface, and interfaces need to be read and understood by many more people than implementations. For that reason I'd argue that it's important that schemas can be expressed as clearly as possible, with the ability to organize features semantically for presentation to a reader.

One thing that is part of that is making sure related declarations can be grouped together, so that they can easily be read and digested together, without scrolling back and forth. I'd say this is the main motivation for nested declarations -- they can live next to the field or method that references them, rather than far away at the global scope. It's also the motivation for inlined method parameters and results, rather than requiring separate structs.
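
For concreteness, here's a small sketch of the kind of grouping this enables (the types and names are purely illustrative, not from any real schema):

    struct Person {
      name @0 :Text;
      address @1 :Address;

      struct Address {
        # Declared right next to the field that uses it,
        # rather than far away at file scope.
        street @0 :Text;
        city @1 :Text;
      }
    }

    interface Directory {
      lookup @0 (name :Text) -> (person :Person);
      # Parameters and results are declared inline; no separate
      # request/response structs have to be written out.
    }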

Moreover, personally, I often write schemas as a way to organize my thoughts and lay out a design, before I start implementing. Here, again, as I organize my thoughts and rapidly change my mind about details, I think it's important that I don't have to scroll up and down constantly.

That said, I do sympathize with the argument that having to constantly cross-reference other files while writing (or reading) code is painful. I still have trouble writing a method definition without looking at the generated header. I rely a lot on my IDE's autocomplete and jump-to-definition to make things more bearable. Ten years ago it would have been a lot more difficult.

----------

Regarding inheritance, I do think multiple inheritance is a must-have. Looking at Sandstorm, there are 23 interfaces that inherit other interfaces, and fully 16 of them are multiply-inherited. Some of them could perhaps have been replaced by composition. For example, VerifiedEmailSendPort could have inherited nothing and instead had two methods, getVerifiedEmail() and getSendPort(). Arguably that's even a better design regardless, so that the capabilities can be passed on separately. (Though, of course, it's always possible to accomplish the same with membranes, albeit with higher cost.)
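
In schema terms, the composition-based alternative would look roughly like this (the method signatures are illustrative, not the actual Sandstorm definitions):

    interface VerifiedEmailSendPort {
      # Composition: hand out each capability separately
      # instead of inheriting from both interfaces.
      getVerifiedEmail @0 () -> (verifiedEmail :VerifiedEmail);
      getSendPort @1 () -> (sendPort :EmailSendPort);
    }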

However, most of the multiple inheritance is to add persistence. The `Persistent` interface is essentially a mix-in, used to add a save() method that returns a token which can be used to get the same capability again.
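
Sketched as a schema, the pattern looks like this (a simplification; the real Persistent interface is parameterized and more involved):

    interface Persistent {
      save @0 () -> (token :Data);
      # The returned token can later be exchanged for the
      # same capability again.
    }

    interface Foo {
      doSomething @0 ();
    }

    # The mix-in: Foo's own methods plus persistence.
    interface PersistentFoo extends (Foo, Persistent) {}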


You might argue that Persistent should also be composition-based: it could have a method that you call to get the actual object. However, this would mess up a lot of interface designs in Sandstorm. In lots of places, it turns out, interfaces don't really know whether the capabilities they are passing are persistent or not. You'll see doc comments saying "this is persistent if X and Y are true". Sometimes it's fully expected that applications need to probe for save() support at runtime. I'd say the underlying reason it ends up this way is that persistence is a concern fundamentally orthogonal to business logic. Persistence is a logistical issue. It is very natural, then, for it to be a mix-in.

You could maybe argue that persistence should be baked more directly into the system. Maybe save() should be a method on Capability, even though not all objects will implement it. In fact, that's how I originally thought of persistence -- when I defined "level 2" of RPC to be persistence, I had imagined it being baked into the protocol. I ended up very happy that it didn't have to be; that persistence could be entirely defined in a separate, higher layer. Ultimately I don't think there is a single obvious design for persistence -- the concerns are, for example, very different for Sandstorm apps versus, say, people talking over the public internet. Even within Sandstorm, there are different realms of persistence which called for slightly different interfaces (SystemPersistent, AppPersistent, etc.). I also expect that there are features other than persistence which might be similarly orthogonal to business logic, which people will want the ability to mix in in a similar way.

So I remain pretty happy with the decision to support multiple inheritance.

-Kenton


Ian Denhardt

Jun 27, 2019, 10:55:55 PM
to Kenton Varda, Cap'n Proto
Quoting Kenton Varda (2019-06-26 07:13:03)

> One thing that is part of that is making sure related declarations can
> be grouped together, so that they can easily be read and digested
> together, without scrolling back and forth. I'd say this is the main
> motivation for nested declarations -- they can live next to the field
> or method that references them, rather than far away at the global
> scope.

One interesting approach that came to mind: you could allow
definitions to be syntactically nested in the schema (as they are now),
but not actually live in separate namespaces, so the generated code
could still have a flat namespace. A bit non-intuitive, but otherwise
seems to combine the best of both worlds, and it wouldn't be hard for
the schema compiler to generate a clear and helpful error message in
the case of collisions of this sort.
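
For example (a sketch of the hypothetical behavior; nothing the compiler does today):

    struct Query {
      options @0 :Options;

      struct Options {
        # Nested syntactically, but generated at the top
        # level under the flat name "Options".
        limit @0 :UInt32;
      }
    }

    struct Update {
      struct Options {
        # Under this proposal, a compile-time error: the
        # top-level name "Options" is already taken above.
        force @0 :Bool;
      }
    }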

> It's also the motivation for inlined method parameters and results,
> rather than requiring separate structs.

I'll grant there's a big readability drop in most cases with separate
structs; I'm not happy with the current trade-off, but I see the
argument.

> However, most of the multiple inheritance is to add persistence.

Furthermore, all of these just declare a one-off interface like:

interface PersistentFoo extends (Foo, SystemPersistent) {}

This interface is not used anywhere else in the schema, and has no
methods of its own, nor any meaningful semantics except adding
persistence to another interface.

I imagine the reason for all of these is just that (as I understand it),
the C++ implementation doesn't provide any way to export a capability
that implements disjoint interfaces -- you need to have a common
subtype. I think these interfaces are really a C++ implementation detail
unnecessarily leaking back into the schema definitions, making them less
declarative.

I suspect there is a better solution to be found involving direct
support for exporting disjoint method sets. You can do this with
the Go implementation, but it's a little non-obvious; see:

https://github.com/capnproto/go-capnproto2/issues/86

...and:

https://github.com/zenhack/sandstorm-filesystem/blob/master/filesystem/local/local.go#L231-L249

> I also expect that there are features other than persistence which
> might be similarly orthogonal to business logic, which people will
> want the ability to mix in in a similar way.

This is a plausible argument, but I am dubious of including features
with demonstrable downsides on the basis of somewhat vague, so far
purely hypothetical use cases.

-Ian

Kenton Varda

Jun 28, 2019, 4:01:48 AM
to Ian Denhardt, Cap'n Proto
On Thu, Jun 27, 2019 at 7:55 PM Ian Denhardt <i...@zenhack.net> wrote:
Quoting Kenton Varda (2019-06-26 07:13:03)

>    One thing that is part of that is making sure related declarations can
>    be grouped together, so that they can easily be read and digested
>    together, without scrolling back and forth. I'd say this is the main
>    motivation for nested declarations -- they can live next to the field
>    or method that references them, rather than far away at the global
>    scope.

> One interesting approach that came to mind: you could allow
> definitions to be syntactically nested in the schema (as they are now),
> but not actually live in separate namespaces, so the generated code
> could still have a flat namespace. A bit non-intuitive, but otherwise
> seems to combine the best of both worlds, and it wouldn't be hard for
> the schema compiler to generate a clear and helpful error message in
> the case of collisions of this sort.

Any particular code generator can also make this choice, although of course it's possible the names won't be sufficiently unique if the author wasn't thinking about them. Though in practice I bet such conflicts are rare within a single schema file.

Maybe this calls for a new annotation $topLevelName("Blah") which would be used by code generators for languages that don't support nesting. Perhaps the Go and Haskell code generators could then default to putting all types in the global scope, and if/when conflicts arise, this annotation could be added to address them.
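
Sketching what that might look like (hypothetical -- no such annotation exists today):

    # A hypothetical annotation, consumed only by code
    # generators for languages without nested namespaces:
    annotation topLevelName (struct, interface, enum) :Text;

    struct Query {
      struct Options $topLevelName("QueryOptions") {
        limit @0 :UInt32;
      }
    }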
 
> I suspect there is a better solution to be found involving direct
> support for exporting disjoint method sets.

Isn't that still multiple inheritance, though? Wouldn't it be just as hard to support it in this ad hoc way as it is to support it as a first-class language feature?

-Kenton

Kuba Ober

Jul 13, 2019, 5:24:36 PM
to Cap'n Proto
For what it's worth, from a user's perspective: multiple inheritance is an absolute killer feature for me; it would be much harder to use the RPC protocol without it. But I see it as an entirely abstract architectural design decision. It's a concept, and by itself it imposes no limitations whatsoever on how it is expressed in any concrete programming language.

The bundled codegen expresses this abstract concept concretely in terms of C++ virtual inheritance, but this is an implementation detail. It allows flexibility in composing types at a fairly nominal added cost, but it is not an architectural decision. The C++ codegen could, for example, have a mode that combines interface types using type tuples, and then you'd have static, compile-time composition instead of virtual inheritance.

I generate Pascal-like code from capnp schemas. The dialect has no virtual inheritance, yet I found a suitable way of expressing the concept, and the schema I work with is chock full of multiple inheritance.

Cheers, Kuba