v2 modules and versioning

Jonathan Shapiro

Sep 20, 2023, 8:50:25 AM

to Cap'n Proto

I've been thinking about modules and versioning. CapnProto has an import mechanism, but it doesn't seem to have a first-class concept of a schema that can be versioned.

Recently, I've spent a bunch of time working in both TypeScript and Go, and many years ago I designed a module system for BitC with some care for verification. Go's approach took some getting used to, but from a developer perspective I have come to feel that Go made a good set of pragmatic decisions by tying modules and their versions to code repositories, the cryptographic hashes they supply, and version tags, and by separating imported identifier aliasing from the import itself. The go.mod/go.sum combination seems to handle everything that the node package system does with regard to version binding, but subjectively feels simpler. I do find that import paths get long, and I sometimes wish that go.mod had a way to do import path shorthands, but I haven't ever hit a point where that seemed critical.

[ I definitely do not like Go's decision to conflate identifier capitalization with export. It's cute in a bad way, irritating, and breaks link compatibility with everything. In schemas, the need to distinguish between public and private things doesn't arise in the same way, because the whole point is to be publishing the protocol. ]

For CapnProto, I imagine we would call the versioned thing a schema rather than a module. A pleasant side benefit is that well-defined versions on schemas offer the possibility that the backwards compatibility of protocol versions (e.g. v1.1.0 relative to v1.0.0) can be mechanically validated - which seems useful.

Two relevant points are made in the CapnProto language description. Paraphrasing:

"Symbolic names can collide... which can be hard to detect in large systems using different versions of protocols."

This point is made in the context of discussing type IDs. CapnProto needs type IDs for wire encoding reasons, but this isn't the right argument for having them! It's an argument for a proper module and version system. And as an aside, the question of type equivalence in the presence of federated protocols is good for a couple of doctoral dissertations.

"Fully qualified names become large and waste space on the wire."

As has been noted elsewhere, CapnProto's "everything is a namespace" leads to horrifically long names produced by the generators, so I think that ship has already sailed. The Go module system and import design limit the length of names in code to "importBinding.typeName". It would also help to get rid of the "everything is a namespace" idea.

The notion of wasted space on the wire because of long names seems like a red herring, because I can't see anything in the spec suggesting that identifier names ever appear on the wire. If they did, and if compression is more important than clarity, we should be thinking about a compression-friendly renaming similar to what Google does when minifying JavaScript.

Before I rathole too far on this, does anybody else see this as a thing worth thinking about for v2?

Jonathan

Kenton Varda

Oct 31, 2023, 5:09:55 PM

to Jonathan Shapiro, Cap'n Proto

[Sorry for the long delay in replying -- I recently moved into a new house and have been rather swamped.]

Cap'n Proto follows the Protobuf philosophy of versioning, which is that there are no versions, or alternatively, that versions are a continuous spectrum. As long as each incremental change is made in a backwards-compatible way, old programs should be able to talk to new programs and vice versa. If you want to make a breaking change, you make a whole new type to represent the new protocol. If you want to put "v2" or whatever in the name of that type, that's up to you.

Cap'n Proto uses 64-bit type IDs to canonically identify a type. Two type definitions are presumed to be versions of the same type if they have the same ID. Hence, type names are merely a convenience for developers writing code, and can freely be changed without breaking compatibility. In pure Cap'n Proto, type IDs are the only global namespace of types, and collisions there are unlikely because the IDs are chosen randomly. In most programming languages we are forced to place type names into some sort of global namespace, but that's up to the language-specific code generator to deal with.
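
To put a rough number on "unlikely", here is a minimal sketch using the standard birthday approximation. It assumes IDs are drawn uniformly at random from the full 64-bit space; if the real generator reserves some bits, the figure would shift somewhat. The function name `collision_probability` is hypothetical and exists only for this illustration.

```python
import math


def collision_probability(num_ids: int, bits: int = 64) -> float:
    """Approximate chance that at least two of num_ids uniformly random
    IDs collide, via the birthday bound 1 - exp(-n(n-1) / 2^(bits+1))."""
    space = 2 ** bits
    exponent = num_ids * (num_ids - 1) / (2 * space)
    # expm1 keeps precision when the exponent is tiny.
    return -math.expm1(-exponent)


# One million independently chosen IDs: on the order of 1e-8.
print(collision_probability(1_000_000))
```

Under that assumption, even a very large ecosystem of independently generated IDs has a negligible chance of an accidental clash, which is what lets the ID stand in as the global namespace.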

> backwards compatibility of protocol versions can be mechanically validated.

With Cap'n Proto's SchemaLoader (in C++, at least), when you load two types with the same ID, it will automatically check them for compatibility and choose the newer version as the type that the SchemaLoader ultimately gives to the application. This is done by actually comparing the schema contents. If the two versions are inherently incompatible, an exception is thrown. I don't know if version numbers would actually add anything here.
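
As a rough illustration of the idea (not the actual SchemaLoader algorithm, whose rules are considerably richer), here is a toy sketch. Everything in it is an assumption made for the example: a struct is modelled as a mapping from field ordinal to type name, "newer" simply means "has at least as many fields", and a newer version is compatible only if it keeps every older field at the same ordinal with the same type. `StructSchema`, `pick_newer`, `IncompatibleSchemaError`, and the type ID are all made up.

```python
from dataclasses import dataclass


class IncompatibleSchemaError(Exception):
    """Raised when two definitions with the same ID cannot be reconciled."""


@dataclass
class StructSchema:
    type_id: int   # 64-bit ID: the only identity that matters here
    name: str      # developer convenience; free to change between versions
    fields: dict   # field ordinal -> type name, e.g. {0: "Text"}


def pick_newer(a: StructSchema, b: StructSchema) -> StructSchema:
    """Return the newer of two versions of one type, or raise if they clash."""
    if a.type_id != b.type_id:
        raise ValueError("different type IDs: not versions of the same type")
    # Toy rule: the version with more fields is treated as the newer one.
    older, newer = (a, b) if len(a.fields) <= len(b.fields) else (b, a)
    for ordinal, field_type in older.fields.items():
        if newer.fields.get(ordinal) != field_type:
            raise IncompatibleSchemaError(
                f"field @{ordinal} changed from {field_type!r} "
                f"to {newer.fields.get(ordinal)!r}"
            )
    return newer


# Made-up ID; the rename from Person to Contact does not affect the check.
v1 = StructSchema(0xA1B2C3D4E5F60718, "Person", {0: "Text", 1: "UInt32"})
v2 = StructSchema(0xA1B2C3D4E5F60718, "Contact", {0: "Text", 1: "UInt32", 2: "Text"})
assert pick_newer(v1, v2) is v2
```

Note that the check compares schema contents only and never consults a version number, which is the point being made above.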

Of course, it's entirely possible that a newer version of some software has changed the interpretation of a schema, without changing the actual definition in any way that is detectably incompatible. Obviously, it's fundamentally impossible to detect such incompatibility in an automated way.

-Kenton
