Schema Checking


gaus...@gmail.com

Jun 7, 2019, 9:03:15 PM
to Cap'n Proto
I have a use case that I'm not sure this library supports, but this is the most likely choice I've seen. If you're aware of a better option, I would be delighted to know about it.

I have a multiprocess system, probably multi-machine. At startup there is a connection phase as everything gets going, followed by steady streams of typed data. (The set of types is not fixed; there may be additions after compiling.)

Right now, data is passed between processes as void*, size_t. Lots of issues there, not least of which is that there is no way to check the types except by looking at documentation. So if you connect the wrong thing, not only do you fail, you fail while you're supposed to be running.

I want to use Cap'n Proto to, during the connection phase, send something to verify the data is compatible. I won't have access to actual data at this point. I don't think I can compare schemas directly, since they can be added onto and changed.

All of the processes are "trusted". Namely, they're not malicious, but they might be stupid. If they break things, I want to be able to point at them and say "They did it".

I have the authority to limit the capabilities of Cap'n Proto we support, though I would prefer not to.

I saw the unique ID generator, but if some future dev just copies a schema and changes it, that won't work, right?

My other idea was to do a default construction and send that, then try to decode it at the other end. I can't tell what happens if it fails (an exception, I think), or what happens if you have two different but similar structs (xyz coordinates and GPS LLA are both three float64s, for example).

I think the default message should work, but I can't be sure. And it doesn't feel like the right way.

Thanks for any advice.

Kenton Varda

Jun 9, 2019, 4:56:36 AM
to gaus...@gmail.com, Cap'n Proto
Hi gaussgun,

The default-construction idea won't work: All default-constructed Cap'n Proto structs are effectively identical; encoding an unmodified struct and decoding it as a different type will in fact get you the default content for that other type. This is because structs are simply zero-initialized (this is why integers are encoded XOR'd with their default value -- so that the default encoding is always zero). The structs might have different sizes, but that's a normal thing to happen when new fields are added, so it doesn't make them incompatible.
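
To make the XOR point concrete, here is a minimal sketch in plain Python. It does not use the Cap'n Proto library; the helper names, the two single-field structs, and their default values are all hypothetical, chosen only to illustrate the idea.

```python
import struct

def encode_u32(value: int, default: int) -> bytes:
    # Store value XOR default, so a field left at its default is all zero bytes.
    return struct.pack("<I", value ^ default)

def decode_u32(raw: bytes, default: int) -> int:
    # XOR with the reader's own default to recover the logical value.
    (stored,) = struct.unpack("<I", raw)
    return stored ^ default

# Two hypothetical single-field structs with different declared defaults.
RETRY_DEFAULT = 3      # e.g. retries @0 :UInt32 = 3;
PORT_DEFAULT = 8080    # e.g. port @0 :UInt32 = 8080;

# An untouched "Retry" struct goes on the wire as zeros...
wire = encode_u32(RETRY_DEFAULT, RETRY_DEFAULT)
# ...and a reader expecting the "Port" struct just sees Port's default.
as_port = decode_u32(wire, PORT_DEFAULT)
```

Here `wire` is four zero bytes and `as_port` comes back as 8080, with no error on either side, so a default-constructed message carries no information about which type produced it.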

The idea of auto-detecting "schema compatibility" between servers comes up often. In my opinion, though, it's a mirage -- an idea that looks great from a distance but you can sink a lot of effort into it and not really get anywhere useful. At the end of the day, the kinds of errors that are easy to detect by comparing schemas at runtime almost never actually happen. The only real case I ever saw of engineers making accidental incompatible changes to schemas in either Cap'n Proto or Protobuf were people changing "required" to "optional" in protobuf and being unaware how easily this could break stuff -- the solution was to remove "required" from the language.

If you're worried that two servers might be trying to speak totally different protocols to each other, I'd say add a field to the top-level struct that simply names the protocol, e.g.:

    struct Message {
      protocol @0 :Text;  # must always be set to "foo-protocol"
      # ...
    }

That's really about the best you can do here, IMO.

-Kenton

--
You received this message because you are subscribed to the Google Groups "Cap'n Proto" group.
To unsubscribe from this group and stop receiving emails from it, send an email to capnproto+...@googlegroups.com.
Visit this group at https://groups.google.com/group/capnproto.
To view this discussion on the web visit https://groups.google.com/d/msgid/capnproto/43f5da93-4ff6-40e2-a3a9-5927bb88d2dc%40googlegroups.com.

gaus...@gmail.com

Jun 10, 2019, 10:10:55 AM
to Cap'n Proto
I was looking deeper at the ids, and noticed something I'd missed. Capnp generates ids for every type automatically.

If I understand it correctly, the auto-generated ID depends on the ID of the parent scope and the item's name, nothing else. If that's correct, then using this might work.
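
A rough sketch of that derivation in Python, based on my reading of the documented rule (first 8 bytes of the MD5 of the parent scope's ID concatenated with the declaration name, with the top bit set). The byte orders used below are an assumption on my part, and the file ID and struct names are made up, so compare against what `capnp compile` actually emits before relying on it.

```python
import hashlib
import struct

def child_type_id(parent_id: int, name: str) -> int:
    # Hash the parent scope's 64-bit ID (assumed little-endian) followed by the name.
    digest = hashlib.md5(struct.pack("<Q", parent_id) + name.encode("utf-8")).digest()
    # Take the first 8 bytes (assumed big-endian) and force the top bit on.
    return int.from_bytes(digest[:8], "big") | (1 << 63)

FILE_ID = 0x8123456789ABCDEF  # hypothetical file ID, not from a real schema

position_id = child_type_id(FILE_ID, "Position")  # hypothetical struct name
gps_fix_id = child_type_id(FILE_ID, "GpsFix")     # hypothetical struct name
```

The point of the sketch is only that the inputs are the parent ID and the name: renaming the struct or moving it to a file with a different ID changes the result, while editing its fields does not.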

If I don't give them an example schema with a file ID, then the only way to get a bad duplicate ID would be to copy their own schema, keep a struct name unchanged, and change its internals incompatibly. That would only break their own stuff, not existing code that already works. Or really bad luck.

And sending a

    struct SchemaID {
      id @0 :Int64;
    }

is easy.

Am I understanding the id generation properly?

Kenton Varda

Jun 10, 2019, 10:57:14 AM
to gaus...@gmail.com, Cap'n Proto
Yes, you could indeed use type IDs in that way.

Also worth noting that if someone accidentally reuses an ID and tries to link both schemas into the same program, they'll get a linker error.
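
A connection-phase check along those lines might look roughly like the following. This is a sketch over raw bytes rather than real Cap'n Proto messages; the function names, the exception class, the peer names, and the ID values are all hypothetical.

```python
import struct

class SchemaMismatch(Exception):
    """Raised when a peer announces a type ID we did not expect."""

def pack_schema_id(type_id: int) -> bytes:
    # Send the 64-bit type ID as 8 little-endian bytes.
    return struct.pack("<Q", type_id)

def check_schema_id(peer: str, payload: bytes, expected_id: int) -> None:
    # Reject the connection up front and name the peer responsible.
    if len(payload) != 8:
        raise SchemaMismatch(f"{peer}: handshake payload is {len(payload)} bytes, expected 8")
    (got,) = struct.unpack("<Q", payload)
    if got != expected_id:
        raise SchemaMismatch(f"{peer}: sent type ID {got:#018x}, expected {expected_id:#018x}")

POSITION_ID = 0x9A3F5C7E12B4D680  # hypothetical type ID for an xyz struct
GPS_LLA_ID = 0xC4E1907B3D5A2F16   # hypothetical type ID for a GPS LLA struct

check_schema_id("nav-process", pack_schema_id(POSITION_ID), POSITION_ID)  # passes silently
```

Because the failure happens during connection and the message names the peer, a mismatch shows up before any data flows. Type IDs always have the top bit set, so an unsigned 64-bit field is the more natural carrier; in a signed field the same bits would read as a negative number.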

-Kenton
