Versioning Flatbuffers

Sean McCauliff

Jun 21, 2018, 6:42:17 PM
to FlatBuffers
I realize that FlatBuffers deals with versioning by allowing optional fields.  This works well for each individual data structure that needs to be serialized.  But is there a way I can version a collection of FlatBuffers schema files?  I was thinking about something that strips out whitespace and comments, concatenates all the schema files and then takes an MD5.  Does something like this exist?

Thanks,
Sean

Shivendra Agarwal

Jun 24, 2018, 11:01:24 AM
to FlatBuffers
I don't think anything like that exists. But can you explain your use case and why you would like to build such a feature?

Wouter van Oortmerssen

Jun 25, 2018, 1:24:04 PM
to sean.mc...@gmail.com, FlatBuffers
Nope, haven't heard of something like that. The idea behind FlatBuffers is to make explicit versioning unnecessary. Can you give a use case?

To test whether a schema evolution is compatible, there is `--conform`.

Scott Watson

Jun 25, 2018, 5:20:52 PM
to FlatBuffers
I'm a +1 on this, since I was just about to post this question!

We have a whole bunch of programs that are only loosely coupled, and we use BSON/JSON for most of our message passing. That works well for many data structures, but there are degenerate cases which have us looking at ways of sending pre-formatted data (i.e. FlatBuffers) around.

What we would like is a way to check a structure signature, to confirm that the sender and receiver are in sync, and in the context of flatbuffers, in sync for what could be a breaking change. 

The ROS IDL generates an MD5 signature, AFAIK, using a technique like the one the OP suggests.

What we would like to do is use flatbuffers, but include the signature in the message we send, so receivers can do a "type check".
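
A minimal sketch of such a "type check" in Python, under stated assumptions: the framing (a 16-byte MD5 digest placed in front of the serialized payload) and the names `schema_signature`, `wrap` and `unwrap` are hypothetical and are not part of FlatBuffers or ROS. Hashing the raw schema text is only a placeholder for whatever signature scheme is chosen.

```python
import hashlib

DIGEST_LEN = 16  # size of an MD5 digest in bytes


def schema_signature(schema_text: str) -> bytes:
    """Return an MD5 digest of the schema text (hypothetical signature scheme)."""
    return hashlib.md5(schema_text.encode("utf-8")).digest()


def wrap(signature: bytes, payload: bytes) -> bytes:
    """Prefix the serialized payload with the sender's schema signature."""
    if len(signature) != DIGEST_LEN:
        raise ValueError("unexpected signature length")
    return signature + payload


def unwrap(expected_signature: bytes, message: bytes) -> bytes:
    """Return the payload if the signature matches, otherwise raise ValueError."""
    signature, payload = message[:DIGEST_LEN], message[DIGEST_LEN:]
    if signature != expected_signature:
        raise ValueError("schema signature mismatch between sender and receiver")
    return payload
```

The receiver calls `unwrap` with the signature of the schema it was built against. An exact-match check like this rejects every schema change, including compatible ones, so it is stricter than FlatBuffers' own evolution rules.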

Wouter van Oortmerssen

Jun 25, 2018, 6:24:51 PM
to alcoh...@gmail.com, FlatBuffers
Again, you shouldn't need this. You can make all sorts of changes and additions to a FlatBuffers schema that are forwards and backwards compatible, and senders and receivers do not need to be in sync for this. See the schema docs.

mikkelfj

Jun 26, 2018, 7:54:04 AM
to FlatBuffers
Funny, I was just thinking about this today (not for the first time though).

Yes: strip spaces and compute a hash, including a hash of all included schemas.

I was thinking about taking it one step further: create a signature of a base schema using a sha256 and a private SSH key.
Then for a given derived schema, specify the base schema, the new version and have the schema compiler verify that it is forward compatible.
If it passes that check, sign the new schema with the old version listed.

A chain of schema hashes and signatures could be listed in a file, similar to SSL certs.

I was thinking of adding this to flatcc (for C) at some point, since I have an OpenSSH cert parser somewhere and Ed25519 signature logic somewhere as well.
But it is fairly low on the TODO right now.

I'd have to add the forward compat check that Google's flatc already does.

Mikkel

mikkelfj

Jun 26, 2018, 7:59:04 AM
to FlatBuffers
@Wouter,

The reason for doing this is that you cannot trust that a schema has not been modified in an incompatible way.
If you have all schema versions, you can run flatc checks to ensure this, so in that sense it is not needed.

But having a list of signatures that you know you can trust makes it much safer to use in production environments.
Generated code could list the sha hash it was generated with.
And, with a signature, it could also be trusted to an extent. Of course the source could be modified, but then you can regenerate it and compare without having the history of all other schemas.

Once you can trust the schema, and you trust all other deployments, then these other deployments might run on older or newer schema versions.
This is where forward/backward compat is important. But it won't work if the schema are incorrectly versioned. A simple merge mistake could do that.

Wouter van Oortmerssen

Jun 27, 2018, 12:07:14 PM
to mikkelfj, FlatBuffers
I feel that is the wrong way to solve the problem. If you are worried about incompatible changes, running `flatc --conform` as a pre-submit check on your VCS (or CI if need be) is a more developer-friendly guarantee than trying to maintain a list of known good schemas, which seems fragile and too late in the process.
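
A pre-submit hook along these lines could be sketched in Python as below. It assumes `flatc` is on the `PATH`; the function names and the schema file names are hypothetical. `--conform` takes the schema that the following schemas should be an evolution of, and flatc reports an error if they are not.

```python
import subprocess
import sys


def conform_command(base_schema: str, new_schema: str, flatc: str = "flatc") -> list:
    """Build the flatc invocation that checks new_schema against base_schema."""
    return [flatc, "--conform", base_schema, new_schema]


def check_conform(base_schema: str, new_schema: str) -> bool:
    """Run flatc and report whether new_schema is a compatible evolution."""
    result = subprocess.run(
        conform_command(base_schema, new_schema),
        capture_output=True,
        text=True,
    )
    if result.returncode != 0:
        # Show whatever flatc reported so the developer can see the conflict.
        sys.stderr.write(result.stderr or result.stdout)
    return result.returncode == 0


def main(argv: list) -> int:
    """Exit status for a hook: 0 if compatible, 1 otherwise."""
    base_schema, new_schema = argv[1], argv[2]
    return 0 if check_conform(base_schema, new_schema) else 1
```

In a hook, `base_schema` would typically be the last committed version of the file (for example extracted from the VCS into a temporary file) and `new_schema` the working copy.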

Also, if you can't trust the rest of your systems, you'll probably need to protect yourself at a binary level, e.g. use the verifier. Though if you're working in such an unpredictable environment, I'm not sure FlatBuffers is even the best solution.


Sean McCauliff

Jun 27, 2018, 12:21:19 PM
to FlatBuffers
Just to clarify this question: I'm not really asking for this to be implemented, just inquiring whether it is possible.  If this is not already a feature, then is there: 1) a way to generate a canonical schema, such that any two different FlatBuffers schema files defining the same schema will be mapped to the same file, or 2) a formal specification of the FlatBuffers schema grammar that I can use to create such a tool?

Here are two use cases for having an automatically generated identifier for a collection of schemas:

A) Protocol level.  A client or server can send different serialized messages depending on the client protocol level.  These can be either additional fields or different messages entirely.

B) Archive.  I have an archive of data; in order to read this data I need to know something about what is written there.  This means some kind of schema id needs to be stored with the data.

Wouter van Oortmerssen

Jun 27, 2018, 1:09:56 PM
to Sean McCauliff, FlatBuffers
If you want to do this, your best bet is probably to work from binary schemas (.bfbs), since that way you don't have to deal with parsing and all the possible ways a text schema can subtly differ. As a bonus, your tool can be written in any language FlatBuffers supports (as these files can be accessed using FlatBuffers itself). You'd hash/checksum important data about fields, and since fields in these files are always in sorted order it should be easy to get deterministic results.
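
A rough Python sketch of the hashing step only. Reading the .bfbs file through the reflection schema is left out; the `tables` dictionary is a hypothetical stand-in for the data extracted from it, and which attributes count as "important" (here id, name and type) is an assumption.

```python
import hashlib


def field_fingerprint(tables: dict) -> str:
    """Hash (table, field id, field name, type) records in a deterministic order.

    `tables` is a hypothetical stand-in for data read from a .bfbs file:
    {table_name: [{"id": int, "name": str, "type": str}, ...]}.
    """
    digest = hashlib.sha256()
    for table_name in sorted(tables):
        # Sort explicitly so the result does not depend on input order.
        for field in sorted(tables[table_name], key=lambda f: f["id"]):
            record = "|".join(
                [table_name, str(field["id"]), field["name"], field["type"]]
            )
            digest.update((record + "\n").encode("utf-8"))
    return digest.hexdigest()
```

Leaving an attribute out of the record (for example documentation comments or language-specific attributes) means changes to it do not alter the fingerprint.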

mikkelfj

Jun 28, 2018, 10:47:23 AM
to FlatBuffers
There is no canonical schema because what constitutes forward compatible schema is a matter of interpretation.
For example, it may be valid to rename a table field as long as the type is not modified. But that breaks down badly in JSON parsing, and a naive schema checksum will also not capture it. Enumerations can also be changed.

Assuming harder rules blocking any renames, there are still optional attributes, especially those that only make sense for some target languages.
But once you get this strict, you might as well prevent reordering in the schema as well.

Hence, you can just write a shell tool that strips whitespace and applies an MD5 or sha1 checksum, and even use openssh to sign if so desired.
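
The same idea as a short Python sketch instead of a shell tool. It is deliberately naive: comments are removed with regular expressions that do not understand string literals, and removing all whitespace can merge adjacent tokens. The function names are hypothetical.

```python
import hashlib
import re


def normalize(schema_text: str) -> str:
    """Drop /* */ and // comments, then all whitespace (ignores string literals)."""
    text = re.sub(r"/\*.*?\*/", "", schema_text, flags=re.DOTALL)
    text = re.sub(r"//[^\n]*", "", text)
    return re.sub(r"\s+", "", text)


def schema_set_hash(schema_texts: list) -> str:
    """MD5 hex digest of the normalized schemas, concatenated in the given order."""
    digest = hashlib.md5()
    for text in schema_texts:
        digest.update(normalize(text).encode("utf-8"))
    return digest.hexdigest()
```

The order of `schema_texts` affects the result, so the caller should fix it, for example by sorting the schema files by name before reading them.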

On binary schema: printing it as JSON without spaces would be the best option, but the bfbs schema leads to an explosion in size due to shared DAG references.
A different binary schema without DAGs would be better, and could also be used by other programs to read schemas without understanding FB, which is useful when bootstrapping.

Taking a checksum directly on bfbs is not very precise because it can change from build to build whereas a JSON output is rather stable, especially if limiting the names to ASCII.