Is FlatBuffers encoding deterministic?


Ghadi Shayban

Jul 20, 2017, 10:39:50 AM
to FlatBuffers
Does FlatBuffers guarantee that generated payloads are byte-for-byte identical when generated from different systems?

A prerequisite for this is ensuring that key serialization order is consistent when serializing map/dicts. There are other sources of nondeterminism though. [1]



Wouter van Oortmerssen

Jul 20, 2017, 12:10:57 PM
to Ghadi Shayban, FlatBuffers
It does not guarantee it, no. In fact, if you read the internals document, it explicitly states that there is some flexibility in the format in terms of how an encoder can choose to represent things. For example, the C++ API stores fields in schema order if you call CreateMyObject(), or it stores them in source code order if you explicitly call .add_my_field() instead. Then, the JSON parser sorts by field size as it encodes (to attempt to pack things more tightly), which is possibly yet another order.

That said, if you use one particular API across different machines (32bit vs 64bit, x86 vs arm vs big endian) they SHOULD all end up with the same encoding, since FlatBuffers has been written to ignore these differences. If it doesn't, that would be a serious bug. We currently don't have an automated test that runs across all possible architectures that guarantees no such bugs however.
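To illustrate why the host architecture should not matter: FlatBuffers data is stored little-endian, so an encoder can produce the same bytes everywhere by writing scalars byte by byte instead of copying them from memory. The following is a minimal sketch of that idea only. AppendLittleEndian32 and ReadLittleEndian32 are hypothetical helpers made up for this example, not FlatBuffers functions.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper (not part of the FlatBuffers API): append a 32-bit
// value as four little-endian bytes, whatever the host byte order is.
void AppendLittleEndian32(std::vector<std::uint8_t>& out, std::uint32_t v) {
  for (int shift = 0; shift < 32; shift += 8) {
    out.push_back(static_cast<std::uint8_t>((v >> shift) & 0xFFu));
  }
}

// Hypothetical helper: read the value back, again using only shifts so the
// result does not depend on host byte order or word size.
std::uint32_t ReadLittleEndian32(const std::vector<std::uint8_t>& in,
                                 std::size_t pos) {
  std::uint32_t v = 0;
  for (int i = 0; i < 4; ++i) {
    v |= static_cast<std::uint32_t>(in[pos + i]) << (8 * i);
  }
  return v;
}
```

Because only shifts and masks are used, a 32-bit ARM build and a big-endian build would emit the same four bytes for the same value.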

For dictionaries, CreateVectorOfSortedTables uses std::sort, so as long as the order of that is deterministic, so will we be.



mikkelfj

Jul 21, 2017, 12:09:02 AM
to FlatBuffers
You can also compare minified JSON printed from a given buffer and API. There might be some differences in how enums are translated between values and symbols, in the ordering of keys, or in whether default values are included or left out, but other differences should be absent for the same content. Printed numeric values might differ slightly, but they shouldn't.

If you need to hash a buffer, it is best to walk it recursively and skip already-visited objects via a hash table, or you can do it on JSON after managing the differences above. A JSON printer could be asked to walk tables in strict order of table field id for consistency (that is likely already the case).
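A rough sketch of the recursive walk, using a toy in-memory model rather than a real FlatBuffers reader: Node, Fnv1a and ContentHash are all hypothetical names made up for this example, and the graph is assumed to be acyclic. Fields are visited in field-id order, and a hash table keyed by node address ensures a shared object is only hashed once.

```cpp
#include <cstdint>
#include <map>
#include <string>
#include <unordered_map>

// Toy stand-in for a decoded table (an assumption for this sketch).
// std::map keeps fields ordered by field id.
struct Node {
  std::map<int, std::string> scalars;   // field id -> value rendered as text
  std::map<int, const Node*> children;  // field id -> referenced table
};

// 64-bit FNV-1a step over a string, continuing from hash state h.
std::uint64_t Fnv1a(std::uint64_t h, const std::string& s) {
  for (unsigned char c : s) {
    h ^= c;
    h *= 1099511628211ULL;
  }
  return h;
}

// Hash the logical content of a node. The memo table means a node that is
// referenced several times is walked only once.
std::uint64_t ContentHash(const Node& n,
                          std::unordered_map<const Node*, std::uint64_t>& memo) {
  auto it = memo.find(&n);
  if (it != memo.end()) return it->second;

  std::uint64_t h = 14695981039346656037ULL;  // FNV-1a offset basis
  for (const auto& kv : n.scalars) {
    h = Fnv1a(h, std::to_string(kv.first) + "=" + kv.second + ";");
  }
  for (const auto& kv : n.children) {
    if (kv.second == nullptr) continue;
    h = Fnv1a(h, std::to_string(kv.first) + "->" +
                     std::to_string(ContentHash(*kv.second, memo)) + ";");
  }
  memo[&n] = h;
  return h;
}
```

Since the hash depends on content and not on byte offsets, two buffers that differ only in field placement or in whether identical sub-objects are shared would hash the same under this scheme.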

mikkelfj

Jul 21, 2017, 12:11:33 AM
to FlatBuffers
> For dictionaries, CreateVectorOfSortedTables uses std::sort, so as long as the order of that is deterministic, so will we be.

FlatCC (C binding) does not sort while creating a buffer, but can do it retroactively. It is not a stable sort because it uses no external memory.

Mehdi AMINI

Jul 21, 2017, 4:54:29 PM
to Wouter van Oortmerssen, Ghadi Shayban, FlatBuffers
2017-07-20 9:10 GMT-07:00 'Wouter van Oortmerssen' via FlatBuffers <flatb...@googlegroups.com>:
> For dictionaries, CreateVectorOfSortedTables uses std::sort, so as long as the order of that is deterministic, so will we be.

You'd need std::stable_sort to get portable determinism though right?

-- 
Mehdi

Wouter van Oortmerssen

Jul 24, 2017, 11:24:02 AM
to Mehdi AMINI, Ghadi Shayban, FlatBuffers
> You'd need std::stable_sort to get portable determinism though right?

Yes, I guess if your data may contain duplicate keys, then this possibly won't be deterministic across STL implementations.
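To make the duplicate-key point concrete, here is a small standalone sketch. Entry and SortByKeyStable are hypothetical names for this example and are not FlatBuffers types. std::sort leaves the relative order of equal-key elements unspecified, so it can differ between STL implementations, while std::stable_sort is required to keep them in input order.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical record type: several entries may share the same key.
struct Entry {
  int key;
  std::string payload;
};

// Sort by key only. std::stable_sort guarantees that entries with equal keys
// keep their relative input order, so the result is the same on every
// conforming standard library.
std::vector<Entry> SortByKeyStable(std::vector<Entry> v) {
  std::stable_sort(v.begin(), v.end(),
                   [](const Entry& a, const Entry& b) { return a.key < b.key; });
  return v;
}
```

For the input (2,"a"), (1,"b"), (2,"c"), (1,"d") this always yields b, d, a, c. With std::sort and the same comparator, the order within each key group is not pinned down by the standard.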

I'm not sure this is worth the cost of switching to stable_sort, especially since the format isn't deterministic in other circumstances either. If this is necessary, adding a CreateVectorOfStableSortedTables may be the way to go. For now, you could also sort the vector yourself before passing it to CreateVector.
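One possible way to do that pre-sort, sketched with plain structs instead of generated FlatBuffers types (Row and SortCanonically are hypothetical names for this example): break ties on a secondary field so the comparator never treats two distinguishable rows as equal. The sorted order then no longer depends on the sort algorithm or on the input order.

```cpp
#include <algorithm>
#include <string>
#include <tuple>
#include <vector>

// Hypothetical row type standing in for the data behind each table.
struct Row {
  int key;
  std::string payload;
};

// Order by key first, then by payload as a tie-breaker. Rows that still
// compare equal are identical in every field, so their relative order
// cannot change the output.
void SortCanonically(std::vector<Row>& rows) {
  std::sort(rows.begin(), rows.end(), [](const Row& a, const Row& b) {
    return std::tie(a.key, a.payload) < std::tie(b.key, b.payload);
  });
}
```

The rows would then be serialized in this order and the resulting offsets passed to CreateVector. Since the primary sort field is still the key, a key-based binary search over the vector should keep working.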
