How does FlatBuffers serialization work?


Gaurav Jain

Mar 18, 2016, 11:23:21 AM
to FlatBuffers
Just wondering how FlatBuffers stores data (tables) in memory. As far as I understand, FlatBuffers just keeps a contiguous memory region to store each entry in a table, and maintains an offset array (where each entry in the offset array points to the location where that entry lies). In the case of vectors/structs, it stores the memory offset of where the data starts. Therefore, in the case of vectors, we have TWO (2) memory accesses.

But I don't understand the serialization part. FlatBuffers claims we only need the buffer pointer and we can send it over the network, but if things like vectors/structs are maintained in an indirect fashion, how does FlatBuffers serialization actually work?
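For example, here is my mental model of reading one vector element in C++ (just a sketch of how I picture it, not real FlatBuffers code):

#include <cstdint>

// My mental model: a table field that holds a vector stores a relative
// offset; following it is access #1, reading the element is access #2.
int32_t read_vector_element(const uint8_t* field_pos, uint32_t index) {
    // Access 1: read the relative offset stored at the field position
    // and jump to where the vector actually lives.
    uint32_t offset = *reinterpret_cast<const uint32_t*>(field_pos);
    const uint8_t* vector_pos = field_pos + offset;
    // A vector starts with its element count, then the elements.
    // Access 2: read the element itself.
    const int32_t* elems =
        reinterpret_cast<const int32_t*>(vector_pos + sizeof(uint32_t));
    return elems[index];
}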

mikkelfj

Mar 18, 2016, 11:55:38 AM
to FlatBuffers
You essentially answered it yourself: the buffer can move over the network because all data references are relative offsets, and because all data is stored within the same buffer memory, which can be sent as-is.
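In C++ terms, "serialization" is already done the moment you finish building: the builder's memory block is the wire format. A sketch, assuming generated code for a minimal schema with a root table Monster that has a single string field name (send_bytes stands in for whatever transport you use):

#include "flatbuffers/flatbuffers.h"
#include "monster_generated.h"  // assumed: generated from the minimal schema above

void send_bytes(const uint8_t* data, size_t size);  // your transport

void build_and_send() {
    flatbuffers::FlatBufferBuilder builder;
    auto name = builder.CreateString("orc");
    builder.Finish(CreateMonster(builder, name));
    // No separate serialization step: every reference inside the
    // builder's memory is a relative offset, so the raw bytes ARE
    // the wire format.
    send_bytes(builder.GetBufferPointer(), builder.GetSize());
}

void receive(const uint8_t* data) {
    // The receiver needs no parse step either: interpret in place.
    auto monster = flatbuffers::GetRoot<Monster>(data);
    (void)monster->name();
}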

Even though flatbuffers are called flat, they really aren't. They are trees (or even acyclic graphs) with all nodes stored as close together in memory as possible, and such that children are stored after parents (in the default configuration). But you only need one pointer to somewhere in the tree in order to go further down, because you can repeatedly add the stored offset to the current position until you reach a leaf node, for example an integer value.
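Each hop down the tree is the same cheap operation (a simplified sketch; real field access also involves the vtable, see below):

#include <cstdint>

// Descending the tree: each hop adds the stored relative offset to the
// current position, so one pointer into the buffer is all you need.
// e.g. buffer start -> root table -> nested table -> ... -> leaf value
const uint8_t* follow(const uint8_t* pos) {
    return pos + *reinterpret_cast<const uint32_t*>(pos);
}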

It is also true that you often have multiple memory accesses, but because of the way CPU cache lines work, it is often very cheap to perform either the first or the second lookup. In fact, there are often more than two accesses required because of vtable lookups. To give some explicit numbers, reading a value in a table typically takes 30 ns instead of 1 ns. This is the cost of going from level 1 cache to level 2 cache, and you see the same in hash tables once they grow large enough. But you do not see 30 ns for each memory access, because often one of the accesses will already be in the much faster level 1 cache. This depends widely on data layout, access pattern, CPU architecture, and even the OS.
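To make the access count concrete, here is roughly what one table field read looks like at the byte level (a simplified sketch of the FlatBuffers layout, not library code):

#include <cstdint>
#include <cstddef>

// One logical field read is several dependent memory accesses:
int32_t read_table_field(const uint8_t* table_pos, size_t field_index) {
    // Access 1: a table starts with a signed offset back to its vtable.
    int32_t vtable_rel = *reinterpret_cast<const int32_t*>(table_pos);
    const uint8_t* vtable = table_pos - vtable_rel;
    // Access 2: the vtable maps field index -> offset within the table.
    // The first two uint16 entries hold the vtable and table sizes.
    uint16_t field_off =
        reinterpret_cast<const uint16_t*>(vtable)[2 + field_index];
    if (field_off == 0) return 0;  // absent: a real reader returns the schema default
    // Access 3: read the field value itself.
    return *reinterpret_cast<const int32_t*>(table_pos + field_off);
}

Since many tables of the same type share one vtable, access 2 is frequently a level 1 cache hit, which is why the average cost stays low.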

To further complicate matters, languages like Java sometimes need to dynamically allocate memory where C/C++ can read data directly from the buffer, making flatbuffers vastly faster to use in C/C++. This may change as the various language interfaces get cleverer about caching allocated objects, for example by using a hash table.
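For example, in C++ a string accessor just hands back a pointer into the buffer, with no allocation or copying (again assuming the minimal Monster schema from above), whereas a Java accessor for the same field has to materialize a String object:

#include "flatbuffers/flatbuffers.h"
#include "monster_generated.h"  // assumed schema, as above

// Reads straight out of the received bytes: no allocation, no copy.
const char* peek_name(const uint8_t* data) {
    return flatbuffers::GetRoot<Monster>(data)->name()->c_str();
}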

Gaurav Jain

Mar 18, 2016, 2:35:49 PM
to FlatBuffers
Thanks a lot for clearly explaining things. Your explanation leads to another question:

If all data is stored in one contiguous chunk of memory, then it should be pretty simple to copy the contents of one flatbuffer to another in C++. It should be a simple memcpy(dst, src, size) call.
Then why is it claimed to be an expensive operation? Did I miss something?
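I.e., I imagine something like:

#include <cstring>
#include <cstdint>
#include <vector>

// Since all offsets are relative, a plain byte copy should yield an
// equally valid buffer at the new address.
std::vector<uint8_t> clone_flatbuffer(const uint8_t* src, size_t size) {
    std::vector<uint8_t> dst(size);
    std::memcpy(dst.data(), src, size);
    return dst;  // readable in place, same as the original
}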

mikkelfj

Mar 18, 2016, 4:41:07 PM
to FlatBuffers


On Friday, March 18, 2016 at 7:35:49 PM UTC+1, Gaurav Jain wrote:
Thanks a lot for clearly explaining things. Your explanation leads to another question:

If all data is stored in one contiguous chunk of memory, then it should be pretty simple to copy the contents of one flatbuffer to another in C++. It should be a simple memcpy(dst, src, size) call.
Then why is it claimed to be an expensive operation? Did I miss something?

First, expensive is a relative term. Copying the entire buffer is much more expensive than, say, updating an integer. But memcpy is much faster than most people are probably aware of. You can build 1 million+ 300-byte buffers per second: https://github.com/dvidelabs/flatcc#operation-flatcc-for-c-encode-optimized
So depending on your needs it may be a non-issue, but for interactive use and busy servers, things are never fast enough, as small costs pile on top of each other.
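Much of that speed comes from reusing the builder's backing memory instead of reallocating it per message. A C++ sketch of the same idea (using the hypothetical Monster schema from above):

#include "flatbuffers/flatbuffers.h"
#include "monster_generated.h"  // assumed schema, as above

void build_many(size_t n) {
    flatbuffers::FlatBufferBuilder builder(1024);
    for (size_t i = 0; i < n; ++i) {
        builder.Clear();  // keep the allocation, drop the old contents
        auto name = builder.CreateString("orc");
        builder.Finish(CreateMonster(builder, name));
        // ship builder.GetBufferPointer() / builder.GetSize() here
    }
}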

Therefore, it is feasible to update buffers as you suggest. However, when growing an array, you also move everything stored after it, and therefore the relative offsets change for unrelated content. This means the entire tree must be traversed and its offsets updated. This is also not very expensive, but still much more costly than trivial updates. In addition, adding new fields may require changes to the vtable, adding more complexity to the update.
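A toy illustration of the fix-up problem (my own sketch; known_offset_pos stands in for an offset location the tree walk would discover, and the caller is assumed to have reserved extra spare bytes at the end of buf):

#include <cstring>
#include <cstdint>
#include <cstddef>

// Grow a region in the middle of a buffer by `extra` bytes. The tail
// shift is one memmove, but every relative offset that spans the gap
// is now stale and must be found and patched.
void grow_in_place(uint8_t* buf, size_t buf_size, size_t gap_pos,
                   size_t extra, size_t known_offset_pos) {
    // Shift everything after the insertion point.
    std::memmove(buf + gap_pos + extra, buf + gap_pos, buf_size - gap_pos);
    // An offset stored before the gap that points past it is now short
    // by `extra`. Patch the one location we happen to know about:
    uint32_t* off = reinterpret_cast<uint32_t*>(buf + known_offset_pos);
    *off += static_cast<uint32_t>(extra);
    // A real implementation must walk the entire tree to find them all,
    // which is the hidden cost of "just" growing an array.
}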

Doing all of this once is fairly acceptable, but doing it every time a small change is required, such as appending a value to an array, is slow, and the copying evicts useful data from the CPU cache lines, adding hidden extra costs.

However, there is another approach that Facebook wrote about (google "facebook android flatbuffers"), and a related approach that the C++ FlatBuffers reflection API uses (I haven't studied the details). This approach stores new updates in a separate buffer and then merges the result once all updates are done. Meanwhile, reading such data is more complicated and slower, but the amortized cost of the updates will not be too bad.

One can take this a step further and create a hash table that is updated whenever a tree element is updated. The updated data is stored in ordinary dynamically allocated memory, or some similar heap structure. Vectors are overallocated (using exponential growth). Whenever a tree element is read or written, the hash table is visited first, and if there is a hit, the alternative storage location is used. This is slower than direct flatbuffer use in C, but not a lot slower. Once enough changes have been received, the entire buffer is reconstructed, either before transmission or simply as a garbage collection strategy. For this to work, an alternative reader interface must be generated which supports this hash table lookup.
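In outline (my own sketch of the idea, not an existing FlatBuffers API):

#include <cstdint>
#include <cstddef>
#include <unordered_map>

// Overlay: pending writes live in a side table keyed by the field's
// position in the buffer; reads consult the overlay first.
class BufferOverlay {
public:
    explicit BufferOverlay(const uint8_t* buffer) : buffer_(buffer) {}

    // Record an update without touching the underlying buffer.
    void write_i32(size_t field_pos, int32_t value) {
        updates_[field_pos] = value;
    }

    // Check the overlay first, then fall back to the buffer.
    int32_t read_i32(size_t field_pos) const {
        auto it = updates_.find(field_pos);
        if (it != updates_.end()) return it->second;
        return *reinterpret_cast<const int32_t*>(buffer_ + field_pos);
    }

    // Once enough changes pile up, rebuild the whole buffer (merging
    // buffer_ and updates_) and clear the overlay.

private:
    const uint8_t* buffer_;
    std::unordered_map<size_t, int32_t> updates_;
};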

The above approach might also be useful for Java, C#, and other managed languages even for read-only use, because the hash table can store strings allocated in the language runtime's heap, as I hinted at earlier.
