Most efficient way to serialize large binary blob without creating a copy?

Walter Gray

Mar 13, 2018, 4:53:39 PM

to FlatBuffers

I'm looking at flatbuffers and wondering if there is a built-in or best practice suitable for my particular use case.

The Goal: Serialize and write image data (to a WriteBytes(void*, size_t) interface) without making a 2nd copy of the image in memory.

My current solution does this only by convention of my schema: I use a flatbuffers object as a header, then write the image data directly after it. Ideally though, I'd be able to write my schema and serialization code such that on the receiving end I'd be able to use the flatbuffers verification scheme & access the image data through the FB object. Is there any way I could achieve this? I'd also be interested to know if it would be possible to write a FlatBufferBuilder that would write directly to the output as soon as enough information was available.

Ideally I could have a schema something like:

table Image {
  width:int16;
  height:int16;
  bpp:int8;
  ... //other metadata
  data:[byte];
}


union MessageType {
  Image
}


table Message {
  message:MessageType;
}
root_type Message;

with C++ code that looked something like this:

void WriteImage(const MyImageType& myImg) {
  flatbuffers::FlatBufferBuilder builder;
  auto imgDataOffset = builder.CreateExternalVector(myImg.data(), myImg.size()); //Creates a promise to write immediately following the end of the known fb buffer. Cannot use CreateVector since that would copy the memory into the buffer.
  auto img = CreateImage(builder, myImg.x, myImg.y,...,imgDataOffset);
  auto message = CreateMessage(builder, MessageType_Image, img.Union());
  builder.FinishSizePrefixed(message);  //the size prefix includes the size of the size-prefixed external vector
  WriteBytes(builder.GetBufferPointer(), builder.GetSize());
  uint32_t imgSize = static_cast<uint32_t>(myImg.size());  //the external vector's own size prefix
  WriteBytes(&imgSize, sizeof(imgSize));
  WriteBytes(myImg.data(), myImg.size());
}

or even better with a direct-write builder:

void WriteImage(const MyImageType& myImg) {
  flatbuffers::FlatBufferStreamer builder([](void* data, size_t sz) { WriteBytes(data, sz); }); //Takes a lambda that will write as an argument.
  auto imgDataOffset = builder.CreateExternalVector(myImg.data(), myImg.size()); //creates a promise to write the image data, stores the pointer & size but does not perform any allocation or memcpy.
  auto img = CreateImage(builder, myImg.x, myImg.y,..., imgDataOffset);
  auto message = CreateMessage(builder, MessageType_Image, img.Union());
  builder.FinishSizePrefixed(message); //Automatically Writes the data on completion, including the data that is pointed to by the ExternalVector
}

mikkelfj

Mar 14, 2018, 1:49:41 PM

to FlatBuffers

Hi Walter,

I'm the author of FlatCC, the C bindings for FlatBuffers.

I take some interest in the problem that you describe - but I cannot speak for the C++ interface.

Generally, FlatBuffers are built back to front for historical reasons. It doesn't matter much when you realloc a dynamic array but it does matter if you stream to disk or network.

A slight change in format would solve this problem with near zero impact on readers (might have to move to a signed data type):

https://github.com/dvidelabs/flatcc/blob/master/doc/binary-format.md#streambuffers

Without such a change you can still emit data in fragments via FlatCC: it has a pluggable emitter object. The default implementation stores data in pages of a circular buffer that adds pages as needed. While not directly implemented, this design takes into account that you can transmit a page during buffer construction and recycle it once acknowledged. You'd have to modify the emitter to add network operations.

Because buffers are sent back to front, the receiver needs to store each received page along with its size and reconstruct the buffer when all data is available.

Such bookkeeping could be transmitted in a separate channel, or in a separate file.
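
The receiver-side reconstruction described above could be sketched roughly as follows. This is only an illustration, not FlatCC code: `Fragment` and `Reassemble` are hypothetical names, and the sketch assumes the fragments arrive in emission order (so the first one received holds the end of the buffer) and that each fragment's size is known.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical container for one received page; the name is illustrative.
struct Fragment {
  std::vector<uint8_t> bytes;
};

// Rebuild the final buffer from fragments that arrived in emission order.
// Because the builder works back to front, the first fragment received holds
// the END of the buffer and the last fragment received holds the START.
std::vector<uint8_t> Reassemble(const std::vector<Fragment>& received) {
  size_t total = 0;
  for (const Fragment& f : received) total += f.bytes.size();

  std::vector<uint8_t> out(total);
  size_t cursor = total;  // write position, moving from the back to the front
  for (const Fragment& f : received) {
    cursor -= f.bytes.size();
    if (!f.bytes.empty()) {
      std::memcpy(out.data() + cursor, f.bytes.data(), f.bytes.size());
    }
  }
  assert(cursor == 0);  // every byte of the output has been filled
  return out;
}
```

Each fragment is copied exactly once, which fits the later remark in this thread that plain memory copies are cheap compared with building the buffer.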

If you transmit a vector in a single call, as opposed to append, the builder will forward the user's data directly to the backend emitter. This data will normally be copied into pages but could also move directly to a network buffer.

Note that FlatCC does not assume that it is possible to access already written buffer content, precisely because pages might get recycled or because data might not be stored in consecutive memory. As far as I can tell, C++ builds data up in a single contiguous memory buffer. FlatCC instead copies out the data at the end, optionally to a user-supplied buffer, or not at all, assuming you implemented a transmitter.

I'm rather keen on getting a StreamBuffer implementation working, but it is not currently a priority 0 thing. It would allow appending data to disk on an ongoing basis and it would provide verifiable checkpoints. Earlier today I was considering a design where you have a table where one field is a table of the same type and another field is a vector of data. The root is always stored at the end of the file, but you can replace the root reference with a new table and store a reference to the old table in a chain. Then you can append until the 2GB data limit.

Mikkel

Walter Gray

Mar 14, 2018, 6:00:46 PM

to FlatBuffers

Fascinating! This is more or less exactly the kind of thing I'd hope for. I believe you're correct about the C++ api building up a single contiguous buffer. FlatCC's pluggable emitter object seems much more suitable for constrained memory environments. What's the feeling of the main flatbuffers binding maintainers towards integrating support for such a system with the C++ api? It seems like a much more flexible approach and I'd love to see it added to more of the language bindings.

StreamBuffers describe my problem almost exactly. Are there any existing PRs for either flatcc or the main repo aimed at supporting such a thing?

mikkelfj

Mar 15, 2018, 1:45:12 PM

to FlatBuffers

There are no PRs open on flatcc.

If you are interested in trying to tackle the problem, I can guide you. It should not be very difficult to implement.

The main problem is that it might need alternative names for the builder operations, or a runtime flag that would slow down things.

For a start this could be managed with compile time flags so you either compile the builder for StreamBuffers or FlatBuffers.

A related issue is the verifier - it is not very difficult, but there are relatively small pieces of tricky code that must be right and avoid overflow in all possible computations.

As to recombining data of fragmented buffer pages: If the receiver has sufficient resources, memory copy operations or even file copy operations are significantly cheaper than actually building a buffer, so the amortized overhead of reconstruction is small.

Wouter van Oortmerssen

Mar 15, 2018, 3:44:13 PM

to mikkelfj, FlatBuffers

You can sorta-kinda achieve the no-copy writing of the image data using CreateUninitializedVector. That will use memory for the buffer that includes the size of the image, but it will not incur the cost of copying (or initializing) that memory. Then when writing the buffer, you'd write it in 3 steps like your first example. You'd manually track the offset where the vector sits in the larger buffer to do so.

So unless you are memory constrained, that would be the way to go. That is assuming you have tested that copying the data is indeed a bottleneck.
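
The three-step write could look roughly like the sketch below. It deliberately avoids the FlatBuffers API: `WriteAroundVector`, its parameters, and the `Sink` callback (standing in for the `WriteBytes` interface from the original question) are hypothetical. It assumes you have already recorded the byte offset of the vector's payload inside the finished buffer, for example from the pointer that CreateUninitializedVector hands back.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>

// Stand-in for the WriteBytes(void*, size_t) interface from the question.
using Sink = std::function<void(const void*, size_t)>;

// Write a finished buffer in three parts, taking the vector payload from the
// original image memory instead of from the uninitialized region reserved
// for it inside the buffer.
//   buf, buf_size    : the finished buffer
//   payload_offset   : byte offset of the vector's first element within buf
//   image, image_len : the external image bytes that belong at that offset
void WriteAroundVector(const uint8_t* buf, size_t buf_size,
                       size_t payload_offset,
                       const uint8_t* image, size_t image_len,
                       const Sink& write) {
  assert(payload_offset <= buf_size);
  assert(image_len <= buf_size - payload_offset);

  // 1. Everything before the payload, including the vector's length prefix.
  write(buf, payload_offset);
  // 2. The payload itself, straight from the image; nothing is copied into buf.
  write(image, image_len);
  // 3. Everything after the payload (padding and any remaining data).
  const size_t tail = payload_offset + image_len;
  write(buf + tail, buf_size - tail);
}
```

The reader sees one ordinary contiguous buffer, so verification and access work unchanged. The writer still pays for the reserved region in memory, which is the limitation noted above.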

The streaming Mikkel proposes is cool, but it is also a large amount of changes (especially if we wanted to support multiple languages), so so far I am on the fence about it. I'll agree that this is a clear pain point of FlatBuffers, but it is also what gives it its speed and simplicity.

--
You received this message because you are subscribed to the Google Groups "FlatBuffers" group.
To unsubscribe from this group and stop receiving emails from it, send an email to flatbuffers+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Walter Gray

Mar 15, 2018, 5:53:24 PM

to FlatBuffers

Unfortunately the whole reason for considering this is that I *am* in a memory constrained environment. It might be possible to add some kind of support for a composed buffer without the need for the full set of StreamBuffers changes - maybe adding a Write function to FlatBufferBuilder, and adding a CreateExternal that would take a lambda or some external block of memory, and make calling Release an error? I'm not 100% sure of the best way to go about it. I've got a solution that works for me for now, but I'll look into seeing if I can find time to play with possible implementations.

Wouter van Oortmerssen

Mar 15, 2018, 6:21:49 PM

to Walter Gray, FlatBuffers

Unfortunately FlatBuffers works with offsets pervasively, so a LOT of the code depends on different parts of the buffer being at the correct distance from each other. It is probably possible, but it would be a fairly complex change.

Actually, that reminds me of yet another way this could be implemented. reflection.h: ResizeVector contains code that knows how to insert space into an existing vector by iterating through the buffer and patching all offsets. So in theory you construct a FlatBuffer with a 0-sized image vector, then use that code to patch all the offsets, but disable the code that actually resizes the buffer and does a memmove. Then write the resulting buffer in 3 parts. It uses reflection though, so it is not the simplest method.
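
To make the offset-patching idea concrete, here is a much simplified sketch. It is not the code from reflection.h: `PatchStraddlingOffsets` and the `offset_fields` list are hypothetical, it only handles forward 32-bit unsigned offsets, and it ignores vtable offsets, the vector's own length field, and alignment of the inserted size. It assumes the caller already knows where every offset field sits, which is the part the post above says needs reflection.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Read a little-endian 32-bit value (FlatBuffers data is little-endian).
uint32_t ReadU32LE(const uint8_t* p) {
  return static_cast<uint32_t>(p[0]) |
         (static_cast<uint32_t>(p[1]) << 8) |
         (static_cast<uint32_t>(p[2]) << 16) |
         (static_cast<uint32_t>(p[3]) << 24);
}

// Write a little-endian 32-bit value.
void WriteU32LE(uint8_t* p, uint32_t v) {
  for (int i = 0; i < 4; ++i) p[i] = static_cast<uint8_t>(v >> (8 * i));
}

// Pretend `delta` bytes will be inserted at `insert_pos` and fix up every
// offset that straddles that point. The buffer itself is NOT resized, so the
// missing bytes can later be written from external memory.
//   offset_fields : byte positions of forward offsets (hypothetical input)
void PatchStraddlingOffsets(std::vector<uint8_t>& buf,
                            const std::vector<size_t>& offset_fields,
                            size_t insert_pos, uint32_t delta) {
  for (size_t field : offset_fields) {
    const uint32_t off = ReadU32LE(buf.data() + field);
    const size_t target = field + off;
    // Only offsets that start before the insertion point and land at or
    // after it change; everything else keeps its relative distance.
    if (field < insert_pos && target >= insert_pos) {
      WriteU32LE(buf.data() + field, off + delta);
    }
  }
}
```

After patching, the buffer would be written as the bytes before `insert_pos`, then the external data, then the remaining bytes.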

mikkelfj

Mar 15, 2018, 6:50:12 PM

to FlatBuffers

It is fairly simple to do StreamBuffers in flatcc.

For FlatBuffers:

When an object is emitted, a reference is returned. This reference is a negative offset from the end of the buffer and is computed as the last emitted negative offset minus the new size (incl. padding).

The user gets to hold this reference, and at the same time the emitted segment is sent to the emitter as the negative offset and one or more iovec_t buffer pointers that populate the range between the previously emitted data and the new data.

When the reference is stored in another table or a vector, the builder makes a translation pass to convert it from an absolute negative reference to an offset. This offset is trivially given as the distance between the two negative references.

The negative references are a purely virtual construct. The emitter uses it to move data into the circular buffer.

There is also support for adding positive references - meaning adding data at the other end of the circular buffer - this is currently only used for clustering vtables together.

Moving to StreamBuffers would be nearly the same thing, except mirrored:

Instead of negative offsets, the offsets would be positive and the emitter would append in the other direction - it already knows how to do that. vtables would not be clustered because they can't be placed before the start. Tables are constructed on a stack, so when they are ready to be emitted, it is just a blob sent to the emitter, so also no change here.

The main difficulty is probably in handling padding and alignment because computations would be slightly different - and these computations are very easy to get wrong - but, overall, front to back computations are simpler than the current back to front.

The translation from references to offsets would be unchanged because the subtraction of two references gives the same result - only with a different sign due to the order, which is exactly what we want.

There is no material change on order of output - except the detail about vtable clustering, but vtable clustering can already be disabled today and has to be for nested buffers because they too don't have a logical place to store clustered vtables.

There is also going to be zero difference in performance assuming no runtime decisions are being made between buffer types.

Benchmarks show that FlatCC is very similar to C++ timings for FlatBuffers.
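
The reference arithmetic described in this message can be modelled in a few lines. The sketch below is only a toy model, not FlatCC code: `BackToFrontEmitter`, `FrontToBackEmitter` and `ToOffset` are hypothetical names, no bytes are actually stored, and alignment is computed relative to the buffer end (or start), which assumes the final buffer size is itself suitably aligned.

```cpp
#include <cstdint>

// Current FlatBuffers direction: references are negative distances from the
// end of the buffer, and each new object sits in front of the previous one.
struct BackToFrontEmitter {
  int32_t front = 0;  // reference of the most recently emitted object

  // Reserve `size` bytes aligned to `align`; returns the object's reference.
  int32_t Emit(int32_t size, int32_t align) {
    int32_t ref = front - size;
    // Round down (more negative) to the alignment; the gap is padding.
    ref -= ((ref % align) + align) % align;
    front = ref;
    return ref;
  }
};

// Mirrored StreamBuffers direction: references are positive distances from
// the start of the buffer, and each new object is appended after the last.
struct FrontToBackEmitter {
  int32_t back = 0;  // first free position after the emitted data

  int32_t Emit(int32_t size, int32_t align) {
    // Round up to the alignment; the gap is padding.
    const int32_t ref = back + (align - back % align) % align;
    back = ref + size;
    return ref;
  }
};

// Offset stored in a field at `field_ref` that points at `target_ref`.
// The same subtraction serves both directions; only the sign differs.
int32_t ToOffset(int32_t field_ref, int32_t target_ref) {
  return target_ref - field_ref;
}
```

In the back to front case an earlier object lies closer to the end, so the stored offset comes out positive. In the mirrored case the earlier object lies closer to the start, so the same subtraction yields a negative value, matching the remark above that readers might have to move to a signed offset type.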

Wouter van Oortmerssen

Mar 19, 2018, 12:02:14 PM

to mikkelfj, FlatBuffers

For writing the case is certainly easier than for reading. In C++, the offset returned to the user is actually a positive integer that represents the distance from the end of the buffer, and that gets converted to a real offset by ReferTo(), which indeed stores the difference of 2 such offsets. So this code is already stream-agnostic.

More problematic is the vtable writing which wants access to all past vtables, which may sit in any streamed part.

And then there's the fact that of course things are generated backwards.

Streaming reading is a lot harder.
