Flatbuffers on small MCU embedded targets

Perry Hung

Dec 16, 2016, 6:01:27 PM

to FlatBuffers

Has anyone had any experience using flatbuffers on small microcontrollers?

I am looking at designing a network protocol to replace an old ad-hoc binary protocol between a server (Linux/Windows) and a set of ethernet-enabled ARM based microcontrollers (Cortex M3s and M4s).

Has anyone tried shipping flatbuffers to these kinds of devices? Were there any problems encountered (serialization/deserialization speeds, memory constraints, build problems)?

If FB is not appropriate for this application, does anyone have a recommendation as to a more appropriate stack? I would hate to reinvent the wheel, and continually extending our old protocol is starting to become a nightmare of reserved fields and versioning incompatibilities.

mikkelfj

Dec 16, 2016, 6:31:34 PM

to FlatBuffers

On Saturday, December 17, 2016 at 12:01:27 AM UTC+1, Perry Hung wrote:

> Has anyone had any experience using flatbuffers on small microcontrollers?

Hi, I'm the author of flatcc, the C implementation of flatbuffers. I have not personally used it with microcontrollers, but I do have experience with such systems, and flatcc was also designed with such environments in mind. In many cases the C++ implementation will also be useful.

Lairdtech is using flatcc with libssh to communicate with some of their devices, but I don't know how constrained they are.

https://github.com/LairdCP/dcas

I am aware of others who studied extremely limited devices with an RTOS library. This involves custom allocation, which flatcc supports, but I don't know how far they got.

> I am looking at designing a network protocol to replace an old ad-hoc binary protocol between a server (Linux/Windows) and a set of ethernet-enabled ARM based microcontrollers (Cortex M3s and M4s).

If you are really constrained, you can emit partial buffers and recombine them on the receiving end, but normally it is simpler to build a buffer and send it. Then you can use any transport you like. I would look into combining flatbuffers with MQTT, which I am already using in another context.

> Has anyone tried shipping flatbuffers to these kinds of devices? Were there any problems encountered (serialization/deserialization speeds, memory constraints, build problems)?

I can only provide guidelines here; someone has to go and run into those problems in real life:

flatcc is designed to be extremely portable, but you will likely need to make a few changes to the flatcc/portable library to make some systems/compilers happy. If you can deliver your raw data as complete arrays before building the buffer, it requires only a few kilobytes of working memory, plus space for the resulting output buffer. You can customize the emitter object so you don't need all that space at once, or send partial buffers on the wire. If you cannot tolerate dynamic allocation, you can preallocate blocks of memory and use a pluggable allocator to feed those blocks. It is a bit of work, but definitely possible. If you have 16K of working memory, you can handle a lot of common use cases without doing anything special.
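To make the "preallocate blocks and feed them through a pluggable allocator" idea concrete, here is a minimal sketch of a static block pool. It is written in C++ to match the code later in this thread, and it is not flatcc's allocator interface: `BlockPool`, `acquire`, `release` and `free_blocks` are hypothetical names, and a real port would adapt the pool to whatever hook the library actually exposes.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical fixed pool: kBlocks blocks of kBlockSize bytes, all reserved
// statically, so nothing is taken from the heap at run time.
template <std::size_t kBlockSize, std::size_t kBlocks>
class BlockPool {
  static_assert(kBlockSize % alignof(std::max_align_t) == 0,
                "block size must keep every block max-aligned");

 public:
  // Returns a free block, or nullptr when the request is larger than a block
  // or the pool is exhausted. The caller must handle nullptr.
  std::uint8_t* acquire(std::size_t size) {
    if (size > kBlockSize) return nullptr;
    for (std::size_t i = 0; i < kBlocks; ++i) {
      if (!used_[i]) {
        used_[i] = true;
        return storage_[i].data();
      }
    }
    return nullptr;
  }

  // Returns a block to the pool. Pointers the pool does not own are ignored.
  void release(std::uint8_t* p) {
    for (std::size_t i = 0; i < kBlocks; ++i) {
      if (storage_[i].data() == p) {
        used_[i] = false;
        return;
      }
    }
  }

  // Number of blocks currently available.
  std::size_t free_blocks() const {
    std::size_t n = 0;
    for (bool u : used_) {
      if (!u) ++n;
    }
    return n;
  }

 private:
  alignas(std::max_align_t)
      std::array<std::array<std::uint8_t, kBlockSize>, kBlocks> storage_{};
  std::array<bool, kBlocks> used_{};
};
```

The linear scan is deliberate: with a handful of blocks it is cheap and avoids a free list. Sizing the blocks is the part that needs measurement on the real target.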

As to speed, it will be faster to just send a fixed-size struct over the wire, but building a buffer is still very fast (1-200 ns on modern x64 Intel chips, small buffer). Reading is very fast (<30 ns). If you generally want to use flatbuffers and the generated code and schema support, but have really extreme needs, flatcc can also ship buffers that only have structs, not tables, losing versioning and compatibility with other FB implementations but still with a compatible schema. Then you should have just about as fast a format as you can imagine.

C++ runs at about the same speed as the C version.

> If FB is not appropriate for this application, does anyone have a recommendation as to a more appropriate stack? I would hate to reinvent the wheel, and continually extending our old protocol is starting to become a nightmare of reserved fields and versioning incompatibilities.

FB is very suitable for this application. The only issue is that the format is not well suited for streaming partial buffers: you generally want to build a complete buffer before you ship it. But as I suggested above, it is possible to do with flatcc with some careful design. With C++ you need the full memory at once.

Austen Higgins-Cassidy

Dec 9, 2019, 5:03:44 PM

to FlatBuffers

Hey,

I'm just wondering where to start making an in-place allocator.

I desire to essentially give the builder a block of memory so that it avoids all dynamic allocation.
I generally know the maximum (worst-case) size of the structure I'll be generating.
The intention is that once the builder is done I can just ship the block of memory without a copy and deserialize it as-is elsewhere.

I'm thinking of a naive implementation that just returns N bytes from the block when requested, advancing a pointer from the start of the block.

Should the allocator allocate from the front of the buffer or the back?
Do I need a full free-list deallocator, or can I assume allocated memory is freed in FILO order?
Assuming I know the maximum, worst-case size of the buffer, what is the minimum memory I can pass to the builder?
Am I better off just doing a copy?

Thanks!

Pranas Baliuka

Dec 9, 2019, 5:29:38 PM

to Austen Higgins-Cassidy, FlatBuffers

If you are looking for alternatives for old flavours of C, you may check ASN.1 (BER/DER) and the processors that support it. The ISO standard language was used to define TCP protocol frames, stock market data feeds like Extreme, and some very, very old telco protocols such as TAP3/RAP3 or CDRs, e.g. by Siemens.

References: e.g. https://gitlab.com/mtausig/tiny-asn1 and https://www.eevblog.com/forum/microcontrollers/asn-1-ber-der-vs-your-own-byte-code-system/


Владимир Г.

Dec 10, 2019, 9:27:19 AM

to FlatBuffers

You can start with the implementation of `DefaultAllocator`:

https://github.com/google/flatbuffers/blob/f9724d1bde4fd54bbeadf6a8195911cb366432e3/include/flatbuffers/flatbuffers.h#L601-L611

https://github.com/google/flatbuffers/blob/f9724d1bde4fd54bbeadf6a8195911cb366432e3/include/flatbuffers/flatbuffers.h#L559-L586

https://github.com/google/flatbuffers/blob/master/tests/test_builder.cpp

> Assuming I know the maximum, worst-case size of the buffer, what is the minimum memory I can pass to the builder?

In the best case - the size of data plus alignment.

In the worst case, you will need twice as much memory to implement reallocate_downward.

Of course, you need the `new` operator to create the FlatBufferBuilder and Allocator objects.

Placement `new`, `std::aligned_storage` and the `final` specifier can help you.
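As a rough illustration of the naive single-block allocator being discussed, the sketch below hands out one caller-owned block once and refuses any further request. To keep it self-contained it does not include the FlatBuffers headers: `AllocatorLike` is a stand-in modeled on my reading of the `allocate`/`deallocate` pair in `flatbuffers::Allocator`, and `FixedBlockAllocator` and `make_in_place` are hypothetical names. How the real builder behaves when `allocate` returns `nullptr` is not established in this thread, so a production version would probably trap or assert instead.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <new>

// Stand-in interface so the sketch compiles on its own. With the real
// library you would derive from flatbuffers::Allocator instead (assumption:
// check the header for the exact signatures).
struct AllocatorLike {
  virtual ~AllocatorLike() = default;
  virtual std::uint8_t* allocate(std::size_t size) = 0;
  virtual void deallocate(std::uint8_t* p, std::size_t size) = 0;
};

// Hypothetical allocator that hands out one caller-owned block and never
// touches the heap. A second request while the block is in use (for example
// a growth attempt) is refused and recorded.
class FixedBlockAllocator final : public AllocatorLike {
 public:
  FixedBlockAllocator(std::uint8_t* block, std::size_t capacity)
      : block_(block), capacity_(capacity) {}

  std::uint8_t* allocate(std::size_t size) override {
    if (in_use_ || size > capacity_) {
      overflowed_ = true;  // the message did not fit in the fixed block
      return nullptr;
    }
    in_use_ = true;
    return block_;
  }

  void deallocate(std::uint8_t* p, std::size_t /*size*/) override {
    if (p == block_) in_use_ = false;
  }

  bool overflowed() const { return overflowed_; }

 private:
  std::uint8_t* block_;
  std::size_t capacity_;
  bool in_use_ = false;
  bool overflowed_ = false;
};

// Constructs the allocator inside caller-provided storage with placement
// new, so the object itself needs no heap allocation either.
inline FixedBlockAllocator* make_in_place(void* storage, std::uint8_t* block,
                                          std::size_t capacity) {
  return new (storage) FixedBlockAllocator(block, capacity);
}
```

Because the block is never resized, the "twice as much memory" worst case does not arise here; the cost is that the initial size passed to the builder must already cover the largest message.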

Austen Higgins-Cassidy

Dec 10, 2019, 9:37:17 AM

to FlatBuffers

Thanks!

I've worked with ASN.1 parsers before; it doesn't fit our use case, but thanks for the alternative.

I've got some placement new used in other places, but I'm a bit worried about memory. We're not working with much, so every kilobyte counts. Our structures are generally pretty small.
I think we're okay with the builder and allocator going on the stack, with the buffer itself on the heap.
The default allocator seems to just wrap new and delete, so I'll give the naive implementation a try and see what I get.

If I provide the flatbuffers builder an allocator that takes from a block, will that block have the same address as the GetBuffer pointer?
I assume if it's not, that it will point someplace within the block?

mikkelfj

Dec 10, 2019, 2:45:59 PM

to FlatBuffers

On the in-place allocator, assuming you intend to work with flatcc in C:

First, the way flatbuffers is organized means that you cannot trivially get a clean finished buffer from incremental allocations because it works back to front.
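A toy sketch of what "back to front" means for a fixed block, which also bears on the earlier question about where the finished data sits relative to the block's address. `DownwardWriter` is a hypothetical class, not part of flatcc or the C++ library, and it ignores alignment and everything else a real builder does.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical writer that fills a caller-owned block from its end toward
// its start, the same direction a FlatBuffers builder works in.
class DownwardWriter {
 public:
  DownwardWriter(std::uint8_t* block, std::size_t capacity)
      : block_(block), capacity_(capacity), used_(0) {}

  // Prepends len bytes in front of what was written so far. Returns false,
  // leaving the block untouched, if the bytes do not fit.
  bool push(const void* src, std::size_t len) {
    if (len > capacity_ - used_) return false;
    used_ += len;
    std::memcpy(block_ + capacity_ - used_, src, len);
    return true;
  }

  // Start of the finished data. It lies inside the block, and only
  // coincides with the block's first byte when the block is exactly full.
  const std::uint8_t* data() const { return block_ + capacity_ - used_; }
  std::size_t size() const { return used_; }

 private:
  std::uint8_t* block_;
  std::size_t capacity_;
  std::size_t used_;
};
```

With a 16-byte block and 4 bytes written, `data()` is `block + 12`, so transmitting from the start of the block would send unused bytes first.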

There is a proposal for an alternative StreamBuffers but that is just theory for now: https://github.com/dvidelabs/flatcc/blob/master/doc/binary-format.md#streambuffers

You can, however, reconstruct the buffer from piecemeal allocations if you record information about each chunk, since they (roughly) just have to be reassembled in the opposite order of creation.
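The reassembly step could be sketched as follows, under the simplifying assumption that chunks are emitted strictly back to front, so the last chunk produced belongs at the start of the buffer. The "(roughly)" above suggests the real emitter is not quite that simple, so treat this as the idea only. `Chunk` and `reassemble` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical record of one emitted fragment, stored in creation order.
struct Chunk {
  const std::uint8_t* data;
  std::size_t size;
};

// Rebuilds a contiguous buffer by concatenating the chunks last-to-first.
// Returns the number of bytes written, or 0 if `out` is too small.
std::size_t reassemble(const Chunk* chunks, std::size_t count,
                       std::uint8_t* out, std::size_t out_capacity) {
  std::size_t total = 0;
  for (std::size_t i = 0; i < count; ++i) total += chunks[i].size;
  if (total > out_capacity) return 0;

  std::size_t pos = 0;
  for (std::size_t i = count; i-- > 0;) {  // walk in reverse creation order
    std::memcpy(out + pos, chunks[i].data, chunks[i].size);
    pos += chunks[i].size;
  }
  return pos;
}
```

The receiving end only needs the chunk order and sizes to do this, which is why each chunk has to carry or imply that bookkeeping on the wire.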

Furthermore, it is important to be aware that the flatcc builder allocates temporary memory for various stacks. These stacks are drained as data is pushed to the emitter object. The default emitter object allocates memory in a ring buffer. It is this emitter object you need to override to capture the buffer data.

However, you also need to manage the temporary stack data. This can be controlled by replacing the default allocator object in an argument given to flatcc builder initialization. It will take some experimentation to find a safe upper limit for each stack type the allocator handles, but if you get this right you can dedicate a fixed memory block to this purpose and provide an allocator that delivers large enough fragments of this block for each stack type. The allocator will then not be asked again if it delivered enough stack the first time around.

Back to the ring buffer. The best option is to replace the existing implementation with your own modification, but you can also override the allocation calls the default emitter makes. That override is more intended for systems that don't support malloc and friends.

If you go down this path, I can provide you with more detailed information on how to proceed.

Austen Higgins-Cassidy

Dec 10, 2019, 3:08:31 PM

to FlatBuffers

I'm working in C++ at the moment, so I might be able to avoid some of this.
Giving it a static block is basically what I'm looking to do.

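```
uint8_t* buffer = GetBuffer(1024);   // caller-owned 1 KB block
InPlaceAllocator alloc(buffer);      // custom allocator that hands out that block
flatbuffers::FlatBufferBuilder builder(1024, &alloc);

auto root = my::protocol::CreateRootStructure(
   builder,
   0,
   1,
   2);
builder.Finish(root);

// The builder fills the block back to front, so send the finished region
// rather than the first 1024 bytes of the block.
Modem.transmit(builder.GetBufferPointer(), builder.GetSize());
```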

I can tolerate some extra scratch data, and work to find reasonable limits that work on our platform.
I'm looking to avoid fragmenting memory and double-buffering. I want to be able to pool and re-use these chunks of memory for IPC and external messaging.

Michał Kopczyński

Dec 23, 2019, 12:01:04 PM

to FlatBuffers

We are using FlatBuffers successfully on an STM32 controller with C++. We are sending and receiving FlatBuffers messages of sizes from 100 bytes to 1 KB from the STM32 over TCP/IP, Bluetooth, and serial.
Allocating a buffer is not a problem, as we can allocate a 1 KB buffer up front and reuse it for all messages, since the MCU processes only one message at a time.

The only issue related to memory allocation is that in our case message sizes vary depending on the sizes of vectors and/or union types used inside the message. Those in turn depend on the request coming from the remote device (i.e. the server). In this case it's not easy to estimate the resulting message size for the different cases. We have to actually measure it to be sure that a built message doesn't exceed 1 KB, which would result in memory reallocation and most likely a reset of the device.

Luckily, so far we still have some spare RAM available, so we can just create the buffer with some margin and check after building the message that it is not above 1 KB. But assuming the buffer has to be exactly 1 KB, is there some way to detect that a message exceeds the buffer and prevent memory reallocation?
