Hi, I'm the author of flatcc, the C implementation of FlatBuffers. I have not personally used it with microcontrollers, but I do have experience with such systems, and flatcc was also designed with such environments in mind. That said, in many cases the C++ implementation will be useful too.
Lairdtech is using flatcc with libssh to communicate with some of their devices, but I don't know how constrained they are.
I am also aware of others who have looked at extremely limited devices running an RTOS. This involves custom allocation, which flatcc supports, but I don't know how far they got.
If you are really constrained, you can emit partial buffers and recombine them on the receiving end, but normally it is simpler to build a complete buffer and send it. Then you can use any transport you like. I would look into combining FlatBuffers with MQTT, which I already use in another context.
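As a rough sketch of the "build a complete buffer, then ship it over any transport" route on a link with a small frame size, the C code below cuts a finished buffer into frames and reassembles them on the receiver. The frame layout and the names `frame_t`, `FRAME_PAYLOAD`, `make_frame` and `accept_frame` are hypothetical and not part of flatcc; the FlatBuffer is treated as opaque bytes. Tagging each frame with its byte offset lets frames arrive in any order, which is one possible way to recombine pieces on the receiving end. This sketch does not handle duplicate or lost frames.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical frame for a small-MTU link: each frame carries its byte
   offset and the total buffer size so the receiver can place it directly. */
#define FRAME_PAYLOAD 32

typedef struct {
    uint32_t offset;                /* where this payload starts in the buffer */
    uint32_t total;                 /* size of the complete buffer */
    uint16_t len;                   /* payload bytes actually used */
    uint8_t payload[FRAME_PAYLOAD];
} frame_t;

/* Sender: fill frame number `index`; returns 0 if the buffer has no such frame. */
static int make_frame(const uint8_t *buf, size_t size, size_t index, frame_t *f)
{
    size_t start = index * FRAME_PAYLOAD;
    if (start >= size) return 0;
    size_t n = size - start < FRAME_PAYLOAD ? size - start : FRAME_PAYLOAD;
    f->offset = (uint32_t)start;
    f->total = (uint32_t)size;
    f->len = (uint16_t)n;
    memcpy(f->payload, buf + start, n);
    return 1;
}

/* Receiver state: destination buffer, its capacity, bytes received so far. */
typedef struct {
    uint8_t *buf;
    size_t cap;
    size_t received;
} reassembler_t;

/* Copy a frame into place (in any order). Returns 1 once the buffer is
   complete, 0 if more frames are needed, -1 for a malformed frame.
   Assumes each frame is delivered exactly once. */
static int accept_frame(reassembler_t *r, const frame_t *f)
{
    if (f->total > r->cap || f->len > FRAME_PAYLOAD ||
        f->offset > f->total || f->len > f->total - f->offset)
        return -1;
    memcpy(r->buf + f->offset, f->payload, f->len);
    r->received += f->len;
    return r->received == f->total;
}
```

Once `accept_frame` returns 1, the receiver holds the same bytes the sender built and can read them as a normal FlatBuffer.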
I can only provide guidelines here; someone has to go and run into those problems in real life:
flatcc is designed to be extremely portable, but you will likely need to make a few changes to the flatcc/portable library to keep some systems and compilers happy. If you can deliver your raw data as complete arrays before building the buffer, building requires only a few kilobytes of working memory plus space for the resulting output buffer. You can customize the emitter object so you don't need all that space at once, or send partial buffers on the wire. If you cannot tolerate dynamic allocation, you can preallocate blocks of memory and use a pluggable allocator to feed those blocks. It is a bit of work, but definitely possible. If you have 16 KB of working memory, you can handle a lot of common use cases without doing anything special.
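The "preallocate blocks and feed them to a pluggable allocator" idea can be sketched with a fixed-size block pool carved out of static memory, so nothing ever calls malloc. The names `pool_alloc`, `pool_free`, `BLOCK_SIZE` and `BLOCK_COUNT` are hypothetical and the sizes are only illustrative. flatcc's actual allocator hook has its own callback signature (see `flatcc_builder.h`), which is not reproduced here, so an adapter would still be needed, and a real hook would likely also have to cope with the builder asking for larger buffers as it grows. The sketch needs C11 for `_Alignas` and `max_align_t`.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical pool: BLOCK_COUNT blocks of BLOCK_SIZE bytes in static memory. */
#define BLOCK_SIZE  1024
#define BLOCK_COUNT 16

/* Aligned base; every block starts at a multiple of BLOCK_SIZE from it. */
static _Alignas(max_align_t) uint8_t pool_mem[BLOCK_COUNT * BLOCK_SIZE];
static uint8_t in_use[BLOCK_COUNT];   /* 1 if the block is handed out */

/* Return the first free block, or NULL if the request is too large
   or the pool is exhausted. */
static void *pool_alloc(size_t request)
{
    if (request > BLOCK_SIZE) return NULL;
    for (size_t i = 0; i < BLOCK_COUNT; i++) {
        if (!in_use[i]) {
            in_use[i] = 1;
            return &pool_mem[i * BLOCK_SIZE];
        }
    }
    return NULL;
}

/* Return a block to the pool; ignores NULL and pointers outside the pool. */
static void pool_free(void *p)
{
    if (p == NULL) return;
    uint8_t *q = p;
    if (q < pool_mem || q >= pool_mem + sizeof pool_mem) return;
    size_t i = (size_t)(q - pool_mem) / BLOCK_SIZE;
    in_use[i] = 0;
}
```

A fixed pool like this makes the worst-case memory use visible at link time, which is often what matters more than speed on small devices.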
As for speed, just sending a fixed-size struct over the wire will be faster, but building a FlatBuffer is still very fast (1-200 ns for a small buffer on modern x64 Intel chips). Reading is very fast (<30 ns). If you generally want to use FlatBuffers with the generated code and schema support, but have really extreme needs, flatcc can also ship buffers that contain only structs, not tables. That loses versioning and compatibility with other FlatBuffers implementations but still uses a compatible schema, and you should then have just about as fast a format as you can imagine.
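For comparison with just sending a fixed-size struct over the wire, here is a sketch of such a fixed-layout message in plain C. The `reading_t` type, its fields and the helper functions are hypothetical. Each field is written explicitly in little-endian order (the byte order FlatBuffers uses for scalars), which avoids depending on the compiler's struct padding or the CPU's endianness. This is hand-written illustration code, not what flatcc generates for a struct-only schema, and it assumes `float` is IEEE-754 binary32.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical sensor reading with a fixed 12-byte wire layout. */
typedef struct {
    uint32_t timestamp;
    int16_t  temperature;   /* e.g. hundredths of a degree */
    uint16_t humidity;
    float    pressure;
} reading_t;

#define READING_WIRE_SIZE 12

/* Little-endian store/load helpers, independent of host byte order. */
static void put_u16(uint8_t *p, uint16_t v) { p[0] = (uint8_t)v; p[1] = (uint8_t)(v >> 8); }
static void put_u32(uint8_t *p, uint32_t v) { for (int i = 0; i < 4; i++) p[i] = (uint8_t)(v >> (8 * i)); }
static uint16_t get_u16(const uint8_t *p) { return (uint16_t)(p[0] | (p[1] << 8)); }
static uint32_t get_u32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) | ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

static void encode_reading(const reading_t *r, uint8_t out[READING_WIRE_SIZE])
{
    uint32_t bits;
    memcpy(&bits, &r->pressure, sizeof bits);   /* reinterpret float bits */
    put_u32(out, r->timestamp);
    put_u16(out + 4, (uint16_t)r->temperature);
    put_u16(out + 6, r->humidity);
    put_u32(out + 8, bits);
}

static void decode_reading(const uint8_t in[READING_WIRE_SIZE], reading_t *r)
{
    uint32_t bits = get_u32(in + 8);
    r->timestamp = get_u32(in);
    r->temperature = (int16_t)get_u16(in + 4); /* two's complement assumed */
    r->humidity = get_u16(in + 6);
    memcpy(&r->pressure, &bits, sizeof bits);
}
```

The trade-off is the one described above: a layout like this is as fast as it gets, but any field change breaks every reader, which is exactly what FlatBuffers tables protect against.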
C++ runs at about the same speed as the C version.
FlatBuffers is very suitable for this application. The only issue is that the format is not well suited for streaming partial buffers: you generally want to build a complete buffer before you ship it. But as I suggested above, it is possible to do with flatcc with some careful design. With C++ you need memory for the full buffer at once.