Is FlatBuffers a good way to implement application-level virtualization?


Kenneth Kasajian

May 16, 2016, 2:49:57 PM
to FlatBuffers
When I look at the documentation and videos for FlatBuffers, it's clear that its goal is to be a much more optimal replacement for other serialization techniques such as JSON and XML. That seems to be the emphasis of most of its pitch. Based on what I've seen so far, I have no doubt that it achieves this goal and is much more efficient than those techniques.

That is, if I have a system that's using, say, JSON to serialize, and I'm having performance problems with loading and storing my data, FlatBuffers sounds like an ideal solution.

However, what if my problem is that I currently have too much in memory? If I have a system where things are in RAM, and I now have to come up with a technique to offload that to disk efficiently, it's possible that FlatBuffers would be a great solution. I would be able to take a system that may require gigabytes of RAM and reduce it to a small working set, say 500 MB, by using memory-mapped files along with FlatBuffers to create an application-level virtual-memory system. Conceptually one can see how that could work, but it would be nice to know if anyone has tried to solve this problem with FlatBuffers.
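Conceptually the read side would be something like the sketch below (just to illustrate the idea, not tested; `MyData` / `GetMyData()` stand in for whatever flatc would generate from an actual schema, and I'm assuming POSIX mmap):

#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

#include "my_data_generated.h"  // hypothetical flatc output for the schema

// Map a file containing a finished FlatBuffer and return its root table.
// Pages are only faulted in as fields are actually touched, so the
// resident set stays close to the working set rather than the file size.
const MyData* MapMyData(const char* path) {
  int fd = open(path, O_RDONLY);
  if (fd < 0) return nullptr;
  struct stat st;
  if (fstat(fd, &st) != 0) { close(fd); return nullptr; }
  void* base = mmap(nullptr, st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
  close(fd);  // the mapping remains valid after closing the descriptor
  if (base == MAP_FAILED) return nullptr;
  return GetMyData(base);  // zero-copy: no parsing or unpacking step
}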

Has anyone done this? Does anyone have numbers on the performance impact in real-world applications? I realize this can vary significantly between computers and devices, and that persistent storage (disk) will often be orders of magnitude slower than RAM, but it would be nice to have some idea of what to expect in practice.

mikkelfj

May 16, 2016, 4:20:51 PM
to FlatBuffers


On Monday, May 16, 2016 at 8:49:57 PM UTC+2, Kenneth Kasajian wrote:
When I look at the documentation and videos for FlatBuffers, it's clear that its goal is to be a much more optimal replacement for other serialization techniques such as JSON and XML. That seems to be the emphasis of most of its pitch. Based on what I've seen so far, I have no doubt that it achieves this goal and is much more efficient than those techniques.

I would say one great benefit is the schema. With JSON the code quickly gets complicated when checking for valid fields and encoding types. JSON, when done right, can be parsed really fast (though still slower than FlatBuffers) if one cares to optimize for it, and it compresses better than FlatBuffers. But as it turns out, FlatBuffers is a great way to represent parsed JSON, and also to verify fields.
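For example, with the C++ library you can parse JSON directly into a FlatBuffer against a schema (a minimal sketch; the schema and JSON here are made up):

#include "flatbuffers/idl.h"

// Made-up schema and matching JSON, just to show the shape of the API.
const char* schema_text =
    "table Person { name: string; age: int; } root_type Person;";
const char* json_text = "{ \"name\": \"Ken\", \"age\": 42 }";

bool JsonToFlatBuffer() {
  flatbuffers::Parser parser;
  // Parse the schema first, then the JSON against it; errors are
  // reported in parser.error_.
  if (!parser.Parse(schema_text)) return false;
  if (!parser.Parse(json_text)) return false;
  // parser.builder_ now holds the finished binary buffer.
  const uint8_t* buf = parser.builder_.GetBufferPointer();
  size_t size = parser.builder_.GetSize();
  return buf != nullptr && size > 0;
}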
 
That is, if I have a system that's using, say, JSON to serialize, and I'm having performance problems with loading and storing my data, FlatBuffers sounds like an ideal solution.

Well yes, but you don't have to give up JSON if there are interop benefits, as long as the JSON is compatible with a FlatBuffers schema. If that doesn't matter, sure, go all in on FlatBuffers.
 
However, what if my problem is that I currently have too much in memory? If I have a system where things are in RAM, and I now have to come up with a technique to offload that to disk efficiently, it's possible that FlatBuffers would be a great solution. I would be able to take a system that may require gigabytes of RAM and reduce it to a small working set, say 500 MB, by using memory-mapped files along with FlatBuffers to create an application-level virtual-memory system. Conceptually one can see how that could work, but it would be nice to know if anyone has tried to solve this problem with FlatBuffers.

Has anyone done this? Does anyone have numbers on the performance impact in real-world applications? I realize this can vary significantly between computers and devices, and that persistent storage (disk) will often be orders of magnitude slower than RAM, but it would be nice to have some idea of what to expect in practice.

I can't say I have done it with FlatBuffers, but I have worked a lot with memory-mapped formats and disk formats. First, yes, FlatBuffers is well suited because the disk format is stable and portable, and because it is zero-copy, at least for some languages. For other languages, in-memory object construction may reduce the benefits and increase memory consumption even if the data is loaded from disk. Some of this might be optimized in future versions.

Assuming we have a language like C or C++ with zero-copy access, and a more specific use case, then it doesn't really matter much whether you

1) memory map
2) just allocate the memory in main memory
3) read the buffers directly via disk I/O.

The reason 1 and 2 are the same is that main-memory allocation is already memory mapped by the OS. But you get more control doing it manually, and of course you can persist data between processes.

Option 3 is less obvious; it is at least as fast and gives you much more control, but it is also much more work. Here you can carefully control what you load into memory, flush unused memory, and so on. Memory-mapped files still have to call the read function behind the scenes, and you pay for expensive page misses before that happens. If you don't get it right, you end up with your buffers spilling into virtual memory and doing double file I/O, which you want to avoid. So you end up with something like an LRU cache.
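The caching part typically ends up looking roughly like this (an illustrative sketch only; the key type and the load callback are placeholders, and the actual disk I/O is left out):

#include <cstdint>
#include <list>
#include <unordered_map>
#include <utility>
#include <vector>

// Minimal LRU cache of loaded buffers, keyed by some record id.
// Eviction keeps the resident set under a byte budget.
class BufferCache {
 public:
  explicit BufferCache(size_t budget_bytes) : budget_(budget_bytes) {}

  // Returns the buffer for `key`, loading it with `load` on a miss.
  template <typename LoadFn>
  const std::vector<uint8_t>& Get(uint64_t key, LoadFn load) {
    auto it = index_.find(key);
    if (it != index_.end()) {
      // Hit: move the entry to the front of the recency list.
      lru_.splice(lru_.begin(), lru_, it->second);
      return it->second->second;
    }
    // Miss: load from disk, insert at the front, evict from the back.
    lru_.emplace_front(key, load(key));
    index_[key] = lru_.begin();
    used_ += lru_.front().second.size();
    while (used_ > budget_ && lru_.size() > 1) {
      used_ -= lru_.back().second.size();
      index_.erase(lru_.back().first);
      lru_.pop_back();
    }
    return lru_.front().second;
  }

 private:
  using Entry = std::pair<uint64_t, std::vector<uint8_t>>;
  std::list<Entry> lru_;
  std::unordered_map<uint64_t, std::list<Entry>::iterator> index_;
  size_t used_ = 0;
  size_t budget_;
};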

FlatBuffers does have one major downside with regard to memory-mapped files: it writes back to front, so the builder needs a fully constructed buffer before it makes sense to write it to disk for later read-back, i.e. you cannot trivially stream to disk without rearranging the storage afterwards. But if you keep each buffer reasonably sized, this is also fine.
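In practice that just means finishing each buffer in memory and then writing it out in one go, along these lines (a sketch; `CreateRecord` / `FinishRecordBuffer` stand in for code flatc would generate from some small schema):

#include <cstdio>

#include "flatbuffers/flatbuffers.h"
#include "record_generated.h"  // hypothetical flatc output

bool AppendRecord(FILE* out, const char* name) {
  flatbuffers::FlatBufferBuilder fbb;
  // The builder writes back to front internally, so nothing can be
  // emitted to disk until the buffer has been finished.
  auto rec = CreateRecord(fbb, fbb.CreateString(name));
  FinishRecordBuffer(fbb, rec);
  // Only now is the buffer contiguous and readable front to back.
  return fwrite(fbb.GetBufferPointer(), 1, fbb.GetSize(), out) == fbb.GetSize();
}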

Compared to this, JSON, msgpack, and protobuf all need to be expanded in memory, which makes memory-mapped files less ideal for them. Direct file I/O can work, because you can load and expand the data in your own buffer system. Since JSON compresses better than FlatBuffers for large buffers, you can theoretically gain performance over FlatBuffers when I/O is the bottleneck. But the amount of work to make this work in a general way is very significant.

So I would say: if you are concerned about memory, don't even memory map; just be clever about your memory access patterns so you keep data in fast cache as much as possible. Otherwise, memory-mapped files are simpler to work with than raw files, but raw files are more portable.

Wouter van Oortmerssen

May 23, 2016, 6:45:00 PM
to mikkelfj, FlatBuffers
Yes, FlatBuffers was designed from the start with memory mapping in mind (for reading).

What performance you're going to get depends on your data, the access patterns, and the hardware you run on, so it is hard to say for sure. Assuming access patterns are sparse enough, this should be really good.

