Performance query


Abhijit Bhopale

Nov 5, 2016, 3:00:11 PM
to FlatBuffers

Is it recommended to use FlatBuffers for a nested structure containing lists, where the size may grow up to 700-800 MB?

I just tested with protobuf (though it is not meant for this), but as the size grows, XML-based serialization gives better performance than protobuf.

mikkelfj

Nov 6, 2016, 3:17:33 AM
to FlatBuffers


On Saturday, November 5, 2016 at 8:00:11 PM UTC+1, Abhijit Bhopale wrote:

Is it recommended to use FlatBuffers for a nested structure containing lists, where the size may grow up to 700-800 MB?

I just tested with protobuf (though it is not meant for this), but as the size grows, XML-based serialization gives better performance than protobuf.

As long as you can comfortably keep all the data in memory and you keep the size below 2GB, you should have no problems, but it also depends on the language you are using.

Here is the output of a load test on a 100MB buffer with a vector of 1000 tables, each with a vector of 100 strings and another vector of 100 integers.
Note that the bandwidth for such large tables is much higher than for small buffers, but the time per operation (a full buffer with 1000 tables) is of course also much larger.

This is for the flatcc C interface. I expect C++ to be comparable.

buffer size: 100752116
start timing ...
operation: encode and partially decode large buffer
elapsed time: 0.433 (s)
iterations: 10
size: 100752116 (bytes)
bandwidth: 2325.206 (MB/s)
throughput in ops per sec: 23.078
time per op: 43.330 (ms)



mikkelfj

Nov 6, 2016, 3:32:02 AM
to FlatBuffers
Sorry, it is not 100 integers but 100,000 integer (byte) elements in each of the 1000 tables.

Abhijit Bhopale

Nov 6, 2016, 11:46:56 AM
to FlatBuffers
Thank you Mikkel. The communication is between C# (.NET 4.5) and C++ (VxWorks 6.8, GNU 4.1.2). I am planning to use flatcc, but it seems to be supported only since at least GCC 4.4.
Bandwidth shouldn't be a problem as we have our own private network. Please give me any tips/precautions for building the flatcc runtime. I hope this will work.

Abhijit Bhopale

Nov 6, 2016, 11:52:02 AM
to FlatBuffers
I am more concerned about serialization and deserialization time. Also, the communication is between little- and big-endian systems.

mikkelfj

Nov 6, 2016, 4:18:35 PM
to FlatBuffers


On Sunday, November 6, 2016 at 5:52:02 PM UTC+1, Abhijit Bhopale wrote:
I am more concerned about serialization and deserialization time. Also, the communication is between little- and big-endian systems.

The timings include full serialization and deserialization. It takes 43 ms to encode, decode, and validate a 100MB buffer, excluding any I/O overhead.

Regarding endian encoding: I cannot test performance on a big-endian system (but I know it works). I have been running a variation of the load test where data is encoded in big-endian format in the buffer. This is non-standard, since standard flatbuffers are always encoded in little-endian, but it should give a good idea of the conversion overhead. This can be done by checking out the flatcc "be" branch on a little-endian system.

The timings in this case (note that most of the data are byte elements in a vector and need no conversion):

buffer size: 100728088
start timing ...
operation: encode and partially decode large buffer
elapsed time: 0.457 (s)
iterations: 10
size: 100728088 (bytes)
bandwidth: 2206.047 (MB/s)
throughput in ops per sec: 21.901
time per op: 45.660 (ms)

So these are almost the same timings. The Intel platform I'm using has efficient byteswap operations for endian translation. The overhead could be higher on a system such as PowerPC that has no easily accessible byteswap operation.

For small buffers with varied content I have observed a 40% slowdown when doing endian conversion, but we are then talking about sub-microsecond timings for both little- and big-endian encodings.


mikkelfj

Nov 6, 2016, 4:45:53 PM
to FlatBuffers


On Sunday, November 6, 2016 at 5:46:56 PM UTC+1, Abhijit Bhopale wrote:
Thank you Mikkel. The communication is between C# (.NET 4.5) and C++ (VxWorks 6.8, GNU 4.1.2). I am planning to use flatcc, but it seems to be supported only since at least GCC 4.4.
Bandwidth shouldn't be a problem as we have our own private network. Please give me any tips/precautions for building the flatcc runtime. I hope this will work.

I missed this message in my earlier reply and have just read your concern about serialization. But the answer still stands. Bandwidth in this context means how long it takes to encode a flatbuffer and store it in memory, then read it back from memory and verify it has the expected content. FlatBuffers operates at a speed where it can be faster than the time it takes to access second-level cache, depending on the CPU and buffer content.

I have no performance data on C#. It is likely that there will be some garbage collection overhead, but it should still be quite fast.

As for building flatcc, there is already some documentation in place: https://github.com/dvidelabs/flatcc#building

I suggest you just try it out and I'd be happy to answer any questions.

For older GCC you will likely need the -D FLATCC_PORTABLE flag to work around various compatibility issues. Compilers older than those tested might produce warnings that break the pedantic warning settings of the CMake file, but you don't have to be that strict in your own code, so you can likely use older GCC compilers. If you encounter a specific issue on your GCC version that is not merely a warning, we might be able to work together to add support in the flatcc/portable library.
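As a concrete starting point, a hand-built runtime library might look like the following build fragment (a hypothetical sketch assuming the flatcc repository layout with runtime sources under src/runtime and headers under include; the library name and flags are illustrative, so adjust them for your cross toolchain):

```shell
# Compile the flatcc runtime sources with the portability layer enabled,
# then archive them into a static library for linking into the project.
cc -c -std=c99 -D FLATCC_PORTABLE -I include src/runtime/*.c
ar rcs libflatccrt.a *.o
```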


mikkelfj

Nov 6, 2016, 5:19:37 PM
to FlatBuffers
BTW: if anyone can think of a good reason why the big endian encoded buffer size (100728088) is slightly smaller than its little endian counterpart (100752116), I'd like to know. It could be different alignment due to different code paths, or related to vtable deduplication.

mikkelfj

Nov 6, 2016, 5:34:00 PM
to FlatBuffers


On Sunday, November 6, 2016 at 11:19:37 PM UTC+1, mikkelfj wrote:
BTW: if anyone can think of a good reason why the big endian encoded buffer size (100728088) is slightly smaller than its little endian counterpart (100752116), I'd like to know. It could be different alignment due to different code paths, or related to vtable deduplication.

Well, now I am consistently getting the same size (100728088) on clean builds for both encodings, and the timings aren't significantly different either.

Wouter van Oortmerssen

Nov 7, 2016, 1:48:50 PM
to mikkelfj, FlatBuffers
The reason protobuf may be slow is that it insists you first populate the data in objects before it serializes them, which may cause a lot of object allocation/deallocation. FlatBuffers was designed not to have this problem.


Abhijit Bhopale

Nov 8, 2016, 1:53:03 AM
to FlatBuffers
Thank you Mikkel.
I am using PowerPC on the VxWorks end. If it succeeds, it will be used in a kernel-downloadable project.

To compile the flatcc runtime I am using the files from src/runtime/*.c and have included include/flatcc/** in my library project with the -D FLATCC_PORTABLE flag. I could not find the "mm_malloc.h" file for PowerPC. Build log attached.

Am I missing something? 
Buildlog.txt

mikkelfj

Nov 9, 2016, 4:02:40 PM
to FlatBuffers
flatcc/portable/paligned_alloc.h is a very new feature, added at the same time another user helped test on AIX PowerPC, so either we missed something in the process, or, more likely, your system configuration is different. The AIX test setup used C99, so perhaps -std=c99 will do the job?

You could also try to define PORTABLE_NO_POSIX_MEMALIGN.
There should be a fallback solution, though it is not well tested.
paligned_alloc.h could be updated with tests for your system if you are able to provide them.

In more detail, aligned_alloc either relies on C11 aligned_alloc (but no C library seems to support it yet), or uses posix_memalign (declared in the mm header you are missing), or uses a fallback that over-allocates with enough space for manual alignment (which is not detected for your platform), or uses the Windows _aligned_malloc (which is not relevant for you).

There is also an option to extend paligned_alloc.h with a non-POSIX memalign function, but it is very system specific and the free operation is not well defined, so it must be done on a case-by-case basis.


It is likely you will encounter other issues with VxWorks, but it is very likely these can be fixed by enhancing the portable library.

mikkelfj

Nov 10, 2016, 12:23:49 PM
to FlatBuffers
I have updated flatcc with improved detection for posix_memalign, and I have tested and fixed some bugs in the malloc-based fallback function for aligned_alloc.
The source might now compile for you. The following define is no longer supported:
PORTABLE_NO_POSIX_MEMALIGN

Instead, -DPORTABLE_POSIX_MEMALIGN=0
and -DPORTABLE_C11_ALIGNED_ALLOC=0

can be used to disable these implementations if the detection fails, leaving the fallback available.

If you wish to use the VxWorks memalign function, you need to provide reliable detection and an implementation, which I can then add to paligned_alloc.h.