How fast is random access?

john

Aug 5, 2015, 1:43:58 AM
to FlatBuffers
Hi,
I just googled FlatBuffers and could not find an answer to this question.

How fast is random access with FlatBuffers?
Let's say I have 200,000 rows indexed by id (approx. 2GB on disk).
1) If I want to randomly access all rows by id, what performance can I expect, for example with the generated Java code?
2) Is the speed comparable to a Java Map<Id,Object>?
3) Do I need to hold the whole 3GB in memory?


Wouter van Oortmerssen

unread,
Aug 5, 2015, 1:10:43 PM8/5/15
to john, FlatBuffers
1) I have no benchmark numbers for you, but randomly accessing a FlatBuffer vector should be about the same as randomly accessing a Java array, since you will be bound by cache/memory performance in either case.

2) Java's Map is presumably a hash table (HashMap) or a tree (TreeMap), not an array, so performance will be entirely different. Maps are typically slower to access, but if your ids are very sparse, a Map can win because less memory is being accessed. Though I assume you're talking about a situation where the full range of ids is used, in which case a Map should always be slower.

3) Do note that the current FlatBuffer implementation is limited to storing 2GB in a single FlatBuffer (it uses signed 32-bit offsets). How it is loaded in memory is up to you: if Java has a way to load a byte[] / ByteBuffer similar to C's mmap(), then you could work with it without it all being loaded in memory.
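
To illustrate point 2, here is a rough Java sketch of the two lookup styles: a dense id range stored in a plain array versus sparse ids stored in a HashMap. The class name, the helper methods, and the use of a long as a stand-in for a row are all hypothetical, and the sketch makes no timing claims; it only shows why the dense case needs no hashing or boxing.

```java
import java.util.HashMap;
import java.util.Map;

final class DenseVsSparseLookup {
    // Dense ids 0..n-1: the id is the array index, so a lookup is a single indexed load.
    static long[] buildDense(int n) {
        long[] rows = new long[n];
        for (int id = 0; id < n; id++) {
            rows[id] = id * 10L; // hypothetical "row" payload
        }
        return rows;
    }

    // Sparse ids: a HashMap stores only the ids that exist, at the cost of hashing and boxing.
    static Map<Integer, Long> buildSparse(int[] ids) {
        Map<Integer, Long> rows = new HashMap<>();
        for (int id : ids) {
            rows.put(id, id * 10L);
        }
        return rows;
    }

    public static void main(String[] args) {
        long[] dense = buildDense(200_000);
        Map<Integer, Long> sparse = buildSparse(new int[] {7, 1_000_000, 2_000_000_000});
        System.out.println(dense[123_456]);        // prints 1234560
        System.out.println(sparse.get(1_000_000)); // prints 10000000
    }
}
```

With the full id range in use, the array holds every row contiguously; the map only pays off when most ids in the range are absent.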

hardik patel

Jan 18, 2016, 4:27:19 PM
to FlatBuffers, john....@gmail.com
Hello Wouter:

On your third point, quoted below:
3) Do note that the current FlatBuffer implementation is limited to storing 2GB in a single FlatBuffer (it uses signed 32-bit offsets). How it is loaded in memory is up to you: if Java has a way to load a byte[] / ByteBuffer similar to C's mmap(), then you could work with it without it all being loaded in memory.

My response and follow-up question:
Yes, Java has a way to leverage C-style mmap using MappedByteBuffer, as I have implemented here: https://gist.github.com/hardikrpatel/f614e91e0647a23fc8ad#file-main-java-L16

My concern is that the data can grow bigger and my application cannot afford to load everything in memory. In this case, as shown in the gist above, I am able to iterate through the raw bytes of the FB-serialized binary file.

1) Using the memory-mapped approach (lazy reading), how do I reconstruct an EmailHeader object instance or the vector<Participant> instances?

Note: I am using emailheader.fbs as the schema and PlayWithFB.cpp code to serialize the file. The original gist link is: https://gist.github.com/hardikrpatel/f614e91e0647a23fc8ad

Thank you for your help and guidance.

Regards,
Hardik

Wouter van Oortmerssen

Jan 20, 2016, 1:39:45 PM
to hardik patel, FlatBuffers, john
Since MappedByteBuffer is a subclass of ByteBuffer, you should be able to call the generated function EmailHeader.getRootAsEmailHeader() on it to start iterating through the data.
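
A minimal Java sketch of that approach, assuming the file fits in a single mapping (FileChannel.map rejects sizes above Integer.MAX_VALUE). The class and method names are hypothetical, and the demo writes its own tiny temp file so it runs without any FlatBuffers dependency; the call to the generated accessor is left as a comment because it needs the flatc-generated EmailHeader class and the FlatBuffers Java runtime.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

final class MappedFileSketch {
    // Maps the whole file read-only; the OS pages bytes in only when they are touched.
    static ByteBuffer mapReadOnly(Path file) {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            MappedByteBuffer mapped = ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
            // The mapping remains valid after the channel is closed.
            return mapped.order(ByteOrder.LITTLE_ENDIAN);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Writes two little-endian ints to a temp file, maps it, and reads the second one back.
    static int demo() {
        try {
            Path tmp = Files.createTempFile("mapped-sketch", ".bin");
            tmp.toFile().deleteOnExit();
            ByteBuffer src = ByteBuffer.allocate(8).order(ByteOrder.LITTLE_ENDIAN);
            src.putInt(0, 42).putInt(4, 7);
            Files.write(tmp, src.array());

            ByteBuffer bb = mapReadOnly(tmp);
            // With generated code on the classpath, the mapped buffer could be passed along:
            // EmailHeader header = EmailHeader.getRootAsEmailHeader(bb);
            return bb.getInt(4); // absolute read at byte offset 4
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints 7
    }
}
```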

hardik patel

Jan 21, 2016, 1:03:42 PM
to FlatBuffers, hardik...@gmail.com, john....@gmail.com
Wouter, I tried this and it worked! :-) I can see data being streamed from the file, and I don't need to load the entire file into memory.

Thank you for your help.

Regards,
Hardik