FlatBuffer object shows heavy memory usage as compared to serialized Protocol Buffer object.


Gaurav Jain

Jun 10, 2016, 2:29:10 AM
to FlatBuffers
Hello All,
I conducted some experiments in which I have similar proto and FlatBuffers schema files (containing the same fields).

I instantiated various classes/structs and found a significant difference in size between the serialized protobuf object and the FlatBuffers object.

The table below shows the sizes of the objects in bytes.

Protocol Buffer (serialized)     106
Protocol Buffer (unserialized)   912
Flatbuffer                       408

We have achieved an in-memory size reduction by using FlatBuffers, but in terms of disk usage we are paying a much larger penalty.

Any suggestion/thoughts on this ?

mikkelfj

Jun 13, 2016, 10:28:20 AM
to FlatBuffers



We have achieved an in-memory size reduction by using FlatBuffers, but in terms of disk usage we are paying a much larger penalty.

Any suggestion/thoughts on this ?

Your data might not fit the schema well, for example small integers stored in large integer types, or you may have heavy indexing such as vectors of small tables.
In some cases rearranging the schema may help.

If you don't need direct memory-mapped access (which Protocol Buffers don't provide anyway), you could use compression. LZ4 should get you closer while still being much faster than Protocol Buffers; gzip compresses better but is slower. It likely still won't beat Protocol Buffers on size, especially when those are also compressed, but it is important to remember that LZ4 + FlatBuffers is a readily available tool.

If you have heavy indexing with many small tables, you can also store the data in JSON format and compress it. While that doesn't sound efficient, it avoids the pointer structures. Only FlatCC (which I developed) provides per-schema generated JSON printing and parsing, but with it this can likely still be much faster than Protocol Buffers, though again not necessarily smaller. As a rule of thumb, large compressed JSON can be half the size of compressed FlatBuffers, while the opposite is true for small FlatBuffers.

Wouter van Oortmerssen

Jun 13, 2016, 4:50:46 PM
to mikkelfj, FlatBuffers
A 4x factor is definitely excessive; in my own testing FlatBuffers was typically only 30% bigger on the wire.



Can you show us the schema (and an indication of typical array sizes, if any)?


Gaurav Jain

Jun 28, 2016, 3:03:33 AM
to FlatBuffers, mik...@dvide.com
Attaching the sample schema file:

============================
table EntryFB {
  regions:[Region];
  time_usecs:long;
  locs:[clock];
  last_data_mutator_vdisk_id:long;
  incomplete:bool;
  nullified:bool;
}

table Region {
  block_offset:int;
  length:int;
  extent_offset:int;
  extent_id:ExtentIdFB;
  extent_group_id:long;
  invalid:bool;
}

table ExtentIdFB {
  vdisk_block:long;
  egroup_mapping_in_eid_map:bool;
  sha1_hash:string;
  extent_size:int;
  owner_id:long;
}

table clock {
  component_id:long;
  incarnation_id:long;
  operation_id:long;
  operation_timestamp:long;
}

root_type EntryFB;
=============================

The array fields will hold around 15-20 elements each. Is there any rearrangement I can do in this schema to consume less space?

Gaurav Jain

Jun 28, 2016, 6:01:44 AM
to FlatBuffers
Hey mikkelfj,
Can you point me to the LZ4 + Flatbuffer tool you were talking about ?

mikkelfj

Jun 29, 2016, 9:18:41 AM
to FlatBuffers


On Tuesday, June 28, 2016 at 12:01:44 PM UTC+2, Gaurav Jain wrote:
Hey mikkelfj,
Can you point me to the LZ4 + Flatbuffer tool you were talking about ?

To answer your other question first: looking briefly at your schema, I think the timestamp table (clock) should be a struct. You could also consider whether all fields need to be long. Place smaller fields last in the struct so you don't waste space on alignment padding.
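The effect of field order on struct padding can be estimated with a few lines of Python. This is only a sketch: it assumes each scalar is aligned to its own size and the struct is padded to its largest field, and the mix of int and long fields is hypothetical (every field of clock in the posted schema is a long, so reordering alone would not shrink it).

```python
def struct_size(field_sizes):
    """Estimate the size in bytes of a struct whose scalar fields are laid
    out in the given order. Assumes each field is aligned to its own size
    and the total is padded to the largest field's alignment."""
    offset = 0
    for size in field_sizes:
        offset += (-offset) % size  # padding needed before this field
        offset += size
    largest = max(field_sizes)
    return offset + (-offset) % largest  # tail padding

# Hypothetical struct with two 4-byte ints and two 8-byte longs.
interleaved = struct_size([4, 8, 4, 8])    # int, long, int, long
largest_first = struct_size([8, 8, 4, 4])  # long, long, int, int

print("interleaved:", interleaved, "bytes")
print("largest first:", largest_first, "bytes")
```

Under these assumptions the interleaved order spends 8 bytes on padding that the largest-first order avoids.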

Otherwise I think the layout makes sense. I can't say whether the type of each field is right for your data, and I don't think you should sacrifice function to minimize space. Use tables when you need flexibility or when you don't use most of the fields. Use structs for small fixed items. Avoid writing table fields that can take a default or are not in active use.

LZ4:
It isn't a combined FlatBuffers + LZ4 tool, although it could be: LZ4 has a very nice streaming protocol format that could be added to the FlatBuffers interface. For simplicity, though, just compress after creating the buffer and decompress before reading. You lose direct buffer access, but it is still much faster than field-by-field compression like protobuf's.


You can use the Python LZ4 Tools on the command line (https://pypi.python.org/pypi/lz4tools) to test the compression after writing a buffer to disk.

There are LZ4 libraries in almost any language. It should, for example, take only a few lines in Python, which makes LZ4 attractive: fast, widely supported, and easy to integrate into embedded systems with minimal overhead. Zlib would also work: better compression, but slower and with more overhead.
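A minimal sketch of the compress-after-building approach in Python. It uses the standard-library zlib module so it runs without extra packages; with the third-party lz4 package installed, lz4.frame.compress and lz4.frame.decompress could be swapped in the same way. The input here is a stand-in byte string, not a real FlatBuffer, and the helper names pack and unpack are made up for illustration.

```python
import zlib

def pack(buf: bytes, level: int = 6) -> bytes:
    """Compress a finished buffer as one opaque blob."""
    return zlib.compress(buf, level)

def unpack(blob: bytes) -> bytes:
    """Restore the original bytes. The whole buffer has to be
    decompressed before any field can be read from it."""
    return zlib.decompress(blob)

# Stand-in for a serialized buffer: small values stored in 8-byte slots,
# which leaves many zero bytes for the compressor to remove.
fake_buffer = (b"\x2a" + b"\x00" * 7) * 64

blob = pack(fake_buffer)
restored = unpack(blob)

print("raw bytes:", len(fake_buffer))
print("compressed bytes:", len(blob))
print("round trip ok:", restored == fake_buffer)
```

How much a real buffer shrinks depends entirely on its contents, so it is worth measuring on representative data.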

Advanced use: I am only intimately familiar with FlatCC, the C binding. There you could create your own emitter object that compresses one LZ4 block at a time and emits it to a stream. Because buffers are built back to front, the blocks would be in reverse order, but you just fix that when reading back. In this way you don't need to store the entire buffer in memory or on disk before transmission. Even without compression, this is a nice feature of the LZ4 format. I have considered doing this, but I think it is perhaps best to keep FlatBuffers simple.

Wouter van Oortmerssen

Jun 29, 2016, 1:12:40 PM
to Gaurav Jain, FlatBuffers, mikkelfj
You use a lot of "long" types, which in FlatBuffers are always 64-bit. In Protobuf, if you use varints and the values are typically small, these can often shrink to just 1-3 bytes.

FlatBuffers doesn't have any varints (though it could have, I guess), so the biggest savings right now would be to use smaller int types where possible (byte/short/int).
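To make the difference concrete, the sketch below counts the bytes a base-128 varint needs for a non-negative value and compares that with a fixed 8-byte long. It counts only the value bytes (Protobuf also writes a tag per field), and the sample values are arbitrary.

```python
def varint_len(value: int) -> int:
    """Bytes needed to encode a non-negative integer as a base-128 varint
    (7 payload bits per byte)."""
    if value < 0:
        raise ValueError("this sketch only handles non-negative values")
    n = 1
    while value >= 0x80:
        value >>= 7
        n += 1
    return n

FIXED_LONG_BYTES = 8  # a 64-bit long always takes 8 bytes when stored

for v in (1, 300, 100_000, 2**40):
    print(f"value={v}: varint {varint_len(v)} bytes, "
          f"fixed long {FIXED_LONG_BYTES} bytes")
```

With many small IDs and offsets stored as long, this per-field gap adds up quickly across 15-20 elements per vector.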

Wouter van Oortmerssen

Jun 29, 2016, 1:13:38 PM
to Gaurav Jain, FlatBuffers, mikkelfj
Also, simple types like clock can save a lot of memory when they're structs as opposed to tables, assuming you have no need to ever extend them.
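A rough back-of-the-envelope estimate of that saving for a vector of 20 clock entries. It assumes the usual FlatBuffers layout: a 4-byte offset per element in a vector of tables, a 4-byte vtable offset at the start of each table, and one vtable (2 bytes per slot plus a 4-byte header) shared by all tables with the same layout. It ignores alignment padding and the vector length prefix, and assumes all four fields are actually written.

```python
UOFFSET = 4  # per-element offset stored in a vector of tables
SOFFSET = 4  # offset from each table to its vtable
VOFFSET = 2  # size of one vtable slot

def vector_of_tables_bytes(n, field_bytes, n_fields):
    """Estimated bytes for n tables referenced from a vector."""
    per_table = UOFFSET + SOFFSET + field_bytes
    # vtable: two header slots plus one slot per field, assumed to be
    # shared because every table has the same layout
    vtable = 2 * VOFFSET + n_fields * VOFFSET
    return n * per_table + vtable

def vector_of_structs_bytes(n, field_bytes):
    """Estimated bytes for n structs stored inline in a vector."""
    return n * field_bytes

n = 20               # upper end of the 15-20 elements mentioned in the thread
clock_bytes = 4 * 8  # four long fields

as_tables = vector_of_tables_bytes(n, clock_bytes, n_fields=4)
as_structs = vector_of_structs_bytes(n, clock_bytes)
print("tables:", as_tables, "bytes; structs:", as_structs, "bytes")
```

Real buffers add padding on top of this, so the numbers are a lower bound on the table overhead rather than an exact size.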