Hi,
I recently read the Protocol Buffers encoding documentation and noticed a way to save space:
it is cheaper to store repeated values directly than as a repeated message that wraps those values.
Protobuf adds framing bytes to every embedded message (a tag encoding the field number and wire type, plus a length prefix), so storing a repeated message of two int64 fields costs more than storing two repeated int64 fields (int64 is just an example).
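To see where the difference comes from, here is a minimal Java sketch (a hypothetical class `WireSizeSketch`, not part of protobuf-java) that counts encoded bytes by hand, following the documented proto3 wire format as I understand it: every field starts with a tag varint `(field_number << 3) | wire_type` (one byte for field numbers 1 to 15), embedded messages and packed lists are length-delimited (tag, varint length, payload), repeated scalar numeric fields are packed by default in proto3, and singular fields holding 0 are not written. It assumes non-negative values, because a negative `int64` always takes 10 bytes as a varint.

```java
import java.util.Random;

/** Hand-computed proto3 wire sizes for Head1 vs Head2 (illustrative sketch only). */
public class WireSizeSketch {

    // Bytes needed to write v as a base-128 varint (int64 uses plain varints,
    // so a negative value always takes 10 bytes).
    static int varintSize(long v) {
        int size = 1;
        while ((v & ~0x7FL) != 0) {
            v >>>= 7;
            size++;
        }
        return size;
    }

    // A tag is the varint (fieldNumber << 3) | wireType; the wire type (< 8)
    // only fills the low 3 bits, so it never changes the tag's length.
    static int tagSize(int fieldNumber) {
        return varintSize((long) fieldNumber << 3);
    }

    // Body of one Data { int64 v1 = 1; int64 v2 = 2; }; proto3 skips fields equal to 0.
    static int dataBodySize(long v1, long v2) {
        int size = 0;
        if (v1 != 0) size += tagSize(1) + varintSize(v1);
        if (v2 != 0) size += tagSize(2) + varintSize(v2);
        return size;
    }

    // Head1 { repeated Data d1 = 1; }: every element gets its own tag + length prefix.
    static long head1Size(long[] v1, long[] v2) {
        long total = 0;
        for (int i = 0; i < v1.length; i++) {
            int body = dataBodySize(v1[i], v2[i]);
            total += tagSize(1) + varintSize(body) + body;
        }
        return total;
    }

    // A packed repeated int64 field: one tag + one length prefix for the whole list.
    static long packedSize(int fieldNumber, long[] values) {
        if (values.length == 0) return 0; // empty repeated fields are not written
        long payload = 0;
        for (long v : values) payload += varintSize(v);
        return tagSize(fieldNumber) + varintSize(payload) + payload;
    }

    // Head2 { repeated int64 v1 = 1; repeated int64 v2 = 2; } (packed by default in proto3).
    static long head2Size(long[] v1, long[] v2) {
        return packedSize(1, v1) + packedSize(2, v2);
    }

    public static void main(String[] args) {
        int n = 400;
        long[] v1 = new long[n];
        long[] v2 = new long[n];
        Random rnd = new Random(42); // arbitrary seed; positive test values only
        for (int i = 0; i < n; i++) {
            v1[i] = 1 + rnd.nextInt(1 << 20);
            v2[i] = 1 + rnd.nextInt(1 << 20);
        }
        long h1 = head1Size(v1, v2);
        long h2 = head2Size(v1, v2);
        System.out.printf("Head1: %d bytes, Head2: %d bytes, difference: %d bytes%n", h1, h2, h1 - h2);
    }
}
```

Under these rules each `Data` entry with two non-zero values carries 4 bytes of framing (its own tag and length prefix, plus one tag per inner field) that `Head2` pays only once per list. If the test put all 400 pairs into a single message, this predicts a gap of 4 × 400 − 6 = 1594 bytes (the 6 being Head2's two tags and two 2-byte length prefixes), which is consistent with the measured 3985 − 2391 = 1594 bytes below.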
I used protobuf-java version 3.4.0.
I made a test to check this, with and without compression (LZ4); see the results below (this is similar to a case we have in production):
syntax = "proto3";
message Head1 {
  repeated Data d1 = 1;
}
message Data {
  int64 v1 = 1;
  int64 v2 = 2;
}
message Head2 {
  repeated int64 v1 = 1;
  repeated int64 v2 = 2;
}
With 400 messages of Head1 and Head2 (same random values in each message):
Message 'Head1' Uncompressed data size is: 3985 bytes
Message 'Head1' compressed data size is: 3697 bytes
Message 'Head2' Uncompressed data size is: 2391 bytes --> 40% less than Head1
Message 'Head2' compressed data size is: 2402 bytes --> 35% less than Head1
Questions:
The problem is that I lose the schema's structure on the application side (v1 and v2 are no longer grouped into one Data entry), and I have to keep the two lists in Head2 in sync at all times.
Is this correct, or am I missing something?
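One way to contain the sync problem on the application side is to hide the two lists behind a wrapper that only allows pair-wise updates. The sketch below is a hypothetical helper (`PairColumns` is my own name, not a protobuf API) in plain Java, so the two columns can only grow together and mismatched lengths are rejected when reading them back.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/**
 * Hypothetical helper that stores (v1, v2) pairs as two parallel columns,
 * matching the Head2 layout, while only allowing pair-wise updates so the
 * columns cannot drift out of sync.
 */
public final class PairColumns {
    private final List<Long> v1 = new ArrayList<>();
    private final List<Long> v2 = new ArrayList<>();

    // The only mutator: both columns always grow together.
    public void add(long a, long b) {
        v1.add(a);
        v2.add(b);
    }

    public int size() {
        return v1.size();
    }

    // Read the i-th logical Data entry back as its two components.
    public long v1(int i) {
        return v1.get(i);
    }

    public long v2(int i) {
        return v2.get(i);
    }

    // Read-only column views, e.g. to hand to a generated builder.
    public List<Long> v1List() {
        return Collections.unmodifiableList(v1);
    }

    public List<Long> v2List() {
        return Collections.unmodifiableList(v2);
    }

    // Rebuild from decoded columns, rejecting lists whose lengths disagree.
    public static PairColumns fromColumns(List<Long> a, List<Long> b) {
        if (a.size() != b.size()) {
            throw new IllegalArgumentException(
                "v1 has " + a.size() + " entries but v2 has " + b.size());
        }
        PairColumns cols = new PairColumns();
        for (int i = 0; i < a.size(); i++) {
            cols.add(a.get(i), b.get(i));
        }
        return cols;
    }
}
```

With the class generated from `Head2`, writing would look roughly like `Head2.newBuilder().addAllV1(cols.v1List()).addAllV2(cols.v2List()).build()` and reading like `PairColumns.fromColumns(msg.getV1List(), msg.getV2List())`. Those accessor names follow protobuf-java's usual naming for repeated fields called `v1` and `v2`, so check them against your generated code.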
Adding a new flag to the .proto for this could save a lot of space in the encoded message (where it is relevant).
I also tested writing to Cassandra, and the saving is huge: over 40%!
Thoughts?