Reducing the size of a protobuf with repeated messages


Boaz Yaniv

May 12, 2019, 9:41:50 AM
to Protocol Buffers
Hi,
I recently read about the protocol buffers encoding and noticed a way to save space.
It is cheaper to hold several repeated scalar fields than one repeated message that wraps those values.
Protobuf writes a tag and a length for every embedded message, plus a tag for every field inside it, so a repeated message of two int64 fields costs more than two repeated int64 fields (int64 is just an example).

I used protobuf-java version 3.4.0.
I ran a test to check this, with and without compression (LZ4); see the results below (this is similar to a case we have in production).

message Head1 {
    repeated Data d1 = 1;
}

message Data {
    int64 v1 = 1;
    int64 v2 = 2;
}
message Head2 {
    repeated int64 v1 = 1;
    repeated int64 v2 = 2;
}

With 400 messages of Head1 and Head2 (same random values in each message):
Message 'Head1' Uncompressed data size is: 3985 bytes
Message 'Head1' compressed data size is: 3697 bytes

Message 'Head2' Uncompressed data size is: 2391 bytes
Message 'Head2' compressed data size is: 2402 bytes   --> 35% less

Questions:
The drawback is that I lose the schema-level pairing of v1 and v2 on the application side, and I will have to keep the two lists in Head2 in sync all the time.

Is this correct, or am I missing something?

Adding a new flag to the proto for this could save a lot of data in the encoded proto (in cases where it is relevant).

I also tested writing to Cassandra, and the saving is huge: more than 40%!

Thoughts? 

Adam Cozzette

May 13, 2019, 5:25:34 PM
to Boaz Yaniv, Protocol Buffers
I have not looked into size savings from compression, but your uncompressed sizes sound right assuming about 3 bytes per int64 (those use a variable-size integer so the size depends on the value being stored). I think there's a tradeoff here between ease of use and serialized size, but if it's important for your use case to keep serialized size small then this technique sounds like one which is worth considering for sure.
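The arithmetic behind that estimate can be sketched in a few lines of Python. This is only an illustration and does not use the protobuf library: the function names are made up for this example, and it assumes proto3 packed encoding for the repeated int64 fields, positive non-zero values, and field numbers below 16 (one-byte tags).

```python
def varint_size(value: int) -> int:
    """Bytes needed to encode a non-negative integer as a base-128 varint."""
    size = 1
    while value >= 0x80:
        value >>= 7
        size += 1
    return size


def tag_size(field_number: int) -> int:
    """A tag is the varint of (field_number << 3) | wire_type."""
    return varint_size(field_number << 3)


def head1_size(pairs) -> int:
    """Size of Head1: every Data entry carries its own tag and length."""
    total = 0
    for v1, v2 in pairs:
        # Body of one Data message: tag + value for v1, tag + value for v2.
        body = tag_size(1) + varint_size(v1) + tag_size(2) + varint_size(v2)
        # Wrapper in Head1: tag for d1, length prefix, then the body.
        total += tag_size(1) + varint_size(body) + body
    return total


def head2_size(pairs) -> int:
    """Size of Head2: one tag and one length per packed repeated field."""
    total = 0
    for field_number, index in ((1, 0), (2, 1)):
        payload = sum(varint_size(pair[index]) for pair in pairs)
        total += tag_size(field_number) + varint_size(payload) + payload
    return total


# 400 entries whose values each need a 3-byte varint (illustrative values).
pairs = [(100_000, 200_000)] * 400
print(head1_size(pairs))  # 4000
print(head2_size(pairs))  # 2406
```

With every value taking exactly 3 bytes this gives 4000 and 2406 bytes, close to the 3985 and 2391 bytes reported above. Per entry, Head1 pays 4 bytes of tags and length on top of the 6 bytes of values, while Head2 pays only 3 bytes of overhead per field for the whole list.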

--
You received this message because you are subscribed to the Google Groups "Protocol Buffers" group.
To unsubscribe from this group and stop receiving emails from it, send an email to protobuf+u...@googlegroups.com.
To post to this group, send email to prot...@googlegroups.com.
Visit this group at https://groups.google.com/group/protobuf.
To view this discussion on the web visit https://groups.google.com/d/msgid/protobuf/69b44003-5821-4678-9ba7-18c1a7a05ee5%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Nadav Samet

May 13, 2019, 7:21:14 PM
to Adam Cozzette, Boaz Yaniv, Protocol Buffers
A few more thoughts:
- Random data doesn't tend to compress well - try to measure the benefit of compression for your preferred message layout using data that is typical for your application.
- It's always possible to add helper functions/classes to make it easier to deal with inconvenient message layouts.
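To illustrate the first point, here is a rough sketch that compresses two packed int64 payloads of the same raw size: one built from random values and one from a small set of repeating values. Everything in it is an assumption made for the illustration: zlib from the Python standard library stands in for LZ4, the varint encoder is hand-written rather than taken from the protobuf library, and the "repetitive" values are invented, not real application data.

```python
import random
import zlib


def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a protobuf-style base-128 varint."""
    out = bytearray()
    while value >= 0x80:
        out.append((value & 0x7F) | 0x80)  # low 7 bits, continuation bit set
        value >>= 7
    out.append(value)  # final byte, continuation bit clear
    return bytes(out)


def packed_payload(values) -> bytes:
    """Concatenated varints, as in the body of a packed repeated int64 field."""
    return b"".join(encode_varint(v) for v in values)


rng = random.Random(0)  # fixed seed so the run is repeatable
# 400 random values that each need a 3-byte varint.
random_values = [rng.randrange(1 << 14, 1 << 21) for _ in range(400)]
# 400 values cycling through 8 nearby readings (hypothetical low-entropy data).
repetitive_values = [100_000 + (i % 8) for i in range(400)]

for label, values in (("random", random_values), ("repetitive", repetitive_values)):
    raw = packed_payload(values)
    compressed = zlib.compress(raw)
    print(f"{label}: {len(raw)} bytes raw, {len(compressed)} bytes compressed")
```

The random payload should come out at roughly its original size, while the repetitive one shrinks a lot. Real data usually sits somewhere in between, which is why it is worth measuring with values from your own application.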

-Nadav
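As a minimal sketch of such a helper, assuming the Head2 layout from the first message: the two functions below convert between a list of (v1, v2) pairs and the two parallel lists, and refuse lists that have drifted out of sync. The names `to_columns` and `to_rows` are hypothetical, and the code works on plain Python lists rather than on generated protobuf classes.

```python
from typing import Iterable, List, Tuple

Pair = Tuple[int, int]


def to_columns(rows: Iterable[Pair]) -> Tuple[List[int], List[int]]:
    """Split (v1, v2) pairs into the two parallel lists stored in Head2."""
    v1: List[int] = []
    v2: List[int] = []
    for first, second in rows:
        v1.append(first)
        v2.append(second)
    return v1, v2


def to_rows(v1: List[int], v2: List[int]) -> List[Pair]:
    """Rebuild (v1, v2) pairs, rejecting lists of different lengths."""
    if len(v1) != len(v2):
        raise ValueError(f"lists out of sync: {len(v1)} vs {len(v2)} entries")
    return list(zip(v1, v2))


rows = [(10, 20), (30, 40), (50, 60)]
v1, v2 = to_columns(rows)
print(v1, v2)           # [10, 30, 50] [20, 40, 60]
print(to_rows(v1, v2))  # [(10, 20), (30, 40), (50, 60)]
```

If all reads and writes go through a pair of functions like these, the rest of the application can keep working with pairs, and the length check catches a desynchronized message at the boundary instead of deep inside the application.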

