Is there a need or desire for protobuf-lite?


Wink Saville

Apr 19, 2009, 7:45:34 PM
to prot...@googlegroups.com
I've been looking at protobuf and I'm somewhat disappointed by the size of
the library on X86_64 and the size of the generated code for a simple message:

$ size libprotobuf.so
   text       data        bss        dec        hex    filename
1008339      21344       1128    1030811      fba9b    libprotobuf.so

The gcc flags I used for my simple test program were:

CFLAGS := -Wall -g -DGOOGLE_NO_RTTI -o2

The simple protobuf message was:

$ cat test1.proto
syntax = "proto2";
option optimize_for = SPEED;

package protobuf_tests;

message test1 {
  required int32 v = 1;
  optional int32 o = 2;
  repeated string s = 3;
}


Size when optimized for speed:

   text       data        bss        dec        hex    filename
  15851          8         33      15892       3e14    test1.pb.o

Size when not optimized for speed:

   text       data        bss        dec        hex    filename
   6852          8         33       6893       1aed    test1.pb.o


As would be expected, the performance hit was pretty large. Optimized for speed:

test1_cpp serialze Done total=0.656162secs 1000000 loops 656ns/loop
test1_cpp deserialize Done total=0.434740 1000000 loops 434ns/loop

Without optimizing for speed:

test1_cpp serialze Done total=1.994011secs 1000000 loops 1994ns/loop
test1_cpp deserialize Done total=1.609001 1000000 loops 1609ns/loop

The two loops are below:

  nsecs_t start = system_time_ns();
  for (int i=loops; i != 0; i--) {
    t.SerializeToString(&data);
  }
  nsecs_t stop = system_time_ns();

  start = system_time_ns();
  for (int i=loops; i != 0; i--) {
    x.ParseFromString(data);
  }
  stop = system_time_ns();
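
For anyone who wants to reproduce the measurement, here is a minimal sketch of a harness around loops like these. The definitions of nsecs_t and system_time_ns() below are assumptions (the real ones are in the attached tarball and are not shown in this post), and time_loop is a hypothetical helper, not part of the test program.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>

// Assumed stand-ins for the helpers used in the loops above.
typedef int64_t nsecs_t;

static nsecs_t system_time_ns() {
  using namespace std::chrono;
  return duration_cast<nanoseconds>(
      steady_clock::now().time_since_epoch()).count();
}

// Hypothetical helper: runs fn() `loops` times and returns the average
// cost of one call in nanoseconds (integer division, so it truncates).
template <typename Fn>
static nsecs_t time_loop(int loops, Fn fn) {
  assert(loops > 0);
  nsecs_t start = system_time_ns();
  for (int i = loops; i != 0; i--) {
    fn();
  }
  nsecs_t stop = system_time_ns();
  return (stop - start) / loops;
}
```

With the generated classes linked in, a call would look something like time_loop(1000000, [&] { t.SerializeToString(&data); }). A monotonic clock is used here so that wall-clock adjustments during the run cannot produce a negative interval.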



Given the above, I thought I'd try protobuf-c, which appears to ignore the speed option.
It is quite a bit smaller and somewhat faster on this simple message:

   text       data        bss        dec        hex    filename
   1370         56          0       1426        592    test1.pb-c.o
  51751       1320         16      53087       cf5f    libprotobuf-c.so

test1_c serialze Done total=0.182868secs 1000000 loops 182ns/loop
test1_c deserialize Done total=0.420284 1000000 loops 420ns/loop

The loops for protobuf-c are:

  nsecs_t start = system_time_ns();
  for (int i=loops; i != 0; i--) {
    size = protobuf_tests__test1__get_packed_size(&t);
    protobuf_tests__test1__pack(&t, data);
  }
  nsecs_t stop = system_time_ns();

  start = system_time_ns();
  for (int i=loops; i != 0; i--) {
    _ProtobufTests__Test1 *x = protobuf_tests__test1__unpack(NULL, size, data);
    protobuf_tests__test1__free_unpacked(x, NULL);
  }
  stop = system_time_ns();

So the protobuf library is about 19x larger (1,000,000/52,000) and the generated code is about 11x larger (16,000/1,400)
when optimized for speed and about 5x larger (6,900/1,400) when not optimized for speed. I could be making
an inappropriate comparison, and protobuf-c is certainly not as mature, but it does look encouraging.
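
The ratios can be rechecked from the unrounded "text" column values printed by size earlier in this post. This is only a small arithmetic sketch; ratio and print_size_ratios are hypothetical names.

```cpp
#include <cassert>
#include <cstdio>

// Returns how many times larger a is than b.
static double ratio(double a, double b) {
  assert(b > 0);
  return a / b;
}

// Prints the three comparisons using the "text" sizes reported above.
static void print_size_ratios() {
  std::printf("libprotobuf.so / libprotobuf-c.so:        %.1fx\n",
              ratio(1008339, 51751));
  std::printf("test1.pb.o (speed) / test1.pb-c.o:        %.1fx\n",
              ratio(15851, 1370));
  std::printf("test1.pb.o (not speed) / test1.pb-c.o:    %.1fx\n",
              ratio(6852, 1370));
}
```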

This may not be news to anyone, but the large difference makes me wonder if it would be
worthwhile to create protobuf-lite. What do people feel is the minimum feature set that protobuf-lite
would need? Does anyone else feel a lite version would be desirable?

Other ideas or comments?

-- Wink

test1.tgz

Observer

Apr 20, 2009, 11:49:03 AM
to Protocol Buffers
Size in what context? RAM or on disk? Or the amount of memory that
is required at run time for your application to function?

Create a statically linked binary using the same set of tests and post
the results. The size of the .so is a less than perfect test of bloat
or efficiency, but I'd be hard pressed to assume on disk size of a .so
is important in any real world application. Try either stripping
your .so (strip(1)) or create a statically linked library without
debugging flags (-static -Os && strip -s ${MY_BINARY}) and I'd imagine
the results will yield a reasonably small binary. It looks like the
phenomenon that you're commenting on is the size of the debugging
symbols (note the size of the text section of the binaries). Run
nm(1) on the .so and you'll see all kinds of namespace information and
other miscellaneous string data that contributes to an enormous .so
binary.

In a recent C++ project, the size of our .so library went from a few
MiB in size to tens of MiB because we started making use of C++'s
namespaces. The resulting statically linked and stripped binary,
however, contained none of the bloat we experienced after adding
namespaces to the code base.

Hope that's useful.

lahi...@gmail.com

Apr 20, 2009, 1:15:02 PM
to Protocol Buffers
Frankly I'm surprised so many people care about the generated code
size - I'm generally much more interested in speed.
For example, I suspect your C unpack() could be optimized quite a bit
by using a custom allocator. Another example: probably the only
change I'm likely to make to protobuf-c in the foreseeable future is a
rewrite of "pack()" to optimize packing of submessages... well, and
I'll probably need to follow protobuf if it implements packed repeated
fields (another great optimization).

If I were designing a C++ protobuf, I'd probably use the strategy I
used for protobuf-c: make the reflection data so efficient and easy
that you can optimize the hell out of the reflection-based api,
thereby:
- only needs one copy of the pack/unpack code, in the core library
- eliminate the difference between optimize for speed or size -- it's
really possible to do both!
- minimizes the generated code to be practically nothing but
introspection data
- in theory, one could bind the C objects to other languages
using the reflection api

Unfortunately, some amount of bloat is inherent in the C++ tradition
of using accessor methods for the various members. More bloat from
std::string, etc. So I'm not sure how "lite" you can get w/o making
a completely incompatible version.
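
To make the "one copy of the pack code, driven by introspection data" idea concrete, here is a rough sketch. It is not protobuf-c's actual API: FieldDesc, MessageDesc, Test1 and pack are hypothetical names, and the sketch handles only int32 fields (no strings, no repeated fields, no submessages).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>

// Hypothetical field descriptor: all the shared packer needs to know.
struct FieldDesc {
  uint32_t number;    // field number from the .proto file
  size_t offset;      // byte offset of the int32 value in the struct
  size_t has_offset;  // byte offset of the bool "has" flag
};

struct MessageDesc {
  const FieldDesc* fields;
  size_t field_count;
};

// What the generated code shrinks to: a plain struct plus a table.
struct Test1 {
  int32_t v;
  bool has_v;
  int32_t o;
  bool has_o;
};

static const FieldDesc kTest1Fields[] = {
  {1, offsetof(Test1, v), offsetof(Test1, has_v)},
  {2, offsetof(Test1, o), offsetof(Test1, has_o)},
};
static const MessageDesc kTest1Desc = {kTest1Fields, 2};

// Appends a base-128 varint, least significant 7-bit group first.
static void put_varint(uint64_t value, std::string* out) {
  while (value >= 0x80) {
    out->push_back(static_cast<char>((value & 0x7F) | 0x80));
    value >>= 7;
  }
  out->push_back(static_cast<char>(value));
}

// The single pack routine, shared by every message type.
static std::string pack(const MessageDesc& desc, const void* msg) {
  assert(msg != nullptr);
  const char* base = static_cast<const char*>(msg);
  std::string out;
  for (size_t i = 0; i < desc.field_count; i++) {
    const FieldDesc& f = desc.fields[i];
    bool has;
    std::memcpy(&has, base + f.has_offset, sizeof has);
    if (!has) continue;
    int32_t value;
    std::memcpy(&value, base + f.offset, sizeof value);
    // Key is (field number << 3) | wire type; wire type 0 is varint.
    put_varint(static_cast<uint64_t>(f.number) << 3, &out);
    // int32 values are sign-extended to 64 bits on the wire.
    put_varint(static_cast<uint64_t>(static_cast<int64_t>(value)), &out);
  }
  return out;
}
```

Adding another message type in this scheme costs one struct and one small table, with no additional serialization code, which is the size argument being made above.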

- dave

Wink Saville

Apr 20, 2009, 1:29:55 PM
to lahi...@gmail.com, Protocol Buffers
In embedded systems, both are important. I potentially see 100s
of messages being defined so generated code size could be a problem.
Also, as I have no existing code right now an incompatible version isn't
a problem for me.

One thing that does surprise me is the cost of an enum in the generated
code. My expectation is that there should be zero runtime cost. But it
appears there is some need for a data structure to describe them. Could
you educate me on why they need the extra infrastructure?

-- Wink

Wink Saville

Apr 20, 2009, 1:31:32 PM
to Observer, Protocol Buffers
Thanks for the info, I'll create a stripped statically linked binary and report back.

Alain M.

Apr 20, 2009, 9:46:44 PM
to ProtBuf List
Is it possible to use protobuf-c on the embedded side and regular
protobuf on the PC side?

This sounds like a win-win option, or am I mistaken?

Thanks in advance for feedback,
Alain


Wink Saville

Apr 20, 2009, 11:51:16 PM
to Alain M., ProtBuf List
I assume the wire format for all variations of protobuf
is compatible, though of course there are no guarantees.

In my little test program I'm using both and see that
they are generating the same data so it is true for
that one data point.
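
As a sketch of what "generating the same data" means for the test1 message, the bytes can also be produced by hand from the documented wire format: each field is a varint key, (field number << 3) | wire type, followed by the value. The helper names below are hypothetical, and note that in general two implementations are only required to produce mutually parseable output, not byte-identical output.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Appends a base-128 varint, least significant 7-bit group first.
static void put_varint(uint64_t value, std::string* out) {
  while (value >= 0x80) {
    out->push_back(static_cast<char>((value & 0x7F) | 0x80));
    value >>= 7;
  }
  out->push_back(static_cast<char>(value));
}

// Appends a field key. Wire type 0 is varint, 2 is length-delimited.
static void put_key(uint32_t field, uint32_t wire_type, std::string* out) {
  assert(wire_type <= 5);
  put_varint((static_cast<uint64_t>(field) << 3) | wire_type, out);
}

// Hand-encodes test1: required int32 v = 1; optional int32 o = 2;
// repeated string s = 3. Fields are written in field-number order.
static std::string encode_test1(int32_t v, bool has_o, int32_t o,
                                const std::vector<std::string>& s) {
  std::string out;
  put_key(1, 0, &out);
  put_varint(static_cast<uint64_t>(static_cast<int64_t>(v)), &out);
  if (has_o) {
    put_key(2, 0, &out);
    put_varint(static_cast<uint64_t>(static_cast<int64_t>(o)), &out);
  }
  for (size_t i = 0; i < s.size(); i++) {
    put_key(3, 2, &out);
    put_varint(s[i].size(), &out);  // byte length, then the raw bytes
    out += s[i];
  }
  return out;
}

// Renders bytes as lowercase hex so two outputs can be compared by eye.
static std::string to_hex(const std::string& bytes) {
  static const char kDigits[] = "0123456789abcdef";
  std::string hex;
  for (size_t i = 0; i < bytes.size(); i++) {
    unsigned char c = static_cast<unsigned char>(bytes[i]);
    hex.push_back(kDigits[c >> 4]);
    hex.push_back(kDigits[c & 0x0F]);
  }
  return hex;
}
```

Dumping the output of both the C++ and the protobuf-c serializers through something like to_hex is one way to check the compatibility claim on a given message.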

-- Wink

Wink Saville

Apr 21, 2009, 2:20:15 AM
to Observer, Protocol Buffers
Below is the data I collected. I decided not to do static linking
as it wouldn't be helpful for a size constrained system.

For the size on "disk" as reported by "ls -l" we see that the
library size for libprotobuf.so is 18x larger than libprotobuf-c.so.

We also see that the on "disk" size for the generated code,
test1.pb.*, is 5x larger for the optimized for speed case (-O2) and
4x for the optimized for space (-Os) compared to the test1.pb-c.*.

Using the size utility to represent the approximate RAM usage
we see test1.pb.* is 6x larger for -O2 and 3x larger for -Os as
compared to test1.pb-c.*

Looking at the performance we see that the C++ version is 17% faster
serializing and 1.8x faster deserializing when optimized for speed,
while in the optimized for space case the C++ version is 6.8x slower
when serializing and 3.7x slower deserializing.

Of course YMMV and I wouldn't be surprised if other code or other
compiler settings on other machines could be quite different.

In any case, my conclusion remains the same and that is protobuf-c is
more efficient when looking at the space/performance trade offs.

So I think the question remains, could there be an improvement in efficiency
for C++ and maybe Java and Python by defining protobuf-lite?

-- Wink


Output of ls -l representing size on "disk"

-rwxr-x--- 1 wink eng   56768 Apr 20 22:36 libprotobuf-c.so
-rwxr-x--- 1 wink eng 1033432 Apr 20 22:36 libprotobuf.so
-rwxr-x--- 1 wink eng   28720 Apr 20 22:16 test1-O2
-rwxr-x--- 1 wink eng   21664 Apr 20 22:13 test1-Os
-rw-r----- 1 wink eng   13576 Apr 20 22:35 test1.pb-O2.o
-rw-r----- 1 wink eng   10032 Apr 20 22:35 test1.pb-Os.o
-rw-r----- 1 wink eng    2576 Apr 20 22:35 test1.pb-c-O2.o
-rw-r----- 1 wink eng    2352 Apr 20 22:35 test1.pb-c-Os.o

Output of size utility representing size in RAM:


   text       data        bss        dec        hex    filename
   3427          0          0       3427        d63    test1.o
   9326          8         33       9367       2497    test1.pb-O2.o
   4361          8         33       4402       1132    test1.pb-Os.o
   1424          0          0       1424        590    test1.pb-c-O2.o
   1299          0          0       1299        513    test1.pb-c-Os.o
  24587       1112        240      25939       6553    test1-O2
  17602       1040        192      18834       4992    test1-Os

Speed optimized:

test1_cpp serdes
test1_cpp serialze Done total=0.242912secs 1000000 loops 242ns/loop
test1_cpp deserialize Done total=0.225155 1000000 loops 225ns/loop
test1_cpp X

test1_c serdes
test1_c serialze Done total=0.291570secs 1000000 loops 291ns/loop
test1_c deserialize Done total=0.405527 1000000 loops 405ns/loop
test1_c X

Space optimized:


test1_cpp serdes
test1_cpp serialze Done total=1.897658secs 1000000 loops 1897ns/loop
test1_cpp deserialize Done total=1.602048 1000000 loops 1602ns/loop
test1_cpp X

test1_c serdes
test1_c serialze Done total=0.277009secs 1000000 loops 277ns/loop
test1_c deserialize Done total=0.423629 1000000 loops 423ns/loop
test1_c X


test1-090420.tgz

Anonymous-ish

Apr 21, 2009, 12:02:43 PM
to Protocol Buffers
> Below is the data I collected. I decided not to do static linking
> as it wouldn't be helpful for a size constrained system.

... curious, but okay. Unnecessary/unresolved symbols and their
associated instructions won't be included in the final resulting
binary, so unless your embedded app does a lot of fork(2)/exec(2) and
benefits from using shared pages, I don't understand why it wouldn't
be helpful. Did you at least try stripping the binary to see how much
of the output was debugging information? Surely an embedded app would
ship with that information removed. The size of C++'s debugging
information is impressive and not to be understated. For example:

% /usr/local/bin/g++43 -g ${CXXFLAGS} ${WFLAGS} -o foobin.debug ${FOOBIN_OBJS} lib/libprotobuf.a -static
% cp foobin.debug foobin.strip
% strip -s foobin.strip
% du -h foobin.debug foobin.strip
10M foobin.debug
2.0M foobin.strip
% size foobin.strip foobin.debug
   text       data        bss        dec        hex    filename
1879465      19764     160760    2059989     1f6ed5    foobin.strip
1879465      19764     160760    2059989     1f6ed5    foobin.debug

Cheers.

Wink Saville

Apr 21, 2009, 12:32:02 PM
to Anonymous-ish, Protocol Buffers
I forgot to mention, I stripped all of the files, and in this
case the "embedded" system is a phone (Android) which
is size constrained because of limited flash memory but
does have many executables so shared libraries are the
norm.

lahi...@gmail.com

Apr 21, 2009, 1:11:39 PM
to Protocol Buffers
If you don't care about the API, why not just use protobuf-c from C++?
If you take away the accessor API, I don't see what C++ gets you.

It's true that the enum descriptor could probably be eliminated... I
was mostly paralleling the C++ API... BUT I like it, because it could be
useful for language bindings and pretty-printing and parsing,
someday. A patch to not generate them for proto_c-c would be
accepted...

- dave


Wink Saville

Apr 21, 2009, 2:42:01 PM
to lahi...@gmail.com, Protocol Buffers
See inline.

On Tue, Apr 21, 2009 at 10:11 AM, <lahi...@gmail.com> wrote:

> If you don't care about the API, why not just use protobuf-c from C++?
> If you take away the accessor API, I don't see what C++ gets you.

I will probably look at doing just that if there is no interest in a protobuf-lite.

> It's true that the enum descriptor could probably be eliminated... i
> was mostly paralleling the c++ api... BUT I like it, cause it could be
> useful for language bindings and pretty-printing and parsing,
> someday.  A patch to not generate them for proto_c-c would be
> accepted...

Maybe I'll do this too:) But could you give me some insight on how you
use the enum descriptor, maybe I'm missing something. All I want to
do is have a "structure" defined and be able to serialize and deserialize
that structure. That's what I think the minimum protobuf-lite is, but
again, maybe I'm totally missing something.

-- Wink

lahi...@gmail.com

Apr 21, 2009, 3:30:09 PM
to Protocol Buffers


On Apr 21, 11:42 am, Wink Saville <w...@google.com> wrote:
> Maybe I'll do this too:) But could you give me some insight on how you
> use the enum descriptor, maybe I'm missing something. All I want to
> do is have a "structure" defined and be able to serialize and deserialize
> that structure. That's what I think the minimum protobuf-lite is, but
> again, maybe I'm totally missing something.

We don't use it, essentially. We provide it mostly because protobuf
itself provides an equivalent, and it seems useful (as I mentioned for
printing/parsing and for language-bindings).

For most users the space wasted by this structure is pretty slight, so
I think it's useful enough to keep in by default (I'm extremely
reluctant to break compatibility at this point).
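
For readers wondering what such a descriptor buys, here is a rough sketch of the printing/parsing use mentioned above. It is not protobuf-c's actual generated code: the Color enum, EnumValueDesc, EnumDesc, enum_name and enum_value are all hypothetical names chosen for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical enum as it might appear in generated code. On its own
// this costs nothing at run time.
enum Color { COLOR_RED = 0, COLOR_GREEN = 1, COLOR_BLUE = 2 };

// Hypothetical descriptor: the extra data emitted next to the enum.
struct EnumValueDesc {
  const char* name;
  int value;
};

struct EnumDesc {
  const char* name;
  const EnumValueDesc* values;
  size_t count;
};

static const EnumValueDesc kColorValues[] = {
  {"RED", COLOR_RED},
  {"GREEN", COLOR_GREEN},
  {"BLUE", COLOR_BLUE},
};
static const EnumDesc kColorDesc = {"Color", kColorValues, 3};

// Pretty-printing: maps a value to its name, or nullptr if unknown.
static const char* enum_name(const EnumDesc& desc, int value) {
  for (size_t i = 0; i < desc.count; i++) {
    if (desc.values[i].value == value) return desc.values[i].name;
  }
  return nullptr;
}

// Parsing: maps a name to its value; returns false if unknown.
static bool enum_value(const EnumDesc& desc, const char* name, int* out) {
  assert(name != nullptr && out != nullptr);
  for (size_t i = 0; i < desc.count; i++) {
    if (std::strcmp(desc.values[i].name, name) == 0) {
      *out = desc.values[i].value;
      return true;
    }
  }
  return false;
}
```

The table and its strings are the space cost under discussion: code that only serializes and deserializes never touches them, while a text printer or a language binding cannot work without them.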

Kenton Varda

Apr 21, 2009, 4:06:26 PM
to Wink Saville, Alain M., ProtBuf List
On Mon, Apr 20, 2009 at 8:51 PM, Wink Saville <wi...@google.com> wrote:
> I assume the wire format for all variations of protobuf
> is compatible, though of course there are no guarantees.

Yes, they are definitely compatible.

Wink Saville

Apr 21, 2009, 4:17:11 PM
to lahi...@gmail.com, Protocol Buffers
Sounds good to me, glad to hear it's not required. One of the things
I'm looking to use protobufs for is to create a number of small
messages including "enums" and if something isn't being used
it would be nice not to have to carry it around.

-- Wink

Kenton Varda

Apr 21, 2009, 4:20:59 PM
to Wink Saville, prot...@googlegroups.com
The protocol buffers implementation was designed to run on beefy server machines where we regularly run binaries with hundreds of megs of code and don't worry about it that much.

Optimizing for code size does make the code a lot slower, true, but this is compared to an insanely fast base.  It's likely that the speed difference won't matter for client-side apps, e.g. on Android.

As for a protobuf-lite, I definitely think this is worthwhile and have been thinking about it for a while now.  I think what we should do is write a new superclass of protobuf::Message -- let's call it LiteMessage -- which contains only a minimal subset of its methods.  Basically, it would only support serialization and parsing.  It would have no descriptors and no reflection.

Then, we can add an option which causes the protocol compiler to generate only an implementation of LiteMessage -- with only parsing, serialization, and field accessors.  Everything else is omitted.  No descriptor, no reflection, no UnknownFields, etc.

Then we could factor out a libprotobuf-lite which basically only includes LiteMessage, part of wire_format, and the io subdirectory from the protobuf code.  And maybe extension_set -- not sure if we want this or not.
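
A very rough sketch of the shape this could take follows. It is an illustration of the proposal, not existing protobuf code: LiteMessage is the name suggested above, Test1Lite stands in for what the compiler might generate for a message with a single "required int32 v = 1;" field, and the parser here simply rejects anything it does not recognize.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical minimal interface: serialization and parsing only,
// with no descriptors and no reflection.
class LiteMessage {
 public:
  virtual ~LiteMessage() {}
  virtual bool SerializeToString(std::string* output) const = 0;
  virtual bool ParseFromString(const std::string& data) = 0;
};

// Hypothetical generated class: accessors plus the two methods above.
class Test1Lite : public LiteMessage {
 public:
  Test1Lite() : v_(0), has_v_(false) {}

  int32_t v() const { return v_; }
  bool has_v() const { return has_v_; }
  void set_v(int32_t value) { v_ = value; has_v_ = true; }

  bool SerializeToString(std::string* output) const override {
    assert(output != nullptr);
    if (!has_v_) return false;  // required field is missing
    output->clear();
    output->push_back(0x08);    // key: field 1, wire type 0 (varint)
    // int32 values are sign-extended to 64 bits on the wire.
    uint64_t bits = static_cast<uint64_t>(static_cast<int64_t>(v_));
    while (bits >= 0x80) {
      output->push_back(static_cast<char>((bits & 0x7F) | 0x80));
      bits >>= 7;
    }
    output->push_back(static_cast<char>(bits));
    return true;
  }

  bool ParseFromString(const std::string& data) override {
    has_v_ = false;
    if (data.empty() || data[0] != 0x08) return false;
    uint64_t bits = 0;
    int shift = 0;
    for (size_t i = 1; i < data.size(); i++) {
      if (shift >= 64) return false;  // varint longer than 10 bytes
      unsigned char byte = static_cast<unsigned char>(data[i]);
      bits |= static_cast<uint64_t>(byte & 0x7F) << shift;
      shift += 7;
      if ((byte & 0x80) == 0) {
        // This sketch rejects trailing bytes instead of skipping them.
        if (i + 1 != data.size()) return false;
        v_ = static_cast<int32_t>(bits);
        has_v_ = true;
        return true;
      }
    }
    return false;  // input ended in the middle of the varint
  }

 private:
  int32_t v_;
  bool has_v_;
};
```

Under this design the full Message class would derive from LiteMessage and add descriptors and reflection on top, so code written against LiteMessage would work with both kinds of generated class.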

Wink Saville

Apr 21, 2009, 4:51:42 PM
to Kenton Varda, prot...@googlegroups.com
In my particular case I'm looking at using protobufs as an internal communication scheme between processes, therefore we are both the client and server. Also, I need to sell this to my colleagues, and if it is too large or too slow it will likely be rejected.

Anyway, your plan looks right on target, what can I do to help?

One other question: is there any reason this can't be applied to Java at the same time? Android is very Java centric and my first target is for communication between a Java process and a C/C++ process.

-- Wink