Question about size/speed of protobufs with different formats

George Wong

Aug 5, 2013, 1:32:03 PM

to prot...@googlegroups.com

Hello,

I was wondering which of the following ways to define a .proto would be faster / more space efficient -- or if there'd be no difference at all...

Option 1:

message Packet {
  ...
  repeated Process process = 6;
  ...
  message Process {
    optional uint32 pid = 1;
    ...
    optional string execname = 14;
  }
}

or Option 2:

message Packet {
  ...
  optional uint32 pid1 = 6;
  ...
  optional string execname1 = 20;
  optional uint32 pid2 = 21;
  ...
  optional string execname2 = 35;
}


I'm essentially wondering what effect "loop unwinding" has in a protobuf (and yes, I know how many of the Process messages I have). Because the field number is encoded with every value, I'm wondering whether the extra tag byte (once field numbers go past 15, e.g. up to 255) is much of an issue. I'm also not sure how reading into a "repeated" field actually works, so I'm wondering about speed as well (creation, setting, encoding, decoding, and reading).
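For reference, a minimal Python sketch of the tag-size rule behind the "extra byte" question, assuming the standard wire encoding in which a tag is the varint of (field_number << 3) | wire_type and a varint carries 7 payload bits per byte. The names `varint_size` and `tag_size` are illustrative only, not part of any protobuf library.

```python
def varint_size(value: int) -> int:
    """Number of bytes needed to varint-encode a non-negative integer."""
    size = 1
    while value >= 0x80:  # more than 7 significant bits left
        value >>= 7
        size += 1
    return size


def tag_size(field_number: int, wire_type: int = 0) -> int:
    """Bytes used by a field's tag (wire_type 0 = varint, 2 = length-delimited)."""
    return varint_size((field_number << 3) | wire_type)


for n in (1, 15, 16, 255, 2047, 2048):
    print(n, tag_size(n))
```

Field numbers 1 to 15 fit in a one-byte tag and 16 through 2047 take two bytes, so moving a field from 15 to 255 costs exactly one extra byte each time that field appears on the wire.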


Thanks!

Ilia Mirkin

Aug 5, 2013, 1:45:24 PM

to George Wong, prot...@googlegroups.com

Well, for regular values it goes <tag> <value>; for a subproto it
goes <tag> <length> <submessage> (which in turn has <tag> <value>
pairs in it). So it depends on the number of fields inside of the
message and how many bytes it is in total. But assuming non-edge-case
conditions, you're probably better off using a submessage and smaller
tag ids (i.e. < 16) than unwinding the repeated (or, indeed, even if
it were a single message). Hm, this also raises an interesting aside,
which is that perhaps it is more efficient to only ever use tag ids <
16 and, once you go over that, stick the rest of the fields into a
submessage. It'd make for very confusing code, though.
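A minimal Python sketch of this size argument, hand-encoding both layouts. It assumes a simplified Process whose fields are all small uint32 values (a string field adds the same tag cost plus a length byte in both layouts) and numbers the unrolled fields consecutively from 6, as in Option 2; all helper names are hypothetical.

```python
VARINT, LEN = 0, 2  # protobuf wire types used here


def encode_varint(value: int) -> bytes:
    """Base-128 varint: 7 bits per byte, high bit set on all but the last byte."""
    out = bytearray()
    while value >= 0x80:
        out.append((value & 0x7F) | 0x80)
        value >>= 7
    out.append(value)
    return bytes(out)


def uint32_field(number: int, value: int) -> bytes:
    # <tag> <value>
    return encode_varint((number << 3) | VARINT) + encode_varint(value)


def submessage_field(number: int, body: bytes) -> bytes:
    # <tag> <length> <submessage>
    return encode_varint((number << 3) | LEN) + encode_varint(len(body)) + body


def option1(processes, outer_field=6):
    """repeated Process process = 6; inner fields numbered 1..k."""
    out = bytearray()
    for values in processes:
        body = b"".join(uint32_field(i + 1, v) for i, v in enumerate(values))
        out += submessage_field(outer_field, body)
    return bytes(out)


def option2(processes, first_field=6):
    """Unrolled: every field of every Process gets its own number in Packet."""
    out = bytearray()
    number = first_field
    for values in processes:
        for v in values:
            out += uint32_field(number, v)
            number += 1
    return bytes(out)


# Two processes with 15 small fields each, mirroring the ranges 6-20 and
# 21-35 from Option 2 (the field values are made up).
procs = [[1] * 15, [2] * 15]
print(len(option1(procs)), len(option2(procs)))  # 64 80
```

With fifteen fields per process the submessage wins (64 vs 80 bytes): it costs two bytes per process, while unrolling costs an extra tag byte for every field numbered 16 or above. With a single field per process (`procs = [[1], [2]]`) the result flips to 8 vs 4 bytes, which is the "it depends on the number of fields" caveat.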

In terms of speed, you end up creating more objects and recursing
when dealing with submessages, so "loop unrolling" could be effective
there. (By very, very small amounts; usually I/O is the limiting
factor.) As always with such things, benchmark it :)
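To make the "more objects/recursion" point concrete, here is a minimal, simplified Python parser sketch; it is not how the official protobuf runtime is implemented. It shows that a repeated field is just the same tag appearing several times, and that each submessage occurrence triggers a nested parse that builds a new object. Only wire types 0 and 2 are handled, and the `nested` parameter (which field numbers to treat as submessages) is an assumption of the sketch standing in for the schema.

```python
def decode_varint(buf: bytes, pos: int):
    """Decode a base-128 varint starting at pos; return (value, new_pos)."""
    result = 0
    shift = 0
    while True:
        b = buf[pos]
        pos += 1
        result |= (b & 0x7F) << shift
        if not b & 0x80:
            return result, pos
        shift += 7


def parse(buf: bytes, nested=frozenset()):
    """Parse a message into {field_number: [values, ...]}."""
    fields = {}
    pos = 0
    while pos < len(buf):
        key, pos = decode_varint(buf, pos)
        number, wire_type = key >> 3, key & 0x7
        if wire_type == 0:
            value, pos = decode_varint(buf, pos)
        elif wire_type == 2:
            length, pos = decode_varint(buf, pos)
            chunk = buf[pos:pos + length]
            pos += length
            # A submessage means another object and a recursive parse call.
            value = parse(chunk) if number in nested else chunk
        else:
            raise ValueError(f"wire type {wire_type} not handled in this sketch")
        # Repeated fields: each occurrence of the tag is appended to a list.
        fields.setdefault(number, []).append(value)
    return fields


# repeated Process process = 6, two elements with pid (field 1) = 42 and 43
nested_bytes = bytes([0x32, 0x02, 0x08, 0x2A, 0x32, 0x02, 0x08, 0x2B])
# the unrolled equivalent: pid1 = 6 and pid2 = 7 as plain varint fields
flat_bytes = bytes([0x30, 0x2A, 0x38, 0x2B])

print(parse(nested_bytes, nested={6}))  # {6: [{1: [42]}, {1: [43]}]}
print(parse(flat_bytes))                # {6: [42], 7: [43]}
```

The nested form builds one extra dictionary per process and one extra `parse` call per submessage, which is the overhead the unrolled layout avoids; whether that matters next to I/O is exactly what a benchmark would tell you.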

-ilia


Christopher Head

Aug 6, 2013, 5:33:40 PM

to prot...@googlegroups.com, George Wong


Wouldn’t it be even smaller to do this

message Packet {
  repeated uint32 pids = 1 [packed=true];
  repeated string execnames = 2;
}

working under the assumption that the two repeated fields will be the
same length? You will burn two bytes of overhead for the tag and length
of the pids field, plus two bytes of overhead per process for the
execnames field, for a total of 2N+2. This is less than the 5N for
option 1, or the roughly 3 bytes per process for option 2 while the
field numbers stay below 16 (about seven processes, since each one
uses two numbers) plus 5 bytes per process after that, for most
values of N.
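A minimal Python sketch to check this overhead arithmetic, assuming Process contains only pid (field 1) and execname (field 2), and that every length (the packed payload, each string, each submessage) is under 128 bytes so it fits in one byte. "Overhead" here means encoded size minus the raw pid varints and string bytes; the function names are hypothetical.

```python
def encode_varint(value: int) -> bytes:
    """Base-128 varint: 7 bits per byte, high bit set on all but the last byte."""
    out = bytearray()
    while value >= 0x80:
        out.append((value & 0x7F) | 0x80)
        value >>= 7
    out.append(value)
    return bytes(out)


def len_field(number: int, data: bytes) -> bytes:
    """Length-delimited field (wire type 2): <tag> <length> <data>."""
    return encode_varint((number << 3) | 2) + encode_varint(len(data)) + data


def packed_layout(pids, names) -> bytes:
    # repeated uint32 pids = 1 [packed=true]: one tag and one length for all pids
    out = len_field(1, b"".join(encode_varint(p) for p in pids))
    # repeated string execnames = 2: one tag and one length per string
    for name in names:
        out += len_field(2, name.encode("utf-8"))
    return out


def submessage_layout(pids, names) -> bytes:
    # repeated Process process = 6, with Process { pid = 1; execname = 2 } (assumed)
    out = b""
    for pid, name in zip(pids, names):
        body = encode_varint(1 << 3) + encode_varint(pid) + len_field(2, name.encode("utf-8"))
        out += len_field(6, body)
    return out


def overhead(encoded: bytes, pids, names) -> int:
    raw = sum(len(encode_varint(p)) for p in pids) + sum(len(n.encode("utf-8")) for n in names)
    return len(encoded) - raw


N = 10
pids = [1000 + i for i in range(N)]
names = [f"proc{i}" for i in range(N)]
print(overhead(packed_layout(pids, names), pids, names))      # 22 = 2N + 2
print(overhead(submessage_layout(pids, names), pids, names))  # 50 = 5N
```

Once the packed pid payload reaches 128 bytes its length prefix grows to two bytes, so the 2N+2 figure holds only for modest N; the per-process comparison is unaffected.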

Chris

