In my experience, protocol buffers are more than fast enough to keep
up with disk speeds. That is, when reading uncached data
from the disk at 100 MB/s, protocol buffers can decode it at that
speed. Now, if your data is cached, and your application is not doing
much with the data, then I would expect protocol buffers to take 100%
of the CPU time, since the disk read doesn't take CPU, and your
application isn't doing much.
In other words: in a more "real" application, I would expect protocol
buffers to take only a very small portion of your application's time.
> Again I expected that decoding strings would be almost all the time
> (although decoding here still seems slower than in C in my
> experience). I am trying to figure out why mergeFrom method for this
> message is taking 6 sec (own time).
Decoding strings in Java is much slower because it actually decodes the
UTF-8-encoded data into UTF-16 strings in memory. The C++ version
just leaves the data in UTF-8. If this is a performance issue for your
application, you may wish to consider using the bytes protocol buffer
type rather than strings. This is less convenient, and means you can
"screw up" by accidentally sending invalid data, but is faster.
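To make the tradeoff concrete, here is a minimal JDK-only sketch of the
difference (no protobuf dependency; the comments describe what the
generated Java code does for `string` vs. `bytes` fields, which is the
assumption being illustrated):

```java
import java.nio.charset.StandardCharsets;

public class BytesVsString {
    public static void main(String[] args) {
        byte[] utf8 = "hello, protocol buffers".getBytes(StandardCharsets.UTF_8);

        // A "string" field: the Java runtime eagerly transcodes the
        // UTF-8 bytes into a UTF-16 String on parse -- one pass over
        // the data plus a fresh char[] allocation per field:
        String decoded = new String(utf8, StandardCharsets.UTF_8);

        // A "bytes" field: the raw UTF-8 stays as-is; you pay for
        // decoding only for the fields you actually look at, when
        // you look at them:
        byte[] raw = utf8;  // no transcoding work here

        System.out.println(decoded.length());  // 23
        System.out.println(raw.length);        // 23
    }
}
```

The catch, as noted above, is that with `bytes` nothing validates that
the payload is actually well-formed UTF-8.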
> There are around 15 SubMessages.
This is basically the problem right here. Each time you parse one of
these messages, it ends up allocating a new object for each of these
sub messages, and a new object for each string inside them. This is
pretty slow.
As I said above: I suspect that in a "real" application, this won't be
a problem. However, it would be faster to get rid of all the sub
messages (assuming that you don't actually need them for some other
reason).
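Back-of-the-envelope, here is roughly where the allocations go. The
field counts below are made-up (two string fields per sub-message) just
to illustrate the arithmetic:

```java
public class AllocationSketch {
    public static void main(String[] args) {
        int subMessages = 15;     // as in the message being discussed
        int stringsPerSub = 2;    // hypothetical field count

        // Nested schema: parsing one message allocates the top-level
        // object, one object per sub-message, and one String per
        // decoded string field.
        int nestedAllocs = 1 + subMessages + subMessages * stringsPerSub;

        // Flattened schema: the same string fields hoisted to the top
        // level parse into a single message object plus the Strings.
        int flatAllocs = 1 + subMessages * stringsPerSub;

        System.out.println(nestedAllocs); // 46
        System.out.println(flatAllocs);   // 31
    }
}
```

The Strings dominate either way, but flattening removes 15 short-lived
objects per parse, which adds up across millions of messages.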
Finally, I'll take a moment to promote my patch that improves Java
message *encoding* performance, by optimizing string encoding. It is
available at the following URL. Unfortunately, there is no similar
approach to improving the decoding performance.
http://codereview.appspot.com/949044/
Evan
--
Evan Jones
http://evanjones.ca/
This is true, provided that everyone uses the same encoding without
any bugs, and canonicalizes Unicode in the same way
(http://unicode.org/reports/tr15). In general, this is tricky, and I
would suggest using the built-in
string type. However, if you have a very specific need, and the
decoding is a bottleneck, this should work.
> Also do you think that if I
> encode/decode using utf-16 it would be faster? Clearly it is not as
> compressed.
I would think it should be, but I haven't done any performance
measurements, so I can't confirm 100% that this is the case.
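One thing that is easy to check without a benchmark is the size cost.
A quick JDK-only sketch (UTF-16BE is used so no byte-order mark is
written):

```java
import java.nio.charset.StandardCharsets;

public class Utf16Size {
    public static void main(String[] args) {
        String s = "protocol buffers";
        // For ASCII-heavy text, UTF-16 is about twice the size of
        // UTF-8, so any decoding speedup trades directly against
        // extra I/O, network, and cache pressure.
        System.out.println(s.getBytes(StandardCharsets.UTF_8).length);    // 16
        System.out.println(s.getBytes(StandardCharsets.UTF_16BE).length); // 32
    }
}
```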