Java CodedOutputStream performance: buffer size and OutputStream


Evan Jones

Oct 22, 2009, 9:41:23 AM
to Protocol Buffers
I've been wasting my time doing various microbenchmarks when I should
be doing "real" work. This message describes some "failed" attempted
optimizations, ideally so others don't waste their time.

I was looking at the Java CodedOutputStream implementation, and was
interested that it uses an internal byte[] array buffer, since this is
what BufferedOutputStream does. Additionally, the JVM internally uses
8192 as the "magic" buffer size inside BufferedOutputStream, and the
native code that actually writes data to/from files and sockets. I
tried two tweaks that are both worse than the existing code:


a) Change the default buffer size from 4096 to 8192 bytes.
b) Remove the internal buffer and rely on OutputStream.


System: Intel Xeon E5540 (Core i7/Nehalem) @ 2.53 GHz, Linux 2.6.29
Java: Both Sun 1.6.0_16-b01 and 1.7.0-ea-b74; 64-bit; always using -server

Benchmark: Using ProtoBench, with my own extensions to write to
/dev/null using FileOutputStream, and BufferedOutputStream(FileOutputStream)
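The /dev/null setup can be sketched roughly like this (a minimal stand-in for the actual ProtoBench extension, timing a fixed payload rather than a serialized message; class and method names are illustrative):

```java
import java.io.BufferedOutputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;

// Minimal sketch of the /dev/null comparison, not the real ProtoBench
// extension: time the same payload written through a raw FileOutputStream
// and through a BufferedOutputStream wrapper.
public class DevNullSketch {
    static long timeWrites(OutputStream out, byte[] payload, int reps)
            throws IOException {
        long start = System.nanoTime();
        for (int i = 0; i < reps; i++) {
            out.write(payload);  // for the raw stream, each call may cross into native code
        }
        out.flush();
        return System.nanoTime() - start;
    }

    public static void main(String[] args) throws IOException {
        byte[] payload = new byte[228];  // about the size of the small benchmark message
        try (OutputStream raw = new FileOutputStream("/dev/null");
             OutputStream buffered =
                 new BufferedOutputStream(new FileOutputStream("/dev/null"))) {
            System.out.println("raw:      " + timeWrites(raw, payload, 100000) + " ns");
            System.out.println("buffered: " + timeWrites(buffered, payload, 100000) + " ns");
        }
    }
}
```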


Summary of results:

a) Bigger buffer size: small messages are slightly slower, large
messages are slightly faster. The difference is ~1-2% at most, so this
could just be "noise." I also tried a 2048 byte buffer, and it also
makes approximately no difference.

b) Using OutputStream instead of the internal buffer: for the small
message, serializing to byte[] is slower, but serializing to /dev/null
is much faster (~ +30%). However, for the large message, it makes
everything a fair bit slower (at least 10% worse).

bonus) jdk7 has the same results, except it is generally faster than
jdk6.


Conclusions:

* None of these optimizations is a clear win.

* 8192 is not always the right buffer size for Java (although it
should be the maximum for anything that eventually calls
OutputStream.write()). I'm guessing that making the buffer bigger
hurts performance because of the extra allocation/deallocation cost
for all the temporary CodedOutputStreams.

* Hotspot doesn't magically optimize as much as you might like: using
BufferedOutputStream does the same thing as CodedOutputStream's
internal byte[] buffer, but hotspot can't optimize the code as well.
I'm guessing this is because the dynamic dispatch on OutputStream
prevents aggressive inlining?

* Results are somewhat variable, and are of course data dependent.
More benchmarks should be done before making a performance-related
code change.
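For reference, the buffering pattern under discussion looks roughly like this (a simplified, illustrative stand-in for CodedOutputStream's internal buffer, not the actual implementation): the hot path is a plain array store into a field the JIT can see through, whereas BufferedOutputStream forces every write through a virtual call on OutputStream.

```java
import java.io.IOException;
import java.io.OutputStream;

// Simplified sketch of the internal-buffer pattern: bytes go into a
// local byte[] via plain array stores, and only full buffers hit the
// wrapped OutputStream. Going through BufferedOutputStream instead
// means a virtual OutputStream.write() call per write, which may block
// the inlining Hotspot would otherwise do.
class MiniCodedOutput {
    private final byte[] buffer;
    private int position;
    private final OutputStream output;

    MiniCodedOutput(OutputStream output, int bufferSize) {
        this.output = output;
        this.buffer = new byte[bufferSize];
    }

    void writeRawByte(byte value) throws IOException {
        if (position == buffer.length) {
            flushBuffer();
        }
        buffer[position++] = value;  // hot path: no dynamic dispatch
    }

    void flush() throws IOException {
        flushBuffer();
        output.flush();
    }

    private void flushBuffer() throws IOException {
        output.write(buffer, 0, position);  // only here do we cross into OutputStream
        position = 0;
    }
}
```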


Evan

--
Evan Jones
http://evanjones.ca/

Kenton Varda

Oct 22, 2009, 4:36:19 PM
to Evan Jones, Protocol Buffers
Hey, it's great that you're trying things.  I think there's room for improvement in the Java implementation (as opposed to C++), and it tends to take some trial-and-error.

You note that small messages seem faster with smaller buffer sizes, but larger messages are slower.  I am guessing that by "small messages" you mean ones which are significantly smaller than the buffer size, and "large messages" means larger than the buffer size.  One thing you might try:  if the message is smaller than 4096 (or whatever the buffer size constant is), then allocate a buffer exactly as big as the message to avoid waste.  You can call getSerializedSize() to find out the message size ahead of time.  Note that calling this doesn't actually waste any time since the result is cached, and it would have to be called during serialization anyway.

Once you do that, then increasing the buffer size constant (which is now the *maximum* buffer size) might make more sense.
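The sizing rule being suggested is simple enough to state in code (the constant and method names here are illustrative, not protobuf's):

```java
// Illustrative sketch of the suggested sizing rule: an exact-fit buffer
// for messages below the default size, the default cap otherwise. The
// size itself would come from getSerializedSize(), which protobuf
// caches, so asking for it ahead of time costs nothing extra.
public class BufferSizing {
    static final int DEFAULT_BUFFER_SIZE = 4096;

    static int chooseBufferSize(int serializedSize) {
        return Math.min(serializedSize, DEFAULT_BUFFER_SIZE);
    }

    public static void main(String[] args) {
        System.out.println(chooseBufferSize(228));    // small message: exact fit, 228
        System.out.println(chooseBufferSize(82600));  // large message: capped at 4096
    }
}
```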

Evan Jones

Oct 22, 2009, 8:55:41 PM
to Kenton Varda, Protocol Buffers
On Oct 22, 2009, at 16:36, Kenton Varda wrote:
> You note that small messages seem faster with smaller buffer sizes,
> but larger messages are slower. I am guessing that by "small
> messages" you mean ones which are significantly smaller than the
> buffer size, and "large messages" means larger than the buffer size.

Right. I just used the messages and benchmark framework in SVN:

http://code.google.com/p/protobuf/source/browse/#svn/trunk/benchmarks

The small message is 228 bytes; the large message is 82.6 kB.


> One thing you might try: if the message is smaller than 4096 (or
> whatever the buffer size constant is), then allocate a buffer
> exactly as big as the message to avoid waste. You can call
> getSerializedSize() to find out the message size ahead of time.
> Note that calling this doesn't actually waste any time since the
> result is cached, and it would have to be called during
> serialization anyway.

That is a good idea. This would be a tiny patch to
AbstractMessageLite.writeTo(OutputStream). It already effectively does
this for toByteString() and toByteArray(), but doing it for
writeTo(OutputStream) would probably be a win: either it uses the
JDK's preferred buffer size for big messages, which would halve the
number of calls to .write(), or it uses an exact-sized buffer for
smaller messages, which (should? might?) result in reduced garbage
collection. I'll put testing this on my TODO list. It is a trivial
code change, but benchmarking it carefully takes time.
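A sketch of what that patch might look like, written against a stand-in interface so the example compiles without protobuf (in the real library this would live in AbstractMessageLite, and the buffer size would go to CodedOutputStream.newInstance(output, bufferSize) rather than a BufferedOutputStream):

```java
import java.io.BufferedOutputStream;
import java.io.IOException;
import java.io.OutputStream;

// Hedged sketch of the proposed writeTo(OutputStream) change. Sizeable
// is a stand-in for MessageLite so the example is self-contained; the
// real patch would size CodedOutputStream's internal buffer instead of
// wrapping the output in a BufferedOutputStream.
public class WriteToSketch {
    static final int DEFAULT_BUFFER_SIZE = 4096;

    interface Sizeable {
        int getSerializedSize();  // cached in real protobuf
        void writeTo(OutputStream out) throws IOException;
    }

    static void writeTo(Sizeable message, OutputStream output) throws IOException {
        // Exact-fit buffer for small messages, default cap for large ones
        // (and at least 1 byte, since a zero-length buffer is illegal).
        int bufferSize = Math.max(1,
            Math.min(message.getSerializedSize(), DEFAULT_BUFFER_SIZE));
        BufferedOutputStream buffered = new BufferedOutputStream(output, bufferSize);
        message.writeTo(buffered);
        buffered.flush();
    }
}
```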

Basically this all started because I was curious how many times my
data was being copied between a protocol message and the wire. I
should eventually try some of these tweaks on a "larger" application
that is doing network I/O, so I'll eventually do some digging in that
context.
