Java: how to use protobuf to send messages quickly over sockets?

jta23

Oct 29, 2009, 4:45:42 PM
to Protocol Buffers
I'm trying to get a sense of whether my experience sounds reasonable or
whether I'm doing something very wrong. Any insight is appreciated!

I have a Serializable Java object containing:

4 integers
3 bytes
2 Strings
1 short
1 double

I have a vector of about 10M objects I am sending from a server to a
client. The ObjectOutputStream for the server is created like this:

Socket clientSocket;
...
outputStream = new ObjectOutputStream(clientSocket.getOutputStream());

and the ObjectInputStream is created in a similar fashion.


I send the Serializable objects like this:

outputStream.writeObject(msg);

and receive them like this:

MyMessage msg = (MyMessage)inputStream.readObject();

On my rather slow server I can receive and process about 12K messages
a second.


Now for the Protocol Buffers part:
I used protoc to create a new Java class using:
8 int32s
2 strings
1 double

I create objects like this:

MyProtoMessage.MyMessage msg =
MyProtoMessage.MyMessage.newBuilder()
...
.build();

and send them like this:

byte size = (byte)msg.getSerializedSize();
outputStream.writeByte(size);
outputStream.write(msg.toByteArray());

and finally they are read like this:

byte size = inputStream.readByte();
byte []bytes = new byte[size];
inputStream.readFully(bytes);
MyProtoMessage.MyMessage msg = MyProtoMessage.MyMessage.parseFrom(bytes);
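
A side note on the framing above: the length is cast to a single signed byte, so it only round-trips while every serialized message is at most 127 bytes. A minimal sketch of a wider prefix follows, using only java.io and a plain byte[] as a stand-in for msg.toByteArray(). The class and method names are hypothetical, not part of protobuf, and an in-memory buffer stands in for the socket.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical helper (not part of protobuf): frames opaque payloads with a 4-byte length.
final class IntLengthFraming {
    // Writes a 4-byte big-endian length, then the payload itself.
    static void writeFrame(DataOutputStream out, byte[] payload) throws IOException {
        out.writeInt(payload.length);
        out.write(payload);
    }

    // Reads back one frame produced by writeFrame.
    static byte[] readFrame(DataInputStream in) throws IOException {
        int size = in.readInt();
        byte[] payload = new byte[size];
        in.readFully(payload);
        return payload;
    }

    // Sends one payload through an in-memory buffer, standing in for the socket.
    static byte[] roundTrip(byte[] payload) {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            writeFrame(new DataOutputStream(buffer), payload);
            return readFrame(new DataInputStream(new ByteArrayInputStream(buffer.toByteArray())));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println((byte) 200);                      // prints -56: the cast wraps around
        System.out.println(roundTrip(new byte[200]).length); // prints 200
    }
}
```

With the one-byte prefix, a 200-byte message would make the reader see a size of -56 and fail when allocating the array. A fixed 4-byte prefix costs three extra bytes per message compared with a single byte.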

With this protobuf approach I can do about 9K messages a second. I also
tried writeDelimitedTo/parseDelimitedFrom and it was even slower.

Does this sound reasonable? I expected it to be faster than the
standard Java serialization approach.

Thanks!

Kenton Varda

Oct 29, 2009, 5:42:39 PM
to jta23, Protocol Buffers
It sounds plausible. There's no fundamental reason why protocol buffers should be faster than Java serialization, at least for simple objects like yours composed of a set of primitive values. Since Java serialization is implemented by the VM, it can probably optimize better than protobufs can. However, the protobuf wire format is considerably simpler, portable to many languages other than Java, and handles extensibility better. Also, I suspect you'll find that Java serialization gets slower with more complex object trees. And finally, Java serialization involves sending a large chunk of metadata over the wire, basically describing each class; protocol buffers avoid this by assuming that the receiver already knows the type information. But when sending a giant homogeneous array of simple objects, the metadata overhead ends up small.
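
The point about metadata being amortized over a homogeneous stream can be checked with the JDK alone. The sketch below is only an illustration: Sample is a hypothetical stand-in for the poster's class, with made-up fields, and the sizes are measured against an in-memory buffer instead of a socket.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

final class SerializationOverhead {
    // Hypothetical stand-in for the poster's message class; the fields are illustrative.
    static final class Sample implements Serializable {
        private static final long serialVersionUID = 1L;
        int a, b, c, d;
        double value;
    }

    // Returns how many bytes one ObjectOutputStream emits for `count` Sample objects.
    static int serializedSize(int count) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
            for (int i = 0; i < count; i++) {
                out.writeObject(new Sample());
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.size();
    }

    public static void main(String[] args) {
        int one = serializedSize(1);
        int two = serializedSize(2);
        // The first object carries the stream header and the class descriptor;
        // later objects of the same class only refer back to that descriptor.
        System.out.println("first object:           " + one + " bytes");
        System.out.println("each additional object: " + (two - one) + " bytes");
    }
}
```

The first object is noticeably larger than each one after it, which is why the class description matters little across millions of objects sent on the same stream.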

jta23

Oct 29, 2009, 6:36:04 PM
to Protocol Buffers
Thanks for the info. I thought the Java serialization metadata would
have been large compared to the relevant data, but I guess not!

Evan Jones

Oct 30, 2009, 11:48:14 AM
to jta23, Protocol Buffers
On Oct 29, 2009, at 16:45, jta23 wrote:
> and send them like this:
>
> byte size = (byte)msg.getSerializedSize();
> outputStream.writeByte(size);
> outputStream.write(msg.toByteArray());


You should do the following instead:

1. Create a CodedOutputStream wrapping your OutputStream.
2. Use CodedOutputStream.writeRawVarint32 to write the size.
3. Use msg.writeTo() to write the message. It will look something like:

CodedOutputStream out = CodedOutputStream.newInstance(outputStream);

for ( ... ) {
    // Varint length prefix, then the message body, straight into out's buffer.
    out.writeRawVarint32(msg.getSerializedSize());
    msg.writeTo(out);
}
// Flush once, after all messages have been written.
out.flush();

This should avoid a whole ton of extra allocations/deallocations that
are being done by your current approach. If you try this, please let
me know what the performance numbers look like.
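
For readers without protobuf on the classpath, the same framing pattern can be sketched with the JDK alone: a varint size prefix, the payload written straight into one buffered stream, and a single flush at the end. This is a hand-rolled stand-in, not the library's implementation. All names are hypothetical, byte arrays stand in for serialized messages, and an in-memory buffer stands in for the socket. With protobuf, CodedOutputStream and msg.writeTo() play these roles.

```java
import java.io.BufferedOutputStream;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hand-rolled stand-in for CodedOutputStream-style framing; all names here are hypothetical.
final class VarintFraming {
    // Base-128 varint: 7 payload bits per byte, high bit set on every byte except the last.
    static void writeVarint32(OutputStream out, int value) throws IOException {
        while ((value & ~0x7F) != 0) {
            out.write((value & 0x7F) | 0x80);
            value >>>= 7;
        }
        out.write(value);
    }

    // Returns the decoded length (assumed non-negative), or -1 on a clean end of stream.
    static int readVarint32(InputStream in) throws IOException {
        int result = 0;
        for (int shift = 0; shift < 35; shift += 7) {
            int b = in.read();
            if (b < 0) {
                if (shift == 0) {
                    return -1;
                }
                throw new EOFException("stream ended inside a varint");
            }
            result |= (b & 0x7F) << shift;
            if ((b & 0x80) == 0) {
                return result;
            }
        }
        throw new IOException("varint longer than 5 bytes");
    }

    // Frames every payload into one buffered stream and flushes once at the end.
    static byte[] writeAll(List<byte[]> payloads) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream(); // stands in for the socket
        try {
            OutputStream out = new BufferedOutputStream(sink);
            for (byte[] payload : payloads) {
                writeVarint32(out, payload.length);
                out.write(payload);
            }
            out.flush();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sink.toByteArray();
    }

    // Reads frames until the stream ends.
    static List<byte[]> readAll(byte[] framed) {
        List<byte[]> payloads = new ArrayList<>();
        try {
            DataInputStream in = new DataInputStream(new ByteArrayInputStream(framed));
            int size;
            while ((size = readVarint32(in)) >= 0) {
                byte[] payload = new byte[size];
                in.readFully(payload);
                payloads.add(payload);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return payloads;
    }

    public static void main(String[] args) {
        List<byte[]> sent = Arrays.asList(new byte[5], new byte[300]);
        System.out.println(readAll(writeAll(sent)).size()); // prints 2
    }
}
```

Sizes below 128 take one prefix byte and sizes up to 16,383 take two, so small messages pay the same framing cost as a one-byte prefix without its 127-byte ceiling.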

Evan

--
Evan Jones
http://evanjones.ca/

jta23

Oct 30, 2009, 10:24:35 PM
to Protocol Buffers
I'm a bit embarrassed :)

The protobuf version of my code uses about 950MB of memory (the Java
Serializable version is only using around 650MB) and I had the java
-Xmx flag set too low; in reality protobuf is extremely fast compared
to Java Serializable:

Java Serializable:
12,000 msgs/sec

Protocol Buffers (as described in my first post):
70,000 msgs/sec

Protocol Buffers (CodedOutputStream, as described in Evan's post):
73,000 msgs/sec

Protocol Buffers (CodedOutputStream, but no flush() after each write):
76,600 msgs/sec


Thanks very much for the help, I'm very happy with the performance!

Jonathan

Kenton Varda

Oct 30, 2009, 11:01:16 PM
to jta23, Protocol Buffers
Ah.  So Java Serialization has no good reason to be slower...  but apparently it is!  Hah!  :)

Evan Jones

Oct 31, 2009, 10:52:56 AM
to jta23, Protocol Buffers
On Oct 30, 2009, at 22:24, jta23 wrote:
> The protobuf version of my code uses about 950MB of memory (the Java
> Serializable version is only using around 650MB) and I had the java
> -Xmx flag set too low; in reality protobuf is extremely fast compared
> to Java Serializable:

Hm. It is a little interesting that it would use so much more memory
than using Java serializable ...

> Protocol Buffers (CodedOutputStream, but no flush() after each write):
> 76,600 msgs/sec

This is what I was attempting to describe (one flush() after writing
all messages). This matches my expectations that this should be a
little faster, which is nice.
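
One way to see why a single flush helps is to count how often the underlying stream is actually written to. The sketch below uses a plain BufferedOutputStream and a counting sink as rough stand-ins for CodedOutputStream and the socket. The names are hypothetical, and the count is only a proxy for real socket writes, not a throughput measurement.

```java
import java.io.BufferedOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;

final class FlushCost {
    // Counts write calls that reach the underlying stream (a rough proxy for socket writes).
    static final class CountingSink extends OutputStream {
        int writeCalls;

        @Override
        public void write(int b) {
            writeCalls++;
        }

        @Override
        public void write(byte[] b, int off, int len) {
            writeCalls++;
        }
    }

    // Writes `messages` payloads of `messageSize` bytes through an 8 KB buffer,
    // flushing either after every message or only once at the end.
    static int underlyingWrites(int messages, int messageSize, boolean flushEach) {
        CountingSink sink = new CountingSink();
        byte[] payload = new byte[messageSize];
        try (OutputStream out = new BufferedOutputStream(sink, 8192)) {
            for (int i = 0; i < messages; i++) {
                out.write(payload);
                if (flushEach) {
                    out.flush();
                }
            }
            out.flush();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sink.writeCalls;
    }

    public static void main(String[] args) {
        System.out.println("flush per message: " + underlyingWrites(1000, 50, true));
        System.out.println("single flush:      " + underlyingWrites(1000, 50, false));
    }
}
```

Flushing per message pushes every small message to the sink on its own, while a single flush lets the buffer hand over a few large chunks instead.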

Kenton Varda

Oct 31, 2009, 5:25:45 PM
to Evan Jones, jta23, Protocol Buffers
On Sat, Oct 31, 2009 at 7:52 AM, Evan Jones <ev...@mit.edu> wrote:
> Hm. It is a little interesting that it would use so much more memory
> than using Java serializable ...

Protocol buffers need to keep track of which fields are set.  Currently this is done using a bool for each field.  Perhaps it would be more efficient to use a bitfield.  We do this in C++ but I haven't had a chance to do the same optimization in Java.
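
To illustrate the two bookkeeping styles, here is a rough sketch of field-presence tracking with one boolean per field versus one shared int of presence bits. It is not the generated protobuf code. Every class, field, and method name is hypothetical, and the actual memory difference depends on the JVM's object layout (a boolean field typically takes a byte on common JVMs).

```java
final class HasBitsSketch {
    // Style 1: one boolean per field records whether that field was set.
    static final class WithBooleans {
        boolean hasId;
        boolean hasName;
        boolean hasScore;
        int id;
        String name;
        double score;
    }

    // Style 2: one int carries a presence bit for each of up to 32 fields.
    static final class WithBitField {
        private static final int ID_BIT = 1 << 0;
        private static final int NAME_BIT = 1 << 1;

        private int presenceBits;
        private int id;
        private String name = "";

        void setId(int value) {
            id = value;
            presenceBits |= ID_BIT;   // mark as set, even if the value is 0
        }

        void setName(String value) {
            name = value;
            presenceBits |= NAME_BIT;
        }

        void clearId() {
            id = 0;
            presenceBits &= ~ID_BIT;  // back to "not set"
        }

        boolean hasId() {
            return (presenceBits & ID_BIT) != 0;
        }

        boolean hasName() {
            return (presenceBits & NAME_BIT) != 0;
        }
    }

    public static void main(String[] args) {
        WithBitField message = new WithBitField();
        message.setId(0);
        System.out.println(message.hasId());   // prints true
        System.out.println(message.hasName()); // prints false
    }
}
```

With eleven fields, as in the message discussed here, the second style needs one int per object where the first needs eleven booleans.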