Re: [protobuf] Optimizing protoc for Java

Christopher Smith

Nov 28, 2012, 11:17:38 AM
to Ryan Fogarty, prot...@googlegroups.com
Interested.

--Chris

On Nov 19, 2012, at 4:07 AM, Ryan Fogarty <ryan.f...@gmail.com> wrote:

I have a repeated primitive field array optimization for the protoc-generated Java source, but before I discuss it I would like to gauge interest (and get access to the Protocol Buffer Group).

Thanks,
Ryan

Ryan Fogarty

Nov 28, 2012, 3:47:08 PM
to prot...@googlegroups.com, Ryan Fogarty
To provide a little background: we are using Protobuf to push some fairly high-throughput data and thus may be abusing its original design intent. We do try to keep messages below the multi-megabyte range by segmenting the data into packets. We are pushing this data from a C++ producer to a Java consumer. The C++ side has no problem handling and serializing the data. On the Java side, however, we found that accessing the data through the generated accessors needlessly creates autoboxed objects, which eventually slam our GC.

To fix the problem, I started out by generating Java stubs and then modifying the problematic fields. However, I found that strategy difficult to maintain with each new Protobuf version. It was also a bit precarious, since this is a multi-organizational system and the approach required too much expertise (and knowledge transfer). So I finally got around to hacking the protoc compiler itself to make a more efficient Java implementation.

I have modified the 2.4.1 version of the protoc compiler to use a Java <primitivetype>[ ] as the storage for repeated primitive fields, as an alternative to the less efficient java.util.List<PrimitiveBoxedType>. Additionally, it looks trivial to port this to the Subversion trunk, even though trunk has a couple of added features over 2.4.1. <aside>I must compliment the authors of the protoc compiler: it was pretty straightforward to make this change.</aside>
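To illustrate the storage difference, here is a minimal sketch. It is not protoc output, and the class and method names are hypothetical. Reading from a `List<Integer>` goes through an `Integer` object on every access, while an `int[]` holds the values inline.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not taken from protoc-generated code.
final class StorageComparison {
    // Boxed storage: every element is a reference to an Integer object.
    static long sumBoxed(List<Integer> values) {
        long sum = 0;
        for (int i = 0; i < values.size(); i++) {
            sum += values.get(i); // unboxes an Integer on every read
        }
        return sum;
    }

    // Primitive storage: values live inline in the array, no per-element objects.
    static long sumPrimitive(int[] values) {
        long sum = 0;
        for (int i = 0; i < values.length; i++) {
            sum += values[i];
        }
        return sum;
    }

    // Filling a boxed list autoboxes each element via Integer.valueOf(i).
    static List<Integer> boxedRange(int n) {
        List<Integer> list = new ArrayList<Integer>(n);
        for (int i = 0; i < n; i++) {
            list.add(i);
        }
        return list;
    }

    // Filling a primitive array creates a single array object.
    static int[] primitiveRange(int n) {
        int[] array = new int[n];
        for (int i = 0; i < n; i++) {
            array[i] = i;
        }
        return array;
    }
}
```

Both sums give the same result; the difference is in the number of objects the garbage collector has to track.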

The changes are entirely located in the file $PROTO_HOME/src/google/protobuf/compiler/java/java_primitive_field.cc

The change should improve performance for the (repeated) primitive accessor without changing its API:

<primitivetype> get$FIELDNAME$(int index)

but I have also extended the interface with a get..Array() function:

<primitivetype>[ ] get$FIELDNAME$Array();
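As a rough picture of the two accessors, here is a hand-written stand-in for a message with a field such as `repeated int32 samples`. The class name, field name, and constructor are hypothetical and are not what the patch emits; the sketch also assumes the array accessor returns the backing array without copying.

```java
// Hypothetical stand-in for a generated message; not actual protoc output.
final class SamplesMessage {
    private final int[] samples;

    SamplesMessage(int[] samples) {
        this.samples = samples.clone(); // defensive copy on construction
    }

    int getSamplesCount() {
        return samples.length;
    }

    // Same signature as the existing indexed accessor, but reads a
    // primitive directly instead of unboxing an Integer.
    int getSamples(int index) {
        return samples[index];
    }

    // The added accessor. Assumption: it hands out the backing array itself,
    // so a caller that writes to the result changes the message.
    int[] getSamplesArray() {
        return samples;
    }
}
```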

Unfortunately, because Java lacks const, this small addition punches a hole in the immutable pattern used in the Java API, since callers can modify the returned array. On the other hand, and perhaps significantly for some, it allows the use of System.arraycopy. I suppose I could have offered my own arraycopy-style method in lieu of the get...Array() function (I only just thought of that), i.e. copy$FIELDNAME$Array(<primitivetype>[ ] target, int offset, int length). Anyway, I would welcome such suggestions.
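A copy-style method along those lines might look like the sketch below. The class and field names are hypothetical, and the sketch assumes that `offset` is the start position in `target` and that copying always begins at element 0 of the field; the original signature does not pin down either point.

```java
// Hypothetical stand-in for a generated message; not actual protoc output.
final class CopyingSamplesMessage {
    private final int[] samples;

    CopyingSamplesMessage(int[] samples) {
        this.samples = samples.clone(); // defensive copy on construction
    }

    int getSamplesCount() {
        return samples.length;
    }

    // Copies the first `length` elements of the field into `target`,
    // starting at target[offset]. The backing array never leaves the
    // message, so the message stays immutable.
    // System.arraycopy throws IndexOutOfBoundsException on a bad range.
    void copySamplesArray(int[] target, int offset, int length) {
        System.arraycopy(samples, 0, target, offset, length);
    }
}
```

The caller owns `target`, so one buffer can be reused across many messages without allocating per message.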

I have uploaded a rather raw patch for these changes to my Google Drive (https://docs.google.com/open?id=0B6kQ2S7zDGNaWl9OY041MGpwcHc); it still keeps all of the original macro lines, commented out, for reference. I hope others might find a use for this higher-performance version of the Java interface. Ideally, I would love to see this integrated into the official version, but there might be some edge cases that I am not considering. There are also other design issues to consider, such as the break in immutability mentioned above and the best strategy for growing a primitive array on the Builder side (I should probably just mimic the proportional growth algorithm of the ArrayList that it would use now).
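For the Builder-side growth question, one option is a proportional scheme of roughly 1.5x per reallocation, similar in spirit to java.util.ArrayList. The sketch below is an assumption about how that could look, not code from the patch; the class name, initial capacity, and exact growth formula are all hypothetical.

```java
import java.util.Arrays;

// Hypothetical builder-side storage for a repeated int field.
final class IntArrayBuilder {
    private int[] values = new int[10]; // initial capacity is an arbitrary choice
    private int size = 0;

    void add(int value) {
        if (size == values.length) {
            // Grow by about 1.5x; the +1 guarantees progress for tiny arrays.
            // Integer overflow for huge arrays is ignored in this sketch.
            int newCapacity = values.length + (values.length >> 1) + 1;
            values = Arrays.copyOf(values, newCapacity);
        }
        values[size++] = value;
    }

    int size() {
        return size;
    }

    int capacity() {
        return values.length;
    }

    // Returns a right-sized copy, so later add() calls cannot affect it.
    int[] build() {
        return Arrays.copyOf(values, size);
    }
}
```

Proportional growth keeps the amortized cost of add() constant, at the price of some unused capacity until build() trims it.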

I would love to get some feedback on whether folks would find this useful.

Thanks,
Ryan

P.S. Apologies for the long post.
