So, to provide a little background: we are using Protobuf to push fairly high-throughput data, and thus may be abusing its original design intent. We do try to keep each message under a few megabytes by segmenting data packets. We are pushing this data from a C++ producer to a Java consumer. The C++ side has no problem handling and serializing the data. On the Java side, however, we found that accessing the data through the generated accessors needlessly creates autoboxed objects that eventually slam our GC.

To fix the problem, I started out by generating Java stubs and then modifying the problematic fields. However, that strategy proved difficult to maintain with each new Protobuf version. It was also a bit precarious: this is a multi-organizational system, and the approach required too much expertise (and knowledge transfer). So I finally got around to hacking the protoc compiler itself to generate a more efficient Java implementation.

I have modified the 2.4.1 version of the protoc compiler to use a Java <primitivetype>[] as the storage for repeated primitive fields, as an alternative to the less efficient java.util.List<PrimitiveBoxedType>. It also looks trivial to push this change to the Subversion trunk, though the trunk has a couple of added features over 2.4.1. <aside>I must compliment the authors of the protoc compiler: it was pretty straightforward to make this change.</aside>
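
To make the storage difference concrete, here is a minimal hand-written sketch (not protoc output) contrasting the two layouts for a hypothetical "repeated int32 samples" field. The class names BoxedSamples, PrimitiveSamples and StorageDemo are illustrative only, and the boxed version merely approximates the List-backed storage of the stock 2.4.1 generated code.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only: hand-written stand-ins for a message with a hypothetical
// field "repeated int32 samples = 1;". Neither class is generated by protoc.

/** Approximates the stock storage: a List of boxed Integers. */
final class BoxedSamples {
    private final List<Integer> samples;

    BoxedSamples(List<Integer> samples) {
        this.samples = new ArrayList<>(samples);
    }

    int getSamples(int index) {
        return samples.get(index); // unboxes a stored Integer object on every read
    }

    int getSamplesCount() {
        return samples.size();
    }
}

/** Alternative storage: a primitive int[] holding exactly the field's values. */
final class PrimitiveSamples {
    private final int[] samples;

    PrimitiveSamples(int[] samples) {
        this.samples = samples.clone(); // defensive copy keeps the object immutable
    }

    int getSamples(int index) {
        return samples[index]; // plain array read, no wrapper objects involved
    }

    int getSamplesCount() {
        return samples.length;
    }
}

class StorageDemo {
    public static void main(String[] args) {
        int[] raw = {3, 1, 4, 1, 5, 9, 2, 6};
        List<Integer> boxed = new ArrayList<>();
        for (int v : raw) {
            // Autoboxing calls Integer.valueOf(v); values outside the Integer
            // cache (by default -128..127) each allocate a new object.
            boxed.add(v);
        }
        BoxedSamples a = new BoxedSamples(boxed);
        PrimitiveSamples b = new PrimitiveSamples(raw);

        long sumA = 0;
        long sumB = 0;
        for (int i = 0; i < a.getSamplesCount(); i++) sumA += a.getSamples(i);
        for (int i = 0; i < b.getSamplesCount(); i++) sumB += b.getSamples(i);
        System.out.println(sumA + " " + sumB); // prints "31 31"
    }
}
```

Both give the same answers; the difference is that the int[] version holds one primitive slot per element, whereas the List version holds a reference per element plus (for most values) a separate Integer object, which is where the GC pressure comes from.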

The changes are confined to a single file: $PROTO_HOME/src/google/protobuf/compiler/java/java_primitive_field.cc

The change should improve the performance of the existing (repeated) primitive accessor without changing its API:

    <primitivetype> get$FIELDNAME$(int index);

but I have also extended the interface with a get...Array() function:

    <primitivetype>[] get$FIELDNAME$Array();
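
The immutability concern can be seen in a small hand-written sketch of what such accessors might look like for a hypothetical "repeated double values" field. It assumes the backing array is trimmed to exactly the element count when the message is built; the class name Reading is illustrative, and this is not the code the patched protoc emits.

```java
// Hand-written sketch for a hypothetical "repeated double values = 1;" field,
// assuming the backing array holds exactly getValuesCount() elements.
final class Reading {
    private final double[] values;

    Reading(double[] values) {
        this.values = values.clone(); // defensive copy on construction
    }

    /** Existing-style accessor; signature unchanged. */
    double getValues(int index) {
        return values[index];
    }

    int getValuesCount() {
        return values.length;
    }

    /**
     * get...Array()-style addition: returns the backing array itself, with no copy.
     * Java has no const arrays, so callers can write into it and mutate the message.
     */
    double[] getValuesArray() {
        return values;
    }
}

class ArrayAccessorDemo {
    public static void main(String[] args) {
        Reading r = new Reading(new double[] {1.5, 2.5, 3.5});
        double[] exposed = r.getValuesArray();
        exposed[0] = -1.0;                  // nothing in the type system prevents this
        System.out.println(r.getValues(0)); // prints -1.0
    }
}
```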

Unfortunately, given the lack of const in Java, this little addition punches a hole in the immutable pattern used by the Java API; on the other hand, and perhaps significantly for some, it also allows use of the System.arraycopy function. I suppose I could have offered my own version of arraycopy in lieu of the get...Array() function (I only just thought of that), i.e.

    copy$FIELDNAME$Array(<primitivetype>[] target, int offset, int length)

Anyway, I would welcome such suggestions.
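
One possible shape for the copy$FIELDNAME$Array alternative is sketched below, again hand-written for a hypothetical "repeated double values" field. The parameter semantics are an assumption: by analogy with InputStream.read(byte[], int, int), offset is taken as the starting index in target, and the first length elements of the field are copied there. Because only a copy leaves the message, this keeps immutability while still using System.arraycopy underneath.

```java
import java.util.Arrays;

// Hand-written sketch for a hypothetical "repeated double values = 1;" field.
final class CopyableReading {
    private final double[] values;

    CopyableReading(double[] values) {
        this.values = values.clone();
    }

    double getValues(int index) {
        return values[index];
    }

    int getValuesCount() {
        return values.length;
    }

    /**
     * Hypothetical copy...Array(): copies the first {@code length} elements of the
     * field into {@code target}, starting at {@code target[offset]}.
     * System.arraycopy throws IndexOutOfBoundsException if either range is invalid.
     */
    void copyValuesArray(double[] target, int offset, int length) {
        System.arraycopy(values, 0, target, offset, length);
    }
}

class CopyDemo {
    public static void main(String[] args) {
        CopyableReading r = new CopyableReading(new double[] {1.5, 2.5, 3.5});
        double[] buf = new double[5];
        r.copyValuesArray(buf, 1, 3);
        System.out.println(Arrays.toString(buf)); // [0.0, 1.5, 2.5, 3.5, 0.0]
        buf[1] = -1.0;                            // only the caller's copy changes
        System.out.println(r.getValues(0));       // 1.5
    }
}
```

The trade-off versus get...Array() is one extra copy per call, in exchange for keeping the backing array private.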

I have uploaded a rather raw patch for these changes to my Google Drive (https://docs.google.com/open?id=0B6kQ2S7zDGNaWl9OY041MGpwcHc); it actually keeps all of the original macro lines, commented out, for reference. I hope others might find a use for this higher-performance version of the Java interface. Ideally, I would love to see this integrated into the official version, but there may be some edge cases that I am not considering. There are also some other design issues to consider, such as the broken immutability mentioned above and the best strategy for growing a primitive array on the Builder side (I should probably just mimic the proportional growth algorithm of the ArrayList that the Builder uses now).

I would love to get some feedback on whether folks find this useful.
Thanks,
Ryan
P.S. Apologies for the long post.