Custom zero-copy Java codegen: does it make sense?

Stefano Baghino

Oct 27, 2020, 5:23:10 AM

to Protocol Buffers

Hello there everybody,

I'm currently working on a system in which we have several components communicating with each other via gRPC/Protobuf.

My team has been running some benchmarks and realized there is room for improvement here (improvement that we actually need): we noticed that we spend a lot of time encoding and decoding Protobuf messages. More specifically, we work in Scala and use ScalaPB to generate code for our system.

We have been evaluating the idea of switching to a different serialization format that minimizes copies, like Cap'n Proto, FlatBuffers or SBE.

Doing some research on these formats, it appears to me that we could probably achieve something similar with a custom Protobuf codegen that disallows mutation (we are not mutating Protobuf objects in our system anyhow, so that's fine with us) and acts as a read-only view over the serialized Protobuf messages.

I proposed the idea to a couple of colleagues, and it at least doesn't seem completely crazy. One colleague correctly pointed out, however, that even if we avoided copying, we would still have to perform some form of indexing so that consumers can quickly read nested and/or variable-length fields off the serialized messages (maybe in a lazy and possibly incremental fashion), and that indexing could turn out to be more costly than just copying the data.
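To make the idea concrete, here is a minimal sketch of what such generated code might look like. Everything in it is an assumption for illustration: the schema (`message Person { string name = 1; int32 id = 2; }`), the `PersonView` class and its method names are hypothetical, and it parses the wire format by hand rather than using any existing Protobuf API. It keeps a reference to the serialized bytes, builds the field index lazily on first access, and does no validation of malformed input.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

/**
 * Read-only view over a serialized message of the hypothetical schema
 *   message Person { string name = 1; int32 id = 2; }
 * Sketch only: no bounds checking, no group wire types, not thread-safe.
 */
final class PersonView {
    private final byte[] buf;   // shared with the caller, never copied
    private final int start, end;
    private boolean indexed;    // has the single indexing pass run yet?
    private int pos;            // cursor used only while indexing
    private int nameOff, nameLen;
    private long id;

    PersonView(byte[] buf) { this(buf, 0, buf.length); }

    PersonView(byte[] buf, int start, int end) {
        this.buf = buf;
        this.start = start;
        this.end = end;
        this.nameOff = start;   // an absent string reads as empty
    }

    // Base-128 varint: 7 payload bits per byte, high bit means "more follows".
    private long readVarint() {
        long result = 0;
        int shift = 0;
        while (true) {
            byte b = buf[pos++];
            result |= (long) (b & 0x7F) << shift;
            if ((b & 0x80) == 0) return result;
            shift += 7;
        }
    }

    // One pass over the message, recording where each known field lives.
    // This is the indexing cost mentioned above; it is paid on first access.
    private void index() {
        if (indexed) return;
        pos = start;
        while (pos < end) {
            int tag = (int) readVarint();
            int field = tag >>> 3;
            int wire = tag & 7;
            switch (wire) {
                case 0: {                       // varint
                    long v = readVarint();
                    if (field == 2) id = v;     // last occurrence wins
                    break;
                }
                case 1: pos += 8; break;        // fixed 64-bit, skipped
                case 2: {                       // length-delimited
                    int len = (int) readVarint();
                    if (field == 1) { nameOff = pos; nameLen = len; }
                    pos += len;
                    break;
                }
                case 5: pos += 4; break;        // fixed 32-bit, skipped
                default:
                    throw new IllegalArgumentException("unsupported wire type " + wire);
            }
        }
        indexed = true;
    }

    int id() { index(); return (int) id; }

    /** Zero-copy access: a read-only slice of the underlying array. */
    ByteBuffer nameBytes() {
        index();
        return ByteBuffer.wrap(buf, nameOff, nameLen).slice().asReadOnlyBuffer();
    }

    /** Convenience accessor; decoding to a String does copy the bytes. */
    String name() {
        index();
        return new String(buf, nameOff, nameLen, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Hand-encoded Person{name = "Ann", id = 150}
        byte[] wire = {0x0A, 0x03, 'A', 'n', 'n', 0x10, (byte) 0x96, 0x01};
        PersonView p = new PersonView(wire);
        System.out.println(p.name() + " " + p.id());
    }
}
```

In this sketch a nested message field would be handled the same way as `name`: the index records only the offset and length of the sub-message, and a child view over that range is created (and indexed) only if the consumer actually asks for it.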

The main advantage of retaining Protobuf as the encoding format is that we would still need to provide it as an alternative for existing users anyway, and maintaining two encoding formats could increase complexity and/or become an obstacle.

Does this idea check out with you? Is there some built-in capability and/or prior art that we are missing?