Thanks Alan for the great talk on your experiments with speeding up P4Runtime.
I briefly scanned our internal resources for protobuf performance tips. Here is a condensed summary of the most important bits:
1. Field numbers 1-15 are the most efficient, since their tags encode in a single byte.
2. Type `bytes` is cheaper than `string`, since the latter is checked to be valid UTF-8.
3. Fixed-width ints (`fixed32`, `fixed64`) may use a bit more space than `int32`, `int64`, but are faster to decode, roughly for integers >= 2^7 (the point where a varint no longer fits in one byte).
4. Message hierarchy is not cheap, since it can incur function calls, memory allocations, and cache misses.
5. Use arenas to increase locality, amortize allocations, and make deallocations virtually free.
6. You can avoid making copies of strings/bytes when parsing them and use aliasing instead, but unfortunately this is not open-sourced yet.
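To make the field-number and fixed-int tips a bit more concrete, here is a minimal Python sketch of the wire-format arithmetic behind them. It only models encoded size (a tag is the varint of `(field_number << 3) | wire_type`, and a varint carries 7 payload bits per byte); it does not measure decode time. The helper names `varint_len` and `tag_len` are made up for illustration and are not part of any protobuf API.

```python
FIXED32_LEN = 4  # fixed32 always occupies 4 bytes on the wire
FIXED64_LEN = 8  # fixed64 always occupies 8 bytes on the wire


def varint_len(value: int) -> int:
    """Bytes needed to varint-encode an unsigned integer (7 payload bits per byte)."""
    if value < 0:
        raise ValueError("this sketch only handles unsigned values")
    length = 1
    while value >= 0x80:  # more than 7 bits left, so another byte is needed
        value >>= 7
        length += 1
    return length


def tag_len(field_number: int, wire_type: int = 0) -> int:
    """Bytes needed for a field tag, i.e. the varint of (field_number << 3) | wire_type."""
    return varint_len((field_number << 3) | wire_type)


if __name__ == "__main__":
    for n in (1, 15, 16, 2047, 2048):
        print(f"field number {n}: tag takes {tag_len(n)} byte(s)")
    for v in (2**7 - 1, 2**7, 2**28 - 1, 2**28, 2**56):
        print(f"value {v}: varint {varint_len(v)} byte(s), "
              f"fixed32 {FIXED32_LEN}, fixed64 {FIXED64_LEN}")
```

Under this model, field numbers 1-15 get a one-byte tag and 16-2047 get two bytes. Values below 2^7 fit in a single varint byte, which is where the varint is cheapest in both size and decoding. From 2^28 upward a varint is larger than a `fixed32`, and from 2^56 upward it is larger than a `fixed64`.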
My comments, based on a cursory glance over p4runtime.proto:
1. N/A, we already use small numbers.
2. N/A, we already use `bytes` in places where it would matter.
3. This may be worth giving a shot!
4. This is what you observed and exploited in your proposal.
5. This seems important. I believe you mentioned you already tried this?
6. N/A, since this is unfortunately not yet open source.
One more observation: not all hierarchy is equally bad. Having a tightly packed representation of all the data we care about nested many layers deep is basically just as good as having that same tightly packed representation at the top level, since you only need to traverse the indirection once (so its cost is amortized across the entire data).
Not much new here, but maybe tip 3 (fixed-width ints) is worth giving a shot.