Dear Protobuf community,
I’m reaching out with a proposal for several optimizations to the protobuf library that can improve the processing throughput (processed bytes per second) of strings and varints. These optimizations rely on SIMD instructions such as Intel AVX-512. The benefits of these optimizations increase with the size of the payload.
The optimizations target two functions: the UTF-8 validation function, which is used during both serialization and deserialization, and the function that reads packed varint arrays, which is used only during deserialization. The core of the UTF-8 change is that the wider AVX-512 registers let each validation step load more input bytes into a vector. The core of the array deserialization change is replacing one-at-a-time varint parsing with three steps: use AVX-512 instructions to determine how many varints the buffer contains, allocate enough memory to hold them, and then decode them. AVX-512 instructions are also used for processing signed varints.
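To make the packed-varint idea concrete, here is a minimal, portable sketch of the count-then-decode structure. It is not the proposed AVX-512 code and not protobuf's implementation: the names CountVarints, DecodePackedVarints and ZigZagDecode are hypothetical, and the counting pass uses 8-byte words where a vectorized version would presumably examine 64 bytes per load. It relies only on the wire-format rule that the last byte of every varint has its top bit clear. The zigzag helper assumes "signed varints" refers to sint32/sint64 fields.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Pass 1: count varints by counting bytes whose continuation bit (0x80) is
// clear, since each such byte terminates exactly one varint.
inline std::size_t CountVarints(const std::uint8_t* data, std::size_t size) {
  std::size_t count = 0;
  std::size_t i = 0;
  for (; i + 8 <= size; i += 8) {
    std::uint64_t word;
    std::memcpy(&word, data + i, sizeof(word));
    // After inversion, terminator bytes are the ones with the top bit set.
    std::uint64_t terminators = ~word & 0x8080808080808080ULL;
    // Count the set bits by clearing the lowest one each iteration.
    for (; terminators != 0; terminators &= terminators - 1) ++count;
  }
  for (; i < size; ++i) count += (data[i] & 0x80) == 0;  // scalar tail
  return count;
}

// Pass 2: reserve space once, then decode. Returns false if a varint is
// longer than 10 bytes or the input ends in the middle of a varint.
inline bool DecodePackedVarints(const std::uint8_t* data, std::size_t size,
                                std::vector<std::uint64_t>* out) {
  assert(out != nullptr);
  out->reserve(out->size() + CountVarints(data, size));
  std::uint64_t value = 0;
  int shift = 0;
  for (std::size_t i = 0; i < size; ++i) {
    if (shift >= 70) return false;  // an 11th byte would be malformed
    value |= static_cast<std::uint64_t>(data[i] & 0x7F) << shift;
    if ((data[i] & 0x80) == 0) {
      out->push_back(value);
      value = 0;
      shift = 0;
    } else {
      shift += 7;
    }
  }
  return shift == 0;
}

// Zigzag decoding as used by sint32/sint64: 0, 1, 2, 3 -> 0, -1, 1, -2.
inline std::int64_t ZigZagDecode(std::uint64_t n) {
  return static_cast<std::int64_t>((n >> 1) ^ (0 - (n & 1)));
}
```

Knowing the element count up front lets the output buffer be sized once instead of growing during decoding, which is the part of the structure that holds regardless of how the counting pass is vectorized.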
To showcase this, I have performed the following experiments:
Benchmarking serialization of strings.
I generated strings of lengths 10, 100, 500, 750 and 1000. For each length I generated five strings whose characters have different encoded sizes (1 byte, 2 bytes, a mix of 1 and 2 bytes, a mix of 1, 2 and 4 bytes, and a mix of 1, 2, 3 and 4 bytes). The optimizations were benchmarked against version 3.21.4 and code release 22.0. While the utf8range module in code release 22.0 brings some improvements over 3.21.4, the proposed changes improve performance over the baseline code by up to ~78%.
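As an illustration of why loading more bytes per step helps with strings like these, below is a minimal scalar sketch of UTF-8 validation with a wide fast path. It is an assumption-laden stand-in, not the proposed AVX-512 code and not utf8range: the name IsValidUtf8 is hypothetical, and the fast path checks 8 bytes at a time where a 512-bit register would cover 64. Runs of 1-byte characters are skipped a word at a time; 2-, 3- and 4-byte sequences fall back to a byte-by-byte check.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Returns true if [p, p + n) is well-formed UTF-8.
inline bool IsValidUtf8(const std::uint8_t* p, std::size_t n) {
  std::size_t i = 0;
  while (i < n) {
    // Fast path: if none of the next 8 bytes has its top bit set, they are
    // all ASCII and can be skipped together.
    if (i + 8 <= n) {
      std::uint64_t word;
      std::memcpy(&word, p + i, sizeof(word));
      if ((word & 0x8080808080808080ULL) == 0) {
        i += 8;
        continue;
      }
    }
    const std::uint8_t lead = p[i];
    if (lead < 0x80) {  // single ASCII byte
      ++i;
      continue;
    }
    std::size_t len;          // total bytes in this sequence
    std::uint32_t cp;         // code point being assembled
    std::uint32_t min_value;  // smallest code point allowed for this length
    if ((lead & 0xE0) == 0xC0) {
      len = 2; cp = lead & 0x1F; min_value = 0x80;
    } else if ((lead & 0xF0) == 0xE0) {
      len = 3; cp = lead & 0x0F; min_value = 0x800;
    } else if ((lead & 0xF8) == 0xF0) {
      len = 4; cp = lead & 0x07; min_value = 0x10000;
    } else {
      return false;  // stray continuation byte or invalid lead byte
    }
    if (n - i < len) return false;  // sequence is truncated
    for (std::size_t k = 1; k < len; ++k) {
      if ((p[i + k] & 0xC0) != 0x80) return false;  // not a continuation
      cp = (cp << 6) | (p[i + k] & 0x3F);
    }
    // Reject overlong encodings, surrogates, and values above U+10FFFF.
    if (cp < min_value || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF)) {
      return false;
    }
    i += len;
  }
  return true;
}
```

In this sketch the fast path only pays off on ASCII runs, so an input made entirely of multi-byte characters gains nothing from it. A vectorized validator would need to classify multi-byte sequences inside the register as well to speed up such inputs.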

Benchmarking deserialization of arrays of varints.
I generated arrays of 500, 1000, 2000 and 5000 elements. For each length, I created 8 different arrays, each holding integers of a single size, from 1 byte to 8 bytes. Compared with the baseline code, the proposed changes achieved a throughput increase of up to ~40% for arrays of 8-byte integers.

I’m wondering if these proposals would be of interest to this community and if this is an active area for development.
Would users expect to use these improvements in processing of large arrays or large strings?
While the code can’t be shared yet due to internal review, what would be the best way to present these proposed optimizations?
Thanks,
Andrei