Serialized message size difference between Windows and POSIX

Oleksii Hladyshko

Jul 10, 2025, 10:48:04 AM
to grpc.io
gRPC: 1.59.1
Protocol Buffers: 3.21.12

I have a .proto file with proto2 syntax. One of the messages is defined like this:
message Log
{
    repeated Event events = 1; // can be very heavy
...
    optional string database = 7;
    optional bytes bundle = 8;
...
}

I need to calculate the size of the message at runtime to send data when a specific size threshold is reached. The function Log::ByteSizeLong() isn't suitable for my needs because, when there are many added events, it negatively impacts the performance of the critical sections.

I calculated the size by following the documentation and the generated `pb.cc` file:
size_t getTagSize(uint32_t fieldNumber)
{
    using namespace google::protobuf::internal;
    return WireFormatLite::TagSize(fieldNumber, WireFormatLite::FieldType::MAX_FIELD_TYPE);
}
size_t getProtobufMessageSize(google::protobuf::MessageLite* message, uint32_t fieldNumber)
{
    return getTagSize(fieldNumber) + google::protobuf::internal::WireFormatLite::MessageSize(*message);
}

It functions perfectly on Windows. However, for POSIX systems (macOS and Linux), there is a slight discrepancy between the calculated size and the actual size of the serialized message. My calculations show that it is 2 bytes smaller for both the database and the bundle. When I comment out the code that sets those fields, the size is accurate.

What causes the difference in serialized sizes between Windows and POSIX? Could the methods I use to calculate the sizes also differ?
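
For reference, the protobuf wire format encodes a `string`, `bytes`, or embedded message field as a tag varint, then a length varint, then the payload bytes. Below is a minimal sketch of that arithmetic, independent of the protobuf library. The helper names `varintSize` and `lengthDelimitedFieldSize` are made up for illustration and are not part of any protobuf API.

```cpp
#include <cstddef>
#include <cstdint>

// Number of bytes a base-128 varint needs for `value` (7 payload bits per byte).
constexpr std::size_t varintSize(std::uint64_t value)
{
    std::size_t bytes = 1;
    while (value >= 0x80) {
        value >>= 7;
        ++bytes;
    }
    return bytes;
}

// Wire size of one length-delimited field (string, bytes, or embedded message):
// tag varint + length varint + payload.
// Hypothetical helper, not a protobuf function.
constexpr std::size_t lengthDelimitedFieldSize(std::uint32_t fieldNumber,
                                               std::size_t payloadBytes)
{
    // The tag is (fieldNumber << 3) | wireType; wire type 2 means length-delimited.
    const std::uint64_t tag = (static_cast<std::uint64_t>(fieldNumber) << 3) | 2u;
    return varintSize(tag) + varintSize(payloadBytes) + payloadBytes;
}
```

Under these assumptions, a 5-byte `database` value in field 7 takes 1 + 1 + 5 = 7 bytes on the wire: 2 bytes of overhead on top of the payload, as long as the field number is below 16 and the payload is shorter than 128 bytes.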

Peter Schow

Jul 10, 2025, 11:43:46 AM
to Oleksii Hladyshko, grpc.io
On Thu, Jul 10, 2025 at 8:48 AM Oleksii Hladyshko <oleksii....@gmail.com> wrote:
> It functions perfectly on Windows. However, for POSIX systems (macOS and Linux), there is a slight discrepancy between the calculated size and the actual size of the serialized message. My calculations show that it is 2 bytes smaller for both the database and the bundle. When I comment out the code that sets those fields, the size is accurate.
>
> What causes the difference in serialized sizes between Windows and POSIX? Could the methods I use to calculate the sizes also differ?

I'm wondering if the differences are being caused by the default UTF-16 character encoding on Windows vs. UTF-8 on Linux.  

Have you inspected the contents of the serialized message, byte by byte?
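
One way to do such an inspection is to dump the serialized bytes as hex and decode the leading tag byte of each field. Below is a minimal sketch, assuming the serialized message is already held in a `std::string` (for example, from `SerializeAsString()`). `Tag`, `decodeTag`, and `toHex` are hypothetical names, and `decodeTag` only handles single-byte tags, that is, field numbers 1 to 15.

```cpp
#include <cstdint>
#include <string>

// Decoded form of a single-byte field tag.
struct Tag {
    std::uint32_t fieldNumber;
    std::uint32_t wireType; // 0 = varint, 2 = length-delimited
};

// Split a one-byte tag into field number (upper bits) and wire type (low 3 bits).
// Assumption: the tag fits in one byte, which holds for field numbers 1..15.
constexpr Tag decodeTag(unsigned char byte)
{
    return Tag{static_cast<std::uint32_t>(byte >> 3),
               static_cast<std::uint32_t>(byte & 0x7)};
}

// Render raw bytes as space-separated lowercase hex pairs.
std::string toHex(const std::string& bytes)
{
    static const char digits[] = "0123456789abcdef";
    std::string out;
    for (unsigned char c : bytes) {
        if (!out.empty()) {
            out += ' ';
        }
        out += digits[c >> 4];
        out += digits[c & 0xF];
    }
    return out;
}
```

With the `Log` message from the question, a `database` field (number 7) would start with the byte `0x3a` and a `bundle` field (number 8) with `0x42`, each followed by a length varint and then the payload. Comparing the dumps from Windows and Linux side by side should show where the extra bytes sit.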

Oleksii Hladyshko

Jul 10, 2025, 12:31:24 PM
to grpc.io
I haven't checked it byte by byte, but I have tested on Windows 11 with UTF-8 encoding enabled in system locale settings.