C++ enum serialization question

447 views
Skip to first unread message

John McCabe

unread,
Aug 18, 2022, 6:05:32 AM8/18/22
to Protocol Buffers
I've started with one of the simple protobuf examples of writing to a file, then reading back. I've extended it a little to try to use both oneof, and serialisation with a length field in front so that I can read multiple messages.

The proto file is:

syntax = "proto3";

package pbuf;

message GetRequest {
    enum GetTarget {
        FIRST = 0;
        SECOND = 1;
    }

    GetTarget get_target = 1;
}

message SetRequest {
    enum SetTarget {
        FIRST = 0;
        SECOND = 1;
    }

    SetTarget set_target = 1;
}

message Request {
    oneof request_type {
        SetRequest s_request = 1;
        GetRequest g_request = 2;
    }
}


I.e. there is a main message type that can hold one of the others. The others have a single enum field, with enum values of 0 and 1 defined for each one.

The code I'm using to experiment with is compiled using gcc 7.5.0 on 64-bit Ubuntu 16.04, with std=c++17, and is:

#include <fstream>
#include <iostream>
#include <string>
#include <cstdint>

#include "oneof.pb.h"

int main(int argc, char* argv[])
{
    GOOGLE_PROTOBUF_VERIFY_VERSION;

    if (argc != 2)
    {
        std::cerr << "Usage:  " << argv[0] << " DATA_FILE" << std::endl;
        return -1;
    }

    {
        std::fstream input(argv[1], std::ios::in | std::ios::binary);
        if (!input)
        {
            std::cout << argv[1] << ": File not found.  Creating a new file." << std::endl;
        }
        else
        {
            while (input)
            {
                pbuf::Request request;
                std::size_t serialisationLength { 0 };

                // Read the data size first; this is the most likely to fail at end of file
                input.read(reinterpret_cast<char *>(&serialisationLength), sizeof(serialisationLength));
                if (!input.eof())
                {
                    std::cout << "Serialisation Length : " << serialisationLength << std::endl;

                    uint8_t serialisedData[serialisationLength];
                    if (!input.read(reinterpret_cast<char *>(serialisedData), serialisationLength))
                    {
                        std::cerr << "Failed to read request" << std::endl;
                        return -1;
                    }

                    if (!request.ParseFromArray(serialisedData, serialisationLength))
                    {
                        std::cerr << "Failed to parse request " << std::endl;
                        return -1;
                    }

                    std::cout << "Data: " << request.DebugString() << std::endl;
                }
            }
        }
    }

    // Reset the output file
    std::fstream output(argv[1], std::ios::out | std::ios::trunc | std::ios::binary);

    for (int32_t i = 0; i < 3; i++)
    {
        pbuf::Request request;
        switch (i)
        {
        case 0:
        {
            pbuf::GetRequest* getRequest { request.mutable_g_request() };
            getRequest->set_get_target(pbuf::GetRequest_GetTarget::GetRequest_GetTarget_FIRST);
            break;
        }

        case 1:
        {
            pbuf::SetRequest* setRequest { request.mutable_s_request() };
            setRequest->set_set_target(pbuf::SetRequest_SetTarget::SetRequest_SetTarget_FIRST);
            break;
        }

        case 2:
        {
            pbuf::GetRequest* getRequest { request.mutable_g_request() };
            getRequest->set_get_target(pbuf::GetRequest_GetTarget::GetRequest_GetTarget_SECOND);
            break;
        }
        }

        auto serialisationLength { request.ByteSizeLong() };
        uint8_t serialisedData[serialisationLength];
        if (!request.SerializeToArray(serialisedData, serialisationLength))
        {
            std::cerr << "Failed to serialise  request " << i << "." << std::endl;
            return -1;

        }

        if (!output.write(reinterpret_cast<const char *>(&serialisationLength), sizeof(serialisationLength)))
        {
            std::cerr << "Failed to write request " << i << "." << std::endl;
            return -1;
        }

        if (!output.write(reinterpret_cast<const char *>(serialisedData), serialisationLength))
        {
            std::cerr << "Failed to write request " << i << "." << std::endl;
            return -1;
        }
    }
    output.close();

    // Optional:  Delete all global objects allocated by libprotobuf.
    google::protobuf::ShutdownProtobufLibrary();

    return 0;
}


The gist of it is that it reads a given file and prints out the contents using the DebugString() for each object, then overwrites the file, putting new entries in (they're actually always the same anyway, but...) so, if you're running this and passing "var.bin" as the file argument, the first time round you should expect to see:

var.bin: File not found.  Creating a new file.
Serialisation Length: 2
Serialisation Length: 2
Serialisation Length: 4

At this point you can, I guess, see the issue (more later). The second time round, you should expect to see the contents of the 3 entries that were added to the file. I.e.:

Serialisation Length : 2
Data: g_request {
}

Serialisation Length : 2
Data: s_request {
}

Serialisation Length : 4
Data: g_request {
  get_target: SECOND
}

Serialisation Length: 2
Serialisation Length: 2
Serialisation Length: 4

This further highlights the issue I've got; i.e. when the enum value is set to the "0" version, the value isn't actually added to the serialized data and, hence, doesn't appear in the de-serialized data.

Is this expected behaviour because, to me, that seems weird? Saying that, I've only got 35 years experience of professional software development, so what would I know?!

If I change the proto file to:

syntax = "proto3";

package pbuf;

message GetRequest {
    enum GetTarget {
        DEFAULT = 0;
        FIRST = 1;
        SECOND = 2;
    }

    GetTarget get_target = 1;
}

message SetRequest {
    enum SetTarget {
        DEFAULT = 0;
        FIRST = 1;
        SECOND = 2;
    }

    SetTarget set_target = 1;
}

message Request {
    oneof request_type {
        SetRequest s_request = 1;
        GetRequest g_request = 2;
    }
}

i.e. add an extra enumeration literal of "DEFAULT = 0;" in both, and move the others "up" 1, the output is:

var.bin: File not found.  Creating a new file.
Serialisation Length: 4
Serialisation Length: 4
Serialisation Length: 4


the first time, and 

Serialisation Length : 4
Data: g_request {
  get_target: FIRST
}

Serialisation Length : 4
Data: s_request {
  set_target: FIRST
}

Serialisation Length : 4
Data: g_request {
  get_target: SECOND
}

Serialisation Length: 4
Serialisation Length: 4
Serialisation Length: 4


the second time, which is more like I would expect.

The cause of this seems to be this, generated function:

size_t GetRequest::ByteSizeLong() const {
// @@protoc_insertion_point(message_byte_size_start:pbuf.GetRequest)
  size_t total_size = 0;

  uint32_t cached_has_bits = 0;
  // Prevent compiler warnings about cached_has_bits being unused
  (void) cached_has_bits;

  // .pbuf.GetRequest.GetTarget get_target = 1;
  if (this->_internal_get_target() != 0) {
    total_size += 1 +
      ::_pbi::WireFormatLite::EnumSize(this->_internal_get_target());
  }


  return MaybeComputeUnknownFieldsSize(total_size, &_impl_._cached_size_);
}


When the enum value is zero, it's (obviously) skipping over the total_size update where EnumSize is called.

Any comments would be appreciated; I may be missing something but this doesn't seem to make sense to me.




John McCabe

unread,
Aug 18, 2022, 11:23:27 AM8/18/22
to Protocol Buffers
It looks like I hadn't spotted this issue, https://github.com/protocolbuffers/protobuf/issues/5883, that one of my colleagues has helped me find.

Hard to understand the justification for that, to be honest!

Matthew Fowles Kulukundis

unread,
Aug 18, 2022, 11:28:20 AM8/18/22
to John McCabe, Protocol Buffers
John~

Honestly, I don't particularly like it, but that is the API we have right now and we need to move forward because changing it would be extremely disruptive.

Matt

--
You received this message because you are subscribed to the Google Groups "Protocol Buffers" group.
To unsubscribe from this group and stop receiving emails from it, send an email to protobuf+u...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/protobuf/27775d01-5273-4cea-a946-2a276e87ef2cn%40googlegroups.com.

John McCabe

unread,
Aug 18, 2022, 1:42:33 PM8/18/22
to Protocol Buffers
On Thursday, 18 August 2022 at 16:28:20 UTC+1 k...@google.com wrote:
John~

Honestly, I don't particularly like it, but that is the API we have right now and we need to move forward because changing it would be extremely disruptive.

Thanks for your reply Matt. FWIW, I notice in the message.h file there's the following declaration of DebugString():

  // Generates a human-readable form of this message for debugging purposes.
  // Note that the format and content of a debug string is not guaranteed, may
  // change without notice, and should not be depended on. Code that does
  // anything except display a string to assist in debugging should use
  // TextFormat instead.
  std::string DebugString() const;


Surely that suggests that anyone who's dependent on the content and format of the DebugString() is _wrong_, and if it changes, it's their problem for relying on it in the first place? Hence the 'disruptive' aspect should be limited to those who're misusing DebugString in the first place? :-) 

Matthew Fowles Kulukundis

unread,
Aug 18, 2022, 2:00:45 PM8/18/22
to John McCabe, Protocol Buffers
John~

If this were only an issue of DebugString, you would be correct.  Unfortunately, the behavior of enum's showing up in the unknown field set vs the field itself is observable in many different ways.

Matt

--
You received this message because you are subscribed to the Google Groups "Protocol Buffers" group.
To unsubscribe from this group and stop receiving emails from it, send an email to protobuf+u...@googlegroups.com.
Reply all
Reply to author
Forward
0 new messages