I started with one of the simple protobuf examples that writes messages to a file and then reads them back. I've extended it a little to use both a oneof and serialisation with a length field in front of each message, so that I can read multiple messages back.
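For reference, the length-prefixed framing can be sketched on its own, independent of protobuf. This is a minimal sketch under some assumptions: the names `writeFrame` and `readFrame` are hypothetical, and it uses a fixed 4-byte little-endian length rather than a raw `std::size_t`, whose width and byte order vary between platforms. The payload would be whatever the message serialises to (for example via `SerializeToString`).

```cpp
#include <cassert>
#include <cstdint>
#include <istream>
#include <limits>
#include <ostream>
#include <sstream>
#include <string>

// Write one frame: a 4-byte little-endian length followed by the payload bytes.
bool writeFrame(std::ostream& out, const std::string& payload)
{
    // Assumption of this sketch: payloads fit in a 32-bit length.
    assert(payload.size() <= std::numeric_limits<std::uint32_t>::max());
    const auto length = static_cast<std::uint32_t>(payload.size());
    char prefix[4];
    for (int i = 0; i < 4; ++i)
    {
        prefix[i] = static_cast<char>((length >> (8 * i)) & 0xFF);
    }
    out.write(prefix, sizeof(prefix));
    out.write(payload.data(), static_cast<std::streamsize>(payload.size()));
    return static_cast<bool>(out);
}

// Read one frame into `payload`. Returns false at end of stream or if the
// frame is truncated.
bool readFrame(std::istream& in, std::string& payload)
{
    char prefix[4];
    if (!in.read(prefix, sizeof(prefix)))
    {
        return false;
    }
    std::uint32_t length = 0;
    for (int i = 0; i < 4; ++i)
    {
        length |= static_cast<std::uint32_t>(static_cast<unsigned char>(prefix[i])) << (8 * i);
    }
    payload.assign(length, '\0');
    if (length > 0 && !in.read(&payload[0], static_cast<std::streamsize>(length)))
    {
        return false;
    }
    return true;
}
```

Both functions work on any stream, so they can be exercised with a `std::stringstream` before being pointed at a file. A zero-length payload is a valid frame here, which matters because an empty protobuf message serialises to zero bytes.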
The proto file is:
syntax = "proto3";
package pbuf;
message GetRequest {
    enum GetTarget {
        FIRST = 0;
        SECOND = 1;
    }
    GetTarget get_target = 1;
}
message SetRequest {
    enum SetTarget {
        FIRST = 0;
        SECOND = 1;
    }
    SetTarget set_target = 1;
}
message Request {
    oneof request_type {
        SetRequest s_request = 1;
        GetRequest g_request = 2;
    }
}
That is, there is a main message type that can hold one of the other two. Each of those has a single enum field, with enum values 0 and 1 defined.
The code I'm using to experiment with is compiled using gcc 7.5.0 on 64-bit Ubuntu 16.04, with -std=c++17, and is:
#include <fstream>
#include <iostream>
#include <string>
#include <cstdint>
#include <vector>
#include "oneof.pb.h"
int main(int argc, char* argv[])
{
    GOOGLE_PROTOBUF_VERIFY_VERSION;
    if (argc != 2)
    {
        std::cerr << "Usage: " << argv[0] << " DATA_FILE" << std::endl;
        return -1;
    }
    {
        std::fstream input(argv[1], std::ios::in | std::ios::binary);
        if (!input)
        {
            std::cout << argv[1] << ": File not found. Creating a new file." << std::endl;
        }
        else
        {
            while (input)
            {
                pbuf::Request request;
                std::size_t serialisationLength { 0 };
                // Read the data size first; this is the most likely to fail at end of file
                input.read(reinterpret_cast<char *>(&serialisationLength), sizeof(serialisationLength));
                if (!input.eof())
                {
                    std::cout << "Serialisation Length : " << serialisationLength << std::endl;
                    // std::vector instead of a variable-length array, which is only a GCC extension in C++
                    std::vector<uint8_t> serialisedData(serialisationLength);
                    if (!input.read(reinterpret_cast<char *>(serialisedData.data()), serialisationLength))
                    {
                        std::cerr << "Failed to read request" << std::endl;
                        return -1;
                    }
                    if (!request.ParseFromArray(serialisedData.data(), static_cast<int>(serialisationLength)))
                    {
                        std::cerr << "Failed to parse request " << std::endl;
                        return -1;
                    }
                    std::cout << "Data: " << request.DebugString() << std::endl;
                }
            }
        }
    }
    // Reset the output file
    std::fstream output(argv[1], std::ios::out | std::ios::trunc | std::ios::binary);
    for (int32_t i = 0; i < 3; i++)
    {
        pbuf::Request request;
        switch (i)
        {
            case 0:
            {
                pbuf::GetRequest* getRequest { request.mutable_g_request() };
                getRequest->set_get_target(pbuf::GetRequest_GetTarget::GetRequest_GetTarget_FIRST);
                break;
            }
            case 1:
            {
                pbuf::SetRequest* setRequest { request.mutable_s_request() };
                setRequest->set_set_target(pbuf::SetRequest_SetTarget::SetRequest_SetTarget_FIRST);
                break;
            }
            case 2:
            {
                pbuf::GetRequest* getRequest { request.mutable_g_request() };
                getRequest->set_get_target(pbuf::GetRequest_GetTarget::GetRequest_GetTarget_SECOND);
                break;
            }
        }
        auto serialisationLength { request.ByteSizeLong() };
        // Print the size of each message as it is written (this is the output shown for the write pass)
        std::cout << "Serialisation Length: " << serialisationLength << std::endl;
        std::vector<uint8_t> serialisedData(serialisationLength);
        if (!request.SerializeToArray(serialisedData.data(), static_cast<int>(serialisationLength)))
        {
            std::cerr << "Failed to serialise request " << i << "." << std::endl;
            return -1;
        }
        if (!output.write(reinterpret_cast<const char *>(&serialisationLength), sizeof(serialisationLength)))
        {
            std::cerr << "Failed to write request " << i << "." << std::endl;
            return -1;
        }
        if (!output.write(reinterpret_cast<const char *>(serialisedData.data()), serialisationLength))
        {
            std::cerr << "Failed to write request " << i << "." << std::endl;
            return -1;
        }
    }
    output.close();
    // Optional: Delete all global objects allocated by libprotobuf.
    google::protobuf::ShutdownProtobufLibrary();
    return 0;
}
The gist is that it reads the given file and prints each message using DebugString(), then overwrites the file with new entries (they're actually always the same three). So if you run this and pass "var.bin" as the file argument, the first time round you should expect to see:
var.bin: File not found. Creating a new file.
Serialisation Length: 2
Serialisation Length: 2
Serialisation Length: 4
At this point you can probably already see the issue: the first two messages serialise to only 2 bytes (more on this below). The second time round, you should expect to see the contents of the 3 entries that were added to the file, i.e.:
Serialisation Length : 2
Data: g_request {
}
Serialisation Length : 2
Data: s_request {
}
Serialisation Length : 4
Data: g_request {
get_target: SECOND
}
Serialisation Length: 2
Serialisation Length: 2
Serialisation Length: 4
This further highlights the issue I've got: when the enum value is set to the "0" enumerator, the value isn't actually added to the serialised data and, hence, doesn't appear in the deserialised output.
Is this expected behaviour? To me it seems weird. That said, I've only got 35 years' experience of professional software development, so what would I know?!
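The observed lengths of 2 and 4 can be reproduced by hand. The following is a minimal sketch, not protobuf's generated code: the helper names (`appendVarint`, `encodeTargetMessage`, `encodeRequest`) are hypothetical, the field numbers are taken from the .proto above, and only non-negative enum values are handled. It assumes the proto3 rule that a singular field without explicit presence is left off the wire when it holds zero, while a set oneof member is written even if its payload is empty.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Append an unsigned value in protobuf's base-128 varint encoding.
void appendVarint(std::string& out, std::uint64_t value)
{
    while (value >= 0x80)
    {
        out.push_back(static_cast<char>((value & 0x7F) | 0x80));
        value >>= 7;
    }
    out.push_back(static_cast<char>(value));
}

// Encode a message holding one enum in field 1 (GetRequest or SetRequest).
// A zero value emits nothing at all.
std::string encodeTargetMessage(std::uint32_t target)
{
    std::string out;
    if (target != 0)
    {
        appendVarint(out, (1u << 3) | 0u); // tag: field 1, wire type 0 (varint)
        appendVarint(out, target);
    }
    return out;
}

// Encode the outer Request with the inner message stored in the oneof member
// `fieldNumber` (1 = s_request, 2 = g_request).
std::string encodeRequest(std::uint32_t fieldNumber, const std::string& inner)
{
    assert(fieldNumber >= 1);
    std::string out;
    appendVarint(out, (fieldNumber << 3) | 2u); // wire type 2 (length-delimited)
    appendVarint(out, inner.size());            // length of the nested message
    out += inner;
    return out;
}
```

With these helpers, `encodeRequest(2, encodeTargetMessage(0))` is the two bytes 0x12 0x00 (tag plus a zero length), and `encodeRequest(2, encodeTargetMessage(1))` is the four bytes 0x12 0x02 0x08 0x01, matching the lengths printed by the program.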
If I change the proto file to:
syntax = "proto3";
package pbuf;
message GetRequest {
    enum GetTarget {
        DEFAULT = 0;
        FIRST = 1;
        SECOND = 2;
    }
    GetTarget get_target = 1;
}
message SetRequest {
    enum SetTarget {
        DEFAULT = 0;
        FIRST = 1;
        SECOND = 2;
    }
    SetTarget set_target = 1;
}
message Request {
    oneof request_type {
        SetRequest s_request = 1;
        GetRequest g_request = 2;
    }
}
i.e. add an extra enumerator "DEFAULT = 0;" to both enums and move the others "up" by 1, then the output is:
var.bin: File not found. Creating a new file.
Serialisation Length: 4
Serialisation Length: 4
Serialisation Length: 4
the first time, and
Serialisation Length : 4
Data: g_request {
get_target: FIRST
}
Serialisation Length : 4
Data: s_request {
set_target: FIRST
}
Serialisation Length : 4
Data: g_request {
get_target: SECOND
}
Serialisation Length: 4
Serialisation Length: 4
Serialisation Length: 4
the second time, which is more like what I would expect.
The cause seems to be this generated function:
size_t GetRequest::ByteSizeLong() const {
    // @@protoc_insertion_point(message_byte_size_start:pbuf.GetRequest)
    size_t total_size = 0;
    uint32_t cached_has_bits = 0;
    // Prevent compiler warnings about cached_has_bits being unused
    (void) cached_has_bits;
    // .pbuf.GetRequest.GetTarget get_target = 1;
    if (this->_internal_get_target() != 0) {
        total_size += 1 +
            ::_pbi::WireFormatLite::EnumSize(this->_internal_get_target());
    }
    return MaybeComputeUnknownFieldsSize(total_size, &_impl_._cached_size_);
}
When the enum value is zero, it's (obviously) skipping over the total_size update where EnumSize is called.
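The other half of that rule is on the reading side: a field that is absent from the wire is reported with its default, which for an enum is the zero enumerator. So with the first .proto, calling get_target() on the parsed message would still return FIRST, and it is only DebugString() that leaves default-valued fields out of its text. A minimal hand-written sketch of such a reader follows; it is not protobuf's generated parser, the names `readVarint` and `decodeTargetMessage` are hypothetical, and it only understands a single varint field with number 1.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Read one base-128 varint starting at `pos`. Returns false on truncated input.
bool readVarint(const std::string& in, std::size_t& pos, std::uint64_t& value)
{
    assert(pos <= in.size());
    value = 0;
    for (int shift = 0; shift < 64 && pos < in.size(); shift += 7)
    {
        const auto byte = static_cast<unsigned char>(in[pos++]);
        value |= static_cast<std::uint64_t>(byte & 0x7F) << shift;
        if ((byte & 0x80) == 0)
        {
            return true;
        }
    }
    return false;
}

// Decode a message with one enum in field 1. Any other field makes this
// sketch fail rather than being skipped.
bool decodeTargetMessage(const std::string& in, std::uint32_t& target)
{
    target = 0; // default reported when the field is absent from the wire
    std::size_t pos = 0;
    while (pos < in.size())
    {
        std::uint64_t tag = 0;
        std::uint64_t value = 0;
        if (!readVarint(in, pos, tag) || tag != ((1u << 3) | 0u) || !readVarint(in, pos, value))
        {
            return false;
        }
        target = static_cast<std::uint32_t>(value);
    }
    return true;
}
```

Decoding an empty payload succeeds and yields 0, so a value of zero survives the round trip even though no bytes were written for it. What cannot be recovered this way is whether the sender set the field to zero or never set it.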
Any comments would be appreciated; I may be missing something but this doesn't seem to make sense to me.