Hello everyone,
I hope I can find some advice here.
I have C++ code that writes a number of protobuf messages to a compressed, size-delimited stream like this (simplified):
FILE *ofile = fopen("myfile.bin.gz", "wb");
google::protobuf::io::FileOutputStream ostream(_fileno(ofile));
google::protobuf::io::GzipOutputStream zipstream(&ostream);
while (loop) {
    // The output stream is passed by pointer
    google::protobuf::util::SerializeDelimitedToZeroCopyStream(my_msg, &zipstream);
}
This works fine. The files are written and I can read them back in C++ with no issues.
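For reference, this is my understanding of what the decompressed stream looks like, and it may well be incomplete: each message is preceded by its byte length encoded as a base-128 varint, followed by the serialized message itself. The sketch below mimics that framing in plain Python on raw payload bytes. `encode_varint` and `frame_payloads` are hypothetical helper names of mine, not protobuf API.

```python
def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a base-128 varint (7 bits per byte)."""
    if value < 0:
        raise ValueError("this sketch only handles non-negative values")
    out = bytearray()
    while True:
        low7 = value & 0x7F
        value >>= 7
        if value:
            out.append(low7 | 0x80)  # more bytes follow: set the continuation bit
        else:
            out.append(low7)         # last byte: continuation bit is clear
            return bytes(out)


def frame_payloads(payloads):
    """Concatenate payloads, each preceded by its varint-encoded length."""
    stream = bytearray()
    for payload in payloads:
        stream += encode_varint(len(payload))
        stream += payload
    return bytes(stream)
```

Under this assumption the prefix has no fixed width: a 3-byte payload gets a 1-byte prefix, while a 300-byte payload gets a 2-byte prefix.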
Now I am trying to read them in Python and I'm having difficulty understanding the structure of the files. Here's what I'm trying:
from google.protobuf.internal.decoder import _DecodeVarint

def read_messages(raw_data: bytes):
    offset = 0
    while offset < len(raw_data):
        # Read the size (4 bytes, little-endian) and decode
        size_bytes = raw_data[offset : offset + 4]
        offset += 4
        size, _ = _DecodeVarint(size_bytes, 0)
        # This reads the correct size of the message (verified in C++)
        message_data = raw_data[offset : offset + size]
        offset += size
        # This causes an "Error parsing message" exception at the first message
        msg = my_messages_protobuf.MyMessage()
        msg.ParseFromString(message_data)
... and ...
import gzip

with gzip.open("myfile.bin.gz", "rb") as f:
    while True:
        chunk = f.read(chunk_size)
        if not chunk:
            break
        read_messages(chunk)
To clarify a bit: I have worked with protobuf for a long time, although not in Python. Plenty of existing Python code already deserializes the same messages when they come in through other paths, so I assume the whole "set up protobuf in Python" part is not the issue here. It should work.
Given that _DecodeVarint() reads the correct message size, I believe the reading of the gzipped file is okay too.
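As far as I know, `_DecodeVarint(buffer, pos)` returns both the decoded value and the position just past the varint, so the size prefix does not have to be a fixed width. Here is a minimal stdlib-only sketch of walking a decompressed buffer under that assumption. `decode_varint` and `iter_delimited` are hypothetical names, and the payloads are left as raw bytes rather than parsed into `MyMessage`.

```python
def decode_varint(buf: bytes, pos: int):
    """Decode a base-128 varint at buf[pos:]; return (value, next_position)."""
    result = 0
    shift = 0
    while True:
        if pos >= len(buf):
            raise ValueError("truncated varint")
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            # Continuation bit clear: this was the last byte of the varint
            return result, pos
        shift += 7
        if shift >= 64:
            raise ValueError("varint too long")


def iter_delimited(data: bytes):
    """Yield each length-prefixed payload from a fully decompressed buffer."""
    pos = 0
    while pos < len(data):
        size, pos = decode_varint(data, pos)  # pos now points at the payload
        end = pos + size
        if end > len(data):
            raise ValueError("truncated payload")
        yield data[pos:end]
        pos = end
```

This assumes the whole file has been decompressed into one buffer first (for example with `gzip.open(path, "rb").read()`), so that no message straddles a chunk boundary.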
Yet when I look at the raw buffer "message_data", it looks very different from the raw message data I see in the C++ debugger. I have no idea what could cause this difference.
Can anybody give me a hint on what could be wrong here?
Much appreciated,
Moose