framing header message -- how to know length?


John Lilley

Jul 7, 2018, 2:26:28 PM
to Protocol Buffers
I am posting protobuf messages to a message broker, and in order to identify them, I prefix the message bytes with the serialized result of a "header" message:

message Header {
   int32 version = 1;
   string message_type = 2;
}

It is easy to concatenate the header and message bytes and post the resulting block to a queue. But how do I take them apart on the receiving end? Suppose I get a byte buffer consisting of:

---------------
| header      |
---------------
| body        |
---------------

Is it OK to throw this oversized buffer at the Header deserialization?  Will the extra bytes hurt anything?

Then, once I extract the Header message, how do I know where the body begins? I could turn around and ask the Header object "how big would you be if serialized?".  Is that reliable?  Is there a better way?

Thanks
john

Ilia Mirkin

Jul 7, 2018, 2:45:58 PM
to John Lilley, Protocol Buffers
You need explicit lengths. Usually this is done as <header length
varint><header><body>. And the header contains the body length in it.
In Java, there's a CodedInputStream/OutputStream which makes it easy
to consume fixed lengths (push/popLimit) as well as raw varints (as
for the initial header length). Other languages have similar
abstractions.
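
To make the layout concrete, here is a minimal sketch of <header length varint><header><body> using only the JDK. The class name VarintFraming and its methods are made up for illustration; in real code CodedOutputStream/CodedInputStream would do the varint work. The header and body are treated as opaque byte arrays, and the body is assumed to run to the end of the buffer, which only holds when the broker delivers one packet per buffer.

```java
import java.io.ByteArrayOutputStream;
import java.util.Arrays;

// Hypothetical helper, not part of protobuf.
final class VarintFraming {
    // Base-128 varint: 7 bits per byte, low group first,
    // high bit set on every byte except the last.
    static void writeVarint32(ByteArrayOutputStream out, int value) {
        while ((value & ~0x7F) != 0) {
            out.write((value & 0x7F) | 0x80);
            value >>>= 7;
        }
        out.write(value);
    }

    // Layout: <header length varint><header bytes><body bytes>
    static byte[] frame(byte[] header, byte[] body) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writeVarint32(out, header.length);
        out.write(header, 0, header.length);
        out.write(body, 0, body.length);
        return out.toByteArray();
    }

    // Returns {header, body}. Assumption: the body runs to the end of the packet.
    static byte[][] split(byte[] packet) {
        int pos = 0;
        int headerLength = 0;
        int shift = 0;
        while (true) {
            if (pos >= packet.length || shift > 28) {
                throw new IllegalArgumentException("malformed header length");
            }
            int b = packet[pos++] & 0xFF;
            headerLength |= (b & 0x7F) << shift;
            if ((b & 0x80) == 0) {
                break;
            }
            shift += 7;
        }
        if (headerLength < 0 || headerLength > packet.length - pos) {
            throw new IllegalArgumentException("header length exceeds packet");
        }
        byte[] header = Arrays.copyOfRange(packet, pos, pos + headerLength);
        byte[] body = Arrays.copyOfRange(packet, pos + headerLength, packet.length);
        return new byte[][] {header, body};
    }
}
```

With real messages, header and body would be the results of toByteArray(), and the two halves returned by split() would be handed to the matching parseFrom(byte[]) calls.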

John Lilley

Jul 7, 2018, 4:02:17 PM
to imi...@alum.mit.edu, prot...@googlegroups.com
Thanks!
Given that, is there any advantage to a "header message" as opposed to just hand-serializing everything in the header?

John Lilley

Jul 7, 2018, 4:04:10 PM
to imi...@alum.mit.edu, prot...@googlegroups.com
Does protobuf include utility methods for direct ser/deser on varint, string, etc?
Thanks
john

Ilia Mirkin

Jul 7, 2018, 4:13:46 PM
to John Lilley, Protocol Buffers
CodedInputStream/CodedOutputStream have all that: readInt32, etc.

There's no strict advantage... but presumably you're using protobuf to
make your life easier, and this will make your life easier. (With a
string, you have to include the length, etc. And if the header ever
changes and you want to have back/forward compat, it can be
convenient.)

John Lilley

Jul 7, 2018, 4:16:46 PM
to imi...@alum.mit.edu, prot...@googlegroups.com
OK in Java I've found the classes UInt32Value, StringValue, etc.
C++ isn't quite so obvious. Where should I look for those classes?
Thanks
john

Ilia Mirkin

Jul 7, 2018, 4:17:57 PM
to John Lilley, Protocol Buffers

John Lilley

Jul 7, 2018, 4:18:53 PM
to imi...@alum.mit.edu, prot...@googlegroups.com
Got it, thanks!
john

John Lilley

Jul 7, 2018, 4:34:30 PM
to imi...@alum.mit.edu, prot...@googlegroups.com
If I want to write header+body to an array of bytes (in C++), is the easiest thing to use StringOutputStream, then copy its buffer when finished?
I also looked at ArrayOutputStream, but at first read it appears to require knowledge of the output size before constructing the stream.  True?
Thanks
john

Ilia Mirkin

Jul 7, 2018, 4:41:10 PM
to John Lilley, Protocol Buffers
It's been nearly a decade since I've looked at those APIs closely,
hopefully someone else can elaborate. If what you want is a data
buffer you can then copy somewhere else, you either need something
dynamically sizeable, or just decree what the max size is, and
allocate that. Even better would be to avoid the additional copy, but
perhaps your message broker won't allow that.

John Lilley

Jul 7, 2018, 5:04:34 PM
to imi...@alum.mit.edu, prot...@googlegroups.com
Thanks, I'll try the StringOutputStream and see what happens.  Double-copying isn't a big concern for us because our protocol is not high-frequency.

John Lilley

Jul 10, 2018, 11:20:19 AM
to Protocol Buffers
Just to wrap this up, here's what I found:
You really have to know how big the header message is, because if you simply wrap an input stream around the entire concatenated buffer, the deserializer cannot tell where the header ends.  This strikes me as a shortcoming of protobuf, but it is what it is.  I used code something like this (in Java):

protected static byte[] makePacket(Message header, Message message) throws IOException {
   // We cannot simply concatenate the two messages; protobuf will fail to deserialize them.
   // So we add the header length at the front.  But we don't know it yet...
   ByteArrayOutputStream os = new ByteArrayOutputStream();
   os.write(new byte[4]);   // we will update this later
   CodedOutputStream cos = CodedOutputStream.newInstance(os);
   header.writeTo(cos);
   cos.flush();
   // NOW we know the header length
   int headerLength = os.size() - 4;
   message.writeTo(cos);
   cos.flush();
   byte[] result = os.toByteArray();
   // Set header size
   ByteBuffer bb = ByteBuffer.wrap(result);
   bb.order(ByteOrder.LITTLE_ENDIAN);
   bb.putInt(0, headerLength);
   return result;
}

Then for deserialization:

protected static Message getResponseMessage(MessageFactory messageFactory, byte[] packet) throws Exception {
   ByteBuffer bb = ByteBuffer.wrap(packet);
   bb.order(ByteOrder.LITTLE_ENDIAN);
   int headerLength = bb.getInt(0);
   CommonWrapper.ResponseHeader header;
   {
      InputStream is = new ByteArrayInputStream(packet, 4, headerLength);
      CodedInputStream cis = CodedInputStream.newInstance(is);
      header = CommonWrapper.ResponseHeader.parseFrom(cis);
   }
   InputStream is = new ByteArrayInputStream(packet, 4 + headerLength, packet.length - (4 + headerLength));
   return messageFactory.createMessage(header.getResponseMessageType(), is);
}
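
The fixed 4-byte little-endian prefix can also be exercised without any protobuf classes. The sketch below is an illustration only: FixedPrefixFraming and its method names are hypothetical, and header and body stand in for already-serialized message bytes. Because both sizes are known before writing, the buffer is allocated once and no placeholder has to be patched afterwards.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.Arrays;

// Hypothetical helper, not part of protobuf.
final class FixedPrefixFraming {
    static final int PREFIX_BYTES = 4;

    // Layout: <header length, 4 bytes little-endian><header><body>
    static byte[] makePacket(byte[] header, byte[] body) {
        ByteBuffer bb = ByteBuffer.allocate(PREFIX_BYTES + header.length + body.length);
        bb.order(ByteOrder.LITTLE_ENDIAN);
        bb.putInt(header.length);
        bb.put(header);
        bb.put(body);
        return bb.array();
    }

    static byte[] headerOf(byte[] packet) {
        int headerLength = headerLength(packet);
        return Arrays.copyOfRange(packet, PREFIX_BYTES, PREFIX_BYTES + headerLength);
    }

    static byte[] bodyOf(byte[] packet) {
        int headerLength = headerLength(packet);
        return Arrays.copyOfRange(packet, PREFIX_BYTES + headerLength, packet.length);
    }

    // Reads and validates the prefix so a corrupt length cannot index past the packet.
    private static int headerLength(byte[] packet) {
        if (packet.length < PREFIX_BYTES) {
            throw new IllegalArgumentException("packet shorter than length prefix");
        }
        int n = ByteBuffer.wrap(packet).order(ByteOrder.LITTLE_ENDIAN).getInt(0);
        if (n < 0 || n > packet.length - PREFIX_BYTES) {
            throw new IllegalArgumentException("header length out of range: " + n);
        }
        return n;
    }
}
```

The range check in headerLength() is the part most worth carrying over to real code, since the prefix comes from the wire and cannot be trusted.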

The MessageFactory is a hand-rolled class for creating messages from full name; but that's for a different thread.

john

Ilia Mirkin

Jul 10, 2018, 11:31:36 AM
to John Lilley, Protocol Buffers
So like I said... you have to write it as

<header length><header which contains message length><message>

To encode it, you can write the header length as a varint, all via the
coded input/output stream. Sample code for encoding:

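makePacket(Message message) {
    String name = ...; // there has gotta be a way of getting the typename from the message
    int length = message.getSerializedSize();
    Header h = Header.newBuilder().setName(name).setLength(length).build();
    CodedOutputStream os = CodedOutputStream.newInstance(...);
    os.writeInt32NoTag(h.getSerializedSize()); // header length as a raw varint
    h.writeTo(os);       // not writeMessageNoTag, which would add a second length prefix
    message.writeTo(os); // the body length is already carried in the header
    os.flush();
}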

And to decode:

Message readPacket() {
    CodedInputStream is = ...; // wraps the received bytes
    int hlen = is.readInt32();
    int limit = is.pushLimit(hlen);
    Header h = Header.parseFrom(is);
    is.popLimit(limit);
    limit = is.pushLimit(h.getLength());
    Message m = messageFactory.getBuilderForType(h.getName()).mergeFrom(is).build();
    is.popLimit(limit);
    return m;
}
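
Because the header carries the body length, several packets can sit back to back in one stream and still be separated. Here is a JDK-only sketch of that property. Everything in it is an assumption for illustration: the StreamFraming class, the Packet holder, and a hand-rolled header of <body length varint><type name UTF-8> standing in for the protobuf Header message. Reading exactly N bytes with readFully plays the role that pushLimit/popLimit play on a CodedInputStream.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of protobuf.
final class StreamFraming {
    /** Decoded packet: the type name from the header plus the raw body bytes. */
    static final class Packet {
        final String typeName;
        final byte[] body;

        Packet(String typeName, byte[] body) {
            this.typeName = typeName;
            this.body = body;
        }
    }

    static void writeVarint32(ByteArrayOutputStream out, int value) {
        while ((value & ~0x7F) != 0) {
            out.write((value & 0x7F) | 0x80);
            value >>>= 7;
        }
        out.write(value);
    }

    static int readVarint32(InputStream in) throws IOException {
        int result = 0;
        for (int shift = 0; shift < 35; shift += 7) {
            int b = in.read();
            if (b < 0) throw new EOFException("truncated varint");
            result |= (b & 0x7F) << shift;
            if ((b & 0x80) == 0) {
                if (result < 0) throw new IOException("negative length");
                return result;
            }
        }
        throw new IOException("varint too long");
    }

    // Wire layout: <header length varint><header><body>
    // Header for this sketch: <body length varint><type name, UTF-8>
    static void writePacket(ByteArrayOutputStream out, String typeName, byte[] body) {
        ByteArrayOutputStream header = new ByteArrayOutputStream();
        writeVarint32(header, body.length);
        byte[] name = typeName.getBytes(StandardCharsets.UTF_8);
        header.write(name, 0, name.length);

        byte[] headerBytes = header.toByteArray();
        writeVarint32(out, headerBytes.length);
        out.write(headerBytes, 0, headerBytes.length);
        out.write(body, 0, body.length);
    }

    static Packet readPacket(InputStream in) throws IOException {
        DataInputStream data = new DataInputStream(in); // unbuffered, so nothing is over-read
        byte[] header = new byte[readVarint32(data)];
        data.readFully(header);

        ByteArrayInputStream h = new ByteArrayInputStream(header);
        int bodyLength = readVarint32(h);
        int nameLength = h.available(); // whatever follows the varint is the name
        String typeName = new String(header, header.length - nameLength, nameLength, StandardCharsets.UTF_8);

        byte[] body = new byte[bodyLength];
        data.readFully(body);
        return new Packet(typeName, body);
    }
}
```

Calling readPacket twice on a stream holding two packets yields them in order and leaves the stream positioned at the end. A production reader would also cap the header and body lengths before allocating.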

And you're done. Protobuf is meant for serializing data, not framing
it. Your complaint about lack of framing is well taken, but ... it's
just not part of what protobuf does.

You could just as well have a

message TheOneTrueMessage {
    string name = 1;
    bytes data = 2;
}

And then always decode that -- this gives you your framing. But this
ends up causing lots of copies of the data.
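
To see why the envelope gives you framing, and where the copies come from, here is a hand-rolled sketch of its wire bytes, assuming name is field 1 and data is field 2. Each field is written as a tag byte, a varint length, and the payload, so the reader always knows where data ends. EnvelopeWire is a made-up class for illustration; real code would use the generated TheOneTrueMessage class instead.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper that mimics the wire bytes of the two-field envelope.
final class EnvelopeWire {
    // tag = (field number << 3) | wire type; wire type 2 is length-delimited.
    static final int NAME_TAG = (1 << 3) | 2; // 0x0A
    static final int DATA_TAG = (2 << 3) | 2; // 0x12

    static void writeVarint32(ByteArrayOutputStream out, int value) {
        while ((value & ~0x7F) != 0) {
            out.write((value & 0x7F) | 0x80);
            value >>>= 7;
        }
        out.write(value);
    }

    static byte[] encode(String name, byte[] data) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] nameBytes = name.getBytes(StandardCharsets.UTF_8);
        out.write(NAME_TAG);
        writeVarint32(out, nameBytes.length);
        out.write(nameBytes, 0, nameBytes.length);
        out.write(DATA_TAG);
        writeVarint32(out, data.length);
        out.write(data, 0, data.length); // first copy of the inner message bytes
        return out.toByteArray();        // second copy
    }

    // Returns {name bytes, data bytes}. Handles only the two fields of this sketch.
    static byte[][] decode(byte[] envelope) {
        byte[][] fields = {new byte[0], new byte[0]};
        int pos = 0;
        while (pos < envelope.length) {
            int tag = envelope[pos++] & 0xFF;
            if (tag != NAME_TAG && tag != DATA_TAG) {
                throw new IllegalArgumentException("unexpected tag " + tag);
            }
            int length = 0;
            int shift = 0;
            while (true) {
                if (pos >= envelope.length || shift > 28) {
                    throw new IllegalArgumentException("malformed length");
                }
                int b = envelope[pos++] & 0xFF;
                length |= (b & 0x7F) << shift;
                if ((b & 0x80) == 0) {
                    break;
                }
                shift += 7;
            }
            if (length < 0 || length > envelope.length - pos) {
                throw new IllegalArgumentException("field length exceeds envelope");
            }
            // Another copy on the way out, before the inner message is even parsed.
            fields[tag == NAME_TAG ? 0 : 1] = Arrays.copyOfRange(envelope, pos, pos + length);
            pos += length;
        }
        return fields;
    }
}
```

The inner message is serialized once to produce data, copied into the envelope, and copied out again on receipt before it can be parsed, which is the cost mentioned above.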

-ilia