decode messages that are in protocol buffer format


newbie

Oct 18, 2011, 6:03:38 AM
to Protocol Buffers
Hi,

I am developing a sniffer that will sniff messages moving between a
message broker and DWH. The messages are written in "protocol buffers"
serialization format. So the message body that I sniff is a byte
string.

How do I decode this message to human readable format?

The sniffer is developed in C# (.NET).

I tried using System.Text.Encoding.UTF8.GetString(body), but extra
characters get added, maybe because UTF-8 does not recognize this format.


Thanks.

Marc Gravell

Oct 18, 2011, 1:02:36 PM
to newbie, Protocol Buffers
Well, firstly protobuf is not a text format, so UTF-8 is not the way to start. What is it you need? Note that the protobuf format is ambiguous unless you already know the schema (the same data can be interpreted in different ways). However, if you read the encoding spec, you should be able to guess many cases.
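
To make the "read the encoding spec" advice concrete, here is a minimal sketch of the first step in reading raw protobuf bytes: each field starts with a varint key that packs the field number and the wire type. The class and method names (WireFormatDemo, ReadVarint) are illustrative assumptions, not part of any protobuf library, and the three sample bytes are the well-known example from the encoding documentation (field 1 holding the value 150).

```csharp
using System;

static class WireFormatDemo
{
    // Reads one base-128 varint starting at 'pos' and advances 'pos' past it.
    // Hypothetical helper written for this sketch, not a library API.
    static ulong ReadVarint(byte[] data, ref int pos)
    {
        ulong result = 0;
        int shift = 0;
        while (true)
        {
            if (pos >= data.Length) throw new FormatException("Truncated varint");
            if (shift >= 64) throw new FormatException("Varint too long");
            byte b = data[pos++];
            result |= (ulong)(b & 0x7F) << shift;   // low 7 bits carry the payload
            if ((b & 0x80) == 0) return result;     // high bit clear means last byte
            shift += 7;
        }
    }

    static void Main()
    {
        byte[] body = { 0x08, 0x96, 0x01 };

        int pos = 0;
        ulong key = ReadVarint(body, ref pos);
        int fieldNumber = (int)(key >> 3);  // upper bits of the key: field number
        int wireType = (int)(key & 7);      // lower 3 bits of the key: wire type
        ulong value = ReadVarint(body, ref pos);

        // Prints: field 1, wire type 0, value 150
        Console.WriteLine($"field {fieldNumber}, wire type {wireType}, value {value}");
    }
}
```

Note that the output says nothing about what field 1 means or whether 150 is an int32, a bool-like flag, or an enum value; that part still needs the schema.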

Marc

> --
> You received this message because you are subscribed to the Google Groups "Protocol Buffers" group.
> To post to this group, send email to prot...@googlegroups.com.
> To unsubscribe from this group, send email to protobuf+u...@googlegroups.com.
> For more options, visit this group at http://groups.google.com/group/protobuf?hl=en.
>

Aaron Rich

Oct 18, 2011, 1:09:16 PM
to newbie, Protocol Buffers
I would highly recommend looking at this:
http://code.google.com/p/protobuf-wireshark/

It might get you what you need.

-Aaron

newbie

Oct 19, 2011, 7:00:06 AM
to Protocol Buffers
What I tried to do is this.

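using System.IO;
using ProtoBuf;
...
byte[] body = e.Body;
using (MemoryStream memStream = new MemoryStream(body))
{
    // A new MemoryStream already starts at position 0, so no Seek is needed.
    // Deserialize needs the concrete class generated from the .proto file.
    MyProtocoClass message = Serializer.Deserialize<MyProtocoClass>(memStream);
}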

However, I cannot deserialize with the protobuf-net library this way,
because it requires a reference to the class generated from the .proto
file, and there are 30 different .proto files.

Now I am trying to do this.

using Google.ProtocolBuffers;
...
CodedInputStream s = CodedInputStream.CreateInstance(body);
string data = s.ReadString();

But this returns only the first string in the message body.
Please tell me what to do to get the complete message.

Thanks.


Marc Gravell

Oct 19, 2011, 8:23:37 AM
to newbie, Protocol Buffers
It is still a little unclear what you are trying to do; it is a little painful to parse out the data if you don't know the message-type in advance. However, many implementations will have an API to consume raw protobuf data - it sounds like CodedInputStream is such; as is (in the case of your earlier protobuf-net attempt) ProtoReader. That, however, will not *directly* let you reliably reconstruct a message, unless you know the schema details (are specific integers zig-zag, for example).
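
As a small illustration of the zig-zag point, the sketch below decodes the same raw varint values two ways: as a plain integer and as a zig-zag (sint) integer. The decoding formula is the one given in the protobuf encoding documentation; the class and method names are assumptions made for this example.

```csharp
using System;

static class ZigZagDemo
{
    // Zig-zag decoding as used for sint32/sint64 fields:
    // even raw values map to non-negative numbers, odd raw values to negative ones.
    static long DecodeZigZag(ulong n) => (long)(n >> 1) ^ -(long)(n & 1);

    static void Main()
    {
        foreach (ulong raw in new ulong[] { 1, 2, 3, 150 })
        {
            // Both readings are valid for the same wire bytes; only the schema decides.
            Console.WriteLine(
                $"raw {raw}: as int64 = {(long)raw}, as sint64 = {DecodeZigZag(raw)}");
        }
    }
}
```

For example, the raw value 3 is 3 if the field was declared int64 but -2 if it was declared sint64, and nothing on the wire distinguishes the two.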

Perhaps if you could be more specific as to what the data is that was sent? Note: protobuf includes virtually no "meta" on the wire; all you can find is, for example, "field 2 was 4 bytes, which could have been any of [these] values, depending on whether it was a float, an int, a zig-zag int, etc; field 5 was length-prefixed and 17 bytes - it *looks* like it might be the UTF-8 string "blah blah blah", but it could also have been a sub-message, or a packed array". All of *that* type of data (minus the interpretation) should be available via those APIs.
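
The kind of schema-less listing described here can be sketched without any protobuf library at all. The code below walks the top-level fields of a buffer and reports the field number, wire type, and raw bytes, plus a UTF-8 guess for length-delimited fields. It is a rough sketch under stated assumptions: all names (RawProtoDump, ReadVarint, Require, Dump) are hypothetical, groups (wire types 3 and 4) are not handled, and the sample bytes are hand-built for illustration rather than taken from the original poster's traffic.

```csharp
using System;
using System.Text;

static class RawProtoDump
{
    // Reads one base-128 varint starting at 'pos' and advances 'pos' past it.
    static ulong ReadVarint(byte[] data, ref int pos)
    {
        ulong result = 0;
        int shift = 0;
        while (true)
        {
            if (pos >= data.Length) throw new FormatException("Truncated varint");
            if (shift >= 64) throw new FormatException("Varint too long");
            byte b = data[pos++];
            result |= (ulong)(b & 0x7F) << shift;
            if ((b & 0x80) == 0) return result;
            shift += 7;
        }
    }

    // Throws if fewer than 'count' bytes remain at 'pos'.
    static void Require(byte[] data, int pos, int count)
    {
        if (count < 0 || count > data.Length - pos)
            throw new FormatException("Truncated field");
    }

    // Lists every top-level field without deciding what it means.
    static void Dump(byte[] data)
    {
        int pos = 0;
        while (pos < data.Length)
        {
            ulong key = ReadVarint(data, ref pos);
            int field = (int)(key >> 3);
            int wireType = (int)(key & 7);
            int size;
            switch (wireType)
            {
                case 0: // varint: int, bool, enum, or zig-zag int
                    ulong v = ReadVarint(data, ref pos);
                    Console.WriteLine($"field {field}: varint {v}");
                    continue;
                case 1: size = 8; break; // 64-bit: fixed64, sfixed64, or double
                case 5: size = 4; break; // 32-bit: fixed32, sfixed32, or float
                case 2: // length-delimited: string, bytes, sub-message, or packed array
                    size = checked((int)ReadVarint(data, ref pos));
                    break;
                default: // groups (3, 4) and invalid types are out of scope here
                    throw new FormatException($"Unsupported wire type {wireType}");
            }
            Require(data, pos, size);
            string hex = size == 0 ? "" : BitConverter.ToString(data, pos, size);
            string note = wireType == 2
                ? $" (if UTF-8 text: \"{Encoding.UTF8.GetString(data, pos, size)}\")"
                : "";
            Console.WriteLine($"field {field}: wire type {wireType}, {size} bytes [{hex}]{note}");
            pos += size;
        }
    }

    static void Main()
    {
        // Field 1 as a varint, then field 2 as a length-delimited value.
        byte[] body =
        {
            0x08, 0x96, 0x01,
            0x12, 0x07, 0x74, 0x65, 0x73, 0x74, 0x69, 0x6E, 0x67
        };
        Dump(body);
    }
}
```

With the sample bytes this reports field 1 as the varint 150 and field 2 as a 7-byte value that reads as "testing" when treated as UTF-8. The UTF-8 reading is only a guess: the same 7 bytes could equally be raw bytes, a nested message, or a packed repeated field.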

Marc

newbie

Oct 20, 2011, 11:27:23 PM
to Protocol Buffers
I got the schema details and am now able to decode the message.

Thanks.
