Automatically Generated Wireshark/Ethereal plugins

Dilip Joseph

Jul 24, 2008, 3:25:40 AM
to Protocol Buffers
Hello,

I have written a Python script to generate and install dissector
plugins for viewing messages packed with Protocol Buffers in
Wireshark.

http://www.cs.berkeley.edu/~dilip/software/wireshark_protocolbufs/

Currently, a generated dissector only displays a text representation
of a message (screenshot available at above URL). I am still working
on displaying a message as a tree with collapsible sub-trees.

If you have any trouble using the script, please email me. Please
email me any suggestions/comments, as well.

Thank you,

Regards

Dilip

Kenton Varda

Jul 24, 2008, 1:48:06 PM
to Dilip Joseph, Protocol Buffers
Cool!

Does it allow you to print the numeric tag/value pairs for a message if you don't have the .proto file available?  In C++ and Java, you can accomplish this by parsing the data into a message with no fields, then printing that using TextFormat -- the unknown fields will be printed as tag/value pairs.  Python doesn't currently support retaining unknown fields, though.

Dilip Joseph

Jul 24, 2008, 3:55:57 PM
to Protocol Buffers
Yes. The wireshark dissector can print the numeric tag/value pairs for
a message without the .proto file.

The Python script only generates the C/C++ glue code required to
integrate the Protocol Buffers C++ parsing code into Wireshark. The
complete C++ Protocol Buffers functionality is available to the dissector.

However, I was able to obtain only the first level of numeric
tag/value pairs using TextFormat::PrintToString() on a message with no
fields.
For example, the Wireshark output for the AddressBook example is
below:

1: "\n\005Dilip\020\001\032\035dilip.an...@gmail.com"
1: "\n\004Mary\020\002\032\016...@email.com"

Is there some other TextFormat::Print() function that can
print the tag/value pairs for embedded messages? Is this even
feasible without having the .proto file?
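
To illustrate why only the first level shows up, here is a minimal, self-contained sketch of a top-level wire-format reader. It is not the Protocol Buffers library code; `RawField`, `ReadVarint` and `ParseTopLevel` are hypothetical names made up for this example, and groups are not handled. A length-delimited field comes back as opaque bytes, which is exactly what gets printed as an escaped string above.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// One top-level field as it appears on the wire (hypothetical helper type).
struct RawField {
  uint32_t number;     // field number taken from the tag
  uint32_t wire_type;  // 0 = varint, 1 = 64-bit, 2 = length-delimited, 5 = 32-bit
  uint64_t varint;     // value when wire_type == 0
  std::string bytes;   // raw payload when wire_type is 1, 2 or 5
};

// Reads a base-128 varint at *pos. Returns false on truncated or overlong input.
bool ReadVarint(const std::string& in, size_t* pos, uint64_t* out) {
  uint64_t result = 0;
  for (int shift = 0; shift < 64; shift += 7) {
    if (*pos >= in.size()) return false;
    uint8_t byte = static_cast<uint8_t>(in[(*pos)++]);
    result |= static_cast<uint64_t>(byte & 0x7F) << shift;
    if ((byte & 0x80) == 0) {
      *out = result;
      return true;
    }
  }
  return false;
}

// Splits a serialized message into its top-level fields without any schema.
// Returns false on malformed input; groups (wire types 3 and 4) are rejected.
bool ParseTopLevel(const std::string& in, std::vector<RawField>* fields) {
  size_t pos = 0;
  while (pos < in.size()) {
    uint64_t tag = 0;
    if (!ReadVarint(in, &pos, &tag)) return false;
    RawField f{};
    f.number = static_cast<uint32_t>(tag >> 3);
    f.wire_type = static_cast<uint32_t>(tag & 7);
    if (f.number == 0) return false;
    switch (f.wire_type) {
      case 0:
        if (!ReadVarint(in, &pos, &f.varint)) return false;
        break;
      case 1:
      case 5: {
        size_t n = (f.wire_type == 1) ? 8 : 4;
        if (in.size() - pos < n) return false;
        f.bytes = in.substr(pos, n);
        pos += n;
        break;
      }
      case 2: {
        uint64_t len = 0;
        if (!ReadVarint(in, &pos, &len)) return false;
        if (len > in.size() - pos) return false;
        f.bytes = in.substr(pos, static_cast<size_t>(len));
        pos += static_cast<size_t>(len);
        break;
      }
      default:
        return false;
    }
    fields->push_back(f);
  }
  return true;
}
```

Calling `ParseTopLevel` on an address book yields one field numbered 1 whose `bytes` hold the still-encoded person; nothing on the wire says whether those bytes are a string or a nested message.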

Tonight, I will upload a new package containing:

1) a generic .proto independent wireshark dissector that displays
numeric tag/value pairs.
2) python scripts to create wireshark dissectors specific to .proto
files (from package released yesterday).

Regards
Dilip

Kenton Varda

Jul 24, 2008, 4:31:50 PM
to Dilip Joseph, Protocol Buffers
On Thu, Jul 24, 2008 at 12:55 PM, Dilip Joseph <dilip.ant...@gmail.com> wrote:

> Yes. The wireshark dissector can print the numeric tag/value pairs for
> a message without the .proto file.
>
> The python script only generates the C/C++ glue code required to
> integrate Protocol Buffers C++ parsing code into Wireshark.  Complete
> C++ Protocol Buffer code functionality is available to the dissector.

It looks like your solution is based on compiling the .proto file into .pb.cc and using that.

Another strategy you could use would be to parse the .proto file dynamically at runtime.  You can do this by using protobuf::compiler::Importer to parse the .proto file into a protobuf::FileDescriptor and protobuf::DynamicMessageFactory to construct a protobuf::Message instance based on that FileDescriptor.  You can then use that to parse the message and give it to TextFormat.

Not sure if this would work better than what you have.  Just throwing it out there.
 
> However, I was able to obtain only the first level of numeric
> tag/value pairs using TextFormat::PrintToString() on a message with no
> fields.
> For example, the Wireshark output for the AddressBook example is
> below:
>
> 1: "\n\005Dilip\020\001\032\035dilip.an...@gmail.com"
> 1: "\n\004Mary\020\002\032\016...@email.com"
>
> Is there some other TextFormat::Print() function that can
> print the tag/value pairs for embedded messages?  Is this even
> feasible without having the .proto file?

The wire format does not distinguish between strings and embedded messages.  However, you could heuristically assume that a length-delimited field is an embedded message if it parses successfully as a protocol message.  Random bytes are not likely to parse successfully.
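
A rough, self-contained sketch of that heuristic follows. It does not use the Protocol Buffers library; `ReadVarint` and `PrintFields` are hypothetical names, the brace-style output is just one possible layout, strings are printed without escaping, and groups are treated as parse failures. A length-delimited payload is printed as a nested block only if the whole payload parses as a sequence of fields; otherwise it is printed as a string.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Reads a base-128 varint at *pos. Returns false on truncated or overlong input.
bool ReadVarint(const std::string& in, size_t* pos, uint64_t* out) {
  uint64_t result = 0;
  for (int shift = 0; shift < 64; shift += 7) {
    if (*pos >= in.size()) return false;
    uint8_t byte = static_cast<uint8_t>(in[(*pos)++]);
    result |= static_cast<uint64_t>(byte & 0x7F) << shift;
    if ((byte & 0x80) == 0) {
      *out = result;
      return true;
    }
  }
  return false;
}

// Appends "number: value" lines for every field in `in` to *out.
// Returns false if `in` is not a well-formed field sequence that ends exactly
// at the end of the buffer; *out may then hold partial text to be discarded.
bool PrintFields(const std::string& in, int indent, std::string* out) {
  const std::string pad(static_cast<size_t>(indent), ' ');
  size_t pos = 0;
  while (pos < in.size()) {
    uint64_t tag = 0, value = 0;
    if (!ReadVarint(in, &pos, &tag) || (tag >> 3) == 0) return false;
    std::string line = pad + std::to_string(tag >> 3);
    switch (tag & 7) {
      case 0:  // varint
        if (!ReadVarint(in, &pos, &value)) return false;
        line += ": " + std::to_string(value) + "\n";
        break;
      case 1:    // fixed 64-bit
      case 5: {  // fixed 32-bit
        size_t n = ((tag & 7) == 1) ? 8 : 4;
        if (in.size() - pos < n) return false;
        line += ": <" + std::to_string(n) + " raw bytes>\n";
        pos += n;
        break;
      }
      case 2: {  // length-delimited: string, bytes, or embedded message
        if (!ReadVarint(in, &pos, &value) || value > in.size() - pos) {
          return false;
        }
        std::string payload = in.substr(pos, static_cast<size_t>(value));
        pos += static_cast<size_t>(value);
        std::string nested;
        // Heuristic: treat the payload as a message if it parses as one.
        // An empty payload is ambiguous and is printed as a string here.
        if (!payload.empty() && PrintFields(payload, indent + 2, &nested)) {
          line += " {\n" + nested + pad + "}\n";
        } else {
          line += ": \"" + payload + "\"\n";
        }
        break;
      }
      default:  // groups are not handled in this sketch
        return false;
    }
    *out += line;
  }
  return true;
}
```

The heuristic can still misfire: a short string whose bytes happen to form valid tags and values would be shown as a nested message.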

Dilip Joseph

Jul 25, 2008, 2:33:20 AM
to Kenton Varda, Protocol Buffers
On 7/24/08, Kenton Varda <ken...@google.com> wrote:
> The wire format does not distinguish between strings and embedded messages.
> However, you could heuristically assume that a length-delimited field is an
> embedded message if it parses successfully as a protocol message. Random
> bytes are not likely to parse successfully.
>

The heuristic seems to work well. I had to make the following minor
modifications to TextFormat::PrintUnknownFields() to implement the
heuristic. Is there some existing function that already implements
this heuristic? If not, is it possible to add one to the codebase? It
appears to be very useful for scenarios where one doesn't have the
.proto file, and only requires minor code modifications listed below:

<code>
for (int j = 0; j < field.length_delimited_size(); j++) {
  generator.Print(field_number);
  EmptyMessage embedded_msg;
  // The empty_message.pb.h and empty_message.pb.cc files generated by
  // protoc are included in Makefile.am and thus added to libprotobuf.
  // #include <google/protobuf/empty_message.pb.h> is used earlier.
  string field_str = field.length_delimited(j);
  if (embedded_msg.ParseFromArray(field_str.data(), field_str.size())) {
    // The new action
    generator.Print(":\n");
    generator.Indent();
    Print(embedded_msg.GetDescriptor(),
          embedded_msg.GetReflection(), generator);
    generator.Outdent();
    generator.Print("\n");
  } else {
    // The original action
    generator.Print(": \"");
    generator.Print(CEscape(field.length_delimited(j)));
    generator.Print("\"\n");
  }
}
</code>

--
_________________________________________
Dilip Antony Joseph
Graduate Student
Computer Science Division,
University of California, Berkeley
http://www.cs.berkeley.edu/~dilip

Kenton Varda

Jul 25, 2008, 6:27:41 PM
to Dilip Joseph, Protocol Buffers
This seems like it would be a pretty useful addition to TextFormat.  As far as the code goes, you can actually directly call methods of WireFormat to parse the embedded message to an UnknownFieldSet, rather than parse to an EmptyMessage.  If you'd like to make this change and send me a patch, I'll apply it.

Dilip Joseph

Jul 27, 2008, 11:30:00 PM
to Kenton Varda, Protocol Buffers
I tried parsing an embedded Message to an UnknownFieldSet using
WireFormat::SkipMessage(). The heuristic doesn't work well in this
case - SkipMessage() sometimes returns true even when parsing regular
strings, and thus the heuristic wrongly considers a regular string as
an embedded message. Parsing the message into an EmptyMessage didn't
have this problem.

I could fix this problem by adding an input->ConsumedEntireMessage()
check in WireFormat::SkipMessage() [code at end of email]. I couldn't
find documentation for the return value semantics of SkipMessage().
Is this an acceptable change? Am I missing some other way to use
WireFormat to parse a message into an UnknownFieldSet?

I will send the TextFormat patch as soon as the above issue is resolved.

Regards

Dilip

<code>
bool WireFormat::SkipMessage(io::CodedInputStream* input,
                             UnknownFieldSet* unknown_fields) {
  while (true) {
    uint32 tag = input->ReadTag();
    if (tag == 0) {
      // End of input. This is a valid place to end, so return true.
      return true;
    }

    WireType wire_type = GetTagWireType(tag);

    if (wire_type == WIRETYPE_END_GROUP) {
      // Must be the end of the message.
      if (!input->ConsumedEntireMessage()) return false;  // ADDED by Dilip
      return true;
    }

    if (!SkipField(input, tag, unknown_fields)) return false;
  }
}

</code>

Kenton Varda

Jul 28, 2008, 12:16:07 AM
to Dilip Joseph, Protocol Buffers
It's the responsibility of the caller to call ConsumedEntireMessage() after SkipMessage() because the UnknownFieldSet may be representing a group, in which case ConsumedEntireMessage() is wrong (LastTagWas() should be used instead).  So, call ConsumedEntireMessage() after SkipMessage() returns.
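
The division of labour can be sketched with a simplified, self-contained model. This is not the library's code: `Stop`, `SkipFields`, `SkipReturnsTrue` and `LooksLikeMessage` are hypothetical names, and nested groups are not handled. The skip loop reports why it stopped; a caller parsing a whole message accepts only a stop at the true end of input, which plays the role of the ConsumedEntireMessage() check, while a caller parsing a group would instead accept the early stop.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Why the skip loop stopped (hypothetical enum for this sketch).
enum class Stop {
  kEndOfInput,    // ran out of bytes exactly at a field boundary
  kStoppedEarly,  // hit a zero tag or an END_GROUP tag before the end
  kMalformed      // truncated or invalid field
};

// Reads a base-128 varint at *pos. Returns false on truncated or overlong input.
bool ReadVarint(const std::string& in, size_t* pos, uint64_t* out) {
  uint64_t result = 0;
  for (int shift = 0; shift < 64; shift += 7) {
    if (*pos >= in.size()) return false;
    uint8_t byte = static_cast<uint8_t>(in[(*pos)++]);
    result |= static_cast<uint64_t>(byte & 0x7F) << shift;
    if ((byte & 0x80) == 0) {
      *out = result;
      return true;
    }
  }
  return false;
}

// Skips fields until the input ends, a stop tag is seen, or an error occurs.
Stop SkipFields(const std::string& in) {
  size_t pos = 0;
  while (true) {
    if (pos == in.size()) return Stop::kEndOfInput;
    uint64_t tag = 0, n = 0;
    if (!ReadVarint(in, &pos, &tag)) return Stop::kMalformed;
    if (tag == 0 || (tag & 7) == 4) return Stop::kStoppedEarly;
    switch (tag & 7) {
      case 0:  // varint: reading it is the same as skipping it
        if (!ReadVarint(in, &pos, &n)) return Stop::kMalformed;
        n = 0;
        break;
      case 1: n = 8; break;  // fixed 64-bit
      case 5: n = 4; break;  // fixed 32-bit
      case 2:                // length-delimited: n is the payload length
        if (!ReadVarint(in, &pos, &n)) return Stop::kMalformed;
        break;
      default:               // START_GROUP nesting is omitted in this sketch
        return Stop::kMalformed;
    }
    if (n > in.size() - pos) return Stop::kMalformed;
    pos += static_cast<size_t>(n);
  }
}

// What a bare "did the skip succeed" result would say.
bool SkipReturnsTrue(const std::string& in) {
  return SkipFields(in) != Stop::kMalformed;
}

// Skip result plus the caller-side "consumed everything" check.
bool LooksLikeMessage(const std::string& in) {
  return SkipFields(in) == Stop::kEndOfInput;
}
```

In this model the string "Dilip" makes the skip loop succeed immediately, because its first byte 'D' (0x44) decodes as an END_GROUP tag for field 8, and only the caller-side check rejects it. "Mary" fails even without the check, since 'M' (0x4D) announces a 32-bit field and only three bytes follow.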

Dilip Joseph

Jul 28, 2008, 2:52:02 AM
to Kenton Varda, Protocol Buffers
Hi Kenton,

Thanks for the info.

Here is the patch that uses the heuristic you suggested to display
numeric key-value pairs for embedded fields, when the .proto is not
available. I have added Print() functions which take in just the
message bytes and length, similar to the existing ones that take in
Message/Reflection/Descriptor objects.

I have submitted the online individual contribution license form.

Regards
Dilip

text-format.diff

Kenton Varda

Jul 28, 2008, 12:56:16 PM
to Dilip Joseph, Protocol Buffers
Thanks, Dilip.  Can you upload this to http://codereview.appspot.com/ and send it to me as a code review?  At Google we require that every change be reviewed by another engineer before submission.

One thing that we'll definitely need before this can be submitted is tests.  Can you add tests to text_format_unittest.cc for all these cases:
- A field that looks like a sub-message, and is printed as one.
- A field that would have parsed correctly as a sub-message if not for the ConsumedEntireMessage() check.
- A field that would not have parsed correctly as a sub-message even without the ConsumedEntireMessage() check.

Also, your code will need to conform to the Google C++ style guide:


In particular, code lines should be no longer than 80 characters.  You don't actually have to read the style guide if you don't want to; I'll tell you in the code review if there are any other problems.

Thanks again; this will be a very useful change!

Kenton Varda

Jul 28, 2008, 12:58:34 PM
to Dilip Joseph, Protocol Buffers
On Mon, Jul 28, 2008 at 9:56 AM, Kenton Varda <ken...@google.com> wrote:
> Thanks, Dilip.  Can you upload this to http://codereview.appspot.com/ and send it to me as a code review?  At Google we require that every change be reviewed by another engineer before submission.

(to clarify:  You'll want to create a new "issue" and then upload your patch.)

Kenton Varda

Aug 5, 2008, 9:16:29 PM
to Dilip Joseph, Protocol Buffers
After code review, this has been committed as revision 31.