Any way to "extract all" using protoc or any other tool?

Jim Baldwin

Dec 9, 2017, 5:52:10 PM
to Protocol Buffers
I have a protobuf file, and a .proto file that describes the schema.

The .proto describes dozens of different messages that may be in the protobuf file.

I would like to know what messages can be found in the file.  I do a protoc --decode_raw and get something out, but I don't see how to use that to figure out how to extract messages from the file.

I assume there's something I don't get about protobufs, but it seems to me I should be able to take a protobuf data file and corresponding .proto and turn it into a file that lets me see what the message hierarchy is in the file.  JSON would be a great way to do that.

What am I missing?

Ilia Mirkin

Dec 9, 2017, 7:20:15 PM
to Jim Baldwin, Protocol Buffers
An encoded protobuf is just a sequence of (tag, value) pairs. If you
don't know which proto it is, decode_raw is the best you can do. If
you do know which proto it is, you can use --decode instead and pass
it a proto name to use for the decoding.

Cheers,

-ilia
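
To make the "(tag, value) pairs" description concrete, here is a minimal Python sketch of how one string field ends up on the wire. The helper names `encode_varint` and `encode_string_field` are made up for illustration and are not part of any protobuf library; the sketch only assumes the standard wire layout, where the key is `(field_number << 3) | wire_type` and wire type 2 means "length-delimited".

```python
def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a base-128 varint."""
    out = bytearray()
    while True:
        low7 = value & 0x7F
        value >>= 7
        if value:
            out.append(low7 | 0x80)  # continuation bit: more bytes follow
        else:
            out.append(low7)
            return bytes(out)


def encode_string_field(field_number: int, text: str) -> bytes:
    """Encode one length-delimited (wire type 2) field: key, length, payload."""
    payload = text.encode("utf-8")
    key = encode_varint((field_number << 3) | 2)
    return key + encode_varint(len(payload)) + payload


print(encode_string_field(1, "abc").hex())    # 0a03616263
print(encode_varint((100 << 3) | 2).hex())    # a206, the key for field 100
```

Note that only the field number and wire type go into the key. Neither the field name nor the message type name appears anywhere in the bytes.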

Jim Baldwin

Dec 10, 2017, 10:14:53 AM
to Protocol Buffers
It's not really just a sequence; it's a hierarchy, isn't it?  Why can't I use --decode <root> or something like that?

Marc Gravell

Dec 10, 2017, 11:23:12 AM
to Jim Baldwin, Protocol Buffers
You can, and it does. The problem is that the wire format by itself doesn't tell it **what message type** the root object is, so you need to tell it via the additional parameter to --decode.

--
You received this message because you are subscribed to the Google Groups "Protocol Buffers" group.
To unsubscribe from this group and stop receiving emails from it, send an email to protobuf+unsubscribe@googlegroups.com.
To post to this group, send email to prot...@googlegroups.com.
Visit this group at https://groups.google.com/group/protobuf.
For more options, visit https://groups.google.com/d/optout.

Jim Baldwin

Dec 10, 2017, 11:23:24 AM
to Protocol Buffers
Perhaps it might help if I understood the output of protoc --decode_raw. 

Here's an example of a .caffemodel file I'm trying to inspect.  Is there a description of what the numbers mean in this file?

1: "VGG_ILSVRC_16_layers"
100 {
  1: "input-data"
  2: "Python"
  4: "data"
  4: "im_info"
  4: "gt_boxes"
  10: 0
  130 {
    1: "roi_data_layer.layer"
    2: "RoIDataLayer"
    3: "\'num_classes\': 2"
  }
}
100 {
  1: "data_input-data_0_split"
  2: "Split"
  3: "data"
  4: "data_input-data_0_split_0"
  4: "data_input-data_0_split_1"
  10: 0

Marc Gravell

Dec 10, 2017, 11:25:05 AM
to Jim Baldwin, Protocol Buffers
They are field numbers. They don't mean anything by themselves other than to identify each field. If you want to know the logical *name* of each field, you need the .proto schema.
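
A rough Python sketch of that point: the same decoded pairs only get names once a schema supplies them. The `FIELD_NAMES` table and the `label` helper are hypothetical stand-ins for what the .proto would provide; the names in the table are guesses, not taken from the actual schema.

```python
# Hypothetical field-number -> name table; the real one comes from the .proto.
FIELD_NAMES = {1: "name", 100: "layer"}


def label(pairs, names):
    """Replace field numbers with names wherever the table knows them."""
    return [
        (names.get(number, f"<unknown {number}>"), value)
        for number, value in pairs
    ]


# Top-level pairs as they appear in the --decode_raw output above.
raw_pairs = [(1, "VGG_ILSVRC_16_layers"), (100, "{...}")]

print(label(raw_pairs, FIELD_NAMES))  # names filled in from the table
print(label(raw_pairs, {}))           # no schema: only the numbers are known
```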


Jim Baldwin

Dec 10, 2017, 11:32:27 AM
to Protocol Buffers
OK, this helps.  I need to figure out what the "root" message is.  It seems like an omission in the whole PB thing that you can't specify the .proto and do a --decode_everything.

Marc Gravell

Dec 10, 2017, 11:35:36 AM
to Jim Baldwin, Protocol Buffers
As I understand it, --decode *will do that*. It doesn't decode *just* the root, but it needs to know the root message type in order to correctly interpret the data.


Jim Baldwin

Dec 10, 2017, 11:42:23 AM
to Marc Gravell, Jim Baldwin, Protocol Buffers
In this case, it looks like there isn’t a root.  Rather, this format is a series of top-level parameters.  So, I have to give it the Parameter I’m looking for.  The problem I have with this is the order of parameters _might_ matter, and I lose that by only looking for one.

Marc Gravell

Dec 10, 2017, 11:45:13 AM
to Jim Baldwin, Jim Baldwin, Protocol Buffers
Oh, there is a root. If I had to guess, it has something like:


message TheRootMessageType {
    required string name = 1;
    repeated SomeNoun items = 100;
}


Jim Baldwin

Dec 10, 2017, 12:12:41 PM
to Protocol Buffers
Hmmm. Back to the core problem.  I am not certain I have the root.


I'm gradually thinking the root of this file is "NetParameter", but I'm not sure how I was supposed to know that. :-P

I still think there should be a tool that can do (protobuf, .proto) <-> JSON.

Ilia Mirkin

Dec 10, 2017, 12:35:10 PM
to Jim Baldwin, Protocol Buffers
On Sun, Dec 10, 2017 at 11:23 AM, Jim Baldwin <jmb...@gmail.com> wrote:
> Perhaps it might help if I understood the output of protoc --decode_raw.
>
> Here's an example of a .caffemodel file I'm trying to inspect. Is there a
> description of what the numbers mean in this file?

As I said... it's a sequence of (tag, value) pairs, so...

>
> 1: "VGG_ILSVRC_16_layers"

Tag id 1: string value "VGG_bla"

> 100 {

Tag id 100: sub-message of some length (not printed here, but it's in
the encoded proto)

> 1: "input-data"

Tag id 1 of sub-message: "input-data"

(I won't annotate the rest, hopefully you get the gist of it.)

So for example, you might have a proto which is

message foo {
  string a = 1;
  repeated bar b = 100;  // bar is some other message type
}

that would lead to that sort of encoding. Nowhere is type information
encoded on the wire unless you explicitly put it there yourself.
There's no way to know which proto it is, and there can be any number
of protos that would yield a valid decoding. So you have to give it
the name of the proto it is with --decode.

Protobuf is a structure encoding/decoding mechanism. Anything extra,
like type names or framing, has to come from surrounding logic.

-ilia
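
The walk-through above can be sketched in Python as a tiny schema-less reader. This is only an illustration under stated assumptions: `read_varint` and `parse_fields` are made-up helper names, only wire types 0 (varint) and 2 (length-delimited) are handled, and the input bytes are built by hand rather than taken from the .caffemodel file.

```python
def read_varint(buf: bytes, pos: int):
    """Return (value, next_pos) for the varint starting at buf[pos]."""
    result = 0
    shift = 0
    while True:
        if pos >= len(buf):
            raise ValueError("truncated varint")
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:  # no continuation bit: last byte of the varint
            return result, pos
        shift += 7


def parse_fields(buf: bytes):
    """Split a buffer into (field_number, wire_type, value) triples."""
    fields = []
    pos = 0
    while pos < len(buf):
        key, pos = read_varint(buf, pos)
        field_number, wire_type = key >> 3, key & 0x07
        if wire_type == 0:
            value, pos = read_varint(buf, pos)
        elif wire_type == 2:
            length, pos = read_varint(buf, pos)
            if pos + length > len(buf):
                raise ValueError("truncated payload")
            value = buf[pos:pos + length]
            pos += length
        else:
            raise ValueError(f"wire type {wire_type} not handled in this sketch")
        fields.append((field_number, wire_type, value))
    return fields


# Hand-built bytes: field 1 = "abc", field 100 = a payload holding field 1 = "x".
data = b"\x0a\x03abc" + b"\xa2\x06\x03" + b"\x0a\x01x"
top_level = parse_fields(data)
print(top_level)
# The payload of field 100 is just bytes on the wire. Treating it as a
# sub-message is a guess unless a schema says so.
print(parse_fields(top_level[1][2]))
```

The reader recovers field numbers, wire types, and raw payloads, and nothing else. Deciding that field 100 holds a sub-message rather than a string, or what either field is called, needs information from outside the bytes.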

Jim Baldwin

Dec 10, 2017, 12:42:47 PM
to Protocol Buffers
Thanks, I think I understand now.  I do think identifying a "root" type in the .proto would have been a big help and wouldn't have cost much space or bandwidth.

If I had time, I'd write a tool that would infer the root by testing for types that have matching fields, but life is short...
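
For what it's worth, the core of such a tool might look something like the Python sketch below. Everything in it is an assumption for illustration: `candidate_roots` is a made-up function, and the `SCHEMAS` table is hand-written with placeholder type names, where a real tool would build it from the .proto. It keeps every message type whose declared fields cover the (field number, wire type) pairs observed at the top level of the file.

```python
def candidate_roots(observed, schemas):
    """Return names of message types that declare every observed field.

    observed: iterable of (field_number, wire_type) seen at the top level.
    schemas:  {type_name: {field_number: wire_type}}, hand-written here.
    """
    seen = set(observed)
    return sorted(
        name
        for name, declared in schemas.items()
        if all(declared.get(number) == wire_type for number, wire_type in seen)
    )


# Hypothetical schema table; wire type 2 = length-delimited, 0 = varint.
SCHEMAS = {
    "TypeA": {1: 2, 100: 2},
    "TypeB": {1: 2, 2: 2, 4: 2, 10: 0},
    "TypeC": {1: 0},
}

# Top-level shape of the --decode_raw output: one field 1, repeated field 100.
observed = [(1, 2), (100, 2), (100, 2)]
print(candidate_roots(observed, SCHEMAS))  # ['TypeA']
```

This is only a heuristic. Several types can match the same data, and because protobuf tolerates unknown fields, requiring full coverage could also reject the true root if the file was written with a newer schema.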