Parse a .proto file

Pradeep Gollakota

29 Oct 2014, 22:53:40
to prot...@googlegroups.com
Hi Protobuf gurus,

I'm trying to parse a .proto file in Java to use with DynamicMessages. Is this possible or does it have to be compiled to a "descriptor set" file first before this can be done?

I have a use case where I need to parse messages without having the corresponding precompiled classes in Java. So the DynamicMessage seems to be the correct fit, but I'm not sure how I can generate the DescriptorSet from the ".proto" definition.

Thanks in advance,
Pradeep

Oliver Jowett

30 Oct 2014, 17:41:19
to Pradeep Gollakota, Protocol Buffers
On 30 October 2014 02:53, Pradeep Gollakota <prade...@gmail.com> wrote:

> I have a use case where I need to parse messages without having the
> corresponding precompiled classes in Java. So the DynamicMessage seems to be
> the correct fit, but I'm not sure how I can generate the DescriptorSet from
> the ".proto" definition.

protoc --descriptor_set_out=FILE ?

Oliver

Pradeep Gollakota

31 Oct 2014, 13:56:40
to prot...@googlegroups.com, prade...@gmail.com
Hi Oliver,

Thanks for the response! I guess my question wasn't quite clear. In my Java code I have a string that contains the content of a .proto file. Given this string, how can I create an instance of the Descriptor class so that I can parse messages with DynamicMessage?

Thanks!
- Pradeep

Oliver Jowett

31 Oct 2014, 14:39:27
to Pradeep Gollakota, Protocol Buffers
Basically, you can't do that in pure Java - the compiler is a C++
binary, there is no Java version.

Still, working with the output of --descriptor_set_out is probably the
way to go here. If you have the .proto file ahead of time, you can
pregenerate the descriptor output at build time and store it instead
of the .proto file. If you don't have the .proto file ahead of time
(and you can't redesign - this is not a good design) then you could
run the compiler at runtime and read the output. Either way, now you
have a parsed version of the message format as a protobuf-encoded
message that you can read into your Java program and extract the
Descriptors you need.
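
The "run the compiler at runtime" option could be sketched roughly as follows. This is only an illustration: the class and method names (ProtocRunner, command, compile) are hypothetical, and it assumes a protoc binary is on the PATH and that the .proto text has already been written to a file under protoDir. The resulting bytes are a serialized FileDescriptorSet, the same thing --descriptor_set_out produces at build time.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper: shells out to protoc and returns the serialized descriptor set. */
final class ProtocRunner {

    /** Builds the protoc command line; assumes "protoc" is on the PATH. */
    static List<String> command(Path protoDir, Path protoFile, Path descOut) {
        return Arrays.asList(
                "protoc",
                "--proto_path=" + protoDir,
                "--descriptor_set_out=" + descOut,
                "--include_imports", // also embed imported files such as descriptor.proto
                protoFile.toString());
    }

    /** Runs protoc and returns the bytes of the generated descriptor set. */
    static byte[] compile(Path protoDir, Path protoFile)
            throws IOException, InterruptedException {
        Path descOut = Files.createTempFile("messages", ".desc");
        try {
            Process process = new ProcessBuilder(command(protoDir, protoFile, descOut))
                    .inheritIO() // let protoc's error messages reach the console
                    .start();
            int exit = process.waitFor();
            if (exit != 0) {
                throw new IOException("protoc exited with status " + exit);
            }
            return Files.readAllBytes(descOut);
        } finally {
            Files.deleteIfExists(descOut);
        }
    }
}
```

With --include_imports the output also contains the imported files, so the dependencies of each file are available in the same set.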

If you're looking at a self-describing message format, then I'd go with
using the parsed descriptors as your format description, not the text
.proto file.

Oliver

Pradeep Gollakota

31 Oct 2014, 17:10:37
to prot...@googlegroups.com, prade...@gmail.com

Ok… awesome… I do have the .proto’s ahead of time, so I can have them compiled to the .desc files and store those.

Here’s my .proto file:

package com.lithum.pbnj;

import "google/protobuf/descriptor.proto";

option java_package = "com.lithium.pbnj";

extend google.protobuf.FieldOptions {
    optional bool isPii = 50101;
}

message MessagePublish {
    required string uuid = 1;
    required int64 timestamp = 2;
    required int64 message_uid = 3;
    required string message_content = 4;
    required int64 message_author_uid = 5;
    optional string email = 6 [(isPii) = true];
}

I compiled this .proto file into a .desc file using the command you gave me. I’m now trying to parse a DynamicMessage from the .desc file. Here’s the code I have so far.

        DescriptorProtos.FileDescriptorSet descriptorSet = DescriptorProtos.FileDescriptorSet.parseFrom(PBnJ.class.getResourceAsStream("/messages.desc"));
        Descriptors.Descriptor desc = descriptorSet.getFile(0).getDescriptorForType();

        Messages.MessagePublish event = Messages.MessagePublish.newBuilder()
                .setUuid(UUID.randomUUID().toString())
                .setTimestamp(System.currentTimeMillis())
                .setEmail("he...@example.com")
                .setMessageAuthorUid(1)
                .setMessageContent("hello world!")
                .setMessageUid(1)
                .build();

        DynamicMessage dynamicMessage = DynamicMessage.parseFrom(desc, event.toByteArray());

The final line in the above code is throwing the following exception:

Exception in thread "main" com.google.protobuf.InvalidProtocolBufferException: Protocol message end-group tag did not match expected tag.
    at com.google.protobuf.InvalidProtocolBufferException.invalidEndTag(InvalidProtocolBufferException.java:94)
    at com.google.protobuf.CodedInputStream.checkLastTagWas(CodedInputStream.java:174)
    at com.google.protobuf.CodedInputStream.readMessage(CodedInputStream.java:478)
    at com.google.protobuf.MessageReflection$BuilderAdapter.parseMessage(MessageReflection.java:482)
    at com.google.protobuf.MessageReflection.mergeFieldFrom(MessageReflection.java:780)
    at com.google.protobuf.AbstractMessage$Builder.mergeFrom(AbstractMessage.java:336)
    at com.google.protobuf.AbstractMessage$Builder.mergeFrom(AbstractMessage.java:318)
    at com.google.protobuf.AbstractMessage$Builder.mergeFrom(AbstractMessage.java:229)
    at com.google.protobuf.AbstractMessageLite$Builder.mergeFrom(AbstractMessageLite.java:180)
    at com.google.protobuf.AbstractMessage$Builder.mergeFrom(AbstractMessage.java:419)
    at com.google.protobuf.AbstractMessage$Builder.mergeFrom(AbstractMessage.java:229)
    at com.google.protobuf.AbstractMessageLite$Builder.mergeFrom(AbstractMessageLite.java:171)
    at com.google.protobuf.AbstractMessage$Builder.mergeFrom(AbstractMessage.java:412)
    at com.google.protobuf.DynamicMessage.parseFrom(DynamicMessage.java:119)
    at com.lithium.pbnj.PBnJ.main(PBnJ.java:36)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:483)
    at com.intellij.rt.execution.application.AppMain.main(AppMain.java:134)

Ilia Mirkin

31 Oct 2014, 17:18:44
to Pradeep Gollakota, prot...@googlegroups.com
At no point are you specifying that you want to use the
"MessagePublish" descriptor, so you must still be using the API
incorrectly...

Pradeep Gollakota

31 Oct 2014, 18:18:59
to prot...@googlegroups.com, prade...@gmail.com, imi...@alum.mit.edu

Ok… so I was finally able to parse a dynamic message and it looks good. It looks like it was just a user error on my part… after a little bit of digging around, I found the right APIs to call. Now my code looks like:

        Descriptors.FileDescriptor fieldOptionsDesc = DescriptorProtos.FieldOptions.getDescriptor().getFile();
        DescriptorProtos.FileDescriptorSet set = DescriptorProtos.FileDescriptorSet.parseFrom(
                PBnJ.class.getResourceAsStream("/messages.desc"));
        Descriptors.Descriptor md = Descriptors.FileDescriptor.buildFrom(set.getFile(0),
                new Descriptors.FileDescriptor[]{fieldOptionsDesc}).findMessageTypeByName("MessagePublish");

        Messages.MessagePublish event = Messages.MessagePublish.newBuilder()
                .setUuid(UUID.randomUUID().toString())
                .setTimestamp(System.currentTimeMillis())
                .setEmail("he...@example.com")
                .setMessageAuthorUid(1)
                .setMessageContent("hello world!")
                .setMessageUid(1)
                .build();

        DynamicMessage dynamicMessage = DynamicMessage.parseFrom(md, event.toByteArray());
        // Parse worked!

        for (Descriptors.FieldDescriptor fieldDescriptor : md.getFields()) {
            Boolean extension = fieldDescriptor.getOptions().getExtension(Messages.isPii);
            System.out.println(fieldDescriptor.getName() + " isPii = " + extension);
        }

The output is:

uuid isPii = false
timestamp isPii = false
message_uid isPii = false
message_content isPii = false
message_author_uid isPii = false
email isPii = false

For some reason, this is incorrectly showing “isPii = false” for the email field when it should be “isPii = true” (as it is in the .proto file). Any thoughts on this?

Thanks again all!

Ilia Mirkin

31 Oct 2014, 18:25:51
to Pradeep Gollakota, prot...@googlegroups.com
On Fri, Oct 31, 2014 at 6:18 PM, Pradeep Gollakota <prade...@gmail.com> wrote:
> Boolean extension =
> fieldDescriptor.getOptions().getExtension(Messages.isPii);

Shouldn't this use some sort of API that doesn't use the Messages class at all?

-ilia

Pradeep Gollakota

31 Oct 2014, 20:48:36
to prot...@googlegroups.com, prade...@gmail.com, imi...@alum.mit.edu
Not really... one of the use cases I'm trying to solve is anonymization. We will have several apps writing protobuf records, and the data will pass through an anonymization layer. The anonymizer inspects the schemas for all incoming data and transforms the PII fields. Since I will be defining the custom options used by the app developers, I will have precompiled classes available for reference, just as the code shows.

So what I'm trying to figure out is, using the DynamicMessage API and having parsed a Descriptor, how do I find all the fields which have been annotated with the (isPii = true) option.

Oliver Jowett

31 Oct 2014, 22:01:59
to Pradeep Gollakota, Protocol Buffers, imi...@alum.mit.edu
You may be running into issues where the set of descriptors associated
with your parsed DynamicMessage (i.e. the ones you parsed at runtime)
do not match the set of descriptors from your pregenerated code (which
will be using their own descriptor pool). IIRC they're looked up by
identity, so even if they have the same structure they won't match if
loaded separately. It's a bit of a wart in the API - I'm not sure what
the right way to do this is, if you try to mix pregenerated code &
dynamically loaded descriptors, all sorts of things break.

Oliver

Pradeep Gollakota

31 Oct 2014, 22:24:22
to Protocol Buffers
Confirmed... When I replaced the "md" variable with the compiled Descriptor, it worked. I didn't think I was mixing the descriptors, e.g. the MessagePublish message is one that is produced via the compiled API and parsed using the DynamicMessage API. The isPii extension has been refactored into a separate proto that is precompiled into my codebase. I.E. the descriptor for MessagePublish should be loaded dynamically and the descriptor for the FieldOption I'm defining won't be loaded dynamically. As far as I can tell, there shouldn't be any mixing of the descriptor pools, though I may be wrong.

Any thoughts on how I can proceed with this project?

Oliver Jowett

1 Nov 2014, 07:26:35
to Pradeep Gollakota, Protocol Buffers
On 1 November 2014 02:24, Pradeep Gollakota <prade...@gmail.com> wrote:
> Confirmed... When I replaced the "md" variable with the compiled Descriptor,
> it worked. I didn't think I was mixing the descriptors, e.g. the
> MessagePublish message is one that is produced via the compiled API and
> parsed using the DynamicMessage API. The isPii extension has been refactored
> into a separate proto that is precompiled into my codebase. I.E. the
> descriptor for MessagePublish should be loaded dynamically and the
> descriptor for the FieldOption I'm defining won't be loaded dynamically. As
> far as I can tell, there shouldn't be any mixing of the descriptor pools,
> though I may be wrong.

This is exactly where the problem is, though - you have:

MessagePublish descriptor D1 (from encoded descriptorset) references
extension field F1 (from encoded descriptorset)
Message descriptor D2 (from pregenerated code) references extension
field F2 (from pregenerated code)

So if you have a message built from D1 then it thinks it has a field
F1; when you ask if it has extension F2 it says "no!" even though
they're really the same thing.
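
A toy model of the mismatch described above, using only the JDK. Everything here (DescriptorIdentityDemo, ExtField, Options) is hypothetical and is not protobuf's actual implementation; it only illustrates why a lookup keyed on descriptor identity misses when two separately loaded descriptors describe the same field, while a lookup keyed on the field number still finds it.

```java
import java.util.IdentityHashMap;
import java.util.Map;

/** Toy model only: these classes are hypothetical and are not part of protobuf. */
final class DescriptorIdentityDemo {

    /** Stand-in for an extension field descriptor; deliberately has no equals()/hashCode(). */
    static final class ExtField {
        final String fullName;
        final int number;

        ExtField(String fullName, int number) {
            this.fullName = fullName;
            this.number = number;
        }
    }

    /** Stand-in for an options message whose extensions are keyed by descriptor identity. */
    static final class Options {
        private final Map<ExtField, Object> byIdentity = new IdentityHashMap<>();

        void set(ExtField field, Object value) {
            byIdentity.put(field, value);
        }

        /** Identity lookup: only the exact descriptor instance that was stored matches. */
        Object getByIdentity(ExtField field) {
            return byIdentity.get(field);
        }

        /** Lookup by field number: matches any descriptor that describes the same field. */
        Object getByNumber(int number) {
            for (Map.Entry<ExtField, Object> entry : byIdentity.entrySet()) {
                if (entry.getKey().number == number) {
                    return entry.getValue();
                }
            }
            return null;
        }
    }
}
```

In this model, storing a value under F1 and asking for it with a structurally identical F2 returns null, whereas asking by field number 50101 returns the stored value.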

> Any thoughts on how I can proceed with this project?

It seems like a flaw in the API .. In the case I ran into, I could
work around it as the processing code only wanted to work with
non-option extensions when it had the precompiled code for the
extension, so for those well-known message types I'd just look up the
descriptor to use from the precompiled set rather than using the
in-stream descriptor set (the message format included the descriptors
inline). That doesn't really work here though.

Can you inspect the options using field descriptors from the encoded
descriptorset, rather than using Messages.isPii from the pregenerated
code?

Oliver

Pradeep Gollakota

5 Nov 2014, 16:45:18
to prot...@googlegroups.com, prade...@gmail.com

Ok… I finally figured out the workaround for this. I use a separate .proto file that contains my custom options.

package com.lithum.pbnj;

import "google/protobuf/descriptor.proto";

option java_package = "com.lithium.pbnj";

message LiOptions {
    optional bool isPii = 1 [default = false];
    optional bool isEmail = 2 [default = false];
    optional bool isIpAddress = 3 [default = false];
}

extend google.protobuf.FieldOptions {
    optional LiOptions li_opts = 50101;
}

I then compile this .proto into Java classes. When a message uses this extension, I use the following code to find the fields that carry my options:

        Descriptors.FileDescriptor fieldOptionsDesc = DescriptorProtos.FieldOptions.getDescriptor().getFile();
        Descriptors.FileDescriptor extensionsDesc = Extensions.getDescriptor().getFile();
        Descriptors.FileDescriptor[] files = new Descriptors.FileDescriptor[]{fieldOptionsDesc, extensionsDesc};

        DescriptorProtos.FileDescriptorSet set = DescriptorProtos.FileDescriptorSet.parseFrom(
                PBnJ.class.getResourceAsStream("/messages.desc"));
        DescriptorProtos.FileDescriptorProto messages = set.getFile(0);
        Descriptors.FileDescriptor fileDesc = Descriptors.FileDescriptor.buildFrom(messages, files);
        Descriptors.Descriptor md = fileDesc.findMessageTypeByName("MessagePublish");

        Set<Descriptors.FieldDescriptor> piiFields = Sets.newHashSet();
        for (Descriptors.FieldDescriptor fieldDescriptor : md.getFields()) {
            DescriptorProtos.FieldOptions options = fieldDescriptor.getOptions();
            UnknownFieldSet.Field field = options.getUnknownFields().asMap().get(Extensions.LI_OPTS_FIELD_NUMBER);
            if (field != null) {
                Extensions.LiOptions liOptions = Extensions.LiOptions.parseFrom(field.getLengthDelimitedList().get(0).toByteArray());
                if (liOptions.getIsEmail() || liOptions.getIsIpAddress() || liOptions.getIsPii()) {
                    piiFields.add(fieldDescriptor);
                    System.out.println(fieldDescriptor.toProto());
                }
            }
        }
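
To make it clearer what the unknown-field lookup above is reading, here is a dependency-free sketch of the bytes involved. It is an illustration under assumptions, not the library's code: RawOptionDemo and its methods are hypothetical names, and it handles only the two wire types needed here (varint and length-delimited). It hand-encodes a FieldOptions message carrying li_opts (field 50101) with isPii = true, then finds that field by number and decodes the nested flag.

```java
import java.io.ByteArrayOutputStream;
import java.util.Arrays;

/** Hypothetical sketch of how an unrecognised option is laid out on the wire. */
final class RawOptionDemo {
    static final int LI_OPTS_FIELD_NUMBER = 50101; // from the .proto above
    static final int WIRE_VARINT = 0;
    static final int WIRE_LENGTH_DELIMITED = 2;

    /** Appends a base-128 varint. */
    static void writeVarint(ByteArrayOutputStream out, long value) {
        while ((value & ~0x7FL) != 0) {
            out.write((int) ((value & 0x7F) | 0x80));
            value >>>= 7;
        }
        out.write((int) value);
    }

    /** Reads a base-128 varint starting at pos[0] and advances pos[0] past it. */
    static long readVarint(byte[] data, int[] pos) {
        long result = 0;
        for (int shift = 0; shift < 64; shift += 7) {
            byte b = data[pos[0]++];
            result |= (long) (b & 0x7F) << shift;
            if ((b & 0x80) == 0) {
                return result;
            }
        }
        throw new IllegalArgumentException("malformed varint");
    }

    /** Encodes FieldOptions { li_opts { isPii: true } } by hand. */
    static byte[] encodeOptionsWithPii() {
        ByteArrayOutputStream inner = new ByteArrayOutputStream();
        writeVarint(inner, (1 << 3) | WIRE_VARINT); // tag for LiOptions.isPii (field 1)
        writeVarint(inner, 1);                      // true
        ByteArrayOutputStream outer = new ByteArrayOutputStream();
        writeVarint(outer, ((long) LI_OPTS_FIELD_NUMBER << 3) | WIRE_LENGTH_DELIMITED);
        writeVarint(outer, inner.size());
        outer.write(inner.toByteArray(), 0, inner.size());
        return outer.toByteArray();
    }

    /** Returns the payload of the first length-delimited field with this number, or null. */
    static byte[] findLengthDelimited(byte[] data, int fieldNumber) {
        int[] pos = {0};
        while (pos[0] < data.length) {
            long tag = readVarint(data, pos);
            int number = (int) (tag >>> 3);
            int wireType = (int) (tag & 7);
            if (wireType == WIRE_LENGTH_DELIMITED) {
                int length = (int) readVarint(data, pos);
                if (number == fieldNumber) {
                    return Arrays.copyOfRange(data, pos[0], pos[0] + length);
                }
                pos[0] += length;
            } else if (wireType == WIRE_VARINT) {
                readVarint(data, pos); // skip the value
            } else {
                throw new IllegalArgumentException("wire type not handled in this sketch: " + wireType);
            }
        }
        return null;
    }

    /** Returns true if the last varint field with this number is non-zero. */
    static boolean isFlagSet(byte[] message, int fieldNumber) {
        boolean set = false;
        int[] pos = {0};
        while (pos[0] < message.length) {
            long tag = readVarint(message, pos);
            if ((tag & 7) != WIRE_VARINT) {
                throw new IllegalArgumentException("only varint fields are handled in this sketch");
            }
            long value = readVarint(message, pos);
            if ((tag >>> 3) == fieldNumber) {
                set = value != 0;
            }
        }
        return set;
    }
}
```

The real code above gets the same payload from options.getUnknownFields() and hands it to Extensions.LiOptions.parseFrom. It may also be worth checking whether parsing the descriptor set with an ExtensionRegistry that has li_opts registered (FileDescriptorSet.parseFrom accepts a registry as a second argument) makes the option readable through getExtension directly; that is untested here.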