deserializing unknown protobuf message?

Tom Ward

Aug 19, 2013, 3:37:43 PM

to prot...@googlegroups.com

Hi all,

I'm just getting to grips with protobuf and, having read through the documentation, I'm struggling to find what I'm after.

Basically I'm trying to work out how to deserialize a protobuf message without using the generated headers, as we're likely to get messages whose types weren't known at compile time. I've looked through the documentation, but I can only find examples that either use generated classes to deserialize, or use a Descriptor from a generated class to create a DynamicMessage, and I can't work out how to do that when we don't have the proto.

Here's a very simple example of what I mean, where the Message* points to some base type that allows deserialization, so that I can then use the reflection API (or some factory) to deal with the message correctly:

#include <cstring>
#include <sstream>
#include <string>
#include "google/protobuf/message.h"
 
int main(int argc, char **argv)
{
  if( argc > 1)
  {
    // Message is abstract, so the commented-out "new" below will not compile;
    // NULL is only a placeholder for whatever concrete type should go here
    google::protobuf::Message* message = NULL; //new google::protobuf::Message();
    std::istringstream ss(std::string(argv[1],strlen(argv[1])));
 
    message->ParseFromIstream(&ss);
 
    // do something with the message
    // ...
  }
 
  return 0;
}

Apologies in advance if this has already been answered...

Thanks

Tom

Ilia Mirkin

Aug 19, 2013, 4:28:28 PM

to Tom Ward, prot...@googlegroups.com

Something like protoc --decode_raw?

As for manipulating these in code, there's nothing too great available
if you don't have a descriptor at all. You could create a dummy
message (no fields) and then get the data via the unknown fields in the
message's reflection object, but they're not easy to manipulate. Note
that the encoded information is not always sufficient for correct
decoding. For example, strings, submessages, and packed repeated fields
are all encoded in the same way (length-delimited). Whether zigzag
encoding was used on a varint is also not part of the serialized data.
You really really really want a descriptor. If not compiled in, then
passed along with the data.

-ilia
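
To make the zigzag point concrete, here is a minimal sketch in plain C++ with no protobuf dependency. The helper names asInt64 and asSint64 are made up for this illustration; they only show how the same decoded varint yields different values depending on whether the (unknown) field was declared int64 or sint64.

```cpp
#include <cassert>
#include <cstdint>

// Interpret a decoded varint as a plain int64 field: the bits are used as-is.
inline std::int64_t asInt64(std::uint64_t raw) {
  return static_cast<std::int64_t>(raw);
}

// Interpret the same varint as a sint64 field: undo the zigzag mapping
// (0 -> 0, 1 -> -1, 2 -> 1, 3 -> -2, ...).
inline std::int64_t asSint64(std::uint64_t raw) {
  return static_cast<std::int64_t>(raw >> 1) ^
         -static_cast<std::int64_t>(raw & 1);
}
```

A varint with raw value 3 reads as 3 under asInt64 and as -2 under asSint64, and nothing in the bytes says which one the sender meant.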


Oliver Jowett

Aug 19, 2013, 4:42:57 PM

to Tom Ward, Protocol Buffers

On Mon, Aug 19, 2013 at 8:37 PM, Tom Ward <tw...@thefoundry.co.uk> wrote:

> Basically I'm trying to work out how to deserialize a protobuf message without using the generated headers, as we're likely to get messages that weren't generated at compile time. I've looked through the documentation, but I only seem to be able to find ones that use generated classes to deserialize, or that use a Descriptor from a generated class to create a DynamicMessage, which I can't seem to work out how to do when we don't have the proto.

The protobuf encoding isn't self-describing; generally you need either a compiled message definition or a Descriptor to make sense of an encoded message.

Without one of those, there is not much you can do beyond splitting the message into opaque fields that you will have difficulty in interpreting further.
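
As a rough illustration of "splitting the message into opaque fields", here is a minimal sketch in plain C++ that walks the wire format by hand. RawField, readVarint and scanFields are hypothetical names for this example, not part of the protobuf library, and the sketch assumes no group fields (wire types 3 and 4) are present.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// One field as it appears on the wire, with no knowledge of the schema.
struct RawField {
  std::uint32_t number;    // field number taken from the tag
  std::uint32_t wireType;  // 0 varint, 1 64-bit, 2 length-delimited, 5 32-bit
  std::uint64_t varint;    // value, for wire type 0 only
  std::string bytes;       // raw payload, for wire types 1, 2 and 5
};

// Reads one base-128 varint starting at pos.
// Returns false on truncated or overlong input.
bool readVarint(const std::string& in, std::size_t& pos, std::uint64_t& out) {
  out = 0;
  for (int shift = 0; shift < 64; shift += 7) {
    if (pos >= in.size()) return false;
    const std::uint8_t byte = static_cast<std::uint8_t>(in[pos++]);
    out |= static_cast<std::uint64_t>(byte & 0x7F) << shift;
    if ((byte & 0x80) == 0) return true;
  }
  return false;
}

// Splits a serialized message into top-level fields.
// Returns false if the input is malformed or uses an unsupported wire type.
bool scanFields(const std::string& in, std::vector<RawField>& fields) {
  std::size_t pos = 0;
  while (pos < in.size()) {
    std::uint64_t tag = 0;
    if (!readVarint(in, pos, tag)) return false;
    RawField f;
    f.number = static_cast<std::uint32_t>(tag >> 3);
    f.wireType = static_cast<std::uint32_t>(tag & 7);
    f.varint = 0;
    std::uint64_t length = 0;
    switch (f.wireType) {
      case 0: if (!readVarint(in, pos, f.varint)) return false; break;
      case 1: length = 8; break;
      case 2: if (!readVarint(in, pos, length)) return false; break;
      case 5: length = 4; break;
      default: return false;  // groups and unknown wire types are not handled
    }
    if (f.wireType != 0) {
      if (length > in.size() - pos) return false;  // payload runs past the end
      f.bytes = in.substr(pos, static_cast<std::size_t>(length));
      pos += static_cast<std::size_t>(length);
    }
    fields.push_back(f);
  }
  return true;
}
```

The result is only a list of field numbers and raw payloads. A length-delimited payload could be a string, a nested message or a packed repeated field, and only a Descriptor can say which.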

As Descriptors can be turned into protobuf messages, one approach is to include a serialized Descriptor as part of the standard message framing; then you can use DynamicMessage on the receiver side.

See https://developers.google.com/protocol-buffers/docs/techniques#self-description

Oliver

Tom Ward

Aug 20, 2013, 3:27:02 PM

to prot...@googlegroups.com, Tom Ward

I thought as much, just wanted to make sure I wasn't missing anything :)

So after much playing about, I've managed to get something working, but I had to use a deprecated constructor, DynamicMessageFactory(const DescriptorPool* pool).

I also had to pass the full_name() of the message type along, so that I know which descriptor to look up in the FileDescriptorProto. Am I correct in saying you can't have two messages with the same fully qualified name? (For example, "foo.bar.baz" can/should only have one proto?)

Here's my working example. First I have a "MetaMessage" proto which has a FileDescriptorProto, a string for the message's full name, and message_data:

import "google/protobuf/descriptor.proto";
 
message MetaMessage {
  required google.protobuf.FileDescriptorProto message_descriptor = 1;
  required string message_typename = 2;
  required bytes message_data = 3;
}

I then try to find a descriptor for the given message type name in the generated pool first. If that fails, I call BuildFile() on my own DescriptorPool to turn the FileDescriptorProto into descriptors, and then use a DynamicMessageFactory built on that pool to create DynamicMessages.

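#include <cassert>
#include <fstream>
#include <iostream>
#include <string>

#include "google/protobuf/descriptor.h"
#include "google/protobuf/dynamic_message.h"
#include "google/protobuf/message.h"
#include "meta_message.pb.h" // assumed name of the header generated from the MetaMessage proto

void deserializeMessage( const char* filePath )
{
  std::ifstream filestr( filePath, std::ifstream::binary );

  // Create MetaMessage object and parse from stream
  MetaMessage message;
  if( !message.ParseFromIstream(&filestr) )
  {
    std::cerr << "Failed to parse MetaMessage from " << filePath << std::endl;
    return;
  }

  // First try to find the descriptor, in case it's a compiled-in type
  const google::protobuf::Descriptor* descriptor =
    google::protobuf::DescriptorPool::generated_pool()->FindMessageTypeByName( message.message_typename() );

  const google::protobuf::Message* prototype = NULL;

  // if we have a descriptor, then get the prototype message
  if( descriptor )
  {
    prototype = google::protobuf::MessageFactory::generated_factory()->GetPrototype( descriptor );
  }
  else
  {
    // Otherwise, add the file descriptor to my own DescriptorPool.
    // BuildFile() returns NULL on failure, so the lookup below is checked.
    static google::protobuf::DescriptorPool myPool;
    myPool.BuildFile( message.message_descriptor() );

    descriptor = myPool.FindMessageTypeByName( message.message_typename() );

    // create a factory from the descriptor pool, and get the prototype
    static google::protobuf::DynamicMessageFactory myFactory(&myPool);
    if( descriptor )
    {
      prototype = myFactory.GetPrototype( descriptor );
    }
  }

  // now we have the prototype, we can create a mutable message and deserialize
  if( prototype )
  {
    google::protobuf::Message* payload = prototype->New();
    assert( message.has_message_data() );
    if( payload->ParseFromString( message.message_data() ) )
    {
      // Only read "message" if the type really has a singular string field with that name
      const google::protobuf::FieldDescriptor* field = descriptor->FindFieldByName("message");
      if( field && !field->is_repeated() &&
          field->cpp_type() == google::protobuf::FieldDescriptor::CPPTYPE_STRING )
      {
        std::string msg = payload->GetReflection()->GetString(*payload, field);
        std::cout << msg << std::endl;
      }
    }
    delete payload; // New() hands ownership to the caller
  }
}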

int main(int argc, char **argv)
{
  if( argc > 1 )
  {
    deserializeMessage( argv[1] );
  }
 
  return 0;
}

Seems to work, but not sure about inter-dependencies within the proto file...
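
On the inter-dependency question: DescriptorPool::BuildFile() expects every file that a FileDescriptorProto imports to be in the pool already, so if several files were shipped with the data, they would need to be built with dependencies first. Below is a minimal sketch of that ordering step in plain C++. FileInfo is a hypothetical stand-in for the name and dependency list of a FileDescriptorProto, and visit and buildOrder are names made up for this example.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical stand-in for the two FileDescriptorProto fields that matter here.
struct FileInfo {
  std::string name;                       // e.g. "b.proto"
  std::vector<std::string> dependencies;  // files that this file imports
};

// Depth-first visit: append a file only after everything it imports.
// Returns false on a missing import or an import cycle.
static bool visit(const std::string& name,
                  const std::map<std::string, FileInfo>& byName,
                  std::set<std::string>& done,
                  std::set<std::string>& inProgress,
                  std::vector<std::string>& order) {
  if (done.count(name)) return true;
  if (inProgress.count(name)) return false;  // import cycle
  std::map<std::string, FileInfo>::const_iterator it = byName.find(name);
  if (it == byName.end()) return false;      // import was not supplied
  inProgress.insert(name);
  for (const std::string& dep : it->second.dependencies) {
    if (!visit(dep, byName, done, inProgress, order)) return false;
  }
  inProgress.erase(name);
  done.insert(name);
  order.push_back(name);
  return true;
}

// Computes an order in which the files could be handed to BuildFile().
bool buildOrder(const std::vector<FileInfo>& files,
                std::vector<std::string>& order) {
  std::map<std::string, FileInfo> byName;
  for (const FileInfo& f : files) byName[f.name] = f;
  std::set<std::string> done;
  std::set<std::string> inProgress;
  for (const FileInfo& f : files) {
    if (!visit(f.name, byName, done, inProgress, order)) return false;
  }
  return true;
}
```

With a single FileDescriptorProto in the MetaMessage, as in the example above, this only matters once the proto imports other files; in that case the imported files would have to travel with the message too, for instance as a repeated field.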
