Parsing messages in C++ with extensions

Jesper Eskilson

Aug 26, 2009, 10:49:08 AM
to Protocol Buffers
How do you parse a message from a byte stream which contains
extensions? In Java, I can do

ExtensionRegistry registry = ...
registry.add(...);
Foo.parseFrom(buf, registry)

but I can't find any reference on how to do this in C++.

--
/Jesper

Kenton Varda

Aug 26, 2009, 4:38:19 PM
to Jesper Eskilson, Protocol Buffers
In C++ all compiled-in extensions are automatically registered in a global registry which is used automatically by all compiled-in classes.  I now regret this design decision due to a number of subtle problems it creates, but for you it means that you don't have to do anything special.
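
As a rough illustration of the automatic registration described above, here is a simplified model in plain C++. It is not libprotobuf's actual code: `GlobalRegistry`, `Registrar`, `IsKnownExtension`, the name "foo.foo_ext" and the field number 100 are all made up for the example. The point is only that an object defined at namespace scope can add itself to a process-wide table before main() runs, so the parsing code never needs an explicit registry argument.

```cpp
#include <cassert>
#include <map>
#include <string>

// Simplified stand-in for a process-wide extension registry
// (hypothetical; keyed by field number, value is the extension name).
std::map<int, std::string>& GlobalRegistry() {
    // A function-local static sidesteps static-initialization-order issues.
    static std::map<int, std::string> registry;
    return registry;
}

// Constructing a Registrar at namespace scope adds an entry before main().
struct Registrar {
    Registrar(int field_number, const std::string& name) {
        GlobalRegistry()[field_number] = name;
    }
};

// Roughly what a generated source file could contain (names are assumptions).
// If this object file is never linked in, the entry never appears.
static Registrar foo_ext_registrar(100, "foo.foo_ext");

// A parser would consult the table; anything not found is treated as unknown.
bool IsKnownExtension(int field_number) {
    return GlobalRegistry().count(field_number) != 0;
}
```

Calling `IsKnownExtension(100)` from main() in the same program finds the entry, while a number nobody registered is reported as unknown.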

Jesper Eskilson

Aug 26, 2009, 6:04:33 PM
to Kenton Varda, Protocol Buffers
On Wed, Aug 26, 2009 at 10:38 PM, Kenton Varda<ken...@google.com> wrote:
> In C++ all compiled-in extensions are automatically registered in a global
> registry which is used automatically by all compiled-in classes.  I now
> regret this design decision due to a number of subtle problems it creates,
> but for you it means that you don't have to do anything special.

I wonder why it isn't working for me, then. I'm serializing an object
from Java with an extension set, but when parsing it in C++, the
extension field is unset. I'll have to dig a bit deeper tomorrow.

(Annoying time-zone lag. You're replying just around my bedtime. :-))

--
/Jesper

Kenton Varda

Aug 26, 2009, 6:18:23 PM
to Jesper Eskilson, Protocol Buffers
Some linkers will drop object files that aren't referenced from anywhere, so if your code doesn't actually use anything from the .proto file defining the extension, it might not be linked in, and thus won't be in the registry.  This is one of the common problems that make me wish we had an explicit ExtensionRegistry in C++...

Otherwise, I don't know what your problem might be.  If you can narrow it down to a small self-contained example I could debug it.

Jesper Eskilson

Aug 27, 2009, 5:14:21 AM
to Protocol Buffers
I keep clicking on "reply" instead of "reply all"...


---------- Forwarded message ----------
From: Jesper Eskilson <jes...@eskilson.se>
Date: Thu, Aug 27, 2009 at 9:23 AM
Subject: Re: Parsing messages in C++ with extensions
To: Kenton Varda <ken...@google.com>


On Thu, Aug 27, 2009 at 12:18 AM, Kenton Varda<ken...@google.com> wrote:
> Some linkers will drop object files that aren't referenced from anywhere, so
> if your code doesn't actually use anything from the .proto file defining the
> extension, it might not be linked in, and thus won't be in the registry.
>  This is one of the common problems that make me wish we had an explicit
> ExtensionRegistry in C++...
> Otherwise, I don't know what your problem might be.  If you can narrow it
> down to a small self-contained example I could debug it.

Well, hm. I don't seem to be linking with the defining code. That
would probably explain it.

Would an explicit extension-registry be difficult to implement?

What's the recommended way of solving the problem? Linking in all the
protocol definitions in the same module is something I'd like to
avoid. Is there a way I can extract the "unknown" field and pass it to
the "defining module" for further parsing?

--
/Jesper

Jesper Eskilson

Aug 27, 2009, 5:17:35 AM
to Kenton Varda, Protocol Buffers
On Thu, Aug 27, 2009 at 12:18 AM, Kenton Varda<ken...@google.com> wrote:
> Some linkers will drop object files that aren't referenced from anywhere, so
> if your code doesn't actually use anything from the .proto file defining the
> extension, it might not be linked in, and thus won't be in the registry.
>  This is one of the common problems that make me wish we had an explicit
> ExtensionRegistry in C++...
> Otherwise, I don't know what your problem might be.  If you can narrow it
> down to a small self-contained example I could debug it.

You're right. I first needed to link all of the *.pb.cc files into the
library doing the parsing, but as you said, MSVC drops the code unless
I explicitly refer to some code in it.

This is really annoying.

--
/Jesper

Kenton Varda

Aug 27, 2009, 7:28:43 PM
to Jesper Eskilson, Protocol Buffers
Yep, it's a very annoying problem.  The solution I prefer is to add a dummy usage of one of the classes in your .proto somewhere high-up in your program, in a place that should logically "know" that the file is needed.

BTW, if you aren't actually explicitly using the extension anywhere, then the only reason to force it to be linked in is if you want it to appear correctly when using reflection or TextFormat.  Otherwise you should just let it go into the UnknownFieldSet.

Jesper Eskilson

Aug 28, 2009, 1:25:14 AM
to Kenton Varda, Protocol Buffers
On Fri, Aug 28, 2009 at 1:28 AM, Kenton Varda<ken...@google.com> wrote:
> Yep, it's a very annoying problem.  The solution I prefer is to add a dummy
> usage of one of the classes in your .proto somewhere high-up in your
> program, in a place that should logically "know" that the file is needed.

This is really not feasible. There is no such place, unfortunately.

> BTW, if you aren't actually explicitly using the extension anywhere, then
> the only reason to force it to be linked in is if you want it to appear
> correctly when using reflection or TextFormat.  Otherwise you should just
> let it go into the UnknownFieldSet.

The situation is this: I have a main program which parses incoming
messages, and some of these messages have extensions set. These
extensions are (sometimes) only known to "plugins" to the main
program. The incoming message has an identifier so that the main
program knows which plugin it should send the message to, but the main
program itself doesn't know anything about the plugin. The problem I
had was that when the message was passed to the plugin, the plugin
fails to get the extension, i.e. the extension field was unset (i.e.
HasExtension(foo::foo_ext) returned false).

Does the UnknownFieldSet allow the plugin to extract the "unknown field"?

The original solution to this I had before I read up on extensions was
to store the messages to/from the plugins as serialized byte streams in
the top-level package. This actually worked fine, with the exception
of having to encode/copy the message twice at both ends.

--
/Jesper

Kenton Varda

Aug 28, 2009, 12:42:38 PM
to Jesper Eskilson, Protocol Buffers
Ouch, this hole is probably a lot deeper than it looks.

First let me review some things which you may already know...

I assume these "plugins" are DLLs.  Do you load and unload these plugins at runtime, or just at startup?  If you unload them at runtime, then each one needs to be statically linked against its own instance of the protobuf library (probably the lite library!), because libprotobuf is not designed to allow individual protos to be unloaded without shutting down the entire library.  But if each plugin has its own instance, then you cannot pass protobuf objects to the plugin.  You can only pass encoded messages, which it must parse itself.

If you do load the plugins for the entire life of the process, then things are a little more flexible.  In this case, you can share a single instance of libprotobuf among all of them and your app as long as everyone links against it as a DLL.  (Though, in this case all plugins must be linked against the exact same version of libprotobuf, which may be a problem if they are developed by separate groups.)

Now, getting back to extensions, if you are going with the first option, then obviously your app cannot recognize extensions defined within the plugins, because they use a separate instance of libprotobuf.  But it doesn't matter, because you have to re-serialize the messages before sending them to the plugins anyway, and they will do their own parsing with the extensions recognized.

If you are going with the second option (sharing a common instance of libprotobuf), then the plugin *should* have registered its extensions with that common instance at startup, and therefore it should be parsing correctly.

To answer your specific question, BTW, yes, you can inspect the contents of UnknownFieldSet.  Every message object has methods unknown_fields() and mutable_unknown_fields() which return the UnknownFieldSet.  The API is described here:
http://code.google.com/apis/protocolbuffers/docs/reference/cpp/google.protobuf.unknown_field_set.html
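
To make concrete what such an unknown field looks like on the wire, here is a minimal sketch in plain C++ that scans an encoded message for a length-delimited field and copies out its payload. It assumes the extension is a message-typed (non-group) field, and the helper names `ReadVarint` and `FindLengthDelimited` are invented for this example. It is not how libprotobuf does it; in real code you would walk the UnknownFieldSet instead.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Reads a base-128 varint starting at pos and advances pos.
// Returns false on truncated or over-long input.
bool ReadVarint(const std::string& data, std::size_t& pos, std::uint64_t& value) {
    value = 0;
    for (int shift = 0; shift < 64; shift += 7) {
        if (pos >= data.size()) return false;
        const std::uint8_t byte = static_cast<std::uint8_t>(data[pos++]);
        value |= static_cast<std::uint64_t>(byte & 0x7F) << shift;
        if ((byte & 0x80) == 0) return true;
    }
    return false;
}

// Finds the first length-delimited field with the given number and copies
// its payload (for a nested message: that message's encoded bytes) to out.
bool FindLengthDelimited(const std::string& data, std::uint32_t field_number,
                         std::string& out) {
    std::size_t pos = 0;
    while (pos < data.size()) {
        std::uint64_t tag = 0;
        if (!ReadVarint(data, pos, tag)) return false;
        const std::uint64_t number = tag >> 3;          // upper bits: field number
        const unsigned wire_type = static_cast<unsigned>(tag & 0x7);
        std::uint64_t scratch = 0;
        switch (wire_type) {
            case 0:  // varint value: read and discard
                if (!ReadVarint(data, pos, scratch)) return false;
                break;
            case 1:  // fixed 64-bit value
                if (data.size() - pos < 8) return false;
                pos += 8;
                break;
            case 2: {  // length-delimited: varint length, then payload
                std::uint64_t len = 0;
                if (!ReadVarint(data, pos, len)) return false;
                if (len > data.size() - pos) return false;
                if (number == field_number) {
                    out.assign(data, pos, static_cast<std::size_t>(len));
                    return true;
                }
                pos += static_cast<std::size_t>(len);
                break;
            }
            case 5:  // fixed 32-bit value
                if (data.size() - pos < 4) return false;
                pos += 4;
                break;
            default:  // groups (wire types 3 and 4) are not handled here
                return false;
        }
    }
    return false;
}
```

The extracted payload is what a plugin would hand to its own message's ParseFromString(). With the real API, the same information is available by iterating field_count() / field(i) on the UnknownFieldSet and checking each field's number() and type() before reading length_delimited().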

Jesper Eskilson

Aug 28, 2009, 12:53:09 PM
to Kenton Varda, Protocol Buffers

Ok, thanks for the detailed explanation. I think I have enough info on
this for the time being.

--
/Jesper

Jesper Eskilson

Sep 3, 2009, 7:18:59 AM
to Kenton Varda, Protocol Buffers
On Fri, Aug 28, 2009 at 6:42 PM, Kenton Varda<ken...@google.com> wrote:

> To answer your specific question, BTW, yes, you can inspect the contents of
> UnknownFieldSet.  Every message object has methods unknown_fields() and
> mutable_unknown_fields() which return the UnknownFieldSet.  The API is
> described here:
> http://code.google.com/apis/protocolbuffers/docs/reference/cpp/google.protobuf.unknown_field_set.html

I'm probably missing something, but I get a compilation error when
trying to access the unknown fields of a message:

Error 1 error C2248:
'google::protobuf::UnknownFieldSet::UnknownFieldSet' : cannot access
private member declared in class 'google::protobuf::UnknownFieldSet';
e:\dev\ide-platform\core\ide\protobuf\src\google/protobuf/unknown_field_set.h(124)
: see declaration of
'google::protobuf::UnknownFieldSet::UnknownFieldSet';
e:\dev\ide-platform\core\ide\protobuf\src\google/protobuf/unknown_field_set.h(63)
: see declaration of
'google::protobuf::UnknownFieldSet' e:\dev\ide-platform\core\ide\CSpyServer\src\model\CssListWindowService.cpp 21

The code looks like this:

void
foo(const cdp::DebugCommand &cmd)
{
UnknownFieldSet set = cmd.GetReflection()->GetUnknownFields(cmd);
}

--
/Jesper

Kenton Varda

Sep 3, 2009, 12:56:49 PM
to Jesper Eskilson, Protocol Buffers
You want:

  const UnknownFieldSet& set = cmd.GetReflection()->GetUnknownFields(cmd);

Kenton Varda

Sep 3, 2009, 12:57:47 PM
to Jesper Eskilson, Protocol Buffers
BTW, this also works, assuming cmd is a protoc-generated class:

  const UnknownFieldSet& set = cmd.unknown_fields();
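
For anyone hitting the same C2248 error, here is a small self-contained model of why the reference form compiles and the by-value form does not. `FieldSet` and `Message` are hypothetical stand-ins, not the protobuf classes, and the sketch uses C++11 `= delete` where older code would declare the copy constructor private.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>
#include <vector>

// Hypothetical stand-in for a library type that must not be copied.
class FieldSet {
public:
    FieldSet() = default;
    FieldSet(const FieldSet&) = delete;             // copying is a compile error
    FieldSet& operator=(const FieldSet&) = delete;
    void Add(int number) { numbers_.push_back(number); }
    std::size_t size() const { return numbers_.size(); }
private:
    std::vector<int> numbers_;
};

// Hypothetical stand-in for a generated message class.
class Message {
public:
    FieldSet* mutable_unknown_fields() { return &unknown_; }
    const FieldSet& unknown_fields() const { return unknown_; }
private:
    FieldSet unknown_;
};

static_assert(!std::is_copy_constructible<FieldSet>::value,
              "FieldSet is intentionally non-copyable");

std::size_t CountUnknown(const Message& msg) {
    // FieldSet copy = msg.unknown_fields();     // would not compile: copy is deleted
    const FieldSet& set = msg.unknown_fields();  // refers to the object inside msg
    return set.size();
}
```

The reference is only valid while the message it came from is alive, so keep it no longer than `msg` itself.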

Jesper Eskilson

Sep 6, 2009, 12:46:22 PM
to Kenton Varda, Protocol Buffers
On Thu, Sep 3, 2009 at 6:56 PM, Kenton Varda<ken...@google.com> wrote:
> You want:
>   const UnknownFieldSet& set = cmd.GetReflection()->GetUnknownFields(cmd);

Ok, so if I have a function which receives a message which has an unknown
field which I need to parse into a "real" message, how should I do it?
(The documentation is a little fuzzy on this...)

I tried to call ParseFromString() on the string returned by
length_delimited(), but that just crashed.

const UnknownFieldSet& set = cmd.unknown_fields();

const UnknownField f = set.field(0);
const std::string &buf = f.length_delimited();

listwindow::ListWindowCommand lwc;
lwc.ParseFromString(buf);

--
/Jesper

Kenton Varda

Sep 6, 2009, 10:45:15 PM
to Jesper Eskilson, Protocol Buffers
On Sun, Sep 6, 2009 at 9:46 AM, Jesper Eskilson <jes...@eskilson.se> wrote:
> On Thu, Sep 3, 2009 at 6:56 PM, Kenton Varda<ken...@google.com> wrote:
> > You want:
> >   const UnknownFieldSet& set = cmd.GetReflection()->GetUnknownFields(cmd);
>
> Ok, so if I have a function which receives a message which has an unknown
> field which I need to parse into a "real" message, how should I do it?
> (The documentation is a little fuzzy on this...)
>
> I tried to call ParseFromString() on the string returned by
> length_delimited(), but that just crashed.
>
>  const UnknownFieldSet& set = cmd.unknown_fields();
>  const UnknownField f = set.field(0);

This also needs to be a const reference.  I guess I forgot to explicitly declare UnknownField uncopyable, so the compiler is generating a default copy constructor, which indeed would crash in this case.  Please do not assume that protobuf classes are copyable unless they explicitly say that they are.