Ability to iterate a descriptor pool.

Exaurdon

Sep 22, 2009, 6:39:04 PM

to Protocol Buffers

Summary: I would really like to be able to iterate through the message
descriptors in a DescriptorPool, in particular the generated_pool.

I am sending various protocol buffers as messages across a transport.
On the receiving side of the transport, I need to use information
contained in a message header to instantiate an object of the correct
type. If I can get a descriptor for the transmitted message, then I
can use the factory method to instantiate an appropriately typed
object. So, I need a way to embed enough information into the header,
that I can use it to get a descriptor.

I see three options. First, since .proto definitions are provided for the
descriptor types, I could transmit a DescriptorProto object with each
message, but that is a lot of per-message overhead. Second, I could
transmit the full name of each message type in the header and look up the
descriptor by name in the generated descriptor pool. However, I would
prefer not to send the full string name in each message. Third, I can
assign a unique integer ID to each message type and send that integer in
the header. This minimizes the overhead bytes, but raises the problem of
associating each message type with its integer. By embedding an enum
'MessageId' in each message, I can easily determine the ID of any message
object. This works well, except that on the receiving side of the
application I need to iterate through the complete list of protobuf
message types to extract their IDs. This should be easy, since the
generated_pool contains descriptors for each message. Unfortunately, while
you can search a pool for a specific descriptor, there is no way to
iterate through the descriptors in the pool.
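
As a rough illustration of the integer-ID option, the lookup table on the
receiving side might be built by explicit registration rather than by
iterating a pool. This is a minimal, self-contained C++ sketch: Message,
Ping, Pong, and MessageRegistry are hypothetical stand-ins, not part of the
protobuf library, and the kMessageId enum mirrors the embedded 'MessageId'
enum described above.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <memory>
#include <string>

// Hypothetical stand-in for a generated message base class.
struct Message {
  virtual ~Message() = default;
  virtual std::string TypeName() const = 0;
};

// Each message type carries its own ID in an embedded enum.
struct Ping : Message {
  enum { kMessageId = 1 };
  std::string TypeName() const override { return "Ping"; }
};

struct Pong : Message {
  enum { kMessageId = 2 };
  std::string TypeName() const override { return "Pong"; }
};

// Maps the integer ID from a message header to a factory for that type.
class MessageRegistry {
 public:
  template <typename T>
  void Register() {
    const int id = static_cast<int>(T::kMessageId);
    // Two types claiming the same ID is a configuration error.
    assert(factories_.count(id) == 0);
    factories_[id] = [] { return std::unique_ptr<Message>(new T()); };
  }

  // Returns a new object of the registered type, or nullptr if unknown.
  std::unique_ptr<Message> Create(int id) const {
    auto it = factories_.find(id);
    if (it == factories_.end()) return nullptr;
    return it->second();
  }

 private:
  std::map<int, std::function<std::unique_ptr<Message>()>> factories_;
};
```

With such a table, Create(id) returns a new object of the registered type,
or a null pointer for an unknown ID. The cost is that every type has to be
registered by hand, which is exactly the step that pool iteration would
have automated.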

Is there any reason not to permit iteration of descriptors? Or is this
simply a feature that has not been requested or needed? Finally, am I going
about this wrong? Is there a simpler way to accomplish what I am
trying to do (i.e., transmit a message's type from server to client)?

Henner Zeller

Sep 22, 2009, 6:48:05 PM

to Exaurdon, Protocol Buffers

Hi,

On Tue, Sep 22, 2009 at 15:39, Exaurdon <alexric...@gmail.com> wrote:
>
> Summary: I would really like to be able to iterate through the message
> descriptors in a DescriptorPool, in particular the generated_pool.
>
> I am sending various protocol buffers as messages across a transport.
> On the receiving side of the transport, I need to use information
> contained in a message header to instantiate an object of the correct
> type.

[...]

> Finally, am I going about this wrong? Is there a simpler way to accomplish what I am
> trying to do?

Not answering your question about iteration here, but would it be
sufficient to have a container protocol buffer with several optional
fields, one for each message type you want to transmit? Then you can let
Protocol Buffers generate the messages for you:

message ContainerMessage {
  optional MyFirstMessageType foo = 1;
  optional TheOtherMessageType bar = 2;
  optional YetAnotherMessageType baz = 3;
  // ... to be extended with more fields if more different types need
  // to be sent.
}

On the sending side, you just fill in the right field and send the
ContainerMessage.
On the receiving side, you can check with has_foo() whether that message
type was sent.

The overhead of this approach is minimal because the message type is
essentially indicated by a single byte (the field tag for 1, 2, or 3).
Encoding and decoding will also be about as fast as it gets compared
to the metadata approach you're considering. It is actually close to the
type-ID idea, but implemented with what is already there, and
in a type-safe manner, because you have getters and setters that work with
the correct type.
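
To make the single-byte claim concrete: on the wire, each field is preceded
by a key, which is the varint encoding of (field_number << 3) | wire_type,
and embedded messages use wire type 2 (length-delimited). The sketch below
computes that key in plain C++ without the protobuf library; EncodeVarint
and EncodeTag are illustrative helper names, not library functions.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Wire type 2 (length-delimited) is the one used for embedded messages.
constexpr std::uint32_t kWireTypeLengthDelimited = 2;

// Base-128 varint: 7 payload bits per byte, least significant group first,
// high bit set on every byte except the last.
std::vector<std::uint8_t> EncodeVarint(std::uint64_t value) {
  std::vector<std::uint8_t> out;
  while (value >= 0x80) {
    out.push_back(static_cast<std::uint8_t>(value | 0x80));
    value >>= 7;
  }
  out.push_back(static_cast<std::uint8_t>(value));
  return out;
}

// Key that precedes a field on the wire.
std::vector<std::uint8_t> EncodeTag(std::uint32_t field_number,
                                    std::uint32_t wire_type) {
  assert(field_number >= 1 && wire_type <= 7);
  return EncodeVarint((static_cast<std::uint64_t>(field_number) << 3) |
                      wire_type);
}
```

For fields 1, 2, and 3 this yields the single bytes 0x0A, 0x12, and 0x1A.
Field numbers up to 15 fit in one byte; 16 and above need at least two. A
length prefix for the embedded message follows the key in either case.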

-h

Exaurdon

Sep 22, 2009, 7:35:01 PM

to Protocol Buffers

In a limited scope, this would probably work.

In my particular case I am working with teams from 10+ different
projects. Each project is generating dozens to hundreds of messages. I
imagine that trying to get all of those messages into a single message
would result in a pretty bloated message definition. Since each
project team uses their own .proto file, (and each team only needs to
compile and include proto files from other components they communicate
with) there would not be a central place to define this message,
though I could use the 'extension' functionality to get around this.

Another problem is that I think I would end up needing C++ code to
iterate and call each of the 'has_*' methods, which is less than
ideal with a non-trivial set of messages. It looks like I could avoid
some of this by using the reflection API to get a list of the fields
that are set.

Thanks for the idea, I'll continue to look at it. I still would love
to be able to iterate through the descriptor pool. It looks like
internally it is a set of hash_maps, so iteration shouldn't really be
difficult, (aside from constructing a thread-safe interface I
suppose?)

Alex

Kenton Varda

Sep 23, 2009, 1:46:33 PM

to Exaurdon, Protocol Buffers

On Tue, Sep 22, 2009 at 4:35 PM, Exaurdon <alexric...@gmail.com> wrote:

> In my particular case I am working with teams from 10+ different
> projects. Each project is generating dozens to hundreds of messages. I
> imagine that trying to get all of those messages into a single message
> would result in a pretty bloated message definition. Since each
> project team uses their own .proto file, (and each team only needs to
> compile and include proto files from other components they communicate
> with) there would not be a central place to define this message,
> though I could use the 'extension' functionality to get around this.

This is pretty much exactly what extensions were designed for. We had the exact same problem at Google, and I personally invented something called "MessageSet" and later extensions (a refinement of MessageSet) to solve the problem. You will need a shared proto file with a message definition like:

message Outer {
  extensions 1000 to max;
}

Then each project can "extend" it, by writing something like the following *in their own .proto file*:

extend Outer {
  optional FooMessage foo_ext = 1234;
  optional BarMessage bar_ext = 1235;
}

You need to have some system for keeping the extension numbers unique, of course.

> Another problem would be that I think I would end up needing code (C++)
> to iterate and call each of the 'has_*' methods, which is less than
> ideal with a non-trivial set of messages. It looks like I could avoid
> some of this by using the reflection API to get a list of the fields
> that are set.

Yes, you can use the reflection API's ListFields(). But more likely what will happen is that the eventual consumer of the message will know what type it is expecting.

> Thanks for the idea, I'll continue to look at it. I still would love
> to be able to iterate through the descriptor pool. It looks like
> internally it is a set of hash_maps, so iteration shouldn't really be
> difficult, (aside from constructing a thread-safe interface I
> suppose?)

Actually, it would be pretty difficult. The problem is that a DescriptorPool is typically just a cache in front of a DescriptorDatabase. When you look up something that isn't already in the pool's hash_maps, it falls back to the DescriptorDatabase. So we'd also need to add a way to iterate through a database. But DescriptorDatabases can be huge. For instance, we have an implementation inside Google which contains *all* protobuf types defined across *all* Google projects -- a huge number! Thus, DescriptorPool has quite intentionally been designed to avoid iterating over the contents, because the contents are potentially infinite.
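
The cache-plus-fallback structure can be modeled in a few lines. The sketch
below is a simplified, hypothetical model in plain C++ and does not use the
real DescriptorPool or DescriptorDatabase classes; TypeInfo, Database,
SyntheticDatabase, and CachingPool are names invented for illustration. The
synthetic database answers for any name under the prefix "pkg.", so the set
of types it can produce has no finite listing, while the cache only ever
holds what has been looked up.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>

// Hypothetical, simplified stand-in for a descriptor.
struct TypeInfo {
  std::string full_name;
};

// Backing store that can answer lookups by name but cannot be enumerated.
class Database {
 public:
  virtual ~Database() = default;
  virtual bool Find(const std::string& name, TypeInfo* out) = 0;
};

// Resolves every name of the form "pkg.<something>": an unbounded set.
class SyntheticDatabase : public Database {
 public:
  bool Find(const std::string& name, TypeInfo* out) override {
    if (name.size() <= 4 || name.compare(0, 4, "pkg.") != 0) return false;
    out->full_name = name;
    return true;
  }
};

// A cache in front of the database: misses fall through and are remembered.
class CachingPool {
 public:
  explicit CachingPool(Database* fallback) : fallback_(fallback) {
    assert(fallback_ != nullptr);
  }

  const TypeInfo* FindByName(const std::string& name) {
    auto it = cache_.find(name);
    if (it != cache_.end()) return &it->second;  // cache hit
    TypeInfo info;
    if (!fallback_->Find(name, &info)) return nullptr;
    // References to unordered_map elements stay valid after later inserts.
    return &cache_.emplace(name, info).first->second;
  }

  std::size_t CachedCount() const { return cache_.size(); }

 private:
  Database* fallback_;
  std::unordered_map<std::string, TypeInfo> cache_;
};
```

Iterating the cache here would list only the names that happen to have been
requested so far, not the types the pool can resolve, which is why
enumeration is not a meaningful operation on this design.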
