Does Protocol Buffers support polymorphism (in Java at least)?

Maxim Veksler

Aug 21, 2008, 11:58:54 AM

to Protocol Buffers, Tomer Ben-Ezra, Gilad Cohen

Hello everyone,

Let me start by apologizing for the abstract question.

We're currently looking for an alternative solution to SOAP, which is
integrated into our code using XFire.

The main problem is that you cannot pass a subclass to an interface
that is defined as accepting the superclass and expect the system to
deserialize the object as the subclass.

For the sake of example, we have a superclass "Animal" and two
subclasses, "Dog" and "Cat".

[SystemA]-[java]:[SOAP]:[WSDL]:getAnimal(Animal animal);
[SystemB] can send either a Cat or a Dog, but [SystemA] will receive only
an Animal and nothing more.

Do protocol buffers offer a solution to this requirement? If so,
could someone please say a word about what the implementation
requires?

Thank you very much,
Maxim.

P.S.

One possible solution to the problem above is to serialize the object
and send it as a byte[], but this is far from optimal: it can't
interoperate with systems not running Java; there is no data type
enforcement; the object has to be passed together with the class to
cast it to; version upgrades become a problem; and so on.

Another solution is to flatten the class, so that only a list of
keys and values is passed. The same problems as above apply.

Kenton Varda

Aug 21, 2008, 1:47:20 PM

to Maxim Veksler, Protocol Buffers, Tomer Ben-Ezra, Gilad Cohen

With protocol buffers, this problem should be solved using extensions:

http://code.google.com/apis/protocolbuffers/docs/proto.html#extensions

Example:

message Animal {
  // General animal properties.
  optional double weight = 1;
  extensions 1000 to max;
}

message Dog {
  // Dog-specific properties.
  optional float average_bark_frequency = 1;
}

message Cat {
  // Cat-specific properties.
  enum Breed { TABBY = 1; CALICO = 2; ... }
  optional Breed breed = 1;
}

extend Animal {
  optional Dog dog = 1000;
  optional Cat cat = 1001;
}

Note that in this example, it would be possible to create an Animal message which has both the dog and cat extensions filled in. You'll have to write some sort of validation routine to check for that.
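
The validation routine mentioned above could look roughly like the following sketch. It uses small hand-written Java classes as stand-ins for generated code, so `Animal`, `Dog`, `Cat` and `AnimalValidator` are assumptions made for illustration, with a null field modelling an extension that is not set. With real generated classes the same test would query the extensions through the message's `hasExtension` method rather than null checks.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins for the generated message classes.
final class Dog {
    final float averageBarkFrequency;

    Dog(float averageBarkFrequency) {
        this.averageBarkFrequency = averageBarkFrequency;
    }
}

final class Cat {
    final String breed;

    Cat(String breed) {
        this.breed = breed;
    }
}

final class Animal {
    final double weight;
    final Dog dog; // null models "dog extension not set"
    final Cat cat; // null models "cat extension not set"

    Animal(double weight, Dog dog, Cat cat) {
        this.weight = weight;
        this.dog = dog;
        this.cat = cat;
    }
}

final class AnimalValidator {
    /** Lists the subtype extensions that are filled in on this Animal. */
    static List<String> subtypesSet(Animal animal) {
        List<String> set = new ArrayList<>();
        if (animal.dog != null) {
            set.add("dog");
        }
        if (animal.cat != null) {
            set.add("cat");
        }
        return set;
    }

    /** An Animal is treated as well-formed only if exactly one subtype is present. */
    static boolean hasExactlyOneSubtype(Animal animal) {
        return subtypesSet(animal).size() == 1;
    }
}
```

Whether an Animal with no subtype at all should also be rejected is a design choice; this sketch rejects it, but a receiver that tolerates "plain" animals could accept a size of zero as well.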

Marc Gravell

Aug 22, 2008, 3:31:52 AM

to Protocol Buffers

> Kenton Varda said [snip]

Ahh... your example has just answered a question I was asking myself
on the train earlier! I'm working on inheritance support for protobuf-
net, and while I can make it work .NET-.NET, it wouldn't have been
directly portable [as in: hard to represent as a .proto descriptor];
but I reckon I can tweak it to work like the above without much
pain...

(I know we discussed the same thing a while ago, but I was busy...)

Marc

Mats Kindahl

Aug 22, 2008, 3:55:37 AM

to Kenton Varda, Maxim Veksler, Protocol Buffers, Tomer Ben-Ezra, Gilad Cohen

Hi Kenton,

It would be nice to have something like a "type-tagged union", i.e.,

message Animal {
  union Inner {
    message Dog dog = 1;
    message Cat cat = 2;
  }

  optional double weight = 1;

  required Inner animal = 2;
}

message AnimalList {
  repeated Animal animals = 1;
}

It is quite a common use case to want one of several alternative
structures/messages, and this extension would allow heterogeneous
sequences to be represented entirely in protobuf.

Best wishes,
Mats Kindahl
--
Mats Kindahl
Lead Software Developer
Replication Team
MySQL AB, www.mysql.com

Marc Gravell

Aug 25, 2008, 6:47:36 PM

to Protocol Buffers

For info, I've done a commit of protobuf-net that maps inheritance
of .NET classes to extensions as described by Kenton's Animal/Dog/Cat
example. I haven't tested .proto emission (nor done performance
tuning), but the main serializer/deserializer core works, and is wire-
compatible with other implementations.

As an aside - in this scenario I bend the serialization slightly so
that it will write sub-messages that define subclasses *before* it
writes local properties; during deserialization this enables the code
to create the right type earlier, rather than having to go back and re-
map the object to the desired sub-type. I doubt this will impact most
serializers, since the encoding spec indicates that out-of-sequence
tags should be expected...
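
To illustrate why the write order is harmless, here is a minimal sketch of a reader that walks the top-level fields of an encoded message and reports the field numbers in the order they appear. It relies only on the documented wire format (each field starts with a varint key equal to `(field_number << 3) | wire_type`). The class `WireOrderSketch` and the hand-assembled sample bytes are assumptions for illustration and are not taken from protobuf-net.

```java
import java.util.ArrayList;
import java.util.List;

final class WireOrderSketch {
    private final byte[] buf;
    private int pos = 0;

    private WireOrderSketch(byte[] buf) {
        this.buf = buf;
    }

    /** Reads one base-128 varint starting at pos. */
    private long readVarint() {
        long result = 0;
        int shift = 0;
        while (true) {
            byte b = buf[pos++];
            result |= (long) (b & 0x7F) << shift;
            if ((b & 0x80) == 0) {
                return result;
            }
            shift += 7;
        }
    }

    /** Returns the top-level field numbers in the order they appear on the wire. */
    static List<Integer> fieldNumbers(byte[] message) {
        WireOrderSketch in = new WireOrderSketch(message);
        List<Integer> seen = new ArrayList<>();
        while (in.pos < message.length) {
            long key = in.readVarint();
            seen.add((int) (key >>> 3));      // upper bits: field number
            int wireType = (int) (key & 0x7); // low 3 bits: wire type
            switch (wireType) {
                case 0: // varint
                    in.readVarint();
                    break;
                case 1: // 64-bit
                    in.pos += 8;
                    break;
                case 2: // length-delimited: skip the payload
                    int length = (int) in.readVarint();
                    in.pos += length;
                    break;
                case 5: // 32-bit
                    in.pos += 4;
                    break;
                default:
                    throw new IllegalArgumentException("unsupported wire type " + wireType);
            }
        }
        return seen;
    }

    /** Hand-assembled Animal: cat extension (field 1001) first, then weight (field 1). */
    static byte[] catBeforeWeight() {
        return new byte[] {
            (byte) 0xCA, 0x3E, 0x02, 0x08, 0x01, // key 1001/length-delimited, 2-byte Cat { breed = 1 }
            0x09, 0, 0, 0, 0, 0, 0, 0x10, 0x40   // key 1/64-bit, little-endian double 4.0
        };
    }
}
```

A decoder that dispatches on the field number from each key, rather than on position, handles this layout and the "weight first" layout identically, which is why writing the subclass sub-message first stays wire-compatible.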

Marc

Colin Fleming

Aug 26, 2008, 11:02:17 AM

to Kenton Varda, Maxim Veksler, Protocol Buffers, Tomer Ben-Ezra, Gilad Cohen

The problem I see with this example is that you have to know all the derived types at compile time. Is there any pattern for an extensible model, where you could add more derived types without touching the base type?

Cheers,
Colin

ksuco...@gmail.com

Aug 26, 2008, 11:28:27 AM

to Protocol Buffers

It would be nice to extend the syntax for inheritance, and to use a form
of annotation to give the compiler hints as to how the classes /
interfaces should be generated.
So, two ways to support inheritance:

1. Define a file type (e.g. .protoext) with a new file extension that
supports the inheritance syntax.

- The new file type supports the new keywords 'extends' and
'implements' for directly specifying the inheritance
relationship between messages.
- A new keyword 'reserved' is used to allocate a (relative)
range of indices for a message. This helps with future extension of
messages.

2. Support annotations that give the compiler hints on how to generate
the class (Java, C++, etc.):

- @Hierarchi <child>:<parent-1>:<parent-2>:<parent-3>, etc.
(no multiple inheritance; no interface support)

A developer can either go with approach #2 and specify the inheritance
relationship directly using annotations, or specify the relationships
using the extended syntax and have the .proto file generated from
the definitions in the .protoext file.

Below is a simple example showing the concept:

o Define messages using the new syntax / extension that supports
inheritance (i.e. file name: Types.protoext)

message Animal reserved 10 {
  optional double weight = 1;
  optional Color color = 2;
}

message Dog extends Animal reserved 10 {
  optional float average_bark_frequence = 1;
}

message Cat extends Animal reserved 10 {
  optional Breed breed = 1;
}

message Lion extends Cat reserved 10 {
  optional bool isAlphaMale = 1;
}

o Result of compiling the .protoext file above (or this can be
hand-coded; filename: Types.proto):

//@Hierarchi Lion:Cat:Animal
message Lion {
  //-- Animal's indices start at 1 because it is the base
  //-- class (Lifeform does not count since it is an interface)
  //@Class Animal
  optional double weight = 1;

  //@Class Animal
  optional Color color = 2;

  //-- Cat's indices start at 11 because reserved is set
  //-- to 10: 10+1
  //@Class Cat
  optional Breed breed = 11;

  //-- Lion's indices start at 21 because reserved is set
  //-- to 10: 10+10+1
  //@Class Lion
  optional bool isAlphaMale = 21;
}

//@Hierarchi Dog:Animal
message Dog {
  //-- See comments for Lion above
  //@Class Animal
  optional double weight = 1;

  //@Class Animal
  optional Color color = 2;

  //-- See comments for Cat above
  //@Class Dog
  optional float average_bark_frequence = 11;
}

o Another example of a generated .proto file, with the 'reserved'
value not specified (i.e. default) or set to 0:

//@Hierarchi Lion:Cat:Animal
message Lion {
  //-- See comments for Animal above
  //@Class Animal
  optional double weight = 1;

  //@Class Animal
  optional Color color = 2;

  //-- Cat's indices start at 3 here because there is no
  //-- reservation for the superclass Animal, so the index
  //-- continues from 3, following the maximum index in Animal.
  //@Class Cat
  optional Breed breed = 3;

  //-- See comments for Cat above
  //@Class Lion
  optional bool isAlphaMale = 4;
}

//@Hierarchi Dog:Animal
message Dog {
  //-- See comments for Animal above
  //@Class Animal
  optional double weight = 1;

  //@Class Animal
  optional Color color = 2;

  //-- See comments for Cat above
  //@Class Dog
  optional float average_bark_frequence = 3;
}

Now we can do something like:

message Zoo {
  repeated Animal animals = 1;
}

This is not perfect, since you can instantiate an Animal, which should
be an abstract object, but it does allow a more natural and safer
way to support inheritance in Protocol Buffers.
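
The numbering rule used in the two generated examples above can be written down as a small calculation. The sketch below is only one reading of the proposal: `ReservedNumbering` and `firstFieldNumber` are hypothetical names, and the rule (an ancestor consumes its `reserved` block, or just its own field count when nothing is reserved) is inferred from the example comments, not from any existing protoc feature.

```java
// Hypothetical helper; the rule is inferred from the generated examples above.
final class ReservedNumbering {
    /**
     * First field number available to a message, given its ancestors listed
     * from the base class down. reserved[i] == 0 means "no reservation", in
     * which case that ancestor consumes only as many numbers as it has fields.
     */
    static int firstFieldNumber(int[] reserved, int[] fieldCounts) {
        if (reserved.length != fieldCounts.length) {
            throw new IllegalArgumentException("one entry per ancestor is required");
        }
        int consumed = 0;
        for (int i = 0; i < reserved.length; i++) {
            consumed += reserved[i] > 0 ? reserved[i] : fieldCounts[i];
        }
        return consumed + 1;
    }
}
```

For Lion, whose ancestors are Animal (two fields) and Cat (one field), reserving 10 for each gives 10+10+1 = 21, while reserving nothing gives 2+1+1 = 4, matching the numbers in the two listings.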

Kenton Varda

Aug 26, 2008, 4:45:21 PM

to Colin Fleming, Maxim Veksler, Protocol Buffers, Tomer Ben-Ezra, Gilad Cohen

On Tue, Aug 26, 2008 at 8:02 AM, Colin Fleming <colin.ma...@gmail.com> wrote:

> The problem I see with this example is that you have to know all the derived types at compile time.

No you don't. The Cat and Dog types could be declared in a completely separate .proto file. They could optionally not be compiled in to programs which do not use them.

> Is there any pattern for an extensible model, where you could add more derived types without touching the base type?

This can be done just fine with extensions.

Kenton Varda

Aug 26, 2008, 4:50:08 PM

to ksuco...@gmail.com, Protocol Buffers

On Tue, Aug 26, 2008 at 8:28 AM, <ksuco...@gmail.com> wrote:

> It would be nice to extend the syntax for inheritance, and to use a form
> of annotation to give the compiler hints as to how the classes /
> interfaces should be generated....

In your example, when you parse an Animal off the wire, how do you then "down-cast" that to a Dog or a Cat? Do you have to know which type to expect before you parse?

ksuco...@gmail.com

Aug 26, 2008, 11:59:34 PM

to Protocol Buffers

Hmm, in the example, since the exact Animal is not known at
compile time, this information needs to be sent over the wire to the
parser as part of the encoded data. Perhaps the available wire types
could be extended to carry the runtime type information (e.g. by
reusing type 3 or 4, which have been deprecated, as a length-limited
type that carries the name of the protocol buffer message to
use for the parsing)?

Colin Fleming

Aug 27, 2008, 12:17:58 PM

to prot...@googlegroups.com

>> The problem I see with this example is that you have to know all the derived types at compile time.

> No you don't. The Cat and Dog types could be declared in a completely separate .proto file. They could optionally not be compiled in to programs which do not use them.

Ok, I obviously misunderstood something here. I'll take another look - thanks for the pointers.

Cheers,
Colin

Kenton Varda

Aug 27, 2008, 1:50:17 PM

to ksuco...@gmail.com, Protocol Buffers

On Tue, Aug 26, 2008 at 8:59 PM, <ksuco...@gmail.com> wrote:

> Hmm, in the example, since the exact Animal is not known at
> compile time, this information needs to be sent over the wire to the
> parser as part of the encoded data. Perhaps the available wire types
> could be extended to carry the runtime type information (e.g. by
> reusing type 3 or 4, which have been deprecated, as a length-limited
> type that carries the name of the protocol buffer message to
> use for the parsing)?

What happens when the receiver has the Animal type compiled in but not the Cat type? I suppose the type information we send needs to identify the type and all its superclasses, so that the receiver can choose to parse it as the closest superclass that it knows about.

But this is all getting very hairy. The advantage of extensions is that they fit naturally into the existing wire format without adding any new concepts like RTTI.

ksuco...@gmail.com

Aug 27, 2008, 10:56:01 PM

to Protocol Buffers

Only the type Cat needs to be sent over the wire. The hierarchy
information is known at compile time, so there is no need to send the
entire hierarchy over the wire, just the type of the
concrete class instantiated by the sender. Here is an example of a
home that has a pet of type Cat:

message Home {
  optional Animal pet = 1;
}

//@Hierarchi Cat:Animal
message Cat {
  ...
}

sender:

  Home home = new Home();
  home.setPet(new Cat("fluffy")); // okay because Cat is a type of Animal
  outstream.write(home.getBytes());

receiver:

  // pet is set to a Cat because the type Cat is sent over the wire
  Home home = Home.parseFrom(inStream);
  Animal pet = home.getPet();
  // okay, since the Cat is constructed from the RTTI sent over the wire
  Cat cat = (Cat) pet;

Here is another example:

sender:

  Cat cat = new Cat();
  cat.setName("Garfield");
  outstream.write(cat.getBytes());

receiver:

  // The Animal parser sees Cat as the type sent over the wire. It looks up
  // the meta information for Cat and finds that it is a type of Animal,
  // based on the hierarchy information known at compile time (from the
  // annotation specified). It delegates the parsing to the parser for Cat.
  Animal animal = Animal.parseFrom(inStream);
  Cat cat = (Cat) animal; // okay, since the Cat is constructed by the Cat builder

Kenton Varda

Aug 27, 2008, 11:14:08 PM

to ksuco...@gmail.com, Protocol Buffers

On Wed, Aug 27, 2008 at 7:56 PM, <ksuco...@gmail.com> wrote:

> Only the type Cat needs to be sent over the wire. The hierarchy
> information is known at compile time....

So, the receiver has to have all possible subclasses of Animal compiled in? This seems like a pretty big limitation to me.

Also, your factory pattern works well in Java, but in C++ it's usually the case that the server allocates one message object and reuses it for each request (for efficiency). That wouldn't be possible with your scheme. Not to mention, many people prefer not to enable RTTI in C++, so there's no way to check if an object is an instance of any particular class.

We've discussed inheritance many times over the years, and the conclusion we keep coming to is that it just doesn't fit well with the protocol buffers model, leading to lots of little complications like the above. Extensions were explicitly designed to avoid these problems.

ksuco...@gmail.com

Aug 28, 2008, 2:45:34 AM

to Protocol Buffers

If there are thousands of possible Animals, don't you have to
compile in all the possible Animals, since the true implementation is
determined dynamically? However, if both the sender and the
receiver agree to deal with only a limited subset of all possible
cases, they can include only the subset necessary for the
application, since the other types will never be sent; otherwise errors or
exceptions will be thrown.

As for efficiency vs. functionality, that's a trade-off the design
will have to deal with.

daveb

Aug 28, 2008, 7:11:41 PM

to Protocol Buffers

FWIW, I'm just beginning to try in earnest to figure out what to do for
the C binding. Since the exact type of the message isn't included in
the message itself, it falls on the code to pick the exact type for
the message.

My plan is to make a function which returns an instance of a subclass
given an instance of the class and the subclass type itself. It would
interpret the "unknown_fields" part of the message to create the
variables of the subclass.

A trivial example would be something like:

  animal = animal__unpack (NULL, msg_len, msg_data);
  if (strcmp (animal->species, "zebra") == 0)
    {
      Zebra *zebra = zebra__create_from__animal (animal);
      ...
    }

It'd be more efficient if there were a way to get the type/descriptor
before completely deserializing... oh well...

It does seem clear that often the type information is most usefully
encoded in the message itself. It's easy to invent solutions,
although bookkeeping does become a hassle to allow client-side reuse
of messages.

- dave

Kenton Varda

Aug 29, 2008, 1:21:48 PM

to ksuco...@gmail.com, Protocol Buffers

On Wed, Aug 27, 2008 at 11:45 PM, <ksuco...@gmail.com> wrote:

> If there are thousands of possible Animals, don't you have to
> compile in all the possible Animals, since the true implementation is
> determined dynamically?

Right. But extensions don't have this problem. An unknown extension just goes in the UnknownFieldSet.

> However, if both the sender and the
> receiver agree to deal with only a limited subset of all possible
> cases, they can include only the subset necessary for the
> application, since the other types will never be sent; otherwise errors or
> exceptions will be thrown.

In practice I don't think this would work very well. Many apps at Google that use extensions rely on the fact that they don't need to compile in definitions for extensions they might receive but don't actually care about. This is especially important for servers that act as one part of a pipeline.
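
The pipeline behaviour described here can be modelled with a toy sketch. It does not use the protobuf runtime: fields are represented as a map from field number to a string standing in for the still-encoded payload, and `PipelineStage` and its methods are hypothetical names. In the real Java runtime the role of the "unknown" map is played by `UnknownFieldSet`.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

// Toy model of one pipeline stage; String values stand in for encoded payloads.
final class PipelineStage {
    private final Set<Integer> compiledIn;

    PipelineStage(Set<Integer> compiledInFieldNumbers) {
        this.compiledIn = compiledInFieldNumbers;
    }

    /** Fields this stage has no definition for; they are kept, not rejected. */
    Map<Integer, String> unknownFields(Map<Integer, String> incoming) {
        Map<Integer, String> unknown = new LinkedHashMap<>();
        for (Map.Entry<Integer, String> field : incoming.entrySet()) {
            if (!compiledIn.contains(field.getKey())) {
                unknown.put(field.getKey(), field.getValue());
            }
        }
        return unknown;
    }

    /** Rewrites one known field and forwards everything else untouched. */
    Map<Integer, String> forward(Map<Integer, String> incoming, int fieldNumber, String newValue) {
        if (!compiledIn.contains(fieldNumber)) {
            throw new IllegalArgumentException("stage does not know field " + fieldNumber);
        }
        Map<Integer, String> outgoing = new LinkedHashMap<>(incoming);
        outgoing.put(fieldNumber, newValue);
        return outgoing;
    }
}
```

A stage compiled with only field 1 (weight) can update the weight of an Animal carrying extension 1001 and pass the message on; the cat payload survives even though the stage never had the Cat definition.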

Kenton Varda

Aug 29, 2008, 1:23:15 PM

to daveb, Protocol Buffers

On Thu, Aug 28, 2008 at 4:11 PM, daveb <lahi...@gmail.com> wrote:

> FWIW, I'm just beginning to try in earnest to figure out what to do for
> the C binding. Since the exact type of the message isn't included in
> the message itself, it falls on the code to pick the exact type for
> the message.

Err... I think you should implement extensions, not inheritance.

chi...@gmail.com

Sep 2, 2008, 9:14:17 PM

to Protocol Buffers

Sorry for coming into this thread late (again).

I personally think this thread is trying to address two problems. If
you separate them out, I think it becomes easier to think about. First
off, let's forget about polymorphism. Polymorphism is just a lazy man's
tool :)
Let's say I'm using protobufs to log different messages to a single log
file. To support this use case, I would have to do something like the
following:

message Foo { ... }
message Bar { ... }
message Baz { ... }
message LogEntry {
  enum TypeEnum {
    FOO = 1;
    BAR = 2;
    BAZ = 3;
  }
  required TypeEnum type = 1;
  required bytes content = 2;
}

Then I would marshal the Foo, Bar, and Baz message types into the
content field of the LogEntry and set the type enum to the correct
type. I believe that's a common scenario. I think it would make
everyone's life easier if the above construct were handled by the code
generators. I'm going to call this a 'union' for lack of a better
term. Something like:

message LogEntry {
  union TypeUnion {
    Foo = 1;
    Bar = 2;
    Baz = 3;
  }
  required TypeUnion content = 1;
}

Then in the API you could use msg.setContent(Message msg) and
Message msg.getContent(). In languages that have built-in RTTI, like
Java/C#, perhaps that's all that you add, but in C/C++ you would also
generate msg.setContentType(TypeUnion type) and TypeUnion
msg.getContentType().
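
For comparison, this is roughly what a reader has to write by hand today with the enum-plus-bytes LogEntry from the first listing. It is a sketch only: `LogEntry` and `LogReader` are hand-written stand-ins rather than generated classes, and the content bytes are treated as UTF-8 text so the example stays self-contained. In real code each branch would instead parse the bytes as the matching message type.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical stand-in for a generated LogEntry; content holds the
// serialized inner message, modelled here as UTF-8 text.
final class LogEntry {
    enum TypeEnum { FOO, BAR, BAZ }

    final TypeEnum type;
    final byte[] content;

    LogEntry(TypeEnum type, byte[] content) {
        this.type = type;
        this.content = content;
    }
}

final class LogReader {
    /** Picks the decoder for content based on the explicit type tag. */
    static String decode(LogEntry entry) {
        String payload = new String(entry.content, StandardCharsets.UTF_8);
        switch (entry.type) {
            case FOO:
                return "Foo[" + payload + "]"; // real code: parse content as a Foo
            case BAR:
                return "Bar[" + payload + "]"; // real code: parse content as a Bar
            case BAZ:
                return "Baz[" + payload + "]"; // real code: parse content as a Baz
            default:
                throw new IllegalStateException("unknown type " + entry.type);
        }
    }
}
```

The switch is exactly the boilerplate the proposed 'union' construct would move into the code generator.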

Now comes the second part of the problem: polymorphism. I think
message polymorphism would be handy for lazy folks (no insult
intended, I'm the king of lazy) who want to avoid repeating fields in
multiple message types. Going back to my logging use case, if all my
log messages have a 'required string transactionId=1' field, then it
would be nice if I could just write:

message Transacted {
  required string transactionId=1;
  extensions 100 to max;
}
message Foo extends Transacted { ... }
message Bar extends Transacted { ... }
message Baz extends Transacted { ... }

Now if the language supports inheritance, sure, why not have the Foo
class extend Transacted. But even if the language does not support
inheritance, I think you still get a win because you just made a Foo
class with all the fields that Transacted has without having to repeat
them (and possibly get them wrong) in the proto definition.

--
Regards,
Hiram

Blog: http://hiramchirino.com

Open Source SOA
http://open.iona.com

Kenton Varda

Sep 3, 2008, 2:35:28 PM

to chi...@gmail.com, Protocol Buffers

On Tue, Sep 2, 2008 at 6:14 PM, hi...@hiramchirino.com <chi...@gmail.com> wrote:

> Going back to my logging use case, if all my
> log messages have a 'required string transactionId=1' field, then it
> would be nice if I could just write:
>
> message Transacted {
>   required string transactionId=1;
>   extensions 100 to max;
> }
> message Foo extends Transacted { ... }
> message Bar extends Transacted { ... }
> message Baz extends Transacted { ... }
>
> Now if the language supports inheritance, sure, why not have the Foo
> class extend Transacted. But even if the language does not support
> inheritance, I think you still get a win because you just made a Foo
> class with all the fields that Transacted has without having to repeat
> them (and possibly get them wrong) in the proto definition.

Why not make Foo, Bar, and Baz each contain a Transacted, like so:

message Foo {
  optional Transacted transacted = 1;
  ...
