Protobuf: Union Encoding


wmsar

Sep 2, 2008, 6:44:23 PM
to Protocol Buffers
Hi,

I'm new to protobuf as of 8 hours, and have some legacy software that
needs to communicate across a network where the messages need to be
"properly" encoded / decoded, such as is done in protobuf.

I'm curious how things like C/C++ unions would work when encoding /
decoding?

Is it as straightforward as specifying a series of "optional"
parameters, and on the decoding side figuring out which one was
present through the protobuf Message API, and then re-constructing the
"real" union for the legacy application software to use?

Regards

David Anderson

Sep 2, 2008, 6:51:32 PM
to wmsar, Protocol Buffers

That is correct. Any optional field having no value will be omitted in
the wire byte stream, which effectively implements a superset of
union. If you want the strict semantics of "exactly one field set",
you need to implement that in your own application logic. However,
given the C++ implementation of protobufs, such a translation should
remain blazingly fast for any sane number of union fields (say, under
20).
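
A minimal sketch of the translation described above, assuming a hypothetical legacy tagged union (LegacyValue) with two members. DecodedValue is a hand-written stand-in for a decoded protobuf message, not generated code; a real generated class would expose has_i()/i() style accessors instead of public fields. The "exactly one field set" check lives in application code, as the reply suggests.

```cpp
#include <cstdint>
#include <stdexcept>
#include <string>

// Hypothetical legacy type: the tagged C-style union the old application expects.
enum LegacyKind { LEGACY_INT, LEGACY_DOUBLE };
struct LegacyValue {
    LegacyKind kind;
    union {
        std::int32_t i;
        double d;
    } u;
};

// Stand-in for a decoded message with two optional fields (assumption:
// a generated protobuf class would offer has_i()/i() and has_d()/d()).
struct DecodedValue {
    bool has_i = false;
    std::int32_t i = 0;
    bool has_d = false;
    double d = 0.0;
};

// Enforce "exactly one field set" ourselves, then rebuild the legacy union.
LegacyValue ToLegacy(const DecodedValue& msg) {
    const int set_count = (msg.has_i ? 1 : 0) + (msg.has_d ? 1 : 0);
    if (set_count != 1) {
        throw std::invalid_argument("expected exactly one union member, got " +
                                    std::to_string(set_count));
    }
    LegacyValue out{};
    if (msg.has_i) {
        out.kind = LEGACY_INT;
        out.u.i = msg.i;
    } else {
        out.kind = LEGACY_DOUBLE;
        out.u.d = msg.d;
    }
    return out;
}
```

The check costs one comparison per candidate field, so it stays cheap for a small number of union members.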

- Dave


Gregory P. Smith

Sep 2, 2008, 7:27:41 PM
to wmsar, Protocol Buffers

Yep, you're thinking along the right lines.

-gps

chi...@gmail.com

Sep 2, 2008, 9:27:12 PM
to Protocol Buffers
Lol.. I just posted something regarding this subject at:

http://groups.google.com/group/protobuf/msg/89a64fd5e5557d58

The short of it is I think the protocol buffers should support some
kinda syntax that allows you to do something like:

message Foo {...}
message Bar {...}
message Baz {...}
message LogEntry {
  union TypeUnion {
    Foo = 1;
    Bar = 2;
    Baz = 3;
  }
  required TypeUnion content = 1;
}

Read the original post for more details.
--
Regards,
Hiram

Blog: http://hiramchirino.com

Open Source SOA
http://open.iona.com

Kenton Varda

Sep 3, 2008, 2:48:15 PM
to chi...@gmail.com, Protocol Buffers
If we were to support a union encoding, I think it should look like this:

message Foo {
  optional union bar {
    int32 a = 1;
    Baz b = 2;
    string c = 3;
  }
}

In other words, all it says is that "only one of these fields will be set", thus allowing code generators to share memory between them.  The fields a, b, and c would have the exact same methods that optional fields normally have, like has_a(), a(), etc.  Additionally, there would be a method like "bar()" which returns an enum identifying which of the fields is currently set.
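
To make the proposed accessors concrete, here is a hand-written C++ sketch of what a code generator might emit for the hypothetical "optional union bar" above. Nothing here is real generated code: the enum name BarCase and its values are assumptions, the Baz member is left out for brevity, and the members are stored side by side rather than in shared memory, since sharing storage with a std::string would need manual lifetime handling.

```cpp
#include <cstdint>
#include <string>

// Approximation of a generated class for:
//   optional union bar { int32 a = 1; Baz b = 2; string c = 3; }
// (member b omitted; all names below are illustrative assumptions)
class Foo {
public:
    // Identifies which member of the union is currently set.
    enum BarCase { BAR_NOT_SET = 0, BAR_A = 1, BAR_C = 3 };

    BarCase bar() const { return case_; }

    bool has_a() const { return case_ == BAR_A; }
    std::int32_t a() const { return has_a() ? a_ : 0; }
    void set_a(std::int32_t value) {
        c_.clear();  // setting one member clears the others
        a_ = value;
        case_ = BAR_A;
    }

    bool has_c() const { return case_ == BAR_C; }
    const std::string& c() const { return c_; }  // empty when c is not set
    void set_c(const std::string& value) {
        a_ = 0;
        c_ = value;
        case_ = BAR_C;
    }

private:
    BarCase case_ = BAR_NOT_SET;
    std::int32_t a_ = 0;
    std::string c_;
};
```

Callers can either test has_a() / has_c() individually or switch on bar() to dispatch on the active member.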

The advantages are:
* No need to modify the wire format.
* Implementations can get by ignoring unions (just treat them as separate fields) until they care to implement them.
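
The reason no wire-format change is needed is that each field is written on its own as a key followed by a value, and unset fields contribute no bytes. The sketch below encodes a single varint-typed field by hand to show this; the function names are illustrative, and it covers only wire type 0 (varint), not strings or nested messages.

```cpp
#include <cstdint>
#include <vector>

// Append an unsigned value as a base-128 varint: 7 bits per byte,
// least significant group first, high bit set on all but the last byte.
void AppendVarint(std::vector<std::uint8_t>& out, std::uint64_t value) {
    while (value >= 0x80) {
        out.push_back(static_cast<std::uint8_t>((value & 0x7F) | 0x80));
        value >>= 7;
    }
    out.push_back(static_cast<std::uint8_t>(value));
}

// Encode one varint field: key = (field_number << 3) | wire_type, wire type 0.
std::vector<std::uint8_t> EncodeVarintField(std::uint32_t field_number,
                                            std::uint64_t value) {
    std::vector<std::uint8_t> out;
    AppendVarint(out, (static_cast<std::uint64_t>(field_number) << 3) | 0);
    AppendVarint(out, value);
    return out;
}
```

For example, EncodeVarintField(1, 150) yields the bytes 08 96 01. The same bytes result whether field 1 is declared as a plain optional field or as a member of a union, which is why an implementation that ignores unions can still read the message.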

All that said, I'm not convinced this is worth implementing.  It would add a bunch of complication, and we've gotten by fine without this for many years.

chi...@gmail.com

Sep 3, 2008, 3:00:17 PM
to Protocol Buffers
Good point.. guess the code generator could handle unionizing a
standard message encoding.. if multiple fields are encoded we could
go with a last one wins approach. And to make the proto files also
backward compatible, that could just be a message level option like:

message Foo {
  option union true;
  int32 a = 1;
  Baz b = 2;
  string c = 3;
}
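
A rough sketch of the "last one wins" idea, assuming for simplicity that every union member is a varint-typed field (the Baz and string members above would need length-delimited handling, which is omitted). UnionState and the function names are hypothetical. The decoder walks the fields in wire order and lets each one overwrite whichever member was seen before it.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Which union member is active after decoding, and its value.
struct UnionState {
    std::uint32_t active_field = 0;  // 0 means "nothing set"
    std::uint64_t value = 0;
};

// Read one base-128 varint starting at pos; returns false on truncated input.
bool ReadVarint(const std::vector<std::uint8_t>& in, std::size_t& pos,
                std::uint64_t& out) {
    out = 0;
    for (int shift = 0; shift < 64 && pos < in.size(); shift += 7) {
        const std::uint8_t byte = in[pos++];
        out |= static_cast<std::uint64_t>(byte & 0x7F) << shift;
        if ((byte & 0x80) == 0) return true;
    }
    return false;
}

// Walk the fields in wire order; a later member replaces an earlier one.
bool DecodeLastOneWins(const std::vector<std::uint8_t>& in, UnionState& state) {
    std::size_t pos = 0;
    while (pos < in.size()) {
        std::uint64_t key = 0;
        std::uint64_t value = 0;
        if (!ReadVarint(in, pos, key)) return false;
        if ((key & 0x7) != 0) return false;  // sketch handles varint fields only
        if (!ReadVarint(in, pos, value)) return false;
        state.active_field = static_cast<std::uint32_t>(key >> 3);
        state.value = value;
    }
    return true;
}
```

Given the bytes 08 96 01 10 05 (field 1 = 150 followed by field 2 = 5), the decoder ends with field 2 active and value 5; the earlier field 1 is discarded.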

BTW I do think this is a very common issue that comes up. Any time
you are working with message formats you're going to have to frame them
in something that will let you know the type you're working with. Sure
the app could do the framing, but I think it would be much nicer if we
did it for them.
