ParseFromString() Performance Cost

jeun...@gmail.com

Aug 21, 2008, 4:39:47 PM

to Protocol Buffers

Hi folks,

I've got a question about reading messages using Protocol Buffers.

In this scenario, I have an application that is reading in a bunch of
messages from a file. The thing is, there are actually a whole bunch
of different data types stored in this file, and it is the
responsibility of the reading application to correctly interpret these
different data types.

Since the data has been serialized, the application must use the
ParseFromString function to "unpack" the data back into a usable
format.

My question is this: What is the "accepted way" to differentiate
between the different data types on the receiving side?

In the examples I've looked at (See:
http://code.google.com/apis/protocolbuffers/docs/reference/cpp/google.protobuf.message.html)
it appears that the receiving side just "knows" what the data is
before it tries to ParseFromString.

For instance, if I know that the data is of type Foo1, I can do this:

Foo1 foo1;
foo1.ParseFromString(data);

However, if I'm not sure if the data is Foo1, Foo2, Foo3 or Foo4, it
seems like I would have to do something like this:

Foo1 foo1;
Foo2 foo2;
Foo3 foo3;
Foo4 foo4;

if (foo1.ParseFromString(data)) { /* The data is of type Foo1 */ }
else if (foo2.ParseFromString(data)) { /* The data is of type Foo2 */ }
else if (foo3.ParseFromString(data)) { /* The data is of type Foo3 */ }
else if (foo4.ParseFromString(data)) { /* The data is of type Foo4 */ }

Obviously, the more types of data that it could be, the more times I
have to call ParseFromString(). From a performance standpoint, is
ParseFromString a relatively lightweight call?

Or is there perhaps a better way to solve this problem without having
to make a bunch of blind calls to ParseFromString()?

-Michael

Kenton Varda

Aug 21, 2008, 4:52:16 PM

to jeun...@gmail.com, Protocol Buffers

ParseFromString() may return true even if the message being parsed is a different type, because it may have a similar underlying structure.
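To illustrate why that happens: the wire format carries only field numbers and wire types, not message or field names, so the same bytes can be valid for several message types. The sketch below is a hand-rolled decoder for a single varint field, not the protobuf library itself; the names VarintField, ReadVarint and DecodeVarintField are made up for this example. The three bytes 08 96 01 decode to "field 1, varint, value 150", which would fit any message that declares an integer field numbered 1.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// One decoded field: number and wire type from the key, plus the varint value.
struct VarintField {
    std::uint32_t field_number;
    std::uint32_t wire_type;
    std::uint64_t value;
};

// Reads one base-128 varint starting at pos and advances pos.
// Returns false if the buffer ends before the varint does.
bool ReadVarint(const std::string& buf, std::size_t& pos, std::uint64_t& out) {
    out = 0;
    for (int shift = 0; shift < 64 && pos < buf.size(); shift += 7) {
        const std::uint8_t byte = static_cast<std::uint8_t>(buf[pos++]);
        out |= static_cast<std::uint64_t>(byte & 0x7F) << shift;
        if ((byte & 0x80) == 0) return true;  // high bit clear: last byte
    }
    return false;
}

// Decodes the first field of buf, assuming it is a varint field (wire type 0).
// Note that nothing in the bytes says which message type they came from.
bool DecodeVarintField(const std::string& buf, VarintField& field) {
    std::size_t pos = 0;
    std::uint64_t key = 0;
    if (!ReadVarint(buf, pos, key)) return false;
    field.field_number = static_cast<std::uint32_t>(key >> 3);
    field.wire_type = static_cast<std::uint32_t>(key & 0x7);
    if (field.wire_type != 0) return false;  // this sketch handles varints only
    return ReadVarint(buf, pos, field.value);
}
```

So if Foo1 and Foo2 both had, say, an int32 or int64 field numbered 1, either one would accept these bytes, and fields the target type does not declare are skipped as unknown rather than treated as errors.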

You need to either write a tag before each message in your file that identifies the type (which you would parse and handle yourself), or use a container type like:

message Container {
  required string type_name = 1;
  required bytes message = 2;
}

Here, "message" is the actual message and you figure out what type it is from type_name.
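For the first option (a tag written before each message), a rough sketch of the framing might look like the following. It uses only the standard library and treats each serialized message as an opaque string; the RecordType values, the 1-byte tag plus 4-byte little-endian length layout, and the function names are all assumptions made for this example rather than anything protobuf defines. The length prefix is there because a serialized message does not mark its own end.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical type tags. The values are arbitrary, but they must stay
// stable once files containing them exist.
enum class RecordType : std::uint8_t { kFoo1 = 1, kFoo2 = 2, kFoo3 = 3, kFoo4 = 4 };

struct Record {
    RecordType type;
    std::string payload;  // serialized message bytes, e.g. from SerializeToString()
};

// Appends one record: 1-byte tag, 4-byte little-endian length, then the payload.
void AppendRecord(std::string& out, RecordType type, const std::string& payload) {
    out.push_back(static_cast<char>(type));
    const std::uint32_t len = static_cast<std::uint32_t>(payload.size());
    for (int i = 0; i < 4; ++i) {
        out.push_back(static_cast<char>((len >> (8 * i)) & 0xFF));
    }
    out.append(payload);
}

// Splits a buffer back into records. Returns false if a header or payload
// is cut short; records read before that point are left in the vector.
bool ReadRecords(const std::string& in, std::vector<Record>& records) {
    std::size_t pos = 0;
    while (pos < in.size()) {
        if (in.size() - pos < 5) return false;  // need tag + length
        const RecordType type =
            static_cast<RecordType>(static_cast<std::uint8_t>(in[pos]));
        std::uint32_t len = 0;
        for (int i = 0; i < 4; ++i) {
            len |= static_cast<std::uint32_t>(
                       static_cast<std::uint8_t>(in[pos + 1 + i])) << (8 * i);
        }
        pos += 5;
        if (in.size() - pos < len) return false;  // payload truncated
        records.push_back(Record{type, in.substr(pos, len)});
        pos += len;
    }
    return true;
}
```

The reader would then switch on record.type and call ParseFromString(record.payload) exactly once, on the matching message class, instead of trying each type in turn.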

Or, if there are only a small number of possible types (e.g. A, B, and C), you could do:

message Container {
  // Exactly one of these will be filled in.
  optional A a = 1;
  optional B b = 2;
  optional C c = 3;
}
