libprotobuf ERROR google/protobuf/message.cc:121

83 views
Skip to first unread message

dan.schm...@gmail.com

unread,
May 22, 2009, 6:59:48 AM5/22/09
to Protocol Buffers
Hey, there.

Massive discussion at work. We have two types of pb messages with
required fields (A & B, let's say) and we're using A.ParseFromString()
to determine the type. Obviously, when the message that arrives is of
type B, the parsing fails and we get the pertinent "Can't parse
message of type". Now, that comes with a capitalised ERROR at the
beginning of the message. Several questions now:

1. Does that mean something really wrong happens inside the pb code?
2. Why would this message appear inconsistently?
3. Diving into the code, the output comes from line 121 of message.cc
(as the title of the post says ;) ). Output should redirect to the
GOOGLE_LOG. Is there anyway of defining where this out actually goes
to? Like a flag when you compile them?

I am aware of the ongoing discussion between required and repeated
fields, but the confusing bit to me as a developer is that the
function returns a boolean. So when I ask it to parse, should I not
only expect just a true and a false? Or if an error actually happens,
would it not be better to let me handle it? Maybe an int or error code
return value would be handy to indicate the exact nature of the
error...

Anyway, thanks for any thoughts on that.

Oh, and the response to this post is going to settle who gets to hold
the TV remote at the office, so this is important stuff ;)

Regards,

Dan

Kenton Varda

unread,
May 22, 2009, 12:44:53 PM5/22/09
to dan.schm...@gmail.com, Protocol Buffers
You can capture log output using SetLogHandler():


However, you have a deeper problem:  Protocol Buffers does not guarantee that trying to parse an A from data which is actually of type B will fail.  In fact, if your input is a valid protocol buffer of *some* type, then the only reason parsing it as type A would fail is if the message did not happen to satisfy A's required fields.  Remember that protocol buffers are not self-describing, so there is no way to know for sure what a particular message's type is or isn't.  It may be worth reading the encoding documentation to understand what is actually being sent on the wire.

In any case, what you really need to do is create a message like:

message AorB {
  optional A a = 1;
  optional B b = 2;
}

Then send *that* message.  Now you can check has_a() and has_b() to distinguish the types.

dan.schm...@gmail.com

unread,
May 22, 2009, 2:06:17 PM5/22/09
to Protocol Buffers
Thanks a lot for that reply.

I do appreciate that two messages with the same layout will parse
indifferently as each others type. In the particular situation we are
just now, A and B are completely different, so there is no possibility
of confusion there. Dealing with specific messages, we do not rely
exclusively on the correct parsing of them to identify their type.
However, we do use it to verify that specific attributes are in fact
present in one of the messages. We then have other constraints to
check that the content of the message is valid. This is where our
debate is centred. Is it reasonable to use the pb to verify at least
that initial structure even though that every wrong type of message
parsing gives an ERROR message? Does the fact that there is such a
message mean that something could go wrong or is it just information
equivalent to the false return value of the function?

Many thanks again.

Dan

On May 22, 5:44 pm, Kenton Varda <ken...@google.com> wrote:
> You can capture log output using SetLogHandler():http://code.google.com/apis/protocolbuffers/docs/reference/cpp/google...
>
> <http://code.google.com/apis/protocolbuffers/docs/reference/cpp/google...>However,
> you have a deeper problem:  Protocol Buffers does not guarantee that trying
> to parse an A from data which is actually of type B will fail.  In fact, if
> your input is a valid protocol buffer of *some* type, then the only reason
> parsing it as type A would fail is if the message did not happen to satisfy
> A's required fields.  Remember that protocol buffers are not
> self-describing, so there is no way to know for sure what a particular
> message's type is or isn't.  It may be worth reading the encoding
> documentation to understand what is actually being sent on the wire.
>
> In any case, what you really need to do is create a message like:
>
> message AorB {
>   optional A a = 1;
>   optional B b = 2;
>
> }
>
> Then send *that* message.  Now you can check has_a() and has_b() to
> distinguish the types.
>

Kenton Varda

unread,
May 22, 2009, 5:14:42 PM5/22/09
to dan.schm...@gmail.com, Protocol Buffers
Again, I want to emphasize that any program which expects parse failures to happen during normal operation is probably doing something wrong.  Whether or not the parser accepts a message actually tells you very little about the integrity of the message.  Remember that the parser is designed to be tolerant of unexpected information in the message, since this is how it provides forward- and backward-compatibility.

For example, take these two messages:

  message A {
    optional int32 i = 1;
    required bool b = 2;
  }

  enum E { FOO = 1; BAR = 2; }
  message B {
    optional double d = 1;
    repeated E e = 2;
  }

What do you think would happen if you had a valid encoded message of type A and you tried to parse it as a B.  Guess what?  It succeeds.  In fact, *any* valid protocol message of *any* type will parse successfully as a B, since B has no required fields.

But what if you parsed a B as an A?  A has a required field, and B doesn't even have any fields of that type, so it can't succeed, right?  Wrong.  Booleans and enums are both encoded as varints in the wire.  So if B.e is non-empty, then A will interpret it as a value of A.b, satisfying the required field.

Now, you say that you are doing extra validation afterwards which you are convinced will ensure that a B cannot be mistaken for an A or vice versa.  This could work, but I still think it is a hacky approach, and hacky approaches tend to lead to unexpected problems down the road.  If at all possible, you should move to using the AorB message I proposed earlier.

All that said, the reason for the error message is simply for debugging help:  It identifies exactly which field was missing from the message.  Typically the caller doesn't particularly care which field was missing, since it will reject message regardless.  However, a developer testing the program would be very interested to know what was missing.  We discovered that without this message, it was very hard to debug these kinds of errors.  In general, error messages logged by libprotobuf are for such debugging purposes.

If you want to avoid printing this error, what you should really do is use one of the Message::ParsePartial*() methods when parsing.  This skips the required fields check, and will never print an error message.  You can manually call IsInitialized() on the result to check for required fields yourself -- this also will not log any errors.

Another alternative is to use LogSilencer:

But using ParsePartial will avoid even constructing the error message, which can be expensive.

dan.schm...@gmail.com

unread,
May 23, 2009, 12:40:51 PM5/23/09
to Protocol Buffers
Thanks a lot for the time spent on this topic. Really appreciate it. I
think we will end up using your optional message suggestion. There's
several places where the amount of messages might get us into trouble
in the future.

Once again thank you.

Dan
> Another alternative is to use LogSilencer:http://code.google.com/apis/protocolbuffers/docs/reference/cpp/google...
>
> But using ParsePartial will avoid even constructing the error message, which
> can be expensive.
>
Reply all
Reply to author
Forward
0 new messages