scalapb-json4s doesn't preserve unknown fields


Chris Taylor

Oct 27, 2020, 9:35:05 AM
to ScalaPB
Hi,

We noticed today that ScalaPB doesn't preserve unknown fields in JSON messages. In fact, by default (using JsonFormat), the parser throws an exception: "scalapb.json4s.JsonFormatException: Cannot find field: <field-name> in message <message-name>".

I noticed I can at least ignore unknown fields by instantiating and configuring a Parser myself (new Parser().ignoringUnknownFields), but there doesn't seem to be a way to populate the unknownFields value on generated message types.
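
For readers who find this thread later, the lenient parser described above could be wrapped roughly as follows. This is a minimal sketch, assuming scalapb-json4s 0.10.x or later on the classpath; `LenientJson` and `parse` are hypothetical names for illustration, not part of the library.

```scala
import scalapb.{GeneratedMessage, GeneratedMessageCompanion}
import scalapb.json4s.Parser

// Hypothetical helper object; only Parser and ignoringUnknownFields come from the library.
object LenientJson {
  // A parser that skips JSON keys it cannot map to a field instead of throwing.
  private val parser: Parser = new Parser().ignoringUnknownFields

  // Works for any ScalaPB-generated message type A that has a companion in scope.
  def parse[A <: GeneratedMessage: GeneratedMessageCompanion](json: String): A =
    parser.fromJsonString[A](json)
}
```

Note that this only avoids the exception. The skipped keys are dropped and do not end up in unknownFields, so a message parsed this way will not round-trip them.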

Is that correct and intended? In Twinagle (the Scala/Finagle implementation of Twirp), we'd like to be able to round-trip messages even in the presence of unknown fields. This works for binary protobuf, but is currently broken for JSON (de)serialization.

There's a PR replicating the behaviour here: https://github.com/soundcloud/twinagle/pull/179

Thanks,
Chris

Nadav Samet

Oct 27, 2020, 12:03:52 PM
to Chris Taylor, ScalaPB
Hi Chris,

The goal of ScalaPB's JsonFormat is to follow the JSON spec and otherwise the behavior of the Java protobuf library. In that sense, discarding unknown fields is the correct and intended behavior. From the API docs:
"Proto2 only features (e.g., extensions and unknown fields) will be discarded in the conversion. That is, when converting proto2 messages to JSON format, extensions and unknown fields will be treated as if they do not exist. This applies to proto2 messages embedded in proto3 messages as well."

Since the above comment was written, unknown fields have become available in proto3 (they are no longer a proto2-only feature), yet the official JSON serializer/parser ignores them.
One way to solve this would be to add an option to the parser/serializer to encode the unknown fields as a base64 string under an "_unknownFields" key in the JSON object. This would make it possible to pass those bits around without the JSON library knowing their interpretation, though it wouldn't be human readable. Let me know if something like this would be useful, and feel free to file a feature request in the scalapb-json4s GitHub project for that. It would be helpful if you could indicate whether you're available to work on a PR to expedite this.
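
To make the proposal concrete, here is a rough sketch of the base64 round trip it implies. It uses only the JDK, and everything in it (`UnknownFieldsEnvelope`, `jsonMember`, the demo object) is hypothetical; no such option exists in scalapb-json4s according to this thread.

```scala
import java.util.Base64

// Hypothetical helper illustrating the proposed "_unknownFields" convention.
object UnknownFieldsEnvelope {
  // Key name taken from the proposal; not an existing scalapb-json4s feature.
  val Key = "_unknownFields"

  // Opaque wire-format bytes -> JSON-safe string.
  def encode(unknownFieldBytes: Array[Byte]): String =
    Base64.getEncoder.encodeToString(unknownFieldBytes)

  // JSON string value -> the original wire-format bytes.
  def decode(encoded: String): Array[Byte] =
    Base64.getDecoder.decode(encoded)

  // Renders the extra JSON member that a serializer would add to the object.
  def jsonMember(unknownFieldBytes: Array[Byte]): String =
    "\"" + Key + "\": \"" + encode(unknownFieldBytes) + "\""
}

object UnknownFieldsDemo {
  def main(args: Array[String]): Unit = {
    // Example payload: field number 15, wire type 0 (varint), value 42.
    // Tag byte = (15 << 3) | 0 = 0x78, followed by the varint byte 0x2A.
    val payload = Array[Byte](0x78, 0x2A)
    println(UnknownFieldsEnvelope.jsonMember(payload))
    // The bytes survive the trip through the JSON string unchanged.
    val restored = UnknownFieldsEnvelope.decode(UnknownFieldsEnvelope.encode(payload))
    println(restored.sameElements(payload))
  }
}
```

The receiver never needs to interpret the bytes; it only has to carry the string along and decode it when writing binary protobuf again.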

I would also strongly recommend that, if both systems use proto, you use the binary representation rather than the JSON representation, as otherwise you are missing out on the protocol evolution features of protobuf. These features include properly preserving unknown fields, supporting field renames (as long as the type and tag number are the same), unrecognized enum values, and so on.


--
-Nadav

Chris Taylor

Nov 9, 2020, 3:53:46 AM
to Nadav Samet, ScalaPB
I just realized that I dropped the ball on following up here, sorry about that!

On Tue, 27 Oct 2020 at 17:03, Nadav Samet <thes...@gmail.com> wrote:
> Hi Chris,
>
> ScalaPB's JsonFormat goal is to follow the JSON spec and otherwise the behavior of the Java protobuf library. In that sense, discarding unknown fields is the correct and intended behavior. From the API docs:
> Proto2 only features (e.g., extensions and unknown fields) will be discarded in the conversion. That is, when converting proto2 messages to JSON format, extensions and unknown fields will be treated as if they do not exist. This applies to proto2 messages embedded in proto3 messages as well.
>
> Since the above comment was written, unknown fields became available in proto3 (it's no longer a proto2-only feature), yet the official JSON serializer/parser ignores them.

Thanks, it's useful context to know that ScalaPB is following the Java library behaviour here.
 
> One way to solve this would be to add an option to the parser/serializer to encode the unknown fields as a base64 string under an "_unknownFields" key in the json object. This would make it possible to pass those bits around without the JSON library knowing their interpretation, though it wouldn't be human readable. Let me know if something like this can be useful, and feel free to file a feature request in scalapb-json4s github project for that. It will be helpful if you can indicate whether you're available to work on a PR to expedite this.

Adding an "_unknownFields" field would be a ScalaPB-specific feature though, wouldn't it? E.g., a Go recipient of such a message wouldn't know what to do with it. What I was thinking of would be to keep unknown fields around in such a way that they can be serialized out in the same form in which they were received, similar to how binary protobuf is able to propagate unknown fields.

That said, I've been thinking some more: it wouldn't be possible to receive unknown fields in JSON and propagate them as binary protobuf, since JSON doesn't contain any information about field numbers, but contains the field names instead. Or am I missing something?
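
A small sketch of why that is, for illustration only: the binary encoding keys every field by a tag built from the field number, and the only way to get from a JSON key to a number is the receiver's own schema. The schema map and all names below are hypothetical.

```scala
// Hypothetical illustration; not ScalaPB code.
object NameVsNumber {
  // The receiver's (outdated) schema: JSON name -> field number.
  val knownFields: Map[String, Int] = Map("id" -> 1, "title" -> 2)

  // Protobuf wire tag: (field number << 3) | wire type.
  def tag(fieldNumber: Int, wireType: Int): Int = (fieldNumber << 3) | wireType

  // A binary encoder needs the field number, but a JSON key only carries a name.
  // For names missing from the schema there is nothing to build a tag from.
  def tagForJsonKey(key: String, wireType: Int): Option[Int] =
    knownFields.get(key).map(number => tag(number, wireType))
}
```

Here `NameVsNumber.tagForJsonKey("title", 2)` yields `Some(18)`, while a key the schema has never seen, such as `"newField"`, yields `None`.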

 
> I would also strongly recommend that if both systems use proto, consider using the binary representation and not the JSON representation, as you are missing out on protocol evolution features of protobuf. These features include properly preserving unknown fields, supporting field renames (as long as the type and tag number are the same), unrecognized enum values, and so on.

Thanks for the heads-up. We're aware of the benefits of using binary protobuf, and use it by default. This issue came up during manual debugging, where it is useful to be able to use curl and jq :).

Regards, and sorry again for the late follow-up,
Chris

Nadav Samet

Nov 9, 2020, 12:17:13 PM
to chris....@soundcloud.com, ScalaPB
Correct - the names are not available for unknown fields, so it will not be possible to serialize them to JSON in a way that would be recognizable by a JSON parser that expects those field names. The best we can do is a ScalaPB-specific extension. Theoretically, you can still write a Go client that parses the JSON, then looks for _unknownFields recursively in the JSON and applies it.
--
-Nadav

Chris Taylor

Nov 10, 2020, 4:22:15 AM
to Nadav Samet, Chris Taylor, ScalaPB
Thanks, Nadav! I think for our use case, it makes sense to configure the JSON parser to ignore unknown fields, and to document the limitation that JSON doesn't handle unknown fields.

And thanks for all your hard work on ScalaPB!
Chris