Unmarshal variable JSON structures

Andrew Burian

Dec 26, 2022, 1:04:41 PM
to golan...@googlegroups.com
Hey all, wondering if there's an existing best practice for this so I'm not reinventing the wheel.

It's frustratingly common to have APIs where the response JSON can be one of several distinct objects, and the indicator of which object has been returned is itself a property of the object.

So you might get a `{ "type": "aType", "aField": "..." }` or a `{ "type": "bType", "bField": "..." }` response from the same API.

What's the best way to deserialize in these situations? 

Ideas I've tried so far:
  • Unmarshal twice, once into a struct that just defines the `Type` property and ignores all other fields, then again based on the type set the first time.
    Works, but for large objects it's extremely wasteful.

  • Unmarshal into a large struct that defines all possible subtypes as anonymous struct fields so their declarations are treated as being on the outer struct, then cast to the appropriate type after unmarshaling to mask all the unfilled fields.
    Again, works, but feels awful. It also presents a real issue when you need to verify that no fields other than the expected fields for the given type were present. You can usually do that with Decoder.DisallowUnknownFields, but here it silently succeeds if one of the fields is valid for a different object type.
I'm trying to do this as much with stdlib as possible. I've looked into some other libraries that make heavy use of JSON decoding and have seen both my above ideas, as well as just entirely custom Unmarshaller implementations. Hopefully it doesn't come to that.
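
For reference, the first idea (unmarshal twice) might look roughly like the sketch below. The names `AType`, `BType`, `Decode` and `decodeStrict` are made up for illustration. Note the assumption that each concrete struct repeats the `type` field, since otherwise Decoder.DisallowUnknownFields would reject the discriminator itself on the second pass.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
)

// AType and BType are hypothetical payloads matching the examples above.
// Each declares Type so the strict second pass accepts the "type" key.
type AType struct {
	Type   string `json:"type"`
	AField string `json:"aField"`
}

type BType struct {
	Type   string `json:"type"`
	BField string `json:"bField"`
}

// decodeStrict decodes data into v and fails on fields v does not declare.
func decodeStrict(data []byte, v any) error {
	dec := json.NewDecoder(bytes.NewReader(data))
	dec.DisallowUnknownFields()
	return dec.Decode(v)
}

// Decode reads only "type" on the first pass, then decodes the whole
// document again into the struct that matches it.
func Decode(data []byte) (any, error) {
	var head struct {
		Type string `json:"type"`
	}
	if err := json.Unmarshal(data, &head); err != nil {
		return nil, err
	}
	switch head.Type {
	case "aType":
		var a AType
		if err := decodeStrict(data, &a); err != nil {
			return nil, err
		}
		return a, nil
	case "bType":
		var b BType
		if err := decodeStrict(data, &b); err != nil {
			return nil, err
		}
		return b, nil
	default:
		return nil, fmt.Errorf("unknown type %q", head.Type)
	}
}

func main() {
	v, err := Decode([]byte(`{"type":"aType","aField":"hello"}`))
	fmt.Println(v, err)

	// A bField on an aType object is rejected by the strict second pass.
	_, err = Decode([]byte(`{"type":"aType","bField":"oops"}`))
	fmt.Println(err)
}
```

Because the second pass targets a struct that only knows its own fields, a field belonging to a different type is reported as unknown, at the cost of scanning the input twice.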

Cheers,
Andrew

Marcin Romaszewicz

Dec 26, 2022, 2:08:54 PM
to Andrew Burian, golan...@googlegroups.com
This is a very annoying problem, and one that we hit a lot in my project (https://github.com/deepmap/oapi-codegen), where we generate Go models from OpenAPI specs, and the OpenAPI "AnyOf" or "OneOf" schema does precisely this.

You can partially unmarshal: store your "type" field in a typed variable, and use json.RawMessage to defer parsing until later, when you know the type. This still gets annoying, because if your field names are dynamic, you need to override the default unmarshaling behavior to produce a map of field names to json.RawMessage. If you jump through these hoops, you can avoid parsing twice. Once you've partially parsed your object, you can create some functions on it, such as "AsAType()" or "AsBType()", which switch on "type" and return the correct concrete object.

--
You received this message because you are subscribed to the Google Groups "golang-nuts" group.
To unsubscribe from this group and stop receiving emails from it, send an email to golang-nuts...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/golang-nuts/CAPyCRsvvzzscpgfjj2vPQAi5_DVvrfhxLMu_OhuETzKAd7N1xQ%40mail.gmail.com.

Marcin Romaszewicz

Dec 26, 2022, 2:41:03 PM
to Andrew Burian, golan...@googlegroups.com
Here is an example of what I mean.


This is why I wrote a code generator, it's tedious by hand :)

p...@morth.org

Dec 29, 2022, 6:13:36 AM
to golang-nuts
Hi!

It's not possible to decode this data without scanning it twice. It's a flawed design where someone has chosen to make that restriction. The programming language doesn't matter.
The only way to avoid parsing it twice is to decode to a map[string]any and then use that as-is. Which I suppose you might end up doing in some languages, but not in Go, for practical reasons.

I'd go with the first option, it's the easier one. However, if there's a JSON schema, you should probably also consider using a compiler such as quicktype.
You'll notice it will merge all the fields into a single struct and use pointers for all optional fields. Not great from a Go coding point of view, but OTOH by compiling the schema you save a lot of time not writing the structs manually and also not needing to keep up with future schema changes, if they're frequent. To easily assign to the pointer fields I added a func newval[T any](v T) *T { return &v } to my package.

Regards,
Per Johansson

Fabrice Vaillant

Jan 3, 2023, 4:24:51 PM
to golan...@googlegroups.com
Hello

map[string]any might be frustrating if aField or bField are complex, since you would then have to convert them from any to a dedicated struct. I would suggest map[string]json.RawMessage as a possible improvement. It lets you delay parsing each part of the object until you know its kind.

However I don't think it's possible to "verify that no fields other than the expected fields are present", without coding the validation logic no matter the method chosen.
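
As a rough idea of what that hand-coded validation could look like on top of map[string]json.RawMessage: the `allowed` table and the `Validate` function below are assumptions made up for this sketch, using the field names from the original question.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// allowed lists the keys permitted for each "type" value (hypothetical).
var allowed = map[string]map[string]bool{
	"aType": {"type": true, "aField": true},
	"bType": {"type": true, "bField": true},
}

// Validate returns the object's type if every top-level key is permitted
// for that type, and an error otherwise.
func Validate(data []byte) (string, error) {
	var fields map[string]json.RawMessage
	if err := json.Unmarshal(data, &fields); err != nil {
		return "", err
	}
	raw, ok := fields["type"]
	if !ok {
		return "", fmt.Errorf("missing %q field", "type")
	}
	var typ string
	if err := json.Unmarshal(raw, &typ); err != nil {
		return "", err
	}
	keys, ok := allowed[typ]
	if !ok {
		return "", fmt.Errorf("unknown type %q", typ)
	}
	for name := range fields {
		if !keys[name] {
			return "", fmt.Errorf("field %q is not allowed for type %q", name, typ)
		}
	}
	return typ, nil
}

func main() {
	fmt.Println(Validate([]byte(`{"type":"aType","aField":"x"}`)))
	fmt.Println(Validate([]byte(`{"type":"aType","aField":"x","bField":"y"}`)))
}
```

This only checks top-level keys. Nested objects would need their own allow-lists or a strict decode of the corresponding raw message.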

Fabrice