json.NewDecoder()/Decode() versus Unmarshal() for single large JSON objects


Amit Saha

Dec 28, 2020, 7:22:39 PM
to golang-nuts
Hi all, let's say I have a single large JSON object that I want to
process in my HTTP server backend.

I am trying to get my head around whether there is any performance
advantage, in memory or CPU, to using the json.NewDecoder()/Decode()
mechanism versus the Unmarshal() function.

Thanks,
Amit

burak serdar

Dec 28, 2020, 7:36:06 PM
to Amit Saha, golang-nuts
Unmarshal uses the decoder for unmarshaling. Unless you are planning
to process the JSON object piece by piece using a decoder, the two are
identical in terms of performance.


Amit Saha

Dec 29, 2020, 5:37:38 AM
to burak serdar, golang-nuts
On Tue, Dec 29, 2020 at 11:35 AM burak serdar <bse...@computer.org> wrote:
> Unmarshal uses the decoder for unmarshaling. Unless you are planning
> to process the JSON object piece by piece using a decoder, the two are
> identical in terms of performance.

Thank you for confirming.

Axel Wagner

Dec 29, 2020, 5:51:23 AM
to Amit Saha, burak serdar, golang-nuts
There is an important semantic difference between the two, which means you almost never want to use a `Decoder`: `Unmarshal` is for parsing a single JSON document, whereas a `Decoder` is for parsing a stream of concatenated documents, like so: https://play.golang.org/p/4uiNyJlNIKh. In particular, this means that using a `Decoder` silently drops trailing data and might not detect erroneous JSON: https://play.golang.org/p/cuOAUnKCuEk
So, unless you specifically know that you have a stream of concatenated JSON documents, `Decoder` is not actually doing what you want.
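
The playground snippets are not reproduced in this thread. As a rough sketch of the difference described above (the `payload` type and the function names are made up for illustration), something like the following shows a `Decoder` reading a stream of concatenated documents, a single `Decode` call accepting input with trailing garbage, and `Unmarshal` rejecting both inputs:

```go
package main

import (
	"encoding/json"
	"fmt"
	"io"
	"strings"
)

// payload is a hypothetical type used only for this illustration.
type payload struct {
	Name string `json:"name"`
}

// decodeAll treats s as a stream of concatenated JSON documents
// and decodes them one by one until the input is exhausted.
func decodeAll(s string) ([]payload, error) {
	dec := json.NewDecoder(strings.NewReader(s))
	var out []payload
	for {
		var p payload
		err := dec.Decode(&p)
		if err == io.EOF {
			return out, nil
		}
		if err != nil {
			return out, err
		}
		out = append(out, p)
	}
}

// decodeFirst makes a single Decode call, so anything after the
// first JSON value is never looked at.
func decodeFirst(s string) (payload, error) {
	var p payload
	err := json.NewDecoder(strings.NewReader(s)).Decode(&p)
	return p, err
}

// unmarshalOne requires s to be exactly one JSON document.
func unmarshalOne(s string) (payload, error) {
	var p payload
	err := json.Unmarshal([]byte(s), &p)
	return p, err
}

func main() {
	const stream = `{"name":"a"} {"name":"b"}`
	const garbled = `{"name":"a"} trailing garbage`

	all, err := decodeAll(stream)
	fmt.Printf("decodeAll(stream):     %+v, err=%v\n", all, err)

	first, err := decodeFirst(garbled)
	fmt.Printf("decodeFirst(garbled):  %+v, err=%v\n", first, err)

	_, err = unmarshalOne(stream)
	fmt.Printf("unmarshalOne(stream):  err=%v\n", err)

	_, err = unmarshalOne(garbled)
	fmt.Printf("unmarshalOne(garbled): err=%v\n", err)
}
```

Here `decodeFirst` returns a nil error for the garbled input, while both `unmarshalOne` calls return a non-nil error.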

Amnon

Dec 29, 2020, 11:17:05 AM
to golang-nuts
I always use `json.NewDecoder(r.Body).Decode(&payload)`

The code is more succinct than reading the entire body into a buffer, and then unmarshalling it.
And there is only one error to check.

If I were very concerned about people sending trailing gibberish to my server, I could call `dec.Buffered()` to see
if there was anything left after the JSON object. I generally have not seen people sending garbage after their requests.
And it is not clear what the correct action in this case is. I generally ignore it.
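
For readers who do want such a check while keeping the `Decoder`, here is one possible sketch. Note that it swaps in a different technique from the `dec.Buffered()` call mentioned above: it attempts a second `Decode` and expects `io.EOF`, which also covers data the decoder has not buffered yet. The names `decodeSingle` and `checkSingle` are hypothetical.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"io"
	"strings"
)

// decodeSingle (hypothetical helper) decodes one JSON value from r into v
// and reports an error if anything other than whitespace follows it.
func decodeSingle(r io.Reader, v interface{}) error {
	dec := json.NewDecoder(r)
	if err := dec.Decode(v); err != nil {
		return err
	}
	// A second Decode returns io.EOF only when nothing but whitespace is left.
	// Any other result means there was extra data, valid JSON or not.
	var extra json.RawMessage
	if err := dec.Decode(&extra); err != io.EOF {
		return errors.New("unexpected data after top-level JSON value")
	}
	return nil
}

// checkSingle is a small driver that runs decodeSingle on a string.
func checkSingle(s string) error {
	var v map[string]interface{}
	return decodeSingle(strings.NewReader(s), &v)
}

func main() {
	for _, in := range []string{
		`{"a":1}`,
		"{\"a\":1}\n",
		`{"a":1} garbage`,
		`{"a":1}{"b":2}`,
	} {
		fmt.Printf("%q -> err=%v\n", in, checkSingle(in))
	}
}
```

The first two inputs are accepted and the last two are rejected.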

On the occasions when I need to consume very large JSON arrays in my backend without using much memory,
I sometimes do something like

https://play.golang.org/p/Isw_3p7mR5-
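
The linked snippet is not reproduced in this thread. One common way to do this with the standard library, which may or may not match the link, is to read the opening `[` with `Token`, decode elements one at a time while `More` reports true, and then read the closing `]`. The `item` type and function names below are assumptions for illustration:

```go
package main

import (
	"encoding/json"
	"fmt"
	"io"
	"strings"
)

// item is a hypothetical element type for the array.
type item struct {
	ID int `json:"id"`
}

// streamArray reads a top-level JSON array from r and calls fn for each
// element, so only one element is held in memory at a time.
func streamArray(r io.Reader, fn func(item) error) error {
	dec := json.NewDecoder(r)

	// Consume the opening '['.
	tok, err := dec.Token()
	if err != nil {
		return err
	}
	if d, ok := tok.(json.Delim); !ok || d != '[' {
		return fmt.Errorf("expected '[', got %v", tok)
	}

	// Decode elements one by one.
	for dec.More() {
		var it item
		if err := dec.Decode(&it); err != nil {
			return err
		}
		if err := fn(it); err != nil {
			return err
		}
	}

	// Consume the closing ']'.
	_, err = dec.Token()
	return err
}

// sumIDs is a small driver that adds up the IDs in a JSON array string.
func sumIDs(s string) (int, error) {
	total := 0
	err := streamArray(strings.NewReader(s), func(it item) error {
		total += it.ID
		return nil
	})
	return total, err
}

func main() {
	total, err := sumIDs(`[{"id":1},{"id":2},{"id":3}]`)
	fmt.Println(total, err) // prints "6 <nil>"
}
```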

Axel Wagner

Dec 29, 2020, 11:31:58 AM
to Amnon, golang-nuts
On Tue, Dec 29, 2020 at 5:17 PM Amnon <amn...@gmail.com> wrote:
> I always use `json.NewDecoder(r.Body).Decode(&payload)`

You shouldn't.

> The code is more succinct than reading the entire body into a buffer, and then unmarshalling it. And there is only one error to check.

There is only one error to check, because you are ignoring one. Namely this one:

> If I was super concerned about people sending trailing gibberish to my server, I could call `dec.Buffered()` to see
> if there was anything left after the json object. I generally have not seen people sending garbage after their requests.

How do you know, if you don't check? FTR, it's not just about sending garbage, it's also about requests accidentally being truncated or just generally garbled.

FWIW, if your concern is that the "read body and unmarshal" is two error checks, one solution would be to add a helper that does it for you - which you can then re-use. If that doesn't seem worth it because you would only use it once - well, then I'd question if the one extra error check is really that much of an issue.
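
A minimal sketch of such a helper, assuming Go 1.16 or later for `io.ReadAll` (the names `unmarshalFrom`, `parseName`, and `payload` are made up for illustration):

```go
package main

import (
	"encoding/json"
	"fmt"
	"io"
	"strings"
)

// payload is a hypothetical request type.
type payload struct {
	Name string `json:"name"`
}

// unmarshalFrom (hypothetical helper) reads all of r and parses it as
// exactly one JSON document, so callers have a single error to check.
func unmarshalFrom(r io.Reader, v interface{}) error {
	data, err := io.ReadAll(r)
	if err != nil {
		return fmt.Errorf("reading body: %w", err)
	}
	if err := json.Unmarshal(data, v); err != nil {
		return fmt.Errorf("parsing JSON: %w", err)
	}
	return nil
}

// parseName is a small driver; in an HTTP handler the reader would be
// r.Body instead of a strings.Reader.
func parseName(s string) (string, error) {
	var p payload
	err := unmarshalFrom(strings.NewReader(s), &p)
	return p.Name, err
}

func main() {
	name, err := parseName(`{"name":"gopher"}`)
	fmt.Printf("name=%q err=%v\n", name, err)

	name, err = parseName(`{"name":"gopher"} trailing`)
	fmt.Printf("name=%q err=%v\n", name, err)
}
```

In a handler the call would be along the lines of `unmarshalFrom(r.Body, &p)`, which keeps the call site as short as the `Decoder` one-liner.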

IMO, `Unmarshal` is simply the correct function to use and if I don't have a good reason to make my software behave in strange ways in the presence of bugs, I prefer not to do that. But YMMV, of course.

 

Amnon

Dec 29, 2020, 12:06:55 PM
to golang-nuts
> How do you know, if you don't check? FTR, it's not just about sending garbage, it's also about requests accidentally being truncated or just generally garbled.

json.NewDecoder(r.Body).Decode(&payload) will return an error if the request is garbled or truncated.

The other nice thing about this code is that it composes nicely with the context.
net/http provides the body as a Reader, and `json.NewDecoder` takes a Reader. The two fit together nicely, without any need
for adapters.

The API looks like it should be able to decode a large payload without allocating a large buffer to hold the bytes. 
Sadly this is not the way encoding/json is implemented. 

Axel Wagner

Dec 29, 2020, 12:27:58 PM
to Amnon, golang-nuts
On Tue, Dec 29, 2020 at 6:07 PM Amnon <amn...@gmail.com> wrote:
> > How do you know, if you don't check? FTR, it's not just about sending garbage, it's also about requests accidentally being truncated or just generally garbled.
>
> json.NewDecoder(r.Body).Decode(&payload) will return an error if the request is garbled or truncated.

Only if it is garbled in such a way that it doesn't have a valid document as a prefix.
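
To make the distinction concrete, here is a small sketch (function names are hypothetical) comparing a body truncated inside the document with bodies that are garbled after a complete document:

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// decodeErr returns the error from a single Decoder.Decode call on s.
func decodeErr(s string) error {
	var v map[string]interface{}
	return json.NewDecoder(strings.NewReader(s)).Decode(&v)
}

// unmarshalErr returns the error from json.Unmarshal on s.
func unmarshalErr(s string) error {
	var v map[string]interface{}
	return json.Unmarshal([]byte(s), &v)
}

func main() {
	inputs := []string{
		`{"a":1`,       // truncated inside the document: both report an error
		`{"a":1}}`,     // valid prefix plus a stray brace: only Unmarshal reports an error
		`{"a":1}{"b":`, // valid prefix plus a truncated second document: only Unmarshal reports an error
	}
	for _, in := range inputs {
		fmt.Printf("%-16q Decode err=%v\n", in, decodeErr(in))
		fmt.Printf("%-16q Unmarshal err=%v\n", in, unmarshalErr(in))
	}
}
```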

> The other nice thing about this code is that it composes nicely with the context.
> net/http provides the body as a Reader. And json NewDecoder takes a reader. The two fit together nicely, without any need
> for adapters.

Yes, that's unfortunate. It suggests that it would be a good idea to do so, leading to loads of incorrect code in the wild.

> The API looks like it should be able to decode a large payload without allocating a large buffer to hold the bytes.
> Sadly this is not the way encoding/json is implemented.

I think the general wisdom is that a decoded JSON value will be asymptotically the same size as the JSON string representing it. So there is at most a 2x overhead, which doesn't matter much in most practical use cases.

Amit Saha

Dec 30, 2020, 12:57:26 AM
to Amnon, golang-nuts
On Wed, Dec 30, 2020 at 4:07 AM Amnon <amn...@gmail.com> wrote:
>
> > How do you know, if you don't check? FTR, it's not just about sending garbage, it's also about requests accidentally being truncated or just generally garbled.
>
> json.NewDecoder(r.Body).Decode(&payload) will return an error if the request is garbled or truncated.
>
> The other nice thing about this code is that it composes nicely with the context.
> net/http provides the body as a Reader. And json NewDecoder takes a reader. The two fit together nicely, without any need
> for adapters.

This is a good point, I like that viewpoint.

Amit Saha

Dec 30, 2020, 12:57:44 AM
to Axel Wagner, golang-nuts
On Tue, Dec 29, 2020 at 9:50 PM Axel Wagner
<axel.wa...@googlemail.com> wrote:
>
> There is an important semantic difference between the two, which means you almost never want to use a `Decoder`: `Unmarshal` is for parsing a single JSON document, whereas a `Decoder` is for parsing a stream of concatenated documents, like so: https://play.golang.org/p/4uiNyJlNIKh. In particular, this means that using a `Decoder` silently drops trailing data and might not detect erroneous JSON: https://play.golang.org/p/cuOAUnKCuEk
> So, unless you specifically know that you have a stream of concatenated JSON documents, `Decoder` is not actually doing what you want.

Indeed, thank you for the explanation.