json decode is very slow

868 views
Skip to first unread message

Sarath Prabath Redlapalli Jaya

unread,
Apr 28, 2020, 12:49:08 PM4/28/20
to golang-nuts
req := &mystruct{}
json.NewDecoder(r.Body).Decode(req)

We've instrument the above code block and reading request body bytes is taking very long i.e., upto 5 secs sometimes reaching upto 20 seconds at the throughput of ~4-5K RPM

The Request Body Size Metrics are as follows

Average: 73190 Bytes
90th percentile: 170862 Bytes
99th percentile: 467638 Bytes

Tried ioutil.ReadAll too. But, the read time is same. We're not sure why it's so very slow on production with HIGH BANDWIDTH Network line.

Also, there are 5% of errors in this API with following errors while decoding json data
  1. unexpected EOF
  2. Error in decoding request: read tcp 127.0.0.1:8080->127.0.0.1:48896: i/o timeout
We're using Kubernetes with Istio 1.0.5 on GCP. These errors are probably from Envoy Sidecar, but since the Envoy acts as a reverse proxy, not sure why we can get EOF or Timeout Errors since end user connection terminates at proxy itself.
Message has been deleted

Jesper Louis Andersen

unread,
Apr 30, 2020, 5:04:41 AM4/30/20
to Sarath Prabath Redlapalli Jaya, golang-nuts
On Tue, Apr 28, 2020 at 6:48 PM Sarath Prabath Redlapalli Jaya <harrysa...@gmail.com> wrote:
We've instrument the above code block and reading request body bytes is taking very long i.e., upto 5 secs sometimes reaching upto 20 seconds at the throughput of ~4-5K RPM

The Request Body Size Metrics are as follows

Average: 73190 Bytes
90th percentile: 170862 Bytes
99th percentile: 467638 Bytes


Let us assume we have 500 kilobytes of data. At 20 seconds processing time, you are looking at something like 25 bytes per millisecond, which seems rather low for a decent CPU. Try measuring the time it takes to ioutil.ReadAll from the Reader I assume you have access to. My hunch is you are spending most of the time there, and relatively little on decoding the JSON data you have. Once you've done this, you will have some more knowledge of where to look: the network, or the JSON decoder.

If this is a web request, you should also look at how time it takes the other end to feed data to the GCP process. Chances are that the problem is in the other end, and you are looking at a slow sender, especially if you have a large set of mobile devices out there with variety in their internet connections.

Sarath Prabath Redlapalli Jaya

unread,
Apr 30, 2020, 7:14:43 AM4/30/20
to Jesper Louis Andersen, golang-nuts
Thanks Amnon & Hi Jesper,

You're correct. Most of the time is actually spent in ioutil.ReadAll() . Trying to debug network issues related to GCP, GKE & the functionality of ISTIO sidecar reverse proxy, but seems okay so far couldn't find anything.

But, I've performed an interesting experiment. 

Tried the same API in Ruby with Sinatra by mirroring the traffic. When instrumented the API Response Times, all were <= 100 ms. Any clues on how to debug this?I can't seem to find much on inner workings of http.Request.Body -> io.ReadCloser

Also, If we're are looking for flaky end-user connections i.e., mobile wireless devices etc., the connection termination is at reverse proxy itself and the lag in reading request shouldn't be observed at the application level. Correct me if I'm wrong.

Regards

Jason E. Aten

unread,
May 1, 2020, 10:54:47 PM5/1/20
to golang-nuts
you could try ffjson https://github.com/pquerna/ffjson

Amnon Baron Cohen

unread,
May 2, 2020, 2:39:40 AM5/2/20
to golang-nuts
That was my fault too. encoding/json is slow, but not that slow.

Unfortunately the title of this thread is a bit misleading.
Apparently what is taking time here is reading the data off the wire,
rather than decoding the data once it has arrived. 

So it seems that that the slowness is probably nothing to do with json decoding
and nothing to do with Go.

Sarath says that a Ruby implementation is fast.
So I would get tcpdump traces for both, and walk your way through the relevant 
sessions, and see what the differences are.

Jesper Louis Andersen

unread,
May 3, 2020, 5:34:12 AM5/3/20
to Sarath Prabath Redlapalli Jaya, golang-nuts
The general rule is to run a corollary of Dijkstra: experiments can only show the presence of problems in a system, not their absence :)

That is, seek to validate your assumptions rather than trying to guess what is wrong with the system.

The obvious things to check against are HTTP/1.1 vs HTTP/2 operation. There are times where the multiplexing nature of HTTP/2 needs some more gentle coercion in order to run fast. And I wouldn't a priori assume your proxy takes all of the request given the sizes of the inputs. I'd much rather assume it will use some kind of small buffering. Otherwise, your proxy are sitting with 512 kilobyte of data in a buffer and that is going to be a problem given enough connections. It might do buffering up to 4k or something such however. Above that, it is likely to start streaming the data in a little bit at a time. This is definitely something to do some investigation on.

If you cut out GCP/GKE/ISTIO you can verify if the culprit is Go itself, or something in front of it. Since you have a fast and a slow system, you can slot each in and look for differences in how the system is operating.
--
J.

Amnon Baron Cohen

unread,
May 3, 2020, 5:59:35 AM5/3/20
to golang-nuts
Excellent advice Jesper!


On Sunday, 3 May 2020 10:34:12 UTC+1, Jesper Louis Andersen wrote:
The general rule is to run a corollary of Dijkstra: experiments can only show the presence of problems in a system, not their absence :)

That is, seek to validate your assumptions rather than trying to guess what is wrong with the system.

The obvious things to check against are HTTP/1.1 vs HTTP/2 operation. There are times where the multiplexing nature of HTTP/2 needs some more gentle coercion in order to run fast. And I wouldn't a priori assume your proxy takes all of the request given the sizes of the inputs. I'd much rather assume it will use some kind of small buffering. Otherwise, your proxy are sitting with 512 kilobyte of data in a buffer and that is going to be a problem given enough connections. It might do buffering up to 4k or something such however. Above that, it is likely to start streaming the data in a little bit at a time. This is definitely something to do some investigation on.

If you cut out GCP/GKE/ISTIO you can verify if the culprit is Go itself, or something in front of it. Since you have a fast and a slow system, you can slot each in and look for differences in how the system is operating.

On Thu, Apr 30, 2020 at 1:14 PM Sarath Prabath Redlapalli Jaya <harrysa...@gmail.com> wrote:
Thanks Amnon & Hi Jesper,

You're correct. Most of the time is actually spent in ioutil.ReadAll() . Trying to debug network issues related to GCP, GKE & the functionality of ISTIO sidecar reverse proxy, but seems okay so far couldn't find anything.

But, I've performed an interesting experiment. 

Tried the same API in Ruby with Sinatra by mirroring the traffic. When instrumented the API Response Times, all were <= 100 ms. Any clues on how to debug this?I can't seem to find much on inner workings of http.Request.Body -> io.ReadCloser

Also, If we're are looking for flaky end-user connections i.e., mobile wireless devices etc., the connection termination is at reverse proxy itself and the lag in reading request shouldn't be observed at the application level. Correct me if I'm wrong.

Regards

On Thu, Apr 30, 2020 at 2:34 PM Jesper Louis Andersen <jesper.lo...@gmail.com> wrote:
On Tue, Apr 28, 2020 at 6:48 PM Sarath Prabath Redlapalli Jaya <harrysa...@gmail.com> wrote:
We've instrument the above code block and reading request body bytes is taking very long i.e., upto 5 secs sometimes reaching upto 20 seconds at the throughput of ~4-5K RPM

The Request Body Size Metrics are as follows

Average: 73190 Bytes
90th percentile: 170862 Bytes
99th percentile: 467638 Bytes


Let us assume we have 500 kilobytes of data. At 20 seconds processing time, you are looking at something like 25 bytes per millisecond, which seems rather low for a decent CPU. Try measuring the time it takes to ioutil.ReadAll from the Reader I assume you have access to. My hunch is you are spending most of the time there, and relatively little on decoding the JSON data you have. Once you've done this, you will have some more knowledge of where to look: the network, or the JSON decoder.

If this is a web request, you should also look at how time it takes the other end to feed data to the GCP process. Chances are that the problem is in the other end, and you are looking at a slow sender, especially if you have a large set of mobile devices out there with variety in their internet connections.



--
J.
Reply all
Reply to author
Forward
0 new messages