Debugging large gap between appLatencySeconds and latencySeconds

289 views
Skip to first unread message

Eddie Zhou

unread,
Sep 16, 2020, 2:42:54 PM9/16/20
to Google App Engine
Hi all,

We have a Go App Engine Flex app that occasionally sees a large disparity between appLatencySeconds and latencySeconds in the nginx.request logs, on the order of sub-1s for the former and close to 10s for the latter.  Our request traces for the application code confirm the former.

We thought this might have to do with the App Engine request queueing, but this post seems to indicate the latencySeconds value is independent of that.  This thread points to looking at Load Balancer logs, but it doesn't appear we can look at those associated with our App Engine apps.

Does anyone have pointers on how to debug this, or relatedly, request queuing in general?  These slow request times for our clients are tough when they're not due to our application code.

Thanks!

Screen Shot 2020-09-16 at 11.37.35 AM.png

Eddie Zhou

unread,
Sep 16, 2020, 2:55:37 PM9/16/20
to Google App Engine
We were actually able to find the Load Balancer logs (hoping this helps someone else in the future) via resource.type="http_load_balancer" and connecting trace to jsonPayload.trace from App Engine nginx.request.

In our case, this shows that the load balancer received the client request at 10:05:53 and App Engine saw it at 10:06:04, which seems to mirror the skew and contradict the first linked thread above.

What would be any suggested App Engine configuration changes to reduce this time?  Does this indicate that we have too many requests for our instances to handle?  For more information, the number of instances didn't scale up during this time -- possibly because we have target cpu utilization set to 0.8 and the utilization was very much below that.  I'm not sure how that setting interacts with request backlog -- some signals are alluded to here but that post is from 2017, and also unclear if it refers to Standard or Flex.

Eddie Zhou

unread,
Sep 16, 2020, 2:56:32 PM9/16/20
to Google App Engine
Sorry -- correction, our target cpu utilization is at the default of 0.5, but the utilization at the time was very much below that.

Juan Carlos Gomez

unread,
Sep 17, 2020, 11:37:16 AM9/17/20
to Google App Engine
Hello,
appLatencySeconds and latencySeconds are equal to nginx variables. With, appLatencySeconds == $upstream_response_time [1] and latencySeconds == $request_time[2].
For clarity I'll include the definitions:
appLatencySeconds :$upstream_response_time: keeps time spent on receiving the response from the upstream server.
latencySeconds: $request_time: request processing time elapsed between the first bytes were read from the client and the log write after the last bytes were sent to the client.

I'll focus on how to debug this issue since it is expressed in the title. There is a guide on `Optimizing Application Latency with Load Balancing` [3], where it is expressed how to measure latency in different scenarios.

However, it seems to me that the issue could be due to a bad network throughput of some of the clients of your application, I would check the user-agent, payload size of such request that take long to be passed to the App Engine app to determine if that could be the issue.

Reply all
Reply to author
Forward
0 new messages