GAE Latency & Instance issues


Trevor Chinn

Jun 30, 2016, 3:45:10 AM
to Google App Engine
Hello ladies and gentlemen, I am here to hopefully draw on some collective knowledge about App Engine and its intricacies. 

For the last two weeks our (my company's) site has been experiencing very odd latency issues, and having now tried about 7 different methods of solving it, we are left exactly where we began: rising costs with performance much lower than before. 



Essentially, roughly 50-60% of our requests are served normally, but the remainder have extremely long "pauses" in the middle of the trace, basically during the "processing" phase of the backend handling (after the datastore and memcache data has been retrieved). Here is an example of a single page that, in the space of an hour, had wildly different loading times for users. The vast majority of these requests do the same thing: grab 3 things from memcache and spit out the HTML retrieved from memcache. That's it... 


And some individual traces to see what is happening




Here are the troubleshooting steps we took to figure out what was going wrong: 
  • Checked all deployed code over the week preceding and following the latency spike to ensure we hadn't let some truly horrendous, heavy code slip through the review process. Everything deployed around that period was rather light, backend/cms based updates, hardly anything touching customer-facing requests. 
  • Appstats, obviously. On the development server (and even unloaded test versions on the production server) such behavior is not seen. Didn't help. 
  • Reducing unnecessary requests (figure 1) - We noticed some of our ajax-loaded content was creating 2-3 additional, separate requests per user page load, so we refactored the code to only call those things when absolutely necessary, and eliminated one altogether. For the most part, a page load now equals one request. This had no effect on the latency spikes. 
  • Created a separate system that cut our backend task-based processing by 90%, which reduced the average instance load significantly. This had the opposite effect: average latency actually climbed, I suspect because of the extensive memcache use with large chunks of data (tracking what the backend tasks should update). 
  • Separated the front end and tasks-back-end into modules/services so that frontend requests could have 100% instance attention. This had a small effect, but the spikes are still regularly happening (as seen in the above traces). 
  • Played with max_idle_instances - This had the wonderful effect of halving our daily instance costs, with almost no effect on latency. When this is set to automatic, we get charged for a huge number of unused instances; it actually borders on ludicrous (figure 2). 
  • Played with max_concurrent_requests (8 -> 16 -> 10), which only served to make the latency issues worse. (Both of these scaling knobs are sketched in the config snippet after this list.) 
  • Hours and hours poring over logs, traces, dashboard graphs. 
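For anyone unfamiliar with where those knobs live: they are set per service under automatic_scaling in app.yaml (or the appengine-web.xml equivalent on Java). A minimal sketch for a standard-environment service, with illustrative placeholder values rather than the settings we ended up with:

instance_class: F4
automatic_scaling:
  max_idle_instances: 2        # capping this (instead of "automatic") is what halved our idle-instance bill
  max_concurrent_requests: 8   # the default; the values we experimented with were 8 -> 16 -> 10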
 Figure 1 (Since the latency spike on June 5th, we have worked to reduce meaningless requests through API calls or task queuing) 


Figure 2 (14:40 is when the auto-scaling setting was deployed)


What I have noticed is that when the CPU cycles spike, so does the latency. That would point toward our requests being starved for CPU time; however, now that we have deployed instance auto-scaling (and are paying for an average of around 8 instances versus 4-5 previously), latency has not improved, which confuses me considerably.

If all requests had slowed down, our code would clearly need optimization. If the rise in latency coincided with a change in our frontend processing, it would make sense, but only very light backend changes were deployed within +/- 2 days of the first latency spike (figure 3).

Figure 3 - Latency started rising on June 5th

 
Some other images that may assist in understanding the issue:

CPU Cycles (today)


CPU Cycles (2 month)



Is there anyone out there who can proffer some advice on where to poke, prod, or peer next? I have only been using App Engine for 1.5 years, but this company has been on the platform for about 4 years without these kinds of issues.

troberti

Jun 30, 2016, 3:53:53 AM
to Google App Engine
I recommend trying to set max_concurrent_requests to 2. As you said, higher values only make latency (much) worse. In our experience, a value of 2 gets the best latency results without an increase in cost.

We still have some of those pauses mid-request in a trace, and I really would like to know where they come from (and get rid of them), but they seem much shorter with a lower max_concurrent_requests value.

Trevor Chinn

Jun 30, 2016, 4:09:56 AM
to Google App Engine
Thanks for the advice! I hadn't considered setting it that low because we are on F4 instances and I would really, really hope that they could handle at least the default 8 concurrent requests. That being said, I will throw the front end on those settings this evening and monitor until noon tomorrow. The only thing I worry about is the number of instances increasing rapidly and blowing out our monthly costs. We have 10 PV per second during low times, and up to 30-35 per second during peak hours. That means that to maintain our current costs (3-5 billed F4 instances) we would need a sub-300ms request-completion time during peak, if my muddled math is correct (a back-of-envelope check is sketched below). I suppose if fewer requests are going to each instance, we could drop down to a lower class, perhaps?
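For what it's worth, here is that back-of-envelope check as a tiny Python sketch. It assumes requests are spread perfectly evenly across instances and ignores idle/spare instances, so it is only a rough upper bound:

peak_rps = 35                # peak page views per second, from above
concurrent_per_instance = 2  # the max_concurrent_requests value being tested
billed_instances = 5         # upper end of the current 3-5 billed F4s

# Each instance serves concurrent_per_instance requests in parallel, so its
# throughput is concurrent_per_instance / latency_seconds. Requiring
# billed_instances * throughput >= peak_rps and solving for latency:
max_latency_s = billed_instances * concurrent_per_instance / float(peak_rps)
print("requests must finish in under %.0f ms" % (max_latency_s * 1000))  # ~286 ms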



What instance class do you run, if I may ask?

troberti

Jun 30, 2016, 5:03:23 AM
to Google App Engine
Right, you should definitely test and see what the results are. My first inclination was also to increase max_concurrent_requests, but because all those requests then have increased latency, the actual QPS per instance decreased! Lowering max_concurrent_requests decreased request latency, so each instance could process more requests per second.

We use F1 instances, because we do not need the additional memory, and our requests perform mostly RPCs. In our testing, faster instance classes do process requests faster, but also cost significantly more.  F1s provide the best performance/cost ratio for us. This could be a Python thing, not sure. Again, you should really test and figure out what is the best for your application+runtime.

Trevor

Jun 30, 2016, 5:44:34 AM
to Google App Engine
Well, I have to say thank you very, very much. Thanks to your advice we have our lowest latency in 3 years! Sub-300ms average. As expected, though, we are now sitting on 21 billed F4 instances, which will potentially cost us on the order of 3x our current bill ($30-40 -> $100+), but we will tweak that from tomorrow onwards. Peak hour is about to hit, so we are going to see if the system can keep sub-300ms at the current "automatic" setting for scaling. But yes, once again, thank you for solving in 5 minutes what I have been working on for 2 weeks (my tears are from joy and sadness all at once).

troberti

Jun 30, 2016, 7:13:42 AM
to Google App Engine
Great to hear that it helps. Actually, if you are using F4s, I might try a slightly higher max_concurrent_requests, say 4. Again, test and compare to be sure.

Finally, to reduce costs, I would recommend setting max_idle_instances to 1. Keep min_idle_instances at what you need for your application. For us this reduces cost significantly without any apparent drawbacks. (Roughly what this looks like in app.yaml is sketched below.)
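Put together, the combination suggested in this thread would look roughly like this in app.yaml for an automatically scaled standard-environment service (a sketch only; pick the numbers that test best for your app, and use the appengine-web.xml equivalents on Java):

automatic_scaling:
  max_concurrent_requests: 2   # or try ~4 on F4s, as suggested above
  max_idle_instances: 1        # cap idle instances to keep costs down
  min_idle_instances: 1        # whatever your app actually needs for traffic spikes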

Thomas Taschauer

Jul 1, 2016, 3:28:51 AM
to Google App Engine
One thing I noticed is that the first request(s?) served by a fresh instance will always be really slow. Not that they stay in the request queue for a longer time (which is expected behaviour, of course), but they have long "pauses" in the middle of the request as you mentioned before, usually up to 5 seconds in my case.

What I'm going to test next is upgrading to F2 - hoping for smaller pauses due to a faster CPU - and reverting the other scaling options to their defaults (I used max_concurrent_requests and max_idle_instances before), hoping the App Engine scaler will figure it out itself. :)

Nick (Cloud Platform Support)

Jul 5, 2016, 2:24:46 PM
to Google App Engine
Hey Folks,

As for the number of concurrent requests an instance can handle, it depends on the CPU usage on the instance, and the distribution of requests to instances also depends on statistical trends in latency. So it is possible to see variable concurrent-request performance depending on how requests use up the resources of the given instance class and on the latency statistics of requests on an instance.

I have one small recommendation relating to the mysterious gaps of time in requests. Surround calls to complex libraries, or any calls which require network activity, with System.currentTimeMillis() calls (this is for Java, but other runtimes have equivalent system calls), and you should be able to determine what exactly is taking up that time. It might be something optimizable, or it might be a network issue; depending on the nature of the network call itself, that could also be optimizable. (A minimal sketch of this bracketing follows below.)
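For a non-Java runtime, the same bracketing looks roughly like this (a minimal Python sketch; the slow call here is a made-up placeholder for whatever library or network call you suspect):

import logging
import time

def suspect_library_call():
    # Placeholder for the library / network call you suspect is slow.
    time.sleep(0.1)

start = time.time()
suspect_library_call()
elapsed_ms = (time.time() - start) * 1000
logging.info("suspect_library_call took %d ms", elapsed_ms)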

Regards,

Nick
Cloud Platform Community Support

Christian F. Howes

Jul 6, 2016, 6:05:11 PM
to Google App Engine
2 thoughts:

 - remember that when servicing multiple simultaneous requests the CPU is switching between them.  I bet the pauses are (in part) due to waiting for a turn on the CPU - your request got "paused" while the memcache call was made, and then had to wait for CPU once the memcache response was ready.
 - based on the description of what the pages are doing, have you tried using "Edge Cache"? It's very under-documented, but when the state of a page is the same for many users it's a big $$$ saver (a minimal sketch of the headers involved is below). See https://code.google.com/p/googleappengine/issues/detail?id=2258 and https://cloud.google.com/appengine/docs/python/how-requests-are-handled#cache-control_expires_and_vary
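For reference, triggering the edge cache is mostly a matter of returning a public Cache-Control header on responses that are identical for all users. A minimal webapp2-style sketch (handler name and URL are made up; see the linked docs for the caveats around cookies and Vary):

import webapp2

class CachedPageHandler(webapp2.RequestHandler):
    def get(self):
        # A public, max-age'd response with no per-user variation can be served
        # from Google's edge cache without ever reaching an instance.
        self.response.headers['Cache-Control'] = 'public, max-age=600'
        self.response.write('<html>...same page for every user...</html>')

app = webapp2.WSGIApplication([('/cached-page', CachedPageHandler)])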

good luck!

cfh

Mark Cummins

Jul 15, 2016, 2:04:33 PM
to Google App Engine
Thank you so much for this! We also got bitten hard by this problem, seemingly out of nowhere, in the last few weeks. The latency of the same request varies from 200ms to over 10,000ms in some cases, with no obvious reason except mysterious gaps in the traces. It's a hard one to debug - there's no obvious way to tell that the instances are being CPU-starved.

Rajesh Gupta

Jul 16, 2016, 1:38:44 AM
to google-a...@googlegroups.com
Hi,
Our team has the same issue, and it is still bothering us a lot. 

It was raised with the Google Appengine production team. 

To give the Google team supporting evidence, our team slimmed down our project and submitted minimal code that the App Engine team can verify.


"2) This happens in part because the the servlet container is threaded and will attempt to service requests as soon as it is able. There can also be some startup activity that is not handled by the warmup request and can only occur within the first request (eg. classpath scanning). How much depends on the libraries and frameworks you are using and how much you are able to handle in your warmup request handler."








--
Regards,
Rajesh
Accounting/Inventory/Orders on Google Cloud Platform and Mobile

Anoop

Jul 25, 2016, 12:27:50 PM
to Google App Engine
Can you explain what you did to reduce the latency? I have the same issue.

Trevor

Jul 27, 2016, 12:05:18 AM
to Google App Engine
I simply followed the advice kindly offered in this thread and experimented with lowering the max concurrent requests. Our best settings ended up being 2 max concurrent requests with 2 max idle instances.

I think an important part is that before this tweaking, we separated our front-end and task modules (recently renamed services) such that their instances could be completely independent.

By taking those two steps, our front-end average latency fell from an average of 600ms to 180ms, and our daily instance costs went from $100 to $30. So performance improved while cutting costs by >60%. We're pretty happy with the results.

Anoop

Jul 27, 2016, 4:42:18 AM
to Google App Engine
But my latency is only below 1.5 seconds when I looked at the logs, and that's OK for me. My issue is with the time taken for the preprocessing - or is that the same thing? I will start a new thread with the issue anyway.

Deepak Singh

Jul 29, 2016, 12:16:10 PM
to google-a...@googlegroups.com
We have our application in Java, on the flexible environment with autoscaling.

Almost all of our logs show the following statistics:

ms=346 cpu_ms=0 cpm_usd=1.32769e-7 loading_request=0 instance=- app_engine_release=1.9.42 trace_id=-

Our users face front-end latency. Is this a server issue, or are there any corrective measures that can be taken?

 





--
Deepak Singh

Christian F. Howes

Jul 29, 2016, 1:04:47 PM
to google-a...@googlegroups.com
A helpful stat would be the "pending latency" (the amount of time that the request waited for an available instance to run on). This was part of the logs in the old log viewer and is missing in the new log viewer. I believe someone has filed an issue requesting that it be re-added.

If any of your requests are cacheable, setting proper cache headers will trigger Google's edge cache and dramatically reduce server-side latency as well.

Christian F. Howes

Jul 29, 2016, 3:29:44 PM
to Google App Engine
Looks like the pending-ms logs mystery was solved: https://groups.google.com/forum/#!topic/google-appengine/8mHxjsSCOYc - so if you don't see that in your logs, you can rule it out as a problem.

Esteban Bonham

Nov 27, 2017, 12:49:06 AM
to Google App Engine
I recently migrated to endpoints framework version 2 and started paying closer attention to latency. In my case I have a very low traffic app. I even tried just using the standard python echo example found here: https://cloud.google.com/endpoints/docs/frameworks/python/get-started-frameworks-python 

So in my case I made sure one backend instance was running and then started issuing basic requests via curl. I also see large unexplained gaps in Appstats, and the corresponding log latency is much lower than the actual latency reported by curl. The unexplained latency overhead is very sporadic and ranges from 200ms to 1.3 seconds. Is this just the nature of the Cloud Endpoints Frameworks API?
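(For context, the curl-format.txt passed to -w isn't shown here; it is presumably a standard curl write-out file along these lines, which is what produces the timing lines in the output below:)

time_namelookup:    %{time_namelookup}\n
time_connect:       %{time_connect}\n
time_appconnect:    %{time_appconnect}\n
time_pretransfer:   %{time_pretransfer}\n
time_redirect:      %{time_redirect}\n
time_starttransfer: %{time_starttransfer}\n
--------\n
time_total:         %{time_total}\n

Here are some examples: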

MacBook-Pro$ curl -H "Content-Type: application/json" -X POST -d '{"content":"hello world"}' https://testtictactoe-164905.appspot.com/_ah/api/echo/v1/echo -w "@curl-format.txt"
{
 "content": "hello world"
}
time_namelookup:    0.005
time_connect:       0.019
time_appconnect:    0.143
time_pretransfer:   0.144
time_redirect:      0.000
time_starttransfer: 0.660
--------
time_total:         0.660

MacBook-Pro$ curl -H "Content-Type: application/json" -X POST -d '{"content":"hello world"}' https://testtictactoe-164905.appspot.com/_ah/api/echo/v1/echo -w "@curl-format.txt"
{
 "content": "hello world"
}
time_namelookup:    0.005
time_connect:       0.021
time_appconnect:    0.145
time_pretransfer:   0.145
time_redirect:      0.000
time_starttransfer: 1.028
--------
time_total:         1.028

MacBook-Pro$ curl -H "Content-Type: application/json" -X POST -d '{"content":"hello world"}' https://testtictactoe-164905.appspot.com/_ah/api/echo/v1/echo -w "@curl-format.txt"
{
 "content": "hello world"
}
time_namelookup:    0.528
time_connect:       0.542
time_appconnect:    0.668
time_pretransfer:   0.668
time_redirect:      0.000
time_starttransfer: 0.912
--------
time_total:         0.912

MacBook-Pro$ curl -H "Content-Type: application/json" -X POST -d '{"content":"hello world"}' https://testtictactoe-164905.appspot.com/_ah/api/echo/v1/echo -w "@curl-format.txt"
{
 "content": "hello world"
}
time_namelookup:    0.005
time_connect:       0.019
time_appconnect:    0.141
time_pretransfer:   0.141
time_redirect:      0.000
time_starttransfer: 0.903
--------
time_total:         0.903

MacBook-Pro$ curl -H "Content-Type: application/json" -X POST -d '{"content":"hello world"}' https://testtictactoe-164905.appspot.com/_ah/api/echo/v1/echo -w "@curl-format.txt"
{
 "content": "hello world"
}
time_namelookup:    0.005
time_connect:       0.018
time_appconnect:    0.151
time_pretransfer:   0.151
time_redirect:      0.000
time_starttransfer: 1.597
--------
time_total:         1.597