High latency for no apparent reason


Thomas Baldauf

Nov 26, 2016, 12:46:15 PM
to Google App Engine
Hi! I have been facing this issue ever since I started running my Java app on App Engine (Standard Edition), that is, for 6 years. Users are complaining more and more, so I have to do something about it:
Occasionally, response latency for even the simplest requests to my app climbs from around 100 ms to crazy values like 20s, 30s or even 40s for no apparent reason. I attach a screenshot showing the trace of such a request from today. It was very quiet, so there weren't many concurrent requests, and it was not a cold start request either.


As you can see, there's only a datastore.get and then nothing for a very long time, so it must be a very busy (shared?) CPU that is just not available to handle my request. How can I deal with such cases?
My approach was to move to App Engine Flexible, which was very successful: performance was consistently great. But now I can't use it any more, as Google decided to drop the compat runtime, which makes my app incompatible with Flex Env.

Maybe somebody from Google can investigate the issue and point me in the right direction. I don't want to move away from App Engine as I would have to rewrite lots of code depending on GAE SDK APIs. Please help!

Thanks,
Thomas

Adam (Cloud Platform Support)

Nov 27, 2016, 7:33:38 PM
to Google App Engine
Since all that is known is that you have a handler that sometimes takes a very long time under certain conditions, and it's not RPC related, you'd have to post the code from the handler in order for the issue to be investigated.

If you're averse to posting your application code on a public forum, you can get architecture support through a Cloud Platform Support package. I can say that the issue you're describing is not a known general issue with the platform, so it's most likely specific to your application.

Thomas Baldauf

Nov 28, 2016, 1:21:20 AM
to Google App Engine
It would be hard to post my code here, because I'd have to include HttpFilters and the whole DAO/Cache-layer. But there's really nothing going on that needs a lot of CPU resources. One thing I asked myself: could it be GC occupying the process, given that such slow requests occur roughly every couple of hours? If so, is there anything we can do about it?

Adam (Cloud Platform Support)

Nov 29, 2016, 1:55:31 PM
to Google App Engine
GC pauses could very well be the culprit. What do the Utilization and Memory Usage graphs look like at the time of the slow request?

Thomas Baldauf

Nov 29, 2016, 4:04:07 PM
to Google App Engine
CPU megacycles went up from 1300 to 20000 and down again, so this is really suspicious. At the same time, memory usage was constantly at about 215 MB and only went up slightly (a few MB) in the next minutes. Any ideas?

Adam (Cloud Platform Support)

Dec 2, 2016, 1:41:09 PM
to Google App Engine
This seems consistent with a GC pause. A GC frees space inside the heap, but the size of the allocated heap would not necessarily be reduced, so the memory usage graph need not show a drop afterwards. If the issue were due to the shared core not being available, the Utilization would not have shown a spike.
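To illustrate the difference between the heap the JVM has reserved and the part of it that objects actually occupy, here is a minimal sketch using only `java.lang.Runtime`. The class and method names (`HeapSnapshot`, `committedBytes`, `usedBytes`) are made up for this example, and it is an assumption, not something confirmed in this thread, that the dashboard's Memory Usage graph follows the reserved size rather than the occupied part.

```java
public class HeapSnapshot {

    /** Heap the JVM currently has reserved (committed), in bytes. */
    public static long committedBytes() {
        return Runtime.getRuntime().totalMemory();
    }

    /** Part of the committed heap occupied by objects, including garbage not yet collected. */
    public static long usedBytes() {
        Runtime rt = Runtime.getRuntime();
        return rt.totalMemory() - rt.freeMemory();
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        // After a collection, usedBytes() can fall while committedBytes() stays the same.
        System.out.println("committed MB: " + committedBytes() / mb);
        System.out.println("used MB:      " + usedBytes() / mb);
    }
}
```

Logging both values at the start and end of a slow request might show whether the occupied part dropped sharply while the reserved size stayed flat, which would point toward a collection having run.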

On the standard runtime you don't have the ability to pass JVM options to tune the GC or use System.gc(), unfortunately. You'd be limited to performance tuning your code to reduce GCs, which is a fairly broad topic.
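As one small illustration of what "reducing GCs" can mean in practice, the sketch below contrasts an allocation-heavy loop with a leaner one that produces the same result. It is only an example under assumed names (`AllocationSketch`, `joinNaive`, `joinLean`, and the sample scores are hypothetical) and is not taken from the application discussed in this thread.

```java
import java.util.ArrayList;
import java.util.List;

public class AllocationSketch {

    /** Allocation-heavy: boxes each value and builds a new String on every pass. */
    public static String joinNaive(int[] scores) {
        List<Integer> boxed = new ArrayList<Integer>();
        for (int s : scores) {
            boxed.add(s); // autoboxing: may create an Integer object per element
        }
        String out = "";
        for (Integer s : boxed) {
            out += s + ","; // each += creates temporary builder and String objects
        }
        return out;
    }

    /** Leaner: works on primitives and reuses a single pre-sized StringBuilder. */
    public static String joinLean(int[] scores) {
        StringBuilder sb = new StringBuilder(scores.length * 4);
        for (int s : scores) {
            sb.append(s).append(',');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        int[] scores = {90, 85, 100};
        System.out.println(joinNaive(scores));
        System.out.println(joinLean(scores));
    }
}
```

Both methods return the same string; the second simply leaves fewer short-lived objects behind for the collector. Whether changes like this help in a given app depends on where its allocations actually come from.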

Thomas Baldauf

Dec 4, 2016, 6:27:04 AM
to Google App Engine
Ok, thank you! I'll try to optimize my code to produce fewer objects where possible.

Adam (Cloud Platform Support)

Dec 5, 2016, 12:32:54 PM
to Google App Engine
No problem, glad I could help!

Thomas Baldauf

Jan 11, 2017, 3:08:34 AM
to Google App Engine
Well, it happened again: there was a period of very high latency spikes between 12pm and 8pm (US/Central), as you can see in the Stackdriver graphs below. The strange thing is that latency stayed high after 5pm, when requests/s dropped to almost 0.
Can you please investigate? My paying customers (schools) are complaining that their students have to wait from 20 seconds to almost 1 minute to start lessons or submit their results. This is not acceptable!