Some API endpoints taking a long time to start work?

91 views
Skip to first unread message

Ben Rometsch

unread,
Aug 17, 2016, 5:19:17 AM8/17/16
to Google App Engine
We are running a Java Cloud Endpoints API on Flexible VMs. Most of the endpoints complete in a timely manner, but from time to time we see things like the attached in the trace:



Does anyone know why/what the platform is doing in the first 1000ms? We have an AuthFilter that runs to check users are authenticated, but normally this runs quickly at the start of the request. We have some API urls that dont go through Cloud Endpoints and they don't exhibit this issue, so are suspicious that its the cloud endpoints but we are not sure? 

Nick (Cloud Platform Support)

unread,
Aug 18, 2016, 6:18:23 PM8/18/16
to Google App Engine
Hey Ben,

If you've only observed the issue on methods with an auth filter, and the gap in time is always before any other line than the authfilter, it's quite likely this is the cause, but here are some other things to check:

You could check the "ms" vs "cpu_ms" times in the log listing for that request, or even check if there's a "pending_ms" field. This could also indicate what's going on.

If "ms" is significantly higher than "cpu_ms", it means you're spending time waiting on remote calls. If "pending_ms" is high, this means the request spent time waiting in a pending queue to find an instance able to handle it (this would require a change to scaling settings to improve, allowing more idle instances to be ready).

So, after checking on that, it might come back to the fact that the auth filter is taking a while. The nature of the auth filter would be quite important to diagnosing that any further. 

I hope this is helpful so far, let me know your findings!

Cheers,

Nick
Cloud Platform Community Support

Ben Rometsch

unread,
Aug 19, 2016, 6:07:01 AM8/19/16
to Google App Engine
Thanks for the reply Nick.

I dont think I was clear in my previous post, but when I think the logs are showing is that the call into the AuthFilter is stalling for around 1000ms before it actually starts into that method. I'm not sure what could be causing that delay? Or maybe I'm reading this screen wrong? 

Nick (Cloud Platform Support)

unread,
Aug 19, 2016, 3:03:22 PM8/19/16
to Google App Engine
Hey Ben,

Thanks for the quick reply. Did you manage to check the "cpu_ms" vs "ms" for the request log in question?

As for the way to read the trace, I think the info log tags are showing up in the timeline when the method completes (with a slight logging delay) rather than when it starts, as evidenced also by the pattern of the other method calls. The rectangles show the duration of the method, while the info log tags are showing up in the timeline associated with these, but are being inserted by another process. Do you have some sort of config turned on to log all method calls?

Finally, does the auth filter code require some intensive CPU or a network call? 


Cheers,

Nick
Cloud Platform Community Support

Ben Rometsch

unread,
Aug 22, 2016, 7:36:41 AM8/22/16
to Google App Engine
Hi Nick,

I'm not sure where to find the cpu_ms vs ms times? Can you point me to where I would find that? 

We're using cloud endpoints, so I think the logging is being managed by that library as opposed to us doing any specific logging. 

It's possible the AuthFilter could be waiting on FB - I'll add some logging in there to see. 

Thanks for your help.  

Nick (Cloud Platform Support)

unread,
Aug 30, 2016, 4:44:57 PM8/30/16
to Google App Engine
Hey Ben,

The cpu_ms and ms data can be found in the logs in the Developers' Console. Let me know how the timing code works out!


Cheers,

Nick
Cloud Platform Community Support

Ben Rometsch

unread,
Sep 2, 2016, 5:34:12 AM9/2/16
to Google App Engine
Ah ok - I see this now. For some reason I cant get hold of the log for this particular request, but I can see that most of my current endpoints are showing 0ms cpu_ms, which would suggest that they are optimal, and are always waiting on RPCs to complete? Is that correct? 

I'll get more logging in the auth filter and see where that gets us! 

Thanks very much,
Ben

Nick (Cloud Platform Support)

unread,
Sep 2, 2016, 9:53:12 AM9/2/16
to Google App Engine
Hey Ben,

Yep, that's correct. As described in the doc linked, cpu_ms is the milliseconds of cpu cycles used by the actual runtime, as opposed to when it's simply waiting on RPCs. Feel free to update this thread whenever you get more info!


Cheers,

Nick
Cloud Platform Community Support

Reply all
Reply to author
Forward
0 new messages