Huge execution times. DeadlineExceededException(s). Crushing Weltschmerz.

Cesium

unread,

Jun 1, 2012, 11:50:08 AM6/1/12

to google-a...@googlegroups.com

I am seeing wild execution times for requests that nominally take 500ms. E.g.,

xxx.xxx.xxx.xxx - - [01/Jun/2012:07:21:08 -0700] "POST /xxx/sdtp?t=ACCESS_POINT_APP_DATA_UPLOAD HTTP/1.1" 200 64 - - "xxx.appspot.com" ms=54346 cpu_ms=3858 api_cpu_ms=8 cpm_usd=0.107224 loading_request=1 pending_ms=1566 instance=00c61b117cfadbd3e42218864e6cc2ffe3449b

I'm also seeing DeadlineExceededException(s). E.g.,

com.google.apphosting.api.DeadlineExceededException: This request (41c5eb91df9afc7a) started at 2012/06/01 14:27:26.021 UTC and was still executing at 2012/06/01 14:28:25.488 UTC.

(that's a whole minute.)

The absence of any feedback from previous posts (see "Stop tearing down my instances!") has lead to overwhelming weltschmerz.

What's a brother gotta' do to get some love?

David

Takashi Matsuo

unread,

Jun 1, 2012, 12:11:29 PM6/1/12

to google-a...@googlegroups.com

Please file a production issue on our issue tracker:
http://code.google.com/p/googleappengine/issues/entry?template=Production%20issue

> --
> You received this message because you are subscribed to the Google Groups
> "Google App Engine" group.
> To view this discussion on the web visit
> https://groups.google.com/d/msg/google-appengine/-/4_H4c0or0s8J.
> To post to this group, send email to google-a...@googlegroups.com.
> To unsubscribe from this group, send email to
> google-appengi...@googlegroups.com.
> For more options, visit this group at
> http://groups.google.com/group/google-appengine?hl=en.

--
Takashi Matsuo | Developer Advocate | tma...@google.com

Cesium

unread,

Jun 1, 2012, 3:59:16 PM6/1/12

to google-a...@googlegroups.com

Hi Takashi,

Thank you for your help!

I sent you an email with some details.

Cheers,

David

Takashi Matsuo

unread,

Jun 1, 2012, 6:51:20 PM6/1/12

to google-a...@googlegroups.com

Hi David,

First of all, it seems OK right now. Let me know that's not the case.

If you don't want to put information on the public issue tracker,
please use the "Report Production Issue" located near the right-top
corner of the Admin Console. The report tool will collect some useful
information and convey it to us secretly.

At the same time, I'd suggest using appstats in order to find out the
root cause of your latency first. If a certain RPC is the culprit,
there would be a decent workaround. Basically you can set a particular
deadline on the RPC call, and catch the deadline error and tell the
client to retry.

Besides that, please keep in mind that the performance of each loading
request(sure, actually every single request) may vary, because of
various reasons like actions of other customers in the same cluster,
the load on the system, or some kind of maintenance happening. App
Engine is a multi-tenant cloud platform.

Actually, in that timeframe, I can see that only loading requests of
your application had higher latencies.

Those kind of hiccups are supposed to be automatically addressed and
fixed swiftly(almost always faster than manual intervention like me
doing right now). Additionally, you can set "Min Idle Instances" to a
certain level in order to mitigate this kind of risk, although it will
certainly cost you some more money.

Please let me know if you have further questions.

-- Takashi

> --
> You received this message because you are subscribed to the Google Groups
> "Google App Engine" group.
> To view this discussion on the web visit

> https://groups.google.com/d/msg/google-appengine/-/hD6FC_YSBlsJ.

Cesium

unread,

Jun 1, 2012, 7:07:40 PM6/1/12

to google-a...@googlegroups.com

Hi Takashi,

Thank you for your reply!

You have taken the time to personally look into my issues and for that I am grateful.

I promise to follow your suggestions. It may take me a couple of days to perform some tests.

I will let you know if I have new questions.

Best Regards,

David

Cesium

unread,

Jun 3, 2012, 9:08:28 AM6/3/12

to google-a...@googlegroups.com

Hi Takashi,

You wrote:

Actually, in that timeframe, I can see that only loading requests of
your application had higher latencies.
Those kind of hiccups are supposed to be automatically addressed and
fixed swiftly(almost always faster than manual intervention like me
doing right now). Additionally, you can set "Min Idle Instances" to a
certain level in order to mitigate this kind of risk, although it will
certainly cost you some more money.

Yes, the loading requests have become very disruptive to my application.

I am currently at Auto-Auto for the Idle Instances and Pending Latency settings.

There is 1 client connecting to my application, and it provides a light load.

I cannot see from my logs when an instance shuts down, but I can see when one is started.

For example:

2012-06-03 06:25:12.259 /xxx/sdtp?t=ACCESS_POINT_APP_DATA_UPLOAD 200 10716ms 0kb

xxx.xxx.xxx.xxx - - [03/Jun/2012:05:25:12 -0700] "POST /xxx/sdtp?t=ACCESS_POINT_APP_DATA_UPLOAD HTTP/1.1" 200 64 - - "xxx.appspot.com" ms=10716 cpu_ms=5044 api_cpu_ms=708 cpm_usd=0.140171 loading_request=1 instance=00c61b117c977896b0d6d1161670c6b7518c43

I2012-06-03 06:25:12.253

This request caused a new process to be started for your application, and thus caused your application code to be loaded for the first time. This request may thus take longer and use more CPU than a typical request for your application.

2012-06-03 06:22:19.694 /xxx/sdtp?t=AP_RTC_SYNC 200 34ms 0kb

xxx.xxx.xxx.xxx - - [03/Jun/2012:05:22:19 -0700] "POST /xxx/sdtp?t=AP_RTC_SYNC HTTP/1.1" 200 72 - - "xxx.appspot.com" ms=35 cpu_ms=38 api_cpu_ms=0 cpm_usd=0.001107 instance=00c61b117c3d078ba2eaebba354307e44c0a15

These two, back-to-back entries show something odd.

The first request at 6:22:19 is handled by an existing instance, namely ...0a15.

The next request comes in 3 seconds later. However, it is not handled by instance ...0a15.

A new process is started, namely ...8c43.

You are correct, when you say:

Actually, in that timeframe, I can see that only loading requests of
your application had higher latencies.

My question is: "Why was instance ...8c43 created to serve the request, rather than using ...0a15?"

Cheers,

David

alex

unread,

Jun 3, 2012, 10:45:17 AM6/3/12

to google-a...@googlegroups.com

Could it be that your average response time became much higher than the spinup time? That might explain the new instance.

Also, have you tried running it with multithreading enabled?

Cesium

unread,

Jun 3, 2012, 12:21:11 PM6/3/12

to google-a...@googlegroups.com

Hi Alex,

First things first.

I'm an idiot.

The second reply comes in 3 MINUTES later, not 3 SECONDS.

Let me chew on your suggestions a bit...

David

PS (Can I edit a previous post, rather than deleting it?)

Reply all

Reply to author

Forward