Hourly bursts of deadline exceeded errors

MiuMeet team

Jan 10, 2011, 9:58:12 AM

to google-a...@googlegroups.com

Hey there

We are currently doing about 750 qps peak on appengine.

Our error rate is very low, except that at 35 minutes past every hour we get a burst of Deadline Exceeded errors.

See the "Errors per second" graph: http://tinyurl.com/2v4msbt

We don't have any cron jobs that run once an hour, so we assume the problem is on Google's side.
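To check whether errors really cluster at a fixed minute past the hour, one option is to bucket error timestamps by minute-of-hour. The sketch below assumes timestamps in the form shown in the admin-console log excerpts later in this thread (e.g. "01-10 01:31PM 54.141"); that format is inferred from those excerpts, and `minute_of_hour` / `error_histogram` are hypothetical helper names.

```python
import re
from collections import Counter

# Assumed log timestamp format, inferred from excerpts such as
# "01-10 01:31PM 54.141": MM-DD HH:MM(AM|PM) SS.mmm
LOG_TS = re.compile(r"^\d{2}-\d{2} \d{2}:(\d{2})(?:AM|PM) \d{2}\.\d{3}$")


def minute_of_hour(ts):
    """Return the minute past the hour (0-59) encoded in one log timestamp."""
    match = LOG_TS.match(ts.strip())
    if match is None:
        raise ValueError("unrecognised timestamp: %r" % ts)
    return int(match.group(1))


def error_histogram(timestamps):
    """Count errors per minute-of-hour; one dominant bucket hints at an hourly job."""
    return Counter(minute_of_hour(ts) for ts in timestamps)
```

Feeding it a day's worth of error timestamps and looking at `error_histogram(...).most_common(3)` would show whether a single minute, such as :35, stands out.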

Does anyone experience similar problems?

AppID is miumeet

Cheers,

-Andrin

Nik

Jan 10, 2011, 11:33:46 AM

to Google App Engine

+1
This is causing serious issues with transactions, and it seems to have
started last week (January 2nd).
See my post from Saturday:
http://groups.google.com/group/google-appengine/browse_thread/thread/8e3d224c71b5346a#

Between 30 and 35 minutes past every hour, instances keep cycling up
and down and we see a high rate of HTTP 500 errors. Requests that do
eventually complete take 10 to 20 seconds; usually they take 20 ms to
400 ms.

Regards,
n.

master outside

Jan 10, 2011, 4:38:43 PM

to Google App Engine

OK, I just filed Issue 4376: 500 errors and datastore unavailable issues.
Now some questions for everyone:
Do you serve traffic from your own domain or from appid.appspot.com?
We use our own domain.
Does every request hit this error? Looking at our graph, most of the
time we are at about 0.25 errors/s, while our traffic is between 3 and
4 requests per second. We did have one bigger spike, to 1.25 errors/s,
about 3 hours ago.
Andrin's graph looks like it ranges from almost 0 to 30 errors per
second, averaging about 12/s.
Do you use Appstats? We do, but we only have it running 10% of the time.

Raymond C.

Jan 10, 2011, 9:54:42 PM

to google-a...@googlegroups.com

My app has had exactly the same error pattern since the September maintenance (I didn't know we had that view of the error pattern). It was better for a few weeks and has gotten worse again over the last few weeks.

I use appspot.com without custom domain.

Jeff Schwartz

Jan 11, 2011, 5:34:19 AM

to google-a...@googlegroups.com

Yesterday I got the following:

01-10 01:31PM 54.141 /lmv/MemberDataStoreServices 500 10194ms 0cpu_ms 0kb Mozilla/5.0 (Windows; U; Windows NT 6.0; en-US; rv:1.9.2.13) Gecko/20101203 Firefox/3.6.13 ( .NET CLR 3.5.30729; .NET4.0C),gzip(gfe)

173.77.83.166 - - [10/Jan/2011:13:32:04 -0800] "POST /lmv/MemberDataStoreServices HTTP/1.1" 500 0 "http://lovemyvehicle.appspot.com/" "Mozilla/5.0 (Windows; U; Windows NT 6.0; en-US; rv:1.9.2.13) Gecko/20101203 Firefox/3.6.13 ( .NET CLR 3.5.30729; .NET4.0C),gzip(gfe)" "lovemyvehicle.appspot.com" ms=10195 cpu_ms=0 api_cpu_ms=0 cpm_usd=0.000106

W01-10 01:32PM 04.336

Request was aborted after waiting too long to attempt to service your request. This may happen sporadically when the App Engine serving cluster is under unexpectedly high or uneven load. If you see this message frequently, please contact the App Engine team.

And was immediately followed by:

01-10 01:32PM 48.295 /lmv/MemberDataStoreServices 200 25ms 68cpu_ms 21api_cpu_ms 0kb Mozilla/5.0 (Windows; U; Windows NT 6.0; en-US; rv:1.9.2.13) Gecko/20101203 Firefox/3.6.13 ( .NET CLR 3.5.30729; .NET4.0C),gzip(gfe)

173.77.83.166 - - [10/Jan/2011:13:32:48 -0800] "POST /lmv/MemberDataStoreServices HTTP/1.1" 200 397 "http://lovemyvehicle.appspot.com/" "Mozilla/5.0 (Windows; U; Windows NT 6.0; en-US; rv:1.9.2.13) Gecko/20101203 Firefox/3.6.13 ( .NET CLR 3.5.30729; .NET4.0C),gzip(gfe)" "lovemyvehicle.appspot.com" ms=26 cpu_ms=68 api_cpu_ms=22 cpm_usd=0.002049

Both requests were for the same process, so I don't understand why the first one took 10194 ms and failed; the 500 wasn't due to a startup. I have been seeing these more and more frequently and hope the team can address this issue.
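The failing request above is logged with ms=10195 but cpu_ms=0 and api_cpu_ms=0, which, together with the "Request was aborted after waiting too long" warning, suggests the ten seconds were spent waiting to be served rather than running the handler. A rough way to spot such requests in a downloaded request log is to pull out those key=value fields. This sketch assumes the field names exactly as they appear in the raw log lines quoted above; `request_timing`, `looks_queued` and the 5000 ms threshold are hypothetical choices, not App Engine settings.

```python
import re

# key=value timing fields as they appear at the end of the raw request-log
# lines quoted above, e.g. "ms=10195 cpu_ms=0 api_cpu_ms=0 cpm_usd=0.000106".
# The \b keeps "ms" from matching inside "cpu_ms" or "api_cpu_ms".
FIELD = re.compile(r"\b(ms|cpu_ms|api_cpu_ms)=(\d+)")


def request_timing(log_line):
    """Extract wall-clock and CPU milliseconds from one raw request-log line."""
    return {key: int(value) for key, value in FIELD.findall(log_line)}


def looks_queued(log_line, slow_ms=5000):
    """Heuristic: a slow request that used no CPU probably never reached the app.

    slow_ms is an arbitrary assumed threshold, not an App Engine limit.
    """
    timing = request_timing(log_line)
    return timing.get("ms", 0) >= slow_ms and timing.get("cpu_ms", 0) == 0
```

Applied to the two lines above, the 500 response is flagged and the 200 response that followed it is not.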

Jeff

--
Jeff Schwartz

Raymond C.

Jan 11, 2011, 8:51:09 PM

to google-a...@googlegroups.com

Looking forward to a reply from Google, at least on whether there is any possible way to tackle this. It seems to me that this is an issue with Google's servers.

Ikai Lan (Google)

Jan 12, 2011, 2:54:58 PM

to Google App Engine

Hey guys, sorry about the delay in getting back to you on this, but I wanted to get the answer before posting.

A change was pushed to an hourly job that computes quotas. The change inadvertently caused the job to lock blocks of data so that they cannot be written. In the short term, we've changed the job to lock far less frequently, and we are moving to a model where locks are infrequent or unnecessary.

--

Ikai Lan
Developer Programs Engineer, Google App Engine

Blogger: http://googleappengine.blogspot.com

Reddit: http://www.reddit.com/r/appengine

Twitter: http://twitter.com/app_engine

kamens

Jan 12, 2011, 3:03:39 PM

to Google App Engine

Is there an Issue open for this so we can track when the change has
been made? Same problem on our app (id: khanexercises).

Raymond C.

Jan 12, 2011, 8:37:02 PM

to google-a...@googlegroups.com

Thanks so much for the reply. Now at least we know Google is aware of it and is trying to fix it!

Thanks again! I'm looking forward to the fix, as my app has had a 1-2% error rate for the past few months because of this (all the timed-out requests fall within the same minute every hour).

Dan

Jan 12, 2011, 3:30:46 PM

to Google App Engine

I am experiencing the same problems on my app (ID mind-well).

kamens, I think the issue is 4376: http://code.google.com/p/googleappengine/issues/detail?id=4376

Dan

master outside

Jan 13, 2011, 10:37:27 AM

to Google App Engine

I am still seeing this problem on app ID collarcmds, at:
01-13 12:08AM 36.268
01-12 03:18PM 07.370
01-12 12:06PM 06.998

I also see a few additional errors:
A serious problem was encountered with the process that handled this
request, causing it to exit. This is likely to cause a new process to
be used for the next request to your application. If you see this
message frequently, you may be throwing exceptions during the
initialization of your application. (Error code 104)

Now there seem to be three places tracking this problem:
http://code.google.com/p/googleappengine/issues/detail?id=4376 (Status: New)
http://code.google.com/p/googleappengine/issues/detail?id=4380 (Status: Fixed)
http://groups.google.com/group/google-appengine/browse_thread/thread/f00fe0cc9b9bea (the main group post)
Are these all the same issue, or do we have more than one problem here?

Note: I am cross-posting this.

Dennis

Jan 13, 2011, 1:33:41 PM

to Google App Engine

Does anyone know of an App Engine app that does a simple App Engine
status check, e.g. a simple datastore read and write that confirms
both worked?

The official App Engine system status is "too optimistic".
I'm looking for something that can tell me whether it's my app or
App Engine that is currently having problems.

Wim den Ouden

Jan 13, 2011, 1:41:49 PM

to google-a...@googlegroups.com

Hi Dennis,
At the moment I only see errors that I caused myself.
I haven't seen Deadline Exceeded errors since the end of November.
I'm using Python and jQuery, and nearly all datastore writes go through task queues.
gr
wim

--
gr
Wim den Ouden
Custom applications, https://e-comm.appspot.com/
Free open source E-commerce framework (web) apps,
http://code.google.com/p/relat/
Gae developer tips, http://code.google.com/p/relat/wiki/gaetips

Barry Hunter

Jan 13, 2011, 2:04:13 PM

to google-a...@googlegroups.com

Not sure such a thing is really possible.

Remember, App Engine is distributed: your app could be running in an
area that's slow or otherwise degraded, while the test app is running
in a perfectly fine area.

(That is almost certainly the issue with the 'official' one: it's just
an App Engine app (you could find its app ID at one point), so it only
has visibility into its immediate surroundings, not the whole App
Engine ecosystem.)

More reliable would be to create another handler in your app that runs
a few simple tests. Then at least it's likely to be running on the
same instances as your main application. Even that isn't totally
reliable, as a single app can easily be running on multiple instances,
each with slightly different characteristics.
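A self-check handler along the lines Barry suggests could time one small write and one read of a dedicated probe entity and report the result. The sketch keeps the logic framework-neutral: `run_self_check`, `write_probe` and `read_probe` are hypothetical names, and in a real app the two callables would wrap the datastore put/get of your own probe entity, with the returned dict rendered by whatever handler you expose.

```python
import time


def run_self_check(write_probe, read_probe, slow_ms=1000.0, clock=time.time):
    """Time one probe write and one probe read and return a small status dict.

    write_probe(value) and read_probe() are hypothetical callables; in an app
    they would wrap a datastore put/get of a dedicated probe entity.
    slow_ms is an assumed threshold for calling the check "slow".
    """
    report = {}
    marker = str(clock())  # a fresh value, so a stale read is detectable
    try:
        start = clock()
        write_probe(marker)
        report["write_ms"] = (clock() - start) * 1000.0
        start = clock()
        value = read_probe()
        report["read_ms"] = (clock() - start) * 1000.0
    except Exception as exc:  # any failure (e.g. a deadline error) fails the check
        report["status"] = "error: %s" % exc
        return report
    if value != marker:
        report["status"] = "mismatch"
    elif max(report["write_ms"], report["read_ms"]) > slow_ms:
        report["status"] = "slow"
    else:
        report["status"] = "ok"
    return report
```

With an in-memory dict standing in for the datastore, `run_self_check(lambda v: store.__setitem__("probe", v), lambda: store["probe"])` reports a status of "ok"; passing a write callable that raises reports an "error: ..." status instead.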

Eli Jones

Jan 13, 2011, 3:04:31 PM

to google-a...@googlegroups.com

As Mr. Hunter mentions, it can be hard to get a general idea of system status, since applications exist "within" different parts of the App Engine infrastructure; there is no global system state.

But, here is an alternative site that I use for a general sense of how the Appengine Datastore is operating (You can compare all sorts of metrics from AWS and GAE):

http://amistrongeryet.com/dashboard.jsp

There is a specific metrics page for "GAE Datastore: read (no transaction)":

http://amistrongeryet.com/op_detail.jsp?op=gae_db_readNonTransactional

It shows a one-month graph of read performance, with median, mean, 90th and 99th percentile results.

This lets you see fairly easily if any Datastore issues have popped up recently. (I generally just look at 99th percentile when looking for big issues.)
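For readers computing the same summary from their own latency samples (for example, timings collected by a self-check handler), the sketch below derives median, mean, 90th and 99th percentile values. It uses the nearest-rank percentile definition; that choice is an assumption, since how the dashboard computes its percentiles isn't stated.

```python
import math


def percentile(sorted_ms, p):
    """Nearest-rank percentile (0 < p <= 100) of an already-sorted list."""
    # p * n is computed before dividing so integer inputs give an exact rank.
    rank = int(math.ceil(p * len(sorted_ms) / 100.0))
    return sorted_ms[max(rank, 1) - 1]


def latency_summary(samples_ms):
    """Median, mean, 90th and 99th percentile of a list of latencies in ms."""
    data = sorted(samples_ms)
    if not data:
        raise ValueError("no samples")
    return {
        "median": percentile(data, 50),  # nearest-rank median (lower middle for even n)
        "mean": sum(data) / float(len(data)),
        "p90": percentile(data, 90),
        "p99": percentile(data, 99),
    }
```

Because an hourly burst of slow requests is a small fraction of all traffic, it tends to move the p99 value long before it visibly shifts the median or mean.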

Robert Kluin

Jan 13, 2011, 3:41:01 PM

to google-a...@googlegroups.com

I find http://api-status.com/6404/117406/Google-App-Engine-API is
useful too. It is not fine-grained, but often gives you an idea if a
problem is widespread or not.

Robert

Dennis

Jan 17, 2011, 11:53:21 AM

to google-a...@googlegroups.com

Thanks for the links to appengine monitoring sites -- I'm using them now...

Interestingly, my app is currently having problems (as others are reporting: http://code.google.com/appengine/forum/?place=topic%2Fgoogle-appengine%2F8Fey18C1OSQ%2Fdiscussion ), but the monitoring sites suggest only a very slight problem at the 99th percentile level.

I'm certainly getting more than 1 error out of 100.

Maybe that's because one HTML response requires many more than one datastore read or write.

In fact, maybe 10 reads combined with 3-5 writes would be a better test case.

That would re-label the current 99th percentile line as roughly the current 90th percentile line (or a bit worse).
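Dennis's estimate can be checked with a short calculation, under the simplifying assumption that each datastore call independently has a 1% chance of being slower than the per-operation 99th percentile. A request that makes n calls then avoids every slow call with probability 0.99^n, so for 13 calls (10 reads plus 3 writes) roughly 12% of requests hit at least one slow call, and about 14% for 15 calls. The per-operation 99th-percentile latency therefore shows up in roughly the slowest 12-14% of page requests, close to the 90th-percentile line described above. Independence is a simplification: the errors in this thread arrived in hourly bursts, so real calls are not independent. `p_request_hits_tail` is a hypothetical helper name.

```python
def p_request_hits_tail(n_ops, per_op_tail=0.01):
    """Probability that at least one of n_ops independent datastore calls is
    slower than the per-operation percentile with tail fraction per_op_tail
    (0.01 means "slower than the per-operation 99th percentile")."""
    return 1.0 - (1.0 - per_op_tail) ** n_ops


# 1 call, 10 reads + 3 writes, 10 reads + 5 writes
for n_ops in (1, 13, 15):
    print(n_ops, round(p_request_hits_tail(n_ops), 3))
```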

Yet, even this is not reflecting what my (very sparsely used) app is experiencing.

Another factor is that I'm using Django 1.1, so there is a large read at the beginning of many app requests, another important aspect of a test case (for me, anyway).
