Recurring DeadlineExceededError when loading instances


Rajkumar Radhakrishnan

Dec 29, 2011, 5:30:32 AM
to google-a...@googlegroups.com
Hi,

My paid production application is serving errors because of DeadlineExceededError when loading application instances. This happens for only one specific application; a clone of that app, used for testing, has no such issues.

This has been a recurring issue, occurring intermittently for the past week.

Dashboard : Errors/Second : 24 hours 

errors-per-second-24-hours.png

Dashboard : Errors/Second : 4 days

errors-per-second-4-days.png

Dashboard : Errors/Second : 14 days

errors-per-second-14-days.png

I filed a production issue when I first noticed this happening.

Is anyone else facing such DeadlineExceededErrors during app loading recently?

I am asking because the system status shows Current Availability as 100%!

Google folks:
  • Can you shed some light on why such errors continue to occur?
  • Is this related to which set/batch of deployment servers is being used to host a specific application?
    • If so, can we find out which batch is serving a specific application, and possibly get a system status for that batch?
  • Can you not detect such error scenarios automatically? For example: the application was serving requests without issues, but new app instances cannot be loaded due to DeadlineExceededError. In that case, schedule a test to see whether the same app instance can be loaded on a different batch of servers (not to serve requests, just to check that the instance loads). If it can, automatically file an issue to test the current batch of servers, and move the app's hosting to the other batch.
I am also willing to pay an additional $9/month if you can ensure that my primary instances work without issues. Maybe you could host a monitoring application on, say, Amazon (no, I am not being sarcastic; I just believe such monitoring needs to run on a different platform than the one being tested) and file production issues automatically (after running a series of test cases for that application, checking the deployment history, etc.), while also sending an email to the app-instance owner.
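
To make the off-platform monitoring idea concrete, here is a minimal sketch of a probe that could run on a machine outside App Engine. It is only an illustration under stated assumptions, not something Google provides: the names `probe`, `default_fetch`, and `should_alert`, the 10-second timeout, and the "three failures in a row" rule are all hypothetical, and the URL you pass in would be your own app's.

```python
import socket
import time
import urllib.error
import urllib.request


def default_fetch(url, timeout_s):
    """Fetch url and return the HTTP status code (raises on network errors)."""
    with urllib.request.urlopen(url, timeout=timeout_s) as response:
        return response.getcode()


def probe(url, fetch=default_fetch, timeout_s=10.0):
    """Return ("ok" | "error" | "timeout", elapsed_seconds) for one request.

    fetch is injectable so the classification logic can be tested offline.
    """
    start = time.perf_counter()
    try:
        status = fetch(url, timeout_s)
        outcome = "ok" if 200 <= status < 400 else "error"
    except urllib.error.HTTPError:
        outcome = "error"  # the server answered, but with a 4xx/5xx status
    except urllib.error.URLError as exc:
        # urlopen wraps connection-phase failures, including timeouts, in URLError
        outcome = "timeout" if isinstance(exc.reason, socket.timeout) else "error"
    except socket.timeout:
        outcome = "timeout"  # the read did not finish within timeout_s
    except OSError:
        outcome = "error"  # any other network-level failure
    return outcome, time.perf_counter() - start


def should_alert(outcomes, max_failures=3):
    """Hypothetical rule: alert when the last max_failures probes all failed."""
    recent = outcomes[-max_failures:]
    return len(recent) == max_failures and all(o != "ok" for o in recent)
```

A scheduler (cron, for instance) would call `probe` every minute or so, append the outcome to a list, and email the app owner once `should_alert` returns True. Requiring several consecutive failures keeps a single slow request from raising an alarm.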

Thanks & Regards,
Raj


timh

Dec 29, 2011, 7:49:51 AM
to google-a...@googlegroups.com
Are you on MS or HR? (I am assuming Python.)

If you are on MS, historically you could find yourself in a part of the App Engine infrastructure that is having a bad hair day, and the raised latencies can turn into slow import times, then DEEs (DeadlineExceededErrors) and instances not starting.

The system status will look fine because their monitoring looks at the service as a whole, not at every machine.

T

Rajkumar Radhakrishnan

Dec 29, 2011, 9:22:34 AM
to google-a...@googlegroups.com
The app is on MS, Python 2.5.

I am okay with occasional scheduled read-only periods for maintenance, since migrating the database is not straightforward for this app, which is metadata driven (datastore entity names are based on key IDs and are referenced within other metadata: views, actions, workflows, etc. I am not sure whether the migration tool will retain the same key IDs).

I am just puzzled why this specific instance alone continues to have these issues for an extended period of time, while my other MS Python 2.5 apps (including another with an identical setup) seem to work fine. I wonder why this app can't work fine too.

Thanks for the tip though, T.

-Raj



--
You received this message because you are subscribed to the Google Groups "Google App Engine" group.
To view this discussion on the web visit https://groups.google.com/d/msg/google-appengine/-/jqI2A2cyJnAJ.
To post to this group, send email to google-a...@googlegroups.com.
To unsubscribe from this group, send email to google-appengi...@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/google-appengine?hl=en.



Klaas Pieter Annema

Dec 29, 2011, 11:24:44 AM
to google-a...@googlegroups.com
I'm unsure if this is related, but I'm seeing some of the same signals. I just noticed DeadlineExceededErrors being thrown intermittently. My errors/second started spiking about 24 hours ago, which suggests that this problem has been going on for that long. Additionally, I'm seeing the following message in my logs:

Request was aborted after waiting too long to attempt to service your request. This may happen sporadically when the App Engine serving cluster is under unexpectedly high or uneven load. If you see this message frequently, please contact the App Engine team.

- Klaas Pieter

timh

Dec 29, 2011, 5:48:42 PM
to google-a...@googlegroups.com
As I said earlier, you can get stuck in a bad part of the Google infrastructure. They could have some heavy processing running, or some sort of failure in a few boxes, and it doesn't show up on the system status. Your other instances aren't affected because they live in a different part of the App Engine infrastructure.

Add some debug logging at the very start of your app (before the imports of your custom code) and on through to the end of startup. You will probably see that you are not getting past some initial imports in a timely fashion, and that it has nothing to do with your code.
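
As a minimal sketch of that kind of startup logging: time each import and log the result, so the request log shows where the startup time goes. The helper name `timed_import`, the `slow_threshold_s` parameter, and the module list are hypothetical; the standard-library modules only stand in for the app's own imports. It is written for a current Python, so on the Python 2.5 runtime you would need to adapt it (for example, `importlib` and `time.perf_counter` are not available there).

```python
import importlib
import logging
import time

logging.basicConfig(level=logging.INFO)


def timed_import(module_name, slow_threshold_s=1.0):
    """Import module_name, log how long it took, and return (module, seconds).

    timed_import and slow_threshold_s are illustrative names, not an
    App Engine API.
    """
    start = time.perf_counter()
    module = importlib.import_module(module_name)
    elapsed = time.perf_counter() - start
    # Slow imports are logged as warnings so they stand out in the log viewer.
    level = logging.WARNING if elapsed >= slow_threshold_s else logging.INFO
    logging.log(level, "import %s took %.3fs", module_name, elapsed)
    return module, elapsed


# Stand-ins for the app's real imports; replace with your own module names.
STARTUP_MODULES = ["json", "csv", "decimal"]

logging.info("instance startup: begin imports")
startup_timings = {}
for name in STARTUP_MODULES:
    _, startup_timings[name] = timed_import(name)
logging.info("instance startup: imports done in %.3fs",
             sum(startup_timings.values()))
```

If the log stops after "begin imports" or shows one import taking many seconds on the bad days only, that points at the platform's file loading rather than at the application code.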

I had this sort of problem years ago ;-) In the end I had to move the instance to get out of the performance hole. I was getting hit with DEEs on startup, and if the app did start up, it was slow or DEE'd at the same time every day for about 3 hours. On other occasions the DEEs were continuous until someone at Google noticed they were having a problem.

Rgds

Tim


timh

Dec 29, 2011, 6:56:32 PM
to google-a...@googlegroups.com
I just found one of my sites is stuck in this performance hole too.

T

Rajkumar Radhakrishnan

Dec 29, 2011, 8:14:00 PM
to google-a...@googlegroups.com
Thanks for the details, T & Klaas Pieter. Sorry to hear that you are facing it for some of your apps too!

My app seems to be okay for now. Not sure when the issue will happen again though.

Google folks : 
Charge me more if you want to, but please automate this issue out.

-Raj


Rajkumar Radhakrishnan

Dec 29, 2011, 9:30:52 PM
to google-a...@googlegroups.com
My app has started serving errors again!

Earlier, the app was taking a long time to write out responses using the Django templates; now the error happens during instance loading itself. I believe this has something to do with file I/O (loading the Python source files and the Django template files).

Google folks:
Are there any known issues with file I/O?
Why is there no response to either the production issue filed more than a week ago or this email?

And thanks for not forgetting to bill me for this "error-free" hosting period!

-Raj

Rajkumar Radhakrishnan

Jan 4, 2012, 3:23:16 AM
to google-a...@googlegroups.com
After a few days of the app working fine, the errors have started occurring again, as shown in the image below.

Why is there no response from Google staff (on the reasons for the error and how you or I can prevent it), either to this email or to the production issue?

errors-per-second-last-7-days.png

If you cannot solve such issues even for paid production apps (and I am willing to pay more, too), then at least acknowledge it so that I can consider other options.

-Raj

Brandon Thomson

Jan 4, 2012, 5:36:22 AM
to google-a...@googlegroups.com
I've been getting lots of the same message as Klaas (request was aborted; contact the App Engine team) for a couple of days on a Go-runtime app. Some details are in this thread: https://groups.google.com/forum/#!topic/google-appengine-go/a0orbG_oER8

In addition to requests getting aborted, filesystem performance seems really bad. This is the same thing rrk is seeing when his app is loading.

I have started compiling all of our assets into the binary to avoid having to read from the filesystem, but there's nothing I can do about the requests getting aborted. Adding more instances doesn't help.
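
The app above is on the Go runtime. For the Python apps discussed in this thread, a rough analogue would be a build step that bundles asset files (templates, for example) into one generated Python module, so an instance opens a single file instead of many. This is only a sketch under that assumption: `bundle_assets` and the `ASSETS` dictionary are hypothetical names, and the generated module still has to be read from disk once per instance, so it reduces the number of file reads rather than removing them.

```python
import os
import tempfile


def bundle_assets(asset_dir):
    """Return Python source defining ASSETS = {relative_path: file_text}."""
    lines = ["# Generated file: do not edit by hand.", "ASSETS = {"]
    for root, _dirs, files in os.walk(asset_dir):
        for filename in sorted(files):
            path = os.path.join(root, filename)
            rel = os.path.relpath(path, asset_dir).replace(os.sep, "/")
            with open(path, "r", encoding="utf-8") as handle:
                # repr() yields a valid string literal, including escapes.
                lines.append("    %r: %r," % (rel, handle.read()))
    lines.append("}")
    return "\n".join(lines) + "\n"


# Usage: bundle a throwaway directory, then load the generated source.
with tempfile.TemporaryDirectory() as demo_dir:
    with open(os.path.join(demo_dir, "index.html"), "w", encoding="utf-8") as f:
        f.write("<h1>{{ title }}</h1>\n")
    generated_source = bundle_assets(demo_dir)

namespace = {}
# In a real build you would write generated_source to a .py file and import it.
exec(generated_source, namespace)
print(namespace["ASSETS"]["index.html"])
```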

appid: gespartia

Klaas Pieter Annema

Jan 4, 2012, 10:09:46 AM
to google-a...@googlegroups.com
The problems in my app resolved themselves. I would still like to know what caused them and whether there is a way to prevent them.

- Klaas Pieter

errors-per-second-last-7-days.png

Brandon Thomson

Jan 5, 2012, 10:20:01 PM
to google-a...@googlegroups.com
Seems like we are back to normal since 6 hours ago... did anyone else see improvements?

Hope it lasts!
ms_req.png

Rajkumar Radhakrishnan

Jan 9, 2012, 4:32:49 AM
to google-a...@googlegroups.com
The issue is recurring for my app, with App ID: sahasvat-ifreetools-creator. The errors occur when the application instances are loaded (not because of the application's datastore query latency) and, as mentioned earlier, a clone of the app (App ID: ifreetools-labs), which is also on the MS datastore, is working fine.

Errors/Second & Milliseconds/Request images from the Admin Console dashboard for the last 30 days are given below.

Errors / Second 

2012-01-09--errors-per-second-last-30-days.png


Milliseconds/Request

2012-01-09--milliseconds-per-request-last-30-days.png




Brandon Wirtz

Jan 9, 2012, 4:48:52 AM
to google-a...@googlegroups.com

What are your idle instances and latency set to? Are you saying this is on MS? If so, you can consider this “expected behavior”.

--

image001.png
image002.png

Rajkumar Radhakrishnan

Jan 9, 2012, 11:50:26 AM
to google-a...@googlegroups.com
Yes, I am on MS and am considering moving to HR. But now it seems that the DeadlineExceededError (during instance loading, which is what my app faces) happens for HR apps too.

Brandon Thomson

Jan 10, 2012, 2:11:24 AM
to google-a...@googlegroups.com
The app I had trouble with is an HR app so I can confirm it is not only M/S apps experiencing the problem.

My app is currently working great but I am watching it closely since others have seen the problems recur.

Naresh Talluri

Jan 10, 2012, 3:54:45 AM
to google-a...@googlegroups.com
I have been facing the same issue too, for the past few days. Sometimes all of the instances suddenly drop to 0 and then start up fresh; I don't know why this is happening.

Here is the screenshot. If anyone has any clue, please let me know.

Thanks,
Naresh T


Screen Shot 2012-01-07 at 1.15.32 AM.png

Klaas Pieter Annema

Jan 12, 2012, 4:57:47 PM
to google-a...@googlegroups.com
My app was running fine for a while, but has gone back to throwing very frequent DeadlineExceededErrors. Unsure if it's related but I think it started after I deployed a new version of my app. I have since reverted to the older version but it didn't resolve the error.

Karl Rosaen

Jan 13, 2012, 1:31:38 PM
to google-a...@googlegroups.com
Our app is seeing this problem too. We are on the master/slave datastore.

I think it's unacceptable to allow master/slave to perform this poorly; I hope the App Engine team will still fully support issues with master/slave. I know migrating to the HR datastore is the right thing to do, but allowing the performance of master/slave to suffer this badly is not the right way to motivate us to migrate.

There didn't use to be this many issues with master/slave, so it seems as though the App Engine team is scaling back support for it with the excuse that the better HR alternative exists. But what about those of us who have run our apps on GAE for years? Migrating is a big deal, and we'd like to at least wait for the blobstore migration tool to be ready. In the meantime, I see no reason why master/slave can't continue to perform reliably, even if it has planned downtime and other understandable drawbacks compared with HR.