Numerous internal errors & timeouts this morning, no app changes


Kyle Jensen

Sep 1, 2010, 8:15:23 AM
to Google App Engine

Hi, I'm seeing a ton of errors this morning, including datastore
timeouts, HTTP errors using the remote API, and the following error:

"Request was aborted after waiting too long to attempt to service
your request. This may happen sporadically when the App Engine
serving cluster is under unexpectedly high or uneven load. If you see
this message frequently, please contact the App Engine team."


** Is there a problem this morning? (I've got zero application
changes.)

Thanks, Kyle
appid: psgazettes

Tim Hoffman

Sep 1, 2010, 8:32:20 AM
to Google App Engine
Hi Kyle

I have had applications not starting up all day (q-tracker,
qtrack-dev).

I have had one stuck for over an hour (qtrack-dev) where I can't get
the instance started.
I am even trying to start old versions that haven't been accessed for
days to try to get around import failures.

I have logged an issue: http://code.google.com/p/googleappengine/issues/detail?id=3667

But I don't seem to be getting any traction.

It's obviously not a problem affecting all of the App Engine
infrastructure, just a cluster of applications,
as it's not showing up on the System Status.

I have run into this sort of thing before
http://groups.google.com.au/group/google-appengine/browse_thread/thread/23e988d494144242/8ff964f7ffcc0b37?q=#8ff964f7ffcc0b37

And it took me days to get Google's attention. You should see the
writeup I had to prepare before anyone would take a look.

Rgds

Tim

Kyle Jensen

Sep 1, 2010, 9:25:35 AM
to Google App Engine
Thanks for your note Tim.

Crap -- that is very concerning. I haven't read through it all, but
my impression is you're saying apps you've worked on get 'stuck' in a
state with high error rates and you have evidence that it's due to the
location of the app within the infrastructure.

** Is that right?
** Have you found a way to 'un-stick' an app? E.g. re-deployment,
major version change, fairy dust.

Sincerely,
Kyle

Tim Hoffman

Sep 1, 2010, 9:38:37 AM
to Google App Engine
When we had our major problem last time, I found an appid that wasn't
experiencing the problems and moved the whole
application there. I kept monitoring the bad instance for about 3
weeks afterwards and it was still getting hit at the same time each day.

I haven't bothered to look at it since.

On a side note, it looks like app recycling has moved out to between 4
and 6 minutes. That means if you run something like Django
and it fails during the initial imports, you get a stuck instance that
just barfs continuously and won't die unless everyone stops trying to
access it
and it gets recycled. If recycle time has gone up from around 1-2
minutes to 4-6 minutes, it's going to be a lot harder to get rid of stuck
instances.

As a matter of course I now always deploy 2 copies of the same code
(version-number-a and version-number-b), and if one version gets
stuck
we flick the default version to the alternate one.

Unfortunately this hasn't worked at all today, as the quiet version
won't start up either because of DeadlineExceeded errors.

So App Engine has been very flaky for me today (12 hours) and there is
zip I can do about it. I am starting to get some heat over this (i.e.
was it wise to choose App Engine).

Hopefully we can attract some attention from a Google person.

T

Kyle Jensen

Sep 1, 2010, 10:33:23 AM
to Google App Engine
Tim - thanks for the thoughtful response. May I ask:

- Surely re-deployment causes immediate recycling?
- How did you migrate your datastore to the new appid or access data
from the old appid?
- How are you deploying multiple versions? (Quite literally I mean,
what kinds of commands are you using on the commandline?)

Sincerely, Kyle

Tim Hoffman

Sep 1, 2010, 10:39:10 AM
to Google App Engine
Hi Kyle

On Sep 1, 10:33 pm, Kyle Jensen <kljen...@gmail.com> wrote:
> Tim - thanks for the thoughtful response.  May I ask:
>
> - Surely re-deployment causes immediate recycling?

Yes, but that takes a while.
So we deploy multiple versions of the same code base, i.e. versions
1-1-a and 1-1-b.

> - How did you migrate your datastore to the new appid or access data
> from the old appid?

Moved all the data. (In this specific application instance, all of the
data is pushed from Plone.) So we set up a new instance on a working
part of App Engine.
(Back then we could see different appids resolve via DNS to different
IPs, and we mapped out which ones were being hit with the problem.)

It took about 18 hours to move everything.

> - How are you deploying multiple versions?  (Quite literally I mean,
> what kinds of commands are you using on the commandline?)
>

We do an appcfg update, then edit app.yaml, then do another appcfg
update.
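
The update / edit app.yaml / update sequence above could be scripted
along the following lines. This is only a rough sketch, not necessarily
how it is actually done here: the helper names with_version and
deploy_both are hypothetical, and it assumes appcfg.py is on the PATH
and that app.yaml has exactly one top-level "version:" line.

```python
import os
import re
import subprocess


def with_version(app_yaml_text, version):
    """Return app.yaml text with its top-level 'version:' line replaced.

    Hypothetical helper: assumes exactly one top-level 'version:' line.
    """
    new_text, count = re.subn(
        r"^version:.*$",
        lambda match: "version: " + version,
        app_yaml_text,
        flags=re.MULTILINE,
    )
    if count != 1:
        raise ValueError("expected exactly one top-level 'version:' line")
    return new_text


def deploy_both(app_dir, base_version, suffixes=("a", "b")):
    """Upload the same code once per suffix, e.g. 1-1-a and then 1-1-b.

    Hypothetical helper: assumes 'appcfg.py' can be run from the PATH.
    """
    path = os.path.join(app_dir, "app.yaml")
    with open(path) as handle:
        original = handle.read()
    try:
        for suffix in suffixes:
            # Edit app.yaml so this upload lands under its own version name.
            with open(path, "w") as handle:
                handle.write(with_version(original, base_version + "-" + suffix))
            # Same code, different version: one 'appcfg.py update' per copy.
            subprocess.check_call(["appcfg.py", "update", app_dir])
    finally:
        # Put app.yaml back the way it was, even if an upload failed.
        with open(path, "w") as handle:
            handle.write(original)
```

Calling deploy_both(".", "1-1") would upload versions 1-1-a and 1-1-b
in turn. Switching the default version to the other copy is a separate
step and is not covered by this sketch.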

See ya

Tim