Several requests over the last two days have failed with:
1:1352226364.645785 Request was aborted after waiting too long to attempt to service your request.
I know this problem is reported and discussed often here - but I'm still unclear on precisely how to interpret it.
I'm trying to make sure I understand the relationship between idle instances,
pending latency, dynamic/resident instances, warmup requests and startup time.
My configuration:
Idle instances : 4 - Automatic
Pending latency: Automatic - Automatic
My app makes heavy use of deferred; at the time of the failure, several
dozen tasks had been posted to the app from the taskqueue.
1. I'm going to start by assuming that while this request came to my app
from the taskqueue (via the deferred library), this problem has nothing
to do with the taskqueue per se.
2. The 500 means that this request was waiting in the Pending Queue:
App Engine's scheduler is responsible for routing incoming
requests to be served by your app's instances. Sometimes the
volume of incoming requests exceeds the capacity of the
instances currently available to your app. When this happens,
incoming requests may have to wait in the Pending Queue until
busy instances become available, or until the scheduler starts
new instances.
3. So by that definition, there were only 3 ways out of the queue.
After the minimum pending latency, but before the max, one of these happens:
1. One of the 4 resident instances becomes idle, and gets the request.
2. One of the dynamic instances becomes idle, and gets the request.
3. Scheduler spins up a new dynamic instance.
3a. If the instance comes up in time, the request is sent there.
As the 'inaugural request' to this instance, this request
is known as a "loading request".
Your app handles the request, but it's noticeably slower.
You get the warning in the log:
"This request caused a new process to be started for your
application, and thus caused your application code to be
loaded for the first time. This request may thus take longer
and use more CPU than a typical request for your
application."
3b. If the instance does not come up in time, the request is
aborted in the Pending Queue before the app ever sees it.
You get the error in the log:
"Request was aborted after waiting too long to attempt to
service your request."
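To make sure I have the model straight, here is a toy sketch of the three exits listed above, in Python. Everything in it is my assumption: the function name, the parameters, and the idea of a single fixed abort cutoff are hypothetical, and it ignores the min/max pending latency window entirely. It is not how the real scheduler is implemented.

```python
# Toy model of my understanding of the Pending Queue.
# All names and numbers are hypothetical; this is NOT the real scheduler.

def resolve_pending_request(wait_for_idle_s, instance_startup_s, abort_after_s):
    """Return the outcome for one queued request.

    wait_for_idle_s: seconds until an existing instance (resident or
        dynamic) becomes idle, or None if none frees up (cases 1 and 2).
    instance_startup_s: seconds a newly started dynamic instance needs
        before it can take the request (case 3).
    abort_after_s: assumed cutoff after which the request is aborted.
    """
    candidates = []
    if wait_for_idle_s is not None:
        candidates.append((wait_for_idle_s, "served by existing instance"))
    candidates.append((instance_startup_s, "loading request on new instance"))

    # Whichever option is ready first wins; ties go to the existing instance.
    wait, outcome = min(candidates, key=lambda c: c[0])

    if wait > abort_after_s:
        # Case 3b: nothing was ready in time, so the app never sees the request.
        return "aborted in pending queue"
    return outcome
```

For example, `resolve_pending_request(None, 30.0, 10.0)` models case 3b: no instance frees up, the new instance needs 30 seconds, and the request is aborted at the assumed 10 second cutoff.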
The big questions I have:
1. Is my summary above accurate? Are there any other cases
where "request was aborted after waiting too long" happens?
2. How long can you sit in the pending queue before you hit case #3b,
and the request is aborted? Do I have any control over this value?
3. I don't have warmup requests configured. Would this have helped?
If so, why? The scheduler has *real* requests waiting in the
pending queue, why/when would it need to send me warmup ones?
And most importantly:
How can I tell the difference between:
my instance took too long to come up because my app isn't optimized properly (i.e., my problem)
AND
my instance took too long to come up because of something internal to GAE, entirely outside of my control
(i.e., an issue that should be reported to GAE prod)
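One thing I could do on my side to gather evidence for the first case is time my own module loading and log it, so that loading requests that do succeed show how much of the startup cost is my code. A rough sketch, assuming standard Python only; `timed_import` is a helper name I made up, and it obviously can't log anything for requests that are aborted before my app ever sees them:

```python
import importlib
import logging
import time

def timed_import(module_name):
    """Import a module by name and log how long the import took.

    Hypothetical helper: returns (module, elapsed_seconds).
    """
    start = time.time()
    module = importlib.import_module(module_name)
    elapsed = time.time() - start
    logging.info("import of %s took %.3f s", module_name, elapsed)
    return module, elapsed
```

If these timings stay small on successful loading requests while aborts keep happening, that would at least suggest the delay is not coming from my code.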
Thanks so much for any comments/pointers/responses.
-ckhan