Hi Nick,
Sorry for the delay here. Been working hard these last few days to
resolve these latency issues, which are looking good now. I'd like to
help you figure out the issue here, so please bear with me since I'm
not looking at your app's source code. =)
I believe Marzia stated elsewhere that once you get into the
100ms+ range for CPU time (runtime only, not including APIs) you will
begin to see this prioritization and additional latency at the request
level. This could account for some of the variability you have seen.
Another thing to keep in mind is the active dynamic requests limit
explained here:
http://code.google.com/appengine/docs/quotas.html#Request_Limits
"An application operating entirely within the free quotas can process
around 30 active dynamic requests simultaneously. This means that an
app whose average server-side request processing time is 75
milliseconds can serve up to (1000 ms/second / 75 ms/request) * 30 =
400 requests/second, independent of the quota system, without
incurring any additional latency. Requests for static files are not
affected by this limit. Applications that are heavily CPU-bound, on
the other hand, may incur some additional latency in long-running
requests in order to make room for other apps sharing the same
servers."
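To make the arithmetic in that quote concrete, here is a minimal sketch of the same calculation. The function name and its parameters are mine, not anything from the App Engine docs; the only inputs taken from the quote are the 75 ms average and the limit of roughly 30 simultaneous requests.

```python
def max_requests_per_second(avg_request_ms, active_requests=30):
    """Upper bound on throughput when only `active_requests` can run at once.

    Each slot can finish 1000 / avg_request_ms requests per second, so the
    total is that rate multiplied by the number of slots. (Hypothetical
    helper for illustration only.)
    """
    return 1000.0 * active_requests / avg_request_ms


if __name__ == "__main__":
    print(max_requests_per_second(75))    # the docs' example: 400.0
    print(max_requests_per_second(650))   # a much slower request
```

The second call shows how quickly the ceiling drops as average request time grows, which is the point of the rest of this message.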
We maximize the throughput of these dynamic requests as much as
possible, taking advantage of App Caching
(http://code.google.com/appengine/docs/python/runtime.html#App_Caching).
This is what enables high load applications to serve at 75ms of
runtime CPU per request. If you're not using App Caching, you could
see a significant impact on your application's latency.
Another thing to think about is how API call latency affects overall
throughput. For example, if you execute a series of Datastore queries
to retrieve many entities, you may be looking at 200-400ms of latency
for a query (or more, depending on the shape of your data). Add the
runtime CPU and other wall-clock time on top of that and
you're looking at a minimum of 450-650ms of wall-clock time to execute a
single request. Some simple math gives you the maximum throughput
per instance per second: 1 request/650ms = ~1.5 requests per second
per instance. What if you're doing three of these queries per request?
Then we're looking at a throughput of less than 1 request per second
per instance.
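Here is that per-instance math as a small sketch. The 250 ms of non-API time is my inference from the numbers above (450-650ms total minus 200-400ms of query time), not a measured figure, and the function name is hypothetical.

```python
def requests_per_second_per_instance(other_ms, query_ms, queries=1):
    """Throughput of one instance that handles one request at a time.

    other_ms: wall-clock time spent outside API calls (assumed value).
    query_ms: latency of a single Datastore query.
    queries:  number of such queries run back to back in the request.
    """
    wall_ms = other_ms + query_ms * queries
    return 1000.0 / wall_ms


if __name__ == "__main__":
    # One 400 ms query plus an assumed 250 ms of other work: about 1.5 req/s.
    print(requests_per_second_per_instance(250, 400))
    # Three such queries in a row: about 0.7 req/s.
    print(requests_per_second_per_instance(250, 400, queries=3))
```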
Going back to the request limit: there are around 30 active instances available to
your application. We do our best to maximize your use of these
depending on your load and your application's throughput. But one
thing to remember is the faster your app is (in wall-clock latency),
the better we can spread its load across these 30 instances. With
requests that may take 1 second or more to complete, you may see more
variability in your app's latency and throughput. I believe that's
part of what's going on here.
The reason is that we scale your instances to match your sustained
throughput. If your average request takes 500ms and you're sustaining
10 requests per second, then you can serve this load easily using only
5 instances. However, if every so often you get a request that takes
2-5 seconds, then you're going to see some extra latency as the faster
500ms requests slow down for the big guy to go through. If bigger
requests keep coming through, things should balance back out in a
short amount of time; but if the bigger workload is relatively
infrequent this could show up as latency spikes.
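The "5 instances" figure is just request rate multiplied by average latency. A minimal sketch of that estimate follows; it is a back-of-the-envelope steady-state calculation, not a description of how the scheduler actually makes its decisions, and the 5% share of 3-second requests is an invented example.

```python
import math


def instances_needed(requests_per_second, avg_latency_ms):
    """Instances busy at steady state if each handles one request at a time."""
    return int(math.ceil(requests_per_second * avg_latency_ms / 1000.0))


if __name__ == "__main__":
    # The example above: 10 req/s at 500 ms each keeps 5 instances busy.
    print(instances_needed(10, 500))

    # Hypothetical mix: 95% of requests take 500 ms, 5% take 3000 ms.
    mixed_avg_ms = (95 * 500 + 5 * 3000) / 100.0  # 625 ms on average
    print(instances_needed(10, mixed_avg_ms))
```

Even a small share of slow requests raises the average enough to need more instances than the fast requests alone would, and until the extra capacity is there the fast requests wait behind the slow ones.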
There are a few solutions to this. One big one is caching. I assume
you're already doing that, but if you can cache anything more to reduce
latency, that will definitely help. Another is to profile your
application to find where you spend the majority of your wall-clock
latency (see
http://code.google.com/appengine/kb/commontasks.html#profiling).
Precomputing as much information as possible always helps (that's the
App Engine way). Another thing that could help a lot is using background
processing (when that feature is ready) to do the precomputing in the
background; this could isolate you from the increased latency you'll see
when more expensive requests go through the system.
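For the profiling suggestion, here is a generic sketch using the standard library's cProfile and pstats modules. It is written for a current Python 3 interpreter and is not the exact recipe from the linked page; handle_request and slow_step are placeholders for your own handler code.

```python
import cProfile
import io
import pstats


def slow_step():
    """Placeholder for the expensive part of a request."""
    return sum(i * i for i in range(200000))


def handle_request():
    """Placeholder request handler."""
    return slow_step()


def profile_call(func):
    """Run func under cProfile; return its result and a text report."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func)
    out = io.StringIO()
    stats = pstats.Stats(profiler, stream=out)
    # Sort by cumulative time so the most expensive call paths come first.
    stats.sort_stats("cumulative").print_stats(5)
    return result, out.getvalue()


if __name__ == "__main__":
    _, report = profile_call(handle_request)
    print(report)
```

The functions at the top of the report are the first candidates for caching or precomputing.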
Hopefully this information has helped a bit. Please let me know if you
have any questions. I still would very much appreciate your help in
tracking down the latency you're experiencing so we can figure out the
root cause.
Thanks,
-Brett