Hi Gregory,
if I'm doing something wrong here then I'd love to learn how you do it. I tried setting the max idle instances to 1 (didn't help, except increase latency), I set the pending queue to 1s and 2s and 3s (didn't help, but increased the latency). If there is a way to limit the instances reliably, I would love to hear how you do it.
We have maybe 0.5 to 1 requests per second on average (the QPS up there in the screenshot is only a snapshot for the past minute). Our average response time is around 300ms. Occasionally more, occasionally less. It's not great, but it's still a lot lower than the 1s limit for continuous usage. We do use the memcache where we can, but it's limited to some 10MB to 30MB for us at the moment, and the hit ratio is always around 65%, so we do have to access the datastore more than we want to. We're also denormalising plenty of data for faster access, but it would substantially slow us down if we had to denormalise *everything* that is displayed on a page. I don't think it should be a problem to do 2 to 3 queries in a single request, maybe 5 datastore gets plus a few memcache gets, should it? Given that we're developing a complex application (
www.small-improvements.com) and not just a number crunching app, this seems not too bad. And performance-wise we're happy, it's just the cost that's prohibitive.
Anyway, I would love to hear how you do it, and maybe a screenshot of your system (and the ratio of requests to instances) would be interesting.
Cheers,
Per