|Tragedy of the Commons, and Cold Starts||Devel63||10/21/09 8:31 PM|
I fear we are in a destructive cycle:
- Cold starts take a long time (many seconds)
- So app developers set up auto-pings to keep their app warm
- So Google gets more aggressive in cycling out instances
- So app developers increase their ping frequency
- So even popular apps constantly hit cold starts
- So GAE becomes unusable for all
I have personally held off doing any auto-pings because I felt it was
"wrong", and bad for the common good. But cold starts have gotten
slower, and my app seems to get cycled out even more aggressively of
late (after just a few seconds of idle time).
It would be nice to ask everyone to stop auto-pinging, but there's no
way to enforce it, and history has shown that the tragedy of the
commons is hard to avoid.
I see only 2 ways out:
-- Make cold starts much faster (perhaps by pre-compiling the code?)
-- Somehow associate enough cost that auto-pingers will stop
I have no idea how to do this.
Even if you ping every 3 seconds, that's still only about 30K page
views per day.
People with no traffic could ping every single second and never pay
anything.
So, at a minimum, perhaps pre-compiling the code is a good first step.
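For what it's worth, the ping arithmetic above is easy to check (a quick sketch; the "30K" figure rounds up from 28,800):

```python
# Requests generated per day by pinging at a fixed interval.
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

def pings_per_day(interval_seconds):
    return SECONDS_PER_DAY // interval_seconds

print(pings_per_day(3))  # 28800 -- the "30K page views per day" above
print(pings_per_day(1))  # 86400 -- a no-traffic app pinging every second
```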
|Re: Tragedy of the Commons, and Cold Starts||ted stockwell||10/22/09 4:25 AM|
Another way would be for Google to charge to keep applications warm.
Amazon has a similar feature where you can pay extra to reserve EC2
instances to make sure that the instances are always available.
Keeping apps warm is quite resource intensive, so I don't see how
Google could not charge for it.
|Re: Tragedy of the Commons, and Cold Starts||bFlood||10/22/09 6:05 AM|
agreed, this is a catch-22 for low-request sites. pinging only goes so far
and seems completely wasteful (for everyone)
I'd love to know if they precompile python modules (and if not, there
must be a good reason why)
paid for warm instances - yes!
|Re: Tragedy of the Commons, and Cold Starts||Devel63||10/22/09 8:55 AM|
In a separate thread, someone from Google confirmed that they compile
all the files from scratch with each load.
Speeding up cold starts is clearly the best solution, but I don't know
how much time pre-compiling would save. Paying for warm instances may
help, but because anyone can auto-ping every second, the tragedy of
the commons will still proceed to its inevitable conclusion.
|Re: Tragedy of the Commons, and Cold Starts||ted stockwell||10/22/09 9:59 AM|
I then conclude that Google *must* change the billing model for
application instances from CPU time to elapsed running time if it is
to avoid this tragedy of the commons.
Doing so will remove the economic incentive to ping to stay warm.
|Re: Tragedy of the Commons, and Cold Starts||PK||10/22/09 11:22 AM|
you bring up a lot of good points but I was wondering what triggered
your e-mail and what makes you believe that we are actually in this
sort of "destructive cycle"?
|Re: Tragedy of the Commons, and Cold Starts||ted stockwell||10/22/09 11:42 AM|
Devel63 stated in his opening message that his/her application was
being cycled out after only a few seconds.
Thus his app was frequently having to cold-start, and thus Devel63's
users would frequently experience delays of several seconds.
All this is due to the 'arms race' between Google (that wants to shut
down applications that are not active in order to conserve server
resources and maximize server usage) and developers that want to keep
their applications 'warm' (and thus avoid the delays caused by cold
starts) even when their applications are not being used.
|Re: Tragedy of the Commons, and Cold Starts||bugaco||10/23/09 1:02 PM|
I had a somewhat weird experience with this...
I wrote an app (http://analytics.bugaco.com) that runs on App Engine.
Then I looked at the request logs to see how it was running.
The request logs suggested that I was using a lot of CPU time on hitting
the home page, but after that the CPU time significantly decreased. They
also had an annoying red flag suggesting that the servlet was using
excessive resources and that I needed to optimize it.
Testing a bit, I noticed that pinging keeps the app warm, and I had a
cron job doing the pings for a few days, while also recognizing that the
pings themselves do nothing useful.
1. If the log files didn't suggest that you are better off pinging,
people would not ping.
2. It is stupid that Google counts warming up your app toward CPU time
(leading to profiling, which leads to pinging).
3. It is very stupid that applications cannot denote 'keep this code
path warm/cache it/or something' to keep new users from giving up on
the app before they get the first response.
So, as a conclusion, I think App Engine is AWESOME. I love the SDK, the
ability to deploy and test, and all the cool things.
I don't like that it cannot serve an (entry) page in under 3-5
seconds, as I think that leaves a bad taste in users' mouths, and
consequently a bad taste in developers' mouths.
Finally, I am not sure I'll use App Engine for developing other
applications, as I'd rather go with paid hosting that provides some
level of performance on serving pages. I think Google would win a lot
of goodwill if they at least provided quick serving of static content.
One may wonder how to do that, and given that they have all those yaml
files, there could be a yaml file that specifies a warm static resource.
This would decrease the need to ping your app, since it would allow the
user to hit the entry page while Google pre-caches the app much more
easily.
|Re: Tragedy of the Commons, and Cold Starts||Gijsbert||10/24/09 8:31 AM|
Does anybody know if the start times of Java apps are significantly
better (since they are compiled)?
|Re: Tragedy of the Commons, and Cold Starts||john||10/24/09 10:40 AM|
Is the current thinking that the biggest startup delay is due to
module imports for Django? My app has 3 distinct parts, only 1 of
which uses Django or any templating. Right now I use a single main()
function for all 3 parts, but would the other 2 parts have better cold-
start times if I partitioned them into a separate handler script that
didn't import any Django stuff?
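If it helps frame the question: the partitioning being described would look roughly like this in app.yaml (the script names here are hypothetical, not from my actual app):

```yaml
# Hypothetical routing: only /reports pays Django's import cost on cold start.
handlers:
- url: /reports/.*
  script: django_part.py    # imports Django and templating
- url: /.*
  script: lightweight.py    # plain webapp, no Django imports
```

Each script has its own main(), so a cold start of the lightweight part would presumably skip the Django imports entirely.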
|Re: [google-appengine] Re: Tragedy of the Commons, and Cold Starts||OvermindDL1||10/24/09 3:53 PM|
On Sat, Oct 24, 2009 at 11:40 AM, johntray <john....@gmail.com> wrote:
Isn't there a python program that can take a python library and
|Re: Tragedy of the Commons, and Cold Starts||ted stockwell||10/24/09 4:34 PM|
The Java apps also take several seconds to start.
Frankly, I consider that WAY FAST.
Java server-side APIs are very heavyweight and definitely designed to
be started once and remain running.
|Re: Tragedy of the Commons, and Cold Starts||nickmilon||10/25/09 6:41 AM|
John, it is certainly worth trying. My experience tells me that
another handler usually means a new instance, but then again the part
using Django will be cold-started in a brand-new instance when needed,
even though an instance is already running. So it all depends on your
usage pattern -- which part is running more often.
One approach is to initiate a cold start of the heavy instance
through a JS ping once you feel your user is probably going to
request the heavy Django-driven page,
i.e. a timeout function in your landing page. This is a greener and
more economical solution than pinging at constant intervals.
For some info on instance lifetime you can take a look at
|Re: Tragedy of the Commons, and Cold Starts||Robin B||10/30/09 10:52 AM|
I heard that Google will soon speed up Java boot times by preverifying
code on upload instead of at boot time, but the cold boot problem is
still a problem.
Until the cold boot problem is addressed on appengine, by allowing
people to buy/keep warm handlers, you have to resort to hacks.
The task queue can be used to hit a handler every 10 seconds to keep
an app warm.
> For some info on instance lifetime you can take a look at
> http://gaengine.blogspot.com/2009/09/server-instance-life-time-part-i...
|Re: Tragedy of the Commons, and Cold Starts||Nash-t||11/1/09 9:24 AM|
1. For non-logged in users:
Google Sites pages load quickly, so integration of google sites
with app engine may provide a way for us to serve static files
quickly. for example: have an app.yaml directory entry that points to
a google sites page. The app engine web servers immediately redirect
to the static page while warming up the app engine page.
2. For logged-in users:
If the sign-in process (google accounts) could send a signal to app
engine, and have app engine pre-warm an application the user wouldn't
even notice that the app engine application was "cold".
I love the concept behind app engine and don't want to resort to hacks.
|Re: Tragedy of the Commons, and Cold Starts||Adligo||11/3/09 11:21 AM|
I think paying to keep the instances warm is a great idea!
|Re: [google-appengine] Re: Tragedy of the Commons, and Cold Starts||Niklas Rosencrantz||11/3/09 11:42 AM|
If any instance can, then static can. Or please provide a counterexample.
|Re: Tragedy of the Commons, and Cold Starts||Toby Reyelts||12/4/09 3:21 PM|
Sorry to resurrect this old thread, but can you clarify what you mean
here? We just released precompilation for Java as of 1.2.8, and we've
seen it significantly reduce cold boot times. (Please try it for
yourself!) Between us optimizing the runtime and you optimizing your
code, I hope you should be able to reach acceptable times on loading
requests.
We're discouraging people from using "pinging" techniques to keep
their VMs warm, because it increases the number of loading requests
for all of the low traffic applications on App Engine. It would be a
shame if we had to change scheduling behavior to enforce that policy.
|Re: Tragedy of the Commons, and Cold Starts||Toby Reyelts||12/4/09 3:23 PM|
There's an enhancement request (http://code.google.com/p/
googleappengine/issues/detail?id=2456) open for this for Java, though
it probably applies equally well to Python. Go voice your opinion.
|Re: Tragedy of the Commons, and Cold Starts||Toby Reyelts||12/4/09 3:38 PM|
> I had a bit weird experience with this...
I'm not sure what you mean here, but we have plans to change the admin
console to explicitly call out loading requests, so you can take that
into account when profiling your application. Until that becomes
available, it's pretty easy for you to detect and log loading requests
yourself.
A couple of things:
1) CPU time doesn't grow on trees, it comes out of your free or paid
quota. Why should we hide this from you?
2) The number of loading requests your application receives is
inversely proportional to its traffic. If you get more traffic, you'll
receive fewer loading requests. This means it usually doesn't pay to
optimize loading requests, unless you're just trying to reduce user-
perceived latency.
Unfortunately, it takes an inordinate amount of physical hardware to
keep on the order of millions of applications in memory, which is
somewhat counter to free. If our startup optimizations plus your own
optimizations don't satisfy you, then maybe you can voice your opinion
on paying for a warm VM (http://code.google.com/p/googleappengine/
Google App Engine already serves static resources without intervening
requests to application VMs. This means that, for example, you could
serve a page that was entirely static content, with a small amount of
JS to ping your VM with an asynchronous dynamic request to wake it up.
That page would be served instantly to the user. You need to ensure
though, that the resources are indeed specified as static content in
your app.yaml or appengine-web.xml.
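As a reference point, declaring content static is just a matter of app.yaml entries like these (the paths are illustrative, not from any particular app):

```yaml
# Files under static/ are served by the frontends and never touch your VM.
handlers:
- url: /
  static_files: static/index.html
  upload: static/index.html
- url: /static
  static_dir: static
```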
|Re: Tragedy of the Commons, and Cold Starts||marksea||12/4/09 5:31 PM|
It would also be a shame if everyone stopped using GAE because it's
not really possible to get an app to work any other way.
|Re: Tragedy of the Commons, and Cold Starts||Devel63||12/6/09 4:49 PM|
Toby, you write that it doesn't usually pay to optimize loading
requests.
I agree with this whole-heartedly when you have your own server, and
only load once per day or month. It's probably true using GAE when
you have 100K+ page views per day.
But for lower-volume web sites, GAE performance is atrocious. In my
personal case, we have optimized in all sorts of ways (js
minification, liberal use of memcache, image sprites, sticking with
Django 0.96, etc.) ... but the typical user experience is quite poor.
It takes 3-10 seconds for the first page to load, and then often the
instance is swapped out while the user reads the current page, so that
the next request experiences the same thing. If the app is warm,
performance is fine.
Maybe this gets appreciably better as traffic grows, but of course,
I can't see that at present. I love GAE in theory, but it's getting
harder to ignore the reality of low-volume performance.
|Re: [google-appengine] Re: Tragedy of the Commons, and Cold Starts||Toby Reyelts||12/7/09 9:00 AM|
Thanks for your input Mark.
The applications which suffer most from this problem are those with very low-traffic, heavy initialization, and heavy dependencies. But we care deeply about the performance of all applications on GAE. This is why we're working very hard on performance improvements to the runtime which will make all applications load and run faster. There are two pieces of good news for you:
1) There are more performance improvements in the pipeline, like precompilation. We've seen this provide a 30% improvement to startup for many applications. See the release notes for more details.
2) The majority of execution time spent in startup is in application code, giving you the capability to control it.
|Re: Tragedy of the Commons, and Cold Starts||samwyse||12/7/09 12:10 PM|
On Dec 7, 11:00 am, Toby Reyelts <to...@google.com> wrote:
Are Python GAE apps doing the usual Python optimizations? I've been
assuming that if I import codecs, for instance, that I'm loading pre-
compiled byte-code from Lib/codecs.pyc. But if I import my own
modules, it sounds like their byte-code doesn't get saved anywhere.
Would it be possible to upload .pyc files? My application (a Google
Wave bot) is probably never going to be as high-traffic as most of the
people here, so I'd like to optimize the initialization and
dependencies as much as possible.
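For anyone curious what that bytecode step looks like locally, here's a stdlib-only sketch of what "uploading .pyc files" would amount to -- nothing App Engine-specific, just CPython's own compile step:

```python
import os
import py_compile
import tempfile

# Write a tiny module, then compile it to bytecode the way CPython
# does implicitly on first import.
src = os.path.join(tempfile.mkdtemp(), "mymod.py")
with open(src, "w") as f:
    f.write("ANSWER = 42\n")

pyc = src + "c"               # e.g. mymod.pyc, the classic layout
py_compile.compile(src, pyc)  # emits the compiled bytecode file
print(os.path.exists(pyc))    # True -- this is the file GAE won't accept
```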
|Re: [google-appengine] Re: Tragedy of the Commons, and Cold Starts||Toby Reyelts||12/9/09 9:26 AM|
On Sun, Dec 6, 2009 at 7:49 PM, Devel63 <danst...@gmail.com> wrote:
I think there's a misunderstanding here. What I said was that it's not worth optimizing loading requests in regards to quota. Latency is a separate concern.
If your VM is timing out while a user is actively visiting the site, then your site is extremely low traffic. VM timeouts are measured on the order of minutes, not seconds. So, for example, that means that you didn't receive any traffic to your VM at all for several minutes between the time the user fetched the first and second pages.
Yes, as stated above, VMs are not aggressively collected. In the normal case, if you have an active user of your website, you shouldn't see a cold-start per request. Maybe in your particular case you can asynchronously ping your backend (for example, with an AJAX request) a few seconds before they continue onto the next page?
As stated above, I think you're falling into a particularly bad extreme (continuous cold requests for an "active" user). This might require some creativity (for example, as above) to work around.
In terms of speeding up the loading request itself, the good news is that the bulk of that time is directly under your control. As an existence proof of this, you should be able to write a "Hello World" python app that responds from a cold start on the order of 100ms. This means you might try doing things like paring down the dependencies that you load on cold requests. You can also take advantage of the fact that requests for static content bypass your VM and are never "cold". So, for example, you can serve a page that is comprised mostly of static content almost instantly, and let it make AJAX requests to asynchronously fill in its dynamic content as your VM warms up.
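To make that existence proof concrete, here's roughly what such a minimal app looks like -- sketched as plain WSGI rather than the webapp framework, so it's self-contained:

```python
def application(environ, start_response):
    # Minimal WSGI app: no framework imports, nothing to warm up.
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"Hello, World!"]

# Exercise it without a server, the way a WSGI container would.
def call(app, path="/"):
    status = {}
    def start_response(s, headers):
        status["line"] = s
    body = b"".join(app({"PATH_INFO": path}, start_response))
    return status["line"], body

print(call(application))  # ('200 OK', b'Hello, World!')
```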
If you'd rather just pay to have us maintain a warm VM for you, you can vote on that issue.
|Re: [google-appengine] Re: Tragedy of the Commons, and Cold Starts||Toby Reyelts||12/9/09 9:34 AM|
There's some discussion about pyc support in issue 1695. The summary is:
a) no we don't use compiled byte-code
b) you probably won't ever be able to upload compiled bytecode
c) it doesn't seem unreasonable that we could safely compile your bytecode for you (look at Java's precompilation in 1.2.8 as an example)
d) but we haven't put that on the official roadmap yet
Startup time is important to us, and we are working on it.
|Re: Tragedy of the Commons, and Cold Starts||Devel63||12/9/09 11:11 AM|
In the past, it was several minutes before an active instance would be
swapped out. Of late, I have seen it repeatedly/regularly happening
within several seconds.
I've avoided (so far) the auto-ping approach; your idea to auto-ping
only when a user is on a page is intriguing. Still wasteful and
"wrong", but perhaps necessary.
Yes, I have voted for the paid warm instance :-)
Finally, I don't understand how I can significantly reduce my warm up
time. I suppose I could split each "page" into a separate app.yaml
handler (already done for admin versus user tasks), but then the user
would even more certainly run into startup issues when navigating
within the site. Besides, most of the time is spent importing Django
and system stuff I can't control.
The only reason I'm using Django (0.96) is for translations ... is
there a built-in way to handle translations via webapp?
|Re: Tragedy of the Commons, and Cold Starts||bFlood||12/10/09 5:19 AM|
you said: "VM timeouts are measured on the order of minutes, not
seconds" - I have not seen this in practice since over a year ago when
GAE was still young. currently, every site I've measured is collected
in seconds (10, maybe 20)
also: "to write a "Hello World" python app that responds from a cold
start on the order of 100ms" - again, I have not seen this in practice
for quite some time. the simplest of python sites, with no imports and
very little code seem to start in the 500ms to 1s range (and
sometimes, much longer). Please note Nick's post here, where he
changed his original cold-start metrics: http://bit.ly/6Fsoxv
I'm now under the impression that slow VM startups are a GAE issue and
while user imports are critical to keeping startups reasonable once
they happen, there is a lot of overhead that is completely out of our
control. The only way I've found to keep low-traffic sites bearable is
to use polling from the task queue, so IMO, the title of this post is
apt.
|Re: Tragedy of the Commons, and Cold Starts||G||12/10/09 7:13 AM|
A little data point action...
Very low traffic site (just me, occasionally experimenting with
AppEngine/Python). Simple google.appengine.ext.webapp.template usage.
Cold start: 500ms
Cold CPU: 400ms
Warm start: 10-30ms
Warm CPU: 0-20ms
Swap gap: 1.5-2.0 minutes
While 500ms isn't huge, it can be a concern (for those who
want to show more than a blank page or waiting icon during the '50ms
first impression window').
I agree that the static+AJAX approach is a fast architecture
(coughstilllackingfibertothecurbcough), durable, and fits well with
0.1% datastore misses.
Lazy imports might help, for some use cases.
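The lazy-import idea is just deferring the expensive import into the first request that needs it. A sketch (the heavy module here is stood in by json; the names are made up):

```python
_heavy = None

def get_heavy():
    # Import the expensive dependency on first use, not at cold start.
    global _heavy
    if _heavy is None:
        import json as _mod  # stand-in for a genuinely heavy module
        _heavy = _mod
    return _heavy

def make_report(data):
    return get_heavy().dumps(data)

print(make_report({"ok": True}))  # {"ok": true}
```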
|Re: Tragedy of the Commons, and Cold Starts||gops||12/10/09 9:26 AM|
when we use python really carefully (like lazy loading almost every
single thing) -- the cold start is not an issue... the problem with
app engine is the lack of a de-facto standard on what gives the best
performance, and of solutions to many small problems -- like how to
design a heavily lazy-loaded app, what is the best way to partition
data, unique properties, timeouts and retries, cPickle vs marshal vs
protobuf (give us something fast enough please...), subdomains, full
text search, mapreduce and cursors, how to create better counters --
this should be built in -- a counter library (start this issue:
most of these things we can do on our own -- and it is reliable --
and we can even defer it -- but i would still like google to provide
solutions to such common problems rather than having to rely on some
third-party library or coding our own workarounds....
|Re: [google-appengine] Re: Tragedy of the Commons, and Cold Starts||Toby Reyelts||12/11/09 8:48 AM|
On Wed, Dec 9, 2009 at 2:11 PM, Devel63 <danst...@gmail.com> wrote:
> Toby,
This is something we monitor fairly closely. The average lifetime of an idle VM varies as the load across App Engine varies, but they do not timeout "within several seconds". One thing you might be seeing is that a burst of several requests can cause more than one VM to be loaded simultaneously.
Sending a single ping a few seconds ahead of time for a user is very low waste.
You can control whether or not you use Django. I also don't understand how splitting your app into separate handlers would cause more startup issues for you.
Sorry, I'm not that familiar with what's available for Python, but my understanding is that Django is aggressive about up front initialization.
|Re: [google-appengine] Re: Tragedy of the Commons, and Cold Starts||Toby Reyelts||12/11/09 8:57 AM|
On Thu, Dec 10, 2009 at 8:19 AM, bFlood <bflo...@gmail.com> wrote:
> hi toby
Do you have the appids of specific sites that you believe are timing out every 10 seconds? This is not what we're seeing. Are you sure you aren't seeing loading requests to several different VMs in parallel?
Nick's example is not "Hello World". He's cold-starting his blogging app in less than 500ms. (I personally believe that is quite acceptably responsive).
What I'm driving at is that there's a minimum startup time which you can not control as a developer, and that startup time is very low for python (on the order of 100ms). Everything else is under your control: What dependencies you have, what web framework you use, whether you push static content and use ajax requests, etc...
|Re: Tragedy of the Commons, and Cold Starts||Devel63||12/11/09 6:16 PM|
Here's a site that gets almost no traffic (not yet publicized):
Try it Monday morning at 9am PST and see how quickly it times out.
I've been seeing it apparently time out in a few seconds during the
workday. I suppose it's possible that what I'm seeing is Google
firing up another instance, but that sounds unlikely given that
there's no one else on the site most of the time.
On a Friday evening (now), it's staying warm for quite some time.
On Dec 11, 8:57 am, Toby Reyelts <to...@google.com> wrote:
|Re: Tragedy of the Commons, and Cold Starts||Stephen||12/12/09 4:49 AM|
On Dec 11, 4:57 pm, Toby Reyelts <to...@google.com> wrote:
> On Thu, Dec 10, 2009 at 8:19 AM, bFlood <bflood...@gmail.com> wrote:
Actually, it pretty much is "hello world".
The whole point of Nick's Bloggart is that content is pre-rendered so
that it can be served 'static'. There is a catch-all regexp in
app.yaml that runs static.py. This script looks up the URL in
memcache, or does a db.get if uncached. It returns the bytes of the
pre-rendered content which is stored there.
Toby, how would you improve the cold start performance of this app
from 500ms to 100ms?
|Re: Tragedy of the Commons, and Cold Starts||G||12/12/09 9:27 AM|
I've become curious about the criteria for deployment of new
instances, and if site code should try to adapt, because of the cold
start overhead for every warm+new that occurs (as traffic ramps up).
Having client side code guess when a 'prime the server' ping would be
appropriate could lead to unnecessary cold starts (when at the edge of
new instance deployment criteria).
|Re: [google-appengine] Re: Tragedy of the Commons, and Cold Starts||Nick Johnson (Google)||12/14/09 9:32 AM|
Actually, Bloggart still has one or two unresolved transitive dependencies. static.py imports 'utils', which imports django 0.91 (for templates). Splitting out the bits of the utils module that Bloggart depends on from the other bits (which use Django) would likely improve startup time substantially. This has been on my TODO for a while now.
Nick Johnson, Developer Programs Engineer, App Engine
Google Ireland Ltd. :: Registered in Dublin, Ireland, Registration Number: 368047
|Re: [google-appengine] Re: Tragedy of the Commons, and Cold Starts||Toby Reyelts||12/14/09 9:33 AM|
I took a few measurements of your app around 9AM PST today (Monday). Your VMs were timing out at between 90 seconds and two minutes. As you've noticed, the number of loading requests increases as the amount of traffic on App Engine increases. This is why we discourage active pinging.
|Re: Tragedy of the Commons, and Cold Starts||Devel63||12/14/09 12:12 PM|
Thanks for taking a look. I agree, the site seems to be lasting
longer at the moment ... that's a good thing. Startup time is worse
than usual (10+ seconds rather than 3 seconds), which isn't good.
I know you guys don't have many resources to actively improve GAE, and
I appreciate that you're doing what you can. I look forward to
whatever solution you can come up with to improve the situation for
everyone: more hardware available, pre-compile Python, pay for warm
instance, somehow detect and disable pinging to free up resources for
all.