
Updated App Engine Pricing FAQ!


Greg D'Alesandre Jun 23, 2011 11:49 PM
Posted in group: Google App Engine
Hello All! Well, it took longer than expected, but here is the updated FAQ! I highlighted the new sections: they cover how Always On will work, a full explanation of Datastore API prices, a description of the new scheduler knobs, and a description of what is needed to prepare for Python 2.7 and concurrent requests. I hope this helps clarify some of the bigger questions people had, and as always, please let me know if you have additional questions. Thanks!

Greg D'Alesandre
Senior Product Manager, Google App Engine

------

Post-Preview Pricing FAQ

When Google App Engine leaves Preview in the second half of 2011, the pricing will change.  Details are listed here: http://www.google.com/enterprise/appengine/appengine_pricing.html.  This FAQ is intended to help answer some of the frequently asked questions about the new model.

Definitions

Instance: A small virtual environment to run your code with a reserved amount of CPU and Memory.
Frontend Instance: An Instance running your code and scaling dynamically based on the incoming requests but limited in how long a request can run.
Backend Instance: An Instance running your code with limited scaling based on your settings and potentially starting and stopping based on your actions.
Pending Request Queue: Where new requests wait when there are no instances available to serve them.
Pending Latency: The amount of time a request has been waiting in the Pending Request Queue.
Scheduler: Part of the App Engine infrastructure that determines which Instance should serve a request including whether or not a new Instance is needed.

Serving Infrastructure

Q: What’s an Instance?
A: When App Engine starts running your code, it creates a small virtual environment to run your code with a reserved amount of CPU and Memory. For example, if you are running a Java app, we will start a new JVM for you and load your code into it.

Q: Is an App Engine Instance similar to a VM from infrastructure providers?
A: Yes and no. They both have a set amount of CPU and Memory allocated to them, but GAE instances don’t have the overhead of operating systems or other applications running, so a much larger percentage of the CPU and memory is considered “usable.” They also operate against high-level APIs rather than down through layers of code to virtual device drivers, which makes them more efficient and allows all the services to be fully managed.

Q: How does GAE determine the number of Frontend Instances to run?  
A: For each new request, the Scheduler decides whether there is an available Instance for the request, the request should wait, or a new Instance should be created to service the request.  It looks at the number of Instances, the throughput of the Instances, and the number of requests waiting.  Based on that it predicts how long it will take before it can serve the request (aka the Predicted Pending Latency).  If the Predicted Pending Latency looks too long, a new instance may be created.  If it looks like an Instance is no longer needed, it will take that Instance down.  
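The decision described above can be sketched in a few lines of Python. This is a hypothetical model, not App Engine’s actual Scheduler: the `SchedulerState` fields, the `decide` function, and the simple “queue length divided by total throughput” estimate of Predicted Pending Latency are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class SchedulerState:
    idle_instances: int         # Instances free to take a request right now
    running_instances: int      # all Instances currently up
    requests_per_second: float  # observed throughput of a single Instance
    queued_requests: int        # requests already in the Pending Request Queue


def predicted_pending_latency(state):
    """Rough seconds until a newly arrived request would be served (hypothetical)."""
    total_throughput = state.running_instances * state.requests_per_second
    if total_throughput == 0:
        # Nothing is running, so the request would wait indefinitely.
        return float("inf")
    # The new request is served after everything already queued ahead of it.
    return (state.queued_requests + 1) / total_throughput


def decide(state, max_acceptable_latency):
    """Return 'serve', 'wait', or 'new_instance' for an incoming request."""
    if state.idle_instances > 0:
        return "serve"
    if predicted_pending_latency(state) > max_acceptable_latency:
        return "new_instance"
    return "wait"
```

With 2 running Instances at 10 requests/second each and 3 requests queued, the sketch predicts (3 + 1) / 20 = 0.2 seconds, so the request waits if 0.5 seconds is acceptable and triggers a new Instance if only 0.1 seconds is.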

Q: Should I assume I will be charged for the number of Instances currently being shown in the Admin console?
A: No, we are working to change the Scheduler to optimize the utilization of instances, so that number should go down somewhat.  If you are using Java, you can also make your app threadsafe and take advantage of handling concurrent requests.  You can look at the current number of running Instances as an upper bound on how many Instances you will be charged for.

Q: How can I control the number of instances running?
A: The Scheduler determines how many instances should run for your application.  With the new Scheduler you’ll have the ability to choose a set of parameters that will help you specify how many instances are spun up to serve your traffic.  More information about the specific parameters can be found below under “What adjustments will be available for the new scheduler?”

Q: What can I control in terms of how many requests an Instance can handle?
A: The single largest factor is your application’s latency in handling the request. If you service requests quickly, a single instance can handle a lot of requests. Also, Java apps support concurrent requests, so a Java Instance can handle additional requests while waiting for other requests to complete. This can significantly lower the number of Instances your app requires.

Q: Will there be a solution for Python concurrency?  Will this require any code changes?
A: Python concurrency will be handled by our release of Python 2.7 on App Engine. We’ve heard a lot of feedback from Python users who are worried that the new pricing gives them an incentive to move to Java because of its support for concurrent requests, so we’ve made a change to the new pricing to account for that. Python 2.7 support is in progress but not yet done, so until Python 2.7 is released we will provide a half-sized instance for Python (at half the price). See “What code changes will I need to make in order to use Python 2.7?” below for more information.

Q: How many requests can an average instance handle?
A: Single-threaded Instances (Python or Java) can currently handle 1 concurrent request. Therefore there is a direct relationship between the latency and the number of requests per second the instance can handle, for example: 10ms latency = 100 requests/second/Instance, 100ms latency = 10 requests/second/Instance, etc. Multi-threaded Instances can handle many concurrent requests, so there is a direct relationship between the CPU consumed and the number of requests/second. For example, for a B4 backend instance (approx. 2.4GHz): 10 Mcycles/request = 240 requests/second/Instance, 100 Mcycles/request = 24 requests/second/Instance, etc. These numbers are the ideal case, but they are close to what you should be able to achieve on an Instance. Multi-threaded Instances are currently only supported for Java; we are planning support for Python later this year.
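The figures above follow from two simple formulas, sketched here in Python. The function names are hypothetical; 2.4GHz is treated as 2400 Mcycles/second, and both functions give the ideal upper bound described in the answer rather than a measured rate.

```python
def single_threaded_rps(latency_ms):
    """Requests/second for an Instance that serves one request at a time."""
    # Each request occupies the Instance for its full latency.
    return 1000.0 / latency_ms


def multi_threaded_rps(clock_mcycles_per_second, mcycles_per_request):
    """Requests/second when concurrency makes CPU the limiting resource."""
    return float(clock_mcycles_per_second) / mcycles_per_request


B4_CLOCK_MCYCLES = 2400  # approx. 2.4GHz, as quoted for a B4 backend

print(single_threaded_rps(10))                   # 100.0
print(single_threaded_rps(100))                  # 10.0
print(multi_threaded_rps(B4_CLOCK_MCYCLES, 10))  # 240.0
print(multi_threaded_rps(B4_CLOCK_MCYCLES, 100)) # 24.0
```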

Q: Why is Google charging for instances rather than CPU as in the old model?  Were customers asking for this?
A: CPU time accounts for only a portion of the resources used by App Engine. When App Engine runs your code, it creates an Instance, which is a maximum amount of CPU and Memory that can be used for running a set of your code. Even if the CPU is idle while waiting for responses, the instance is still resident and considered “in use,” so it still costs Google money. Under the current model, apps that have high latency (in other words, apps that stay resident for long periods of time without doing anything) are not able to scale because it would be cost-prohibitive to Google. This change is designed to allow developers to run any sort of application they would like while paying for all of the resources being used.

Q: What does this mean for existing customers?
A: Many customers have optimized for low CPU usage to keep bills low, but in turn are often using a large amount of memory (by having high latency applications).  This new model will encourage low latency applications even if it means using larger amounts of CPU.

Q: How will Always On work under the new model?
A: When App Engine leaves Preview, all Paid Apps and Apps in Premier Accounts will be able to set the number of idle instances they would like to have running. Always On was designed to allow an app to always have idle instances running to avoid instance start-up latency. For many Apps a single idle instance should be enough (especially when using concurrent requests). This means that for many customers, setting an App to be paid will mean a $9/month minimum spend; you can then use the 24 free Instance Hours (IH) per day to keep an instance running all the time by setting Min Idle Instances to 1.
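The arithmetic behind “one idle instance is covered by the free quota” can be checked with a short Python sketch. `billable_always_on_hours` is a hypothetical helper: it counts only the hours used by idle Instances kept up via Min Idle Instances, and it ignores the Instance Hours spent serving traffic, which would presumably draw on the same quota.

```python
HOURS_PER_DAY = 24
FREE_INSTANCE_HOURS_PER_DAY = 24  # the free daily quota quoted in the FAQ


def billable_always_on_hours(min_idle_instances):
    """Instance Hours per day beyond the free quota, from idle Instances alone."""
    # Keeping N idle Instances up around the clock uses N * 24 Instance Hours.
    used = min_idle_instances * HOURS_PER_DAY
    # Assumption: the free quota is applied to these idle hours first.
    return max(0, used - FREE_INSTANCE_HOURS_PER_DAY)
```

Under these assumptions, `billable_always_on_hours(1)` is 0 (the single always-on instance fits in the free quota), while `billable_always_on_hours(2)` is 24 billable hours per day.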

Q: What adjustments will be available for the new scheduler?
A: There will be 4 “knobs” provided in the new scheduler which will allow for adjustment of performance vs. cost:
- Min Idle Instances: This determines how many idle instances will be left running all the time so that instances are ready to go when traffic requires them. NOTE: This option is only available to Paid Apps and Apps in Premier Accounts.
- Max Idle Instances: This determines the maximum number of idle instances the scheduler will leave running to be ready for traffic. Lowering this value can save money, but if traffic is spiky it could mean repeated start-up times and costs.
- Min Pending Latency: This is the minimum time that a request will wait in the Pending Request Queue before the Scheduler will start a new instance. Requests waiting less than this long will not cause a new instance to be spun up.
- Max Pending Latency: This determines the longest time a request can wait in the Pending Request Queue without an instance available to serve it. If any requests have waited this long, an Instance will immediately be spun up to serve them.
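The four knobs can be read as thresholds in a decision procedure. The Python sketch below is a hypothetical model of that reading, not App Engine’s Scheduler: `SchedulerKnobs`, `should_start_instance`, and `idle_adjustment` are illustrative names, and the rule used between the two pending-latency bounds (start an Instance only if the predicted wait reaches the maximum anyway) is an assumption, since the FAQ leaves that range to the Scheduler’s judgment.

```python
from dataclasses import dataclass


@dataclass
class SchedulerKnobs:
    min_idle_instances: int
    max_idle_instances: int
    min_pending_latency_ms: float
    max_pending_latency_ms: float


def should_start_instance(knobs, oldest_wait_ms, predicted_wait_ms):
    """Decide whether a queued request should cause a new Instance to start."""
    if oldest_wait_ms >= knobs.max_pending_latency_ms:
        # A request has waited the maximum: spin up an Instance immediately.
        return True
    if oldest_wait_ms < knobs.min_pending_latency_ms:
        # Not waited long enough to justify a new Instance.
        return False
    # Between the bounds (hypothetical rule): start one only if the request
    # is predicted to end up waiting past the maximum anyway.
    return predicted_wait_ms >= knobs.max_pending_latency_ms


def idle_adjustment(knobs, idle_now):
    """Positive: idle Instances to start; negative: idle Instances to stop."""
    if idle_now < knobs.min_idle_instances:
        return knobs.min_idle_instances - idle_now
    if idle_now > knobs.max_idle_instances:
        return knobs.max_idle_instances - idle_now
    return 0
```

For example, with Min/Max Idle Instances of 1 and 3, the sketch starts one idle Instance when none are up and stops two when five are idle.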

Q: How will the scheduler knobs affect billing and my costs?
A: The individual knobs will affect your application as follows:
- Min Idle Instances: Increasing this will increase your bill by keeping a certain minimum number of idle Instances always running.
- Max Idle Instances: Decreasing this will likely decrease your bill, as fewer idle instances will typically be running, and we will not charge for idle Instances in excess of this number. The knob is a suggestion to the Scheduler, but if the Scheduler ignores it, you will not be charged for the excess. For example, if you set Max Idle Instances to 5 and the Scheduler leaves 16 idle Instances up for some length of time, you will only be charged for 5 idle Instances.
- Min Pending Latency: Decreasing this will likely increase your bill as the Scheduler will spin up Instances to handle traffic more aggressively.
- Max Pending Latency: Increasing this will likely decrease your bill as the Scheduler will try to use the running instances more often before spinning up new ones.
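The Max Idle Instances billing cap described above amounts to taking a minimum. The Python sketch below is a hypothetical helper that assumes the number of idle Instances has been sampled once per hour; real billing granularity may differ.

```python
def billable_idle_instance_hours(idle_instances_per_hour, max_idle_instances):
    """Sum idle Instance Hours, capping each hour at Max Idle Instances."""
    # Idle Instances beyond the cap are not charged, even if the Scheduler
    # chose to keep them running.
    return sum(min(idle, max_idle_instances) for idle in idle_instances_per_hour)
```

With Max Idle Instances set to 5, an hour in which the Scheduler keeps 16 idle Instances up is billed as 5 idle Instance Hours, matching the example in the answer.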

Q: What is the difference between On-demand Instances and Reserved Instances?
A: On-demand Instances have no pre-commitment in terms of the number that will be used. You pay for them as you use them. Reserved Instances are a pre-commitment to a certain number of Instance Hours in a week. They are cheaper, but you must pay for all the Instance Hours that you have pre-committed to, whether you use them or not. This does not mean they have to be running the whole time.

Q: Wait, so Reserved instances don’t mean you have to keep them running the whole time?
A: No, it is just a way to get cheaper instance-hours by pre-committing to them.
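A rough way to compare the two options is to compute one week’s cost under each. In this Python sketch, `weekly_instance_cost` is a hypothetical helper; the $0.08/hour on-demand rate is the figure used elsewhere in this FAQ, the reserved rate is left as a parameter because the FAQ does not state it, and billing hours beyond the commitment at the on-demand rate is an assumption.

```python
ON_DEMAND_RATE = 0.08  # $ per Instance Hour, the on-demand figure used in this FAQ


def weekly_instance_cost(used_hours, reserved_hours, reserved_rate,
                         on_demand_rate=ON_DEMAND_RATE):
    """Cost of one week's Instance Hours with an optional reservation."""
    # Every reserved hour is paid for whether it is used or not; unused
    # reserved hours do not carry over to the next week.
    reserved_cost = reserved_hours * reserved_rate
    # Assumption: hours beyond the commitment are billed at the on-demand rate.
    overage_hours = max(0.0, used_hours - reserved_hours)
    return reserved_cost + overage_hours * on_demand_rate
```

Calling it with `reserved_hours=0` gives the pure on-demand cost, so the two results can be compared directly for a given usage level and reserved rate.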

Q: What is the time granularity of the instance pricing? i.e., if I have an instance up for 5 minutes, am I charged $0.08 / 60 * 5?
A: Instances are charged for their uptime plus a 15-minute startup fee, which covers what it takes for App Engine to bring the instance up and down. So if you have an on-demand Instance serving traffic for only 5 minutes, you will pay for 5+15 minutes, or $0.08 / 60 * 20 = about 2.7 cents. Additionally, if the instance stops and then starts again within a 15-minute window, the startup fee will only be charged once, and the instance will be considered “up” for the time that passed. For example, if an on-demand instance serves traffic for 5 minutes, is then down for 4 minutes, and then serves traffic for 3 more minutes, you will pay for (5+4+3)+15 minutes, or $0.08 / 60 * 27 = 3.6 cents.
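The billing rule above can be sketched in Python. `billed_minutes` and `cost_dollars` are hypothetical helpers; the serving intervals are assumed to be sorted, non-overlapping (start, end) pairs in minutes, and treating a gap of strictly less than 15 minutes as “within the window” is an assumption about the boundary case.

```python
STARTUP_FEE_MINUTES = 15
ON_DEMAND_RATE_PER_HOUR = 0.08


def billed_minutes(serving_intervals):
    """Minutes charged for sorted, non-overlapping (start, end) intervals."""
    total = 0
    run_start = run_end = None
    for start, end in serving_intervals:
        if run_end is not None and start - run_end < STARTUP_FEE_MINUTES:
            # Restarted inside the window: no new fee, and the gap counts as uptime.
            run_end = end
            continue
        if run_end is not None:
            # Close the previous run: its uptime plus one startup fee.
            total += (run_end - run_start) + STARTUP_FEE_MINUTES
        run_start, run_end = start, end
    if run_end is not None:
        total += (run_end - run_start) + STARTUP_FEE_MINUTES
    return total


def cost_dollars(serving_intervals):
    return ON_DEMAND_RATE_PER_HOUR / 60 * billed_minutes(serving_intervals)
```

`billed_minutes([(0, 5)])` reproduces the first example (20 minutes), and `billed_minutes([(0, 5), (9, 12)])` reproduces the second (27 minutes); a restart after a 25-minute gap, as in `[(0, 5), (30, 33)]`, is billed as two separate runs (20 + 18 = 38 minutes).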

Q: You seem to be trying to account for RAM in the new model.  Will I be able to purchase Frontend Instances that use different amounts of memory?
A: We are only planning on having one size of Frontend Instance for now.

Q: Do Frontend instances handle Task Queues and Cron?
A: Yes, they handle Task Queue requests by default.

Q: Can reserved instance-hours be used the following week?
A: Unfortunately, no.  If you pre-commit to a set of reserved instance-hours for a week you will be charged for those instance-hours regardless of whether they are used during that week.

Q: Can the experimental Go Runtime handle concurrent requests?
A: Not currently, but we are actively working on it. Stay tuned...

Q: What code changes will I need to make in order to use Python 2.7?
A: In general much of your current Python 2.5 code will run with Python 2.7 but there are some important changes that you might need to make:
- Start using Django 1.2: The current Python runtime uses Django 0.96 by default (and you use Django implicitly when you use the templates built into the webapp framework). Because Python 2.7 will be a new runtime, we do not plan to package or support this obsolete version. We will package and support Django 1.2 at a minimum. To prepare for this, the best thing to do is make sure your code runs under Django 1.2; instructions on how to use Django 1.2 can be found here.
- Python 2.7 Support: I know this seems to go without saying, but we’ll say it anyway: your code needs to run under Python 2.7 in order to be used with the new runtime.

Q: How will concurrent requests work with Python 2.7?  
A: Python 2.7 will be a new runtime for App Engine.  Concurrent requests will work in threads similar to the way today’s Java runtime supports concurrent requests with threads.  
- Use a WSGI-compliant framework: In order to take advantage of running concurrent requests, you won’t be able to use CGI; instead, you’ll need to use a WSGI-compliant framework (this includes the webapp framework packaged with App Engine).
...