recommended settings for instances for varying traffic websites?


Richard Cheesmar

Sep 22, 2016, 7:12:29 AM
to Google App Engine
Are there any recommended settings for instances for websites with varying traffic?

I mean, low (100-500 users an hour), medium (500-5000), high traffic (anything above 5000), something like that...

Obviously it depends on content being served, but some rough guidelines would be useful.

Is there any documented minimum number of instances, or any best-practice approaches?

Nick (Cloud Platform Support)

Sep 23, 2016, 2:45:45 PM
to Google App Engine
Hey Richard,

I'm not aware of any documentation that gives advice specifically in terms of users per hour, but I have some resources and general tips that could help you think about such situations.

Roughly speaking, empirical data is worth more than a thousand estimations. Deploy with automatic scaling and some liberal initial estimates for the basic settings, then adjust according to what you observe. You're right that the content being served will matter, as will the intensity of computations, the latency of any RPC calls your request handler code depends on, and so on.

You can find information on scaling settings in the documentation, and you can use Cloud Monitoring to observe statistics on your app, as well as analyze logs to check fields such as "ms", "cpu_ms", and "pending_ms" (you can read about what those mean in the documentation linked as "analyzing logs"). A good starting point is the article "How to Troubleshoot Latency in Your App Engine Application". There is also an article in our documentation called "Designing for Scale", which has good advice. While not limited to scaling settings, it does have this to say:

We recommend that you use the default settings for automatic scaling for max idle instances and min/max pending latency on automatic unless you have done load testing with other settings to verify their effects. The default performance settings will, in most cases, enable the lowest possible latency. A trade-off for low latency is usually higher costs due to having additional idle instances that can handle temporary spikes in load.

You should set min_idle_instances if you want to minimize latency, particularly if you expect sudden spikes in traffic. The number of idle instances that are needed will depend on your traffic and it is best to do load tests to determine the optimal number.
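To make that concrete, here's a hedged sketch of what the relevant automatic-scaling section of an app.yaml might look like. The specific values are placeholders for illustration, not recommendations; the right numbers for your app can only come from load testing, as described above:

```yaml
# Sketch only - tune these values based on your own load tests.
automatic_scaling:
  min_idle_instances: 2          # warm instances held ready to absorb sudden spikes
  max_idle_instances: automatic  # let the scheduler decide the upper bound
  min_pending_latency: automatic
  max_pending_latency: 30ms      # how long a request may wait before a new instance spins up
```

Lowering max_pending_latency or raising min_idle_instances generally reduces user-visible latency at the cost of paying for more idle instances, which is the trade-off the "Designing for Scale" excerpt above describes.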

You could in theory use the number of users per hour, the expected latencies of the resources served per interaction, the number of interactions per user, and a statistical treatment of the frequencies and timings of those interactions to build a general model of how many instances you'd need, given each instance's capacity to serve the requests corresponding to those interactions per second. I believe this would be a very complex balancing act indeed, though.
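As a much cruder version of that idea, you can get a back-of-envelope estimate from Little's law (average concurrent requests = arrival rate x average latency). The sketch below is purely illustrative, and the parameter values (requests per user, latency, concurrent requests an instance can handle) are assumptions you'd need to replace with measurements from your own logs:

```python
import math

def estimate_instances(users_per_hour, requests_per_user, avg_latency_s,
                       concurrent_per_instance=8):
    """Rough instance-count estimate via Little's law.

    Average number of in-flight requests = arrival rate (req/s)
    multiplied by average latency (s); divide by how many concurrent
    requests a single instance can comfortably serve.
    """
    rps = users_per_hour * requests_per_user / 3600.0
    concurrent = rps * avg_latency_s
    return max(1, math.ceil(concurrent / concurrent_per_instance))

# e.g. 5000 users/hour, 20 requests each, 300 ms average latency,
# assuming an instance handles ~4 concurrent requests:
print(estimate_instances(5000, 20, 0.3, concurrent_per_instance=4))  # prints 3
```

This ignores burstiness entirely, which is exactly why the advice above leans on min_idle_instances and load testing rather than a point estimate like this.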

If you were into modelling of that kind, you could build a model system in which each instance is represented by a request-to-latency function with several parameters and internal state, and have it tune those parameters, and the latency distributions associated with different request types, against observations: that is, by feeding it the logs for a given amount of real activity, using session IDs to correlate requests with users, and so on. Ultimately, though, while that would be an interesting and powerful system to build, I think it would be unnecessary to go to such lengths; a higher-level overview of raw request volume against latencies should let you reason effectively and make decisions about scaling settings.

I hope this is helpful, and let me know if you have further questions; I'll be happy to assist!

Sincerely,

Nick
Cloud Platform Community Support