App Engine getting slower and slower


Marcel Manz

Jul 10, 2012, 3:41:03 AM
to google-a...@googlegroups.com
The more applications are hosted on App Engine, the more of a negative impact it seems to have on performance.

A version of one of my apps handles the same type of request over and over and is loaded at a more or less steady request rate from a remote system. This version does nothing more than take a request and send it off to the task queue.
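For readers unfamiliar with this pattern, a minimal sketch of such a pass-through handler might look like the following. This is my illustration, not Marcel's actual code; the handler class, URL paths, and queue name are all hypothetical, and it assumes the App Engine Python SDK (webapp2 and the taskqueue API).

```python
# Hypothetical sketch: a frontend handler that does nothing but hand
# the request off to a push task queue. Requires the App Engine Python
# SDK; '/enqueue', '/worker' and 'worker-queue' are made-up names.
import webapp2
from google.appengine.api import taskqueue

class EnqueueHandler(webapp2.RequestHandler):
    def post(self):
        # Enqueue the payload; the actual work happens later in a
        # worker handler, so this request can return immediately.
        # This taskqueue.add() call is where BulkAdd latency shows up.
        taskqueue.add(url='/worker',
                      queue_name='worker-queue',
                      payload=self.request.body)
        self.response.set_status(202)  # accepted for later processing

app = webapp2.WSGIApplication([('/enqueue', EnqueueHandler)])
```

With a handler this thin, nearly all of the request's wall-clock time is the taskqueue API call plus any time spent waiting for an instance, which is why a faster instance class (F4) doesn't help.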

As can be seen from the attached chart, the milliseconds per request seem to be increasing over time. I have tried to compensate for this by upgrading F1 instances to F4, but that shows no improvement, as the load is API bound. The spike in the middle of the chart is the result of this incident: http://code.google.com/status/appengine/detail/taskqueue/2012/07/03#ae-trust-detail-taskqueue-add-many-latency

Not only do the APIs seem to be getting slower (in my case taskqueue.BulkAdd), but the time a request waits for an instance to serve it has also been increasing lately.

For example taskqueue:

There have been times where it was possible to enqueue tasks in approx 10 milliseconds:

January 1st 2011: http://code.google.com/status/appengine/detail/taskqueue/2011/01/01#ae-trust-detail-taskqueue-add-many-latency

January 1st 2012: http://code.google.com/status/appengine/detail/taskqueue/2012/01/01#ae-trust-detail-taskqueue-add-many-latency

Nowadays one is lucky if this completes in under 100 milliseconds:

July 9th 2012: http://code.google.com/status/appengine/detail/taskqueue/2012/07/09#ae-trust-detail-taskqueue-add-many-latency

Furthermore, I see very frequent requests in the dashboard logs with an overall time of several hundred milliseconds, many over one second, as well as requests taking several seconds. These are non-loading / non-warmup requests, and according to Appstats they execute promptly. I must assume the difference between the ms value in the logs and the value reported by Appstats is the time the request spent waiting to be served.
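The reasoning above amounts to a simple subtraction: the log viewer's ms value covers the whole request, Appstats covers only the time inside the handler, so the remainder is (presumably) scheduler wait. A trivial sketch of that estimate, with made-up numbers for illustration:

```python
def estimated_pending_ms(log_total_ms, appstats_ms):
    """Rough estimate of time a request spent waiting for an instance:
    total request time from the logs minus the time Appstats accounts
    for inside the handler. Clamped to 0 in case of measurement skew."""
    return max(0, log_total_ms - appstats_ms)

# e.g. a request logged at 1200 ms that Appstats says executed in 35 ms
# spent roughly 1165 ms waiting in the scheduler
print(estimated_pending_ms(1200, 35))
```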

If so, I don't understand why serving takes so long. My pending latency settings are Automatic / Automatic, and I always keep 1 or 2 idle instances, which is sufficient for the load the app is handling.

One of the Google I/O 2012 talks mentions that a delay of plus or minus 1 millisecond translates into millions of dollars gained or lost for Google Search/AdWords. It's good that you know that, but please speed up the App Engine platform so we can profit in the same way.

Thanks
Marcel


chart.png

Brandon Wirtz

Jul 10, 2012, 4:32:45 AM
to google-a...@googlegroups.com

The time period you are talking about is minimal. It's hard to tell whether you “broke something”, whether your instances are being slowed down by something, or whether you just have a difference in traffic numbers.


This is my 30 day numbers for one of my larger apps.


For me, what happens is that Datastore or Memcache gets buggy, and that changes the score (this is visible via the errors per second).

The “instance” size won't fix slow reads from Datastore or Memcache.


--
You received this message because you are subscribed to the Google Groups "Google App Engine" group.
To view this discussion on the web visit https://groups.google.com/d/msg/google-appengine/-/YOi0uXp2BBQJ.
To post to this group, send email to google-a...@googlegroups.com.
To unsubscribe from this group, send email to google-appengi...@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/google-appengine?hl=en.

image001.png
image002.png

Marcel Manz

Jul 10, 2012, 5:12:56 AM
to google-a...@googlegroups.com
What do you mean by the time period being minimal? I assume the milliseconds indicated in the log viewer represent the total time the request took to complete (including time spent waiting in the scheduler for an instance to serve it).

My application serves another system, so fast response times are crucial for my app. At the moment it looks like this is getting worse and worse (Java HRD):

http://code.google.com/status/appengine/detail/serving-java/2012/07/09#ae-trust-detail-helloworld-get-java-latency

Kaan Soral

Jul 10, 2012, 3:58:47 PM
to google-a...@googlegroups.com
In my case, App Engine gets faster over time, so I start hitting the (frustratingly low) per-minute or hourly urlfetch limits, and from time to time I have to throttle my task queues by reducing their max_concurrent_requests.
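For reference, the throttle Kaan mentions lives in the app's queue.yaml. A sketch, with an invented queue name and numbers, of capping a push queue so its tasks can't fan out into too many simultaneous urlfetch calls:

```yaml
# queue.yaml (illustrative values only)
queue:
- name: fetch-queue
  rate: 10/s                  # how fast tasks are dispatched
  max_concurrent_requests: 4  # at most 4 tasks executing at once,
                              # which bounds concurrent urlfetch calls
```

Lowering max_concurrent_requests trades task-queue throughput for staying under per-minute API quotas.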

Brandon Thomson

Jul 11, 2012, 12:09:11 AM
to google-a...@googlegroups.com
What I've observed is that a specific app may tend to get slower over time as the "cluster" it runs on gets busier.

Eventually an app may get moved to a new "cluster". This may include a few minutes of hard downtime, even for HRD apps. Afterwards performance can be dramatically better.

Someone at google may be able to manually move your app if you complain loudly enough, although there's no guarantee the issue won't recur.

When it gets really bad all your instances can be totally idle but many requests still block in the scheduler for hundreds of ms (or more) like you seem to be seeing. Adjusting the latency settings doesn't help because the instances aren't the bottleneck.

Quite possibly my least favorite part of app engine, although on the whole I still like it a lot.

Marcel Manz

Jul 12, 2012, 8:29:52 AM
to google-a...@googlegroups.com
After experimenting with various F1-F4 frontends, I have meanwhile migrated my app's workload to public-facing B1 backends, so it can be accessed remotely.

As you can see from the attached screenshot, latency has improved greatly compared to using frontends. I am now back to approx. 150 ms latency, compared to up to 1 second over the last few days on frontends.

Not sure if this is because backends run in a new / less busy cluster, or if the scheduler simply handles backend traffic with less delay. It's obvious that there must be a difference.
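For anyone wanting to try the same migration: public-facing backends were declared in backends.yaml at the time. A sketch with an invented backend name; the class, instance count, and name here are illustrative, not Marcel's actual configuration:

```yaml
# backends.yaml (illustrative values only)
backends:
- name: worker        # reachable at http://worker.<app-id>.appspot.com
  class: B1           # smallest backend class, as in the post above
  instances: 2        # more than one instance, for redundancy
  options: public     # allow external HTTP traffic, not just internal
```

The `public` option is what makes the backend addressable from a remote system; without it, backends only accept internal traffic.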


backend_frontends.jpg

alex

Jul 12, 2012, 10:16:54 AM
to google-a...@googlegroups.com
The scheduling is completely different for backends, and if you're using only one instance there's simply no autoscale scheduling. Requests are simply put into a waiting queue until the backend is free (or they time out).

-- alex

Marcel Manz

Jul 12, 2012, 3:44:14 PM
to google-a...@googlegroups.com
I'm using more than 1 backend for redundancy reasons. I also experimented with addressing a specific backend directly, but couldn't notice any difference in latency compared to addressing the whole backend pool.

By using backends I now see log entries in the range of 20-30 ms for processing a request that enqueues a task, compared to the several hundred milliseconds that were consumed (or spent queued) on frontends.

Running the backend option, I can now serve the load that previously required many more frontend instances (or at least that's what the scheduler thought was required). The only downside is that there is no unlimited autoscaling, as there is with frontends.

Looking forward to GAE fixing the frontend issues, as latencies like these don't do the platform any good:

http://code.google.com/status/appengine/detail/serving-java/2012/07/11#ae-trust-detail-helloworld-get-java-latency



Brandon Thomson

Jul 12, 2012, 11:34:09 PM
to google-a...@googlegroups.com
> By using backends I now see log entries in the range of 20-30 ms for processing a request that enqueues a task, compared to the several hundred milliseconds that were consumed (or spent queued) on frontends.

Impressive, thanks very much for posting your results.

Did you happen to verify that the improved latency is also seen externally to App Engine?

I ask because I know the log entry's ms value includes time waiting in the dispatch queue when frontends are used, but I haven't verified whether it's included when backends are used.