SLOW


pdknsk

Jan 30, 2012, 11:40:53 AM
to Google App Engine
For the past 1-2 days, App Engine has been very slow. Requests that usually
take 100ms are taking up to 20 seconds! This drives up frontend instance
hours. Please see the graph below of ms/request.

http://goo.gl/s0gBF

This is not related to the datastore or anything like that, because cached
dynamic requests are also affected. Here is an example request.

2012-01-30 08:34:28.382 /feed/ 204 14770ms 20kb

Yet on the status page, everything is great. Clearly it isn't.

pdknsk

Jan 30, 2012, 12:03:31 PM
to Google App Engine
To have this better reflected on the status page, I'd like to propose
a change. Please star it if you agree.

http://code.google.com/p/googleappengine/issues/detail?id=6831

Brandon Wirtz

Jan 30, 2012, 5:28:39 PM
to google-a...@googlegroups.com

I have seen the same thing, though not to the same degree. Other than confirming it happens, I don't know what to tell you. Report it, and Google will fix it; they usually detect and fix these things on their own. The only thing that annoys me is that my CPU usage is more expensive during these times, so I'm slow AND paying more for my service.
Brandon Wirtz
BlackWaterOps: President / Lead Mercenary


Work: 510-992-6548
Toll Free: 866-400-4536

IM: dra...@gmail.com (Google Talk)
Skype: drakegreene
YouTube: BlackWaterOpsDotCom

BlackWater Ops

Cloud On A String Mastermind Group

--

You received this message because you are subscribed to the Google Groups "Google App Engine" group.

To post to this group, send email to google-a...@googlegroups.com.

To unsubscribe from this group, send email to google-appengi...@googlegroups.com.

For more options, visit this group at http://groups.google.com/group/google-appengine?hl=en.

Robert Kluin

Jan 31, 2012, 1:00:15 AM
to google-a...@googlegroups.com
You're assuming there are multiple test apps. They've stated that
they're working on improved monitoring, but have given only the usual
vague ETA estimates.

Like Brandon said, file a production issue and they'll get on it.
http://code.google.com/p/googleappengine/issues/entry?template=Production%20issue


Robert

pdknsk

Jan 31, 2012, 9:43:59 AM
to Google App Engine
This seems to occur in waves, as if pressure builds up.

http://goo.gl/Z4LRL

A1programmer

Jan 31, 2012, 10:21:30 AM
to Google App Engine
I was noticing the same thing. However, today (at least for me) it
seems to have gone away.

I had several page views taking 1 to 2 seconds (and even 3 to 4
seconds at times). Most of my page views seem instant today.

pdknsk

Jan 31, 2012, 12:32:38 PM
to Google App Engine
Another wave just started.

http://goo.gl/Y8QH4

The status page reflects it this time, so maybe Google is aware as well.

It seems like Google only uses 10 probe apps (for Python); at least
that's what I infer from the exact 10% steps in the error graph.

http://goo.gl/1pnHv

Marzia Niccolai

Jan 31, 2012, 2:29:08 PM
to google-a...@googlegroups.com
Hi,

Both today and yesterday, there was a networking issue in one of the datacenters serving High Replication applications, which caused increased datastore latency and elevated error rates. We responded by stopping App Engine from serving out of the affected datacenter until the problem was resolved.

If you have a paid HRD app and you feel the SLA has been violated for your app, you can file a request for billing credits here:

pdknsk

Feb 1, 2012, 12:15:13 PM
to Google App Engine
It hasn't helped here. The problem isn't even the short >10s spikes,
but the hours of >1s response times (up from the usual ~100ms) that
sometimes precede them. It seems quite clear to me that the response
time builds up slowly, suddenly peaks, and immediately returns to
normal. Shortly after that it builds up again.

http://goo.gl/uUXqD

A1programmer

Feb 1, 2012, 1:15:20 PM
to Google App Engine
Getting the same issues here... again...

Java for me.

Requests that are usually sub-200ms are now 3 to 4 seconds (if I'm
lucky). Occasionally they're between 20 and 30 seconds.

pdknsk

Feb 5, 2012, 11:42:39 AM
to Google App Engine
http://goo.gl/Mmhzo

2012-02-05 06:28:04.398 /feed/ 200 90894ms 41kb Apple-PubSub/65.23

I'd file a production issue, but Google ignores those anyway.

Sunvir Gujral

Feb 5, 2012, 10:49:55 AM
to Google App Engine
This is starting to cost me a lot, both in payments to Google and in
customers lost to the high latency everyone is seeing.
Is Google working on this?

Sunvir Gujral

Feb 5, 2012, 8:47:25 PM
to Google App Engine
I opened issue #6871; please star it in the hopes that Google will
notice...
http://code.google.com/p/googleappengine/issues/detail?id=6871

Robert Kluin

Feb 6, 2012, 12:23:12 AM
to google-a...@googlegroups.com
Are you making a URL fetch call in that?

Sunvir Gujral

Feb 6, 2012, 10:31:14 AM
to Google App Engine
Is there an issue with URL fetches? Some of my pages do make URL
fetches, but not all. And everything is slow, even when I don't make
URL fetches.

pdknsk

Feb 6, 2012, 12:07:51 PM
to Google App Engine
Well, it seems the >50s latency finally made Google notice, because
latency has since returned to ~100ms.

http://goo.gl/VQRqs

Robert Kluin

Feb 6, 2012, 7:03:18 PM
to google-a...@googlegroups.com
I was asking about URL Fetch since there could be many factors
influencing latency in those cases: network issues, slow or overloaded
servers on the other side, the other site throttling requests from App
Engine, etc.

I asked about other services because sometimes they will experience
elevated latency. Depending on what you're using and how your code is
structured, small increases in latency could be significant.

Robert

Sunvir Gujral

Feb 6, 2012, 11:08:29 PM
to Google App Engine
I looked at specific requests and the code being executed, and most of
them do not rely on any external services. For most of the day today,
latency did come down to <100ms. Then around 6pm PST it started going
up again.
But I think I might have a clue as to what's going on... or at least I
hope so. At about 5pm, I changed min_pending_latency from automatic to
440ms.
I just changed it back to automatic; I'm hoping that will bring the
latency back down.

Though IMHO the change should not have made a difference, because for
most of the day before the change I had 2-3 idle instances just
sitting there doing nothing. I figured I might be able to bring down
the number of instances I'm charged for by making the change...

Let's see what happens with this experiment.

Robert Kluin

Feb 7, 2012, 12:57:59 AM
to google-a...@googlegroups.com
Scheduler tuning is a little tricky; the optimal settings depend on
the app's startup time, typical request latency, and your willingness
to pay.

What I've found is that unless you've got very spiky traffic, traffic
that suddenly and massively spikes, you can turn the max idle
instances down. They more or less just act as a buffer and don't
actively serve traffic. If you want better performance, turn the min
idle instances up a little, but again this doesn't need to be too
high. I generally run apps with between "automatic" and 15 idle
instances. Experiment to find the right number.

To get better performance I generally leave the min pending latency on
automatic, but I reduce the max pending latency. For reasonable apps a
setting somewhere between 300ms and 600ms seems to do a good job, but
you'll be spinning up instances more rapidly. If you don't want that,
bump it up to a max of 1000ms (and possibly increase the min to a
couple hundred ms).
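For reference, the four knobs described above were Admin Console settings at the time of this thread; later App Engine SDKs expose the same settings under `automatic_scaling` in app.yaml. A sketch with illustrative values (the values are my own, not a recommendation from the thread):

```yaml
# Sketch of the scheduler settings discussed above, using the
# automatic_scaling keys from later App Engine SDKs. Values are
# illustrative only.
automatic_scaling:
  min_idle_instances: automatic   # warm instances held in reserve
  max_idle_instances: 5           # cap on idle (billed) instances
  min_pending_latency: automatic  # wait at least this long before spinning up
  max_pending_latency: 500ms      # start a new instance if a request queues longer
```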

These are just my observations from experimenting. Your results might
be different.


Robert

Brandon Wirtz

Feb 7, 2012, 1:06:59 AM
to google-a...@googlegroups.com
> What I've found is that unless you've got very spiky traffic, that
> suddenly and massively spikes, you can turn the max idle instances down.
> They more-or-less just act like a buffer, but don't actively serve
> traffic. If you want better performance turn the min-idle instances up a
> little, but again this doesn't need to be too high. I generally run apps
> with between "automatic" and 15 idle instances.

I find that you should very, very rarely need anywhere near that.

I did a post a while back with some rough formulas, but if your max idle
time is 30 seconds minus (6x your average request time + warmup time), you
can generally weather any spike.
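As a worked example of that rule of thumb (my own sketch; the numbers below are hypothetical, not from the thread):

```python
# Rule of thumb quoted above: max idle time of
# 30 seconds minus (6 x average request time + warmup time).

def suggested_max_idle_time(avg_request_s, warmup_s):
    """Suggested max idle time in seconds, floored at zero."""
    return max(0.0, 30.0 - (6 * avg_request_s + warmup_s))

# A 300 ms average request with a 5 s warmup:
print(suggested_max_idle_time(0.3, 5.0))  # about 23.2 seconds
```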

Typically if you set max pending to 1s your users will be a lot happier, and
you don't have to do as much tuning. I used to spend more time tuning max
idle instances; now I set that to 1 or 2 (depending on my instance size)
and tune the max idle time (lower means better scale).

Robert Kluin

Feb 7, 2012, 1:22:49 AM
to google-a...@googlegroups.com
On Tue, Feb 7, 2012 at 01:06, Brandon Wirtz <dra...@digerat.com> wrote:
>> What I've found is that unless you've got very spiky traffic, that
>> suddenly and massively spikes, you can turn the max idle instances down.
>> They more-or-less just act like a buffer, but don't actively serve
>> traffic. If you want better performance turn the min-idle instances up a
>> little, but again this doesn't need to be too high. I generally run apps
>> with between "automatic" and 15 idle instances.
>
> I find that you should very, very rarely need anywhere near that.
>
> I did a post a while back with some rough formulas, but if your max idle
> time is 30 seconds minus (6x your average request time + warmup time), you
> can generally weather any spike.

Did you factor in app startup times? Some apps can easily take several
seconds to get fired up, so they do better with a larger buffer. For
light apps I agree; I've found 1 or 2 is sufficient.


>
> Typically if you set max pending to 1s your users will be a lot happier, and
> you don't have to do as much with tuning. I used to spend more time tuning
> max idle instances, now, I set that to 1 or 2 (depending on my instance
> size) and tune the max idle time. (lower means better scale)

I agree here too; I generally set my max idle to 5 and then don't mess
with it. I prefer 500ms for my default max latency, but a lot of my
stuff is tuned to spin up fast.

Brandon Wirtz

Feb 7, 2012, 1:55:03 AM
to google-a...@googlegroups.com
> Did you factor into account app startup times? Some apps can easily take
> several seconds to get fired up, so they do better with a larger buffer.
> For light apps I agree, I've found 1 or 2 is sufficient.

Yeah, I said warm-up not startup, but same thing.

I like lower numbers too, but if your max latency is at 1/3 your average,
you will pretty much be guaranteed to have an instance per simultaneous
request. My average request is 300ish ms so I can pull off 500-600ms, but
someone with a 2.5s average would spin up a lot of extra instances every
time a user hit a page making 8 requests. And you'd eat the startup time
on all those instances, so too low a max can make things worse. Again,
check the archive; I talked about this there too.
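A toy model of that spin-up effect (my own assumption for illustration, not the real App Engine scheduler: a request queued behind a busy instance waits roughly one average request time, and a new instance is started whenever that wait would exceed the max pending latency):

```python
# Toy model of why a max pending latency far below the average request
# time spins up many instances. A queued request waits roughly one
# average request time; if that exceeds max_pending, the scheduler
# starts a new instance for it instead.

def instances_spun_up(simultaneous_requests, avg_request_s, max_pending_s):
    """Estimate extra instances started for a burst hitting one warm instance."""
    if avg_request_s <= max_pending_s:
        return 0  # queued requests are served before the deadline
    # Every request beyond the first triggers its own instance.
    return simultaneous_requests - 1

# A page firing 8 requests, 2.5 s average latency, 500 ms max pending:
print(instances_spun_up(8, 2.5, 0.5))  # 7 extra instances
# The same burst with a 300 ms average: no extra instances.
print(instances_spun_up(8, 0.3, 0.5))  # 0
```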


Mike Wesner

Feb 7, 2012, 9:31:42 AM
to Google App Engine
Robert's tuning has done very well for us in various situations.
Running a large, diverse app takes a balance of these settings. When
you factor in task queue performance and such, it can be a little
weird. The rules are going to be different for every app, so I don't
think you can have a magic formula.

It really comes down to balancing what you want to pay against the
risk of a spike causing issues for customers.

-Mike

Teddy

Feb 13, 2012, 4:15:54 AM
to google-a...@googlegroups.com
I noticed your ticket had been marked 'resolved'. In our case the issue is nowhere near resolved, so after commenting on your ticket I opened a new one: http://code.google.com/p/googleappengine/issues/detail?id=6917

Please star this one too if the issue affects you as well.