Now our billing has run out.
Today we are seeing up to 400 frontend instances with the usual
normal traffic, and our daily billing budget is about to run out.
Our usual instance count on any average day is 6,
and we have a limit of at most 1 idle instance.
Traffic is normal, but we have seen a lot of errors reported in the last 6
hours. No new code has been deployed in many days, and there are no
other billing errors.
Could anyone from Google please suggest what is going on, and what we
can do in this case?
We just tried a dummy deployment with a minor valid change in the
code to see if the instances would shut themselves down, but the
problem remains and our money is running out.
I took a look. Not yet sure what the cause was, but perhaps these data
will help you.
The first thing was to zoom in on the relationship between latency and
instances. Attached is the graph. You can see there that latencies
shoot up first, then the scheduler adds instances, and then latencies
drop. Then the scheduler starts releasing instances. There are two
clear examples of this multi-phase reaction cycle in the graph.
Zooming in on the logs at 16:40 (an interesting point because the
latency just shot up) you can see many examples where ms= is around
3-4 seconds and pending_ms= is near zero:
https://appengine.google.com/logs?app_id=showmypc&version_id=1.355223391043075267&severity_level_override=1&severity_level=3&tz=US%2FPacific&filter=&filter_type=regex&date_type=datetime&date=2011-12-07&time=16%3A40%3A50&limit=20&view=Search
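(As an aside, this ms/cpu_ms/pending_ms triage can be mechanized. A minimal sketch, with a made-up sample log line shaped after the fields cited above: high wall-clock `ms` with near-zero `cpu_ms` means the request was blocked waiting, not computing.)

```python
import re

# Fabricated sample request log line, shaped after the fields above.
line = ('0.1.0.2 - - [07/Dec/2011:16:40:50 -0800] "GET /api HTTP/1.1" '
        '200 0 ms=3500 cpu_ms=0 pending_ms=2')

# Pull out all key=value integer fields (ms, cpu_ms, pending_ms, ...).
fields = dict(re.findall(r'(\w+)=(\d+)', line))
ms, cpu_ms = int(fields['ms']), int(fields['cpu_ms'])

# High wall-clock time with ~zero CPU time usually means the request sat
# blocked on an API call (memcache, urlfetch, ...), not doing work.
waiting_on_api = ms > 1000 and cpu_ms < 10
```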
This suggests that the app code is stuck doing something, but since
cpu_ms=0 it must just be waiting on an API call. Now, the app
primarily uses the memcache API, but it also uses the urlfetch API to
make requests to service1.showmypc.com. One possible explanation is
that this remote website went down, causing all your instances to
hang, further incoming requests to go to the pending queue, and the
scheduler to go into reactive mode. At present, the scheduler doesn't
try to diagnose that the app is down in a way where more
instances wouldn't help; it just keeps adding them.
But I don't have concrete proof that this explains what happened in
these time periods, it's just what stands out in the data.
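(If the hung-urlfetch theory holds, the usual mitigation is to put a short deadline on the remote call so a dead backend fails fast instead of tying up every instance; `urlfetch.fetch()` accepts a `deadline` parameter in seconds. A runnable stand-in sketch of the same idea, using a thread-pool timeout so it works outside App Engine; the function names and the 2-second "hang" are made up:)

```python
import concurrent.futures
import time

def call_remote():
    time.sleep(2)  # stand-in for service1.showmypc.com not answering
    return "response"

def fetch_with_deadline(fn, deadline_s):
    # Bound how long a request handler will wait on the remote call.
    ex = concurrent.futures.ThreadPoolExecutor(max_workers=1)
    future = ex.submit(fn)
    try:
        return future.result(timeout=deadline_s)
    except concurrent.futures.TimeoutError:
        return None  # fail fast; serve a degraded response instead of hanging
    finally:
        ex.shutdown(wait=False)

result = fetch_with_deadline(call_remote, 0.5)  # gives up after ~0.5 s
```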
On Wed, Dec 7, 2011 at 6:29 PM, smwatch <show...@gmail.com> wrote:
> We are using Python 2.6 and not yet on HRD
>
> Please suggest what else it could be. Can someone from Google look into
> our application console and see why, suddenly after months of running, so
> many instances came up and blew through our daily limits?
> As I said, there is no new traffic at all; this is the usual controlled
> traffic.
>
> The only thing I can see is the maintenance Google did yesterday.
There is no setting to cap the total number of frontend
instances. With backends, though, you get more explicit controls.
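(For reference: a backend's instance count was pinned in its backends.yaml configuration rather than chosen by the scheduler. A minimal sketch; the backend name here is made up:)

```yaml
backends:
- name: worker
  class: B2       # instance class, e.g. B1/B2/B4
  instances: 3    # fixed count; the scheduler will not add more
```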
On Wed, Dec 7, 2011 at 7:25 PM, smwatch <show...@gmail.com> wrote:
> Our main calls are just to the memcache API; URL fetch calls are 1 per
> million in the code.
>
> Is there a way to time out the requests, or to cap the maximum number of
> instances? This problem of high instance counts happened all day today
> and our billing ran out.
>
> As we said, the code never changed. How can we control this and be able
> to use the service normally? Are our application settings correct in
> terms of idle instance and timeout settings?
>
> Currently we have disabled the memcache calls in the code (making it
> useless), and the instances are running at 30 instead of 400.