Weird Instance Scheduler

Showing 1-61 of 61 messages
Weird Instance Scheduler Mos 8/22/12 12:58 PM
Does anybody else experience abnormal behavior of the instance scheduler over the last three weeks (the last 7 days have been even worse)?  (Java / HRD)
Or does anybody have in-depth knowledge of it?

Background:  My application has been unchanged for weeks, the configuration has not changed, and the application's traffic is constant.
Traffic: one request per minute from Pingdom plus around 200 additional pageviews per day (i.e. around 1500 pageviews per day in total). The peak is no more than 3-4 requests per minute.

It's very obvious that one instance should be enough for my application. And that was almost always the case over the last months!

But now GAE creates 3 instances most of the time, whereby one has a long lifetime of several days and the other ones are restarted around
10 to 30 times per day.
Because a loading request takes between 30s and 40s, and requests wait for loading instances, there are many requests that
fail.  (Users and Pingdom agree: a request that takes more than a couple of seconds is a failed request!)

Please check the attached screenshots that show the behavior!

Note:
- Killing instances manually did not help
- Idle Instances were set to ( Automatic – 2 ).  Changing it to anything else, e.g. ( Automatic – 4 ), didn't change anything.

Thanks and Cheers
Mos






Re: Weird Instance Scheduler Mos 8/22/12 11:58 PM
In addition, here is the failed-request report from Pingdom for the last day (that's not acceptable!):


al...@pingdom.com    UP alert: krisentalk (www.krisentalk.de) is UP    Mi 3:16    6 KB    krisentalk   
al...@pingdom.com    DOWN alert: krisentalk (www.krisentalk.de) is DOWN    Mi 3:16    6 KB    krisentalk   
al...@pingdom.com    DOWN alert: krisentalk (www.krisentalk.de) is DOWN    Mi 3:34    6 KB    krisentalk   
al...@pingdom.com    UP alert: krisentalk (www.krisentalk.de) is UP    Mi 3:35    6 KB    krisentalk   
al...@pingdom.com    DOWN alert: krisentalk (www.krisentalk.de) is DOWN    Mi 17:44    6 KB    krisentalk   
al...@pingdom.com    UP alert: krisentalk (www.krisentalk.de) is UP    Mi 17:45    6 KB    krisentalk   
al...@pingdom.com    DOWN alert: krisentalk (www.krisentalk.de) is DOWN    Mi 17:57    6 KB    krisentalk   
al...@pingdom.com    UP alert: krisentalk (www.krisentalk.de) is UP    Mi 17:58    6 KB    krisentalk   
al...@pingdom.com    DOWN alert: krisentalk (www.krisentalk.de) is DOWN    Mi 18:17    6 KB    krisentalk   
al...@pingdom.com    UP alert: krisentalk (www.krisentalk.de) is UP    Mi 18:18    6 KB    krisentalk   
al...@pingdom.com    DOWN alert: krisentalk (www.krisentalk.de) is DOWN    Mi 18:34    6 KB    krisentalk   
al...@pingdom.com    UP alert: krisentalk (www.krisentalk.de) is UP    Mi 18:35    6 KB    krisentalk   
al...@pingdom.com    DOWN alert: krisentalk (www.krisentalk.de) is DOWN    Mi 18:43    6 KB    krisentalk   
al...@pingdom.com    UP alert: krisentalk (www.krisentalk.de) is UP    Mi 18:44    6 KB    krisentalk   
al...@pingdom.com    DOWN alert: krisentalk (www.krisentalk.de) is DOWN    Mi 19:02    6 KB    krisentalk   
al...@pingdom.com    UP alert: krisentalk (www.krisentalk.de) is UP    Mi 19:03    6 KB    krisentalk   
al...@pingdom.com    DOWN alert: krisentalk (www.krisentalk.de) is DOWN    Mi 19:05    6 KB    krisentalk   
al...@pingdom.com    UP alert: krisentalk (www.krisentalk.de) is UP    Mi 19:05    6 KB    krisentalk   
al...@pingdom.com    DOWN alert: krisentalk (www.krisentalk.de) is DOWN    Mi 19:41    6 KB    krisentalk   
al...@pingdom.com    UP alert: krisentalk (www.krisentalk.de) is UP    Mi 19:42    6 KB    krisentalk   
al...@pingdom.com    DOWN alert: krisentalk (www.krisentalk.de) is DOWN    Mi 19:51    6 KB    krisentalk   
al...@pingdom.com    UP alert: krisentalk (www.krisentalk.de) is UP    Mi 19:52    6 KB    krisentalk   
al...@pingdom.com    DOWN alert: krisentalk (www.krisentalk.de) is DOWN    Mi 20:02    6 KB    krisentalk   
al...@pingdom.com    UP alert: krisentalk (www.krisentalk.de) is UP    Mi 20:05    6 KB    krisentalk   
al...@pingdom.com    DOWN alert: krisentalk (www.krisentalk.de) is DOWN    Mi 20:10    6 KB    krisentalk   
al...@pingdom.com    UP alert: krisentalk (www.krisentalk.de) is UP    Mi 20:12    6 KB    krisentalk   
al...@pingdom.com    DOWN alert: krisentalk (www.krisentalk.de) is DOWN    Mi 23:03    6 KB    krisentalk   
al...@pingdom.com    UP alert: krisentalk (www.krisentalk.de) is UP    Mi 23:04    6 KB    krisentalk   
al...@pingdom.com    DOWN alert: krisentalk (www.krisentalk.de) is DOWN    Mi 23:10    6 KB    krisentalk   
al...@pingdom.com    UP alert: krisentalk (www.krisentalk.de) is UP    Mi 23:10    6 KB    krisentalk   
al...@pingdom.com    DOWN alert: krisentalk (www.krisentalk.de) is DOWN    Mi 23:15    6 KB    krisentalk   
al...@pingdom.com    UP alert: krisentalk (www.krisentalk.de) is UP    Mi 23:16    6 KB    krisentalk   
al...@pingdom.com    DOWN alert: krisentalk (www.krisentalk.de) is DOWN    02:48    6 KB    krisentalk   
al...@pingdom.com    UP alert: krisentalk (www.krisentalk.de) is UP    02:49    6 KB    krisentalk   
al...@pingdom.com    DOWN alert: krisentalk (www.krisentalk.de) is DOWN    03:04    6 KB    krisentalk   
al...@pingdom.com    UP alert: krisentalk (www.krisentalk.de) is UP    03:06    6 KB    krisentalk   
al...@pingdom.com    DOWN alert: krisentalk (www.krisentalk.de) is DOWN    03:12    6 KB    krisentalk   
al...@pingdom.com    UP alert: krisentalk (www.krisentalk.de) is UP    03:13    6 KB    krisentalk   
al...@pingdom.com    DOWN alert: krisentalk (www.krisentalk.de) is DOWN    06:25    6 KB    krisentalk   
al...@pingdom.com    UP alert: krisentalk (www.krisentalk.de) is UP    06:26    6 KB    krisentalk   
Re: Weird Instance Scheduler Mos 8/23/12 2:18 PM
I filed a production issue for this. If anybody has similar problems or is interested in it, please star it:

http://code.google.com/p/googleappengine/issues/detail?id=8004


Re: [google-appengine] Weird Instance Scheduler Takashi Matsuo (Google) 8/23/12 4:39 PM

Hi Mos,

On Thu, Aug 23, 2012 at 4:58 AM, Mos <mos...@googlemail.com> wrote:
Does anybody else experience abnormal behavior of the instance scheduler over the last three weeks (the last 7 days have been even worse)?  (Java / HRD)
Or does anybody have in-depth knowledge of it?

Background:  My application has been unchanged for weeks, the configuration has not changed, and the application's traffic is constant.
Traffic: one request per minute from Pingdom plus around 200 additional pageviews per day (i.e. around 1500 pageviews per day in total). The peak is no more than 3-4 requests per minute.

A possible explanation could be that the traffic pattern had changed.
 

It's very obvious that one instance should be enough for my application. And that was almost always the case over the last months!

Actually, that's not true. In particular, check this log:

You can see the iPhone client repeatedly requesting your dynamic resources in a very short amount of time. Presumably it's due to some kind of 'prefetch' feature on that device. Are you aware of those accesses, and that this access pattern can cause a new instance to start?

I don't think this is the only reason, but it can explain that some portion of your loading requests is expected behavior.

Now I'd like to ask you some questions.


* What is the purpose of the max-pending-latency = 14.9 setting?
* Can you try automatic-automatic for the idle instances setting?
* What is the purpose of those Pingdom checks? What happens if you stop them?
 

But now GAE creates 3 instances most of the time, whereby one has a long lifetime of several days and the other ones are restarted around
10 to 30 times per day.
Because a loading request takes between 30s and 40s, and requests wait for loading instances, there are many requests that
fail.  (Users and Pingdom agree: a request that takes more than a couple of seconds is a failed request!)

Please check the attached screenshots that show the behavior!

Note:
- Killing instances manually did not help
- Idle Instances were set to ( Automatic – 2 ).  Changing it to anything else, e.g. ( Automatic – 4 ), didn't change anything.

Thanks and Cheers

Mos






--
You received this message because you are subscribed to the Google Groups "Google App Engine" group.
To post to this group, send email to google-a...@googlegroups.com.
To unsubscribe from this group, send email to google-appengi...@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/google-appengine?hl=en.



--
Takashi Matsuo | Developers Advocate | tma...@google.com

Re: [google-appengine] Weird Instance Scheduler Mos 8/24/12 2:05 AM
> A possible explanation could be that the traffic pattern had changed.

No. It's the same. Check, for example, the Requests/Second statistics of my application for the last 30 days!


>> It's very obvious that one instance should be enough for my application. And that was almost always the case over the last months!
> Actually, that's not true. In particular, check this log:

That's one exception where one client made 8 requests in a minute (+ one Pingdom check). Nothing else happened that minute.
In such exceptional cases it may be OK if a second instance starts. (Nevertheless, shouldn't one instance be able to
handle 8 requests a minute?)

As I described: instances are started and stopped without reason, even when there is less traffic per minute!

> * What is the purpose of the max-pending-latency = 14.9 setting?

" is high App Engine will allow requests to wait rather than start new Instances to process them"
--> One attempt to stop GAE from creating unnecessary instances.

> * Can you try automatic-automatic for idle instances setting?

I played around with this over the last few days and nothing changed. As I wrote: I had this configuration for months and it worked fine until 3-4 weeks ago!

> * What is the purpose of those Pingdom checks? What happens if you stop them?

To be alerted if GAE is down again. "What happens if you stop that?" --> I wouldn't be angry anymore because I wouldn't notice the downtimes of my GAE application. ;)

Please forward http://code.google.com/p/googleappengine/issues/detail?id=8004 to the relevant GAE department.

Thanks!
Re: [google-appengine] Weird Instance Scheduler Takashi Matsuo (Google) 8/24/12 7:22 AM

Hi Mos,

On Fri, Aug 24, 2012 at 6:05 PM, Mos <mos...@googlemail.com> wrote:
> A possible explanation could be that the traffic pattern had changed.

No. It's the same. Check, for example, the Requests/Second statistics of my application for the last 30 days!

>> It's very obvious that one instance should be enough for my application. And that was almost always the case over the last months!
> Actually, that's not true. In particular, check this log:

That's one exception where one client made 8 requests in a minute (+ one Pingdom check). Nothing else happened that minute.
In such exceptional cases it may be OK if a second instance starts. (Nevertheless, shouldn't one instance be able to
handle 8 requests a minute?)

The issue here is not 8 requests in a minute. Actually, there were almost 8 requests in a second, so App Engine likely needed more than one instance at that particular moment. Anyway, as you say, it's probably just the reason for some of the loading requests you're seeing, and it's not the most important thing in this topic.

It's a bit of a digression, but at first glance the Requests/Second stat seems like an appropriate data source for discussing how many instances are actually needed. In fact it's not, because the real traffic is not spread evenly.
 

As I described: instances are started and stopped without reason, even when there is less traffic per minute!

Okay. As far as I understand, here is what you've seen in the past weeks:

* You have almost always had the 'Automatic-2' idle instances setting.
* More than 3 weeks ago, the number of loading requests was very low.
* Recently you have seen more loading requests than before.

First of all, it seems that you deployed 2 new versions on Aug 1 and Aug 2. Can you describe what kind of changes were in those versions?
I'd like to make sure there were no changes that could cause the scheduler/app server to behave differently.

In particular, if you want me to escalate this issue to our engineering team, you should provide exact information. You say 'My application is unchanged', but in fact you deployed a new version on the very day you say the issue started. I need to make sure there was no big change that could cause something bad.

And, to be fair, we don't know of any change in our scheduler around 3 weeks ago that could cause this issue.

Secondly, you're setting max idle instances = 2. That does not guarantee that you always have 2 instances. It only guarantees that we will never charge you for more than 2 idle instances at any time.

More than 3 weeks ago, those 2 idle instances may have had longer lives than they do now, but that was never guaranteed behavior. Please think of it this way: you were just kind of lucky. Now, presumably, one or two of those instances are occasionally killed for some reason (there are certainly legitimate reasons, but they are nothing you need to care about).

If you want some instances always active, please set min idle instances. Certainly it will cost you a bit more, and you will lose the pending queue, but considering the access pattern of your app (no bursty traffic except for a few accesses from the iPhone browser), I would recommend trying this setting in order to achieve what you want here. I'd recommend 2 idle instances in this case, but you should decide the number.


> * What is the purpose of the max-pending-latency = 14.9 setting?

" is high App Engine will allow requests to wait rather than start new Instances to process them"
--> One attempt to stop GAE from creating unnecessary instances.

I think you should set min pending latency instead of max pending latency if you want to prevent new instances from spinning up. However, if you set min idle instances, this setting will have almost no effect. If you don't want to set min idle instances for whatever reason, please consider setting min pending latency instead of max pending latency.
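(For readers following along: at the time, these knobs were sliders in the Admin Console; in later module-based Java configurations the equivalent settings live in appengine-web.xml. A rough, purely illustrative sketch with placeholder values:)

```xml
<!-- Illustrative sketch only: the scaling knobs discussed in this thread,
     expressed as appengine-web.xml automatic-scaling settings. -->
<appengine-web-app xmlns="http://appengine.google.com/ns/1.0">
  <automatic-scaling>
    <!-- Prefer queueing a request briefly over spinning up a new instance. -->
    <min-pending-latency>5s</min-pending-latency>
    <!-- Cap how long a request may wait in the pending queue. -->
    <max-pending-latency>10s</max-pending-latency>
    <!-- Keep instances warm; raises cost but reduces user-facing cold starts. -->
    <min-idle-instances>2</min-idle-instances>
    <max-idle-instances>2</max-idle-instances>
  </automatic-scaling>
</appengine-web-app>
```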
 

> * Can you try automatic-automatic for idle instances setting?

I played around with this over the last few days and nothing changed. As I wrote: I had this configuration for months and it worked fine until 3-4 weeks ago!

> * What is the purpose of those Pingdom checks? What happens if you stop them?

To be alerted if GAE is down again. "What happens if you stop that?" --> I wouldn't be angry anymore because I wouldn't notice the downtimes of my GAE application. ;)

Please forward http://code.google.com/p/googleappengine/issues/detail?id=8004 to the relevant GAE department.

As you can see, I'm still not convinced that the scheduler is misbehaving. I understand that your experience is a bit worse than 3 weeks ago, and I understand your urge to tell us 'fix it', but I'd say it's still within the range of 'expected behavior', at least for now.

If you feel differently, please let me know.

Regards,

-- Takashi
Re: [google-appengine] Weird Instance Scheduler Mos 8/24/12 9:39 AM
Hello Takashi,


> Actually there were almost 8 requests in a second. So App Engine likely needed more than one instance at this particular moment.

I thought this is why GAE has the concept of "pending latency" (which we discussed below).
Meaning: incoming requests may wait up to 15 seconds before a new instance is started. Therefore, when 8 requests arrive in one second, that
should not mean that more instances need to be started, especially if there is no other traffic in that minute, as in my example.
Otherwise it would be a very bad implementation:
starting a new instance means around 30s of waiting time, while serving 8 parallel requests from one instance would result in a maximum of
8 seconds for the last request (assuming that each request takes around 1 second).
There is no reason in this concrete example to fire up more instances and let requests wait more than 30 seconds until a new instance is loaded.
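A back-of-the-envelope sketch of this arithmetic (values taken from the discussion: roughly 1 s per request, roughly 30 s for a cold start):

```python
# Toy comparison: worst-case wait for 8 simultaneous requests.
# Assumptions from the thread: each request takes ~1 s to serve,
# and a loading request costs ~30 s before it can serve anything.

def worst_wait_single_instance(n_requests, secs_per_request=1.0):
    # One warm instance serving requests one after another:
    # the last request waits behind all the ones ahead of it.
    return n_requests * secs_per_request

def worst_wait_cold_start(cold_start_secs=30.0, secs_per_request=1.0):
    # A request routed to a brand-new instance waits for the full
    # startup before its own processing even begins.
    return cold_start_secs + secs_per_request

print(worst_wait_single_instance(8))  # 8.0 seconds
print(worst_wait_cold_start())        # 31.0 seconds
```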

> ... here is what you've seen in the past weeks.
>
> * You have almost always had the 'Automatic-2' idle instances setting.
> * More than 3 weeks ago, the number of loading requests was very low.
> * Recently you have seen more loading requests than before.

That's right! To be even more concrete: on August 16 the problems got significantly worse. Please check especially the time range from August 16 until today.

> First of all, it seems that you deployed 2 new versions on Aug 1 and Aug 2. Can you describe what kind of changes were in those versions?

I checked in our version control. As I wrote, no related changes were made! Just HTML/CSS stuff:
 * One picture upload
 * One HTML change
 * One JavaScript change
 * One CSS change


> And, to be fair, we don't know of any change in our scheduler around 3 weeks ago that could cause this issue.

And around August 16?

> More than 3 weeks ago, those 2 idle instances may have had longer lives than they do now, but that was never guaranteed behavior. Please think of it this way: you were just kind of lucky.

That shouldn't be luck! If GAE is not able to start Java instances in 5 to 10 seconds, there needs to be a guarantee that instances have longer lives. Otherwise Java applications on GAE are unusable, because users would see a lot of 30-second wait times (--> "failed requests"). (See also the next comment regarding resident instances.)


> If you want some instances always active, please set min idle instances.

I tried this some days ago. I had one resident instance, but that changed nothing: instances got started and stopped as before. I assumed that requests would go to the resident instance first, but that was not the case. The resident instance was idle, yet a dynamic instance got started and the request waited 30 seconds. Please check other discussions on this list and issues that report similar observations.

> As you can see, I'm still not convinced that the scheduler is misbehaving. I understand that your experience is a bit worse than 3 weeks ago, and I understand your urge to tell us 'fix it', but I'd say it's still within the range of 'expected behavior', at least for now.
> If you feel differently, please let me know.

Yes I do feel differently (please see answers above).

Please accept http://code.google.com/p/googleappengine/issues/detail?id=8004

Thanks
Mos
http://www.mosbase.com
Re: [google-appengine] Weird Instance Scheduler Takashi Matsuo (Google) 8/24/12 11:43 AM

Hi Mos,

On Sat, Aug 25, 2012 at 1:39 AM, Mos <mos...@googlemail.com> wrote:
Hello Takashi,


> Actually there were almost 8 requests in a second. So App Engine likely needed more than one instance at this particular moment.

I thought this is why GAE has the concept of "pending latency" (which we discussed below).
Meaning: incoming requests may wait up to 15 seconds before a new instance is started. Therefore, when 8 requests arrive in one second, that
should not mean that more instances need to be started, especially if there is no other traffic in that minute, as in my example.
Otherwise it would be a very bad implementation:
starting a new instance means around 30s of waiting time, while serving 8 parallel requests from one instance would result in a maximum of
8 seconds for the last request (assuming that each request takes around 1 second).
There is no reason in this concrete example to fire up more instances and let requests wait more than 30 seconds until a new instance is loaded.

Do you really read my e-mail?

Setting Max Pending Latency doesn't force requests to stay in the pending queue for the specified time. Please use Min Pending Latency instead.
Can you try this first? If it doesn't work, try 2 min idle instances.
 

> ... here is what you've seen in the past weeks.
>
> * You have almost always had the 'Automatic-2' idle instances setting.
> * More than 3 weeks ago, the number of loading requests was very low.
> * Recently you have seen more loading requests than before.

That's right! To be even more concrete: on August 16 the problems got significantly worse. Please check especially the time range from August 16 until today.

> First of all, it seems that you deployed 2 new versions on Aug 1 and Aug 2. Can you describe what kind of changes were in those versions?

I checked in our version control. As I wrote, no related changes were made! Just HTML/CSS stuff:
 * One picture upload
 * One HTML change
 * One JavaScript change
 * One CSS change


> And, to be fair, we don't know of any change in our scheduler around 3 weeks ago that could cause this issue.

And around August 16?

Sigh... isn't this a waste of time? What is the reason you picked that date?
 
 

> More than 3 weeks ago, those 2 idle instances may have had longer lives than they do now, but that was never guaranteed behavior. Please think of it this way: you were just kind of lucky.

That shouldn't be luck! If GAE is not able to start Java instances in 5 to 10 seconds, there needs to be a guarantee that instances have longer lives. Otherwise Java applications on GAE are unusable, because users would see a lot of 30-second wait times (--> "failed requests"). (See also the next comment regarding resident instances.)


> If you want some instances always active, please set min idle instances.

I tried this some days ago. I had one resident instance, but that changed nothing: instances got started and stopped as before. I assumed that requests would go to the resident instance first, but that was not the case. The resident instance was idle, yet a dynamic instance got started and the request waited 30 seconds.
Please check other discussions on this list and issues that report similar observations.

So I'd say please try 2. If you still see user-facing loading requests, you need more resident instances to eliminate them.
 

> As you can see, I'm still not convinced that the scheduler is misbehaving. I understand that your experience is a bit worse than 3 weeks ago, and I understand your urge to tell us 'fix it', but I'd say it's still within the range of 'expected behavior', at least for now.
> If you feel differently, please let me know.

Yes I do feel differently (please see answers above).

Please accept http://code.google.com/p/googleappengine/issues/detail?id=8004

So what is your expected behavior, and what is the actual result? Nobody on our team can do anything if you just keep saying "the setting that used to work doesn't work anymore" without trying my suggestions.

I think my answer is clear on at least some points: 1) You'd better use 'min pending latency' instead of 'max pending latency' to prevent new instances from spinning up as much as possible. 2) If you need longer instance lives, set an appropriate number of min idle instances.

-- Takashi
Re: [google-appengine] Weird Instance Scheduler Kristopher Giesing 8/24/12 12:49 PM
Hi Takashi,

I ran some experiments with an instance that received requests only from my own scripts (no user-facing traffic at all).

What I found was that sending requests at about 1 req/sec, regularly spaced, caused GAE to spin up new instances randomly.  If I set the min instances setting to anything but "automatic", the very first request would cause a new instance to spin up (this was true even if min instances was some high number, like 8, and I waited for all 8 instances to finish launching before sending a request; so in that case the number of instances jumped to 9 on the very first request).

The only solution I found for this behavior was to package the entire app as a backend.
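A minimal sketch of a probe along those lines (purely illustrative; the URL, pacing, and window below are placeholders, not the exact script used) might look like:

```python
import time
import urllib.request

def send_offsets(interval_secs, duration_secs):
    """Evenly spaced offsets (in seconds) at which to fire requests."""
    t, offsets = 0.0, []
    while t < duration_secs:
        offsets.append(t)
        t += interval_secs
    return offsets

def probe(url, interval_secs=1.0, duration_secs=60.0):
    """Send regularly spaced GETs; return per-request latency in seconds
    (None on timeout or connection failure)."""
    start = time.monotonic()
    latencies = []
    for offset in send_offsets(interval_secs, duration_secs):
        # Sleep until the scheduled send time to keep requests evenly spaced.
        time.sleep(max(0.0, start + offset - time.monotonic()))
        began = time.monotonic()
        try:
            urllib.request.urlopen(url, timeout=30).read()
            latencies.append(time.monotonic() - began)
        except OSError:
            latencies.append(None)
    return latencies

# Example (hypothetical app id):
# probe("http://your-app-id.appspot.com/", interval_secs=1.0, duration_secs=60.0)
```

Spikes in the returned latencies (tens of seconds instead of a fraction of a second) would mark requests that landed on a loading instance.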

- Kris
Re: [google-appengine] Weird Instance Scheduler Mos 8/24/12 1:00 PM
Thanks, Kris, for describing your case. That's what I saw in my experiments as well. The "min instances" setting is not the solution because it doesn't work as expected.
I hope someone from GAE's team takes this seriously and elaborates on it.


Re: [google-appengine] Weird Instance Scheduler Jeff Schnitzer 8/24/12 1:00 PM
I had a similar experience the last time I experimented with the
min-idle slider.  GAE would route new requests to cold starts rather
than use the idle resident instance.  Min-idle seemed like it should
be renamed "always idle": it made my UX horrible and my bill higher, and
I haven't touched it since.

There may be rational logic to the scheduler behavior, but it feels
broken to just about everyone who experiments with it.  I can only
guess at how many people have given up on GAE because of this issue.

Every time this comes up (and it comes up a *lot*), I'm going to repeat:

---> Requests should never be sent to cold instances.

Addressing that one issue seems like it would fix all the other
issues, or at least make them transparent enough that we can figure
out how to tune the scheduler on our own.

Jeff

Re: [google-appengine] Weird Instance Scheduler Mos 8/24/12 1:24 PM
> Setting Max Pending Latency doesn't force requests to be in the pending queue for the specified time. Please use Min Pending Latency instead.

As you know, my "Min Pending Latency" setting was automatic. The expectation is that GAE uses a reasonable default latency when it is "automatic".
And you're saying that every parallel request starts a new instance if it is "automatic"? That would be a "Min Pending Latency" of zero, not "automatic".

> If it doesn't work, try 2 min idle instances then

Please check the responses of other users in this thread. This feature is totally broken and cannot be used.


>> And around August 16?
> Sigh... isn't this a waste of time? What is the reason you picked that date?

Did you see/study my screenshots from the first post of this thread?
The statistics show that instance creation went crazy on that date. I double-checked it with the Pingdom reports.
Starting on that day there were even more downtimes.

> So I'd say please try 2. If you still see user-facing loading requests, you need more resident instances to eliminate them.

Again: as I wrote in my previous post, that does not work. Check the responses from Kristopher and Jeff in this thread.


> So what is your expected behavior, and what is the actual result? Nobody on our team can do anything if you just keep saying "the setting that used to work doesn't work anymore" without trying my suggestions.
> I think my answer is clear on at least some points: 1) You'd better use 'min pending latency' instead of 'max pending latency' to prevent new instances from spinning up as much as possible. 2) If you need longer instance lives, set an appropriate number of min idle instances.

As I wrote: I tried different settings, as have many other people in this group.
I and others are reporting: the settings are broken!
It's very easy to reproduce. Please set up an application, send one request per minute (or second), configure 1, 2, or 3 min idle instances, and check what happens. You will see that new instances are started although resident instances are available.

Please take this seriously and let one of the engineers check it!

Cheers
Mos
Re: [google-appengine] Weird Instance Scheduler Johan Euphrosine (Google) 8/24/12 1:58 PM

Hi all,

Please review the following thread, where the lead engineer working on the scheduler (Jon McAlister) took the time to explain the behavior of min idle instances in great detail:
https://groups.google.com/d/msg/google-appengine/nRtzGtG9790/hLS16qux_04J

Once you have read it, we can discuss whether what you're experiencing is really a bug, or whether you want the scheduler to behave differently from its current implementation; in the latter case, the more constructive way out of this discussion is to file a feature request and get it starred by your peers.

Re: [google-appengine] Weird Instance Scheduler Mos 8/24/12 2:28 PM
Thanks Johan. I read that post some days ago.

As often discussed on this mailing list before, and as Jeff said in this thread:
it's the combination of "requests should never be sent to cold instances" and(!) the behavior of min idle instances that doesn't make any sense.

Please check the last comment of http://code.google.com/p/googleappengine/issues/detail?id=8004 where I wrote down the problems from my point of view.

Senior Java developers on this list, who have many months of experience with GAE, have stated again and again that there is a big issue around instance handling.
I think you have to trust your power users and assign a team to work on this!
Re: [google-appengine] Weird Instance Scheduler Johan Euphrosine (Google) 8/24/12 2:59 PM


On Aug 24, 2012 11:28 PM, "Mos" <mos...@googlemail.com> wrote:
>
> Thanks Johan. I read that post some days ago.
>
> As often discussed on this mailing list before, and as Jeff said in this thread:
> it's the combination of "requests should never be sent to cold instances"

Please star this existing feature request:
http://code.google.com/p/googleappengine/issues/detail?id=7865

> and(!) the behavior of min idle instances that doesn't make any sense.

As Jon explained in the post I linked, the scheduler will favor routing traffic to an idle dynamic instance rather than an idle reserved instance, and it will always try to maintain the invariant of N x Min-Idle-Instances by starting a new instance if the reserved instances are busy.
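A toy model of that routing policy and invariant (purely illustrative; the real scheduler is far more involved, and the class and method names here are made up):

```python
# Toy model of the routing described above: idle dynamic instances are
# preferred over reserved (min-idle) instances, and whenever a reserved
# instance becomes busy, a replacement cold start begins so the reserved
# pool returns to its configured size.

class ToyScheduler:
    def __init__(self, min_idle):
        self.min_idle = min_idle
        self.reserved_idle = min_idle  # reserved instances kept warm
        self.dynamic_idle = 0          # warm dynamic instances, currently idle
        self.starting = 0              # cold starts in flight

    def route(self):
        """Route one request; return which kind of instance serves it."""
        if self.dynamic_idle > 0:
            self.dynamic_idle -= 1
            return "dynamic"
        if self.reserved_idle > 0:
            self.reserved_idle -= 1
            # Maintain the min-idle invariant: start a replacement.
            self.starting += 1
            return "reserved"
        # Nothing warm is available: the request triggers a cold start.
        self.starting += 1
        return "cold-start"

sched = ToyScheduler(min_idle=1)
print(sched.route())  # "reserved" (and a backfill instance starts)
print(sched.route())  # "cold-start" (the backfill isn't warm yet)
```

Under this simplified model, a burst that arrives before backfills warm up will hit cold starts even though a min-idle setting is configured, which matches the observations reported earlier in the thread.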

If you want different behavior, like a new slider for Min-Instances, please file a new feature request.

> Please check the last comment of http://code.google.com/p/googleappengine/issues/detail?id=8004 where I wrote down the problems from my point of view.
>

I would suggest starring existing feature requests or defects, or filing new ones, for the separate issues you identified, rather than aggregating comments on this production issue.

> Senior Java developers on this list, who have many months of experience with GAE, have stated again and again that there is a big issue around instance handling.
> I think you have to trust your power users and assign a team to work on this!
>

We do trust our power users and recognize their frequent contributions to the App Engine developer community.

But as in any developer community, there are preferred ways of making feature requests or reporting bugs, and I'm just trying to direct your feedback to where it is more likely to produce results.

If you file a feature request and it gets starred enough, the engineering team will definitely consider it, as each team regularly looks at the tracker for the most starred feature requests of their respective component.

If you file a bug, provide a way to reproduce it, and it gets starred enough, it will get triaged and escalated to the corresponding team, as we regularly triage the issue tracker when organizing bug-squashing sessions.

That's how things are supposed to work. But sometimes we lag, forget about issues, or miss them. You are more than welcome to point us at specific issue tracker entries when this happens so we can correct our mistakes.

Re: [google-appengine] Weird Instance Scheduler Armen Danielyan 8/24/12 3:12 PM
Hi Mos,

I have experienced very similar issues over the last week. The loading time of my website increased from 2-3 seconds to 10-15 seconds all of a sudden. When it happened, I tried to change the app settings: I changed from F1 to F2 and increased the idle instances setting, but nothing helped.

During the week I kept playing with the settings with no results. Then the issue suddenly resolved itself today. I want to stress that yesterday the problem was still there, and I haven't changed any settings since then.

That means the problem was on Google's side, and they solved it silently. It's a shame that Google doesn't admit its mistakes and keeps saying it's our fault because we didn't configure our applications the right way. I will never deploy another application on GAE.
Re: [google-appengine] Weird Instance Scheduler Johan Euphrosine (Google) 8/24/12 3:34 PM
On Fri, Aug 24, 2012 at 11:12 PM, Armen Danielyan <adani...@fhi360.org> wrote:
> Hi Mos,
>
> I have experienced very similar issues for the last week. The loading time
> of my website increased from 2-3 seconds to 10-15 seconds all of a sudden.
> When it happened I tried to change the app settings, changed it from F1 to
> F2, increased the idle instances setting but nothing helped.
>
> During the week I was playing with settings with no results. Suddenly the
> issue has been solved by itself today. I want to stress, that yesterday the
> problem was still there, and I haven't changed any settings since then.
>
> It means that the problem was on Google's side, and they solved it silently.
> It's a shame Google doesn't accept their mistakes, and keep saying that it's
> our fault because we didn't configure our applications in a right way. I
> will never deploy any new application on GAE.
>

Hi Armen,

Are you affected by
http://code.google.com/p/googleappengine/issues/detail?id=7706?

The engineering team is working on reducing the high performance
variance for apps that need to load a lot of code on a loading request
(typically Java apps with "big" dependencies like Spring or Guice, or
that depend on a lot of jars).

If your app is hit by this problem, please star this issue and comment
with your application id.

Thanks in advance.
> To view this discussion on the web visit
> https://groups.google.com/d/msg/google-appengine/-/CxcnspcZJfsJ.
>
> To post to this group, send email to google-a...@googlegroups.com.
> To unsubscribe from this group, send email to
> google-appengi...@googlegroups.com.
> For more options, visit this group at
> http://groups.google.com/group/google-appengine?hl=en.



--
Johan Euphrosine (proppy)
Developer Programs Engineer
Google Developer Relations
Re: [google-appengine] Weird Instance Scheduler Takashi Matsuo (Google) 8/24/12 7:59 PM
On Sat, Aug 25, 2012 at 5:24 AM, Mos <mos...@googlemail.com> wrote:
> Setting Max Pending Latency doesn't force requests to be in the pending queue for the specified time. Please use Min Pending Latency instead.

As you know, my "Min Pending Latency" setting was automatic. The expectation is that GAE uses a reasonable default latency if it is "automatic".
And you say: every parallel request starts a new instance if it is "automatic"? That would be a "Min Pending Latency" of zero, not "automatic".

> If it doesn't work, try 2 min idle instances then

Please check the responses of other users in this thread.  This feature is totally broken and cannot be used.


>> And around the 16th of August?  
> Sigh... isn't it a waste of time? What is the reason you picked that date?

Did you see/study my screenshots from the first post of this thread?
The statistics show that on this date instance creation went crazy.  I double-checked it with the Pingdom reports.
Starting on that day there were even more downtimes.

> So I'd say please try 2. If you still see user-facing loading requests, you need more resident instances to eliminate the user-facing loading requests.

Again: as I wrote in my previous post, that does not work. Check the responses from Kristopher and Jeff in this thread.


Yeah, it would be very nice to hear concrete examples from Kristopher and Jeff, rather than just "I've tried that, but it didn't work".
 

> So what is your expected behavior and actual result? Nobody on our team can do anything if you just keep saying "the setting that used to work doesn't work anymore" without trying my suggestion.
> I think my answer is clear at least on some points. 1) You'd better use 'min pending latency' instead of 'max pending latency' to prevent new instances from spinning up as much as possible. 2) If you need longer instance lives, set an appropriate number of min idle instances.

As I wrote: I tried different settings, as did many other people in this group.
Other people and I keep reporting: the settings are broken!
It's very easy to reproduce. Please set up an application, send one request per minute (or second), configure 1 or 2 or 3 min idle instances, and check what happens. You will see that new instances are started although resident instances are available.

It's nice if we have a completely reproducible case. I've just started the experiment you mentioned. This time, it's just a helloworld application, and I set 1 min idle instance and a 1-minute cron.

Presumably it will just work fine. Then I will try with slightly different conditions. That way, I hope I can determine what kinds of conditions could be the culprit or not. What do you think? Can you provide some simple projects for that experiment?
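For reference, the 1-minute cron used in that kind of experiment can be declared in a Java app's WEB-INF/cron.xml roughly like this (the /ping URL is a placeholder; point it at any cheap handler):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<cronentries>
  <!-- Hits the app once a minute, mimicking the Pingdom probe traffic -->
  <cron>
    <url>/ping</url>
    <description>keep-alive probe every minute</description>
    <schedule>every 1 minutes</schedule>
  </cron>
</cronentries>
```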


Please take it seriously and let one of the engineers check this!

(I'm one of the engineers btw) A reproducible case is always the best thing to get engineers' attention.

Regards,

-- Takashi
Re: [google-appengine] Weird Instance Scheduler Kristopher Giesing 8/24/12 9:02 PM
On Friday, August 24, 2012 2:59:11 PM UTC-7, Johan Euphrosine (Google) wrote:


On Aug 24, 2012 11:28 PM, "Mos" <mos...@googlemail.com> wrote:
>
> Thanks Johan. I read the post some days ago.
>
> As often discussed on the mailing list before, and as Jeff said in this thread,
> it's the combination of "Requests should never be sent to cold instances."

Please star this existing feature request:
http://code.google.com/p/googleappengine/issues/detail?id=7865

Done.
 

> and(!) the behavior of min idle instances, which doesn't make any sense.

Like Jon explained in the post I linked, the scheduler will favor routing traffic to an idle dynamic instance rather than an idle reserved instance, and it will always try to maintain the invariant of N = Min-Idle-Instances idle instances by starting a new instance if the reserved instances are busy.

OK, the post by Jon was an interesting read because it explains why Google seems to think everything is working as intended.  What doesn't seem to be penetrating is that it doesn't matter what some definition on a piece of paper somewhere says the system is supposed to do, if that definition doesn't actually help developers build good products.

The feature starred above absolutely needs to be implemented.  I just wish there was an easier way of getting customers who are frustrated by the instancing behavior to focus on that one feature request, because the naive interpretation of the existing GAE tuning parameters suggests it shouldn't be necessary.

- Kris
Re: [google-appengine] Weird Instance Scheduler Kristopher Giesing 8/24/12 9:04 PM

Like Jon explained in the post I linked, the scheduler will favor routing traffic to an idle dynamic instance rather than an idle reserved instance, and it will always try to maintain the invariant of N = Min-Idle-Instances idle instances by starting a new instance if the reserved instances are busy.

PS. The behavior described above is not really the problem IMHO; the problem is that the scheduler favors routing traffic to NONEXISTENT dynamic instances rather than idle reserved instances.  No one seems to understand why that would ever be a good idea.

- Kris
Re: [google-appengine] Weird Instance Scheduler Kristopher Giesing 8/26/12 2:47 PM
Hi Takashi,

I created a new GAE app to test this and found that I'm not getting the same instance tuning controls in the new app ID that I am getting in my current one.

In my current app, I can set both min and max idle instances, and min and max pending latency.

In the new app, I can set only max idle instances, and min pending latency.

Any ideas why this would be the case?  It complicates the process of setting up a good testbed for this.

- Kris
Re: [google-appengine] Weird Instance Scheduler Jeff Schnitzer 8/26/12 3:10 PM
Free apps have limited controls - perhaps you haven't enabled billing
on the test app?

Jeff
Re: Weird Instance Scheduler Mobicage 8/26/12 3:25 PM
Hi

Can somebody explain how it is possible that
- I have 1 "resident" Java App Engine instance
- I didn't send any test requests for 90 minutes
- when I sent a request, the system had to warm up, although I have 1 resident instance

What exactly does "resident" mean?
My settings are:
* Idle Instances: ( 1 – 1 )
* Pending latency: ( Automatic – 500ms )

Logs (blanked out some stuff with xxx):

2012-08-24 22:14:43.786 /xxx 200 8872ms 0kb AppEngine-Google; (+http://code.google.com/appengine; appid: s~xxx)
0.1.0.40 - - [24/Aug/2012:15:14:43 -0700] "POST /xxx HTTP/1.1" 200 793 - "AppEngine-Google; (+http://code.google.com/appengine; appid: s~xxx)" "xxx.appspot.com" ms=8873 cpu_ms=5157 cpm_usd=0.000089 loading_request=1 instance=00c61b117c1360f2e8f3cdce153a3c79777a7e81

I 2012-08-24 22:14:43.786
This request caused a new process to be started for your application, and thus caused your application code to be loaded for the first time. This request may thus take longer and use more CPU than a typical request for your application.

2012-08-24 20:46:47.635 /xxx 200 1862ms 0kb AppEngine-Google; (+http://code.google.com/appengine)
0.1.0.2 - - [24/Aug/2012:13:46:47 -0700] "POST /xxx HTTP/1.1" 200 0 "xxx" "AppEngine-Google; (+http://code.google.com/appengine)" "xxx.appspot.com" ms=1863 cpu_ms=260 cpm_usd=0.000260 queue_name=default task_name=9251619801338774487 instance=00c61b117c266e667e6954feaef8bf2492f23006
 
Thank you!
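As an aside, the `loading_request=1` marker in the raw log line above is the tell-tale sign of a cold start. A minimal sketch of checking for it (assuming the space-separated key=value fields shown in these samples):

```java
// Minimal helper for the raw App Engine log lines quoted above: tells
// whether a line was a loading (cold-start) request and pulls out the
// instance id. Illustration only; it assumes the space-separated
// key=value fields seen in the samples.
public class LogCheck {

    static boolean isLoadingRequest(String line) {
        return line.contains("loading_request=1");
    }

    static String instanceId(String line) {
        for (String field : line.split("\\s+")) {
            if (field.startsWith("instance=")) {
                return field.substring("instance=".length());
            }
        }
        return null; // no instance field on this line
    }

    public static void main(String[] args) {
        String sample = "ms=8873 cpu_ms=5157 loading_request=1 "
                + "instance=00c61b117c1360f2e8f3cdce153a3c79777a7e81";
        System.out.println(isLoadingRequest(sample)); // true
        System.out.println(instanceId(sample));       // the instance id
    }
}
```

Counting such lines per instance id over a day shows how often the scheduler is cold-starting instances.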
Re: [google-appengine] Weird Instance Scheduler Johan Euphrosine (Google) 8/26/12 4:10 PM
On Sat, Aug 25, 2012 at 5:02 AM, Kristopher Giesing
<kris.g...@gmail.com> wrote:
> On Friday, August 24, 2012 2:59:11 PM UTC-7, Johan Euphrosine (Google)
> wrote:
>>
>>
>> On Aug 24, 2012 11:28 PM, "Mos" <mos...@googlemail.com> wrote:
>> >
>> > Thanks Johan. I read the post some days before.
>> >
>> > As often discussed on the mailing-list before and as Jeff said in this
>> > thread.
>> > It's the combination of "Requests should never be sent to cold
>> > instances."
>>
>> Please star this existing feature request:
>> http://code.google.com/p/googleappengine/issues/detail?id=7865
>
> Done.
>
>>
>> > and(!) the behavior of min idle instance which doesn't make any sense.
>>
>> Like Jon explained in the post I linked, the scheduler will favor routing
>> traffic to an idle dynamic instance rather than an idle reserved instance, and
>> it will always try to maintain the invariant of N = Min-Idle-Instances idle
>> instances by starting a new instance if the reserved instances are busy.
>
> OK, the post by Jon was an interesting read because it explains why Google
> seems to think everything is working as intended.  What doesn't seem to be
> penetrating is that it doesn't matter what some definition on a piece of
> paper somewhere says the system is supposed to do, if that definition
> doesn't actually help developers build good products.

It does penetrate, and we do value feedback from the community on the scheduler.

What I was trying to point out by referring to Jon's post was:
- Here is how the scheduler has been designed.
- If you disagree with the design, the group is a good place to
discuss it, but ultimately we would like to reach the point where
more specific feature requests are filed (like
http://code.google.com/p/googleappengine/issues/detail?id=7865) that
we can escalate to the engineering team.

> The feature starred above absolutely needs to be implemented.  I just wish
> there was an easier way of getting customers who are frustrated by the
> instancing behavior to focus on that one feature request, because the naive
> interpretation of the existing GAE tuning parameters suggests it shouldn't
> be necessary.

I agree we must do a better job of documenting the current scheduler
behavior; care to star this feature request? :)
http://code.google.com/p/googleappengine/issues/detail?id=5826

>
> - Kris
>
Re: [google-appengine] Re: Weird Instance Scheduler Johan Euphrosine (Google) 8/26/12 4:26 PM
On Sun, Aug 26, 2012 at 11:25 PM, Mobicage <ca...@mobicage.com> wrote:
> Hi
>
> Can somebody explain how it is possible that
> - I have 1 "resident" java appengine instance
> - I didnt send any test requests for 90 minutes
> - When I sent the request, the system had to warm up, although I have 1
> resident java appengine.
>
> What does resident exactly mean?

I believe Jon described "resident" in great detail in this thread:
https://groups.google.com/d/msg/google-appengine/nRtzGtG9790/hLS16qux_04J

"""
Because the scheduler is now treating the reserved instances as
Min-Idle-Instances, what you're describing is expected
behavior. They are intentionally kept idle, and it tries to serve
traffic using the non-reserved instances. Then, if the
non-reserved instances can't keep up, then it will make use of
the reserved instances.
That is, to repeat, the invariant that the scheduler is trying to
maintain here is that your app has at least 3 idle instances.
And if an instance is getting traffic, then it isn't idle. The value
of an idle instance is that it can process requests right-away
if needed, without having to first warmup or do a loading request.

It sounds like what you'd really prefer is something like
Min-Instances, but that's not presently an available option.
"""

The scheduler routes traffic to resident idle instances when dynamic
instances can't keep up with the traffic, and always tries to keep Min Idle
Instances in reserve at all times.

I believe it currently needs to start at least 1 dynamic instance before
making use of the resident idle capacity.
Re: [google-appengine] Re: Weird Instance Scheduler Johan Euphrosine (Google) 8/26/12 5:57 PM
On Mon, Aug 27, 2012 at 12:26 AM, Johan Euphrosine <pro...@google.com> wrote:
> On Sun, Aug 26, 2012 at 11:25 PM, Mobicage <ca...@mobicage.com> wrote:
>> Hi
>>
>> Can somebody explain how it is possible that
>> - I have 1 "resident" java appengine instance
>> - I didnt send any test requests for 90 minutes
>> - When I sent the request, the system had to warm up, although I have 1
>> resident java appengine.
>>
>> What does resident exactly mean?
>
> I believe Jon described resident in great details in this thread:
> https://groups.google.com/d/msg/google-appengine/nRtzGtG9790/hLS16qux_04J
>
> """
> Because the scheduler is now treating the reserved instances as
> Min-Idle-Instances, what you're describing is expected
> behavior. They are intentionally kept idle, and it tries to serve
> traffic using the non-reserved instances. Then, if the
> non-reserved instances can't keep up, then it will make use of
> the reserved instances.
> That is, to repeat, the invariant that the scheduler is trying to
> maintain here is that your app has at least 3 idle instances.
> And if an instance is getting traffic, then it isn't idle. The value
> of an idle instance is that it can process requests right-away
> if needed, without having to first warmup or do a loading request.
>
> It sounds like what you'd really prefer is something like
> Min-Instances, but that's not presently an available option.
> """
>
> The scheduler routes traffic to resident idle instances when dynamic
> instances can't keep up with the traffic, and always tries to keep Min Idle
> Instances in reserve at all times.
>
> I believe it currently needs to start at least 1 dynamic instance before
> making use of the resident idle capacity.

Correction, I answered that too fast.

Resident instances are used for processing incoming requests if there
is no dynamic instance, but it is possible that the scheduler warms up a
new dynamic instance to maintain the Min Idle Instances invariant.

carl@ if you can give us your application id and the last timestamp
of occurrence of the event you reported, we can try to figure out why
your test spawned a new dynamic instance.

>
>> My settings are:
>> * Idle Instances: ( 1 – 1 )
>> * Pending latency: (Automatic - 500ms)
>>
>> Logs (blanked out some stuff with xxx):
>>
>> 2012-08-24 22:14:43.786 /xxx 200 8872ms 0kb AppEngine-Google;
>> (+http://code.google.com/appengine; appid: s~xxx)
>> 0.1.0.40 - - [24/Aug/2012:15:14:43 -0700] "POST /xxx HTTP/1.1" 200 793 -
>> "AppEngine-Google; (+http://code.google.com/appengine; appid: s~xxx)"
>> "xxx.appspot.com" ms=8873 cpu_ms=5157 cpm_usd=0.000089 loading_request=1
>> instance=00c61b117c1360f2e8f3cdce153a3c79777a7e81
>>
>> I 2012-08-24 22:14:43.786
>> This request caused a new process to be started for your application, and
>> thus caused your application code to be loaded for the first time. This
>> request may thus take longer and use more CPU than a typical request for
>> your application.
>>
>> 2012-08-24 20:46:47.635 /xxx 200 1862ms 0kb AppEngine-Google;
>> (+http://code.google.com/appengine)
>> 0.1.0.2 - - [24/Aug/2012:13:46:47 -0700] "POST /xxx HTTP/1.1" 200 0 "xxx"
>> "AppEngine-Google; (+http://code.google.com/appengine)" "xxx.appspot.com"
>> ms=1863 cpu_ms=260 cpm_usd=0.000260 queue_name=default
>> task_name=9251619801338774487
>> instance=00c61b117c266e667e6954feaef8bf2492f23006
>>
>> Thank you!
>>
>>
>
>
>
> --
> Johan Euphrosine (proppy)
> Developer Programs Engineer
> Google Developer Relations



--
Johan Euphrosine (proppy)
Developer Programs Engineer
Google Developer Relations
Re: [google-appengine] Re: Weird Instance Scheduler Carl Schroeder 8/26/12 8:59 PM
Let me see if I understand this correctly: there is currently no way on App Engine to ensure that there is an instance ready to process incoming requests for an app that has been idle for some period of time. Min idle instances (labeled as Resident) sit there and do almost nothing while user-facing requests are instead sent to cold instance starts. If true, that dovetails with what I have seen in the behavior of my app. For Python runtimes with sub-second spinup times, this is no big deal. For Java runtimes with spinup times in double-digit seconds it is a deal-breaker of a "feature".

The problem seems to be that the scheduler thinks sending a request to a non-existent dynamic instance is a better idea than using the Resident instance for its intended purpose: to serve requests when dynamic instances are unable to. This is probably a corner case born of low-traffic conditions that allow user-request-serving dynamic instances to despawn.

For low-traffic apps, "Resident" instances serve almost no purpose. Better to do away with them via the slider bars and just set up a script to tickle the app just often enough to keep one "Dynamic" instance resident.
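The tickle script mentioned above can be as small as a crontab entry on any always-on machine outside App Engine (the URL is a placeholder); each fetch counts as normal traffic, so the scheduler keeps a dynamic instance around:

```shell
# crontab entry: fetch the app once a minute (placeholder URL)
* * * * * curl -s -o /dev/null http://your-app-id.appspot.com/ping
```

Whether once a minute is often enough depends on how long the scheduler keeps an idle dynamic instance alive, which this thread suggests is not something you can rely on.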

So, two features to fix this: 
First, a slider bar labeled "Minimum Dynamic instances" ;)
Second, a button to enable sending warm-up requests and having them return before considering an instance for user facing requests.


Re: [google-appengine] Weird Instance Scheduler Kristopher Giesing 8/26/12 10:14 PM
I didn't realize that.  That makes sense now, thanks.

Kind of a drag that I need to pay for a test app though.

- Kris

On Sunday, August 26, 2012 3:10:43 PM UTC-7, Jeff Schnitzer wrote:
Free apps have limited controls - perhaps you haven't enabled billing
on the test app?

Jeff

On Sun, Aug 26, 2012 at 5:47 PM, Kristopher Giesing
<kris.g...@gmail.com> wrote:
> Hi Takashi,
>
> I created a new GAE app to test this and found that I'm not getting the same
> instance tuning controls in the new app ID that I am getting in my current
> one.
>
> In my current app, I can set both min and max idle instances, and min and
> max pending latency.
>
> In the new app, I can set only max idle instances, and min pending latency.
>
> Any ideas why this would be the case?  It complicates the process of setting
> up a good testbed for this.
>
> - Kris
>
>>
>>>
>>> Cheers
>>> Mos
>>>
>>>
>>> On Fri, Aug 24, 2012 at 8:43 PM, Takashi Matsuo <tma...@google.com>
>>> wrote:
>>>>
>>>>
>>>> Hi Mos,
>>>>
>>>> On Sat, Aug 25, 2012 at 1:39 AM, Mos <mos...@googlemail.com> wrote:
>>>>>
>>>>> Hello Takashi,
>>>>>
>>>>>
>>>>> > Actually there were almost 8 requests in a second. So App Engine
>>>>> > likely needed more than one instance at this particular moment.
>>>>>
>>>>> I thought this is why GAE has the concept of "pending-latency"  (which
>>>>> we discussed below).
>>>>> Meaning:  Incoming requests may wait up to 15 seconds before starting a
>>>>> new instance. Therefore, when 8 requests occur in one second, that
>>>>> should not mean that more instances need to be started. Especially if
>>>>> there is no other traffic in this minute, as seen in my example.
>>>>> Otherwise it would be a very bad implementation:
>>>>> Starting a new instance means around 30s waiting time.  Serving 8
>>>>> parallel requests from one instance, would result in a maximum of
>>>>> 8 seconds for the last request (assuming that each request takes around
>>>>> 1 second).
>>>>> There is no reason for this concrete example to fire up more instances
>>>>> and let requests wait more than 30 seconds until a new instance is loaded.
>>>>
>>>>
>>>> Did you really read my e-mail?
>>>>
>>>> Setting Max Pending Latency doesn't force requests to be in the pending
>>>> queue for the specified time. Please use Min Pending Latency instead.
>>>> Can you try this first? If it doesn't work, try 2 min idle instances
>>>> then.
>>>>
>>>>>
>>>>>
>>>>> > ... here is what you've seen in the past weeks.
>>>>> >
>>>>> >* You have been almost always set 'Automatic-2' idle instance setting.
>>>>> >* More than 3 weeks ago, number of loading requests were very few.
>>>>> > * Recently you have seen more loading requests than before.
>>>>>
>>>>> That's right!  To be even more concrete: on the 16th of August the problems
>>>>> got significantly worse. Please check especially the time range from 16 August
>>>>> until today.
>>>>>
>>>>> > First of all, it seems that you deployed 2 new versions on Aug 1 and
>>>>> > Aug 2. Can you describe what kind of changes in those versions?
>>>>>
>>>>> I checked it in our version control. As I wrote, no related changes were
>>>>> made! Just HTML/CSS stuff:
>>>>>  * One picture upload
>>>>>  * One html change
>>>>>  * One JavaScript change
>>>>>  * One css change
>>>>>
>>>>>
>>>>> > And, to be fair, we didn't think of any change in our scheduler
>>>>> > around 3 weeks ago which can cause this issue.
>>>>>
>>>>> And around the 16th august?
>>>>
>>>>
>>>> Sigh... isn't it a waste of time? What is the reason you picked that
>>>> date?
>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> > More than 3 weeks before, those 2 idle instances might have had
>>>>> > longer lives than now, but it was not a concrete behavior. Please think this
>>>>> > way: you were just kind of lucky.
>>>>>
>>>>> That shouldn't be luck! If GAE is not able to start Java instances in
>>>>> 5 to 10 seconds, there needs to be a guarantee that instances have longer
>>>>> lives.  Otherwise Java applications on GAE are unusable because users would
>>>>> have a lot of 30-second wait times  (--> "failed requests"). (See also the next
>>>>> comment regarding resident instances)
>>>>>
>>>>>
>>>>> > If you want some instances always active, please set min idle
>>>>> > instances.
>>>>>
>>>>> I tried this some days ago. I had one resident instance. But that
>>>>> changed nothing.  Instances got started and stopped as before. I assumed
>>>>> that requests would go to the resident instance first. But that was not the
>>>>> case. The resident instance was idle, but a dynamic instance got started and
>>>>> the request waited 30 sec.
>>>>>
>>>>> Please check other discussion on this list and issues that reported
>>>>> similar observations.
>>>>
>>>>
>>>> So I'd say please try 2. If you still see user-facing loading
>>>> requests, you need more resident instances to eliminate the user-facing
>>>> loading requests.
>>>>
>>>>>
>>>>>
>>>>> > As you can see, I'm still not convinced to believe that the scheduler
>>>>> > is misbehaving. I understand that you're having experiences which are bit
>>>>> > worse than 3 weeks ago, and understand your feeling that you want to tell us
>>>>> > 'fix it', but I'd say it's > >still something in the line of 'expected
>>>>> > behavior' at least for now.
>>>>> > If you feel differently, please let me know.
>>>>>
>>>>> Yes I do feel differently (please see answers above).
>>>>>
>>>>> Please accept
>>>>> http://code.google.com/p/googleappengine/issues/detail?id=8004
>>>>
>>>>
>>>> So what is your expected behavior and actual result? Nobody on our team
>>>> can do anything if you just keep saying "the setting that used to work
>>>> doesn't work anymore" without trying my suggestion.
>>>>
>>>> I think my answer is clear at least on some points. 1) You'd better use
>>>> 'min pending latency' instead of 'max pending latency' to prevent new
>>>> instances from spinning up as much as possible. 2) If you need longer instance
>>>> lives, set an appropriate number of min idle instances.
>>>>
>>>> -- Takashi
>>>>
>>>>>
>>>>>
>>>>>
>>>>> Thanks
>>>>> Mos
>>>>> http://www.mosbase.com
>>>>>
>>>>>
>>>>> On Fri, Aug 24, 2012 at 4:22 PM, Takashi Matsuo <tma...@google.com>
>>>>> wrote:
>>>>>>
>>>>>>
>>>>>> Hi Mos,
>>>>>>
>>>>>> On Fri, Aug 24, 2012 at 6:05 PM, Mos <mos...@googlemail.com> wrote:
>>>>>>>
>>>>>>> > A possible explanation could be that the traffic pattern had
>>>>>>> > changed.
>>>>>>>
>>>>>>> No. It's the same. Check for example the Request/Seconds statistics
>>>>>>> of my application for the last 30 days!
>>>>>>>
>>>>>>>
>>>>>>> >> It's very obvious that one instance should be enough for my
>>>>>>> >> application. And that was almost the case the last months!
>>>>>>> > Actually it's not true. In particular, check this log:
>>>>>>>
>>>>>>> That's one exception where one client did 8 requests in a minute  (+
>>>>>>> one Pingdom). Nothing else this minute.
>>>>>>> In those exceptional cases it could be OK if a second instance
>>>>>>> starts. (Nevertheless, can't one instance
>>>>>>> handle 8 requests a minute?)
>>>>>>
>>>>>>
>>>>>> The issue here is not 8 requests in a minute. Actually there were
>>>>>> almost 8 requests in a second. So App Engine likely needed more than one
>>>>>> instance at that particular moment. Anyway, as you say, it's probably just
>>>>>> the reason for one of the loading requests you're seeing, and it is not a
>>>>>> very important point in this topic.
>>>>>>
>>>>>> It's kind of a digression, but at first glance the Requests/Second
>>>>>> stat seems an appropriate data source for discussing how many instances are
>>>>>> actually needed. In fact, it's not: the real traffic does not spread
>>>>>> evenly.
>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> As I described: instances are started and stopped without reason,
>>>>>>> even when there is less traffic per minute!
>>>>>>
>>>>>>
>>>>>> Okay. As far as I understand, here is what you've seen in the past
>>>>>> weeks.
>>>>>>
>>>>>> * You have almost always used the 'Automatic-2' idle instance setting.
>>>>>> * More than 3 weeks ago, the number of loading requests was very low.
>>>>>> * Recently you have seen more loading requests than before.
>>>>>>
>>>>>> First of all, it seems that you deployed 2 new versions on Aug 1 and
>>>>>> Aug 2. Can you describe what kind of changes went into those versions?
>>>>>> I'd like to make sure that there are no changes that could cause the
>>>>>> scheduler/app server to behave differently.
>>>>>>
>>>>>> Especially if you want me to escalate this issue to our engineering
>>>>>> team, you should provide exact information. You say 'My application is
>>>>>> unchanged', but in fact you deployed a new version on the very day you
>>>>>> say the issue started. I need to make sure that there is no big change
>>>>>> which could cause something bad.
>>>>>>
>>>>>> And, to be fair, we aren't aware of any change in our scheduler around
>>>>>> 3 weeks ago that could cause this issue.
>>>>>>
>>>>>> Secondly, you're setting max idle instances = 2. That does not guarantee
>>>>>> that you always have 2 instances. It only guarantees that we will never
>>>>>> charge you for more than 2 idle instances at any time.
>>>>>>
>>>>>> More than 3 weeks ago, those 2 idle instances might have had longer
>>>>>> lives than now, but that was never a guaranteed behavior. Please think of
>>>>>> it this way: you were just lucky. Now, presumably, one or two of those
>>>>>> instances are occasionally killed for some reason (there should be
>>>>>> legitimate reasons, but those are something you don't need to care about).
>>>>>>
>>>>>> If you want some instances always active, please set min idle
>>>>>> instances. Certainly it will cost you a bit more, and you will lose the
>>>>>> pending queue, but considering the access pattern of your app (no bursty
>>>>>> traffic except for a few accesses from the iPhone browser), I would recommend
>>>>>> trying this setting in order to achieve what you want here. I'd recommend 2
>>>>>> idle instances in this case, but you should decide the number.
>>>>>>
>>>>>>>
>>>>>>> > * What is the purpose of max-pending-latency = 14.9 setting?
>>>>>>>
>>>>>>> "[If it] is high App Engine will allow requests to wait rather than start
>>>>>>> new Instances to process them"
>>>>>>> --> One attempt to stop GAE from creating unnecessary instances.
>>>>>>
>>>>>>
>>>>>> I think you should set min pending latency instead of max pending
>>>>>> latency if you want to prevent new instances from spinning up. However, if
>>>>>> you're going to set min idle instances, this setting will mostly lose its
>>>>>> effect. If you don't want to set any min idle instances for whatever reason,
>>>>>> please consider setting min pending latency instead of max pending latency.
>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> > * Can you try automatic-automatic for idle instances setting?
>>>>>>>
>>>>>>> I played around with this over the last days and nothing changed. As I
>>>>>>> wrote: I had that configuration for months and it worked fine until 3-4
>>>>>>> weeks ago!
>>>>>>>
>>>>>>>
>>>>>>> > * What is the purpose of those pingdom check? What happens if you
>>>>>>> > stop that?
>>>>>>>
>>>>>>> To be alerted if GAE is down again. "What happens if you stop
>>>>>>> that?" --> I wouldn't be angry anymore, because I wouldn't notice the
>>>>>>> downtimes of my GAE application. ;)
>>>>>>>
>>>>>>>
>>>>>>> Please forward
>>>>>>> http://code.google.com/p/googleappengine/issues/detail?id=8004  to the
>>>>>>> relevant GAE department.
>>>>>>
>>>>>>
>>>>>> As you can see, I'm still not convinced that the scheduler
>>>>>> is misbehaving. I understand that your experience is a bit
>>>>>> worse than 3 weeks ago, and I understand your feeling that you want to tell us
>>>>>> 'fix it', but I'd say it's still something along the lines of 'expected
>>>>>> behavior', at least for now.
>>>>>>
>>>>>> If you feel differently, please let me know.
>>>>>>
>>>>>> Regards,
>>>>>>
>>>>>> -- Takashi
>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> Thanks!
>>>>>>>
>>>>>>>
>>>>>>> On Fri, Aug 24, 2012 at 1:39 AM, Takashi Matsuo <tma...@google.com>
>>>>>>> wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>> Hi Mos,
>>>>>>>>
>>>>>>>> On Thu, Aug 23, 2012 at 4:58 AM, Mos <mos...@googlemail.com> wrote:
>>>>>>>>>
>>>>>>>>> Does anybody else experience abnormal behavior of the
>>>>>>>>> instance-scheduler in the last three weeks (in the last 7 days it got even
>>>>>>>>> worse)?  (Java / HRD)
>>>>>>>>> Or does anybody have profound knowledge about it?
>>>>>>>>>
>>>>>>>>> Background:  My application has been unchanged for weeks, the
>>>>>>>>> configuration has not changed, and the application's traffic is constant.
>>>>>>>>> Traffic: one request per minute from Pingdom and around 200
>>>>>>>>> additional pageviews per day (== around 1500 pageviews per day). The peak is
>>>>>>>>> no more than 3-4 requests per minute.
>>>>>>>>
>>>>>>>>
>>>>>>>> A possible explanation could be that the traffic pattern had
>>>>>>>> changed.
>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> It's very obvious that one instance should be enough for my
>>>>>>>>> application. And that was almost the case the last months!
>>>>>>>>
>>>>>>>>
>>>>>>>> Actually it's not true. In particular, check this log:
>>>>>>>>
>>>>>>>> https://appengine.google.com/logs?app_id=s~krisen-talk&version_id=1-0.360912144269287698&severity_level_override=1&severity_level=3&tz=Europe%2FBerlin&filter=&filter_type=regex&date_type=datetime&date=2012-08-23&time=23%3A57%3A00&limit=20&view=Search
>>>>>>>>
>>>>>>>> You can see the iPhone client repeatedly requests your dynamic
>>>>>>>> resources in a very short amount of time. Presumably it's due to some kind
>>>>>>>> of 'prefetch' feature of that device. Are you aware of those accesses, and
>>>>>>>> that this access pattern can cause a new instance starting?
>>>>>>>>
>>>>>>>> I don't think this is the only reason, but this can explain that
>>>>>>>> some portion of your loading requests are expected behavior.
>>>>>>>>
>>>>>>>> Now I'd like to ask you some questions.
>>>>>>>>
>>>>>>>>
>>>>>>>> * What is the purpose of max-pending-latency = 14.9 setting?
>>>>>>>> * Can you try automatic-automatic for idle instances setting?
>>>>>>>> * What is the purpose of those pingdom check? What happens if you
>>>>>>>> stop that?
>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> But now GAE creates 3 instances most of the time, whereby one has a
>>>>>>>>> long lifetime of days and the other ones are restarted around
>>>>>>>>> 10 to 30 times a day.
>>>>>>>>> Because a loading request takes between 30s and 40s and requests are
>>>>>>>>> waiting for loading instances, there are many requests that
>>>>>>>>> fail  (Users and Pingdom agree: a request that takes more than a
>>>>>>>>> couple of seconds is a failed request!)
>>>>>>>>>
>>>>>>>>> Please check the attached screenshots that show the behavior!
>>>>>>>>>
>>>>>>>>> Note:
>>>>>>>>> - Killing instances manually did not help
>>>>>>>>> - Idle Instances were ( Automatic – 2 ).  Changing it to anything
>>>>>>>>> else, e.g. ( Automatic – 4 ), didn't change anything
>>>>>>>>>
>>>>>>>>> Thanks and Cheers
>>>>>>>>>
>>>>>>>>> Mos
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> You received this message because you are subscribed to the Google
>>>>>>>>> Groups "Google App Engine" group.
>>>>>>>>> To post to this group, send email to google-a...@googlegroups.com.
>>>>>>>>> To unsubscribe from this group, send email to
>>>>>>>>> google-appengi...@googlegroups.com.
>>>>>>>>>
>>>>>>>>> For more options, visit this group at
>>>>>>>>> http://groups.google.com/group/google-appengine?hl=en.
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> --
>>>>>>>> Takashi Matsuo | Developers Advocate | tma...@google.com
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> Takashi Matsuo | Developers Advocate | tma...@google.com
>>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> Takashi Matsuo | Developers Advocate | tma...@google.com
>>>>
>>>
>>>
>>
>>
>>
>>
>> --
>> Takashi Matsuo | Developers Advocate | tma...@google.com
>>
Re: [google-appengine] Re: Weird Instance Scheduler Kristopher Giesing 8/26/12 10:17 PM
Resident instances are used for processing incoming requests if there
is no dynamic instance

This is the behavior we all want, but experimentation seems to indicate it doesn't happen, at least for some apps.

- Kris
Re: [google-appengine] Re: Weird Instance Scheduler Johan Euphrosine (Google) 8/27/12 2:16 AM
On Mon, Aug 27, 2012 at 5:59 AM, Carl Schroeder
<schroede...@gmail.com> wrote:
> Let me see if I understand this correctly: there is currently no way on app
> engine to ensure that there is an instance ready to process incoming
> requests for an app that has been idle for some period of time. Min idle
> instances (labeled as Resident) sit there and do almost nothing while user
> facing requests are instead sent to cold instance starts. If true, that
> dovetails with what I have seen in the behavior of my app. For python
> runtimes with sub-second spinup times, this is no big deal. For Java
> runtimes with spinup times in the double digits of seconds, it is a
> deal-breaker of a "feature".
>
> The problem seems to be that the scheduler thinks sending a request to a
> non-existent dynamic instance is a better idea than using the Resident
> instance for its intended purpose: to serve requests when dynamic instances
> are unable to. This is probably a corner case born of low-traffic conditions
> that allow the user-request-serving dynamic instances to despawn.

Hi Carl,

That's not what we observed, as I corrected in the previous email:
"""
Resident instances are used for processing incoming requests if there
is no dynamic instance, but it is possible that the scheduler warms up a
new dynamic instance to maintain the Min Idle Instances invariant.
"""

If you observe a different behavior, please comment with your
application id and the timestamp of occurrence, and we can try to figure
out what happened.

Thanks in advance.

>
> For low-traffic apps, "Resident" instances serve almost no purpose. Better
> to do away with them via the slider bars and just set up a script that tickles
> the app just often enough to keep one "Dynamic" instance resident.
>
> So, two features to fix this:
> First, a slider bar labeled "Minimum Dynamic instances" ;)
> Second, a button to enable sending warm-up requests and having them return
> before considering an instance for user facing requests.
>
>



Re: [google-appengine] Re: Weird Instance Scheduler Johan Euphrosine (Google) 8/27/12 2:17 AM
Hi Kristopher,

Can you comment with the appid and timestamps of when this last happened?

Thanks in advance.

>
> - Kris
>
Re: [google-appengine] Re: Weird Instance Scheduler Carl Schroeder 8/27/12 8:11 AM
2012-08-27 08:05 is the point in the logs. 1 Resident instance. No Dynamic instances. 
The request was sent to a cold starting Dynamic instance. Resident instance did nothing. 
Request took 18 seconds to serve.
Re: [google-appengine] Re: Weird Instance Scheduler Mobicage 8/27/12 8:19 AM
Hi Carl,

I see exactly the same behaviour for my Java App Engine app.
The resident instance does nothing; instead a new instance is started, goes through several seconds of warmup, and then the request is served.

Regards





--
Carl D'Halluin

Next-generation communication at http://www.rogerthat.net

Email: ca...@mobicage.com
Phone: +32 9 324 25 64
Fax: +32 9 324 25 65
Skype: carldhalluin
Twitter: @carl_dhalluin

NV MOBICAGE
Antwerpsesteenweg 19
9080 Lochristi
Belgium

Re: [google-appengine] Re: Weird Instance Scheduler Mos 8/27/12 8:34 AM
I saw the same behavior (as discussed earlier in the thread). Many other people have reported this again and again on this mailing list.
Google has to acknowledge either that the current implementation is buggy, or that it works as implemented but doesn't make any sense in practice.

By the way - the problem is not restricted to resident instances. From time to time the same happens for dynamic instances:

One or more dynamic instances are running and are almost idle (sometimes really idle == no request, or just one request being served).
A request comes in and starts a new dynamic instance, which goes through 30-40 seconds of warmup; then the request is served.
Re: [google-appengine] Re: Weird Instance Scheduler Mos 8/27/12 10:05 AM
In http://code.google.com/p/googleappengine/issues/detail?id=8004#c8  I described in detail a current example of the nonconforming instance-handling of GAE.
Please check the comment, the screenshot and the log-file I filed there.

Dear GAE-Team, what else do you need to fix this?  In this thread and in several issues you should have more than enough proof and examples...

Cheers
Mos
Re: [google-appengine] Re: Weird Instance Scheduler Carl Schroeder 8/27/12 10:13 AM
Yep. Googlites, let us know what else you need to run this down.
Re: [google-appengine] Re: Weird Instance Scheduler Takashi Matsuo (Google) 8/29/12 3:54 PM

Hi Mos and everyone,

I'm trying to reproduce the issue about min idle instances which some of you reported here in this thread, saying "Setting min idle instances doesn't work for me".

My initial test is just a simple helloworld Java application with multithreading enabled, 1 min idle instance set, and a 1-minute cron job. I ran this test for about two and a half days. I think it just worked as expected. The resident instance stayed alive and handled 3625 requests during the test.
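
A reproduction setup like the one described maps to two standard GAE Java config files. A sketch, with placeholder values (the /ping URL is illustrative, not Takashi's actual handler):

```xml
<!-- appengine-web.xml (fragment): enable concurrent requests,
     i.e. "multithread enabled" in the test above -->
<threadsafe>true</threadsafe>

<!-- cron.xml: hit the app once a minute, as in the test -->
<cronentries>
  <cron>
    <url>/ping</url>
    <description>one-minute keep-alive probe</description>
    <schedule>every 1 minutes</schedule>
  </cron>
</cronentries>
```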

What I'm planning to do next is another experiment with an application with Spring MVC. I'll update with the result hopefully next week.

At the same time, I'd like one of you to file an issue on our issue tracker for this particular topic, 'Setting min idle instances doesn't work', ideally with the expected behavior, the actual results, and characteristics of the application (e.g. average time for loading requests as well as for normal requests). I've done a quick search on our issue tracker, and I don't think there's any such issue yet. If there's already an issue about it, please let me know.

Thanks,





--
Takashi Matsuo | Developers Advocate | tma...@google.com

Re: [google-appengine] Re: Weird Instance Scheduler Jeff Schnitzer 8/29/12 5:17 PM
This is not a very good test.  Better would be: run 'ab -c 1' against it and see if you get any cold starts. Then change 1 to a larger number, up to whatever concurrency we should expect for a multithreaded instance.

Jeff
Re: [google-appengine] Re: Weird Instance Scheduler Takashi Matsuo (Google) 8/29/12 6:07 PM

Jeff,

Thanks for the suggestion; that's probably true. I chose this test based on Mos's e-mail, because I got the feeling that he saw odd behavior even with one request per minute. Hopefully I can do another test based on your suggestion soon.

Please note that you can also provide your test result on our issue tracker and help us reproduce the issue :)

Thanks,
Re: [google-appengine] Re: Weird Instance Scheduler Jeff Schnitzer 8/29/12 8:16 PM
I really do wish I had time right now to help track this down - believe me, this issue is very relevant to my interests!

Jeff
Re: [google-appengine] Re: Weird Instance Scheduler Carl Schroeder 8/29/12 9:55 PM
Try making a page that consists of more than a single request to the server. A burst of dynamically-served (not static content) requests arriving within the pending latency window would usually trigger an instance spin-up.
Now that spin-up times are back to normal, I am not seeing this behavior nearly as often.
Re: [google-appengine] Re: Weird Instance Scheduler Mos 8/30/12 12:54 AM
Hello Takashi,

I can offer to deploy my Spring MVC application to another App Engine account for testing purposes (sure, it shouldn't be publicly available there).
Then you can test and configure it as you like....

Or even simpler:   My application has been really unreliable for days. Many requests take more than 20 or 30 seconds (because of loading requests).
Hence -- it couldn't get much worse.  You are free to change my configuration settings, as long as we arrange a specific time period
and as long as my bill doesn't go up.  (Just contact me directly)
By the way:  Is a colleague of yours evaluating http://code.google.com/p/googleappengine/issues/detail?id=8004?
Then we should wait, otherwise his evaluation may be disrupted.


> At the same time, I'd like one of you to file an issue on our issue tracker for this particular topic, 'Setting min idle instances doesn't work',

The problem is not restricted to resident instances (min idle instances). From time to time the same happens for dynamic instances:


One or more dynamic instances are running and are almost idle (sometimes really idle == no request, or just one request being served).
A request comes in and starts a new dynamic instance, which goes through 30-40 seconds of warmup; then the request is served.

Please check this example:
http://code.google.com/p/googleappengine/issues/detail?id=8004#c8


Cheers
Mos
Re: [google-appengine] Re: Weird Instance Scheduler Kristopher Giesing 8/30/12 6:48 PM
I posted a great deal of information in the thread here:


In that thread I posted logs that showed that the very first request after setting min instances to 1 will spawn a new instance (in addition to the instance that the min instances setting created).  The app ID used in that testing is "titan-game-qa" and the timestamps are in the logs I posted.

At some point I will have enough bandwidth to set up a more specific test, but I feel I've already posted plenty of information for GAE engineers to digest.

- Kris
Re: [google-appengine] Re: Weird Instance Scheduler Mos 8/31/12 7:52 AM
Today again hundreds of useless instance restarts and many DeadlineExceededExceptions.
(Tried many configuration variations. Nothing helps.  My last try: max idle instances set to one)

Any news on http://code.google.com/p/googleappengine/issues/detail?id=8004 ?

As Kris wrote below, the problem has now existed for one month!

GOOGLE, PLEASE FIX THIS CRITICAL PRODUCTION PROBLEM  .. we are losing money and customers every day as long as GAE/Java works like junk !!

I deeply regret trusting Google/GAE and building our application on this PaaS.

Mos
Re: [google-appengine] Re: Weird Instance Scheduler Takashi Matsuo (Google) 8/31/12 8:37 AM
On Fri, Aug 31, 2012 at 10:48 AM, Kristopher Giesing <kris.g...@gmail.com> wrote:
I posted a great deal of information in the thread here:


In that thread I posted logs that showed that the very first request after setting min instances to 1 will spawn a new instance (in addition to the instance that the min instances setting created).  The app ID used in that testing is "titan-game-qa" and the timestamps are in the logs I posted.

As far as I can see, all of the loading requests in the logs are warmup requests (sorry if I misread the logs). This is very likely expected behavior. If your resident instance gets a request, and if the request is CPU intensive, our scheduler needs to spin up a new instance by sending a warmup request in order to keep the number of idle instances constant. This helps absorb subsequent traffic, and this behavior is definitely what resident instances are for.
 

At some point I will have enough bandwidth to set up a more specific test, but I feel I've already posted plenty of information for GAE engineers to digest.

Yeah, if you can file an issue with that information, that will definitely help. However, please keep in mind the expected behavior I mentioned above, and add your expected behavior in detail (don't just say 'It didn't work') alongside the things you actually observe.

Thanks,

-- Takashi
 



--
Takashi Matsuo | Developers Advocate | tma...@google.com

Re: [google-appengine] Re: Weird Instance Scheduler Kristopher Giesing 8/31/12 9:16 PM
This is the request that I actually issued, being handled:

2012-07-31 23:08:28.045 /api/game/57002?pretty=true 200 7893ms 11kb Mozilla/5.0 (Macintosh; Intel Mac OS X 10_7_4) AppleWebKit/536.25 (KHTML, like Gecko) Version/6.0 Safari/536.25
76.102.149.245 - kris [31/Jul/2012:23:08:28 -0700] "GET /api/game/57002?pretty=true HTTP/1.1" 200 11652 - "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_7_4) AppleWebKit/536.25 (KHTML, like Gecko) Version/6.0 Safari/536.25" "titan-game-qa.appspot.com" ms=7893 cpu_ms=3520 api_cpu_ms=0 cpm_usd=0.099322 instance=00c61b117c77507e2cfe78a0806d0ca80b52720e

These are the *two* preceding warmup requests:

** Dynamic instance warmup **
2012-07-31 23:08:27.475 /_ah/warmup 200 5873ms 0kb
0.1.0.3 - - [31/Jul/2012:23:08:27 -0700] "GET /_ah/warmup HTTP/1.1" 200 60 - - "1.360723738856412175.titan-game-qa.appspot.com" ms=5873 cpu_ms=2475 api_cpu_ms=0 cpm_usd=0.068778 loading_request=1 instance=00c61b117cdaae6145945d99c16aeee7cc0f4ad8

** Resident/idle instance warmup **
2012-07-31 23:07:42.842 /_ah/warmup 200 5045ms 0kb
0.1.0.3 - - [31/Jul/2012:23:07:42 -0700] "GET /_ah/warmup HTTP/1.1" 200 60 - - "1.360723738856412175.titan-game-qa.appspot.com" ms=5046 cpu_ms=2475 api_cpu_ms=0 cpm_usd=0.068778 loading_request=1 instance=00c61b117c77507e2cfe78a0806d0ca80b52720e

This is my point.  The problem is not that a new instance was spawned (although I admit that I did not quite understand the desired behavior when I first posted this data).  The problem is that the request I issued was not satisfied until AFTER the warmup request had been issued and handled by the new instance.  The request should FIRST have been handled by the already-resident instance, AND THEN the new instance should have been spawned.

If I'm misunderstanding something, please clarify, because on the face of it this seems to be a smoking gun.

- Kris
Re: [google-appengine] Re: Weird Instance Scheduler Kristopher Giesing 8/31/12 9:25 PM
OK. Something just became clearer to me.

The requests appear to be tagged with the instance that handles the request.  Based on that data, it looks like my request is in fact being handled by the resident instance, not the new dynamic instance.

The puzzle then becomes why the request still takes 8s to satisfy when the instance handling it is already warm, and the in-application logging (which I didn't post, but trust me on this) never shows higher than about 400ms.  I had been assuming that the 8s cost was the cost of the new instance spinning up, but the instance tag seems to contradict that.

The answer has to be some kind of static initialization cost.  Although my app is not very complex, I wonder if this is due to the classpath scanning that JDO does.  I have since switched to Objectify, but I am actually not very clear on whether that is sufficient to prevent JDO/JPA classpath scanning; it seems like I would need to evict the JDO/JPA core code from my application on deployment, but it's far from clear to me how to do that.

... But even that may not really explain this behavior, because you would think static initialization costs would be borne by the warmup request.

So, actually, I am baffled.  Any ideas, anyone?

- Kris
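
Kris's static-initialization hypothesis can be illustrated with a minimal, self-contained sketch (the class names and the busy-loop "bootstrap" are purely illustrative, not App Engine or JDO code): a Java class's static initializer runs on first use, so any class a warmup request never touches pays its initialization cost on the first user-facing request that does touch it.

```java
// Illustrative only: static init cost is paid on first *use* of a class,
// not at JVM start. If /_ah/warmup never references Heavy, the first user
// request that does will absorb Heavy's initialization time.
public class WarmupDemo {
    public static class Heavy {
        public static final long INIT_NANOS;
        static {
            long t0 = System.nanoTime();
            double sink = 0;
            // stand-in for framework bootstrap / classpath scanning
            for (int i = 0; i < 1_000_000; i++) sink += Math.sqrt(i);
            if (sink < 0) throw new AssertionError("unreachable");
            INIT_NANOS = System.nanoTime() - t0;
        }
        public static int answer() { return 42; }
    }

    public static void main(String[] args) {
        long t0 = System.nanoTime();
        Heavy.answer();                 // first use triggers the static block
        long coldNanos = System.nanoTime() - t0;

        t0 = System.nanoTime();
        Heavy.answer();                 // class already initialized
        long warmNanos = System.nanoTime() - t0;

        System.out.println("cold=" + coldNanos + "ns warm=" + warmNanos + "ns");
    }
}
```

If this is what is happening, the practical consequence is that a warmup handler should explicitly touch the persistence layer (PersistenceManagerFactory, Objectify registrations, and so on) so the classloading and scanning happen off the user-facing path.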


Re: [google-appengine] Re: Weird Instance Scheduler Takashi Matsuo (Google) 8/31/12 9:58 PM

Does your warmup request initialize the persistence manager, or the libraries you may want to preload beforehand?

RE: [google-appengine] Re: Weird Instance Scheduler Brandon Wirtz 8/31/12 11:02 PM

> So, actually, I am baffled.  Any ideas, anyone?

 

Does your warm up load all your classes?

 

Warm is kind of relative :)

 

 

Re: [google-appengine] Re: Weird Instance Scheduler Carl Schroeder 9/1/12 9:54 AM
The following two log entries appeared next to one another. The whole page usually loads in under 3 seconds.
There is 1 resident instance. The first network request sent by the page is the first log entry; it was not served by the resident instance.
There were no dynamic instances at the time of the request.
The minimum pending latency of the app is set to 14.9s (the largest possible value). This is an attempt to tell the scheduler to leave me alone.
Two instances started for a page that did not need them. The resident instance did nothing; it served zero requests from this page load.
One of the startups was done via a user-facing request, which caused the page to stall and take over 15 seconds to load.
The scheduler might be optimized to serve high-traffic sites, but I can guarantee I will never get to a high amount of traffic if my users
have to put up with 15-second page loads.
----------------------------------------------------------------------------------------------------------------------------------------------------
  1. 2012-09-01 09:36:59.435 /members/sphere/show/527001 200 13821ms 10kb Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.1 (KHTML, like Gecko) Chrome/21.0.1180.79 Safari/537.1
  2. 50.131.165.108 - - [01/Sep/2012:09:36:59 -0700] "GET /members/sphere/show/527001 HTTP/1.1" 200 10518 "http://lemurspot.appspot.com/members/sphere/list/Public" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.1 (KHTML, like Gecko) Chrome/21.0.1180.79 Safari/537.1" "lemurspot.appspot.com" ms=13822 cpu_ms=10594 cpm_usd=0.001175 loading_request=1 instance=00c61b117c2fce802ccb47a4e712b784e3f54e
  3. I 2012-09-01 09:36:53.878
  4. [s~lemurspot/3.361443701203144598].<stdout>: Loading core functions for production environment

  5. I 2012-09-01 09:36:59.434
  6. This request caused a new process to be started for your application, and thus caused your application code to be loaded for the first time. This request may thus take longer and use more CPU than a typical request for your application.
  7. ----------------------------------------------------------------------------------------------------------------------------------------------------
  8. 2012-09-01 09:36:50.272 /_ah/warmup 404 13273ms 0kb
  9. 0.1.0.3 - - [01/Sep/2012:09:36:50 -0700] "GET /_ah/warmup HTTP/1.1" 404 205 - - "3.361443701203144598.lemurspot.appspot.com" ms=13273 cpu_ms=9662 cpm_usd=0.000023 loading_request=1 instance=00c61b117c0d042247ec72de95d943c6c2be81
  10. I 2012-09-01 09:36:45.776
  11. [s~lemurspot/3.361443701203144598].<stdout>: Loading core functions for production environment

  12. | 2012-09-01 09:36:49.678
  13. [s~lemurspot/3.361443701203144598].<stdout>: Sep 01, 2012 16:36:49 Warmup request recieved

  14. I 2012-09-01 09:36:50.271
  15. This request caused a new process to be started for your application, and thus caused your application code to be loaded for the first time. This request may thus take longer and use more CPU than a typical request for your application.
  16. ----------------------------------------------------------------------------------------------------------------------------------------------------
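For context on the pending-latency knob Carl mentions: at the time it was set in the Admin Console's Application Settings, but with App Engine modules the same scaling controls later became expressible directly in appengine-web.xml. A hedged sketch under that assumption (element names per the modules automatic-scaling config; the values are illustrative, not Carl's actual settings):

```
<!-- appengine-web.xml (modules-style automatic scaling) -->
<automatic-scaling>
  <!-- let a request wait up to 15s for an existing instance
       before the scheduler starts a new one -->
  <min-pending-latency>15s</min-pending-latency>
  <max-pending-latency>automatic</max-pending-latency>
  <!-- keep one resident instance warm to absorb traffic spikes -->
  <min-idle-instances>1</min-idle-instances>
  <max-idle-instances>automatic</max-idle-instances>
</automatic-scaling>
```

Note that raising min-pending-latency trades fewer instance starts for higher worst-case request latency, which is exactly the trade-off being debated in this thread.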
Re: [google-appengine] Re: Weird Instance Scheduler Jeff Schnitzer 9/1/12 3:57 PM
Yeah, baffling.  JDO startup costs come with the construction of the
PersistenceManagerFactory, so that should be "in your code".

That 400ms - is that measured from a filter at the outermost layer?

An interesting thing to try is to set up a handler for the warmup
request which issues an actual query to the datastore.  Any query.

Jeff
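A sketch of what Jeff is proposing on GAE/J: enable the warmup inbound service so new instances receive /_ah/warmup before user traffic, and map that path to your own servlet whose doGet constructs the PersistenceManagerFactory and issues a trivial datastore query, so the datastore stack is initialized on the loading instance rather than on a user request. This is illustrative, not the thread author's actual configuration; the class name com.example.WarmupServlet is made up:

```
<!-- appengine-web.xml: ask the scheduler to send /_ah/warmup
     to new instances before routing user requests to them -->
<inbound-services>
  <service>warmup</service>
</inbound-services>

<!-- web.xml: route the warmup request to your own handler.
     Its doGet() should construct the PMF and run any datastore
     query, per Jeff's suggestion above. -->
<servlet>
  <servlet-name>warmup</servlet-name>
  <servlet-class>com.example.WarmupServlet</servlet-class>
</servlet>
<servlet-mapping>
  <servlet-name>warmup</servlet-name>
  <url-pattern>/_ah/warmup</url-pattern>
</servlet-mapping>
```

Without a mapped handler, /_ah/warmup 404s (as visible in Carl's log above), and the instance still counts the startup cost against whatever request arrives first.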
Re: [google-appengine] Re: Weird Instance Scheduler Kristopher Giesing 9/2/12 3:13 PM
The 400ms was measured from the time that the code entered my servlet's get method.  I can't be sure anymore since I've rewritten the code since then to use Objectify, but I'm guessing that it did not include the first PMF construction call.

If constructing the PMF is what was costing a bunch of time, then I'm guessing that the warmup request was not constructing one, but that it was getting constructed at static init time relative to the actual request.

I'll keep an eye on this once I'm ready to deploy again (the rewrite to Objectify came with a bunch of other changes I need to finish before I'm ready for real testing again).  For the moment, though, it seems like the problems I was having were due to a misunderstanding of how GAE instance warmups happen, and not due to a problem with the instance scheduler itself.

- Kris
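The distinction Kris describes, PMF construction at static-init time versus inside the warmup path, is the classic eager-vs-lazy initialization question. A minimal, self-contained sketch of the lazy-holder idiom; ExpensiveFactory is a hypothetical stand-in for a JDO PersistenceManagerFactory, which in a real app would be built via JDOHelper.getPersistenceManagerFactory(...):

```java
// Hypothetical stand-in for an expensive-to-build factory such as a
// JDO PersistenceManagerFactory. The counter just instruments the sketch.
class ExpensiveFactory {
    static int constructions = 0;
    ExpensiveFactory() { constructions++; }
}

// Lazy holder idiom: the factory is built on first use, thread-safely,
// because the JVM initializes the nested Holder class only when get()
// first touches it.
class PMF {
    private static class Holder {
        static final ExpensiveFactory INSTANCE = new ExpensiveFactory();
    }
    static ExpensiveFactory get() { return Holder.INSTANCE; }
}

public class Main {
    public static void main(String[] args) {
        // Before first use: nothing constructed yet.
        System.out.println(ExpensiveFactory.constructions); // 0
        ExpensiveFactory f = PMF.get();                     // triggers construction
        System.out.println(ExpensiveFactory.constructions); // 1
        System.out.println(f == PMF.get());                 // true: same instance
    }
}
```

The practical upshot for this thread: a warmup handler should call PMF.get() (or otherwise force the class load) explicitly, so the construction cost lands on the warmup request instead of the first user-facing request.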
Re: [google-appengine] Re: Weird Instance Scheduler Mos 9/3/12 10:53 AM
The last three days the GAE instance scheduler has been working accurately again. 
There are just 1 or 2 loading requests per day. Remember: the weeks before, I had hundreds of loading requests and many DeadlineExceededExceptions.

But no status-update from Google on http://code.google.com/p/googleappengine/issues/detail?id=8004 ?

What has happened?

 - Just luck ?
 - Did some GAE infrastructure / policy change on September 1st ?
 - Did somebody fix the weird instance scheduler? Perhaps after his or her summer holiday ?
 - Or did praying help ?

Please Google, be transparent about this issue. I had a downtime of 5h 19m in August (99.29% uptime).
Other people had similar experiences.  We need to know whether the reliability of GAE has been fixed durably!

Thanks
Mos
Re: [google-appengine] Re: Weird Instance Scheduler Johan Euphrosine (Google) 9/4/12 2:53 AM
On Mon, Sep 3, 2012 at 7:52 PM, Mos <mos...@googlemail.com> wrote:
> The last three days the GAE instance scheduler works accurately again.
> There are just 1 or 2 loading-requests per day. Remember: The weeks before I
> had hundreds of loading-requests and many DeadlineExceededExceptions.

Hi Mos,

>
> But no status-update from Google on
> http://code.google.com/p/googleappengine/issues/detail?id=8004 ?

I just updated the issue with more details.

>
> What has happened?
>
>  - Just luck ?
>  - Did some GAE infrastructure / policy changed on 1th September ?
>  - Did somebody fix the weird instance scheduler? Perhaps after his or her
> summer holiday ?
>  - Or did praying help ?
>
> Please Google, be transparent on this issue. I had a downtime of 5h 19m in
> August (99,29%) in August.

As we discussed before, what you refer to as downtime is actually a
percentage of (loading) requests taking more than 5 seconds (and
identified by *Pingdom* as downtime).

The App Engine SLA doesn't have the same definition of downtime
https://developers.google.com/appengine/sla
"""
"Downtime" means more than a ten percent Error Rate for any Eligible
Application.
"Downtime Period" means, for an Application, a period of five
consecutive minutes of Downtime. Intermittent Downtime for a period of
less than five minutes will not be counted towards any Downtime
Periods.
"Error rate" for the Service is defined with the Covered Services.
"""

As of today, the App Engine SLA only covers the following components,
https://developers.google.com/appengine/sla_error_rate.
"""
- Serving Infrastructure (HTTP Request sent to App Engine that results
in an INTERNAL_SERVING_ERROR)
- Datastore         (Datastore api call returning one of the following
errors: INTERNAL_ERROR, TIMEOUT, BIGTABLE_ERROR,
COMMITTED_BUT_STILL_APPLYING, TRY_ALTERNATE_BACKEND        )
"""

And unfortunately loading request latency is not covered by the SLA yet.

The serving infrastructure team is constantly working on improving the
reliability of loading-request performance, but this is a long-term
effort; in the meantime we (and the App Engine community) can help
you optimize the performance of your application.
Re: [google-appengine] Re: Weird Instance Scheduler Mos 9/4/12 4:27 AM
Hello Johan,


>> http://code.google.com/p/googleappengine/issues/detail?id=8004 ?
>  I just updated the issue with more details.

Thanks, but can you please be a bit more specific?
What does "The reliability team performed a maintenance operation, and it seems that most application are back to a normal levels of loading requests." mean?

- Could this happen from time to time again?
- The issue lasted for weeks!  Why did we need to escalate such a critical issue with a 50+ message thread, and why isn't GAE able to monitor such bad behavior by itself?

->> Please Google, be transparent on this issue. I had a downtime of 5h 19m in

>> August (99,29%) in August.

> As we discussed before, what you refer to as downtime is actually a percentage of (loading) requests taking more than 5 seconds (and identified by *Pingdom* as downtime).

That's not correct! Pingdom reports downtime only if a request takes more than 30 seconds.
The Google SLA needs a reality check! 
If my users had to wait more than 30 seconds over and over again (more than 5h in total in August), partly with a DeadlineExceededException,
that isn't acceptable!

Cheers
Mos
Re: [google-appengine] Re: Weird Instance Scheduler Johan Euphrosine (Google) 9/4/12 5:07 AM
On Tue, Sep 4, 2012 at 1:27 PM, Mos <mos...@googlemail.com> wrote:
> Hello Johan,
>
>
>>> http://code.google.com/p/googleappengine/issues/detail?id=8004 ?
>>  I just updated the issue with more details.

Hi Mos,

I updated the issue again.

>
> Thanks, but can you please be a bit more specific?
> What does "The reliability team performed a maintenance operation, and it
> seems that most application are back to a normal levels of loading
> requests." mean?
>
> - Could this happen from time to time again?

App Engine is a managed service; the reliability team is constantly
monitoring the overall platform health and performing maintenance
operations.

"Google I/O 2011: Life in App Engine Production" is a great resource
if you are looking for more details about the reliability team
operation:
http://www.youtube.com/watch?v=rgQm1KEIIuc

> - The issue lasted for weeks!  Why did we need to escalate such a critical
> issue with a 50+ message thread, and why isn't GAE able to monitor such bad
> behavior by itself?

See my comments on the bug.

>
> ->> Please Google, be transparent on this issue. I had a downtime of 5h 19m
> in
>
>>> August (99,29%) in August.
>
>> As we discussed before, what you refer to as downtime is actually a percentage
>> of (loading) requests taking more than 5 seconds (and identified by *Pingdom*
>> as downtime).
>
> That's not correct! Pingdom reports downtime if a request takes more than 30
> seconds.
> The Google SLA needs a reality-check!

You're welcome to make constructive comments and suggestions about the
SLA by filing feature requests on the public issue tracker or by
contacting premier customer support:
http://code.google.com/p/googleappengine/issues/entry?template=Feature%20request
https://developers.google.com/appengine/docs/premier/

> If my users had over and over again to wait more than 30 seconds (in sum
> over 5h in August) , partly with a DeadlineExceededException,
> it isn't acceptable!

As I said before, the serving infrastructure team is currently working on
improving loading-request reliability; see:
http://code.google.com/p/googleappengine/issues/detail?id=7706
Re: [google-appengine] Re: Weird Instance Scheduler Kristopher Giesing 9/21/12 2:08 AM
OK, ready to deploy again, and of course I immediately ran into this issue (along with a bunch of problems related to channels).

Here are the logs.  Note the 7-second gap between the request that created a new instance and the immediately preceding request.  Note also that there was no warmup request.  I now see in the documentation that warmup requests aren't guaranteed, so I'm not sure adding a warmup handler will actually help me; in any case, I haven't explicitly disabled them.

So, yeah.  No solution at this point, and random 10s delays are not acceptable. Takashi, can you analyze this and tell me what's going on?

    1. 2012-09-21 01:50:03.465 /api/game/80001/submit?messages=MOVE+Gray.Cross+600+136+135+134+2+false;MOVE+Gray.Queue+600+34+35+36+0+false&index=21200 176ms 0kb Mozilla/5.0 (iPad; CPU OS 5_1_1 like Mac OS X) AppleWebKit/534.46 (KHTML, like Gecko) Mobile/9B206
      76.102.149.245 - - [21/Sep/2012:01:50:03 -0700] "POST /api/game/80001/submit?messages=MOVE+Gray.Cross+600+136+135+134+2+false;MOVE+Gray.Queue+600+34+35+36+0+false&index=21 HTTP/1.1" 200 192 - "Mozilla/5.0 (iPad; CPU OS 5_1_1 like Mac OS X) AppleWebKit/534.46 (KHTML, like Gecko) Mobile/9B206" "titan-game-qa.appspot.com" ms=176 cpu_ms=163 cpm_usd=0.000080 instance=00c61b117c15f0ab71ada0d4f932c37adf981007
    1. 2012-09-21 01:49:50.460 /api/game/80001/submit?messages=SPLIT+Gray.Queue+Gray.Cross+Angel+Centaur+Centaur+Gargoyle&index=17200 9609ms 0kb Mozilla/5.0 (iPad; CPU OS 5_1_1 like Mac OS X) AppleWebKit/534.46 (KHTML, like Gecko) Mobile/9B206
      76.102.149.245 - - [21/Sep/2012:01:49:50 -0700] "POST /api/game/80001/submit?messages=SPLIT+Gray.Queue+Gray.Cross+Angel+Centaur+Centaur+Gargoyle&index=17 HTTP/1.1" 200 187 - "Mozilla/5.0 (iPad; CPU OS 5_1_1 like Mac OS X) AppleWebKit/534.46 (KHTML, like Gecko) Mobile/9B206" "titan-game-qa.appspot.com" ms=9609 cpu_ms=5063 cpm_usd=0.000080 loading_request=1 instance=00c61b117cf26825e763cc7957e0c35cc39d68
    2. I2012-09-21 01:49:50.460 This request caused a new process to be started for your application, and thus caused your application code to be loaded for the first time. This requ
    1. 2012-09-21 01:49:33.240 /api/game/80001/submit?messages=MARKERCHOICE+Black.Horn&index=14200 459ms 0kb Mozilla/5.0 (iPad; CPU OS 5_0_1 like Mac OS X) AppleWebKit/534.46 (KHTML, like Gecko) Mobile/9A405
      76.102.149.245 - - [21/Sep/2012:01:49:33 -0700] "POST /api/game/80001/submit?messages=MARKERCHOICE+Black.Horn&index=14 HTTP/1.1" 200 136 - "Mozilla/5.0 (iPad; CPU OS 5_0_1 like Mac OS X) AppleWebKit/534.46 (KHTML, like Gecko) Mobile/9A405" "titan-game-qa.appspot.com" ms=459 cpu_ms=187 cpm_usd=0.000074 instance=00c61b117c15f0ab71ada0d4f932c37adf981007
    1. 2012-09-21 01:49:31.124 /api/game/80001/submit?messages=COLORCHOICE+Black&index=11200 244ms 0kb Mozilla/5.0 (iPad; CPU OS 5_0_1 like Mac OS X) AppleWebKit/534.46 (KHTML, like Gecko) Mobile/9A405
      76.102.149.245 - - [21/Sep/2012:01:49:31 -0700] "POST /api/game/80001/submit?messages=COLORCHOICE+Black&index=11 HTTP/1.1" 200 129 - "Mozilla/5.0 (iPad; CPU OS 5_0_1 like Mac OS X) AppleWebKit/534.46 (KHTML, like Gecko) Mobile/9A405" "titan-game-qa.appspot.com" ms=244 cpu_ms=210 cpm_usd=0.000073 instance=00c61b117c15f0ab71ada0d4f932c37adf981007
Re: [google-appengine] Re: Weird Instance Scheduler Kristopher Giesing 9/21/12 2:14 AM
PS.  This is with default application settings, and for this test I reverted to using front ends instead of backends (since I gather backends don't support channels, that's no longer an option for me).
Re: [google-appengine] Weird Instance Scheduler Rekby 9/26/12 1:09 AM
I have the same problem with startup delay: http://stackoverflow.com/q/12581110/1204368



Re: [google-appengine] Weird Instance Scheduler Kristopher Giesing 9/26/12 7:54 AM
To be clear, my problem isn't with startup time per se.  My problem is that despite having an active instance that had been idle for 7 seconds, GAE decided to spin up a new instance, causing a significant user-visible delay.

Secondarily, I'm puzzled as to why I didn't get a warmup request, but as I mentioned, the docs explicitly state that warmup requests aren't guaranteed, so perhaps that is expected.

- Kris
Re: Weird Instance Scheduler DFB 10/3/12 8:06 AM
Same problem here. For the last 2+ weeks, my small application, which usually would not use more than 2 instances, has been using up to 8 instances. There's absolutely no increase in traffic and nothing has changed on the application side.

On Thursday, August 23, 2012 3:58:44 AM UTC+8, Mos wrote:
Does anybody else experience abnormal behavior of the instance scheduler in the last three weeks (in the last 7 days it got even worse)?  (Java / HRD)
Or does anybody have profound knowledge about it?

Background:  My application has been unchanged for weeks, the configuration has not changed, and the application's traffic is constant.
Traffic: One request per minute from Pingdom and around 200 additional pageviews per day (== around 1500 pageviews per day). The peak is not more than 3-4 requests per minute.

It's very obvious that one instance should be enough for my application. And that was almost the case for the last months!

But now GAE creates 3 instances most of the time, whereby one has a long lifetime of days and the other ones are restarted around
10 to 30 times per day.
Because loading requests take between 30s and 40s and requests are waiting for loading instances, there are many requests that
fail  (Users and Pingdom agree: A request that takes more than a couple of seconds is a failed request!)

Please check the attached screenshots that show the behavior!

Note:
- Killing instances manually did not help
- Idle Instances were ( Automatic – 2 ).  Changing it to anything else, e.g. ( Automatic – 4 ), didn't change anything

Thanks and Cheers
Mos





