Has anybody else experienced abnormal behavior of the instance scheduler over the last three weeks (in the last 7 days it got even worse)? (Java / HRD)
Or does anybody have deeper knowledge about it?
Background: My application has been unchanged for weeks, the configuration has not changed, and the application's traffic is constant.
Traffic: One request per minute from Pingdom and around 200 additional pageviews per day (roughly 1,500 requests per day in total). The peak is no more than 3-4 requests per minute.
It's very obvious that one instance should be enough for my application. And that was almost always the case over the last months!
But now GAE creates three instances most of the time, whereby one has a long lifetime of days and the others are restarted around 10 to 30 times per day.
Because a loading request takes between 30 s and 40 s and requests wait for loading instances, many requests fail. (Users and Pingdom agree: a request that takes more than a couple of seconds is a failed request!)
Please check the attached screenshots that show the behavior!
Note:
- Killing instances manually did not help.
- Idle Instances was set to (Automatic – 2). Changing it to anything else, e.g. (Automatic – 4), did not change anything.
Thanks and Cheers
Mos
--
You received this message because you are subscribed to the Google Groups "Google App Engine" group.
To post to this group, send email to google-a...@googlegroups.com.
To unsubscribe from this group, send email to google-appengi...@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/google-appengine?hl=en.
> A possible explanation could be that the traffic pattern had changed.

No, it's the same. Check, for example, the requests/second statistics of my application for the last 30 days!
That was one exception, where a single client made 8 requests in a minute (plus one Pingdom request). Nothing else happened in that minute.
>> It's very obvious that one instance should be enough for my application. And that was almost the case the last months!
> Actually it's not true. In particular, check this log:
In those exceptional cases it could be OK for a second instance to start. (Nevertheless, shouldn't one instance be able to handle 8 requests a minute?)
As I described: instances are started and stopped without reason, even when there is less traffic per minute!
> * What is the purpose of the max-pending-latency = 14.9 setting? "If this is high, App Engine will allow requests to wait rather than start new instances to process them."

--> It was one attempt to stop GAE from creating unnecessary instances.
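For readers trying to reproduce this setup: at the time of this thread these knobs were sliders in the Admin Console, but later Java SDKs (with modules) let you write the same settings in appengine-web.xml. A sketch using the values discussed here (the app ID is a placeholder):

```xml
<!-- Sketch of the scheduler settings from this thread, in the
     appengine-web.xml syntax introduced with modules. In 2012 the
     same values were set via Admin Console sliders. -->
<appengine-web-app xmlns="http://appspot.com/ns/1.0">
  <application>your-app-id</application>  <!-- placeholder -->
  <version>1</version>
  <threadsafe>true</threadsafe>
  <automatic-scaling>
    <!-- "Idle Instances (Automatic - 2)" from the thread -->
    <min-idle-instances>automatic</min-idle-instances>
    <max-idle-instances>2</max-idle-instances>
    <!-- let requests pend up to ~15 s before spawning an instance -->
    <min-pending-latency>automatic</min-pending-latency>
    <max-pending-latency>14.9s</max-pending-latency>
  </automatic-scaling>
</appengine-web-app>
```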
> * Can you try automatic-automatic for the idle instances setting?

I played around with this over the last days and nothing changed. As I wrote: I had this configuration for months and it worked fine until 3-4 weeks ago!

> * What is the purpose of those Pingdom checks? What happens if you stop them?

To be alerted if GAE is down again. "What happens if you stop that?" --> I wouldn't be angry anymore, because I wouldn't notice the downtimes of my GAE application. ;)
Please forward http://code.google.com/p/googleappengine/issues/detail?id=8004 to the relevant GAE department.
Hello Takashi,

I thought this is why GAE has the concept of "pending latency" (which we discussed below).
> Actually there were almost 8 requests in a second. So App Engine likely needed more than one instance at this particular moment.
Meaning: incoming requests may wait up to 15 seconds before a new instance is started. Therefore, 8 requests in one second should not mean that more instances need to be started, especially if there is no other traffic in that minute, as seen in my example.
Otherwise it would be a very bad implementation: starting a new instance means around 30 s of waiting time. Serving 8 parallel requests from one instance would result in a maximum of 8 seconds for the last request (assuming each request takes around 1 second).
There is no reason, in this concrete example, to fire up more instances and let requests wait more than 30 seconds until a new instance is loaded.
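The arithmetic behind this argument can be written out directly (all numbers are the thread's own estimates, not measurements):

```python
# Back-of-the-envelope comparison for a burst of 8 requests arriving
# in the same second. Assumptions from the thread: ~1 s per request on
# a warm instance, ~30 s cold start for a Java loading request.
BURST = 8
SERVICE_TIME_S = 1.0   # assumed time per request on a warm instance
COLD_START_S = 30.0    # assumed Java loading-request time

# Option A: queue the whole burst on the single warm instance.
# The last request waits behind the 7 before it, then is served itself.
worst_case_queued = BURST * SERVICE_TIME_S          # 8.0 s

# Option B: spin up a new instance for the overflow. Any request routed
# to it pays the full cold start before being served.
worst_case_cold = COLD_START_S + SERVICE_TIME_S     # 31.0 s

print(worst_case_queued, worst_case_cold)  # 8.0 31.0
```

Under these assumptions, queueing on the warm instance is the better outcome for every request in the burst.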
> ... here is what you've seen in the past weeks.
> * You have almost always had the 'Automatic – 2' idle instances setting.
> * More than 3 weeks ago, the number of loading requests was very low.
> * Recently you have seen more loading requests than before.

That's right! To be even more concrete: on August 16 the problems got significantly worse. Please check especially the period from August 16 until today.
> First of all, it seems that you deployed 2 new versions on Aug 1 and Aug 2. Can you describe what kind of changes were in those versions?

I checked our version control. As I wrote, no related changes were made, just HTML/CSS stuff:
* One picture upload
* One HTML change
* One JavaScript change
* One CSS change

And around the 16th of August?
> And, to be fair, we didn't think of any change in our scheduler around 3 weeks ago which could cause this issue.
> More than 3 weeks ago, those 2 idle instances might have had longer lives than now, but it was not a guaranteed behavior. Please think of it this way: you were just kind of lucky.

That shouldn't be luck! If GAE is not able to start Java instances within 5 to 10 seconds, there needs to be a guarantee that instances have longer lives. Otherwise Java applications on GAE are unusable, because users would face a lot of 30-second wait times (--> "failed requests"). (See also the next comment regarding resident instances.)

> If you want some instances always active, please set min idle instances.

I tried this some days ago. I had one resident instance, but that changed nothing: instances get started and stopped as before. I assumed that requests would go to the resident instance first, but that was not the case. The resident instance was idle, yet a dynamic instance was started and the request waited 30 s.

Please check other discussions on this list and the issues that report similar observations.
> As you can see, I'm still not convinced that the scheduler is misbehaving. I understand that you're having experiences which are a bit worse than 3 weeks ago, and I understand your feeling that you want to tell us 'fix it', but I'd say it's still something in the line of 'expected behavior', at least for now. If you feel differently, please let me know.

Yes, I do feel differently (please see the answers above).
Please accept http://code.google.com/p/googleappengine/issues/detail?id=8004
Hi all,
Please review the following thread, where the lead engineer working on the scheduler (Jon McAlister) took the time to explain in great detail the behavior of min idle instances.
https://groups.google.com/d/msg/google-appengine/nRtzGtG9790/hLS16qux_04J
Once you have read it, we can discuss whether what you're experiencing is really a bug, or whether you want the scheduler to behave differently from its current implementation, in which case the more constructive way out of this discussion is to file a feature request and get it starred by your peers.
On Aug 24, 2012 11:28 PM, "Mos" <mos...@googlemail.com> wrote:
>
> Thanks Johan. I read the post some days ago.
>
> As often discussed on the mailing list before, and as Jeff said in this thread:
> It's the combination of "Requests should never be sent to cold instances."
Please star this existing feature request:
http://code.google.com/p/googleappengine/issues/detail?id=7865
> and(!) the behavior of min idle instance which doesn't make any sense.
Like Jon explained in the post I linked, the scheduler will favor routing traffic to an idle dynamic instance rather than an idle reserved (resident) instance, and it will always try to maintain the invariant of N min idle instances by starting a new instance if the reserved instances are busy.
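A toy model of that routing rule (purely illustrative, not the actual scheduler code) shows why the behavior surprises people: with one resident instance and no dynamic ones, the very first request already triggers a dynamic spawn.

```python
def handle_request(state):
    """Toy model of the routing rule described above.

    'state' tracks idle instances: dynamic ones are preferred for
    serving; resident (min-idle) ones are a reserve the scheduler
    tries to keep free.
    """
    if state["dynamic_idle"] > 0:
        # Preferred path: use an idle dynamic instance.
        state["dynamic_idle"] -= 1
    else:
        # The resident instance absorbs the request, and the scheduler
        # starts a dynamic instance to restore its idle reserve --
        # the "surprise spawn" reported in this thread.
        state["resident_idle"] -= 1
        state["spawned"] += 1
    return state

state = {"dynamic_idle": 0, "resident_idle": 1, "spawned": 0}
handle_request(state)
print(state["spawned"])  # 1: the first request already starts a new instance
```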
If you want another behavior, like a new slider for Min-Instances, please file a new feature request.
> Please check the last comment of http://code.google.com/p/googleappengine/issues/detail?id=8004, where I wrote down the problems from my point of view.
>
I would suggest starring existing feature requests or defects, or filing new ones, for the separate issues you identified, rather than aggregating comments on this production issue.
> Senior Java developers on this list, who have many months of experience with GAE, have stated again and again that there is a big issue around instance handling.
> I think you have to trust your power users and assign a team to work on this!
>
We do trust our power users and recognize their frequent contributions to the App Engine developer community.
But like any developer community, there are preferred ways of making feature requests or reporting bugs, and I'm just trying to direct your feedback to what is more likely to produce results.
If you file a feature request and it gets starred enough, the engineering team will definitely consider it, as each team regularly looks at the tracker for the most-starred feature requests of their component.
If you file a bug, provide a way to reproduce it, and it gets starred enough, it will get triaged and escalated to the corresponding team, as we regularly triage the issue tracker when organizing bug-squashing sessions.
That's how things are supposed to work. But sometimes we lag, forget about issues, or miss them. You are more than welcome to point us at specific issue tracker entries when this happens so we can correct our mistakes.
> Setting Max Pending Latency doesn't force requests to be in the pending queue for the specified time. Please use Min Pending Latency instead.
As you know, my setting for "Min Pending Latency" was automatic. The expectation is that GAE picks a reasonable default latency when it is "automatic".
And you are saying that every parallel request starts a new instance if it is "automatic"? That would be a "Min Pending Latency" of zero, not "automatic".
Please check the responses of other users in this thread. This feature is totally broken and cannot be used.
> If it doesn't work, try 2 min idle instances then
>> And around the 16th of August?
> Sigh... isn't it a waste of time? What is the reason you picked that date?

Did you see/study my screenshots from the first post of this thread? The statistics show that on this date the instance creation went crazy. I double-checked it with the Pingdom reports. Starting on this day there were even more downtimes.

> So I'd say please try 2. If you still see the user-facing loading requests, you need more resident instances to eliminate them.

Again: as I wrote in my post before, that does not work. Check the responses from Kristopher and Jeff in this thread.
> So what is your expected behavior and actual result? Nobody in our team can do anything if you just keep saying "the setting that used to work doesn't work anymore" without trying my suggestion.
> I think my answer is clear, at least for some points. 1) You'd better use 'min pending latency' instead of 'max pending latency' to prevent new instances from spinning up as much as possible. 2) If you need longer instance lives, set an appropriate number of min idle instances.

As I wrote: I tried different settings, as have many other people in this group.
Other people and I keep reporting: the settings are broken!
It's very easy to reproduce. Please set up an application, send one request per minute (or second), configure 1, 2 or 3 min idle instances, and check what happens. You will see that new instances are started although resident instances are available.
Please take this seriously and have one of the engineers check it!
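A minimal probe along these lines could log how often a once-per-minute request lands on a loading instance. This is a sketch: the URL and the 10 s threshold are assumptions, not values from the thread.

```python
import time
import urllib.request

# Hypothetical endpoint -- substitute your own app's URL.
APP_URL = "http://your-app-id.appspot.com/ping"
LOADING_THRESHOLD_S = 10.0  # warm responses in this thread are well under 1 s

def classify(latency_s):
    """Label one response time. On an app seeing 1 request/minute, a
    multi-second latency almost certainly means a cold (loading)
    instance served it."""
    return "loading" if latency_s >= LOADING_THRESHOLD_S else "warm"

def probe_once():
    """Fetch the endpoint once and classify the observed latency."""
    start = time.time()
    urllib.request.urlopen(APP_URL, timeout=60).read()
    return classify(time.time() - start)

# Usage: call probe_once() every 60 s and log the result. With min idle
# instances set, no probe should ever come back as "loading".
```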
Resident instances are used for processing incoming requests if there is no dynamic instance available.
I posted a great deal of information in the thread here:
In that thread I posted logs showing that the very first request after setting min instances to 1 spawns a new instance (in addition to the instance that the min instances setting created). The app ID used in that testing is "titan-game-qa" and the timestamps are in the logs I posted.
At some point I will have enough bandwidth to set up a more specific test, but I feel I've already posted plenty of information for the GAE engineers to digest.
Does your warmup request initialize the persistence manager, or some libraries you may want to preload beforehand?
> So, actually, I am baffled. Any ideas, anyone?
Does your warmup load all your classes?
Warm is kind of relative :)
PS. This is with default application settings, and for this test I reverted to using frontends instead of backends (since I gather backends don't support channels, that's no longer an option for me).