Urgent, Need Help: Instances are shutting down after a couple minutes of inactivity.


Cesium

Dec 17, 2012, 10:32:40 AM
to google-a...@googlegroups.com
On Dec 16, something changed.

Now, instances do not survive more than a couple minutes of inactivity (no incoming requests).

Thus, low traffic applications see long latency since a new instance is created for the request.

This is disastrous for my application (and customers).

Can someone please help?

Thanks,

David

Francois Masurel

Dec 17, 2012, 2:11:23 PM
to google-a...@googlegroups.com
There have been quite a few messages posted on this list about this problem :


Googlers have never answered most of them. I don't know what that means.

Are they quietly trying to kill Java on GAE? Are they working on a fix? Impossible to know.

It looks like we won't build a strong business on GAE anytime soon with such user-facing latencies.

François

PS:

I definitely think PaaS is the way to go for web and mobile development, so we'll probably see some improved hosting solutions in the near future, if that's not already the case (Red Hat is working on something with JBoss).

Michael Hermus

Dec 17, 2012, 5:20:37 PM
to google-a...@googlegroups.com
It is a real shame that Google isn't better about communicating around issues like this. I think GAE is a tremendously powerful platform, but I am increasingly worried that it is not a high enough priority within Google. The series of recent stability and latency issues is undermining a tangible amount of faith in the viability of GAE as a solution for many business applications.

Cesium

Dec 17, 2012, 5:26:31 PM
to google-a...@googlegroups.com
I fear that Sir Brandon has given us a fair assessment of GAE and GAE support.

The unicorns have left the building.

David


Cesium

Dec 18, 2012, 12:03:19 PM
to google-a...@googlegroups.com
I changed my settings to have 3 resident instances.

I still see long latency when the scheduler creates new instances, rather than using the resident instances.

The GAE scheduler behavior at low request rates is odd, and counterintuitive.

But we knew that, didn't we?
David

Carl Schroeder

Dec 18, 2012, 2:10:45 PM
to google-a...@googlegroups.com
I have the same problem. We need to migrate away from Java on GAE; I fear it will *never* work right with long instance spin-up times.

Google gets its instance scheduler halfway working, then makes some breaking change that crushes the user experience for low-traffic applications.

Michael Hermus

Dec 18, 2012, 2:48:12 PM
to google-a...@googlegroups.com
This is mind boggling.

There is a simple, brain-dead fix that does NOT require the (presumably complex) scheduling logic to change one bit. Simply NEVER expose a user-facing request to a loading request, if the developer configures it that way. Let the scheduler spin up and down instances however it sees fit, but don't let a new instance serve requests until it successfully completes initialization.

This was beaten to death months ago in a series of threads, but obviously resulted in no action.
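
A minimal sketch of the rule proposed above, written as a toy model rather than anything App Engine exposes: the scheduler's internals are not public, so `Instance`, `ReadyOnlyRouter`, `finishWarmup`, and `route` are all hypothetical names chosen for illustration. The only point it shows is that routing can ignore an instance until its initialization has completed, independently of how instances get created or destroyed.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

/** Hypothetical model of an app instance; not an App Engine API. */
final class Instance {
    final String id;
    private volatile boolean ready; // becomes true only after initialization completes

    Instance(String id) {
        this.id = id;
    }

    /** Called once the loading (warmup) work has finished successfully. */
    void finishWarmup() {
        ready = true;
    }

    boolean isReady() {
        return ready;
    }
}

/** Hypothetical router: user-facing requests only ever reach initialized instances. */
final class ReadyOnlyRouter {
    private final List<Instance> instances = new ArrayList<>();

    /** The scheduler may add instances whenever it likes; that logic is untouched. */
    void add(Instance instance) {
        instances.add(instance);
    }

    /** Returns the first ready instance, or empty if the request should keep waiting. */
    Optional<Instance> route() {
        return instances.stream().filter(Instance::isReady).findFirst();
    }
}
```

In this model a freshly added instance is invisible to `route()` until `finishWarmup()` has been called, so a user request either lands on a warm instance or stays queued.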

Cesium

Dec 18, 2012, 3:18:03 PM
to google-a...@googlegroups.com
What chaps my hide is the unannounced CHANGE in the behavior (for my app).

I have been running for months with a single instance and no problems.

Then, it breaks. No warning. No feedback. No drinks. No dinner. Nada.

I have stayed out of the discussion because all was well.

Oh, and, Takashi, don't bother this time.

I have designed a non-appengine solution.

Michael Hermus

Dec 18, 2012, 3:50:20 PM
to google-a...@googlegroups.com
I hate when my hide gets chapped. And drinks would help...

Good luck out there.

Francois Masurel

Dec 18, 2012, 5:23:15 PM
to google-a...@googlegroups.com
Yep, you're right: adding more resident instances doesn't fix the problem. The scheduler is definitely broken.

Feeling desperate.

Carl Schroeder

Dec 18, 2012, 5:57:52 PM
to google-a...@googlegroups.com
I should add code so that whenever an instance spins up to a URL that is not a warmup, it sends an email to the GAE devs.
Perhaps I will try it on a free app first, just to get the kinks out...
;)
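
A rough sketch of the detection half of that idea, assuming the check runs once per incoming request (for example from a servlet filter, which is not shown here). `ColdStartDetector` and `isUserFacingLoadingRequest` are hypothetical names; `/_ah/warmup` is the path App Engine uses for warmup requests. What to do when the check fires (log it, count it, or indeed send an email) is left to the caller.

```java
import java.util.concurrent.atomic.AtomicBoolean;

/** Hypothetical helper: flags the case where an instance's first request is a real user request. */
final class ColdStartDetector {
    /** Path App Engine uses for warmup requests. */
    static final String WARMUP_PATH = "/_ah/warmup";

    // One detector per instance (JVM); flips to true on the first request seen.
    private final AtomicBoolean firstRequestSeen = new AtomicBoolean(false);

    /**
     * Returns true only when this is the first request handled by the instance
     * and that request is not the warmup request.
     */
    boolean isUserFacingLoadingRequest(String path) {
        boolean isFirst = firstRequestSeen.compareAndSet(false, true);
        return isFirst && !WARMUP_PATH.equals(path);
    }
}
```

The `compareAndSet` call keeps the check safe if several requests arrive at the new instance at the same moment: only one of them is treated as the loading request.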

Francois Masurel

Dec 18, 2012, 6:51:22 PM
to google-a...@googlegroups.com
Related screenshot :

Michael Hermus

Dec 19, 2012, 9:07:58 AM
to google-a...@googlegroups.com
+1

Michael Hermus

Dec 19, 2012, 9:14:12 AM
to google-a...@googlegroups.com
By the way, if anyone interested has not starred the relevant issue in the issue tracker, please do so!

https://code.google.com/p/googleappengine/issues/detail?id=7865

Cesium

Dec 19, 2012, 12:36:10 PM
to google-a...@googlegroups.com
Thanks for putting the stress in my vacation this week, Appengine Team.

David

David Lee

Dec 18, 2012, 2:57:11 PM
to google-a...@googlegroups.com

Reasoned logic will get us nowhere with this issue.

--
You received this message because you are subscribed to the Google Groups "Google App Engine" group.
To view this discussion on the web visit https://groups.google.com/d/msg/google-appengine/-/mibjVETUupIJ.
To post to this group, send email to google-a...@googlegroups.com.
To unsubscribe from this group, send email to google-appengi...@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/google-appengine?hl=en.

Carl Schroeder

Dec 20, 2012, 11:24:37 AM
to google-a...@googlegroups.com
GAE is still spinning up and down instances rapidly with little rhyme or reason. Unless the reason is to book lots of warmup CPU hours.

Cesium

Dec 20, 2012, 10:51:35 PM
to google-a...@googlegroups.com
Ready for this?

Now, a single instance survives for hours and hours, happily serving requests with the usual low latency response time.

This is just what Sir Brandon wrote about. Mysterious changes in the system's behavior.

I should note that I sprinkled rainbow Skittles across the floor to attract the unicorns. They're back!

David


Carl Schroeder

Dec 21, 2012, 11:39:22 AM
to google-a...@googlegroups.com
I've been sacrificing unicorns to dark powers...clearly I have been doing it wrong.
That probably explains some unusual behavior in other parts of my app.
AFK pentagrams. :(

Christina Ilvento

Dec 21, 2012, 9:15:05 PM
to google-a...@googlegroups.com
Hi All,

Would you mind sending app-ids that you're seeing this behavior for? Please feel free to send them to me directly or to link any issues you have filed in our issue tracker so that we can investigate.


Thanks,
Christina





--

Christina Ilvento | Google App Engine | cilv...@google.com | (650)-201-9399



Francois Masurel

Dec 22, 2012, 6:18:00 AM
to google-a...@googlegroups.com

Jeff Schnitzer

Dec 23, 2012, 1:57:04 AM
to Google App Engine
<cronentries>
  <cron>
    <url>/some-non-static-url</url>
    <schedule>every 1 minutes</schedule>
  </cron>
</cronentries>

This will keep one instance warm.

Jeff

David Lee

Dec 25, 2012, 11:37:55 AM
to google-a...@googlegroups.com

Saket,
Didn't we already discuss this a couple of days ago?
David

On Dec 24, 2012 5:20 PM, "Saket Kumar" <sak...@google.com> wrote:
Hi Cesium,

Can you share your app-id as well?

Thanks!
Saket


Saket Kumar

Dec 25, 2012, 12:04:14 PM
to google-a...@googlegroups.com
These are my previous posts which got approved now.

Francois Masurel

Dec 25, 2012, 7:00:26 PM
to google-a...@googlegroups.com
Hi Saket,

Thanx for investigating the problem.

Things seem to have improved significantly these last few days.

On our Neustar reports, the average page load time went from 6.65s (12/16) to 8.94s (12/18) and back down to 2.57s (12/23).

I can confirm that we don't see as many instance warm-ups in our logs as we used to: they went from one every 3-4 minutes to one every 20-25 minutes.

I definitely think things have changed on Google's side, as we haven't changed anything on ours.

But some strange things are still going on at the moment (12/26 0:52 UTC+1) as a dynamic instance has been up for more than 4 hours but has served only one request (cf. screenshot below).

Thanx again Saket for your help.

François
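
To put the figures quoted in this post in relative terms, here is a small worked calculation. It uses only the numbers given above (6.65s, 8.94s, 2.57s, and the warm-up intervals); the class and method names are made up for the example.

```java
/** Worked arithmetic on the page-load and warm-up figures quoted in the post. */
final class LatencyChange {
    /** Relative change from 'before' to 'after', as a percentage. */
    static double percentChange(double before, double after) {
        return (after - before) / before * 100.0;
    }

    /** Warm-ups per hour, given one warm-up every 'minutes' minutes. */
    static double warmupsPerHour(double minutes) {
        return 60.0 / minutes;
    }

    public static void main(String[] args) {
        System.out.printf("12/16 -> 12/18: %+.1f%%%n", percentChange(6.65, 8.94));
        System.out.printf("12/18 -> 12/23: %+.1f%%%n", percentChange(8.94, 2.57));
        System.out.printf("Before: %.1f to %.1f warm-ups per hour%n",
                warmupsPerHour(4), warmupsPerHour(3));
        System.out.printf("After:  %.1f to %.1f warm-ups per hour%n",
                warmupsPerHour(25), warmupsPerHour(20));
    }
}
```

In other words, page loads got roughly 34% slower between 12/16 and 12/18 and then roughly 71% faster by 12/23, while warm-ups dropped from 15-20 per hour to about 2.4-3 per hour.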





On Monday, December 24, 2012 11:54:48 PM UTC+1, Saket Kumar wrote:
Hi Francois,

Are you still facing this issue? I did a small test for your application and didn't find anything too bad from the scheduler's perspective. I'm trying to understand whether a temporary glitch was causing the issue, whether something is wrong with the scheduler's algorithm, or whether the scheduler doesn't spin up instances properly when QPS is low.

Here is the series of events:

a.) A single resident instance had been serving for 1 day, 03:26:37, handling 9600 requests.
b.) After serving 96 requests with raised QPS, a new instance was created.
c.) QPS was lowered and the new instance was allowed to die.
d.) QPS was increased again and a new instance was created. Both instances were handling requests at this point.
e.) QPS was lowered again; the newly created instance died and the older instance went back to serving 100% of the requests.

-
Saket

Carl Schroeder

Dec 27, 2012, 2:01:30 PM
to google-a...@googlegroups.com
I am still seeing Java instances decommissioned after sub-minute quiet periods. Given that it takes 20-30 seconds to spin up one Java instance, you should probably leave them alive for a bit longer than a few seconds. Otherwise, for low-traffic profiles, page loads on GAE Java can take up to 30 seconds. God help us if the scheduler thinks I need 2 new instances spun up in series rather than in parallel.

FYI, people don't wait around for a minute for pages to load. They use other services.

Once again, due to unannounced pathological behavior of the instancing on GAE, we are wasting our time re-implementing our java infrastructure on AWS. At least, I hope it is a waste of our time...
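
A back-of-the-envelope model of why this hurts low-traffic apps so much. It rests on assumptions that are not from this thread: requests arrive as a Poisson process, there is a single instance, and that instance is shut down after a fixed idle timeout. Under those assumptions a request hits a cold instance whenever the gap since the previous request exceeds the timeout, which happens with probability exp(-rate × timeout). The 25-second spin-up is the midpoint of the 20-30 seconds mentioned above; the 60-second timeout, the 0.2-second warm latency, and the one-request-per-five-minutes rate are placeholders.

```java
/** Simplified cold-start model: Poisson arrivals, one instance, fixed idle timeout. */
final class ColdStartModel {
    /** Probability that the gap between two requests exceeds the idle timeout. */
    static double coldProbability(double requestsPerSecond, double idleTimeoutSeconds) {
        return Math.exp(-requestsPerSecond * idleTimeoutSeconds);
    }

    /** Expected latency = warm latency + P(cold) * spin-up time. */
    static double expectedLatency(double warmSeconds, double spinUpSeconds,
                                  double requestsPerSecond, double idleTimeoutSeconds) {
        return warmSeconds
                + coldProbability(requestsPerSecond, idleTimeoutSeconds) * spinUpSeconds;
    }

    public static void main(String[] args) {
        double rate = 1.0 / 300.0; // assumed: one request every five minutes on average
        double timeout = 60.0;     // assumed idle timeout in seconds
        System.out.printf("P(cold) = %.2f%n", coldProbability(rate, timeout));
        System.out.printf("Expected latency = %.1f s%n",
                expectedLatency(0.2, 25.0, rate, timeout));
    }
}
```

With these placeholder numbers about 82% of requests would pay the spin-up cost, for an expected latency of roughly 20.7 seconds. Raising the timeout to 10 minutes in the same model drops the cold probability to about 14%.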

Carl Schroeder

Dec 27, 2012, 2:12:27 PM
to google-a...@googlegroups.com
Also FYI, this whole issue could be resolved if you would stop sending user-facing requests to cold, uninitialized instances on Java GAE. Handling user requests in ways that you know will cause 20+ second response times is pathological.

All new development for us on GAE is blocked until this issue can be resolved.

Cesium

Dec 27, 2012, 2:32:01 PM
to google-a...@googlegroups.com
FYI, my app is cruising along just fine with one instance that has been alive for 2 days.

I feel your pain, brother.

David

Carl Schroeder

Dec 27, 2012, 3:04:18 PM
to google-a...@googlegroups.com
Someone is not sharing their unicorns. :(

Saket Kumar

Dec 27, 2012, 4:18:47 PM
to google-a...@googlegroups.com
Hi Carl,

Can you let me know your app-id?

Regards,
Saket