Scheduler still not working as expected.

timh

Oct 12, 2012, 8:13:32 PM
to google-a...@googlegroups.com
Hi

After all this time I still see some really odd behaviour from the frontend scheduler.

Case in point.

I have a billing-enabled HRD instance
Python 2.7, threadsafe=true
Two resident instances and 3 idle instances.
No traffic going to the site for the last 20-30 minutes.

Fire up a request, and wait 20 seconds whilst a new instance is started.

Based on all my years with appengine (first app in 2008) and all my reading, this does not appear to be following the rules.
I also note that with this new HRD-based app, startup times are often a lot worse than with the M/S instance that it is replacing.

Tim

Takashi Matsuo

Oct 13, 2012, 1:48:41 AM
to google-a...@googlegroups.com

Hi Tim,

Can you give me the details, such as the app-id and the exact time of occurrence, so I can take a look?

-- Takashi

--
You received this message because you are subscribed to the Google Groups "Google App Engine" group.
To view this discussion on the web visit https://groups.google.com/d/msg/google-appengine/-/xkXt-xE6NcAJ.
To post to this group, send email to google-a...@googlegroups.com.
To unsubscribe from this group, send email to google-appengi...@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/google-appengine?hl=en.



--
Takashi Matsuo | Developers Advocate | tma...@google.com

timh

Oct 13, 2012, 6:31:29 AM
to google-a...@googlegroups.com
Hi Takashi

fish-and-lily

The event occurred at 2012-10-13 07:02:32.339 GMT+8

I was inaccurate about the traffic going to the site, but there had not been a request for at least 1 minute before the request that caused the new instance to start.

Regards

Tim Hoffman

Takashi Matsuo

Oct 13, 2012, 8:31:23 AM
to google-a...@googlegroups.com

Hi Tim,

I don't think this is a serious issue because such loading requests in your app are still very rare.

In very rare cases, App Engine needs to direct a request to another instance (even if there is an idle instance) for various reasons. I think this is the nature of a distributed and shared system like App Engine.

-- Takashi



timh

Oct 13, 2012, 10:49:22 AM
to google-a...@googlegroups.com
Hi Takashi


On Saturday, October 13, 2012 8:31:43 PM UTC+8, Takashi Matsuo (Google) wrote:

Hi Tim,

I don't think this is a serious issue because such loading requests in your app are still very rare.


I know they are rare. 
 
In very rare cases, App Engine needs to direct a request to another instance (even if there is an idle instance) for various reasons. I think this is the nature of a distributed and shared system like App Engine.


It would seem that scheduling a request to one of 2 resident plus a couple of idle instances would be more efficient, even in a distributed/shared environment, than incurring a 29 sec cost of starting up a brand new instance though.

In my 2.5-based code, my app handler could start with no custom imports and serve cached content from memcache without having to start the full stack. This typically meant a <200 msec cold start to serve from cache vs 4-6 secs to start a complete instance.

My instances typically start in around 4 secs, but I am seeing a lot of longer startup times. I need to work out how I can achieve the same goal with the current 2.7 runtime, so as to avoid these sorts of hits in performance.

T

Barry Hunter

Oct 13, 2012, 11:40:17 AM
to google-a...@googlegroups.com
 
In very rare cases, App Engine needs to direct a request to another instance (even if there is an idle instance) for various reasons. I think this is the nature of a distributed and shared system like App Engine.


It would seem that scheduling a request to one of 2 resident plus a couple of idle instances would be more efficient, even in a distributed/shared environment, than incurring a 29 sec cost of starting up a brand new instance though.

I think the implication is that there are multiple "instances" of the scheduler. It might be because the app is being served out of multiple datacenters, or even just multiple scheduler instances within the same datacenter (one per rack?).

These scheduler instances don't know what app instances are already being run by *other* schedulers. They are self-contained.

So if a request happens to hit a scheduler instance that has no app instances running, there is nothing to do but wait for an app startup, even though there may be other app instances sitting idle.
 

Google probably does some complicated routing to try to always route an incoming request to a scheduler that is likely to have an instance for your app, maybe using some sort of consistent hashing. But sometimes that just will not work out. Maybe the scheduler lost connectivity with the router, because the router will also be a distributed service, probably running right at the edge of Google's network. Maybe the instance scheduler is being decommissioned (taken out of service). For a while it will still have app instances running, but they will be inaccessible. At scale, particularly in a distributed system, nothing happens 'instantly' and consistently; eventual consistency rules.
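
For readers unfamiliar with consistent hashing, here is a minimal generic sketch of the technique. It is not a description of how App Engine routes requests: the `HashRing` class, the scheduler names, and the use of MD5 with 100 virtual points per node are all assumptions chosen for illustration.

```python
import bisect
import hashlib


def _hash(key):
    """Map a string to a large integer position on the ring."""
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)


class HashRing:
    """Toy consistent-hash ring with virtual points per node."""

    def __init__(self, nodes, replicas=100):
        self._replicas = replicas
        self._keys = []   # sorted ring positions
        self._ring = {}   # ring position -> node name
        for node in nodes:
            self.add(node)

    def add(self, node):
        for i in range(self._replicas):
            h = _hash("%s#%d" % (node, i))
            self._ring[h] = node
            bisect.insort(self._keys, h)

    def remove(self, node):
        for i in range(self._replicas):
            h = _hash("%s#%d" % (node, i))
            del self._ring[h]
            self._keys.remove(h)

    def lookup(self, key):
        """Return the node owning the first ring position after the key."""
        if not self._keys:
            raise LookupError("ring is empty")
        idx = bisect.bisect(self._keys, _hash(key)) % len(self._keys)
        return self._ring[self._keys[idx]]


# Hypothetical scheduler names; an app id is used as the routing key.
ring = HashRing(["scheduler-a", "scheduler-b", "scheduler-c"])
owner = ring.lookup("fish-and-lily")
```

The useful property is that removing one node only moves the keys that node owned; every other key keeps its owner. That is why such a scheme would usually, but not always, send an app's requests to a scheduler that already knows about it.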



In general AppEngine is engineered for scale, tremendous scale. An app with many app instances running (so each scheduler should always have running app instances) will not notice these occasional blips. Whereas with a mostly idle application, the relative chance of getting a routing 'misdirection' is much increased.
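
A toy calculation shows why instance count would matter under this guess. It assumes, purely for illustration, that each app instance is attached to one of S independent schedulers chosen uniformly at random, and that a request lands on a scheduler chosen uniformly at random. The chance that the chosen scheduler has no instance of the app is then (1 - 1/S)^n for n instances. None of this is known to match the real system, and the function name is hypothetical.

```python
def cold_start_probability(schedulers, instances):
    """Chance a random scheduler holds none of the app's instances.

    Toy model only: instances are assumed to be spread uniformly and
    independently over the schedulers.
    """
    if schedulers < 1 or instances < 0:
        raise ValueError("need schedulers >= 1 and instances >= 0")
    return (1.0 - 1.0 / schedulers) ** instances


# With 4 hypothetical schedulers: 5 instances vs 50 instances.
few = cold_start_probability(4, 5)
many = cold_start_probability(4, 50)
```

In this model the probability falls off geometrically with the number of instances, which fits the observation that busy apps would rarely notice the effect while mostly idle ones would.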


Now of course Google could work more to lower the chance of it happening, e.g. by coupling the routing frontends and the schedulers more tightly, but the overhead of this is likely not worth it. It would complicate the overall system so much, both in code and in bandwidth, that it would probably slow down all requests somewhat.



(Note, I don't know that this is how Google implements appengine, but it seems likely)

 


My instances typically start in around 4 secs, but I am seeing a lot of longer startup times. I need to work out how I can achieve the same goal with the current 2.7 runtime, so as to avoid these sorts of hits in performance.

Get more traffic! The more traffic, the less these things will happen. 

Kristopher Giesing

Oct 13, 2012, 4:33:05 PM
to google-a...@googlegroups.com
On Saturday, October 13, 2012 5:31:43 AM UTC-7, Takashi Matsuo (Google) wrote:

In very rare cases, App Engine needs to direct a request to another instance (even if there is an idle instance) for various reasons. I think this is the nature of a distributed and shared system like App Engine.

I'm speechless.

Does GAE have an SLA or not?  If so, what is it?

I'm really starting to regret boarding this train.

timh

Oct 13, 2012, 7:06:41 PM
to google-a...@googlegroups.com
Hi Barry

What you are saying does make some sense.  

I don't regret using appengine at all. Unfortunately, appengine does require us to make some optimisations that ideally should not be necessary. It's a case of leaky abstractions. All the infrastructure is hidden by abstractions; however, we have to guess how it actually works and make some adjustments based on those guesses to get the most out of it and avoid performance problems that can crop up.
