Outages?

Outages? Adam Sherman 3/6/12 1:17 PM
Am I the only one seeing short duration outages? They are being reflected at:


But I don't see anyone else complaining anywhere, so it makes me worried.

A.

Re: Outages? Cesium 3/6/12 1:22 PM
I've got nothing but errors.
Re: [google-appengine] Re: Outages? Jeff Schnitzer 3/6/12 1:27 PM
I see a lot of errors on app startup like this:

Failed startup of context com.google.apphosting.utils.jetty.RuntimeAppEngineWebAppContext@15a2dc4{/,/base/data/home/apps/s~voostip/1.357238701206702102}
com.google.apphosting.api.DeadlineExceededException: This request (71d5265cd8f687bc) started at 2012/03/06 21:22:53.913 UTC and was still executing at 2012/03/06 21:23:53.699 UTC.
	at com.google.appengine.runtime.Request.process-71d5265cd8f687bc(Request.java)
	at java.util.zip.ZipFile.read(Native Method)
	at java.util.zip.ZipFile.access$1200(ZipFile.java:57)
	at java.util.zip.ZipFile$ZipFileInputStream.read(ZipFile.java:476)
	at java.util.zip.ZipFile$1.fill(ZipFile.java:259)
	at java.util.zip.InflaterInputStream.read(InflaterInputStream.java:158)
	at sun.misc.Resource.getBytes(Resource.java:124)
	at java.net.URLClassLoader.defineClass(URLClassLoader.java:273)
	at sun.reflect.GeneratedMethodAccessor5.invoke(Unknown Source)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:616)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:266)

It's not 100% of the time, but it's often enough to be scary.  If an instance gets up it seems to stay up.  But getting there is a problem.

Jeff
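
The two timestamps in the exception message are enough to check how long the request ran before the runtime gave up. A minimal Python sketch, assuming the message format shown in the trace above; the function name `elapsed_seconds` is hypothetical and not part of any App Engine API:

```python
import re
from datetime import datetime

# Matches the two timestamps in a DeadlineExceededException message.
# \s+ tolerates messages that were hard-wrapped by a mail client.
_STAMPS = re.compile(
    r"started\s+at\s+(\d{4}/\d{2}/\d{2}\s+\d{2}:\d{2}:\d{2}\.\d{3})\s+UTC\s+"
    r"and\s+was\s+still\s+executing\s+at\s+"
    r"(\d{4}/\d{2}/\d{2}\s+\d{2}:\d{2}:\d{2}\.\d{3})\s+UTC"
)
_FORMAT = "%Y/%m/%d %H:%M:%S.%f"


def elapsed_seconds(message):
    """Return the seconds between the two timestamps, or None if absent."""
    match = _STAMPS.search(message)
    if match is None:
        return None
    # Collapse any wrapped whitespace before parsing each timestamp.
    start = datetime.strptime(" ".join(match.group(1).split()), _FORMAT)
    end = datetime.strptime(" ".join(match.group(2).split()), _FORMAT)
    return (end - start).total_seconds()
```

For the trace above, the two timestamps are a little under 60 seconds apart, which is consistent with the loading request running into a roughly one-minute deadline while classes were still being read from the jar.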

--
You received this message because you are subscribed to the Google Groups "Google App Engine" group.
To view this discussion on the web visit https://groups.google.com/d/msg/google-appengine/-/AwHg5a7-EPoJ.

To post to this group, send email to google-a...@googlegroups.com.
To unsubscribe from this group, send email to google-appengi...@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/google-appengine?hl=en.

Re: [google-appengine] Re: Outages? Adam Sherman 3/6/12 1:28 PM
That is also what I am seeing.

--
Adam Sherman, CTO
Versature Corp. / +1.877.498.3772 x113

Follow us on Twitter - http://twitter.com/Versature
Check out the Versature Blog - http://inside.versature.com

Re: [google-appengine] Re: Outages? Joakim 3/6/12 1:41 PM
In addition to those, I've been getting logs with a single line of text reading "Request was aborted after waiting too long to attempt to service your request."
Too long seems to be about ten seconds.
Re: Outages? Francois Masurel 3/6/12 1:44 PM
Yep, getting quite a few errors on loading requests lately, like this one for example:

  1. 2012-03-06 20:26:42.834
    Uncaught exception from servlet
    org.apache.xerces.parsers.ObjectFactory$ConfigurationError: Provider org.apache.xerces.parsers.XIncludeAwareParserConfiguration could not be instantiated: com.google.apphosting.api.DeadlineExceededException: This request (c2d42bb1d5647665) started at 2012/03/06 19:25:43.000 UTC and was still executing at 2012/03/06 19:26:42.782 UTC.
    	at org.apache.xerces.parsers.ObjectFactory.newInstance(Unknown Source)
    	at org.apache.xerces.parsers.ObjectFactory.createObject(Unknown Source)
    	at org.apache.xerces.parsers.ObjectFactory.createObject(Unknown Source)
    	at org.apache.xerces.parsers.SAXParser.<init>(Unknown Source)
    	at org.apache.xerces.parsers.SAXParser.<init>(Unknown Source)
    	at org.apache.xerces.jaxp.SAXParserImpl$JAXPSAXParser.<init>(Unknown Source)
    	at org.apache.xerces.jaxp.SAXParserImpl.<init>(Unknown Source)
    	at org.apache.xerces.jaxp.SAXParserImpl.<init>(Unknown Source)
    	at org.apache.xerces.jaxp.SAXParserFactoryImpl.newSAXParserImpl(Unknown Source)
    	at org.apache.xerces.jaxp.SAXParserFactoryImpl.setFeature(Unknown Source)
    	at org.mortbay.xml.XmlParser.makeFactorySecure(XmlParser.java:162)
    	at org.mortbay.xml.XmlParser.setValidating(XmlParser.java:102)
    	at org.mortbay.xml.XmlParser.<init>(XmlParser.java:91)
    	at org.mortbay.jetty.webapp.TagLibConfiguration.configureWebApp(TagLibConfiguration.java:210)
    	at org.mortbay.jetty.webapp.WebAppContext.startContext(WebAppContext.java:1247)
    	at org.mortbay.jetty.handler.ContextHandler.doStart(ContextHandler.java:517)
    	at org.mortbay.jetty.webapp.WebAppContext.doStart(WebAppContext.java:467)
    	at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:50)
    	at com.google.apphosting.runtime.jetty.AppVersionHandlerMap.createHandler(AppVersionHandlerMap.java:202)
    	at com.google.apphosting.runtime.jetty.AppVersionHandlerMap.getHandler(AppVersionHandlerMap.java:171)
    	at com.google.apphosting.runtime.jetty.JettyServletEngineAdapter.serviceRequest(JettyServletEngineAdapter.java:123)
    	at com.google.apphosting.runtime.JavaRuntime$RequestRunnable.run(JavaRuntime.java:422)
    	at com.google.tracing.TraceContext$TraceContextRunnable.runInContext(TraceContext.java:449)
    	at com.google.tracing.TraceContext$TraceContextRunnable$1.run(TraceContext.java:455)
    	at com.google.tracing.TraceContext.runInContext(TraceContext.java:695)
    	at com.google.tracing.TraceContext$AbstractTraceContextCallback.runInInheritedContextNoUnref(TraceContext.java:333)
    	at com.google.tracing.TraceContext$AbstractTraceContextCallback.runInInheritedContext(TraceContext.java:325)
    	at com.google.tracing.TraceContext$TraceContextRunnable.run(TraceContext.java:453)
    	at com.google.apphosting.runtime.ThreadGroupPool$PoolEntry.run(ThreadGroupPool.java:251)
    	at java.lang.Thread.run(Thread.java:679)
    
  2. I 2012-03-06 20:26:42.879
    This request caused a new process to be started for your application, and thus caused your application code to be loaded for the first time. This request may thus take longer and use more CPU than a typical request for your application.
  3. W 2012-03-06 20:26:42.879
    A problem was encountered with the process that handled this request, causing it to exit. This is likely to cause a new process to be used for the next request to your application. If you see this message frequently, you may be throwing exceptions during the initialization of your application. (Error code 104)
Re: [google-appengine] Re: Outages? Adam Sherman 3/7/12 7:25 AM
So, apparently, we all imagined the problem. The status page no longer
admits to anything.

A.


RE: [google-appengine] Re: Outages? Brandon Wirtz 3/7/12 8:15 AM
> So, apparently, we all imagined the problem. The status page no longer
> admits to anything.

In most systems the Uptime is 100% minus the summation of the downtime of
all other systems.  The exception to this rule is logging. When Logging
fails to record the downtime, Uptime goes up.  As a result Google has been
working hard to build a logging system that goes down just ahead of all
other systems, and comes up shortly after.


Re: [google-appengine] Re: Outages? Adam Sherman 3/7/12 8:18 AM
On Wed, Mar 7, 2012 at 11:15 AM, Brandon Wirtz <dra...@digerat.com> wrote:
> In most systems the Uptime is 100% minus the summation of the downtime of
> all other systems.  The exception to this rule is logging. When Logging
> fails to record the downtime, Uptime goes up.  As a result Google has been
> working hard to build a logging system that goes down just ahead of all
> other systems, and comes up shortly after.

Well said sir!

I'm still laughing.

A.


Re: [google-appengine] Re: Outages? Nick 3/7/12 6:07 PM
I'm getting the same errors :(
Re: [google-appengine] Re: Outages? Miroslav Genov 3/8/12 12:50 AM
Same on my side. Normally our app boots in 6-7 seconds; now new-instance requests are taking 40-50 seconds, which causes requests to time out. The status page is not displaying any errors.

Any ideas?
Re: Outages? Tom Carchrae 3/8/12 1:13 AM

I was crying until I read Brandon's email.  Once I finish laughing, I
will resume crying - and thinking about a desperate flight to another
host.

I've been getting many of these over the past week, I'm starting to
pull my hair out.  I cannot reproduce locally and I've no idea what is
causing it as it's pretty intermittent.  It does seem to happen more
frequently with static file requests (images and javascript, etc).

2012-03-08 00:57:26.492
The process handling this request unexpectedly died. This is likely to
cause a new process to be used for the next request to your
application. (Error code 203)

Re: Outages? mv 3/8/12 7:15 AM
Same on my side:

com.google.apphosting.runtime.HardDeadlineExceededError: This request (148c8d61fefddf5e) started at 2012/03/08 15:06:07.837 UTC and was still executing at 2012/03/08 15:07:08.458 UTC.
	at com.google.appengine.runtime.Request.process-148c8d61fefddf5e(Request.java)
	at java.util.zip.ZipFile.open(Native Method)
	at java.util.zip.ZipFile.<init>(ZipFile.java:143)
	at java.util.jar.JarFile.<init>(JarFile.java:150)
	at java.util.jar.JarFile.<init>(JarFile.java:114)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:266)
	at org.springframework.util.ClassUtils.forName(ClassUtils.java:258)

Re: Outages? Nick 3/8/12 11:50 AM
http://code.google.com/p/googleappengine/issues/detail?id=6246

..still down, no response from the App Engine team.... This is the worst!  We can't do anything to fix the problem.  Just wait.... 
Re: Outages? charisl 3/8/12 6:03 AM
We have been getting the same exceptions in our app for the past few hours.
Our app currently has very low traffic (~2 requests/sec),
yet we see 12-15 instances live at any moment.

Has anyone figured this out?
How can we get an answer from Google on this?

Ch.

Here is a sample stack trace from our app:

com.google.apphosting.api.DeadlineExceededException: This request (bb4da48fbdbb7a77) started at 2012/03/08 13:51:26.034 UTC and was still executing at 2012/03/08 13:52:26.340 UTC.
        at java.util.zip.ZipFile.open(Native Method)
        at java.util.zip.ZipFile.<init>(ZipFile.java:143)
        at java.util.jar.JarFile.<init>(JarFile.java:150)
        at java.util.jar.JarFile.<init>(JarFile.java:87)
        at sun.misc.URLClassPath$JarLoader.getJarFile(URLClassPath.java:919)
        at sun.misc.URLClassPath$JarLoader.access$900(URLClassPath.java:723)
        at sun.misc.URLClassPath$JarLoader$1.run(URLClassPath.java:854)
        at java.security.AccessController.doPrivileged(Native Method)
        at sun.misc.URLClassPath$JarLoader.ensureOpenSynchronized(URLClassPath.java:846)
        at sun.misc.URLClassPath$JarLoader.ensureOpen(URLClassPath.java:838)
        at sun.misc.URLClassPath$JarLoader.<init>(URLClassPath.java:785)
        at sun.misc.URLClassPath$JarLoader.<init>(URLClassPath.java:743)
        at sun.misc.URLClassPath$3.run(URLClassPath.java:412)
        at java.security.AccessController.doPrivileged(Native Method)
        at sun.misc.URLClassPath.getLoader(URLClassPath.java:395)
        at sun.misc.URLClassPath.getLoader(URLClassPath.java:371)
        at sun.misc.URLClassPath.findResource(URLClassPath.java:201)
        at java.net.URLClassLoader$2.run(URLClassLoader.java:379)
        at java.security.AccessController.doPrivileged(Native Method)
        at java.net.URLClassLoader.findResource(URLClassLoader.java:376)
        at com.google.apphosting.runtime.security.UserClassLoader.findResource(UserClassLoader.java:723)
        at java.lang.ClassLoader.getResource(ClassLoader.java:977)
        at org.mortbay.resource.Resource.newSystemResource(Resource.java:203)
        at org.mortbay.jetty.webapp.WebXmlConfiguration.configureDefaults(WebXmlConfiguration.java:159)
        at org.mortbay.jetty.webapp.WebAppContext.startContext(WebAppContext.java:1230)
C 2012-03-08 15:52:26.871

Uncaught exception from servlet
javax.servlet.UnavailableException: Initialization failed.
        at com.google.apphosting.runtime.jetty.AppVersionHandlerMap.createHandler(AppVersionHandlerMap.java:211)
I 2012-03-08 15:52:27.089

This request caused a new process to be started for your application,
and thus caused your application code to be loaded for the first time.
This request may thus take longer and use more CPU than a typical
request for your application.

W 2012-03-08 15:52:27.089

A problem was encountered with the process that handled this request,
causing it to exit. This is likely to cause a new process to be used
for the next request to your application. If you see this message
Re: Outages? ra 3/8/12 8:54 AM
We are also getting strange errors on two of our apps resulting in a
500 and unreachable apps.
Yesterday (the 7th) appid sciplanet-hq (a paid app) was affected; today it is
appid backupgoo-web, which has now been unreachable for at least 3
hours.

The GAE status says "everything okay". Can't be true...

Does anybody have a clue?


Worried,


Raphael



Re: Outages? Richard Watson 3/8/12 11:56 PM
App Engine team - is there a policy of specifically avoiding these threads? I do assume some of you feel a desire to participate, but you seem to go extra-quiet when they pop up. Just wondering why.
Re: Outages? Richard Watson 3/9/12 12:08 AM
Just saw  https://groups.google.com/forum/?fromgroups#!topic/google-appengine/jufkxPik1Js which I assume is related and does have some responses from the team. If so, ignore my question!
Re: Outages? Nikolai 3/9/12 3:33 AM
+1
we had to move to our backup systems. Everything is full of 500 errors or hardcore latency.
Most of the 500 errors we see aren't even logged, so this seems to be a Google problem one abstraction layer above the app.

And yes - sometimes we have got the same feeling, that we are the only ones that use appengine in a production setting. You are not alone ;)

regards,
nikolai
Re: Outages? Thomas Baldauf 3/9/12 2:16 AM
Same here: over the last few days we've been getting lots of error logs
like these:

Request was aborted after waiting too long to attempt to service your
request.
or
A problem was encountered with the process that handled this request,
causing it to exit. This is likely to cause a new process to be used
for the next request to your application. (Error code 204)

Most of these requests take several minutes until they die, which is a
disaster when it comes to user experience :-(

Please Google, do something about this! Our users are paying for the
service, and we want to meet their expectations!
Re: Outages? Ronoaldo José de Lana Pereira 3/9/12 7:24 AM
+1 for seeing the same problems on my app.

It got worse after the maintenance on March 7.
Re: [google-appengine] Re: Outages? Ikai Lan (Google) 3/9/12 11:15 AM
Hey everyone,

Here are a few things that will help:

1. Application IDs (<--- if you have nothing else, at least provide this)
2. What is your QPS?
3. What % of your requests are errors?
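
For anyone unsure how to produce items 2 and 3, here is a rough Python sketch. It assumes you can pull the HTTP status codes of the requests seen in some time window out of your logs, and it assumes that 5xx responses are what counts as an error; `summarize` and its parameters are hypothetical names, not an App Engine API:

```python
def summarize(status_codes, window_seconds):
    """Return (qps, error_percent) for the requests seen in one time window.

    status_codes   -- iterable of HTTP status codes, one per request
    window_seconds -- length of the observation window in seconds
    """
    if window_seconds <= 0:
        raise ValueError("window_seconds must be positive")
    codes = list(status_codes)
    total = len(codes)
    # Assumption: only server-side failures (5xx) count as errors here,
    # so 404s for non-existent pages are not mixed in.
    errors = sum(1 for code in codes if 500 <= code <= 599)
    qps = total / float(window_seconds)
    error_percent = 100.0 * errors / total if total else 0.0
    return qps, error_percent
```

Counting only 5xx responses keeps ordinary client errors, such as requests for missing pages, out of the error percentage.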

--
Ikai Lan 
Developer Programs Engineer, Google App Engine




Re: [google-appengine] Re: Outages? Alexander Trakhimenok 3/9/12 11:48 AM
Hey Ikai,

Our app id: petaclasses

QPS: 5-20 requests per second

Current instances in dashboard: 110 - 160
Usual instances: 8-15

It's hard to give a % of failed requests, as we also have requests that fail for other reasons (e.g. non-existent pages, etc.) and we're not sure how to separate them easily.

By the way, are you guys considering creating a page where we can post/report this data in some structured way and "join" an issue, so you can accumulate reports and easily understand the scale of an issue?

Alex
Re: [google-appengine] Re: Outages? Ikai Lan (Google) 3/9/12 11:57 AM
Alex, to answer that question: yes. We are looking to revamp the production issues tracker which is far from optimal. When users can join or aggregate issues, it allows us to quickly separate actual infrastructure hiccups from user code issues.

Thanks for the info! Is there any other behavior you can report? Does it sound reasonable that you have 110-160 instances because of long startup times leading to more instances required to serve the same load? Are you Python, Java or Go, and do you have concurrent requests enabled?


Re: [google-appengine] Re: Outages? Alexey Konovalov 3/9/12 12:10 PM
Ikai,

Our app IDs:

rvaserver
rvauser
contentfinancial
contentsports

QPS and error rates differ but they've all been getting a lot of DeadlineExceeded exceptions and the number of instances has been higher than usual over the last couple of days.

Regards,
Alexey




Re: [google-appengine] Re: Outages? Alexander Trakhimenok 3/9/12 12:14 PM
We are Python 2.5 (no concurrent).

Yes, it seems the start-up time is just crazy high for at least some, if not all, instances.

I also noticed that there are lots of instances that have served just 1 request, with an average latency of 0ms, QPS=0, and an average instance age of about 8-9 minutes (up to 11 minutes). To me it looks like an instance is created to serve static content, is never used again, and stays around until it dies a while later.

At the moment we have 264 active instances and it's killing our budget :( - see the screenshot attached. We had 2 hours of downtime due to an exceeded budget.

Alex


Re: [google-appengine] Re: Outages? Ronoaldo José de Lana Pereira 3/9/12 12:22 PM

Just a follow up:

1. Application Id: oferta-unica
2. QPS: Currently around ~10 dynamic req/sec, overall ~32 req/sec
3. After disabling concurrent requests, ~0.6 errors/sec; before, ~1.5 errors/sec.

As Alexander said, some of the errors aren't due to this issue, but I can confirm that we have lots of user-facing 500 errors, because our custom 500 error page sends events to Google Analytics:



Thanks for your help.
Re: [google-appengine] Re: Outages? Nick 3/9/12 12:27 PM
appid: i-strive-to
java - thread safe set to true.

Re: [google-appengine] Re: Outages? Amit S 3/9/12 1:30 PM
Hi Ikai,

1. Application IDs (<--- if you have nothing else, at least provide this)
textyserver

2. What is your QPS?
Currently at ~10 QPS. Latency is approximately 115.3ms. Usually it's less than 30ms.

3. What % of your requests are errors?
Current instances in dashboard: 30 - 40
Usual instances: 10-15

Some of the exceptions we are seeing are :
1) com.google.appengine.api.datastore.DatastoreTimeoutException: The datastore operation timed out, or the data was temporarily unavailable.
at com.google.appengine.api.datastore.DatastoreApiHelper.translateError(DatastoreApiHelper.java:46)
at com.google.appengine.api.datastore.DatastoreApiHelper$1.convertException(DatastoreApiHelper.java:76)
at com.google.appengine.api.utils.FutureWrapper.get(FutureWrapper.java:106)
at com.google.appengine.api.datastore.FutureHelper.getInternal(FutureHelper.java:72)
at com.google.appengine.api.datastore.FutureHelper.quietGet(FutureHelper.java:33)
at com.google.appengine.api.datastore.QueryResultsSourceImpl.peekQueryResultAndIfFirstRecordIndexList(QueryResultsSourceImpl.java:162)
at com.google.appengine.api.datastore.QueryResultsSourceImpl.loadMoreEntities(QueryResultsSourceImpl.java:98)
at com.google.appengine.api.datastore.QueryResultsSourceImpl.loadMoreEntities(QueryResultsSourceImpl.java:85)
at com.google.appengine.api.datastore.QueryResultIteratorImpl.ensureLoaded(QueryResultIteratorImpl.java:161)
at com.google.appengine.api.datastore.QueryResultIteratorImpl.hasNext(QueryResultIteratorImpl.java:65)

Happens once every few minutes for a user.

2) java.lang.NullPointerException.
This is thrown from our custom code when accessing an Entity with approximately 5000 records. Basically, the query (using the low-level API) did not return any results. This happens only sometimes and affects a very small percentage of users. This used to work perfectly fine a few days ago. We are using chunking at 1000 records.

We are on Java and have enabled concurrent requests. Let us know if we can provide more information to debug this issue. 

thanks!
Re: [google-appengine] Re: Outages? Amit S 3/9/12 1:53 PM
Also now getting the exceptions below:

java.lang.ExceptionInInitializerError
at org.datanucleus.jdo.metadata.JDOAnnotationReader.processClassAnnotations(JDOAnnotationReader.java:140)
at org.datanucleus.metadata.annotations.AbstractAnnotationReader.getMetaDataForClass(AbstractAnnotationReader.java:122)
at org.datanucleus.metadata.annotations.AnnotationManagerImpl.getMetaDataForClass(AnnotationManagerImpl.java:136)
at org.datanucleus.metadata.MetaDataManager.loadAnnotationsForClass(MetaDataManager.java:2278)
at org.datanucleus.jdo.metadata.JDOMetaDataManager.getMetaDataForClassInternal(JDOMetaDataManager.java:369)
at org.datanucleus.metadata.MetaDataManager.getMetaDataForClass(MetaDataManager.java:1125)
at org.datanucleus.store.appengine.EntityUtils.idToInternalKey(EntityUtils.java:122)
at org.datanucleus.store.appengine.jdo.DatastoreJDOPersistenceManager.getObjectById(DatastoreJDOPersistenceManager.java:63)

com.google.apphosting.runtime.HardDeadlineExceededError: This request (a9a7135b6a5f023e) started at 2012/03/09 21:37:22.971 UTC and was still executing at 2012/03/09 21:38:22.911 UTC.
at org.datanucleus.store.appengine.DatastoreManager.getDatastoreClass(DatastoreManager.java:765)
at org.datanucleus.store.appengine.DatastoreManager.getDatastoreClass(DatastoreManager.java:88)
at org.datanucleus.store.appengine.EntityUtils.determineKind(EntityUtils.java:98)
at org.datanucleus.store.appengine.EntityUtils.determineKind(EntityUtils.java:94)
at org.datanucleus.store.appengine.EntityUtils.idToInternalKey(EntityUtils.java:124)
com.google.apphosting.api.DeadlineExceededException: This request (34c3e1d5cfc6d211) started at 2012/03/09 21:36:34.867 UTC and was still executing at 2012/03/09 21:37:34.367 UTC.
at com.google.appengine.runtime.Request.process-34c3e1d5cfc6d211(Request.java)
at java.util.zip.ZipFile.read(Native Method)
at java.util.zip.ZipFile.access$1200(ZipFile.java:57)
at java.util.zip.ZipFile$ZipFileInputStream.read(ZipFile.java:476)
at java.util.zip.ZipFile$1.fill(ZipFile.java:259)
at java.util.zip.InflaterInputStream.read(InflaterInputStream.java:158)
at java.io.FilterInputStream.read(FilterInputStream.java:133)
at java.io.FilterInputStream.read(FilterInputStream.java:107)

Re: [google-appengine] Re: Outages? Ikai Lan (Google) 3/9/12 2:07 PM
I forgot to ask if these were master/slave or high replication apps. I can always check by going to the admin console, but I'm hoping to separate them out.

We're looking into the HR apps first (once I figure out which is which).


Re: [google-appengine] Re: Outages? Amit S 3/9/12 2:12 PM
textyserver is on master/slave.
Re: [google-appengine] Re: Outages? Ikai Lan (Google) 3/9/12 2:14 PM
Yep, I figured it out (when you look at an app in the admin console, if the app ID has a s~ prefix, that means it runs in High Replication). I was just pointing it out for people who hadn't yet reported application IDs.
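
That check is simple enough to script when sorting a list of reported app IDs. A minimal Python sketch, assuming the ID is copied exactly as the admin console displays it (including any prefix); `is_high_replication` is a hypothetical helper, not part of any SDK:

```python
def is_high_replication(displayed_app_id):
    """True if the app ID, as shown in the admin console, has the s~ prefix."""
    return displayed_app_id.strip().startswith("s~")


def split_by_replication(displayed_app_ids):
    """Split IDs into (high_replication, other) lists, preserving order."""
    high, other = [], []
    for app_id in displayed_app_ids:
        (high if is_high_replication(app_id) else other).append(app_id)
    return high, other
```

For example, the `s~voostip` that appears in the startup trace earlier in this thread would land in the first list. An ID without the prefix is only "not flagged as High Replication" by this check; treating it as master/slave is an assumption.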




Re: [google-appengine] Re: Outages? Ikai Lan (Google) 3/9/12 2:44 PM
Okay, at least with this thread, it seems like the common thread is that the applications are master/slave applications. We're going to try to tweak a few things on our end to lessen the pain, but pay attention to the downtime-notify@ list (https://groups.google.com/forum/?fromgroups#!forum/google-appengine-downtime-notify). We may announce another maintenance soon as a slightly longer-term fix (while the real long term fix is on its way).

I'm going to switch to take a look at some of the issues with error code 20x - those seem to be impacting the High Replication applications as well.




Re: [google-appengine] Re: Outages? John 3/9/12 4:18 PM
My app is responding, again.

Re: [google-appengine] Re: Outages? Amit S 3/9/12 5:56 PM
Any update on this? Still seeing many errors & exceptions in the logs. 


Re: [google-appengine] Re: Outages? John 3/9/12 6:08 PM
Our apps seem to be better since about 15:30 PT.
Re: Outages? Alexander Trakhimenok 3/9/12 8:09 PM
Our Python 2.5 app "petaclasses" still has a crazy number of instances - over 250 at the moment against the normal 10-15.

Our average cost to run the app is $25 per day - it's 4 hours till the end of the day and we have already spent $154. What the .... ??

Is it the policy of the Google AppEngine team that we (customers) should pay for mistakes made by GAE? Any chance to get it fixed and get a refund?

Thanks,
Alex
Re: Outages? Alex Popescu 3/10/12 2:20 AM
I'm also interested to find out how to get a refund after the reports I've submitted in the last 24h showing in this thread [1] that since the maintenance window Google AppEngine itself has been the only one responsible for consuming my quota. While historical stats have also been wiped out since the last maintenance I still have the billing history which can be used as historical data.

Alexander, should we start a separate thread to get this info from Google?

Re: Outages? Alexander Trakhimenok 3/10/12 6:18 AM
Alex,

I believe it's better to wait till Monday before starting a new thread, as otherwise it would not get much attention. By Monday we will have the actual billing history data for the 9th of March, so we will be able to raise a case with some data and prove a pattern.

In the meantime I believe we can fill in a spreadsheet - I've created a template: https://docs.google.com/spreadsheet/ccc?key=0AtxJ_-1aIO7idEhYT3pyLVZ4R3F0cGdTMFhyZkswVWc

We can also raise an issue in the issue tracker.

Alex
Re: [google-appengine] Re: Outages? Amit S 3/10/12 11:26 AM
appid: textyserver

Still getting lots of exceptions, mainly:

1) com.google.apphosting.runtime.HardDeadlineExceededError exceptions, 
2) Failed startup of context com.google.apphosting.utils.jetty.RuntimeAppEngineWebAppContext
3) javax.jdo.JDOException: Transaction failed to commit at org.datanucleus.jdo.NucleusJDOHelper.getJDOExceptionForNucleusException(NucleusJDOHelper.java:419) at org.datanucleus.jdo.JDOPersistenceManager.close(JDOPersistenceManager.java:281)

Status page is saying everything is normal - http://code.google.com/status/appengine which does not seem true.

Please let us know if you need more information. 

thanks!
Amit
Re: [google-appengine] Re: Outages? nischalshetty 3/10/12 6:45 PM
Best explanation ever.

On Wednesday, March 7, 2012 9:45:11 PM UTC+5:30, Brandon Wirtz wrote:
> So, apparently, we all imagined the problem. The status page no longer
> admits to anything.

In most systems the Uptime is 100% minus the summation of the downtime of
all other systems.  The exception to this rule is logging. When Logging
fails to record the downtime, Uptime goes up.  As a result Google has been
working hard to build a logging system that goes down just ahead of all
other systems, and comes up shortly after.


Re: Outages? Thomas Baldauf 3/12/12 5:17 AM
We have been experiencing very poor performance / high latency on
Google AppEngine for about 2 weeks. Even the most basic requests take 1-2s,
going up to over 60s which is totally unacceptable from a user's point
of view. At the same time, about 1-2% of all requests are failing
(without any app-internal reason). Our paying (!) users are beginning
to complain and ask for a fix. Yet, we're totally dependent on
Google's mercy to solve this problem.

SO PLEASE GOOGLE: do something about it!
THANK YOU!!

appid: typescout

Regards,
Thomas
Re: Outages? Thomas Baldauf 3/12/12 5:20 AM
appid: typescout
qps: 1-2 requests per second
% failed: 1-2%

we're experiencing very high latency and too many instances without a
reason!

thanks for your help.
regards,
thomas


On 9 Mar., 20:15, "Ikai Lan (Google)" <ika...@google.com> wrote:
> Hey everyone,
>
> Here are a few things that will help:
>
> 1. Application IDs (<--- if you have nothing else, at least provide this)
> 2. What is your QPS?
> 3. What % of your requests are errors?
>
> --
> Ikai Lan
> Developer Programs Engineer, Google App Engine
> plus.ikailan.com
>
> On Fri, Mar 9, 2012 at 7:24 AM, Ronoaldo José de Lana Pereira <rpere...@beneficiofacil.com.br> wrote:
> > +1 for seeing the same problems on my app.
>
> > It started to be worse after maintenance on March 7.
>
> > On Friday, March 9, 2012 08:33:36 UTC-3, Nikolai wrote:
>
> >> +1
> >> we had to move to our backup systems. Everything is full of 500 errors or
> >> hardcore latency.
> >> Most of the 500 errors we see aren't even logged so this seems to be a
> >> Google problem one abstraction layer above the app.
>
> >> And yes - sometimes we have got the same feeling, that we are the only
> >> ones that use appengine in a production setting. You are not alone ;)
>
> >> regards,
> >> nikolai
>
> >> On Tuesday, March 6, 2012 22:17:37 UTC+1, Adam Sherman wrote:
>
> >>> Am I the only one seeing short duration outages? They are being
> >>> reflected at:
>
> >>> http://code.google.com/status/appengine
>
> >>> But I don't see anyone else complaining anywhere, so it makes me worried.
>
> >>> A.
Re: Outages? Oliver 3/12/12 5:43 AM
Re: Outages? Tony 3/12/12 8:39 AM
So it keeps shutting down instances, warming them up and eating my money. AppID: crystalupdate


  1. 2012-03-12 08:31:44.715
    Failed startup of context com.google.apphosting.utils.jetty.RuntimeAppEngineWebAppContext@1e0e954{/,/base/data/home/apps/crystalupdate/32.356258734295133782}
    com.google.apphosting.api.DeadlineExceededException: This request (74a86293e2b5f06a) started at 2012/03/12 15:30:43.629 UTC and was still executing at 2012/03/12 15:31:44.572 UTC.
    	at com.google.appengine.runtime.Request.process-74a86293e2b5f06a(Request.java)
    	at java.util.zip.ZipFile.read(Native Method)
    	at java.util.zip.ZipFile.access$1200(ZipFile.java:57)
    	at java.util.zip.ZipFile$ZipFileInputStream.read(ZipFile.java:476)
    	at java.util.zip.ZipFile$1.fill(ZipFile.java:259)
    	at java.util.zip.InflaterInputStream.read(InflaterInputStream.java:158)
    	at sun.misc.Resource.getBytes(Resource.java:124)
    	at java.net.URLClassLoader.defineClass(URLClassLoader.java:273)
    	at sun.reflect.GeneratedMethodAccessor6.invoke(Unknown Source)
    	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    	at java.lang.reflect.Method.invoke(Method.java:616)
    	at java.lang.ClassLoader.loadClass(ClassLoader.java:266)
    	at java.lang.Class.getDeclaredConstructors0(Native Method)
    	at java.lang.Class.privateGetDeclaredConstructors(Class.java:2406)
    	at java.lang.Class.getConstructor0(Class.java:2716)
    	at java.lang.Class.newInstance0(Class.java:343)
    	at java.lang.Class.newInstance(Class.java:325)
    	at org.mortbay.jetty.servlet.Holder.newInstance(Holder.java:153)
    	at org.mortbay.jetty.servlet.FilterHolder.doStart(FilterHolder.java:92)
    	at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:50)
    	at org.mortbay.jetty.servlet.ServletHandler.initialize(ServletHandler.java:662)
    	at org.mortbay.jetty.servlet.Context.startContext(Context.java:140)
    	at org.mortbay.jetty.webapp.WebAppContext.startContext(WebAppContext.java:1250)
    	at org.mortbay.jetty.handler.ContextHandler.doStart(ContextHandler.java:517)
    	at org.mortbay.jetty.webapp.WebAppContext.doStart(WebAppContext.java:467)
    	at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:50)
    	at com.google.tracing.TraceContext$TraceContextRunnable.runInContext(TraceContext.java:449)
    	at com.google.tracing.TraceContext$TraceContextRunnable$1.run(TraceContext.java:455)
    	at com.google.tracing.TraceContext.runInContext(TraceContext.java:695)
    	at com.google.tracing.TraceContext$AbstractTraceContextCallback.runInInheritedContextNoUnref(TraceContext.java:333)
    	at com.google.tracing.TraceContext$AbstractTraceContextCallback.runInInheritedContext(TraceContext.java:325)
    	at com.google.tracing.TraceContext$TraceContextRunnable.run(TraceContext.java:453)
    	at java.lang.Thread.run(Thread.java:679)
    
  2. C 2012-03-12 08:31:44.749
    Uncaught exception from servlet
    javax.servlet.UnavailableException: Initialization failed.
    	at com.google.apphosting.runtime.jetty.AppVersionHandlerMap.createHandler(AppVersionHandlerMap.java:211)
    	at com.google.apphosting.runtime.jetty.AppVersionHandlerMap.getHandler(AppVersionHandlerMap.java:171)
    	at com.google.apphosting.runtime.jetty.JettyServletEngineAdapter.serviceRequest(JettyServletEngineAdapter.java:123)
    	at com.google.apphosting.runtime.JavaRuntime$RequestRunnable.run(JavaRuntime.java:422)
    	at com.google.tracing.TraceContext$TraceContextRunnable.runInContext(TraceContext.java:449)
    	at com.google.tracing.TraceContext$TraceContextRunnable$1.run(TraceContext.java:455)
    	at com.google.tracing.TraceContext.runInContext(TraceContext.java:695)
    	at com.google.tracing.TraceContext$AbstractTraceContextCallback.runInInheritedContextNoUnref(TraceContext.java:333)
    	at com.google.tracing.TraceContext$AbstractTraceContextCallback.runInInheritedContext(TraceContext.java:325)
    	at com.google.tracing.TraceContext$TraceContextRunnable.run(TraceContext.java:453)
    	at com.google.apphosting.runtime.ThreadGroupPool$PoolEntry.run(ThreadGroupPool.java:251)
    	at java.lang.Thread.run(Thread.java:679)
    
  3. I 2012-03-12 08:31:44.924
    This request caused a new process to be started for your application, and thus caused your application code to be loaded for the first time. This request may thus take longer and use more CPU than a typical request for your application.
  4. W 2012-03-12 08:31:44.924
    A problem was encountered with the process that handled this request, causing it to exit. This is likely to cause a new process to be used for the next request to your application. If you see this message frequently, you may be throwing exceptions during the initialization of your application. (Error code 104)
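
The DeadlineExceededException message in the trace above carries both a start time and a "still executing" time, so the time spent in the failed startup can be read straight from the log. A minimal sketch, assuming the message always follows the wording seen in this thread (the regular expression and function name are illustrative, not part of any App Engine API):

```python
import re
from datetime import datetime

# Matches the two timestamps in the exception message as it appears in the thread.
PATTERN = re.compile(
    r"started at (\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}\.\d{3}) UTC "
    r"and was still executing at (\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}\.\d{3}) UTC"
)
FORMAT = "%Y/%m/%d %H:%M:%S.%f"


def elapsed_seconds(message):
    """Return seconds between the two timestamps, or None if they are absent."""
    match = PATTERN.search(message)
    if match is None:
        return None
    start, end = (datetime.strptime(ts, FORMAT) for ts in match.groups())
    return (end - start).total_seconds()


message = (
    "This request (74a86293e2b5f06a) started at 2012/03/12 15:30:43.629 UTC "
    "and was still executing at 2012/03/12 15:31:44.572 UTC."
)
print(elapsed_seconds(message))  # 60.943
```

The request was cut off after about 61 seconds, which matches the roughly 60-second waits other posters in this thread report for failed instance startups.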
Re: Outages? Admin 3/12/12 8:40 AM
Our app id: reportingsuiteengine
Master/slave
Current instances in dashboard: 8-10
Usual instances: 2-3

Since last Thursday, without having made any kind of change either in the application configuration (in the console) or the actual code (deployments), we have been suffering abnormally low performance and continuous errors (including 500 Server errors).

Please find additional information, including logs and screenshots, in http://code.google.com/p/googleappengine/issues/detail?id=7113

We would appreciate it if you could provide us with an estimate of when this situation will be solved. Our customers are starting to feel really upset and angry, and some of them have threatened to leave us if we don't solve this soon or at least give them an estimated resolution date/time.

Thanks in advance,
Luis
Re: Outages? Vincent 3/12/12 9:14 AM
Hi GAE guys,

same issue for us ... please react ... our customers are complaining :-(

appid: logos-contacts 

Regards 
Vincent

  1. 2012-03-12 11:31:23.508 /_ah/login_required?continue=https://logos-contacts.appspot.com/ 500 72138ms 0kb Mozilla/5.0 (Windows NT 5.1) AppleWebKit/534.24 (KHTML, like Gecko) Chrome/11.0.696.60 Safari/534.24
    124.148.222.42 - - [12/Mar/2012:04:31:23 -0700] "GET /_ah/login_required?continue=https://logos-contacts.appspot.com/ HTTP/1.1" 500 0 "from customer intranet" "Mozilla/5.0 (Windows NT 5.1) AppleWebKit/534.24 (KHTML, like Gecko) Chrome/11.0.696.60 Safari/534.24" "logos-contacts.appspot.com" ms=72138 cpu_ms=887 api_cpu_ms=0 cpm_usd=0.024681 loading_request=1 exit_code=104 instance=00c61b117c5f903ba6921dbd52f633545a72631c
  2. W 2012-03-12 11:31:22.323
    Failed startup of context com.google.apphosting.utils.jetty.RuntimeAppEngineWebAppContext@3aa791{/,/base/data/home/apps/logos-contacts/v12.357205322028193482}
    com.google.apphosting.api.DeadlineExceededException: This request (0746107fc018dfa5) started at 2012/03/12 11:30:21.209 UTC and was still executing at 2012/03/12 11:31:21.935 UTC.
    	at java.util.zip.ZipFile.open(Native Method)
    	at java.util.zip.ZipFile.<init>(ZipFile.java:143)
    	at java.util.jar.JarFile.<init>(JarFile.java:150)
    	at java.util.jar.JarFile.<init>(JarFile.java:87)
    	at sun.misc.URLClassPath$JarLoader.getJarFile(URLClassPath.java:919)
    	at sun.misc.URLClassPath$JarLoader.access$900(URLClassPath.java:723)
    	at sun.misc.URLClassPath$JarLoader$1.run(URLClassPath.java:854)
    	at java.security.AccessController.doPrivileged(Native Method)
    	at sun.misc.URLClassPath$JarLoader.ensureOpenSynchronized(URLClassPath.java:846)
    	at sun.misc.URLClassPath$JarLoader.ensureOpen(URLClassPath.java:838)
    	at sun.misc.URLClassPath$JarLoader.<init>(URLClassPath.java:785)
    	at sun.misc.URLClassPath$JarLoader.<init>(URLClassPath.java:743)
    	at sun.misc.URLClassPath$3.run(URLClassPath.java:412)
    	at java.security.AccessController.doPrivileged(Native Method)
    	at sun.misc.URLClassPath.getLoader(URLClassPath.java:395)
    	at sun.misc.URLClassPath.getLoader(URLClassPath.java:371)
    	at sun.misc.URLClassPath.findResource(URLClassPath.java:201)
    	at java.net.URLClassLoader$2.run(URLClassLoader.java:379)
    	at java.security.AccessController.doPrivileged(Native Method)
    	at java.net.URLClassLoader.findResource(URLClassLoader.java:376)
    	at com.google.apphosting.runtime.security.UserClassLoader.findResource(UserClassLoader.java:723)
    	at java.lang.ClassLoader.getResource(ClassLoader.java:977)
    	at org.mortbay.resource.Resource.newSystemResource(Resource.java:203)
    	at org.mortbay.jetty.webapp.WebXmlConfiguration.configureDefaults(WebXmlConfiguration.java:159)
    	at org.mortbay.jetty.webapp.WebAppContext.startContext(WebAppContext.java:1230)
    	at org.mortbay.jetty.handler.ContextHandler.doStart(ContextHandler.java:517)
    	at org.mortbay.jetty.webapp.WebAppContext.doStart(WebAppContext.java:467)
    	at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:50)
    	at com.google.apphosting.runtime.jetty.AppVersionHandlerMap.createHandler(AppVersionHandlerMap.java:202)
    	at com.google.apphosting.runtime.jetty.AppVersionHandlerMap.getHandler(AppVersionHandlerMap.java:171)
    	at com.google.apphosting.runtime.jetty.JettyServletEngineAdapter.serviceRequest(JettyServletEngineAdapter.java:123)
    	at com.google.apphosting.runtime.JavaRuntime$RequestRunnable.run(JavaRuntime.java:422)
    	at com.google.tracing.TraceContext$TraceContextRunnable.runInContext(TraceContext.java:449)
    	at com.google.tracing.TraceContext$TraceContextRunnable$1.run(TraceContext.java:455)
    	at com.google.tracing.TraceContext.runInContext(TraceContext.java:695)
    	at com.google.tracing.TraceContext$AbstractTraceContextCallback.runInInheritedContextNoUnref(TraceContext.java:333)
    	at com.google.tracing.TraceContext$AbstractTraceContextCallback.runInInheritedContext(TraceContext.java:325)
    	at com.google.tracing.TraceContext$TraceContextRunnable.run(TraceContext.java:453)
    	at com.google.apphosting.runtime.ThreadGroupPool$PoolEntry.run(ThreadGroupPool.java:251)
    	at java.lang.Thread.run(Thread.java:679)
    
  3. C 2012-03-12 11:31:22.345
    Uncaught exception from servlet
    javax.servlet.UnavailableException: Initialization failed.
    	at com.google.apphosting.runtime.jetty.AppVersionHandlerMap.createHandler(AppVersionHandlerMap.java:211)
    	at com.google.apphosting.runtime.jetty.AppVersionHandlerMap.getHandler(AppVersionHandlerMap.java:171)
    	at com.google.apphosting.runtime.jetty.JettyServletEngineAdapter.serviceRequest(JettyServletEngineAdapter.java:123)
    	at com.google.apphosting.runtime.JavaRuntime$RequestRunnable.run(JavaRuntime.java:422)
    	at com.google.tracing.TraceContext$TraceContextRunnable.runInContext(TraceContext.java:449)
    	at com.google.tracing.TraceContext$TraceContextRunnable$1.run(TraceContext.java:455)
    	at com.google.tracing.TraceContext.runInContext(TraceContext.java:695)
    	at com.google.tracing.TraceContext$AbstractTraceContextCallback.runInInheritedContextNoUnref(TraceContext.java:333)
    	at com.google.tracing.TraceContext$AbstractTraceContextCallback.runInInheritedContext(TraceContext.java:325)
    	at com.google.tracing.TraceContext$TraceContextRunnable.run(TraceContext.java:453)
    	at com.google.apphosting.runtime.ThreadGroupPool$PoolEntry.run(ThreadGroupPool.java:251)
    	at java.lang.Thread.run(Thread.java:679)
    
  4. I 2012-03-12 11:31:22.669
    This request caused a new process to be started for your application, and thus caused your application code to be loaded for the first time. This request may thus take longer and use more CPU than a typical request for your application.
  5. W 2012-03-12 11:31:22.669
    A problem was encountered with the process that handled this request, causing it to exit. This is likely to cause a new process to be used for the next request to your application. If you see this message frequently, you may be throwing exceptions during the initialization of your application. (Error code 104)
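
The request log line in the post above already carries the numbers asked for earlier in the thread (error share, and whether a request was a loading request). Below is a rough sketch of pulling them out of such lines. It assumes every line has the same shape as the one quoted here, with the `key=value` fields after the last quoted string; the function names are made up for illustration.

```python
import re

STATUS = re.compile(r'" (\d{3}) ')   # HTTP status right after the quoted request
FIELD = re.compile(r"(\w+)=(\S+)")   # trailing key=value fields such as ms=72138


def parse_line(line):
    """Return a dict with the status code and trailing fields, or None."""
    status = STATUS.search(line)
    if status is None:
        return None
    # Only look after the last quoted string, so query strings are ignored.
    tail = line.rsplit('"', 1)[-1]
    record = dict(FIELD.findall(tail))
    record["status"] = int(status.group(1))
    return record


def summarize(lines):
    """Count requests, 5xx errors and loading requests across log lines."""
    records = [r for r in map(parse_line, lines) if r is not None]
    total = len(records)
    errors = sum(1 for r in records if r["status"] >= 500)
    loading = sum(1 for r in records if r.get("loading_request") == "1")
    return {
        "requests": total,
        "errors": errors,
        "error_pct": 100.0 * errors / total if total else 0.0,
        "loading_requests": loading,
    }


# The request log line quoted in the post above.
SAMPLE = (
    '124.148.222.42 - - [12/Mar/2012:04:31:23 -0700] '
    '"GET /_ah/login_required?continue=https://logos-contacts.appspot.com/ HTTP/1.1" '
    '500 0 "from customer intranet" '
    '"Mozilla/5.0 (Windows NT 5.1) AppleWebKit/534.24 (KHTML, like Gecko) '
    'Chrome/11.0.696.60 Safari/534.24" "logos-contacts.appspot.com" '
    'ms=72138 cpu_ms=887 api_cpu_ms=0 cpm_usd=0.024681 loading_request=1 '
    'exit_code=104 instance=00c61b117c5f903ba6921dbd52f633545a72631c'
)

print(parse_line(SAMPLE)["ms"], summarize([SAMPLE]))
```

Fed a full day of exported request logs, the same summary would give the percentage of failed requests and show how many of the failures were loading requests.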

Re: Outages? Riley 3/12/12 9:44 AM
Our appid is activegrade, we use the m/s datastore, and get from 0-10 QPS throughout the day.  Normally we have 1-4 instances running, but since this seems *mostly* related to startup, we dedicated 10 idle resident instances to run all the time.  This covers us a little, but still, when a user triggers a new instance, they get the 60+ second wait and then an error. Ugh!  Our costs are relatively minor - about $20 a day now that we are running these 10 ordinarily unnecessary instances - but this is a big cost for us, and embarrassing too.


Re: Outages? Riley 3/12/12 9:46 AM
Ikai, it sounds like support for HR apps is being prioritized.  Is that the case? Should we expect that to be the case in the future? Sorry if that's documented somewhere already~

Riley
Re: [google-appengine] Re: Outages? Ikai Lan (Google) 3/12/12 11:52 AM
Hi Riley,

That's a legitimate question, and one that we haven't officially answered yet. It's certainly the direction that things have been moving simply due to the nature of production management. Given that the SLA applies to HRD and not master/slave applications, you are definitely going to get a better quality of service migrating to HRD. In fact, I strongly advise that you do so. 

One challenge that we have when dealing with issues is to decide whether we should do emergency maintenance that requires downtime. With any production system, it's not always guaranteed that maintenance will result in issues being completely resolved, which would be really bad for app developers. At what threshold do we determine that a downtime with no guarantee of addressing the issues is worthwhile? Global 0.1% error rate? 1%? The call is not always clear cut because those errors may not be evenly distributed, and the impact may be huge, or it may be small. With master/slave applications, we do what we can to address the short term symptoms as well as the underlying system issues without impacting serving, which is often an order of magnitude more difficult (It kind of reminds me of that scene in Indiana Jones where he takes an artifact, swapping it with a bag of sand as quickly as possible to try to avoid setting off traps. Pillaging of historic artifacts is way easier when it's not dangerous, not speaking from personal experience). When your application runs on High Replication, the call is easy: there's no downtime required in 99% of cases, so we perform the maintenance right away because if it doesn't address the issue, there's no serving downtime for users.
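
To illustrate the point about errors not being evenly distributed, here is a small sketch with entirely hypothetical traffic figures (none of them come from this thread): a global error rate of 0.1% can coexist with one application failing on every request.

```python
# Hypothetical traffic: app name -> (requests, errors). Illustrative only.
apps = {
    "app-a": (500000, 0),
    "app-b": (499000, 0),
    "app-c": (1000, 1000),
}

total_requests = sum(requests for requests, _ in apps.values())
total_errors = sum(errors for _, errors in apps.values())

# Error rate across all traffic, as a percentage.
global_rate = 100.0 * total_errors / total_requests

# Error rate seen by each individual application.
per_app = {name: 100.0 * errors / requests
           for name, (requests, errors) in apps.items()}

print("global error rate: %.2f%%" % global_rate)
for name in sorted(per_app):
    print("%s: %.2f%%" % (name, per_app[name]))
```

In this made-up case the global figure sits at the 0.1% mark mentioned above, yet for the owner of the third app the outage is total.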

If you're not subscribed to downtime-notify, I recommend that you subscribe. Announcements like this are NOT moving to StackOverflow, and never will:


We may be announcing a maintenance in the very near future that will impact the serving of master/slave applications.

--
Ikai Lan 
Developer Programs Engineer, Google App Engine



Re: [google-appengine] Re: Outages? j 3/12/12 12:51 PM
Ikai,

I have not moved to HRD yet, but I am pretty sure I am the only user of my application. However, since a couple of days ago, not only has it been slow, but I keep running out of quota, even though I turned on billing. I switched off billing yesterday, as it didn't help.

Can I get a refund? I ask because I am hitting the app less than 20 times per day, and I still run out of quota. Enabling billing didn't help. I am perplexed: with a single user, how is it possible to exhaust 0.05 million operations? See below for an example:

Datastore Read Operations   100%   0.05 of 0.05 Million Ops   0.00   $0.70/ Million Ops   $0.00

I don't want to publish my appid; if you would like to know it, I can text it to your Google Voice number, if that helps.

Thanks for any reply.
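
A quick back-of-the-envelope check of the figures in the message above shows why they look implausible. The sketch assumes every read operation is attributable to the poster's own requests, which is exactly what the post calls into question.

```python
# Figures from the message above.
quota_ops = 50000        # "0.05 Million Ops" of datastore reads, fully used
hits_per_day = 20        # "less than 20 times per day", taken as an upper bound

# If all reads came from those hits, each one would need at least this many.
reads_per_hit = quota_ops / float(hits_per_day)
print("at least %.0f datastore reads per request" % reads_per_hit)
```

That works out to at least 2,500 datastore reads per request, so either each request is far more read-heavy than a single-user app would suggest, or something other than the user's requests is consuming the quota.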
Re: [google-appengine] Re: Outages? Ikai Lan (Google) 3/12/12 1:52 PM
Regarding that maintenance period:


It's happening next Monday, March 19th at 4pm US/Pacific (19th March, 23:00 GMT).

--
Ikai Lan 
Developer Programs Engineer, Google App Engine



Re: [google-appengine] Re: Outages? Riley 3/12/12 1:54 PM
Thanks a lot.  FYI: That post says Wednesday the 19th instead of Monday the 19th.

Riley
Re: [google-appengine] Re: Outages? Ikai Lan (Google) 3/12/12 1:59 PM
GAH, it's like no matter how many times I read these things over I always make at least one mistake.

And that's why code review is a Good Thing.

--
Ikai Lan 
Developer Programs Engineer, Google App Engine



Re: [google-appengine] Re: Outages? Amit S 3/12/12 2:33 PM
Hi Ikai,

We wouldn't mind moving to HRD from M/S, but isn't it 3X more expensive?

Also, what's the best way to minimize the impact on our users when the datastore is in read-only mode during downtimes? Consider that every action our users take involves writing to the datastore. Will using memcache help? Will memcache be available without interruption during datastore downtime?

thanks!
Amit
RE: [google-appengine] Re: Outages? Brandon Wirtz 3/12/12 3:01 PM

> We wouldn't mind moving to HRD from M/S, but isn't it 3X more expensive?

No.

Re: [google-appengine] Re: Outages? Chris Ramsdale 3/12/12 3:55 PM
Moving to HRD is the safest way to ensure that your users are not impacted during a downtime. Memcache and other mechanisms can be used, but will definitely not scale and aren't guaranteed to be resilient in the face of all downtime scenarios.
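
For apps that stay on M/S through a read-only window, one modest mitigation is to turn a refused write into a clear "try again later" response instead of an unexplained 500. The sketch below is illustrative only: `ReadOnlyError` and `save` are hypothetical stand-ins, not App Engine APIs, and nothing here makes the write durable or replaces the move to HRD.

```python
class ReadOnlyError(Exception):
    """Hypothetical stand-in for the error raised when the datastore refuses writes."""


def save(record, read_only):
    """Hypothetical write function; `read_only` simulates a maintenance window."""
    if read_only:
        raise ReadOnlyError("datastore is read-only")
    return "saved %s" % record


def handle_write(record, read_only=False):
    """Return (http_status, body) rather than letting the write error become a 500."""
    try:
        return 200, save(record, read_only)
    except ReadOnlyError:
        # 503 signals a temporary condition, so clients can retry later.
        return 503, "Temporarily read-only for maintenance; please retry shortly."


print(handle_write("note-1"))
print(handle_write("note-1", read_only=True))
```

The user still cannot write during the window, but sees a clear temporary-failure message and can retry once it ends.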

For the particular issues reported on this thread, we have a few root causes that we're looking into. In terms of a fix though, all of the affected apps are running on M/S and, as a result, our options are much more constrained -- we're not able to move the apps as freely as we can with HRD-based applications.

M/S worked really well when it was first rolled out, but given the increase in number of apps and datastore transactions we needed an even better solution -- thus HRD. While the pros and cons of HRD have been discussed and debated within this group, the simple fact is: if you want to minimize your exposure to downtimes you need to move over to HRD. There's an SLA of 99.95%, which we've consistently beat month over month.

We're committed to resolving the current issue, but I strongly urge anyone running on M/S to make the move over to HRD. It's the quickest and most long-term fix that you can make.

-- Chris Ramsdale

Product Manager, Google App Engine


Re: [google-appengine] Re: Outages? Ikai Lan (Google) 3/12/12 4:08 PM
HRD is not 3x more expensive. We lowered the cost to make it match the master/slave cost.

--
Ikai Lan 
Developer Programs Engineer, Google App Engine



Re: [google-appengine] Re: Outages? Ikai Lan (Google) 3/12/12 5:54 PM
Quick update: the time has been pushed back 2 hours to 6PM US/Pacific. See the latest message here:


--
Ikai Lan 
Developer Programs Engineer, Google App Engine



Re: [google-appengine] Re: Outages? philburk 3/12/12 6:22 PM
Hi Chris,


On Monday, March 12, 2012 3:55:00 PM UTC-7, Chris Ramsdale wrote:
For the particular issues reported on this thread, we have a few root causes that we're looking into. In terms of a fix though, all of the affected apps are running on M/S

Our app "prodicta" is using HRD, not M/S. We have been seeing a big degradation in performance the last few days. I just ran a test and my client code timed out because it took 47 seconds to load a 200KB static JAR file. It took 34 seconds to load a small PNG file. Customers are complaining.

I starred this issue:

Is this problem being considered a high priority? The status chart does not seem to reflect the problems being reported.

RE: [google-appengine] Re: Outages? Brandon Wirtz 3/12/12 8:14 PM

You should update the message shown when you start a new app, which says it costs 3x as much.

Re: [google-appengine] Re: Outages? Richard Watson 3/13/12 1:36 AM
In case you're keeping track of issues thinking it's generally cleared up:

I'm on HR and have noticed higher latencies (couple seconds instead of e.g. 300ms) lately and sometimes higher error rates (a few instead of 0-3). Yesterday over about 6 hours I got a ton of 60-second requests that threw 500's with accompanying messages [1], usually on memcache sets hitting a deadline exceeded.  Also, over the last couple weeks I've been running 3 instances permanently despite sometimes shutting them down manually.  Usually I get by on one just fine with bursts of 2 or 3.  I've noticed that one instance serves the majority of traffic with the other two serving maybe 50 requests over many hours, so the shutdown isn't aggressive enough.

"This request caused a new process to be started for your application..."
+
"A problem was encountered with the process that handled this request, causing it to exit" in the same request.

App-id: 2dumo-hr
Re: [google-appengine] Re: Outages? Miroslav Genov 3/13/12 1:59 AM
I'm encountering the same issue with an HR app. The spike started about 1 hour ago.

AppID: cmsevobg
Datastore: HR
Re: [google-appengine] Re: Outages? Sébastien Tromp 3/13/12 2:02 AM
Hello,

Same thing here, since around an hour ago:
AppID: fiveorbsgame & fiveorbsgame-test

Re: [google-appengine] Re: Outages? Miroslav Genov 3/13/12 2:16 AM
Oops, my mistake: our HR id is cmsevobg-hr. Normally initialization takes ~10 seconds; now it takes 38 seconds using the F4 CPU model.

Here are some exception traces that might help:
  1. Failed startup of context com.google.apphosting.utils.jetty.RuntimeAppEngineWebAppContext@199f8e6{/,/base/data/home/apps/s~cmsevobg-hr/production.357324755677494831}
    java.lang.NullPointerException
    	at com.google.inject.servlet.GuiceServletContextListener.contextInitialized(GuiceServletContextListener.java:46)
    	at xxx.xxx.xxx.xxx.AppBootstrap.contextInitialized(AdmBootstrap.java:189)
    	at org.mortbay.jetty.handler.ContextHandler.startContext(ContextHandler.java:548)
    	at org.mortbay.jetty.servlet.Context.startContext(Context.java:136)
    	at org.mortbay.jetty.webapp.WebAppContext.startContext(WebAppContext.java:1250)
    	at org.mortbay.jetty.handler.ContextHandler.doStart(ContextHandler.java:517)
    	at org.mortbay.jetty.webapp.WebAppContext.doStart(WebAppContext.java:467)
    	at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:50)
    	at com.google.apphosting.runtime.jetty.AppVersionHandlerMap.createHandler(AppVersionHandlerMap.java:202)
    	at com.google.apphosting.runtime.jetty.AppVersionHandlerMap.getHandler(AppVersionHandlerMap.java:171)
    	at com.google.apphosting.runtime.jetty.JettyServletEngineAdapter.serviceRequest(JettyServletEngineAdapter.java:123)
    	at com.google.apphosting.runtime.JavaRuntime$RequestRunnable.run(JavaRuntime.java:422)
    	at com.google.tracing.TraceContext$TraceContextRunnable.runInContext(TraceContext.java:449)
    	at com.google.tracing.TraceContext$TraceContextRunnable$1.run(TraceContext.java:455)
    	at com.google.tracing.TraceContext.runInContext(TraceContext.java:695)
    	at com.google.tracing.TraceContext$AbstractTraceContextCallback.runInInheritedContextNoUnref(TraceContext.java:333)
    	at com.google.tracing.TraceContext$AbstractTraceContextCallback.runInInheritedContext(TraceContext.java:325)
    	at com.google.tracing.TraceContext$TraceContextRunnable.run(TraceContext.java:453)
    	at com.google.apphosting.runtime.ThreadGroupPool$PoolEntry.run(ThreadGroupPool.java:251)
    	at java.lang.Thread.run(Thread.java:679)
    
  2. C 2012-03-13 11:02:31.883
    Uncaught exception from servlet
    javax.servlet.UnavailableException: Initialization failed.
    	at com.google.apphosting.runtime.jetty.AppVersionHandlerMap.createHandler(AppVersionHandlerMap.java:211)
    	at com.google.apphosting.runtime.jetty.AppVersionHandlerMap.getHandler(AppVersionHandlerMap.java:171)
    	at com.google.apphosting.runtime.jetty.JettyServletEngineAdapter.serviceRequest(JettyServletEngineAdapter.java:123)
    	at com.google.apphosting.runtime.JavaRuntime$RequestRunnable.run(JavaRuntime.java:422)
    	at com.google.tracing.TraceContext$TraceContextRunnable.runInContext(TraceContext.java:449)
    	at com.google.tracing.TraceContext$TraceContextRunnable$1.run(TraceContext.java:455)
    	at com.google.tracing.TraceContext.runInContext(TraceContext.java:695)
    	at com.google.tracing.TraceContext$AbstractTraceContextCallback.runInInheritedContextNoUnref(TraceContext.java:333)
    	at com.google.tracing.TraceContext$AbstractTraceContextCallback.runInInheritedContext(TraceContext.java:325)
    	at com.google.tracing.TraceContext$TraceContextRunnable.run(TraceContext.java:453)
    	at com.google.apphosting.runtime.ThreadGroupPool$PoolEntry.run(ThreadGroupPool.java:251)
    	at java.lang.Thread.run(Thread.java:679)
Re: [google-appengine] Re: Outages? Pieter Coucke 3/13/12 3:43 AM
Same here, very slow and frequent instance startups.  I use the HR datastore.

I've set idle instances to 2 F4s but I have the impression that the scheduler prefers to start a new instance instead of using one of the available idle ones.


-- 
Pieter Coucke
Onthoo BVBA
zamtam.com | cashcontact.com
Re: [google-appengine] Re: Outages? Mos 3/13/12 3:44 AM
Same thing for the last few minutes on our app (HRD, Java, low traffic, one instance, no new deployment, simple page just hitting MemCache):

"Request was aborted after waiting too long to attempt to service your request."   -->  User sees a 500 error

GAE team, what has been going on over the last few days?  In my opinion, Google App Engine is unreliable and looks more like an alpha or beta cloud environment....

Google, please share your analysis with us.

Cheers
Mos
Re: [google-appengine] Re: Outages? Richard Watson 3/13/12 3:51 AM
My graph showing ms/sec (attached) over last 24h.  Average spikes were up to 32 seconds, but I think all the errors were 60+-1 sec.

Re: [google-appengine] Re: Outages? charisl 3/13/12 4:44 AM
Hello Ikai

Our app is betscoreslive. In our settings master/slave replication is activated, but we are NOT using the datastore at all, and we have been experiencing the DeadlineExceededExceptions and the increased number of instances mentioned by the rest of the people in the discussion. Our app only uses memcache.

Normal operation
during traffic peaks: instances ~25, ~45 requests/second and QPS > 1

Now we observe: 12 instances doing nothing with ~2 requests/second and QPS 0.2
We have had the Deadline exceptions for one week now, with short periods of normal operation. During the problematic periods, our app is pretty unresponsive.

Will the announced maintenance for master/slave apps solve the issue for our app as well?

thanks in advance

Charis

On Friday, March 9, 2012 9:15:35 PM UTC+2, Ikai Lan wrote:
Hey everyone,

Here are a few things that will help:

1. Application IDs (<--- if you have nothing else, at least provide this)
2. What is your QPS?
3. What % of your requests are errors?

--
Ikai Lan 
Developer Programs Engineer, Google App Engine



On Fri, Mar 9, 2012 at 7:24 AM, Ronoaldo José de Lana Pereira <rper...@beneficiofacil.com.br> wrote:
+1 for seeing the same problems on my app.

It started to be worse after maintenance on March 7.

On Friday, March 9, 2012 08:33:36 UTC-3, Nikolai wrote:
+1
We had to move to our backup systems. Everything is full of 500 errors or hardcore latency.
Most of the 500 errors we see aren't even logged, so this seems to be a Google problem one abstraction layer above the app.

And yes - sometimes we have got the same feeling, that we are the only ones that use appengine in a production setting. You are not alone ;)

regards,
nikolai



Re: Outages? Chris Merrill 3/13/12 6:05 AM
Our app is running on HRD and performance has, on average, been fine over the past 2 weeks. But this morning, our app is unusable. Almost every request times out. Note that our app was last deployed 2 weeks ago - no changes since then. Our staging application has newer code but a LOT less data in it - it shows the same symptoms. Even our login page times out most of the time!

The App Engine status panel says everything is OK, but we're dead in the water  :(
Re: [google-appengine] Re: Outages? Ronoaldo José de Lana Pereira 3/13/12 6:37 AM
This is very disturbing... Our M/S app is getting higher error rates and some instances take from 15s to 70s to start. We can't do anything about this, or even debug what is happening. If there were an issue with our code, instances should always take 70s to start! I really can't think of anything in our code that loads so much that it would take all that time... Then the suggested solution is: move to HRD and... experience the same load time on app startup??? So, what can I do!!!

Sorry for speaking so bluntly, but I'm very concerned. We started the process to get operational support and plan our migration to have the SLA, but if our app startup continues to take this long, I really can't imagine what to do...

Googlers, is the startup problem being solved for both M/S and HRD? Only HRD? Neither? Is there anything we can do to avoid this strange behavior of our instances? Instance startup seems to be vital to the application's health: if your app takes too much time to start up, then all concurrent requests to that instance die with 500s. This is very odd, and the "warmup" requests never seem to work. From our observations and those of other people, even if you set a fixed number of min_idle instances to be always there, they don't serve the traffic and you still get errors!

Hope to see some answers. We really liked GAE in our first year working with the platform, and now I feel completely lost...

Best Regards,

-Ronoaldo
Re: [google-appengine] Re: Outages? Johan Euphrosine (Google) 3/13/12 6:52 AM
I believe this was related to:

This should now be fixed.

--
Johan Euphrosine (proppy)
Developer Programs Engineer
Google Developer Relations
Re: [google-appengine] Re: Outages? Johan Euphrosine (Google) 3/13/12 6:53 AM
Hi, 

I believe this was also related to:

And should now be fixed.
--
Johan Euphrosine (proppy)
Developer Programs Engineer
Google Developer Relations
Re: [google-appengine] Re: Outages? Johan Euphrosine (Google) 3/13/12 7:01 AM
As Chris pointed out earlier in this thread, M/S apps are more vulnerable to this kind of transient infrastructure issue because moving them around requires a maintenance period.

HRD applications are covered by the SLA, replicated across multiple datacenters, and better distributed: if we notice an issue impacting one or many of them, we can easily take action without impacting other applications.

I strongly suggest you try out the self-migration tool in your administration console. Depending on the size of your data and your write QPS, the read-only period needed to migrate your application could be very small:

--
Johan Euphrosine (proppy)
Developer Programs Engineer
Google Developer Relations
Re: [google-appengine] Re: Outages? Johan Euphrosine (Google) 3/13/12 7:04 AM
If you are not using the datastore it should be trivial to move your application to the new HRD infrastructure.

If you don't need the same appid, just create a new HRD application and deploy your code on it.

If you need the same appid, use the self migration tool:

Feel free to open a production ticket if you need any assistance migrating your application:


--
Johan Euphrosine (proppy)
Developer Programs Engineer
Google Developer Relations
Re: Outages? Jason 3/13/12 7:10 AM
I'm using HRD and am still getting tons of 500s since early this
morning.
Re: [google-appengine] Re: Outages? Johan Euphrosine (Google) 3/13/12 7:10 AM
What is your application id?

Feel free to open a production issue if you want to investigate this off-thread:

--
Johan Euphrosine (proppy)
Developer Programs Engineer
Google Developer Relations
Re: [google-appengine] Re: Outages? Johan Euphrosine (Google) 3/13/12 7:11 AM
What is your application id? Did you already file a production issue?
Re: [google-appengine] Re: Outages? Mos 3/13/12 7:22 AM
> What is your application id?

krisen-talk    (www.krisentalk.de)


> Feel free to open a production issue

There is already an issue from someone else (judging by this thread, a lot of people are affected):

http://code.google.com/p/googleappengine/issues/detail?id=7133

Johan, what has been going on with GAE these last few days?  It doesn't feel like a PaaS in production mode.
Perhaps Google should reintroduce the Beta status. ;) 
Re: [google-appengine] Re: Outages? Riley 3/13/12 7:23 AM
We migrated our app to the HRD last night. With 4 GB of data quota in around 2M entities it took 20-30 minutes.  On the new app, we are seeing response times at about 1% of the MS app - 100 times faster.  Our app was read only for less than three seconds - enough to affect 10 requests from 2 users.

We decided to do this without very thorough testing because the app was broken for all of our users on M/S, and will only break for a small percentage of users on HRD.  When it does break, it will be pretty harmless, and we expect to fix the inconsistency problems by the end of today.  We just could not wait until Monday for a possible fix for the M/S app.

If you've got a similar setup or similar needs, I recommend a hastier-than-usual switch. Just watch out for the email limits~
Re: [google-appengine] Re: Outages? Johan Euphrosine (Google) 3/13/12 7:29 AM
Thanks Mos,

I asked for more detail on the tickets; let's follow up there.

App Engine is indeed out of beta and now covered by an SLA for paid applications.

If you feel that the SLA for your application has been violated, please consider the form linked there to request credit: http://code.google.com/appengine/sla.html
Re: Outages? tempy 3/13/12 7:44 AM
To chime in: My HR app has also been very unstable for what seems to be
the last few weeks. Sometimes it performs fine, and sometimes it's
spinning up unnecessary instances and not using them, throwing
DeadlineExceeded errors, killing requests after a long wait with
"Request was aborted after waiting too long to attempt to service your
request." and on and on - all under a light load.

It's especially galling when GAE errors result in the user seeing a
Google-branded error page, see here:
http://code.google.com/p/googleappengine/issues/detail?id=6965

All the suggestions that this is somehow isolated to the MS datastore
seem particularly disingenuous to my ears right now. For many months I
am totally happy with GAE and recommend it to everyone - then I have a
few weeks like this and I start to lay serious plans for extricating
myself from this platform. I have no idea if my SLA has been violated
as per all the legal definitions, but it sure feels like it.

On Mar 13, 3:29 pm, Johan Euphrosine <pro...@google.com> wrote:
> Thanks mos,
>
> I asked for more detail on the tickets, let's followup there.
>
> App Engine is indeed out of beta and now covered by an SLA for paid
> application.
>
> If you feel that the SLA for your application has been violated, please
> consider the form linked there to request credit: http://code.google.com/appengine/sla.html
>
> On Tue, Mar 13, 2012 at 3:22 PM, Mos <mosa...@googlemail.com> wrote:
> > > What is your application id?
>
> > krisen-talk    (www.krisentalk.de)
>
> > > Feel free to open a production issue
>
> > There is already an issue from someone else (following this thread a lot
> > of people are affected):
>
> >http://code.google.com/p/googleappengine/issues/detail?id=7133
>
> > Johan, what's going on with GAE the last days?  It doesn't feel like a
> > PaaS in production mode.
> > Perhaps Google should reintroduce the Beta status. ;)
>
> > On Tue, Mar 13, 2012 at 3:10 PM, Johan Euphrosine <pro...@google.com>wrote:
>
> >> What is your application id?
>
> >> Feel free to open a production issue, if you want to investigate this
> >> offthread:
>
> >>http://code.google.com/p/googleappengine/issues/entry?template=Produc...
>
> >> On Tue, Mar 13, 2012 at 11:44 AM, Mos <mosa...@googlemail.com> wrote:
>
> >>> Same thing the last minutes on our app (HRD, Java, Low-Traffic, one
> >>> instance, no new deployment, simple page just hitting MemCache):
>
> >>> "Request was aborted after waiting too long to attempt to service your
> >>> request."   -->  User sees 500er
>
> >>> GAE-Team, what is going on the last days?  In my opinion the Google App
> >>> Engine is unreliable and looks more like a alpha- or beta-
> >>> cloudenvrionment....
>
> >>> Please Google share you analysis with us.
>
> >>> Cheers
> >>> Mos
>
> >>> 2012/3/13 Sébastien Tromp <sebastien.tr...@gmail.com>
Re: Outages? Matt Cameron 3/13/12 8:07 AM
We are seeing lots of 500s today.  Nothing shown on the appengine status board.
Re: [google-appengine] Re: Outages? Mark 3/13/12 8:35 AM
I have had outages throughout this morning.  

  My app is bedbuzzserver.appspot.com, Java app on HR.

  From 2am until 7.48am.  I have filed Production issue 7138.

  My instances keep getting reset (and then taking too long to start up), & my 'Current Load' logs get reset to 0 (e.g. bedbuzzserver.appspot.com/logon goes from 100 calls today, to 0)
Re: Outages? Nicanor Babula 3/13/12 8:39 AM
Hi, 

We are having the same issues.
appid: domodentweb2
QPS: 0.028
%error: I would say 60%, but it's not accurate
Datastore: HRD.

We are desperate because there is no backup alternative, so if appengine is down, we are down..

Thanks.
Re: Outages? Sekhar 3/13/12 8:48 AM
Same here...a flurry of errors. The last one:


  1. E
    2012-03-13 08:42:46.475
    org.datanucleus.transaction.Transaction commit: Operation commit failed on resource: org.datanucleus.store.appengine.DatastoreXAResource@1389244, error code UNKNOWN and transaction: [DataNucleus Transaction, ID=Xid=


Re: Outages? Nicanor Babula 3/13/12 9:35 AM
We are having the same problems.

appid: domodentweb2
QPS: 0.28
%error: can't say... It's fluctuating. In 15-minute periods we can have 0% or 80+%
datastore: HRD

Please help. 
Thanks. 
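
One rough way to put a number on a fluctuating error rate is to bucket request-log lines into 15-minute windows and count the 5xx responses in each. The sketch below assumes log lines in the access-log shape quoted elsewhere in this thread (`[13/Mar/2012:11:52:37 -0700] "GET ..." 500 0`). The function name `error_rate_by_window` and the sample lines are made up for illustration and do not come from any real application log.

```python
import re
from datetime import datetime

# Timestamp and HTTP status from an access-log style line.
# The UTC offset is matched but ignored, so windows are in the log's local time.
LINE_RE = re.compile(
    r'\[(\d{2}/\w{3}/\d{4}:\d{2}:\d{2}:\d{2}) [+-]\d{4}\] "[^"]*" (\d{3}) '
)

def error_rate_by_window(lines, window_minutes=15):
    """Return {window_start: (errors, total, percent_errors)} for 5xx responses."""
    counts = {}
    for line in lines:
        m = LINE_RE.search(line)
        if m is None:
            continue  # not a request record
        ts = datetime.strptime(m.group(1), "%d/%b/%Y:%H:%M:%S")
        start = ts.replace(minute=ts.minute - ts.minute % window_minutes, second=0)
        errors, total = counts.get(start, (0, 0))
        is_error = m.group(2).startswith("5")
        counts[start] = (errors + int(is_error), total + 1)
    return {start: (e, t, 100.0 * e / t) for start, (e, t) in counts.items()}

# Hypothetical sample lines for illustration only.
SAMPLE = [
    '1.2.3.4 - - [13/Mar/2012:09:01:10 -0700] "GET / HTTP/1.1" 200 512 "-" "ua"',
    '1.2.3.4 - - [13/Mar/2012:09:07:42 -0700] "GET /a HTTP/1.1" 500 0 "-" "ua"',
    '1.2.3.4 - - [13/Mar/2012:09:16:03 -0700] "GET /b HTTP/1.1" 500 0 "-" "ua"',
    '1.2.3.4 - - [13/Mar/2012:09:20:30 -0700] "GET /c HTTP/1.1" 500 0 "-" "ua"',
]

rates = error_rate_by_window(SAMPLE)
for start in sorted(rates):
    errors, total, pct = rates[start]
    print("%s  %d/%d errors (%.0f%%)" % (start.strftime("%H:%M"), errors, total, pct))
```

Reporting the per-window figures alongside the application id and QPS gives a more precise picture than a single estimated percentage.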


Re: Outages? Rick Mangi 3/13/12 12:05 PM
Same here. I use pingdom to monitor my site and it's been down on and off for the past 24 hours to the tune of around 30 minutes.

I opened an enterprise support ticket but haven't heard anything back.
Re: Outages? Mauricio Aristizabal 3/13/12 12:29 PM
Just about every request results in 1 or more resources failing with 500, and many that do finish take seconds.  Here's a sample error (I used to get 202 or 203s, now just 'aborted after waiting too long...'):
  • 2012-03-13 11:52:37.682 /resources/styles/standard.css 500 10636ms 0kb Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.1; Trident/4.0; SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729; Media Center PC 6.0; MDDR; .NET4.0C)
    67.49.52.74 - - [13/Mar/2012:11:52:37 -0700] "GET /resources/styles/standard.css HTTP/1.1" 500 0 "http://www.commentous.com/f" "Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.1; Trident/4.0; SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729; Media Center PC 6.0; MDDR; .NET4.0C)" "www.commentous.com" ms=10636 cpu_ms=0 api_cpu_ms=0 cpm_usd=0.000082 pending_ms=10610 2012-03-13 11:52:37.682 
  • Request was aborted after waiting too long to attempt to service your request.
I don't know about the legal definition in SLA, but I consider my app to be unusable and effectively down for the last six days or so.

-Mauricio
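
The log line above carries both `ms=` and `pending_ms=` fields, which suggests a quick check of how much of a failed request's time went to waiting rather than running. The sketch below is a minimal illustration, assuming `pending_ms` is the time the request waited in the pending queue before an instance picked it up. The helper name `pending_share` is hypothetical.

```python
import re

def pending_share(log_line):
    """Parse ms= and pending_ms= from one request-log line.

    Returns (total_ms, pending_ms, fraction_pending), or None if a field is missing.
    """
    total = re.search(r"\bms=(\d+)", log_line)
    pending = re.search(r"\bpending_ms=(\d+)", log_line)
    if total is None or pending is None:
        return None
    total_ms = int(total.group(1))
    pending_ms = int(pending.group(1))
    fraction = float(pending_ms) / total_ms if total_ms else 0.0
    return total_ms, pending_ms, fraction

# Field values copied from the standard.css log line quoted above.
line = "ms=10636 cpu_ms=0 api_cpu_ms=0 cpm_usd=0.000082 pending_ms=10610"
total_ms, pending_ms, fraction = pending_share(line)
print("%d of %d ms spent pending (%.1f%%)" % (pending_ms, total_ms, 100 * fraction))
```

Under that assumption, a request whose pending time is nearly equal to its total time never reached application code, which is consistent with the "aborted after waiting too long" message.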






RE: [google-appengine] Re: Outages? Brandon Wirtz 3/13/12 12:31 PM

We have noticed that many of the downtimes Pingdom reports are the result of AppsForDomains.  If you hit your app from another app, or via AOL, or another provider that has a peering arrangement with App Engine, it will be up.

I'm calling these AppsForDomains issues because during these outages we typically get error pages in the AppsForDomains admin pages.

In these cases green checks will show in the App Engine status, but your app will fail to resolve.

 

 

 


Re: Outages? Riley 3/13/12 12:40 PM
I just switched to HRD this morning, and while the app is at least accessible now, I still get "Request was aborted after waiting too long to attempt to service your request." every 60 seconds or so.  This is not even thrown as an exception or error - it's logged as Info.

This is really bad for us.  It's not that one out of every 100 users is getting a 500 page - it's that 1 out of every 100 things that EVERY USER tries to do doesn't work.  For us, that could mean that, in entering scores for a test, some of the students' scores are not recorded.  Gah!  Not to mention the extra cost.
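
To make the "1 out of every 100 things" point concrete, here is a small worked calculation. It assumes each request fails independently with probability 1% (the rough figure from the message above). Real failures in this thread came in bursts, so treat this as an illustration rather than a model of the outage. The function name is made up.

```python
def chance_of_failure(per_request_failure, requests):
    """Probability that at least one of `requests` independent requests fails."""
    return 1.0 - (1.0 - per_request_failure) ** requests

# A user who performs n actions, each failing 1% of the time.
for n in (1, 10, 30, 100):
    pct = 100 * chance_of_failure(0.01, n)
    print("%3d requests: %.1f%% chance of at least one failure" % (n, pct))
```

The chance of hitting at least one failure grows quickly with the number of actions, which is why a small per-request error rate can still affect most users who do any substantial amount of work.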
Re: Outages? stephenp 3/13/12 5:58 PM
One more here.

appid: carglyplatform (HRD)

It's been flaky off-and-on for a couple weeks, yesterday was better, today bad again. Lots of warmup errors, instance restarts, errors in general.

Stephen


Re: [google-appengine] Re: Outages? Mauricio Aristizabal 3/13/12 7:14 PM
Potential fix: set performance sliders to auto.

This is purely anecdotal but it might mean something:  After reading some post this afternoon about the instance settings not really working I switched to AUTO idle instances and AUTO pending latency (before they were set to 1-1 and 25ms-1.5s respectively).

That was about 5 hours ago, and within an hour or so everything started working fine.  Before that, problems had been continuous as far as I could tell for 6 days or so.

Or maybe the AppEngine guys finally got it under control.




Re: [google-appengine] Re: Outages? Mos 3/14/12 1:34 AM
> Potential fix: set performance sliders to auto.

No, that doesn't help in general. I've been running with auto from the start and still have the problem.

It's still not fixed; it would be nice to get some feedback from Google?!
Re: Outages? Nicanor Babula 3/14/12 1:35 AM
Now my app is working fine. What happened? At some point I saw my app's graphs reset and some huge values for instance hours. Afterwards, the instance-hour counters returned to normal and my app stopped raising errors. If that was the GAE team working on it, thank you very much, but I think you owe us at least an explanation. What should I tell my customers? Has the issue been fixed, or should they expect outages in the following days? My app is on HRD.

Thanks,
Cristian



Re: [google-appengine] Re: Outages? Ikai Lan (Google) 3/14/12 11:53 AM
We've been working on addressing the issues with HRD apps.  Your experience is probably a coincidence.

There is a very small section of apps that will have a few requests (very small %) that will be a LOT slower.

We've scheduled a maintenance period for March 19th that will attempt to address issues with master/slave. You can read more about it here (scroll to the bottom for the correct time):

https://groups.google.com/forum/?fromgroups#!topic/google-appengine-downtime-notify/CO_x02OF9Ak

In general, everyone should try to migrate their applications to High Replication when they can, because we have a much higher speed of iteration when implementing fixes for production issues (I have an earlier post about why this matters - we can make almost any production change we want to HR apps without causing app downtime). At this point, I would not even create development or staging applications using master/slave, because some behaviors (eventual consistency on global queries without an entity group root) differ between master/slave and high replication.

--
Ikai Lan 
Developer Programs Engineer, Google App Engine




Re: [google-appengine] Re: Outages? Felippe Bueno 3/14/12 12:03 PM
Hello all. Only to share my information:

My app is doing something like this:

JS = "xpto"
JS_OPTOUT = "xpto1"

# Serve the opt-out script when the visitor has an "optout" cookie or sends a DNT header.
if "optout" in request.cookie or "DNT" in request.headers:
    response.write(JS_OPTOUT)
else:
    response.write(JS)


The normal latency is ~3 ms.
Now I'm seeing 13.9 ms.

Earlier this morning I saw ~1000 ms/request.
24 hours ago I saw ~2400 ms/request.

[Chart: Requests/Second (24 hrs)]

I noticed this behavior on all my apps for at least the last 8 days.


I'm using python 2.5/HRD.




Re: Outages? Rui Oliveira 4/12/12 2:00 AM

Hi,

My appId is air-menu1. HRD.

I always get this kind of error after a deploy. Usually the site only comes back to life 10 - 30 minutes after deploying.
Yesterday, for the first time, the site stopped completely for 6 hours, 24 hours after a deploy.
6 continuous hours without answering a single request. This started without any kind of modification to the program or database.
Am I alone? Is no one else having this kind of problem?

Once these errors start, every time I refresh the browser the server starts a new instance... So it's easy to end up with a lot of instances within a few minutes.

I'm waiting for this issue to be solved before starting business. My company has been working on this site for the last 10 months...

Please help me. This is very, very serious...

Sincerely

Rui

  1. 2012-04-12 01:06:30.764 /com.phonemenu.conf.Configure/myRemoteService 500 70285ms 0kb Mozilla/5.0 (iPad; CPU OS 5_1 like Mac OS X) AppleWebKit/534.46 (KHTML, like Gecko) Version/5.1 Mobile/9B176 Safari/7534.48.3
    188.83.217.114 - - [11/Apr/2012:17:06:30 -0700] "POST /com.phonemenu.conf.Configure/myRemoteService HTTP/1.1" 500 0 "http://www.airmenu.com/Configure.html" "Mozilla/5.0 (iPad; CPU OS 5_1 like Mac OS X) AppleWebKit/534.46 (KHTML, like Gecko) Version/5.1 Mobile/9B176 Safari/7534.48.3" "www.airmenu.com" ms=70286 cpu_ms=24714 api_cpu_ms=0 cpm_usd=0.686616 loading_request=1 pending_ms=8036 exit_code=104 instance=00c61b117c6af5c03528fba06c291f298706e3
  2. C2012-04-12 01:06:30.708
    Uncaught exception from servlet
    com.google.apphosting.runtime.HardDeadlineExceededError: This request (bc91fe1c174b394d) started at 2012/04/12 00:05:29.992 UTC and was still executing at 2012/04/12 00:06:30.679 UTC.
    	at java.security.AccessController.getStackAccessControlContext(Native Method)
    	at java.security.AccessController.checkPermission(AccessController.java:540)
    	at java.lang.SecurityManager.checkPermission(SecurityManager.java:549)
    	at com.google.apphosting.runtime.security.CustomSecurityManager.checkPermission(CustomSecurityManager.java:56)
    	at java.lang.SecurityManager.checkRead(SecurityManager.java:888)
    	at java.io.File.lastModified(File.java:909)
    	at java.util.zip.ZipFile.<init>(ZipFile.java:143)
    	at java.util.jar.JarFile.<init>(JarFile.java:150)
    	at java.util.jar.JarFile.<init>(JarFile.java:87)
    	at sun.misc.URLClassPath$JarLoader.getJarFile(URLClassPath.java:919)
    	at sun.misc.URLClassPath$JarLoader.access$900(URLClassPath.java:723)
    	at sun.misc.URLClassPath$JarLoader$1.run(URLClassPath.java:854)
    	at java.security.AccessController.doPrivileged(Native Method)
    	at sun.misc.URLClassPath$JarLoader.ensureOpenSynchronized(URLClassPath.java:846)
    	at sun.misc.URLClassPath$JarLoader.ensureOpen(URLClassPath.java:838)
    	at sun.misc.URLClassPath$JarLoader.<init>(URLClassPath.java:785)
    	at sun.misc.URLClassPath$JarLoader.<init>(URLClassPath.java:743)
    	at sun.misc.URLClassPath$3.run(URLClassPath.java:412)
    	at java.security.AccessController.doPrivileged(Native Method)
    	at sun.misc.URLClassPath.getLoader(URLClassPath.java:395)
    	at sun.misc.URLClassPath.getLoader(URLClassPath.java:371)
    	at sun.misc.URLClassPath.findResource(URLClassPath.java:201)
    	at java.net.URLClassLoader$2.run(URLClassLoader.java:379)
    	at java.security.AccessController.doPrivileged(Native Method)
    	at java.net.URLClassLoader.findResource(URLClassLoader.java:376)
    	at com.google.apphosting.runtime.security.UserClassLoader.findResource(UserClassLoader.java:723)
    	at java.lang.ClassLoader.getResource(ClassLoader.java:977)
    	at org.mortbay.resource.Resource.newSystemResource(Resource.java:203)
    	at org.mortbay.jetty.webapp.WebXmlConfiguration.configureDefaults(WebXmlConfiguration.java:159)
    	at org.mortbay.jetty.webapp.WebAppContext.startContext(WebAppContext.java:1230)
    	at org.mortbay.jetty.handler.ContextHandler.doStart(ContextHandler.java:517)
    	at org.mortbay.jetty.webapp.WebAppContext.doStart(WebAppContext.java:467)
    	at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:50)
    	at com.google.apphosting.runtime.jetty.AppVersionHandlerMap.createHandler(AppVersionHandlerMap.java:202)
    	at com.google.apphosting.runtime.jetty.AppVersionHandlerMap.getHandler(AppVersionHandlerMap.java:171)
    	at com.google.apphosting.runtime.jetty.JettyServletEngineAdapter.serviceRequest(JettyServletEngineAdapter.java:123)
    	at com.google.apphosting.runtime.JavaRuntime$RequestRunnable.run(JavaRuntime.java:446)
    	at com.google.tracing.TraceContext$TraceContextRunnable.runInContext(TraceContext.java:449)
    	at com.google.tracing.TraceContext$TraceContextRunnable$1.run(TraceContext.java:455)
    	at com.google.tracing.TraceContext.runInContext(TraceContext.java:695)
    	at com.google.tracing.TraceContext$AbstractTraceContextCallback.runInInheritedContextNoUnref(TraceContext.java:333)
    	at com.google.tracing.TraceContext$AbstractTraceContextCallback.runInInheritedContext(TraceContext.java:325)
    	at com.google.tracing.TraceContext$TraceContextRunnable.run(TraceContext.java:453)
    	at com.google.apphosting.runtime.ThreadGroupPool$PoolEntry.run(ThreadGroupPool.java:251)
    	at java.lang.Thread.run(Thread.java:679)
    
  3. I2012-04-12 01:06:30.713
    This request caused a new process to be started for your application, and thus caused your application code to be loaded for the first time. This request may thus take longer and use more CPU than a typical request for your application.
  4. W2012-04-12 01:06:30.713
    A problem was encountered with the process that handled this request, causing it to exit. This is likely to cause a new process to be used for the next request to your application. If you see this message frequently, you may be throwing exceptions during the initialization of your application. (Error code 104)
Re: [google-appengine] Re: Outages? Jeff Schnitzer 4/12/12 7:58 AM
Sounds like your startup time exceeds the max (60s?) time for a request.

You need to cut down your app startup time.

Jeff


Re: [google-appengine] Re: Outages? Rui Oliveira 4/12/12 8:54 AM

Hi,

Thanks for your reply.

The strange thing is that I have 5 modules inside my application, and requests to all of them fail at the same time. The modules all call different functions. All my functions are very small. My database is really small at the moment.

I don't do anything on startup. Should I? After starting, my client side just calls one or two functions on the server side. Very simple and small functions.

I turned off precompilation just to test, but that didn't solve it.

Last night, after 6 hours with the application stopped, it just started running again like magic. When I woke up this morning the app was back up and completely OK.

Sincerely

Rui Oliveira

Re: [google-appengine] Re: Outages? Jeff Schnitzer 4/12/12 10:45 AM
Look in the logs for startup requests, and check the amount of time
they take to complete.

It's possible you have an app that normally loads in a few seconds,
but a hiccup at google extended those few seconds into a time that
extended past the deadline.  This would be something to complain
about.  However, since it sounds like you haven't checked those
numbers, it's likely that your app normally takes large amounts of
time to start up and normal load fluctuation pushed it over the edge.
That means this problem is going to happen again and again and again.

Check your startup time.  If it's over 30s, you should start
investigating ways to fix it.  Keep in mind that the biggest problem
for app startup is (usually) loading jars off the incredibly-slow
filesystem.  The usual culprits are having a large number of jars
and/or using tools like Spring that do classpath scanning.

Jeff
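
A minimal sketch of the check described above: scan request-log lines for loading requests (the ones marked `loading_request=1`, as in the log quoted earlier in this thread) and flag those that took longer than 30 s. The function name, the 30 s constant as a default, and two of the three sample lines are illustrative assumptions. Subtracting `pending_ms` from `ms` is also an assumption, based on the quoted example where 70286 ms total and 8036 ms pending line up with a roughly 60 s deadline.

```python
import re

STARTUP_BUDGET_MS = 30000  # the 30 s rule of thumb suggested above

def slow_loading_requests(lines, budget_ms=STARTUP_BUDGET_MS):
    """Return [(startup_ms, line)] for loading requests over the budget."""
    slow = []
    for line in lines:
        if "loading_request=1" not in line:
            continue  # only requests that started a new instance
        total = re.search(r"\bms=(\d+)", line)
        if total is None:
            continue
        pending = re.search(r"\bpending_ms=(\d+)", line)
        # Assumption: ms= includes queue time, so subtract pending_ms when present.
        startup_ms = int(total.group(1)) - (int(pending.group(1)) if pending else 0)
        if startup_ms > budget_ms:
            slow.append((startup_ms, line))
    return slow

LOG_LINES = [
    # Fields from the 500 response quoted earlier in this thread.
    "ms=70286 cpu_ms=24714 api_cpu_ms=0 cpm_usd=0.686616 loading_request=1 pending_ms=8036 exit_code=104",
    # Hypothetical lines for contrast.
    "ms=4200 cpu_ms=3100 api_cpu_ms=0 loading_request=1 pending_ms=150",
    "ms=35 cpu_ms=12 api_cpu_ms=0 pending_ms=0",
]

for startup_ms, line in slow_loading_requests(LOG_LINES):
    print("slow startup: %d ms" % startup_ms)
```

If most loading requests in a quiet period sit well under the budget and only occasional ones blow past it, that points at the platform rather than the app. If they are routinely close to the budget, the app's own startup is the thing to trim.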


Re: [google-appengine] Re: Outages? alex 4/12/12 10:56 AM
Just to add to Jeff's: do use Appstats too.
Re: [google-appengine] Re: Outages? Rui Oliveira 4/13/12 5:23 AM
Thanks.

Re: [google-appengine] Re: Outages? Rui Oliveira 4/13/12 6:55 AM

Hi Jeff

Thanks for your reply. Your answer was very important in getting me to look at the right part of the problem.

Just to clarify: "startup time" is the time to start a new instance, right?

How can I analyze the startup time on the server? I'm looking at the App Engine logs, Appstats, and Speed Tracer, but in none of them can I find what's happening during startup.

When I deploy the app I get logs like these:

  1. 2012-04-13 11:33:35.889 /_ah/warmup 500 61244ms 0kb
  2. W2012-04-13 11:33:35.787 EXCEPTION com.google.apphosting.api.DeadlineExceededException: This request (08f66682e6ba5919) started at 2012/04/13 11:32:35.741 UTC and was still e
  3. E2012-04-13 11:33:35.790 javax.servlet.ServletContext log: unavailable javax.servlet.UnavailableException: This request (08f66682e6ba5919) started at 2012/04/13 11:32:35.741 U
  4. W2012-04-13 11:33:35.810 Failed startup of context com.google.apphosting.utils.jetty.RuntimeAppEngineWebAppContext@1811e2c{/,/base/data/home/apps/s~airmenudemo/29.358177921953
  5. C2012-04-13 11:33:35.816 Uncaught exception from servlet javax.servlet.UnavailableException: Initialization failed. at com.google.apphosting.runtime.jetty.AppVersionHandlerMa
  6. I2012-04-13 11:33:35.819 This request caused a new process to be started for your application, and thus caused your application code to be loaded for the first time. This requ
  7. W2012-04-13 11:33:35.819 A problem was encountered with the process that handled this request, causing it to exit. This is likely to cause a new process to be used for the nex
 

On appstats: ( http://www.airmenudemo.appspot.com/appstats/stats )

 (16) 2012-04-13 11:36:31.587 "GET /appstats/" 307 real=215ms api=0ms overhead=0ms (0 RPCs)

 (17) 2012-04-13 11:35:49.207 "GET /symbolmanifest.json" 404 real=609ms api=0ms overhead=0ms (0 RPCs)

 (18) 2012-04-13 11:27:25.255 "GET /_ah/warmup" 200 real=441ms api=0ms overhead=0ms (0 RPCs)

As you can see, Appstats doesn't log anything after the deploy.

After a deploy I can't even open Appstats.

After some minutes or hours everything starts working.

Thanks

Rui

Re: [google-appengine] Re: Outages? Jeff Schnitzer 4/13/12 7:51 AM
Right.  The problem is the 61244ms that it takes to start your app.  How long does it normally take?
Look at past warmup requests (the ones that work) and see how long they take.  My guess is that
the number is close to 60s.  If GAE gets marginally slower, it pushes you over the edge.
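
One way to follow this advice is to pull the latency out of the request-log summary lines for /_ah/warmup and compare it with the roughly 60-second deadline seen in this thread. The following is a minimal Java sketch, not an App Engine API: the class name `WarmupLogScan`, the regular expression, and the 30 s "worth investigating" threshold (borrowed from the advice earlier in the thread) are assumptions, and it expects lines shaped like `2012-04-13 11:33:35.889 /_ah/warmup 500 61244ms 0kb` copied out of the logs.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: finds warmup latencies in copied request-log summary lines. */
final class WarmupLogScan {
    // Assumed line shape: "<date> <time> <path> <status> <latency>ms ..."
    private static final Pattern SUMMARY = Pattern.compile(
        "(\\d{4}-\\d{2}-\\d{2} \\d{2}:\\d{2}:\\d{2}\\.\\d{3}) (/\\S*) (\\d{3}) (\\d+)ms\\b");

    static final long DEADLINE_MS = 60000L; // deadline observed in this thread
    static final long WARN_MS = 30000L;     // assumed "start investigating" threshold

    /** Returns the latency in ms of a /_ah/warmup summary line, or -1 for any other line. */
    static long latencyMillis(String line) {
        Matcher m = SUMMARY.matcher(line);
        if (!m.find() || !"/_ah/warmup".equals(m.group(2))) {
            return -1L;
        }
        return Long.parseLong(m.group(4));
    }

    /** Labels a latency relative to the warning threshold and the deadline. */
    static String classify(long millis) {
        if (millis >= DEADLINE_MS) {
            return "over deadline";
        }
        if (millis >= WARN_MS) {
            return "slow";
        }
        return "ok";
    }

    public static void main(String[] args) {
        String[] lines = {
            "2012-04-13 11:33:35.889 /_ah/warmup 500 61244ms 0kb", // from the thread
            "2012-04-13 09:10:02.120 /_ah/warmup 200 41870ms 0kb", // hypothetical successful warmup
        };
        for (String line : lines) {
            long ms = latencyMillis(line);
            if (ms >= 0) {
                System.out.println(ms + "ms -> " + classify(ms));
            }
        }
    }
}
```

Run it over the warmup lines that succeeded: if they cluster just under 60000 ms, even a small slowdown will push them past the deadline.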

As for why your app takes so long to start up, I can't begin to speculate.  What does it do at startup?
Do any of your frameworks do classpath scanning?  Do you have a lot of big jars?  Zillions of
class files?  Do you load data from the datastore or blobstore?

You need to diagnose the warmup requests that *do* work.  Sure, look at appstats - although
that will only show issues if your warmup fetches data from services (i.e., not the filesystem).
However, you really should be able to think about it for a few minutes and figure out the problem.
App startup is almost 100% deterministic.  There are no parameters.  You know what it does.

Jeff

RE: [google-appengine] Re: Outages? Brandon Wirtz 4/13/12 9:02 AM

> My guess is that the number is close to 60s.  If GAE gets marginally slower, it pushes you over the edge.

I’ll be doing a video very shortly about how to keep your instances from dying on startup. But one easy way to tell if this is the issue is to up your instance size AND test whether you have the same issue on a Dynamic Backend.  If your issues magically go away, but your startups take 65 seconds, you know that you need to optimize your startup.

One should also watch the instance memory usage in the dashboard. Often people have instances that have just spun up and are sitting at 119 of 128 MB of memory usage, and then wonder why their instances recycle so often.

Re: [google-appengine] Re: Outages? Tom Willis 4/13/12 10:14 AM
FWIW, warmup requests and instance startup have been very, very inconsistent over the past couple of weeks. I've had an app whose startup usually took 2-3 s show up in the logs with DeadlineExceeded errors, causing our integration tests to fail when run from the continuous integration server, and it's very sporadic. So I think AE has been more than marginally slower of late.

I think the lesson learned is to avoid starting up new instances, and put your whole app in one small file. :)
Re: [google-appengine] Re: Outages? Eliran Bivas 4/13/12 11:31 AM
Do you have any recommendations on how to reduce the number of I/O operations during startup?
My app uses Spring and, following common Java best practice, my project consists of several Maven modules, which results in several jars.
I do not load any data from the datastore or require an HTTP connection during startup.

I understand the App Engine filesystem is extremely slow, but on my local dev machine loading my app takes ~30 s, so I assumed that on a much superior infrastructure it would be even faster. I believe App Engine should provide some property to let loading requests run longer than 60 seconds (the way cron operations are allowed 10 minutes of runtime).

Thanks in advance
Re: [google-appengine] Re: Outages? Jeff Schnitzer 4/15/12 4:53 PM
Woah!  30s to start your app in your local dev environment?  That's nuts.

You have a mistaken perception of GAE.  In nearly all respects, your local dev environment will perform a single thread of execution faster than production.  Your local machine has dedicated CPU cores and I/O bandwidth, all local.  It has a mock datastore which likely has no synchronization issues.

In production, the filesystem is loading across a network.  Nearly all service calls require an RPC to a remote machine somewhere else in the cluster - maybe to several machines.  You're sharing CPU cores and RAM with a dozen other apps, some of which might be really busy.

What you get in production is a system that is not especially fast but nearly always consistent no matter what the queries/sec or database size.

Your best strategy is to figure out why your app is taking so long to start up and address that.

Jeff

Re: [google-appengine] Re: Outages? Eliran Bivas 4/16/12 12:13 AM
Thanks for the clarification, it should be part of the documentation for AppEngine architecture.

As for my question, I believe that a Maven multi-module Spring application has a different profile when it comes to loading times.
Earlier posts here mentioned that loading several JARs might be an issue. BUT, that's how Maven works.
Even the core Spring Framework consists of more than 10 JARs.

Are there any recommendations on what to look for (analysis tools would be great) in such a deployment? Or on how such a configuration should behave (prefer lazy bean initialization over pre-initialized singletons)? What about the JARs: would flattening them into a single uber-JAR help? And again, my context loads without any DB operations or HTTP connections. My project will get larger and additional JARs will be added, so are there any best practices for that scenario as well?
Re: [google-appengine] Re: Outages? Eliran Bivas 4/20/12 12:34 AM
Just to add some information:
When I time the load of my app (not from dev server startup, but from the point where my code begins to load), it takes less than 8 seconds (in a few cases 5 seconds). So I still can't figure out why on App Engine it can sometimes take more than 60 seconds...

I'd appreciate some analysis guidelines so I can further investigate what makes the app (sometimes) load very slowly.
Re: [google-appengine] Re: Outages? Jeff Schnitzer 4/20/12 8:44 AM
As far as anyone outside Google can tell, the biggest issue involves reading files off the filesystem.  This may have changed recently since it's been a while since I (or any of the other users who commonly post about such stuff) did any measurements, but reading files seems to be painfully slow.

Some things to consider:

 * Tools that do classpath scanning (Spring, Resteasy, JDO, etc) open every jar and class file looking for annotations.  These are usually the biggest enemy.  If you can disable scanning, it will usually help a lot - although this means more manual configuration.

 * Lots of little files take longer than a few big ones.  I found a small but significant improvement (maybe 20%) by jarring up my class files, though not enough that I still actually do it.  The numbers might be different if you use classpath scanning.

 * Consider carefully each jar in your project and whether you really need it.
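
To get a rough idea of how much a classpath scanner has to open, one can count the jars and class entries that ship with the app. This is a minimal sketch using only `java.util.jar` from the standard library; the class name `JarInventory` and the default directory `war/WEB-INF/lib` are assumptions, so pass your own lib directory as the first argument. It reports size only, not load time, and it does not look at loose class files outside the jars.

```java
import java.io.File;
import java.io.IOException;
import java.util.Enumeration;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;

/** Hypothetical helper: counts jars and .class entries directly inside a lib directory. */
final class JarInventory {

    /** Returns {jarCount, classEntryCount}; a missing directory yields {0, 0}. */
    static int[] count(File libDir) throws IOException {
        int jars = 0;
        int classes = 0;
        File[] files = libDir.listFiles();
        if (files == null) {
            return new int[] {0, 0};
        }
        for (File f : files) {
            if (!f.isFile() || !f.getName().endsWith(".jar")) {
                continue; // skip subdirectories and non-jar files
            }
            jars++;
            try (JarFile jar = new JarFile(f)) {
                Enumeration<JarEntry> entries = jar.entries();
                while (entries.hasMoreElements()) {
                    if (entries.nextElement().getName().endsWith(".class")) {
                        classes++;
                    }
                }
            }
        }
        return new int[] {jars, classes};
    }

    public static void main(String[] args) throws IOException {
        // Assumed default location; adjust to your project layout.
        File libDir = new File(args.length > 0 ? args[0] : "war/WEB-INF/lib");
        int[] c = count(libDir);
        System.out.println(libDir + ": " + c[0] + " jars, " + c[1] + " class entries");
    }
}
```

Comparing the counts before and after dropping a dependency gives a quick sense of how much each jar adds to what has to be read at startup.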

Jeff


Re: [google-appengine] Re: Outages? Jeff Schnitzer 4/20/12 8:46 AM
Oh, what I needed to mention is that if you're trying to time your app load with (say) a filter, a lot of this load time elapses before your code begins executing.

Jeff
Re: [google-appengine] Re: Outages? Robert Kluin 4/20/12 1:02 PM
Do you mean Python or Java apps sitting at over 100 MB after spin-up?

I work on some extremely large Python apps, and they can sit at around 75 MB.
If your Python app is heavier than that at startup... you are
probably doing stuff in a very questionable way.


Robert
