Outages?


Adam Sherman

Mar 6, 2012, 4:17:37 PM
to google-a...@googlegroups.com
Am I the only one seeing short duration outages? They are being reflected at:


But I don't see anyone else complaining anywhere, so it makes me worried.

A.

Cesium

Mar 6, 2012, 4:22:30 PM
to google-a...@googlegroups.com
I've got nothing but errors.

Jeff Schnitzer

Mar 6, 2012, 4:27:16 PM
to google-a...@googlegroups.com
I see a lot of errors on app startup like this:

Failed startup of context com.google.apphosting.utils.jetty.RuntimeAppEngineWebAppContext@15a2dc4{/,/base/data/home/apps/s~voostip/1.357238701206702102}
com.google.apphosting.api.DeadlineExceededException: This request (71d5265cd8f687bc) started at 2012/03/06 21:22:53.913 UTC and was still executing at 2012/03/06 21:23:53.699 UTC.
	at com.google.appengine.runtime.Request.process-71d5265cd8f687bc(Request.java)
	at java.util.zip.ZipFile.read(Native Method)
	at java.util.zip.ZipFile.access$1200(ZipFile.java:57)
	at java.util.zip.ZipFile$ZipFileInputStream.read(ZipFile.java:476)
	at java.util.zip.ZipFile$1.fill(ZipFile.java:259)
	at java.util.zip.InflaterInputStream.read(InflaterInputStream.java:158)
	at sun.misc.Resource.getBytes(Resource.java:124)
	at java.net.URLClassLoader.defineClass(URLClassLoader.java:273)
	at sun.reflect.GeneratedMethodAccessor5.invoke(Unknown Source)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:616)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:266)

It's not 100% of the time, but it's often enough to be scary.  If an instance gets up it seems to stay up.  But getting there is a problem.

Jeff

--
You received this message because you are subscribed to the Google Groups "Google App Engine" group.
To view this discussion on the web visit https://groups.google.com/d/msg/google-appengine/-/AwHg5a7-EPoJ.

To post to this group, send email to google-a...@googlegroups.com.
To unsubscribe from this group, send email to google-appengi...@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/google-appengine?hl=en.

Adam Sherman

Mar 6, 2012, 4:28:32 PM
to google-a...@googlegroups.com
That is also what I am seeing.

--
Adam Sherman, CTO
Versature Corp. / +1.877.498.3772 x113

Follow us on Twitter - http://twitter.com/Versature
Check out the Versature Blog - http://inside.versature.com

Joakim

Mar 6, 2012, 4:41:34 PM
to google-a...@googlegroups.com
In addition to those, I've been getting logs with a single line of text reading "Request was aborted after waiting too long to attempt to service your request."
Too long seems to be about ten seconds.


Francois Masurel

Mar 6, 2012, 4:44:41 PM
to google-a...@googlegroups.com
Yep, getting quite a few errors on loading requests lately, like this one for example:

  1. 2012-03-06 20:26:42.834
    Uncaught exception from servlet
    org.apache.xerces.parsers.ObjectFactory$ConfigurationError: Provider org.apache.xerces.parsers.XIncludeAwareParserConfiguration could not be instantiated: com.google.apphosting.api.DeadlineExceededException: This request (c2d42bb1d5647665) started at 2012/03/06 19:25:43.000 UTC and was still executing at 2012/03/06 19:26:42.782 UTC.
    	at org.apache.xerces.parsers.ObjectFactory.newInstance(Unknown Source)
    	at org.apache.xerces.parsers.ObjectFactory.createObject(Unknown Source)
    	at org.apache.xerces.parsers.ObjectFactory.createObject(Unknown Source)
    	at org.apache.xerces.parsers.SAXParser.<init>(Unknown Source)
    	at org.apache.xerces.parsers.SAXParser.<init>(Unknown Source)
    	at org.apache.xerces.jaxp.SAXParserImpl$JAXPSAXParser.<init>(Unknown Source)
    	at org.apache.xerces.jaxp.SAXParserImpl.<init>(Unknown Source)
    	at org.apache.xerces.jaxp.SAXParserImpl.<init>(Unknown Source)
    	at org.apache.xerces.jaxp.SAXParserFactoryImpl.newSAXParserImpl(Unknown Source)
    	at org.apache.xerces.jaxp.SAXParserFactoryImpl.setFeature(Unknown Source)
    	at org.mortbay.xml.XmlParser.makeFactorySecure(XmlParser.java:162)
    	at org.mortbay.xml.XmlParser.setValidating(XmlParser.java:102)
    	at org.mortbay.xml.XmlParser.<init>(XmlParser.java:91)
    	at org.mortbay.jetty.webapp.TagLibConfiguration.configureWebApp(TagLibConfiguration.java:210)
    	at org.mortbay.jetty.webapp.WebAppContext.startContext(WebAppContext.java:1247)
    	at org.mortbay.jetty.handler.ContextHandler.doStart(ContextHandler.java:517)
    	at org.mortbay.jetty.webapp.WebAppContext.doStart(WebAppContext.java:467)
    	at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:50)
    	at com.google.apphosting.runtime.jetty.AppVersionHandlerMap.createHandler(AppVersionHandlerMap.java:202)
    	at com.google.apphosting.runtime.jetty.AppVersionHandlerMap.getHandler(AppVersionHandlerMap.java:171)
    	at com.google.apphosting.runtime.jetty.JettyServletEngineAdapter.serviceRequest(JettyServletEngineAdapter.java:123)
    	at com.google.apphosting.runtime.JavaRuntime$RequestRunnable.run(JavaRuntime.java:422)
    	at com.google.tracing.TraceContext$TraceContextRunnable.runInContext(TraceContext.java:449)
    	at com.google.tracing.TraceContext$TraceContextRunnable$1.run(TraceContext.java:455)
    	at com.google.tracing.TraceContext.runInContext(TraceContext.java:695)
    	at com.google.tracing.TraceContext$AbstractTraceContextCallback.runInInheritedContextNoUnref(TraceContext.java:333)
    	at com.google.tracing.TraceContext$AbstractTraceContextCallback.runInInheritedContext(TraceContext.java:325)
    	at com.google.tracing.TraceContext$TraceContextRunnable.run(TraceContext.java:453)
    	at com.google.apphosting.runtime.ThreadGroupPool$PoolEntry.run(ThreadGroupPool.java:251)
    	at java.lang.Thread.run(Thread.java:679)
    
  2. I 2012-03-06 20:26:42.879
    This request caused a new process to be started for your application, and thus caused your application code to be loaded for the first time. This request may thus take longer and use more CPU than a typical request for your application.
  3. W 2012-03-06 20:26:42.879
    A problem was encountered with the process that handled this request, causing it to exit. This is likely to cause a new process to be used for the next request to your application. If you see this message frequently, you may be throwing exceptions during the initialization of your application. (Error code 104)

Adam Sherman

Mar 7, 2012, 10:25:47 AM
to google-a...@googlegroups.com
So, apparently, we all imagined the problem. The status page no longer
admits to anything.

A.


Brandon Wirtz

Mar 7, 2012, 11:15:11 AM
to google-a...@googlegroups.com
> So, apparently, we all imagined the problem. The status page no longer
> admits to anything.

In most systems the Uptime is 100% minus the summation of the downtime of
all other systems. The exception to this rule is logging. When Logging
fails to record the downtime, Uptime goes up. As a result Google has been
working hard to build a logging system that goes down just ahead of all
other systems, and comes up shortly after.


Adam Sherman

Mar 7, 2012, 11:18:39 AM
to google-a...@googlegroups.com
On Wed, Mar 7, 2012 at 11:15 AM, Brandon Wirtz <dra...@digerat.com> wrote:
> In most systems the Uptime is 100% minus the summation of the downtime of
> all other systems.  The exception to this rule is logging. When Logging
> fails to record the downtime, Uptime goes up.  As a result Google has been
> working hard to build a logging system that goes down just ahead of all
> other systems, and comes up shortly after.

Well said sir!

I'm still laughing.

A.

Nick

Mar 7, 2012, 9:07:58 PM
to google-a...@googlegroups.com
I'm getting the same errors :(

Miroslav Genov

Mar 8, 2012, 3:50:09 AM
to google-a...@googlegroups.com
Same on my side. Normally our app boots in 6-7 seconds; now new-instance requests are taking 40-50 seconds, which causes requests to time out. The status page is not displaying any errors.

Any ideas?

Tom Carchrae

Mar 8, 2012, 4:13:48 AM
to Google App Engine

I was crying until I read Brandon's email. Once I finish laughing, I
will resume crying - and thinking about a desperate flight to another
host.

I've been getting many of these over the past week, I'm starting to
pull my hair out. I cannot reproduce locally and I've no idea what is
causing it as it's pretty intermittent. It does seem to happen more
frequently with static file requests (images and javascript, etc).

2012-03-08 00:57:26.492
The process handling this request unexpectedly died. This is likely to
cause a new process to be used for the next request to your
application. (Error code 203)

mv

Mar 8, 2012, 10:15:14 AM
to google-a...@googlegroups.com
Same on my side:

com.google.apphosting.runtime.HardDeadlineExceededError: This request (148c8d61fefddf5e) started at 2012/03/08 15:06:07.837 UTC and was still executing at 2012/03/08 15:07:08.458 UTC.
	at com.google.appengine.runtime.Request.process-148c8d61fefddf5e(Request.java)
	at java.util.zip.ZipFile.open(Native Method)
	at java.util.zip.ZipFile.<init>(ZipFile.java:143)
	at java.util.jar.JarFile.<init>(JarFile.java:150)
	at java.util.jar.JarFile.<init>(JarFile.java:114)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:266)
	at org.springframework.util.ClassUtils.forName(ClassUtils.java:258)


Nick

Mar 8, 2012, 2:50:50 PM
to google-a...@googlegroups.com
http://code.google.com/p/googleappengine/issues/detail?id=6246

..still down, no response from the App Engine team.... This is the worst!  We can't do anything to fix the problem.  Just wait.... 

charisl

Mar 8, 2012, 9:03:03 AM
to Google App Engine
We have been getting the same exceptions in our app for the past few hours.
Our app currently has very low traffic (~2 requests/sec),
yet we see around 12-15 live instances at any moment.

Did anyone figure this out?
How can we get an answer from Google on this?

Ch.

Here is a sample stack trace from our app:

com.google.apphosting.api.DeadlineExceededException: This request (bb4da48fbdbb7a77) started at 2012/03/08 13:51:26.034 UTC and was still executing at 2012/03/08 13:52:26.340 UTC.
at java.util.zip.ZipFile.open(Native Method)
at java.util.zip.ZipFile.<init>(ZipFile.java:143)
at java.util.jar.JarFile.<init>(JarFile.java:150)
at java.util.jar.JarFile.<init>(JarFile.java:87)
at sun.misc.URLClassPath$JarLoader.getJarFile(URLClassPath.java:919)
at sun.misc.URLClassPath$JarLoader.access$900(URLClassPath.java:723)
at sun.misc.URLClassPath$JarLoader$1.run(URLClassPath.java:854)
at java.security.AccessController.doPrivileged(Native Method)
at sun.misc.URLClassPath$JarLoader.ensureOpenSynchronized(URLClassPath.java:846)
at sun.misc.URLClassPath$JarLoader.ensureOpen(URLClassPath.java:838)
at sun.misc.URLClassPath$JarLoader.<init>(URLClassPath.java:785)
at sun.misc.URLClassPath$JarLoader.<init>(URLClassPath.java:743)
at sun.misc.URLClassPath$3.run(URLClassPath.java:412)
at java.security.AccessController.doPrivileged(Native Method)
at sun.misc.URLClassPath.getLoader(URLClassPath.java:395)
at sun.misc.URLClassPath.getLoader(URLClassPath.java:371)
at sun.misc.URLClassPath.findResource(URLClassPath.java:201)
at java.net.URLClassLoader$2.run(URLClassLoader.java:379)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findResource(URLClassLoader.java:376)
at com.google.apphosting.runtime.security.UserClassLoader.findResource(UserClassLoader.java:723)
at java.lang.ClassLoader.getResource(ClassLoader.java:977)
at org.mortbay.resource.Resource.newSystemResource(Resource.java:203)
at org.mortbay.jetty.webapp.WebXmlConfiguration.configureDefaults(WebXmlConfiguration.java:159)
at org.mortbay.jetty.webapp.WebAppContext.startContext(WebAppContext.java:1230)
C 2012-03-08 15:52:26.871

Uncaught exception from servlet
javax.servlet.UnavailableException: Initialization failed.
at com.google.apphosting.runtime.jetty.AppVersionHandlerMap.createHandler(AppVersionHandlerMap.java:211)
I 2012-03-08 15:52:27.089

This request caused a new process to be started for your application,
and thus caused your application code to be loaded for the first time.
This request may thus take longer and use more CPU than a typical
request for your application.

W 2012-03-08 15:52:27.089

A problem was encountered with the process that handled this request,
causing it to exit. This is likely to cause a new process to be used
for the next request to your application. If you see this message

ra

Mar 8, 2012, 11:54:12 AM
to Google App Engine
We are also getting strange errors on two of our apps resulting in a
500 and unreachable apps.
Yesterday (7th) appid sciplanet-hq (paid app) was affected, today
appid backupgoo-web. backupgoo-web has now been unreachable for at least 3
hours.

The GAE status says "everything okay". Can't be true...

Does anybody have a clue?


Worried,


Raphael



Richard Watson

Mar 9, 2012, 2:56:39 AM
to google-a...@googlegroups.com
App Engine team - is there a policy of specifically avoiding these threads? I do assume some of you feel a desire to participate, but you seem to go extra-quiet when they pop up. Just wondering why.

Richard Watson

Mar 9, 2012, 3:08:26 AM
to google-a...@googlegroups.com
Just saw  https://groups.google.com/forum/?fromgroups#!topic/google-appengine/jufkxPik1Js which I assume is related and does have some responses from the team. If so, ignore my question!

Nikolai

Mar 9, 2012, 6:33:36 AM
to google-a...@googlegroups.com
+1
We had to move to our backup systems. Everything is full of 500 errors or hardcore latency.
Most of the 500 errors we see aren't even logged, so this seems to be a Google problem one abstraction layer above the app.

And yes - sometimes we have got the same feeling, that we are the only ones that use appengine in a production setting. You are not alone ;)

regards,
nikolai

Thomas

Mar 9, 2012, 5:16:18 AM
to Google App Engine
Same here: over the last few days we've been getting lots of error logs
like these:

Request was aborted after waiting too long to attempt to service your
request.
or
A problem was encountered with the process that handled this request,
causing it to exit. This is likely to cause a new process to be used
for the next request to your application. (Error code 204)

Most of these requests take several minutes until they die, which is a
disaster when it comes to user experience :-(

Please Google, do something about this! Our users are paying for the
service, we want to serve their expectations!

Ronoaldo José de Lana Pereira

Mar 9, 2012, 10:24:23 AM
to google-a...@googlegroups.com
+1 for seeing the same problems on my app.

It got worse after the maintenance on March 7.

Neal

Mar 9, 2012, 11:00:30 AM
to google-a...@googlegroups.com
Yup, still seeing the same issues here as well.

Ikai Lan (Google)

Mar 9, 2012, 2:15:35 PM
to google-a...@googlegroups.com
Hey everyone,

Here are a few things that will help:

1. Application IDs (<--- if you have nothing else, at least provide this)
2. What is your QPS?
3. What % of your requests are errors?

--
Ikai Lan 
Developer Programs Engineer, Google App Engine




Alexander Trakhimenok

Mar 9, 2012, 2:48:48 PM
to google-a...@googlegroups.com
Hey Ikai,

Our app id: petaclasses

QPS: 5-20 requests per second

Current instances in dashboard: 110 - 160
Usual instances: 8-15

It's hard to say what % of requests failed, as we also have requests that fail for other reasons (e.g. non-existent pages, etc.) and we're not sure how to easily separate them.
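
A rough way to separate the two kinds of failures, as a minimal Python sketch. The marker strings are copied from log lines quoted in this thread; the function names and the idea of matching on message text are assumptions for illustration, not an App Engine API, and the markers are surely incomplete.

```python
# Marker strings copied from log lines quoted in this thread.
# Everything else (names, approach) is hypothetical.
PLATFORM_MARKERS = (
    "DeadlineExceededException",
    "HardDeadlineExceededError",
    "Request was aborted after waiting too long",
    "(Error code 104)",
    "(Error code 203)",
    "(Error code 204)",
)


def split_errors(error_messages):
    """Split error messages into (platform_like, other) lists."""
    platform_like, other = [], []
    for message in error_messages:
        if any(marker in message for marker in PLATFORM_MARKERS):
            platform_like.append(message)
        else:
            other.append(message)
    return platform_like, other


def error_percent(error_count, total_requests):
    """Percentage of requests that failed; 0.0 when there was no traffic."""
    if total_requests == 0:
        return 0.0
    return 100.0 * error_count / total_requests


if __name__ == "__main__":
    # Made-up sample messages, only to show the call shape.
    sample = [
        "DeadlineExceededException: This request ... was still executing",
        "404: page not found",
    ]
    platform_like, other = split_errors(sample)
    print(len(platform_like), len(other))
```

Matching on message text is crude, but it gives a first estimate of how many failures look like the startup and deadline errors above rather than ordinary application errors.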

By the way, are you guys considering creating a page where we can post/report this data in some structured way and "join" an issue, so you can accumulate reports and easily understand the scale of an issue?

Alex


Ikai Lan (Google)

Mar 9, 2012, 2:57:27 PM
to google-a...@googlegroups.com
Alex, to answer that question: yes. We are looking to revamp the production issues tracker which is far from optimal. When users can join or aggregate issues, it allows us to quickly separate actual infrastructure hiccups from user code issues.

Thanks for the info! Is there any other behavior you can report? Does it sound reasonable that you have 110-160 instances because of long startup times leading to more instances required to serve the same load? Are you Python, Java or Go, and do you have concurrent requests enabled?
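
To make the startup-time reasoning concrete, here is a back-of-the-envelope Python sketch based on Little's law (requests in flight = arrival rate x time per request). The function name and the numbers in the usage example are illustrative assumptions, not measurements from any app in this thread, and the real scheduler certainly does more than this.

```python
import math


def instances_needed(qps, seconds_per_request, concurrent_per_instance=1):
    """Rough instance estimate: requests in flight / requests per instance.

    Hypothetical helper; ignores scheduler behaviour, idle instances
    and traffic bursts.
    """
    in_flight = qps * seconds_per_request
    return math.ceil(in_flight / concurrent_per_instance)


if __name__ == "__main__":
    # Illustrative numbers only: same load, fast vs. slow requests.
    print(instances_needed(10, 1.0))
    print(instances_needed(10, 12.0))
```

Under this simple model the instance count grows linearly with the time each request occupies an instance, so a large jump in startup or request time at constant QPS would show up as a proportional jump in instances.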

--
Ikai Lan 
Developer Programs Engineer, Google App Engine




Alexey Konovalov

Mar 9, 2012, 3:10:13 PM
to google-a...@googlegroups.com
Ikai,

Our apps ids: 

rvaserver
rvauser
contentfinancial
contentsports

QPS and error rates differ but they've all been getting a lot of DeadlineExceeded exceptions and the number of instances has been higher than usual over the last couple of days.

Regards,
Alexey



Alexander Trakhimenok

Mar 9, 2012, 3:14:30 PM
to google-a...@googlegroups.com
We are Python 2.5 (no concurrent).

Yes, it seems the start-up time is just crazy high for at least some or all instances.

I also noticed that there are lots of instances that served just 1 request, have an average latency of 0ms and QPS=0, and an average instance age of about 8-9 minutes (up to 11 minutes). To me it seems like an instance is created to serve static content, is not used anymore, and stays around until it dies a while later.

At the moment we have 264 active instances and it's killing our budget :( - see the screenshot attached. We had 2 hours downtime due to exceeded budget.

Alex
Screen Shot 2012-03-09 at 16.12.27.png

Ronoaldo José de Lana Pereira

Mar 9, 2012, 3:22:00 PM
to google-a...@googlegroups.com

Just a follow up:

1. Application Id: oferta-unica
2. QPS: Currently around ~10 dynamic req/sec, overall ~32 req/sec
3. After disabling concurrent requests, ~0.6 errors/sec; before, ~1.5 errors/sec.

Like Alexander said, some of the errors aren't due to this issue, but I can confirm that we have lots of user-facing 500 errors, because our custom 500 error page sends events to Google Analytics:



Thanks for your help.


Nick

Mar 9, 2012, 3:27:39 PM
to google-a...@googlegroups.com
appid: i-strive-to
java - thread safe set to true.


Amit Sangani

Mar 9, 2012, 4:30:22 PM
to google-a...@googlegroups.com
Hi Ikai,

1. Application IDs (<--- if you have nothing else, at least provide this)
textyserver

2. What is your QPS?
Currently at ~10 qps. Latency is approximately 115.3ms. Usually it's less than 30ms.

3. What % of your requests are errors?
Current instances in dashboard: 30 - 40
Usual instances: 10-15

Some of the exceptions we are seeing are :
1) com.google.appengine.api.datastore.DatastoreTimeoutException: The datastore operation timed out, or the data was temporarily unavailable.
at com.google.appengine.api.datastore.DatastoreApiHelper.translateError(DatastoreApiHelper.java:46)
at com.google.appengine.api.datastore.DatastoreApiHelper$1.convertException(DatastoreApiHelper.java:76)
at com.google.appengine.api.utils.FutureWrapper.get(FutureWrapper.java:106)
at com.google.appengine.api.datastore.FutureHelper.getInternal(FutureHelper.java:72)
at com.google.appengine.api.datastore.FutureHelper.quietGet(FutureHelper.java:33)
at com.google.appengine.api.datastore.QueryResultsSourceImpl.peekQueryResultAndIfFirstRecordIndexList(QueryResultsSourceImpl.java:162)
at com.google.appengine.api.datastore.QueryResultsSourceImpl.loadMoreEntities(QueryResultsSourceImpl.java:98)
at com.google.appengine.api.datastore.QueryResultsSourceImpl.loadMoreEntities(QueryResultsSourceImpl.java:85)
at com.google.appengine.api.datastore.QueryResultIteratorImpl.ensureLoaded(QueryResultIteratorImpl.java:161)
at com.google.appengine.api.datastore.QueryResultIteratorImpl.hasNext(QueryResultIteratorImpl.java:65)

Happens once every few mins for a user.

2) java.lang.NullPointerException. 
This is thrown from our custom code when accessing an Entity with approximately 5000 records. Basically the query (using the low-level API) did not return any results. This happens only sometimes and affects a very small percentage of users. This used to work perfectly fine a few days back. We are using chunking at 1000 records.

We are on Java and have enabled concurrent requests. Let us know if we can provide more information to debug this issue. 

thanks!


Amit Sangani

Mar 9, 2012, 4:53:37 PM
to google-a...@googlegroups.com
Also now getting the exceptions below:

java.lang.ExceptionInInitializerError
at org.datanucleus.jdo.metadata.JDOAnnotationReader.processClassAnnotations(JDOAnnotationReader.java:140)
at org.datanucleus.metadata.annotations.AbstractAnnotationReader.getMetaDataForClass(AbstractAnnotationReader.java:122)
at org.datanucleus.metadata.annotations.AnnotationManagerImpl.getMetaDataForClass(AnnotationManagerImpl.java:136)
at org.datanucleus.metadata.MetaDataManager.loadAnnotationsForClass(MetaDataManager.java:2278)
at org.datanucleus.jdo.metadata.JDOMetaDataManager.getMetaDataForClassInternal(JDOMetaDataManager.java:369)
at org.datanucleus.metadata.MetaDataManager.getMetaDataForClass(MetaDataManager.java:1125)
at org.datanucleus.store.appengine.EntityUtils.idToInternalKey(EntityUtils.java:122)
at org.datanucleus.store.appengine.jdo.DatastoreJDOPersistenceManager.getObjectById(DatastoreJDOPersistenceManager.java:63)

com.google.apphosting.runtime.HardDeadlineExceededError: This request (a9a7135b6a5f023e) started at 2012/03/09 21:37:22.971 UTC and was still executing at 2012/03/09 21:38:22.911 UTC.
at org.datanucleus.store.appengine.DatastoreManager.getDatastoreClass(DatastoreManager.java:765)
at org.datanucleus.store.appengine.DatastoreManager.getDatastoreClass(DatastoreManager.java:88)
at org.datanucleus.store.appengine.EntityUtils.determineKind(EntityUtils.java:98)
at org.datanucleus.store.appengine.EntityUtils.determineKind(EntityUtils.java:94)
at org.datanucleus.store.appengine.EntityUtils.idToInternalKey(EntityUtils.java:124)
com.google.apphosting.api.DeadlineExceededException: This request (34c3e1d5cfc6d211) started at 2012/03/09 21:36:34.867 UTC and was still executing at 2012/03/09 21:37:34.367 UTC.
at com.google.appengine.runtime.Request.process-34c3e1d5cfc6d211(Request.java)
at java.util.zip.ZipFile.read(Native Method)
at java.util.zip.ZipFile.access$1200(ZipFile.java:57)
at java.util.zip.ZipFile$ZipFileInputStream.read(ZipFile.java:476)
at java.util.zip.ZipFile$1.fill(ZipFile.java:259)
at java.util.zip.InflaterInputStream.read(InflaterInputStream.java:158)
at java.io.FilterInputStream.read(FilterInputStream.java:133)
at java.io.FilterInputStream.read(FilterInputStream.java:107)

Ikai Lan (Google)

Mar 9, 2012, 5:07:28 PM
to google-a...@googlegroups.com
I forgot to ask if these were master/slave or high replication apps. I can always check by going to the admin console, but I'm hoping to separate them out.

We're looking into the HR apps first (once I figure out which is which).

--
Ikai Lan 
Developer Programs Engineer, Google App Engine




Amit Sangani

Mar 9, 2012, 5:12:36 PM
to google-a...@googlegroups.com
textyserver is on master/slave.

Ikai Lan (Google)

Mar 9, 2012, 5:14:20 PM
to google-a...@googlegroups.com
Yep, I figured it out (when you look at an app in the admin console, if the app ID has a s~ prefix, that means it runs in High Replication). I was just pointing it out for people who hadn't yet reported application IDs.
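
For anyone sorting a list of reported app IDs, the rule above can be sketched in a few lines of Python. The function name is hypothetical, and treating every ID without the `s~` prefix as master/slave is an assumption drawn only from this message, not from any official documentation.

```python
def replication_type(app_id):
    """Classify an app ID as shown in the admin console.

    Assumption from this thread: an "s~" prefix means High Replication;
    anything else is treated here as master/slave.
    """
    if app_id.startswith("s~"):
        return "high replication"
    return "master/slave"


if __name__ == "__main__":
    # IDs as they appear earlier in this thread.
    for app_id in ("s~voostip", "textyserver"):
        print(app_id, "->", replication_type(app_id))
```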

--
Ikai Lan 
Developer Programs Engineer, Google App Engine



Ikai Lan (Google)

Mar 9, 2012, 5:44:16 PM
to google-a...@googlegroups.com
Okay, at least with this thread, it seems like the common thread is that the applications are master/slave applications. We're going to try to tweak a few things on our end to lessen the pain, but pay attention to the downtime-notify@ list (https://groups.google.com/forum/?fromgroups#!forum/google-appengine-downtime-notify). We may announce another maintenance soon as a slightly longer-term fix (while the real long term fix is on its way).

I'm going to switch to take a look at some of the issues with error code 20x - those seem to be impacting the High Replication applications as well.

--
Ikai Lan 
Developer Programs Engineer, Google App Engine



John

Mar 9, 2012, 7:18:29 PM
to google-a...@googlegroups.com
My app is responding, again.

Amit Sangani

Mar 9, 2012, 8:56:02 PM
to google-a...@googlegroups.com
Any update on this? Still seeing many errors & exceptions in the logs. 

On Fri, Mar 9, 2012 at 4:18 PM, John <jwb...@gmail.com> wrote:
> My app is responding, again.


John

Mar 9, 2012, 9:08:30 PM
to google-a...@googlegroups.com
Our apps seem to be better since about 15:30 PT.

Alexander Trakhimenok

Mar 9, 2012, 11:09:02 PM
to google-a...@googlegroups.com
Our Python 2.5 app "petaclasses" still has a crazy number of instances - over 250 at the moment against the normal 10-15.

Our average cost to run the app is $25 per day - there are 4 hours left until the end of today and we have already spent $154. What the .... ??

Is it the policy of the Google AppEngine team that we (customers) should pay for mistakes made by GAE? Any chance to get it fixed and get a refund?

Thanks,
Alex

Alex Popescu

Mar 10, 2012, 5:20:40 AM
to google-a...@googlegroups.com
I'm also interested to find out how to get a refund after the reports I've submitted in the last 24h showing in this thread [1] that since the maintenance window Google AppEngine itself has been the only one responsible for consuming my quota. While historical stats have also been wiped out since the last maintenance I still have the billing history which can be used as historical data.

Alexander, should we start a separate thread to get this info from Google?

Alexander Trakhimenok

Mar 10, 2012, 9:18:02 AM
to google-a...@googlegroups.com, de...@petamatic.com
Alex,

I believe it's better to wait till Monday before starting a new thread, as otherwise it would not get much attention. By Monday we will have the actual billing history data for the 9th of March, so we will be able to raise a case with some data and show a pattern.

In the meantime I believe we can fill in a spreadsheet - I've created a template: https://docs.google.com/spreadsheet/ccc?key=0AtxJ_-1aIO7idEhYT3pyLVZ4R3F0cGdTMFhyZkswVWc

We can also raise an issue in the issue tracker.

Alex

Amit Sangani

Mar 10, 2012, 2:26:27 PM
to google-a...@googlegroups.com
appid: textyserver

Still getting lots of exceptions, mainly:

1) com.google.apphosting.runtime.HardDeadlineExceededError exceptions, 
2) Failed startup of context com.google.apphosting.utils.jetty.RuntimeAppEngineWebAppContext
3) javax.jdo.JDOException: Transaction failed to commit at org.datanucleus.jdo.NucleusJDOHelper.getJDOExceptionForNucleusException(NucleusJDOHelper.java:419) at org.datanucleus.jdo.JDOPersistenceManager.close(JDOPersistenceManager.java:281)

Status page is saying everything is normal - http://code.google.com/status/appengine which does not seem true.

Please let us know if you need more information. 

thanks!
Amit

Nischal Shetty

Mar 10, 2012, 9:45:02 PM
to google-a...@googlegroups.com
Best explanation ever.

On Wednesday, March 7, 2012 9:45:11 PM UTC+5:30, Brandon Wirtz wrote:
> So, apparently, we all imagined the problem. The status page no longer
> admits to anything.

In most systems the Uptime is 100% minus the summation of the downtime of
all other systems.  The exception to this rule is logging. When Logging
fails to record the downtime, Uptime goes up.  As a result Google has been
working hard to build a logging system that goes down just ahead of all
other systems, and comes up shortly after.


Thomas

Mar 12, 2012, 8:17:05 AM
to Google App Engine
We have been experiencing very poor performance / high
latency on Google AppEngine for about 2 weeks. Even the most basic requests take 1-2s,
going up to over 60s which is totally unacceptable from a user's point
of view. At the same time, about 1-2% of all requests are failing
(without any app-internal reason). Our paying (!) users are beginning
to complain and ask for a fix. Yet, we're totally dependent on
Google's mercy to solve this problem.

SO PLEASE GOOGLE: do something about it!
THANK YOU!!

appid: typescout

Regards,
Thomas

Thomas

Mar 12, 2012, 8:20:35 AM
to Google App Engine
appid: typescout
qps: 1-2 requests per second
% failed: 1-2%

we're experiencing very high latency and too many instances without a
reason!

thanks for your help.
regards,
thomas

Oliver

Mar 12, 2012, 8:43:38 AM
to google-a...@googlegroups.com

Tony

Mar 12, 2012, 11:39:14 AM
to google-a...@googlegroups.com
So it keeps shutting down instances, warming them up and eating my money. AppID: crystalupdate


  1. 2012-03-12 08:31:44.715
    Failed startup of context com.google.apphosting.utils.jetty.RuntimeAppEngineWebAppContext@1e0e954{/,/base/data/home/apps/crystalupdate/32.356258734295133782}
    com.google.apphosting.api.DeadlineExceededException: This request (74a86293e2b5f06a) started at 2012/03/12 15:30:43.629 UTC and was still executing at 2012/03/12 15:31:44.572 UTC.
    	at com.google.appengine.runtime.Request.process-74a86293e2b5f06a(Request.java)
    	at java.util.zip.ZipFile.read(Native Method)
    	at java.util.zip.ZipFile.access$1200(ZipFile.java:57)
    	at java.util.zip.ZipFile$ZipFileInputStream.read(ZipFile.java:476)
    	at java.util.zip.ZipFile$1.fill(ZipFile.java:259)
    	at java.util.zip.InflaterInputStream.read(InflaterInputStream.java:158)
    	at sun.misc.Resource.getBytes(Resource.java:124)
    	at java.net.URLClassLoader.defineClass(URLClassLoader.java:273)
    	at sun.reflect.GeneratedMethodAccessor6.invoke(Unknown Source)
    	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    	at java.lang.reflect.Method.invoke(Method.java:616)
    	at java.lang.ClassLoader.loadClass(ClassLoader.java:266)
    	at java.lang.Class.getDeclaredConstructors0(Native Method)
    	at java.lang.Class.privateGetDeclaredConstructors(Class.java:2406)
    	at java.lang.Class.getConstructor0(Class.java:2716)
    	at java.lang.Class.newInstance0(Class.java:343)
    	at java.lang.Class.newInstance(Class.java:325)
    	at org.mortbay.jetty.servlet.Holder.newInstance(Holder.java:153)
    	at org.mortbay.jetty.servlet.FilterHolder.doStart(FilterHolder.java:92)
    	at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:50)
    	at org.mortbay.jetty.servlet.ServletHandler.initialize(ServletHandler.java:662)
    	at org.mortbay.jetty.servlet.Context.startContext(Context.java:140)
    	at org.mortbay.jetty.webapp.WebAppContext.startContext(WebAppContext.java:1250)
    	at org.mortbay.jetty.handler.ContextHandler.doStart(ContextHandler.java:517)
    	at org.mortbay.jetty.webapp.WebAppContext.doStart(WebAppContext.java:467)
    	at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:50)
    	at com.google.tracing.TraceContext$TraceContextRunnable.runInContext(TraceContext.java:449)
    	at com.google.tracing.TraceContext$TraceContextRunnable$1.run(TraceContext.java:455)
    	at com.google.tracing.TraceContext.runInContext(TraceContext.java:695)
    	at com.google.tracing.TraceContext$AbstractTraceContextCallback.runInInheritedContextNoUnref(TraceContext.java:333)
    	at com.google.tracing.TraceContext$AbstractTraceContextCallback.runInInheritedContext(TraceContext.java:325)
    	at com.google.tracing.TraceContext$TraceContextRunnable.run(TraceContext.java:453)
    	at java.lang.Thread.run(Thread.java:679)
    
  2. C 2012-03-12 08:31:44.749
    Uncaught exception from servlet
    javax.servlet.UnavailableException: Initialization failed.
    	at com.google.apphosting.runtime.jetty.AppVersionHandlerMap.createHandler(AppVersionHandlerMap.java:211)
    	at com.google.apphosting.runtime.jetty.AppVersionHandlerMap.getHandler(AppVersionHandlerMap.java:171)
    	at com.google.apphosting.runtime.jetty.JettyServletEngineAdapter.serviceRequest(JettyServletEngineAdapter.java:123)
    	at com.google.apphosting.runtime.JavaRuntime$RequestRunnable.run(JavaRuntime.java:422)
    	at com.google.tracing.TraceContext$TraceContextRunnable.runInContext(TraceContext.java:449)
    	at com.google.tracing.TraceContext$TraceContextRunnable$1.run(TraceContext.java:455)
    	at com.google.tracing.TraceContext.runInContext(TraceContext.java:695)
    	at com.google.tracing.TraceContext$AbstractTraceContextCallback.runInInheritedContextNoUnref(TraceContext.java:333)
    	at com.google.tracing.TraceContext$AbstractTraceContextCallback.runInInheritedContext(TraceContext.java:325)
    	at com.google.tracing.TraceContext$TraceContextRunnable.run(TraceContext.java:453)
    	at com.google.apphosting.runtime.ThreadGroupPool$PoolEntry.run(ThreadGroupPool.java:251)
    	at java.lang.Thread.run(Thread.java:679)
    
  3. I 2012-03-12 08:31:44.924
    This request caused a new process to be started for your application, and thus caused your application code to be loaded for the first time. This request may thus take longer and use more CPU than a typical request for your application.
  4. W 2012-03-12 08:31:44.924
    A problem was encountered with the process that handled this request, causing it to exit. This is likely to cause a new process to be used for the next request to your application. If you see this message frequently, you may be throwing exceptions during the initialization of your application. (Error code 104)
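
The two timestamps in the DeadlineExceededException message above can be subtracted to see how long the loading request ran before it was cut off. The following is only a minimal sketch: the class and method names are made up for illustration, and the timestamp pattern is inferred from the log text rather than from any documented format.

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

// Hypothetical helper, not part of any App Engine API.
final class DeadlineElapsed {
    // Layout as it appears in the log message, e.g. "2012/03/12 15:30:43.629" (UTC).
    private static final DateTimeFormatter LOG_FORMAT =
            DateTimeFormatter.ofPattern("uuuu/MM/dd HH:mm:ss.SSS");

    /** Milliseconds between the "started at" and "still executing at" timestamps. */
    static long elapsedMillis(String startedAt, String stillExecutingAt) {
        LocalDateTime start = LocalDateTime.parse(startedAt, LOG_FORMAT);
        LocalDateTime end = LocalDateTime.parse(stillExecutingAt, LOG_FORMAT);
        return Duration.between(start, end).toMillis();
    }

    public static void main(String[] args) {
        // Values copied from the trace above.
        long ms = elapsedMillis("2012/03/12 15:30:43.629", "2012/03/12 15:31:44.572");
        System.out.println(ms + " ms"); // prints "60943 ms"
    }
}
```

So the request ran for roughly 61 seconds while still loading classes from a JAR, which matches the "60+ second wait and then an error" pattern others report in this thread.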

Admin

unread,
Mar 12, 2012, 11:40:42 AM3/12/12
to google-a...@googlegroups.com
Our app id: reportingsuiteengine
Master/slave
Current instances in dashboard: 8-10
Usual instances: 2-3

Since last Thursday, without having made any change to either the application configuration (in the console) or the actual code (deployments), we have been suffering abnormally low performance and continuous errors (including 500 Server errors).

Please find additional information, including logs and screenshots, in http://code.google.com/p/googleappengine/issues/detail?id=7113

We would appreciate it if you could give us an estimate of when this situation will be resolved. Our customers are starting to get really upset and angry, and some of them have threatened to leave us if we don't solve this soon or at least give them an estimated resolution date/time.

Thanks in advance,
Luis

Vincent

unread,
Mar 12, 2012, 12:14:37 PM3/12/12
to google-a...@googlegroups.com
Hi GAE guys,

same issue for us ... please react ... our customers are complaining :-(

appid: logos-contacts 

Regards 
Vincent

  1. 2012-03-12 11:31:23.508 /_ah/login_required?continue=https://logos-contacts.appspot.com/ 500 72138ms 0kb Mozilla/5.0 (Windows NT 5.1) AppleWebKit/534.24 (KHTML, like Gecko) Chrome/11.0.696.60 Safari/534.24
    124.148.222.42 - - [12/Mar/2012:04:31:23 -0700] "GET /_ah/login_required?continue=https://logos-contacts.appspot.com/ HTTP/1.1" 500 0 "from customer intranet" "Mozilla/5.0 (Windows NT 5.1) AppleWebKit/534.24 (KHTML, like Gecko) Chrome/11.0.696.60 Safari/534.24" "logos-contacts.appspot.com" ms=72138 cpu_ms=887 api_cpu_ms=0 cpm_usd=0.024681 loading_request=1 exit_code=104 instance=00c61b117c5f903ba6921dbd52f633545a72631c
  2. W 2012-03-12 11:31:22.323
    Failed startup of context com.google.apphosting.utils.jetty.RuntimeAppEngineWebAppContext@3aa791{/,/base/data/home/apps/logos-contacts/v12.357205322028193482}
    com.google.apphosting.api.DeadlineExceededException: This request (0746107fc018dfa5) started at 2012/03/12 11:30:21.209 UTC and was still executing at 2012/03/12 11:31:21.935 UTC.
    	at java.util.zip.ZipFile.open(Native Method)
    	at java.util.zip.ZipFile.<init>(ZipFile.java:143)
    	at java.util.jar.JarFile.<init>(JarFile.java:150)
    	at java.util.jar.JarFile.<init>(JarFile.java:87)
    	at sun.misc.URLClassPath$JarLoader.getJarFile(URLClassPath.java:919)
    	at sun.misc.URLClassPath$JarLoader.access$900(URLClassPath.java:723)
    	at sun.misc.URLClassPath$JarLoader$1.run(URLClassPath.java:854)
    	at java.security.AccessController.doPrivileged(Native Method)
    	at sun.misc.URLClassPath$JarLoader.ensureOpenSynchronized(URLClassPath.java:846)
    	at sun.misc.URLClassPath$JarLoader.ensureOpen(URLClassPath.java:838)
    	at sun.misc.URLClassPath$JarLoader.<init>(URLClassPath.java:785)
    	at sun.misc.URLClassPath$JarLoader.<init>(URLClassPath.java:743)
    	at sun.misc.URLClassPath$3.run(URLClassPath.java:412)
    	at java.security.AccessController.doPrivileged(Native Method)
    	at sun.misc.URLClassPath.getLoader(URLClassPath.java:395)
    	at sun.misc.URLClassPath.getLoader(URLClassPath.java:371)
    	at sun.misc.URLClassPath.findResource(URLClassPath.java:201)
    	at java.net.URLClassLoader$2.run(URLClassLoader.java:379)
    	at java.security.AccessController.doPrivileged(Native Method)
    	at java.net.URLClassLoader.findResource(URLClassLoader.java:376)
    	at com.google.apphosting.runtime.security.UserClassLoader.findResource(UserClassLoader.java:723)
    	at java.lang.ClassLoader.getResource(ClassLoader.java:977)
    	at org.mortbay.resource.Resource.newSystemResource(Resource.java:203)
    	at org.mortbay.jetty.webapp.WebXmlConfiguration.configureDefaults(WebXmlConfiguration.java:159)
    	at org.mortbay.jetty.webapp.WebAppContext.startContext(WebAppContext.java:1230)
    	at org.mortbay.jetty.handler.ContextHandler.doStart(ContextHandler.java:517)
    	at org.mortbay.jetty.webapp.WebAppContext.doStart(WebAppContext.java:467)
    	at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:50)
    	at com.google.apphosting.runtime.jetty.AppVersionHandlerMap.createHandler(AppVersionHandlerMap.java:202)
    	at com.google.apphosting.runtime.jetty.AppVersionHandlerMap.getHandler(AppVersionHandlerMap.java:171)
    	at com.google.apphosting.runtime.jetty.JettyServletEngineAdapter.serviceRequest(JettyServletEngineAdapter.java:123)
    	at com.google.apphosting.runtime.JavaRuntime$RequestRunnable.run(JavaRuntime.java:422)
    	at com.google.tracing.TraceContext$TraceContextRunnable.runInContext(TraceContext.java:449)
    	at com.google.tracing.TraceContext$TraceContextRunnable$1.run(TraceContext.java:455)
    	at com.google.tracing.TraceContext.runInContext(TraceContext.java:695)
    	at com.google.tracing.TraceContext$AbstractTraceContextCallback.runInInheritedContextNoUnref(TraceContext.java:333)
    	at com.google.tracing.TraceContext$AbstractTraceContextCallback.runInInheritedContext(TraceContext.java:325)
    	at com.google.tracing.TraceContext$TraceContextRunnable.run(TraceContext.java:453)
    	at com.google.apphosting.runtime.ThreadGroupPool$PoolEntry.run(ThreadGroupPool.java:251)
    	at java.lang.Thread.run(Thread.java:679)
    
  3. C 2012-03-12 11:31:22.345
    Uncaught exception from servlet
    javax.servlet.UnavailableException: Initialization failed.
    	at com.google.apphosting.runtime.jetty.AppVersionHandlerMap.createHandler(AppVersionHandlerMap.java:211)
    	at com.google.apphosting.runtime.jetty.AppVersionHandlerMap.getHandler(AppVersionHandlerMap.java:171)
    	at com.google.apphosting.runtime.jetty.JettyServletEngineAdapter.serviceRequest(JettyServletEngineAdapter.java:123)
    	at com.google.apphosting.runtime.JavaRuntime$RequestRunnable.run(JavaRuntime.java:422)
    	at com.google.tracing.TraceContext$TraceContextRunnable.runInContext(TraceContext.java:449)
    	at com.google.tracing.TraceContext$TraceContextRunnable$1.run(TraceContext.java:455)
    	at com.google.tracing.TraceContext.runInContext(TraceContext.java:695)
    	at com.google.tracing.TraceContext$AbstractTraceContextCallback.runInInheritedContextNoUnref(TraceContext.java:333)
    	at com.google.tracing.TraceContext$AbstractTraceContextCallback.runInInheritedContext(TraceContext.java:325)
    	at com.google.tracing.TraceContext$TraceContextRunnable.run(TraceContext.java:453)
    	at com.google.apphosting.runtime.ThreadGroupPool$PoolEntry.run(ThreadGroupPool.java:251)
    	at java.lang.Thread.run(Thread.java:679)
    
  4. I 2012-03-12 11:31:22.669
    This request caused a new process to be started for your application, and thus caused your application code to be loaded for the first time. This request may thus take longer and use more CPU than a typical request for your application.
  5. W 2012-03-12 11:31:22.669
    A problem was encountered with the process that handled this request, causing it to exit. This is likely to cause a new process to be used for the next request to your application. If you see this message frequently, you may be throwing exceptions during the initialization of your application. (Error code 104)

Riley

unread,
Mar 12, 2012, 12:44:59 PM3/12/12
to google-a...@googlegroups.com
Our appid is activegrade, we use the m/s datastore, and get from 0-10 QPS throughout the day.  Normally we have 1-4 instances running, but since this seems *mostly* related to startup, we dedicated 10 idle resident instances to run all the time.  This covers us a little, but still, when a user triggers a new instance, they get the 60+ second wait and then an error. Ugh!  Our costs are relatively minor - about $20 a day now that we are running these 10 ordinarily unnecessary instances - but this is a big cost for us, and embarrassing too.


On Tuesday, March 6, 2012 3:17:37 PM UTC-6, Adam Sherman wrote:
Am I the only one seeing short duration outages? They are being reflected at:

Riley

unread,
Mar 12, 2012, 12:46:30 PM3/12/12
to google-a...@googlegroups.com
Ikai, it sounds like support for HR apps is being prioritized.  Is that the case? Should we expect that to be the case in the future? Sorry if that's documented somewhere already~

Riley

Ikai Lan (Google)

unread,
Mar 12, 2012, 2:52:29 PM3/12/12
to google-a...@googlegroups.com
Hi Riley,

That's a legitimate question, and one that we haven't officially answered yet. It's certainly the direction that things have been moving simply due to the nature of production management. Given that the SLA applies to HRD and not master/slave applications, you are definitely going to get a better quality of service migrating to HRD. In fact, I strongly advise that you do so. 

One challenge that we have when dealing with issues is to decide whether we should do emergency maintenance that requires downtime. With any production system, it's not always guaranteed that maintenance will result in issues being completely resolved, which would be really bad for app developers. At what threshold do we determine that a downtime with no guarantee of addressing the issues is worthwhile? Global 0.1% error rate? 1%? The call is not always clear cut because those errors may not be evenly distributed, and the impact may be huge, or it may be small. With master/slave applications, we do what we can to address the short term symptoms as well as the underlying system issues without impacting serving, which is often an order of magnitude more difficult (It kind of reminds me of that scene in Indiana Jones where he takes an artifact, swapping it with a bag of sand as quickly as possible to try to avoid setting off traps. Pillaging of historic artifacts is way easier when it's not dangerous, not speaking from personal experience). When your application runs on High Replication, the call is easy: there's no downtime required in 99% of cases, so we perform the maintenance right away because if it doesn't address the issue, there's no serving downtime for users.
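
The point about errors not being evenly distributed can be illustrated with a small calculation. This is a sketch with entirely made-up numbers and hypothetical app names; it only shows how a global error rate that looks small can hide a much higher rate for one application.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration; the counts below are invented, not measured.
final class ErrorDistribution {
    /** Each value is {totalRequests, failedRequests} for one app. */
    static double globalErrorPercent(Map<String, long[]> apps) {
        long total = 0;
        long failed = 0;
        for (long[] counts : apps.values()) {
            total += counts[0];
            failed += counts[1];
        }
        return total == 0 ? 0.0 : 100.0 * failed / total;
    }

    /** Highest per-app error percentage among apps that received traffic. */
    static double worstAppErrorPercent(Map<String, long[]> apps) {
        double worst = 0.0;
        for (long[] counts : apps.values()) {
            if (counts[0] > 0) {
                worst = Math.max(worst, 100.0 * counts[1] / counts[0]);
            }
        }
        return worst;
    }

    public static void main(String[] args) {
        Map<String, long[]> apps = new LinkedHashMap<>();
        apps.put("app-a", new long[] {990_000, 0});    // healthy app
        apps.put("app-b", new long[] {10_000, 1_000}); // all failures land here
        System.out.println("global %: " + globalErrorPercent(apps));
        System.out.println("worst app %: " + worstAppErrorPercent(apps));
    }
}
```

With these invented counts the global rate is about 0.1% while the affected app sees 10% of its requests fail, which is why a single global threshold is a hard call.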

If you're not subscribed to downtime-notify, I recommend that you subscribe. Announcements like this are NOT moving to Stack Overflow, and never will:


We may be announcing a maintenance in the very near future that will impact the serving of master/slave applications.

--
Ikai Lan 
Developer Programs Engineer, Google App Engine




j

unread,
Mar 12, 2012, 3:51:19 PM3/12/12
to google-a...@googlegroups.com
Ikai,

I have not moved to HRD yet, but I am pretty sure I am the only user of my application. However, for the past couple of days, not only has it been slow, but I have kept running out of quota, even though I turned on billing. I switched billing off yesterday because it didn't help.

Can I get a refund? I ask because I am hitting the app fewer than 20 times per day and still run out of quota. Enabling billing didn't help. I am perplexed: with a single user, how is it possible to exhaust 0.05 million operations? See below for an example -

Datastore Read Operations | 100% | 0.05 of 0.05 Million Ops | 0.00 | $0.70 / Million Ops | $0.00

I don't want to publish my appid. If you would like to know it, I can text it to you at your Google Voice number, if that helps.

Thanks for any reply.
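
As a rough back-of-the-envelope check of the numbers in this post: if 0.05 million read operations were consumed by at most 20 requests in a day, each request would have to average at least 2,500 reads. The sketch below just does that division. It assumes all reads come from user requests, which may not hold if cron jobs, task queues, or other background work also touch the datastore; the class name is made up.

```java
// Hypothetical helper for the arithmetic only.
final class QuotaPerRequest {
    /** Average datastore read operations consumed per request. */
    static long readsPerRequest(long readOpsUsed, long requests) {
        if (requests <= 0) {
            throw new IllegalArgumentException("requests must be positive");
        }
        return readOpsUsed / requests;
    }

    public static void main(String[] args) {
        long quota = 50_000;      // 0.05 million ops, as quoted in the post
        long requestsPerDay = 20; // upper bound stated in the post
        System.out.println(readsPerRequest(quota, requestsPerDay)); // prints 2500
    }
}
```

A per-request figure that high would suggest looking for something other than interactive traffic consuming reads.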



Ikai Lan (Google)

unread,
Mar 12, 2012, 4:52:26 PM3/12/12
to google-a...@googlegroups.com
Regarding that maintenance period:


It's happening next Monday, March 19th at 4pm US/Pacific (19th March, 23:00 GMT).

--
Ikai Lan 
Developer Programs Engineer, Google App Engine



Riley Eynon-Lynch

unread,
Mar 12, 2012, 4:54:14 PM3/12/12
to google-a...@googlegroups.com
Thanks a lot.  FYI: That post says Wednesday the 19th instead of Monday the 19th.

Riley

Ikai Lan (Google)

unread,
Mar 12, 2012, 4:59:53 PM3/12/12
to google-a...@googlegroups.com
GAH, it's like no matter how many times I read these things over I always make at least one mistake.

And that's why code review is a Good Thing.

--
Ikai Lan 
Developer Programs Engineer, Google App Engine



Amit Sangani

unread,
Mar 12, 2012, 5:33:29 PM3/12/12
to google-a...@googlegroups.com
Hi Ikai,

We wouldn't mind moving to HRD from M/S, but isn't it 3X more expensive?

Also, what's the best way to minimize the impact on our users when the datastore is in read-only mode during downtimes? Consider that every action our users take involves writing to the datastore. Will using memcache help? Will memcache be available without interruption during datastore downtime?

thanks!
Amit

Brandon Wirtz

unread,
Mar 12, 2012, 6:01:52 PM3/12/12
to google-a...@googlegroups.com

> We wouldn't mind moving to HRD from M/S, but isn't it 3X more expensive?

 

No.

Chris Ramsdale

unread,
Mar 12, 2012, 6:55:00 PM3/12/12
to google-a...@googlegroups.com
Moving to HRD is the safest way to ensure that your users are not impacted during a downtime. Memcache and other mechanisms can be used, but will definitely not scale and aren't guaranteed to be resilient in the face of all downtime scenarios.

For the particular issues reported on this thread, we have a few root causes that we're looking into. In terms of a fix though, all of the affected apps are running on M/S and, as a result, our options are much more constrained -- we're not able to move the apps as freely as we can with HRD-based applications.

M/S worked really well when it was first rolled out, but given the increase in number of apps and datastore transactions we needed an even better solution -- thus HRD. While the pros and cons of HRD have been discussed and debated within this group, the simple fact is: if you want to minimize your exposure to downtimes you need to move over to HRD. There's an SLA of 99.95%, which we've consistently beaten month over month.
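
For a sense of scale, a 99.95% availability figure can be turned into a monthly downtime budget with simple arithmetic. This is only a sketch: it assumes a 30-day month and treats availability as plain wall-clock uptime, which may not be how the actual SLA defines downtime. The class name is made up.

```java
import java.util.Locale;

// Hypothetical helper for the arithmetic only.
final class SlaBudget {
    /** Minutes of downtime allowed in a period at the given availability percentage. */
    static double allowedDowntimeMinutes(double availabilityPercent, int daysInPeriod) {
        double totalMinutes = daysInPeriod * 24.0 * 60.0;
        return totalMinutes * (100.0 - availabilityPercent) / 100.0;
    }

    public static void main(String[] args) {
        double minutes = allowedDowntimeMinutes(99.95, 30);
        // 43,200 minutes in 30 days * 0.05% = about 21.6 minutes
        System.out.println(String.format(Locale.ROOT, "%.1f minutes", minutes));
    }
}
```

Under those assumptions, 99.95% leaves roughly 21.6 minutes of downtime per 30-day month.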

We're committed to resolving the current issue, but I strongly urge anyone running on M/S to make the move over to HRD. It's the quickest and most long-term fix that you can make.

-- Chris Ramsdale

Product Manager, Google App Engine



Ikai Lan (Google)

unread,
Mar 12, 2012, 7:08:54 PM3/12/12
to google-a...@googlegroups.com
HRD is not 3x more expensive. We lowered the cost to make it match the master/slave cost.

--
Ikai Lan 
Developer Programs Engineer, Google App Engine



On Mon, Mar 12, 2012 at 2:33 PM, Amit Sangani <amit.s...@gmail.com> wrote:


Ikai Lan (Google)

unread,
Mar 12, 2012, 8:54:50 PM3/12/12
to google-a...@googlegroups.com
Quick update: the time has been pushed back 2 hours to 6PM US/Pacific. See the latest message here:


--
Ikai Lan 
Developer Programs Engineer, Google App Engine



Phil Burk

unread,
Mar 12, 2012, 9:22:47 PM3/12/12
to google-a...@googlegroups.com
Hi Chris,


On Monday, March 12, 2012 3:55:00 PM UTC-7, Chris Ramsdale wrote:
For the particular issues reported on this thread, we have a few root causes that we're looking into. In terms of a fix though, all of the affected apps are running on M/S

Our app "prodicta" is using HRD, not M/S. We have been seeing a big degradation in performance the last few days. I just ran a test and my client code timed out because it took 47 seconds to load a 200KB static JAR file. It took 34 seconds to load a small PNG file. Customers are complaining.

I starred this issue:

Is this problem being considered a high priority? The status chart does not seem to reflect the problems being reported.

Brandon Wirtz

unread,
Mar 12, 2012, 11:14:56 PM3/12/12
to google-a...@googlegroups.com

You should update the message shown when you start a new app, which says it costs 3x as much.

Richard Watson

unread,
Mar 13, 2012, 4:36:38 AM3/13/12
to google-a...@googlegroups.com
In case you're keeping track of issues thinking it's generally cleared up:

I'm on HR and have noticed higher latencies (couple seconds instead of e.g. 300ms) lately and sometimes higher error rates (a few instead of 0-3). Yesterday over about 6 hours I got a ton of 60-second requests that threw 500's with accompanying messages [1], usually on memcache sets hitting a deadline exceeded.  Also, over the last couple weeks I've been running 3 instances permanently despite sometimes shutting them down manually.  Usually I get by on one just fine with bursts of 2 or 3.  I've noticed that one instance serves the majority of traffic with the other two serving maybe 50 requests over many hours, so the shutdown isn't aggressive enough.

"This request caused a new process to be started for your application..."
+
"A problem was encountered with the process that handled this request, causing it to exit" in the same request.

App-id: 2dumo-hr

Miroslav Genov

unread,
Mar 13, 2012, 4:59:18 AM3/13/12
to google-a...@googlegroups.com
I'm encountering the same issue with an HR app. The spike started about 1 hour ago.

AppID: cmsevobg
Datastore: HR
Screen Shot 2012-03-13 at 10.50.28 AM.png

Sébastien Tromp

unread,
Mar 13, 2012, 5:02:58 AM3/13/12
to google-a...@googlegroups.com
Hello,

Same thing here, since around an hour ago:
AppID: fiveorbsgame & fiveorbsgame-test


Miroslav Genov

unread,
Mar 13, 2012, 5:16:02 AM3/13/12
to google-a...@googlegroups.com
Oops, my mistake: our HR app ID is cmsevobg-hr. Initialization normally takes ~10 seconds; now it takes 38 seconds on an F4 instance.

Here are some exception traces that might help
  1. Failed startup of context com.google.apphosting.utils.jetty.RuntimeAppEngineWebAppContext@199f8e6{/,/base/data/home/apps/s~cmsevobg-hr/production.357324755677494831}
    java.lang.NullPointerException
    	at com.google.inject.servlet.GuiceServletContextListener.contextInitialized(GuiceServletContextListener.java:46)
    	at xxx.xxx.xxx.xxx.AppBootstrap.contextInitialized(AdmBootstrap.java:189)
    	at org.mortbay.jetty.handler.ContextHandler.startContext(ContextHandler.java:548)
    	at org.mortbay.jetty.servlet.Context.startContext(Context.java:136)
    	at org.mortbay.jetty.webapp.WebAppContext.startContext(WebAppContext.java:1250)
    	at org.mortbay.jetty.handler.ContextHandler.doStart(ContextHandler.java:517)
    	at org.mortbay.jetty.webapp.WebAppContext.doStart(WebAppContext.java:467)
    	at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:50)
    	at com.google.apphosting.runtime.jetty.AppVersionHandlerMap.createHandler(AppVersionHandlerMap.java:202)
    	at com.google.apphosting.runtime.jetty.AppVersionHandlerMap.getHandler(AppVersionHandlerMap.java:171)
    	at com.google.apphosting.runtime.jetty.JettyServletEngineAdapter.serviceRequest(JettyServletEngineAdapter.java:123)
    	at com.google.apphosting.runtime.JavaRuntime$RequestRunnable.run(JavaRuntime.java:422)
    	at com.google.tracing.TraceContext$TraceContextRunnable.runInContext(TraceContext.java:449)
    	at com.google.tracing.TraceContext$TraceContextRunnable$1.run(TraceContext.java:455)
    	at com.google.tracing.TraceContext.runInContext(TraceContext.java:695)
    	at com.google.tracing.TraceContext$AbstractTraceContextCallback.runInInheritedContextNoUnref(TraceContext.java:333)
    	at com.google.tracing.TraceContext$AbstractTraceContextCallback.runInInheritedContext(TraceContext.java:325)
    	at com.google.tracing.TraceContext$TraceContextRunnable.run(TraceContext.java:453)
    	at com.google.apphosting.runtime.ThreadGroupPool$PoolEntry.run(ThreadGroupPool.java:251)
    	at java.lang.Thread.run(Thread.java:679)
    
  2. C 2012-03-13 11:02:31.883
    Uncaught exception from servlet
    javax.servlet.UnavailableException: Initialization failed.
    	at com.google.apphosting.runtime.jetty.AppVersionHandlerMap.createHandler(AppVersionHandlerMap.java:211)
    	at com.google.apphosting.runtime.jetty.AppVersionHandlerMap.getHandler(AppVersionHandlerMap.java:171)
    	at com.google.apphosting.runtime.jetty.JettyServletEngineAdapter.serviceRequest(JettyServletEngineAdapter.java:123)
    	at com.google.apphosting.runtime.JavaRuntime$RequestRunnable.run(JavaRuntime.java:422)
    	at com.google.tracing.TraceContext$TraceContextRunnable.runInContext(TraceContext.java:449)
    	at com.google.tracing.TraceContext$TraceContextRunnable$1.run(TraceContext.java:455)
    	at com.google.tracing.TraceContext.runInContext(TraceContext.java:695)
    	at com.google.tracing.TraceContext$AbstractTraceContextCallback.runInInheritedContextNoUnref(TraceContext.java:333)
    	at com.google.tracing.TraceContext$AbstractTraceContextCallback.runInInheritedContext(TraceContext.java:325)
    	at com.google.tracing.TraceContext$TraceContextRunnable.run(TraceContext.java:453)
    	at com.google.apphosting.runtime.ThreadGroupPool$PoolEntry.run(ThreadGroupPool.java:251)
    	at java.lang.Thread.run(Thread.java:679)

Pieter Coucke

unread,
Mar 13, 2012, 6:43:48 AM3/13/12
to google-a...@googlegroups.com
Same here, very slow and frequent instance startups.  I use the HR datastore.

I've set idle instances to 2 F4s but I have the impression that the scheduler prefers to start a new instance instead of using one of the available idle ones.


-- 
Pieter Coucke
Onthoo BVBA
zamtam.com | cashcontact.com

Mos

unread,
Mar 13, 2012, 6:44:46 AM3/13/12
to google-a...@googlegroups.com
Same thing the last minutes on our app (HRD, Java, Low-Traffic, one instance, no new deployment, simple page just hitting MemCache):

"Request was aborted after waiting too long to attempt to service your request."   -->  The user sees a 500 error

GAE team, what has been going on these last few days?  In my opinion Google App Engine is unreliable and looks more like an alpha or beta cloud environment....

Google, please share your analysis with us.

Cheers
Mos
 

 

2012/3/13 Sébastien Tromp <sebasti...@gmail.com>

Richard Watson

unread,
Mar 13, 2012, 6:51:03 AM3/13/12
to google-a...@googlegroups.com
My graph showing ms/sec (attached) over last 24h.  Average spikes were up to 32 seconds, but I think all the errors were 60±1 sec.

On Tuesday, March 13, 2012 10:36:38 AM UTC+2, Richard Watson wrote:
MS-per-request.png

charisl

unread,
Mar 13, 2012, 7:44:27 AM3/13/12
to google-a...@googlegroups.com
Hello Ikai

our app is betscoreslive. In our settings master/slave replication is activated, but we are NOT using the datastore at all, and we have been experiencing the DeadlineExceededException errors and increased instance counts mentioned by the other people in this discussion. Our app only uses memcache.

Normal operation
during traffic peaks: instances ~25, ~45 requests/second and QPS > 1

Now we observe: 12 instances doing nothing, with ~2 requests/second and QPS 0.2
We have had the deadline exceptions for a week now, with short periods of normal operation. During the problematic periods, our app is pretty unresponsive.

Will the announced maintenance for master/slave apps solve the issue for our app as well?

thanks in advance

Charis

On Friday, March 9, 2012 9:15:35 PM UTC+2, Ikai Lan wrote:
Hey everyone,

Here are a few things that will help:

1. Application IDs (<--- if you have nothing else, at least provide this)
2. What is your QPS?
3. What % of your requests are errors?
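
For anyone unsure how to produce the two numbers requested above, QPS and error percentage are just ratios over a chosen time window. The sketch below uses made-up counts (they would normally come from the dashboard or request logs) and a hypothetical class name.

```java
// Hypothetical helper; the counts in main() are invented for illustration.
final class TrafficStats {
    /** Average queries per second over a window. */
    static double qps(long requests, long windowSeconds) {
        if (windowSeconds <= 0) {
            throw new IllegalArgumentException("windowSeconds must be positive");
        }
        return (double) requests / windowSeconds;
    }

    /** Percentage of requests that failed; 0 when there was no traffic. */
    static double errorPercent(long failedRequests, long totalRequests) {
        return totalRequests == 0 ? 0.0 : 100.0 * failedRequests / totalRequests;
    }

    public static void main(String[] args) {
        long total = 7_200;  // requests seen in one hour (made up)
        long failed = 108;   // of which this many returned errors (made up)
        System.out.println("QPS: " + qps(total, 3_600));               // prints "QPS: 2.0"
        System.out.println("% failed: " + errorPercent(failed, total)); // prints "% failed: 1.5"
    }
}
```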

--
Ikai Lan 
Developer Programs Engineer, Google App Engine



On Fri, Mar 9, 2012 at 7:24 AM, Ronoaldo José de Lana Pereira <rper...@beneficiofacil.com.br> wrote:
+1 for seeing the same problems on my app.

It started to be worse after maintenance on March 7.

On Friday, March 9, 2012 8:33:36 AM UTC-3, Nikolai wrote:
+1
we had to move to our backup systems. Everything is full of 500 errors or hardcore latency.
Most of the 500 errors we see aren't even logged, so this seems to be a Google problem one abstraction layer above the app.

And yes - sometimes we have got the same feeling, that we are the only ones that use appengine in a production setting. You are not alone ;)

regards,
nikolai


On Tuesday, March 6, 2012 10:17:37 PM UTC+1, Adam Sherman wrote:
Am I the only one seeing short duration outages? They are being reflected at:


But I don't see anyone else complaining anywhere, so it makes me worried.

A.


Chris Merrill

unread,
Mar 13, 2012, 9:05:18 AM3/13/12
to google-a...@googlegroups.com
Our app is running on HRD and performance has, on average, been fine over the past 2 weeks. But this morning, our app is unusable. Almost every request times out. Note that our app was last deployed 2 weeks ago - no changes since then. Our staging application has newer code but a LOT less data in it - it shows the same symptoms. Even our login page times out most of the time!

The App Engine status panel says everything is ok, but we're dead in the water  :(

Ronoaldo José de Lana Pereira

unread,
Mar 13, 2012, 9:37:35 AM3/13/12
to google-a...@googlegroups.com
This is very disturbing ... Our M/S app is getting higher error rates, and some instances take from 15s to 70s to start. We can't do anything about this, or even debug what is happening. If there were an issue with our code, the instances should always take 70s to start! I really can't think of anything in our code that would load the whole world and take all that time... Then the solution is: move to HRD and ... experience the same load time on app startup??? So, what can I do!!!

Sorry for the rude tone, but I'm very concerned. We started the process to get operational support and plan our migration to have the SLA, but if our app startup continues to take this long, I really can't imagine what to do...

Googlers, is the startup problem being solved for both M/S and HRD? Only HRD? Neither? Is there anything we can do to avoid this strange behavior of our instances? Instance startup seems to be vital to the application's health: if your app takes too much time to start up, then all concurrent requests to that instance die with 500's. This is very odd, and the "warmup" requests never seem to work. From our observations and other people's, even if you set a fixed number of min_idle instances to always be there, they don't serve the traffic and you still get errors!

Hope to see some answers. We really liked GAE in our first year working with the platform, and now I feel completely lost...

Best Regards,

-Ronoaldo

Johan Euphrosine

unread,
Mar 13, 2012, 9:52:15 AM3/13/12
to google-a...@googlegroups.com
I believe this was related to:

This should now be fixed.

On Tue, Mar 13, 2012 at 9:59 AM, Miroslav Genov <mge...@gmail.com> wrote:




--
Johan Euphrosine (proppy)
Developer Programs Engineer
Google Developer Relations

Johan Euphrosine

Mar 13, 2012, 9:53:36 AM
to google-a...@googlegroups.com
Hi, 

I believe this was also related to:

And should now be fixed.

Johan Euphrosine

Mar 13, 2012, 10:01:32 AM
to google-a...@googlegroups.com
As Chris pointed out earlier in that thread, M/S apps are more vulnerable to this kind of transient infrastructure issue because moving them around requires a maintenance period.

HRD applications are covered by the SLA, replicated across multiple datacenters, and better distributed. If we notice an issue impacting one or more of them, we can easily take action without impacting other applications.

I strongly suggest you try out the self-migration tool in your administration console. Depending on the size of your data and your write QPS, the read-only period needed to migrate your application could be very small:


Johan Euphrosine

Mar 13, 2012, 10:04:44 AM
to google-a...@googlegroups.com
If you are not using the datastore it should be trivial to move your application to the new HRD infrastructure.

If you don't need the same appid, just create a new HRD application and deploy your code on it.

If you need the same appid, use the self migration tool:

Feel free to open a production ticket if you need any assistance migrating your application:



Jason

Mar 13, 2012, 10:10:08 AM
to Google App Engine
I'm using HRD and am still getting tons of 500s since early this
morning.

Johan Euphrosine

Mar 13, 2012, 10:10:39 AM
to google-a...@googlegroups.com
What is your application id?

Feel free to open a production issue, if you want to investigate this offthread:

Johan Euphrosine

Mar 13, 2012, 10:11:22 AM
to google-a...@googlegroups.com
What is your application id? Did you already file a production issue?

Mos

Mar 13, 2012, 10:22:13 AM
to google-a...@googlegroups.com
> What is your application id?

krisen-talk    (www.krisentalk.de)


> Feel free to open a production issue

There is already an issue from someone else (judging by this thread, a lot of people are affected):

http://code.google.com/p/googleappengine/issues/detail?id=7133

Johan, what has been going on with GAE these last few days?  It doesn't feel like a PaaS in production mode.
Perhaps Google should reintroduce the Beta status. ;)

Riley Eynon-Lynch

Mar 13, 2012, 10:23:47 AM
to google-a...@googlegroups.com
We migrated our app to the HRD last night. With 4 GB of data quota in around 2M entities it took 20-30 minutes.  On the new app, we are seeing response times at about 1% of the MS app - 100 times faster.  Our app was read only for less than three seconds - enough to affect 10 requests from 2 users.

We decided to do this without very thorough testing because the app was broken for all of our users on M/S, and will only break for a small percentage of users on HRD.  When it does break, it will be pretty harmless, and we expect to fix the inconsistency problems by the end of today.  We just could not wait until Monday for a possible fix for the M/S app.

If you've got a similar setup or similar needs, I recommend a hastier-than-usual switch. Just watch out for the email limits~

Johan Euphrosine

Mar 13, 2012, 10:29:26 AM
to google-a...@googlegroups.com
Thanks mos,

I asked for more detail on the tickets, let's follow up there.

App Engine is indeed out of beta and now covered by an SLA for paid applications.

If you feel that the SLA for your application has been violated, please consider using the form linked there to request credit:

tempy

Mar 13, 2012, 10:44:10 AM
to Google App Engine
To chime in: My HR app has also been very unstable for what seems to be the
last few weeks. Sometimes it performs fine, and sometimes it's spinning
up unnecessary instances and not using them, throwing DeadlineExceeded
errors, killing requests after a long wait with "Request was aborted
after waiting too long to attempt to service your request." and on and
on - all under a light load.

It's especially galling when GAE errors result in the user seeing a
Google-branded error page, see here:
http://code.google.com/p/googleappengine/issues/detail?id=6965

All the suggestions that this is somehow isolated to the M/S datastore
seem particularly disingenuous to my ears right now. For many months I
was totally happy with GAE and recommended it to everyone - then I have a
few weeks like this and I start to lay serious plans for extricating
myself from this platform. I have no idea if my SLA has been violated
as per all the legal definitions, but it sure feels like it.

On Mar 13, 3:29 pm, Johan Euphrosine <pro...@google.com> wrote:
> Thanks mos,
>
> I asked for more detail on the tickets, let's followup there.
>
> App Engine is indeed out of beta and now covered by an SLA for paid
> application.
>
> If you feel that the SLA for your application has been violated, please
> consider the form linked there to request credit:http://code.google.com/appengine/sla.html
>
> On Tue, Mar 13, 2012 at 3:22 PM, Mos <mosa...@googlemail.com> wrote:
> > > What is your application id?
>
> > krisen-talk    (www.krisentalk.de)
>
> > > Feel free to open a production issue
>
> > There is already an issue from someone else (following this thread a lot
> > of people are affected):
>
> >http://code.google.com/p/googleappengine/issues/detail?id=7133
>
> > Johan, what's going on with GAE the last days?  It doesn't feel like a
> > PaaS in production mode.
> > Perhaps Google should reintroduce the Beta status. ;)
>
> > On Tue, Mar 13, 2012 at 3:10 PM, Johan Euphrosine <pro...@google.com>wrote:
>
> >> What is your application id?
>
> >> Feel free to open a production issue, if you want to investigate this
> >> offthread:
>
> >>http://code.google.com/p/googleappengine/issues/entry?template=Produc...
>
> >> On Tue, Mar 13, 2012 at 11:44 AM, Mos <mosa...@googlemail.com> wrote:
>
> >>> Same thing the last minutes on our app (HRD, Java, Low-Traffic, one
> >>> instance, no new deployment, simple page just hitting MemCache):
>
> >>> "Request was aborted after waiting too long to attempt to service your
> >>> request."   -->  User sees 500er
>
> >>> GAE-Team, what is going on the last days?  In my opinion the Google App
> >>> Engine is unreliable and looks more like a alpha- or beta-
> >>> cloudenvrionment....
>
> >>> Please Google share you analysis with us.
>
> >>> Cheers
> >>> Mos
>
> >>> 2012/3/13 Sébastien Tromp <sebastien.tr...@gmail.com>

Matt Cameron

Mar 13, 2012, 11:07:21 AM
to google-a...@googlegroups.com
We are seeing lots of 500s today.  Nothing shown on the appengine status board.

Mark

Mar 13, 2012, 11:35:21 AM
to google-a...@googlegroups.com
I have had outages throughout this morning.  

  My app is bedbuzzserver.appspot.com, Java app on HR.

  From 2am until 7.48am.  I have filed Production issue 7138.

  My instances keep getting reset (and then taking too long to start up), & my 'Current Load' logs get reset to 0 (e.g. bedbuzzserver.appspot.com/logon goes from 100 calls today, to 0)

Nicanor Babula

Mar 13, 2012, 11:39:28 AM
to google-a...@googlegroups.com
Hi, 

We are having the same issues.
appid: domodentweb2
QPS: 0.028
%error: I would say 60%, but it's not accurate
Datastore: HRD.

We are desperate because there is no backup alternative, so if appengine is down, we are down..

Thanks.

Sekhar

Mar 13, 2012, 11:48:06 AM
to google-a...@googlegroups.com
Same here...a flurry of errors. The last one:


  1. E
    2012-03-13 08:42:46.475
    org.datanucleus.transaction.Transaction commit: Operation commit failed on resource: org.datanucleus.store.appengine.DatastoreXAResource@1389244, error code UNKNOWN and transaction: [DataNucleus Transaction, ID=Xid=

Nicanor Babula

Mar 13, 2012, 12:35:28 PM
to google-a...@googlegroups.com
We are having the same problems.

appid: domodentweb2
QPS: 0.28
%error: can't say... It's fluctuating. In 15-minute periods we can have 0%, or 80+%
datastore: HRD

Please help. 
Thanks. 

On Tuesday, March 6, 2012 at 22:17:37 UTC+1, Adam Sherman wrote:

Rick Mangi

Mar 13, 2012, 3:05:56 PM
to google-a...@googlegroups.com
Same here. I use pingdom to monitor my site and it's been down on and off for the past 24 hours to the tune of around 30 minutes.

I opened an enterprise support ticket but haven't heard anything back.
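
For scale, the figure above (around 30 minutes of downtime over 24 hours) can be turned into an availability percentage. This is only a minimal sketch: the function name is made up for illustration, and the 30-day projection assumes there is no other downtime in the rest of the month, which nobody in this thread can promise.

```python
def availability_percent(downtime_minutes, window_minutes):
    """Return the percentage of the window during which the site was up."""
    if window_minutes <= 0:
        raise ValueError("window_minutes must be positive")
    return 100.0 * (1.0 - float(downtime_minutes) / window_minutes)

# Roughly 30 minutes down in the past 24 hours, as reported above.
day = availability_percent(30, 24 * 60)
# The same 30 minutes spread over a 30-day month (assumes no other downtime).
month = availability_percent(30, 30 * 24 * 60)
print("24 h: %.2f%%, 30 days: %.2f%%" % (day, month))
```

Whether either number counts as a violation depends on how the SLA defines downtime, so treat this as a way to quantify the monitoring data, not as a verdict.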

Mauricio Aristizabal

Mar 13, 2012, 3:29:40 PM
to google-a...@googlegroups.com
Just about every request results in 1 or more resources failing with 500, and many that do finish take seconds.  Here's a sample error (I used to get 202 or 203s, now just 'aborted after waiting too long...'):
  • 2012-03-13 11:52:37.682 /resources/styles/standard.css 500 10636ms 0kb Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.1; Trident/4.0; SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729; Media Center PC 6.0; MDDR; .NET4.0C)
    67.49.52.74 - - [13/Mar/2012:11:52:37 -0700] "GET /resources/styles/standard.css HTTP/1.1" 500 0 "http://www.commentous.com/f" "Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.1; Trident/4.0; SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729; Media Center PC 6.0; MDDR; .NET4.0C)" "www.commentous.com" ms=10636 cpu_ms=0 api_cpu_ms=0 cpm_usd=0.000082 pending_ms=10610 2012-03-13 11:52:37.682 
  • Request was aborted after waiting too long to attempt to service your request.

    I don't know about the legal definition in the SLA, but I consider my app to be unusable and effectively down for the last six days or so.

    -Mauricio
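
The trailing key=value fields in the log line above already show where the time went. Here is a minimal sketch that pulls them apart. The helper name is made up for illustration, and it assumes `pending_ms` is the time the request spent queued before an instance picked it up, which is my reading of the field and not something confirmed in this thread.

```python
import re

# Trailing key=value fields copied from the request log line above.
LOG_FIELDS = "ms=10636 cpu_ms=0 api_cpu_ms=0 cpm_usd=0.000082 pending_ms=10610"

def parse_fields(text):
    """Parse space-separated key=value pairs into a dict of floats."""
    return dict((k, float(v)) for k, v in re.findall(r"(\w+)=([0-9.]+)", text))

fields = parse_fields(LOG_FIELDS)
# Share of the total request time that was spent pending (assumed: queued).
pending_share = fields["pending_ms"] / fields["ms"]
print("pending: %d of %d ms (%.1f%%)"
      % (fields["pending_ms"], fields["ms"], 100 * pending_share))
```

Under that assumption, nearly the whole 10.6 seconds went to waiting and essentially none to application code (`cpu_ms=0`), which fits the "aborted after waiting too long" message.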





    On Tuesday, March 6, 2012 1:17:37 PM UTC-8, Adam Sherman wrote:

    Brandon Wirtz

    Mar 13, 2012, 3:31:48 PM
    to google-a...@googlegroups.com

    We have noticed that many of the downtimes Pingdom reports are the result of AppsForDomains.  If you hit your app from another app, or via AOL, or another provider that has a peering arrangement with AppEngine it will be up.

    I’m calling this AppsForDomains issues because typically during these outages we get error pages in AppsforDomains admin pages.

    In these instances Green Checks will show in the status for Appengine. But your app will fail to resolve.

     

     

     


    Riley

    Mar 13, 2012, 3:40:34 PM
    to google-a...@googlegroups.com
    I just switched to HRD this morning, and while the app is at least accessible now, I still get "Request was aborted after waiting too long to attempt to service your request." every 60 seconds or so.  This is not even thrown as an exception or error - it's logged as Info.

    This is really bad for us.  It's not that one out of every 100 users is getting a 500 page - it's that 1 out of every 100 things that EVERY USER tries to do doesn't work.  For us, that could mean that, when entering scores for a test, some of the students' scores are not recorded.  Gah!  Not to mention the extra cost.
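
To put numbers on the point above: if each action fails independently with probability 1 in 100 (an assumption; real failures during an incident tend to come in bursts), the chance that a user hits at least one failure grows quickly with the number of actions. A minimal sketch, with a hypothetical function name:

```python
def p_at_least_one_failure(per_request_failure_rate, num_requests):
    """Probability that at least one of num_requests independent requests fails."""
    return 1.0 - (1.0 - per_request_failure_rate) ** num_requests

# Print the chance of hitting at least one failure for a few session sizes.
for n in (1, 10, 30, 100):
    print("%3d actions: %.1f%% chance of at least one failure"
          % (n, 100 * p_at_least_one_failure(0.01, n)))
```

Under that independence assumption, entering 30 scores already carries about a 26% chance that at least one of them is lost.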

    stephenp

    Mar 13, 2012, 8:58:23 PM
    to google-a...@googlegroups.com
    One more here.

    appid: carglyplatform (HRD)

    It's been flaky off-and-on for a couple weeks, yesterday was better, today bad again. Lots of warmup errors, instance restarts, errors in general.

    Stephen


    On Tuesday, March 6, 2012 3:17:37 PM UTC-6, Adam Sherman wrote:
    Am I the only one seeing short duration outages? They are being reflected at:


    But I don't see anyone else complaining anywhere, so it makes me worried.

    A.




    Mauricio Aristizabal

    Mar 13, 2012, 10:14:25 PM
    to google-a...@googlegroups.com
    Potential fix: set performance sliders to auto.

    This is purely anecdotal but it might mean something:  After reading some post this afternoon about the instance settings not really working I switched to AUTO idle instances and AUTO pending latency (before they were set to 1-1 and 25ms-1.5s respectively).

    That was about 5 hours ago, and within an hour or so everything started working fine.  Before that, problems had been continuous as far as I could tell for 6 days or so.

    Or maybe the AppEngine guys finally got it under control.




    Mos

    Mar 14, 2012, 4:34:32 AM
    to google-a...@googlegroups.com
    > Potential fix: set performance sliders to auto.

    No, that doesn't help in general. I've been running with auto from the start and I'm still having the problem.

    It's still not fixed; it would be nice to get some feedback from Google?!

    Nicanor Babula

    Mar 14, 2012, 4:35:53 AM
    to google-a...@googlegroups.com
    Now my app is working fine. What happened? At some point I saw my app's graphs reset and show some huge values for instance hours. Afterwards, the instance hours counters returned to normal and my app stopped raising errors. If that was the GAE team working on it, thank you very much, but I think you owe us at least an explanation. What should I tell my customers? Has the issue been fixed, or should they expect outages in the coming days? My app is on HRD.

    Thanks,
    Cristian


    On Tuesday, March 6, 2012 at 22:17:37 UTC+1, Adam Sherman wrote:

    Ikai Lan (Google)

    Mar 14, 2012, 2:53:38 PM
    to google-a...@googlegroups.com
    We've been working on addressing the issues with HRD apps.  Your experience is probably a coincidence.

    There is a very small section of apps that will have a few requests (very small %) that will be a LOT slower.

    We've scheduled a maintenance period for March 19th that will attempt to address issues with master/slave. You can read more about it here (scroll to the bottom for the correct time):

    https://groups.google.com/forum/?fromgroups#!topic/google-appengine-downtime-notify/CO_x02OF9Ak

    In general, everyone should try to migrate their applications to High Replication when they can, because we have a much higher speed of iteration when implementing fixes for production issues (I have an earlier post about why this matters - we can make almost any production change we want to HR apps without causing app downtime). At this point, I would not even create development or staging applications using master/slave, because some behaviors (eventual consistency on global queries without an entity group root) differ between master/slave and high replication.

    --
    Ikai Lan 
    Developer Programs Engineer, Google App Engine




    Felippe Bueno

    Mar 14, 2012, 3:03:02 PM
    to google-a...@googlegroups.com
    Hello all. Only to share my information:

    My app is doing something like this:

    JS="xpto"
    JS_OPTOUT="xpto1"

    if request.cookie.has_key("optout") or request.headers.has_key("DNT"):
        response.write(JS_OPTOUT)
    else:
        response.write(JS)


    The normal latency is ~3 ms.
    Now I'm seeing 13.9 ms.

    Earlier this morning I saw ~1000 ms/request.
    24 hours ago I saw ~2400 ms/request.

    Requests/Second (24 hrs)

    I noticed this behavior on all my apps for at least the last 8 days.


    I'm using python 2.5/HRD.



    Rui Oliveira

    Apr 12, 2012, 5:00:21 AM
    to google-a...@googlegroups.com

    Hi,

    My appId is air-menu1. HRD.

    I always get this kind of error after deploying. Usually the site only comes alive 10 - 30 minutes after a deploy.
    Yesterday, for the first time, the site stopped completely for 6 hours, 24 hours after a deploy.
    6 continuous hours without answering a single request. This started without any modification to the program or the database.
    Am I alone? Is no one else having this kind of problem?

    Once these errors start, every time I refresh the browser the server starts a new instance... So it's easy to end up with a lot of instances within a few minutes.

    I'm waiting for this issue to be solved before starting business. My company has been working on this site for the last 10 months...

    Please Help Me. This is very very serious...

    Sincerely

    Rui

    1. 2012-04-12 01:06:30.764 /com.phonemenu.conf.Configure/myRemoteService 500 70285ms 0kb Mozilla/5.0 (iPad; CPU OS 5_1 like Mac OS X) AppleWebKit/534.46 (KHTML, like Gecko) Version/5.1 Mobile/9B176 Safari/7534.48.3
      188.83.217.114 - - [11/Apr/2012:17:06:30 -0700] "POST /com.phonemenu.conf.Configure/myRemoteService HTTP/1.1" 500 0 "http://www.airmenu.com/Configure.html" "Mozilla/5.0 (iPad; CPU OS 5_1 like Mac OS X) AppleWebKit/534.46 (KHTML, like Gecko) Version/5.1 Mobile/9B176 Safari/7534.48.3" "www.airmenu.com" ms=70286 cpu_ms=24714 api_cpu_ms=0 cpm_usd=0.686616 loading_request=1 pending_ms=8036 exit_code=104 instance=00c61b117c6af5c03528fba06c291f298706e3
    2. C 2012-04-12 01:06:30.708
      Uncaught exception from servlet
      com.google.apphosting.runtime.HardDeadlineExceededError: This request (bc91fe1c174b394d) started at 2012/04/12 00:05:29.992 UTC and was still executing at 2012/04/12 00:06:30.679 UTC.
      	at java.security.AccessController.getStackAccessControlContext(Native Method)
      	at java.security.AccessController.checkPermission(AccessController.java:540)
      	at java.lang.SecurityManager.checkPermission(SecurityManager.java:549)
      	at com.google.apphosting.runtime.security.CustomSecurityManager.checkPermission(CustomSecurityManager.java:56)
      	at java.lang.SecurityManager.checkRead(SecurityManager.java:888)
      	at java.io.File.lastModified(File.java:909)
      	at java.util.zip.ZipFile.<init>(ZipFile.java:143)
      	at java.util.jar.JarFile.<init>(JarFile.java:150)
      	at java.util.jar.JarFile.<init>(JarFile.java:87)
      	at sun.misc.URLClassPath$JarLoader.getJarFile(URLClassPath.java:919)
      	at sun.misc.URLClassPath$JarLoader.access$900(URLClassPath.java:723)
      	at sun.misc.URLClassPath$JarLoader$1.run(URLClassPath.java:854)
      	at java.security.AccessController.doPrivileged(Native Method)
      	at sun.misc.URLClassPath$JarLoader.ensureOpenSynchronized(URLClassPath.java:846)
      	at sun.misc.URLClassPath$JarLoader.ensureOpen(URLClassPath.java:838)
      	at sun.misc.URLClassPath$JarLoader.<init>(URLClassPath.java:785)
      	at sun.misc.URLClassPath$JarLoader.<init>(URLClassPath.java:743)
      	at sun.misc.URLClassPath$3.run(URLClassPath.java:412)
      	at java.security.AccessController.doPrivileged(Native Method)
      	at sun.misc.URLClassPath.getLoader(URLClassPath.java:395)
      	at sun.misc.URLClassPath.getLoader(URLClassPath.java:371)
      	at sun.misc.URLClassPath.findResource(URLClassPath.java:201)
      	at java.net.URLClassLoader$2.run(URLClassLoader.java:379)
      	at java.security.AccessController.doPrivileged(Native Method)
      	at java.net.URLClassLoader.findResource(URLClassLoader.java:376)
      	at com.google.apphosting.runtime.security.UserClassLoader.findResource(UserClassLoader.java:723)
      	at java.lang.ClassLoader.getResource(ClassLoader.java:977)
      	at org.mortbay.resource.Resource.newSystemResource(Resource.java:203)
      	at org.mortbay.jetty.webapp.WebXmlConfiguration.configureDefaults(WebXmlConfiguration.java:159)
      	at org.mortbay.jetty.webapp.WebAppContext.startContext(WebAppContext.java:1230)
      	at org.mortbay.jetty.handler.ContextHandler.doStart(ContextHandler.java:517)
      	at org.mortbay.jetty.webapp.WebAppContext.doStart(WebAppContext.java:467)
      	at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:50)
      	at com.google.apphosting.runtime.jetty.AppVersionHandlerMap.createHandler(AppVersionHandlerMap.java:202)
      	at com.google.apphosting.runtime.jetty.AppVersionHandlerMap.getHandler(AppVersionHandlerMap.java:171)
      	at com.google.apphosting.runtime.jetty.JettyServletEngineAdapter.serviceRequest(JettyServletEngineAdapter.java:123)
      	at com.google.apphosting.runtime.JavaRuntime$RequestRunnable.run(JavaRuntime.java:446)
      	at com.google.tracing.TraceContext$TraceContextRunnable.runInContext(TraceContext.java:449)
      	at com.google.tracing.TraceContext$TraceContextRunnable$1.run(TraceContext.java:455)
      	at com.google.tracing.TraceContext.runInContext(TraceContext.java:695)
      	at com.google.tracing.TraceContext$AbstractTraceContextCallback.runInInheritedContextNoUnref(TraceContext.java:333)
      	at com.google.tracing.TraceContext$AbstractTraceContextCallback.runInInheritedContext(TraceContext.java:325)
      	at com.google.tracing.TraceContext$TraceContextRunnable.run(TraceContext.java:453)
      	at com.google.apphosting.runtime.ThreadGroupPool$PoolEntry.run(ThreadGroupPool.java:251)
      	at java.lang.Thread.run(Thread.java:679)
      
    3. I 2012-04-12 01:06:30.713
      This request caused a new process to be started for your application, and thus caused your application code to be loaded for the first time. This request may thus take longer and use more CPU than a typical request for your application.
    4. W 2012-04-12 01:06:30.713
      A problem was encountered with the process that handled this request, causing it to exit. This is likely to cause a new process to be used for the next request to your application. If you see this message frequently, you may be throwing exceptions during the initialization of your application. (Error code 104)
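
The two timestamps in the HardDeadlineExceededError message show how long the request had been running when it was killed. A minimal sketch that computes the gap; the helper name is made up for illustration, and the timestamps are copied from the log entry above.

```python
from datetime import datetime

# Timestamps copied from the HardDeadlineExceededError message above (UTC).
STARTED = "2012/04/12 00:05:29.992"
STILL_EXECUTING = "2012/04/12 00:06:30.679"
FORMAT = "%Y/%m/%d %H:%M:%S.%f"

def elapsed_seconds(start, end, fmt=FORMAT):
    """Return the number of seconds between two timestamp strings."""
    delta = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    return delta.total_seconds()

print("request ran for at least %.3f s"
      % elapsed_seconds(STARTED, STILL_EXECUTING))
```

The gap is about one minute, the same as in the DeadlineExceededException trace Jeff posted at the top of the thread. In both traces the time runs out while the runtime is still opening jar files and loading classes, before any application code is reached.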
    Screen shot 2012-04-12 at 09.48.47.png