APP DOWN: App suddenly no longer starts, no code changes


Jeff Schnitzer

Sep 12, 2012, 2:11:45 PM
to Google App Engine
Our app has been running fine on the same version, no code deploys
(our versions are timestamped so this is 100% certain), for two days.
All of a sudden (10 mins ago) our app stopped running. Every attempt
to run a request produces this cryptic message:

--------

2012-09-12 10:56:21.791
com.google.inject.servlet.GuiceFilter setPipeline: Multiple Servlet
injectors detected. This is a warning indicating that you have more
than one GuiceFilter running in your web application. If this is
deliberate, you may safely ignore this message. If this is NOT
deliberate however, your application may not work as expected.
D 2012-09-12 10:56:21.792
st.voo.tick.util.cambridge.CambridgeSetup <init>: Establishing
cambridge view resolver
I 2012-09-12 10:56:21.792
st.voo.tick.GuiceConfig contextInitialized: Guice initialization took 514 millis
W 2012-09-12 10:56:21.888
Failed startup of context
com.google.apphosting.utils.jetty.RuntimeAppEngineWebAppContext@1479784{/,/base/data/home/apps/s~voost0/2012-09-10-1715.361669184733923098}
java.lang.RuntimeException: java.lang.RuntimeException: Unable to
instantiate MessageBodyReader
at org.jboss.resteasy.plugins.providers.RegisterBuiltin.register(RegisterBuiltin.java:35)
at org.jboss.resteasy.spi.ResteasyDeployment.start(ResteasyDeployment.java:211)
at org.jboss.resteasy.plugins.server.servlet.ServletContainerDispatcher.init(ServletContainerDispatcher.java:67)
at org.jboss.resteasy.plugins.server.servlet.FilterDispatcher.init(FilterDispatcher.java:39)
at st.voo.tick.util.GuiceResteasyFilterDispatcher.init(GuiceResteasyFilterDispatcher.java:48)
at com.google.inject.servlet.FilterDefinition.init(FilterDefinition.java:114)
at com.google.inject.servlet.ManagedFilterPipeline.initPipeline(ManagedFilterPipeline.java:98)
at com.google.inject.servlet.GuiceFilter.init(GuiceFilter.java:172)
at org.mortbay.jetty.servlet.FilterHolder.doStart(FilterHolder.java:97)
at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:50)
at org.mortbay.jetty.servlet.ServletHandler.initialize(ServletHandler.java:662)
at org.mortbay.jetty.servlet.Context.startContext(Context.java:140)
at org.mortbay.jetty.webapp.WebAppContext.startContext(WebAppContext.java:1250)
at org.mortbay.jetty.handler.ContextHandler.doStart(ContextHandler.java:517)
at org.mortbay.jetty.webapp.WebAppContext.doStart(WebAppContext.java:467)
at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:50)
at com.google.apphosting.runtime.jetty.AppVersionHandlerMap.createHandler(AppVersionHandlerMap.java:219)
at com.google.apphosting.runtime.jetty.AppVersionHandlerMap.getHandler(AppVersionHandlerMap.java:194)
at com.google.apphosting.runtime.jetty.JettyServletEngineAdapter.serviceRequest(JettyServletEngineAdapter.java:134)
at com.google.apphosting.runtime.JavaRuntime$RequestRunnable.run(JavaRuntime.java:447)
at com.google.tracing.TraceContext$TraceContextRunnable.runInContext(TraceContext.java:452)
at com.google.tracing.TraceContext$TraceContextRunnable$1.run(TraceContext.java:459)
at com.google.tracing.TraceContext.runInContext(TraceContext.java:701)
at com.google.tracing.TraceContext$AbstractTraceContextCallback.runInInheritedContextNoUnref(TraceContext.java:336)
at com.google.tracing.TraceContext$AbstractTraceContextCallback.runInInheritedContext(TraceContext.java:328)
at com.google.tracing.TraceContext$TraceContextRunnable.run(TraceContext.java:456)
at com.google.apphosting.runtime.ThreadGroupPool$PoolEntry.run(ThreadGroupPool.java:251)
at java.lang.Thread.run(Thread.java:679)
Caused by: java.lang.RuntimeException: Unable to instantiate MessageBodyReader
at org.jboss.resteasy.spi.ResteasyProviderFactory.registerProvider(ResteasyProviderFactory.java:761)
at org.jboss.resteasy.plugins.providers.RegisterBuiltin.registerProviders(RegisterBuiltin.java:70)
at org.jboss.resteasy.plugins.providers.RegisterBuiltin.register(RegisterBuiltin.java:31)
... 27 more
Caused by: java.lang.SecurityException: Unable to get members for
class org.jboss.resteasy.plugins.providers.DataSourceProvider
at com.google.appengine.runtime.Request.process-b6ca2b194d66ed23(Request.java)
at java.lang.Class.getConstructors(Class.java:291)
at org.jboss.resteasy.util.PickConstructor.pickSingletonConstructor(PickConstructor.java:27)
at org.jboss.resteasy.spi.ResteasyProviderFactory.getProviderInstance(ResteasyProviderFactory.java:1032)
at org.jboss.resteasy.spi.ResteasyProviderFactory.addMessageBodyReader(ResteasyProviderFactory.java:478)
at org.jboss.resteasy.spi.ResteasyProviderFactory.registerProvider(ResteasyProviderFactory.java:757)
at org.jboss.resteasy.plugins.providers.RegisterBuiltin.registerProviders(RegisterBuiltin.java:70)
at org.jboss.resteasy.plugins.providers.RegisterBuiltin.register(RegisterBuiltin.java:31)
at org.jboss.resteasy.spi.ResteasyDeployment.start(ResteasyDeployment.java:211)
at org.jboss.resteasy.plugins.server.servlet.ServletContainerDispatcher.init(ServletContainerDispatcher.java:67)
at org.jboss.resteasy.plugins.server.servlet.FilterDispatcher.init(FilterDispatcher.java:39)
at st.voo.tick.util.GuiceResteasyFilterDispatcher.init(GuiceResteasyFilterDispatcher.java:48)
at com.google.inject.servlet.FilterDefinition.init(FilterDefinition.java:114)
at com.google.inject.servlet.ManagedFilterPipeline.initPipeline(ManagedFilterPipeline.java:98)
at com.google.inject.servlet.GuiceFilter.init(GuiceFilter.java:172)
at org.mortbay.jetty.servlet.FilterHolder.doStart(FilterHolder.java:97)
at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:50)
at org.mortbay.jetty.servlet.ServletHandler.initialize(ServletHandler.java:662)
at org.mortbay.jetty.servlet.Context.startContext(Context.java:140)
at org.mortbay.jetty.webapp.WebAppContext.startContext(WebAppContext.java:1250)
at org.mortbay.jetty.handler.ContextHandler.doStart(ContextHandler.java:517)
at org.mortbay.jetty.webapp.WebAppContext.doStart(WebAppContext.java:467)
at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:50)
at com.google.tracing.TraceContext$TraceContextRunnable.runInContext(TraceContext.java:452)
at com.google.tracing.TraceContext$TraceContextRunnable$1.run(TraceContext.java:459)
at com.google.tracing.TraceContext.runInContext(TraceContext.java:701)
at com.google.tracing.TraceContext$AbstractTraceContextCallback.runInInheritedContextNoUnref(TraceContext.java:336)
at com.google.tracing.TraceContext$AbstractTraceContextCallback.runInInheritedContext(TraceContext.java:328)
at com.google.tracing.TraceContext$TraceContextRunnable.run(TraceContext.java:456)
... 1 more
Caused by: java.lang.reflect.InvocationTargetException
at sun.reflect.GeneratedMethodAccessor11.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:616)
... 29 more
Caused by: java.lang.NoClassDefFoundError: java/io/FileOutputStream
at java.lang.Class.getDeclaredConstructors0(Native Method)
at java.lang.Class.privateGetDeclaredConstructors(Class.java:2406)
... 29 more
Caused by: java.lang.ClassNotFoundException: java.io.FileOutputStream
... 29 more
C 2012-09-12 10:56:21.889
Uncaught exception from servlet
javax.servlet.UnavailableException: Initialization failed.
at com.google.apphosting.runtime.jetty.AppVersionHandlerMap.createHandler(AppVersionHandlerMap.java:228)
at com.google.apphosting.runtime.jetty.AppVersionHandlerMap.getHandler(AppVersionHandlerMap.java:194)
at com.google.apphosting.runtime.jetty.JettyServletEngineAdapter.serviceRequest(JettyServletEngineAdapter.java:134)
at com.google.apphosting.runtime.JavaRuntime$RequestRunnable.run(JavaRuntime.java:447)
at com.google.tracing.TraceContext$TraceContextRunnable.runInContext(TraceContext.java:452)
at com.google.tracing.TraceContext$TraceContextRunnable$1.run(TraceContext.java:459)
at com.google.tracing.TraceContext.runInContext(TraceContext.java:701)
at com.google.tracing.TraceContext$AbstractTraceContextCallback.runInInheritedContextNoUnref(TraceContext.java:336)
at com.google.tracing.TraceContext$AbstractTraceContextCallback.runInInheritedContext(TraceContext.java:328)
at com.google.tracing.TraceContext$TraceContextRunnable.run(TraceContext.java:456)
at com.google.apphosting.runtime.ThreadGroupPool$PoolEntry.run(ThreadGroupPool.java:251)
at java.lang.Thread.run(Thread.java:679)

Jeff Schnitzer

Sep 12, 2012, 2:53:16 PM
to Google App Engine
HEEEEEEEEEEEEELP!

We have tried everything at this point. Shut down instances, tried to
deploy a new version, even tried old versions. We've reported a
production issue. Something is broken inside of GAE. The Guice error
must be a symptom; the smoking gun seems to be:

java.lang.ClassNotFoundException: java.io.FileOutputStream
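
For anyone reading a trace like this later: the innermost "Caused by" entry is usually the one worth searching for. Below is a minimal Java sketch, assuming you can catch the startup exception yourself, that walks the cause chain to its end. RootCauseFinder is a hypothetical helper, not part of GAE, Guice, or RESTEasy, and the exceptions built in main only mimic the shape of the chain logged above.

```java
import java.lang.reflect.InvocationTargetException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration; not part of GAE, Guice, or RESTEasy.
final class RootCauseFinder {

    // Returns the exceptions from the outermost one to the innermost cause.
    static List<Throwable> causeChain(Throwable t) {
        List<Throwable> chain = new ArrayList<Throwable>();
        // Stop at the end of the chain, or early if a cause repeats (cycle guard).
        while (t != null && !chain.contains(t)) {
            chain.add(t);
            t = t.getCause();
        }
        return chain;
    }

    // Returns the innermost cause, or null when given null.
    static Throwable rootCause(Throwable t) {
        List<Throwable> chain = causeChain(t);
        return chain.isEmpty() ? null : chain.get(chain.size() - 1);
    }

    public static void main(String[] args) {
        // Rebuild the shape of the logged chain (messages shortened).
        Throwable notFound = new ClassNotFoundException("java.io.FileOutputStream");
        Throwable noClassDef =
                new NoClassDefFoundError("java/io/FileOutputStream").initCause(notFound);
        Throwable security = new SecurityException(
                "Unable to get members for class", new InvocationTargetException(noClassDef));
        Throwable top = new RuntimeException("Unable to instantiate MessageBodyReader", security);

        for (Throwable link : causeChain(top)) {
            System.out.println(link.getClass().getName());
        }
        System.out.println("root: " + rootCause(top));
    }
}
```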

We've been down for 30 mins and getting complaints from our clients.
This looks really, really bad. It's my nightmare scenario - an outage
in GAE that is small enough not to raise major alarm bells, but
nevertheless cripples my business. It is not the first time this has
happened. It is shaking my faith in GAE.

Jeff

Jeff Schnitzer

Sep 12, 2012, 3:11:13 PM
to Google App Engine
More information:

* The failure began at 10:54am (pacific).
* Same app on different appid has the same problem.

(as listed in the stacktraces, the appid is voost0)

Jeff

Jeff Schnitzer

Sep 12, 2012, 4:00:24 PM
to Google App Engine
We are back up and running now after 2 hrs of downtime.

To whoever fixed it: THANK YOU!!!

To whoever broke it in the first place: SPANKINGS!!!

Jeff

Kaan Soral

Sep 12, 2012, 4:34:46 PM
to google-a...@googlegroups.com, je...@infohazard.org
I wonder what happened. I've subscribed to this topic for updates; I hope someone explains it, and I also hope this never happens to me (Python) :)

Christina Ilvento

Sep 12, 2012, 5:55:50 PM
to google-a...@googlegroups.com, je...@infohazard.org
Hi All,

Beginning yesterday, September 11, Google App Engine experienced two periods of serving degradation for a subset of Java applications due to a gradual roll-out of a new version of the Java runtime. Affected applications would have seen errors related to class loading. We have resolved the first issue by fixing the underlying bug. We are still investigating the cause of the second issue but have rolled back the problematic update and all affected applications should now be returned to normal serving behavior.

No changes to your code or application configuration are needed at this time. We apologize for any inconvenience this issue has caused, and we’ll follow up with more details on the underlying incident and resolution soon.


Regards,
Christina Ilvento, App Engine PM



--
You received this message because you are subscribed to the Google Groups "Google App Engine" group.
To view this discussion on the web visit https://groups.google.com/d/msg/google-appengine/-/EJUrxhiFMp4J.

To post to this group, send email to google-a...@googlegroups.com.
To unsubscribe from this group, send email to google-appengi...@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/google-appengine?hl=en.



Kaan Soral

Sep 12, 2012, 6:12:24 PM
to google-a...@googlegroups.com
This is why I love App Engine: when a problem occurs, instead of having a heart attack or committing suicide, you can just wait for it to be resolved. I remember downtimes that lasted nearly a day when I was on a custom server/PHP platform. App Engine really is a breeze; at worst, this happens :)

However, bugs related to new releases are very frequent.

Jon Stevens

Sep 12, 2012, 6:26:07 PM
to google-a...@googlegroups.com
I don't appreciate Google taking down my entire company for 2 hours because they are doing testing in production. We had one customer send out 500 emails today saying "hey, check out this site" and all they got was an error page. Really poor timing.

Last time Google took my entire company down, it was for days, because they decided to block CloudFlare and couldn't just roll back quickly.

At least when I was hosting things myself, I had myself to blame... now I just have to hold my breath and hope someone is paying attention and decides to respond at some point in the future. I love the promise of not having to carry a pager, but it is also fear inducing at the same time.

Of course I have the option of paying $500/month to have a phone number to call, but that seems kind of outrageous for an issue that looks like it shouldn't have made it to production in the first place.

Really, 4+ hours of silence isn't good business. A simple response of 'Hey, we are looking into this, we'll let you know more when we know more' would have been super helpful for my blood pressure. At least I could then respond back to my client saying that I know it is being worked on. Christina, is something like that really out of the question?

thanks,

jon



Jeff Schnitzer

Sep 12, 2012, 8:35:10 PM
to google-a...@googlegroups.com
On Wed, Sep 12, 2012 at 3:12 PM, Kaan Soral <kaan...@gmail.com> wrote:
> This is why I love App Engine, when a problem occurs instead of having a
> heart attack or committing suicide, you can just wait for it to be resolved.

Hmmm. This really unfortunately timed incident may have cost us an
important client, so I'm not feeling the love.

I have quite a lot of experience building and running large online
systems prior to embracing GAE and my products have never had as much
downtime as I've had over the last year. It hasn't always been
Google's fault (the entire .st registry going down for 8+ hours really
sucked[1]) but it usually has been. See:

* Instance startup time ballooning by 3X and hitting deadlines
(multiple occasions)
* GAE blocking CloudFlare with an undocumented security system
* This incident, where Java instances started mysteriously failing

Would waiting have fixed these issues? I'm not convinced. Google may
have smart people running GAE but they aren't watching _my_ app,
they're just watching for an uptick in the number of complaints. If
you're doing something slightly unusual (say, running a CF reverse
proxy), you might be statistical noise. Apparently this Java problem
_was_ widespread, but I had no way of knowing that.

GAE's value proposition is that it's better to have Google's smart
engineers building and maintaining your infrastructure. But my site
would be more reliable if I had one dumb person (possibly me) who
cares specifically about _my_ infrastructure. I've screwed up
deployments and upgrades in production before, but at least I'm aware
when changes happen, get immediate feedback, and can fix the problem
right then and there.

With GAE, the only thing I can do when my alarms go off is to whine as
loudly as possible. But there is no feedback! I have no way of
knowing if Google is working on the problem or if they're still
waiting for more complaints that will never materialize. Will I be
down for 15 minutes, 1 hour, 2 hours, 8 hours, forever? How long do
you want to wait?

This feels like a fundamental flaw in the PaaS concept, destined to
produce multiple-hour downtimes at irregular intervals. The feedback
loop is too slow (and lossy if the problem is not widespread).
There's no amount of QA or testing that will prevent failures in a
system as big and complicated as GAE. So the only reasonable option is
to get that feedback loop shorter. How can that happen? Some ideas:

* Google could announce when they are rolling out changes. I don't
need release notes (although it would be nice to know what to watch
for) but I'd like to know when I should pay extra attention. Or not
schedule client demos. Facebook does something like this, rolling out
platform changes on specific days of the week (which I long ago
stopped caring about).

* Google could make extra support channels available during this
time. Hell, use twitter. Think of us as your QA staff - if we see
something amiss, we'd like to let you know.

* Google could be more transparent about problems as they happen.
When you know there is an issue, let us know. Since I must assume
that any problem which Google hasn't acknowledged is a problem Google
doesn't know about, I can stop spamming @google.com addresses.

* Google could monitor our apps, and compare error rates before
rollout to error rates after rollout. Ideally you'd break this down
by component; figure out which apps use the search api, so when you
roll out changes to the search system, you're specifically watching
for an uptick in 500 errors from those apps. Something like that.
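
The last idea in the list, comparing per-app error rates before and after a rollout, could look roughly like the sketch below. Everything in it is hypothetical: RolloutErrorCheck, Window, and the threshold are names and values invented for illustration, and nothing here describes how Google actually monitors GAE.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch; this is not how GAE monitoring is known to work.
final class RolloutErrorCheck {

    // Request and HTTP 5xx counts for one app over one time window.
    static final class Window {
        final long requests;
        final long serverErrors;

        Window(long requests, long serverErrors) {
            this.requests = requests;
            this.serverErrors = serverErrors;
        }

        // Fraction of requests that failed; 0 when there was no traffic.
        double errorRate() {
            return requests == 0 ? 0.0 : (double) serverErrors / requests;
        }
    }

    // Returns the app ids whose error rate rose by more than maxIncrease
    // (an absolute difference, e.g. 0.05 for five percentage points).
    static List<String> regressedApps(Map<String, Window> before,
                                      Map<String, Window> after,
                                      double maxIncrease) {
        List<String> flagged = new ArrayList<String>();
        for (Map.Entry<String, Window> entry : after.entrySet()) {
            Window baseline = before.get(entry.getKey());
            // Apps with no pre-rollout data are compared against a zero baseline.
            double baselineRate = baseline == null ? 0.0 : baseline.errorRate();
            if (entry.getValue().errorRate() - baselineRate > maxIncrease) {
                flagged.add(entry.getKey());
            }
        }
        return flagged;
    }

    public static void main(String[] args) {
        Map<String, Window> before = new LinkedHashMap<String, Window>();
        Map<String, Window> after = new LinkedHashMap<String, Window>();
        before.put("app-a", new Window(1000, 2));
        after.put("app-a", new Window(1000, 900)); // almost every request fails
        before.put("app-b", new Window(1000, 5));
        after.put("app-b", new Window(1000, 6));   // ordinary noise
        System.out.println(regressedApps(before, after, 0.05));
    }
}
```

With the made-up numbers in main, only app-a is flagged. To break the check down by component, the two maps could be limited to the apps known to use the component being rolled out.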

Any other ideas? I really like GAE and I really like the PaaS
concept. But reliability is really a problem. It's probably going to
be an even bigger problem going on into the future as GAE (hopefully)
adds new features and gets a bigger footprint. More moving parts
means more failures.

Jeff

P.S. Paying $6k/yr for Premier Support is not the answer. Whether or
not that would solve my problem, that doesn't solve GAE's problem.

[1]: http://blorn.com/post/29851770158/beware-cutesy-two-letter-tlds-for-your-domain-name

Thomas Wiradikusuma

Sep 12, 2012, 11:33:30 PM
to google-a...@googlegroups.com, je...@infohazard.org
Hi Jeff, 

I feel sorry for your loss. I agree completely with your message and your recommendations to Google. Hi GAE team, at the very least, please let us know when you plan to roll out upgrades. They might not break anything, but we need to know so we can be prepared (at least we won't send a newsletter to thousands of people saying "Hey, check our website now" or schedule a pitch to investors).

Drake

Sep 13, 2012, 2:24:00 AM
to google-a...@googlegroups.com, je...@infohazard.org

Doing maintenance outside a scheduled window with no warning is dumb. We use Jenkins in my shop, and we do production testing across a subset of our install base before we push everyone to it. In short, we test the hell out of it before anything goes live, then we bash the hell out of it to see that it scales. There is no excuse for a code push that breaks things. If you need app code with unit tests from your top users, say so; many of us could provide code that you could use to test releases.

2 hours is a LOT of downtime. It will be a LONG time before that works back to 99.95% uptime.
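
To put a number on that, here is a back-of-the-envelope sketch. It assumes uptime is measured as a simple fraction of elapsed time and takes the 99.95% figure above as the target; UptimeMath is a hypothetical helper. Two hours of downtime divided by the 0.05% that is allowed gives about 4000 hours, roughly 167 days, as the shortest window over which the average gets back to 99.95%, assuming no further outages.

```java
// Hypothetical helper for illustration only.
final class UptimeMath {

    // Shortest measurement window (in hours, outage included) over which
    // downtimeHours of outage still averages out to targetUptime.
    static double hoursToRecover(double downtimeHours, double targetUptime) {
        if (targetUptime <= 0.0 || targetUptime >= 1.0) {
            throw new IllegalArgumentException("targetUptime must be between 0 and 1");
        }
        return downtimeHours / (1.0 - targetUptime);
    }

    // Downtime budget (in hours) that targetUptime leaves in a window.
    static double allowedDowntimeHours(double windowHours, double targetUptime) {
        return windowHours * (1.0 - targetUptime);
    }

    public static void main(String[] args) {
        double window = hoursToRecover(2.0, 0.9995);
        System.out.printf("window: %.0f hours (%.0f days)%n", window, window / 24.0);
        // Budget for a 30-day month, converted to minutes.
        System.out.printf("30-day budget: %.1f minutes%n",
                allowedDowntimeHours(30 * 24, 0.9995) * 60.0);
    }
}
```

By the same arithmetic, a 30-day month at 99.95% leaves a budget of about 21.6 minutes, so a two-hour outage uses up more than five months' worth.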

My understanding is premier support would not have fixed this for you Jeff.

I was doing pitches and demos all day today at TechCrunch Disrupt and at VCs. If I had been down today, I would have been off App Engine.

We are already on the fence, and have started porting code to make our stuff work on OpenStack in case our issues force us to move.

alex

Sep 13, 2012, 3:38:30 AM
to google-a...@googlegroups.com, je...@infohazard.org
Actually, talking about open stacks, I've been playing with AppFog recently and I've gotta say my test apps running on Java, Ruby and Python runtimes have impressive performance and throughput. 

I was pretty skeptical at the beginning (I did have a look at Cloudfoundry code base a year back and didn't like what I saw) but these days I was like "wow!". 

At the point where I'm standing today, I'm seriously considering moving the apps to AppFog. They don't offer a GA SLA, but statistically I haven't needed one so far, so who cares. Plus, the pricing model is the simplest I've ever seen.
I'm not advertising here, just saying: I was really impressed, also from tech support point of view. They always responded back to me within 24 hours (and never charged me for that).

-- alex

Jeff Schnitzer

Sep 13, 2012, 3:42:22 AM
to google-a...@googlegroups.com
I should probably also chime in with a positive note - on the bright
side, I haven't heard anyone complain about the datastore in a very
long time. The HRD does seem to deliver on its promise. Now we just
need all the rest of the infrastructure to come up to this same level
of robustness!

Jeff

Richard Watson

Sep 13, 2012, 8:28:08 AM
to google-a...@googlegroups.com, je...@infohazard.org
Christina wrote:

> We have resolved the first issue by fixing the underlying bug. 

I think one good option for longer-term stability is one we've identified before - we need a "stable" release channel where Google won't alter the environment until it's thoroughly tested, meaning deployed to those who aren't on stable and run without error for a week/month.

Also, if I were you at some point I'd start to deploy on weekends (which I assume are quieter, and fewer businesses are affected by downtime).  I understand you don't want to have the whole full-stack engineering team in on weekends, but deploying in the middle of the week is insane.  Have some on-call, pay them more, just don't think adding one more test will fix this permanently.  Not sure when GAE is important enough to Google to make this work, but doing anything less just raises the chance that you're taking businesses offline.

Any chance you could tell us when you're about to alter the stack, similar to how you warn about the M/S downtime?  That way we can be around and pre-warn our own customers, just in case.

One class of problem that won't go away easily is non-GAE changes. For example, someone in a far-flung corner of Google deciding that they must block Cloudflare-like requests for all Google properties. No idea how to fix that.

Richard

Andy Stevko

Sep 13, 2012, 9:34:38 AM
to google-a...@googlegroups.com
re: communication
Perhaps the app engine team can learn from the pain & suffering customers endured with another vendor's service outage rather than bleeding us all over again.

"the biggest failure in this event was Amazon’s communication, or rather lack thereof. The status updates were far too vague to be of much use and there was no background information whatsoever."





--
-- A. Stevko
===========
"If everything seems under control, you're just not going fast enough." M. Andretti





Joshua Smith

Sep 13, 2012, 9:58:08 AM
to google-a...@googlegroups.com


On Sep 13, 2012, at 8:28 AM, Richard Watson <richard...@gmail.com> wrote:

> deploying in the middle of the week is insane

+1 on this.

Please, please, PLEASE, GAE team: Start rolling out changes on Saturdays (and monitor the systems carefully on Sundays, so you can roll back whatever you messed up before Monday).

-Joshua

hyperflame

Sep 13, 2012, 11:25:48 AM
to Google App Engine
Christina,

Let me take a wild guess at what happened internally. A few months
ago, there was a 100+ post thread on this mailing list complaining
about long instance startup times (specifically, that classloading was
slow). You guys had an internal discussion, and produced some code to
streamline classloading, and it's failing because the streamlined
classloading process made some assumptions that don't hold up in
production.

Let me make an educated guess at why the failure is happening. Jeff
Schnitzer reported that the base exception was
"java.lang.ClassNotFoundException: java.io.FileOutputStream". But GAE
shouldn't need to load FileOutputStream, since the GAE JRE whitelist
(located at https://developers.google.com/appengine/docs/java/jrewhitelist
) doesn't even include FileOutputStream. There is no legitimate reason
to load or make available FileOutputStream if we're not allowed to use
it. So why is GAE attempting to load a non-whitelisted class?

My guess is that GAE deployed with the wrong copy of the SDK; perhaps
there was a reference to FileOutputStream in one of the internal GAE
classes, and everything went kaput when the servers failed to find the
reference. Or perhaps there was a last minute change by a programmer
who forgot that FileOutputStream wasn't available in GAE.
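
One way to test a guess like this from inside an app is to ask the class loader directly whether a class is visible, without initializing it. A minimal sketch follows. ClassProbe is a hypothetical helper, and it says nothing about how the GAE runtime enforces its whitelist; on a standard desktop JVM, java.io.FileOutputStream is of course present.

```java
// Hypothetical helper for illustration; not part of the GAE SDK.
final class ClassProbe {

    // Returns true if the named class can be loaded by the given loader.
    // The class is loaded but not initialized (no static initializers run).
    static boolean isLoadable(String className, ClassLoader loader) {
        try {
            Class.forName(className, false, loader);
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        } catch (LinkageError e) {
            // Covers NoClassDefFoundError: the class file was found,
            // but something it depends on could not be loaded.
            return false;
        }
    }

    public static void main(String[] args) {
        ClassLoader loader = ClassProbe.class.getClassLoader();
        System.out.println(isLoadable("java.io.FileOutputStream", loader));
        System.out.println(isLoadable("no.such.Clazz", loader));
    }
}
```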

Am I close or completely wrong?

> >> >>>         at com.google.inject.servlet.**FilterDefinition.init(**FilterDefinition.java:1­14)
>
> >> >>>         at com.google.inject.servlet.**ManagedFilterPipeline.**
> >> initPipeline(**ManagedFilterPipeline.java:98)
> >> >>>         at com.google.inject.servlet.**GuiceFilter.init(GuiceFilter.*
> >> *java:172)
> >> >>>         at org.mortbay.jetty.servlet.**FilterHolder.doStart(**FilterHolder.java:97)
>
> >> >>>         at org.mortbay.component.**AbstractLifeCycle.start(**AbstractLifeCycle.java:50­)
>
> >> >>>         at org.mortbay.jetty.servlet.**ServletHandler.initialize(**ServletHandler.java­:662)
>
> >> >>>         at org.mortbay.jetty.servlet.**Context.startContext(Context.*
> >> *java:140)
> >> >>>         at org.mortbay.jetty.webapp.**WebAppContext.startContext(**WebAppContext.java:­1250)
>
> >> >>>         at
>
> ...

Christina Ilvento

Sep 13, 2012, 11:45:35 AM
to google-a...@googlegroups.com
Hi All,

First, thanks for all of the feedback. It's very helpful for us to hear directly from customers about how issues affect them and what would make them feel more confident using the platform. Please feel free to reach out to me directly (cilvento@) if you have anything that you'd like to discuss without an audience.

For this issue in particular, we are investigating the duration and severity of the incident on our side. No one can promise a 100% bug-free environment, but we do try to be as transparent as possible about these issues, and we'll be posting a more detailed report soon.



Thanks,
Christina

--
You received this message because you are subscribed to the Google Groups "Google App Engine" group.
To post to this group, send email to google-a...@googlegroups.com.
To unsubscribe from this group, send email to google-appengi...@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/google-appengine?hl=en.




--

Christina Ilvento |
 Google App Engine | cilv...@google.com | (650)-201-9399



psm

Sep 13, 2012, 2:12:35 PM
to google-a...@googlegroups.com, je...@infohazard.org
Jeff,

these are good ideas and suggestions. we are working on a number of different strategies to ameliorate these issues. some of the items you are suggesting are already in progress, along with others. and i agree that this is a general philosophical challenge with PaaS. on GAE we now regularly serve several hundreds of thousands of applications, so it is indeed a challenge to handle the "long tail" problem. we are aware of this, and you should expect us to be rolling out a number of things to address it. in fact, we expect to turn our experience of running this large workload over a long period of time into an advantage with GAE.

Peter S Magnusson
(GAE Eng Dir)

Per

Sep 13, 2012, 3:11:31 PM
to google-a...@googlegroups.com, je...@infohazard.org
Hi Peter,

I'm sure it's extremely(!) hard to host this many applications reliably and at a sane cost. I won't try to make any suggestions about that.  But what concerns me is the support situation on App Engine. It shouldn't be too hard to monitor a low-bandwidth forum like this, and to provide somewhat timely feedback when downtime is being reported by some of the most senior users, during a time when you're actually rolling out changes.

I can understand that you don't want to turn this into an official support forum for all the dumb questions we see here, and I can understand that you want to encourage us to purchase premium accounts. But I feel you're hurting your own business by not responding swiftly to posts like these. We've been through similar issues with our application, and the support situation is really the one thing that stops me from recommending App Engine wholeheartedly. I bet I'm not the only one. :) When discussing your strategies, it would be great if you could consider "improved responsiveness" too.

Cheers,
Per

Peter Magnusson

Sep 13, 2012, 11:19:47 PM
to google-a...@googlegroups.com, Jeff Schnitzer
point taken.  i agree.


--
You received this message because you are subscribed to the Google Groups "Google App Engine" group.
To view this discussion on the web visit https://groups.google.com/d/msg/google-appengine/-/tmYWDN8r2pUJ.