Instances/Java go crazy

Mos

Jul 30, 2012, 11:47:00 AM
to google-a...@googlegroups.com
Is anyone else seeing issues with GAE instance management?
Within one minute, with only 10 requests, around 5 instances are started.
The old ones do not respond, and new instances are created again and again...

http://code.google.com/p/googleappengine/issues/detail?id=7910

Rerngvit Yanggratoke

Jul 30, 2012, 11:50:03 AM
to google-a...@googlegroups.com
I just tried with my app on the Java runtime and could not reproduce the problem. Which runtime are you using?


--
You received this message because you are subscribed to the Google Groups "Google App Engine" group.
To post to this group, send email to google-a...@googlegroups.com.
To unsubscribe from this group, send email to google-appengi...@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/google-appengine?hl=en.




--
Best Regards,
Rerngvit Yanggratoke 

Mos

Jul 30, 2012, 12:37:08 PM
to google-a...@googlegroups.com
> Which runtime are you using?

GAE/Java 2.7

Kristopher Giesing

Jul 31, 2012, 11:29:54 PM
to google-a...@googlegroups.com, mos...@googlemail.com
Just noticed this thread.  I'm seeing this as well.  More details here:

Mos

Aug 1, 2012, 5:50:27 PM
to google-a...@googlegroups.com
And again I have to pay for GAE issues:

On Jul 30 the Frontend Instance Hours went beyond the free limit.
That happened this week for the first time in my application's history.
Remember: the application was unusable on that day because GAE started
instances like crazy (on low traffic). The Frontend Instance Hours were
consumed by the buggy instance scheduler.

But another GAE bug comes to the rescue: billing does not work.
It has been stuck for months, and I still see the following message in my
billing history:
"We were unable to process your last payment. If the account balance
($2.32) is not paid in full by 06/01/2012, this application's quotas
may be reset to the free levels."

That's ingenious, Google. ;)



Mos

Aug 4, 2012, 3:00:26 PM
to google-a...@googlegroups.com
Again, not a very reliable GAE month:

Pingdom July

Uptime   Downtime     Outages   Response time
99.90%   0h 43m 35s   27        466 ms

Drake

Aug 5, 2012, 7:23:00 AM
to google-a...@googlegroups.com

Well, I live in unicorn land, but this was my score from Pingdom for www.xyhd.tv.

 

 

Pingdom Monthly Report
2012-07-01 - 2012-07-31

Overview: Average of all checks

Uptime   Outages   Response time
99.99%   3         376 ms

Checks with downtime

Check name   Uptime   Downtime     Outages   Response time
XYHD         99.99%   0h 05m 00s   3         376 ms

Checks without downtime

(none listed)

 

 

> -----Original Message-----
> From: google-a...@googlegroups.com [mailto:google-appe...@googlegroups.com] On Behalf Of Mos
> Sent: Saturday, August 04, 2012 12:00 PM
> To: google-a...@googlegroups.com
> Subject: [google-appengine] Re: Instances/Java go crazy

Mos

Aug 18, 2012, 8:56:20 AM
to google-a...@googlegroups.com
And again. For the last 48 hours the scheduler has been creating and
shutting down instances without reason (traffic as usual, no software
updates, no special tasks).

That's definitely not a "WorkAsIntended" issue!

As a result, instance hours go up and we have to pay again for GAE's
fault. Great business model: let customers pay for bugs and label
them "WorkAsIntended"!

Check: http://code.google.com/p/googleappengine/issues/detail?id=7910

Jeff Schnitzer

Aug 18, 2012, 10:52:43 AM
to google-a...@googlegroups.com
Have you set the _min_ idle instances to something other than
automatic? This setting seems to be especially catastrophic for a
low-traffic app.

Jeff

Mos

Aug 18, 2012, 2:21:49 PM
to google-a...@googlegroups.com
Hi Jeff,

Min idle instances is on automatic; max is set to two.
There is at least one request per minute for my "low-traffic app".

From my perspective it is a very straightforward application that shouldn't
have any problems on GAE. (Check: http://www.krisentalk.de/ if you
can speak German. ;) )

I'm surprised that you still go with GAE for new applications.
You had your own "experiences" from what I saw on this list. ;)

I'm not recommending GAE anymore. I tell my customers to go with other
platforms. After one year of following discussions and issues, I don't think
reliability will improve on GAE in the near future (at least for small
and mid-size applications).

Cheers
Mos

Jeff Schnitzer

Aug 18, 2012, 4:19:51 PM
to google-a...@googlegroups.com
On Sat, Aug 18, 2012 at 2:21 PM, Mos <mos...@googlemail.com> wrote:
>
> I'm surprised that you still go with GAE for new applications.
> You had your own "experiences" from what I saw on this list. ;)
>
> I'm not recommending GAE anymore. I tell my customers to go with other
> platforms. After one year of following discussions and issues, I don't think
> reliability will improve on GAE in the near future (at least for small
> and mid-size applications).

I will confess that my enthusiasm for GAE has tempered somewhat, but
every platform has issues. All my GAE apps have parts that run in
other cloud providers where it makes sense. That doesn't bother me.

The shining gem in GAE-land is the datastore. It would be
extraordinarily hard to reproduce a distributed, replicated,
fault-tolerant, infinitely-scaling, self-managing database. Maybe
Amazon's DynamoDB is getting close - I certainly like the performance
of it, but it still seems to lack a lot of the features of the GAE
datastore. All other data storage solutions I am familiar with
require a significant amount of maintenance as they scale, and I don't
want to think about that. I have no ops team and don't ever want one.

The task queue and memcache work well and are nicely integrated. I
like that you can transactionally add a task. But yeah, these parts
can be replicated elsewhere.

My main points of frustration are:

* Requests that go to cold start instances. I think this will
eventually get fixed.

* $100/mo SSL. Way too hard on startups, and pushes people away from
the mantra that everything should be ssl all the time.

* General performance. Some things are just slower than they should
be. For example, we (Voost) proxy OSM map tiles because Mapquest
doesn't support HTTPS - everything must be HTTPS or browsers show
mixed content warnings. Proxying through GAE was visibly slow, and
requests would often fail even with multiple retries. It's possible
that this is because Akamai (which serves the tiles) is throttling
requests from GAE urlfetch; dunno. But moving our https tile proxy to
nginx on Heroku made a _world_ of difference. It's like night and
day.

* GAE is actually pretty expensive, in a way that isn't so obvious
from the price chart. The natural tendency is to compare an
"instance" of GAE to an "instance" from some other service, and by
this standard GAE is fairly expensive. But each GAE instance can
handle a lot less load than an "instance" from almost any other
service, so there is another multiple you wouldn't necessarily expect.
For most apps the added expense is still tolerable, but for some apps
(ie Richard's game) it's downright pathological.

There are a number of other tricks and difficulties on GAE but there
are almost always workarounds. It's the nature of working on a large,
clustered, distributed system that some things (eg, aggregating
rapidly changing data) are hard. Overall I still recommend GAE for
most standard webappy kinds of things... but there are some apps I
would steer away:

* Apps with a lot of transactional logic are hard.
* Apps with a lot of ad-hoc query needs are hard.
* Apps with a lot of rapidly mutating aggregation are hard.

Someday when I have more time I might write a short "GAE Survival
Guide" and sell it for a few bucks as an ebook.

Jeff

Mos

Aug 18, 2012, 4:26:46 PM
to google-a...@googlegroups.com
> Someday when I have more time I might write a short "GAE Survival
> Guide" and sell it for a few bucks as an ebook.

That seems to be a good motivation to invest so much time in GAE
issues. ;) I would have bought the book a year ago....

Jeff Schnitzer

Aug 18, 2012, 9:58:24 PM
to pphalen, google-a...@googlegroups.com
On Sat, Aug 18, 2012 at 8:59 PM, pphalen <patrick...@gmail.com> wrote:
>> Apps with a lot of rapidly mutating aggregation are hard.
>
> Hi, I'm new to app engine, so this is news to me. Could you please
> characterize a bit more? E.g., what is "a lot"?

Nothing in the datastore can change faster than once per second;
imagine trying to keep a count of products sold when you're selling
hundreds per second. Sharding helps but it becomes tricky to get the
value in a timely manner when you get to hundreds of shards. If you
need a transactionally accurate count (say, you are selling N units
and you must not sell N+1 units) this becomes even harder - you have
to go out of the datastore to another tool like memcache increment().
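
To make the sharding idea concrete, here is a minimal in-memory sketch in plain Java. It is only an illustration under stated assumptions: the class name ShardedCounter is hypothetical, each array slot stands in for one datastore entity in its own entity group, and no App Engine API is called. Writes pick a random shard; reads have to sum all shards, which is why getting the value in a timely manner gets harder as the shard count grows.

```java
import java.util.concurrent.ThreadLocalRandom;
import java.util.concurrent.atomic.AtomicLongArray;

/**
 * In-memory illustration of a sharded counter (hypothetical class).
 * Each slot stands in for one datastore entity in its own entity group.
 */
class ShardedCounter {
    private final AtomicLongArray shards;

    ShardedCounter(int shardCount) {
        if (shardCount <= 0) {
            throw new IllegalArgumentException("shardCount must be positive");
        }
        this.shards = new AtomicLongArray(shardCount);
    }

    /** A write touches one randomly chosen shard, spreading the write load. */
    void increment() {
        int i = ThreadLocalRandom.current().nextInt(shards.length());
        shards.incrementAndGet(i);
    }

    /** A read has to visit every shard, so its cost grows with the shard count. */
    long total() {
        long sum = 0;
        for (int i = 0; i < shards.length(); i++) {
            sum += shards.get(i);
        }
        return sum;
    }
}
```

In a real datastore-backed version each increment() would be a transaction on one shard entity and total() would be a query over all of them; that part is left out here on purpose.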

There are other data storage technologies that are optimized for rapid
updates to atomic data. Redis, Mongo, and of course traditional
RDBMSes solve this problem pretty easily for most typical scaling
needs. There isn't an equivalent tool in the GAE toolbox. This
doesn't mean you can't use GAE even if your app does some of this
aggregation... but if you do a lot of it, the platform works against
you rather than for you.
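
For the "sell N units and never N+1" case mentioned above, the core operation is an atomic increment followed by a check. A rough sketch, assuming an in-process AtomicLong as a stand-in for whatever external atomic counter is actually used (memcache increment, Redis, etc.); LimitedStock and tryReserve are hypothetical names:

```java
import java.util.concurrent.atomic.AtomicLong;

/**
 * Illustration of "increment, then check" for a limited stock (hypothetical class).
 * The AtomicLong stands in for an external atomic counter.
 */
class LimitedStock {
    private final long limit;
    private final AtomicLong attempts = new AtomicLong();

    LimitedStock(long limit) {
        if (limit < 0) {
            throw new IllegalArgumentException("limit must not be negative");
        }
        this.limit = limit;
    }

    /** Returns true if this caller got one of the first `limit` units. */
    boolean tryReserve() {
        // The increment is atomic, so exactly `limit` callers see a value <= limit.
        return attempts.incrementAndGet() <= limit;
    }

    /** Units actually sold: never more than the limit, even after extra attempts. */
    long sold() {
        return Math.min(attempts.get(), limit);
    }
}
```

Because every caller gets a distinct counter value, no locking or transaction is needed to decide who got a unit.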

Jeff

alex

Aug 19, 2012, 4:47:54 AM
to google-a...@googlegroups.com
@pphalen just remember, you're reading replies from other developers, which are often biased (e.g. toward a specific programming language) or come from a different background; others wear a mermaid costume; etc.

For instance,

> Nothing in the datastore can change faster than once per second;

is true only in the case of a single entity group. So, you're better off reading the official docs.

What I'm saying is, you'll find a lot of posts in this forum that are more like personal opinions of individual developers (even when they scream "app engine is broken" and send you a bunch of charts that are somehow supposed to confirm that). So you should treat them as such, and not as "absolute truth", unless you trust the person or it is a Google employee who really knows what she's talking about.

Jeff Schnitzer

Aug 19, 2012, 1:35:24 PM
to google-a...@googlegroups.com
On Sun, Aug 19, 2012 at 4:47 AM, alex <al...@cloudware.it> wrote:
>
>> Nothing in the datastore can change faster than once per second;
>
> is true only in case of a single entity group. So, you're better off reading
> official docs.

Perhaps I wasn't clear enough - that's my entire point. No _single
thing_ in the datastore can change faster than once per second. In
the case of rapidly mutating data, this is a problem. Aggregations in
particular, because if you aggregate 100 things mutating once per
second, the aggregation has to mutate 100 times per second.

Thus: "Apps with a lot of rapidly mutating aggregation are hard."
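
As a back-of-the-envelope sketch of that arithmetic: if a single entity group sustains roughly one write per second, the minimum shard count is the aggregate write rate divided by that limit, rounded up. ShardMath and shardsNeeded are hypothetical names, and the even spread of writes across shards is an assumption; random shard selection would need some headroom on top.

```java
/** Back-of-the-envelope shard sizing (hypothetical helper, not an App Engine API). */
class ShardMath {
    /**
     * Minimum number of shards so that no shard is written more often than
     * maxWritesPerShardPerSecond, assuming writes are spread evenly.
     */
    static int shardsNeeded(double aggregateWritesPerSecond,
                            double maxWritesPerShardPerSecond) {
        if (aggregateWritesPerSecond < 0 || maxWritesPerShardPerSecond <= 0) {
            throw new IllegalArgumentException("rates must be positive");
        }
        double raw = Math.ceil(aggregateWritesPerSecond / maxWritesPerShardPerSecond);
        // Even a nearly idle counter needs one shard to live in.
        return (int) Math.max(1.0, raw);
    }
}
```

For example, shardsNeeded(100, 1.0) returns 100: one hundred things mutating once per second need at least one hundred shards behind the aggregate.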

This is not to say that you should not read the official docs. You
definitely should. But some walls are not obvious until you slam your
head into them a few times.

Jeff

Barry Hunter

Aug 20, 2012, 9:43:51 AM
to google-a...@googlegroups.com
>>
>> There are other data storage technologies that are optimized for rapid
>> updates to atomic data. Redis, Mongo, and of course traditional
>> RDBMSes solve this problem pretty easily for most typical scaling
>> needs. There isn't an equivalent tool in the GAE toolbox.

CloudSQL. That is a 'traditional RDBMS' :)

https://developers.google.com/cloud-sql/

Mos

Aug 20, 2012, 12:15:13 PM
to google-a...@googlegroups.com
For the last two days the instances have remained unstable. Today I killed
all instances, hoping this would help. But 3 instances spun up again, and now
every second request to the application fails with the known GAE
problem:

com.google.apphosting.api.DeadlineExceededException: This request
(9706ce3068c95802) started at 2012/08/20 16:05:11.636 UTC and was
still executing at 2012/08/20 16:06:11.214 UTC.
at com.google.appengine.runtime.Request.process-9706ce3068c95802(Request.java)

PLEASE GOOGLE - SOME FEEDBACK / EVALUATIONS WOULD BE NICE !

see also: http://code.google.com/p/googleappengine/issues/detail?id=7910


latest Pingdom History:
al...@pingdom.com UP alert: krisentalk (www.krisentalk.de) is UP 18:02
al...@pingdom.com DOWN alert: krisentalk (www.krisentalk.de) is DOWN 18:01
al...@pingdom.com UP alert: krisentalk (www.krisentalk.de) is UP 17:59
al...@pingdom.com DOWN alert: krisentalk (www.krisentalk.de) is DOWN 17:58
al...@pingdom.com UP alert: krisentalk (www.krisentalk.de) is UP 17:53
al...@pingdom.com DOWN alert: krisentalk (www.krisentalk.de) is DOWN 17:52
al...@pingdom.com UP alert: krisentalk (www.krisentalk.de) is UP 17:50
al...@pingdom.com DOWN alert: krisentalk (www.krisentalk.de) is DOWN 17:47
al...@pingdom.com UP alert: krisentalk (www.krisentalk.de) is UP 17:45
al...@pingdom.com DOWN alert: krisentalk (www.krisentalk.de) is DOWN 17:44
al...@pingdom.com UP alert: krisentalk (www.krisentalk.de) is UP 17:42
al...@pingdom.com DOWN alert: krisentalk (www.krisentalk.de) is DOWN 17:41
al...@pingdom.com DOWN alert: krisentalk (www.krisentalk.de) is DOWN 17:35
al...@pingdom.com UP alert: krisentalk (www.krisentalk.de) is UP 16:52
al...@pingdom.com DOWN alert: krisentalk (www.krisentalk.de) is DOWN 16:52
al...@pingdom.com UP alert: krisentalk (www.krisentalk.de) is UP 04:24
al...@pingdom.com DOWN alert: krisentalk (www.krisentalk.de) is DOWN 04:23
al...@pingdom.com UP alert: krisentalk (www.krisentalk.de) is UP 02:39
al...@pingdom.com DOWN alert: krisentalk (www.krisentalk.de) is DOWN 02:38
al...@pingdom.com UP alert: krisentalk (www.krisentalk.de) is UP Sun 19:16
al...@pingdom.com DOWN alert: krisentalk (www.krisentalk.de) is DOWN Sun 19:15
al...@pingdom.com UP alert: krisentalk (www.krisentalk.de) is UP Sun 17:19
al...@pingdom.com DOWN alert: krisentalk (www.krisentalk.de) is DOWN Sun 17:19
al...@pingdom.com UP alert: krisentalk (www.krisentalk.de) is UP Sun 16:03
al...@pingdom.com DOWN alert: krisentalk (www.krisentalk.de) is DOWN Sun 16:02
al...@pingdom.com UP alert: krisentalk (www.krisentalk.de) is UP Sun 5:15
al...@pingdom.com DOWN alert: krisentalk (www.krisentalk.de) is DOWN Sun 5:14

Jeff Schnitzer

Aug 20, 2012, 12:28:09 PM
to google-a...@googlegroups.com
On Mon, Aug 20, 2012 at 9:43 AM, Barry Hunter <barryb...@gmail.com> wrote:
>
> CloudSQL. That is a 'traditional RDBMS' :)
>
> https://developers.google.com/cloud-sql/

Except as we've experimentally seen when testing Richard's game, Cloud
SQL has significant per-instance throughput limits. I don't know why;
possibly something to do with the infrastructure between GAE and Cloud
SQL? You'll be lucky to get hundreds of updates per second - and
that's across the instance, not across any particular piece of data.

BTW, yesterday we shut down the 20 backends that Richard needed to
collect game scores. They're now getting submitted to three $16/mo
node.js instances, each of which alone could handle peak loads and
probably more. When the next game client update goes out and they
start submitting directly to the node.js instances instead of GAE,
he'll be able to shut down most of the ~80 frontend instances
currently required.

Granted, this is a fairly unusual app, but by surgically using "the
right tool for the right job" (tools which do not exist on GAE) we're
going from thousands per month to hundreds per month. I don't
consider this a condemnation of GAE - I think it's perfectly
acceptable to run a hybrid application, and by launching GCE it's
clear that Google does too - but it does kinda bother me that there
are these GAE components (like backends) that look really useful but
have undocumented limitations (like horrible throughput) that
effectively render them useless. Worse than useless, because of all
the time wasted experimenting with them.

Jeff

Jeff Schnitzer

Aug 20, 2012, 12:32:22 PM
to google-a...@googlegroups.com
Is that the full stacktrace?!?

Jeff

Mos

Aug 20, 2012, 12:55:10 PM
to google-a...@googlegroups.com
> Is that the full stacktrace?!?

No, here is one example:

com.google.apphosting.api.DeadlineExceededException: This request
(681ae7cef438e16b) started at 2012/08/20 16:30:31.376 UTC and was
still executing at 2012/08/20 16:31:30.903 UTC.
at com.google.appengine.runtime.Request.process-681ae7cef438e16b(Request.java)
at java.util.zip.ZipFile.read(Native Method)
at java.util.zip.ZipFile.access$1200(ZipFile.java:57)
at java.util.zip.ZipFile$ZipFileInputStream.read(ZipFile.java:476)
at java.util.zip.ZipFile$1.fill(ZipFile.java:259)
at java.util.zip.InflaterInputStream.read(InflaterInputStream.java:158)
at sun.misc.Resource.getBytes(Resource.java:124)
at java.net.URLClassLoader.defineClass(URLClassLoader.java:273)
at sun.reflect.GeneratedMethodAccessor5.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:616)
at java.lang.ClassLoader.loadClass(ClassLoader.java:266)
at org.springframework.beans.factory.support.ConstructorResolver.autowireConstructor(ConstructorResolver.java:188)
at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.autowireConstructor(AbstractAutowireCapableBeanFactory.java:1003)
at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.createBeanInstance(AbstractAutowireCapableBeanFactory.java:907)
at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.doCreateBean(AbstractAutowireCapableBeanFactory.java:485)
at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.createBean(AbstractAutowireCapableBeanFactory.java:456)
at org.springframework.beans.factory.support.BeanDefinitionValueResolver.resolveInnerBean(BeanDefinitionValueResolver.java:270)
at org.springframework.beans.factory.support.BeanDefinitionValueResolver.resolveValueIfNecessary(BeanDefinitionValueResolver.java:125)
at org.springframework.beans.factory.support.ConstructorResolver.resolveConstructorArguments(ConstructorResolver.java:616)
at org.springframework.beans.factory.support.ConstructorResolver.autowireConstructor(ConstructorResolver.java:148)
at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.autowireConstructor(AbstractAutowireCapableBeanFactory.java:1003)
at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.createBeanInstance(AbstractAutowireCapableBeanFactory.java:907)
at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.doCreateBean(AbstractAutowireCapableBeanFactory.java:485)
at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.createBean(AbstractAutowireCapableBeanFactory.java:456)
at org.springframework.beans.factory.support.AbstractBeanFactory$1.getObject(AbstractBeanFactory.java:293)
at org.springframework.beans.factory.support.DefaultSingletonBeanRegistry.getSingleton(DefaultSingletonBeanRegistry.java:222)
at org.springframework.beans.factory.support.AbstractBeanFactory.doGetBean(AbstractBeanFactory.java:290)
at org.springframework.beans.factory.support.AbstractBeanFactory.getBean(AbstractBeanFactory.java:196)
at org.springframework.beans.factory.support.DefaultListableBeanFactory.getBeansOfType(DefaultListableBeanFactory.java:400)
at org.springframework.context.support.AbstractApplicationContext.getBeansOfType(AbstractApplicationContext.java:1146)
at org.springframework.beans.factory.BeanFactoryUtils.beansOfTypeIncludingAncestors(BeanFactoryUtils.java:275)
at org.springframework.web.servlet.handler.AbstractUrlHandlerMapping.initInterceptors(AbstractUrlHandlerMapping.java:168)
at org.springframework.web.servlet.handler.AbstractHandlerMapping.initApplicationContext(AbstractHandlerMapping.java:110)
at org.springframework.web.servlet.handler.AbstractDetectingUrlHandlerMapping.initApplicationContext(AbstractDetectingUrlHandlerMapping.java:57)
at org.springframework.context.support.ApplicationObjectSupport.initApplicationContext(ApplicationObjectSupport.java:119)
at org.springframework.web.context.support.WebApplicationObjectSupport.initApplicationContext(WebApplicationObjectSupport.java:72)

Jeff Schnitzer

Aug 20, 2012, 2:36:54 PM
to google-a...@googlegroups.com
This looks like a startup request that is taking too long - is that the issue?

If so, it could be yet another blip in startup times - as I (and
others) have complained many times in the past, there seems to be a
lot of variance in startup times. 20s today could be 60s deadline
failures tomorrow.

One thing that may help: Start a cron job that hits your app as often
as possible. Cron has a 10m deadline so you're almost guaranteed to
get an instance off the ground even when it blows the limit.

Jeff

Jeff Schnitzer

Aug 20, 2012, 2:41:56 PM
to google-a...@googlegroups.com
Oh, and if you haven't already, star these issues:

http://code.google.com/p/googleappengine/issues/detail?id=7706
http://code.google.com/p/googleappengine/issues/detail?id=7865

Jeff

Random not-so-amusing anecdote: My app, voo.st, is also down right
now - not because of appengine, but because the .st domain registry
started screwing up dns delegation this morning. Half of requests
fail to delegate to the correct 'voo' nameservers. And their support
channels are doing nothing. I've never seen a nic fail before... and
it's not like I can switch to a different provider!

Joakim

Aug 20, 2012, 2:48:56 PM
to google-a...@googlegroups.com, mos...@googlemail.com
As a Spring user on GAE, the best thing I've done is disable the annotation-driven context:component-scan. It's a hassle migrating everything from @Component and @Autowired to <bean class="x.y.Z">...</bean>, but now Spring's FrameworkServlet initializes in 15 seconds including Objectify, instead of 40 seconds excluding Objectify. Reflection seems to be mind-bogglingly slow.
Also, I wish frameworks supported doing this work at build time.

Mos

Aug 21, 2012, 3:29:01 AM
to google-a...@googlegroups.com
Hi Jeff,

thanks for the feedback!

> This looks like a startup request that is taking too long - is that the issue?

No, not really. It's an instance that is already up. It happens
suddenly from time to time.
Please check (and star)
http://code.google.com/p/googleappengine/issues/detail?id=7982
for another stacktrace.

Cheers
Mos

Jorge Amat

Oct 11, 2013, 10:45:29 AM
to google-a...@googlegroups.com, mos...@googlemail.com
Still having these same issues today; this is outrageous.

Jorge Amat

Oct 11, 2013, 10:58:58 AM
to google-a...@googlegroups.com, mos...@googlemail.com
It has sometimes happened that when I set min idle instances to 1, after a few minutes it is set back to automatic without my consent.
Another thing I have tried is pausing the default task queue, since it suddenly starts making a lot of requests. However, that does not seem to be the solution for the unjustified increase in instance hours and, consequently, in costs for users.