Outage - Python 2.7, HRD, Cloud SQL

Philip Kilner

Mar 1, 2013, 6:15:50 AM
to google-a...@googlegroups.com
Hi,

I've had an outage now since approximately 00:50 UK time. Pingdom
tells me that the app is up and down, but in reality it isn't serving pages.

Looking at logs, I see a wide variety of errors, even though every hit
is to the home page or redirected to the login page (just monitor hits
and login attempts).

The errors really aren't consistent enough for me to believe that my previously
healthy app has an issue. In fact, they are mostly timeouts, or
tracebacks from framework code making a DB connection.

Errors spotted so far are: -

- "Deadline exceeded"

- "Exceeded soft private memory limit with 154.66 MB after servicing 1
requests total"

- "File "/python27_runtime/python27_lib/versions/1/google/storage/speckle/python/api/rdbms.py", line 967, in _MakeRetriableRequest
    raise _ToDbApiException(sql_exception)
  InternalError: (0L, u'Connection is already in use.')"

...none of which I've seen before.

I'm hoping matters will improve, and fortunately I am not (yet) under too
much pressure, but I wanted to ask: -

- Is anyone else seeing this?

- Is there a known issue with Cloud SQL? (When a request gets as far as an
error, the RDBMS connection seems to be the problem.)

- Is there anything useful I can do other than wait?



--

Regards,

PhilK


'a bell is a cup...until it is struck'

Philip Kilner

Mar 2, 2013, 1:33:42 AM
to google-a...@googlegroups.com, Mendel
Hi Mendel,

On 01/03/13 12:54, Mendel wrote:
> Same here, began about 7 hours ago...timeouts, very varying response
> time (50ms to 30000ms)
>

Thanks for confirming that it's not just me.

It came back to life briefly last night (see attached - not a pretty
picture), but is down again now. Pingdom is now reporting 29% uptime for
my app, but that's very optimistic - it might serve a page 29% of the
time, but it hasn't sustained a session for more than 5% of the time in
the last 36 hours.

Can I ask anyone suffering similar problems to let us know if our apps have
anything in common, please?

My app is a very simple CRUD app, with a modest number of records so
far, on Python 2.7, HRD, Cloud SQL.

Because there hadn't been any changes in the app for 12 hours prior to
this, and because of the wide range of error messages / failure types,
I'm as sure as I can be that these issues flow from some change in GAE.

The one change I've allowed myself to make was to beef up my instances
from F1 to F2, but the mix of error messages didn't change much, and the
"soft memory limit exceeded" message was still thrown, just with bigger
numbers.

The fact that I opted for Cloud SQL is saving me a lot of stress here - I
do have the option to move it if this isn't resolved. I don't expect
any response from Google, so all I can do is wait and share my
experiences here until it's resolved or I run out of time and have to move.
[Attachment: Uptime Report.png]

Philip Kilner

Mar 3, 2013, 7:30:40 AM
to google-a...@googlegroups.com
Hi All,

My app has now been down for 48 hours. Since no one else here seems
to have the same issue at the same severity, I have posted a production issue
here: -

https://code.google.com/p/googleappengine/issues/detail?id=8918

If you /do/ have the same issue, please star!


--

Regards,

PhilK


e: ph...@xfr.co.uk - m: 07775 796 747

'work as if you lived in the early days of a better nation'
- alasdair gray

johnP

Mar 3, 2013, 11:01:57 AM
to google-a...@googlegroups.com, ph...@xfr.co.uk
There are lots of posts lately about bad serving performance across many configurations. There have also been lots of posts questioning why Google has not acknowledged any of the posts.

There are clearly serving issues. There has clearly been no reaction, over an extended period of time.

Philip Kilner

Mar 3, 2013, 12:49:50 PM
to johnP, google-a...@googlegroups.com
Hi John,

On 03/03/13 16:01, johnP wrote:
> There are lots of posts lately about bad serving performance across many
> configurations. There have also been lots of posts questioning why Google
> has not acknowledged any of the posts.
>

All true.

It's probably worth pointing out that this is a simple CRUD app - it
really isn't doing anything exotic. The only thing on "my" side that I
can see affecting it is the Cloud SQL connection code in the framework,
although I'd hardly expect that to suddenly stop working.

Fortunately, this app is portable and is currently in the process of
being commissioned, so I can move it temporarily without significant
impact on the customer. If I had not planned
for that contingency, this situation would be very serious indeed.


> There are clearly serving issues. There has clearly been no reaction, over an
> extended period of time.
>

Well, I've been tracking this forum and the situation for a couple of
years, and have gone into it with my eyes open and a "Plan B" all ready
to roll, so I'm prepared - but the response to this will be a bit of a
litmus test.