Some App Engine apps experiencing quota denial 503 errors

Google Cloud Platform Status

Mar 24, 2015, 5:16:44 PM
to google-appengine...@googlegroups.com
We're investigating an issue with Google App Engine beginning at Tuesday
2015-03-24 13:05 (all times are in US/Pacific). We will provide more
information within 20 minutes.

Google Cloud Platform Status

Mar 24, 2015, 5:35:54 PM
to google-appengine...@googlegroups.com
Some App Engine apps are serving 503 "Over serving quota" errors. For
everyone who is affected, we apologize for any inconvenience you or your
customers are experiencing. We will provide an update by 15:00 with current
details.

Google Cloud Platform Status

Mar 24, 2015, 6:00:23 PM
to google-appengine...@googlegroups.com
Several applications running on Google App Engine have reported
elevated rates of "503: Over Serving Quota" errors. The Google engineering
and support teams are working with affected applications to diagnose and
correct the cause of these elevated error rates. The overall error rate
in Google App Engine increased by 1% at the time of the reports,
and has since returned to approximately baseline levels. We expect to have
another update at 15:30.

Google Cloud Platform Status

Mar 24, 2015, 6:34:06 PM
to google-appengine...@googlegroups.com
The issue with App Engine "503: Over Serving Quota" errors was resolved as
of 14:52 on Tuesday 2015-03-24. We apologize for any issues you may have
experienced.

We will provide a detailed analysis of this incident at
https://status.cloud.google.com/incident/appengine/15009 once we have
completed our internal investigation.

Google Cloud Platform Status

Mar 24, 2015, 11:15:18 PM
to google-appengine...@googlegroups.com
The issue with 503 "Over serving quota" errors has resurfaced. We're
currently investigating it and will provide more information within
an hour.

Google Cloud Platform Status

Mar 25, 2015, 12:12:14 AM
to google-appengine...@googlegroups.com
The issue with 503 "Over serving quota" errors was resolved as of 14:52 PDT
on Tuesday 2015-03-24, as previously indicated in the 15:00 update. We
received a report that led us to mistakenly conclude that the issue had
resurfaced. However, our reliability engineering team uncovered a
different root cause for that report. If your app has received 503 "Over
serving quota" errors after 14:53 PDT, please file a support case.

Google Cloud Platform Status

Mar 26, 2015, 1:32:44 AM
to google-appengine...@googlegroups.com
SUMMARY:

On Tuesday 24th March 2015, Google App Engine served elevated 503 errors on
<1% of applications for a typical duration of 50 minutes. We know how
important high uptime and low error rates are to you and your users, and we
apologize for these errors. We are learning from this incident and are
implementing several improvements to make our service more reliable.

DETAILED DESCRIPTION OF IMPACT:

On Tuesday 24th March 2015 from 13:03 to 13:53 PDT approximately 1% of
requests to App Engine erroneously received a 503 error with the
message "Over Quota. This application is temporarily over its serving
quota. Please try again later." This occurred despite applications being
within their quotas. The distribution of these errors was not uniform; some
applications received a disproportionately high fraction of the total
errors.

ROOT CAUSE:

A latent bug in the App Engine quota handling code was triggered during a
routine software update of the quota system. This resulted in App Engine
returning over-quota errors to some applications that were not over quota.
As App Engine software updates are rolled out progressively, only some
applications were affected by the update before the issue was detected and
remediated.

REMEDIATION AND PREVENTION:

Google engineers directed traffic away from the affected App Engine
infrastructure once the nature of the problem was understood. This led to
the return of global 503 rates to pre-incident levels at 13:53. Google
engineers identified a small number of applications that escaped the
initial change and fixed their quota behavior manually at 14:45.

In order to prevent recurrence of this issue, Google engineers will add
monitoring and alerting for the quota issue that resulted in spurious 503
errors, create a new quick response protocol for handling erroneous quota
responses, and will modify application quota behavior to tolerate novel
quota system behavior with lower application impact.