Problem with App Engine custom domains

Google Cloud Platform Status

Apr 22, 2015, 11:08:53 AM
to google-appengine...@googlegroups.com
We are currently experiencing an issue with custom domains on Google App
Engine. For everyone who is affected, we apologize for any inconvenience
you may be experiencing. We will provide an update by 08:30 US/Pacific time
with current details.

Google Cloud Platform Status

Apr 22, 2015, 11:31:33 AM
to google-appengine...@googlegroups.com
The issue with custom domains is alleviated for most customers. Google
engineers are working to ensure service is fully restored.

Google Cloud Platform Status

Apr 22, 2015, 11:47:46 AM
to google-appengine...@googlegroups.com
We confirm that the problem with custom domains on Google App Engine was
resolved shortly after 08:09 PDT on Wednesday 22 April 2015. We
apologize for the inconvenience and thank you for your patience and
continued support. Please rest assured that system reliability is a top
priority at Google, and we are making continuous improvements to make our
systems better.

We will provide a more detailed analysis of this incident once we have
completed our internal investigation.

Google Cloud Platform Status

Apr 23, 2015, 1:36:44 PM
to google-appengine...@googlegroups.com
SUMMARY:

On Wednesday 22 April 2015, for a duration of 92 minutes, some requests
from European regions to Google App Engine custom domains were redirected
to the Google front page. We apologise to our customers and users who were
affected by this issue, and we have taken and are taking immediate steps to
improve the platform’s availability.

DETAILED DESCRIPTION OF IMPACT:

Starting at 06:37 PDT on Wednesday 22 April, some custom-domain URL
requests from the Europe region were redirected to the www.google.com front
page, or to equivalent national Google front pages, instead of being
dispatched to their target Google App Engine applications.

The incident had two phases. In the first phase, from 06:37 to 07:30, 7.9%
of traffic to custom domains was affected. In the second phase, from 07:30
to 08:09, 13.7% of custom domain traffic was affected. In total,
approximately 0.2% of requests to App Engine were incorrectly redirected
during the incident.
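
The phase figures above can be cross-checked with a little arithmetic. The
following is a minimal sketch and not part of Google's analysis: it recomputes
the phase durations from the stated timestamps and, assuming custom-domain
traffic volume was constant across both phases (an assumption this report does
not make), estimates the time-weighted share of custom-domain traffic that was
affected. The names PHASES, minutes_between and summarize are illustrative only.

```python
from datetime import datetime

# Phase boundaries (PDT) and affected percentages, as stated in the report.
PHASES = [
    ("06:37", "07:30", 7.9),
    ("07:30", "08:09", 13.7),
]


def minutes_between(start, end):
    """Whole minutes between two same-day HH:MM timestamps."""
    fmt = "%H:%M"
    delta = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    return int(delta.total_seconds() // 60)


def summarize(phases):
    """Return (per-phase minutes, total minutes, time-weighted percent)."""
    durations = [minutes_between(start, end) for start, end, _ in phases]
    total = sum(durations)
    # ASSUMPTION: request rate was constant, so each phase is weighted
    # purely by its length in minutes.
    weighted = sum(d * pct for d, (_, _, pct) in zip(durations, phases)) / total
    return durations, total, weighted


durations, total_minutes, weighted_pct = summarize(PHASES)
print(durations, total_minutes, round(weighted_pct, 1))
```

The total of the two phases matches the 92-minute duration given in the
summary. The weighted figure is only an estimate under the constant-traffic
assumption and should not be read as a number reported by Google.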

Requests originating outside Europe were not affected, except for a very
small percentage which were routed to the Google network through European
points of presence. Requests to applications via appspot.com domains were
also not affected. The hosting region of the application was not a factor.

ROOT CAUSE:

App Engine custom domains are handled by a system which performs domain
mapping for a number of Google services. In order to increase performance,
capacity and supportability, Google engineers are in the process of
migrating this system's traffic onto Google's general-purpose network
infrastructure.

The outage commenced when a rollout of this integration began in European
datacenters, with a small fraction of custom domain requests being routed
through the general infrastructure. Detailed monitoring was in place for
this migration but, incorrectly, did not include App Engine custom domains.
Due to a configuration error, the migrated App Engine custom domains were
not recognized by the infrastructure, which therefore redirected them to
its default target of the Google front page.
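
The failure mode described above, where an unrecognized domain falls through
to a default target, can be illustrated with a small sketch. This is not
Google's implementation; the domain names, the target strings, and the
function dispatch are all hypothetical and exist only to show how a missing
configuration entry turns into a silent redirect rather than an error.

```python
# Hypothetical default target for hosts the infrastructure does not recognize.
DEFAULT_TARGET = "https://www.google.com/"


def dispatch(host, known_domains):
    """Return (action, target) for a request to `host`.

    `known_domains` maps a custom domain to the backend that serves it.
    A host missing from the mapping is redirected to the default target.
    """
    target = known_domains.get(host.lower())
    if target is None:
        return ("redirect", DEFAULT_TARGET)
    return ("forward", target)


# Hypothetical configuration before the migration: both entries present.
complete_config = {
    "shop.example.com": "app-engine:shop-app",
    "mail.example.com": "mail-service",
}

# Hypothetical migrated configuration with the App Engine entry missing.
migrated_config = {
    "mail.example.com": "mail-service",
}

print(dispatch("shop.example.com", complete_config))
print(dispatch("shop.example.com", migrated_config))
```

In this sketch the second call returns a redirect to the default target
instead of forwarding to the application, which is the visible symptom the
report describes.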

REMEDIATION AND PREVENTION:

At 08:04, the issue was identified and Google engineers immediately
cancelled the rollout, restoring service by 08:09.

To prevent similar issues from reaching production in future, Google
engineers are implementing software release tests to identify the class of
configuration error that triggered the incident.

In case similar issues do reach production, Google engineers are extending
rollout testing to include App Engine custom domains so that problematic
rollouts will be detected and cancelled automatically and immediately.
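
One way to picture such a rollout check is sketched below. It is an
assumption-laden illustration, not the mechanism Google describes building:
the domain type names, the 1% failure threshold, and the function
should_cancel_rollout are all hypothetical. The point it shows is that a gate
should treat a domain type with no probe coverage as a failure, since the
incident's monitoring gap was exactly a type that was not being checked.

```python
# Hypothetical set of domain types that every rollout check must cover.
REQUIRED_DOMAIN_TYPES = {"app_engine_custom", "other_mapped_service"}


def should_cancel_rollout(probes, max_failure_rate=0.01):
    """Decide whether a rollout should be cancelled.

    `probes` is a list of (domain_type, dispatched_correctly) tuples
    gathered from test requests sent through the migrated path.
    """
    results_by_type = {}
    for domain_type, ok in probes:
        results_by_type.setdefault(domain_type, []).append(ok)

    for domain_type in REQUIRED_DOMAIN_TYPES:
        results = results_by_type.get(domain_type)
        if not results:
            # No coverage for a required type counts as a failure.
            return True
        failure_rate = results.count(False) / len(results)
        if failure_rate > max_failure_rate:
            return True
    return False


healthy = [("app_engine_custom", True), ("other_mapped_service", True)]
uncovered = [("other_mapped_service", True)]
broken = [("app_engine_custom", False), ("other_mapped_service", True)]

print(should_cancel_rollout(healthy))
print(should_cancel_rollout(uncovered))
print(should_cancel_rollout(broken))
```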

Finally, continuous monitoring will be added to ensure that all types of
custom domain are being correctly recognized and dispatched by the
infrastructure, so that Google engineers will be rapidly notified if
similar issues recur, regardless of the cause.