Network Connectivity and Latency Issues in Europe

Google Cloud Platform Status

Nov 10, 2015, 3:34:27 PM
to google-appengine...@googlegroups.com
We are investigating reports of issues with network connectivity and
latency for Google App Engine and Google Compute Engine in Europe. We will
provide more information by 13:30 US/Pacific.

Google Cloud Platform Status

Nov 10, 2015, 4:41:00 PM
to google-appengine...@googlegroups.com
We have resolved the issue with high latency and network connectivity
to/from services hosted in Europe. This issue started at approximately
08:00 PST and was resolved as of 13:15 PST. We will be conducting an
internal investigation and will share the results of our investigation
soon. If you continue to see issues with connectivity to/from services in
Europe, please create a case and let us know.

Google Cloud Platform Status

Nov 13, 2015, 3:22:28 PM
to google-appengine...@googlegroups.com
SUMMARY:

On Tuesday, 10 November 2015, outbound traffic going through one of our
European routers from both Google Compute Engine and Google App Engine
experienced high latency for a duration of 6 hours 43 minutes. If your service
or application was affected, we apologize — this is not the level of
quality and reliability we strive to offer you, and we have taken and are
taking immediate steps to improve the platform’s performance and
availability.


DETAILED DESCRIPTION OF IMPACT:

On Tuesday, 10 November 2015 from 06:30 - 13:13 PST, a subset of outbound
traffic from Google Compute Engine VMs and Google App Engine instances
experienced high latency. The disruption was limited to outbound traffic
through one of our European routers and, at its peak, caused 40% of all
traffic routed through this device to be dropped. This accounted for 1% of
all Google Compute Engine traffic being routed from EMEA and <0.05% of all
traffic for Google App Engine.


ROOT CAUSE:

A network component failure in one of our European routers temporarily
reduced network capacity in the region, causing network congestion for
traffic traversing this route. Although the issue was mitigated by changing
the traffic priority, the problem was only fully resolved when the affected
hardware was replaced.


REMEDIATION AND PREVENTION:

As soon as significant traffic congestion in the network path was detected,
at 09:10 PST, Google engineers diverted a subset of traffic away from the
affected path. As this only slightly decreased the congestion, Google
engineers changed the traffic priority, which fully mitigated the problem
by 13:13 PST. The replacement of the faulty hardware resolved the problem.
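
For readers who want to check the timeline, the following is a minimal
sketch that derives the intervals from the three timestamps quoted in this
report (06:30, 09:10 and 13:13 PST). The variable and function names are
illustrative only and are not part of any Google tooling.

```python
from datetime import datetime

# Timestamps taken from the report (all PST, 10 November 2015).
FMT = "%H:%M"
start = datetime.strptime("06:30", FMT)      # high latency begins
detected = datetime.strptime("09:10", FMT)   # congestion detected
mitigated = datetime.strptime("13:13", FMT)  # problem fully mitigated


def hm(delta):
    """Format a timedelta as 'XhYYm'."""
    minutes = int(delta.total_seconds()) // 60
    return f"{minutes // 60}h{minutes % 60:02d}m"


time_to_detect = hm(detected - start)        # start -> detection
time_to_mitigate = hm(mitigated - detected)  # detection -> mitigation
total = hm(mitigated - start)                # whole incident

print(time_to_detect, time_to_mitigate, total)
```

The total matches the 6 hours 43 minutes stated in the summary; the first
interval is the detection delay that the new alerts described below are
meant to shorten.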

To address time to resolution, Google engineers have added appropriate
alerts to the monitoring of this type of router, so that similar congestion
events will be spotted significantly more quickly in future. Additionally,
Google engineers will ensure that capacity plans properly account for all
types of traffic in single device failures. Furthermore, Google engineers
will audit and augment capacity in this region to ensure sufficient
redundancy is available.
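
As an illustration of what accounting for all types of traffic in a single
device failure might look like, here is a rough sketch of such a check. It
is not Google's capacity-planning method; the device names, traffic types
and numbers are all hypothetical, and the model (devices in one pool whose
capacities simply add up) is a deliberate simplification.

```python
def at_risk_devices(device_capacity, demand_by_type):
    """Return devices whose loss leaves less capacity than total demand.

    Hypothetical model: every device contributes its capacity to a shared
    pool, and demand is the sum over all traffic types.
    """
    total_demand = sum(demand_by_type.values())
    total_capacity = sum(device_capacity.values())
    return [
        name
        for name, capacity in device_capacity.items()
        if total_capacity - capacity < total_demand
    ]


# Hypothetical capacities and demands, in arbitrary units.
capacity = {"router-a": 100, "router-b": 100, "router-c": 60}
all_traffic = {"interactive": 90, "batch": 50, "replication": 40}

# Same plan, but with one traffic type left out of the accounting.
partial_traffic = {"interactive": 90, "batch": 50}

risky_with_all = at_risk_devices(capacity, all_traffic)
risky_with_partial = at_risk_devices(capacity, partial_traffic)

print(risky_with_all)
print(risky_with_partial)
```

With these made-up numbers, the plan looks safe when one traffic type is
omitted, but losing either of the larger devices is flagged once every
type is counted.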