Google Workspace Status: Gmail, Google Calendar, Google Chat, Google Meet - AVAILABLE - 2025-02-14 10:38:02 UTC

back...@gmail.com
Feb 14, 2025, 5:40:36 AM
to g-suite-st...@googlegroups.com
Product: Gmail, Google Calendar, Google Chat, Google Meet
Status: AVAILABLE
Published: 2025-02-14 10:38:02 UTC

# Incident Report
## Summary
On Monday, February 10, 2025, Google services including Chat,
Calendar, Gmail, Meet, and Docs experienced intermittent 'Error 500'
messages and increased latency. The Google Cloud Console also
returned '500' errors and failed to load for some users in South
America. These disruptions lasted for 36 minutes and primarily
affected users in North and South America.
This is not the level of quality and reliability we strive to offer
you, and we are taking immediate steps to improve the platform's
performance and availability.
## Root Cause
During a routine disaster recovery exercise for Google Account
Infrastructure, a configuration change drained a set of cluster
resources in one of our Data Centers, redirecting traffic to a service
in another Data Center. The increased traffic load overwhelmed that
service, which provides authentication and other core services to
Workspace products, leading to errors and increased latency for
traffic in North and South America. Engineers undrained the set of
cluster resources to mitigate the issue.
## Remediation and Prevention
Google engineers were alerted to the outage via internal monitoring on
Monday, 10 February at 11:48 US/Pacific and immediately started an
investigation. Once the nature and scope of the issue became clear at
12:12 US/Pacific, Google engineers undrained the set of cluster
resources whose draining had caused the traffic shift and overloaded
tasks, mitigating the issue at 12:26 US/Pacific.
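The intervals implied by these timestamps can be checked with a short calculation. The sketch below is illustrative only: it uses the times stated in this report (the 11:50 start of impact comes from the Detailed Description of Impact section), and the helper names `at` and `minutes_between` are made up for this example.

```python
from datetime import datetime

# All times are US/Pacific on Monday, 10 February 2025, as stated in the report.
# Naive datetimes are sufficient because every value is on the same day and zone.
def at(hhmm: str) -> datetime:
    """Build a datetime on the incident day from an 'HH:MM' string."""
    hour, minute = (int(part) for part in hhmm.split(":"))
    return datetime(2025, 2, 10, hour, minute)

def minutes_between(start: datetime, end: datetime) -> int:
    """Whole minutes elapsed between two timestamps."""
    return int((end - start).total_seconds() // 60)

ALERTED = at("11:48")           # internal monitoring alert
IMPACT_START = at("11:50")      # start of user-visible impact
SCOPE_UNDERSTOOD = at("12:12")  # nature and scope became clear
MITIGATED = at("12:26")         # cluster resources undrained

if __name__ == "__main__":
    print("impact window:", minutes_between(IMPACT_START, MITIGATED), "min")         # 36
    print("alert to scope:", minutes_between(ALERTED, SCOPE_UNDERSTOOD), "min")      # 24
    print("scope to mitigation:", minutes_between(SCOPE_UNDERSTOOD, MITIGATED), "min")  # 14
    print("alert to mitigation:", minutes_between(ALERTED, MITIGATED), "min")        # 38
```

The 36-minute impact window matches the duration given in the Summary.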
Google is committed to preventing a repeat of this issue in the future
and is completing the following actions:
- We are working on adding automated guardrails around drain/undrain
operations for the relevant services to prevent configuration changes
of this type.
- We are making proactive adjustments to how systems handle network
changes to prevent potential disruptions.
- We are addressing the process for draining safely in specific areas
to ensure smooth and reliable operation, and developing a workaround
that will allow us to safely manage our network while maintaining
uninterrupted service.
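The report does not describe how the drain/undrain guardrails will work. As a rough illustration of the general idea only, a pre-drain check might refuse a drain when the remaining clusters could not absorb the redirected load with some headroom. Everything in this sketch is hypothetical, including the `Cluster` type, the `drain_is_safe` function, the QPS figures, and the 80% utilization limit; none of it reflects Google's actual tooling.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Cluster:
    """Hypothetical view of a serving cluster (not a real Google API)."""
    name: str
    capacity_qps: float  # maximum load the cluster can serve
    current_qps: float   # load it is serving right now

def drain_is_safe(clusters: list[Cluster], drain_name: str,
                  max_utilization: float = 0.8) -> bool:
    """Return True if the other clusters can absorb all current load.

    Assumes the drained cluster's traffic is redistributed across the
    remaining clusters, which must stay at or below max_utilization.
    """
    if not any(c.name == drain_name for c in clusters):
        raise ValueError(f"unknown cluster: {drain_name}")
    remaining = [c for c in clusters if c.name != drain_name]
    if not remaining:
        return False  # draining the only cluster leaves nowhere to serve from
    total_load = sum(c.current_qps for c in clusters)
    usable_capacity = sum(c.capacity_qps for c in remaining) * max_utilization
    return total_load <= usable_capacity

# Example with made-up numbers: two busy clusters cannot cover for each other.
busy = [Cluster("dc-a", 100.0, 60.0), Cluster("dc-b", 100.0, 60.0)]
quiet = [Cluster("dc-a", 100.0, 20.0), Cluster("dc-b", 100.0, 30.0)]

if __name__ == "__main__":
    print(drain_is_safe(busy, "dc-a"))   # False: 120 QPS exceeds 80 QPS usable
    print(drain_is_safe(quiet, "dc-a"))  # True: 50 QPS fits within 80 QPS usable
```

A check of this kind would sit in front of the configuration change, so that a drain exceeding the remaining capacity is rejected before any traffic shifts.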
## Detailed Description of Impact
On Monday, 10 February 2025, from 11:50 to 12:26 US/Pacific, multiple
Google Workspace products experienced intermittent '500' errors along
with increased latency, and the Google Cloud Console returned '500'
errors and failed to load for some users in South America.
### Google Workspace:
* Google Chat - 65k affected users
* Google Calendar - 25k affected users
* Gmail - 77k affected users
* Google Meet - 8k affected users
* Google Docs - 1k affected users
### Google Cloud Console:
* Approximately 8k users were affected during the outage,
predominantly in South America. In North America, the effect was very
short-lived (<2 minutes).
Overload and cascading failure in the Data Center caused errors and
slowness for a portion of traffic, which surfaced as intermittent 500
errors on all authenticated apps. Some of the issues may have been
"sticky" for users who were triply affected by broken tasks, leading
to more consistent 500 errors for about 10 minutes.

Link to official detail of this status:
https://www.google.com/appsstatus/dashboard/incidents/cyCMb3ggSVhQkRC4owxN