SUMMARY:
From 20:12 US/Pacific on the 7th April, to 07:08 on the 8th April, some applications using the Google API (GAPI) JavaScript client library experienced errors when using JSON-RPC Google APIs through the library. If your service or application was affected, we apologize — this is not the level of quality and reliability we strive to offer you, and we have taken and are taking immediate steps to improve the platform’s performance and availability.
DETAILED DESCRIPTION OF IMPACT:
Starting at 20:12 US/Pacific on 7 April, clients which loaded a new version of the GAPI library, and used the JSON-RPC interface, began to receive 404 “Not Found” responses from the Google API servers. The issue continued to develop and reached a plateau at 01:13 on 8 April. At this time, about 31% of JSON-RPC calls were affected. Remediation began at 03:57, with the error rate decreasing steadily from then until 07:08. A very small number of clients may have experienced errors after this time.
The most severely affected Google Cloud Platform services were the Cloud Storage object browser in the Cloud Developer Console, and the Cloud Endpoints service.
Clients using JSON-REST calls were not affected, nor were clients using JSON-RPC calls with a cached older version of the GAPI library. In total during the incident, 21% of JSON-RPC GAPI calls, and 0.07% of all GAPI library calls, were affected by this issue.
ROOT CAUSE:
The root cause of the outage was an error in the GAPI client JavaScript library, which caused the ‘apiVersion’ parameter in API calls to be set to “undefined”. The faulty library version was deployed in a software package that was rolled out across the Google servers from 20:12 to 01:13 US/Pacific. In response to calls from this library, the API servers returned a HTTP 404 “Not Found” error, causing the propagation of an error to the client invoking the GAPI library call.
In addition, the fault was detected by automated release testing, but an error in the monitoring configuration prevented the automation from halting the release process.
REMEDIATION AND PREVENTION:
To remedy the issue, as soon as the code error was identified, Google engineers began a rollback to the previous version of the software package that includes the GAPI library. This process took approximately 3 hours 11 minutes to complete across the entire Google estate.
To prevent recurrences, Google engineers will review the GAPI JavaScript library testing and monitoring, and ensure that it is fully integrated with the release automation so that releases are detected and rolled back as necessary.