Google has revealed that a recent six-hour outage at one of its cloudy regions was caused by uninterruptible power supplies not doing their job. The outage commenced on March 29th and caused “degraded service or unavailability” for over 20 Google Cloud services in the us-east5-c zone. The us-east5 region is centered on Columbus, Ohio.
Google’s incident report states that the outage started with “loss of utility power in the affected zone.” Hyperscalers build to survive that sort of thing with uninterruptible power supplies (UPSes) that are supposed to immediately provide power if the grid goes dead, and keep doing so for a few hours before diesel-powered generators kick in. Google’s UPSes, however, suffered a “critical battery failure” and didn’t provide any juice.
They also appear to have prevented power from generators reaching Google’s racks, because the incident report states the advertising giant’s engineers had to bypass the UPSes before power became available. Engineers were alerted to the incident at 12:54 Pacific Time and their efforts saw generators come online at 14:49. “The majority of Google Cloud services recovered shortly thereafter,” the incident report states, although “A few services experienced longer restoration times as manual actions were required in some cases to complete full recovery.”
Google is terribly sorry this happened and “committed to preventing a repeat of this issue in the future.” To avoid similar messes in future, the web giant has promised to do the following:
Oh, to be a fly on the wall when Google meets with that UPS vendor. Hyperscalers promise resilience and mostly succeed, but even their plans can sometimes go awry.
The lesson for the rest of us is that regular testing of all disaster recovery infrastructure and procedures, including what to do when public clouds have outages, is not optional and cannot be put off. ®
Google Cloud’s so-called uninterruptible power supplies caused a six-hour interruption

When the power went out, they didn’t switch on