Lightning Strikes Disrupt Google Data Center
Google data centers in Belgium were recently hit by a series of lightning strikes, which not only took some of its cloud storage systems offline briefly but also caused errors in some customers’ cloud infrastructure. Initial reports said that lightning had struck electrical systems in one of its three data centers in St. Ghislain, a small town about fifty miles southwest of Brussels. It was later clarified that the lightning had not struck the data center itself but had hit the local utility grid, interrupting the data center’s power.
Failover systems can switch to auxiliary power if the primary source goes offline, and servers in the data centers have batteries for extra backup. The servers supporting Persistent Disk, cloud storage that operates independently of compute, were backed up by such batteries. However, some servers still failed because extended use drained the batteries. The incident report stated, “In almost all cases the data was successfully committed to stable storage, although manual intervention was required in order to restore the systems to their normal serving state.”
Over the five days during which problems appeared in the cloud storage systems, Google engineers estimated that around five percent of the persistent disks in the Belgium zone had at least one I/O read or write failure. According to the Google incident report, a minuscule fraction of all the persistent disks, roughly 0.000001 percent, were permanently deleted from servers.
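To put the two percentages in perspective, they can be converted into plain fractions: five percent is one disk in 20, while 0.000001 percent is about one in 100 million. The short Python sketch below does this conversion; the helper name `percent_to_fraction` is purely illustrative and does not come from the incident report.

```python
from fractions import Fraction


def percent_to_fraction(percent: str) -> Fraction:
    """Convert a percentage written as a decimal string into an exact fraction."""
    return Fraction(percent) / 100


# Figures quoted in the article (illustrative variable names).
affected = percent_to_fraction("5")          # disks with at least one I/O failure
lost = percent_to_fraction("0.000001")       # disks reported as permanently lost

for label, share in (("affected", affected), ("lost", lost)):
    one_in = share.denominator // share.numerator
    print(f"{label}: about 1 in {one_in:,}")
```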
Google’s infrastructure teams are working to replace storage systems with hardware that is more resilient to power failure, so that data is better protected in another emergency like this one. According to Google, most of the Persistent Disk storage is already running on this stronger hardware.
Following the outage, Google reminded customers that it operates multiple cloud computing regions around the globe, each containing multiple isolated zones. Users can therefore set up resilient infrastructure that fails over to a different zone if a single zone goes down, as happened in Belgium. Google Compute Engine has three regions: Central US in Council Bluffs, Iowa; Western Europe in St. Ghislain; and East Asia in Changhua County, Taiwan. The Central US region has four zones, while Western Europe and East Asia have three zones each. By spreading workloads across zones, customers can prepare for situations like the one in Belgium.
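The idea of failing over between zones can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions, not Google's mechanism or API: the zone labels, the `pick_zone` helper, and the health check are all hypothetical, and a real deployment would rely on the provider's own health checks and load balancing.

```python
from typing import Callable, Optional, Sequence


def pick_zone(zones: Sequence[str], is_healthy: Callable[[str], bool]) -> Optional[str]:
    """Return the first healthy zone in preference order, or None if all are down."""
    for zone in zones:
        if is_healthy(zone):
            return zone
    return None


# Hypothetical labels for the three Western Europe zones (not real zone names).
WESTERN_EUROPE_ZONES = ["western-europe-zone-1", "western-europe-zone-2", "western-europe-zone-3"]

# Simulate a single-zone outage: the preferred zone is unavailable.
zones_down = {"western-europe-zone-1"}

chosen = pick_zone(WESTERN_EUROPE_ZONES, lambda zone: zone not in zones_down)
print(f"Serving from: {chosen}")
```

Because the zones are tried in preference order, traffic moves to the next zone only when the preferred one is unavailable; if every zone in the region is down, the function returns `None` and the caller would need to fall back to another region.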