Microsoft had problems at its data center near San Antonio, Texas, yesterday, some of which are still ongoing. They caused some of the server and network infrastructure equipment in that facility to shut down, affecting multiple major cloud services for many of its customers.
The cause was a lightning strike during severe storms in the area, which led to what Microsoft describes as a "power voltage increase that impacted cooling systems." Cooling is critical in a data center, so the loss of it triggered Microsoft's automated procedures for keeping data and hardware from being lost or damaged, which began a "structured power down process." By 11:40 AM, Microsoft had restored power to the affected buildings and the systems were coming back online.
As a result of that shutdown, some Azure customers with workloads running in that data center were unable to access their services, and the cooling issues also affected infrastructure serving some Office 365 users. Users who needed Microsoft Active Directory to log into their accounts were also hit. Throughout the process, Microsoft promised to keep users apprised of its efforts to restore full functionality. The catch for anyone following along was that the Azure service status page, where those promised updates are posted, also went down several times during the outage.
Several other Microsoft services were affected by the outage, including Visual Studio Team Services in multiple regions, and some users also had problems with Xbox Live and OneDrive. Microsoft was still trying to bring services back online as of 3:15 PM yesterday afternoon. As of this writing on the morning of 9/5/18, Microsoft has yet to fully recover from the outage. The Azure status page notes that engineers are still working to recover the affected Azure Storage scale units in the data center and the remaining Storage-dependent services in the South Central U.S. region.