Last week was a bad week for Microsoft and Google cloud apps. Microsoft's online services infrastructure experienced an outage affecting some customers in North America. It interrupted Office 365 and various Windows Live services for a few hours. Coincidentally, on the same day, Google's cloud productivity service, Google Docs, went offline for some time.
Google on the outage: The [Google Docs] outage was caused by a change designed to improve real time collaboration within the document list. Unfortunately this change exposed a memory management bug which was only evident under heavy usage … We have assembled a list of steps which will both reduce the chance of a future event, decrease the time required to notice and resolve a problem, and limit the scope which any single problem can affect.
Microsoft on the outage: Microsoft became aware of a Domain Name Service (DNS) problem causing service degradation for multiple cloud-based services. A tool that helps balance network traffic was being updated, and for a currently unknown reason, the update did not work correctly. As a result, the configuration was corrupted, which caused service disruption. We are continuing to review the incident.
Amazon: A few months back, Amazon faced similar outages. In the second week of August, an Amazon EC2 and RDS outage impacted Netflix. In April, Amazon was hit with serious outages for which it had to apologize.
Solution? So far, the outages have been few and quickly addressed. They are being discussed, but they have not yet adversely affected adoption of the cloud. However, they do raise the need for a way to mitigate such risk. Standardization of cloud platforms and keeping a standby on a private cloud or with a secondary provider are possible solutions.
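
To make the standby idea a little more concrete, here is a minimal sketch in Python of trying a primary provider first and falling back to a standby when it fails. Everything in it is an assumption for illustration: the names `call_with_failover`, `primary_cloud`, and `standby_private_cloud` are hypothetical, the outage is simulated, and a real setup would also have to keep data in sync between the two providers, which this sketch ignores.

```python
from typing import Callable, List, Sequence, Tuple


class AllProvidersFailed(Exception):
    """Raised when every provider in the list has failed."""


def call_with_failover(
    providers: Sequence[Tuple[str, Callable[[], str]]]
) -> Tuple[str, str]:
    """Try each (name, fetch) pair in order and return the first success.

    Returns the name of the provider that answered and its result.
    """
    errors: List[str] = []
    for name, fetch in providers:
        try:
            return name, fetch()
        except Exception as exc:  # any failure moves us to the next provider
            errors.append(f"{name}: {exc}")
    raise AllProvidersFailed("; ".join(errors))


# Hypothetical providers, used only to demonstrate the flow.
def primary_cloud() -> str:
    # Simulates the primary cloud being down.
    raise ConnectionError("simulated outage")


def standby_private_cloud() -> str:
    # Simulates a healthy standby on a private cloud or secondary provider.
    return "document contents"


if __name__ == "__main__":
    served_by, data = call_with_failover(
        [("primary", primary_cloud), ("standby", standby_private_cloud)]
    )
    print(f"served by {served_by}: {data}")
```

The order of the list is the order of preference, so adding a third provider is just one more entry. Without a common interface across providers, which is where standardization would help, each `fetch` function has to hide that provider's specific API.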