Amazon Web Services Was Taken Down by a Simple Typo

A single typo was apparently responsible for taking down a chunk of the Internet on Tuesday, Feb. 28, costing companies somewhere around $150 million. The revelation came from an online statement released by Amazon after its popular Amazon Web Services (AWS) platform was taken offline for about four hours that day.

The service disruption, which affected AWS' Simple Storage Service (S3), resulted in problems for many of the Internet's most popular Web sites and services, including Trello, IFTTT, Slack and Gizmodo. According to Web site monitoring firm Apica, 54 of the largest online retailers experienced performance impairments on their Web sites, with some slowing down by more than 20 percent.

Two Subsystems at Fault

"We want to apologize for the impact this event caused for our customers," Amazon said in the statement. "While we are proud of our long track record of availability with Amazon S3, we know how critical this service is to our customers, their applications and end users, and their businesses. We will do everything we can to learn from this event and use it to improve our availability even further."

The cause of the disruption was apparently a single mistyped command, entered by an Amazon team member during an attempt to debug the service's billing system.

"At 9:37AM PST [Feb. 28], an authorized S3 team member using an established playbook executed a command which was intended to remove a small number of servers for one of the S3 subsystems that is used by the S3 billing process," Amazon said. "Unfortunately, one of the inputs to the command was entered incorrectly and a larger set of servers was removed than intended. The servers that were inadvertently removed supported two other S3 subsystems."

The two subsystems affected included an index subsystem that manages the metadata and location information for all...
