Google Blames Outage on Software Bug

If you are a hardcore Google user, you may have been tempted to pull out a few hairs last Friday as several of the company's key services experienced a painful hiccup. Now, Google is shedding some light on the incident.

Specifically, users of logged-in Google services such as Gmail, Google+, Calendar and Documents were unable to access them for about 25 minutes, according to Ben Treynor, Google vice president of Engineering.

"For about 10 percent of users, the problem persisted for as much as 30 minutes longer," he said on Friday. "Whether the effect was brief or lasted the better part of an hour, please accept our apologies -- we strive to make all of Google's services available and fast for you, all the time, and we missed the mark today."

What Really Happened?

Treynor reports that the issue has been resolved, and the company is now focused on correcting the bug that caused the outage, as well as putting more checks and monitors in place to ensure that this kind of problem doesn't happen again. He then offered a technical explanation for what occurred and how it was fixed.

At 10:55 a.m. PST on Friday, Treynor explained, an internal system that generates configurations -- essentially, information that tells other systems how to behave -- encountered a software bug and generated an incorrect configuration. The incorrect configuration was sent to live services over the next 15 minutes and caused users' requests for their data to be ignored; those services, in turn, generated errors.

"Users began seeing these errors on affected services at 11:02 a.m., and at that time our internal monitoring alerted Google's Site Reliability Team. Engineers were still debugging 12 minutes later when the same system, having automatically cleared the original error, generated a new correct configuration at 11:14 a.m. and began sending...
