Hundreds of millions of people were unable to access Facebook, Instagram and WhatsApp for more than six hours on Monday, underscoring the world’s reliance on platforms owned by the Silicon Valley giant.
But what actually caused the outage?
What does Facebook say happened?
In an apologetic blog post, Santosh Janardhan, Facebook’s vice president of infrastructure, said that “configuration changes on the backbone routers that coordinate network traffic between our data centres caused issues that interrupted this communication”.
Can you explain that in plain English?
Cyber experts think the problem boils down to something called BGP, or Border Gateway Protocol, the system the internet's networks use to decide which route packets of information should take to reach their destination.
Sami Slim of data centre company Telehouse compared BGP to “the internet equivalent of air traffic control”.
In the same way that air traffic controllers sometimes make changes to flight schedules, “Facebook did an update of these routes,” Slim said.
But this update contained a crucial error.
It’s not yet clear how or why, but Facebook’s routers essentially sent a message to the internet announcing that the company’s servers no longer existed.
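For readers who want to see the idea in miniature, the Python sketch below models what a route withdrawal does. It is a toy illustration, not Facebook's configuration or a real BGP implementation: the `RoutingTable` class, the `example-network` label and the address block `192.0.2.0/24` (a range reserved for documentation) are all assumptions made for the example.

```python
import ipaddress


class RoutingTable:
    """Toy model of the routes one network has learned from its neighbours."""

    def __init__(self):
        # Maps an address block (prefix) to the neighbour that announced it.
        self.routes = {}

    def announce(self, prefix, next_hop):
        """Record that `next_hop` says it can deliver traffic for `prefix`."""
        self.routes[ipaddress.ip_network(prefix)] = next_hop

    def withdraw(self, prefix):
        """Forget a previously announced prefix, if it is present."""
        self.routes.pop(ipaddress.ip_network(prefix), None)

    def lookup(self, address):
        """Return the next hop for `address`, or None if no route covers it."""
        addr = ipaddress.ip_address(address)
        matches = [net for net in self.routes if addr in net]
        if not matches:
            return None
        # Prefer the most specific (longest) matching prefix.
        best = max(matches, key=lambda net: net.prefixlen)
        return self.routes[best]


table = RoutingTable()
table.announce("192.0.2.0/24", "example-network")  # hypothetical announcement
print(table.lookup("192.0.2.10"))  # a route exists, so traffic can flow

table.withdraw("192.0.2.0/24")     # the faulty update removes the route
print(table.lookup("192.0.2.10"))  # None: nobody knows how to reach the servers
```

Once the route is withdrawn, the servers may be running perfectly well, but the rest of the internet has no path to them.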
Why did it take so long to fix the problem?
Experts say Facebook’s technical infrastructure is unusually reliant on its own systems — and that proved disastrous on Monday.
After Facebook sent the fateful routing update, its engineers were locked out of the system that would have allowed them to signal that the update had been an error, so they could not fix the problem.
“Normally it’s good not to put all your eggs in one basket,” said Pierre Bonis of AFNIC, the association that manages domain names in France.
“For security reasons, Facebook has had to very strongly concentrate its infrastructure,” he said.
“That streamlines things on a daily basis — but because everything is in the same place, when that place has a problem, nothing works.”
The knock-on effects of the shutdown included some Facebook employees being unable to even enter their buildings because their security badges no longer worked, further slowing the response.
Is this unprecedented?
Social media outages are not uncommon: Instagram alone has experienced more than 80 in the past year in the United States, according to website builder ToolTester.
This week’s Facebook outage was rare in its length and scale, however.
There is also a precedent for a BGP mishap being at the root of a social media shutdown.
In 2008, when a Pakistani internet service provider was attempting to block YouTube for domestic users, it inadvertently made the site unreachable worldwide for several hours.
And the outage’s impact?
Between Facebook, Instagram, WhatsApp and Facebook Messenger, “billions of users have been impacted by the services being entirely offline”, the Downdetector tracking service said.
Facebook, whose shares fell nearly five percent over the outage, has stressed there is “no evidence that user data was compromised as a result of this downtime”.
But even though it lasted only a matter of hours, the impact of the shutdown ran deep.
Facebook’s services are crucial for many businesses around the world, and users complained of being cut off from their livelihoods.
Facebook accounts are also commonly used to log in to other websites, which faced additional problems due to the company’s technical meltdown.
Rival instant messaging services meanwhile reported that they had benefited from the fact that WhatsApp and Facebook Messenger were down.
Telegram went from the 56th most downloaded free app in the US to the fifth, according to monitoring firm SensorTower, while Signal tweeted that “millions” of new users had joined.
And among the more curious side-effects, several domain name registration companies listed Facebook.com as available for purchase.
“There was never any reason to believe Facebook.com would actually be sold as a result, but it’s fun to consider how many billions of dollars it could fetch on the open market,” said cyber security expert Brian Krebs.