Facebook blames faulty configuration change for hours-long outage

The update caused a "cascading effect" that brought all of the social network's services to a halt

A faulty configuration change has been blamed for taking Facebook, WhatsApp and Instagram offline for more than six hours on Monday night. 

The social network's engineering team said that the changes affected the routers that coordinate the platform's network traffic between its data centres. This, they said, caused a "cascading effect" on the way its data centres communicate, bringing all of the company's services to a halt. 

"Our services are now back online and we're actively working to fully return them to regular operations," the company said in a blog post. "We want to make clear at this time we believe the root cause of this outage was a faulty configuration change. We also have no evidence that user data was compromised as a result of this downtime."

In order to remedy the issue, Facebook sent engineers to one of its main data centres in California, according to The New York Times, suggesting it couldn't be fixed remotely. It was also reported that the outage prevented staff from accessing company buildings and conference rooms with their badges.

The incident caught the attention of internet giant Cloudflare, which initially assumed something was wrong with its own DNS servers. After investigating, however, its engineers realised something more serious was happening, and reported in a blog post that "social media quickly burst into flames."

"Facebook and its affiliated services WhatsApp and Instagram were, in fact, all down," Cloudflare said. "Their DNS names stopped resolving, and their infrastructure IPs were unreachable. It was as if someone had 'pulled the cables' from their data centres all at once and disconnected them from the Internet."
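For readers who want to see what "stopped resolving" looks like from a client's side, the following is a minimal sketch using only Python's standard library. The helper name `resolves` and the injectable `resolver` parameter are illustrative assumptions, not anything Cloudflare or Facebook published.

```python
import socket


def resolves(hostname, resolver=socket.getaddrinfo):
    """Return True if the hostname resolves to at least one address.

    `resolver` defaults to the system resolver; it is a parameter only so
    the function can be exercised without network access (an assumption
    made for this sketch).
    """
    try:
        # getaddrinfo raises socket.gaierror when the name cannot be resolved.
        return len(resolver(hostname, 443)) > 0
    except socket.gaierror:
        return False
```

A client calling `resolves("facebook.com")` while the name was not resolving would be expected to get `False`, since the lookup fails before any connection is attempted.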

The issues were down to the Border Gateway Protocol (BGP), a mechanism for exchanging routing information between autonomous systems on the internet. The large routers that make the internet work keep huge, constantly updated lists of the possible routes for traffic, according to Cloudflare.

"The Internet is literally a network of networks, and it's bound together by BGP," the firm said in its blog. "BGP allows one network (say Facebook) to advertise its presence to other networks that form the Internet. As we write Facebook is not advertising its presence, ISPs and other networks can't find Facebook's network and so it is unavailable."
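The effect Cloudflare describes can be illustrated with a toy model. The sketch below is not a BGP implementation: the `RoutingTable` class, the prefix `192.0.2.0/24` and the AS number `64500` are placeholders drawn from ranges reserved for documentation, chosen purely for illustration. It shows only the core idea that once a network withdraws its advertised routes, other networks have no path to its addresses.

```python
import ipaddress


class RoutingTable:
    """Toy model of the routes a router has learned from its peers."""

    def __init__(self):
        self.routes = {}  # maps a network prefix to the AS that announced it

    def advertise(self, prefix, origin_as):
        """Record that `origin_as` announces reachability for `prefix`."""
        self.routes[ipaddress.ip_network(prefix)] = origin_as

    def withdraw(self, prefix):
        """Remove a previously announced prefix, if present."""
        self.routes.pop(ipaddress.ip_network(prefix), None)

    def lookup(self, address):
        """Return the origin AS for the most specific matching prefix, or None."""
        addr = ipaddress.ip_address(address)
        matches = [net for net in self.routes if addr in net]
        if not matches:
            return None  # no route: the destination is unreachable
        best = max(matches, key=lambda net: net.prefixlen)
        return self.routes[best]


table = RoutingTable()
table.advertise("192.0.2.0/24", 64500)     # hypothetical network announces itself
before = table.lookup("192.0.2.10")        # route known
table.withdraw("192.0.2.0/24")             # announcement disappears
after = table.lookup("192.0.2.10")         # no route left
```

In this model `before` holds the announcing AS number and `after` is `None`, which mirrors the situation in which ISPs and other networks could no longer find Facebook's network.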
