Bengaluru, NFAPost: Problems with Cloudflare's 1.1.1.1 DNS service left many major websites and services unreachable for a period on Friday afternoon. The company moved quickly to fix the fault, and the downtime was confined to certain geographies.
Commenting on the outage in a blog post, Cloudflare CTO John Graham-Cumming said that the cause of the 50% drop in traffic across the network, and of the subsequent internet outages, was “a configuration error in our backbone network.”
The outage appears to have started at about 2:15 p.m. Pacific time and lasted for about 25 minutes before connections began to be restored. Google DNS may also have been affected. The company also issued a statement via email emphasising that this was not an attack on the system.
“This afternoon we saw an outage across some parts of our network. It was not as a result of an attack. It appears a router on our global backbone announced bad routes and caused some portions of the network to not be available. We believe we have addressed the root cause and are monitoring systems for stability now. We will share more shortly; we have a team writing an update as we speak,” the company said in a statement.
Backbone From Newark to Chicago
Graham-Cumming said the outage occurred because, while working on an unrelated issue with a segment of the backbone from Newark to Chicago, the company’s network engineering team updated the configuration on a router in Atlanta to alleviate congestion.
“This configuration contained an error that caused all traffic across our backbone to be sent to Atlanta. This quickly overwhelmed the Atlanta router and caused Cloudflare network locations connected to the backbone to fail,” Graham-Cumming said.
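How one misconfigured router can pull in all backbone traffic can be illustrated with a toy route-selection model. This is a simplified sketch under stated assumptions, not Cloudflare's actual configuration: it assumes each backbone site picks, for every destination, the advertised route with the highest preference value (loosely modeled on BGP's LOCAL_PREF attribute), and the site names, prefixes and preference numbers are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Route:
    prefix: str      # destination network (hypothetical documentation ranges)
    next_hop: str    # backbone site advertising the route
    local_pref: int  # higher value wins; assumed to be the only tie-breaker here

def best_paths(routes):
    """For each prefix, keep the advertised route with the highest local_pref."""
    best = {}
    for r in routes:
        current = best.get(r.prefix)
        if current is None or r.local_pref > current.local_pref:
            best[r.prefix] = r
    return best

# Normal state: each site advertises only its own prefix (hypothetical values).
sites = {"Newark": "192.0.2.0/24", "Chicago": "198.51.100.0/24", "Atlanta": "203.0.113.0/24"}
normal = [Route(prefix, site, 100) for site, prefix in sites.items()]

# Faulty state (assumed): Atlanta also advertises every prefix with a higher preference.
faulty = normal + [Route(prefix, "Atlanta", 200) for prefix in sites.values()]

for label, routes in (("normal", normal), ("faulty", faulty)):
    table = best_paths(routes)
    print(label, {prefix: r.next_hop for prefix, r in table.items()})
```

In the normal table each prefix is reached via its own site; once the faulty advertisements are added, every prefix resolves to Atlanta, so all traffic converges on a single router.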
Not only were websites down, but so were some status pages meant to provide warnings and track outages. In at least one case, even the status page for the status page was down. The affected locations were San Jose, Dallas, Seattle, Los Angeles, Chicago, Washington, DC, Richmond, Newark, Atlanta, London, Amsterdam, Frankfurt, Paris, Stockholm, Moscow, St. Petersburg, São Paulo, Curitiba, and Porto Alegre. Other locations continued to operate normally.
“We are sorry for the disruption to our customers and to all the users who were unable to access Internet properties while the outage was happening,” Graham-Cumming said.
Cloudflare wrote in a tweet and an update to its own status page (which thankfully remained available) that it was “investigating issues with Cloudflare Resolver and our edge network in certain locations. Customers using Cloudflare services in certain regions are impacted as requests might fail and/or errors may be displayed.”
Some of the services and sites also relied on Google’s Public DNS service (8.8.8.8), which appeared to be having simultaneous issues, but no site has been able to confirm this directly. Google shows no interruption to services on its status dashboard.
Despite much speculation as to the cause of the outage, there is no evidence that it was caused by a denial-of-service attack or any other form of malicious hackery.
Cloudflare regularly protects its customers from massive distributed denial-of-service (DDoS) attacks. These attacks are increasingly sophisticated, often throwing heavy resource loads at Cloudflare’s routers and appliances in an attempt to take sites down.
Cloudflare operates a backbone between many of its data centers around the world. The backbone is a series of private lines between its data centers that provides faster and more reliable paths between them. These links allow the company to carry traffic between different data centers without going over the public Internet.
Graham-Cumming stated that the company had never before experienced an outage on its backbone and that its team responded quickly to restore service in the affected locations, but that this was a very painful period for everyone involved.