FCC Report: AT&T 9-1-1 Outage Likely Could Have Been Prevented
Thursday, May 18, 2017

A report from the FCC’s Public Safety and Homeland Security Bureau (PSHSB) found that a nationwide 9-1-1 outage on AT&T’s wireless network on March 8 was caused by “an error that likely could have been avoided had AT&T implemented additional checks with respect to their critical 9-1-1 network assets.”

During the outage, which lasted five hours, about 12,600 users tried to call 9-1-1 but were unable to reach emergency services through the traditional 9-1-1 network. It was one of the largest 9-1-1 outages ever reported in the FCC’s network outage reporting system (NORS), as measured by the number of unique users affected, the FCC report said.

“The findings are highly instructive,” FCC Chairman Ajit Pai said in a statement. “Most importantly, this outage could have been prevented. The Bureau’s report shows that there were shortfalls in operational redundancies, risk assessment, and stakeholder and consumer outreach.”

The failures that caused the outage occurred entirely within AT&T’s network, the FCC report said. The outage was traced back to an incorrect record of whitelisted IP addresses that the company placed in its customer provisioning system. The record contained the wrong IP addresses for Comtech, one of two companies that provide 9-1-1 call routing information for AT&T.

On March 8, while the company was working on an unrelated project, a network change initiated by AT&T pushed the incorrect record onto the live network, breaking AT&T’s connection to Comtech and disrupting the flow of information to AT&T’s network about which public-safety answering points (PSAPs) should receive certain 9-1-1 calls.

A soft reset, triggered when error messages reached a certain threshold, then disrupted transmissions from West Safety Services, the other company that provides 9-1-1 call routing functions for AT&T, because transmissions for both 9-1-1 subcontractors traveled over the same link.
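The way one reset could cut off both vendors can be illustrated with a toy model. This is only a sketch under stated assumptions: the `Link` class, the threshold of 3, and the `reachable` helper are hypothetical and do not describe AT&T's actual equipment or configuration.

```python
from dataclasses import dataclass


@dataclass
class Link:
    """Hypothetical link carrying traffic for one or more routing vendors."""
    vendors: frozenset
    error_threshold: int = 3  # assumed value, for illustration only
    errors: int = 0
    up: bool = True

    def record_error(self) -> None:
        """Count an error; reset the link once the threshold is reached."""
        self.errors += 1
        if self.errors >= self.error_threshold:
            # A reset interrupts every vendor that shares this link.
            self.up = False


def reachable(vendor: str, links: list) -> bool:
    """A vendor is reachable if any working link carries its traffic."""
    return any(link.up and vendor in link.vendors for link in links)


# Shared link: errors tied to one vendor take down both.
shared = [Link(frozenset({"Comtech", "West"}))]
for _ in range(3):
    shared[0].record_error()

# Separate links: the same errors affect only the link that saw them.
separate = [Link(frozenset({"Comtech"})), Link(frozenset({"West"}))]
for _ in range(3):
    separate[0].record_error()

print(reachable("West", shared))    # False
print(reachable("West", separate))  # True
```

In the separate-link case the second vendor stays reachable, which is the property the report's third remediation step aims for.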

While the connections to Comtech and West were down, AT&T routed 9-1-1 calls to a backup call center with call-takers who could manually route the calls to the correct PSAP. However, the backup call center was not equipped to address a nationwide outage and could not handle all of the additional traffic caused by the outage, leading to many calls being dropped, the report said.

The report highlighted four major steps AT&T has taken to prevent a similar outage from occurring, including:
1. Treating connections between itself and 9-1-1 call routing subcontractors as infrastructure assets instead of customer assets. Changes to infrastructure assets have to undergo more rigorous testing before implementation on the live network. Using this method before the outage would have likely caught the incorrect list of IP addresses before it was implemented, the report said.
2. Modifying its internal alarm system so error reports are received immediately and concurrently by its 9-1-1 and VoLTE troubleshooting teams and its IP team. During the outage, different teams were notified at different times, leading to some delays in the response.
3. Creating a separate link to each of its 9-1-1 call-routing subcontractors so that an outage affecting one provider does not affect the other.
4. Implementing a manual process to drop VoLTE services and move to 3G for 9-1-1 calls during VoLTE 9-1-1 outages. 9-1-1 calls on AT&T’s 3G network were not affected by the outage.
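The kind of pre-deployment check described in the first step can be sketched in a few lines. This is a hypothetical illustration, not AT&T's process: the function `find_whitelist_errors`, the vendor keys, and the addresses (taken from ranges reserved for documentation) are all assumptions. It compares a proposed whitelist against a trusted reference and reports any mismatch before the record goes live.

```python
import ipaddress


def find_whitelist_errors(proposed: dict, expected: dict) -> list:
    """Return a list of problems; an empty list means the check passed."""
    problems = []
    for vendor, addrs in expected.items():
        want = {ipaddress.ip_address(a) for a in addrs}
        got = set()
        for raw in proposed.get(vendor, []):
            try:
                got.add(ipaddress.ip_address(raw))
            except ValueError:
                problems.append(f"{vendor}: invalid address {raw!r}")
        # Sort by string form so the report order is stable.
        for missing in sorted(want - got, key=str):
            problems.append(f"{vendor}: missing {missing}")
        for extra in sorted(got - want, key=str):
            problems.append(f"{vendor}: unexpected {extra}")
    return problems


# Hypothetical reference data using documentation-only address ranges.
expected = {"vendor_a": ["192.0.2.10"], "vendor_b": ["198.51.100.20"]}
proposed = {"vendor_a": ["192.0.2.11"], "vendor_b": ["198.51.100.20"]}

for problem in find_whitelist_errors(proposed, expected):
    print(problem)
# vendor_a: missing 192.0.2.10
# vendor_a: unexpected 192.0.2.11
```

A change pipeline could refuse to push any record for which this list is not empty, which is one way to treat such connections as infrastructure assets rather than customer assets.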

“The good news is that AT&T has now addressed the vulnerabilities that led to this outage,” Pai said in his statement. “Had these safeguards been in place on March 8, it is exceedingly unlikely that this outage would have occurred.”

Pai encouraged other carriers to assess their networks and address any similar vulnerabilities, and asked PSAPs and industry and consumer groups to explore ways to improve notifications to both PSAPs and consumers when outages occur.

“We can’t turn back time and undo this outage,” Pai’s statement said. “But by learning the right lessons from it, we can, must and will reduce the odds that such an outage will happen again.”

When reached for comment, an AT&T spokesman noted the steps that the company has taken to prevent another outage from occurring.

“We’ve done an extensive evaluation of this outage and concluded it was caused by a system configuration change affecting connectivity between a 9-1-1 vendor and our network,” the spokesman said. “We’ve taken steps to prevent this from happening again.”

Find the full report here.
