John Heidemann

Detecting Internet Outages with Precise Active Probing (extended)

Title: Detecting Internet Outages with Precise Active Probing (extended)
Publication Type: Technical Report
Year of Publication: 2012
Authors: L. Quan, J. Heidemann, and Y. Pradkin
Date Published: February

Parts of the Internet are down every day. Causes range from the intentional shutdown of the Egyptian Internet in Jan. 2011 and natural disasters such as the Mar. 2011 Japanese earthquake to the thousands of small, daily outages caused by localized accidents or human error. In this paper we present a new system to detect network outages by active probing. We show that a single PC can track outages across the entire analyzable IPv4 Internet by probing a sample of 20 addresses in each of the 2.5M responsive /24 address blocks. We develop new algorithms to identify and visualize outages and to cluster those outages into network-level events. We carefully validate our approach to active probing, showing consistent results over two years of observations taken from three different sites. Using public BGP archives and news sources, we confirm 83% of large events. We also examine a random sample of 50 observed events and confirm prior work showing that small outages often do not appear in control-plane messages: only 38% of small events include partial control-plane information. Emulating controlled outages, we show that our approach detects 100% of full-block outages that last at least twice our probing interval. We show that our system is significantly more accurate than prior approaches that use a single representative for each routed block, cutting the outage misclassification rate from about 44% to under 8%. Finally, we report on Internet stability as a whole and on the size and duration of typical outages. We find that about 0.3% of the Internet is likely to be unreachable at any time, suggesting that the Internet provides only 2.5 "nines" of availability. By providing a baseline estimate of Internet outages, we lay the groundwork for evaluating ISP reliability.
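To put the single-PC claim in perspective, here is a sketch of the probe budget implied by the abstract's numbers: 20 addresses in each of roughly 2.5M responsive /24 blocks comes to 50M probes per round. The abstract does not state the probing interval, so it is left as a parameter, and the intervals tried below are hypothetical. The function names are illustrative, not taken from the report.

```python
"""Probe-budget arithmetic for sampling 20 addresses in every responsive /24."""

ADDRESSES_PER_BLOCK = 20       # sample size per /24, from the abstract
RESPONSIVE_BLOCKS = 2_500_000  # approximate count of responsive /24s, from the abstract


def probes_per_round(addrs_per_block: int = ADDRESSES_PER_BLOCK,
                     blocks: int = RESPONSIVE_BLOCKS) -> int:
    """Probes needed to contact every sampled address once."""
    return addrs_per_block * blocks


def average_probe_rate(interval_minutes: float) -> float:
    """Mean probes per second if one round is spread evenly over the interval.

    The interval is a free parameter (an assumption of this sketch).
    """
    if interval_minutes <= 0:
        raise ValueError("interval must be positive")
    return probes_per_round() / (interval_minutes * 60)


if __name__ == "__main__":
    print(probes_per_round())  # 50000000
    for minutes in (5, 10, 30):  # hypothetical probing intervals
        print(f"{minutes:>2} min per round -> "
              f"{average_probe_rate(minutes):,.0f} probes/s")
```

The arithmetic makes the scale concrete. Even with a 5-minute round, the average rate is about 167,000 probes per second, a load that commodity hardware can plausibly sustain when it is paced evenly.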
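The abstract says outages are clustered into network-level events but does not describe the method. As a deliberately simple stand-in, and not the report's algorithm, this sketch groups per-block outages whose start and end rounds agree within a small tolerance. The idea is that blocks behind the same failure tend to go down and come back together. The types, field names, tolerance, and block labels are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class BlockOutage:
    block: str  # hypothetical /24 label
    start: int  # first probing round in which the block was judged down
    end: int    # first probing round in which it was judged up again


@dataclass
class Event:
    start: int
    end: int
    outages: list[BlockOutage] = field(default_factory=list)


def cluster_outages(outages: list[BlockOutage], tolerance: int = 1) -> list[Event]:
    """Greedy first-fit grouping: an outage joins the first event whose start
    and end rounds are both within `tolerance` rounds of its own; otherwise it
    starts a new event. Illustrative only, not the report's algorithm."""
    events: list[Event] = []
    for outage in sorted(outages, key=lambda o: (o.start, o.end)):
        for event in events:
            if (abs(outage.start - event.start) <= tolerance
                    and abs(outage.end - event.end) <= tolerance):
                event.outages.append(outage)
                break
        else:  # no existing event matched, so open a new one
            events.append(Event(outage.start, outage.end, [outage]))
    return events


if __name__ == "__main__":
    observed = [
        BlockOutage("A", 10, 20),
        BlockOutage("B", 10, 21),
        BlockOutage("C", 11, 20),
        BlockOutage("D", 50, 55),
    ]
    for event in cluster_outages(observed):
        print(event.start, event.end, [o.block for o in event.outages])
    # 10 20 ['A', 'B', 'C']
    # 50 55 ['D']
```

Each outage is compared only with the first member of an event and is assigned greedily. As a result, grouping near the tolerance boundary depends on input order. A production system would likely need a more robust distance between outage timelines.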
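One way to see why "at least twice our probing interval" is the threshold for catching every full-block outage uses a simplified model, assumed here rather than taken from the report. Time is divided into probing rounds of length T, and a block is declared down only when an outage spans an entire round. An outage lasting 2T or more must contain a whole round no matter when it starts. A shorter one can straddle two rounds without fully covering either. The function names and the detection model are illustrative.

```python
import math


def covers_full_round(start: float, duration: float, interval: float) -> bool:
    """True if the outage [start, start + duration] contains a whole probing
    round [k*interval, (k+1)*interval] for some integer k."""
    first_round_start = math.ceil(start / interval) * interval
    return first_round_start + interval <= start + duration


def detected_fraction(duration: float, interval: float, phases: int = 1000) -> float:
    """Fraction of evenly spaced outage start times within one round for which
    the outage covers a full round (the assumed detection rule)."""
    hits = sum(
        covers_full_round(i * interval / phases, duration, interval)
        for i in range(phases)
    )
    return hits / phases


if __name__ == "__main__":
    T = 1.0  # probing interval, in arbitrary time units
    for multiple in (1.0, 1.5, 1.99, 2.0, 3.0):
        print(f"outage of {multiple}T: detected for "
              f"{detected_fraction(multiple * T, T):.1%} of start times")
```

Under this model, detection is guaranteed only from 2T upward. Below that threshold, whether an outage is caught depends on how its start lines up with the probing rounds.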
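The benefit of probing many addresses per block instead of a single representative can be illustrated under a strong simplifying assumption. Suppose each address in a healthy block fails to answer independently with probability p (for example, because a host is switched off), and a block is declared down only when every probed address fails. A single representative then misreports a healthy block with probability p, while n probes do so with probability p^n. Real non-responses are correlated, and the report's actual decision rule may differ. This sketch therefore shows only the direction of the effect, not the measured 44% and 8% figures. The per-address probability used below is hypothetical.

```python
import random


def false_outage_probability(p_unresponsive: float, probes: int) -> float:
    """Chance a healthy block is misreported as down when all `probes` sampled
    addresses must fail, assuming independent non-responses with probability
    p_unresponsive (a simplification)."""
    if not 0.0 <= p_unresponsive <= 1.0:
        raise ValueError("p_unresponsive must be in [0, 1]")
    if probes < 1:
        raise ValueError("need at least one probe")
    return p_unresponsive ** probes


def simulate(p_unresponsive: float, probes: int,
             trials: int = 100_000, seed: int = 1) -> float:
    """Monte Carlo estimate of the same quantity under the same assumption."""
    rng = random.Random(seed)
    misreports = 0
    for _ in range(trials):
        # The healthy block looks down only if no probed address answers.
        if all(rng.random() < p_unresponsive for _ in range(probes)):
            misreports += 1
    return misreports / trials


if __name__ == "__main__":
    p = 0.3  # hypothetical per-address non-response probability
    for n in (1, 20):
        print(f"{n:>2} probe(s): analytic {false_outage_probability(p, n):.2e}, "
              f"simulated {simulate(p, n):.2e}")
```

Requiring every address to fail favors specificity over sensitivity. Under this rule, a partial outage that leaves even one probed address reachable would not be flagged.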
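The "2.5 nines" figure follows from the 0.3% unreachability estimate. Take 1 - 0.003 = 99.7% as the availability and count nines as -log10(1 - availability). The second function converts the same fraction into hours per year. That conversion assumes, for illustration only, that the unreachable fraction is spread evenly over blocks and time.

```python
import math

HOURS_PER_YEAR = 365.25 * 24


def nines(unavailable_fraction: float) -> float:
    """Availability as a count of nines: -log10(1 - availability)."""
    if not 0.0 < unavailable_fraction < 1.0:
        raise ValueError("fraction must be strictly between 0 and 1")
    return -math.log10(unavailable_fraction)


def hours_down_per_year(unavailable_fraction: float) -> float:
    """Downtime implied by a constant unreachable fraction (illustrative)."""
    return unavailable_fraction * HOURS_PER_YEAR


if __name__ == "__main__":
    f = 0.003  # about 0.3% unreachable, from the abstract
    print(f"{nines(f):.2f} nines")               # 2.52 nines
    print(f"{hours_down_per_year(f):.1f} h/yr")  # 26.3 h/yr
```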