University of Southern California

62 Days + Almost 3 Billion Pings + New Visualization Scheme = the First Internet Census Since 1982

October 8, 2007

Researchers at the University of Southern California Information Sciences Institute, one of the birthplaces of the Internet decades ago, have just completed and plotted a comprehensive census of all of the more than 2.8 billion allocated addresses on the Internet -- the first complete effort of its kind in more than two decades, they say.

"An Internet census," explains John Heidemann, an ISI project leader who also has an appointment in the USC Viterbi School of Engineering computer science department, "is just that: every single assigned address in the entire Internet was sent a probe."

The technical name for an Internet probe, more commonly called a "ping," is an "Internet Control Message Protocol (ICMP) echo request packet." It took some 62 days to send almost 3 billion of these from four machines, an effort carried out by Heidemann's ISI collaborator Yuri Pradkin.
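For readers curious what such a probe looks like on the wire, the following is a minimal sketch of how an ICMP echo request could be assembled, following the packet layout in RFC 792 (type 8, code 0, a 16-bit checksum, an identifier and a sequence number). It is not the census team's software; the function names and the payload are illustrative assumptions, and the packet is only built here, not sent.

```python
import struct


def internet_checksum(data: bytes) -> int:
    """Ones'-complement sum of 16-bit words, as used by ICMP (RFC 792 / RFC 1071)."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length data with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
    # Fold any carries above 16 bits back into the low 16 bits.
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def build_echo_request(identifier: int, sequence: int, payload: bytes = b"") -> bytes:
    """Build an ICMP echo request (type 8, code 0). Hypothetical helper for illustration."""
    # The checksum is computed with the checksum field itself set to zero.
    header = struct.pack("!BBHHH", 8, 0, 0, identifier, sequence)
    checksum = internet_checksum(header + payload)
    return struct.pack("!BBHHH", 8, 0, checksum, identifier, sequence) + payload


packet = build_echo_request(identifier=1, sequence=1, payload=b"census")
print(packet.hex())
```

A receiver can validate the packet by running the same checksum over the whole message; a correct packet yields zero.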

A detailed account of the research is at http://www.isi.edu/ant/address/index.html

Many (61 percent) of the pings received no response at all. Many others got a "do not disturb" or "no information available" response that many network administrators program into their routers and firewalls. Some of the non-replies were probably also due to firewalls intentionally blocking the pings. Still, as the census went on, millions of sites did respond, positively and negatively, and a unique Internet atlas took shape.

Below: Pradkin, left, and Heidemann.

The atlas is not geographic, though geographic areas (North America, Europe, etc.) show up on it. Instead, it is numerical, building on the mathematical structure of the Internet address system.

Each Internet address is a number between 0 and 2 to the 32nd power minus 1 (4,294,967,295), usually written in "dotted-decimal notation" as four base-10 numbers separated by periods; for example 128.150.4.107. Each number represents one 8-bit part of the whole address.
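The relationship between the dotted-decimal form and the underlying 32-bit number can be shown with a short sketch. The function names are illustrative assumptions; the example address is the one used above.

```python
def dotted_to_int(address: str) -> int:
    """Convert dotted-decimal notation (e.g. '128.150.4.107') to a 32-bit integer."""
    parts = [int(p) for p in address.split(".")]
    if len(parts) != 4 or any(not 0 <= p <= 255 for p in parts):
        raise ValueError(f"not a valid dotted-decimal address: {address!r}")
    value = 0
    for part in parts:
        # Each of the four numbers contributes one 8-bit slice of the address.
        value = (value << 8) | part
    return value


def int_to_dotted(value: int) -> str:
    """Convert a 32-bit integer back to dotted-decimal notation."""
    if not 0 <= value <= 0xFFFFFFFF:
        raise ValueError("address must fit in 32 bits")
    return ".".join(str((value >> shift) & 0xFF) for shift in (24, 16, 8, 0))


print(dotted_to_int("128.150.4.107"))   # 2157315179
print(int_to_dotted(4294967295))        # 255.255.255.255
```

The largest address, 255.255.255.255, corresponds to 4,294,967,295, which is 2 to the 32nd power minus 1.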

These addresses appear in the chart as a grid of squares, each square representing all the addresses beginning with the same first number ("128," in the preceding example). The map is arranged not in simple ascending numerical order, but instead in a looping pattern called a Hilbert curve, which keeps adjacent addresses physically near each other, and also makes it possible to zoom seamlessly in to show greater detail. "The idea of using a Hilbert curve actually came from a web comic, xkcd," said Heidemann.
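To illustrate how a Hilbert curve places consecutive numbers in neighboring cells, here is a minimal sketch using the widely published iterative index-to-coordinate conversion. It assumes a 16 x 16 grid with one cell per possible first number (0 through 255); the starting corner and orientation of the curve on the actual ISI map may differ, and the function name is an illustrative assumption.

```python
def hilbert_d2xy(side: int, d: int) -> tuple[int, int]:
    """Map position d along a Hilbert curve to (x, y) on a side x side grid.

    side must be a power of two; d runs from 0 to side*side - 1.
    """
    x = y = 0
    t = d
    s = 1
    while s < side:
        rx = 1 & (t // 2)
        ry = 1 & (t ^ rx)
        # Rotate or flip the quadrant so the sub-curves join end to end.
        if ry == 0:
            if rx == 1:
                x = s - 1 - x
                y = s - 1 - y
            x, y = y, x
        x += s * rx
        y += s * ry
        t //= 4
        s *= 2
    return x, y


# Place each possible first number (0..255) on a 16 x 16 grid.
cells = [hilbert_d2xy(16, first_number) for first_number in range(256)]
print(cells[128])  # grid cell for addresses beginning with 128
```

Because each step along the curve moves to a cell that shares an edge with the previous one, blocks that are numerically adjacent also sit next to each other on the grid, which a simple row-by-row layout would not guarantee at the end of each row.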

The smallest feature the map shows is a single pixel, which records averaged responses from some 65,536 (2 to the 16th) addresses. The averaging is conveyed by color coding, with all-positive responses showing up as brilliant green, all-negative as brilliant red, equal numbers as brilliant yellow, with brilliance decreasing down to dim shades in areas where fewer addresses respond.
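The color scheme described above could be approximated along the following lines. This is a sketch under stated assumptions, not the researchers' actual palette: it assumes brightness scales linearly with the fraction of the 65,536 addresses that responded, and that the red and green channels vary linearly with the balance of negative and positive replies. The function name and the linear scaling are hypothetical.

```python
ADDRESSES_PER_PIXEL = 65_536  # 2 to the 16th, as stated in the article


def pixel_color(positive: int, negative: int,
                total: int = ADDRESSES_PER_PIXEL) -> tuple[int, int, int]:
    """Return an (R, G, B) color for one pixel of the map (assumed linear scheme)."""
    responded = positive + negative
    if responded == 0:
        return (0, 0, 0)  # nothing responded: darkest shade
    share_positive = positive / responded
    # Hue: all positive -> green, all negative -> red, equal numbers -> yellow.
    red = min(1.0, 2.0 * (1.0 - share_positive))
    green = min(1.0, 2.0 * share_positive)
    # Brilliance: dimmer where fewer of the pixel's addresses responded.
    brightness = responded / total
    return (round(255 * red * brightness), round(255 * green * brightness), 0)


print(pixel_color(65_536, 0))        # all positive
print(pixel_color(0, 65_536))        # all negative
print(pixel_color(32_768, 32_768))   # equal numbers
print(pixel_color(100, 100))         # few responses
```

With these assumptions, a fully responsive, all-positive block renders as pure green, while a sparsely responding block with the same mix of replies keeps its hue but fades toward black.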

The map presents a novel census view of the visible Internet. "To our knowledge," said Heidemann, "the only other census of the Internet was in 1982," when the Internet consisted of 315 allocated addresses.

Heidemann and Pradkin have also plotted a second rendering where each pixel represents a single address. When printed out at laser-printer resolution, this map, which literally shows every address in the Internet, took up a 9x9 foot space on a corridor wall at a recent conference. (see photo below)


The project is continuing. Heidemann hopes to continue censuses to create not just a snapshot, which is what the current map is, but a dynamic movie of Internet evolution, which can aid in detecting and monitoring trends. He and his collaborators are intensively studying the census results, working toward this goal.

While the new census is the first they have visualized, ISI has been taking censuses since 2003, when Pradkin and Joseph Bannister (of ISI) and Ramesh Govindan (of the USC Viterbi School of Engineering) started collecting data. Their hopes were to study the growth of the Internet, and their group is still processing this data to look for trends.

"Internet census data is useful for several reasons," Heidemann says. "As Internet use becomes widespread, we are running out of Internet addresses; good predictions by Geoff Huston suggest all addresses may be allocated as soon as early 2010. The IETF (Internet Engineering Task Force, the technical body that manages the Internet) has anticipated this since the 1990s and designed a new protocol, IPv6, to solve this problem, but deployment has been slow. Our data can help illustrate the need to move forward."

It's hoped that the census also can improve Internet security. In fact, the Department of Homeland Security "supported our work with the goal of improving network security," said Heidemann, pointing to the work of ISI researcher Jelena Mirkovic, who is using this census data to study how worms spread in the Internet. Other researchers have plotted maps of where cyber-attacks originate.

"There's also a sense of discovery in these maps," Heidemann says. "We've built a huge Internet and use it every day. Like the far side of the moon, wouldn't you like to know what it looks like?"

The census was undertaken by the Ant project, a research group, according to its web site, "spanning USC/ISI, the USC and Colorado State University Computer Science Departments, the USC Electrical Engineering department, and USC's Information Technology Services. We're looking at novel ways to examine network traffic."

More details about the census project and the full-scale map are at http://www.isi.edu/ant/address/whole_internet/

ISI was one of the original nurseries of the Internet, playing a key role in the development of the domain name system and other features. ISI computer scientist Jon Postel (1943-1998) directed the Internet Assigned Numbers Authority for years.

The Department of Homeland Security and the National Science Foundation supported the research.