Networking and Cybersecurity

NETWORK AND SECURITY MEASUREMENT, ANALYSIS AND DEFENSES

The ANT lab has been developing new methods to generate and share network data with researchers for more than a decade. We draw on three kinds of data: network traffic, including anonymized packet headers, controlled testbed experiments, traffic flow data, and curated data such as distributed denial-of-service (DDoS) attacks; Internet scanning, including censuses of the entire IPv4 address space, data about IPv6, network topology, and Internet outages; and application-level data, including anonymized Domain Name System (DNS) queries and anycast mapping.

New Data Collection: Near-Real Time Network Outages

In 2018 we developed near-real-time detection of network outages. We have been observing Internet outages with Trinocular since November 2013. Trinocular observes the Internet from six locations around the world; combined, these observations provide a picture of global IPv4 reliability.
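Trinocular decides whether each IPv4 /24 block is up or down by sending adaptive probes and updating a Bayesian belief after each reply. The minimal sketch below illustrates that idea for a single block seen from a single vantage point. The likelihood model (an up block answers with its historical availability A, a down block almost never answers), the thresholds, the probe budget, the stopping rule, and all function names are simplifying assumptions for illustration, not Trinocular's actual code or parameters.

```python
from typing import Iterable, Tuple


def update_belief(belief_up: float, responded: bool, availability: float,
                  p_spurious: float = 0.01) -> float:
    """One Bayesian update of P(block is up) after a single probe.

    Assumed model: an up block answers with probability `availability`;
    a down block answers only with a small probability `p_spurious`.
    """
    if responded:
        like_up, like_down = availability, p_spurious
    else:
        like_up, like_down = 1.0 - availability, 1.0 - p_spurious
    num = like_up * belief_up
    return num / (num + like_down * (1.0 - belief_up))


def classify(belief_up: float, up_at: float = 0.9, down_at: float = 0.1) -> str:
    """Map a belief to a block state; the thresholds are illustrative."""
    if belief_up >= up_at:
        return "up"
    if belief_up <= down_at:
        return "down"
    return "uncertain"


def probe_block(responses: Iterable[bool], belief_up: float, availability: float,
                max_probes: int = 15) -> Tuple[float, int]:
    """Probe adaptively until the state is settled or the budget is spent.

    A round ends on an answered probe that leaves the block believed up, or
    once belief falls to "down". An unanswered probe alone never confirms
    "up", so a block believed up keeps being probed after a miss.
    Returns the final belief and the number of probes used.
    """
    used = 0
    for responded in responses:
        if used >= max_probes:
            break
        belief_up = update_belief(belief_up, responded, availability)
        used += 1
        state = classify(belief_up)
        if state == "down" or (responded and state == "up"):
            break
    return belief_up, used
```

Because the belief carries over from the previous round, a block already believed up is usually confirmed by a single answered probe. A block with higher availability is declared down after fewer unanswered probes (3 instead of 10 here, for A = 0.9 versus A = 0.5), since each silence is stronger evidence of an outage.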

Internet outages in the Carolinas, 24 hours after Hurricane Florence made landfall

We have added streaming data processing to Trinocular, so it now reports outages within about one hour of their onset. (In our prior work, results were computed in large batches every few months.)
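Streaming processing means emitting outage events as each probing round's results arrive, instead of recomputing over months of collected data. The sketch below, with hypothetical names and an assumed input of time-ordered (time, block, state) records, turns per-block states into outage start and end events; rounds whose state is "uncertain" keep the previous state. It illustrates the incremental style only and is not the pipeline behind our outage website.

```python
from typing import Dict, Iterable, Iterator, Tuple

# (time in seconds, block prefix, state), where state is "up", "down" or "uncertain"
Observation = Tuple[int, str, str]
# (event kind, block prefix, time in seconds)
Event = Tuple[str, str, int]


def outage_events(observations: Iterable[Observation]) -> Iterator[Event]:
    """Incrementally turn time-ordered block states into outage start/end events."""
    last_known: Dict[str, str] = {}  # most recent certain state per block
    for t, block, state in observations:
        prev = last_known.get(block, "up")  # assume unseen blocks start up
        if state == "down" and prev != "down":
            yield ("outage_start", block, t)
        elif state == "up" and prev == "down":
            yield ("outage_end", block, t)
        if state != "uncertain":
            last_known[block] = state  # uncertain rounds keep the old state
```

Because `outage_events` is a generator, each event is available as soon as the record that triggers it is read, so the same code works on an unbounded feed of round results; per-block memory is one stored state.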

We visualize these results on a website at https://outage.ant.isi.edu. Outage data is useful for understanding how U.S. infrastructure reacts to natural disasters such as hurricanes.

This figure shows Internet outages caused by Hurricane Florence's flooding and high winds in September 2018.

We describe potential uses of outage detection for understanding government policy in our paper "The Policy Potential of Measuring Internet Outages" in the Proceedings of the TPRC, September 2018. Our outage website was initially developed with support from an ISI Michael Keston Endowment and DHS. Its extension is supported by NSF and DHS.

 
Dataset generation, distribution, and popularity

Data to Researchers

The ANT lab has been developing new measurement methods for more than a decade, with support from DHS and NSF. Between 2006 and December 2018, we provided 1798 datasets (733 TB of data before compression).

This graph shows dataset generation (colored regions) and distribution (dots), and with them the popularity of our datasets. Each dot is one dataset sent to a researcher. Several patterns stand out. The bands at the top show datasets that were popular over the entire period, such as our curated distributed-denial-of-service (DDoS) attacks. The middle blue areas show our regularly generated Internet scans and our recent Internet outage data. Some external researchers subscribe to these datasets, while in other cases (the vertical towers of dots) a research group requested our entire "back catalog" of multiple years of data to carry out longitudinal analysis.

 

Understanding Privacy of Network Data

Sharing data requires careful attention to the privacy of network users. ISI has studied these issues for years and has been a leader in understanding and improving the privacy of shared DNS data.

In 2018 we distributed new libraries for CryptoPAN, an algorithm initially developed at Georgia Tech. We pioneered its use for DNS, and our library is in use with open-source software from us, from DNS-OARC, and from an industry research group.
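CryptoPAN anonymizes IP addresses in a prefix-preserving way: if two addresses share exactly their first k bits, their anonymized forms also share exactly their first k bits, so subnet structure survives anonymization. The minimal sketch below shows the idea for IPv4 by flipping each address bit according to a keyed pseudorandom function of the bits before it. It is not our library and will not reproduce real CryptoPAN output: real CryptoPAN uses AES as its pseudorandom function, whereas this sketch substitutes HMAC-SHA256 from the Python standard library so it runs without extra packages. The function names and key are hypothetical.

```python
import hashlib
import hmac
import ipaddress


def _prf_bit(key: bytes, prefix_len: int, prefix: int) -> int:
    """Keyed pseudorandom bit for an IPv4 prefix of the given length.

    HMAC-SHA256 stands in for the AES-based function of real CryptoPAN.
    The length is included so that, e.g., prefix 0 of length 1 and
    prefix 0 of length 2 get independent bits.
    """
    msg = prefix_len.to_bytes(1, "big") + prefix.to_bytes(4, "big")
    return hmac.new(key, msg, hashlib.sha256).digest()[0] & 1


def anonymize_ipv4(addr: str, key: bytes) -> str:
    """Prefix-preserving anonymization of one IPv4 address (sketch)."""
    a = int(ipaddress.IPv4Address(addr))
    out = 0
    for i in range(32):
        prefix = a >> (32 - i)        # first i bits of the original (0 when i == 0)
        bit = (a >> (31 - i)) & 1     # original bit i, counting from the most significant
        # Output bit i depends only on original bits 0..i, which preserves prefixes.
        out = (out << 1) | (bit ^ _prf_bit(key, i, prefix))
    return str(ipaddress.IPv4Address(out))


def common_prefix_len(x: str, y: str) -> int:
    """Number of leading bits shared by two IPv4 addresses."""
    diff = int(ipaddress.IPv4Address(x)) ^ int(ipaddress.IPv4Address(y))
    return 32 - diff.bit_length()
```

For example, 10.0.0.1 and 10.0.0.2 share their first 30 bits, and so do their anonymized forms under any key. Because each output bit is the input bit XORed with a function of earlier bits only, the mapping is also one-to-one, so distinct addresses never collide.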

In addition, we have advanced the discussion around DNS privacy through two papers at the NDSS Workshop on DNS Privacy in February 2018: "Analyzing and Mitigating Privacy with the DNS Root Service" and "Enumerating Privacy Leaks in DNS Data Collected Above the Recursive."

Network Traffic Identification for Cybersecurity

Identifying an application from its traffic has been a topic of interest for decades, and many techniques have been proposed. Unfortunately, many current techniques fail on encrypted traffic because the features they rely on are obfuscated. Modern methods for detecting applications within encrypted flows generally use machine learning, including k-means clustering, k-nearest neighbors, and hidden Markov models. The principal challenge in identifying encrypted applications is overcoming the obfuscation caused by encapsulation and by the possible multiplexing of several application flows. Performing traffic analysis and classification at line rate is a further challenge.

Our research transforms flows into waveforms, enabling signal and image processing techniques to identify applications rapidly without isolating individual flows. Before the transformation, we are exploring several featurization methods, including extracting blocks of common application patterns, such as "bulk transfer" and "parallel bulk transfer," and higher-level patterns such as "TLS renegotiation" or "stream buffering." Preliminary results show that these approaches can rapidly identify known applications and have the potential to detect instances of new, distinct applications. These techniques can also identify and isolate anomalous behavior within a flow or across multiple flows.

This work has been applied to the GAWSEED project (DARPA's CHASE program) and will be used in the recently awarded APROPOS project (DARPA's Searchlight program).
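The idea of turning traffic into a waveform can be illustrated with a toy example: bin packet sizes over time into a bytes-per-bin signal, then use autocorrelation to find a repeating period, such as the periodic bursts one might expect from stream buffering. This is a hypothetical sketch using only the Python standard library; the function names, the bin size, and the choice of autocorrelation as the signal-processing step are assumptions for illustration, not the project's pipeline.

```python
import math
from typing import List, Optional, Sequence, Tuple


def to_waveform(packets: Sequence[Tuple[float, int]], bin_s: float,
                duration_s: float) -> List[float]:
    """Bin (timestamp in seconds, size in bytes) records into bytes per time bin."""
    n_bins = math.ceil(duration_s / bin_s)
    wave = [0.0] * n_bins
    for ts, size in packets:
        idx = int(ts // bin_s)
        if 0 <= idx < n_bins:  # ignore packets outside the window
            wave[idx] += size
    return wave


def autocorr(wave: Sequence[float], lag: int) -> float:
    """Normalized autocorrelation of the waveform at one lag (in bins)."""
    n = len(wave)
    mean = sum(wave) / n
    var = sum((x - mean) ** 2 for x in wave)
    if var == 0:
        return 0.0  # a flat waveform has no periodic structure
    cov = sum((wave[i] - mean) * (wave[i + lag] - mean) for i in range(n - lag))
    return cov / var


def dominant_period(wave: Sequence[float], max_lag: Optional[int] = None) -> int:
    """Lag (in bins) with the highest autocorrelation, searched from 1 to max_lag."""
    max_lag = max_lag if max_lag is not None else len(wave) // 2
    return max(range(1, max_lag + 1), key=lambda lag: autocorr(wave, lag))
```

For a synthetic trace with a 10-packet burst every 4 seconds over 40 seconds, binned at 1 second, `dominant_period` returns 4. Because the waveform sums every packet in a time bin, the same computation applies to a link carrying several multiplexed flows without first separating them, at the cost of some sensitivity when other traffic dominates the link.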