ANT provides a number of datasets
in different formats.
Getting Datasets
See our separate datasets requests page for the steps to take to get access to our data.
Alternatively, go straight to the full list of our datasets,
or see below for the categories of data that we have and the formats they are in.
Dataset Categories
Our datasets come from several different categories of data types:
Address Space Allocation Data contains Internet addresses that have some properties that characterize Internet topology (for example, addresses that respond with different codes, or that appear to be dynamic, etc.). The IP addresses in this dataset are not typically anonymized because they are determined from measurement traffic and not actual sender-receiver communications and so are not associated with specific individuals. This data can be used to better understand the Internet topology and address usage.
Specific sub-categories of address space allocation data include:
We also have some Internet Topology data: Internet topology data is created by a program that tries to map the Internet. The program is able to determine which routers are capable of talking to other routers. Internet topology data only shows router connectivity within the Internet core and to external enterprise borders; it does not contain any identifiable information or internal enterprise topology information. This dataset can be used for worm outbreak modeling and simulation, worm containment and countermeasures, zombie distribution for DDoS attacks, vulnerability assessments, longitudinal studies of the evolution of Internet topology and address distribution, and Internet topology and address map inference. (Example: internet_router_map_planetlab-20030412).
Much of our data is IP Packet Headers: These datasets consist of headers of traffic data, containing information such as anonymized source and destination IP addresses and other IP and transport (e.g., TCP, UDP, ICMP, SCTP) header fields. No packet contents are included. Depending on the specific dataset, this category of data can be used for characterization of typical Internet traffic, or of traffic anomalies such as DDoS attacks, port scans, or worm outbreaks.
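To make the idea of "header fields" concrete, here is a minimal Python sketch that decodes the fixed 20-byte part of an IPv4 header. It is an illustration only: the function name parse_ipv4_header, the sample header, and the addresses (taken from documentation-reserved ranges) are assumptions, and the code does not describe the actual file format of any ANT dataset.

```python
import ipaddress
import struct

# IANA protocol numbers for the transports named in the text.
PROTOCOLS = {1: "ICMP", 6: "TCP", 17: "UDP", 132: "SCTP"}

def parse_ipv4_header(raw: bytes) -> dict:
    """Decode the fixed 20-byte portion of an IPv4 header (options are ignored)."""
    (version_ihl, _tos, total_length, _ident, _flags_frag,
     ttl, protocol, _checksum, src, dst) = struct.unpack("!BBHHHBBH4s4s", raw[:20])
    return {
        "version": version_ihl >> 4,
        "header_length": (version_ihl & 0x0F) * 4,  # IHL counts 32-bit words
        "total_length": total_length,
        "ttl": ttl,
        "protocol": PROTOCOLS.get(protocol, str(protocol)),
        "src": str(ipaddress.IPv4Address(src)),
        "dst": str(ipaddress.IPv4Address(dst)),
    }

# Hypothetical sample header; the checksum is left as 0 and is not validated.
sample = struct.pack(
    "!BBHHHBBH4s4s",
    0x45, 0, 40, 0, 0, 64, 6, 0,
    ipaddress.IPv4Address("192.0.2.1").packed,
    ipaddress.IPv4Address("198.51.100.7").packed,
)
header = parse_ipv4_header(sample)
print(header)
```

In practice a packet-capture library would usually do this decoding; the sketch only shows which kinds of fields a header-only trace can carry.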
We also have traffic flow data: Network traffic can be represented as flows between two endpoints. This dataset contains traffic flow information, which includes a variety of attributes such as source and destination IP address, source and destination port, protocol type, and packet and byte counts. This data can be in different formats generated by a range of different collection tools such as netflow, IPFIX, and argus. IP addresses in these files are anonymized on a per-dataset or per-time interval basis. These datasets are useful for research such as network economics and accounting, network planning, analysis, security, denial of service attacks, network monitoring, as well as traffic visualization.
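As a rough illustration of how packets relate to flows, the following Python sketch groups packet records by source and destination address, source and destination port, and protocol, and keeps packet and byte counts per flow. The names FlowKey and aggregate_flows and the sample records are hypothetical; real collectors such as netflow, IPFIX, or argus also apply timeouts and record more attributes, which this sketch omits.

```python
from collections import defaultdict
from typing import NamedTuple

class FlowKey(NamedTuple):
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int
    protocol: str

def aggregate_flows(packets):
    """Group (src, dst, sport, dport, proto, size) records into per-flow counts."""
    flows = defaultdict(lambda: {"packets": 0, "bytes": 0})
    for src_ip, dst_ip, src_port, dst_port, protocol, size in packets:
        key = FlowKey(src_ip, dst_ip, src_port, dst_port, protocol)
        flows[key]["packets"] += 1
        flows[key]["bytes"] += size
    return dict(flows)

# Made-up packet records using documentation-reserved addresses.
packets = [
    ("192.0.2.1", "198.51.100.7", 40000, 80, "TCP", 60),
    ("192.0.2.1", "198.51.100.7", 40000, 80, "TCP", 1500),
    ("192.0.2.9", "198.51.100.7", 53000, 53, "UDP", 72),
]
flows = aggregate_flows(packets)
for key, counts in flows.items():
    print(key, counts)
```

The first two records share all five key fields, so they collapse into a single flow; the third differs and forms its own flow.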
We expect to have data related to unsolicited bulk e-mail (UBE), or spam. Spam constitutes a significant fraction of all e-mail connection attempts and routinely frustrates users, consumes resources, and serves as an infection vector for malicious software. The collection and analysis of datasets in this category enable a wide range of research, including characterizing spam trends, detecting bots, and developing spam mitigation algorithms. These datasets may include spam logs collected at individual organizations; reputation list data such as those provided by Spamhaus, SORBS, and others; and e-mails, including both headers and contents, captured at spam traps or otherwise specifically identified as spam. They may include IP addresses or e-mail addresses of suspected spammers and potentially known spam e-mail message contents. Datasets in this category may be anonymized.
Anycast enumeration datasets contain active probing information about DNS anycast services such as root DNS. Typically, probes are made from many vantage points with the goal of enumerating all anycast nodes in the service. Anycast enumeration datasets are useful for understanding the operational status and geographic reach of anycast services and nodes.
For details of the dataset, please refer to the description page.
Datasets in this category record information about Internet outages--address blocks that become unreachable. Typically, outages are inferred from active probing. These datasets may include /24 block-level outages over time, or lists of inferred outages that affect larger parts of the Internet. Outage data can be useful for understanding Internet reliability.
For details of the dataset, please refer to the description page.
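As an illustration of what "/24 block-level" means for outage data, the following Python sketch maps an address to its enclosing /24 block and checks it against a small list of outage intervals. The record layout (block, start, end), the function names block_of and was_out, and the sample values are assumptions made for this example; they are not the format of the actual outage datasets, which is described on the dataset's description page.

```python
import ipaddress

def block_of(address: str) -> ipaddress.IPv4Network:
    """Return the /24 block that contains an IPv4 address."""
    return ipaddress.ip_network(f"{address}/24", strict=False)

def was_out(address: str, timestamp: int, outages) -> bool:
    """True if the address's /24 block has an outage covering the timestamp."""
    block = block_of(address)
    return any(
        ipaddress.ip_network(outage_block) == block and start <= timestamp < end
        for outage_block, start, end in outages
    )

# Hypothetical outage records: (block, start, end), times in arbitrary seconds.
outages = [
    ("192.0.2.0/24", 1000, 1600),
    ("203.0.113.0/24", 5000, 5300),
]

print(block_of("192.0.2.77"))
print(was_out("192.0.2.77", 1200, outages))
print(was_out("192.0.2.77", 1700, outages))
```

Working at block level means every address in the same /24 shares one outage status, which keeps the data compact at the cost of per-address detail.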
In addition, we document the formats of our datasets here:
If you would like to request any of the datasets above, please
send email to