Binary format version 3 is used for surveys and censuses starting with it29*-20091102.
A dataset is divided into one or more files, each named after the probing machine used. Each binary file is compressed using bzip2 and contains records of two types: DATAv3 and TEXTv3. DATA-type records describe the results of a single probe, while TEXT-type records can store arbitrary text (metadata). All fields are stored in big-endian byte order.

DATAv3 records have the following format:

| Field Name | Byte Length | Description |
|---|---|---|
| Type | 1 | =5. This is the type of a DATAv3 record; it is always set to 5. |
| Length | 1 | =24. This is the length of a DATAv3 record; it is always set to 24. |
| ICMP reply type | 1 | Type of ICMP message received. This would be 0 for echo reply, 3 for destination unreachable, and 8 for no reply. Other types are possible, see rfc792 for details. |
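| ICMP reply code | 1 | Code of the ICMP message received. For type 3 (destination unreachable) replies, the code gives the reason in more detail; see rfc792 for the defined values. |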
| Reserved | 2 | =0, Reserved for future use. |
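| Flags | 1 | Flags marking this record. The flags defined in this document are IPR_FLAG_MATCH_RH (0x02), IPR_FLAG_MATCH_SRC (0x04), IPR_FLAG_COOKIE1 (0x08), and IPR_FLAG_COOKIE2 (0x10); their meaning is described below. |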
| TTL | 1 | Remaining TTL of the response: this field is copied from the IP header of the response packet and should equal the initial TTL minus the number of hops from the probed host to the probing system. Because we don't know what the TTL was initialized with, the number of hops can only be guessed at. Special case: for type 3 responses, the original datagram of the probe is included with the reply. In such cases, we set this field to the value from the original probe IP header. This means that for type 3 the hop distance is 64 minus this value. |
| Timestamp | 4 | Time the probe was sent (if not available, the time the response was received), in seconds since the Epoch. See also Pcap Capture for other semantics of this field. |
| RTT | 4 | RTT in microseconds from the time the probe was sent until we got a response. If we were unable to match a probe with a response record, this is set to zero. See also Pcap Capture for other semantics of this field. |
| Probe IP | 4 | IP address that was probed. If the reply could not be matched to a probe, this field is set to zero. |
| Response IP | 4 | IP address of the response. If this is a probe that timed out without reply (ICMP reply type is 8), then this is set to zero. |
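The layout in the table can be decoded byte by byte. The following is a minimal sketch, assuming the fields appear in the file in the order listed above and that the DATAv3 type value is 5; the struct and function names (`datav3_record`, `datav3_parse`, `be32`) are illustrative and not part of the dataset tools.

```c
#include <stddef.h>
#include <stdint.h>

#define DATAV3_TYPE 5   /* assumed DATAv3 record type */
#define DATAV3_LEN  24  /* record length from the table */

/* Host-order view of one DATAv3 record (illustrative names). */
typedef struct datav3_record {
    uint8_t  type;
    uint8_t  len;
    uint8_t  reply_type;   /* ICMP reply type; 8 means no reply */
    uint8_t  reply_code;
    uint16_t reserved;
    uint8_t  flags;
    uint8_t  ttl;
    uint32_t time_s;       /* seconds since the Epoch */
    uint32_t rtt_us;       /* microseconds; 0 if unmatched */
    uint32_t probe_addr;
    uint32_t reply_addr;
} datav3_record;

/* Reads a 32-bit big-endian value from p. */
static uint32_t be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/* Returns 0 on success, -1 if buf does not start with a DATAv3 record. */
int datav3_parse(const uint8_t *buf, size_t n, datav3_record *out)
{
    if (n < DATAV3_LEN || buf[0] != DATAV3_TYPE || buf[1] != DATAV3_LEN)
        return -1;
    out->type       = buf[0];
    out->len        = buf[1];
    out->reply_type = buf[2];
    out->reply_code = buf[3];
    out->reserved   = (uint16_t)((buf[4] << 8) | buf[5]);
    out->flags      = buf[6];
    out->ttl        = buf[7];
    out->time_s     = be32(buf + 8);
    out->rtt_us     = be32(buf + 12);
    out->probe_addr = be32(buf + 16);
    out->reply_addr = be32(buf + 20);
    return 0;
}
```

Decoding each field explicitly, rather than casting the buffer to a struct, avoids depending on compiler padding and on the byte order of the host.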

TEXTv3 records have the following format:

| Field Name | Byte Length | Description |
|---|---|---|
| Type | 1 | =6. This is the type of a TEXTv3 record; it is always set to 6. |
| Length | 1 | =24. This is the length of a TEXTv3 record; it is always set to 24. |
| Text | 22 | Arbitrary text (metadata) |
Text is in UTF-8 encoding (in practice, it uses only the 7-bit ASCII subset). Text shorter than one record should be NUL-padded. There is no guarantee that text will always include a trailing NUL (text exactly 22 characters long will fill exactly one record). Text longer than one record should be stored in multiple records; those records will be concatenated on output, and intermediate records will not include NUL characters.
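The concatenation rule can be sketched as follows, assuming a TEXTv3 type value of 6 and a 22-byte payload after the two header bytes. The function name `textv3_append` and its return convention are illustrative assumptions, not part of the original tools.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TEXTV3_TYPE    6                 /* assumed TEXTv3 record type */
#define TEXTV3_LEN     24                /* record length from the table */
#define TEXTV3_PAYLOAD (TEXTV3_LEN - 2)  /* 22 bytes of text per record */

/*
 * Appends the text of one 24-byte TEXTv3 record to out, which has capacity
 * cap and currently holds *len characters; out is kept NUL-terminated.
 * Returns 1 if NUL padding was seen (the text ends in this record),
 * 0 if the record was completely full (the text may continue in the next
 * record), and -1 on a bad record or insufficient space.
 */
int textv3_append(const uint8_t *rec, char *out, size_t cap, size_t *len)
{
    if (rec[0] != TEXTV3_TYPE || rec[1] != TEXTV3_LEN)
        return -1;

    const uint8_t *text = rec + 2;
    const uint8_t *nul = memchr(text, '\0', TEXTV3_PAYLOAD);
    size_t n = nul ? (size_t)(nul - text) : TEXTV3_PAYLOAD;

    if (*len + n + 1 > cap)
        return -1;
    memcpy(out + *len, text, n);
    *len += n;
    out[*len] = '\0';
    return nul != NULL;
}
```

Because text of exactly 22 characters leaves no room for a NUL, a return value of 0 does not prove that more text follows; a reader would also need to check whether the next record is a TEXT record.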
ICMP_UNREACHABLEs sent to us in response to our ICMP_ECHO_REQUESTs should contain a copy of the received IP header and the ICMP_ECHO_REQUEST. We shall call this copy a reflected request. In v2 of the application/data format we implicitly trusted the destination address of the reflected packet (from the IP header) to be the probe address. Thus, when we received an ICMP_UNREACHABLE, we looked this address up in the prober's cache and, if it was found, matched the response to the request (one indication of a matched response is a non-zero RTT). So far so good. However, if no match was found, we just assumed that the probe address was the destination address of the reflected packet and recorded it as such in the data. In some cases this turned out to be unreliable, apparently because of NAT. In the examples that we examined manually, we saw the reflected destination address re-mapped to private address space (10/8), so NATs were modifying IP addresses in "outer" IP headers, but not in the reflected (embedded) IP headers. As a result, we had probe records containing unreliable (often private) addresses in the probe field.
Sample code for reading and printing binary records is provided.
When printed, the ICMP reply type and reply code are often output as a typeandcode field of four hexadecimal digits. Common values (in decreasing frequency):
| typeandcode | meaning |
|---|---|
| 0800 | no reply |
| 0000 | echo reply: the expected result with a host present |
| 03xx | error reply: the xx shows the ICMP reply code, which gives more detail |
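As an illustration of the typeandcode notation, the sketch below formats a type and code pair as four hexadecimal digits and maps the type to the meanings in the table. The function names are hypothetical, and the use of lowercase hex digits is an assumption; the table does not show a value that would settle the case.

```c
#include <stdint.h>
#include <stdio.h>

/* Writes the four-digit typeandcode string for (type, code) into out. */
void typeandcode_str(uint8_t type, uint8_t code, char out[5])
{
    snprintf(out, 5, "%02x%02x", (unsigned)type, (unsigned)code);
}

/* Maps an ICMP reply type to the meaning given in the table. */
const char *typeandcode_meaning(uint8_t type)
{
    switch (type) {
    case 8:  return "no reply";
    case 0:  return "echo reply";
    case 3:  return "error reply";
    default: return "other";   /* types not listed in the table */
    }
}
```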
Because probing and the world are complicated, researchers should take care with how they interpret probeaddr and replyaddr.
replyaddr is always taken directly from the return packet.
However, replyaddr cannot always be trusted, for two reasons: (1) return addresses can be forged, and (2) the return address can differ from the original target address of the probe if the target is multihomed or is a NAT with port forwarding. We do not know how common these cases are, but we believe they occur in a few percent of reply records. Because of these cases, one can get replyaddr values that are outside the networks that were probed. (For example, we see some replyaddrs in 10/8, and we never probe it.)
The probeaddr is handled more carefully because we usually know what addresses we probe. We describe the exact algorithm we use to compute the probe address below (v2 packet matching and v3 packet matching).
In general, probeaddr can be fully trusted if (1) typeandcode == 0x0000 and probeaddr != 0x00000000 (a correct, matched reply), or (2) typeandcode == 0x0800 (timeout with no reply), or (3) type == 0x03 && (flags & 6) != 0 (an ICMP error where we could confirm the reply address).
Probeaddr may be less trustworthy if type == 0x03 (an ICMP error reply). If type == 0x03 && (flags & 6) == 0, then we were not able to match the packet, and probeaddr may be incorrect (if present, it was taken from the reply packet, but not all OSes reflect the original echo request back in the reply, and NATs can interfere).
As a result, users of this data may wish to filter out and discard untrustworthy records (as defined above).
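One way to apply such a filter is sketched below, taking the no-reply value to be type 8, code 0, as in the typeandcode table. The function name `probeaddr_trusted` is hypothetical, and treating ICMP types other than 0, 3, and 8 as untrusted is a conservative assumption, since the rules above do not cover them.

```c
#include <stdbool.h>
#include <stdint.h>

#define IPR_FLAG_MATCH_RH  0x02
#define IPR_FLAG_MATCH_SRC 0x04

/* Returns true if probe_addr can be trusted under the three rules above. */
bool probeaddr_trusted(uint8_t type, uint8_t code, uint8_t flags,
                       uint32_t probe_addr)
{
    if (type == 0 && code == 0)   /* (1) matched echo reply */
        return probe_addr != 0;
    if (type == 8 && code == 0)   /* (2) timeout, no reply */
        return true;
    if (type == 3)                /* (3) ICMP error: need a match flag */
        return (flags & (IPR_FLAG_MATCH_RH | IPR_FLAG_MATCH_SRC)) != 0;
    return false;                 /* assumption: other types not trusted */
}
```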
In the v2 of application/data-file we used this algorithm to match ICMP_UNREACHABLE responses:

    1. if match_in_output_cache(reflected_header->dst_ip):
       then output with
           out.probe_ip = reflected_header->dst_ip
           out.response_ip = src_ip
           out.rtt = calc_rtt()
    2. else
           out.probe_ip = reflected_header->dst_ip
           out.response_ip = src_ip
           out.rtt = 0

As discussed in the previous section, we only attempted to match reflected header and if
there was no match, we trusted the reflected header's destination address to be the probe
address.
Version 3 of the application/data-file adds 2 additional flags:

    #define IPR_FLAG_MATCH_RH  0x02
    #define IPR_FLAG_MATCH_SRC 0x04

They are used to capture three possible outcomes of the process of matching ICMP_UNREACHABLEs:

    1. if match_in_output_cache(reflected_header->dst_ip):
       then output with
           out.probe_ip = reflected_header->dst_ip
           out.response_ip = src_ip
           out.rtt = calc_rtt()
           out.flags |= IPR_FLAG_MATCH_RH
           if reflected_header->dst_ip == src_ip:
           then
               out.flags |= IPR_FLAG_MATCH_SRC
       (i.e.: first try to match the reflected header and if it's found:
           PROBABLY_CLEAN if reflected_header->dst_ip == src_ip
           MAYBE_MULTI_HOMED if reflected_header->dst_ip != src_ip)
    2. else if match_in_output_cache(src_ip):
           out.probe_ip = src_ip
           out.response_ip = reflected_header->dst_ip
           out.rtt = calc_rtt()
           out.flags |= IPR_FLAG_MATCH_SRC
       (i.e.: next try to match to source of response:
           PROBABLY_NAT if out.flags & IPR_FLAG_MATCH_SRC)
    3. else:
           out.probe_ip = reflected_header->dst_ip
           out.response_ip = src_ip
           out.rtt = 0
       (i.e. none of the two flags set, really: PROBABLY_SPURIOUS_REPLY, we also
        save such entire packets in a pcap trace)

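The three-way matching can be written as a small C function. This is a sketch only: the prober's real cache interface is not shown in this document, so the callback type `cache_lookup_fn`, the `match_result` struct, and the toy `single_probe_lookup` cache are hypothetical stand-ins for `match_in_output_cache()` and `calc_rtt()`.

```c
#include <stdbool.h>
#include <stdint.h>

#define IPR_FLAG_MATCH_RH  0x02
#define IPR_FLAG_MATCH_SRC 0x04

typedef struct match_result {
    uint32_t probe_ip;
    uint32_t response_ip;
    uint32_t rtt_us;
    uint8_t  flags;
} match_result;

/* Hypothetical cache lookup: returns true and sets *rtt_us if addr has
 * an outstanding probe in the cache. */
typedef bool (*cache_lookup_fn)(uint32_t addr, uint32_t *rtt_us, void *ctx);

/* Applies steps 1 to 3 above to one ICMP_UNREACHABLE response. */
match_result match_unreachable_v3(uint32_t reflected_dst, uint32_t src_ip,
                                  cache_lookup_fn lookup, void *ctx)
{
    match_result out = {0, 0, 0, 0};
    uint32_t rtt = 0;

    if (lookup(reflected_dst, &rtt, ctx)) {       /* step 1 */
        out.probe_ip = reflected_dst;
        out.response_ip = src_ip;
        out.rtt_us = rtt;
        out.flags |= IPR_FLAG_MATCH_RH;
        if (reflected_dst == src_ip)
            out.flags |= IPR_FLAG_MATCH_SRC;
    } else if (lookup(src_ip, &rtt, ctx)) {       /* step 2 */
        out.probe_ip = src_ip;
        out.response_ip = reflected_dst;
        out.rtt_us = rtt;
        out.flags |= IPR_FLAG_MATCH_SRC;
    } else {                                      /* step 3 */
        out.probe_ip = reflected_dst;
        out.response_ip = src_ip;
        out.rtt_us = 0;
    }
    return out;
}

/* Toy cache for illustration: ctx points to the single probed address,
 * and every match reports a fixed RTT of 1000 microseconds. */
bool single_probe_lookup(uint32_t addr, uint32_t *rtt_us, void *ctx)
{
    const uint32_t *probed = ctx;
    if (addr != *probed)
        return false;
    *rtt_us = 1000;
    return true;
}
```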
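In summary, when analyzing a record of an ICMP_UNREACHABLE, consider:

    IPR_FLAG_MATCH_RH and IPR_FLAG_MATCH_SRC both set (PROBABLY_CLEAN):
        probe_addr = src_ip
        reply_addr = src_ip
        rtt = ...
    only IPR_FLAG_MATCH_RH set (MAYBE_MULTI_HOMED):
        probe_addr = reflected_header->dst_ip
        reply_addr = src_ip
        rtt = ...
    only IPR_FLAG_MATCH_SRC set (PROBABLY_NAT):
        probe_addr = src_ip
        reply_addr = reflected_header->dst_ip
        rtt = ...
    neither flag set (PROBABLY_SPURIOUS_REPLY):
        probe_addr = reflected_header->dst_ip
        reply_addr = src_ip
        rtt = 0

Packets with neither flag set are saved in a pcap trace.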
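When reading records, the four cases can be recovered from the flags byte alone. The sketch below uses the labels from the matching algorithm as enum constants; the enum type and function name are illustrative.

```c
#include <stdint.h>

#define IPR_FLAG_MATCH_RH  0x02
#define IPR_FLAG_MATCH_SRC 0x04

typedef enum {
    PROBABLY_CLEAN,
    MAYBE_MULTI_HOMED,
    PROBABLY_NAT,
    PROBABLY_SPURIOUS_REPLY
} unreachable_class;

/* Classifies an ICMP_UNREACHABLE record by its two match flags;
 * all other flag bits are ignored. */
unreachable_class classify_unreachable(uint8_t flags)
{
    int rh  = (flags & IPR_FLAG_MATCH_RH) != 0;
    int src = (flags & IPR_FLAG_MATCH_SRC) != 0;

    if (rh && src)
        return PROBABLY_CLEAN;
    if (rh)
        return MAYBE_MULTI_HOMED;
    if (src)
        return PROBABLY_NAT;
    return PROBABLY_SPURIOUS_REPLY;
}
```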
Two further flags are defined:

    #define IPR_FLAG_COOKIE1 0x08
    #define IPR_FLAG_COOKIE2 0x10

Their meaning is as follows:

    COOKIE2, COOKIE1: 00 - cookie not tried (backward compatible with older v3 censuses/surveys)
    COOKIE2, COOKIE1: 11 - cookie tried, matched
    COOKIE2, COOKIE1: 10 - cookie tried, not matched
    COOKIE2, COOKIE1: 01 - cookie tried, not returned

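The two cookie bits can be decoded into one of the four states listed above. This is a minimal sketch; the enum and function names are hypothetical.

```c
#include <stdint.h>

#define IPR_FLAG_COOKIE1 0x08
#define IPR_FLAG_COOKIE2 0x10

typedef enum {
    COOKIE_NOT_TRIED,     /* COOKIE2, COOKIE1 = 00 */
    COOKIE_NOT_RETURNED,  /* COOKIE2, COOKIE1 = 01 */
    COOKIE_NOT_MATCHED,   /* COOKIE2, COOKIE1 = 10 */
    COOKIE_MATCHED        /* COOKIE2, COOKIE1 = 11 */
} cookie_state;

/* Decodes the cookie state from a record's flags byte. */
cookie_state cookie_state_from_flags(uint8_t flags)
{
    int c1 = (flags & IPR_FLAG_COOKIE1) != 0;
    int c2 = (flags & IPR_FLAG_COOKIE2) != 0;

    if (c2 && c1)
        return COOKIE_MATCHED;
    if (c2)
        return COOKIE_NOT_MATCHED;
    if (c1)
        return COOKIE_NOT_RETURNED;
    return COOKIE_NOT_TRIED;
}
```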
| Field Name | Byte Length | Description |
|---|---|---|
| Type | 1 | =5. This is the type of a DATAv3 record; it is always set to 5 (DATAv2 records used 3). |
| Type | 1 | =6. This is the type of a TEXTv3 record; it is always set to 6 (TEXTv2 records used 4). |

DATAv1 records are represented by the following structure:

    #define IPR_TYPE_DATAv1 0x1 /* XXX deprecated in future use */
    #define IPR_TYPE_DATAv1_LEN sizeof(icmptrain_probe_record_datav1_t)
    typedef struct icmptrain_probe_record_datav1_ {
        /* all fields in network byte order */
        uint8_t ipr_type;        /* record type = IPR_TYPE_DATAv1 */
        uint8_t ipr_len;         /* record length = IPR_TYPE_DATAv1_LEN */
        uint8_t ipr_reply_type;  /* reply type (or IPR_REPLY_NOREPLY) */
        uint8_t ipr_ttl;         /* remaining ttl of the response */
        uint32_t ipr_time_s;     /* sent (if not available, received) seconds since the Epoch */
        uint32_t ipr_rtt_us;     /* us */
        uint32_t ipr_probe_addr; /* probed address */
        uint32_t ipr_reply_addr; /* if different from probe_addr, or 0 */
    } icmptrain_probe_record_datav1_t;

TEXTv1 records are represented by the following structure:

    #define IPR_TYPE_TXT_v1 0x2 /* XXX deprecated in future use */
    #define IPR_TYPE_TXT_v1_LEN 255
    typedef struct icmptrain_probe_record_txt_v1_ {
        uint8_t ipr_type; /* = IPR_TYPE_TXT_v1 */
        uint8_t ipr_len;  /* = IPR_TYPE_TXT_v1_LEN */
    #define IPR_MSG_TXT_v1_MAX (IPR_TYPE_TXT_v1_LEN-2)
        char ipr_msg[IPR_MSG_TXT_v1_MAX];
    } icmptrain_probe_record_txt_v1_t;
