Chapter 4

A Mirage of NTP

This is an application of the Mirage model to an existing protocol, the Network Time Protocol (NTP). We chose this protocol for its simplicity and its prevalence. Here we apply the abstract principles of Mirage and perform an experiment that helps further refine the Mirage model.

This analysis demonstrates the equivalence between latency variability and imprecision in the local perception of the remote state, thus extending the domain of the Mirage model to variable latency regimes. We also examined how Mirage relies on time measurements, due to the seeming contradiction in modeling a clock protocol with a time-based model.

We concluded that several optional components of the NTP specification are integral to our model of NTP in Mirage, and that these components should therefore be required. These components include the logical clock, peer dispersion, and data filter algorithms.

The Mirage model of NTP reduces to a hardware clock signal where delay variability is zero. This implies that the variability of the latency limits clock synchronization, rather than the overall latency magnitude. This reduction resulted in a refinement of the variability of the measured offset.

NTP has proven useful in demonstrating the application of some basic components of Mirage. Unfortunately, many of the novel aspects of Mirage are not applicable to NTP, e.g., guarded messages, state space partitioning, and communicability tradeoffs. Later chapters of this dissertation (m‑Net in Chapter 5, and m‑Scope in Chapter 6) demonstrate those components that NTP cannot.

4.1. An overview of NTP

We begin with a summary of NTP. Whereas there have been 4 versions of NTP (numbered 0 through 3), this exercise focuses on core aspects of the protocol, which have changed little in recent versions. Thus Version 3 is referenced here [Mi90b].

The Network Time Protocol is a protocol for synchronizing system clocks. It consists of a request/response engine for exchanging timestamp messages, a logical clock providing an adjustable time reference, and a set of algorithms for integrating sets of message responses to determine an appropriate adjustment.

The logical clock provides a mechanism for a user-adjustable time reference, if not already explicit in the nodeÕs operating system. The NTP logical clock engine is named the Fuzzball. The clock consists of an absolute offset, called an epoch, and a frequency ratio scale. The logical clock is defined as the epoch plus the hardware elapsed time multiplied by the frequency ratio, so an otherwise fixed system clock can be adjusted in both absolute offset and frequency. Drift, the variability in frequency, is not compensated for in the logical clock provided in NTP, and is assumed to be zero. Furthermore, the logical clock is described as ÔoptionalÕ in the NTP specification. Between two logical clocks, offset is the difference in epochs, skew is the difference in frequency. Dispersion is the known error in the local clock.

NTP also includes a mechanism for organizing a set of clock servers into a hierarchy based on clock strata, where stratum is a group of clocks at a specified precision. Descriptions of various strata are given in [Mi90b]. Lower strata (e.g., Stratum 1) describe more accurate clocks; strata range from 1 to 255, with 0 representing an unspecified stratum. Stratum 1 clocks are called primary servers; strata 2-255 are called secondary. Stratum 1 clocks are the root(s) of the timeserver hierarchy, and a stratum k server is defined as having a distance of k -1 to the nearest root.

The core of the NTP protocol is a client/server engine that stamps messages when sent and received. A message initiated at a host returns with 3 timestamps in its packet body (sender out ÔoriginateÕ, receiver in ÔreceiveÕ, receiver out ÔtransmitÕ) and a fourth stamp retained upon receipt of the message, but separate from it (sender in ÔinputÕ) (Figure 4.1). Received messages are automatically replied, and sent messages are periodically initiated according to local state information.

 

0 1

2 5

6 8

9 16

17 24

25 31

LI

VN

Mode

Stratum

Poll

Precision

Root Delay (32)

Root Dispersion (32)

Reference Identifier (32)

Reference Timestamp (64)

Originate Timestamp (64)

Receive Timestamp (64)

Transmit Timestamp (64)

Authenticator (optional) (96)

Figure 4.1

NTP message format

The local state information is combined with sets of replies. These replies are processed to prevent individual server clock errors from affecting clocks of their clients, where possible. Various combining and filtering algorithms provide this processing, and adjust the current local clock and maintain the clock server hierarchy. The hierarchy determines which servers will be consulted for future clock data.

There are two types of sets of message replies considered in NTP. Temporal sets of data are taken at different times from the same server; ensemble sets are taken at (nearly) the same time from a set of different servers.[1] Temporal sets are held in a shift register, and combined using a filter algorithm; ensemble sets are combined using the peer selection algorithm and the combining algorithm.

Other components of NTP include control messaging, authentication, and asymmetric modes of operation. The conventional symmetric mode of operation is outlined above; asymmetric modes include servers broadcasting to a set of workstation clients, and those clients, as well as root servers (read-only clocks). Control messaging provides user-level access to protocol state, for remote monitoring and manipulation. Authentication provides security and additional protection from Byzantine failures [Pe80]. These components are optional in the NTP specification. They are not considered in this analysis because they are supplemental to the basic operation of the protocol.

4.1.1. How NTP reads a clock

NTP reads a remote clock by sending a NTP message to the remote node (Figure 4.1), and waiting for a reply. Upon receipt of the reply message, the client/server engine pairs the incoming packet with the current local time (Ôinput timeÕ, a.k.a. Ôpeer receiveÕ), and processes it. The message is stamped at each reception and emission (Figure 4.2).

Figure 4.2

NTP message exchange

The four timestamps – originate, receive, transmit, and peer receive (input) – are used to compute round trip time and offset measurements. The round trip time (d) can be computed from the difference between the sender interval and the receiver interval (Equation 4.1). This formula appears in the NTP specification.

ÔClock timeÕ is defined as the average over an interval, i.e., local time is the average of the originate and input values, and remote time is the average of the receive and transmit values (Equations 4.2, 4.3). ÔOffsetÕ (q) is defined as the difference between the remote clock time and the local clock time, where a positive offset indicates that the remote clock is ahead of the local clock (Equation 4.4, Figure 4.2). This formula is derived differently than in the NTP specification, but the result is the same (Equations 4.5, 4.6).

 

Equation 4.1:   

Equation 4.2:   

Equation 4.3:   

Equation 4.4:   

Equation 4.5:   

Equation 4.6:   

The calculated round trip delay is ÔexactÕ, i.e., the four timestamps exactly determine the total round trip time. The offset calculation, however is not similarly exact. The NTP formula for offset makes several assumptions: (1) that clock time varies only linearly during the exchange, and (2) that outgoing transit time and incoming transit times are identical . Assumption (1) is required – were it not, the protocol would not be able to determine a clock value. The clock value is thus defined to be the linear interpolation of the send and receive times of the message.

The calculated offset is affected by the difference between the outgoing and incoming transit times. Outgoing transit time adds to the measured offset, whereas incoming transit time subtracts from the measured offset. Under no other assumptions, the one-way delays are bounded between 0 and the total round trip time (d) (Equation 4.7). This loose bound on delay results in a similarly loose constraint on calculated offsets (Equation 4.8). These bounds are shown visually (Figures 4.3, 4.4). Figure 4.3 shows that if outgoing delay is larger than incoming delay, the offset is incorrectly measured as larger than intended; if incoming delay dominates the round trip time (Figure 4.4), the offset is incorrectly measured as smaller (falsely negative).

Other boundaries exist in the specification that require local time lines to be non-negative, a receive time between the originate and transmit times, and a transmit time between the receive and input times as well.

 

Equation 4.7:   

Equation 4.8:   

Figure 4.3

Exchange slid forward (maximum offset)

Figure 4.4

Exchange slid backward (minimum offset)

4.1.2. NTP background

NTP relies on the UDP protocol for connectionless transport-layer services [Po80], i.e., unreliable datagram services. UDP provides user-level access to the IP protocol [Po81b] and is the basis of the request/response transport mechanism, but the NTP protocol does not require either IP or UDP specifically. NTP evolved from earlier time protocol extensions of the Internet protocol suite, specifically the Time Protocol [Po83] and ICMP Timestamp message [Po81c].

Various versions of NTP exist, beginning with the original RFC [Mi85], which included only data and packet formats, and the specification of the client/server engine. Version 1 [Mi88] added a self-organizing clock hierarchy and a logical clock algorithm, neither of which is integral to the protocol. Version 2 added authentication, control message capability, and asymmetric modes of operation, also supplemental to the protocol [Mi89b]. The latest NTP, Version 3 [Mi90b], includes an ÒoverhauledÓ local clock algorithm, and new algorithms for peer offset combination; it also refines the definitions of delay, offset, and dispersion.

We use Version 3 NTP for our discussion and analysis. A summary of the version enhancements of NTP appears in Table 4.1. ÒRequiredÓ indicates whether the NTP specification denotes the component as required or optional, and Òfirst versionÓ indicates the version number in which the component first appeared.

 

NTP component

Required?

First version

packet exchange engine

y

0

logical adjustable clock

n

1

self organizing hierarchy

y

1

overcome unreliable

y

1

peer select / filter

n

1

control message

y

2

asymmetric modes

y

2

authentication

n

2

Table 4.1

NTP versions and components.

4.2. Casting NTP into Mirage

Viewing NTP using the Mirage model (ÔcastingÕ) involves describing the state space of NTP and interpreting messages of NTP in terms of the state space transformations of Mirage (Chapter 2). We define the state space of a protocol by the variables that characterize the protocol operation, rather than by those of the protocol specification. We differentiate the state that the protocol manipulates from the state required to manage that manipulation, in the sense of first-order state and second-order state. First order state characterizes the state space of the protocol, whereas second order state helps define varying characteristics of the transformation functions.

The Mirage model contains numerous references to time as absolute, so it may seem confusing to apply it to a protocol that manages clocks. Prior work in clock synchronization has called a clock Òa function that maps real time to clock timeÓ [Ma85], so the ÔtimeÕ variable that NTP manages is just another linear state space. Clock protocols manage the agreement of a single variable (clock value), as it changes over time. Although time is used to manage the clock (e.g., in NTP using the polling interval), the effect of clock manipulations on the time variable of the protocol is ignored. NTP (and Mirage) assume that such variations are small in comparison to the intervals measured.

NTP endeavors to replicate a remote clock value that is assumed to be precise and accurate in its local space. The precision and accuracy of this clock are denoted by user classification, not within the protocol. The correspondence between the clock value and real time (true accuracy) is a semantic issue that is arguably not provable and that neither NTP nor Mirage proposes to address.

4.3. Resolution of domain differences

The Mirage model is designed for the domain where messages are delayed by a fixed, known amount, and the remote process exhibits variability in its computation. Conversely, NTP operates where messages are delayed by a variable amount (and potentially lost), and the remote process (clock) has very little intrinsic variability (error). We first describe how variability in message latency is equivalent to variability in computation at the remote node.

   In NTP a round trip message exchange specifies a delay and offset; delay is the round trip message latency, and offset is the difference between the clock values. Consider the case where two nodes are absolutely correct, but the communication latencies vary. The true offset is zero, but the measured offset is not.

Let the unidirectional message delay be a probability density function (pdf) approximated by a Poisson distribution (Figure 4.5). The round trip delay is the sum of the unidirectional delays, and can be computed by the convolution of the unidirectional delay with itself (Figure 4.6). The offset is affected by the difference of the unidirectional delays, because outgoing delay causes the remote clock to be measured as late, whereas incoming (return) delay causes the remote clock to be measured as early. The difference in these delays is the convolution of the unidirectional delay with a reverse of itself (f(-x), as in Figure 4.7). Both these formulae assume that the unidirectional delays are identical pdfs.

Figure 4.5

Unidirectional delay as a Poisson pdf[2]

Figure 4.6

Bidirectional delay is the convolution of unidirectional delays

Figure 4.7

Measured offset is unidirectional delay convolved with its reverse

The resulting bidirectional delay has an Erlangian distribution (by definition[4]) (Figure 4.8). The measured offset exhibits the effects of the difference between incoming and outgoing delay, and that even perfect clocks are measured as statistically variable when communication latencies vary (Figure 4.9). The consequent offset pdf is symmetric about the zero offset, and appears Gaussian in form. An asymmetric pdf would result if the communication latency were asymmetric (forward vs. reverse paths).

Figure 4.8

Bidirectional delay as Erlangian (N=2)

Figure 4.9

Measured offset as pseudo-Gaussian.

The bidirectional delay and measured offset of this system can be combined by an outer product, to show one possible structure of a delay/offset pair (Figure 4.10). The internal structure of such a graph cannot be determined from the delay and offset pdfs alone, because two one-dimensional pdfs underspecify the 2-dimensional delay/offset pdf. Even so, the outer product shows a structure similar to that of real NTP measurements, shown later. The result is that, even in the case of perfectly aligned clocks, a wide distribution of offset values occurs.

Figure 4.10

Probability vs. [delay, offset] pairs, density plot.

4.3.1. Analysis of delay and offset measurements

To verify these assumptions, we performed some measurements of NTP. We used a Unix[5] shell script to exchange NTP packets (with timeout) between a local host (in Pennsylvania) with a remote node in California (a Stratum 2 NTP server). Each set of measurements reflects 2200 NTP requests (with a failure rate of approximately 1.5%, or 33 lost requests), at a rate of approximately 200 per hour, for a total of 11 hours, beginning at the time indicated. ÒFriday 8amÓ indicates a peak period (Friday 8am – 7pm EDT), and ÒTuesday 8pmÓ indicates an off-peak period (Tuesday 8pm – 7am EDT).

The offset curves are Gaussian-like (Figures 4.11, 4.12), in which the peak measurements (Figure 4.12) have slightly higher variance than off peak (9.3% higher variance for peak).

Figure 4.11

Tuesday offset values (off-peak)

Figure 4.12

Friday offset values (peak)

The delay measurements are also as expected, approximating Erlangian distributions (Figures 4.13, 4.14). The off-peak measurement exhibits a smaller minimum delay (Figure 4.13). Both experiments show an unexpected quantization, especially because it appears in the delay measurements but not in the offset measurements, and both quantities are derived from the same timestamp sets (from NTP message headers).

   The quantization may be the result of using a fixed set of IP paths, except that the quanta are nearly exact units of 0.01 seconds (10 ms). We assume that this quantization was due to a fixed service interval in the local NTP server response loop. This conclusion is further complicated by the fact that some replies have delay quantizations that are slightly smaller than the 0.01 second quanta. Similar quantizations are manifested in ensemble averages, so we conclude that the code of our local NTP implementation is the most likely suspect. Further analysis of NTP may prove this conclusion, but is beyond the scope of this dissertation.

Figure 4.13

Tuesday delay values (off-peak)

Figure 4.14

Friday delay values (peak)

Regardless of the quantization effects, the curves are bounded by Erlangian-shaped envelopes. Again the peak values have a slightly higher variance (15.7% larger variance for peak), although this is because the peak delay pdf has a longer ÔtailÕ of values of higher latency. The peak pdf is has a larger minimum delay, and delay values are grouped more tightly about the mode than in the off-peak. The higher latency is the result of higher network loads during peak times. The more tightly grouped latency values are most likely the result of side effects of network load on routing information accuracy.

The NTP implementation we tested is supported by the IP datagram transport protocol. IP routing information is maintained as a side effect of IP datagram transmission [Co91a]. Higher network loads result in more accurate routing information, which in turn results in more precise (i.e., repeatable) latency. Lower network load can permit routing table inaccuracies to persist longer, so that datagrams experience larger variability in route paths, in turn increasing delay variability.

These measurements indicate that: (1) there is a minimum delay that is not affected by the latency variability, so there exist tighter bounds on the measured offset, and (2) forward and reverse path delays are not equal. The distributions of delay/offset pairs for both peak and off-peak data sets are shown in Figures 4.15, 4.16, respectively. The previous plots of offset and delay are the (respectively) horizontal and vertical projections of these plots (Figures 4.13, 4.14, 4.11, 4.12). For comparison, we also present a 3-dimensional plot of the off-peak values (Figure 4.17).

The round trip latency can be partitioned into two components – a fixed minimum, and a variable additional delay, where each component is non-negative (by definition). The corresponding fixed minimum components of the two alternate bounds are plotted as vertical lines (gray and dashed).

Figure 4.15

Delay v.s offset, time-repetition density (off-peak)

Figure 4.16

Delay vs. offset, time-repetition density (peak)

In these plots, the NTP bounds (Equation 4.8) are shown as solid diagonals. These bounds appear too loose, and are based on the assumption that one-way latency is bounded by the total round trip latency only. Shifting these bounds to the right yields more appropriate boundaries to the displayed data; two such alternate bounds are shown (loose in gray diagonal, and more restricted in dashed diagonal lines). These alternate boundaries suggest reinterpretation of the bound on the difference between outgoing and incoming latency, and their effects on the measured offset values.

Figure 4.17

Delay vs. offset vs. probability, time-series (off-peak)

Off-peak measurements have larger minimum delay components, but there is less variability in our visual estimation of the minimum delay, compared to peak measurements. Again, this can be explained as an effect of the routing table maintenance, where light network load results in low delay, but in high variability in delay, because route information is updated less often. High load causes high delay, with low variability because heavy traffic helps maintain accurate route information.

The partitioning of round trip latency into non-negative fixed and variable components (Equation 4.9) can be incorporated into the previous constraint equations (Equations 4.7, 4.8), resulting in Equations 4.10, 4.11.

 

Equation 4.9:   

               where 

Equation 4.10: 

Equation 4.11: 

Visually, the effects of this partitioning of the delay can be shown by modifying the NTP message exchange diagram (Figure 4.2) to Figure 4.18. Black portions indicate fixed components of the delay, and gray indicates variable components; the remote interval is bounded between the fixed components of latency. The bounds on the offset are more tightly restricted as a result (Figures 4.19, 4.20), compared to the earlier diagrams (Figures 4.3, 4.4).

Figure 4.18

Fixed (black) and variable (gray) delay in message exchange

Figure 4.19

Exchange maximum offset, variable delay

Figure 4.20

Exchange minimum offset, variable delay

Thus far the delays have been assumed identical on the forward and reverse path of a message. If this were the case, all measurements would exhibit a symmetric offset distribution. Temporal data sets do exhibit this behavior, but ensemble data sets are heavily skewed toward positive offsets (Figure 4.21). This skew is the result of a larger average forward latency, which is the result of routing effects in the underlying IP transport mechanism.

An ensemble data set consists of a set of unique message destinations. Each message is routed to the destination and back. The reverse path of a message occurs by an updated and presumably shorter path than the forward message, because the forward message updates routing information as it is delivered. Forward messages are routed by comparatively stale information, exhibiting longer routes as a result.

Lower strata (e.g., Stratum 1) servers exhibit large offset skews, which disappear as the stratum increases (Figures 4.22, 4.23, 4.24, 4.25)[6]. Offset skews should tighten as strata decrease due to higher precision in local clocks, whereas empirically, offsets tighten and become more symmetric as strata increase. Both the tightening and more symmetric nature of higher strata can be attributed to the locality (and thus known routes) of lower strata servers, whereas higher strata servers were typically more distant; this is a side effect of the NTP server hierarchy and its partitioning of siblings according to their parent and topological access criteria. Empirically, lower strata servers were more distant than higher strata servers.

Figure 4.21 also includes NTP bounds (black diagonal lines), violations of the NTP bounds (circles), and a possibly more constrictive bound (gray diagonal lines) resulting from an assumed latency minimum (gray vertical line).

Figure 4.21

Delay vs. offset, entire ensemble (all strata)

Figure 4.22

Stratum 1 ensemble

Figure 4.23

Stratum 2 ensemble

Figure 4.24

Stratum 3 ensemble

Figure 4.25

Stratum 4 ensemble

4.4. Description of NTP in Mirage

The following is a description of the variables of NTP, (some individual, some groups), and their description as Mirage indicates (Table 4.2). Local state denotes the protocol state space as Mirage considers it, and remote perception is the value obtained by communication with a remote node, i.e., part of the nodeÕs total view. Computation function parameters denote those components of NTP that are part of the Mirage time-transformation function. Protocol management denotes components that affect the boundaries of the computation function, and that govern the send actions. These last management operations are part of communicability, and the optimization thereof.

 

NTP ÔstateÕ[7]

NTP description

Mirage description

(time)

current clock value

local state

(frequency scale)

clock frequency adj.

comp. func. params

drift

frequency variability

comp. func. params

wander

drift variability

comp. func. params

offset

remote clock shift

remote perception

delay