USC / Information Sciences Institute (touch@isi.edu)
Abstract
The First IEEE Gigabit Networking (GBN) Workshop defined a set of characteristics of "interesting" high-speed applications. The GBN criteria ensure that the application addresses a significant problem, and that it actually requires a gigabit network. This paper presents five challenges that augment the GBN criteria. These challenges ask whether gigabit applications require new research into different protocols, or can be supported by existing protocols that merely run faster.
During the past several years the networking research community has been considering the problem of gigabit protocols, especially how they differ from their slower counterparts [9]. There are two primary issues - increased speed or performance of existing protocols, and domains where existing protocols may not suffice. This paper characterizes the latter by a list of challenges developed to complement the GBN criteria.
This paper first summarizes the GBN criteria and justifies the need for additional challenges. The challenges are then presented. Finally, as a challenge unanswered is not entirely useful, an application is presented that survives these challenges. This is done to open the door to broader consideration of some unconventional trade-offs that use bandwidth as a resource (see Note) rather than as a constraint.
Criteria B, C, and D ensure that gigabit bandwidth is required, and is not a result of aggregation of users or groups of independent applications.
The first four criteria ensure that the application addresses substantial user communities and requires gigabit bandwidth. There are many applications that satisfy these criteria but are considered "uninteresting," because they can already be implemented with existing protocols. Criterion E is an attempt by GBN to filter some of these out.
The GBN criteria are conditions where gigabits are useful, but not where they are necessary. Criterion E does not sufficiently exclude classes of gigabit applications for which solutions already exist. This is why this paper augments these criteria with further challenges.
The application that meets this goal is the World-Wide Web (WWW) client/server system [2]. It exhibits gigabit requirements when real-time interactive constraints are added [13].
The WWW is emerging as a dominant (and thus realistic) consumer and business application [2]. Although originally developed as an interface to Internet navigation of file transfer client/server systems, its current use is evolving towards a distributed interactive application. As such, the requirements on response time are narrowing. As users begin to expect real-time interactive response, they demand response times in the 100-200 ms range (see Note).
Whereas its minimum user bandwidth is not many Mbps for conventional client/server operation, it can be for interactive use. Consider a 150 ms response budget with 100 ms of transmission and switching latency. That leaves around 50 ms for request creation, server parsing, server retrieval, and server response. Assuming that the other components consume a minimal 10 ms, only about 40 ms remain for the file transmission itself.
The typical WWW HTML (hypertext-only) document is only around 6 Kilobytes, but small images can be around 60 Kilobytes. Transmitting a 60 KB file in 40 ms requires 12 Mbps. As the file size increases to 200 KB (for larger still images), the bandwidth increases to 40 Mbps. Although the minimum bandwidth to the user is not yet many Mbps, file sizes grow as the demand for still images and short video clips increases. Because the other latencies remain constant, the bandwidth requirement grows accordingly, and the expectation of interactive response time continues to tighten. This is an issue that "rides the technology curve."
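This arithmetic can be summarized in a short calculation (a sketch only; the response budget, latencies, and file sizes are the illustrative values used above, not measurements):

```python
# Bandwidth needed to deliver one file within the interactive budget.
# All constants are the illustrative values discussed above, not measurements.

RESPONSE_BUDGET = 0.150   # s: interactive response target
NETWORK_LATENCY = 0.100   # s: transmission and switching latency
OTHER_OVERHEAD  = 0.010   # s: request creation, parsing, retrieval, response

TRANSFER_BUDGET = RESPONSE_BUDGET - NETWORK_LATENCY - OTHER_OVERHEAD  # ~40 ms

def required_mbps(file_bytes, budget_s=TRANSFER_BUDGET):
    """Bandwidth (Mbps) needed to move file_bytes within budget_s."""
    return file_bytes * 8 / budget_s / 1e6

for name, size in [("HTML page", 6_000),
                   ("small image", 60_000),
                   ("large still image", 200_000)]:
    print(f"{name:17s}: {required_mbps(size):5.1f} Mbps")   # ~1.2, 12, and 40 Mbps
```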
The minimum potential base of WWW is easily millions of simultaneous users, and the aggregate bandwidth could be in excess of 1 Tbps as well, satisfying the other GBN criteria.
Web pages average 10-20 hypertext links per page and 6 Kilobytes per file (measured). If the user spends 20 seconds reading each page, the entire contents of every possible "next page" can be sent using 20 links * 6 KB/link / 20 s, or about 50 Kbps. If the user scrolls through at 1 page per second, that increases to 1 Mbps.
These are modest bandwidth requirements, but they increase where the Web pages have images, averaging 60 Kbytes per page, or where the pages have embedded "IMG" icons, PostScript (100 Kbytes average), or executable binaries (200 Kbytes average, at NCSA). Even higher bandwidth is required for video clips (1 Mbyte), around 160 Mbps. When a user begins poking around video archives, the interaction time drops to 0.2 seconds, resulting in a single-user bandwidth requirement of 800 Mbps.
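The same estimate applies to presending every possible "next page" within one interaction interval (again a sketch; the link counts, object sizes, and interaction times are the figures quoted above):

```python
# Bandwidth needed to presend all linked objects within one interaction interval.
# Link counts, object sizes, and interaction times are the figures quoted above.

def presend_mbps(links, bytes_per_link, interaction_s):
    """Bandwidth (Mbps) to deliver every linked object before the next request."""
    return links * bytes_per_link * 8 / interaction_s / 1e6

print(presend_mbps(20, 6_000, 20.0))       # HTML text, 20 s reading:   ~0.05 Mbps
print(presend_mbps(20, 6_000, 1.0))        # HTML text, 1 page/s:       ~1 Mbps
print(presend_mbps(20, 1_000_000, 1.0))    # video clips, 1 page/s:     ~160 Mbps
print(presend_mbps(20, 1_000_000, 0.2))    # video clips, 0.2 s/page:   ~800 Mbps
```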
Further, users currently pay for maximum bandwidth, but not per-packet. This should continue (the PTTs use this as a design criterion). Unused bandwidth is wasted - there is not necessarily a need to charge for extra bandwidth. Using this bandwidth to reduce the user-perceived latency is a win.
These WWW modifications are also useful in low bandwidth environments, where the channel is idle in-between requests. The common characteristic is that of surplus bandwidth-delay product. In one case, the product is "vertical" - the increase in BW results in a "tall" pipe that can not be filled. In the other case, the product is "horizontal" - idle periods form longitudinal gaps in the pipe that are not filled. The solutions proposed here enable interactive WWW applications that require surplus bandwidth-delay product, regardless of which type.
One solution is to use multiplexing to share the channel among user processes [8]. This is a parallel of process-swapping in OS - when one process runs out of data to send, another is activated. This works, provided that the process activation is deterministic, i.e., that the other side of the channel knows what the process activation order is [11]. This is also tantamount to not having a gigabit application protocol - each application does not use a gigabit channel, so bandwidth needs to be aggregated over a set of multiplexed applications.
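A minimal sketch of the deterministic case follows (the process names and framing are hypothetical); because both ends share the fixed activation order, the receiver can demultiplex by position alone, without per-slot feedback:

```python
# Deterministic (round-robin) multiplexing: the activation order is known a
# priori at both ends, so no per-slot header or feedback is required.
# Process names and data are hypothetical.

ACTIVATION_ORDER = ["ftp", "mail", "news"]

def mux(queues):
    """Emit one slot per process, cycling in the agreed order (None = idle)."""
    slots = []
    while any(queues.values()):
        for name in ACTIVATION_ORDER:
            q = queues[name]
            slots.append(q.pop(0) if q else None)
    return slots

def demux(slots):
    """Reassign slots to processes purely by position in the agreed order."""
    out = {name: [] for name in ACTIVATION_ORDER}
    for i, chunk in enumerate(slots):
        if chunk is not None:
            out[ACTIVATION_ORDER[i % len(ACTIVATION_ORDER)]].append(chunk)
    return out

queues = {"ftp": ["f1", "f2"], "mail": ["m1"], "news": []}
assert demux(mux(queues)) == {"ftp": ["f1", "f2"], "mail": ["m1"], "news": []}
```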
The case where the application order is not predictable is more interesting, as will be discussed later.
The questions this paper addresses are posed as the following five challenges.
Challenge #1 - Increase the Clock Rate
One easy way to increase the speed of a protocol is to increase the clock rate of the processor, interface, and transmission lines. If the protocol is slow because you are using CMOS processors, it might work fine in ECL technology.
There are a few issues here. First, fast protocols require fast operating systems, fast interfaces, fast transmission lines, fast memory, fast disks, and fast everything else. If the problem is completely solved by speeding up the clock rate of the system, then this is not a protocol issue. It is a clock rate issue.
Also consider the fact that fast TCP code relies on precompiled branch prediction (so-called "fast-path"), a very old and well known optimization technique. Some of the other protocol stack optimizations are just recognition that general interfaces are slow, and specific interfaces can be optimized, also a well-known tradeoff. None of these change the protocol - they are implementation enhancements.
There is a need to distinguish between protocol issues and issues of overall speed. If the protocol itself is not keeping up with the speeds of the rest of the system, then protocol issues are indeed involved. Parallelization addresses this case, but does not always help [14].
In general, data path parallelization works well, but control path parallelization is, by definition, poor. Control paths can not be factored efficiently without synchronization between components, which adds overhead that defeats the parallelization.
Data path parallelization is a way to speed the "clock rate" of the data. Control parallelization works where packets are unrelated - e.g., UDP, but not for TCP-like protocols [14]. In the latter case, regardless of partitioning (per packet, per function, etc.), the parallelism is limited to about 5 processors per connection.
The real issue is that of "protocol relativity" [12]. A protocol does not know the clock rate - only the number of bits in transit between components. Protocol speedup is a control and feedback issue, sensitive to the bandwidth-delay product only.
Challenge #2 - Multiplex (Deterministic)
Multiplexing has been proposed as a solution to the "do not have enough stuff to send" issue, as mentioned before [8]. This is equivalent to not having a gigabit application - the gigabits are shared among a set of applications on a workstation. Multiple hosts, users, or processes are all merely ways of providing aggregate gigabits.
In the deterministic multiplexer case, this avoids the domain examined here [11]. In the nondeterministic case, the problem has just moved down a level in the protocol stack. In the original case, there was not enough data to send, primarily because only enough data was sent to satisfy the current state of the other end of the channel, and not any possible subsequent states. If there were enough for subsequent states, that data would have been sent too, and so on, increasing the amount of data available to send ad infinitum. This is how sliding windows works - by predicting subsequent states, in a linear manner.
If the subsequent state is not predictable, neither is the subsequent data [12]. It does not matter whether it is the application state, or the multiplexer state. Nondeterministic multiplexing moves the state prediction problem to the multiplexer-synchronization level, i.e., lower in the protocol stack.
Challenge #3 - Use Large Payloads
Using large payloads is another way to shut off the protocol, increasing the effective data rate [13]. Versions of TCP run at 1 Gbps by using 64 Kbyte packets, i.e., by this technique (e.g., UltraNet).
Using large payloads slows down the control protocol. Header frequency determines the rate of the protocol. The ratio of header to payload determines the stability of the protocol [14].
Large payloads are also an attempt to amortize the cost of context switches. Increased payload transfers between the host and network reduce the effective overhead of the transfer setup costs. This is an attempt to overcome an existing problem in the host - if context switches are this costly, making network I/O faster is the least of the worries.
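A rough calculation shows the effect on the control protocol (a sketch; the packet sizes are the conventional 1500-byte payload and the 64 Kbyte payload mentioned above):

```python
# Header-processing (control) events per second at full line rate, as a
# function of payload size. Larger payloads run the control protocol less often.

def control_events_per_sec(line_rate_bps, payload_bytes):
    return line_rate_bps / (payload_bytes * 8)

GIGABIT = 1e9
print(control_events_per_sec(GIGABIT, 1_500))    # ~83,000 control events/s
print(control_events_per_sec(GIGABIT, 65_536))   # ~1,900 control events/s
```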
Challenge #4 - Increase Window Size
As mentioned before, increasing the window size increases the amount of information an implementation of a sliding-windows protocol can pipeline. This helps fill the pipe only if the application has sufficient data to supply. It addresses an implementation deficiency only. The main difficulties are backward compatibility and acquiring the consensus of standards bodies [5] [6] [7].
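The required window is the bandwidth-delay product (a sketch; the 100 ms round-trip time is an assumed, illustrative value):

```python
# Bytes that must be in flight to fill the pipe: the bandwidth-delay product.
# The round-trip time used here is an assumed, illustrative value.

def window_bytes(bandwidth_bps, rtt_s):
    return bandwidth_bps * rtt_s / 8

print(window_bytes(1e9, 0.100))   # 12.5 MB in flight at 1 Gbps over a 100 ms RTT
print(64 * 1024)                  # 65,536-byte limit of the unscaled 16-bit TCP window
# The window-scaling extensions [5][6][7] close this gap, but only if the
# application actually has that much data queued to send.
```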
The window size parameter is also an example of a compile- or run-time parameter that is unfortunately treated as a specification constant. There are several such parameters - maximum protocol data unit, timeout values, window granularity and range, etc. The constancy of these parameters is a limitation of implementations only.
Challenge #5 - Relocate Everything
If the data can be copied or cached, i.e., if it is stable enough that there is no need for separating it from the application, then a copy can be put near the application via a low-speed channel to avoid the need for high-speed communication altogether. A protocol is not needed if there is no communication, or more precisely, if there is no feedback of state between two separated entities.
Another way to relocate data is to circulate it among the nodes of a protocol [4]. Variants of this protocol rely on predictive behavior of data reuse to govern caching [1]. These are useful techniques, but are protocol extrapolations of previously known methods.
One application that requires a gigabit protocol is an interactive client/server system with real-time response. Several years ago when the problem of latency and high speed protocols was analyzed, the conditions were specified under which bandwidth and bandwidth-delay product could be used to compensate for latency. The goal is to reduce the perceived latency, to give the illusion of low latency. This work began as "Mirage" (a model) and continues as "Parallel Communication" (a protocol based on Mirage) [12].
Latency compensation is possible using source-based anticipation (presending). The composition of two presending channels back-to-back yields the more common receiver-based anticipation, i.e., prefetching. Presending differs in that the anticipation is driven by the source.
There are several advantages of presending over prefetching. Presending distributes the computational effort between source and receiver. It also avoids unnecessary prefetch messages from the receiver, allowing better use of asymmetric communications channels (e.g., satellite, cable-TV, or high-speed digital telephone with dial-up feedback).
This solution requires knowledge of the state space evolution of the other end of the channel, where the state evolution has moderately-constrained branching properties. The domain was described where source-anticipation would help, specifically distributed hypermedia navigation [12] [13].
This describes the WWW, used as a real-time interactive distributed system [2]. The WWW browsers are currently used as client/server interfaces, where response time tolerance is high. Casual users, however, have begun to expect a level of real-time interaction that does not match the client/server design of the system.
In addition, HTML (the document notation language of the WWW) has come to be an effective high-level application language. Systems use WWW to drive bulletin-board services, interactive query systems, on-line forms systems, etc. This further drives the expectation of interactive response time from these WWW interactive applications (WWWias).
Char. #1 - Requires Feedback
Non-interactive applications, i.e., those that pipeline data to fill the bandwidth-delay product, can be accommodated with existing transport protocols. These include streaming data applications, such as digital audio or video, as used in teleconferencing.
WWWias require feedback between the client and server. Even though the servers are stateless, they keep soft state that helps govern source-based anticipation.
Although there are caching proxies for WWW servers, they will not help for the first-use of documents. If the response time is very large, even for some small percent of the time, the interactive nature of the WWWias will be defeated. Also, the WWW drives the interaction towards first-use, because the clients themselves have caches.
Char. #2 - "Non-linear" Communication
The feedback needs to be non-deterministic. Otherwise simple pipelining again works fine, as in the case of sending a very large file or database in total [1] [4].
WWWias have a branching control structure with recursion, as indicated by the URL links and the "history" of the browser (user interface).
Large windows or packets help only during the transmission of a branch item. The branching structure cannot be accommodated by current sliding-windows protocols, and inhibits use of large linear windows or large packets [12].
The combination of feedback and nonlinear communication defines a rich control structure. It is this structure that the source uses to guide its presending. Making the data chunks larger reduces the richness of the control. WWWias are reasonably rich, because the branching of the control is reasonably large (7-10 links per page), even though the data chunk is small (6 K for HTML text).
Char. #3 - "Well-Defined" App.-App. BW
This characteristic helps establish that the data cannot simply be moved, and that the distributed application has not been partitioned at a point that demands high bandwidth without other justification. The system should not require dissection or detailed constraints to evidence the application-application bandwidth. WWWias are well-defined - the server is one side, the browser is the other.
The control is reasonably rich with respect to the packet stream. The data chunks are large (30-60 K), but the control is much richer than "buffer empty/full" as in current protocols - it specifies a unique file on the server.

FIGURE 1. Implementation of the WWW intermediaries called the pump and filter.

FIGURE 2. Design of the Pump and Filter appears to the server and client as if it were a Proxy Cache.
The pump acts as a proxy for the browser at the server. It keeps soft state indicating the last request received from the browser, and peeks into the data stream to find URLs embedded in replies from the server. The pump then makes requests for URLs on the same server to be forwarded to the filter. The pump and filter together appear as a proxy cache to the client and server (Figure 2). The protocol is outlined in Figures 3 and 4.
The pump permits two kinds of HTML replies to be sent to the browser - direct replies, and pre-sent replies. The pre-sent replies are tagged to be saved on the disk by the filter. In a high bandwidth-delay product network, these tags may not be necessary, because the pre-sent documents arrive just as they are needed at the browser. The most disk space required is the larger of the bandwidth-delay product and the bandwidth-(idle-time) product. If there is some upper bound on reasonable disk usage for the filter to cache pre-sent data, it can be indicated to the pump, to avoid wasted effort.
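The pump's behavior can be outlined in a few lines (a conceptual sketch only, not the implementation; the origin server is modeled as a simple table of documents and their same-server links, and all names are illustrative):

```python
# Conceptual sketch of the pump (Figures 2 and 3). The origin server is modeled
# as a dict of URL -> (document, same-server links); names are illustrative.

def pump_handle_request(server_docs, url, send_to_filter, disk_budget=None):
    """Serve the requested URL, then presend the same-server documents it links to."""
    doc, links = server_docs[url]
    send_to_filter(url, doc, tag="direct")        # the reply the browser asked for
    budget = disk_budget
    for link in links:                            # URLs peeked from the reply
        if link not in server_docs:
            continue                              # presend only what this server holds
        presend_doc, _ = server_docs[link]
        if budget is not None:
            if len(presend_doc) > budget:
                break                             # respect the filter's disk limit
            budget -= len(presend_doc)
        send_to_filter(link, presend_doc, tag="pre-sent")

docs = {"/index.html": ("<a href=/a.html>...</a>", ["/a.html"]),
        "/a.html": ("page a", [])}
pump_handle_request(docs, "/index.html", lambda u, d, tag: print(tag, u))
```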

FIGURE 3. Pump operation.
The filter stores forwarded server replies to the disk. It also intercepts URL requests from the browser. If the URL is on the disk, the filter responds to the request from the disk and forwards the URL to the pump (not to be forwarded to the server). If the URL is not on the disk, the request is sent to the pump to be forwarded to the server (Figure 3).
Note that in either case, the URL is sent to the pump. This provides feedback to the pump. In the case where the file has not yet been sent, it indicates a corrective action to the pump. In the case where the pump has already sent the file to the disk, it indicates which file was used, and permits the pump to focus further preloading.
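The filter's side is symmetric (again a conceptual sketch with illustrative names; the disk cache is modeled as a simple dictionary):

```python
# Conceptual sketch of the filter (Figure 4). The disk cache is a dict of
# URL -> document; names are illustrative.

def filter_handle_request(cache, url, send_to_browser, send_to_pump):
    """Answer locally when possible; in either case tell the pump which URL was used."""
    if url in cache:
        send_to_browser(cache[url])                  # hit: latency hidden from the user
        send_to_pump(url, forward_to_server=False)   # feedback only, to focus presending
    else:
        send_to_pump(url, forward_to_server=True)    # miss: conventional request/response

def filter_store_presend(cache, url, doc, disk_limit=None):
    """Save a pre-sent document to the disk, within any configured limit."""
    used = sum(len(d) for d in cache.values())
    if disk_limit is None or used + len(doc) <= disk_limit:
        cache[url] = doc
```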

FIGURE 4. Filter operation.
The branching-TCP extensions support the tags indicated in the figures and provide application-layer signalling of excess bandwidth that can be used for latency reduction.
The pump manages the sending of all possible next requests, and manages the possible states of the client. The pump uses the server-side TCP signal of excess bandwidth to initiate presending, and the branching window allows the pump to send alternate streams of messages to the client. As the pump emits these messages, the branching in the server-side TCP increases.
The filter allows the browser to receive only those messages that correspond to a particular state. This client-side TCP also indicates branch selections to the server-side TCP, to perform state resolution.
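The branching behavior can be illustrated abstractly (a sketch only, not the TCP extension itself; branch identifiers and class names are hypothetical): each anticipated stream carries a branch tag, the client side delivers only the branch matching the browser's actual state, and the selection is reported back so the server side can prune the alternatives.

```python
# Abstract sketch of branching-window state resolution. Each alternate stream
# is tagged with a branch id; selecting one branch discards its siblings.

class BranchingSender:                      # server-side (pump) view
    def __init__(self):
        self.branches = {}                  # branch id -> queued messages

    def open_branch(self, branch_id, messages):
        """Queue an alternate stream for one possible next client state."""
        self.branches[branch_id] = list(messages)

    def resolve(self, chosen):
        """On client feedback, keep the chosen branch and prune the rest."""
        kept = self.branches.pop(chosen, [])
        self.branches.clear()
        return kept

class BranchingReceiver:                    # client-side (filter) view
    def __init__(self):
        self.pending = {}                   # branch id -> delivered messages

    def deliver(self, branch_id, message):
        self.pending.setdefault(branch_id, []).append(message)

    def select(self, branch_id):
        """Pass up only the branch matching the browser's actual request."""
        chosen = self.pending.pop(branch_id, [])
        self.pending.clear()
        return chosen
```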
One observation is that the current Web cannot be supported interactively over ISDN lines (a 14% hit rate within 0.1 seconds). By augmenting the protocol to support server preloading of receiver caches, the same bandwidth can support a 0.1 second response 83% of the time (Figure 5).

FIGURE 5. Response time (probability of a 0.1 second response) as bandwidth increases.
The bandwidth required for source preloading of receiver caches in the Web has also been measured. Specifically, the number of links per page (Figure 6) and the amount of bandwidth required for general preloading (Figure 7) were measured. The links per page are measured both in general, and local to the server (where preloading is possible). The bandwidth comparison is the bytes per page vs. the total bytes required for the files pointed to by the links on that page.

FIGURE 6. Number of HREFs (hypertext links) per page in the Web (Local only, and All).

FIGURE 7. Amount of additional bandwidth required (relative to the current page).
The channel utilization can be measured, where the goal is a load of 100%. Conventional request/response systems achieve loads near 50%, because the response channel is idle in-between and during requests. The goal is to keep the server-to-client channel busy 100% of the time.
The bandwidth of the messages that are actually received (effective bandwidth) can also be measured. This will always be at least as large as the effective bandwidth of a request/response system, because guessed messages are not counted, and because a direct request always overrides this protocol.
Similarly, the effective latency is always reduced relative to a conventional protocol. Responses that are anticipated reduce the measured latency, and responses not anticipated cost the same as in the conventional case.
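This can be stated compactly (a sketch, with symbols introduced here for illustration): with p the fraction of requests that are anticipated, T_cache the local response time at the filter, and T_conv the conventional client/server response time,

```latex
T_{\mathrm{eff}} \;=\; p\,T_{\mathrm{cache}} \;+\; (1 - p)\,T_{\mathrm{conv}}
\;\le\; T_{\mathrm{conv}}, \qquad \text{since } T_{\mathrm{cache}} \le T_{\mathrm{conv}}.
```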
The overall cost is difficult to measure without externally imposed network cost functions. The cost can be expressed in terms of the bandwidth used, but this is of little meaning due to the number of variables. The real result is that a set of conditions must exist for the trade to pay off: surplus bandwidth-delay product, together with the feedback, non-linear communication, and well-defined application-application bandwidth characterized above.
This discussion also assumes the availability of sufficient information (i.e., hypermedia links) to support server-based preloading. There need not be a correlation between users (ensemble) or a repeated history of a single user's actions (temporal); the requested item need only be from among the links on a page, rather than overridden by typing in an arbitrary URL. The URLs within links on a page are information the server can use to optimize the response latency; arbitrary URLs are (by nature) unpredictable, and will require a conventional client/server interaction (and its associated latency).
[2] Berners-Lee, T.J., Cailliau, R., Groff, J-F, Pollermann, B., "World-Wide Web: The Information Universe," Electronic Networking: Research, Applications and Policy, Meckler Publishing, Connecticut, Spring 1992, pp. 52-58.
[3] Braden, R., "Extending TCP for Transactions -- Concepts," RFC-1379, USC/ISI, Nov. 1992.
[4] Herman, G., Gopal, G., Lee, K., and Weinrib, A., "The datacycle architecture for very high throughput database systems," in Proc. ACM SIGMOD Conf., 1987, pp. 97-103.
[5] Jacobson, V., and Braden, R., "TCP Extensions for Long-Delay Paths," RFC-1072, LBL and USC/Information Sciences Institute, Oct. 1988.
[6] Jacobson, V., Braden, R., and Zhang, L., "TCP Extensions for High-Speed Paths," RFC-1185, LBL and USC/Information Sciences Institute, Oct. 1990.
[7] Jacobson, V., Braden, R., and Borman, D., "TCP Extensions for High Performance," RFC-1323, LBL, USC/Information Sciences Institute, and Cray Research, May 1992.
[8] Kleinrock, L., "The Latency / Bandwidth Tradeoff in Gigabit Networks," IEEE Communications Magazine, Vol. 30, No. 4, April 1992, pp. 36-40.
[9] NSF Report 92-109, "Research Priorities in Networking and Communications," Oct. 1992.
[10] Sterbenz, J., et al., Gigabit Networking Workshop '94, <http://info.gte.com/ieee-tcgn/conference/gbn94>.
[11] Touch, J.D., and Farber, D., "Reducing Latency in Communication," IEEE Communications Magazine, Vol. 31, No. 2, Feb. 1993, pp. 8-9.
[12] Touch, J.D., "Parallel Communication," Proc. IEEE Infocom, Mar. 1993, pp. 505-512.
[13] Touch, J.D., and Farber, D., "An Experiment in Latency Reduction," Proc. IEEE Infocom, May 1994, pp. 175-181.
[14] Touch, J.D., "Protocol Parallelization," Protocols for High-Speed Networks III, Elsevier, 1994, (to appear).
Joseph D. Touch (S'83-M'92) received the B.S. (Hons.) degree in biophysics and computer science from the University of Scranton in 1985, the M.S. degree from Cornell University in 1988, and the Ph.D. degree from the University of Pennsylvania in 1992, both graduate degrees in computer science.
He joined USC/Information Sciences Institute, Marina del Rey, in 1992, and is currently a Project Leader in the High Performance Computing and Communications Division there, directing the ATOMIC-2 and PC-ATOMIC tasks. He is also a Research Assistant Professor in the Department of Computer Science, University of Southern California, where he teaches Advanced Operating Systems. Since 1988 he has been addressing issues of latency and source-anticipative protocols. In 1994, he received a U.S. patent for a latency-reducing processor-memory interface device. He is also interested in issues of telecommuting and on-line city services, and in response-time reducing extensions to the World-Wide Web.
Dr. Touch is a member of the program committees of IEEE Infocom '94 and '95, Protocols for High Speed Networks '94, and Physcomp '94. He is a member of Sigma Xi (S'84, M'93).