Vikram Visweswaraiah and John Heidemann
TCP's congestion avoidance mechanisms are not tuned for request-response traffic such as HTTP. Prior work on HTTP performance has shown that enhancements to HTTP (P-HTTP) can result in poorer performance than expected, suggesting that TCP itself may need changes to deliver the expected performance. The increasing use of the World Wide Web and of HTTP outside the Web require a clearer understanding of these changes and of the problems that would exist without them. One such problem is that some TCP implementations force slow-start in the middle of a connection that has been idle for a certain amount of time, even when there is no packet loss. Other TCP implementations do not treat idle time as a special case and send data using the prior value of the congestion window. Both extremes lead to poor performance of P-HTTP over TCP. This document describes the motivation for and implementation of rate based pacing for TCP, which provides a good compromise between the two extremes.
A compressed, PostScript version of this document is available for off-line reading.
The infrastructures for information exchange have evolved rapidly in the recent past. Changes in application behavior have resulted in different network dynamics, driving the networking community to tune the underlying protocols for optimal performance. The World Wide Web, which uses HTTP, is one such application. The increasing use of the web and the use of HTTP in applications outside the Web domain emphasize the need to enhance the performance of HTTP. One such enhancement, only recently being standardized in HTTP/1.1, is P-HTTP, an implementation of HTTP that avoids the need for multiple TCP connections across a transaction to the same server. However, P-HTTP interacts with current TCP implementations in ways that degrade performance. One of these interactions involves TCP's congestion avoidance mechanisms and is examined in this document.
We describe the problem, called ``slow-start restart,'' and propose a possible solution. We then describe our implementation of the solution and contrast the new behavior with that of existing TCP implementations. Finally, we describe the current status of our work and discuss future goals.
TCP is not optimized for multiple request/response exchanges over a single connection, which is the common case with HTTP/1.1. When a new request/response exchange occurs after the connection has been idle, how should TCP on the server behave? Some TCP implementations force slow-start again (for example, 4.4BSD and Linux 2.x). Others (SunOS) do not even detect the idle time and simply reuse the old value of the congestion window. The latter approach can overrun queues at intermediate routers, leading to packet loss. Restarting with slow-start avoids this risk, but it adds delay every time we slow-start back up to steady state. This can degrade the performance of the layers TCP serves, P-HTTP being a prime example.
Prior work has suggested that this ``slow-start restart'' problem contributes to the poor performance of P-HTTP over TCP. One way to solve the problem is to send segments at a certain ``pace'' until the ACK clock is running again. This pace, or rate, should be a fraction of prior estimates of the data transfer rate, since that is the closest estimate of available bandwidth we have (if we had some magical way of knowing the exact available bandwidth at the end of the idle time, we would use that instead). We believe this modification, called Rate Based Pacing (RBP), will give better performance under the circumstances described above.
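The fraction-of-prior-bandwidth idea can be made concrete with a small sketch. The function and parameter names below, and the one-half fraction in the usage comment, are our illustrative assumptions, not the actual kernel constants:

```c
/* Sketch: derive the inter-segment spacing RBP would use from a prior
 * bandwidth estimate.  Pacing at fraction f of the old estimate means
 * sending one MSS-sized segment every mss / (f * bw) seconds. */
static double rbp_interval_usec(double est_bw_bytes_per_sec,
                                double mss_bytes,
                                double fraction)
{
    double paced_bw = fraction * est_bw_bytes_per_sec;  /* bytes/sec to pace at */
    return 1e6 * mss_bytes / paced_bw;                  /* usec between segments */
}

/* Example: with a prior estimate of 32 KB/s, 512-byte segments, and
 * pacing at half the old rate:
 *   rbp_interval_usec(32768.0, 512.0, 0.5)  ->  31250 usec per segment */
```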
Rate based pacing requires the following changes to TCP:
Idle time detection is already done by some TCP implementations (4.4BSD, Linux 2.x); instead of forcing slow-start upon detecting idle time, we modify this behavior to enter RBP. TCP Vegas gives us a method for bandwidth estimation; we borrowed USC's Vegas port for our implementation. The RBP window and the spacing between segments within that window are functions of the estimated bandwidth and the RTT. The segments are clocked out by a custom RBP timer that is active only while RBP is in effect.
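The three post-idle behaviors can be contrasted in a sketch. The enum, function, and the idea that RBP's window derives from the bandwidth-delay estimate follow the text above; the names themselves are hypothetical, not the modified kernel's code:

```c
/* Which restart policy a TCP applies after an idle period. */
enum restart_mode {
    NO_RESTART,           /* SunOS 4.1.3: keep the stale window */
    SLOW_START_RESTART,   /* 4.4BSD, Linux 2.x: collapse to one segment */
    RBP_RESTART           /* rate based pacing */
};

/* Sketch of the congestion window chosen when data transfer resumes.
 * old_cwnd is the pre-idle window, mss the segment size, and rbp_win a
 * window computed from the estimated bandwidth and RTT (all in bytes). */
static unsigned restart_cwnd(enum restart_mode mode, unsigned old_cwnd,
                             unsigned mss, unsigned rbp_win)
{
    switch (mode) {
    case NO_RESTART:         return old_cwnd;  /* risks router queue overrun */
    case SLOW_START_RESTART: return mss;       /* safe but slow to ramp up */
    case RBP_RESTART:        return rbp_win;   /* paced out by the RBP timer */
    }
    return mss;
}
```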
The implementation does not change the default behavior of the kernel: RBP mode is enabled on a connection only via the setsockopt interface. We extended that interface so that RBP or slow-start restart can be selected per connection, which avoids recompiling the kernel to test each case.
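A caller might select the restart behavior along the following lines. The option name TCP_RBP_MODE, its number, and the mode values are hypothetical placeholders for the modified kernel's actual interface; on a stock kernel the call simply fails with ENOPROTOOPT:

```c
#include <sys/socket.h>
#include <netinet/in.h>

/* Hypothetical option number and mode values, standing in for whatever
 * the modified SunOS 4.1.3 kernel actually defines. */
#ifndef TCP_RBP_MODE
#define TCP_RBP_MODE 0x100
#endif
enum { RBP_DEFAULT = 0, RBP_SLOW_START = 1, RBP_PACED = 2 };

/* Request a restart behavior on this connection; returns 0 on success,
 * -1 with errno set (e.g. ENOPROTOOPT on a kernel without RBP). */
static int request_rbp(int fd, int mode)
{
    return setsockopt(fd, IPPROTO_TCP, TCP_RBP_MODE, &mode, sizeof mode);
}
```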
The goal of our experiments was to verify that RBP mode works. An easy way to do this is to send a burst of data from an RBP-enabled machine, pause for longer than the retransmission time-out interval, and then send another burst. RBP behavior should be observable at the beginning of the second transfer.
A program capable of doing such data transfers is Stevens' sock program. To compare RBP with the other two extremes (slow-start restart and no slow-start restart), we modified sock to understand setsockopt options for all three cases, adding a command-line switch to select among them.
When the switch is unspecified, the behavior is that of stock SunOS 4.1.3, i.e., no slow-start restart.
Using sock, we ran tests from a Sun SPARC 20/71 at ISI West, running the modified SunOS 4.1.3 kernel, to a machine on the east coast (metro). Typical network conditions were 12 hops, a 200 ms average RTT, and approximately 32 KB/s of bandwidth.
Each test sent two 512 KB chunks to metro with a 20-second pause in between; metro ran sock as a sink. This creates the idle period we need, letting us observe what each flavor of TCP does when data transfer resumes in mid-connection.
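The traffic pattern can be sketched as follows; this is a simplified stand-in for sock, with an illustrative helper name rather than sock's actual code:

```c
#include <unistd.h>
#include <string.h>

/* Write len bytes of dummy data to fd; returns bytes written, -1 on error. */
static long write_burst(int fd, long len)
{
    char buf[4096];
    long sent = 0;
    memset(buf, 'x', sizeof buf);
    while (sent < len) {
        size_t chunk = (len - sent) < (long)sizeof buf
                           ? (size_t)(len - sent) : sizeof buf;
        ssize_t n = write(fd, buf, chunk);
        if (n < 0)
            return -1;
        sent += n;
    }
    return sent;
}

/* The experiment then mirrors the text:
 *   write_burst(fd, 512 * 1024);   first 512 KB chunk
 *   sleep(20);                     idle longer than the RTO
 *   write_burst(fd, 512 * 1024);   second chunk exercises the restart path
 */
```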
Using tcpdump on the sending side and programs to graph tcpdump output, we were able to get sequence number vs time plots for each case.
Figure 1: Default SunOS 4.1.3 behaviour when data transfer occurs after idle time
Figure 2: Slow start restart behaviour when data transfer occurs after idle time
Figure 3: TCP RBP's behaviour when data transfer occurs after idle time
Restarting with slow-start in the middle of a connection can lead to poor performance. At the same time, dumping segments all at once can mean overrunning intermediate router queues, leading to a drop in throughput. Rate based pacing gives a good compromise between the two extremes and solves the slow-start restart problem.
We are currently conducting experiments to examine the impact of RBP on HTTP throughput.
Source code for rate based pacing is currently available only for SunOS4.1.3. Follow this link for further instructions on downloading.
Rate Based Pacing for TCP
The translation was initiated by Vikram Visweswaraiah on Tue Jun 17 17:59:05 PDT 1997