Rate Based Pacing for TCP

Vikram Visweswaraiah and John Heidemann

Abstract:

TCP's congestion avoidance mechanisms are not tuned for request-response traffic such as HTTP. Prior work on HTTP performance has shown that enhancements to HTTP (P-HTTP) can result in poorer performance than expected, suggesting that changes to TCP may be needed to realize the expected gains. The increasing use of the World Wide Web, and of HTTP in areas beyond the Web, requires a clearer understanding of these changes and of the problems that arise without them. One such problem is that some TCP implementations force slow-start in the middle of a connection that has been idle for a certain amount of time, even when no packets have been lost. Other implementations do not treat idle time as a special case and continue sending with the prior value of the congestion window. Both extremes lead to poor performance of P-HTTP over TCP. This document describes the motivation and implementation of rate based pacing for TCP, which provides a good compromise between the two extremes.


Introduction

The infrastructures for information exchange have evolved rapidly in the recent past. Changes in application behavior have produced different network dynamics, driving the networking community to tune the underlying protocols for optimal performance. The World Wide Web, which uses HTTP, is one such application. The increasing use of the web, and the use of HTTP in applications outside the Web domain, emphasize the need to enhance the performance of HTTP. One such enhancement, only recently being standardized in HTTP/1.1, is P-HTTP, an implementation of HTTP that carries multiple requests to the same server over a single TCP connection rather than opening a new connection per request [1]. However, P-HTTP interacts with current TCP implementations in ways that degrade performance [2]. This document examines one of those interactions, which involves TCP's congestion avoidance mechanisms.

We describe the problem, known as ``slow-start restart,'' and propose a possible solution. We then describe our implementation of the solution and contrast the new behavior with that of existing TCP implementations. Finally, we describe the current status of our work and discuss future goals.

The Slow-start Restart Problem

TCP is not optimized for multiple request-response exchanges over a single connection, which is the common case with HTTP/1.1. When a new request-response exchange occurs after the connection has been idle, how should TCP on the server behave? Some TCP implementations force slow-start again (for example, 4.4 BSD and Linux 2.x). Other implementations (such as SunOS) do not detect the idle time at all and simply use the old value of the congestion window. The latter approach can overrun queues at intermediate routers, leading to packet loss. Restarting with slow-start avoids this risk, but it adds delay each time the connection must slow-start back up to steady state. This delay can degrade the performance of the layers TCP serves, P-HTTP being a strong example.
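
For reference, the restart check in 4.4 BSD-style stacks amounts to a few lines at the top of the output path. The sketch below is simplified and the structure definition abridged; field names follow the BSD control block, but this is an illustration rather than verbatim kernel source.

    #include <sys/types.h>

    /* Abridged view of the relevant TCP control-block fields. */
    struct tcpcb {
            u_long  snd_una;   /* oldest unacknowledged sequence number */
            u_long  snd_max;   /* highest sequence number sent so far */
            u_long  snd_cwnd;  /* congestion window, in bytes */
            u_long  t_maxseg;  /* maximum segment size, in bytes */
            short   t_idle;    /* ticks since a segment was last received */
            short   t_rxtcur;  /* current retransmission timeout, in ticks */
    };

    /*
     * 4.4 BSD-style idle check, run before sending: if everything sent
     * has been acknowledged and the connection has been idle for at
     * least one retransmission timeout, collapse the congestion window
     * to a single segment, forcing a slow-start restart.
     */
    void
    tcp_idle_check(struct tcpcb *tp)
    {
            int idle = (tp->snd_max == tp->snd_una);  /* nothing in flight */

            if (idle && tp->t_idle >= tp->t_rxtcur)
                    tp->snd_cwnd = tp->t_maxseg;      /* back to one segment */
    }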

Prior work has suggested that this ``slow-start restart'' problem contributes to the poor performance of P-HTTP over TCP [2]. One way to solve the problem is to send segments at a certain ``pace'' until the ACK clock is running again. This pace, or rate, should be a fraction of prior estimates of the data transfer rate, since those estimates are the closest measure of available bandwidth we have; lacking any way to know the exact available bandwidth at the end of the idle period, the history of the connection is the best guide. We believe that this modification, called Rate Based Pacing (RBP), will give better performance under the circumstances described above.

RBP Implementation

Rate based pacing requires the following changes to TCP:

  1. Idle time detection, and an indication that RBP should be started.
  2. Bandwidth estimation.
  3. Calculation of the window that we expect to send in RBP and of the timing between segments in that window (a sketch of this calculation follows the list).
  4. A mechanism that clocks the segments sent in RBP.
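
To make step 3 concrete, the sketch below shows one way the paced window and the inter-segment gap could be derived from a bandwidth estimate and the RTT. The fraction applied to the estimate and all names here (rbp_params, RBP_FRACTION, bw_est) are our illustrative assumptions, not the actual kernel code.

    /*
     * Illustrative RBP parameter calculation.  Given a bandwidth
     * estimate bw_est (bytes/second), a smoothed round-trip time srtt
     * (seconds), and the segment size mss (bytes), pace out a fraction
     * of the estimated bandwidth-delay product, spread over one RTT.
     */
    #define RBP_FRACTION 0.5   /* assumed safety factor on the old estimate */

    void
    rbp_params(double bw_est, double srtt, int mss,
               int *rbp_segs, double *rbp_gap)
    {
            double bdp = RBP_FRACTION * bw_est * srtt;  /* bytes to pace out */
            int segs = (int)(bdp / mss);

            if (segs < 1)
                    segs = 1;            /* always send at least one */
            *rbp_segs = segs;            /* paced window, in segments */
            *rbp_gap = srtt / segs;      /* seconds between segments */
    }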

Idle time detection is already done by some TCP implementations (4.4 BSD, Linux 2.x); instead of forcing slow-start when idle time is detected, we modify the behavior to enter RBP. TCP Vegas provides a method for bandwidth estimation [3], and we borrowed USC's Vegas port for our implementation [4]. The RBP window and the timing between segments in that window are functions of the estimated bandwidth and the RTT. The segments are clocked by a custom RBP timer that is operational only while RBP is in effect.
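
The effect of the RBP timer can be simulated at user level: each expiry releases one segment until the paced window is exhausted, after which normal ACK-clocked transmission resumes. The toy program below illustrates only the clocking idea; the segment count and gap are placeholders, and in the kernel the sends are real segments driven by the timer machinery.

    #include <stdio.h>
    #include <unistd.h>

    /* Stand-in for handing one segment to the output routine. */
    static void
    send_segment(int n)
    {
            printf("paced segment %d sent\n", n);
    }

    int
    main(void)
    {
            int i;
            int rbp_segs = 5;                 /* paced window, in segments */
            useconds_t rbp_gap_usec = 40000;  /* 40 ms between segments */

            /* One segment per timer expiry until the window is used up. */
            for (i = 1; i <= rbp_segs; i++) {
                    send_segment(i);
                    if (i < rbp_segs)
                            usleep(rbp_gap_usec);  /* the RBP timer's role */
            }
            return 0;
    }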

The implementation does not change the default behavior of the kernel. RBP mode is enabled on a connection only via the setsockopt interface. We modified the interface so that either RBP or slow-start restart can be selected, which avoids recompiling the kernel to test each case.
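
From an application, selecting a policy then looks roughly like the sketch below. TCP_RBP is an option added by our modified kernel, not part of the stock headers, so the constant value here is a placeholder; a real program would take it from the modified system headers.

    #include <sys/types.h>
    #include <sys/socket.h>
    #include <netinet/in.h>
    #include <netinet/tcp.h>

    #ifndef TCP_RBP
    #define TCP_RBP 0x20   /* placeholder; defined by the modified kernel */
    #endif

    /* Ask the modified kernel to use rate based pacing after idle
     * periods on this connection, instead of the default policy. */
    int
    enable_rbp(int sockfd)
    {
            int on = 1;

            return setsockopt(sockfd, IPPROTO_TCP, TCP_RBP,
                              (char *)&on, sizeof(on));
    }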

Demonstration

The goal of our experiments was to verify that RBP mode works. An easy way to do this is to send a large block of data from an RBP-enabled machine, pause for longer than the retransmission timeout, and then send another large block. RBP behavior should be observable at the beginning of the second data transfer phase.
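
In outline, the sender side of such a test reduces to write, pause, write. The sketch below uses the chunk size and pause from the tests described later; connection setup and error handling are elided, and send_bulk is our name for a simple write loop.

    #include <sys/types.h>
    #include <string.h>
    #include <unistd.h>

    #define CHUNK (512 * 1024)   /* 512 KB per transfer phase */
    #define PAUSE 20             /* seconds; comfortably above the RTO */

    /* Write nbytes of filler data to a connected socket. */
    static void
    send_bulk(int fd, size_t nbytes)
    {
            char buf[8192];
            ssize_t n;

            memset(buf, 'x', sizeof(buf));
            while (nbytes > 0) {
                    n = write(fd, buf,
                        nbytes < sizeof(buf) ? nbytes : sizeof(buf));
                    if (n <= 0)
                            return;   /* error handling elided */
                    nbytes -= (size_t)n;
            }
    }

    /* First phase ramps the congestion window up; the pause lets the
     * connection go idle; the second phase shows the restart policy. */
    void
    run_test(int connected_fd)
    {
            send_bulk(connected_fd, CHUNK);
            sleep(PAUSE);
            send_bulk(connected_fd, CHUNK);
    }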

A program capable of doing such data transfers is Stevens' sock program [5]. To compare RBP behavior with the two extremes (slow-start restart and no slow-start restart), we modified sock to understand setsockopt options for all three cases. This included adding a command-line switch that takes the following values:

TCP_RENO_RESTART:
For slow-start restart.
TCP_RBP:
For rate based pacing.

If the switch is not specified, the behavior is that of unmodified SunOS 4.1.3, which is no slow-start restart.

Using sock, we ran tests from a Sun SPARC 20/71 at ISI West, running the modified SunOS 4.1.3 kernel, to a machine on the east coast (metro). Typical network conditions were 12 hops, a 200 ms average RTT, and approximately 32 KB/s of bandwidth.

Each test sends two 512 KB chunks to metro with a 20 second pause in between; metro ran sock as a sink. This produces the idle period we need, letting us observe what each flavor of TCP does when a data transfer begins midstream.

Using tcpdump on the sending side, together with programs that graph tcpdump output, we obtained sequence number versus time plots for each case.

[Figure 1: Default SunOS 4.1.3 behavior when data transfer occurs after idle time]

No slow-start restart:
There is no idle time detection, so data is dumped all at once using the prior value of the congestion window, as Figure 1 shows. In networks where bandwidth is dynamically allocated, such as the Internet, this behavior can be aggressive, overflowing router queues and causing packet loss. Such losses force TCP to slow-start again, increasing delay and lowering throughput. The bursty behavior also harms other users behind the same router: with drop-tail queuing, packets from other connections can be crowded out by the burst.

[Figure 2: Slow-start restart behavior when data transfer occurs after idle time]

Slow-start restart:
Implementations such as 4.4 BSD and Linux 2.x force slow-start; Figure 2 illustrates this behavior. Slow-start restart solves the problems associated with sending back-to-back packets: building up from slow-start makes packet loss less likely than sending a burst, and it is fairer to other users. Being conservative is good behavior in the Internet, and this option has accordingly been widely adopted. However, slow-start restart adds the extra delay of returning to steady state each time a data transfer is initiated midstream.

Rate based pacing:
Our implementation's behavior is shown in Figure 3. Here, the initial 5 segments are sent at a pace we calculated from the rate reported by Vegas. This is a good compromise between the two extremes of dumping segments back to back and restarting with slow-start. We believe this implementation will give much better performance, at least for the situations described in [2].
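
For scale (our arithmetic, using the approximate figures from the test conditions and assuming a segment size near 1460 bytes): at 32 KB/s and a 200 ms RTT, the bandwidth-delay product is about 6.4 KB, or four to five segments; spreading five segments over one RTT corresponds to a gap of roughly 40 ms between paced segments.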

[Figure 3: TCP RBP behavior when data transfer occurs after idle time]

Conclusions and Future Work

Restarting with slow-start in the middle of a connection can lead to poor performance. At the same time, dumping segments all at once can mean overrunning intermediate router queues, leading to a drop in throughput. Rate based pacing gives a good compromise between the two extremes and solves the slow-start restart problem.

We are currently conducting experiments to examine the impact of RBP on HTTP throughput.

Source code availability

Source code for rate based pacing is currently available only for SunOS 4.1.3.

References

[1] Jeffrey C. Mogul. The case for persistent-connection HTTP. In Proceedings of ACM SIGCOMM '95, pages 299-313. ACM, August 1995.

[2] John Heidemann. Performance interactions between P-HTTP and TCP implementations. ACM Computer Communication Review, 27(2):tba, April 1997. Draft versions were available in 1996 as [Heidemann96b].

[3] L. Brakmo and L. Peterson. TCP Vegas: End-to-end congestion avoidance on a global Internet. IEEE Journal on Selected Areas in Communications, 13(8):1465-1480, October 1995.

[4] J. S. Ahn, Peter B. Danzig, Z. Liu, and L. Yan. TCP Vegas: Emulation and experiment. In Proceedings of ACM SIGCOMM '95, page xxx. ACM, xxx 1995.

[5] W. Richard Stevens. TCP/IP Illustrated, Volume 1. Addison-Wesley, 1994.

Questions and comments about this document may be directed to visweswa@isi.edu or johnh@isi.edu.