In FTP, a client opens a TCP connection with the server for control (Figure 1). Once that connection is established, a request for a file is sent on that channel. The server then opens a separate TCP connection for the file transfer, and returns the file in that other connection. Each connection requires one round-trip time (RTT) to open. The request takes 1/2 a RTT to get to the server, and the response takes another 1/2 RTT to return, in addition to the transmission time of the file. The overall time required for an FTP transaction is:
  1   RTT    control-channel OPEN
  0.5 RTT    send request on control-channel
  1   RTT    file-channel OPEN
  0.5 RTT    file starts to arrive on file-channel
  Ftrans     time to transmit the file
  --------
  3 RTT + Ftrans = time to get the first file in FTP

This is shown in Figure 1, below. The control channel interaction is shown in red, and the file channel is shown in blue.

  0.5 RTT    send request on control-channel
  1   RTT    file-channel OPEN
  0.5 RTT    file starts to arrive on file-channel
  Ftrans     time to transmit the file
  --------
  2 RTT + Ftrans = time to get subsequent files in FTP

  1   RTT    channel OPEN
  0.5 RTT    send request
  0.5 RTT    file starts to arrive
  Ftrans     time to transmit the file
  --------
  2 RTT + Ftrans = time to get a file in HTTP

  1   RTT    channel OPEN and send request, file starts to arrive
  Ftrans     time to transmit the file
  --------
  1 RTT + Ftrans = time to get a file optimally
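
The four budgets above can be tallied mechanically. The following is a minimal Python sketch, not part of the original analysis: the names `RTT_COST` and `transaction_time` and the scheme labels are hypothetical, and the per-step costs are copied directly from the tallies above.

```python
# Round-trip cost of each step, in RTTs, copied from the tallies above.
# The scheme names are hypothetical labels chosen for this sketch.
RTT_COST = {
    "ftp_first": [1.0, 0.5, 1.0, 0.5],  # control OPEN, request, file OPEN, first data
    "ftp_next":  [0.5, 1.0, 0.5],       # request, file OPEN, first data
    "http":      [1.0, 0.5, 0.5],       # OPEN, request, first data
    "optimal":   [1.0],                 # OPEN carries the request; data returns
}


def transaction_time(scheme, rtt, ftrans):
    """Return total seconds: (sum of RTT costs) * rtt + file transmission time."""
    return sum(RTT_COST[scheme]) * rtt + ftrans


if __name__ == "__main__":
    # Example values only: 100 ms RTT and 0.44 s to transmit the file.
    for scheme in RTT_COST:
        print(f"{scheme:10s} {transaction_time(scheme, 0.100, 0.44):.2f} s")
```

With these example values the only difference between the schemes is the number of round trips paid before the file starts to arrive, which is the quantity the rest of the analysis tries to bound.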


In slow-start, when a connection opens, only one packet is sent until an ACK is received. For each ACK received, the number of packets that can be sent is increased by one. As a result, the number of outstanding packets doubles each round-trip, until a set of thresholds has been reached.
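
The doubling described above can be illustrated with a short simulation. This is a simplified Python sketch under stated assumptions: every packet sent in a round trip is ACKed within that round trip, there is no loss and no delayed ACK, and a single optional `threshold` stands in for the set of thresholds. The function name `slow_start_windows` is hypothetical.

```python
def slow_start_windows(round_trips, initial=1, threshold=None):
    """Return the send window (in packets) at the start of each round trip.

    Every packet sent in a round trip is assumed to be ACKed within it,
    and each ACK grows the window by one packet, so the window doubles.
    """
    windows = []
    cwnd = initial
    for _ in range(round_trips):
        windows.append(cwnd)
        acks = cwnd    # one ACK per packet sent this round trip
        cwnd += acks   # +1 packet per ACK => doubling per round trip
        if threshold is not None:
            cwnd = min(cwnd, threshold)  # stop growing at the threshold
    return windows


if __name__ == "__main__":
    print(slow_start_windows(5))               # [1, 2, 4, 8, 16]
    print(slow_start_windows(5, threshold=8))  # [1, 2, 4, 8, 8]
```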
The packet size is negotiated. The default is 536 bytes in TCP, although many implementations round this down to 512. Hosts on Ethernets typically use 1460 for local connections. Where implemented, MTU discovery will allow Ethernet-sized MTUs on wide-area connections [mtu].
Slow-start occurs when a connection is initialized, when a packet is lost, and possibly after a significant idle period in the connection. The latter is described in [tcp-ss-rev] and implemented in 4.4BSD and derivatives, although it has not been adopted by earlier BSD-TCP users (for example, SunOS 4).
P-HTTP attempts to achieve optimal transaction time for sequences of transactions to the same server. The initial transaction occurs as in HTTP, but the connection is not closed. Subsequent requests occur without needing to re-open the connection.
In addition, P-HTTP attempts to avoid slow-start restart for each new transaction, again by using a single connection for a sequence of transactions. Unfortunately, sufficiently large gaps in the arrival of requests may cause a restart of slow-start anyway, due to packet loss during the resulting packet burst when the transmission resumes, notably in the 4.4BSD-derived TCP implementations. The P-HTTP method is useful primarily for multiple adjacent requests, as would occur on pages with embedded images, for example.
P-HTTP achieves this efficiency at the expense of application-layer complexity. Re-using a single connection requires application-layer multiplexing or can stall concurrent requests arbitrarily. Consider retrieving a large PostScript file, and issuing a small HTML file request during the transfer. The HTML response will either be stalled until the end of the PostScript file transmission, or the PostScript file will be segmented. The server cannot know whether this segmentation is required when it starts to send the PostScript file. MIME-style headers are in-line and not encoded via "escapes"; only the specified length is used to determine when to parse the next header. As a result, the inefficiency of application-layer segmentation and reassembly occurs for every transaction. Finally, application-level multiplexing interferes with emerging Integrated Services multiplexing in the kernel, for Type-of-Service and Quality-of-Service mechanisms [int-svc].
In addition, T/TCP caches other TCP protocol control block parameters, such as round-trip time measures, to avoid inefficiencies with reconnecting to the same host. Reusing slow-start information, which would avoid slow-start restart, is discussed briefly in the T/TCP specification.
S-TCBs optimize only the inefficiency of the slow-start restart component of HTTP over TCP. Also described in the S-TCB memo are the effects of application-layer multiplexing, and ways in which kernel-based multilevel feedback queuing, as in Integrated Services, would be adversely affected.
When S-TCB and T/TCP are coupled, they provide similar efficiency to P-HTTP, but at the kernel-level rather than requiring application-layer multiplexing.
Moskowitz described performance problems at the server, where buffering limitations in the operating systems affected transaction performance. The claim is that the server runs out of buffers to create new TCP connections; the purported evidence is the "Host Connected, Waiting Reply" message. This message is emitted after the TCP connection is established, which is in turn after the TCP control block (TCB) is allocated. This message is possibly evidence of processing bottlenecks at the server after the connection is established, but it is evidence against the claimed lack of buffers.
We are currently looking at HTTP performance over several transport protocols, including TCP, P-HTTP, T/TCP, and UDP-based RPC protocols, over a wider variety of network conditions [web-transp]. That paper will contain both a more detailed model of HTTP performance than presented here and validation of this model against real-world traffic.
  Net      BW      MSS         Latency (ms)
           (bps)   (bytes)     LAN/MAN   WAN
  --------------------------------------------
  sat.     9 K     512         250       500
  modem    29 K    512         150       250
  ISDN     112 K   1460/512    30        130
  direct   1 M     1460/512    2         100
  fast     155 M   8192/512    2         100

The default TCP MSS is 536 bytes for data, although most current implementations round this down to 512 bytes. Fast links support larger MSSs, but TCP often ignores them and uses the default for connections where MTU discovery is not implemented.
  Web File      File Size
  Type          (bytes)
  --------------------------------------------------------
  HTML          6 K          ASCII text
  "Web page"    6+2+2 K      HTML and links to two icons
  text          60 K         ASCII text
  icon          2 K          small GIF (icons)
  image         20 K         large GIF (clickable map)
  photo         200 K        very large GIF (photo)

Although this describes the characteristics of web pages in general, the majority of accesses are to 6 KB files, and that is the focus of the discussion. Other analysis at ISI shows that the "Web page" case has a greater potential for optimization than the HTML case, because it is composed of multiple files [web-transp]. In particular, the aggregate of files denoted by a Web page can be retrieved in a single connection with a single aggregate 'GET-ALL' request, rather than even using persistent connections [web-lat].
The effect of the optimizations (avoiding connection establishment and
slow-start) depends on the network properties; for modem links it is 11%,
for ISDN it is also low, around 27%. Optimization increases significantly
for faster connections or for higher latency paths [web-transp],
which is similar to the results in
These analyses consider optimal performance of the system. We assume
that the hosts are limited only by the network bandwidth, that server processing
time is negligible, and that disk I/O and other bottlenecks are minimal.
We consider the worst-case performance of HTTP over TCP, assuming no packet
loss. This maximizes the benefit of persistent connections. Even so, these
optimizations benefit most Web users only minimally. When other factors,
such as server processing, packet loss, etc., are included, the optimizations
are even less noticeable.
The following section presents analysis for HTTP over TCP. A more complete
analysis of HTTP over several transport protocols and P-HTTP is currently
underway [web-transp]. The formulae below
are simplified versions of those originally developed there. The formulae in this
paper provide an upper bound on performance; the formulae in [web-transp]
are more precise and include implementation-specific interactions.
S is the number of round-trips stalled during the initial slow-start.
The initial send window starts at 1 MSS, but in most BSD implementations
it is increased to 2 when the TCP SYN (connect start) is ACK'd. The window
for data transport effectively begins at 2, and doubles each round-trip.
The transmission stalls each time this window is smaller than M.
Each stall wastes at most one round-trip; actually, the entire round-trip
time is not wasted, since the window was non-zero, but this is an upper-bound.
W is the amount of wasted time, the total of one round trip for
connection establishment, and at most one round trip for each slow-start
stall. We compare this number to the absolute minimum for a transaction,
composed only of one round-trip for the request exchange and the file transmission
time. We assume here that the request transmission time is negligible compared
to the file transmission time; typically requests are less than 100 bytes.
We also ignore header overheads.
The amount of wasted time becomes noticeable to the user when it is the
same as the minimum transaction time, or larger. At that point, removing
re-connection and slow-start restart overheads will halve the time of access.
This assumes no network loss.
There are two ratios we can compare. The first is percent wasted time,
the ratio of waste to optimal file transaction time. The second, more intuitive
notion is percent of possible reduction, the ratio of waste to transaction
including waste. The first is more meaningful experimentally, because the
waste varies independently of the optimal transaction time, and so the
ratio varies in the numerator only. The second is easier to intuit; a 10-second
transaction with 50% possible reduction optimizes to 5 seconds, whereas
in the first equation this would be a 1:1 ratio. We will show both the
ratio and the percent possible reduction.
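
The two measures carry the same information, since W/(Tmin + W) = (W/Tmin) / (1 + W/Tmin). A minimal Python sketch of the conversion follows; the function names `to_reduction` and `to_ratio` are hypothetical and chosen only for this illustration.

```python
def to_reduction(waste_ratio):
    """Convert W/Tmin into W/(Tmin + W), as a fraction between 0 and 1."""
    return waste_ratio / (1.0 + waste_ratio)


def to_ratio(reduction):
    """Inverse conversion: W/(Tmin + W) back to W/Tmin (reduction must be < 1)."""
    return reduction / (1.0 - reduction)


if __name__ == "__main__":
    # A 1:1 waste ratio corresponds to a 50% possible reduction,
    # i.e. a 10-second transaction that optimizes to 5 seconds.
    total = 10.0
    print(total * (1.0 - to_reduction(1.0)))  # 5.0
```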
The graph is fixed for a constant filesize of 6 KB. We consider bandwidths
from 10 Kbps - 1 Mbps and latencies from .01-1 seconds. Typical latencies
are 70 ms for end-to-end latency for average Web surfing in the USA, with
30-150 ms of additional latency for modem or ISDN links. We therefore consider
250 ms total latency for modem links and 100 ms total latency for other
types of directly-connected networks. Satellite network latencies are higher,
but not considered below.
Shown in Figure 6 are contour lines where the ratio of waste
to useful time is 0.25:1, 0.50:1, 2:1, and 3:1. At 1:1, removing
the overhead halves the effective transaction time. The shaded area shows
where this 2x speedup (or greater) applies. For this graph, we consider
Internet interactions, so that the MSS is 512 bytes.
ISDN (112 Kbps) at 100 ms latency has 27% waste (C), a ratio of 0.37:1.
At 100 ms latency, 260 Kbps end-to-end links are the minimum required to
approach the waste ratio of 1:1, i.e., the 2x speedup sought (D).
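
The location of the 1:1 crossing can be checked numerically by sweeping bandwidth at a fixed latency. The following Python sketch applies this paper's upper-bound formulae (S = floor(log2(ceil(M/2))), W = R * S + R, Tmin = F + R); the function names, the 1 Kbps sweep step, and the reading of 6 KB as 6144 bytes (which gives K = 12 packets at a 512-byte MSS) are assumptions of the sketch. Because S is a step function, the crossing found this way need not coincide exactly with a value read off the contour plot.

```python
import math


def waste_ratio(bw, rtt, mss_bytes=512, filesize_bytes=6 * 1024):
    """W / Tmin under the upper-bound model (no loss, no server time)."""
    k = filesize_bytes / mss_bytes                 # K: packets in the file
    pipe = bw * rtt / (mss_bytes * 8)              # L: packets to fill the pipe
    m = min(pipe, k)                               # M: max useful window
    s = math.floor(math.log2(math.ceil(m / 2)))    # S: slow-start stalls
    w = rtt * s + rtt                              # W: wasted time
    tmin = filesize_bytes * 8 / bw + rtt           # Tmin = F + R
    return w / tmin


def min_bandwidth_for_ratio(rtt, target=1.0, lo=10_000, hi=1_000_000, step=1_000):
    """Return the first swept bandwidth (bps) whose ratio reaches target, else None."""
    for bw in range(lo, hi + 1, step):
        if waste_ratio(bw, rtt) >= target:
            return bw
    return None


if __name__ == "__main__":
    print(f"ISDN, 100 ms: {waste_ratio(112_000, 0.100):.2f}:1")
    print(f"first 1:1 crossing at 100 ms: {min_bandwidth_for_ratio(0.100)} bps")
```

In this sketch the ratio jumps past 1:1 at the point where the pipe becomes long enough to add a second slow-start stall, which falls in the same region as the roughly 260 Kbps quoted for point (D).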
To be noticeable, the optimizations require network and file characteristics
that are not true for most users. Connection establishment optimizations
require that the file is as small as the round-trip bandwidth-delay product,
or smaller. Slow-start optimizations require that there are a large number
of packets in the round-trip, and that the overall number of packets is
a small number of round-trips' worth. Neither of these assumptions holds
for users over modem or ISDN lines accessing the vast majority of Web files.
In such cases, only 1-2 packets are typically in transit in the round-trip,
negating the effects of slow-start optimizations. Typically, files are
over 10x larger than the round-trip bandwidth-delay product, negating the
effects of connection establishment optimizations.
In the future, bandwidths are sure to increase. Packet sizes are also
likely to increase, e.g., to 9 Kbytes for ATM, and MTU discovery should
be more widely available. File sizes may increase as well. Given all three
of these advances, it is not easy to predict the overall effect. This is
discussed in further detail in ongoing work [web-transp].
This page written and maintained by the
LSAM Group.
Evaluation
We want to determine the potential for inefficiency in HTTP over TCP. For
this purpose, we analyze the time required for an HTTP interaction, computing
an upper-bound for both the per-transaction connection establishment and
potential slow-start overheads, and compare that to the optimal time for
transfer.
Analysis
The following notation is used in the analysis:
R    = RTT
bw   = bandwidth
MSS  = maximum segment size (packet size)
K    = number of packets in the file
     = filesize / MSS
L    = round trip time in packets, i.e., length of the pipe
     = number of packets to fill the pipe
     = bw * R / MSS
M    = max useful window size (lower bound)
     = min(L, K)
S    = round trips stalled in slow-start, assuming no loss (upper bound)
       (window starts at 2, see [web-transp])
     = floor(log2( ceil(M/2) ))
W    = amount of wasted time
       (upper-bound on waste -- not all slow-start is wasted, though)
     = slow-start + connection-setup
     = R * S + R
F    = min. file transmission time
     = filesize / bw
Tmin = min. transaction time
     = F + R
T    = transaction time
     = Tmin + W (discounting server processing time)
R, bw, MSS, K, and L are self-explanatory. M is the
maximum useful window size for this file and network, bounded by the smaller
of the packets in round-trip (L) and the packets in the file (K).
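
The notation above translates directly into a small function. This is a minimal Python sketch rather than the authors' own tooling: the name `http_tcp_model` is hypothetical, bandwidth is taken in bits per second, sizes in bytes, and 6 KB is read as 6144 bytes.

```python
import math


def http_tcp_model(rtt, bw, mss_bytes, filesize_bytes):
    """Evaluate the notation above; times in seconds, bw in bits per second."""
    k = filesize_bytes / mss_bytes                 # K: packets in the file
    pipe = bw * rtt / (mss_bytes * 8)              # L: packets to fill the pipe
    m = min(pipe, k)                               # M: max useful window
    s = math.floor(math.log2(math.ceil(m / 2)))    # S: stalled round trips
    w = rtt * s + rtt                              # W: slow-start + setup
    f = filesize_bytes * 8 / bw                    # F: file transmission time
    tmin = f + rtt                                 # Tmin: min. transaction time
    return {"K": k, "L": pipe, "M": m, "S": s, "W": w,
            "F": f, "Tmin": tmin, "T": tmin + w}


if __name__ == "__main__":
    # Modem parameters as used in this paper: 250 ms, 28,800 bps, 512-byte MSS.
    # 6 KB is taken as 6144 bytes, an assumption consistent with K = 12 packets.
    for name, value in http_tcp_model(0.250, 28_800, 512, 6 * 1024).items():
        print(f"{name:5s}= {value:.2f}")
```

Returning every intermediate quantity, rather than only T, makes it easy to compare each line against a hand calculation.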
Tmin + P <= W
(under best conditions, assume P goes to zero)
Tmin <= W
So we can plot the ratio of time wasted to file transmission, as an upper-bound
on the optimization possible. This ignores processing time and other impediments
at the server, as mentioned earlier. We also count the entire round-trip
of each slow-start exchange as wasted, which is not strictly true. Up to
one RTT-worth of data is sent during this exchange, at most; by ignoring
this, we achieve a further upper-bound.
W / Tmin       = ratio of wasted time
W / (Tmin + W) = percent of possible reduction
Some common cases
We plotted the ratio of waste to useful time on a contour plot. For a given
network RTT, we want to see what bandwidth is required for the proposed
optimizations to reduce the overall transaction by half, i.e., where the
waste is the same as the useful time.

Figure 6: Effect of optimizations (for Internet MSS)
Contour plot of (wasted time)/(useful time)
MODEM (Internet MSS):
R = .250 s
bw = 28,800 bps
MSS = 512 Bytes = 4096 bits
(filesize = 6 KB)
K = 12 packets
L = 1.76 packets
M = min(1.76, 12) = 1.76 packets
S = floor(log2(ceil(1.76/2))) = 0 rtts
W = .250 s
F = 1.71 s
W/Tmin = .13:1 waste ratio
W/(Tmin+W) = 11% possible improvement
ISDN (Internet MSS):
R = .100 s
bw = 112,000 bps
MSS = 512 Bytes = 4096 bits
K = 12 packets
L = 2.73 packets
M = min(2.73, 12) = 2.73 packets
S = floor(log2(ceil(2.73/2))) = 1 rtt
W = .200 s
F = .44 s
W/Tmin = .37:1 waste ratio
W/(Tmin+W) = 27% possible improvement
We re-evaluated the graph for the case where MTU discovery is implemented,
and packets contain the Ethernet-MSS (1460 bytes), as shown in Figure
7. In this case, the results of the optimizations are different. End-to-end
rates of 80 Kbps are required at 250 ms latency (E), and 200 Kbps is required
for 100 ms latency (F). ISDN at 100 ms gains only 16% from the optimizations.

Figure 7: Effect of optimizations (for Ethernet MSS)
Contour plot of (wasted time)/(useful time)
ISDN (Ethernet to ISDN):
R = .100 s
bw = 112,000 bps
MSS = 1460 Bytes = 11680 bits
K = 4.21 packets
L = 0.96 packets
M = min(0.96, 4.21) = 0.96 packets
S = floor(log2(ceil(0.96/2))) = 0 rtt
W = .100 s
F = .44 s
W/Tmin = .19:1 waste ratio
W/(Tmin+W) = 16% possible improvement
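
The effect of the segment size on the ISDN case can be reproduced side by side. The Python sketch below re-applies this section's formulae for both MSS values; the function name `possible_reduction` is hypothetical, and 6 KB is again assumed to mean 6144 bytes.

```python
import math


def possible_reduction(rtt, bw, mss_bytes, filesize_bytes=6 * 1024):
    """Return W / (Tmin + W) for the upper-bound model used in this section."""
    pipe = bw * rtt / (mss_bytes * 8)              # L: packets to fill the pipe
    m = min(pipe, filesize_bytes / mss_bytes)      # M = min(L, K)
    stalls = math.floor(math.log2(math.ceil(m / 2)))  # S
    waste = rtt * (stalls + 1)                     # W = R * S + R
    tmin = filesize_bytes * 8 / bw + rtt           # Tmin = F + R
    return waste / (tmin + waste)


if __name__ == "__main__":
    for label, mss in (("Internet MSS", 512), ("Ethernet MSS", 1460)):
        pct = 100 * possible_reduction(0.100, 112_000, mss)
        print(f"ISDN, {label} ({mss} bytes): {pct:.0f}% possible improvement")
```

With the larger segment the pipe holds less than one packet, so no slow-start stall is counted and only the connection-establishment round trip remains as waste.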
These observations indicate that avoiding connection establishment and
slow-start does not benefit current Web access for the vast majority of
users. Most users see end-to-end latencies of about 250 ms and use modem
lines. At these rates, the optimizations reduce the overall transaction
time by 11%. Rates over 200 Kbps are required to provide user-noticeable
performance improvements.
Conclusions
Our analysis indicates that the persistent connection optimizations do
not substantially affect Web access for the vast majority of users. Most
users see end-to-end latencies of about 250 ms and use modem lines. At
these rates, the optimizations reduce the overall transaction time by 11%.
Bandwidths over 200 Kbps are required to provide user-noticeable performance
improvements.
Acknowledgements
We would like to thank the members of ISI's HPCC Division, especially Ted
Faber, for their assistance with this document. This document was the result
of discussions on the http-ng and web-talk mailing lists, and we also thank
the members of those lists for their feedback.
References
Last modified August 16, 1996.
Copyright © 1996 The University of Southern California. All rights
reserved.