I have an idea for regulating the TCP congestion window based on
the round trip time and the retransmission timeout. I haven't
tested it yet. I hope some of you find it useful.
If somebody could comment on the technical merits as well as
what I should do next to promote this, I would appreciate it.
I am new to this mailing list. The best way to regulate the congestion
window is probably some combination of all the proposals, so I would
like to contribute to that.
The problem
-----------
The only time TCP reduces the congestion window is when a packet
is dropped. Fast retransmit and fast recovery (FRFR) help considerably
if a congested gateway drops no more than 1 packet in 4 and drops no
ACKs in the reverse direction. However, TCP keeps increasing the
congestion window (cwnd) until the onset of congestion, and FRFR fails
if more packets than that are dropped. Of course, if the user sets the
receive and transmit buffers of the connection low enough, none of this
will happen; but TCP has no way of figuring out the optimal buffer
sizes (window size) for each connection, and these can vary by orders
of magnitude from modem links to high-bandwidth satellite links.
With every ACK received, TCP increases cwnd. To keep cwnd at an optimal
equilibrium, a way to decrease it before the onset of congestion is
required. FRFR will keep cwnd oscillating between the point of
congestion and half that, but only when just a few individual packets
are dropped.
My proposal
-----------
Inferring the level of congestion from measurements of throughput is
unreliable. A pipe that is nearly full has the same throughput as
one that is nearly empty. Throughput is reduced only if congestion is
increasing (or the pipe overflows).
The maximum rtt is an unreliable number to base congestion inferences
on, because it can be distorted too much by external influences.
The maximum rtt could easily occur at a time of little congestion, yet
lie far beyond the onset of congestion at another time: for example,
while a WAN link is being established, or when congestion is caused by
other flows and this flow is only a small part of the traffic.
Using the rate of change of rtt to move cwnd is unreliable, because
cwnd will soon drift if the measured rtt is distorted by competing
flows.
Short of explicit notification or dropped packets, the level of
congestion on a connection can only be measured through the rtt. The
lower bound for rtt can be measured reliably: it is just the minimum of
the observed rtts. The min_rtt can change (for example, on a LEO
satellite system), so it can be tracked upward with a very slow
first-order filter.
When there is less data in the pipe than optimal, or just enough,
the min_rtt will be observed. If the router buffers start to fill,
rtt will increase. The question is how much it should increase over
min_rtt before we stop increasing cwnd. The calculated retransmission
timeout (rto) seems to be the best number to derive a threshold from:
if you wait any longer than that, you're wasting your time. So, I
propose:
  Whenever an rtt is measured, if that rtt is more than half way between
  the minimum rtt and rto ((min_rtt + rto)/2), then reduce cwnd by
  one mss (maximum segment size), but never make it less than one mss.
  Otherwise, increase cwnd by one mss, but never make it greater than
  the peer's maximum advertised receive window.
This replaces the current congestion-avoidance increase of
  cwnd += mss*mss/cwnd
for each ACK received.
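For reference, the increase being replaced works out to roughly one mss
per round trip: with about cwnd/mss ACKs arriving per rtt, the per-ACK
increments accumulate to nearly one mss. A quick sketch (the helper
name and integer arithmetic are mine, not from any particular stack):

```c
#include <stdint.h>

/* Standard congestion-avoidance increase, cwnd += mss*mss/cwnd per ACK.
   Applies one round trip's worth of ACKs (~cwnd/mss of them) and
   returns the new cwnd, which grows by a bit under one mss. */
uint32_t ca_increase_one_rtt(uint32_t cwnd, uint32_t mss)
{
    uint32_t acks = cwnd / mss;   /* ~one ACK per full-sized segment */
    for (uint32_t i = 0; i < acks; i++)
        cwnd += mss * mss / cwnd; /* integer division truncates */
    return cwnd;
}
```

With cwnd = 10 segments of 1460 bytes, one simulated round trip grows
cwnd by just under 1460 bytes, the expected linear growth.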
This algorithm runs only while the sender is in the congestion avoidance
phase. During the slow start phase, if meas_rtt > (min_rtt + rto)/2,
slow start is terminated and congestion avoidance begins. Slow start
will also terminate when cwnd reaches ssthresh, as usual.
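The congestion-avoidance rule above might look like the following in C.
This is only a sketch under my own naming and units (bytes for windows,
clock ticks for times); it also folds in the coarse-clock guard from
the "Other details" paragraph, skipping the reduction branch when
min_rtt and rto are less than 2 ticks apart:

```c
#include <stdint.h>

/* Sketch of the proposed per-rtt-measurement cwnd update.  The post
   specifies the rule, not an implementation; names are hypothetical.
   Returns the new cwnd, floored at one mss and capped at the peer's
   advertised window (rwnd). */
uint32_t rtt_based_cwnd(uint32_t cwnd, uint32_t mss, uint32_t rwnd,
                        uint32_t meas_rtt, uint32_t min_rtt,
                        uint32_t rto)
{
    /* Clock too coarse to tell min_rtt and rto apart: only increase. */
    if (rto < min_rtt + 2)
        return (cwnd + mss > rwnd) ? rwnd : cwnd + mss;

    if (meas_rtt > (min_rtt + rto) / 2) {
        /* Queues are building: back off by one mss, floor at one mss. */
        return (cwnd > 2 * mss) ? cwnd - mss : mss;
    }
    /* Pipe not yet full: open the window by one mss, cap at rwnd. */
    return (cwnd + mss > rwnd) ? rwnd : cwnd + mss;
}
```

For example, with min_rtt = 50 ticks and rto = 200 ticks the threshold
is 125 ticks: a measurement of 100 grows cwnd by one mss, while a
measurement of 150 shrinks it by one mss.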
The quantum by which cwnd is moved is mss bytes. It could be specified
as a number of packets instead of a number of bytes or the quantum could
be the number of bytes in the packet for which the rtt was measured.
This might be better for connections that have Nagle's algorithm
turned off.
However, I leave this issue open for further study and comment.
Regardless of what units cwnd is specified in and incremented by, the
proposal is an improvement to TCP.
Other details: the rtt should be measured with a clock granularity of
10 ms or better. TCP timers can still have their usual coarse
granularity. If min_rtt and rto are less than 2 clock ticks apart, just
increase cwnd by one mss; don't bother with the reduction part.
This is probably obvious, but if meas_rtt >= rto, we are facing a
retransmission, so the normal cwnd-setting procedures take effect. If
the retransmission does not happen, because of TCP timer phasing and
granularity, then force one anyway. The first min_rtt is determined by
the SYN/ACK exchange.
min_rtt is updated whenever an rtt is measured, as follows:
  if (meas_rtt > min_rtt)
      /* track upward very slowly: min_rtt += (meas_rtt - min_rtt)/32 */
      min_rtt = (32*min_rtt + (meas_rtt - min_rtt)) / 32;
  else
      /* trust a lower sample only from a near-full-sized segment */
      if ((size of measured segment > 3/4*mss) || (nagle is off))
          min_rtt = meas_rtt;
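To make the filter's behavior concrete, here is the same update as a
compilable C function (my naming; seg_big stands in for "segment larger
than 3/4*mss, or Nagle off"). Note that with integer arithmetic an
excursion smaller than 32 ticks above min_rtt is truncated away, which
is consistent with the intent of a very slow upward tracker:

```c
#include <stdint.h>

/* min_rtt tracking filter from the post.  A higher sample nudges
   min_rtt up by 1/32 of the difference; a lower sample replaces it
   outright, but only when the segment size makes it trustworthy. */
uint32_t track_min_rtt(uint32_t min_rtt, uint32_t meas_rtt, int seg_big)
{
    if (meas_rtt > min_rtt)
        min_rtt = (32 * min_rtt + (meas_rtt - min_rtt)) / 32;
    else if (seg_big)
        min_rtt = meas_rtt;
    return min_rtt;
}
```

For example, min_rtt = 100 with a sample of 164 moves up by only
2 ticks, while a trustworthy sample of 80 pulls it straight down to 80.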
I also propose starting slow start with cwnd = 2*mss instead of one.
If even the second packet needs to be retransmitted, slow start is
finished anyway, so there is no reason to start it with one mss.
Connections will ramp up one rtt quicker: good for satellites.
-- 
/*******************************************************************\
* Jacob Heitz         Tel:510-747-2917/Fax:2859   home:510-888-9429 *
* Ascend->Engineering->Software->Alameda   mailto:[email protected] *
\*******************************************************************/