Aaron,
I think you brought up an important point:
> But then I started thinking about how often this might happen. The period
> of SCH+FCH bandwidth needs to be fairly long (several seconds according to
> the authors) to have RTO converge to the current RTT. Then you shift down to
> narrowband and get a spurious RTO since your store-and-forward delay has
> increased. But at this point your window can afford to be smaller since
> your BDP is smaller. It takes less time to open a smaller window in CA.
> So, perhaps this isn't a problem after all.
It is true that it takes a couple of seconds of high-bandwidth (SCH+FCH)
burst (depending on the bandwidth and cwnd) for the RTO value to converge
to the RTT. When the burst ends and the channel switches to narrowband, the
delay increases suddenly and we get a spurious RTO. One result of the RTO
expiration is the reduction of cwnd, as you mention. However, this may not
always be beneficial, especially if the SCH+FCH is re-allocated after a
short delay.
More importantly, we believe the major consequence of an unnecessary RTO is
"spurious retransmissions". When the RTO expires there are "cwnd_previous"
(the cwnd value prior to the RTO) bytes of outstanding data in the network
that have already been transmitted by the TCP source. These original TCP
segments are eventually ACKed by the receiver. However, since the source
thinks that the original segments are lost, these ACKs are interpreted as
if they were triggered by the recently retransmitted segments, and each ACK
results in the transmission of the next sequence numbers. Eventually, a
total of "cwnd_previous" bytes of data are retransmitted by the source.
Moreover, these retransmissions later cause duplicate ACKs, which can
trigger fast retransmit/fast recovery unnecessarily. We observed this type
of behavior in CDMA2000 tests and ns simulations. [R. Ludwig, R. Katz: The
Eifel Algorithm] also describes similar observations. We believe the
spurious retransmissions can degrade system throughput significantly,
especially in systems where the high-bandwidth channel is released and
allocated frequently.
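To make the convergence argument concrete, here is a minimal sketch of a
standard-style RTO estimator (SRTT/RTTVAR smoothing with gains 1/8 and 1/4,
RTO = SRTT + 4*RTTVAR). The RTT values, the number of samples and the
minimum-RTO floors are illustrative assumptions, not measurements from our
tests, and the class and function names are hypothetical.

```python
class RtoEstimator:
    """SRTT/RTTVAR-based RTO estimator (standard TCP style, simplified)."""

    def __init__(self, min_rto=1.0, alpha=0.125, beta=0.25, k=4,
                 initial_rto=1.0):
        self.min_rto = min_rto          # lower bound on RTO (assumed value)
        self.alpha = alpha              # gain for SRTT
        self.beta = beta                # gain for RTTVAR
        self.k = k                      # variance multiplier
        self.initial_rto = initial_rto  # used before any RTT sample (assumed)
        self.srtt = None
        self.rttvar = None

    def update(self, rtt):
        """Feed one RTT sample (seconds) and return the new RTO."""
        if self.srtt is None:
            # First measurement initializes both estimators.
            self.srtt = rtt
            self.rttvar = rtt / 2
        else:
            # RTTVAR is updated with the old SRTT, then SRTT is updated.
            self.rttvar = ((1 - self.beta) * self.rttvar
                           + self.beta * abs(self.srtt - rtt))
            self.srtt = (1 - self.alpha) * self.srtt + self.alpha * rtt
        return self.rto

    @property
    def rto(self):
        if self.srtt is None:
            return max(self.min_rto, self.initial_rto)
        return max(self.min_rto, self.srtt + self.k * self.rttvar)


def rto_after_burst(burst_rtt, samples, min_rto):
    """RTO after `samples` identical RTT measurements taken during a burst."""
    est = RtoEstimator(min_rto=min_rto)
    for _ in range(samples):
        est.update(burst_rtt)
    return est.rto


# Assumed numbers: 0.4 s RTT during the SCH+FCH burst, 3.2 s on FCH only.
BURST_RTT = 0.4
NARROWBAND_RTT = 3.2

rto = rto_after_burst(BURST_RTT, samples=20, min_rto=0.2)
# The first ACK after the switch arrives later than the converged RTO,
# so the timer fires although nothing was lost.
spurious = NARROWBAND_RTT > rto
print(f"RTO after burst: {rto:.3f} s, spurious timeout: {spurious}")
```

With a steady burst RTT the variance term decays toward zero, so the RTO
settles just above the burst RTT (or at the configured floor). Under these
assumed numbers, the jump in delay at the end of the burst exceeds the RTO
with either floor.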
regards,
mehmet
-----Original Message-----
From: Aaron Falk [mailto:falk@ISI.EDU]
Sent: Wednesday, December 05, 2001 12:08 PM
To: pilc
Subject: large RTT variation caused by bandwidth oscillation
So, here's something interesting. In draft-ietf-pilc-link-design-07.txt we
say the following:
> The goal is to compute a RTO that is small enough to detect and
> recover from packet losses while minimizing unnecessary ("spurious")
> retransmissions when packets are unexpectedly delayed but not lost.
> Although these goals conflict, the algorithm works well when the
> delay variance along the Internet path is low, or the packet loss
> rate is low.
>
> If the path delay variance is high, TCP sets a RTO that is much
> larger than the mean of the measured delays. But if the packet loss
> rate is low, the large RTO is of little consequence, as timeouts
> occur only rarely. Conversely, if the path delay variance is low,
> then TCP recovers quickly from lost packets; again, the algorithm
> works well.
>
> But when delay variance and the packet loss rate are both high, these
> algorithms perform poorly, especially when the mean delay is also
> high.
But Farid, et al, point out in draft-khafizov-pilc-cdma2000-00.txt:
> For some (default) network configurations, bandwidth oscillation
> proved to be the single most significant factor in reducing
> throughput. CDMA2000 1x standard, IS-2000.2 [10], provides means of
> transmitting data over two type of traffic channels: Fundamental
> (FCH) and Supplemental (SCH). Fundamental channel has a fixed low
> bandwidth (e.g., 9.6 or 14.4 kbps). Bandwidth of SCH is a multiple
> of that and could be as high as 32 times of FCH bandwidth. To
> simplify notation we denote (SCH+FCH)/FCH bandwidth ratio by O. FCH
> is always assigned before data transmission begins. SCH is assigned
> on per needed basis. When SCH is being used we say that the call is
> in burst. There are two type of SCH assignments: finite and
> infinite [11], which will be referred to as finite burst and
> infinite burst, respectively. Infinite burst means that SCH can be
> used for transmitting data until a release command is issued.
> Finite burst mode of operation limits the SCH usage to one of
> fourteen finite time intervals [11] before it must be released. We
> denote the duration of SCH allocation by B. After SCH is released,
> it can be acquired again after certain delay (D).
>
> One of the ways of detecting congestion in TCP is RTO expiration.
> RTO computation algorithm [12] was designed to follow closely round
> trip time (RTT), but is known to work poorly when delay variance is
> high [13]. During high bandwidth (FCH+SCH) RTT is low and, if B is
> relatively long (e.g., 5.12 seconds), RTO converges to RTT. When
> SCH is released, suddenly RTT increases (proportionally to O) and
> low RTO expires forcing TCP into the Slow Start state, while
> actually none of the TCP segments were lost.
>
>             B
>    |<--------------->|      |-----------------|      |-------------
>    |                 |      |                 |      |
>    |                 |  D   |                 |      |     SCH
> ---|                 |<---->|                 |------|      +
>   FCH                                                      FCH
> -------------------------------------------------------------------
> Figure 1. Bandwidth oscillation. Full cycle time is B+D. SCH and
> FCH are used for transmitting data for time B, then SCH
> is released and only FCH carries data for time D.
At first, I thought that we might want to modify the statement in LINK
because CDMA2000 shows a case where you can get RTOs even though there is
no packet loss at all.
But then I started thinking about how often this might happen. The period
of SCH+FCH bandwidth needs to be fairly long (several seconds according to
the authors) to have RTO converge to the current RTT. Then you shift down to
narrowband and get a spurious RTO since your store-and-forward delay has
increased. But at this point your window can afford to be smaller since
your BDP is smaller. It takes less time to open a smaller window in CA.
So, perhaps this isn't a problem after all.
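As a rough illustration of the BDP argument, the following sketch compares
the window needed to fill the path on FCH alone and on SCH+FCH. The 9.6 kbps
FCH rate comes from the draft; the ratio O = 16, the 0.5 s base RTT and the
1460-byte MSS are assumptions picked for the arithmetic, and the function
names are hypothetical.

```python
import math


def bdp_bytes(bandwidth_bps, rtt_s):
    """Bandwidth-delay product in bytes."""
    return bandwidth_bps * rtt_s / 8


def window_segments(bandwidth_bps, rtt_s, mss=1460):
    """Segments needed to cover the BDP (at least one)."""
    return max(1, math.ceil(bdp_bytes(bandwidth_bps, rtt_s) / mss))


FCH_BPS = 9600      # fundamental channel rate, from the draft
O = 16              # assumed (SCH+FCH)/FCH bandwidth ratio
BASE_RTT = 0.5      # assumed round-trip time in seconds

narrow = window_segments(FCH_BPS, BASE_RTT)
burst = window_segments(FCH_BPS * O, BASE_RTT)
print(f"FCH only: {narrow} segment(s), SCH+FCH: {burst} segment(s)")
```

Under these assumptions the narrowband path is filled by a single segment,
while the burst needs about seven, which is why a reduced window costs
little as long as the link stays on FCH.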
Any thoughts?
--aaron