Re: large RTT variation caused by bandwidth oscillation

From: Kacheong Poon (kcpoon@Eng.Sun.COM)
Date: Tue Jan 08 2002 - 22:16:04 EST


Sorry for the late reply in this thread. I was on vacation for the
last month. I just want to make several comments since Solaris
was mentioned several times in this thread.

It was mentioned that Solaris always had a problem with spurious
retransmission. In pre-Solaris 2.6, Karn's algorithm (maybe this
is the reason why Phil Karn stopped using Solaris...) was not
implemented correctly. One can see what this can cause when a
timeout due to congestion happens. There were also various other
little things which caused problems with slow links. I guess all this
led to the now "infamous" quote about Solaris having a serious
spurious retransmission problem... I hope this can be straightened
out.

Another comment is about RFC 2988 compliance. We have several
reservations about RFC 2988, and it may be better not to conform
to it.

For example, in section 2, rule 2.2, one can easily see that if G
is not a big value (say at least 500ms or 1s), this can cause problems
on a slow link. Consider the case where the first segment, which is
used for RTT sampling, has only a few bytes and the next segment
sent is a full MSS-sized segment. If G is in the range of 100ms and
the slowest link speed of the route is 9600bps, the second segment
will trigger a timeout. This problem is masked by rule 2.4, which
restricts the minimum RTO to 1s. But as mentioned in the RFC, this
is quite conservative, and I believe many implementations choose not
to comply with it. As a matter of fact, I believe having a fixed RTO
of 1s for land links (no wireless) should be able to avoid most
spurious retransmission in today's Internet (oops, I cannot
substantiate this claim with facts )-:) We don't need to do RTT
sampling at all (-:
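
To put rough numbers on the rule 2.2 example above, here is a small
Python sketch. Only G = 100ms, the 9600bps link and the 1s floor come
from the text. The 10-byte first segment, the 536-byte MSS, the 40
bytes of TCP/IP header and the zero propagation delay are assumptions
made purely for illustration.

```python
# Illustration only: segment sizes and zero propagation delay are assumed.
LINK_BPS = 9600      # slowest link on the route (from the text)
HEADER_BYTES = 40    # assumed TCP/IP header size, no options
G = 0.100            # clock granularity in seconds (from the text)


def tx_time(payload_bytes, bps=LINK_BPS):
    """Seconds needed to clock one segment onto the slow link."""
    return (payload_bytes + HEADER_BYTES) * 8 / bps


def rto_after_first_sample(r, g=G, k=4, floor=0.0):
    """RFC 2988 rule 2.2, with an optional rule 2.4 style minimum."""
    srtt = r
    rttvar = r / 2
    return max(srtt + max(g, k * rttvar), floor)


ack_time = tx_time(0)                 # pure ACK coming back
small_rtt = tx_time(10) + ack_time    # assumed 10-byte first segment
full_rtt = tx_time(536) + ack_time    # assumed 536-byte MSS

rto_no_floor = rto_after_first_sample(small_rtt)
rto_with_floor = rto_after_first_sample(small_rtt, floor=1.0)

print(f"RTT sample from small segment: {small_rtt:.3f}s")
print(f"RTT of the full-sized segment: {full_rtt:.3f}s")
print(f"RTO without 1s floor: {rto_no_floor:.3f}s, "
      f"spurious timeout: {full_rtt > rto_no_floor}")
print(f"RTO with 1s floor:    {rto_with_floor:.3f}s, "
      f"spurious timeout: {full_rtt > rto_with_floor}")
```

Under these assumptions the RTO derived from the small segment is
shorter than the time the full-sized segment needs, and the 1s floor
hides that. The outcome depends on the assumed sizes: a larger MSS
takes proportionally longer to serialize on the same link.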

There is one comment specifically saying that Solaris does not
comply with Section 5, rule 5.3. That is correct: Solaris does
not implement the timer restart part of rule 5.3. The timer
is restarted when a fast retransmit happens or during the fast
recovery phase when missing segments are retransmitted.

Rule 5.3 simply says that the RTO calculation is not good enough, so
implementations should add a little fudge factor to it. For
example, suppose 2 segments are sent, the receiver does not delay
its ACKs, and the second segment is dropped. After the ACK for the
first segment arrives, rule 5.3 suggests that the timer should
be restarted. This means that the actual timeout value for the
second segment is RTO+RTT. The arrival of the first data segment
correlates weakly, if at all, with the fate of the next segment.
This point has been mentioned by a lot of other people. And the
reason it helps with spurious retransmission is simply that it makes
the timeout longer and longer. If we make RTO a fixed 5s, I conjecture
that there will be no spurious retransmission, even for wireless
links (-: (Please don't take those "smiley points" seriously (-:)
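
The RTO+RTT arithmetic in the two-segment example can be sketched as
a tiny timeline in Python. The function name and the sample values
(RTO = 1s, RTT = 0.2s) are made up for illustration, and both segments
are assumed to leave back to back at t = 0.

```python
def second_segment_timeout(rto, rtt, restart_on_ack):
    """Time (in seconds after t=0) at which the retransmission timer
    fires for a lost second segment.

    Assumes both segments are sent at t=0, the timer is started then,
    and the ACK for the first segment arrives at t=rtt.
    """
    timer_start = 0.0
    if restart_on_ack and rtt < rto:
        # The ACK of new data arrives before expiry and the timer is
        # restarted, as rule 5.3 of RFC 2988 suggests.
        timer_start = rtt
    return timer_start + rto


with_restart = second_segment_timeout(rto=1.0, rtt=0.2, restart_on_ack=True)
without_restart = second_segment_timeout(rto=1.0, rtt=0.2, restart_on_ack=False)

print(f"timer restarted on ACK: fires at {with_restart:.1f}s (RTO+RTT)")
print(f"timer left running:     fires at {without_restart:.1f}s (RTO)")
```

The only difference between the two cases is the extra RTT of slack,
which is the "fudge factor" described above.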

And I believe Farid Khafizov's last email (there are too many emails
in my various mailboxes, I may have missed some...) mentioned that
even with this timeout fudge factor, RTT oscillation is still a
problem. His earlier mails suggested that RFC 2988 compliant
TCP implementations could handle his earlier experiments, but that
may just be "pure luck..." The current TCP RTO algorithm is not
designed to handle this kind of wireless environment. And we all know
that the current assumption of the RTO calculation is that the round
trip route does not change much. So it seems to me that we may need a
better way to deal with this in the RTO calculation. A quick hack for
Solaris admins is to set tcp_rexmit_interval_extra to a value which is
appropriate for the kind of RTT oscillation of the particular network.
This knob is there to handle cases when the RTO algorithm fails, and I
believe this kind of oscillation in the wireless medium may be one
of them. In any case, we are open to suggestions on what we should
do to accommodate different network environments.
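
As a rough illustration of why a sudden RTT jump defeats the
estimator and what a constant pad does about it, here is a Python
sketch of the RFC 2988 smoothing with an extra term added to the RTO.
The class name, the sample RTTs (0.2s jumping to 1s), the 10ms
granularity and the 1s pad are all made up. Modelling
tcp_rexmit_interval_extra as a constant added to the computed RTO is
an assumption here, not a description of the Solaris code.

```python
class RtoEstimator:
    """SRTT/RTTVAR smoothing from RFC 2988 section 2 (alpha=1/8,
    beta=1/4, K=4), plus an assumed constant 'extra' pad.
    No minimum RTO is applied."""

    def __init__(self, granularity=0.010, extra=0.0):
        self.g = granularity
        self.extra = extra      # hypothetical stand-in for the Solaris knob
        self.srtt = None
        self.rttvar = None

    def sample(self, r):
        if self.srtt is None:
            # First measurement (rule 2.2)
            self.srtt = r
            self.rttvar = r / 2
        else:
            # Subsequent measurements (rule 2.3): RTTVAR first, then SRTT
            self.rttvar = 0.75 * self.rttvar + 0.25 * abs(self.srtt - r)
            self.srtt = 0.875 * self.srtt + 0.125 * r

    def rto(self):
        return self.srtt + max(self.g, 4 * self.rttvar) + self.extra


plain = RtoEstimator()
padded = RtoEstimator(extra=1.0)
for _ in range(20):             # a long run of stable 0.2s samples
    plain.sample(0.2)
    padded.sample(0.2)

new_rtt = 1.0                   # the link suddenly gets much slower
print(f"plain RTO  {plain.rto():.3f}s, spurious: {new_rtt > plain.rto()}")
print(f"padded RTO {padded.rto():.3f}s, spurious: {new_rtt > padded.rto()}")
```

After a stable period RTTVAR decays toward zero, so the plain RTO sits
just above SRTT and the first slow round trip times out. The pad only
helps if it is at least as large as the jump, which is why the value
has to be chosen for the particular network.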

I have another minor comment about what actually happens after a timeout.
I believe all modern TCP implementations will not have the false fast
retransmit after timeout problem mentioned in this thread.

                                                        K. Poon.
                                                        kcpoon@eng.sun.com


