Re: Retransmissions without explanation

From: Kacheong Poon ([email protected])
Date: Tue May 18 1999 - 19:02:38 EDT


> We were monitoring the transfer with a LAN Analyzer, and we saw a message
> reporting excessive retransmissions. We looked at the traces, and found that
> the Sun was periodically retransmitting some packets. There was no packet
> discard in the routers, no CRC errors, and no TCP segment asking for
> retransmission (with an already acknowledged ACK). So, we concluded that the
> retransmission timeout was expiring at the Sun. We were surprised, because
> the RTO for TCP should be dynamically adjusted to reflect the round trip
> delay of the link.

This question came up on this mailing list some time ago. The following
was my response. Note that in Solaris 7, the workaround mentioned in the
mail should not be needed.

                                                        K. Poon.
                                                        [email protected]

>----- Begin Included Message -----<

Date: Mon, 10 Nov 1997 16:46:33 -0800 (PST)
From: "Kacheong Poon" <[email protected]>
Subject: Re: TCP over GEO < 512kbps
To: "Chuck Nunez" <[email protected]>
Cc: [email protected]

> The reference in another message today about a patch (SUN's CONSULT-TCPLFN)
> for Solaris 2.5.1 and below, adds support for Long Fat Networks
> (RFC1323-TCP Extensions for High Performance). It is included in 2.6 (so
> I've been told).

Yes, Solaris 2.6 has all the RFC 1323 extensions.
 
> Finally, there is one other nit that might present an obstacle to high
> performance. If the default values for Solaris' retransmit timers
> (tcp_rexmit_interval_initial, tcp_rexmit_interval_min,
> tcp_rexmit_interval_max) are set below the RTT, then Solaris will
> retransmit unacknowledged packets in spite of the fact that the RTT has not
> yet expired. For example, if the RTT is 750 ms, make sure that the
> retransmit timers are set to 1000 ms or more to preclude unnecessary
> retransmissions.

Well, RTT (round trip time) won't expire (-: The problem you mentioned was
that the calculated RTO (timeout) was lower than RTT. This causes packets to
be retransmitted unnecessarily. I think this RTO problem was discussed in the
last IETF tcp-impl WG meeting. The RFC 1323 timestamp option allows TCP to get
very accurate RTT samples from every packet. TCP's RTO is calculated using
the following formula:

        RTO = sa + 4 * sd

sa is the smoothed average of RTT samples and sd is the smoothed mean
deviation of RTT samples. With timestamps, sa converges quickly to the
real RTT while sd becomes very small. As you mentioned in a previous mail,
RTT of satellite links may sometimes jump to 800ms. This kind of sudden jump
may not be captured by sd. This can cause some unnecessary retransmissions.
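
The effect described above can be reproduced with a small sketch. It is
illustrative only: the smoothing gains (1/8 for sa, 1/4 for sd) and the
first-sample initialization are the textbook Jacobson values, assumed here
and not taken from the Solaris source, and the 550 ms baseline RTT is a
hypothetical figure for a satellite link. The names RtoEstimator and
spurious_timeout are made up for this example.

```python
class RtoEstimator:
    """Smoothed RTT estimator computing RTO = sa + 4 * sd.

    Gains of 1/8 (sa) and 1/4 (sd) are assumed textbook values,
    not necessarily what any particular TCP stack uses.
    """

    def __init__(self):
        self.sa = None  # smoothed average of RTT samples (ms)
        self.sd = None  # smoothed mean deviation of RTT samples (ms)

    def update(self, sample_ms):
        if self.sa is None:
            # First sample: assumed initialization sa = RTT, sd = RTT / 2.
            self.sa = float(sample_ms)
            self.sd = sample_ms / 2.0
        else:
            err = sample_ms - self.sa
            self.sa += err / 8.0
            self.sd += (abs(err) - self.sd) / 4.0
        return self.rto()

    def rto(self):
        return self.sa + 4.0 * self.sd


def spurious_timeout(baseline_ms, spike_ms, n_samples):
    """Feed n_samples identical RTT samples, then test a sudden spike.

    Returns (rto_ms, fires), where fires is True if an ACK delayed to
    spike_ms would arrive after the retransmit timer has expired.
    """
    est = RtoEstimator()
    for _ in range(n_samples):
        est.update(baseline_ms)
    return est.rto(), spike_ms > est.rto()


if __name__ == "__main__":
    rto, fires = spurious_timeout(550, 800, 40)
    print("RTO after 40 steady samples: %.2f ms" % rto)
    print("Jump to 800 ms causes a timeout:", fires)
```

With a steady stream of identical samples, sd decays geometrically toward
zero, so the RTO ends up only marginally above sa and a jump to 800 ms
arrives after the timer has already fired.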

In BSD's TCP implementation, RTO usually has a ~500ms "buffer zone." So
the above problem is not seen often. Solaris has a fine-grained RTO, so you
may see this problem sometimes. In 2.6, the workaround is to set
tcp_rexmit_interval_extra to 500 to get an extra 500ms buffer zone like BSD.
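
As a rough illustration of why the extra interval helps, the sketch below
assumes that tcp_rexmit_interval_extra is simply added to the computed RTO.
That additive model is an assumption made for this example, not a statement
about the Solaris implementation, and the function names and the 550 ms
baseline are hypothetical.

```python
def rto_ms(sa_ms, sd_ms, extra_ms=0.0):
    """RTO = sa + 4 * sd, plus an assumed additive safety margin."""
    return sa_ms + 4.0 * sd_ms + extra_ms


def would_retransmit(rtt_ms, sa_ms, sd_ms, extra_ms=0.0):
    """True if an ACK arriving after rtt_ms would come after the timer expires."""
    return rtt_ms > rto_ms(sa_ms, sd_ms, extra_ms)


if __name__ == "__main__":
    # Converged estimator: sa at the hypothetical 550 ms baseline, sd near zero.
    print(would_retransmit(800, 550, 0.0))               # no margin
    print(would_retransmit(800, 550, 0.0, extra_ms=500)) # 500 ms margin
```

On Solaris, TCP tunables of this kind are normally changed as root with ndd,
for example: ndd -set /dev/tcp tcp_rexmit_interval_extra 500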

The tuning you suggested above is for a different bug in earlier releases of
Solaris. 2.6 should not have that problem.

                                                        K. Poon.
                                                        [email protected]

>----- End Included Message -----<


