I've finally gotten around to reading Error, and I have a few
additional comments on it:
> 2.1 Slow Start and Congestion Avoidance [RFC2581]
>
> Slow Start and Congestion Avoidance [RFC2581] are essential to
> the Internet's stability. These mechanisms were designed to
> accommodate networks that didn't provide explicit congestion
> notification.
I'd recommend changing the wording slightly here: "These mechanisms
were designed to accommodate networks that didn't provide any form of
explicit congestion control". (We aren't just talking about ECN
in this sentence, and ECN in fact uses SS and CA...)
Section 2.2:
> NewReno [RFC2582] apparently does help a sender
> better handle partial ACKs and multiple losses in a single
> window, but at this point is not recommended due to its
> experimental nature. Instead, SACK (Selective Acknowledgements)
> is the preferred mechanism.
Both NewReno and SACK are becoming popular. The bad thing about
NewReno is that it takes N RTTs to recover if there are N losses in a
window. The good thing about it is that it requires only one side of
the connection to implement the protocol. SACK appears to be a much
better solution. Unless there is objection, I'd recommend SACK
generally and NewReno (in addition, not instead) for situations where
SACK won't work (normally because the person reading this is only
able to install a solution on the transmitting end of the
connection). Here is new text:
Recommendation: Implement Fast Retransmit and Fast Recovery at this
time. This is a widely-implemented optimization and is currently at
Proposed Standard level. [RFC2488] recommends implementation of
Fast Retransmit/Fast Recovery in satellite environments. In cases
where SACK (see next section) cannot be enabled for both sides of
a connection, NewReno [RFC2582] may be used by TCP senders to
better handle partial ACKs and multiple losses in a single window.
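
To make the "N RTTs for N losses" point above concrete, here is a rough
sketch of the recovery-time difference. It is a toy model, not an
implementation of either mechanism: it assumes NewReno repairs one hole
per round trip (each partial ACK exposes only the next hole), that an
idealized SACK sender knows every hole and can repair them all in one
round trip unless capped, and that no retransmission is lost. The
function names and the `repairs_per_rtt` parameter are made up for
illustration.

```python
def recovery_rtts(lost_segments, repairs_per_rtt):
    """Round trips needed to repair all lost segments in one window.

    Toy model: at most `repairs_per_rtt` holes are retransmitted per
    round trip, and no retransmission is itself lost.
    """
    if repairs_per_rtt < 1:
        raise ValueError("repairs_per_rtt must be at least 1")
    holes = sorted(set(lost_segments))
    rtts = 0
    while holes:
        # Repair the lowest-numbered holes first, then wait one RTT.
        holes = holes[repairs_per_rtt:]
        rtts += 1
    return rtts


def newreno_rtts(lost_segments):
    """NewReno model: one hole is repaired per round trip."""
    return recovery_rtts(lost_segments, 1)


def sack_rtts(lost_segments, repairs_per_rtt=None):
    """Idealized SACK model: all holes are known to the sender.

    `repairs_per_rtt` is a hypothetical cap standing in for whatever
    the congestion window allows; None means no cap.
    """
    if repairs_per_rtt is None:
        repairs_per_rtt = max(1, len(set(lost_segments)))
    return recovery_rtts(lost_segments, repairs_per_rtt)


if __name__ == "__main__":
    lost = [3, 7, 9]  # three segments dropped from one window
    print("NewReno round trips:", newreno_rtts(lost))
    print("SACK round trips:   ", sack_rtts(lost))
```

With three losses in a window, this model gives 3 round trips for
NewReno and 1 for the uncapped SACK case.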
Section 2.3:
> In low-speed, high error-rate environments (for example, the
> wireless WAN environment), TCP windows are much smaller, and burst
> errors must be much longer in duration in order to damage multiple
> segments. Accordingly, the complexity of SACK may not be
> justifiable, unless there is a high probability of both burst
> errors and congestion.
This logic doesn't make a lot of sense to me. I suppose that if your
error rate is high, and you are constantly timing out, SACK is not
going to make any difference. But if there is congestion too
(i.e. the total error rate is even higher), SACK is going to help even
less.
I'm not an expert, but I would think that in wireless environments,
changes in surroundings (e.g. someone walks in front of the wireless
card) would cause a high probability of correlated errors. This would
imply to me multiple losses per window, exactly the thing that SACK
is good for. So if things mostly *don't* suck, SACK will help for the
times when they do.
If things mostly *do* suck, not much is going to help.
Here is a proposed change/addition:
In low-speed, high error-rate environments (for example, the
wireless WAN environment), TCP windows are much smaller, and burst
errors must be much longer in duration in order to damage multiple
segments. Accordingly, the complexity of SACK may not be
justifiable.
On the other hand, if error rates are generally low but
occasionally increase due to interference, TCP will have the
opportunity to increase its window to larger values. When
interference occurs, multiple losses within a window are likely to
occur. In this case, SACK would provide benefits in speeding the
recovery and preventing unnecessary extra reduction of window size.
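
To put rough numbers on the burst-length argument above, here is a
back-of-the-envelope sketch. It assumes segments are sent back to back,
that a burst starts at a uniformly random point in time (in which case
a burst of length d overlaps d/t + 1 segments on average, where t is
the per-segment transmission time), and that a burst cannot damage more
segments than are in the window. The 1500-byte segment size, link
rates, burst length, and window sizes are illustrative assumptions, not
values from the draft.

```python
def segment_time_s(segment_bytes, link_bps):
    """Seconds needed to transmit one segment on the link."""
    if segment_bytes <= 0 or link_bps <= 0:
        raise ValueError("segment_bytes and link_bps must be positive")
    return segment_bytes * 8 / link_bps


def expected_segments_hit(burst_s, segment_bytes, link_bps, window_segments):
    """Average number of segments overlapped by one error burst.

    Assumes back-to-back segments and a uniformly random burst start,
    capped at the number of segments in the window.
    """
    if burst_s <= 0 or window_segments <= 0:
        return 0.0
    overlapped = burst_s / segment_time_s(segment_bytes, link_bps) + 1.0
    return min(float(window_segments), overlapped)


if __name__ == "__main__":
    burst = 0.1  # a 100 ms burst of interference (assumed)
    # Slow link, small window: the burst rarely spans two segments.
    print(expected_segments_hit(burst, 1500, 9600, window_segments=4))
    # Fast link, large window: the same burst covers many segments.
    print(expected_segments_hit(burst, 1500, 10_000_000, window_segments=64))
```

On the assumed 9600 bit/s link a segment takes 1.25 seconds to send, so
a 100 ms burst almost always damages a single segment. On the assumed
10 Mbit/s link the same burst overlaps the whole 64-segment window,
which is the multiple-loss case where SACK pays off.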
Section 4:
I promised Aaron a note on "explicit notification of transmission
error" based on a discussion Phil and Aaron and I had at IETF. I'll
try to emit that tomorrow. It probably doesn't impact the draft much
(except maybe to change this one sentence), but it ought to spark some
interesting discussions.
Section 4.1:
Playing something of the devil's advocate here... I don't see that
the recommendations are impacted at all by whether connections are
short or long. Even in the current world with lots of short flows,
sometimes in parallel, we still want people to do SS/CA, FR/FR and
SACK. I don't think there is any question that it would be of
additional benefit if parallel TCP connections shared congestion state
either through RFC 2140 or ECM. I don't necessarily think I'd argue that
they are "crucial to the long-term stability of the Internet" (we've
lasted this long, after all!), but they should definitely be
beneficial to performance.
The recent CCR paper by Eggert, Heidemann, and Touch on Ensemble TCP
showed some of the benefits of combining congestion state for http
traffic. I'd kind of like to see the same work done in an "error"
environment to see what effect it would have on performance...
--Jamshid