Re: I-D ACTION:draft-ietf-tcpsat-stand-mech-00.txt

From: Eric ([email protected])
Date: Tue Oct 14 1997 - 15:19:22 EDT


Curtis,

I've missed most of this discussion, but I got your message :o)
I'll risk responding prior to pulling the mailing list archive
and catching up with the rest.

>Sending one source quench per second would have no affect at all on
>typical Internet traffic which has hundreds to hundreds of thousands
>of TCP flows. In congested multiple bottleneck situations with lots
>of flows, sending more traffic (source quench) has been proven to be a
>bad idea. Links that are congested with a very high contribution from
>small duration flows can need 5-15% loss. This would mean you'd need
>to add 5-15% more packets (small ones) to get the equivalent slow down.

I believe the point of the "one source quench per second" was as an
example recognizing the importance of damping the rate of generation.
The exact rate remains undetermined. Such a scheme is highly
dependent on widespread deployment of RED, so the "source quench" in
question *is* an ECN; but instead of (or in addition to) marking a bit
in the IP header, it also generates a Source Quench. You don't send one
per offending segment, but rather damp it by limiting the rate of
generation to a particular source. RED makes the generation of source
quenches far more palatable than it currently is. This is not intended
to REPLACE loss as a signal of congestion, but rather to
supplement it. I'm not sure why you were led to believe otherwise, but
that simply is not the case.
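The damping described above amounts to per-source rate limiting at the gateway. A minimal sketch, assuming a hypothetical `QuenchLimiter` class and a one-second minimum interval (both illustrative, not from any specification):

```python
import time

class QuenchLimiter:
    """Hypothetical sketch: rate-limit Source Quench generation per
    source at a RED gateway, rather than quenching every offending
    segment. The class name and interval are assumptions."""

    def __init__(self, min_interval=1.0):
        self.min_interval = min_interval  # e.g. at most one quench per second
        self.last_quench = {}             # source address -> time of last quench

    def should_quench(self, src, now=None):
        """Return True if a quench may be sent to this source now."""
        now = time.time() if now is None else now
        last = self.last_quench.get(src)
        if last is None or now - last >= self.min_interval:
            self.last_quench[src] = now
            return True                   # send this quench
        return False                      # suppress: source quenched recently
```

The point is that the quench supplements the RED mark rather than flooding the reverse path: a congested gateway with thousands of flows still emits at most one quench per source per interval.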

The availability of ECN is particularly important in a long-delay
environment because the "loss as a signal of congestion" indication will
take more than an RTT to propagate back to the sender in the form of 3
duplicate ACKs; this extends the length of the congestion event so that
it lasts at least this long, and if the long-delay source is the major
contributor to the congestion, everyone is going to suffer. With ECN in
the form of a quench, you've got a shot at short-circuiting the long
delay feedback path. Yes, this only helps if the congestion is on the
near side of the long path relative to the source, and yes, source
quenches are unreliable.

The number of flows through the lossy link is an important factor
to consider. Deploying such a scheme on a backbone link just won't
scale; but on such links, it should be more of a moot point. If I
can provide enough power/bandwidth to trunk large numbers of small
flows, you can bet that I would have done everything possible to
mitigate the possibility of errors on the channel. In the transient
situations where this proves inadequate, oh well. That's life - if
this is going to happen regularly, then I've really made some bad
design choices.

The scheme you are reacting to is intended for links supporting
small to moderate numbers of traffic flows. No, I won't quantify
what I mean by small to moderate, because it really isn't important
right now.

The range of satellite uses extends far beyond that of large trunks.
The trend for scientific/research satellites is smaller, cheaper,
better. Part of this triple is being able to use the Internet (or an
internet) to transfer data from the vehicle to the principal
investigator. There is a strong desire to use TCP as part of that
process.

Similarly, the wireless environment is skewed toward a small number
of flows over potentially lossy links. They want to run unmodified
TCP-based applications too.

Different solutions may be appropriate for dramatically different
situations.

>You can't propose something that will help the lossy wireless and
>satellite cases and break the rest of the Internet. (Actually you can
>propose anything, you just can't expect the proposal to get anywhere).

In a general response to the above, I'll have to counter with:

If we stick our collective heads in the sand and ignore the problems
of wireless and satellite links, some less "community spirited" vendors
are going to begin to do things that are antisocial; people in these
environments will buy into such products/protocols if it makes *their*
performance appear better - the rest of the Internet be damned. The
motivation is to prevent this. Sure, the counterattack to antisocial
applications (such as things based on UDP without any congestion control
mechanisms) is to attack them at the routers. Isn't it better to
prevent the need for such escalation by addressing the problems as a
community now, rather than reacting to them later?

The satellite and wireless communities are growing at a rapid clip,
and just telling them to live with poor performance isn't a particularly
good idea - there's money to be made, so there is incentive to be
"unfair" in performance.

I've heard more than a few times from folks in the commercial satellite
community statements to the effect of:

"My launch date is 200x. We really want to be good neighbors and
work within the system [wrt Internet traffic], but if the IETF
isn't able to provide us with solutions, we're going to do whatever
we have to do for providing good service to our customers."

That is the really important point to consider throughout the life span
of this mailing list.

They are looking for solutions, but they've got large amounts of $$$
at stake. I happen to think it is prudent to explore ways of helping
them out sooner rather than later.

For a more specific response:

I'm not aware that this scheme has been proposed for use on the
Internet. In fact, we've been pretty darned careful NOT to propose this
for use on the Internet to date. We need to build a sufficiently strong
case that this will NOT harm the traffic on ALL the shared paths first.
This *is* effective (and apparently fair) for the environment in which
it was originally conceived; OK? :o)

>When Sally Floyd put out the TCP-ECN paper in 1994, the source quench
>idea was soundly rejected but there has been interest in the ECN part
>as a means to suppliment loss as a congestion indication, not replace
>it.

Again, who is attempting to replace loss as a congestion indication?
                                                
When you can provide information regarding a short-term *trend* toward
loss on a particular link that can be correlated to link-layer activity,
that information is provided to TCP entities that can understand it.
The trigger on a loss event needs to be rather tight; it isn't
intended to be tripped on one, two, or even a small integer number of
link events within a given period.

>The idea of an "experiencing loss" bit where loss should be ignored
>is also dangerous if you assume that some routers will be congestion
>bit aware and some still use drop as the means to indicate congestion.
>Such a scheme would face serious obstacles in being deployed.

Again, there is no replacement of any existing signals. At least from
our perspective, the "experiencing loss" bit is actually a TCP
option on every segment sent back during the corruption period. It
is done at the end-systems, and that behavior can only be triggered
by receipt of a "corruption experienced" message from a
corruption-aware routing entity. These entities would only be deployed
at the terminus of RF links.

The core of all of this is to recognize the trend toward link-layer
loss and signal this to the appropriate end-to-end entities. For this
to work, there needs to be explicit notification of congestion in
addition to signaling congestion through loss. What happens if a path
is experiencing both congestion based losses and corruption based
losses simultaneously?

    Congestion is the default response to loss, so in the absence of
    any other indications, a congestion response is triggered at the
    source. If "corruption experienced" messages begin to arrive
    piggybacked on segments/ACKs *and* there is a loss within this
    stream (signaled by duplicate ACKs or a retransmission timeout),
    then a corruption response is triggered (don't cut cwnd). If the
    source receives a quench message in the midst of the corruption
    experienced options, it triggers the congestion response.

    If a quench is lost in the middle of a corruption event, then there
    is the risk of being antisocial. However, corruption events are
    to be bounded in duration (but they might cascade), so we have to
    determine how bad this could be on the Internet (as opposed to an
    internet).
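The sender-side decision above can be sketched as a small function. This is a minimal illustration of the rules as stated, not a specified API; the function name and boolean inputs are assumptions:

```python
def sender_response(loss_detected, corruption_experienced, quench_received):
    """Hypothetical sketch of the sender-side decision described above.
    Inputs: loss_detected - duplicate ACKs or a retransmission timeout;
    corruption_experienced - 'corruption experienced' options arriving
    on segments/ACKs; quench_received - a Source Quench arrived."""
    if quench_received:
        # A quench overrides corruption indications: congestion response.
        return "congestion"
    if loss_detected and corruption_experienced:
        # Loss during a corruption event: retransmit, don't cut cwnd.
        return "corruption"
    if loss_detected:
        # No corruption indications: congestion is the default response.
        return "congestion"
    return "none"
```

Note how the "lost quench" risk discussed above shows up here: if `quench_received` is falsely False during a corruption event, the function returns "corruption" when the network actually wanted a congestion response.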



This archive was generated by hypermail 2b29 : Mon Feb 14 2000 - 16:14:31 EST