Much of the recent traffic in the TCPSAT WG has been
concerned with the error rates of satellite links. There seems to be a
school of thought that satellite links can be made to perform as well
as fiber links through the use of FEC schemes and proper design of the
RF components. Others have commented that, due to uncontrolled
natural occurrences, there will be occasional degradation of the
signal and loss of data on the link. I think the discussion on these
issues is missing one big point. TCP is an end-to-end protocol, one
that is affected by all possible degradation in the complete path. To
focus on just one link in the chain may be shortsighted.
Reducing loss on a satellite link will help reduce the overall loss in
the path, but it does not necessarily eliminate it.
Let's assume for the moment that a satellite link can give us BER
performance comparable to fiber. That doesn't mean that the rest of
the path will be error-free. The probability of loss of a packet (or
its ack) accumulates the probabilities of loss due to congestion,
link failure, corruption, and other factors on ALL of the links in
the path. In tests that I have run over the open Internet using
PING at different rates and packet sizes, I have seen random error
rates between 1% and 5%.
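To make that accumulation concrete, here is a minimal sketch in
Python. The per-link loss figures are illustrative assumptions, not
measurements; the point is that a nearly loss-free satellite hop does
not rescue a path whose terrestrial hops are dropping packets.

    # Probability a packet is lost somewhere along the path, assuming
    # each link drops packets independently (a simplifying assumption).
    def path_loss(per_link_losses):
        survive = 1.0
        for p in per_link_losses:
            survive *= 1.0 - p
        return 1.0 - survive

    # A very clean satellite hop (0.01%) next to two terrestrial hops
    # with 1% and 2% congestion loss:
    print(path_loss([0.0001, 0.01, 0.02]))  # ~0.0299, i.e. ~3%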
Various research on the effect of loss on the size of the TCP
congestion window puts the effective window size at approx.
1/sqrt(probability of loss). Even at losses as low as 0.5% the
effective cwnd size for a connection would be around 14 segments.
Using larger send/receive windows does not change this fact. Larger
send/receive windows only improve data flow in VERY LOW loss
situations. Our initial tests with SACK-enabled TCP have shown
some improvement, but the increase to the cwnd is only about 1
segment when losses are between 1% and 5%.
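The 1/sqrt(p) rule of thumb is easy to tabulate; here is a short
Python sketch (the loss values are examples, not measurements):

    import math

    # Effective congestion window, in segments, under steady loss p,
    # per the ~1/sqrt(p) approximation cited above.
    def effective_cwnd(p):
        return 1.0 / math.sqrt(p)

    for p in (0.005, 0.01, 0.05):
        print(f"loss {p:.1%}: cwnd ~ {effective_cwnd(p):.1f} segments")
    # loss 0.5%: cwnd ~ 14.1 segments
    # loss 1.0%: cwnd ~ 10.0 segments
    # loss 5.0%: cwnd ~ 4.5 segments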
Theoretical throughput can be calculated as WindowSize / RTT.
WindowSize will be the smaller of the receive window or the
congestion window. If we assume very large receive window sizes
and some loss in the network, then our throughput becomes
1/(RTT * sqrt(Loss)) segments per second. Clearly it's the
combination of loss and delay that causes problems in TCP throughput.
Reduce either factor to zero and bandwidth is only limited by the
link speed and/or recv window size.
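As a hedged sketch of what that formula implies for a GEO path versus
a terrestrial one, with the RTTs and the 1460-byte segment size being
illustrative assumptions:

    import math

    # Throughput ~ MSS / (RTT * sqrt(loss)), in bits per second.
    def throughput_bps(rtt_s, loss, mss_bytes=1460):
        return mss_bytes * 8 / (rtt_s * math.sqrt(loss))

    # ~550 ms GEO RTT vs. ~70 ms terrestrial RTT, both at 1% loss:
    print(f"GEO:         {throughput_bps(0.550, 0.01) / 1e3:.0f} kbit/s")  # ~212
    print(f"terrestrial: {throughput_bps(0.070, 0.01) / 1e3:.0f} kbit/s")  # ~1669

At the same loss rate, the long-delay path gets roughly an eighth of
the throughput, which is just the ratio of the RTTs.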
If we're still assuming that satellite and fiber are comparable in
terms of error rates, then the key distinguishing feature of satellites
(GEO in particular) is the longer latency of the link. We need to fully
explore how long latency will affect performance in a shared network
environment where we must assume some losses occur. Here are a
few issues that I think need to be more fully addressed:
TCP assumes all loss is due to congestion. How true is this
assumption? How would the use of ECN improve the way that
TCP responds to loss and congestion?
Is loss that IS caused by congestion distributed fairly? What
about non-TCP streams: are they reducing their throughput when
congestion is indicated? Some of the research on these questions
seems to indicate the answer is NO in both cases. What can be
done to correct this? Is RED the answer?
TCP recovers from congestion at a rate that is related to the RTT.
This would seem to give the advantage to low-latency paths when
shared resources are in use (see the sketch below). I have seen some
suggestions for changing the congestion recovery scheme of TCP so
that it is no longer linked to RTT but would instead increase at some
constant rate. I haven't seen any research on what the negative
implications of this might be.
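A toy comparison of the two recovery schemes (the window sizes, RTTs,
and the 100 ms constant tick are all assumptions for illustration):

    # Time for congestion avoidance to grow cwnd from 10 to 100
    # segments, opening by 1 segment per RTT vs. 1 segment per 100 ms.
    def recovery_time(rtt_s, start=10, target=100, per_rtt=True):
        rounds = target - start  # one segment gained per round
        return rounds * (rtt_s if per_rtt else 0.100)

    for rtt_s in (0.070, 0.550):  # terrestrial vs. GEO RTT
        print(f"RTT {rtt_s * 1000:.0f} ms: "
              f"per-RTT {recovery_time(rtt_s):.1f} s, "
              f"constant-rate {recovery_time(rtt_s, per_rtt=False):.1f} s")
    # RTT 70 ms:  per-RTT 6.3 s,  constant-rate 9.0 s
    # RTT 550 ms: per-RTT 49.5 s, constant-rate 9.0 s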
The only way to improve throughput without dealing with the
losses is to lower the latency of the link. This is not possible on a
GEO satellite without seriously breaking a few laws of physics.
What can be done is to give the APPEARANCE of lower latency
by spoofing acks. This approach has its limits, though: it only
works well when the traffic is asymmetric, it does nothing for
interactive applications, and it may have a serious impact on
security. These limits need to be well documented so that those who
choose the spoofing route are fully aware of the risks as well as
the benefits.
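The latency "appearance" trick reduces to simple arithmetic; the hop
delays below are assumptions chosen only to show the effect:

    # Apparent RTT when a gateway on the near side of the satellite
    # hop spoofs acks, vs. the true end-to-end RTT.
    terrestrial_rtt = 0.020  # sender <-> spoofing gateway (assumed)
    satellite_rtt = 0.530    # gateway <-> far end over GEO (assumed)

    true_rtt = terrestrial_rtt + satellite_rtt
    apparent_rtt = terrestrial_rtt  # acks are generated at the gateway

    print(f"true RTT {true_rtt * 1000:.0f} ms, "
          f"apparent RTT {apparent_rtt * 1000:.0f} ms")  # 550 ms vs. 20 ms

The sender's window then grows at the terrestrial pace, but the
gateway must buffer data it has acknowledged on the sender's behalf,
and end-to-end reliability is no longer guaranteed by TCP itself.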
Let me finish by saying that the purpose of this message is not to
criticize, but to generate some discussion on some issues that might
need more attention. If any of my assumptions are way off base
please let me know, but please don't send me mail telling me to
check this RFC or that research paper. I have probably already
seen them, and I am trying to generalize the issues, not detail the
specifics.
Thomas J Lynch
AT&T Labs
101 Crawfords Corner Rd
Holmdel, NJ
[email protected]