Phil,
in a paper that we will present at the upcoming ACM SIGMETRICS 99
conference we describe a measurement-based study of TCP bulk data over
GSM's RLP protocol. The platform we used for tracing allowed us to monitor
TCP as well as RLP. We measured in both very good and _very_ bad radio
coverage. The major findings are:
1. At least with the FEC/Interleaving used for circuit-switched data (CSD)
in GSM, fully-reliable ARQ does _not_ interfere with TCP's Rexmt timer. We
only found 2 spurious timeouts because of this race condition which were
both at the beginning of a connection when the RTO had not yet converged to
the "right" value.
-> To be fair I have to say that we were using TCP-Timestamps to get more
instant feedback when the RTT increased if e.g. the link quality started to
decrease and RLP had to do more retransmissions. So probably timing every
packet is what you want to do in this environment although I'm not at all
convinced that we have to put in a 12-byte TCP option for that. A few bits
- maybe 2 to unambiguously mark the first 3 transmissions of a packet -
should be sufficient to get rid of the retransmission ambiguity problem
you discussed in your SIGCOMM 87 paper.
-> In another in-progress paper we are studying this topic in more general
terms. Our current results show that the reason for the extremely
infrequent spurious timeouts is not so much the coarseness (500 ms) of the
BSD4.4 timers but much more (among other reasons we discuss in that paper)
A. the hyper-sensitivity of the RTO to RTT variations and B. the fact that
TCP's Rexmt timer is always off by roughly one RTT because (at least in
BSD-derived implementations) the Rexmt timer gets re-started
(re-initialized with the RTO) with every ACK for new data. You do need some
extra slack to give the FastRetransmit algorithm a chance to kick in, but
an entire RTT is probably a bit too conservative.
2. Semi-reliable ARQ as you proposed can do a lot of harm when you are
running VJ TCP/IP header compression which you certainly want to do on a
low bandwidth link. The reason is that the header decompressor screws up
when one or more packets are lost causing checksum errors at the TCP
receiver and thus a loss of an entire window. This is particularly painful
when you are running over a massively overbuffered link because the windows
grow huge. The latter happens easily: just take a BSD-UNIX machine which by
default has an interface buffer of 50 packets and connect it to a low
bandwidth link (the per-route metrics that TCP maintains in the routing
table don't really help: they are flushed whenever the PPP link goes down
and they certainly don't work for the default route). We detected this
phenomenon also with GSM's RLP protocol when it does a link reset after it
gives up after a default of 6 retransmissions. However, A) this happened
_very_ rarely and B) you can bump up that parameter (N2) with an AT command
to get rid of the problem.
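
The header compression problem in point 2 can be illustrated with a toy
model. This is only a sketch: it delta-encodes nothing but the sequence
number, it is not the real VJ packet format, and the segment sizes and the
function names compress/decompress are made up for the example.

```python
def compress(seqs):
    """Send the first sequence number in full, then only deltas."""
    return [seqs[0]] + [cur - prev for prev, cur in zip(seqs, seqs[1:])]


def decompress(stream, lost):
    """Rebuild sequence numbers; indices in `lost` never reach the receiver.

    Assumes index 0 (the packet carrying the full header) is not lost.
    """
    rebuilt = {}
    state = 0
    for i, value in enumerate(stream):
        if i in lost:
            continue  # the link layer gave up on this packet
        state = value if i == 0 else state + value
        rebuilt[i] = state
    return rebuilt


seqs = [1000, 1512, 2024, 2536, 3048]  # made-up 512-byte segments
rebuilt = decompress(compress(seqs), lost={2})
damaged = [i for i, seq in rebuilt.items() if seq != seqs[i]]
print(damaged)  # [3, 4]: every packet after the loss is rebuilt wrongly
```

A single lost packet leaves the decompressor with stale state, so all
packets behind it in the window come out with the wrong header and would
fail the TCP checksum at the receiver.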
In conclusion we believe that reliable end-to-end flows like TCP over
wireless links, which are often the bottleneck, do require well engineered
FEC/Interleaving schemes + fully-reliable ARQ. First, because of the header
compression problem but more importantly because we believe that a spurious
timeout is a very useful signal for the TCP sender (if it can find out
about it; which it currently can't) as discussed in the following.
Spurious timeouts don't have to be as disastrous as they are. To that end we
are currently implementing a "spurious timeout detection mechanism". The
problem with spurious timeouts caused by excessive delays is that the
retransmission ambiguity problem fools the sender into believing that an
entire window got lost, so it retransmits that window. On top of that, these
duplicate packets will generate DUPACKs which will then trigger a spurious
fast retransmit. The basic idea of our mechanism is to mark the packets to allow
the sender to discriminate between an original ACK and an ACK for a
retransmission (e.g. using timestamps or better the 2 bits discussed
above). This will not prevent the first spurious retransmission, however,
once the sender gets the _original_ ACK, i.e. the "spurious timeout"
signal, it will know that it did the wrong thing. The sender can then use
that "signal" and the timing derived from that late ACK to A. restore the
congestion window to the value it had before the timeout occurred and B. to
update the RTO given the new measurement in order to hopefully prevent
further spurious retransmissions. BTW, the same approach could also be used
to detect spurious FastRetransmits after packet re-orderings > 3 packets.
The latter is not new and has e.g. also been proposed by Sally Floyd in a
private discussion.
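
A minimal sketch of the detection idea described above, for a single
outstanding packet. It is not our implementation: the Sender class, the
2-bit tx_mark field, the slack value and the rule used to update the RTO
are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Sender:
    cwnd: int            # congestion window, in segments
    ssthresh: int
    rto: float           # retransmission timeout, in seconds
    slack: float = 0.5   # hypothetical margin added to a late RTT sample
    tx_mark: int = 1     # 2-bit mark: 1 = original, 2 and 3 = retransmissions
    sent_at: float = 0.0
    saved: tuple = ()    # (cwnd, ssthresh, rto) from just before the timeout

    def send(self, now: float) -> None:
        self.tx_mark, self.sent_at, self.saved = 1, now, ()

    def on_timeout(self) -> int:
        """Usual reaction to a timeout, but remember the old state first."""
        if not self.saved:
            self.saved = (self.cwnd, self.ssthresh, self.rto)
        self.ssthresh = max(self.cwnd // 2, 2)
        self.cwnd = 1
        self.rto *= 2
        self.tx_mark = min(self.tx_mark + 1, 3)
        return self.tx_mark  # mark carried by the retransmission

    def on_ack(self, echoed_mark: int, now: float) -> bool:
        """Return True if this ACK shows that the timeout was spurious."""
        spurious = self.tx_mark > 1 and echoed_mark == 1
        if spurious:
            cwnd, ssthresh, old_rto = self.saved
            self.cwnd, self.ssthresh = cwnd, ssthresh   # A. restore the window
            late_rtt = now - self.sent_at               # timing of the late ACK
            self.rto = max(old_rto, late_rtt + self.slack)  # B. update the RTO
        return spurious


s = Sender(cwnd=16, ssthresh=32, rto=1.5)
s.send(now=0.0)
s.on_timeout()                                # timer fires, cwnd drops to 1
print(s.on_ack(echoed_mark=1, now=2.0), s.cwnd, s.rto)
```

An ACK that echoes mark 1 after a retransmission has gone out can only
belong to the original transmission, which is what resolves the ambiguity.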
A few more comments on your mail ...
> I designed the radio link protocol for IS-95 CDMA packet data with
> this issue specifically in mind. Some form of link-level ARQ is
> essential in that system, because the raw physical layer frame erasure
> rate (1-2% for a ~30 byte frame) is too high for end-to-end TCP
> retransmission alone to give acceptable performance. But "too much"
> link-level ARQ would cause the problem you describe.
Not necessarily. It would be interesting to see measurements with the
IS95-RLP when you do more than just 2 retransmission attempts.
> My approach was to limit the number of frame-level retransmission
> cycles to two, and to send two duplicate copies on the second cycle to
> increase the chances of at least one getting through. This design was
> somewhat ad-hoc, but they seemed to work well in field tests. Even a
> single retransmission made such a dramatic improvement in TCP-level
> throughput that it didn't matter much what we did beyond that. We
> simply made the link "good enough" to carry TCP with reasonable
> throughput. The link didn't have to be, nor should it have been,
> perfect.
I don't know exactly how the IS95-RLP really works but I think it would be
better to give up on the basis of some threshold on the retransmission
delay introduced per packet.
Something I always wanted to know: Does the IS95-RLP toss the entire IP
packet or only that fragment of it to leave the job of discarding to e.g. a
PPP receiver?
> You can make your link layer loss rate as low as you want *if* you
> don't care about latency. You just FEC code and interleave over a
> sufficiently long time span. But then you *always* have to wait for
> your data. With ARQ, you wait only when you have an retransmission.
Good point. The GSM designers certainly overdid it when they designed the
interleaving scheme for GSM CSD introducing a one-way (!) latency of
roughly 100ms (!). Fortunately, this has been corrected for the upcoming
GSM packet data service.
> [...] And the ARQ should give up after an
> interval comparable to the TCP round trip time -- not that I know how
> to do this without some extra inter-layer communication.
We thought of the same thing, but as you say yourself: at the link layer you
have no way to determine the path's RTT. The only worst-case approach you
can take is to assume the wireless link's RTT plus maybe the queueing delay
of the flow's packets in your RLP send buffer.
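
To make the "give up on delay, not on a retransmission count" idea
concrete, here is a rough sketch. The function arq_send, the assumption
that each failed attempt costs about one link RTT, and all numbers are
hypothetical; the budget stands for the link RTT plus queueing delay
estimate mentioned above.

```python
def arq_send(outcomes, link_rtt, delay_budget):
    """Try to deliver one frame; stop once the added delay exceeds the budget.

    outcomes: iterable of booleans, True meaning that attempt got through.
    Returns (delivered, attempts, added_delay_in_seconds).
    """
    delay = 0.0
    attempts = 0
    for ok in outcomes:
        attempts += 1
        if ok:
            return True, attempts, delay
        delay += link_rtt          # a failed attempt costs about one link RTT
        if delay > delay_budget:
            return False, attempts, delay
    return False, attempts, delay


# Same loss pattern, two different budgets (made-up values).
print(arq_send([False, False, True], link_rtt=0.25, delay_budget=1.0))
print(arq_send([False, False, True], link_rtt=0.25, delay_budget=0.4))
```

With the larger budget the frame gets through on the third attempt; with
the smaller one the link layer gives up after the second and leaves the
recovery to the end hosts.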
> The IP TTL field is theoretically calibrated in seconds. In theory,
> the sending TCP could set the IP TTL field based on its current
> retransmission timeout so an undelivered packet delayed in the network
> by an unusual number of link level retransmissions (or for any other
> reason) could be dropped when TCP would retransmit anyway. The coarse
> quantization of the TTL field (1 sec increments) and the minimum
> per-hop decrement of 1 makes this impractical in today's Internet.
>
> Now that GPS clocks are common, we could define an IP header option
> with a precise expiration time. Can anybody think of a real link for
> which this wouldn't be overkill?
I really think that this is a great idea but not necessarily for TCP.
However, for semi-reliable end-to-end flows (e.g. a stock quote
broadcasting application that periodically refreshes obsolete information)
this would be very helpful and this is where I see that semi-reliable ARQ
is useful. An advantage would be that it would also work with IPsec. Other
header compression techniques would be required, though.
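
As a rough illustration of how a link layer could use such an expiration
time for a semi-reliable flow: the Packet type, the expires_at field and
drop_expired are invented for this sketch and do not correspond to any
existing IP option.

```python
from dataclasses import dataclass


@dataclass
class Packet:
    payload: str
    expires_at: float  # hypothetical absolute expiration time, in seconds


def drop_expired(queue, now):
    """Split a send queue into packets still worth (re)transmitting and stale ones."""
    live = [p for p in queue if p.expires_at > now]
    expired = [p for p in queue if p.expires_at <= now]
    return live, expired


queue = [Packet("ACME 101.5", expires_at=5.0),
         Packet("ACME 102.0", expires_at=15.0)]
live, expired = drop_expired(queue, now=10.0)
print([p.payload for p in live])  # only the quote that is still fresh
```

The stale quote is dropped instead of being retransmitted, since a newer
refresh is already on its way.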
///Reiner