I want to take this up again because I believe it is an important issue
for this group and, at the same time, one of the very few things on which
I don't agree with you, Phil.
I will use the abbreviation FEC where 'C' stands for 'Control' not
'Correction' to refer to all other transmission schemes (power control,
FECorrection, Interleaving, frame length control, spreading factor control,
etc.) but _not_ ARQ.
Assuming for this discussion that the IP header tells the link layer that
it is dealing with a reliable flow (e.g. TCP, reliable multicast, or a flow
that builds reliability on top of UDP, as Steve Deering mentioned) ...
At 07:10 PM 1/27/99 , Phil Karn wrote:
>> The link layer retransmissions should be continued to the maximum possible
>>amount of time in order to get the maximum benefit out of it, making sure
>
>My point is that on any reasonable channel, the maximum benefit from
>link-level ARQ comes very quickly with the very first retransmission
>or two. It just doesn't have to be carried out very far, because under
>normal circumstances you don't have to retransmit very often. If you
>do, something is probably broken. Just blindly hammering away
>indefinitely is probably not going to help.
I think we have to differentiate a bit among the things broadly referred
to as "intermittent connectivity". I see at least 3 cases:
1. complete link outage that exceeds the order of the path's RTT
2. transient link outage that does not exceed the order of the path's RTT
3. sudden bandwidth drop (this is actually not "intermittent connectivity"
but gets close)
I actually believe that case 2 is just the "worst case" of case 3.
I think/hope it is easy to agree that case 1 is not a networking problem
but rather one that requires more or better-tuned base stations. After
all, you can't expect TCP to work if you pull the Ethernet cable either.
As a side note: a fully-reliable link-layer (LL) ARQ sender will certainly
have means to detect these situations and react accordingly, e.g. by
considering the link broken after some timeout, as is done in GPRS.
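To make the side note concrete, here is a minimal sketch of a "fully
reliable, but give up on a dead link" sender loop. It is not how GPRS or
any real link layer implements this; the function name, the `transmit`
callback and both timing parameters are assumptions made up for
illustration, and time is simulated rather than measured.

```python
def send_fully_reliable(frame, transmit, link_timeout_s, retry_interval_s):
    """Retransmit `frame` until it is acknowledged or the link is declared broken.

    `transmit(frame)` is a hypothetical callback that returns True when the
    frame was acknowledged. Time is simulated: each failed attempt costs
    `retry_interval_s`. Returns (acked, elapsed_s).
    """
    elapsed = 0.0
    while elapsed < link_timeout_s:
        if transmit(frame):
            return True, elapsed
        # No acknowledgement: wait one retry interval and try again.
        elapsed += retry_interval_s
    # Nothing got through for the whole timeout: treat this as case 1
    # (link outage) and report the link as broken.
    return False, elapsed
```

The point of the sketch is that "fully reliable" here means there is no
per-frame retry limit, only a link-level timeout that is long compared to
the outages in cases 2 and 3.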
Case 3 happens with ARQ (both semi- and fully-reliable) if the user moves
into an environment where the radio characteristics are considerably
worse than before and the fraction of frames the LL ARQ sender has to
retransmit suddenly increases considerably. Note that even when the
channel gets bad, most frames (> 90 percent) will usually still make it
across the link, unless FEC is poorly designed or non-existent (e.g.
802.11 WLAN) or this is not a bad channel but a link outage. Just not as
many as before, which leads to decreased throughput.
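As a back-of-the-envelope illustration of that throughput drop, the
sketch below assumes an idealized selective-repeat ARQ with independent
frame errors and free, error-free acknowledgements. Real radio channels
have bursty errors, so treat the numbers as a rough guide only; the
function names are mine.

```python
def expected_transmissions(frame_error_rate):
    """Mean number of transmissions per frame until it gets across.

    Assumes independent frame errors with probability `frame_error_rate`,
    so the number of attempts is geometrically distributed.
    """
    if not 0.0 <= frame_error_rate < 1.0:
        raise ValueError("frame_error_rate must be in [0, 1)")
    return 1.0 / (1.0 - frame_error_rate)


def goodput_fraction(frame_error_rate):
    """Fraction of the raw link rate that is left for new data."""
    return 1.0 / expected_transmissions(frame_error_rate)


if __name__ == "__main__":
    for p in (0.01, 0.05, 0.10):
        print(p, expected_transmissions(p), goodput_fraction(p))
```

Under these assumptions a jump in the frame error rate from 1 percent to
10 percent costs roughly a tenth of the link rate: a bandwidth drop, not
an outage.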
I fully agree with you that it is the link designer's goal to engineer the
adaptive FEC techniques in such a way that these sudden bandwidth drops
(case 3) are not all that sudden but smoothed out "sufficiently". The
tricky part is to define "sufficiently". As a rule of thumb, I would say
that the delay introduced by LL ARQ per IP packet should, in 95 percent
of all cases, not exceed the wireless link's RTT (maybe also including the
queueing delay of the packets queued per flow). However, I have not yet
thought about that rule long enough.
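Stated as a check over measured per-packet delays, the rule of thumb
could look like the sketch below. The function name and the way the
delays are collected are assumptions; whether the queueing delay is
included depends on what is fed in.

```python
def meets_arq_delay_rule(arq_delays_s, link_rtt_s, required_fraction=0.95):
    """Check the rule of thumb on a sample of per-packet LL ARQ delays.

    `arq_delays_s` holds the extra delay (in seconds) that LL ARQ added to
    each IP packet; `link_rtt_s` is the wireless link's RTT. Returns True
    if at least `required_fraction` of the packets stayed within one RTT.
    """
    if not arq_delays_s:
        raise ValueError("need at least one delay sample")
    within = sum(1 for d in arq_delays_s if d <= link_rtt_s)
    return within / len(arq_delays_s) >= required_fraction
```

For example, a trace in which 96 out of 100 packets saw an ARQ delay below
the link RTT would pass, while one with only 94 out of 100 would not.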
Maybe one could even go so far as to say that the experience reported in
the form of measurements in several papers indicates that the 802.11 WLAN
link is just not designed well enough to meet these guidelines. On the
other hand, I'm convinced that e.g. the GSM circuit-switched data link
"over-meets" those guidelines and that much higher throughput could be
achieved by weakening the FEC (which, by the way, is being done in GSM)
while still staying within the limits of such guidelines. Again, this is
only in the context of dealing with reliable flows like TCP.
I do not agree with you, however, that semi-reliable LL ARQ is the right
path to take (when dealing with reliable flows); I argue instead in favor
of fully-reliable LL ARQ. The reason is simply that in both case 2 and
case 3 this will cause congestion, which is exactly the signal we want to
give to the sender. If, over the duration of a transport connection,
events like case 2 or the 5 percent of case 3 situations that FEC couldn't
catch don't happen often, then we don't have to bother about them.
Otherwise we might think about changes in the transport protocols to
minimize the damage, e.g. eliminating excessive spurious retransmissions,
as discussed before.
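To put rough numbers on what the semi-reliable alternative leaves for the
transport layer to repair, the sketch below computes the probability that
an IP packet is discarded when the link gives up on a frame after a fixed
number of transmissions. It assumes independent frame errors, which real
fading channels violate, and all names and example values are mine.

```python
def residual_packet_loss(frame_error_rate, max_transmissions, frames_per_packet):
    """Probability that semi-reliable LL ARQ discards an IP packet.

    A frame is dropped if all `max_transmissions` attempts fail; the IP
    packet is lost if any of its `frames_per_packet` frames is dropped.
    Assumes independent frame errors.
    """
    frame_drop = frame_error_rate ** max_transmissions
    return 1.0 - (1.0 - frame_drop) ** frames_per_packet


if __name__ == "__main__":
    # Example values only: 10 percent frame errors, 10 frames per packet.
    for attempts in (1, 2, 3, 5):
        print(attempts, residual_packet_loss(0.10, attempts, 10))
```

With 10 percent frame errors, 3 transmissions per frame and 10 frames per
packet, this comes out at roughly 1 percent packet loss. A fully-reliable
sender corresponds to letting `max_transmissions` grow without bound: the
loss goes to zero and the cost shows up as delay instead, which is the
congestion signal argued for above.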
///Reiner