Re: BER and TCP/IP performance

From: Vijay G Bharadwaj ([email protected])
Date: Thu Apr 01 1999 - 12:27:53 EST


On Wed, 31 Mar 1999, Eric Travis wrote:

> I do have some tcpdumps of pathological runs tucked away - I'll dig for
> them. However, I'm not so sure your data contradicts me... My errors
> were intentionally whacking a segment every nth frame (set up the
> channel emulator to do this for me). For T1 rates, your numbers don't
> surprise me - they don't contradict me either :o)
>
> BER is the wrong metric to be using here. What is important to TCP is
> segment loss distribution.

Ahh... okay, this is where we are talking at cross purposes...

Yes, periodic error is the worst case, and it will give you horrible
results because you keep halving the cwnd until the connection effectively
dies. However, it is worth noting that whacking every nth frame does not
constitute a constant BER, because of TCP's congestion control algorithms.
So you can't use those results to make a statement about acceptable BER.

Before I go on, let me say that I agree with you completely: segment loss
distribution, not BER, is the metric that says the most about the effect
of errors on TCP. Having said that, designers of satellite links in the
real world must calculate link budgets in terms of BER, and segment loss
is not a parameter they can work with directly, which I think is part of
the reason this question keeps coming up.

Now for my explanation. I'm using a delay simulator which inserts random
bit errors into the raw data stream. At a constant BER, the probability of
losing a segment in a given RTT therefore decreases as you decrease your
cwnd: roughly the same number of error events occur in that time period,
and any event that lands while you are not sending data cannot affect you.
For example, the bit an error hits might just be part of the idle fill on
the channel, in which case TCP never notices it.
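
To make that concrete, here is a little back-of-the-envelope Python
sketch (mine, not output from the simulator; it assumes independent
random bit errors, a 1460-byte MSS and an arbitrary BER of 1e-7, so
don't read anything into the exact numbers):

def p_segment_loss_per_rtt(ber, cwnd_segments, mss_bytes=1460):
    """Probability that at least one of the cwnd segments sent this RTT is hit."""
    bits_sent = cwnd_segments * mss_bytes * 8
    return 1.0 - (1.0 - ber) ** bits_sent

ber = 1e-7
for cwnd in (64, 32, 16, 8):
    print(cwnd, round(p_segment_loss_per_rtt(ber, cwnd), 4))

# Halving cwnd roughly halves the number of bits exposed to errors in an
# RTT, so the per-RTT loss probability drops with it; errors that land in
# the idle fill never touch TCP at all.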

And now for a little example. There's a lot of hand-waving here and some
of the math isn't even close to precise, but I'm including it to try to
present an intuitive picture. In the example I assume a stationary error
process, independent of the data transmitted. Please don't pay too much
attention to the exact values I quote; I just don't feel like spelling
out all the conditions required to make them correct.

Say for the sake of argument that the window grows to W, the
bandwidth-delay product of the channel (don't ask how it got that large).
If my BER gives me one error per RTT on average, then typically I will
get hit with one error in this RTT and will halve my cwnd. In the next
RTT I am transmitting only half the time, so at the same BER my
probability of losing a segment in that RTT is about half; in the typical
case I now lose a segment only once every two RTTs. And so on.
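
If you want to see where that process ends up, here is a toy Monte Carlo
version of the same hand-waving (again just my own sketch: stationary,
data-independent errors, one expected error per RTT at cwnd = W, linear
growth otherwise, and fast recovery so there are no timeouts):

import random

def run(W=100, rtts=10000, seed=1):
    random.seed(seed)
    cwnd, samples = W, []
    for _ in range(rtts):
        p_loss = min(1.0, cwnd / W)    # exposure scales with cwnd; one
                                       # expected error per RTT at cwnd == W
        if random.random() < p_loss:
            cwnd = max(1, cwnd // 2)   # fast recovery: halve cwnd, no timeout
        else:
            cwnd += 1                  # linear congestion-avoidance growth
        samples.append(cwnd)
    return sum(samples) / len(samples)

print(run())
# The average cwnd settles somewhere on the order of sqrt(2*W) segments
# (roughly 14 for W = 100) -- well below W, but nowhere near one or two.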

The thing to note here is that I typically get *some* window growth
between losses, and even in this case I will not fall all the way down to
one segment per RTT as long as I avoid timeouts by doing SACK and so on.
Eventually an equilibrium is reached, but at a cwnd of more than just a
single segment or two.

So yes, your results are at odds with mine, but I don't feel they
contradict me either ;)

Another point I was trying to make was that if your errors are bursty,
then TCP (at least the newer flavors) sees only a single congestion event
per errored RTT, so the halving is less frequent.
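
Here is a quick back-of-the-envelope illustration of that point (my own
sketch; it just assumes a SACK/NewReno-style sender that treats all the
losses within one RTT as a single congestion event):

def halvings(losses_per_rtt):
    """Each RTT with at least one loss triggers exactly one cwnd halving."""
    return sum(1 for losses in losses_per_rtt if losses > 0)

spread_out = [1, 1, 1, 1, 1, 1, 1, 1]          # 8 losses, one per RTT
bursty = [8, 0, 0, 0, 0, 0, 0, 0]              # the same 8 losses in one RTT
print(halvings(spread_out), halvings(bursty))  # 8 halvings versus just 1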

For my model it's nontrivial to correlate bit errors with specific RTTs
exactly without accounting for the precise behavior of TCP and when it
puts each segment on the channel. I'm not sure whether an approximate
approach would come close, but it might be within a fudge factor of the
correct answer...

Now the big question I guess is which model is closer to what actually
happens in real life?

-Vijay


