On Thu, 1 Apr 1999, Vijay G Bharadwaj wrote:
I think we are on the same side of things.
> On Wed, 31 Mar 1999, Eric Travis wrote:
>
> > I do have some tcpdumps of pathological runs tucked away - I'll dig for
> > them. However, I'm not so sure your data contradicts me... My errors
> > were intentionally whacking a segment every nth frame (set up the
> > channel emulator to do this for me). For T1 rates, your numbers don't
> > surprise me - they don't contradict me either :o)
> >
> > BER is the wrong metric to be using here. What is important to TCP is
> > segment loss distribution.
>
> Ahh... okay, this is where we are talking at cross purposes...
>
> Yes, periodic error is the worst case, and it will give you horrible
> results because you keep halving the cwnd until it totally dies out.
> However, it is worth noting that whacking every nth frame does not
> constitute a constant BER, because of TCP's congestion control algorithms.
> So you can't use these results to make a statement about acceptable BER.
Right - I was responding to the notion that you can get decent TCP
performance as long as you don't get more than a single loss every
one or two RTTs.
An "acceptable" BER for TCP is no more quantifiable than an acceptable
path latency, there is more information required to make a decision - BER
needs a distribution and a translation into a segment loss before you can
make any critical decisions.
Without a distinction between congestion based loss and corruption
(or erasure) based loss, you really need as clean a link as possible
with TCP.
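To make that "translation" concrete, here's a quick back-of-the-envelope
sketch in Python. It assumes independent, uniformly distributed bit
errors over an assumed 1500-byte segment - which real (bursty, coded)
links won't honor - but it shows how a raw BER turns into a per-segment
loss probability:

    def segment_loss_prob(ber, segment_bytes=1500):
        """Probability that at least one bit in the segment is corrupted."""
        bits = segment_bytes * 8
        return 1.0 - (1.0 - ber) ** bits

    if __name__ == "__main__":
        for ber in (1e-8, 1e-7, 1e-6, 1e-5):
            print("BER %.0e -> P(segment loss) = %.4f"
                  % (ber, segment_loss_prob(ber)))
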
It is a testament to TCP's effectiveness that you can get ~50%
utilization over a GEO-hopped T1 with measured BER (over some period)
of 1E-6. Then again, depending on what that T1 is costing *me* to lease,
this still might be unacceptable utilization. Bandwidth is far from free
:o)
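For a rough feel of what a given loss rate costs on that kind of path,
the Mathis et al. sqrt(p) approximation (rate ~= MSS * C / (RTT * sqrt(p)))
can be plugged in. The figures below are my own illustrative assumptions
(1460-byte MSS, ~550 ms GEO round trip, C = 1.22), not measurements, and
the model ignores timeouts, SACK recovery, and slow start:

    from math import sqrt

    # Mathis et al. steady-state estimate: rate ~= MSS * C / (RTT * sqrt(p)).
    # All figures below are assumptions for illustration, not measurements.
    MSS_BITS = 1460 * 8     # assumed MSS, in bits
    RTT = 0.55              # assumed GEO-hop round-trip time, seconds
    T1_RATE = 1.536e6       # T1 payload rate, bits/s
    C = 1.22                # constant for periodic loss, no delayed ACKs

    def mathis_rate(p):
        """Rough steady-state TCP throughput (bits/s) at segment loss prob p."""
        return MSS_BITS * C / (RTT * sqrt(p))

    if __name__ == "__main__":
        for p in (1e-4, 1e-3, 1e-2):
            rate = mathis_rate(p)
            util = min(rate / T1_RATE, 1.0)
            print("segment loss %.0e -> ~%3.0f kbit/s (~%.0f%% of a T1)"
                  % (p, rate / 1e3, util * 100))
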
> Before I go on, let me state that I agree with you completely when you
> state that segment loss distribution, not BER, is the metric that says the
> most about the effect of error on TCP. Having said that, designers of
> satellite links in the real world must calculate link budgets based on
> BER, and segment loss is not a parameter they can use. Which I think is
> part of the reason this question keeps coming up.
Right - this is one place (amongst many) where I think the paper Phil Karn
proposed during the PILC BOFs can really help. These questions need to
consider the entire system, not just one element.
> Now for my explanation. I'm using a delay simulator which inserts random
> bit errors in the raw data stream. So at a constant BER, your probability
> of segment loss in a given RTT decreases when you decrease your cwnd. This
> is because a certain number of error events are going to happen in that
> time period, and if it happens that you were not sending data when one of
> these events happened, then it can't affect you. So for example the bit
> that the error hits might just be part of your idle fill on the channel,
> in which case TCP is unaffected by it.
>
> And now for a little example. There's a lot of hand-waving in this example
> and some of the math isn't even close to precise, but I'm including it to
> try and present an intuitive picture. In the example I assume a stationary
> error process, independent of the data transmitted. Please don't pay too
> much attention to the exact values I quote; I just don't feel like
> including all the conditions required to make it correct.
>
> Say for the sake of argument that the window grows up to W, which is the
> bandwidth-delay product of the channel (don't ask how it got that large).
> Now if I have a BER that gives me one error per RTT on average, then
> typically I will get hit with one error in this RTT, and will halve my
> cwnd. Now in the next RTT, I am only transmitting half the time, so at the
> same BER my probability of losing a segment in this RTT is about half. So
> now in the typical case I will only lose a segment once in two RTTs. And
> so on.
Right, but I still think your losses (measured in RTTs) need to be
less frequent than that:
For every RTT you lose a segment, you cut your cwnd in half;
For every RTT you don't lose a segment you can increase your cwnd by
(at most) one segment. [*]
The decrease is exponential, the increase linear.
If you are halving your cwnd every other RTT, you'll never realize
any sustained growth.
[*] I've omitted slow-start behavior because that will end once
you hit your first loss event.
A contrived example:
At epoch 0, your cwnd is W
A loss event occurs, you cut cwnd in half
At epoch 1, your cwnd is now W/2
You grow it (assuming no delayed Acks) one mss
At epoch 2, your cwnd is now (W/2 + 1)
A loss event occurs, you cut cwnd in half
At epoch 3, your cwnd is now (W/4 + 1/2)
You grow it..
etc.
So, after something close to 2*log2(W) RTTs of this cycle,
you've bottomed out. My math might be off, but the trend is
clear.
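If it helps, here's a toy sketch of that cycle (my own simplification -
no delayed ACKs, no slow start, no timeouts), just counting RTT epochs
until cwnd bottoms out:

    def epochs_to_bottom(start_cwnd, floor=2.0):
        """Count RTT epochs until cwnd bottoms out, alternating loss/growth."""
        cwnd = float(start_cwnd)
        epochs = 0
        while cwnd > floor:
            cwnd /= 2.0       # loss epoch: multiplicative decrease
            epochs += 1
            if cwnd <= floor:
                break
            cwnd += 1.0       # loss-free epoch: additive increase of one segment
            epochs += 1
        return epochs

    if __name__ == "__main__":
        for w in (32, 64, 128, 256):
            print("W = %3d segments -> bottoms out after ~%d RTTs"
                  % (w, epochs_to_bottom(w)))
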
> The thing to note here is that typically I will get *some* window growth,
> and even in this case I will not fall down to one segment per RTT if I
> avoid timeouts by doing SACK and so on. And eventually an equilibrium will
> be reached at some point, but it will be at a cwnd of more than just a
> single segment or two.
In my contrived example above, I get window growth 50% of the time.
Keeping those gains is another story. :o)
It would get even more interesting if you introduced a real congestion
bottleneck into the path, and then some contending flows with shorter
paths. You'd definitely see different performance/behavior. Real-world
analyses must account for the effects of congestion - even on
point-to-point paths (multiple flows sharing the link).
> So yes, your results are at odds with mine, but I don't feel they
> contradict me either ;)
On the contrary, I thought your results were perfectly reasonable.
No contradictions either way.
> Another point I was trying to make was that if your errors were bursty
> then TCP (at least the newer flavors) sees only a single congestion event
> per errored RTT and so the halving is less frequent.
Absolutely, bursty errors are *good* here.
> For my model it's nontrivial to correlate bit errors to specific RTTs
> exactly, without accounting for the exact behavior of TCP and when it puts
> each segment on the channel. I'm not sure if an approximate approach would
> come close, but it might be within a fudge factor of the correct answer...
Is there any way to monitor the path (either/both sides of the link
simulator) using TCPdump and then use that raw data to find segment
losses? These could then be correlated to specific RTT epochs...
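One crude way to do it, sketched below under some big assumptions: take
tcpdump traces on both sides of the link simulator, pull out the data
segments by sequence range, and flag any range that shows up more often
on the sending side than on the far side (counting occurrences also
catches a lost original that was later retransmitted successfully). The
regex assumes ASCII tcpdump output for a single flow with ranges printed
like "1:1449(1448)" or "seq 1:1449," - it varies by version - and the
bucketing of losses into RTT epochs by timestamp is left out:

    import re
    import sys
    from collections import Counter

    # Compare tcpdump traces taken on either side of the link simulator
    # and flag data segments that were sent but never made it across.
    SEQ_RANGE = re.compile(r'(?:seq )?(\d+):(\d+)[(,]')

    def range_counts(path):
        """Count how many times each (start, end) sequence range appears."""
        counts = Counter()
        with open(path) as f:
            for line in f:
                m = SEQ_RANGE.search(line)
                if m:
                    counts[(int(m.group(1)), int(m.group(2)))] += 1
        return counts

    if __name__ == "__main__":
        sent = range_counts(sys.argv[1])   # trace before the link simulator
        seen = range_counts(sys.argv[2])   # trace after the link simulator
        for rng in sorted(sent):
            dropped = sent[rng] - seen.get(rng, 0)
            if dropped > 0:
                print("seq %d:%d dropped %d time(s) on the link"
                      % (rng[0], rng[1], dropped))
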
> Now the big question I guess is which model is closer to what actually
> happens in real life?
Well, I know that my testing was pathological - I intentionally rigged
it to be so.
Raw bit errors seem to be bursty, but the effects of link coding
tend to distribute them across multiple packets (a good thing for
recovery), so what happens in real life depends on the design of
individual links. At least, in my simple-minded world view :o)
I still have the distinct feeling that I've been in this conversation
before...
Oh well.
Eric