
Re: [ns] FullTcp & "done"



Well, to make things clearer, I'd like to point out that we are dealing
with two problems here: invoking the done callback at the right time, and
reusing FullTcp agents. These two issues are closely related, but they are not
exactly the same.

Real TCP implementations don't have a done callback, so it is not even well
defined what we mean by that. Does it mean *both* endpoints are closed? Or
that the local endpoint is closed and not willing to accept any more packets?
(I'm not even considering half-closed connections yet...). If we are going to
use done for reusing agents, it makes sense to invoke done when both endpoints
are closed. Unfortunately, this is troublesome, since we would have to know
that both agents are in TCPS_CLOSED before invoking the done callback (in
fact, this is a bad approximation of any real implementation). This is not
what we are doing now: done is invoked whenever an agent considers its own
connection closed, regardless of the other endpoint (although, by the
implemented TCP state machine, the other endpoint is also closed or closing
the connection). The latter definition is enough for webtraf, since we can
attach the cleanup done callback to the client endpoint (close is invoked by
the server endpoint, so the client will be the passive closing endpoint, and
its done callback will be invoked iff neither server nor client is going to
send any more packets).

I made some significant progress implementing the done callback. First of all,
I discovered a bug in the FullTcp code. tcp-full.h defines a closed_ instance
variable in the FullTcpAgent class, and this is wrong (*). closed_ is already
in TcpAgent, so it shouldn't be redefined. The redefinition hid the base-class
member, so TcpAgent::reset() reset only its own closed_ and never assigned 0
to FullTcpAgent::closed_. After removing closed_ from tcp-full.h, we can call
finish() instead of evalf in tcp-full.cc:

 // haoboy: Is here the place for done{} of active close?
 // It cannot be put in the switch above because we might need to do
 // send_much() (an ACK)
 // Felix (using Tarik suggestion)
 if (state_ == TCPS_CLOSED) {
  finish();
 }
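
The closed_ visibility problem described above can be reproduced in
isolation. This is only a minimal sketch: BaseAgent, ShadowingAgent,
FixedAgent and closed_after_reset() are hypothetical stand-ins for
illustration, not the ns classes or their real methods.

```cpp
#include <cassert>

// Hypothetical stand-in for the base agent: owns closed_ and resets it.
struct BaseAgent {
    int closed_ = 0;
    virtual ~BaseAgent() = default;
    // Only touches BaseAgent::closed_; it cannot see a member of the
    // same name declared in a derived class.
    virtual void reset() { closed_ = 0; }
};

// Mimics the bug: closed_ is declared again, hiding BaseAgent::closed_.
struct ShadowingAgent : BaseAgent {
    int closed_ = 0;
    void close() { closed_ = 1; }  // sets ShadowingAgent::closed_
    bool is_closed() const { return closed_ != 0; }
};

// Mimics the fix: no redeclaration, so every method shares one closed_.
struct FixedAgent : BaseAgent {
    void close() { closed_ = 1; }  // sets BaseAgent::closed_
    bool is_closed() const { return closed_ != 0; }
};

// Returns true if the agent still reports "closed" after a reset,
// i.e. if the reset failed to clear the flag the agent actually reads.
template <class Agent>
bool closed_after_reset() {
    Agent a;
    a.close();
    a.reset();
    return a.is_closed();
}
```

With these stand-ins, closed_after_reset<ShadowingAgent>() reports that the
flag survives the reset, while closed_after_reset<FixedAgent>() does not.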

We don't need to call cancel_timers() in TCPS_LAST_ACK, since that's now done
by finish(). I tried this and it works well. I have attached an example Tcl
script. However, I found that this is not enough when the FIN from the passive
closing endpoint is lost. In this case, the done callback is never invoked.
The passive closing endpoint keeps retransmitting FIN, with no answer from the
active closing endpoint. You can test this using my script, if you enable the
errormodule (it is commented out). The problem here is that FullTcpAgent's TCP
implementation in NS lacks one state (TIME_WAIT). In real implementations,
this state is in charge of waiting 2MSL before allowing any socket to reuse
the port. It is also responsible for acknowledging FIN retransmissions (see
RFC 793). Since we don't have this state, our FullTcp agent goes directly to
TCPS_CLOSED. All retransmitted FINs match the first case in recv():

 if (state_ == TCPS_CLOSED)
  goto drop;

This statement is also in the BSD code, so it's correct, and we have to keep
it here. In BSD's tcp_input(), two cases come before this one; they send a
RST packet for unknown or closing connections (using the PCB and TCPCB). We
don't have these cases in FullTcpAgent (nor do we have a PCB or TCPCB). We
don't even have RST packets!

We could hack the code and add a case just before that if statement:

 // Felix: bug fix
 // acknowledge FIN from passive closer even in TCPS_CLOSED state
 // (since we lack TIME_WAIT state and RST packets,
 // the loss of the FIN packet from the passive closer
 // will make that endpoint retransmit the FIN forever)
 if ( (state_ == TCPS_CLOSED) && (tiflags & TH_FIN) )
  goto dropafterack;

This code solves the FIN problem. My script works fine now. I also tried
test-all-full and all tests succeeded. However, this is only intended as a
temporary solution. I feel we should fix FullTcp and add a TIME_WAIT state. I
guess KF didn't include this state thinking that we can do better in a
simulator than in real life, but omitting it makes things more difficult to
understand and creates new situations, increasing the chance of bugs.
Additionally,
adding TIME_WAIT will probably solve the problem with on-the-wire packets,
since this state will force the agent to wait 2MSL before moving to
TCPS_CLOSED.
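
To make the proposal concrete, here is a rough sketch of the behaviour a
TIME_WAIT state would give the active closer, following RFC 793: a FIN
received in TIME_WAIT is acknowledged again and the 2MSL timer is restarted,
and only when that timer expires does the endpoint become closed. This is not
FullTcp code. ActiveCloser, on_fin(), expire() and the MSL value passed in are
all hypothetical names and parameters chosen for illustration.

```cpp
#include <cassert>

// Hypothetical miniature of the closing states on the active closer.
enum class State { FinWait2, TimeWait, Closed };
enum class Action { Drop, AckFin };

struct ActiveCloser {
    State state = State::FinWait2;
    double timewait_expiry = 0.0;  // simulated time at which 2MSL ends
    double msl;                    // assumed maximum segment lifetime (s)

    explicit ActiveCloser(double msl_seconds) : msl(msl_seconds) {}

    // Move TIME_WAIT -> CLOSED once 2MSL has elapsed. A done callback
    // would naturally be invoked at this transition.
    void expire(double now) {
        if (state == State::TimeWait && now >= timewait_expiry)
            state = State::Closed;
    }

    // Handle a FIN arriving at simulated time `now`.
    Action on_fin(double now) {
        expire(now);
        switch (state) {
        case State::FinWait2:
            // First FIN from the passive closer: ACK it, enter TIME_WAIT.
            state = State::TimeWait;
            timewait_expiry = now + 2.0 * msl;
            return Action::AckFin;
        case State::TimeWait:
            // Retransmitted FIN: ACK again and restart the 2MSL timer.
            timewait_expiry = now + 2.0 * msl;
            return Action::AckFin;
        case State::Closed:
            // Same outcome as the "goto drop" case in recv().
            return Action::Drop;
        }
        return Action::Drop;
    }
};
```

In this sketch a retransmitted FIN is still answered for as long as the
endpoint is in TIME_WAIT, so the passive closer is not left retransmitting
forever, and the plain drop in the closed state can stay as it is.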

Anyway, adding a TIME_WAIT state is a "major" modification to FullTcp, so I
would appreciate your feedback.

Cheers,

-- Felix Hernandez

(*) closed_ was added in revision 1.17 of the .h file and 1.32 of the .cc
file, actually to support a done callback through finish().

test_submit.tcl