
Solution to my multicast woes



After a couple of days stumbling around inside gdb and tcl-debug
barking up numerous wrong trees (and not seeing the wood for the
trees), I've discovered the root of my multicast problems in my
own simulations.

I'm using either ns 2.1b3 or a very recent snapshot downloaded
and built 11 September 1998; results are the same for both.

The root of all my problems is (drum roll!): 
Node expandaddr
(or, if you prefer:
$ns set-address-format expanded
- it makes no difference.)

With the large address space enabled, for any size of simulation,
errors of the form: 

 _o17: no target for slot 128

where $ns gen-map tells me _o17 exists inside the node sourcing the
multicast as:

Node _o16(id 0)
               dmux__o1235(Classifier/Addr)
               multiclassifier__o19(Classifier/Multicast/Replicator)
               switch__o18(Classifier/Addr)
               ifaces__o96 _o120 _o243 _o776(DuplexNetInterface DuplexNetInterface DuplexNetInterface DuplexNetInterface)
               mcastproto__o20(McastProtoArbiter)
               classifier__o17(Classifier/Addr)
               address_0()

appear to be a given, unavoidable certainty for me and my simulation
scripts.

This is irrespective of the type of multicast chosen or the size of my
simulated network. Remove the expandaddr line, and (provided the number
of nodes created does not exceed 128, natch) things work exactly as
they're supposed to, the view from nam is just beautiful, my chances of
ever getting a PhD increase exponentially, etc. 

Still, ouch; most of my planned simulations will exceed 128 nodes in
size, so I'm very interested in helping get this significant bug 
fixed. 

I noticed that the multicast test suite used in:
ns/tcl/test/test-all-mcast 
doesn't set a large address space. Out of interest, I enabled the 
large address space by adding 'Node expandaddr' in:
ns/tcl/test/test-suite-mcast.tcl
causing a set of tests that previously passed flawlessly to fail
completely for me, for exactly the same reason as above.

Surprisingly, test-all-webcache is the *only* test suite that sets
Node expandaddr (or the alternative new syntax), and that webcache
suite tests out fine, as do my simulations on large topologies with
tcp variants etc; this is definitely a multicast/expanded address
space interaction.
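
For anyone who wants to check for themselves which suites turn on the
expanded address space, a grep over the test directory is enough. This
is only a rough sketch: the function name list_expanded_suites is my
own invention, and the default path assumes you run it from the
directory above your ns tree.

```shell
#!/bin/sh
# List the Tcl scripts in a directory that enable the expanded address
# space, by either the old or the new syntax.
# list_expanded_suites is a hypothetical helper name, not part of ns.
list_expanded_suites() {
    # Default directory is an assumption; pass your own tcl/test path.
    dir="${1:-ns/tcl/test}"
    # -l prints only the names of files with at least one match.
    grep -l -E 'expandaddr|set-address-format[[:space:]]+expanded' \
        "$dir"/*.tcl 2>/dev/null
}
```

Call it as 'list_expanded_suites ns/tcl/test'; any file it names
contains one of the two forms somewhere, so check by eye that the
match isn't just in a comment.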

Shouldn't test suites run using both types of address space for
completeness?  Is this a problem known to the developers, or is there
someone out there successfully doing multicast simulations with large
address space enabled? 

If this problem is specific to Solaris/my installation (I can't test
on anything else - I'm using tcl/tk 8.0p2, tclcl-1.0b6, otcl-1.0a3,
perl 5.003, tcl-debug-1.7 for tcl8 on Solaris 2.5.1 and 2.4), I'll
need considerable help in tracking down/fixing the real cause. 

(Even with this diagnosed and the offending line removed from my
 scripts, I'm still getting spurious object identifiers, rather
 than the number of the source node, displayed by the McastMonitor
 trace-tree, which always shows the unhelpful:
0.20000000000000001 0 0 0 0 _o1359 0x8001
 where only the time varies for each multicast.
 I'm also still getting the previously reported prune errors with
 trace-topo. These seem to be one or more unrelated problems.)

Help on this would be appreciated.


And now a quick intro to all this for any other beginning ns users;
see:
http://www-mash.cs.berkeley.edu/ns/ns-debugging.html
for further info that's not in the Notes and Documentation
($ns gen-map is dead useful, but isn't mentioned in there), build with
--enable-debug --tcl-debug=<dir>
and remember:
> gdb <full pathname of ns executable>
(gdb) run <your tcl script name and any parameters here>
 
Incidentally, I've no idea about using dmalloc; the ns configure
script has consistently denied that dmalloc exists in the place where
I've told it I built it, which has me stumped. 

Cheers,
  
L.

Oh, I've also realised that since VINT is also DARPA funded, Y2K
guarantees of ns become unnecessary from an official DARPA viewpoint,
yes?  Just don't let them see the 'no warranty' warning on gdb... 

<http://www.ee.surrey.ac.uk/Personal/L.Wood/>PGP<[email protected]>