/*
 * Copyright (c) 2006 University of Southern California/ISI.
 * All rights reserved.
 *
 * Redistribution and use in source and binary forms, with or without
 * modification, are permitted provided that the following conditions
 * are met:
 * 1. Redistributions of source code must retain the above copyright
 *    notice, this list of conditions and the following disclaimer.
 * 2. Redistributions in binary form must reproduce the above copyright
 *    notice, this list of conditions and the following disclaimer in the
 *    documentation and/or other materials provided with the distribution.
 * 3. Neither the name of the University nor the names of its contributors
 *    may be used to endorse or promote products derived from this software
 *    without specific prior written permission.
 *
 * THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND
 * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
 * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
 * ARE DISCLAIMED.  IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE
 * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
 * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
 * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
 * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
 * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
 * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
 * SUCH DAMAGE.
 *
 * This XCP code is based on the system defined in "Decoupling Congestion
 * Control from the Bandwidth Allocation Policy and its Application to High
 * Bandwidth-Delay Product Networks," Dina Katabi's MIT Ph.D. thesis,
 * published March 2003.  Furthermore, much of the implementation is
 * directly based on the simulations supporting that thesis.
 *
 * This material is based upon work supported by the National Science
 * Foundation under Grant No. ANI-0334186 and a subcontract to The Regents
 * of the University of California, San Diego under Purchase Order
 * No. 10217675 supported by the National Science Foundation under Grant
 * No. ANI-0225642.
 *
 * Any opinions, findings and conclusions or recommendations expressed in
 * this material are those of the author(s) and do not necessarily reflect
 * the views of the National Science Foundation (NSF), USC/Information
 * Sciences Institute and The Regents of the University of California,
 * San Diego.
 *
 */

Table of Contents

1. Introduction
2. Installation
   2.1 Getting the XCP source code
   2.2 Building and Installing the kernel
       2.2.1 Steps for building the kernel
       2.2.2 Recommended kernel configuration parameters
   2.3 Configuring XCP
       2.3.1 Sysctls and recommended values
       2.3.2 Configuration files
   2.4 Verifying the installation
3. XCP Utilities
   3.1 Tools description
   3.2 Building the tools
4. Testing the implementation
   4.1 Analysis tools
   4.2 Setting up the testbed
   4.3 Verifying the testbed
       4.3.1 Checking for connectivity and routes
       4.3.2 TCP benchmarking of the testbed
   4.4 Setting up XCP queue
   4.5 Setting up delay inducers
   4.6 Capturing traffic
       4.6.1 Capturing traffic using 'tcpdump'
       4.6.2 Capturing traffic using 'loggerd' and 'decipher'
5. Sample Test case and results
6. Contact details

1. Introduction
---------------

This is the README for the XCP implementation in FreeBSD-6.0.  Please read
the 'Release Notes' for the changes in the current implementation.  This
README explains the steps from installation to testing of this
implementation.
It is recommended that the user read through the complete installation
instructions, given below, before beginning the installation.  While the
XCP code release has been found to be stable and interesting in the lab,
there are certainly some untested code paths and undetected bugs.  (Note:
The features that are known to be untested have been noted as such.  Also,
there are a few known bugs; details are given in the file Bugs_XCP.)
Please be advised this is a beta release.  If you find a bug, please tell
us how to replicate it so we can fix it (contact details are provided at
the end).  Even better would be a patch and a description of the problem.

2. Installation
---------------

For installation of the XCP kernel it is assumed that there is a working
FreeBSD-6.0 installation on the target system.  See
http://www.freebsd.org/releases/6.0R/announce.html for details of
installing or upgrading FreeBSD on your systems.

2.1 Getting the XCP source code
-------------------------------

The source code and all the required utilities are available at
http://www.isi.edu/isi-xcp/sw/xcp_FreeBSD-6.0.tar.gz .  The tar file has
the following directory structure:

* xcp_FreeBSD-6.0/contrib             - Has the "pfctl" and "tcpdump" code.
* xcp_FreeBSD-6.0/confs_XCP           - Has the sample configuration files.
* xcp_FreeBSD-6.0/sbin                - Has the "ip firewall" & "XCP tools"
                                        code along with the Makefile for
                                        the "pfctl" tool.
* xcp_FreeBSD-6.0/sys                 - Has the kernel source.
* xcp_FreeBSD-6.0/usr.sbin            - Has the necessary Makefile to build
                                        "tcpdump".
* xcp_FreeBSD-6.0/sample-testcase_XCP - Has the queue configuration file
                                        used and the throughput plot of the
                                        sample testcase described in
                                        Section 5.
* xcp_FreeBSD-6.0/xcp_spec            - Has the latest XCP specification.

2.2 Building and Installing the kernel
--------------------------------------

The following sections explain in brief the steps needed to build the XCP
kernel.  Please refer to
http://www.freebsd.org/doc/en_US.ISO8859-1/books/handbook/kernelconfig-building.html
for detailed instructions on building the kernel.

2.2.1 Steps for building the kernel
-----------------------------------

* Unzip and untar xcp_FreeBSD-6.0.tar.gz in /usr/src.

  Note:
  -----
  It is assumed in the rest of this document that the source code is
  available under '/usr/src/xcp_FreeBSD-6.0'.  Any reference to an XCP
  related file or directory is relative to this directory.

* Change directory to sys/i386/conf.

* Use the sample configuration file 'KERNEL-XCP-NODIV' provided with the
  source; alternatively, any other configuration file containing the XCP
  parameters set as specified in the sample file can be used for building
  the kernel.

* Execute the following commands:

      config KERNEL-XCP-NODIV          (Note: using the sample file here)
      cd ../compile/KERNEL-XCP-NODIV
      make cleandepend
      make depend
      make all
      make install
      reboot                           (the system should now boot with the
                                        new kernel)

2.2.2 Recommended kernel configuration parameters
-------------------------------------------------

Please read the comments for the XCP specific parameters in the sample
configuration file 'KERNEL-XCP-NODIV'.

2.3 Configuring XCP
-------------------

XCP can be configured by setting various sysctl variables to appropriate
values.  While one of the parameters must be set using a configuration file
(ex: /boot/loader.conf), the rest can be set from the command line
interface.  However, it is strongly recommended that all the values be set
using the configuration files (ex: /etc/sysctl.conf) so as to reduce the
chance of human error.
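Independent of how the values are set, it can be convenient to check them
programmatically from a test application.  The following minimal C sketch
(illustrative only, not part of the distribution) reads one of the XCP
sysctls described in the next section through the standard sysctlbyname(3)
interface; the sysctl name is taken from Section 2.3.1 and the value is
assumed to be an integer.

    /* check_xcp_sysctl.c - read an XCP sysctl value (illustrative sketch). */
    #include <sys/types.h>
    #include <sys/sysctl.h>
    #include <stdio.h>
    #include <stdlib.h>

    int
    main(void)
    {
        int value;
        size_t len = sizeof(value);

        /* Sysctl name taken from Section 2.3.1; value assumed to be an int. */
        if (sysctlbyname("net.inet.xcp.do_xcp_feedback", &value, &len,
            NULL, 0) == -1) {
            perror("sysctlbyname");   /* e.g. kernel built without XCP */
            return (EXIT_FAILURE);
        }
        printf("net.inet.xcp.do_xcp_feedback = %d\n", value);
        return (EXIT_SUCCESS);
    }

Compile with 'cc -o check_xcp_sysctl check_xcp_sysctl.c' and run it on a
host booted with the XCP kernel.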
The following two sections provide further details.

2.3.1 Sysctls and recommended values
------------------------------------

The following are the set of sysctls which are exported by XCP.  Features
that are controlled by these sysctls but have not been tested are switched
off by default.

a) net.inet.xcp.do_xcp_feedback (default -> 1)
   When set, it enables modulation of the TCP congestion window with XCP
   feedback.  This has to be set for the hosts to honor the feedback
   (i.e. for XCP to work).

b) net.inet.xcp.xcp_delayed_csum (default -> 0)
   When set, it allows the hardware to compute the checksum.  Currently not
   all interface cards support this feature when the XCP header is present,
   e.g., Intel Gigabit cards.  Hence it is switched off by default.

c) net.inet.xcp.xcp_debug_header (default -> 1)
   When set, the debug header is included in every packet.  The contents of
   this header can be used for analysis of the protocol performance.  The
   details of the header and the methodology for using these values are
   described in Section 4.

d) net.inet.xcp.xcp_smooth_input (default -> 0)
   When set at the router, instead of using the exact number of input bytes
   seen in the last control interval, a smoothed estimate is used for
   computing the feedback parameters at the end of the control interval.
   This feature is untested.

e) net.inet.xcp.initial_ctl_interval (default -> 0x01000000 (1/16 s))
   The value of this variable is used to initialize the control interval
   timer when the XCP queue controller is initialized (i.e. when an XCP
   queue is set up).  This value is also used when the computed average RTT
   value is 0.  The numerical format used is that of 'X' (see note below).

   Note: Format of X
   -----------------
   'X' is a 32 bit fixed point real number with an imaginary binary point
   after 4 bits.  The format is

       Width of integer part    :  4 bits
       Width of fractional part : 28 bits

   Ex: A value of 0xf0000000 represents 15
       A value of 0x00000001 represents 2**-28

   A detailed discussion of the format of 'X' is available in the
   specification; a worked conversion sketch is also given after this list
   of sysctls.

f) net.inet.xcp.xcp_link_overhead (default -> 24)
   When XCP receives a frame, some bytes that are expected to be added by
   the driver are missing.  This sysctl should reflect the per-packet bytes
   that XCP does not see.  It is necessary for correct calculation of the
   current link utilization.  In the Ethernet case these fields are:

       inter-frame-gap  12
       preamble          8
       crc               4

   which totals 24 bytes.  Thus, this sysctl is preset for Ethernet.  If
   another interface type is used, a warning will be logged in
   /var/log/messages and the user may want to adjust this sysctl to match
   the interface.

g) net.inet.xcp.xcp_forall_tcp (default -> 0)
   When set, newly created tcpcb's will use XCP even when not explicitly
   requested by the user.  This is useful for testing how applications will
   behave under XCP without having to recompile the application.  The way
   it works: only newly created sockets are affected.  If a listening
   socket was created _before_ this sysctl was set, new connections on that
   listen socket will continue to use TCP (because in this case tcpcb's are
   duplicated).

h) net.inet.xcp.aggressive (default -> 0)
   When set, the end-system will be aggressive in computing the delta
   throughput.  This feature is untested.

i) net.inet.xcp.metered (default -> 0)
   When set, the end-system will compute a smoothed estimate of throughput.
   This feature is untested.

For the purpose of debugging, a logging facility called XCP_TRACE has been
made available in the XCP layer.  The following 6 sysctls can be used to
control its functionality.
These sysctls will not be visible (and therefore the functionality they
control will be unusable) if the kernel is not compiled with the XCP_TRACE
option.  To enable this option, ensure that the kernel configuration
described in Section 2.2.1 includes the line 'options XCP_TRACE'.  Also,
please note that including this option will increase the size of the
kernel.  For further discussion please refer to Section 4.1.2.

j) net.inet.xcp.trace_index (default -> 0)
   This contains the index of the latest record in the buffer.  This is an
   XCP_TRACE sysctl.

k) net.inet.xcp.trace_fwd_toggle (default -> 0)
   When set, the queue controller will log a set of values for each packet
   that is queued.  This is an XCP_TRACE sysctl.

l) net.inet.xcp.trace_teo_toggle (default -> 0)
   When set, the queue controller will log a set of values each time the
   control interval timer expires.  This is an XCP_TRACE sysctl.

m) net.inet.xcp.trace_tqo_toggle (default -> 0)
   When set, the queue controller will log a set of values each time the
   queue interval timer expires.  This is an XCP_TRACE sysctl.

n) net.inet.xcp.trace_rcv_toggle (default -> 0)
   When set, the end-system will log a set of values at the arrival of each
   'ack' packet.  This is an XCP_TRACE sysctl.

o) net.inet.xcp.trace_snd_toggle (default -> 0)
   When set, the end-system will log a set of values at the departure of
   each XCP data packet.  This is an XCP_TRACE sysctl.

p) net.inet.xcp.min_flow_id (default -> 0)
   When set, this is used as the lower bound for setting the flow_id field.
   By default it is set to 0, which is considered an invalid value for
   flow_id.  The default is set to an invalid value to help detect
   misconfiguration (not setting a valid value for this sysctl) on
   end-system(s) during a test run.  This is an XCP_TRACE_LOG sysctl.
   Please refer to Section 3.1 part vii for further details.

q) net.inet.xcp.max_flow_id (default -> 0)
   When set, this is used as the upper bound for setting the flow_id field.
   By default it is set to 0, which is considered an invalid value for
   flow_id.  The reason for setting the default to an invalid value is as
   explained above for the sysctl net.inet.xcp.min_flow_id.  In the current
   implementation a new flow_id value is assigned to each new flow,
   incrementing between the bounds set through net.inet.xcp.min_flow_id and
   net.inet.xcp.max_flow_id.  This is an XCP_TRACE_LOG sysctl.  Please
   refer to Section 3.1 part vii for further details.
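As referenced in the note under item (e), the following small C sketch
(illustrative only) shows how the 4.28 fixed-point 'X' format maps to and
from an ordinary real number.  It reproduces the examples from the note
(0xf0000000 is 15, 0x00000001 is 2**-28) and shows that the default
initial_ctl_interval of 0x01000000 corresponds to 1/16.

    /* x_format.c - convert the 4.28 fixed-point 'X' format (Section 2.3.1 e). */
    #include <stdint.h>
    #include <stdio.h>

    #define X_FRAC_BITS 28            /* 4 integer bits, 28 fractional bits */

    static double
    x_to_double(uint32_t x)
    {
        return ((double)x / (double)(1UL << X_FRAC_BITS));
    }

    static uint32_t
    double_to_x(double v)
    {
        return ((uint32_t)(v * (double)(1UL << X_FRAC_BITS)));
    }

    int
    main(void)
    {
        printf("0xf0000000 -> %g\n", x_to_double(0xf0000000));  /* 15         */
        printf("0x00000001 -> %g\n", x_to_double(0x00000001));  /* 2**-28     */
        printf("0x01000000 -> %g\n", x_to_double(0x01000000));  /* 0.0625     */
        printf("1/16       -> 0x%08x\n",
            (unsigned int)double_to_x(0.0625));                 /* 0x01000000 */
        return (0);
    }

The helper names above are hypothetical; the kernel's own conversion
routines live in the XCP sources.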
The following are the set of sysctls which are used by XCP but exported by
other modules.  For a detailed explanation of these sysctls please refer to
the on-line documentation available at http://www.freebsd.org.  With each
variable a recommended value for XCP is given; these values are for paths
with a capacity of 100 Mbps and a maximum RTT of 1 sec.  Please refer to
the file "confs_XCP/sysctl.conf" for an explanation of the method used for
computing these values.

a) net.inet.ip.intr_queue_maxlen   (recommended -> 5000)
b) net.inet.tcp.sendspace          (recommended -> 12500000)
c) net.inet.tcp.recvspace          (recommended -> 12500000)
d) kern.ipc.maxsockbuf             (recommended -> 26024000)
e) net.inet.tcp.inflight.enable    (recommended -> 0)
f) kern.ipc.nmbclusters            (recommended -> 130000; this will support
   up to 10 flows originating from a single system, when the system is
   using the above-mentioned parameter values)
g) net.inet.ip.forwarding          (should be set to 1 in all routers)
h) net.inet6.ip6.forwarding        (should be set to 1 in all routers
   supporting IPv6)

2.3.2 Configuration files
-------------------------

The following are the primary configuration files used for XCP.  Sample
files for each of them are available with the source code in the directory
'confs_XCP'.  Each file is annotated with the requisite comments for the
settings to be made.  Please refer to them for further details.

a) /boot/loader.conf : The value of the sysctl "nmbclusters" is the only
   XCP specific setting that is done using this file.  Since the kernel
   uses this value only at the time of network stack initialization, it
   must be set through this file and cannot be changed dynamically.  This
   also implies that if the value is changed then the system has to be
   rebooted for the new value to take effect.

b) /etc/rc.conf : The static IP addresses and routes are set up using this
   file.  In addition, if the "packet filter" device is built as a module
   then the module insertion and configuration should be done through this
   file.

c) /etc/sysctl.conf : This is the file where the rest of the sysctls
   mentioned in Section 2.3.1 can be initialized.

d) pf.conf : This file contains the necessary settings to configure an XCP
   queue.  The sample file contains different examples.  The filename is
   not fixed and the file can be placed anywhere; the only requirement is
   that the right path is used in rc.conf or when invoking the 'pfctl'
   utility.

Note:
-----
A detailed explanation of the purpose and options available with each of
the above files can be obtained from the respective man pages:
loader.conf(5), rc.conf(5), sysctl.conf(5) and pf.conf(5).

2.4 Verifying the installation
------------------------------

A successful installation at this stage is a working system running the XCP
kernel with the specified values for all the parameters.  The running
kernel can be verified with the "uname -a" command.  The following is a
sample output:

# uname -a
FreeBSD dfw.isi.edu 6.0-RELEASE FreeBSD 6.0-RELEASE #6: Mon May 22 20:23:25 PDT 2006     root@dfw.isi.edu:/usr/src/xcp_FreeBSD-6.0/sys/i386/compile/KERNEL-XCP-NODIV  i386

If there is a failure at any stage (Section 2.2 or 2.3) then please make
sure all the instructions have been followed.  Also, please consult the
references mentioned in the corresponding sections for further details.
Please let us know if there is a failure even after following all the
instructions (contact details are provided at the end).

3. XCP Utilities
----------------

The source code comes with a set of utilities.  They can be classified into
two groups:

a) Modified tools: These are tools that are natively available but have
   been modified/extended to support XCP.  The tools in this category are
       i)   tcpdump
       ii)  pfctl
       iii) ipfw

b) XCP specific: These are tools that have been specifically developed to
   test XCP implementations.
       i)   xstream
       ii)  xserver
       iii) rxt
       iv)  xcp_logger
       v)   loggerd
       vi)  decipher

3.1 Tools description
---------------------

i)   tcpdump : This tool is used to capture traffic on the wire for
     analysis.
     Please refer to the man page for a detailed explanation.  This version
     has been extended to recognize XCP headers and print them in the text
     output.  The XCP fields are enclosed between '/XCP/' tags for easy
     identification.  It supports all the options supported by the standard
     tcpdump.

ii)  pfctl : This tool is the user interface to configure the packet filter
     and to configure the queues set up using the altq framework.  Please
     refer to the man pages pfctl(8) and pf.conf(5) for details.  This
     version has been extended to support the new XCP queue enhancement
     added to the ALTQ framework.  When an XCP queue is set up it will
     initialize an XCP queue controller for that queue, and the capacity
     set for the queue will be used as the capacity value in the XCP
     feedback calculations.  Please refer to Section 4.4 for details.  It
     supports all the options of the standard 'pfctl' tool.  Please refer
     to Section 4.4 for details about setting up the XCP queue.

iii) ipfw : This is the interface used to set up pipes/queues in dummynet.
     It is primarily used to induce path delays while testing XCP code.
     There are no XCP specific extensions added to this tool.  It is
     distributed along with the XCP code because the 'dummynet' headers
     have been modified and 'ipfw' depends on these, REQUIRING a REBUILD
     with the new headers for working with the new kernel.

iv)  xstream : This is an XCP client that injects the maximum possible
     traffic.  It supports both XCP and TCP.  Many options are available.
     Running xstream with the -h option prints usage information.

v)   xserver : This serves as a sink for both XCP and TCP traffic.  This,
     coupled with "xstream", is used to generate the traffic required for
     testing.  Running xserver with the -h option prints usage information.

     Note:
     -----
     a) xstream and xserver together only generate one-way traffic.  This
        has been found to be sufficient for our testing.
     b) In the current implementation, if either of the sockets in a
        connection, the client's or the server's, is an XCP socket then the
        connection is automatically converted to an XCP connection.  For
        generating TCP traffic both the client and the server should be
        started with the TCP option (-t).

vi)  rxt : This tool displays the log records stored in a buffer in the XCP
     layer.  For using this tool the kernel should be built with the
     XCP_TRACE option.  Please note that this is a legacy tool; the logger
     tool provides a superset of the facilities provided by this tool.  It
     is recommended that the logger tool be used for XCP analysis.

vii) xcplogger : This is a pseudo-device which is available when the kernel
     is compiled with the option XCP_TRACE_LOG.  It is a read-only device
     and can be accessed through the device file '/dev/xcplogger'.  The
     device acts as a temporary buffer for storing the log messages
     generated by the XCP layer.  Currently 8 types of records are
     supported.  The detailed format of each of the records and their use
     can be obtained from their declarations and the associated comments in
     the header file 'dev/xcp/xcp_records.h'.

     The motivation for the development was that the currently available
     tools, including the modified 'tcpdump', do not provide enough insight
     into the working of the XCP protocol with per-packet granularity.
     This effort was started as a class project to satisfy course
     requirements, and a paper giving details about the motivation,
     architecture and working was presented in the class.  Interested users
     may refer to this document, available at
     'sbin/xcp-logger-tools/EPA.pdf'.
     Though the basic architecture remains as specified in the paper, there
     have been significant changes in the implementation since then.  So
     the paper should be considered only as introductory material; the
     source code and the associated comments should be consulted for
     further details.

     For effective use of this tool all the machines in the testbed should
     be running a kernel with this tool enabled.  Once the kernel is built
     with this tool, a new log message is generated and buffered in the
     pseudo-device for every interesting event.  Examples of interesting
     events are, on an end-system, the start of a new XCP flow and the
     arrival and departure of XCP packets, and, on a router, the arrival of
     XCP packets and the expiry of control timers.  A user space
     application can then read from the device and store the event traces
     on stable storage.  The logs generated can then be used for further
     analysis.  Two sample applications, 'loggerd' and 'decipher', have
     been provided to serve as a starting point for using the tool and also
     as a reference for implementing better applications.  These tools are
     further described below.

     Lastly, to support distributed logging and analysis a unique flow id
     is associated with each flow.  A simple mechanism has been used to
     realize this.  A new field called the flow_id field has been added to
     the XCP debug header.  Each end-system assigns a new ID to each new
     flow.  To further simplify the analysis each connection will have 2
     flow_id's associated with it, one for each direction.  For every new
     flow a log message is generated at the end-system which has the
     details of the source and destination IP (IPv4 or IPv6) addresses and
     ports.  Using this information available in the logs on the
     end-systems and the flow_id value available in each packet, and hence
     available in the router logs, the log messages on the different
     systems in the test-bed can be correlated and analyzed.

     However, one must still ensure that the flow_ids are unique in a test
     run.  To help realize this, two new sysctls 'net.inet.xcp.min_flow_id'
     & 'net.inet.xcp.max_flow_id' have been added.  The flow_ids on each
     end-system will be assigned sequentially in the range between
     'net.inet.xcp.min_flow_id' and 'net.inet.xcp.max_flow_id'.  The
     assignment will roll over if the upper limit is reached.  In this
     setup the uniqueness of the flow_ids can be ensured simply by ensuring
     that no 2 ranges assigned in the test-network overlap.  It is left as
     the responsibility of the tester to ensure that this property holds
     for a test-run.  As explained in Section 2.3.1 (p), the default values
     for both sysctls are set to the invalid value 0.  This should help
     identify any omission of setting valid values for these sysctls on
     end-system(s).  As can easily be inferred, this detection of invalid
     values for flow_ids is effective only for the first test-run after
     rebooting the system; for subsequent test-runs the identification of
     any misconfiguration is the responsibility of the tester.  The two
     main advantages of this mechanism are its simplicity and its
     independence of the network protocol (IPv4 or IPv6) used.  Also, we
     believe that the total number of flows that can be supported in one
     test run is sufficiently large (2**16) and hence the available range
     of flow_ids will not be a problem for conducting any meaningful
     experiment.

viii) loggerd : This is the user space application that reads the log
     messages from the device and stores them in a specified log file.
     This task can be sub-divided into two steps:

     a) Efficient reading and writing: Since transferring data from the
        kernel to user space (reading from the device) and writing data to
        stable storage are costly, single records are not read or written
        one at a time.  A large number of records are read, processed,
        buffered and then written in bulk.

     b) De-multiplexing and processing records: Each record has an
        associated type value.  Using this field, log records are
        identified and processed to ensure that the fields are converted to
        network byte order, buffered, and stored contiguously.  This
        processing makes the logs both machine and platform independent.
        Also, there is a saving in file size because only the necessary
        fields of each record are stored.

ix)  decipher : This application takes as input the binary log generated by
     'loggerd' and outputs a set of deciphered ascii text files.  Two or
     four files will be generated depending on whether the log was
     generated at a router or an end-system respectively.  For a given log
     file "foobar" the output files that are generated are:

     foobar.abs      - The timestamp associated with each record is the
                       value that was logged.
     foobar.rel      - The timestamp associated with each record is
                       relative to the first record in the log file.

     The following files are generated only if the log was generated on an
     end-system:

     foobar-list.abs - This file only has the records that are generated
                       when new flows are started.  As stated earlier,
                       these are the records that have the information
                       regarding the end-points of the connections
                       associated with each flow_id.  This file will thus
                       give the list of flows that originated from that
                       end-system.  The timestamp associated with each
                       record is the value that was logged.
     foobar-list.rel - Same as above, with the only difference being that
                       the timestamps are relative to the first record in
                       the log file.

     As can be inferred, this application is not essential for using the
     logger tool or for performing analysis (the binary file can be used
     directly).

Note:
-----
For the exact syntax and the options available for these applications
please use the command 'loggerd/decipher -h'.
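To give a feel for how an application such as loggerd interacts with the
pseudo-device, the following C sketch (illustrative only; the real loggerd
lives in sbin/xcp-logger-tools) simply reads raw data from '/dev/xcplogger'
in large chunks and appends it to a file.  The record de-multiplexing and
byte-order conversion that loggerd performs using the declarations in
'dev/xcp/xcp_records.h' are deliberately omitted, and whether read() blocks
when the buffer is empty depends on the device implementation.

    /* rawlog.c - minimal sketch of bulk reads from /dev/xcplogger. */
    #include <sys/types.h>
    #include <fcntl.h>
    #include <signal.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>

    static volatile sig_atomic_t stop;

    static void
    on_int(int sig)
    {
        (void)sig;
        stop = 1;                       /* terminate on Ctrl-C, like loggerd */
    }

    int
    main(int argc, char **argv)
    {
        char buf[64 * 1024];            /* read many records at a time */
        ssize_t n;
        int dev, out;

        if (argc != 2) {
            fprintf(stderr, "usage: %s logfile\n", argv[0]);
            return (EXIT_FAILURE);
        }
        signal(SIGINT, on_int);
        if ((dev = open("/dev/xcplogger", O_RDONLY)) == -1) {
            perror("/dev/xcplogger");   /* kernel built without XCP_TRACE_LOG? */
            return (EXIT_FAILURE);
        }
        if ((out = open(argv[1], O_WRONLY | O_CREAT | O_APPEND, 0644)) == -1) {
            perror(argv[1]);
            return (EXIT_FAILURE);
        }
        while (!stop && (n = read(dev, buf, sizeof(buf))) > 0)
            write(out, buf, (size_t)n); /* raw dump; no record processing here */
        close(out);
        close(dev);
        return (EXIT_SUCCESS);
    }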
3.2 Building the tools
----------------------

Some of the kernel header files have been changed to support XCP.  Hence,
for building the accompanying tools the compiler should include the new
files.  Ideally, the Makefiles should set the proper include paths so as
not to pick up the older header files from the default directories, e.g.
/usr/include.  Currently the Makefiles are not working as expected.  We are
working to fix the problem.  One hack that works is overwriting the old
header files with the new ones.  If you plan to use this approach, please
be advised to back up the files before overwriting.  Only files from three
directories have been changed and these are the ones that need to be
updated.  The following commands will do the needful:

    cp -f /usr/src/xcp_FreeBSD-6.0/sys/netinet/*.h /usr/include/netinet
    cp -f /usr/src/xcp_FreeBSD-6.0/sys/contrib/altq/altq/*.h /usr/include/altq
    cp -f /usr/src/xcp_FreeBSD-6.0/sys/contrib/pf/net/*.h /usr/include/net
    cp -f /usr/src/xcp_FreeBSD-6.0/sys/dev/xcp/xcp_records.h /usr/include/dev/xcp

All the tools can be built by using the respective Makefiles.  The
locations of the Makefiles are:

    i)   tcpdump                 : usr.sbin/tcpdump/tcpdump
    ii)  pfctl                   : sbin/pfctl
    iii) ipfw                    : sbin/ipfw
    iv)  xstream, xserver & rxt  : sbin/xcptools
    v)   loggerd & decipher      : sbin/xcp-logger-tools

4. Testing the implementation
-----------------------------

The following sections explain the procedure that can be used to test the
implementation.  In the process, an explanation is also given of the
practical usage of all the tools that have been described so far.

XCP uses the standard socket interface for data communication, but the
option 'IPPROTO_XCP' has to be used as the protocol argument when setting
up an XCP socket.  Refer to the sample XCP applications, xstream and
xserver, that are distributed with the source code.  These can be used as
examples for a better understanding of the requirements and usage.
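As a minimal illustration of the point above, the sketch below opens a
client socket with the IPPROTO_XCP protocol option and connects it;
everything else is the ordinary sockets API.  The IPPROTO_XCP constant
comes from the modified netinet headers installed in Section 3.2, and the
destination address (the sink 'S' of Fig 1) and port used here are
arbitrary placeholders; xstream and xserver remain the authoritative
examples.

    /* xcp_connect.c - open an XCP client socket (cf. xstream/xserver). */
    #include <sys/types.h>
    #include <sys/socket.h>
    #include <netinet/in.h>
    #include <arpa/inet.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <unistd.h>

    int
    main(void)
    {
        struct sockaddr_in sin;
        int s;

        /* IPPROTO_XCP is provided by the modified netinet headers. */
        if ((s = socket(AF_INET, SOCK_STREAM, IPPROTO_XCP)) == -1) {
            perror("socket(IPPROTO_XCP)");
            return (EXIT_FAILURE);
        }

        memset(&sin, 0, sizeof(sin));
        sin.sin_len = sizeof(sin);                      /* BSD sockaddr */
        sin.sin_family = AF_INET;
        sin.sin_port = htons(5001);                     /* placeholder port */
        sin.sin_addr.s_addr = inet_addr("10.10.1.10");  /* 'S' in Fig 1 */

        if (connect(s, (struct sockaddr *)&sin, sizeof(sin)) == -1) {
            perror("connect");
            close(s);
            return (EXIT_FAILURE);
        }
        /* From here on the socket is used exactly like a TCP socket. */
        write(s, "hello\n", 6);
        close(s);
        return (EXIT_SUCCESS);
    }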
XCP also supports IPv6.  The standard options, along with the IPPROTO_XCP
option, are enough to open an XCP socket over IPv6.  However, a different
option is required for setting up an XCP queue at the router that
recognizes IPv6; please refer to the file 'pf.conf' in the 'confs_XCP'
directory for further details.  In the current implementation, the XCP
header is attached as a separate protocol header similar to the
implementation for IPv4.  This has been done for expedience.  In a future
release the XCP header will be included as an IPv6 extension header.

4.1 Analysis tools
------------------

To support protocol analysis, the XCP implementation provides two methods
to access the instantaneous values of the internal variables.  The first
method is in the form of a debug header attached to each packet and the
second in the form of log records stored in a buffer in the XCP layer.  The
first method has been used in all the analysis presented in this document.

4.1.1 Debug header
------------------

This is the header which is attached to each packet.  The availability of
this feature can be controlled using the sysctl
'net.inet.xcp.xcp_debug_header' explained above.  The current debug header
has the following 5 fields:

i)   Congestion Window Size: This field is filled in by the end-systems.
     The originator of the packet fills this field with the instantaneous
     value of the congestion window size for that particular connection.

ii)  Queue length (in bytes): Each router fills this field.  The value
     filled in each packet is the instantaneous queue length of the XCP
     queue on the outgoing interface on which the packet is being queued.
     It is overwritten by each subsequent XCP router on the path.

iii) Positive Feedback (in B/s): As above, this field is filled by each
     router.  The value contains the positive feedback component of the
     computed feedback.

iv)  Negative Feedback (in B/s): Same as above, but this field contains the
     negative feedback component.

v)   Flow_id: This field is filled in by the end-system.  Each direction of
     a connection is assigned a flow_id by the end-system.  This ID is then
     stored in each packet belonging to that connection.  Please refer to
     Section 3.1 for further details.

The values in the debug header (except the Flow_id field) are captured
using the modified "tcpdump" tool (provided with the source).  Interested
users can add/delete fields in this structure to get better insight into
the working of the protocol.  Along with the changes made in the kernel
files, please be advised to make the corresponding changes in the function
"xcp_print" in the file "contrib/tcpdump/print-tcp.c" for capturing the
values in the new structure.

4.1.2 XCP_TRACE
---------------

This is a logging facility that has been made available in the XCP layer.
Different types of log records are generated depending on the sysctls that
have been switched on (i.e. one or more of the trace toggle sysctls, items
'k' through 'o' in Section 2.3.1).  The different types of records that are
available and their associated fields can be found in the
'xcp_debug_info_t' structure in the file 'sys/netinet/xcp_var.h'.  A
simpler approach is to use the tool 'rxt' provided with the code for
displaying the logs on the standard output.

Note: For this facility to be available, the XCP kernel should be built
with the option XCP_TRACE.

4.1.3 XCP_TRACE_LOG
-------------------

This is the facility that was described in Section 3.1.  For this facility
to be available the XCP kernel should be built with the option
XCP_TRACE_LOG.

4.2 Setting up the testbed
--------------------------

The following test setup will be used to explain the generic procedure that
can be used for testing the implementation.

Test setup description:
-----------------------
Any testbed for analyzing XCP will contain the following 5 classes of
nodes:

a) Clients/Sources : The traffic generators.

b) Servers/Sinks   : The nodes which have server sockets.  They usually act
   as sinks for the incoming traffic but, if needed, can generate traffic
   to create bi-directional data flows.

c) Routers         : The nodes routing the traffic.  A node is an XCP
   router if there is at least 1 bottleneck and there is an active XCP
   queue controller to manage the utilization.

d) Delay inducers  : The nodes which induce artificial delays to simulate
   different RTTs.  In the current setup the required delay is induced
   using the delay pipes facility of dummynet.  Details are discussed
   below.

e) Traffic monitors: The nodes which capture the traffic for analysis.

The example setup has a simple 2-input-1-output topology with all the nodes
running the XCP-NODIV-KERNEL-6.0.  All the network interfaces are Gigabit
interfaces.  'S' is the server (sink), 'R' the router and 'A', 'B' the
clients.  'R' also serves as the delay inducer machine and as a traffic
capturing node.  Though it appears that the router is overloaded, we found
that the router was neither CPU nor memory limited during our experiments.
This was understandable because the router used was a high end system.  If
it is found that the router is getting CPU and/or memory limited, insert
another machine in the path to act as a delay inducer and/or traffic
monitor.  The new machine would need to be placed between the router and
the server (sink).

   [A]
     \
      \  L1
       \
     E1 \          L3
        [R] -------- [S]
     E2 /  E3
       /
      /  L2
     /
   [B]

   where,                            IP Addresses
   A,B - Sources                     ------------
   R   - XCP router                  A - 10.10.1.6
   S   - Sink or server              B - 10.10.1.1
   L1  - Link 1 (1 Gbps)             R - 10.10.1.5 (E1)
   L2  - Link 2 (1 Gbps)                 10.10.1.2 (E2)
   L3  - Link 3 (100 Mbps)               10.10.1.9 (E3)
   E1  - Interface 1 on R (to A)     S - 10.10.1.10
   E2  - Interface 2 on R (to B)
   E3  - Interface 3 on R (to S)

                     Fig 1: Example test setup
                     -------------------------

Note:
-----
a) The delay inducing pipes set up in dummynet should always be on the
   'ack' path.  This prevents the long queues that might be created if data
   packets were queued, especially when long RTTs are being simulated.
   Long queues can potentially alter the performance.

b) If an alternate machine (other than the router) is used for traffic
   monitoring then it should be placed on a segment of the path after the
   XCP router.
   In cases where multiple bottlenecks are created, there should be one
   monitoring machine after each bottleneck, but before the next bottleneck
   on the path, to collect data about that router's output.

c) L3 in the above figure is a 1 Gbps link but has been downgraded to a
   100 Mbps link using the standard 'ifconfig' tool.  Please refer to
   Section 4.4 for the details.

4.3 Verifying the testbed
-------------------------

These can be considered a set of functional tests.

4.3.1 Checking for connectivity and routes
------------------------------------------

i)  Tests should be done to ascertain that the testbed has complete
    connectivity (reachability).  This can be done using the "ping"
    utility.  This should be done between all combinations of the nodes;
    at the least, ping tests should be done between all the intended
    end-systems -- in our example test setup, between A & S and B & S.
    This will expose physical connectivity problems and/or any routing
    mis-configurations.

ii) Next, check to make sure that the path being traversed between any two
    systems is in fact the intended one.  This can be done using the
    "traceroute" utility.

Note:
-----
Use the '-n' switch with traceroute if the target machine is local to the
testbed, i.e. has no entry in the DNS server or uses RFC-1918 addresses.
Please refer to the man page of traceroute for details.

4.3.2 TCP benchmarking of the testbed
-------------------------------------

Before starting the XCP tests it is advisable to check the performance of
TCP.  The tests should be done both with and without induced bottlenecks.
These tests can be done using any of the available TCP benchmarking tools.
We use "netperf" (http://www.netperf.org/netperf/NetperfPage.html).  If you
have the FreeBSD ports system installed, netperf is available from
/usr/ports/benchmarks/netperf.  More information on using the ports system
is available from
http://www.freebsd.org/doc/en_US.ISO8859-1/books/handbook/ports.html

Ex:
---
The netserver is started on 'S'.  Then netperf is started with the
following command, first on A and then on B:

    A# netperf -H 10.10.1.10 -t TCP_STREAM
    B# netperf -H 10.10.1.10 -t TCP_STREAM

The observed throughput will vary with the machine hardware and path
characteristics.  It should be ascertained that there is a valid
explanation for the observed throughputs.  These tests are important
because they serve 2 important purposes:

i)  If the observed TCP behavior matches the expected behavior, then one
    can be reasonably sure that the non-XCP settings are correct and that
    there are no hidden bottlenecks, like buffer shortage in routers,
    unexpected packet losses, etc.

ii) Since the XCP performance will invariably be compared with TCP, these
    tests provide a valuable reference point even before the XCP tests are
    done.

4.4 Setting up an XCP queue
---------------------------

The XCP queue is to be set up on the bottleneck machine.  For this either a
known bottleneck should exist or an artificial bottleneck has to be
created.  As an example we consider the latter case.  A bottleneck can be
set up by creating a queue configured with a lower output capacity than the
input data rate.  Here we set up a bottleneck by creating a queue using
'altq' and choosing 'xcpq' as the queuing discipline.  By using the 'xcpq'
discipline the XCP queue controller is initialized.  There are 2 ways of
setting up an XCP queue:

i)  Setting it up as a stand-alone queue (Examples 2 & 3 in the sample file
    pf.conf; see below).
    In this setup no other queuing discipline is used and only 1 such queue
    is set up for each outgoing interface.

ii) Setting it up in conjunction with other queuing disciplines.  Currently
    only the "Priority Queue" discipline is supported.

An example configuration file 'pf.conf' has been provided with the source
code.  It is available under the directory 'confs_XCP'.  The example
configuration file sets up a bottleneck of 100 Mbps.  Now the setup is
ready for testing.

It should be noted that the 'ifconfig' interface is no longer necessary
either for creating bottlenecks or for setting up XCP queues.  Ifconfig can
still be used to set up a hard upper limit for the capacity, ex: 100 Mbps.
This will remove any possibility of overshooting because of the hysteresis
that altq might have and which is inherent in any computation based rate
limiter.

4.5 Setting up delay inducers
-----------------------------

Different RTTs can be simulated on the testbed by setting up delays in the
path using the dummynet framework.  As mentioned in Section 3.1, "ipfw" is
the tool that can be used to set up the delay pipes using dummynet.  In the
example setup, router 'R' is also used to induce delays.  Delays of 100 ms
and 5 ms can be set up between 'A' & 'S' and between 'B' & 'S'
respectively, with the following commands run on 'R':

    R# ipfw add 10 pipe 1 ip from 10.10.1.10 to 10.10.1.6 out
    R# ipfw add 20 pipe 2 ip from 10.10.1.10 to 10.10.1.1 out
    R# ipfw pipe 1 config delay 100ms
    R# ipfw pipe 2 config delay 5ms

Note:
-----
a) The 'out' option at the end of the first 2 commands is very important.
   If omitted, the packets will be queued twice, once on the incoming path
   and again on the outgoing path in the kernel stack, thus doubling the
   induced delay.

b) Note that the delay is induced on the 'ack' path, i.e. the path from
   server to client, for the reasons explained above.

4.6 Capturing traffic
---------------------

4.6.1 Capturing traffic using 'tcpdump'
---------------------------------------

The traffic can be captured using the "tcpdump" utility distributed with
the source code.  The monitor node should be placed as specified in
Section 4.2.  In the example setup node 'R' captures the traffic.  The
following command gets the desired result:

    R# tcpdump -n -i E3 -s 100 -w /traces/XCP-test.tr

The above will store the captured traffic in the binary file 'XCP-test.tr'.
Each of the options used is important:

a) -n     - Tcpdump will not attempt to convert addresses to names.

b) -i E3  - The outgoing interface should always be monitored.  This makes
            sure the traffic is captured only after the XCP queue
            controller has filled the feedback into the packet.

c) -s 100 - This will only capture the first 100 bytes of each packet.  All
            the headers of interest, including the XCP debug header, are
            accommodated in the first 100 bytes.  It has 2 important uses:
            i)  It will considerably reduce the size of the trace file
                collected.
            ii) It will considerably reduce the disk I/O, thus reducing the
                overall load on the system.  During our test runs it was
                observed that higher disk I/O occasionally resulted in
                packet capture loss by tcpdump, thus skewing the analysis.
                Note that the loss was not in the kernel, i.e. there was no
                packet drop, but in the packet capture module of tcpdump.

Note:
-----
The default capture length is 96 bytes and that should be sufficient.  The
"-s" option has been included to bring to notice the afore-mentioned
problems if higher snap lengths are used.

This binary file is then converted to plain text using the following
command:

    R# tcpdump -n -s 100 -r /traces/XCP-test.tr > /traces/XCP-test.ascii

The XCP header values will be enclosed between the '/XCP/' tags, which can
then be used for further analysis.
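Since the debug header values appear between the '/XCP/' tags, a trivial
filter can pull them out of the ascii trace for further processing.  The
following C sketch is purely illustrative (the file name and this style of
post-processing are not part of the distribution); it prints whatever text
appears between the two '/XCP/' tags on each line, without interpreting the
individual fields.

    /* xcpgrep.c - print the text between the '/XCP/' tags of each line. */
    #include <stdio.h>
    #include <string.h>

    int
    main(void)
    {
        char line[4096];
        const char *tag = "/XCP/";

        while (fgets(line, sizeof(line), stdin) != NULL) {
            char *start = strstr(line, tag);
            char *end;

            if (start == NULL)
                continue;             /* no XCP debug header on this line */
            start += strlen(tag);
            end = strstr(start, tag); /* closing tag */
            if (end != NULL) {
                *end = '\0';
            } else {
                size_t len = strlen(start);
                if (len > 0 && start[len - 1] == '\n')
                    start[len - 1] = '\0';
            }
            printf("%s\n", start);
        }
        return (0);
    }

A typical invocation would be something like
'./xcpgrep < /traces/XCP-test.ascii > /traces/XCP-test.xcp'.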
4.6.2 Capturing traffic using 'loggerd' and 'decipher'
------------------------------------------------------

The traffic can be captured and analyzed using these tools if the kernel is
built with the option XCP_TRACE_LOG turned on.  The detailed usage of these
tools has been described in Section 3.1.  The steps to be followed for a
typical test run are:

a) On each node of the test setup that is to be used as an end-system,
   assign non-overlapping ranges of flow_ids by setting appropriate values
   for 'net.inet.xcp.min_flow_id' and 'net.inet.xcp.max_flow_id'.
b) Start loggerd, specifying the target log filename, on all the test setup
   machines.
c) Start the test cases.
d) Terminate the loggerd applications on all the machines by sending the
   Ctrl-C signal.
e) Perform analysis using the generated logs.

5. Sample Test case and results
-------------------------------

Along with this distribution a sample testcase and its results are provided
as a reference.  All the relevant files are present in the directory
"sample-testcase_XCP".  Please read the README in that directory for
further details.

6. Contact details
------------------

Project website : http://www.isi.edu/isi-xcp/
Contact ID      : xcp@isi.edu
                  Please be sure to subscribe to the mailing list
                  (see below) before mailing.
Mailing List    : http://mailman.isi.edu/mailman/listinfo/xcp