ns Batch Jobs with GNU Queue

Preparing to use Queue

Installing Queue

If you are using Linux on an x86 machine, you should add your computer to the Queue cluster so it can host queued jobs. To do so:

  1. Make sure your computer is running NIS and is mounting the appropriate disks via NFS so all the queue users can log into your machine.
  2. Contact Action and request root access to /nfs/pub-boreas for your machine.
  3. Install the Queue RPM: /nfs/pub-boreas/vint/queue-1.40.1beta.i386.rpm
  4. Verify that your computer's name has been added to the file /nfs/pub-boreas/vint/queue/qhostsfile
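
Step 4 can be checked from the shell. The sketch below assumes that qhostsfile lists one hostname per line and that your machine appears there under the name printed by hostname; neither is confirmed here, so adjust the match if the file uses a different layout. The host_in_qhosts helper is a name made up for this example.

```shell
#!/bin/sh
# Assumption: qhostsfile holds one hostname per line, with no extra fields.
QHOSTS=/nfs/pub-boreas/vint/queue/qhostsfile

# host_in_qhosts FILE HOST: succeed if HOST is exactly one whole line of FILE.
host_in_qhosts() {
    grep -qxF -- "$2" "$1"
}

# Only run the check where the shared file is actually mounted and readable.
if [ -r "$QHOSTS" ]; then
    if host_in_qhosts "$QHOSTS" "$(hostname)"; then
        echo "$(hostname) is listed in $QHOSTS"
    else
        echo "$(hostname) is NOT listed in $QHOSTS"
    fi
fi
```

If the file is not readable, the script prints nothing, which usually means the NFS mount from step 1 is missing.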

Testing Queue

Run the following command:

      queue -- date
    

This should print the current date on your screen, just as if you had run the date command normally. But in this case, the program was run by the Queue system, possibly on a machine other than your own! Try the following command:

      queue -- hostname
    

This will print the name of the computer that accepted your job. This might be your machine, or it might be a different one. Queue tries to find the most idle machine and runs your job there.

Queuing jobs

You can queue up jobs with the following syntax:

      queue -i -w -- /path/to/ns filename.tcl
    

This will run your job immediately (-i) and display its output on the screen (-w). This is actually the default mode, so you can just use queue -- command if you want.

This is useful for testing, but you would normally use the syntax below to queue up a batch job.

      queue -q -r -- /path/to/ns filename.tcl
    

This will submit your job to the batch mode queue (-q) and email the output to you (-r). Note: Output capture is not reliable right now, so make sure to write any important output to a file instead of printing it to standard output.
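
One way to follow this advice without changing the Tcl script is to submit a small wrapper that redirects the output itself. This is a minimal sketch, assuming the job's arguments contain no spaces or shell metacharacters and that the log path is on a disk visible from every Queue host. The make_job_script helper and the file names are hypothetical.

```shell
#!/bin/sh
# make_job_script WRAPPER LOG COMMAND [ARGS...]
# Writes an executable wrapper that runs COMMAND with stdout and stderr
# redirected to LOG. Assumption: no argument contains whitespace or
# shell metacharacters, since they are written out unquoted.
make_job_script() {
    wrapper=$1
    log=$2
    shift 2
    {
        echo '#!/bin/sh'
        echo "exec $* > $log 2>&1"
    } > "$wrapper"
    chmod +x "$wrapper"
}

# Example (paths are placeholders):
#   make_job_script "$HOME/job.sh" "$HOME/job.log" /path/to/ns filename.tcl
#   queue -q -r -- "$HOME/job.sh"
```

Absolute paths are used in the example so the wrapper does not depend on the directory in which the job happens to start.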

There are several queues to which you may submit jobs. The default queue will run your job with very low priority, put a limit on the amount of memory it can use, and only run it on machines which have a lot of resources to spare. Therefore you should put your big jobs into a queue which will give them more resources.

The default queue for interactive jobs is called now and the default queue for batch jobs is called wait. These queues impose limits on the amount of memory you can use. In order to run jobs with large memory requirements, be sure to use the big queue, by specifying it with the -d queuename option:

      queue -q -r -d big -- /path/to/ns filename.tcl
    

The -- marks the end of queue options. The command to run follows it. To make a job run on a specific machine, use the -h hostname option:

      queue -q -r -h vir.isi.edu -- /path/to/ns filename.tcl
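
The options above combine naturally when you have several scripts to run. The following is a sketch only: submit_all, the QUEUE, NS and DRY_RUN variables, and the script names are invented for illustration, while the queue options are the ones described above. With DRY_RUN set, the commands are printed instead of submitted, which is handy for checking them first.

```shell
#!/bin/sh
QUEUE=${QUEUE:-queue}       # queue binary to use
NS=${NS:-/path/to/ns}       # placeholder path, as in the examples above

# submit_all SCRIPT...: submit each Tcl script as a batch job to the big queue.
# If DRY_RUN is set to a non-empty value, print each command instead of running it.
submit_all() {
    for script in "$@"; do
        ${DRY_RUN:+echo} "$QUEUE" -q -r -d big -- "$NS" "$script"
    done
}

# Example:
#   DRY_RUN=1 submit_all run1.tcl run2.tcl
```

To pin the jobs to one machine instead, the same loop could pass -h hostname in place of, or in addition to, -d big.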
    

Problems

Queue has a few minor problems right now.

  1. The Linux dynamic linker unsets the LD_LIBRARY_PATH environment variable when running a setuid/setgid program, in order to avoid the security problems of allowing users to tell a setuid program which libraries to use. The multiuser version of GNU queue uses a setuid executable (so it can use privileged TCP ports), so you cannot depend on having this variable set.

    Therefore, you have to play stupid linker tricks to make ns work.

    You can either pass -Wl,-rpath,/path/where/libs/are on the gcc command line (-rpath is a linker option, so gcc has to hand it through to ld), or you can set the LD_RUN_PATH environment variable when compiling. See the ld man page for details.

    I chose the LD_RUN_PATH method, as it doesn't require modification of the makefile:

    	  LD_RUN_PATH=/nfs/ruby/buchheim/nsnam/lib gmake
    	
  2. ns doesn't seem to want to do an interactive session when you use queue. That's ok, though, since for batch jobs you are going to be storing your programs in files, not typing them into the interpreter directly.

  3. Queue does not seem to be capturing output from queued batch jobs right now. This is a bug. Please send all output to a file instead of printing it to standard output. When jobs are run immediately (-i) this is not an issue.
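
For problem 1, you can confirm that the library path was actually recorded in the ns binary after building. This sketch assumes that readelf from binutils is installed and that its output uses the usual (RPATH) or (RUNPATH) dynamic-section lines; embedded_rpath is a hypothetical helper name.

```shell
#!/bin/sh
# embedded_rpath: read `readelf -d` output on stdin and print the
# library search path stored in any RPATH or RUNPATH entry.
embedded_rpath() {
    sed -n -E 's/.*\((RPATH|RUNPATH)\).*\[(.*)\]/\2/p'
}

# Example (binary path is a placeholder):
#   readelf -d /path/to/ns | embedded_rpath
```

An empty result suggests the binary was linked without an embedded path, in which case it will still depend on LD_LIBRARY_PATH and is likely to fail under queue.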

More Information

Queue comes with documentation in several formats, including a man page. You can also get more information about it at its SourceForge project page or its page at GNU.org.