...

The purpose of the queuing system is to (i) promote the efficient utilization of our computer facilities and (ii) promote equity of access to those resources across our user community. This page

How it works for users

Users should edit their shell scripts to add special directives for the queue system. These directives begin with "#PBS" and are used to request resources, declare a required walltime, and direct standard output and error. Users can then "submit" their job

A simple example of such a script can be found in the attached apoa.sh, which runs the ApoA1 benchmark.
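As a rough illustration of the directives described above, a minimal job script might look like the following sketch. It is not the attached apoa.sh: the job name, resource request, walltime, and output file names are all assumed values chosen for illustration.

```shell
#!/bin/bash
# Hypothetical job script; every value below is an illustrative assumption.
#PBS -N example_job
#PBS -l nodes=1:ppn=4
#PBS -l walltime=01:00:00
#PBS -o example_job.out
#PBS -e example_job.err

# Torque sets PBS_O_WORKDIR to the directory from which qsub was run.
# Fall back to the current directory so the script also runs outside the queue.
workdir="${PBS_O_WORKDIR:-$PWD}"
cd "$workdir" || exit 1

echo "Job started on $(hostname) in $workdir"
# The actual program invocation would go here.
```

Because the "#PBS" lines are ordinary comments to the shell, the same script can be run directly for a quick check before it is submitted with qsub.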

...

  1. Users should request the lowest walltime that their jobs realistically need. When no walltime is specified, the queue system must "block out" the entire 24-hour period, which makes the job harder to schedule. This is analogous to a customer arriving at a busy barbershop and explaining that he only needs a "very quick trim."
  2. File input and output do not appear to be as immediate as when running directly from the command line. Users should not count on immediate access to program output.
  3. Users should test how their system scales before expanding beyond one node. For systems of 10 katoms, poor scaling has been observed beyond 8 ppn (processors per node), while the 92 katom ApoA1 benchmark case scales well to 2 nodes.
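One way to act on the first and third points is to submit the same short job at several processor counts, each with an explicit walltime, and compare run times. The sketch below only prints the qsub commands it would issue and does not submit anything. The script name jobscript, the ppn values, and the 30-minute walltime are assumptions for illustration.

```shell
#!/bin/bash
# Dry run: print one qsub command per processor count.
# "jobscript", the ppn values, and the walltime are illustrative assumptions.
scaling_commands() {
    local script="$1"
    local walltime="$2"
    shift 2
    local ppn
    for ppn in "$@"; do
        # An explicit walltime avoids reserving the full default period.
        echo "qsub -l nodes=1:ppn=${ppn},walltime=${walltime} ${script}"
    done
}

scaling_commands jobscript 00:30:00 1 2 4 8
```

Removing the echo inside the loop would submit the jobs directly, so it is worth reviewing the printed commands first.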

...

Working with the queuing system, and the approach used by Torque

The following commands can be used:

command                      purpose
---------------------------  --------------------------------------------------
qsub jobscript               Submit the job in script jobscript. Can accept other arguments as discussed above.
qsub -I -l nodes=1:ppn=4     Request an interactive job with the indicated resources.
qdel jobID                   Delete job number jobID. Appears to kill processes on compute nodes cleanly.
qstat                        List active jobs.
qnodes                       List all nodes with their state and properties.
qnodes -l down               List nodes currently down.
qnodes -l active             List nodes currently used for jobs.
qnodes -l free               List nodes currently free.
qmgr -c "print server"       Print queue configuration details.

The PBS queue system allocates a set of nodes and processors to an individual job, either for the walltime specified in the job or, if none is specified, for the maximum walltime of the queue. It then provides a set of environment variables to the shell in which the script runs, such as PBS_NODEFILE, which holds the path of the temporary node file describing the allocated CPUs.
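To show how a job script might use PBS_NODEFILE, the sketch below counts the allocated processors and distinct nodes. It assumes the usual Torque layout in which the node file contains one host name per line, repeated once for each processor allocated on that host. The function name summarize_nodefile and its output format are hypothetical.

```shell
#!/bin/bash
# Summarize the node file that the queue system provides to a job.
# The function name and output format are illustrative assumptions.
summarize_nodefile() {
    # Use the path given as an argument, else the one set by the queue system.
    local nodefile="${1:-${PBS_NODEFILE:-}}"
    if [ -z "$nodefile" ] || [ ! -r "$nodefile" ]; then
        echo "no readable node file; probably not running under the queue system" >&2
        return 1
    fi
    local nprocs nnodes
    # One non-empty line per allocated processor.
    nprocs=$(grep -c . "$nodefile")
    # Each host name is repeated once per processor, so count unique names.
    nnodes=$(sort -u "$nodefile" | grep -c .)
    echo "${nprocs} processors on ${nnodes} nodes"
}

# Inside a job this reads $PBS_NODEFILE; elsewhere it only reports an error.
summarize_nodefile || true
```

A processor count obtained this way could be passed to a parallel launcher so that the program uses exactly what the queue system allocated.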

...