Queuing system on Cyrus1 and Quantum2
How it works for users
Users should edit their shell scripts to add special directives for the queue system. These directives begin with "#PBS" and can request resources, declare a required walltime, and direct standard output and error. Users can then submit the script to the queue with the qsub command.
A simple example of such a script can be found in the attached apoa.sh, which runs the ApoA1 benchmark.
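A minimal sketch of such a script is shown below. This is not the attached apoa.sh itself; the walltime, node request, and file names are illustrative, and the NAMD command should be replaced with your own program and input.

#!/bin/bash
#PBS -N apoa1_benchmark
#PBS -q short
#PBS -l walltime=1:00:00
#PBS -l nodes=1:ppn=8
#PBS -o apoa1_stdout.txt
#PBS -e apoa1_stderr.txt
# qsub starts the script in the home directory; change to the submission directory
cd $PBS_O_WORKDIR
# run on the 8 allocated processors (executable and input names are illustrative)
namd2 +p8 apoa1.namd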
A job name is assigned with a "#PBS -N" statement, and the destination queue is specified with a "#PBS -q" statement:
#PBS -N solution_equilibration_273K
#PBS -q short
Users can request resources using a "#PBS -l" statement. Resources include the walltime (in mm:ss or hh:mm:ss format) and the number of nodes and processors per node. In the example below, several alternative node requests are given to illustrate the possible syntax; only one would be included in an actual script.
#PBS -l walltime=14:30:00
#PBS -l nodes=1:ppn=4
#PBS -l nodes=1:ppn=8
#PBS -l nodes=n024:ppn=8
#PBS -l nodes=1:ppn=8+1:ppn=4
#PBS -l nodes=n024:ppn=8+1:ppn=8
Some or all of these arguments can also be given at the command line. Command-line settings override any settings in the script.
[bashprompt]$ qsub -q short -l walltime=5:00:00 -l nodes=2:ppn=8 -e test_stderr.txt ./test_simulation.sh
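Once submitted, a job's status can be checked with qstat, and a queued or running job can be removed with qdel (both standard Torque commands; the username and job ID below are illustrative):
[bashprompt]$ qstat -u username
[bashprompt]$ qdel 1234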
Some notes and suggestions for users:
- Users should request the lowest walltime that will safely cover their jobs: when no walltime is specified, the queue system must "block out" the queue's entire 24-hour maximum. This is analogous to a customer arriving at a busy barbershop and explaining that he only needs a "very quick trim"; he can be fitted in sooner.
- File input and output may not be as immediate as when running directly from the command line; users should not count on real-time access to program output.
- Users should test system scaling before expanding beyond one node (see the sketch below): for systems of roughly 10,000 atoms, poor scaling has been observed beyond 8 processors, while the 92,000-atom ApoA1 benchmark case scales well to 2 nodes.
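For example, a quick scaling test might submit the same script at several processor counts and compare timings. This is a sketch; the job names, walltime, and script path are illustrative:
for np in 1 2 4 8; do
    qsub -q short -l walltime=5:00:00 -l nodes=1:ppn=$np -N scale_test_$np ./test_simulation.sh
done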
Technical Approach used by Torque
The PBS queue system allocates a set of nodes and processors to an individual job, either for the walltime specified in the job or, if none is given, for the maximum walltime of the queue. It then provides a set of environment variables to the shell in which the script runs, such as PBS_NODEFILE, which holds the path of a temporary file listing the allocated nodes, one line per allocated processor.
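For example, an MPI job can read PBS_NODEFILE to size and place its launch. This is a sketch: the machinefile flag is MPICH-style and varies between MPI implementations, and my_mpi_program is a placeholder:
# count allocated processors (the node file has one line per allocated CPU)
NP=$(wc -l < $PBS_NODEFILE)
# launch across the allocated nodes
mpirun -np $NP -machinefile $PBS_NODEFILE ./my_mpi_program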
Table of Queues
The following tables are available in printer-friendly form in an attached file. Note that the settings can be adjusted to meet users' needs as those needs become clear.
Queue attributes on Cyrus1 and Quantum2
|  | debug | short | long |
|---|---|---|---|
| max walltime | 20 min | 24 hr | 6 days |
| max nodes per job | 1 | 2 | 1 |
| priority | 100 | 80 | 60 |
Queue attributes on Darius
|  | debug | short | long |
|---|---|---|---|
| max walltime | 20 min | 24 hr | 12 days |
| max nodes per job | 1 | 4 | 8 |
| priority | 100 | 80 | 60 |