Queuing system on Cyrus1 and Quantum2
How it works for users
Users should edit their shell scripts to add special directives for the queue system. These directives begin with "#PBS" and can request resources, declare a required walltime, and direct standard output and error. Users can then submit the script to the queue with the qsub command.
A simple example of such a script can be found in the attached apoa.sh, which runs the ApoA1 benchmark.
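A minimal sketch of such a script is shown below. This is not the attached apoa.sh itself; the walltime, node request, and file names are illustrative, and the NAMD command should be replaced with your own program and input.

#!/bin/bash
#PBS -N apoa1_benchmark
#PBS -q short
#PBS -l walltime=1:00:00
#PBS -l nodes=1:ppn=8
#PBS -o apoa1_stdout.txt
#PBS -e apoa1_stderr.txt
# qsub starts the script in the home directory; change to the submission directory
cd $PBS_O_WORKDIR
# run on the 8 allocated processors (executable and input names are illustrative)
namd2 +p8 apoa1.namd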
A job name is assigned with a "#PBS -N" statement, and the destination queue is specified with a "#PBS -q" statement:
#PBS -N solution_equilibration_273K
#PBS -q short
Users can request resources using a "#PBS -l" statement. Resources include the walltime (in mm:ss or hh:mm:ss format) and the number of nodes and processors per node. In the example below, several alternative node requests are given to illustrate the possible syntax; only one would be included in an actual script.
#PBS -l walltime=14:30:00
#PBS -l nodes=1:ppn=4
#PBS -l nodes=1:ppn=8
#PBS -l nodes=n024:ppn=8
#PBS -l nodes=1:ppn=8+1:ppn=4
#PBS -l nodes=n024:ppn=8+1:ppn=8
Some or all of these arguments can also be given at the command line. Command-line settings override any settings in the script.
[bashprompt]$ qsub -q short -l walltime=5:00:00 -l nodes=2:ppn=8 -e test_stderr.txt ./test_simulation.sh
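Once submitted, a job's status can be checked with qstat, and a queued or running job can be removed with qdel (both standard Torque commands; the username and job ID below are illustrative):
[bashprompt]$ qstat -u username
[bashprompt]$ qdel 1234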
Some notes and suggestions for users:
- Users should request the lowest walltime that will safely cover their jobs: when no walltime is specified, the queue system must "block out" the queue's entire 24-hour maximum. This is analogous to a customer arriving at a busy barbershop and explaining that he only needs a "very quick trim"; he can be fitted in sooner.
- File input and output may not be as immediate as when running directly from the command line; users should not count on real-time access to program output.
- Users should test system scaling before expanding beyond one node (see the sketch below): for systems of roughly 10,000 atoms, poor scaling has been observed beyond 8 processors, while the 92,000-atom ApoA1 benchmark case scales well to 2 nodes.
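For example, a quick scaling test might submit the same script at several processor counts and compare timings. This is a sketch; the job names, walltime, and script path are illustrative:
for np in 1 2 4 8; do
    qsub -q short -l walltime=5:00:00 -l nodes=1:ppn=$np -N scale_test_$np ./test_simulation.sh
done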
Technical Approach used by Torque
The PBS queue system allocates a set of nodes and processors to an individual job, either for the walltime specified in the job or, if none is given, for the maximum walltime of the queue. It then provides a set of environment variables to the shell in which the script runs, such as PBS_NODEFILE, which holds the path of a temporary file listing the allocated nodes, one line per allocated processor.
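For example, an MPI job can read PBS_NODEFILE to size and place its launch. This is a sketch: the machinefile flag is MPICH-style and varies between MPI implementations, and my_mpi_program is a placeholder:
# count allocated processors (the node file has one line per allocated CPU)
NP=$(wc -l < $PBS_NODEFILE)
# launch across the allocated nodes
mpirun -np $NP -machinefile $PBS_NODEFILE ./my_mpi_program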
Table of Queues
The following tables are available in printer-friendly form in an attached file. Note that the settings can be adjusted to meet users' needs as those needs become clear.
Queue attributes on Cyrus1 and Quantum2
|  | debug | short | long |
|---|---|---|---|
| max walltime | 20 min | 24 hr | 6 days |
| max nodes per job | 1 | 2 | 1 |
| priority | 100 | 80 | 60 |
Queue attributes on Darius
|  | debug | short | long |
|---|---|---|---|
| max walltime | 20 min | 24 hr | 12 days |
| max nodes per job | 1 | 4 | 8 |
| priority | 100 | 80 | 60 |