Qsub
Current revision as of 22:16, 24 May 2019
qsub is the part of the Sun Grid Engine that allows you to submit your work to the cluster job queue. It has many command options, e.g. setting up the environment, the job name, array jobs, email alerts etc.
The two important switches to use are -cwd (makes the job start from the current working directory) and -V (makes the job run with the current environment variables).
qsub options program
Example
qsub -cwd -V myprog
Options
-V (set environment variables to those present when the job is submitted)
-cwd (run the job from the current working directory)
-N name (sets the name of the job)
-M email@address (sets the email address for job notifications)
-m options (when to send email notification)
    b (beginning of job)
    e (end of job)
    a (abortion or rescheduling of job)
    n (never email)
    s (suspension of job)
    example: -m ase
-t start-end:step (array job, e.g. 1-5; :step is optional and is the step increment, so 1-6:2 would be 1 3 5. The environment variable $SGE_TASK_ID holds the current position)
-q queue_name (submit the job to the named queue)
-l mem_free=1G,h_vmem=1.2G (request a node with 1G of free memory and terminate the job if its memory use exceeds 1.2G; adjust to your requirements)
Choosing a job queue
If you don't specify one of the following queues when submitting jobs, the system will automatically select one. This is likely to cause your jobs problems.
To choose a queue when submitting:
qsub -q queue_name myscriptfile
or in your script file add
#$ -q queue_name
Please choose a queue from the following:
long.q
- Single CPU workloads
- 25 nodes
- up to 220 concurrent jobs, 50 running jobs per user
- Job/task wall clock time of 7 days. This means your jobs will be terminated if they take longer than 7 days to run.
short.q
- Single CPU workloads
- All cluster nodes
- up to 268 concurrent jobs, 100 running jobs per user
- Job/task wall clock time of 6 hours. This means your jobs will be terminated if they take longer than 6 hours to run.
tiny.q
- Single CPU workloads
- Two nodes
- up to 16 jobs
- Job/task wall clock time of 30 mins. This means your jobs will be terminated if they take longer than 30 minutes to run.
parallel.q
For use with multi-CPU/OpenMPI jobs
- Multi CPU workloads
- 20 nodes
- up to 180 concurrent jobs/slots, 40 running slots per user
- Job/task wall clock time of 30 days. This means your jobs will be terminated if they take longer than 30 days to run.
Embedding qsub options in scripts
NOTE: If you write your job script on a Windows machine and copy it to the cluster, you must convert it using the dos2unix command, i.e. dos2unix myjobscript
You can embed qsub options into the scripts you use to run your programs on the cluster. They take the same overall form as if submitted via the command line, but each line must be prefixed with #$
Examples
#!/bin/bash
myprogram
qsub -V -cwd -N MYHPCJOB -q short.q -l mem_free=1G,h_vmem=1.2G myscript
is the same as saving the settings in a job script:
#!/bin/bash
#$ -N MYHPCJOB
#$ -V -cwd
#$ -q short.q
#$ -l mem_free=1G,h_vmem=1.2G
myprogram
qsub myscript
Memory usage and requesting large memory allocation
The HPC contains nodes with mixed quantities of memory (RAM). There are currently nodes with 8GB, 16GB and 32GB available to run your work.
When submitting your work you need to specify how much memory it will need to run. You should specify two values: mem_free, the amount you need (plus a small amount just in case), and h_vmem, a hard limit on the maximum the job can consume before it is terminated.
These should be realistic values, so the job scheduler can fit your work and other users' work onto the various nodes and make the best use of resources. The less memory you request, the more nodes will be available to run your work. If you specify an email address in your script, the email at the end of the job will contain information on how much memory your job consumed; this will help in planning your usage requirements. In the example below you can see Max vmem was 763MB, or about 0.76GB:
Job 5013 (sim) Complete
User             = train
Queue            = serial.q@comp00.gecko.lshtm.ac.uk
Host             = comp00.gecko.lshtm.ac.uk
Start Time       = 10/14/2008 16:53:08
End Time         = 10/14/2008 20:22:20
User Time        = 02:55:55
System Time      = 00:30:35
Wallclock Time   = 03:29:12
CPU              = 03:26:30
Max vmem         = 763.133M
Exit Status      = 0
The reason you should specify a maximum memory limit is that if the job scheduler believes you only need a given amount, it will add additional jobs to the node to consume the remaining memory. If your job uses more than planned, the node will run out of memory and all the jobs on it will take many times longer to run. This may happen if you incorrectly specified the required memory, or if there is a bug in your code or the application you are using. By specifying a hard limit you protect your own and other users' work.
Examples of memory requests
To request 1GB of RAM and terminate if the job exceeds 1.2GB:
#$ -l mem_free=1G,h_vmem=1.2G
To request 16GB, ask for just under that amount, as the OS on a 16GB node consumes a little memory itself. You can specify more, but the job will then only run on a 32GB node, of which there are fewer.
#$ -l mem_free=15G,h_vmem=16.1G
To request 32GB
#$ -l mem_free=31G,h_vmem=32.2G
Important notes on memory
mem_free is based on the free memory available on the node at the time the job runs. If you plan to use all the memory on a node, choose 1GB less than is available; this allows for OS running overhead.
h_vmem is a hard limit on the amount of memory your job can consume before it gets cancelled. This should be set slightly higher than the amount you expect to use.
Do not try to make a node use more memory than it has. If it runs out of memory, your jobs will take many orders of magnitude longer to run, or simply never complete.
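As a rough sketch of the mem_free/h_vmem relationship described above, you can derive a hard limit from your memory estimate before submitting. The 20% headroom factor here is an illustrative convention, not a site rule:

```shell
#!/bin/bash
# Derive a hard kill limit roughly 20% above the memory the job needs.
# The 20% headroom is an illustrative choice, not cluster policy.
mem_free_mb=1024                        # estimated memory the job needs, in MB
h_vmem_mb=$(( mem_free_mb * 12 / 10 ))  # hard limit with ~20% headroom
echo "-l mem_free=${mem_free_mb}M,h_vmem=${h_vmem_mb}M"
```

The printed option string can be passed to qsub on the command line or placed in a job script after #$.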
Email notification
You can get an email notification for changes to the status of your jobs. The -M email@address option sets the address to be emailed (you can specify multiple addresses, separated by commas):
qsub -M me@lshtm.ac.uk myscript
By default no changes to status will result in email notifications, you need to additionally specify what statuses you want to be notified about using the -m option.
These are the various notification options
b (beginning of job)
e (end of job)
a (abortion or rescheduling of job)
n (never email - default)
s (suspension of job)
So to be emailed when the job finishes or is aborted/rescheduled, you would specify the following:
qsub -M me@lshtm.ac.uk -m ea myscript
For all notifications you would do
qsub -M me@lshtm.ac.uk -m eabs myscript
In a script:
#!/bin/bash
#$ -N MYHPCJOB
#$ -M me@lshtm.ac.uk -m eabs
#$ -q short.q
#$ -l mem_free=1G,h_vmem=1.2G
#$ -V -cwd
myprogram
Example notification on a finished job
Job 5013 (sim) Complete
User             = train
Queue            = serial.q@comp00.gecko.lshtm.ac.uk
Host             = comp00.gecko.lshtm.ac.uk
Start Time       = 10/14/2008 16:53:08
End Time         = 10/14/2008 20:22:20
User Time        = 02:55:55
System Time      = 00:30:35
Wallclock Time   = 03:29:12
CPU              = 03:26:30
Max vmem         = 763.133M
Exit Status      = 0
Array Jobs
You can submit one job with multiple tasks using the qsub option -t. This is an ideal method if you have a single program to run and multiple separate datasets to process. You can, however, use it in much more complicated setups, especially when combined with job dependencies.
When an array job is submitted, an environment variable $SGE_TASK_ID is populated with the current position in the array. The value of this variable will change each time the job scheduler steps through the array.
The $SGE_TASK_ID variable can be used either in the script you create to submit your job or from within your application (e.g. within your C, Java or R code).
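As a sketch of how a script can use the variable beyond simple filename substitution, each task can look up its own input in a manifest file. The manifest datasets.txt and the dataset names are illustrative, not part of the cluster setup; on the cluster, the #$ -t option sets $SGE_TASK_ID for each task:

```shell
#!/bin/bash
# Demo of per-task input selection with $SGE_TASK_ID.
# datasets.txt is an illustrative manifest listing one dataset per line;
# on the cluster it would already exist alongside your data.
printf '%s\n' alpha.dat beta.dat gamma.dat > datasets.txt

# Outside the grid engine SGE_TASK_ID is unset, so default to task 1 for the demo.
SGE_TASK_ID=${SGE_TASK_ID:-1}

# Each task reads the line of the manifest matching its task ID.
dataset=$(sed -n "${SGE_TASK_ID}p" datasets.txt)
echo "Task ${SGE_TASK_ID} would process ${dataset}"
```

With #$ -t 1-3, task 2 would pick beta.dat, task 3 gamma.dat, and so on.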
Example array job scripts
Simple 10 iteration loop (-t 1-10)
#!/bin/bash
#$ -N ARRAY_TEST_JOB
#$ -cwd -V
#$ -q short.q
#$ -l mem_free=1G,h_vmem=1.2G
#$ -t 1-10
myProgram dataset.${SGE_TASK_ID}.dat
This example would submit 10 tasks to the job queue; the effective output would be:
myProgram dataset.1.dat
myProgram dataset.2.dat
myProgram dataset.3.dat
myProgram dataset.4.dat
myProgram dataset.5.dat
myProgram dataset.6.dat
myProgram dataset.7.dat
myProgram dataset.8.dat
myProgram dataset.9.dat
myProgram dataset.10.dat
Simple loop to 12 in steps of 2 (-t 2-12:2)
#!/bin/bash
#$ -N ARRAY_TEST_JOB
#$ -cwd -V
#$ -q short.q
#$ -t 2-12:2
myProgram dataset.${SGE_TASK_ID}.dat
This example would submit 6 tasks to the job queue; the effective output would be:
myProgram dataset.2.dat
myProgram dataset.4.dat
myProgram dataset.6.dat
myProgram dataset.8.dat
myProgram dataset.10.dat
myProgram dataset.12.dat
Job Dependencies
You can specify that your job will not run until another job has completed:
qsub -hold_jid <jobids> myscript
Examples
qsub -hold_jid 5204 myscript
qsub -hold_jid 5230,5236,5302 myscript
You can also specify the dependent jobs by their job name rather than their ID:
qsub -hold_jid myjob1,myjob2,myjob3 myscript
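When fixed job names are not convenient, qsub's -terse option (which makes qsub print only the new job's ID) can be used to capture IDs for -hold_jid. This is a sketch, not a site recipe: prepare.sh and analyse.sh are illustrative script names, and the stub fallback exists only so the sketch can be tried on a machine without a grid engine:

```shell
#!/bin/bash
# Chain two jobs by captured job ID instead of a fixed name.
SUBMIT=qsub
if ! command -v qsub >/dev/null 2>&1; then
    # No grid engine on this machine: use a stub that prints a fake job ID.
    fake_qsub() { echo 5204; }
    SUBMIT=fake_qsub
fi

# -terse makes qsub print only the ID of the job it just submitted.
prep_id=$($SUBMIT -terse -q short.q prepare.sh)
echo "analyse step will wait for job ${prep_id}"
# $SUBMIT -q short.q -hold_jid "$prep_id" analyse.sh
```

This avoids name clashes when the same pipeline is submitted several times concurrently.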
Parallel Jobs - OpenMP/SMP/OpenMPI
The grid engine supports jobs which require access to more than one CPU core or to multiple nodes. These jobs should be sent to the parallel.q job queue with a suitable parallel environment (pe) specified. Please see the Parallel Jobs page for more details on submitting parallel jobs.