Parallel Jobs


Parallel queue

The parallel queue is for running two types of parallel environment job: Shared Memory/OpenMP (single node) and OpenMPI (single/multiple nodes).

Basic job spec

#$ -q parallel.q
#$ -pe smp 4
#$ -R y

or

#$ -q parallel.q
#$ -pe openmpi 16
#$ -R y

Important notes

Your code MUST limit the number of CPUs/cores/threads/processes used to the same number of slots requested.

If you don't do this, your code may run on the same host as other users' jobs and will consume more resources than you requested, affecting their work. This can lead to resource starvation, causing your jobs and other users' jobs to take far longer or even crashing the host.

You cannot assume the number of CPUs on a host (we have various combinations, for example 12- or 32-CPU hosts), so you cannot simply assume that the number of slots requested will get you exclusive use of a host and save you from limiting your code. Requesting 32 slots will severely limit the hosts your job can run on, and the job will take a long time to start running.
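Rather than hard-coding the thread count, you can derive it from the allocation. A minimal sketch, assuming a Grid Engine system where the scheduler exports $NSLOTS into the job environment:

# Cap threads at the number of slots Grid Engine allocated to this job;
# $NSLOTS is set by the scheduler for parallel environment jobs.
export OMP_NUM_THREADS=$NSLOTS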

Jobs take a long time to start

The more CPUs/slots you request, the longer you will have to wait for a host that has that many free slots. You will be waiting for multiple jobs to finish so the scheduler can give you the requested resources for your job.

The HPC contains hosts with different numbers of slots (CPUs/cores); some have 8, 12 or 32, for example. The more slots you request, the fewer hosts can run the job. If you request 20 slots, the job cannot run on a host with fewer than 20 CPUs. The exception to this rule is OpenMPI jobs, which are designed to run across multiple hosts.

Running a Shared Memory or OpenMP job (single node, most common)

For OpenMP jobs - the environment variable OMP_NUM_THREADS must be set to define the number of threads the program should use. Set it to the number of slots you requested, e.g. 8.

Memory requests - The memory specified by h_vmem is per slot, so if you ask for 8 slots (-pe smp 8) then the node requires 8 x h_vmem (8 x 1GB = 8GB).

The "#$ -pe smp 2" directive specifies that the job should run on one node and will consume two slots/CPUs.
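For instance, a minimal sketch of how the per-slot limit and the slot count combine (the values here are illustrative):

#$ -l mem_free=2G,h_vmem=2G    # limits are per slot
#$ -pe smp 4                   # 4 slots on one node: the node must provide 4 x 2G = 8G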

Example submission script (8 CPUs/slots for 8 OpenMP threads)

#!/bin/bash
#$ -cwd -V                     # run in the current directory, export the environment
#$ -l mem_free=1G,h_vmem=1G    # memory limits are per slot
#$ -q parallel.q
#$ -pe smp 8                   # 8 slots on a single node
#$ -R y                        # reserve slots while waiting to start
export OMP_NUM_THREADS=8       # match the slot count requested above
myOpenMPapp

Example of requesting two CPUs (non-OpenMP)

#!/bin/bash
#$ -cwd -V
#$ -l mem_free=1G,h_vmem=1G
#$ -q parallel.q
#$ -pe smp 2
#$ -R y
myMultiCPUApp

To submit the job to the queue

qsub script
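After submission you can check progress with qstat (standard Grid Engine commands; output details vary by site):

qstat              # list your jobs; state 'qw' means queued/waiting, 'r' means running
qstat -j <job_id>  # detailed status for one job, including why it is still queued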


OpenMP

Shared memory, single node. This parallelises across the cores of a single node, which can often be achieved just by compiling your code with OpenMP flags. However, better performance can be achieved if your programs are written with OpenMP in mind.

Compiling

Enabling OpenMP while compiling:

GNU: gfortran -fopenmp -o <exec> <src>
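The GNU C and C++ compilers take the same flag (a sketch; <exec> and <src> are placeholders as above):

gcc -fopenmp -o <exec> <src>
g++ -fopenmp -o <exec> <src>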

OpenMPI

Distributed memory, multiple nodes. This method of parallelism requires you to write your programs to work with OpenMPI.


Open MPI documentation for Grid Engine - https://www.open-mpi.org/faq/?category=sge (note: our PE is openmpi, not orte)

Process that a parallel job goes through

  • SGE produces a list of hosts in $PE_HOSTFILE (see the example below)
  • SGE executes a "start" script for the PE
  • SGE runs the user's job script
  • On termination a "stop" script is executed
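Inside a running job, $PE_HOSTFILE names a small text file with one line per host: the host name, the number of slots granted on it, and the queue. A sketch of what it might look like (host names and slot counts are illustrative):

cat $PE_HOSTFILE
node001 8 parallel.q@node001 UNDEFINED
node002 8 parallel.q@node002 UNDEFINED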

Compiling OpenMPI

Set up the system environment, using modules to load the OpenMPI libraries you want to use. This is likely to be a choice between GNU and Intel.
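For example, using the standard environment-modules commands (the exact module name is site-specific, so <openmpi-module> is a placeholder):

module avail                  # list the available modules; look for the OpenMPI entries
module load <openmpi-module>  # load the chosen OpenMPI/compiler combination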

Compile using the wrapper scripts

GNU

With the module loaded (currently Open MPI 1.6.4), the following compiler wrappers are available:

mpic++, mpiCC, mpicc, mpicxx, mpif77, mpif90
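For example, compiling a C source file through the wrapper (hello.c and hello are placeholder names; the wrapper adds the Open MPI include and library paths for you):

mpicc -o hello hello.c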

Submitting OpenMPI Job

You should specify the parallel environment openmpi (-pe openmpi x).

Memory requests - The memory specified by h_vmem is per slot, so if you ask for 8 slots (-pe openmpi 8) then 8 x h_vmem is required in total (8 x 1GB = 8GB), spread across the nodes the slots land on.

Job script (16 slots)

#!/bin/bash
#$ -cwd -V 
#$ -l mem_free=1G,h_vmem=1G
#$ -q parallel.q
#$ -pe openmpi 16
#$ -R y
mpirun -np 16 <program>

Note: the -np value should match the number of slots requested with -pe.
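To keep the two values in sync automatically, you can use the slot count exported by the scheduler. A sketch, assuming an Open MPI build integrated with Grid Engine:

mpirun -np $NSLOTS <program>   # $NSLOTS is set by SGE to the slot count requested with -pe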
