Parallel Jobs
Parallel queue
The parallel queue is for running two types of parallel environment job: Shared Memory/OpenMP (single node) and OpenMPI (single or multiple nodes).
Basic job spec
#$ -q parallel.q
#$ -pe smp 4
#$ -R y
or
#$ -q parallel.q
#$ -pe openmpi 16
#$ -R y
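Here -q selects the parallel queue, -pe requests the parallel environment and the number of slots, and -R y turns on slot reservation so that large parallel jobs are not indefinitely starved by smaller jobs.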
Running Shared Memory or OpenMP job (single node, most common)
For OpenMP jobs - The environment variable OMP_NUM_THREADS must be set to define the number of threads the program should use. Use as many threads as you have requested processors, e.g. 8.
Memory requests - The memory specified by h_vmem is per slot, so if you ask for 8 slots (-pe smp 8) then the node requires 8 x h_vmem (8 x 1GB = 8GB).
The "#$ -pe smp 2" line specifies that the job should run on one node and will consume two slots/CPUs.
Example submission script (8 CPUS/Slots for 8 OpenMP threads)
#!/bin/bash
#$ -cwd -V
#$ -l mem_free=1G,h_vmem=1G
#$ -q parallel.q
#$ -pe smp 8
#$ -R y
export OMP_NUM_THREADS=8
myOpenMPapp
Example of requesting two CPUs (non-OpenMP)
#!/bin/bash
#$ -cwd -V
#$ -l mem_free=1G,h_vmem=1G
#$ -q parallel.q
#$ -pe smp 2
#$ -R y
myMultiCpuApp
To submit the job to the queue:
qsub script
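For example, assuming the script above is saved as myjob.sh (an illustrative name), a typical submit-and-check session looks like:

qsub myjob.sh
qstat -u $USER

qstat -u lists the state of your pending and running jobs.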
OpenMP
Shared memory, single node. This parallelises across the cores of a single node and can often be achieved by just compiling your code with OpenMP flags. However, better performance can be achieved if your programs are written with OpenMP in mind.
Compiling
Enabling OpenMP while compiling:
GNU: gfortran -fopenmp -o <exec> <src>
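Other compilers use analogous flags; as a sketch (assuming the Intel compilers of this era, which accept -openmp, later renamed -qopenmp):

GNU C: gcc -fopenmp -o <exec> <src>
Intel Fortran: ifort -openmp -o <exec> <src>
Intel C: icc -openmp -o <exec> <src>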
OpenMPI
Distributed memory, multiple nodes. This method of parallelism requires you to write your programs to work with OpenMPI.
Memory requests - The memory specified by h_vmem is per slot, so if you ask for 8 slots (-pe openmpi 8) then 8 x h_vmem (8 x 1GB = 8GB) is required.
OpenMPI documentation for Grid Engine - https://www.open-mpi.org/faq/?category=sge (note: the parallel environment here is called openmpi, not orte)
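A minimal sketch of an OpenMPI submission script (myMPIapp is a placeholder; $NSLOTS is set by SGE to the number of granted slots, and an SGE-aware OpenMPI mpirun reads the host list from $PE_HOSTFILE automatically):

#!/bin/bash
#$ -cwd -V
#$ -l mem_free=1G,h_vmem=1G
#$ -q parallel.q
#$ -pe openmpi 16
#$ -R y
mpirun -np $NSLOTS myMPIapp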
Process that a parallel job goes through
- SGE produces a list of hosts in $PE_HOSTFILE (see the example below)
- SGE executes a "start" script for the PE
- SGE runs the user's job script
- On termination a "stop" script is executed
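Each line of $PE_HOSTFILE holds a host name, the number of slots granted on that host, the queue instance, and a processor range. For example (host names hypothetical):

node001 4 parallel.q@node001 <NULL>
node002 4 parallel.q@node002 <NULL>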
Scripts that are automatically used are in
/usr/local/sge6.0/streamline/mpi/
The ompi_start.sh script
#!/bin/sh
# Local info
# This is executed on the front end server
#
SERVER=`hostname -s`

# Count the processors on this host from /proc/cpuinfo
function ncpus()
{
    n=`cat /proc/cpuinfo | grep processor | tail -1 | cut -f 2- -d :`
    echo $((n+1))
    return
}

SMP=${SMP:-`ncpus`}

. /etc/profile
#
# SGE passes the PE hostfile and the job ID as arguments
pe_hostfile=$1
echo $pe_hostfile
cat $pe_hostfile
job_id=$2

# Pick a writable home area for the machine file
user=`basename ${HOME}`
if [ -d /users/$user ]; then
    user_dir=/users/$user
else
    user_dir=${HOME}
fi

mpich_dir=${user_dir}/.mpich
mkdir -p $mpich_dir

# Write the sorted list of allocated host names for this job
cat $pe_hostfile | cut -f 1 -d " " | sort > $mpich_dir/mpich_hosts.$job_id
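The net effect is a machine file at ${user_dir}/.mpich/mpich_hosts.<job_id> containing the sorted host names allocated to the job, one per line, e.g. (hypothetical hosts):

node001
node002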
Compiling OpenMPI
Set up the system environment, using modules to load the OpenMPI libraries you want to use. This is likely to be a choice between GNU and Intel.
Compile using the wrapper scripts:
Intel
With the module openmpi/1.2.6-1/intel loaded
/opt/openmpi-1.2.6-1/intel/bin/
mpic++, mpiCC, mpicc, mpicxx, mpif77, mpif90
GNU
With the module openmpi/1.2.6-1/gnu loaded
/opt/openmpi-1.2.6-1/gcc/bin/
mpic++, mpiCC, mpicc, mpicxx, mpif77, mpif90
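For example, to build an MPI C program with the GNU toolchain (myMPIapp.c is a placeholder source file):

module load openmpi/1.2.6-1/gnu
mpicc -O2 -o myMPIapp myMPIapp.c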
Submitting an OpenMPI Job
Run using the batch scheduler:
ompisub C <app> (C = cores)
ompisub 8 <prog> <args> (8 cores, which equals one node)
ompisub NxC <app> (N = nodes, C = cores)
ompisub 4x3 <prog> <args> (4 nodes, each set up to use 3 cores)