Python

It is recommended that you install and manage Python with either pyenv or Conda rather than use the system Python. This puts you in control of the Python version and of when it gets updated. Python virtual environments let you keep several reusable environments, each containing Python packages of different versions or compatibilities.

Managing packages with virtual environments

PyEnv

| Pyenv site

pyenv lets you easily switch between multiple versions of Python. It's simple, unobtrusive, and follows the UNIX tradition of single-purpose tools that do one thing well. It is recommended to use Python virtual environments to create isolated Python environments where you can install any combination of packages you need without worrying about dependencies, versions and, indirectly, permissions.

pyenv-virtualenv

| Pyenv-virtualenv site

pyenv-virtualenv is a pyenv plugin that provides features to manage virtualenvs and conda environments for Python on UNIX-like systems.

Install

git clone https://github.com/pyenv/pyenv.git ~/.pyenv
echo 'export PYENV_ROOT="$HOME/.pyenv"' >> ~/.bash_profile
echo 'export PATH="$PYENV_ROOT/bin:$PATH"' >> ~/.bash_profile
echo -e 'if command -v pyenv 1>/dev/null 2>&1; then\n  eval "$(pyenv init -)"\nfi' >> ~/.bash_profile
exec "$SHELL"

Now you can install your preferred version of Python

pyenv install 3.7.4

Anaconda/Conda/Miniconda

| conda site

Package, dependency and environment management for any language: Python, R, Ruby, Lua, Scala, Java, JavaScript, C/C++, FORTRAN

Conda is an open source package management system and environment management system that runs on Windows, macOS, and Linux. Conda quickly installs, runs and updates packages and their dependencies. Conda easily creates, saves, loads, and switches between environments on your local computer. It was created for Python programs but it can package and distribute software for any language.

See also Conda (R/Python package management)

Running Python Jobs on the HPC

You need to submit the job to either long.q or parallel.q. If your code only uses one CPU, use long.q; use parallel.q if your code can use multiple CPUs/threads. When submitting to parallel.q you must know exactly how many CPUs/threads your code will consume and specify that number in your job script. This lets the HPC know how much resource your jobs will consume, so that it only runs as many jobs as a compute node can handle. The same applies to the amount of memory your job consumes.
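One way to keep the code in step with the number of CPUs requested in the job script is to read that number from the environment rather than hard-coding it. The sketch below assumes the scheduler exports an NSLOTS variable to the job (Grid Engine normally does; check this on your cluster). The names allocated_slots and square are hypothetical and used for illustration only.

```python
import os
from multiprocessing import Pool


def allocated_slots(default=1):
    """Return the number of slots granted to this job.

    Assumes the scheduler exports NSLOTS; falls back to `default`
    when the variable is missing or not a number.
    """
    try:
        slots = int(os.environ.get("NSLOTS", default))
    except ValueError:
        slots = default
    return max(1, slots)


def square(x):
    # Placeholder for the real per-item work.
    return x * x


if __name__ == "__main__":
    workers = allocated_slots()
    # Never start more worker processes than the job script requested.
    with Pool(processes=workers) as pool:
        results = pool.map(square, range(8))
    print(workers, results)
```

Run outside the scheduler, the script falls back to a single worker, so the same file can be tested on a login node before it is submitted.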

First set up your Python environment and use pip to install dependencies

Install pyenv as instructed above or install Miniconda (see advanced below).

Install the version of Python you need

pyenv install 3.6.4

Once the install has finished, check which versions are available

pyenv versions
python --version

Next use pip to install dependencies

pip install <package>
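Before submitting a job it can help to confirm, from Python itself, which interpreter is active and whether the packages you installed can be imported. The following is a minimal sketch using only the standard library; the function name missing_packages and the package list are placeholders, not part of any HPC tooling.

```python
import importlib.util
import sys


def missing_packages(names):
    """Return the top-level package names from `names` that cannot be imported here."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # Which interpreter is running, and which version is it?
    print("interpreter:", sys.executable)
    print("version:", ".".join(str(part) for part in sys.version_info[:3]))

    # Hypothetical requirement list; replace with the packages your job needs.
    required = ["json", "numpy"]
    absent = missing_packages(required)
    if absent:
        print("missing:", ", ".join(absent))
```

If the interpreter path printed is not inside your pyenv or Conda directory, the job would run against the system Python instead of your environment.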

Advanced - third party libraries

If you need Python packages that depend on third party libraries, you are advised to use Anaconda or Miniconda, as they provide pre-compiled third party libraries. You need to read the documentation for each, linked above, or Conda (R/Python package management).

Submitting the job with a job script

Create a text file called 'jobscript' (if you create this on Windows, run the command 'dos2unix jobscript' once the file is on the HPC).

#!/bin/bash
#$ -N PYTHONJOB
#$ -V -cwd
#$ -q long.q 
#$ -l mem_free=1G,h_vmem=1.2G

# Note: you may need to load or activate the Python version or virtual environment you want to use here (the command depends on whether you use pyenv or Anaconda), for example
# activate python-3.2.4
 
python name_of_script.py

Then submit the job with the following command

qsub jobscript

Notes

This is a single thread/cpu job on the HPC ( #$ -q long.q )

The job uses a maximum of 1G of RAM ( #$ -l mem_free=1G,h_vmem=1.2G ). If you need more, adjust the values; for example, if you need 16 GB ( #$ -l mem_free=16G,h_vmem=16.2G )
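If you do not know how much memory your code needs, one rough approach is to run it once on a small input and print its peak memory use at the end. The sketch below assumes Linux, where the standard library's resource module reports ru_maxrss in kilobytes. The helper suggest_limits is hypothetical: it simply rounds up to whole gigabytes and repeats the 0.2G headroom pattern used in the examples above. Resident memory is only a lower bound for h_vmem, which limits virtual memory, so treat the result as a starting point.

```python
import math
import resource
import sys


def peak_memory_gib():
    """Peak resident memory of this process so far, in GiB.

    Assumes Linux, where ru_maxrss is reported in kilobytes
    (macOS reports bytes instead).
    """
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    bytes_per_unit = 1 if sys.platform == "darwin" else 1024
    return peak * bytes_per_unit / 1024**3


def suggest_limits(peak_gib):
    """Build a '-l' resource string from a measured peak (illustrative only)."""
    whole_gb = max(1, math.ceil(peak_gib))
    return f"mem_free={whole_gb}G,h_vmem={whole_gb + 0.2:.1f}G"


if __name__ == "__main__":
    data = [0] * 1_000_000  # stand-in for the real workload
    peak = peak_memory_gib()
    print(f"peak: {peak:.2f} GiB")
    print("#$ -l " + suggest_limits(peak))
```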

Running Task array jobs

Example SGE job script (One job with 10 tasks)

#!/bin/bash
#$ -N ARRAY_TEST_JOB
#$ -cwd -V
#$ -q short.q
#$ -l mem_free=1G,h_vmem=1.2G
#$ -t 1-10

python mycode.py

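To use the current SGE task number in your Python code, read it from the environment and assign it to a variable.

#!/usr/bin/env python
import os

# SGE_TASK_ID holds the task number (1 to 10 for the job script above).
# JOB_ID is the same for every task in the array, so it cannot tell tasks apart.
taskid = int(os.environ['SGE_TASK_ID'])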
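A common use of the task number is to give each task its own share of the input. The sketch below assumes Grid Engine sets SGE_TASK_ID for array jobs and leaves it unset, or set to the literal text "undefined", otherwise. The file names and the helpers task_index and items_for_task are hypothetical.

```python
import os


def task_index(default=1):
    """1-based task number; assumes Grid Engine sets SGE_TASK_ID for array jobs."""
    raw = os.environ.get("SGE_TASK_ID", "")
    # Outside an array job the variable is unset or not a number.
    return int(raw) if raw.isdigit() else default


def items_for_task(items, task, n_tasks):
    """Every n_tasks-th item, starting at the task's own offset."""
    return items[task - 1::n_tasks]


if __name__ == "__main__":
    # Hypothetical inputs; '-t 1-10' in the job script gives 10 tasks.
    inputs = [f"sample_{i:02d}.csv" for i in range(25)]
    for name in items_for_task(inputs, task_index(), n_tasks=10):
        print("processing", name)
```

With this striding scheme every input is handled by exactly one task, even when the number of inputs is not a multiple of the number of tasks.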