Python

From HPC

Revision as of 09:55, 10 October 2019 by Aitsswhi

It is recommended that you install and manage Python with either PyEnv or Conda rather than using the system Python. This puts you in control of the Python version and when it gets updated. Python virtual environments allow you to have various reusable environments containing packages of different versions or compatibilities.

Managing packages with virtual environments

PyEnv

| Pyenv site

pyenv lets you easily switch between multiple versions of Python. It's simple, unobtrusive, and follows the UNIX tradition of single-purpose tools that do one thing well. It is recommended to use Python virtual environments to create isolated environments where you can install any combination of packages you need without worrying about dependencies, versions, and, indirectly, permissions.

pyenv-virtualenv

| Pyenv-virtualenv site

pyenv-virtualenv is a pyenv plugin that provides features to manage virtualenvs and conda environments for Python on UNIX-like systems.
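A typical workflow with pyenv-virtualenv looks like the following sketch; the Python version and the environment name myproject are examples, not site defaults:

```shell
# Assumes pyenv and the pyenv-virtualenv plugin are already installed.
pyenv install 3.8.0

# Create a virtualenv named "myproject" based on that Python version.
pyenv virtualenv 3.8.0 myproject

# Switch the current shell into the environment, and leave it when done.
pyenv activate myproject
pyenv deactivate
```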

Anaconda/Conda/Miniconda

| conda site

Package, dependency and environment management for any language: Python, R, Ruby, Lua, Scala, Java, JavaScript, C/C++, Fortran

Conda is an open source package management system and environment management system that runs on Windows, macOS, and Linux. Conda quickly installs, runs and updates packages and their dependencies. Conda easily creates, saves, loads, and switches between environments on your local computer. It was created for Python programs but it can package and distribute software for any language.
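Creating and switching between Conda environments follows the same pattern; the environment name and packages below are illustrative:

```shell
# Create an environment named "myproject" with a chosen Python and packages.
conda create --name myproject python=3.8 numpy pandas

# Activate it, inspect what is installed, then deactivate when finished.
conda activate myproject
conda list
conda deactivate
```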

See also Conda (R/Python package management)


Running Python Jobs on the HPC

You need to submit the job to either the long.q or the parallel.q: use long.q if your code uses a single CPU, and parallel.q if it can use multiple CPUs/threads. When submitting to the parallel.q you must know exactly how many CPUs/threads your code will consume and specify that number in your job script. This lets the HPC know how much resource will be consumed when running your jobs, so it only runs as many jobs as a compute node can handle. The same applies to the amount of memory your job consumes.
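A parallel.q submission reserves its CPU count through a parallel environment directive. The sketch below assumes a parallel environment named "smp"; the actual name is site-specific, so check it with `qconf -spl` before using it:

```shell
#!/bin/bash
#$ -N PARALLEL_TEST_JOB
#$ -cwd -V
#$ -q parallel.q
# Reserve 4 slots; "smp" is an assumed parallel environment name.
#$ -pe smp 4
#$ -l mem_free=4G,h_vmem=4.5G

python mycode.py
```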

First set up your Python environment and use pip to install dependencies.
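For example, with a pyenv virtualenv active, dependencies install into that environment rather than the system Python; the environment name and packages here are hypothetical:

```shell
pyenv activate myproject
pip install numpy scipy

# Or, if you keep a requirements file for the project:
pip install -r requirements.txt
```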

Running Task array jobs

Example SGE job script (One job with 10 tasks)

#!/bin/bash
# Job name
#$ -N ARRAY_TEST_JOB
# Run from the current directory and export the current environment
#$ -cwd -V
#$ -q short.q
# Memory request and hard limit per task
#$ -l mem_free=1G,h_vmem=1.2G
# Run this script as 10 tasks, numbered 1 to 10
#$ -t 1-10

python mycode.py

To use the current SGE task number in your Python code, read the SGE_TASK_ID environment variable and assign it.

#!/usr/bin/env python
import os

# SGE exports the array task number as SGE_TASK_ID
# (JOB_ID holds the job's ID, which is the same for every task).
taskid = int(os.environ['SGE_TASK_ID'])
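A common pattern is to use the task number to select one input per task. A minimal sketch, assuming a hypothetical set of input files named sample_1.txt through sample_10.txt:

```python
import os

# SGE sets SGE_TASK_ID for each task of an array job (1..10 for "-t 1-10").
# Outside the scheduler the variable is absent, so default to "1" for testing.
taskid = int(os.environ.get('SGE_TASK_ID', '1'))

# Hypothetical inputs: one file per task in the array.
inputs = ['sample_%d.txt' % i for i in range(1, 11)]
my_input = inputs[taskid - 1]  # task IDs start at 1, list indices at 0
print(my_input)
```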