R
R is a language and environment for statistical computing and graphics. It is a GNU project which is similar to the S language and environment which was developed at Bell Laboratories (formerly AT&T, now Lucent Technologies) by John Chambers and colleagues. R can be considered as a different implementation of S. There are some important differences, but much code written for S runs unaltered under R.
R provides a wide variety of statistical (linear and nonlinear modelling, classical statistical tests, time-series analysis, classification, clustering, ...) and graphical techniques, and is highly extensible. The S language is often the vehicle of choice for research in statistical methodology, and R provides an Open Source route to participation in that activity.
One of R's strengths is the ease with which well-designed publication-quality plots can be produced, including mathematical symbols and formulae where needed. Great care has been taken over the defaults for the minor design choices in graphics, but the user retains full control.
Website: http://www.r-project.org/
Documentation/Tutorials
R Versions
The core R install is upgraded regularly. If an upgrade causes issues with your packages, older versions are kept available for use. If you need a different version, please ask.
The module command is used to load and unload versions of R.
List available versions of R
Type module avail
The available versions are listed as R/?.?.?, where ?.?.? is the version number, e.g. 3.4.1 or 3.5.0
module avail
------------------------------ /usr/local/modulefiles ------------------------------
dot  module-git  module-info  modules  null  R/3.4.1  R/3.4.4  R/3.5.2  use.own
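On a system with many modules it can help to filter the listing down to the R entries. The following is a minimal sketch, not a site-provided tool: the helper name list_r_versions is hypothetical, and the usage line assumes that module avail writes its listing to stderr, as Environment Modules commonly does.

```shell
#!/bin/bash
# Hypothetical helper: print each R/x.y.z entry found on standard input,
# one per line, in the order it appears.
list_r_versions() {
    # Match "R/<version>" only at the start of a line or after whitespace,
    # then strip the leading blank that the match may carry.
    grep -oE '(^|[[:space:]])R/[0-9]+(\.[0-9]+)*' | tr -d '[:blank:]'
}

# Usage on the cluster (assumes the listing goes to stderr):
#   module avail 2>&1 | list_r_versions
```

Each line of output can be passed directly to module load.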
Load a specific version of R
To load version 3.4.1 you would type
module load R/3.4.1
or for version 3.5.2
module load R/3.5.2
Loading in your qsub script
#!/bin/bash
#$ -N R_JOB
#$ -M me@lshtm.ac.uk -m be
#$ -q short.q
#$ -l mem_free=1G,h_vmem=1.2G
#$ -V -cwd
module load R/3.4.1
R CMD BATCH myrscript myrscript.out
Main R Project Wiki
http://wiki.r-project.org/rwiki/doku.php
Moving from Stata to R
http://wiki.r-project.org/rwiki/doku.php
Installing R libraries for your account on the HPC
Important notes
- Installing packages - There are limits on hpclogin to stop users consuming too much resource (CPU/memory). You can install small packages interactively in R, but larger, more complex packages need to be installed via qsub as a job on the HPC.
- Do not call install.packages() in job scripts that run calculations; use library() to load packages there. Either install interactively or run a separate job just for installing the required packages.
- You install them from CRAN with install.packages("x").
- You use them in R with library("x").
First time install of packages - Once per Major.Minor version - create a personal library folder (i.e. 3.4.? or 3.5.?)
The first time you install a package, run R interactively (log in to hpclogin, run module load R/?.?.? and then type R), as R will prompt you to create a personal library. You need to answer yes twice.
Run a simple install for a small package
install.packages('Rcpp', repos="http://cran.ma.imperial.ac.uk/")
Warning in install.packages("Rcpp", repos = "http://cran.ma.imperial.ac.uk/", :
  'lib = "/usr/local/packages/apps/R/3.5.2/lib64/R/library"' is not writable
Would you like to use a personal library instead? (yes/No/cancel) yes
Would you like to create a personal library ‘/home/aitsswhi/R/x86_64-library/3.5’ to install packages into? (yes/No/cancel) yes
Install and use packages
Run job to install packages
Qsub Script
- (Use short.q unless a large number of packages are being installed, in which case use long.q.)
- build.out will contain a log of the install to check it completed OK or locate any errors
#!/bin/bash
#$ -N R_BUILD_JOB
#$ -M your.email@lshtm.ac.uk -m be
#$ -q short.q
#$ -l mem_free=4G,h_vmem=4.2G
#$ -V -cwd
module load R/3.5.3
which R
R CMD BATCH build.R build.out
build.R
install.packages("ggplot2", repos = "https://cloud.r-project.org/", dependencies = TRUE)
install.packages("arm", repos = "https://cloud.r-project.org/", dependencies = TRUE)
install.packages("zoo", repos = "https://cloud.r-project.org/", dependencies = TRUE)
install.packages("coda", repos = "https://cloud.r-project.org/", dependencies = TRUE)
# stats is part of base R, so it does not need to be installed
install.packages("sna", repos = "https://cloud.r-project.org/", dependencies = TRUE)
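Once the build job has finished, build.out can be scanned for failures instead of being read from start to finish. The following is a minimal sketch: the function name check_build_log is hypothetical, and the patterns assume the messages R usually prints when an install fails (lines starting with "ERROR:" and warnings mentioning "non-zero exit status"). A clean result is a good sign rather than proof that every package installed.

```shell
#!/bin/bash
# Hypothetical helper: report lines in an R install log that look like failures.
# Returns 0 when nothing suspicious is found, 1 otherwise.
check_build_log() {
    local log="${1:-build.out}"
    if [ ! -r "$log" ]; then
        echo "cannot read $log" >&2
        return 1
    fi
    # grep -n prefixes each hit with its position in the log file.
    if grep -nE '^ERROR:|non-zero exit status' "$log"; then
        return 1
    fi
    echo "no install errors found in $log"
}

# Usage: check_build_log build.out
```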
Using installed library/package in your code
Add this to the top of your R script file.
library("ggplot2")
library("arm")
library("zoo")
library("coda")
library("stats")
library("sna")
Package install locations (R_LIBS_USER)
Your packages need to be installed to a folder for which you have write permission.
By default R will add packages to the R folder in your home directory ~/R (which is shorthand for /home/username/R). This folder must exist.
R will then create a sub folder for each major.minor version of R (~/R/x86_64-library/3.4 or ~/R/x86_64-library/3.5).
Note: the library folder is shared by all releases with the same major.minor version (e.g. 3.5); the patch number (e.g. the final 1 in 3.5.1) does not affect where libraries are installed.
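Because the library folder depends only on the major.minor part of the version, the folder that a given R module will use can be worked out in the shell. This is a minimal sketch: the function name r_user_lib is hypothetical, and it assumes the ~/R/x86_64-library layout described above.

```shell
#!/bin/bash
# Hypothetical helper: print the personal library folder for an R version.
# Assumes the ~/R/x86_64-library/<major.minor> layout described on this page.
r_user_lib() {
    local version="$1"                 # e.g. 3.5.2
    local major_minor="${version%.*}"  # drop the patch number, e.g. 3.5
    echo "$HOME/R/x86_64-library/$major_minor"
}

# Usage: both patch releases map to the same folder.
r_user_lib 3.5.1
r_user_lib 3.5.2
```

This can be useful for checking that the folder exists before submitting a job that relies on packages installed there.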
Override R_LIBS_USER
You can set R_LIBS_USER to any path you like within your home space, or share a library with another user in their home space.
Create/edit ~/.Renviron
default
R_LIBS_USER=~/R/x86_64-library/%v
You could also add custom R_LIBS_USER to your job script (for this to work you may want to not specify R_LIBS_USER in .Renviron)
#!/bin/bash
#$ -N R_JOB
#$ -M me@lshtm.ac.uk -m be
#$ -q short.q
#$ -l mem_free=1G,h_vmem=1.2G
#$ -V -cwd
module load R/3.4.1
# export the variable so that the R process started below can see it
export R_LIBS_USER=~/R-custom-packages/%v
R CMD BATCH myrscript myrscript.out
Compiler overrides for R Packages
Create the file ~/.R/Makevars
CXX14FLAGS=-O2 -march=native -mtune=native -fPIC
CC=gcc
CXX=g++
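The folder and file can be created from the shell in one step. The following is a minimal sketch using the settings shown above: the function name write_makevars is hypothetical, and it refuses to overwrite an existing Makevars so that any settings already in place are kept.

```shell
#!/bin/bash
# Hypothetical helper: create <home>/.R/Makevars with the settings from this page.
# Does nothing (and returns 1) if the file already exists.
write_makevars() {
    local home_dir="${1:-$HOME}"
    local target="$home_dir/.R/Makevars"
    if [ -e "$target" ]; then
        echo "$target already exists, leaving it unchanged" >&2
        return 1
    fi
    mkdir -p "$home_dir/.R"
    cat > "$target" <<'EOF'
CXX14FLAGS=-O2 -march=native -mtune=native -fPIC
CC=gcc
CXX=g++
EOF
}

# Usage: write_makevars        (writes to ~/.R/Makevars)
```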
Using a different version of GCC
Several major versions of GCC are available via environment modules
To see all available modules including GCC
module avail
To load the version you want
module load gcc/6.5.0
IMPORTANT: depending on the package and how it compiles, you may need to load the same module(s) before you use the package. You can load modules automatically by adding the module load commands to your qsub script, or to the bottom of ~/.bashrc.
Running R program from file
R CMD BATCH rscriptfile outputfile
Example
R CMD BATCH myrscript myrscript.out
Your results will be saved to a file called myrscript.out
If you run multiple jobs at the same time from the same folder, they can clash when each one tries to save the workspace. To avoid this, use
R CMD BATCH --no-save --no-restore myrscript myrscript.out
Running R program on HPC
Example job script saved as myrjob
#!/bin/bash
#$ -N R_JOB
#$ -M me@lshtm.ac.uk -m be
#$ -q short.q
#$ -l mem_free=1G,h_vmem=1.2G
#$ -V -cwd
R CMD BATCH myrscript myrscript.out
Submitting job
qsub myrjob
Array Job
Array jobs allow you to submit one job with multiple tasks; each task can then read its task id and use it in scripts. You might use 10 tasks to process 10 different data files, or to run your simulation with 10 different parameters.
qsub -t 1-10
See array jobs qsub
Job Script
#!/bin/bash
#$ -N ARRAY_TEST_JOB
#$ -q short.q
#$ -cwd -V
#$ -l mem_free=1G,h_vmem=1.2G
#$ -t 1-10
R CMD BATCH myrscript myrscript${SGE_TASK_ID}.out
-t 1-10 specifies the sequential task ids 1,2,3,4,5,6,7,8,9,10. This can be any range, e.g. 1-5 or 1-1286.
This will submit your job with 10 tasks (1-10) and will create 10 R output files, i.e.:
myrscript1.out .... myrscript10.out
R Script
To use the task id in your R script you need to use Sys.getenv("SGE_TASK_ID")
taskIdChar <- Sys.getenv("SGE_TASK_ID")
taskIdInteger <- as.integer(taskIdChar)
# paste0 joins the parts without inserting spaces
dataFilename <- paste0("coredate-", taskIdChar, ".dta")
taskIdChar is the string/char value of the current taskid
taskIdInteger is the current task id converted to an integer number
dataFilename is a string that includes the task id, naming a file to read from or write to
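The same task id is available to the job script itself, which can be handy for checking that an input file exists before R starts. This is a minimal sketch, assuming the coredate-<id>.dta naming used above; the function name data_file_for_task is hypothetical. Outside a running array task SGE_TASK_ID is not a usable number (it may be unset, or hold a placeholder string), so the sketch rejects anything that is not a positive integer.

```shell
#!/bin/bash
# Hypothetical helper: print the data file name for an array task id.
# Rejects empty values, non-digits, zero and ids with leading zeros.
data_file_for_task() {
    local task_id="$1"
    case "$task_id" in
        ''|*[!0-9]*|0*)
            echo "not a valid task id: '$task_id'" >&2
            return 1
            ;;
    esac
    echo "coredate-${task_id}.dta"
}

# Usage inside an array job script:
#   data_file=$(data_file_for_task "$SGE_TASK_ID") || exit 1
#   [ -r "$data_file" ] || { echo "missing $data_file" >&2; exit 1; }
```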
Example R Program
Save the following to a file called myrscript to work with example instructions for running R jobs.
library("sna")
library("network")

# These are the names of the actors
labels <- c("Allison", "Drew", "Ross", "Sarah", "Eliot", "Keith")
net <- network.initialize(6)

# Label the vertices
net %v% "vertex.names" <- labels

# Data on page 123.
add.edges(net, c(1,1,2,2,5,6,3,4), c(2,3,4,5,2,3,4,2))
degree(net, cmode="outdegree")
degree(net, cmode="indegree")

# Note that the variance of indegree and outdegree, on page 128, can be calculated with:
var(degree(net, cmode="outdegree"))
var(degree(net, cmode="indegree"))
Guides
[Build RStan]