R

From HPC

Revision as of 09:57, 4 April 2019

R is a language and environment for statistical computing and graphics. It is a GNU project which is similar to the S language and environment which was developed at Bell Laboratories (formerly AT&T, now Lucent Technologies) by John Chambers and colleagues. R can be considered as a different implementation of S. There are some important differences, but much code written for S runs unaltered under R.

R provides a wide variety of statistical (linear and nonlinear modelling, classical statistical tests, time-series analysis, classification, clustering, ...) and graphical techniques, and is highly extensible. The S language is often the vehicle of choice for research in statistical methodology, and R provides an Open Source route to participation in that activity.

One of R's strengths is the ease with which well-designed publication-quality plots can be produced, including mathematical symbols and formulae where needed. Great care has been taken over the defaults for the minor design choices in graphics, but the user retains full control.

Website: http://www.r-project.org/

Documentation/Tutorials

R Versions

The core R install is upgraded regularly. If an upgrade causes problems with your packages, older versions are kept and can still be used. If you need a different version, please ask.

The module command is used to load and unload versions of R.

List available versions of R

Type module avail

The available versions will be listed as R/?.?.?, where ?.?.? is the version number, e.g. 3.4.1 or 3.5.0.

module avail
------------------------------------------------------------------------------------------------------ /usr/local/modulefiles -------------------------------------------------------------------------------------------------------
dot  module-git  module-info  modules  null  R/3.4.1  R/3.4.4  R/3.5.2  use.own

Load a specific version of R

To load version 3.4.1 you would type

module load R/3.4.1

or for version 3.5.2

module load R/3.5.2
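After loading a module, you can confirm from inside R which version is actually running, for example at the top of a batch script so the .out file records it. A minimal sketch using only base R; the minimum version "3.4.0" is an illustrative assumption, not a cluster requirement.

```r
# Print the running R version so it appears in the R CMD BATCH output file
cat("Running", R.version.string, "\n")

# Major.minor version, e.g. "3.5" for R 3.5.2; this is the part that
# selects the personal package library folder
major_minor <- paste0(R.version$major, ".", sub("\\..*$", "", R.version$minor))
cat("Library version folder:", major_minor, "\n")

# Stop early if the loaded R is older than the script needs
# (the "3.4.0" threshold is only an example)
stopifnot(getRversion() >= "3.4.0")
```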

Loading in your qsub script

#!/bin/bash
#$ -N R_JOB
#$ -M me@lshtm.ac.uk -m be
#$ -q short.q
#$ -l mem_free=1G,h_vmem=1.2G
#$ -V -cwd 

module load R/3.4.1

R CMD BATCH myrscript myrscript.out

Main R Project Wiki

http://wiki.r-project.org/rwiki/doku.php

Moving from Stata to R

http://wiki.r-project.org/rwiki/doku.php

Installing R libraries for your account on the HPC

Important notes

  • Installing larger packages on hpclogin will not work because of resource limits (CPU/memory); run these package installs as a job instead.
  • Do not use install.packages() in job scripts that run your calculations; load packages there with library(). Either install packages interactively or run a separate job just to install the packages you need.
    • You install them from CRAN with install.packages("x").
    • You use them in R with library("x").

First time install of packages for a Major.Minor version - create user library location (i.e. 3.4.? or 3.5.?)

The first time you install a package, run R interactively, as it will prompt you to answer some questions.

Run a simple install for a small package

 install.packages('Rcpp', repos="http://cran.ma.imperial.ac.uk/", dependencies = TRUE)
Warning in install.packages("Rcpp", repos = "http://cran.ma.imperial.ac.uk/",  :
 'lib = "/usr/local/packages/apps/R/3.5.2/lib64/R/library"' is not writable
Would you like to use a personal library instead? (yes/No/cancel) yes
Would you like to create a personal library
‘/home/aitsswhi/R/x86_64-library/3.5’
to install packages into? (yes/No/cancel) yes
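If you would rather not answer these prompts (for example when the first install has to run as a job), the personal library can be created up front. A minimal sketch, assuming R_LIBS_USER holds a single folder path (see Package install locations below); the helper name ensure_user_library is hypothetical.

```r
# Create the personal library folder (if missing) and put it first on the
# library search path, so install.packages() installs there without prompting.
# Assumes R_LIBS_USER holds a single directory path.
ensure_user_library <- function(path = Sys.getenv("R_LIBS_USER")) {
  if (!nzchar(path)) stop("R_LIBS_USER is not set")
  path <- path.expand(path)
  if (!dir.exists(path)) {
    dir.create(path, recursive = TRUE)
  }
  # .libPaths() only keeps folders that exist, so create first, then add
  .libPaths(c(path, .libPaths()))
  invisible(path)
}

# Usage (interactive or in an install job):
# ensure_user_library()
# install.packages("Rcpp", repos = "http://cran.ma.imperial.ac.uk/", dependencies = TRUE)
```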

Install and use packages

Run job to install packages

Qsub Script

  • Use short.q unless a large number of packages is being installed, in which case use long.q.
  • build.out will contain a log of the install, so you can check it completed OK or locate any errors.
#!/bin/bash
#$ -N R_BUILD_JOB
#$ -M your.email@lshtm.ac.uk -m be
#$ -q short.q
#$ -l mem_free=4G,h_vmem=4.2G
#$ -V -cwd

module load R/3.5.3 

which R

R CMD BATCH build.R build.out

build.R

install.packages("ggplot2", repos = "https://cloud.r-project.org/", dependencies = TRUE)
install.packages("arm", repos = "https://cloud.r-project.org/", dependencies = TRUE)
install.packages("zoo", repos = "https://cloud.r-project.org/", dependencies = TRUE)
install.packages("coda", repos = "https://cloud.r-project.org/", dependencies = TRUE)
# "stats" is not installed here: it is part of base R and is already installed
install.packages("sna", repos = "https://cloud.r-project.org/", dependencies = TRUE)
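Re-running the build job reinstalls every package each time. A minimal sketch that installs only the packages that are missing; the helper names missing_packages and install_missing are hypothetical, and the package list simply mirrors build.R above.

```r
# Packages this project needs ("stats" is left out because it ships with R)
pkgs <- c("ggplot2", "arm", "zoo", "coda", "sna")

# Return the packages from 'pkgs' that are not installed in any library
missing_packages <- function(pkgs) {
  setdiff(pkgs, rownames(installed.packages()))
}

# Install only what is missing, so re-running the build job is cheap
install_missing <- function(pkgs, repos = "https://cloud.r-project.org/") {
  todo <- missing_packages(pkgs)
  if (length(todo) > 0) {
    install.packages(todo, repos = repos, dependencies = TRUE)
  } else {
    message("All packages already installed")
  }
  invisible(todo)
}

# In build.R you would then call:
# install_missing(pkgs)
```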

Using installed library in your code

Add this to the top of your R script file.

library("ggplot2")
library("arm")
library("zoo")
library("coda")
library("stats")
library("sna")
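In a batch job, a missing package only shows up as an error partway through the .out file. A minimal sketch that checks all required packages first and stops with one clear message; the helper name check_packages is hypothetical and uses base R's requireNamespace().

```r
# Packages the analysis needs
needed <- c("ggplot2", "arm", "zoo", "coda", "sna")

# Stop with one message listing every missing package,
# instead of failing on the first library() call
check_packages <- function(pkgs) {
  ok <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  if (!all(ok)) {
    stop("Missing packages: ", paste(pkgs[!ok], collapse = ", "),
         ". Install them with a separate install job first.")
  }
  invisible(TRUE)
}

# At the top of your script, before the library() calls:
# check_packages(needed)
```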

Package install locations (R_LIBS_USER)

Your packages need to be installed to a folder you have write permission for.

By default R will add packages to the R folder in your home directory ~/R (which is shorthand for /home/username/R). This folder must exist.

R will then create a subfolder for each major.minor version of R (e.g. ~/R/x86_64-library/3.4 or ~/R/x86_64-library/3.5).

Note: the library folder is shared by all releases with the same major.minor version (e.g. 3.5); the patch number (e.g. the 1 in 3.5.1) does not affect which library folder is used.
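To see where R will look for and install packages in the current session, base R's .libPaths() and file.access() are enough. A minimal sketch; writable_libs is a hypothetical helper name.

```r
# Library folders R is currently searching, in order; install.packages()
# installs into the first one unless it is given 'lib ='
print(.libPaths())

# Keep only the folders the current user can write to.
# file.access(mode = 2) tests write permission and returns 0 on success.
writable_libs <- function(paths = .libPaths()) {
  paths[file.access(paths, mode = 2) == 0]
}

print(writable_libs())
```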

Override R_LIBS_USER

You can set R_LIBS_USER to any path you like within your home space, or share a library with another user by pointing it at a folder in their home space.

Create/edit ~/.Renviron

default

R_LIBS_USER=~/R/x86_64-library/%v
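The %v in the line above stands for the major.minor version of R. The sketch below is a simplified illustration of that substitution, so you can see which folder a template resolves to; it is not R's own implementation, handles only %v, and the helper name expand_libs_template is hypothetical.

```r
# Replace %v with the major.minor version (e.g. "3.5") and expand "~".
# Simplified illustration only: R performs this expansion itself at start-up.
expand_libs_template <- function(template, version = R.version) {
  major_minor <- paste0(version$major, ".", sub("\\..*$", "", version$minor))
  path.expand(gsub("%v", major_minor, template, fixed = TRUE))
}

# What the default line in ~/.Renviron resolves to for the running R
print(expand_libs_template("~/R/x86_64-library/%v"))

# The value R is actually using in this session
print(Sys.getenv("R_LIBS_USER"))
```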

You could also set a custom R_LIBS_USER in your job script (for this to work, you may need to remove R_LIBS_USER from ~/.Renviron).

#!/bin/bash
#$ -N R_JOB
#$ -M me@lshtm.ac.uk -m be
#$ -q short.q
#$ -l mem_free=1G,h_vmem=1.2G
#$ -V -cwd 
module load R/3.4.1
export R_LIBS_USER=~/R-custom-packages/%v
R CMD BATCH myrscript myrscript.out
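R only adds the R_LIBS_USER folder to its search path if that folder exists, so a typo in the job script can silently fall back to the default library. A minimal sketch of a check to put at the top of the R script, assuming R_LIBS_USER contains a single folder whose %v has already been expanded; check_user_library is a hypothetical helper name.

```r
# Confirm that the R_LIBS_USER set in the job script is actually on the
# library search path; stop with a clear message if it is not.
check_user_library <- function(expected = Sys.getenv("R_LIBS_USER")) {
  if (!nzchar(expected)) stop("R_LIBS_USER is not set in this job")
  expected <- normalizePath(path.expand(expected), mustWork = FALSE)
  if (!expected %in% .libPaths()) {
    stop("R_LIBS_USER folder is not on .libPaths(): ", expected,
         " (does the folder exist?)")
  }
  invisible(expected)
}

# At the top of myrscript:
# check_user_library()
```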

Compiler overrides for R Packages

Create the file ~/.R/Makevars

CXX14FLAGS=-O2 -march=native -mtune=native -fPIC
CC=gcc
CXX=g++

Using a different version of GCC

Several major versions of GCC are available via environment modules.

To see all available modules including GCC

module avail

To load the version you want

module load gcc/6.5.0

IMPORTANT: depending on the package and how it is compiled, you may need to load the same module(s) again before you use the package. You can load modules automatically by adding the module load commands to your qsub script, or to the bottom of ~/.bashrc.
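To confirm which compiler a job will pick up after module load (for example when ~/.R/Makevars sets CC=gcc and CXX=g++), you can check from inside R. A minimal sketch using base R's Sys.which() and system2(); compiler_info is a hypothetical helper name.

```r
# Report which gcc/g++ is first on PATH in the current environment;
# run this in the same job after 'module load gcc/...'
compiler_info <- function(tools = c("gcc", "g++")) {
  for (tool in tools) {
    path <- Sys.which(tool)
    if (!nzchar(path)) {
      cat(tool, ": not found on PATH\n", sep = "")
    } else {
      version <- system2(path, "--version", stdout = TRUE)[1]
      cat(tool, ": ", path, " (", version, ")\n", sep = "")
    }
  }
  invisible(Sys.which(tools))
}

compiler_info()
```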

Running R program from file

R CMD BATCH rscriptfile outputfile

Example

R CMD BATCH myrscript myrscript.out

Your results will be saved to a file called myrscript.out

If you run multiple jobs in the same directory at the same time, they can conflict when each one saves the workspace (.RData) at the end of its run. To avoid this, use

R CMD BATCH --no-save --no-restore myrscript myrscript.out
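With --no-save, anything you want to keep must be written out explicitly. A minimal sketch using base R's saveRDS()/readRDS() that gives each job its own output file; the file-name pattern and the save_result helper are hypothetical, and it assumes Grid Engine sets JOB_ID inside qsub jobs (falling back to "interactive" otherwise).

```r
# Save an R object to a file whose name includes the job id, so jobs
# running at the same time never write to the same file.
save_result <- function(x, prefix = "result", dir = ".") {
  job_id <- Sys.getenv("JOB_ID", unset = "interactive")
  file <- file.path(dir, paste0(prefix, "-", job_id, ".rds"))
  saveRDS(x, file)
  invisible(file)
}

# Usage at the end of myrscript:
# out <- save_result(summary(rnorm(100)))
# Later, in another R session: readRDS(out)
```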

Running R program on HPC

Example job script saved as myrjob

#!/bin/bash
#$ -N R_JOB
#$ -M me@lshtm.ac.uk -m be
#$ -q short.q
#$ -l mem_free=1G,h_vmem=1.2G
#$ -V -cwd
R CMD BATCH myrscript myrscript.out

Submitting job

qsub myrjob

Array Job

Array jobs allow you to submit one job with multiple tasks; each task can read its own task ID in your scripts. You might use this with 10 tasks to process 10 different data files, or to run your simulation with 10 different parameter sets.

qsub -t 1-10

See array jobs qsub

Job Script

#!/bin/bash
#$ -N ARRAY_TEST_JOB
#$ -q short.q
#$ -cwd -V
#$ -l mem_free=1G,h_vmem=1.2G
#$ -t 1-10

R CMD BATCH myrscript myrscript${SGE_TASK_ID}.out

-t 1-10 specifies the sequential task IDs 1, 2, 3, ..., 10. This can be any range, e.g. 1-5 or 1-1286.

This will submit your job with 10 tasks (1-10) and will create 10 R output files, e.g.:

myrscript1.out
...
myrscript10.out

R Script

To use the task id in your R script you need to use Sys.getenv("SGE_TASK_ID")

taskIdChar <- Sys.getenv("SGE_TASK_ID")
taskIdInteger <- as.integer(taskIdChar)

dataFilename <- paste0("coredate-", taskIdChar, ".dta")

taskIdChar is the string/char value of the current taskid

taskIdInteger is the current task id converted to an integer number

dataFilename is a string that combines the task id into a file name to read from or write to
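When the same script is tested interactively or submitted without -t, SGE_TASK_ID is empty or (on typical Grid Engine installations) set to "undefined". A minimal sketch that falls back to task 1 in that case and picks one parameter per task; get_task_id and the params vector are hypothetical examples.

```r
# Read the array task id; fall back to 'default' when it is missing or not
# a number (e.g. "undefined" outside an array job, or an interactive session)
get_task_id <- function(default = 1L) {
  id <- suppressWarnings(as.integer(Sys.getenv("SGE_TASK_ID", unset = "")))
  if (is.na(id)) default else id
}

taskId <- get_task_id()

# Hypothetical example: one parameter value per task for -t 1-10
params <- seq(0.1, 1.0, by = 0.1)
param <- params[taskId]

dataFilename <- paste0("coredate-", taskId, ".dta")
cat("Task", taskId, "uses parameter", param, "and file", dataFilename, "\n")
```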

Example R Program

Save the following to a file called myrscript to use with the example instructions for running R jobs.

library("sna")
library("network")

# These are the names of the actors
labels <- c("Allison", "Drew", "Ross", "Sarah", "Eliot", "Keith")

net <- network.initialize(6)

# Label the vertices
net %v% "vertex.names" <- labels

# Data on page 123.
add.edges(net, c(1,1,2,2,5,6,3,4), c(2,3,4,5,2,3,4,2))

degree(net, cmode="outdegree")
degree(net, cmode="indegree")

# Note that the variance of indegree and outdegree, on page 128, can be calculated with:
var(degree(net, cmode="outdegree"))
var(degree(net, cmode="indegree"))

Drawing Graphs/Plots

To draw plots and graphs on the cluster, you need to use the Cairo package.

The Cairo package documentation pdf.

The Cairo package can create graphs in multiple formats:

CairoPNG(...)
CairoJPEG(...)
CairoTIFF(...)
CairoPDF(...)
CairoSVG(...)
CairoPS(...)

Example PNG Graph

library(Cairo)
Cairo(600, 600, file="plotcairotest.png", type="png", bg="white")
plot(rnorm(4000),rnorm(4000),col="#ff000018",pch=19,cex=2) # semi-transparent red
dev.off() # creates a file "plotcairotest.png" with the above plot

or

library(Cairo)
CairoPNG(filename="plotcairotest2.png", width=600, height=600, bg="white")
plot(rnorm(4000),rnorm(4000),col="#ff000018",pch=19,cex=2) # semi-transparent red
dev.off() # creates a file "plotcairotest2.png" with the above plot
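The same plotting code can be wrapped so the device is always closed, even if plotting fails, and so it still works where the Cairo package is not installed but R itself was built with cairo support. A minimal sketch; save_png is a hypothetical helper, and the fallback uses base R's png(type = "cairo").

```r
# Write a PNG using the Cairo package if it is installed, otherwise fall back
# to base R's png(type = "cairo") when this R build has cairo support.
save_png <- function(file, plot_fun, width = 600, height = 600) {
  if (requireNamespace("Cairo", quietly = TRUE)) {
    Cairo::CairoPNG(filename = file, width = width, height = height, bg = "white")
  } else if (capabilities("cairo")) {
    png(filename = file, width = width, height = height, bg = "white", type = "cairo")
  } else {
    stop("Neither the Cairo package nor cairo support is available")
  }
  on.exit(dev.off())  # close the device even if plot_fun() fails
  plot_fun()
  invisible(file)
}

# In an array job, one file per task, e.g. plot-3.png for task 3:
# task <- Sys.getenv("SGE_TASK_ID", unset = "1")
# save_png(paste0("plot-", task, ".png"),
#          function() plot(rnorm(4000), rnorm(4000), col = "#ff000018", pch = 19, cex = 2))
```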