R

From HPC

Revision as of 06:50, 3 April 2019 by Aitsswhi

R is a language and environment for statistical computing and graphics. It is a GNU project which is similar to the S language and environment which was developed at Bell Laboratories (formerly AT&T, now Lucent Technologies) by John Chambers and colleagues. R can be considered as a different implementation of S. There are some important differences, but much code written for S runs unaltered under R.

R provides a wide variety of statistical (linear and nonlinear modelling, classical statistical tests, time-series analysis, classification, clustering, ...) and graphical techniques, and is highly extensible. The S language is often the vehicle of choice for research in statistical methodology, and R provides an Open Source route to participation in that activity.

One of R's strengths is the ease with which well-designed publication-quality plots can be produced, including mathematical symbols and formulae where needed. Great care has been taken over the defaults for the minor design choices in graphics, but the user retains full control.

Website: http://www.r-project.org/


Documentation/Tutorials

R Versions

The core R installation is upgraded regularly. If an upgrade causes problems with your packages, older versions are kept and can still be used. If you need a different version, please ask the cluster administrators.

The module command is used to load and unload versions of R.

List available versions of R

Type module avail.

The available versions are listed as R/?.?.?, where ?.?.? is the version number, e.g. 3.4.1 or 3.5.2:

module avail
------------------------------------------------------------------------------------------------------ /usr/local/modulefiles -------------------------------------------------------------------------------------------------------
dot  module-git  module-info  modules  null  R/3.4.1  R/3.4.4  R/3.5.2  use.own

Load Version

To load version 3.4.1, type:

module load R/3.4.1

or, for version 3.5.2:

module load R/3.5.2
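After loading a module, you can check from inside R which version is actually running. This is a small sketch using base R's R.version.string and getRversion(); the 3.5.0 threshold is only an example.

```r
# Full version string of the running R, as shown at startup
print(R.version.string)

# getRversion() returns a version object, so comparisons with
# strings such as "3.5.0" are done component by component
if (getRversion() >= "3.5.0") {
  message("Running R 3.5.0 or later")
} else {
  message("Running an R version older than 3.5.0")
}
```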

Main R Project Wiki

http://wiki.r-project.org/rwiki/doku.php

Moving from Stata to R

http://wiki.r-project.org/rwiki/doku.php

Packages Installed Centrally


Other packages may already be installed. If you require additional packages, please contact the cluster administrators or see Installing R libraries for your account on the HPC.


Other packages are listed at http://cran.r-project.org/web/packages/
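To see whether a package is already installed, and whether it lives in the central library or your own, you could query installed.packages() from inside R. A minimal sketch; the package names in wanted are examples only.

```r
# installed.packages() returns a matrix with one row per installed package,
# covering every library directory on .libPaths()
ip <- installed.packages()

# Example packages to look for (replace with the ones you need)
wanted <- c("sna", "network", "Cairo")

# Keep the name, version and library directory of the packages we asked about
found <- ip[ip[, "Package"] %in% wanted, c("Package", "Version", "LibPath"), drop = FALSE]
print(found)

notFound <- setdiff(wanted, found[, "Package"])
if (length(notFound) > 0) {
  message("Not installed: ", paste(notFound, collapse = ", "))
}
```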

Installing R libraries for your account on the HPC

You can install an R package into your home directory; it is then available to all your jobs on any compute node. Note that install.packages() automatically downloads the package from CRAN unless you specify a different repository.

Do not use install.packages() in job scripts that run your calculations. Either install packages interactively, or run a separate job just for installing the required packages.
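A script for such an installation-only job could look like the sketch below. The package list and the Imperial mirror URL (taken from the mirror example further down) are assumptions, and findMissing() and installMissing() are hypothetical helper names. Passing both lib and repos means install.packages() does not need to prompt, which matters in a batch job.

```r
# Hypothetical list of packages your jobs need; edit to suit
needed <- c("sna", "network", "Cairo")

# Return the packages from pkgs that cannot currently be loaded
findMissing <- function(pkgs) {
  pkgs[!vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)]
}

# Install any missing packages into your personal library without prompting
installMissing <- function(pkgs, repos = "http://cran.ma.imperial.ac.uk/") {
  missingPkgs <- findMissing(pkgs)
  if (length(missingPkgs) == 0) {
    message("All packages already installed")
    return(invisible(character(0)))
  }
  # R_LIBS_USER is R's default personal library path (may contain "~")
  userLib <- path.expand(Sys.getenv("R_LIBS_USER"))
  # Create the personal library if it does not exist yet
  dir.create(userLib, recursive = TRUE, showWarnings = FALSE)
  install.packages(missingPkgs, lib = userLib, repos = repos)
  invisible(missingPkgs)
}

# In the installation job script, call:
# installMissing(needed)
```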

The first time you install a package, run R interactively, as R will ask you questions (for example, which CRAN mirror to use and whether to create a personal library).

Log in to hpclogin.lshtm.ac.uk and start R interactively:

R

Install Library from CRAN

install.packages('packageName') 

You should be prompted to choose a CRAN repository mirror. If you have problems choosing one, you can specify a mirror as part of the install (see http://cran.r-project.org/mirrors.html for a full list of mirror sites):

install.packages('packageName', repos="http://cran.ma.imperial.ac.uk/") 
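To avoid choosing a mirror every time, you can set a default CRAN repository with options(repos = ...). Putting the line below in the file .Rprofile in your home directory should apply it to every R session you start; the Imperial mirror is just the example used above.

```r
# Default CRAN mirror for install.packages() in this session
options(repos = c(CRAN = "http://cran.ma.imperial.ac.uk/"))

# install.packages("packageName") will now use this mirror without prompting
print(getOption("repos"))
```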

Or install a library from a local .tar.gz file. Use this if your library is not hosted on CRAN, or if you have problems accessing CRAN: view the package page in a browser and download the .tar.gz file.

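This command must be run from the shell (not inside R), in the directory containing the downloaded package. Without -l, the package is installed into the first library R searches, which is your personal library once it exists (for example after your first interactive install.packages()):

R CMD INSTALL packageName.tar.gz

The -l option must be followed by the directory to install into. To use a directory of your choice (~/R/library is only an example; it must already exist):

R CMD INSTALL -l ~/R/library packageName.tar.gz

Packages installed into such a directory are loaded with library(packageName, lib.loc = "~/R/library").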
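Alternatively, the same .tar.gz file can be installed from inside R: with repos = NULL, install.packages() treats its argument as a local file. A sketch, assuming the hypothetical file name packageName.tar.gz and your default personal library (R_LIBS_USER).

```r
# Hypothetical file name: replace with the .tar.gz you downloaded
pkgFile <- "packageName.tar.gz"

# R's default personal library path, with "~" expanded
userLib <- path.expand(Sys.getenv("R_LIBS_USER"))

if (file.exists(pkgFile)) {
  dir.create(userLib, recursive = TRUE, showWarnings = FALSE)
  # repos = NULL: install from the local file, not from a repository
  install.packages(pkgFile, repos = NULL, type = "source", lib = userLib)
} else {
  message("File not found: ", pkgFile, " (start R in the directory containing it)")
}
```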

To use the library in your code, add this to the top of your R script file:

library(packageName)
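library() already stops with an error when a package is missing. In a batch job it can help to make that message point at the fix, so a failed job's output file explains what to do. A sketch, with loadPackages() as a hypothetical helper name:

```r
# Load each package, stopping with a clear message if one is not installed
loadPackages <- function(pkgs) {
  for (pkg in pkgs) {
    if (!requireNamespace(pkg, quietly = TRUE)) {
      stop("Package '", pkg, "' is not installed; install it interactively ",
           "or in a separate job before running this script", call. = FALSE)
    }
    # character.only = TRUE lets library() take the name from a variable
    library(pkg, character.only = TRUE)
  }
  invisible(TRUE)
}

# Replace with the packages your script needs, e.g. c("sna", "network")
loadPackages(c("stats", "utils"))
```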

Running an R program from a file

R CMD BATCH rscriptfile outputfile

Example

R CMD BATCH myrscript myrscript.out

Your results will be saved to a file called myrscript.out.

If you run several jobs at the same time in the same directory, they can conflict when each one saves its workspace (.RData) on exit. To avoid this, use:

R CMD BATCH --no-save --no-restore myrscript myrscript.out
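With --no-save nothing from the workspace is kept when the script ends, so any results you need should be written to a file explicitly. A minimal sketch using base R's saveRDS() and readRDS(); results and myresults.rds are placeholders.

```r
# Placeholder for your real results
set.seed(1)
results <- summary(rnorm(100))

# Write the object to a file in the working directory
saveRDS(results, file = "myresults.rds")

# Later, in another R session or job, read it back
restored <- readRDS("myresults.rds")
print(restored)
```

For jobs that run at the same time, give each one its own file name (for example by including the array task ID) so they do not overwrite each other.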

Running R program on HPC

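Example job script, saved as myrjob:

#!/bin/bash
# Job name
#$ -N R_JOB
# Email this address when the job begins (b) and ends (e)
#$ -M me@lshtm.ac.uk -m be
# Queue to submit to
#$ -q short.q
# Request a node with 1G free memory; the job is killed if it exceeds 1.2G virtual memory
#$ -l mem_free=1G,h_vmem=1.2G
# Export your current environment (including any loaded R module) and run in the current directory
#$ -V -cwd
R CMD BATCH myrscript myrscript.out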

Submitting job

qsub myrjob

Array Job

Array jobs let you submit one job with multiple tasks; each task can read its task ID and use it in your scripts. For example, you might use 10 tasks to process 10 different data files, or to run your simulation with 10 different parameter values.

qsub -t 1-10

See the array jobs section of the qsub page.

Job Script

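#!/bin/bash
#$ -N ARRAY_TEST_JOB
#$ -q short.q
#$ -cwd -V
#$ -l mem_free=1G,h_vmem=1.2G
# Run 10 tasks with IDs 1 to 10
#$ -t 1-10

# Each task writes its own output file; --no-save --no-restore stops tasks
# running at the same time from clashing over the shared workspace (.RData)
R CMD BATCH --no-save --no-restore myrscript myrscript${SGE_TASK_ID}.out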

-t 1-10 specifies the task IDs, here 1, 2, ..., 10. Any range can be used, e.g. 1-5 or 1-1286.

This will submit your job with 10 tasks (1-10) and create 10 R output files, e.g.:

myrscript1.out
....
myrscript10.out

R Script

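To use the task ID in your R script, read the SGE_TASK_ID environment variable with Sys.getenv("SGE_TASK_ID"):

# SGE sets SGE_TASK_ID for each task; Sys.getenv() returns it as a string
taskIdChar <- Sys.getenv("SGE_TASK_ID")
# Convert it to an integer, e.g. to index a vector of parameters
taskIdInteger <- as.integer(taskIdChar)

# paste0() joins without separators, giving e.g. "coredate-3.dta" for task 3
dataFilename <- paste0("coredate-", taskIdChar, ".dta")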

taskIdChar is the string value of the current task ID.

taskIdInteger is the current task ID converted to an integer.

dataFilename is a string containing the task ID, naming the file this task reads from or writes to.
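A common pattern is to use the task ID to pick one value from a vector of parameters. The sketch below also falls back to task 1 when SGE_TASK_ID is unset or not a number, so the same script can be tested interactively before submitting. getTaskId() and the rates values are hypothetical.

```r
# Return the current SGE task ID as an integer, or `default` when
# SGE_TASK_ID is unset or not a number (e.g. when testing interactively)
getTaskId <- function(default = 1L) {
  id <- suppressWarnings(as.integer(Sys.getenv("SGE_TASK_ID")))
  if (is.na(id)) default else id
}

# Hypothetical parameter values, one per task of qsub -t 1-10
rates <- seq(0.1, 1.0, by = 0.1)

taskId <- getTaskId()
if (taskId > length(rates)) stop("No parameter value for task ", taskId)

rate <- rates[taskId]
dataFilename <- paste0("coredate-", taskId, ".dta")
cat("Task", taskId, "uses rate", rate, "and data file", dataFilename, "\n")
```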

Example R Program

Save the following to a file called myrscript to use with the example instructions for running R jobs above.

library("sna")
library("network")

# These are the names of the actors
labels <- c("Allison", "Drew", "Ross", "Sarah", "Eliot", "Keith")

net <- network.initialize(6)

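# Label the vertices
net %v% "vertex.names" <- labels

# Data on page 123.
# Add directed edges: the first vector holds the sending (tail) vertices and the
# second the receiving (head) vertices, so the first edge is 1 -> 2 (Allison -> Drew)
add.edges(net, c(1,1,2,2,5,6,3,4), c(2,3,4,5,2,3,4,2))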

degree(net, cmode="outdegree")
degree(net, cmode="indegree")

# Note that the variance of indegree and outdegree, on page 128, can be calculated with:
var(degree(net, cmode="outdegree"))
var(degree(net, cmode="indegree"))
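To sanity-check the numbers in myrscript.out without the sna and network packages, the degrees can also be counted directly from the edge list with base R's tabulate(). Since this edge list has no loops or repeated edges, we expect the results to match degree() above, but treat this as a cross-check sketch rather than a replacement.

```r
# Same edge list as in myrscript: tails[i] -> heads[i]
labels <- c("Allison", "Drew", "Ross", "Sarah", "Eliot", "Keith")
tails <- c(1, 1, 2, 2, 5, 6, 3, 4)
heads <- c(2, 3, 4, 5, 2, 3, 4, 2)

# Out-degree: how often each vertex appears as a tail; in-degree: as a head
outdeg <- tabulate(tails, nbins = length(labels))
indeg <- tabulate(heads, nbins = length(labels))
names(outdeg) <- labels
names(indeg) <- labels

print(outdeg)      # 2 2 1 1 1 1
print(indeg)       # 0 3 2 2 1 0
print(var(outdeg)) # 4/15, about 0.267
print(var(indeg))  # 22/15, about 1.467
```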

Drawing Graphs/Plots

To draw plots, graphs etc. on the cluster you need to use the Cairo package.

The Cairo package documentation is available as a PDF.

The Cairo package can create graphs in multiple formats:

CairoPNG(...)
CairoJPEG(...)
CairoTIFF(...)
CairoPDF(...)
CairoSVG(...)
CairoPS(...)

Example PNG Graph

library(Cairo)
Cairo(600, 600, file="plotcairotest.png", type="png", bg="white")
plot(rnorm(4000),rnorm(4000),col="#ff000018",pch=19,cex=2) # semi-transparent red
dev.off() # creates a file "plotcairotest.png" with the above plot

or

library(Cairo)
CairoPNG(filename="plotcairotest2.png", width=600, height=600, bg="white")
plot(rnorm(4000),rnorm(4000),col="#ff000018",pch=19,cex=2) # semi-transparent red
dev.off() # creates a file "plotcairotest2.png" with the above plot
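The same plot can be written in several formats by wrapping the drawing code in a function and opening a different Cairo device each time. A sketch, assuming the Cairo package is installed (it only prints a message otherwise); makePlot() and the file names are hypothetical. Note that bitmap devices such as CairoPNG take width and height in pixels, while CairoPDF takes them in inches.

```r
# The scatter plot from the examples above, drawn by a helper function
makePlot <- function() {
  plot(rnorm(4000), rnorm(4000), col = "#ff000018", pch = 19, cex = 2)
}

if (requireNamespace("Cairo", quietly = TRUE)) {
  # Bitmap format: width and height in pixels
  Cairo::CairoPNG("plotcairo.png", width = 600, height = 600, bg = "white")
  makePlot()
  invisible(dev.off())

  # Vector format: width and height in inches
  Cairo::CairoPDF("plotcairo.pdf", width = 6, height = 6)
  makePlot()
  invisible(dev.off())
} else {
  message("The Cairo package is not installed")
}
```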