R
R is a language and environment for statistical computing and graphics. It is a GNU project which is similar to the S language and environment which was developed at Bell Laboratories (formerly AT&T, now Lucent Technologies) by John Chambers and colleagues. R can be considered as a different implementation of S. There are some important differences, but much code written for S runs unaltered under R.
R provides a wide variety of statistical (linear and nonlinear modelling, classical statistical tests, time-series analysis, classification, clustering, ...) and graphical techniques, and is highly extensible. The S language is often the vehicle of choice for research in statistical methodology, and R provides an Open Source route to participation in that activity.
One of R's strengths is the ease with which well-designed publication-quality plots can be produced, including mathematical symbols and formulae where needed. Great care has been taken over the defaults for the minor design choices in graphics, but the user retains full control.
Website: http://www.r-project.org/
Documentation/Tutorials
R Versions
The core R install is upgraded regularly. If an upgrade causes issues with packages, we keep older versions that can be used instead. If you need a different version, please ask.
The module command is used to load and unload versions of R.
List available versions of R
Type module avail.
The available versions are listed as R/?.?.?, where ?.?.? is the version number, e.g. 3.4.1 or 3.5.2
module avail
------------------------ /usr/local/modulefiles ------------------------
dot  module-git  module-info  modules  null  R/3.4.1  R/3.4.4  R/3.5.2  use.own
Load a specific version of R
To load version 3.4.1 you would type
module load R/3.4.1
or for version 3.5.2
module load R/3.5.2
Loading in your qsub script
#!/bin/bash
#$ -N R_JOB
#$ -M me@lshtm.ac.uk -m be
#$ -q short.q
#$ -l mem_free=1G,h_vmem=1.2G
#$ -V -cwd
module load R/3.4.1
R CMD BATCH myrscript myrscript.out
Main R Project Wiki
http://wiki.r-project.org/rwiki/doku.php
Moving from Stata to R
http://wiki.r-project.org/rwiki/doku.php
Packages Installed Centrally
- arm
- zoo
- coda
- stats
- sna
- rjags http://cran.r-project.org/web/packages/rjags/index.html
- R2jags http://cran.r-project.org/web/packages/R2jags/index.html
- Matching http://sekhon.berkeley.edu/matching/
- rgenoud
- Cairo http://cran.r-project.org/web/packages/Cairo/
Other packages may already be installed. If you require additional packages, please contact the cluster administrators or see Installing R libraries for your account on the HPC.
Other packages are listed at http://cran.r-project.org/web/packages/
Installing R libraries for your account on the HPC
You can install an R package in your home directory, which makes it available to all your jobs running on any compute node. Note that install.packages() downloads the package from CRAN automatically, unless you specify another, non-standard repository.
You should not use install.packages() in job scripts where calculations are being run. Either install interactively or run a separate job just for installing the required packages.
The first time you install a package you should run R interactively, because it will prompt you with questions.
Log in to hpclogin.lshtm.ac.uk and then run R interactively
R
Install a library from CRAN
install.packages('packageName')
You should be prompted to choose a CRAN repository mirror. If you have problems choosing, you can specify a mirror as part of the install (see http://cran.r-project.org/mirrors.html for a full list of mirror sites)
install.packages('packageName', repos="http://cran.ma.imperial.ac.uk/")
OR install a library from a local .tar.gz file. (Use this if your library is not hosted on CRAN, or if you have problems accessing CRAN from R; in the latter case, view the package page in a browser and download the file.)
This command must be run in the directory containing the downloaded package. The -l option takes a library directory as its argument; leave it out to install into your default library.
R CMD INSTALL packageName.tar.gz
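As a hedged sketch of installing a downloaded package into a library under your home directory: the path $HOME/R/library is an assumption, not a documented location on this cluster, and packageName.tar.gz is the placeholder name used above. R reads the R_LIBS_USER environment variable at startup, and R CMD INSTALL accepts the target library as the argument to -l. The script only prints the command when R or the tarball is missing, so it can be tried as a dry run.

```shell
#!/bin/bash
# Sketch only: $HOME/R/library is an assumed location for a personal library.
R_LIBS_USER="$HOME/R/library"
export R_LIBS_USER

# Create the library directory if it does not exist yet.
mkdir -p "$R_LIBS_USER"

# Placeholder tarball name taken from the instructions above.
PKG_TARBALL="packageName.tar.gz"

if command -v R >/dev/null 2>&1 && [ -f "$PKG_TARBALL" ]; then
    # -l names the library directory; the tarball follows it.
    R CMD INSTALL -l "$R_LIBS_USER" "$PKG_TARBALL"
else
    echo "Would run: R CMD INSTALL -l $R_LIBS_USER $PKG_TARBALL"
fi
```

For jobs to find packages installed this way, the same R_LIBS_USER export would also need to be in effect when the job runs, for example through the -V option used in the job scripts on this page.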
To use the library in your code, add this to the top of your R script file.
library(packageName)
Running R program from file
R CMD BATCH rscriptfile outputfile
Example
R CMD BATCH myrscript myrscript.out
Your results will be saved to a file called myrscript.out
If you run multiple jobs at the same time, they can interfere with each other when they all try to save the workspace. To avoid this, use
R CMD BATCH --no-save --no-restore myrscript myrscript.out
Running R program on HPC
Example job script saved as myrjob
#!/bin/bash
#$ -N R_JOB
#$ -M me@lshtm.ac.uk -m be
#$ -q short.q
#$ -l mem_free=1G,h_vmem=1.2G
#$ -V -cwd
R CMD BATCH myrscript myrscript.out
Submitting job
qsub myrjob
Array Job
Array jobs allow you to submit one job with multiple tasks. Each task can read its own task id and use it in scripts. For example, you might use 10 tasks to process 10 different data files, or to run your simulation with 10 different parameters.
qsub -t 1-10
See array jobs qsub
Job Script
#!/bin/bash
#$ -N ARRAY_TEST_JOB
#$ -q short.q
#$ -cwd -V
#$ -l mem_free=1G,h_vmem=1.2G
#$ -t 1-10
R CMD BATCH myrscript myrscript${SGE_TASK_ID}.out
-t 1-10 specifies the sequential tasks 1,2,3,4,5,6,7,8,9,10. Any range can be used, e.g. 1-5 or 1-1286.
This will submit your job with 10 tasks (1-10) and will create 10 R output files, i.e.:
myrscript1.out .... myrscript10.out
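The per-task file names can also be built in the job script itself. The following is a minimal bash sketch, not part of the original instructions: the function names task_id, task_input and task_output are hypothetical, and the input pattern coredate-<task id>.dta is assumed from the R Script section of this page. Grid Engine sets SGE_TASK_ID to the string undefined for non-array jobs, so the sketch falls back to task 1 in that case, which also allows a dry run outside the scheduler.

```shell
#!/bin/bash
# Sketch only: function names and the input-file pattern are illustrative.

# Print the current array task id, falling back to 1 when the variable is
# unset or "undefined" (non-array job, or a dry run outside the scheduler).
task_id() {
    local id="${SGE_TASK_ID:-1}"
    if [ "$id" = "undefined" ]; then
        id=1
    fi
    echo "$id"
}

# Input data file for this task (assumed pattern: coredate-<id>.dta).
task_input() {
    echo "coredate-$(task_id).dta"
}

# R output file for this task, following the myrscript<id>.out naming.
task_output() {
    echo "myrscript$(task_id).out"
}

echo "task $(task_id): input=$(task_input) output=$(task_output)"
```

In a job script, the last line could then read R CMD BATCH myrscript "$(task_output)", and the script could check that the file named by task_input exists before starting R.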
R Script
To use the task id in your R script, read it with Sys.getenv("SGE_TASK_ID")
taskIdChar <- Sys.getenv("SGE_TASK_ID")
taskIdInteger <- as.integer(taskIdChar)
dataFilename <- paste0("coredate-", taskIdChar, ".dta")
taskIdChar is the string/char value of the current taskid
taskIdInteger is the current task id converted to an integer number
dataFilename is a string that includes the task id, naming the file to read from or write to
Example R Program
Save the following to a file called myrscript to work with example instructions for running R jobs.
library("sna")
library("network")

# These are the names of the actors
labels <- c("Allison", "Drew", "Ross", "Sarah", "Eliot", "Keith")
net <- network.initialize(6)

# Label the vertices
net %v% "vertex.names" <- labels

# Data on page 123.
add.edges(net, c(1,1,2,2,5,6,3,4), c(2,3,4,5,2,3,4,2))

degree(net, cmode="outdegree")
degree(net, cmode="indegree")

# Note that the variance of indegree and outdegree, on page 128, can be calculated with:
var(degree(net, cmode="outdegree"))
var(degree(net, cmode="indegree"))
Drawing Graphs/Plots
To draw plots, graphs etc on the cluster you need to use the Cairo package.
The Cairo package documentation pdf.
The Cairo package can create graphs in multiple formats:
CairoPNG(...)
CairoJPEG(...)
CairoTIFF(...)
CairoPDF(...)
CairoSVG(...)
CairoPS(...)
Example PNG Graph
require(Cairo)
Cairo(600, 600, file="plotcairotest.png", type="png", bg="white")
plot(rnorm(4000), rnorm(4000), col="#ff000018", pch=19, cex=2) # semi-transparent red
dev.off() # creates a file "plotcairotest.png" with the above plot
or
require(Cairo)
CairoPNG(600, 600, file="plotcairotest2.png", bg="white")
plot(rnorm(4000), rnorm(4000), col="#ff000018", pch=19, cex=2) # semi-transparent red
dev.off() # creates a file "plotcairotest2.png" with the above plot