Stata


Stata do files are processed on the cluster. You should create these files in Stata on your workstation and then upload them to the cluster for processing.

When creating your Stata do files, it is worth thinking about breaking the work up into multiple do files if possible. This means that when you come to run them on the cluster, you can run several simultaneously. For example, a single do file that takes 20 hours to process could, if split into 10 independent do files run in parallel, finish in around 2 hours. Splitting the work also means that if the cluster has problems (a node dies or crashes), re-running the affected job(s) will not take too long.

Once you have created your work, upload the files via SFTP (see Accessing the Cluster) to your home directory. It is highly recommended that you create a new directory for each Stata job/project you plan to run. You then log in to the cluster via ssh/PuTTY (see Accessing the Cluster) and submit your Stata do file to the job queue via a job script.
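For instance, the per-job directories might be laid out as follows (the stata_jobs path and project names are purely illustrative, not a cluster convention):

```shell
# One directory per Stata job/project in your home directory,
# so each job's do files, logs and outputs stay separate
mkdir -p "$HOME/stata_jobs/project1"
mkdir -p "$HOME/stata_jobs/project2"
ls "$HOME/stata_jobs"
```

You would then upload each job's do files into its own directory and submit from there.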

Once you have uploaded your do file(s), you will need to create a script to submit the work to the job queue.


Important note about running Stata jobs

Stata jobs must be submitted using the qstata command and not the normal qsub command.

Single do file script

In a text editor, create the following script in the same directory as your do file. In this example the script's filename is myscript, but you can call it anything you like.

#!/bin/bash
#$ -N JOB_NAME
#$ -M email@lshtm.ac.uk -m be
#$ -q short.q
#$ -l mem_free=1G,h_vmem=1.2G
#$ -cwd -V
stata-se -b do filename_of_do

So if you had a do file called mywork.do, the script would be:

#!/bin/bash
#$ -N JOB_NAME
#$ -M email@lshtm.ac.uk -m be
#$ -q short.q
#$ -l mem_free=1G,h_vmem=1.2G
#$ -cwd -V
stata-se -b do mywork

Then to submit your job

qstata myscript

All your output files should be generated in the directory you submitted the job from. Stata's batch mode (-b) also writes a log file (mywork.log in this example) to the same directory.

Multiple do files script

If you have split your Stata work up into multiple do files, you will need to name them identically apart from a sequence number prepended or appended. For example:

1mywork.do
2mywork.do
3mywork.do

or

mywork1.do
mywork2.do
mywork3.do

Then, in a text editor, create the following script in the same directory as your do files. In this example the script's filename is myscript, but you can call it anything you like. This example assumes you used the mywork1.do, mywork2.do, ... naming convention.

#!/bin/bash
#$ -N ARRAY_TEST_JOB
#$ -M email@lshtm.ac.uk -m be
#$ -q short.q
#$ -cwd -V
#$ -l mem_free=1G,h_vmem=1.2G
#$ -t 1-10

stata-se -b do mywork${SGE_TASK_ID}

As it stands, this script will submit 10 do files as an array job. If you want to change that to, say, 20, change this line

#$ -t 1-10

To

#$ -t 1-20
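The -t range should match the number of do files you actually have. A quick sanity check, sketched here with dummy files in a throwaway directory purely for illustration:

```shell
# Create 20 dummy do files in a temporary directory (illustration only),
# then count the files matching the naming convention to confirm the -t range
demo=$(mktemp -d)
cd "$demo"
for i in $(seq 1 20); do touch "mywork${i}.do"; done
ls mywork*.do | wc -l
```

In your real job directory, the same `ls ... | wc -l` count tells you what the upper bound of the -t range should be.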

Then submit the job

qstata myscript
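The scheduler runs the script's last line once per task, each time with SGE_TASK_ID set to that task's number. As a sketch (assuming a POSIX shell), you can preview locally which command each task would run:

```shell
# Simulate the scheduler: SGE sets SGE_TASK_ID to 1, 2, ... for each
# array task, and the shell expands ${SGE_TASK_ID} in the command line
for SGE_TASK_ID in 1 2 3; do
    echo "stata-se -b do mywork${SGE_TASK_ID}"
done
```

So task 1 runs mywork1.do, task 2 runs mywork2.do, and so on.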

Single Do File, using SGE_TASK_ID to run multiple tasks

Submission script

#!/bin/bash
#$ -N ARRAY_TEST_JOB
#$ -M email@lshtm.ac.uk -m be
#$ -cwd -V
#$ -l mem_free=1G,h_vmem=1.2G
#$ -t 1-10

stata-se -b do mywork

From within your Stata do file, you assign the SGE_TASK_ID environment variable to a local macro:

local x : env SGE_TASK_ID

You can then use this to load different data sets, or as a seed.
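For example, a do file run as one task of the array might use the task number to pick its input file or to seed the random-number generator (the dataset filenames below are hypothetical):

```stata
* Read the task number assigned by the scheduler into a local macro
local x : env SGE_TASK_ID

* Load a different dataset per task (hypothetical filenames)
use "dataset`x'.dta", clear

* ...or use the task number as a reproducible random-number seed
set seed `x'
```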
