Notes on using UC Berkeley's Savio cluster and XSEDE for multicore and multinode parallel R computation via SLURM.
- h2o-slurm-multinode.Rmd - example of how to start a multinode h2o cluster using R
- sbatch-r-rmd.sh - generalized SLURM job script that runs any R or Rmd file
- Makefile - generalized Makefile to customize SLURM parameters and submit jobs
- compile-R.md - notes on compiling R on Savio
- batchtools.slurm.tmpl - example batchtools SLURM template, especially for use with future.batchtools (see Makefile and future-batchtools.R)
- Research IT's Savio user guide
- Berkeley Research Computing's Savio 2017 training repository
- Chris Paciorek's parallel distributed repository
Opens a bash shell with access to 1 node for 30 minutes, via the biostat condo:
srun -A co_biostat -p savio2 -N 1 -t 30:0 --pty bash
After 30 minutes the system terminates the bash shell and returns you to the login node. To stop early, run the "exit" command as usual.
Do the same thing, but with 2 nodes and for 5 hours, then check that it works:
srun -A co_biostat -p savio2 -N 2 -t 300:0 --pty bash
echo $SLURM_NODELIST # Should list two hostnames, possibly in compressed range form
Submit myjob.sh, a job script that defines the parameters of the SLURM job (see Chris P's biostats repo above), then check the status of your jobs:
sbatch myjob.sh
squeue -u $USER
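myjob.sh itself isn't reproduced here. As a rough sketch of what such a script might contain, the snippet below writes a minimal myjob.sh for the biostat condo. The job name, output filename, module name (r), and my-analysis.R are hypothetical placeholders; the account, partition, and 30-minute limit simply mirror the srun example above.

```shell
# Write a minimal SLURM job script to myjob.sh (names are hypothetical placeholders).
cat > myjob.sh <<'EOF'
#!/bin/bash
# Job name shown in squeue (hypothetical)
#SBATCH --job-name=my-analysis
# Condo account and partition, as in the srun examples
#SBATCH --account=co_biostat
#SBATCH --partition=savio2
# One node for at most 30 minutes (HH:MM:SS)
#SBATCH --nodes=1
#SBATCH --time=00:30:00
# Write stdout/stderr to a file named after the job ID (%j)
#SBATCH --output=my-analysis_%j.out

# Module name is an assumption; check `module avail` on Savio
module load r
# Hypothetical analysis script
Rscript my-analysis.R
EOF
```

Options given on the sbatch command line take precedence over the #SBATCH lines, so e.g. sbatch --time=02:00:00 myjob.sh reuses the same script with a longer limit.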
Create ~/.Rprofile and put this in the first line:
options(repos=structure(c(CRAN="http://cran.rstudio.com/")))
Then when you install packages you won't have to select a mirror every time.
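If you'd rather script this step, here is a minimal sketch (assuming bash or another POSIX shell) that puts the line at the top of ~/.Rprofile, creating the file if needed and leaving it alone if the line is already there:

```shell
# Put the CRAN mirror option on the first line of ~/.Rprofile.
RPROFILE="$HOME/.Rprofile"
LINE='options(repos=structure(c(CRAN="http://cran.rstudio.com/")))'
touch "$RPROFILE"                      # create the file if it doesn't exist
if ! grep -qxF "$LINE" "$RPROFILE"; then
  # Prepend via a temporary file, then replace the original
  { printf '%s\n' "$LINE"; cat "$RPROFILE"; } > "$RPROFILE.tmp" && mv "$RPROFILE.tmp" "$RPROFILE"
fi
```

Running it twice is harmless. Inside R, getOption("repos") should then show the mirror.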
Run (without the angle brackets):
wwall -j <JOBID>
This shows the current CPU and memory usage for a given job, which helps in understanding the performance characteristics of an analysis running sequentially or in parallel.
After a job has completed (or been terminated/cancelled), you can review the maximum memory used via the sacct command.
sacct -j <JOBID> --format=JobID,JobName,AveRSS,MaxRSS,NNodes,NCPUS,Elapsed
MaxRSS will show the maximum amount of memory that the job used in kilobytes, so divide by 1024^2 to get gigabytes.
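A small shell helper for the conversion. This is a sketch that assumes MaxRSS comes back as a number with an optional K, M or G suffix; bare numbers are treated as kilobytes, per the note above.

```shell
# Convert a sacct memory value such as "2097152K", "512M" or "1.5G" to gigabytes.
rss_to_gb() {
  printf '%s\n' "$1" | awk '{
    v = $1
    if (v == "") next                  # empty field: print nothing
    unit = substr(v, length(v), 1)
    if (unit ~ /[KMG]/) {
      v = substr(v, 1, length(v) - 1)  # strip the unit suffix
    } else {
      unit = "K"                       # bare number: assume kilobytes
    }
    if (unit == "K") gb = v / 1024^2
    else if (unit == "M") gb = v / 1024
    else gb = v
    printf "%.2f\n", gb
  }'
}
```

For example, to list each step of a job in gigabytes (the top-level job line often has an empty MaxRSS, with the value on the .batch or step lines):
sacct -j <JOBID> --format=JobID,MaxRSS -n -P | while IFS='|' read -r id rss; do echo "$id $(rss_to_gb "$rss")"; done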
I find that a customized output for squeue gives clearer information, so I've added an sq alias to my ~/.bashrc file:
alias sq='squeue -u ${USER} -o "%.7i %.12P %.13j %.10q %.10M %.6D %R"'
This provides longer strings for the partition and job name, adds the QOS, automatically restricts output to jobs submitted by the current user, and drops some unnecessary columns (such as user and job state).
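For reference, here is the same alias with each format code annotated, based on the -o/--format section of man squeue (worth double-checking against the SLURM version installed on Savio):

```shell
# squeue -o field codes used below:
#   %i  job ID             %P  partition
#   %j  job name           %q  quality of service (QOS)
#   %M  time used so far   %D  number of nodes
#   %R  pending reason in parentheses, or the allocated node list
# "%.12P" right-justifies the partition in a 12-character column.
# Single quotes delay ${USER} expansion until the alias is used.
alias sq='squeue -u ${USER} -o "%.7i %.12P %.13j %.10q %.10M %.6D %R"'
```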
- Follow GitHub's instructions to create a new ssh key on your personal computer.
- Name the new key id_rsa_savio so that it's a different file from your existing GitHub ssh key.
- Add the public key (id_rsa_savio.pub) to your GitHub account.
- Copy the private key onto Savio/XSEDE as ~/.ssh/id_rsa:
  - You could copy the private key to your clipboard (as shown in GitHub's instructions) and then paste it into a new text file on Savio using a text editor like vim or pico.
  - Or you could use scp or sftp to copy it, e.g.:
    scp ~/.ssh/id_rsa_savio username@dtn.brc.berkeley.edu:.ssh/id_rsa
- Make sure ~/.ssh/id_rsa is only readable by you:
  chmod 600 ~/.ssh/id_rsa
- Then edit ~/.ssh/config on Savio/XSEDE (using pico or vim) to include the following lines:
  Host github.com
      IdentityFile ~/.ssh/id_rsa
- Make sure ~/.ssh/config is only readable by you:
  chmod 600 ~/.ssh/config
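The Savio-side steps can also be scripted. This is a sketch that assumes the private key has already been copied to ~/.ssh/id_rsa as described, and that your config has no other github.com entry:

```shell
# Set up ~/.ssh on Savio so ssh/git use the copied key for github.com.
SSH_DIR="$HOME/.ssh"
mkdir -p "$SSH_DIR"
chmod 700 "$SSH_DIR"                   # keep the directory private to you
# Append a Host entry for GitHub unless one is already there
if ! grep -qs '^Host github.com' "$SSH_DIR/config"; then
  printf '\nHost github.com\n    IdentityFile ~/.ssh/id_rsa\n' >> "$SSH_DIR/config"
fi
# ssh complains if the key is readable by others or the config is writable by others
chmod 600 "$SSH_DIR/config"
if [ -f "$SSH_DIR/id_rsa" ]; then
  chmod 600 "$SSH_DIR/id_rsa"
fi
```

To check the setup, ssh -T git@github.com should greet you by your GitHub username.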
Login nodes kill long-running processes after a certain amount of time (something like 2-3 days), so running a terminal multiplexer like tmux directly on a login node does not work. However, the data transfer node (dtn.brc.berkeley.edu) does not seem to restrict how long a process can run. Therefore, to use tmux, ssh-agent, or similar long-running processes, ssh to the DTN, start the processes there, and then ssh from the DTN into a login node to submit jobs. (Thanks to Aaron Culich for passing along this tip.)
Example (starting from personal computer):
ssh username@dtn.brc.berkeley.edu
# Start tmux, or reattach to a session left running earlier
module load tmux
tmux attach || tmux new-session
# Load ssh-agent
eval $(ssh-agent -s)
# Add the GitHub key (copied to ~/.ssh/id_rsa on Savio). Note that a better way to do this is to edit ~/.ssh/config
ssh-add ~/.ssh/id_rsa
# Connect to a login node to submit jobs.
ssh ln001
Note: as of January 2017 this no longer seems to work. To be explored further.
- Install osxfuse, sshfs, and Macfusion
osxfuse and sshfs can be installed easily with Homebrew:
brew cask install osxfuse sshfs
Currently (Nov. 2016) Homebrew does not have the latest version of Macfusion, so install it from https://github.com/ElDeveloper/macfusion2.
You can then use Macfusion's GUI to mount your Savio directory on your Mac over ssh. This makes it easy to work on remote files as though they were local, e.g. opening R scripts in RStudio to edit. Make sure to use "dtn.brc.berkeley.edu" as the host rather than "hpc.brc.berkeley.edu": the DTN is intended for remote mount operations, and the HPC login host won't allow them.