Merge pull request #103 from uvarc/staging
Slurm + CLI workshop
orndorffp authored Jun 6, 2024
2 parents 9973f19 + 274015b commit a801717
Showing 13 changed files with 140 additions and 43 deletions.
4 changes: 2 additions & 2 deletions content/notes/hpc-from-terminal/section1.md
@@ -86,11 +86,11 @@ Invoke the utility `rm` to delete a file.
```bash
$rm myfile
```
In this example, the shell issues a request to the kernel to delete `myfile`. The kernel then communicates with the software that manages file storage to exectute the operation.
In this example, the shell issues a request to the kernel to delete `myfile`. The kernel then communicates with the software that manages file storage to execute the operation.

When it is complete the shell then returns the UNIX prompt to the user, indicating that it is waiting for further commands.
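As an aside, the returned prompt also carries the exit status of the finished command in the shell variable `$?`. A hedged sketch, with an illustrative filename:

```bash
# Attempt to delete a file that does not exist; rm reports an error.
# "|| status=$?" keeps the script going so we can inspect the status ourselves.
status=0
rm no_such_file 2>/dev/null || status=$?
echo "exit status: $status"    # non-zero: the command failed
rm -f no_such_file             # -f silently ignores missing files
echo "exit status: $?"         # prints: exit status: 0
```

A status of zero means success; anything else signals a failure, which is how shell scripts decide whether to continue.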

## Rnning Our First Command
## Running Our First Command

Let’s run our first command. In a terminal, type
```bash
4 changes: 2 additions & 2 deletions content/notes/hpc-from-terminal/section2.md
@@ -277,11 +277,11 @@ $rmdir ~/test_dir/sub_dir
</details>

Q3: Copy the directory
`/share/resources/tutorials/rivanna-cli/shakespeare` into your home directory.
`/share/resources/tutorials/rivanna-cl/shakespeare` into your home directory.
<details><summary>Solution</summary>

```bash
$cp -r /share/resources/tutorials/rivanna-cli/shakespeare ~
$cp -r /share/resources/tutorials/rivanna-cl/shakespeare ~
```
</details>

8 changes: 4 additions & 4 deletions content/notes/hpc-from-terminal/section3.md
@@ -154,7 +154,7 @@ $more filename
This displays the contents of a file on the screen with line scrolling. To scroll, use the arrow keys. To advance one line, press the Enter key; to advance a full page, press the space bar. Press `q` to exit.

```bash
$more ~/rivanna-cli/shakespeare/Lear.txt
$more ~/rivanna-cl/shakespeare/Lear.txt
```

To page upward within the text, press `b` (back).
@@ -164,7 +164,7 @@ To page upward within the text, press `b` (back).
You can search in the forward direction with `/<pattern>`, where `pattern` is a combination of characters you wish to find.

```bash
$more ~/rivanna-cli/shakespeare/Lear.text
$more ~/rivanna-cl/shakespeare/Lear.text
/serpent
```
<pre>
@@ -213,7 +213,7 @@ If used single `>` in place of the double `>>` in the above, `cat` will overwrit

Displays only the starting lines of a file. The default is the first ten lines. Use `-n` to specify the number of lines.
```bash
$head ~/rivanna-cli/shakespeare/Lear.text
$head ~/rivanna-cl/shakespeare/Lear.text
```
<pre>
This Etext file is presented by Project Gutenberg, in
@@ -232,7 +232,7 @@ PROVIDED BY PROJECT GUTENBERG WITH PERMISSION. ELECTRONIC AND

Displays the ending lines of a file. The default is the last ten lines; use `-n` to specify the number.
```bash
$tail 30 ~/rivanna-cli/shakespeare/Lear.text
$tail -n 30 ~/rivanna-cl/shakespeare/Lear.text
```
<pre>
The cup of their deservings.- O, see, see!
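`head` and `tail` also combine well in a pipeline to extract a range of lines from the middle of a file. A small sketch on a generated file (the line numbers are illustrative; any text file works the same way):

```bash
# Create a small numbered file to demonstrate on
seq 1 20 > lines.txt
# Extract lines 11 through 15: keep the first 15 lines, then the last 5 of those
head -n 15 lines.txt | tail -n 5
```

The same pattern applied to `Lear.text` would pull a passage out by line number without opening a pager.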
4 changes: 2 additions & 2 deletions content/notes/slurm-from-cli/scripts/array.slurm
@@ -1,7 +1,7 @@
#!/bin/bash
#SBATCH --ntasks=1
#SBATCH --partition=standard
#SBATCH -A myalloc
#SBATCH --partition=interactive
#SBATCH -A hpc_training
#SBATCH --time=3:00:00
#SBATCH -o out%A.%a
#SBATCH -e err%A.%a
6 changes: 3 additions & 3 deletions content/notes/slurm-from-cli/scripts/hello.slurm
@@ -2,9 +2,9 @@
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --mem=32000 # mb total memory
#SBATCH --time=1-12:00:00
#SBATCH --partition=standard
#SBATCH --account=myalloc
#SBATCH --time=2:00:00
#SBATCH --partition=interactive
#SBATCH --account=hpc_training

module purge
module load anaconda
3 changes: 2 additions & 1 deletion content/notes/slurm-from-cli/scripts/hybrid.slurm
@@ -1,8 +1,9 @@
#!/bin/bash
#SBATCH -N 3
#SBATCH --ntasks-per-node=4
#SBATCH -c 10
#SBATCH -p parallel
#SBATCH -A myalloc
#SBATCH -A hpc_training
#SBATCH -t 05:00:00
#SBATCH --mail-user=mst3k@virginia.edu
#SBATCH --mail-type=END
4 changes: 2 additions & 2 deletions content/notes/slurm-from-cli/scripts/multicore.slurm
@@ -1,7 +1,7 @@
#SBATCH -n 1
#SBATCH -c 25
#SBATCH -p standard
#SBATCH -A myalloc
#SBATCH -p interactive
#SBATCH -A hpc_training
#SBATCH -t 05:00:00
#SBATCH --mail-user=mst3k@virginia.edu
#SBATCH --mail-type=END
2 changes: 1 addition & 1 deletion content/notes/slurm-from-cli/scripts/multinode.slurm
@@ -1,7 +1,7 @@
#!/bin/bash
#SBATCH --nodes=3
#SBATCH --ntasks-per-node=40
#SBATCH --account=myalloc
#SBATCH --account=hpc_training
#SBATCH -p parallel
#SBATCH -t 10:00:00
#SBATCH --mail-user=mst3k@virginia.edu
4 changes: 2 additions & 2 deletions content/notes/slurm-from-cli/scripts/slow.slurm
@@ -2,8 +2,8 @@
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --time=00:30:00
#SBATCH --partition=standard
#SBATCH --account=myalloc
#SBATCH --partition=interactive
#SBATCH --account=hpc_training

module purge

12 changes: 6 additions & 6 deletions content/notes/slurm-from-cli/section1.md
@@ -47,13 +47,13 @@ Slurm refers to the root process as a **task**. By default, each task is assigne
Slurm refers to queues as __partitions__. We do not have a default partition; each job must request one explicitly.

{{< table >}}
| Queue Name | Purpose | Job Time Limit | Memory / Node | Cores / Node |
| Queue Name | Purpose | Job Time Limit | Max Memory / Node / Job | Max Cores / Node |
| :-: | :-: | :-: | :-: | :-: |
| standard | For jobs on a single compute node | 7 days | 384 GB | 40 |
| gpu | For jobs that can use general purpose GPU’s<br /> (P100,V100,A100) | 3 days | 256 GB<br />384 GB<br />1 TB | 28<br />40<br />128 |
| parallel | For large parallel jobs on up to 50 nodes (<= 1500 CPU cores) | 3 days | 384 GB | 40<br /> |
| largemem | For memory intensive jobs | 4 days | 768 GB<br />1 TB | 16 |
| dev | To run jobs that are quick tests of code | 1 hour | 128 GB | 40 |
| standard | For jobs on a single compute node | 7 days | 375 GB | 37 |
| gpu | For jobs that can use general purpose GPU’s<br /> (A40,A100,A6000,V100,RTX3090) | 3 days | 1953 GB | 125 |
| parallel | For large parallel jobs on up to 50 nodes (<= 1500 CPU cores) | 3 days | 375 GB | 40<br /> |
| largemem | For memory intensive jobs | 4 days | 768 GB<br />1 TB | 45 |
| interactive | For quick interactive sessions (up to two RTX2080 GPUs) | 12 hours | 216 GB | 37 |
{{< /table >}}

To see an online list of available partitions, from a command line type
67 changes: 61 additions & 6 deletions content/notes/slurm-from-cli/section2.md
@@ -49,16 +49,16 @@ The lines starting with `#SBATCH` are the resource requests. They are called "p
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --mem=32000 # mb total memory
#SBATCH --time=1-12:00:00
#SBATCH --partition=standard
#SBATCH --account=myalloc
#SBATCH --time=2:00:00
#SBATCH --partition=interactive
#SBATCH --account=hpc_training
```
Here we are requesting
* 1 node, 1 task
* 32GB of memory (specified in MB). Strictly speaking this will be "gibibytes."
* 1 day and 12 hours of running time.
* 2 hours of running time.
* The interactive partition (queue). A partition must be specified.
* The account (allocation) group `rivanna-training`
* The account (allocation) group `hpc_training`

The next lines set up the environment to run our job.
```bash
@@ -110,6 +110,61 @@ Angle brackets `< >` indicate a value to be specified, and are not typed.

See also our [documentation](https://www.rc.virginia.edu/userinfo/rivanna/slurm/) for many more examples.



## Modules

Any application software that you want to use will need to be loaded with the `module load` command.

For example:

```
$ module load matlab
$ module load anaconda
$ module load goolf R
```
Modules must be loaded each time a new shell is created in order to set up the same working environment. This includes every time you log out and back in, and every time you run a batch job on a compute node.

### Module Details

`module avail` – Lists all available modules and versions.

`module spider` – Lists all modules, including those that `module avail` does not show until their prerequisite modules are loaded.

`module key keyword` – Shows modules with the keyword in the description

`module list` – Lists modules loaded in your environment.

`module load mymod` – Loads the default module to set up the environment for some software.

`module load mymod/N.M` – Loads a specific version N.M of software mymod.
`module load compiler mpi mymod` – For compiler- and MPI-specific modules, loads the modules in the appropriate order and, optionally, the version.

`module purge` – Clears all modules.

### Learning more about a Module

To locate a python module, try the following:

```
$ module avail python
$ module spider python
$ module key python
```

To find bioinformatics software packages, try this:

```
$ module key bio
```

The available software is also listed on our [website](https://www.rc.virginia.edu/userinfo/rivanna/software/complete-list/).

**Question:**

Why does the command `module load R` give an error?


## Working with Files and Folders

When using Slurm in terminal mode, you will probably want to create your own folders to organize your Slurm scripts, any input files, and the output. You will need to be able to move around from one folder to another at the terminal.
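A minimal sketch of that workflow, with an illustrative folder name:

```bash
# Make a folder for Slurm scripts; -p avoids an error if it already exists
mkdir -p ~/slurm-scripts
cd ~/slurm-scripts     # move into it
pwd                    # confirm the current location
cd ~                   # return to the home directory
```

Keeping scripts, input, and output in a dedicated folder like this makes it much easier to find the `out*` and `err*` files a job leaves behind.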
@@ -157,7 +212,7 @@ $pwd

**Exercise 2**

Use FastX or Open OnDemand or the command line to create a new folder under your home directory. Practice changing into and out of it.
Use FastX or Open OnDemand or the command line to create a new folder under your scratch directory. Practice changing into and out of it.

Use FastX and Caja to navigate to your `/scratch` directory. To get there, click `Go` in the Caja menu. A textbox will open. Be sure that "search for files" is unchecked. Erase whatever is in the textbox and type `/scratch/mst3k` (substituting your own user ID). Still in FastX, open a terminal (the black box, or in the System Tools menu) and navigate to your new scratch folder.

53 changes: 53 additions & 0 deletions content/notes/slurm-from-cli/section4.md
@@ -147,3 +147,56 @@ You can also cancel individual tasks
```bash
scancel 1283839_11
```

## Useful Commands

When you submit a job and it doesn't start, or it fails for an unknown reason, the cause may be constraints on your account, such as running out of storage space or of SUs on your allocation. It is also useful to see how busy a queue is. The following subsections highlight how to identify these problems.

### Allocations

Sometimes it’s useful to check how many SUs are still available on your allocation. The `allocations` command displays information on your allocations and how many SUs are associated with them:

```
$ allocations
Account Balance Reserved Available
----------------- --------- --------- ---------
hpc_training 1000000 0 999882
```

Running `allocations -a <allocation_name>` provides even more detail, including when the allocation was last renewed and its members. For example:

```
$ allocations -a hpc_training
Description StartTime EndTime Allocated Remaining PercentUsed Active
----------- ------------------- ---------- ----------- ---------- ----------- ------
new 2024-05-29 17:33:13 2025-05-29 1000000.000 999881.524 0.01 True
Name Active CommonName EmailAddress DefaultAccount
------ ------ ------------------------------ ------------------- ----------------------
.
.
.
```

### Storage Quota
One way to check your storage utilization is with the `hdquota` command, which shows how much of your home, scratch, and leased (if applicable) storage is being used. Below is sample output for `hdquota`:

```
$ hdquota
Type Location Name Size Used Avail Use%
====================================================================================================
home /home mst3k 50G 16G 35G 32%
Scratch /scratch mst3k 12T 2.0T 11T 17%
```

This is a useful command for checking whether you are running out of storage space and for seeing where files need to be cleaned up. For more detailed information on disk utilization, you may also use the `du` command to investigate specific directories.
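For instance, a hedged sketch with an illustrative directory name — `du -sh` prints one human-readable total per argument, and GNU `du`'s `--max-depth=1` breaks the usage down one directory level:

```bash
# Create an illustrative directory so the commands below have something to measure
mkdir -p ~/slurm-scripts
# One human-readable total for the directory (-s summarize, -h human-readable)
du -sh ~/slurm-scripts
# Break usage down by each item one level below it (GNU du option)
du -h --max-depth=1 ~/slurm-scripts
```

Pointed at a large scratch folder, the second form quickly shows which subdirectory is consuming the quota.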

### Queue limits and Usage

To get information on the different queues, use the `qlist` command. This shows the list of partitions, their usage, and the SU charge rate. Use `qlimits` for information on each queue’s limits.

The `sinfo` command provides more detailed information on the health of each queue and the number of active nodes available. These commands can be useful for diagnosing why a job may not be running, or for better understanding queue usage to achieve more efficient job throughput. More information on hardware specifications and queues can be found [here](https://rc.virginia.edu/userinfo/rivanna/overview/#hardware-configuration) on our website.

## Need Help

Research Computing is ready to help you learn to use our systems efficiently. You can [submit a ticket](https://www.rc.virginia.edu/form/support-request/). For in-person help, please attend one of our weekly sessions of [office hours](https://www.rc.virginia.edu/support/#office-hours).
12 changes: 0 additions & 12 deletions content/notes/slurm-from-cli/section5.md

This file was deleted.
