Tweak all episodes
ocaisa committed Jan 30, 2024
1 parent 8e0184d commit 4d7d7cb
Showing 6 changed files with 173 additions and 143 deletions.
49 changes: 26 additions & 23 deletions episodes/01-introduction.md
@@ -67,33 +67,37 @@ rule hostname_login:
## Key points about this file

1. The file is named `Snakefile` - with a capital `S` and no file extension.
1. Some lines are indented. Indents must be with space characters, not tabs. See the
setup section for how to make your text editor do this.
1. The rule definition starts with the keyword `rule` followed by the rule name, then a colon.
1. We named the rule `hostname`. You may use letters, numbers or underscores, but the rule name
must begin with a letter and may not be a keyword.
1. Some lines are indented. Indents must be with space characters, not tabs. See
the setup section for how to make your text editor do this.
1. The rule definition starts with the keyword `rule` followed by the rule name,
then a colon.
1. We named the rule `hostname_login`. You may use letters, numbers or
underscores, but the rule name must begin with a letter and may not be a
keyword.
1. The keywords `input`, `output`, `shell` are all followed by a colon.
1. The file names and the shell command are all in `"quotes"`.
1. The output filename is given before the input filename. In fact, Snakemake doesn't care what
order they appear in but we give the output first throughout this course. We'll see why soon.
1. In this use case there is no input file for the command so we leave this blank.
1. The output filename is given before the input filename. In fact, Snakemake
doesn't care what order they appear in but we give the output first
throughout this course. We'll see why soon.
1. In this use case there is no input file for the command so we leave this
blank.

:::
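
The naming rule in the key points above can be sketched as a small check. This is only an illustration, not how Snakemake itself validates names: the function `is_valid_rule_name` and the set `SNAKEMAKE_KEYWORDS` are hypothetical, and the keyword set is deliberately incomplete.

```python
import keyword
import re

# Hypothetical, incomplete set of Snakemake keywords (an assumption made
# for this illustration; Snakemake defines its own, longer list).
SNAKEMAKE_KEYWORDS = {"rule", "input", "output", "shell", "resources", "localrules"}

# A letter first, then any mix of letters, digits and underscores.
_NAME_PATTERN = re.compile(r"[A-Za-z][A-Za-z0-9_]*")


def is_valid_rule_name(name: str) -> bool:
    """Return True if `name` follows the naming rule described in the lesson."""
    if _NAME_PATTERN.fullmatch(name) is None:
        return False
    # Reject Python keywords as well as the (assumed) Snakemake ones.
    return not keyword.iskeyword(name) and name not in SNAKEMAKE_KEYWORDS


print(is_valid_rule_name("hostname_login"))  # True
print(is_valid_rule_name("1st_rule"))        # False: starts with a digit
print(is_valid_rule_name("input"))           # False: in the assumed keyword set
```

If Snakemake rejects a name that this sketch accepts, trust Snakemake: the check here only mirrors the wording of the key point.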

Back in the shell we'll run our new rule. At this point, if there were any missing quotes, bad
indents, etc. we may see an error.
Back in the shell we'll run our new rule. At this point, if there were any
missing quotes, bad indents, etc. we may see an error.

```bash
$ snakemake -j1 -p hostname_login.txt
$ snakemake -j1 -p hostname_login
```

::: callout

## `bash: snakemake: command not found...`

If your shell tells you that it cannot find the command `snakemake` then we need to make the
software available somehow. In our case, this means searching for the module that we need to
load:
If your shell tells you that it cannot find the command `snakemake` then we need
to make the software available somehow. In our case, this means searching for
the module that we need to load:
```bash
module spider snakemake
```
@@ -148,17 +152,16 @@ What does the `-p` option in the `snakemake` command above do?
1. Tells Snakemake to only run one process at a time
1. Prompts the user for the correct input file

*Hint: you can search in the text by pressing `/`, and quit back to the shell with `q`*
*Hint: you can search in the text by pressing `/`, and quit back to the shell
with `q`*

:::::: solution

(2) Prints the shell commands that are being run to the terminal

This is such a useful thing we don't know why it isn't the default! The `-j1` option is what
tells Snakemake to only run one process at a time, and we'll stick with this for now as it
makes things simpler. The `-F` option tells Snakemake to always overwrite output files, and
we'll learn about protected outputs much later in the course. Answer 4 is a total red-herring,
as Snakemake never prompts interactively for user input.
This is such a useful thing we don't know why it isn't the default! The `-j1`
option is what tells Snakemake to only run one process at a time, and we'll
stick with this for now as it makes things simpler. Answer 4 is a total
red-herring, as Snakemake never prompts interactively for user input.
::::::
:::

@@ -167,7 +170,7 @@ as Snakemake never prompts interactively for user input.
- "Before running Snakemake you need to write a Snakefile"
- "A Snakefile is a text file which defines a list of rules"
- "Rules have inputs, outputs, and shell commands to be run"
- "You tell Snakemake what file to make and it will run the shell command defined in the
appropriate rule"
- "You tell Snakemake what file to make and it will run the shell command
defined in the appropriate rule"

:::
41 changes: 20 additions & 21 deletions episodes/02-snakemake_on_the_cluster.md
@@ -35,12 +35,11 @@ Nothing to be done (all requested files are present and up to date).
```

Nothing happened! Why not? When it is asked to build a target, Snakemake checks
the 'last modification
time' of both the target and its dependencies. If any dependency has been
updated since the target, then the actions are re-run to update the target.
Using this approach, Snakemake knows to only rebuild the files that, either
directly or indirectly, depend on the file that changed. This is called an
_incremental build_.
the 'last modification time' of both the target and its dependencies. If any
dependency has been updated since the target, then the actions are re-run to
update the target. Using this approach, Snakemake knows to only rebuild the
files that, either directly or indirectly, depend on the file that changed. This
is called an _incremental build_.
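
The timestamp comparison described above can be sketched in a few lines of Python. This is a simplified illustration under our own assumptions, not Snakemake's actual scheduling logic (which takes more into account); the function name `needs_rebuild` and the file names are hypothetical.

```python
import os
import tempfile


def needs_rebuild(target: str, dependencies: list[str]) -> bool:
    """Return True if `target` is missing or older than any dependency."""
    if not os.path.exists(target):
        return True
    target_mtime = os.path.getmtime(target)
    # A dependency modified after the target makes the target stale.
    return any(os.path.getmtime(dep) > target_mtime for dep in dependencies)


with tempfile.TemporaryDirectory() as workdir:
    dep = os.path.join(workdir, "input.txt")
    target = os.path.join(workdir, "output.txt")
    for path in (dep, target):
        with open(path, "w") as handle:
            handle.write("placeholder\n")

    # Pretend the dependency was written before the target.
    os.utime(dep, (1000, 1000))
    os.utime(target, (2000, 2000))
    up_to_date = not needs_rebuild(target, [dep])

    # Now pretend the dependency was edited after the target was built.
    os.utime(dep, (3000, 3000))
    stale = needs_rebuild(target, [dep])

print(up_to_date, stale)  # True True
```

The first check corresponds to the "Nothing to be done" message: the target is newer than everything it depends on, so there is no reason to re-run the rule.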

::: callout
## Incremental Builds Improve Efficiency
@@ -53,12 +52,11 @@ more efficient.
::: challenge
## Running on the cluster

We need another rule now that executes the `hostname` on the cluster. Create the
rule in your Snakefile and try to execute it on cluster with the options
`--executor slurm` to `snakemake`
We need another rule now that executes `hostname` on the _cluster_. Create
a new rule in your Snakefile and try to execute it on the cluster by passing
the option `--executor slurm` to `snakemake`.

:::::: solution

The rule is almost identical to the previous rule save for the rule name and
output file:

@@ -109,14 +107,13 @@ Complete log: .snakemake/log/2024-01-29T180346.788174.snakemake.log
Note all the warnings that Snakemake is giving us about the fact that the rule
may not be able to execute on our cluster as we may not have given enough
information. Luckily for us, this actually works on our cluster and we can take
a look in the output file we asked for, `hostname_remote.txt`:
a look in the output file the new rule creates, `hostname_remote.txt`:
```bash
[ocaisa@node1 ~]$ cat hostname_remote.txt
```
```output
tmpnode1.int.jetstream2.hpc-carpentry.org
```

::::::

:::
@@ -167,8 +164,10 @@ the help of a translation table:
| `--cpus-per-task` | `cpus_per_task` | number of cpus per task (in case of SMP, rather use `threads`) |
| `--nodes` | `nodes` | number of nodes |

The warnings given by Snakemake hinted that we need to provide these options.
One way to do it is to provide them is as part of the Snakemake rule, e.g.,
The warnings given by Snakemake hinted that we may need to provide these
options. One way to provide them is as part of the Snakemake rule, using the
keyword `resources`, e.g.,
```python
rule:
input: ...
@@ -178,8 +177,9 @@ rule:
runtime: <some number>
```
and we can also use the profile to define default values for these options to
use with our project. For example, the available memory on our cluster is about
4GB per core, so we can add that to our profile:
use with our project, using the keyword `default-resources`. For example, the
available memory on our cluster is about 4GB per core, so we can add that to our
profile:
```yaml
printshellcmds: True
jobs: 3
@@ -189,7 +189,7 @@ default-resources:
```

:::challenge
We know that our problem runs in a very short time. Make the default length of
We know that our problem runs in a very short time. Change the default length of
our jobs to two minutes for Slurm.

::::::solution
@@ -227,10 +227,9 @@ Slurm executor (which is what we are doing via our new profile) this
won't happen any more. So how do we force the rule to run on
the login node?

Well, it's no surprise that some Snakemake rules perform trivial tasks where job
submission might be
overkill (e.g., less than 1 minute worth of compute time). Similar to our case,
it would be a better
Well, when Snakemake rules perform trivial tasks (e.g., less than 1 minute
worth of compute time), job submission might be overkill. As in our case, it
would be a better
idea to have these rules execute locally (i.e. where the `snakemake` command is
run) instead of as a job. Snakemake lets you indicate which rules should always
run locally with the `localrules` keyword. Let's define `hostname_login` as a
5 changes: 3 additions & 2 deletions episodes/03-placeholders.md
@@ -6,11 +6,9 @@ exercises: 30

::: questions
- "How do I make a generic rule?"
- "How does Snakemake decide what rule to run?"
:::

::: objectives
- "Understand the basic steps Snakemake goes through when running a workflow"
- "See how Snakemake deals with some errors"
:::

@@ -71,6 +69,9 @@ replace them with appropriate values - `{input}` with the full name of the input
file, and
`{output}` with the full name of the output file -- before running the command.

`{resources}` is also a placeholder, and we can access a named element of the
`{resources}` with the notation `{resources.runtime}` (for example).
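
The dotted notation looks much like attribute access in Python's own `str.format`, so we can mimic the substitution outside Snakemake. This is an analogy under our own assumptions, not Snakemake's implementation: the `resources` object, its value, and the command text are made up for the illustration.

```python
from types import SimpleNamespace

# Stand-in for the resources of a rule; the value is illustrative only.
resources = SimpleNamespace(runtime=2)

# A shell command template using the same placeholder notation as the lesson.
command_template = "echo 'runtime is {resources.runtime} min' > {output}"

# str.format resolves {resources.runtime} by looking up the `runtime`
# attribute on the object passed in as `resources`.
command = command_template.format(resources=resources, output="hostname_remote.txt")

print(command)  # echo 'runtime is 2 min' > hostname_remote.txt
```

In a real workflow Snakemake fills in the placeholders itself before running the command, so we never call `format` by hand.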

:::keypoints
- "Snakemake rules are made more generic with placeholders"
- "Placeholders in the shell part of the rule are replaced with values based on the chosen