diff --git a/CITATION.cff b/CITATION.cff
new file mode 100644
index 0000000..09cb074
--- /dev/null
+++ b/CITATION.cff
@@ -0,0 +1,56 @@
+# This CITATION.cff file was generated with cffinit.
+# Visit https://bit.ly/cffinit to generate yours today!
+
+cff-version: 1.2.0
+title: HPC Workflow Management with Snakemake
+message: >-
+  If you use this software, please cite it using the
+  metadata from this file.
+type: software
+authors:
+  - given-names: Alan
+    family-names: O'Cais
+    email: alan.ocais@cecam.org
+    affiliation: University of Barcelona
+    orcid: 'https://orcid.org/0000-0002-8254-8752'
+repository-code: 'https://github.com/carpentries-incubator/hpc-workflows'
+url: 'https://carpentries-incubator.github.io/hpc-workflows/'
+abstract: >-
+  When using HPC resources, it's very common to need to
+  carry out the same set of tasks over a set of data
+  (commonly called a workflow or pipeline). In this lesson
+  we will run an experiment that takes an application which
+  runs in parallel and investigate its scalability. To do
+  that we will need to gather data; in this case that means
+  running the application multiple times with different
+  numbers of CPU cores and recording the execution time.
+  Once we've done that, we need to create a visualisation of
+  the data to see how it compares against the ideal case.
+
+
+  We could do all of this manually, but there are useful
+  tools to help us manage data analysis pipelines like we
+  have in our experiment. In the context of this lesson,
+  we'll learn about one of those: Snakemake.
+keywords: + - HPC + - Carpentries + - Lesson + - Workflow + - Pipeline +license: CC-BY-4.0 +references: + - authors: + - family-names: Collins + given-names: Daniel + title: "Getting Started with Snakemake" + type: software + repository-code: 'https://github.com/carpentries-incubator/workflows-snakemake/' + url: 'https://carpentries-incubator.github.io/workflows-snakemake/' + - authors: + - family-names: Booth + given-names: Tim + title: "Snakemake for Bioinformatics" + type: software + repository-code: 'https://github.com/carpentries-incubator/snakemake-novice-bioinformatics/' + url: 'https://carpentries-incubator.github.io/snakemake-novice-bioinformatics' \ No newline at end of file diff --git a/config.yaml b/config.yaml index ec8758f..13b4362 100644 --- a/config.yaml +++ b/config.yaml @@ -58,22 +58,22 @@ contact: 'maintainers-hpc@lists.carpentries.org' # - another-learner.md # Order of episodes in your lesson -episodes: -- amdahl_foundation.md -- snakemake_single.md -- snakemake_multiple.md -- snakemake_cluster.md -- snakemake_profiles.md -- amdahl_snakemake.md +episodes: +- 01-introduction.md +- 02-snakemake_on_the_cluster.md +- 03-placeholders.md +- 04-snakemake_and_mpi.md +- 05-chaining_rules.md +- 06-expansion.md # Information for Learners -learners: +learners: # Information for Instructors -instructors: +instructors: # Learner Profiles -profiles: +profiles: # Customisation --------------------------------------------- # diff --git a/episodes/01-introduction.md b/episodes/01-introduction.md new file mode 100644 index 0000000..159154e --- /dev/null +++ b/episodes/01-introduction.md @@ -0,0 +1,176 @@ +--- +title: "Running commands with Snakemake" +teaching: 30 +exercises: 30 +--- + +::: questions +- "How do I run a simple command with Snakemake?" +::: + +:::objectives +- "Create a Snakemake recipe (a Snakefile)" +::: + + +## What is the workflow I'm interested in? 
In this lesson we will run an experiment that takes an application which runs
+in parallel and investigate its scalability. To do that we will need to gather
+data; in this case that means running the application multiple times with
+different numbers of CPU cores and recording the execution time. Once we've
+done that, we need to create a visualisation of the data to see how it compares
+against the ideal case.
+
+From the visualisation we can then decide at what scale it
+makes most sense to run the application in production to maximise the use of
+our CPU allocation on the system.
+
+We could do all of this manually, but there are useful tools to help us manage
+data analysis pipelines like we have in our experiment. Today we'll learn about
+one of those: Snakemake.
+
+In order to get started with Snakemake, let's begin by taking a simple command
+and seeing how we can run that via Snakemake. Let's choose the command
+`hostname`, which prints out the name of the host where the command is executed:
+
+```bash
+[ocaisa@node1 ~]$ hostname
+```
+```output
+node1.int.jetstream2.hpc-carpentry.org
+```
+
+That prints out the result, but Snakemake relies on files to know the status of
+your workflow, so let's redirect the output to a file:
+
+```bash
+[ocaisa@node1 ~]$ hostname > hostname_login.txt
+```
+
+## Making a Snakefile
+
+Edit a new text file named `Snakefile`.
+
+Contents of `Snakefile`:
+
+```python
+rule hostname_login:
+    output: "hostname_login.txt"
+    input:
+    shell:
+        "hostname > hostname_login.txt"
+```
+
+::: callout
+
+## Key points about this file
+
+1. The file is named `Snakefile` - with a capital `S` and no file extension.
+1. Some lines are indented. Indents must be with space characters, not tabs. See
+   the setup section for how to make your text editor do this.
+1. The rule definition starts with the keyword `rule` followed by the rule name,
+   then a colon.
+1. We named the rule `hostname_login`.
You may use letters, numbers or + underscores, but the rule name must begin with a letter and may not be a + keyword. +1. The keywords `input`, `output`, `shell` are all followed by a colon. +1. The file names and the shell command are all in `"quotes"`. +1. The output filename is given before the input filename. In fact, Snakemake + doesn't care what order they appear in but we give the output first + throughout this course. We'll see why soon. +1. In this use case there is no input file for the command so we leave this + blank. + +::: + +Back in the shell we'll run our new rule. At this point, if there were any +missing quotes, bad indents, etc. we may see an error. + +```bash +$ snakemake -j1 -p hostname_login +``` + +::: callout + +## `bash: snakemake: command not found...` + +If your shell tells you that it cannot find the command `snakemake` then we need +to make the software available somehow. In our case, this means searching for +the module that we need to load: +```bash +module spider snakemake +``` + +```output +[ocaisa@node1 ~]$ module spider snakemake + +-------------------------------------------------------------------------------------------------------- + snakemake: +-------------------------------------------------------------------------------------------------------- + Versions: + snakemake/8.2.1-foss-2023a + snakemake/8.2.1 (E) + +Names marked by a trailing (E) are extensions provided by another module. + + +-------------------------------------------------------------------------------------------------------- + For detailed information about a specific "snakemake" package (including how to load the modules) use the module's full name. + Note that names that have a trailing (E) are extensions provided by other modules. 
For example:
+
+     $ module spider snakemake/8.2.1
+--------------------------------------------------------------------------------------------------------
+
+```
+
+Now we want the module, so let's load it to make the package available:
+
+```bash
+[ocaisa@node1 ~]$ module load snakemake
+```
+
+and then make sure we have the `snakemake` command available:
+
+```bash
+[ocaisa@node1 ~]$ which snakemake
+```
+```output
+/cvmfs/software.eessi.io/host_injections/2023.06/software/linux/x86_64/amd/zen3/software/snakemake/8.2.1-foss-2023a/bin/snakemake
+```
+:::
+
+::: challenge
+## Running Snakemake
+
+Run `snakemake --help | less` to see the help for all available options.
+What does the `-p` option in the `snakemake` command above do?
+
+1. Protects existing output files
+1. Prints the shell commands that are being run to the terminal
+1. Tells Snakemake to only run one process at a time
+1. Prompts the user for the correct input file
+
+*Hint: you can search in the text by pressing `/`, and quit back to the shell
+with `q`*
+
+:::::: solution
+(2) Prints the shell commands that are being run to the terminal
+
+This is such a useful thing we don't know why it isn't the default! The `-j1`
+option is what tells Snakemake to only run one process at a time, and we'll
+stick with this for now as it makes things simpler. Answer 4 is a total
+red herring, as Snakemake never prompts interactively for user input.
+::::::
+:::
+
+::: keypoints
+
+- "Before running Snakemake you need to write a Snakefile"
+- "A Snakefile is a text file which defines a list of rules"
+- "Rules have inputs, outputs, and shell commands to be run"
+- "You tell Snakemake what file to make and it will run the shell command
+  defined in the appropriate rule"
+
+:::
diff --git a/episodes/02-snakemake_on_the_cluster.md b/episodes/02-snakemake_on_the_cluster.md
new file mode 100644
index 0000000..30eba35
--- /dev/null
+++ b/episodes/02-snakemake_on_the_cluster.md
@@ -0,0 +1,248 @@
+---
+title: "Running Snakemake on the cluster"
+teaching: 30
+exercises: 20
+---
+
+::: objectives
+
+- "Define rules to run locally and on the cluster"
+
+:::
+
+::: questions
+
+- "How do I run my Snakemake rule on the cluster?"
+
+:::
+
+What happens when we want to make our rule run on the cluster rather than the
+login node? The cluster we are using runs Slurm, and it happens that Snakemake
+has built-in support for Slurm; we just need to tell it that we want to use it.
+
+Snakemake uses the `executor` option to allow you to select the plugin that you
+wish to use to execute the rule. The quickest way to apply this to your
+Snakefile is to define this on the command line. Let's try it out:
+
+```bash
+[ocaisa@node1 ~]$ snakemake -j1 -p --executor slurm hostname_login
+```
+
+```output
+Building DAG of jobs...
+Retrieving input from storage.
+Nothing to be done (all requested files are present and up to date).
+```
+
+Nothing happened! Why not? When it is asked to build a target, Snakemake checks
+the 'last modification time' of both the target and its dependencies. If any
+dependency has been updated since the target, then the actions are re-run to
+update the target. Using this approach, Snakemake knows to only rebuild the
+files that, either directly or indirectly, depend on the file that changed. This
+is called an _incremental build_.
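The timestamp comparison Snakemake performs can be sketched in a few lines of
plain Python. This is only a simplified illustration of the idea, not
Snakemake's actual implementation (the real scheduler walks a whole graph of
rules):

```python
import os

def needs_rebuild(target, dependencies):
    """Return True if `target` is missing or older than any dependency."""
    if not os.path.exists(target):
        return True  # no output yet, so the rule must run
    target_mtime = os.path.getmtime(target)
    # Re-run only if a dependency was modified after the target was built
    return any(os.path.getmtime(dep) > target_mtime for dep in dependencies)
```

Applied over every rule in the workflow, a check like this is what lets
Snakemake skip work that is already up to date.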
+::: callout
+## Incremental Builds Improve Efficiency
+
+By only rebuilding files when required, Snakemake makes your processing
+more efficient.
+:::
+
+::: challenge
+## Running on the cluster
+
+We need another rule now that executes `hostname` on the _cluster_. Create
+a new rule in your Snakefile and try to execute it on the cluster by passing
+the option `--executor slurm` to `snakemake`.
+
+:::::: solution
+The rule is almost identical to the previous rule save for the rule name and
+output file:
+
+```python
+rule hostname_remote:
+    output: "hostname_remote.txt"
+    input:
+    shell:
+        "hostname > hostname_remote.txt"
+```
+You can then execute the rule with
+```bash
+[ocaisa@node1 ~]$ snakemake -j1 -p --executor slurm hostname_remote
+```
+```output
+Building DAG of jobs...
+Retrieving input from storage.
+Using shell: /cvmfs/software.eessi.io/versions/2023.06/compat/linux/x86_64/bin/bash
+Provided remote nodes: 1
+Job stats:
+job                count
+---------------  -------
+hostname_remote        1
+total                  1
+
+Select jobs to execute...
+Execute 1 jobs...
+
+[Mon Jan 29 18:03:46 2024]
+rule hostname_remote:
+    output: hostname_remote.txt
+    jobid: 0
+    reason: Missing output files: hostname_remote.txt
+    resources: tmpdir=
+
+hostname > hostname_remote.txt
+No SLURM account given, trying to guess.
+Guessed SLURM account: def-users
+No wall time information given. This might or might not work on your cluster. If not, specify the resource runtime in your rule or as a reasonable default via --default-resources.
+No job memory information ('mem_mb' or 'mem_mb_per_cpu') is given - submitting without. This might or might not work on your cluster.
+Job 0 has been submitted with SLURM jobid 326 (log: /home/ocaisa/.snakemake/slurm_logs/rule_hostname_remote/326.log).
+[Mon Jan 29 18:04:26 2024]
+Finished job 0.
+1 of 1 steps (100%) done
+Complete log: .snakemake/log/2024-01-29T180346.788174.snakemake.log
+```
+Note all the warnings that Snakemake is giving us about the fact that the rule
+may not be able to execute on our cluster as we may not have given enough
+information. Luckily for us, this actually works on our cluster and we can take
+a look in the output file the new rule creates, `hostname_remote.txt`:
+```bash
+[ocaisa@node1 ~]$ cat hostname_remote.txt
+```
+```output
+tmpnode1.int.jetstream2.hpc-carpentry.org
+```
+::::::
+
+:::
+
+## Snakemake profile
+
+Adapting Snakemake to a particular environment can entail many flags and
+options. Therefore, it is possible to specify a configuration profile to be used
+to obtain default options. This looks like
+```bash
+snakemake --profile myprofileFolder ...
+```
+The profile folder must contain a file called `config.yaml` which is what will
+store our options. The folder may also contain other files necessary for the
+profile. Let's create the file `cluster_profile/config.yaml` and insert some of
+our existing options:
+
+```yaml
+printshellcmds: True
+jobs: 3
+executor: slurm
+```
+
+We should now be able to rerun our workflow by pointing to the profile rather
+than listing out the options. To force our workflow to rerun, we first need to
+remove the output file `hostname_remote.txt`, and then we can try out our new
+profile:
+```bash
+[ocaisa@node1 ~]$ rm hostname_remote.txt
+[ocaisa@node1 ~]$ snakemake --profile cluster_profile hostname_remote
+```
+
+The profile is extremely useful in the context of our cluster, as the Slurm
+executor has lots of options, and sometimes you need to use them to be able to
+submit jobs to the cluster you have access to.
Unfortunately, the names of the
+options in Snakemake are not _exactly_ the same as those of Slurm, so we need
+the help of a translation table:
+
+| SLURM             | Snakemake         | Description                                                    |
+|-------------------|-------------------|----------------------------------------------------------------|
+| `--partition`     | `slurm_partition` | the partition a rule/job is to use                             |
+| `--time`          | `runtime`         | the walltime per job in minutes                                |
+| `--constraint`    | `constraint`      | may hold features on some clusters                             |
+| `--mem`           | `mem`, `mem_mb`   | memory a cluster node must provide (`mem`: string with unit, `mem_mb`: int) |
+| `--mem-per-cpu`   | `mem_mb_per_cpu`  | memory per reserved CPU                                        |
+| `--ntasks`        | `tasks`           | number of concurrent tasks / ranks                             |
+| `--cpus-per-task` | `cpus_per_task`   | number of cpus per task (in case of SMP, rather use `threads`) |
+| `--nodes`         | `nodes`           | number of nodes                                                |
+
+The warnings given by Snakemake hinted that we may need to provide these
+options. One way to do this is to provide them as part of the Snakemake rule
+using the keyword `resources`,
+e.g.,
+```python
+rule:
+    input: ...
+    output: ...
+    resources:
+        partition:
+        runtime:
+```
+and we can also use the profile to define default values for these options to
+use with our project, using the keyword `default-resources`. For example, the
+available memory on our cluster is about 4GB per core, so we can add that to our
+profile:
+```yaml
+printshellcmds: True
+jobs: 3
+executor: slurm
+default-resources:
+  - mem_mb_per_cpu=3600
+```
+
+:::challenge
+We know that our problem runs in a very short time. Change the default length of
+our jobs to two minutes for Slurm.
+
+::::::solution
+
+```yaml
+printshellcmds: True
+jobs: 3
+executor: slurm
+default-resources:
+  - mem_mb_per_cpu=3600
+  - runtime=2
+```
+::::::
+
+:::
+
+There are various `sbatch` options not directly supported by the resource
+definitions in the table above.
You may use the `slurm_extra` resource to
+specify any of these additional flags to `sbatch`:
+
+```python
+rule myrule:
+    input: ...
+    output: ...
+    resources:
+        slurm_extra="--mail-type=ALL --mail-user="
+```
+
+## Local rule execution
+
+Our initial rule was to
+get the hostname of the login node. We always want to run that rule on the login
+node for that to make sense. If we tell Snakemake to run all rules via the
+Slurm executor (which is what we are doing via our new profile) this
+won't happen any more. So how do we force the rule to run on
+the login node?
+
+Well, in the case where a Snakemake rule performs a trivial task, job submission
+might be overkill (e.g., less than 1 minute's worth of compute time). As in
+our case, it would be a better
+idea to have these rules execute locally (i.e. where the `snakemake` command is
+run) instead of as a job. Snakemake lets you indicate which rules should always
+run locally with the `localrules` keyword. Let's define `hostname_login` as a
+local rule near the top of our `Snakefile`.
+
+```python
+localrules: hostname_login
+```
+
+::: keypoints
+
+- "Snakemake generates and submits its own batch scripts for your scheduler."
+- "You can store default configuration settings in a Snakemake profile"
+- "`localrules` defines rules that are executed locally, and never submitted to a cluster."
+
+:::
diff --git a/episodes/03-placeholders.md b/episodes/03-placeholders.md
new file mode 100644
index 0000000..8e93283
--- /dev/null
+++ b/episodes/03-placeholders.md
@@ -0,0 +1,79 @@
+---
+title: "Placeholders"
+teaching: 40
+exercises: 30
+---
+
+::: questions
+- "How do I make a generic rule?"
+:::
+
+::: objectives
+- "See how Snakemake deals with some errors"
+:::
+
+Our Snakefile has some duplication. For example, the names of text
+files are repeated in places throughout the Snakefile rules. Snakefiles are
+a form of code and, in any code, repetition can lead to problems (e.g.
we rename +a data file in one part of the Snakefile but forget to rename it elsewhere). + +::: callout +## D.R.Y. (Don't Repeat Yourself) + +In many programming languages, the bulk of the language features are +there to allow the programmer to describe long-winded computational +routines as short, expressive, beautiful code. Features in Python, +R, or Java, such as user-defined variables and functions are useful in +part because they mean we don't have to write out (or think about) +all of the details over and over again. This good habit of writing +things out only once is known as the "Don't Repeat Yourself" +principle or D.R.Y. +::: + +Let us set about removing some of the repetition from our Snakefile. + +## Placeholders + +To make a more general-purpose rule we need **placeholders**. Let's take a look +at what a placeholder looks like + +```python +rule hostname_remote: + output: "hostname_remote.txt" + input: + shell: + "hostname > {output}" + +``` + +As a reminder, here's the previous version from the last episode: + +```python +rule hostname_remote: + output: "hostname_remote.txt" + input: + shell: + "hostname > hostname_remote.txt" + +``` + +The new rule has replaced explicit file names with things in `{curly brackets}`, +specifically `{output}` (but it could also have been `{input}`...if that had +a value and were useful). + + +### `{input}` and `{output}` are **placeholders** + +Placeholders are used in the `shell` section of a rule, and Snakemake will +replace them with appropriate values - `{input}` with the full name of the input +file, and +`{output}` with the full name of the output file -- before running the command. + +`{resources}` is also a placeholder, and we can access a named element of the +`{resources}` with the notation `{resources.runtime}` (for example). 
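If it helps to have a mental model, placeholder substitution works much like
Python's own `str.format`. The snippet below is only an analogy (Snakemake does
the substitution for you; you never call `format` yourself in a Snakefile), but
it shows both a plain and a dotted placeholder being filled in:

```python
from types import SimpleNamespace

# Stand-ins for the values Snakemake would supply for one particular job
output = "hostname_remote.txt"
resources = SimpleNamespace(mpi="mpiexec", runtime=2)

# A plain placeholder, as in the shell section of a rule
command = "hostname > {output}".format(output=output)

# A dotted placeholder, like {resources.runtime} in a rule
note = "runtime limit: {resources.runtime} minutes".format(resources=resources)

print(command)  # hostname > hostname_remote.txt
print(note)     # runtime limit: 2 minutes
```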
+:::keypoints
+- "Snakemake rules are made more generic with placeholders"
+- "Placeholders in the shell part of the rule are replaced with values based on the chosen
+  wildcards"
+:::
diff --git a/episodes/04-snakemake_and_mpi.md b/episodes/04-snakemake_and_mpi.md
new file mode 100644
index 0000000..0e7b41a
--- /dev/null
+++ b/episodes/04-snakemake_and_mpi.md
@@ -0,0 +1,442 @@
+---
+title: "MPI applications and Snakemake"
+teaching: 30
+exercises: 20
+---
+
+::: objectives
+
+- "Define rules to run an MPI application on the cluster"
+
+:::
+
+::: questions
+
+- "How do I run an MPI application via Snakemake on the cluster?"
+
+:::
+
+Now it's time to start getting back to our real workflow. We can execute a
+command on the cluster, but what about executing the MPI application we are
+interested in? Our application is called `amdahl` and is available as an
+environment module.
+
+::: challenge
+
+Locate and load the `amdahl` module and then _replace_ our `hostname_remote`
+rule with a version that runs `amdahl`. (Don't worry about parallel MPI just
+yet, run it with a single CPU, `mpiexec -n 1 amdahl`).
+
+Does your rule execute correctly? If not, look through the log files to find
+out why.
+
+::::::solution
+
+```bash
+module spider amdahl
+module load amdahl
+```
+will locate and then load the `amdahl` module. We can then update/replace our
+rule to run the `amdahl` application:
+```python
+rule amdahl_run:
+    output: "amdahl_run.txt"
+    input:
+    shell:
+        "mpiexec -n 1 amdahl > {output}"
+```
+However, when we try to execute the rule we get an error (unless you already
+have a different version of `amdahl` available in your path). Snakemake
+reports the
+location of the logs and if we look inside we can (eventually) find
+```output
+...
+mpiexec -n 1 amdahl > amdahl_run.txt
+--------------------------------------------------------------------------
+mpiexec was unable to find the specified executable file, and therefore
+did not launch the job.
This error was first reported for process
+rank 0; it may have occurred for other processes as well.
+
+NOTE: A common cause for this error is misspelling a mpiexec command
+      line parameter option (remember that mpiexec interprets the first
+      unrecognized command line token as the executable).
+
+Node:       tmpnode1
+Executable: amdahl
+--------------------------------------------------------------------------
+...
+```
+So, even though we loaded the module before running the workflow, our
+Snakemake rule didn't find the executable. That's because the Snakemake rule
+is running in a clean _runtime environment_, and we need to somehow tell it to
+load the necessary environment module before trying to execute the rule.
+
+::::::
+:::
+
+## Snakemake and environment modules
+
+Our application is called `amdahl` and is available on the system via an
+environment module, so we need to
+tell Snakemake to load the module before it tries to execute the rule. Snakemake
+is aware of environment modules, and these can be specified via (yet another)
+option:
+```python
+rule amdahl_run:
+    output: "amdahl_run.txt"
+    input:
+    envmodules:
+        "mpi4py",
+        "amdahl"
+    shell:
+        "mpiexec -n 1 amdahl > {output}"
+```
+
+Adding these lines is not enough to make the rule execute, however. Not only do
+you have to tell Snakemake what modules to load, but you also have to tell it to
+use environment modules in general (since the use of environment modules is
+considered to make your runtime environment less reproducible as the available
+modules may differ from cluster to cluster).
This requires you to give Snakemake
+an additional option
+```bash
+snakemake --profile cluster_profile --use-envmodules amdahl_run
+```
+
+::: challenge
+
+We'll be using environment modules throughout the rest of the tutorial, so make
+that a default option of our profile (by setting its value to `True`)
+
+::::::solution
+
+Update our cluster profile to
+```yaml
+printshellcmds: True
+jobs: 3
+executor: slurm
+default-resources:
+  - mem_mb_per_cpu=3600
+  - runtime=2
+use-envmodules: True
+```
+If you want to test it, you need to erase the output file of the rule and rerun
+Snakemake.
+
+::::::
+
+:::
+
+## Snakemake and MPI
+
+We didn't really run an MPI application in the last section as we only ran on
+one core. How do we request to run on multiple cores for a single rule?
+
+Snakemake has general support for MPI, but the only executor that currently
+explicitly supports MPI is the Slurm executor (lucky for us!). If we look back
+at our Slurm to Snakemake translation table we notice the relevant options
+appear near the bottom:
+
+| SLURM             | Snakemake         | Description                                                    |
+|-------------------|-------------------|----------------------------------------------------------------|
+| ...               | ...               | ...                                                            |
+| `--ntasks`        | `tasks`           | number of concurrent tasks / ranks                             |
+| `--cpus-per-task` | `cpus_per_task`   | number of cpus per task (in case of SMP, rather use `threads`) |
+| `--nodes`         | `nodes`           | number of nodes                                                |
+
+The one we are interested in is `tasks`, as we are only going to increase the
+number of ranks. We can define these in a `resources` section of our rule and
+refer to them using placeholders:
+```python
+rule amdahl_run:
+    output: "amdahl_run.txt"
+    input:
+    envmodules:
+        "amdahl"
+    resources:
+        mpi='mpiexec',
+        tasks=2
+    shell:
+        "{resources.mpi} -n {resources.tasks} amdahl > {output}"
+```
+
+That worked but now we have a bit of an issue.
We want to do this for a few
+different values of `tasks`, which would mean we would need a different output
+file for every run. It would be great if we could somehow indicate in the
+`output` the value that we want to use for `tasks`...and have Snakemake pick
+that up.
+
+We could use a _wildcard_ in the `output` to allow us to
+define the `tasks` we wish to use. The syntax for such a wildcard looks like
+```python
+output: "amdahl_run_{parallel_tasks}.txt"
+```
+where `parallel_tasks` is our wildcard.
+
+::: callout
+## Wildcards
+
+Wildcards are used in the `input` and `output` lines of the rule to represent
+parts of filenames.
+Much like the `*` pattern in the shell, the wildcard can stand in for any text
+in order to make up
+the desired filename. As with naming your rules, you may choose any name you
+like for your
+wildcards, so here we used `parallel_tasks`. Using the same wildcards in the
+input and output is what tells Snakemake how to match input files to output
+files.
+
+If two rules use a wildcard with the same name then Snakemake will treat them as
+different entities - rules in Snakemake are self-contained in this way.
+
+In the `shell` line you can reference the wildcard with
+`{wildcards.parallel_tasks}`
+:::
+
+## Snakemake order of operations
+
+We're only just getting started with some simple rules, but it's worth thinking
+about exactly what Snakemake is doing when you run it. There are three distinct
+phases:
+
+1. Prepares to run:
+    1. Reads in all the rule definitions from the Snakefile
+1. Plans what to do:
+    1. Sees what file(s) you are asking it to make
+    1. Looks for a matching rule by looking at the `output`s of all the rules it knows
+    1. Fills in the wildcards to work out the `input` for this rule
+    1. Checks that this input file (if required) is actually available
+1. Runs the steps:
+    1. Creates the directory for the output file, if needed
+    1. Removes the old output file if it is already there
+    1.
Only then, runs the shell command with the placeholders replaced
+    1. Checks that the command ran without errors *and* made the new output file
+       as expected
+
+::: callout
+## Dry-run (`-n`) mode
+
+It's often useful to run just the first two phases, so that Snakemake will plan
+out the jobs to run, and print them to the screen, but never actually run them.
+This is done with the `-n` flag, e.g.:
+
+```bash
+$ snakemake -n ...
+```
+:::
+
+The amount of checking may seem pedantic right now, but as the workflow gains
+more steps this will become very useful to us indeed.
+
+## Using wildcards in our rule
+
+We would like to use a wildcard in the `output` to allow us to
+define the number of `tasks` we wish to use. Based on what we've seen so far,
+you might imagine this could look like
+```python
+rule amdahl_run:
+    output: "amdahl_run_{parallel_tasks}.txt"
+    input:
+    envmodules:
+        "amdahl"
+    resources:
+        mpi="mpiexec",
+        tasks="{parallel_tasks}"
+    shell:
+        "{resources.mpi} -n {resources.tasks} amdahl > {output}"
+```
+but there are two problems with this:
+
+* The only way for Snakemake to know the value of the wildcard is for the user
+  to explicitly request a concrete output file (rather than call the rule):
+  ```bash
+  snakemake --profile cluster_profile amdahl_run_2.txt
+  ```
+  This is perfectly valid, as Snakemake can figure out that it has a rule that
+  can match that filename.
+* The bigger problem is that even doing that does not work; it seems we cannot
+  use a wildcard for `tasks`:
+  ```output
+  WorkflowError:
+  SLURM job submission failed. The error message was sbatch: error: Invalid numeric value "{parallel_tasks}" for --ntasks.
+  ```
+
+Unfortunately for us, there is no direct way for us to access the wildcards
+for `tasks`. The
+reason for this is that Snakemake tries to use the value of `tasks` during its
+initialisation stage, which is before we know the value of the wildcard. We need
+to defer the determination of `tasks` to later on. This can be achieved by
This can be achieved by +specifying an input function instead of a value for this +scenario. The solution then is to write a one-time use function to manipulate +Snakemake into doing this for us. Since the function is specifically for the +rule, we can use a one-line function without a name. These kinds of functions +are called either anonymous functions or lamdba functions (both mean the same +thing), and are a feature of Python (and other programming languages). + +To define a lambda function in python, the general syntax is as follows: +```python +lambda x: x + 54 +``` +Since our function _can_ take the wildcards as arguments, we can use that to set +the value for `tasks`: +```python +rule amdahl_run: + output: "amdahl_run_{parallel_tasks}.txt" + input: + envmodules: + "amdahl" + resources: + mpi="mpiexec", + # No direct way to access the wildcard in tasks, so we need to do this + # indirectly by declaring a short function that takes the wildcards as an + # argument + tasks=lambda wildcards: int(wildcards.parallel_tasks) + input: + shell: + "{resources.mpi} -n {resources.tasks} amdahl > {output}" +``` + +Now we have a rule that can be used to generate output from runs of an +arbitrary number of parallel tasks. + +::: callout + +## Comments in Snakefiles + +In the above code, the line beginning `#` is a comment line. Hopefully you are already in the +habit of adding comments to your own scripts. Good comments make any script more readable, and +this is just as true with Snakefiles. + +::: + +Since our rule is now capable of generating an arbitrary number of output files +things could get very crowded in our current directory. It's probably best then +to put the runs into a separate folder to keep things tidy. 
We can add the
+folder directly to our `output` and Snakemake will take care of directory
+creation for us:
+
+```python
+rule amdahl_run:
+    output: "runs/amdahl_run_{parallel_tasks}.txt"
+    input:
+    envmodules:
+        "amdahl"
+    resources:
+        mpi="mpiexec",
+        # No direct way to access the wildcard in tasks, so we need to do this
+        # indirectly by declaring a short function that takes the wildcards as an
+        # argument
+        tasks=lambda wildcards: int(wildcards.parallel_tasks)
+    shell:
+        "{resources.mpi} -n {resources.tasks} amdahl > {output}"
+```
+
+::: challenge
+
+Create an output file (under the `runs` folder) for the case where we have 6
+parallel tasks
+
+(HINT: Remember that Snakemake needs to be able to match the requested file to
+the `output` from a rule)
+
+:::::: solution
+
+```bash
+snakemake --profile cluster_profile runs/amdahl_run_6.txt
+```
+
+::::::
+
+:::
+
+Another thing about our application `amdahl` is that we ultimately want to
+process the output to generate our scaling plot. The output right now is useful
+for reading but makes processing harder. `amdahl` has an option that actually
+makes this easier for us. To see the `amdahl` options we can use
+```bash
+[ocaisa@node1 ~]$ module load amdahl
+[ocaisa@node1 ~]$ amdahl --help
+```
+```output
+usage: amdahl [-h] [-p [PARALLEL_PROPORTION]] [-w [WORK_SECONDS]] [-t] [-e]
+
+options:
+  -h, --help            show this help message and exit
+  -p [PARALLEL_PROPORTION], --parallel-proportion [PARALLEL_PROPORTION]
+                        Parallel proportion should be a float between 0 and 1
+  -w [WORK_SECONDS], --work-seconds [WORK_SECONDS]
+                        Total seconds of workload, should be an integer greater than 0
+  -t, --terse           Enable terse output
+  -e, --exact           Disable random jitter
+```
+The option we are looking for is `--terse`, and that will make `amdahl` print
+output in a format that is much easier to process, JSON.
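To see why JSON output is easier to process, here is how one run's results
could later be read back in Python. Note that the field names in this sample
are made up for illustration; inspect a real output file to see what `amdahl`
actually emits:

```python
import json

# Hypothetical contents of one terse amdahl run; the real
# field names may differ from these illustrative ones.
raw = '{"nproc": 2, "parallel_proportion": 0.8, "execution_time": 12.3}'

result = json.loads(raw)  # parse the JSON text into a Python dict
print(result["nproc"], result["execution_time"])  # 2 12.3
```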
Files in JSON format
+typically use the file extension `.json`, so let's add that option to our
+`shell` command _and_ change the file format of the `output` to match our new
+command:
+
+```python
+rule amdahl_run:
+    output: "runs/amdahl_run_{parallel_tasks}.json"
+    input:
+    envmodules:
+        "amdahl"
+    resources:
+        mpi="mpiexec",
+        # No direct way to access the wildcard in tasks, so we need to do this
+        # indirectly by declaring a short function that takes the wildcards as an
+        # argument
+        tasks=lambda wildcards: int(wildcards.parallel_tasks)
+    shell:
+        "{resources.mpi} -n {resources.tasks} amdahl --terse > {output}"
+```
+
+There was another parameter for `amdahl` that caught my eye. `amdahl` has an
+option `--parallel-proportion` (or `-p`) which we might be interested in
+changing as it changes the behaviour of the code, and therefore has an impact on
+the values we get in our results. Let's add
+another directory layer to our output format to reflect a particular choice for
+this value. We can use a wildcard so we don't have to choose the value right
+away:
+
+```python
+rule amdahl_run:
+    output: "p_{parallel_proportion}/runs/amdahl_run_{parallel_tasks}.json"
+    input:
+    envmodules:
+        "amdahl"
+    resources:
+        mpi="mpiexec",
+        # No direct way to access the wildcard in tasks, so we need to do this
+        # indirectly by declaring a short function that takes the wildcards as an
+        # argument
+        tasks=lambda wildcards: int(wildcards.parallel_tasks)
+    shell:
+        "{resources.mpi} -n {resources.tasks} amdahl --terse -p {wildcards.parallel_proportion} > {output}"
+```
+
+::: challenge
+
+Create an output file for a value of `-p` of 0.999 (the default value is 0.8)
+for the case where we have 6 parallel tasks.
+
+:::::: solution
+
+```bash
+snakemake --profile cluster_profile p_0.999/runs/amdahl_run_6.json
+```
+
+::::::
+
+:::
+
+
+::: keypoints
+
+- "Snakemake chooses the appropriate rule by replacing wildcards such that the
+  output matches the target"
+- "Snakemake checks for various error conditions and will stop if it sees a
+  problem"
+
+:::
diff --git a/episodes/05-chaining_rules.md b/episodes/05-chaining_rules.md
new file mode 100644
index 0000000..b8cdbfb
--- /dev/null
+++ b/episodes/05-chaining_rules.md
@@ -0,0 +1,191 @@
+---
+title: "Chaining rules"
+teaching: 40
+exercises: 30
+---
+
+::: questions
+- "How do I combine rules into a workflow?"
+- "How do I make a rule with multiple inputs and outputs?"
+:::
+
+::: objectives
+- ""
+:::
+
+## A pipeline of multiple rules
+
+We now have a rule that can generate output for any value of `-p` and any number
+of tasks; we just need to call Snakemake with the parameters that we want:
+```bash
+snakemake --profile cluster_profile p_0.999/runs/amdahl_run_6.json
+```
+
+That's not exactly convenient though: to generate a full dataset we have to run
+Snakemake lots of times with different output file targets. Instead, let's
+create a rule that can generate those files for us.
+
+Chaining rules in Snakemake is a matter of choosing filename patterns that
+connect the rules.
+There's something of an art to it - most times there are several options that
+will work:
+
+```python
+rule generate_run_files:
+    output: "p_{parallel_proportion}_runs.txt"
+    input: "p_{parallel_proportion}/runs/amdahl_run_6.json"
+    shell:
+        "echo {input} done > {output}"
+```
+
+::: challenge
+
+The new rule does no real work; it just makes sure we create the file we want.
+It's not worth executing on the cluster. How do we ensure it runs on the
+login node only?
+
+:::::: solution
+
+We need to add the new rule to our `localrules`:
+```python
+localrules: hostname_login, generate_run_files
+```
+
+::::::
+
+:::
+
+Now let's run the new rule (remember we need to request the output file by name
+as the `output` in our rule contains a wildcard pattern):
+```bash
+[ocaisa@node1 ~]$ snakemake --profile cluster_profile/ p_0.999_runs.txt
+```
+```output
+Using profile cluster_profile/ for setting default command line arguments.
+Building DAG of jobs...
+Retrieving input from storage.
+Using shell: /cvmfs/software.eessi.io/versions/2023.06/compat/linux/x86_64/bin/bash
+Provided remote nodes: 3
+Job stats:
+job                   count
+------------------  -------
+amdahl_run                1
+generate_run_files        1
+total                     2
+
+Select jobs to execute...
+Execute 1 jobs...
+
+[Tue Jan 30 17:39:29 2024]
+rule amdahl_run:
+    output: p_0.999/runs/amdahl_run_6.json
+    jobid: 1
+    reason: Missing output files: p_0.999/runs/amdahl_run_6.json
+    wildcards: parallel_proportion=0.999, parallel_tasks=6
+    resources: mem_mb=1000, mem_mib=954, disk_mb=1000, disk_mib=954, tmpdir=<TBD>, mem_mb_per_cpu=3600, runtime=2, mpi=mpiexec, tasks=6
+
+mpiexec -n 6 amdahl --terse -p 0.999 > p_0.999/runs/amdahl_run_6.json
+No SLURM account given, trying to guess.
+Guessed SLURM account: def-users
+Job 1 has been submitted with SLURM jobid 342 (log: /home/ocaisa/.snakemake/slurm_logs/rule_amdahl_run/342.log).
+[Tue Jan 30 17:47:31 2024]
+Finished job 1.
+1 of 2 steps (50%) done
+Select jobs to execute...
+Execute 1 jobs...
+ +[Tue Jan 30 17:47:31 2024] +localrule generate_run_files: + input: p_0.999/runs/amdahl_run_6.json + output: p_0.999_runs.txt + jobid: 0 + reason: Missing output files: p_0.999_runs.txt; Input files updated by another job: p_0.999/runs/amdahl_run_6.json + wildcards: parallel_proportion=0.999 + resources: mem_mb=1000, mem_mib=954, disk_mb=1000, disk_mib=954, tmpdir=/tmp, mem_mb_per_cpu=3600, runtime=2 + +echo p_0.999/runs/amdahl_run_6.json done > p_0.999_runs.txt +[Tue Jan 30 17:47:31 2024] +Finished job 0. +2 of 2 steps (100%) done +Complete log: .snakemake/log/2024-01-30T173929.781106.snakemake.log +``` + +Look at the logging messages that Snakemake prints in the terminal. What has happened here? + +1. Snakemake looks for a rule to make `p_0.999_runs.txt` +1. It determines that "generate_run_files" can make this if + `parallel_proportion=0.999` +1. It sees that the input needed is therefore `p_0.999/runs/amdahl_run_6.json` +

+1. Snakemake looks for a rule to make `p_0.999/runs/amdahl_run_6.json` +1. It determines that "amdahl_run" can make this if `parallel_proportion=0.999` + and `parallel_tasks=6` +

+1. Now that Snakemake has reached an available input file (in this case, no
+   input file is actually required), it runs both steps to get the final output
+
+This, in a nutshell, is how we build workflows in Snakemake.
+
+1. Define rules for all the processing steps
+1. Choose `input` and `output` naming patterns that allow Snakemake to link the
+   rules
+1. Tell Snakemake to generate the final output file(s)
+
+If you are used to writing regular scripts, this takes a little
+getting used to. Rather than listing steps in order of execution, you are always
+**working backwards** from the final desired result. The order of operations is
+determined by applying the pattern matching rules to the filenames, not by the
+order of the rules in the Snakefile.
+
+::: callout
+
+## Outputs first?
+
+The Snakemake approach of working backwards from the desired output to determine
+the workflow is why we're putting the `output` lines first in all our rules - to
+remind us that these are what Snakemake looks at first!
+
+Many users of Snakemake, and indeed the official documentation, prefer to have
+the `input` first, so in practice you should use whatever order makes sense to
+you.
+
+:::
+
+::: callout
+
+## `log` outputs in Snakemake
+
+Snakemake has a dedicated rule field for outputs that are
+[log files](https://snakemake.readthedocs.io/en/stable/snakefiles/rules.html#log-files),
+and these are mostly treated as regular outputs except that log files are not
+removed if the job produces an error. This means you can look at the log to help
+diagnose the error. In a real workflow this can be very useful, but in terms of
+learning the fundamentals of Snakemake we'll stick with regular `input` and
+`output` fields here.
+
+:::
+
+::: callout
+
+## Errors are normal
+
+Don't be disheartened if you see errors when first testing
+your new Snakemake pipelines.
There is a lot that can go wrong when writing a
+new workflow, and you'll normally need several iterations to get things just
+right. One advantage of the Snakemake approach compared to regular scripts is
+that Snakemake fails fast when there is a problem, rather than ploughing on
+and potentially running junk calculations on partial or corrupted data. Another
+advantage is that when a step fails we can safely resume from where we left off.
+
+:::
+
+
+
+::: keypoints
+- "Snakemake links rules by iteratively looking for rules that make missing
+  inputs"
+- "Rules may have multiple named inputs and/or outputs"
+- "If a shell command does not yield an expected output then Snakemake will
+  regard that as a failure"
+:::
+
diff --git a/episodes/06-expansion.md b/episodes/06-expansion.md
new file mode 100644
index 0000000..e332dbe
--- /dev/null
+++ b/episodes/06-expansion.md
@@ -0,0 +1,194 @@
+---
+title: "Processing lists of inputs"
+teaching: 50
+exercises: 30
+---
+
+::: questions
+- "How do I process multiple files at once?"
+- "How do I combine multiple files together?"
+:::
+
+::: objectives
+- "Use Snakemake to process all our samples at once"
+- "Make a scalability plot that brings our results together"
+:::
+
+We created a rule that can generate a single output file, but we're not going to
+create multiple rules for every output file. We want to generate all of the run
+files with a single rule if we can, and Snakemake can indeed take a list of
+input files:
+
+```python
+rule generate_run_files:
+    output: "p_{parallel_proportion}_runs.txt"
+    input: "p_{parallel_proportion}/runs/amdahl_run_2.json", "p_{parallel_proportion}/runs/amdahl_run_6.json"
+    shell:
+        "echo {input} done > {output}"
+```
+
+That's great, but we don't want to have to list all of the files we're
+interested in individually. How can we do this?
+
+## Defining a list of samples to process
+
+To do this, we can define some lists as Snakemake **global variables**.
+
+Global variables should be added before the rules in the Snakefile.
+
+```python
+# Task sizes we wish to run
+NTASK_SIZES = [1, 2, 3, 4, 5]
+```
+
+* Unlike with variables in shell scripts, we can put spaces around the `=` sign,
+  but they are not mandatory.
+* The list of values is enclosed in square brackets and
+  comma-separated. If you know any Python you'll recognise this as Python list
+  syntax.
+* A good convention is to use capitalized names for these variables, but this is
+  not mandatory.
+* Although these are referred to as variables, you can't actually change the
+  values once the workflow is running, so lists defined this way are more like
+  constants.
+
+## Using a Snakemake rule to define a batch of outputs
+
+Now let's update our Snakefile to leverage the new global variable and create a
+list of files:
+```python
+rule generate_run_files:
+    output: "p_{parallel_proportion}_runs.txt"
+    input: expand("p_{{parallel_proportion}}/runs/amdahl_run_{count}.json", count=NTASK_SIZES)
+    shell:
+        "echo {input} done > {output}"
+```
+
+The `expand(...)` function in this rule generates a list of filenames, by taking
+the first thing in its parentheses as a template and replacing `{count}`
+with all the `NTASK_SIZES`. Since there are 5 elements in the list, this will
+yield 5 files we want to make. Note that we had to protect our wildcard in a
+second set of braces so it wouldn't be interpreted as something that needed
+to be expanded.
+
+In our current case we still rely on the file name to define the value of the
+wildcard `parallel_proportion`, so we can't call the rule directly; we still
+need to request a specific file:
+
+```bash
+snakemake --profile cluster_profile/ p_0.999_runs.txt
+```
+
+If you don't specify a target rule name or any file names on the command line
+when running Snakemake, the default is to use **the first rule** in the
+Snakefile as the target.
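The brace doubling used by `expand()` behaves just like the escaping in Python's own `str.format`, which gives us an easy way to see what list of filenames is produced. This plain-Python sketch only mimics the substitution (the real `expand()` is provided by Snakemake):

```python
# Mimic expand("p_{{parallel_proportion}}/runs/amdahl_run_{count}.json",
#              count=NTASK_SIZES)
# with str.format. The doubled braces collapse to single braces, leaving the
# {parallel_proportion} wildcard intact for Snakemake to fill in later.
NTASK_SIZES = [1, 2, 3, 4, 5]

template = "p_{{parallel_proportion}}/runs/amdahl_run_{count}.json"
filenames = [template.format(count=n) for n in NTASK_SIZES]

print(filenames[0])  # p_{parallel_proportion}/runs/amdahl_run_1.json
print(len(filenames))  # 5
```

Running this shows one entry per task size, each still carrying the literal `{parallel_proportion}` wildcard.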
+
+::: callout
+## Rules as targets
+
+Giving the name of a rule to Snakemake on the command line only works when that
+rule has *no wildcards* in the outputs, because Snakemake has no way to know
+what the desired wildcards might be. You will see the error "Target rules may
+not contain wildcards." This can also happen when you don't supply any explicit
+targets on the command line at all, and Snakemake tries to run the first rule
+defined in the Snakefile.
+
+:::
+
+## Rules that combine multiple inputs
+
+Our `generate_run_files` rule is a rule which takes a list of input files. The
+length of that list is not fixed by the rule, but can change based on
+`NTASK_SIZES`.
+
+In our workflow the final step is to take all the generated files and combine
+them into a plot. To do that, we can use a Python
+library called `matplotlib`. It's beyond the scope of this tutorial to write
+the Python script to create a final plot, so we provide you with the script as
+part of this lesson. You can download it with:
+```bash
+curl -O https://ocaisa.github.io/hpc-workflows/files/plot_terse_amdahl_results.py
+```
+
+The script `plot_terse_amdahl_results.py` needs a command line that looks like:
+```bash
+python plot_terse_amdahl_results.py <output image> <1st input file> <2nd input file> ...
+```
+Let's introduce that into our `generate_run_files` rule:
+
+```python
+rule generate_run_files:
+    output: "p_{parallel_proportion}_runs.txt"
+    input: expand("p_{{parallel_proportion}}/runs/amdahl_run_{count}.json", count=NTASK_SIZES)
+    shell:
+        "python plot_terse_amdahl_results.py {output} {input}"
+```
+
+::: challenge
+
+This script relies on `matplotlib`. Is it available as an environment module?
+Add this requirement to our rule.
+ +:::::: solution + +```python +rule generate_run_files: + output: "p_{parallel_proportion}_scalability.jpg" + input: expand("p_{{parallel_proportion}}/runs/amdahl_run_{count}.json", count=NTASK_SIZES) + envmodules: + "matplotlib" + shell: + "python plot_terse_amdahl_results.py {output} {input}" +``` + +:::::: + +::: + +Now we finally get to generate a scaling plot! Run the final Snakemake command +```bash +snakemake --profile cluster_profile/ p_0.999_scalability.jpg +``` + +::: challenge + +Generate the scalability plot for all values from 1 to 10 cores. + +:::::: solution + +```python +NTASK_SIZES = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10] +``` + +:::::: + +::: + +::: challenge + +Rerun the workflow for a `p` value of 0.8 + +:::::: solution + +```bash +snakemake --profile cluster_profile/ p_0.8_scalability.jpg +``` + +:::::: + +::: + +::: challenge +## Bonus round + +Create a final rule that can be called directly and generates a scaling plot for +3 different values of `p`. + +::: + +::: keypoints +- "Use the `expand()` function to generate lists of filenames you want to combine" +- "Any `{input}` to a rule can be a variable-length list" +::: + diff --git a/episodes/amdahl_foundation.md b/episodes/amdahl_foundation.md deleted file mode 100644 index fb0a532..0000000 --- a/episodes/amdahl_foundation.md +++ /dev/null @@ -1,126 +0,0 @@ ---- -title: "Running a Parallel Application on the Cluster" -teaching: 10 -exercises: 2 ---- - -:::::::::::::::::::::::::::::: questions - -- What output does the Amdahl code generate? -- Why does parallelizing the amdahl code make it faster? 
- -:::::::::::::::::::::::::::::::::::::::: - -::::::::::::::::::::::::::::: objectives - -- Run the amdahl parallel code on the cluster -- Note what output is generated, and where it goes -- Predict the trend of execution time vs parallelism - -:::::::::::::::::::::::::::::::::::::::: - -## Introduction - -A high-performance computing cluster offers powerful -computational resources to its users, but taking advantage -of these resources is not always straightforward. The -cluster system does not work in the same way as systems -you may be more familiar with. - -The software we will use in this lesson is a model of -the kind of parallel task that is well-adapted to -high-performance computing resources. It's called "amdahl", -named for Eugene Amdahl, a famous computer scientist who -coined "Amdahl's Law", which is about the advantages and -limitations of parallelism in code execution. - -:::::::::::::::::::::::::::::::: callout - -[Amdahl's Law](https://en.wikipedia.org/wiki/Amdahl%27s_law) is -a statement about how much benefit you can expect to get by -parallelizing a computer program. - -The limitation arises from the fact that, in any application, -there is some fraction of the work to be done which is inherently -serial, and some fraction which is amenable to parallelization. -The law is a quantitative expression of the fact that, by -parallelizing the code, you can only ever make the parallel -part faster, you cannot reduce the execution time of the -serial part. - -As a practical matter, this means that developer effort spent -on parallelization has diminishing returns on the overall -reduction in execution time. - -:::::::::::::::::::::::::::::::::::::::: - -## The Amdahl Code - -Download it and install it, via pip. -Note that `amdahl` depends on MPI, -so make sure that's also available. 
- -On the HPC Carpentry cluster: - -``` shell -[user@login1 ~]$ module load OpenMPI -[user@login1 ~]$ module load Python -[user@login1 ~]$ pip install amdahl -``` - -## Running It on the Cluster - -Use the `sacct` command to see the run-time. -The run-time is also recorded in the output itself. - -``` shell -[user@login1 ~]$ nano amdahl_1.sh -``` - -``` bash -#!/bin/bash -#SBATCH -t 00:01 # max 1 minute -#SBATCH -p smnodes # max 4 cores -#SBATCH -n 1 # use 1 core -#SBATCH -o amdahl-np1.out # record result - -module load OpenMPI -module load Python - -mpirun amdahl -``` - -``` shell -[user@login1 ~]$ sbatch amdahl_1.sh -``` - -:::::::::::::::::::::::::::::: challenge - -Run the amdhal code with a few (small!) levels -of parallelism. Make a quantitative estimate of -how much faster the code will run with 3 processors -than 2. The naive estimate would be that it would run -1.5× the speed, or equivalently, that it would -complete in 2/3 the time. - -:::::::::::::::: solution - -``` shell -[user@login1 ~]$ sbatch amdahl_1.sh # serial job ~ 25 sec -[user@login1 ~]$ sbatch amdahl_2.sh # 2-way parallel ~ 20 sec -[user@login1 ~]$ sbatch amdahl_3.sh # 3-way parallel ~ 16 sec -``` - -The amdahl code runs faster with 3 processors than with -2, but the speed-up is less than 1.5×. - -::::::::::::::::::::::::: - -:::::::::::::::::::::::::::::::::::::: - -:::::::::::::::::::::::::::::: keypoints - -- The amdahl code is a model of a parallel application -- The execution speed depends on the degree of parallelism - -:::::::::::::::::::::::::::::::::::::::: diff --git a/episodes/amdahl_snakemake.md b/episodes/amdahl_snakemake.md deleted file mode 100644 index 4686339..0000000 --- a/episodes/amdahl_snakemake.md +++ /dev/null @@ -1,61 +0,0 @@ ---- -title: "Amdahl Parallel Runs" -teaching: 10 -exercises: 2 ---- - -:::::::::::::::::::::::::::::: questions - -- How can we collect data on Amdahl run times? 
- -:::::::::::::::::::::::::::::::::::::::: - -::::::::::::::::::::::::::::: objectives - -- Collect systematic data on the runtime of the amdahl code - -:::::::::::::::::::::::::::::::::::::::: - -## Systematic Data Collection - -Using what we have learned so far, including Snakemake -profiles and rules, we will now compose a Snakefile -that runs the Amdahl example code over a range of -parallel widths. This workflow will generate the -data we will use in the next module to demonstrate -the diminishing returns of increasing parallelism. - -## Write a File - -Compose the Snakemake file that does what we want. - -We can put the widths in a list and iterate over -them. We will use the profile generated previously -to ensure that the jobs run on the cluster. - -## Run Snakemake - -Throw the switch! - -:::::::::::::::::::::::::::::: challenge - -Our example has a single paramter, the parallelism, -that we vary. How would you generalize this to arbitrary -parameters? - -:::::::::::::::: solution - -Arbitrary parameters are still finite, so you could -just generate a flat list of all the combinations, and iterate -over that. Or you could generate two lists and do a nested -loop. - -::::::::::::::::::::::::: - -:::::::::::::::::::::::::::::::::::::::: - -:::::::::::::::::::::::::::::: keypoints - -- A relatively compact snakemake file collects interesting data. 
- -:::::::::::::::::::::::::::::::::::::::: diff --git a/episodes/files/Snakefile_amdahl_cluster b/episodes/files/Snakefile_amdahl_cluster deleted file mode 100644 index eca4d3e..0000000 --- a/episodes/files/Snakefile_amdahl_cluster +++ /dev/null @@ -1,8 +0,0 @@ -rule one: - input: - output: 'amdahl_cluster.txt' - resources: - mpi="mpirun", - tasks=3 - shell: - "module load OpenMPI; mpirun -np {resources.tasks} amdahl > amdahl_cluster.txt" diff --git a/episodes/files/Snakefile_cluster b/episodes/files/Snakefile_cluster deleted file mode 100644 index ac60d86..0000000 --- a/episodes/files/Snakefile_cluster +++ /dev/null @@ -1,4 +0,0 @@ -rule: - input: - output: 'host.txt' - shell: 'hostname > host.txt' diff --git a/episodes/files/Snakefile_cluster_iteration b/episodes/files/Snakefile_cluster_iteration deleted file mode 100644 index 41a94a2..0000000 --- a/episodes/files/Snakefile_cluster_iteration +++ /dev/null @@ -1,24 +0,0 @@ -# -# Run a bunch of Amdahl jobs and aggregate the output. -# -WIDTHS=[1,2] -# -def getwidth(wildcards): - return wildcards.sample - -rule plot: - input: expand('{size}.out',size=WIDTHS) - output: 'done.out' - resources: - mpi="mpirun", - tasks=1 - shell: 'echo "{WIDTHS}, Done!" > done.out' -rule iterate: - input: - output: '{sample}.out' - resources: - mpi="mpirun", - tasks=getwidth - shell: - "module load OpenMPI; mpirun -np {resources.tasks} amdahl > {wildcards.sample}.out" - diff --git a/episodes/files/Snakefile_hello b/episodes/files/Snakefile_hello deleted file mode 100644 index 0b94a00..0000000 --- a/episodes/files/Snakefile_hello +++ /dev/null @@ -1,4 +0,0 @@ -rule: - input: - output: 'hello.txt' - shell: 'echo "Hello there, world!" >> hello.txt' diff --git a/episodes/files/Snakefile_iterative b/episodes/files/Snakefile_iterative deleted file mode 100644 index 8fe13f8..0000000 --- a/episodes/files/Snakefile_iterative +++ /dev/null @@ -1,13 +0,0 @@ -# -# Iterative example. 
-# -NAMES=['one','two','three'] -# -rule done: - input: expand('{name}.out',name=NAMES) - output: 'done.out' - shell: 'echo "Done!" > done.out' -rule iterate: - input: - output: '{sample}.out' - shell: 'echo {output} > {output}' diff --git a/episodes/files/Snakefile_tworules b/episodes/files/Snakefile_tworules deleted file mode 100644 index 66558a6..0000000 --- a/episodes/files/Snakefile_tworules +++ /dev/null @@ -1,9 +0,0 @@ -rule last: - input: 'lower.txt' - output: 'upper.txt' - shell: 'cat lower.txt | tr a-z A-Z > upper.txt' - -rule first: - input: - output: 'lower.txt' - shell: 'echo "Hello, world!" > lower.txt' diff --git a/episodes/files/plot_terse_amdahl_results.py b/episodes/files/plot_terse_amdahl_results.py new file mode 100644 index 0000000..a85425f --- /dev/null +++ b/episodes/files/plot_terse_amdahl_results.py @@ -0,0 +1,49 @@ +import sys +import json +import matplotlib.pyplot as plt +import numpy as np + +def process_files(file_list, output="plot.jpg"): + value_tuples=[] + for filename in file_list: + # Open the JSON file and load data + with open(filename, 'r') as file: + data = json.load(file) + value_tuples.append((data['nproc'], data['execution_time'])) + + # Sort the tuples + sorted_list = sorted(value_tuples) + + # Unzip the sorted list into two lists + x, y = zip(*sorted_list) + + # Create a line plot + plt.plot(x, y, marker='o') + + # Adding the y=1/x line + x_line = np.linspace(1, max(x), 100) # Create x values for the line + y_line = (y[0]/x[0]) / x_line # Calculate corresponding (scaled) y values + + plt.plot(x_line, y_line, linestyle='--', color='red', label='Perfect scaling') + + # Adding title and labels + plt.title("Scaling plot") + plt.xlabel("Number of cores") + plt.ylabel("Wallclock time (seconds)") + + # Show the legend + plt.legend() + + # Save the plot to a JPEG file + plt.savefig(output, format='jpeg') + +if __name__ == "__main__": + # The first command-line argument is the script name itself, so we skip it + output = 
sys.argv[1] + filenames = sys.argv[2:] + + if filenames: + process_files(filenames, output=output) + else: + print("No files provided.") + diff --git a/episodes/files/queuing_config.yaml b/episodes/files/queuing_config.yaml deleted file mode 100644 index 7db5043..0000000 --- a/episodes/files/queuing_config.yaml +++ /dev/null @@ -1,6 +0,0 @@ -# snakemake -j 3 --cluster "sbatch -N 1 -n {resources.tasks} -p node" -cluster: - sbatch - --partition=node - --nodes=1 - --tasks={resources.tasks} diff --git a/episodes/files/snakefiles/Snakefile_ep01 b/episodes/files/snakefiles/Snakefile_ep01 new file mode 100644 index 0000000..32de8e2 --- /dev/null +++ b/episodes/files/snakefiles/Snakefile_ep01 @@ -0,0 +1,5 @@ +rule hostname_login: + output: "hostname_login.txt" + input: + shell: + "hostname > hostname_login.txt" diff --git a/episodes/files/snakefiles/Snakefile_ep02 b/episodes/files/snakefiles/Snakefile_ep02 new file mode 100644 index 0000000..6957cfb --- /dev/null +++ b/episodes/files/snakefiles/Snakefile_ep02 @@ -0,0 +1,13 @@ +localrules: hostname_login + +rule hostname_login: + output: "hostname_login.txt" + input: + shell: + "hostname > hostname_login.txt" + +rule hostname_remote: + output: "hostname_remote.txt" + input: + shell: + "hostname > hostname_remote.txt" diff --git a/episodes/files/snakefiles/Snakefile_ep04 b/episodes/files/snakefiles/Snakefile_ep04 new file mode 100644 index 0000000..b8c9897 --- /dev/null +++ b/episodes/files/snakefiles/Snakefile_ep04 @@ -0,0 +1,22 @@ +localrules: hostname_login + +rule hostname_login: + output: "hostname_login.txt" + input: + shell: + "hostname > hostname_login.txt" + +rule amdahl_run: + output: "p_{parallel_proportion}/runs/amdahl_run_{parallel_tasks}.json" + input: + envmodules: + "amdahl" + resources: + mpi="mpiexec", + # No direct way to access the wildcard in tasks, so we need to do this + # indirectly by declaring a short function that takes the wildcards as an + # argument + tasks=lambda wildcards: 
int(wildcards.parallel_tasks)
+    shell:
+        "{resources.mpi} -n {resources.tasks} amdahl --terse -p {wildcards.parallel_proportion} > {output}"
diff --git a/episodes/files/snakefiles/Snakefile_ep05 b/episodes/files/snakefiles/Snakefile_ep05
new file mode 100644
index 0000000..93ec684
--- /dev/null
+++ b/episodes/files/snakefiles/Snakefile_ep05
@@ -0,0 +1,28 @@
+localrules: hostname_login, generate_run_files
+
+rule hostname_login:
+    output: "hostname_login.txt"
+    input:
+    shell:
+        "hostname > hostname_login.txt"
+
+rule generate_run_files:
+    output: "p_{parallel_proportion}_runs.txt"
+    input: "p_{parallel_proportion}/runs/amdahl_run_6.json"
+    shell:
+        "echo {input} done > {output}"
+
+rule amdahl_run:
+    output: "p_{parallel_proportion}/runs/amdahl_run_{parallel_tasks}.json"
+    input:
+    envmodules:
+        "amdahl"
+    resources:
+        mpi="mpiexec",
+        # No direct way to access the wildcard in tasks, so we need to do this
+        # indirectly by declaring a short function that takes the wildcards as an
+        # argument
+        tasks=lambda wildcards: int(wildcards.parallel_tasks)
+    shell:
+        "{resources.mpi} -n {resources.tasks} amdahl --terse -p {wildcards.parallel_proportion} > {output}"
diff --git a/episodes/files/snakefiles/Snakefile_ep06 b/episodes/files/snakefiles/Snakefile_ep06
new file mode 100644
index 0000000..73c17e4
--- /dev/null
+++ b/episodes/files/snakefiles/Snakefile_ep06
@@ -0,0 +1,32 @@
+NTASK_SIZES = [1, 2, 3, 4, 5]
+
+localrules: hostname_login, generate_run_files
+
+rule hostname_login:
+    output: "hostname_login.txt"
+    input:
+    shell:
+        "hostname > hostname_login.txt"
+
+rule generate_run_files:
+    output: "p_{parallel_proportion}_scalability.jpg"
+    input: expand("p_{{parallel_proportion}}/runs/amdahl_run_{count}.json", count=NTASK_SIZES)
+    envmodules:
+        "matplotlib"
+    shell:
+        "python plot_terse_amdahl_results.py {output} {input}"
+
+rule amdahl_run:
+    output: "p_{parallel_proportion}/runs/amdahl_run_{parallel_tasks}.json"
+    input:
+    envmodules:
+        "amdahl"
+    resources:
+        mpi="mpiexec",
+        # No direct way to access the wildcard in tasks, so we need to do this
+        # indirectly by declaring a short function that takes the wildcards as an
+        # argument
+        tasks=lambda wildcards: int(wildcards.parallel_tasks)
+    shell:
+        "{resources.mpi} -n {resources.tasks} amdahl --terse -p {wildcards.parallel_proportion} > {output}"
diff --git a/episodes/files/snakefiles/cluster_profile_ep02/config.yaml b/episodes/files/snakefiles/cluster_profile_ep02/config.yaml
new file mode 100644
index 0000000..60685b5
--- /dev/null
+++ b/episodes/files/snakefiles/cluster_profile_ep02/config.yaml
@@ -0,0 +1,6 @@
+printshellcmds: True
+jobs: 3
+executor: slurm
+default-resources:
+  - mem_mb_per_cpu=3600
+  - runtime=2
diff --git a/episodes/files/snakefiles/cluster_profile_ep04/config.yaml b/episodes/files/snakefiles/cluster_profile_ep04/config.yaml
new file mode 100644
index 0000000..2fbcb60
--- /dev/null
+++ b/episodes/files/snakefiles/cluster_profile_ep04/config.yaml
@@ -0,0 +1,7 @@
+printshellcmds: True
+jobs: 3
+executor: slurm
+default-resources:
+  - mem_mb_per_cpu=3600
+  - runtime=2
+use-envmodules: True
diff --git a/episodes/snakemake_cluster.md b/episodes/snakemake_cluster.md
deleted file mode 100644
index c157a55..0000000
--- a/episodes/snakemake_cluster.md
+++ /dev/null
@@ -1,63 +0,0 @@
----
-title: "Snakemake and the Cluster"
-teaching: 10
-exercises: 2
----
-
-:::::::::::::::::::::::::::::: questions
-
-- How can we express a one-task cluster operation in Snakemake?
-
-::::::::::::::::::::::::::::::::::::::::
-
-::::::::::::::::::::::::::::: objectives
-
-- Write a Snakefile that executes a job on the cluster
-- Use MPI options to ensure the job runs in parallel
-
-::::::::::::::::::::::::::::::::::::::::
-
-## Snakemake and the Cluster
-
-Snakemake has provisions for operating on an HPC cluster.
- -Various command-line arguments can be provided to tell -Snakemake not to run things locally, but do run things -via the queuing system instead. - -In this lesson, we will repeat the first module, running -the admahl code on the cluster, but will use snakemake -to make it happen. - -## Write a cluster Snakemake rule file - -Open your favorite editor, do the thing. -Specify resources. Provide command line arguments -to do the cluster operations by hand. - -## Run Snakemake - -Throw the switch! - -:::::::::::::::::::::::::::::: challenge - -How can you control the degree of parallelism -of your cluster task? - -:::::::::::::::: solution - -Use the "mpi" option in the resource block of -the Snakemake rule, and specify the number of tasks. -This will be mapped to the `-n` argument of the -equivalent `sbatch` command. - -::::::::::::::::::::::::: - -:::::::::::::::::::::::::::::::::::::::: - -:::::::::::::::::::::::::::::: keypoints - -- Snakemake rule files can submit cluster jobs. -- There are a lot of options. - -:::::::::::::::::::::::::::::::::::::::: diff --git a/episodes/snakemake_multiple.md b/episodes/snakemake_multiple.md deleted file mode 100644 index 9967018..0000000 --- a/episodes/snakemake_multiple.md +++ /dev/null @@ -1,77 +0,0 @@ ---- -title: "More Complicated Snakefiles" -teaching: 10 -exercises: 2 ---- - -:::::::::::::::::::::::::::::: questions - -- What is a task graph? -- How does the Snakemake file express a task graph? - -:::::::::::::::::::::::::::::::::::::::: - -::::::::::::::::::::::::::::: objectives - -- Write a multiple-rule Snakefile with dependent rules -- Translate between a task graph and rule set - -:::::::::::::::::::::::::::::::::::::::: - -## Snakemake and Workflow - -A Snakefile can contain multiple rules. In the trivial -case, there will be no dependencies between the rules, and -they can all run concurrently. - -A more interesting case is when there are dependencies between -the rules, e.g. 
when one rule takes the output of another rule -as its input. In this case, the dependent rule (the one that needs -another rule's output) cannot run until the rule it depends on -has completed. - -It's possible to express this relationship by means of -a task graph, whose nodes are tasks, and whose arcs are -input-output relationships between the tasks. - -A Snakemake file is a textual description of a task -graph. - -## Write a multi-rule Snakemake rule file - -Open your favorite editor, do the thing. - -## Run Snakemake - -Throw the switch! - -:::::::::::::::::::::::::::::: challenge - -Draw the task graph for your Snakefile. - -Given an example task graph, write a Snakefile that -implements it. - -:::::::::::::::: solution - -The rules in the Snakefile are nodes in the task -graph. Two rules are connected by an arc in the task -graph if the output of one rule is the input to the -other. The task graph is directed, so the arc points -from the rule that generates a file as output to the rule -that consumes the same file as input. - -A rule with an output that no other rule consumes is -a terminal rule. - -::::::::::::::::::::::::: - -:::::::::::::::::::::::::::::::::::::::: - -:::::::::::::::::::::::::::::: keypoints - -- Snakemake rule files can be mapped to task graphs -- Tasks are executed as required in dependency order -- Where possible, tasks may run concurrently. - -:::::::::::::::::::::::::::::::::::::::: diff --git a/episodes/snakemake_profiles.md b/episodes/snakemake_profiles.md deleted file mode 100644 index 27c6702..0000000 --- a/episodes/snakemake_profiles.md +++ /dev/null @@ -1,67 +0,0 @@ ---- -title: "Snakemake Profiles" -teaching: 10 -exercises: 2 ---- - -:::::::::::::::::::::::::::::: questions - -- How can we encapsulate our desired Snakemake configuration? -- How do we balance non-repetition and customizability?
- -:::::::::::::::::::::::::::::::::::::::: - -::::::::::::::::::::::::::::: objectives - -- Write a Snakemake profile for the cluster -- Run the amdahl code with varying degrees of parallelism - with the cluster profile. - -:::::::::::::::::::::::::::::::::::::::: - -## Snakemake Profiles - -Snakemake has a provision for profiles, which allow users -to collect various common settings together in a special -file that Snakemake examines when it runs. This lets users -avoid repetition and possible errors of omission for common -settings, and encapsulates some of the cluster complexity -we encountered in the previous module. - -Not all settings should be in the profile. Users can -choose which ones to make static and which ones to make -adjustable. In our case, we will want to have the freedom -to choose the degree of parallelism, but most of the -cluster arguments will not change, and so can be static -in the profile. - -## Write a Profile - -Do the thing. - -## Run Snakemake - -Throw the switch! - -:::::::::::::::::::::::::::::: challenge - -Write a profile that allows you to choose a -different partition, in addition to the level of -parallelism. - -:::::::::::::::: solution - -A profile can refer to values taken from -the rule file, and in particular can refer to -resources declared in a rule. - -::::::::::::::::::::::::: - -:::::::::::::::::::::::::::::::::::::::: - -:::::::::::::::::::::::::::::: keypoints - -- Snakemake profiles encapsulate cluster complexity. -- Retaining operational flexibility is also important. - -:::::::::::::::::::::::::::::::::::::::: diff --git a/episodes/snakemake_single.md b/episodes/snakemake_single.md deleted file mode 100644 index f9a47e4..0000000 --- a/episodes/snakemake_single.md +++ /dev/null @@ -1,69 +0,0 @@ ---- -title: "Introduction to Snakemake" -teaching: 10 -exercises: 2 ---- - -:::::::::::::::::::::::::::::: questions - -- What are Snakemake rules? -- Why do Snakemake rules not always run?
- -:::::::::::::::::::::::::::::::::::::::: - -::::::::::::::::::::::::::::: objectives - -- Write a single-rule Snakefile and execute it with Snakemake -- Predict whether the rule will run or not - -:::::::::::::::::::::::::::::::::::::::: - -## Snakemake - -Snakemake is a workflow tool. It takes as input -a description of the work that you would like the computer -to do, and when run, does the work that you have -asked for. - -The description of the work takes the form of a -series of rules, written in a special format in a -Snakefile. Rules have outputs, and the Snakefile -and generated output files make up the system state. - -## Write a Snakemake rule file - -Open your favorite editor, do the thing. - -## Run Snakemake - -Throw the switch! - -:::::::::::::::::::::::::::::: challenge - -Remove the output file, and run Snakemake. Then -run it again. Edit the output file, and run it -a third time. For which of these invocations -does Snakemake do non-trivial work? - -:::::::::::::::: solution - -The rule does not get executed the second time. The -Snakemake infrastructure is stateful, and knows that -the required outputs are up to date. - -The rule also does not get executed the third time. -The edited file is no longer the output that the rule -produced, but the Snakemake infrastructure doesn't know -that; it only checks the file time-stamp. Editing -Snakemake-manipulated files can get you into an -inconsistent state. - -::::::::::::::::::::::::: - -:::::::::::::::::::::::::::::::::::::::: - -:::::::::::::::::::::::::::::: keypoints - -- Snakemake is an indirect way of running executables -- Snakemake has a notion of system state, and can be fooled. - -::::::::::::::::::::::::::::::::::::::::
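
The single-rule episode above can be made concrete with a minimal Snakefile; the rule and file names here are illustrative, not taken from the lesson:

```
# A minimal single-rule Snakefile: one output, no inputs.
rule hello:
    output: "hello.txt"
    shell:
        "echo 'Hello from Snakemake' > {output}"
```

Running `snakemake --cores 1 hello.txt` creates the file; running the same command again does nothing, because Snakemake sees that the requested output already exists and is up to date — the behaviour explored in the challenge above.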