Commit

genomedk docs with singularity
SamueleSoraggi committed Jun 19, 2024
1 parent 1af93e0 commit 262e951
Showing 5 changed files with 155 additions and 4 deletions.
3 changes: 3 additions & 0 deletions _quarto.yml
@@ -25,6 +25,9 @@ website:
- text: Access
menu:
- href: access/UCloud.qmd
text: UCloud
- href: access/genomedk.qmd
text: GenomeDK
# - href: datasets/synthdata.qmd
- href: galaxy/galaxy.qmd
text: Galaxy
4 changes: 1 addition & 3 deletions access/UCloud.qmd
@@ -1,6 +1,6 @@
---
layout: webpage
title: UCloud
title: Accessing the NGS summer school on UCloud
parent: Access
has_children: false
nav_order: 2
@@ -9,8 +9,6 @@ hide:
- toc
---

# Accessing the NGS summer school on UCloud

**1.** User accounts on UCloud are enabled by university login credentials using WAYF (Where Are You From). Access the WAYF login portal [here](https://cloud.sdu.dk/), and then find your affiliated university or institution using the search bar.

 
141 changes: 140 additions & 1 deletion access/genomedk.qmd
@@ -1 +1,140 @@
sss
---
layout: webpage
title: Accessing the NGS summer school on GenomeDK
parent: Access
has_children: false
nav_order: 2
hide:
- footer
- toc
---

If you are using GenomeDK, you have two options. The first is to use a pre-packaged Docker container, which contains JupyterLab and all the packages you need to run the notebooks. GenomeDK comes with `singularity`, which can import and execute Docker containers and is able to ensure full reproducibility of the analysis. Singularity has some quirks, such as not showing system folders inside the container, but we take care of those by running a simple script. The second option is to download the GitHub repository of the course and create your own conda environment: this solution also works on any computing cluster where you can install `conda`, and is shown [on the page dedicated to access from any computing cluster](./otherHPC.qmd).

## Singularity container

**1.** Log into the cluster from the command line, substituting `USERNAME` with your actual user name:

```{.bash}
ssh USERNAME@login.genome.au.dk
```

:::{.callout-warning title="Technical prerequisites"}

- If you do not yet have an account on GenomeDK, please get one. [Click on this link to get to the account request.](https://console.genome.au.dk/user-requests/create/)

- You need to have (or be part of) an active project on GenomeDK. This ensures you can get some computing resources to run the course material. [Follow these instructions to request a project.](https://genome.au.dk/docs/projects-and-accounting/#requesting-a-project)

:::

**2.** Move into a folder inside your project, for example:

```{.bash}
cd MYPROJECT/ngsSummerSchool
```

**3.** Use `singularity` to download the container of the course. This will take some time; at the end, a file called `course.sif` is created in the folder.

```{.bash}
singularity pull course.sif docker://hdssandbox/ngssummerschool:2024.07
```

**4.** Now we need to run a configuration script, which sets up JupyterLab so that the packages are detected correctly. The script is downloaded from the internet and runs immediately, also downloading the necessary data. If a folder called `Data` already exists, the data is not downloaded again (which also means that you can use our container with your own data folder for your own analysis in the future).

```{.bash}
wget -qO- https://raw.githubusercontent.com/hds-sandbox/NGS_summer_course_Aarhus/docker/scripts/courseMaterial.sh | bash
```

:::{.callout-warning}

You need to create the file `course.sif` only once. Next time, you only need to run the configuration script.

:::
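As a minimal sketch of the "only once" rule above, a small check can tell you whether the image still has to be downloaded. The helper name `needs_pull` is hypothetical and not part of the course scripts; it only tests whether the file exists.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of the course scripts):
# succeeds (exit status 0) only when the image file is missing.
needs_pull() {
    local image="${1:-course.sif}"
    [ ! -f "$image" ]
}

# Report what to do next, based on the current folder.
if needs_pull course.sif; then
    echo "course.sif not found: run the singularity pull command from step 3"
else
    echo "course.sif already present: only the configuration script is needed"
fi
```

Run it from the same folder you used in step 2, since the check looks for `course.sif` in the current directory.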

**5.** Now it's time to get a few resources to run all the material. We suggest one CPU and 32GB of RAM for the first three modules, and 2 CPUs and 64GB of RAM for the single-cell analysis. For the first configuration suggested, you get resources using

```{.bash}
srun --mem=32g --cores=1 --time=8:0:0 --account=MYPROJECT --pty /bin/bash
```

and similarly for the second configuration (`--mem=64g --cores=2`) when you want to work on the single-cell analysis instead.

:::{.callout-warning}

Note you need your project name, and you can also choose for how long you want the resources to be available to you. **Asking for resources means waiting for some time in a queue before they are assigned.**

:::
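The two suggested configurations differ only in memory and number of cores. The sketch below is an illustration, not part of the course material: it prints the corresponding `srun` command so you can check it before running it. The function name `resources_cmd` and the labels `basic` and `singlecell` are made up for this example; the flags are the ones used in step 5.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print (do not run) the srun command
# for one of the two configurations suggested in step 5.
resources_cmd() {
    local config="$1" project="$2" mem cores
    case "$config" in
        basic)      mem=32g; cores=1 ;;   # first three modules
        singlecell) mem=64g; cores=2 ;;   # single-cell analysis
        *) echo "unknown configuration: $config" >&2; return 1 ;;
    esac
    echo "srun --mem=$mem --cores=$cores --time=8:0:0 --account=$project --pty /bin/bash"
}

# Example: show the command for the single-cell module.
resources_cmd singlecell MYPROJECT
```

Replace `MYPROJECT` with your project name, and adjust `--time` if you need the resources for a shorter or longer session.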

**6.** Once resources are assigned, note down the node name. It appears in the prompt on the left side of the command line: for example, in the figure below, the node is `s21n33`.

![](../images/genomedkNode.png){fig-align="center" width="400px"}
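If the prompt is hard to read, the same information can probably be printed from within the session. This is a sketch under the assumption that the node's network name matches the name shown in the prompt; `session_info` is a hypothetical helper. It also prints your numeric user ID, which the JupyterLab command further down uses as port number through `$UID`.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the node name and the numeric user ID.
session_info() {
    local node
    node="$(uname -n)"    # network node name of the current machine
    # ${node%%.*} drops any domain suffix, keeping only the short name
    echo "node=${node%%.*} port=$(id -u)"
}

session_info
```

Run it after `srun` has given you a shell on the compute node; on the login node it would print the login node's name instead.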


Still on that node, execute the container with

```{.bash}
singularity exec course.sif /bin/bash
```

Note that the command line now shows `Apptainer>` on its left. We are *inside* the container, and the tools we need are now available in it.


**7.** We are ready to go. Activate the environment and start JupyterLab with the following:

```{.bash}
conda activate /opt/conda/envs/NGS_aarhus_py
jupyter-lab --no-browser --port=$UID --ip=0.0.0.0
```

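You will see a lot of messages, which is normal. You also need to create a tunnel between your computer and GenomeDK to be able to see JupyterLab in your browser. Now you need the node name you wrote down before! **Open a new terminal window** on your own computer and write

```{.bash}
ssh -L 6835:NODENAME:6835 USERNAME@login.genome.au.dk
```

where you substitute `NODENAME` with the node name, `USERNAME` with your user name, and `6835` with the port JupyterLab is listening on. Since JupyterLab was started with `--port=$UID`, that port is your numeric user ID on the cluster (the output of `echo $UID`); it also appears in the URLs that JupyterLab prints at startup.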
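Because the node name, the user name and the port all change from person to person, it can help to assemble the tunnel command from its parts. The following is a minimal sketch, not part of the course scripts: `tunnel_cmd` is a hypothetical helper that only prints the command, and the values in the example call are placeholders.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print (do not run) the ssh tunnel command.
# Arguments: node name, user name, port used by JupyterLab.
tunnel_cmd() {
    local node="$1" user="$2" port="$3"
    # The same port is used on the local side and on the node side.
    echo "ssh -L ${port}:${node}:${port} ${user}@login.genome.au.dk"
}

# Example with placeholder values.
tunnel_cmd s21n33 USERNAME 6835
```

The port must be the one JupyterLab was started with; with `--port=$UID` this is the number printed by `id -u` on the cluster.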

**8.** Open your browser and go to the address [http://127.0.0.1:6835/lab](http://127.0.0.1:6835/lab), replacing `6835` with the local port of your tunnel if it differs. JupyterLab opens.


**9.** Now you are ready to use JupyterLab for coding. Use the file browser (on the left side) to find the folder `Notebooks`. Select one of the four tutorials of the course. The notebook opens in the pane on the right. Read the text of the tutorial and execute each code cell, starting from the first. You will see results showing up directly in the notebook!

![](../images/startNotebook.gif)

:::{.callout-tip}

Right click on a notebook or a saved results file, and use the download option to save it locally on your computer.

:::

**10.** At the end of your session, it is a good idea to empty the cache of `singularity`. Otherwise the cache will fill up your home folder very quickly (the size limit is 100GB). Simply run these two commands:

```{.bash}
rm -rf ~/.apptainer/cache/*
rm -rf ~/.singularity/cache/*
```
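The two commands above can also be wrapped in a small function that checks whether a cache folder exists before emptying it. This is only a sketch: `clear_cache` is a hypothetical name, and the behavior is the same `rm -rf` on the folder contents shown above.

```bash
#!/usr/bin/env bash
# Hypothetical helper: empty a cache directory if it exists.
clear_cache() {
    local dir="$1"
    if [ -d "$dir" ]; then
        # ${dir:?} aborts if the variable is empty,
        # so the pattern can never expand to /*
        rm -rf "${dir:?}"/*
        echo "emptied $dir"
    else
        echo "no cache at $dir"
    fi
}
```

After defining the function, call it once per cache folder, for example `clear_cache "$HOME/.apptainer/cache"` and `clear_cache "$HOME/.singularity/cache"`.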

### Recovering the material from your previous session

Everything is saved in the folder you are working in. Next time, follow the procedure again, skipping the `singularity pull` of step 3 if `course.sif` is still there: the download script will only link the packages to JupyterLab and will not download data, notebooks and scripts again, because the folders will be detected as existing!









11 changes: 11 additions & 0 deletions access/otherHPC.qmd
@@ -0,0 +1,11 @@
---
layout: webpage
title: Accessing the NGS summer school on a computing cluster
parent: Access
has_children: false
nav_order: 2
hide:
- footer
- toc
---

Binary file added images/genomedkNode.png