genomedk docs with singularity

hds-sandbox · Jun 19, 2024 · 262e951 · 262e951
1 parent 1af93e0
commit 262e951
Showing 5 changed files with 155 additions and 4 deletions.
diff --git a/_quarto.yml b/_quarto.yml
@@ -25,6 +25,9 @@ website:
       - text: Access
         menu:
           - href: access/UCloud.qmd
+            text: UCloud
+          - href: access/genomedk.qmd
+            text: GenomeDK
 #          - href: datasets/synthdata.qmd
       - href: galaxy/galaxy.qmd
         text: Galaxy

diff --git a/access/UCloud.qmd b/access/UCloud.qmd
@@ -1,6 +1,6 @@
 ---
 layout: webpage
-title: UCloud
+title: Accessing the NGS summer school on UCloud
 parent: Access
 has_children: false
 nav_order: 2
@@ -9,8 +9,6 @@ hide:
   - toc
 ---
 
-# Accessing the NGS summer school on UCloud
-
 **1.** User accounts on UCloud are enabled by university login credentials using WAYF (Where Are You From). Access the WAYF login portal with the button below [here](https://cloud.sdu.dk/), and then find your affiliated university or institution using the search bar. 
 
 &nbsp;

diff --git a/access/genomedk.qmd b/access/genomedk.qmd
@@ -1 +1,140 @@
-sss
+---
+layout: webpage
+title: Accessing the NGS summer school on GenomeDK
+parent: Access
+has_children: false
+nav_order: 2
+hide:
+  - footer
+  - toc
+---
+
+If you are using GenomeDK, you have two options. The first is to use a pre-packaged Docker container, which contains JupyterLab and all the packages you need to run the notebooks. GenomeDK comes with `singularity`, which can import and execute Docker containers and helps ensure full reproducibility of the analysis. It has some quirks, such as not showing system folders inside the container, but a simple script takes care of that. The second option is to download the GitHub repository of the course and create your own conda environment. This also works on any computing cluster where you can install `conda`, and is described [in the page dedicated to access from any computing cluster](./otherHPC.qmd).
+
+## Singularity container
+
+**1.** Log into the cluster from the command line, substituting `USERNAME` with your actual user name:
+
+```{.bash}
+ssh USERNAME@login.genome.au.dk
+```
+
+:::{.callout-warning title="Technical prerequisites"}
+
+- If you do not yet have an account on GenomeDK, please get one. [Click on this link to get to the account request.](https://console.genome.au.dk/user-requests/create/)
+
+- You need to have (or be part of) an active project on GenomeDK. This ensures you can get some computing resources to run the course material. [Follow these instructions to request a project.](https://genome.au.dk/docs/projects-and-accounting/#requesting-a-project)
+
+:::
+
+**2.** Move into a folder inside your project, for example:
+
+```{.bash}
+
+cd MYPROJECT/ngsSummerSchool
+
+```
+
+**3.** Use `singularity` to download the container of the course. This will take some time, and at the end a file called `course.sif` is created in the folder.
+
+```{.bash}
+
+singularity pull course.sif docker://hdssandbox/ngssummerschool:2024.07
+
+```
+
+**4.** Now we need to run a configuration script, which will set up JupyterLab so that the packages are detected correctly. The command below downloads the script and runs it immediately; the script also downloads the necessary data. If a folder called `Data` already exists, the data is not downloaded again (which also means that you can use our container with your own data folder for your own analysis in the future).
+
+```{.bash}
+
+wget -qO-  https://raw.githubusercontent.com/hds-sandbox/NGS_summer_course_Aarhus/docker/scripts/courseMaterial.sh | bash
+
+```
+
+:::{.callout-warning}
+
+You need to create the file `course.sif` only once. Next time, you only need the configuration script.
+
+:::
+
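Since `course.sif` only needs to be created once, the pull in step 3 can be guarded by a check for the file. A minimal sketch, assuming the image sits in the folder you are working in; `pull_if_missing` is a hypothetical helper name and not part of the course scripts:

```shell
# Hypothetical helper (not part of the course scripts): pull the image
# only when the .sif file is not already present.
pull_if_missing() {
  image="$1"
  if [ -f "$image" ]; then
    echo "$image already present, skipping download"
  else
    # Same pull command as in step 3 of the guide.
    singularity pull "$image" docker://hdssandbox/ngssummerschool:2024.07
  fi
}
```

Calling `pull_if_missing course.sif` inside the project folder would then download the container the first time and skip the download afterwards.
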
+**5.** Now it's time to get a few resources to run all the material. We suggest one CPU and 32GB of RAM for the first three modules, and 2 CPUs and 64GB of RAM for the single-cell analysis. For the first configuration suggested, you get resources using 
+
+```{.bash}
+
+srun --mem=32g --cores=1 --time=8:0:0  --account=MYPROJECT --pty /bin/bash
+
+```
+
+Request the second configuration in the same way, changing the memory and the number of cores, when you want to work on the single-cell analysis instead.
+
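The two suggested configurations differ only in memory and core count. A small sketch that prints both requests, assuming the same flags and 8-hour limit as the command in the guide; `resources_cmd` is a hypothetical helper and `MYPROJECT` remains a placeholder for your project name:

```shell
# Hypothetical helper: print the srun request for a given memory size,
# core count and project name (flags copied from the guide's command).
resources_cmd() {
  mem="$1"; cores="$2"; project="$3"
  echo "srun --mem=${mem} --cores=${cores} --time=8:0:0 --account=${project} --pty /bin/bash"
}

resources_cmd 32g 1 MYPROJECT   # first three modules
resources_cmd 64g 2 MYPROJECT   # single-cell analysis
```

The printed lines can be copied to the command line once `MYPROJECT` is replaced.
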
+:::{.callout-warning}
+
+Note you need your project name, and you can also choose for how long you want the resources to be available to you. **Asking for resources means waiting for some time in a queue before they are assigned.**
+
+:::
+
+**6.** Once resources are assigned, note down the node name. It appears in the prompt on the left side of the command line: for example, in the figure below, the node is `s21n33`.
+
+![](../images/genomedkNode.png){fig-align="center" width="400px"}
+
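If the prompt is hard to read, the node name can also be printed from within the session. A sketch, assuming the scheduler is Slurm (as the `srun` command suggests), which sets `SLURMD_NODENAME` inside a job; `node_name` is a hypothetical helper, and the `uname -n` fallback may include a domain suffix after the node name:

```shell
# Hypothetical helper: print the name of the node the shell is running on.
node_name() {
  if [ -n "${SLURMD_NODENAME:-}" ]; then
    # Set by Slurm inside a job.
    echo "$SLURMD_NODENAME"
  else
    # Fallback outside a Slurm job: the machine's network node name.
    uname -n
  fi
}

node_name
```
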
+
+**7.** Execute the container with
+
+```{.bash}
+singularity exec course.sif /bin/bash
+```
+
+Note that the command line now shows `Apptainer>` on its left. We are *inside* the container and the tools we need are now available in it.
+
+
+We are ready to go. Still inside the container, activate the environment and start JupyterLab with the following:
+
+```{.bash}
+conda activate /opt/conda/envs/NGS_aarhus_py
+jupyter-lab --no-browser --port=$UID --ip=0.0.0.0
+```
+
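+You will see a lot of messages, which is normal. You also need to create a tunnel between your computer and GenomeDK to see JupyterLab in your browser. This is where you need the node name you wrote down before! **Open a new terminal window** on your computer and write
+
+```{.bash}
+
+ssh -L6835:NODENAME:6835 USERNAME@login.genome.au.dk
+
+```
+
+where you substitute `NODENAME` with the node name, `USERNAME` with your user name, and `6835` with the port JupyterLab is listening on. The command above starts JupyterLab on the port given by `$UID`, so the number differs for each user; it appears in the URLs printed in the startup messages.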
+
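The tunnel command has three parts that change from user to user: the port, the node name and the user name. A sketch that assembles it from those values; `tunnel_cmd` is a hypothetical helper, and `6835` and `s21n33` are only the example values that appear in this guide:

```shell
# Hypothetical helper: print the tunnel command for a given port,
# node name and user name.
tunnel_cmd() {
  port="$1"; node="$2"; user="$3"
  echo "ssh -L ${port}:${node}:${port} ${user}@login.genome.au.dk"
}

tunnel_cmd 6835 s21n33 USERNAME
# prints: ssh -L 6835:s21n33:6835 USERNAME@login.genome.au.dk
```

The same port number then goes into the browser address, for example `http://127.0.0.1:6835/lab`.
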
+**8.** Open your browser and go to the address [http://127.0.0.1:6835/lab](http://127.0.0.1:6835/lab), using your own port number in place of `6835` if it differs. JupyterLab opens.
+
+
+**9.** Now you are ready to use JupyterLab for coding. Use the file browser (on the left-side) to find the folder `Notebooks`. Select one of the four tutorials of the course. You will see that the notebook opens on the right-side pane. Read the text of the tutorial and execute each code cell starting from the first. You will see results showing up directly on the notebook!
+
+![](../images/startNotebook.gif)
+
+:::{.callout-tip}
+
+Right click on a notebook or a saved results file, and use the download option to save it locally on your computer.
+
+:::
+
+**10.** At the end of your session, it is a good idea to empty the cache of `singularity`, which otherwise fills up your home folder very quickly (the size limit is 100GB). Simply run these two commands:
+
+```{.bash}
+
+rm -rf ~/.apptainer/cache/*
+rm -rf ~/.singularity/cache/*
+
+```
+
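The two cache folders may not both exist on every account. A cautious sketch that empties a cache folder only when it is present; `empty_cache` is a hypothetical helper, and the two paths are the ones used in the commands above:

```shell
# Hypothetical helper: empty a cache folder only if it exists.
empty_cache() {
  dir="$1"
  if [ -d "$dir" ]; then
    # ${dir:?} aborts instead of expanding to "/*" if dir is ever empty.
    rm -rf "${dir:?}"/*
    echo "emptied $dir"
  else
    echo "no cache at $dir"
  fi
}

empty_cache "$HOME/.apptainer/cache"
empty_cache "$HOME/.singularity/cache"
```
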
+### Recovering the material from your previous session
+
+Everything is saved in the folder you are working in. Next time, follow the procedure again, skipping the container download if `course.sif` is still in the folder. The configuration script will only link the packages to JupyterLab and will not download data, notebooks and scripts again, because it detects the existing folders!
+
+
+
+
+
+
+
+
+
diff --git a/access/otherHPC.qmd b/access/otherHPC.qmd
@@ -0,0 +1,11 @@
+---
+layout: webpage
+title: Accessing the NGS summer school on a computing cluster
+parent: Access
+has_children: false
+nav_order: 2
+hide:
+  - footer
+  - toc
+---
+
diff --git a/images/genomedkNode.png b/images/genomedkNode.png