From 262e95123a0c02c0f69a09d3334df5e5c93be6c3 Mon Sep 17 00:00:00 2001 From: SamueleSoraggi Date: Wed, 19 Jun 2024 12:31:55 +0200 Subject: [PATCH] genomedk docs with singularity --- _quarto.yml | 3 + access/UCloud.qmd | 4 +- access/genomedk.qmd | 141 +++++++++++++++++++++++++++++++++++++++- access/otherHPC.qmd | 11 ++++ images/genomedkNode.png | Bin 0 -> 2127 bytes 5 files changed, 155 insertions(+), 4 deletions(-) create mode 100644 access/otherHPC.qmd create mode 100644 images/genomedkNode.png diff --git a/_quarto.yml b/_quarto.yml index d4e0fef..5d4c68d 100644 --- a/_quarto.yml +++ b/_quarto.yml @@ -25,6 +25,9 @@ website: - text: Access menu: - href: access/UCloud.qmd + text: UCloud + - href: access/genomedk.qmd + text: GenomeDK # - href: datasets/synthdata.qmd - href: galaxy/galaxy.qmd text: Galaxy diff --git a/access/UCloud.qmd b/access/UCloud.qmd index df75623..fcfa40c 100644 --- a/access/UCloud.qmd +++ b/access/UCloud.qmd @@ -1,6 +1,6 @@ --- layout: webpage -title: UCloud +title: Accessing the NGS summer school on UCloud parent: Access has_children: false nav_order: 2 @@ -9,8 +9,6 @@ hide: - toc --- -# Accessing the NGS summer school on UCloud - **1.** User accounts on UCloud are enabled by university login credentials using WAYF (Where Are You From). Access the WAYF login portal with the button below [here](https://cloud.sdu.dk/), and then find your affiliated university or institution using the search bar.   diff --git a/access/genomedk.qmd b/access/genomedk.qmd index 1689899..04e5495 100644 --- a/access/genomedk.qmd +++ b/access/genomedk.qmd @@ -1 +1,140 @@ -sss \ No newline at end of file +--- +layout: webpage +title: Accessing the NGS summer school on GenomeDK +parent: Access +has_children: false +nav_order: 2 +hide: + - footer + - toc +--- + +If you are using GenomeDK, you have two options. One is to use a pre-packaged Docker container, which contains jupyterlab and the necessary packages you need to run all the notebooks. 
GenomeDK comes with `singularity`, which can import and execute Docker containers (with some quirks, such as not showing system folders in the container, which we take care of by running a simple script) and is able to ensure full reproducibility of the analysis. The second option is to download the GitHub repository of the course and create your own conda environment: this solution also works on any computing cluster where you can have `conda` installed and is shown [in the page dedicated to access from any computing cluster](./otherHPC.qmd). + +## Singularity container + +**1.** Log into the cluster using the command line, substituting `USERNAME` with your actual user name: + +```{.bash} +ssh USERNAME@login.genome.au.dk +``` + +:::{.callout-warning title="Technical prerequisites"} + +- if you do not yet have an account on GenomeDK, please get one. [Click on this link to get to the account request.](https://console.genome.au.dk/user-requests/create/) + +- you need to have (or be part of) an active project on GenomeDK. This ensures you can get some computing resources to run the course material. [Follow these instructions to request a project.](https://genome.au.dk/docs/projects-and-accounting/#requesting-a-project) + +::: + +**2.** Move into a folder inside your project, for example + +```{.bash} + +cd MYPROJECT/ngsSummerSchool + +``` + +**3.** Use `singularity` to download the container of the course. This will take some time, and at the end a file called `course.sif` is created in the folder. + +```{.bash} + +singularity pull course.sif docker://hdssandbox/ngssummerschool:2024.07 + +``` + +**4.** Now we need to run a configuration script, which will set up jupyterlab so that the packages are detected correctly. The script is downloaded from the internet and runs immediately, also downloading the necessary data. 
If a folder called `Data` exists, it will not download the data again (which also means that you can use our container with your own data folder for your own analysis in future). + +```{.bash} + +wget -qO- https://raw.githubusercontent.com/hds-sandbox/NGS_summer_course_Aarhus/docker/scripts/courseMaterial.sh | bash + +``` + +:::{.callout-warning} + +You need to create the file `course.sif` only once. Next time, you only need the configuration script. + +::: + +**5.** Now it's time to get a few resources to run all the material. We suggest one CPU and 32GB of RAM for the first three modules, and 2 CPUs and 64GB of RAM for the single-cell analysis. For the first configuration suggested, you get resources using + +```{.bash} + +srun --mem=32g --cores=1 --time=8:0:0 --account=MYPROJECT --pty /bin/bash + +``` + +and very similarly for the second configuration, when you want to work on the single-cell analysis instead. + +:::{.callout-warning} + +Note that you need your project name, and you can also choose how long you want the resources to be available to you. **Asking for resources means waiting for some time in a queue before they are assigned.** + +::: + +**6.** Once resources are assigned, note down the node name. This is on the left side of the command line: for example, in the figure below, the node is `s21n33` + +![](../images/genomedkNode.png){fig-align="center" width="400px"} + + +Then execute the container with + +```{.bash} +singularity exec course.sif /bin/bash +``` + +Note that the command line now shows `Apptainer>` on its left. We are *inside* the container and the tools we need are now available in it. + + +**7.** We are ready to go. Activate the environment and start JupyterLab with the following: + +```{.bash} +conda activate /opt/conda/envs/NGS_aarhus_py +jupyter-lab --no-browser --port=$UID --ip=0.0.0.0 +``` + +You will see a lot of messages, which is normal. 
You also need to create a tunnel between your computer and GenomeDK to be able to see jupyterlab in your browser. Now you need to use the node name you wrote down before! **Open a new terminal window** and write + +```{.bash} + +ssh -L PORT:NODENAME:PORT USERNAME@login.genome.au.dk + +``` + +where you substitute `NODENAME` with the node name, `USERNAME` with your user name, and `PORT` with the port jupyterlab was started on. Since jupyterlab was started with `--port=$UID`, this is your numeric user ID on the cluster (printed by `echo $UID`), for example `6835`. + +**8.** Open your browser and go to the address `http://127.0.0.1:PORT/lab`, using the same port number. With port `6835`, for example, the address is [http://127.0.0.1:6835/lab](http://127.0.0.1:6835/lab). Jupyterlab opens. + + +**9.** Now you are ready to use JupyterLab for coding. Use the file browser (on the left side) to find the folder `Notebooks`. Select one of the four tutorials of the course. You will see that the notebook opens in the right-side pane. Read the text of the tutorial and execute each code cell starting from the first. You will see results showing up directly in the notebook! + +![](../images/startNotebook.gif) + +:::{.callout-tip} + +Right click on a notebook or a saved results file, and use the download option to save it locally on your computer. + +::: + +**10.** At the end of your session, it is a good idea to empty the cache of `singularity`. Otherwise the cache will fill up your home folder very quickly (size limit is 100GB). Simply run these two commands: + +```{.bash} + +rm -rf ~/.apptainer/cache/* +rm -rf ~/.singularity/cache/* + +``` + +### Recovering the material from your previous session + +Everything is saved in the folder you are working in. Next time, follow the whole procedure again - the download script will only link the packages to jupyterlab and avoid downloading new data, notebooks and scripts, because the folders will be detected as existing! 
+ + + + + + + + + diff --git a/access/otherHPC.qmd b/access/otherHPC.qmd new file mode 100644 index 0000000..2123c6a --- /dev/null +++ b/access/otherHPC.qmd @@ -0,0 +1,11 @@ +--- +layout: webpage +title: Accessing the NGS summer school on a computing cluster +parent: Access +has_children: false +nav_order: 2 +hide: + - footer + - toc +--- + diff --git a/images/genomedkNode.png b/images/genomedkNode.png new file mode 100644 index 0000000000000000000000000000000000000000..babd98fabf815a9c8185360e4a65588d5c83d8aa GIT binary patch literal 2127 zcmV-V2(b5wP)Px#1ZP1_K>z@;j|==^1poj532;bRa{vGi!vFvd!vV){sAK>D2jodaK~#8N?VL?$ z6iF1v|GVsM1~vv1*&u}A5QqT-fgA#XAs%uFL?IxommmmYPU9hhLJ$!Y%wcm!wafxsLB0R=Y@A|W6FNg#U2oOj==s-FI;?ym0n&}jXjak{%wUG?hKtM^`Y z`%B*+zyAk|#bW6Yzp8&N7K^1LSfgXHSUQR|Iu?tiqgbP3u~<4vbRv1|DPGSFV0Ymv zo@~fVGmHOV^4umCZ`?;sUi#*6n~Gl^oqZ^hV6aSb4^{tq>rm-11B0L?MfO8|j)?eSf zQZ21+0>kro{jjL6G{=7V3a(G_amo!2^Xj(h%VWGdQP(oU#=X?PjYUx(9Shqi%)P~f zF;&12u4kp}9+Bh529wJyKgFoJlnUqXJ;$?K>Yjs18B=(|HS_mrFnN|L)9Bmcthx~( zLC&>DSIu>(zZ*Gx#zBJ3&={Q}9xT&}Ppe|{-a`l2xR0CVkH`(*O+5iDX7Fkk@^DG+ ze{TUp(-F$7p;3=xxWu^42PO}DoHPjrkDD`;4TI}Gt9L3eQBBI6Muf*Pf}_R)x{Ce2 z>44#Bk(>!0S7Eq>SA7waaYl{^8J*FGcQKTw(4q-9yyKAatnKA7bSLAgXo+DK*T}uz zuY^PDT4fvczRQ?V=ShfF(`&e1y%7rEJLo8=T=2k?rL?ZDwTRHOG?d(!s8I04PA`=_)Yz*D5BV(r?~A+bZBw- z2{m7?Koi^h?lYSFNT`eqZCpVS%3|l#DNdsENgQSssq8gcrE1e^T?5>#A7T8=)K8!#mLa?Sp(>z zBKsBpq@Nj-br!}coZT>!b278I)W>-qX)k8ZknJ##o!`?Ln?QjYd~^`gAa>Knj~gy> z$ira44!&1z9UmMElFQ|d8MzDz*3F;DBQ$wMQp&x*b~u7kRE)o|j$JaeXM0+c+X-RP zW|O|N`zho{E-%ibxIBTc3tW!mm?Bh3EuSb7>S1NA@2sOH`hT7#wp!6njm2@*yxk&t&L!9mh2Q9iM9@e48>c7%>&zZTJwJe= zlTYCsi_f*HqPjyZ%Xg!ouE^(j1rvk>HBQPNNMwhe<)ey&LlQv$M7l|5N@AM{lfDyf zsFFwSa6Ln^K8t*G#Me{vDAMTo0CFeWs7AHbL_Ps#7L|v$njw_;>7icEVxn_?fe!W% z_RAh4;9ydwX$;3JgA7B6IvhA}Gmoyy72JGe?!B4A9uqH~oX43bgRL$n;4?a-J?GW0 z6l{FL5lx=1esxX+cPP9rp+LmWv@*@f`{Bf$Tmd=B=&ff(xTgMRb&SL|OA`AYP^vVp zyi2R(6JL)+wzoj8v*VF?f5X|WFqDsjNGc20yp21gzC`gVz~M+R!D&*aX$;3Jr3|&~ 
z91AZ9{p2;3*^Bmjo7wa&j_0_?vQyp{C*ry7al6SrqfDetZAMK_e+32+s{Jz`z!(N?ydJG#1w9 z5=w={o_WZyRC0yvhDIt@Ocj56Cq{eGw>f>^vy(+WI!x29S>MxO0}%JYd`%@~i{Z*){L z)_lv5Vn)uiHzt}?PG>iQ31-~&s0rKfQpn<0c*U7ExAG=Q@Rky8bGE*R+5JL;;K@&f z>&YM=ElKsT0!!st`XI}#%+U%;-T;RRVmB?)a0KN;s6^>EF*xeku{*OSWug!sM%3KN zI$jzXGKSA`!?WN#Z9O72quMz$&0%A7ZaDNAgj8^7w;y$rDEwM9C$Bu86L7+>(v1>t z6{`%pb~5S|k&}EZEHAokMnZ%(O?{#v+s*GT=)x1THYfC5^oMtVShlk4Ss_2*!SXzY z4xRdV$H9}B#&EC_5b$JA3Gxte@tk}RJgJ*pA1{yAL z8+So|W_U*hBPjWFftBo~@teq&A03&bHcf4$`pW!!gk5C&m3g~S-mk|S4~Ny3ACX`f z){1er|3|XLV(BQF@XBJbSlWj*Iu?tiqgbP3u~<3?@IR{-jCpd4IST*)002ovPDHLk FV1mv=5D)+W literal 0 HcmV?d00001