From 73125d68f435ac2bc58ee1d970b019c2b99a3ab9 Mon Sep 17 00:00:00 2001 From: samarth8392 Date: Wed, 21 Aug 2024 14:35:31 -0400 Subject: [PATCH 1/7] docs: updated FAQs to include jobby --- docs/troubleshooting.md | 57 +++++++++++++++++------------------------ 1 file changed, 24 insertions(+), 33 deletions(-) diff --git a/docs/troubleshooting.md b/docs/troubleshooting.md index 346a1b7..399ffab 100644 --- a/docs/troubleshooting.md +++ b/docs/troubleshooting.md @@ -4,9 +4,14 @@ We have compiled this FAQ from the most common problems. If you are running into ## Job Status -**Q. How do I know if RNA-seek finished running successfully?** +**Q: How do I know if RENEE pipeline finished running?** + +**A.** Once the pipeline is done running to completion, you will receive an email with header like + +`Slurm Job_id=xxxx Name=pl:renee Ended, Run time xx:xx:xx, COMPLETED, ExitCode 0` + +There are several different ways of checking the status of each job submitted to the cluster. -**A.** There are several different ways of checking the status of each job submitted to the cluster. Here are a few suggestions: !!! tldr "Check Job Status" @@ -15,21 +20,7 @@ Here are a few suggestions: You can check the status of Biowulf jobs through the your [user dashboard](https://hpc.nih.gov/dashboard/). - Each job that RNA-seek submits to the cluster starts with the `pl:` prefix. - - === "Snakemake Log" - - [Snakemake](https://snakemake.readthedocs.io/en/stable/) generates the following file, `Reports/snakemake.log`, in each pipeline's working directory. This file contains information about each job submitted to the job scheduler. If there are no problems, snakemake will report 100% steps done in those last few lines of the file. 
- - You can take a peek of the end of the file by running the following command: - ```bash - tail -n30 Reports/snakemake.log - ``` - - Or more specifically, you can pull out the timestamps of the last few completed jobs like this: - ```bash - grep -A 1 done Reports/snakemake.log | tail - ``` + Each job that RENEE submits to the cluster starts with the `pl:` prefix. === "Query Job Scheduler" @@ -45,34 +36,34 @@ Here are a few suggestions: sjobs ``` - Each job that RNA-seek submits to the cluster starts with the `pl:` prefix. + Each job that RENEE submits to the cluster starts with the `pl:` prefix. -**Q. How do I identify failed jobs?** -**A.** If there are errors, you'll need to identify which jobs failed and check its corresponding SLURM output file. -The SLURM output file may contain a clue as to why the job failed. +**Q: What if the pipeline is finished running but I received a "FAILED" status? How do I identify failed jobs?** -!!! tldr "Find Failed Jobs" +**A.** In case there was some error during the run, the easiest way to diagnose the problem is to go to logfiles folder within the RENEE output folder and look at the `snakemake.log.jobby.short` file. It contains three columns: jobname, state, and std_err. The jobs that failed would have the FAILED state and the jobs that completed successfully would have "COMPLETED" state. +!!! tldr "Find Failed Jobs" === "SLURM output files" - Quick and dirty method to search for failed jobs by looking through each job's output file: + Quick and dirty method to search for failed jobs by looking through each job's output file ```bash - grep -i 'fail' slurmfiles/slurm-*.out - ``` + # Go to the logfiles folder within the renee output folder + cd renee_output/logfiles - === "Snakemake Log" - - [Bash script]( https://github.com/CCBR/Tools/blob/master/Biowulf/get_slurm_file_with_error.sh) identify the SLURM ID of the first failed job and check if the output file exists. 
+ # List the files that failed + grep "FAILED" snakemake.log.jobby.short | less + ``` + All the failed jobs would be listed with absolute paths to the error file (with extension `.err`). Go through the error files corresponding to the FAILED jobs (std_err) to explore why the job failed. Many failures are caused by filesystem or network issues on Biowulf, and in such cases, simply re-starting the Pipeline should resolve the issue. Snakemake will dynamically determine which steps have been completed, and which steps still need to be run. If you are still running into problems after re-running the pipeline, there may be another issue. If that is the case, please feel free to [contact us](https://github.com/skchronicles/RNA-seek/issues). -**Q. How do I cancel ongoing RNA-seek jobs?** +**Q. How do I cancel ongoing RENEE jobs?** -**A.** Sometimes, you might need to manually stop a RNA-seek run prematurely, perhaps because the run was configured incorrectly or if a job is stalled. Although the walltime limits will eventually stop the workflow, this can take up to 5 or 10 days depending on the pipeline. +**A.** Sometimes, you might need to manually stop a RENEE run prematurely, perhaps because the run was configured incorrectly or if a job is stalled. Although the walltime limits will eventually stop the workflow, this can take up to 5 or 10 days depending on the pipeline. -To stop RNA-seek jobs that are currently running, you can follow these options. +To stop RENEE jobs that are currently running, you can follow these options. !!! tldr "Cancel running jobs" @@ -108,11 +99,11 @@ Once you've ensured that all running jobs have been stopped, you need to unlock **Q. Why am I getting `sbatch: command not found error`?** -**A.** Are you running the `rna-seek` on `helix.nih.gov` by mistake. [Helix](https://hpc.nih.gov/systems/) does not have a job scheduler. One may be able to fire up the singularity module, initial working directory and perform dry-run on `helix`. 
But to submit jobs, you need to log into `biowulf` using `ssh -Y username@biowulf.nih.gov`. +**A.** Are you running the `renee` on `helix.nih.gov` by mistake. [Helix](https://hpc.nih.gov/systems/) does not have a job scheduler. One may be able to fire up the singularity module, initial working directory and perform dry-run on `helix`. But to submit jobs, you need to log into `biowulf` using `ssh -Y username@biowulf.nih.gov`. **Q. Why am I getting a message saying `Error: Directory cannot be locked. ...` when I do the dry-run?** -**A.** This is caused when a run is stopped prematurely, either accidentally or on purpose, or the pipeline is still running in your working directory. Snakemake will lock a working directory to prevent two concurrent pipelines from writing to the same location. This can be remedied easily by running `rna-seek unlock` sub command. Please check to see if the pipeline is still running prior to running the commands below. If you would like to cancel a submitted or running pipeline, please reference the instructions above. +**A.** This is caused when a run is stopped prematurely, either accidentally or on purpose, or the pipeline is still running in your working directory. Snakemake will lock a working directory to prevent two concurrent pipelines from writing to the same location. This can be remedied easily by running `renee unlock` sub command. Please check to see if the pipeline is still running prior to running the commands below. If you would like to cancel a submitted or running pipeline, please reference the instructions above. 
```bash # Load Dependencies From b1e7691d4ee9b63f41695126e377191fa4935d72 Mon Sep 17 00:00:00 2001 From: samarth8392 Date: Wed, 21 Aug 2024 14:59:46 -0400 Subject: [PATCH 2/7] docs: remove references to rna-seek --- docs/troubleshooting.md | 30 ++++++++++++++---------------- 1 file changed, 14 insertions(+), 16 deletions(-) diff --git a/docs/troubleshooting.md b/docs/troubleshooting.md index 399ffab..53a7c7c 100644 --- a/docs/troubleshooting.md +++ b/docs/troubleshooting.md @@ -4,15 +4,13 @@ We have compiled this FAQ from the most common problems. If you are running into ## Job Status -**Q: How do I know if RENEE pipeline finished running?** +**Q: How do I know if RENEE pipeline finished running? How to check status of each job?** **A.** Once the pipeline is done running to completion, you will receive an email with header like `Slurm Job_id=xxxx Name=pl:renee Ended, Run time xx:xx:xx, COMPLETED, ExitCode 0` -There are several different ways of checking the status of each job submitted to the cluster. - -Here are a few suggestions: +To check the status of each individual job submitted to the cluster, there are several different ways. Here are a few suggestions: !!! tldr "Check Job Status" @@ -41,12 +39,12 @@ Here are a few suggestions: **Q: What if the pipeline is finished running but I received a "FAILED" status? How do I identify failed jobs?** -**A.** In case there was some error during the run, the easiest way to diagnose the problem is to go to logfiles folder within the RENEE output folder and look at the `snakemake.log.jobby.short` file. It contains three columns: jobname, state, and std_err. The jobs that failed would have the FAILED state and the jobs that completed successfully would have "COMPLETED" state. +**A.** In case there was some error during the run, the easiest way to diagnose the problem is to go to logfiles folder within the RENEE output folder and look at the `snakemake.log.jobby.short` file. 
It contains three columns: jobname, state, and std_err. The jobs that completed successfully would have "COMPLETED" state and jobs that failed would have the FAILED state. !!! tldr "Find Failed Jobs" === "SLURM output files" - - Quick and dirty method to search for failed jobs by looking through each job's output file + + All the failed jobs would be listed with absolute paths to the error file (with extension `.err`). Go through the error files corresponding to the FAILED jobs (std_err) to explore why the job failed. ```bash # Go to the logfiles folder within the renee output folder @@ -55,9 +53,9 @@ Here are a few suggestions: # List the files that failed grep "FAILED" snakemake.log.jobby.short | less ``` - All the failed jobs would be listed with absolute paths to the error file (with extension `.err`). Go through the error files corresponding to the FAILED jobs (std_err) to explore why the job failed. + -Many failures are caused by filesystem or network issues on Biowulf, and in such cases, simply re-starting the Pipeline should resolve the issue. Snakemake will dynamically determine which steps have been completed, and which steps still need to be run. If you are still running into problems after re-running the pipeline, there may be another issue. If that is the case, please feel free to [contact us](https://github.com/skchronicles/RNA-seek/issues). +Many failures are caused by filesystem or network issues on Biowulf, and in such cases, simply re-starting the Pipeline should resolve the issue. Snakemake will dynamically determine which steps have been completed, and which steps still need to be run. If you are still running into problems after re-running the pipeline, there may be another issue. If that is the case, please feel free to [contact us](https://github.com/CCBR/RENEE/issues). **Q. How do I cancel ongoing RENEE jobs?** @@ -70,20 +68,20 @@ To stop RENEE jobs that are currently running, you can follow these options. 
=== "Master Job" You can use the `sjobs` tool [provided by Biowulf](https://hpc.nih.gov/docs/biowulf_tools.html#sjobs) to monitor ongoing jobs. - Examine the `NAME` column of the `sjobs` output, one of them should match `pl:rna-seek`. This is the "primary" job that orchestrates the submission of child jobs as the pipeline completes. Terminating this job will ensure that the pipeline is cancelled; however, you will likely need to unlock the working directory before re-running rna-seek again. Please see our instructions below in `Error: Directory cannot be locked` for how to unlock a working directory. + Examine the `NAME` column of the `sjobs` output, one of them should match `pl:renee`. This is the "primary" job that orchestrates the submission of child jobs as the pipeline completes. Terminating this job will ensure that the pipeline is cancelled; however, you will likely need to unlock the working directory before re-running renee again. Please see our instructions below in `Error: Directory cannot be locked` for how to unlock a working directory. You can [manually cancel](https://hpc.nih.gov/docs/userguide.html#delete) the primary job using `scancel`. However, secondary jobs that are already running will continue to completion (or failure). To stop them immediately, you will need to run `scancel` individually for each secondary job. See the next tab for a bash script that tries to automate this process. === "Child Jobs" - When there are lots of secondary jobs running, or if you have multiple RNA-seek runs ongoing simultaneously, it's not feasible to manually cancel jobs based on the `sjobs` output (see previous tab). + When there are lots of secondary jobs running, or if you have multiple RENEE runs ongoing simultaneously, it's not feasible to manually cancel jobs based on the `sjobs` output (see previous tab). 
- We provide [a script](https://github.com/CCBR/Tools/blob/master/Biowulf/cancel_snakemake_jobs.sh) that will parse the snakemake log file and cancel all jobs listed within. + We provide [a script](https://github.com/CCBR/Tools/blob/main/scripts/cancel_snakemake_jobs.sh) that will parse the snakemake log file and cancel all jobs listed within. ```bash ## Download the script (to the current directory) - wget https://raw.githubusercontent.com/CCBR/Tools/master/Biowulf/cancel_snakemake_jobs.sh + wget https://raw.githubusercontent.com/CCBR/Tools/main/scripts/cancel_snakemake_jobs.sh ## Run the script bash cancel_snakemake_jobs.sh /path/to/output/logfiles/snakemake.log @@ -93,13 +91,13 @@ To stop RENEE jobs that are currently running, you can follow these options. This script will NOT cancel the primary job, which you will still have to identify and cancel manually, as described in the previous tab. -Once you've ensured that all running jobs have been stopped, you need to unlock the working directory (see below), and re-run rna-seek to resume the pipeline. +Once you've ensured that all running jobs have been stopped, you need to unlock the working directory (see below), and re-run RENEE to resume the pipeline. ## Job Errors **Q. Why am I getting `sbatch: command not found error`?** -**A.** Are you running the `renee` on `helix.nih.gov` by mistake. [Helix](https://hpc.nih.gov/systems/) does not have a job scheduler. One may be able to fire up the singularity module, initial working directory and perform dry-run on `helix`. But to submit jobs, you need to log into `biowulf` using `ssh -Y username@biowulf.nih.gov`. +**A.** Are you running the `renee` on `helix.nih.gov` by mistake? [Helix](https://hpc.nih.gov/systems/) does not have a job scheduler. One may be able to fire up the singularity module, initial working directory and perform dry-run on `helix`. But to submit jobs, you need to log into `biowulf` using `ssh -Y username@biowulf.nih.gov`. **Q. 
Why am I getting a message saying `Error: Directory cannot be locked. ...` when I do the dry-run?** @@ -110,7 +108,7 @@ Once you've ensured that all running jobs have been stopped, you need to unlock module load ccbrpipeliner # Unlock the working directory -rna-seek unlock --output /path/to/working/dir +renee unlock --output /path/to/working/dir ``` **Q. Why am I getting a message saying `MissingInputException in line ...` when I do the dry-run?** From c984d86d44ecb07365db92d3f186f1079f25daf9 Mon Sep 17 00:00:00 2001 From: samarth8392 Date: Wed, 21 Aug 2024 15:06:34 -0400 Subject: [PATCH 3/7] docs: update renee_gui to renee gui --- docs/RNA-seq/images/gui_nx_renee.png | Bin 516709 -> 516680 bytes 1 file changed, 0 insertions(+), 0 deletions(-) diff --git a/docs/RNA-seq/images/gui_nx_renee.png b/docs/RNA-seq/images/gui_nx_renee.png index c88b3d37ecf79ade935165e9ffd6dfb7ef579847..e728832557197a9d20ac8ce3dd1dc6a095636269 100644 GIT binary patch delta 4257 zcmeH}XHe5?v%o2V1o2Qr=@>dHO=%Gh5SkQe2MYlykpqYrq)3yNAfZYL(*7t_kBEqn zBZ?4df)J1v5-9>9NN<8PsVOh#yzk6C_tX7y@3;HR&d%&__SxBIXZM+%Yln8VLu2FF z<2m9v<3aIU@!apgdwrNvXr>6tWy}N;_dr~m%rEPxkZA;Yn0Vv73^x}@R>1nWynOz_ z1EXz;OeYg*C(AQW=M>%~ytvkqX(^j}S7dN?*pchbl0@FxaPX})Dxe#DE1)FibVC92 z$46@&1HvT|w`6f*LBt3+dbxsqF)+&+%{isiPOND4>dQtv6Q6U)~`oXGo%X@=(yRyXQ$rjL1wNKNL-7Q{yFDG%On)>wc$nS4D zT^$o)wEI|L6(^$+_bAoAC)%l}BW4A6Zd3XVq4o9i4Oaa3 zmndcZ%;(v5vY%yTES$`M%`yOuP26ivtxbzrEDB%T*T$e0tLkXau2(vybmtg`%!^=> zIvYq?qZVtO^!=^dOOyc0vyN2X(Hd{VU~ZD{Q-vt)o#;IoMtbO^N8;omb60n(i*Kxv zkSA8Quzh9M%^XI96i8*~LuY&K<>Y?7{pE#i$Qohc!+zcUjk%DKqVTJ%SW>K98UL~nAN|*XrSL)o1CD+ zY3&-=lB?HQiN1_Pgl~P0$G~Kh9v@oDO;%cgStG6`ibYzz|^MKa}IMKWr zowYF92b**~fb1$j3|SHVHXZ2bTcvn#(9Ec8n{su1+*%8ZS%V(e>97AJmdL&D&S$vM zE>za&l2BhVvB?Vu*vViSjP!u+gWFTD0|$|fR+q9=qv;!+R=YJjg2S!1Y}DVs&3S^p zRJG}zOTFR|F)E_dmXoK-Tcq05&KsjDffa|mQhytOhjzQ(m~5ogQy=Es5xXxg+}?dD)ez%hBDgs(U$G05EaGvAXKKM6oN z87y^tpv{Rm_qE19CfKSU7$Ad{jzKe{`b!4vGVO-|b_!UI9S0~-z{;2l;mIzegBp%| 
z;aiAeB?lI@&qFGle@^lzskXIQ9(&cZ@kH>?jh#cE5~`<%oImyJ9<}Ea0YHl|vfr2S zx1me#XB;fxwk?WB`qB+AV3sOm(q{?E;RF{;@Y5FX~fRYEci6Hrs|8E3$-AQMs`H*jGHED9J6fsbK4&5ZIa+OiUJs^(6Rr#{?wBV5O0SxbDa>&0iOYR_E!2?@j1Zq_4i;YH(BZ%f{yy!idU zs>*M)z0#`R1o#I?Yaf(RahC-y-r(Rj+$-(rr#O98DU4nVUik70oGw$;J<27ldZQiV ztfKQJW_Myi17~T@UKe{dwIzC;^0J^jp`ubbW|MBKs?iYZ6}4cz(`cC~%#2Im)nQ&% z9een6u>oqBYFE1?w{3Pnw3tE&J9Vxgyv1^#8~o@~9qi7hqB5S((A5p2=^yM4&XI$P zQnU+3nlWhdSbB@|YBjaq+a>{yIfqe;M|jNWhzIGqD!J+=wslN=jLWV&L%(D`-7~?z z(rk8+&A6hm;u<#U9n#7rGLR0VS(4x_iPg;qcVOnZ9rD_uM0xezN4kpbvjmnldaWmp z$x~Qlsu9+>H*K8}P_LPR>_*_cs(mP=1Ny>yy|tv{2=qhUGc#&RawHuXW^zw?kKd=< zF?!!Hh2Pkx6OXbI|AzC^IDtfGr)vGS>!h4YdWQLZSzOR9T!v%qj}_z@myGAl=d%x$ zi$Xa@qL6t%Wary1ej;31^r$c#z8Bpr&2T!8VUJlEpM17Of|pdcFe4+U5=0goM$o`+ zhvZX-(d3sIf%YJmS~0SwPOyC>jJAF}t3j0{HsuUAX>6BtuRKVsi0>1oH|(@9mH>2pi^Y`MAs#P zRu=$W{RZRgqGc7a=)fy}jtFemQi);ut*!lD&HkG~PW2lob6GfJ`^&LVb>8>)Xz%ZK z`NZ>F-#{9TsjXvq_GIoMLg5te?p@-?tv1#1BwbC|PGae2OKsglzg{8iO_7-^w;glmH>`8JA9_6vndZ$3{noTglN*heZU`H0s?W%Ct5fkkCIgW~ zabU675R!rN;&du7AQeuX(m3Wf_AZ5nKD-;b`B`ZRyE|EKtk!eb#V2_&@O4A#j&bR! z2lgdl;%=BmFP@kUYvl7&aO70U&ssld)>fO;+*U)bOI5oByO+>$9nRZYnKp$Vx0k;7 z&i6hJIchA5Eo{Qf)YP=FZ!u!*aLQerW@w{S60H$4ujxb1Qb3lD`!}RB89y-O00bkC zM*mWPHXSru*lkZ5pqL8cRBNAdb4auOEkMdJSOjxqTuq5};o%8TNY*n%4cpuZ*)bXG zjIx%m>}kbNHcoUJ7YX)WW?mmBpfXo z$Oyj58O!^>S3ddUL=Fp{maRMUvd}~fB+KEi7EG+xKp{L24hU*`n-XbJxi0y*IZ(Q! z-G?4xtfJeG0U6*oDp(k}Nj(~kBL&7l*jr|co$EJx^qYC`)Y};dEY&^lo`v-vI-w3O zQD-0CyW;gixcD`2gI4`#xf zO?`?Y6l3Npk!6J7g%8I$gXepnMr_S*{o2F57%ya3+7z`*`S(8d$nmukW~zoS+2H8W zHlQ#@ut=3#L*8(ei^VQ23cvhBAMjoak37zQxPUI-`IGa34oF^Tq}U+JRWaYEI3iJ4 z&DFAn`7_|6P_Lg}s%9&F_YW3jZK7cbN~d*|m!O}#_Zz)jiC|@zcYYs{s`U-;UKvE! 
z6>NI8vKg^~{w=@~1$_J;9sWX}eHYeCG!)(U^x2P&n*>P*a6))g21_s5^C&;42XJL_ zF}MBV-<-4t`~uC;ZOEIzDGQ`i3z|WeXypJAWd?4!C#17`>f=7S@)^URc!#GS%96DQm%lo_z^7h!j!wKTTi1+i*y)bZvK@n5 zc25ovxQK?-cU%)&do)A;Dc(s^?X)yz11TCw(={%zi_APp{3Urxzq&Q3Dt7W)#>zOZ z6$_M(9DU~a5iqws8zlClm;A>bk(xM}D2Fryb?=y_!#saCi|(^6Z>plSuLvXte}|0x zXIzhl_D}B~VWWJu2imCNfp@;?byw^@O0Z5idv!g|6cYVM>Y6@L3QE-FXaR%)b{J=DkC6ah?!) zU#_)WbY;Kx?zn*V=9aY_Cnsl_37(hI$#jQhZQyoVpJVNkxBo3=GgE@V~OzGYn&pWRlG3YeD(>N~( zcZ6h^8tjyrbAO>`c-dE>nOjA`R(Qp4!t~dEt~zRWc3cOq(~4WXnF-6=!A4rIq-67)V5DR5)3HW`?p z>i$5Vl-5b<|JnEd!WCna@xK*TSz{^vyrPUw+OmmW0mU2Ue{Ilw$>o@X)L6g(k666y zl!Lo@n#8b!)E;ax7CsGsA|T~z4xFC`O8|r^aICmqT8rE^{BrVd>jlPBV78-Ub;-`G J+T>>3zW|R|cCY{d delta 4620 zcmeHJXHXN|wgp6j6ln?~QWOvoBqB-)Qly0@C`fOLG^qlimrz1ist~CWl%fcsNr%v+ z1qehT^ne&3bm;*?9`Bob=icAn+&k~jJF|YA*)wbHea_zJ?A_bNPw3(ggHgk1VCP`8 zu=6mw8o=Qow1hvI%f(_+g+3zu>VpxM8^*${0<`L?ciyu`($KKQ3e&N(>ynk7c6pLb zRRl~mZ{-t&E2D~@q3(~+a^^Fx__~zReA{K#@xysLY)r%Dyf^d^GchyFl%|k8`GWyD zM~6_qq4k?mLFH)k8xtmDWfnaX%n}9J9<5P{*W^?Zpj^NrRR9Y%F`OC{e0X;qk3tM7 z9=n=XJG@2D-N*z+M@KuMGw!%o_sq5g5!=Ee)iGH+lu?{@;gkm|ms8wS6FOcKC*=^F z80x!&aT=PIgWPsZBZW!n#FFQRa(B0NfSzMfAvNpiCOP<{Cks(H2eHS4M@D9~%-ZSC zuFF8hF|rFxGUWjxgWin2soabCF??PUoxcM6QFenR_@oa~vWF~zhx0L3#?k9jbw3K! 
z6P8y>kwtRei^GvDip$okki$rpWz@jNpbQ?82{tabByNn(%?UJVvG|j=2}7qa4n2Xs z8`r_6YVd7!u8=gk!`6gvZ$jqE>XtM2R-wdiIO~t!TyXLIHHwf$E@(_oBPMkMywTGU z#pp!Z_83Vm)Oqzi{<5Ms9p+WOaL|MO7K$P{*9UJKji&3nK?F)>L5g6NYA>Tfd>K;CKxC$=S!##_~CE_1z;@t z!tM_sFS}fOTKCJ}eR)hm1^F@rhdK?N{Eo>lL%yaA?UI{*vv1&YrA?j;}$jVN@l zek{VBDtd>7kP4;zq0Qv`H3{U=Zn4viX4NEIynh;mm(LIuMbe7Ajew7P_U`4EIv+D4H*}j_-j}K-J#Ncn82>twxgSy+gN}%AH zk?s7*oXXX8)JOubRa1%Fr|++b&@!>|xjdNlAc4<+jv7u{Mdny(ANKEPwG4f{Jgzs z)OY7{23+BVpzfLv|-q7IqD5ES?JvBzvar5~QEcex(3q~T{L_#f;|(NB9ErH;@WVoE z{#@I`@Dr0BBC~lyY)`Q!80GF&x_JapxXG5!nE{Ya6zZfpGGfZWF)2FbDlzw zFWLa^a?r4ZC|)gZ8QPVCo(4B=bubc7b}+@+?u~KMuH(u$OYgT+W!}lvE;W496&9Mm zqKsa>-a~GDMBqKar5A2LnlrsH4W*DJUi>&8RW}j{(KRvj3NW7stsg}x{WRoS`YAW&o}P=ZS5_@y9E6+zku_>n#3ip>VBPQ z-Y4%QjJcl-I{Oj2@4UQyLNbJUUT0#(5x@Dl{qD_x7l{Ls*KcCT$Q^GR&fTPi zwQH%B3k^$oe!-o&o+k#ajY8fEX`y3FN_C^2N_O?zh69hd{p7dgFCwSK-pxtISx~IZx~)&wLYK)=|zUe5a8R+mX z+Fq~#)SQa$bc8vQ>G#S-I?CEn<)6`2p5UJZQXq){Z(@=( zFO};1p|AAV#L`pUvAcqX8typ`5P*hX5ZX#d24{wn&`nO=Z!#erYYqK7;pYHOTqthk z{fxC!9Lgw!wUQRL3xG%vl*t0w@0TGAdQL{YW6PZR8_3EgRhRj8oeFy1_*=Whjs<+$vXG zh>K(T)wN@I0^|OCGh~~SjfklB*o-N?a|C)gt)=>X{8P)X{FH^B@0+sRX3u8oMSAO2 zm;CEZb90KkC|^F=&MP`HJp@iyg}i^+EUGMRr`6m$0qvID^g6m5otX5%D|@{A$zZ93 z?W0A#VlX%5W#jY5BlG%)l-Du?(q<+nNyX4TX-lpK4h{lxBWRi+*$o=VtXqlUiLCyt zvucl-!cP{tsk*!g0w6YnXcu#(xrp3EZ8aJvu%nmNyxXoaYk% zIOX0nXn$$E(MPQ1^k{W={AJQe&}OR+*)roGg%<>Is%lEue0#r|keP0&UTQe9;hS`fMGy{Ma|5G0*@76PP`S!TGc zjml9fFnS+_LeTkL#Kwu_X(GsL#2VKLE@Fm-dp_l43BTL)4fPRqLIQ@W{`6luIE+aq z=EwHvx|M`e(TDv5aL_b>1HLr{poO+UZp<00bXZ&{NW^4HWoP76zQ4FN%ig!*b-48+ z@Sx>DyKEszDNm71)c2gR1f@%LY}u^9efCsDeWEJ%JGcxkaYL{2yok`A(fYD>5~m;| zjzlNp=zGnqcf!um7TpVH>!UY3A)pYb79-y)g2_eO{=+h?$rLI(3G|Lh-owt7E6np^=_QFUFn9oDV6TqJ9%shTpC7 zf@uFOp$pgN;L{eI?kD$>JUYZ=vKB@K>fnXbBiNG1RzJtA;|p@mf;4dO$viRIpZw-R zOab%dTX?j5vnKTSUiy%%d^d_-Q#wHqoH$&obeeLyJF|%;7Uyc%kSB3Yrza$d&bH=& z9jEq)sFR^8MbJk5f)nMGeEGs! 
zgls8IHu(X`X?Pw|7l-~Zk4LlC>KHjM8oY+%CDh2T`#7Vf9v@derxKu|`3rFR5db&z zM_$-5kHbs;Hz$?V!HVr3G|6?;Fj`LM;eujVy7Dx-2b-dJU%JCIzEDT{N;+bTYux^O zC6~0@BxFsaf+}2+{`Wv>h2(SClIMH$_2r`@*-|=3!Q_Ff66mKXna|%iJY}Xy13#xh zpvMGMa!mJ7kp(};EyH& zLeL+nR7j7*7SZn)({yed ze&)*iVb7{>W08V1p-j4Mr!q@g-io{--kT`$vD@wqi(AB5K`YD9Tl!Ae@BTE@e>>&V zTl+2K*!OHqeoWNA*Gfw%H9`6|MT8W(@^_{$RZ|{+_lPFlg1U${$quO;zIDlQ*WQWS zHvwr-|EoLOfCQgdw3ggyF==()ORuQZwc#qQI%FzUa}PoJuBQh%Mc1&sYi7j@Z2h@D z$7yI^|B-(O=Mo(3E~mjjBXIX3%{CflhrYVWcF%q9brJlKo`KtuNJFoRbAR;}e(3vs+IR)A{hOJKMH!3f4~~jtDpNi> zV)AD42cRDmv;_qPCDL_8cLTfYWZj&d<6iXJ>flmc^vw-*YK)Ri8#5k`8#$JJ7EbrI ztb*JLa7EmB?~z@#f9Ln{aF%&N5U)^1xiE|GsF{niRv|B3+)9UUs+fLps_1J7BT2EQ zD1(358W30L(idsvGnQp;@Ty1nOulDdG=fws>Vj8kg;&l3u6eY&^-tC%Z>qXD$8Grh z2fY5j)Bp49h5om5T9}u|mzs3VYjItqbpZcSZ-B;zH4Z7Zn?0phKh)$21$jNKFUYHU z@=RDy-k0gg0+|k*+1lC)bM{li_RMb74{gWqxAX4+#hx`5zz$VUuAerto15veL1UG= zY>Z1F>KFdRoSA)uYpQZ?oU%e=QD2`BdsRiA`B?C{`Sd?+qFUz(p1e{w%Do{jdB|h;H@48f$@@>yUjB{XP3*8dFLYI0 zuZ%`Pqzln;^-rJj6ncoN`8c5-iP*8m!+QL=CEZQIK6CgF=T2iO@DKBVM0n0Tfc+}* c?x_Zma?#cy=_ Date: Thu, 22 Aug 2024 16:26:14 -0400 Subject: [PATCH 4/7] docs: Added expected output tab to doc website --- docs/RNA-seq/output.md | 207 +++++++++++++++++++++++++++++++++++++++++ mkdocs.yml | 1 + 2 files changed, 208 insertions(+) create mode 100644 docs/RNA-seq/output.md diff --git a/docs/RNA-seq/output.md b/docs/RNA-seq/output.md new file mode 100644 index 0000000..5e17894 --- /dev/null +++ b/docs/RNA-seq/output.md @@ -0,0 +1,207 @@ +After a successful `renee` run execution for multisample paired-end data, the following files and folders are created in the output folder. 
+ +```bash +renee_output/ +├── bams +├── config +├── config.json # Contains the configuration and parameters used for this specific RENEE run +├── DEG_ALL +├── dryrun.{datetime}.log # Output from the dry-run of the pipeline +├── FQscreen +├── FQscreen2 +├── fusions +├── kraken +├── logfiles +├── nciccbr +├── preseq +├── QC +├── QualiMap +├── rawQC +├── Reports +├── resources +├── RSeQC +├── sample1.R1.fastq.gz -> /path/to/input/fastq/files/sample1.R1.fastq.gz +├── sample1.R2.fastq.gz -> /path/to/input/fastq/files/sample1.R2.fastq.gz +... +.. +. +├── sampleN.R1.fastq.gz -> /path/to/input/fastq/files/sampleN.R1.fastq.gz +├── sampleN.R2.fastq.gz -> /path/to/input/fastq/files/sampleN.R2.fastq.gz +├── STAR_files +├── trim +└── workflow +``` + +## Folder details and file descriptions + +### 1. `bams` + +Contains the STAR aligned reads for each sample analyzed in the run. + +```bash +/bams/ +├── sample1.fwd.bw # forward strand bigwig files suitable for a genomic track viewer like IGV +├── sample1.rev.bw # reverse strand bigwig files +├── sample1.p2.Aligned.toTranscriptome.out.bam # BAM alignments to transcriptome using STAR in two-pass mode +├── sample1.star_rg_added.sorted.dmark.bam # Read groups added and duplicates marked genomic BAM file (using STAR in two-pass mode) +├── sample1.star_rg_added.sorted.dmark.bam.bai +... +.. +. +``` + +### 2. `config` + +Contains config files for the pipeline. + + +### 3. `DEG_ALL` + +Contains the RSEM output estimating gene and isoform expression levels for each sample, as well as a combined data matrix with all samples. 
+ +```bash +/DEG_ALL/ +├── combined_TIN.tsv # RSeQC logfiles containing transcript integrity number information for all samples +├── RSEM.genes.expected_count.all_samples.txt # Expected gene counts matrix for all samples (useful for downstream differential expression analysis) +├── RSEM.genes.expected_counts.all_samples.reformatted.tsv # Expected gene counts matrix for all samples with reformatted gene symbols (format: ENSEMBLID | GeneName) +├── RSEM.genes.FPKM.all_samples.txt # FPKM Normalized expected gene counts matrix for all samples +├── RSEM.genes.TPM.all_samples.txt # TPM Normalized expected gene counts matrix for all samples +├── RSEM.isoforms.expected_count.all_samples.txt # File containing isoform level expression estimates for all samples. +├── RSEM.isoforms.FPKM.all_samples.txt # FPKM Normalized expected isoform counts matrix for all samples +├── RSEM.isoforms.TPM.all_samples.txt # TPM Normalized expected isoform counts matrix for all samples +├── sample1.RSEM.genes.results # Expected gene counts for sample 1 +├── sample1.RSEM.isoforms.results # Expected isoform counts for sample 1 +├── sample1.RSEM.stat # RSEM stats for sample 1 +│   ├── sample1.RSEM.cnt +│   ├── sample1.RSEM.model +│   └── sample1.RSEM.theta +├── sample1.RSEM.time # Run time log for sample 1 +... +.. +. +├── sampleN.RSEM.genes.results +├── sampleN.RSEM.isoforms.results +├── sampleN.RSEM.stat +│   ├── sampleN.RSEM.cnt +│   ├── sampleN.RSEM.model +│   └── sampleN.RSEM.theta +└── sampleN.RSEM.time + +``` + +### 4. `FQScreen` and `FQScreen2` + +These folders contain results from a quality-control step that screens for different sources of contamination. FastQ Screen compares your sequencing data to a set of different reference genomes to determine if there is contamination. It allows you to see if the composition of your library matches what you expect. These results are plotted in the multiQC report. + +### 5. `fusions` + +Contains gene fusions output for each sample. 
+ +```bash +fusions/ +├── sample1_fusions.arriba.pdf +├── sample1_fusions.discarded.tsv # Contains all events that Arriba classified as an artifact or that are also observed in healthy tissue. +├── sample1_fusions.tsv # Contains fusions for sample 1 which pass all of Arriba's filters. The predictions are listed from highest to lowest confidence. +├── sample1.p2.arriba.Aligned.sortedByCoord.out.bam # Sorted BAM file for Arriba's Visualization +├── sample1.p2.arriba.Aligned.sortedByCoord.out.bam.bai +├── sample1.p2.Log.final.out # STAR final log file +├── sample1.p2.Log.out # STAR runtime log file +├── sample1.p2.Log.progress.out # log files +├── sample1.p2.Log.std.out # STAR runtime output log +├── sample1.p2.SJ.out.tab # Summarizes the high confidence splice junctions for sample 1 +├── sample1.p2._STARgenome # Extra files generated during STAR aligner +│   ├── exonGeTrInfo.tab +│   ├── . +│   ├── . +│   └── transcriptInfo.tab +├── sample1.p2._STARpass1 # Extra files generated during STAR first pass +│   ├── . +│   └── . +... +.. +. + +``` + +### 6. `kraken` + +Contains per sample kraken output files, which come from a quality-control step that assesses potential sources of microbial contamination. Kraken is used in conjunction with Krona to produce interactive reports stored in `.krona.html` files. These results are present in the multiQC report. + +### 7. `logfiles` + +Contains logfiles for the entire RENEE run, job error/output files for each individual job that was submitted to SLURM, and some other stats generated by different software. These files are important for diagnosing errors if the pipeline fails. The per sample stats information is present in the multiQC report. 
+ +```bash +/logfiles/ +├── master.log # Logfile for the main (master) RENEE job +├── mjobid.log # SLURM JOBID for the master RENEE job +├── runtime_statistics.json # Runtime statistics for each rule in the RENEE run +├── sample1.flagstat.concord.txt # sample mapping stats +├── sample1.p2.Log.final.out # sample STAR alignment stats +├── sample1.RnaSeqMetrics.txt # sample stats collected by Picard CollectRnaSeqMetrics +├── sample1.star.duplic # Mark duplicate metrics +... +.. +. +├── slurmfiles +│   ├── {MASTER_JOBID}.{JOBID}.{rule}.{wildcards}.out +│   ├── {MASTER_JOBID}.{JOBID}.{rule}.{wildcards}.err +│   ... +│   .. +│   . +├── snakemake.log # The snakemake log file which documents the entire pipeline log +├── snakemake.log.jobby # Detailed summary report for each individual job. +└── snakemake.log.jobby.short # Short summary report for each individual job. +``` + +### 8. `nciccbr` + +Contains Arriba resources for gene fusion estimation. These resources are manually curated and only exist for a few reference genomes (mm10, hg38, hg19). + +### 9. `preseq` + +Contains library complexity curves for each sample. These results are part of the multiQC report. + +### 10. `QC` and `rawQC` + +Contains per sample output from FastQC for raw and adapter trimmed fastq files with insert size estimates. These results are part of the multiQC report. + +### 11. `QualiMap` + +Contains per sample output from a quality-control step that assesses various post-alignment metrics and provides a secondary method to calculate insert size. These results are part of the multiQC report. + +### 12. `Reports` + +Contains the multiQC report which visually summarizes the quality control metrics and other statistics for each sample (`multiqc_report.html`). All the data tables used to generate the multiQC report are available in the `multiqc_data` folder. The `RNA_report.html` file is an interactive report that aggregates sample quality-control metrics across all samples. 
This interactive report allows users to identify problematic samples prior to downstream analysis. It uses flowcell and lane information from the FastQ file. + +### 13. `resources` + +Contains resources necessary to run the RENEE pipeline. + +### 14. `RSeQC` + +Contains various QC metrics for each sample collected by RSeQC. These results are part of the multiQC report. + +### 15. `STAR_files` + +Contains log files, the splice junction tab file (`SJ.out.tab`), the `ReadsPerGene.out.tab` file, and various other output files generated by the STAR aligner for each sample. + +### 16. `trim` + +Contains adapter trimmed FASTQ files for each sample used for all the downstream analysis. + +```bash +trim +├── sample1.R1.trim.fastq.gz +├── sample1.R2.trim.fastq.gz +... +.. +. +├── sampleN.R1.trim.fastq.gz +└── sampleN.R2.trim.fastq.gz + +``` + +### 17. `workflow` + +Contains the RENEE pipeline workflow. \ No newline at end of file diff --git a/mkdocs.yml b/mkdocs.yml index 8d3b628..28236d7 100644 --- a/mkdocs.yml +++ b/mkdocs.yml @@ -104,6 +104,7 @@ nav: - cache: RNA-seq/cache.md - unlock: RNA-seq/unlock.md - Graphical Interface: RNA-seq/gui.md + - Expected Output: RNA-seq/output.md - Resources: RNA-seq/Resources.md - FAQ: - General Questions: general-questions.md From d33c0563028416ad6a56a3d6ef1fcada80a83b5f Mon Sep 17 00:00:00 2001 From: samarth8392 Date: Thu, 22 Aug 2024 17:22:28 -0400 Subject: [PATCH 5/7] chore: updated CHANGELOG.md --- CHANGELOG.md | 1 + 1 file changed, 1 insertion(+) diff --git a/CHANGELOG.md b/CHANGELOG.md index 05d6e1e..a27c3c1 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -18,6 +18,7 @@ - Add GUI instructions to the documentation website. (#38, @samarth8392) - The docs website now has a dropdown menu to select which version to view. The latest release is shown by default. (#150, @kelly-sovacool) - Show the name of the pipeline rather than the python script for CLI help messages. 
(#131, @kelly-sovacool) +- Added Expected output tab to the documentation website and updated FAQs (#146, #147, @samarth8392) ## RENEE 2.5.12 From 9b9d6993da06507b34e1a196507274d69771392d Mon Sep 17 00:00:00 2001 From: samarth8392 Date: Mon, 26 Aug 2024 10:17:17 -0400 Subject: [PATCH 6/7] fix: script links in troubleshooting.md doc --- docs/troubleshooting.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/docs/troubleshooting.md b/docs/troubleshooting.md index 53a7c7c..6dc2259 100644 --- a/docs/troubleshooting.md +++ b/docs/troubleshooting.md @@ -77,11 +77,11 @@ To stop RENEE jobs that are currently running, you can follow these options. === "Child Jobs" When there are lots of secondary jobs running, or if you have multiple RENEE runs ongoing simultaneously, it's not feasible to manually cancel jobs based on the `sjobs` output (see previous tab). - We provide [a script](https://github.com/CCBR/Tools/blob/main/scripts/cancel_snakemake_jobs.sh) that will parse the snakemake log file and cancel all jobs listed within. + We provide [a script](https://github.com/CCBR/Tools/blob/c3324fc0ad2f9858438c84bbb2f24927a8f3a220/scripts/cancel_snakemake_jobs.sh) that will parse the snakemake log file and cancel all jobs listed within. 
```bash ## Download the script (to the current directory) - wget https://raw.githubusercontent.com/CCBR/Tools/main/scripts/cancel_snakemake_jobs.sh + wget https://raw.githubusercontent.com/CCBR/Tools/c3324fc0ad2f9858438c84bbb2f24927a8f3a220/scripts/cancel_snakemake_jobs.sh ## Run the script bash cancel_snakemake_jobs.sh /path/to/output/logfiles/snakemake.log From 58259dc15060b1a20ca89d16d72934966538b875 Mon Sep 17 00:00:00 2001 From: Kelly Sovacool Date: Mon, 26 Aug 2024 11:20:02 -0400 Subject: [PATCH 7/7] chore: update changelog entry with PR number --- CHANGELOG.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/CHANGELOG.md b/CHANGELOG.md index a27c3c1..be802bf 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -18,7 +18,7 @@ - Add GUI instructions to the documentation website. (#38, @samarth8392) - The docs website now has a dropdown menu to select which version to view. The latest release is shown by default. (#150, @kelly-sovacool) - Show the name of the pipeline rather than the python script for CLI help messages. (#131, @kelly-sovacool) -- Added Expected output tab to the documentation website and updated FAQs (#146, #147, @samarth8392) +- Added Expected output tab to the documentation website and updated FAQs (#156, @samarth8392) ## RENEE 2.5.12