From be85cbac144ca3dbf7e2b9db040bba5fb918c35c Mon Sep 17 00:00:00 2001 From: jbloom Date: Thu, 14 Dec 2023 11:32:17 -0500 Subject: [PATCH 1/5] add `markdown` to `conda` env --- environment.yml | 1 + 1 file changed, 1 insertion(+) diff --git a/environment.yml b/environment.yml index e73b931..188748a 100644 --- a/environment.yml +++ b/environment.yml @@ -10,6 +10,7 @@ dependencies: - black-jupyter - jupyterlab - mamba + - markdown=3.5 - nbqa - pandas=2.1 - papermill=2.4 From 9a2864e4fb5f4bff09220c3d4d9105399bafc096 Mon Sep 17 00:00:00 2001 From: jbloom Date: Thu, 14 Dec 2023 12:06:50 -0500 Subject: [PATCH 2/5] set up skeleton to build docs --- .gitignore | 3 +++ README.md | 21 +++++++++++++++++++++ seqneut-pipeline.smk | 38 ++++++++++++++++++++++++++++++++++++++ test_example/config.yml | 18 ++++++++++++++++++ 4 files changed, 80 insertions(+) diff --git a/.gitignore b/.gitignore index 51c8045..b2f64e5 100644 --- a/.gitignore +++ b/.gitignore @@ -2,3 +2,6 @@ !.git* _* *.out + +docs/* +!docs/*.html diff --git a/README.md b/README.md index e540615..5b985e8 100644 --- a/README.md +++ b/README.md @@ -11,6 +11,7 @@ This is a modular analysis pipeline for analyzing high-throughput sequencing-based neutralization assays of the type developed in the [Bloom lab](https://research.fredhutch.org/bloom/en.html). See **[add link to Loes et al when available]** for a description of these assays. That paper is also the scientific citation for this analysis pipeline. +See [here](https://github.com/jbloomlab/seqneut-pipeline/graphs/contributors) for a list of the contributors to this pipeline. Essentially, this pipeline goes from the FASTQ files that represent the counts of each barcoded viral variant to the computed neutralization titers for each sera. The titers are computed by fitting Hill-curve style neutralization curves using the [neutcurve](https://jbloomlab.github.io/neutcurve/) package; see the documentation for the details of these curves. 
@@ -94,6 +95,26 @@ This will almost always be a subdirectory of the same name, so this key will be seqneut-pipeline: seqneut-pipeline +### docs +Location where we create the `./docs/` subdirectory with HTMLs for rendering on GitHub Pages. +This will almost always be `docs`, so this key will be as shown below unless you have a good reason to do otherwise: + + docs: docs + +### description +Description of pipeline, used in the HTML docs rendering. +Should include title (with markdown `#` heading), authors and/or citation, and link to GitHub repo. +For instance: +``` +description: | + # + <short description> + + <authors and/or link to citation> + + See <GitHub repo link> for code and numerical data. +``` + ### viral_libraries A dictionary (mapping) of viral library names to CSV files holding these libraries. So in general this key will look like: diff --git a/seqneut-pipeline.smk b/seqneut-pipeline.smk index ac48753..2f846ac 100644 --- a/seqneut-pipeline.smk +++ b/seqneut-pipeline.smk @@ -257,9 +257,47 @@ rule notebook_to_html: "jupyter nbconvert --to html {input.notebook} &> {log}" +rule build_docs: + """Build the HTML documentation.""" + input: + titers_chart=rules.aggregate_titers.output.titers_chart, + serum_titers_htmls=lambda wc: expand( + "results/sera/{serum}/serum_titers_{serum}.html", + serum=sera_plates(), + ), + process_counts_htmls=expand( + "results/plates/{plate}/process_counts_{plate}.html", + plate=plates, + ), + output: + index=os.path.join(config["docs"], "index.html"), + titers_chart=os.path.join( + config["docs"], os.path.basename(rules.aggregate_titers.output.titers_chart), + ), + serum_titers_htmls=lambda wc: expand( + os.path.join(config["docs"], "serum_titers_{serum}.html"), + serum=sera_plates(), + ), + process_counts_htmls=expand( + os.path.join(config["docs"], "process_counts_{plate}.html"), + plate=plates, + ), + params: + description=config["description"], + sera=lambda wc: list(sera_plates()), + plates=list(plates), + conda: + 
"environment.yml" + log: + "results/logs/build_docs.txt", + script: + "scripts/build_docs.py" + + seqneut_pipeline_outputs = [ rules.qc_process_counts.output.qc_summary, rules.qc_serum_titers.output.qc_summary, rules.aggregate_titers.output.titers, rules.aggregate_titers.output.pickle, + rules.build_docs.output.index, ] diff --git a/test_example/config.yml b/test_example/config.yml index b21c84e..67c0bf5 100644 --- a/test_example/config.yml +++ b/test_example/config.yml @@ -8,6 +8,24 @@ # ALMOST CERTAINLY BE "seqneut-pipeline" FOR REAL ANALYSES** seqneut-pipeline: ../ +# Location where we place the docs for rendering on GitHub Pages, typically "docs" +# **FOR THIS TEST EXAMPLE ONLY, THIS LOCATION IS "../docs" BUT IT SHOULD +# ALMOST CERTAINLY BE "docs" FOR REAL ANALYSES** +docs: ../docs + +# Description of the project in markdown, should have title, authors and/or citation, +# and link to the GitHub repo. +# **CHANGE THIS TO A GOOD DESCRIPTION OF YOUR PROJECT** +description: | + # Test example for [seqneut-pipeline](https://github.com/jbloomlab/seqneut-pipeline) + This is a small toy-example created by subsetting a real experiment dataset. + + See [https://github.com/jbloomlab/seqneut-pipeline](https://github.com/jbloomlab/seqneut-pipeline) + for the computer code and underlying numerical data. + + See [here](https://github.com/jbloomlab/seqneut-pipeline/graphs/contributors) for a + list of all contributors to the pipeline. 
+ # Specify each viral library and corresponding CSV matching barcode to strain name # CSV must have columns "barcode" and "strain" viral_libraries: From 55ee14c73451209d09dd029e30e852d40326367f Mon Sep 17 00:00:00 2001 From: jbloom <jbloom@fredhutch.org> Date: Thu, 14 Dec 2023 14:57:56 -0500 Subject: [PATCH 3/5] create docs --- docs/index.html | 26 + docs/process_counts_plate11.html | 9663 +++++++++++++++++ docs/process_counts_plate2.html | 9663 +++++++++++++++++ docs/serum_titers_M099d0.html | 8213 ++++++++++++++ docs/serum_titers_M099d30.html | 8211 ++++++++++++++ docs/serum_titers_Y044d30.html | 8211 ++++++++++++++ docs/serum_titers_Y154d182.html | 8204 ++++++++++++++ docs/titers.html | 41 + scripts/build_docs.py | 60 + seqneut-pipeline.smk | 15 +- .../results/sera/M099d30/titers_median.csv | 0 11 files changed, 52294 insertions(+), 13 deletions(-) create mode 100644 docs/index.html create mode 100644 docs/process_counts_plate11.html create mode 100644 docs/process_counts_plate2.html create mode 100644 docs/serum_titers_M099d0.html create mode 100644 docs/serum_titers_M099d30.html create mode 100644 docs/serum_titers_Y044d30.html create mode 100644 docs/serum_titers_Y154d182.html create mode 100644 docs/titers.html create mode 100644 scripts/build_docs.py mode change 100755 => 100644 test_example/results/sera/M099d30/titers_median.csv diff --git a/docs/index.html b/docs/index.html new file mode 100644 index 0000000..858b06d --- /dev/null +++ b/docs/index.html @@ -0,0 +1,26 @@ +<h1 id="test-example-for-seqneut-pipeline">Test example for <a href="https://github.com/jbloomlab/seqneut-pipeline">seqneut-pipeline</a></h1> +<p>This is a small toy-example created by subsetting a real experiment dataset.</p> +<p>See <a href="https://github.com/jbloomlab/seqneut-pipeline">https://github.com/jbloomlab/seqneut-pipeline</a> +for the computer code and underlying numerical data.</p> +<p>See <a href="https://github.com/jbloomlab/seqneut-pipeline/graphs/contributors">here</a> 
for a +list of all contributors to the pipeline.</p> +<div class="toc"><span class="toctitle">Contents</span><ul> +<li><a href="#plot-of-titers-for-all-sera">Plot of titers for all sera</a></li> +<li><a href="#analyses-of-per-serum-neutralization-titers">Analyses of per-serum neutralization titers</a></li> +<li><a href="#analyses-of-per-plate-counts-and-curve-fits">Analyses of per-plate counts and curve fits</a></li> +</ul> +</div> +<h2 id="plot-of-titers-for-all-sera">Plot of titers for all sera</h2> +<p><a href="../docs/titers.html">Interactive chart of titers</a></p> +<h2 id="analyses-of-per-serum-neutralization-titers">Analyses of per-serum neutralization titers</h2> +<ul> +<li><a href="../docs/serum_titers_M099d0.html">M099d0</a></li> +<li><a href="../docs/serum_titers_M099d30.html">M099d30</a></li> +<li><a href="../docs/serum_titers_Y044d30.html">Y044d30</a></li> +<li><a href="../docs/serum_titers_Y154d182.html">Y154d182</a></li> +</ul> +<h2 id="analyses-of-per-plate-counts-and-curve-fits">Analyses of per-plate counts and curve fits</h2> +<ul> +<li><a href="../docs/process_counts_plate2.html">plate2</a></li> +<li><a href="../docs/process_counts_plate11.html">plate11</a></li> +</ul> \ No newline at end of file diff --git a/docs/process_counts_plate11.html b/docs/process_counts_plate11.html new file mode 100644 index 0000000..493226e --- /dev/null +++ b/docs/process_counts_plate11.html @@ -0,0 +1,9663 @@ +<!DOCTYPE html> + +<html lang="en"> +<head><meta charset="utf-8"/> +<meta content="width=device-width, initial-scale=1.0" name="viewport"/> +<title>process_counts_plate11 + + + + + + + + + + + + +
+ + diff --git a/docs/process_counts_plate2.html b/docs/process_counts_plate2.html new file mode 100644 index 0000000..d04e75c --- /dev/null +++ b/docs/process_counts_plate2.html @@ -0,0 +1,9663 @@ + + + + + +process_counts_plate2 + + + + + + + + + + + + +
+ + diff --git a/docs/serum_titers_M099d0.html b/docs/serum_titers_M099d0.html new file mode 100644 index 0000000..c21ce17 --- /dev/null +++ b/docs/serum_titers_M099d0.html @@ -0,0 +1,8213 @@ + + + + + +serum_titers_M099d0 + + + + + + + + + + + + +
+ + diff --git a/docs/serum_titers_M099d30.html b/docs/serum_titers_M099d30.html new file mode 100644 index 0000000..8589f71 --- /dev/null +++ b/docs/serum_titers_M099d30.html @@ -0,0 +1,8211 @@ + + + + + +serum_titers_M099d30 + + + + + + + + + + + + +
+ + diff --git a/docs/serum_titers_Y044d30.html b/docs/serum_titers_Y044d30.html new file mode 100644 index 0000000..b87cfa9 --- /dev/null +++ b/docs/serum_titers_Y044d30.html @@ -0,0 +1,8211 @@ + + + + + +serum_titers_Y044d30 + + + + + + + + + + + + +
+ + diff --git a/docs/serum_titers_Y154d182.html b/docs/serum_titers_Y154d182.html new file mode 100644 index 0000000..bfd9f2d --- /dev/null +++ b/docs/serum_titers_Y154d182.html @@ -0,0 +1,8204 @@ + + + + + +serum_titers_Y154d182 + + + + + + + + + + + + +
+ + diff --git a/docs/titers.html b/docs/titers.html new file mode 100644 index 0000000..ce661fb --- /dev/null +++ b/docs/titers.html @@ -0,0 +1,41 @@ + + + + + + + + + +
+ + + \ No newline at end of file diff --git a/scripts/build_docs.py b/scripts/build_docs.py new file mode 100644 index 0000000..1f23b1d --- /dev/null +++ b/scripts/build_docs.py @@ -0,0 +1,60 @@ +"""Implements ``snakemake`` rule to build the HTML documentation.""" + + +import os +import shutil +import sys + +import markdown +import markdown.extensions.toc + + +sys.stderr = sys.stdout = log = open(snakemake.log[0], "w") + +copied_files = { + f: os.path.join(snakemake.output.docs, os.path.basename(f)) for f in snakemake.input +} +assert len(copied_files) == len(set(copied_files.values())) == len(snakemake.input) +os.makedirs(snakemake.output.docs, exist_ok=True) +for f in snakemake.input: + shutil.copy(f, snakemake.output.docs) +assert all(os.path.isfile(f) for f in copied_files.values()) + +md_text = [ + snakemake.params.description, + "", + # table of contents: https://python-markdown.github.io/extensions/toc/ + "[TOC]", + "", + "## Plot of titers for all sera", + f"[Interactive chart of titers]({copied_files[snakemake.input.titers_chart]})", + "", + "## Analyses of per-serum neutralization titers", +] + +assert len(snakemake.params.sera) == len(snakemake.input.serum_titers_htmls) +for serum, f in zip(snakemake.params.sera, snakemake.input.serum_titers_htmls): + md_text.append(f" - [{serum}]({copied_files[f]})") + +md_text += ["", "## Analyses of per-plate counts and curve fits"] +assert len(snakemake.params.plates) == len(snakemake.input.process_counts_htmls) +for plate, f in zip(snakemake.params.plates, snakemake.input.process_counts_htmls): + md_text.append(f" - [{plate}]({copied_files[f]})") + +md_text = "\n".join(md_text) + +print(f"Rendering the following markdown text:\n\n{md_text}\n\n") + +html = markdown.markdown( + md_text, + extensions=[ + markdown.extensions.toc.TocExtension( + title="Contents", + toc_depth="2-2", + ), + ], +) + +index_html = os.path.join(snakemake.output.docs, "index.html") +with open(index_html, "w") as f: + f.write(html) diff --git 
a/seqneut-pipeline.smk b/seqneut-pipeline.smk index 2f846ac..ba00813 100644 --- a/seqneut-pipeline.smk +++ b/seqneut-pipeline.smk @@ -270,18 +270,7 @@ rule build_docs: plate=plates, ), output: - index=os.path.join(config["docs"], "index.html"), - titers_chart=os.path.join( - config["docs"], os.path.basename(rules.aggregate_titers.output.titers_chart), - ), - serum_titers_htmls=lambda wc: expand( - os.path.join(config["docs"], "serum_titers_{serum}.html"), - serum=sera_plates(), - ), - process_counts_htmls=expand( - os.path.join(config["docs"], "process_counts_{plate}.html"), - plate=plates, - ), + docs=directory(config["docs"]), params: description=config["description"], sera=lambda wc: list(sera_plates()), @@ -299,5 +288,5 @@ seqneut_pipeline_outputs = [ rules.qc_serum_titers.output.qc_summary, rules.aggregate_titers.output.titers, rules.aggregate_titers.output.pickle, - rules.build_docs.output.index, + rules.build_docs.output.docs, ] diff --git a/test_example/results/sera/M099d30/titers_median.csv b/test_example/results/sera/M099d30/titers_median.csv old mode 100755 new mode 100644 From e49638f5706a400804fa812df6d555df10b0ae8f Mon Sep 17 00:00:00 2001 From: jbloom Date: Thu, 14 Dec 2023 15:03:46 -0500 Subject: [PATCH 4/5] describe `docs` in README --- README.md | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/README.md b/README.md index 5b985e8..502d316 100644 --- a/README.md +++ b/README.md @@ -453,6 +453,10 @@ You will need to address these QC failures by adjusting `serum_titers_qc_exclusi It is expected that you may have to perform several iterations of running and fixing QC failures. The pipeline will only run to completion when all all QC filters are passed. 
+## Rendering HTML plots and notebooks in docs +If the pipeline runs to completion, it will create HTML documentation with plots of the overall titers, per-serum titer analyses, and per-plate analyses in a docs subdirectory, which will typically be named `./docs/` (if you use the suggested key in the configuration YAML). +This HTML documentation can be rendered via [GitHub Pages](https://docs.github.com/en/pages/getting-started-with-github-pages/configuring-a-publishing-source-for-your-github-pages-site) from the `./docs/` directory. + ## Test example and testing via GitHub Actions The [./test_example](test_example) subdirectory contains a small test example that illustrates use of the pipeline. From c1988119e4cd2315da8998da595f23feed6fdad6 Mon Sep 17 00:00:00 2001 From: jbloom Date: Thu, 14 Dec 2023 15:34:10 -0500 Subject: [PATCH 5/5] update example README --- test_example/README.md | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/test_example/README.md b/test_example/README.md index 5f0fb84..ddf2876 100644 --- a/test_example/README.md +++ b/test_example/README.md @@ -15,3 +15,7 @@ To run the example, build the `seqneut-pipeline` `conda` environment in [../envi snakemake -j --use-conda --keep-going This will create the results in [./results/](results). + +The HTML documentation for the example is rendered on GitHub Pages at [https://jbloomlab.github.io/seqneut-pipeline](https://jbloomlab.github.io/seqneut-pipeline). + +The files [expected_titers_for_test.csv](expected_titers_for_test.csv) and [test_titers_as_expected.py](test_titers_as_expected.py) are used for testing the pipeline.