Add read the docs (#31)

* add apidocs to gitignore
* Add read the docs and initial docs
* add __init__ files for autodoc
* Move existing docs
* Update index.rst
* fixup! Format Python code with psf/black pull_request
* Fix links to markdown files

---------

Co-authored-by: PMBio <PMBio@users.noreply.github.com>
PMBio · Nov 22, 2023 · ef7bb5c
1 parent 21758ce
commit ef7bb5c
Showing 18 changed files with 233 additions and 14 deletions.
diff --git a/.gitignore b/.gitignore
@@ -164,3 +164,4 @@ cython_debug/
 #  and can be added to the global gitignore or merged into this file.  For a more nuclear
 #  option (not recommended) you can uncomment the following to ignore the entire idea folder.
 .idea/
+/docs/apidocs/
diff --git a/.readthedocs.yaml b/.readthedocs.yaml
@@ -0,0 +1,20 @@
+# .readthedocs.yaml
+# Read the Docs configuration file
+# See https://docs.readthedocs.io/en/stable/config-file/v2.html for details
+
+# Required
+version: 2
+
+# Set the OS, Python version and other tools you might need
+build:
+  os: ubuntu-22.04
+  tools:
+    python: "3.12"
+
+sphinx:
+   configuration: docs/conf.py
+   fail_on_warning: true
+
+python:
+   install:
+   - requirements: docs/requirements.txt
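
Review note: with `fail_on_warning: true`, any Sphinx warning fails the Read the Docs build. The following is a minimal sketch for reproducing that strictness locally, assuming Sphinx and the packages pinned in `docs/requirements.txt` are installed and that it is run from the repository root. The helper names and the output directory `docs/_build/html` are illustrative choices, not part of this commit; `-W` and `--keep-going` are standard `sphinx-build` options.

```python
import subprocess
import sys


def strict_build_command(source_dir="docs", output_dir="docs/_build/html"):
    """Return the argument list for a warnings-as-errors HTML build.

    `python -m sphinx` is the module form of the `sphinx-build` command.
    """
    return [
        sys.executable, "-m", "sphinx",
        "-b", "html",
        "-W",            # turn warnings into errors
        "--keep-going",  # with -W, report every warning before failing
        source_dir,
        output_dir,
    ]


def run_strict_build(source_dir="docs", output_dir="docs/_build/html"):
    """Run the build; a non-zero return code means an error or warning occurred."""
    return subprocess.run(strict_build_command(source_dir, output_dir)).returncode
```

Calling `run_strict_build()` before pushing should surface the same warnings that would otherwise only show up as a failed hosted build.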
diff --git a/README.md b/README.md
@@ -2,6 +2,7 @@
 
 Rare variant association testing using deep learning and data-driven burden scores
 
+[![Documentation Status](https://readthedocs.org/projects/deeprvat/badge/?version=latest)](https://deeprvat.readthedocs.io/en/latest/?badge=latest)
 
 ## Installation
 
@@ -36,12 +37,12 @@ If you are running on an computing cluster, you will need a [profile](https://gi
 
 ### Run the preprocessing pipeline on VCF files
 
-Instructions [here](https://github.com/PMBio/deeprvat/blob/main/deeprvat/preprocessing/README.md)
+Instructions [here](https://github.com/PMBio/deeprvat/blob/main/docs/preprocessing.md)
 
 
 ### Annotate variants
 
-Instructions [here](https://github.com/PMBio/deeprvat/blob/main/deeprvat/annotations/README.md)
+Instructions [here](https://github.com/PMBio/deeprvat/blob/main/docs/annotations.md)
 
 
 

diff --git a/deeprvat/annotations/__init__.py b/deeprvat/annotations/__init__.py
diff --git a/deeprvat/deeprvat/__init__.py b/deeprvat/deeprvat/__init__.py
diff --git a/deeprvat/preprocessing/__init__.py b/deeprvat/preprocessing/__init__.py
diff --git a/deeprvat/seed_gene_discovery/__init__.py b/deeprvat/seed_gene_discovery/__init__.py
diff --git a/docs/Makefile b/docs/Makefile
@@ -0,0 +1,20 @@
+# Minimal makefile for Sphinx documentation
+#
+
+# You can set these variables from the command line, and also
+# from the environment for the first two.
+SPHINXOPTS    ?=
+SPHINXBUILD   ?= sphinx-build
+SOURCEDIR     = .
+BUILDDIR      = _build
+
+# Put it first so that "make" without argument is like "make help".
+help:
+	@$(SPHINXBUILD) -M help "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)
+
+.PHONY: help Makefile
+
+# Catch-all target: route all unknown targets to Sphinx using the new
+# "make mode" option.  $(O) is meant as a shortcut for $(SPHINXOPTS).
+%: Makefile
+	@$(SPHINXBUILD) -M $@ "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)
diff --git a/docs/_static/annotation_pipeline_dag.png b/docs/_static/annotation_pipeline_dag.png
diff --git a/...at/preprocessing/preprocess_rulegraph.svg → docs/_static/preprocess_rulegraph.svg b/...at/preprocessing/preprocess_rulegraph.svg → docs/_static/preprocess_rulegraph.svg
diff --git a/deeprvat/annotations/README.md → docs/annotations.md b/deeprvat/annotations/README.md → docs/annotations.md
@@ -1,15 +1,17 @@
 # DeepRVAT Annotation pipeline
 
-This pipeline is based on [snakemake](https://snakemake.readthedocs.io/en/stable/). It uses [bcftools + samstools](https://www.htslib.org/), as well as [perl](https://www.perl.org/), [deepRiPe](https://ohlerlab.mdc-berlin.de/software/DeepRiPe_140/) and [deepSEA](http://deepsea.princeton.edu/) as well as [VEP](http://www.ensembl.org/info/docs/tools/vep/index.html), including plugins for [primateAI](https://github.com/Illumina/PrimateAI) and  [spliceAI](https://github.com/Illumina/SpliceAI). DeepRiPe annotations were acquired using [faatpipe repository by HealthML](https://github.com/HealthML/faatpipe)[[1]](#1) and DeepSea annotations were calculated using [kipoi-veff2](https://github.com/kipoi/kipoi-veff2)[[2]](#2), abSplice scores were computet using [abSplice](https://github.com/gagneurlab/absplice/)[[3]](#3)
+This pipeline is based on [snakemake](https://snakemake.readthedocs.io/en/stable/). It uses [bcftools + samtools](https://www.htslib.org/), as well as [perl](https://www.perl.org/), [deepRiPe](https://ohlerlab.mdc-berlin.de/software/DeepRiPe_140/) and [deepSEA](http://deepsea.princeton.edu/) as well as [VEP](http://www.ensembl.org/info/docs/tools/vep/index.html), including plugins for [primateAI](https://github.com/Illumina/PrimateAI) and [spliceAI](https://github.com/Illumina/SpliceAI). DeepRiPe annotations were acquired using the [faatpipe repository by HealthML](https://github.com/HealthML/faatpipe)[[1]](#reference-1-target), DeepSea annotations were calculated using [kipoi-veff2](https://github.com/kipoi/kipoi-veff2)[[2]](#reference-2-target), and abSplice scores were computed using [abSplice](https://github.com/gagneurlab/absplice/)[[3]](#reference-3-target).
 
-![dag](https://github.com/PMBio/deeprvat/assets/23211603/d483831e-3558-4e21-9845-4b62ad4eecc3)
+![dag](_static/annotation_pipeline_dag.png)
 *Figure 1: Example DAG of annoation pipeline using only two bcf files as input.*
 
 ## Input
 
-The pipeline uses left-normalized bcf files containing variant information, a reference fasta file as well as a text file that maps data blocks to chromosomes as input. It is expected that the bcf files contain the columns "CHROM" "POS" "ID" "REF" and "ALT". Any other columns, including genotype information are stripped from the data before annotation tools are used on the data. The variants may be split into several vcf files for each chromosome and each "block" of data. The filenames should then contain the corresponding chromosome and block number. The pattern of the file names, as well as file structure may be specified in the corresponding [config file](config/deeprvat_annotation_config.yaml).
+The pipeline uses left-normalized bcf files containing variant information, a reference fasta file as well as a text file that maps data blocks to chromosomes as input. It is expected that the bcf files contain the columns "CHROM" "POS" "ID" "REF" and "ALT". Any other columns, including genotype information are stripped from the data before annotation tools are used on the data. The variants may be split into several vcf files for each chromosome and each "block" of data. The filenames should then contain the corresponding chromosome and block number. The pattern of the file names, as well as file structure may be specified in the corresponding [config file](https://github.com/PMBio/deeprvat/blob/main/pipelines/config/deeprvat_annotation_config.yaml).
 
+(requirements-target)=
 ## Requirements 
+
 BCFtools as well as HTSlib should be installed on the machine, 
 - [CADD](https://github.com/kircherlab/CADD-scripts/tree/master/src/scripts) as well as 
 - [VEP](http://www.ensembl.org/info/docs/tools/vep/script/vep_download.html),  
@@ -18,7 +20,7 @@ BCFtools as well as HTSlib should be installed on the machine,
 - [faatpipe](https://github.com/HealthML/faatpipe), and the
 - [vep-plugins repository](https://github.com/Ensembl/VEP_plugins/)
 
-will be installed by the pipeline together with the [plugins](https://www.ensembl.org/info/docs/tools/vep/script/vep_plugins.html) for primateAI and spliceAI. Annotation data for CADD, spliceAI and primateAI should be downloaded. The path to the data may be specified in the corresponding [config file](config/deeprvat_annotation_config.yaml). 
+will be installed by the pipeline together with the [plugins](https://www.ensembl.org/info/docs/tools/vep/script/vep_plugins.html) for primateAI and spliceAI. Annotation data for CADD, spliceAI and primateAI should be downloaded. The path to the data may be specified in the corresponding [config file](https://github.com/PMBio/deeprvat/blob/main/pipelines/config/deeprvat_annotation_config.yaml). 
 Download path:
 - [CADD](http://cadd.gs.washington.edu/download): "All possible SNVs of GRCh38/hg38" and "gnomad.genomes.r3.0.indel.tsv.gz" incl. their  Tabix Indices
 - [SpliceAI](https://basespace.illumina.com/s/otSPW8hnhaZR): "genome_scores_v1.3"/"spliceai_scores.raw.snv.hg38.vcf.gz" and "spliceai_scores.raw.indel.hg38.vcf.gz" 
@@ -30,7 +32,7 @@ Download path:
 The pipeline outputs one annotation file for VEP, CADD, DeepRiPe, DeepSea and Absplice for each input vcf-file. The tool further creates concatenated files for each tool and one merged file containing Scores from AbSplice, VEP incl. CADD, primateAI and spliceAI as well as principal components from DeepSea and DeepRiPe.
 
 ## Configure the annotation pipeline
-The snakemake annotation pipeline is configured using a yaml file with the format akin to the [example file](config/deeprvat_annotation_config.yaml).
+The snakemake annotation pipeline is configured using a yaml file with the format akin to the [example file](https://github.com/PMBio/deeprvat/blob/main/pipelines/config/deeprvat_annotation_config.yaml).
 
 The config above would use the following directory structure:
 ```shell
@@ -81,20 +83,20 @@ Data for VEP plugins and the CADD cache are stored in `annotation data`.
 
 ## Running the annotation pipeline
 ### Preconfiguration
-- Inside the annotation directory create a directory `repo_dir` and run the [annotation setup script](setup_annotation_workflow.sh) 
+- Inside the annotation directory create a directory `repo_dir` and run the [annotation setup script](https://github.com/PMBio/deeprvat/blob/main/deeprvat/annotations/setup_annotation_workflow.sh) 
   ```shell
     setup_annotation_workflow.sh repo_dir/ensembl-vep/cache repo_dir/ensembl-vep/Plugins repo_dir
   ```
-  or manually clone the repositories mentioned in the [requirements](#requirements) into `repo_dir` and install the needed conda environments with  
+  or manually clone the repositories mentioned in the [requirements](#requirements-target) into `repo_dir` and install the needed conda environments with  
     ```shell
     mamba env create -f repo_dir/absplice/environment.yaml
     mamba env create -f repo_dir/kipoi-veff2/environment.minimal.linux.yml
     mamba env create -f deeprvat/deeprvat_annotations.yml
     ```
-  If you already have some of the needed repositories on your machine you can edit the paths in the [config](../../pipelines/config/deeprvat_annotation_config.yaml).
+  If you already have some of the needed repositories on your machine you can edit the paths in the [config](https://github.com/PMBio/deeprvat/blob/main/pipelines/config/deeprvat_annotation_config.yaml).
 
 
-- Inside the annotation directory create a directory `annotation_dir` and download/link the prescored files for CADD, SpliceAI, and PrimateAI (see [requirements](#requirements))
+- Inside the annotation directory create a directory `annotation_dir` and download/link the prescored files for CADD, SpliceAI, and PrimateAI (see [requirements](#requirements-target))
 
 
 ### Running the pipeline
@@ -113,8 +115,13 @@ However, the annotation pipeline requires some files from this pipeline that the
 
 
 ## References
+
+(reference-1-target)=
 <a id="1">[1]</a> Monti, R., Rautenstrauch, P., Ghanbari, M. et al. Identifying interpretable gene-biomarker associations with functionally informed kernel-based tests in 190,000 exomes. Nat Commun 13, 5332 (2022). https://doi.org/10.1038/s41467-022-32864-2
 
+(reference-2-target)=
 <a id="2">[2]</a> Žiga Avsec et al., “Kipoi: accelerating the community exchange and reuse of predictive models for genomics,” bioRxiv, p. 375345, Jan. 2018, doi: 10.1101/375345.
 
+(reference-3-target)=
 <a id="3">[3]</a>N. Wagner et al., “Aberrant splicing prediction across human tissues,” Nature Genetics, vol. 55, no. 5, pp. 861–870, May 2023, doi: 10.1038/s41588-023-01373-3.
+
diff --git a/docs/conf.py b/docs/conf.py
@@ -0,0 +1,30 @@
+# Configuration file for the Sphinx documentation builder.
+#
+# For the full list of built-in configuration values, see the documentation:
+# https://www.sphinx-doc.org/en/master/usage/configuration.html
+
+# -- Project information -----------------------------------------------------
+# https://www.sphinx-doc.org/en/master/usage/configuration.html#project-information
+
+project = "DeepRVAT"
+copyright = "2023, Clarke, B., Holtkamp, E., Öztürk, H., Mück, M., Wahlberg, M., Meyer, K., Brechtmann, F., Hölzlwimmer, F. R., Gagneur, J., & Stegle, O"
+author = "Clarke, B., Holtkamp, E., Öztürk, H., Mück, M., Wahlberg, M., Meyer, K., Brechtmann, F., Hölzlwimmer, F. R., Gagneur, J., & Stegle, O"
+release = "0.1.0"
+
+# -- General configuration ---------------------------------------------------
+# https://www.sphinx-doc.org/en/master/usage/configuration.html#general-configuration
+
+extensions = ["autodoc2", "myst_parser", "sphinx_copybutton"]
+autodoc2_packages = [
+    "../deeprvat",
+]
+
+templates_path = ["_templates"]
+exclude_patterns = ["_build", "Thumbs.db", ".DS_Store"]
+
+
+# -- Options for HTML output -------------------------------------------------
+# https://www.sphinx-doc.org/en/master/usage/configuration.html#options-for-html-output
+
+html_theme = "sphinx_rtd_theme"
+html_static_path = ["_static"]
diff --git a/docs/index.rst b/docs/index.rst
@@ -0,0 +1,28 @@
+.. DeepRVAT documentation master file, created by
+   sphinx-quickstart on Wed Nov 22 10:24:36 2023.
+   You can adapt this file completely to your liking, but it should at least
+   contain the root `toctree` directive.
+
+Welcome to DeepRVAT's documentation!
+====================================
+
+Rare variant association testing using deep learning and data-driven burden scores
+
+
+.. toctree::
+   :maxdepth: 2
+   :caption: Contents:
+
+   usage.md
+   preprocessing.md
+   annotations.md
+   seed_gene_discovery.md
+   apidocs/index
+
+
+Indices and tables
+==================
+
+* :ref:`genindex`
+* :ref:`modindex`
+* :ref:`search`
diff --git a/docs/make.bat b/docs/make.bat
@@ -0,0 +1,35 @@
+@ECHO OFF
+
+pushd %~dp0
+
+REM Command file for Sphinx documentation
+
+if "%SPHINXBUILD%" == "" (
+	set SPHINXBUILD=sphinx-build
+)
+set SOURCEDIR=.
+set BUILDDIR=_build
+
+%SPHINXBUILD% >NUL 2>NUL
+if errorlevel 9009 (
+	echo.
+	echo.The 'sphinx-build' command was not found. Make sure you have Sphinx
+	echo.installed, then set the SPHINXBUILD environment variable to point
+	echo.to the full path of the 'sphinx-build' executable. Alternatively you
+	echo.may add the Sphinx directory to PATH.
+	echo.
+	echo.If you don't have Sphinx installed, grab it from
+	echo.https://www.sphinx-doc.org/
+	exit /b 1
+)
+
+if "%1" == "" goto help
+
+%SPHINXBUILD% -M %1 %SOURCEDIR% %BUILDDIR% %SPHINXOPTS% %O%
+goto end
+
+:help
+%SPHINXBUILD% -M help %SOURCEDIR% %BUILDDIR% %SPHINXOPTS% %O%
+
+:end
+popd
diff --git a/deeprvat/preprocessing/README.md → docs/preprocessing.md b/deeprvat/preprocessing/README.md → docs/preprocessing.md
@@ -1,9 +1,9 @@
 # DeepRVAT Preprocessing pipeline
 
 The DeepRVAT preprocessing pipeline is based on [snakemake](https://snakemake.readthedocs.io/en/stable/) it uses
-[bcftools+samstools](https://www.htslib.org/) and a [python script](preprocess.py) preprocessing.py.
+[bcftools+samtools](https://www.htslib.org/) and a [python script](https://github.com/PMBio/deeprvat/blob/main/deeprvat/preprocessing/preprocess.py), `preprocess.py`.
 
-![DeepRVAT preprocessing pipeline](./preprocess_rulegraph.svg)
+![DeepRVAT preprocessing pipeline](_static/preprocess_rulegraph.svg)
 
 ## Output
 
@@ -44,7 +44,7 @@ pip install -e .
 ## Configure preprocessing
 
 The snakemake preprocessing is configured using a yaml file with the format below.
-An example file is included in this repo: [example config](config/deeprvat_preprocess_config.yaml).
+An example file is included in this repo: [example config](https://github.com/PMBio/deeprvat/blob/main/pipelines/config/deeprvat_preprocess_config.yaml).
 
 ```yaml
 # What chromosomes should be processed

diff --git a/docs/requirements.txt b/docs/requirements.txt
@@ -0,0 +1,6 @@
+sphinx==7.2.6
+myst-parser==2.0.0
+sphinx-autodoc2==0.4.2
+astroid==2.15.8
+sphinx-copybutton==0.5.2
+sphinx-rtd-theme==1.3.0
diff --git a/deeprvat/seed_gene_discovery/README.md → docs/seed_gene_discovery.md b/deeprvat/seed_gene_discovery/README.md → docs/seed_gene_discovery.md
diff --git a/docs/usage.md b/docs/usage.md
@@ -0,0 +1,71 @@
+# Using DeepRVAT
+
+## Installation
+
+1. Clone this repository:
+   ```
+   git clone git@github.com:PMBio/deeprvat.git
+   ```
+1. Change directory to the repository: `cd deeprvat`
+1. Install the conda environment. We recommend using [mamba](https://mamba.readthedocs.io/en/latest/index.html), though you may also replace `mamba` with `conda`.
+
+   *Note: [the current deeprvat env does not support cuda when installed with conda](https://github.com/PMBio/deeprvat/issues/16); install using mamba for cuda support.*
+   ```
+   mamba env create -n deeprvat -f deeprvat_env.yaml
+   ```
+1. Activate the environment: `mamba activate deeprvat`
+1. Install the `deeprvat` package: `pip install -e .`
+
+If you don't want to install the GPU-related requirements, use the `deeprvat_env_no_gpu.yml` environment instead.
+```
+mamba env create -n deeprvat -f deeprvat_env_no_gpu.yaml 
+```
+
+
+## Basic usage
+
+### Customize pipelines
+
+Before running any of the snakefiles, you may want to adjust the number of threads used by different steps in the pipeline. To do this, modify the `threads:` property of a given rule.
+
+If you are running on a computing cluster, you will need a [profile](https://github.com/snakemake-profiles) and may need to add `resources:` directives to the snakefiles.
+
+
+### Run the preprocessing pipeline on VCF files
+
+Instructions [here](https://github.com/PMBio/deeprvat/blob/main/docs/preprocessing.md)
+
+
+### Annotate variants
+
+Instructions [here](https://github.com/PMBio/deeprvat/blob/main/docs/annotations.md)
+
+
+
+### Try the full training and association testing pipeline on some example data
+
+```
+mkdir example
+cd example
+ln -s [path_to_deeprvat]/example/* .
+snakemake -j 1 --snakefile [path_to_deeprvat]/pipelines/training_association_testing.snakefile
+```
+
+Replace `[path_to_deeprvat]` with the path to your clone of the repository.
+
+Note that the example data is randomly generated, and so is only suited for testing whether the `deeprvat` package has been correctly installed.
+
+
+### Run the association testing pipeline with pretrained models
+
+```
+mkdir example
+cd example
+ln -s [path_to_deeprvat]/example/* .
+ln -s [path_to_deeprvat]/pretrained_models
+snakemake -j 1 --snakefile [path_to_deeprvat]/pipelines/association_testing_pretrained.snakefile
+```
+
+Replace `[path_to_deeprvat]` with the path to your clone of the repository.
+
+Again, note that the example data is randomly generated, and so is only suited for testing whether the `deeprvat` package has been correctly installed.