major API update with new parallelization scheme
umahsn committed Jun 8, 2022
1 parent 2abfb78 commit 85af00e
Showing 13 changed files with 806 additions and 1,256 deletions.
1 change: 0 additions & 1 deletion Dockerfile
@@ -19,4 +19,3 @@ COPY ./NanoCaller /opt/conda/envs/nanocaller_env/bin
COPY ./NanoCaller_WGS /opt/conda/envs/nanocaller_env/bin

RUN chmod +x /opt/conda/envs/nanocaller_env/bin/NanoCaller
RUN chmod +x /opt/conda/envs/nanocaller_env/bin/NanoCaller_WGS
231 changes: 67 additions & 164 deletions NanoCaller

Large diffs are not rendered by default.

290 changes: 0 additions & 290 deletions NanoCaller_WGS

This file was deleted.

19 changes: 11 additions & 8 deletions README.md
@@ -5,6 +5,8 @@ NanoCaller is a computational method that integrates long reads in deep convolut
NanoCaller is distributed under the [MIT License by Wang Genomics Lab](https://wglab.mit-license.org/).

## Latest Updates
_**v3.0.0** (June 7 2022)_ : A major API update with a single entry point for running NanoCaller. Major changes to the parallelization routine: GNU parallel is no longer used for whole genome variant calling.

_**v2.0.0** (Feb 2 2022)_ : A major update in API and installation instructions, with release of bioconda recipe for NanoCaller. Added support for indel calling in case of poor or non-existent phasing.

_**v1.0.0** (Aug 8 2021)_ : First post-production release with citeable DOI: [![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.5176764.svg)](https://doi.org/10.5281/zenodo.5176764)
@@ -13,8 +15,6 @@ _**v0.4.1** (Aug 3 2021)_ : Fixed a bug causing slower runtime in whole genome v

_**v0.4.0** (June 2 2021)_ : Added NanoCaller models trained on ONT reads basecalled with Guppy v4.2.2 and Bonito v0.30, as well as R10.3 reads. Added new NanoCaller models trained with long CCS reads (15-20kb library selection). Improved indel calling with rolling window for candidate selection which helps with indels in low complexity regions.

## Citing NanoCaller
Please cite: Ahsan, M.U., Liu, Q., Fang, L. et al. NanoCaller for accurate detection of SNPs and indels in difficult-to-map regions from long-read sequencing by haplotype-aware deep neural networks. Genome Biol 22, 261 (2021). https://doi.org/10.1186/s13059-021-02472-2.

## Installation
NanoCaller can be installed using Docker or Conda. The easiest way to install is from the bioconda channel:
@@ -24,20 +24,20 @@ NanoCaller can be installed using Docker or Conda. The easiest way to install is
or using Docker:

```
VERSION="2.0.0"
VERSION="3.0.0"
docker pull genomicslab/nanocaller:${VERSION}
```
Please refer to [Installation](docs/Install.md) for instructions regarding installing NanoCaller through other methods.

## Usage
General usage of NanoCaller is described in [Usage](docs/Usage.md). For a comprehensive case study of variant calling on Nanopore reads, see [ONT Case Study](docs/ONT%20Case%20Study.md), which describes an end-to-end variant calling pipeline with NanoCaller: aligning FASTQ files of HG002, calling variants using NanoCaller, and evaluating performance on various genomic regions.
General usage of NanoCaller is described in [Usage](docs/Usage.md). Some quick usage examples:

## Example
An example of NanoCaller usage is provided in [sample](sample). The results are stored in [test output](sample/test_run) and were created using the following command:
- `NanoCaller --bam YOU_BAM --ref YOU_REF --cpu 10` will run NanoCaller on the whole genome using 10 parallel processes.
- `NanoCaller --bam YOU_BAM --ref YOU_REF --cpu 10 --regions chr22:20000000-21000000 chr21` will run NanoCaller on chr21 and chr22:20000000-21000000 only.
- `NanoCaller --bam YOU_BAM --ref YOU_REF --cpu 10 --mode snps` will only call SNPs.
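
The quick examples above can also be assembled programmatically, which is handy for checking a command before launching a long run. The following is only a dry-run sketch: `build_nanocaller_cmd` is a hypothetical helper (not part of NanoCaller), the file names are placeholders, and it uses only the `--bam`, `--ref`, `--cpu` and `--regions` options shown above. It prints the command instead of executing it.

```shell
# Dry-run sketch: print a NanoCaller command line without running it.
# build_nanocaller_cmd is a hypothetical helper, not part of NanoCaller.
# Arguments: BAM file, reference FASTA, number of CPUs, then optional regions.
# Note: paths containing spaces are not handled by this simple string join.
build_nanocaller_cmd() {
    bam="$1"; ref="$2"; cpu="$3"
    shift 3
    cmd="NanoCaller --bam $bam --ref $ref --cpu $cpu"
    # Any remaining arguments are passed as regions (contigs or contig:start-end).
    if [ "$#" -gt 0 ]; then
        cmd="$cmd --regions $*"
    fi
    printf '%s\n' "$cmd"
}

# Whole genome with 10 parallel processes
build_nanocaller_cmd sample.bam ref.fa 10
# Restricted to chr21 and part of chr22
build_nanocaller_cmd sample.bam ref.fa 10 chr22:20000000-21000000 chr21
```

Once the printed command looks right, it can be copied into the terminal or passed to `sh -c`.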

`NanoCaller -bam HG002.nanopore.chr22.sample.bam -p ont -o test_run -chrom chr22 -start 20000000 -end 21000000 -ref chr22_ref.fa -cpu 4 > log`
For a comprehensive case study of variant calling on Nanopore reads, see [ONT Case Study](docs/ONT%20Case%20Study.md), which describes an end-to-end variant calling pipeline with NanoCaller: aligning FASTQ files of HG002, calling variants using NanoCaller, and evaluating performance on various genomic regions.

which is also in the file [sample_call](sample/sample_call). This example should take about 10-15 minutes to run.

## Trained models
Trained models for [ONT](https://github.com/WGLab/NanoCaller/tree/master/nanocaller_src/release_data/ONT_models) data, [CLR](https://github.com/WGLab/NanoCaller/tree/master/nanocaller_src/release_data/clr_models) data and [HIFI](https://github.com/WGLab/NanoCaller/tree/master/nanocaller_src/release_data/hifi_models) data can be found [here](https://github.com/WGLab/NanoCaller/tree/master/nanocaller_src/release_data). These models are trained on chr1-22 of the genomes stated below, unless mentioned otherwise.
@@ -79,3 +79,6 @@ You can specify SNP and indel models using `--snp_model` and `--indel_model` par
| CCS-HG002 | PacBio CCS | HG002 | 56 | v4.2.1 | \- |
| NanoCaller1 | ONT R9.4.1 | HG001 | 34 | v3.3.2 | Guppy2.3.8 |
| NanoCaller3 | PacBio CCS | HG001 | 29 | v3.3.2 | \- |

## Citing NanoCaller
Please cite: Ahsan, M.U., Liu, Q., Fang, L. et al. NanoCaller for accurate detection of SNPs and indels in difficult-to-map regions from long-read sequencing by haplotype-aware deep neural networks. Genome Biol 22, 261 (2021). https://doi.org/10.1186/s13059-021-02472-2.
30 changes: 21 additions & 9 deletions docs/Install.md
@@ -1,5 +1,5 @@
# Installation
There are three ways to install and run NanoCaller, via Docker, Singularity or conda.
There are three ways to install and run NanoCaller, via Docker, Singularity or Conda.

NanoCaller has been developed and tested to work with Linux OS; we do not recommend using Windows or Mac OS. However, if you use Windows or Mac OS and have Docker installed on your machine, you can run NanoCaller inside a Docker container. NanoCaller does not require a GPU or any other special hardware to run.

@@ -8,9 +8,6 @@ Please check the [NanoCaller Docker Hub repository](https://hub.docker.com/repos

## Conda Installation

You can install NanoCaller in conda using:
`conda install -c bioconda nanocaller`

If you do not have Anaconda, you will need to install it first. Here, we show how to install Miniconda, a minimal installation of Anaconda, which is much smaller and has a faster installation:

```
@@ -19,13 +16,28 @@ bash Miniconda3-latest-Linux-x86_64.sh
```
Go through all the prompts (installation in `$HOME` is recommended). The installation should take about 10 minutes, including the installation of Miniconda.

### Bioconda
You can install NanoCaller in conda using the Bioconda recipe:
`conda install -c bioconda nanocaller`

It is recommended that you install NanoCaller in a new conda environment to avoid any package conflict, in the following way:
It is recommended that you install NanoCaller in a new conda environment to avoid any package conflict and use mamba for fast installation, in the following way:
```
conda create -n nanocaller_env -c conda-forge mamba
conda activate nanocaller_env
mamba install -c bioconda nanocaller
```
conda create -n nanocaller_env -c bioconda nanocaller

### Manual Installation
Manual installation gives you the latest NanoCaller version from GitHub, including changes that have not yet been pushed to Bioconda.

```
git clone https://github.com/WGLab/NanoCaller.git
conda env create -f NanoCaller/environment.yml
chmod +x NanoCaller/NanoCaller
conda activate nanocaller_env
```

Then you can run NanoCaller using `PATH_TO_NANOCALLER_REPO/NanoCaller --help`.
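
Because a manual installation does not necessarily put `NanoCaller` on your `PATH`, a quick check can save confusion. The snippet below is a minimal sketch, assuming a POSIX shell; `check_tool` is a hypothetical helper introduced here for illustration and is not part of the repository.

```shell
# check_tool is a hypothetical helper: reports whether a command is on PATH.
# Returns 0 if the command is found, 1 otherwise.
check_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        printf '%s found at %s\n' "$1" "$(command -v "$1")"
    else
        printf '%s not found on PATH\n' "$1"
        return 1
    fi
}

# If NanoCaller is not on PATH, fall back to calling the script by its full path.
check_tool NanoCaller || echo "Activate nanocaller_env or use PATH_TO_NANOCALLER_REPO/NanoCaller directly."
```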


## Docker Installation
@@ -34,7 +46,7 @@ For instructions regarding Docker installation, please visit [Docker website](ht
### 1) via Docker Hub (preferred)
You can pull NanoCaller Docker images from Docker Hub by specifying a version number.
```
VERSION="2.0.0"
VERSION="3.0.0"
docker run genomicslab/nanocaller:${VERSION} NanoCaller --help
```

@@ -44,13 +56,13 @@ If you want to build an image for the latest commit of NanoCaller Github reposit
```
git clone https://github.com/WGLab/NanoCaller.git
docker build -t nanocaller NanoCaller
docker run nanocaller NanoCaller --help
docker run nanocaller NanoCaller --help
```

## Singularity
For instructions regarding Singularity installation, please visit [Singularity website](https://sylabs.io/guides/3.7/user-guide/quick_start.html).
```
VERSION="2.0.0"
VERSION="3.0.0"
singularity pull docker://genomicslab/nanocaller:${VERSION}
singularity exec -e --pwd /app nanocaller_${VERSION}.sif NanoCaller --help
```
10 changes: 5 additions & 5 deletions docs/ONT Case Study.md
@@ -39,13 +39,13 @@ GM24385_2_Guppy_4.2.2_prom.fastq.gz GM24385_3_Guppy_4.2.2_prom.fastq.gz -t $CPU
samtools index HG002.Guppy_4.2.2_prom.bam -@ $CPU
# run nanocaller
VERSION=2.0.0
docker run -it -v ${PWD}:'/mnt/' genomicslab/nanocaller:${VERSION} NanoCaller_WGS \
-bam /mnt/HG002.Guppy_4.2.2_prom.bam -ref /mnt/GRCh38.fa -prefix HG002 -p ont \
-o /mnt/calls -cpu $CPU --exclude_bed hg38
VERSION=3.0.0
docker run -it -v ${PWD}:'/mnt/' genomicslab/nanocaller:${VERSION} NanoCaller \
--bam /mnt/HG002.Guppy_4.2.2_prom.bam --ref /mnt/GRCh38.fa --prefix HG002 --preset ont \
--output /mnt/calls --cpu $CPU --exclude_bed hg38 --wgs_contigs chr1-22XY
# If you want to run NanoCaller without docker, run the following command `NanoCaller_WGS -bam HG002.Guppy_4.2.2_prom.bam -ref GRCh38.fa -prefix HG002 -p ont -o calls --exclude_bed hg38 -cpu $CPU`
# If you want to run NanoCaller without docker, run the following command `NanoCaller --bam HG002.Guppy_4.2.2_prom.bam --ref GRCh38.fa --prefix HG002 --preset ont --output calls --exclude_bed hg38 --cpu $CPU --wgs_contigs chr1-22XY`
# run `conda install -c bioconda bedtools` to install bedtools to create BED files for variant calling evaluation in difficult-to-map genomic regions.