NanoCaller is a computational method that integrates long reads in deep convolutional neural network for the detection of SNPs/indels from long-read sequencing data. NanoCaller uses long-range haplotype structure to generate predictions for each SNP candidate variant site by considering pileup information of other candidate sites sharing reads. Subsequently, it performs read phasing, and carries out local realignment of each set of phased reads and the set of all reads for each indel candidate variant site to generate indel calling, and then creates consensus sequences for indel sequence prediction.
NanoCaller is distributed under the MIT License by Wang Genomics Lab.
v3.6.0 (April 22 2024): CSI indices generated for VCF files instead of TBI to accommodate larger contigs.
v3.5.0 (March 27 2024): CRAM files are supported in input as well as in phased output if whatshap version>=2 is being used with NanoCaller.
v3.4.0 (July 31 2023): VCF files contain total and strand-specific allele depths for SNP calls from SNP calling models. A new mode for short ONT reads (5-10kbp) added. --phase_qual_score
parameter filters out low quality SNP calls from phasing by WhatsHap; these SNP calls are kept in the output, but neither phased nor used for phasing reads.
v3.3.0 (July 14 2023): Detailed description of SNP calls, including unfiltered SNP calls for variants determined to be false by NanoCaller, and inclusion of per-base probability output. Quality score has been adjusted to be on Phred scale.
v3.2.0 (May 14 2023): Support added for haploid variant calling which has significant improvement in recall for indel calling. New feature generation methods and models are are used for haploid SNP and indel calling. Now chrY and chrM are assumed to be haploid, with additional parameter --haploid_X to specify if chrX is haploid. Another parameter --haploid_genome can be used for haploid variant calling on all chromosomes.
v3.0.1 (March 14 2023) : Several critical bugs regarding coverage normalization and integer overflow fixed. These bug affected very low and high coverage sample. The normalization bug was only introduced in v3.0.0 so any samples processed before that should not have been affected. Whereas integer overflow bug was much older and it only was affecting sample with more than 256 coverage.
v3.0.0 (June 7 2022) : A major update in API with single entry point for running NanoCaller. Major changes in parallelization routine with GNU parallel no longer used for whole genome variant calling.
v2.0.0 (Feb 2 2022) : A major update in API and installation instructions, with release of bioconda recipe for NanoCaller. Added support for indel calling in case of poor or non-existent phasing.
v1.0.0 (Aug 8 2021) : First post-production release with citeable DOI:
NanoCaller can be installed using Docker or Conda. The easiest way to install is from the bioconda channel:
conda install -c bioconda nanocaller
or using Docker:
VERSION="3.4.1"
docker pull genomicslab/nanocaller:${VERSION}
or using Singularity:
VERSION="3.4.1"
singularity pull docker://genomicslab/nanocaller:${VERSION}
Please refer to Installation for instructions regarding installing NanoCaller through other methods.
General usage of NanoCaller is described in Usage. Some quick usage examples:
NanoCaller --bam YOUR_BAM --ref YOUR_REF --cpu 10
will run NanoCaller on whole genome using 10 parallel processes.NanoCaller --bam YOUR_BAM --ref YOUR_REF --cpu 10 --mode snps
will only call SNPs.NanoCaller --bam YOUR_BAM --ref YOUR_REF --cpu 10 --mode snps --phase
will only call SNPs and phase them, and will additionally phase the BAM file (under intermediate_phase_files subfolder split by chromosomes).NanoCaller --bam YOUR_BAM --ref YOUR_REF --cpu 10 --haploid_genome
will run NanoCaller on whole genome under the assumption that the genome is haploid.NanoCaller --bam YOUR_BAM --ref YOUR_REF --cpu 10 --regions chr22:20000000-21000000 chr21
will NanoCaller on chr21 and chr22:20000000-21000000 only.
For a comprehensive case study of variant calling on Nanopore reads, see ONT Case Study, where we describe end-to-end variant calling pipeline for using NanoCaller, where we start with aligning FASTQ files of HG002, calls variants using NanoCaller, and evaluate performances on various genomic regions.
Trained models for ONT data, CLR data and HIFI data can be found here. These models are trained on chr1-22 of the genomes stated below, unless mentioned othewise.
You can specify SNP and indel models using --snp_model
and --indel_model
parameters with a model name from tables below. For instance, if you want to use 'ONT-HG002_bonito' SNP model and 'ONT-HG002' indel model, use the following command:
NanoCaller --snp_model ONT-HG002_bonito --indel_model ONT-HG002
Model Name | Sequencing Technology | Genome | Coverage | Benchmark | Basecaller |
---|---|---|---|---|---|
ONT-HG001 | ONT R9.4.1 | HG001 | 55 | v3.3.2 | Guppy4.2.2 |
ONT-HG001_GP2.3.8 | ONT R9.4.1 | HG001 | 34 | v3.3.2 | Guppy2.3.8 |
ONT-HG001_GP2.3.8-4.2.2 | ONT R9.4.1 | HG001 | 45 | v3.3.2 | Guppy (2.3.8 + 4.2.2) |
ONT-HG001-4_GP4.2.2 | ONT R9.4.1 | HG001-4 | 69 | v3.3.2 (HG001) + v4.2.1 (HG002-4) | Guppy4.2.2 |
ONT-HG002 | ONT R9.4.1 | HG002 | 47 | v4.2.1 | Guppy4.2.2 |
ONT-HG002_GP4.2.2_v3.3.2 | ONT R9.4.1 | HG002 | 47 | v3.3.2 | Guppy4.2.2 |
ONT-HG002_GP2.3.4_v3.3.2 | ONT R9.4.1 | HG002 | 53 | v3.3.2 | Guppy2.3.4 |
ONT-HG002_GP2.3.4_v4.2.1 | ONT R9.4.1 | HG002 | 53 | v4.2.1 | Guppy2.3.4 |
ONT-HG002_bonito | ONT R9.4.1 | HG002 (chr1-21) | 51 | v4.2.1 | Bonito v0.30 |
ONT-HG002_r10.3 | ONT R10.3 | HG002 (chr1-21) | 32 | v4.2.1 | Guppy4.0.11 |
CCS-HG001 | PacBio CCS | HG001 | 57 | v3.3.2 | - |
CCS-HG002 | PacBio CCS | HG002 | 56 | v4.2.1 | - |
CCS-HG001-4 | PacBio CCS | HG001-4 | 55 | v3.3.2 (HG001) + v4.2.1 (HG002-4) | Guppy4.2.2 |
CLR-HG002 | PacBio CLR | HG002 | 58 | v4.2.1 | - |
NanoCaller1 | ONT R9.4.1 | HG001 | 34 | v3.3.2 | Guppy2.3.8 |
NanoCaller2 | ONT R9.4.1 | HG002 | 53 | v3.3.2 | Guppy2.3.4 |
NanoCaller3 | PacBio CLR | HG003 | 28 | v3.3.2 | - |
Model Name | Sequencing Technology | Genome | Coverage | Benchmark | Basecaller |
---|---|---|---|---|---|
ONT-HG001 | ONT R9.4.1 | HG001 | 55 | v3.3.2 | Guppy4.2.2 |
ONT-HG002 | ONT R9.4.1 | HG002 | 47 | v4.2.1 | Guppy4.2.2 |
CCS-HG001 | PacBio CCS | HG001 | 57 | v3.3.2 | - |
CCS-HG002 | PacBio CCS | HG002 | 56 | v4.2.1 | - |
NanoCaller1 | ONT R9.4.1 | HG001 | 34 | v3.3.2 | Guppy2.3.8 |
NanoCaller3 | PacBio CCS | HG001 | 29 | v3.3.2 | - |
Please cite: Ahsan, M.U., Liu, Q., Fang, L. et al. NanoCaller for accurate detection of SNPs and indels in difficult-to-map regions from long-read sequencing by haplotype-aware deep neural networks. Genome Biol 22, 261 (2021). https://doi.org/10.1186/s13059-021-02472-2.