The PrecisionCallerPipeline (PCP)

PCP takes FASTQ files produced by a sequencing facility with the Precision ID mtDNA Whole Genome Panel (Thermo Fisher Scientific, USA) and outputs aligned BAM files mapped to the revised Cambridge Reference Sequence (rCRS), the commonly used mitochondrial reference sequence.

Prerequisites

PCP is a Snakemake-based workflow that runs on a Linux system and requires:

  • Awk, for SAM file editing;
  • BEDTools, for BAM to FASTQ conversion;
  • BWA-MEM, for read alignment;
  • Pycision, for amplicon delimitation and selection;
  • RtN!, for NUMT removal;
  • SAMtools, for BAM conversion, sorting, indexing, and merging;
  • Trimmomatic, for read quality control and trimming.
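Before installing PCP, it can help to confirm which of these tools are already reachable from your shell. The sketch below is only an illustration: the executable names (awk, bedtools, bwa, samtools, snakemake) are assumptions based on typical installs, and java is included because Trimmomatic is distributed as a .jar file. Pycision and RtN are not checked here because they are copied into the tools folder.

```shell
# Print the name of every given executable that is not found on PATH,
# one per line. Prints nothing when all of them are available.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# Executable names below are assumptions; adjust them to your installation.
missing=$(missing_tools awk bedtools bwa samtools snakemake java)
if [ -n "$missing" ]; then
    printf 'Not found on PATH:\n%s\n' "$missing"
else
    echo "All command-line prerequisites found."
fi
```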

Installation

Install the software above and clone this repo to your directory of choice:

git clone https://github.com/filcfig/PCP.git

Add pycision.py, trimmomatic-0.39.jar, and the RtN folder to the tools folder. For RtN, remember to decompress and index its humans.fa file first: bunzip2 humans.fa.bz2 && bwa index humans.fa.
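A quick way to verify the layout after copying the files is sketched below. It assumes the folder is literally named RtN and that the command is run from the root of the cloned repository; check_tools_dir is a hypothetical helper, not part of PCP.

```shell
# Report which of the expected entries are absent from a tools directory.
# Prints nothing when all three are present.
check_tools_dir() {
    dir=$1
    [ -f "$dir/pycision.py" ] || echo "missing: pycision.py"
    [ -f "$dir/trimmomatic-0.39.jar" ] || echo "missing: trimmomatic-0.39.jar"
    # The folder name "RtN" is an assumption.
    [ -d "$dir/RtN" ] || echo "missing: RtN/"
    return 0
}

check_tools_dir tools
```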

Usage

Add the FASTQ files to the sequencing/selected_fastqfiles folder and make sure Snakemake is available in your shell (with conda, run conda activate snakemake). Then make run_FASTQ.sh executable and run it:

chmod +x run_FASTQ.sh
./run_FASTQ.sh

RtN needs considerable time per sample and a large amount of RAM. To process FASTQ files without RtN, run Snakefile_noRtN instead:

snakemake -s Snakefile_noRtN -j
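As far as we know, -j without a number lets Snakemake use all available cores. On a shared machine you may prefer an explicit limit. The wrapper below is a minimal sketch; pcp_no_rtn is a hypothetical name, and the default of one core is an arbitrary choice.

```shell
# Run the no-RtN workflow with an explicit core count (default: 1).
# Any further arguments are passed through to snakemake unchanged.
pcp_no_rtn() {
    cores=${1:-1}
    if [ "$#" -gt 0 ]; then
        shift
    fi
    snakemake -s Snakefile_noRtN -j "$cores" "$@"
}
```

For example, pcp_no_rtn 4 runs with at most four cores, and pcp_no_rtn 4 -n adds Snakemake's dry-run flag to list the planned jobs without executing them.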

The final BAM files are written to the sequencing/merged folder.
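To sanity-check the results, samtools quickcheck can flag truncated or malformed files. The helper below is a sketch that assumes the outputs carry a .bam extension; check_bams is a hypothetical name and not part of PCP.

```shell
# Run "samtools quickcheck" on every .bam file in a directory
# (default: sequencing/merged) and report the result per file.
check_bams() {
    dir=${1:-sequencing/merged}
    for bam in "$dir"/*.bam; do
        # If the glob matched nothing, the pattern itself is left unexpanded.
        [ -e "$bam" ] || { echo "no BAM files in $dir"; return 1; }
        if samtools quickcheck "$bam"; then
            echo "ok: $bam"
        else
            echo "FAILED: $bam"
        fi
    done
}
```

Calling check_bams with no argument inspects sequencing/merged; pass another directory to check a different location.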

Data

The data generated from samples previously sequenced within the 1000 Genomes Project are openly available on Zenodo.

Citation

Our manuscript is published at:

Cortes-Figueiredo, F.; Carvalho, F.S.; Fonseca, A.C.; Paul, F.; Ferro, J.M.; Schönherr, S.; Weissensteiner, H.; Morais, V.A. From Forensics to Clinical Research: Expanding the Variant Calling Pipeline for the Precision ID mtDNA Whole Genome Panel. Int. J. Mol. Sci. 2021, 22, 12031. https://doi.org/10.3390/ijms222112031.

License

Distributed under the MIT License. See LICENSE for more information.