The PCP pipeline automatically takes FASTQ files produced by a sequencing facility with the Precision ID mtDNA Whole Genome Panel (Thermo Fisher Scientific, USA) and outputs fully aligned BAM files mapped to the revised Cambridge Reference Sequence (rCRS), the commonly used mitochondrial reference.
The workflow is based on Snakemake, runs on a Linux-based system, and relies on:
- Awk, for SAM file editing;
- BEDTools, for BAM to FASTQ conversion;
- BWA-MEM, for read alignment;
- Pycision, for amplicon delimitation and selection;
- RtN!, for NUMT removal;
- SAMtools, for BAM conversion, sorting, indexing, and merging;
- Trimmomatic, for read quality control and trimming.
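Before going further, it can help to confirm that the command-line tools are reachable. The sketch below is not part of PCP: `missing_tools` is a hypothetical helper, and the executable names in the example call are assumptions about a typical installation (Trimmomatic, Pycision, and RtN are added to the tools folder rather than to PATH, so they are not checked here).

```shell
# Report which of the given commands are not found on PATH.
# Prints one missing command per line; returns 1 if any are missing.
# The body runs in a subshell, so its variables do not leak into the caller.
missing_tools() (
    status=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            printf '%s\n' "$tool"
            status=1
        fi
    done
    exit "$status"
)

# Example call (executable names are assumptions; adjust to your setup):
# missing_tools awk bedtools bwa samtools snakemake java
```

An empty output with exit status 0 means every listed command was found.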
Install the software above and clone this repo to your directory of choice:
git clone https://github.com/filcfig/PCP.git
Add pycision.py, trimmomatic-0.39.jar, and the RtN folder (don't forget to run bunzip2 humans.fa.bz2 && bwa index humans.fa) to the tools folder.
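A quick way to verify this layout is sketched below. `check_tools_dir` is a hypothetical helper, not something shipped with PCP, and it only checks the three entries named above. It does not look for humans.fa or its BWA index, since their exact location inside the RtN folder is not stated here.

```shell
# Check that the expected entries exist under a tools directory.
# Prints one "missing: <path>" line per absent entry; returns 1 if any are absent.
check_tools_dir() (
    dir=${1:-tools}
    status=0
    for item in pycision.py trimmomatic-0.39.jar RtN; do
        if [ ! -e "$dir/$item" ]; then
            printf 'missing: %s\n' "$dir/$item"
            status=1
        fi
    done
    exit "$status"
)

# Example call from the repository root:
# check_tools_dir tools
```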
Start by adding the FASTQ files to the sequencing/selected_fastqfiles folder. Then, make run_FASTQ.sh executable and run it (make sure Snakemake is activated; if you use conda, type conda activate snakemake):
chmod +x run_FASTQ.sh
./run_FASTQ.sh
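If the run finishes immediately with nothing to do, the input folder may be empty. The sketch below counts candidate input files before launching. `count_fastq` is a hypothetical helper, and the file extensions it looks for are assumptions, since the pipeline may expect one specific naming scheme.

```shell
# Count FASTQ-like files directly inside a directory and print the total.
# The extensions below are assumptions; adapt them to your file names.
count_fastq() (
    dir=${1:-sequencing/selected_fastqfiles}
    n=0
    for f in "$dir"/*.fastq "$dir"/*.fastq.gz "$dir"/*.fq "$dir"/*.fq.gz; do
        # An unmatched glob stays literal, so only count real files.
        if [ -f "$f" ]; then
            n=$((n + 1))
        fi
    done
    printf '%s\n' "$n"
)

# Example call from the repository root:
# count_fastq sequencing/selected_fastqfiles
```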
Since RtN requires some time per sample and a good amount of RAM, it is possible to process FASTQ files without it by running Snakefile_noRtN instead:
snakemake -s Snakefile_noRtN -j
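On a shared machine you may prefer to preview the jobs and cap the number of cores rather than pass a bare -j. The sketch below is one possible approach, not part of PCP: `pick_cores` is a hypothetical helper that leaves one core free, and the commented commands use Snakemake's standard -n (dry run) and -j (cores) options.

```shell
# Print a core count for -j: all online cores minus one, never below 1.
pick_cores() (
    total=$(getconf _NPROCESSORS_ONLN 2>/dev/null || echo 1)
    cores=$((total - 1))
    if [ "$cores" -lt 1 ]; then
        cores=1
    fi
    printf '%s\n' "$cores"
)

# Dry run first (lists the planned jobs without executing them), then the real run:
# snakemake -s Snakefile_noRtN -n
# snakemake -s Snakefile_noRtN -j "$(pick_cores)"
```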
The final BAM files will be available in the sequencing/merged folder.
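As a sanity check on the output, each BAM can be passed through samtools quickcheck, which exits with a non-zero status for files that look truncated or malformed. The sketch below wraps that in a hypothetical helper, `check_bams`, and assumes the final files carry a .bam extension.

```shell
# Run `samtools quickcheck` on every BAM in a directory.
# Prints one "failed: <path>" line per file that does not pass; returns 1 if any fail.
check_bams() (
    dir=${1:-sequencing/merged}
    status=0
    for bam in "$dir"/*.bam; do
        [ -e "$bam" ] || continue   # the glob matched nothing
        if ! samtools quickcheck "$bam"; then
            printf 'failed: %s\n' "$bam"
            status=1
        fi
    done
    exit "$status"
)

# Example call from the repository root:
# check_bams sequencing/merged
```

Passing this check only means the files are structurally intact; it says nothing about alignment quality.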
The data generated with samples previously sequenced within the 1000 Genomes Project are openly available in Zenodo.
Our manuscript is published at:
Cortes-Figueiredo, F.; Carvalho, F.S.; Fonseca, A.C.; Paul, F.; Ferro, J.M.; Schönherr, S.; Weissensteiner, H.; Morais, V.A. From Forensics to Clinical Research: Expanding the Variant Calling Pipeline for the Precision ID mtDNA Whole Genome Panel. Int. J. Mol. Sci. 2021, 22, 12031. https://doi.org/10.3390/ijms222112031.
Distributed under the MIT License. See LICENSE for more information.