Add Manta for identifying candidate larger indels #142

Open
iskandr opened this issue Aug 6, 2019 · 2 comments

iskandr commented Aug 6, 2019

Installation:

conda install -c bioconda manta

Usage:

configManta.py \
--normalBam normal.cram \
--tumorBam tumor.cram \
--referenceFasta genome.fa \
--runDir ${MY_MANTA_WORKDIR} \
--callRegions canonicalChromosomes.bed.gz \
--exome

Followed by:

${MY_MANTA_WORKDIR}/runWorkflow.py -j ${NUM_CORES}
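
The configure step differs between WES and WGS only in the --exome flag, so a small wrapper can make that switch explicit. The following is a minimal sketch, not part of Manta: the function name manta_config_args and the "wes"/"wgs" labels are hypothetical, the file names are the placeholders from the command above, and --callRegions is left out for brevity.

```shell
# Hypothetical helper: print the configManta.py arguments for one run.
#   $1: assay type, "wes" or "wgs" (labels chosen for this sketch)
#   $2: run directory
manta_config_args() {
    assay=$1
    run_dir=$2
    # Arguments shared by exome and whole-genome runs.
    set -- --normalBam normal.cram \
           --tumorBam tumor.cram \
           --referenceFasta genome.fa \
           --runDir "$run_dir"
    # Only exome data gets the --exome flag.
    if [ "$assay" = "wes" ]; then
        set -- "$@" --exome
    fi
    printf '%s\n' "$*"
}
```

It could be used as `configManta.py $(manta_config_args wes "${MY_MANTA_WORKDIR}")`. The unquoted substitution relies on word splitting, so this sketch assumes paths without spaces.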

Notes:

  • For WES data we're including the --exome flag. For WGS data (which is less likely to have very deep coverage regions), omit this flag.
  • To improve performance, I'm restricting the call regions to canonical chromosomes with the --callRegions flag. This expects a BED file, which the Manta user guide says must be bgzip-compressed and tabix-indexed. The guide includes a GRCh38-specific example: https://github.com/Illumina/manta/blob/master/docs/userGuide/README.md#extended-use-cases. Other genomes will need their own BED file, and this option should be omitted for custom genomes.
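
For a genome without a ready-made BED file, one way to build the call regions is from the FASTA index. This is a rough sketch under stated assumptions: the function name canonical_bed_from_fai is hypothetical, "canonical" is taken to mean chr1 to chr22 plus chrX and chrY with UCSC-style names (adjust the pattern for other naming schemes or to include chrM), and a genome.fa.fai index from samtools faidx is assumed to exist.

```shell
# Hypothetical helper: print BED lines (name, 0, length) for canonical
# chromosomes listed in a FASTA index.
#   $1: path to a .fai file (column 1 = contig name, column 2 = length)
canonical_bed_from_fai() {
    awk -F '\t' 'BEGIN { OFS = "\t" }
        # Assumption: canonical means chr1..chr22, chrX, chrY.
        $1 ~ /^chr([1-9]|1[0-9]|2[0-2]|X|Y)$/ { print $1, 0, $2 }' "$1"
}

# Typical use (needs bgzip and tabix from htslib on PATH):
#   canonical_bed_from_fai genome.fa.fai > canonicalChromosomes.bed
#   bgzip canonicalChromosomes.bed
#   tabix -p bed canonicalChromosomes.bed.gz
```

The resulting canonicalChromosomes.bed.gz, with its .tbi index alongside, is what would be passed to --callRegions.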

iskandr commented Aug 7, 2019

One possible wrinkle: Manta requires Python 2.6 or 2.7. I'm running it inside a Python 3 conda env, but I think it's picking up the base-installed Python.

The configManta.py script starts with:

#!/usr/bin/env python2
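
To check which interpreter that shebang actually resolves to, something like the following could help. It is only a diagnostic sketch: the function names shebang_interpreter and report_interpreter are hypothetical, and it handles only plain shebangs and the "/usr/bin/env NAME" form.

```shell
# Hypothetical helper: print the interpreter named on a script's shebang line.
#   "#!/usr/bin/env python2" -> python2
#   "#!/bin/sh"              -> /bin/sh
shebang_interpreter() {
    head -n 1 "$1" |
        sed -e 's/^#![[:space:]]*//' -e 's|^/usr/bin/env[[:space:]]*||' |
        awk '{ print $1 }'
}

# Hypothetical helper: show where that interpreter resolves on the current PATH.
report_interpreter() {
    interp=$(shebang_interpreter "$1")
    resolved=$(command -v "$interp") || {
        echo "$interp: not found on PATH"
        return 1
    }
    echo "$interp -> $resolved"
}
```

Running `report_interpreter "$(command -v configManta.py)"` inside the activated env would show whether python2 comes from the env or from elsewhere on the system.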

iskandr commented Aug 7, 2019

Files generated by Manta:

*diploidSV.vcf.gz*
SVs and indels scored and genotyped under a diploid model for the set of samples in a joint diploid sample analysis or for the normal sample in a tumor/normal subtraction analysis. In the case of a tumor/normal subtraction, the scores in this file do not reflect any information from the tumor sample.

*somaticSV.vcf.gz*
SVs and indels scored under a somatic variant model. This file will only be produced if a tumor sample alignment file is supplied during configuration.

*candidateSV.vcf.gz*
Unscored SV and indel candidates. Only a minimal amount of supporting evidence is required for an SV to be entered as a candidate in this file. An SV or indel must be a candidate to be considered for scoring, therefore an SV cannot appear in the other VCF outputs if it is not present in this file. Note that by default this file includes indels of size 8 and larger. The smallest indels in this set are intended to be passed on to a small variant caller without scoring by manta itself (by default manta scoring starts at size 50).

*candidateSmallIndels.vcf.gz*
Subset of the candidateSV.vcf.gz file containing only simple insertion and deletion variants less than the minimum scored variant size (50 by default). Passing this file to a small variant caller will provide continuous coverage over all indel sizes when the small variant caller and manta outputs are evaluated together. Alternate small indel candidate sets can be parsed out of the candidateSV.vcf.gz file if this candidate set is not appropriate.

The passing somatic structural variants are in somaticSV.vcf.gz. The smaller indels get filtered out into candidateSmallIndels.vcf.gz, which should be used as an input to Strelka2.
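
A sketch of how these two outputs might be consumed is below. The function names are hypothetical. It assumes Manta writes its VCFs under results/variants inside the run directory, that passing records carry PASS in the FILTER column (column 7), and that Strelka2's somatic configure script accepts --indelCandidates, which is how I recall the Strelka2 user guide describing it. The BAM/CRAM and FASTA names are the placeholders used earlier in this thread.

```shell
# Hypothetical helper: count records whose FILTER column is PASS in a
# gzip- or bgzip-compressed VCF.
count_pass_records() {
    gzip -dc "$1" |
        awk -F '\t' '!/^#/ && $7 == "PASS" { n++ } END { print n + 0 }'
}

# Hypothetical helper: print arguments for configureStrelkaSomaticWorkflow.py
# that feed Manta's small indel candidates into Strelka2.
#   $1: Manta run directory
#   $2: Strelka2 run directory
strelka_config_args() {
    manta_run_dir=$1
    strelka_run_dir=$2
    # Assumed location of Manta's candidate small indels.
    candidates=$manta_run_dir/results/variants/candidateSmallIndels.vcf.gz
    printf '%s\n' "--normalBam normal.cram --tumorBam tumor.cram --referenceFasta genome.fa --indelCandidates $candidates --runDir $strelka_run_dir"
}
```

For example, `count_pass_records "${MY_MANTA_WORKDIR}/results/variants/somaticSV.vcf.gz"` reports the number of passing somatic SVs, and `configureStrelkaSomaticWorkflow.py $(strelka_config_args "${MY_MANTA_WORKDIR}" strelka_work)` would configure the Strelka2 run (again assuming paths without spaces).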
