This is a subworkflow that is part of the Kids First DRC Somatic Variant Workflow that can be run as standalone. Mutect2 is a variant caller that calls single nucleotide variants, multi-nucleotide variants, and small insertions and deletions.
In general, Mutect2 is run following the Broad Best Practices, as of this workflow. This subworkflow does the following things as described below:
- Run the Mutect2 variant caller tool, scattering on the input interval list for greater speed
- Filter tables are created using steps as outlined in section II of this documentation
- Scattered mutect2 VCF results are merged into a single VCF
- Stats tables from Mutect2 run are merged
- Merged stats table and outputs from filter tables are used to populate the
FILTER
attribute in the merged Mutect2 VCF. This creates the pre-pass VCF containingPASS
and variants with various failing-to-pass criteria in theFILTER
field - Select
PASS
step subsets the pre-pass VCF toPASS
only - Annotate the
PASS
VCF using the annotation sub workflow - most relevant information is here! - Rename outputs to fit a standard format
This workflow runs Mutect2 4.1.1.0 which calls single nucleotide variants (SNV) and insertions/deletions (INDEL).
inputs:
indexed_reference_fasta: {type: File, secondaryFiles: [.fai, ^.dict]}
reference_dict: File
bed_invtl_split: {type: 'File[]', doc: "Bed file intervals passed on from and outside pre-processing step"}
af_only_gnomad_vcf: {type: File, secondaryFiles: ['.tbi']}
exac_common_vcf: {type: File, secondaryFiles: ['.tbi']}
input_tumor_aligned:
type: File
secondaryFiles: |
${
var dpath = self.location.replace(self.basename, "")
if(self.nameext == '.bam'){
return {"location": dpath+self.nameroot+".bai", "class": "File"}
}
else{
return {"location": dpath+self.basename+".crai", "class": "File"}
}
}
doc: "tumor BAM or CRAM"
input_tumor_name: string
input_normal_aligned:
type: File
secondaryFiles: |
${
var dpath = self.location.replace(self.basename, "")
if(self.nameext == '.bam'){
return {"location": dpath+self.nameroot+".bai", "class": "File"}
}
else{
return {"location": dpath+self.basename+".crai", "class": "File"}
}
}
doc: "normal BAM or CRAM"
input_normal_name: string
exome_flag: {type: ['null', string], doc: "set to 'Y' for exome mode"}
output_basename: string
getpileup_memory: {type: int?}
learnorientation_memory: {type: int?}
filtermutectcalls_memory: {type: int?}
select_vars_mode: {type: ['null', {type: enum, name: select_vars_mode, symbols: ["gatk", "grep"]}], doc: "Choose 'gatk' for SelectVariants tool, or 'grep' for grep expression", default: "gatk"}
tool_name: {type: string?, doc: "String to describe what tool was run as part of file name", default: "mutect2_somatic"}
# VEP params
vep_cache: {type: 'File', doc: "tar gzipped cache from ensembl/local converted cache", "sbg:suggestedValue": {class: File, path: 6332f8e47535110eb79c794f,
name: homo_sapiens_merged_vep_105_indexed_GRCh38.tar.gz}}
vep_ram: {type: 'int?', default: 32, doc: "In GB, may need to increase this value depending on the size/complexity of input"}
vep_cores: {type: 'int?', default: 16, doc: "Number of cores to use. May need to increase for really large inputs"}
vep_buffer_size: {type: 'int?', default: 1000, doc: "Increase or decrease to balance speed and memory usage"}
dbnsfp: { type: 'File?', secondaryFiles: [.tbi,^.readme.txt], doc: "VEP-formatted plugin file, index, and readme file containing dbNSFP annotations" }
dbnsfp_fields: { type: 'string?', doc: "csv string with desired fields to annotate. Use ALL to grab all"}
merged: { type: 'boolean?', doc: "Set to true if merged cache used", default: true }
cadd_indels: { type: 'File?', secondaryFiles: [.tbi], doc: "VEP-formatted plugin file and index containing CADD indel annotations" }
cadd_snvs: { type: 'File?', secondaryFiles: [.tbi], doc: "VEP-formatted plugin file and index containing CADD SNV annotations" }
run_cache_existing: { type: 'boolean?', doc: "Run the check_existing flag for cache" }
run_cache_af: { type: 'boolean?', doc: "Run the allele frequency flags for cache" }
# annotation vars
genomic_hotspots: { type: 'File[]?', doc: "Tab-delimited BED formatted file(s) containing hg38 genomic positions corresponding to hotspots", "sbg:suggestedValue": [{class: File, path: 607713829360f10e3982a423, name: tert.bed}] }
protein_snv_hotspots: { type: 'File[]?', doc: "Column-name-containing, tab-delimited file(s) containing protein names and amino acid positions corresponding to hotspots", "sbg:suggestedValue": [{class: File, path: 66980e845a58091951d53984, name: kfdrc_protein_snv_cancer_hotspots_20240718.txt}] }
protein_indel_hotspots: { type: 'File[]?', doc: "Column-name-containing, tab-delimited file(s) containing protein names and amino acid position ranges corresponding to hotspots", "sbg:suggestedValue": [{class: File, path: 663d2bcc27374715fccd8c6f, name: protein_indel_cancer_hotspots_v2.ENS105_liftover.tsv}] }
retain_info: {type: 'string?', doc: "csv string with INFO fields that you want to keep", default: "gnomad_3_1_1_AC,gnomad_3_1_1_AN,gnomad_3_1_1_AF,gnomad_3_1_1_nhomalt,gnomad_3_1_1_AC_popmax,gnomad_3_1_1_AN_popmax,gnomad_3_1_1_AF_popmax,gnomad_3_1_1_nhomalt_popmax,gnomad_3_1_1_AC_controls_and_biobanks,gnomad_3_1_1_AN_controls_and_biobanks,gnomad_3_1_1_AF_controls_and_biobanks,gnomad_3_1_1_AF_non_cancer,gnomad_3_1_1_primate_ai_score,gnomad_3_1_1_splice_ai_consequence,MBQ,TLOD,HotSpotAllele"}
retain_fmt: {type: 'string?', doc: "csv string with FORMAT fields that you want to keep"}
retain_ann: { type: 'string?', doc: "csv string of annotations (within the VEP CSQ/ANN) to retain as extra columns in MAF", default: "HGVSg" }
echtvar_anno_zips: {type: 'File[]?', doc: "Annotation ZIP files for echtvar anno"}
bcftools_strip_columns: {type: 'string?', doc: "csv string of columns to strip if needed to avoid conflict, i.e INFO/AF"}
bcftools_public_filter: {type: 'string?', doc: "Will hard filter final result to create a public version", default: FILTER="PASS"|INFO/HotSpotAllele=1}
gatk_filter_name: {type: 'string[]', doc: "Array of names for each filter tag to add, recommend: [\"NORM_DP_LOW\", \"GNOMAD_AF_HIGH\"]"}
gatk_filter_expression: {type: 'string[]', doc: "Array of filter expressions to establish criteria to tag variants with. See https://gatk.broadinstitute.org/hc/en-us/articles/360036730071-VariantFiltration, recommend: \"vc.getGenotype('\" + inputs.input_normal_name + \"').getDP() <= 7\"), \"gnomad_3_1_1_AF != '.' && gnomad_3_1_1_AF > 0.001\"]"}
disable_hotspot_annotation: { type: 'boolean?', doc: "Disable Hotspot Annotation and skip this task.", default: false }
maf_center: {type: 'string?', doc: "Sequencing center of variant called", default: "."}
Recommended reference inputs - all file references can be obtained here*
Secondary files needed for each reference file will be a sub-bullet point.
- For recommendations for inputs in the
#annotation
section, see the annotation subworkflow docs.
indexed_reference_fasta
:Homo_sapiens_assembly38.fasta
Homo_sapiens_assembly38.fasta.fai
Homo_sapiens_assembly38.dict
reference_dict
:Homo_sapiens_assembly38.dict
hg38_strelka_bed
:hg38_strelka.bed.gz
bed_invtl_split
: an array of bed intervals, would be created by a wrapper workflowaf_only_gnomad_vcf
:af-only-gnomad.hg38.vcf.gz
af-only-gnomad.hg38.vcf.gz.tbi
echtvar_anno_zips
:gnomad_3_1_1.vwb_subset.echtvar_0_1_9.zip
exac_common_vcf
:small_exac_common_3.hg38.vcf.gz
small_exac_common_3.hg38.vcf.gz.tbi
exome_flag
Y
if exomeN
or leave blank if WGS
Occassionally, default memory values may not suffice to process the inputs. Edit where needed to unblock these steps. All memory units are in whole GB
getpileup_memory: {type: int?}
learnorientation_memory: {type: int?}
filtermutectcalls_memory: {type: int?}
outputs:
mutect2_filtered_stats: {type: File, outputSource: filter_mutect2_vcf/stats_table}
mutect2_filtered_vcf: {type: File, outputSource: filter_mutect2_vcf/filtered_vcf}
mutect2_protected_outputs: {type: 'File[]', outputSource: rename_protected/renamed_files}
mutect2_public_outputs: {type: 'File[]', outputSource: rename_public/renamed_files}
mutect2_filtered_stats
: Summary table ofFILTER
statsmutect2_filtered_vcf
: VCF with SNV, MNV, and INDEL variant calls. Contains all softFILTER
values generated by variant caller. Use this file if you believe important variants are being left out when using the algorithm'sPASS
filtermutect2_protected_outputs
: Array of files containing MAF format of PASS hits,PASS
VCF with annotation pipeline softFILTER
-added values, and VCF indexmutect2_public_outputs
: Same as above, except MAF and VCF have had entries with softFILTER
values removed