🧹 v1.0 release prep #19

Merged 4 commits on Jun 13, 2024

26 changes: 22 additions & 4 deletions README.md
@@ -1,9 +1,27 @@
 # Kids First DRC Tumor Only Pipeline

-This repository contains tools and workflows for processing of tumor-only samples.
-It is currently in beta phase.
-Much of the components have been borrowed from the Kids First Somatic Workflow.
-It can also be used to process PDX data by first pre-processing reads using the Xenome tool, explained more here in documentation.
+This repository contains tools and workflows for processing of tumor-only
+samples. The Kids First DRC recommends running the tumor only pipeline ONLY
+when no matched normal sample is available. If your data has matched normals
+we recommend running the [Kids First DRC Somatic Variant
+Workflow](https://github.com/kids-first/kf-somatic-workflow) instead. This
+workflow is not a traditional production pipeline run on all data, but rather
+is run at the user's request.
+
+When comparing the SNV outputs of this workflow to those of the somatic workflow,
+we have found the outputs to be considerably more noisy. To cut down on this
+noise, we have included some recommended inputs, parameters, and filters for
+Mutect2 [in our docs](./docs/MUTECT2_TUMOR_ONLY_FILTERING.md). In short we recommend:
+- Restrict the callable regions with a blacklist and Panel of Normals (PON)
+- Remove low support reads:
+- Allele Depth (AD) == 0: WGS uninformative reads
+- Variant Allele Frequency (VAF) < 1%: WXS noise
+- Remove potential germline variants: gnomAD AF > 0.00003
+- Only keep variants that are PASS
+- Rescue any variants that fall in hotspot regions/genes
+
+It can also be used to process PDX data by first pre-processing reads using the
+Xenome tool, explained more here in documentation.

 <p align="center">
 <img src="docs/kids_first_logo.svg" alt="Kids First repository logo" width="660px" />
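The filter recommendations added to the README above can be read as a single keep/drop rule per variant. The sketch below is only an illustration of that reading, not the workflow's implementation: the `Variant` fields and the `keep_variant` helper are hypothetical names, the AD and VAF checks are applied regardless of assay type, and hotspot rescue is assumed to override every other filter. The linked MUTECT2_TUMOR_ONLY_FILTERING.md remains the authority on the actual behavior.

```python
from dataclasses import dataclass

# Thresholds quoted from the README bullets.
MIN_VAF = 0.01           # "VAF < 1%" is treated as noise
MAX_GNOMAD_AF = 0.00003  # above this, the variant is treated as potential germline


@dataclass
class Variant:
    """Hypothetical flattened view of one Mutect2 call."""
    filter_status: str   # VCF FILTER column, e.g. "PASS"
    alt_depth: int       # AD for the alternate allele (assumed meaning of "AD")
    vaf: float           # variant allele frequency, 0.0 to 1.0
    gnomad_af: float     # gnomAD population allele frequency
    in_hotspot: bool = False


def keep_variant(v: Variant) -> bool:
    """Return True if the variant survives the README's recommended filters."""
    if v.in_hotspot:
        return True      # assumption: rescue bypasses all other checks
    if v.filter_status != "PASS":
        return False
    if v.alt_depth == 0:
        return False     # no reads support the alternate allele
    if v.vaf < MIN_VAF:
        return False
    if v.gnomad_af > MAX_GNOMAD_AF:
        return False
    return True
```

For example, `keep_variant(Variant("PASS", 12, 0.2, 0.0))` keeps the call, while the same call with `gnomad_af=0.001` is dropped unless `in_hotspot` is set.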
3 changes: 3 additions & 0 deletions subworkflows/kfdrc_controlfreec_sub_wf.cwl
@@ -24,6 +24,7 @@ inputs:
 b_allele: {type: ['null', File], doc: "germline calls, needed for BAF. VarDict input recommended. Tool will prefilter for germline and pass if expression given"}
 coeff_var: {type: float, default: 0.05, doc: "Coefficient of variantion to set window size. Default 0.05 recommended"}
 cfree_sex: {type: ['null', {type: enum, name: sex, symbols: ["XX", "XY"] }], doc: "If known, XX for female, XY for male"}
+tool_name: { type: 'string?', doc: "Tool name to use in outputs." }

 outputs:
 ctrlfreec_cnvs: {type: File, outputSource: rename_outputs/ctrlfreec_cnvs}
@@ -82,6 +83,7 @@ steps:
 in:
 input_files: [control_free_c/cnvs, control_free_c/cnvs_pvalue, control_free_c/config_script, control_free_c/ratio, control_free_c/sample_BAF, control_free_c/info_txt]
 input_pngs: control_free_c/pngs
+tool_name: tool_name
 output_basename: output_basename
 out: [ctrlfreec_cnvs, ctrlfreec_pval, ctrlfreec_config, ctrlfreec_pngs, ctrlfreec_bam_ratio, ctrlfreec_baf, ctrlfreec_info]

@@ -95,4 +97,5 @@ steps:
 ctrlfreec_ratio: control_free_c/ratio
 sample_name: input_tumor_name
 output_basename: output_basename
+tool_name: tool_name
 out: [ctrlfreec_ratio2seg]
6 changes: 3 additions & 3 deletions subworkflows/kfdrc_manta_sub_wf.cwl
@@ -19,6 +19,7 @@ inputs:
 manta_cores: {type: "int?"}
 select_vars_mode: {type: ['null', {type: enum, name: select_vars_mode, symbols: ["gatk", "grep"]}], doc: "Choose 'gatk' for SelectVariants tool, or 'grep' for grep expression", default: "gatk"}
 annotsv_annotations_dir_tgz: {type: 'File', doc: "TAR.GZ'd Directory containing annotations for AnnotSV"}
+tool_name: { type: 'string?', default: "manta", doc: "Tool name to use in outputs." }

 outputs:
 manta_prepass_vcf: {type: File, outputSource: rename_manta_samples/reheadered_vcf}
@@ -37,6 +38,7 @@ steps:
 cores: manta_cores
 reference: indexed_reference_fasta
 hg38_strelka_bed: hg38_strelka_bed
+tool_name: tool_name
 out: [output_sv, small_indels]

 rename_manta_samples:
@@ -48,12 +50,10 @@ steps:

 gatk_selectvariants_manta:
 run: ../tools/gatk_selectvariants.cwl
-label: GATK Select Manta PASS
 in:
 input_vcf: rename_manta_samples/reheadered_vcf
 output_basename: output_basename
-tool_name:
-valueFrom: $("manta")
+tool_name: tool_name
 mode: select_vars_mode
 out: [pass_vcf]

4 changes: 3 additions & 1 deletion subworkflows/kfdrc_mutect2_sub_wf.cwl
@@ -148,7 +148,9 @@ steps:
 reference: indexed_reference_fasta
 input_bams: mutect2/mutect2_bam
 enable_tool: make_bamout
-output_basename: output_basename
+output_basename:
+source: [output_basename, tool_name]
+valueFrom: $(self.join("."))
 out: [output]

 gatk_learn_orientation_bias:
54 changes: 30 additions & 24 deletions tools/manta.cwl
@@ -2,46 +2,52 @@ cwlVersion: v1.0
 class: CommandLineTool
 id: kfdrc-manta-sv
 label: Manta sv caller
-doc: 'Calls structural variants. Tool designed to pick correct run mode based on if tumor, normal, or both crams are given'
+doc: 'Calls structural variants. Tool designed to pick correct run mode based on if tumor, normal, or both crams are given'
 requirements:
 - class: ShellCommandRequirement
 - class: InlineJavascriptRequirement
 - class: ResourceRequirement
-ramMin: ${ return inputs.ram * 1000 }
+ramMin: $(inputs.ram * 1000)
 coresMin: $(inputs.cores)
 - class: DockerRequirement
 dockerPull: 'pgc-images.sbgenomics.com/d3b-bixu/manta:1.4.0'

-baseCommand: [/manta-1.4.0.centos6_x86_64/bin/configManta.py]
+baseCommand: []
 arguments:
-- position: 1
+- position: 0
 shellQuote: false
 valueFrom: >-
-${
-var std = " --ref " + inputs.reference.path + " --callRegions " + inputs.hg38_strelka_bed.path + " --runDir=./ && ./runWorkflow.py -m local -j " + inputs.cores + " ";
-var mv = " && mv results/variants/";
-if (typeof inputs.input_tumor_aligned === 'undefined' || inputs.input_tumor_aligned === null){
-var mv_cmd = mv + "diploidSV.vcf.gz " + inputs.output_basename + ".manta.diploidSV.vcf.gz" + mv + "diploidSV.vcf.gz.tbi " + inputs.output_basename + ".manta.diploidSV.vcf.gz.tbi" + mv + "candidateSmallIndels.vcf.gz " + inputs.output_basename + ".manta.candidateSmallIndels.vcf.gz" + mv + "candidateSmallIndels.vcf.gz.tbi " + inputs.output_basename + ".manta.candidateSmallIndels.vcf.gz.tbi";
-return "--bam ".concat(inputs.input_normal_aligned.path, std, mv_cmd);
-}
-else if (typeof inputs.input_normal_aligned === 'undefined' || inputs.input_normal_aligned === null){
-var mv_cmd = mv + "tumorSV.vcf.gz " + inputs.output_basename + ".manta.tumorSV.vcf.gz" + mv + "tumorSV.vcf.gz.tbi " + inputs.output_basename + ".manta.tumorSV.vcf.gz.tbi" + mv + "candidateSmallIndels.vcf.gz " + inputs.output_basename + ".manta.candidateSmallIndels.vcf.gz" + mv + "candidateSmallIndels.vcf.gz.tbi " + inputs.output_basename + ".manta.candidateSmallIndels.vcf.gz.tbi";
-return "--tumorBam " + inputs.input_tumor_aligned.path + std + mv_cmd;
-}
-else{
-var mv_cmd = mv + "somaticSV.vcf.gz " + inputs.output_basename + ".manta.somaticSV.vcf.gz" + mv + "somaticSV.vcf.gz.tbi " + inputs.output_basename + ".manta.somaticSV.vcf.gz.tbi" + mv + "candidateSmallIndels.vcf.gz " + inputs.output_basename + ".manta.candidateSmallIndels.vcf.gz" + mv + "candidateSmallIndels.vcf.gz.tbi " + inputs.output_basename + ".manta.candidateSmallIndels.vcf.gz.tbi";
-return "--tumorBam " + inputs.input_tumor_aligned.path + " --normalBam " + inputs.input_normal_aligned.path + std + mv_cmd;
-}
-}
+/manta-1.4.0.centos6_x86_64/bin/configManta.py --runDir=./
[Review comment, Collaborator: Some nice reorg and cleanup]
+- position: 9
+shellQuote: false
+valueFrom: >-
+$(inputs.input_normal_aligned != null ? inputs.input_tumor_aligned != null ? "--normalBam " + inputs.input_normal_aligned.path : "--bam " + inputs.input_normal_aligned.path : "")
+- position: 10
+shellQuote: false
+prefix: "&&"
[Review comment, Collaborator: prefix: "$$"?! Cue Joe Rogan freaking out GIF]
+valueFrom: >-
+./runWorkflow.py -m local
+- position: 20
+shellQuote: false
+valueFrom: >-
+&& mv results/variants/diploidSV.vcf.gz $([inputs.output_basename, inputs.tool_name, "diploidSV.vcf.gz"].join(".")) || :
+&& mv results/variants/diploidSV.vcf.gz.tbi $([inputs.output_basename, inputs.tool_name, "diploidSV.vcf.gz.tbi"].join(".")) || :
+&& mv results/variants/tumorSV.vcf.gz $([inputs.output_basename, inputs.tool_name, "tumorSV.vcf.gz"].join(".")) || :
+&& mv results/variants/tumorSV.vcf.gz.tbi $([inputs.output_basename, inputs.tool_name, "tumorSV.vcf.gz.tbi"].join(".")) || :
+&& mv results/variants/somaticSV.vcf.gz $([inputs.output_basename, inputs.tool_name, "somaticSV.vcf.gz"].join(".")) || :
+&& mv results/variants/somaticSV.vcf.gz.tbi $([inputs.output_basename, inputs.tool_name, "somaticSV.vcf.gz.tbi"].join(".")) || :
+&& mv results/variants/candidateSmallIndels.vcf.gz $([inputs.output_basename, inputs.tool_name, "candidateSmallIndels.vcf.gz"].join("."))
+&& mv results/variants/candidateSmallIndels.vcf.gz.tbi $([inputs.output_basename, inputs.tool_name, "candidateSmallIndels.vcf.gz.tbi"].join("."))

 inputs:
-reference: {type: File, secondaryFiles: [^.dict, .fai]}
-hg38_strelka_bed: {type: File, secondaryFiles: [.tbi]}
-input_tumor_aligned: { type: 'File?', secondaryFiles: ["^.bai?", ".bai?", "^.crai?", ".crai?"], doc: "tumor BAM or CRAM" }
+reference: {type: File, secondaryFiles: [^.dict, .fai], inputBinding: {position: 2, prefix: "--ref"}}
+hg38_strelka_bed: {type: File, secondaryFiles: [.tbi], inputBinding: {position: 2, prefix: "--callRegions"}}
+input_tumor_aligned: { type: 'File?', secondaryFiles: ["^.bai?", ".bai?", "^.crai?", ".crai?"], inputBinding: {position: 9, prefix: "--tumorBam"}, doc: "tumor BAM or CRAM" }
 input_normal_aligned: { type: 'File?', secondaryFiles: ["^.bai?", ".bai?", "^.crai?", ".crai?"], doc: "normal BAM or CRAM" }
-cores: {type: ['null', int], default: 16}
+cores: {type: ['null', int], default: 16, inputBinding: {position: 12, prefix: "-j"}}
 ram: {type: "int?", default: 10, doc: "GB of RAM an instance must have to run the task"}
 output_basename: string
+tool_name: { type: 'string?', default: "manta", doc: "Tool name to use in outputs." }
 outputs:
 output_sv:
 type: File
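Two parts of the reworked `tools/manta.cwl` above are easy to misread: the nested ternary at position 9, and the `&& mv ... || :` chain at position 20. The Python sketch below models both under stated assumptions and is not part of the PR. `bam_flags`, `chain_succeeds`, and `manta_chain` are hypothetical helper names; the relative order of the two position-9 entries is assumed; and `mv` is assumed to succeed exactly when Manta produced that file. The shell list is evaluated left to right, with `&&` and `||` at equal precedence, as in bash.

```python
from typing import List, Optional, Tuple


def bam_flags(tumor: Optional[str], normal: Optional[str]) -> str:
    """Alignment flags emitted at position 9 for each run mode."""
    parts = []
    if tumor is not None:
        # inputBinding on input_tumor_aligned supplies the --tumorBam prefix
        parts.append("--tumorBam " + tumor)
    if normal is not None:
        # Nested ternary: paired runs use --normalBam, normal-only runs use --bam
        flag = "--normalBam" if tumor is not None else "--bam"
        parts.append(flag + " " + normal)
    return " ".join(parts)


def chain_succeeds(steps: List[Tuple[str, bool]]) -> bool:
    """Evaluate a shell list given (operator, command_succeeds) pairs.

    The first operator is ignored; later ones are "&&" or "||".
    """
    status = True
    for index, (operator, succeeds) in enumerate(steps):
        runs = (
            index == 0
            or (operator == "&&" and status)
            or (operator == "||" and not status)
        )
        if runs:
            status = succeeds  # exit status of the last command that ran
    return status


def manta_chain(run_ok: bool, produced: str) -> List[Tuple[str, bool]]:
    """Model the command for a run that yields `produced`, e.g. "tumorSV"."""
    steps = [("", True), ("&&", run_ok)]  # configManta.py, then runWorkflow.py
    for name in ("diploidSV", "tumorSV", "somaticSV"):
        for _suffix in (".vcf.gz", ".vcf.gz.tbi"):
            exists = run_ok and name == produced
            steps += [("&&", exists), ("||", True)]  # "&& mv ... || :"
    # The two candidateSmallIndels moves have no "|| :" guard.
    steps += [("&&", run_ok), ("&&", run_ok)]
    return steps
```

Under these assumptions, `chain_succeeds(manta_chain(True, "tumorSV"))` is true even though four of the guarded moves fail, while `chain_succeeds(manta_chain(False, "tumorSV"))` is false: a failed `runWorkflow.py` is masked by each `|| :` and only surfaces because the final `candidateSmallIndels` moves are unguarded.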
3 changes: 2 additions & 1 deletion tools/ubuntu_ratio2seg.cwl
@@ -29,7 +29,7 @@ arguments:
 smp = "$(inputs.sample_name)"

 ratio_file = open("$(inputs.ctrlfreec_ratio.path)")
-out = open("$(inputs.output_basename).controlfreec.seg", "w")
+out = open("$(inputs.output_basename).$(inputs.tool_name).seg", "w")
 out.write("ID\tchrom\tloc.start\tloc.end\tnum.mark\tseg.mean\n")
 head = next(ratio_file)
 count = 0
@@ -70,6 +70,7 @@ inputs:
 ctrlfreec_ratio: File
 sample_name: string
 output_basename: string
+tool_name: { type: 'string?', default: "controlfreec", doc: "Tool name to use for output filename" }

 outputs:
 ctrlfreec_ratio2seg:
7 changes: 4 additions & 3 deletions tools/ubuntu_rename_outputs.cwl
@@ -25,9 +25,9 @@ arguments:
 parts.shift();
 var check = fname.substr(fname.length - 10);
 if (check == "config.txt") {
-cmd += "cp " + inputs.input_files[i].path + " " + inputs.output_basename + ".controlfreec.config.txt;";
+cmd += "cp " + inputs.input_files[i].path + " " + [inputs.output_basename, inputs.tool_name, "config.txt;"].join(".");
 } else {
-fname = inputs.output_basename + ".controlfreec." + parts.join(".");
+fname = [inputs.output_basename, inputs.tool_name, parts.join(".")].join(".");
 cmd += " cp " + inputs.input_files[i].path + " " + fname + ";";
 }
 }
@@ -41,7 +41,7 @@ arguments:
 fname = fname.replace(".txt", "");
 var parts = fname.split(".");
 parts.shift();
-fname = inputs.output_basename + ".controlfreec." + parts.join(".") + ".png";
+fname = [inputs.output_basename, inputs.tool_name, parts.join("."), "png"].join(".");
 cmd += " cp " + inputs.input_pngs[j].path + " " + fname + ";";
 }
 return cmd;
@@ -50,6 +50,7 @@ arguments:
 inputs:
 input_files: File[]
 input_pngs: File[]
+tool_name: { type: 'string?', default: "controlfreec", doc: "Tool name to use in outputs." }
 output_basename: string
 outputs:
 ctrlfreec_baf:
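Across these files the PR replaces hard-coded `.manta.` and `.controlfreec.` infixes with a dot-join of `output_basename`, `tool_name`, and a suffix. A minimal Python sketch of that naming convention follows, assuming hypothetical helpers `output_name` and `renamed` and an illustrative input file name. It mirrors one JavaScript detail worth knowing: `Array.prototype.join` renders `null` as an empty string, so a `tool_name` that reaches an expression as `null` would yield a double dot rather than an error. `endswith` stands in for the visible `substr(fname.length - 10)` check, which is equivalent for names of at least ten characters.

```python
from typing import Optional


def output_name(output_basename: str, tool_name: Optional[str], *suffix_parts: str) -> str:
    """Dot-join the pieces the way [a, b, c].join(".") does in the CWL expressions."""
    pieces = [output_basename, tool_name, *suffix_parts]
    # JavaScript's join() turns null/undefined into "", so mirror that here.
    return ".".join("" if piece is None else piece for piece in pieces)


def renamed(fname: str, output_basename: str, tool_name: Optional[str] = "controlfreec") -> str:
    """Sketch of the per-file rename visible in ubuntu_rename_outputs.cwl."""
    if fname.endswith("config.txt"):
        return output_name(output_basename, tool_name, "config.txt")
    parts = fname.split(".")[1:]  # parts.shift() drops the first dot-separated token
    return output_name(output_basename, tool_name, ".".join(parts))
```

For instance, `output_name("sample", "manta", "tumorSV.vcf.gz")` gives `sample.manta.tumorSV.vcf.gz`, and `output_name("sample", None, "seg")` gives `sample..seg`, which is the case the `default:` values on the tool inputs guard against.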