GitHub - gatk-workflows/intel-gatk3-4-germline-snps-indels: Workflows for processing and variant discovery with GATK (v3+v4) optimized by Intel for on-premises infrastructure

Intel Optimized GATK-3-4 Germline SNPs and Indels Variant Calling Workflow.

WORKFLOWS AND JSONS

This repository contains a few different files - each tuned for certain requirements.

├── 2T_PairedSingleSampleWf_optimized.inputs.json → Throughput JSON file
├── 56T_PairedSingleSampleWf_optimized.inputs.20k.json → 20k test data JSON file
├── 56T_PairedSingleSampleWf_optimized.inputs.json → Latency JSON file
├── HDD_2T_PairedSingleSampleWf_optimized.inputs.json → Throughput JSON file (for HDD)
├── HDD_56T_PairedSingleSampleWf_optimized.inputs.json → Latency JSON file (for HDD)
├── PairedSingleSampleWf_noqc_nocram_optimized.wdl → WDL optimized for on-prem
├── PairedSingleSampleWf_noqc_nocram_withcleanup_optimized.wdl → WDL optimized for on-prem with cleanup of output results (for throughput analysis)

For the PairedSingleSampleWf_noqc_nocram_optimized.wdl file, modify Line 1270 to the path where datasets reside in your cluster.

For the PairedSingleSampleWf_noqc_nocram_withcleanup_optimized.wdl file, modify Line 1317 to the path where datasets reside in your cluster.

In the JSON files, modify the paths to the datasets and tools where they reside in your cluster.

FPGA CHANGES

Assuming the environemnt has been setup to offload the pairhmm kernel of HaplotypeCaller to FPGA - the below changes must be enabled in the WDL/JSON files (based on the comments) to make use of the FPGA.

a. In the WDL file, for task Haplotype Caller runtime section, uncomment the line: require_fpga: "yes"

b. In the JSON file, change the "PairedEndSingleSampleWorkflow.gatk_gkl_pairhmm_implementation" from "VECTOR_LOGLESS_CACHING" to "VECTOR_LOGLESS_CACHING_FPGA_EXPERIMENTAL".

Refer to: WDL and JSON examples.

DATASETS

Contact Intel/Broad for access to the WGS data needed for this workflow.

TOOLS

For on-prem, the workflow uses non-dockerized tools. To keep up with the exact versions released by Broad for their best practices workflow, we download the tools from the docker image to our shared file system.

Run the command:

docker run -v /path/to/shared_filesystem:/path/to/shared_filesystem -it broadinstitute/genomes-in-the-cloud:2.3.1-1504795437 /bin/bash

This command will pull the docker image (if it is not already there locally), and put you within the container from where you can copy the tools needed for the workflow.

root@54754360159e:/usr/gitc# cp -r /usr/local/bin/samtools gatk4 bwa picard.jar /path/to/shared_filesystem
root@54754360159e:/usr/gitc# exit

In addition to above, this workflow uses the latest optimized GATK 3.8-1 jar
with optimizations which can be obtained from GATK archive website.

Lastly, Hybrid workflow also needs a tool called "VerifyBamID" that can be downloaded as follows:

If on Centos, you will need to do a yum install curl-devel before proceeding.

cd /path/to/shared_filesystem
wget https://github.com/Griffan/VerifyBamID/archive/c8a66425c312e5f8be46ab0c41f8d7a1942b6e16.zip && \
unzip c8a66425c312e5f8be46ab0c41f8d7a1942b6e16.zip && \
cd VerifyBamID-c8a66425c312e5f8be46ab0c41f8d7a1942b6e16 && \
mkdir build && \
cd build && \
CC=$(which gcc-4.9) CXX=$(which g++-4.9) cmake ..  && \
make && \
make test && \
cd ../../ && \
mv VerifyBamID-c8a66425c312e5f8be46ab0c41f8d7a1942b6e16/bin/VerifyBamID . && \
rm -rf c8a66425c312e5f8be46ab0c41f8d7a1942b6e16.zip VerifyBamID-c8a66425c312e5f8be46ab0c41f8d7a1942b6e16

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Intel Optimized GATK-3-4 Germline SNPs and Indels Variant Calling Workflow.

WORKFLOWS AND JSONS

FPGA CHANGES

DATASETS

TOOLS

About

Releases

Packages

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 14 Commits
2T_PairedSingleSampleWf_optimized.inputs.json		2T_PairedSingleSampleWf_optimized.inputs.json
56T_PairedSingleSampleWf_optimized.inputs.20k.json		56T_PairedSingleSampleWf_optimized.inputs.20k.json
56T_PairedSingleSampleWf_optimized.inputs.json		56T_PairedSingleSampleWf_optimized.inputs.json
HDD_2T_PairedSingleSampleWf_optimized.inputs.json		HDD_2T_PairedSingleSampleWf_optimized.inputs.json
HDD_56T_PairedSingleSampleWf_optimized.inputs.json		HDD_56T_PairedSingleSampleWf_optimized.inputs.json
PairedSingleSampleWf_noqc_nocram_optimized.wdl		PairedSingleSampleWf_noqc_nocram_optimized.wdl
PairedSingleSampleWf_noqc_nocram_withcleanup_optimized.wdl		PairedSingleSampleWf_noqc_nocram_withcleanup_optimized.wdl
README.md		README.md

gatk-workflows/intel-gatk3-4-germline-snps-indels

Folders and files

Latest commit

History

Repository files navigation

Intel Optimized GATK-3-4 Germline SNPs and Indels Variant Calling Workflow.

WORKFLOWS AND JSONS

FPGA CHANGES

DATASETS

TOOLS

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages