Variant Calling Pipeline

To run the variant calling pipeline on its own, provide the required inputs in configs/config-variant_calling.yaml.

Inputs

- FASTQ files containing DNA sequencing reads for each sample. Specify the location of these files in a tab-delimited text file with three columns, unique_sample_name | dna_fastq_path_1 | dna_fastq_path_2, where each row is a different sample.
- A reference genome for DNA-seq alignment
- A BWA index of the aforementioned reference genome
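
Before launching the pipeline, it can help to check that the samples file really has three tab-separated columns and unique sample names. The following is a minimal sketch, not part of the pipeline itself. It assumes the file has no header row, and the function name `read_samples` and the example paths are hypothetical.

```python
import csv
import io


def read_samples(handle):
    """Parse a tab-delimited samples table into {sample_name: (fastq_1, fastq_2)}."""
    samples = {}
    for line_no, row in enumerate(csv.reader(handle, delimiter="\t"), start=1):
        if not row or all(not field.strip() for field in row):
            continue  # skip blank lines
        if len(row) != 3:
            raise ValueError(f"line {line_no}: expected 3 columns, got {len(row)}")
        name, fastq_1, fastq_2 = (field.strip() for field in row)
        if name in samples:
            raise ValueError(f"line {line_no}: duplicate sample name {name!r}")
        samples[name] = (fastq_1, fastq_2)
    return samples


# Hypothetical example content; real files would list your own FASTQ paths.
example = (
    "sampleA\treads/A_1.fastq.gz\treads/A_2.fastq.gz\n"
    "sampleB\treads/B_1.fastq.gz\treads/B_2.fastq.gz\n"
)
samples = read_samples(io.StringIO(example))
print(sorted(samples))
```

To check a file on disk, pass an open file handle instead, for example `read_samples(open(path, newline=""))`.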

Other Inputs and Options

If you choose to use GATK's base quality (BQSR) and variant quality (VQSR) score recalibration, you must provide the following, as described in this GATK article:
- True sites training resource: HapMap
- True sites training resource: Omni
- Non-true sites training resource: 1000G
- Known sites resource, not used in training: dbSNP
These resources are usually available from the GATK resource bundle.

You must also specify a target sensitivity value, as described in the GATK article mentioned above.
If you choose not to perform VQSR, the pipeline will default to hard filtering your variants. You will need to provide a GATK filter expression, as described in this GATK article. One example might be "QD < 2.0 || FS > 60.0 || MQ < 40.0 || SOR > 3.0 || MQRankSum < -12.5 || ReadPosRankSum < -8.0".
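
To make the logic of the example expression concrete, here is a small sketch that applies the same thresholds to a dictionary of annotation values. It only illustrates what the expression means. GATK evaluates the expression itself, and this code is not how the pipeline filters variants. The function name, the annotation values, and the choice to skip absent annotations are all assumptions made for illustration and may not match GATK's handling of missing values.

```python
# Thresholds copied from the example expression: (annotation, comparison, cutoff).
THRESHOLDS = [
    ("QD", "<", 2.0),
    ("FS", ">", 60.0),
    ("MQ", "<", 40.0),
    ("SOR", ">", 3.0),
    ("MQRankSum", "<", -12.5),
    ("ReadPosRankSum", "<", -8.0),
]


def failed_criteria(annotations):
    """Return the names of the annotations that trip a threshold."""
    failed = []
    for name, op, cutoff in THRESHOLDS:
        value = annotations.get(name)
        if value is None:
            continue  # assumption: an absent annotation does not fail the variant
        if (op == "<" and value < cutoff) or (op == ">" and value > cutoff):
            failed.append(name)
    return failed


# Made-up annotation values for two hypothetical variants.
good = {"QD": 25.1, "FS": 1.2, "MQ": 60.0, "SOR": 0.8,
        "MQRankSum": 0.1, "ReadPosRankSum": 0.3}
bad = {"QD": 1.5, "FS": 75.0, "MQ": 60.0, "SOR": 0.8}

print(failed_criteria(good))  # []
print(failed_criteria(bad))   # ['QD', 'FS']
```

A variant is filtered when any one criterion is met, which is what the `||` operators in the expression express.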

Running the variant calling pipeline on its own

When calling Snakemake, use options -s and --configfile to specify the location of the Snakefile and its corresponding config file. We also recommend using the --use-conda option to let Snakemake handle all dependencies of the pipeline.

snakemake -s Snakefiles/Snakefile-variant_calling --configfile configs/config-variant_calling.yaml --use-conda
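
If you launch the pipeline from a script, the same command can be assembled in Python. This is a sketch under assumptions: the helper name `snakemake_command` is hypothetical, and the optional `-n` flag is Snakemake's dry-run option, which lists the planned jobs without executing them.

```python
import shlex


def snakemake_command(snakefile="Snakefiles/Snakefile-variant_calling",
                      configfile="configs/config-variant_calling.yaml",
                      dry_run=False):
    """Build the argument list for the variant calling run."""
    cmd = ["snakemake", "-s", snakefile, "--configfile", configfile, "--use-conda"]
    if dry_run:
        cmd.append("-n")  # dry run: show planned jobs only
    return cmd


# Print the command as it would be typed in a shell.
print(shlex.join(snakemake_command(dry_run=True)))
```

To execute it, pass the list to `subprocess.run(snakemake_command(), check=True)` from the repository root.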

Output

The variant calling pipeline creates the following directories under the output directory specified in your config file. The genotypes folder will contain the final output, a filtered VCF containing heterozygous SNPs for all samples.

- dna_align: output from BWA and samtools
- base_recal: output from GATK's BQSR
- haplotype: output from GATK's HaplotypeCaller, plus ALL.genotype.vcf.gz, which contains genotyped variants for all samples
- variant_filter: output from variant filtering (either VQSR or hard filtering) of SNPs
- genotypes: heterozygotes from the filtered VCF
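
After a run, a quick check that the directories listed above exist can catch an incomplete run early. The sketch below assumes all five directories are created on every run, which may not hold (for example, if recalibration is skipped). The function name `missing_output_dirs` is hypothetical, and the demo uses a temporary directory in place of your configured output directory.

```python
import tempfile
from pathlib import Path

# Directory names taken from the list above.
EXPECTED_DIRS = ["dna_align", "base_recal", "haplotype", "variant_filter", "genotypes"]


def missing_output_dirs(output_dir):
    """Return the expected subdirectories that are absent under output_dir."""
    root = Path(output_dir)
    return [name for name in EXPECTED_DIRS if not (root / name).is_dir()]


# Demo: create two of the five directories, then list the ones still missing.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "dna_align").mkdir()
    (Path(tmp) / "genotypes").mkdir()
    demo_missing = missing_output_dirs(tmp)

print(demo_missing)  # ['base_recal', 'haplotype', 'variant_filter']
```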
