Commit 38dbe76 (1 parent: 58fc7a3). Showing 2 changed files with 37 additions and 3 deletions.
# kf-jointgenotyping-workflow
# Kids First DRC Joint Genotyping Workflow

![data service logo](https://github.com/d3b-center/d3b-research-workflows/raw/master/doc/kfdrc-logo-sm.png)

Kids First Data Resource Center Joint Genotyping Workflow (cram-to-deNovoGVCF): cohort sample variant calling and genotype refinement.

Starting from existing gVCFs, most likely produced by GATK HaplotypeCaller, this workflow follows the GATK best-practices workflow [Germline short variant discovery (SNPs + Indels)](https://software.broadinstitute.org/gatk/best-practices/workflow?id=11145) to produce family joint calls and joint trio (typically mother-father-child) variant calls. Peddy is run to flag potential problems in the family relationship definitions and sex assignments.

### Tips To Run:

1. The input VCF files are the gVCF files from GATK HaplotypeCaller. Their **.tbi** index files must be copied to the same project as well.

2. The input ped file describes the family relationships between samples. Its format should follow the one described on the GATK website ([link](https://gatkforums.broadinstitute.org/gatk/discussion/7696/pedigree-ped-files)), and the Individual ID, Paternal ID, and Maternal ID must match the sample names in the headers of the input VCF files.

3. We recommend GRCh38 as the reference genome for this analysis. Positions in the gVCFs should be on GRCh38 as well.

4. Reference locations:
   - https://console.cloud.google.com/storage/browser/broad-references/hg38/v0/
   - kfdrc bucket: s3://kids-first-seq-data/broad-references/
   - cavatica: https://cavatica.sbgenomics.com/u/yuankun/kf-reference/
5. Suggested inputs:
   - Axiom_Exome_Plus.genotypes.all_populations.poly.hg38.vcf.gz
   - Homo_sapiens_assembly38.dbsnp138.vcf
   - hapmap_3.3.hg38.vcf.gz
   - Mills_and_1000G_gold_standard.indels.hg38.vcf.gz
   - 1000G_omni2.5.hg38.vcf.gz
   - 1000G_phase1.snps.high_confidence.hg38.vcf.gz
   - Homo_sapiens_assembly38.dict
   - Homo_sapiens_assembly38.fasta.fai
   - Homo_sapiens_assembly38.fasta
   - 1000G_phase3_v4_20130502.sites.hg38.vcf
   - hg38.even.handcurated.20k.intervals
   - homo_sapiens_vep_93_GRCh38_convert_cache.tar.gz, from ftp://ftp.ensembl.org/pub/release-93/variation/indexed_vep_cache/ (Variant Effect Predictor cache)
   - wgs_evaluation_regions.hg38.interval_list

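Tips 1 and 2 can be checked locally before launching the workflow. The following is a minimal sketch, not part of this repository: the function names `missing_indexes`, `vcf_sample_names`, and `ped_ids_not_in` are hypothetical. It assumes each index sits next to its gVCF as `<file>.tbi`, that the VCF header has a standard tab-delimited `#CHROM` line, and that the ped file uses the usual six whitespace-delimited columns with `0` marking a missing parent.

```python
import gzip
from pathlib import Path


def missing_indexes(gvcf_paths):
    """Return the gVCF paths that have no sibling '<file>.tbi' index."""
    return [p for p in map(Path, gvcf_paths) if not Path(str(p) + ".tbi").exists()]


def vcf_sample_names(vcf_path):
    """Read sample names from the '#CHROM' header line of a plain or gzipped VCF."""
    opener = gzip.open if str(vcf_path).endswith(".gz") else open
    with opener(vcf_path, "rt") as handle:
        for line in handle:
            if line.startswith("#CHROM"):
                # The first 9 columns are fixed VCF fields; samples follow.
                return line.rstrip("\n").split("\t")[9:]
            if not line.startswith("#"):
                break  # reached records without finding a header line
    return []


def ped_ids_not_in(ped_path, known_samples):
    """Return Individual/Paternal/Maternal IDs from the ped file that are not in known_samples."""
    unknown = set()
    with open(ped_path) as handle:
        for line in handle:
            fields = line.split()
            if line.startswith("#") or len(fields) < 6:
                continue
            # Columns: Family ID, Individual ID, Paternal ID, Maternal ID, Sex, Phenotype.
            for sample_id in fields[1:4]:
                # "0" is assumed to mean "parent not available".
                if sample_id != "0" and sample_id not in known_samples:
                    unknown.add(sample_id)
    return sorted(unknown)
```

A typical use would be to collect the sample names from every input gVCF into one set, pass that set to `ped_ids_not_in`, and treat a non-empty result from either check as a reason to fix the inputs before submitting the task.
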
## basic info
- pipeline flowchart:
  - [draw.io](https://tinyurl.com/y9cq6yp8)
  ![pipeline flowchart](./docs/kf_jointgenotyping_workflow_optimized_and_refinement.cwl.png)
- tool images: https://hub.docker.com/r/kfdrc/
- dockerfiles: https://github.com/d3b-center/bixtools