
Create "haplotype maps" of variants in order to use with Picardtools Crosscheck_Files utility, which allows for robust genotyping of functional genomic data.


dmcmanam/fingerprint_maps

 
 


build_fingerprint_maps

build_fingerprint_maps is a tool for building haplotype maps for use with CrosscheckFingerprints, the fingerprinting software in Picard-Tools (http://broadinstitute.github.io/picard/). A haplotype map is a collection of "blocks" of SNPs in which each SNP is in tight linkage with the other SNPs of its own block and in low linkage with the SNPs of every other block.
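The block definition above can be illustrated with a small check. This is a minimal sketch, not the algorithm build_fingerprint_maps uses: the function name `check_blocks`, the r² lookup format, and the two threshold values are all hypothetical and chosen only for the example.

```python
from itertools import combinations


def check_blocks(r2, block_of, intra_min=0.85, inter_max=0.10):
    """Return the SNP pairs that violate the block definition.

    r2       : dict mapping frozenset({snp_a, snp_b}) to their r^2 value
    block_of : dict mapping each SNP name to its block id
    intra_min, inter_max : hypothetical thresholds, not the tool's defaults
    """
    violations = []
    for a, b in combinations(sorted(block_of), 2):
        value = r2.get(frozenset((a, b)), 0.0)
        same_block = block_of[a] == block_of[b]
        if same_block and value < intra_min:
            # SNPs of one block should be tightly linked.
            violations.append((a, b, value))
        elif not same_block and value > inter_max:
            # SNPs of different blocks should be nearly independent.
            violations.append((a, b, value))
    return violations


# Toy data: rs1 and rs2 share block A, rs3 sits alone in block B.
block_of = {"rs1": "A", "rs2": "A", "rs3": "B"}
r2 = {
    frozenset(("rs1", "rs2")): 0.95,
    frozenset(("rs1", "rs3")): 0.02,
    frozenset(("rs2", "rs3")): 0.30,
}
print(check_blocks(r2, block_of))
```

In the toy data only the rs2/rs3 pair is reported, because its r² exceeds the inter-block limit although the two SNPs belong to different blocks.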

To download build_fingerprint_maps, clone this repository with the command

git clone https://github.com/naumanjaved/fingerprint_maps.git

Precomputed map files with headers (headers do not contain any entries for scaffolds or contigs)

The map_files directory also contains pre-computed maps with relaxed intra- and inter- block correlation thresholds. Map names contain the parameters used.

Dependencies

To run build_fingerprint_maps, you must have working installations of:

  1. Python (>=2.7)

  2. PLINK2 - https://www.cog-genomics.org/plink2

  3. VCFTools - https://vcftools.github.io/man_latest.html

  4. Anaconda (https://anaconda.org/anaconda/python) or the following modules: subprocess, os, itertools, numpy, sys, argparse, traceback, time, datetime

  5. LDSC (LD Score regression)

Required Files

build_fingerprint_maps uses VCFs from 1000 Genomes Phase 3 and recombination maps (SHAPEIT format). These can be found here:

See run.sh for a sample run script. Run python build_fingerprint_maps.py -h to see a list of command-line options.

Use with Picardtools

The above maps are to be used with Picard's CrosscheckFingerprints tool.

In most cases, each file you want to compare with CrosscheckFingerprints contains data for only a single fingerprint, and you should run CrosscheckFingerprints with CROSSCHECK_BY set to FILE. With default settings Picard can be strict about properly formatted headers and read names, so if a validation error arises, try running with VALIDATION_STRINGENCY set to LENIENT (after first making sure that the formatting error does not indicate a legitimate problem with the input BAM file).
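As an illustration of the settings described above, the following sketch assembles a CrosscheckFingerprints command line in Python. It assumes Picard's `KEY=value` argument syntax and a jar named `picard.jar`; the helper name `crosscheck_command` and all file names are hypothetical. The function only builds the argument list, so it can be inspected before anything is run.

```python
def crosscheck_command(inputs, haplotype_map, output,
                       picard_jar="picard.jar", lenient=False):
    """Build (but do not run) a CrosscheckFingerprints command.

    picard_jar and all paths are placeholders for this example.
    """
    cmd = ["java", "-jar", picard_jar, "CrosscheckFingerprints"]
    # One INPUT argument per file to compare.
    cmd += ["INPUT=" + path for path in inputs]
    cmd += [
        "HAPLOTYPE_MAP=" + haplotype_map,
        "OUTPUT=" + output,
        # Treat each input file as a single fingerprint.
        "CROSSCHECK_BY=FILE",
    ]
    if lenient:
        # Only after checking that the validation error is harmless.
        cmd.append("VALIDATION_STRINGENCY=LENIENT")
    return cmd


cmd = crosscheck_command(
    ["sample_a.bam", "sample_b.bam"],  # hypothetical inputs
    "hg19_map.txt",                    # hypothetical map file name
    "crosscheck_metrics.txt",
    lenient=True,
)
print(" ".join(cmd))
```

The resulting list can be passed to `subprocess.check_call(cmd)` once the paths point at real files.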

When comparing many files, it is recommended to precompute VCFs containing the extracted fingerprints with the ExtractFingerprint tool in the Picard suite. This keeps CrosscheckFingerprints from recomputing the fingerprint of the same file every time that file is used in a comparison.
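The precomputation step can be sketched as one ExtractFingerprint command per input file, so each fingerprint is computed once and the resulting VCFs are reused in later comparisons. This is a sketch under assumptions: it uses Picard's `KEY=value` argument syntax, and the helper name `extract_commands`, the `.fingerprint.vcf` suffix, and all paths are hypothetical.

```python
import os


def extract_commands(bams, haplotype_map, out_dir, picard_jar="picard.jar"):
    """Build one ExtractFingerprint command per BAM file (nothing is run)."""
    commands = []
    for bam in bams:
        # Name the output VCF after the input file (naming is an assumption).
        stem = os.path.splitext(os.path.basename(bam))[0]
        vcf = os.path.join(out_dir, stem + ".fingerprint.vcf")
        commands.append([
            "java", "-jar", picard_jar, "ExtractFingerprint",
            "INPUT=" + bam,
            "HAPLOTYPE_MAP=" + haplotype_map,
            "OUTPUT=" + vcf,
        ])
    return commands


commands = extract_commands(
    ["data/sample_a.bam", "data/sample_b.bam"],  # hypothetical inputs
    "hg19_map.txt",                              # hypothetical map file name
    "fingerprints",
)
for command in commands:
    print(" ".join(command))
```

The VCFs written by these commands can then be given to CrosscheckFingerprints in place of the original BAM files.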

Custom map files

If you create a custom map file, make sure to add the appropriate header file to it. Headers for hg19 and hg38, with entries for the reference chromosomes, are provided below.
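Combining a header with a custom map can be done with a few lines of Python. The sketch below assumes that the header lines belong at the top of the final file, ahead of the map entries; the function name `add_header` and the file names in the usage comment are hypothetical.

```python
def add_header(header_path, map_path, out_path):
    """Write header_path followed by map_path into out_path.

    Assumption: the header lines come first in the final map file.
    """
    with open(header_path) as header, open(map_path) as body, \
            open(out_path, "w") as out:
        header_text = header.read()
        out.write(header_text)
        # Keep the first map line from being glued onto the last header line.
        if header_text and not header_text.endswith("\n"):
            out.write("\n")
        out.write(body.read())


# Example call with placeholder file names:
# add_header("hg19_header.txt", "custom_map.txt", "custom_map_with_header.txt")
```

Writing to a separate output file leaves the original map untouched, which makes it easy to redo the step with a different header.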

Support

Email javed@broadinstitute.org for issues.

Authors

Nauman Javed (Broad Institute) wrote the above scripts to generate fingerprint maps. Yossi Farjoun (Broad Institute) wrote CrosscheckFingerprints and ExtractFingerprint, for which the above maps are inputs.

Citation

If you use the above tool or maps with CrosscheckFingerprints in your publication, please cite the Picard-tools repo as well as the paper: Javed, N., Farjoun, Y., Fennell, T.J. et al. Detecting sample swaps in diverse NGS data types using linkage disequilibrium. Nat Commun 11, 3697 (2020). DOI: https://doi.org/10.1038/s41467-020-17453-5
