variant-scorer

The variant scoring repository provides a set of scripts for scoring genetic variants using a ChromBPNet model.

1. variant_scoring.py

This script takes a list of variants in various input formats and generates scores for the variants using a ChromBPNet model. The output is a TSV file containing the scores for each variant.
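The README does not define the individual score types. The sketch below shows two allele-effect summaries commonly used with BPNet-style models. It assumes the model returns a log total-count prediction and a vector of profile logits for each allele. Neither the function names nor the log base are taken from this repository, so the actual score columns in the output TSV may be defined differently.

```python
import math
from typing import List, Sequence


def log_fold_change(ref_log_counts: float, alt_log_counts: float) -> float:
    """Allele effect on total signal: predicted log counts for allele2 minus allele1."""
    return alt_log_counts - ref_log_counts


def softmax(logits: Sequence[float]) -> List[float]:
    """Turn profile logits into a probability distribution over positions."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def jensen_shannon_distance(p: Sequence[float], q: Sequence[float]) -> float:
    """Square root of the base-2 Jensen-Shannon divergence between two distributions."""
    mid = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(x: Sequence[float], y: Sequence[float]) -> float:
        # Terms with a == 0 contribute nothing; b > 0 whenever a > 0 since b is the mixture.
        return sum(a * math.log2(a / b) for a, b in zip(x, y) if a > 0)

    # Clamp tiny negative rounding errors before taking the square root.
    return math.sqrt(max(0.0, (kl(p, mid) + kl(q, mid)) / 2))


# Hypothetical predictions for the two alleles of one variant.
ref_profile = softmax([0.1, 2.0, 0.3])
alt_profile = softmax([0.1, 1.2, 0.9])
print(log_fold_change(4.2, 3.9), jensen_shannon_distance(ref_profile, alt_profile))
```

The count-based score captures changes in overall accessibility, while the profile-based distance captures changes in the shape of the predicted signal even when total counts barely move.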

Usage:

python variant_scoring.py -l [VARIANTS_FILE] -g [GENOME_FASTA] -m [MODEL_PATH] -o [OUT_PREFIX] -s [CHROM_SIZES] [OTHER_ARGS]
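To score a variant, the script needs a sequence for each allele taken from the genome given with -g. The sketch below shows one way to build those two sequences from a chromosome string. It assumes that allele1 is the reference allele, that the model input is a fixed-width window centred on the variant, and that the variant is a single-nucleotide change. The width and centring rule are hypothetical parameters; the real window size depends on the model.

```python
from typing import Tuple


def make_allele_sequences(chrom_seq: str, pos: int, allele1: str, allele2: str,
                          width: int) -> Tuple[str, str]:
    """Return (allele1_seq, allele2_seq): width-bp windows centred on a 1-indexed SNP.

    Assumes allele1 is the reference allele and both alleles are single bases.
    """
    if len(allele1) != 1 or len(allele2) != 1:
        raise ValueError("this sketch only handles single-nucleotide variants")
    center = pos - 1                 # 1-indexed position -> 0-based string index
    start = center - width // 2      # hypothetical centring rule
    end = start + width
    if start < 0 or end > len(chrom_seq):
        raise ValueError("window runs off the end of the chromosome")
    window = chrom_seq[start:end]
    offset = center - start
    if window[offset].upper() != allele1.upper():
        raise ValueError(
            f"allele1 {allele1!r} does not match reference base {window[offset]!r}")
    # Swap in allele2 at the variant position to build the alternate sequence.
    alt = window[:offset] + allele2 + window[offset + 1:]
    return window, alt


print(make_allele_sequences("ACGTACGTAC", 5, "A", "G", 4))
```

Checking allele1 against the genome catches variant lists that use a different reference build or swapped allele columns before any predictions are made.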

Input arguments:


-l or --list: (required) a TSV file containing a list of variants to score

-g or --genome: (required) a genome fasta file

-pg or --peak_genome: a genome fasta file for peaks

-m or --model: (required) the ChromBPNet model to use for variant scoring. For most use cases, this should be the bias-corrected model (chrombpnet_nobias.h5)

-o or --out_prefix: (required) the path prefix for storing the SNP effect score predictions from the script. The directory should already exist

-s or --chrom_sizes: (required) the path to a TSV file with chromosome sizes

-ps or --peak_chrom_sizes: the path to a TSV file with chromosome sizes for the peak genome

-dm or --debug_mode: subsample 10000 variants for debugging

-bs or --batch_size: the batch size to use for the model. Default is 512

-sc or --schema: the format for the input variants list. Choices are: 'bed', 'plink', 'chrombpnet', 'original'. Default is 'chrombpnet'

-p or --peaks: a bed file containing peak regions

-n or --num_shuf: the number of shuffled scores per SNP. Default is 10

-t or --total_shuf: the total number of shuffled scores across all SNPs. Overrides --num_shuf

-c or --chrom: only score SNPs in the selected chromosome

-r or --random_seed: the random seed for reproducibility when sampling. Default is 1234

--no_hdf5: do not save detailed predictions in hdf5 file

-fo or --forward_only: run variant scoring only on the forward-strand sequence

-st or --shap_type: the type of SHAP values to compute. Default is "counts"
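The -dm/-r and -n/-t options suggest reproducible subsampling and a null distribution built from shuffled scores. The sketch below illustrates both ideas. It assumes the shuffled scores act as a null for an empirical p-value on absolute scores. Both the p-value formula (with a +1 pseudocount) and the function names are illustrative choices, not the script's documented behaviour.

```python
import random
from typing import Iterable, List, Sequence, TypeVar

T = TypeVar("T")


def subsample(items: Sequence[T], n: int = 10000, seed: int = 1234) -> List[T]:
    """Reproducibly draw up to n items, in the spirit of --debug_mode with --random_seed."""
    if len(items) <= n:
        return list(items)
    rng = random.Random(seed)  # private generator, so the global random state is untouched
    return rng.sample(list(items), n)


def empirical_pvalue(observed: float, null_scores: Iterable[float]) -> float:
    """One-sided empirical p-value of |observed| against |null| scores, with a +1 pseudocount."""
    null = [abs(s) for s in null_scores]
    exceed = sum(1 for s in null if s >= abs(observed))
    return (exceed + 1) / (len(null) + 1)


# Hypothetical shuffled scores for one SNP (cf. --num_shuf 10).
shuffled = [0.05, -0.12, 0.30, -0.02, 0.08, 0.11, -0.25, 0.01, 0.04, -0.07]
print(empirical_pvalue(0.28, shuffled))
```

The pseudocount keeps p-values strictly positive, so the smallest p-value attainable is 1 / (number of shuffles + 1). This is one reason a larger shuffle count (or a shared pool via --total_shuf) gives finer resolution.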

Supported Variant List Schemas:

chrombpnet : ['chr', 'pos', 'allele1', 'allele2', 'variant_id']
bed : ['chr', 'pos', 'end', 'allele1', 'allele2', 'variant_id']
plink : ['chr', 'variant_id', 'ignore1', 'pos', 'allele1', 'allele2']
original : ['chr', 'pos', 'variant_id', 'allele1', 'allele2']
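The schemas differ only in column order, plus an extra end column for bed. A reader can therefore map every schema onto one common record. The sketch below does this with the standard library. It assumes the variant file is tab-separated with no header row, and that bed positions are 0-based starts (so 1 is added to make them 1-indexed, matching the other schemas). The function name is hypothetical.

```python
import csv
import io
from typing import Dict, List, TextIO, Union

SCHEMAS = {
    "chrombpnet": ["chr", "pos", "allele1", "allele2", "variant_id"],
    "bed": ["chr", "pos", "end", "allele1", "allele2", "variant_id"],
    "plink": ["chr", "variant_id", "ignore1", "pos", "allele1", "allele2"],
    "original": ["chr", "pos", "variant_id", "allele1", "allele2"],
}
KEEP = ("chr", "pos", "allele1", "allele2", "variant_id")


def read_variants(handle: TextIO, schema: str = "chrombpnet") -> List[Dict[str, Union[str, int]]]:
    """Parse a headerless TSV variant list into records with a 1-indexed 'pos'."""
    columns = SCHEMAS[schema]
    records = []
    for line_no, row in enumerate(csv.reader(handle, delimiter="\t"), start=1):
        if not row:
            continue  # skip blank lines
        if len(row) != len(columns):
            raise ValueError(f"line {line_no}: expected {len(columns)} columns for '{schema}'")
        rec = dict(zip(columns, row))
        pos = int(rec["pos"])
        if schema == "bed":
            pos += 1  # assumed 0-based BED start -> 1-indexed position
        rec["pos"] = pos
        records.append({key: rec[key] for key in KEEP})
    return records


print(read_variants(io.StringIO("chr1\trs1\t0\t100\tA\tG\n"), "plink"))
```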

2. variant_summary_across_folds.py

This script takes variant scores generated by the variant_scoring.py script (for example, one file per model fold) and generates a TSV file with the mean of each score type across those files.

Usage:

python variant_summary_across_folds.py -sd [VARIANT_SCORE_DIR] -sl [SCORE_LIST] -o [OUT_PREFIX] -sc [SCHEMA]

Input arguments:


-sd or --score_dir: (required) path to the directory with the variant scores used to generate the summary

-sl or --score_list: (required) names of the variant score files used to generate the summary

-o or --out_prefix: (required) path prefix for storing the summary file with average scores across folds; the directory should already exist

-sc or --schema: the format for the input variants list. Choices are: 'bed', 'plink', 'chrombpnet', 'original'. Default is 'chrombpnet'
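Averaging across folds amounts to grouping rows by variant and taking the mean of each score column. The sketch below shows this for in-memory TSV inputs. It assumes each per-fold file has a header row, a variant_id column and numeric score columns; the column name logfc in the example is hypothetical. It also assumes every fold scores the same set of variants.

```python
import csv
import io
from statistics import mean
from typing import Dict, Iterable, List, TextIO


def average_across_folds(fold_handles: Iterable[TextIO], score_columns: List[str],
                         id_column: str = "variant_id") -> List[Dict[str, object]]:
    """Mean of each score column per variant, preserving first-seen variant order."""
    per_variant: Dict[str, Dict[str, List[float]]] = {}
    order: List[str] = []
    for handle in fold_handles:
        for row in csv.DictReader(handle, delimiter="\t"):
            vid = row[id_column]
            if vid not in per_variant:
                per_variant[vid] = {col: [] for col in score_columns}
                order.append(vid)
            for col in score_columns:
                per_variant[vid][col].append(float(row[col]))
    return [{id_column: vid, **{col: mean(per_variant[vid][col]) for col in score_columns}}
            for vid in order]


fold0 = io.StringIO("variant_id\tlogfc\nrs1\t1.0\nrs2\t-0.5\n")
fold1 = io.StringIO("variant_id\tlogfc\nrs1\t3.0\nrs2\t0.5\n")
print(average_across_folds([fold0, fold1], ["logfc"]))
```

Keying on variant_id rather than row order keeps the average correct even if different folds write their rows in different orders.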

3. variant_annotation.py

This script takes a list of variants and annotates each variant with its closest genes and any overlapping peaks.

NOTE: This script assumes that the peaks and genes are in the same reference genome as the variants, and it does not perform any liftover operations.

Usage:

python variant_annotation.py -l [VARIANTS_FILE] -o [OUT_PREFIX] -p [PEAKS] -g [GENES] -sc [SCHEMA]

Input arguments:


-l or --list: (required) a TSV file containing a list of variants to annotate

-o or --out_prefix: (required) path prefix for storing the annotated file; the directory should already exist

-p or --peaks: (required) a bed file containing peak regions

-g or --genes: (required) a bed file with gene coordinates

-sc or --schema: the format for the input variants list. Choices are: 'bed', 'plink', 'chrombpnet', 'original'. Default is 'chrombpnet'

Note: the pos (position) column holds the 1-indexed SNP position unless the schema is bed, in which case it follows the 0-based BED convention.
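Both annotations reduce to interval comparisons once every position uses the same coordinate convention. The sketch below converts a variant position to a 0-based coordinate, tests it against half-open BED peak intervals, and ranks genes by distance. It uses a brute-force linear scan for clarity; the gene names are hypothetical. For genome-scale inputs, a sorted sweep or an interval index would be much faster.

```python
from typing import List, Sequence, Tuple

Peak = Tuple[str, int, int]          # (chrom, start, end), 0-based half-open
Gene = Tuple[str, int, int, str]     # (chrom, start, end, name), 0-based half-open


def to_zero_based(pos: int, schema: str) -> int:
    """bed positions are assumed already 0-based; other schemas are 1-indexed."""
    return pos if schema == "bed" else pos - 1


def overlaps_peak(chrom: str, pos0: int, peaks: Sequence[Peak]) -> bool:
    """True if the variant base falls inside any peak on the same chromosome."""
    return any(c == chrom and start <= pos0 < end for c, start, end in peaks)


def closest_genes(chrom: str, pos0: int, genes: Sequence[Gene], k: int = 1) -> List[Tuple[int, str]]:
    """Return up to k (distance, gene_name) pairs; distance is 0 inside a gene."""
    def distance(start: int, end: int) -> int:
        if start <= pos0 < end:
            return 0
        return start - pos0 if pos0 < start else pos0 - (end - 1)

    candidates = [(distance(s, e), name) for c, s, e, name in genes if c == chrom]
    return sorted(candidates)[:k]


genes = [("chr1", 100, 200, "GENE_A"), ("chr1", 500, 600, "GENE_B"), ("chr2", 0, 50, "GENE_C")]
peaks = [("chr1", 250, 350)]
pos0 = to_zero_based(301, "chrombpnet")
print(closest_genes("chr1", pos0, genes, k=2), overlaps_peak("chr1", pos0, peaks))
```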

kundajelab/variant-scorer

Folders and files

Latest commit

History

Repository files navigation

variant-scorer

1. variant_scoring.py

Usage:

Input arguments:

Supported Variant List Schemas:

2. variant_summary_across_folds.py

Usage:

Input arguments:

3. variant_annotation.py

Usage:

Input arguments:

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Contributors 5

Languages

Packages