The current version does not support insertions, and duplications are poorly tested!
Written and tested in Python 3.7.9.
This script runs SpliceAI predictions directly on variants written in HGVS notation.
The script is built entirely on SpliceAI, whose source code is available on GitHub:
https://github.com/Illumina/SpliceAI
Reference Genome
This script requires the reference genome sequence (FASTA) for the genome build the user specifies. These can be downloaded from UCSC:
hg38: http://hgdownload.cse.ucsc.edu/goldenPath/hg38/bigZips/hg38.fa.gz
hg19: http://hgdownload.cse.ucsc.edu/goldenPath/hg19/bigZips/hg19.fa.gz
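The UCSC downloads are typically plain gzip archives, while pyfaidx (listed below as a dependency) generally expects an uncompressed or BGZF-compressed FASTA. A minimal sketch for unpacking an already-downloaded archive with the standard library could look like this; `decompress_fasta` is a hypothetical helper, not part of the script:

```python
import gzip
import shutil
from pathlib import Path


def decompress_fasta(gz_path, out_path=None):
    """Decompress a gzipped FASTA (e.g. hg38.fa.gz) into a plain .fa file.

    Hypothetical helper: assumes pyfaidx cannot read the plain-gzip
    download directly, so the archive is unpacked first.
    """
    gz_path = Path(gz_path)
    if out_path is None:
        # Drop only the final suffix: hg38.fa.gz -> hg38.fa
        out_path = gz_path.with_suffix("")
    out_path = Path(out_path)
    with gzip.open(gz_path, "rb") as src, open(out_path, "wb") as dst:
        # Stream in chunks so the multi-gigabyte genome never sits in memory.
        shutil.copyfileobj(src, dst)
    return out_path
```

For example, `decompress_fasta("hg38.fa.gz")` writes `hg38.fa` next to the archive. Running `gunzip hg38.fa.gz` achieves the same on Linux or macOS.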
Libraries
This script requires the following libraries, each available on GitHub:
HGVS: https://github.com/biocommons/hgvs
pyfaidx: https://github.com/mdshw5/pyfaidx
Ensembl REST: https://github.com/gawbul/pyEnsemblRest
Pandas: https://github.com/pandas-dev/pandas
Alternatively, these can be installed directly via:
pip install hgvs
pip install pyfaidx
pip install pyensemblrest
pip install pandas
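After installing, a small standard-library check can confirm that every dependency is importable. The import names differ from the pip package names; the mapping below (in particular `ensemblrest` for pyensemblrest) is an assumption based on the packages' usual import names, so adjust it if an import fails:

```python
import importlib.util

# Assumed mapping from pip package name to Python import name.
REQUIRED_MODULES = {
    "hgvs": "hgvs",
    "pyfaidx": "pyfaidx",
    "pyensemblrest": "ensemblrest",
    "pandas": "pandas",
}


def missing_packages(required=REQUIRED_MODULES):
    """Return the pip package names whose top-level module cannot be found."""
    return [
        package
        for package, module in required.items()
        if importlib.util.find_spec(module) is None
    ]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Missing packages, install with: pip install " + " ".join(missing))
    else:
        print("All required packages are installed.")
```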
The script can be run directly from the command line:
python3 HGVSpredict.py -I input -O output -G genome [-P preferred_transcript]
where -P preferred_transcript is optional.
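An interface with these flags can be declared with Python's `argparse`. This is only a sketch of how such a parser might look, not the script's actual code: the long option names (`--input`, `--output`, `--genome`, `--preferred-transcript`) and help texts are hypothetical, and `-G` is assumed here to take the path to the reference FASTA:

```python
import argparse


def build_parser():
    """Hypothetical parser mirroring the documented -I/-O/-G/-P flags."""
    parser = argparse.ArgumentParser(
        description="Run SpliceAI on HGVS variants (illustrative sketch)."
    )
    parser.add_argument("-I", "--input", required=True,
                        help="text file with one HGVS variant per line")
    parser.add_argument("-O", "--output", required=True,
                        help="output CSV file")
    parser.add_argument("-G", "--genome", required=True,
                        help="reference genome (assumed: path to FASTA)")
    # Optional: stays None when -P is not given.
    parser.add_argument("-P", "--preferred-transcript", default=None,
                        help="preferred transcript for validation")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(args.input, args.output, args.genome, args.preferred_transcript)
```

Marking `-I`, `-O` and `-G` as `required=True` makes argparse print a usage error when one is missing, while `-P` simply defaults to `None`.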
The input can be any plain-text file readable by Python 3.7.9 (for example a .txt file), with one variant per line. The file encoding does not matter.
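Reading such a file amounts to taking one variant per non-empty line. A minimal sketch with a hypothetical `read_variants` helper; it is narrower than the encoding statement above, since it assumes an ASCII-compatible encoding such as UTF-8 (HGVS strings themselves are plain ASCII):

```python
def read_variants(path):
    """Return the HGVS variants in `path`, one per non-empty line."""
    # utf-8-sig drops a leading byte-order mark if present; errors="replace"
    # keeps stray non-UTF-8 bytes from aborting the read. Universal-newline
    # mode handles both \n and \r\n line endings.
    with open(path, encoding="utf-8-sig", errors="replace") as handle:
        return [line.strip() for line in handle if line.strip()]
```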
The output is written in CSV format, so giving the output file name a .csv extension is recommended.
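The script lists pandas as a dependency; the sketch below uses the standard-library `csv` module instead so it runs without extra installs. It also appends the recommended .csv extension when it is missing. The helper name `write_results` and the column names are hypothetical placeholders, not the script's actual output columns:

```python
import csv
from pathlib import Path

# Hypothetical column names, for illustration only.
FIELDS = ["variant", "DS_AG", "DS_AL", "DS_DG", "DS_DL"]


def write_results(rows, path):
    """Write a list of dicts to CSV, adding a .csv suffix if it is missing."""
    path = Path(path)
    if path.suffix.lower() != ".csv":
        path = path.with_name(path.name + ".csv")
    # newline="" stops the csv module from writing blank lines on Windows.
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=FIELDS)
        writer.writeheader()
        writer.writerows(rows)
    return path
```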
The script works as follows:
- Check the arguments
- Validate the variants against the preferred transcripts
- For each variant:
  - Convert the HGVS variant to a genomic variant
  - Locate the mutation within the gene
  - Retrieve the SpliceAI scores
  - Predict the transcript effect from the location and scores
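For the score-retrieval step, SpliceAI's own VCF output annotates each variant with an INFO entry of the form `ALLELE|SYMBOL|DS_AG|DS_AL|DS_DG|DS_DL|DP_AG|DP_AL|DP_DG|DP_DL` (delta scores and positions for acceptor gain/loss and donor gain/loss). The sketch below parses one such entry and compares the maximum delta score with the cutoffs suggested in the SpliceAI README (0.2 high recall, 0.5 recommended, 0.8 high precision). It illustrates reading SpliceAI scores only; it is not this script's transcript-effect logic, and `parse_spliceai` and `strongest_effect` are hypothetical names:

```python
from typing import Dict, NamedTuple

# Cutoffs suggested in the SpliceAI README, highest first.
CUTOFFS = ((0.8, "high precision"), (0.5, "recommended"), (0.2, "high recall"))


class SpliceAIScore(NamedTuple):
    allele: str
    symbol: str
    delta: Dict[str, float]   # DS_AG, DS_AL, DS_DG, DS_DL
    position: Dict[str, int]  # DP_* offsets relative to the variant


def parse_spliceai(entry: str) -> SpliceAIScore:
    """Parse one SpliceAI INFO entry into scores and positions.

    Assumes all ten fields are present and numeric; unscored
    variants (e.g. '.' placeholders) are not handled here.
    """
    fields = entry.split("|")
    if len(fields) != 10:
        raise ValueError(f"expected 10 '|'-separated fields, got {len(fields)}")
    kinds = ("AG", "AL", "DG", "DL")
    delta = {f"DS_{k}": float(v) for k, v in zip(kinds, fields[2:6])}
    position = {f"DP_{k}": int(v) for k, v in zip(kinds, fields[6:10])}
    return SpliceAIScore(fields[0], fields[1], delta, position)


def strongest_effect(score: SpliceAIScore):
    """Return (score name, max delta score, highest cutoff reached or None)."""
    name = max(score.delta, key=score.delta.get)
    value = score.delta[name]
    label = next((lab for cut, lab in CUTOFFS if value >= cut), None)
    return name, value, label


if __name__ == "__main__":
    example = parse_spliceai("T|GENE|0.00|0.00|0.91|0.08|-28|-46|-2|-31")
    print(strongest_effect(example))  # prints ('DS_DG', 0.91, 'high precision')
```

Taking the maximum of the four delta scores mirrors how the SpliceAI README applies its cutoffs; a per-event breakdown is still available in `delta` and `position` if the acceptor/donor distinction matters.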
Kyran Wissink
Student Biomedical Sciences
University of Groningen
github.com/KyranWissink
k.wissink@student.rug.nl