This project offers an accessible script to analyze and visualize AlphaMissense-predicted pathogenicity scores. It integrates with ClinVar and AlphaFold to promote a more comprehensive interpretation of the pathogenicity predictions.
- Fetches and processes AlphaMissense predictions for a specified protein (based on UniProt ID)
- Automatically fetches AlphaFold PDB files or utilizes user-provided experimental PDB files to generate a modified PDB file with AlphaMissense pathogenicity scores
- Replaces the temperature factor (B-factor) with pathogenicity values, enabling visualization in molecular visualization tools
- Creates a heatmap with the AlphaMissense predicted pathogenicity scores for all amino acid residue substitutions
- Extracts reporter missense variants from ClinVar
- Compares the pathogenicity classification of all missense ClinVar variants to the AlphaMissense classifications (exported as
{GENE_ID}_clinvar_AM.csv
)
- Compares the pathogenicity classification of all missense ClinVar variants to the AlphaMissense classifications (exported as
- Produces a line graph visualizing:
- Average AlphaMissense predicted pathogenicity at each amino acid residue
- AlphaFold per-residue model confidence score (pLDDT)
- Secondary structure annotations for alpha helices and beta sheets
- Produces another line graph visualizing:
- Average AlphaMissense predicted pathogenicity at each amino acid residue
- Missense variants from ClinVar along with their associated classification (simplified for plot simplicity)
- Python 3.11 or higher
You can install the required packages using:
pip install -r requirements.txt
Alternatively, you can create a conda environment with the required packages using:
conda create --name amissense python requests pandas seaborn matplotlib plotly numpy biopython conda-forge::python-kaleido salilab::dssp
conda activate amissense
With the setup.py
file, you can now install the package using the following:
pip install .
This will install the amissense
package and make the amissense
command-line interface (CLI) available.
To run the main pipeline, use the following command:
amissense pipeline -u UNIPROT_ID -g GENE_ID [-o OUTPUT_DIR] [-e EXPERIMENTAL_PDB]
Arguments:
UNIPROT_ID
: The UniProt ID of the protein you want to analyze. This is a required positional argument.GENE_ID
: The gene ID associated with the protein. This is a required positional argument.OUTPUT_DIR
: The directory to store output files (default isout/
).EXPERIMENTAL_PDB
: The experimental PDB file for the protein (optional). If not provided, the script will use the AlphaFold predicted structure.
amissense pipeline -u P12345 -g BRCA1 -o output_dir -e experimental.pdb
You can also use utility commands for additional operations:
- Download AlphaMissense predictions:
amissense utils download-predictions -t /path/to/tmp_dir
- Download a PDB file:
amissense utils download-pdb -p 6LID -o /path/to/output_dir
- Query UniProt for a gene's UniProt ID:
amissense utils uniprot-query -n GENE_NAME -i ORGANISM_ID
To generate JSON files from AlphaMissense TSV data:
amissense generate-json /path/to/AlphaMissense_aa_substitutions.tsv.gz /path/to/output_dir
The script generates several output files in the specified output directory {GENE_ID}_{UNIPROT_ID}_{YYYY-MM-DD}
:
Figures
{UNIPROT_ID}_heatmap.png
: Heatmap of the AlphaMissense pathogenicity scores for each amino acid substitution{UNIPROT_ID}_line_graph.png
: Line graph showing the AlphaMissense mean pathogenicity, AlphaFold per-residue model confidence score (pLDDT), and secondary structure annotations{GENE_ID}_avgAM_clinvar.png
: Line graph showing the AlphaMissense mean pathogenicity and extracted ClinVar missense variants with their classification{GENE_ID}_sankey_diagram.png
: Sankey diagram depicting the flow quantity between the AlphaMissense variant pathogenicity classification and the ClinVar variant classifications
PDB
{UNIPROT_ID}_alphafold_{YYYY-MM-DD}.pdb
: Unaltered PDB file from AlphaFold{UNIPROT_ID}_pathogenicity.pdb
: PDB file with pathogenicity scores replacing the B-factor
Tables
{UNIPROT_ID}_AM_pathogenicity_predictions_{YYYY-MM-DD}.csv
: CSV file containing the AlphaMissense predictions{GENE_ID}_clinvar_AM_{YYYY-MM-DD}.csv
: CSV file with ClinVar and AlphaMissense data combined