Deep Local Analysis (DLA)-Mutation contrasts the patterns observed in two local cubes encapsulating the physico-chemical and geometrical environments around the wild-type and the mutant amino acids. The underlying self-supervised model (ssDLA) takes advantage of a large-scale exploration of non-redundant experimental protein complex structures in the Protein Data Bank (PDB) to learn the fundamental properties of protein-protein interfaces. Evolutionary constraints and conformational heterogeneity improve the performance of DLA-Mutation.
- Prediction of the changes in binding affinity upon single-point mutation using a Siamese architecture.
- Transfer of the knowledge of protein-protein interfaces to various downstream tasks. For instance, given a single partially masked cube, it recovers the identity and physico-chemical class of the central residue. Given an ensemble of cubes representing an interface, it predicts the function of the complex.
- Use of structural and evolutionary information.
- Fast generation of cubes and evaluation of interfaces.
- Training and testing of 3D-CNN models.
DLA-Ranker can be run on Linux, macOS, and Windows. We recommend running it on a machine with a GPU. It requires the following packages:
- FreeSASA or NACCESS
- ProDy
- lz4 compression tool
- Python version 3.7 or 3.8.
- Tensorflow version 2.2 or 2.3.
- Cuda-Toolkit
- Scikit-Learn, numpy, pandas, matplotlib, lz4, and tqdm (`conda install -c pytorch -c pyg -c conda-forge python=3.9 numpy pandas matplotlib tqdm pytorch pyg scikit-learn cuda-toolkit lz4`).
All-in-one: run `conda create --name dla --file dla.yml`
ssDLA is a structure-based, general-purpose model that generates informative representations of the local environments (masked or unmasked) around interfacial residues for use in downstream tasks.
We can use the pre-trained ssDLA model to predict the type of amino acid given a masked cube.
- Place the protein complexes in a directory (e.g. 'Examples/complex_directory') like below. The 'complex_list.txt' is a csv file that contains three columns separated by ';': name of the target complex (`Comp`); receptor chain ID(s) (`ch1`); ligand chain ID(s) (`ch2`).
Example
|___complex_list.txt
|
|___complex_directory
|
|___complex 1
|___complex 2
|
..........
- Specify the path to FreeSASA or NACCESS in `lib/tools.py` (`FREESASA_PATH` or `NACCESS_PATH`). The choice between FreeSASA and NACCESS is also made in `lib/tools.py` (default is `USE_FREESASA = True`).
- If you have an Nvidia GPU on your computer, or run on Google Colab, set `FORCE_CPU = False` in `lib/tools.py`. Otherwise set `FORCE_CPU = True` (default is `FORCE_CPU = False`).
- Specify the type of masking in `Representation/generate_cubes_interface.py`. You have the following options:
  - Masking a sphere of radius 5 Å randomly centered on an atom of the central residue. This is the default masking, and the ssDLA model was trained with this option.
  - Masking a sphere of radius 3 Å randomly centered on an atom of the central residue.
  - Masking only the side chain of the central residue.
  - Masking the whole central residue.
  - No masking at all.
- From `Representation`, run `python generate_cubes_interface.py`.
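The sphere-masking options listed above can be illustrated with a short NumPy sketch. This is not the code in `generate_cubes_interface.py`: the function name `mask_sphere`, its arguments, and the choice to drop atoms at exactly the cutoff distance are assumptions made for illustration only.

```python
import numpy as np

def mask_sphere(coords, residue_atom_idx, radius=5.0, rng=None):
    """Return a boolean keep-mask that drops atoms inside a sphere.

    coords: (n_atoms, 3) array of atom coordinates in angstroms.
    residue_atom_idx: indices of the atoms of the central residue.
    radius: sphere radius in angstroms (5.0 or 3.0 in the options above).
    """
    rng = np.random.default_rng() if rng is None else rng
    # Pick one atom of the central residue at random as the sphere center.
    center = coords[rng.choice(residue_atom_idx)]
    # Keep only atoms strictly farther than `radius` from that center
    # (treating the boundary as masked is an assumption).
    distances = np.linalg.norm(coords - center, axis=1)
    return distances > radius

# Toy coordinates: atoms 0 and 1 form the hypothetical central residue.
coords = np.array([[0.0, 0.0, 0.0],
                   [1.0, 0.0, 0.0],
                   [4.0, 0.0, 0.0],
                   [10.0, 0.0, 0.0]])
keep = mask_sphere(coords, residue_atom_idx=[0, 1], radius=5.0,
                   rng=np.random.default_rng(0))
print(keep)  # only the atom at x = 10 lies outside the 5 Å sphere
```

Because the center is drawn at random from the atoms of the central residue, repeated calls can mask slightly different neighborhoods of the same residue.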
`generate_cubes_interface.py` writes its output to a directory 'map_dir' with the following structure:
Example
|___map_dir
|___complex 1
|___complex 2
..........
Each output represents the interface of a complex and contains a set of local environments (e.g. atomic density map, structure classes (S, C, R), ...).
An atomic density map is a 4-dimensional tensor: a voxelized 3D grid with a size of 24×24×24, in which each voxel encodes some characteristics of the protein atoms. Namely, the first 167 dimensions correspond to the atom types that can be found in amino acids (without the hydrogen atoms). This dimension can be reduced to 4 element symbols (C, N, O, S) by running `python generate_cubes_reduce_channels_multiproc.py` (ATTENTION: this script overwrites the existing files).
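To make the channel reduction concrete, here is a minimal sketch of collapsing per-atom-type channels into per-element channels. It assumes that reduction means summing the densities of all atom types sharing an element, and it uses a toy channel-to-element assignment; the real assignment and the actual behavior of `generate_cubes_reduce_channels_multiproc.py` are defined in the repository and may differ.

```python
import numpy as np

N_ATOM_TYPES = 167                 # atom-type channels described above
ELEMENTS = ("C", "N", "O", "S")    # target element channels

def reduce_channels(cube, channel_element):
    """Sum atom-type channels that share an element symbol.

    cube: (24, 24, 24, n_channels) density map.
    channel_element: element symbol of each channel (hypothetical lookup).
    """
    channel_element = np.asarray(channel_element)
    reduced = np.zeros(cube.shape[:3] + (len(ELEMENTS),), dtype=cube.dtype)
    for k, element in enumerate(ELEMENTS):
        # Select every channel assigned to this element and add them up.
        reduced[..., k] = cube[..., channel_element == element].sum(axis=-1)
    return reduced

rng = np.random.default_rng(0)
cube = rng.random((24, 24, 24, N_ATOM_TYPES), dtype=np.float32)
# Toy assignment that cycles through C, N, O, S. NOT the real mapping.
toy_assignment = [ELEMENTS[i % 4] for i in range(N_ATOM_TYPES)]
reduced = reduce_channels(cube, toy_assignment)
print(reduced.shape)  # (24, 24, 24, 4)
```

Unlike the repository script, this sketch returns a new array and leaves the input untouched, which makes it safe to experiment with on a copy of the data.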
From the directory 'Evaluation', run `python test_xray.py` or `python test_xray_4channels.py`, depending on the number of channels.
It processes all the target complexes and produces the csv files 'output_xray_wt_mask' ('output_xray_wt_mask_4channels') as the output and 'intermediate_xray_wt_mask_200' ('intermediate_xray_wt_mask_200_4channels') as the embedding vectors. Each row of the output file belongs to an interfacial residue of a target complex and has 10 tab-separated columns:
- Name of the complex (`complex`)
- Residue name (`resname`)
- Structural region of the residue (`resregion`)
- Residue number (`resnumber`; according to PDB)
- Residue coordinate position (`respos`)
- Receptor or ligand (`partner`)
- The predicted vector of size 20 (`prediction`)
- The one-hot encoding of the target residue (`target`)
- Entropy of the predicted vector (`entropy`)
- Cross-entropy between the predicted and target vectors (`crossentropy`)
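As a reminder of what the `entropy` and `crossentropy` columns measure, the sketch below computes both quantities for a 20-dimensional prediction and a one-hot target. The natural logarithm and the small `eps` added for numerical safety are assumptions; the evaluation scripts may use a different base or clipping.

```python
import numpy as np

def entropy(prediction, eps=1e-12):
    """Shannon entropy of a probability vector (natural log assumed)."""
    p = np.asarray(prediction, dtype=float)
    return float(-np.sum(p * np.log(p + eps)))

def cross_entropy(target, prediction, eps=1e-12):
    """Cross-entropy between a one-hot target and a predicted vector."""
    t = np.asarray(target, dtype=float)
    p = np.asarray(prediction, dtype=float)
    return float(-np.sum(t * np.log(p + eps)))

# A maximally uncertain prediction over the 20 amino acids.
uniform = np.full(20, 1.0 / 20)
target = np.zeros(20)
target[3] = 1.0  # arbitrary position for the true residue

h = entropy(uniform)
ce = cross_entropy(target, uniform)
print(round(h, 4), round(ce, 4))  # both equal ln(20), about 2.9957
```

A low entropy means the model is confident about the residue type, while a low cross-entropy means that the confident guess is also the correct one.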
Each row of the embedding file also belongs to an interfacial residue. Besides the columns listed for the output file, it contains the feature vector of size 200 extracted from each cube. This file serves as input for the downstream tasks (transfer learning with frozen weights).
A similar analysis can be performed on backrub models by running `python test_backrub.py` or `python test_backrub_4channels.py`, depending on the number of channels.
- Place the wild-type and mutant complex backrub models in a directory (e.g. 'Examples/backrub_directory') like below.
Example
|___backrub_directory
|
|___complex-mutation 1
| | model 1
| | model 2
| | ...
|
|___complex-mutation 2
| | model 1
| | model 2
| | ...
|
..........
- From `Representation`, run `python generate_cubes_ddg.py`. It extracts cubic volumetric maps around the mutation positions from both wild-type and mutant complexes.
- From `Evaluation`, run `python test_ddg.py`. The output contains the predicted and experimental values of ΔΔG.
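Once predicted and experimental ΔΔG values are available, a common way to compare them is the Pearson correlation and the root-mean-square error. The sketch below is a generic illustration, not part of `test_ddg.py`; since the layout of the output file is not described here, the values are passed in as plain arrays, and the numbers are made up.

```python
import numpy as np

def evaluate_ddg(predicted, experimental):
    """Return (Pearson correlation, RMSE) between two ΔΔG series."""
    predicted = np.asarray(predicted, dtype=float)
    experimental = np.asarray(experimental, dtype=float)
    pearson = float(np.corrcoef(predicted, experimental)[0, 1])
    rmse = float(np.sqrt(np.mean((predicted - experimental) ** 2)))
    return pearson, rmse

# Made-up values for illustration only (not real measurements).
predicted = [0.5, 1.0, 1.5, 2.0]
measured = [1.0, 2.0, 3.0, 4.0]
pearson, rmse = evaluate_ddg(predicted, measured)
print(pearson, rmse)
```

In this toy case the predictions are perfectly correlated with the measurements yet systematically too small, which is why it helps to report both metrics rather than the correlation alone.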
- Select the no-masking option in `Representation/generate_cubes_interface.py`.
- From `Representation`, run `python generate_cubes_interface.py`.
- From `Evaluation`, run `python test_xray.py` or `python test_xray_4channels.py` (depending on the number of channels) to extract the embeddings.
- From `Evaluation`, run `python transfer_learning_aa_reducedalphabet.py` or `python transfer_learning_aa_reducedalphabet_xray.py` to train a small neural network.
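The idea of transfer learning with frozen weights, as used in the steps above, is that the embeddings stay fixed and only a small classifier on top of them is trained. The sketch below illustrates this with a single softmax layer fitted by gradient descent in NumPy. It is a stand-in, not the network used by the transfer-learning scripts: the synthetic 200-dimensional features, the number of classes (7), and the hyperparameters are all hypothetical.

```python
import numpy as np

def train_softmax_head(features, labels, n_classes, lr=0.01, epochs=200):
    """Fit one softmax layer on frozen feature vectors by gradient descent."""
    n_samples, n_features = features.shape
    weights = np.zeros((n_features, n_classes))
    bias = np.zeros(n_classes)
    one_hot = np.eye(n_classes)[labels]
    for _ in range(epochs):
        logits = features @ weights + bias
        logits -= logits.max(axis=1, keepdims=True)  # numerical stability
        probs = np.exp(logits)
        probs /= probs.sum(axis=1, keepdims=True)
        # Gradient of the mean cross-entropy with respect to the logits.
        grad = (probs - one_hot) / n_samples
        weights -= lr * (features.T @ grad)
        bias -= lr * grad.sum(axis=0)
    return weights, bias

def predict(features, weights, bias):
    """Return the most likely class index for each feature vector."""
    return np.argmax(features @ weights + bias, axis=1)

# Synthetic stand-in for the frozen 200-dimensional embeddings.
rng = np.random.default_rng(0)
n_classes = 7  # hypothetical size of the reduced alphabet
centers = rng.normal(size=(n_classes, 200))
labels = rng.integers(0, n_classes, size=350)
features = centers[labels] + 0.1 * rng.normal(size=(350, 200))

weights, bias = train_softmax_head(features, labels, n_classes)
accuracy = float(np.mean(predict(features, weights, bias) == labels))
print(f"training accuracy: {accuracy:.2f}")
```

Only `weights` and `bias` are updated here; the feature vectors are never modified, which is what keeps the pre-trained representation frozen.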
- Select the no-masking option in `Representation/generate_cubes_interface.py`.
- From `Representation`, run `python generate_cubes_interface.py`.
- From `Evaluation`, run `python test_xray.py` or `python test_xray_4channels.py` (depending on the number of channels) to extract the embeddings.
- From `Evaluation`, run `python transfer_learning_function.py` to train a small neural network.
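Predicting a property of a whole complex from residue-level embeddings requires combining the vectors of all interfacial residues into one representation per complex. One simple option, shown below, is to average them. Whether `transfer_learning_function.py` pools embeddings this way is an assumption; the function name and the complex identifiers are hypothetical, and 2-dimensional vectors replace the 200-dimensional ones for brevity.

```python
import numpy as np

def pool_by_complex(complex_names, embeddings):
    """Average residue-level embeddings into one vector per complex."""
    complex_names = np.asarray(complex_names)
    embeddings = np.asarray(embeddings, dtype=float)
    pooled = {}
    for name in np.unique(complex_names):
        # Mean over all interfacial residues that belong to this complex.
        pooled[str(name)] = embeddings[complex_names == name].mean(axis=0)
    return pooled

# Hypothetical identifiers; one row per interfacial residue.
names = ["1abc", "1abc", "2xyz"]
residue_embeddings = np.array([[1.0, 3.0],
                               [3.0, 5.0],
                               [2.0, 2.0]])
pooled = pool_by_complex(names, residue_embeddings)
print(pooled["1abc"])  # [2. 4.]
```

Mean pooling gives a fixed-size vector regardless of how many residues an interface contains, so complexes of different sizes can be fed to the same downstream classifier.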
We would like to thank Dr. Sergei Grudinin and his team for helping us with the initial source code of `maps_generator` and `load_data.py`. See Ornate.