Augmenting a training dataset of the generative diffusion model for molecular docking with artificial binding pockets
The model is trained to dock small molecules in a predefined binding pocket.
Therefore, the input PDB file is expected to include only pocket residues.
A general recommendation is to include all residues within 5-6 Å of any heavy atom of a known ligand (typically 15-30 residues),
or a pocket of equivalent size for binding sites defined by other methods. See examples/extract_pocket.py
for a basic pocket extraction script.
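The distance criterion above can be sketched in plain Python. This is a minimal illustration, not the repository's examples/extract_pocket.py: the function names `parse_atom` and `extract_pocket` are hypothetical, the ligand heavy-atom coordinates are assumed to be supplied by the caller (hydrogens already excluded), and a residue is kept whole if any of its atoms lies within the cutoff.

```python
import math


def parse_atom(line):
    """Read residue identity and coordinates from a fixed-column PDB ATOM line."""
    chain = line[21]
    resseq = line[22:26].strip()
    icode = line[26]
    xyz = (float(line[30:38]), float(line[38:46]), float(line[46:54]))
    return (chain, resseq, icode), xyz


def extract_pocket(protein_lines, ligand_coords, cutoff=5.0):
    """Return ATOM lines of residues with any atom within `cutoff` Å of the ligand.

    ligand_coords: iterable of (x, y, z) tuples for ligand heavy atoms
    (an assumption of this sketch: the caller has already dropped hydrogens).
    """
    ligand_coords = list(ligand_coords)
    atoms = [(line, *parse_atom(line)) for line in protein_lines
             if line.startswith("ATOM")]

    # First pass: collect residues that have at least one atom near the ligand.
    near = set()
    for _, res_id, xyz in atoms:
        if any(math.dist(xyz, lig) <= cutoff for lig in ligand_coords):
            near.add(res_id)

    # Second pass: keep every atom of the selected residues, in file order.
    return [line for line, res_id, _ in atoms if res_id in near]
```

Writing the returned lines to a file gives a pocket-only PDB of the kind the model expects; with a 5-6 Å cutoff the result should normally fall in the 15-30 residue range mentioned above.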
- Install the recommended dependencies compatible with your hardware and operating system
  - Git LFS
  - Python >= 3.8
  - PyTorch >= 2.0
  - PyTorch Geometric (including torch_scatter and torch_cluster)
  - reduce
- Clone the repository, navigate to the cloned folder, and pull the model weights
git clone https://github.com/vtarasv/pocket-cfdm.git
cd pocket-cfdm/
git lfs pull
- Install required packages
pip install -r requirements.txt
- Run the inference
python predict.py --pdb my_pocket.pdb --sdf my_ligands.sdf --save_path my_ligands_docked.sdf --samples 16 --batch_size 16 --no_filter
Increasing the --samples argument generates more alternative poses per docked molecule (better prediction quality at additional computational cost).
Consider decreasing --batch_size if you encounter GPU memory errors.
By default, the results include only poses of acceptable quality. The --no_filter flag writes all generated poses regardless of their quality.
The first run of the script will take some time to precompute the required data distributions and save them to the cache.
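For scripted runs, the command line above can be assembled programmatically. The sketch below is only an illustration: the helper names `build_predict_command` and `batch_size_fallbacks` are hypothetical, and only the flags shown in the command above are used.

```python
import sys


def build_predict_command(pdb, sdf, save_path, samples=16, batch_size=16,
                          no_filter=False):
    """Assemble the predict.py argument list using the flags documented above."""
    cmd = [
        sys.executable, "predict.py",
        "--pdb", pdb,
        "--sdf", sdf,
        "--save_path", save_path,
        "--samples", str(samples),
        "--batch_size", str(batch_size),
    ]
    if no_filter:
        # Write all generated poses, not only those passing the quality filter.
        cmd.append("--no_filter")
    return cmd


def batch_size_fallbacks(start):
    """Yield start, start // 2, ..., 1: candidate batch sizes to try in turn."""
    size = start
    while size >= 1:
        yield size
        size //= 2
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` from the repository root. If a run fails with a GPU memory error, a caller could rebuild the command with the next value from `batch_size_fallbacks`; how predict.py reports such errors is not covered here, so any automatic retry logic would need to be checked against the actual behavior.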
- Pull the docker image
docker pull vtarasv/pocket-cfdm
- Run the inference code using docker
docker run -it --rm --gpus all -v '/home/':'/home/' vtarasv/pocket-cfdm -m predict --pdb /home/user/temp/my_pocket.pdb --sdf /home/user/temp/my_ligands.sdf --save_path /home/user/temp/my_ligands_docked.sdf