Release single-step model environments #15

Merged: 13 commits, Aug 2, 2023
3 changes: 3 additions & 0 deletions .gitignore
@@ -32,3 +32,6 @@ MANIFEST
# Unit test / coverage reports
.coverage
.coverage.*

# Cloned single-step model repositories
syntheseus/reaction_prediction/environments/external/
2 changes: 1 addition & 1 deletion CHANGELOG.md
@@ -9,7 +9,7 @@ and the project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.

### Added

- Release single-step evaluation framework and wrappers for several model types ([#14](https://github.com/microsoft/syntheseus/pull/14)) ([@kmaziarz])
- Release single-step evaluation framework and wrappers for several model types ([#14](https://github.com/microsoft/syntheseus/pull/14), [#15](https://github.com/microsoft/syntheseus/pull/15)) ([@kmaziarz])
- Add option to terminate search when the first solution is found ([#13](https://github.com/microsoft/syntheseus/pull/13)) ([@austint])
- Add code to extract routes in order found instead of by minimum cost ([#9](https://github.com/microsoft/syntheseus/pull/9)) ([@austint])
- Declare support for type checking ([#4](https://github.com/microsoft/syntheseus/pull/4)) ([@kmaziarz])
17 changes: 10 additions & 7 deletions README.md
@@ -6,17 +6,20 @@
[![License](https://img.shields.io/badge/license-MIT-green.svg)](LICENSE)

Syntheseus is a package for retrosynthetic planning.
It contains implementations of common search algorithms
and a simple API to wrap custom reaction models and write
custom algorithms.
It contains implementations of common search algorithms, a simple API to wrap custom reaction models and write
custom algorithms, and wrappers for many state-of-the-art reaction models from the literature.
It is meant to allow for simple benchmarking of the components
of retrosynthesis algorithms.

## Installation
## Setup

Currently `syntheseus` is not hosted on PyPI
(although this will likely change in the future).
To install, please run:
We support two installation modes:
- *core installation*, not tied to a specific reaction model, allows you to build and benchmark your own models or search algorithms
- *full installation*, backed by one of the supported models, allows you to perform end-to-end retrosynthetic search

For full installation we currently support the following reaction models: Chemformer, LocalRetro, MEGAN, MHNreact, RetroKNN and RootAligned SMILES; see [here](syntheseus/reaction_prediction/environments/README.md) for detailed setup instructions.

For core installation, simply run:

```bash
# Clone and cd into the repository.
# ...
```
31 changes: 31 additions & 0 deletions syntheseus/reaction_prediction/environments/README.md
@@ -0,0 +1,31 @@
# Single-step Model Environments

Every single-step model may require a different environment and set of dependencies.
Here we outline the steps to set up an environment for each of the supported models, which can then be used to run single-step model evaluation or multi-step search.

## Basic setup

All models apart from GLN can be set up using a shared base `conda` environment extended with a few model-specific dependencies. The general workflow is:

```bash
conda env create -f environment_shared.yml # Create the shared environment.
conda activate syntheseus-single-step # Activate the environment.
pip install -e ../../../ # Install `syntheseus`.
source setup_[MODEL_NAME].sh # Run the extra setup commands.
```

If you wish to use several models, it is enough to create the shared environment once and then run all the corresponding setup scripts. Note that RetroKNN depends on LocalRetro, so if you want to use both, running just `setup_retro_knn.sh` is sufficient.
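
For example, a plausible sequence for preparing both Chemformer and RetroKNN within a single shared environment is sketched below (the model choice is only an illustration; the individual commands come from this README and the setup scripts in this directory):

```bash
# Create and activate the shared environment once.
conda env create -f environment_shared.yml
conda activate syntheseus-single-step
pip install -e ../../../

# Run the per-model setup scripts; RetroKNN's script also sets up LocalRetro.
source setup_chemformer.sh
source setup_retro_knn.sh
```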

In `environment_shared.yml` and `setup_local_retro.sh` we pinned the CUDA version (to 11.3) for reproducibility.
If you want to use a different one, make sure to edit these two files accordingly.
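
As a sketch, switching to a different CUDA version (say, 11.6) would mean adjusting roughly the following; the exact package builds below are assumptions and should be verified against the respective `conda` channels:

```bash
# In environment_shared.yml, swap the pinned PyTorch build for one matching your CUDA version, e.g.
#   pytorch=1.10.2=py3.9_cuda11.3_cudnn8.2.0_0  ->  a py3.9_cuda11.6 build (if available)
# In setup_local_retro.sh, install the matching DGL package, e.g.
conda install dgl-cuda11.6 -c dglteam -y  # hypothetical package name; verify it exists for your CUDA version
```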

The GLN model is not compatible with the others: it currently requires a specialized environment whose creation includes building `rdkit` from source.
We packaged all the necessary steps into a Docker environment defined in `gln/Dockerfile`.
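
For reference, building and entering that environment could look roughly like this (the image tag is arbitrary, and the GPU flags depend on your Docker setup):

```bash
# Build the GLN image from the `gln/` directory (which contains the Dockerfile and environment.yml).
docker build -t syntheseus-gln gln/
# Start an interactive container with GPU access.
docker run --gpus all -it syntheseus-gln /bin/bash
```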

## Back-translation

In `reaction_prediction/cli/eval.py` a forward model may be used for computing back-translation (round-trip) accuracy.
Currently, Chemformer is the only supported forward model.

To evaluate a particular model with back-translation computed using Chemformer, simply set up an environment for that model and then run `setup_chemformer.sh` on top.
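
For instance, to evaluate RootAligned with Chemformer-based back-translation, a plausible setup sequence is shown below (the model choice is just an example; the exact `eval.py` flags for enabling back-translation are not covered here):

```bash
# Set up the model under evaluation, then layer Chemformer on top for back-translation.
source setup_root_aligned.sh
source setup_chemformer.sh
```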
12 changes: 12 additions & 0 deletions syntheseus/reaction_prediction/environments/environment_shared.yml
@@ -0,0 +1,12 @@
name: syntheseus-single-step
channels:
- defaults
- conda-forge
- pytorch
dependencies:
- numpy
- pandas
- pip
- python==3.9.7
- pytorch=1.10.2=py3.9_cuda11.3_cudnn8.2.0_0
- rdkit=2021.09.4
38 changes: 38 additions & 0 deletions syntheseus/reaction_prediction/environments/gln/Dockerfile
@@ -0,0 +1,38 @@
FROM mcr.microsoft.com/azureml/openmpi3.1.2-cuda10.0-cudnn7-ubuntu18.04
MAINTAINER krmaziar@microsoft.com

# Set bash, as conda doesn't like dash
SHELL [ "/bin/bash", "--login", "-c" ]

# Make bash aware of conda
RUN echo ". /opt/miniconda/etc/profile.d/conda.sh" >> ~/.profile

# Turn off caching in pip
ENV PIP_NO_CACHE_DIR=1

# Install the dependencies into conda's default environment
COPY ./environment.yml /tmp/
RUN conda install mamba -n base -c conda-forge
RUN mamba env update -p /opt/miniconda -f /tmp/environment.yml && conda clean -ay

# Install RDKit from source
RUN git clone https://github.com/rdkit/rdkit.git
WORKDIR /rdkit
RUN git checkout 7ad9e0d161110f758350ca080be0fc05530bee1e
RUN mkdir build && cd build && cmake -DPy_ENABLE_SHARED=1 \
-DRDK_INSTALL_INTREE=ON \
-DRDK_INSTALL_STATIC_LIBS=OFF \
-DRDK_BUILD_CPP_TESTS=ON \
-DPYTHON_NUMPY_INCLUDE_PATH="$(python -c 'import numpy ; print(numpy.get_include())')" \
-DBOOST_ROOT="$CONDA_PREFIX" \
.. && make && make install
WORKDIR /

# Install GLN (this relies on `CUDA_HOME` being set correctly).
RUN git clone https://github.com/Hanjun-Dai/GLN.git
WORKDIR /GLN
RUN git checkout b5bd7b181a61a8289cc1d1a33825b2c417bed0ef
RUN pip install -e .

ENV PYTHONPATH=$PYTHONPATH:/rdkit:/GLN
ENV LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/rdkit/lib
26 changes: 26 additions & 0 deletions syntheseus/reaction_prediction/environments/gln/environment.yml
@@ -0,0 +1,26 @@
name: gln-env
channels:
- conda-forge
- pytorch
dependencies:
- cudatoolkit=10.0
- cudatoolkit-dev=10
- python=3.7
- pytorch==1.2.0
- scipy
- tqdm
# Dependencies below are needed to build `rdkit` from source:
- boost
- boost-cpp
- cairo
- cmake
- eigen
- gxx_linux-64
- pillow
- pkg-config
- py-boost
- pip:
- torch-cluster==1.4.5
- torch-geometric==1.3.2
- torch-scatter==1.4.0
- torch-sparse==0.4.3
11 changes: 11 additions & 0 deletions syntheseus/reaction_prediction/environments/setup_chemformer.sh
@@ -0,0 +1,11 @@
#!/bin/bash

# Install extra dependencies specific to Chemformer.
pip install pytorch-lightning==1.9.4 git+https://github.com/MolecularAI/pysmilesutils.git

export GITHUB_ORG_NAME=MolecularAI
export GITHUB_REPO_NAME=Chemformer
export GITHUB_REPO_DIR=chemformer
export GITHUB_COMMIT_ID=6333badcd4e1d92891d167426c96c70f5712ecc3

source setup_shared.sh
12 changes: 12 additions & 0 deletions syntheseus/reaction_prediction/environments/setup_local_retro.sh
@@ -0,0 +1,12 @@
#!/bin/bash

# Install extra dependencies specific to LocalRetro.
conda install dgl-cuda11.3 -c dglteam -y
pip install dgllife chardet

export GITHUB_ORG_NAME=kaist-amsg
export GITHUB_REPO_NAME=LocalRetro
export GITHUB_REPO_DIR=local_retro
export GITHUB_COMMIT_ID=7dab59f7f85eca8b1c04c18fe8575fb1568ff7ae

source setup_shared.sh
11 changes: 11 additions & 0 deletions syntheseus/reaction_prediction/environments/setup_megan.sh
@@ -0,0 +1,11 @@
#!/bin/bash

# Install extra dependencies specific to MEGAN.
pip install gin-config==0.3.0 tensorflow==2.13.0 torchtext==0.13.1

export GITHUB_ORG_NAME=molecule-one
export GITHUB_REPO_NAME=megan
export GITHUB_REPO_DIR=$GITHUB_REPO_NAME
export GITHUB_COMMIT_ID=bd6179e42052521e46728adb2bb80dea6905bf40

source setup_shared.sh
@@ -0,0 +1,8 @@
#!/bin/bash

# Install extra dependencies specific to MHNreact.
conda install rdchiral_cpp -c conda-forge -y
pip install scikit-learn scipy swifter tqdm wandb

# Install our fork of the open-source MHNreact code, which includes some efficiency improvements.
pip install git+https://github.com/kmaziarz/mhn-react.git
8 changes: 8 additions & 0 deletions syntheseus/reaction_prediction/environments/setup_retro_knn.sh
@@ -0,0 +1,8 @@
#!/bin/bash

# Set up LocalRetro first, which RetroKNN depends on.
source setup_local_retro.sh

# Install extra dependencies specific to RetroKNN.
conda install faiss-gpu -c pytorch -y
pip install torch-scatter -f https://data.pyg.org/whl/torch-1.10.0+cu113.html
11 changes: 11 additions & 0 deletions syntheseus/reaction_prediction/environments/setup_root_aligned.sh
@@ -0,0 +1,11 @@
#!/bin/bash

# Install extra dependencies specific to RootAligned.
pip install OpenNMT-py==2.2.0 textdistance==4.2.2

export GITHUB_ORG_NAME=otori-bird
export GITHUB_REPO_NAME=retrosynthesis
export GITHUB_REPO_DIR=root_aligned # Override the repository name to make it less ambiguous.
export GITHUB_COMMIT_ID=ea3b5729752fdc319b18ea4c65c1a573e24d7320

source setup_shared.sh
16 changes: 16 additions & 0 deletions syntheseus/reaction_prediction/environments/setup_shared.sh
@@ -0,0 +1,16 @@
#!/bin/bash

# Make a subdirectory for storing downloaded external repositories.
mkdir -p external

# Add the `external/` directory to `PYTHONPATH` when the environment is activated.
mkdir -p $CONDA_PREFIX/etc/conda/activate.d
echo "export PYTHONPATH=$PWD/external:.:$PYTHONPATH" >> $CONDA_PREFIX/etc/conda/activate.d/env_vars.sh
source $CONDA_PREFIX/etc/conda/activate.d/env_vars.sh

export GITHUB_NAME="$GITHUB_ORG_NAME/$GITHUB_REPO_NAME"
export MODEL_DIR="external/$GITHUB_REPO_DIR"

echo "Setting up $GITHUB_NAME under $MODEL_DIR"
git -C external clone "https://github.com/$GITHUB_NAME.git" $GITHUB_REPO_DIR
git -C $MODEL_DIR checkout $GITHUB_COMMIT_ID
10 changes: 5 additions & 5 deletions syntheseus/reaction_prediction/inference/chemformer.py
@@ -34,15 +34,15 @@ def __init__(
# There should be exactly one `*.ckpt` file under `model_dir`.
chkpt_path = get_unique_file_in_dir(model_dir, pattern="*.ckpt")

import Chemformer
import chemformer

# Fix for Chemformer's relative imports.
chemformer_root_dir = get_module_path(Chemformer)
chemformer_root_dir = get_module_path(chemformer)
sys.path.insert(0, chemformer_root_dir)

import Chemformer.molbart.util as util
from Chemformer.molbart.decoder import DecodeSampler
from Chemformer.molbart.models.pre_train import BARTModel
import chemformer.molbart.util as util
from chemformer.molbart.decoder import DecodeSampler
from chemformer.molbart.models.pre_train import BARTModel

self._is_forward = is_forward
self.device = device
18 changes: 9 additions & 9 deletions syntheseus/reaction_prediction/inference/local_retro.py
@@ -31,15 +31,15 @@ def __init__(self, model_dir: Union[str, Path], device: str = "cuda:0") -> None:
- `model_dir/data` contains `*.csv` data files needed by LocalRetro
"""

import LocalRetro
from LocalRetro import scripts
import local_retro
from local_retro import scripts

# We need to hack `sys.path` because LocalRetro uses relative imports.
sys.path.insert(0, get_module_path(LocalRetro))
sys.path.insert(0, get_module_path(local_retro))
sys.path.insert(0, get_module_path(scripts))

from LocalRetro.Retrosynthesis import load_templates
from LocalRetro.scripts.utils import init_featurizer, load_model
from local_retro.Retrosynthesis import load_templates
from local_retro.scripts.utils import init_featurizer, load_model

data_dir = Path(model_dir) / "data"
self.args = init_featurizer(
@@ -67,7 +67,7 @@ def get_parameters(self):

def _mols_to_batch(self, mols: List[Molecule]) -> Any:
from dgllife.utils import smiles_to_bigraph
from LocalRetro.scripts.utils import collate_molgraphs_test
from local_retro.scripts.utils import collate_molgraphs_test

graphs = [
smiles_to_bigraph(
@@ -85,8 +85,8 @@ def _mols_to_batch(self, mols: List[Molecule]) -> Any:
def _build_batch_predictions(
self, batch, num_results, inputs, batch_atom_logits, batch_bond_logits
):
from LocalRetro.scripts.Decode_predictions import get_k_predictions
from LocalRetro.scripts.get_edit import combined_edit, get_bg_partition
from local_retro.scripts.Decode_predictions import get_k_predictions
from local_retro.scripts.get_edit import combined_edit, get_bg_partition

graphs, nodes_sep, edges_sep = get_bg_partition(batch)
start_node = 0
@@ -135,7 +135,7 @@ def _build_batch_predictions(

def __call__(self, inputs: List[Molecule], num_results: int) -> List[BackwardPredictionList]:
import torch
from LocalRetro.scripts.utils import predict
from local_retro.scripts.utils import predict

batch = self._mols_to_batch(inputs)
batch_atom_logits, batch_bond_logits, _ = predict(self.args, self.model, batch)
2 changes: 1 addition & 1 deletion syntheseus/reaction_prediction/inference/retro_knn.py
@@ -53,7 +53,7 @@ def load_data_store(path: Path):
self.adapter.eval()

def _forward_localretro(self, bg):
from LocalRetro.scripts.model_utils import pair_atom_feats, unbatch_feats, unbatch_mask
from local_retro.scripts.model_utils import pair_atom_feats, unbatch_feats, unbatch_mask

bg = bg.to(self.args["device"])
node_feats = bg.ndata.pop("h").to(self.args["device"])
11 changes: 6 additions & 5 deletions syntheseus/reaction_prediction/inference/root_aligned.py
@@ -50,9 +50,10 @@
for key, value in opt_from_config.items():
setattr(opt, key, value)
opt.models = [get_unique_file_in_dir(model_dir, pattern="*.pt")]
opt.output = "/dev/null"
setattr(opt, "synthon", False)

import score
from root_aligned import score

score.opt = opt

@@ -79,7 +80,7 @@ def get_parameters(self):

def _mols_to_batch(self, inputs) -> List[bytes]:
"""Map `Molecule`s into SMILES bytes."""
from score import smi_tokenizer
from root_aligned.score import smi_tokenizer

# Example outcome: b'C C ( = O ) c 1 c c c 2 c ( c c n 2 C ( = O ) O C ( C ) ( C ) C ) c 1\n'.
return [bytes(smi_tokenizer(input.smiles) + "\n", "utf-8") for input in inputs]
@@ -151,7 +152,7 @@ def __call__(self, inputs, num_results: int, random_augmentation=False) -> List[
randomized_mol = Molecule(smiles=randomized_smi, canonicalize=False)
augmented_inputs.append(randomized_mol)
else:
from preprocessing.generate_PtoR_data import clear_map_canonical_smiles
from root_aligned.preprocessing.generate_PtoR_data import clear_map_canonical_smiles

for input in inputs:
product_atom_map_numbers = [i + 1 for i in range(input.rdkit_mol.GetNumAtoms())]
@@ -203,7 +204,7 @@ def __call__(self, inputs, num_results: int, random_augmentation=False) -> List[
for j in range(len(augmented_predictions[i])):
lines.append(augmented_predictions[i][j].replace(" ", ""))

from score import canonicalize_smiles_clear_map
from root_aligned.score import canonicalize_smiles_clear_map

raw_predictions = []
pool = multiprocessing.Pool(multiprocessing.cpu_count())
@@ -227,7 +228,7 @@
ranked_results = [] # shape: `[data_size, augmentation_size x beam_size]`
ranked_scores = []

from score import compute_rank
from root_aligned.score import compute_rank

for i in range(len(predictions)):
rank, _ = compute_rank(predictions[i])
2 changes: 1 addition & 1 deletion syntheseus/reaction_prediction/models/retro_knn.py
@@ -59,7 +59,7 @@ def __init__(self, dim, k=32):
nn.init.constant_(self.edge_proj.bias[0], 10.0)

def forward(self, g, nfeat, efeat, ndist, edist):
from LocalRetro.scripts.model_utils import pair_atom_feats
from local_retro.scripts.model_utils import pair_atom_feats

efeat = reorder_efeat(g, efeat)
x = self.gnn(g, nfeat, efeat)