ML Training and Inference for Synthetic Fermentation

Preprint: Predicting Three-Component Reaction Outcomes from 40k Miniaturized Reactant Combinations

For code used to collect experimental data, see this repository.

Installation

conda env create -f environment.yaml

or (if you don't have a suitable GPU):

conda env create -f environment_cpuonly.yaml

Notes:

The environment.yaml file is written for a workstation with an NVIDIA GPU. To run on CPU only, use environment_cpuonly.yaml instead; it does not install CUDA and installs the CPU-only versions of pytorch and dgl.
Installing the dgl dependency through conda sometimes fails with packages reported as "not found" even though they exist in the specified channels. If that happens, try installing dgl separately with pip:
```
pip install dgl -f https://data.dgl.ai/wheels/repo.html
```
On some systems with outdated libraries (such as university clusters), the dgl wheels may not work and you may need to build dgl from source. See the DGL installation guide for more information.
There is a known issue between pytorch and the 2024.1.x versions of the mkl dependency. If an ImportError [...] undefined symbol: iJIT_NotifyEvent occurs, downgrade with conda install mkl=2024.0.
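
After creating the environment, a quick import check can surface the problems listed in the notes (a missing dgl install, or the mkl-related ImportError) before any training starts. The following is a minimal sketch, not part of this repository: the function name check_environment and the report format are made up for illustration. It only relies on torch.cuda.is_available(), which exists in PyTorch.

```python
import importlib
import importlib.util


def check_environment(packages=("torch", "dgl")):
    """Try to import each package and report its version or the failure.

    Hypothetical helper for illustration; not part of this repository.
    """
    report = {}
    modules = {}
    for name in packages:
        if importlib.util.find_spec(name) is None:
            report[name] = "not installed"
            continue
        try:
            modules[name] = importlib.import_module(name)
        except Exception as exc:  # e.g. ImportError: undefined symbol ...
            report[name] = f"import failed: {exc}"
        else:
            report[name] = getattr(modules[name], "__version__", "unknown version")

    # CUDA is only reported as available if torch imported and can see a GPU.
    torch = modules.get("torch")
    report["cuda"] = bool(torch.cuda.is_available()) if torch is not None else False
    return report


if __name__ == "__main__":
    for key, value in check_environment().items():
        print(f"{key}: {value}")
```

With the CPU-only environment, the cuda entry is expected to be False; that is not an error.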

Log in to WandB

We track training runs with WandB. Before starting any training runs, you need to log in to WandB by running

wandb login

then supply your API key.
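
To confirm that credentials are in place before launching a long run, a small check like the one below may help. This is a sketch under assumptions: it relies on the WandB client accepting the key through the WANDB_API_KEY environment variable and on wandb login storing the key in ~/.netrc under the host api.wandb.ai. The function name has_wandb_credentials is hypothetical and not part of this repository.

```python
import netrc
import os
from pathlib import Path


def has_wandb_credentials(env=None, netrc_path=None):
    """Return True if a WandB API key appears to be configured.

    Hypothetical helper. Assumes the key is either in the WANDB_API_KEY
    environment variable or in a netrc entry for api.wandb.ai.
    """
    env = os.environ if env is None else env
    if env.get("WANDB_API_KEY"):
        return True

    path = Path(netrc_path) if netrc_path else Path.home() / ".netrc"
    try:
        entry = netrc.netrc(str(path)).authenticators("api.wandb.ai")
    except (OSError, netrc.NetrcParseError):
        # No readable netrc file, so no stored login.
        return False
    return entry is not None


if __name__ == "__main__":
    if not has_wandb_credentials():
        print("No WandB credentials found; run `wandb login` first.")
```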

Training models

The run.py script is the entrypoint for training models. It is configured through command-line arguments, including the path to a configuration file with model hyperparameters. See config/config_example.yaml for an example configuration file.

To see the full list of command-line arguments, run:

python run.py train --help

Predicting using trained models

The inference.py script is the entrypoint for predicting reaction outcomes. It expects a CSV file with three columns: initiator, monomer, terminator. See config/config_example.yaml for an example configuration file.

Run it with:

python inference.py -i example_reactants.csv -o out.csv

or use python inference.py --help for more information.
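
An input file with the three columns described above can be generated programmatically. The sketch below is illustrative only: the helper write_reactants_csv is hypothetical, and the cell values are placeholders, since the expected value format (for example building-block identifiers or SMILES strings) is not specified here. Check python inference.py --help for what the script accepts.

```python
import csv

# Column names taken from the description of the expected input file.
COLUMNS = ["initiator", "monomer", "terminator"]


def write_reactants_csv(path, rows):
    """Write reactant combinations to a CSV with the three expected columns.

    Hypothetical helper for illustration; not part of this repository.
    Each row is a dict with the keys initiator, monomer, and terminator.
    """
    with open(path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=COLUMNS)
        writer.writeheader()
        for i, row in enumerate(rows):
            missing = [c for c in COLUMNS if c not in row]
            if missing:
                raise ValueError(f"row {i} is missing columns: {missing}")
            writer.writerow({c: row[c] for c in COLUMNS})


if __name__ == "__main__":
    # Placeholder values; replace with real reactants.
    combinations = [
        {"initiator": "INITIATOR_1", "monomer": "MONOMER_1", "terminator": "TERMINATOR_1"},
        {"initiator": "INITIATOR_2", "monomer": "MONOMER_1", "terminator": "TERMINATOR_3"},
    ]
    write_reactants_csv("example_reactants.csv", combinations)
```

The resulting file can then be passed to inference.py with the -i option shown above.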

Development

We use nbstripout to remove output from notebooks before committing to the repository. Install with:

conda install -c conda-forge nbstripout # or pip install nbstripout
nbstripout --install  # configures git filters and attributes for this repo

We use pre-commit hooks to ensure standardized code formatting. Install with:

pre-commit install
