Predict optical properties of molecules with machine learning.
A Google Colab notebook is available here with examples of using the various types of models and predictions. Alternatively, you may use the command line instructions below.
- Install Anaconda or Miniconda if you have not yet done so.
git clone git@github.com:learningmatter-mit/uvvisml.git
cd uvvisml
conda env create -f environment.yml
cd uvvisml
bash get_model_files.sh
(This downloads trained model files from Zenodo.)
conda activate uvvisml
pip install chemprop
To make predictions, specify a --test_file with the dyes or dye-solvent pairs for which you wish to predict properties. This should be a CSV with one dye (for vacuum TD-DFT predictions) or dye-solvent pair (for experimental predictions) per line. For example, the test file for vacuum TD-DFT predictions could be:
smiles
CCN(CC)c1ccc2c(C(F)(F)F)cc(=O)oc2c1
CCN(CC)c1ccc2c(C(F)(F)F)cc(=O)oc2c1
CCN(CC)c1ccc2c(C(F)(F)F)cc(=O)oc2c1
CCN(CC)c1ccc2c(C(F)(F)F)cc(=O)oc2c1
CCN(CC)c1ccc2c(C(F)(F)F)cc(=O)oc2c1
CCN(CC)c1ccc2cc(-c3nc4ccccc4n3C)c(=O)oc2c1
C[SiH](C)c1cccc2ccccc12
The test file for experimental predictions could be:
smiles,solvent
CCN(CC)c1ccc2c(C(F)(F)F)cc(=O)oc2c1,C1CCCCC1
CCN(CC)c1ccc2c(C(F)(F)F)cc(=O)oc2c1,CCOC(C)=O
CCN(CC)c1ccc2c(C(F)(F)F)cc(=O)oc2c1,CC#N
CCN(CC)c1ccc2c(C(F)(F)F)cc(=O)oc2c1,CCO
CCN(CC)c1ccc2c(C(F)(F)F)cc(=O)oc2c1,OCC(O)CO
CCN(CC)c1ccc2cc(-c3nc4ccccc4n3C)c(=O)oc2c1,CC#N
C[SiH](C)c1cccc2ccccc12,C1CCCCC1
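The two test-file layouts above can also be generated programmatically. The following is a minimal sketch using Python's standard csv module. The helper name write_test_file and the output file names are hypothetical and are not part of this repository.

```python
import csv


def write_test_file(path, rows, with_solvent):
    """Write a test CSV with a 'smiles' column and, optionally, a 'solvent' column."""
    header = ["smiles", "solvent"] if with_solvent else ["smiles"]
    with open(path, "w", newline="") as f:
        writer = csv.writer(f, lineterminator="\n")
        writer.writerow(header)
        for row in rows:
            # Keep only the dye SMILES when no solvent column is wanted.
            writer.writerow(list(row) if with_solvent else [row[0]])


# (dye SMILES, solvent SMILES) pairs taken from the example files above.
pairs = [
    ("CCN(CC)c1ccc2c(C(F)(F)F)cc(=O)oc2c1", "C1CCCCC1"),
    ("CCN(CC)c1ccc2cc(-c3nc4ccccc4n3C)c(=O)oc2c1", "CC#N"),
    ("C[SiH](C)c1cccc2ccccc12", "C1CCCCC1"),
]

write_test_file("expt_test.csv", pairs, with_solvent=True)    # experimental predictions
write_test_file("tddft_test.csv", pairs, with_solvent=False)  # vacuum TD-DFT predictions
```

Either file can then be passed to --test_file.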
- Experimental peak wavelength of maximum absorption:
--property absorption_peak_nm_expt
- Vertical excitation energy with maximum oscillator strength in vacuum TD-DFT:
--property vertical_excitation_eV_tddft
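The two properties are reported in different units (nm and eV). To put them on a common scale, the standard photon-energy relation E = hc/λ, with hc ≈ 1239.84 eV·nm, converts between the two. The sketch below is illustrative only: the function names are hypothetical and not part of this repository, and a unit conversion does not make a vacuum TD-DFT value directly equivalent to an experimental peak measured in solvent.

```python
HC_EV_NM = 1239.841984  # Planck constant times speed of light, in eV*nm


def nm_to_ev(wavelength_nm):
    """Convert a wavelength in nm to a photon energy in eV."""
    return HC_EV_NM / wavelength_nm


def ev_to_nm(energy_ev):
    """Convert a photon energy in eV to a wavelength in nm."""
    return HC_EV_NM / energy_ev


print(round(nm_to_ev(400.0), 3))
print(round(ev_to_nm(3.0), 1))
```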
- Single-fidelity (experiment or TD-DFT):
--method chemprop
- Multi-fidelity (experiment only):
--method chemprop_tddft
- Experiment:
--train_dataset combined (default) or --train_dataset deep4chem
- TD-DFT:
--train_dataset all_wb97xd3
Cluster that the script will be run on. Includes options for the Supercloud and Engaging clusters at MIT. The default, None, runs the script on the local machine.
Output the ensemble variance (a measure of epistemic uncertainty) in predictions using --uncertainty_method ensemble_variance.
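To illustrate what an ensemble variance is, the sketch below computes the per-molecule mean and variance across the predictions of several ensemble members. It is a conceptual example, not the repository's implementation: the prediction values are made up for illustration, and the use of the population variance (rather than the sample variance) is an assumption.

```python
from statistics import fmean, pvariance

# Hypothetical predictions (nm): one inner list per ensemble member,
# one value per molecule. These numbers are invented for illustration.
member_preds = [
    [410.0, 520.0],
    [414.0, 510.0],
    [412.0, 530.0],
]

# Regroup so that each tuple holds all members' predictions for one molecule.
per_molecule = list(zip(*member_preds))

ensemble_mean = [fmean(p) for p in per_molecule]
ensemble_var = [pvariance(p) for p in per_molecule]  # assumption: population variance

for mean, var in zip(ensemble_mean, ensemble_var):
    print(f"mean={mean:.1f} nm, variance={var:.2f} nm^2")
```

A larger variance means the ensemble members disagree more about that molecule, which is why it serves as a proxy for epistemic uncertainty.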
python uvvisml/predict.py --test_file uvvisml/data/splits/lambda_max_abs/deep4chem/group_by_smiles/smiles_target_test.csv --property absorption_peak_nm_expt --method chemprop --preds_file test_preds.csv
python uvvisml/predict.py --test_file uvvisml/data/splits/lambda_max_abs/deep4chem/group_by_smiles/smiles_target_test.csv --property vertical_excitation_eV_tddft --method chemprop --preds_file test_preds.csv
python uvvisml/predict.py --test_file uvvisml/data/splits/lambda_max_abs/deep4chem/group_by_smiles/smiles_target_test.csv --property absorption_peak_nm_expt --method chemprop --preds_file test_preds.csv --train_dataset deep4chem
python uvvisml/predict.py --test_file uvvisml/data/splits/lambda_max_abs/deep4chem/group_by_smiles/smiles_target_test.csv --property absorption_peak_nm_expt --method chemprop_tddft --preds_file test_preds.csv --log_level info
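The same commands can be assembled from Python, for example when looping over several properties or methods. The sketch below only builds the argument list from the flags documented above and does not run anything. build_predict_command is a hypothetical helper, not part of this repository, and my_test.csv is a placeholder file name.

```python
import shlex


def build_predict_command(test_file, prop, method, preds_file,
                          train_dataset=None, uncertainty_method=None):
    """Build the argument list for uvvisml/predict.py from documented flags."""
    cmd = [
        "python", "uvvisml/predict.py",
        "--test_file", test_file,
        "--property", prop,
        "--method", method,
        "--preds_file", preds_file,
    ]
    # Optional flags are appended only when explicitly requested.
    if train_dataset is not None:
        cmd += ["--train_dataset", train_dataset]
    if uncertainty_method is not None:
        cmd += ["--uncertainty_method", uncertainty_method]
    return cmd


cmd = build_predict_command(
    "my_test.csv",  # placeholder test file
    "absorption_peak_nm_expt",
    "chemprop_tddft",
    "test_preds.csv",
    uncertainty_method="ensemble_variance",
)
print(shlex.join(cmd))
```

The resulting list could be passed to subprocess.run(cmd, check=True) from the directory where the commands above are run.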
Please see the Data README for details on the sources and processing of the data used in this repository.
If you use this code, please cite the following manuscript:
@article{greenman2022multi,
title={Multi-fidelity prediction of molecular optical peaks with deep learning},
author={Greenman, Kevin P. and Green, William H. and G{\'{o}}mez-Bombarelli, Rafael},
journal={Chemical Science},
year={2022},
volume={13},
issue={4},
pages={1152-1162},
publisher={The Royal Society of Chemistry},
doi={10.1039/D1SC05677H},
url={http://dx.doi.org/10.1039/D1SC05677H}
}
The code for reproducing the results and figures from the above paper is available on Zenodo.