Quantitative Prediction of Physicochemical Properties of Deep Eutectic Solvents Using Machine Learning
Deep eutectic solvents (DESs) have emerged as a promising eco-friendly alternative to environmentally hazardous organic solvents. DESs are a new class of mixtures characterized by a melting point lower than that of the pure starting materials, a phenomenon explained by the formation of hydrogen bonds. However, selecting a suitable DES for a specific application has traditionally been challenging and time-consuming. This study presents an approach to optimizing DES design: a machine learning workflow that predicts three fundamental physicochemical properties (melting point, density, and viscosity) that are critical to determining the application scope of a DES. Our models, particularly gradient-boosted trees such as CatBoost, showed strong predictive accuracy, with cross-validation R² values of 0.76, 0.89, and 0.64 for melting point, density, and viscosity, respectively. More importantly, we created a web resource, DESignSolvents, which hosts an extensive database of the properties of two- and three-component DESs together with the associated prediction models. This resource can help accelerate the development of DESs for various applications, the adoption of green chemistry practices, and the global transition to sustainable solvent usage.
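For readers unfamiliar with the metric quoted above, the following is a minimal sketch of how a cross-validated R² can be computed. It is not the code used in this work: the one-feature least-squares line stands in for a real model such as CatBoost, the data are made up, and the function names (`r2_score`, `fit_line`, `cross_val_r2`) are illustrative. It also assumes that out-of-fold predictions are pooled and scored once; averaging per-fold scores is an equally common convention, and the text does not say which was used.

```python
from statistics import mean


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_bar = mean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - y_bar) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def fit_line(xs, ys):
    """Ordinary least squares for y = a*x + b (stand-in for a real model)."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    a = sxy / sxx
    return a, y_bar - a * x_bar


def cross_val_r2(xs, ys, k=5):
    """Collect out-of-fold predictions from k folds and score them once."""
    n = len(xs)
    preds = [0.0] * n
    for fold in range(k):
        test_idx = set(range(fold, n, k))  # every k-th sample is held out
        train_x = [xs[i] for i in range(n) if i not in test_idx]
        train_y = [ys[i] for i in range(n) if i not in test_idx]
        a, b = fit_line(train_x, train_y)  # fit on the training folds only
        for i in test_idx:
            preds[i] = a * xs[i] + b       # predict the held-out samples
    return r2_score(ys, preds)


# Toy, made-up data with an exactly linear relation
xs = [float(i) for i in range(20)]
ys = [2.0 * x + 1.0 for x in xs]
score = cross_val_r2(xs, ys, k=5)
```

Because every prediction comes from a model that never saw the corresponding sample, the score reflects performance on unseen data rather than the fit to the training set.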
Each folder in this repository contains files specific to one of the three physicochemical properties predicted in this work.
CSV files contain the data for the corresponding physicochemical property and the dataset used to train the machine learning models.
DataAnalysis notebook contains the main processing steps, such as SMILES parsing, removal of duplicates, and exploratory data analysis.
MiningDescriptors file contains the code for calculating descriptors using RDKit.
ML notebook contains the code for optimizing and selecting the best machine learning model.
BEST_MODEL notebook contains the optimized, top-performing ML model. The models are also available in the PKL format.
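A model saved in the PKL format can typically be restored with Python's standard `pickle` module. The sketch below is an assumption-laden illustration, not the repository's own loading code: the file name `example_model.pkl` is hypothetical, and `MeanRegressor` is a stand-in class used only so that the example runs on its own. Loading the actual files would also require the library the model was trained with to be installed, along with input features in the layout the model expects, which this sketch does not cover.

```python
import pickle
import tempfile
from pathlib import Path


class MeanRegressor:
    """Stand-in model (hypothetical); the real PKL files hold trained models."""

    def fit(self, X, y):
        self.mean_ = sum(y) / len(y)
        return self

    def predict(self, X):
        # Always predicts the training mean, one value per input row
        return [self.mean_ for _ in X]


def load_model(path):
    """Restore a pickled model. Only unpickle files from a trusted source."""
    with open(path, "rb") as fh:
        return pickle.load(fh)


with tempfile.TemporaryDirectory() as tmp:
    model_path = Path(tmp) / "example_model.pkl"  # hypothetical file name

    # Save a fitted stand-in model so there is something to load
    with open(model_path, "wb") as fh:
        pickle.dump(MeanRegressor().fit([[0.0], [1.0]], [10.0, 20.0]), fh)

    model = load_model(model_path)
    predictions = model.predict([[0.5], [2.0]])
```

Unpickling can execute arbitrary code, so `pickle.load` should only be used on files obtained from a source you trust.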