This package is aimed at molecular descriptor generation, data processing, model training, and hyper-parameter optimization.
- Brings together the molecular descriptor generation methods provided by different tools and packages, including RDKit, CDK, Open Babel, PubChem, DeepChem, etc., and makes batch generation easy.
- Data pre-processing and splitting.
- Model training and hyperparameter optimization by leveraging scikit-learn, XGBoost, and LightGBM. More machine learning and neural network methods will be included/wrapped in the future.
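
The splitting, training, and hyperparameter optimization steps listed above can be sketched roughly as follows. This is a minimal illustration that calls scikit-learn directly, not SPOC's own API (which is not shown here). The random matrix `X` is a synthetic stand-in for a descriptor table, and the target `y`, the parameter grid, and all variable names are assumptions made for the example.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic stand-in for a descriptor table: 200 "molecules" x 8 "descriptors".
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 8))
# Hypothetical target that depends on the first two descriptors plus noise.
y = 2.0 * X[:, 0] - X[:, 1] + rng.normal(scale=0.1, size=200)

# Data splitting: hold out 20% of the rows for the final evaluation.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Hyperparameter optimization: exhaustive grid search with 3-fold cross-validation.
param_grid = {"n_estimators": [50, 100], "max_depth": [3, None]}
search = GridSearchCV(
    RandomForestRegressor(random_state=0), param_grid, cv=3, scoring="r2"
)
search.fit(X_train, y_train)

# The best configuration is refit on the full training split by default,
# so the search object can score the held-out rows directly.
best_params = search.best_params_
test_r2 = search.score(X_test, y_test)
print(best_params, round(test_r2, 3))
```

In a real workflow, `X` would be replaced by descriptors computed from molecules, and the estimator could be swapped for an XGBoost or LightGBM model that follows the scikit-learn estimator interface.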
SPOC currently supports Python >= 3.6 and requires these packages in all cases.
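
As a quick sanity check before installing, the interpreter version can be compared against the stated minimum. The snippet below is a small sketch using only the standard library; the helper name `python_is_supported` is hypothetical and not part of SPOC.

```python
import sys

# Minimum interpreter version stated above.
MIN_PYTHON = (3, 6)


def python_is_supported(version_info=sys.version_info, minimum=MIN_PYTHON):
    """Return True when the (major, minor) version meets the stated minimum."""
    return tuple(version_info[:2]) >= minimum


if not python_is_supported():
    # str.format is used instead of an f-string so that the message can still
    # be shown on interpreters older than 3.6.
    raise SystemExit(
        "Python >= {}.{} is required, found {}.{}".format(
            MIN_PYTHON[0], MIN_PYTHON[1], sys.version_info[0], sys.version_info[1]
        )
    )
```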
# Clone project
git clone git@github.com:WhitestoneYang/spoc.git # or other released or tagged version.
# conda installation
bash -i conda_installation.sh
# docker build
docker build --progress=plain -t spoc .
# docker run
docker run -v $(pwd):/workspace/ --network host -it spoc
- Please refer to the tests for descriptor generation examples, covering both single and multiple molecular descriptor generation.
- Please refer to the examples for the 1) molecular descriptor generation; 2) data processing; 3) model training; 4) hyperparameter optimization workflow.
If you have used SPOC in your research, please cite our paper.