scikit-mol

Scikit-Learn classes for molecular vectorization using RDKit

The intended usage is to add molecular vectorization directly into scikit-learn pipelines, so that the final model predicts directly on RDKit molecules or SMILES strings.

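As an example, with RDKit mol objects in the lists mol_list_train and mol_list_test, and the target values in y_train and y_test:

from rdkit import Chem
from sklearn.linear_model import Ridge
from sklearn.pipeline import Pipeline
from scikit_mol.fingerprints import MorganFingerprintTransformer
pipe = Pipeline([('mol_transformer', MorganFingerprintTransformer()), ('Regressor', Ridge())])
pipe.fit(mol_list_train, y_train)
pipe.score(mol_list_test, y_test)
pipe.predict([Chem.MolFromSmiles('c1ccccc1C(=O)C')])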

>>> array([4.93858815])

The scikit-learn compatibility should also make it easier to include the fingerprinting step in hyperparameter tuning with scikit-learn's utilities.

The first draft of the project was created at the RDKit UGM 2022 hackathon on 14 October 2022.

Implemented

Descriptors
- MolecularDescriptorTransformer

Fingerprints
- MorganFingerprintTransformer
- MACCSKeysFingerprintTransformer
- RDKitFingerprintTransformer
- AtomPairFingerprintTransformer
- TopologicalTorsionFingerprintTransformer
- MHFingerprintTransformer
- SECFingerprintTransformer
- AvalonFingerprintTransformer

Conversions
- SmilesToMol

Standardizer
- Standardizer

Safeinference
- SafeInferenceWrapper
- set_safe_inference_mode

Utilities
- CheckSmilesSanitazion

Installation

The latest tagged release can be installed from PyPI with pip:

pip install scikit-mol

or from conda-forge

conda install -c conda-forge scikit-mol

The conda-forge package should be updated shortly after each new tagged release on PyPI.

Bleeding edge

pip install git+https://github.com/EBjerrum/scikit-mol.git
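
To check which versions ended up in the active environment after installing, something like the following sketch can be used. It relies only on the Python standard library. The helper name installed_version is hypothetical, and the distribution names are assumed to match the names used on PyPI (conda packages may register their metadata differently):

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(dist_name):
    """Return the installed version string of a distribution, or None if it is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


# Distribution names as assumed from PyPI.
for name in ("scikit-mol", "scikit-learn", "rdkit"):
    print(name, installed_version(name) or "not installed")
```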

Documentation

The notebooks directory contains a collection of notebooks that demonstrate different aspects and use cases.

Contributing

More information about how to contribute to the project is available in CONTRIBUTION.md.

BUGS

There are probably still bugs. Please check the issues on GitHub and report new ones there.

Contributors:

Esben Jannik Bjerrum @ebjerrum, esbenbjerrum+scikit_mol@gmail.com
Carmen Esposito @cespos
Son Ha, sonha@uni-mainz.de
Oh-hyeon Choung, ohhyeon.choung@gmail.com
Andreas Poehlmann, @ap--
Ya Chen, @anya-chen
Rafał Bachorz @rafalbachorz
Adrien Chaton @adrienchaton
@VincentAlexanderScholz
@RiesBen
@enricogandini
@mikemhenry
@c-feldmann
