Chemsy - A Minimalistic Automatic Framework for Chemometrics and Machine Learning

About The Project

Chemsy is a lightweight and flexible automatic framework for chemometrics and machine learning. Its methods mainly target spectroscopic data and industrial process data analysis. Chemsy provides a structured, customizable and minimalistic framework for automatic pre-processing search. Its syntax follows the widely used sklearn library, so any algorithm or method that follows the sklearn syntax can be used in Chemsy. Chemsy supports freedom, open source and software accessibility for all chemometricians, machine learning engineers and data scientists.

Chemsy serves as a framework for automatic pre-processing and modelling.

Future Release

We will provide support for explainable AI (xAI) and hybrid AI (hAI) in the near future. Specific model interpretation tools are under internal review for a future release.

Install a Stable Version

pip install Chemsy

Install the Most Updated Version (Recommended)

Install on Google Colab:

In a Colab code block:

!pip install git+https://github.com/tsyet12/Chemsy --quiet

Install on local python environment:

In an environment terminal or CMD:

pip install git+https://github.com/tsyet12/Chemsy --quiet

Current support for algorithms

Automatic pre-processing search with support for:

Partial Least Squares with Cross Validation
Savitzky–Golay filter
Asymmetric Least Squares (AsLS) Baseline Correction
Modified Polynomial Baseline Correction
Improved Modified Polynomial Baseline Correction
Zhang Fit Baseline Correction
Linear Baseline Correction
Second Order Baseline Correction
Multiplicative Scatter Correction
First Derivative
Second Derivative
Standard Normal Variate
Robust Normal Variate
Standard Scaler (also known as Autoscaling)
Min Max Scaler
Any other algorithms with sklearn syntax can be used directly
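
As a rough illustration of what "sklearn syntax" means here, the sketch below defines a small transformer that exposes fit and transform by inheriting from sklearn's BaseEstimator and TransformerMixin. The class name RowCentering and the demo arrays are hypothetical and not part of Chemsy. The example only shows the transformer working inside a plain sklearn pipeline; that such an object can also be placed in a Chemsy recipe follows from the statement above and is not verified here.

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.linear_model import Ridge
from sklearn.pipeline import make_pipeline


class RowCentering(BaseEstimator, TransformerMixin):
    """Hypothetical transformer: subtract each sample's own mean (row-wise)."""

    def fit(self, X, y=None):
        # Stateless: nothing is learned from the training data.
        return self

    def transform(self, X):
        X = np.asarray(X, dtype=float)
        return X - X.mean(axis=1, keepdims=True)


# Tiny made-up dataset, for demonstration only.
X_demo = np.array([[1.0, 2.0, 3.0],
                   [10.0, 20.0, 60.0],
                   [0.0, 1.0, 5.0],
                   [2.0, 2.0, 8.0]])
y_demo = np.array([1.0, 2.0, 3.0, 4.0])

# fit_transform is supplied by TransformerMixin.
centered = RowCentering().fit_transform(X_demo)

# The transformer can be chained with any sklearn estimator.
model = make_pipeline(RowCentering(), Ridge()).fit(X_demo, y_demo)
```

Inheriting from BaseEstimator also provides get_params and set_params, which sklearn's cloning and cross-validation utilities rely on.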

To see the most up-to-date list of available algorithms:

import chemsy
from chemsy.help import see_methods

# see what preprocessing methods are available
see_methods(chemsy.prep.methods)

# see what prediction methods are available
see_methods(chemsy.predict.methods)

Return:

Preprocessing method supported:
['BaselineASLS', 'BaselineIModPoly', 'BaselineLinear', 'BaselineModpoly', 'BaselineSecondOrder', 'BaselineZhangFit', 'FirstDerivative', 'FunctionTransformer', 'KernelPCA', 'MSC', 'MaxAbsScaler', 'MinMaxScaler', 'PCA', 'PowerTransformer', 'QuantileTransformer', 'RNV', 'RobustScaler', 'SNV', 'SavgolFilter', 'SecondDerivative', 'StandardScaler']

Prediction method supported:
['BayesianRidge', 'DecisionTreeRegressor', 'ElasticNet', 'GaussianProcessRegressor', 'GradientBoostingRegressor', 'KNeighborsRegressor', 'KernelRidge', 'Lasso', 'LinearRegression', 'MLPRegressor', 'PLSRegression', 'PartialLeastSquaresCV', 'RandomForestRegressor', 'Ridge']

Getting Started

Quick evaluation on Google Colab:

For quickstart/evaluation of the functionality, see this Google Colab notebook online.

Quick functionality in 3 Steps:

Import libraries and load dataset

# Import all modules necessary 
import chemsy
from chemsy.explore import SupervisedChemsy
from chemsy.prep.methods import *
from chemsy.predict.methods import *
import numpy as np
import pandas as pd

# Use a default dataset
from sklearn.datasets import load_diabetes
X, Y = load_diabetes(return_X_y=True)

Make a custom recipe

# Make a custom recipe for the method search, all combinations will be evaluated
custom_recipe = {
    "Level 0": [None],
    "Level 1": [MSC(), StandardScaler(), MinMaxScaler(), RobustScaler()],
    "Level 2": [PowerTransformer(), QuantileTransformer(output_distribution='normal', random_state=0), PCA(n_components='mle')],
    "Level 3": [PartialLeastSquaresCV(), Lasso()],
}
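
To make "all combinations will be evaluated" concrete, here is a minimal sketch that enumerates the combinations of a recipe with itertools.product. It is an illustration only and not Chemsy's internal implementation: plain strings stand in for the estimator objects, the helper enumerate_pipelines is hypothetical, and skipping None steps is an assumption based on the three-step method labels in the results shown further down.

```python
from itertools import product

# Method names stand in for the estimator objects of the recipe above.
recipe = {
    "Level 0": [None],
    "Level 1": ["MSC", "StandardScaler", "MinMaxScaler", "RobustScaler"],
    "Level 2": ["PowerTransformer", "QuantileTransformer", "PCA"],
    "Level 3": ["PartialLeastSquaresCV", "Lasso"],
}


def enumerate_pipelines(recipe):
    """Return one ' + '-joined label per combination, skipping None steps."""
    labels = []
    for combo in product(*recipe.values()):
        steps = [name for name in combo if name is not None]
        labels.append(" + ".join(steps))
    return labels


pipelines = enumerate_pipelines(recipe)
print(len(pipelines))  # 24 (1 * 4 * 3 * 2)
print(pipelines[0])    # MSC + PowerTransformer + PartialLeastSquaresCV
```

Each extra option in a level multiplies the number of candidate pipelines, so the search cost grows quickly as a recipe gets richer.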

Search pre-processing methods

# Search pre-processing methods and all combinations
solutions=SupervisedChemsy(X, Y,recipe=custom_recipe)

# Show the results
solutions.get_results(verbose=False)

Return:

Methods	fit_time	score_time	cross_val_MAE	cross_val_MSE	cross_val_R2	cross_val_MBE
StandardScaler + PCA + PartialLeastSquaresCV	0.177647	0.00294271	43.1078	2816.97	0.513709	0.72431
MinMaxScaler + PCA + PartialLeastSquaresCV	0.185936	0.00269322	43.2748	2852.44	0.50761	0.522684
StandardScaler + PCA + Lasso	0.00312543	0.00111251	43.3569	2832.88	0.510979	0.908942
RobustScaler + PCA + PartialLeastSquaresCV	0.221452	0.00257006	43.3624	2832.27	0.51107	0.871943
StandardScaler + PowerTransformer + PartialLeastSquaresCV	0.201116	0.00330443	43.8542	2883.86	0.502165	0.922369
⋮	⋮	⋮	⋮	⋮	⋮	⋮

Example Recipe

A recipe from Engel et al. (2013) for spectroscopic IR data:

Engel_2013 = {
    "Baseline": [None, BaselineSecondOrder(), BaselineSecondOrder(degree=3), BaselineSecondOrder(degree=4), BaselineASLS(), FirstDerivative(), SecondDerivative()],
    "Scatter": [None, MeanScaling(), MedianScaling(), MaxScaling(), L2NormScaling(), RNV(q=0.15), RNV(q=0.25), RNV(q=0.35), MSC()],
    "Noise": [None, SavgolFilter(5,2), SavgolFilter(9,2), SavgolFilter(11,2), SavgolFilter(5,3), SavgolFilter(9,3), SavgolFilter(11,3), SavgolFilter(5,4), SavgolFilter(9,4), SavgolFilter(11,4)],
    "Scaling & Transformations": [MeanCentering(), StandardScaler(), RangeScaling(), ParetoScaling(), PoissonScaling(), LevelScaling()],
    "PLS": [PartialLeastSquaresCV()],
}

Recipe reference:

Engel, J., Gerretzen, J., Szymańska, E., Jansen, J.J., Downey, G., Blanchet, L. and Buydens, L.M., 2013. Breaking with trends in pre-processing?. TrAC Trends in Analytical Chemistry, 50, pp.96-106. https://www.sciencedirect.com/science/article/pii/S0165993613001465
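
For a sense of scale, the size of the search space defined by this recipe can be worked out by multiplying the number of options in each level. The counts below were read off the Engel_2013 dictionary above (None counts as one option); the variable names are illustrative only.

```python
from math import prod

# Number of options per level in the Engel_2013 recipe, counted by hand.
engel_2013_sizes = {
    "Baseline": 7,
    "Scatter": 9,
    "Noise": 10,
    "Scaling & Transformations": 6,
    "PLS": 1,
}

total = prod(engel_2013_sizes.values())
print(total)  # 3780
```

An exhaustive search over this recipe therefore covers 3780 pipelines, each of which is cross-validated.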

Classification

For classification, set the argument "classify" to True:

solutions=SupervisedChemsy(X, Y,recipe=custom_recipe,classify=True)

See this Google Colab for a classification example.

Tutorial/Useful Examples

The tutorials below are Colab examples showing how to use Chemsy with more flexibility:

Tutorial 1: Regression Problem
Tutorial 2: Classification Problem
Tutorial 3: Random Search (Custom Solver 1)
Tutorial 4: Gerretzen Search (Custom Solver 2)
Tutorial 5: Custom Method (To be added)

Reference for search method in Tutorial 4:

Gerretzen, J., Szymańska, E., Jansen, J.J., Bart, J., van Manen, H.J., van den Heuvel, E.R. and Buydens, L.M., 2015. Simple and effective way for data preprocessing selection based on design of experiments. Analytical chemistry, 87(24), pp.12096-12103. https://pubs.acs.org/doi/abs/10.1021/acs.analchem.5b02832

Contributing

Contributions are what make the open source community such an amazing place to learn, inspire, and create. Any contributions you make are greatly appreciated.

Fork the Project
Create your Feature Branch (git checkout -b testbranch/prep)
Commit your Changes (git commit -m 'Improve testbranch/prep')
Push to the Branch (git push origin testbranch/prep)
Open a Pull Request

License

Distributed under the open-source BSD-2-Clause License. See LICENSE for more information.

Contact

Main Developer:

Sin Yong Teng sinyong.teng@ru.nl or tsyet12@gmail.com Radboud University Nijmegen

Contributors:

Testing and Development: Martijn Dingemans martijn.dingemans@ru.nl or martijn.dingemans@gmail.com

Testing and Applications: Maria Cairoli maria.cairoli@ru.nl

Conceptualization: Jeroen J. Jansen jj.jansen@science.ru.nl

Acknowledgements

This project is co-funded by TKI-E&I with the supplementary grant 'TKI-Toeslag' for Topconsortia for Knowledge and Innovation (TKIs) of the Ministry of Economic Affairs and Climate Policy. The authors thank all partners within the project 'Measure for Management (M4M)', managed by the Institute for Sustainable Process Technology (ISPT) in Amersfoort, The Netherlands.

How to cite this software

Teng, S.Y., Dingemans, M., Cairoli, M., Jansen, J. (2021). tsyet12/Chemsy: Chemsy v1.0b (Zenodo). Zenodo. https://doi.org/10.5281/zenodo.5793315
