A curated list of awesome machine learning libraries for marketing. Inspired by both awesome-production-machine-learning and awesome-machine-learning, and created and maintained by Station 10.
Note that some packages could fit into more than one section. This is noted in their descriptions, so use Ctrl + F as well as browsing by section.
Want to contribute? Please raise a Pull Request or an issue. If you find this useful please drop a ⭐️. This helps motivate us and others to update and maintain the list.
All packages are Python-based unless otherwise stated. We welcome contributions from R users!
- ChannelAttribution Python and R library that employs a k-order Markov representation to identify structural correlations in customer journey data.
- fractribution Data-driven Multi-Touch Attribution (MTA) by Google.
- Marketing-Attribution-Models Heuristic and data-driven Multi-Touch Attribution.
- markov-chain-attribution Leverages a first order Markov chain to reallocate conversions.
- mta Various data-driven Multi-Touch Attribution algorithms.
- pychattr Python implementation of the excellent R ChannelAttribution library.
- shapley Shapley Values For Attribution Modelling.
- shapley-attribution-model-zhao-naive Shapley Value Methods for Attribution Modeling (Naive, Set-based).
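Markov-chain attribution (as in markov-chain-attribution or ChannelAttribution) is commonly explained through the removal effect: how much the probability of converting drops when one channel is removed from the chain. The plain-Python sketch below illustrates that idea for a first-order chain. It is not the API of any listed package; the path encoding (each journey ends in a "conv" or "null" state) and all function names are hypothetical.

```python
from collections import defaultdict

# Hypothetical encoding: each journey is a list of channels ending in "conv" or "null".

def transition_probs(paths):
    """Estimate first-order transition probabilities, starting every path at "start"."""
    counts = defaultdict(lambda: defaultdict(int))
    for path in paths:
        states = ["start"] + path
        for a, b in zip(states, states[1:]):
            counts[a][b] += 1
    return {a: {b: n / sum(nxt.values()) for b, n in nxt.items()}
            for a, nxt in counts.items()}

def conversion_prob(probs, removed=None, iters=1000):
    """P(reach "conv" from "start"); a removed channel behaves like "null" (value 0)."""
    value = {"conv": 1.0}
    for _ in range(iters):
        new = {"conv": 1.0}
        for state, nxt in probs.items():
            if state != removed:
                new[state] = sum(p * value.get(t, 0.0) for t, p in nxt.items())
        value = new
    return value.get("start", 0.0)

def markov_attribution(paths):
    """Split observed conversions across channels in proportion to their removal effects."""
    probs = transition_probs(paths)
    base = conversion_prob(probs)  # assumes at least one converting path (base > 0)
    channels = [s for s in probs if s != "start"]
    effects = {c: 1 - conversion_prob(probs, removed=c) / base for c in channels}
    total_effect = sum(effects.values())
    conversions = sum(path[-1] == "conv" for path in paths)
    return {c: conversions * e / total_effect for c, e in effects.items()}

paths = [["A", "B", "conv"], ["A", "null"], ["B", "conv"]]
print(markov_attribution(paths))
```

With these three journeys, removing B drops the conversion probability to zero while removing A only halves it, so B receives twice A's share of the two conversions.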
- CausalImpact (R) Causal Inference using Bayesian structural time-series models by Google.
- causalml Uplift modeling and causal inference with ML by Uber.
- CausalPy Causal Inference & Synthetic Control. Supports fitting with scikit-learn and PyMC models.
- dowhy Causal Inference that supports explicit modeling and testing of causal assumptions.
- SyntheticControlMethods Causal inference using Synthetic Control.
- tfcausalimpact Google's CausalImpact Algorithm implemented on top of TensorFlow Probability.
- upliftml Scalable unconstrained and constrained uplift modeling from experimental data using PySpark and H2O.
- scikit-uplift Uplift modeling Python package that provides fast sklearn-style model implementations, evaluation metrics and visualization tools.
- btyd Buy Till You Die and CLV statistical models in Python.
- lifetimes CLV and Churn modelling. Deprecated and incorporated into pymc-marketing.
- lucius-ltv CLV for subscriptions.
- gapandas4 Python package for querying the Google Analytics Data API for GA4 and displaying the results in a Pandas dataframe.
- EconML AI, Econometrics and Causal Inference modelling.
- statsmodels Statistical modeling including time series and econometrics.
- trimmed_match Ad effectiveness through the design and analysis of randomized Geo Experiments by Google.
- matched_markets Time-based regression matched markets approach for designing Geo Experiments by Google.
- GeoexperimentsResearch (R) Open-source implementation of the geo experiment analysis methodology developed at Google (Archived)
- GeoLift Geo Experimentation methodology based on Synthetic Control Methods used to measure lift of ad campaigns by Facebook.
- BayesianMMM Bayesian Media Mix Modelling with shape and carryover effects.
- dammmdatagen (R) Media Mix Modeling Data Generator.
- lightweight-mmm Bayesian Media Mix Models by Google.
- mamimo Small Media Mix Models designed to be used in conjunction with ML libraries (e.g. SKL)
- mmm-stan Multiplicative Media Mix Model.
- pymc-marketing Bayesian Media Mix (with adstock and saturation), Customer Lifetime Value & Churn models.
- Robyn (R) Bayesian Media Mix Models by Facebook.
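Several of the Media Mix Modelling packages above describe carryover (adstock) and shape (saturation) effects. The sketch below shows one common form of each, geometric adstock and a Hill-type saturation curve, in plain Python. The function names and parameters are illustrative assumptions; each package defines its own transforms and parameterizations (some normalize the adstock weights or use a different saturation curve).

```python
def geometric_adstock(spend, decay):
    """Carryover: each period keeps a `decay` fraction of the previous period's adstock."""
    adstocked, carry = [], 0.0
    for x in spend:
        carry = x + decay * carry
        adstocked.append(carry)
    return adstocked

def hill_saturation(x, half_saturation, slope):
    """Diminishing returns: rises from 0 towards 1 and equals 0.5 at `half_saturation`."""
    return x ** slope / (x ** slope + half_saturation ** slope)

weekly_spend = [100.0, 0.0, 0.0, 50.0]
adstocked = geometric_adstock(weekly_spend, decay=0.5)
response = [hill_saturation(a, half_saturation=80.0, slope=2.0) for a in adstocked]
print(adstocked)  # [100.0, 50.0, 25.0, 62.5]
print(response)
```

Applying adstock before saturation, as here, is one ordering choice; the order of the two transforms is itself a modelling decision.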
- amazon-denseclus Python module for clustering both categorical and numerical data using UMAP and HDBSCAN by Amazon.
- rfm RFM Analysis and Customer Segmentation.
- retentioneering-tools Retentioneering: product analytics, data-driven customer journey map optimization, marketing analytics, web analytics, transaction analytics, graph visualization, and behavioral segmentation
- ecommercetools Data science toolkit for those working in technical ecommerce, marketing science and technical SEO, with a wide range of features to aid analysis and model building.
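RFM analysis (as in the rfm package above) scores each customer on Recency (days since last purchase), Frequency (number of purchases) and Monetary value (total spend). The sketch below computes those three values from a list of transactions and turns each into a rank-based score. The input format, function names and three-bin scoring are assumptions for illustration, not the rfm package's API; tied values are ranked by input order.

```python
from collections import defaultdict
from datetime import date

def rfm_values(transactions, as_of):
    """transactions: iterable of (customer_id, order_date, amount) tuples."""
    last, freq, money = {}, defaultdict(int), defaultdict(float)
    for customer, day, amount in transactions:
        last[customer] = max(last.get(customer, day), day)
        freq[customer] += 1
        money[customer] += amount
    return {c: ((as_of - last[c]).days, freq[c], money[c]) for c in last}

def rank_scores(values, higher_is_better=True, bins=3):
    """Map each customer to a score 1..bins by rank, where `bins` is best."""
    ranked = sorted(values, key=values.get, reverse=not higher_is_better)
    n = len(ranked)
    return {c: i * bins // n + 1 for i, c in enumerate(ranked)}

def rfm_scores(transactions, as_of, bins=3):
    table = rfm_values(transactions, as_of)
    # Fewer days since the last purchase is better, so recency is ranked in reverse.
    r = rank_scores({c: v[0] for c, v in table.items()}, higher_is_better=False, bins=bins)
    f = rank_scores({c: v[1] for c, v in table.items()}, bins=bins)
    m = rank_scores({c: v[2] for c, v in table.items()}, bins=bins)
    return {c: (r[c], f[c], m[c]) for c in table}

transactions = [
    ("a", date(2024, 1, 1), 10.0),
    ("a", date(2024, 3, 1), 20.0),
    ("b", date(2024, 2, 1), 5.0),
    ("c", date(2023, 12, 1), 100.0),
]
print(rfm_scores(transactions, as_of=date(2024, 3, 31)))
```

Quantile-based bins (e.g. quintiles) are the more usual choice on real customer bases; rank-based bins keep the sketch dependency-free.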
- lightfm Implementation of LightFM, a hybrid recommendation algorithm.
- openrec Open-source and modular library for neural network-inspired recommendation algorithms.
- recmetrics A library of metrics for evaluating recommender systems
- recommenders Best Practices on Recommendation Systems by Microsoft.
- Surprise Scikit for building and analyzing recommender systems that deal with explicit rating data.
- darts Python library for user-friendly forecasting and anomaly detection on time series built using SKL conventions.
- gluonts Probabilistic time series modeling, focusing on deep learning based models, based on PyTorch and MXNet.
- neural_prophet Framework for interpretable time series forecasting built on PyTorch.
- orbit Python package for Bayesian time series forecasting and inference by Uber.
- pmdarima Statistical library designed to fill the void in Python's time series analysis capabilities, including an equivalent of R's auto.arima.
- prophet Additive time series modelling by Facebook.
- sktime A unified framework for ML with Time Series.
- statsforecast Lightning ⚡️ fast forecasting with statistical and econometric models.
- stumpy STUMPY computes something called the matrix profile, which is just an academic way of saying "for every subsequence automatically identify its corresponding nearest-neighbor"
- temporian Temporian is an open-source Python library for preprocessing ⚡ and feature engineering 🛠 temporal data 📈 for machine learning applications 🤖.
- tbats BATS and TBATS time series forecasting
- tsfresh Time Series Feature extraction based on scalable hypothesis tests.
- tslearn The machine learning toolkit for time series analysis in Python.
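The matrix profile mentioned under stumpy stores, for every length-m subsequence of a series, the z-normalized Euclidean distance to its nearest non-trivial neighbour: low values mark repeated patterns (motifs) and high values mark anomalies (discords). The brute-force sketch below is O(n²m) and only shows what is being computed; stumpy's implementation is far faster. The function names are hypothetical, and the exclusion zone of a quarter of the window is an assumption mirroring common practice.

```python
import math

def znorm(seq):
    """Z-normalize a subsequence (constant subsequences map to all zeros)."""
    mean = sum(seq) / len(seq)
    std = math.sqrt(sum((x - mean) ** 2 for x in seq) / len(seq))
    return [(x - mean) / std for x in seq] if std > 0 else [0.0] * len(seq)

def naive_matrix_profile(ts, m):
    """Return (profile, index): nearest-neighbour distance and position per subsequence."""
    subs = [znorm(ts[i:i + m]) for i in range(len(ts) - m + 1)]
    excl = math.ceil(m / 4)  # skip overlapping "trivial" matches near i
    profile, index = [], []
    for i, a in enumerate(subs):
        best, best_j = math.inf, -1
        for j, b in enumerate(subs):
            if abs(i - j) <= excl:
                continue
            d = math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
            if d < best:
                best, best_j = d, j
        profile.append(best)
        index.append(best_j)
    return profile, index

series = [0, 1, 2, 0, 1, 2, 0, 1, 2, 0, 9, 2, 0, 1, 2]
profile, index = naive_matrix_profile(series, m=3)
discord = max(range(len(profile)), key=profile.__getitem__)
print("most anomalous subsequence starts at", discord)
```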
- lifelines lifelines is a pure Python implementation of the best parts of survival analysis.
- pysurvival An open source python package for Survival Analysis modeling.
- scikit-survival Survival analysis built on top of scikit-learn.
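In marketing, the survival analysis libraries above are typically used to model time-to-churn while handling censoring, i.e. customers who have not churned yet. The Kaplan-Meier estimator, S(t) = Π (1 - d_i / n_i) over event times t_i ≤ t, is the usual starting point. Below is a plain-Python sketch of it with hypothetical function names, useful for checking against a library fitter such as lifelines' KaplanMeierFitter; it is not any listed package's API.

```python
from collections import Counter

def kaplan_meier(durations, observed):
    """Return [(t, S(t))] at each distinct event time.

    durations: time until churn or censoring; observed: True if churn was seen.
    """
    events = Counter(t for t, e in zip(durations, observed) if e)
    exits = Counter(durations)       # everyone leaving the risk set at t
    at_risk = len(durations)
    survival, curve = 1.0, []
    for t in sorted(exits):
        d = events.get(t, 0)
        if d:
            survival *= 1 - d / at_risk  # censored at t still count as at risk at t
            curve.append((t, survival))
        at_risk -= exits[t]
    return curve

# Months until churn; False marks customers still active (censored).
months = [1, 2, 2, 3, 4]
churned = [True, True, False, True, False]
for t, s in kaplan_meier(months, churned):
    print(t, round(s, 3))
```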