This page contains a curated list of examples, tutorials, and blog posts about XGBoost use cases. It is inspired by awesome-MXNet, awesome-php, and awesome-machine-learning.
Please send a pull request if you find anything that belongs here.
- Code Examples
- Machine Learning Challenge Winning Solutions
- Tutorials
- Usecases
- Tools using XGBoost
- Integrations with 3rd party software
- Awards
- Windows Binaries
Note: for the R package, see the in-package examples and vignettes instead.
This is a list of short code examples that introduce different features of the XGBoost packages.
- Basic walkthrough of packages (Python, Julia, PHP)
- Customize loss function and evaluation metric (Python, Julia)
- Boosting from existing prediction (Python, Julia)
- Predicting using first n trees (Python, Julia)
- Generalized Linear Model (Python, Julia)
- Cross validation (Python, Julia)
- Predicting leaf indices (Python)
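As a rough illustration of what the "customize loss function and evaluation metric" examples involve: gradient boosting with a second-order approximation needs, for each training example, the first and second derivative of the loss with respect to the raw prediction. The sketch below computes these for the logistic loss in plain Python, together with a simple error-rate metric. The function names (`logistic_grad_hess`, `error_rate`) are hypothetical and chosen for this illustration only; this is not the XGBoost API, and the linked examples show how such functions are actually plugged into each language binding.

```python
import math


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def logistic_grad_hess(raw_preds, labels):
    """Per-example gradient and Hessian of the log loss.

    Derivatives are taken with respect to the raw (untransformed) margin,
    which is what a custom objective typically has to supply.
    """
    grad, hess = [], []
    for margin, y in zip(raw_preds, labels):
        p = sigmoid(margin)
        grad.append(p - y)          # first derivative of the log loss
        hess.append(p * (1.0 - p))  # second derivative of the log loss
    return grad, hess


def error_rate(raw_preds, labels):
    """Custom metric: fraction of examples misclassified at margin 0."""
    wrong = sum(1 for m, y in zip(raw_preds, labels) if (m > 0.0) != (y == 1))
    return wrong / len(labels)


if __name__ == "__main__":
    margins = [-2.0, 0.0, 1.5]  # made-up raw predictions
    labels = [0, 1, 1]          # made-up binary labels
    grad, hess = logistic_grad_hess(margins, labels)
    print("grad:", grad)
    print("hess:", hess)
    print("error rate:", error_rate(margins, labels))
```

Note that both functions work on the raw margin rather than on probabilities, which is why the metric thresholds at 0 instead of 0.5.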
Most of the examples in this section are based on the CLI or Python version. However, the parameter settings apply to all versions.
- Starter script for Kaggle Higgs Boson
- Kaggle Tradeshift winning solution by daxiongshu
- Benchmarking the most commonly used open source tools for binary classification
XGBoost is used extensively by machine learning practitioners to create state-of-the-art data science solutions. This is a list of winning machine learning solutions built with XGBoost. Please send a pull request if you find any that are missing here.
- Bishwarup Bhattacharjee, 1st place winner of Allstate Claims Severity, held in December 2016. Link to discussion
- Benedikt Schifferer, Gilberto Titericz, Chris Deotte, Christof Henkel, Kazuki Onodera, Jiwei Liu, Bojan Tunguz, Even Oldridge, Gabriel De Souza Pereira Moreira and Ahmet Erdem, 1st place winner of Twitter RecSys Challenge 2020 conducted from June,20-August,20. GPU Accelerated Feature Engineering and Training for Recommender Systems
- Eugene Khvedchenya, Jessica Fridrich, Jan Butora, Yassine Yousfi, 1st place winners in ALASKA2 Image Steganalysis. Link to discussion
- Dan Ofer, Seffi Cohen, Noa Dagan, Nurit, 1st place in WiDS Datathon 2020. Link to discussion
- Chris Deotte, Konstantin Yakovlev, 1st place in IEEE-CIS Fraud Detection. Link to discussion
- Giba, Lucasz, 1st place winners in Santander Value Prediction Challenge, held in August 2018. Solution discussion and code
- Beluga, 2nd place and Evgeny Nekrasov, 3rd place winner in Statoil/C-CORE Iceberg Classifier Challenge'2018. Link to discussion
- Radek Osmulski, 1st place of the iMaterialist Challenge (Fashion) at FGVC5. Link to the winning solution.
- Maksims Volkovs, Guangwei Yu and Tomi Poutanen, 1st place of the 2017 ACM RecSys challenge. Link to paper.
- Vlad Sandulescu, Mihai Chiru, 1st place of the KDD Cup 2016 competition. Link to the arxiv paper.
- Marios Michailidis, Mathias Müller and HJ van Veen, 1st place of the Dato Truly Native? competition. Link to the Kaggle interview.
- Vlad Mironov, Alexander Guschin, 1st place of the CERN LHCb experiment Flavour of Physics competition. Link to the Kaggle interview.
- Josef Slavicek, 3rd place of the CERN LHCb experiment Flavour of Physics competition. Link to the Kaggle interview.
- Mario Filho, Josef Feigl, Lucas, Gilberto, 1st place of the Caterpillar Tube Pricing competition. Link to the Kaggle interview.
- Qingchen Wang, 1st place of the Liberty Mutual Property Inspection. Link to the Kaggle interview.
- Chenglong Chen, 1st place of the Crowdflower Search Results Relevance. Link to the winning solution.
- Alexandre Barachant (“Cat”) and Rafał Cycoń (“Dog”), 1st place of the Grasp-and-Lift EEG Detection. Link to the Kaggle interview.
- Halla Yang, 2nd place of the Recruit Coupon Purchase Prediction Challenge. Link to the Kaggle interview.
- Owen Zhang, 1st place of the Avito Context Ad Clicks competition. Link to the Kaggle interview.
- Keiichi Kuroyanagi, 2nd place of the Airbnb New User Bookings. Link to the Kaggle interview.
- Marios Michailidis, Mathias Müller and Ning Situ, 1st place Homesite Quote Conversion. Link to the Kaggle interview.
- Gilberto Titericz, Stanislav Semenov, 1st place in the challenge to classify products into the correct category, organized by Otto Group in 2015. Link to challenge. Link to Kaggle winning solution
- Darius Barušauskas, 1st place winner in Predicting Red Hat Business Value. Link to interview. Link to discussion
- David Austin, Weimin Wang, 1st place winner in Iceberg-classifier-challenge. Link to discussion
- Kazuki Onodera, Kazuki Fujikawa, 2nd place winner in OpenVaccine: COVID-19 mRNA Vaccine Degradation Prediction. Link to discussion
- Prarthana Bhat, 2nd place winner in DYD Competition. Link to Solution.
- XGBoost: A Scalable Tree Boosting System ([video](https://www.youtube.com/watch?v=Vly8xGnNiWs) + slides) by Tianqi Chen at the Los Angeles Data Science meetup
- XGBoost Training with Dask, using Saturn Cloud
- Machine Learning with XGBoost on Qubole Spark Cluster
- XGBoost Official RMarkdown Tutorials
- An Introduction to XGBoost R Package by Tong He
- Open Source Tools & Data Science Competitions by Owen Zhang - XGBoost parameter tuning tips
- Feature Importance Analysis with XGBoost in Tax audit
- Winning solution of Kaggle Higgs competition: what a single model can do
- XGBoost - eXtreme Gradient Boosting by Tong He
- How to use XGBoost algorithm in R in easy steps by TAVISH SRIVASTAVA (Chinese Translation 中文翻译 by HarryZhu)
- Kaggle Solution: What’s Cooking ? (Text Mining Competition) by MANISH SARASWAT
- Better Optimization with Repeated Cross Validation and the XGBoost model - Machine Learning with R by Manuel Amunategui (Youtube Link) (GitHub Link)
- XGBoost Rossman Parameter Tuning by Norbert Kozlowski
- Featurizing log data before XGBoost by Xavier Conort, Owen Zhang etc
- West Nile Virus Competition Benchmarks & Tutorials by Anna Montoya
- Ensemble Decision Tree with XGBoost by Bing Xu
- Notes on eXtreme Gradient Boosting by ARSHAK NAVRUZYAN (iPython Notebook)
- Complete Guide to Parameter Tuning in XGBoost by Aarshay Jain
- Practical XGBoost in Python online course by Parrot Prediction
- Spark and XGBoost using Scala by Elena Cuoco
If you have a particular use case of XGBoost that you would like to highlight, send a PR to add a one-sentence description :)
- XGBoost is used in Kaggle Script to solve data science challenges.
- Distribute XGBoost as a REST API server from a Jupyter notebook with BentoML. Link to notebook
- Seldon predictive service powered by XGBoost
- XGBoost Distributed is used in ODPS Cloud Service by Alibaba (in Chinese)
- XGBoost is incorporated as part of Graphlab Create for scalable machine learning.
- Hanjing Su from Tencent data platform team: "We use distributed XGBoost for click through prediction in wechat shopping and lookalikes. The problems involve hundreds millions of users and thousands of features. XGBoost is cleanly designed and can be easily integrated into our production environment, reducing our cost in developments."
- CNevd from autohome.com ad platform team: "Distributed XGBoost is used for click through rate prediction in our display advertising, XGBoost is highly efficient and flexible and can be easily used on our distributed platform, our ctr made a great improvement with hundred millions samples and millions features due to this awesome XGBoost"
- BayesBoost - Bayesian Optimization using xgboost and sklearn API
- FLAML - An open source AutoML library designed to automatically produce accurate machine learning models with low computational cost. FLAML includes XGBoost as one of the default learners and can also be used as a fast hyperparameter tuning tool for XGBoost (code example).
- gp_xgboost_gridsearch - In-database parallel grid-search for XGBoost on Greenplum using PL/Python
- tpot - A Python tool that automatically creates and optimizes machine learning pipelines using genetic programming.
Open source integrations with XGBoost:
- Neptune.ai - Experiment management and collaboration tool for ML/DL/RL specialists. The integration takes the form of an XGBoost callback that automatically logs training and evaluation metrics, as well as the saved model (booster), feature importance chart, and visualized trees.
- Optuna - An open source hyperparameter optimization framework to automate hyperparameter search. Optuna integrates with XGBoost through the XGBoostPruningCallback, which lets users easily prune unpromising trials.
- dtreeviz - A Python library for decision tree visualization and model interpretation. Starting from version 1.0, dtreeviz is able to visualize tree ensembles produced by XGBoost.
- John Chambers Award - 2016 Winner: XGBoost R Package, by Tong He (Simon Fraser University) and Tianqi Chen (University of Washington)
- InfoWorld’s 2019 Technology of the Year Award
Unofficial Windows binaries and instructions on how to use them are hosted on Guido Tapia's blog.