Understanding and implementing the assumption checks behind one of the most important statistical techniques in data science - Logistic Regression
- Link to TowardsDataScience article: https://towardsdatascience.com/assumptions-of-logistic-regression-clearly-explained-44d85a22b290
- Logistic regression is a highly effective modeling technique that has remained a mainstay in statistics since its development in the 1940s.
- Given its popularity and utility, data practitioners should understand the fundamentals of logistic regression before using it to tackle data and business problems.
- In this project, we explore the key assumptions of logistic regression with theoretical explanations and practical Python implementation of the assumption checks.
(1) Logistic_Regression_Assumptions.ipynb
- The main notebook, containing Python code (with explanations) for checking each of the six key assumptions of logistic regression
(2) Box-Tidwell-Test-in-R.ipynb
- Notebook containing R code for running the Box-Tidwell test (to check the linearity-of-the-logit assumption)
(3) /data
- Folder containing the public Titanic dataset (train set)
(4) /references
- Folder containing several sets of lecture notes explaining advanced regression
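
For readers who want to see the idea behind the Box-Tidwell check without switching to R, here is a minimal NumPy sketch of its commonly used simplified form: add an `x * ln(x)` term for a continuous, strictly positive predictor and look at the Wald z-statistic of that term. This is not the code from either notebook. The helper names `fit_logit` and `box_tidwell_z` and the synthetic data are assumptions made for illustration, and the hand-rolled Newton-Raphson fit stands in for whatever fitting routine you normally use.

```python
import numpy as np


def fit_logit(X, y, n_iter=100, tol=1e-8):
    """Fit a logistic regression by Newton-Raphson with step halving.

    Returns the coefficient vector and its Wald standard errors.
    (Hypothetical helper for this sketch, not part of the notebooks.)
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)

    def loglik(b):
        eta = X @ b
        return np.sum(y * eta - np.logaddexp(0.0, eta))

    beta = np.zeros(X.shape[1])
    ll = loglik(beta)
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ beta)))
        w = p * (1.0 - p)
        info = X.T @ (X * w[:, None])              # information matrix
        step = np.linalg.solve(info, X.T @ (y - p))
        t = 1.0
        # Halve the step if it would lower the log-likelihood.
        while loglik(beta + t * step) < ll and t > 1e-4:
            t /= 2.0
        beta = beta + t * step
        ll = loglik(beta)
        if np.max(np.abs(step)) < tol:
            break

    p = 1.0 / (1.0 + np.exp(-(X @ beta)))
    w = p * (1.0 - p)
    cov = np.linalg.inv(X.T @ (X * w[:, None]))
    return beta, np.sqrt(np.diag(cov))


def box_tidwell_z(x, y):
    """Wald z-statistic of the x*ln(x) term for one positive predictor."""
    x = np.asarray(x, dtype=float)
    if np.any(x <= 0):
        raise ValueError("x must be strictly positive to take ln(x)")
    design = np.column_stack([np.ones_like(x), x, x * np.log(x)])
    beta, se = fit_logit(design, y)
    return beta[2] / se[2]


# Synthetic illustration (made-up data, not the Titanic set).
rng = np.random.default_rng(0)
x = rng.uniform(0.5, 5.0, size=5000)
y_linear = rng.binomial(1, 1.0 / (1.0 + np.exp(-(x - 2.5))))
y_curved = rng.binomial(1, 1.0 / (1.0 + np.exp(-((x - 2.5) ** 2 - 1.5))))

print("z for x*ln(x), logit linear in x:", box_tidwell_z(x, y_linear))
print("z for x*ln(x), logit curved in x:", box_tidwell_z(x, y_curved))
```

A large absolute z-statistic for the `x * ln(x)` term suggests that the logit is not linear in `x`. With several continuous predictors, each one would get its own `x * ln(x)` column in the same model.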
- @dataninj4 for correcting imports and adding .loc indexing in the diagnosis_df cell so that it runs without errors in Python 3.6/3.8
- @ArneTR for rightly pointing out that the VIF calculation should include a constant, and that the correlation matrix should exclude the target variable
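
To illustrate why the constant matters for VIF, here is a minimal NumPy sketch (not the notebook's code) that regresses each predictor on the others plus an intercept column and computes `1 / (1 - R^2)`. The function name `vif_with_constant` and the synthetic data are assumptions made for illustration. To my understanding, statsmodels' `variance_inflation_factor` regresses against exactly the columns it is given, so there the constant has to be added to the design matrix first.

```python
import numpy as np


def vif_with_constant(X):
    """Return the VIF of every column of X (n_samples x n_features).

    Each predictor is regressed on the remaining predictors plus an
    intercept column, and VIF_j = 1 / (1 - R_j^2).
    (Hypothetical helper for this sketch, not part of the notebooks.)
    """
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    vifs = []
    for j in range(p):
        target = X[:, j]
        others = np.delete(X, j, axis=1)
        design = np.column_stack([np.ones(n), others])  # intercept included
        coef, *_ = np.linalg.lstsq(design, target, rcond=None)
        resid = target - design @ coef
        ss_res = resid @ resid
        ss_tot = np.sum((target - target.mean()) ** 2)
        r_squared = 1.0 - ss_res / ss_tot
        vifs.append(1.0 / (1.0 - r_squared))
    return np.array(vifs)


# Synthetic illustration: the third column is almost a + b.
rng = np.random.default_rng(42)
a = rng.normal(size=500)
b = rng.normal(size=500)
c = a + b + 0.1 * rng.normal(size=500)

independent = np.column_stack([a, b, rng.normal(size=500)])
collinear = np.column_stack([a, b, c])

print("VIF, independent predictors:", vif_with_constant(independent))
print("VIF, collinear predictors:  ", vif_with_constant(collinear))
```

Because the intercept absorbs each predictor's mean, shifting a column by a constant leaves these VIF values unchanged. Without the constant, the same shift can change the result substantially. Only predictors go into `X`, and the target variable is left out, as it is for the correlation matrix.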
- Machine Learning Essentials - Practical Guide in R
- Logistic and Linear Regression Assumptions - Violation Recognition and Control
- Testing linearity in the logit using Box-Tidwell Transformation in SPSS - YouTube
- Logistic Regression using SPSS
- Statistics How To - Cook's Distance
- Statsmodels Documentation - GLM
- Statsmodels Documentation - Logit Influence example notebook
- PennState Eberly College of Science - Stat 462
- Statistics Solutions - Assumptions of Logistic Regression
- Course Notes for IS 6489 - Statistics and Predictive Analytics
- MSc in Big Data Analytics at Carlos III University of Madrid - Notes for Predictive Modeling
- Freakonometrics - Residuals from a Logistic Regression
- Kaggle - Titanic - Logistic Regression with Python
- Yellowbrick API Reference - Cook's Distance
- DataCamp - Understanding Logistic Regression in Python
- Statology - How to Calculate Cook's Distance
- ResearchGate - Box-Tidwell Test in SPSS
- CrossValidated - Why include x ln x interaction term helps
- UCLA IDRE - Logistic Regression Diagnostics