GitHub - Oufattole/HR-Attrition: Will they stay or will they go? Predicting whether employees will leave + why.

This repo is a fork of https://github.com/Lion-Mod/HR-Attrition. It fixes bugs in the original repo and reproduces similar results.

"The output of this function call will be a number between 0 and 1 that will indicate us how similar the two tables are, being 0 the worst and 1 the best possible score."
- This is incorrect, even for the example given in the documentation
Bugs
- The sdv parameter names for CopulaGAN had to be updated
- The ord_feats had to be fixed
- "\r" characters in the raw ipynb file caused the editor to crash in Jupyter Notebook, so I removed all of them with a Python script
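A minimal sketch of how such a cleanup could look, assuming the "\r" characters live inside the notebook's JSON string values (for example in cell sources). This is not the script from this repo; the function names and the file paths in the usage note are hypothetical.

```python
import json


def strip_carriage_returns(notebook_json: str) -> str:
    """Return notebook JSON text with every carriage return removed from string values."""

    def clean(value):
        # Recursively walk the parsed JSON and drop "\r" from each string.
        if isinstance(value, str):
            return value.replace("\r", "")
        if isinstance(value, list):
            return [clean(item) for item in value]
        if isinstance(value, dict):
            return {key: clean(item) for key, item in value.items()}
        return value  # numbers, booleans and None are left untouched

    parsed = json.loads(notebook_json)
    # ensure_ascii=False keeps any non-ASCII text in the notebook readable.
    return json.dumps(clean(parsed), indent=1, ensure_ascii=False)


def fix_notebook_file(src_path: str, dst_path: str) -> None:
    """Read a notebook, strip carriage returns, and write it with LF line endings."""
    with open(src_path, encoding="utf-8", newline="") as src:
        cleaned = strip_carriage_returns(src.read())
    with open(dst_path, "w", encoding="utf-8", newline="\n") as dst:
        dst.write(cleaned + "\n")
```

Usage would be something like `fix_notebook_file("broken.ipynb", "fixed.ipynb")`, where both paths are placeholders. Parsing the JSON first, rather than doing a blind byte replace, keeps the notebook structure valid after the edit.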
Methodology Issues
- He used AUC to choose his first model, which was lr
- He then used AUC to compare the final models. Catboost had the highest AUC, but he chose gbc, which had the second highest
  - I tried gbc with synthetic + original data and with only original data, and found that synthetic + original data gives higher results
- Dataset differences
  - The dataset file given in the repo is smaller than the Kaggle IBM one that is linked.
  - Both have dimensions (1470, 35), so I think the difference comes from the compression applied when storing the data on GitHub
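One way to check whether two CSV files with the same shape also hold the same content, independent of byte-level differences such as line endings, is to fingerprint the parsed cells. The sketch below is an illustration using only the standard library; `csv_fingerprint` is a hypothetical helper and the sample bytes are toy data, not rows from either dataset.

```python
import csv
import hashlib
import io


def csv_fingerprint(raw: bytes):
    """Return ((n_rows, n_cols), sha256 hex digest) computed from parsed cell values."""
    text = raw.decode("utf-8-sig")  # tolerate an optional byte-order mark
    rows = list(csv.reader(io.StringIO(text, newline="")))
    if not rows:
        raise ValueError("CSV has no header row")
    digest = hashlib.sha256()
    for row in rows:
        # Hash cell values with explicit separators so line endings do not matter.
        digest.update("\x1f".join(row).encode("utf-8"))
        digest.update(b"\x1e")
    # Shape excludes the header row, matching the (rows, columns) convention.
    return (len(rows) - 1, len(rows[0])), digest.hexdigest()


# Toy example: same cells, different line endings, different file sizes.
crlf_copy = b"Age,Attrition\r\n41,Yes\r\n49,No\r\n"
lf_copy = b"Age,Attrition\n41,Yes\n49,No\n"
print(len(crlf_copy), len(lf_copy))
print(csv_fingerprint(crlf_copy) == csv_fingerprint(lf_copy))
```

If the two fingerprints match, the files contain the same values and the size difference is purely a storage detail; if they differ, the datasets themselves are different.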

lr = logistic regression

gbc = gradient boosting classifier
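For reference, choosing between models such as lr and gbc by AUC means computing each model's AUC on the same labels and taking the highest. The sketch below uses a plain pairwise definition of AUC and made-up toy scores; the model names and numbers are illustrative only and are not results from this repo.

```python
def roc_auc(labels, scores):
    """AUC as the probability that a positive outranks a negative (ties count as half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative label")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def best_by_auc(labels, model_scores):
    """Return (name of the model with the highest AUC, dict of all AUCs)."""
    aucs = {name: roc_auc(labels, scores) for name, scores in model_scores.items()}
    return max(aucs, key=aucs.get), aucs


# Toy labels and predicted probabilities (hypothetical, for illustration only).
toy_labels = [0, 0, 1, 1]
toy_scores = {
    "model_a": [0.1, 0.4, 0.35, 0.8],
    "model_b": [0.2, 0.3, 0.6, 0.9],
}
best_name, all_aucs = best_by_auc(toy_labels, toy_scores)
print(best_name, all_aucs)
```

Under this rule the model with the top AUC is always selected, so picking the second-highest model is a departure from the stated criterion and needs its own justification.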

.ipynb_checkpoints
data
models
Attrition_performance.ipynb
CTGAN-Model.pkl
CopulaGAN.pkl
GuassianCopulaModel.pkl
HR Employee Attrition.csv
Predicting_employee_attrition.ipynb
README.md
TVAE-Model.pkl
app.py
check.py
example_reason_plot.PNG
fix.py
logs.log
multi_prediction.gif
prep_pipe.pkl
single_prediction.gif
synth_data.csv
test.py
