Machine Learning & Detecting Fraudulent Healthcare Providers

Blog Write-up WIP | Kaggle | David Gottlieb | Theodore Cheek

Project Goals

The goal of this project is to analyze healthcare provider data and predict which providers are fraudulent, using the well-known Kaggle data set linked above.

Classification analysis of this kind can be particularly useful to health insurance companies as well as to public health advocates. With the proper approach, an assessor can not only identify which providers are engaged in fraudulent activities, but also avoid erroneously classifying legitimate providers as fraudulent. For insurance companies, this can save a great deal of money.
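The two aims in the paragraph above (catching fraudulent providers while not flagging legitimate ones) correspond to recall and precision. As a minimal sketch of how the two are computed, the snippet below uses hypothetical labels for eight providers; the labels and the helper names `confusion_counts` and `precision_recall` are illustrative assumptions, not project data or project code.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn), where 1 = fraudulent and 0 = legitimate."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # fraud correctly flagged
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # legitimate wrongly flagged
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # fraud missed
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # legitimate correctly cleared
    return tp, fp, fn, tn


def precision_recall(y_true, y_pred):
    """Precision: share of flagged providers that are truly fraudulent.
    Recall: share of fraudulent providers that were flagged."""
    tp, fp, fn, _ = confusion_counts(y_true, y_pred)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Hypothetical labels for eight providers (illustrative only).
y_true = [1, 0, 0, 1, 0, 0, 1, 0]
y_pred = [1, 0, 1, 1, 0, 0, 0, 0]

precision, recall = precision_recall(y_true, y_pred)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

A model tuned only for recall tends to flag more legitimate providers, while one tuned only for precision tends to miss more fraud, so it helps to report both.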

To achieve this goal, we dig deeply into the data and apply a variety of machine learning classification techniques, including classic models such as Logistic Regression and Support Vector Classification as well as more modern models such as CatBoost and LightGBM.

What's in this Repo?

Here you will find three notebooks of particular note: I. Data Exploration, II. Data Preparation, and III. Machine Learning Processing. Given the length of the project, we found it most expedient to split these three stages into separate notebooks for easier viewing.

Over the course of the project, we used a variety of standard tools, including Pandas, NumPy, Seaborn, and Matplotlib. Of further note are scikit-learn's StandardScaler, PCA, LogisticRegression, KNeighborsClassifier, LinearDiscriminantAnalysis, GaussianNB (Gaussian Naive Bayes), and GridSearchCV. We also used SVM, CatBoostClassifier, and LGBMClassifier. The project culminated in a successful implementation of stacking techniques.
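As a rough illustration of how several of these tools can fit together, the sketch below chains StandardScaler, PCA, and LogisticRegression in a pipeline, tunes it with GridSearchCV, and then stacks the tuned pipeline with two other classifiers. It is not taken from the notebooks: the synthetic data from `make_classification`, the parameter grid, the choice of base estimators, and the use of scikit-learn's `StackingClassifier` are all assumptions made for illustration. CatBoostClassifier and LGBMClassifier are left out only to keep the sketch dependent on scikit-learn alone.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.ensemble import StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic, imbalanced stand-in for provider-level features (NOT the Kaggle data).
X, y = make_classification(
    n_samples=600, n_features=20, n_informative=8,
    weights=[0.9, 0.1], random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Scale -> reduce dimensionality -> classify, tuned as a single unit so that
# scaling and PCA are re-fit inside each cross-validation fold.
pipe = Pipeline([
    ("scale", StandardScaler()),
    ("pca", PCA()),
    ("clf", LogisticRegression(max_iter=1000)),
])
param_grid = {  # illustrative grid, not the project's
    "pca__n_components": [5, 10],
    "clf__C": [0.1, 1.0, 10.0],
}
search = GridSearchCV(pipe, param_grid, scoring="f1", cv=5)
search.fit(X_train, y_train)

# Stack the tuned pipeline with two other base models; a logistic regression
# combines their cross-validated predictions.
knn = Pipeline([("scale", StandardScaler()), ("knn", KNeighborsClassifier())])
stack = StackingClassifier(
    estimators=[
        ("tuned_lr", search.best_estimator_),
        ("knn", knn),
        ("nb", GaussianNB()),
    ],
    final_estimator=LogisticRegression(max_iter=1000),
    cv=5,
)
stack.fit(X_train, y_train)

test_accuracy = stack.score(X_test, y_test)
print("best params:", search.best_params_)
print("stacked test accuracy:", round(test_accuracy, 3))
```

Because the classes are imbalanced, accuracy alone can look flattering; scoring the grid search on F1, as done here, is one way to keep the minority (fraud) class in view.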

For More

For further discussion of the project, its process, and the full analysis, please consult the blog, which is still forthcoming at the time of writing. Should you have any other questions, please feel free to reach out to either of us.

