This project tackles the pervasive problem of vehicle insurance fraud, which causes substantial financial losses for insurance companies and erodes consumer trust. Fraudulent claims range from staged accidents to exaggerated injuries, complicating the claims process and driving up costs. Our objective is to use historical vehicle and policy data to build a robust predictive model that accurately detects fraudulent claims and helps prevent them. The model is intended to help insurance companies reduce financial losses, process claims more efficiently, and keep premium pricing fair for customers.
- Source: Kaggle
- Size: 15,420 records
- Variables: 33 (both categorical and numerical)
- Key Features:
  - Month of the accident
  - Day of the week
  - Make of the vehicle
  - Accident area
  - Age of the policyholder
  - Various policy details
  - Indicator of whether the claim was fraudulent
With 15,420 records, the dataset is large enough to train and evaluate a predictive model, and its fraud indicator supplies the labels needed to build a supervised classification model.
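Because the feature-engineering step applies SMOTE to handle class imbalance, it is worth checking how skewed the fraud indicator is before training. The snippet below is a minimal sketch, not code from the project notebook: it assumes pandas, and both the file name `fraud_oracle.csv` and the target column name `FraudFound_P` are assumptions about the Kaggle download that you should check against your copy. A small made-up table stands in for the real data so the snippet runs on its own.

```python
import pandas as pd


def class_balance(df: pd.DataFrame, target: str) -> pd.Series:
    """Return the share of rows in each class of the target column."""
    return df[target].value_counts(normalize=True)


# With the real data you would load the CSV instead (file name is an assumption):
# df = pd.read_csv("fraud_oracle.csv")
# Here a made-up ten-row table stands in. "FraudFound_P" (1 = fraud, 0 = not)
# is an assumed name for the fraud indicator column.
toy = pd.DataFrame({
    "AccidentArea": ["Urban"] * 7 + ["Rural"] * 3,
    "FraudFound_P": [0, 0, 0, 0, 1, 0, 0, 0, 0, 0],
})

shares = class_balance(toy, "FraudFound_P")
print(shares)  # 90% legitimate, 10% fraudulent in this toy table
```

On a split this skewed, a model that always predicts "not fraud" would already reach 90% accuracy, so metrics computed on the fraud class (precision, recall, F1) are more informative when comparing models.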
- Data Preparation: Handle missing values, convert data types, and scale numerical features.
- Feature Engineering: Select relevant features, one-hot encode categorical variables, and apply SMOTE (Synthetic Minority Over-sampling Technique) to address class imbalance.
- Model Training and Evaluation: Train and evaluate eight models (Isolation Forest, Gradient Boosting, Decision Tree, XGBoost, Random Forest, K-Nearest Neighbors, Logistic Regression, and CatBoost) with hyperparameter tuning.
- Model Comparison: Compare the models on their performance metrics; CatBoost performed best.
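The encoding and oversampling steps above can be sketched as follows. This is a from-scratch, NumPy-only illustration of SMOTE's core idea (each synthetic row is interpolated between a minority sample and one of its nearest minority neighbors), not the notebook's code; the project may instead use a library implementation such as imbalanced-learn's `SMOTE`. The helper name `smote` and the toy columns `AccidentArea`, `Age`, and `Fraud` are hypothetical.

```python
import numpy as np
import pandas as pd


def smote(X_min, n_new, k=5, seed=None):
    """Create n_new synthetic minority rows (hypothetical SMOTE-style helper).

    Each new row lies on the line segment between a randomly chosen minority
    sample and one of its k nearest minority neighbors.
    """
    X_min = np.asarray(X_min, dtype=float)
    if len(X_min) < 2:
        raise ValueError("need at least two minority samples")
    rng = np.random.default_rng(seed)
    k = min(k, len(X_min) - 1)  # cannot use more neighbors than other samples

    # Pairwise Euclidean distances among minority samples only.
    dists = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=2)
    np.fill_diagonal(dists, np.inf)  # a sample is not its own neighbor
    neighbors = np.argsort(dists, axis=1)[:, :k]

    base = rng.integers(0, len(X_min), size=n_new)            # anchor samples
    chosen = neighbors[base, rng.integers(0, k, size=n_new)]  # one neighbor each
    gap = rng.random((n_new, 1))                              # position on segment
    return X_min[base] + gap * (X_min[chosen] - X_min[base])


# Toy training split (columns are illustrative, not the dataset's schema).
train = pd.DataFrame({
    "AccidentArea": ["Urban", "Rural", "Urban", "Urban", "Rural", "Urban", "Urban", "Rural"],
    "Age": [25, 40, 33, 51, 29, 60, 45, 38],
    "Fraud": [0, 0, 0, 0, 0, 0, 1, 1],
})

# One-hot encode the categorical column and standardize the numerical one,
# so that no single feature dominates the distance computation.
X = pd.get_dummies(train.drop(columns="Fraud"), columns=["AccidentArea"], dtype=float)
X["Age"] = (X["Age"] - X["Age"].mean()) / X["Age"].std(ddof=0)
y = train["Fraud"].to_numpy()

# Oversample the minority (fraud) class until both classes are the same size.
X_np = X.to_numpy()
X_min = X_np[y == 1]
n_new = int((y == 0).sum() - (y == 1).sum())
X_syn = smote(X_min, n_new, k=5, seed=0)

X_bal = np.vstack([X_np, X_syn])
y_bal = np.concatenate([y, np.ones(n_new, dtype=y.dtype)])
print(np.bincount(y_bal))  # [6 6]
```

Two design points carry over to the real pipeline. Oversampling should be applied only to the training split, after the train/test split, so synthetic rows never leak into evaluation. And interpolating one-hot columns produces fractional values; variants such as SMOTE-NC (imbalanced-learn's `SMOTENC`) exist to handle mixed categorical and numerical features.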
- Python 3.x
- Required libraries:
  - pandas
  - numpy
  - scikit-learn
  - xgboost
  - catboost
  - matplotlib
  - seaborn
  - scikit-optimize (for Bayesian optimization)
- Jupyter Notebook
- Clone the repository and change into its directory: `git clone https://github.com/oxayavongsa/aai-510-ml-group-1 && cd aai-510-ml-group-1`
- Install the required packages: `pip install -r requirements.txt`
- Open the Jupyter Notebook for exploratory data analysis (EDA): `jupyter notebook "Final Project SectionA-Team 1.ipynb"`
- Follow the notebook steps to perform data cleaning, feature selection, and model training.
- Team Leader/Representative: Outhai Xayavongsa (Thai)
- Technical Lead: Aaron Ramirez
- Members:
  - Aaron Ramirez
  - Muhammad Haris
  - Outhai Xayavongsa (Thai)
YouTube: Related Video
This project is licensed under the MIT License - see the LICENSE file for details.