FraudDetectivePy is a Python-based supervised machine learning project developed as the final project for INFO 6105 (Data Science Engineering Methods).
The project has two goals: to understand how an imbalanced dataset affects the analysis and results of credit card fraud detection, and to evaluate how well different classification models and techniques improve the accuracy and reliability of fraud detection systems.
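To illustrate why class imbalance matters, here is a minimal sketch using made-up labels (the 0.2% fraud rate and the "never flag fraud" model are assumptions for illustration, not figures from this project's dataset). It uses only the standard library and shows how a useless model can still report high accuracy:

```python
# Synthetic labels for illustration only: 1 = fraud, 0 = legitimate.
# The 20-in-10,000 fraud rate is an assumed value, not taken from the project data.
n_total = 10_000
n_fraud = 20
y_true = [1] * n_fraud + [0] * (n_total - n_fraud)

# A trivial "model" that never flags a transaction as fraud.
y_pred = [0] * n_total


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def recall(y_true, y_pred, positive=1):
    """Fraction of actual positives that were predicted positive."""
    actual_pos = sum(t == positive for t in y_true)
    true_pos = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    return true_pos / actual_pos if actual_pos else 0.0


acc = accuracy(y_true, y_pred)
rec = recall(y_true, y_pred)
print(f"accuracy={acc:.3f}, fraud recall={rec:.3f}")
# prints: accuracy=0.998, fraud recall=0.000
```

Accuracy alone hides the fact that no fraud is caught, which is why metrics such as recall and precision on the fraud class are usually more informative for this kind of problem.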
- Understand the effects of imbalanced datasets on fraud detection analysis.
- Compare undersampling and oversampling techniques for addressing data imbalance.
- Benchmark the performance of different classification models.
- Address outliers and normalize features to refine analysis and predictions.
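The two resampling strategies named in the objectives can be sketched as follows. This is a simplified, standard-library illustration on synthetic data, not the notebook's actual code: the helper names (`group_by_class`, `random_undersample`, `random_oversample`) are hypothetical, and the oversampler shown here simply duplicates minority rows at random rather than synthesizing new ones.

```python
import random
from collections import Counter


def group_by_class(rows, labels):
    """Collect rows into a dict keyed by class label."""
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(label, []).append(row)
    return groups


def random_undersample(rows, labels, seed=0):
    """Randomly drop rows until every class has as many rows as the smallest class."""
    rng = random.Random(seed)
    groups = group_by_class(rows, labels)
    target = min(len(g) for g in groups.values())
    pairs = [(row, label)
             for label, g in groups.items()
             for row in rng.sample(g, target)]  # sample without replacement
    rng.shuffle(pairs)
    return [r for r, _ in pairs], [l for _, l in pairs]


def random_oversample(rows, labels, seed=0):
    """Randomly duplicate rows until every class has as many rows as the largest class."""
    rng = random.Random(seed)
    groups = group_by_class(rows, labels)
    target = max(len(g) for g in groups.values())
    pairs = []
    for label, g in groups.items():
        extra = rng.choices(g, k=target - len(g))  # sample with replacement
        pairs.extend((row, label) for row in g + extra)
    rng.shuffle(pairs)
    return [r for r, _ in pairs], [l for _, l in pairs]


# Synthetic example: 1,000 single-feature rows, 10 of them labelled fraud (1).
X = [[float(i)] for i in range(1000)]
y = [1 if i < 10 else 0 for i in range(1000)]

under_X, under_y = random_undersample(X, y)
over_X, over_y = random_oversample(X, y)

print("original:    ", dict(Counter(y)))
print("undersampled:", dict(Counter(under_y)))
print("oversampled: ", dict(Counter(over_y)))
```

Undersampling discards most of the legitimate transactions, while oversampling keeps all rows but repeats the fraud cases. In either case, resampling should be applied to the training split only, so that the evaluation data keeps its original class distribution.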
- dataset/: Contains the dataset used for the project.
- src/: Contains Python scripts for data preprocessing, model training, and evaluation.
- report/: Stores report files.
- Python
- NumPy
- Pandas
- Scikit-learn
- XGBoost
- Matplotlib
- Seaborn
- Clone the repository:
git clone https://github.com/Faridghr/FraudDetectivePy.git
- Navigate to the project directory:
cd FraudDetectivePy
- Install dependencies:
pip install -r requirements.txt
- Download the dataset and extract it to the dataset/ folder.
- Run the main notebook to preprocess data, train models, and evaluate performance:
src/FraudDetectivePy.ipynb
The dataset used in this project was obtained from Kaggle.