Welcome to the "Predicting Term Deposits" project repository! This project demonstrates the implementation of classification algorithms, specifically Decision Trees and k-Nearest Neighbors (kNN), using R. The goal is to predict whether a customer will subscribe to a term deposit based on various features from a bank dataset.
The Bank Marketing dataset used in this project is sourced from the UCI Machine Learning Repository. It contains information about a bank's telemarketing campaigns and whether or not the clients subscribed to a term deposit. The dataset consists of 16 input variables and one binary output variable indicating the subscription status (0/1). It is commonly used for binary classification tasks in machine learning.
To access the dataset and learn more about its attributes, you can visit the following link:
https://archive.ics.uci.edu/ml/datasets/Bank+Marketing
The variables are described below.
Input variables (variable: definition):
- ID: Unique client ID
- age: Age of the client
- job: Type of job
- marital: Marital status of the client
- education: Education level
- default: Whether the client has credit in default
- housing: Housing loan
- loan: Personal loan
- contact: Type of communication
- month: Contact month
- day_of_week: Day of week of contact
- duration: Duration of the last contact, in seconds
- campaign: Number of contacts made to the client during this campaign
- pdays: Number of days since the client was last contacted in a previous campaign
- previous: Number of contacts made to the client before this campaign
- poutcome: Outcome of the previous marketing campaign
Output variable (desired target):
- y: Has the client subscribed to a term deposit? (binary: “yes”, “no”)
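As a quick orientation, the sketch below shows one way to inspect column types and the balance of the target `y` in R. The helper `summarise_target` and the data frame `bank_toy` are hypothetical and made up so the snippet runs on its own; they are not part of the project scripts, and the rows are not real data. With the actual dataset you would pass in the data frame loaded from the CSV you downloaded.

```r
# Hypothetical helper (not from the project scripts): count and share of each
# class in the target column of a data frame.
summarise_target <- function(df, target = "y") {
  stopifnot(target %in% names(df))
  counts <- table(df[[target]])
  data.frame(
    class = names(counts),
    n = as.integer(counts),
    share = as.numeric(counts) / sum(counts)
  )
}

# Toy rows (NOT real data) using a few of the column names listed above.
bank_toy <- data.frame(
  age = c(34, 51, 29, 45, 38),
  job = c("admin.", "technician", "student", "services", "admin."),
  duration = c(120, 300, 45, 610, 95),
  y = c("no", "no", "no", "yes", "no"),
  stringsAsFactors = TRUE
)

str(bank_toy)                      # column types: numeric vs. factor
balance <- summarise_target(bank_toy)
print(balance)                     # class counts and proportions
```

A strongly skewed `share` column is the kind of imbalance that motivates the SMOTE variants of the models described later.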
The repository is organized into several sections:
- Data: This section provides an overview of the dataset, including the input variables and the target variable. It describes the attributes and their meanings, helping you understand the data better.
- Exploratory Data Analysis: Here, we explore the dataset, visualize key features, and uncover any interesting insights. Data preprocessing and cleaning steps are also explained, if performed.
- Decision Tree Classification: This section focuses on the implementation of Decision Tree models. Two models are showcased: one using the unmodified dataset and another using the SMOTE technique for handling imbalanced data. The models are trained and evaluated, and the results are presented.
- k-Nearest Neighbors (kNN) Classification: In this section, we cover the implementation of kNN models. It includes data preprocessing steps specific to kNN, followed by two models: one using the original dataset and another using SMOTE for imbalanced data handling. Model training, evaluation, and results are discussed.
- Conclusion: The conclusion section provides a summary of the project, highlighting key findings and insights. It also suggests possible areas for future improvement or research.
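To make the two modelling sections more concrete, here is a minimal sketch of fitting a decision tree and a kNN classifier in R. It is not the project's actual code: the packages (`rpart`, `class`), the 70/30 split, `k = 5`, and the synthetic data frame `toy` are all assumptions chosen so the example runs on its own. The SMOTE step is left out because it depends on which R package the project scripts use.

```r
library(rpart)  # decision trees (assumed package choice)
library(class)  # knn()          (assumed package choice)

set.seed(42)

# Synthetic stand-in data (NOT the bank dataset), only so the sketch runs.
n <- 400
toy <- data.frame(
  age = round(runif(n, 18, 80)),
  duration = round(rexp(n, rate = 1 / 250)),
  housing = factor(sample(c("yes", "no"), n, replace = TRUE))
)
# Let the target depend on duration so there is something to learn.
p_yes <- plogis(-3 + toy$duration / 200)
toy$y <- factor(ifelse(runif(n) < p_yes, "yes", "no"), levels = c("no", "yes"))

# 70/30 train/test split (the ratio is an assumption).
train_idx <- sample(seq_len(n), size = floor(0.7 * n))
train <- toy[train_idx, ]
test <- toy[-train_idx, ]

# Decision tree: handles mixed numeric and categorical columns directly.
tree_fit <- rpart(y ~ ., data = train, method = "class")
tree_pred <- predict(tree_fit, newdata = test, type = "class")

# kNN is distance based, so it needs numeric, comparably scaled inputs:
# dummy-encode factors, then scale with training-set statistics only.
x_train <- model.matrix(y ~ . - 1, data = train)
x_test <- model.matrix(y ~ . - 1, data = test)
centers <- colMeans(x_train)
sds <- apply(x_train, 2, sd)
x_train_s <- scale(x_train, center = centers, scale = sds)
x_test_s <- scale(x_test, center = centers, scale = sds)

knn_pred <- knn(train = x_train_s, test = x_test_s, cl = train$y, k = 5)

# Confusion matrices and accuracy on the held-out rows.
tree_cm <- table(predicted = tree_pred, actual = test$y)
knn_cm <- table(predicted = knn_pred, actual = test$y)
print(tree_cm)
print(knn_cm)
cat("Tree accuracy:", mean(tree_pred == test$y), "\n")
cat("kNN accuracy: ", mean(knn_pred == test$y), "\n")
```

On imbalanced data, accuracy alone can look good while the minority class is rarely predicted, so the confusion matrices are usually the more informative output.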
The detailed project report can be found in the output folder. It is available in HTML format, providing comprehensive analysis, insights, and conclusions derived from the classification models.
To run the code and reproduce the results, you need to have R installed on your machine along with the necessary libraries specified in the code. Make sure to set the working directory correctly and run the scripts in the provided order to ensure proper execution.
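Before running the scripts, a small check like the following can confirm that the needed packages are installed and that the working directory is what you expect. The package list here is an assumption for illustration; replace it with the `library()` calls that actually appear in the scripts.

```r
# Assumed package list; replace with the packages the scripts actually load.
required_pkgs <- c("rpart", "class")

# requireNamespace() returns FALSE (without an error) if a package is missing.
installed <- vapply(required_pkgs, requireNamespace, logical(1), quietly = TRUE)
missing_pkgs <- required_pkgs[!installed]

if (length(missing_pkgs) > 0) {
  message("Install before running: ", paste(missing_pkgs, collapse = ", "))
  # install.packages(missing_pkgs)  # uncomment to install
} else {
  message("All listed packages are available.")
}

# Confirm the working directory before sourcing the scripts;
# use setwd() to change it if needed.
print(getwd())
```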
Contributions to this project are welcome! If you have any suggestions, improvements, or bug fixes, please feel free to open an issue or submit a pull request. Let's collaborate and make this project even better!
[1] Kaggle Datasets
[2] UCI Machine Learning Repository