"Predicting Bank Marketing Campaign Outcomes Using Classification and Regression Models Analyze the customer segmentation with Clustering. "
This project predicts the outcomes of bank marketing campaigns using several classification and regression models. The dataset describes phone calls made by a Portuguese bank to market term deposits. The goal is to determine whether a client will subscribe to a term deposit based on the available features.
- Regression_Final.ipynb: The main Jupyter notebook containing the entire workflow, including data preprocessing, feature engineering, model training, and evaluation. It also contains the clustering section.
Models used:
- Logistic Regression: Used for binary classification of subscription outcomes.
- Random Forest Classifier: Predicts whether a client will subscribe based on categorical and numerical features.
- Support Vector Classifier (SVC): Hyperparameters were tuned over a parameter grid to improve classification performance.
- Support Vector Regression (SVR): Applied to predict continuous target variables (regression task).
- Decision Tree Regression: A simple tree-based model for regression tasks.
- Random Forest Regression: Ensemble method for continuous predictions.
- K-means Clustering: Partitions the dataset into a predefined number of clusters for customer segmentation.
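
To make the clustering item above more concrete, here is a minimal K-means sketch with scikit-learn. The synthetic two-feature data, the choice of two clusters, and all variable names are assumptions made for illustration; they are not taken from the notebook.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for two numeric client features (assumption, not the real data).
rng = np.random.default_rng(0)
group_a = rng.normal(loc=[30, 1], scale=[3, 0.5], size=(50, 2))
group_b = rng.normal(loc=[60, 5], scale=[3, 0.5], size=(50, 2))
X = np.vstack([group_a, group_b])

# K-means is distance based, so features are put on a comparable scale first.
X_scaled = StandardScaler().fit_transform(X)

# The number of clusters is fixed in advance; 2 is an arbitrary choice here.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
labels = kmeans.fit_predict(X_scaled)

# Number of points assigned to each cluster.
print(np.bincount(labels))
```

On real data the number of clusters would normally be chosen by comparing several values, for example with the elbow method or silhouette scores.
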
Data preprocessing:
- Categorical features were one-hot encoded.
- Numerical features were scaled where needed (e.g., for SVC and Logistic Regression).
- Feature selection was applied for regression tasks to pick the most relevant variables.
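
The encoding and scaling steps above can be sketched with a scikit-learn `ColumnTransformer`. This is only an illustration: the tiny hand-made DataFrame, the choice of columns, the target column name `y`, and the pipeline layout are assumptions, and the notebook may organize these steps differently.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Tiny hand-made sample using a few of the dataset's column names (values are invented).
df = pd.DataFrame({
    "age": [25, 40, 58, 33, 47, 29],
    "campaign": [1, 3, 2, 1, 5, 2],
    "job": ["admin.", "technician", "retired", "admin.", "services", "student"],
    "marital": ["single", "married", "married", "single", "divorced", "single"],
    "y": [0, 0, 1, 0, 1, 1],
})
categorical_cols = ["job", "marital"]
numeric_cols = ["age", "campaign"]
X = df[categorical_cols + numeric_cols]

# One-hot encode the categorical columns and standardize the numeric ones.
preprocess = ColumnTransformer([
    ("onehot", OneHotEncoder(handle_unknown="ignore"), categorical_cols),
    ("scale", StandardScaler(), numeric_cols),
])

# Keeping preprocessing inside the pipeline means it is fitted on training data only.
model = Pipeline([
    ("preprocess", preprocess),
    ("clf", LogisticRegression(max_iter=1000)),
])
model.fit(X, df["y"])

# Inspect the transformed feature matrix produced by the fitted preprocessing step.
encoded = model.named_steps["preprocess"].transform(X)
print(encoded.shape)
```
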
Evaluation and results:
- Models were evaluated using metrics such as accuracy, precision, recall, and F1 score for classification, and mean squared error (MSE) and R² for regression.
- The Random Forest Classifier performed best among the classification models, and the Random Forest Regressor performed best among the regression models.
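
The metrics named above can be computed with `sklearn.metrics`, as in the following sketch. The data is synthetic and the model settings are arbitrary assumptions, so the printed numbers say nothing about the project's actual results.

```python
from sklearn.datasets import make_classification, make_regression
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor
from sklearn.metrics import (accuracy_score, f1_score, mean_squared_error,
                             precision_score, r2_score, recall_score)
from sklearn.model_selection import train_test_split

# Synthetic classification data stands in for the encoded bank features (assumption).
Xc, yc = make_classification(n_samples=400, n_features=10, random_state=0)
Xc_tr, Xc_te, yc_tr, yc_te = train_test_split(Xc, yc, test_size=0.25, random_state=0)

clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(Xc_tr, yc_tr)
yc_pred = clf.predict(Xc_te)
classification_metrics = {
    "accuracy": accuracy_score(yc_te, yc_pred),
    "precision": precision_score(yc_te, yc_pred),
    "recall": recall_score(yc_te, yc_pred),
    "f1": f1_score(yc_te, yc_pred),
}

# Synthetic regression data for the regression metrics (assumption).
Xr, yr = make_regression(n_samples=400, n_features=10, noise=10.0, random_state=0)
Xr_tr, Xr_te, yr_tr, yr_te = train_test_split(Xr, yr, test_size=0.25, random_state=0)

reg = RandomForestRegressor(n_estimators=100, random_state=0).fit(Xr_tr, yr_tr)
yr_pred = reg.predict(Xr_te)
regression_metrics = {
    "mse": mean_squared_error(yr_te, yr_pred),
    "r2": r2_score(yr_te, yr_pred),
}

print(classification_metrics)
print(regression_metrics)
```

Because relatively few clients subscribe in marketing data of this kind, precision, recall, and F1 are usually more informative than accuracy alone.
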
The dataset used for this project comes from the direct marketing campaigns of a Portuguese banking institution. It is available on Kaggle: https://www.kaggle.com/datasets/henriqueyamahata/bank-marketing
The dataset includes the following features:
- age: Age of the client (numeric).
- job: Type of job (categorical).
- marital: Marital status (categorical).
- education: Level of education (categorical).
- default: Whether the client has credit in default (categorical).
- housing: Whether the client has a housing loan (categorical).
- loan: Whether the client has a personal loan (categorical).
- contact: Contact communication type (categorical).
- month: Month of the last contact (categorical).
- day_of_week: Day of the week of the last contact (categorical).
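- duration: Duration of the last contact, in seconds (numeric).
- campaign: Number of contacts made to this client during this campaign (numeric).
- pdays: Number of days since the client was last contacted in a previous campaign (numeric; 999 means the client was not previously contacted).
- previous: Number of contacts made to this client before this campaign (numeric).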
- poutcome: Outcome of the previous campaign (categorical).
- emp.var.rate: Employment variation rate (numeric).
- cons.price.idx: Consumer price index (numeric).
- cons.conf.idx: Consumer confidence index (numeric).
- euribor3m: Euribor 3-month rate (numeric).
- nr.employed: Number of employees (numeric).