This repository contains a web app built with Streamlit and hosted on Streamlit Cloud. The app brings together five classification projects, each of which uses machine learning models to make predictions. The projects covered are:
- Spam Mail Prediction
- Titanic Survival Prediction
- Wine Quality Prediction
- Loan Status Prediction
- Credit Card Fraud Detection
Contents:
- Overview
- Installation
- Usage
- Dataset Description
- Technologies Used
- Model Development Process
- Models Used
- Model Evaluation
- Conclusion
- Deployment
- Contributing
- Contact
The web app lets users choose one of the five classification projects and get a prediction from the features they enter. Each project went through exploratory data analysis, data preprocessing, cross-validated model selection, feature selection, and hyperparameter tuning before its final model was built.
To run this project locally, follow these steps:
- Clone the repository
- Navigate to the project directory
- Install the required dependencies

```bash
git clone <repository_url>
cd <project_directory>
pip install -r requirements.txt
```
To start the Streamlit web app, run the following command in your terminal:

```bash
streamlit run streamlit_app.py
```
This will launch the web app in your default web browser. You can then select the desired classification project from the sidebar and input the required features to get a prediction.
The datasets behind each project are:
- Spam Mail Prediction: emails labeled as spam or not spam, with features such as email content, length, and specific words used.
- Titanic Survival Prediction: information about the passengers on the Titanic, with features such as age, sex, passenger class, and fare, used to predict survival.
- Wine Quality Prediction: features such as acidity, sugar levels, pH, and alcohol content, used to predict the quality of wine.
- Loan Status Prediction: features such as applicant income, loan amount, credit history, and employment status, used to predict loan approval status.
- Credit Card Fraud Detection: credit card transactions, with features such as transaction amount and frequency, used to detect fraudulent transactions.
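For the Spam Mail Prediction project, TF-IDF vectorization is one common way to turn raw email text into word-based features. The sketch below is an assumption and not necessarily the repository's approach. It combines scikit-learn's `TfidfVectorizer` with a linear Support Vector Classifier (`LinearSVC`) and trains on a handful of hypothetical emails and labels.

```python
# Minimal sketch (assumed approach): TF-IDF text features + linear SVC.
# The emails and labels below are hypothetical toy data.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

emails = [
    "Win a free prize now, click here",
    "Limited offer: claim your free reward today",
    "Meeting moved to 3pm, see agenda attached",
    "Can you review the quarterly report before Friday?",
]
labels = [1, 1, 0, 0]  # 1 = spam, 0 = not spam

# TfidfVectorizer weights each word by how specific it is to a message;
# stop_words="english" drops very common words such as "the" and "to".
spam_model = make_pipeline(
    TfidfVectorizer(lowercase=True, stop_words="english"),
    LinearSVC(),
)
spam_model.fit(emails, labels)

# Raw text goes in; the pipeline vectorizes it before classifying.
prediction = spam_model.predict(["Claim your free prize today"])
print(prediction)
```

Wrapping the vectorizer and classifier in one pipeline means a web app could pass the raw text a user types straight to `predict`.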
The following technologies are used:
- Programming Language: Python
- Web Framework: Streamlit
- Machine Learning Libraries: scikit-learn, XGBoost
- Data Analysis and Visualization: Pandas, NumPy, Matplotlib, Seaborn
Each classification project was developed through the following steps:
- Importing the dependencies
- Exploratory Data Analysis (EDA)
- Data preprocessing
  - Handling missing values
  - Handling outliers
  - Label encoding/one-hot encoding
  - Standardizing the data
- Model selection
  - Selected five commonly used classification models
  - Trained each model and checked its cross-validation scores
  - Chose the top 3 models based on cross-validation scores
- Model building and evaluation
  - Selected the best features using Recursive Feature Elimination (RFE)
  - Performed hyperparameter tuning using Grid Search CV
  - Built the final model with the best hyperparameters and features
  - Evaluated the model using classification reports
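The preprocessing and model-selection steps above can be sketched with scikit-learn pipelines. This is a minimal illustration under several assumptions:
- The data are a synthetic, Titanic-like table generated on the fly.
- Outliers are clipped with an IQR rule.
- `GradientBoostingClassifier` stands in for XGBoost, so the example needs only scikit-learn, pandas, and NumPy.

```python
# Sketch: impute, clip outliers, encode, scale, then rank 5 models by CV accuracy.
# Data, column names, and the IQR clipping rule are illustrative assumptions.
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame({
    "age": rng.normal(30, 12, n).round(),
    "fare": rng.exponential(30, n),
    "sex": rng.choice(["male", "female"], n),
    "pclass": rng.choice([1, 2, 3], n),
})
df.loc[rng.choice(n, 20, replace=False), "age"] = np.nan  # inject missing values
y = ((df["sex"] == "female") | (df["pclass"] == 1)).astype(int)

# Outlier handling: clip fare to [Q1 - 1.5*IQR, Q3 + 1.5*IQR]
q1, q3 = df["fare"].quantile([0.25, 0.75])
iqr = q3 - q1
df["fare"] = df["fare"].clip(q1 - 1.5 * iqr, q3 + 1.5 * iqr)

# Missing values -> median, numeric columns standardized, categoricals one-hot encoded
preprocess = ColumnTransformer([
    ("num", Pipeline([("impute", SimpleImputer(strategy="median")),
                      ("scale", StandardScaler())]), ["age", "fare"]),
    ("cat", OneHotEncoder(handle_unknown="ignore"), ["sex", "pclass"]),
])

candidates = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Support Vector Classifier": SVC(),
    "Random Forest Classifier": RandomForestClassifier(random_state=0),
    "K-Nearest Neighbour": KNeighborsClassifier(),
    "Gradient Boosting (XGBoost stand-in)": GradientBoostingClassifier(random_state=0),
}

# Preprocessing sits inside the pipeline, so each CV fold is fitted without leakage
cv_scores = {}
for name, model in candidates.items():
    pipe = Pipeline([("prep", preprocess), ("model", model)])
    cv_scores[name] = cross_val_score(pipe, df, y, cv=5, scoring="accuracy").mean()

# Keep the three models with the highest mean cross-validation accuracy
top3 = sorted(cv_scores, key=cv_scores.get, reverse=True)[:3]
print(top3)
```

Keeping imputation and scaling inside the pipeline means they are refitted on each training fold, so the cross-validation scores are not inflated by information from the held-out fold.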
The top 3 models for each classification project are as follows:
- Spam Mail Prediction:
  - Support Vector Classifier: Effective in high-dimensional spaces.
  - XGBoost: Boosting algorithm known for high performance.
  - Random Forest Classifier: Ensemble method that reduces overfitting.
- Titanic Survival Prediction:
  - Logistic Regression: Interpretable and a strong baseline for classification.
  - XGBoost: Boosting algorithm known for high performance.
  - K-Nearest Neighbour: Simple algorithm that works well with small datasets.
- Wine Quality Prediction:
  - Logistic Regression: Interpretable and a strong baseline for classification.
  - XGBoost: Boosting algorithm known for high performance.
  - K-Nearest Neighbour: Simple algorithm that works well with small datasets.
- Loan Status Prediction:
  - XGBoost: Excellent performance with complex datasets.
  - Random Forest Classifier: Robust and handles missing values well.
  - Logistic Regression: Highly interpretable and performs well with binary classification.
- Credit Card Fraud Detection:
  - XGBoost: Powerful gradient boosting framework.
  - Random Forest Classifier: Ensemble method that reduces overfitting.
  - Support Vector Classifier: Effective in high-dimensional spaces.
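The Random Forest Classifier from the list above can serve as an example of the RFE and Grid Search CV steps in the development process. The sketch runs on synthetic data from `make_classification`. The grid values, and the choice to search over the number of features RFE keeps, are illustrative assumptions rather than the project's actual settings.

```python
# Sketch: RFE feature selection + Grid Search CV for a Random Forest.
# Synthetic data and grid values are illustrative assumptions.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFE
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline

# Synthetic stand-in data: 8 features, 3 of them informative
X, y = make_classification(n_samples=300, n_features=8, n_informative=3,
                           random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)

pipe = Pipeline([
    # RFE refits the inner forest and drops the least important feature each round
    ("rfe", RFE(RandomForestClassifier(n_estimators=20, random_state=0))),
    # Final classifier trained only on the features RFE keeps
    ("model", RandomForestClassifier(random_state=0)),
])

# Grid over the number of kept features and two forest hyperparameters
param_grid = {
    "rfe__n_features_to_select": [3, 5],
    "model__n_estimators": [25, 50],
    "model__max_depth": [None, 5],
}
search = GridSearchCV(pipe, param_grid, cv=3, scoring="accuracy")
search.fit(X_train, y_train)  # refits the best pipeline on all training data

test_accuracy = search.score(X_test, y_test)
print(search.best_params_)
print(f"Test accuracy: {test_accuracy:.2%}")
```

Putting RFE inside the searched pipeline lets the grid choose how many features to keep together with the model's hyperparameters. Every combination is also judged on folds that RFE never saw.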
The evaluation scores of the top 3 models for each project are:
- Spam Mail Prediction:
  - Support Vector Classifier: 98.21%
  - XGBoost: 98.21%
  - Random Forest Classifier: 96.59%
- Titanic Survival Prediction:
  - Logistic Regression: 81.00%
  - XGBoost: 79.33%
  - K-Nearest Neighbour: 78.21%
- Wine Quality Prediction:
  - Logistic Regression: 67.50%
  - XGBoost: 66.25%
  - K-Nearest Neighbour: 58.44%
- Loan Status Prediction:
  - XGBoost: 99.30%
  - Random Forest Classifier: 98.83%
  - Logistic Regression: 95.55%
- Credit Card Fraud Detection:
  - XGBoost: 92.38%
  - Random Forest Classifier: 91.88%
  - Support Vector Classifier: 91.37%
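Assuming the percentages above are accuracy scores, the sketch below shows how such a figure and a classification report can be produced with scikit-learn. It scores hypothetical predictions. It also shows why the per-class metrics in the report matter for imbalanced problems such as fraud detection: here 90% accuracy hides that only half of the fraud cases are caught.

```python
# Sketch: accuracy vs. per-class metrics on hypothetical, imbalanced labels.
from sklearn.metrics import accuracy_score, classification_report

# Hypothetical labels for 10 transactions: 1 = fraud, 0 = legitimate
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
y_pred = [0, 0, 0, 0, 0, 0, 0, 0, 0, 1]  # one fraud case missed

accuracy = accuracy_score(y_true, y_pred)
print(f"Accuracy: {accuracy:.2%}")  # → Accuracy: 90.00%

# Precision, recall and F1 for each class, plus macro/weighted averages
print(classification_report(y_true, y_pred, target_names=["legitimate", "fraud"]))

# output_dict=True returns the same numbers keyed by class label as strings
report = classification_report(y_true, y_pred, output_dict=True)
fraud_recall = report["1"]["recall"]
print(f"Fraud recall: {fraud_recall:.2%}")  # → Fraud recall: 50.00%
```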
This ML Classification Projects WebApp provides an easy-to-use interface for predicting outcomes from input features. Each project's models were chosen by cross-validation and tuned with Grid Search CV. The app aims to assist with decision-making and classification tasks across different domains.
The web app is hosted on Streamlit Cloud. You can access it using the following link:
ML Classification Projects WebApp
Contributions are welcome! If you have any suggestions or improvements, please create a pull request or open an issue.
If you have any questions or suggestions, feel free to contact me at prachetpandav283@gmail.com.