Welcome to my Data Science Projects Repository! It collects my data science projects to demonstrate my skills in the field. Each project covers different aspects of data analysis, visualization, machine learning, and cloud computing.
Description: This project tackles the challenge of predicting house prices using the Kaggle dataset "House Prices: Advanced Regression Techniques." As a self-taught data science enthusiast, my goal was to showcase skills in exploratory data analysis, feature engineering, and machine learning models.
Technologies Used:
- Python
- Exploratory Data Analysis (EDA)
- Feature Engineering
- Linear Regression, Decision Tree Regression, K-Nearest Neighbors (KNN)
- Data Visualization
Results:
- Explored dataset and handled missing values by substituting with -1.
- Conducted creative feature engineering to enhance model performance.
- Utilized three machine learning models, with Linear Regression achieving the lowest mean squared error.
- Validated the Linear Regression model through visualization, confirming its accuracy.
- Achieved a Kaggle score of 0.25 for house price predictions (the competition scores submissions by RMSE on the logarithm of the sale price; lower is better).
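
The workflow in the results above (fill missing values with -1, fit three regressors, compare mean squared error) can be sketched roughly as follows. This is a minimal illustration on synthetic data, not the project's actual notebook: the feature matrix, the 5% missing rate, and the hyperparameters are all assumptions made for the example.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsRegressor
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)

# Synthetic stand-in for the housing features (NOT the Kaggle data).
X = rng.normal(size=(300, 4))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.1, size=300)

# Knock out roughly 5% of the entries, then substitute -1 for the gaps.
missing = rng.random(X.shape) < 0.05
X[missing] = np.nan
X_filled = np.where(np.isnan(X), -1.0, X)

X_train, X_test, y_train, y_test = train_test_split(
    X_filled, y, test_size=0.25, random_state=0
)

# The three model families named in this project (hyperparameters assumed).
models = {
    "linear_regression": LinearRegression(),
    "decision_tree": DecisionTreeRegressor(random_state=0),
    "knn": KNeighborsRegressor(n_neighbors=5),
}

# Fit each model and record its mean squared error on the held-out split.
mse_by_model = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    mse_by_model[name] = mean_squared_error(y_test, model.predict(X_test))

# The model with the lowest held-out MSE.
best = min(mse_by_model, key=mse_by_model.get)
```

Which model comes out best depends on the data; the sketch only shows how the comparison is set up.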
Check my article on Medium about this project
This project showcased the ability to independently tackle real-world data challenges and deliver valuable insights through exploratory analytics and feature engineering. The outcome solidified my understanding of evaluation methodologies and reinforced my passion for data science.
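
For context on the Kaggle score reported above: the House Prices competition evaluates submissions by the root mean squared error between the logarithm of the predicted price and the logarithm of the actual price. A small sketch of that metric, using made-up prices and a hypothetical helper name `rmse_of_logs`, might look like this:

```python
import math


def rmse_of_logs(y_true, y_pred):
    """RMSE between log(actual) and log(predicted) prices; lower is better."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [
        (math.log(pred) - math.log(true)) ** 2
        for true, pred in zip(y_true, y_pred)
    ]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Made-up sale prices, for illustration only.
actual = [200_000.0, 150_000.0, 320_000.0]

# Perfect predictions give a score of zero.
perfect = rmse_of_logs(actual, actual)

# Predictions that are all 10% too high give a score of about log(1.1).
ten_percent_high = rmse_of_logs(actual, [1.1 * a for a in actual])
```

Because the error is measured on the log scale, being 10% off on a cheap house counts the same as being 10% off on an expensive one.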
Description: The objective of this project is to predict customer churn in a telecom company. Customer churn is the rate at which customers stop doing business with a company or discontinue its services. To that end, I developed a machine learning model that predicts which customers will leave the company.
Technologies Used:
- Python
- Exploratory Data Analysis (EDA)
- Data Preprocessing - Robust Scaler
- Feature Engineering
- Logistic Regression, Random Forest Classifier, XGBoost Classifier (XGBClassifier)
- Data Visualization
- Encoding Variables using LabelEncoder
- Model evaluation with a confusion matrix
Results:
- Developed an understanding of the business problem.
- Explored the dataset through visualizations.
- Conducted feature engineering to improve model performance.
- Utilized three machine learning models, of which Logistic Regression performed best.
- Achieved 80% accuracy; the model predicted 460 customer churns.
- Presented conclusions and recommendations to the company based on the analysis.
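
A rough sketch of how the pieces listed for this project fit together (LabelEncoder, Robust Scaler, Logistic Regression, confusion matrix) is shown below. The data is synthetic, and the column names `contract` and `monthly_charges` as well as the churn rule are assumptions for illustration only, not the project's real dataset.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder, RobustScaler

rng = np.random.default_rng(42)
n = 400

# Hypothetical columns: a categorical contract type and a numeric charge.
contract = rng.choice(["month-to-month", "one-year", "two-year"], size=n)
monthly_charges = rng.uniform(20, 120, size=n)

# Synthetic churn rule (an assumption): short contracts and high charges
# make churn more likely.
churn_prob = 0.1 + 0.5 * (contract == "month-to-month") + 0.3 * (monthly_charges > 80)
churned = (rng.random(n) < churn_prob).astype(int)

# Encode the categorical column as integers.
contract_encoded = LabelEncoder().fit_transform(contract)
X = np.column_stack([contract_encoded, monthly_charges]).astype(float)

X_train, X_test, y_train, y_test = train_test_split(
    X, churned, test_size=0.25, random_state=42, stratify=churned
)

# Fit the scaler on the training split only, then reuse it on the test split.
scaler = RobustScaler().fit(X_train)
model = LogisticRegression().fit(scaler.transform(X_train), y_train)
y_pred = model.predict(scaler.transform(X_test))

accuracy = accuracy_score(y_test, y_pred)

# Rows are true classes, columns are predicted classes (0 = stays, 1 = churns).
cm = confusion_matrix(y_test, y_pred, labels=[0, 1])

# Number of customers the model flags as churners: the "predicted 1" column.
predicted_churners = int(cm[:, 1].sum())
```

Summing the second column of the confusion matrix gives the count of predicted churners, the same kind of figure reported in the results above.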
Description: The challenge is to recognize fraudulent credit card transactions so that the customers of credit card companies are not charged for items that they did not purchase.
Technologies Used:
- Google Colab
- Exploratory Data Analysis (EDA)
- Data Preprocessing
- Feature Engineering
- MLP Classifier, Random Forest Classifier, Logistic Regression
- Data Visualization
- Robust Scaler and sampling to deal with imbalanced data
- Model evaluation
Results:
- Developed an understanding of the business problem.
- Explored the dataset through visualizations.
- Conducted feature engineering to improve model performance.
- Utilized three machine learning models, of which Logistic Regression performed best.
- Achieved 99.96% accuracy with Logistic Regression and 99.95% with the MLP Classifier.
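
One common way to apply sampling to imbalanced data is random undersampling of the majority class. The sketch below is an illustration under that assumption, since the exact sampling strategy used in the project is not stated here. The helper name `undersample_majority` and the synthetic 20-in-10,000 fraud rate are hypothetical.

```python
import numpy as np


def undersample_majority(X, y, random_state=0):
    """Randomly keep as many rows of each class as the smallest class has."""
    rng = np.random.default_rng(random_state)
    classes, counts = np.unique(y, return_counts=True)
    n_keep = counts.min()
    keep = np.concatenate([
        rng.choice(np.flatnonzero(y == c), size=n_keep, replace=False)
        for c in classes
    ])
    rng.shuffle(keep)  # avoid returning the rows grouped by class
    return X[keep], y[keep]


# Synthetic, heavily imbalanced labels: 20 fraud cases in 10,000 transactions.
rng = np.random.default_rng(1)
y = np.zeros(10_000, dtype=int)
y[:20] = 1
X = rng.normal(size=(10_000, 3))

# Accuracy of a model that always predicts "legitimate" on this data.
baseline_accuracy = float((y == 0).mean())

# Balanced sample: 20 fraud rows and 20 randomly chosen legitimate rows.
X_balanced, y_balanced = undersample_majority(X, y)
```

On data this skewed, always predicting "legitimate" already scores 99.8% accuracy in the sketch, so accuracy is usually read alongside the confusion matrix. Resampling is normally applied to the training split only, so that evaluation reflects the real class balance.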