Data Science Projects

Welcome to my Data Science Projects repository! It collects my data science projects and shows my skills in the field. Each project demonstrates different aspects of data analysis, visualization, machine learning, and cloud computing.

Projects

1. House Price Prediction

Description: As a self-taught data science enthusiast, I took on the challenge of predicting house prices using the Kaggle dataset "House Prices: Advanced Regression Techniques." The goal was to showcase skills in exploratory data analysis, feature engineering, and machine learning models.

Technologies Used:

Python
Exploratory Data Analysis (EDA)
Feature Engineering
Linear Regression, Tree Regression, K-Nearest Neighbors (KNN)
Data Visualization

Results:

Explored the dataset and handled missing values by replacing them with -1.
Conducted creative feature engineering to enhance model performance.
Utilized three machine learning models, with Linear Regression achieving the lowest mean squared error.
Validated the Linear Regression model through visualization, confirming its accuracy.
Achieved a Kaggle score of 0.25 for house price predictions (the competition metric is the root mean squared error on the logarithm of the sale price, so lower is better).
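
The steps listed above (filling missing values with -1, then picking the model with the lowest mean squared error) can be sketched with the standard library alone. This is a minimal illustration, not the project's code: the rows, prices, and per-model predictions are hypothetical, and the helper names fill_missing, mean_squared_error, and rmse_of_logs are invented for this example.

```python
import math

def fill_missing(rows, sentinel=-1):
    """Replace None entries in each row with a sentinel value."""
    return [[sentinel if value is None else value for value in row] for row in rows]

def mean_squared_error(y_true, y_pred):
    """Average of the squared differences between targets and predictions."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def rmse_of_logs(y_true, y_pred):
    """RMSE computed on log prices, the metric the Kaggle competition reports."""
    logs_true = [math.log(t) for t in y_true]
    logs_pred = [math.log(p) for p in y_pred]
    return math.sqrt(mean_squared_error(logs_true, logs_pred))

# Hypothetical rows (two numeric features); None marks a missing value.
rows = [[8450, 2003], [9600, None], [None, 2001]]
cleaned = fill_missing(rows)

# Hypothetical sale prices and predictions from three already-fitted models.
y_true = [200000, 150000, 300000]
predictions = {
    "linear_regression": [210000, 145000, 290000],
    "tree_regression": [180000, 170000, 330000],
    "knn": [230000, 120000, 260000],
}

# Score every model and keep the one with the lowest mean squared error.
scores = {name: mean_squared_error(y_true, pred) for name, pred in predictions.items()}
best_model = min(scores, key=scores.get)
kaggle_style_score = rmse_of_logs(y_true, predictions[best_model])
```

A sentinel such as -1 keeps every row usable, but it only makes sense when -1 cannot occur as a real value in the column.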

Check my article on Medium about this project

This project showcased the ability to independently tackle real-world data challenges and deliver valuable insights through exploratory analytics and feature engineering. The outcome solidified my understanding of evaluation methodologies and reinforced my passion for data science.

2. Telecom Churn Prediction

Description: The objective of this project is to predict customer churn in a telecom company. Customer churn is the rate at which customers stop doing business with a company or discontinue its services. To that end, the project develops a machine learning model that predicts which customers will leave the company.

Technologies Used:

Python
Exploratory Data Analysis (EDA)
Data Preprocessing - Robust Scaler
Feature Engineering
Logistic Regression, Random Forest, XGB Classifier
Data Visualization
Encoding Variables using LabelEncoder
Model evaluation with a confusion matrix
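
To make the two preprocessing steps in this list concrete, here is a small standard-library sketch of what label encoding and robust scaling do. LabelEncoder and Robust Scaler match the names of scikit-learn classes, but which library the project used is an assumption, and the functions label_encode and robust_scale, the column names, and the values below are hypothetical.

```python
import statistics

def label_encode(values):
    """Map each distinct category to an integer code, assigned in sorted order."""
    classes = sorted(set(values))
    codes = {label: index for index, label in enumerate(classes)}
    return [codes[value] for value in values], classes

def robust_scale(values):
    """Center on the median and divide by the interquartile range (IQR)."""
    median = statistics.median(values)
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1  # assumed non-zero for this sketch
    return [(value - median) / iqr for value in values]

# Hypothetical categorical column.
contract = ["Month-to-month", "Two year", "One year", "Month-to-month"]
encoded, classes = label_encode(contract)

# Hypothetical numeric column with one extreme value.
monthly_charges = [20.0, 30.0, 40.0, 50.0, 500.0]
scaled = robust_scale(monthly_charges)
```

Because the median and IQR ignore the tails of the distribution, the extreme value 500.0 does not squeeze the other four values together the way min-max scaling would.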

Results:

Developed an understanding of the business problem.
Explored the dataset through visualizations.
Conducted feature engineering to improve model performance.
Utilized three machine learning models, of which Logistic Regression performed best.
Achieved 80% accuracy; the model predicted that 460 customers would churn.
Presented conclusions and recommendations to the company based on the analysis.
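
The accuracy and the count of predicted churners reported above both come from a confusion matrix. The sketch below shows how those numbers are derived; it assumes binary labels with 1 meaning churn, uses toy data rather than the project's data, and the function name confusion_counts is made up for this example.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count true/false positives and negatives for a binary classifier."""
    counts = {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
    for actual, predicted in zip(y_true, y_pred):
        if predicted == positive:
            counts["tp" if actual == positive else "fp"] += 1
        else:
            counts["fn" if actual == positive else "tn"] += 1
    return counts

# Toy labels: 1 = customer churned, 0 = customer stayed.
y_true = [1, 0, 0, 1, 0, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 0, 0, 1, 1, 0, 1, 0]

counts = confusion_counts(y_true, y_pred)

# Accuracy is the share of correct predictions, positive or negative.
accuracy = (counts["tp"] + counts["tn"]) / len(y_true)

# Predicted churners are every customer the model flagged, correctly or not.
predicted_churners = counts["tp"] + counts["fp"]
```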

3. Credit Card Fraud Detection

Description: The challenge is to recognize fraudulent credit card transactions so that the customers of credit card companies are not charged for items that they did not purchase.

Technologies Used:

Google Colab
Exploratory Data Analysis (EDA)
Data Preprocessing
Feature Engineering
MLP Classifier, Random Forest, Logistic Regression
Data Visualization
Robust Scaler and sampling to deal with imbalanced data
Model evaluation
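
The list above mentions sampling to deal with imbalanced data without saying which kind. One common option is random undersampling of the majority class, sketched here with the standard library. Treat the choice of undersampling as an assumption; the function name undersample and the toy transactions are hypothetical.

```python
import random

def undersample(rows, labels, seed=0):
    """Randomly drop majority-class rows until both classes have the same size."""
    rng = random.Random(seed)
    positives = [i for i, label in enumerate(labels) if label == 1]
    negatives = [i for i, label in enumerate(labels) if label == 0]
    minority, majority = sorted([positives, negatives], key=len)
    # Keep every minority row plus an equally sized random draw from the majority.
    kept = minority + rng.sample(majority, len(minority))
    rng.shuffle(kept)
    return [rows[i] for i in kept], [labels[i] for i in kept]

# Toy data: 100 transactions, of which only 3 are fraudulent (label 1).
labels = [1 if i in (10, 42, 77) else 0 for i in range(100)]
rows = [[float(i)] for i in range(100)]

balanced_rows, balanced_labels = undersample(rows, labels)
```

Resampling is normally applied to the training split only, so that the evaluation data keeps its original class ratio.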

Results:

Developed an understanding of the business problem.
Explored the dataset through visualizations.
Conducted feature engineering to improve model performance.
Utilized three machine learning models, of which Logistic Regression performed best.
Achieved 99.96% accuracy using Logistic Regression and 99.95% with the MLP Classifier.
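
When reading accuracy figures for fraud detection, it can help to look at precision and recall as well, because fraudulent transactions are usually rare. The sketch below uses made-up data (20 fraud cases in 10,000 transactions, not the project's dataset) and a hypothetical helper precision_recall to show how a baseline that never flags fraud still scores very high accuracy.

```python
def precision_recall(y_true, y_pred, positive=1):
    """Return (precision, recall) for the positive class; 0.0 when undefined."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall

# Made-up data: 20 fraudulent transactions (1) among 10,000.
y_true = [1] * 20 + [0] * 9980

# Baseline that labels every transaction as legitimate.
always_legit = [0] * len(y_true)

baseline_accuracy = sum(t == p for t, p in zip(y_true, always_legit)) / len(y_true)
baseline_precision, baseline_recall = precision_recall(y_true, always_legit)
```

In this toy setup the baseline is right about almost every transaction yet catches none of the fraud, which is why recall on the fraud class is worth reporting next to accuracy.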
