This dataset and repository consist of all Netflix original films released as of June 1st, 2021, including all Netflix documentaries and specials. The data was web-scraped from this Wikipedia page and then merged with a dataset of the corresponding IMDB scores. IMDB scores are voted on by community members, and the majority of the films have 1,000+ reviews. The dataset consists of: Title, Genre, Premiere date, IMDB score, Runtime, and Language.
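A minimal sketch of a first exploration step on this dataset, using a small in-memory sample with the columns described above (the titles, values, and column spellings here are illustrative assumptions, not the real scraped data):

```python
import pandas as pd

# Small stand-in sample; the real project would load the scraped/merged CSV
sample = pd.DataFrame({
    "Title": ["Film A", "Film B", "Film C"],
    "Genre": ["Documentary", "Drama", "Documentary"],
    "Premiere": ["2020-01-10", "2021-03-05", "2019-07-22"],
    "IMDB Score": [7.1, 6.4, 8.0],
    "Runtime": [90, 110, 85],
    "Language": ["English", "English", "Spanish"],
})

# A typical first question: average IMDB score per genre
avg_by_genre = sample.groupby("Genre")["IMDB Score"].mean()
print(avg_by_genre)
```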
This repository looks at football, carrying out a range of activities with football data, including exploratory data analysis, data visualization, and many other topics. It consists mainly of Jupyter Notebooks written in Python.
This is a natural language processing problem in which sentiment analysis is performed by separating positive tweets from negative tweets, using classification, text mining, text analysis, data analysis, and data visualization.
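A common baseline for this kind of tweet classification is TF-IDF features fed into a linear classifier; the sketch below uses scikit-learn on a tiny made-up corpus (the tweets, labels, and choice of logistic regression are assumptions for illustration, not the project's actual data or model):

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Tiny illustrative corpus; the real project uses a labelled tweet dataset
tweets = ["I love this movie", "great day so far", "this is awful", "worst service ever"]
labels = [1, 1, 0, 0]  # 1 = positive, 0 = negative

# TF-IDF vectorization + logistic regression as a simple sentiment baseline
model = make_pipeline(TfidfVectorizer(), LogisticRegression())
model.fit(tweets, labels)

print(model.predict(["what a great film"]))
```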
Power BI Sales Dashboard for Global Super Store • The project involves creating an interactive Power BI Sales Dashboard using Global_super_store sales data.
• The ETL process was performed to clean and transform the data using Power Query.
• DAX was used for creating calculated measures and calculated columns.
• Visualizations and reports were created using cards, charts and slicers to provide insights and easy understanding for end users.
• The tools used were Microsoft Power BI and MS Excel.
The Data Science Job Salaries dataset contains 11 columns, including:
• work_year: The year the salary was paid.
• experience_level: The experience level in the job during the year.
• employment_type: The type of employment for the role.
• job_title: The role worked in during the year.
• salary: The total gross salary amount paid.
• employee_residence: Employee's primary country of residence during the work year, as an ISO 3166 country code.
• remote_ratio: The overall amount of work done remotely.
• company_location: The country of the employer's main office or contracting branch.
• company_size: The median number of people that worked for the company during the year.
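A quick sketch of how one might start exploring these columns with pandas, using a small in-memory sample mirroring the schema above (the rows and values are invented; the real project would read the dataset's CSV file):

```python
import pandas as pd

# Stand-in sample with the documented columns; values are illustrative only
sample = pd.DataFrame({
    "work_year": [2020, 2021, 2021],
    "experience_level": ["MI", "SE", "EN"],
    "employment_type": ["FT", "FT", "PT"],
    "job_title": ["Data Scientist", "ML Engineer", "Data Analyst"],
    "salary": [85000, 120000, 40000],
    "employee_residence": ["US", "DE", "IN"],
    "remote_ratio": [100, 50, 0],
    "company_location": ["US", "DE", "IN"],
    "company_size": ["M", "L", "S"],
})

# A natural first cut: average salary by experience level
salary_by_level = sample.groupby("experience_level")["salary"].mean()
print(salary_by_level)
```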
Here are the things I have done.
• Basics of Apache Spark (architecture, transformations, actions, lazy evaluation)
• Creating a Databricks account and learning its basics
• The Structured API and how to write transformation functions
• Using SQL to analyze IPL data
• Building visualizations to gain more insights
The goal of this project is to give you an overall understanding of Apache Spark and its different functions for writing transformation blocks. On top of that, you will learn to use SQL to analyze data and build visualizations.
Data Loading and Exploration: Imported the necessary libraries and loaded the dataset from a CSV file. Explored the dataset with the head(), info(), and describe() methods and the shape attribute to understand its structure and summary statistics.
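The exploration step above might look like the following sketch; a small stand-in frame replaces the real CSV load (the filename and rows are assumptions):

```python
import pandas as pd

# Stand-in for pd.read_csv("loan_data.csv") — filename is hypothetical
df = pd.DataFrame({
    "Gender": ["Male", "Female", None],
    "ApplicantIncome": [5000, 3000, 4000],
    "LoanAmount": [130.0, None, 120.0],
    "Loan_Status": ["Y", "N", "Y"],
})

print(df.head())       # first rows
print(df.shape)        # (rows, columns)
print(df.describe())   # summary statistics for numeric columns
```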
Missing Value Handling: Identified missing values using isnull().sum(). Filled missing values in categorical columns (e.g., Gender, Married) with the mode, and in numerical columns (e.g., LoanAmount, Loan_Amount_Term) with the mean or mode as appropriate.
Feature Engineering: Created new features such as TotalIncome by summing ApplicantIncome and CoapplicantIncome. Transformed skewed data using logarithmic scaling (LoanAmount_log and TotalIncome_log).
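The imputation and feature-engineering steps above can be sketched as follows, on a small stand-in frame (the rows are invented; the real project applies the same operations to the full loan dataset):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Gender": ["Male", None, "Female", "Male"],
    "ApplicantIncome": [5000, 3000, 4000, 6000],
    "CoapplicantIncome": [0, 1500, 0, 2000],
    "LoanAmount": [130.0, np.nan, 120.0, 200.0],
})

print(df.isnull().sum())  # identify missing values per column

# Categorical column: fill with the mode; numerical column: fill with the mean
df["Gender"] = df["Gender"].fillna(df["Gender"].mode()[0])
df["LoanAmount"] = df["LoanAmount"].fillna(df["LoanAmount"].mean())

# New feature plus log transforms to reduce skew
df["TotalIncome"] = df["ApplicantIncome"] + df["CoapplicantIncome"]
df["LoanAmount_log"] = np.log(df["LoanAmount"])
df["TotalIncome_log"] = np.log(df["TotalIncome"])

print(df[["TotalIncome", "LoanAmount_log", "TotalIncome_log"]])
```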
Data Visualization: Used histograms and boxplots to visualize the distribution of ApplicantIncome, CoapplicantIncome, LoanAmount, and their logarithmic transformations. Examined the relationship between Credit_History and Loan_Status using cross-tabulation.
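A sketch of the visualization step: a histogram of applicant income and a cross-tabulation of Credit_History against Loan_Status (the sample rows are invented for illustration):

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the script runs without a display
import matplotlib.pyplot as plt
import pandas as pd

df = pd.DataFrame({
    "ApplicantIncome": [5000, 3000, 4000, 6000],
    "Credit_History": [1.0, 0.0, 1.0, 1.0],
    "Loan_Status": ["Y", "N", "Y", "Y"],
})

# Histogram of income; boxplots follow the same pattern with kind="box"
df["ApplicantIncome"].plot(kind="hist", bins=10)
plt.savefig("income_hist.png")

# Relationship between credit history and loan approval
ct = pd.crosstab(df["Credit_History"], df["Loan_Status"])
print(ct)
```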
Data Preparation: Selected relevant features for model training and separated the target variable (Loan_Status). Split the data into training and testing sets using train_test_split. Encoded categorical variables into numerical values using LabelEncoder.
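The preparation step above can be sketched as follows; the feature columns and rows are stand-ins for the real loan data:

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder

df = pd.DataFrame({
    "Gender": ["Male", "Female", "Male", "Female"] * 5,
    "Credit_History": [1.0, 0.0, 1.0, 1.0] * 5,
    "Loan_Status": ["Y", "N", "Y", "Y"] * 5,
})

# Encode categorical variables into numerical values
le = LabelEncoder()
df["Gender"] = le.fit_transform(df["Gender"])
df["Loan_Status"] = le.fit_transform(df["Loan_Status"])

# Separate features from the target, then split into train and test sets
X = df[["Gender", "Credit_History"]]
y = df["Loan_Status"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42
)
print(X_train.shape, X_test.shape)
```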
Model Training and Evaluation: Applied the Naive Bayes Classifier to train the model on the training set. Evaluated the model's performance on the test set, likely calculating metrics such as accuracy, precision, recall, and F1-score (though the evaluation part isn't explicitly mentioned in the provided code).
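Since the evaluation code isn't shown in the project, here is a hedged sketch of how the Naive Bayes training and scoring step might look, using synthetic data in place of the encoded loan features (the dataset, GaussianNB variant, and metric choice are assumptions):

```python
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

# Synthetic stand-in for the encoded loan features and target
X, y = make_classification(n_samples=200, n_features=4, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Train the Naive Bayes classifier and evaluate on the held-out test set
model = GaussianNB()
model.fit(X_train, y_train)
acc = accuracy_score(y_test, model.predict(X_test))
print(f"test accuracy: {acc:.2f}")
```

precision_score, recall_score, and f1_score from sklearn.metrics slot in the same way as accuracy_score here.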