Introduction to exploratory data analysis using seaborn

JoydELC/EDA

Exploratory Data Analysis

This post consists of two examples written in a Google Colab notebook. The first walks through the EDA steps needed to preprocess the data well before building our own models. The second analyzes the distributions in the Iris data set using the available plotting libraries.

EDA

image

Usage - Example 1: Car Price Dataset.

First, we need both data sets, data_price_cars.csv and Iris.csv, in our Colab workspace. With the first data set we perform the EDA steps using the following commands.

import pandas as pd

df = pd.read_csv("data_price_cars.csv")
# Display the top 5 rows
df.head(5)
# Remove irrelevant columns
df = df.drop(['Engine Fuel Type', 'Market Category', 'Vehicle Style', 'Popularity', 'Number of Doors', 'Vehicle Size'], axis=1)
# Rename columns
df = df.rename(columns={"Engine HP": "HP", "Engine Cylinders": "Cylinders", "Transmission Type": "Transmission", "Driven_Wheels": "Drive Mode", "highway MPG": "MPG-H", "city mpg": "MPG-C", "MSRP": "Price"})
# Keep a copy of the duplicated rows for inspection, then remove them
duplicate_rows_df = df[df.duplicated()]
df = df.drop_duplicates()
# Drop rows with missing values
df = df.dropna()
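
Before dropping rows, it can help to check how many duplicates and missing values there are. The following is a minimal sketch on a made-up toy frame (the `toy` data and its values are assumptions for illustration, not taken from data_price_cars.csv):

```python
import numpy as np
import pandas as pd

# Toy stand-in for the car data (values are made up for illustration)
toy = pd.DataFrame({
    "Make": ["A", "A", "B", "C", "C"],
    "HP": [150.0, 150.0, np.nan, 300.0, 210.0],
    "Price": [20000, 20000, 35000, 60000, 41000],
})

# Number of rows that repeat an earlier row exactly
n_duplicates = int(toy.duplicated().sum())
# Number of missing values in each column
missing_per_column = toy.isnull().sum()

# Same cleaning steps as above: drop duplicates, then rows with missing values
cleaned = toy.drop_duplicates().dropna()
print(n_duplicates, missing_per_column.to_dict(), cleaned.shape)
```

Comparing the shape before and after shows how much data each cleaning step costs.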

Next, we detect the outliers with the seaborn library so that we can remove them from the data set.

Outliers
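
One common way to spot outliers with seaborn is a box plot. This is a hedged sketch on made-up prices (the `prices` frame is an assumption for illustration); in the notebook the same call would be applied to a column of `df` such as "Price":

```python
import pandas as pd
import seaborn as sns

# Made-up prices; the last value is deliberately extreme
prices = pd.DataFrame(
    {"Price": [18000, 21000, 22500, 24000, 26000, 27500, 30000, 250000]}
)

# Points drawn beyond the whiskers (1.5 * IQR by default) are outlier candidates
ax = sns.boxplot(x=prices["Price"])
ax.set_title("Price")
```

In a Colab cell the figure is rendered automatically after the cell runs.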

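We then remove them with the following code.

#Remove outliers: drop rows with any numeric value more than 1.5 * IQR beyond the quartiles
num = df.select_dtypes(include="number")
Q1, Q3 = num.quantile(0.25), num.quantile(0.75)
IQR = Q3 - Q1
df = df[~((num < (Q1 - 1.5 * IQR)) | (num > (Q3 + 1.5 * IQR))).any(axis=1)]
df.shape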

Finally, we can analyze our data using histograms, heat maps, or scatter plots. For example:

image
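
As a rough sketch of the heat map and scatter plot step, the snippet below uses synthetic data (the `cars` frame is generated here and is an assumption; only the column names HP, Price, and MPG-H follow the renaming above):

```python
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Synthetic stand-in for the cleaned car data
rng = np.random.default_rng(0)
hp = rng.uniform(100, 400, size=50)
cars = pd.DataFrame({
    "HP": hp,
    "Price": hp * 120 + rng.normal(0, 3000, size=50),
    "MPG-H": 45 - hp * 0.05 + rng.normal(0, 2, size=50),
})

# Pairwise Pearson correlations between the numeric columns
corr = cars.corr()

fig, (ax_heat, ax_scatter) = plt.subplots(1, 2, figsize=(11, 4))
# Heat map of the correlation matrix, with the coefficients written in each cell
sns.heatmap(corr, annot=True, cmap="BrBG", ax=ax_heat)
# Scatter plot of one pair of variables
sns.scatterplot(data=cars, x="HP", y="Price", ax=ax_scatter)
```

The heat map gives a quick overview of which variables move together, and the scatter plot lets you inspect one of those relationships in detail.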

Usage - Example 2: Iris Dataset.

In this example, unlike the previous one, we analyze the probability distributions of our data using the available libraries. Here are some examples:

General data:

image

Probability Distribution:

image

Box Plots:

image

Violin Plots:

image
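
Box and violin plots of one feature per species can be drawn side by side, as in this sketch. The data is synthetic, and the column names PetalLengthCm and Species are again assumptions about Iris.csv.

```python
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Synthetic two-species stand-in for Iris.csv (column names are assumed)
rng = np.random.default_rng(2)
iris_like = pd.DataFrame({
    "PetalLengthCm": np.concatenate(
        [rng.normal(1.5, 0.2, 30), rng.normal(4.3, 0.5, 30)]
    ),
    "Species": ["setosa"] * 30 + ["versicolor"] * 30,
})

fig, (ax_box, ax_violin) = plt.subplots(1, 2, figsize=(10, 4))
# Box plot: median, quartiles, and outlier candidates per species
sns.boxplot(data=iris_like, x="Species", y="PetalLengthCm", ax=ax_box)
# Violin plot: the same summary plus the estimated density shape
sns.violinplot(data=iris_like, x="Species", y="PetalLengthCm", ax=ax_violin)
```

The box plot is easier to read for quartiles, while the violin plot shows whether a distribution is skewed or has more than one peak.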

Scatter Plots:

image

Pair Plots:

image
