EDA

Exploratory Data Analysis

This post consists of two examples, written in a Google Colab notebook. The first walks through all the EDA steps needed to preprocess the data well before building our own models. The second analyzes the distribution of the Iris data set using the different libraries available.


Usage - Example 1: Car Price Dataset.

First of all, we need both data sets in our Colab workspace: data_price_cars.csv and Iris.csv. With the first data set, we perform all the EDA steps using the following commands.

import pandas as pd

df = pd.read_csv("data_price_cars.csv")
# Display the first 5 rows
df.head(5)
# Remove columns that are not relevant to the analysis
df = df.drop(['Engine Fuel Type', 'Market Category', 'Vehicle Style', 'Popularity', 'Number of Doors', 'Vehicle Size'], axis=1)
# Rename columns to shorter names
df = df.rename(columns={"Engine HP": "HP", "Engine Cylinders": "Cylinders", "Transmission Type": "Transmission", "Driven_Wheels": "Drive Mode", "highway MPG": "MPG-H", "city mpg": "MPG-C", "MSRP": "Price"})
# Keep the duplicated rows for inspection, then remove them
duplicate_rows_df = df[df.duplicated()]
df = df.drop_duplicates()
# Drop rows with missing values
df = df.dropna()
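
Before dropping anything, it can help to check how many rows each cleaning step would remove. The following is a minimal sketch on a small hypothetical frame (the `toy` values are invented for illustration and are not taken from data_price_cars.csv); the same calls should apply to `df` above.

```python
import pandas as pd

# Hypothetical toy frame standing in for the car price data
toy = pd.DataFrame({
    "Make": ["BMW", "BMW", "Audi", "Audi", "Fiat"],
    "HP": [335.0, 335.0, 300.0, None, 160.0],
    "Price": [46135, 46135, 40650, 36350, 27495],
})

# Rows that are exact copies of an earlier row
n_duplicates = int(toy.duplicated().sum())
# Number of missing values in each column
missing_per_column = toy.isnull().sum()

# Same cleaning steps as in the README, chained
cleaned = toy.drop_duplicates().dropna()

print("duplicated rows:", n_duplicates)
print(missing_per_column)
print("shape before:", toy.shape, "shape after:", cleaned.shape)
```

Comparing the shape before and after makes it easy to notice when a cleaning step discards more data than expected.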

Next, we detect the outliers with the seaborn library and remove them from the data set.
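
The detection step could be sketched with a box plot, which draws points beyond the whiskers individually. This assumes a box plot is the seaborn tool used here (the README does not name one), and the `prices` values are hypothetical stand-ins for the "Price" column.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for scripts; not needed inside Colab
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical values standing in for the "Price" column
prices = pd.DataFrame({"Price": [20000, 22000, 25000, 27000, 30000, 31000, 250000]})

fig, ax = plt.subplots(figsize=(6, 2))
# Points far outside the box (here 250000) show up as isolated markers
sns.boxplot(x=prices["Price"], ax=ax)
ax.set_title("Price")
```

In a Colab cell the figure renders inline; one such plot per numeric column gives a quick view of which columns contain extreme values.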

We remove them with the following code, which applies the interquartile range (IQR) rule.

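# Remove outliers with the 1.5 * IQR rule (computed on numeric columns only)
num = df.select_dtypes(include="number")
Q1, Q3 = num.quantile(0.25), num.quantile(0.75)
IQR = Q3 - Q1
df = df[~((num < (Q1 - 1.5 * IQR)) | (num > (Q3 + 1.5 * IQR))).any(axis=1)]
df.shape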
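
To see what the IQR filter does, here is a small worked sketch on hypothetical numbers (the `toy` frame is invented for illustration). A row is dropped if any of its columns falls below Q1 - 1.5 * IQR or above Q3 + 1.5 * IQR.

```python
import pandas as pd

# Hypothetical toy frame; the last row is an obvious outlier in both columns
toy = pd.DataFrame({
    "HP": [150, 160, 170, 180, 190, 200, 900],
    "Price": [20000, 22000, 25000, 27000, 30000, 31000, 250000],
})

q1 = toy.quantile(0.25)   # first quartile per column
q3 = toy.quantile(0.75)   # third quartile per column
iqr = q3 - q1             # interquartile range per column

lower = q1 - 1.5 * iqr    # lower fence per column
upper = q3 + 1.5 * iqr    # upper fence per column

# True for rows where at least one column lies outside its fences
is_outlier = ((toy < lower) | (toy > upper)).any(axis=1)
filtered = toy[~is_outlier]

print(pd.DataFrame({"lower": lower, "upper": upper}))
print("rows before:", len(toy), "rows after:", len(filtered))
```

Because the rule uses `any(axis=1)`, a single extreme value in one column is enough to drop the whole row, so the filter can be aggressive on data sets with many numeric columns.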

Finally, we can analyze the cleaned data using histograms, heat maps, or scatter plots.
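
The three plot types mentioned above might be produced as follows. This is a sketch on synthetic data: the `toy` frame, its column relationships, and the color map are assumptions chosen for illustration, not the notebook's actual code.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for scripts; not needed inside Colab
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Synthetic stand-in for the cleaned car data (values are invented)
rng = np.random.default_rng(0)
hp = rng.normal(250, 60, size=200)
toy = pd.DataFrame({
    "HP": hp,
    "Cylinders": np.clip(np.round(hp / 50), 3, 12),
    "Price": hp * 120 + rng.normal(0, 3000, size=200),
})

fig, axes = plt.subplots(1, 3, figsize=(15, 4))

# Histogram: distribution of a single numeric column
axes[0].hist(toy["HP"], bins=20)
axes[0].set_title("Histogram of HP")

# Heat map: pairwise correlations between numeric columns
corr = toy.corr()
sns.heatmap(corr, annot=True, cmap="BrBG", ax=axes[1])
axes[1].set_title("Correlation heat map")

# Scatter plot: relationship between two numeric columns
axes[2].scatter(toy["HP"], toy["Price"], s=10)
axes[2].set_xlabel("HP")
axes[2].set_ylabel("Price")
axes[2].set_title("HP vs. Price")

fig.tight_layout()
```

The heat map is a convenient first look at which features move together, and the scatter plot then shows the shape of any one of those relationships.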

Usage - Example 2: Iris data.

In this example, unlike the previous one, we analyze the probability distributions of our data using the different libraries available. Here are some examples:

General data:

Probability Distribution:

Box Plots:

Violin Plots:

Scatter Plots:

Pair Plots:
