This is a Colab template to be used as the data preprocessing step before applying any ML model. Within the notebook you will find explanations of the “why” behind each of the steps presented in this template:
- Importing the libraries
- Importing the dataset
- Handling missing data
- Encoding categorical data
- Splitting the dataset into training and test sets
- Feature scaling
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
# Option 1: mount Google Drive and read the file from there.
from google.colab import drive
drive.mount("/content/gdrive")
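Once Drive is mounted, the CSV can be read straight from the mounted path. A minimal sketch, assuming the file sits at the top level of My Drive (adjust the path to wherever the file actually lives):

dataset = pd.read_csv("/content/gdrive/MyDrive/dataset.csv")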
# Option 2: upload the file from the local machine; files.upload() saves the
# uploaded files into the current working directory, so the read_csv call
# below can open "dataset.csv" by name.
from google.colab import files
uploaded = files.upload()
dataset = pd.read_csv("dataset.csv")
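The steps below operate on a feature matrix x and a dependent-variable vector y that the template never defines explicitly. A minimal sketch of the usual extraction, assuming the target is the last column of the CSV:

x = dataset.iloc[:, :-1].values  # all columns except the last as features
y = dataset.iloc[:, -1].values   # the last column as the target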
from sklearn.impute import SimpleImputer
# Replace each missing value (np.nan) with the mean of its column. Mean
# imputation only makes sense for numeric columns; narrow the slice if x
# also contains categorical (string) columns.
imputer = SimpleImputer(missing_values=np.nan, strategy='mean')
imputer.fit(x[:, :])
x[:, :] = imputer.transform(x[:, :])
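A tiny self-contained illustration of what the imputer does, on a throwaway array rather than the template's data:

demo = np.array([[1.0, 2.0], [np.nan, 4.0], [5.0, np.nan]])
print(SimpleImputer(strategy='mean').fit_transform(demo))
# [[1. 2.]
#  [3. 4.]
#  [5. 3.]]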
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder
# One-hot encode the categorical feature columns; fill the empty list with the
# indices of the categorical columns in x. remainder='passthrough' keeps the
# remaining columns unchanged.
ct = ColumnTransformer(transformers=[('encoder', OneHotEncoder(), [])], remainder='passthrough')
x = np.array(ct.fit_transform(x))
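If the dependent variable is itself categorical (e.g. yes/no labels), it is typically encoded separately with LabelEncoder. A short sketch, assuming y holds string class labels:

from sklearn.preprocessing import LabelEncoder
le = LabelEncoder()
y = le.fit_transform(y)  # maps each distinct label to an integer 0..n_classes-1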
from sklearn.model_selection import train_test_split
# Hold out 20% of the samples as the test set; fixing random_state makes the
# split reproducible.
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2, random_state=1)
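For classification targets, train_test_split also accepts stratify, which keeps the class proportions equal in both splits. A variant of the call above, assuming y is a class label:

x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=0.2, random_state=1, stratify=y)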
from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
# Fit the scaler on the training set only, then apply the same transformation
# to the test set; calling fit_transform on the test set would leak test-set
# statistics into the preprocessing.
x_train[:, :] = sc.fit_transform(x_train[:, :])
x_test[:, :] = sc.transform(x_test[:, :])
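One common refinement the template leaves out: one-hot dummy columns are already in {0, 1} and are usually excluded from scaling. A sketch, under the assumption that the dummies produced by the ColumnTransformer occupy the first three columns of x:

x_train[:, 3:] = sc.fit_transform(x_train[:, 3:])  # scale only the numeric columns
x_test[:, 3:] = sc.transform(x_test[:, 3:])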