
This data cleaning project meticulously ensures data quality, employs statistical methodology and includes comprehensive documentation.


blegodwin/airbnb_cleaning


Airbnb Dataset Cleaning

Overview

This project cleans and transforms real-world data to preserve its quality, integrity and context for analysis, and to prepare it for building a linear regression model.

The project handles missing values and duplicate records, transforms and converts data types, and carefully considers outliers and every other inconsistency encountered in the dataset.

The cleaning process prioritises statistical and analytical concepts so that the cleaned data supports accurate results. Each step is documented in detail.

Dataset Information

  • The dataset used in this project was sourced from Kaggle and can be accessed here.
  • The dataset contained data quality concerns such as NaN values, outliers, duplicates, skewed distributions, improper casing and illegal characters.
  • The dataset contained date values stored as objects rather than as a datetime type.
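
The concerns listed above can be surfaced with a short profiling pass. The sketch below is illustrative only: the sample rows, the column names (`name`, `price`, `last_review`) and the `profile` helper are assumptions made for the example, not the actual Kaggle schema or the project's own code.

```python
import numpy as np
import pandas as pd

# Hypothetical sample rows; the column names are assumptions,
# not the real schema of the Kaggle dataset.
raw = pd.DataFrame({
    "name": ["Cosy Loft", "Cosy Loft", "Sea View#", None],
    "price": [120.0, 120.0, np.nan, 95.0],
    "last_review": ["2019-05-21", "2019-05-21", "not recorded", "2018-11-02"],
})


def profile(df):
    """Summarise dtype, missing-value count and distinct values per column."""
    return pd.DataFrame({
        "dtype": df.dtypes.astype(str),
        "missing": df.isna().sum(),
        "unique": df.nunique(),
    })


report = profile(raw)

# Rows that exactly repeat an earlier row.
n_duplicates = int(raw.duplicated().sum())

# Skewness of a numeric column hints at whether mean or median is the safer summary.
price_skew = raw["price"].skew()

# Dates stored as text are parsed to datetime; unparseable entries become NaT.
dates = pd.to_datetime(raw["last_review"], format="%Y-%m-%d", errors="coerce")

print(report)
print("duplicate rows:", n_duplicates)
print("unparseable dates:", int(dates.isna().sum()))
```

Passing `errors="coerce"` keeps malformed dates visible as missing values instead of raising, so they can be counted and handled together with the other NaN entries.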

Tools Used

  • Python, with pandas for data manipulation and numpy for numerical computation.
  • The Python visualisation libraries matplotlib and seaborn.

Project Highlights

  • Data Quality Standards and Requirements: These define the accepted decimal places, data types, text casing and value consistency best suited to computation in this project, along with other considerations.

  • Data Profiling: The dataset was examined thoroughly to understand its structure and anomalies. Potential data quality issues such as missing values, duplicates, outliers and inconsistencies were identified and visualised.

  • Data Cleaning and Transformation: The pandas and numpy libraries were used to address the identified issues, including but not limited to: removing duplicates, handling missing values, truncating values that combined different data types in a single field, and standardising formats and data types to ensure uniformity.

  • Documentation: Comments and detailed explanations were included where necessary so that the work is communicated clearly, remains transparent and can be reproduced.
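
As a rough illustration of the cleaning and transformation steps named above, the sketch below chains them in pandas. It is not the project's actual code. The columns, the `clean_listings` helper, the median imputation and the 1.5 × IQR outlier rule are all assumptions chosen for the example.

```python
import numpy as np
import pandas as pd


def clean_listings(df):
    """Hypothetical cleaning pass; the steps and thresholds are illustrative."""
    # Remove exact duplicate rows.
    out = df.drop_duplicates().copy()

    # Standardise text: trim whitespace, drop illegal characters, fix casing.
    out["name"] = (
        out["name"]
        .str.strip()
        .str.replace(r"[^A-Za-z0-9 ]", "", regex=True)
        .str.title()
    )

    # Prices recorded as text such as "$1,200" are reduced to a number.
    out["price"] = pd.to_numeric(
        out["price"].str.replace(r"[$,]", "", regex=True), errors="coerce"
    )

    # Impute missing prices with the median, which is less sensitive to skew
    # than the mean, then apply a two-decimal-place standard.
    out["price"] = out["price"].fillna(out["price"].median()).round(2)

    # Flag (rather than drop) outliers using the 1.5 * IQR rule.
    q1, q3 = out["price"].quantile([0.25, 0.75])
    iqr = q3 - q1
    out["price_outlier"] = ~out["price"].between(q1 - 1.5 * iqr, q3 + 1.5 * iqr)

    return out.reset_index(drop=True)


# Hypothetical input showing each issue once.
raw = pd.DataFrame({
    "name": ["  cosy loft#", "  cosy loft#", "SEA VIEW", "garden flat",
             "city studio", "penthouse!"],
    "price": ["$100", "$100", "$110", None, "$120", "$5,000"],
})

cleaned = clean_listings(raw)
print(cleaned)
```

Flagging outliers in a separate column instead of deleting them keeps the decision reversible, which matters when the cleaned data later feeds a regression model.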
