This repository contains Assignment 3, completed as part of "FIT5196 Data Wrangling", taught at Monash University in Semester 2, 2020.

gaaniruddha/FIT5196-A3

Data Wrangling A3: Data Integration and Data Reshaping

  • Assignment_Specifications.pdf: Assignment specifications
  • Assignment_Solutions.ipynb/pdf: Assignment solutions. Python code that integrates several datasets into a single schema, then finds and fixes problems in the data.
  • Input data: 7 datasets in various formats, containing housing information for Victoria, Australia.
  • Input files: GTFS_Melbourne_Train_Information.zip, vic_suburb_boundary.zip, 30945305.zip.
  • Output files: 30945305_A3_solution.zip

Tasks completed:

  1. Task 1: Data Integration

    • Integrated the 7 input files into one dataset with the schema given in the assignment specifications.
    • File types: .txt, .xlsx, .json, .xml, .html and .pdf
  2. Task 2: Data Reshaping

    • Studied the effects of different normalization/transformation methods (i.e. standardization, min-max normalization, log, power and Box-Cox transformations) on various attributes.
    • Observed and explained their effects.
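
The transformations named in Task 2 can be sketched as follows. This is a minimal illustration, not the code from the assignment notebook: the `values` array is made-up data, the square root is just one possible choice of power transformation, and NumPy and SciPy are used directly for brevity.

```python
import numpy as np
from scipy import stats

# Hypothetical right-skewed attribute (made-up values, not assignment data).
values = np.array([1.0, 2.0, 4.0, 8.0, 16.0])

# Standardization: subtract the mean, divide by the (population) standard deviation.
standardized = (values - values.mean()) / values.std()

# Min-max normalization: rescale linearly onto the interval [0, 1].
min_max = (values - values.min()) / (values.max() - values.min())

# Log transformation: compresses large values; needs strictly positive input.
log_transformed = np.log(values)

# Power transformation: square root chosen here as an example exponent.
power_transformed = np.power(values, 0.5)

# Box-Cox transformation: SciPy estimates lambda by maximum likelihood
# when no lambda is supplied; also needs strictly positive input.
box_cox, fitted_lambda = stats.boxcox(values)

# Compare how each method changes the skewness of the attribute.
results = {
    "original": values,
    "standardized": standardized,
    "min-max": min_max,
    "log": log_transformed,
    "power (sqrt)": power_transformed,
    "box-cox": box_cox,
}
for name, arr in results.items():
    print(f"{name:>13}: skew = {stats.skew(arr):.3f}")
```

Standardization and min-max normalization are linear, so they change the scale of an attribute but not the shape of its distribution. The log, power and Box-Cox transformations are non-linear and can reduce skew.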

Libraries used: pandas, numpy, re, json, bs4, tabula, scipy, matplotlib, sklearn, sklearn.model_selection, sklearn.metrics, sklearn.linear_model
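
As a rough illustration of the kind of integration described in Task 1, the sketch below uses `json` and `pandas` to load two sources in different formats, align them to one schema and stack them. Everything in it is assumed for the example: the column names, the `TARGET_COLUMNS` schema, the inline sample records and the choice to stack and de-duplicate rather than join. The actual schema is defined in the assignment specifications.

```python
import json
from io import StringIO

import pandas as pd

# Hypothetical target schema; the real one is in the assignment specifications.
TARGET_COLUMNS = ["property_id", "suburb", "price"]

# Made-up sample sources standing in for a .json file and a .txt file.
json_text = """[
  {"id": 1, "suburb": "CLAYTON", "price": 850000},
  {"id": 2, "suburb": "CAULFIELD", "price": 1200000}
]"""
txt_text = "property_id,suburb,price\n2,CAULFIELD,1200000\n3,CARNEGIE,990000\n"

# Load each source and rename columns so both follow the target schema.
json_df = pd.DataFrame(json.loads(json_text)).rename(columns={"id": "property_id"})
txt_df = pd.read_csv(StringIO(txt_text))

# Stack the sources, then drop records that appear in more than one file.
combined = (
    pd.concat([json_df[TARGET_COLUMNS], txt_df[TARGET_COLUMNS]], ignore_index=True)
    .drop_duplicates(subset="property_id")
    .reset_index(drop=True)
)

print(combined)
```

Selecting `TARGET_COLUMNS` explicitly before concatenating keeps the column order fixed and raises an error early if a source is missing an expected field.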
