Housing-Information-Melbourne

Using Python, we integrate several datasets into a single schema and then find and fix problems in the data. The project uses 7 datasets in various formats, all about housing information in Victoria, Australia. The first task is to integrate all of them into one dataset:

  • Hospitals (HTML format)
  • Supermarkets (Excel format)
  • Shopping centers (PDF format)
  • Real Estate (XML format)
  • Real Estate (JSON format)
  • Vic_suburb_boundary (Shapefile format)
  • GTFS_Melbourne_Train_Information (Text format)
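
As a rough illustration of the integration step, the sketch below merges the two real estate sources (XML and JSON) into one list of records with a shared schema and drops duplicate properties. It is a minimal sketch and not the project's actual code: the field names `property_id`, `suburb`, and `price`, the `<property>` element name, and the sample records are all assumptions made up for the example.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical shared schema; the real datasets may use different field names.
FIELDS = ("property_id", "suburb", "price")

# Invented sample data standing in for the JSON and XML real estate files.
json_text = """[
  {"property_id": 1, "suburb": "Carlton", "price": 900000},
  {"property_id": 2, "suburb": "Richmond", "price": 1100000}
]"""
xml_text = """<properties>
  <property><property_id>2</property_id><suburb>Richmond</suburb><price>1100000</price></property>
  <property><property_id>3</property_id><suburb>Carlton</suburb><price>750000</price></property>
</properties>"""


def normalise(raw):
    """Coerce one raw record to the shared schema with consistent types."""
    return {
        "property_id": int(raw["property_id"]),
        "suburb": str(raw["suburb"]).strip(),
        "price": float(raw["price"]),
    }


def records_from_json(text):
    """Read records from a JSON array of objects."""
    return [normalise(rec) for rec in json.loads(text)]


def records_from_xml(text):
    """Read records from <property> elements whose children are the fields."""
    root = ET.fromstring(text)
    return [
        normalise({field: node.findtext(field) for field in FIELDS})
        for node in root.iter("property")
    ]


def integrate(*sources):
    """Merge record lists, keeping the first record seen for each property_id."""
    merged = {}
    for records in sources:
        for rec in records:
            merged.setdefault(rec["property_id"], rec)
    return list(merged.values())


combined = integrate(records_from_json(json_text), records_from_xml(xml_text))
```

Keying the merge on an identifier is one simple way to remove records that appear in both files. The other sources (HTML, Excel, PDF, shapefile, GTFS text) would each need their own reader before they can be joined to this table.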

The second task is to study the effect of different normalization/transformation methods:

  • Z-score Standardization
  • Min-max normalization

The goal is to observe and explain their effect, assuming we want to develop a linear model that predicts the price of a property from the Distance_to_sc, travel_min_to_CBD, and Distance_to_hospital attributes.
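
The two methods can be sketched in a few lines of plain Python. The example below is an illustration under stated assumptions rather than the project's implementation: the sample values for `Distance_to_hospital` and price are invented, the helper names are hypothetical, and a single-feature least-squares fit stands in for the full three-attribute linear model.

```python
from statistics import fmean, pstdev


def zscore(values):
    """Z-score standardization: subtract the mean, divide by the std deviation."""
    mu = fmean(values)
    sigma = pstdev(values)  # population standard deviation
    return [(v - mu) / sigma for v in values]


def minmax(values):
    """Min-max normalization: rescale linearly onto the interval [0, 1]."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


def fit_line(x, y):
    """Ordinary least squares for y = a + b * x; returns (a, b)."""
    mx, my = fmean(x), fmean(y)
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sum(
        (xi - mx) ** 2 for xi in x
    )
    return my - b * mx, b


def fitted_values(x, y):
    """Fit the line on (x, y) and return its predictions for the same x."""
    a, b = fit_line(x, y)
    return [a + b * xi for xi in x]


# Invented sample values, for illustration only.
distance_to_hospital = [1.2, 3.5, 0.8, 5.1, 2.4]
price = [950_000, 720_000, 1_010_000, 640_000, 810_000]

pred_raw = fitted_values(distance_to_hospital, price)
pred_z = fitted_values(zscore(distance_to_hospital), price)
pred_mm = fitted_values(minmax(distance_to_hospital), price)

slope_raw = fit_line(distance_to_hospital, price)[1]
slope_z = fit_line(zscore(distance_to_hospital), price)[1]
```

Both transformations are linear rescalings of a feature, so an ordinary least-squares fit gives the same predictions before and after them; only the coefficients change. What the rescaling does provide is coefficients on a comparable scale across attributes measured in different units (kilometres versus minutes, for example).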