New York City Property Value Prediction Using Housing Sales Data

This project aims to predict NYC property values based on their distance to train stations and historical housing sales data from 2003-2015. Using a dataset of over 900,000 records, the project focuses on single-family dwellings (Building Code A) and applies machine learning models to estimate future property values. This work leverages regression models and geographic data to build accurate predictions.

Features

Data cleaning and preprocessing pipeline for housing sales data.
Integration of geographic data using the Nominatim API to retrieve location details.
Machine learning models for property value prediction (Linear Regression, Random Forest).
Feature engineering with building code, price change, and other key attributes.

Installation

Clone the repository:

git clone https://github.com/aaronmpuga/property-value-prediction.git
cd property-value-prediction

Set up a virtual environment and install dependencies:

python3 -m venv env
source env/bin/activate
pip install -r requirements.txt

Set up the Nominatim API for location data (if applicable):

Use the free Nominatim API to get geographic coordinates for each property.
Note: The public Nominatim API allows at most one request per second. If you need latitude and longitude values for a large number of properties, consider a paid geocoding API to process the data faster.
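
The one-request-per-second limit can be respected with a small loop that pauses between lookups. The sketch below is not the project's own geocoding code: the function names, the User-Agent string, and the injectable fetch and sleep parameters are assumptions made for illustration. It uses only the Python standard library and the public Nominatim search endpoint.

```python
import json
import time
import urllib.parse
import urllib.request

NOMINATIM_SEARCH_URL = "https://nominatim.openstreetmap.org/search"


def build_search_url(address):
    """Build a Nominatim search URL that asks for at most one JSON result."""
    query = urllib.parse.urlencode({"q": address, "format": "json", "limit": 1})
    return f"{NOMINATIM_SEARCH_URL}?{query}"


def fetch_json(url):
    """Fetch a URL and decode its JSON body.

    The User-Agent value is a placeholder (assumption); replace it with
    something that identifies your own application.
    """
    request = urllib.request.Request(
        url, headers={"User-Agent": "property-value-prediction-example"}
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.load(response)


def geocode_addresses(addresses, fetch=fetch_json, min_interval=1.0, sleep=time.sleep):
    """Return {address: (lat, lon) or None}, pausing between requests."""
    coordinates = {}
    for index, address in enumerate(addresses):
        if index > 0:
            sleep(min_interval)  # stay at or below one request per second
        results = fetch(build_search_url(address))
        if results:
            # Nominatim returns lat/lon as strings, so convert to float.
            coordinates[address] = (float(results[0]["lat"]), float(results[0]["lon"]))
        else:
            coordinates[address] = None  # no match found for this address
    return coordinates
```

Passing fetch and sleep as parameters keeps the rate-limiting logic testable without network access. At one request per second, geocoding 900,000 addresses would take more than ten days, which is why a faster paid service may be worth considering.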

Usage

To test the prediction models, follow the steps below:

Download the random forest regression models folder to your local device.
Download the predict.py Python script to your local device.

In your IDE, open predict.py and add this code at the end of the file:

curr_price = 300000   # Current property value input by the user
years = 5             # How many years ahead the user wants to predict
b_code = "A1"         # Building classification code of the user's property
borough = "Queens"    # The borough the property is in
station_dist = 500    # Distance of the property from the nearest station, in meters

predicted_value = predict_property_value(curr_price, years, b_code, borough, station_dist)
print(f"The predicted property value in {years} years is: {predicted_value}")

curr_price, years, b_code, borough, and station_dist are example inputs to predict_property_value. Change these values to match the property you want to predict.
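
If you do not already know the station_dist value for a property, one way to estimate it is the great-circle (haversine) distance from the property to the closest station. This is a minimal sketch under stated assumptions: the function names are hypothetical, stations are passed as (latitude, longitude) pairs, and the project may have computed its distances differently.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in meters


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def nearest_station_distance_m(lat, lon, stations):
    """Distance in meters from a property to the closest of the given stations.

    stations is an iterable of (latitude, longitude) pairs (assumed format).
    """
    return min(haversine_m(lat, lon, s_lat, s_lon) for s_lat, s_lon in stations)
```

The result is a straight-line distance, not a walking distance, so treat it as an approximation when choosing a station_dist input.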

Data

The raw_data folder holds the files and data that were sourced online at the start of the project from sites such as Kaggle and data.ny.gov.
- The Annulized_Rolling_Sale_Update folder, taken from NYC Open Data, has the Excel files containing the property sales data across all boroughs in NYC from 2003 to 2015.
- The zip_borough.csv file, taken from Kaggle, was used to create the borough column in the housing sales dataframe by mapping each property's ZIP code to its corresponding borough name.
- The MTA_Subway_Stations.csv file, taken from data.gov, holds the train station data used throughout the project.
The interim_data folder contains modified or cleaned datasets created from the initial data in the raw_data folder or from other CSV files in the same folder.
The processed_data folder contains the final CSV datasets, each holding all the data needed for its respective purpose.
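
The ZIP-code-to-borough mapping described above can be sketched as a simple lookup. This example is illustrative only: the column names (zip, borough, zip_code), the function names, and the sample rows are assumptions, not the actual layout of zip_borough.csv or the sales data. It uses the standard library so it runs without extra dependencies, though the same join is a one-line map or merge in pandas.

```python
import csv
import io

# Hypothetical miniature of zip_borough.csv; the real column names may differ.
ZIP_BOROUGH_CSV = """zip,borough
11368,Queens
10001,Manhattan
"""


def load_zip_to_borough(csv_file):
    """Read a ZIP-to-borough CSV into a {zip: borough} dictionary."""
    reader = csv.DictReader(csv_file)
    return {row["zip"].strip(): row["borough"].strip() for row in reader}


def add_borough(sales_rows, zip_to_borough):
    """Return copies of the sales rows with a 'borough' key added.

    Rows whose ZIP code is missing from the mapping get borough=None.
    """
    return [
        {**row, "borough": zip_to_borough.get(str(row["zip_code"]).strip())}
        for row in sales_rows
    ]


mapping = load_zip_to_borough(io.StringIO(ZIP_BOROUGH_CSV))
# Made-up sales rows, for illustration only.
sales = add_borough([{"zip_code": 11368}, {"zip_code": "99999"}], mapping)
```

Converting ZIP codes to strings before the lookup avoids mismatches when one source stores them as integers and the other as text.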

Future Work

Expand the model to include additional building codes (e.g., Code B and D).
Implement more sophisticated geographic feature extraction using APIs with higher rate limits.
Explore deep learning models for improved predictions.
Develop a web-based application for real-time property value prediction.
