This repository stores the Python code for the Fall 2023 class project of CIS 9440 Data Warehousing and Analytics at Baruch College.
Professor: Isaac Vaghefi
Team members:
- Komsit Rattana
- Angela Lee
- Derek Strang
- Mariya Mithaiwala
The project analyzes NYC 311 Service Requests data to draw insights from rodent complaints, combining it with secondary data sources such as:
- NYC Geographical boundaries (NTA, Tract, Block, zip code)
- NYC Open Restaurant data
  https://data.cityofnewyork.us/Transportation/Open-Restaurant-Applications-Historic-/pitm-atqc
- Restaurant Inspection data
  https://data.cityofnewyork.us/Health/DOHMH-New-York-City-Restaurant-Inspection-Results/43nn-pn8j
- Population data by block from 2020 Census Data-census blocks
  https://www.nyc.gov/site/planning/planning-level/nyc-population/2020-census.page

Technologies used:
- Google Dataflow
- Google BigQuery
- Apache Beam
- Python

Repository folders:
- location-transformation-pipeline
  - Prepare staging data for locations by extracting geolocations from NYC 311 Service Requests, Open Restaurant data, and Restaurant Inspection data
  - Resolve each location to its NTA, Tract, and Block
- nyc-2020-census-block-dataload
  - Extract Census Block data to BigQuery
- nyc-2020-neighborhood-dataload
  - Extract NTA boundaries to BigQuery
- nyc-311-request-extract-pipeline
  - Extract NYC 311 Service Requests from the API to BigQuery
  - Supports snapshot and incremental loads
- nyc-open-restaurant-application-dataload
  - Extract Open Restaurant data to BigQuery
- nyc-restaurant-inspection-extract-pipeline
  - Extract Restaurant Inspection data from the API to BigQuery
  - Supports snapshot and incremental loads
- sql
  - SQL files for scheduled SQL queries in BigQuery that refresh or incrementally load data into the tables
- schemas
  - Schemas for all BigQuery tables
- data
  - CSV files for static and historical data
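
The difference between the snapshot and incremental loads mentioned for the extract pipelines can be illustrated with a minimal sketch. It is not the code in this repository: the helper name `build_soql_params` and the `created_date` column are assumptions made for illustration. It only builds the query parameters that a Socrata-style open data API accepts (`$where`, `$order`, `$limit`, `$offset`) and leaves the HTTP request and the BigQuery write out.

```python
from datetime import datetime
from typing import Optional


def build_soql_params(
    watermark: Optional[datetime] = None,
    date_field: str = "created_date",  # assumed column name, adjust per dataset
    limit: int = 50000,
    offset: int = 0,
) -> dict:
    """Build SoQL query parameters for one page of results (hypothetical helper).

    Without a watermark the parameters describe a full snapshot.
    With a watermark they request only rows newer than the last load.
    """
    params = {
        "$order": f"{date_field} ASC",  # stable ordering so paging is repeatable
        "$limit": str(limit),
        "$offset": str(offset),
    }
    if watermark is not None:
        # Incremental load: filter on the timestamp of the last loaded row.
        stamp = watermark.strftime("%Y-%m-%dT%H:%M:%S")
        params["$where"] = f"{date_field} > '{stamp}'"
    return params


# Snapshot: no filter, first page.
snapshot_params = build_soql_params()

# Incremental: only rows created after the stored watermark.
incremental_params = build_soql_params(watermark=datetime(2023, 10, 1, 12, 0, 0))
```

In a real pipeline the watermark would typically come from the maximum timestamp already stored in the target table, and the offset would advance until a page comes back empty.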