This project performs distributed analytics on a spatial dataset using a cluster. Our goal was to analyze the impact of single-family residential zoning in the US and correlate it with quality-of-life measures, in an effort to discourage the segregation of zoning types and promote inclusivity. We had hoped to compare the results against data from other countries that have more inclusive zoning laws, but this was not possible due to constraints on data availability and language barriers.

For the distributed component, we use a cluster of 10 machines managed by YARN. Data processing and calculations are done with Spark, using Java and Gradle. The data is stored in HDFS and totals ~3.2 GB.

For more detail on our motivation, procedures, project structure, and results, please refer to the LaTeX file or the presentation: the former gives a detailed explanation and the latter a summary.
To access the dataset using HDFS:
$ hadoop fs -ls /zoning/<place>/<dataset>
|--zoning
    |--Ohio
        |--mental_health
        |--occupancy
        |--ohio-parcels
        |--poverty
        |--transportation_means
        |--transportation_type
    |--Houston
        |--houston-parcels
        |--mental_health
        |--occupancy
        |--poverty
        |--transportation_means
        |--transportation_type
    |--Colorado
        |--denver-parcels
        |--jefferson-parcels
        |--mental_health
        |--occupancy
        |--poverty
        |--transportation_means
        |--transportation_type
        |--weld-parcels
    |--Placer
        |--mental_health
        |--occupancy
        |--placer-parcels
        |--poverty
        |--transportation_means
        |--transportation_type
    |--Dallas
        |--dallas-parcels
        |--mental_health
        |--occupancy
        |--poverty
        |--structure
        |--transportation_means
        |--transportation_type
    |--Detroit
        |--detroit-parcels
        |--mental_health
        |--occupancy
        |--poverty
        |--structure
        |--transportation_means
        |--transportation_type
    |--QOL
        |--races.csv
        |--financial.csv
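The `/zoning/<place>/<dataset>` layout above can be turned into path strings programmatically. The following is a minimal sketch in plain Java, not code from this project: the class name `ZoningPaths` and the methods `datasetPath` and `requireSegment` are hypothetical helpers introduced only for illustration.

```java
import java.util.List;

public class ZoningPaths {
    // Root directory of the dataset in HDFS, taken from the listing above.
    public static final String ROOT = "/zoning";

    // Builds "/zoning/<place>/<dataset>" from the two path segments.
    public static String datasetPath(String place, String dataset) {
        return ROOT + "/" + requireSegment(place) + "/" + requireSegment(dataset);
    }

    // Rejects null or empty segments and segments that contain a slash,
    // so a caller cannot accidentally build a path outside the layout.
    public static String requireSegment(String segment) {
        if (segment == null || segment.isEmpty() || segment.contains("/")) {
            throw new IllegalArgumentException("invalid path segment: " + segment);
        }
        return segment;
    }

    public static void main(String[] args) {
        // Print the poverty dataset path for two of the places in the tree.
        for (String place : List.of("Ohio", "Houston")) {
            System.out.println(datasetPath(place, "poverty"));
        }
    }
}
```

A string built this way could be passed to `hadoop fs -ls` or used as the input path of a Spark job. How the project itself constructs its paths is not specified here.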
PARCEL DATA
- https://koordinates.com/explore/
CENSUS DATA
- B08006 Sex of Workers by Means of Transportation to Work (Employment -> Commuting)
- S0802 Means of Transportation to Work by Selected Characteristics (Employment -> Commuting)
- H1 OCCUPANCY STATUS (Housing, Housing, 1)
- S1702 Poverty Status in the Past 12 Months of Families (Income and Poverty -> Income and Poverty)
MENTAL HEALTH
- https://www.countyhealthrankings.org/health-data/health-outcomes/quality-of-life
DATASET FOR COUNTY LEVEL QUALITY OF LIFE
- https://www.kaggle.com/datasets/zacvaughan/cityzipcountyfips-quality-of-life
REQUIREMENTS
- Java 11
- Gradle 8.0
- Apache Spark SQL 3.5
- Apache Spark Core 3.5
To start DFS:
$ start-dfs.sh
To start master:
$ start-master.sh
To start workers:
$ start-workers.sh
To use the Spark shell:
$ spark-shell
To run the Java program using Gradle:
$ gradle run