-
Notifications
You must be signed in to change notification settings - Fork 0
/
assignment05.qmd
210 lines (159 loc) · 5.85 KB
/
assignment05.qmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
---
title: "Assignment 05"
author: "Benjamin Reese"
format: html
self-contained: true
---
## Installing Packages Needed For Analysis
```{r, warning=FALSE, message=FALSE}
## Packages
library(tidyverse)
library(stringr)
library(readr)
library(lubridate)
library(sf)
library(tigris)
library(patchwork)
library(httr)
library(jsonlite)
library(testthat)
```
## 1. Data Cleaning and Loading
```{r, warning=FALSE, message=FALSE}
## Data Loading
crimes <- read_csv("data/crimes-reduced.csv", col_types = cols(Latitude = col_character(),
Longitude = col_character()))
## Data Cleaning
crimes <- crimes %>%
rename_all(funs(str_to_lower(.) %>% ## changing to lowercase
str_replace_all(., '\\s','_') ## removing spaces
)
)
```
## 2. Filtering Data
```{r, warning=FALSE, message=FALSE}
## Formatting date variable, filtering out NAs, and only including homicides
crimes_lim <- crimes %>%
mutate(date = as_date(mdy_hms(date, tz= "America/Chicago"))) %>%
filter(primary_type == "HOMICIDE",
!is.na(longitude),
!is.na(latitude),
!is.na(date),
date >= today() - years(10) ## Past 10 Years, should update
)
## Showing the lower_case and removed spaces
crimes_lim
```
### Testing the Filtering
```{r, warning=FALSE, message=FALSE}
## Testing
table(year(crimes_lim$date))
min(crimes_lim$date)
```
## 3. Convert Lon/Lat to Points Geometry And Making Map
```{r, warning=FALSE, message=FALSE}
## Converting lat and lon and coloring dots by arrest
crimes_lim %>%
st_as_sf(coords = c("longitude", "latitude")) %>%
st_set_crs(value = 4326) %>%
ggplot() +
geom_sf(aes(color=arrest), alpha=.4) ## using alpha for transparency
```
## 4. Load Census Tracts, Perform a Spatial Join, and Create Choropleth
```{r, message=FALSE, warning=FALSE}
## Reading in shapefile
chi_shape <- st_read("data/Boundaries - Census Tracts - 2010/geo_export_098913a2-eb78-48e6-944e-bdaaf19b94f6.shp")
```
```{r, message=FALSE, warning=FALSE}
## Selecting only geoid10 and geometry
chi_shape <- chi_shape %>%
select(geoid10, geometry)
## Setting crimes_lim coords
crimes_lim <- crimes_lim %>%
st_as_sf(coords = c("longitude", "latitude")) %>%
st_set_crs(value = 4326)
## Transforming crimes_lim
crimes_lim <- st_transform(crimes_lim, crs = 4326)
## Setting chi_shape crs
chi_shape <- st_transform(chi_shape, crs = 4326)
## Joining the shape file to crime data
chi_joins <- st_join(chi_shape, crimes_lim)
## Creating merged_chi_agg
merged_chi_agg <- chi_joins %>%
group_by(geoid10) %>%
summarize(homicides = sum(primary_type=="HOMICIDE"),
percent_arrests = mean(arrest))
## Homicide Map
chi_map1 <- merged_chi_agg %>%
ggplot(aes(fill=homicides)) +
geom_sf(color="white", lwd=.1) +
scale_fill_gradient(guide="colorbar", na.value="white") +
theme_void()
## Arrests Percent Map
chi_map2 <- merged_chi_agg %>%
ggplot(aes(fill=percent_arrests)) +
geom_sf(color="white", lwd=.1) +
scale_fill_gradient(guide="colorbar", na.value="white") +
theme_void()
## Using patchwork to put the maps together
chi_map1 + chi_map2
```
### Short Paragraph About Maps
The Central-Northwestern part of the city has the most homicides, as shown in the left map above. There are also some clusters of high homicide Census tracts around South and South Central Chicago. The other areas of the city do not have as many homicides. Homicides in Chicago seem to be quite geographically clustered and many Census tracts have few homicides. Perhaps surprisingly, the areas of Chicago with relatively fewer homicides have higher arrest rates, meaning homicides that occur in areas where homicides are more unusual are more likely to end in an arrest than killings in the Census tracts that have high amounts of homicides. In sum, there are pockets of high homicide areas in Chicago where a smaller proportion of the homicides result in an arrest, while the low homicide areas are more likely to make an arrest in connection to the homicide.
## 5. Using the Census API
I already have my census API key installed from an in class exercise, so I will just display the key.
```{r, warning=FALSE, message=FALSE}
## tidycensus package loading
library(tidycensus)
## Showing my key
Sys.getenv("CENSUS_API_KEY")
```
```{r, warning=FALSE, message=FALSE}
## Loading in Variables with tidycensus
chi_stats <- get_acs(
geography = "tract",
variables = c("B19013_001", "B15003_022", "B17017_002"),
state = "Illinois",
county = "Cook County",
geometry = TRUE,
output = "wide",
year = 2019,
progress = FALSE
)
## URL method
url <-
"https://api.census.gov/data/2019/acs/acs5?get=B19013_001E,B15003_022E,B17017_002E&for=tract:*&in=county:031&in=state:17&key=ef3e91b4b57955de6fbf3f957341452734d1b79a"
## Using url to make request from API
acs_json <- GET(url = url)
## Checking status
http_status(acs_json)
## Making a text string
acs_json <- content(acs_json, as = "text")
## Creating a character matrix
acs_matrix <- fromJSON(acs_json)
## Turning into tibble
acs_data <- as_tibble(acs_matrix[2:nrow(acs_matrix), ], .name_repair = "minimal")
## Adding names
names(acs_data) <- acs_matrix[1,]
```
### Now testing the equivalencies
```{r, warning=FALSE, message=FALSE}
## Formatting the tidycensus data for test
chi_test <- as_tibble(chi_stats) %>%
select(B19013_001E, B15003_022E, B17017_002E) %>%
arrange(desc(B19013_001E))
## Forming the url method data for test
acs_data_test <- acs_data %>%
select(B19013_001E, B15003_022E, B17017_002E) %>%
mutate(B19013_001E = as.numeric(B19013_001E),
B15003_022E = as.numeric(B15003_022E),
B17017_002E = as.numeric(B17017_002E)) %>%
mutate_all(~replace(., . == -666666666, NA)) %>%
arrange(desc(B19013_001E))
## Running the test with all_equal
all_equal(acs_data_test, chi_test)
## Using test_that
test_that("if tibbles are equal", {
expect_equivalent(all_equal(chi_test, acs_data_test), TRUE)
})
```