-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy path04-application.Rmd
260 lines (226 loc) · 12.4 KB
/
04-application.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
# Results
## Destination Choice Models
Using the simulated trip choices assembled from the location-based services data,
we estimate destination choice models with the `mlogit` package for
R [@R; @mlogit].
```{r park-models}
tar_load(park_models)
park_map = c(
"duration_CAR" = "Drive time",
"mclogsum" = "Mode Choice Logsum",
"yj_acres" = "log(Acres)",
"playgroundTRUE" = "Playground",
"splashpadTRUE" = "Splashpad",
"volleyballTRUE" = "Volleyball",
"basketballTRUE" = "Basketball",
"tennisTRUE" = "Tennis"
)
ls_v_attr <- lrtest(park_models$Attributes, park_models$`All - Logsum`)
ls_v_car <- horowitz_lrtest(park_models$`All - Logsum`, park_models$`All - Car`)
modelsummary(park_models, estimate = "{estimate}({statistic}){stars}",
title = "Park Destination Choice Utilities",
stars = c('*' = .05, '**' = .01),
coef_map = park_map, gof_map = gm,
statistic = NULL, note = "t-statistics in parentheses. * p < 0.5, ** p < 0.01") %>%
kable_styling(latex_options = c("scale_down"))
```
Table \@ref(tab:park-models) presents the model estimation results for
five different specifications of park destination choice. The "Car" model
includes only the network travel time by car as a predictor of park choice;
the "MCLS" model similarly contains only the mode choice logsum as an
impedance term. The signs on the coefficient indicate that people are more
likely to choose parks with lower car distance or higher multi-modal access, all
else equal. The "Attributes" model includes only information on the park attributes
including size and amenities. On balance, people appear attracted to larger parks
and parks with playgrounds, while somewhat deterred by various sports facilities.
The "All" models include both the relevant travel impedance term as well as
destination attributes.
For most block group-park pairs, the transit and walk travel costs
are sufficiently high that choosing these travel modes is unlikely. As a
result, the mode choice logsum is highly collinear with the car travel time.
Nevertheless, there are small differences between the models
with the different impedance terms. Using a non-nested likelihood statistic
test presented by @horowitz1987, we cannot reject the null hypothesis that the two
"All" models have equivalent likelihood ($p$-value of `r ls_v_car`), and therefore
cannot infer that the mode choice logsum is a marginally better estimator of
park choice than the vehicle travel time alone.
```{r grocery-models}
tar_load(grocery_models)
ls_v_car_g <- horowitz_lrtest(grocery_models$`All - Car`, grocery_models$`All - Logsum`)
d_conv_ratio <- grocery_models$`All - Car`$coefficients["typeConvenience Store"] /
grocery_models$`All - Car`$coefficients["duration_CAR"]
grocery_map = c(
"duration_CAR" = "Drive time",
"mclogsum" = "Mode Choice Logsum",
"typeConvenience Store" = "Convenience Store",
"typeOther (Write in a description)" = "Other non-standard",
"pharmacyTRUE" = "Has pharmacy",
"ethnicTRUE" = "Ethnic market",
"merchTRUE" = "Has other merchandise",
"registers" = "Number of registers",
"selfchecko" = "Number of self-checkout"
)
modelsummary(grocery_models,
estimate = "{estimate}({statistic}){stars}", statistic = NULL,
title = "Grocery Destination Choice Utilities",
stars = c('*' = .05, '**' = .01), coef_map = grocery_map, gof_map = gm,
note = "t-statistics in parentheses. * p < 0.5, ** p < 0.01") %>%
kable_styling(latex_options = c("scale_down"))
```
Table \@ref(tab:grocery-models) presents the model estimation results for the grocery
store models. As with the parks models in Table \@ref(tab:park-models), the
most predictive model contains both a travel impedance term and attributes of
the destination grocery store. The number of registers suggests that people
prefer larger stores, all else equal; ethnic markets, convenience stores, and other
facilities are less preferred while stores with pharmacies and other merchandise
(clothes, home goods, etc.) attract visitors. The ratio of the drive time and
convenience store coefficients suggests that on average, people are willing to
drive `r d_conv_ratio` minutes to reach a store that is not a convenience store.
In terms of the travel impedance, there is again not a sufficiently large gap in the
model likelihoods to reject that the mode choice logsum and the drive
time are equivalent predictors of grocery store choice.
```{r library-models}
tar_load(library_models)
library_map <- c(
"duration_CAR" = "Drive time",
"mclogsum" = "Mode Choice Logsum",
"classesTRUE" = "Offers Classes",
"genealogyTRUE" = "Genealogy Resources",
"area" = "log(Square Footage)"
)
ls_v_car_l <- horowitz_lrtest(library_models$`All - Logsum`, library_models$`All - Car`)
modelsummary(library_models, estimate = "{estimate}({statistic}){stars}",
title = "Library Destination Choice Utilities",
stars = c('*' = .05, '**' = .01),
coef_map = library_map, gof_map = gm,
statistic = NULL, note = "t-statistics in parentheses. * p < 0.5, ** p < 0.01") %>%
kable_styling(latex_options = c("scale_down"))
```
Table \@ref(tab:library-models) presents the model estimation results for the
library destination choice models. As with parks and grocery stores, both
travel impedance and destination attributes are significant predictors of
library choice. The strength of the attributes vector is somewhat surprising,
because virtually all libraries in the data set offer the same set of basic amenities
other than the size of the facility. Further,
each municipality in Utah County operates its own library rather than
having a system of interconnected library branches as might be typical in
larger cities or other regions. As with grocery stores and parks, there is no significant difference
between the prediction power of the mode choice logsum versus the car travel
time.
## Accessibilities
Using the results of the "All - Logsum" models estimated for each community resource
in the last section, we calculate the total utility-based accessibility measure
for each block group in Utah County. For comparison to a more traditional measure,
we also created buffer-based accessibility terms where a block group has
"access" to a grocery store if there is one within a 5-minute drive, a
park if there is one within a five-minute walk, and a library if there is one within
a ten-minute drive.
```{r access}
tar_load(access_bin)
tar_load(access_ls)
```
```{r access-map, dev = if(knitr::is_latex_output())"tikz", dev.args=list(pointsize=12), fig.cap = "Spatial comparison of grocery access threshold versus logsum value."}
amdata <- bind_rows(
access_ls %>% st_transform(4326) %>%
select(-total) %>% mutate(method = "Logsum"),
access_bin %>% st_transform(4326) %>%
select(!contains("_min")) %>% mutate(method = "Buffer")
)
ggplot() +
annotation_map_tile("cartolight", zoom=11) +
coord_sf(crs = st_crs(3426), expand = FALSE) +
geom_sf(data = amdata %>% filter(method == "Buffer"), aes(color = as.logical(grocery)), inherit.aes = FALSE, alpha = 0.7) +
scale_color_manual("$<5$ minute drive", values = dj1[c(5, 3)]) +
new_scale_color() +
geom_sf(data = amdata %>% filter(method == "Logsum"), aes(color = grocery), inherit.aes = FALSE, alpha = 0.7) +
scale_color_viridis_c("Logsum", option = "magma") +
facet_wrap(~method) +
theme(axis.line = element_line(color = NA),
axis.title.x=element_blank(),
axis.text.x=element_blank(),
axis.ticks.x=element_blank(),
axis.title.y=element_blank(),
axis.text.y=element_blank(),
axis.ticks.y=element_blank())
```
Figure \@ref(fig:access-map) spatially presents the difference between the
buffer-based measure and the logsum-based measure. The two measures largely
show the same basic shape: block groups along the spine of the county tend
to have binary access in the buffer and also have a higher logsum value. The
difference is at the margins, where the discontinuity of the buffer measure
is replaced by a smoother access surface, more spatially reflective of what
people are likely to experience.
```{r access-plot, fig.cap="Relationship between travel time and logsum access value for block groups in Utah County. Travel time thresholds shown as dashed lines, with best fit regresssion line added for context.", fig.height=4, dev = if(knitr::is_latex_output())"tikz",, dev.args=list(pointsize=10)}
thresholds <- tibble(
resource = c("Grocery", "Library", "Park"),
minutes = c(5, 10, 5)
)
left_join(
access_bin %>%
st_set_geometry(NULL) %>%
rename(buffer_Grocery = grocery, buffer_Library = library, buffer_Park = park,
minutes_Grocery = grocery_min, minutes_Library = library_min, minutes_Park = park_min),
access_ls %>%
st_set_geometry(NULL) %>%
select(-total) %>%
rename(logsum_Grocery = grocery, logsum_Library = library, logsum_Park = park),
by = c("id", "POPULATION")
) %>%
pivot_longer(cols = minutes_Grocery:logsum_Park) %>%
separate(name, c("measure", "resource")) %>%
pivot_wider(names_from = measure, values_from = value) %>%
mutate(buffer = as.logical(buffer)) %>%
ggplot(aes(x = minutes, y = logsum)) +
facet_wrap(~resource, scales = "free") +
geom_point(pch = 21, alpha = 0.4, stroke = 1) +
geom_vline(data = thresholds, aes(xintercept = minutes), lty = "dashed") +
stat_smooth(method = "lm", show.legend = FALSE, ribbon = FALSE) +
ylab("Accessibility Logsum") + xlab("Travel Time in Minutes")
```
The potential for the buffer measure to oversimplify the accessibility problem is further
illustrated in
Figure \@ref(fig:access-plot). This figure shows the utility-based accessibility
logsum calculated using the mode choice logsum as an impedance term against the
travel time in minutes (drive time for grocery stores and libraries; walk time
for parks), for block groups in the study region. It is clear that for all three
land uses,
lower travel time is significantly correlated with higher accessibility.
But for block groups with equivalent travel time to a particular community
resource, the accessibility logsum value
varies substantially. Even for block groups along the buffer --- where a small change
in travel time might place a block within or without the buffer --- the variance
in accessibility logsum appears to be almost as large as the variance
in the travel time. This variance in accessiblity logsum might be due to a
travel time differential between drive, walk, and transit modes captured in the
mode choice logsum, or it could also be because the resources available near the
set of block groups have substantial variance in their amenities. Being near a
single poor-quality grocery store is not the same thing as being near multiple
high-quality groceries, and the logsum value can capture this variance in its
construction.
```{r income-disp}
tar_load(no_buffer)
tar_load(no_logsum)
# calculate number of households in each group, and number of low-income
nhh_nobuf <- sum(no_buffer$households, na.rm = TRUE)
nhh_nols <- sum(no_logsum$households, na.rm = TRUE)
nhh_nobuflow <- sum(no_buffer$households * 0.01 * no_buffer$lowincome, na.rm = TRUE)
nhh_nolslow <- sum(no_logsum$households * 0.01 * no_logsum$lowincome, na.rm = TRUE)
# how many households are in both groups?
nhh_both <- inner_join(no_buffer[, c("id", "households", "lowincome")],
no_logsum[, "id"], by = "id") %>%
summarise(n_hh = sum(households), n_lo = sum(households * lowincome * 0.01))
```
## Spatial Distribution
In this analysis,
we estimate that `r nhh_nobuf` households live in block groups outside the
boundary of all three resource buffers: 10-minute drive for a library, 5-minute drive for
a grocery store, and 5-minute walk for a park. Of these, `r nhh_nobuflow` make less
than \$35,000 per year.
At the same time, only `r nhh_nols` households
live in block groups that are beneath the regional mean utility-based access to
all three resources; that is, they have less-than the regional average access to
grocery stores, and to libraries, and to parks. Of these households, `r nhh_nolslow`
are similarly low-income. Perhaps more importantly, the overlap between the households
in *both* groups is not very high: only `r nhh_both$n_hh[1]` households live
in block groups with low access determined by both buffers and by accessibility
logsum, `r nhh_both$n_lo[1]` of which are low-income households.