# Methods

In this section, we present a method to estimate utility-based access to
community resources in Utah County, Utah.

## Modeling Framework {#framework}

In a destination choice modeling framework [@recker1978], an individual
at origin $i$ considering a destination $j$ from a set of possible destinations
$J$ has a choice probability
\begin{equation}
P_{ij} = \frac{e^{V_{ij}}}{\sum_{j' \in J} e^{V_{ij'}}}
(\#eq:mnlp)
\end{equation}
where $V_{ij}$ is a linear-in-parameters function representing the utility of
destination $j$. The destination utility consists of two basic elements:
\begin{equation}
V_{ij} = \beta t_{ij} + X_j\gamma
(\#eq:dcu)
\end{equation}
where $t_{ij}$ is a measure of the travel impedance between $i$ and $j$, $X_j$
is a vector of attributes of destination $j$, and $\beta, \gamma$ are estimated
parameters relating the travel impedance and the destination attributes to the
utility. These parameters may be estimated by maximum likelihood given sufficient
observational data.
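
As a minimal sketch of Equations \@ref(eq:mnlp) and \@ref(eq:dcu), the following R code computes choice probabilities for one origin and three destinations. The travel times, the single destination attribute (`size`), and both coefficient values are hypothetical and chosen only for illustration; they are not estimates from this study.

```r
# Hypothetical inputs for one origin i and three candidate destinations j
travel_time <- c(5, 12, 20)      # t_ij in minutes (illustrative values)
size        <- c(1.0, 3.5, 2.0)  # X_j, a single destination attribute (illustrative)

beta  <- -0.10  # assumed impedance coefficient, not an estimate from this study
gamma <-  0.30  # assumed attribute coefficient, not an estimate from this study

# Destination utility: V_ij = beta * t_ij + X_j * gamma
V <- beta * travel_time + gamma * size

# Multinomial logit choice probability: P_ij = exp(V_ij) / sum over j' of exp(V_ij')
P <- exp(V) / sum(exp(V))
```

In this illustration the second destination is farther away than the first, but its larger size more than offsets the added travel time, so it receives the highest probability.
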
The logarithm of the denominator of the choice probability given in Equation
\@ref(eq:mnlp) is a quantity called the *logsum* and represents the total
value --- or accessibility $A$ --- of the choice set for individual $i$
[@williams1977formation; @handy1997]
\begin{equation}
A_i = \log\left(\sum_{j' \in J} e^{V_{ij'}}\right) + C
(\#eq:dclogsum)
\end{equation}
where $C$ is an unknown constant resulting from the fact that the utility
represented in Equation \@ref(eq:dcu) is not absolute, but rather relative to
the utilities of the other options. The logsum values of two different origin
points can be compared to determine which location has "better" accessibility to
the destinations in question, based on the elements included in Equation
\@ref(eq:dcu); because the constant $C$ is the same for every origin, it cancels
when two logsums are differenced. Accessibility might be improved by lower travel
impedance, by improved amenities, or simply by having more options available.
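
The comparison described above can be sketched in a few lines of R. The utilities for the two origins are hypothetical, the helper `logsum()` is introduced here only for illustration, and the value assigned to `C` is arbitrary: it serves only to show that the constant drops out of the difference.

```r
# Numerically stable log-sum-exp: log(sum(exp(v)))
logsum <- function(v) {
  m <- max(v)
  m + log(sum(exp(v - m)))
}

# Hypothetical utilities V_ij for two origins facing the same three destinations
V_origin_a <- c(-0.20, -0.15, -1.40)
V_origin_b <- c(-1.00, -0.90, -2.10)

A_a <- logsum(V_origin_a)
A_b <- logsum(V_origin_b)

# The unknown constant C is common to both origins, so it cancels in the difference
C <- 7  # arbitrary value, used only to show that it drops out
diff_without_C <- A_a - A_b
diff_with_C    <- (A_a + C) - (A_b + C)

# Adding a fourth option raises the logsum even if that option is unattractive
A_a_more <- logsum(c(V_origin_a, -3.00))
```
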
```{r utilities}
tar_load(utilities)
```
The destination attributes $X_j$ in Equation \@ref(eq:dcu) describe the aspects
of the community resource relevant to the destination choice problem: the size of
the resource, amenities available, the price of goods on sale, etc. The estimated
parameters weigh the importance of each attribute against the travel impedance
$t_{ij}$, which might take various forms depending on the data available and the
destination resource in question.
Simple measures such as the highway travel time or the walk distance
might be more or less appropriate for particular resources. Another option
commonly used in travel demand models is actually the logsum of a *mode*
choice model with the utility of choosing each mode given by a set of utility
equations. In this study we adopt generic mode choice utility equations
\begin{align*}
V_{ij\mathrm{auto}} &= `r utilities$CAR$ivtt`* (t_{ij\mathrm{auto}})\\
V_{ij\mathrm{transit}} &= `r utilities$TRANSIT$constant` `r utilities$TRANSIT$ivtt`* (t_{ij\mathrm{transit}})
`r utilities$TRANSIT$wait`* (wt_{ij}) `r utilities$TRANSIT$access`*(at_{ij})\\
V_{ij\mathrm{walk}} &= `r utilities$WALK$constant` `r utilities$WALK$ivtt`* (t_{ij\mathrm{walk}})
`r utilities$WALK$short_distance`* (d_{ij} | d_{ij} < 1.5)
`r utilities$WALK$long_distance`* (d_{ij} | d_{ij} \geq 1.5)
\end{align*}
where $t_{ij}$ is the travel time in minutes from $i$ to $j$ by each mode
(including only in-vehicle time for transit), $wt_{ij}$ is the transit wait and
transfer time, $at_{ij}$ is the time to access and egress transit by walking, and $d_{ij}$
is the walking distance in miles. The walking distance term uses one of two
utility parameters, depending on whether the distance is shorter than 1.5
miles or not. The travel impedance logsum between $i$ and $j$ is then
\begin{equation}
MCLS_{ij} = \log\left(\exp(V_{ij\mathrm{auto}}) + \exp(V_{ij\mathrm{transit}}) + \exp(V_{ij\mathrm{walk}}) \right)
(\#eq:mcls)
\end{equation}
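
A minimal sketch of Equation \@ref(eq:mcls) in R follows. The list `example_utilities` mirrors the field names used in the inline expressions above, but every coefficient value in it is made up for illustration; the values applied in this study are those stored in `utilities`. The function name `mode_choice_logsum()` is likewise hypothetical.

```r
# Hypothetical coefficients (NOT the values used in this study)
example_utilities <- list(
  CAR     = list(ivtt = -0.03),
  TRANSIT = list(constant = -1.0, ivtt = -0.03, wait = -0.06, access = -0.06),
  WALK    = list(constant = -1.5, ivtt = -0.05,
                 short_distance = -0.10, long_distance = -5.0)
)

# Times in minutes, distance d in miles
mode_choice_logsum <- function(t_auto, t_transit, wt, at, t_walk, d, u) {
  v_auto <- u$CAR$ivtt * t_auto
  v_transit <- u$TRANSIT$constant + u$TRANSIT$ivtt * t_transit +
    u$TRANSIT$wait * wt + u$TRANSIT$access * at
  # Only one distance term applies, depending on the 1.5-mile threshold
  v_dist <- ifelse(d < 1.5, u$WALK$short_distance * d, u$WALK$long_distance * d)
  v_walk <- u$WALK$constant + u$WALK$ivtt * t_walk + v_dist
  log(exp(v_auto) + exp(v_transit) + exp(v_walk))
}

# A nearby and a distant origin-destination pair (illustrative values)
mcls_near <- mode_choice_logsum(t_auto = 5, t_transit = 8, wt = 10, at = 6,
                                t_walk = 15, d = 0.75, u = example_utilities)
mcls_far  <- mode_choice_logsum(t_auto = 20, t_transit = 35, wt = 15, at = 10,
                                t_walk = 80, d = 4, u = example_utilities)
```

Because the logsum is never smaller than the utility of the best single mode, a pair that is well served by several modes scores higher than one served well by only one of them.
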

## Data

Figure \@ref(fig:diagram) presents a schematic of the process to calculate the accessibility logsum for a particular region. Though the remainder of this section provides detail on each step and data input, a high-level overview is perhaps useful here. First, the American Community Survey provides information on the "origins" or residence neighborhoods in the region of study. Manual data collection efforts or other methods provide information on the community resources under study, or the "destinations." Routing software (R5) generates a matrix of travel costs between pairs of neighborhoods and resource locations using highway networks from OpenStreetMap and transit service timetables. These three data sets (origin information, destination information, and travel costs) are then combined into a "synthetic" choice set representing possible activity locations for people in each neighborhood. Location-based services data from a commercial provider reveals which potential destination was actually chosen, information which feeds an econometric choice model that estimates choice utility parameters. Finally, these utility parameters can then be re-applied to the choice set --- or a new dataset representing future conditions or even a different region --- to calculate utility-based accessibility measures.
```{r diagram, fig.cap="Diagram of the data assembly process.", fig.width=6}
grViz(diagram = "
digraph boxes_and_circles {
# a 'graph' statement
graph [overlap = true, fontsize = 10]
# several 'node' statements
node [shape = box,
fontname = Helvetica]
'Road Network (OpenStreetMap)';
'Transit (GTFS)';
'Travel costs';
'American Community Survey (origins)';
'Resources (destinations)';
'Synthetic choice set';
'Location-based services data' [color = red];
'Utility estimates';
'Accessibility';
node [shape = circle,
color = blue,
fixedsize = true,
width = 0.9] // sets as circles
R5; mlogit
# several 'edge' statements
'Road Network (OpenStreetMap)' -> R5
'Transit (GTFS)' -> R5
R5 -> 'Travel costs'
'Travel costs' -> 'Synthetic choice set'
'American Community Survey (origins)' -> 'Synthetic choice set'
'Resources (destinations)' -> 'Synthetic choice set'
'Synthetic choice set' -> mlogit
'Location-based services data' -> mlogit
mlogit -> 'Utility estimates'
'Utility estimates' -> 'Accessibility'
'Synthetic choice set' -> 'Accessibility'
}
", height = 300, width = 400)
```
Utah County, Utah, is among the fastest-growing urbanized regions in the United
States, with formerly agrarian areas and open rangeland being converted to
predominately suburban built environments. The population and economic center of
the county is in Provo and Orem, home to two large universities (Brigham Young
and Utah Valley), but the most rapid development in commercial and residential
terms has been in communities north of Utah Lake, between Provo and Salt Lake
City to the north. Interstate 15 runs along the spine of the county, and a commuter
rail line operates multiple times a day between Provo and Salt Lake City with
stations in Orem, American Fork, and Lehi. A bus rapid transit (BRT) system connects
the universities, two commuter rail stations, and the densest portions of Provo
and Orem, and local bus services operate in other communities in the region.

Table \@ref(tab:acstable) presents descriptive statistics of
the block groups in Utah County obtained from the 2015-2019 American Community
Survey (ACS) using the tidycensus package for R [@tidycensus]. A block group is a
Census-defined geography containing between 600 and 3,000 people, and is the
smallest geography at which aggregate demographic statistics are generally available.
```{r acstable}
tar_load(acsdata)
options("datasummary_format_numeric_latex" = "plain")
acs_for_table <- acsdata %>% select(
"Density: Households per square kilometer" = density,
"Income: Median block group annual income ($US)" = income,
"Low Income: Share of households making less than $35k" = lowincome,
"High Income: Share of households making more than $125k" = highincome,
"Children: Share of households with children under 6" = children,
"Black: Share of population who is Black" = black,
"Asian: Share of population who is Asian" = asian,
"Hispanic: Share of population who is Hispanic*" = hispanic,
"White: Share of population who is White" = white)
datasummary_skim(acs_for_table, title = "Block Group Summary Statistics",
histogram = !knitr::is_latex_output(),
output = "kableExtra",
col.names = c("", "", "", "", "", "", "", "")) %>%
kableExtra::kable_styling(latex_options = c("scale_down")) %>%
kableExtra::column_spec(1, width = "4cm") %>%
kableExtra::footnote(symbol = "Hispanic indicates Hispanic individuals of all races; non-Hispanic individuals report a single race alone.")
```
```{r utco-map, fig.cap="Location of community resources in Utah County", fig.align='center', dev='tikz', cache = FALSE, fig.width=6.5, fig.height=8,dev.args=list(pointsize=2)}
tar_load(groceries)
tar_load(park_polygons)
tar_load(libraries)
mapdata <- bind_rows(
list("Groceries" = groceries,
"Parks" = park_polygons,
"Libraries" = libraries),
.id = "LandUse"
) %>%
st_centroid()
if(knitr::is_latex_output()){
ggplot() +
annotation_map_tile("stamenbw", zoom=11) +
annotation_scale() +
ggspatial::annotation_north_arrow(pad_y = unit(1, "in")) +
coord_sf(crs = st_crs(4326), expand = FALSE) +
geom_sf(data = mapdata %>% st_transform(4326), aes(color = `LandUse`, size = `LandUse`),
inherit.aes = FALSE, alpha = 0.5) +
scale_color_manual("Resource Type", values = dj1[c(1, 5, 3)]) +
scale_size_manual("Resource Type", values = c(2,3, 1)) +
theme(axis.line = element_line(color = NA),
axis.title.x=element_blank(),
axis.text.x=element_blank(),
axis.ticks.x=element_blank(),
axis.title.y=element_blank(),
axis.text.y=element_blank(),
axis.ticks.y=element_blank(),
legend.position = "bottom"
)
} else {
polys <- mapdata %>% st_transform(4326) %>% filter(st_is(., "MULTIPOLYGON"))
polypal <- colorFactor("Dark2", domain = polys$LandUse)
pnts <- mapdata %>% st_transform(4326) %>% filter(!st_is(., "MULTIPOINT"))
pntspal <- colorFactor("Dark2",domain = pnts$LandUse)
leaflet(polys) %>%
addProviderTiles(providers$Esri.WorldGrayCanvas) %>%
addPolygons(color = ~polypal(LandUse)) %>%
addCircles(data = pnts, color = ~pntspal(pnts$LandUse))
}
```

### Resource Data

Figure \@ref(fig:utco-map) shows the locations of three types of
community resources in Utah County: parks, grocery stores, and libraries.
For each resource, an initial list of resources and elementary attributes was
obtained by executing a relevant query to OpenStreetMap (OSM).
Public parks and their attributes retrieved from OSM were verified and
corrected using aerial imagery and some site visits. The attributes included
the size of the park in acres, whether the park includes a playground, and
whether the park includes facilities for volleyball, basketball, and tennis.
The constructed dataset includes `r nrow(park_polygons)` attributed parks.

Grocery stores were retrieved from OSM and validated using internet resources and
site visits. The full Nutrition Environment Measures Survey in Stores (NEMS-S) [@glanz2007]
was completed for each store, but this preliminary analysis includes only
basic information on each store: whether the store is a convenience store
or some other non-traditional grocery, whether the store includes a pharmacy or
other non-food merchandise, and the number of registers as a measure of the
store's size. The constructed dataset includes `r nrow(groceries)` stores.

Libraries were retrieved from OSM and validated using library websites and
some site visits. The initial query returned university libraries and other
specialty resources; though some of these libraries are open to those outside
the university community, they were removed so that the resource list includes
only libraries serving the general public. The amenities
available include whether the library offers additional classes and programs,
and whether the library includes genealogical or family history resources. The
square footage of the library was estimated from online mapping services.
Other variables discussed in the literature, such as the availability of computers
and public wi-fi networks, were present in every library; because they do not
vary across alternatives, they cannot be included in the destination utility equations.
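
To see why an attribute shared by every alternative is not identified, note that such an attribute adds the same amount $\delta$ to the utility of each destination, and that this amount cancels out of the choice probability in Equation \@ref(eq:mnlp):

\begin{align*}
\frac{e^{V_{ij} + \delta}}{\sum_{j' \in J} e^{V_{ij'} + \delta}}
&= \frac{e^{\delta}\, e^{V_{ij}}}{e^{\delta} \sum_{j' \in J} e^{V_{ij'}}} \\
&= \frac{e^{V_{ij}}}{\sum_{j' \in J} e^{V_{ij'}}} = P_{ij}
\end{align*}

The likelihood is therefore unchanged for any value of the associated coefficient, and the coefficient cannot be estimated.
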

### Mobile Device Data

@alamedaparks present a method for estimating destination choice models from
location-based services data collected from mobile devices, which we follow in
this study. We provided a set of geometric
polygons for each park, grocery store, and library to StreetLight Data, Inc., a
commercial aggregator. StreetLight Data in turn provided data on the number of
mobile devices observed in each polygon grouped by the inferred residence block
group of those devices during summer and fall 2019.

We then created a simulated destination choice estimation dataset for each
community resource by sampling 10,000 block group-to-resource "trips" from the
StreetLight dataset. Each sampled trip defines a "chosen" alternative; for each
trip we then drew an independent random sample of ten additional resources to
serve as the non-chosen alternatives. Random sampling of alternatives is a
common practice that results in unbiased estimates, though the standard errors
of the estimates might be larger than could be obtained through a more carefully
designed sampling scheme [@train2009].
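
The construction of the choice set can be sketched as follows. This is an illustration under stated assumptions rather than the code used in this study: the resource identifiers, the three example trips, and the helper `build_choice_set()` are all hypothetical, and the sampled trips are taken as given.

```r
set.seed(42)  # for a reproducible illustration

# Hypothetical inputs: resource IDs and a few sampled block group -> resource trips
resources <- sprintf("park_%02d", 1:30)
trips <- data.frame(
  trip_id     = 1:3,
  block_group = c("bg_A", "bg_B", "bg_A"),
  chosen      = c("park_04", "park_17", "park_22")
)

n_alternatives <- 10  # non-chosen alternatives sampled per trip

# For one trip, keep the chosen resource and draw ten others at random
build_choice_set <- function(i) {
  others <- setdiff(resources, trips$chosen[i])
  alts <- c(trips$chosen[i], sample(others, n_alternatives))
  data.frame(
    trip_id     = trips$trip_id[i],
    block_group = trips$block_group[i],
    alternative = alts,
    chosen      = alts == trips$chosen[i]
  )
}

# Each trip gets its own independent draw of non-chosen alternatives
choice_set <- do.call(rbind, lapply(seq_len(nrow(trips)), build_choice_set))
```

The result is a long-format table with eleven rows per trip, exactly one of which is flagged as chosen; destination attributes and travel impedances can then be joined to it by `alternative` and `block_group`.
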

### Travel Times

Once the choice, alternatives, and attributes of the alternatives have been
defined, the last element of the choice utility is the travel impedance between
each block group and each resource. Using the `r5r` R interface [@r5r] to the R5
routing algorithm, we estimated the highway drive travel time, the walking
route time, and the transit travel time for trips departing on April 26,
2022 at 8 AM. The time and date are most relevant for the transit path builder
in R5, which uses detailed transit path information stored in the
Utah Transit Authority GTFS feed file for Spring and Summer 2022. The transit path contains
separate measures of the total travel time, the time in the transit vehicle,
transfer time, and access / egress time, allowing us full use of the
mode choice utility equations and resulting logsum described in Equation \@ref(eq:mcls).
We limited valid paths to those involving less than 10 km of walking and
less than 2 hours of total travel time.
For groceries and libraries, we queried the shortest time path on each
mode from the population-weighted block group centroid to the centroid of the grocery
store or library polygon. Because some parks in the dataset can be relatively
large and the centroid might be far from the park access or use point, we instead
sampled points along the boundary of the park polygon, and queried the shortest
time path by each mode to the nearest boundary point.
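
One possible reading of the park procedure is sketched below. It assumes that travel times from a block group to every sampled boundary point have already been computed (the values shown are made up) and keeps the shortest time for each park. Selecting the boundary point by straight-line distance before routing would be an alternative reading.

```r
# Hypothetical walk times (minutes) from one block group to sampled boundary points
boundary_times <- data.frame(
  park      = c("park_A", "park_A", "park_A", "park_B", "park_B"),
  point     = c(1, 2, 3, 1, 2),
  walk_time = c(14, 9, 21, 30, 33)
)

# Keep the shortest time to any boundary point of each park
park_times <- aggregate(walk_time ~ park, data = boundary_times, FUN = min)
```

The same reduction would be repeated for each mode and each block group.
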