04-application.Rmd

# Results


## Destination Choice Models

Using the simulated trip choices assembled from the location-based services data,
we estimate destination choice models with the `mlogit` package for 
R [@R; @mlogit].

```{r park-models}
tar_load(park_models)
park_map = c(
    "duration_CAR" = "Drive time",
    "mclogsum" = "Mode Choice Logsum",
    "yj_acres" = "log(Acres)",
    "playgroundTRUE" = "Playground",
    "splashpadTRUE"  = "Splashpad",
    "volleyballTRUE" = "Volleyball",
    "basketballTRUE" = "Basketball",
    "tennisTRUE" = "Tennis"
)

ls_v_attr <- lrtest(park_models$Attributes, park_models$`All - Logsum`)
ls_v_car <- horowitz_lrtest(park_models$`All - Logsum`, park_models$`All - Car`)
modelsummary(park_models, estimate = "{estimate}({statistic}){stars}", 
             title = "Park Destination Choice Utilities",
             stars = c('*' = .05, '**' = .01),
             coef_map = park_map, gof_map = gm,
             statistic = NULL, note = "t-statistics in parentheses. * p < 0.5, ** p < 0.01") %>%
  kable_styling(latex_options = c("scale_down"))
```

Table \@ref(tab:park-models) presents the model estimation results for 
five different specifications of park destination choice. The "Car" model
includes only the network travel time by car as a predictor of park choice; 
the "MCLS" model similarly contains only the mode choice logsum as an 
impedance term. The signs on the coefficient indicate that people are more
likely to choose parks with lower car distance or higher multi-modal access, all
else equal. The "Attributes" model includes only information on the park attributes
including size and amenities. On balance, people appear attracted to larger parks
and parks with playgrounds, while somewhat deterred by various sports facilities. 
The "All" models include both the relevant travel impedance term as well as
destination attributes. 

For most block group-park pairs, the transit and walk travel costs 
are sufficiently high that choosing these travel modes is unlikely. As a
result, the mode choice logsum is highly collinear with the car travel time.
Nevertheless, there are small differences between the models 
with the different impedance terms. Using a non-nested likelihood statistic 
test presented by @horowitz1987, we cannot reject the null hypothesis that the two 
"All" models have equivalent likelihood ($p$-value of `r ls_v_car`), and therefore 
cannot infer that the mode choice logsum is a marginally better estimator of
park choice than the vehicle travel time alone.

```{r grocery-models}
tar_load(grocery_models)
ls_v_car_g <- horowitz_lrtest(grocery_models$`All - Car`, grocery_models$`All - Logsum`)
d_conv_ratio <- grocery_models$`All - Car`$coefficients["typeConvenience Store"] / 
  grocery_models$`All - Car`$coefficients["duration_CAR"]  
grocery_map = c(
    "duration_CAR" = "Drive time",
    "mclogsum" = "Mode Choice Logsum",
    "typeConvenience Store" = "Convenience Store",
    "typeOther (Write in a description)" = "Other non-standard",
    "pharmacyTRUE" = "Has pharmacy",
    "ethnicTRUE" = "Ethnic market",
    "merchTRUE" = "Has other merchandise",
    "registers" = "Number of registers",
    "selfchecko" = "Number of self-checkout"
  )

modelsummary(grocery_models, 
             estimate = "{estimate}({statistic}){stars}",  statistic = NULL, 
             title = "Grocery Destination Choice Utilities",
             stars = c('*' = .05, '**' = .01), coef_map = grocery_map, gof_map = gm,
             note = "t-statistics in parentheses. * p < 0.5, ** p < 0.01") %>%
  kable_styling(latex_options = c("scale_down"))
```

Table \@ref(tab:grocery-models) presents the model estimation results for the grocery 
store models. As with the parks models in Table \@ref(tab:park-models), the 
most predictive model contains both a travel impedance term and attributes of
the destination grocery store. The number of registers suggests that people 
prefer larger stores, all else equal; ethnic markets, convenience stores, and other
facilities are less preferred while stores with pharmacies and other merchandise
(clothes, home goods, etc.) attract visitors. The ratio of the drive time and 
convenience store coefficients suggests that on average, people are willing to 
drive `r d_conv_ratio` minutes to reach a store that is not a convenience store.
In terms of the travel impedance, there is again not a sufficiently large gap in the
model likelihoods to reject that the mode choice logsum and the drive 
time are equivalent predictors of grocery store choice.

```{r library-models}
tar_load(library_models)

library_map <- c(
    "duration_CAR" = "Drive time",
    "mclogsum" = "Mode Choice Logsum",
    "classesTRUE" = "Offers Classes",
    "genealogyTRUE" = "Genealogy Resources",
    "area" = "log(Square Footage)"
)

ls_v_car_l <- horowitz_lrtest(library_models$`All - Logsum`, library_models$`All - Car`)
modelsummary(library_models, estimate = "{estimate}({statistic}){stars}", 
             title = "Library Destination Choice Utilities",
             stars = c('*' = .05, '**' = .01), 
             coef_map = library_map, gof_map = gm,
             statistic = NULL, note = "t-statistics in parentheses. * p < 0.5, ** p < 0.01") %>%
  kable_styling(latex_options = c("scale_down"))
```

Table \@ref(tab:library-models) presents the model estimation results for the 
library destination choice models. As with parks and grocery stores, both 
travel impedance and destination attributes are significant predictors of 
library choice.  The strength of the attributes vector is somewhat surprising, 
because virtually all  libraries in the data set offer the same set of basic amenities
other than the size of the facility. Further,  
each municipality in Utah County operates its own library rather than 
having a system of interconnected library branches as might be typical in 
larger cities or other regions. As with grocery stores and parks, there is no significant difference
between the prediction power of the mode choice logsum versus the car travel 
time.

## Accessibilities

Using the results of the "All - Logsum" models estimated for each community resource 
in the last section, we calculate the total utility-based accessibility measure
for each block group in Utah County. For comparison to a more traditional measure,
we also created buffer-based accessibility terms where a block group has 
"access" to a grocery store if there is one within a 5-minute drive, a 
park if there is one within a five-minute walk, and a library if there is one within 
a ten-minute drive.


```{r access}
tar_load(access_bin)
tar_load(access_ls)
```


```{r access-map, dev = if(knitr::is_latex_output())"tikz", dev.args=list(pointsize=12), fig.cap = "Spatial comparison of grocery access threshold versus logsum value."}
amdata <- bind_rows(
  access_ls %>% st_transform(4326) %>%
    select(-total) %>% mutate(method = "Logsum"),
  access_bin %>% st_transform(4326) %>%
    select(!contains("_min")) %>% mutate(method = "Buffer")
)

ggplot() + 
  annotation_map_tile("cartolight", zoom=11) + 
  coord_sf(crs = st_crs(3426),  expand = FALSE) + 
  geom_sf(data = amdata %>% filter(method == "Buffer"), aes(color = as.logical(grocery)), inherit.aes = FALSE, alpha = 0.7) +
  scale_color_manual("$<5$ minute drive", values = dj1[c(5, 3)]) +  
  new_scale_color() + 
  geom_sf(data = amdata %>% filter(method == "Logsum"),  aes(color = grocery), inherit.aes = FALSE, alpha = 0.7) +
  scale_color_viridis_c("Logsum", option = "magma") +  
  facet_wrap(~method) + 
  theme(axis.line = element_line(color = NA),
        axis.title.x=element_blank(),
        axis.text.x=element_blank(),
        axis.ticks.x=element_blank(),
        axis.title.y=element_blank(),
        axis.text.y=element_blank(),
        axis.ticks.y=element_blank())
```

Figure \@ref(fig:access-map) spatially presents the difference between the
buffer-based measure and the logsum-based measure. The two measures largely 
show the same basic shape: block groups along the spine of the county tend 
to have binary access in the buffer and also have a higher logsum value. The 
difference is at the margins, where the discontinuity of the buffer measure 
is replaced by a smoother access surface, more spatially reflective of what 
people are likely to experience.

```{r access-plot, fig.cap="Relationship between travel time and logsum access value for block groups in Utah County. Travel time thresholds shown as dashed lines, with best fit regresssion line added for context.", fig.height=4, dev = if(knitr::is_latex_output())"tikz",, dev.args=list(pointsize=10)}

thresholds <- tibble(
  resource = c("Grocery", "Library", "Park"),
  minutes = c(5, 10, 5)
)

left_join(
  access_bin %>%
    st_set_geometry(NULL) %>%
    rename(buffer_Grocery = grocery, buffer_Library = library, buffer_Park = park,
           minutes_Grocery = grocery_min, minutes_Library = library_min, minutes_Park = park_min),
  access_ls %>%
    st_set_geometry(NULL) %>%
    select(-total) %>%
    rename(logsum_Grocery = grocery, logsum_Library = library, logsum_Park = park),
  by = c("id", "POPULATION")
)  %>%
  pivot_longer(cols = minutes_Grocery:logsum_Park) %>%
  separate(name, c("measure", "resource")) %>%
  pivot_wider(names_from = measure, values_from = value) %>%
  mutate(buffer = as.logical(buffer)) %>%
  ggplot(aes(x = minutes, y = logsum)) +
  facet_wrap(~resource, scales = "free") +
  geom_point(pch = 21, alpha = 0.4, stroke = 1) + 
  geom_vline(data = thresholds, aes(xintercept = minutes), lty = "dashed") +
  stat_smooth(method = "lm", show.legend = FALSE, ribbon = FALSE) +
  ylab("Accessibility Logsum") + xlab("Travel Time in Minutes")

```

The potential for the buffer measure to oversimplify the accessibility problem is further 
illustrated in 
Figure \@ref(fig:access-plot). This figure shows the utility-based accessibility
logsum calculated using the mode choice logsum as an impedance term against the
travel time in minutes (drive time for grocery stores and libraries; walk time
for parks), for block groups in the study region. It is clear that for all three
land uses,
lower travel time is significantly correlated with higher accessibility.
But for block groups with equivalent travel time to a particular community
resource, the accessibility logsum value
varies substantially. Even for block groups along the buffer --- where a small change
in travel time might place a block within or without the buffer --- the variance 
in accessibility logsum appears to be almost as large as the variance 
in the travel time. This variance in accessiblity logsum might be due to a
travel time differential between drive, walk, and transit modes captured in the
mode choice logsum, or it could also be because the resources available near the
set of block groups have substantial variance in their amenities. Being near a
single poor-quality grocery store is not the same thing as being near multiple
high-quality groceries, and the logsum value can capture this variance in its
construction.


```{r income-disp}
tar_load(no_buffer)
tar_load(no_logsum)

# calculate number of households in each group, and number of low-income
nhh_nobuf <- sum(no_buffer$households, na.rm = TRUE)
nhh_nols  <- sum(no_logsum$households, na.rm = TRUE)
nhh_nobuflow <- sum(no_buffer$households * 0.01 * no_buffer$lowincome, na.rm = TRUE)
nhh_nolslow  <- sum(no_logsum$households * 0.01 * no_logsum$lowincome, na.rm = TRUE)

# how many households are  in both groups?
nhh_both <- inner_join(no_buffer[, c("id", "households", "lowincome")], 
                       no_logsum[, "id"], by = "id")  %>%
  summarise(n_hh = sum(households), n_lo = sum(households * lowincome * 0.01))
```

## Spatial Distribution

 In this analysis, 
we estimate that `r nhh_nobuf` households live in block groups outside the
boundary of all three resource buffers: 10-minute drive for a library, 5-minute drive for 
a grocery store, and 5-minute walk for a park. Of these, `r nhh_nobuflow` make less
than \$35,000 per year.
At the same time, only `r nhh_nols` households
live in block groups that are beneath the regional mean utility-based access to
all three resources; that is, they have less-than the regional average access to 
grocery stores, and to libraries, and to parks. Of these households, `r nhh_nolslow`
are similarly low-income. Perhaps more importantly, the overlap between the households
in *both* groups is not very high: only `r nhh_both$n_hh[1]` households live
in block groups with low access determined by both buffers and by accessibility 
logsum, `r nhh_both$n_lo[1]` of which are low-income households.