---
output: github_document
bibliography: references.bib
---
<!-- badges: start -->
[![R-CMD-check](https://github.com/udsleeds/openinfra/workflows/R-CMD-check/badge.svg)](https://github.com/udsleeds/openinfra/actions)
[![R-CMD-check](https://github.com/udsleeds/openinfra/actions/workflows/R-CMD-check.yaml/badge.svg)](https://github.com/udsleeds/openinfra/actions/workflows/R-CMD-check.yaml)
<!-- badges: end -->
```{r, include = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>",
echo = TRUE
)
library(sf)
library(tmap)
```
```{r, eval=FALSE, echo=FALSE}
# Generate citations (requires Zotero)
library(rbbt)
bbt_write_bib(path = "references.bib", keys = bbt_detect_citations("README.Rmd"), overwrite = TRUE)
```
# Open access data for transport research: tools, modelling and simulation
# Summary
Getting people walking and cycling has become a priority for many local, regional and national governments in recent years.
Interventions boosting physical activity represent a 'magic bullet', tackling obesity, air pollution and wellbeing.
Active travel is a rapidly growing topic of multi-disciplinary research but has received limited attention from data science perspectives, with a recent paper on modelling cycle network growth [@orozco_datadriven_2020] providing a notable exception.
The work will be grounded in geographic data science, building on previous studies assessing open datasets for transport applications [@ferster_using_2020; @haklay_how_2010a].
In the post-pandemic world, active modes will be even more important due to reduced public transport capacities, as highlighted by the Department for Transport's £250m Active Travel Fund (ATF) and £2bn allocated to walking and cycling over the next 5 years in the UK alone.
New policies and investment programs such as the ATF have led to increased demand for local evidence to inform interventions ranging from new cycleways to improved pavement quality.
This project will explore the potential of open access transport data sources such as OpenStreetMap (OSM) and Ordnance Survey Open Roads (OSOR) datasets, and associated tools, for transport planning to meet active travel objectives.
Specifically, the project will explore how open datasets can be used to understand, prioritise and design active travel infrastructure, such as cycleways, pavements, crossing points and traffic-calming features.
The overall aim is to research and add value to open transport infrastructure data --- and OpenStreetMap data in particular --- for use in transport planning.
The outputs will include new insights, ideas and datasets, leading to a step change in the accessibility, utility and understanding of crowd-sourced data for evidence-based decision making.
# Introduction
This repo contains code and example data to explore the utility of open data for transport planning and, specifically, open data on transport infrastructure.
It was created to support a 12 month LIDA internship, the objectives of which are to:
1. develop new methods for bulk downloading, querying and analysing OpenStreetMap data on transport infrastructure
1. assess the quality of OSM data with reference to ‘ground truth’ datasets including data from satellite imagery and Ordnance Survey data
1. develop a typology of transport infrastructure data and data schemas for each infrastructure type and an actionable definition of ‘active travel infrastructure’
1. articulate ideas on how future research, datasets, software and tools could add value to open transport infrastructure data and support sustainable transport planning practice
1. publish reproducible methods and documentation on using OSM data for transport planning with reference to the strengths and potential pitfalls of the data
1. develop ‘OSM transport infrastructure data packs’ for every transport authority in Great Britain, with layers reflecting a typology of transport infrastructure data developed in the project
1. develop and publish guidance on using OSM data for transport planning
1. suggest a research agenda to enable better use of existing open datasets on transport infrastructure and envision future developments that could make transport planning more transparent, reproducible and participatory
The internship will be undertaken in two 6 month phases, with a rough plan being for objectives 1:4 to be tackled during months 1:6 and objectives 5:8 to be tackled during months 7:12.
An agile approach will be taken whereby objectives can be changed during the internship to pursue promising avenues that emerge.
There are already good open tools for working with transport infrastructure data, including the R packages [`osmextract`](https://docs.ropensci.org/osmextract/), [`stplanr`](https://docs.ropensci.org/stplanr/), and [`sfnetworks`](https://luukvdmeer.github.io/sfnetworks/).
These, and packages written in other languages such as Julia and Python, are largely academic-led and technical projects with little uptake among practitioners.
This project will explore the landscape of open transport infrastructure, describe and critique how active travel infrastructure is represented, and document how practitioners can better use open data for evidence-based, transparent and participatory active travel interventions.
Local authority planners and other stakeholders have more data than ever before on transport systems to support their work, especially in relation to travel *behaviour* thanks to datasets from traffic counts, travel surveys and open access tools such as the Propensity to Cycle Tool.
However, there is less accessible data on travel *infrastructure*, especially in relation to walking and cycling.
Good practice on designing for active travel is well known [@departmentfortransport_manual_2007; @parkin_designing_2018] and increasingly recommended/enforced.
Recent government publications provide clear guidance on design parameters for active travel infrastructure.
For example, the '[Cycle infrastructure design](https://assets.publishing.service.gov.uk/government/uploads/system/uploads/attachment_data/file/904088/cycle-infrastructure-design-ltn-1-20.pdf)' guidance from the Department for Transport specifies 'desirable' and 'absolute' minimum widths of cycleways of 1.5m and 2m on cycleways with low levels of cycle traffic [@departmentfortransport_cycle_2020].
However, little is known about the extent to which new infrastructure is compliant with such guidance: there is no open data on cycleway widths in most parts of the UK, leading to new approaches to assess compliance using region-specific datasets [@tait_cycling_2022].
Furthermore, new tools building on OSM datasets have been developed, for example to model change in transport infrastructure, prioritise road space reallocation schemes, and identify 'low traffic neighbourhoods' [e.g. @lovelace_open_2021; @lovelace_methods_2020; @lucas-smith_mapping_2021].
The internship will generate new research and publications on additional uses of open data to support sustainable transport planning objectives.
# Example of transport infrastructure in R
The brief example below shows how quickly you can get started with OSM data using command-line driven open source software to ensure reproducibility and scalability, based on an example put together for [ODI Manchester](https://github.com/Robinlovelace/openTransportDataDemo).
If you're new to R, it may be worth reading introductory material such as the free and open source tutorial *Reproducible Road Safety Research with R* [@lovelace_reproducible_2020].
See [Section 1.5](https://itsleeds.github.io/rrsrr/introduction.html#installing-r-and-rstudio) of that tutorial to install R/RStudio and [Section 3](https://itsleeds.github.io/rrsrr/rstudio.html) on getting started with the powerful RStudio editor.
A strength of R is the number of high quality and open access [tutorials](https://education.rstudio.com/learn/beginner/), [books](https://education.rstudio.com/learn/beginner/) and videos to get started.
With R installed, you should be able to run all the code in this example and reproduce the results.
The first step is to define the packages used in this example by entering the following command into the R console:
```{r}
pkgs = c(
"pct",
"stats19",
"osmextract",
"tmap",
"stplanr",
"od",
"dplyr"
)
```
Install these packages as follows:
```{r, eval=FALSE}
install.packages(pkgs)
```
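If some of these packages are already installed, you may prefer to install only the missing ones.
The chunk below is a minimal sketch of that idea using base R only; the names `is_installed` and `missing_pkgs` are illustrative, and the install call is left commented out so that nothing is installed by accident.

```{r, eval=FALSE}
pkgs = c("pct", "stats19", "osmextract", "tmap", "stplanr", "od", "dplyr")
# Check which packages can be loaded (TRUE) and which are missing (FALSE)
is_installed = vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
missing_pkgs = pkgs[!is_installed]
missing_pkgs
# Uncomment the next line to install only the missing packages
# install.packages(missing_pkgs)
```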
Load the packages one-by-one with `library(pct)` etc, or all at once as follows:
```{r}
lapply(pkgs, library, character.only = TRUE)[length(pkgs)]
```
One final line of code to set up the environment switches `tmap` into 'view' mode, which is needed if you want to create interactive maps:
```{r}
tmap_mode("view")
```
We will select the Worsley Building, home of LIDA, as the case study area.
As a starting point, we will use a 2 km buffer around the straight line between LIDA and Leeds city centre to capture movement along this transport corridor:
```{r}
# Geocode the two ends of the corridor (requires an internet connection)
lida_point = tmaptools::geocode_OSM("Worsley Building, Leeds")
leeds_point = tmaptools::geocode_OSM("leeds")
c_m_coordinates = rbind(lida_point$coords, leeds_point$coords)
# Create an origin-destination pair, then convert it to a desire line
c_m_od = od::points_to_od(p = c_m_coordinates, interzone_only = TRUE)
c_m_desire_line = od::odc_to_sf(c_m_od[-(1:2)])[1, ]
# Buffer the desire line by 2000 m
lida_buffer = stplanr::geo_buffer(c_m_desire_line, dist = 2000)
```
```{r}
qtm(lida_buffer)
```
```{r, eval=FALSE}
sf::st_write(lida_buffer, "lida_buffer.geojson")
```
![](unnamed-chunk-8-1.png)
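To make the buffering step more concrete, the sketch below builds a similar corridor using `sf` alone and without any geocoding.
It is not the method used by `stplanr::geo_buffer()`: it swaps in a fixed projected coordinate system (British National Grid, EPSG:27700) so that the buffer distance is in metres, and the two coordinates are rough, illustrative values rather than geocoded results.

```{r, eval=FALSE}
library(sf)
# Approximate lon/lat coordinates for the two ends of the corridor
# (illustrative assumptions, not the output of geocode_OSM())
pts = rbind(c(-1.555, 53.802), c(-1.543, 53.797))
line_wgs84 = st_sfc(st_linestring(pts), crs = 4326)
# Project to British National Grid so that distances are in metres
line_bng = st_transform(line_wgs84, 27700)
buffer_bng = st_buffer(line_bng, dist = 2000)
# Transform back to lon/lat for mapping
buffer_wgs84 = st_transform(buffer_bng, 4326)
# Area of the buffer in square kilometres
buffer_area_km2 = as.numeric(st_area(buffer_bng)) / 1e6
buffer_area_km2
```

A fixed national projection is a reasonable choice for a study area in Great Britain, but a different projected CRS would be needed elsewhere.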
## Transport infrastructure data from osmextract
The following commands get transport infrastructure data.
See documentation on the [`osmextract` website](https://docs.ropensci.org/osmextract/index.html) for details.
```{r, eval=FALSE}
osm_data_full = osmextract::oe_get(lida_buffer, extra_tags = c("maxspeed", "lanes"))
osm_data_region = osm_data_full[lida_buffer, , op = sf::st_within]
summary(factor(osm_data_region$highway))
tmap_mode("plot")
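# Assign the map to an object so it can be saved when the code is not run interactively
osm_highway_map = tm_shape(osm_data_region) +
  tm_lines(col = "highway")
tmap_save(osm_highway_map, "osm_highway_map.png")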
```
![](osm_highway_map.png)
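The `op = sf::st_within` argument used in the spatial subsetting above keeps only features that lie entirely inside the study area, whereas the default (`st_intersects`) also keeps features that cross its boundary.
The sketch below illustrates the difference with a hypothetical square study area and three made-up lines; none of these objects come from OSM.

```{r, eval=FALSE}
library(sf)
# Hypothetical 1 km x 1 km study area in British National Grid (metres)
study_area = st_sfc(st_polygon(list(rbind(
  c(0, 0), c(1000, 0), c(1000, 1000), c(0, 1000), c(0, 0)
))), crs = 27700)
# Three toy ways: fully inside, crossing the boundary, and fully outside
toy_ways = st_sf(
  id = c("inside", "crossing", "outside"),
  geometry = st_sfc(
    st_linestring(rbind(c(100, 100), c(900, 900))),
    st_linestring(rbind(c(500, 500), c(1500, 500))),
    st_linestring(rbind(c(2000, 2000), c(3000, 3000))),
    crs = 27700
  )
)
# Default spatial subsetting keeps anything that touches the study area
ways_intersecting = toy_ways[study_area, ]
# st_within keeps only ways that are entirely inside it
ways_within = toy_ways[study_area, , op = st_within]
ways_intersecting$id
ways_within$id
```

Using `st_within` gives a tidy map but drops roads that continue beyond the buffer, which may matter for network analysis.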
The same approach can be used to get building polygons:
```{r, eval=FALSE}
q = "select * from multipolygons where building in ('house', 'residential', 'office', 'commercial', 'detached', 'yes')"
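# Use the study area defined above and read the multipolygons layer named in the query
osm_data_polygons = osmextract::oe_get(lida_buffer, layer = "multipolygons", query = q)
osm_data_polygons_region = osm_data_polygons[lida_buffer, , op = sf::st_within]
qtm(lida_buffer) +
  qtm(osm_data_polygons_region)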
saveRDS(osm_data_polygons_region, "osm_data_polygons_region.Rds")
```
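The query string above can also be built from a character vector, which makes it easier to change the set of building types.
This is a small sketch using base R only; `building_types` is an illustrative name.

```{r, eval=FALSE}
building_types = c("house", "residential", "office", "commercial", "detached", "yes")
# Wrap each value in single quotes and separate the values with commas
building_list = paste0("'", building_types, "'", collapse = ", ")
q = paste0("select * from multipolygons where building in (", building_list, ")")
q
```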
There is lots more we can do with this data and other open transport datasets, and this project looks set to identify and document some of the most important uses for sustainable transport planning.
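As one illustration, the sketch below calculates the total length of ways by `highway` type for a subset of types that are often associated with walking and cycling.
It uses a toy `sf` object standing in for `osm_data_region`, and the choice of `active_types` is an assumption for demonstration, not a definition of active travel infrastructure.

```{r, eval=FALSE}
library(sf)
# Toy stand-in for osm_data_region: three hypothetical ways in projected coordinates (metres)
toy_ways = st_sf(
  highway = c("cycleway", "residential", "footway"),
  geometry = st_sfc(
    st_linestring(rbind(c(0, 0), c(100, 0))),
    st_linestring(rbind(c(0, 0), c(0, 250))),
    st_linestring(rbind(c(0, 0), c(30, 40))),
    crs = 27700
  )
)
toy_ways$length_m = as.numeric(st_length(toy_ways))
# Illustrative set of highway values (an assumption, not an agreed typology)
active_types = c("cycleway", "footway", "path", "pedestrian")
toy_active = toy_ways[toy_ways$highway %in% active_types, ]
# Total length in metres by highway type
length_by_type = tapply(toy_active$length_m, toy_active$highway, sum)
length_by_type
```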
<!-- ## Zone data from the PCT -->
<!-- The Propensity to Cycle Tool (PCT) is a research project and web application that provides data on transport patterns at high levels of geographic resolution across England and Wales. -->
<!-- The PCT is the main national tool that highway authorities use to support strategic cycle network plans and to ensure that investment goes in places, and transport corridors, with high cycling potential. -->
<!-- You can use the PCT in a web browser by navigating to www.pct.bike and clicking on a region of interest. -->
<!-- By making model results publicly the PCT enables more stakeholders to engage in the transport planning process than do proprietary tools only available to a handful of people with expensive licenses [@lovelace_open_2020]. -->
<!-- The PCT is also an open data project, and you can download data for any region in England and Wales in the Region data tab when using the tool. -->
<!-- You can also download data programmatically using the `pct` R package to enable others to build on the tool using the statistical programming language in which it was built. -->
<!-- This section demonstrates how to get and visualise key transport datasets from the PCT. -->
<!-- ```{r} -->
<!-- head(pct::pct_regions$region_name) -->
<!-- # zones = pct::get_pct_zones("west-yorkshire") # for smaller LSOA zones -->
<!-- zones = pct::get_pct_zones("west-yorkshire", geography = "msoa") -->
<!-- names(zones)[1:20] -->
<!-- names_to_plot = c("bicycle", "foot", "car_driver", "bus") -->
<!-- plot(zones[names_to_plot]) -->
<!-- ``` -->
<!-- To keep only zones whose centroids lie inside the study area we can use the following spatial subsetting code: -->
<!-- ```{r} -->
<!-- zone_centroids = sf::st_centroid(zones) -->
<!-- zone_centroids_lida = zone_centroids[lida_buffer, ] -->
<!-- zones = zones[zones$geo_code %in% zone_centroids_lida$geo_code, ] -->
<!-- saveRDS(zones, "zones.Rds") -->
<!-- ``` -->
<!-- Let's plot the result, to get a handle on the level of walking and cycling in the area (see interactive version of this map [here](https://rpubs.com/RobinLovelace/772770), shown are LSOA results): -->
<!-- ```{r, eval=FALSE} -->
<!-- tm_shape(zones) + -->
<!-- tm_fill(c("foot", "bicycle"), palette = "viridis") + -->
<!-- tm_shape(lida_buffer) + tm_borders(lwd = 3) -->
<!-- ``` -->
<!-- ![](https://i.imgur.com/oEuv1Zj.png) -->
<!-- ## Desire line data from the pct package -->
<!-- The maps shown in the previous section establish that there is a decent amount of cycling in the Chorlton area, at least according to the 2011 Census which is still a good proxy for travel patterns in 2021 due to the inertia of travel behaviours to change [@goodman_walking_2013]. -->
<!-- You can get national OD (origin/destination, also called desire line) data from the Census into R with the following command: -->
<!-- ```{r} -->
<!-- od_national = pct::get_od() -->
<!-- od_national -->
<!-- ``` -->
<!-- Let's keep only OD data that have a start and end point in the study area (in a transport simulation, we may also want trips starting or ending outside this area and passing through): -->
<!-- ```{r} -->
<!-- od = od_national %>% -->
<!-- filter(geo_code1 %in% zones$geo_code) %>% -->
<!-- filter(geo_code2 %in% zones$geo_code) -->
<!-- dim(od) -->
<!-- ``` -->
<!-- The result is nearly 300 rows of data representing movement between origin and destination zone centroids. -->
<!-- The data is non geographic, however. -->
<!-- To convert this non-geographic data into geographic desire lines, you can use the `od_to_sf()` function in the `od` package as follows: -->
<!-- ```{r} -->
<!-- desire_lines = od::od_to_sf(x = od, z = zones) -->
<!-- ``` -->
<!-- We'll calculated the straight line distance of these trips as follows: -->
<!-- ```{r} -->
<!-- desire_lines$length_km = as.numeric(sf::st_length(desire_lines)) / 1000 -->
<!-- summary(desire_lines$length_km) -->
<!-- ``` -->
<!-- We can plot the result as follows: -->
<!-- ```{r} -->
<!-- tmap_mode("plot") -->
<!-- qtm(zones) + -->
<!-- tm_shape(desire_lines) + -->
<!-- tm_lines(c("foot", "bicycle"), palette = "Blues", style = "jenks", lwd = 3, alpha = 0.5) -->
<!-- ``` -->
<!-- Note the OD data describes an aggregate pattern, between pairs of zones -- not between individual points-of-interest. -->
<!-- The following code returns only OD pairs with an origin in the Chorlton area: -->
<!-- ```{r} -->
<!-- od_lida = od %>% -->
<!-- filter(geo_code1 %in% "E02001073") -->
<!-- ``` -->
<!-- ## Crash data from stats19 -->
<!-- A major deterrent to walking and cycling is motor traffic. -->
<!-- You can get open data on road traffic casulaties in the case study area over the last five years as follows: -->
<!-- ```{r, eval=FALSE} -->
<!-- library(stats19) -->
<!-- crashes = get_stats19(year = 2015:2019, output_format = "sf", lonlat = TRUE) -->
<!-- casualties = get_stats19(year = 2015:2019, type = "casualties") -->
<!-- crashes_combined = inner_join(crashes, casualties) -->
<!-- table(crashes_combined$casualty_type) -->
<!-- crashes_active = crashes_combined %>% -->
<!-- filter(casualty_type %in% c("Pedestrian", "Cyclist")) -->
<!-- crashes_in_area = crashes_active[lida_buffer, ] -->
<!-- tm_shape(crashes_in_area) + -->
<!-- tm_dots("casualty_type", popup.vars = c("casualty_type", "accident_severity", "datetime"), palette = "viridis") -->
<!-- ``` -->
<!-- ![](https://i.imgur.com/oTYSwzQ.png) -->
<!-- ```{r, eval=FALSE, echo=FALSE} -->
<!-- sf::write_sf(crashes_in_area, "crashes_in_area.geojson") -->
<!-- piggyback::pb_upload("crashes_in_area.geojson") -->
<!-- piggyback::pb_download_url("crashes_in_area.geojson") -->
<!-- ``` -->
<!-- You can get the resulting crash data from: https://github.com/Robinlovelace/openTransportDataDemo/releases/download/1/crashes_in_area.geojson -->
<!-- ## Scenarios of change -->
<!-- You can model cycling uptake functions with the `pct` package as follows: -->
<!-- ```{r} -->
<!-- percent_cycling = pct::uptake_pct_godutch_2020(distance = desire_lines$length_km, gradient = 0) -->
<!-- plot(desire_lines$length_km, percent_cycling) -->
<!-- ``` -->
<!-- To get more realistic results, you would use route (not straight line) distance and hilliness from actual routes, not just desire lines. -->
<!-- Routing takes time but can be done with R packages such as `stplanr`. -->
<!-- For the purposes of illustration, we will use a simple uptake model implemented below: -->
<!-- ```{r} -->
<!-- desire_lines_go_active = desire_lines %>% -->
<!-- mutate(car_driver = case_when(length_km < 2 ~ car_driver * 0.33, TRUE ~ car_driver)) %>% -->
<!-- mutate(foot = case_when(length_km < 2 ~ foot + car_driver * (1 - 0.33), TRUE ~ foot)) %>% -->
<!-- mutate(car_driver = car_driver * 0.5, bicycle = bicycle + car_driver * 0.5) %>% -->
<!-- mutate_if(is.numeric, round) -->
<!-- sum(desire_lines_go_active$bicycle) -->
<!-- sum(desire_lines$bicycle) -->
<!-- sum(desire_lines_go_active$foot) -->
<!-- sum(desire_lines$foot) -->
<!-- ``` -->
<!-- ## Preparing data for A/B Street -->
<!-- ```{r, fig.show='hold', out.width="49%"} -->
<!-- remotes::install_github("a-b-street/abstr", ref = "ab_scenario2") -->
<!-- u = "https://github.com/Robinlovelace/openTransportDataDemo/releases/download/1/osm_data_polygons_region.Rds" -->
<!-- f = basename(u) -->
<!-- if(!file.exists(f)) { -->
<!-- download.file(url = u, destfile = f) -->
<!-- } -->
<!-- osm_data_polygons_region = readRDS("osm_data_polygons_region.Rds") -->
<!-- # Explore inputs and outputs of ab_scenario fun -->
<!-- desire_lines_abst = desire_lines %>% -->
<!-- filter(geo_code1 == "E02001073") %>% -->
<!-- transmute(o = geo_code1, d = geo_code2, all, Walk = foot, Bike = bicycle, Drive = car_driver, -->
<!-- Transit = light_rail + train + bus) -->
<!-- set.seed(2050) -->
<!-- desire_lines_disaggregated = abstr::ab_scenario(desire_lines_abst, zones = zones, -->
<!-- subpoints = osm_data_polygons_region) -->
<!-- desire_lines_disaggregated %>% -->
<!-- tm_shape() + -->
<!-- tm_lines("mode") + -->
<!-- qtm(osm_data_polygons_region) -->
<!-- desire_lines_json = abstr::ab_json(desire_lines_disaggregated["mode"], scenario_name = "baseline") -->
<!-- abstr::ab_save(x = desire_lines_json, "baseline.json") -->
<!-- # Go Active scenario -->
<!-- desire_lines_abst = desire_lines_go_active %>% -->
<!-- filter(geo_code1 == "E02001073") %>% -->
<!-- transmute(o = geo_code1, d = geo_code2, all, Walk = foot, Bike = bicycle, Drive = car_driver, -->
<!-- Transit = light_rail + train + bus) -->
<!-- set.seed(2050) -->
<!-- desire_lines_disaggregated = abstr::ab_scenario(desire_lines_abst, zones = zones, -->
<!-- subpoints = osm_data_polygons_region) -->
<!-- desire_lines_disaggregated %>% -->
<!-- tm_shape() + -->
<!-- tm_lines("mode") + -->
<!-- qtm(osm_data_polygons_region) -->
<!-- desire_lines_json = abstr::ab_json(desire_lines_disaggregated["mode"], scenario_name = "go_active") -->
<!-- abstr::ab_save(x = desire_lines_json, "go_active.json") -->
<!-- ``` -->
<!-- ```{r, eval=FALSE, echo=FALSE} -->
<!-- piggyback::pb_upload("go_active.json") -->
<!-- piggyback::pb_download_url("go_active.json") -->
<!-- fs::file_size("baseline.json") -->
<!-- # Explore inputs and outputs of ab_scenario fun -->
<!-- desire_lines_disaggregated = abstr::ab_scenario(desire_lines_abst, zones = zones) -->
<!-- piggyback::pb_upload("osm_data_polygons_region.Rds") -->
<!-- piggyback::pb_download_url("osm_data_polygons_region.Rds") -->
<!-- desire_lines_json = ab_json(desire_lines_disaggregated["mode"], scenario_name = "baseline") -->
<!-- ab_save(x = desire_lines_json, "baseline.json") -->
<!-- fs::file_size("baseline.json") -->
<!-- library(abstr) -->
<!-- ?ab_scenario -->
<!-- ab_evening_dutch = ab_scenario2( -->
<!-- leeds_houses, -->
<!-- leeds_buildings, -->
<!-- leeds_desire_lines, -->
<!-- leeds_zones, -->
<!-- scenario = "dutch", -->
<!-- output_format = "sf", -->
<!-- hr = 20, # representing 8 pm -->
<!-- sd = 0 -->
<!-- ) -->
<!-- head(leeds_desire_lines) -->
<!-- head(ab_evening_dutch) # output is simple: sf object with `mode` column. -->
<!-- nrow(ab_evening_dutch) -->
<!-- sum(leeds_desire_lines$all_base) -->
<!-- # issue to fix here: -->
<!-- zones = zones %>% -->
<!-- filter(geo_name %in% c(desire_lines$geo_code1, desire_lines$geo_code2)) -->
<!-- desire_lines_json = ab_json(desire_lines_disaggregated["mode"], scenario_name = "baseline") -->
<!-- ``` -->
# Further reading
- To get started with R for transport research I recommend Reproducible Road Safety Research with R, an online version of which can be found here: https://itsleeds.github.io/rrsrr/
- To get a deeper understanding of using geographic data for transport research, Chapter 12 of the book Geocomputation with R is a great place to start: https://geocompr.robinlovelace.net/transport.html
- For more on A/B Street scenarios, see here: https://a-b-street.github.io/docs/dev/formats/scenarios.html
For any questions, feel free to ask in the GitHub issue tracker associated with any of the repositories mentioned in this guide.
# References