-
Notifications
You must be signed in to change notification settings - Fork 0
/
index.qmd
438 lines (360 loc) · 13.3 KB
/
index.qmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
---
title: "Philly Center City District Sips 2022: An Interactive Map"
subtitle: "[R-Ladies Philly](https://www.rladiesphilly.org/) workshop on webscraping, geocoding, and interactive map-making"
author: "[Silvia Canelón](https://silviacanelon.com)"
date: 2022-09-29
format:
html:
toc: true
code-overflow: wrap
code-link: true
self-contained: true
# read about html options:
# https://quarto.org/docs/reference/formats/html.html
---
![](https://www.meetup.com/_next/image/?url=https%3A%2F%2Fsecure-content.meetupstatic.com%2Fimages%2Fclassic-events%2F506372053%2F676x380.webp&w=3840&q=75){fig-alt="Silvia Canelón presents Webscraping, Geocoding & Interactive Map-Making with Center City Sips Data. R-Ladies Philly." fig-align="center"}
## Introduction
This workshop is adapted from a [blog post](https://silviacanelon.com/blog/2022-ccd-sips) of the same name and is accompanied by [slides](https://slides.silviacanelon.com/2022-ccd-sips).
The [2022 Center City District Sips](https://centercityphila.org/explore-center-city/ccd-sips) website features all of the restaurants participating in the Center City Sips event, but does not offer a map view. This makes it hard to locate a happy hour special nearby, so we're going to use the data they provide to build an interactive map!
1. [Scrape restaurants and addresses from the website](#scraping-the-data)
2. [Geocode the restaurant addresses to obtain geographical coordinates](#geocoding-addresses)
3. [Build an interactive map with `leaflet`](#building-the-map)
### Packages
| Package | Purpose | Version |
|------------------|------------------------------------------------|----------|
| `tidyverse` | Data manipulation and iteration functions | 1.3.2.90 |
| `here` | File referencing in project-oriented workflows | 0.7.13 |
| `knitr ` | Style data frame output into formatted table | 1.40 |
| `robotstxt` | Check website for scraping permissions | 0.7.13 |
| `rvest` | Scrape the information off of the website | 1.0.3 |
| `tidygeocoder` | Geocode the restaurant addresses | 1.0.5 |
| `leaflet` | Build the interactive map | 2.1.1 |
| `leaflet.extras` | Add extra functionality to map | 1.0.0 |
```{r setup}
#| message: false
library(tidyverse)
library(here)
library(knitr)
library(robotstxt)
library(rvest)
library(tidygeocoder)
library(leaflet)
library(leaflet.extras)
```
## Scraping the data
We will scrape the data from the [2022 Center City District Sips]() website, specifically the list view: <https://centercityphila.org/explore-center-city/ccd-sips/sips-list-view>
### Checking site permissions
First we check the site's terms of service using the [robotstxt](https://docs.ropensci.org/robotstxt/) package, which downloads and parses the site's robots.txt file.
What we want to look for is whether any pages are _not allowed_ to be crawled by bots/scrapers. In this case there aren't any, indicated by `Allow: /`.
```{r check-permissions}
get_robotstxt("https://centercityphila.org/explore-center-city/ccd-sips/sips-list-view")
```
### Harvesting data from the first page
We'll use the [rvest](https://rvest.tidyverse.org/index.html) package to scrape the information from the tables of restaurants/bars participating in CCD Sips.
Ideally you would only scrape each page once, so we will check our approach with the first page before writing a function to scrape the remaining pages.
```{r scrape-pg-1}
#| eval: true
# define the page
url <- "https://centercityphila.org/explore-center-city/ccd-sips/sips-list-view?page=1"
# read the page html
html1 <- read_html(url)
# extract table info
table1 <-
html1 |>
html_node("table") |>
html_table()
table1 |> head() |> kable()
# extract hyperlinks to specific restaurant/bar specials
links <-
html1 |>
html_elements(".o-table__tag.ccd-text-link") |>
html_attr("href") |>
as_tibble()
links |> head() |> kable()
# add full hyperlinks to the table info
table1Mod <-
bind_cols(table1, links) |>
mutate(Specials = paste0(url, value)) |>
select(-c(`CCD SIPS Specials`, value))
table1Mod |> head() |> kable()
```
### Harvesting data from the remaining pages
We confirmed that the above approach harvested the information we needed, so we can adapt the code into a function that we can apply to pages 2-3 of the site.
```{r create-function}
getTables <- function(pageNumber) {
# wait 2 seconds between each scrape
Sys.sleep(2)
url <- paste0("https://centercityphila.org/explore-center-city/ccd-sips/sips-list-view?page=", pageNumber)
# read the page html
html <- read_html(url)
# extract table info
table <-
html |>
html_node("table") |>
html_table()
# extract hyperlinks to specific restaurant/bar specials
links <-
html |>
html_elements(".o-table__tag.ccd-text-link") |>
html_attr("href") |>
as_tibble()
# add full hyperlinks to the table info
tableSpecials <<-
bind_cols(table, links) |>
mutate(Specials = paste0(url, value)) |>
select(-c(`CCD SIPS Specials`, value))
}
```
We can use the `getTable()` function and the `purrr::map_df()` function to harvest the table of restaurants/bars from pages 2 and 3.
Then we can combine all the data frames together and saved the complete data frame as an `.Rds` object so that we won't have to scrape the data again.
::: {.callout-tip}
## Shortcut
Skip this step and load the data from the `data/` folder:
```{r load-scraped-specials}
table <- read_rds(here("data", "specialsScraped.Rds"))
```
:::
```{r scrape-remaining, eval=FALSE}
#| eval: false
# get remaining tables
table2 <- map_df(2:3, getTables)
# combine all tables
table <- bind_rows(table1Mod, table2)
```
```{r show-scraped-specials}
table |> head() |> kable()
```
```{r save-scraped-specials}
#| eval: false
# save full table to file
write_rds(table,
file = here("data", "specialsScraped.Rds"))
```
## Geocoding addresses
The next step is to use geocoding to convert the restaurant/bar addresses to geographical coordinates (longitude and latitude) that we can map. We can use the [tidygeocoder](https://jessecambon.github.io/tidygeocoder) package to help us, and specify that we want to use the [ArcGIS](https://developers.arcgis.com/rest/geocode/api-reference/overview-world-geocoding-service.htm) geocoding service.
```{r, geocode-addresses}
# geocode addresses
specials <-
table |>
geocode(address = Address,
method = 'arcgis',
long = Longitude,
lat = Latitude)
```
```{r show-specials-geocoded}
specials |> head() |> kable()
```
Make sure to save the new data frame with geographical coordinates as an `.Rds` object so you won't have to geocode the data again! This is particularly important if you ever want to work with a large project.
```{r save-geocoded-specials}
# save table with geocoded addresses to file
write_rds(specials,
file = here("data", "specialsGeocoded.Rds"))
```
## Building the map
To build the map, we can use the [leaflet](https://rstudio.github.io/leaflet/) package.
::: {.callout-tip}
# Tip
Add a [Google Font](https://fonts.google.com/) with a `css` chunk that imports the font face(s) and weights you want to use (e.g. Red Hat Text)
```{css}
#| echo: fenced
@import url('https://fonts.googleapis.com/css2?family=Red+Hat+Text:ital,wght@0,300;0,400;1,300;1,400&display=swap');
```
:::
### Plotting the restaurants/bars
```{r add-restaurants}
leaflet(data = specials,
options = tileOptions(minZoom = 15,
maxZoom = 19)) |>
# add map markers
addCircles(
lat = ~ specials$Latitude,
lng = ~ specials$Longitude,
popup = specials$Address,
label = ~ Name,
# customize labels
labelOptions = labelOptions(
style = list(
"font-family" = "Red Hat Text, sans-serif",
"font-size" = "1.2em")
)
)
```
### Adding the map background
```{r add-basemap}
leaflet(data = specials,
options = tileOptions(minZoom = 15,
maxZoom = 19)) |>
# add map markers
addCircles(
lat = ~ specials$Latitude,
lng = ~ specials$Longitude,
popup = specials$Address,
label = ~ Name,
# customize labels
labelOptions = labelOptions(
style = list(
"font-family" = "Red Hat Text, sans-serif",
"font-size" = "1.2em")
)
) |>
# add map tiles in the background
addProviderTiles(providers$CartoDB.Positron)
```
### Setting the map view
```{r set-view}
leaflet(data = specials,
options = tileOptions(minZoom = 15,
maxZoom = 19)) |>
# add map markers
addCircles(
lat = ~ specials$Latitude,
lng = ~ specials$Longitude,
popup = specials$Address,
label = ~ Name,
# customize labels
labelOptions = labelOptions(
style = list(
"font-family" = "Red Hat Text, sans-serif",
"font-size" = "1.2em")
)
) |>
# add map tiles in the background
addProviderTiles(providers$CartoDB.Positron) |>
# set the map view
setView(mean(specials$Longitude),
mean(specials$Latitude),
zoom = 16)
```
### Adding fullscreen control
```{r add-fullscreen}
leaflet(data = specials,
options = tileOptions(minZoom = 15,
maxZoom = 19)) |>
# add map markers
addCircles(
lat = ~ specials$Latitude,
lng = ~ specials$Longitude,
popup = specials$Address,
label = ~ Name,
# customize labels
labelOptions = labelOptions(
style = list(
"font-family" = "Red Hat Text, sans-serif",
"font-size" = "1.2em")
)
) |>
# add map tiles in the background
addProviderTiles(providers$CartoDB.Positron) |>
# set the map view
setView(mean(specials$Longitude),
mean(specials$Latitude),
zoom = 16) |>
# add fullscreen control button
leaflet.extras::addFullscreenControl()
```
### Customizing map markers
```{r style-markers}
# style pop-ups for the map with inline css styling
# marker for the restaurants/bars
popInfoCircles <- paste(
"<h2 style='font-family: Red Hat Text, sans-serif; font-size: 1.6em; color:#43464C;'>",
"<a style='color: #00857A;' href=", specials$Specials, ">", specials$Name, "</a></h2>",
"<p style='font-family: Red Hat Text, sans-serif; font-weight: normal; font-size: 1.5em; color:#9197A6;'>", specials$Address, "</p>"
)
```
```{r customize-marker-labels}
leaflet(data = specials,
options = tileOptions(minZoom = 15,
maxZoom = 19)) |>
# add map markers
addCircles(
lat = ~ specials$Latitude,
lng = ~ specials$Longitude,
# customize markers
fillColor = "#009E91",
fillOpacity = 0.6,
stroke = F,
radius = 12,
# customize pop-ups
popup = popInfoCircles,
label = ~ Name,
# customize labels
labelOptions = labelOptions(
style = list(
"font-family" = "Red Hat Text, sans-serif",
"font-size" = "1.2em")
)
) |>
# add map tiles in the background
addProviderTiles(providers$CartoDB.Positron) |>
# set the map view
setView(mean(specials$Longitude),
mean(specials$Latitude),
zoom = 16) |>
# add fullscreen control button
leaflet.extras::addFullscreenControl()
```
### Adding a marker at the center
```{r style-center-marker}
# marker for the center of the map
popInfoMarker <- paste(
"<h1 style='padding-top: 0.5em; margin-top: 1em; margin-bottom: 0.5em; font-family: Red Hat Text, sans-serif; font-size: 1.8em; color:#43464C;'>",
"<a style='color: #00857A;' href='https://centercityphila.org/explore-center-city/ccdsips'>",
"Center City District Sips 2022",
"</a></h1><p style='color:#9197A6; font-family: Red Hat Text, sans-serif; font-size: 1.5em; padding-bottom: 1em;'>",
"Philadelphia, PA", "</p>")
# custom icon for the center of the map
centerIcon <-
makeAwesomeIcon(
icon = "map-pin",
iconColor = "#FFFFFF",
markerColor = "darkblue", # accepts HTML colors
library = "fa"
)
```
```{r add-center-marker}
leaflet(data = specials,
options = tileOptions(minZoom = 15,
maxZoom = 19)) |>
# add map markers
addCircles(
lat = ~ specials$Latitude,
lng = ~ specials$Longitude,
# customize markers
fillColor = "#009E91",
fillOpacity = 0.6,
stroke = F,
radius = 12,
# customize pop-ups
popup = popInfoCircles,
label = ~ Name,
# customize labels
labelOptions = labelOptions(
style = list(
"font-family" = "Red Hat Text, sans-serif",
"font-size" = "1.2em")
)
) |>
# add map tiles in the background
addProviderTiles(providers$CartoDB.Positron) |>
# set the map view
setView(mean(specials$Longitude),
mean(specials$Latitude),
zoom = 16) |>
# add fullscreen control button
leaflet.extras::addFullscreenControl() |>
# add marker at the center
addAwesomeMarkers(
icon = centerIcon,
lng = mean(specials$Longitude),
lat = mean(specials$Latitude),
label = "Center City District Sips 2022",
# customize labels
labelOptions = labelOptions(
style = list(
"font-family" = "Red Hat Text, sans-serif",
"font-size" = "1.2em")
),
popup = popInfoMarker,
popupOptions = popupOptions(maxWidth = 250))
```