-
Notifications
You must be signed in to change notification settings - Fork 0
/
geospatial_with_sf.Rmd
314 lines (224 loc) · 11.2 KB
/
geospatial_with_sf.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
---
title: "Spatial analysis in R with the `sf` package"
fig_caption: false
output:
html_document:
toc: true
toc_float: true
toc_depth: 2
---
```{r setup, echo=FALSE, warning=FALSE, purl=FALSE, message=FALSE}
library(knitr)
options(repos="http://cran.rstudio.com/")
pkgs <- c("sf", "raster", "dplyr","rgdal","mapview")
x<-lapply(pkgs, library, character.only = TRUE)
opts_chunk$set(tidy=F, root.dir = ".", cach=TRUE)
```
In this mini-workshop we will introduce the `sf` package, show some examples of geospatial analysis, work with base plotting of `sf` objects, and show how `mapview` can be used to map these objects. It is assumed that you have [R and RStudio installed](https://github.com/rhodyrstats/intro_r_workshop#software) and that you, at a minimum, understand the basic concepts of the R language (e.g. you can work through [R For Cats](https://rforcats.net)).
Also as an aside, I am learning the `sf` package right now so, we will be learning all of this together!
# The `sf` package
Things are changing quickly in the R/spatial analysis world and the most fundamental change is via the `sf` package. This package aims to replace `sp`, `rgdal`, and `rgeos`. There are a lot of reasons why this is a good thing, but that is a bit beyond the scope of this workshop Suffice it to say it should make things faster and simpler!
To get started, lets get `sf` installed:
```{r, eval=FALSE}
install.packages("sf")
library("sf")
```
`sf` does rely on having access the GDAL, GEOS, and Proj.4 libraries. On Windows and Mac this should be pretty straightforward if we are installing from CRAN (which we are). If you use linux, you know nothing is straightforward and you are on your own!
## Exercise 1
The first exercise won't be too thrilling, but we need to make sure everyone has the packages installed.
1. Install `sf`.
2. Load `sf`.
3. If you don't have `dplyr` already, make sure it is installed.
4. Load `dplyr`.
# Reading in spatial data with `sf`
## Simple Features
So, what does `sf` actually provide us? It is an implementation of an ISO standard for storing spatial data. It forms the basis for many of the common vector data models and is centered on the concept of a "feature". Essentially a feature is any object in the real world. There are many different types of features and there are different details that get stored about each. The [first `sf` vignette](https://cran.r-project.org/web/packages/sf/vignettes/sf1.html) does a really nice job of explaining the details. For this mini-workshop we are going to focus on three feature types, POINT, LINESTRING, and POLYGON. For each of the types, there will be coordinates stored as dimensions, a coordinate reference system, and attributes.
## Get some data to use
We can grab some data directly from the Rhode Island Geographic Information System (RIGIS) for these examples. This code assumes you have a `data` folder in your current workspace. Create one if you need it.
```{r, eval = TRUE, echo=FALSE, warning=FALSE}
# This can take some time!
# Check fo data folder
if(!dir.exists("data")){
dir.create("data")
}
# Municipal Boundaries
if(!file.exists("data/muni97d.shp")){
download.file(url = "http://www.rigis.org/geodata/bnd/muni97d.zip",
destfile = "data/muni97d.zip")
unzip(zipfile = "data/muni97d.zip",
exdir = "data")
}
# Streams
if(!file.exists("data/streams.shp")){
download.file(url = "http://www.rigis.org/geodata/hydro/streams.zip",
destfile = "data/streams.zip")
unzip(zipfile = "data/streams.zip",
exdir = "data")
}
# Potential Growth Centers
if(!file.exists("data/growth06.shp")){
download.file(url = "http://www.rigis.org/geodata/plan/growth06.zip",
destfile = "data/growth06.zip")
unzip(zipfile = "data/growth06.zip",
exdir = "data")
}
# Land Use/Land Cover
if(!file.exists("data/rilc11d.shp")){
download.file(url = "http://www.rigis.org/geodata/plan/rilc11d.zip",
destfile = "data/rilc11d.zip")
unzip(zipfile = "data/rilc11d.zip",
exdir = "data")
}
```
```{r, eval = FALSE}
# Create a data folder if it doesn't exist, and yes R can do that
if(!dir.exists("data")){dir.create("data")}
# Municipal Boundaries
download.file(url = "http://www.rigis.org/geodata/bnd/muni97d.zip",
destfile = "data/muni97d.zip")
unzip(zipfile = "data/muni97d.zip",
exdir = "data")
# Streams
download.file(url = "http://www.rigis.org/geodata/hydro/streams.zip",
destfile = "data/streams.zip")
unzip(zipfile = "data/streams.zip",
exdir = "data")
# Potential Growth Centers
download.file(url = "http://www.rigis.org/geodata/plan/growth06.zip",
destfile = "data/growth06.zip")
unzip(zipfile = "data/growth06.zip",
exdir = "data")
# Land Use/Land Cover
download.file(url = "http://www.rigis.org/geodata/plan/rilc11d.zip",
destfile = "data/rilc11d.zip")
unzip(zipfile = "data/rilc11d.zip",
exdir = "data")
```
## Data input
To pull the shapefiles in we can simply use the `st_read()` function. This will create an object which is a simple feature collection of, in our case, POINT, LINESTRING, or POLYGON. As an aside, many of the `sf` functions and all of the ones we will be using start with `st_`. This stands for "spatial" and "temporal". Take a look below for examples reading in each of our datasets.
### POINT
```{r}
growth_cent <- st_read("data/growth06.shp")
```
### LINESTRING
```{r}
streams <- st_read("data/streams.shp")
```
### POLYGON
```{r}
muni <- st_read("data/muni97d.shp")
```
```{r}
lulc <- st_read("data/rilc11d.shp")
```
## Other data
We won't have time during this mini-workshop to look at reading in other data formats, but since `sf` uses GDAL it can read all of the files for which you have drivers.
```{r}
drvrs <- st_drivers()
drvrs$long_name
```
Additionally, you if you have a tabular dataset with coordinates, you can create an sf object with those. An example using EPA's National Lakes Assessment data:
```{r}
nla_url <- "https://www.epa.gov/sites/production/files/2016-12/nla2012_wide_siteinfo_08232016.csv"
nla_stations <- read.csv(nla_url, stringsAsFactors = FALSE)
nla_sf <- st_as_sf(nla_stations, coords = c("LON_DD83", "LAT_DD83"), crs = "+proj=longlat +ellps=WGS84 +datum=WGS84 +no_defs")
```
(HT to [Mike Treglia](https://twitter.com/MikeTreglia) for the suggestion!)
## A word on performance
One of the benefits of using `sf` is the speed. In my tests it is about twice as fast as the prior standard of `sp` and `rgdal`. Let's look at a biggish shape file with 1 million points!
![1 million points](https://media4.giphy.com/media/13B1WmJg7HwjGU/giphy.gif)
```{r, echo=FALSE, message=FALSE}
if(!file.exists("data/big.shp")){
df <- data.frame(x=rnorm(1000000,-70,2),y=rnorm(1000000,42,2))
big <- st_as_sf(df, coords = c("x","y"), crs = "+proj=longlat +ellps=WGS84 +datum=WGS84 +no_defs")
st_write(big,"data/big.shp")
}
```
```{r, eval=TRUE}
#The old way
system.time(rgdal::readOGR("data","big"))
#The sf way
system.time(st_read("data/big.shp"))
```
## Exercise 2
1. Make sure each of the shapefiles is read in. Follow the instructions from above to complete this.
# Basics of `sf` objects
Perhaps the nicest feature (pun intended) about `sf` objects is that they are nothing more than data frames. The data for each feature (e.g. the attributes in ESRI speak) are stored in the data frame's first columns. The last column of the data frame is a "geometry" column which holds the coordinates, and coordinate reference system information. I say this is nice becuase we don't need to completely learn a new way of working with spatial data. Much of what we now about working with plain old tabular data frames will also work with `sf` objects.
We can use many of our base R skills directly with these objects.
```{r}
head(muni)
streams$NAME[1:10]
str(growth_cent)
```
## Manipulate `sf` objects with `dplyr`, yes, `dplyr`!
However, what is truly awesome, is the we can use `dplyr` to manipulate our spatial data and pipe our spatial workflows. First let's make sure `dplyr` is loaded up.
```{r}
library("dplyr")
```
Now we can work with our spatial data. Let's filter out the towns that are more than 100 km^2^:
```{r}
big_muni <- muni %>%
mutate(area_skm = AREA*.000000092903) %>%
group_by(NAME) %>%
summarize(sum_area_skm = sum(area_skm)) %>%
filter(sum_area_skm >= 100)
big_muni
```
And first time I tried this, this happened:
![mind_blown](https://media.giphy.com/media/3o7TKSjRrfIPjeiVyM/giphy.gif)
## Exercise 3
Let's try the same thing with our land use/land cover data, except without the filter. Use `dplyr` and
1. Add a column to convert area from acres to square kilometers
2. Group by LULC_2011
3. Sum up the square kilometers
4. Which land use/land cover class is the biggest? The smallest?
# Plotting
There are default plotting methods for `sf` objects, so we can use our base plotting tools to create some quick maps.
## `sf` plotting methods
If a dataset has multiple attributes, the default plotting will create a separate plot of each. Nice for quickly exploring a dataset.
```{r}
plot(muni)
```
And to plot just a single field (might be a better way to do this, but it is what makes the most sense to me right now):
```{r}
plot(muni$geometry, col = muni$COUNTY)
```
And to add other layers
```{r}
plot(muni$geometry)
plot(streams$geometry, col = "blue", add = TRUE)
plot(growth_cent$geometry, pch = 19, cex = 1, col = "red", add = TRUE)
```
That's nice, but if we want to interact with the data we can't do that just yet (I'm planning to add `sf` support to [`quickmapr`](https://cran.r-project.org/package=quickmapr)). That has been added to the development version of `mapview`.
## `mapview`
If you don't already have `mapview` we need to get it installed and loaded. This isn't on CRAN yet so we can install with `devtools::install_github()`
```{r, eval = FALSE}
devtools::install_github("environmentalinformatics-marburg/mapview", ref="develop")
library("mapview")
```
And add some of our `sf` objects to a Leaflet map with `mapview`.
```{r, eval = FALSE}
mapview(muni) + streams + growth_cent
```
(sorry not embedded, was having rendering problems that I didn't have time to solve)
## Exercise 4
1. Play around with base plotting to plot some of these different `sf` objects.
# Analysis
Basic GIS analyses are fully supported in `sf`. For this introduction we will discuss only one: buffer, clip, and summarize.
## Buffer
To buffer features we will use the `st_buffer()` function. For our example we will buffer a single point and summarize the land use/land cover around that point. We can pipe our full workflow together using `%>%`.
```{r}
wk_lulc_summ <- growth_cent %>%
filter(NAME == "WEST KINGSTON") %>%
st_buffer(dist = 5280) %>%
st_intersection(lulc) %>%
group_by(Descr_2011)%>%
summarize(area = sum(Acres_2011)) %>%
arrange(desc(area))
wk_lulc_summ
plot(wk_lulc_summ$geoms, col = wk_lulc_summ$Descr_2011)
```
Clearly we need to think about our colors some, but the combination of `sf` and `dplyr` for doing GIS types of operations is FANTASTIC and I am now completely:
![in love](https://media.giphy.com/media/26FLdmIp6wJr91JAI/giphy.gif)
## Exercise 5
Let's get a bit ambitious now. What is the land use land cover totals within 1 km of the Wood River?