generated from quarcs-lab/r-env
-
Notifications
You must be signed in to change notification settings - Fork 1
/
spdep.Rmd
805 lines (471 loc) · 16.4 KB
/
spdep.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
633
634
635
636
637
638
639
640
641
642
643
644
645
646
647
648
649
650
651
652
653
654
655
656
657
658
659
660
661
662
663
664
665
666
667
668
669
670
671
672
673
674
675
676
677
678
679
680
681
682
683
684
685
686
687
688
689
690
691
692
693
694
695
696
697
698
699
700
701
702
703
704
705
706
707
708
709
710
711
712
713
714
715
716
717
718
719
720
721
722
723
724
725
726
727
728
729
730
731
732
733
734
735
736
737
738
739
740
741
742
743
744
745
746
747
748
749
750
751
752
753
754
755
756
757
758
759
760
761
762
763
764
765
766
767
768
769
770
771
772
773
774
775
776
777
778
779
780
781
782
783
784
785
786
787
788
789
790
791
792
793
794
795
796
797
798
799
800
801
802
803
804
805
---
title: "Spatial regression analysis in R"
author: "Carlos Mendez"
output:
github_document: default
html_notebook:
code_folding: show
highlight: monochrome
number_sections: yes
theme: cosmo
toc: yes
toc_depth: 4
toc_float:
collapsed: no
smooth_scroll: no
pdf_document: default
word_document: default
html_document:
keep_md: yes
code_download: true
df_print: paged
toc: true
toc_float:
collapsed: false
smooth_scroll: false
toc_depth: 4
number_sections: true
code_folding: "show"
theme: "cosmo"
highlight: "monochrome"
always_allow_html: true
---
<style>
h1.title {font-size: 18pt; color: DarkBlue;}
body, h1, h2, h3, h4 {font-family: "Palatino", serif;}
body {font-size: 12pt;}
/* Headers */
h1,h2,h3,h4,h5,h6{font-size: 14pt; color: #00008B;}
body {color: #333333;}
a, a:hover {color: #8B3A62;}
pre {font-size: 12px;}
</style>
Suggested citation:
> Mendez C. (2020). Spatial regression analysis in R. R Studio/RPubs. Available at <https://rpubs.com/quarcs-lab/tutorial-spatial-regression>
This work is licensed under the Creative Commons Attribution-Non Commercial-Share Alike 4.0 International License.
Acknowledgment:
Material adapted from multiple sources, in particular [BurkeyAcademy's GIS & Spatial Econometrics Project](https://spatial.burkeyacademy.com/)
# Libraries
```{r message=FALSE, warning=FALSE}
knitr::opts_chunk$set(echo = TRUE)
library(tidyverse) # Modern data science workflow
library(spdep)
library(spatialreg)
library(sf)
library(rgdal)
library(rgeos)
# Change the presentation of decimal numbers to 4 and avoid scientific notation
options(prompt="R> ", digits=4, scipen=7)
```
# Tutorial objectives
- Import shapefiles into R
- Import neighbor relationship from `.gal` files
- Create neighbor relationships in R from shape files
- Create neighbor relationships in R from shape latitude and longitude
- Understand the difference between Great Circle and Euclidean distances
- Export neighbor relationships as weight matrices to plain text files
- Test for spatial dependence via the Moran's I test
- Evaluate the four simplest models of spatial regression
# Replication files
- All necessary files can be downloaded from [BurkeyAcademy's GIS & Spatial Econometrics Project](https://spatial.burkeyacademy.com/)
- If you are a member of the [QuaRCS lab](https://quarcs-lab.rbind.io/), you can run this tutorial in [R Studio Cloud](https://rstudio.cloud/spaces/15597/project/965714) and access the files in the following [Github Repository](https://github.com/quarcs-lab/tutorial-spatial-regression)
# Preliminary material
- [Overview of Spatial Econometric Models](https://youtu.be/6qZgchGCMds)
- [Recall of spatial regression in GeoDa and its limitations](https://youtu.be/2IIXH5h6Gz0)
- [R Spatial Data 1: Import shapefiles, create weights matrices, and run Moran's I](https://youtu.be/_bnorgXbSG4)
- [R Spatial Data 2: Make KNN from Lon/Lat text file and export as matrix](https://youtu.be/MtkuQxxQj5s)
- [R Spatial Regression 1: The Four Simplest Models](https://youtu.be/b3HtV2Mhmvk)
- [R Spatial Regression 2: All of the models, likelihood Ratio specification tests, and spatial Breusch-Pagan](https://youtu.be/MbQ4s8lwqGI)
# Import spatial data
Let us use the `readOGR` function from the `rgdal` library to import the `.shp` file
```{r}
#NCVACO <- readOGR(dsn = ".", layer = "data/NCVACO.shp")
NCVACO <- st_read("data/NCVACO.shp")
```
Note that the file is imported as a `SpatialPolygonsDataFrame` object
# Import neighbor relationship: `.gal` file
Let us use the `read.gal` function from the `rgdal` library to import the `.gal` weights matrix created in GeoDa
```{r}
queen.nb <- read.gal("data/NCVACO-Wqueen.gal", region.id=NCVACO$FIPS)
```
## Summarize neighbor relationships
```{r}
summary(queen.nb)
```
- Is the is the neighbor relationship symmetric?
```{r}
is.symmetric.nb(queen.nb)
```
# Create is the neighbor relationship in R
## From a shapefile
- For queen contiguity
```{r}
queen.R.nb <- poly2nb(NCVACO, row.names = NCVACO$FIPS)
```
- Alternatively, you can create a Rook contiguity relationship as
```{r}
rook.R.nb <- poly2nb(NCVACO, row.names = NCVACO$FIPS, queen=FALSE)
```
### Summarize neighbor relationships
```{r}
summary(queen.R.nb)
```
- Are the relationships symmetric?
```{r}
is.symmetric.nb(queen.R.nb)
```
## From latitude and longitude
- Import table
```{r}
nc.cent <- read.csv(file="data/CenPop2010_Mean_CO37.txt")
```
- Identify coordinates
```{r}
nc.coords <- cbind(nc.cent$LONGITUDE, nc.cent$LATITUDE)
```
### Identify 5 nearest neighbors
#### The right way
Recognize that latitude and longitude are handled using **great circle distances**
```{r}
nc.5nn <- knearneigh(nc.coords, k=5, longlat = TRUE)
```
#### The wrong way
Fail to recognize that latitude and longitude are handled using great circle distances. Latitude and longitude should not be used to compute **Euclidean distances**
```{r}
nc.5nn.wrong <- knearneigh(nc.coords, k=5, longlat = FALSE)
```
### Create 5 nearest neighbors relationship
```{r}
nc.5nn.nb <- knn2nb(nc.5nn)
```
Plot the right neighbor relationship
```{r}
plot(nc.5nn.nb, nc.coords)
```
```{r}
nc.5nn.nb.wrong <- knn2nb(nc.5nn.wrong)
```
Plot the wrong neighbor relationship
```{r}
plot(nc.5nn.nb.wrong, nc.coords)
```
Compare the differences
```{r}
plot(nc.5nn.nb,nc.coords)
plot(diffnb(nc.5nn.nb, nc.5nn.nb.wrong), nc.coords, add=TRUE, col="red", lty=2)
title(main="Differences between Euclidean and Great Circle k=5 neighbors")
```
## Compare neighbor relationships
- Do the two queen-based neighbor relationships have the same structure?
```{r}
isTRUE(all.equal(queen.nb, queen.R.nb, check.attributes=FALSE))
```
- Do the two 5nn relationships have the same structure?
```{r}
isTRUE(all.equal(nc.5nn.nb, nc.5nn.nb.wrong, check.attributes=FALSE))
```
## Export as plain text weight matrices
```{r}
nc.5nn.mat <- nb2mat(nc.5nn.nb)
```
```{r}
write.csv(nc.5nn.mat, file="data/nck5.csv")
```
## Note on storing and converting neighbor relationships
There are many ways to store weights matrices and contiguity files:
- `listw` is used in most spdep commands
- `nb` means neighbor file
- `knn` is a k nearest neighbors object
- `neigh` is another kind of neighbor file, common in ecology (e.g. package ade4)
- `poly` stores it a "polygons" of a map file
There are many commands to convert one way of storing contiguity information into another:
- `poly2nb(object)` converts `polygon` to `nb`
- `nb2listw(object)` converts `nb` to `listw`
# Test spatial autocorrelation
Let us use the Moran's I based on the function `moran`, which need the following arguments:
- variable
- neighbor relationship as a `listw` object
- number of regions
- sum of weights
```{r}
moranStatistic <- moran(NCVACO$SALESPC, nb2listw(queen.nb), length(NCVACO$SALESPC), Szero(nb2listw(queen.nb)))
moranStatistic
```
Moran statistic
```{r}
moranStatistic[["I"]]
```
An alternative way of computing the test and a p value
```{r}
moranTest <- moran.test(NCVACO$SALESPC, nb2listw(queen.nb))
moranTest
```
Moran Statistic
```{r}
moranTest[["estimate"]][["Moran I statistic"]]
```
P-value
```{r}
moranTest[["p.value"]]
```
# Regression models
## Import spatial data
```{r}
#spat.data = readOGR(dsn = ".", layer = "NCVACO")
spat.data <- st_read("data/NCVACO.shp")
```
- show variable names
```{r}
names(spat.data)
```
This dataset is some data from some studies Mark Burkey did on liquor demand using data from around 2003. In particular, he looks at the states of Virginia and North Carolina. This dataset is related to, but not the same as data used on an NIH grant and published in a paper:
> [Burkey, Mark L. Geographic Access and Demand in the Market for Alcohol. The Review of Regional Studies, 40(2), Fall 2010, 159-179](https://ideas.repec.org/p/pra/mprapa/36913.html)
Unit of analysis: counties in Virginia and North Carolina
Variable Descriptions:
- `Lon Lat` Longitude and Latitude of County Centroid
- `FIPS` FIPS Code for County (Federal Information Processing Standard)
- `qtystores` #Liquor Stores in County
- `SALESPC` Liquor Sales per capita per year, $
- `PCI` Per capita income
- `COMM15OVP` % commuting over 15 minutes to work
- `COLLENRP` % of people currently enrolled in college
- `SOMECOLLP` % of people with "some college" or higher education level
- `ARMEDP` % in armed forces
- `NONWHITEP` % nonwhite
- `UNEMPP` % unemployed
- `ENTRECP` % employed i entertainment or recreation fields (proxy for tourism areas)
- `PUBASSTP` % on public assistance of some sort
- `POVPOPP` % in poverty
- `URBANP` % living in urban areas
- `FOREIGNBP` % foreign born
- `BAPTISTSP` % southern baptist (historically anti-alcohol)
- `ADHERENTSP` % adherents of any religion
- `BKGRTOMIX` wtd. average distance from block group to nearest bar selling liquor
- `COUNTMXBV` count of bars selling liquor
- `MXBVSQM` bars per square mile
- `BKGRTOABC` distance fro block group to nearest retail liquor outlet ("ABC stores")
- `MXBVPPOP18OV` Bars per 1,000? people 18 and older
- `DUI1802` DUI arrests per 1,000 people 18+
- `FVPTHH02` Offences against families and children (domestic violence) per 1,000 households
- DC GA KY MD SC TN WV VA Dummy variables for counties bordering other states
- `AREALANDSQMI` Area of county
- `COUNTBKGR` count of block groups in county
- `TOTALPOP` Population of county
- `POP18OV` 18+ people in county
- `LABFORCE` number in labor force in county
- `HHOLDS` # households in county
- `POP25OV` Pop 25+ in county
- `POP16OV` Pop 16+ in county
- summarize imported data
```{r}
summary(spat.data)
```
### Transform variables
- From categorical factor to numeric
```{r}
spat.data$PCI <- as.numeric(levels(spat.data$PCI))[spat.data$PCI]
```
## Spatial distribution
```{r}
#spplot(spat.data, "SALESPC")
```
## Spatial weights
- Create neighbor relationships
```{r}
queen.nb <- poly2nb(spat.data)
rook.nb <- poly2nb(spat.data, queen=FALSE)
```
- convert `nb` to `listw` type
```{r}
queen.listw <- nb2listw(queen.nb)
rook.listw <- nb2listw(rook.nb)
```
- use a shorter name
```{r}
listw1 <- queen.listw
```
# Classical approach
## The four simplest models
![](data/the4simplestModels.jpg)
Re-watch the video ["The Four Simplest Models"](https://youtu.be/b3HtV2Mhmvk)
## Model 1: OLS
- define regression equation, so we don't have to type it each time
```{r}
reg.eq1 <- DUI1802 ~ SALESPC + COLLENRP + BKGRTOABC + BAPTISTSP + BKGRTOMIX + ENTRECP
```
```{r}
reg1 <- lm(reg.eq1, data = spat.data)
summary(reg1)
```
### Check residual spatial dependence
```{r}
lmMoranTest <- lm.morantest(reg1,listw1)
lmMoranTest
```
### Model selection via LM tests
![](data/anselinApproach.jpg)
Source: Anselin (1988), Burkey (2018)
```{r}
lmLMtests <- lm.LMtests(reg1, listw1, test=c("LMerr", "LMlag", "RLMerr", "RLMlag", "SARMA"))
lmLMtests
```
Based on these results (Lowest is RLMlag), we would choose the spatial lag model
## Model 2: SLX
Spatially Lagged X y=Xß+WXT+e
p=rho, T=theta, and L=lambda
```{r}
reg2 = lmSLX(reg.eq1, data = spat.data, listw1)
summary(reg2)
```
### Marginal effects
Even in the case of the SLX model, you need to think about total impact (Direct+Indirect)
```{r}
impacts(reg2, listw = listw1)
```
- Add se errors and p-values
(R=500 not actually needed for SLX)
```{r}
summary(impacts(reg2, listw=listw1, R=500), zstats = TRUE)
```
## Model 3: SAR
y=pWy+XB+e
```{r}
reg3 <- lagsarlm(reg.eq1, data = spat.data, listw1)
summary(reg3)
```
### Marginal effects
Remember that when you use the spatial lag model, you cannot interpret Betas as marginal effects, you have to use the `impacts` function and obtain direct and indirect effects.
```{r}
impacts(reg3, listw = listw1)
```
- evaluate p-values
```{r}
# The code below works, but it takes some time. I will skip it for now
#summary(impacts(reg3, listw=listw1, R=500),zstats=TRUE)
```
Caution: These p-values are simulated, and seem to vary a bit from run to run.
## Model 4: SEM
y=XB+u, u=LWu+e
```{r}
reg4 <- errorsarlm(reg.eq1, data=spat.data, listw1)
summary(reg4)
```
### Spatial Hausman Test
```{r}
HausmanTest <- Hausman.test(reg4)
HausmanTest
```
[Pace, R.K. and LeSage, J.P., 2008. A spatial Hausman test. Economics Letters, 101(3), pp.282-284.](https://www.sciencedirect.com/science/article/abs/pii/S016517650800270X)
# Modern approach
Re-watch this video: [R Spatial Regression 2: All of the models, likelihood Ratio specification tests, and spatial Breusch-Pagan](https://youtu.be/MbQ4s8lwqGI)
![](data/nesting.jpg)
Source: Burkey (2018)
## Model 5: SDEM (Spatial Durbin Error)
add lag X to SEM
y=XB+WxT+u, u=LWu+e
```{r}
reg5 <- errorsarlm(reg.eq1, data = spat.data, listw1, etype = "emixed")
summary(reg5)
```
### Marginal effects
```{r}
impacts(reg5,listw=listw.wts)
```
```{r}
summary(impacts(reg5, listw=listw.wts, R=500),zstats=TRUE)
```
## Model 6: SDM (Spatial Durbin Model)
add lag X to SAR
y=pWy+XB+WXT+e
```{r}
reg6 <- lagsarlm(reg.eq1, data = spat.data, listw1, type="mixed")
summary(reg6)
```
### Marginal effects
```{r}
impacts(reg6,listw=listw1)
```
```{r}
#summary(impacts(reg6,listw=listw1,R=500),zstats=TRUE)
```
## Model 7: Manski
All inclusive
Not recommended
y=pWy+XB+WXT+u, u=LWu+e
```{r}
reg7 <- sacsarlm(reg.eq1, data = spat.data, listw1, type="sacmixed")
summary(reg7)
```
### Marginal effects
```{r}
impacts(reg7,listw=listw1)
```
```{r}
#summary(impacts(reg7,listw=listw1,R=500),zstats=TRUE)
```
## Model 8: SARAR
a.k.a. Kelejian-Prucha, Cliff-Ord, or SAC
If all T=0,y=pWy+XB+u, u=LWu+e
```{r}
reg8 <- sacsarlm(reg.eq1, data = spat.data, listw1, type="sac")
summary(reg8)
```
(listw2 allows for a different weights matrix for the error structure if desired)
### Marginal effects
```{r}
impacts(reg8,listw=listw1)
```
```{r}
#summary(impacts(reg8,listw=listw1,R=500),zstats=TRUE)
```
## Model 9: SARMA??
SARMA (like SARAR, but more local error structure)
y=ρWy+Xβ+u, u=I-λWε or
y=ρWy+Xβ+u, u=λWε+ε (2 ways of writing the same thing)
Can't be done easily in R
# Nesting via LR tests
Test Model Restrictions
Ho: Restricting coefficients = 0 (i.e., the restricting model is OK)
### SDEM to SEM
```{r}
LR.sarlm(reg5, reg4)
```
Since p-value = 0.004, we should reject the null hypothesis. SDM and SDEM are not nested.
- the order you put the models in doesn't matter
```{r}
LR.sarlm(reg4, reg5)
```
## SDEM to SLX
```{r}
LR.sarlm(reg5, reg2)
```
Since p-value = 0.0002, we should reject the null hypothesis. SLX and SDEM are not nested.
## SDEM to OLS
```{r}
LR.sarlm(reg5, reg1)
```
Since p-value = 0.000002, we should reject the null hypothesis. OLS and SDEM are not nested.
## Spatial Breusch-Pagan test for Heteroskedasticity
```{r}
bptest.sarlm(reg5, studentize = TRUE)
```
We do not have evidence of Heteroskedasticity
## Spatial Pseudo R^2
If we want to get an idea of how accurately our spatial model "Fits" the data, we can via a Pseudo R^2 as follows
```{r}
1-(reg5$SSE/(var(spat.data$DUI1802)*(length(spat.data$DUI1802)-1)))
```
# References
- Anselin, Luc. (1988) Spatial Econometrics: Methods and Models. Kluwer Academic Publishers: Dordrecht, Germany.
- [Bivand, R., & Piras, G. (2015). Comparing implementations of estimation methods for spatial econometrics. American Statistical Association.](https://www.jstatsoft.org/article/view/v063i18)
- [BurkeyAcademy's GIS & Spatial Econometrics Project](https://spatial.burkeyacademy.com/)
- Burkey, Mark L. A Short Course on Spatial Econometrics and GIS. REGION, 5(3), 2018, R13-R18. <https://doi.org/10.18335/region.v5i3.254>
- [Pace, R.K. and LeSage, J.P., 2008. A spatial Hausman test. Economics Letters, 101(3), pp.282-284.](https://www.sciencedirect.com/science/article/abs/pii/S016517650800270X)
- Elhorst, J.P. (2014) Spatial Econometrics: From Cross-Sectional Data to Spatial Panels, Springer.
- Lesage, James and R. Kelly Pace (2009) Introduction to Spatial Econometrics, CRC Press/Taylor & Francis Group.
- [LeSage, James, (2014), What Regional Scientists Need to Know about Spatial Econometrics, The Review of Regional Studies, 44, Issue 1, p. 13-32.](https://rrs.scholasticahq.com/article/8081-what-regional-scientists-need-to-know-about-spatial-econometrics)
# Other tutorials
- <https://ignaciomsarmiento.github.io/2017/02/07/An-Introduction-to-Spatial-Econometrics-in-R.html>
- <https://github.com/rsbivand/ECS530_h19/blob/master/ECS530_VII.Rmd>
- <https://www.r-bloggers.com/spatial-regression-in-r-part-1-spamm-vs-glmmtmb/>
END