---
title: "Non-Emergency Responses in the 311 System During the Early Stage of the Pandemic: \n A Case Study of Kansas City"
output: github_document
---
**Abstract**
In response to the COVID-19 pandemic, many U.S. citizens sought information and support from their city governments through the 311 non-emergency service request system. The main purpose of this study was to analyze temporal trends in Kansas City 311 data before and during the first few months of the COVID-19 pandemic (3/1/2019 to 9/1/2020). As in other major U.S. cities such as Dallas and New York City, the pandemic led to a considerable decline in the aggregate number of calls in Kansas City. However, five service categories (“Public Safety”, “Public Health”, “Trash/Recycling”, “Parks & Recreation”, and “Property / Buildings / Construction”) experienced a substantial increase in call volume. To explore whether these changes were driven by COVID-related service requests, we searched the request description text and identified 2,379 requests related to the pandemic, accounting for 4.3 percent of all non-emergency requests in Kansas City between March and August of 2020. More than half of the COVID-related requests reported mask violations, where people failed to wear masks or did not wear them properly. For COVID-related requests, citizens were more likely than for other requests to seek non-emergency services by phone and email and less likely to use the web. In addition, most changes in “Public Safety” and “Public Health” request volumes were driven by these COVID-related requests. These results can help city officials and decision makers improve the city’s resilience by allocating resources to the five service categories above during a pandemic. In conclusion, analysis of open-access 311 data can help local governments respond quickly and appropriately and build long-term resilience against future pandemics and other health threats.
**Figures**
```{r setup, include=FALSE}
#load library
library(dplyr)
options(dplyr.summarise.inform =FALSE)
library(reshape2)
library(openxlsx)
library(car)
library(ggplot2)
library(hrbrthemes)
library(viridis)
#load library for word association graph
library(magrittr)
library(purrr)
library(tibble)
library(tidyr)
library(tidytext)
library(igraph)
library(ggraph)
library(corpus)
#load data
load("data/kcmo2019_2020.rdata")
#create year+month var
dat$date = dat$CREATEYR*100 +dat$CREATEMO
dat$date = as.Date(as.character(dat$date*100+1),"%Y%m%d")
#identify covid related calls
dat = dat %>%
mutate(
covid = as.numeric(grepl("covid",tolower(description))|
grepl("corona",tolower(description))|
grepl("pandemic",tolower(description))|
grepl("virus",tolower(description))|
grepl("positive",tolower(description))
),
mask = as.numeric(grepl("mask",tolower(description))|
grepl("face cover",tolower(description))|
grepl("ppe ",tolower(description))|
grepl("coverings",tolower(description))
),
sdistance = as.numeric(grepl("social",tolower(description))|
grepl("distanc",tolower(description))|
grepl("6 feet",tolower(description))|
grepl("quarantine",tolower(description))|
grepl("stay at home",tolower(description))|
grepl("gathering",tolower(description))
),
essential = as.numeric(grepl("essential",tolower(description))|
grepl("still open",tolower(description))|
grepl("open for business",tolower(description))|
grepl("open and operating",tolower(description))|
grepl("still operating",tolower(description))
),
allcovid = as.numeric(covid==1|mask==1|sdistance==1|essential==1)
)
#load covid cases data
covid= read.csv("https://raw.githubusercontent.com/OpportunityInsights/EconomicTracker/main/data/COVID%20-%20City%20-%20Daily.csv")
covid = covid %>% filter(cityid == 37) %>% #KCMO code is 37
select(year,month, day, new_case_count,new_death_count)
#make sure the counts are numeric (non-numeric entries become NA)
covid$new_case_count = as.numeric(covid$new_case_count)
covid$new_death_count = as.numeric(covid$new_death_count)
#monthly mean of the daily counts
covid = covid %>% group_by(year,month) %>% summarise(across(everything(), ~ mean(.x, na.rm = TRUE)))
covid$date = as.Date(as.character(covid$year*10000+covid$month*100+1),"%Y%m%d")
covid = covid %>% select(date,new_case_count,new_death_count)
#calculate total cases
covid[is.na(covid)]=0
covid = covid %>%
mutate(total_case_count = cumsum(new_case_count))
#subset 311 data
dat_covid = dat %>% filter(season=="COVID-warm")
```
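The `covid`, `mask`, `sdistance`, `essential`, and `allcovid` flags built in the setup chunk rely on case-insensitive substring matching against the request description. A minimal sketch of that logic follows, using made-up descriptions and a hypothetical helper `flag_group()` that is not part of the repository. Because the match is on substrings, broad terms such as "positive" or "social" can also flag requests unrelated to the pandemic.

```r
# Hypothetical request descriptions, made up for illustration only
description <- c(
  "Customers not wearing MASKS inside the store",
  "Pothole on Main St needs repair",
  "Large gathering in the park, no social distancing",
  "Business is still open despite the stay at home order"
)

# Keyword groups copied from the setup chunk
keywords <- list(
  covid     = c("covid", "corona", "pandemic", "virus", "positive"),
  mask      = c("mask", "face cover", "ppe ", "coverings"),
  sdistance = c("social", "distanc", "6 feet", "quarantine",
                "stay at home", "gathering"),
  essential = c("essential", "still open", "open for business",
                "open and operating", "still operating")
)

# Hypothetical helper: 1 if the text contains any keyword of the group, else 0
flag_group <- function(text, words) {
  pattern <- paste(words, collapse = "|")
  as.numeric(grepl(pattern, tolower(text)))
}

# One 0/1 column per keyword group, plus the combined flag
flags <- as.data.frame(lapply(keywords, flag_group, text = description))
flags$allcovid <- as.numeric(rowSums(flags) > 0)
flags
```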
Figure 1: Monthly number of requests containing COVID-19-related keywords in 2019-2020
```{r figure01, echo=FALSE, warning=FALSE, message=FALSE,fig.height=8,fig.width=8, dpi=300}
#Calculate volume of requests containing each keyword over time
tab_00 = dat %>% group_by(date) %>%
summarize(
`covid` = sum(grepl("covid",tolower(description))),
`corona` = sum(grepl("corona",tolower(description))),
`pandemic` = sum(grepl("pandemic",tolower(description))),
`virus` = sum(grepl("virus",tolower(description))),
`positive` = sum(grepl("positive",tolower(description))),
`mask` = sum(grepl("mask",tolower(description))),
`face cover` = sum(grepl("face cover",tolower(description))),
`coverings` = sum(grepl("coverings",tolower(description))),
`ppe` = sum(grepl("ppe ",tolower(description))),
`social` = sum(grepl("social",tolower(description))),
`distanc` = sum(grepl("distanc",tolower(description))),
`6 feet` = sum(grepl("6 feet",tolower(description))),
`quarantine` = sum(grepl("quarantine",tolower(description))),
`stay at home` = sum(grepl("stay at home",tolower(description))),
`gathering` = sum(grepl("gathering",tolower(description))),
`essential` = sum(grepl("essential",tolower(description))),
`still open` = sum(grepl("still open",tolower(description))),
`open for business` = sum(grepl("open for business",tolower(description))),
`open and operating` = sum(grepl("open and operating",tolower(description))),
`still operating` = sum(grepl("still operating",tolower(description))),
) %>%
melt(id.vars="date") %>%
mutate(
Keywords = variable,
Keyorder = case_when(
Keywords %in% c("distanc", "social","covid","pandemic","6 feet","ppe") ~ 1,
Keywords %in% c("essential", "stay at home","still open","still operating","open for business","open and operating") ~ 2,
Keywords %in% c("gathering", "virus","corona","quarantine") ~ 3,
Keywords %in% c("mask", "positive","face cover","coverings") ~ 4
)
)
#Line Plot of key words over time
ggplot(tab_00, aes(x=date,y = value, color = reorder(Keywords,Keyorder)))+
geom_line(lwd=1.50)+
scale_color_viridis(discrete = TRUE) +
scale_x_date(date_breaks= "3 months",date_labels = "%b-%y")+
labs(y="Number of requests",
x="")+
facet_wrap(~reorder(Keywords,Keyorder),ncol=4,scales="free_y")+
theme_bw()+
theme(legend.position = "none",axis.text.x = element_text(angle = 90))
```
Figure 2: Number of requests during March-August in 2019 versus 2020
```{r figure02, echo=FALSE, warning=FALSE, message=FALSE, dpi=300}
#Plot monthly request volume in March-August of 2019 versus 2020
dat %>%
filter(season != "PreCOVID-COld") %>%
group_by(season,CREATEMO) %>%
count() %>%
arrange(desc(season)) %>%
mutate(
season = ifelse(season == "PreCOVID-Warm","2019","2020"),
CREATEMO = factor(CREATEMO,levels = 3:8, labels = c("Mar","Apr","May","Jun","Jul","Aug")))%>%
ggplot(aes(x = CREATEMO, y = n,group=reorder(season,-n),fill=reorder(season,-n) ))+
geom_col(position = "dodge")+
scale_fill_manual(values = c("dodgerblue4","red4"))+
#scale_x_continuous(breaks = scales::pretty_breaks(n = 5)) +
scale_y_continuous(labels=scales::comma,limits = c(0,15000)) +
labs(y = "Number of requests", x = "",fill= "")+
theme_bw()+
theme(legend.position = c(0.9,0.9),
legend.background = element_blank())
```
Figure 3: Number of COVID-related requests (blue bars, left axis) vs. monthly average of daily new COVID-19 cases (green line, right axis) and its running total (red dashed line, right axis) in 2020
```{r figure03, echo = FALSE, warning=FALSE, message=FALSE, dpi=300}
#Trends of covid-related requests
tab_01 = dat %>% group_by(date) %>%
summarise(COVID_19_related = sum(allcovid,na.rm=T))
#Join with covid cases
tab_01 = tab_01 %>% left_join(covid,by="date")
tab_01[is.na(tab_01)] = 0
#Plot covid-related words versus new cases
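#Scaling factor between the left (requests) axis and the right (cases) axis
coef = 0.2
#Rows 11-18 are January-August 2020, assuming the monthly series starts in March 2019
ggplot(tab_01[11:18,], aes(x=date))+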
geom_col(aes(y=COVID_19_related),lwd=1.50,fill="dodgerblue4")+
geom_line(aes(y=new_case_count/coef),lwd=1.50,col="green3")+
geom_line(aes(y=total_case_count/coef),lwd=1.50,col="red",lty=2)+
scale_x_date(date_breaks= "1 month", date_labels = "%b-%y")+
scale_y_continuous(name= "Number of requests",sec.axis = sec_axis(~.*coef,name="Number of cases"))+
labs(x="")+
theme_bw()
```
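The case series in Figure 3 share the request axis through a fixed scaling factor: each value is divided by `coef` before it is drawn, and `sec_axis(~ . * coef)` multiplies the axis labels back to case units. A small arithmetic sketch follows. The case counts are hypothetical and only `coef = 0.2` comes from the plotting code.

```r
coef <- 0.2                       # same scaling factor as in the plotting code
new_cases <- c(10, 40, 100)       # hypothetical case counts, for illustration only

# Height at which each value is drawn, measured on the left (requests) axis
plotted_height <- new_cases / coef

# Label shown on the right axis at that height, as computed by sec_axis(~ . * coef)
secondary_label <- plotted_height * coef

data.frame(new_cases, plotted_height, secondary_label)
```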
Figure 4: Word association graph of bigrams that occur more than 20 times in COVID-related requests
```{r figure04, echo = FALSE, warning=FALSE, message=FALSE, dpi = 300,fig.width=8,fig.height=3.5}
#Combine requests text
text_bigram = dat_covid %>% filter(allcovid==1) %>%
select(description) %>%
#mutate(description = gsub("Description: ","",description)) %>%
unnest_tokens(bigram, description, token = "ngrams", n=2)
#Remove stop words from our dataset
my_stop_words <- tibble(
word = c("the","description"))
all_stop_words <- stop_words %>%
bind_rows(my_stop_words)
bigrams_filtered = text_bigram %>%
separate(bigram, c("word1", "word2"), sep = " ") %>%
filter(!word1 %in% all_stop_words$word) %>%
filter(!word2 %in% all_stop_words$word) %>%
mutate(
word1 = text_tokens(word1, stemmer = "en"),
word2 = text_tokens(word2, stemmer = "en"),
)
#Keep bigrams that occur more than 20 times
bigram_tograph = bigrams_filtered %>%
group_by(word1,word2) %>% count()%>%
filter(n > 20)
#Draw the network graph with ggraph; edge width and opacity scale with bigram frequency
bigram_tograph %>%
graph_from_data_frame() %>%
ggraph(layout = "fr") +
geom_edge_link(aes(width = n,edge_alpha = n), show.legend = FALSE,edge_colour = "red",lwd = 2.5) +
scale_edge_width(range = c(0.5,2.5))+
scale_edge_alpha(range = c(0.3,1))+
geom_node_point(color = "black") +
geom_node_text(aes(label = name), vjust = 0.1, hjust =-0.1, repel = TRUE,col = "dodgerblue4") +
labs(x="",y="") +
#theme_bw()+
theme(axis.text = element_blank(),
panel.background = element_rect(fill = "white"))
```
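The graph in Figure 4 is built from bigram counts: each description is split into adjacent word pairs, pairs containing a stop word are dropped, and only frequent pairs are kept as edges. The base-R sketch below walks through those steps on made-up text. The helper `bigrams_of()`, the tiny stop-word list, and the threshold of 1 are assumptions for illustration. The chunk above uses `tidytext::unnest_tokens()`, the full `stop_words` table, stemming, and a threshold of 20 instead.

```r
# Hypothetical request texts, made up for illustration only
texts <- c(
  "not wearing a mask",
  "employees not wearing masks",
  "not wearing face covering"
)
stop_words_demo <- c("a", "the")   # stand-in for the full stop-word table

# Hypothetical helper: adjacent word pairs of one text (no stemming here)
bigrams_of <- function(text) {
  words <- strsplit(tolower(text), "[^a-z0-9]+")[[1]]
  words <- words[nzchar(words)]
  if (length(words) < 2) return(character(0))
  paste(head(words, -1), tail(words, -1))
}

all_bigrams <- unlist(lapply(texts, bigrams_of))

# Drop bigrams in which either word is a stop word
parts <- do.call(rbind, strsplit(all_bigrams, " "))
keep <- !(parts[, 1] %in% stop_words_demo) & !(parts[, 2] %in% stop_words_demo)

# Count the remaining bigrams and keep those above the threshold
counts <- sort(table(all_bigrams[keep]), decreasing = TRUE)
counts[counts > 1]
```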
Figure 5: Year-over-year change in COVID-related requests (darker blue) vs. year-over-year change in total requests (lighter blue) by category from March to August 2020
```{r figure05,echo=FALSE, warning=FALSE, message=FALSE,fig.width=8, dpi=300}
#Calculate year over year change
diff = dat %>%
filter(CATEGORY %in% c("Public Health","Public Safety","Parks & Recreation","Property / Buildings / Construction","Trash / Recycling")) %>%
group_by(CATEGORY,CREATEYR,CREATEMO) %>%
count() %>%
dcast(CATEGORY+CREATEMO~CREATEYR, value.var = "n") %>%
mutate(diff = `2020`-`2019`)%>%
na.omit()
#Calculate covid related requests
diffcovid = dat_covid %>%
filter(CATEGORY %in% c("Public Health","Public Safety","Parks & Recreation","Property / Buildings / Construction","Trash / Recycling")) %>%
group_by(CATEGORY,CREATEMO,date) %>%
summarize(covid= sum(allcovid))
#Create labels for alpha values
alpha = c("All requests"=0.2,"Covid-related requests"=1)
#Plot by category and date
diffcovid %>%
merge(diff) %>%
ggplot(aes(y =diff, x = date)) +
geom_col(aes(alpha="All requests"),fill="dodgerblue4")+
geom_col(aes(y=covid,alpha = "Covid-related requests"),fill="dodgerblue4")+
facet_wrap(~reorder(CATEGORY,covid),scales="free_y",ncol=3)+
scale_x_date(date_breaks= "1 month", date_labels = "%b")+
scale_y_continuous(labels=scales::comma) +
scale_alpha_manual(values = alpha)+
labs(y="Year-over-year Change In Number of Requests",x="",alpha="")+
theme_bw()+
theme(legend.position = c(0.85,0.1))
```
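The year-over-year change shown in Figure 5 is the 2020 monthly count minus the 2019 count for the same category and month. Months without a match in both years drop out. A minimal base-R sketch with made-up counts follows. The column names follow the data, but every number is hypothetical, and `merge()` stands in for the `dcast()` step used in the chunk above.

```r
# Hypothetical monthly counts per category and year, for illustration only
monthly <- data.frame(
  CATEGORY = rep(c("Public Health", "Public Safety"), each = 4),
  CREATEYR = rep(c(2019, 2019, 2020, 2020), times = 2),
  CREATEMO = rep(c(3, 4), times = 4),
  n        = c(100, 120, 180, 150, 300, 310, 280, 400)
)

# Split by year, then join on category and month (inner join, like na.omit())
y2019 <- subset(monthly, CREATEYR == 2019, select = c(CATEGORY, CREATEMO, n))
y2020 <- subset(monthly, CREATEYR == 2020, select = c(CATEGORY, CREATEMO, n))
wide <- merge(y2019, y2020, by = c("CATEGORY", "CREATEMO"),
              suffixes = c("_2019", "_2020"))

# Year-over-year change: positive means more requests in 2020 than in 2019
wide$diff <- wide$n_2020 - wide$n_2019
wide
```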