-
Notifications
You must be signed in to change notification settings - Fork 0
/
status.Rmd
289 lines (223 loc) · 11.6 KB
/
status.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
---
title: "Sampling status"
output:
html_document:
toc: true
toc_float: true
---
```{r setup, include=FALSE}
# Load packages for this page
library(googlesheets4)
library(hexbin)
library(ggplot2)
library(viridis)
library(knitr)
library(ggmap)
library(tidyverse)
library(kableExtra)
# Load metadata on completed surveys
gs4_deauth()
d = read_sheet("https://docs.google.com/spreadsheets/d/1mNYnSTCs9WYy5SN2HF4VYep550Sccz9xU5o8JuYut-A/edit?usp=sharing",
sheet='Completed surveys')
d = data.frame(d)
d2 = read_sheet("https://docs.google.com/spreadsheets/d/1XFNI7KXeuo5NuHL-0miKhWYkt3MeWUFYu2LugHRHE6Q/edit?usp=sharing", sheet='completedSurveys')
d2$surveyDate = as.character(d2$surveyDate)
d2 = data.frame(d2)
d2$Genus_sp = with(d2, paste(plantGenus, plantSpecies, sep='_'))
dAll = bind_rows(d,d2)
# Drop rows without an entry for Genus_sp
dAll = subset(dAll, !is.na(dAll$Genus_sp))
# Load meta data on focal species
# dfs = read_sheet("https://docs.google.com/spreadsheets/d/1mNYnSTCs9WYy5SN2HF4VYep550Sccz9xU5o8JuYut-A/edit?usp=sharing",
# sheet='Focal_species')
# dfs = data.frame(dfs)
focalFam = function(z) # define function to ID focal families in data frame
ifelse(z == 'Apocynaceae' | z == 'Asteraceae' | z == 'Fabaceae' | z == 'Solanaceae' |
z == 'Rubiaceae', TRUE, FALSE)
dAll$focalFamily = focalFam(dAll$plantFamily)
d.complete = dAll[!is.na(dAll$plantFamily),] # data.frame for complete entries
```
The Herbivory Variability Network has completed **`r nrow(dAll)` surveys on `r length(unique(d.complete$Genus_sp))` plant species from `r length(unique(d.complete$plantFamily))` plant families**, which is about `r round( (length(unique(d.complete$plantFamily))/465) * 100, 0)`% of all extant vascular plant families. These figures and tables update as new data are entered into our database.
Our Phase 1 of data collection had a three-scaled sampling approach:
1. Sampling one plant species from as many plant families as possible
2. Sampling as many species (and tribes and genera) as possible from five focal families (Apocynaceae, Asteraceae, Fabaceae, Rubiaceae, and Solanaceae)
3. Sampling as many sites as possible for three focal species (_Taraxacum officinale_, _Plantago lanceolata_, and _Plantago major_).
The goal of this stratified sampling plan was to avoid potential biases in geographic and/or taxonomic coverage, as well as to permit robust exploration of factors that shape patterns in herbivory.
Below we show the current status of our sampling for each of the three scales of sampling, our overall geographic extent, and finally our current plant species list.
<br>
## Sampling across plant families
```{r echo=FALSE, out.width="800px", fig.align="center", fig.cap="The distribution of currently sampled plant families (purple) across the phylogeny of all extant vascular plant families. Tree made by HerbVar collaborator [Marjorie Weber](http://www.theweberlab.com) with mega-tree from Jin et al. 2019."}
# Load phylogeny packages
require(V.PhyloMaker2)
require(taxonlookup)
require(ape)
require(phytools)
require(taxize)
###########################################
# Match genus names to most up to date higher family and order assignments using the Taxonomic Name Resolution Servicet (information here: https://github.com/traitecoevo/taxonlookup) ####
###########################################
# herbvar_genera<-sapply(strsplit(dAll$Genus_sp, "_"), function(x) x[1])
herbvar_tax_names_clean <- taxonlookup::lookup_table(unique(dAll$Genus_sp),by_species=TRUE,missing_action="NA")
#head(herbvar_tax_names_clean)
# figure out if any queries didn't work. Can edit them in later in the dataset and re-run to fix, or just omit for now.
omitted<-herbvar_tax_names_clean[is.na(herbvar_tax_names_clean$family),]
#nrow(omitted) #number of entries that there was no name for
#omitted
#genus species, original family, new family
family.naming.table<-data.frame(genus_species=dAll$Genus_sp[!duplicated(dAll$Genus_sp)],orig_family=dAll$plantFamily[!duplicated(dAll$Genus_sp)],new_family=herbvar_tax_names_clean$family)
#######################################
## make a family level tree figure ####
#######################################
#http://www.scielo.br/scielo.php?script=sci_arttext&pid=S0102-33062017000200191
# load phylogeny of plant families
family_tree <-read.tree("GBOTB.extended.families.tree")
#plot(family_tree,type = 'fan',cex=0.1, main="Mega-Tree of 479 Extant Vascular Plant Families",sub= "(Jin et al. 2019, GBOTB.extended.tre)")
#check that all the family names in herbvar are in the tree
family.names<-family_tree$tip.label
#tolower(herbvar_tax_names_clean$family)[(!tolower(herbvar_tax_names_clean$family) %in% tolower(family.names))] #missing names
#make a character vector of which families are represented in herbvar (1's) and which aren't (0's)
herbvar.presabs<-rep(0,length(family.names))
herbvar.presabs[tolower(family.names) %in% tolower(herbvar_tax_names_clean$family)]<-1
names(herbvar.presabs)<-family.names
#herbvar.presabs[names(herbvar.presabs) == 'Asteraceae'] = 2
#herbvar.presabs[names(herbvar.presabs) == 'Apocynaceae'] = 2
#herbvar.presabs[names(herbvar.presabs) == 'Fabaceae'] = 2
#herbvar.presabs[names(herbvar.presabs) == 'Rubiaceae'] = 2
#herbvar.presabs[names(herbvar.presabs) == 'Solanaceae'] = 2
herbvar.colors = rep('transparent', length=length(herbvar.presabs))
herbvar.colors[herbvar.presabs == 1] = 'purple'
herbvar.colors[herbvar.presabs == 2] = '#9B870C'
## plot phylogeny with familys in herbvar in purple
#make edge color vector
# This will be the main vector which colors will be assigned to
col.vector <- vector(mode="character",length=nrow(family_tree$edge))
# Number of tips in the phylogeny
n.tips <- length(family_tree$tip.label)
# Non-terminal branches will be gray
col.vector[family_tree$edge[,2]>n.tips] <- "gray"
# Create a vector of colors
colors<-c("gray","purple", "yellow")
# create a duplicate of phy$edge which can be manipulated as required
edge.data <- as.data.frame(family_tree$edge)
# Yes, its a loop, but its easier
for(i in 1:length(family_tree$tip.label)){
spp <- family_tree$tip.label[i]
spp.trait.val <- as.numeric(herbvar.presabs[grep(spp,names(herbvar.presabs))])
spp.col <- colors[spp.trait.val+1]
edge.row <- as.numeric(rownames(edge.data[edge.data$V2==i,]))
col.vector[edge.row] <- spp.col
}
tips.in_herbvar <-family_tree$tip.label[herbvar.presabs==1]
par(xpd = NA, mar=c(1,1,1,1))
plot(family_tree, show.tip.label=TRUE, edge.col=col.vector, edge.width=2,type = 'fan',cex=0.5,tip.color=herbvar.colors)
```
<br>
```{r, echo=FALSE, warning=FALSE, out.width = "800px", fig.asp=0.5, fig.align='center', fig.cap="The geographic distribution of our completed surveys. Each hexagon shows the number of herbivory surveys that we have completed in that area."}
mapWorld = borders("world", colour="gray80", fill="gray80") # create a layer of borders
ggplot(data=dAll, aes(x=transectOriginLong, y=transectOriginLat)) +
mapWorld +
theme_bw() +
theme(axis.title=element_blank(), axis.text=element_blank(),
axis.ticks=element_blank()) +
#geom_point(fill=viridis(2)[2], size=1.75, shape=21, alpha=0.7) +
coord_fixed() +
#stat_binhex(bins=c(60,80)*1.75) +
stat_binhex(aes(fill=cut(..count.., breaks=c(1,2,5,10,20,1000),
labels=c('1','2-5','6-9','10-19','>=20'), right=FALSE)), bins=c(60,80)*1.25) +
scale_fill_viridis_d(alpha=0.9, name='No. surveys')
```
<br>
## Sampling within plant families
```{r, echo=FALSE, fig.align='center', fig.width=5, fig.asp=2.5, fig.cap="Number of completed surveys per plant family. Our five focal families are in yellow (Apocynaceae, Asteraceae, Fabaceae, Rubiaceae, and Solanaceae)."}
ggplot(d.complete, aes(x=plantFamily, fill=focalFamily)) + geom_bar() + coord_flip() +
ylab('No. surveys') + xlab('Plant family') + theme_classic() +
scale_fill_viridis(discrete=TRUE, alpha=0.8) + theme(legend.position = "none")
```
<br>
## Sampling within focal species
We have three focal species that we are aiming to sampling across the broadest possible geographic extent and across broad environmental gradients. These are _Taraxacum officinale_, _Plantago major_, and _Plantago lanceolata_. This is the newest part of our sampling effort and a major component of our [Phase 2 of data collection](phase_2.html), so we are just starting to collect these data.
```{r, include=FALSE}
# map.world = map_data('world')
#
#
# dfs$Taraxacum.count = as.integer(dfs$Taraxacum.count)
# dfs$Plantago_major.count = as.integer(dfs$Plantago_major.count)
#
# map.world$region[map.world$region == 'USA'] = 'United States'
# map.world$region[map.world$region == 'UK'] = 'United Kingdom'
# map.world$region[map.world$region == 'Russia'] = 'Russian Federation'
# map.world$region[map.world$region == 'Democratic Republic of the Congo'] = 'The Democratic Republic of The Congo'
# map.world$region[map.world$region == 'Republic of Congo'] = 'Congo'
#
#
# anti_join(dfs, map.world, by=c('Country' = 'region'))
#
# focal.sp.map.data = left_join(map.world, dfs, by = c('region' = 'Country'))
#
# # For countries not in the list (because of match problems), set count to zero
# # Check on this and fix matches as we add countries!
# focal.sp.map.data$Taraxacum.count[is.na(focal.sp.map.data$Taraxacum.count)] = 0
# focal.sp.map.data$Plantago_major.count[is.na(focal.sp.map.data$Plantago_major.count)] = 0
Taof = subset(d.complete, Genus_sp == 'Taraxacum_officinale')
Plma = subset(d.complete, Genus_sp == 'Plantago_major')
Plla = subset(d.complete, Genus_sp == 'Plantago_lanceolata')
mapWorld = borders("world", colour="gray80", fill="gray80") # create a layer of borders
```
```{r, echo=FALSE, warning=FALSE, out.width="600px", fig.asp=0.5, fig.align='center'}
# Plot it
ggplot(Taof, aes(x=transectOriginLong, y=transectOriginLat)) +
mapWorld +
theme_bw() +
theme(axis.title=element_blank(), axis.text=element_blank(),
axis.ticks=element_blank()) +
geom_point(fill=viridis(1)[1], size=1.75, shape=21, alpha=0.7) +
coord_fixed() +
labs(fill = NULL
,title = expression(~italic('Taraxacum officinale')~'surveys')
,subtitle = NULL
,x = NULL
,y = NULL)
```
```{r, echo=FALSE, warning=FALSE, out.width="600px", fig.asp=0.5, fig.align='center'}
# Plot it
ggplot(Plma, aes(x=transectOriginLong, y=transectOriginLat)) +
mapWorld +
theme_bw() +
theme(axis.title=element_blank(), axis.text=element_blank(),
axis.ticks=element_blank()) +
geom_point(fill=viridis(1)[1], size=1.75, shape=21, alpha=0.7) +
coord_fixed() +
labs(fill = NULL
,title = expression(~italic('Plantago major')~'surveys')
,subtitle = NULL
,x = NULL
,y = NULL)
```
```{r, echo=FALSE, warning=FALSE, out.width="600px", fig.asp=0.5, fig.align='center'}
# Plot it
ggplot(Plla, aes(x=transectOriginLong, y=transectOriginLat)) +
mapWorld +
theme_bw() +
theme(axis.title=element_blank(), axis.text=element_blank(),
axis.ticks=element_blank()) +
geom_point(fill=viridis(1)[1], size=1.75, shape=21, alpha=0.7) +
coord_fixed() +
labs(fill = NULL
,title = expression(~italic('Plantago lanceolata')~'surveys')
,subtitle = NULL
,x = NULL
,y = NULL)
```
<br>
## Species list
Our current species list with the number of surveys completed for each species.
```{r, echo=FALSE}
Genus_sp.form = paste(d.complete$plantFamily, ":", '*', d.complete$Genus_sp, '*', sep='')
Genus_sp.form = sub('_', ' ', Genus_sp.form)
num.spp = table(Genus_sp.form)
num.spp = data.frame(num.spp)
num.spp = separate(num.spp, 1, into=c("Plant family", "Plant species"), sep=':')
kable(num.spp, col.names=c('Plant family', 'Plant species', 'No. surveys'), align=c('l','l', 'l')) %>%
kable_styling(full_width = FALSE)
```
<div class="tocify-extend-page" data-unique="tocify-extend-page" style="height: 0;"></div>