status.Rmd

---
title: "Sampling status"
output: 
  html_document:
    toc: true
    toc_float: true
---

```{r setup, include=FALSE}

# Load packages for this page
library(googlesheets4)
library(hexbin)
library(ggplot2)
library(viridis)
library(knitr)
library(ggmap)
library(tidyverse)
library(kableExtra)

# Load metadata on completed surveys
gs4_deauth()
d = read_sheet("https://docs.google.com/spreadsheets/d/1mNYnSTCs9WYy5SN2HF4VYep550Sccz9xU5o8JuYut-A/edit?usp=sharing",
           sheet='Completed surveys')
d = data.frame(d)

d2 = read_sheet("https://docs.google.com/spreadsheets/d/1XFNI7KXeuo5NuHL-0miKhWYkt3MeWUFYu2LugHRHE6Q/edit?usp=sharing", sheet='completedSurveys')
d2$surveyDate = as.character(d2$surveyDate)
d2 = data.frame(d2)
d2$Genus_sp = with(d2, paste(plantGenus, plantSpecies, sep='_'))

dAll = bind_rows(d,d2)

# Drop rows without an entry for Genus_sp
dAll = subset(dAll, !is.na(dAll$Genus_sp))

# Load meta data on focal species
# dfs = read_sheet("https://docs.google.com/spreadsheets/d/1mNYnSTCs9WYy5SN2HF4VYep550Sccz9xU5o8JuYut-A/edit?usp=sharing",
#           sheet='Focal_species')
# dfs = data.frame(dfs)

focalFam = function(z) # define function to ID focal families in data frame
	ifelse(z == 'Apocynaceae' | z == 'Asteraceae' | z == 'Fabaceae' | z == 'Solanaceae' |
		z == 'Rubiaceae', TRUE, FALSE)
dAll$focalFamily = focalFam(dAll$plantFamily)

d.complete = dAll[!is.na(dAll$plantFamily),] # data.frame for complete entries

```

The Herbivory Variability Network has completed **`r nrow(dAll)` surveys on `r length(unique(d.complete$Genus_sp))` plant species from `r length(unique(d.complete$plantFamily))` plant families**, which is about `r round( (length(unique(d.complete$plantFamily))/465) * 100, 0)`% of all extant vascular plant families. These figures and tables update as new data are entered into our database.

Our Phase 1 of data collection had a three-scaled sampling approach:

1. Sampling one plant species from as many plant families as possible
2. Sampling as many species (and tribes and genera) as possible from five focal families (Apocynaceae, Asteraceae, Fabaceae, Rubiaceae, and Solanaceae)
3. Sampling as many sites as possible for three focal species (_Taraxacum officinale_, _Plantago lanceolata_, and _Plantago major_). 

The goal of this stratified sampling plan was to avoid potential biases in geographic and/or taxonomic coverage, as well as to permit robust exploration of factors that shape patterns in herbivory.

Below we show the current status of our sampling for each of the three scales of sampling, our overall geographic extent, and finally our current plant species list.

<br>

## Sampling across plant families
```{r echo=FALSE, out.width="800px", fig.align="center", fig.cap="The distribution of currently sampled plant families (purple) across the phylogeny of all extant vascular plant families. Tree made by HerbVar collaborator [Marjorie Weber](http://www.theweberlab.com) with mega-tree from Jin et al. 2019."}

# Load phylogeny packages
require(V.PhyloMaker2)
require(taxonlookup)
require(ape)
require(phytools)
require(taxize)

###########################################
# Match genus names to most up to date higher family and order assignments using the Taxonomic Name Resolution Servicet (information here: https://github.com/traitecoevo/taxonlookup) ####
###########################################

# herbvar_genera<-sapply(strsplit(dAll$Genus_sp, "_"), function(x) x[1])
herbvar_tax_names_clean <- taxonlookup::lookup_table(unique(dAll$Genus_sp),by_species=TRUE,missing_action="NA")
#head(herbvar_tax_names_clean)

# figure out if any queries didn't work.  Can edit them in later in the dataset and re-run to fix, or just omit for now.
omitted<-herbvar_tax_names_clean[is.na(herbvar_tax_names_clean$family),]
#nrow(omitted) #number of entries that there was no name for
#omitted

#genus species, original family, new family
family.naming.table<-data.frame(genus_species=dAll$Genus_sp[!duplicated(dAll$Genus_sp)],orig_family=dAll$plantFamily[!duplicated(dAll$Genus_sp)],new_family=herbvar_tax_names_clean$family)


#######################################
## make a family level tree figure ####
#######################################
#http://www.scielo.br/scielo.php?script=sci_arttext&pid=S0102-33062017000200191

# load phylogeny of plant families
family_tree <-read.tree("GBOTB.extended.families.tree")
#plot(family_tree,type = 'fan',cex=0.1, main="Mega-Tree of 479 Extant Vascular Plant Families",sub= "(Jin et al. 2019, GBOTB.extended.tre)")

#check that all the family names in herbvar are in the tree 
family.names<-family_tree$tip.label
#tolower(herbvar_tax_names_clean$family)[(!tolower(herbvar_tax_names_clean$family) %in% tolower(family.names))] #missing names

#make a character vector of which families are represented in herbvar (1's) and which aren't (0's)
herbvar.presabs<-rep(0,length(family.names))
herbvar.presabs[tolower(family.names) %in% tolower(herbvar_tax_names_clean$family)]<-1
names(herbvar.presabs)<-family.names
#herbvar.presabs[names(herbvar.presabs) == 'Asteraceae'] = 2
#herbvar.presabs[names(herbvar.presabs) == 'Apocynaceae'] = 2
#herbvar.presabs[names(herbvar.presabs) == 'Fabaceae'] = 2
#herbvar.presabs[names(herbvar.presabs) == 'Rubiaceae'] = 2
#herbvar.presabs[names(herbvar.presabs) == 'Solanaceae'] = 2

herbvar.colors = rep('transparent', length=length(herbvar.presabs))
herbvar.colors[herbvar.presabs == 1] = 'purple'
herbvar.colors[herbvar.presabs == 2] = '#9B870C'

## plot phylogeny with familys in herbvar in purple

#make edge color vector
# This will be the main vector which colors will be assigned to
col.vector <- vector(mode="character",length=nrow(family_tree$edge))

# Number of tips in the phylogeny
n.tips <- length(family_tree$tip.label)

# Non-terminal branches will be gray
col.vector[family_tree$edge[,2]>n.tips] <- "gray"

# Create a vector of colors 
colors<-c("gray","purple", "yellow")

# create a duplicate of phy$edge which can be manipulated as required
edge.data <- as.data.frame(family_tree$edge)

# Yes, its a loop, but its easier
for(i in 1:length(family_tree$tip.label)){
  spp <- family_tree$tip.label[i]
  spp.trait.val <- as.numeric(herbvar.presabs[grep(spp,names(herbvar.presabs))])
  spp.col <- colors[spp.trait.val+1]
  edge.row <- as.numeric(rownames(edge.data[edge.data$V2==i,]))
  col.vector[edge.row] <- spp.col
}

tips.in_herbvar <-family_tree$tip.label[herbvar.presabs==1]

par(xpd = NA, mar=c(1,1,1,1))
plot(family_tree, show.tip.label=TRUE, edge.col=col.vector, edge.width=2,type = 'fan',cex=0.5,tip.color=herbvar.colors)


```

<br>

```{r, echo=FALSE, warning=FALSE, out.width = "800px", fig.asp=0.5, fig.align='center', fig.cap="The geographic distribution of our completed surveys. Each hexagon shows the number of herbivory surveys that we have completed in that area."}
mapWorld = borders("world", colour="gray80", fill="gray80") # create a layer of borders
ggplot(data=dAll, aes(x=transectOriginLong, y=transectOriginLat)) + 
	mapWorld + 
	theme_bw() + 
	theme(axis.title=element_blank(), axis.text=element_blank(),
		axis.ticks=element_blank()) + 
	#geom_point(fill=viridis(2)[2], size=1.75, shape=21, alpha=0.7) +
	coord_fixed() +
  #stat_binhex(bins=c(60,80)*1.75) +
  stat_binhex(aes(fill=cut(..count.., breaks=c(1,2,5,10,20,1000), 
        labels=c('1','2-5','6-9','10-19','>=20'), right=FALSE)), bins=c(60,80)*1.25) +
  scale_fill_viridis_d(alpha=0.9, name='No. surveys')

```

<br>

## Sampling within plant families

```{r, echo=FALSE, fig.align='center', fig.width=5, fig.asp=2.5, fig.cap="Number of completed surveys per plant family. Our five focal families are in yellow (Apocynaceae, Asteraceae, Fabaceae, Rubiaceae, and Solanaceae)."}

ggplot(d.complete, aes(x=plantFamily, fill=focalFamily)) + geom_bar() + coord_flip() + 
	ylab('No. surveys') + xlab('Plant family') + theme_classic() +
	scale_fill_viridis(discrete=TRUE, alpha=0.8) + theme(legend.position = "none")
```

<br>

## Sampling within focal species

We have three focal species that we are aiming to sampling across the broadest possible geographic extent and across broad environmental gradients. These are _Taraxacum officinale_, _Plantago major_, and _Plantago lanceolata_. This is the newest part of our sampling effort and a major component of our [Phase 2 of data collection](phase_2.html), so we are just starting to collect these data.

```{r, include=FALSE}
# map.world = map_data('world')
# 
# 
# dfs$Taraxacum.count = as.integer(dfs$Taraxacum.count)
# dfs$Plantago_major.count = as.integer(dfs$Plantago_major.count)
# 
# map.world$region[map.world$region == 'USA'] = 'United States'
# map.world$region[map.world$region == 'UK'] = 'United Kingdom'
# map.world$region[map.world$region == 'Russia'] = 'Russian Federation'
# map.world$region[map.world$region == 'Democratic Republic of the Congo'] = 'The Democratic Republic of The Congo'
# map.world$region[map.world$region == 'Republic of Congo'] = 'Congo'
# 
# 
# anti_join(dfs, map.world, by=c('Country' = 'region'))
# 
# focal.sp.map.data = left_join(map.world, dfs, by = c('region' = 'Country'))
# 
# # For countries not in the list (because of match problems), set count to zero
# # Check on this and fix matches as we add countries!
# focal.sp.map.data$Taraxacum.count[is.na(focal.sp.map.data$Taraxacum.count)] = 0
# focal.sp.map.data$Plantago_major.count[is.na(focal.sp.map.data$Plantago_major.count)] = 0

Taof = subset(d.complete, Genus_sp == 'Taraxacum_officinale')
Plma = subset(d.complete, Genus_sp == 'Plantago_major')
Plla = subset(d.complete, Genus_sp == 'Plantago_lanceolata')

mapWorld = borders("world", colour="gray80", fill="gray80") # create a layer of borders

```


```{r, echo=FALSE, warning=FALSE, out.width="600px", fig.asp=0.5, fig.align='center'}

# Plot it
ggplot(Taof, aes(x=transectOriginLong, y=transectOriginLat)) +
	mapWorld + 
	theme_bw() + 
	theme(axis.title=element_blank(), axis.text=element_blank(),
		axis.ticks=element_blank()) + 
	geom_point(fill=viridis(1)[1], size=1.75, shape=21, alpha=0.7) +
	coord_fixed() +
   labs(fill = NULL
       ,title = expression(~italic('Taraxacum officinale')~'surveys')
       ,subtitle = NULL
       ,x = NULL
       ,y = NULL)
```

```{r, echo=FALSE, warning=FALSE, out.width="600px", fig.asp=0.5, fig.align='center'}
# Plot it
ggplot(Plma, aes(x=transectOriginLong, y=transectOriginLat)) +
	mapWorld + 
	theme_bw() + 
	theme(axis.title=element_blank(), axis.text=element_blank(),
		axis.ticks=element_blank()) + 
	geom_point(fill=viridis(1)[1], size=1.75, shape=21, alpha=0.7) +
	coord_fixed() +
   labs(fill = NULL
       ,title = expression(~italic('Plantago major')~'surveys')
       ,subtitle = NULL
       ,x = NULL
       ,y = NULL)

```

```{r, echo=FALSE, warning=FALSE, out.width="600px", fig.asp=0.5, fig.align='center'}
# Plot it
ggplot(Plla, aes(x=transectOriginLong, y=transectOriginLat)) +
	mapWorld + 
	theme_bw() + 
	theme(axis.title=element_blank(), axis.text=element_blank(),
		axis.ticks=element_blank()) + 
	geom_point(fill=viridis(1)[1], size=1.75, shape=21, alpha=0.7) +
	coord_fixed() +
   labs(fill = NULL
       ,title = expression(~italic('Plantago lanceolata')~'surveys')
       ,subtitle = NULL
       ,x = NULL
       ,y = NULL)

```


<br>

## Species list

Our current species list with the number of surveys completed for each species.

```{r, echo=FALSE}
Genus_sp.form = paste(d.complete$plantFamily, ":", '*', d.complete$Genus_sp, '*', sep='')
Genus_sp.form = sub('_', ' ', Genus_sp.form)
num.spp = table(Genus_sp.form)
num.spp = data.frame(num.spp)
num.spp = separate(num.spp, 1, into=c("Plant family", "Plant species"), sep=':')
kable(num.spp, col.names=c('Plant family', 'Plant species', 'No. surveys'), align=c('l','l', 'l')) %>%
  kable_styling(full_width = FALSE)
```

<div class="tocify-extend-page" data-unique="tocify-extend-page" style="height: 0;"></div>