-
Notifications
You must be signed in to change notification settings - Fork 1
/
wed_data_preparation.Rmd
210 lines (150 loc) · 8.04 KB
/
wed_data_preparation.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
---
title: "Preparing input data"
output: html_document
---
```{r global_options, include=FALSE}
knitr::opts_chunk$set(fig.width=12, fig.height=8, eval=FALSE,
echo=TRUE, warning=FALSE, message=FALSE,
tidy = TRUE, collapse = TRUE,
results = 'hold')
```
# Objectives
In this exercise, you will use the cleaned occurrence data and the taxon-specific bioregionalization (if you have it) to prepare the input files you need for the ancestral area reconstruction on the phylogeny.
# Exercises
1. Use a species list of your group of interest to extract a medium-sized phylogeny from the provided large-scale phylogenies. If you already have a phylogeny for your group of interest, convert it to newick format.
2. Use the cleaned occurrences together with the taxon specific bioregionalization to get a species to area classification in the right format to use for the DEC analysis. If you do not have suitable bioregions, use biome classifications instead.
3. Extract temperature ranges for all species in the phylogeny using the occurrence records.
# Tutorial
## Library Setup
```{r}
library(ape)
library(geiger)
library(tidyverse)
library(raster)
library(speciesgeocodeR)
library(rgdal)
```
## 1. Extract clades from a large-scale phylogeny
In the last years large scale phylogenies for certain taxonomic groups have become available, for instance for amphibians, birds, mammals, squamates, and flowering plants. These phylogenies have challenges, but can provide helpful information for many research questions.
```{r}
# Load the species list
splist <- read_csv("example_data/bombacoideae_specieslist.csv")
rownames(splist) <- splist$species
# Load the large-scale phylogeny
tree <- read.tree("example_data/VascularPlants.tre")
# Check if the spelling of the names is identical, e.g. if the epitheton is preceeded by a space or an underscore
head(splist$species)
head(tree$tip.label)
# extract species that are in the species list
clade <- treedata(tree, splist)$phy
# save in newick format for DEC
write.tree(clade, "example_data/bombacoideae_phylogeny.newick")
# As .tre file for GeoSSE
write.tree(clade, "example_data/bombacoideae_phylogeny.tre")
```
If you do not have a species list, or your list is incomplete for your group of interest (for example, because you extracted it from the GBIF data) you can also pick two species whose most recent common ancestor matches with the MRCA of your clade of interest. As an example, we will extract all Canidae from the mammal phylogeny. First we will need to find the MRCA of the group. To do so, you need to identify two genera, whose MRCA coincides with the MRCA of all Canidae, for instance *Vulpes* (foxes) and *Lupus* (wolves). You can use http://www.onezoom.org/life/ to identify suitable genera to extract your clade of interest. Alternatively you can also provide a list of all species to keep in the tree.
```{r}
# read the large-scale phylogeny
mammals_tree <- read.tree("example_data/mammals.tre")
# Check properties (attributes) of the tree object
str(mammals_tree)
# Extract clade of interest
# Create an object containing at tip labels
species_list <- mammals_tree$tip.label
# find where Canis and Vulpes are in the species list
dogs <- grep("Canis",species_list)
foxes <- grep("Vulpes",species_list)
# find the node corresponding to the MRCA of all canids
MRCA_Canidae <- getMRCA(mammals_tree, c(dogs, foxes))
# extract Canidae clade
canidae_tree <- extract.clade(mammals_tree,MRCA_Canidae)
plot(canidae_tree)
# add time axis
axisPhylo()
# plot with smaller font size
plot(canidae_tree,cex=0.5)
axisPhylo()
write.tree(canidae_tree, "inst/canidae_phylogeny.newick")
```
## 2. Classify species into areas
### Taxon-specific bioregions
```{r}
# Load cleaned occurences
occ <- read_csv("example_data/bombacoideae_occurrences_cleaned.csv") %>%
as.data.frame()
# remove species not in the phylogeny
tr <- read.tree("example_data/bombacoideae_phylogeny.newick")
occ <- occ %>%
filter(species %in% tr$tip.label)
# load bioregions
area <- readOGR(dsn = "example_data", layer = "bombacoideae_infomap_shapefile")
# Classify species
class <- SpGeoCod(x = occ, y = area, areanames = "bioregn")
# convert to presence-absence
class <- class$spec_table %>%
as.data.frame() %>%
dplyr::select(-not_classified)
# Make sure each species has at least one area
# adapt to the BioGeoBEARS format
class[class > 0] <- 1
class$comb <- apply(class, 1, function(x) paste(x, collapse = ""))
class <- dplyr::select(class, comb)
# write to disk
# Here you need to adapt the header, the first number is the number of tips in your phylogeny, the second number is the number of bioregions, and then in brackets a one latter code for each area
sink("example_data/bombacoideae_area_classification.txt")
cat("38 8 (A B C D E F G H )\n")
write.table(as.data.frame(class), row.names = TRUE, col.names = FALSE, quote = FALSE)
sink(NULL)
```
### Biomes
The vast majority of species occurrence information available, via 'big data' aggregators as GBIF are georeferenced point locations consisting of geographic coordinates. However, most methods for ancestral area estimation require species occurrences in a limited number of discrete geographic units. A manual classification of species based on expert knowledge or graphical-user-interface based GIS software are limited in the amount of data that can be processed and often hard to reproduce. SpeciesgeocodeR implements an easy-to-use function to classify species occurrence to discrete areas, accounting for issues in data quality. You can find detailed tutorials on the software [here](https://github.com/azizka/speciesgeocodeR/wiki) and articles disrobing the method [here](https://academic.oup.com/sysbio/article/66/2/145/2670075/SpeciesGeoCoder-Fast-Categorization-of-Species) and [here](http://www.biorxiv.org/content/early/2015/11/24/032755).
```{r}
# Load cleaned occurences
occ <- read_csv("example_data/bombacoideae_occurrences_cleaned.csv")
# remove species not in the phylogeny
tr <- read.tree("example_data/bombacoideae_phylogeny.newick")
occ <- occ %>%
filter(species %in% tr$tip.label) %>%
as.data.frame()
# Load biomes
biom <- WWFload(x = "inst")
names(biom)
# Classify species
class <- SpGeoCod(x = occ, y = biom, areanames = "BIOME")
# convert to presence-absence
class <- class$spec_table %>%
# here you can exclude some areas by their name, for isntance marginal areas
dplyr::select(-not_classified, -`3`, -`4`, -`5`, -`9`, -`10`, -`98`) %>%
as.data.frame()
# adapt to the BioGeoBEARS format
class[class > 0] <- 1
write.csv(class, "example_data/bombacoideae_biome_classification_details.csv")
class$comb <- apply(class, 1, function(x) paste(x, collapse = ""))
class <- dplyr::select(class, comb)
# write to disk
# Here you need to adapt the header, the first number is the number of tips in your phylogeny, the second number is the number of bioregions, and then in brackets a one latter code for each area
sink("example_data/bombacoideae_biome_classification.txt")
cat("38 5 (A B C D E)\n")
write.table(as.data.frame(class), row.names = TRUE, col.names = FALSE, quote = FALSE)
sink(NULL)
```
## 3. Extract environmental data
Linking species occurrences and evolution to environmental variables and quantifying species "niche" and its evolution is essential for many biogeographic questions. We will use the temperature range of species to reconstruct the evolution of temperature mean and variance using BITE tomorrow.
```{r}
# Load cleaned occurences
occ <- read_csv("example_data/bombacoideae_occurrences_cleaned.csv")
# Load environmental data
env <- raster("example_data/wc2.0_bio_5m_01.tif")
# Generate a spatial object with the coordinates
pts <- occ %>%
dplyr::select(decimallongitude, decimallatitude) %>%
as.matrix()
# Extract the environmental values
ext <- raster::extract(env, pts) %>%
enframe(name = NULL, value = "MAT") %>%
mutate(MAT = round(MAT, 2))
# Bind coordinates and environmental variables
out <- bind_cols(dplyr::select(occ, species), ext)
# Write to disk
write.table(out, "example_data/bombacoideae_environment.txt", sep = " ", quote = FALSE)
```