-
Notifications
You must be signed in to change notification settings - Fork 0
/
div_indices.Rmd
169 lines (114 loc) · 10.2 KB
/
div_indices.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
---
title: "Comparing MDAT diversity metrics"
author: "emily shumchenia"
date: "`r format(Sys.time(), '%d %B %Y')`"
output: github_document
---
##Intended audience and purpose
This document is intended for [Northeast Ocean Data Portal](http://www.northeastoceandata.org) users. The purpose is to provide a plain-language explanation of the similarities and differences between marine life diversity data products developed by the [Marine life Data & Analysis Team (MDAT)](http://seamap.env.duke.edu/models/mdat/).
```{r load data, include=FALSE}
## required packages
library(dplyr)
library(raster)
library(viridis)
library(Hmisc)
library(knitr)
library(ggplot2)
library(GGally)
library(gridExtra)
## load data: these are rasters of species richness, shannon index, and simpson index for mdat cetacean models.
cet_richness <-raster("data/mammal_richness_Cetaceans.tif")
cet_simpson <-raster("data/mammal_simpson_Cetaceans.tif")
cet_shannon <-raster("data/mammal_shannon_Cetaceans.tif")
## clip all rasters to the same extent as richness
cet_simpson <-crop(cet_simpson, extent(-347000, 1803000, -1081000, 1519000))
cet_shannon <-crop(cet_shannon, extent(-347000, 1803000, -1081000, 1519000))
```
### The quick explanation
> **Different diversity metrics may respond to species' abundance patterns differently, and you may need to look at one or a combination of them to most accurately answer your specific questions in the context of the habitat type or ecological component that is most important to you.**
>
> FOR EXAMPLE:
>
> * If you are most concerned about accounting for and considering rare species, **SPECIES RICHNESS** does the best job, by representing every species equally, regardless of abundance. However, temporal trends in species richness will not capture any changes to community composition in changing environments. For this reason, a biodiversity metric may be more useful for monitoring.
>
> * If you care about rare species, but need to give some additional consideration to species that are particularly abundant, the **SHANNON INDEX** does a good job.
>
> * If abundant or dominant species are important for your analysis, the **SIMPSON INDEX** gives more weight to highly abundant species, to the detriment of the representation of the rarest species.
>
## Detailed explanation
### Introduction
What are diversity metrics?
When most people think about diversity, they think about the total number of species in an area. However, the total number of species is a metric called [Species Richness](https://en.wikipedia.org/wiki/Species_richness).
[Diversity indices](https://en.wikipedia.org/wiki/Diversity_index) are more complex metrics that take the total number of species *and* their abundances into account (in different ways, depending on the index).
Experts in the region have indicated that each metric is valuable for different reasons. The scientific literature underscores the need to [consider multiple diversity indices, even though some may seem interchangeable](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4224527/). In general, [diversity indices that reflect species dominance and identity are more effective than Species Richness for tracking change over time](http://onlinelibrary.wiley.com/doi/10.1111/1365-2664.12959/epdf).
Because each metric integrates information differently, it is important to know how they are similar or different, and how the outputs could be used.
***
MDAT has developed diversity data products using three different metrics: [Species Richness](https://en.wikipedia.org/wiki/Species_richness), the [Shannon index](https://en.wikipedia.org/wiki/Diversity_index#Shannon_index), and the [Simpson index](https://en.wikipedia.org/wiki/Diversity_index#Simpson_index). To know how and when each metric could be used, we asked the following questions:
1. Are the metrics similar or different?
2. How do their results compare to one another?
3. Do the metrics represent ecological patterns similarly?
Below are the maps and data for Species Richness, Shannon Diversity, and Simpson Diversity for all cetacean species that we will use to help answer these questions.
```{r maps, echo=FALSE}
## plot the three maps
par(mfrow=c(2,3))
plot(cet_richness, main="Cetacean Species Richness", col=viridis(256), axes=FALSE)
plot(cet_shannon, main="Cetacean Shannon Diversity", col=viridis(256), axes=FALSE)
plot(cet_simpson, main="Cetacean Simpson Diversity", col=viridis(256), axes=FALSE)
hist(cet_richness, main=NULL, col=viridis(12), xlab="Species Richness")
hist(cet_shannon, main=NULL, col=viridis(13), xlab="Shannon Diversity")
hist(cet_simpson, main=NULL, col=viridis(18), xlab="Simpson Diversity")
```
###QUESTION: Are the metrics similar or different?
**ANSWER:** From the maps and histograms we can tell that Shannon and Simpson are more similar to each other than either of them are to Species Richness. We can check this by doing a simple correlation between each pair of metrics (a value of 1.0 = a perfect positive correlation, a value of -1.0 = a perfect negative correlation):
```{r metrics_corr, echo=FALSE, results='asis'}
## make a raster stack for each species group's set of diversity metrics
cet_stack <- stack(cet_richness, cet_shannon, cet_simpson)
## extract each stack's values to a table
cet_metrics <-values(cet_stack)
## change the column headers
colnames(cet_metrics) <- c("richness", "shannon", "simpson")
## calculate correlations among diversity metrics, save as data frame object, display the table
ggcorr(cet_metrics, nbreaks=NULL, label=TRUE, label_round=2, label_size=5, label_color='white')
```
This result is not surprising since both Shannon and Simpson indices consider abundance information, and Species Richness does not.
###QUESTION: How do the metrics' results compare to one another?
We know that results for Species Richness are quite different from both Shannon and Simpson results. To understand differences in the results of these two indices (how each deals differently with abundance information), we can plot the Shannon and Simpson results together. The plots below shows how multiple species abundance and richness information at each location has contributed to each diversity index value for a single pixel with relatively high diversity.
The first graph shows the density (abundance per unit area) values for all mammal species present in this pixel, and the number of species present (richness) at these densities. Note that there are many species with very low (near zero) densities in this pixel.
The second graph plots the contribution of each density/richness observation to the total diversity index value for both the Shannon index and the Simpson Index. Note that there is only 1 species present at the highest density value of ~2.5 (top graph), but this observation contributes proportionally much more to the final Simpson index value (~35% of its total; bottom graph) than to the final Shannon index value (~10% of its total; bottom graph).
![cetacean diversity components](data/Mammal_Diversity_Components_High.png)
**ANSWER:** These plots show that both the Shannon Index and Simpson Index consider low to medium density values comparably. However, species with high densities contribute much more, proportionally, to the Simpson index than to the Shannon Index.
This result indicates that the Simpson index gives more weight to highly abundant or dominant species than the Shannon index. The Shannon index treats rare species and dominant species more equally.
###QUESTION: Do the metrics represent ecological patterns similarly?
```{r regressions, include=FALSE}
## load bathymetry data and crop to the same extent as richness
bathy <-raster("~/Dropbox/Research/mdat_diversity_indices/data/ETOPO1_mdat.tif")
bathy <-crop(bathy, extent(-347000, 1803000, -1081000, 1519000))
## make the cet_metrics object into a dataframe, omitting NAs
cet_metrics_df <-data.frame(cet_metrics)
cet_metrics_df <-na.omit(cet_metrics_df)
## extract water depth values from bathy raster to a data object, omitting NAs and reversing sign
depths <-values(bathy)
depths <-data.frame(depths)
depths <-na.omit(depths)
depths <-depths*-1
## combine all the data into a single data frame
alldata <- data.frame(cet_metrics_df$richness, cet_metrics_df$shannon, cet_metrics_df$simpson, depths$depths)
colnames(alldata) <- c("richness", "shannon", "simpson", "depth")
```
We know that depth is an important habitat variable for cetaceans - certain groups of species spend more time in deeper water, while other species have more coastal distributions.
If each diversity metric integrated information in a similar way, we would expect each metric to represent cetacean diversity similarly across different ocean habitats. In other words, each metric will respond in the same way to changes in depth.
We can look at a few different ocean "habitats", defined by differences in depth only. The values of each diversity metric are calculated in each ocean habitat.
```{r habitats_results, echo=FALSE, results='asis'}
alldata$depth_cat <- cut(alldata$depth, 5, labels = c("<1000m","1000-2000m", "2000-3000m", "3000-4000m", ">4000m"))
b1 <- ggplot(alldata, aes(depth_cat, richness)) + geom_boxplot(aes(fill = depth_cat)) +
scale_fill_brewer() + coord_fixed(ratio=0.17) + theme(legend.position = "none", axis.title.y = element_blank(), axis.title.x = element_blank()) + ggtitle("Species Richness")
b2 <- ggplot(alldata, aes(depth_cat, shannon)) + geom_boxplot(aes(fill = depth_cat)) +
scale_fill_brewer() + theme(legend.position = "none", axis.title.y = element_blank(), axis.title.x = element_blank()) + coord_fixed(ratio=1.6) + ggtitle("Shannon Diversity")
b3 <- ggplot(alldata, aes(depth_cat, simpson)) + geom_boxplot(aes(fill = depth_cat)) +
scale_fill_brewer() + xlab("Depth 'habitats'") + theme(legend.position = "none", axis.title.y = element_blank()) + coord_fixed(ratio=4.3) + ggtitle("Simpson Diversity")
b1
b2
b3
```
**ANSWER:** The boxes show the range of values for each diversity metric in each depth habitat. While the Shannon and Simpson Diversity values show a similar response to depth habitats (diversity generally increasing with depth), Species Richness is different, with Richness being highest in the 1000-2000m depth habitat and generally decreasing in deeper habitats.
Back to [The quick explanation](#the-quick-explanation)