diff --git a/docs/404.html b/docs/404.html
index 84888db..e23cf00 100644
--- a/docs/404.html
+++ b/docs/404.html
@@ -23,7 +23,7 @@
 
 
 
-<meta name="date" content="2022-04-26" />
+<meta name="date" content="2022-04-28" />
 
   <meta name="viewport" content="width=device-width, initial-scale=1" />
   <meta name="apple-mobile-web-app-capable" content="yes" />
@@ -134,8 +134,7 @@
 
 <ul class="summary">
 <li class="chapter" data-level="1" data-path="index.html"><a href="index.html"><i class="fa fa-check"></i><b>1</b> Introduction</a><ul>
-<li class="chapter" data-level="1.1" data-path="index.html"><a href="index.html#data-access"><i class="fa fa-check"></i><b>1.1</b> Data access</a></li>
-<li class="chapter" data-level="1.2" data-path="index.html"><a href="index.html#data-processing"><i class="fa fa-check"></i><b>1.2</b> Data processing</a></li>
+<li class="chapter" data-level="1.1" data-path="index.html"><a href="index.html#data-processing"><i class="fa fa-check"></i><b>1.1</b> Data processing</a></li>
 </ul></li>
 <li class="chapter" data-level="2" data-path="selecting-species-of-interest.html"><a href="selecting-species-of-interest.html"><i class="fa fa-check"></i><b>2</b> Selecting species of interest</a><ul>
 <li class="chapter" data-level="2.1" data-path="selecting-species-of-interest.html"><a href="selecting-species-of-interest.html#prepare-libraries"><i class="fa fa-check"></i><b>2.1</b> Prepare libraries</a></li>
diff --git a/docs/adding-covariates-to-checklist-data.html b/docs/adding-covariates-to-checklist-data.html
index 33e95b5..3b42c15 100644
--- a/docs/adding-covariates-to-checklist-data.html
+++ b/docs/adding-covariates-to-checklist-data.html
@@ -23,7 +23,7 @@
 
 
 
-<meta name="date" content="2022-04-26" />
+<meta name="date" content="2022-04-28" />
 
   <meta name="viewport" content="width=device-width, initial-scale=1" />
   <meta name="apple-mobile-web-app-capable" content="yes" />
@@ -134,8 +134,7 @@
 
 <ul class="summary">
 <li class="chapter" data-level="1" data-path="index.html"><a href="index.html"><i class="fa fa-check"></i><b>1</b> Introduction</a><ul>
-<li class="chapter" data-level="1.1" data-path="index.html"><a href="index.html#data-access"><i class="fa fa-check"></i><b>1.1</b> Data access</a></li>
-<li class="chapter" data-level="1.2" data-path="index.html"><a href="index.html#data-processing"><i class="fa fa-check"></i><b>1.2</b> Data processing</a></li>
+<li class="chapter" data-level="1.1" data-path="index.html"><a href="index.html#data-processing"><i class="fa fa-check"></i><b>1.1</b> Data processing</a></li>
 </ul></li>
 <li class="chapter" data-level="2" data-path="selecting-species-of-interest.html"><a href="selecting-species-of-interest.html"><i class="fa fa-check"></i><b>2</b> Selecting species of interest</a><ul>
 <li class="chapter" data-level="2.1" data-path="selecting-species-of-interest.html"><a href="selecting-species-of-interest.html#prepare-libraries"><i class="fa fa-check"></i><b>2.1</b> Prepare libraries</a></li>
@@ -321,7 +320,7 @@ <h2><span class="header-section-number">8.1</span> Prepare libraries and data</h
 <span id="cb69-19"><a href="adding-covariates-to-checklist-data.html#cb69-19"></a><span class="kw">library</span>(sf)</span>
 <span id="cb69-20"><a href="adding-covariates-to-checklist-data.html#cb69-20"></a></span>
 <span id="cb69-21"><a href="adding-covariates-to-checklist-data.html#cb69-21"></a><span class="co"># load saved data object</span></span>
-<span id="cb69-22"><a href="adding-covariates-to-checklist-data.html#cb69-22"></a><span class="kw">load</span>(<span class="st">&quot;data/01_data_prelim_processing.rdata&quot;</span>)</span></code></pre></div>
+<span id="cb69-22"><a href="adding-covariates-to-checklist-data.html#cb69-22"></a><span class="kw">load</span>(<span class="st">&quot;results/02_data_prelim_processing.rdata&quot;</span>)</span></code></pre></div>
 </div>
 <div id="spatial-subsampling" class="section level2">
 <h2><span class="header-section-number">8.2</span> Spatial subsampling</h2>
@@ -465,7 +464,7 @@ <h3><span class="header-section-number">8.3.1</span> Count absences after tempor
 <span id="cb75-5"><a href="adding-covariates-to-checklist-data.html#cb75-5"></a>  )</span>
 <span id="cb75-6"><a href="adding-covariates-to-checklist-data.html#cb75-6"></a></span>
 <span id="cb75-7"><a href="adding-covariates-to-checklist-data.html#cb75-7"></a><span class="co"># save data</span></span>
-<span id="cb75-8"><a href="adding-covariates-to-checklist-data.html#cb75-8"></a><span class="kw">write_csv</span>(data_presence_prop, <span class="st">&quot;data/results/data_class_balance.csv&quot;</span>)</span></code></pre></div>
+<span id="cb75-8"><a href="adding-covariates-to-checklist-data.html#cb75-8"></a><span class="kw">write_csv</span>(data_presence_prop, <span class="st">&quot;results/08_data-class-balance.csv&quot;</span>)</span></code></pre></div>
 <div class="sourceCode" id="cb76"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb76-1"><a href="adding-covariates-to-checklist-data.html#cb76-1"></a><span class="co"># bind all spatially and temporally thinned absences rows for data frame</span></span>
 <span id="cb76-2"><a href="adding-covariates-to-checklist-data.html#cb76-2"></a>dataSubsample &lt;-<span class="st"> </span><span class="kw">bind_rows</span>(dataSubsample)</span>
 <span id="cb76-3"><a href="adding-covariates-to-checklist-data.html#cb76-3"></a></span>
@@ -503,7 +502,7 @@ <h3><span class="header-section-number">8.3.1</span> Count absences after tempor
 <h2><span class="header-section-number">8.4</span> Add checklist calibration index</h2>
 <p>Load the CCI computed in the previous section. The CCI was the lone observer’s expertise score for single-observer checklists, and the highest expertise score among observers for group checklists.</p>
 <div class="sourceCode" id="cb77"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb77-1"><a href="adding-covariates-to-checklist-data.html#cb77-1"></a><span class="co"># read in obs score and extract numbers</span></span>
-<span id="cb77-2"><a href="adding-covariates-to-checklist-data.html#cb77-2"></a>expertiseScore &lt;-<span class="st"> </span><span class="kw">read_csv</span>(<span class="st">&quot;data/03_data-obsExpertise-score.csv&quot;</span>) <span class="op">%&gt;%</span></span>
+<span id="cb77-2"><a href="adding-covariates-to-checklist-data.html#cb77-2"></a>expertiseScore &lt;-<span class="st"> </span><span class="kw">read_csv</span>(<span class="st">&quot;results/04_data-obsExpertise-score.csv&quot;</span>) <span class="op">%&gt;%</span></span>
 <span id="cb77-3"><a href="adding-covariates-to-checklist-data.html#cb77-3"></a><span class="st">  </span><span class="kw">mutate</span>(<span class="dt">numObserver =</span> <span class="kw">str_extract</span>(observer, <span class="st">&quot;</span><span class="ch">\\</span><span class="st">d+&quot;</span>)) <span class="op">%&gt;%</span></span>
 <span id="cb77-4"><a href="adding-covariates-to-checklist-data.html#cb77-4"></a><span class="st">  </span>dplyr<span class="op">::</span><span class="kw">select</span>(<span class="op">-</span>observer)</span>
 <span id="cb77-5"><a href="adding-covariates-to-checklist-data.html#cb77-5"></a></span>
@@ -641,7 +640,7 @@ <h3><span class="header-section-number">8.7.3</span> Link environmental covariat
 <span id="cb82-7"><a href="adding-covariates-to-checklist-data.html#cb82-7"></a>)</span></code></pre></div>
 <p>Save data to file.</p>
 <div class="sourceCode" id="cb83"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb83-1"><a href="adding-covariates-to-checklist-data.html#cb83-1"></a><span class="co"># write to file</span></span>
-<span id="cb83-2"><a href="adding-covariates-to-checklist-data.html#cb83-2"></a><span class="kw">write_csv</span>(dataSubsample, <span class="dt">path =</span> <span class="kw">glue</span>(<span class="st">&quot;data/04_data-covars-2.5km.csv&quot;</span>))</span></code></pre></div>
+<span id="cb83-2"><a href="adding-covariates-to-checklist-data.html#cb83-2"></a><span class="kw">write_csv</span>(dataSubsample, <span class="dt">file =</span> <span class="st">&quot;results/08_data-covars-2.5km.csv&quot;</span>)</span></code></pre></div>
 
 </div>
 </div>
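Section 8.4 above defines the CCI as the lone observer's expertise score for single-observer checklists and the highest expertise score among observers for group checklists. A minimal sketch of that rule in R with dplyr follows; the tables `checklist_observers` and `observer_scores` and their columns are hypothetical stand-ins, not the objects used in the analysis.

```r
library(dplyr)

# Hypothetical example data: one row per observer on each checklist.
# Column names and values are illustrative only.
checklist_observers <- tibble(
  checklist_id = c("S1", "S2", "S2", "S3"),
  observer = c("obsr1", "obsr2", "obsr3", "obsr3")
)

# Hypothetical expertise scores, one row per observer
observer_scores <- tibble(
  observer = c("obsr1", "obsr2", "obsr3"),
  expertise = c(0.4, 0.9, 0.2)
)

# CCI per checklist: the maximum expertise score among its observers.
# For a single-observer checklist this reduces to that observer's score.
cci <- checklist_observers %>%
  left_join(observer_scores, by = "observer") %>%
  group_by(checklist_id) %>%
  summarise(cci = max(expertise), .groups = "drop")

print(cci)
```

Taking the maximum covers both cases at once, since the maximum of a single score is that score. An observer missing from the score table would give `NA` for that checklist, so real data may need an explicit rule for unscored observers.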
diff --git a/docs/checking-temporal-sampling-frequency.html b/docs/checking-temporal-sampling-frequency.html
index f89a559..79d6248 100644
--- a/docs/checking-temporal-sampling-frequency.html
+++ b/docs/checking-temporal-sampling-frequency.html
@@ -23,7 +23,7 @@
 
 
 
-<meta name="date" content="2022-04-26" />
+<meta name="date" content="2022-04-28" />
 
   <meta name="viewport" content="width=device-width, initial-scale=1" />
   <meta name="apple-mobile-web-app-capable" content="yes" />
@@ -134,8 +134,7 @@
 
 <ul class="summary">
 <li class="chapter" data-level="1" data-path="index.html"><a href="index.html"><i class="fa fa-check"></i><b>1</b> Introduction</a><ul>
-<li class="chapter" data-level="1.1" data-path="index.html"><a href="index.html#data-access"><i class="fa fa-check"></i><b>1.1</b> Data access</a></li>
-<li class="chapter" data-level="1.2" data-path="index.html"><a href="index.html#data-processing"><i class="fa fa-check"></i><b>1.2</b> Data processing</a></li>
+<li class="chapter" data-level="1.1" data-path="index.html"><a href="index.html#data-processing"><i class="fa fa-check"></i><b>1.1</b> Data processing</a></li>
 </ul></li>
 <li class="chapter" data-level="2" data-path="selecting-species-of-interest.html"><a href="selecting-species-of-interest.html"><i class="fa fa-check"></i><b>2</b> Selecting species of interest</a><ul>
 <li class="chapter" data-level="2.1" data-path="selecting-species-of-interest.html"><a href="selecting-species-of-interest.html#prepare-libraries"><i class="fa fa-check"></i><b>2.1</b> Prepare libraries</a></li>
@@ -316,7 +315,7 @@ <h2><span class="header-section-number">7.1</span> Load libraries</h2>
 <h2><span class="header-section-number">7.2</span> Load checklist data</h2>
 <p>Here we load filtered checklist data and convert to UTM 43N coordinates.</p>
 <div class="sourceCode" id="cb62"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb62-1"><a href="checking-temporal-sampling-frequency.html#cb62-1"></a><span class="co"># load checklist data</span></span>
-<span id="cb62-2"><a href="checking-temporal-sampling-frequency.html#cb62-2"></a><span class="kw">load</span>(<span class="st">&quot;data/01_data_prelim_processing.rdata&quot;</span>)</span>
+<span id="cb62-2"><a href="checking-temporal-sampling-frequency.html#cb62-2"></a><span class="kw">load</span>(<span class="st">&quot;results/02_data_prelim_processing.rdata&quot;</span>)</span>
 <span id="cb62-3"><a href="checking-temporal-sampling-frequency.html#cb62-3"></a></span>
 <span id="cb62-4"><a href="checking-temporal-sampling-frequency.html#cb62-4"></a><span class="co"># get checklists</span></span>
 <span id="cb62-5"><a href="checking-temporal-sampling-frequency.html#cb62-5"></a>data &lt;-<span class="st"> </span><span class="kw">distinct</span>(</span>
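Section 7.2 above converts the checklist data to UTM 43N coordinates. A sketch of that step with sf, assuming longitude and latitude columns in WGS 84 (the data frame, its column names and its values are hypothetical), uses EPSG:32643, the code for WGS 84 / UTM zone 43N:

```r
library(sf)

# Hypothetical checklist locations in geographic coordinates (WGS 84);
# column names and values are illustrative only.
checklists <- data.frame(
  checklist_id = c("S1", "S2"),
  longitude = c(76.7, 77.0),
  latitude = c(11.4, 10.3)
)

# Build point geometries in EPSG:4326 (longitude/latitude)
checklists_sf <- st_as_sf(checklists, coords = c("longitude", "latitude"), crs = 4326)

# Project to WGS 84 / UTM zone 43N (EPSG:32643)
checklists_utm <- st_transform(checklists_sf, crs = 32643)

# Easting (X) and northing (Y) in metres
coords_utm <- st_coordinates(checklists_utm)
print(coords_utm)
```

Projecting once up front means later distance calculations work in metres rather than degrees.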
diff --git a/docs/examining-spatial-sampling-bias.html b/docs/examining-spatial-sampling-bias.html
index 442fcc4..72e2880 100644
--- a/docs/examining-spatial-sampling-bias.html
+++ b/docs/examining-spatial-sampling-bias.html
@@ -23,7 +23,7 @@
 
 
 
-<meta name="date" content="2022-04-26" />
+<meta name="date" content="2022-04-28" />
 
   <meta name="viewport" content="width=device-width, initial-scale=1" />
   <meta name="apple-mobile-web-app-capable" content="yes" />
@@ -134,8 +134,7 @@
 
 <ul class="summary">
 <li class="chapter" data-level="1" data-path="index.html"><a href="index.html"><i class="fa fa-check"></i><b>1</b> Introduction</a><ul>
-<li class="chapter" data-level="1.1" data-path="index.html"><a href="index.html#data-access"><i class="fa fa-check"></i><b>1.1</b> Data access</a></li>
-<li class="chapter" data-level="1.2" data-path="index.html"><a href="index.html#data-processing"><i class="fa fa-check"></i><b>1.2</b> Data processing</a></li>
+<li class="chapter" data-level="1.1" data-path="index.html"><a href="index.html#data-processing"><i class="fa fa-check"></i><b>1.1</b> Data processing</a></li>
 </ul></li>
 <li class="chapter" data-level="2" data-path="selecting-species-of-interest.html"><a href="selecting-species-of-interest.html"><i class="fa fa-check"></i><b>2</b> Selecting species of interest</a><ul>
 <li class="chapter" data-level="2.1" data-path="selecting-species-of-interest.html"><a href="selecting-species-of-interest.html#prepare-libraries"><i class="fa fa-check"></i><b>2.1</b> Prepare libraries</a></li>
@@ -327,7 +326,7 @@ <h2><span class="header-section-number">6.1</span> Prepare libraries</h2>
 <h2><span class="header-section-number">6.2</span> Read checklist data</h2>
 <p>Read in checklist data with distance to nearest neighbouring site, and the distance to the nearest road.</p>
 <div class="sourceCode" id="cb54"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb54-1"><a href="examining-spatial-sampling-bias.html#cb54-1"></a><span class="co"># read from local file</span></span>
-<span id="cb54-2"><a href="examining-spatial-sampling-bias.html#cb54-2"></a>chkCovars &lt;-<span class="st"> </span><span class="kw">read_csv</span>(<span class="st">&quot;data/03_data-covars-perChklist.csv&quot;</span>)</span></code></pre></div>
+<span id="cb54-2"><a href="examining-spatial-sampling-bias.html#cb54-2"></a>chkCovars &lt;-<span class="st"> </span><span class="kw">read_csv</span>(<span class="st">&quot;results/04_data-covars-perChklist.csv&quot;</span>)</span></code></pre></div>
 <div id="spatially-explicit-filter-on-checklists-1" class="section level3">
 <h3><span class="header-section-number">6.2.1</span> Spatially explicit filter on checklists</h3>
 <p>We filter the checklists by the boundary of the study area. This is <em>not</em> the extent.</p>
@@ -383,7 +382,7 @@ <h3><span class="header-section-number">6.3.2</span> Table: Distance to roads</h
 <span id="cb57-11"><a href="examining-spatial-sampling-bias.html#cb57-11"></a>    <span class="kw">vars</span>(value),</span>
 <span id="cb57-12"><a href="examining-spatial-sampling-bias.html#cb57-12"></a>    <span class="kw">list</span>(<span class="op">~</span><span class="st"> </span><span class="kw">mean</span>(.), <span class="op">~</span><span class="st"> </span><span class="kw">sd</span>(.), <span class="op">~</span><span class="st"> </span><span class="kw">min</span>(.), <span class="op">~</span><span class="st"> </span><span class="kw">max</span>(.))</span>
 <span id="cb57-13"><a href="examining-spatial-sampling-bias.html#cb57-13"></a>  ) <span class="op">%&gt;%</span></span>
-<span id="cb57-14"><a href="examining-spatial-sampling-bias.html#cb57-14"></a><span class="st">  </span><span class="kw">write_csv</span>(<span class="st">&quot;data/results/distance_roads_sites.csv&quot;</span>)</span></code></pre></div>
+<span id="cb57-14"><a href="examining-spatial-sampling-bias.html#cb57-14"></a><span class="st">  </span><span class="kw">write_csv</span>(<span class="st">&quot;results/06_distance-roads-sites.csv&quot;</span>)</span></code></pre></div>
 </div>
 <div id="distance-to-nearest-neighbouring-site" class="section level3">
 <h3><span class="header-section-number">6.3.3</span> Distance to nearest neighbouring site</h3>
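Section 6.2.1 above filters checklists by the study-area boundary rather than its extent. The difference can be sketched with sf: the extent is the bounding box of the boundary, so a point can fall inside the extent while lying outside the boundary itself. The triangular boundary and the two points below are hypothetical, chosen only to show the contrast; this is not the filtering code used in the analysis.

```r
library(sf)

# Hypothetical triangular study-area boundary in projected coordinates (metres)
boundary <- st_sfc(
  st_polygon(list(rbind(c(0, 0), c(10, 0), c(0, 10), c(0, 0)))),
  crs = 32643
)

# Hypothetical checklist locations
checklists <- st_as_sf(
  data.frame(id = c("a", "b"), x = c(2, 8), y = c(8, 8)[c(1, 1)] * c(0.25, 1)),
  coords = c("x", "y"), crs = 32643
)

# The extent is the bounding box of the boundary, a larger square
extent <- st_as_sfc(st_bbox(boundary))

# Logical vectors: does each checklist fall within the boundary / the extent?
in_boundary <- st_intersects(checklists, boundary, sparse = FALSE)[, 1]
in_extent <- st_intersects(checklists, extent, sparse = FALSE)[, 1]

# Keep only checklists inside the boundary itself
checklists_in_study_area <- checklists[in_boundary, ]
print(checklists_in_study_area)
```

Point `b` at (8, 8) passes the extent test but fails the boundary test, which is why filtering on the extent alone would retain checklists from outside the study area.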
diff --git a/docs/index.html b/docs/index.html
index 4422463..3fb7cd7 100644
--- a/docs/index.html
+++ b/docs/index.html
@@ -23,7 +23,7 @@
 
 
 
-<meta name="date" content="2022-04-26" />
+<meta name="date" content="2022-04-28" />
 
   <meta name="viewport" content="width=device-width, initial-scale=1" />
   <meta name="apple-mobile-web-app-capable" content="yes" />
@@ -134,8 +134,7 @@
 
 <ul class="summary">
 <li class="chapter" data-level="1" data-path="index.html"><a href="index.html"><i class="fa fa-check"></i><b>1</b> Introduction</a><ul>
-<li class="chapter" data-level="1.1" data-path="index.html"><a href="index.html#data-access"><i class="fa fa-check"></i><b>1.1</b> Data access</a></li>
-<li class="chapter" data-level="1.2" data-path="index.html"><a href="index.html#data-processing"><i class="fa fa-check"></i><b>1.2</b> Data processing</a></li>
+<li class="chapter" data-level="1.1" data-path="index.html"><a href="index.html#data-processing"><i class="fa fa-check"></i><b>1.1</b> Data processing</a></li>
 </ul></li>
 <li class="chapter" data-level="2" data-path="selecting-species-of-interest.html"><a href="selecting-species-of-interest.html"><i class="fa fa-check"></i><b>2</b> Selecting species of interest</a><ul>
 <li class="chapter" data-level="2.1" data-path="selecting-species-of-interest.html"><a href="selecting-species-of-interest.html#prepare-libraries"><i class="fa fa-check"></i><b>2.1</b> Prepare libraries</a></li>
@@ -297,18 +296,14 @@ <h1>
             <section class="normal" id="section-">
 <div id="header">
 <h1 class="title">Source code and supplementary material for <em>Using citizen science to parse climatic and landcover influences on bird occupancy within a tropical biodiversity hotspot</em></h1>
-<p class="date"><em>2022-04-26</em></p>
+<p class="date"><em>2022-04-28</em></p>
 </div>
 <div id="introduction" class="section level1">
 <h1><span class="header-section-number">Section 1</span> Introduction</h1>
-<p>This is the readable version containing analysis that models associations between environmental predictors (climate and landcover) and citizen science observations of birds across the Nilgiri and Anamalai Hills of the Western Ghats Biodiversity Hotspot.</p>
+<p>This is the readable version of the analysis, which models associations between environmental predictors (climate and landcover) and citizen science observations of birds across the Nilgiri and Anamalai Hills of the Western Ghats biodiversity hotspot.</p>
 <p>Methods and format are derived from <a href="https://cornelllabofornithology.github.io/ebird-best-practices/" class="uri">https://cornelllabofornithology.github.io/ebird-best-practices/</a>.</p>
-<div id="data-access" class="section level2">
-<h2><span class="header-section-number">1.1</span> Data access</h2>
-<p>The data used in this work are available from <a href="http://ebird.org/data/download">eBird</a>.</p>
-</div>
 <div id="data-processing" class="section level2">
-<h2><span class="header-section-number">1.2</span> Data processing</h2>
+<h2><span class="header-section-number">1.1</span> Data processing</h2>
 <p>The data processing for this project is described in the following sections. Navigate through them using the links in the sidebar.</p>
 <hr />
 <div class="figure">
diff --git a/docs/modelling-species-occupancy.html b/docs/modelling-species-occupancy.html
index 7df44b0..b08e617 100644
--- a/docs/modelling-species-occupancy.html
+++ b/docs/modelling-species-occupancy.html
@@ -23,7 +23,7 @@
 
 
 
-<meta name="date" content="2022-04-26" />
+<meta name="date" content="2022-04-28" />
 
   <meta name="viewport" content="width=device-width, initial-scale=1" />
   <meta name="apple-mobile-web-app-capable" content="yes" />
@@ -134,8 +134,7 @@
 
 <ul class="summary">
 <li class="chapter" data-level="1" data-path="index.html"><a href="index.html"><i class="fa fa-check"></i><b>1</b> Introduction</a><ul>
-<li class="chapter" data-level="1.1" data-path="index.html"><a href="index.html#data-access"><i class="fa fa-check"></i><b>1.1</b> Data access</a></li>
-<li class="chapter" data-level="1.2" data-path="index.html"><a href="index.html#data-processing"><i class="fa fa-check"></i><b>1.2</b> Data processing</a></li>
+<li class="chapter" data-level="1.1" data-path="index.html"><a href="index.html#data-processing"><i class="fa fa-check"></i><b>1.1</b> Data processing</a></li>
 </ul></li>
 <li class="chapter" data-level="2" data-path="selecting-species-of-interest.html"><a href="selecting-species-of-interest.html"><i class="fa fa-check"></i><b>2</b> Selecting species of interest</a><ul>
 <li class="chapter" data-level="2.1" data-path="selecting-species-of-interest.html"><a href="selecting-species-of-interest.html#prepare-libraries"><i class="fa fa-check"></i><b>2.1</b> Prepare libraries</a></li>
@@ -330,7 +329,7 @@ <h3><span class="header-section-number">9.0.1</span> Load necessary libraries</h
 <h2><span class="header-section-number">9.1</span> Load dataframe and prepare covariates</h2>
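-<p>Here, we load the required dataframe that contains 10 random visits to a site and environmental covariates that were prepared at a spatial scale of 2.5 sq.km. We also scaled all covariates (mean around 0 and standard deviation of 1). Next, we ensured that only Traveling and Stationary checklists were considered. Even though stationary counts have no distance traveled, we defaulted all stationary accounts to an effective distance of 100m, which we consider the average maximum detection radius for most bird species in our area.</p>
+<p>Here, we load the prepared data frame, which contains 10 random visits per site and environmental covariates prepared at a spatial scale of 2.5 sq. km. We also scaled all covariates to a mean of 0 and a standard deviation of 1. Next, we ensured that only Traveling and Stationary checklists were considered. Although stationary counts have no distance traveled, we assigned all stationary checklists an effective distance of 100 m, which we consider the average maximum detection radius for most bird species in our area.</p>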
 <div class="sourceCode" id="cb85"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb85-1"><a href="modelling-species-occupancy.html#cb85-1"></a><span class="co"># Load in the prepared dataframe</span></span>
-<span id="cb85-2"><a href="modelling-species-occupancy.html#cb85-2"></a>dat &lt;-<span class="st"> </span><span class="kw">fread</span>(<span class="st">&quot;data/04_data-covars-2.5km.csv&quot;</span>, <span class="dt">header =</span> T)</span>
+<span id="cb85-2"><a href="modelling-species-occupancy.html#cb85-2"></a>dat &lt;-<span class="st"> </span><span class="kw">fread</span>(<span class="st">&quot;results/08_data-covars-2.5km.csv&quot;</span>, <span class="dt">header =</span> T)</span>
 <span id="cb85-3"><a href="modelling-species-occupancy.html#cb85-3"></a>dat &lt;-<span class="st"> </span><span class="kw">as_tibble</span>(dat)</span>
 <span id="cb85-4"><a href="modelling-species-occupancy.html#cb85-4"></a><span class="kw">head</span>(dat)</span></code></pre></div>
 <div id="handle-the-sampling-protocol" class="section level3">
@@ -452,12 +451,12 @@ <h3><span class="header-section-number">9.1.3</span> Scaling covariates</h3>
 <span id="cb88-51"><a href="modelling-species-occupancy.html#cb88-51"></a>)</span>
 <span id="cb88-52"><a href="modelling-species-occupancy.html#cb88-52"></a></span>
 <span id="cb88-53"><a href="modelling-species-occupancy.html#cb88-53"></a><span class="co"># save data to file</span></span>
-<span id="cb88-54"><a href="modelling-species-occupancy.html#cb88-54"></a><span class="kw">fwrite</span>(dat.scaled, <span class="st">&quot;data/05_scaled-covars-2.5km.csv&quot;</span>)</span></code></pre></div>
+<span id="cb88-54"><a href="modelling-species-occupancy.html#cb88-54"></a><span class="kw">fwrite</span>(dat.scaled, <span class="st">&quot;results/09_scaled-covars-2.5km.csv&quot;</span>)</span></code></pre></div>
 </div>
 <div id="correct-date-format" class="section level3">
 <h3><span class="header-section-number">9.1.4</span> Correct date format</h3>
 <div class="sourceCode" id="cb89"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb89-1"><a href="modelling-species-occupancy.html#cb89-1"></a><span class="co"># Reload the scaled covariate data</span></span>
-<span id="cb89-2"><a href="modelling-species-occupancy.html#cb89-2"></a>dat.scaled &lt;-<span class="st"> </span><span class="kw">fread</span>(<span class="st">&quot;data/05_scaled-covars-2.5km.csv&quot;</span>, <span class="dt">header =</span> T)</span>
+<span id="cb89-2"><a href="modelling-species-occupancy.html#cb89-2"></a>dat.scaled &lt;-<span class="st"> </span><span class="kw">fread</span>(<span class="st">&quot;results/09_scaled-covars-2.5km.csv&quot;</span>, <span class="dt">header =</span> T)</span>
 <span id="cb89-3"><a href="modelling-species-occupancy.html#cb89-3"></a>dat.scaled &lt;-<span class="st"> </span><span class="kw">as_tibble</span>(dat.scaled)</span>
 <span id="cb89-4"><a href="modelling-species-occupancy.html#cb89-4"></a><span class="kw">head</span>(dat.scaled)</span>
 <span id="cb89-5"><a href="modelling-species-occupancy.html#cb89-5"></a></span>
@@ -564,7 +563,7 @@ <h2><span class="header-section-number">9.2</span> Running a null model</h2>
 <span id="cb91-63"><a href="modelling-species-occupancy.html#cb91-63"></a><span class="kw">close</span>(pb)</span>
 <span id="cb91-64"><a href="modelling-species-occupancy.html#cb91-64"></a></span>
 <span id="cb91-65"><a href="modelling-species-occupancy.html#cb91-65"></a><span class="co"># Store all the  model outputs for each species</span></span>
-<span id="cb91-66"><a href="modelling-species-occupancy.html#cb91-66"></a><span class="kw">capture.output</span>(all_null, <span class="dt">file =</span> <span class="st">&quot;data/results/null_models.csv&quot;</span>)</span></code></pre></div>
+<span id="cb91-66"><a href="modelling-species-occupancy.html#cb91-66"></a><span class="kw">capture.output</span>(all_null, <span class="dt">file =</span> <span class="st">&quot;results/09_null_models.csv&quot;</span>)</span></code></pre></div>
 </div>
 <div id="identifying-covariates-necessary-to-model-the-detection-process" class="section level2">
 <h2><span class="header-section-number">9.3</span> Identifying covariates necessary to model the detection process</h2>
@@ -696,13 +695,13 @@ <h2><span class="header-section-number">9.3</span> Identifying covariates necess
 <span id="cb92-125"><a href="modelling-species-occupancy.html#cb92-125"></a><span class="co">## Storing output from the above models in excel sheets</span></span>
 <span id="cb92-126"><a href="modelling-species-occupancy.html#cb92-126"></a></span>
 <span id="cb92-127"><a href="modelling-species-occupancy.html#cb92-127"></a><span class="co"># 1. Store all the model outputs for each species (variable: det_dred() - see above)</span></span>
-<span id="cb92-128"><a href="modelling-species-occupancy.html#cb92-128"></a><span class="kw">write.xlsx</span>(det_dred, <span class="dt">file =</span> <span class="st">&quot;data/results/det-dred.xlsx&quot;</span>)</span>
+<span id="cb92-128"><a href="modelling-species-occupancy.html#cb92-128"></a><span class="kw">write.xlsx</span>(det_dred, <span class="dt">file =</span> <span class="st">&quot;results/09_det-dred.xlsx&quot;</span>)</span>
 <span id="cb92-129"><a href="modelling-species-occupancy.html#cb92-129"></a></span>
 <span id="cb92-130"><a href="modelling-species-occupancy.html#cb92-130"></a><span class="co"># 2. Store all the model averaged outputs for each species and the relative importance score</span></span>
-<span id="cb92-131"><a href="modelling-species-occupancy.html#cb92-131"></a><span class="kw">write.xlsx</span>(det_avg, <span class="dt">file =</span> <span class="st">&quot;data/results/det-avg.xlsx&quot;</span>, <span class="dt">rowNames =</span> T, <span class="dt">colNames =</span> T)</span>
-<span id="cb92-132"><a href="modelling-species-occupancy.html#cb92-132"></a><span class="kw">write.xlsx</span>(det_imp, <span class="dt">file =</span> <span class="st">&quot;data/results/det-imp.xlsx&quot;</span>, <span class="dt">rowNames =</span> T, <span class="dt">colNames =</span> T)</span>
+<span id="cb92-131"><a href="modelling-species-occupancy.html#cb92-131"></a><span class="kw">write.xlsx</span>(det_avg, <span class="dt">file =</span> <span class="st">&quot;results/09_det-avg.xlsx&quot;</span>, <span class="dt">rowNames =</span> T, <span class="dt">colNames =</span> T)</span>
+<span id="cb92-132"><a href="modelling-species-occupancy.html#cb92-132"></a><span class="kw">write.xlsx</span>(det_imp, <span class="dt">file =</span> <span class="st">&quot;results/09_det-imp.xlsx&quot;</span>, <span class="dt">rowNames =</span> T, <span class="dt">colNames =</span> T)</span>
 <span id="cb92-133"><a href="modelling-species-occupancy.html#cb92-133"></a></span>
-<span id="cb92-134"><a href="modelling-species-occupancy.html#cb92-134"></a><span class="kw">write.xlsx</span>(det_modelEst, <span class="dt">file =</span> <span class="st">&quot;data/results/det-modelEst.xlsx&quot;</span>, <span class="dt">rowNames =</span> T, <span class="dt">colNames =</span> T)</span>
+<span id="cb92-134"><a href="modelling-species-occupancy.html#cb92-134"></a><span class="kw">write.xlsx</span>(det_modelEst, <span class="dt">file =</span> <span class="st">&quot;results/09_det-modelEst.xlsx&quot;</span>, <span class="dt">rowNames =</span> T, <span class="dt">colNames =</span> T)</span>
 <span id="cb92-135"><a href="modelling-species-occupancy.html#cb92-135"></a></span>
 <span id="cb92-136"><a href="modelling-species-occupancy.html#cb92-136"></a><span class="co"># Note if you are unable to write to a file, use (for example)</span></span>
 <span id="cb92-137"><a href="modelling-species-occupancy.html#cb92-137"></a>a &lt;-<span class="st"> </span>purrr<span class="op">::</span><span class="kw">map</span>(det_imp, <span class="op">~</span><span class="st"> </span>purrr<span class="op">::</span><span class="kw">compact</span>(.)) <span class="op">%&gt;%</span><span class="st"> </span>purrr<span class="op">::</span><span class="kw">keep</span>(<span class="op">~</span><span class="st"> </span><span class="kw">length</span>(.) <span class="op">!=</span><span class="st"> </span><span class="dv">0</span>)</span></code></pre></div>
@@ -843,14 +842,14 @@ <h2><span class="header-section-number">9.4</span> Land Cover and Climate</h2>
 <span id="cb93-129"><a href="modelling-species-occupancy.html#cb93-129"></a><span class="kw">close</span>(pb)</span>
 <span id="cb93-130"><a href="modelling-species-occupancy.html#cb93-130"></a></span>
 <span id="cb93-131"><a href="modelling-species-occupancy.html#cb93-131"></a><span class="co"># 1. Store all the model outputs for each species (for both landcover and climate)</span></span>
-<span id="cb93-132"><a href="modelling-species-occupancy.html#cb93-132"></a><span class="kw">write.xlsx</span>(lc_clim, <span class="dt">file =</span> <span class="st">&quot;data/results/lc-clim.xlsx&quot;</span>)</span>
+<span id="cb93-132"><a href="modelling-species-occupancy.html#cb93-132"></a><span class="kw">write.xlsx</span>(lc_clim, <span class="dt">file =</span> <span class="st">&quot;results/09_lc-clim.xlsx&quot;</span>)</span>
 <span id="cb93-133"><a href="modelling-species-occupancy.html#cb93-133"></a></span>
 <span id="cb93-134"><a href="modelling-species-occupancy.html#cb93-134"></a><span class="co"># 2. Store all the model averaged outputs for each species and relative importance scores</span></span>
-<span id="cb93-135"><a href="modelling-species-occupancy.html#cb93-135"></a><span class="kw">write.xlsx</span>(lc_clim_avg, <span class="dt">file =</span> <span class="st">&quot;data/results/lc-clim-avg.xlsx&quot;</span>, <span class="dt">rowNames =</span> T, <span class="dt">colNames =</span> T)</span>
-<span id="cb93-136"><a href="modelling-species-occupancy.html#cb93-136"></a><span class="kw">write.xlsx</span>(lc_clim_imp, <span class="dt">file =</span> <span class="st">&quot;data/results/lc-clim-imp.xlsx&quot;</span>, <span class="dt">rowNames =</span> T, <span class="dt">colNames =</span> T)</span>
+<span id="cb93-135"><a href="modelling-species-occupancy.html#cb93-135"></a><span class="kw">write.xlsx</span>(lc_clim_avg, <span class="dt">file =</span> <span class="st">&quot;results/09_lc-clim-avg.xlsx&quot;</span>, <span class="dt">rowNames =</span> T, <span class="dt">colNames =</span> T)</span>
+<span id="cb93-136"><a href="modelling-species-occupancy.html#cb93-136"></a><span class="kw">write.xlsx</span>(lc_clim_imp, <span class="dt">file =</span> <span class="st">&quot;results/09_lc-clim-imp.xlsx&quot;</span>, <span class="dt">rowNames =</span> T, <span class="dt">colNames =</span> T)</span>
 <span id="cb93-137"><a href="modelling-species-occupancy.html#cb93-137"></a></span>
 <span id="cb93-138"><a href="modelling-species-occupancy.html#cb93-138"></a><span class="co"># 3. Store all model estimates</span></span>
-<span id="cb93-139"><a href="modelling-species-occupancy.html#cb93-139"></a><span class="kw">write.xlsx</span>(lc_clim_modelEst, <span class="dt">file =</span> <span class="st">&quot;data/results/lc-clim-modelEst.xlsx&quot;</span>, <span class="dt">rowNames =</span> T, <span class="dt">colNames =</span> T)</span>
+<span id="cb93-139"><a href="modelling-species-occupancy.html#cb93-139"></a><span class="kw">write.xlsx</span>(lc_clim_modelEst, <span class="dt">file =</span> <span class="st">&quot;results/09_lc-clim-modelEst.xlsx&quot;</span>, <span class="dt">rowNames =</span> T, <span class="dt">colNames =</span> T)</span>
 <span id="cb93-140"><a href="modelling-species-occupancy.html#cb93-140"></a></span>
 <span id="cb93-141"><a href="modelling-species-occupancy.html#cb93-141"></a><span class="co"># Note if you are unable to write to a file, use (for example)</span></span>
 <span id="cb93-142"><a href="modelling-species-occupancy.html#cb93-142"></a>a &lt;-<span class="st"> </span>purrr<span class="op">::</span><span class="kw">map</span>(lc_clim_modelEst, <span class="op">~</span><span class="st"> </span>purrr<span class="op">::</span><span class="kw">compact</span>(.)) <span class="op">%&gt;%</span><span class="st"> </span>purrr<span class="op">::</span><span class="kw">keep</span>(<span class="op">~</span><span class="st"> </span><span class="kw">length</span>(.) <span class="op">!=</span><span class="st"> </span><span class="dv">0</span>)</span></code></pre></div>
@@ -926,7 +925,7 @@ <h2><span class="header-section-number">9.5</span> Goodness-of-fit tests</h2>
 <span id="cb94-66"><a href="modelling-species-occupancy.html#cb94-66"></a>}</span>
 <span id="cb94-67"><a href="modelling-species-occupancy.html#cb94-67"></a><span class="kw">close</span>(pb)</span>
 <span id="cb94-68"><a href="modelling-species-occupancy.html#cb94-68"></a></span>
-<span id="cb94-69"><a href="modelling-species-occupancy.html#cb94-69"></a><span class="kw">write.csv</span>(goodness_of_fit, <span class="st">&quot;data/results/goodness-of-fit-2.5km.csv&quot;</span>, <span class="dt">row.names =</span> F)</span></code></pre></div>
+<span id="cb94-69"><a href="modelling-species-occupancy.html#cb94-69"></a><span class="kw">write.csv</span>(goodness_of_fit, <span class="st">&quot;results/09_goodness-of-fit-2.5km.csv&quot;</span>, <span class="dt">row.names =</span> F)</span></code></pre></div>
 
 </div>
 </div>
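The note in the modelling chunk suggests removing empty entries with `purrr::compact()` and `purrr::keep()` before writing model estimates to file. Below is a minimal base-R sketch of the same idea. The nested list `model_est` and its species and model names are hypothetical stand-ins for `lc_clim_modelEst`. `Filter(length, x)` is used here as an assumed equivalent of dropping `NULL` or zero-length elements.

```r
# Hypothetical per-species list of model estimates; NULL marks a model that failed
model_est <- list(
  sp_a = list(m1 = data.frame(estimate = 0.2), m2 = NULL),
  sp_b = list(m1 = NULL, m2 = NULL),
  sp_c = list(m1 = data.frame(estimate = -0.5))
)

# keep only elements with non-zero length (NULL has length 0)
drop_empty <- function(x) Filter(length, x)

# inner pass removes failed models, outer pass removes species left with none
cleaned <- drop_empty(lapply(model_est, drop_empty))
names(cleaned) # "sp_a" "sp_c"
```

The inner pass plays the role of `purrr::map(..., ~ purrr::compact(.))`, and the outer pass plays the role of `purrr::keep(~ length(.) != 0)`.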
diff --git a/docs/predicting-species-specific-occupancy-as-a-function-of-significant-predictors.html b/docs/predicting-species-specific-occupancy-as-a-function-of-significant-predictors.html
index c4f6085..ed74493 100644
--- a/docs/predicting-species-specific-occupancy-as-a-function-of-significant-predictors.html
+++ b/docs/predicting-species-specific-occupancy-as-a-function-of-significant-predictors.html
@@ -23,7 +23,7 @@
 
 
 
-<meta name="date" content="2022-04-26" />
+<meta name="date" content="2022-04-28" />
 
   <meta name="viewport" content="width=device-width, initial-scale=1" />
   <meta name="apple-mobile-web-app-capable" content="yes" />
@@ -134,8 +134,7 @@
 
 <ul class="summary">
 <li class="chapter" data-level="1" data-path="index.html"><a href="index.html"><i class="fa fa-check"></i><b>1</b> Introduction</a><ul>
-<li class="chapter" data-level="1.1" data-path="index.html"><a href="index.html#data-access"><i class="fa fa-check"></i><b>1.1</b> Data access</a></li>
-<li class="chapter" data-level="1.2" data-path="index.html"><a href="index.html#data-processing"><i class="fa fa-check"></i><b>1.2</b> Data processing</a></li>
+<li class="chapter" data-level="1.1" data-path="index.html"><a href="index.html#data-processing"><i class="fa fa-check"></i><b>1.1</b> Data processing</a></li>
 </ul></li>
 <li class="chapter" data-level="2" data-path="selecting-species-of-interest.html"><a href="selecting-species-of-interest.html"><i class="fa fa-check"></i><b>2</b> Selecting species of interest</a><ul>
 <li class="chapter" data-level="2.1" data-path="selecting-species-of-interest.html"><a href="selecting-species-of-interest.html#prepare-libraries"><i class="fa fa-check"></i><b>2.1</b> Prepare libraries</a></li>
@@ -316,7 +315,7 @@ <h2><span class="header-section-number">11.1</span> Prepare libraries</h2>
 <div id="read-data" class="section level2">
 <h2><span class="header-section-number">11.2</span> Read data</h2>
 <div class="sourceCode" id="cb109"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb109-1"><a href="predicting-species-specific-occupancy-as-a-function-of-significant-predictors.html#cb109-1"></a><span class="co"># read coefficient effect data</span></span>
-<span id="cb109-2"><a href="predicting-species-specific-occupancy-as-a-function-of-significant-predictors.html#cb109-2"></a>data &lt;-<span class="st"> </span><span class="kw">read_csv</span>(<span class="st">&quot;data/results/data_predictor_effect.csv&quot;</span>)</span>
+<span id="cb109-2"><a href="predicting-species-specific-occupancy-as-a-function-of-significant-predictors.html#cb109-2"></a>data &lt;-<span class="st"> </span><span class="kw">read_csv</span>(<span class="st">&quot;results/10_data-predictor-effect.csv&quot;</span>)</span>
 <span id="cb109-3"><a href="predicting-species-specific-occupancy-as-a-function-of-significant-predictors.html#cb109-3"></a></span>
 <span id="cb109-4"><a href="predicting-species-specific-occupancy-as-a-function-of-significant-predictors.html#cb109-4"></a><span class="co"># check for a predictor column</span></span>
 <span id="cb109-5"><a href="predicting-species-specific-occupancy-as-a-function-of-significant-predictors.html#cb109-5"></a>assertthat<span class="op">::</span><span class="kw">assert_that</span>(</span>
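The chunk that reads `results/10_data-predictor-effect.csv` checks for a predictor column with `assertthat::assert_that()` before continuing. The exact check is cut off in this diff, so the following is a minimal base-R sketch of such a guard. It assumes the required column is named `predictor`. The helper `require_columns()` and the toy table are hypothetical.

```r
# Stop with an informative message if any required column is absent
require_columns <- function(df, required) {
  missing_cols <- setdiff(required, colnames(df))
  if (length(missing_cols) > 0) {
    stop("missing column(s): ", paste(missing_cols, collapse = ", "), call. = FALSE)
  }
  invisible(TRUE)
}

# hypothetical stand-in for the table read from results/10_data-predictor-effect.csv
data <- data.frame(
  scientific_name = c("Species A", "Species B"),
  predictor = c("pred_1", "pred_2"),
  coefficient = c(0.4, -0.2)
)

require_columns(data, "predictor")
```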
diff --git a/docs/preparing-checklist-calibration-index.html b/docs/preparing-checklist-calibration-index.html
index bcfb76d..cb445bd 100644
--- a/docs/preparing-checklist-calibration-index.html
+++ b/docs/preparing-checklist-calibration-index.html
@@ -23,7 +23,7 @@
 
 
 
-<meta name="date" content="2022-04-26" />
+<meta name="date" content="2022-04-28" />
 
   <meta name="viewport" content="width=device-width, initial-scale=1" />
   <meta name="apple-mobile-web-app-capable" content="yes" />
@@ -134,8 +134,7 @@
 
 <ul class="summary">
 <li class="chapter" data-level="1" data-path="index.html"><a href="index.html"><i class="fa fa-check"></i><b>1</b> Introduction</a><ul>
-<li class="chapter" data-level="1.1" data-path="index.html"><a href="index.html#data-access"><i class="fa fa-check"></i><b>1.1</b> Data access</a></li>
-<li class="chapter" data-level="1.2" data-path="index.html"><a href="index.html#data-processing"><i class="fa fa-check"></i><b>1.2</b> Data processing</a></li>
+<li class="chapter" data-level="1.1" data-path="index.html"><a href="index.html#data-processing"><i class="fa fa-check"></i><b>1.1</b> Data processing</a></li>
 </ul></li>
 <li class="chapter" data-level="2" data-path="selecting-species-of-interest.html"><a href="selecting-species-of-interest.html"><i class="fa fa-check"></i><b>2</b> Selecting species of interest</a><ul>
 <li class="chapter" data-level="2.1" data-path="selecting-species-of-interest.html"><a href="selecting-species-of-interest.html#prepare-libraries"><i class="fa fa-check"></i><b>2.1</b> Prepare libraries</a></li>
@@ -409,7 +408,7 @@ <h2><span class="header-section-number">5.3</span> Spatially explicit filter on
 <div id="prepare-species-of-interest" class="section level2">
 <h2><span class="header-section-number">5.4</span> Prepare species of interest</h2>
 <div class="sourceCode" id="cb45"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb45-1"><a href="preparing-checklist-calibration-index.html#cb45-1"></a><span class="co"># read in species list</span></span>
-<span id="cb45-2"><a href="preparing-checklist-calibration-index.html#cb45-2"></a>specieslist &lt;-<span class="st"> </span><span class="kw">read.csv</span>(<span class="st">&quot;data/species_list.csv&quot;</span>)</span>
+<span id="cb45-2"><a href="preparing-checklist-calibration-index.html#cb45-2"></a>specieslist &lt;-<span class="st"> </span><span class="kw">read.csv</span>(<span class="st">&quot;data/species-list.csv&quot;</span>)</span>
 <span id="cb45-3"><a href="preparing-checklist-calibration-index.html#cb45-3"></a></span>
 <span id="cb45-4"><a href="preparing-checklist-calibration-index.html#cb45-4"></a><span class="co"># set species of interest</span></span>
 <span id="cb45-5"><a href="preparing-checklist-calibration-index.html#cb45-5"></a>soi &lt;-<span class="st"> </span>specieslist<span class="op">$</span>scientific_name</span>
@@ -422,7 +421,7 @@ <h2><span class="header-section-number">5.4</span> Prepare species of interest</
 <span id="cb45-12"><a href="preparing-checklist-calibration-index.html#cb45-12"></a>]</span>
 <span id="cb45-13"><a href="preparing-checklist-calibration-index.html#cb45-13"></a></span>
 <span id="cb45-14"><a href="preparing-checklist-calibration-index.html#cb45-14"></a><span class="co"># write to file and link with checklist id later</span></span>
-<span id="cb45-15"><a href="preparing-checklist-calibration-index.html#cb45-15"></a><span class="kw">fwrite</span>(ebdSpSum, <span class="dt">file =</span> <span class="st">&quot;data/03_data-nspp-per-chk.csv&quot;</span>)</span></code></pre></div>
+<span id="cb45-15"><a href="preparing-checklist-calibration-index.html#cb45-15"></a><span class="kw">fwrite</span>(ebdSpSum, <span class="dt">file =</span> <span class="st">&quot;results/04_data-nspp-per-chk.csv&quot;</span>)</span></code></pre></div>
 </div>
 <div id="prepare-checklists-for-observer-score" class="section level2">
 <h2><span class="header-section-number">5.5</span> Prepare checklists for observer score</h2>
@@ -533,7 +532,7 @@ <h2><span class="header-section-number">5.7</span> Filter checklist data</h2>
 <span id="cb48-45"><a href="preparing-checklist-calibration-index.html#cb48-45"></a><span class="st">  </span><span class="kw">drop_na</span>(newjulianDate)</span>
 <span id="cb48-46"><a href="preparing-checklist-calibration-index.html#cb48-46"></a></span>
 <span id="cb48-47"><a href="preparing-checklist-calibration-index.html#cb48-47"></a><span class="co"># save to file for later reuse</span></span>
-<span id="cb48-48"><a href="preparing-checklist-calibration-index.html#cb48-48"></a><span class="kw">fwrite</span>(ebdChkSummary, <span class="dt">file =</span> <span class="st">&quot;data/03_data-covars-perChklist.csv&quot;</span>)</span></code></pre></div>
+<span id="cb48-48"><a href="preparing-checklist-calibration-index.html#cb48-48"></a><span class="kw">fwrite</span>(ebdChkSummary, <span class="dt">file =</span> <span class="st">&quot;results/04_data-covars-perChklist.csv&quot;</span>)</span></code></pre></div>
 </div>
 <div id="model-observer-expertise" class="section level2">
 <h2><span class="header-section-number">5.8</span> Model observer expertise</h2>
@@ -553,14 +552,14 @@ <h2><span class="header-section-number">5.8</span> Model observer expertise</h2>
 <span id="cb49-13"><a href="preparing-checklist-calibration-index.html#cb49-13"></a><span class="dt">data =</span> ebdChkSummary, <span class="dt">family =</span> <span class="st">&quot;poisson&quot;</span></span>
 <span id="cb49-14"><a href="preparing-checklist-calibration-index.html#cb49-14"></a>)</span></code></pre></div>
 <div class="sourceCode" id="cb50"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb50-1"><a href="preparing-checklist-calibration-index.html#cb50-1"></a><span class="co"># make dir if absent</span></span>
-<span id="cb50-2"><a href="preparing-checklist-calibration-index.html#cb50-2"></a><span class="cf">if</span> (<span class="op">!</span><span class="kw">dir.exists</span>(<span class="st">&quot;data/modOutput&quot;</span>)) {</span>
-<span id="cb50-3"><a href="preparing-checklist-calibration-index.html#cb50-3"></a>  <span class="kw">dir.create</span>(<span class="st">&quot;data/modOutput&quot;</span>)</span>
+<span id="cb50-2"><a href="preparing-checklist-calibration-index.html#cb50-2"></a><span class="cf">if</span> (<span class="op">!</span><span class="kw">dir.exists</span>(<span class="st">&quot;results/modOutput&quot;</span>)) {</span>
+<span id="cb50-3"><a href="preparing-checklist-calibration-index.html#cb50-3"></a>  <span class="kw">dir.create</span>(<span class="st">&quot;results/modOutput&quot;</span>)</span>
 <span id="cb50-4"><a href="preparing-checklist-calibration-index.html#cb50-4"></a>}</span>
 <span id="cb50-5"><a href="preparing-checklist-calibration-index.html#cb50-5"></a></span>
 <span id="cb50-6"><a href="preparing-checklist-calibration-index.html#cb50-6"></a><span class="co"># write model output to text file</span></span>
 <span id="cb50-7"><a href="preparing-checklist-calibration-index.html#cb50-7"></a>{</span>
 <span id="cb50-8"><a href="preparing-checklist-calibration-index.html#cb50-8"></a>  <span class="kw">writeLines</span>(R.utils<span class="op">::</span><span class="kw">captureOutput</span>(<span class="kw">list</span>(<span class="kw">Sys.time</span>(), <span class="kw">summary</span>(modObsExp))),</span>
-<span id="cb50-9"><a href="preparing-checklist-calibration-index.html#cb50-9"></a>    <span class="dt">con =</span> <span class="st">&quot;data/modOutput/03_model-output-expertise.txt&quot;</span></span>
+<span id="cb50-9"><a href="preparing-checklist-calibration-index.html#cb50-9"></a>    <span class="dt">con =</span> <span class="st">&quot;results/modOutput/04_model-output-expertise.txt&quot;</span></span>
 <span id="cb50-10"><a href="preparing-checklist-calibration-index.html#cb50-10"></a>  )</span>
 <span id="cb50-11"><a href="preparing-checklist-calibration-index.html#cb50-11"></a>}</span></code></pre></div>
 <div class="sourceCode" id="cb51"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb51-1"><a href="preparing-checklist-calibration-index.html#cb51-1"></a><span class="co"># make df with means</span></span>
@@ -582,7 +581,7 @@ <h2><span class="header-section-number">5.8</span> Model observer expertise</h2>
 <span id="cb51-17"><a href="preparing-checklist-calibration-index.html#cb51-17"></a>) <span class="op">%&gt;%</span></span>
 <span id="cb51-18"><a href="preparing-checklist-calibration-index.html#cb51-18"></a><span class="st">  </span><span class="kw">mutate</span>(<span class="dt">score =</span> scales<span class="op">::</span><span class="kw">rescale</span>(score))</span></code></pre></div>
 <div class="sourceCode" id="cb52"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb52-1"><a href="preparing-checklist-calibration-index.html#cb52-1"></a><span class="kw">fwrite</span>(dfPredict <span class="op">%&gt;%</span><span class="st"> </span>dplyr<span class="op">::</span><span class="kw">select</span>(observer, score),</span>
-<span id="cb52-2"><a href="preparing-checklist-calibration-index.html#cb52-2"></a>  <span class="dt">file =</span> <span class="st">&quot;data/03_data-obsExpertise-score.csv&quot;</span></span>
+<span id="cb52-2"><a href="preparing-checklist-calibration-index.html#cb52-2"></a>  <span class="dt">file =</span> <span class="st">&quot;results/04_data-obsExpertise-score.csv&quot;</span></span>
 <span id="cb52-3"><a href="preparing-checklist-calibration-index.html#cb52-3"></a>)</span></code></pre></div>
 
 </div>
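The observer-expertise chunks fit a Poisson model of species counts, write its summary under `results/modOutput/`, and rescale predicted scores with `scales::rescale()`. The sketch below reproduces that flow on simulated data, with several substitutions:

- The toy data frame, its column names and the plain `glm()` model are assumptions standing in for `ebdChkSummary` and `modObsExp`.
- Base `capture.output()` replaces `R.utils::captureOutput()`.
- Output goes to a temporary folder.

Passing `recursive = TRUE` to `dir.create()` also creates a missing parent `results/` folder.

```r
# simulated checklists: three hypothetical observers with increasing expertise
set.seed(1)
toy <- data.frame(
  observer = rep(c("obs1", "obs2", "obs3"), each = 20),
  duration = runif(60, 10, 120)
)
toy$nSp <- rpois(60, lambda = exp(1 + 0.01 * toy$duration + rep(c(0, 0.3, 0.6), each = 20)))

# plain Poisson GLM used here as a simplified stand-in for modObsExp
mod <- glm(nSp ~ duration + observer, data = toy, family = "poisson")

# create the output folder, including any missing parent folders
out_dir <- file.path(tempdir(), "results", "modOutput")
dir.create(out_dir, recursive = TRUE, showWarnings = FALSE)
writeLines(capture.output(list(Sys.time(), summary(mod))),
  con = file.path(out_dir, "04_model-output-expertise.txt")
)

# predict at the mean duration for each observer, then min-max rescale to [0, 1]
dfPredict <- data.frame(observer = c("obs1", "obs2", "obs3"), duration = mean(toy$duration))
dfPredict$score <- predict(mod, newdata = dfPredict, type = "response")
rng <- range(dfPredict$score)
dfPredict$score <- (dfPredict$score - rng[1]) / (rng[2] - rng[1])
```

With its default arguments, `scales::rescale()` performs the same min-max mapping onto [0, 1] as the last two lines.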
diff --git a/docs/preparing-ebird-data.html b/docs/preparing-ebird-data.html
index 6aeca0b..e2e421f 100644
--- a/docs/preparing-ebird-data.html
+++ b/docs/preparing-ebird-data.html
@@ -23,7 +23,7 @@
 
 
 
-<meta name="date" content="2022-04-26" />
+<meta name="date" content="2022-04-28" />
 
   <meta name="viewport" content="width=device-width, initial-scale=1" />
   <meta name="apple-mobile-web-app-capable" content="yes" />
@@ -134,8 +134,7 @@
 
 <ul class="summary">
 <li class="chapter" data-level="1" data-path="index.html"><a href="index.html"><i class="fa fa-check"></i><b>1</b> Introduction</a><ul>
-<li class="chapter" data-level="1.1" data-path="index.html"><a href="index.html#data-access"><i class="fa fa-check"></i><b>1.1</b> Data access</a></li>
-<li class="chapter" data-level="1.2" data-path="index.html"><a href="index.html#data-processing"><i class="fa fa-check"></i><b>1.2</b> Data processing</a></li>
+<li class="chapter" data-level="1.1" data-path="index.html"><a href="index.html#data-processing"><i class="fa fa-check"></i><b>1.1</b> Data processing</a></li>
 </ul></li>
 <li class="chapter" data-level="2" data-path="selecting-species-of-interest.html"><a href="selecting-species-of-interest.html"><i class="fa fa-check"></i><b>2</b> Selecting species of interest</a><ul>
 <li class="chapter" data-level="2.1" data-path="selecting-species-of-interest.html"><a href="selecting-species-of-interest.html#prepare-libraries"><i class="fa fa-check"></i><b>2.1</b> Prepare libraries</a></li>
@@ -325,7 +324,7 @@ <h2><span class="header-section-number">3.2</span> Filter data</h2>
 <p>Insert the list of species that we will be analyzing in this study. We initially chose those species that occurred in at least 5% of all checklists across 50% of the 25 x 25 km cells from where they have been reported, resulting in a total of 79 species. To arrive at this final list of species, we carried out further pre-processing which can be found in the previous script.</p>
 <p>For further details regarding the list of species, please refer to the main text of the manuscript.</p>
 <div class="sourceCode" id="cb11"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb11-1"><a href="preparing-ebird-data.html#cb11-1"></a><span class="co"># add species of interest</span></span>
-<span id="cb11-2"><a href="preparing-ebird-data.html#cb11-2"></a>specieslist &lt;-<span class="st"> </span><span class="kw">read.csv</span>(<span class="st">&quot;data/species_list.csv&quot;</span>)</span>
+<span id="cb11-2"><a href="preparing-ebird-data.html#cb11-2"></a>specieslist &lt;-<span class="st"> </span><span class="kw">read.csv</span>(<span class="st">&quot;data/species-list.csv&quot;</span>)</span>
 <span id="cb11-3"><a href="preparing-ebird-data.html#cb11-3"></a>speciesOfInterest &lt;-<span class="st"> </span><span class="kw">as.character</span>(specieslist<span class="op">$</span>scientific_name)</span></code></pre></div>
 <p>Here, we set broad spatial filters for the states of Kerala, Tamil Nadu and Karnataka and keep only those checklists for our list of species that were reported between 1st Jan 2013 and 31st May 2021.</p>
 <div class="sourceCode" id="cb12"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb12-1"><a href="preparing-ebird-data.html#cb12-1"></a><span class="co"># run filters using auk packages</span></span>
@@ -421,7 +420,7 @@ <h2><span class="header-section-number">3.4</span> Spatial filter</h2>
 <span id="cb18-32"><a href="preparing-ebird-data.html#cb18-32"></a>  })</span></code></pre></div>
 <p>Save temporary data created so far.</p>
 <div class="sourceCode" id="cb19"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb19-1"><a href="preparing-ebird-data.html#cb19-1"></a><span class="co"># save a temp data file</span></span>
-<span id="cb19-2"><a href="preparing-ebird-data.html#cb19-2"></a><span class="kw">save</span>(data, <span class="dt">file =</span> <span class="st">&quot;data/01_data_temp.rdata&quot;</span>)</span></code></pre></div>
+<span id="cb19-2"><a href="preparing-ebird-data.html#cb19-2"></a><span class="kw">save</span>(data, <span class="dt">file =</span> <span class="st">&quot;results/02_data_temp.rdata&quot;</span>)</span></code></pre></div>
 </div>
 <div id="handle-presence-data" class="section level2">
 <h2><span class="header-section-number">3.5</span> Handle presence data</h2>
@@ -504,7 +503,7 @@ <h2><span class="header-section-number">3.6</span> Add decimal time</h2>
 <span id="cb21-15"><a href="preparing-ebird-data.html#cb21-15"></a>assertthat<span class="op">::</span><span class="kw">assert_that</span>(<span class="op">!</span><span class="st">&quot;sf&quot;</span> <span class="op">%in%</span><span class="st"> </span><span class="kw">class</span>(dataGrouped))</span></code></pre></div>
 <p>The above data is saved to a file.</p>
 <div class="sourceCode" id="cb22"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb22-1"><a href="preparing-ebird-data.html#cb22-1"></a><span class="co"># save a temp data file</span></span>
-<span id="cb22-2"><a href="preparing-ebird-data.html#cb22-2"></a><span class="kw">save</span>(dataGrouped, <span class="dt">file =</span> <span class="st">&quot;data/01_data_prelim_processing.Rdata&quot;</span>)</span></code></pre></div>
+<span id="cb22-2"><a href="preparing-ebird-data.html#cb22-2"></a><span class="kw">save</span>(dataGrouped, <span class="dt">file =</span> <span class="st">&quot;results/02_data_prelim_processing.Rdata&quot;</span>)</span></code></pre></div>
 
 </div>
 </div>
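The preliminary data are stored with `save()` as `results/02_data_prelim_processing.Rdata`. `save()` records objects by name, and `load()` restores them under that same name, so a later script must refer to `dataGrouped` rather than assign the result to a new variable. The round-trip sketch below uses a hypothetical two-row data frame and a temporary folder in place of `results/`.

```r
out_dir <- file.path(tempdir(), "results")
dir.create(out_dir, recursive = TRUE, showWarnings = FALSE)

# hypothetical stand-in for the grouped checklist data
dataGrouped <- data.frame(checklist_id = c("S1", "S2"), n_species = c(3L, 5L))
path <- file.path(out_dir, "02_data_prelim_processing.Rdata")
save(dataGrouped, file = path)

# remove the object, then restore it from disk under its original name
rm(dataGrouped)
restored <- load(path) # load() returns the names of the restored objects
print(restored)
```

If a different variable name is preferred downstream, `saveRDS()` and `readRDS()` store a single object without its name.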
diff --git a/docs/preparing-environmental-predictors.html b/docs/preparing-environmental-predictors.html
index 32c3911..499a981 100644
--- a/docs/preparing-environmental-predictors.html
+++ b/docs/preparing-environmental-predictors.html
@@ -23,7 +23,7 @@
 
 
 
-<meta name="date" content="2022-04-26" />
+<meta name="date" content="2022-04-28" />
 
   <meta name="viewport" content="width=device-width, initial-scale=1" />
   <meta name="apple-mobile-web-app-capable" content="yes" />
@@ -134,8 +134,7 @@
 
 <ul class="summary">
 <li class="chapter" data-level="1" data-path="index.html"><a href="index.html"><i class="fa fa-check"></i><b>1</b> Introduction</a><ul>
-<li class="chapter" data-level="1.1" data-path="index.html"><a href="index.html#data-access"><i class="fa fa-check"></i><b>1.1</b> Data access</a></li>
-<li class="chapter" data-level="1.2" data-path="index.html"><a href="index.html#data-processing"><i class="fa fa-check"></i><b>1.2</b> Data processing</a></li>
+<li class="chapter" data-level="1.1" data-path="index.html"><a href="index.html#data-processing"><i class="fa fa-check"></i><b>1.1</b> Data processing</a></li>
 </ul></li>
 <li class="chapter" data-level="2" data-path="selecting-species-of-interest.html"><a href="selecting-species-of-interest.html"><i class="fa fa-check"></i><b>2</b> Selecting species of interest</a><ul>
 <li class="chapter" data-level="2.1" data-path="selecting-species-of-interest.html"><a href="selecting-species-of-interest.html#prepare-libraries"><i class="fa fa-check"></i><b>2.1</b> Prepare libraries</a></li>
@@ -299,7 +298,7 @@ <h1>
 <h1><span class="header-section-number">Section 4</span> Preparing Environmental Predictors</h1>
 <p>In this script, we processed climatic and landscape predictors for occupancy modeling.</p>
 <p>All climatic data was obtained from <a href="https://chelsa-climate.org/bioclim/" class="uri">https://chelsa-climate.org/bioclim/</a><br />
-All landscape data was derived from a high resolution land cover map (Roy et al. 2015). This map provides sufficient classes to achieve a high land cover resolution and can be accessed here (<a href="https://daac.ornl.gov/VEGETATION/guides/Decadal_LULC_India.html" class="uri">https://daac.ornl.gov/VEGETATION/guides/Decadal_LULC_India.html</a>)</p>
+All landscape data was derived from a high-resolution land cover map (Roy et al. 2015). This map provides enough land cover classes to achieve a fine thematic resolution and can be accessed here (<a href="https://daac.ornl.gov/VEGETATION/guides/Decadal_LULC_India.html" class="uri">https://daac.ornl.gov/VEGETATION/guides/Decadal_LULC_India.html</a>).</p>
 <p>The goal here is to resample all rasters so that they have the same resolution of 1km cells.</p>
 <div id="prepare-libraries-1" class="section level2">
 <h2><span class="header-section-number">4.1</span> Prepare libraries</h2>
@@ -526,7 +525,7 @@ <h2><span class="header-section-number">4.5</span> Resample landcover from 10m t
 <span id="cb33-15"><a href="preparing-environmental-predictors.html#cb33-15"></a>)</span>
 <span id="cb33-16"><a href="preparing-environmental-predictors.html#cb33-16"></a></span>
 <span id="cb33-17"><a href="preparing-environmental-predictors.html#cb33-17"></a><span class="co"># read reclassification matrix</span></span>
-<span id="cb33-18"><a href="preparing-environmental-predictors.html#cb33-18"></a>reclassification_matrix &lt;-<span class="st"> </span><span class="kw">read.csv</span>(<span class="st">&quot;data/landUseClassification/LandCover_ReclassifyMatrix_2015.csv&quot;</span>)</span>
+<span id="cb33-18"><a href="preparing-environmental-predictors.html#cb33-18"></a>reclassification_matrix &lt;-<span class="st"> </span><span class="kw">read.csv</span>(<span class="st">&quot;data/landUseClassification/reclassification-matrix-landCover-2015.csv&quot;</span>)</span>
 <span id="cb33-19"><a href="preparing-environmental-predictors.html#cb33-19"></a>reclassification_matrix &lt;-<span class="st"> </span><span class="kw">as.matrix</span>(reclassification_matrix[, <span class="kw">c</span>(<span class="st">&quot;V1&quot;</span>, <span class="st">&quot;To&quot;</span>)])</span>
 <span id="cb33-20"><a href="preparing-environmental-predictors.html#cb33-20"></a></span>
 <span id="cb33-21"><a href="preparing-environmental-predictors.html#cb33-21"></a><span class="co"># reclassify</span></span>
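The land cover chunk reads a reclassification table and keeps its `V1` and `To` columns, presumably the original and target classes, as a two-column matrix before reclassifying the raster. The base-R sketch below shows the lookup such a matrix encodes, applied to a plain vector instead of a raster. The class codes are hypothetical. In this sketch, codes missing from the matrix become `NA`.

```r
# hypothetical reclassification table with the same column names as the CSV
reclassification_matrix <- data.frame(
  V1 = c(1, 2, 3, 4),
  To = c(10, 10, 20, 30)
)
rcl <- as.matrix(reclassification_matrix[, c("V1", "To")])

# hypothetical land cover codes read from raster cells
landcover_values <- c(1, 3, 4, 2, NA, 5)

# look up each value in the "V1" column and return the matching "To" value
reclassified <- unname(rcl[match(landcover_values, rcl[, "V1"]), "To"])
print(reclassified) # 10 20 30 10 NA NA
```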
diff --git a/docs/reference-keys.txt b/docs/reference-keys.txt
index b2931b7..4300867 100644
--- a/docs/reference-keys.txt
+++ b/docs/reference-keys.txt
@@ -1,5 +1,4 @@
 introduction
-data-access
 data-processing
 selecting-species-of-interest
 prepare-libraries
diff --git a/docs/references.html b/docs/references.html
index 4fefc87..4341e48 100644
--- a/docs/references.html
+++ b/docs/references.html
@@ -23,7 +23,7 @@
 
 
 
-<meta name="date" content="2022-04-26" />
+<meta name="date" content="2022-04-28" />
 
   <meta name="viewport" content="width=device-width, initial-scale=1" />
   <meta name="apple-mobile-web-app-capable" content="yes" />
@@ -134,8 +134,7 @@
 
 <ul class="summary">
 <li class="chapter" data-level="1" data-path="index.html"><a href="index.html"><i class="fa fa-check"></i><b>1</b> Introduction</a><ul>
-<li class="chapter" data-level="1.1" data-path="index.html"><a href="index.html#data-access"><i class="fa fa-check"></i><b>1.1</b> Data access</a></li>
-<li class="chapter" data-level="1.2" data-path="index.html"><a href="index.html#data-processing"><i class="fa fa-check"></i><b>1.2</b> Data processing</a></li>
+<li class="chapter" data-level="1.1" data-path="index.html"><a href="index.html#data-processing"><i class="fa fa-check"></i><b>1.1</b> Data processing</a></li>
 </ul></li>
 <li class="chapter" data-level="2" data-path="selecting-species-of-interest.html"><a href="selecting-species-of-interest.html"><i class="fa fa-check"></i><b>2</b> Selecting species of interest</a><ul>
 <li class="chapter" data-level="2.1" data-path="selecting-species-of-interest.html"><a href="selecting-species-of-interest.html#prepare-libraries"><i class="fa fa-check"></i><b>2.1</b> Prepare libraries</a></li>
diff --git a/docs/search_index.json b/docs/search_index.json
index 98911f1..9430a93 100644
--- a/docs/search_index.json
+++ b/docs/search_index.json
@@ -1 +1 @@
-[["index.html", "Source code and supplementary material for Using citizen science to parse climatic and landcover influences on bird occupancy within a tropical biodiversity hotspot Section 1 Introduction 1.1 Data access 1.2 Data processing", " Source code and supplementary material for Using citizen science to parse climatic and landcover influences on bird occupancy within a tropical biodiversity hotspot 2022-04-26 Section 1 Introduction This is the readable version containing analysis that models associations between environmental predictors (climate and landcover) and citizen science observations of birds across the Nilgiri and Anamalai Hills of the Western Ghats Biodiversity Hotspot. Methods and format are derived from https://cornelllabofornithology.github.io/ebird-best-practices/. 1.1 Data access The data used in this work are available from eBird. 1.2 Data processing The data processing for this project is described in the following sections. Navigate through them using the links in the sidebar. The Nilgiri and Anamalai Hills in southern India provide a convenient geography for studying the interplay of land cover and climate on the distributions of bird species. (a) The Nilgiri and Anamalai Hills of the Southern Western Ghats are topographically complex, with maximum elevations &gt; 2,000 m, and are separated by the very low-lying Palghat Gap, which serves as a natural barrier to the dispersal of many hill birds. (b) Lower elevations are primarily covered by agriculture and settlements, reflecting the intense human pressure on this region, while mid- and higher elevations show a mix of natural and human-modified land cover types (see Fig. 2 for details). (c) The coastal edge of the area, and the windward hill slopes show limited temperature seasonality across the December – May period; this seasonality increases with distance from the coast but is lower at higher elevations inland. 
(d) Higher elevations also show limited precipitation seasonality than both low-lying coastal and inland regions. Our study area (bounds shown as dashed lines) includes multiple combinations of elevation, land cover type, and temperature and rainfall seasonality, resulting in a naturally occurring crossed-factorial design that allows us to study the effects of climate and land cover on bird occupancy. Representative forest-restricted and habitat-generalist birds from the study area are shown between panels (all images were obtained from Wikimedia commons and credit is assigned for each species in brackets); From L to R: (1) Malabar grey hornbill (by Koshy), (2) Crimson-backed sunbird (by Mandar Godbole), (3) Asian emerald dove (by Selvaganesh), (4) Black-and-orange flycatcher (by LKanth), (5) Grey-headed canary flycatcher (by David Raju), (6) Greater-racket tailed drongo (by MD Shahanshah Bappy), (7) Eurasian hoopoe (by Zeynel cebeci), (8) Chestnut-headed bee-eater (by MikeBirds), (9) Coppersmith barbet (by Raju Kasambe), (10) Red-vented bulbul (by TR Shankar Raman), (11) Pied bushchat (by TR Shankar Raman), (12) Ashy prinia (by Rison Thumboor). Elevation is from 30 m resolution SRTM data (Farr et al. 2007), land cover, at 1 km resolution, is reclassified from Roy et al. (2015), while climatic variation is represented by CHELSA seasonality layers (temperature: BIOCLIM 4a, rainfall: BIOCLIM 15), at 1km resolution (Karger et al. 2017). All layers were resampled to 1 km resolution for analyses. 
"],["selecting-species-of-interest.html", "Section 2 Selecting species of interest 2.1 Prepare libraries 2.2 Subset species by geographical confines of the study area 2.3 Subset an initial list of terrestrial birds based on a) minimum of 1000 detections between 2013-2021 and b) remove species that are often easily confused with congeners 2.4 Read subset of species following filtering and removal of waterbirds, raptors, and other noctural species 2.5 Load raw data for locations 2.6 Get proportional obs counts in 25km cells 2.7 Which species are reported sufficiently in checklists? 2.8 Figure: Checklist distribution 2.9 Prepare the species list", " Section 2 Selecting species of interest Prior to preparing eBird data for occupancy modeling, we selected a list of species using a simple and objective criteria. Our primary focus is to understand how terrestrial bird species occupancy (largely passerine species) varied as a function of climate and land cover across the Nilgiri and the Anamalai hills of the Western Ghats. We derived this list from inclusion criteria adapted from the State of India’s Birds 2020. Initially, we considered all species reported on eBird that occurred within the outlines of our study area. We then added a filter to consider only terrestrial birds and removed species that are often easily confused for their congeners (eg. green/greenish warbler). In addition, we considered only those species that had a minimum of 1000 detections each between 2013 and 2021. Next, the study area was divided into 25 x 25 km cells following Viswanathan et al. (2020). We then kept only those species that occurred in at least 5% of all checklists across half of the 25 x 25 km cells from where they have been reported (there are 42 unique 25 x 25 km grid cells across our study area). 
We used these criteria to make sampling of each species as uniform as possible across our study area and to reduce spurious associations between environmental drivers and species occupancy. This resulted in a total of 79 species prior to occupancy modeling. This script shows the proportion of checklists that report a particular species in every 25 km x 25 km grid cell across the Nilgiris and the Anamalais. Using this analysis, we arrived at a final list of species for occupancy modeling. 2.1 Prepare libraries # load libraries library(data.table) library(readxl) library(magrittr) library(stringr) library(dplyr) library(tidyr) library(readr) library(ggplot2) library(ggthemes) library(scico) # round to the nearest multiple of accuracy (default 25 km, in metres) round_any &lt;- function(x, accuracy = 25000) { round(x / accuracy) * accuracy } # set file paths for auk functions # To use these two datasets, please download the latest versions from https://ebird.org/data/download and set the file path accordingly. Since these two datasets are extremely large, we have not uploaded them to GitHub. # In this study, the version of data loaded corresponds to November 2021. f_in_ebd &lt;- file.path(&quot;data/ebd_IN_relNov-2021.txt&quot;) f_in_sampling &lt;- file.path(&quot;data/ebd_sampling_relNov-2021.txt&quot;) 2.2 Subset species by geographical confines of the study area # read in shapefile of the study area to subset by bounding box library(sf) wg &lt;- st_read(&quot;data/spatial/hillsShapefile/Nil_Ana_Pal.shp&quot;) box &lt;- st_bbox(wg) # read in data and subset # To access the latest dataset, please visit: https://ebird.org/data/download and set the file path accordingly. 
ebd &lt;- fread(&quot;data/ebd_IN_relNov-2021.txt&quot;) ebd &lt;- ebd[between(LONGITUDE, box[&quot;xmin&quot;], box[&quot;xmax&quot;]) &amp; between(LATITUDE, box[&quot;ymin&quot;], box[&quot;ymax&quot;]), ] ebd &lt;- ebd[year(`OBSERVATION DATE`) &gt;= 2013, ] # make new column names newNames &lt;- str_replace_all(colnames(ebd), &quot; &quot;, &quot;_&quot;) %&gt;% str_to_lower() setnames(ebd, newNames) # keep useful columns columnsOfInterest &lt;- c( &quot;common_name&quot;, &quot;scientific_name&quot;, &quot;observation_count&quot;, &quot;locality&quot;, &quot;locality_id&quot;, &quot;locality_type&quot;, &quot;latitude&quot;, &quot;longitude&quot;, &quot;observation_date&quot;, &quot;sampling_event_identifier&quot; ) ebd &lt;- ebd[, ..columnsOfInterest] 2.3 Subset an initial list of terrestrial birds based on a) minimum of 1000 detections between 2013-2021 and b) remove species that are often easily confused with congeners # Convert all presences marked &#39;X&#39; to &#39;1&#39; ebd &lt;- ebd %&gt;% mutate(observation_count = ifelse(observation_count == &quot;X&quot;, &quot;1&quot;, observation_count )) # Convert observation count to numeric ebd$observation_count &lt;- as.numeric(ebd$observation_count) totCount &lt;- ebd %&gt;% dplyr::select(scientific_name, common_name, observation_count) %&gt;% group_by(scientific_name, common_name) %&gt;% summarise(tot = sum(observation_count)) # subset species with a minimum of 1000 detections tot1000 &lt;- totCount %&gt;% filter(tot &gt;= 1000) species1000 &lt;- tot1000$scientific_name ebd1000 &lt;- ebd[scientific_name %in% species1000, ] # Beginning with 3.37 million observations of 684 species in eBird that occurred within the outlines of our study area (Fig. 1), over the years 2013 to 2021, we retained only those species that had a minimum of 1,000 detections each between 2013 and 2021 (347 species remaining; 3.33 million observations). 
Next, we divided the study area into 25 x 25 km cells following the State of India&#39;s Birds 2020 methodology. We kept only those species that were reported on at least 5% of checklists in at least half of the grid cells where they occurred (the study area spans 42 unique grid cells). # export the above list as .csv to carry out initial filtering based on natural history write.csv(totCount, &quot;data/species_list.csv&quot;, row.names = F) 2.4 Read subset of species following filtering and removal of waterbirds, raptors, and other nocturnal species # add species of interest # note: data/species_list.csv is read back in after being subset manually based on natural history, so please examine the file exported in the previous step before further processing specieslist &lt;- read.csv(&quot;data/species_list.csv&quot;) speciesOfInterest &lt;- specieslist$scientific_name 2.5 Load raw data for locations Add a spatial filter and assign grids of 25km x 25km. # strict spatial filter and assign grid locs &lt;- ebd[, .(longitude, latitude)] # transform to UTM and get 25km boxes coords &lt;- setDF(locs) %&gt;% st_as_sf(coords = c(&quot;longitude&quot;, &quot;latitude&quot;)) %&gt;% `st_crs&lt;-`(4326) %&gt;% bind_cols(as.data.table(st_coordinates(.))) %&gt;% st_transform(32643) %&gt;% mutate(id = 1:nrow(.)) # convert wg to UTM for filter wg &lt;- st_transform(wg, 32643) coords &lt;- coords %&gt;% filter(id %in% unlist(st_contains(wg, coords))) %&gt;% rename(longitude = X, latitude = Y) %&gt;% bind_cols(as.data.table(st_coordinates(.))) %&gt;% st_drop_geometry() %&gt;% as.data.table() # remove unneeded objects rm(locs) gc() coords &lt;- coords[, .N, by = .(longitude, latitude, X, Y)] ebd &lt;- merge(ebd, coords, all = FALSE, by = c(&quot;longitude&quot;, &quot;latitude&quot;)) ebd &lt;- ebd[(longitude %in% coords$longitude) &amp; (latitude %in% coords$latitude), ] 2.6 Get proportional obs counts in 25km cells # round to 25km cell in UTM coords ebd[, `:=`(X = round_any(X), Y = 
round_any(Y))] # count checklists in cell ebd_summary &lt;- ebd[, nchk := length(unique(sampling_event_identifier)), by = .(X, Y) ] # count checklists reporting each species in cell and get proportion ebd_summary &lt;- ebd_summary[, .(nrep = length(unique( sampling_event_identifier ))), by = .(X, Y, nchk, scientific_name) ] ebd_summary[, p_rep := nrep / nchk] # filter for soi ebd_summary &lt;- ebd_summary[scientific_name %in% speciesOfInterest, ] # complete the dataframe for no reports # keep no reports as NA --- allows filtering based on proportion reporting ebd_summary &lt;- setDF(ebd_summary) %&gt;% complete( nesting(X, Y), scientific_name # , # fill = list(p_rep = 0) ) %&gt;% filter(!is.na(p_rep)) 2.7 Which species are reported sufficiently in checklists? # A total of 42 unique grids (of 25km by 25km) across the study area # total number of checklists across unique grids tot_n_chklist &lt;- ebd_summary %&gt;% distinct(X, Y, nchk) # species-specific number of grids spp_grids &lt;- ebd_summary %&gt;% group_by(scientific_name) %&gt;% distinct(X, Y) %&gt;% count(scientific_name, name = &quot;n_grids&quot; ) # Write the above two results write_csv(tot_n_chklist, &quot;data/01_nchk_per_grid.csv&quot;) write_csv(spp_grids, &quot;data/01_ngrids_per_spp.csv&quot;) # left-join the datasets ebd_summary &lt;- left_join(ebd_summary, spp_grids, by = &quot;scientific_name&quot;) # check the proportion of grids across which this cut-off is met for each species # Is it &gt; 90% or 70%? 
# For example, with a 3% cut-off, ~100 species occur in &gt;50% # of the grid cells they have been reported in p_cutoff &lt;- 0.05 # minimum proportion of checklists in a cell that must report the species grid_cutoff &lt;- 0.5 # minimum proportion of a species&#39; cells in which p_cutoff must be met grid_proportions &lt;- ebd_summary %&gt;% group_by(scientific_name, n_grids) %&gt;% tally(p_rep &gt;= p_cutoff) %&gt;% mutate(prop_grids_cut = n / n_grids) %&gt;% ungroup() %&gt;% arrange(desc(prop_grids_cut)) grid_prop_cut &lt;- filter( grid_proportions, prop_grids_cut &gt;= grid_cutoff ) # Write the results write_csv(grid_prop_cut, &quot;data/01_chk_5_percent.csv&quot;) # Identifying the number of species that occur in potentially &lt;5% of all lists total_number_lists &lt;- sum(tot_n_chklist$nchk) spp_sum_chk &lt;- ebd_summary %&gt;% distinct(X, Y, scientific_name, nrep) %&gt;% group_by(scientific_name) %&gt;% mutate(sum_chk = sum(nrep)) %&gt;% distinct(scientific_name, sum_chk) # Approximately 90 to 100 species occur in &gt;5% of all checklists prop_all_lists &lt;- spp_sum_chk %&gt;% mutate(prop_lists = sum_chk / total_number_lists) %&gt;% arrange(desc(prop_lists)) 2.8 Figure: Checklist distribution # add land library(rnaturalearth) land &lt;- ne_countries( scale = 50, type = &quot;countries&quot;, continent = &quot;asia&quot;, country = &quot;india&quot;, returnclass = c(&quot;sf&quot;) ) # crop land land &lt;- st_transform(land, 32643) Proportion of checklists reporting a species in each grid cell (25 km side) between 2013 and 2021. Checklists were filtered to be within the boundaries of the Nilgiris and the Anamalai hills (black outline), but rounding to 25 km cells may place cells outside the boundary. Deeper shades of red indicate a higher proportion of checklists reporting a species. 
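round_any() snaps each UTM coordinate to the nearest multiple of 25,000 m. A checklist can therefore lie up to 12.5 km (half a cell width) from the coordinate of the cell it is assigned to. This is why rounded cells can extend beyond the hill boundary in the figure. Below is a small sketch with hypothetical eastings:

```r
# round_any() as defined in section 2.1: snap a coordinate (metres) to the
# nearest multiple of accuracy
round_any <- function(x, accuracy = 25000) {
  round(x / accuracy) * accuracy
}

# hypothetical UTM 43N eastings of three checklists, in metres
x <- c(712400, 724999, 737600)
cells <- round_any(x)     # 700000 725000 750000
offset <- abs(x - cells)  # distance from each checklist to its cell coordinate
print(offset)
```

No offset can exceed half of accuracy, which is 12,500 m here.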
2.9 Prepare the species list # write the new list of species that occur in at least 5% of checklists across a minimum of 50% of the grids they have been reported in new_sp_list &lt;- semi_join(specieslist, grid_prop_cut, by = &quot;scientific_name&quot;) write_csv(new_sp_list, &quot;data/01_list-of-species-cutoff.csv&quot;) "],["preparing-ebird-data.html", "Section 3 Preparing eBird Data 3.1 Prepare libraries and data sources 3.2 Filter data 3.3 Process filtered data 3.4 Spatial filter 3.5 Handle presence data 3.6 Add decimal time", " Section 3 Preparing eBird Data 3.1 Prepare libraries and data sources Here, we load the libraries required for preparing the eBird data. Please download the latest versions of the eBird Basic Dataset (for India) and the eBird Sampling dataset from https://ebird.org/data/download. # load libraries library(tidyverse) library(readr) library(sf) library(auk) library(readxl) library(lubridate) # custom sum function sum.no.na &lt;- function(x) { sum(x, na.rm = T) } # set file paths for auk functions # To use these two datasets, please download the latest versions from https://ebird.org/data/download and set the file path accordingly. Since these two datasets are extremely large, we have not uploaded them to GitHub. # In this study, the version of data loaded corresponds to November 2021. f_in_ebd &lt;- file.path(&quot;data/ebd_IN_relNov-2021.txt&quot;) f_in_sampling &lt;- file.path(&quot;data/ebd_sampling_relNov-2021.txt&quot;) 3.2 Filter data Next, we read in the list of species analyzed in this study. We initially chose those species that were reported on at least 5% of checklists in at least 50% of the 25 x 25 km cells where they occurred, resulting in a total of 79 species. To arrive at this final list of species, we carried out further pre-processing, which can be found in the previous script. For further details regarding the list of species, please refer to the main text of the manuscript. 
# add species of interest specieslist &lt;- read.csv(&quot;data/species_list.csv&quot;) speciesOfInterest &lt;- as.character(specieslist$scientific_name) Here, we set broad spatial filters for the states of Kerala, Tamil Nadu and Karnataka and keep only those checklists for our list of species that were reported between 1st Jan 2013 and 31st May 2021. # run filters using the auk package ebd_filters &lt;- auk_ebd(f_in_ebd, f_in_sampling) %&gt;% auk_species(speciesOfInterest) %&gt;% auk_country(country = &quot;IN&quot;) %&gt;% auk_state(c(&quot;IN-KL&quot;, &quot;IN-TN&quot;, &quot;IN-KA&quot;)) %&gt;% # Restricting geography to Tamil Nadu, Kerala &amp; Karnataka auk_date(c(&quot;2013-01-01&quot;, &quot;2021-05-31&quot;)) %&gt;% auk_complete() # check filters ebd_filters The code below need not be run if the data have already been filtered once and the output paths below point to the filtered files. NB: This is a computationally heavy process; run with caution. # specify output location and perform filter f_out_ebd &lt;- &quot;data/01_ebird-filtered-EBD-westernGhats.txt&quot; f_out_sampling &lt;- &quot;data/01_ebird-filtered-sampling-westernGhats.txt&quot; ebd_filtered &lt;- auk_filter(ebd_filters, file = f_out_ebd, file_sampling = f_out_sampling, overwrite = TRUE ) 3.3 Process filtered data The data have been filtered above using the auk functions. We now work with the filtered checklist observations. Note that we have not yet spatially filtered the checklists to the confines of our study area (the Nilgiris and the Anamalai hills); this step is carried out later. # read in the data ebd &lt;- read_ebd(f_out_ebd) eBird checklists only record whether a species was reported at a particular location. To arrive at absence data, we use a process known as zero-filling (Johnston et al. 2021), wherein a new dataframe is created with a 0 marked for each complete checklist on which the species was not reported. 
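The sketch below shows in base R what zero-filling does conceptually. It is not auk&#39;s implementation, and the checklist IDs are hypothetical. Every complete checklist is crossed with every species of interest, and a combination that was never reported becomes an inferred absence:

```r
# hypothetical complete checklists and the detections reported on them
checklists <- c('S1', 'S2', 'S3')
detections <- data.frame(
  checklist_id = c('S1', 'S1', 'S3'),
  scientific_name = c('Pycnonotus cafer', 'Prinia socialis', 'Pycnonotus cafer'),
  stringsAsFactors = FALSE
)
species <- c('Pycnonotus cafer', 'Prinia socialis')  # species of interest

# cross every checklist with every species of interest ...
zf <- expand.grid(checklist_id = checklists, scientific_name = species,
                  stringsAsFactors = FALSE)

# ... and flag a species as observed only if it was reported on that checklist;
# every other combination is an inferred absence
reported <- paste(detections$checklist_id, detections$scientific_name)
zf$species_observed <- paste(zf$checklist_id, zf$scientific_name) %in% reported
print(zf)
```

Absences are only meaningful for complete checklists, which is why auk_complete() is part of the filter above.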
# fill zeroes zf &lt;- auk_zerofill(f_out_ebd, f_out_sampling) new_zf &lt;- collapse_zerofill(zf) Let us now choose the columns needed for further analysis. # choose columns of interest columnsOfInterest &lt;- c( &quot;checklist_id&quot;, &quot;scientific_name&quot;, &quot;common_name&quot;, &quot;observation_count&quot;, &quot;locality&quot;, &quot;locality_id&quot;, &quot;locality_type&quot;, &quot;latitude&quot;, &quot;longitude&quot;, &quot;observation_date&quot;, &quot;time_observations_started&quot;, &quot;observer_id&quot;, &quot;sampling_event_identifier&quot;, &quot;protocol_type&quot;, &quot;duration_minutes&quot;, &quot;effort_distance_km&quot;, &quot;effort_area_ha&quot;, &quot;number_observers&quot;, &quot;species_observed&quot;, &quot;reviewed&quot; ) # make list of presence and absence data and choose cols of interest data &lt;- list(ebd, new_zf) %&gt;% map(function(x) { x %&gt;% select(one_of(columnsOfInterest)) }) # remove zerofills to save working memory rm(zf, new_zf) gc() # the zero-filled df also contains the presences already held in data[[1]]; keep only its absences so that presences are not counted twice data[[2]] &lt;- data[[2]] %&gt;% filter(species_observed == F) 3.4 Spatial filter A spatial filter is now applied to further restrict our list of observations to the confines of the Nilgiris and the Anamalai hills of the Western Ghats biodiversity hotspot. 
# load shapefile of the study area library(sf) hills &lt;- st_read(&quot;data/spatial/hillsShapefile/Nil_Ana_Pal.shp&quot;) # write a prelim filter by bounding box box &lt;- st_bbox(hills) # get data spatial coordinates dataLocs &lt;- data %&gt;% map(function(x) { select(x, longitude, latitude) %&gt;% filter(between(longitude, box[&quot;xmin&quot;], box[&quot;xmax&quot;]) &amp; between(latitude, box[&quot;ymin&quot;], box[&quot;ymax&quot;])) }) %&gt;% bind_rows() %&gt;% distinct() %&gt;% st_as_sf(coords = c(&quot;longitude&quot;, &quot;latitude&quot;)) %&gt;% st_set_crs(4326) %&gt;% st_intersection(hills) # get simplified data and drop geometry dataLocs &lt;- mutate(dataLocs, spatialKeep = T) %&gt;% bind_cols(., as_tibble(st_coordinates(dataLocs))) %&gt;% st_drop_geometry() # bind to data and then filter data &lt;- data %&gt;% map(function(x) { left_join(x, dataLocs, by = c(&quot;longitude&quot; = &quot;X&quot;, &quot;latitude&quot; = &quot;Y&quot;)) %&gt;% filter(spatialKeep == T) %&gt;% select(-Id, -spatialKeep) }) Save temporary data created so far. # save a temp data file save(data, file = &quot;data/01_data_temp.rdata&quot;) 3.5 Handle presence data Many checklists do not record species abundance, and an X is entered in such cases. Here, we convert all X notations to 1, indicating a presence (we are not concerned with abundance in this analysis). We also removed checklists where the duration in minutes was either not recorded or recorded as zero. We then added a sampling-effort filter following Johnston et al. (2021), keeping only checklists with a duration of at most 300 minutes, a distance travelled of at most 5 km, and a survey area of at most 500 ha. Finally, we excluded group checklists with more than 10 observers. 
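The checklist-level effort filters described above can be sketched as one logical mask on a toy table. The values are hypothetical, and the area filter is omitted for brevity. The thresholds mirror those in the code: duration above 0 and at most 300 minutes, distance at most 5 km, and at most 10 observers.

```r
# hypothetical checklist-level effort summaries
chk <- data.frame(
  sampling_event_identifier = c('S1', 'S2', 'S3', 'S4', 'S5', 'S6'),
  duration_minutes = c(45, 0, 320, 120, 60, NA),
  effort_distance_km = c(1.2, 0.5, 2.0, 4.0, 0.0, 1.0),
  number_observers = c(2, 1, 3, 1, 12, 2),
  stringsAsFactors = FALSE
)

keep <- with(chk,
  duration_minutes > 0 &          # drops zero durations (S2)
    duration_minutes <= 300 &     # drops very long checklists (S3)
    effort_distance_km <= 5 &
    number_observers <= 10        # drops large groups (S5)
)
# a missing duration gives NA in the mask; which() drops it (S6)
chk_filtered <- chk[which(keep), ]
print(chk_filtered)
```

Indexing with which() rather than the raw logical vector matters here. Indexing a data frame with an NA would otherwise return a row filled with NAs instead of dropping it.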
To model detection and occupancy covariates appropriately, we restricted our checklists to those recorded between December 1st and May 31st (the non-rainy months) and started between 5 AM and 7 PM. # in the first set, replace X, for presences, with 1 data[[1]] &lt;- data[[1]] %&gt;% mutate(observation_count = ifelse(observation_count == &quot;X&quot;, &quot;1&quot;, observation_count )) # remove records where duration is 0 data &lt;- map(data, function(x) filter(x, duration_minutes &gt; 0)) # group data by site and sampling event identifier # then, summarise relevant variables as the sum dataGrouped &lt;- map(data, function(x) { x %&gt;% group_by(sampling_event_identifier) %&gt;% summarise_at( vars( duration_minutes, effort_distance_km, effort_area_ha ), list(sum.no.na) ) }) # bind rows combining data frames, and filter dataGrouped &lt;- bind_rows(dataGrouped) %&gt;% filter( duration_minutes &lt;= 300, effort_distance_km &lt;= 5, effort_area_ha &lt;= 500 ) # get data identifiers, such as sampling identifier etc dataConstants &lt;- data %&gt;% bind_rows() %&gt;% select( sampling_event_identifier, time_observations_started, locality, locality_type, locality_id, observer_id, observation_date, scientific_name, observation_count, protocol_type, number_observers, longitude, latitude ) # join the summarised data with the identifiers, # using sampling_event_identifier as the key dataGrouped &lt;- left_join(dataGrouped, dataConstants, by = &quot;sampling_event_identifier&quot; ) # remove checklists (sampling events) with more than 10 observers count(dataGrouped, number_observers &gt; 10) # count how many have more than 10 observers dataGrouped &lt;- filter(dataGrouped, number_observers &lt;= 10) # keep only checklists between 5AM and 7PM dataGrouped &lt;- filter(dataGrouped, time_observations_started &gt;= &quot;05:00:00&quot; &amp; time_observations_started &lt;= &quot;19:00:00&quot;) # keep only checklists between December 1st and May 31st dataGrouped &lt;- filter(dataGrouped, 
month(observation_date) %in% c(1, 2, 3, 4, 5, 12)) 3.6 Add decimal time We added a column where time is denoted in decimal hours since midnight. # assign present or not, and get time in decimal hours since midnight library(lubridate) time_to_decimal &lt;- function(x) { x &lt;- hms(x, quiet = TRUE) hour(x) + minute(x) / 60 + second(x) / 3600 } # will cause issues if using time obs started as a linear effect and not quadratic # observation_count is character (X was replaced by 1 above), so convert it before comparing dataGrouped &lt;- mutate(dataGrouped, pres_abs = as.numeric(observation_count) &gt;= 1, decimalTime = time_to_decimal(time_observations_started) ) # check class of dataGrouped, make sure not sf assertthat::assert_that(!&quot;sf&quot; %in% class(dataGrouped)) The above data is saved to a file. # save a temp data file save(dataGrouped, file = &quot;data/01_data_prelim_processing.Rdata&quot;) "],["preparing-environmental-predictors.html", "Section 4 Preparing Environmental Predictors 4.1 Prepare libraries 4.2 Prepare spatial extent 4.3 Prepare terrain rasters 4.4 Prepare CHELSA rasters 4.5 Resample landcover from 10m to 1km 4.6 Resample other rasters to 1km 4.7 Climate variables in relation to elevation 4.8 Climate across land cover types 4.9 Land cover type in relation to elevation 4.10 Main Text Figure 2", " Section 4 Preparing Environmental Predictors In this script, we processed climatic and landscape predictors for occupancy modeling. All climatic data were obtained from https://chelsa-climate.org/bioclim/. All landscape data were derived from a high-resolution land cover map (Roy et al. 2015). This map provides sufficient classes to achieve a high land cover resolution and can be accessed at https://daac.ornl.gov/VEGETATION/guides/Decadal_LULC_India.html. The goal here is to resample all rasters to the same resolution of 1 km. 4.1 Prepare libraries We load some common libraries for raster processing and define a custom mode function. 
# load libs library(raster) library(stringi) library(glue) library(gdalUtils) library(purrr) library(dplyr) library(tidyr) library(tibble) # for plotting library(viridis) library(colorspace) library(tmap) library(scales) library(ggplot2) library(patchwork) # mode (most frequent value) function for aggregating categorical data funcMode &lt;- function(x, na.rm = T) { if (na.rm) x &lt;- x[!is.na(x)] ux &lt;- unique(x) ux[which.max(tabulate(match(x, ux)))] } # a basic test assertthat::assert_that(funcMode(c(2, 2, 2, 2, 3, 3, 3, 4)) == 2, msg = &quot;problem in the mode function&quot; ) # works # get ci func: half-width of a normal-approximation 95% confidence interval ci &lt;- function(x) { qnorm(0.975) * sd(x, na.rm = T) / sqrt(sum(!is.na(x))) } 4.2 Prepare spatial extent We prepare a 30 km buffer around the boundary of the study area. This buffer will be used to mask the landscape rasters. The buffer procedure is done on data transformed to the UTM 43N CRS to avoid distortions. # load hills library(sf) hills &lt;- st_read(&quot;data/spatial/hillsShapefile/Nil_Ana_Pal.shp&quot;) hills &lt;- st_transform(hills, 32643) buffer &lt;- st_buffer(hills, 3e4) %&gt;% st_transform(4326) 4.3 Prepare terrain rasters We prepare the elevation data, an SRTM raster layer, and derive slope and aspect from it after cropping it to the extent of the study site buffer. 
Please download the latest version of the SRTM raster layer from https://www.worldclim.org/data/worldclim21.html. # load elevation and crop to the extent of the study-area buffer alt &lt;- raster(&quot;data/spatial/Elevation/alt&quot;) # this layer is not added to GitHub because of its large size and can be downloaded from the above link alt.hills &lt;- raster::crop(alt, as(buffer, &quot;Spatial&quot;)) rm(alt) gc() # get slope and aspect slopeData &lt;- raster::terrain(x = alt.hills, opt = c(&quot;slope&quot;, &quot;aspect&quot;)) elevData &lt;- raster::stack(alt.hills, slopeData) rm(alt.hills) gc() 4.4 Prepare CHELSA rasters CHELSA rasters can be downloaded using the get_chelsa.sh shell script, which is a wget command pointing to the envidatS3.txt file. 4.4.1 Prepare BIOCLIM 4a and 15 We prepare the CHELSA rasters for seasonality in temperature (Bio 4a) and seasonality in precipitation (Bio 15) in the same way: reading them in, cropping them to the study site buffer extent, and dividing the temperature layer values by 10. The CHELSA rasters can be downloaded from https://chelsa-climate.org/bioclim/. # list chelsa files # the chelsa data can be downloaded from the aforementioned link. They haven&#39;t been uploaded to GitHub because of their large size. 
chelsaFiles &lt;- list.files(&quot;data/chelsa/&quot;, full.names = TRUE, recursive = TRUE, pattern = &quot;bio10&quot; ) # gather chelsa rasters chelsaData &lt;- purrr::map(chelsaFiles, function(chr) { a &lt;- raster(chr) crs(a) &lt;- crs(elevData) a &lt;- crop(a, as(buffer, &quot;Spatial&quot;)) return(a) }) # divide temperature by 10 chelsaData[[1]] &lt;- chelsaData[[1]] / 10 # stack chelsa data chelsaData &lt;- raster::stack(chelsaData) 4.4.2 Prepare BIOCLIM 4a if (file.exists(&quot;data/chelsa/CHELSA_bio10_4a.tif&quot;)) { message(&quot;Bio 4a already exists, will be overwritten&quot;) } Bioclim 4a, temperature seasonality expressed as a coefficient of variation, is calculated as \\[Bio\\ 4a = \\frac{SD\\{ Tkavg_1, \\ldots, Tkavg_{12} \\}}{(Bio\\ 1 + 273.15)} \\times 100\\] where \\(Tkavg_i = (Tkmin_i + Tkmax_i) / 2\\) is the mean temperature of month \\(i\\) in kelvin. Here, we use only December and January to May, to capture winter temperature variation. # list rasters by pattern patterns &lt;- c(&quot;tmin&quot;, &quot;tmax&quot;) # list the filepaths tkAvg &lt;- map(patterns, function(pattern) { # list the paths files &lt;- list.files( path = &quot;data/chelsa&quot;, full.names = TRUE, recursive = TRUE, pattern = pattern ) }) # print crs elev data for sanity check --- basic WGS84 crs(elevData) # now run over the paths and read as rasters and crop by buffer tkAvg &lt;- map(tkAvg, function(paths) { # going over the file paths, read them in as rasters, convert CRS and crop tempData &lt;- map(paths, function(path) { # read in a &lt;- raster(path) # assign crs crs(a) &lt;- crs(elevData) # crop by buffer, will throw error if CRS doesn&#39;t match a &lt;- crop(a, as(buffer, &quot;Spatial&quot;)) # return a a }) # convert each to kelvin, first dividing by 10 to get celsius tempData &lt;- map(tempData, function(tmpRaster) { (tmpRaster / 10) + 273.15 }) }) # assign names names(tkAvg) &lt;- patterns # go over the tmin and tmax and get the average monthly temp tkAvg &lt;- map2(tkAvg[[&quot;tmin&quot;]], 
tkAvg[[&quot;tmax&quot;]], function(tmin, tmax) { # return the mean of the corresponding tmin and tmax # still in kelvin calc(stack(tmin, tmax), fun = mean) }) # calculate Bio 4a bio_4a &lt;- (calc(stack(tkAvg), fun = sd) / (chelsaData[[1]] + 273.15)) * 100 names(bio_4a) &lt;- &quot;CHELSA_bio10_4a&quot; # save bio_4a writeRaster(bio_4a, filename = &quot;data/chelsa/CHELSA_bio10_4a.tif&quot;, overwrite = T) 4.4.3 Prepare Bioclim 15 if (file.exists(&quot;data/chelsa/CHELSA_bio10_15.tif&quot;)) { message(&quot;Bio 15 already exists, will be overwritten&quot;) } Bioclim 15, precipitation (in our area, rainfall) seasonality expressed as a coefficient of variation, is calculated as \\[Bio\\ 15 = \\frac{SD\\{ PPT_1, \\ldots, PPT_{12} \\}}{1 + (Bio\\ 12 / 12)} \\times 100\\] where \\(PPT_i\\) is the monthly precipitation and Bio 12 is the annual precipitation. Here, we use only December and January to May, to capture winter rainfall variation. # list rasters by pattern pattern &lt;- &quot;prec&quot; # list the filepaths pptTotal &lt;- list.files( path = &quot;data/chelsa&quot;, full.names = TRUE, recursive = TRUE, pattern = pattern ) # print crs elev data for sanity check --- basic WGS84 crs(elevData) # now run over the paths and read as rasters and crop by buffer pptTotal &lt;- map(pptTotal, function(path) { a &lt;- raster(path) # assign crs crs(a) &lt;- crs(elevData) # crop by buffer, will throw error if CRS doesn&#39;t match a &lt;- crop(a, as(buffer, &quot;Spatial&quot;)) # return a a }) # calculate Bio 15 bio_15 &lt;- (calc(stack(pptTotal), fun = sd) / (1 + (chelsaData[[2]] / 12))) * 100 names(bio_15) &lt;- &quot;CHELSA_bio10_15&quot; # save bio_15 writeRaster(bio_15, filename = &quot;data/chelsa/CHELSA_bio10_15.tif&quot;, overwrite = T) 4.4.4 Load prepared BIOCLIM rasters If Bio 4a and Bio 15 have already been prepared in a previous run, they can be loaded directly from file. 
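The single-cell arithmetic sketch below makes the two coefficient-of-variation formulas for Bio 4a and Bio 15 concrete. All values are hypothetical. As in the raster code, the numerators use only December and January to May, while the denominators use annual Bio 1 and Bio 12 values:

```r
# hypothetical mean monthly temperatures (deg C) for Dec, Jan, Feb, Mar, Apr, May
tavg_c <- c(18.5, 19.0, 20.5, 22.0, 23.0, 22.5)
tavg_k <- tavg_c + 273.15  # Tkavg_i, in kelvin
bio1 <- 21.0               # hypothetical annual mean temperature (Bio 1), deg C

# Bio 4a: SD of monthly means (K) relative to the annual mean (K), x 100
bio_4a <- sd(tavg_k) / (bio1 + 273.15) * 100

# hypothetical monthly precipitation (mm) for the same six months
ppt <- c(20, 5, 10, 30, 80, 150)
bio12 <- 1800              # hypothetical annual precipitation (Bio 12), mm

# Bio 15: SD of monthly precipitation relative to 1 + mean monthly precipitation
bio_15 <- sd(ppt) / (1 + bio12 / 12) * 100

print(round(c(bio_4a = bio_4a, bio_15 = bio_15), 2))
```

Shifting temperatures to kelvin leaves the standard deviation unchanged. The conversion therefore only affects the denominator, and it keeps Bio 4a well defined where mean temperatures in degrees Celsius approach zero.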
# If bio4a and bio15 have already been prepared from previous runs/analysis - load them directly bio_4a &lt;- raster(&quot;data/chelsa/CHELSA_bio10_4a.tif&quot;) bio_15 &lt;- raster(&quot;data/chelsa/CHELSA_bio10_15.tif&quot;) 4.4.5 Stack terrain and climate We stack the terrain and climatic rasters. # stack rasters for efficient reprojection later env_data &lt;- stack(elevData, bio_4a, bio_15) 4.5 Resample landcover from 10m to 1km We read in a classified land cover image and resample it to 1 km resolution using the mode (most frequent class). Please note that the resampling need not be repeated, as it has already been done; the resampled raster can be loaded with the subsequent code chunk. # read in landcover raster location # To access the land cover data, please visit: https://daac.ornl.gov/VEGETATION/guides/Decadal_LULC_India.html landcover &lt;- &quot;data/landUseClassification/landcover_roy_2015/&quot; # read in and crop landcover &lt;- raster(landcover) buffer_utm &lt;- st_transform(buffer, 32643) landcover &lt;- crop( landcover, as( buffer_utm, &quot;Spatial&quot; ) ) # read reclassification matrix reclassification_matrix &lt;- read.csv(&quot;data/landUseClassification/LandCover_ReclassifyMatrix_2015.csv&quot;) reclassification_matrix &lt;- as.matrix(reclassification_matrix[, c(&quot;V1&quot;, &quot;To&quot;)]) # reclassify landcover_reclassified &lt;- reclassify( x = landcover, rcl = reclassification_matrix ) # write to file writeRaster(landcover_reclassified, filename = &quot;data/landUseClassification/landcover_roy_2015_reclassified.tif&quot;, overwrite = TRUE ) # check reclassification plot(landcover_reclassified) # get extent e &lt;- bbox(raster(landcover)) # init resolution res_init &lt;- res(raster(landcover)) # res to transform to 1000m res_final &lt;- res_init * (1000 / res_init) # use gdalutils gdalwarp for resampling transform # to 1km from 10m gdalUtils::gdalwarp( srcfile = 
&quot;data/landUseClassification/landcover_roy_2015_reclassified.tif&quot;, dstfile = &quot;data/landUseClassification/lc_01000m.tif&quot;, tr = c(res_final), r = &quot;mode&quot;, te = c(e) ) We compare the frequency of landcover classes between the original raster and the resampled 1 km raster to be certain that the resampling has not drastically misrepresented the frequency of any landcover type. This comparison is made using the figure below. Resampling the Roy et al. (2015) landcover raster, reclassified into 7 main classes, to a resolution of 1 km preserves the important features of landcover over the study area. 4.6 Resample other rasters to 1km We now resample all other rasters to a resolution of 1 km. 4.6.1 Read in resampled landcover Here, we read in the 1 km landcover raster and set 0 to NA. lc_data &lt;- raster(&quot;data/landUseClassification/lc_01000m.tif&quot;) lc_data[lc_data == 0] &lt;- NA 4.6.2 Reproject environmental data using landcover as a template # resample to the corresponding landcover data env_data_resamp &lt;- projectRaster( from = env_data, to = lc_data, crs = crs(lc_data), res = res(lc_data) ) # export as raster stack land_stack &lt;- stack(env_data_resamp, lc_data) # get names land_names &lt;- glue(&#39;data/spatial/landscape_resamp{c(&quot;01&quot;)}_km.tif&#39;) # write to file raster::writeRaster( land_stack, filename = as.character(land_names), overwrite = TRUE ) 4.7 Climate variables in relation to elevation 4.7.1 Load resampled environmental rasters # read the resampled landscape stack and prepare it for plotting landscape &lt;- stack(&quot;data/spatial/landscape_resamp01_km.tif&quot;) # get proper names elev_names &lt;- c(&quot;elev&quot;, &quot;slope&quot;, &quot;aspect&quot;) chelsa_names &lt;- c(&quot;bio_4a&quot;, &quot;bio_15&quot;) names(landscape) &lt;- glue(&#39;{c(elev_names, chelsa_names, &quot;landcover&quot;)}&#39;) # make duplicate stack land_data 
&lt;- as.list(land_data) # extract the values of each layer land_data &lt;- purrr::map(land_data, getValues) names(land_data) &lt;- c(&quot;elev&quot;, chelsa_names) # convert to dataframe and round to 200m land_data &lt;- bind_cols(land_data) land_data &lt;- drop_na(land_data) %&gt;% mutate(elev_round = plyr::round_any(elev, 200)) %&gt;% dplyr::select(-elev) %&gt;% pivot_longer( cols = contains(&quot;bio&quot;), names_to = &quot;clim_var&quot; ) %&gt;% group_by(elev_round, clim_var) %&gt;% summarise_all(.funs = list(~ mean(.), ~ ci(.))) Figure code is hidden in versions rendered as HTML or PDF. 4.8 Climate across land cover types Get climate values per (reclassified) landcover type from the 1 km resampled raster. # make duplicate stack again lc_clim_data &lt;- landscape[[c(&quot;landcover&quot;, chelsa_names)]] # convert to list lc_clim_data &lt;- as.list(lc_clim_data) # extract the values of each layer lc_clim_data &lt;- purrr::map(lc_clim_data, getValues) names(lc_clim_data) &lt;- c(&quot;landcover&quot;, chelsa_names) # convert to dataframe for histogram lc_clim_data &lt;- bind_cols(lc_clim_data) # pivot long lc_clim_data &lt;- pivot_longer( lc_clim_data, cols = contains(&quot;bio&quot;), names_to = &quot;climvar&quot; ) # make landcover factor lc_clim_data &lt;- mutate( lc_clim_data, landcover = factor(landcover) ) # drop cells without a landcover class lc_clim_data &lt;- filter( lc_clim_data, !is.na(landcover) ) # split by variable lc_clim_data &lt;- nest(lc_clim_data, data = c(&quot;landcover&quot;, &quot;value&quot;)) # assign names lc_clim_data$climvar_name &lt;- c( &quot;Temperature seasonality&quot;, &quot;Precipitation seasonality&quot; ) We then plot the density of climate seasonality values per land cover type. 
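The elevation-band summary in 4.7 rounds elevation to 200 m bands. It then takes the mean and the ci() half-width of each climate variable per band. Below is a sketch of the same steps on a few hypothetical 1 km cells:

```r
# normal-approximation 95% CI half-width, like the ci() helper in section 4.1
ci <- function(x) {
  qnorm(0.975) * sd(x, na.rm = TRUE) / sqrt(length(x))
}

# hypothetical 1 km cells: elevation (m) and temperature seasonality (Bio 4a)
cells <- data.frame(
  elev = c(150, 260, 310, 980, 1040, 1890, 2010),
  bio_4a = c(1.9, 1.8, 1.7, 1.2, 1.1, 0.7, 0.6)
)

# round elevation to 200 m bands, as plyr::round_any(elev, 200) does
cells$elev_round <- round(cells$elev / 200) * 200

band_mean <- tapply(cells$bio_4a, cells$elev_round, mean)
band_ci <- tapply(cells$bio_4a, cells$elev_round, ci)  # NA for single-cell bands
print(band_mean)
print(band_ci)
```

A band containing a single cell has no standard deviation, so its interval is NA. With real 1 km rasters most bands hold many cells, so this mainly matters at the elevational extremes.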
4.9 Land cover type in relation to elevation # get data from landscape rasters lc_elev &lt;- tibble( elev = getValues(landscape[[&quot;elev&quot;]]), landcover = getValues(landscape[[&quot;landcover&quot;]]) ) # process data for proportions lc_elev &lt;- lc_elev %&gt;% filter(!is.na(landcover), !is.na(elev)) %&gt;% # round elev to 100m mutate(elev = plyr::round_any(elev, 100)) %&gt;% count(elev, landcover) %&gt;% group_by(elev) %&gt;% mutate(prop = n / sum(n)) # all combinations of elevation band and landcover class lc_elev_canon &lt;- crossing( elev = unique(lc_elev$elev), landcover = unique(lc_elev$landcover) ) # join so that missing combinations appear as NA lc_elev &lt;- full_join(lc_elev, lc_elev_canon, by = c(&quot;elev&quot;, &quot;landcover&quot;)) # convert NA to zero lc_elev &lt;- replace_na(lc_elev, replace = list(n = 0, prop = 0)) Figure code is hidden in versions rendered as HTML or PDF. 4.10 Main Text Figure 2 Climate and land cover vary strongly along the elevation gradient in the Nilgiri and Anamalai Hills. Both (a) temperature seasonality and (b) precipitation seasonality, between the months of December and May, decline with increasing elevation across the Nilgiri and Anamalai Hills. Climatic variation is not very strongly associated with land cover type, as both natural habitats such as forests and human-associated habitat types such as plantations show low seasonality in (c) temperature and (d) precipitation. (e) Most elevations host a range of land cover types: while human-associated habitats such as agriculture are concentrated at lower elevations, and more natural types such as grasslands and forests are associated with higher elevations, each of these types is also found outside its characteristic elevational band. We calculated climate seasonalities (BIOCLIM 4a and 15: temperature and precipitation, respectively) using CHELSA data over 1979 to 2013, from December to May (Karger et al. 2017), and present mean seasonality values (vertical bars show standard deviation) for every 200 m elevational band. 
Land cover types were taken from a reclassification of Roy et al. (2015; see main text) at 100 m elevational bands. Land cover types covering &lt; 1% of an elevational band are shaded grey. All landscape layers were first resampled to 1 km resolution. "],["preparing-checklist-calibration-index.html", "Section 5 Preparing Checklist Calibration Index 5.1 Prepare libraries 5.2 Prepare data 5.3 Spatially explicit filter on checklists 5.4 Prepare species of interest 5.5 Prepare checklists for observer score 5.6 Get landcover 5.7 Filter checklist data 5.8 Model observer expertise", " Section 5 Preparing Checklist Calibration Index Differences in local avifaunal expertise among citizen scientists can lead to biased species detection when compared with data collected by a consistent set of trained observers (Van Strien et al. 2013). Including observer expertise as a detection covariate in occupancy models using eBird data can help account for this variation (Johnston et al. 2018). Observer-specific expertise in local avifauna was calculated following Kelling et al. (2015) as the normalized predicted number of species reported by an observer after 60 minutes of sampling in the most common land cover type within the study area. This score was calculated from checklists of anonymized observers across the study area. We modified the formulation of Kelling et al. (2015) by including only observations of the 79 species of interest in our calculations. An observer who reports more species of interest within 60 minutes therefore has a higher observer-specific expertise score, with respect to the study area. 
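The normalisation step of the expertise score can be illustrated with a minimal base-R sketch. It assumes min-max scaling of each predicted count to the 0 to 1 range, which is what scales::rescale() does with its default arguments in the Section 5.8 code; the helper normalise_scores(), the observer IDs and the predicted counts are hypothetical.

```r
# Hypothetical helper: min-max normalisation of predicted counts to 0-1
normalise_scores <- function(pred) {
  rng <- range(pred, na.rm = TRUE)
  (pred - rng[1]) / (rng[2] - rng[1])
}

# hypothetical predicted numbers of species of interest after 60 minutes
pred_60min <- c(obs1 = 4.2, obs2 = 9.8, obs3 = 6.0, obs4 = 12.6)
score <- normalise_scores(pred_60min)
print(round(score, 3))
```

The observer with the lowest prediction scores 0 and the highest scores 1. If every observer had the same predicted count, the denominator would be zero; the sketch does not handle that case.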
5.1 Prepare libraries # load libs library(data.table) library(readxl) library(magrittr) library(stringr) library(dplyr) library(tidyr) library(auk) # function to convert HH:MM:SS times to decimal hours library(lubridate) time_to_decimal &lt;- function(x) { x &lt;- lubridate::hms(x, quiet = TRUE) lubridate::hour(x) + lubridate::minute(x) / 60 + lubridate::second(x) / 3600 } 5.2 Prepare data Here, we repeat the data preparation process because observer expertise may need to be assessed over a larger area than the study site. # read in shapefile of study area to subset checklists spatially library(sf) wg &lt;- st_read(&quot;data/spatial/hillsShapefile/Nil_Ana_Pal.shp&quot;) %&gt;% st_transform(32643) # set file paths for auk functions f_in_ebd &lt;- file.path(&quot;data/01_ebird-filtered-EBD-westernGhats.txt&quot;) f_in_sampling &lt;- file.path(&quot;data/01_ebird-filtered-sampling-westernGhats.txt&quot;) # run filters using the auk package ebd_filters &lt;- auk_ebd(f_in_ebd, f_in_sampling) %&gt;% auk_country(country = &quot;IN&quot;) %&gt;% auk_state(c(&quot;IN-KL&quot;, &quot;IN-TN&quot;, &quot;IN-KA&quot;)) %&gt;% # restrict geography to Tamil Nadu, Kerala &amp; Karnataka auk_date(c(&quot;2013-01-01&quot;, &quot;2021-05-31&quot;)) %&gt;% auk_complete() # check filters ebd_filters # specify output location and perform filter f_out_ebd &lt;- &quot;data/ebird_for_expertise.txt&quot; f_out_sampling &lt;- &quot;data/ebird_sampling_for_expertise.txt&quot; ebd_filtered &lt;- auk_filter(ebd_filters, file = f_out_ebd, file_sampling = f_out_sampling, overwrite = TRUE ) Load the filtered data and select the columns of interest. 
## Process filtered data # read in the data ebd &lt;- fread(f_out_ebd) names &lt;- names(ebd) %&gt;% stringr::str_to_lower() %&gt;% stringr::str_replace_all(&quot; &quot;, &quot;_&quot;) setnames(ebd, names) # choose columns of interest columnsOfInterest &lt;- c( &quot;global_unique_identifier&quot;, &quot;scientific_name&quot;, &quot;observation_count&quot;, &quot;locality&quot;, &quot;locality_id&quot;, &quot;locality_type&quot;, &quot;latitude&quot;, &quot;longitude&quot;, &quot;observation_date&quot;, &quot;time_observations_started&quot;, &quot;observer_id&quot;, &quot;sampling_event_identifier&quot;, &quot;protocol_type&quot;, &quot;duration_minutes&quot;, &quot;effort_distance_km&quot;, &quot;effort_area_ha&quot;, &quot;number_observers&quot;, &quot;all_species_reported&quot;, &quot;reviewed&quot; ) ebd &lt;- setDF(ebd) %&gt;% as_tibble() %&gt;% dplyr::select(one_of(columnsOfInterest)) setDT(ebd) # remove checklists (seis) with more than 10 observers ebd &lt;- filter(ebd, number_observers &lt;= 10) # keep only checklists between 5AM and 7PM ebd &lt;- filter(ebd, time_observations_started &gt;= &quot;05:00:00&quot; &amp; time_observations_started &lt;= &quot;19:00:00&quot;) # keep only checklists between December 1st and May 31st ebd &lt;- filter(ebd, month(observation_date) %in% c(1, 2, 3, 4, 5, 12)) 5.3 Spatially explicit filter on checklists # get checklist locations ebd_locs &lt;- ebd[, .(longitude, latitude)] ebd_locs &lt;- setDF(ebd_locs) %&gt;% distinct() ebd_locs &lt;- st_as_sf(ebd_locs, coords = c(&quot;longitude&quot;, &quot;latitude&quot;) ) %&gt;% `st_crs&lt;-`(4326) %&gt;% bind_cols(as_tibble(st_coordinates(.))) %&gt;% st_transform(32643) %&gt;% mutate(id = 1:nrow(.)) # find locations inside the study area polygon to_keep &lt;- unlist(st_contains(wg, ebd_locs)) # filter locs ebd_locs &lt;- filter(ebd_locs, id %in% to_keep) %&gt;% bind_cols(as_tibble(st_coordinates(st_as_sf(.)))) %&gt;% st_drop_geometry() names(ebd_locs) &lt;- c(&quot;longitudeWGS&quot;, 
&quot;latitudeWGS&quot;, &quot;id&quot;, &quot;longitudeUTM&quot;, &quot;latitudeUTM&quot;) # keep checklists whose longitude-latitude pair falls inside the study area ebd &lt;- ebd[paste(longitude, latitude) %in% paste(ebd_locs$longitudeWGS, ebd_locs$latitudeWGS), ] 5.4 Prepare species of interest # read in species list specieslist &lt;- read.csv(&quot;data/species_list.csv&quot;) # set species of interest soi &lt;- specieslist$scientific_name ebdSpSum &lt;- ebd[, .( nSp = .N, totSoiSeen = length(intersect(scientific_name, soi)) ), by = list(sampling_event_identifier) ] # write to file and link with checklist id later fwrite(ebdSpSum, file = &quot;data/03_data-nspp-per-chk.csv&quot;) 5.5 Prepare checklists for observer score # 1. add new columns of decimal time and julian date ebd[, `:=`( decimalTime = time_to_decimal(time_observations_started), julianDate = yday(as.POSIXct(observation_date)) )] ebdEffChk &lt;- setDF(ebd) %&gt;% mutate(year = year(observation_date)) %&gt;% distinct( sampling_event_identifier, observer_id, year, duration_minutes, effort_distance_km, effort_area_ha, longitude, latitude, locality, locality_id, decimalTime, julianDate, number_observers ) %&gt;% # drop rows with NAs in cols used in the model tidyr::drop_na( sampling_event_identifier, observer_id, duration_minutes, decimalTime, julianDate ) %&gt;% # drop years before 2013 filter(year &gt;= 2013) # 2. join species counts to checklist covariates (large groups were removed earlier) ebdChkSummary &lt;- inner_join(ebdEffChk, ebdSpSum, by = &quot;sampling_event_identifier&quot;) # remove ebird data rm(ebd) gc() 5.6 Get landcover Read in land cover type data resampled at 1 km resolution. 
# read in 1km landcover and set 0 to NA library(raster) landcover &lt;- raster::raster(&quot;data/landUseClassification/lc_01000m.tif&quot;) landcover[landcover == 0] &lt;- NA # get locs in utm coords locs &lt;- distinct( ebdChkSummary, sampling_event_identifier, longitude, latitude, locality, locality_id ) locs &lt;- st_as_sf(locs, coords = c(&quot;longitude&quot;, &quot;latitude&quot;)) %&gt;% `st_crs&lt;-`(4326) %&gt;% st_transform(32643) %&gt;% st_coordinates() # extract landcover at each checklist location landcoverVec &lt;- raster::extract( x = landcover, y = locs ) # assign to df and overwrite setDT(ebdChkSummary)[, landcover := landcoverVec] 5.7 Filter checklist data # rename columns by name for easy handling setnames(ebdChkSummary, old = c( &quot;observer_id&quot;, &quot;sampling_event_identifier&quot;, &quot;duration_minutes&quot;, &quot;effort_distance_km&quot;, &quot;effort_area_ha&quot;, &quot;number_observers&quot;, &quot;totSoiSeen&quot; ), new = c( &quot;observer&quot;, &quot;sei&quot;, &quot;duration&quot;, &quot;distance&quot;, &quot;area&quot;, &quot;nObs&quot;, &quot;nSoi&quot; )) # count data points per observer obscount &lt;- count(ebdChkSummary, observer) %&gt;% filter(n &gt;= 3) # make factor variables and keep only observers with at least 3 checklists # also remove 0 durations ebdChkSummary &lt;- ebdChkSummary %&gt;% mutate( distance = ifelse(is.na(distance), 0, distance), duration = if_else(is.na(duration), 0.0, as.double(duration)) ) %&gt;% filter( observer %in% obscount$observer, duration &gt; 0, duration &lt;= 300, nSoi &gt;= 0, distance &lt;= 5, !is.na(nSoi) ) %&gt;% mutate( landcover = as.factor(landcover), observer = as.factor(observer) ) %&gt;% drop_na(landcover) # recode julian date so that 1 December to 31 May runs as one continuous sequence unique(ebdChkSummary$julianDate) ebdChkSummary &lt;- ebdChkSummary %&gt;% mutate( newjulianDate = case_when( julianDate &gt;= 334 &amp; julianDate &lt;= 365 ~ (julianDate - 333), julianDate &gt;= 1 &amp; julianDate &lt;= 152 ~ (julianDate + 31) ) ) %&gt;% drop_na(newjulianDate) # save to file for 
later reuse fwrite(ebdChkSummary, file = &quot;data/03_data-covars-perChklist.csv&quot;) 5.8 Model observer expertise Our observer expertise model includes a random intercept for observer identity and a random slope of duration for each observer. This models the different rates of species accumulation among observers, as well as their different starting points. # uses either a subset or all data library(lmerTest) # here we specify a glmm with random effects for observer # duration enters as fixed effects (linear and square-root terms) and as a random slope modObsExp &lt;- glmer(nSoi ~ duration + sqrt(duration) + landcover + sqrt(decimalTime) + I((sqrt(decimalTime))^2) + log(newjulianDate) + I((log(newjulianDate)^2)) + (1 | observer) + (0 + duration | observer), data = ebdChkSummary, family = &quot;poisson&quot; ) # make dir if absent if (!dir.exists(&quot;data/modOutput&quot;)) { dir.create(&quot;data/modOutput&quot;) } # write model output to text file { writeLines(R.utils::captureOutput(list(Sys.time(), summary(modObsExp))), con = &quot;data/modOutput/03_model-output-expertise.txt&quot; ) } # get all observers observer &lt;- unique(ebdChkSummary$observer) # predict at 60 mins on the most common landcover (deciduous forests), at mean time of day and date dfPredict &lt;- ebdChkSummary %&gt;% summarise_at(vars(duration, decimalTime, newjulianDate), list(~ mean(.))) %&gt;% mutate(duration = 60, landcover = as.factor(2)) %&gt;% tidyr::crossing(observer) # run predict from model on it dfPredict &lt;- mutate(dfPredict, score = predict(modObsExp, newdata = dfPredict, type = &quot;response&quot;, allow.new.levels = TRUE ) ) %&gt;% # rescale scores to the range 0 to 1 mutate(score = scales::rescale(score)) fwrite(dfPredict %&gt;% dplyr::select(observer, score), file = &quot;data/03_data-obsExpertise-score.csv&quot; ) "],["examining-spatial-sampling-bias.html", "Section 6 Examining Spatial Sampling Bias 6.1 Prepare libraries 6.2 Read checklist data 6.3 Prepare Main Text Figure 3 6.4 Figure: Spatial sampling bias", " Section 6 Examining Spatial Sampling 
Bias The goal of this section is to show how far each checklist location is from the nearest road, and how far each site is from its nearest neighbour. This requires computing pairwise distances between a large number of unique checklist locations and a large number of roads, as well as among the checklist locations themselves. 6.1 Prepare libraries # load libraries # for data library(sf) library(rnaturalearth) library(dplyr) library(readr) library(purrr) # for plotting library(scales) library(ggplot2) library(ggspatial) library(colorspace) # round to the nearest multiple of accuracy round_any &lt;- function(x, accuracy = 20000) { round(x / accuracy) * accuracy } # half-width of the 95% confidence interval of the mean ci &lt;- function(x) { qnorm(0.975) * sd(x, na.rm = TRUE) / sqrt(length(x)) } 6.2 Read checklist data Read in checklist data with the distance to the nearest neighbouring site and the distance to the nearest road. # read from local file chkCovars &lt;- read_csv(&quot;data/03_data-covars-perChklist.csv&quot;) 6.2.1 Spatially explicit filter on checklists We filter the checklists by the boundary polygon of the study area, not by its rectangular extent (bounding box). 
chkCovars &lt;- st_as_sf(chkCovars, coords = c(&quot;longitude&quot;, &quot;latitude&quot;)) %&gt;% `st_crs&lt;-`(4326) %&gt;% st_transform(32643) # read study area boundary wg &lt;- st_read(&quot;data/spatial/hillsShapefile/Nil_Ana_Pal.shp&quot;) %&gt;% st_transform(32643) # get bounding box bbox &lt;- st_bbox(wg) # keep checklists inside the study area polygon chkCovars &lt;- chkCovars %&gt;% mutate(id = 1:nrow(.)) %&gt;% filter(id %in% unlist(st_contains(wg, chkCovars))) 6.2.2 Get background land for plotting # add land land &lt;- ne_countries( scale = 50, type = &quot;countries&quot;, continent = &quot;asia&quot;, country = &quot;india&quot;, returnclass = c(&quot;sf&quot;) ) %&gt;% st_transform(32643) # add roads data roads &lt;- st_read(&quot;data/spatial/roads_studysite_2019/roads_studysite_2019.shp&quot;) %&gt;% st_transform(32643) 6.3 Prepare Main Text Figure 3 6.3.1 Prepare histogram of distance to roads Figure code is hidden in versions rendered as HTML or PDF. 6.3.2 Table: Distance to roads # write the mean, SD, minimum and maximum to file chkCovars %&gt;% st_drop_geometry() %&gt;% dplyr::select(dist_road, nnb) %&gt;% tidyr::pivot_longer( cols = c(&quot;dist_road&quot;, &quot;nnb&quot;), names_to = &quot;variable&quot; ) %&gt;% group_by(variable) %&gt;% summarise_at( vars(value), list(~ mean(.), ~ sd(.), ~ min(.), ~ max(.)) ) %&gt;% write_csv(&quot;data/results/distance_roads_sites.csv&quot;) 6.3.3 Distance to nearest neighbouring site # get unique locations from checklists locs_unique &lt;- cbind( st_drop_geometry(chkCovars), st_coordinates(chkCovars) ) %&gt;% as_tibble() locs_unique &lt;- distinct(locs_unique, X, Y, .keep_all = TRUE) Figure code is hidden in versions rendered as HTML and PDF. 6.3.4 Spatial distribution of distances to neighbours Figure code is hidden in HTML and PDF versions; consult the Rmarkdown file. Most observation sites are within 300 m of another site. 
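Nearest-neighbour distances of the kind summarised above can be sketched with a full pairwise distance matrix on projected coordinates (metres, as in UTM 43N). This is a minimal base-R illustration that assumes the number of unique sites is small enough for an n x n matrix; the function nearest_neighbour_dist() and the coordinates are hypothetical.

```r
# Hypothetical helper: distance from each site to its nearest other site
nearest_neighbour_dist <- function(x, y) {
  # full pairwise Euclidean distance matrix on projected coordinates
  d <- as.matrix(dist(cbind(x, y)))
  # ignore each site's zero distance to itself
  diag(d) <- Inf
  apply(d, 1, min)
}

# hypothetical site coordinates in metres
sites <- data.frame(
  X = c(0, 300, 300, 5000),
  Y = c(0, 0, 400, 5000)
)
nnb <- nearest_neighbour_dist(sites$X, sites$Y)
print(nnb)

# proportion of sites within 300 m of another site
share_within_300m <- mean(nnb <= 300)
print(share_within_300m)
```

Memory grows with the square of the number of sites, so for many thousands of sites a spatial index is the more practical choice.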
6.4 Figure: Spatial sampling bias # get locations points &lt;- chkCovars %&gt;% bind_cols(as_tibble(st_coordinates(.))) %&gt;% st_drop_geometry() %&gt;% mutate(X = round_any(X, 2500), Y = round_any(Y, 2500)) # count points points &lt;- count(points, X, Y) Figure code is hidden in versions rendered as HTML and PDF. # save as png ggsave( fig_checklists_grid, filename = &quot;figs/fig_spatial_bias.png&quot; ) # save figure as Robject for next plot save(fig_checklists_grid, file = &quot;data/fig_checklists_grid.Rds&quot;) Sampling effort across the Nilgiri and Anamalai Hills, in the form of eBird checklists reported by birdwatchers, mostly takes place along roads, with the majority of checklists located &lt; 1 km from a roadway (see distribution in inset), and therefore, only about 300m, on average, from the location of another checklist. Each cell here is 2.5km x 2.5km. "],["checking-temporal-sampling-frequency.html", "Section 7 Checking Temporal Sampling Frequency 7.1 Load libraries 7.2 Load checklist data 7.3 Get time differences per grid cell 7.4 Time Since Previous Checklist 7.5 Main Text Figure 3 7.6 Checklists per Month", " Section 7 Checking Temporal Sampling Frequency How often are checklists recorded in each grid cell? 7.1 Load libraries # load libraries library(tidyverse) library(sf) # for plotting library(ggplot2) library(colorspace) library(scico) library(ggthemes) library(ggspatial) library(patchwork) 7.2 Load checklist data Here we load filtered checklist data and convert to UTM 43N coordinates. 
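The core computation of Section 7.3, the lag in days between consecutive checklists within a grid cell, can be sketched in base R on toy data (the cell labels and dates are hypothetical). One detail matters: diff() on date-times chooses its units automatically (seconds, minutes, hours or days, depending on the smallest difference), so the sketch converts with as.numeric(..., units = 'days') rather than dividing by a fixed number of seconds.

```r
# Hypothetical checklists: grid cell label and observation date
chk <- data.frame(
  cell = c('A', 'A', 'A', 'B', 'B', 'C'),
  observation_date = as.POSIXct(
    c('2019-01-01', '2019-01-02', '2019-01-10', '2019-02-01', '2019-02-15', '2019-03-01'),
    tz = 'UTC'
  )
)

lag_by_cell <- lapply(split(chk, chk$cell), function(df) {
  df <- df[order(df$observation_date), ]
  # lags in days, independent of the units diff() picks
  lag <- as.numeric(diff(df$observation_date), units = 'days')
  data.frame(
    n_chk = nrow(df),
    # cells with a single checklist have no lag; mark them as Inf
    mean_lag = if (length(lag) > 0) mean(lag) else Inf,
    median_lag = if (length(lag) > 0) median(lag) else Inf
  )
})
lag_summary <- do.call(rbind, lag_by_cell)
print(lag_summary)
```

Here cell A has lags of 1 and 8 days, cell B a single lag of 14 days, and cell C only one checklist.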
# load checklist data load(&quot;data/01_data_prelim_processing.rdata&quot;) # get checklists data &lt;- distinct( dataGrouped, sampling_event_identifier, observation_date, longitude, latitude ) # remove old data rm(dataGrouped) # transform to UTM 43N data &lt;- st_as_sf(data, coords = c(&quot;longitude&quot;, &quot;latitude&quot;), crs = 4326) data &lt;- st_transform(data, crs = 32643) # get coordinates and bind to data data &lt;- cbind( st_drop_geometry(data), st_coordinates(data) ) # bin to a 2500 m grid data &lt;- mutate(data, X = plyr::round_any(X, 2500), Y = plyr::round_any(Y, 2500) ) 7.3 Get time differences per grid cell # get time differences in days data &lt;- mutate(data, observation_date = as.POSIXct(observation_date)) data &lt;- nest(data, data = c(&quot;sampling_event_identifier&quot;, &quot;observation_date&quot;)) # compute lag metrics per grid cell data &lt;- mutate(data, lag_metrics = lapply(data, function(df) { df &lt;- arrange(df, observation_date) # lags in days; units are set explicitly because diff() on date-times picks its units automatically lag &lt;- as.numeric(diff(df$observation_date), units = &quot;days&quot;) data &lt;- tibble( mean_lag = mean(lag, na.rm = TRUE), median_lag = median(lag, na.rm = TRUE), sd_lag = sd(lag, na.rm = TRUE), n_chk = nrow(df) ) data }) ) # unnest lag metrics data_lag &lt;- select(data, -data) data_lag &lt;- unnest(data_lag, cols = &quot;lag_metrics&quot;) # set lag metrics to infinity for cells with a single checklist data_lag &lt;- mutate(data_lag, mean_lag = ifelse(n_chk == 1, Inf, mean_lag), median_lag = ifelse(n_chk == 1, Inf, median_lag), sd_lag = ifelse(n_chk == 1, Inf, sd_lag) ) # shift lags by 1 day so that zero lags (same-day checklists) become positive data_lag &lt;- mutate(data_lag, mean_lag = mean_lag + 1, median_lag = median_lag + 1 ) # melt data by tile # data_lag = pivot_longer(data_lag, cols = c(&quot;mean_lag&quot;, &quot;median_lag&quot;, &quot;sd_lag&quot;)) 7.4 Time Since Previous Checklist 7.4.1 Get aux data # hills data wg &lt;- st_read(&quot;data/spatial/hillsShapefile/Nil_Ana_Pal.shp&quot;) %&gt;% st_transform(32643) roads &lt;- 
st_read(&quot;data/spatial/roads_studysite_2019/roads_studysite_2019.shp&quot;) %&gt;% st_transform(32643) # add land library(rnaturalearth) land &lt;- ne_countries( scale = 50, type = &quot;countries&quot;, continent = &quot;asia&quot;, country = &quot;india&quot;, returnclass = c(&quot;sf&quot;) ) %&gt;% st_transform(32643) bbox &lt;- st_bbox(wg) 7.4.2 Histogram of lags Figure code is hidden in HTML and PDF versions. # get lags in days for each grid cell data &lt;- mutate(data, lag_hist = lapply(data, function(df) { df &lt;- arrange(df, observation_date) lag &lt;- as.numeric(diff(df$observation_date), units = &quot;days&quot;) data &lt;- tibble( lag = lag + 1, index = seq_along(lag) ) data }) ) # unnest lags data_hist &lt;- select(data, X, Y, lag_hist) %&gt;% unnest(cols = &quot;lag_hist&quot;) Most sites are resurveyed at least once, but some are visited much more frequently than others. There does not appear to be a link between roads and visit frequency. eBird checklists are also strongly clustered in time, with some of the most sampled areas over the study period visited at intervals of &gt; 1 week, and with some less intensively sampled areas visited frequently, at intervals of &lt; 1 week. Overall, the majority of checklists are reported only a day after the previous checklist at that location (see inset). 7.5 Main Text Figure 3 We combine the figures for spatial and temporal clustering into Main Text Figure 3. The combined figure is not shown here; see the main text. # load spatial bias figure load(&quot;data/fig_checklists_grid.Rds&quot;) Distribution of sampling effort in the form of eBird checklists in the Nilgiri and Anamalai Hills between 2013 and 2021. (a) Sampling effort across the Nilgiri and Anamalai Hills, in the form of eBird checklists reported by birdwatchers, mostly takes place along roads, with the majority of checklists located &lt;1 km from a roadway (see distribution in inset), and therefore, only about 300m, on average, from the location of another checklist. 
(b) eBird checklists are also strongly clustered in time, with some of the most sampled areas over the study period visited at intervals of &gt; 1 week, and with some less intensively sampled areas visited frequently, at intervals of &lt; 1 week. Overall, most checklists are reported only a day after the previous checklist at that location (see inset). Both spatial and temporal clustering make data thinning necessary. Both panels show counts or mean intervals per 2.5 km grid cell; the study area is bounded by a dashed line, and roads within it are shown as (a) blue or (b) red lines. 7.6 Checklists per Month We counted the checklists per month, pooled over years, to determine how sampling effort varies over the year. # keep grid cell and nested checklist data data &lt;- select(data, X, Y, data) # unnest data &lt;- unnest(data, cols = &quot;data&quot;) # get two-week period, month and year library(lubridate) data &lt;- mutate(data, week = week(observation_date), week = plyr::round_any(week, 2), year = year(observation_date), month = month(observation_date) ) # count checklists per month and year data_count &lt;- count(data, month, year) Observations peak in the early months of the year, decline towards the rainy months, and slowly increase until the following winter. "],["adding-covariates-to-checklist-data.html", "Section 8 Adding Covariates to Checklist Data 8.1 Prepare libraries and data 8.2 Spatial subsampling 8.3 Temporal subsampling 8.4 Add checklist calibration index 8.5 Add climatic and landscape covariates 8.6 Spatial buffers around selected checklists 8.7 Spatial buffer-wide covariates", " Section 8 Adding Covariates to Checklist Data In this section, we prepare a final list of covariates, after taking into account spatial sampling bias, temporal bias and observer expertise scores (examined in previous sections). 
8.1 Prepare libraries and data # load libs for data library(dplyr) library(readr) library(stringr) library(purrr) library(glue) library(tidyr) # install velox from GitHub if it is absent library(devtools) if (!&quot;velox&quot; %in% rownames(installed.packages())) { install_github(&quot;hunzikp/velox&quot;) } # load spatial library(raster) library(rgeos) library(velox) library(sf) # load saved data object load(&quot;data/01_data_prelim_processing.rdata&quot;) 8.2 Spatial subsampling Sampling bias can be introduced into citizen science due to the often ad-hoc nature of data collection (Sullivan et al. 2014). For eBird, this translates into checklists reported when convenient, rather than at regular or random points in time and space, leading to non-independence in the data if observations are spatio-temporally clustered (Johnston et al. 2021). Spatio-temporal autocorrelation in the data can be reduced by sub-sampling at an appropriate spatial resolution, and by avoiding temporal clustering. We estimated two simple measures of spatial clustering: the distance from each site to the nearest road (road data from OpenStreetMap) and the nearest-neighbor distance for each site. Sites were strongly tied to roads (mean distance to road ± SD = 390.77 ± 859.15 m; range = 0.28 m to 7.64 km) and were on average only 297 m away from another site (SD = 553 m; range = 0.14 m to 12.85 km) (Figure 3). This analysis is described in Section 6. Here, to further reduce spatial autocorrelation, we divided the study area into a grid of 1 km wide square cells and picked checklists from one site at random within each grid cell. 
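Grid-based thinning can be sketched in base R without building grid polygons: each site is assigned to a square cell by integer division of its projected coordinates, and the site with the most checklists is kept per cell, following the comment in the code that follows. This is a simplified stand-in for the st_make_grid() and st_contains() approach used here; unlike st_make_grid(), whose cells start at the corner of the bounding box, these cells are aligned to the coordinate origin. The helper thin_by_grid(), site names, coordinates and visit counts are hypothetical.

```r
# Hypothetical sketch of grid-based spatial thinning on projected coordinates
thin_by_grid <- function(sites, grid_size) {
  # cell label from integer division of X and Y by the cell size (metres)
  sites$cell <- paste(floor(sites$X / grid_size), floor(sites$Y / grid_size), sep = '_')
  # within each cell, order sites by decreasing number of checklists
  sites <- sites[order(sites$cell, -sites$n_visits), ]
  # keep the first (most visited) site per cell; ties keep the earlier row
  sites[!duplicated(sites$cell), ]
}

sites <- data.frame(
  locality = c('s1', 's2', 's3', 's4'),
  X = c(100, 400, 1200, 1300),
  Y = c(100, 300, 200, 900),
  n_visits = c(3, 12, 5, 5)
)
kept <- thin_by_grid(sites, grid_size = 1000)
print(kept)
```

With 1000 m cells, s1 and s2 share one cell and s3 and s4 another, so s2 (most visits) and s3 (first of a tie) are kept.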
# grid-based spatial thinning gridsize &lt;- 500 # grid size in metres effort_distance_max &lt;- 1000 # removing checklists with this distance # make grids across the study site hills &lt;- st_read(&quot;data/spatial/hillsShapefile/Nil_Ana_Pal.shp&quot;) %&gt;% st_transform(32643) grid &lt;- st_make_grid(hills, cellsize = gridsize) # filtering on !pres_abs keeps absences # this absence data will be thinned data_thin_absences &lt;- filter(dataGrouped, !pres_abs) data_presences &lt;- filter(dataGrouped, pres_abs) # split data by species data_thin_absences &lt;- split( x = data_thin_absences, f = data_thin_absences$scientific_name ) 8.2.1 Counting presence observation proportion data_presence_prop &lt;- count(data_presences, scientific_name, name = &quot;presences&quot;) %&gt;% mutate( absences = map_int(data_thin_absences, nrow), presence_prop = presences / (presences + absences) ) # mean and sd of presence prop data_presence_prop %&gt;% summarise(mean(presence_prop), sd(presence_prop)) # spatial thinning on each species retains # site with maximum visits per grid cell data_thin_absences &lt;- map(data_thin_absences, function(df) { # count visits per locality df &lt;- group_by(df, locality) %&gt;% mutate(tot_effort = length(sampling_event_identifier)) %&gt;% ungroup() # remove sites with distances above spatial independence df &lt;- df %&gt;% dplyr::filter(effort_distance_km &lt;= effort_distance_max) %&gt;% st_as_sf(coords = c(&quot;longitude&quot;, &quot;latitude&quot;)) %&gt;% `st_crs&lt;-`(4326) # transform to regional UTM 43N and add id df &lt;- df %&gt;% st_transform(32643) %&gt;% mutate(coordId = 1:nrow(.)) %&gt;% bind_cols(as_tibble(st_coordinates(.))) # which cell has which coords grid_overlap &lt;- st_contains(grid, df) %&gt;% unclass() %&gt;% purrr::discard(.p = is_empty) # count length of grid overlap list # this is the number of cells with points in them sampled_cells &lt;- length(grid_overlap) # make tibble grid_overlap &lt;- tibble( uid_cell = 
seq(length(grid_overlap)), # the uid_cell is specific to this sp. coordId = grid_overlap ) # unnest grid_overlap &lt;- unnest(grid_overlap, cols = &quot;coordId&quot;) # join grid cell overlap with coordinate data df &lt;- left_join(df, grid_overlap, by = &quot;coordId&quot; ) %&gt;% st_drop_geometry() # for each uid_cell, select coord where effort is max points_max &lt;- df %&gt;% group_by(uid_cell) %&gt;% dplyr::filter(tot_effort == max(tot_effort)) %&gt;% # there may be multiple rows with max effort, select first dplyr::filter(row_number() == 1) # check for number of samples assertthat::assert_that( assertthat::are_equal(sampled_cells, nrow(points_max), msg = &quot;spatial thinning error: more samples than\\\\ sampled cells&quot; ) ) # check that there is one sample per cell assertthat::assert_that( assertthat::are_equal( max(count(points_max, uid_cell)$n), 1 ) ) # return data without UID cell and coordinate Id dplyr::select(ungroup(points_max), -uid_cell, -coordId, -tot_effort) }) # remove old data rm(dataGrouped) 8.2.2 Count absences after spatial thinning data_presence_prop &lt;- data_presence_prop %&gt;% mutate( absences_sp_thin = map_int(data_thin_absences, nrow) ) 8.3 Temporal subsampling Additionally, from each selected site, we randomly selected a maximum of 10 absence checklists, which reduced temporal clustering. We kept all presence checklists. 
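The temporal cap can be sketched in base R: checklists are split by site, and any site with more than 10 absence checklists is reduced to a random sample of 10 drawn without replacement. The helper cap_per_site() and the toy data are hypothetical; set.seed() only makes the toy example reproducible.

```r
# Hypothetical helper: keep at most max_n randomly chosen rows per site
cap_per_site <- function(df, site_col, max_n = 10) {
  parts <- split(df, df[[site_col]])
  parts <- lapply(parts, function(x) {
    # sample without replacement only when a site exceeds the cap
    if (nrow(x) > max_n) x[sample(nrow(x), max_n), , drop = FALSE] else x
  })
  do.call(rbind, parts)
}

set.seed(42)
absences <- data.frame(
  locality = rep(c('busy_site', 'quiet_site'), times = c(25, 4)),
  sampling_event_identifier = paste0('S', 1:29)
)
thinned <- cap_per_site(absences, 'locality', max_n = 10)
table(thinned$locality)
```

The busy site is cut to 10 checklists, while the quiet site keeps all 4.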
# randomly subsample at most 10 absence checklists per locality dataSubsample &lt;- map(data_thin_absences, function(df) { df &lt;- ungroup(df) df_to_locality &lt;- split(x = df, f = df$locality) df_samples &lt;- map_if( .x = df_to_locality, .p = function(x) { nrow(x) &gt; 10 }, .f = function(x) sample_n(x, 10, replace = FALSE) ) bind_rows(df_samples) }) 8.3.1 Count absences after temporal thinning data_presence_prop &lt;- data_presence_prop %&gt;% mutate( absences_tmp_thin = map_int(dataSubsample, nrow), presence_prop_post_thin = presences / (presences + absences_tmp_thin) ) # save data write_csv(data_presence_prop, &quot;data/results/data_class_balance.csv&quot;) # bind the thinned absences of all species into one data frame dataSubsample &lt;- bind_rows(dataSubsample) # convert presence data to UTM 43 N and long-lat to X-Y data_presences &lt;- bind_cols( data_presences, as_tibble( st_as_sf( data_presences, coords = c(&quot;longitude&quot;, &quot;latitude&quot;), crs = 4326 ) %&gt;% st_transform(32643) %&gt;% st_coordinates() ) ) # drop long lat data_presences &lt;- dplyr::select(data_presences, -longitude, -latitude) # join ALL PRESENCES and THINNED ABSENCES dataSubsample &lt;- bind_rows(dataSubsample, data_presences) # check joined data assertthat::assert_that( max(apply(dataSubsample, 2, function(x) sum(is.na(x)))) == 0, msg = &quot;some columns missing from one of the datasets&quot; ) # remove previous data rm(data_thin_absences) 8.4 Add checklist calibration index Load the CCI computed in Section 5. The CCI was the lone observer&#39;s expertise score for single-observer checklists, and the highest expertise score among observers for group checklists. 
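The CCI rule described above (the score of a lone observer, or the highest score among observers on a group checklist) can be sketched in base R on hypothetical observer IDs and scores. One difference from the code in this section: merge() here silently drops observers without a score, whereas in the pipeline a missing score makes the expertise of that checklist NA, and such checklists are then removed.

```r
# Hypothetical observer expertise scores
observer_scores <- data.frame(
  observer = c('obs1', 'obs2', 'obs3'),
  score = c(0.2, 0.9, 0.5)
)
# Hypothetical checklists; group checklists list observers separated by commas
checklists <- data.frame(
  sampling_event_identifier = c('S1', 'S2'),
  observer_id = c('obs1', 'obs1,obs3')
)

# one row per checklist-observer pair
observer_lists <- strsplit(checklists$observer_id, ',')
pairs <- data.frame(
  sampling_event_identifier = rep(checklists$sampling_event_identifier, lengths(observer_lists)),
  observer = unlist(observer_lists)
)

# attach scores, then take the highest score per checklist
pairs <- merge(pairs, observer_scores, by = 'observer')
cci <- aggregate(score ~ sampling_event_identifier, data = pairs, FUN = max)
print(cci)
```

Checklist S1 takes the score of its single observer (0.2), while the group checklist S2 takes the higher of its two observers (0.5).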
# read in obs score and extract numbers expertiseScore &lt;- read_csv(&quot;data/03_data-obsExpertise-score.csv&quot;) %&gt;% mutate(numObserver = str_extract(observer, &quot;\\\\d+&quot;)) %&gt;% dplyr::select(-observer) # group checklists (seis) have multiple observers # such seis take the highest observer expertise score # as the associated covariate # get unique observers per sei dataSeiScore &lt;- distinct( dataSubsample, sampling_event_identifier, observer_id ) %&gt;% # make list column of observers mutate(observers = str_split(observer_id, &quot;,&quot;)) %&gt;% unnest(cols = c(observers)) %&gt;% # add numeric observer id mutate(numObserver = str_extract(observers, &quot;\\\\d+&quot;)) %&gt;% # now get distinct sei and observer id numeric distinct(sampling_event_identifier, numObserver) # now add expertise score to sei dataSeiScore &lt;- left_join(dataSeiScore, expertiseScore, by = &quot;numObserver&quot; ) %&gt;% # get max expertise score per sei group_by(sampling_event_identifier) %&gt;% summarise(expertise = max(score)) # add expertise score to dataSubsample dataSubsample &lt;- left_join(dataSubsample, dataSeiScore, by = &quot;sampling_event_identifier&quot; ) # remove data without expertise score dataSubsample &lt;- filter(dataSubsample, !is.na(expertise)) 8.5 Add climatic and landscape covariates Reload climate and land cover predictors prepared previously. # path to landscape covariate stack landscape_files &lt;- &quot;data/spatial/landscape_resamp01_km.tif&quot; # read in as a stack landscape_data &lt;- stack(landscape_files) # get proper names elev_names &lt;- c(&quot;elev&quot;, &quot;slope&quot;, &quot;aspect&quot;) chelsa_names &lt;- c(&quot;bio_1&quot;, &quot;bio_12&quot;) names(landscape_data) &lt;- as.character(glue(&#39;{c(elev_names, chelsa_names, &quot;landcover&quot;)}&#39;)) 8.6 Spatial buffers around selected checklists Every checklist on eBird is associated with a latitude and longitude. 
However, the coordinates entered by an observer may not accurately depict the location at which a species was detected. This can occur for two reasons: first, traveling checklists are often associated with a single location along the route travelled by observers; and second, checklist locations could be assigned to a hotspot, a location that is marked by eBird as being frequented by multiple observers. In many cases, an observation might be assigned to a hotspot even though the observation was not made at the precise location of the hotspot (Praveen J 2017). Johnston et al. (2021) showed that a large proportion of observations occurred within a 3 km grid, even for checklists up to 5 km in length. Hence, to adjust for spatial precision, we considered a minimum radius of 2.5 km around each unique locality when sampling environmental covariate values. # assign neighbourhood radius in m sample_radius &lt;- 2.5 * 1e3 # get distinct points and make buffer ebird_buff &lt;- dataSubsample %&gt;% ungroup() %&gt;% distinct(X, Y) %&gt;% # remove NAs drop_na() # convert to spatial features ebird_buff &lt;- st_as_sf(ebird_buff, coords = c(&quot;X&quot;, &quot;Y&quot;), crs = 32643) %&gt;% # add UTM X-Y coordinates, used later to join back to checklists bind_cols(as_tibble(st_coordinates(.))) %&gt;% # make buffer around points st_buffer(dist = sample_radius) 8.7 Spatial buffer-wide covariates 8.7.1 Mean climatic covariates All climatic covariates are summarised as the mean value within a 2.5 km radius, as discussed above, and prefixed am_. 
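Once raster cells have been extracted for each buffer (the pipeline uses velox for this), the summaries reduce to simple grouping. The base-R sketch below assumes a hypothetical table of extracted values with a buffer id, one climate layer and a land cover class per cell; it computes the buffer mean of the climate layer, ignoring missing cells, and the proportion of each land cover class, with absent classes set to 0 and columns named in the lc_01 style used in this section.

```r
# Hypothetical cell values extracted within two buffers
extracted <- data.frame(
  id = c(1, 1, 1, 1, 2, 2, 2),
  bio_1 = c(20.5, 21.0, NA, 21.5, 18.0, 18.4, 18.2),
  landcover = c(2, 2, 5, 7, 5, 5, 5)
)

# climate: mean per buffer; the formula method drops NA rows by default
clim_mean <- aggregate(bio_1 ~ id, data = extracted, FUN = mean)

# land cover: proportion of cells per class within each buffer (absent = 0)
lc_counts <- table(extracted$id, extracted$landcover)
lc_prop <- as.data.frame.matrix(prop.table(lc_counts, margin = 1))
names(lc_prop) <- sprintf('lc_%02d', as.integer(names(lc_prop)))
lc_prop$id <- as.numeric(rownames(lc_prop))

# one row of covariates per buffer
buffer_covars <- merge(clim_mean, lc_prop, by = 'id')
print(buffer_covars)
```

Buffer 1 ends up with half its cells in class 2 and a climate mean of 21 from its three non-missing cells; buffer 2 is entirely class 5.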
# get area mean for all preds except landcover, which is the last one stk &lt;- raster::dropLayer(landscape_data, &quot;landcover&quot;) # removing landcover here velstk &lt;- velox(stk) # velox raster value extraction dextr &lt;- velstk$extract( sp = ebird_buff, df = TRUE, fun = function(x) mean(x, na.rm = TRUE) ) # assign names for joining names(dextr) &lt;- c(&quot;id&quot;, names(stk)) env_area_mean &lt;- as_tibble(dextr) # add id to buffer data ebird_buff &lt;- mutate(ebird_buff, id = seq(nrow(ebird_buff)) ) # join to buffer data ebird_buff &lt;- inner_join(ebird_buff, env_area_mean, by = &quot;id&quot;) 8.7.2 Proportions of land cover type All land cover covariates were summarised as the proportion of each land cover type within a 2.5 km radius. # get the landcover layer from the stack lc &lt;- landscape_data[[&quot;landcover&quot;]] lc_velox &lt;- velox(lc) lc_vals &lt;- lc_velox$extract(sp = ebird_buff, df = TRUE) names(lc_vals) &lt;- c(&quot;id&quot;, &quot;lc&quot;) # get landcover proportions lc_prop &lt;- count(lc_vals, id, lc) %&gt;% group_by(id) %&gt;% mutate( lc = glue(&#39;lc_{str_pad(lc, 2, pad = &quot;0&quot;)}&#39;), prop = n / sum(n) ) %&gt;% dplyr::select(-n) %&gt;% tidyr::pivot_wider( names_from = lc, values_from = prop, values_fill = list(prop = 0) ) %&gt;% ungroup() # join proportions to buffer data by id ebird_buff &lt;- left_join(ebird_buff, lc_prop, by = &quot;id&quot;) 8.7.3 Link environmental covariates to checklists # drop geometry ebird_buff &lt;- st_drop_geometry(ebird_buff) # link to dataSubsample dataSubsample &lt;- inner_join(dataSubsample, ebird_buff, by = c(&quot;X&quot;, &quot;Y&quot;) ) Save data to file. 
# write to file write_csv(dataSubsample, file = glue(&quot;data/04_data-covars-2.5km.csv&quot;)) "],["modelling-species-occupancy.html", "Section 9 Modelling Species Occupancy 9.1 Load dataframe and prepare covariates 9.2 Running a null model 9.3 Identifying covariates necessary to model the detection process 9.4 Land Cover and Climate 9.5 Goodness-of-fit tests", " Section 9 Modelling Species Occupancy 9.0.1 Load necessary libraries # Load libraries # for ebird data library(auk) library(ebirdst) # general data library(tidyverse) library(data.table) library(lubridate) library(openxlsx) library(raster) # probably unnecessary # for models library(unmarked) library(MuMIn) library(AICcmodavg) library(fields) # for computation library(doParallel) library(snow) library(ecodist) # Source necessary functions source(&quot;R/fun_screen_cor.R&quot;) source(&quot;R/fun_model_estimate_collection.r&quot;) 9.1 Load dataframe and prepare covariates Here, we load the required dataframe that contains 10 random visits to a site and environmental covariates that were prepared within a 2.5 km radius of each unique locality. We also rescaled the detection and occupancy covariates to range between 0 and 1. Next, we ensured that only Traveling and Stationary checklists were considered. Even though stationary counts have no distance traveled, we assigned all stationary checklists an effective distance of 100 m, which we consider the average maximum detection radius for most bird species in our area. # Load in the prepared dataframe dat &lt;- fread(&quot;data/04_data-covars-2.5km.csv&quot;, header = T) dat &lt;- as_tibble(dat) head(dat) 9.1.1 Handle the sampling protocol Select Traveling and Stationary checklists and assign a distance of 0.1 km to stationary checklists. 
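A minimal base-R sketch of this protocol step, together with the start-time and date recoding under 9.1.2, applied to a few hypothetical checklists. It assumes start times stored as HH:MM:SS text and dates stored as Date values; the windows (300 to 1140 minutes, days 334 to 365 and 1 to 152) and offsets are taken from the dplyr code in this section, and %in% on integer ranges stands in for the paired comparisons.

```r
# hypothetical checklists
checklists = data.frame(
  protocol_type = c('Traveling', 'Stationary', 'Incidental', 'Stationary'),
  effort_distance_km = c(2.4, 0, NA, 0),
  time_observations_started = c('06:30:00', '17:45:00', '09:00:00', '04:00:00'),
  observation_date = as.Date(c('2019-12-15', '2020-01-10', '2020-03-01', '2020-07-04'))
)

# 1. keep only Traveling and Stationary checklists
checklists = checklists[checklists$protocol_type %in% c('Traveling', 'Stationary'), ]

# 2. zero-distance stationary checklists get an effective distance of 0.1 km
stationary = which(checklists$protocol_type == 'Stationary')
zero_dist = which(checklists$effort_distance_km %in% 0)
checklists$effort_distance_km[intersect(stationary, zero_dist)] = 0.1

# 3. start time as minutes since midnight, then minutes away from noon (720);
#    starts outside 300 to 1140 minutes (05:00 to 19:00) become NA
mins = as.integer(as.difftime(checklists$time_observations_started,
                              format = '%H:%M:%S', units = 'mins'))
checklists$min_obs_started = ifelse(mins %in% 300:1140, abs(mins - 720), NA)

# 4. day of year shifted so days 334 to 365 become 1 to 32 and
#    days 1 to 152 become 32 to 183; all other days become NA
doy = as.integer(format(checklists$observation_date, '%j'))
checklists$julian_date = ifelse(doy %in% 334:365, doy - 333,
                                ifelse(doy %in% 1:152, doy + 31, NA))

# drop checklists that fall outside either window
checklists = checklists[complete.cases(checklists), ]
checklists
```

With these offsets, day 365 and day 1 both map to 32, so the recoded date scale has one tied value at the turn of the year.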
# Some more pre-processing to get the right data structures # Ensuring that only Traveling and Stationary checklists were considered names(dat) dat &lt;- dat %&gt;% filter(protocol_type %in% c(&quot;Traveling&quot;, &quot;Stationary&quot;)) # We give all stationary counts with zero distance a distance of 100 m (so 0.1 km), # as that&#39;s approximately the max normal hearing distance for people doing point # counts. dat &lt;- dat %&gt;% mutate(effort_distance_km = if_else( effort_distance_km == 0 &amp; protocol_type == &quot;Stationary&quot;, 0.1, effort_distance_km )) 9.1.2 Handle time and date Convert the observation date to Julian day and the start time to minutes since midnight. # Converting time observations started to numeric and adding it as a new column # This new column will be min_obs_started dat &lt;- dat %&gt;% mutate( min_obs_started = as.integer( as.difftime( time_observations_started, format = &quot;%H:%M:%S&quot;, units = &quot;mins&quot; ) ) ) # Adding the julian date to the dataframe dat &lt;- dat %&gt;% mutate(julian_date = lubridate::yday(observation_date)) # recode julian date to model it as a linear predictor dat &lt;- dat %&gt;% mutate( newjulianDate = case_when( julian_date &gt;= 334 &amp; julian_date &lt;= 365 ~ (julian_date - 333), julian_date &gt;= 1 &amp; julian_date &lt;= 152 ~ (julian_date + 31) ) ) %&gt;% drop_na(newjulianDate) # recode time observations started to model it as a linear predictor dat &lt;- dat %&gt;% mutate( newmin_obs_started = case_when( min_obs_started &gt;= 300 &amp; min_obs_started &lt;= 720 ~ abs(min_obs_started - 720), min_obs_started &gt;= 720 &amp; min_obs_started &lt;= 1140 ~ abs(720 - min_obs_started) ) ) %&gt;% drop_na(newmin_obs_started) 9.1.3 Scaling covariates # Removing other unnecessary columns from the dataframe and creating a clean one without the rest names(dat) # select relevant columns BY NAME dat &lt;- dplyr::select( dat, c( &quot;duration_minutes&quot;, &quot;effort_distance_km&quot;, &quot;locality&quot;, 
&quot;locality_type&quot;, &quot;locality_id&quot;, &quot;observer_id&quot;, &quot;observation_date&quot;, &quot;scientific_name&quot;, &quot;observation_count&quot;, &quot;protocol_type&quot;, &quot;number_observers&quot;, &quot;pres_abs&quot; ), # rename X-Y but NOTE THEY ARE IN UTM COORDINATES longitude = &quot;X&quot;, latitude = &quot;Y&quot;, expertise, # elevation and climate layers elev, bio4, bio15, # all LANDCOVER COLUMNS matches(&quot;lc&quot;), # set new columns to old column names julian_date = &quot;newjulianDate&quot;, min_obs_started = &quot;newmin_obs_started&quot; ) # add year and convert presence-absence to integer dat.1 &lt;- dat %&gt;% mutate( year = year(observation_date), pres_abs = as.integer(pres_abs) ) # occupancy modeling requires an integer response # Scaling detection and occupancy covariates dat.scaled &lt;- dat.1 # Note: Never refer to columns by numbers, numbers change, names remain cols_to_scale &lt;- c( &quot;duration_minutes&quot;, &quot;effort_distance_km&quot;, &quot;number_observers&quot;, &quot;expertise&quot;, &quot;elev&quot;, &quot;bio4a&quot;, &quot;bio15a&quot;, &quot;lc_02&quot;, &quot;lc_09&quot;, &quot;lc_01&quot;, &quot;lc_05&quot;, &quot;lc_04&quot;, &quot;lc_07&quot;, &quot;lc_03&quot;, &quot;julian_date&quot;, &quot;min_obs_started&quot; ) # this scales the relevant columns between 0 and 1 dat.scaled &lt;- mutate( dat.scaled, across( .cols = all_of(cols_to_scale), # referring to the columns .fns = scales::rescale # the rescale function ) ) # save data to file fwrite(dat.scaled, &quot;data/05_scaled-covars-2.5km.csv&quot;) 9.1.4 Correct date format # Reload the scaled covariate data dat.scaled &lt;- fread(&quot;data/05_scaled-covars-2.5km.csv&quot;, header = T) dat.scaled &lt;- as_tibble(dat.scaled) head(dat.scaled) # Ensure observation_date column is in the right format dat.scaled$observation_date &lt;- format( as.Date( dat.scaled$observation_date, &quot;%m/%d/%Y&quot; ), &quot;%Y-%m-%d&quot; ) 9.1.5 Check for 
correlated covariates # Testing for correlations before running further analyses # The majority are uncorrelated because we kept climatic and land cover predictors and removed elevation. source(&quot;R/fun_screen_cor.R&quot;) # SELECT COLUMNS to check BY NAME cols_to_check &lt;- c( &quot;expertise&quot;, &quot;bio4a&quot;, &quot;bio15a&quot;, &quot;lc_02&quot;, &quot;lc_09&quot;, &quot;lc_01&quot;, &quot;lc_05&quot;, &quot;lc_04&quot;, &quot;lc_07&quot;, &quot;lc_03&quot; ) # screen covariates for correlation screen.cor(dat.scaled[, cols_to_check], threshold = 0.3) # total number of presences by species # min no. presences = 224 to max = 7725 presSpecies &lt;- dat.scaled %&gt;% group_by(scientific_name) %&gt;% filter(pres_abs == 1) %&gt;% summarise(n = n()) # convert locality_id to factors dat.scaled$locality_id &lt;- as.factor(dat.scaled$locality_id) 9.2 Running a null model # All null models are stored in lists below all_null &lt;- list() # define species and a counter species &lt;- unique(dat.scaled$scientific_name) counter &lt;- 0 # Add a progress bar for the loop pb &lt;- txtProgressBar( min = 0, max = length(species), style = 3 ) # text based bar # loop over species for (i in species) { # filter data by species data &lt;- dat.scaled %&gt;% filter(scientific_name == i) # Preparing data for the unmarked model occ &lt;- filter_repeat_visits(data, min_obs = 1, max_obs = 10, annual_closure = FALSE, n_days = 1488, # 1488 days (about 4 years) is considered a period of closure date_var = &quot;observation_date&quot;, site_vars = c(&quot;locality_id&quot;) ) obs_covs &lt;- c( &quot;min_obs_started&quot;, &quot;duration_minutes&quot;, &quot;effort_distance_km&quot;, &quot;number_observers&quot;, &quot;expertise&quot;, &quot;julian_date&quot; ) # format for unmarked occ_wide &lt;- format_unmarked_occu(occ, site_id = &quot;site&quot;, response = &quot;pres_abs&quot;, site_covs = c( &quot;locality_id&quot;, &quot;lc_01&quot;, &quot;lc_02&quot;, &quot;lc_05&quot;, 
&quot;lc_04&quot;, &quot;lc_09&quot;, &quot;lc_07&quot;, &quot;lc_03&quot;, &quot;bio4a&quot;, &quot;bio15a&quot; ), obs_covs = obs_covs ) # Convert this dataframe of observations into an unmarked object to start fitting occupancy models occ_um &lt;- formatWide(occ_wide, type = &quot;unmarkedFrameOccu&quot;) # Set up the model # the list is now automatically named all_null[[i]] &lt;- occu(~1 ~ 1, data = occ_um) # increase counter counter &lt;- counter + 1 setTxtProgressBar(pb, counter) } close(pb) # Store all the model outputs for each species capture.output(all_null, file = &quot;data/results/null_models.csv&quot;) 9.3 Identifying covariates necessary to model the detection process Here, we use the unmarked package in R (Fiske and Chandler 2011) to identify detection level covariates that are important for each species. We use AIC criteria to select top models (Burnham et al. 2002; 2011). # All models are stored in lists below det_dred &lt;- list() # Subsetting those models whose deltaAIC&lt;4 (Burnham et al. 
2011) top_det &lt;- list() # Getting model averaged coefficients and relative importance scores det_avg &lt;- list() det_imp &lt;- list() # Getting model estimates det_modelEst &lt;- list() # Add a progress bar for the loop pb &lt;- txtProgressBar( min = 0, max = length(unique(dat.scaled$scientific_name)), style = 3 ) # text based bar for (i in 1:length(unique(dat.scaled$scientific_name))) { data &lt;- dat.scaled %&gt;% filter(dat.scaled$scientific_name == unique(dat.scaled$scientific_name)[i]) # Preparing data for the unmarked model occ &lt;- filter_repeat_visits(data, min_obs = 1, max_obs = 10, annual_closure = FALSE, n_days = 1488, # 1488 days (about 4 years) is considered a period of closure date_var = &quot;observation_date&quot;, site_vars = c(&quot;locality_id&quot;) ) obs_covs &lt;- c( &quot;min_obs_started&quot;, &quot;duration_minutes&quot;, &quot;effort_distance_km&quot;, &quot;number_observers&quot;, &quot;expertise&quot;, &quot;julian_date&quot; ) # format for unmarked occ_wide &lt;- format_unmarked_occu(occ, site_id = &quot;site&quot;, response = &quot;pres_abs&quot;, site_covs = c( &quot;locality_id&quot;, &quot;lc_01&quot;, &quot;lc_02&quot;, &quot;lc_05&quot;, &quot;lc_04&quot;, &quot;lc_09&quot;, &quot;lc_07&quot;, &quot;lc_03&quot;, &quot;bio4a&quot;, &quot;bio15a&quot; ), obs_covs = obs_covs ) # Convert this dataframe of observations into an unmarked object to start fitting occupancy models occ_um &lt;- formatWide(occ_wide, type = &quot;unmarkedFrameOccu&quot;) # Fit a global model with all detection level covariates global_mod &lt;- occu(~ min_obs_started + julian_date + duration_minutes + effort_distance_km + number_observers + expertise ~ 1, data = occ_um) # Set up the cluster clusterType &lt;- if (length(find.package(&quot;snow&quot;, quiet = TRUE))) &quot;SOCK&quot; else &quot;PSOCK&quot; clust &lt;- try(makeCluster(getOption(&quot;cl.cores&quot;, 5), type = clusterType)) clusterEvalQ(clust, library(unmarked)) clusterExport(clust, &quot;occ_um&quot;) 
det_dred[[i]] &lt;- pdredge(global_mod, clust) names(det_dred)[i] &lt;- unique(dat.scaled$scientific_name)[i] # Get the top models, which we&#39;ll define as those with deltaAICc &lt; 4 top_det[[i]] &lt;- get.models(det_dred[[i]], subset = delta &lt; 4, cluster = clust) names(top_det)[i] &lt;- unique(dat.scaled$scientific_name)[i] # Obtaining model averaged coefficients if (length(top_det[[i]]) &gt; 1) { a &lt;- model.avg(top_det[[i]], fit = TRUE) det_avg[[i]] &lt;- as.data.frame(a$coefficients) names(det_avg)[i] &lt;- unique(dat.scaled$scientific_name)[i] det_modelEst[[i]] &lt;- data.frame( Coefficient = coefTable(a, full = T)[, 1], SE = coefTable(a, full = T)[, 2], lowerCI = confint(a)[, 1], upperCI = confint(a)[, 2], z_value = (summary(a)$coefmat.full)[, 3], Pr_z = (summary(a)$coefmat.full)[, 4] ) names(det_modelEst)[i] &lt;- unique(dat.scaled$scientific_name)[i] det_imp[[i]] &lt;- as.data.frame(MuMIn::importance(a)) names(det_imp)[i] &lt;- unique(dat.scaled$scientific_name)[i] } else { det_avg[[i]] &lt;- as.data.frame(unmarked::coef(top_det[[i]][[1]])) names(det_avg)[i] &lt;- unique(dat.scaled$scientific_name)[i] lowDet &lt;- data.frame(lowerCI = confint(top_det[[i]][[1]], type = &quot;det&quot;)[, 1]) upDet &lt;- data.frame(upperCI = confint(top_det[[i]][[1]], type = &quot;det&quot;)[, 2]) zDet &lt;- data.frame(summary(top_det[[i]][[1]])$det[, 3]) Pr_zDet &lt;- data.frame(summary(top_det[[i]][[1]])$det[, 4]) Coefficient &lt;- coefTable(top_det[[i]][[1]])[, 1] SE &lt;- coefTable(top_det[[i]][[1]])[, 2] det_modelEst[[i]] &lt;- data.frame( Coefficient = Coefficient[2:length(Coefficient)], SE = SE[2:length(SE)], lowerCI = lowDet, upperCI = upDet, z_value = zDet, Pr_z = Pr_zDet ) names(det_modelEst)[i] &lt;- unique(dat.scaled$scientific_name)[i] } setTxtProgressBar(pb, i) stopCluster(clust) } close(pb) ## Storing output from the above models in excel sheets # 1. 
Store all the model outputs for each species (variable: det_dred - see above) write.xlsx(det_dred, file = &quot;data/results/det-dred.xlsx&quot;) # 2. Store all the model averaged outputs for each species and the relative importance score write.xlsx(det_avg, file = &quot;data/results/det-avg.xlsx&quot;, rowNames = T, colNames = T) write.xlsx(det_imp, file = &quot;data/results/det-imp.xlsx&quot;, rowNames = T, colNames = T) write.xlsx(det_modelEst, file = &quot;data/results/det-modelEst.xlsx&quot;, rowNames = T, colNames = T) # Note: if you are unable to write to a file, first drop empty list elements, for example a &lt;- purrr::map(det_imp, ~ purrr::compact(.)) %&gt;% purrr::keep(~ length(.) != 0) 9.4 Land Cover and Climate Occupancy models estimate the probability of occurrence of a given species while controlling for the probability of detection, and allow us to model the factors affecting occurrence and detection independently. The flexible eBird observation process is the largest source of variation in the likelihood of detecting a particular species; hence, we included seven covariates that influence the probability of detection for each checklist: ordinal day of year, duration of observation, distance traveled, protocol type, time observations started, number of observers and the checklist calibration index (CCI). Using a multi-model information-theoretic approach, we tested how strongly our occurrence data fit our candidate set of environmental covariates (Burnham et al. 2002). We fitted single-species occupancy models for each species to simultaneously estimate a probability of detection (\\(p\\)) and a probability of occupancy (\\(\\psi\\)). For each species, we fitted models, each with a unique combination of the climate and land cover occupancy covariates and all seven detection covariates. Across the models tested for each species, the model with the highest support was determined using AICc scores. 
However, across the majority of the species, no single model had overwhelming support. Hence, for each species, we examined those models which had \\(\\Delta\\)AICc &lt; 4, as these top models were considered to explain a large proportion of the association between the species-specific probability of occupancy and environmental drivers (Burnham et al. 2011; Elsen et al. 2017). Using these restricted model sets for each species, we created a model-averaged coefficient estimate for each predictor and assessed its direction and significance. We considered a predictor to be significantly associated with occupancy if the 95% confidence interval around the model-averaged coefficient did not contain zero. # All models are stored in lists below lc_clim &lt;- list() # Subsetting those models whose deltaAIC&lt;4 (Burnham et al. 2011) top_lc_clim &lt;- list() # Getting model averaged coefficients and relative importance scores lc_clim_avg &lt;- list() lc_clim_imp &lt;- list() # Storing Model estimates lc_clim_modelEst &lt;- list() # Add a progress bar for the loop pb &lt;- txtProgressBar(min = 0, max = length(unique(dat.scaled$scientific_name)), style = 3) # text based bar for (i in 1:length(unique(dat.scaled$scientific_name))) { data &lt;- dat.scaled %&gt;% filter(dat.scaled$scientific_name == unique(dat.scaled$scientific_name)[i]) # Preparing data for the unmarked model occ &lt;- filter_repeat_visits(data, min_obs = 1, max_obs = 10, annual_closure = FALSE, n_days = 1488, # 1488 days (about 4 years) is considered a period of closure date_var = &quot;observation_date&quot;, site_vars = c(&quot;locality_id&quot;) ) obs_covs &lt;- c( &quot;min_obs_started&quot;, &quot;duration_minutes&quot;, &quot;effort_distance_km&quot;, &quot;number_observers&quot;, &quot;expertise&quot;, &quot;julian_date&quot; ) # format for unmarked occ_wide &lt;- format_unmarked_occu(occ, site_id = &quot;site&quot;, response = &quot;pres_abs&quot;, site_covs = c( &quot;locality_id&quot;, &quot;lc_01&quot;, 
&quot;lc_02&quot;, &quot;lc_05&quot;, &quot;lc_04&quot;, &quot;lc_09&quot;, &quot;lc_07&quot;, &quot;lc_03&quot;, &quot;bio4a&quot;, &quot;bio15a&quot; ), obs_covs = obs_covs ) # Convert this dataframe of observations into an unmarked object to start fitting occupancy models occ_um &lt;- formatWide(occ_wide, type = &quot;unmarkedFrameOccu&quot;) model_lc_clim &lt;- occu(~ min_obs_started + julian_date + duration_minutes + effort_distance_km + number_observers + expertise ~ lc_01 + lc_02 + lc_05 + lc_04 + lc_09 + lc_07 + lc_03 + bio4a + bio15a, data = occ_um) # Set up the cluster clusterType &lt;- if (length(find.package(&quot;snow&quot;, quiet = TRUE))) &quot;SOCK&quot; else &quot;PSOCK&quot; clust &lt;- try(makeCluster(getOption(&quot;cl.cores&quot;, 5), type = clusterType)) clusterEvalQ(clust, library(unmarked)) clusterExport(clust, &quot;occ_um&quot;) # Detection terms are fixed det_terms &lt;- c( &quot;p(duration_minutes)&quot;, &quot;p(effort_distance_km)&quot;, &quot;p(expertise)&quot;, &quot;p(julian_date)&quot;, &quot;p(min_obs_started)&quot;, &quot;p(number_observers)&quot; ) lc_clim[[i]] &lt;- pdredge(model_lc_clim, clust, fixed = det_terms) names(lc_clim)[i] &lt;- unique(dat.scaled$scientific_name)[i] # Identifying the top subset of models: those with deltaAICc less than 4 (Burnham et al. 2011) top_lc_clim[[i]] &lt;- get.models(lc_clim[[i]], subset = delta &lt; 4, cluster = clust) names(top_lc_clim)[i] &lt;- unique(dat.scaled$scientific_name)[i] # Obtaining model averaged coefficients for both candidate model subsets if (length(top_lc_clim[[i]]) &gt; 1) { a &lt;- model.avg(top_lc_clim[[i]], fit = TRUE) lc_clim_avg[[i]] &lt;- as.data.frame(a$coefficients) names(lc_clim_avg)[i] &lt;- unique(dat.scaled$scientific_name)[i] lc_clim_modelEst[[i]] &lt;- data.frame( Coefficient = coefTable(a, full = T)[, 1], SE = coefTable(a, full = T)[, 2], lowerCI = confint(a)[, 1], upperCI = confint(a)[, 2], z_value = (summary(a)$coefmat.full)[, 3], Pr_z = 
(summary(a)$coefmat.full)[, 4] ) names(lc_clim_modelEst)[i] &lt;- unique(dat.scaled$scientific_name)[i] lc_clim_imp[[i]] &lt;- as.data.frame(MuMIn::importance(a)) names(lc_clim_imp)[i] &lt;- unique(dat.scaled$scientific_name)[i] } else { lc_clim_avg[[i]] &lt;- as.data.frame(unmarked::coef(top_lc_clim[[i]][[1]])) names(lc_clim_avg)[i] &lt;- unique(dat.scaled$scientific_name)[i] lowSt &lt;- data.frame(lowerCI = confint(top_lc_clim[[i]][[1]], type = &quot;state&quot;)[, 1]) lowDet &lt;- data.frame(lowerCI = confint(top_lc_clim[[i]][[1]], type = &quot;det&quot;)[, 1]) upSt &lt;- data.frame(upperCI = confint(top_lc_clim[[i]][[1]], type = &quot;state&quot;)[, 2]) upDet &lt;- data.frame(upperCI = confint(top_lc_clim[[i]][[1]], type = &quot;det&quot;)[, 2]) zSt &lt;- data.frame(z_value = summary(top_lc_clim[[i]][[1]])$state[, 3]) zDet &lt;- data.frame(z_value = summary(top_lc_clim[[i]][[1]])$det[, 3]) Pr_zSt &lt;- data.frame(Pr_z = summary(top_lc_clim[[i]][[1]])$state[, 4]) Pr_zDet &lt;- data.frame(Pr_z = summary(top_lc_clim[[i]][[1]])$det[, 4]) lc_clim_modelEst[[i]] &lt;- data.frame( Coefficient = coefTable(top_lc_clim[[i]][[1]])[, 1], SE = coefTable(top_lc_clim[[i]][[1]])[, 2], lowerCI = rbind(lowSt, lowDet), upperCI = rbind(upSt, upDet), z_value = rbind(zSt, zDet), Pr_z = rbind(Pr_zSt, Pr_zDet) ) names(lc_clim_modelEst)[i] &lt;- unique(dat.scaled$scientific_name)[i] } gc() setTxtProgressBar(pb, i) stopCluster(clust) } close(pb) # 1. Store all the model outputs for each species (for both landcover and climate) write.xlsx(lc_clim, file = &quot;data/results/lc-clim.xlsx&quot;) # 2. Store all the model averaged outputs for each species and relative importance scores write.xlsx(lc_clim_avg, file = &quot;data/results/lc-clim-avg.xlsx&quot;, rowNames = T, colNames = T) write.xlsx(lc_clim_imp, file = &quot;data/results/lc-clim-imp.xlsx&quot;, rowNames = T, colNames = T) # 3. 
Store all model estimates write.xlsx(lc_clim_modelEst, file = &quot;data/results/lc-clim-modelEst.xlsx&quot;, rowNames = T, colNames = T) # Note: if you are unable to write to a file, first drop empty list elements, for example a &lt;- purrr::map(lc_clim_modelEst, ~ purrr::compact(.)) %&gt;% purrr::keep(~ length(.) != 0) 9.5 Goodness-of-fit tests Adequate model fit was assessed with a chi-square goodness-of-fit test based on 1,000 parametric bootstrap simulations of a global model that included all occupancy and detection covariates (MacKenzie and Bailey 2004). goodness_of_fit &lt;- data.frame() # Add a progress bar for the loop pb &lt;- txtProgressBar(min = 0, max = length(unique(dat.scaled$scientific_name)), style = 3) # text based bar for (i in 1:length(unique(dat.scaled$scientific_name))) { data &lt;- dat.scaled %&gt;% filter(dat.scaled$scientific_name == unique(dat.scaled$scientific_name)[i]) # Preparing data for the unmarked model occ &lt;- filter_repeat_visits(data, min_obs = 1, max_obs = 10, annual_closure = FALSE, n_days = 1488, # 1488 days (about 4 years) is considered a period of closure date_var = &quot;observation_date&quot;, site_vars = c(&quot;locality_id&quot;) ) obs_covs &lt;- c( &quot;min_obs_started&quot;, &quot;duration_minutes&quot;, &quot;effort_distance_km&quot;, &quot;number_observers&quot;, &quot;protocol_type&quot;, &quot;expertise&quot;, &quot;julian_date&quot; ) # format for unmarked occ_wide &lt;- format_unmarked_occu(occ, site_id = &quot;site&quot;, response = &quot;pres_abs&quot;, site_covs = c( &quot;locality_id&quot;, &quot;lc_01&quot;, &quot;lc_02&quot;, &quot;lc_05&quot;, &quot;lc_04&quot;, &quot;lc_09&quot;, &quot;lc_07&quot;, &quot;lc_03&quot;, &quot;bio4a&quot;, &quot;bio15a&quot; ), obs_covs = obs_covs ) # Convert this dataframe of observations into an unmarked object to start fitting occupancy models occ_um &lt;- formatWide(occ_wide, type = &quot;unmarkedFrameOccu&quot;) model_lc_clim &lt;- occu(~ min_obs_started + julian_date + duration_minutes + effort_distance_km + 
number_observers + protocol_type + expertise ~ lc_01 + lc_02 + lc_05 + lc_04 + lc_09 + lc_07 + lc_03 + bio4a + bio15a, data = occ_um) # note: reduce nsim as this takes a very long time even with parallelization occ_gof &lt;- mb.gof.test(model_lc_clim, nsim = 1000, parallel = T, ncores = 5, plot.hist = FALSE ) p.value &lt;- occ_gof$p.value c.hat &lt;- occ_gof$c.hat.est scientific_name &lt;- unique(data$scientific_name) a &lt;- data.frame(scientific_name, p.value, c.hat) goodness_of_fit &lt;- rbind(a, goodness_of_fit) setTxtProgressBar(pb, i) } close(pb) write.csv(goodness_of_fit, &quot;data/results/goodness-of-fit-2.5km.csv&quot;, row.names = F) "],["visualizing-occupancy-predictor-effects.html", "Section 10 Visualizing Occupancy Predictor Effects 10.1 Prepare libraries 10.2 Load species list 10.3 Show AIC weight importance 10.4 Prepare model coefficient data 10.5 Get predictor effects", " Section 10 Visualizing Occupancy Predictor Effects In this section, we will visualize the magnitude and direction of species-specific probability of occupancy. 
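The direction counts shown in this section (10.5) reduce to a simple tally. Below is a base-R sketch on a handful of hypothetical significant coefficients: each association is labelled positive or negative by the sign of its coefficient, species are counted per predictor, habitat and direction, and negative counts are turned into negative bar lengths, mirroring the mag column computed with dplyr in 10.5. The species, habitats and coefficient values are made up for illustration.

```r
# hypothetical significant coefficients (one row per species and predictor)
effects = data.frame(
  scientific_name = c('sp A', 'sp B', 'sp C', 'sp D', 'sp A', 'sp C', 'sp E'),
  predictor = c('bio4a', 'bio4a', 'bio4a', 'bio4a', 'lc_02', 'lc_02', 'lc_02'),
  habitat = c('Forest', 'Forest', 'Generalist', 'Forest',
              'Forest', 'Generalist', 'Generalist'),
  coefficient = c(0.8, 1.1, -0.4, -1.2, -0.5, -0.9, -0.3)
)

# label each association by the sign of its coefficient
effects$effect = ifelse(sign(effects$coefficient) == 1, 'positive', 'negative')

# number of species per predictor, habitat and direction
tally = aggregate(scientific_name ~ predictor + habitat + effect,
                  data = effects, FUN = length)
names(tally)[names(tally) == 'scientific_name'] = 'n_species'

# negative associations become negative bar lengths
tally$magnitude = ifelse(tally$effect == 'positive', 1, -1) * tally$n_species
tally = tally[order(tally$predictor, tally$habitat, tally$effect), ]
tally
```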
10.1 Prepare libraries # to load data library(readxl) # to handle data library(dplyr) library(readr) library(forcats) library(tidyr) library(purrr) library(stringr) # library(data.table) # to wrangle models source(&quot;R/fun_model_estimate_collection.r&quot;) source(&quot;R/fun_make_resp_data.r&quot;) # nice tables library(knitr) library(kableExtra) # plotting library(ggplot2) library(patchwork) source(&quot;R/fun_plot_interaction.r&quot;) 10.2 Load species list # list of species # Removing species after running a chi-square goodness of fit test species &lt;- read_csv(&quot;data/species_list.csv&quot;) %&gt;% filter(!scientific_name %in% c( &quot;Treron affinis&quot;, &quot;Prinia hodgsonii&quot;, &quot;Pellorneum ruficeps&quot;, &quot;Hypothymis azurea&quot;,&quot;Dendrocitta leucogastra&quot;,&quot;Chalcophaps indica&quot;, &quot;Rubigula gularis&quot;, &quot;Muscicapa dauurica&quot;, &quot;Geokichla citrina&quot;, &quot;Chrysocolaptes guttacristatus&quot;,&quot;Terpsiphone paradisi&quot;,&quot;Orthotomus sutorius&quot;, &quot;Oriolus kundoo&quot;, &quot;Dicrurus aeneus&quot;, &quot;Cyornis tickelliae&quot;, &quot;Copsychus fulicatus&quot;, &quot;Oriolus xanthornus&quot;, &quot;Alcippe poioicephala&quot;, &quot;Ficedula nigrorufa&quot;,&quot;Dendrocitta vagabunda&quot;, &quot;Dicrurus paradiseus&quot;, &quot;Ocyceros griseus&quot;, &quot;Psilopogon viridis&quot;, &quot;Psittacula cyanocephala&quot;)) list_of_species &lt;- as.character(species$scientific_name) 10.3 Show AIC weight importance To get cumulative AIC weights, we first obtained a measure of relative importance of climatic and landscape predictors by calculating cumulative variable importance scores. These scores were calculated by obtaining the sum of model weights (AIC weights) across all models (including the top models) for each predictor across all species. We then calculated the mean cumulative variable importance score and a standard deviation for each predictor. 
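For reference, the Akaike weight of model i is exp(-delta_i / 2) divided by the sum of exp(-delta_k / 2) over all models in the set, where delta_i is the model's AICc difference from the best model; a predictor's cumulative importance is the sum of the weights of the models that contain it. The base-R sketch below computes these scores for two hypothetical species with made-up AICc values, then takes the mean and standard deviation across species as in 10.3.2. The helper names akaike_weights() and importance_scores() are illustrative; in the workflow itself the scores are read from lc-clim-imp.xlsx, which was written with MuMIn::importance().

```r
# Akaike weights: relative support of each model within a candidate set
akaike_weights = function(aicc) {
  delta = aicc - min(aicc)
  w = exp(-delta / 2)
  w / sum(w)
}

# cumulative importance: summed weight of the models containing each predictor
importance_scores = function(models, aicc) {
  w = akaike_weights(aicc)
  preds = sort(unique(unlist(models)))
  sapply(preds, function(p) {
    has_p = vapply(models, function(m) p %in% m, logical(1))
    sum(w[has_p])
  })
}

# hypothetical candidate set: occupancy terms in each model
models = list('bio4a', c('bio4a', 'lc_02'), 'lc_02', character(0))

# hypothetical AICc values for two species fitted with the same candidate set
imp = rbind(
  sp1 = importance_scores(models, aicc = c(100, 101, 104, 110)),
  sp2 = importance_scores(models, aicc = c(105, 100, 102, 108))
)

# mean and spread of the scores across species
data.frame(
  predictor = colnames(imp),
  mean_AIC = colMeans(imp),
  sd_AIC = apply(imp, 2, sd),
  row.names = NULL
)
```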
10.3.1 Read in AIC weight data # which files to read file_names &lt;- c(&quot;data/results/lc-clim-imp.xlsx&quot;) # read in sheets by species model_imp &lt;- map(file_names, function(f) { md_list &lt;- map(list_of_species, function(sn) { # some sheets are not found tryCatch( { readxl::read_excel(f, sheet = sn) %&gt;% `colnames&lt;-`(c(&quot;predictor&quot;, &quot;AIC_weight&quot;)) %&gt;% filter(str_detect(predictor, &quot;psi&quot;)) %&gt;% mutate( predictor = stringr::str_extract(predictor, pattern = stringr::regex(&quot;\\\\((.*?)\\\\)&quot;) ), predictor = stringr::str_replace_all(predictor, &quot;[//(//)]&quot;, &quot;&quot;), predictor = stringr::str_remove(predictor, &quot;\\\\.y&quot;) ) }, error = function(e) { message(as.character(e)) } ) }) names(md_list) &lt;- list_of_species return(md_list) }) 10.3.2 Prepare cumulative AIC weight data # bind rows model_imp &lt;- map(model_imp, bind_rows) %&gt;% bind_rows() # convert to numeric model_imp$AIC_weight &lt;- as.numeric(model_imp$AIC_weight) # Let&#39;s get a summary of cumulative variable importance model_imp &lt;- group_by(model_imp, predictor) %&gt;% summarise( mean_AIC = mean(AIC_weight), sd_AIC = sd(AIC_weight), min_AIC = min(AIC_weight), max_AIC = max(AIC_weight), med_AIC = median(AIC_weight) ) # write to file write_csv(model_imp, &quot;data/results/cumulative_AIC_weights.csv&quot;) Read data back in. 
# read data and make factor model_imp &lt;- read_csv(&quot;data/results/cumulative_AIC_weights.csv&quot;) model_imp$predictor &lt;- as_factor(model_imp$predictor) # make nice names predictor_name &lt;- tibble( predictor = levels(model_imp$predictor), pred_name = c( &quot;Precipitation seasonality&quot;, &quot;Temperature seasonality&quot;, &quot;% Evergreen Forest&quot;, &quot;% Deciduous Forest&quot;, &quot;% Mixed/Degraded Forest&quot;, &quot;% Agriculture/Settlements&quot;, &quot;% Grassland&quot;, &quot;% Plantations&quot;, &quot;% Water Bodies&quot; ) ) # rename predictor model_imp &lt;- left_join(model_imp, predictor_name) Prepare figure for cumulative AIC weight. Figure code is hidden in versions rendered as HTML and PDF. 10.4 Prepare model coefficient data For each species, we examined those models which had \\(\\Delta\\)AICc &lt; 4, as these top models were considered to explain a large proportion of the association between the species-specific probability of occupancy and environmental drivers. Using these restricted model sets for each species, we created a model-averaged coefficient estimate for each predictor and assessed its direction and significance. We considered a predictor to be significantly associated with occupancy if the 95% confidence interval around the model-averaged coefficient did not contain zero. 
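To make the averaging and significance rule concrete, here is a base-R sketch for one hypothetical predictor in one hypothetical species. It keeps models with delta AICc below 4, renormalises the Akaike weights within that set, computes a full (zero-filled) model-averaged coefficient with an unconditional standard error that adds the between-model spread of estimates to each model's own variance, and flags the predictor as significant only if the 95% normal-approximation interval excludes zero. MuMIn::model.avg() and its confint() method, which the workflow uses, may compute standard errors and intervals somewhat differently, so treat this as an illustration of the idea rather than a re-implementation.

```r
# hypothetical candidate set for one species and one predictor;
# beta and se are 0 for the model that does not include the predictor
fits = data.frame(
  model = c('m1', 'm2', 'm3', 'm4'),
  aicc = c(200.0, 201.5, 203.2, 209.0),
  beta = c(0.90, 0.75, 0, 0.40),
  se = c(0.30, 0.35, 0, 0.50)
)

# 1. keep models with delta AICc below 4 (4 - delta is positive)
fits$delta = fits$aicc - min(fits$aicc)
top = fits[sign(4 - fits$delta) == 1, ]

# 2. Akaike weights recomputed within the top set
top$w = exp(-top$delta / 2) / sum(exp(-top$delta / 2))

# 3. full (zero-filled) model-averaged coefficient
beta_avg = sum(top$w * top$beta)

# 4. unconditional SE: within-model variance plus between-model spread
se_avg = sqrt(sum(top$w * (top$se^2 + (top$beta - beta_avg)^2)))

# 5. 95% normal-approximation interval; significant if it excludes zero,
#    i.e. both bounds have the same non-zero sign
ci = beta_avg + c(-1, 1) * qnorm(0.975) * se_avg
significant = sign(ci[1]) == sign(ci[2])
c(estimate = beta_avg, se = se_avg, lower = ci[1], upper = ci[2])
significant
```

In this toy case the two best models each give an estimate well away from zero, yet the averaged interval spans zero, because the top set also contains a model without the predictor, which both pulls the averaged estimate towards zero and inflates its unconditional standard error.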
file_read &lt;- c(&quot;data/results/lc-clim-modelEst.xlsx&quot;) # read data as list column model_est &lt;- map(list_of_species, function(sn) { tryCatch( { readxl::read_excel(file_read, sheet = sn) %&gt;% rename(predictor = &quot;...1&quot;) %&gt;% filter(str_detect(predictor, &quot;psi&quot;)) %&gt;% mutate( predictor = stringr::str_extract(predictor, pattern = stringr::regex(&quot;\\\\((.*?)\\\\)&quot;) ), predictor = stringr::str_replace_all(predictor, &quot;[//(//)]&quot;, &quot;&quot;), predictor = stringr::str_remove(predictor, &quot;\\\\.y&quot;) ) }, error = function(e) { message(as.character(e)) } ) }) # assign names names(model_est) &lt;- list_of_species # prepare model data model_data &lt;- tibble( scientific_name = list_of_species ) # remove null data model_est &lt;- keep(model_est, .p = function(x) !is.null(x)) # rename model data components and separate predictors names &lt;- c( &quot;predictor&quot;, &quot;coefficient&quot;, &quot;se&quot;, &quot;ci_lower&quot;, &quot;ci_higher&quot;, &quot;z_value&quot;, &quot;p_value&quot; ) # get data for plotting: model_est &lt;- map(model_est, function(df) { colnames(df) &lt;- names # df &lt;- separate_interaction_terms(df) # df &lt;- make_response_data(df) return(df) }) # add names and scales model_est &lt;- imap(model_est, function(.x, .y) { mutate(.x, scientific_name = .y) }) # remove modulators model_est &lt;- bind_rows(model_est) %&gt;% dplyr::select(-matches(&quot;modulator&quot;)) # join data to species name model_data &lt;- model_data %&gt;% left_join(model_est) # Keep only those predictors whose p-values are significant: model_data &lt;- model_data %&gt;% filter(p_value &lt; 0.05) %&gt;% filter(predictor != &quot;Int&quot;) Export predictor effects. # get predictor effect data data_predictor_effect &lt;- distinct( model_data, scientific_name, se, predictor, coefficient ) # write to file write_csv(data_predictor_effect, &quot;data/results/data_predictor_effect.csv&quot;) Export model data. 
model_data_to_file &lt;- model_data %&gt;% dplyr::select( predictor, scientific_name ) # remove .y model_data_to_file &lt;- model_data_to_file %&gt;% mutate(predictor = str_remove(predictor, &quot;\\\\.y&quot;)) write_csv( model_data_to_file, &quot;data/results/data_occupancy_predictors.csv&quot; ) Read in data after clearing R session. # first merge species trait data with significant predictor species_trait &lt;- read.csv(&quot;data/species-trait-dat.csv&quot;) sig_predictor &lt;- read.csv(&quot;data/results/data_predictor_effect.csv&quot;) merged_species_traits &lt;- inner_join(sig_predictor, species_trait, by = c(&quot;scientific_name&quot; = &quot;scientific_name&quot;) ) write_csv( merged_species_traits, &quot;data/results/results-predictors-species-traits.csv&quot; ) # read from file model_data &lt;- read_csv(&quot;data/results/results-predictors-species-traits.csv&quot;) Fix predictor name. # remove a literal .y suffix from predictors model_data &lt;- model_data %&gt;% mutate_at(.vars = c(&quot;predictor&quot;), .funs = function(x) { stringr::str_remove(x, &quot;\\\\.y&quot;) }) 10.5 Get predictor effects # is the coeff positive? how many positive per scale per predictor per axis of split? # now splitting by habitat --- forest or open country data_predictor &lt;- mutate(model_data, direction = coefficient &gt; 0 ) %&gt;% filter(predictor != &quot;Int&quot;, predictor != &quot;Ibio4^2&quot; &amp; # If you had squared terms predictor != &quot;Ibio15^2&quot;) %&gt;% rename(habitat = &quot;Habitat.type&quot;) %&gt;% count( predictor, habitat, direction ) %&gt;% mutate(mag = n * (if_else(direction, 1, -1))) # wrangle data to get nice bars data_predictor &lt;- data_predictor %&gt;% dplyr::select(-n) %&gt;% drop_na(direction) %&gt;% mutate(direction = ifelse(direction, &quot;positive&quot;, &quot;negative&quot;)) %&gt;% pivot_wider(values_from = &quot;mag&quot;, names_from = &quot;direction&quot;) %&gt;% mutate_at( vars(positive, negative), ~ if_else(is.na(.), 0, .) 
) data_predictor_long &lt;- data_predictor %&gt;% pivot_longer( cols = c(&quot;negative&quot;, &quot;positive&quot;), names_to = &quot;effect&quot;, values_to = &quot;magnitude&quot; ) # write write_csv( data_predictor_long, &quot;data/results/data_predictor_direction_nSpecies.csv&quot; ) Prepare data to determine the direction (positive or negative) of the effect of each predictor. How many species are affected in either direction? # join with predictor names and relative AIC data_predictor_long &lt;- left_join(data_predictor_long, model_imp) Prepare figure of the number of species affected in each direction. Figure code is hidden in versions rendered as HTML and PDF. Environmental predictors and species-specific associations The direction of association between species-specific probability of occupancy and climatic and landscape predictors is shown here (as a function of habitat preference). Blue colors show the number of species that are positively associated with a climatic/landscape predictor while red colors show the number of species that are negatively associated with a climatic/landscape predictor (see Table 1 for the number of forest/generalist species that show positive/negative association with each of the predictors). "],["predicting-species-specific-occupancy-as-a-function-of-significant-predictors.html", "Section 11 Predicting species-specific occupancy as a function of significant predictors 11.1 Prepare libraries 11.2 Read data 11.3 Prepare predictor data 11.4 Get predictor responses 11.5 Get probability of occupancy 11.6 Add scaling for predictors 11.7 Mapping species occupancy", " Section 11 Predicting species-specific occupancy as a function of significant predictors This script plots species-specific probabilities of occupancy as a function of significant environmental predictors and maps occupancy across the study area for a given list of species and significant predictors. 
11.1 Prepare libraries # to handle data library(dplyr) library(readr) library(tidyr) library(purrr) library(stringr) library(glue) # library(data.table) # plotting library(ggplot2) library(patchwork) 11.2 Read data # read coefficient effect data data &lt;- read_csv(&quot;data/results/data_predictor_effect.csv&quot;) # check for a predictor column assertthat::assert_that( all(c(&quot;predictor&quot;, &quot;coefficient&quot;, &quot;se&quot;) %in% colnames(data)), msg = &quot;make_response_data: data must have columns called &#39;predictor&#39;, &#39;coefficient&#39;, and &#39;se&#39;&quot; ) 11.3 Prepare predictor data # preparep predictors - now look only for any digits predictors &lt;- c(&quot;bio\\\\d+&quot;, glue(&quot;lc_0{seq(9)}&quot;)) # prepare predictor search strings and scaling power preds &lt;- glue(&quot;({predictors})&quot;) preds &lt;- str_flatten(preds, collapse = &quot;|&quot;) # some way of identifying square terms power &lt;- (str_extract(data$predictor, &quot;Ibio&quot;)) power[!is.na(power)] = 2 power[is.na(power)] &lt;- 1 power = as.numeric(power) # assign predictor name and power data &lt;- mutate( data, predictor = str_extract(predictor, preds), power = power ) 11.4 Get predictor responses # make predictor sequences data &lt;- mutate( data, pred_val = map( predictor, function(x) { seq(0, 1, 0.05) } ), #handle squared terms pred_val_pow = purrr::map2( pred_val, power, function(x, y) { x^y } )) # get coefficient and error times terms data_resp &lt;- mutate( data, response = map2( pred_val_pow, coefficient, function(x, y) { x * y } ), resp_var = map2( pred_val_pow, se, function(x, y) { x * y } ) ) 11.5 Get probability of occupancy # unnest and get responses data_resp &lt;- unnest( data_resp, cols = c(&quot;response&quot;, &quot;resp_var&quot;, &quot;pred_val&quot;) ) # get responses for quadratic terms data_resp &lt;- group_by( data_resp, scientific_name, predictor, pred_val ) %&gt;% dplyr::select(-power, -coefficient, -se) %&gt;% summarise( 
across( .cols = c(&quot;response&quot;, &quot;resp_var&quot;), .fns = sum ), .groups = &quot;keep&quot; ) # get probability of occupancy data_resp &lt;- ungroup( data_resp ) %&gt;% mutate( p_occu = 1 / (1 + exp(-response)), p_occu_low = 1 / (1 + exp(-(response - resp_var))), p_occu_high = 1 / (1 + exp(-(response + resp_var))) ) 11.6 Add scaling for predictors # scale predictors scale15 &lt;- c(30, 50) # range of precipitation scale4 &lt;- c(0, 1) # range of temperatures # scale bio4a and bio15a by actual values data_resp &lt;- mutate( data_resp, pred_val = case_when( predictor == &quot;bio4&quot; ~ scales::rescale(pred_val, to = scale4), predictor == &quot;bio15&quot; ~ scales::rescale(pred_val, to = scale15), T ~ pred_val ) ) # make long data_poccu &lt;- dplyr::select( data_resp, -response, -resp_var ) # select species soi &lt;- c(&quot;Irena puella&quot;,&quot;Leptocoma minima&quot;, &quot;Merops leschenaulti&quot;,&quot;Myophonus horsfieldii&quot;) which_predictors &lt;- c(&quot;bio4&quot;) 11.6.1 Figure: Occupancy ~ predictors data_fig &lt;- data_poccu %&gt;% filter( scientific_name %in% soi, predictor %in% which_predictors ) %&gt;% mutate( cat = case_when( scientific_name %in% c(&quot;Irena puella&quot;,&quot;Leptocoma minima&quot;) ~ &quot;forest&quot;, T ~ &quot;general&quot; ) ) # split data data_fig &lt;- nest( data_fig, -cat ) # make plots make_occu_fig &lt;- function(df, this_fill) { ggplot( df ) + geom_ribbon( aes( pred_val, ymin = p_occu_low, ymax = p_occu_high ), fill = this_fill, alpha = 0.5 ) + geom_line( aes( pred_val, p_occu ), size = 1 ) + facet_grid( ~scientific_name ) + theme_test( base_family = &quot;Arial&quot; ) + theme( strip.text = element_text( face = &quot;italic&quot; ) ) + labs( x = &quot;Temperature seasonality&quot;, y = &quot;Probability of occupancy&quot; ) } fig_occu &lt;- map2(data_fig$data, &quot;grey&quot;, make_occu_fig) fig_occu &lt;- wrap_plots( fig_occu[c(1, 2)], ncol = 1, nrow = 2 ) &amp; theme( plot.tag = element_text( 
face = &quot;bold&quot; ) ) # save figure ggsave( fig_occu, filename = &quot;figs/fig_05.png&quot;, width = 5, height = 5.5 ) Probability of occupancy as a function of temperature seasonality. Predicted probability of occupancy curves as a function of temperature seasonality for four forest species are shown here. Temperature seasonality is negatively associated with the probability of occupancy of several forest species including the asian fairy-bluebird Irena puella, the crimson-backed sunbird Leptocoma minima, the chestnut-headed bee-eater Merops leschenaulti and the Malabar whistling-thrush Myophonus horsfieldii. 11.6.2 Figures: Occupancy ~ predictors for all species data_fig &lt;- nest( data_poccu, -scientific_name, -predictor ) pred_names &lt;- c( &quot;bio4&quot; = &quot;Temp. seasonality&quot;, &quot;bio15&quot; = &quot;Precip. seasonality&quot;, &quot;lc_01&quot; = &quot;Evergreen&quot;, &quot;lc_02&quot; = &quot;Deciduous&quot;, &quot;lc_03&quot; = &quot;Mixed/degraded&quot;, &quot;lc_04&quot; = &quot;Agri./Settl.&quot;, &quot;lc_05&quot; = &quot;Grassland&quot;, &quot;lc_07&quot; = &quot;Plantation&quot;, &quot;lc_09&quot; = &quot;Water&quot; ) pred_names &lt;- tibble( name = pred_names, predictor = names(pred_names) ) data_fig &lt;- left_join( data_fig, pred_names ) data_fig &lt;- mutate( data_fig, plots = map( data, function(df) { ggplot(df) + geom_ribbon( aes( pred_val, ymin = p_occu_low, ymax = p_occu_high ), fill = &quot;grey&quot;, alpha = 0.5 ) + geom_line( aes( pred_val, p_occu ) ) + coord_cartesian( ylim = c(0, 1) ) + theme_test( base_family = &quot;Arial&quot; ) + labs( x = &quot;Predictor&quot;, y = &quot;p(Occupancy)&quot; ) } ) ) # add names data_fig &lt;- mutate( data_fig, plots = map2( plots, name, function(p, name) { p &lt;- p + labs( x = name ) } ) ) # summarise as patchwork data_fig &lt;- group_by( data_fig, scientific_name ) %&gt;% summarise( plots = list( wrap_plots( plots, ncol = 5 ) ) ) # add title as sp data_fig &lt;- mutate( 
data_fig, plots = map2( plots, scientific_name, function(p, name) { p &lt;- p &amp; plot_annotation( title = name ) } ) ) # save images cairo_pdf( filename = &quot;figs/fig_occupancy_predictors.pdf&quot;, onefile = TRUE, width = 10, height = 2 ) data_fig$plots dev.off() 11.7 Mapping species occupancy 11.7.1 Read in raster layers library(terra) library(sf) # read saved rasters lscape = rast(&quot;data/spatial/landscape_resamp01_km.tif&quot;) # isolate temperature and rainfall bio4 = lscape[[4]] bio15 = lscape[[5]] # rain # careful while loading this raster, large size landcover &lt;- rast(&quot;data/landUseClassification/landcover_roy_2015_reclassified.tif&quot;) lc_1km &lt;- rast(&quot;data/landUseClassification/lc_01000m.tif&quot;) 11.7.2 Split landcover into proportions per 1km # separate the fine-scale landcover raster into presence-absence of each class lc_split &lt;- segregate(landcover) # resample to 1km # bilinear resampling uses the mean function. # mean of N 0s and 1s is the proportion of 1s, ie, proportion of each landcover lc_split &lt;- terra::resample( lc_split, lc_1km, method = &quot;bilinear&quot; ) # rename rasters names(lc_split) &lt;- pred_names$name[-c(1, 2)] # save raster of landcover proportion terra::writeRaster( lc_split, filename = &quot;data/spatial/raster_landcover_proportion_1km.tif&quot;, overwrite=TRUE ) rm(landcover) gc() # plot proportion of landcover classes png(width = 1200 * 2, height = 1200 * 2, filename = &quot;figs/fig_landcover_proportion_1km.png&quot;, res = 300) plot( lc_split, col = colorspace::sequential_hcl(20, palette = &quot;Viridis&quot;), range = c(0, 1) ) dev.off() 11.7.3 Prepare climatic layers # load landcover split lc_split = terra::rast(&quot;data/spatial/raster_landcover_proportion_1km.tif&quot;) 11.7.4 Mask by study area # mask by hills # run only if required (makes more sense to map to a larger area) hills = st_read(&quot;data/spatial/hillsShapefile/Nil_Ana_Pal.shp&quot;) %&gt;% st_transform(32643) #bio_1 = 
terra::mask( # bio_1, # vect(hills) #) #bio_12 = terra::mask( # bio_12, # vect(hills) #) # get ranges range4 &lt;- terra::minmax(bio4)[, 1] range15 &lt;- terra::minmax(bio15)[, 1] # rescale bio4 &lt;- (bio4 - min(range4)) / (diff(range4)) bio15 &lt;- (bio15 - min(range15)) / (diff(range15)) # project to UTM climate &lt;- c(bio4, bio15) names(climate) = c( &quot;Temp. seasonality&quot;, &quot;Precip. seasonality&quot; ) climate &lt;- terra::project( x = climate, y = lc_1km ) # make squared terms climate2 &lt;- climate * climate # names names(climate2) = glue(&quot;{names(climate)} 2&quot;) # add to landcover proportions and plot landscape &lt;- c(climate, lc_split) 11.7.5 Plot full bounds of landscape variables # plot proportion of landcover classes png( width = 1200 * 2, height = 1200 * 2, filename = &quot;figs/fig_landscape_1km.png&quot;, res = 300 ) plot( landscape, col = colorspace::sequential_hcl(20, palette = &quot;agSunset&quot;, rev = T), range = c(0, 1) ) dev.off() # add squared terms landscape &lt;- c( climate, climate2, lc_split ) #landscape = terra::mask( # landscape, # vect(hills) #) 11.7.6 Prepare soi predictors Prepare the soi predictor coefficients as a vector of the same length as the number of raster layers. These will be multiplied with each layer to give the effect of each layer. 
# get soi coefs sp_coefs &lt;- filter( data ) %&gt;% dplyr::select( -pred_val, -pred_val_pow ) # add missing landcover classes sp_preds &lt;- crossing( scientific_name = soi, predictor = pred_names$predictor, power = c(1, 2) ) # remove squared terms for landcover sp_preds &lt;- filter( sp_preds, !(str_detect(predictor, &quot;lc&quot;) &amp; power == 2) ) # correct square LC terms sp_coefs = mutate( sp_coefs, power = if_else( str_detect(predictor, &quot;lc&quot;), 1, power ) ) sp_coefs &lt;- full_join( sp_coefs, sp_preds ) # make wide --- this should give no warnings sp_coefs &lt;- pivot_wider( sp_coefs, id_cols = c(&quot;scientific_name&quot;), names_from = c(&quot;predictor&quot;, &quot;power&quot;), values_from = &quot;coefficient&quot; ) # get into order sp_coefs &lt;- dplyr::select( sp_coefs, scientific_name, c( &quot;bio4_1&quot;, &quot;bio15_1&quot;, &quot;bio4_2&quot;, &quot;bio15_2&quot; ), matches(&quot;lc&quot;) ) # get vectors of coefficients sp_coefs &lt;- nest( sp_coefs, -scientific_name ) 11.7.7 Prepare species occupancy for SOI Here, we shall simply multiply each landscape layer with the corresponding predictor coefficient. Where these are not available, we shall simply multiply the corresponding layer with NA. The resulting layers will be summed together to get a single response layer, which will then be inverse logit transformed to get the probability of occupancy. 
# multiply coefficients with layers soi_occu &lt;- map( sp_coefs[sp_coefs$scientific_name %in% soi, ]$data, .f = function(df) { response &lt;- unlist(slice(df, 1), use.names = F) * landscape response &lt;- sum(response, na.rm = TRUE) # remove NA layers, i.e., non-sig preds # now transform for probability occupancy response &lt;- 1 / (1 + exp(-response)) } ) # assign names names(soi_occu) &lt;- soi # make single stack soi_occu &lt;- reduce(soi_occu, c) names(soi_occu) &lt;- c(&quot;Irena puella&quot;,&quot;Leptocoma minima&quot;,&quot;Merops leschenaulti&quot;,&quot;Myophonus horsfieldii&quot;) # use stars for plotting with ggplot library(stars) library(colorspace) soi_occu &lt;- st_as_stars(soi_occu) fig_occu_map &lt;- ggplot() + geom_stars( data = soi_occu ) + scale_fill_binned_sequential( palette = &quot;Purple-Yellow&quot;, name = &quot;Probability of Occupancy&quot;, rev = T, limits = c(0, 1), breaks = seq(0, 1, 0.1), na.value = &quot;grey99&quot;, show.limits = T ) + facet_wrap( ~band, labeller = labeller( band = function(x) str_replace(x, &quot;\\\\.&quot;, &quot; &quot;) ) ) + coord_sf( crs = 32643, expand = FALSE ) + theme_test() + theme( # legend.position = &quot;rg&quot;, axis.title = element_blank(), axis.text = element_blank(), legend.key.height = unit(10, &quot;mm&quot;), legend.key.width = unit(1, &quot;mm&quot;), strip.text = element_text( face = &quot;italic&quot; ), legend.title = element_text( vjust = 1 ) ) # save figure ggsave( fig_occu_map, filename = &quot;figs/fig_06.png&quot;, width = 6, height = 6 ) Predicted area of occurrence Predicted area of occurrence for four forest species are shown here. The probability of occupancy of the asian fairy-bluebird Irena puella, the crimson-backed sunbird Leptocoma minima and the chestnut-headed bee-eater Merops leschenaulti is higher across the western slopes and at mid-elevations across our study area. 
The Malabar whistling-thrush Myophonus horsfieldii has a higher probability of occupancy across mid-elevations throughout the study area examined. 11.7.8 Prepare species occupancy for all species # multiply coefficients with layers sp_occu &lt;- map( sp_coefs$data, .f = function(df) { response &lt;- unlist(slice(df, 1), use.names = F) * landscape response &lt;- sum(response, na.rm = TRUE) # remove NA layers, i.e., non-sig preds # now transform for probability occupancy response &lt;- 1 / (1 + exp(-response)) } ) # make single stack sp_occu &lt;- reduce(sp_occu, c) # assign names names(sp_occu) &lt;- sp_coefs$scientific_name # use stars for plotting with ggplot library(stars) library(colorspace) sp_occu &lt;- st_as_stars(sp_occu) fig_occu_map_all &lt;- ggplot() + geom_stars( data = sp_occu ) + scale_fill_binned_sequential( palette = &quot;Purple-Yellow&quot;, name = &quot;p(Occu.)&quot;, rev = T, limits = c(0, 1), na.value = &quot;grey99&quot;, breaks = seq(0, 1, 0.1), show.limits = T ) + facet_wrap( ~band, labeller = labeller( band = function(x) str_replace(x, &quot;\\\\.&quot;, &quot; &quot;) ) ) + coord_sf( crs = 32643, expand = FALSE ) + theme_test( base_size = 8 ) + theme( # legend.position = &quot;rg&quot;, axis.title = element_blank(), axis.text = element_blank(), legend.key.height = unit(10, &quot;mm&quot;), legend.key.width = unit(1, &quot;mm&quot;), strip.text = element_text( face = &quot;italic&quot; ), strip.background = element_blank(), legend.title = element_text( vjust = 1 ) ) # save figure ggsave( fig_occu_map_all, filename = &quot;figs/fig_occupancy_maps.png&quot;, width = 16, height = 16 ) "],["references.html", "Section 12 References", " Section 12 References Aiello-Lammens, M. E. et al. 2015. spThin: an R package for spatial thinning of species occurrence records for use in ecological niche models. - Ecography 38: 541545. Ali, S. and Ripley, S. D. 1983. Handbook of the birds of India and Pakistan. - Oxford University Press. Anand, M. O. et al. 
2010. Sustaining biodiversity conservation in human-modified landscapes in the Western Ghats: Remnant forests matter. - Biological Conservation 143: 23632374. Arasumani, M. et al. 2018. Not seeing the grass for the trees: plantations and agriculture shrink tropical montane grassland by two-thirds over four decades in the Palani Hills, a Western Ghats Sky Island. - PloS ONE 13: 118. Barto, K. 2009. MuMIn: multi-model inference. Barve, S. et al. 2021. Elevation and body size drive convergent variation in thermoinsulative feather structure of Himalayan birds. - Ecography 44: 680689. Boyle, W. A. et al. 2020. Hygric Niches for Tropical Endotherms. - Trends in Ecology and Evolution xx: 115. Burnham, K. P. and Anderson, D. R. 2002. Model selection and multimodel inference: a practical information-theoretic approach. - Springer. Burnham, K. P. et al. 2011. AIC model selection and multimodel inference in behavioral ecology: Some background, observations, and comparisons. - Behavioral Ecology and Sociobiology 65: 2335. Butt, N. et al. 2015. Cascading effects of climate extremes on vertebrate fauna through changes to low-latitude tree flowering and fruiting phenology. - Global Change Biology 21: 32673277. Chan, W.-P. et al. 2016. Seasonal and daily climate variation have opposite effects on species elevational range size. - Science 351: 14371439. Das, A. et al. 2006. Prioritisation of conservation areas in the Western Ghats, India. - Biological Conservation 133: 1631. Davies, R. G. et al. 2007. Topography, energy and the global distribution of bird species richness. - Proceedings of the Royal Society B: Biological Sciences 274: 11891197. Deutsch, C. A. et al. 2008. Impacts of climate warming on terrestrial ectotherms across latitude. - Proceedings of the National Academy of Sciences of the United States of America 105: 66686672. Devictor, V. et al. 2010. 
Beyond scarcity: citizen science programmes as useful tools for conservation biogeography: Citizen science and conservation biogeography. - Diversity and Distributions 16: 354362. Ellwood, E. R. et al. 2017. Citizen science and conservation: Recommendations for a rapidly moving field. - Biological Conservation 208: 14. Elsen, P. R. et al. 2017. The role of competition, ecotones, and temperature in the elevational distribution of Himalayan birds. - Ecology 98: 337348. Elsen, P. R. et al. 2018. Conserving Himalayan birds in highly seasonal forested and agricultural landscapes. - Conservation Biology 32: 13131324. Elsen, P. R. et al. 2020. Topography and human pressure in mountain ranges alter expected species responses to climate change. - Nature Communications 11: 110. Fink, D. et al. 2014. Crowdsourcing Meets Ecology: Distribution Models. - Association for the Advancement of Artificial Intelligence: 1930. Fiske, I. J. and Chandler, R. B. 2011. Unmarked: An R package for fitting hierarchical models of wildlife occurrence and abundance. - Journal of Statistical Software 43: 123. Freeman, B. G. et al. 2018. Climate change causes upslope shifts and mountaintop extirpations in a tropical bird community. - Proceedings of the National Academy of Sciences 115: 1198211987. Frishkoff, L. O. et al. 2016. Climate change and habitat conversion favour the same species. - Ecology letters 19: 10811090. Gadgil, M. and Meher-Homji, V. 1986. Localities of great significance to conservation of Indias biological diversity. - Proceedings of the Indian Academy of Sciences: 165180. Guo, F. et al. 2018. Land-use change interacts with climate to determine elevational species redistribution. - Nature Communications 2018 9:1 9: 13151315. Jankowski, J. E. et al. 2013. Exploring the role of physiology and biotic interactions in determining elevational ranges of tropical animals. - Ecography 36: 112. Janzen, D. H. 1967. Why Mountain Passes are Higher in the Tropics. 
- The American naturalist 101: 233249. Johnston, A. et al. 2015. Abundance models improve spatial and temporal prioritization of conservation resources. - Ecological Applications 25: 17491756. Johnston, A. et al. 2018. Estimates of observer expertise improve species distributions from citizen science data. - Methods in Ecology and Evolution 9: 8897. Johnston, A. et al. 2021. Analytical guidelines to increase the value of community science data: An example using eBird data to estimate species distributions (Y Fourcade, Ed.). - Divers Distrib 27: 12651277. Karanth, K. K. et al. 2016. Producing Diversity: Agroforests Sustain Avian Richness and Abundance in Indias Western Ghats. - Frontiers in Ecology and Evolution 4: 110. Karger, D. N. et al. 2017. Climatologies at high resolution for the earths land surface areas. - Scientific Data 4: 120. Kelling, S. et al. 2015. Can observation skills of citizen scientists be estimated using species accumulation curves? - PLoS ONE 10: 120. Kelling, S. et al. 2019. Using Semistructured Surveys to Improve Citizen Science Data for Monitoring Biodiversity. - BioScience 69: 170179. Kennedy, C. M. et al. 2011. Landscape matrix mediates occupancy dynamics of Neotropical avian insectivores. - Ecological Applications 21: 18371850. La Sorte, F. A. and Jetz, W. 2010. Projected range contractions of montane biodiversity under global warming. - Proceedings of the Royal Society B: Biological Sciences 277: 34013410. Loiselle, B. A. and Blake, J. G. 1991. Temporal Variation in Birds and Fruits Along an Elevational Gradient in Costa Rica. - Ecology 72: 180193. MacKenzie, D. I. and Bailey, L. L. 2004. Assessing the fit of site-occupancy models. - Journal of Agricultural, Biological, and Environmental Statistics 9: 300318. Mackenzie, D. I. et al. 2002. Estimating Site Occupancy Rates When Detection Probabilities Are Less Than One. - Ecology 83: 119. MacKenzie, D. et al. 2017. 
Occupancy estimation and modeling: inferring patterns and dynamics of species occurrence. - Academic Press. Mani, M. S. 1974. Ecology and Biogeography in India. - Springer Netherlands. McGill, B. J. et al. 2006. Rebuilding community ecology from functional traits. - Trends in Ecology and Evolution 21: 178185. Myers, N. et al. 2000. Biodiversity hotspots for conservation priorities. - Nature 403: 853858. Newbold, T. et al. 2015. Global effects of land use on local terrestrial biodiversity. - Nature 520: 4550. Nogués-Bravo, D. et al. 2007. Exposure of global mountain systems to climate warming during the 21st Century. - Global Environmental Change 17: 420428. Nowakowski, A. J. et al. 2018. Changing Thermal Landscapes: Merging Climate Science and Landscape Ecology through Thermal Biology. - Current Landscape Ecology Reports 3: 5772. ODonnell, M. S. and Ignizio, D. A. 2012. Bioclimatic predictors for supporting ecological applications in the conterminous United States. OpenStreetMap contributors 2017. Planet dump retrieved from https://planet.osm.org. in press. Pascal, J. 1988. Wet evergreen forests of the Western Ghats of India: Ecology, structure, floristic composition and succession (Travaux de la Section scientifique et technique). - Institut Francais de Pondicherry. Payne, D. et al. 2017. Opportunities for research on mountain biodiversity under global change. - Current Opinion in Environmental Sustainability 29: 4047. Perez, T. M. et al. 2016. Thermal trouble in the tropics. - Science 351: 13921393. Peters, M. K. et al. 2019. Climateland-use interactions shape tropical mountain biodiversity and ecosystem functions. - Nature 568: 8892. Pigot, A. L. et al. 2020. Macroevolutionary convergence connects morphological form to ecological function in birds. - Nature Ecology and Evolution 4: 230239. Praveen J 2017. On the geo-precision of data for modelling home range of a speciesA commentary on Ramesh et al. (2017). - Biological Conservation. Praveen J 2021. 
Kerala Bird Atlas 2015-2020: features, outcomes and implications of a citizen-science project. - Current Science 122: 298309. Quintero, I. and Jetz, W. 2018. Global elevational diversity and diversification of birds. - Nature 555: 246250. R Core Team 2020. R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. Rahbek, C. et al. 2019. Humboldts enigma: What causes global patterns of mountain biodiversity? - Science 365: 11081113. Rajendran, K. et al. 2012. Monsoon circulation interaction with Western Ghats orography under changing climate: Projection by a 20-km mesh AGCM. - Theor Appl Climatol 110: 555571. Raman, T. R. S. 2006. Effects of habitat structure and adjacent habitats on birds in tropical rainforest fragments and shaded plantations in the Western Ghats, India. - Biodiversity and Conservation 15: 15771607. Raman, T. R. S. et al. 2021. Native shade trees aid bird conservation in tea plantations in southern India. - CURRENT SCIENCE 121: 12. Ranganathan, J. et al. 2010. Landscape-level effects on avifauna within tropical agriculture in the Western Ghats: Insights for management and conservation. - Biological Conservation 143: 29092917. Robin, V. V. et al. 2015. Islands within islands: two montane palaeo-endemic birds impacted by recent anthropogenic fragmentation. - Molecular ecology 24: 35723584. Robinson, O. J. et al. 2020. Integrating citizen science data with expert surveys increases accuracy and spatial extent of species distribution models (L Maiorano, Ed.). - Divers Distrib 26: 976986. Roy, P. S. et al. 2015. Development of decadal (1985-1995-2005) land use and land cover database for India. - Remote Sensing 7: 24012430. ekerciolu, çaan H. et al. 2012. The effects of climate change on tropical birds. - Biological Conservation 148: 118. Senior, R. A. et al. 2017. A pantropical analysis of the impacts of forest degradation and conversion on local temperature. - Ecology and Evolution 7: 78977908. 
Sidhu, S. et al. 2010. Effects of plantations and home-gardens on tropical forest bird communities and mixed-species bird flocks in the southern Western Ghats. - Journal of the Bombay Natural History Society 107: 9191. Sirami, C. et al. 2017. Impacts of global change on species distributions: obstacles and solutions to integrate climate and land use. - Global Ecology and Biogeography 26: 385394. SoIB 2020. State of Indias Birds, 2020: Range, trends and conservation status. Sreekar, R. et al. 2013. Natural Windbreaks Sustain Bird Diversity in a Tea-Dominated Landscape. - PLoS ONE 8: 411. Srinivasan, U. and Wilcove, D. S. 2020. Interactive impacts of climate change and landuse change on the demography of montane birds. - Ecology. Srinivasan, U. et al. 2018. Temperature and competition interact to structure himalayan bird communities. - Proceedings of the Royal Society B: Biological Sciences. Srinivasan, U. et al. 2019. Annual temperature variation influences the vulnerability of montane bird communities to land-use change. - Ecography 42: 20842094. Steen, V. A. et al. 2021. Spatial thinning and class balancing: Key choices lead to variation in the performance of species distribution models with citizen science data (J McPherson, Ed.). - Methods Ecol Evol 12: 216226. Stevens, G. C. 1989. The Latitudinal Gradient in Geographical Range: How so Many Species Coexist in the Tropics. - The American Naturalist 133: 240256. Sullivan, B. L. et al. 2009. eBird: A citizen-based bird observation network in the biological sciences. - Biological Conservation 142: 22822292. Sullivan, B. L. et al. 2014. The eBird enterprise: An integrated approach to development and application of citizen science. - Biological Conservation 169: 3140. Sunarto, S. et al. 2012. Tigers need cover: Multi-scale occupancy study of the big cat in Sumatran forest and plantation landscapes. - PLoS ONE. Tewksbury, J. J. et al. 2008. Putting the Heat on Tropical Animals. - Science (New York, N.Y.) 320: 12961297. 
Tingley, M. W. et al. 2009. Birds track their Grinnellian niche through a century of climate change. - Proceedings of the National Academy of Sciences 106: 1963719643. Tsai, P. Y. et al. 2020. New insights into the patterns and drivers of avian altitudinal migration from a growing crowdsourcing data source. - Ecography: 112. Urban, M. C. 2018. Escalator to extinction. - Proc Natl Acad Sci USA 115: 1187111873. van Strien, A. J. et al. 2013. Opportunistic citizen science data of animal species produce reliable estimates of distribution trends if analysed with occupancy models. - Journal of Applied Ecology 50: 14501458. Vijayakumar, S. P. et al. 2016. Glaciations, gradients, and geography: multiple drivers of diversification of bush frogs in the Western Ghats Escarpment. - Proceedings of the Royal Society B: Biological Sciences 283: 2016101120161011. Viswanathan, A. et al. 2020. State of Indias Birds 2020: Background and Methodology.: 136. Williams, S. E. and Middleton, J. 2008. Climatic seasonality, resource bottlenecks, and abundance of rainforest birds: implications for global climate change: Birds, seasonality and climate change. - Diversity and Distributions 14: 6977. Wood, C. et al. 2011. eBird: Engaging Birders in Science and Conservation. - PLoS Biol 9: e1001220. Yalcin, S. and Leroux, S. J. 2018. An empirical test of the relative and combined effects of land-cover and climate change on local colonization and extinction. - Global Change Biology 24: 38493861. "],["404.html", "Page not found", " Page not found The page you requested cannot be found (perhaps it was moved or renamed). You may want to try searching to find the page's new location, or use the table of contents to find the page you are looking for. "]]
+[["index.html", "Source code and supplementary material for Using citizen science to parse climatic and landcover influences on bird occupancy within a tropical biodiversity hotspot Section 1 Introduction 1.1 Data processing", " Source code and supplementary material for Using citizen science to parse climatic and landcover influences on bird occupancy within a tropical biodiversity hotspot 2022-04-28 Section 1 Introduction This is the readable version of the analysis that models associations between environmental predictors (climate and landcover) and citizen science observations of birds across the Nilgiri and Anamalai Hills of the Western Ghats biodiversity hotspot. Methods and format are derived from https://cornelllabofornithology.github.io/ebird-best-practices/. 1.1 Data processing The data processing for this project is described in the following sections. Navigate through them using the links in the sidebar. The Nilgiri and Anamalai Hills in southern India provide a convenient geography for studying the combined influence of land cover and climate on the distributions of bird species. (a) The Nilgiri and Anamalai Hills of the Southern Western Ghats are topographically complex, with maximum elevations &gt; 2,000 m, and are separated by the very low-lying Palghat Gap, which serves as a natural barrier to the dispersal of many hill birds. (b) Lower elevations are primarily covered by agriculture and settlements, reflecting the intense human pressure on this region, while mid- and higher elevations show a mix of natural and human-modified land cover types (see Fig. 2 for details). (c) The coastal edge of the area and the windward hill slopes show limited temperature seasonality across the December to May period; this seasonality increases with distance from the coast but is lower at higher elevations inland. (d) Higher elevations also show lower precipitation seasonality than both low-lying coastal and inland regions. 
Our study area (bounds shown as dashed lines) includes multiple combinations of elevation, land cover type, and temperature and rainfall seasonality, resulting in a naturally occurring crossed-factorial design that allows us to study the effects of climate and land cover on bird occupancy. Representative forest-restricted and habitat-generalist birds from the study area are shown between panels (all images were obtained from Wikimedia Commons, with credit for each image given in brackets); from left to right: (1) Malabar grey hornbill (by Koshy), (2) Crimson-backed sunbird (by Mandar Godbole), (3) Asian emerald dove (by Selvaganesh), (4) Black-and-orange flycatcher (by LKanth), (5) Grey-headed canary flycatcher (by David Raju), (6) Greater racket-tailed drongo (by MD Shahanshah Bappy), (7) Eurasian hoopoe (by Zeynel Cebeci), (8) Chestnut-headed bee-eater (by MikeBirds), (9) Coppersmith barbet (by Raju Kasambe), (10) Red-vented bulbul (by TR Shankar Raman), (11) Pied bushchat (by TR Shankar Raman), (12) Ashy prinia (by Rison Thumboor). Elevation is from 30 m resolution SRTM data (Farr et al. 2007); land cover, at 1 km resolution, is reclassified from Roy et al. (2015); and climatic variation is represented by CHELSA seasonality layers (temperature: BIOCLIM 4a; rainfall: BIOCLIM 15) at 1 km resolution (Karger et al. 2017). All layers were resampled to 1 km resolution for analyses. "],["selecting-species-of-interest.html", "Section 2 Selecting species of interest 2.1 Prepare libraries 2.2 Subset species by geographical confines of the study area 2.3 Subset an initial list of terrestrial birds based on a) minimum of 1000 detections between 2013-2021 and b) remove species that are often easily confused with congeners 2.4 Read subset of species following filtering and removal of waterbirds, raptors, and other nocturnal species 2.5 Load raw data for locations 2.6 Get proportional obs counts in 25km cells 2.7 Which species are reported sufficiently in checklists? 
2.8 Figure: Checklist distribution 2.9 Prepare the species list", " Section 2 Selecting species of interest Prior to preparing eBird data for occupancy modeling, we selected a list of species using simple, objective criteria. Our primary focus is to understand how the occupancy of terrestrial bird species (largely passerines) varies as a function of climate and land cover across the Nilgiri and the Anamalai hills of the Western Ghats. We derived this list from inclusion criteria adapted from the State of India's Birds 2020 report. Initially, we considered all species reported on eBird that occurred within the outlines of our study area. We then added a filter to consider only terrestrial birds and removed species that are often confused with their congeners (e.g., green/greenish warbler). In addition, we considered only those species that had a minimum of 1000 detections each between 2013 and 2021. Next, the study area was divided into 25 x 25 km cells following Viswanathan et al. (2020). We then kept only those species that were reported in at least 5% of checklists in at least half of the 25 x 25 km cells from which they had been reported (there are 42 unique 25 x 25 km grid cells across our study area). We used these criteria to make the sampling of each species as uniform as possible across our study area and to reduce spurious associations between environmental drivers and species occupancy. This resulted in a total of 79 species prior to occupancy modeling. This script shows the proportion of checklists that report a particular species in every 25 km x 25 km grid cell across the Nilgiris and the Anamalais. Using this analysis, we arrived at a final list of species for occupancy modeling. 
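The selection rule above (a species must appear on at least 5% of checklists in at least half of the 25 km cells where it was reported) can be sketched in base R. This is a minimal illustration on invented toy data, not the study's pipeline; the function name `select_species`, its arguments, and the data are hypothetical, and coordinates are assumed to be already projected to metres (as after the UTM 43N step of Section 2.5).

```r
# Hypothetical sketch of the species selection rule; not the study's code.
# round to the nearest multiple of accuracy (25 km cells when coordinates are in metres)
round_any <- function(x, accuracy = 25000) round(x / accuracy) * accuracy

select_species <- function(obs, p_cutoff = 0.05, grid_cutoff = 0.5, cell_size = 25000) {
  # label each record with its 25 km cell
  cell <- paste(round_any(obs$X, cell_size), round_any(obs$Y, cell_size))
  # number of unique checklists per cell
  nchk <- tapply(obs$checklist, cell, function(x) length(unique(x)))
  # number of unique checklists reporting each species in each cell
  key <- unique(data.frame(cell = cell, species = obs$species,
                           checklist = obs$checklist, stringsAsFactors = FALSE))
  nrep <- aggregate(checklist ~ cell + species, data = key, FUN = length)
  nrep$p_rep <- nrep$checklist / as.vector(nchk[as.character(nrep$cell)])
  # for each species: share of its reported cells where p_rep >= p_cutoff
  frac <- tapply(nrep$p_rep >= p_cutoff, nrep$species, mean)
  sort(names(frac)[frac >= grid_cutoff])
}

# Invented toy data: one row per (checklist, species) report, coordinates in metres
obs <- rbind(
  data.frame(checklist = paste0('c', 1:25), species = 'A', X = 10000, Y = 5000),
  data.frame(checklist = 'c1', species = 'B', X = 10000, Y = 5000),
  data.frame(checklist = 'c2', species = 'C', X = 10000, Y = 5000),
  data.frame(checklist = paste0('d', 1:4), species = 'A', X = 60000, Y = 5000),
  data.frame(checklist = 'd1', species = 'C', X = 60000, Y = 5000)
)

select_species(obs)
```

With these invented records, species B is dropped (it reaches only 4% of checklists in its single cell), while C is kept because it passes the 5% threshold in exactly half of its two cells; the `>=` comparisons make both thresholds inclusive.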
2.1 Prepare libraries # load libraries library(data.table) library(readxl) library(magrittr) library(stringr) library(dplyr) library(tidyr) library(readr) library(ggplot2) library(ggthemes) library(scico) # round x to the nearest multiple of accuracy (default 25000 m, i.e. 25 km) round_any &lt;- function(x, accuracy = 25000) { round(x / accuracy) * accuracy } # set file paths for auk functions # To use these two datasets, please download the latest versions from https://ebird.org/data/download and set the file path accordingly. Since these two datasets are extremely large, we have not uploaded them to GitHub. # In this study, the version of data loaded corresponds to November 2021. f_in_ebd &lt;- file.path(&quot;data/ebd_IN_relNov-2021.txt&quot;) f_in_sampling &lt;- file.path(&quot;data/ebd_sampling_relNov-2021.txt&quot;) 2.2 Subset species by geographical confines of the study area # read in shapefile of the study area to subset by bounding box library(sf) wg &lt;- st_read(&quot;data/spatial/hillsShapefile/Nil_Ana_Pal.shp&quot;) box &lt;- st_bbox(wg) # read in data and subset # To access the latest dataset, please visit: https://ebird.org/data/download and set the file path accordingly. 
ebd &lt;- fread(&quot;data/ebd_IN_relNov-2021.txt&quot;) ebd &lt;- ebd[between(LONGITUDE, box[&quot;xmin&quot;], box[&quot;xmax&quot;]) &amp; between(LATITUDE, box[&quot;ymin&quot;], box[&quot;ymax&quot;]), ] ebd &lt;- ebd[year(`OBSERVATION DATE`) &gt;= 2013, ] # make new column names newNames &lt;- str_replace_all(colnames(ebd), &quot; &quot;, &quot;_&quot;) %&gt;% str_to_lower() setnames(ebd, newNames) # keep useful columns columnsOfInterest &lt;- c( &quot;common_name&quot;, &quot;scientific_name&quot;, &quot;observation_count&quot;, &quot;locality&quot;, &quot;locality_id&quot;, &quot;locality_type&quot;, &quot;latitude&quot;, &quot;longitude&quot;, &quot;observation_date&quot;, &quot;sampling_event_identifier&quot; ) ebd &lt;- ebd[, ..columnsOfInterest] 2.3 Subset an initial list of terrestrial birds based on a) minimum of 1000 detections between 2013-2021 and b) remove species that are often easily confused with congeners # Convert all presences marked &#39;X&#39; to &#39;1&#39; ebd &lt;- ebd %&gt;% mutate(observation_count = ifelse(observation_count == &quot;X&quot;, &quot;1&quot;, observation_count )) # Convert observation count to numeric ebd$observation_count &lt;- as.numeric(ebd$observation_count) totCount &lt;- ebd %&gt;% dplyr::select(scientific_name, common_name, observation_count) %&gt;% group_by(scientific_name, common_name) %&gt;% summarise(tot = sum(observation_count)) # subset species with a min of 1000 detections tot1000 &lt;- totCount %&gt;% filter(tot &gt; 1000) species1000 &lt;- tot1000$scientific_name ebd1000 &lt;- ebd[scientific_name %in% species1000, ] # Beginning with 3.37 million observations of 684 species in eBird that occurred within the outlines of our study area (Fig. 1), over the years 2013–2021, we retained only those species that had a minimum of 1,000 detections each between 2013 and 2021 (347 species remaining; 3.33 million observations).
Next, we divided the study area into 25 x 25 km cells following the State of India's Birds 2020 methodology. We kept only those species that were reported in at least 5% of checklists in at least half of the grid cells (of 42 unique grid cells) from which they had been reported. # export the above list as .csv to carry out initial filtering based on natural history write.csv(totCount, &quot;data/species-list.csv&quot;, row.names = F) 2.4 Read subset of species following filtering and removal of waterbirds, raptors, and other nocturnal species # add species of interest # please note: the file read below was subset manually (based on natural history) after being exported in the previous step, so please examine that file before further processing specieslist &lt;- read.csv(&quot;data/species-list.csv&quot;) speciesOfInterest &lt;- specieslist$scientific_name 2.5 Load raw data for locations We add a strict spatial filter and assign each record to a 25 km x 25 km grid cell. # strict spatial filter and assign grid locs &lt;- ebd[, .(longitude, latitude)] # transform to UTM and get 25km boxes coords &lt;- setDF(locs) %&gt;% st_as_sf(coords = c(&quot;longitude&quot;, &quot;latitude&quot;)) %&gt;% `st_crs&lt;-`(4326) %&gt;% bind_cols(as.data.table(st_coordinates(.))) %&gt;% st_transform(32643) %&gt;% mutate(id = 1:nrow(.)) # convert wg to UTM for filter wg &lt;- st_transform(wg, 32643) coords &lt;- coords %&gt;% filter(id %in% unlist(st_contains(wg, coords))) %&gt;% rename(longitude = X, latitude = Y) %&gt;% bind_cols(as.data.table(st_coordinates(.))) %&gt;% st_drop_geometry() %&gt;% as.data.table() # remove unneeded objects rm(locs) gc() coords &lt;- coords[, .N, by = .(longitude, latitude, X, Y)] ebd &lt;- merge(ebd, coords, all = FALSE, by = c(&quot;longitude&quot;, &quot;latitude&quot;)) ebd &lt;- ebd[(longitude %in% coords$longitude) &amp; (latitude %in% coords$latitude), ] 2.6 Get proportional obs counts in 25km cells # round to 25km cell in UTM coords ebd[, `:=`(X = round_any(X), Y = 
round_any(Y))] # count checklists in cell ebd_summary &lt;- ebd[, nchk := length(unique(sampling_event_identifier)), by = .(X, Y) ] # count checklists reporting each species in cell and get proportion ebd_summary &lt;- ebd_summary[, .(nrep = length(unique( sampling_event_identifier ))), by = .(X, Y, nchk, scientific_name) ] ebd_summary[, p_rep := nrep / nchk] # keep only species of interest ebd_summary &lt;- ebd_summary[scientific_name %in% speciesOfInterest, ] # complete the dataframe for no reports # keep no reports as NA --- allows filtering based on proportion reporting ebd_summary &lt;- setDF(ebd_summary) %&gt;% complete( nesting(X, Y), scientific_name # , # fill = list(p_rep = 0) ) %&gt;% filter(!is.na(p_rep)) 2.7 Which species are reported sufficiently in checklists? # A total of 42 unique grids (of 25km by 25km) across the study area # total number of checklists across unique grids tot_n_chklist &lt;- ebd_summary %&gt;% distinct(X, Y, nchk) # species-specific number of grids spp_grids &lt;- ebd_summary %&gt;% group_by(scientific_name) %&gt;% distinct(X, Y) %&gt;% count(scientific_name, name = &quot;n_grids&quot; ) # Write the above two results write_csv(tot_n_chklist, &quot;data/01_nchk-per-grid.csv&quot;) write_csv(spp_grids, &quot;data/01_ngrids-per-spp.csv&quot;) # left-join the datasets ebd_summary &lt;- left_join(ebd_summary, spp_grids, by = &quot;scientific_name&quot;) # check the proportion of grids across which this cut-off is met for each species # Is it &gt; 90% or 70%? 
# For example, with a 3% cut-off, ~100 species occur in &gt;50% # of the grids they have been reported in p_cutoff &lt;- 0.05 # Proportion of checklists a species has been reported in grid_cutoff &lt;- 0.5 # Proportion of a species&#39; reported grids in which p_cutoff must be met grid_proportions &lt;- ebd_summary %&gt;% group_by(scientific_name) %&gt;% tally(p_rep &gt;= p_cutoff) %&gt;% mutate(prop_grids_cut = n / (spp_grids$n_grids)) %&gt;% arrange(desc(prop_grids_cut)) grid_prop_cut &lt;- filter( grid_proportions, prop_grids_cut &gt;= grid_cutoff ) # Write the results write_csv(grid_prop_cut, &quot;data/01_prop-grids-per-spp.csv&quot;) # Identifying the number of species that occur in potentially &lt;5% of all lists total_number_lists &lt;- sum(tot_n_chklist$nchk) spp_sum_chk &lt;- ebd_summary %&gt;% distinct(X, Y, scientific_name, nrep) %&gt;% group_by(scientific_name) %&gt;% mutate(sum_chk = sum(nrep)) %&gt;% distinct(scientific_name, sum_chk) # Approximately 90 to 100 species occur in &gt;5% of all checklists prop_all_lists &lt;- spp_sum_chk %&gt;% mutate(prop_lists = sum_chk / total_number_lists) %&gt;% arrange(desc(prop_lists)) 2.8 Figure: Checklist distribution # add land library(rnaturalearth) land &lt;- ne_countries( scale = 50, type = &quot;countries&quot;, continent = &quot;asia&quot;, country = &quot;india&quot;, returnclass = c(&quot;sf&quot;) ) # project land to UTM zone 43N (EPSG:32643) land &lt;- st_transform(land, 32643) Proportion of checklists reporting a species in each grid cell (25 km side) between 2013 and 2021. Checklists were filtered to be within the boundaries of the Nilgiris and the Anamalai hills (black outline), but rounding to 25 km cells may place cells outside the boundary. Deeper shades of red indicate a higher proportion of checklists reporting a species.
2.9 Prepare the species list # write the new list of species that occur in at least 5% of checklists across a minimum of 50% of the grids they have been reported in new_sp_list &lt;- semi_join(specieslist, grid_prop_cut, by = &quot;scientific_name&quot;) write_csv(new_sp_list, &quot;results/01_list-of-species-cutoff.csv&quot;) "],["preparing-ebird-data.html", "Section 3 Preparing eBird Data 3.1 Prepare libraries and data sources 3.2 Filter data 3.3 Process filtered data 3.4 Spatial filter 3.5 Handle presence data 3.6 Add decimal time", " Section 3 Preparing eBird Data 3.1 Prepare libraries and data sources Here, we load the libraries required for preparing the eBird data. Please download the latest versions of the eBird Basic Dataset (for India) and the eBird Sampling dataset from https://ebird.org/data/download. # load libraries library(tidyverse) library(readr) library(sf) library(auk) library(readxl) library(lubridate) # sum that ignores NA values sum.no.na &lt;- function(x) { sum(x, na.rm = T) } # set file paths for auk functions # To use these two datasets, please download the latest versions from https://ebird.org/data/download and set the file path accordingly. Since these two datasets are extremely large, we have not uploaded them to GitHub. # In this study, the version of data loaded corresponds to November 2021. f_in_ebd &lt;- file.path(&quot;data/ebd_IN_relNov-2021.txt&quot;) f_in_sampling &lt;- file.path(&quot;data/ebd_sampling_relNov-2021.txt&quot;) 3.2 Filter data We read in the list of species analyzed in this study. We initially chose species that were reported in at least 5% of checklists in at least 50% of the 25 x 25 km cells from which they had been reported, resulting in a total of 79 species. The further pre-processing used to arrive at this final list of species can be found in the previous script. For further details regarding the list of species, please refer to the main text of the manuscript. 
# add species of interest specieslist &lt;- read.csv(&quot;data/species-list.csv&quot;) speciesOfInterest &lt;- as.character(specieslist$scientific_name) Here, we set broad spatial filters for the states of Kerala, Tamil Nadu, and Karnataka, and keep only records of our species of interest from complete checklists submitted between 1 January 2013 and 31 May 2021. # set up filters using the auk package ebd_filters &lt;- auk_ebd(f_in_ebd, f_in_sampling) %&gt;% auk_species(speciesOfInterest) %&gt;% auk_country(country = &quot;IN&quot;) %&gt;% auk_state(c(&quot;IN-KL&quot;, &quot;IN-TN&quot;, &quot;IN-KA&quot;)) %&gt;% # Restricting geography to Tamil Nadu, Kerala &amp; Karnataka auk_date(c(&quot;2013-01-01&quot;, &quot;2021-05-31&quot;)) %&gt;% auk_complete() # check filters ebd_filters The filtering step below (auk_filter) need not be rerun if the data have already been filtered once and the output paths point to the filtered files. NB: this is a computationally heavy process; run with caution. # specify output location and perform filter f_out_ebd &lt;- &quot;data/01_ebird-filtered-EBD-westernGhats.txt&quot; f_out_sampling &lt;- &quot;data/01_ebird-filtered-sampling-westernGhats.txt&quot; ebd_filtered &lt;- auk_filter(ebd_filters, file = f_out_ebd, file_sampling = f_out_sampling, overwrite = TRUE ) 3.3 Process filtered data The data have now been filtered using auk functions, and we work with the filtered checklist observations from here on. (Note that we have not yet spatially filtered the checklists to the confines of our study area, the Nilgiris and the Anamalai hills; this step is carried out later.) # read in the data ebd &lt;- read_ebd(f_out_ebd) eBird observation records list only the species reported on each checklist. To obtain absence data, we use a process known as zero-filling (Johnston et al. 2021): because only complete checklists are retained, a species of interest that is not reported on a checklist is recorded as a 0 (not detected) for that checklist. 
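Conceptually, zero-filling crosses every complete checklist with every species of interest and marks the combinations with no report as non-detections. The sketch below illustrates the idea in base R on invented data; it is not how `auk_zerofill()` is implemented, and the function name `zero_fill` and its column names are assumptions chosen to mirror the eBird fields used here.

```r
# Hypothetical sketch of zero-filling; assumes every checklist is complete,
# so any species not listed on it was not detected.
zero_fill <- function(checklists, detections,
                      species = unique(detections$scientific_name)) {
  # every checklist x species combination
  grid <- expand.grid(checklist_id = checklists$checklist_id,
                      scientific_name = species,
                      stringsAsFactors = FALSE)
  out <- merge(grid, detections, all.x = TRUE,
               by = c('checklist_id', 'scientific_name'))
  # combinations with no matching detection become non-detections
  out$species_observed <- !is.na(out$observation_count)
  out$observation_count[!out$species_observed] <- '0'
  out[order(out$checklist_id, out$scientific_name), ]
}

# Invented example: three complete checklists; S2 reported none of these species
checklists <- data.frame(checklist_id = c('S1', 'S2', 'S3'),
                         stringsAsFactors = FALSE)
detections <- data.frame(
  checklist_id = c('S1', 'S1', 'S3'),
  scientific_name = c('Pycnonotus cafer', 'Prinia socialis', 'Pycnonotus cafer'),
  observation_count = c('2', 'X', '1'),
  stringsAsFactors = FALSE
)
zero_fill(checklists, detections)
```

Passing the full species list through `species` matters in practice: a species of interest that was never reported would otherwise receive no zeros at all.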
# fill zeroes zf &lt;- auk_zerofill(f_out_ebd, f_out_sampling) new_zf &lt;- collapse_zerofill(zf) We now select the columns needed for further analysis. # choose columns of interest columnsOfInterest &lt;- c( &quot;checklist_id&quot;, &quot;scientific_name&quot;, &quot;common_name&quot;, &quot;observation_count&quot;, &quot;locality&quot;, &quot;locality_id&quot;, &quot;locality_type&quot;, &quot;latitude&quot;, &quot;longitude&quot;, &quot;observation_date&quot;, &quot;time_observations_started&quot;, &quot;observer_id&quot;, &quot;sampling_event_identifier&quot;, &quot;protocol_type&quot;, &quot;duration_minutes&quot;, &quot;effort_distance_km&quot;, &quot;effort_area_ha&quot;, &quot;number_observers&quot;, &quot;species_observed&quot;, &quot;reviewed&quot; ) # make list of presence and absence data and choose cols of interest data &lt;- list(ebd, new_zf) %&gt;% map(function(x) { x %&gt;% select(one_of(columnsOfInterest)) }) # remove zerofills to save working memory rm(zf, new_zf) gc() # the zero-filled data contain both presences and absences; keep only the absences, since the presences are already in data[[1]] and keeping both would duplicate records data[[2]] &lt;- data[[2]] %&gt;% filter(species_observed == F) 3.4 Spatial filter We now apply a spatial filter to restrict the observations to the confines of the Nilgiris and the Anamalai hills of the Western Ghats biodiversity hotspot. 
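The two-step spatial filter used here (a cheap bounding-box test, then an exact intersection with the study-area polygon via `st_intersection()`) can be illustrated without sf. The base R sketch below is hypothetical: it treats longitude and latitude as planar coordinates, handles a single polygon ring without holes, and uses a ray-casting point-in-polygon test in place of sf's geometry engine.

```r
# Ray-casting test: is point (px, py) inside the polygon with vertices (vx, vy)?
point_in_polygon <- function(px, py, vx, vy) {
  inside <- FALSE
  j <- length(vx)
  for (i in seq_along(vx)) {
    # does the edge from vertex j to vertex i cross the horizontal ray from the point?
    if (((vy[i] > py) != (vy[j] > py)) &&
        (px < (vx[j] - vx[i]) * (py - vy[i]) / (vy[j] - vy[i]) + vx[i])) {
      inside <- !inside
    }
    j <- i
  }
  inside
}

spatial_filter <- function(lon, lat, vx, vy) {
  # step 1: cheap bounding-box pre-filter
  keep <- lon >= min(vx) & lon <= max(vx) & lat >= min(vy) & lat <= max(vy)
  # step 2: exact test only for points that survived the bounding box
  idx <- which(keep)
  keep[idx] <- vapply(idx, function(k) point_in_polygon(lon[k], lat[k], vx, vy),
                      logical(1))
  keep
}

# Invented triangle and points
vx <- c(0, 4, 0)
vy <- c(0, 0, 4)
spatial_filter(lon = c(1, 3, 5), lat = c(1, 3, 1), vx, vy)
```

The bounding-box step is there for speed: points outside the box are discarded with simple comparisons before any polygon geometry is evaluated.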
# load shapefile of the study area library(sf) hills &lt;- st_read(&quot;data/spatial/hillsShapefile/Nil_Ana_Pal.shp&quot;) # preliminary filter by bounding box box &lt;- st_bbox(hills) # get unique coordinates within the bounding box, then intersect with the study area dataLocs &lt;- data %&gt;% map(function(x) { select(x, longitude, latitude) %&gt;% filter(between(longitude, box[&quot;xmin&quot;], box[&quot;xmax&quot;]) &amp; between(latitude, box[&quot;ymin&quot;], box[&quot;ymax&quot;])) }) %&gt;% bind_rows() %&gt;% distinct() %&gt;% st_as_sf(coords = c(&quot;longitude&quot;, &quot;latitude&quot;)) %&gt;% st_set_crs(4326) %&gt;% st_intersection(hills) # get simplified data and drop geometry dataLocs &lt;- mutate(dataLocs, spatialKeep = T) %&gt;% bind_cols(., as_tibble(st_coordinates(dataLocs))) %&gt;% st_drop_geometry() # bind to data and then filter data &lt;- data %&gt;% map(function(x) { left_join(x, dataLocs, by = c(&quot;longitude&quot; = &quot;X&quot;, &quot;latitude&quot; = &quot;Y&quot;)) %&gt;% filter(spatialKeep == T) %&gt;% select(-Id, -spatialKeep) }) Save temporary data created so far. # save a temp data file save(data, file = &quot;results/02_data_temp.rdata&quot;) 3.5 Handle presence data Many checklists record the presence of a species with an X instead of a count, because its abundance was not noted. Here, we converted all X notations to 1, indicating a presence (we do not analyse abundance). We also removed checklists whose duration in minutes was either not recorded or recorded as zero. Next, we applied a sampling-effort filter following Johnston et al. (2021), keeping only checklists with a duration of at most 300 minutes, a distance travelled of at most 5 km, and an area of at most 500 ha. Finally, we excluded group checklists with more than 10 observers. 
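The presence and effort rules above amount to a handful of row filters. The base R sketch below applies them to an invented checklist table; the function name `filter_effort` is hypothetical and its default thresholds mirror the values stated in the text (300 minutes, 5 km, 500 ha, 10 observers), but it is an illustration rather than the study's implementation. As an assumption of the sketch, a missing distance or area (for example, on a stationary count) is treated as 0, which is what summing with `na.rm = TRUE` in `sum.no.na()` also produces.

```r
# Hypothetical sketch of the presence and effort filters; not the study's code.
filter_effort <- function(d, max_minutes = 300, max_km = 5, max_area_ha = 500,
                          max_observers = 10) {
  # presences recorded as X (no count) become 1
  d$observation_count[d$observation_count == 'X'] <- '1'
  # missing distance or area (e.g. stationary counts) is treated as 0
  dist <- ifelse(is.na(d$effort_distance_km), 0, d$effort_distance_km)
  area <- ifelse(is.na(d$effort_area_ha), 0, d$effort_area_ha)
  keep <- !is.na(d$duration_minutes) & d$duration_minutes > 0 &
    d$duration_minutes <= max_minutes &
    dist <= max_km & area <= max_area_ha &
    !is.na(d$number_observers) & d$number_observers <= max_observers
  d[keep, ]
}

# Invented checklists
chk <- data.frame(
  checklist = c('S1', 'S2', 'S3', 'S4', 'S5', 'S6'),
  observation_count = c('X', '3', '1', '2', '1', '4'),
  duration_minutes = c(60, 0, 400, 30, 45, 20),
  effort_distance_km = c(1.2, 1, 1, NA, 7, NA),
  effort_area_ha = NA,
  number_observers = c(2, 1, 1, 12, 1, 3),
  stringsAsFactors = FALSE
)
filter_effort(chk)
```

Here S2 (zero duration), S3 (400 minutes), S4 (12 observers) and S5 (7 km) are dropped, and the X on S1 becomes 1.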
To model detection and occupancy covariates appropriately, we restricted our checklists to those recorded between December 1 and May 31 (the non-rainy months) and between 5 AM and 7 PM. # in the first set, replace X, for presences, with 1 data[[1]] &lt;- data[[1]] %&gt;% mutate(observation_count = ifelse(observation_count == &quot;X&quot;, &quot;1&quot;, observation_count )) # remove records where duration is 0 (filter also drops missing durations) data &lt;- map(data, function(x) filter(x, duration_minutes &gt; 0)) # group data by sampling event identifier # then, summarise relevant variables as the sum dataGrouped &lt;- map(data, function(x) { x %&gt;% group_by(sampling_event_identifier) %&gt;% summarise_at( vars( duration_minutes, effort_distance_km, effort_area_ha ), list(sum.no.na) ) }) # bind rows combining data frames, and filter dataGrouped &lt;- bind_rows(dataGrouped) %&gt;% filter( duration_minutes &lt;= 300, effort_distance_km &lt;= 5, effort_area_ha &lt;= 500 ) # get data identifiers, such as sampling identifier etc dataConstants &lt;- data %&gt;% bind_rows() %&gt;% select( sampling_event_identifier, time_observations_started, locality, locality_type, locality_id, observer_id, observation_date, scientific_name, observation_count, protocol_type, number_observers, longitude, latitude ) # join the summarised data with the identifiers, # using sampling_event_identifier as the key dataGrouped &lt;- left_join(dataGrouped, dataConstants, by = &quot;sampling_event_identifier&quot; ) # remove checklists (sampling events) with more than 10 observers count(dataGrouped, number_observers &gt; 10) # count how many have more than 10 observers dataGrouped &lt;- filter(dataGrouped, number_observers &lt;= 10) # keep only checklists between 5AM and 7PM dataGrouped &lt;- filter(dataGrouped, time_observations_started &gt;= &quot;05:00:00&quot; &amp; time_observations_started &lt;= &quot;19:00:00&quot;) # keep only checklists between December 1st and May 31st dataGrouped &lt;- filter(dataGrouped, 
month(observation_date) %in% c(1, 2, 3, 4, 5, 12)) 3.6 Add decimal time We added a column giving the start time of each checklist in decimal hours since midnight, along with a logical presence column. # assign present or not, and get time in decimal hours since midnight library(lubridate) time_to_decimal &lt;- function(x) { x &lt;- hms(x, quiet = TRUE) hour(x) + minute(x) / 60 + second(x) / 3600 } # will cause issues if using time obs started as a linear effect and not quadratic # observation_count may be stored as text, so convert it to numeric before comparing dataGrouped &lt;- mutate(dataGrouped, pres_abs = as.numeric(observation_count) &gt;= 1, decimalTime = time_to_decimal(time_observations_started) ) # check class of dataGrouped, make sure not sf assertthat::assert_that(!&quot;sf&quot; %in% class(dataGrouped)) The above data is saved to a file. # save a temp data file save(dataGrouped, file = &quot;results/02_data_prelim_processing.Rdata&quot;) "],["preparing-environmental-predictors.html", "Section 4 Preparing Environmental Predictors 4.1 Prepare libraries 4.2 Prepare spatial extent 4.3 Prepare terrain rasters 4.4 Prepare CHELSA rasters 4.5 Resample landcover from 10m to 1km 4.6 Resample other rasters to 1km 4.7 Climate variables in relation to elevation 4.8 Climate across land cover types 4.9 Land cover type in relation to elevation 4.10 Main Text Figure 2", " Section 4 Preparing Environmental Predictors In this script, we process climatic and landscape predictors for occupancy modeling. All climatic data were obtained from https://chelsa-climate.org/bioclim/. All landscape data were derived from a high-resolution land cover map (Roy et al. 2015), which provides enough land cover classes for a detailed classification and can be accessed at https://daac.ornl.gov/VEGETATION/guides/Decadal_LULC_India.html. The goal here is to resample all rasters to the same resolution of 1 km cells. 4.1 Prepare libraries We load some common libraries for raster processing and define a custom mode function.
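A mode function is what makes aggregation of categorical rasters meaningful: averaging class codes would produce classes that do not exist, whereas the mode keeps the most common class in each block. The sketch below (hypothetical names `func_mode` and `aggregate_mode`) aggregates a small invented class matrix by an integer factor in base R. This is the idea behind the mode resampling passed to gdalwarp in Section 4.5, although this sketch breaks ties by first occurrence in column-major order, which may differ from GDAL's behaviour.

```r
# Hypothetical sketch: aggregate a categorical grid by taking the modal class per block
func_mode <- function(x) {
  ux <- unique(x)
  ux[which.max(tabulate(match(x, ux)))]
}

aggregate_mode <- function(m, fact) {
  # rows/columns that do not fill a whole block are dropped
  nr <- nrow(m) %/% fact
  nc <- ncol(m) %/% fact
  out <- matrix(NA_real_, nr, nc)
  for (i in seq_len(nr)) {
    for (j in seq_len(nc)) {
      rows <- ((i - 1) * fact + 1):(i * fact)
      cols <- ((j - 1) * fact + 1):(j * fact)
      # as.vector() matters: unique() on a matrix would return unique rows
      out[i, j] <- func_mode(as.vector(m[rows, cols]))
    }
  }
  out
}

# Invented 4 x 4 land cover classes, aggregated in 2 x 2 blocks
m <- matrix(c(1, 1, 2, 2,
              1, 3, 2, 2,
              4, 4, 5, 5,
              4, 6, 5, 6), nrow = 4, byrow = TRUE)
coarse <- aggregate_mode(m, 2)
coarse
# compare class frequencies before and after aggregation
prop.table(table(m))
prop.table(table(coarse))
```

Rare classes (here 3 and 6) disappear after aggregation, which is the kind of shift the class-frequency comparison in Section 4.5 is meant to catch.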
# load libs library(raster) library(stringi) library(glue) library(gdalUtils) library(purrr) library(dplyr) library(tidyr) library(tibble) # for plotting library(viridis) library(colorspace) library(tmap) library(scales) library(ggplot2) library(patchwork) # mode (most frequent value) function for aggregating categorical data; na.rm is accepted but not used funcMode &lt;- function(x, na.rm = T) { ux &lt;- unique(x) ux[which.max(tabulate(match(x, ux)))] } # a basic test assertthat::assert_that(funcMode(c(2, 2, 2, 2, 3, 3, 3, 4)) == 2, msg = &quot;problem in the mode function&quot; ) # works # half-width of a normal-approximation 95% confidence interval of the mean ci &lt;- function(x) { qnorm(0.975) * sd(x, na.rm = T) / sqrt(length(x)) } 4.2 Prepare spatial extent We prepare a 30 km buffer around the boundary of the study area. This buffer will be used to crop the landscape rasters. The buffer is computed on data transformed to the UTM 43N CRS to avoid distortions. # load hills library(sf) hills &lt;- st_read(&quot;data/spatial/hillsShapefile/Nil_Ana_Pal.shp&quot;) hills &lt;- st_transform(hills, 32643) buffer &lt;- st_buffer(hills, 3e4) %&gt;% st_transform(4326) 4.3 Prepare terrain rasters We prepare the elevation data (an SRTM raster layer), crop it to the extent of the study site buffer, and derive slope and aspect from it.
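Slope is derived from elevation with a finite-difference estimate over each cell's 3 x 3 neighbourhood. The sketch below computes slope for a single window using Horn's (1981) weighting, which we understand to be the method `raster::terrain()` documents for its default eight-neighbour setting; treat that correspondence, the function name `horn_slope`, and the assumption of square cells measured in metres (a projected grid rather than the geographic WGS84 grid used here) as assumptions of the sketch.

```r
# Hypothetical sketch: slope (degrees) at the centre of a 3 x 3 elevation window,
# using Horn's weights. Layout of z:
#   a b c
#   d e f
#   g h i
horn_slope <- function(z, cellsize) {
  a <- z[1, 1]; b <- z[1, 2]; cc <- z[1, 3]
  d <- z[2, 1];               f <- z[2, 3]
  g <- z[3, 1]; h <- z[3, 2]; i <- z[3, 3]
  # east-west and north-south gradients, centre row/column weighted twice
  dzdx <- ((cc + 2 * f + i) - (a + 2 * d + g)) / (8 * cellsize)
  dzdy <- ((g + 2 * h + i) - (a + 2 * b + cc)) / (8 * cellsize)
  atan(sqrt(dzdx^2 + dzdy^2)) * 180 / pi
}

# Invented window: elevation rises 30 m per 30 m cell towards the east
z <- matrix(rep(c(0, 30, 60), 3), nrow = 3, byrow = TRUE)
horn_slope(z, cellsize = 30)
```

A rise of one cell height per cell width gives a gradient of 1, i.e. a 45 degree slope; a flat window gives 0.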
Please download the latest version of the SRTM raster layer from https://www.worldclim.org/data/worldclim21.html. # load elevation and crop to the study area buffer alt &lt;- raster(&quot;data/spatial/Elevation/alt&quot;) # this layer is not on GitHub because of its large size; it can be downloaded from the above link alt.hills &lt;- raster::crop(alt, as(buffer, &quot;Spatial&quot;)) rm(alt) gc() # get slope and aspect slopeData &lt;- raster::terrain(x = alt.hills, opt = c(&quot;slope&quot;, &quot;aspect&quot;)) elevData &lt;- raster::stack(alt.hills, slopeData) rm(alt.hills) gc() 4.4 Prepare CHELSA rasters CHELSA rasters can be downloaded using the get_chelsa.sh shell script, which is a wget command pointing to the envidatS3.txt file. 4.4.1 Prepare BIOCLIM 4a and 15 We prepare the CHELSA rasters for seasonality in temperature (Bio 4a) and seasonality in precipitation (Bio 15) in the same way: reading them in, cropping them to the study site buffer extent, and dividing the temperature layer values by 10. The CHELSA rasters can be downloaded from https://chelsa-climate.org/bioclim/. # list chelsa files # the chelsa data can be downloaded from the aforementioned link. They have not been uploaded to GitHub because of their large size.
chelsaFiles &lt;- list.files(&quot;data/chelsa/&quot;, full.names = TRUE, recursive = TRUE, pattern = &quot;bio10&quot; ) # gather chelsa rasters chelsaData &lt;- purrr::map(chelsaFiles, function(chr) { a &lt;- raster(chr) crs(a) &lt;- crs(elevData) a &lt;- crop(a, as(buffer, &quot;Spatial&quot;)) return(a) }) # divide temperature by 10 to convert to degrees Celsius chelsaData[[1]] &lt;- chelsaData[[1]] / 10 # stack chelsa data chelsaData &lt;- raster::stack(chelsaData) 4.4.2 Prepare BIOCLIM 4a if (file.exists(&quot;data/chelsa/CHELSA_bio10_4a.tif&quot;)) { message(&quot;Bio 4a already exists, will be overwritten&quot;) } Bioclim 4a, the coefficient of variation of temperature (temperature seasonality), is calculated as \\[Bio\\ 4a = \\frac{SD\\{ Tkavg_1, \\ldots, Tkavg_{12} \\}}{(Bio\\ 1 + 273.15)} \\times 100\\] where \\(Tkavg_i = (Tkmin_i + Tkmax_i) / 2\\) is the mean temperature of month i in kelvin and Bio 1 is the mean annual temperature in degrees Celsius. Here, we use only the months of December and January–May for winter temperature variation. # list rasters by pattern patterns &lt;- c(&quot;tmin&quot;, &quot;tmax&quot;) # list the filepaths tkAvg &lt;- map(patterns, function(pattern) { # list the paths files &lt;- list.files( path = &quot;data/chelsa&quot;, full.names = TRUE, recursive = TRUE, pattern = pattern ) }) # print crs elev data for sanity check --- basic WGS84 crs(elevData) # now run over the paths and read as rasters and crop by buffer tkAvg &lt;- map(tkAvg, function(paths) { # going over the file paths, read them in as rasters, convert CRS and crop tempData &lt;- map(paths, function(path) { # read in a &lt;- raster(path) # assign crs crs(a) &lt;- crs(elevData) # crop by buffer, will throw error if CRS doesn&#39;t match a &lt;- crop(a, as(buffer, &quot;Spatial&quot;)) # return a a }) # convert each to kelvin, first dividing by 10 to get celsius tempData &lt;- map(tempData, function(tmpRaster) { tmpRaster &lt;- (tmpRaster / 10) + 273.15 }) }) # assign names names(tkAvg) &lt;- patterns # go over the tmin and tmax and get the average monthly temp tkAvg &lt;- map2(tkAvg[[&quot;tmin&quot;]], 
tkAvg[[&quot;tmax&quot;]], function(tmin, tmax) { # return the mean of the corresponding tmin and tmax # still in kelvin calc(stack(tmin, tmax), fun = mean) }) # calculate Bio 4a bio_4a &lt;- (calc(stack(tkAvg), fun = sd) / (chelsaData[[1]] + 273.15)) * 100 names(bio_4a) &lt;- &quot;CHELSA_bio10_4a&quot; # save bio_4a writeRaster(bio_4a, filename = &quot;data/chelsa/CHELSA_bio10_4a.tif&quot;, overwrite = T) 4.4.3 Prepare Bioclim 15 if (file.exists(&quot;data/chelsa/CHELSA_bio10_15.tif&quot;)) { message(&quot;Bio 15 already exists, will be overwritten&quot;) } Bioclim 15, the coefficient of variation of precipitation (precipitation seasonality; in our area, rainfall), is calculated as \\[Bio\\ 15 = \\frac{SD\\{ PPT_1, \\ldots, PPT_{12} \\}}{1 + (Bio\\ 12 / 12)} \\times 100\\] where \\(PPT_i\\) is the monthly precipitation and Bio 12 is the annual precipitation. Here, we use only the months of December and January–May for winter rainfall variation. # list rasters by pattern pattern &lt;- &quot;prec&quot; # list the filepaths pptTotal &lt;- list.files( path = &quot;data/chelsa&quot;, full.names = TRUE, recursive = TRUE, pattern = pattern ) # print crs elev data for sanity check --- basic WGS84 crs(elevData) # now run over the paths and read as rasters and crop by buffer pptTotal &lt;- map(pptTotal, function(path) { a &lt;- raster(path) # assign crs crs(a) &lt;- crs(elevData) # crop by buffer, will throw error if CRS doesn&#39;t match a &lt;- crop(a, as(buffer, &quot;Spatial&quot;)) # return a a }) # calculate Bio 15 bio_15 &lt;- (calc(stack(pptTotal), fun = sd) / (1 + (chelsaData[[2]] / 12))) * 100 names(bio_15) &lt;- &quot;CHELSA_bio10_15&quot; # save bio_15 writeRaster(bio_15, filename = &quot;data/chelsa/CHELSA_bio10_15.tif&quot;, overwrite = T) 4.4.4 Stack terrain and climate If Bio 4a and Bio 15 have already been prepared in a previous run, they can be loaded directly from file.
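For a single grid cell, the Bio 4a and Bio 15 definitions in 4.4.2 and 4.4.3 reduce to short arithmetic on the monthly values. The base R sketch below uses invented monthly values for the six months used here (December and January–May) and assumes inputs already in degrees Celsius and millimetres, that is, after the division by 10 applied to the CHELSA temperature layers; `bio4a_cell` and `bio15_cell` are hypothetical names.

```r
# Hypothetical per-cell versions of the two seasonality indices.
# tmin, tmax: monthly minimum and maximum temperature (deg C)
# bio1: mean annual temperature (deg C)
bio4a_cell <- function(tmin, tmax, bio1) {
  # monthly mean temperature in kelvin
  tkavg <- ((tmin + 273.15) + (tmax + 273.15)) / 2
  sd(tkavg) / (bio1 + 273.15) * 100
}

# ppt: monthly precipitation (mm); bio12: annual precipitation (mm)
bio15_cell <- function(ppt, bio12) {
  sd(ppt) / (1 + bio12 / 12) * 100
}

# Invented values for December and January to May
tmin <- c(16, 15, 16, 18, 20, 21)
tmax <- c(27, 28, 30, 32, 33, 31)
ppt <- c(20, 5, 8, 15, 60, 150)
bio4a_cell(tmin, tmax, bio1 = 24)
bio15_cell(ppt, bio12 = 2000)
```

Both indices are zero when the monthly values do not vary, and both scale the standard deviation by an annual mean (temperature in kelvin, or mean monthly precipitation plus 1) so that they read as coefficients of variation in percent.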
# If Bio 4a and Bio 15 have already been prepared in a previous run, load them directly bio_4a &lt;- raster(&quot;data/chelsa/CHELSA_bio10_4a.tif&quot;) bio_15 &lt;- raster(&quot;data/chelsa/CHELSA_bio10_15.tif&quot;) 4.4.5 Stack terrain and climate We stack the terrain and climatic rasters. # stack rasters for efficient reprojection later env_data &lt;- stack(elevData, bio_4a, bio_15) 4.5 Resample landcover from 10m to 1km We read in a classified land cover image, reclassify it, and resample it to 1 km resolution using the mode, that is, the most frequent class among the fine cells falling in each 1 km cell. The resampling need not be rerun, as it has already been done; the resampled raster can be loaded with the code in Section 4.6.1. # read in landcover raster location # To access the land cover data, please visit: https://daac.ornl.gov/VEGETATION/guides/Decadal_LULC_India.html landcover &lt;- &quot;data/landUseClassification/landcover_roy_2015/&quot; # read in and crop landcover &lt;- raster(landcover) buffer_utm &lt;- st_transform(buffer, 32643) landcover &lt;- crop( landcover, as( buffer_utm, &quot;Spatial&quot; ) ) # read reclassification matrix reclassification_matrix &lt;- read.csv(&quot;data/landUseClassification/reclassification-matrix-landCover-2015.csv&quot;) reclassification_matrix &lt;- as.matrix(reclassification_matrix[, c(&quot;V1&quot;, &quot;To&quot;)]) # reclassify landcover_reclassified &lt;- reclassify( x = landcover, rcl = reclassification_matrix ) # write to file writeRaster(landcover_reclassified, filename = &quot;data/landUseClassification/landcover_roy_2015_reclassified.tif&quot;, overwrite = TRUE ) # check reclassification plot(landcover_reclassified) # get extent e &lt;- bbox(raster(landcover)) # init resolution res_init &lt;- res(raster(landcover)) # target resolution of 1000 m res_final &lt;- res_init * (1000 / res_init) # use gdalutils gdalwarp for resampling transform # to 1km from 10m gdalUtils::gdalwarp( srcfile = 
&quot;data/landUseClassification/landcover_roy_2015_reclassified.tif&quot;, dstfile = &quot;data/landUseClassification/lc_01000m.tif&quot;, tr = c(res_final), r = &quot;mode&quot;, te = c(e) ) We compare the frequency of land cover classes between the original raster and the resampled 1 km raster to check that resampling has not drastically misrepresented the frequency of any land cover type. This comparison is made using the figure below. Resampling the Roy et al. (2015) land cover raster, reclassified into 7 main classes, to a resolution of 1 km preserves the important features of land cover over the study area. 4.6 Resample other rasters to 1km We now resample all other rasters to a resolution of 1 km. 4.6.1 Read in resampled landcover Here, we read in the 1 km land cover raster and set 0 to NA. lc_data &lt;- raster(&quot;data/landUseClassification/lc_01000m.tif&quot;) lc_data[lc_data == 0] &lt;- NA 4.6.2 Reproject environmental data using landcover as a template # reproject and resample onto the grid of the land cover raster env_data_resamp &lt;- projectRaster( from = env_data, to = lc_data, crs = crs(lc_data), res = res(lc_data) ) # export as raster stack land_stack &lt;- stack(env_data_resamp, lc_data) # get names land_names &lt;- glue(&#39;data/spatial/landscape_resamp{c(&quot;01&quot;)}_km.tif&#39;) # write to file raster::writeRaster( land_stack, filename = as.character(land_names), overwrite = TRUE ) 4.7 Climate variables in relation to elevation 4.7.1 Load resampled environmental rasters # read landscape stack and prepare for plotting landscape &lt;- stack(&quot;data/spatial/landscape_resamp01_km.tif&quot;) # get proper names elev_names &lt;- c(&quot;elev&quot;, &quot;slope&quot;, &quot;aspect&quot;) chelsa_names &lt;- c(&quot;bio_4a&quot;, &quot;bio_15&quot;) names(landscape) &lt;- glue(&#39;{c(elev_names, chelsa_names, &quot;landcover&quot;)}&#39;) # make duplicate stack land_data &lt;- landscape[[c(&quot;elev&quot;, chelsa_names)]] # convert to list land_data
&lt;- as.list(land_data) # get values from each layer of the stack land_data &lt;- purrr::map(land_data, getValues) names(land_data) &lt;- c(&quot;elev&quot;, chelsa_names) # convert to data frame, round elevation to 200 m bands and summarise land_data &lt;- bind_cols(land_data) land_data &lt;- drop_na(land_data) %&gt;% mutate(elev_round = plyr::round_any(elev, 200)) %&gt;% dplyr::select(-elev) %&gt;% pivot_longer( cols = contains(&quot;bio&quot;), names_to = &quot;clim_var&quot; ) %&gt;% group_by(elev_round, clim_var) %&gt;% summarise_all(.funs = list(~ mean(.), ~ ci(.))) Figure code is hidden in versions rendered as HTML or PDF. 4.8 Climate across land cover types Get climate values per (re-classified) landcover type from the 1km resampled raster. # make duplicate stack again lc_clim_data &lt;- landscape[[c(&quot;landcover&quot;, chelsa_names)]] # convert to list lc_clim_data &lt;- as.list(lc_clim_data) # get values from each layer of the stack lc_clim_data &lt;- purrr::map(lc_clim_data, getValues) names(lc_clim_data) &lt;- c(&quot;landcover&quot;, chelsa_names) # convert to data frame for plotting lc_clim_data &lt;- bind_cols(lc_clim_data) # pivot long lc_clim_data &lt;- pivot_longer( lc_clim_data, cols = contains(&quot;bio&quot;), names_to = &quot;climvar&quot; ) # make landcover factor lc_clim_data &lt;- mutate( lc_clim_data, landcover = factor(landcover) ) # drop cells without a landcover class lc_clim_data &lt;- filter( lc_clim_data, !is.na(landcover) ) # split by variable lc_clim_data &lt;- nest(lc_clim_data, data = c(&quot;landcover&quot;, &quot;value&quot;)) # assign names lc_clim_data$climvar_name &lt;- c( &quot;Temperature seasonality&quot;, &quot;Precipitation seasonality&quot; ) Plot the density of climate seasonality values for each land cover type.
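To compare climate distributions among land cover types, each type's values can be summarised as a kernel density estimate. The base-R sketch below illustrates the idea on simulated cell values (the data frame cells and its columns are hypothetical, and this is not the hidden plotting code); evaluating every density over one shared range keeps the curves directly comparable when overlaid.

```r
# Simulated 1 km cells: a land cover class and a climate seasonality value
set.seed(42)
cells <- data.frame(
  landcover = factor(sample(1:3, 300, replace = TRUE)),
  value = rnorm(300, mean = 50, sd = 10)
)

# Evaluate every density on the same grid so the curves can be overlaid
rng <- range(cells$value)
dens <- lapply(split(cells$value, cells$landcover), function(v) {
  density(v, from = rng[1], to = rng[2], n = 256)
})

# Location of the peak of each land cover type's density
sapply(dens, function(d) d$x[which.max(d$y)])
```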
4.9 Land cover type in relation to elevation # get data from landscape rasters lc_elev &lt;- tibble( elev = getValues(landscape[[&quot;elev&quot;]]), landcover = getValues(landscape[[&quot;landcover&quot;]]) ) # process data for proportions lc_elev &lt;- lc_elev %&gt;% filter(!is.na(landcover), !is.na(elev)) %&gt;% # round elev to 100m mutate(elev = plyr::round_any(elev, 100)) %&gt;% count(elev, landcover) %&gt;% group_by(elev) %&gt;% mutate(prop = n / sum(n)) # list every combination of elevation band and landcover class lc_elev_canon &lt;- crossing( elev = unique(lc_elev$elev), landcover = unique(lc_elev$landcover) ) # join so that absent combinations appear as NA lc_elev &lt;- full_join(lc_elev, lc_elev_canon, by = c(&quot;elev&quot;, &quot;landcover&quot;)) # convert NA to zero lc_elev &lt;- replace_na(lc_elev, replace = list(n = 0, prop = 0)) Figure code is hidden in versions rendered as HTML and PDF. 4.10 Main Text Figure 2 Climate and land cover vary strongly along the elevation gradient in the Nilgiri and Anamalai Hills. Both (a) temperature seasonality and (b) precipitation seasonality, between the months of December and May, decline with increasing elevation across the Nilgiri and Anamalai Hills. Climatic variation is not very strongly associated with land cover type, as both natural habitats such as forests and human-associated habitat types such as plantations show low seasonality in (c) temperature and (d) precipitation. (e) Most elevations host a range of land cover types: while human-associated habitats such as agriculture are concentrated at lower elevations, and more natural types such as grasslands and forests are associated with higher elevations, each of these types is also found outside its characteristic elevational band. We calculated climate seasonalities (BIOCLIM 4a and 15: temperature and precipitation, respectively) using CHELSA data for December to May over 1979 to 2013 (Karger et al. 2017), and present mean seasonality values (vertical bars show standard deviation) for every 200 m elevational band.
Land cover types were taken from a reclassification of Roy et al. (2015; see main text) at 100 m elevational bands. Land cover types covering &lt; 1% of an elevational band are shaded grey. All landscape layers were first resampled to 1 km resolution. "],["preparing-checklist-calibration-index.html", "Section 5 Preparing Checklist Calibration Index 5.1 Prepare libraries 5.2 Prepare data 5.3 Spatially explicit filter on checklists 5.4 Prepare species of interest 5.5 Prepare checklists for observer score 5.6 Get landcover 5.7 Filter checklist data 5.8 Model observer expertise", " Section 5 Preparing Checklist Calibration Index Differences in local avifaunal expertise among citizen scientists can lead to biased species detection when compared with data collected by a consistent set of trained observers (Van Strien et al. 2013). Including observer expertise as a detection covariate in occupancy models using eBird data can help account for this variation (Johnston et al. 2018). Following Kelling et al. (2015), we calculated observer-specific expertise in local avifauna as the normalized predicted number of species reported by an observer after 60 minutes of sampling in the most common land cover type within the study area. This score was calculated by examining checklists from anonymized observers across the study area. We modified the formulation of Kelling et al. (2015) by including only observations of the 79 species of interest in our calculations. An observer who reports more species of interest within 60 minutes therefore receives a higher observer-specific expertise score, with respect to the study area.
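The scoring logic can be illustrated with a simplified model. The base-R sketch below fits a Poisson GLM with observer as a fixed effect, which is a stand-in for the GLMM with observer random effects fitted in Section 5.8, then predicts each observer's count of species of interest at 60 minutes and min-max rescales the predictions to the range 0 to 1, as scales::rescale does by default. The toy data and column names are hypothetical.

```r
# Hypothetical toy data: three observers, five checklists each
chk <- data.frame(
  observer = factor(rep(c('obsA', 'obsB', 'obsC'), each = 5)),
  duration = rep(c(15, 30, 60, 90, 120), times = 3)
)
rate <- c(obsA = 0.5, obsB = 1, obsC = 2)
# species of interest reported grow with the square root of duration
chk$nSoi <- unname(round(rate[as.character(chk$observer)] * sqrt(chk$duration)))

# Poisson GLM with observer as a fixed effect (simplification of the GLMM)
fit <- glm(nSoi ~ observer + log(duration), family = poisson, data = chk)

# Predict each observer's expected count at 60 minutes
new_chk <- data.frame(observer = levels(chk$observer), duration = 60)
pred <- predict(fit, newdata = new_chk, type = 'response')

# Min-max rescale to [0, 1]
score <- (pred - min(pred)) / (max(pred) - min(pred))
names(score) <- new_chk$observer
score
```

With real data, a random-effects model shrinks the estimates of observers with few checklists towards the overall mean, which a fixed-effect GLM does not do.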
5.1 Prepare libraries # load libs library(data.table) library(readxl) library(magrittr) library(stringr) library(dplyr) library(tidyr) library(auk) # get decimal time function library(lubridate) time_to_decimal &lt;- function(x) { x &lt;- lubridate::hms(x, quiet = TRUE) lubridate::hour(x) + lubridate::minute(x) / 60 + lubridate::second(x) / 3600 } 5.2 Prepare data Here, we go through the data preparation process again because we might want to assess observer expertise over a larger area than the study site. # read in shapefile of study area, used to subset checklists by its boundary library(sf) wg &lt;- st_read(&quot;data/spatial/hillsShapefile/Nil_Ana_Pal.shp&quot;) %&gt;% st_transform(32643) # set file paths for auk functions f_in_ebd &lt;- file.path(&quot;data/01_ebird-filtered-EBD-westernGhats.txt&quot;) f_in_sampling &lt;- file.path(&quot;data/01_ebird-filtered-sampling-westernGhats.txt&quot;) # run filters using the auk package ebd_filters &lt;- auk_ebd(f_in_ebd, f_in_sampling) %&gt;% auk_country(country = &quot;IN&quot;) %&gt;% auk_state(c(&quot;IN-KL&quot;, &quot;IN-TN&quot;, &quot;IN-KA&quot;)) %&gt;% # Restricting geography to Tamil Nadu, Kerala &amp; Karnataka auk_date(c(&quot;2013-01-01&quot;, &quot;2021-05-31&quot;)) %&gt;% auk_complete() # check filters ebd_filters # specify output location and perform filter f_out_ebd &lt;- &quot;data/ebird_for_expertise.txt&quot; f_out_sampling &lt;- &quot;data/ebird_sampling_for_expertise.txt&quot; ebd_filtered &lt;- auk_filter(ebd_filters, file = f_out_ebd, file_sampling = f_out_sampling, overwrite = TRUE ) Load the filtered data and keep the columns of interest.
## Process filtered data # read in the data ebd &lt;- fread(f_out_ebd) names &lt;- names(ebd) %&gt;% stringr::str_to_lower() %&gt;% stringr::str_replace_all(&quot; &quot;, &quot;_&quot;) setnames(ebd, names) # choose columns of interest columnsOfInterest &lt;- c( &quot;global_unique_identifier&quot;, &quot;scientific_name&quot;, &quot;observation_count&quot;, &quot;locality&quot;, &quot;locality_id&quot;, &quot;locality_type&quot;, &quot;latitude&quot;, &quot;longitude&quot;, &quot;observation_date&quot;, &quot;time_observations_started&quot;, &quot;observer_id&quot;, &quot;sampling_event_identifier&quot;, &quot;protocol_type&quot;, &quot;duration_minutes&quot;, &quot;effort_distance_km&quot;, &quot;effort_area_ha&quot;, &quot;number_observers&quot;, &quot;all_species_reported&quot;, &quot;reviewed&quot; ) ebd &lt;- setDF(ebd) %&gt;% as_tibble() %&gt;% dplyr::select(one_of(columnsOfInterest)) setDT(ebd) # remove checklists (SEIs) with more than 10 observers ebd &lt;- filter(ebd, number_observers &lt;= 10) # keep only checklists between 5AM and 7PM ebd &lt;- filter(ebd, time_observations_started &gt;= &quot;05:00:00&quot; &amp; time_observations_started &lt;= &quot;19:00:00&quot;) # keep only checklists between December 1st and May 31st ebd &lt;- filter(ebd, month(observation_date) %in% c(1, 2, 3, 4, 5, 12)) 5.3 Spatially explicit filter on checklists # get checklist locations ebd_locs &lt;- ebd[, .(longitude, latitude)] ebd_locs &lt;- setDF(ebd_locs) %&gt;% distinct() ebd_locs &lt;- st_as_sf(ebd_locs, coords = c(&quot;longitude&quot;, &quot;latitude&quot;) ) %&gt;% `st_crs&lt;-`(4326) %&gt;% bind_cols(as_tibble(st_coordinates(.))) %&gt;% st_transform(32643) %&gt;% mutate(id = 1:nrow(.)) # check whether to include to_keep &lt;- unlist(st_contains(wg, ebd_locs)) # filter locs ebd_locs &lt;- filter(ebd_locs, id %in% to_keep) %&gt;% bind_cols(as_tibble(st_coordinates(st_as_sf(.)))) %&gt;% st_drop_geometry() names(ebd_locs) &lt;- c(&quot;longitudeWGS&quot;,
&quot;latitudeWGS&quot;, &quot;id&quot;, &quot;longitudeUTM&quot;, &quot;latitudeUTM&quot;) ebd &lt;- ebd[longitude %in% ebd_locs$longitudeWGS &amp; latitude %in% ebd_locs$latitudeWGS, ] 5.4 Prepare species of interest # read in species list specieslist &lt;- read.csv(&quot;data/species-list.csv&quot;) # set species of interest soi &lt;- specieslist$scientific_name ebdSpSum &lt;- ebd[, .( nSp = .N, totSoiSeen = length(intersect(scientific_name, soi)) ), by = list(sampling_event_identifier) ] # write to file and link with checklist id later fwrite(ebdSpSum, file = &quot;results/04_data-nspp-per-chk.csv&quot;) 5.5 Prepare checklists for observer score # 1. add new columns of decimal time and julian date ebd[, `:=`( decimalTime = time_to_decimal(time_observations_started), julianDate = yday(as.POSIXct(observation_date)) )] # 2. get one row of effort covariates per checklist ebdEffChk &lt;- setDF(ebd) %&gt;% mutate(year = year(observation_date)) %&gt;% distinct( sampling_event_identifier, observer_id, year, duration_minutes, effort_distance_km, effort_area_ha, longitude, latitude, locality, locality_id, decimalTime, julianDate, number_observers ) %&gt;% # drop rows with NAs in cols used in the model tidyr::drop_na( sampling_event_identifier, observer_id, duration_minutes, decimalTime, julianDate ) %&gt;% # drop years before 2013 filter(year &gt;= 2013) # 3. join per-checklist species counts (groups of &gt; 10 observers were removed earlier) ebdChkSummary &lt;- inner_join(ebdEffChk, ebdSpSum, by = &quot;sampling_event_identifier&quot;) # remove ebird data rm(ebd) gc() 5.6 Get landcover Read in land cover type data resampled at 1km resolution.
# read in 1km landcover and set 0 to NA library(raster) landcover &lt;- raster::raster(&quot;data/landUseClassification/lc_01000m.tif&quot;) landcover[landcover == 0] &lt;- NA # get locs in utm coords locs &lt;- distinct( ebdChkSummary, sampling_event_identifier, longitude, latitude, locality, locality_id ) locs &lt;- st_as_sf(locs, coords = c(&quot;longitude&quot;, &quot;latitude&quot;)) %&gt;% `st_crs&lt;-`(4326) %&gt;% st_transform(32643) %&gt;% st_coordinates() # get for unique points landcoverVec &lt;- raster::extract( x = landcover, y = locs ) # assign to df and overwrite setDT(ebdChkSummary)[, landcover := landcoverVec] 5.7 Filter checklist data # change names for easy handling setnames(ebdChkSummary, c( &quot;locality&quot;, &quot;locality_id&quot;, &quot;latitude&quot;, &quot;longitude&quot;, &quot;observer&quot;, &quot;sei&quot;, &quot;duration&quot;, &quot;distance&quot;, &quot;area&quot;, &quot;nObs&quot;, &quot;decimalTime&quot;, &quot;julianDate&quot;, &quot;year&quot;, &quot;nSp&quot;, &quot;nSoi&quot;, &quot;landcover&quot; )) # count data points per observer obscount &lt;- count(ebdChkSummary, observer) %&gt;% filter(n &gt;= 3) # make factor variables and remove obs not in obscount # also remove 0 durations ebdChkSummary &lt;- ebdChkSummary %&gt;% mutate( distance = ifelse(is.na(distance), 0, distance), duration = if_else(is.na(duration), 0.0, as.double(duration)) ) %&gt;% filter( observer %in% obscount$observer, duration &gt; 0, duration &lt;= 300, nSoi &gt;= 0, distance &lt;= 5, !is.na(nSoi) ) %&gt;% mutate( landcover = as.factor(landcover), observer = as.factor(observer) ) %&gt;% drop_na(landcover) # editing julian date to model it in a linear fashion unique(ebdChkSummary$julianDate) ebdChkSummary &lt;- ebdChkSummary %&gt;% mutate( newjulianDate = case_when( julianDate &gt;= 334 &amp; julianDate &lt;= 365 ~ (julianDate - 333), julianDate &gt;= 1 &amp; julianDate &lt;= 152 ~ (julianDate + 31) ) ) %&gt;% drop_na(newjulianDate) # save to file for 
later reuse fwrite(ebdChkSummary, file = &quot;results/04_data-covars-perChklist.csv&quot;) 5.8 Model observer expertise Our observer expertise model includes a random intercept for observer identity and a random slope for checklist duration. Together, these allow observers to differ both in their starting points and in their rates of species accumulation over time. # uses either a subset or all data # lmerTest loads lme4, which provides glmer() library(lmerTest) # here we specify a glmm with random effects for observer # duration enters as fixed effects (linear and square-root terms) and as a random slope modObsExp &lt;- glmer(nSoi ~ duration + sqrt(duration) + landcover + sqrt(decimalTime) + I((sqrt(decimalTime))^2) + log(newjulianDate) + I((log(newjulianDate)^2)) + (1 | observer) + (0 + duration | observer), data = ebdChkSummary, family = &quot;poisson&quot; ) # make dir if absent if (!dir.exists(&quot;results/modOutput&quot;)) { dir.create(&quot;results/modOutput&quot;) } # write model output to text file { writeLines(R.utils::captureOutput(list(Sys.time(), summary(modObsExp))), con = &quot;results/modOutput/04_model-output-expertise.txt&quot; ) } # get unique observers observer &lt;- unique(ebdChkSummary$observer) # predict at 60 mins on the most common landcover (deciduous forests), at mean time of day and date dfPredict &lt;- ebdChkSummary %&gt;% summarise_at(vars(duration, decimalTime, newjulianDate), list(~ mean(.))) %&gt;% mutate(duration = 60, landcover = as.factor(2)) %&gt;% tidyr::crossing(observer) # run predict from model on it dfPredict &lt;- mutate(dfPredict, score = predict(modObsExp, newdata = dfPredict, type = &quot;response&quot;, allow.new.levels = TRUE ) ) %&gt;% mutate(score = scales::rescale(score)) fwrite(dfPredict %&gt;% dplyr::select(observer, score), file = &quot;results/04_data-obsExpertise-score.csv&quot; ) "],["examining-spatial-sampling-bias.html", "Section 6 Examining Spatial Sampling Bias 6.1 Prepare libraries 6.2 Read checklist data 6.3 Prepare Main Text Figure 3 6.4 Figure: Spatial sampling bias", " Section 6 Examining
Spatial Sampling Bias The goal of this section is to show how far each checklist location is from the nearest road, and how far each site is from its nearest neighbour. This requires computing pairwise distances between a large number of unique checklist locations and a vast number of roads, as well as among the checklist locations themselves. 6.1 Prepare libraries # load libraries # for data library(sf) library(rnaturalearth) library(dplyr) library(readr) library(purrr) # for plotting library(scales) library(ggplot2) library(ggspatial) library(colorspace) # round to the nearest multiple of accuracy (like plyr::round_any) round_any &lt;- function(x, accuracy = 20000) { round(x / accuracy) * accuracy } # ci function: half-width of the 95% confidence interval of the mean, ignoring NAs ci &lt;- function(x) { qnorm(0.975) * sd(x, na.rm = TRUE) / sqrt(sum(!is.na(x))) } 6.2 Read checklist data Read in checklist data with distance to nearest neighbouring site, and the distance to the nearest road. # read from local file chkCovars &lt;- read_csv(&quot;results/04_data-covars-perChklist.csv&quot;) 6.2.1 Spatially explicit filter on checklists We filter the checklists by the boundary (polygon outline) of the study area, not by its rectangular extent.
chkCovars &lt;- st_as_sf(chkCovars, coords = c(&quot;longitude&quot;, &quot;latitude&quot;)) %&gt;% `st_crs&lt;-`(4326) %&gt;% st_transform(32643) # read wg wg &lt;- st_read(&quot;data/spatial/hillsShapefile/Nil_Ana_Pal.shp&quot;) %&gt;% st_transform(32643) # get bounding box bbox &lt;- st_bbox(wg) # spatial subset chkCovars &lt;- chkCovars %&gt;% mutate(id = 1:nrow(.)) %&gt;% filter(id %in% unlist(st_contains(wg, chkCovars))) 6.2.2 Get background land for plotting # add land land &lt;- ne_countries( scale = 50, type = &quot;countries&quot;, continent = &quot;asia&quot;, country = &quot;india&quot;, returnclass = c(&quot;sf&quot;) ) %&gt;% st_transform(32643) # add roads data roads &lt;- st_read(&quot;data/spatial/roads_studysite_2019/roads_studysite_2019.shp&quot;) %&gt;% st_transform(32643) 6.3 Prepare Main Text Figure 3 6.3.1 Prepare histogram of distance to roads Figure code is hidden in versions rendered as HTML or PDF. 6.3.2 Table: Distance to roads # write the mean, sd, min and max of each distance to file chkCovars %&gt;% st_drop_geometry() %&gt;% dplyr::select(dist_road, nnb) %&gt;% tidyr::pivot_longer( cols = c(&quot;dist_road&quot;, &quot;nnb&quot;), names_to = &quot;variable&quot; ) %&gt;% group_by(variable) %&gt;% summarise_at( vars(value), list(~ mean(.), ~ sd(.), ~ min(.), ~ max(.)) ) %&gt;% write_csv(&quot;results/06_distance-roads-sites.csv&quot;) 6.3.3 Distance to nearest neighbouring site # get unique locations from checklists locs_unique &lt;- cbind( st_drop_geometry(chkCovars), st_coordinates(chkCovars) ) %&gt;% as_tibble() locs_unique &lt;- distinct(locs_unique, X, Y, .keep_all = TRUE) Figure code is hidden in versions rendered as HTML and PDF. 6.3.4 Spatial distribution of distances to neighbours Figure code is hidden in HTML and PDF versions, consult the Rmarkdown file. Most observation sites are within 300m of another site.
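The nearest-neighbour distance for each site can be computed from the full pairwise distance matrix. The base-R sketch below assumes projected coordinates in metres, such as the UTM 43N X and Y columns used above; the function name nearest_neighbour_dist is hypothetical. Because it builds an n x n matrix, memory grows quadratically with the number of sites, so it suits modest numbers of unique locations.

```r
# Distance from each site to its nearest other site, in map units (metres for UTM)
nearest_neighbour_dist <- function(x, y) {
  d <- as.matrix(dist(cbind(x, y)))  # Euclidean pairwise distances
  diag(d) <- Inf                     # exclude each site's zero distance to itself
  apply(d, 1, min)
}

# Four hypothetical sites (X, Y in metres)
X <- c(0, 300, 0, 1000)
Y <- c(0, 0, 400, 1000)
nnb <- nearest_neighbour_dist(X, Y)
nnb  # values 300, 300, 400 and about 1166.2
```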
6.4 Figure: Spatial sampling bias # get locations points &lt;- chkCovars %&gt;% bind_cols(as_tibble(st_coordinates(.))) %&gt;% st_drop_geometry() %&gt;% mutate(X = round_any(X, 2500), Y = round_any(Y, 2500)) # count points points &lt;- count(points, X, Y) Figure code is hidden in versions rendered as HTML and PDF. # save as png ggsave( fig_checklists_grid, filename = &quot;figs/fig_spatial_bias.png&quot; ) # save figure as Robject for next plot save(fig_checklists_grid, file = &quot;data/fig_checklists_grid.Rds&quot;) Sampling effort across the Nilgiri and Anamalai Hills, in the form of eBird checklists reported by birdwatchers, mostly takes place along roads, with the majority of checklists located &lt; 1 km from a roadway (see distribution in inset), and therefore, only about 300m, on average, from the location of another checklist. Each cell here is 2.5km x 2.5km. "],["checking-temporal-sampling-frequency.html", "Section 7 Checking Temporal Sampling Frequency 7.1 Load libraries 7.2 Load checklist data 7.3 Get time differences per grid cell 7.4 Time Since Previous Checklist 7.5 Main Text Figure 3 7.6 Checklists per Month", " Section 7 Checking Temporal Sampling Frequency How often are checklists recorded in each grid cell? 7.1 Load libraries # load libraries library(tidyverse) library(sf) # for plotting library(ggplot2) library(colorspace) library(scico) library(ggthemes) library(ggspatial) library(patchwork) 7.2 Load checklist data Here we load filtered checklist data and convert to UTM 43N coordinates. 
# load checklist data load(&quot;results/02_data_prelim_processing.rdata&quot;) # get checklists data &lt;- distinct( dataGrouped, sampling_event_identifier, observation_date, longitude, latitude ) # remove old data rm(dataGrouped) # transform to UTM 43N data &lt;- st_as_sf(data, coords = c(&quot;longitude&quot;, &quot;latitude&quot;), crs = 4326) data &lt;- st_transform(data, crs = 32643) # get coordinates and bind to data data &lt;- cbind( st_drop_geometry(data), st_coordinates(data) ) # bin to 2500 m grid cells data &lt;- mutate(data, X = plyr::round_any(X, 2500), Y = plyr::round_any(Y, 2500) ) 7.3 Get time differences per grid cell # get time differences in days data &lt;- mutate(data, observation_date = as.POSIXct(observation_date)) data &lt;- nest(data, data = c(&quot;sampling_event_identifier&quot;, &quot;observation_date&quot;)) # map over data data &lt;- mutate(data, lag_metrics = lapply(data, function(df) { df &lt;- arrange(df, observation_date) lag &lt;- as.numeric(diff(df$observation_date), units = &quot;days&quot;) data &lt;- tibble( mean_lag = mean(lag, na.rm = TRUE), median_lag = median(lag, na.rm = TRUE), sd_lag = sd(lag, na.rm = TRUE), n_chk = nrow(df) ) data }) ) # unnest lag metrics data_lag &lt;- select(data, -data) data_lag &lt;- unnest(data_lag, cols = &quot;lag_metrics&quot;) # set the mean, median and sd to infinity if a cell has only one checklist data_lag &lt;- mutate(data_lag, mean_lag = ifelse(n_chk == 1, Inf, mean_lag), median_lag = ifelse(n_chk == 1, Inf, median_lag), sd_lag = ifelse(n_chk == 1, Inf, sd_lag) ) # add one day to the mean and median so that same-day revisits (lag 0) become 1 data_lag &lt;- mutate(data_lag, mean_lag = mean_lag + 1, median_lag = median_lag + 1 ) # melt data by tile # data_lag = pivot_longer(data_lag, cols = c(&quot;mean_lag&quot;, &quot;median_lag&quot;, &quot;sd_lag&quot;)) 7.4 Time Since Previous Checklist 7.4.1 Get aux data # hills data wg &lt;- st_read(&quot;data/spatial/hillsShapefile/Nil_Ana_Pal.shp&quot;) %&gt;% st_transform(32643) roads &lt;-
st_read(&quot;data/spatial/roads_studysite_2019/roads_studysite_2019.shp&quot;) %&gt;% st_transform(32643) # add land library(rnaturalearth) land &lt;- ne_countries( scale = 50, type = &quot;countries&quot;, continent = &quot;asia&quot;, country = &quot;india&quot;, returnclass = c(&quot;sf&quot;) ) %&gt;% st_transform(32643) bbox &lt;- st_bbox(wg) 7.4.2 Histogram of lags Figure code is hidden in HTML and PDF versions. # get lags in days between consecutive checklists data &lt;- mutate(data, lag_hist = lapply(data, function(df) { df &lt;- arrange(df, observation_date) lag &lt;- as.numeric(diff(df$observation_date), units = &quot;days&quot;) data &lt;- tibble( lag = lag + 1, index = seq_along(lag) ) data }) ) # unnest lags data_hist &lt;- select(data, X, Y, lag_hist) %&gt;% unnest(cols = &quot;lag_hist&quot;) Most sites are resurveyed at least once, but some are visited much more frequently than others. There does not appear to be a link between roads and visit frequency. eBird checklists are also strongly clustered in time, with some of the most sampled areas over the study period visited at intervals of &gt; 1 week, and with some less intensively sampled areas visited frequently, at intervals of &lt; 1 week. Overall, the majority of checklists are reported only a day after the previous checklist at that location (see inset). 7.5 Main Text Figure 3 Combining figures for spatial and temporal clustering into main text figure 3. This combined figure is not shown here; see the main text. # load spatial bias figure load(&quot;data/fig_checklists_grid.Rds&quot;) Distribution of sampling effort in the form of eBird checklists in the Nilgiri and Anamalai Hills between 2013 and 2021. (a) Sampling effort across the Nilgiri and Anamalai Hills, in the form of eBird checklists reported by birdwatchers, mostly takes place along roads, with the majority of checklists located &lt;1 km from a roadway (see distribution in inset), and therefore, only about 300m, on average, from the location of another checklist.
(b) eBird checklists are also strongly clustered in time, with some of the most sampled areas over the study period visited at intervals of &gt; 1 week, and with some less intensively sampled areas visited frequently, at intervals of &lt; 1 week. Overall, most checklists are reported only a day after the previous checklist at that location (see inset). Both spatial and temporal clustering make data thinning necessary. Both panels show counts or mean intervals in a 2.5km grid cell; the study area is bounded by a dashed line, and roads within it are shown as (a) blue or (b) red lines. 7.6 Checklists per Month We counted the checklists per month, pooled over years, to determine how sampling effort varies over the year. # keep grid cell coordinates and the nested checklists data &lt;- select(data, X, Y, data) # unnest data &lt;- unnest(data, cols = &quot;data&quot;) # get fortnight (two-week period), year and month library(lubridate) data &lt;- mutate(data, week = week(observation_date), week = plyr::round_any(week, 2), year = year(observation_date), month = month(observation_date) ) # count checklists per month and year data_count &lt;- count(data, month, year) Observations peak in the early months of the year, and decline towards the rainy months, slowly increasing until the following winter. "],["adding-covariates-to-checklist-data.html", "Section 8 Adding Covariates to Checklist Data 8.1 Prepare libraries and data 8.2 Spatial subsampling 8.3 Temporal subsampling 8.4 Add checklist calibration index 8.5 Add climatic and landscape covariates 8.6 Spatial buffers around selected checklists 8.7 Spatial buffer-wide covariates", " Section 8 Adding Covariates to Checklist Data In this section, we prepare a final list of covariates, after taking into account spatial sampling bias, temporal bias and observer expertise scores (examined in previous sections).
8.1 Prepare libraries and data # load libs for data library(dplyr) library(readr) library(stringr) library(purrr) library(glue) library(tidyr) # check for velox and install library(devtools) if (!&quot;velox&quot; %in% installed.packages()) { install_github(&quot;hunzikp/velox&quot;) } # load spatial library(raster) library(rgeos) library(velox) library(sf) # load saved data object load(&quot;results/02_data_prelim_processing.rdata&quot;) 8.2 Spatial subsampling Sampling bias can be introduced into citizen science due to the often ad-hoc nature of data collection (Sullivan et al. 2014). For eBird, this translates into checklists reported when convenient, rather than at regular or random points in time and space, leading to non-independence in the data if observations are spatio-temporally clustered (Johnston et al. 2021). Spatio-temporal autocorrelation in the data can be reduced by sub-sampling at an appropriate spatial resolution, and by avoiding temporal clustering. We estimated two simple measures of spatial clustering: the distance from each site to the nearest road (road data from OpenStreetMap) and the nearest-neighbor distance for each site. Sites were strongly tied to roads (mean distance to road ± SD = 390.77 ± 859.15 m; range = 0.28 m to 7.64 km) and were on average only 297 m away from another site (SD = 553 m; range = 0.14 m to 12.85 km) (Figure 3). This analysis was done in Section 6. Here, to further reduce spatial autocorrelation, we divided the study area into a grid of 1km wide square cells and picked checklists from one site at random within each grid cell.
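The per-cell selection rule applied in the code that follows (within each grid cell, keep the site with the most checklists, taking the first such site when there is a tie) can be sketched in base R without building a grid polygon layer: each point is assigned to a square cell by integer division of its projected coordinates. The function thin_by_grid, its n_chk column and the example sites are hypothetical, and the cells here are aligned to the coordinate origin rather than to the study-area bounding box used by st_make_grid().

```r
# Keep one site per square grid cell: the one with the most checklists
thin_by_grid <- function(sites, cellsize = 1000) {
  # sites: data.frame with projected X, Y (metres) and n_chk (checklists per site)
  cell_id <- paste(floor(sites$X / cellsize), floor(sites$Y / cellsize))
  # sort by cell, then by decreasing n_chk; order() is stable, so ties keep input order
  ord <- order(cell_id, -sites$n_chk)
  sorted <- sites[ord, ]
  # the first row within each cell is the site with maximum effort
  sorted[!duplicated(cell_id[ord]), ]
}

sites <- data.frame(
  X = c(100, 900, 1500, 1600, 2500),
  Y = c(100, 200, 100, 800, 2500),
  n_chk = c(3, 7, 2, 2, 5)
)
thinned <- thin_by_grid(sites, cellsize = 1000)
thinned$X  # 900, 1500, 2500
```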
# grid based spatial thinning gridsize &lt;- 500 # grid size in metres effort_distance_max &lt;- 1000 # checklists with effort_distance_km above this value are removed # make grids across the study site hills &lt;- st_read(&quot;data/spatial/hillsShapefile/Nil_Ana_Pal.shp&quot;) %&gt;% st_transform(32643) grid &lt;- st_make_grid(hills, cellsize = gridsize) # filtering on !pres_abs keeps absences # this absence data will be thinned data_thin_absences &lt;- filter(dataGrouped, !pres_abs) data_presences &lt;- filter(dataGrouped, pres_abs) # split data by species data_thin_absences &lt;- split( x = data_thin_absences, f = data_thin_absences$scientific_name ) 8.2.1 Counting presence observation proportion data_presence_prop &lt;- count(data_presences, scientific_name, name = &quot;presences&quot;) %&gt;% mutate( absences = map_int(data_thin_absences, nrow), presence_prop = presences / (presences + absences) ) # mean and sd of presence prop data_presence_prop %&gt;% summarise(mean(presence_prop), sd(presence_prop)) # spatial thinning on each species retains # site with maximum visits per grid cell data_thin_absences &lt;- map(data_thin_absences, function(df) { # count visits per locality df &lt;- group_by(df, locality) %&gt;% mutate(tot_effort = length(sampling_event_identifier)) %&gt;% ungroup() # remove sites with distances above spatial independence df &lt;- df %&gt;% dplyr::filter(effort_distance_km &lt;= effort_distance_max) %&gt;% st_as_sf(coords = c(&quot;longitude&quot;, &quot;latitude&quot;)) %&gt;% `st_crs&lt;-`(4326) # transform to regional UTM 43N and add id df &lt;- df %&gt;% st_transform(32643) %&gt;% mutate(coordId = 1:nrow(.)) %&gt;% bind_cols(as_tibble(st_coordinates(.))) # which cell has which coords grid_overlap &lt;- st_contains(grid, df) %&gt;% unclass() %&gt;% purrr::discard(.p = is_empty) # count length of grid overlap list # this is the number of cells with points in them sampled_cells &lt;- length(grid_overlap) # make tibble grid_overlap &lt;- tibble( uid_cell =
seq(length(grid_overlap)), # the uid_cell is specific to this sp. coordId = grid_overlap ) # unnest grid_overlap &lt;- unnest(grid_overlap, cols = &quot;coordId&quot;) # join grid cell overlap with coordinate data df &lt;- left_join(df, grid_overlap, by = &quot;coordId&quot; ) %&gt;% st_drop_geometry() # for each uid_cell, select coord where effort is max points_max &lt;- df %&gt;% group_by(uid_cell) %&gt;% dplyr::filter(tot_effort == max(tot_effort)) %&gt;% # there may be multiple rows with max effort, select first dplyr::filter(row_number() == 1) # check for number of samples assertthat::assert_that( assertthat::are_equal(sampled_cells, nrow(points_max), msg = &quot;spatial thinning error: more samples than\\\\ sampled cells&quot; ) ) # check that there is one sample per cell assertthat::assert_that( assertthat::are_equal( max(count(points_max, uid_cell)$n), 1 ) ) # return data without UID cell and coordinate Id dplyr::select(ungroup(points_max), -uid_cell, -coordId, -tot_effort) }) # remove old data rm(dataGrouped) 8.2.2 Count absences after spatial thinning data_presence_prop &lt;- data_presence_prop %&gt;% mutate( absences_sp_thin = map_int(data_thin_absences, nrow) ) 8.3 Temporal subsampling Additionally, from each selected site, we randomly selected a maximum of 10 absence checklists, which reduced temporal clustering. We kept all presence checklists. 
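The per-site cap on absence checklists can be sketched in base R as follows; subsample_per_site and the example data are hypothetical. Because the draw is random, the selected checklists differ between runs unless a seed is set, which the sketch does with set.seed() for reproducibility.

```r
# Randomly keep at most max_n rows (checklists) per locality
subsample_per_site <- function(df, max_n = 10) {
  parts <- lapply(split(df, df$locality), function(x) {
    if (nrow(x) > max_n) x[sample.int(nrow(x), max_n), , drop = FALSE] else x
  })
  do.call(rbind, parts)
}

set.seed(1)  # make the random draw reproducible
chk <- data.frame(
  locality = rep(c('L1', 'L2', 'L3'), times = c(15, 4, 10)),
  sei = paste0('S', 1:29),
  stringsAsFactors = FALSE
)
sub <- subsample_per_site(chk)
table(sub$locality)  # L1: 10, L2: 4, L3: 10
```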
# randomly subsample at most 10 absence checklists per locality dataSubsample &lt;- map(data_thin_absences, function(df) { df &lt;- ungroup(df) df_to_locality &lt;- split(x = df, f = df$locality) df_samples &lt;- map_if( .x = df_to_locality, .p = function(x) { nrow(x) &gt; 10 }, .f = function(x) sample_n(x, 10, replace = FALSE) ) bind_rows(df_samples) }) 8.3.1 Count absences after temporal thinning data_presence_prop &lt;- data_presence_prop %&gt;% mutate( absences_tmp_thin = map_int(dataSubsample, nrow), presence_prop_post_thin = presences / (presences + absences_tmp_thin) ) # save data write_csv(data_presence_prop, &quot;results/08_data-class-balance.csv&quot;) # bind the spatially and temporally thinned absences into one data frame dataSubsample &lt;- bind_rows(dataSubsample) # convert presence data to UTM 43 N and long-lat to X-Y data_presences &lt;- bind_cols( data_presences, as_tibble( st_as_sf( data_presences, coords = c(&quot;longitude&quot;, &quot;latitude&quot;), crs = 4326 ) %&gt;% st_transform(32643) %&gt;% st_coordinates() ) ) # drop long lat data_presences &lt;- dplyr::select(data_presences, -longitude, -latitude) # join ALL PRESENCES and THINNED ABSENCES dataSubsample &lt;- bind_rows(dataSubsample, data_presences) # check joined data assertthat::assert_that( max(apply(dataSubsample, 2, function(x) sum(is.na(x)))) == 0, msg = &quot;some columns missing from one of the datasets&quot; ) # remove previous data rm(data_thin_absences) 8.4 Add checklist calibration index Load the CCI computed in Section 5. The CCI was the lone observer's expertise score for single-observer checklists, and the highest expertise score among observers for group checklists.
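The rule for group checklists can be sketched in base R: split the comma-separated observer_id field into one row per observer, attach each observer's score by the numeric part of its id, and take the maximum per checklist. The data below are hypothetical. As in the code that follows, max() is used without na.rm, so a group checklist that includes any observer without a score receives NA and is then dropped by the !is.na(expertise) filter.

```r
# Hypothetical expertise scores, keyed by the numeric part of the observer id
scores <- data.frame(
  numObserver = c('101', '202', '303'),
  score = c(0.2, 0.9, 0.5),
  stringsAsFactors = FALSE
)

# Hypothetical checklists; group checklists list several observers
chk <- data.frame(
  sei = c('S1', 'S2', 'S3'),
  observer_id = c('obs101', 'obs101,obs202', 'obs303,obs999'),
  stringsAsFactors = FALSE
)

# One row per (checklist, observer), keeping only the digits of each id
obs_list <- strsplit(chk$observer_id, ',')
long <- data.frame(
  sei = rep(chk$sei, lengths(obs_list)),
  numObserver = gsub('[^0-9]', '', unlist(obs_list)),
  stringsAsFactors = FALSE
)

# Attach scores (NA for unscored observers) and take the maximum per checklist
long <- merge(long, scores, by = 'numObserver', all.x = TRUE)
cci <- tapply(long$score, long$sei, max)
cci  # S1 = 0.2, S2 = 0.9, S3 = NA
```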
# read in obs score and extract numbers expertiseScore &lt;- read_csv(&quot;results/04_data-obsExpertise-score.csv&quot;) %&gt;% mutate(numObserver = str_extract(observer, &quot;\\\\d+&quot;)) %&gt;% dplyr::select(-observer) # group checklists (SEIs) have multiple observers; # such SEIs take the highest expertise score among their observers # as the associated covariate # get unique observers per sei dataSeiScore &lt;- distinct( dataSubsample, sampling_event_identifier, observer_id ) %&gt;% # make list column of observers mutate(observers = str_split(observer_id, &quot;,&quot;)) %&gt;% unnest(cols = c(observers)) %&gt;% # add numeric observer id mutate(numObserver = str_extract(observers, &quot;\\\\d+&quot;)) %&gt;% # now get distinct sei and observer id numeric distinct(sampling_event_identifier, numObserver) # now add expertise score to sei dataSeiScore &lt;- left_join(dataSeiScore, expertiseScore, by = &quot;numObserver&quot; ) %&gt;% # get max expertise score per sei group_by(sampling_event_identifier) %&gt;% summarise(expertise = max(score)) # add expertise to the checklist data dataSubsample &lt;- left_join(dataSubsample, dataSeiScore, by = &quot;sampling_event_identifier&quot; ) # remove data without expertise score dataSubsample &lt;- filter(dataSubsample, !is.na(expertise)) 8.5 Add climatic and landscape covariates Reload climate and land cover predictors prepared previously. # list landscape covariate stacks landscape_files &lt;- &quot;data/spatial/landscape_resamp01_km.tif&quot; # read in as stacks landscape_data &lt;- stack(landscape_files) # get proper names: the layers were stacked as elevation, slope, aspect, bio_4a, bio_15 and landcover elev_names &lt;- c(&quot;elev&quot;, &quot;slope&quot;, &quot;aspect&quot;) chelsa_names &lt;- c(&quot;bio_4a&quot;, &quot;bio_15&quot;) names(landscape_data) &lt;- as.character(glue(&#39;{c(elev_names, chelsa_names, &quot;landcover&quot;)}&#39;)) 8.6 Spatial buffers around selected checklists Every checklist on eBird is associated with a latitude and longitude.
However, the coordinates entered by an observer may not accurately depict the location at which a species was detected. This can occur for two reasons: first, traveling checklists are often associated with a single location along the route traveled by observers; and second, checklist locations could be assigned to a hotspot, a location marked by eBird as frequented by multiple observers. In many cases, an observation might be assigned to a hotspot even though the observation was not made at the precise location of the hotspot (Praveen J 2017). Johnston et al. (2021) showed that a large proportion of observations occurred within a 3km grid, even for checklists up to 5km in length. Hence, to account for this spatial imprecision, we considered a minimum radius of 2.5km around each unique locality when sampling environmental covariate values. # assign neighbourhood radius in m sample_radius &lt;- 2.5 * 1e3 # get distinct points and make buffer ebird_buff &lt;- dataSubsample %&gt;% ungroup() %&gt;% distinct(X, Y) %&gt;% # remove NAs drop_na() # convert to spatial features ebird_buff &lt;- st_as_sf(ebird_buff, coords = c(&quot;X&quot;, &quot;Y&quot;), crs = 32643) %&gt;% # add back X-Y (UTM) coordinates bind_cols(as_tibble(st_coordinates(.))) %&gt;% # make buffer around points st_buffer(dist = sample_radius) 8.7 Spatial buffer-wide covariates 8.7.1 Mean climatic covariates All climatic covariates were sampled as the mean value within the 2.5km radius described above, and prefixed am_. 
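To make the buffer-mean step concrete, here is a minimal base-R sketch on a made-up 1 km grid (not the study rasters). It assumes a cell counts as inside the buffer when its centre lies within the radius; velox and raster have their own rules for which cells a polygon covers, so this illustrates the idea rather than reproducing their output. The helper name buffer_mean is hypothetical.

```r
# Hypothetical helper: mean of raster cell values whose centres lie within
# radius map units (metres in UTM 43N) of the point (px, py)
buffer_mean <- function(cells, px, py, radius) {
  d <- sqrt((cells$x - px)^2 + (cells$y - py)^2)
  # NA cells (no data) are ignored, as with mean(x, na.rm = TRUE)
  mean(cells$value[d <= radius], na.rm = TRUE)
}

# toy 10 x 10 km grid of 1 km cells; x and y are cell-centre coordinates
cells <- expand.grid(x = seq(500, 9500, by = 1000),
                     y = seq(500, 9500, by = 1000))
cells$value <- cells$x / 1000  # a made-up covariate that increases eastwards

sample_radius <- 2.5 * 1e3
buffer_mean(cells, px = 5000, py = 5000, radius = sample_radius)
```

Here 16 cell centres fall within 2.5 km of (5000, 5000), arranged symmetrically east and west of the point, so the buffer mean is 5.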
# get area mean for all predictors except landcover, which is the last layer stk &lt;- raster::dropLayer(landscape_data, &quot;landcover&quot;) # removing landcover here velstk &lt;- velox(stk) # velox raster value extraction dextr &lt;- velstk$extract( sp = ebird_buff, df = TRUE, fun = function(x) mean(x, na.rm = TRUE) ) # assign names for joining names(dextr) &lt;- c(&quot;id&quot;, names(stk)) env_area_mean &lt;- as_tibble(dextr) # add id to buffer data ebird_buff &lt;- mutate(ebird_buff, id = seq(nrow(ebird_buff)) ) # join to buffer data ebird_buff &lt;- inner_join(ebird_buff, env_area_mean, by = &quot;id&quot;) 8.7.2 Proportions of land cover type All land cover covariates were sampled as the proportion of each land cover type within the 2.5km radius. # get the landcover layer from the stack lc &lt;- landscape_data[[&quot;landcover&quot;]] # accessing landcover here lc_velox &lt;- velox(lc) lc_vals &lt;- lc_velox$extract(sp = ebird_buff, df = TRUE) names(lc_vals) &lt;- c(&quot;id&quot;, &quot;lc&quot;) # get landcover proportions lc_prop &lt;- count(lc_vals, id, lc) %&gt;% group_by(id) %&gt;% mutate( lc = glue(&#39;lc_{str_pad(lc, 2, pad = &quot;0&quot;)}&#39;), prop = n / sum(n) ) %&gt;% dplyr::select(-n) %&gt;% tidyr::pivot_wider( names_from = lc, values_from = prop, values_fill = list(prop = 0) ) %&gt;% ungroup() # join to data by buffer id ebird_buff &lt;- left_join(ebird_buff, lc_prop, by = &quot;id&quot;) 8.7.3 Link environmental covariates to checklists # drop geometry ebird_buff &lt;- st_drop_geometry(ebird_buff) # link to dataSubsample dataSubsample &lt;- inner_join(dataSubsample, ebird_buff, by = c(&quot;X&quot;, &quot;Y&quot;) ) Save data to file. 
# write to file write_csv(dataSubsample, file = &quot;results/08_data-covars-2.5km.csv&quot;) "],["modelling-species-occupancy.html", "Section 9 Modelling Species Occupancy 9.1 Load dataframe and prepare covariates 9.2 Running a null model 9.3 Identifying covariates necessary to model the detection process 9.4 Land Cover and Climate 9.5 Goodness-of-fit tests", " Section 9 Modelling Species Occupancy 9.0.1 Load necessary libraries # Load libraries # for ebird data library(auk) library(ebirdst) # general data library(tidyverse) library(data.table) library(lubridate) library(openxlsx) library(raster) # probably unnecessary # for models library(unmarked) library(MuMIn) library(AICcmodavg) library(fields) # for computation library(doParallel) library(snow) library(ecodist) # Source necessary functions source(&quot;R/fun_screen_cor.R&quot;) source(&quot;R/fun_model_estimate_collection.r&quot;) 9.1 Load dataframe and prepare covariates Here, we load the required dataframe that contains 10 random visits to a site and environmental covariates sampled within a 2.5 km radius of each locality. We also rescaled all continuous covariates to range between 0 and 1. Next, we ensured that only Traveling and Stationary checklists were considered. Even though stationary counts have no distance traveled, we defaulted all stationary checklists to an effective distance of 100m, which we consider the average maximum detection radius for most bird species in our area. # Load in the prepared dataframe dat &lt;- fread(&quot;results/08_data-covars-2.5km.csv&quot;, header = T) dat &lt;- as_tibble(dat) head(dat) 9.1.1 Handle the sampling protocol Select protocols and set the distance of stationary checklists recorded as 0 km to 0.1 km. 
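A minimal base-R sketch of the protocol step, on hypothetical toy rows rather than the study data: only Traveling and Stationary checklists are kept, and stationary checklists recorded with a distance of 0 km receive the 0.1 km effective distance described above. Like the dplyr code below, it assumes stationary distances are stored as 0; stationary rows with a missing (NA) distance are left unchanged.

```r
# Hypothetical toy checklists
dat <- data.frame(
  protocol_type = c('Stationary', 'Traveling', 'Stationary', 'Incidental'),
  effort_distance_km = c(0, 1.2, 0, 0.5),
  stringsAsFactors = FALSE
)

# keep only Traveling and Stationary checklists
dat <- dat[dat$protocol_type %in% c('Traveling', 'Stationary'), ]

# stationary checklists with zero distance get an effective distance of 0.1 km;
# which() drops NA comparisons, so rows with a missing distance stay unchanged
is_stat0 <- which(dat$protocol_type == 'Stationary' & dat$effort_distance_km == 0)
dat$effort_distance_km[is_stat0] <- 0.1
dat$effort_distance_km
```

The Incidental row is dropped, and the two stationary rows now carry 0.1 km while the traveling row keeps 1.2 km.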
# Some more pre-processing to get the right data structures # Ensuring that only Traveling and Stationary checklists were considered names(dat) dat &lt;- dat %&gt;% filter(protocol_type %in% c(&quot;Traveling&quot;, &quot;Stationary&quot;)) # We take all stationary counts with zero distance and give them a distance of 100 m (so 0.1 km), # as that&#39;s approximately the max normal hearing distance for people doing point # counts. dat &lt;- dat %&gt;% mutate(effort_distance_km = if_else( effort_distance_km == 0 &amp; protocol_type == &quot;Stationary&quot;, 0.1, effort_distance_km )) 9.1.2 Handle time and date Convert the start time to minutes since midnight and the observation date to ordinal (Julian) day. # Converting time observations started to numeric and adding it as a new column # This new column will be min_obs_started dat &lt;- dat %&gt;% mutate( min_obs_started = as.integer( as.difftime( time_observations_started, format = &quot;%H:%M:%S&quot;, units = &quot;mins&quot; ) ) ) # Adding the julian date to the dataframe dat &lt;- dat %&gt;% mutate(julian_date = lubridate::yday(observation_date)) # recode julian date so the late-November to May season runs continuously, to model it as a linear predictor dat &lt;- dat %&gt;% mutate( newjulianDate = case_when( julian_date &gt;= 334 &amp; julian_date &lt;= 365 ~ (julian_date - 333), julian_date &gt;= 1 &amp; julian_date &lt;= 152 ~ (julian_date + 31) ) ) %&gt;% drop_na(newjulianDate) # recode start time as absolute minutes from noon (checklists starting between 05:00 and 19:00), to model it as a linear predictor dat &lt;- dat %&gt;% mutate( newmin_obs_started = case_when( min_obs_started &gt;= 300 &amp; min_obs_started &lt;= 720 ~ abs(min_obs_started - 720), min_obs_started &gt;= 720 &amp; min_obs_started &lt;= 1140 ~ abs(720 - min_obs_started) ) ) %&gt;% drop_na(newmin_obs_started) 9.1.3 Scaling covariates # Removing other unnecessary columns from the dataframe and creating a clean one without the rest names(dat) # select relevant columns BY NAME dat &lt;- dplyr::select( dat, c( &quot;duration_minutes&quot;, &quot;effort_distance_km&quot;, &quot;locality&quot;, 
&quot;locality_type&quot;, &quot;locality_id&quot;, &quot;observer_id&quot;, &quot;observation_date&quot;, &quot;scientific_name&quot;, &quot;observation_count&quot;, &quot;protocol_type&quot;, &quot;number_observers&quot;, &quot;pres_abs&quot; ), # rename X-Y but NOTE THEY ARE IN UTM COORDINATES longitude = &quot;X&quot;, latitude = &quot;Y&quot;, expertise, # elevation and climate layers elev, bio4a, bio15a, # all LANDCOVER COLUMNS matches(&quot;lc&quot;), # set new columns to old column names julian_date = &quot;newjulianDate&quot;, min_obs_started = &quot;newmin_obs_started&quot; ) # add year and convert presence-absence to integer dat.1 &lt;- dat %&gt;% mutate( year = year(observation_date), pres_abs = as.integer(pres_abs) ) # occupancy modeling requires an integer response # Scaling detection and occupancy covariates dat.scaled &lt;- dat.1 # Note: Never refer to columns by numbers, numbers change, names remain cols_to_scale &lt;- c( &quot;duration_minutes&quot;, &quot;effort_distance_km&quot;, &quot;number_observers&quot;, &quot;expertise&quot;, &quot;elev&quot;, &quot;bio4a&quot;, &quot;bio15a&quot;, &quot;lc_02&quot;, &quot;lc_09&quot;, &quot;lc_01&quot;, &quot;lc_05&quot;, &quot;lc_04&quot;, &quot;lc_07&quot;, &quot;lc_03&quot;, &quot;julian_date&quot;, &quot;min_obs_started&quot; ) # rescale the relevant columns to range between 0 and 1 dat.scaled &lt;- mutate( dat.scaled, across( .cols = all_of(cols_to_scale), # referring to the columns .fns = scales::rescale # the rescale function ) ) # save data to file fwrite(dat.scaled, &quot;results/09_scaled-covars-2.5km.csv&quot;) 9.1.4 Correct date format # Reload the scaled covariate data dat.scaled &lt;- fread(&quot;results/09_scaled-covars-2.5km.csv&quot;, header = T) dat.scaled &lt;- as_tibble(dat.scaled) head(dat.scaled) # Ensure observation_date column is in the right format dat.scaled$observation_date &lt;- format( as.Date( dat.scaled$observation_date, &quot;%m/%d/%Y&quot; ), &quot;%Y-%m-%d&quot; ) 9.1.5 Check 
for correlated covariates # Testing for correlations before running further analyses # Majority are uncorrelated since we decided to keep climatic and land cover predictors and removed elevation. source(&quot;R/fun_screen_cor.R&quot;) # SELECT COLUMNS to check BY NAME cols_to_check &lt;- c( &quot;expertise&quot;, &quot;bio4a&quot;, &quot;bio15a&quot;, &quot;lc_02&quot;, &quot;lc_09&quot;, &quot;lc_01&quot;, &quot;lc_05&quot;, &quot;lc_04&quot;, &quot;lc_07&quot;, &quot;lc_03&quot; ) # screen covariates for correlation screen.cor(dat.scaled[, cols_to_check], threshold = 0.3) # total number of presences by species # min no. presences = 224 to max = 7725 presSpecies &lt;- dat.scaled %&gt;% group_by(scientific_name) %&gt;% filter(pres_abs == 1) %&gt;% summarise(n = n()) # convert locality_id to factors dat.scaled$locality_id &lt;- as.factor(dat.scaled$locality_id) 9.2 Running a null model # All null models are stored in lists below all_null &lt;- list() # define species and a counter species &lt;- unique(dat.scaled$scientific_name) counter &lt;- 0 # Add a progress bar for the loop pb &lt;- txtProgressBar( min = 0, max = length(species), style = 3 ) # text based bar # loop over species for (i in species) { # filter data by species data &lt;- dat.scaled %&gt;% filter(scientific_name == i) # Preparing data for the unmarked model occ &lt;- filter_repeat_visits(data, min_obs = 1, max_obs = 10, annual_closure = FALSE, n_days = 1488, # 1488 days (about 4 years) is treated as one period of closure date_var = &quot;observation_date&quot;, site_vars = c(&quot;locality_id&quot;) ) obs_covs &lt;- c( &quot;min_obs_started&quot;, &quot;duration_minutes&quot;, &quot;effort_distance_km&quot;, &quot;number_observers&quot;, &quot;expertise&quot;, &quot;julian_date&quot; ) # format for unmarked occ_wide &lt;- format_unmarked_occu(occ, site_id = &quot;site&quot;, response = &quot;pres_abs&quot;, site_covs = c( &quot;locality_id&quot;, &quot;lc_01&quot;, &quot;lc_02&quot;, &quot;lc_05&quot;, 
&quot;lc_04&quot;, &quot;lc_09&quot;, &quot;lc_07&quot;, &quot;lc_03&quot;, &quot;bio4a&quot;, &quot;bio15a&quot; ), obs_covs = obs_covs ) # Convert this dataframe of observations into an unmarked object to start fitting occupancy models occ_um &lt;- formatWide(occ_wide, type = &quot;unmarkedFrameOccu&quot;) # Fit the null model (intercept-only detection and occupancy) # the list element is named by species all_null[[i]] &lt;- occu(~1 ~ 1, data = occ_um) # increase counter counter &lt;- counter + 1 setTxtProgressBar(pb, counter) } close(pb) # Store the printed model outputs for each species capture.output(all_null, file = &quot;results/09_null_models.csv&quot;) 9.3 Identifying covariates necessary to model the detection process Here, we use the unmarked package in R (Fiske and Chandler 2011) to identify detection level covariates that are important for each species. We rank candidate models by AICc and select top models from that ranking (Burnham et al. 2002; 2011). # All models are stored in lists below det_dred &lt;- list() # Subsetting those models whose deltaAIC&lt;4 (Burnham et al. 
2011) top_det &lt;- list() # Getting model averaged coefficients and relative importance scores det_avg &lt;- list() det_imp &lt;- list() # Getting model estimates det_modelEst &lt;- list() # Add a progress bar for the loop pb &lt;- txtProgressBar( min = 0, max = length(unique(dat.scaled$scientific_name)), style = 3 ) # text based bar for (i in 1:length(unique(dat.scaled$scientific_name))) { data &lt;- dat.scaled %&gt;% filter(dat.scaled$scientific_name == unique(dat.scaled$scientific_name)[i]) # Preparing data for the unmarked model occ &lt;- filter_repeat_visits(data, min_obs = 1, max_obs = 10, annual_closure = FALSE, n_days = 1488, # 1488 days (about 4 years) is treated as one period of closure date_var = &quot;observation_date&quot;, site_vars = c(&quot;locality_id&quot;) ) obs_covs &lt;- c( &quot;min_obs_started&quot;, &quot;duration_minutes&quot;, &quot;effort_distance_km&quot;, &quot;number_observers&quot;, &quot;expertise&quot;, &quot;julian_date&quot; ) # format for unmarked occ_wide &lt;- format_unmarked_occu(occ, site_id = &quot;site&quot;, response = &quot;pres_abs&quot;, site_covs = c( &quot;locality_id&quot;, &quot;lc_01&quot;, &quot;lc_02&quot;, &quot;lc_05&quot;, &quot;lc_04&quot;, &quot;lc_09&quot;, &quot;lc_07&quot;, &quot;lc_03&quot;, &quot;bio4a&quot;, &quot;bio15a&quot; ), obs_covs = obs_covs ) # Convert this dataframe of observations into an unmarked object to start fitting occupancy models occ_um &lt;- formatWide(occ_wide, type = &quot;unmarkedFrameOccu&quot;) # Fit a global model with all detection level covariates global_mod &lt;- occu(~ min_obs_started + julian_date + duration_minutes + effort_distance_km + number_observers + expertise ~ 1, data = occ_um) # Set up the cluster clusterType &lt;- if (length(find.package(&quot;snow&quot;, quiet = TRUE))) &quot;SOCK&quot; else &quot;PSOCK&quot; clust &lt;- try(makeCluster(getOption(&quot;cl.cores&quot;, 5), type = clusterType)) clusterEvalQ(clust, library(unmarked)) clusterExport(clust, &quot;occ_um&quot;) 
det_dred[[i]] &lt;- pdredge(global_mod, clust) names(det_dred)[i] &lt;- unique(dat.scaled$scientific_name)[i] # Get the top models, which we&#39;ll define as those with deltaAICc &lt; 4 top_det[[i]] &lt;- get.models(det_dred[[i]], subset = delta &lt; 4, cluster = clust) names(top_det)[i] &lt;- unique(dat.scaled$scientific_name)[i] # Obtaining model averaged coefficients if (length(top_det[[i]]) &gt; 1) { a &lt;- model.avg(top_det[[i]], fit = TRUE) det_avg[[i]] &lt;- as.data.frame(a$coefficients) names(det_avg)[i] &lt;- unique(dat.scaled$scientific_name)[i] det_modelEst[[i]] &lt;- data.frame( Coefficient = coefTable(a, full = T)[, 1], SE = coefTable(a, full = T)[, 2], lowerCI = confint(a)[, 1], upperCI = confint(a)[, 2], z_value = (summary(a)$coefmat.full)[, 3], Pr_z = (summary(a)$coefmat.full)[, 4] ) names(det_modelEst)[i] &lt;- unique(dat.scaled$scientific_name)[i] det_imp[[i]] &lt;- as.data.frame(MuMIn::importance(a)) names(det_imp)[i] &lt;- unique(dat.scaled$scientific_name)[i] } else { det_avg[[i]] &lt;- as.data.frame(unmarked::coef(top_det[[i]][[1]])) names(det_avg)[i] &lt;- unique(dat.scaled$scientific_name)[i] lowDet &lt;- data.frame(lowerCI = confint(top_det[[i]][[1]], type = &quot;det&quot;)[, 1]) upDet &lt;- data.frame(upperCI = confint(top_det[[i]][[1]], type = &quot;det&quot;)[, 2]) zDet &lt;- data.frame(summary(top_det[[i]][[1]])$det[, 3]) Pr_zDet &lt;- data.frame(summary(top_det[[i]][[1]])$det[, 4]) Coefficient &lt;- coefTable(top_det[[i]][[1]])[, 1] SE &lt;- coefTable(top_det[[i]][[1]])[, 2] det_modelEst[[i]] &lt;- data.frame( Coefficient = Coefficient[2:length(Coefficient)], SE = SE[2:length(SE)], lowerCI = lowDet, upperCI = upDet, z_value = zDet, Pr_z = Pr_zDet ) names(det_modelEst)[i] &lt;- unique(dat.scaled$scientific_name)[i] } setTxtProgressBar(pb, i) stopCluster(clust) } close(pb) ## Storing output from the above models in excel sheets # 1. 
Store all the model outputs for each species (det_dred, see above) write.xlsx(det_dred, file = &quot;results/09_det-dred.xlsx&quot;) # 2. Store all the model averaged outputs for each species and the relative importance score write.xlsx(det_avg, file = &quot;results/09_det-avg.xlsx&quot;, rowNames = T, colNames = T) write.xlsx(det_imp, file = &quot;results/09_det-imp.xlsx&quot;, rowNames = T, colNames = T) write.xlsx(det_modelEst, file = &quot;results/09_det-modelEst.xlsx&quot;, rowNames = T, colNames = T) # Note: if writing to file fails, first remove empty (NULL or zero-length) list elements, for example: a &lt;- purrr::map(det_imp, ~ purrr::compact(.)) %&gt;% purrr::keep(~ length(.) != 0) 9.4 Land Cover and Climate Occupancy models estimate the probability of occurrence of a given species while controlling for the probability of detection and allow us to model the factors affecting occurrence and detection independently. The flexible eBird observation process is the largest source of variation in the likelihood of detecting a particular species; hence, we included seven covariates that influence the probability of detection for each checklist: ordinal day of year, duration of observation, distance traveled, protocol type, time observations started, number of observers and the checklist calibration index (CCI). Using a multi-model information-theoretic approach, we tested how strongly our occurrence data fit our candidate set of environmental covariates (Burnham et al. 2002). We fitted single-species occupancy models to simultaneously estimate a probability of detection (\\(p\\)) and a probability of occupancy (\\(\\psi\\)) for each species. For each species, we fitted models, each with a unique combination of the climate and land cover occupancy covariates and all seven detection covariates. Across the models tested for each species, the model with highest support was determined using AICc scores. 
However, across the majority of the species, no single model had overwhelming support. Hence, for each species, we examined those models which had \\(\\Delta\\)AICc &lt; 4, as these top models were considered to explain a large proportion of the association between the species-specific probability of occupancy and environmental drivers (Burnham et al. 2011; Elsen et al. 2017). Using these restricted model sets for each species, we created a model-averaged coefficient estimate for each predictor and assessed its direction and significance. We considered a predictor to be significantly associated with occupancy if the 95% confidence interval around the model-averaged coefficient did not contain zero. # All models are stored in lists below lc_clim &lt;- list() # Subsetting those models whose deltaAIC&lt;4 (Burnham et al. 2011) top_lc_clim &lt;- list() # Getting model averaged coefficients and relative importance scores lc_clim_avg &lt;- list() lc_clim_imp &lt;- list() # Storing model estimates lc_clim_modelEst &lt;- list() # Add a progress bar for the loop pb &lt;- txtProgressBar(min = 0, max = length(unique(dat.scaled$scientific_name)), style = 3) # text based bar for (i in 1:length(unique(dat.scaled$scientific_name))) { data &lt;- dat.scaled %&gt;% filter(dat.scaled$scientific_name == unique(dat.scaled$scientific_name)[i]) # Preparing data for the unmarked model occ &lt;- filter_repeat_visits(data, min_obs = 1, max_obs = 10, annual_closure = FALSE, n_days = 1488, # 1488 days (about 4 years) is treated as one period of closure date_var = &quot;observation_date&quot;, site_vars = c(&quot;locality_id&quot;) ) obs_covs &lt;- c( &quot;min_obs_started&quot;, &quot;duration_minutes&quot;, &quot;effort_distance_km&quot;, &quot;number_observers&quot;, &quot;expertise&quot;, &quot;julian_date&quot; ) # format for unmarked occ_wide &lt;- format_unmarked_occu(occ, site_id = &quot;site&quot;, response = &quot;pres_abs&quot;, site_covs = c( &quot;locality_id&quot;, &quot;lc_01&quot;, 
&quot;lc_02&quot;, &quot;lc_05&quot;, &quot;lc_04&quot;, &quot;lc_09&quot;, &quot;lc_07&quot;, &quot;lc_03&quot;, &quot;bio4a&quot;, &quot;bio15a&quot; ), obs_covs = obs_covs ) # Convert this dataframe of observations into an unmarked object to start fitting occupancy models occ_um &lt;- formatWide(occ_wide, type = &quot;unmarkedFrameOccu&quot;) model_lc_clim &lt;- occu(~ min_obs_started + julian_date + duration_minutes + effort_distance_km + number_observers + expertise ~ lc_01 + lc_02 + lc_05 + lc_04 + lc_09 + lc_07 + lc_03 + bio4a + bio15a, data = occ_um) # Set up the cluster clusterType &lt;- if (length(find.package(&quot;snow&quot;, quiet = TRUE))) &quot;SOCK&quot; else &quot;PSOCK&quot; clust &lt;- try(makeCluster(getOption(&quot;cl.cores&quot;, 5), type = clusterType)) clusterEvalQ(clust, library(unmarked)) clusterExport(clust, &quot;occ_um&quot;) # Detection terms are kept in every candidate model det_terms &lt;- c( &quot;p(duration_minutes)&quot;, &quot;p(effort_distance_km)&quot;, &quot;p(expertise)&quot;, &quot;p(julian_date)&quot;, &quot;p(min_obs_started)&quot;, &quot;p(number_observers)&quot; ) lc_clim[[i]] &lt;- pdredge(model_lc_clim, clust, fixed = det_terms) names(lc_clim)[i] &lt;- unique(dat.scaled$scientific_name)[i] # Identifying top subset of models based on deltaAICc scores being less than 4 (Burnham et al., 2011) top_lc_clim[[i]] &lt;- get.models(lc_clim[[i]], subset = delta &lt; 4, cluster = clust) names(top_lc_clim)[i] &lt;- unique(dat.scaled$scientific_name)[i] # Obtaining model averaged coefficients for both candidate model subsets if (length(top_lc_clim[[i]]) &gt; 1) { a &lt;- model.avg(top_lc_clim[[i]], fit = TRUE) lc_clim_avg[[i]] &lt;- as.data.frame(a$coefficients) names(lc_clim_avg)[i] &lt;- unique(dat.scaled$scientific_name)[i] lc_clim_modelEst[[i]] &lt;- data.frame( Coefficient = coefTable(a, full = T)[, 1], SE = coefTable(a, full = T)[, 2], lowerCI = confint(a)[, 1], upperCI = confint(a)[, 2], z_value = (summary(a)$coefmat.full)[, 3], Pr_z = 
(summary(a)$coefmat.full)[, 4] ) names(lc_clim_modelEst)[i] &lt;- unique(dat.scaled$scientific_name)[i] lc_clim_imp[[i]] &lt;- as.data.frame(MuMIn::importance(a)) names(lc_clim_imp)[i] &lt;- unique(dat.scaled$scientific_name)[i] } else { lc_clim_avg[[i]] &lt;- as.data.frame(unmarked::coef(top_lc_clim[[i]][[1]])) names(lc_clim_avg)[i] &lt;- unique(dat.scaled$scientific_name)[i] lowSt &lt;- data.frame(lowerCI = confint(top_lc_clim[[i]][[1]], type = &quot;state&quot;)[, 1]) lowDet &lt;- data.frame(lowerCI = confint(top_lc_clim[[i]][[1]], type = &quot;det&quot;)[, 1]) upSt &lt;- data.frame(upperCI = confint(top_lc_clim[[i]][[1]], type = &quot;state&quot;)[, 2]) upDet &lt;- data.frame(upperCI = confint(top_lc_clim[[i]][[1]], type = &quot;det&quot;)[, 2]) zSt &lt;- data.frame(z_value = summary(top_lc_clim[[i]][[1]])$state[, 3]) zDet &lt;- data.frame(z_value = summary(top_lc_clim[[i]][[1]])$det[, 3]) Pr_zSt &lt;- data.frame(Pr_z = summary(top_lc_clim[[i]][[1]])$state[, 4]) Pr_zDet &lt;- data.frame(Pr_z = summary(top_lc_clim[[i]][[1]])$det[, 4]) lc_clim_modelEst[[i]] &lt;- data.frame( Coefficient = coefTable(top_lc_clim[[i]][[1]])[, 1], SE = coefTable(top_lc_clim[[i]][[1]])[, 2], lowerCI = rbind(lowSt, lowDet), upperCI = rbind(upSt, upDet), z_value = rbind(zSt, zDet), Pr_z = rbind(Pr_zSt, Pr_zDet) ) names(lc_clim_modelEst)[i] &lt;- unique(dat.scaled$scientific_name)[i] } gc() setTxtProgressBar(pb, i) stopCluster(clust) } close(pb) # 1. Store all the model outputs for each species (for both landcover and climate) write.xlsx(lc_clim, file = &quot;results/09_lc-clim.xlsx&quot;) # 2. Store all the model averaged outputs for each species and relative importance scores write.xlsx(lc_clim_avg, file = &quot;results/09_lc-clim-avg.xlsx&quot;, rowNames = T, colNames = T) write.xlsx(lc_clim_imp, file = &quot;results/09_lc-clim-imp.xlsx&quot;, rowNames = T, colNames = T) # 3. 
Store all model estimates write.xlsx(lc_clim_modelEst, file = &quot;results/09_lc-clim-modelEst.xlsx&quot;, rowNames = T, colNames = T) # Note: if writing to file fails, first remove empty (NULL or zero-length) list elements, for example: a &lt;- purrr::map(lc_clim_modelEst, ~ purrr::compact(.)) %&gt;% purrr::keep(~ length(.) != 0) 9.5 Goodness-of-fit tests Adequate model fit was assessed with a chi-square goodness-of-fit test based on 1,000 parametric bootstrap simulations of a global model that included all occupancy and detection covariates (MacKenzie and Bailey 2004). goodness_of_fit &lt;- data.frame() # Add a progress bar for the loop pb &lt;- txtProgressBar(min = 0, max = length(unique(dat.scaled$scientific_name)), style = 3) # text based bar for (i in 1:length(unique(dat.scaled$scientific_name))) { data &lt;- dat.scaled %&gt;% filter(dat.scaled$scientific_name == unique(dat.scaled$scientific_name)[i]) # Preparing data for the unmarked model occ &lt;- filter_repeat_visits(data, min_obs = 1, max_obs = 10, annual_closure = FALSE, n_days = 1488, # 1488 days (about 4 years) is treated as one period of closure date_var = &quot;observation_date&quot;, site_vars = c(&quot;locality_id&quot;) ) obs_covs &lt;- c( &quot;min_obs_started&quot;, &quot;duration_minutes&quot;, &quot;effort_distance_km&quot;, &quot;number_observers&quot;, &quot;protocol_type&quot;, &quot;expertise&quot;, &quot;julian_date&quot; ) # format for unmarked occ_wide &lt;- format_unmarked_occu(occ, site_id = &quot;site&quot;, response = &quot;pres_abs&quot;, site_covs = c( &quot;locality_id&quot;, &quot;lc_01&quot;, &quot;lc_02&quot;, &quot;lc_05&quot;, &quot;lc_04&quot;, &quot;lc_09&quot;, &quot;lc_07&quot;, &quot;lc_03&quot;, &quot;bio4a&quot;, &quot;bio15a&quot; ), obs_covs = obs_covs ) # Convert this dataframe of observations into an unmarked object to start fitting occupancy models occ_um &lt;- formatWide(occ_wide, type = &quot;unmarkedFrameOccu&quot;) model_lc_clim &lt;- occu(~ min_obs_started + julian_date + duration_minutes + effort_distance_km + 
number_observers + protocol_type + expertise ~ lc_01 + lc_02 + lc_05 + lc_04 + lc_09 + lc_07 + lc_03 + bio4a + bio15a, data = occ_um) # note: nsim can be reduced for testing, as 1000 simulations take a very long time even with parallelization occ_gof &lt;- mb.gof.test(model_lc_clim, nsim = 1000, parallel = T, ncores = 5, plot.hist = FALSE ) p.value &lt;- occ_gof$p.value c.hat &lt;- occ_gof$c.hat.est scientific_name &lt;- unique(data$scientific_name) a &lt;- data.frame(scientific_name, p.value, c.hat) goodness_of_fit &lt;- rbind(a, goodness_of_fit) setTxtProgressBar(pb, i) } close(pb) write.csv(goodness_of_fit, &quot;results/09_goodness-of-fit-2.5km.csv&quot;, row.names = F) "],["visualizing-occupancy-predictor-effects.html", "Section 10 Visualizing Occupancy Predictor Effects 10.1 Prepare libraries 10.2 Load species list 10.3 Show AIC weight importance 10.4 Prepare model coefficient data 10.5 Get predictor effects", " Section 10 Visualizing Occupancy Predictor Effects In this section, we visualize the magnitude and direction of the associations between species-specific probability of occupancy and environmental predictors. 
10.1 Prepare libraries # to load data library(readxl) # to handle data library(dplyr) library(readr) library(forcats) library(tidyr) library(purrr) library(stringr) # library(data.table) # to wrangle models source(&quot;R/fun_model_estimate_collection.r&quot;) source(&quot;R/fun_make_resp_data.r&quot;) # nice tables library(knitr) library(kableExtra) # plotting library(ggplot2) library(patchwork) source(&quot;R/fun_plot_interaction.r&quot;) 10.2 Load species list # list of species # Remove species based on the results of the chi-square goodness-of-fit test species &lt;- read_csv(&quot;data/species_list.csv&quot;) %&gt;% filter(!scientific_name %in% c( &quot;Treron affinis&quot;, &quot;Prinia hodgsonii&quot;, &quot;Pellorneum ruficeps&quot;, &quot;Hypothymis azurea&quot;, &quot;Dendrocitta leucogastra&quot;, &quot;Chalcophaps indica&quot;, &quot;Rubigula gularis&quot;, &quot;Muscicapa dauurica&quot;, &quot;Geokichla citrina&quot;, &quot;Chrysocolaptes guttacristatus&quot;, &quot;Terpsiphone paradisi&quot;, &quot;Orthotomus sutorius&quot;, &quot;Oriolus kundoo&quot;, &quot;Dicrurus aeneus&quot;, &quot;Cyornis tickelliae&quot;, &quot;Copsychus fulicatus&quot;, &quot;Oriolus xanthornus&quot;, &quot;Alcippe poioicephala&quot;, &quot;Ficedula nigrorufa&quot;, &quot;Dendrocitta vagabunda&quot;, &quot;Dicrurus paradiseus&quot;, &quot;Ocyceros griseus&quot;, &quot;Psilopogon viridis&quot;, &quot;Psittacula cyanocephala&quot;)) list_of_species &lt;- as.character(species$scientific_name) 10.3 Show AIC weight importance To assess the relative importance of climatic and landscape predictors, we calculated cumulative variable importance scores. For each species, these scores were calculated as the sum of model weights (AIC weights) across all models (including the top models) that contained a given predictor. We then calculated the mean and standard deviation of the cumulative variable importance score across species for each predictor. 
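The cumulative importance score can be sketched in a few lines of base R. This sketch assumes the usual Akaike weights (exp(-delta_i / 2) divided by the sum of exp(-delta_j / 2) over all models, with delta computed from AICc) and sums them over the models that contain each predictor; the two species, their model sets and AICc values are invented for illustration, and predictor_importance is a hypothetical helper. MuMIn::importance(), used in Section 9, returns such sums for whichever model set it is given.

```r
# Akaike weights from AICc values
akaike_weights <- function(aicc) {
  delta <- aicc - min(aicc)
  w <- exp(-delta / 2)
  w / sum(w)
}

# Hypothetical helper: for each predictor, sum the weights of the models
# that include it
predictor_importance <- function(models, predictors) {
  w <- akaike_weights(models$AICc)
  sapply(predictors, function(p) {
    has_p <- vapply(models$terms, function(t) p %in% t, logical(1))
    sum(w[has_p])
  })
}

preds <- c('bio4a', 'lc_02')
# made-up model sets: AICc of each model and the occupancy terms it contains
sp1 <- list(AICc = c(100, 102, 104),
            terms = list(c('bio4a', 'lc_02'), 'bio4a', character(0)))
sp2 <- list(AICc = c(50, 50),
            terms = list('lc_02', 'bio4a'))

# one row per species, one column per predictor
imp <- rbind(sp1 = predictor_importance(sp1, preds),
             sp2 = predictor_importance(sp2, preds))

# mean and standard deviation across species, per predictor
data.frame(predictor = colnames(imp),
           mean_AIC = colMeans(imp),
           sd_AIC = apply(imp, 2, sd))
```

For sp1, whose model weights are roughly 0.67, 0.24 and 0.09, bio4a appears in the first two models and scores about 0.91, while lc_02 scores about 0.67.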
10.3.1 Read in AIC weight data # which files to read file_names &lt;- c(&quot;results/09_lc-clim-imp.xlsx&quot;) # read in sheets by species model_imp &lt;- map(file_names, function(f) { md_list &lt;- map(list_of_species, function(sn) { # some sheets are not found tryCatch( { readxl::read_excel(f, sheet = sn) %&gt;% `colnames&lt;-`(c(&quot;predictor&quot;, &quot;AIC_weight&quot;)) %&gt;% filter(str_detect(predictor, &quot;psi&quot;)) %&gt;% mutate( predictor = stringr::str_extract(predictor, pattern = stringr::regex(&quot;\\\\((.*?)\\\\)&quot;) ), predictor = stringr::str_replace_all(predictor, &quot;[//(//)]&quot;, &quot;&quot;), predictor = stringr::str_remove(predictor, &quot;\\\\.y&quot;) ) }, error = function(e) { message(as.character(e)) } ) }) names(md_list) &lt;- list_of_species return(md_list) }) 10.3.2 Prepare cumulative AIC weight data # bind rows model_imp &lt;- map(model_imp, bind_rows) %&gt;% bind_rows() # convert to numeric model_imp$AIC_weight &lt;- as.numeric(model_imp$AIC_weight) # Let&#39;s get a summary of cumulative variable importance model_imp &lt;- group_by(model_imp, predictor) %&gt;% summarise( mean_AIC = mean(AIC_weight), sd_AIC = sd(AIC_weight), min_AIC = min(AIC_weight), max_AIC = max(AIC_weight), med_AIC = median(AIC_weight) ) # write to file write_csv(model_imp, &quot;results/10_cumulative-AIC-weights.csv&quot;) Read data back in. 
# read data and make factor model_imp &lt;- read_csv(&quot;results/10_cumulative-AIC-weights.csv&quot;) model_imp$predictor &lt;- as_factor(model_imp$predictor) # make nice names # NOTE: names assume predictors appear in alphabetical order (bio15a, bio4a, lc_01, ...), as written by summarise() above predictor_name &lt;- tibble( predictor = levels(model_imp$predictor), pred_name = c( &quot;Precipitation seasonality&quot;, &quot;Temperature seasonality&quot;, &quot;% Evergreen Forest&quot;, &quot;% Deciduous Forest&quot;, &quot;% Mixed/Degraded Forest&quot;, &quot;% Agriculture/Settlements&quot;, &quot;% Grassland&quot;, &quot;% Plantations&quot;, &quot;% Water Bodies&quot; ) ) # rename predictor model_imp &lt;- left_join(model_imp, predictor_name) Prepare figure for cumulative AIC weight. Figure code is hidden in versions rendered as HTML and PDF. 10.4 Prepare model coefficient data For each species, we examined those models which had \\(\\Delta\\)AICc &lt; 4, as these top models were considered to explain a large proportion of the association between the species-specific probability of occupancy and environmental drivers. Using these restricted model sets for each species, we created a model-averaged coefficient estimate for each predictor and assessed its direction and significance. We considered a predictor to be significantly associated with occupancy if the 95% confidence interval around the model-averaged coefficient did not contain zero. 
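A minimal sketch of the significance rule for a single predictor, under stated assumptions: full model averaging (the coefficient is taken as 0 in top models that omit the predictor), the Burnham and Anderson (2002) unconditional standard error, and a Wald-type 95% interval. The AICc values, estimates and standard errors are invented. MuMIn&#39;s model.avg() and confint() may use a different variance estimator and, by default, conditional rather than full averages, so this illustrates the rule rather than re-implementing the pipeline.

```r
# Made-up top-model set for one species and one predictor
aicc <- c(200.0, 201.5, 203.0)
beta <- c(0.80, 0.60, 0)   # estimate; 0 where the model omits the predictor
se   <- c(0.25, 0.30, 0)   # SE; 0 where the model omits the predictor

# Akaike weights, restricted to the top models (delta AICc < 4)
delta <- aicc - min(aicc)
top <- delta < 4
w <- exp(-delta[top] / 2)
w <- w / sum(w)

# full model-averaged coefficient
beta_avg <- sum(w * beta[top])
# unconditional SE (Burnham and Anderson 2002)
se_avg <- sum(w * sqrt(se[top]^2 + (beta[top] - beta_avg)^2))

# Wald-type 95% confidence interval
ci <- beta_avg + c(-1, 1) * qnorm(0.975) * se_avg
# significant if the interval does not contain zero
significant <- ci[1] > 0 || ci[2] < 0
round(c(beta_avg = beta_avg, lower = ci[1], upper = ci[2]), 3)
significant
```

With these numbers the averaged coefficient is about 0.64, but the lower bound falls just below zero (about -0.03), so the predictor would not be counted as significant: the third model, which omits it, pulls the average towards zero and widens the unconditional SE.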
file_read &lt;- c(&quot;results/09_lc-clim-modelEst.xlsx&quot;) # read data as list column model_est &lt;- map(list_of_species, function(sn) { tryCatch( { readxl::read_excel(file_read, sheet = sn) %&gt;% rename(predictor = &quot;...1&quot;) %&gt;% filter(str_detect(predictor, &quot;psi&quot;)) %&gt;% mutate( predictor = stringr::str_extract(predictor, pattern = stringr::regex(&quot;\\\\((.*?)\\\\)&quot;) ), predictor = stringr::str_replace_all(predictor, &quot;[//(//)]&quot;, &quot;&quot;), predictor = stringr::str_remove(predictor, &quot;\\\\.y&quot;) ) }, error = function(e) { message(as.character(e)) } ) }) # assign names names(model_est) &lt;- list_of_species # prepare model data model_data &lt;- tibble( scientific_name = list_of_species ) # remove null data model_est &lt;- keep(model_est, .p = function(x) !is.null(x)) # rename model data components and separate predictors names &lt;- c( &quot;predictor&quot;, &quot;coefficient&quot;, &quot;se&quot;, &quot;ci_lower&quot;, &quot;ci_higher&quot;, &quot;z_value&quot;, &quot;p_value&quot; ) # get data for plotting: model_est &lt;- map(model_est, function(df) { colnames(df) &lt;- names # df &lt;- separate_interaction_terms(df) # df &lt;- make_response_data(df) return(df) }) # add names and scales model_est &lt;- imap(model_est, function(.x, .y) { mutate(.x, scientific_name = .y) }) # remove modulators model_est &lt;- bind_rows(model_est) %&gt;% dplyr::select(-matches(&quot;modulator&quot;)) # join data to species name model_data &lt;- model_data %&gt;% left_join(model_est) # Keep only those predictors whose p-values are significant: model_data &lt;- model_data %&gt;% filter(p_value &lt; 0.05) %&gt;% filter(predictor != &quot;Int&quot;) Export predictor effects. # get predictor effect data data_predictor_effect &lt;- distinct( model_data, scientific_name, se, predictor, coefficient ) # write to file write_csv(data_predictor_effect, &quot;results/10_data-predictor-effect.csv&quot;) Export model data. 
model_data_to_file &lt;- model_data %&gt;% dplyr::select( predictor, scientific_name ) # remove .y model_data_to_file &lt;- model_data_to_file %&gt;% mutate(predictor = str_remove(predictor, &quot;\\\\.y&quot;)) write_csv( model_data_to_file, &quot;results/10_data-occupancy-predictors.csv&quot; ) Read in data after clearing R session. # first merge species trait data with significant predictor species_trait &lt;- read.csv(&quot;data/species-trait-dat.csv&quot;) sig_predictor &lt;- read.csv(&quot;results/10_data-predictor-effect.csv&quot;) merged_species_traits &lt;- inner_join(sig_predictor, species_trait, by = c(&quot;scientific_name&quot; = &quot;scientific_name&quot;) ) write_csv( merged_species_traits, &quot;results/10_results-predictors-species-traits.csv&quot; ) # read from file model_data &lt;- read_csv(&quot;results/10_results-predictors-species-traits.csv&quot;) Fix predictor name. # remove .y from predictors model_data &lt;- model_data %&gt;% mutate_at(.vars = c(&quot;predictor&quot;), .funs = function(x) { stringr::str_remove(x, &quot;.y&quot;) }) 10.5 Get predictor effects # is the coeff positive? how many positive per scale per predictor per axis of split? # now splitting by habitat --- forest or open country data_predictor &lt;- mutate(model_data, direction = coefficient &gt; 0 ) %&gt;% filter(predictor != &quot;Int&quot;, predictor != &quot;Ibio4^2&quot; &amp; # If you had squared terms predictor != &quot;Ibio15^2&quot;) %&gt;% rename(habitat = &quot;Habitat.type&quot;) %&gt;% count( predictor, habitat, direction ) %&gt;% mutate(mag = n * (if_else(direction, 1, -1))) # wrangle data to get nice bars data_predictor &lt;- data_predictor %&gt;% dplyr::select(-n) %&gt;% drop_na(direction) %&gt;% mutate(direction = ifelse(direction, &quot;positive&quot;, &quot;negative&quot;)) %&gt;% pivot_wider(values_from = &quot;mag&quot;, names_from = &quot;direction&quot;) %&gt;% mutate_at( vars(positive, negative), ~ if_else(is.na(.), 0, .) 
) data_predictor_long &lt;- data_predictor %&gt;% pivot_longer( cols = c(&quot;negative&quot;, &quot;positive&quot;), names_to = &quot;effect&quot;, values_to = &quot;magnitude&quot; ) # write write_csv( data_predictor_long, &quot;results/10_data-predictor-direction-nSpecies.csv&quot; ) Prepare data to determine the direction (positive or negative) of the effect of each predictor. How many species are affected in either direction? # join with predictor names and relative AIC data_predictor_long &lt;- left_join(data_predictor_long, model_imp) Prepare figure of the number of species affected in each direction. Figure code is hidden in versions rendered as HTML and PDF. Environmental predictors and species-specific associations The direction of association between species-specific probability of occupancy and climatic and landscape predictors is shown here (as a function of habitat preference). Blue colors show the number of species that are positively associated with a climatic/landscape predictor while red colors show the number of species that are negatively associated with a climatic/landscape predictor (see Table 1 for the number of forest/generalist species that show positive/negative association with each of the predictors). "],["predicting-species-specific-occupancy-as-a-function-of-significant-predictors.html", "Section 11 Predicting species-specific occupancy as a function of significant predictors 11.1 Prepare libraries 11.2 Read data 11.3 Prepare predictor data 11.4 Get predictor responses 11.5 Get probability of occupancy 11.6 Add scaling for predictors 11.7 Mapping species occupancy", " Section 11 Predicting species-specific occupancy as a function of significant predictors This script plots species-specific probabilities of occupancy as a function of significant environmental predictors and maps occupancy across the study area for a given list of species and significant predictors. 
11.1 Prepare libraries # to handle data library(dplyr) library(readr) library(tidyr) library(purrr) library(stringr) library(glue) # library(data.table) # plotting library(ggplot2) library(patchwork) 11.2 Read data # read coefficient effect data data &lt;- read_csv(&quot;results/10_data-predictor-effect.csv&quot;) # check for a predictor column assertthat::assert_that( all(c(&quot;predictor&quot;, &quot;coefficient&quot;, &quot;se&quot;) %in% colnames(data)), msg = &quot;make_response_data: data must have columns called &#39;predictor&#39;, &#39;coefficient&#39;, and &#39;se&#39;&quot; ) 11.3 Prepare predictor data # prepare predictor patterns: bio followed by any digits, and lc_01 to lc_09 predictors &lt;- c(&quot;bio\\\\d+&quot;, glue(&quot;lc_0{seq(9)}&quot;)) # prepare predictor search strings and scaling power preds &lt;- glue(&quot;({predictors})&quot;) preds &lt;- str_flatten(preds, collapse = &quot;|&quot;) # some way of identifying square terms power &lt;- (str_extract(data$predictor, &quot;Ibio&quot;)) power[!is.na(power)] = 2 power[is.na(power)] &lt;- 1 power = as.numeric(power) # assign predictor name and power data &lt;- mutate( data, predictor = str_extract(predictor, preds), power = power ) 11.4 Get predictor responses # make predictor sequences data &lt;- mutate( data, pred_val = map( predictor, function(x) { seq(0, 1, 0.05) } ), #handle squared terms pred_val_pow = purrr::map2( pred_val, power, function(x, y) { x^y } )) # get coefficient and error times terms data_resp &lt;- mutate( data, response = map2( pred_val_pow, coefficient, function(x, y) { x * y } ), resp_var = map2( pred_val_pow, se, function(x, y) { x * y } ) ) 11.5 Get probability of occupancy # unnest and get responses data_resp &lt;- unnest( data_resp, cols = c(&quot;response&quot;, &quot;resp_var&quot;, &quot;pred_val&quot;) ) # get responses for quadratic terms data_resp &lt;- group_by( data_resp, scientific_name, predictor, pred_val ) %&gt;% dplyr::select(-power, -coefficient, -se) %&gt;% summarise( 
across( .cols = c(&quot;response&quot;, &quot;resp_var&quot;), .fns = sum ), .groups = &quot;keep&quot; ) # get probability of occupancy data_resp &lt;- ungroup( data_resp ) %&gt;% mutate( p_occu = 1 / (1 + exp(-response)), p_occu_low = 1 / (1 + exp(-(response - resp_var))), p_occu_high = 1 / (1 + exp(-(response + resp_var))) ) 11.6 Add scaling for predictors # scale predictors scale15 &lt;- c(30, 50) # range of precipitation scale4 &lt;- c(0, 1) # range of temperatures # scale bio4a and bio15a by actual values data_resp &lt;- mutate( data_resp, pred_val = case_when( predictor == &quot;bio4&quot; ~ scales::rescale(pred_val, to = scale4), predictor == &quot;bio15&quot; ~ scales::rescale(pred_val, to = scale15), T ~ pred_val ) ) # make long data_poccu &lt;- dplyr::select( data_resp, -response, -resp_var ) # select species soi &lt;- c(&quot;Irena puella&quot;,&quot;Leptocoma minima&quot;, &quot;Merops leschenaulti&quot;,&quot;Myophonus horsfieldii&quot;) which_predictors &lt;- c(&quot;bio4&quot;) 11.6.1 Figure: Occupancy ~ predictors data_fig &lt;- data_poccu %&gt;% filter( scientific_name %in% soi, predictor %in% which_predictors ) %&gt;% mutate( cat = case_when( scientific_name %in% c(&quot;Irena puella&quot;,&quot;Leptocoma minima&quot;) ~ &quot;forest&quot;, T ~ &quot;general&quot; ) ) # split data data_fig &lt;- nest( data_fig, -cat ) # make plots make_occu_fig &lt;- function(df, this_fill) { ggplot( df ) + geom_ribbon( aes( pred_val, ymin = p_occu_low, ymax = p_occu_high ), fill = this_fill, alpha = 0.5 ) + geom_line( aes( pred_val, p_occu ), size = 1 ) + facet_grid( ~scientific_name ) + theme_test( base_family = &quot;Arial&quot; ) + theme( strip.text = element_text( face = &quot;italic&quot; ) ) + labs( x = &quot;Temperature seasonality&quot;, y = &quot;Probability of occupancy&quot; ) } fig_occu &lt;- map2(data_fig$data, &quot;grey&quot;, make_occu_fig) fig_occu &lt;- wrap_plots( fig_occu[c(1, 2)], ncol = 1, nrow = 2 ) &amp; theme( plot.tag = element_text( 
face = &quot;bold&quot; ) ) # save figure ggsave( fig_occu, filename = &quot;figs/fig_05.png&quot;, width = 5, height = 5.5 ) Probability of occupancy as a function of temperature seasonality. Predicted probability of occupancy curves as a function of temperature seasonality for four forest species are shown here. Temperature seasonality is negatively associated with the probability of occupancy of several forest species, including the Asian fairy-bluebird Irena puella, the crimson-backed sunbird Leptocoma minima, the chestnut-headed bee-eater Merops leschenaulti and the Malabar whistling-thrush Myophonus horsfieldii. 11.6.2 Figures: Occupancy ~ predictors for all species data_fig &lt;- nest( data_poccu, -scientific_name, -predictor ) pred_names &lt;- c( &quot;bio4&quot; = &quot;Temp. seasonality&quot;, &quot;bio15&quot; = &quot;Precip. seasonality&quot;, &quot;lc_01&quot; = &quot;Evergreen&quot;, &quot;lc_02&quot; = &quot;Deciduous&quot;, &quot;lc_03&quot; = &quot;Mixed/degraded&quot;, &quot;lc_04&quot; = &quot;Agri./Settl.&quot;, &quot;lc_05&quot; = &quot;Grassland&quot;, &quot;lc_07&quot; = &quot;Plantation&quot;, &quot;lc_09&quot; = &quot;Water&quot; ) pred_names &lt;- tibble( name = pred_names, predictor = names(pred_names) ) data_fig &lt;- left_join( data_fig, pred_names ) data_fig &lt;- mutate( data_fig, plots = map( data, function(df) { ggplot(df) + geom_ribbon( aes( pred_val, ymin = p_occu_low, ymax = p_occu_high ), fill = &quot;grey&quot;, alpha = 0.5 ) + geom_line( aes( pred_val, p_occu ) ) + coord_cartesian( ylim = c(0, 1) ) + theme_test( base_family = &quot;Arial&quot; ) + labs( x = &quot;Predictor&quot;, y = &quot;p(Occupancy)&quot; ) } ) ) # add names data_fig &lt;- mutate( data_fig, plots = map2( plots, name, function(p, name) { p &lt;- p + labs( x = name ) } ) ) # summarise as patchwork data_fig &lt;- group_by( data_fig, scientific_name ) %&gt;% summarise( plots = list( wrap_plots( plots, ncol = 5 ) ) ) # add title as sp data_fig &lt;- mutate( 
data_fig, plots = map2( plots, scientific_name, function(p, name) { p &lt;- p &amp; plot_annotation( title = name ) } ) ) # save images cairo_pdf( filename = &quot;figs/fig_occupancy_predictors.pdf&quot;, onefile = TRUE, width = 10, height = 2 ) data_fig$plots dev.off() 11.7 Mapping species occupancy 11.7.1 Read in raster layers library(terra) library(sf) # read saved rasters lscape = rast(&quot;data/spatial/landscape_resamp01_km.tif&quot;) # isolate temperature and rainfall bio4 = lscape[[4]] bio15 = lscape[[5]] # rain # careful while loading this raster, large size landcover &lt;- rast(&quot;data/landUseClassification/landcover_roy_2015_reclassified.tif&quot;) lc_1km &lt;- rast(&quot;data/landUseClassification/lc_01000m.tif&quot;) 11.7.2 Split landcover into proportions per 1km # separate the fine-scale landcover raster into presence-absence of each class lc_split &lt;- segregate(landcover) # resample to 1km # bilinear resampling uses the mean function. # mean of N 0s and 1s is the proportion of 1s, ie, proportion of each landcover lc_split &lt;- terra::resample( lc_split, lc_1km, method = &quot;bilinear&quot; ) # rename rasters names(lc_split) &lt;- pred_names$name[-c(1, 2)] # save raster of landcover proportion terra::writeRaster( lc_split, filename = &quot;data/spatial/raster_landcover_proportion_1km.tif&quot;, overwrite=TRUE ) rm(landcover) gc() # plot proportion of landcover classes png(width = 1200 * 2, height = 1200 * 2, filename = &quot;figs/fig_landcover_proportion_1km.png&quot;, res = 300) plot( lc_split, col = colorspace::sequential_hcl(20, palette = &quot;Viridis&quot;), range = c(0, 1) ) dev.off() 11.7.3 Prepare climatic layers # load landcover split lc_split = terra::rast(&quot;data/spatial/raster_landcover_proportion_1km.tif&quot;) 11.7.4 Mask by study area # mask by hills # run only if required (makes more sense to map to a larger area) hills = st_read(&quot;data/spatial/hillsShapefile/Nil_Ana_Pal.shp&quot;) %&gt;% st_transform(32643) #bio_1 = 
terra::mask( # bio_1, # vect(hills) #) #bio_12 = terra::mask( # bio_12, # vect(hills) #) # get ranges range4 &lt;- terra::minmax(bio4)[, 1] range15 &lt;- terra::minmax(bio15)[, 1] # rescale bio4 &lt;- (bio4 - min(range4)) / (diff(range4)) bio15 &lt;- (bio15 - min(range15)) / (diff(range15)) # project to UTM climate &lt;- c(bio4, bio15) names(climate) = c( &quot;Temp. seasonality&quot;, &quot;Precip. seasonality&quot; ) climate &lt;- terra::project( x = climate, y = lc_1km ) # make squared terms climate2 &lt;- climate * climate # names names(climate2) = glue(&quot;{names(climate)} 2&quot;) # add to landcover proportions and plot landscape &lt;- c(climate, lc_split) 11.7.5 Plot full bounds of landscape variables # plot proportion of landcover classes png( width = 1200 * 2, height = 1200 * 2, filename = &quot;figs/fig_landscape_1km.png&quot;, res = 300 ) plot( landscape, col = colorspace::sequential_hcl(20, palette = &quot;agSunset&quot;, rev = T), range = c(0, 1) ) dev.off() # add squared terms landscape &lt;- c( climate, climate2, lc_split ) #landscape = terra::mask( # landscape, # vect(hills) #) 11.7.6 Prepare soi predictors Prepare the soi predictor coefficients as a vector of the same length as the number of raster layers. These will be multiplied with each layer to give the effect of each layer. 
# get soi coefs sp_coefs &lt;- filter( data ) %&gt;% dplyr::select( -pred_val, -pred_val_pow ) # add missing landcover classes sp_preds &lt;- crossing( scientific_name = soi, predictor = pred_names$predictor, power = c(1, 2) ) # remove squared terms for landcover sp_preds &lt;- filter( sp_preds, !(str_detect(predictor, &quot;lc&quot;) &amp; power == 2) ) # correct square LC terms sp_coefs = mutate( sp_coefs, power = if_else( str_detect(predictor, &quot;lc&quot;), 1, power ) ) sp_coefs &lt;- full_join( sp_coefs, sp_preds ) # make wide --- this should give no warnings sp_coefs &lt;- pivot_wider( sp_coefs, id_cols = c(&quot;scientific_name&quot;), names_from = c(&quot;predictor&quot;, &quot;power&quot;), values_from = &quot;coefficient&quot; ) # get into order sp_coefs &lt;- dplyr::select( sp_coefs, scientific_name, c( &quot;bio4_1&quot;, &quot;bio15_1&quot;, &quot;bio4_2&quot;, &quot;bio15_2&quot; ), matches(&quot;lc&quot;) ) # get vectors of coefficients sp_coefs &lt;- nest( sp_coefs, -scientific_name ) 11.7.7 Prepare species occupancy for SOI Here, we shall simply multiply each landscape layer with the corresponding predictor coefficient. Where these are not available, we shall simply multiply the corresponding layer with NA. The resulting layers will be summed together to get a single response layer, which will then be inverse logit transformed to get the probability of occupancy. 
# multiply coefficients with layers soi_occu &lt;- map( sp_coefs[sp_coefs$scientific_name %in% soi, ]$data, .f = function(df) { response &lt;- unlist(slice(df, 1), use.names = F) * landscape response &lt;- sum(response, na.rm = TRUE) # remove NA layers, i.e., non-sig preds # now transform for probability occupancy response &lt;- 1 / (1 + exp(-response)) } ) # assign names names(soi_occu) &lt;- soi # make single stack soi_occu &lt;- reduce(soi_occu, c) names(soi_occu) &lt;- c(&quot;Irena puella&quot;,&quot;Leptocoma minima&quot;,&quot;Merops leschenaulti&quot;,&quot;Myophonus horsfieldii&quot;) # use stars for plotting with ggplot library(stars) library(colorspace) soi_occu &lt;- st_as_stars(soi_occu) fig_occu_map &lt;- ggplot() + geom_stars( data = soi_occu ) + scale_fill_binned_sequential( palette = &quot;Purple-Yellow&quot;, name = &quot;Probability of Occupancy&quot;, rev = T, limits = c(0, 1), breaks = seq(0, 1, 0.1), na.value = &quot;grey99&quot;, show.limits = T ) + facet_wrap( ~band, labeller = labeller( band = function(x) str_replace(x, &quot;\\\\.&quot;, &quot; &quot;) ) ) + coord_sf( crs = 32643, expand = FALSE ) + theme_test() + theme( # legend.position = &quot;rg&quot;, axis.title = element_blank(), axis.text = element_blank(), legend.key.height = unit(10, &quot;mm&quot;), legend.key.width = unit(1, &quot;mm&quot;), strip.text = element_text( face = &quot;italic&quot; ), legend.title = element_text( vjust = 1 ) ) # save figure ggsave( fig_occu_map, filename = &quot;figs/fig_06.png&quot;, width = 6, height = 6 ) Predicted area of occurrence Predicted areas of occurrence for four forest species are shown here. The probability of occupancy of the Asian fairy-bluebird Irena puella, the crimson-backed sunbird Leptocoma minima and the chestnut-headed bee-eater Merops leschenaulti is higher on the western slopes and at mid-elevations across our study area. 
The Malabar whistling-thrush Myophonus horsfieldii has a higher probability of occupancy across mid-elevations throughout the study area examined. 11.7.8 Prepare species occupancy for all species # multiply coefficients with layers sp_occu &lt;- map( sp_coefs$data, .f = function(df) { response &lt;- unlist(slice(df, 1), use.names = F) * landscape response &lt;- sum(response, na.rm = TRUE) # remove NA layers, i.e., non-sig preds # now transform for probability occupancy response &lt;- 1 / (1 + exp(-response)) } ) # make single stack sp_occu &lt;- reduce(sp_occu, c) # assign names names(sp_occu) &lt;- sp_coefs$scientific_name # use stars for plotting with ggplot library(stars) library(colorspace) sp_occu &lt;- st_as_stars(sp_occu) fig_occu_map_all &lt;- ggplot() + geom_stars( data = sp_occu ) + scale_fill_binned_sequential( palette = &quot;Purple-Yellow&quot;, name = &quot;p(Occu.)&quot;, rev = T, limits = c(0, 1), na.value = &quot;grey99&quot;, breaks = seq(0, 1, 0.1), show.limits = T ) + facet_wrap( ~band, labeller = labeller( band = function(x) str_replace(x, &quot;\\\\.&quot;, &quot; &quot;) ) ) + coord_sf( crs = 32643, expand = FALSE ) + theme_test( base_size = 8 ) + theme( # legend.position = &quot;rg&quot;, axis.title = element_blank(), axis.text = element_blank(), legend.key.height = unit(10, &quot;mm&quot;), legend.key.width = unit(1, &quot;mm&quot;), strip.text = element_text( face = &quot;italic&quot; ), strip.background = element_blank(), legend.title = element_text( vjust = 1 ) ) # save figure ggsave( fig_occu_map_all, filename = &quot;figs/fig_occupancy_maps.png&quot;, width = 16, height = 16 ) "],["references.html", "Section 12 References", " Section 12 References Aiello-Lammens, M. E. et al. 2015. spThin: an R package for spatial thinning of species occurrence records for use in ecological niche models. - Ecography 38: 541–545. Ali, S. and Ripley, S. D. 1983. Handbook of the birds of India and Pakistan. - Oxford University Press. Anand, M. O. et al. 
2010. Sustaining biodiversity conservation in human-modified landscapes in the Western Ghats: Remnant forests matter. - Biological Conservation 143: 2363–2374. Arasumani, M. et al. 2018. Not seeing the grass for the trees: plantations and agriculture shrink tropical montane grassland by two-thirds over four decades in the Palani Hills, a Western Ghats Sky Island. - PLoS ONE 13: 1–18. Bartoń, K. 2009. MuMIn: multi-model inference. Barve, S. et al. 2021. Elevation and body size drive convergent variation in thermoinsulative feather structure of Himalayan birds. - Ecography 44: 680–689. Boyle, W. A. et al. 2020. Hygric Niches for Tropical Endotherms. - Trends in Ecology and Evolution xx: 1–15. Burnham, K. P. and Anderson, D. R. 2002. Model selection and multimodel inference: a practical information-theoretic approach. - Springer. Burnham, K. P. et al. 2011. AIC model selection and multimodel inference in behavioral ecology: Some background, observations, and comparisons. - Behavioral Ecology and Sociobiology 65: 23–35. Butt, N. et al. 2015. Cascading effects of climate extremes on vertebrate fauna through changes to low-latitude tree flowering and fruiting phenology. - Global Change Biology 21: 3267–3277. Chan, W.-P. et al. 2016. Seasonal and daily climate variation have opposite effects on species elevational range size. - Science 351: 1437–1439. Das, A. et al. 2006. Prioritisation of conservation areas in the Western Ghats, India. - Biological Conservation 133: 16–31. Davies, R. G. et al. 2007. Topography, energy and the global distribution of bird species richness. - Proceedings of the Royal Society B: Biological Sciences 274: 1189–1197. Deutsch, C. A. et al. 2008. Impacts of climate warming on terrestrial ectotherms across latitude. - Proceedings of the National Academy of Sciences of the United States of America 105: 6668–6672. Devictor, V. et al. 2010. 
Beyond scarcity: citizen science programmes as useful tools for conservation biogeography: Citizen science and conservation biogeography. - Diversity and Distributions 16: 354–362. Ellwood, E. R. et al. 2017. Citizen science and conservation: Recommendations for a rapidly moving field. - Biological Conservation 208: 1–4. Elsen, P. R. et al. 2017. The role of competition, ecotones, and temperature in the elevational distribution of Himalayan birds. - Ecology 98: 337–348. Elsen, P. R. et al. 2018. Conserving Himalayan birds in highly seasonal forested and agricultural landscapes. - Conservation Biology 32: 1313–1324. Elsen, P. R. et al. 2020. Topography and human pressure in mountain ranges alter expected species responses to climate change. - Nature Communications 11: 1–10. Fink, D. et al. 2014. Crowdsourcing Meets Ecology: Distribution Models. - Association for the Advancement of Artificial Intelligence: 19–30. Fiske, I. J. and Chandler, R. B. 2011. Unmarked: An R package for fitting hierarchical models of wildlife occurrence and abundance. - Journal of Statistical Software 43: 1–23. Freeman, B. G. et al. 2018. Climate change causes upslope shifts and mountaintop extirpations in a tropical bird community. - Proceedings of the National Academy of Sciences 115: 11982–11987. Frishkoff, L. O. et al. 2016. Climate change and habitat conversion favour the same species. - Ecology Letters 19: 1081–1090. Gadgil, M. and Meher-Homji, V. 1986. Localities of great significance to conservation of India’s biological diversity. - Proceedings of the Indian Academy of Sciences: 165–180. Guo, F. et al. 2018. Land-use change interacts with climate to determine elevational species redistribution. - Nature Communications 9: 1315. Jankowski, J. E. et al. 2013. Exploring the role of physiology and biotic interactions in determining elevational ranges of tropical animals. - Ecography 36: 1–12. Janzen, D. H. 1967. Why Mountain Passes are Higher in the Tropics. 
- The American Naturalist 101: 233–249. Johnston, A. et al. 2015. Abundance models improve spatial and temporal prioritization of conservation resources. - Ecological Applications 25: 1749–1756. Johnston, A. et al. 2018. Estimates of observer expertise improve species distributions from citizen science data. - Methods in Ecology and Evolution 9: 88–97. Johnston, A. et al. 2021. Analytical guidelines to increase the value of community science data: An example using eBird data to estimate species distributions (Y Fourcade, Ed.). - Diversity and Distributions 27: 1265–1277. Karanth, K. K. et al. 2016. Producing Diversity: Agroforests Sustain Avian Richness and Abundance in India’s Western Ghats. - Frontiers in Ecology and Evolution 4: 1–10. Karger, D. N. et al. 2017. Climatologies at high resolution for the earth’s land surface areas. - Scientific Data 4: 1–20. Kelling, S. et al. 2015. Can observation skills of citizen scientists be estimated using species accumulation curves? - PLoS ONE 10: 1–20. Kelling, S. et al. 2019. Using Semistructured Surveys to Improve Citizen Science Data for Monitoring Biodiversity. - BioScience 69: 170–179. Kennedy, C. M. et al. 2011. Landscape matrix mediates occupancy dynamics of Neotropical avian insectivores. - Ecological Applications 21: 1837–1850. La Sorte, F. A. and Jetz, W. 2010. Projected range contractions of montane biodiversity under global warming. - Proceedings of the Royal Society B: Biological Sciences 277: 3401–3410. Loiselle, B. A. and Blake, J. G. 1991. Temporal Variation in Birds and Fruits Along an Elevational Gradient in Costa Rica. - Ecology 72: 180–193. MacKenzie, D. I. and Bailey, L. L. 2004. Assessing the fit of site-occupancy models. - Journal of Agricultural, Biological, and Environmental Statistics 9: 300–318. Mackenzie, D. I. et al. 2002. Estimating Site Occupancy Rates When Detection Probabilities Are Less Than One. - Ecology 83: 1–19. MacKenzie, D. et al. 2017. 
Occupancy estimation and modeling: inferring patterns and dynamics of species occurrence. - Academic Press. Mani, M. S. 1974. Ecology and Biogeography in India. - Springer Netherlands. McGill, B. J. et al. 2006. Rebuilding community ecology from functional traits. - Trends in Ecology and Evolution 21: 178–185. Myers, N. et al. 2000. Biodiversity hotspots for conservation priorities. - Nature 403: 853–858. Newbold, T. et al. 2015. Global effects of land use on local terrestrial biodiversity. - Nature 520: 45–50. Nogués-Bravo, D. et al. 2007. Exposure of global mountain systems to climate warming during the 21st Century. - Global Environmental Change 17: 420–428. Nowakowski, A. J. et al. 2018. Changing Thermal Landscapes: Merging Climate Science and Landscape Ecology through Thermal Biology. - Current Landscape Ecology Reports 3: 57–72. O’Donnell, M. S. and Ignizio, D. A. 2012. Bioclimatic predictors for supporting ecological applications in the conterminous United States. OpenStreetMap contributors 2017. Planet dump retrieved from https://planet.osm.org. in press. Pascal, J. 1988. Wet evergreen forests of the Western Ghats of India: Ecology, structure, floristic composition and succession (Travaux de la Section scientifique et technique). - Institut Francais de Pondicherry. Payne, D. et al. 2017. Opportunities for research on mountain biodiversity under global change. - Current Opinion in Environmental Sustainability 29: 40–47. Perez, T. M. et al. 2016. Thermal trouble in the tropics. - Science 351: 1392–1393. Peters, M. K. et al. 2019. Climate–land-use interactions shape tropical mountain biodiversity and ecosystem functions. - Nature 568: 88–92. Pigot, A. L. et al. 2020. Macroevolutionary convergence connects morphological form to ecological function in birds. - Nature Ecology and Evolution 4: 230–239. Praveen J 2017. On the geo-precision of data for modelling home range of a species – A commentary on Ramesh et al. (2017). - Biological Conservation. Praveen J 2021. 
Kerala Bird Atlas 2015-2020: features, outcomes and implications of a citizen-science project. - Current Science 122: 298–309. Quintero, I. and Jetz, W. 2018. Global elevational diversity and diversification of birds. - Nature 555: 246–250. R Core Team 2020. R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. Rahbek, C. et al. 2019. Humboldt’s enigma: What causes global patterns of mountain biodiversity? - Science 365: 1108–1113. Rajendran, K. et al. 2012. Monsoon circulation interaction with Western Ghats orography under changing climate: Projection by a 20-km mesh AGCM. - Theoretical and Applied Climatology 110: 555–571. Raman, T. R. S. 2006. Effects of habitat structure and adjacent habitats on birds in tropical rainforest fragments and shaded plantations in the Western Ghats, India. - Biodiversity and Conservation 15: 1577–1607. Raman, T. R. S. et al. 2021. Native shade trees aid bird conservation in tea plantations in southern India. - Current Science 121: 12. Ranganathan, J. et al. 2010. Landscape-level effects on avifauna within tropical agriculture in the Western Ghats: Insights for management and conservation. - Biological Conservation 143: 2909–2917. Robin, V. V. et al. 2015. Islands within islands: two montane palaeo-endemic birds impacted by recent anthropogenic fragmentation. - Molecular Ecology 24: 3572–3584. Robinson, O. J. et al. 2020. Integrating citizen science data with expert surveys increases accuracy and spatial extent of species distribution models (L Maiorano, Ed.). - Diversity and Distributions 26: 976–986. Roy, P. S. et al. 2015. Development of decadal (1985-1995-2005) land use and land cover database for India. - Remote Sensing 7: 2401–2430. Şekercioğlu, Ç. H. et al. 2012. The effects of climate change on tropical birds. - Biological Conservation 148: 1–18. Senior, R. A. et al. 2017. A pantropical analysis of the impacts of forest degradation and conversion on local temperature. - Ecology and Evolution 7: 7897–7908. 
Sidhu, S. et al. 2010. Effects of plantations and home-gardens on tropical forest bird communities and mixed-species bird flocks in the southern Western Ghats. - Journal of the Bombay Natural History Society 107: 9191. Sirami, C. et al. 2017. Impacts of global change on species distributions: obstacles and solutions to integrate climate and land use. - Global Ecology and Biogeography 26: 385–394. SoIB 2020. State of India’s Birds, 2020: Range, trends and conservation status. Sreekar, R. et al. 2013. Natural Windbreaks Sustain Bird Diversity in a Tea-Dominated Landscape. - PLoS ONE 8: 411. Srinivasan, U. and Wilcove, D. S. 2020. Interactive impacts of climate change and land-use change on the demography of montane birds. - Ecology. Srinivasan, U. et al. 2018. Temperature and competition interact to structure Himalayan bird communities. - Proceedings of the Royal Society B: Biological Sciences. Srinivasan, U. et al. 2019. Annual temperature variation influences the vulnerability of montane bird communities to land-use change. - Ecography 42: 2084–2094. Steen, V. A. et al. 2021. Spatial thinning and class balancing: Key choices lead to variation in the performance of species distribution models with citizen science data (J McPherson, Ed.). - Methods in Ecology and Evolution 12: 216–226. Stevens, G. C. 1989. The Latitudinal Gradient in Geographical Range: How so Many Species Coexist in the Tropics. - The American Naturalist 133: 240–256. Sullivan, B. L. et al. 2009. eBird: A citizen-based bird observation network in the biological sciences. - Biological Conservation 142: 2282–2292. Sullivan, B. L. et al. 2014. The eBird enterprise: An integrated approach to development and application of citizen science. - Biological Conservation 169: 31–40. Sunarto, S. et al. 2012. Tigers need cover: Multi-scale occupancy study of the big cat in Sumatran forest and plantation landscapes. - PLoS ONE. Tewksbury, J. J. et al. 2008. Putting the Heat on Tropical Animals. - Science 320: 1296–1297. 
Tingley, M. W. et al. 2009. Birds track their Grinnellian niche through a century of climate change. - Proceedings of the National Academy of Sciences 106: 19637–19643. Tsai, P. Y. et al. 2020. New insights into the patterns and drivers of avian altitudinal migration from a growing crowdsourcing data source. - Ecography: 1–12. Urban, M. C. 2018. Escalator to extinction. - Proceedings of the National Academy of Sciences 115: 11871–11873. van Strien, A. J. et al. 2013. Opportunistic citizen science data of animal species produce reliable estimates of distribution trends if analysed with occupancy models. - Journal of Applied Ecology 50: 1450–1458. Vijayakumar, S. P. et al. 2016. Glaciations, gradients, and geography: multiple drivers of diversification of bush frogs in the Western Ghats Escarpment. - Proceedings of the Royal Society B: Biological Sciences 283: 20161011. Viswanathan, A. et al. 2020. State of India’s Birds 2020: Background and Methodology: 1–36. Williams, S. E. and Middleton, J. 2008. Climatic seasonality, resource bottlenecks, and abundance of rainforest birds: implications for global climate change: Birds, seasonality and climate change. - Diversity and Distributions 14: 69–77. Wood, C. et al. 2011. eBird: Engaging Birders in Science and Conservation. - PLoS Biology 9: e1001220. Yalcin, S. and Leroux, S. J. 2018. An empirical test of the relative and combined effects of land-cover and climate change on local colonization and extinction. - Global Change Biology 24: 3849–3861. "],["404.html", "Page not found", " Page not found The page you requested cannot be found (perhaps it was moved or renamed). You may want to try searching to find the page's new location, or use the table of contents to find the page you are looking for. "]]
diff --git a/docs/selecting-species-of-interest.html b/docs/selecting-species-of-interest.html
index 8c1e703..7b448c1 100644
--- a/docs/selecting-species-of-interest.html
+++ b/docs/selecting-species-of-interest.html
@@ -23,7 +23,7 @@
 
 
 
-<meta name="date" content="2022-04-26" />
+<meta name="date" content="2022-04-28" />
 
   <meta name="viewport" content="width=device-width, initial-scale=1" />
   <meta name="apple-mobile-web-app-capable" content="yes" />
@@ -134,8 +134,7 @@
 
 <ul class="summary">
 <li class="chapter" data-level="1" data-path="index.html"><a href="index.html"><i class="fa fa-check"></i><b>1</b> Introduction</a><ul>
-<li class="chapter" data-level="1.1" data-path="index.html"><a href="index.html#data-access"><i class="fa fa-check"></i><b>1.1</b> Data access</a></li>
-<li class="chapter" data-level="1.2" data-path="index.html"><a href="index.html#data-processing"><i class="fa fa-check"></i><b>1.2</b> Data processing</a></li>
+<li class="chapter" data-level="1.1" data-path="index.html"><a href="index.html#data-processing"><i class="fa fa-check"></i><b>1.1</b> Data processing</a></li>
 </ul></li>
 <li class="chapter" data-level="2" data-path="selecting-species-of-interest.html"><a href="selecting-species-of-interest.html"><i class="fa fa-check"></i><b>2</b> Selecting species of interest</a><ul>
 <li class="chapter" data-level="2.1" data-path="selecting-species-of-interest.html"><a href="selecting-species-of-interest.html#prepare-libraries"><i class="fa fa-check"></i><b>2.1</b> Prepare libraries</a></li>
@@ -378,14 +377,14 @@ <h2><span class="header-section-number">2.3</span> Subset an initial list of ter
 <span id="cb3-23"><a href="selecting-species-of-interest.html#cb3-23"></a><span class="co"># Beginning with 3.37 million observations of 684 species in eBird that occurred within the outlines of our study area (Fig. 1), over the years 2013–2021, we retained only those species that had a minimum of 1,000 detections each between 2013 and 2021 (347 species remaining; 3.33 million observations). Next, we divided the study area into 25x25 km cells following State of India’s Birds 2020 methodology. We kept only those species that occurred in at least 5% of all checklists across half of the grids (42 unique grid cells) from which they had been reported. </span></span>
 <span id="cb3-24"><a href="selecting-species-of-interest.html#cb3-24"></a></span>
 <span id="cb3-25"><a href="selecting-species-of-interest.html#cb3-25"></a><span class="co"># export the above list as .csv to carry out initial filtering based on natural history</span></span>
-<span id="cb3-26"><a href="selecting-species-of-interest.html#cb3-26"></a><span class="kw">write.csv</span>(totCount, <span class="st">&quot;data/species_list.csv&quot;</span>, <span class="dt">row.names =</span> F)</span></code></pre></div>
+<span id="cb3-26"><a href="selecting-species-of-interest.html#cb3-26"></a><span class="kw">write.csv</span>(totCount, <span class="st">&quot;data/species-list.csv&quot;</span>, <span class="dt">row.names =</span> F)</span></code></pre></div>
 </div>
 <div id="read-subset-of-species-following-filtering-and-removal-of-waterbirds-raptors-and-other-noctural-species" class="section level2">
 <h2><span class="header-section-number">2.4</span> Read subset of species following filtering and removal of waterbirds, raptors, and other noctural species</h2>
 <div class="sourceCode" id="cb4"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb4-1"><a href="selecting-species-of-interest.html#cb4-1"></a><span class="co"># add species of interest</span></span>
 <span id="cb4-2"><a href="selecting-species-of-interest.html#cb4-2"></a><span class="co"># please note the below script is obtained after manual subsetting based on natural history and hence the user is asked to examine the dataset obtained in the previous time step prior to further processing</span></span>
 <span id="cb4-3"><a href="selecting-species-of-interest.html#cb4-3"></a></span>
-<span id="cb4-4"><a href="selecting-species-of-interest.html#cb4-4"></a>specieslist &lt;-<span class="st"> </span><span class="kw">read.csv</span>(<span class="st">&quot;data/species_list.csv&quot;</span>)</span>
+<span id="cb4-4"><a href="selecting-species-of-interest.html#cb4-4"></a>specieslist &lt;-<span class="st"> </span><span class="kw">read.csv</span>(<span class="st">&quot;data/species-list.csv&quot;</span>)</span>
 <span id="cb4-5"><a href="selecting-species-of-interest.html#cb4-5"></a>speciesOfInterest &lt;-<span class="st"> </span>specieslist<span class="op">$</span>scientific_name</span></code></pre></div>
 </div>
 <div id="load-raw-data-for-locations" class="section level2">
@@ -464,8 +463,8 @@ <h2><span class="header-section-number">2.7</span> Which species are reported su
 <span id="cb7-11"><a href="selecting-species-of-interest.html#cb7-11"></a>  )</span>
 <span id="cb7-12"><a href="selecting-species-of-interest.html#cb7-12"></a></span>
 <span id="cb7-13"><a href="selecting-species-of-interest.html#cb7-13"></a><span class="co"># Write the above two results</span></span>
-<span id="cb7-14"><a href="selecting-species-of-interest.html#cb7-14"></a><span class="kw">write_csv</span>(tot_n_chklist, <span class="st">&quot;data/01_nchk_per_grid.csv&quot;</span>)</span>
-<span id="cb7-15"><a href="selecting-species-of-interest.html#cb7-15"></a><span class="kw">write_csv</span>(spp_grids, <span class="st">&quot;data/01_ngrids_per_spp.csv&quot;</span>)</span>
+<span id="cb7-14"><a href="selecting-species-of-interest.html#cb7-14"></a><span class="kw">write_csv</span>(tot_n_chklist, <span class="st">&quot;data/01_nchk-per-grid.csv&quot;</span>)</span>
+<span id="cb7-15"><a href="selecting-species-of-interest.html#cb7-15"></a><span class="kw">write_csv</span>(spp_grids, <span class="st">&quot;data/01_ngrids-per-spp.csv&quot;</span>)</span>
 <span id="cb7-16"><a href="selecting-species-of-interest.html#cb7-16"></a></span>
 <span id="cb7-17"><a href="selecting-species-of-interest.html#cb7-17"></a><span class="co"># left-join the datasets</span></span>
 <span id="cb7-18"><a href="selecting-species-of-interest.html#cb7-18"></a>ebd_summary &lt;-<span class="st"> </span><span class="kw">left_join</span>(ebd_summary, spp_grids, <span class="dt">by =</span> <span class="st">&quot;scientific_name&quot;</span>)</span>
@@ -486,7 +485,7 @@ <h2><span class="header-section-number">2.7</span> Which species are reported su
 <span id="cb7-33"><a href="selecting-species-of-interest.html#cb7-33"></a>)</span>
 <span id="cb7-34"><a href="selecting-species-of-interest.html#cb7-34"></a></span>
 <span id="cb7-35"><a href="selecting-species-of-interest.html#cb7-35"></a><span class="co"># Write the results</span></span>
-<span id="cb7-36"><a href="selecting-species-of-interest.html#cb7-36"></a><span class="kw">write_csv</span>(grid_prop_cut, <span class="st">&quot;data/01_chk_5_percent.csv&quot;</span>)</span>
+<span id="cb7-36"><a href="selecting-species-of-interest.html#cb7-36"></a><span class="kw">write_csv</span>(grid_prop_cut, <span class="st">&quot;data/01_prop-grids-per-spp.csv&quot;</span>)</span>
 <span id="cb7-37"><a href="selecting-species-of-interest.html#cb7-37"></a></span>
 <span id="cb7-38"><a href="selecting-species-of-interest.html#cb7-38"></a><span class="co"># Identifying the number of species that occur in potentially &lt;5% of all lists</span></span>
 <span id="cb7-39"><a href="selecting-species-of-interest.html#cb7-39"></a>total_number_lists &lt;-<span class="st"> </span><span class="kw">sum</span>(tot_n_chklist<span class="op">$</span>nchk)</span>
@@ -521,7 +520,7 @@ <h2><span class="header-section-number">2.8</span> Figure: Checklist distributio
 <h2><span class="header-section-number">2.9</span> Prepare the species list</h2>
 <div class="sourceCode" id="cb9"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb9-1"><a href="selecting-species-of-interest.html#cb9-1"></a><span class="co"># write the new list of species that occur in at least 5% of checklists across a minimum of 50% of the grids they have been reported in</span></span>
 <span id="cb9-2"><a href="selecting-species-of-interest.html#cb9-2"></a>new_sp_list &lt;-<span class="st"> </span><span class="kw">semi_join</span>(specieslist, grid_prop_cut, <span class="dt">by =</span> <span class="st">&quot;scientific_name&quot;</span>)</span>
-<span id="cb9-3"><a href="selecting-species-of-interest.html#cb9-3"></a><span class="kw">write_csv</span>(new_sp_list, <span class="st">&quot;data/01_list-of-species-cutoff.csv&quot;</span>)</span></code></pre></div>
+<span id="cb9-3"><a href="selecting-species-of-interest.html#cb9-3"></a><span class="kw">write_csv</span>(new_sp_list, <span class="st">&quot;results/01_list-of-species-cutoff.csv&quot;</span>)</span></code></pre></div>
 
 </div>
 </div>
diff --git a/docs/visualizing-occupancy-predictor-effects.html b/docs/visualizing-occupancy-predictor-effects.html
index 2e2672b..2ae6d1f 100644
--- a/docs/visualizing-occupancy-predictor-effects.html
+++ b/docs/visualizing-occupancy-predictor-effects.html
@@ -23,7 +23,7 @@
 
 
 
-<meta name="date" content="2022-04-26" />
+<meta name="date" content="2022-04-28" />
 
   <meta name="viewport" content="width=device-width, initial-scale=1" />
   <meta name="apple-mobile-web-app-capable" content="yes" />
@@ -134,8 +134,7 @@
 
 <ul class="summary">
 <li class="chapter" data-level="1" data-path="index.html"><a href="index.html"><i class="fa fa-check"></i><b>1</b> Introduction</a><ul>
-<li class="chapter" data-level="1.1" data-path="index.html"><a href="index.html#data-access"><i class="fa fa-check"></i><b>1.1</b> Data access</a></li>
-<li class="chapter" data-level="1.2" data-path="index.html"><a href="index.html#data-processing"><i class="fa fa-check"></i><b>1.2</b> Data processing</a></li>
+<li class="chapter" data-level="1.1" data-path="index.html"><a href="index.html#data-processing"><i class="fa fa-check"></i><b>1.1</b> Data processing</a></li>
 </ul></li>
 <li class="chapter" data-level="2" data-path="selecting-species-of-interest.html"><a href="selecting-species-of-interest.html"><i class="fa fa-check"></i><b>2</b> Selecting species of interest</a><ul>
 <li class="chapter" data-level="2.1" data-path="selecting-species-of-interest.html"><a href="selecting-species-of-interest.html#prepare-libraries"><i class="fa fa-check"></i><b>2.1</b> Prepare libraries</a></li>
@@ -348,7 +347,7 @@ <h2><span class="header-section-number">10.3</span> Show AIC weight importance</
 <div id="read-in-aic-weight-data" class="section level3">
 <h3><span class="header-section-number">10.3.1</span> Read in AIC weight data</h3>
 <div class="sourceCode" id="cb97"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb97-1"><a href="visualizing-occupancy-predictor-effects.html#cb97-1"></a><span class="co"># which files to read</span></span>
-<span id="cb97-2"><a href="visualizing-occupancy-predictor-effects.html#cb97-2"></a>file_names &lt;-<span class="st"> </span><span class="kw">c</span>(<span class="st">&quot;data/results/lc-clim-imp.xlsx&quot;</span>)</span>
+<span id="cb97-2"><a href="visualizing-occupancy-predictor-effects.html#cb97-2"></a>file_names &lt;-<span class="st"> </span><span class="kw">c</span>(<span class="st">&quot;results/09_lc-clim-imp.xlsx&quot;</span>)</span>
 <span id="cb97-3"><a href="visualizing-occupancy-predictor-effects.html#cb97-3"></a></span>
 <span id="cb97-4"><a href="visualizing-occupancy-predictor-effects.html#cb97-4"></a><span class="co"># read in sheets by species</span></span>
 <span id="cb97-5"><a href="visualizing-occupancy-predictor-effects.html#cb97-5"></a>model_imp &lt;-<span class="st"> </span><span class="kw">map</span>(file_names, <span class="cf">function</span>(f) {</span>
@@ -399,10 +398,10 @@ <h3><span class="header-section-number">10.3.2</span> Prepare cumulative AIC wei
 <span id="cb98-16"><a href="visualizing-occupancy-predictor-effects.html#cb98-16"></a>  )</span>
 <span id="cb98-17"><a href="visualizing-occupancy-predictor-effects.html#cb98-17"></a></span>
 <span id="cb98-18"><a href="visualizing-occupancy-predictor-effects.html#cb98-18"></a><span class="co"># write to file</span></span>
-<span id="cb98-19"><a href="visualizing-occupancy-predictor-effects.html#cb98-19"></a><span class="kw">write_csv</span>(model_imp, <span class="st">&quot;data/results/cumulative_AIC_weights.csv&quot;</span>)</span></code></pre></div>
+<span id="cb98-19"><a href="visualizing-occupancy-predictor-effects.html#cb98-19"></a><span class="kw">write_csv</span>(model_imp, <span class="st">&quot;results/10_cumulative-AIC-weights.csv&quot;</span>)</span></code></pre></div>
 <p>Read data back in.</p>
 <div class="sourceCode" id="cb99"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb99-1"><a href="visualizing-occupancy-predictor-effects.html#cb99-1"></a><span class="co"># read data and make factor</span></span>
-<span id="cb99-2"><a href="visualizing-occupancy-predictor-effects.html#cb99-2"></a>model_imp &lt;-<span class="st"> </span><span class="kw">read_csv</span>(<span class="st">&quot;data/results/cumulative_AIC_weights.csv&quot;</span>)</span>
+<span id="cb99-2"><a href="visualizing-occupancy-predictor-effects.html#cb99-2"></a>model_imp &lt;-<span class="st"> </span><span class="kw">read_csv</span>(<span class="st">&quot;results/10_cumulative-AIC-weights.csv&quot;</span>)</span>
 <span id="cb99-3"><a href="visualizing-occupancy-predictor-effects.html#cb99-3"></a>model_imp<span class="op">$</span>predictor &lt;-<span class="st"> </span><span class="kw">as_factor</span>(model_imp<span class="op">$</span>predictor)</span></code></pre></div>
 <div class="sourceCode" id="cb100"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb100-1"><a href="visualizing-occupancy-predictor-effects.html#cb100-1"></a><span class="co"># make nice names</span></span>
 <span id="cb100-2"><a href="visualizing-occupancy-predictor-effects.html#cb100-2"></a>predictor_name &lt;-<span class="st"> </span><span class="kw">tibble</span>(</span>
@@ -424,7 +423,7 @@ <h3><span class="header-section-number">10.3.2</span> Prepare cumulative AIC wei
 <div id="prepare-model-coefficient-data" class="section level2">
 <h2><span class="header-section-number">10.4</span> Prepare model coefficient data</h2>
 <p>For each species, we examined those models which had ΔAICc &lt; 4, as these top models were considered to explain a large proportion of the association between the species-specific probability of occupancy and environmental drivers. Using these restricted model sets for each species; we created a model-averaged coefficient estimate for each predictor and assessed its direction and significance. We considered a predictor to be significantly associated with occupancy if the range of the 95% confidence interval around the model-averaged coefficient did not contain zero.</p>
-<div class="sourceCode" id="cb101"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb101-1"><a href="visualizing-occupancy-predictor-effects.html#cb101-1"></a>file_read &lt;-<span class="st"> </span><span class="kw">c</span>(<span class="st">&quot;data/results/lc-clim-modelEst.xlsx&quot;</span>)</span>
+<div class="sourceCode" id="cb101"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb101-1"><a href="visualizing-occupancy-predictor-effects.html#cb101-1"></a>file_read &lt;-<span class="st"> </span><span class="kw">c</span>(<span class="st">&quot;results/09_lc-clim-modelEst.xlsx&quot;</span>)</span>
 <span id="cb101-2"><a href="visualizing-occupancy-predictor-effects.html#cb101-2"></a></span>
 <span id="cb101-3"><a href="visualizing-occupancy-predictor-effects.html#cb101-3"></a><span class="co"># read data as list column</span></span>
 <span id="cb101-4"><a href="visualizing-occupancy-predictor-effects.html#cb101-4"></a>model_est &lt;-<span class="st"> </span><span class="kw">map</span>(list_of_species, <span class="cf">function</span>(sn) {</span>
@@ -498,7 +497,7 @@ <h2><span class="header-section-number">10.4</span> Prepare model coefficient da
 <span id="cb102-7"><a href="visualizing-occupancy-predictor-effects.html#cb102-7"></a>)</span>
 <span id="cb102-8"><a href="visualizing-occupancy-predictor-effects.html#cb102-8"></a></span>
 <span id="cb102-9"><a href="visualizing-occupancy-predictor-effects.html#cb102-9"></a><span class="co"># write to file</span></span>
-<span id="cb102-10"><a href="visualizing-occupancy-predictor-effects.html#cb102-10"></a><span class="kw">write_csv</span>(data_predictor_effect, <span class="st">&quot;data/results/data_predictor_effect.csv&quot;</span>)</span></code></pre></div>
+<span id="cb102-10"><a href="visualizing-occupancy-predictor-effects.html#cb102-10"></a><span class="kw">write_csv</span>(data_predictor_effect, <span class="st">&quot;results/10_data-predictor-effect.csv&quot;</span>)</span></code></pre></div>
 <p>Export model data.</p>
 <div class="sourceCode" id="cb103"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb103-1"><a href="visualizing-occupancy-predictor-effects.html#cb103-1"></a>model_data_to_file &lt;-<span class="st"> </span>model_data <span class="op">%&gt;%</span></span>
 <span id="cb103-2"><a href="visualizing-occupancy-predictor-effects.html#cb103-2"></a><span class="st">  </span>dplyr<span class="op">::</span><span class="kw">select</span>(</span>
@@ -512,22 +511,22 @@ <h2><span class="header-section-number">10.4</span> Prepare model coefficient da
 <span id="cb103-10"><a href="visualizing-occupancy-predictor-effects.html#cb103-10"></a></span>
 <span id="cb103-11"><a href="visualizing-occupancy-predictor-effects.html#cb103-11"></a><span class="kw">write_csv</span>(</span>
 <span id="cb103-12"><a href="visualizing-occupancy-predictor-effects.html#cb103-12"></a>  model_data_to_file,</span>
-<span id="cb103-13"><a href="visualizing-occupancy-predictor-effects.html#cb103-13"></a>  <span class="st">&quot;data/results/data_occupancy_predictors.csv&quot;</span></span>
+<span id="cb103-13"><a href="visualizing-occupancy-predictor-effects.html#cb103-13"></a>  <span class="st">&quot;results/10_data-occupancy-predictors.csv&quot;</span></span>
 <span id="cb103-14"><a href="visualizing-occupancy-predictor-effects.html#cb103-14"></a>)</span></code></pre></div>
 <p>Read in data after clearing R session.</p>
 <div class="sourceCode" id="cb104"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb104-1"><a href="visualizing-occupancy-predictor-effects.html#cb104-1"></a><span class="co"># first merge species trait data with significant predictor</span></span>
 <span id="cb104-2"><a href="visualizing-occupancy-predictor-effects.html#cb104-2"></a>species_trait &lt;-<span class="st"> </span><span class="kw">read.csv</span>(<span class="st">&quot;data/species-trait-dat.csv&quot;</span>)</span>
-<span id="cb104-3"><a href="visualizing-occupancy-predictor-effects.html#cb104-3"></a>sig_predictor &lt;-<span class="st"> </span><span class="kw">read.csv</span>(<span class="st">&quot;data/results/data_predictor_effect.csv&quot;</span>)</span>
+<span id="cb104-3"><a href="visualizing-occupancy-predictor-effects.html#cb104-3"></a>sig_predictor &lt;-<span class="st"> </span><span class="kw">read.csv</span>(<span class="st">&quot;results/10_data-predictor-effect.csv&quot;</span>)</span>
 <span id="cb104-4"><a href="visualizing-occupancy-predictor-effects.html#cb104-4"></a>merged_species_traits &lt;-<span class="st"> </span><span class="kw">inner_join</span>(sig_predictor, species_trait,</span>
 <span id="cb104-5"><a href="visualizing-occupancy-predictor-effects.html#cb104-5"></a>  <span class="dt">by =</span> <span class="kw">c</span>(<span class="st">&quot;scientific_name&quot;</span> =<span class="st"> &quot;scientific_name&quot;</span>)</span>
 <span id="cb104-6"><a href="visualizing-occupancy-predictor-effects.html#cb104-6"></a>)</span>
 <span id="cb104-7"><a href="visualizing-occupancy-predictor-effects.html#cb104-7"></a><span class="kw">write_csv</span>(</span>
 <span id="cb104-8"><a href="visualizing-occupancy-predictor-effects.html#cb104-8"></a>  merged_species_traits,</span>
-<span id="cb104-9"><a href="visualizing-occupancy-predictor-effects.html#cb104-9"></a>  <span class="st">&quot;data/results/results-predictors-species-traits.csv&quot;</span></span>
+<span id="cb104-9"><a href="visualizing-occupancy-predictor-effects.html#cb104-9"></a>  <span class="st">&quot;results/10_results-predictors-species-traits.csv&quot;</span></span>
 <span id="cb104-10"><a href="visualizing-occupancy-predictor-effects.html#cb104-10"></a>)</span>
 <span id="cb104-11"><a href="visualizing-occupancy-predictor-effects.html#cb104-11"></a></span>
 <span id="cb104-12"><a href="visualizing-occupancy-predictor-effects.html#cb104-12"></a><span class="co"># read from file</span></span>
-<span id="cb104-13"><a href="visualizing-occupancy-predictor-effects.html#cb104-13"></a>model_data &lt;-<span class="st"> </span><span class="kw">read_csv</span>(<span class="st">&quot;data/results/results-predictors-species-traits.csv&quot;</span>)</span></code></pre></div>
+<span id="cb104-13"><a href="visualizing-occupancy-predictor-effects.html#cb104-13"></a>model_data &lt;-<span class="st"> </span><span class="kw">read_csv</span>(<span class="st">&quot;results/10_results-predictors-species-traits.csv&quot;</span>)</span></code></pre></div>
 <p>Fix predictor name.</p>
 <div class="sourceCode" id="cb105"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb105-1"><a href="visualizing-occupancy-predictor-effects.html#cb105-1"></a><span class="co"># remove .y from predictors</span></span>
 <span id="cb105-2"><a href="visualizing-occupancy-predictor-effects.html#cb105-2"></a>model_data &lt;-<span class="st"> </span>model_data <span class="op">%&gt;%</span></span>
@@ -573,7 +572,7 @@ <h2><span class="header-section-number">10.5</span> Get predictor effects</h2>
 <span id="cb106-34"><a href="visualizing-occupancy-predictor-effects.html#cb106-34"></a><span class="co"># write</span></span>
 <span id="cb106-35"><a href="visualizing-occupancy-predictor-effects.html#cb106-35"></a><span class="kw">write_csv</span>(</span>
 <span id="cb106-36"><a href="visualizing-occupancy-predictor-effects.html#cb106-36"></a>  data_predictor_long,</span>
-<span id="cb106-37"><a href="visualizing-occupancy-predictor-effects.html#cb106-37"></a>  <span class="st">&quot;data/results/data_predictor_direction_nSpecies.csv&quot;</span></span>
+<span id="cb106-37"><a href="visualizing-occupancy-predictor-effects.html#cb106-37"></a>  <span class="st">&quot;results/10_data-predictor-direction-nSpecies.csv&quot;</span></span>
 <span id="cb106-38"><a href="visualizing-occupancy-predictor-effects.html#cb106-38"></a>)</span></code></pre></div>
 <p>Prepare data to determine the direction (positive or negative) of the effect of each predictor. How many species are affected in either direction?</p>
 <div class="sourceCode" id="cb107"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb107-1"><a href="visualizing-occupancy-predictor-effects.html#cb107-1"></a><span class="co"># join with predictor names and relative AIC</span></span>