Data & Code Supplement for "Environmental DNA as a management tool for tracking artificial waterhole use in savanna ecosystems"
Gathering barcoding reference sequences for the Kruger National Park (also available as a standalone git repository).
Two scripts gather sequences from GenBank and BOLD. "GB_BOLD_seq_download.R" takes a list of Latin binomials and downloads sequences using the rentrez and bold R packages, and can be modified to download reference sequences for other marker genes. "CO1_from_GB_mito_genomes.R" follows the same process, but uses rentrez and modified scripts from the PrimerMiner R package to download whole mitochondrial genomes and extract COI sequences.
After downloading the sequences, "format_GB_BOLD_refLib.sh", "generate_taxonomy.R", and "format_refLib_dada_2.R" are used to clean up the downloaded FASTA files, generate the taxonomy file, and format the library for use with dada2's built-in RDP classifier.
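As a rough illustration of the final formatting step: dada2's assignTaxonomy reads a training FASTA whose headers list the ranks separated by semicolons (for example ">Kingdom;Phylum;Class;Order;Family;Genus;"). The sketch below rewrites accession-style headers into that form. It is not the content of the scripts named above. The function name, the two-column taxonomy table layout, and the toy records are assumptions made for the example.

```shell
# Sketch only: rewrite FASTA headers into dada2's "Rank1;Rank2;...;" form.
# Assumed (hypothetical) inputs:
#   $1 = TSV with two columns: accession <TAB> semicolon-separated lineage
#   $2 = FASTA whose header lines start with ">accession"
to_dada2_headers() {
  awk -F '\t' '
    NR == FNR { tax[$1] = $2; next }      # first file: accession -> lineage
    /^>/ {
      acc = substr($1, 2)                 # drop the leading ">"
      sub(/[ |].*/, "", acc)              # keep text before first space or pipe
      keep = (acc in tax)                 # records without a lineage are dropped
      if (keep) print ">" tax[acc] ";"
      next
    }
    keep { print }                        # sequence lines of retained records
  ' "$1" "$2"
}

# Toy demonstration with made-up records (not real accessions)
demo_dir=$(mktemp -d)
printf 'AB000001\tAnimalia;Chordata;Mammalia;Proboscidea;Elephantidae;Loxodonta\n' \
  > "$demo_dir/taxonomy.tsv"
printf '>AB000001 Loxodonta africana COI\nACGTACGT\n>ZZ999999 no lineage\nTTTT\n' \
  > "$demo_dir/raw.fasta"
to_dada2_headers "$demo_dir/taxonomy.tsv" "$demo_dir/raw.fasta"
```

Dropping records that lack a lineage keeps the classifier from training on unlabeled sequences. Whether the original scripts do the same is not stated here.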
Beyond the custom library, additional scripts are included to format the MIDORI and terrimporter COI reference databases for use with dada2 (these source databases must be downloaded separately).
Contains downloaded FASTA files, whole mitochondrial genomes, and species lists generated by the Kruger National Park.
The "output" folder contains intermediate files, plus the final dada2-formatted reference sequences:
- Kingdom to Genus: "Kruger_Vertebrates_refLib_dada2.fasta"
- Species: "Kruger_Vertebrates_refLib_dada2_species.fasta"
- Phylum to Species: "Kruger_Vertebrates_refLib_dada2_phy2species.fasta"
Analyses can be reproduced with the data files included here. To rerun the DADA2 pipelines, download the raw sequence reads, which are archived in the NCBI Sequence Read Archive:
BioProject PRJNA490450, accession numbers SRR7822814 to SRR7822901
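The accession range can be expanded into an explicit run list for batch download. This is a minimal sketch that assumes every number in the range is a run belonging to this BioProject. The function name is hypothetical. The commented download loop assumes NCBI's sra-tools (prefetch, fasterq-dump) is installed, and its output folder name is a placeholder.

```shell
# Sketch only: list the SRA run accessions SRR7822814..SRR7822901 (inclusive).
sra_accessions() {
  i=7822814
  while [ "$i" -le 7822901 ]; do
    printf 'SRR%d\n' "$i"
    i=$((i + 1))
  done
}

# The range covers 88 runs (7822901 - 7822814 + 1)
sra_accessions | wc -l

# Assumed download step, requires sra-tools; uncomment to run:
# sra_accessions | while read -r acc; do
#   prefetch "$acc" && fasterq-dump --split-files -O raw_reads "$acc"
# done
```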
The "data" folder contains the raw camera trap annotations, field notes, mammal phylogeny, mammal trait data, and the final merged data file ("merged_eDNA_camtrap_data_nov20_2019.RData", created by scripts/merging_data.R).
Raw sequences are first separated by primer set ("separate_coi_by_primer.sh"). A separate DADA2 pipeline for each primer set then performs quality filtering, denoising, chimera removal, ASV calling, and taxonomy assignment (dada2_*.R files). The sequence tables from the separate pipelines are merged with "tax_assign_dada2.R".
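The idea of splitting reads by primer set can be sketched as follows. This is not the content of "separate_coi_by_primer.sh": the function name, the NAME=PRIMER argument form, and the primer strings are placeholders. The sketch only matches an exact primer at the start of each read in a single FASTQ file with four lines per record. It has no support for degenerate bases, mismatches, or paired reads, which a dedicated tool would handle in practice.

```shell
# Sketch only: route FASTQ records to per-primer files by exact 5' prefix match.
# Usage (hypothetical): split_by_primer reads.fastq outdir NAME=PRIMER [NAME=PRIMER ...]
split_by_primer() {
  fq=$1
  outdir=$2
  shift 2
  mkdir -p "$outdir"
  awk -v spec="$*" -v outdir="$outdir" '
    BEGIN {
      # Parse "NAME=PRIMER" pairs passed on the command line
      n = split(spec, pairs, " ")
      for (i = 1; i <= n; i++) {
        split(pairs[i], kv, "=")
        name[i] = kv[1]
        primer[i] = kv[2]
      }
    }
    { rec[(NR - 1) % 4] = $0 }            # buffer the 4 lines of one record
    NR % 4 == 0 {
      out = "unassigned"
      for (i = 1; i <= n; i++)
        if (index(rec[1], primer[i]) == 1) { out = name[i]; break }
      file = outdir "/" out ".fastq"
      print rec[0] "\n" rec[1] "\n" rec[2] "\n" rec[3] > (file)
    }
  ' "$fq"
}

# Toy demonstration with made-up reads and placeholder primers
demo_dir=$(mktemp -d)
printf '@r1\nAAAACCGT\n+\nIIIIIIII\n@r2\nGGGGTTAC\n+\nIIIIIIII\n' > "$demo_dir/reads.fastq"
split_by_primer "$demo_dir/reads.fastq" "$demo_dir/by_primer" setA=AAAA setB=GGGG
ls "$demo_dir/by_primer"
```

Reads that match no primer go to "unassigned.fastq" rather than being discarded, so the fraction of unmatched reads can be checked before running the per-primer pipelines.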
"merging_data.R" merges the eDNA sequence tables, camera trap, and water sample data into an RData object for subsequent analyses ("merged_eDNA_camtrap_data_nov20_2019.RData").
Statistical analyses, figures, and tables for most analyses are produced with "analyses.R". The exception is the hierarchical models, which are fit with "stan_models.R".