akvaplan-niva/gbif-no-darwin-core

Darwin Core biodiversity data pipelines

This repository contains data production pipelines for building Darwin Core datasets for publication in the Global Biodiversity Information Facility (GBIF), with permanent archiving in Zenodo.

EcoTaxa

Datasets

Notice: These are pre-production URLs, for testing purposes only

Workflow

  • Export EcoTaxa data as TSV (using DOI export with images)

  • Publish untreated TSV and images to Zenodo

  • Create Darwin Core occurrences in NDJSON from EcoTaxa TSV, using ecotaxa-darwin-core

  • Create unique Darwin Core sampling events in NDJSON by reducing the occurrences

  • @todo Merge with other/authoritative event metadata (e.g. sampling volumes)

  • Create lists of ignored (not-living) and rejected (non-Eukaryota) objects

  • Create lists of rejected events (non-unique, or with invalid or inconsistent metadata)

  • Finish local processing by executing the Darwin Core pipelines below

gbif-no-darwin-core$ ./bin/ecotaxa-pipeline 1420
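
The event reduction step in the workflow above can be pictured roughly as follows. This is a minimal Python sketch, not the repository's implementation: the function name `reduce_to_events`, the field list `EVENT_FIELDS`, and the rule that an event is rejected when the same eventID appears with differing metadata are all assumptions made for illustration.

```python
import json

# Assumed event-level fields; the real pipeline may carry a different set.
EVENT_FIELDS = ("eventID", "eventDate", "decimalLatitude", "decimalLongitude")


def reduce_to_events(ndjson_lines):
    """Collapse occurrence records (one JSON object per line) into unique events.

    Returns (accepted, rejected): accepted is a list of event dicts, rejected is
    a sorted list of eventIDs that were missing or had conflicting metadata.
    """
    events = {}
    rejected = set()
    for line in ndjson_lines:
        if not line.strip():
            continue  # tolerate blank lines in the NDJSON stream
        occurrence = json.loads(line)
        event = {field: occurrence.get(field) for field in EVENT_FIELDS}
        event_id = event["eventID"]
        if event_id is None:
            rejected.add("(missing eventID)")
            continue
        # Keep the first version seen; flag the ID if a later one differs.
        first_seen = events.setdefault(event_id, event)
        if first_seen != event:
            rejected.add(event_id)
    accepted = [e for event_id, e in events.items() if event_id not in rejected]
    return accepted, sorted(rejected)
```

Called with the lines of an occurrence NDJSON file, the function yields one record per sampling event plus a list of IDs that would need manual review.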

Darwin Core pipelines

Taxonomy

  • Create taxonomy NDJSON by extracting occurrence taxa and checking against GBIF Species API using WoRMS
  • Create lists of possible taxonomy issues (not found or incertae sedis)
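
As an illustration of the taxonomy step, the sketch below extracts the distinct names from a set of occurrences, builds a lookup URL for the GBIF name usage listing (`/v1/species` with `name` and `datasetKey` parameters), and classifies a response as a possible issue. The helper names are hypothetical, the WoRMS checklist key is deliberately left as a parameter rather than hard-coded, and the "incertae sedis" test is a simple text match chosen for this example. No network request is made here.

```python
from urllib.parse import urlencode

GBIF_SPECIES_URL = "https://api.gbif.org/v1/species"


def unique_names(occurrences):
    """Return the sorted set of scientific names found in the occurrences."""
    return sorted({o["scientificName"] for o in occurrences if o.get("scientificName")})


def lookup_url(name, dataset_key):
    """Build a name usage lookup URL restricted to one checklist dataset.

    dataset_key is assumed to be the GBIF key of the WoRMS checklist; it is
    passed in by the caller rather than guessed here.
    """
    return f"{GBIF_SPECIES_URL}?{urlencode({'name': name, 'datasetKey': dataset_key})}"


def taxonomy_issue(results):
    """Classify the 'results' list of a lookup response.

    Returns "not found", "incertae sedis", or None when no issue is detected.
    """
    if not results:
        return "not found"
    for usage in results:
        if any("incertae sedis" in str(value).lower() for value in usage.values()):
            return "incertae sedis"
    return None
```

Names for which `taxonomy_issue` returns a value would end up in the issue lists; the rest would go into the taxonomy NDJSON.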

Metadata

  • Extract time coverage (start/end, years, months, days, dates)
  • Extract space coverage (bounding box/depths)
  • Extract sampling protocols
  • @todo Create EML XML
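
The coverage extraction could look something like the following sketch. It assumes events carry the Darwin Core terms eventDate (ISO 8601), decimalLatitude, decimalLongitude, minimumDepthInMeters and maximumDepthInMeters; the function names and the shape of the returned dictionary are invented for illustration. The bounding box is a plain min/max, so it would be wrong for data that crosses the antimeridian.

```python
def value_range(values):
    """Return (min, max) of the non-missing values, or None if there are none."""
    present = [v for v in values if v is not None]
    return (min(present), max(present)) if present else None


def coverage(events):
    """Summarise temporal, spatial and depth coverage of a list of event dicts."""
    # Keep only the date part (YYYY-MM-DD); ISO dates sort correctly as strings.
    dates = sorted(e["eventDate"][:10] for e in events if e.get("eventDate"))
    lat = value_range(e.get("decimalLatitude") for e in events)
    lon = value_range(e.get("decimalLongitude") for e in events)
    depths = [
        e.get(key)
        for e in events
        for key in ("minimumDepthInMeters", "maximumDepthInMeters")
    ]
    bbox = None
    if lat is not None and lon is not None:
        bbox = {"south": lat[0], "north": lat[1], "west": lon[0], "east": lon[1]}
    return {
        "start": dates[0] if dates else None,
        "end": dates[-1] if dates else None,
        "years": sorted({d[:4] for d in dates}),
        "months": sorted({d[:7] for d in dates}),
        "dates": sorted(set(dates)),
        "bbox": bbox,
        "depth": value_range(depths),
    }
```

The resulting dictionary holds the values an EML document would need for its temporal and geographic coverage elements.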

Archive

Metafile

  • Create meta.xml with file metadata for the event core (event.tsv) and its extensions (occurrence.tsv, taxonomy.tsv)
  • Set default fields for occurrenceStatus ("present"), basisOfRecord (MO?) and organismQuantityType ("individuals")
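
To make the metafile step concrete, here is a small sketch that builds a meta.xml in the Darwin Core text format with Python's standard `xml.etree.ElementTree`. The column lists are placeholders, not the repository's actual columns, and `add_table` and `build_meta` are hypothetical helpers. Only the two defaults stated unambiguously above are set; basisOfRecord is left out because its value is still marked as open, and taxonomy.tsv would be added the same way as the occurrence extension.

```python
import xml.etree.ElementTree as ET

DWC = "http://rs.tdwg.org/dwc/terms/"


def add_table(archive, tag, row_type, location, id_tag, columns, defaults=None):
    """Append a <core> or <extension> element describing one tab-separated file."""
    table = ET.SubElement(archive, tag, {
        "rowType": DWC + row_type,
        "encoding": "UTF-8",
        "fieldsTerminatedBy": "\\t",   # written literally as \t in meta.xml
        "linesTerminatedBy": "\\n",
        "fieldsEnclosedBy": "",
        "ignoreHeaderLines": "1",
    })
    files = ET.SubElement(table, "files")
    ET.SubElement(files, "location").text = location
    # Column 0 is assumed to hold the event identifier in every file.
    ET.SubElement(table, id_tag, {"index": "0"})
    for index, term in enumerate(columns):
        ET.SubElement(table, "field", {"index": str(index), "term": DWC + term})
    # Defaults apply to every row and have no column index.
    for term, value in (defaults or {}).items():
        ET.SubElement(table, "field", {"term": DWC + term, "default": value})
    return table


def build_meta():
    archive = ET.Element("archive", {"xmlns": "http://rs.tdwg.org/dwc/text/"})
    # Placeholder column lists, for illustration only.
    add_table(archive, "core", "Event", "event.tsv", "id",
              ["eventID", "eventDate"])
    add_table(archive, "extension", "Occurrence", "occurrence.tsv", "coreid",
              ["eventID", "occurrenceID", "scientificName", "organismQuantity"],
              defaults={"occurrenceStatus": "present",
                        "organismQuantityType": "individuals"})
    return ET.tostring(archive, encoding="unicode")
```

Calling `build_meta()` returns the XML as a string, which can then be written to meta.xml next to the TSV files.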

Event Core

  • Reduce occurrences by rolling up to one line per taxon per sample and summing organismQuantity
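
The roll-up described above is essentially a group-and-sum. The sketch below assumes that a sample is identified by eventID, a taxon by scientificName, and that an occurrence without an organismQuantity counts as one individual; those choices, and the name `roll_up`, are assumptions for illustration rather than the pipeline's confirmed behaviour.

```python
from collections import defaultdict


def roll_up(occurrences):
    """Return one record per (eventID, scientificName) with summed organismQuantity."""
    totals = defaultdict(int)
    first_record = {}
    for occurrence in occurrences:
        key = (occurrence["eventID"], occurrence["scientificName"])
        totals[key] += occurrence.get("organismQuantity", 1)
        # Remember the first record of each group as the template for the output row.
        first_record.setdefault(key, occurrence)
    return [
        {**first_record[key], "organismQuantity": total}
        for key, total in totals.items()
    ]
```

Because dictionaries preserve insertion order, the output rows appear in the order in which each sample and taxon combination was first seen.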

Occurrences extension

  • Update the resulting occurrences by appending authorship to the scientific name and merging in relevant fields from the taxonomy (in particular taxonID)
  • Publish NDJSON distribution with zipped Darwin Core archive in Zenodo
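
The authorship and taxonID merge in the occurrences extension might be sketched as below. It assumes that taxonomy records are keyed by the same scientificName string used in the occurrences and that they carry scientificNameAuthorship and taxonID; `merge_taxonomy` is a hypothetical name, and the identifier in the usage example is a made-up placeholder.

```python
def merge_taxonomy(occurrences, taxonomy):
    """Return new occurrence dicts enriched with authorship and taxonID."""
    by_name = {taxon["scientificName"]: taxon for taxon in taxonomy}
    merged = []
    for occurrence in occurrences:
        taxon = by_name.get(occurrence["scientificName"])
        if taxon is None:
            merged.append(dict(occurrence))  # no match: keep the record unchanged
            continue
        name = occurrence["scientificName"]
        authorship = taxon.get("scientificNameAuthorship")
        # Append the authorship once; skip if the name already ends with it.
        if authorship and not name.endswith(authorship):
            name = f"{name} {authorship}"
        merged.append({**occurrence, "scientificName": name, "taxonID": taxon.get("taxonID")})
    return merged


# Usage with a placeholder identifier:
taxonomy = [{
    "scientificName": "Calanus finmarchicus",
    "scientificNameAuthorship": "(Gunnerus, 1770)",
    "taxonID": "urn:example:taxon:1",
}]
occurrences = [{"eventID": "e1", "scientificName": "Calanus finmarchicus"}]
merged = merge_taxonomy(occurrences, taxonomy)
```

Unmatched names are passed through unchanged here so that they can be reported separately instead of being dropped silently.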

Taxonomy

Dependencies

@todo

Project

This project was co-funded by GBIF Norway; see the Data management plan for further details.

About

Reproducible Darwin Core data pipelines for GBIF Norway
