Skip to content

Latest commit

 

History

History
260 lines (221 loc) · 7.82 KB

README.md

File metadata and controls

260 lines (221 loc) · 7.82 KB

GATEWAy database from v.1 to v.2

Emilio Berti

Change log

V.2.0

  1. Removed problematic characters from fields: - , replaced with ; when within ". - " removed. - Multiple spaces trimmed to one space. - Convert all uppercase letters to lowercase.
  2. Add data from Mulder et al. (citation needed).
  3. Add data from Vinagre et al. (citation_needed).
  4. Harmonize taxonomy
  5. Rename columns following the Darwin Core standard: http://rs.tdwg.org/dwc/terms.htm#4.

Additionally, tables and shapefiles are created to be inserted into a PostGIS database to feed the website server.

SQL database

I re-structured the GATEWAy database into <…> tables:

  1. Foodwebs
  2. Species
  3. Interactions

The old database was a single csv file. I colored the columns of this file with the same color of the tables above. The symbol => show columns that have been renamed. E.g., A => B means A was renamed to B, following DarwinCore standards: http://rs.tdwg.org/dwc/terms.htm#4. All column names have been converted to camelCase. Strike-through columns have been dropped.

foodwebName
longitude => decimalLongitude
latitude => decimalLatitude ecosystemType
geographicLocation => locality
studySite
altitude => verbatimElevation
depth => verbatimDepth
samplingTime
sampling.start.year => EarliestDateCollected
sampling.end.year => LatestDateCollected

~~key~~
taxonID,

identified by the combination of all fields related to species
taxonomy => acceptedTaxonName
common => vernacularName
lifeStage
metabolic.type => metabolicType
movement.type => movementType
mass.min.g. => lowestMass, in grams
mass.max.g. => highestMass, in grams
mass.mean.g. => meanMass, in grams
length.min.cm. => shortestLength, in cm
length.max.cm. => longestLength, in cm
length.mean.cm. => meanLength, in cm
size.method => sizeMethod
size.citation => sizeReference
taxonomy.status => taxonomicStatus
taxonomy.level => taxonRank

~~autoid~~
style="color:red">interactionID, identified by unique taxonID and foodwebName
interactionType
interactionDimensionality
link.methodology => interactionMethod
link.citation => interactionReference
interaction.classification => basisOfRecord
link.citation => interactionReference
notes => interactionRemarks

relations

Pipeline

To run the whole pipeline at once:

bash pipeline.sh

This runs (in order):

  1. Clean v.1.0 for encoding/parsing errors: scripts/clean-gateway.
  2. Process new data to add to the database: scripts/mulder.R and scripts/tagus.R.
  3. Extract species names: scripts/extract-species-names.R.
  4. Query the species names against GBIF: harmonize-taxonomy.py.
  5. Combine v.1.0 with new data: scripts/combine.R.
  6. Harmonize taxonomy of the database: harmonize-taxonomy.R.
  7. Saves the new database as gateway-v.2.0.csv.
  8. Rename fields and split table into relational tables: scripts/relational.R.

Some of this steps can take some time. To avoid re-running already completed steps, once the step is completed successfully an hidden (empty) file is added to the steps folder. Steps that have such files will not be re-ran. You can re-run the whole pipeline from scratch specifying the option --clean:

bash pipeline.sh --clean

To see available options and usage: bash pipeline.sh --help.

PostGRES database

Create database

psql -U postgres
CREATE DATABASE econetlab;
\c econetlab;
CREATE EXTENSION postgis;

Foodwebs table

CREATE TABLE foodwebs(
  foodwebID INTEGER PRIMARY KEY,
  foodwebName  VARCHAR,
  decimalLongitude  REAL,
  decimalLatitude  REAL,
  ecosystemType  VARCHAR,
  geographicLocation  VARCHAR,
  studySite  VARCHAR,
  verbatimElevation  REAL,
  verbatimDepth  REAL,
  samplingTime  VARCHAR,
  EarliestDateCollected  VARCHAR,
  LatestDateCollected  VARCHAR
);

COPY foodwebs FROM '/home/eb97ziwi/gateway-database/data/foodwebs.csv' CSV HEADER;

Species table

CREATE TABLE species(
  ID INTEGER PRIMARY KEY,
  acceptedTaxonName VARCHAR,
  taxonRank VARCHAR,
  taxonomicStatus VARCHAR,
  vernacularName VARCHAR,
  taxonomicLevel VARCHAR,
  lifeStage VARCHAR,
  metabolicType VARCHAR,
  movementType VARCHAR,
  lowestMass REAL,
  highestMass REAL,
  meanMass REAL,
  shortestLength REAL,
  longestLength REAL,
  meanLength REAL,
  sizeMethod VARCHAR,
  sizeReference VARCHAR
);

COPY species FROM '/home/eb97ziwi/gateway-database/data/species.csv' CSV HEADER;

Interaction table

CREATE TABLE interactions(
  ID INTEGER PRIMARY KEY,
  foodwebID INTEGER,
  resourceID INTEGER,
  consumerID INTEGER,
  interactionType VARCHAR,
  interactionDimensionality VARCHAR,
  interactionMethod VARCHAR,
  interactionReference VARCHAR,
  interactionRemarks VARCHAR,
  basisOfRecord VARCHAR
);

COPY interactions FROM '/home/eb97ziwi/gateway-database/data/interactions.csv' CSV HEADER;

Table schema

For https::/dbdiagram.io/d

Table foodwebs {
  ID INTEGER [primary key]
  foodwebName  VARCHAR
  decimalLongitude  REAL
  decimalLatitude  REAL
  ecosystemType  VARCHAR
  geographicLocation  VARCHAR
  studySite  VARCHAR
  verbatimElevation  REAL
  verbatimDepth  REAL
  samplingTime  VARCHAR
  EarliestDateCollected  VARCHAR
  LatestDateCollected  VARCHAR
}

Table species {
  ID INTEGER [primary key]
  acceptedTaxonName VARCHAR
  taxonRank VARCHAR
  taxonomicStatus VARCHAR
  vernacularName VARCHAR
  taxonomicLevel VARCHAR
  lifeStage VARCHAR
  metabolicType VARCHAR
  movementType VARCHAR
  lowestMass REAL
  highestMass REAL
  meanMass REAL
  shortestLength REAL
  longestLength REAL
  meanLength REAL
  sizeMethod VARCHAR
  sizeReference VARCHAR
}

Table interactions {
  ID INTEGER [primary key]
  foodwebID INTEGER
  resourceID INTEGER
  consumerID INTEGER
  interactionType VARCHAR
  interactionDimensionality VARCHAR
  interactionMethod VARCHAR
  interactionReference VARCHAR
  interactionRemarks VARCHAR
  basisOfRecord VARCHAR
}

Ref: foodwebs.ID < interactions.foodwebID
Ref: species.ID < interactions.consumerID
Ref: species.ID < interactions.resourceID

Generate this HTML Readme

Rscript --vanilla -e "rmarkdown::render('README.Rmd')"
pandoc -s --toc -c readme.css README.md -o README.html --metadata title="GATEWAy Database"