Nicole Coleman edited this page Jun 2, 2021 · 54 revisions

Species Occurrence Phase Two (interim)

June 2 until the next scheduled spike

Objective: Prepare and analyze data sets, including a graph database, in preparation for training a model to identify species occurrences and habitat in student papers.

  1. Extract species names from papers using GNRD
  2. Improve our NER model to find locations more reliably in source papers
  3. Filter WoRMS or GBIF by the resulting extract
  4. Map species occurrences in GBIF or WoRMS to RDF
    • How accurate is the location?
    • Can we associate lat/lon with habitat?
    • Can habitat be determined? Found in Wikipedia?
  5. Run a series of stats on our source data. For example:
    1. Distribution (# of papers) per institution, by year (or an even more specific date)
    2. Using #1, plot the species names against paper dates
    3. Generate a network graph of relationships between species, location, and habitat
    4. Plot papers and species by institution location (see Figure 1)
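Step 1 above can be sketched as a small parser over a GNRD-style response. The field names (`names`, `scientificName`, `offsetStart`) are assumptions about the GNRD name-finder JSON shape, and the sample response is illustrative, not real output:

```python
import json

# Hypothetical GNRD-style response; the field names are assumptions based on
# the GNRD name-finder JSON output, not verified against the live service.
sample_response = json.loads("""
{
  "names": [
    {"scientificName": "Pisaster ochraceus", "offsetStart": 120, "offsetEnd": 138},
    {"scientificName": "Pisaster ochraceus", "offsetStart": 410, "offsetEnd": 428},
    {"scientificName": "Strongylocentrotus purpuratus", "offsetStart": 560, "offsetEnd": 589}
  ]
}
""")

def unique_names(response):
    """Return the distinct scientific names found in one paper, in first-seen order."""
    seen = []
    for hit in response.get("names", []):
        name = hit.get("scientificName")
        if name and name not in seen:
            seen.append(name)
    return seen

print(unique_names(sample_response))
# ['Pisaster ochraceus', 'Strongylocentrotus purpuratus']
```

The de-duplicated list per paper is what would feed the WoRMS/GBIF filtering in step 3.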

Species Occurrence Phase One

March 22-26, 2019 Daily Standup: 10:00-10:30am PDT (15-30 minutes)

Objective:

TAXA, a species occurrence verifier: a prototype records-verification application that allows human oversight and verification of machine-identified species occurrences. The result will be a list of species occurrences extracted automatically from student papers. Readers will be able to approve or edit records. Approved records can be output (as a CSV file) for upload to the Hopkins GBIF node. Visualizations to aid evaluation (if time allows):

  • Contextual annotated text
  • Mapped locations
  • Data lookup in WoRMS, GBIF, and/or Wikidata
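The CSV hand-off to the Hopkins GBIF node could look like the minimal sketch below. The column names follow Darwin Core terms, but the record shape itself is an assumption about what the verifier would emit:

```python
import csv
import io

# Approved records as they might come out of the verifier; the column names
# follow Darwin Core terms, but the record shape itself is an assumption.
approved = [
    {"scientificName": "Pisaster ochraceus", "eventDate": "1972-07-14",
     "decimalLatitude": 36.62, "decimalLongitude": -121.90},
]

def records_to_csv(records):
    """Serialize approved occurrence records to a Darwin-Core-style CSV string."""
    fieldnames = ["scientificName", "eventDate", "decimalLatitude", "decimalLongitude"]
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(records)
    return buf.getvalue()

print(records_to_csv(approved))
```

Keeping the writer column-driven means new Darwin Core fields (e.g. habitat, once step 4 of phase two resolves it) only require extending `fieldnames`.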


Work Plan

Step 1 - First output March 22-23

  • Output a list of genus/species, time, and location
  • Contextual text annotation viewer
  • Refined mock-ups and final output of the verifier

Step 2 - First review March 23

  • Initial model card, and possibly a datasheet
  • Evaluate genus/species identification
  • How is WoRMS working? Are we having success? Do we need to modify our approach?

Evaluate time

  • What kind of time indication do we get from the content, if any? How often are we relying on the metadata?
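One way to measure this is to scan the paper body for explicit dates and compare the hit rate against metadata-only records. This is a minimal sketch; the regex patterns are assumptions about how students wrote dates, not a full date parser:

```python
import re

# Pull explicit dates out of the paper body so we can see how often a record's
# time comes from content rather than from file metadata. The patterns are
# assumptions about common date styles, not a complete parser.
DATE_PATTERNS = [
    re.compile(r"\b\d{1,2}\s+(?:January|February|March|April|May|June|July|"
               r"August|September|October|November|December)\s+\d{4}\b"),
    re.compile(r"\b(?:19|20)\d{2}\b"),  # bare year as a coarse fallback
]

def content_dates(text):
    """Return every date-like string found in the text, most specific pattern first."""
    hits = []
    for pattern in DATE_PATTERNS:
        hits.extend(pattern.findall(text))
    return hits

sample = "Specimens were collected on 14 July 1972 at Hopkins Marine Station."
print(content_dates(sample))
```

Papers where `content_dates` returns nothing are the ones forcing us back onto metadata, which answers the second question directly.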

Step 3 - Location - display / lookup Map March 24-25

Step 4 - Review/evaluation of the data mapping March 24-25

Step 5 - GBIF API integration March 26

Prepare output to GBIF and learn how GBIF data can help

Step 6 - Documentation (throughout the process)

Move notes and outcomes to the Jupyter Book

Topics

Models and Methods

Data sources and knowledge graphs

SPOC Verifier