Nicole Coleman edited this page Jun 2, 2021 · 54 revisions

Species Occurrence Phase Two (interim)

June 2 until the next scheduled spike

Objective: Prepare and analyze data sets, including a graph database, in preparation for training a model to identify species occurrences and habitat in student papers.

  1. Extract species names from papers using GNRD
  2. Improve our NER model to find locations more reliably in source papers
  3. Filter WoRMS or GBIF by the resulting extract
  4. Map species occurrences in GBIF or WoRMS to RDF
    • How accurate is the location?
    • Can we associate lat/lon with habitat?
    • Can habitat be determined? Found in Wikipedia?
  5. Run a series of stats on our source data. For example:
    1. Distribution (# of papers) per institution, by year (or an even more specific date)
    2. Using #1, plot the species names against paper dates
    3. Generate a network graph of relationships between species, location, and habitat
    4. Plot papers and species by institution location (see Figure 1)
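Step 1 above can be sketched as a small parser over a GNRD-style response. The field names (`names`, `scientificName`, `offsetStart`) are assumptions about the GNRD name-finder JSON shape, and the sample response is illustrative, not real output:

```python
import json

# Hypothetical GNRD-style response; the field names are assumptions based on
# the GNRD name-finder JSON output, not verified against the live service.
sample_response = json.loads("""
{
  "names": [
    {"scientificName": "Pisaster ochraceus", "offsetStart": 120, "offsetEnd": 138},
    {"scientificName": "Pisaster ochraceus", "offsetStart": 410, "offsetEnd": 428},
    {"scientificName": "Strongylocentrotus purpuratus", "offsetStart": 560, "offsetEnd": 589}
  ]
}
""")

def unique_names(response):
    """Return the distinct scientific names found in one paper, in first-seen order."""
    seen = []
    for hit in response.get("names", []):
        name = hit.get("scientificName")
        if name and name not in seen:
            seen.append(name)
    return seen

print(unique_names(sample_response))
# ['Pisaster ochraceus', 'Strongylocentrotus purpuratus']
```

The de-duplicated list per paper is what would feed the WoRMS/GBIF filtering in step 3.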

Species Occurrence Phase One

March 22-26, 2019 Daily Standup: 10:00-10:30am PDT (15-30 minutes)

Objective:

TAXA, a species occurrence verifier: a prototype records-verification application that allows human oversight and verification of machine-identified species occurrences. The result will be a list of species occurrences extracted automatically from student papers. Readers will be able to approve or edit records. Approved records can be output (as a CSV file) for upload to the Hopkins GBIF node. Visualizations to aid evaluation (if time allows):

  • Contextual annotated text
  • Mapped locations
  • Data lookup in WoRMS, GBIF, and/or Wikidata
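The CSV hand-off to the Hopkins GBIF node could look like the minimal sketch below. The column names follow Darwin Core terms, but the record shape itself is an assumption about what the verifier would emit:

```python
import csv
import io

# Approved records as they might come out of the verifier; the column names
# follow Darwin Core terms, but the record shape itself is an assumption.
approved = [
    {"scientificName": "Pisaster ochraceus", "eventDate": "1972-07-14",
     "decimalLatitude": 36.62, "decimalLongitude": -121.90},
]

def records_to_csv(records):
    """Serialize approved occurrence records to a Darwin-Core-style CSV string."""
    fieldnames = ["scientificName", "eventDate", "decimalLatitude", "decimalLongitude"]
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(records)
    return buf.getvalue()

print(records_to_csv(approved))
```

Keeping the writer column-driven means new Darwin Core fields (e.g. habitat, once step 4 of phase two resolves it) only require extending `fieldnames`.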


Work Plan

Step 1 - First output March 22-23

  • Output a list of genus/species, time, and location
  • Contextual text annotation viewer
  • Refined mock-ups and final output of the verifier

Step 2 - First review March 23

  • Initial model card, and possibly a datasheet
  • Evaluate genus/species identification
  • How is WoRMS working? Are we having success? Do we need to modify our approach?

Evaluate time

  • What kind of time indication do we get from the content, if any? How often are we relying on the metadata?
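One way to measure this is to scan the paper body for explicit dates and compare the hit rate against metadata-only records. This is a minimal sketch; the regex patterns are assumptions about how students wrote dates, not a full date parser:

```python
import re

# Pull explicit dates out of the paper body so we can see how often a record's
# time comes from content rather than from file metadata. The patterns are
# assumptions about common date styles, not a complete parser.
DATE_PATTERNS = [
    re.compile(r"\b\d{1,2}\s+(?:January|February|March|April|May|June|July|"
               r"August|September|October|November|December)\s+\d{4}\b"),
    re.compile(r"\b(?:19|20)\d{2}\b"),  # bare year as a coarse fallback
]

def content_dates(text):
    """Return every date-like string found in the text, most specific pattern first."""
    hits = []
    for pattern in DATE_PATTERNS:
        hits.extend(pattern.findall(text))
    return hits

sample = "Specimens were collected on 14 July 1972 at Hopkins Marine Station."
print(content_dates(sample))
```

Papers where `content_dates` returns nothing are the ones forcing us back onto metadata, which answers the second question directly.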

Step 3 - Location - display / lookup Map March 24-25

Step 4 - Review/evaluation of the data mapping March 24-25

Step 5 - GBIF API integration March 26

Prepare output to GBIF and learn how GBIF data can help

Step 6 - Documentation (throughout the process)

Move notes and outcomes to the Jupyter Book

Topics

Models and Methods

Data sources and knowledge graphs

SPOC Verifier