Home
June 2 - ? Until next scheduled spike
Objective: Prepare and analyze data sets, including a graph database, in preparation for training a model to identify species occurrences and habitat in student papers.
- Extract species names from papers using GNRD
- Improve our NER model to find locations more reliably in source papers
- Filter WoRMS or GBIF by resulting extract
- Map species occurrences in GBIF or WoRMS to RDF
- How accurate is location?
- Can we associate lat lon with habitat?
- Can habitat be determined? Found in Wikipedia?
- Run a series of stats on our source data. For example:
  1. Distribution (# of papers) per institution, by year (or even more specific date)
  2. Using #1, plot the species names against paper dates
  3. Generate network graph of relationships between species, location, habitat
  4. Plot papers and species by institution location (see Figure 1)

Figure 1
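The stats listed above could start from something like the sketch below. It is only a rough starting point: the record fields (`paper_id`, `institution`, `year`, `species`, `location`, `habitat`), the edge labels (`found_at`, `lives_in`), and the sample rows are all made up for illustration and are not our actual schema or data. It counts papers per institution and year, collects the years each species name appears in, and builds weighted edges that a graph tool could load later.

```python
from collections import Counter

# Hypothetical sample records: one dict per (paper, species mention).
# Field names and values are illustrative only, not the project's real schema.
records = [
    {"paper_id": "p1", "institution": "Institution A", "year": 1975,
     "species": "Pisaster ochraceus", "location": "Monterey Bay",
     "habitat": "rocky intertidal"},
    {"paper_id": "p1", "institution": "Institution A", "year": 1975,
     "species": "Mytilus californianus", "location": "Monterey Bay",
     "habitat": "rocky intertidal"},
    {"paper_id": "p2", "institution": "Institution A", "year": 1976,
     "species": "Pisaster ochraceus", "location": "Monterey Bay",
     "habitat": "rocky intertidal"},
    {"paper_id": "p3", "institution": "Institution B", "year": 1975,
     "species": "Mytilus californianus", "location": "Bodega Bay",
     "habitat": "rocky intertidal"},
]


def papers_per_institution_year(records):
    """Count distinct papers for each (institution, year) pair."""
    seen = {(r["institution"], r["year"], r["paper_id"]) for r in records}
    return Counter((inst, year) for inst, year, _paper in seen)


def species_by_year(records):
    """Map each species name to the sorted list of years it appears in."""
    years = {}
    for r in records:
        years.setdefault(r["species"], set()).add(r["year"])
    return {name: sorted(ys) for name, ys in years.items()}


def relationship_edges(records):
    """Build weighted (source, relation, target) edges for a network graph."""
    edges = Counter()
    for r in records:
        edges[(r["species"], "found_at", r["location"])] += 1
        edges[(r["species"], "lives_in", r["habitat"])] += 1
    return edges


if __name__ == "__main__":
    for key, n in sorted(papers_per_institution_year(records).items()):
        print(key, n)
    print(species_by_year(records))
    for edge, weight in sorted(relationship_edges(records).items()):
        print(edge, weight)
```

The counters are plain dictionaries, so they can be passed to whatever plotting or graph library we settle on without committing to one here.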
March 22-26, 2019 Daily Standup: 10:00-10:30am PDT (15-30 minutes)
Objective:
TAXA, a species occurrence verifier: a prototype records verification application that allows human oversight and verification of machine-identified species occurrences. The result will be a list of species occurrences extracted automatically from student papers. Readers will be able to approve or edit records. Approved records can be output as a CSV file for upload to the Hopkins GBIF node. Visualizations to aid the evaluation (if time allows):
- contextual annotated text
- mapped locations.
- Data lookup in WoRMS, GBIF, and/or Wikidata
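A minimal sketch of the approved-records CSV output described above, using only the Python standard library. The column headers are Darwin Core term names; which columns the Hopkins GBIF node actually expects is an assumption here, and the `approved` flag, the function name `export_approved`, and the sample records are hypothetical.

```python
import csv
import io

# Darwin Core term names used as column headers.
# Assumption: the exact columns required for upload still need to be confirmed.
COLUMNS = ["scientificName", "eventDate", "locality",
           "decimalLatitude", "decimalLongitude"]

# Hypothetical records as they might leave the verifier; values are made up.
sample_records = [
    {"scientificName": "Pisaster ochraceus", "eventDate": "1975-06-02",
     "locality": "Monterey Bay", "decimalLatitude": 36.62,
     "decimalLongitude": -121.9, "approved": True},
    {"scientificName": "Mytilus californianus", "eventDate": "1975-06-02",
     "locality": "Monterey Bay", "decimalLatitude": 36.62,
     "decimalLongitude": -121.9, "approved": False},
]


def export_approved(records, out):
    """Write reviewer-approved records to `out` as CSV; return rows written."""
    # extrasaction="ignore" drops keys (such as "approved") not in COLUMNS.
    writer = csv.DictWriter(out, fieldnames=COLUMNS, extrasaction="ignore")
    writer.writeheader()
    written = 0
    for rec in records:
        if rec.get("approved") is True:
            writer.writerow(rec)
            written += 1
    return written


if __name__ == "__main__":
    buffer = io.StringIO()
    count = export_approved(sample_records, buffer)
    print(f"{count} approved record(s) exported")
    print(buffer.getvalue())
```

Writing to any file-like object keeps the function easy to test; for a real export, pass a file opened with `newline=""` as the `csv` module documentation recommends.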
Communication channels/Quick links:
- Google Doc for daily stand-up notes
- #ai-marinetext (Slack): Zoom link, GitHub updates and quick exchanges
- GitHub Project: Write issues, annotate and track progress
- Wiki (here): work plan and project timeline
- Jupyter Book: Documentation
- SPOC Drive: Student papers, locations, habitat, species
- Phase 1 Retrospective
- Output a list of genus/species, time, location
- Contextual Text annotation viewer
- Refined mock-ups and final output of verifier.
- Initial Model card. Maybe Data sheet?
- Evaluate genus/species identification
- How is WoRMS working? Having success? Do we need to modify?
Evaluate time
- What kind of time indication do we get from the content, if any? How often are we relying on the metadata?
- https://deck.gl/
- Yes/No annotation to the record
Prepare output to GBIF and learn how GBIF data can help
Move notes and outcomes to the Jupyter Book
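For the genus/species evaluation and the yes/no annotation listed above, one simple measure is the share of reviewed records that a reader approved. The sketch below assumes each record carries an annotation of `"yes"`, `"no"`, or `None` (not yet reviewed); that encoding and the function name `approval_rate` are hypothetical. Note that this only approximates precision: species the model missed never appear in the list, so it says nothing about recall.

```python
def approval_rate(annotations):
    """Return the fraction of reviewed records marked "yes".

    Records that are not yet reviewed (anything other than "yes" or "no")
    are skipped. Returns None when nothing has been reviewed.
    """
    reviewed = [a for a in annotations if a in ("yes", "no")]
    if not reviewed:
        return None
    return sum(a == "yes" for a in reviewed) / len(reviewed)


if __name__ == "__main__":
    # Hypothetical annotations for four machine-identified records.
    print(approval_rate(["yes", "no", "yes", None]))
```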