Home
Welcome to the CEVOpen wiki!
Main components of intern activity:
- technology (getpapers, ami, Wikidata/SPARQL) for search
- dictionaries (approximately 1 existing dictionary and 1 new dictionary)
- miniproject - chemotype, genotype, activities (medicinal), phenotype - invasive species
- integration - how these fit together - an atlas
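As a rough illustration of the Wikidata/SPARQL search component, the sketch below builds a query that fetches the labels of one Wikidata item in several languages. The function names `build_label_query` and `build_request_url` are hypothetical and not part of getpapers or ami; the item ID is a parameter, and the example only constructs the request rather than sending it. It assumes the public Wikidata Query Service endpoint, which predefines the `wd:` and `rdfs:` prefixes.

```python
from urllib.parse import urlencode

# Public Wikidata Query Service endpoint.
WIKIDATA_SPARQL_ENDPOINT = "https://query.wikidata.org/sparql"


def build_label_query(qid, languages):
    """Return a SPARQL query for the labels of one item in the given languages.

    Hypothetical helper for illustration only.
    """
    # Guard against malformed IDs being pasted into the query text.
    if not (qid.startswith("Q") and qid[1:].isdigit()):
        raise ValueError(f"not a Wikidata item ID: {qid!r}")
    for lang in languages:
        if not lang.replace("-", "").isalnum():
            raise ValueError(f"not a language code: {lang!r}")

    lang_list = ", ".join(f'"{lang}"' for lang in languages)
    return (
        "SELECT ?lang ?label WHERE {\n"
        f"  wd:{qid} rdfs:label ?label .\n"
        "  BIND(LANG(?label) AS ?lang)\n"
        f"  FILTER(?lang IN ({lang_list}))\n"
        "}"
    )


def build_request_url(query):
    """Return a GET URL that asks the endpoint for JSON results."""
    return WIKIDATA_SPARQL_ENDPOINT + "?" + urlencode(
        {"query": query, "format": "json"}
    )


if __name__ == "__main__":
    # "Q42" is only a placeholder; substitute the item for a plant or compound.
    print(build_request_url(build_label_query("Q42", ["en", "hi", "es"])))
```

The resulting URL can be fetched with any HTTP client; each result row would pair a language code with the label in that language, which is the raw material for multilingual dictionary entries.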
The aim is to build a multilingual semantic Atlas of Volatile Phytochemistry [1].
A further aim is to build Open Source multiplatform tools which can discover, aggregate, clean, and semantify scholarly documents containing significant amounts of information on phytochemical VOCs [2]. Documents will describe the extraction and assay of oils, optionally with properties and activities. The tools include:
- APIs for repositories such as EPMC, bioRxiv preprints, and thesis collections.
- Scrapers for semi-structured sites such as journals
- standardised metadata (e.g. JATS)
- PDF and HTML readers => XML or JSON
- article sectioning (e.g. into JATS categories)
- extraction of floats (tables, maps, images, diagrams, chemistry, maths*)
- display and navigation of sections in a paper
- aggregated statistics and machine learning
- multilingual annotation (using Wikidata)
- linking to the Wikidata knowledge graph
- Coordination of EO-related and general dictionaries - conformance to a common standard.
- Validation of gold-standard minicorpora (e.g. for training and validating machine learning)
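To make the first item in the list above concrete, here is a minimal sketch of how a search request to the Europe PMC (EPMC) REST API might be assembled. The helper `build_epmc_search_url` is hypothetical and is not how getpapers is implemented; the search terms are example inputs. The sketch assumes the public `search` endpoint with its `query`, `format`, and `pageSize` parameters and the `OPEN_ACCESS` search field, and it only builds the URL without sending a request.

```python
from urllib.parse import urlencode

# Europe PMC RESTful search endpoint.
EPMC_SEARCH = "https://www.ebi.ac.uk/europepmc/webservices/rest/search"


def build_epmc_search_url(terms, page_size=25, open_access_only=True):
    """Return a search URL that requires every phrase in `terms`.

    Hypothetical helper for illustration only.
    """
    if not terms:
        raise ValueError("at least one search term is required")

    # Quote each term so multi-word phrases are matched as phrases.
    query = " AND ".join(f'"{term}"' for term in terms)
    if open_access_only:
        # Restrict results to articles flagged as open access.
        query += " AND OPEN_ACCESS:y"

    params = {"query": query, "format": "json", "pageSize": page_size}
    return EPMC_SEARCH + "?" + urlencode(params)


if __name__ == "__main__":
    print(build_epmc_search_url(["essential oil", "invasive species"], page_size=10))
```

Fetching the URL with any HTTP client should return a JSON page of article metadata, which later steps (sectioning, annotation, linking to Wikidata) could then consume.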
[*] Not included in CEVOpen, but could be added in future.
[1] We need an engaging title. "Atlas" is often extended beyond maps (e.g. Atlas of the Human Body). For example, plantPart is an atlas of the plant. It works for me but may confuse others. Here are some ideas: "Compendium of ...", "Semantic Essence of Phytochemistry". I like the second one: it's a play on words, since "essence" means both central meaning and volatiles. But please think creatively.
[2] Volatile Organic Compound