This project implements three custom RDF extractors based on Stanford's CoreNLP library.
CoreNLPMentionRDFExtractor extracts named entity mentions, with the same output format as Stardog's entities extractor.
CoreNLPEntityLinkerRDFExtractor extracts entity mentions and links them to existing resources in a knowledge graph, with the same output format as Stardog's linker extractor.
CoreNLPRelationRDFExtractor extracts relations between named entity mentions. For example, the sentence:
The Orioles are a professional baseball team based in Baltimore
will generate three triples:
entity:e435cd0347642bc7d2736155815a54e2 rdfs:label "Orioles"
entity:eb3cdb4e267d28feebb638711f8bd7b1 rdfs:label "Baltimore"
iri:e435cd0347642bc7d2736155815a54e2 relation:org:city_of_headquarters iri:eb3cdb4e267d28feebb638711f8bd7b1
- Download the latest release
- Add the jar to Stardog's classpath:
  - Copy it to server/ext or another folder in the server (e.g., server/dbms), OR
  - Point the environment variable STARDOG_EXT to its folder
- Restart the Stardog server
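The installation steps above can be sketched as a small shell helper. This is only an illustration: the function name install_extractor, the jar file name, and the /opt/stardog path are placeholders, not names defined by this project, so substitute the values from your own setup.

```shell
# Hypothetical helper: copy a downloaded release jar into a folder
# that is on Stardog's classpath (e.g., server/ext).
install_extractor() {
  jar="$1"      # path to the downloaded jar (name is an assumption)
  ext_dir="$2"  # target folder, e.g. <stardog install>/server/ext

  # Fail early if the jar is missing instead of copying nothing.
  [ -f "$jar" ] || { echo "jar not found: $jar" >&2; return 1; }

  mkdir -p "$ext_dir" && cp "$jar" "$ext_dir/"
}

# Example (placeholder paths), followed by a server restart:
#   install_extractor ./bites-corenlp.jar /opt/stardog/server/ext
#   stardog-admin server stop && stardog-admin server start
```

If you prefer the STARDOG_EXT route, export that variable with the folder holding the jar before starting the server instead of copying the file.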
CoreNLPMentionRDFExtractor, CoreNLPEntityLinkerRDFExtractor, and CoreNLPRelationRDFExtractor will be available as RDF extractors, accessible through the CLI, API, and HTTP interfaces.
For example, using the CLI, if you want to add a document to BITES and extract its entities:
stardog doc put --rdf-extractors CoreNLPMentionRDFExtractor myDatabase document.pdf
CoreNLP models can consume large amounts of system memory. If you see a GC overhead limit exceeded error when using any of the extractors, increase the amount of memory available to Stardog.
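One way to raise the memory limit, assuming a standard Stardog installation whose startup script reads JVM options from the STARDOG_SERVER_JAVA_ARGS environment variable, is sketched below. The sizes are illustrative assumptions, not recommendations from this project; choose values that fit your machine and the CoreNLP models you load.

```shell
# Assumed sizes for illustration only. Setting this variable replaces
# the default JVM options, so heap and direct memory are both listed.
export STARDOG_SERVER_JAVA_ARGS="-Xms4g -Xmx4g -XX:MaxDirectMemorySize=2g"

# Restart so the new JVM settings take effect
# (skipped when the Stardog CLI is not on the PATH).
if command -v stardog-admin >/dev/null 2>&1; then
  stardog-admin server stop
  stardog-admin server start
fi
```

The GC overhead limit exceeded error is a heap problem, so -Xmx is the setting that matters most here.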
- Tweak build.gradle to the language of your choice (e.g., change CoreNLP dependency to models-spanish)
- Run gradlew clean shadowJar for a single jar, or gradlew clean copyDeps for individual dependencies
- Add files in build/libs to Stardog's classpath
- Restart the Stardog server
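The build steps above might look like this when run from the project root. This is a sketch under assumptions: EXT_DIR is a hypothetical variable, and /opt/stardog/server/ext is only a fallback guess used when STARDOG_EXT is not set.

```shell
# Run from the project root after editing build.gradle.
# EXT_DIR falls back to an assumed install path when STARDOG_EXT is unset.
EXT_DIR="${STARDOG_EXT:-/opt/stardog/server/ext}"

if [ -x ./gradlew ]; then
  # shadowJar builds a single jar; swap in copyDeps for individual dependencies.
  # The copy only runs if the build succeeds.
  ./gradlew clean shadowJar \
    && mkdir -p "$EXT_DIR" \
    && cp build/libs/*.jar "$EXT_DIR/"
else
  echo "gradlew not found; run this from the project root" >&2
fi
```

Restart the Stardog server afterwards so the rebuilt extractors are picked up.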