Constructing Knowledge Graphs from Unstructured Text

rayhunter/KnowledgeGraph_Builder

 
 

KnowledgeGraph_Builder

KnowledgeGraph_Builder is an end-to-end pipeline that constructs knowledge graphs, expressed as RDF triples, from unstructured text without supervision. It uses spaCy, Stanford CoreNLP, NeuralCoref from Hugging Face, and sentence-transformers to perform Named Entity Recognition, Coreference Resolution, Relation Extraction, and Entity Linking. The goal is to offer a novel knowledge representation method for automated ontology construction and taxonomy expansion. The intuition is that extracting knowledge directly from text sources helps address the low coverage often seen in hand-crafted knowledge bases.
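
To make the "RDF triples" target concrete, here is a minimal sketch of how one (entity1, relation, entity2) tuple could be written as an N-Triples line. It is an illustration only, not the project's own serializer: the `BASE` namespace, the `to_iri` and `to_ntriple` helpers, and the sample triple (hand-written from the example text, not actual pipeline output) are all assumptions.

```python
from urllib.parse import quote

# Hypothetical namespace for illustration; the project may use a different one.
BASE = "http://example.org/"


def to_iri(label: str) -> str:
    """Turn a free-text label into an IRI under the hypothetical BASE namespace."""
    # Replace spaces with underscores, then percent-encode anything unsafe.
    return f"<{BASE}{quote(label.strip().replace(' ', '_'))}>"


def to_ntriple(subject: str, relation: str, obj: str) -> str:
    """Format one (subject, relation, object) tuple as an N-Triples statement."""
    return f"{to_iri(subject)} {to_iri(relation)} {to_iri(obj)} ."


# Hand-written triple based on the example text, not real pipeline output.
line = to_ntriple("Darth Vader", "created by", "George Lucas")
print(line)
```

Minting one IRI per surface form is the simplest possible choice; an entity-linking step would normally map different mentions of the same entity to a single identifier before serialization.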

High-Level System

[Image: pipeline]

Example

Input Text:
Darth Vader, also known by his birth name Anakin Skywalker, is a fictional character in the Star Wars franchise. Darth Vader appears in the original film trilogy as a pivotal antagonist whose actions drive the plot, while his past as Anakin Skywalker and the story of his corruption are central to the narrative of the prequel trilogy. The character was created by George Lucas and has been portrayed by numerous actors. His appearances span the first six Star Wars films, as well as Rogue One, and his character is heavily referenced in Star Wars: The Force Awakens. He is also an important character in the Star Wars expanded universe of television series, video games, novels, literature and comic books. Originally a Jedi who was prophesied to bring balance to the Force, he falls to the dark side of the Force and serves the evil Galactic Empire at the right hand of his Sith master, Emperor Palpatine (also known as Darth Sidious).

Output Graph:
[Image: graph]

Requirements

  • python >=3.7
  • spacy
  • pandas
  • stanford-corenlp
  • json
  • nltk
  • miniconda

Setup

  1. Create a new conda environment:
    conda env create -f environment.yml
  2. Activate the environment:
    conda activate entitylink
  3. Download CoreNLP 4.2.0 and place it in the root folder of this repo:
    https://stanfordnlp.github.io/CoreNLP/download.html

Usage

  1. Add text of your choice to input.txt
  2. Run python text_to_graph.py input.txt
  3. The output is written to output_processed.csv, which contains triples of (entity1, relation, entity2)

An example input and its output are included in the examples folder.
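
Once the triples are on disk, they can be loaded into a simple in-memory graph. The sketch below assumes the CSV has exactly three columns in the order (entity1, relation, entity2) and no header row; the actual layout of output_processed.csv may differ (for example, a header or an index column), so adjust accordingly. The helper names `load_triples` and `build_adjacency` and the sample rows are hypothetical, not part of the project.

```python
import csv
import io
from collections import defaultdict


def load_triples(handle):
    """Read (entity1, relation, entity2) rows, skipping blank or short rows."""
    triples = []
    for row in csv.reader(handle):
        if len(row) < 3:
            continue  # ignore malformed lines rather than failing
        triples.append(tuple(cell.strip() for cell in row[:3]))
    return triples


def build_adjacency(triples):
    """Group outgoing (relation, entity2) edges by their entity1 node."""
    graph = defaultdict(list)
    for head, relation, tail in triples:
        graph[head].append((relation, tail))
    return dict(graph)


# Hand-written sample rows standing in for output_processed.csv (assumed layout).
sample = io.StringIO(
    "Darth Vader,created by,George Lucas\n"
    "Darth Vader,serves,Galactic Empire\n"
)
graph = build_adjacency(load_triples(sample))
print(graph["Darth Vader"])
```

To read the real file, replace the `io.StringIO` sample with `open("output_processed.csv", newline="", encoding="utf-8")`.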

Example Notebooks

Example notebooks are in the notebooks folder:

  • NER_Evaluation.ipynb
  • CoRef_Evaluation.ipynb
  • Relation_Extraction.ipynb
  • EntityLinking_Evaluation.ipynb
  • Notebook_Text_to_Graph_Pipeline.ipynb
  • Notebook_SpaCy_Parsing_OpenIE_BERT_Evaluation.ipynb

For more information on this project, check out the Wiki.
