Skip to content

usc-isi-i2/hunger-for-knowledge

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

65 Commits
 
 
 
 
 
 

Repository files navigation

Enriching Wikidata with Linked Open Data

This is the official repository for paper "Enriching Wikidata with Linked Open Data".

Data

All the data we used and findings we got can be downloaded from this google drive folder. The usage and information of each data file can be found in each notebook's environment path step.

.
|__ input  // original data files for Wikidata, DBpedia and Getty
|   |
|   |__ wikidata
|   |   |__ claims.tsv.gz
|   |   |__ labels.en.tsv.gz
|   |   |__ derived.P31.tsv.gz
|   |   |__ derived.P279star.tsv.gz
|   |   |__ value_type_constraint.json
|   |
|   |__ dbpedia
|   |   |__ infobox-properties_lang=en_2021_12_01.ttl.bz2
|   |   |__ sitelink.20211027.tsv.gz
|   |
|   |__ getty
|       |__ AAT_explicit.zip
|       |__ ULAN_explicit.zip
|       |__ TGN_explicit.zip
|
|__ intermediate files  // graphs after importing by kgtk
|   |
|   |__ dbpedia
|   |   |__ wikidata_infobox.zip
|   |
|   |__ getty
|       |__ ULAN
|       |   |__ explicit.zip
|       |   |__ subgraphs.zip
|       |   |__ wiki.align.tsv
|       |__ TGN
|           |__ explicit.zip
|           |__ wiki.align.tsv
|
|__ output  // output results, mainly the validated novel and statistics
    |
    |__ dbpedia
    |   |__ novel.zip                           // validated enriched statements 
    |   |__ property_mapping.json               // property mapping results
    |   |__ property_mapping_ground_truth.json  // ground truth property mappings
    |   |__ statement.statistics.json           // stats count in statement
    |   |__ entity.statistics.json              // stats count in entity
    |   |__ annotation.tsv                      // annotation and prediction
    |
    |__ getty
    |   |__ novel.zip                           // validated enriched statements
    |   |__ statement.statistics.json           // stats count in statement
    |   |__ entity.statistics.json              // stats count in entity
    |   |__ annotation.tsv                      // annotation and prediction
    |
    |__ agree                                   // overlapping entity-property values
    |   |__ wikidata.getty.P19.tsv              // place of birth
    |   |__ wikidata.getty.P20.tsv              // place of death
    |   
    |__ literals                                // literals enrichment
        |__ new.results.P569.tsv                // date of birth
        |__ new.results.P570.tsv                // date of death

There are two ways to use the above data and run our examples:

  1. Run from scratch by using data in ./input folder and run our notebooks in ./import, it will generate the same data in ./intermediate which are graphs imported by kgtk.
  2. Skip the import step, directly use data in ./intermediate folder and run the examples.

Note:

In each notebook in examples, there are cells after the import cell that one can adjust the path to the files. The users can match the file path according to the specific file names.

Findings

Here is the correspondence between our findings and the files:

  • Finding 1: output/dbpedia/*.statistics.json, output/getty*.statistics.json, all novel results (enriched statements) are in novel.zip;
  • Finding 2: output/dbpedia/annotation.tsv and output/getty/annotation.tsv;
  • Finding 3: output/dbpedia/property_mapping.json and property_mapping_ground_truth.json;
  • Finding 4: the same files for Finding 1 and Finding 2;
  • Finding 5: output/agree;
  • Finding 6: output/literals.

Requirements

We use Knowledge Graph Toolkit dev branch to implement our procedures.

Notebooks

The following file tree shows the structure of our code, which includes import notebooks and enrichment notebooks, for our two external graphs: DBpedia and Getty.

.
|__ import  // import external graphs
|   |__ build_wikidata_infobox.ipynb  // import DBpedia
|   |__ import_getty_vocab.ipynb      // import Getty
|
|__ examples  // enrichment notebooks
    |
    |__ dbpedia 
    |   |__ batch_query_procedure.ipynb  // enrichment for all properties with value-type constraint
    |   |__ founding_year_of_university.ipynb  
    |   |__ industry_of_company.ipynb  
    |   |__ movies_with_cost.ipynb  
    |   |__ spouse_of_politicians.ipynb  
    |
    |__ getty
        |__ getty_birthdate_query.ipynb    // enrichment for P569
        |__ getty_birthplace_query.ipynb   // enrichment for P19
        |__ getty_deathdate_query.ipynb    // enrichment for P570
        |__ getty_deathplace_query.ipynb   // enrichment for P20
        |__ getty_gender_query.ipynb       // enrichment for P21
        |__ getty_nationality_query.ipynb  // enrichment for P27
        |__ getty_query_procedure.ipynb    // example of enrichment for a subset of Wikidata (not included in our paper)

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published