Knowledge graph containing the catalog of software from the oeg-upm organization in GitHub

oeg-upm/oeg-software-graph

oeg-software-graph

Project Status: Concept – Minimal or no implementation has been done yet, or the repository is only intended to be a limited example, demo, or proof-of-concept. DOI Binder

Description

This repository contains the resources used to build a knowledge graph of the software catalog of the oeg-upm organization on GitHub (last executed in June 2023). The source data is generated with the SOftware Metadata Extraction Framework (SOMEF), which extracts the relevant information of a repository from README files and saves it as JSON files. A knowledge graph that follows the Software Description Ontology (SDO) is then built from these JSON files using RML-star mappings. Finally, the resulting knowledge graph is queried to assess the adoption of a set of representative best practices for research software publishing.
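To make the final assessment step more concrete, here is a minimal sketch of what "assessing the adoption of best practices" amounts to. It is not how the notebooks do it: the repository runs queries over the knowledge graph, whereas this sketch uses plain Python over in-memory records. The practice names, the record fields (`license`, `citation`, `description`) and the sample data are all hypothetical and are not taken from somef.json or from the SDO-based graph.

```python
from collections import Counter

# Hypothetical best-practice checks: each maps a practice name to a predicate
# over one repository record. The field names are illustrative only and do not
# reflect the actual somef.json layout or the SDO-based knowledge graph.
PRACTICES = {
    "has_license": lambda repo: bool(repo.get("license")),
    "has_citation": lambda repo: bool(repo.get("citation")),
    "has_description": lambda repo: bool(repo.get("description")),
}


def adoption_rates(repositories, practices=PRACTICES):
    """Return the fraction of repositories that satisfy each practice."""
    total = len(repositories)
    counts = Counter()
    for repo in repositories:
        for name, check in practices.items():
            if check(repo):
                counts[name] += 1
    # Avoid dividing by zero when no repositories are given.
    return {name: (counts[name] / total if total else 0.0) for name in practices}


# Toy records standing in for extracted metadata (made up for illustration).
sample = [
    {"name": "repo-a", "license": "Apache-2.0", "citation": "CITATION.cff", "description": "A tool"},
    {"name": "repo-b", "license": "MIT", "description": ""},
    {"name": "repo-c"},
    {"name": "repo-d", "license": "MIT", "description": "Another tool"},
]

rates = adoption_rates(sample)
for practice, rate in rates.items():
    print(f"{practice}: {rate:.0%}")
```

In the actual pipeline, each check would correspond to a query against the knowledge graph rather than a dictionary lookup; the aggregation into per-practice adoption rates is the part this sketch illustrates.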

Disclaimer: this repository is a demonstration, accepted at the Posters and Demo Track of the SEMANTiCS 2023 conference.

Structure

This repository is organized as follows:

  • data/ contains the input JSON file aggregating the metadata extracted from all repositories in the oeg-upm GitHub organization (somef.json), along with the resulting knowledge graph (somef-kg.nq)
  • mappings/ contains the RML-star mappings needed to construct the knowledge graph from the JSON file
  • notebooks/ contains two notebooks: (i) generation of the JSON files and (ii) construction and querying of the knowledge graph
  • best-practices-requirements/ describes the set of representative best practices that are assessed in the repositories represented in the knowledge graph

Installation

This pipeline has been tested with Python 3.9.

To run the pipeline, you need to install Jupyter Notebook:

pip install notebook

Then, install the requirements of the project. Creating two separate environments is highly recommended (one for extraction, another for construction and querying), since the libraries used for extracting metadata and for creating the knowledge graph have different dependencies. To install the extraction requirements, run:

pip install -r requirements_extraction.txt

To install the construction and querying requirements, run:

pip install -r requirements.txt
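
The two-environment setup recommended above could be scripted roughly as follows, using Python's standard `venv` module. This is a sketch, not part of the repository: the environment directory names (`.venv-extraction`, `.venv-kg`) and the helper functions are hypothetical, and only the two requirements file names come from this README.

```python
import subprocess
import sys
import venv
from pathlib import Path

# Hypothetical environment directories, mapped to the requirements files
# named in this README.
ENVIRONMENTS = {
    ".venv-extraction": "requirements_extraction.txt",
    ".venv-kg": "requirements.txt",
}


def pip_command(env_dir, requirements):
    """Build the pip invocation that installs `requirements` inside `env_dir`."""
    # Virtual environments keep their interpreter in Scripts/ on Windows
    # and in bin/ elsewhere.
    if sys.platform == "win32":
        python = Path(env_dir) / "Scripts" / "python.exe"
    else:
        python = Path(env_dir) / "bin" / "python"
    return [str(python), "-m", "pip", "install", "-r", requirements]


def create_environments(environments=ENVIRONMENTS):
    """Create each virtual environment and install its requirements file."""
    for env_dir, requirements in environments.items():
        venv.create(env_dir, with_pip=True)
        subprocess.run(pip_command(env_dir, requirements), check=True)
```

Calling `create_environments()` from the repository root would create both environments and install the matching requirements into each. Jupyter would still need to be available in whichever environment is used to run a given notebook.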

Finally, start Jupyter notebook and run the notebooks in the notebooks folder.

Requirements:

Our pipeline makes use of the somef, yatter, morph-kgc and pyoxigraph packages. For more information about the versions used, see the requirements.txt file (construction and querying) and requirements_extraction.txt (which will install somef).

If you want to play around with SPARQL queries, just run the construction and querying notebook, which will guide you through the KG creation and querying process.

Click the Binder button to open a pre-loaded notebook for testing (it may take a few minutes to load).

Citation

If you use this work, please cite our software as follows:

@inproceedings{iglesias2023towards,
  title        = {Towards Assessing FAIR Research Software Best Practices in an Organization Using RDF-star},
  author       = {Iglesias-Molina, Ana and Garijo, Daniel},
  year         = {2023},
  booktitle    = {Proceedings of the Posters and Demo Track of the 19th International Conference on Semantic Systems co-located with 19th International Conference on Semantic Systems (SEMANTiCS 2023)},
  publisher    = {CEUR-WS.org},
  series       = {{CEUR} Workshop Proceedings},
  volume       = {3526},
  url          = {https://ceur-ws.org/Vol-3526/paper-09.pdf}
}

Authors
