This repository contains the resources used to build a knowledge graph of the software catalog of the oeg-upm organization on GitHub (last executed in June 2023). The source data is generated with the SOftware Metadata Extraction Framework (SOMEF), which extracts the relevant information about each repository from its README files and saves it as JSON files. A knowledge graph based on the Software Description Ontology (SDO) is then created using RML-star mappings. Finally, the resulting knowledge graph is queried to assess the adoption of a set of representative best practices for research software publishing.
Disclaimer: this repository is a demonstration, accepted at the 2023 Semantics Conference.
This repository is organized as follows:
data/: contains the input JSON file aggregating the metadata extracted from all repositories of the oeg-upm GitHub organization (somef.json), along with the produced knowledge graph (somef-kg.nq)
mappings/: contains the RML-star mappings needed to construct the knowledge graph from the JSON file
notebooks/: contains two notebooks, for i) the generation of the JSON files and ii) the construction and querying of the knowledge graph
best-practices-requirements/: describes the set of representative best practices that are assessed in the repositories represented in the knowledge graph
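As a quick sanity check on the produced knowledge graph, the number of statements in the N-Quads file can be counted without any RDF library. The following is a minimal sketch, not part of the repository: the function name `count_nquads_statements` is hypothetical, and it assumes the file follows the line-based N-Quads layout (one statement per line, with `#` starting a comment line).

```python
from pathlib import Path


def count_nquads_statements(path):
    """Count statement lines in an N-Quads file, skipping blank and comment lines.

    Assumption: the file is line-based, with one statement per line.
    """
    count = 0
    with Path(path).open(encoding="utf-8") as handle:
        for line in handle:
            stripped = line.strip()
            if stripped and not stripped.startswith("#"):
                count += 1
    return count
```

For example, calling `count_nquads_statements("data/somef-kg.nq")` from the repository root would report how many statements the generated graph contains.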
This pipeline has been tested with Python 3.9.
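Since the pipeline has only been tested with Python 3.9, it can help to check the interpreter version before installing anything. The snippet below is an illustrative sketch, not part of the repository; the helper name `matches_tested_version` is hypothetical.

```python
import sys

# The only Python version the pipeline is reported to have been tested with.
TESTED_VERSION = (3, 9)


def matches_tested_version(version_info=sys.version_info):
    """Return True if the major and minor version equal the tested version."""
    return tuple(version_info[:2]) == TESTED_VERSION


if not matches_tested_version():
    current = ".".join(str(part) for part in sys.version_info[:2])
    print(f"Warning: running Python {current}; the pipeline was tested with 3.9.")
```

Other versions may still work; the check only flags that they are untested.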
In order to run the pipeline, you need to install Jupyter Notebooks:
pip install notebook
Then, install the requirements of the project. Creating two separate environments is highly recommended (one for extraction, another one for querying), since the libraries used for extracting metadata and creating the knowledge graph have varied dependencies. For installing the extraction requirements, run:
pip install -r requirements_extraction.txt
For installing the construction and querying requirements, run:
pip install -r requirements.txt
Finally, start Jupyter Notebook and run the notebooks in the notebooks folder.
Our pipeline makes use of the somef, yatter, morph-kgc and pyoxigraph packages. For more information about the versions used, see the requirements.txt file (construction and querying) and requirements_extraction.txt (which will install somef).
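To compare what is actually installed against the requirements files, the installed versions can be listed with the standard library. This is a minimal sketch under assumptions: the helper `installed_versions` is hypothetical, and the distribution names are assumed to match the package names mentioned above.

```python
from importlib.metadata import PackageNotFoundError, version

# Assumed distribution names, taken from the packages named in this README.
PIPELINE_PACKAGES = ("somef", "yatter", "morph-kgc", "pyoxigraph")


def installed_versions(packages=PIPELINE_PACKAGES):
    """Map each package name to its installed version, or None if it is absent."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


for name, installed in installed_versions().items():
    print(f"{name}: {installed or 'not installed in this environment'}")
```

If two separate environments are used, a missing package is expected in one of them: for instance, somef would normally appear only in the extraction environment.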
If you want to play around with SPARQL queries, just run the construction and querying notebook, which will guide you through the KG creation and querying process.
Click on the Binder button to open a pre-loaded notebook for testing (it may take a few minutes to load).
If you use this work, please cite our software as follows:
@inproceedings{iglesias2023towards,
title = {Towards Assessing FAIR Research Software Best Practices in an Organization Using RDF-star},
author = {Iglesias-Molina, Ana and Garijo, Daniel},
year = {2023},
booktitle = {Proceedings of the Posters and Demo Track of the 19th International Conference on Semantic Systems co-located with 19th International Conference on Semantic Systems (SEMANTiCS 2023)},
publisher = {CEUR-WS.org},
series = {{CEUR} Workshop Proceedings},
volume = {3526},
url = {https://ceur-ws.org/Vol-3526/paper-09.pdf}
}