SHACL and ShEx in the Wild: A Community Survey on Validating Shapes Generation and Adoption (WWW-2022 Poster)


SHACL and ShEx in the Wild❗

A Community Survey on Validating Shapes Generation and Adoption

Knowledge Graphs (KGs) are the de facto standard for representing heterogeneous domain knowledge on the Web and within organizations. Various tools and approaches exist to manage KGs and ensure the quality of their data. Among these, the Shapes Constraint Language (SHACL) and the Shape Expressions language (ShEx) are the two state-of-the-art languages for defining validating shapes for KGs. In the last few years, the usage of these constraint languages has increased, and new needs have arisen as a result. One such need is the efficient generation of these shapes. Yet, since these languages are relatively new, there is a lack of understanding of how they are actually employed for existing KGs. Therefore, in this work, we answer the question: how are validating shapes being generated and adopted? Our contribution is threefold. First, we conducted a community survey to analyze the needs of users (from both industry and academia) who generate validating shapes. Then, we cross-referenced our results with an extensive survey of the existing tools and their features. Finally, we investigated how existing automatic shape extraction approaches work in practice on real, large KGs. Our analysis shows the need for semi-automatic methods that help users generate shapes from large KGs.

Read the paper: https://dl.acm.org/doi/10.1145/3487553.3524253

Visit our website for more details: https://relweb.cs.aau.dk/validatingshapes/

Datasets

We have used the following datasets:

  1. DBpedia: We used the DBpedia script to download all the DBpedia files listed here.
  2. YAGO-4: We downloaded the English version of YAGO-4 from https://yago-knowledge.org/data/yago4/en/.
  3. LUBM: We generated the LUBM dataset following the guidelines available on LUBM's official website.

The statistics of these datasets are shown in the table below:

Statistic                         DBpedia   YAGO-4   LUBM
# of triples                      52 M      210 M    91 M
# of distinct objects             19 M      126 M    12 M
# of distinct subjects            15 M      5 M      10 M
# of distinct literals            28 M      111 M    5.5 M
# of distinct RDF type triples    5 M       17 M     1 M
# of distinct classes             427       8,902    22
# of distinct properties          1,323     153      20
Size in GBs                       6.6       28.59    15.66

You can download a copy of these datasets from our single archive.
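
Statistics like the ones in the table above can be approximated with a single pass over an N-Triples file. The following is only a minimal sketch, not the procedure used to produce the reported numbers: the function name `dataset_statistics`, the simplified line pattern, and the decision to count IRI objects and literals separately are all assumptions made for illustration. It keeps every distinct term in memory, so it is suited to small samples rather than the full datasets.

```python
import re
from typing import Dict, Iterable

RDF_TYPE = "<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>"

# Simplified N-Triples pattern (an assumption, not a full parser):
# subject, predicate, then everything up to the terminating " ."
TRIPLE_RE = re.compile(r"^\s*(\S+)\s+(\S+)\s+(.+?)\s*\.\s*$")


def dataset_statistics(lines: Iterable[str]) -> Dict[str, int]:
    """Count basic statistics over N-Triples lines (hypothetical helper)."""
    triples = 0
    type_triples = 0  # assumes the input holds no duplicate lines
    subjects, properties, classes = set(), set(), set()
    iri_objects, literals = set(), set()

    for line in lines:
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue  # skip blank lines and comments
        match = TRIPLE_RE.match(line)
        if match is None:
            continue  # skip lines the simplified pattern cannot read
        s, p, o = match.groups()
        triples += 1
        subjects.add(s)
        properties.add(p)
        if o.startswith('"'):
            literals.add(o)      # literal object
        else:
            iri_objects.add(o)   # IRI or blank-node object
        if p == RDF_TYPE:
            type_triples += 1
            classes.add(o)

    return {
        "triples": triples,
        "distinct_subjects": len(subjects),
        "distinct_objects": len(iri_objects),
        "distinct_literals": len(literals),
        "rdf_type_triples": type_triples,
        "distinct_classes": len(classes),
        "distinct_properties": len(properties),
    }
```

For a file on disk, the function can be fed the open file handle directly, for example `with open("sample.nt", encoding="utf-8") as f: print(dataset_statistics(f))`, where `sample.nt` is a placeholder file name.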

SHACL Shapes

We have published the extracted SHACL shapes of all three datasets on Zenodo. We have also made an executable Jar file of our application available on Zenodo; it extracts SHACL shapes from RDF datasets in .nt format.
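
To give a rough idea of what extracting shapes from RDF data involves, the sketch below groups the properties used by the instances of each class and prints one SHACL node shape per class in Turtle. It is a deliberately naive illustration and not the algorithm implemented in the Jar: the function names, the `http://example.org/shapes/` namespace, and the rule "every property seen on any instance becomes a property shape" are assumptions chosen for brevity.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
SHAPES_NS = "http://example.org/shapes/"  # hypothetical namespace for shape IRIs

Triple = Tuple[str, str, str]


def extract_shapes(triples: Iterable[Triple]) -> Dict[str, List[str]]:
    """Map each class IRI to the sorted properties used by its instances."""
    triples = list(triples)

    # First pass: record the classes of every typed subject.
    classes_of = defaultdict(set)
    for s, p, o in triples:
        if p == RDF_TYPE:
            classes_of[s].add(o)

    # Second pass: attach each non-type property to the subject's classes.
    props_of_class = defaultdict(set)
    for s, p, o in triples:
        if p == RDF_TYPE:
            continue
        for cls in classes_of.get(s, ()):
            props_of_class[cls].add(p)

    return {cls: sorted(props) for cls, props in props_of_class.items()}


def local_name(iri: str) -> str:
    """Return the part of an IRI after the last '/' or '#'."""
    return iri.rstrip("/#").rsplit("/", 1)[-1].rsplit("#", 1)[-1]


def to_shacl_turtle(shapes: Dict[str, List[str]]) -> str:
    """Serialize the class-to-properties map as SHACL node shapes in Turtle."""
    lines = ["@prefix sh: <http://www.w3.org/ns/shacl#> .", ""]
    for cls in sorted(shapes):
        lines.append(f"<{SHAPES_NS}{local_name(cls)}Shape> a sh:NodeShape ;")
        body = [f"    sh:targetClass <{cls}>"]
        body += [f"    sh:property [ sh:path <{p}> ]" for p in shapes[cls]]
        lines.append(" ;\n".join(body) + " .")
        lines.append("")
    return "\n".join(lines)
```

A shape produced this way only states which properties occur. Cardinalities, datatypes, and support thresholds, which matter on large and noisy KGs, are left out of this sketch.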


Good News ⭐ Source Code is also available now!

We have made the source code available in the code directory along with instructions on how to run the code.


How to run the Jar?

  • Download the Jar from Zenodo.

  • Update the configuration in the config.properties file.

  • Follow these steps to install sdkman, then execute the following commands to install the specified versions of Java and Gradle:

      sdk list java
      sdk install java 17.0.2-open
      sdk use java 17.0.2-open

      sdk list gradle
      sdk install gradle 7.4-rc-1
      sdk use gradle 7.4-rc-1
    
  • If you are using Docker, use the gradle:7.3.3-jdk17-alpine image.

  • Run the jar file by passing the config file as a parameter: java -jar shacl-generator-program.jar config.properties

Analyzing the State-of-the-art tools

We ran experiments to assess the practical capabilities of the following existing tools for automatically extracting shapes from RDF graphs.

1. SheXer

https://github.com/DaniFdezAlvarez/shexer

2. ShapeDesigner

https://gitlab.inria.fr/jdusart/shexjapp

3. SHACLGEN

https://pypi.org/project/shaclgen/

Persistent URI & Licence:

The content of this repository is available at https://github.com/dkw-aau/validatingshapes under the Apache License 2.0.

Citing the work

Please cite us if you use the code in your project or publication:

@inproceedings{DBLP:conf/www/RabbaniLH22,
  author       = {Kashif Rabbani and
                  Matteo Lissandrini and
                  Katja Hose},
  title        = {{SHACL} and ShEx in the Wild: {A} Community Survey on Validating Shapes
                  Generation and Adoption},
  booktitle    = {{WWW} (Companion Volume)},
  pages        = {260--263},
  publisher    = {{ACM}},
  year         = {2022}
}
