SHACL validation for DCAT-AP 2.1.1 #288

Merged
merged 23 commits into from
Jul 15, 2024

amercader
Member

This includes #281, which should be merged first.

As a next step after finishing the scheming support we will test against the official SHACL shapes for DCAT-AP 2.1.1, to set up a reusable test mechanism and iron out existing issues with the current profiles.

@amercader
Member Author

Right now, when testing against dcat-ap_2.1.1_shacl_shapes.ttl, the only remaining failures come from the duplicate values for the geometry properties of dct:spatial (locn:geometry, dcat:bbox, dcat:centroid). We are adding both the GeoJSON and the WKT version of each geometry, and the validator doesn't like that. (cc @seitenbau-govdata)
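
To illustrate the failure described above, here is a minimal sketch of the kind of "at most one value" check involved. It is not the SHACL validator itself: triples are modelled as plain tuples of prefixed names, and the function name `find_cardinality_violations` is hypothetical.

```python
from collections import Counter

# Properties that, per the discussion above, may carry only one value
# per dct:spatial node. Prefixed names are used as plain strings here.
SINGLE_VALUED = {"locn:geometry", "dcat:bbox", "dcat:centroid"}


def find_cardinality_violations(triples):
    """Return {(subject, predicate): count} for properties set more than once."""
    counts = Counter((s, p) for s, p, _ in triples if p in SINGLE_VALUED)
    return {key: n for key, n in counts.items() if n > 1}


# The same point serialized twice (GeoJSON and WKT) on one spatial node.
triples = [
    ("_:spatial", "locn:geometry", '{"type": "Point", "coordinates": [1.0, 2.0]}'),
    ("_:spatial", "locn:geometry", "POINT (1.0 2.0)"),
    ("_:spatial", "dcat:centroid", "POINT (1.0 2.0)"),
]
violations = find_cardinality_violations(triples)
```

Here `violations` flags `locn:geometry` on `_:spatial`, because it holds both a GeoJSON and a WKT value, while `dcat:centroid` passes with a single value.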

The SHACL validation expects only one item of LOCN.geometry, DCAT.bbox
or DCAT.centroid. Up until now we were adding two triples, one for
GeoJSON and one for WKT. We'll default to WKT from now on as this is
what GeoDCAT-AP requires (or GML, yuk...), but sites that for some reason
require GeoJSON (or both) can use the
`ckanext.dcat.output_spatial_format` option to choose which format to use.
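
A rough sketch of how a format option like this could drive the output, not the actual ckanext-dcat code. It assumes the option value is a space-separated list such as `wkt`, `geojson` or `wkt geojson`, handles only Point geometries to stay short, and uses the hypothetical helper names `point_to_wkt` and `spatial_literals`.

```python
import json


def point_to_wkt(geometry):
    """Convert a GeoJSON Point dict to WKT (Points only, to keep the sketch short)."""
    if geometry.get("type") != "Point":
        raise ValueError("this sketch only handles Point geometries")
    x, y = geometry["coordinates"][:2]
    return f"POINT ({x} {y})"


def spatial_literals(geometry, output_formats="wkt"):
    """Return one (format, lexical value) pair per requested output format.

    `output_formats` stands in for the value of
    ckanext.dcat.output_spatial_format; the space-separated syntax is an
    assumption made for this example.
    """
    literals = []
    for fmt in output_formats.lower().split():
        if fmt == "wkt":
            literals.append(("wkt", point_to_wkt(geometry)))
        elif fmt == "geojson":
            literals.append(("geojson", json.dumps(geometry, sort_keys=True)))
        else:
            raise ValueError(f"unsupported spatial format: {fmt}")
    return literals


geometry = {"type": "Point", "coordinates": [1.5, 2.5]}
default_output = spatial_literals(geometry)
both_outputs = spatial_literals(geometry, "wkt geojson")
```

With the default, only the WKT value is produced, so each geometry property gets a single value. Asking for both formats reproduces the previous two-value behaviour.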
@amercader
Member Author

@seitenbau-govdata Back in #220 we introduced changes to expose geometries for LOCN.geometry, DCAT.bbox or DCAT.centroid. The same geometry was serialized as GeoJSON and as WKT, adding a node for each format. This is not allowed according to the SHACL validation, so if we want to be compliant we need to output just one. I defaulted to WKT from now on as this is what GeoDCAT-AP requires (or GML, yuk...), but sites that for some reason require GeoJSON (or both) can use the ckanext.dcat.output_spatial_format option to choose which format to use. Does that sound reasonable? Of course this will be noted in the changelog along with other minor changes in the parsers.

@seitenbau-govdata
Member

> @seitenbau-govdata Back in #220 we introduced changes to expose geometries for LOCN.geometry, DCAT.bbox or DCAT.centroid. The same geometry was serialized as GeoJSON and as WKT, adding a node for each format. This is not allowed according to the SHACL validation, so if we want to be compliant we need to output just one. I defaulted to WKT from now on as this is what GeoDCAT-AP requires (or GML, yuk...), but sites that for some reason require GeoJSON (or both) can use the ckanext.dcat.output_spatial_format option to choose which format to use. Does that sound reasonable? Of course this will be noted in the changelog along with other minor changes in the parsers.

@amercader Thanks for the work on this! 👍 The serialization as both GeoJSON and WKT was already there when we added the serialization for DCAT.bbox and DCAT.centroid, and we decided to handle them in the same way. We were already aware of this problem, but decided not to change it for backwards compatibility. It has already been reported in #249.

But I think the time is right now. 😃 And your suggestion with the parameter looks good. We would also prefer WKT for serialization. This should be fine for most consumers, because most libraries can handle several types of spatial data, e.g. GeoJSON and WKT. And the possibility to choose the serialization with ckanext.dcat.output_spatial_format should certainly cover all needs.
The current implementation for deserialization stores the spatial value as GeoJSON anyway. This will be kept, right?

@amercader
Member Author

amercader commented Jul 4, 2024

> @amercader Thanks for the work on this! 👍 The serialization as both GeoJSON and WKT was already there when we added the serialization for DCAT.bbox and DCAT.centroid, and we decided to handle them in the same way. We were already aware of this problem, but decided not to change it for backwards compatibility. It has already been reported in #249.

My bad, I didn't realize that the formats predated #220, and I had missed #249.

OK, let's go with this. To be honest, I'm leaning towards doing a 2.0 release, because of all the major changes for scheming support and because this change can then be well documented.

> The current implementation for deserialization stores the spatial value as GeoJSON anyway. This will be kept, right?

Yes, geometries will still be stored as GeoJSON in CKAN, to integrate with ckanext-spatial.

@amercader amercader mentioned this pull request Jul 5, 2024
The profile for DCAT-AP 1 stored triples using schema:startDate/endDate
but the namespace was updated to dcat:startDate/endDate in DCAT-AP 2.
As the DCAT-AP 2 profile calls the version 1 profile, the two sets of
dates were added together. This makes the recommended SHACL shapes
tests fail and it is incorrect anyway, so we now remove the
schema-namespaced triples if present when using the DCAT-AP 2 profile.
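
As a rough illustration of the clean-up described in this commit message (not the profile code itself), the sketch below filters the schema-namespaced date triples out of a list of triples. Triples are again plain tuples of prefixed names, and `drop_schema_dates` is a hypothetical helper name.

```python
# Date predicates written by the DCAT-AP 1 profile that DCAT-AP 2 replaces
# with dcat:startDate / dcat:endDate.
SCHEMA_DATES = {"schema:startDate", "schema:endDate"}


def drop_schema_dates(triples):
    """Return the triples without any schema:startDate/endDate statements."""
    return [t for t in triples if t[1] not in SCHEMA_DATES]


# What the combined version 1 + version 2 profiles would produce for one period.
temporal = [
    ("_:period", "rdf:type", "dct:PeriodOfTime"),
    ("_:period", "schema:startDate", "2020-01-01"),
    ("_:period", "schema:endDate", "2020-12-31"),
    ("_:period", "dcat:startDate", "2020-01-01"),
    ("_:period", "dcat:endDate", "2020-12-31"),
]
cleaned = drop_schema_dates(temporal)
```

After filtering, the period node keeps its type and only the dcat-namespaced start and end dates.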
@amercader amercader marked this pull request as ready for review July 15, 2024 10:25
@amercader amercader merged commit 51d6513 into master Jul 15, 2024
8 checks passed
@amercader amercader deleted the shacl-validation branch July 15, 2024 10:45