Full docs review
amercader committed Aug 29, 2024
1 parent 7dd0ba7 commit f91f92b
Showing 12 changed files with 681 additions and 529 deletions.
13 changes: 11 additions & 2 deletions README.md
@@ -877,7 +877,7 @@ This plugin also contains a profile to serialize a CKAN dataset to a [schema.org

To define which profiles to use you can:

1. Set the `ckanext.dcat.rdf.profiles` configuration option on your CKAN configuration file:
1. Set the [`ckanext.dcat.rdf.profiles`](configuration.md#ckanextdcatrdfprofiles) configuration option on your CKAN configuration file:

ckanext.dcat.rdf.profiles = euro_dcat_ap sweden_dcat_ap

@@ -1166,6 +1166,15 @@ Default value: `True`
Whether to expose the catalog and dataset endpoints with the RDF DCAT
serializations.

#### ckanext.dcat.base_uri

Example:

```
https://my-site.org/uris/
```

Base URI to use when generating URIs for all entities. It needs to be a valid URI value.

#### ckanext.dcat.catalog_endpoint

@@ -1181,7 +1190,7 @@ Custom route for the catalog endpoint. It should start with `/` and include the
`{_format}` placeholder.


#### ckanext.dcat.dataset_per_page
#### ckanext.dcat.datasets_per_page

Default value: `100`

7 changes: 6 additions & 1 deletion ckanext/dcat/config_declaration.yml
@@ -63,14 +63,19 @@ groups:
serializations.
type: bool

- key: ckanext.dcat.base_uri
description: |
Base URI to use when generating URIs for all entities. It needs to be a valid URI value.
example: 'https://my-site.org/uri/'

- key: ckanext.dcat.catalog_endpoint
default: '/catalog.{_format}'
description: |
Custom route for the catalog endpoint. It should start with `/` and include the
`{_format}` placeholder.
example: '/dcat/catalog/{_format}'

- key: ckanext.dcat.dataset_per_page
- key: ckanext.dcat.datasets_per_page
default: 100
type: int
description: |
3 changes: 0 additions & 3 deletions docs/cli.md
@@ -1,5 +1,3 @@
## CLI

The `ckan dcat` command offers utilities to transform between DCAT RDF Serializations and CKAN datasets (`ckan dcat consume`) and
vice versa (`ckan dcat produce`). In both cases the input can be provided as a path to a file:

@@ -16,4 +14,3 @@ The latter form allows chaininig commands for more complex metadata processing,
curl https://demo.ckan.org/api/action/package_search | jq .result.results | ckan dcat produce -f jsonld -

For the full list of options check `ckan dcat consume --help` and `ckan dcat produce --help`.

13 changes: 10 additions & 3 deletions docs/configuration.md
@@ -1,5 +1,3 @@
## Configuration reference

<!-- start-config -->

### General settings
@@ -73,6 +71,15 @@ Default value: `True`
Whether to expose the catalog and dataset endpoints with the RDF DCAT
serializations.

#### ckanext.dcat.base_uri

Example:

```
https://my-site.org/uris/
```

Base URI to use when generating URIs for all entities. It needs to be a valid URI value.

#### ckanext.dcat.catalog_endpoint

@@ -88,7 +95,7 @@ Custom route for the catalog endpoint. It should start with `/` and include the
`{_format}` placeholder.


#### ckanext.dcat.dataset_per_page
#### ckanext.dcat.datasets_per_page

Default value: `100`

111 changes: 56 additions & 55 deletions docs/endpoints.md
@@ -1,15 +1,13 @@
# RDF DCAT endpoints

By default when the `dcat` plugin is enabled, the following RDF endpoints are available on your CKAN instance. The schema used on the serializations can be customized using [profiles](#profiles).
By default, when the `dcat` plugin is enabled, the following RDF endpoints are available on your CKAN instance. The schema used on the serializations can be customized using [profiles](profiles.md#profiles).

To disable the RDF endpoints, you can set the following config in your ini file:

ckanext.dcat.enable_rdf_endpoints = False
To disable the RDF endpoints, you can set the [`ckanext.dcat.enable_rdf_endpoints`](configuration.md#ckanextdcatenable_rdf_endpoints) option in your ini file.


## Dataset endpoints

RDF representations of a particular dataset can accessed using the following endpoint:
RDF representations of a particular dataset can be accessed using the following endpoint:

https://{ckan-instance-host}/dataset/{dataset-id}.{format}

@@ -26,32 +24,32 @@ The fallback `rdf` format defaults to RDF/XML.

Here's an example of the different formats:

* https://opendata.swiss/en/dataset/verbreitung-der-steinbockkolonien.rdf
* https://opendata.swiss/en/dataset/verbreitung-der-steinbockkolonien.xml
* https://opendata.swiss/en/dataset/verbreitung-der-steinbockkolonien.ttl
* https://opendata.swiss/en/dataset/verbreitung-der-steinbockkolonien.n3
* https://opendata.swiss/en/dataset/verbreitung-der-steinbockkolonien.jsonld

RDF representations will be advertised using `<link rel="alternate">` tags on the `<head>` sectionon the dataset page source code, eg:
* [https://opendata.swiss/en/dataset/verbreitung-der-steinbockkolonien.rdf](https://opendata.swiss/en/dataset/verbreitung-der-steinbockkolonien.rdf)
* [https://opendata.swiss/en/dataset/verbreitung-der-steinbockkolonien.xml](https://opendata.swiss/en/dataset/verbreitung-der-steinbockkolonien.xml)
* [https://opendata.swiss/en/dataset/verbreitung-der-steinbockkolonien.ttl](https://opendata.swiss/en/dataset/verbreitung-der-steinbockkolonien.ttl)
* [https://opendata.swiss/en/dataset/verbreitung-der-steinbockkolonien.n3](https://opendata.swiss/en/dataset/verbreitung-der-steinbockkolonien.n3)
* [https://opendata.swiss/en/dataset/verbreitung-der-steinbockkolonien.jsonld](https://opendata.swiss/en/dataset/verbreitung-der-steinbockkolonien.jsonld)

<head>
RDF representations will be advertised using `<link rel="alternate">` tags on the `<head>` section of the dataset page source code, e.g.:

<link rel="alternate" type="application/rdf+xml" href="http://demo.ckan.org/dataset/34315559-2b08-44eb-a2e6-ebe9ce1a266b.rdf"/>
<link rel="alternate" type="text/turtle" href="http://demo.ckan.org/dataset/34315559-2b08-44eb-a2e6-ebe9ce1a266b.ttl"/>
<!-- ... -->
```html
<head>

</head>
<link rel="alternate" type="application/rdf+xml" href="http://demo.ckan.org/dataset/34315559-2b08-44eb-a2e6-ebe9ce1a266b.rdf"/>
<link rel="alternate" type="text/turtle" href="http://demo.ckan.org/dataset/34315559-2b08-44eb-a2e6-ebe9ce1a266b.ttl"/>
<!-- ... -->

</head>
```
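
As a rough illustration of how a client might discover these advertised representations, the sketch below collects the `<link rel="alternate">` tags from a page's HTML using only the Python standard library. The class and function names (`AlternateLinkParser`, `find_rdf_alternates`) and the `SAMPLE_HEAD` string are hypothetical and not part of the extension; the sample markup is copied from the example above.

```python
from html.parser import HTMLParser


class AlternateLinkParser(HTMLParser):
    """Collect (type, href) pairs from <link rel="alternate"> tags."""

    def __init__(self):
        super().__init__()
        self.alternates = []

    def handle_starttag(self, tag, attrs):
        # Self-closing tags such as <link ... /> are also routed here.
        if tag != "link":
            return
        attributes = dict(attrs)
        rel_values = (attributes.get("rel") or "").split()
        if "alternate" in rel_values and attributes.get("href"):
            self.alternates.append((attributes.get("type"), attributes["href"]))


def find_rdf_alternates(html):
    """Return the alternate representations advertised in an HTML document."""
    parser = AlternateLinkParser()
    parser.feed(html)
    return parser.alternates


# Sample markup taken from the documentation example (hypothetical variable name).
SAMPLE_HEAD = """
<head>
<link rel="alternate" type="application/rdf+xml" href="http://demo.ckan.org/dataset/34315559-2b08-44eb-a2e6-ebe9ce1a266b.rdf"/>
<link rel="alternate" type="text/turtle" href="http://demo.ckan.org/dataset/34315559-2b08-44eb-a2e6-ebe9ce1a266b.ttl"/>
</head>
"""

for media_type, href in find_rdf_alternates(SAMPLE_HEAD):
    print(media_type, href)
```

A client could then pick the `href` whose `type` matches the serialization it prefers.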

Check the [RDF DCAT Serializer](#rdf-dcat-serializer) section for more details about how these are generated and how to customize the output using [profiles](#profiles).
Check the [RDF DCAT Serializer](profiles.md#rdf-dcat-serializer) section for more details about how these are generated and how to customize the output using [profiles](profiles.md#profiles).


You can specify the profile by using the `profiles=<profile1>,<profile2>` query parameter on the dataset endpoint (as a comma-separated list):

* https://opendata.swiss/en/dataset/verbreitung-der-steinbockkolonien.xml?profiles=euro_dcat_ap
* https://opendata.swiss/en/dataset/verbreitung-der-steinbockkolonien.jsonld?profiles=schemaorg
* [https://opendata.swiss/en/dataset/verbreitung-der-steinbockkolonien.xml?profiles=euro_dcat_ap](https://opendata.swiss/en/dataset/verbreitung-der-steinbockkolonien.xml?profiles=euro_dcat_ap)
* [https://opendata.swiss/en/dataset/verbreitung-der-steinbockkolonien.jsonld?profiles=schemaorg](https://opendata.swiss/en/dataset/verbreitung-der-steinbockkolonien.jsonld?profiles=schemaorg)

*Note*: When using this plugin, the above endpoints will replace the old deprecated ones that were part of CKAN core.


## Catalog endpoint
@@ -60,7 +58,7 @@ Additionally to the individual dataset representations, the extension also offer

https://{ckan-instance-host}/catalog.{format}?[page={page}]&[modified_since={date}]&[profiles={profile1},{profile2}]&[q={query}]&[fq={filter query}]

This endpoint can be customized if necessary using the `ckanext.dcat.catalog_endpoint` configuration option, eg:
This endpoint base path can be customized if necessary using the [`ckanext.dcat.catalog_endpoint`](configuration.md#ckanextdcatcatalog_endpoint) configuration option, eg:

ckanext.dcat.catalog_endpoint = /dcat/catalog/{_format}

@@ -72,44 +70,47 @@ As described previously, the extension will determine the RDF serialization form
* http://demo.ckan.org/catalog.xml
* http://demo.ckan.org/catalog.ttl

RDF representations will be advertised using `<link rel="alternate">` tags on the `<head>` sectionon the homepage and the dataset search page source code, eg:
RDF representations will be advertised using `<link rel="alternate">` tags on the `<head>` section of the catalog homepage and the dataset search page source code, eg:

<head>
```html
<head>

<link rel="alternate" type="application/rdf+xml" href="http://demo.ckan.org/catalog.rdf"/>
<link rel="alternate" type="application/rdf+xml" href="http://demo.ckan.org/catalog.xml"/>
<link rel="alternate" type="text/turtle" href="http://demo.ckan.org/catalog.ttl"/>
<!-- ... -->

<link rel="alternate" type="application/rdf+xml" href="http://demo.ckan.org/catalog.rdf"/>
<link rel="alternate" type="application/rdf+xml" href="http://demo.ckan.org/catalog.xml"/>
<link rel="alternate" type="text/turtle" href="http://demo.ckan.org/catalog.ttl"/>
<!-- ... -->
</head>
```

</head>
The number of datasets returned is limited. The response will include paging info, serialized using the [Hydra](http://www.w3.org/ns/hydra/spec/latest/core/) vocabulary. The different properties are self-explanatory, and can be used by clients to iterate the catalog:

The number of datasets returned is limited. The response will include paging info, serialized using the [Hydra](http://www.w3.org/ns/hydra/spec/latest/core/) vocabulary. The different terms are self-explanatory, and can be used by clients to iterate the catalog:
```turtle
@prefix hydra: <http://www.w3.org/ns/hydra/core#> .
@prefix hydra: <http://www.w3.org/ns/hydra/core#> .
<http://example.com/catalog.ttl?page=1> a hydra:PagedCollection ;
hydra:first "http://example.com/catalog.ttl?page=1" ;
hydra:last "http://example.com/catalog.ttl?page=3" ;
hydra:next "http://example.com/catalog.ttl?page=2" ;
hydra:totalItems 283 .
```

<http://example.com/catalog.ttl?page=1> a hydra:PagedCollection ;
hydra:first "http://example.com/catalog.ttl?page=1" ;
hydra:last "http://example.com/catalog.ttl?page=3" ;
hydra:next "http://example.com/catalog.ttl?page=2" ;
hydra:totalItems 283 .
The default number of datasets returned (100) can be modified by CKAN site maintainers using the [`ckanext.dcat.datasets_per_page`](configuration.md#ckanextdcatdatasets_per_page) option.
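
To make the relationship between `hydra:totalItems`, the page size and `hydra:last` concrete, here is a small arithmetic sketch. The helper name `catalog_page_urls` is hypothetical, and it assumes 1-based page numbers and that the server uses the page size passed in. In practice a client would normally follow `hydra:next` instead of computing page URLs itself.

```python
import math


def catalog_page_urls(catalog_url, total_items, datasets_per_page=100):
    """List the page URLs a client would expect for a paged catalog."""
    if datasets_per_page < 1:
        raise ValueError("datasets_per_page must be a positive integer")
    # An empty catalog is still assumed to have one (empty) first page.
    last_page = max(1, math.ceil(total_items / datasets_per_page))
    return [f"{catalog_url}?page={page}" for page in range(1, last_page + 1)]


# 283 datasets at the default of 100 per page give three pages,
# matching the hydra:last value in the example above.
for url in catalog_page_urls("http://example.com/catalog.ttl", 283):
    print(url)
```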

The default number of datasets returned (100) can be modified by CKAN site maintainers using the following configuration option on your ini file:
The catalog endpoint also supports a `modified_since` parameter to restrict datasets to those modified from a certain date. The parameter value should be a valid ISO-8601 date:

ckanext.dcat.datasets_per_page = 20
http://demo.ckan.org/catalog.xml?modified_since=2015-07-24

The catalog endpoint also supports a `modified_since` parameter to restrict datasets to those modified from a certain date. The parameter value should be a valid ISO-8601 date:
It is possible to specify the profile(s) to use for the serialization using the `profiles` parameter:

http://demo.ckan.org/catalog.xml?modified_since=2015-07-24
http://demo.ckan.org/catalog.xml?profiles=euro_dcat_ap,sweden_dcat_ap

It's possible to specify the profile(s) to use for the serialization using the `profiles` parameter:
To filter the output, the catalog endpoint supports the `q` and `fq` parameters to specify a [search query](https://solr.apache.org/guide/solr/latest/query-guide/dismax-query-parser.html#q-parameter) or [filter query](https://solr.apache.org/guide/solr/latest/query-guide/common-query-parameters.html#fq-filter-query-parameter):

http://demo.ckan.org/catalog.xml?profiles=euro_dcat_ap,sweden_dcat_ap

To filter the output, the catalog endpoint supports the `q` and `fq` parameters to specify a [search query](https://lucene.apache.org/solr/guide/6_6/the-dismax-query-parser.html#TheDisMaxQueryParser-TheqParameter) or [filter query](https://lucene.apache.org/solr/guide/6_6/common-query-parameters.html#CommonQueryParameters-Thefq_FilterQuery_Parameter):

http://demo.ckan.org/catalog.xml?q=budget
http://demo.ckan.org/catalog.xml?fq=tags:economy
http://demo.ckan.org/catalog.xml?q=budget
http://demo.ckan.org/catalog.xml?fq=tags:economy
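
The query parameters described above can be combined in a single request. The following sketch assembles such a URL with the standard library; `build_catalog_url` is a hypothetical helper, and it assumes the default `/catalog.{format}` route (a custom `ckanext.dcat.catalog_endpoint` value is not handled).

```python
from urllib.parse import urlencode


def build_catalog_url(base_url, rdf_format="xml", page=None, modified_since=None,
                      profiles=None, q=None, fq=None):
    """Build a catalog endpoint URL, including only the parameters provided."""
    params = {}
    if page is not None:
        params["page"] = page
    if modified_since:
        params["modified_since"] = modified_since  # ISO-8601 date string
    if profiles:
        params["profiles"] = ",".join(profiles)  # comma-separated list
    if q:
        params["q"] = q
    if fq:
        params["fq"] = fq

    url = f"{base_url.rstrip('/')}/catalog.{rdf_format}"
    if params:
        # Keep "," and ":" readable, as in the examples above.
        url += "?" + urlencode(params, safe=",:")
    return url


print(build_catalog_url("http://demo.ckan.org", fq="tags:economy"))
print(build_catalog_url("http://demo.ckan.org", rdf_format="ttl",
                        profiles=["euro_dcat_ap", "sweden_dcat_ap"],
                        modified_since="2015-07-24"))
```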



@@ -118,8 +119,8 @@ http://demo.ckan.org/catalog.xml?fq=tags:economy
Whenever possible, URIs are generated for the relevant entities. To try to generate them, the extension will use the first found of the following for each entity:

* Catalog:
- `ckanext.dcat.base_uri` configuration option value. This is the recommended approach. Value should be a valid URI
- `ckan.site_url` configuration option value.
- [`ckanext.dcat.base_uri`](configuration.md#ckanextdcatbase_uri) configuration option value. This is the recommended approach. Value should be a valid URI.
- [`ckan.site_url`](https://docs.ckan.org/en/latest/maintaining/configuration.html#ckan-site-url) configuration option value.
- 'http://' + `app_instance_uuid` configuration option value. This is not recommended, and a warning log message will be shown.

* Dataset:
@@ -131,12 +132,18 @@ Whenever possible, URIs are generated for the relevant entities. To try to gener
- The value of the `uri` field (note that this is not included in the default CKAN schema)
- Catalog URI (see above) + '/dataset/' + `package_id` field + '/resource/' + `id` field

Note that if you are using the [RDF DCAT harvester](#rdf-dcat-harvester) to import datasets from other catalogs and these define a proper URI for each dataset or resource, these will be stored as `uri` fields in your instance, and thus used when generating serializations for them.
Note that if you are using the [RDF DCAT harvester](harvester.md) to import datasets from other catalogs and these define a proper URI for each dataset or resource, these will be stored as `uri` fields in your instance, and so used when generating serializations for them.
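
The "first found" rule for catalog and resource URIs can be sketched as follows. This is an illustration of the fallback order listed above, not the extension's actual code: the function names and the plain `dict` standing in for the CKAN configuration are hypothetical, and stripping a trailing slash from the base URI is an assumption made here to avoid double slashes. The dataset rules are omitted because they are not fully visible in this diff.

```python
def catalog_uri(config):
    """Return the first available catalog URI candidate, or None."""
    # Preferred options, in order: the dedicated base URI, then the site URL.
    for key in ("ckanext.dcat.base_uri", "ckan.site_url"):
        value = config.get(key)
        if value:
            return value.rstrip("/")  # assumption: normalize the trailing slash
    # Last resort, described as not recommended in the docs.
    uuid = config.get("app_instance_uuid")
    if uuid:
        return "http://" + uuid
    return None


def resource_uri(resource, config):
    """Use the resource's own `uri` field if present, else build one."""
    if resource.get("uri"):
        return resource["uri"]
    base = catalog_uri(config)
    if base is None:
        return None
    return f"{base}/dataset/{resource['package_id']}/resource/{resource['id']}"


config = {"ckanext.dcat.base_uri": "https://my-site.org/uris/"}
print(resource_uri({"package_id": "my-dataset", "id": "res-1"}, config))
```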


## Content negotiation

The extension supports returning different representations of the datasets based on the value of the `Accept` header ([Content negotiation](https://en.wikipedia.org/wiki/Content_negotiation)).
The extension supports returning different representations of the datasets based on the value of the `Accept` header ([Content negotiation](https://en.wikipedia.org/wiki/Content_negotiation)). This is turned off by default; to enable it, set [`ckanext.dcat.enable_content_negotiation`](configuration.md#ckanextdcatenable_content_negotiation).

!!! Note

This feature overrides the CKAN core home page and dataset page view routes,
so you probably don't want to enable it if your own extension is also doing it.


When enabled, client applications can request a particular format via the `Accept` header on requests to the main dataset page, eg:

@@ -147,9 +154,3 @@ When enabled, client applications can request a particular format via the `Accep
This is also supported on the [catalog endpoint](#catalog-endpoint), in this case when making a request to the CKAN root URL (home page). This won't support the pagination and filter parameters:

curl https://{ckan-instance-host} -H Accept:text/turtle
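
For clients written in Python, an equivalent of the `curl` call above might look like the sketch below. The helper name `build_negotiated_request` and the host `ckan.example.org` are placeholders, and the request is only constructed here, not sent.

```python
from urllib.request import Request


def build_negotiated_request(url, media_type="text/turtle"):
    """Prepare a GET request asking for a specific RDF serialization."""
    return Request(url, headers={"Accept": media_type})


# Placeholder host; replace with your own CKAN instance.
request = build_negotiated_request("https://ckan.example.org")
print(request.full_url, request.get_header("Accept"))
```

Passing the prepared request to `urllib.request.urlopen()` would perform the call against a site that has content negotiation enabled.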

Note that this feature overrides the CKAN core home page and dataset page controllers, so you probably don't want to enable it if your own extension is also doing it.

To enable content negotiation, set the following configuration option on your ini file:

ckanext.dcat.enable_content_negotiation = True