Revamped documentation for 2.0 #296

Merged
merged 11 commits into from
Aug 30, 2024
17 changes: 17 additions & 0 deletions .readthedocs.yaml
@@ -0,0 +1,17 @@
# Read the Docs configuration file for MkDocs projects
# See https://docs.readthedocs.io/en/stable/config-file/v2.html for details

# Required
version: 2

build:
  os: ubuntu-22.04
  tools:
    python: "3.12"

mkdocs:
  configuration: mkdocs.yml

python:
  install:
    - requirements: docs/requirements.txt
199 changes: 98 additions & 101 deletions CHANGELOG.md

Large diffs are not rendered by default.

1,278 changes: 12 additions & 1,266 deletions README.md

Large diffs are not rendered by default.

7 changes: 6 additions & 1 deletion ckanext/dcat/config_declaration.yml
@@ -63,14 +63,19 @@ groups:
serializations.
type: bool

- key: ckanext.dcat.base_uri
description: |
Base URI to use when generating URIs for all entities. It needs to be a valid URI value.
example: 'https://my-site.org/uri/'

- key: ckanext.dcat.catalog_endpoint
default: '/catalog.{_format}'
description: |
Custom route for the catalog endpoint. It should start with `/` and include the
`{_format}` placeholder.
example: '/dcat/catalog/{_format}'

- key: ckanext.dcat.dataset_per_page
- key: ckanext.dcat.datasets_per_page
default: 100
type: int
description: |
Binary file added docs/_assets/ckan.ico
Binary file not shown.
Binary file added docs/_assets/logo.png
27 changes: 27 additions & 0 deletions docs/_css/extra.css
@@ -0,0 +1,27 @@
[data-md-color-scheme="ckan"] {
--md-primary-fg-color: #2980b9;
--md-primary-fg-color--light: #ECB7B7;
--md-primary-fg-color--dark: #90030C;
}

[data-md-color-scheme="slate"] {
--md-primary-fg-color: #2980b9;
--md-primary-fg-color--light: #ECB7B7;
--md-primary-fg-color--dark: #90030C;
--md-hue: 210;
}


[data-md-toggle="search"]:not(:checked) ~ .md-header .md-search__form::after {
position: absolute;
top: .3rem;
right: .3rem;
display: block;
padding: .1rem .4rem;
color: var(--md-default-bg-color);
font-weight: bold;
font-size: .8rem;
border: .05rem solid var(--md-default-bg-color--lighter);
border-radius: .1rem;
content: "/";
}
1 change: 1 addition & 0 deletions docs/changelog.md
@@ -0,0 +1 @@
--8<-- "CHANGELOG.md"
16 changes: 16 additions & 0 deletions docs/cli.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,16 @@
The `ckan dcat` command offers utilities to transform DCAT RDF serializations into CKAN datasets (`ckan dcat consume`) and
vice versa (`ckan dcat produce`). In both cases the input can be provided as a path to a file:

    ckan dcat consume -f ttl examples/dcat/dataset.ttl

    ckan dcat produce -f jsonld examples/ckan/ckan_datasets.json

or be read from stdin:

    ckan dcat consume -

The latter form allows chaining commands for more complex metadata processing, e.g.:

    curl https://demo.ckan.org/api/action/package_search | jq .result.results | ckan dcat produce -f jsonld -

For the full list of options check `ckan dcat consume --help` and `ckan dcat produce --help`.
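
If `jq` is not available, the filtering step of the pipeline above can be approximated in Python. This is only a sketch: `extract_datasets` is a hypothetical helper name, and the response shape (a `result` object holding a `results` list) is the one implied by the `jq .result.results` filter.

```python
import json


def extract_datasets(package_search_response: str) -> list:
    """Return the dataset list from a package_search JSON response.

    Mirrors the `jq .result.results` filter: the response is assumed to be
    a JSON object with a `result` object that holds a `results` list.
    """
    payload = json.loads(package_search_response)
    return payload["result"]["results"]


# Tiny stand-in for a real API response (made-up data, for illustration only).
sample = '{"success": true, "result": {"count": 1, "results": [{"name": "my-dataset"}]}}'
datasets_json = json.dumps(extract_datasets(sample))
```

The resulting JSON string is what would be piped into `ckan dcat produce -f jsonld -`.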
149 changes: 149 additions & 0 deletions docs/configuration.md
@@ -0,0 +1,149 @@
<!-- start-config -->

### General settings

#### ckanext.dcat.rdf.profiles

Example:

```
ckanext.dcat.rdf.profiles = euro_dcat_ap_2 my_local_ap
```

Default value: `euro_dcat_ap_2`

RDF profiles to use when parsing and serializing. See https://github.com/ckan/ckanext-dcat#profiles
for more details.


#### ckanext.dcat.translate_keys

Default value: `True`

If set to True, the plugin will automatically translate the keys of the DCAT
fields used in the frontend (at least those present in the `ckanext/dcat/i18n`
po files).


### Parsers / Serializers settings

#### ckanext.dcat.output_spatial_format

Default value: `wkt`

Format to use for geometries when serializing RDF documents. The default is
recommended, as it is the format expected by GeoDCAT. Alternatively you can
use `geojson` (or both, which will make SHACL validation fail).


#### ckanext.dcat.resource.inherit.license

Default value: `False`

If there is no license defined for a resource / distribution, inherit it from
the dataset.


#### ckanext.dcat.normalize_ckan_format

Default value: `True`

When true, the extension tries to match the resource label against the standard
list of CKAN formats (https://github.com/ckan/ckan/blob/master/ckan/config/resource_formats.json).
This allows, for instance, populating the CKAN resource format field
with a value that view plugins and similar tools will understand (`csv`, `xml`, etc.).


#### ckanext.dcat.clean_tags

Default value: `False`

Remove special characters from keywords (using the old `munge_tag()` CKAN function).
This is generally not needed.


### Endpoints settings

#### ckanext.dcat.enable_rdf_endpoints

Default value: `True`

Whether to expose the catalog and dataset endpoints with the RDF DCAT
serializations.

#### ckanext.dcat.base_uri

Example:

```
https://my-site.org/uris/
```

Base URI to use when generating URIs for all entities. It needs to be a valid URI value.

#### ckanext.dcat.catalog_endpoint

Example:

```
ckanext.dcat.catalog_endpoint = /dcat/catalog/{_format}
```

Default value: `/catalog.{_format}`

Custom route for the catalog endpoint. It should start with `/` and include the
`{_format}` placeholder.


#### ckanext.dcat.datasets_per_page

Default value: `100`

Default number of datasets returned by the catalog endpoint.


#### ckanext.dcat.enable_content_negotiation

Default value: `False`

Enable content negotiation in the main catalog and dataset endpoints. Note that
setting this to True overrides the core `home.index` and `dataset.read` endpoints.


### Harvester settings

#### ckanext.dcat.max_file_size

Default value: `50`

Maximum file size (in megabytes) that will be downloaded for parsing by the harvesters.


#### ckanext.dcat.expose_subcatalogs

Default value: `False`

Store information about the origin catalog when harvesting datasets.
See https://github.com/ckan/ckanext-dcat#transitive-harvesting for more details.


### Deprecated options (will be removed in future versions)

#### ckanext.dcat.compatibility_mode

Default value: `False`

Whether to modify some fields to maintain compatibility with previous versions
of the ckanext-dcat parsers.


#### ckanext.dcat.json_endpoint

Default value: `/dcat.json`

Custom route to expose the legacy JSON endpoint.



<!-- end-config -->

156 changes: 156 additions & 0 deletions docs/endpoints.md
@@ -0,0 +1,156 @@
# RDF DCAT endpoints

By default, when the `dcat` plugin is enabled, the following RDF endpoints are available on your CKAN instance. The schema used in the serializations can be customized using [profiles](profiles.md#profiles).

To disable the RDF endpoints, set the [`ckanext.dcat.enable_rdf_endpoints`](configuration.md#ckanextdcatenable_rdf_endpoints) option to `False` in your ini file.


## Dataset endpoints

RDF representations of a particular dataset can be accessed using the following endpoint:

https://{ckan-instance-host}/dataset/{dataset-id}.{format}

The file extension determines the RDF serialization format returned. The currently supported values are:

| Extension | Format | Media Type |
|-----------|-------------------------------------------------------------|---------------------|
| `xml` | [RDF/XML](https://en.wikipedia.org/wiki/RDF/XML) | application/rdf+xml |
| `ttl` | [Turtle](https://en.wikipedia.org/wiki/Turtle_%28syntax%29) | text/turtle |
| `n3` | [Notation3](https://en.wikipedia.org/wiki/Notation3) | text/n3 |
| `jsonld` | [JSON-LD](http://json-ld.org/) | application/ld+json |

The fallback `rdf` format defaults to RDF/XML.
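
As a quick illustration, the table above can be expressed as a lookup that client code might use to build dataset URLs. This is a sketch only: `RDF_MEDIA_TYPES` and `dataset_rdf_url` are hypothetical names, not part of the extension.

```python
# Extension -> media type, copied from the table above.
# `rdf` is the fallback extension and is served as RDF/XML.
RDF_MEDIA_TYPES = {
    "xml": "application/rdf+xml",
    "ttl": "text/turtle",
    "n3": "text/n3",
    "jsonld": "application/ld+json",
    "rdf": "application/rdf+xml",
}


def dataset_rdf_url(site_url: str, dataset_id: str, extension: str = "rdf") -> str:
    """Build the RDF endpoint URL for a dataset (hypothetical helper)."""
    if extension not in RDF_MEDIA_TYPES:
        raise ValueError(f"Unsupported RDF extension: {extension!r}")
    # Drop a trailing slash so the path is not doubled up.
    return f"{site_url.rstrip('/')}/dataset/{dataset_id}.{extension}"
```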

Here's an example of the different formats:

* [https://opendata.swiss/en/dataset/verbreitung-der-steinbockkolonien.rdf](https://opendata.swiss/en/dataset/verbreitung-der-steinbockkolonien.rdf)
* [https://opendata.swiss/en/dataset/verbreitung-der-steinbockkolonien.xml](https://opendata.swiss/en/dataset/verbreitung-der-steinbockkolonien.xml)
* [https://opendata.swiss/en/dataset/verbreitung-der-steinbockkolonien.ttl](https://opendata.swiss/en/dataset/verbreitung-der-steinbockkolonien.ttl)
* [https://opendata.swiss/en/dataset/verbreitung-der-steinbockkolonien.n3](https://opendata.swiss/en/dataset/verbreitung-der-steinbockkolonien.n3)
* [https://opendata.swiss/en/dataset/verbreitung-der-steinbockkolonien.jsonld](https://opendata.swiss/en/dataset/verbreitung-der-steinbockkolonien.jsonld)

RDF representations will be advertised using `<link rel="alternate">` tags on the `<head>` section of the dataset page source code, e.g.:

```html
<head>

<link rel="alternate" type="application/rdf+xml" href="http://demo.ckan.org/dataset/34315559-2b08-44eb-a2e6-ebe9ce1a266b.rdf"/>
<link rel="alternate" type="text/turtle" href="http://demo.ckan.org/dataset/34315559-2b08-44eb-a2e6-ebe9ce1a266b.ttl"/>
<!-- ... -->

</head>
```
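
A client could discover these links with the standard library HTML parser. The sketch below is an assumption about how one might consume the tags, not code from the extension; `AlternateLinkParser` and `find_alternates` are hypothetical names.

```python
from html.parser import HTMLParser


class AlternateLinkParser(HTMLParser):
    """Collect (media type, href) pairs from <link rel="alternate"> tags."""

    def __init__(self):
        super().__init__()
        self.alternates = []

    def handle_starttag(self, tag, attrs):
        # Also called for self-closing tags such as <link ... />.
        if tag != "link":
            return
        attributes = dict(attrs)
        if attributes.get("rel") == "alternate" and "href" in attributes:
            self.alternates.append((attributes.get("type"), attributes["href"]))


def find_alternates(html: str) -> list:
    """Return every alternate representation advertised in the page."""
    parser = AlternateLinkParser()
    parser.feed(html)
    return parser.alternates
```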

Check the [RDF DCAT Serializer](profiles.md#rdf-dcat-serializer) section for more details about how these are generated and how to customize the output using [profiles](profiles.md#profiles).


You can specify the profile by using the `profiles=<profile1>,<profile2>` query parameter on the dataset endpoint (as a comma-separated list):

* [https://opendata.swiss/en/dataset/verbreitung-der-steinbockkolonien.xml?profiles=euro_dcat_ap](https://opendata.swiss/en/dataset/verbreitung-der-steinbockkolonien.xml?profiles=euro_dcat_ap)
* [https://opendata.swiss/en/dataset/verbreitung-der-steinbockkolonien.jsonld?profiles=schemaorg](https://opendata.swiss/en/dataset/verbreitung-der-steinbockkolonien.jsonld?profiles=schemaorg)



## Catalog endpoint

In addition to the individual dataset representations, the extension also offers a catalog-wide endpoint for retrieving multiple datasets at the same time (the datasets are paginated, see below for details):

    https://{ckan-instance-host}/catalog.{format}?[page={page}]&[modified_since={date}]&[profiles={profile1},{profile2}]&[q={query}]&[fq={filter query}]

The base path of this endpoint can be customized if necessary using the [`ckanext.dcat.catalog_endpoint`](configuration.md#ckanextdcatcatalog_endpoint) configuration option, e.g.:

    ckanext.dcat.catalog_endpoint = /dcat/catalog/{_format}

The custom endpoint **must** start with a forward slash (`/`) and contain the `{_format}` placeholder.
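
The two rules above are easy to check before deploying a configuration change. The function below is a hypothetical helper that only mirrors the stated rules; it is not the validation the extension itself performs.

```python
def is_valid_catalog_endpoint(route: str) -> bool:
    """Check the two documented rules for a custom catalog route."""
    return route.startswith("/") and "{_format}" in route
```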

As with the dataset endpoints, the file extension determines the RDF serialization format returned:

* http://demo.ckan.org/catalog.rdf
* http://demo.ckan.org/catalog.xml
* http://demo.ckan.org/catalog.ttl

RDF representations will be advertised using `<link rel="alternate">` tags on the `<head>` section of the catalog homepage and the dataset search page source code, e.g.:

```html
<head>

<link rel="alternate" type="application/rdf+xml" href="http://demo.ckan.org/catalog.rdf"/>
<link rel="alternate" type="application/rdf+xml" href="http://demo.ckan.org/catalog.xml"/>
<link rel="alternate" type="text/turtle" href="http://demo.ckan.org/catalog.ttl"/>
<!-- ... -->

</head>
```

The number of datasets returned is limited. The response will include paging info, serialized using the [Hydra](http://www.w3.org/ns/hydra/spec/latest/core/) vocabulary. Clients can follow the `hydra:first`, `hydra:next` and `hydra:last` links to iterate over the catalog, and `hydra:totalItems` gives the total number of datasets:

```turtle
@prefix hydra: <http://www.w3.org/ns/hydra/core#> .

<http://example.com/catalog.ttl?page=1> a hydra:PagedCollection ;
    hydra:first "http://example.com/catalog.ttl?page=1" ;
    hydra:last "http://example.com/catalog.ttl?page=3" ;
    hydra:next "http://example.com/catalog.ttl?page=2" ;
    hydra:totalItems 283 .
```

The default number of datasets returned (100) can be modified by CKAN site maintainers using [`ckanext.dcat.datasets_per_page`](configuration.md#ckanextdcatdatasets_per_page).
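
The figures in the Hydra example are consistent with the default page size: 283 datasets at 100 per page need 3 pages, which matches the `hydra:last` link. A short sketch of that arithmetic follows (`page_count` and `page_urls` are hypothetical helpers; a real client would normally follow `hydra:next` rather than compute the URLs itself):

```python
import math


def page_count(total_items: int, datasets_per_page: int = 100) -> int:
    """Number of catalog pages needed to list `total_items` datasets."""
    if datasets_per_page < 1:
        raise ValueError("datasets_per_page must be a positive integer")
    return math.ceil(total_items / datasets_per_page)


def page_urls(catalog_url: str, total_items: int, datasets_per_page: int = 100) -> list:
    """List the URL of every page, using the `page` query parameter."""
    pages = page_count(total_items, datasets_per_page)
    return [f"{catalog_url}?page={page}" for page in range(1, pages + 1)]
```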

The catalog endpoint also supports a `modified_since` parameter to restrict the results to datasets modified since a given date. The parameter value should be a valid ISO-8601 date:

    http://demo.ckan.org/catalog.xml?modified_since=2015-07-24

It is possible to specify the profile(s) to use for the serialization using the `profiles` parameter:

    http://demo.ckan.org/catalog.xml?profiles=euro_dcat_ap,sweden_dcat_ap

To filter the output, the catalog endpoint supports the `q` and `fq` parameters to specify a [search query](https://solr.apache.org/guide/solr/latest/query-guide/dismax-query-parser.html#q-parameter) or [filter query](https://solr.apache.org/guide/solr/latest/query-guide/common-query-parameters.html#fq-filter-query-parameter):

    http://demo.ckan.org/catalog.xml?q=budget
    http://demo.ckan.org/catalog.xml?fq=tags:economy
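
Putting the query parameters together, a client might assemble catalog URLs along these lines. This is a sketch that assumes the default `/catalog.{format}` route; `catalog_url` is a hypothetical helper and not part of the extension.

```python
from datetime import date
from urllib.parse import urlencode


def catalog_url(site_url, fmt="xml", page=None, modified_since=None,
                profiles=None, q=None, fq=None):
    """Build a catalog endpoint URL from the documented query parameters."""
    params = {}
    if page is not None:
        params["page"] = page
    if modified_since is not None:
        # Raises ValueError if the value is not an ISO 8601 calendar date.
        params["modified_since"] = date.fromisoformat(modified_since).isoformat()
    if profiles:
        # Profiles are passed as a single comma-separated value.
        params["profiles"] = ",".join(profiles)
    if q:
        params["q"] = q
    if fq:
        params["fq"] = fq
    url = f"{site_url.rstrip('/')}/catalog.{fmt}"
    # Keep commas and colons unescaped so the URLs match the examples above.
    return f"{url}?{urlencode(params, safe=',:')}" if params else url
```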



## URIs

Whenever possible, URIs are generated for the relevant entities. To generate them, the extension will use the first available of the following for each entity:

* Catalog:
    - [`ckanext.dcat.base_uri`](configuration.md#ckanextdcatbase_uri) configuration option value. This is the recommended approach. Value should be a valid URI.
    - [`ckan.site_url`](https://docs.ckan.org/en/latest/maintaining/configuration.html#ckan-site-url) configuration option value.
    - 'http://' + `app_instance_uuid` configuration option value. This is not recommended, and a warning log message will be shown.

* Dataset:
    - The value of the `uri` field (note that this is not included in the default CKAN schema)
    - The value of an extra with key `uri`
    - Catalog URI (see above) + '/dataset/' + `id` field

* Resource:
    - The value of the `uri` field (note that this is not included in the default CKAN schema)
    - Catalog URI (see above) + '/dataset/' + `package_id` field + '/resource/' + `id` field

Note that if you are using the [RDF DCAT harvester](harvester.md) to import datasets from other catalogs and these define a proper URI for each dataset or resource, these will be stored as `uri` fields in your instance, and so used when generating serializations for them.
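
The fallback order for catalog and dataset URIs can be sketched as follows. This is an illustration of the rules listed above, not the extension's actual code: the function names are hypothetical, the configuration is modelled as a plain dict, and stripping a trailing slash from the catalog URI before appending the path is an assumption.

```python
def catalog_uri(config: dict) -> str:
    """Return the first available catalog URI source."""
    for key in ("ckanext.dcat.base_uri", "ckan.site_url"):
        if config.get(key):
            return config[key]
    # Last resort, not recommended.
    return "http://" + config["app_instance_uuid"]


def dataset_uri(dataset: dict, config: dict) -> str:
    """Return the first available dataset URI source."""
    # 1. A `uri` field on the dataset itself.
    if dataset.get("uri"):
        return dataset["uri"]
    # 2. An extra with key `uri` (CKAN extras are key/value dicts).
    for extra in dataset.get("extras", []):
        if extra.get("key") == "uri" and extra.get("value"):
            return extra["value"]
    # 3. Built from the catalog URI (trailing slash handling is an assumption).
    return catalog_uri(config).rstrip("/") + "/dataset/" + dataset["id"]
```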


## Content negotiation

The extension supports returning different representations of the datasets based on the value of the `Accept` header ([Content negotiation](https://en.wikipedia.org/wiki/Content_negotiation)). This is turned off by default. To enable it, set [`ckanext.dcat.enable_content_negotiation`](configuration.md#ckanextdcatenable_content_negotiation) to `True`.

!!! Note

    This feature overrides the CKAN core home page and dataset page view routes,
    so you probably don't want to enable it if your own extension is also doing it.


When enabled, client applications can request a particular format via the `Accept` header on requests to the main dataset page, e.g.:

    curl https://{ckan-instance-host}/dataset/{dataset-id} -H Accept:text/turtle

    curl https://{ckan-instance-host}/dataset/{dataset-id} -H Accept:"application/rdf+xml; q=1.0, application/ld+json; q=0.6"

This is also supported on the [catalog endpoint](#catalog-endpoint), in this case when making a request to the CKAN root URL (home page). Pagination and filter parameters are not supported in this mode:

    curl https://{ckan-instance-host} -H Accept:text/turtle
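
To make the role of the `q` values concrete, here is a simplified sketch of how a server could pick a format from an `Accept` header. It is not the extension's implementation: it ignores wildcards such as `*/*`, the names `SUPPORTED`, `parse_accept` and `best_rdf_type` are hypothetical, and the list of media types is taken from the table in the dataset endpoints section.

```python
SUPPORTED = ("application/rdf+xml", "text/turtle", "text/n3", "application/ld+json")


def parse_accept(header: str) -> list:
    """Return (media type, q) pairs sorted by descending q value."""
    choices = []
    for part in header.split(","):
        fields = [field.strip() for field in part.split(";")]
        media_type = fields[0].lower()
        if not media_type:
            continue
        quality = 1.0  # default weight when no q parameter is given
        for field in fields[1:]:
            name, _, value = field.partition("=")
            if name.strip().lower() == "q":
                try:
                    quality = float(value)
                except ValueError:
                    quality = 0.0  # treat a malformed weight as "not acceptable"
        choices.append((media_type, quality))
    # sorted() is stable, so equal weights keep their order in the header.
    return sorted(choices, key=lambda choice: choice[1], reverse=True)


def best_rdf_type(header: str):
    """Return the preferred supported RDF media type, or None."""
    for media_type, quality in parse_accept(header):
        if quality > 0 and media_type in SUPPORTED:
            return media_type
    return None
```

With the second `curl` example above, this sketch would select `application/rdf+xml`, since it carries the highest weight.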