Add config declaration, document options #293

Merged 4 commits on Aug 22, 2024
150 changes: 149 additions & 1 deletion README.md
@@ -50,6 +50,7 @@ Check the [overview](#overview) section for a summary of the available features.
- [Translation of fields](#translation-of-fields)
- [Structured data and Google Dataset Search indexing](#structured-data-and-google-dataset-search-indexing)
- [CLI](#cli)
- [Configuration reference](#configuration-reference)
- [Running the Tests](#running-the-tests)
- [Releases](#releases)
- [Acknowledgements](#acknowledgements)
@@ -95,7 +96,7 @@ These are implemented internally using:

3. Enable the required plugins in your ini file:

-    ckan.plugins = dcat dcat_rdf_harvester dcat_json_harvester dcat_json_interface structured_data
+    ckan.plugins = dcat dcat_rdf_harvester structured_data

4. To use the pre-built schemas, install [ckanext-scheming](https://github.com/ckan/ckanext-scheming):

@@ -105,6 +106,11 @@ Check the [Schemas](#schemas) section for extra configuration needed.

Optionally, if you want to use the RDF harvester, install ckanext-harvest as well ([https://github.com/ckan/ckanext-harvest#installation](https://github.com/ckan/ckanext-harvest#installation)).

For the full list of available configuration options, see the [Configuration reference](#configuration-reference).




## Schemas

The extension includes ready-to-use [ckanext-scheming](https://github.com/ckan/ckanext-scheming) schemas that enable DCAT support. These include a schema definition file (located in `ckanext/dcat/schemas`) plus extra validators and other custom logic that integrates the metadata modifications with the RDF DCAT [Parsers](#rdf-dcat-parser) and [Serializers](#rdf-dcat-serializer) and other CKAN features and extensions.
@@ -1142,6 +1148,148 @@ The latter form allows chaining commands for more complex metadata processing,

For the full list of options check `ckan dcat consume --help` and `ckan dcat produce --help`.

## Configuration reference

<!-- start-config -->

### General settings

#### ckanext.dcat.rdf.profiles

Example:

```
ckanext.dcat.rdf.profiles = euro_dcat_ap_2 my_local_ap
```

Default value: `euro_dcat_ap_2`

RDF profiles to use when parsing and serializing. See https://github.com/ckan/ckanext-dcat#profiles
for more details.


#### ckanext.dcat.translate_keys

Default value: `True`

If set to True, the plugin will automatically translate the keys of the DCAT
fields used in the frontend (at least those present in the `ckanext/dcat/i18n`
po files).


### Parsers / Serializers settings

#### ckanext.dcat.output_spatial_format

Default value: `wkt`

Format to use for geometries when serializing RDF documents. The default is
recommended, as it is the format expected by GeoDCAT. Alternatively, you can
use `geojson` (or both, although this will make SHACL validation fail).
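
To illustrate how a whitespace-separated value for this option could be interpreted, here is a minimal sketch in plain Python. The helper name `parse_spatial_formats` and the set of allowed values are assumptions made for this example (taken from the description above), not part of ckanext-dcat; in a real site CKAN's own config machinery handles list options.

```python
# Assumed from the option description above: only these two values are mentioned.
ALLOWED_FORMATS = {"wkt", "geojson"}


def parse_spatial_formats(raw_value):
    """Split a whitespace-separated config value into a list of formats.

    Hypothetical helper: falls back to the documented default (`wkt`)
    when the value is empty, and rejects values not mentioned in the docs.
    """
    formats = (raw_value or "").split() or ["wkt"]
    unknown = [f for f in formats if f not in ALLOWED_FORMATS]
    if unknown:
        raise ValueError("Unsupported spatial format(s): " + ", ".join(unknown))
    return formats


print(parse_spatial_formats(""))             # falls back to the default
print(parse_spatial_formats("wkt geojson"))  # both formats requested
```

Requesting both formats is accepted by this sketch, but as noted above it is expected to make SHACL validation fail.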


#### ckanext.dcat.resource.inherit.license

Default value: `False`

If there is no license defined for a resource / distribution, inherit it from
the dataset.


#### ckanext.dcat.normalize_ckan_format

Default value: `True`

When true, the extension will try to match the resource label against the standard
list of CKAN formats (https://github.com/ckan/ckan/blob/master/ckan/config/resource_formats.json).
This allows, for instance, populating the CKAN resource format field
with a value that view plugins, etc. will understand (`csv`, `xml`, etc.).


#### ckanext.dcat.clean_tags

Default value: `False`

Remove special characters from keywords (using the old `munge_tag()` CKAN function).
This is generally not needed.


### Endpoints settings

#### ckanext.dcat.enable_rdf_endpoints

Default value: `True`

Whether to expose the catalog and dataset endpoints with the RDF DCAT
serializations.


#### ckanext.dcat.catalog_endpoint

Example:

```
ckanext.dcat.catalog_endpoint = /dcat/catalog/{_format}
```

Default value: `/catalog.{_format}`

Custom route for the catalog endpoint. It should start with `/` and include the
`{_format}` placeholder.
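
The two requirements above (a leading `/` and the `{_format}` placeholder) can be checked with a few lines of Python. This is only a sketch: `validate_catalog_endpoint` and `build_catalog_path` are hypothetical helpers written for this example, not functions provided by ckanext-dcat.

```python
PLACEHOLDER = "{_format}"


def validate_catalog_endpoint(route):
    """Check a custom catalog route against the two documented rules."""
    if not route.startswith("/"):
        raise ValueError("The catalog endpoint must start with '/'")
    if PLACEHOLDER not in route:
        raise ValueError("The catalog endpoint must include " + PLACEHOLDER)
    return route


def build_catalog_path(route, _format):
    """Fill in the placeholder to get the path for one serialization format."""
    return validate_catalog_endpoint(route).replace(PLACEHOLDER, _format)


print(build_catalog_path("/catalog.{_format}", "rdf"))
print(build_catalog_path("/dcat/catalog/{_format}", "ttl"))
```

A plain string replacement is used here instead of `str.format` so that any other braces in a custom route are left untouched.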


#### ckanext.dcat.dataset_per_page

Default value: `100`

Default number of datasets returned by the catalog endpoint.


#### ckanext.dcat.enable_content_negotiation

Default value: `False`

Enable content negotiation in the main catalog and dataset endpoints. Note that
setting this to True overrides the core `home.index` and `dataset.read` endpoints.


### Harvester settings

#### ckanext.dcat.max_file_size

Default value: `50`

Maximum file size (in megabytes) that will be downloaded for parsing by the harvesters.


#### ckanext.dcat.expose_subcatalogs

Default value: `False`

Store information about the origin catalog when harvesting datasets.
See https://github.com/ckan/ckanext-dcat#transitive-harvesting for more details.


### Deprecated options (will be removed in future versions)

#### ckanext.dcat.compatibility_mode

Default value: `False`

Whether to modify some fields to maintain compatibility with previous versions
of the ckanext-dcat parsers.


#### ckanext.dcat.json_endpoint

Default value: `/dcat.json`

Custom route to expose the legacy JSON endpoint.



<!-- end-config -->

## Running the Tests

To run the tests do:
115 changes: 115 additions & 0 deletions ckanext/dcat/config_declaration.yml
@@ -0,0 +1,115 @@
version: 1
groups:
  - annotation: General settings
    options:

      - key: ckanext.dcat.rdf.profiles
        default_callable: 'ckanext.dcat.processors:_get_default_rdf_profiles'
        description: |
          RDF profiles to use when parsing and serializing. See https://github.com/ckan/ckanext-dcat#profiles
          for more details.
        example: 'euro_dcat_ap_2 my_local_ap'

      - key: ckanext.dcat.translate_keys
        type: bool
        default: True
        description: |
          If set to True, the plugin will automatically translate the keys of the DCAT
          fields used in the frontend (at least those present in the `ckanext/dcat/i18n`
          po files).

  - annotation: Parsers / Serializers settings
    options:

      - key: ckanext.dcat.output_spatial_format
        type: list
        default:
          - 'wkt'
        description: |
          Format to use for geometries when serializing RDF documents. The default is
          recommended, as it is the format expected by GeoDCAT. Alternatively, you can
          use `geojson` (or both, although this will make SHACL validation fail).

      - key: ckanext.dcat.resource.inherit.license
        type: bool
        default: False
        description: |
          If there is no license defined for a resource / distribution, inherit it from
          the dataset.

      - key: ckanext.dcat.normalize_ckan_format
        type: bool
        default: True
        description: |
          When true, the extension will try to match the resource label against the standard
          list of CKAN formats (https://github.com/ckan/ckan/blob/master/ckan/config/resource_formats.json).
          This allows, for instance, populating the CKAN resource format field
          with a value that view plugins, etc. will understand (`csv`, `xml`, etc.).

      - key: ckanext.dcat.clean_tags
        type: bool
        default: False
        description: |
          Remove special characters from keywords (using the old `munge_tag()` CKAN function).
          This is generally not needed.

  - annotation: Endpoints settings
    options:

      - key: ckanext.dcat.enable_rdf_endpoints
        default: True
        description: |
          Whether to expose the catalog and dataset endpoints with the RDF DCAT
          serializations.
        type: bool

      - key: ckanext.dcat.catalog_endpoint
        default: '/catalog.{_format}'
        description: |
          Custom route for the catalog endpoint. It should start with `/` and include the
          `{_format}` placeholder.
        example: '/dcat/catalog/{_format}'

      - key: ckanext.dcat.dataset_per_page
        default: 100
        type: int
        description: |
          Default number of datasets returned by the catalog endpoint.

      - key: ckanext.dcat.enable_content_negotiation
        default: False
        type: bool
        description: |
          Enable content negotiation in the main catalog and dataset endpoints. Note that
          setting this to True overrides the core `home.index` and `dataset.read` endpoints.

  - annotation: Harvester settings
    options:

      - key: ckanext.dcat.max_file_size
        type: int
        default: 50
        description: |
          Maximum file size (in megabytes) that will be downloaded for parsing by the harvesters.

      - key: ckanext.dcat.expose_subcatalogs
        type: bool
        default: false
        description: |
          Store information about the origin catalog when harvesting datasets.
          See https://github.com/ckan/ckanext-dcat#transitive-harvesting for more details.

  - annotation: Deprecated options (will be removed in future versions)
    options:

      - key: ckanext.dcat.compatibility_mode
        type: bool
        default: False
        description: |
          Whether to modify some fields to maintain compatibility with previous versions
          of the ckanext-dcat parsers.

      - key: ckanext.dcat.json_endpoint
        default: '/dcat.json'
        description: |
          Custom route to expose the legacy JSON endpoint.
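
The entries in this declaration file line up one-to-one with the sections of the README configuration reference. As a rough illustration of that mapping, the sketch below renders a single option (represented as a plain dict) into the same Markdown layout. The `render_option` helper is hypothetical and written only for this example; it is not the tool that produced the README section.

```python
FENCE = "`" * 3  # built dynamically to avoid nesting literal code fences


def render_option(option):
    """Render one declared option as a README-style Markdown section."""
    lines = ["#### {}".format(option["key"]), ""]
    if "example" in option:
        lines += [
            "Example:",
            "",
            FENCE,
            "{} = {}".format(option["key"], option["example"]),
            FENCE,
            "",
        ]
    if "default" in option:
        lines += ["Default value: `{}`".format(option["default"]), ""]
    lines.append(option["description"].strip())
    return "\n".join(lines)


# Values copied from the ckanext.dcat.catalog_endpoint entry in the declaration.
option = {
    "key": "ckanext.dcat.catalog_endpoint",
    "default": "/catalog.{_format}",
    "example": "/dcat/catalog/{_format}",
    "description": (
        "Custom route for the catalog endpoint. It should start with `/` "
        "and include the\n`{_format}` placeholder.\n"
    ),
}
print(render_option(option))
```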
8 changes: 8 additions & 0 deletions ckanext/dcat/plugins/__init__.py
@@ -30,6 +30,13 @@
I18N_DIR = os.path.join(HERE, u"../i18n")


def config_declaration(func):
    if p.toolkit.check_ckan_version(min_version="2.10.0"):
        return p.toolkit.blanket.config_declarations(func)
    else:
        return func


def _get_dataset_schema(dataset_type="dataset"):
    schema = None
    try:
@@ -43,6 +50,7 @@ def _get_dataset_schema(dataset_type="dataset"):
    return schema


@config_declaration
class DCATPlugin(p.SingletonPlugin, DefaultTranslation):

    p.implements(p.IConfigurer, inherit=True)
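
The `config_declaration` wrapper above applies a decorator only when the running CKAN version supports it, and otherwise returns the class unchanged. The same pattern can be sketched without CKAN installed. Everything in the block below (`version_tuple`, `conditional_decorator`, `mark_declared` and the two plugin classes) is a hypothetical stand-in for illustration, and the version parsing is deliberately simplistic (numeric parts only).

```python
def version_tuple(version):
    """Turn '2.10.0' into (2, 10, 0) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def conditional_decorator(decorator, current_version, min_version):
    """Return `decorator` if the version is recent enough, else a no-op."""
    if version_tuple(current_version) >= version_tuple(min_version):
        return decorator
    return lambda obj: obj


def mark_declared(cls):
    """Stand-in for the real decorator: just flags the class."""
    cls.declared = True
    return cls


@conditional_decorator(mark_declared, "2.10.1", "2.10.0")
class NewPlugin:
    declared = False


@conditional_decorator(mark_declared, "2.9.5", "2.10.0")
class OldPlugin:
    declared = False


print(NewPlugin.declared, OldPlugin.declared)
```

Comparing tuples of integers matters here: as plain strings, `"2.9.5"` would sort after `"2.10.0"`.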
5 changes: 5 additions & 0 deletions ckanext/dcat/processors.py
@@ -31,6 +31,11 @@
DEFAULT_RDF_PROFILES = ['euro_dcat_ap_2']


def _get_default_rdf_profiles():
    """Helper function used for documenting the rdf profiles config option"""
    return " ".join(DEFAULT_RDF_PROFILES)


class RDFProcessor(object):

    def __init__(self, profiles=None, dataset_type='dataset', compatibility_mode=False):