This is the WRI Open Data Portal extension for CKAN. It contains CKAN backend customizations for this project.
| Variable | Description |
|---|---|
| `ckanext.wri.odp_url` | The URL of the frontend application, e.g., `https://wri.dev.frontend.datopian.com`. |
| `ckanext.wri.prefect_url` | The URL of the Prefect API, e.g., `https://wri.dev.prefect.datopian.com`. |
| `ckanext.wri.datapusher_deployment_name` | The name of the Datapusher deployment, e.g., `datapusher-dev`. |
| `ckanext.wri.migration_deployment_name` | The name of the migration deployment, e.g., `migration_deployment`. |
| `ckanext.wri.migration_flow_name` | The name of the migration flow, e.g., `trigger-migration`. |
| `ckanext.wri.migration_deployment_env` | The environment of the migration deployment, e.g., `dev`, `staging`, `prod`. |
| `ckanext.wri.api_analytics.measurement_id` | The Google Analytics measurement ID for API tracking, e.g., `G-XXXXXXXXXX`. |
| `ckanext.wri.api_analytics.api_secret` | The Google Analytics API secret for API tracking, e.g., `XXXXXXXXXX`. |
For information on the Prefect variables/blocks required for migrations, see the migration README.
This extension includes the following plugins:
- `wri` - The main plugin that loads ckanext-wri.
- `wri_api_tracking` - A plugin that tracks API usage with Google Analytics.
- `wri_harvester` - A custom harvester for ingesting datasets between WRI CKAN portals.
This extension includes a notification feature that uses its own database table, action endpoints, and custom validators. The feature adds three actions:
- `notification_create`
- `notification_update`
- `notification_get_all`

This extension adds a table named `notifications` to the database for the API notification feature. The table must be initialized the first time the extension is started.
To initiate the database setup, use the following command:
ckan -c <path-to-ini-file> notificationdb
Parameters for `notification_create`:
- recipient_id (string) – The user ID of the recipient of the notification (required).
- sender_id (string) – The user ID of the sender of the notification (required).
- activity_type (string) – The type of activity that triggers the notification, such as `dataset_create`, etc. (required).
- object_type (string) – The type of the object on which the action is being performed (e.g., dataset, resource, etc.) (required).
- object_id (string) – The ID of the object on which the action is being performed (required).

The parameters `time_sent` (set to the current timestamp), `state` (set to `active`), and `is_unread` (set to `false`) are set automatically during creation.
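As a rough illustration of the parameters above, the following sketch builds (but does not send) a `notification_create` request using only the Python standard library. The CKAN URL, token, and IDs are placeholders, the helper name `build_notification_request` is hypothetical, and the sketch assumes the action is exposed under the standard `/api/3/action/` path and accepts a JSON body like other CKAN actions.

```python
import json
import urllib.request

# The five fields documented as required for notification_create.
REQUIRED_FIELDS = ("recipient_id", "sender_id", "activity_type", "object_type", "object_id")


def build_notification_request(ckan_url, api_token, **fields):
    """Build a POST request for notification_create (hypothetical helper)."""
    missing = [name for name in REQUIRED_FIELDS if not fields.get(name)]
    if missing:
        raise ValueError("Missing required fields: " + ", ".join(missing))
    # time_sent, state, and is_unread are set by the server, so they are not sent.
    payload = {name: fields[name] for name in REQUIRED_FIELDS}
    return urllib.request.Request(
        url=ckan_url.rstrip("/") + "/api/3/action/notification_create",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Authorization": api_token, "Content-Type": "application/json"},
        method="POST",
    )


request = build_notification_request(
    "https://ckan.example.com",  # placeholder CKAN URL
    "YOUR_API_TOKEN",
    recipient_id="recipient-user-id",
    sender_id="sender-user-id",
    activity_type="dataset_create",
    object_type="dataset",
    object_id="dataset-id",
)
# To actually send it: urllib.request.urlopen(request)
```

Validating the required fields client-side is optional; the extension's own validators remain the source of truth.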
Parameters for `notification_update`:
- recipient_id (string) – The user ID of the recipient of the notification (required).
- sender_id (string) – The user ID of the sender of the notification (required).
- activity_type (string) – The type of activity that triggers the notification, such as `dataset_create`, etc. (required).
- object_type (string) – The type of the object on which the action is being performed (e.g., dataset, resource, etc.) (required).
- object_id (string) – The ID of the object on which the action is being performed (required).
- time_sent (datetime without timezone) – The timestamp of the sent time (required).
- is_unread (string) – Indicates whether the notification is read or not (required).
- state (string) – `active` or `deleted` (required).
`notification_get_all` returns a list of notifications for a sender or recipient.
Parameters:
- recipient_id (string) – The user ID of the recipient of the notification (optional, but either `recipient_id` or `sender_id` is required).
- sender_id (string) – The user ID of the sender of the notification (optional, but either `recipient_id` or `sender_id` is required).
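The "either `recipient_id` or `sender_id`" rule can be checked before calling the API. This is a minimal sketch with a hypothetical helper name and a placeholder CKAN URL; it assumes the action can be called with query-string parameters like other read-only CKAN actions, which this README does not confirm.

```python
from urllib.parse import urlencode


def notification_get_all_url(ckan_url, recipient_id=None, sender_id=None):
    """Build a notification_get_all URL (hypothetical helper)."""
    if not recipient_id and not sender_id:
        raise ValueError("Either recipient_id or sender_id is required")
    params = {}
    if recipient_id:
        params["recipient_id"] = recipient_id
    if sender_id:
        params["sender_id"] = sender_id
    return ckan_url.rstrip("/") + "/api/3/action/notification_get_all?" + urlencode(params)


url = notification_get_all_url("https://ckan.example.com", recipient_id="recipient-user-id")
```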
A pending dataset is dataset metadata that's been submitted for approval. While pending, the dataset metadata lives in `pending_datasets`, a table separate from the main `package` table. Once approved, the existing dataset is updated with the new metadata.
The `pending_datasets` table has the following columns:
| package_id | package_data | last_modified |
|---|---|---|
| text (PK) | jsonb | timestamp |
The `package_id` column is the UUID of the dataset (and it's the primary key). The `package_data` column contains the dataset metadata as a JSONB object. The `last_modified` column is a timestamp that is automatically generated whenever `package_data` is updated.
You can initialize the pending dataset table by running the following command:
ckan -c <path-to-ini-file> pendingdatasetsdb
Parameters:
- package_id (string) – The UUID of the dataset (required).
- package_data (JSON object) – The dataset metadata (required).
Creates a new pending dataset and returns the newly created pending dataset.
Parameters:
- package_id (string) – The UUID of the dataset (required).
- package_data (JSON object) – The dataset metadata (required).
Updates an existing pending dataset and returns the updated pending dataset.
Parameters:
- package_id (string) – The UUID of the dataset (required).
Deletes an existing pending dataset.
Parameters:
- package_id (string) – The UUID of the dataset (required).
Returns the pending dataset for the given `package_id`.
Parameters:
- package_id (string) – The UUID of the dataset (required).
Returns the diff between the pending dataset and the existing dataset for the given `package_id`.
Here's an example:
{
"help": "http://ckan-dev:5000/api/3/action/help_show?name=pending_diff_show",
"success": true,
"result": {
"title": {
"old_value": "My dataset title",
"new_value": "My better dataset title"
},
"application": {
"old_value": "",
"new_value": "wri"
},
"resources[0].description": {
"old_value": "My resource description",
"new_value": "My better resource description"
},
"resources[0].format": {
"old_value": "CSV",
"new_value": "HTML"
},
"resources[1].title": {
"old_value": "My resource title",
"new_value": "My better resource title"
},
"wri_data": {
"old_value": false,
"new_value": true
},
"cautions": {
"old_value": "",
"new_value": "This is a caution"
},
"languages": {
"old_value": [
"fr"
],
"new_value": [
"en"
]
},
"function": {
"old_value": "The function of this dataset is to x...",
"new_value": "The function of this dataset is to y..."
}
}
}
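To make the shape of that response easier to read, here is a minimal sketch of how a diff with keys such as `resources[0].format` could be computed from two metadata dictionaries. It is an illustration only, not the extension's implementation; the function names and the choice to report a missing value as an empty string are assumptions modeled on the example above.

```python
def flatten(value, prefix=""):
    """Flatten nested metadata into {"resources[0].format": leaf} style paths."""
    flat = {}
    if isinstance(value, dict):
        for key, child in value.items():
            path = prefix + "." + key if prefix else key
            flat.update(flatten(child, path))
    elif isinstance(value, list) and value and all(isinstance(item, dict) for item in value):
        # Lists of objects (e.g. resources) are indexed; plain lists are compared whole.
        for index, child in enumerate(value):
            flat.update(flatten(child, prefix + "[" + str(index) + "]"))
    else:
        flat[prefix] = value
    return flat


def diff(existing, pending):
    """Return {path: {"old_value": ..., "new_value": ...}} for changed paths."""
    old, new = flatten(existing), flatten(pending)
    changes = {}
    for path in sorted(old.keys() | new.keys()):
        old_value = old.get(path, "")  # assumption: missing values show as ""
        new_value = new.get(path, "")
        if old_value != new_value:
            changes[path] = {"old_value": old_value, "new_value": new_value}
    return changes


existing = {"title": "My dataset title", "languages": ["fr"], "resources": [{"format": "CSV"}]}
pending = {
    "title": "My better dataset title",
    "application": "wri",
    "languages": ["en"],
    "resources": [{"format": "HTML"}],
}
result = diff(existing, pending)
```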
This extension includes a migration API that allows users to migrate datasets from RW to CKAN.
Note: The functionality of this endpoint is limited to the user's permissions in CKAN. For example, if a user has the correct permissions to create a dataset but doesn't have permissions to add it to the Team or Topic specified, the request will return an authorization error.
Migrates an RW dataset/metadata to CKAN. It maps all supported RW fields to CKAN fields. All additional RW fields (except objects) are stored in the `migration_extras` field of the CKAN dataset. This endpoint handles both the creation and updating of datasets (this is determined automatically, so there is no need to specify which).
Parameters:
- rw_dataset_id (string) – The RW UUID of the dataset to migrate (required, unless `gfw_dataset` is provided). Example: `c0b5f4b1-4f3b-4f1e-8f1e-3f4b1f3b4f1e`.
- application (string) – The RW application of the dataset to migrate (required). Example: `rw`.
- dataset_slug (string) – The desired slug of the dataset to migrate (optional). If you use this option, you will need to include this parameter each time you call `migrate_dataset` for this dataset. This value will override the `slug` value from the RW/GFW APIs. Example: `my-dataset`.
- dataset_title (string) – The desired title of the dataset to migrate (optional). If you use this option, you will need to include this parameter each time you call `migrate_dataset` for this dataset. This value will override the `name` value from the RW API or the `title` value from the GFW API. Example: `My Dataset`.
- gfw_dataset (string) – The GFW dataset to migrate (optional). If this dataset also has metadata in the RW API, you should also include `rw_dataset_id`. Example: `gfw_forest_data`.
- gfw_version (string) – The version of the GFW dataset to migrate (optional). Will default to the latest if a specific version isn't provided. Example: `v2020.01.01`.
- team (string) – The `name` (`slug`) of the Team to associate the dataset with (optional). Example: `land-carbon-lab`.
- topics (string) – A comma-separated list of Topic `slug`s to associate the dataset with (optional). Example: `atmosphere,biodiversity`.
- geographic_coverage (string) – The geographic coverage of the dataset (optional). Example: `Global`.
- authors (array of objects or string) – An array of objects containing the `name` and `email` of the authors of the dataset (optional). Example: `[{"name": "John Smith", "email": "john.smith@example.com"}, {"name": "Jane Smith", "email": "jane.smith@example.com"}]`. Alternatively, you can use the following string format, with each author name and email separated by a colon (`:`) and each pair of author name/email separated by a semicolon (`;`). Example: `John Smith:john.smith@example.com;Jane Smith:jane.smith@example.com`.
- maintainers (array of objects or string) – An array of objects containing the `name` and `email` of the maintainers of the dataset (optional). Example: `[{"name": "John Smith", "email": "john.smith@example.com"}, {"name": "Jane Smith", "email": "jane.smith@example.com"}]`. Alternatively, you can use the following string format, with each maintainer name and email separated by a colon (`:`) and each pair of maintainer name/email separated by a semicolon (`;`). Example: `John Smith:john.smith@example.com;Jane Smith:jane.smith@example.com`.
- layer_ids (string) – A comma-separated list of RW Layer UUIDs to associate with the dataset (optional). All other layers will be skipped. Example: `c0b5f4b1-4f3b-4f1e-8f1e-3f4b1f3b4f1e,c0b5f4b1-4f3b-4f1e-8f1e-3f4b1f3b4f1e`.
- blacklist (string) – A comma-separated list of CKAN fields to exclude from the migration mapping (optional; cannot be used with `whitelist`). Example: `resources,notes` will exclude the `resources` (Layers) and `notes` (Description) fields from the migration mapping.
- whitelist (string) – A comma-separated list of CKAN fields to include in the migration mapping (optional; cannot be used with `blacklist`). Example: `title,notes` will only include the `title` (Title) and `notes` (Description) fields in the migration mapping.
A successful request will return the Prefect status of the new migration job.
% curl -H "Authorization: YOUR_API_TOKEN" "https://wri.dev.ckan.datopian.com/api/3/action/migrate_dataset?rw_dataset_id=c12446ce-174f-4ffb-b2f7-77ecb0116aba&application=rw&team=migration-test&topics=lucas-topic,nov-16-topic"
{
"help": "https://wri.dev.ckan.datopian.com/api/3/action/help_show?name=migration_status",
"success": true,
"result": {
"id": "2b3d8bf5-80a1-4816-a2f0-55a97f720471",
"created": "2024-04-19T16:25:50.064208+00:00",
"updated": "2024-04-19T16:26:09.039985+00:00",
"name": "masked-penguin",
"flow_id": "0c5a71cd-ce9f-448e-8453-366cbb6944c0",
"state_id": "142982aa-2c10-4859-b2fd-68beb2be7bdf",
"deployment_id": "7191012a-0572-4dfe-8e0d-be5de4acc39c",
"work_queue_id": null,
"work_queue_name": null,
"flow_version": "659bece7fac5af816d61217219193235",
"parameters": {
"data_dict": {
"rw_dataset_id": "c12446ce-174f-4ffb-b2f7-77ecb0116aba",
"team": "migration-test",
"topics": [
"lucas-topic",
"nov-16-topic"
],
"application": "rw"
}
},
"idempotency_key": null,
"context": {},
"empirical_policy": {
"max_retries": 0,
"retry_delay_seconds": 0,
"retries": 0,
"retry_delay": 0,
"pause_keys": [],
"resuming": false
},
"tags": [],
"parent_task_run_id": null,
"state_type": "COMPLETED",
"state_name": "Completed",
"run_count": 1,
"expected_start_time": "2024-04-19T16:25:50.064030+00:00",
"next_scheduled_start_time": null,
"start_time": "2024-04-19T16:26:03.090073+00:00",
"end_time": "2024-04-19T16:26:09.038686+00:00",
"total_run_time": 5.948613,
"estimated_run_time": 5.948613,
"estimated_start_time_delta": 13.026043,
"auto_scheduled": false,
"infrastructure_document_id": null,
"infrastructure_pid": null,
"created_by": null,
"work_pool_id": null,
"work_pool_name": null,
"state": {
"id": "142982aa-2c10-4859-b2fd-68beb2be7bdf",
"type": "COMPLETED",
"name": "Completed",
"timestamp": "2024-04-19T16:26:09.038686+00:00",
"message": null,
"data": {
"type": "unpersisted"
},
"state_details": {
"flow_run_id": "2b3d8bf5-80a1-4816-a2f0-55a97f720471",
"task_run_id": null,
"child_flow_run_id": null,
"scheduled_time": null,
"cache_key": null,
"cache_expiration": null,
"untrackable_result": false,
"pause_timeout": null,
"pause_reschedule": false,
"pause_key": null,
"run_input_keyset": null,
"refresh_cache": null
}
}
}
}
You'll need this ID, `"id": "2b3d8bf5-80a1-4816-a2f0-55a97f720471"` (`result.id`), to check the status of the migration job at a later time.
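The same request can be assembled programmatically. The sketch below builds a `migrate_dataset` URL and enforces two rules from the parameter list: one of `rw_dataset_id` or `gfw_dataset` must be present, and `blacklist` cannot be combined with `whitelist`. The helper name is hypothetical and nothing is sent over the network; list values are joined with commas because the endpoint documents comma-separated strings.

```python
from urllib.parse import urlencode


def migrate_dataset_url(ckan_url, application, rw_dataset_id=None, gfw_dataset=None,
                        blacklist=None, whitelist=None, **optional):
    """Build a migrate_dataset request URL (hypothetical helper)."""
    if not rw_dataset_id and not gfw_dataset:
        raise ValueError("rw_dataset_id is required unless gfw_dataset is provided")
    if blacklist and whitelist:
        raise ValueError("blacklist and whitelist cannot be used together")
    params = {
        "application": application,
        "rw_dataset_id": rw_dataset_id,
        "gfw_dataset": gfw_dataset,
        "blacklist": blacklist,
        "whitelist": whitelist,
        **optional,  # e.g. team, topics, dataset_slug, dataset_title
    }
    cleaned = {}
    for key, value in params.items():
        if value is None:
            continue  # omit parameters that were not provided
        cleaned[key] = ",".join(value) if isinstance(value, (list, tuple)) else value
    return ckan_url.rstrip("/") + "/api/3/action/migrate_dataset?" + urlencode(cleaned)


url = migrate_dataset_url(
    "https://wri.dev.ckan.datopian.com",
    "rw",
    rw_dataset_id="c12446ce-174f-4ffb-b2f7-77ecb0116aba",
    team="migration-test",
    topics=["lucas-topic", "nov-16-topic"],
)
```

Send the resulting URL with the same `Authorization` header used in the curl example.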
Note: This endpoint is currently only available to system admins.
Triggers a full migration of RW datasets/metadata using the pre-defined `datasets.csv` file. Optionally, the `file_name` parameter can be used to point to another file (e.g., `?file_name=my_file.csv`). Only provide the file name, not the path, as the assumed path is `migration/files/<FILE_NAME>`. The files must be placed in the same directory as `datasets.csv`. This endpoint handles both the creation and updating of datasets (this is determined automatically, so there is no need to specify which).
Parameters:
- blacklist (string) – A comma-separated list of CKAN fields to exclude from the migration mapping (optional; cannot be used with `whitelist`). Example: `resources,notes` will exclude the `resources` (Layers) and `notes` (Description) fields from the migration mapping.
- whitelist (string) – A comma-separated list of CKAN fields to include in the migration mapping (optional; cannot be used with `blacklist`). Example: `title,notes` will only include the `title` (Title) and `notes` (Description) fields in the migration mapping.
- file_name (string) – The name of the file to use for the migration (optional). Example: `my_datasets.csv`.
A successful request will return the Prefect status of the new migration job.
The usage is the same as for `/migrate_dataset`, but with the `/trigger_migration` endpoint.
% curl -H "Authorization: YOUR_API_TOKEN" "https://wri.dev.ckan.datopian.com/api/3/action/trigger_migration"
{
"help": "https://wri.dev.ckan.datopian.com/api/3/action/help_show?name=trigger_migration",
"success": true,
"result": {
"id": "7cd8a09e-1834-4ab5-8b72-bd638e9392ae",
"created": "2024-04-19T13:35:02.161350+00:00",
"updated": "2024-04-19T13:35:02.158656+00:00",
"name": "hospitable-barracuda",
"flow_id": "0c5a71cd-ce9f-448e-8453-366cbb6944c0",
"state_id": "ecfa19b2-dd38-4434-a167-974fd8149b68",
"deployment_id": "7191012a-0572-4dfe-8e0d-be5de4acc39c",
"work_queue_id": null,
"work_queue_name": null,
"flow_version": null,
"parameters": {
"data_dict": {
"is_bulk": true
}
},
"idempotency_key": null,
"context": {},
"empirical_policy": {
"max_retries": 0,
"retry_delay_seconds": 0,
"retries": null,
"retry_delay": null,
"pause_keys": [],
"resuming": false
},
"tags": [],
"parent_task_run_id": null,
"state_type": "SCHEDULED",
"state_name": "Scheduled",
"run_count": 0,
"expected_start_time": "2024-04-19T13:35:02.161117+00:00",
"next_scheduled_start_time": "2024-04-19T13:35:02.161117+00:00",
"start_time": null,
"end_time": null,
"total_run_time": 0,
"estimated_run_time": 0,
"estimated_start_time_delta": 0.035684,
"auto_scheduled": false,
"infrastructure_document_id": null,
"infrastructure_pid": null,
"created_by": null,
"work_pool_id": null,
"work_pool_name": null,
"state": {
"id": "ecfa19b2-dd38-4434-a167-974fd8149b68",
"type": "SCHEDULED",
"name": "Scheduled",
"timestamp": "2024-04-19T13:35:02.161053+00:00",
"message": null,
"data": null,
"state_details": {
"flow_run_id": "7cd8a09e-1834-4ab5-8b72-bd638e9392ae",
"task_run_id": null,
"child_flow_run_id": null,
"scheduled_time": "2024-04-19T13:35:02.161117+00:00",
"cache_key": null,
"cache_expiration": null,
"untrackable_result": false,
"pause_timeout": null,
"pause_reschedule": false,
"pause_key": null,
"run_input_keyset": null,
"refresh_cache": null
}
}
}
}
You'll need this ID, `"id": "7cd8a09e-1834-4ab5-8b72-bd638e9392ae"` (`result.id`), to check the status of the migration job at a later time.
Add a custom file to the `migration/files` directory and commit it to the repo. Once deployed, you can use the `file_name` parameter to specify it. The file should be a CSV with the following columns:
- `rw_dataset_id` (required, unless `gfw_dataset` is provided)
- `application` (required)
- `team` (optional)
- `topics` (optional)
- `geographic_coverage` (optional)
- `authors` (optional)
- `maintainers` (optional)
- `layer_ids` (optional)
- `layer_names` (optional)
- `gfw_dataset` (optional, unless `rw_dataset_id` isn't provided)
- `gfw_version` (optional)
- `dataset_title` (optional)
- `dataset_slug` (optional)
Note: `authors` and `maintainers` must follow the format `Author One:Author One Email;Author Two:Author Two Email;Author Three:Author Three Email`. Each author/maintainer name and email must be separated by a colon (`:`), and each author/maintainer pair must be separated by a semicolon (`;`). If there's only one author/maintainer, you shouldn't include a semicolon.
Example:
rw_dataset_id,gfw_dataset,application,team,topics,geographic_coverage,authors,maintainers,layer_ids,dataset_title,dataset_slug
d491f094-ad6e-4015-b248-1d1cd83667fa,,aqueduct-water-risk,aqueduct,"freshwater,surface-water-bodies",Global,,John Smith:john.smith@example.com;Jane Smith:jane.smith@example.com,,An Aqueduct Dataset,an-aqueduct-dataset
b318381e-485d-46c9-8958-c9a9d75d7e91,,aqueduct-water-risk,aqueduct,"freshwater,water-risks",Global,John Smith:john.smith@example.com;Jane Smith:jane.smith@example.com,,,Another Aqueduct Dataset,another-aqueduct-dataset
faf79d2c-5e54-4591-9d70-4bd1029c18e6,,crt,agriadapt,atmosphere,Global,John Smith:john.smith@example.com,Jane Smith:jane.smith@example.com,,,
,gfw_forest_flux_forest_age_category,gfw,global-forest-watch,"land,ghg-emissions,forest",,,John Smith:john.smith@example.com,,,
,gfw_forest_flux_removal_forest_type,gfw,global-forest-watch,"land,ghg-emissions,forest",,Jane Smith:jane.smith@example.com,John Smith:john.smith@example.com,,Another Title Example,
47a8e6cc-ea40-44a8-b1fc-6cf4fcc7d868,nasa_viirs_fire_alerts,gfw,global-forest-watch,"land,natural-hazards,forest",Global,,,2462cceb-41de-4bd2-8251-a6f75fe4e3d5,,another-slug-example
c92b6411-f0e5-4606-bbd9-138e40e50eb8,,gfw,global-forest-watch,"land,forest",,Jane Smith:jane.smith@example.com,,"0cba3c4f-2d3b-4fb1-8c93-c951dc1da84b,2351399c-ef2c-48da-9485-20698190acb0",,
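Before committing a custom file, it can help to check it against the column rules above. The following is a minimal sketch using Python's `csv` module; the helper names are hypothetical, the checks cover only the required columns and the `Name:email;Name:email` format, and the line numbers it reports assume no field spans multiple lines.

```python
import csv
import io


def parse_people(value):
    """Parse "Name:email;Name:email" into a list of {"name", "email"} dicts."""
    people = []
    for pair in filter(None, (part.strip() for part in value.split(";"))):
        name, _, email = pair.partition(":")
        people.append({"name": name.strip(), "email": email.strip()})
    return people


def check_rows(csv_text):
    """Return a list of human-readable problems found in the CSV text."""
    problems = []
    reader = csv.DictReader(io.StringIO(csv_text))
    for line_number, row in enumerate(reader, start=2):  # line 1 is the header
        if not row.get("rw_dataset_id") and not row.get("gfw_dataset"):
            problems.append("line %d: needs rw_dataset_id or gfw_dataset" % line_number)
        if not row.get("application"):
            problems.append("line %d: application is required" % line_number)
        for column in ("authors", "maintainers"):
            for person in parse_people(row.get(column) or ""):
                if not person["name"] or not person["email"]:
                    problems.append("line %d: malformed %s entry" % (line_number, column))
    return problems


sample = (
    "rw_dataset_id,gfw_dataset,application,authors\n"
    "d491f094-ad6e-4015-b248-1d1cd83667fa,,aqueduct-water-risk,"
    "John Smith:john.smith@example.com;Jane Smith:jane.smith@example.com\n"
    ",,rw,\n"
)
problems = check_rows(sample)
```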
Returns the status of the specified migration job in Prefect.
Parameters:
- id (string) – The Prefect flow run ID (required). This is found at `result.id` in the response from the `/migrate_dataset` or `/trigger_migration` endpoints.
A successful request will return the current status of the migration job.
The following uses the flow run ID from the `/migrate_dataset` endpoint example above:
% curl -H "Authorization: YOUR_API_TOKEN" "https://wri.dev.ckan.datopian.com/api/3/action/migration_status?id=2b3d8bf5-80a1-4816-a2f0-55a97f720471"
{
"help": "https://wri.dev.ckan.datopian.com/api/3/action/help_show?name=migration_status",
"success": true,
"result": {
"id": "2b3d8bf5-80a1-4816-a2f0-55a97f720471",
"created": "2024-04-19T16:25:50.064208+00:00",
"updated": "2024-04-19T16:26:09.039985+00:00",
"name": "masked-penguin",
"flow_id": "0c5a71cd-ce9f-448e-8453-366cbb6944c0",
"state_id": "142982aa-2c10-4859-b2fd-68beb2be7bdf",
"deployment_id": "7191012a-0572-4dfe-8e0d-be5de4acc39c",
"work_queue_id": null,
"work_queue_name": null,
"flow_version": "659bece7fac5af816d61217219193235",
"parameters": {
"data_dict": {
"id": "c12446ce-174f-4ffb-b2f7-77ecb0116aba",
"team": "migration-test",
"topics": [
"lucas-topic",
"nov-16-topic"
],
"application": "rw"
}
},
"idempotency_key": null,
"context": {},
"empirical_policy": {
"max_retries": 0,
"retry_delay_seconds": 0,
"retries": 0,
"retry_delay": 0,
"pause_keys": [],
"resuming": false
},
"tags": [],
"parent_task_run_id": null,
"state_type": "COMPLETED",
"state_name": "Completed",
"run_count": 1,
"expected_start_time": "2024-04-19T16:25:50.064030+00:00",
"next_scheduled_start_time": null,
"start_time": "2024-04-19T16:26:03.090073+00:00",
"end_time": "2024-04-19T16:26:09.038686+00:00",
"total_run_time": 5.948613,
"estimated_run_time": 5.948613,
"estimated_start_time_delta": 13.026043,
"auto_scheduled": false,
"infrastructure_document_id": null,
"infrastructure_pid": null,
"created_by": null,
"work_pool_id": null,
"work_pool_name": null,
"state": {
"id": "142982aa-2c10-4859-b2fd-68beb2be7bdf",
"type": "COMPLETED",
"name": "Completed",
"timestamp": "2024-04-19T16:26:09.038686+00:00",
"message": null,
"data": {
"type": "unpersisted"
},
"state_details": {
"flow_run_id": "2b3d8bf5-80a1-4816-a2f0-55a97f720471",
"task_run_id": null,
"child_flow_run_id": null,
"scheduled_time": null,
"cache_key": null,
"cache_expiration": null,
"untrackable_result": false,
"pause_timeout": null,
"pause_reschedule": false,
"pause_key": null,
"run_input_keyset": null,
"refresh_cache": null
}
}
}
}
The most important part of the response is the `state` object, which contains the current state of the migration job. The actual status can be found at `result.state.name` or `result.state.type`.
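A small helper can pull those fields out of a parsed response. This is a sketch only: the function name is hypothetical, and the set of state types treated as "finished" is an assumption based on Prefect's terminal state types rather than anything this extension defines.

```python
# Assumption: these Prefect state types mean the flow run will not change further.
TERMINAL_STATE_TYPES = {"COMPLETED", "FAILED", "CRASHED", "CANCELLED"}


def summarize_status(response):
    """Extract the flow run ID and state from a migration_status response dict."""
    if not response.get("success"):
        raise RuntimeError("The API reported a failed request")
    result = response["result"]
    state = result["state"]
    return {
        "flow_run_id": result["id"],
        "type": state["type"],
        "name": state["name"],
        "finished": state["type"] in TERMINAL_STATE_TYPES,
    }


# Trimmed-down version of the example response above.
example_response = {
    "success": True,
    "result": {
        "id": "2b3d8bf5-80a1-4816-a2f0-55a97f720471",
        "state": {"type": "COMPLETED", "name": "Completed"},
    },
}
summary = summarize_status(example_response)
```

A polling loop could call `migration_status` periodically and stop once `finished` is true.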
The following fields are supported for the `blacklist` and `whitelist` parameters (all other fields do not currently have a direct mapping):
- `cautions` - Maps to `cautions` in the RW dataset.
- `citation` - Maps to `citation` in the RW dataset.
- `function` - Maps to `function` or `functions` in the RW dataset.
- `language` - Maps to `language` in the RW dataset. Note: This field in CKAN requires an ISO 639-1 language code. If the incoming language is not an ISO 639-1 code, it will be ignored.
- `learn_more` - Maps to the `learn_more_link` in the RW dataset.
- `notes` - Maps to `description` in the RW dataset.
- `resources` - Maps to the Layers in the RW dataset. Each Layer is mapped to a CKAN resource.
- `title` - Maps to `name` in the RW dataset if it exists; otherwise, uses the same slugified name.
- `url` - Maps to the `data_download_original_link` if it exists; otherwise, uses `data_download_link`. Note: This field in CKAN requires a valid URL. If the incoming URL is not valid, it will be ignored.
In all cases above, both the RW metadata and dataset are checked for a value, defaulting to the RW metadata value if it exists. In most cases, there's no comparable key/value in the RW dataset, but there are a few cases where the RW dataset has a key that's not in the RW metadata.
Most fields that are not mapped directly to CKAN are stored in a custom field called `migration_extras` (Note: some data is not migrated, such as objects, e.g., the full layer metadata, application config, etc.). This field holds a JSON object where the keys are the paths within the source dataset objects and the values are the values of those paths. For example:
{
... (other CKAN dataset fields) ...
"migration_extras": {
"dataset.dataPath": "None",
"metadata.info.sources.0.id": "0",
"metadata.info.citation": "National Geospatial Intelligence Agency. 2019. \"World Port Index.\" Accessed from https://msi.nga.mil/NGAPortal/MSI.portal?_nfpb=true&_pageLabel=msi_portal_page_62&pubCode=0015. Accessed through Resource Watch, (date). [www.resourcewatch.org](https://www.resourcewatch.org).",
"metadata.updatedAt": "2022-03-18T04:56:01.316Z",
"dataset.overwrite": "false",
"dataset.protected": "false",
"metadata.info.geographic_coverage": "Global",
"metadata.info.technical_title": "World Port Index",
"metadata.info.cautions": "Any changes to the port facilities that have occurred since the last publication will not be reflected. This reproduction, partial or complete, of any National Geospatial-Intelligence Agency (NGA), National Imagery and Mapping Agency (NIMA), or Defense Mapping Agency (DMA) product, information, or data is not approved, authorized, or endorsed by the Secretary of Defense, the Director of National Intelligence, the Director of the NGA, or any other element of the U.S. government. The U.S. government and the NGA accept no liability for the accuracy or quality of this reproduction or the use of any NGA, NIMA, or DMA products, information, or data.",
"dataset.taskId": "None",
"dataset.type": "tabular",
"dataset.application.0": "rw",
"metadata.info.data_download_link": "https://wri-public-data.s3.amazonaws.com/resourcewatch/com_017_rw2_major_ports.zip",
"dataset.tableName": "com_017_rw2_major_ports_edit",
"dataset.requested_application": "rw",
"metadata.resource.type": "dataset",
"dataset.name": "com.017.rw2 Major Ports",
"dataset.published": "true",
"dataset.dataLastUpdated": "None",
"dataset.subtitle": "None",
"metadata.info.data_type": "Vector",
"metadata.info.license_link": "https://creativecommons.org/share-your-work/public-domain/",
"metadata.dataset": "28d1f505-571c-4a52-8215-48ea02aa4928",
"metadata.createdAt": "2020-09-22T17:42:17.618Z",
"metadata.dataset_type": "metadata",
"metadata.application": "rw",
"dataset.createdAt": "2020-09-22T17:42:11.637Z",
"dataset.env": "production",
"dataset.updatedAt": "2020-09-25T13:05:42.080Z",
"metadata.language": "en",
"dataset.errorMessage": "",
"dataset.mainDateField": "None",
"metadata.info.sources.0.source-name": "",
"metadata.source": "NGA",
"metadata.info.sources.0.source-description": "National Geospatial-Intelligence Agency (NGA)",
"metadata.info.license": "Public domain",
"dataset.provider": "cartodb",
"dataset.connectorUrl": "https://wri-rw.carto.com/tables/com_017_rw2_major_ports_edit/public",
"metadata.resource.id": "28d1f505-571c-4a52-8215-48ea02aa4928",
"metadata.info.frequency_of_updates": "Varies",
"metadata.info.functions": "Locations, physical characteristics, facilities, and services offered by major ports around the world",
"metadata.info.learn_more_link": "https://msi.nga.mil/Publications/WPI",
"metadata.dataset_id": "5f6a3779b16c4d001a2f0f40",
"metadata.info.spatial_resolution": "None",
"metadata.name": "Major Ports",
"dataset.userId": "5efe38618e222c0010996c3c",
"dataset.dataset_type": "dataset",
"metadata.status": "published",
"metadata.info.name": "Major Ports",
"dataset.connectorType": "rest",
"metadata.info.date_of_content": "2019",
"metadata.info.data_download_original_link": "https://msi.nga.mil/Publications/WPI",
"metadata.info.rwId": "com.017.rw2",
"dataset.dataset_id": "28d1f505-571c-4a52-8215-48ea02aa4928",
"dataset.status": "saved",
"dataset.geoInfo": "true",
"dataset.slug": "com017rw2-Major-Ports",
"metadata.description": "The World Port Index is created by the Maritime Security Office of the National Geospatial-Intelligence Agency (NGA) to document the locations and features of major ports around the world. The Maritime Security Office requests that mariners send it corrections in plain language, which the office subsequently codes to create a consistent record of port facilities. Resource Watch shows only a subset of the data set. For access to the full data set and additional information, see the Learn More link. \n \n### Additional Information \n \nResource Watch shows only a subset of the dataset. For access to the full dataset and additional information, click on the “Learn more” button. \n \n### Disclaimer \n \nExcerpts of this description page were taken from the source metadata.",
"dataset.attributesPath": "None"
},
... (other CKAN dataset fields) ...
}
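The dotted keys in `migration_extras` (for example, `metadata.info.sources.0.id`) can be read as paths into the source objects. The sketch below shows one way such keys could be produced. It is not the migration code itself: the function name is hypothetical, and the way values are turned into strings (`"None"`, lowercase `"false"`) is an assumption chosen to resemble the example above.

```python
def flatten_extras(value, prefix):
    """Flatten nested dicts/lists into {"dataset.application.0": "rw"} style keys."""
    if isinstance(value, dict):
        items = value.items()
    elif isinstance(value, list):
        items = enumerate(value)  # list positions become path segments
    elif isinstance(value, bool):
        return {prefix: "true" if value else "false"}  # assumption: lowercase booleans
    else:
        return {prefix: value if isinstance(value, str) else str(value)}
    flat = {}
    for key, child in items:
        flat.update(flatten_extras(child, prefix + "." + str(key)))
    return flat


source_dataset = {
    "name": "com.017.rw2 Major Ports",
    "published": True,
    "dataPath": None,
    "application": ["rw"],
}
extras = flatten_extras(source_dataset, "dataset")
```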
This extension includes a custom harvester for ingesting datasets from other instances of this project (e.g., harvesting datasets from Production to Staging). Most of the customizations are behind the scenes, but a new config option was added. Set it if you want to create new resources within the datasets instead of using links to the original resources.
In the harvest source configuration section, you can add the following option:
{
"create_resources": true
}
Along with the new config option, all of the existing harvest options should still work as expected, such as specifying organizations to include or exclude, using an API key/token, etc.
For example, let's assume that you want to:
- Create new resources (full resources, instead of merely metadata pointing to the URL of the original data)
- Create new organizations to match the source instance (if they don't exist already)
- Only harvest datasets from two source organizations (`global-forest-watch` and `land-carbon-lab`), and
- Only harvest datasets from a single source group (`forests`)
Your harvest source configuration would look something like this:
{
"remote_orgs": "create",
"remote_groups": "create",
"create_resources": true,
"organizations_filter_include": ["global-forest-watch", "land-carbon-lab"],
"groups_filter_include": ["forests"]
}
Note: This plugin requires a Google Analytics account with a Measurement ID and API Secret to work. For more information on how to set up Google Analytics, see the Google Analytics documentation. Steps to set up reports can be found in the main `docs` directory in this repo (see `wri-odp/docs/ga-api/README.md`).
This extension includes an optional plugin that sends API usage analytics to Google Analytics. To enable this feature, you need to add the `wri_api_tracking` plugin to the `ckan.plugins` list in your CKAN configuration file, just after the `wri` plugin:
ckan.plugins = ... wri wri_api_tracking ...
You also need to add the following configuration options to your CKAN configuration file (these are required if the `wri_api_tracking` plugin is enabled):
ckanext.wri.api_analytics.measurement_id = G-XXXXXXXXXX
ckanext.wri.api_analytics.api_secret = XXXXXXXXXXX
The plugin sends a simple event to Google Analytics whenever an API request is made. The event is tracked in Google Analytics as `ckan_api` with a few params/dimensions (`action` and `user_agent` being the most useful). This is the object that's sent to Google Analytics:
{
'client_id': cid, # This is either a randomly generated UUID or the hash hex digest of the current CKAN user.
'events': [
{
'name': 'ckan_api', # The event name, used to group all API events.
'params': {
'action': tk.request.environ['PATH_INFO'].split('/')[-1], # The API action, extracted from the URL (e.g., 'package_show').
'user_agent': tk.request.environ.get('HTTP_USER_AGENT', ''), # The user agent of the request (e.g., "curl/8.6.0", "ckanapi/4.7", "node", etc.).
'session_id': uuid.uuid4().hex, # A randomly generated UUID to trigger a session.
'engagement_time_msec': 1, # Engagement time set to 1, just to trigger user engagement.
},
}
],
}
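For experimenting outside of CKAN, the payload above can be reproduced without `tk.request`. The sketch below is standalone and makes several assumptions: the function names are hypothetical, SHA-256 stands in for the unspecified user hash, and the collection URL is the public GA4 Measurement Protocol endpoint, which this README does not name explicitly.

```python
import hashlib
import uuid
from urllib.parse import urlencode

# Assumption: the GA4 Measurement Protocol collection endpoint.
GA_COLLECT_URL = "https://www.google-analytics.com/mp/collect"


def build_collect_url(measurement_id, api_secret):
    """Build the URL the event would be POSTed to."""
    query = urlencode({"measurement_id": measurement_id, "api_secret": api_secret})
    return GA_COLLECT_URL + "?" + query


def build_event(path_info, user_agent, user_id=None):
    """Build a ckan_api event payload shaped like the object shown above."""
    if user_id:
        # Assumption: SHA-256; the plugin's actual hash function is not documented here.
        client_id = hashlib.sha256(user_id.encode("utf-8")).hexdigest()
    else:
        client_id = str(uuid.uuid4())
    return {
        "client_id": client_id,
        "events": [
            {
                "name": "ckan_api",
                "params": {
                    "action": path_info.split("/")[-1],  # e.g. "package_show"
                    "user_agent": user_agent,
                    "session_id": uuid.uuid4().hex,
                    "engagement_time_msec": 1,
                },
            }
        ],
    }


event = build_event("/api/3/action/package_show", "curl/8.6.0", user_id="example-user")
collect_url = build_collect_url("G-XXXXXXXXXX", "XXXXXXXXXX")
```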
See the CKAN Backend Development README for instructions on how to set up a local Docker CKAN backend development environment.
This extension lives in `ckan-backend-dev/src/ckanext-wri`. It is symlinked to the root of this repo for convenience and visibility.
Because it's part of this unified repo, if you need to make changes, you can do so directly in the `ckanext-wri` directory. There's no external repo to clone or fork (like other CKAN extensions), so you can just create a new branch off of `dev`, make your changes, and submit a PR.
The unit tests for this extension are run as part of the `make unit-tests` command in the `ckan-backend-dev` Docker development environment, but while developing, you can also run them on their own. To do so, open another terminal window, go to `ckan-backend-dev`, and follow the steps below.
If the environment is not already running, start it:
make up
Then enter the Docker shell:
make shell
Once in the shell, navigate to the extension directory:
cd src_extensions/ckanext-wri
Finally, run the tests:
pytest --ckan-ini=test.ini ckanext/wri/tests