This repository contains an open-source geodata ETL/CI/CD pipeline developed by Kontur for MapAction. It is based on Kontur Geocint technology.
This pipeline downloads data from various sources, including OpenStreetMap, HDX and others, and produces geospatial datasets in the form of GeoJSON files and ESRI shapefiles, which are uploaded to S3-compatible storage and to CKAN.
To run it, you need two other repositories:
For more information on geocint installation and basic configuration please see Geocint readme and Geocint documentation.
Two things need to be configured specifically for this pipeline:
For S3 storage you can use Amazon services or any other cloud/hosting provider that offers S3-compatible storage. To create S3 storage, please refer to the Amazon S3 documentation or the documentation of your hosting provider.
For CKAN installation and configuration please refer to the CKAN Maintainer’s guide.
The following sections describe how to specify credentials for S3 storage and CKAN for the pipeline.
AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY and AWS_REGION are specified in environment variables. Please check this manual for more information or use aws configure.
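As a minimal sketch, the three variables could be exported in the shell that runs the pipeline and checked before starting. The key values below are placeholders, and check_aws_env is a hypothetical helper, not part of the repository.

```shell
# Placeholder values: replace with the credentials of your own S3 storage.
export AWS_ACCESS_KEY_ID="AKIAEXAMPLEEXAMPLE"
export AWS_SECRET_ACCESS_KEY="example-secret-key"
export AWS_REGION="eu-central-1"

# Hypothetical helper: prints the name of every required variable that is
# unset or empty, and returns non-zero if any is missing.
check_aws_env() {
    missing=0
    for name in AWS_ACCESS_KEY_ID AWS_SECRET_ACCESS_KEY AWS_REGION; do
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            echo "$name"
            missing=1
        fi
    done
    return "$missing"
}

check_aws_env || echo "set the variables listed above before running the pipeline"
```

Running the check before starting a long pipeline run makes a missing credential fail early rather than at upload time.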
- CKAN_BASE_URL in config.inc.sh should be set to the URL of your CKAN installation, for example https://ckan.test.io/ckan.
- CKAN_API_KEY in config.inc.sh should be set to an API key generated manually from the CKAN UI (User Profile > Manage > API tokens). You can find more information in the CKAN documentation.
- In config.inc.sh, the variable CKAN_DATA_S3_URL should be set to the S3 path where you expect datasets to be uploaded, for example s3://geodata-eu-central-1-kontur-public/mapaction_dataset/, and the variable CKAN_DATA_URL should point to the same S3 location as CKAN_DATA_S3_URL but over HTTP(S), for example https://geodata-eu-central-1-kontur-public.s3.amazonaws.com/mapaction_dataset/.
Note: S3 paths should be specified with trailing slash /. Variable values should be specified WITHOUT quotes.
EXAMPLE
# CKAN Configuration
# Url of your CKAN instance
CKAN_BASE_URL=https://ckan.test.io/ckan
# API key of your CKAN user. Make sure the user is added as an editor to the proper organization
CKAN_API_KEY=aaaaaaaaaabbbbbbbbbbbbbbbcccccccc
# S3 path to your bucket and folder, with trailing "/"
CKAN_DATA_S3_URL=s3://geodata-eu-central-1-kontur-public/mapaction_dataset/
# path to your files on S3 via http(s) protocol, with trailing "/"
CKAN_DATA_URL=https://geodata-eu-central-1-kontur-public.s3.amazonaws.com/mapaction_dataset/
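The trailing-slash rule and the relation between the two S3 variables can be checked mechanically. The sketch below assumes AWS-hosted storage with virtual-hosted-style URLs, as in the example values; other S3-compatible providers use different host names. has_trailing_slash and s3_to_https are hypothetical helpers, not part of the pipeline.

```shell
# Hypothetical helper: succeeds only if the value ends with "/".
has_trailing_slash() {
    case "$1" in
        */) return 0 ;;
        *)  return 1 ;;
    esac
}

# Hypothetical helper: turns s3://BUCKET/PATH/ into
# https://BUCKET.s3.amazonaws.com/PATH/ (AWS virtual-hosted-style URL).
# Expects a value that already passed has_trailing_slash.
s3_to_https() {
    rest="${1#s3://}"        # drop the scheme
    bucket="${rest%%/*}"     # text before the first "/"
    path="${rest#*/}"        # text after the first "/"
    echo "https://${bucket}.s3.amazonaws.com/${path}"
}

CKAN_DATA_S3_URL=s3://geodata-eu-central-1-kontur-public/mapaction_dataset/
has_trailing_slash "$CKAN_DATA_S3_URL" || echo "CKAN_DATA_S3_URL needs a trailing /"
s3_to_https "$CKAN_DATA_S3_URL"
```

For the example value, the printed URL is the one to compare against CKAN_DATA_URL.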
- To generate data for a country, there should be a JSON file with the country boundaries in the directory static_data/countries. Currently there are 25 countries from the MapAction priority country list in this directory. To add other countries, you can copy the corresponding JSON files from static_data/countries_world to static_data/countries.
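Copying a boundary file can be wrapped in a small guard so that a mistyped name fails loudly. This is only a sketch: add_country is a hypothetical helper, and the file name is whatever the file is called in static_data/countries_world, since the naming scheme is not described here.

```shell
# Hypothetical helper: copies one boundary file from the world set into the
# directory the pipeline reads. Run it from the repository root.
add_country() {
    src="static_data/countries_world/$1"
    if [ ! -f "$src" ]; then
        echo "no such boundary file: $src" >&2
        return 1
    fi
    cp "$src" static_data/countries/
}

# Usage: add_country <file>.json
```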
The pipeline for generating OSM layers looks like this:
- downloading the OSM planet
- extracting the country region from the OSM planet
- importing the country region extract into the database
- mapping OSM features to MapAction layers
- exporting from the database to SHP/GeoJSON files
- uploading files to CKAN
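To make the order of the stages concrete, the sketch below prints one plausible command per stage without running anything. The tools shown (curl, osmium, osm2pgsql, psql, ogr2ogr, aws) are common choices for these tasks and are an assumption, not necessarily what the pipeline's targets invoke. The database name gis, the file map_layers.sql, the roads layer and all output file names are hypothetical, and the boundary file is assumed to be GeoJSON.

```shell
# Dry run: prints one illustrative command per stage, executes nothing.
# Tools, database name, table name and file names are assumptions.
plan_osm_pipeline() {
    name="$1"   # base name of a boundary file in static_data/countries
    cat <<EOF
1 download: curl -O https://planet.openstreetmap.org/pbf/planet-latest.osm.pbf
2 extract:  osmium extract -p static_data/countries/${name}.json planet-latest.osm.pbf -o ${name}.osm.pbf
3 import:   osm2pgsql -d gis ${name}.osm.pbf
4 map:      psql -d gis -f map_layers.sql
5 export:   ogr2ogr -f GeoJSON ${name}_roads.geojson PG:dbname=gis ${name}_roads
6 upload:   aws s3 cp ${name}_roads.geojson \${CKAN_DATA_S3_URL}
EOF
}

plan_osm_pipeline example
```

The last line only covers the copy to the S3 path; registering the uploaded file as a CKAN resource is not shown.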
Because tabular population data on HDX is not named uniformly, downloading it is implemented using the static_data/hdx_admin_pop_urls.json file.
The script filters the entries by country code and downloads the matching layers.
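A sketch of that filter-and-download step, under stated assumptions: the layout of static_data/hdx_admin_pop_urls.json is not shown here, so the example assumes a hypothetical array of objects with iso and url fields. It uses jq and curl, which may differ from what the pipeline's script does, and both function names are hypothetical.

```shell
# Hypothetical layout (an assumption, not the real file format):
#   [ {"iso": "xxx", "url": "https://..."}, ... ]

# Prints the URLs whose "iso" field equals the given country code.
# Arguments: JSON file, country code.
hdx_pop_urls() {
    jq -r --arg iso "$2" '.[] | select(.iso == $iso) | .url' "$1"
}

# Downloads every matching file into the given directory.
# Arguments: JSON file, country code, output directory.
download_hdx_pop() {
    mkdir -p "$3"
    hdx_pop_urls "$1" "$2" | while read -r url; do
        ( cd "$3" && curl -fsSLO "$url" )
    done
}
```

Keeping the URL list in a static file means a renamed dataset on HDX is fixed by editing one entry rather than the script.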
TO ADD OR UPDATE A LAYER
- add or update static_data/hdx_admin_pop_urls.json