DLME Data Harvesting and Transformation

jacobthill edited this page Jul 9, 2024 · 3 revisions

Configuring the Airflow catalog

DLME currently maintains the following intake drivers:

  • iiif_json: Used for harvesting IIIF collections. Requires a collection-level IIIF manifest.
  • json: Used for harvesting collections from custom JSON APIs. May not work for all APIs.
  • oai_xml: Used for harvesting OAI-PMH collections.
  • sequential_csv: Used for harvesting CSV files.
  • xml: Used for harvesting collections from custom XML APIs. May not work for all APIs.
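
As a rough illustration of how the driver names above might be checked before a catalog entry is used, here is a minimal Python sketch. Only the five driver names come from this page. The `check_driver` helper, the dictionary layout, the `collection_url` key, and the example URL are hypothetical and are not taken from the DLME code base or from the intake library.

```python
# Driver names copied from the list above; everything else here is hypothetical.
KNOWN_DRIVERS = {"iiif_json", "json", "oai_xml", "sequential_csv", "xml"}


def check_driver(entry):
    """Return the driver name of a catalog entry, or raise ValueError if it is unknown."""
    driver = entry.get("driver")
    if driver not in KNOWN_DRIVERS:
        raise ValueError(f"unknown intake driver: {driver!r}")
    return driver


# Hypothetical catalog entry; the "args" keys and the URL are placeholders.
example_entry = {
    "driver": "oai_xml",
    "args": {"collection_url": "https://example.org/oai"},
}

print(check_driver(example_entry))
```

A check like this would catch a misspelled driver name early, before any harvest is attempted.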

Mapping in traject

Data normalization

Controlled vocabularies

Data validation