This repository contains Jupyter Notebooks for harvesting metadata for the BTAA Geoportal.
The BTAA Geoportal holds metadata records that point to geospatial data, maps, aerial imagery, web services, and websites hosted online by external organizations. The most common way of obtaining this metadata is to programmatically harvest it from an organization's website. These websites may take the form of a data portal, a static page, or a custom platform. Because the websites vary widely in how they are structured, we maintain several workflows for obtaining the metadata.
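As a rough illustration of what programmatic harvesting can look like, the sketch below flattens a DCAT-style JSON catalog (the kind many data portals publish) into simple records. It is not taken from the notebooks in this repository: the sample catalog, the `example.org` URLs, the `harvest_records` function, and the output field names are all hypothetical, and a real harvest would first download the catalog from the portal.

```python
import json

# Hypothetical sample catalog, inlined so the sketch runs without network access.
# The keys ("dataset", "title", "identifier", "landingPage", "keyword") follow the
# DCAT-US data.json layout; a real portal may use a different structure.
SAMPLE_CATALOG = """
{
  "dataset": [
    {
      "title": "County Parcels",
      "identifier": "abc-123",
      "landingPage": "https://example.org/datasets/abc-123",
      "keyword": ["parcels", "cadastral"]
    },
    {
      "title": "Bike Trails",
      "identifier": "def-456",
      "landingPage": "https://example.org/datasets/def-456"
    }
  ]
}
"""


def harvest_records(catalog_text):
    """Parse a DCAT-style catalog and return one flat dict per dataset."""
    catalog = json.loads(catalog_text)
    records = []
    for dataset in catalog.get("dataset", []):
        records.append(
            {
                "title": dataset.get("title", ""),
                "identifier": dataset.get("identifier", ""),
                "landing_page": dataset.get("landingPage", ""),
                # Join keywords with a pipe so they fit in a single CSV cell.
                "keywords": "|".join(dataset.get("keyword", [])),
            }
        )
    return records


records = harvest_records(SAMPLE_CATALOG)
for record in records:
    print(record["identifier"], record["title"])
```

Static pages and custom platforms usually do not expose a catalog like this, which is one reason a single script cannot cover every source.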
These workflows are primarily intended for the BTAA-GIN Product Manager and Graduate Research Assistants. However, anyone interested in batch metadata harvesting and processing may find useful techniques here.
This Guide is hosted on GitHub at https://github.com/geobtaa/harvesting-guide. The repository has all of the files needed to run the tutorials, as well as Jupyter Notebooks for running the Recipes. Make a fork or a new branch of the repository to get started.
The Tutorial section contains short, easy-to-complete exercises that introduce the basics of running and writing scripts to harvest metadata.
The recipes are step-by-step workflows for harvesting metadata from specific websites or from groups of portals that use the same technology. They may involve multiple steps and occasionally require manual troubleshooting. The recipes will need regular maintenance and updates, as the source websites may upgrade, change, or disappear.
The tutorials and recipes were prepared by Alexander Danielson and Karen Majewicz in April 2023.
The recipes also contain code contributed by **Melinda Kernik** and alumni BTAA graduate research assistants, including:
- Ziying (Gene) Cheng (2020-2022)
- Yijing (Zoey) Zhou (2020-2021)
- Emily Ruetz (2018-2020)
- Andrew Smith (2017-2019)
- Lewei Hi (2017)