xcube-stac


xcube-stac is a Python package and xcube plugin that adds a data store named stac to xcube. The data store is used to access data from SpatioTemporal Asset Catalogs (STAC).

Table of contents

  1. Setup
    1. Installing the xcube-stac plugin from the repository
  2. Overview
    1. General structure of a STAC catalog
    2. General functionality of xcube-stac
  3. Introduction to xcube-stac
    1. Overview of Jupyter notebooks
    2. Getting started
  4. Testing
    1. Some notes on the strategy of unit-testing

Setup

Installing the xcube-stac plugin from the repository

To install xcube-stac directly from the git repository, clone the repository, change into the xcube-stac directory, and follow the steps below:

conda env create -f environment.yml
conda activate xcube-stac
pip install .

This installs all the dependencies of xcube-stac into a fresh conda environment, then installs xcube-stac into this environment from the repository.
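
After installation, one way to verify that the plugin has been registered is to list the identifiers of the available xcube data stores and look for stac. This is a minimal sketch assuming xcube's find_data_store_extensions helper:

from xcube.core.store import find_data_store_extensions

# Collect the identifiers of all registered data stores;
# "stac" should appear among them if the plugin is installed correctly.
store_ids = [ext.name for ext in find_data_store_extensions()]
print("stac" in store_ids)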

Overview

General structure of a STAC catalog

A SpatioTemporal Asset Catalog (STAC) consists of three main components: catalog, collection, and item. Each item can contain multiple assets, each linked to a data source. Items are associated with a timestamp or temporal range and a bounding box describing the spatial extent of the data.

Items within a collection generally exhibit similarities. For example, a STAC catalog might contain multiple collections corresponding to different space-borne instruments. Each item represents a measurement covering a specific spatial area at a particular timestamp. For a multi-spectral instrument, different bands can be stored as separate assets.

A STAC catalog can comply with the STAC API - Item Search conformance class, enabling server-side searches for items based on specific parameters. If this compliance is not met, only client-side searches are possible, which can be slow for large STAC catalogs.
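
To illustrate what a server-side item search means, the sketch below uses pystac-client, which is not required by xcube-stac and is shown here only as an assumed example against the Earth Search catalog used later in this README:

from pystac_client import Client

# Open a catalog that implements the STAC API - Item Search conformance class.
client = Client.open("https://earth-search.aws.element84.com/v1")

# The filtering below happens on the server; without Item Search support,
# a client would have to crawl the catalog and filter items itself.
search = client.search(
    collections=["sentinel-2-l2a"],
    bbox=[9.1, 53.1, 10.7, 54],
    datetime="2020-07-01/2020-08-01",
)
for item in search.items():
    print(item.id)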

General functionality of xcube-stac

The xcube-stac plugin reads the data sources from a STAC catalog and opens the data in an analysis-ready form following the xcube dataset convention. By default, a data ID represents one item, which is opened as a dataset, with each asset becoming a data variable within the dataset.

Additionally, a stack mode is available, enabling the stacking of items using odc-stac. This allows for mosaicking multiple tiles and concatenating the datacube along the temporal axis.

stackstac was also considered during the evaluation of Python libraries supporting the stacking of STAC items. However, a benchmarking report comparing stackstac and odc-stac shows that odc-stac outperforms stackstac. Furthermore, stackstac has an issue in making use of the overview levels of COG files. Still, stackstac is popular in the community and might be supported in the future.

Introduction to xcube-stac

Overview of Jupyter notebooks

The following Jupyter notebooks provide some examples:

  • example/notebooks/earth_search_sentinel2_l2a_stack_mode.ipynb: This notebook shows how to stack multiple tiles of Sentinel-2 L2A data from the Earth Search by Element 84 STAC API. It shows stacking of individual tiles and mosaicking of multiple tiles measured on the same solar day.
  • example/notebooks/geotiff_nonsearchable_catalog.ipynb: This notebook shows how to load a GeoTIFF file from a non-searchable STAC catalog.
  • example/notebooks/geotiff_searchable_catalog.ipynb: This notebook shows how to load a GeoTIFF file from a searchable STAC catalog.
  • example/notebooks/netcdf_searchable_catalog.ipynb: This notebook shows how to load a NetCDF file from a searchable STAC catalog.
  • example/notebooks/xcube_server_stac_s3.ipynb: This notebook shows how to open data sources published by xcube server via the STAC API.

Getting started

The xcube data store framework allows easy access to data in an analysis-ready format with the few lines of code below.

from xcube.core.store import new_data_store

store = new_data_store(
    "stac",
    url="https://earth-search.aws.element84.com/v1"
)
ds = store.open_data(
    "collections/sentinel-2-l2a/items/S2B_32TNT_20200705_0_L2A",
    data_type="dataset"
)

The data ID "collections/sentinel-2-l2a/items/S2B_32TNT_20200705_0_L2A" points to the STAC item's JSON and is specified by the segment of the URL that follows the catalog's URL. The data_type can be set to dataset or mldataset, which returns an xr.Dataset or an xcube multi-resolution dataset, respectively. Note that in the above example, if data_type is not assigned, a multi-resolution dataset will be returned. This is because the item's assets link to GeoTIFFs, which are opened as multi-resolution datasets by default.
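
Continuing with the store created above, a minimal sketch of opening the same item explicitly as a multi-resolution dataset could look as follows; the attribute and method names follow xcube's MultiLevelDataset interface:

ml_ds = store.open_data(
    "collections/sentinel-2-l2a/items/S2B_32TNT_20200705_0_L2A",
    data_type="mldataset"
)
print(ml_ds.num_levels)           # number of resolution levels
ds_full = ml_ds.get_dataset(0)    # level 0 is the full-resolution xr.Dataset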

To use the stacking mode, initialize a stac store with the argument stack_mode=True.

from xcube.core.store import new_data_store

store = new_data_store(
    "stac",
    url="https://earth-search.aws.element84.com/v1",
    stack_mode=True
)
ds = store.open_data(
    "sentinel-2-l2a",
    data_type="dataset",
    bbox=[9.1, 53.1, 10.7, 54],
    time_range=["2020-07-01", "2020-08-01"],
    query={"s2:processing_baseline": {"eq": "02.14"}},
)

In the stacking mode, the data IDs are the collection IDs within the STAC catalog. To get Sentinel-2 L2A data, we assign data_id to "sentinel-2-l2a". The bounding box and time range define the spatial and temporal extent of the data cube. Additionally, for this example, we need to set a query argument to select a specific Sentinel-2 processing baseline, as the collection contains multiple items for the same tile with different processing procedures. Note that this requirement can vary between collections and must be specified by the user. To set query arguments, the STAC catalog needs to conform to the query extension.
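
Since the data IDs in stacking mode are the collection IDs, they can be listed with the generic xcube data store interface; a small sketch using the store created above:

# Iterate over the collection IDs offered by the STAC catalog;
# get_data_ids() is part of the generic xcube data store interface.
for data_id in store.get_data_ids():
    print(data_id)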

The stacking is performed using odc-stac. All arguments of odc.stac.load can be passed into the open_data(...) method, which forwards them to the odc.stac.load function.
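
For example, odc.stac.load parameters such as crs, resolution, or chunks can be passed to open_data(...) unchanged; the specific values below are illustrative only:

ds = store.open_data(
    "sentinel-2-l2a",
    data_type="dataset",
    bbox=[9.1, 53.1, 10.7, 54],
    time_range=["2020-07-01", "2020-08-01"],
    query={"s2:processing_baseline": {"eq": "02.14"}},
    # the following arguments are forwarded to odc.stac.load
    crs="EPSG:32632",
    resolution=20,
    chunks={"x": 2048, "y": 2048},
)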

To apply mosaicking, we need to assign groupby="solar_day", as shown in the documentation of odc.stac.load. The following few lines of code show a small example including mosaicking.

from xcube.core.store import new_data_store

store = new_data_store(
    "stac",
    url="https://earth-search.aws.element84.com/v1",
    stack_mode=True
)
ds = store.open_data(
    "sentinel-2-l2a",
    data_type="dataset",
    bbox=[9.1, 53.1, 10.7, 54],
    time_range=["2020-07-01", "2020-08-01"],
    query={"s2:processing_baseline": {"eq": "02.14"}},
    groupby="solar_day",
)

Testing

To run the unit test suite:

pytest

To analyze test coverage:

pytest --cov=xcube_stac

To produce an HTML coverage report:

pytest --cov-report html --cov=xcube_stac

Some notes on the strategy of unit-testing

The unit test suite uses pytest-recording to mock STAC catalogs. During development, actual HTTP requests are performed against a STAC catalog and the responses are saved in cassettes/**.yaml files. During testing, only the cassettes/**.yaml files are used, without any actual HTTP requests. To save the responses to cassettes/**.yaml during development, run

pytest -v -s --record-mode new_episodes

Note that --record-mode new_episodes overwrites all cassettes. If you only want to write cassettes which are not saved already, --record-mode once can be used. pytest-recording supports all record modes provided by VCR.py. After recording the cassettes, testing can then be performed as usual.
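
A hypothetical test using the vcr marker provided by pytest-recording could look like the sketch below; it is not taken from the xcube-stac test suite, and the request target is only an example:

import pytest
import requests

# The vcr marker tells pytest-recording to replay responses from the matching
# cassettes/**.yaml file instead of performing real HTTP requests.
@pytest.mark.vcr()
def test_catalog_landing_page():
    response = requests.get("https://earth-search.aws.element84.com/v1")
    assert response.status_code == 200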
