xcube-stac is a Python package and xcube plugin that adds a data store named `stac` to xcube. The data store is used to access data from SpatioTemporal Asset Catalogs (STAC).
To install xcube-stac directly from the git repository, clone the repository, change into the `xcube-stac` directory, and follow the steps below:

```shell
conda env create -f environment.yml
conda activate xcube-stac
pip install .
```
This installs all the dependencies of xcube-stac
into a fresh conda
environment, then installs xcube-stac into this environment from the
repository.
A SpatioTemporal Asset Catalog (STAC) consists of three main components: catalog, collection, and item. Each item can contain multiple assets, each linked to a data source. Items are associated with a timestamp or temporal range and a bounding box describing the spatial extent of the data.
Items within a collection generally exhibit similarities. For example, a STAC catalog might contain multiple collections corresponding to different space-borne instruments. Each item represents a measurement covering a specific spatial area at a particular timestamp. For a multi-spectral instrument, different bands can be stored as separate assets.
A STAC catalog can comply with the STAC API - Item Search conformance class, enabling server-side searches for items based on specific parameters. If this compliance is not met, only client-side searches are possible, which can be slow for large STAC catalogs.
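The difference can be sketched in plain Python: a client-side search has to fetch and filter every item itself, which is what makes it slow for large catalogs. The minimal item dicts and the `intersects` helper below are illustrative only, not part of the xcube-stac API.

```python
from datetime import datetime

# Minimal stand-ins for STAC items: a bounding box [west, south, east, north]
# and a timestamp, as found in an item's "bbox" and "properties.datetime".
items = [
    {"id": "A", "bbox": [9.0, 53.0, 10.0, 54.0], "datetime": datetime(2020, 7, 5)},
    {"id": "B", "bbox": [20.0, 60.0, 21.0, 61.0], "datetime": datetime(2020, 7, 6)},
]

def intersects(bbox1, bbox2):
    """True if two [west, south, east, north] boxes overlap."""
    return not (
        bbox1[2] < bbox2[0] or bbox2[2] < bbox1[0]
        or bbox1[3] < bbox2[1] or bbox2[3] < bbox1[1]
    )

def client_side_search(items, bbox, start, end):
    """Filter already-downloaded items by bounding box and time range."""
    return [
        item["id"]
        for item in items
        if intersects(item["bbox"], bbox) and start <= item["datetime"] <= end
    ]

hits = client_side_search(
    items,
    bbox=[9.1, 53.1, 10.7, 54.0],
    start=datetime(2020, 7, 1),
    end=datetime(2020, 8, 1),
)
print(hits)  # → ['A']
```

With STAC API - Item Search, this filtering happens on the server, so only the matching items are transferred to the client.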
The xcube-stac plugin reads the data sources from the STAC catalog and opens the data in an analysis-ready form following the xcube dataset convention. By default, a data ID represents one item, which is opened as a dataset, with each asset becoming a data variable within the dataset.
Additionally, a stack mode is available, enabling the stacking of items using odc-stac. This allows for mosaicking multiple tiles and concatenating the datacube along the temporal axis.
stackstac has also been considered during the evaluation of Python libraries supporting the stacking of STAC items. However, the benchmarking report comparing stackstac and odc-stac shows that odc-stac outperforms stackstac. Furthermore, stackstac shows an issue in making use of the overview levels of COG files. Still, stackstac is popular in the community and might be supported in the future.
The following Jupyter notebooks provide some examples:

- `example/notebooks/earth_search_sentinel2_l2a_stack_mode.ipynb`: This notebook shows how to stack multiple tiles of Sentinel-2 L2A data from the Earth Search by Element 84 STAC API. It shows stacking of individual tiles and mosaicking of multiple tiles measured on the same solar day.
- `example/notebooks/geotiff_nonsearchable_catalog.ipynb`: This notebook shows how to load a GeoTIFF file from a non-searchable STAC catalog.
- `example/notebooks/geotiff_searchable_catalog.ipynb`: This notebook shows how to load a GeoTIFF file from a searchable STAC catalog.
- `example/notebooks/netcdf_searchable_catalog.ipynb`: This notebook shows how to load a NetCDF file from a searchable STAC catalog.
- `example/notebooks/xcube_server_stac_s3.ipynb`: This notebook shows how to open data sources published by xcube server via the STAC API.
The xcube data store framework allows easy access to data in an analysis-ready format, as shown in the few lines of code below.
```python
from xcube.core.store import new_data_store

store = new_data_store(
    "stac",
    url="https://earth-search.aws.element84.com/v1"
)
ds = store.open_data(
    "collections/sentinel-2-l2a/items/S2B_32TNT_20200705_0_L2A",
    data_type="dataset"
)
```
The data ID `"collections/sentinel-2-l2a/items/S2B_32TNT_20200705_0_L2A"` points to the STAC item's JSON and is given by the segment of the URL that follows the catalog's URL. The `data_type` argument can be set to `dataset` or `mldataset`, which returns an `xr.Dataset` or an xcube multi-resolution dataset, respectively. Note that in the above example, if `data_type` is not assigned, a multi-resolution dataset is returned. This is because the item's assets link to GeoTIFFs, which are opened as multi-resolution datasets by default.
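To illustrate how such a data ID relates to the catalog URL, the hypothetical helper below simply strips the catalog URL from an item URL; it is a sketch, not a function provided by xcube-stac.

```python
def data_id_from_item_url(catalog_url: str, item_url: str) -> str:
    """Derive the data ID as the URL segment following the catalog's URL."""
    prefix = catalog_url.rstrip("/") + "/"
    if not item_url.startswith(prefix):
        raise ValueError("Item URL does not belong to this catalog")
    return item_url[len(prefix):]

catalog_url = "https://earth-search.aws.element84.com/v1"
item_url = (
    "https://earth-search.aws.element84.com/v1/"
    "collections/sentinel-2-l2a/items/S2B_32TNT_20200705_0_L2A"
)
print(data_id_from_item_url(catalog_url, item_url))
# → collections/sentinel-2-l2a/items/S2B_32TNT_20200705_0_L2A
```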
To use the stacking mode, initialize a stac store with the argument `stack_mode=True`.
```python
from xcube.core.store import new_data_store

store = new_data_store(
    "stac",
    url="https://earth-search.aws.element84.com/v1",
    stack_mode=True
)
ds = store.open_data(
    "sentinel-2-l2a",
    data_type="dataset",
    bbox=[9.1, 53.1, 10.7, 54],
    time_range=["2020-07-01", "2020-08-01"],
    query={"s2:processing_baseline": {"eq": "02.14"}},
)
```
In the stacking mode, the data IDs are the collection IDs within the STAC catalog. To get Sentinel-2 L2A data, we assign `data_id` to `"sentinel-2-l2a"`. The bounding box and time range define the spatial and temporal extent of the data cube. Additionally, for this example, we need to set a query argument to select a specific Sentinel-2 processing baseline, as the collection contains multiple items for the same tile with different processing procedures. Note that this requirement can vary between collections and must be specified by the user. To set query arguments, the STAC catalog needs to conform to the query extension.
The stacking is performed using odc-stac. All arguments of `odc.stac.load` can be passed into the `open_data(...)` method, which forwards them to the `odc.stac.load` function. To apply mosaicking, we need to assign `groupby="solar_day"`, as described in the documentation of `odc.stac.load`.
The following few lines of code show a small example including mosaicking.
```python
from xcube.core.store import new_data_store

store = new_data_store(
    "stac",
    url="https://earth-search.aws.element84.com/v1",
    stack_mode=True
)
ds = store.open_data(
    "sentinel-2-l2a",
    data_type="dataset",
    bbox=[9.1, 53.1, 10.7, 54],
    time_range=["2020-07-01", "2020-08-01"],
    query={"s2:processing_baseline": {"eq": "02.14"}},
    groupby="solar_day",
)
```
To run the unit test suite:

```shell
pytest
```

To analyze test coverage:

```shell
pytest --cov=xcube_stac
```

To produce an HTML coverage report:

```shell
pytest --cov-report html --cov=xcube_stac
```
The unit test suite uses `pytest-recording` to mock STAC catalogs. During development, an actual HTTP request is performed against a STAC catalog and the responses are saved in `cassettes/**.yaml` files. During testing, only the `cassettes/**.yaml` files are used, without any actual HTTP request. During development, to save the responses to `cassettes/**.yaml`, run

```shell
pytest -v -s --record-mode new_episodes
```

Note that `--record-mode new_episodes` overwrites all cassettes. If you only want to write cassettes that are not saved already, `--record-mode once` can be used. pytest-recording supports all record modes given by VCR.py. After recording the cassettes, testing can then be performed as usual.