- Title: Processing
- Identifier: https://stac-extensions.github.io/processing/v1.2.0/schema.json
- Field Name Prefix: processing
- Scope: Item, Collection
- Extension Maturity Classification: Candidate
- Owner: @emmanuelmathot
Processing metadata indicates which processing chain the data originates from and how the data itself has been produced. Overall, it helps to increase traceability and makes it easier to search across processing levels and multiple algorithm versions.
Often, data items are the result of one or more waterfall processing pipelines. Tracing information such as the processing facility, the algorithm version or the processing date helps with data version management.
This extension applies to STAC Items and STAC Collections. As this processing information is often closely bound to the Collection level and is therefore shared across all Items, it is recommended to add the fields to the corresponding STAC Collection.
- Examples:
  - Item example: Shows the basic usage of the extension in a STAC Item
  - Collection example: Shows the basic usage of the extension in a STAC Collection
- JSON Schema
- Changelog
Field Name | Type | Description |
---|---|---|
processing:expression | Expression Object | An expression or processing chain that describes how the data has been processed. Alternatively, you can also link to a processing chain with the relation type processing-expression (see below). |
processing:lineage | string | Lineage information provided as free text about how observations were processed or which models were used to create the resource being described (NASA ISO). For example, `GRD Post Processing` for the "GRD" product of Sentinel-1 satellites. CommonMark 0.29 syntax MAY be used for rich text representation. |
processing:level | string | The name commonly used to refer to the processing level to make it easier to search for product level across collections or items. The short name must be used (only `L`, not `Level`). See the list of suggested processing levels. |
processing:facility | string | The name of the facility that produced the data. For example, `Copernicus S1 Core Ground Segment - DPA` for products of Sentinel-1 satellites. |
processing:datetime | string | Processing date and time of the corresponding data formatted according to RFC 3339, section 5.6, in UTC. |
processing:version | string | The version of the primary processing software or processing chain that produced the data. For example, this could be the processing baseline for the Sentinel missions. |
processing:software | Map<string, string> | A dictionary with name/version for key/value describing one or more applications or libraries that were involved during the production of the data for provenance purposes. |
The fields in the table above can be used in these parts of STAC documents:
- [ ] Catalogs
- [ ] Collections
- [x] Collection Provider
- [x] Item Properties (incl. Summaries in Collections)
- [x] Assets (for both Collections and Items, incl. Item Asset Definitions in Collections)
- [ ] Links
In more detail, the following restrictions apply:
- Items:
  - The fields are usually placed in the properties. At least one field is required to be present.
  - Additionally, STAC allows all fields to be used in the Asset Object.
- Collections:
  - The fields are usually placed in the Provider Objects for the `providers` that have the role `producer` or `processor` assigned. They don't need to be provided for all providers of the respective role.
  - The fields can also be used in `summaries`, Collection `assets` or Item asset definitions (`item_assets`). Please note that the JSON Schema is not able to validate the values of Collection summaries.
If the extension is given in the `stac_extensions` list, at least one of the fields must be specified in any of the places listed above.
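The "at least one field" rule can be checked before publishing metadata. The following is a minimal sketch, assuming a STAC Item held as a plain Python dict. `has_processing_field` is a hypothetical helper, not part of any STAC library, and it only inspects Item properties and assets. The JSON Schema remains the authoritative check.

```python
PREFIX = "processing:"


def has_processing_field(item: dict) -> bool:
    """Return True if any processing:* field is set in properties or assets."""
    places = [item.get("properties", {})]
    places.extend(item.get("assets", {}).values())
    return any(key.startswith(PREFIX) for place in places for key in place)


# Illustrative Item fragment; the field values are made up for this sketch.
item = {
    "stac_extensions": [
        "https://stac-extensions.github.io/processing/v1.2.0/schema.json"
    ],
    "properties": {
        "processing:level": "L1",
        "processing:facility": "Example Facility",
    },
    "assets": {"data": {"href": "./data.tif"}},
}

print(has_processing_field(item))
```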
The time of the processing can be specified as a global field in `processing:datetime`, but it can also be specified directly and individually via the `created` property of the target asset, as specified in the STAC Common Metadata.
`created` in Item properties describes the STAC metadata creation, and in Assets it describes the creation of the data files. Thus the timestamps provided in Item Properties for `created` and `processing:datetime` may differ. `processing:datetime` exists because Item properties are easier to index and to use for filtering. In Assets, `created` and `processing:datetime` should usually have the same value, so `processing:datetime` can usually be omitted there.
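As an illustration of the timestamp format, here is a small sketch that renders a timezone-aware `datetime` as an RFC 3339 string in UTC using only the Python standard library. `to_rfc3339_utc` is a hypothetical helper name and the date is illustrative.

```python
from datetime import datetime, timezone


def to_rfc3339_utc(moment: datetime) -> str:
    """Format an aware datetime as an RFC 3339 timestamp in UTC with a Z suffix."""
    if moment.tzinfo is None:
        raise ValueError("expected a timezone-aware datetime")
    utc = moment.astimezone(timezone.utc)
    return utc.isoformat(timespec="seconds").replace("+00:00", "Z")


# Example processing time; the date is illustrative only.
processed_at = datetime(2023, 5, 4, 12, 30, 0, tzinfo=timezone.utc)
properties = {"processing:datetime": to_rfc3339_utc(processed_at)}
print(properties["processing:datetime"])  # 2023-05-04T12:30:00Z
```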
Three fields exist for version numbers:
- `processing:software`
- `processing:version`
- `version` (in the Version extension)
The different fields exist to give data providers more flexibility depending on their needs.
In Item Properties:
- `processing:version` is useful if a single version number is available for the metadata or data that users should be able to filter on. A popular example of this is the processing baseline in Sentinel missions.
- `processing:software` is used if the software libraries/tools are important to know, but it's not important to filter on them. They are mostly informative and should be complete for reproducibility purposes. Thus, the values in the object need not be version numbers; they can also be e.g. tag names, commit hashes or similar. For example, you could expose a simplified version of the `Pipfile.lock` (Python) or `package-lock.json` (NodeJS). If you need more information, you could also link to such files via the relation type `processing-software`.
- `version` is usually not used in the context of processing and describes the version of the metadata.
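The difference between the filterable `processing:version` and the informative `processing:software` can be sketched as follows. The Items, software names and version strings are made up for illustration, and `filter_by_version` is a hypothetical helper rather than an API of any STAC client.

```python
# Illustrative Item fragments; names and version strings are made up.
items = [
    {
        "id": "scene-a",
        "properties": {
            "processing:version": "04.00",
            "processing:software": {"example-processor": "2.1.0", "gdal": "3.6.2"},
        },
    },
    {
        "id": "scene-b",
        "properties": {
            "processing:version": "05.00",
            # Values need not be version numbers; a commit hash is fine too.
            "processing:software": {"example-processor": "a1b2c3d"},
        },
    },
]


def filter_by_version(items: list, version: str) -> list:
    """Return the ids of items whose processing:version equals `version`."""
    return [
        item["id"]
        for item in items
        if item["properties"].get("processing:version") == version
    ]


print(filter_by_version(items, "05.00"))  # ['scene-b']
```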
In Items that declare this `processing` extension, it is recommended to add one or more Links with `derived_from` or `via` relationships to any source metadata and data used in the processing. They can be used to trace back the processing history of the dataset.
`processing:level` is the name that is commonly used to refer to the processing level of the data.
The table below shows some processing levels used by the industry for some data products.
Each level represents a step in the abstraction process by which data relevant to physical information (raw, level 0, level 1) are turned into data relevant to geophysical information (level 2, level 3), and finally turned into data relevant to thematic information (level 4).
This list is not exhaustive and can be extended with processing levels specific to a data product.
Level Name | Description | Typical data product |
---|---|---|
RAW | Data in their original packets, as received from the instrument. | Sentinel-1 RAW |
L0 | Reconstructed unprocessed instrument data at full space-time resolution with all available supplemental information to be used in subsequent processing (e.g., ephemeris, health and safety) appended. | Landsat Level 0 |
L1 | Unpacked, reformatted level 0 data, with all supplemental information to be used in subsequent processing appended. Optional radiometric and geometric correction applied to produce parameters in physical units. Data generally presented at full time/space resolution. A wide variety of sub-level products are possible (see below). | Sentinel-1 Level 1, Sentinel-2 L1A |
L2 | Retrieved environmental variables (e.g., ocean wave height, soil moisture, ice concentration) at the same resolution and location as the level 1 source data. A wide variety of sub-level products are possible (see below). | Sentinel-2 L2A |
L3 | Data or retrieved environmental variables which have been spatially and/or temporally re-sampled (i.e., derived from level 1 or 2 products). Such re-sampling may include averaging and compositing. A wide variety of sub-level products are possible (see below). | ENVISAT Level-3, Sentinel-2 L3 |
L4 | Model output or results from analyses of lower level data (i.e., variables that are not directly measured by the instruments, but are derived from these measurements). | |
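Because `processing:level` requires the short name, catalogs that ingest labels such as "Level 2A" may want to normalize them first. This is a minimal sketch; `short_level_name` is a hypothetical helper, and the set of spellings it accepts is an assumption of this example, not something the extension defines.

```python
import re

# Matches e.g. "Level 2A", "level-1c", "L3"; the accepted spellings are an assumption.
_LEVEL_PATTERN = re.compile(r"^\s*(?:level|l)[\s_-]*(\d+[a-z]?)\s*$", re.IGNORECASE)


def short_level_name(label: str) -> str:
    """Normalize a level label to the short form, e.g. 'Level 2A' -> 'L2A'."""
    if label.strip().upper() == "RAW":
        return "RAW"
    match = _LEVEL_PATTERN.match(label)
    if match is None:
        raise ValueError(f"unrecognized processing level: {label!r}")
    return "L" + match.group(1).upper()


print(short_level_name("Level 2A"))  # L2A
print(short_level_name("level-1c"))  # L1C
```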
Field Name | Type | Description |
---|---|---|
format | string | REQUIRED The type of the expression that is specified in the expression property. |
expression | * | REQUIRED An expression compliant with the format specified. The expression can be any data type and depends on the format given, e.g. string or object. |
Potential expression formats with examples:
Format | Type | Description | Example |
---|---|---|---|
gdal-calc | string | A gdal_calc.py expression based on numpy syntax. | A*logical_or(A<=177,A>=185) |
openeo | object | openEO process | Example |
rio-calc | string | A rio-calc (RasterIO) expression | (b4-b1)/(b4+b1) |
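An Expression Object is a pair of `format` and `expression`, both required. The sketch below builds one for the `rio-calc` example from the table. `make_expression` is a hypothetical helper that only checks for the presence of both fields and does not validate the expression against its format.

```python
def make_expression(fmt: str, expression) -> dict:
    """Build an Expression Object; both fields are required."""
    if not fmt:
        raise ValueError("'format' is required")
    if expression is None:
        raise ValueError("'expression' is required")
    return {"format": fmt, "expression": expression}


properties = {
    "processing:expression": make_expression("rio-calc", "(b4-b1)/(b4+b1)")
}
print(properties["processing:expression"]["format"])  # rio-calc
```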
The following types should be used as applicable `rel` types in the Link Object.
Type | Description |
---|---|
derived_from | URL to a STAC Item that was used as input data in the creation of this Item. |
processing-expression | A processing chain (or script) that describes how the data has been processed. |
processing-execution | URL to any resource representing the processing execution (e.g. OGC Process API). |
processing-software | URL to any resource that identifies the software and versions used for processing the data, e.g. a Pipfile.lock (Python) or package-lock.json (NodeJS). |
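To trace the inputs or the software behind an Item, a client can select links by relation type. The following is a minimal sketch, assuming the Item is a plain Python dict. `links_by_rel` is a hypothetical helper and every URL is a placeholder.

```python
def links_by_rel(item: dict, rel: str) -> list:
    """Return the href of every link in `item` with the given relation type."""
    return [link["href"] for link in item.get("links", []) if link.get("rel") == rel]


# Illustrative links; every URL below is a placeholder.
item = {
    "links": [
        {"rel": "derived_from", "href": "https://example.com/items/source-1.json"},
        {"rel": "derived_from", "href": "https://example.com/items/source-2.json"},
        {"rel": "processing-software", "href": "https://example.com/Pipfile.lock"},
    ]
}

print(links_by_rel(item, "derived_from"))
```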
All contributions are subject to the STAC Specification Code of Conduct. For contributions, please follow the STAC specification contributing guide. Instructions for running tests are copied here for convenience.
The same checks that run on PRs are part of the repository and can be run locally to verify that changes are valid.
To run tests locally, you'll need `npm`, which is a standard part of any node.js installation.
First you'll need to install everything with npm once. Just navigate to the root of this repository and on your command line run:
npm install
Then to check markdown formatting and test the examples against the JSON schema, you can run:
npm test
This will print the same messages that you see online, and you can then fix your markdown or examples.
If the tests reveal formatting problems with the examples, you can fix them with:
npm run format-examples