S1S2-Water: A global dataset for semantic segmentation of water bodies from Sentinel-1 and Sentinel-2 satellite images

This repository provides tools to work with the S1S2-Water dataset.

The S1S2-Water dataset is a global reference dataset for training, validating and testing convolutional neural networks for semantic segmentation of surface water bodies in publicly available Sentinel-1 and Sentinel-2 satellite images. It consists of 65 triplets of Sentinel-1 and Sentinel-2 images with quality-checked binary water masks. Samples are drawn globally on the basis of the Sentinel-2 tile grid (100 x 100 km), taking into account the predominant land cover and the availability of water bodies. Each sample is complemented with STAC-compliant metadata and Digital Elevation Model (DEM) rasters from the Copernicus DEM.
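Because each sample's metadata file is a STAC item, it can be inspected with the Python standard library alone. The following is a minimal sketch, assuming only fields defined by the STAC item specification (`id`, `bbox`, `properties.datetime`, `assets`); the helpers `load_item` and `summarize_item` are hypothetical and are not part of this repository.

```python
import json
from pathlib import Path


def load_item(path):
    """Read one per-sample STAC metadata file (sentinel12_ID_meta.json) into a dict."""
    with Path(path).open(encoding="utf-8") as f:
        return json.load(f)


def summarize_item(item):
    """Collect a few standard STAC item fields; their exact contents here are not guaranteed."""
    return {
        "id": item["id"],
        # datetime may be null in STAC when a start/end range is used instead
        "datetime": item.get("properties", {}).get("datetime"),
        "bbox": item.get("bbox"),
        # sorted() over a dict returns its keys, i.e. the asset names
        "assets": sorted(item.get("assets", {})),
    }
```

For example, `summarize_item(load_item("/path/to/data_directory/5/sentinel12_5_meta.json"))` would list the asset keys recorded for sample 5.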

The following article describes the dataset:

Wieland, M., Fichtner, F., Martinis, S., Groth, S., Krullikowski, C., Plank, S., Motagh, M. (2023). S1S2-Water: A global dataset for semantic segmentation of water bodies from Sentinel-1 and Sentinel-2 satellite images. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, doi: 10.1109/JSTARS.2023.3333969.

Dataset version update (2024-05-24)

The dataset on Zenodo has been updated to a new version. Sentinel-1 scenes for samples #31 and #59 were missing from v1.0.0 and are now included in v1.0.1, along with all relevant masks and metadata.

Dataset access

The dataset (~170 GB) is available for download at: https://zenodo.org/records/11278238 (v1.0.1)

Download the dataset parts and extract them into a single data directory with the following structure:

.
└── data/
    ├── 1/
    │   ├── sentinel12_copdem30_1_elevation.tif
    │   ├── sentinel12_copdem30_1_slope.tif
    │   ├── sentinel12_s1_1_img.tif
    │   ├── sentinel12_s1_1_msk.tif
    │   ├── sentinel12_s1_1_valid.tif
    │   ├── sentinel12_s2_1_img.tif
    │   ├── sentinel12_s2_1_msk.tif
    │   ├── sentinel12_s2_1_valid.tif
    │   └── sentinel12_1_meta.json
    ├── 5/
    │   ├── sentinel12_copdem30_5_elevation.tif
    │   ├── sentinel12_copdem30_5_slope.tif
    │   ├── sentinel12_s1_5_img.tif
    │   ├── sentinel12_s1_5_msk.tif
    │   ├── sentinel12_s1_5_valid.tif
    │   ├── sentinel12_s2_5_img.tif
    │   ├── sentinel12_s2_5_msk.tif
    │   ├── sentinel12_s2_5_valid.tif
    │   └── sentinel12_5_meta.json
    ├── .../
    │   └── ...
    └── catalog.json
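After extraction, a quick consistency check can catch incomplete downloads. This is a minimal sketch, assuming every sample directory contains exactly the nine files shown in the tree above; `expected_files` and `missing_files` are hypothetical helpers, not functions provided by s1s2_water.py.

```python
from pathlib import Path

# Layers expected per sensor, taken from the directory tree above.
SENSOR_LAYERS = {
    "s1": ("img", "msk", "valid"),
    "s2": ("img", "msk", "valid"),
    "copdem30": ("elevation", "slope"),
}


def expected_files(sample_id):
    """Return the file names expected inside data/<sample_id>/."""
    names = [
        f"sentinel12_{sensor}_{sample_id}_{layer}.tif"
        for sensor, layers in SENSOR_LAYERS.items()
        for layer in layers
    ]
    names.append(f"sentinel12_{sample_id}_meta.json")
    return names


def missing_files(data_dir):
    """Map each sample directory name to the expected files it lacks."""
    report = {}
    for sample_dir in sorted(Path(data_dir).iterdir()):
        if not sample_dir.is_dir():
            continue  # skips catalog.json and any other top-level files
        missing = [name for name in expected_files(sample_dir.name)
                   if not (sample_dir / name).is_file()]
        if missing:
            report[sample_dir.name] = missing
    return report
```

`missing_files("/path/to/data_directory")` returns an empty dict for a complete extraction. On a copy extracted from v1.0.0, samples 31 and 59 would be expected to appear with missing s1 files, per the version note above.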

Dataset information

Each raster file follows the naming scheme sentinel12_SENSOR_ID_LAYER.tif (e.g. sentinel12_s1_5_img.tif); the metadata file of each sample is named sentinel12_ID_meta.json. Raster layers are stored as Cloud Optimized GeoTIFFs (COG) and projected to Universal Transverse Mercator (UTM).
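The naming scheme can be parsed to group raster files by sensor, sample and layer. A minimal sketch, assuming the sensor and layer tokens are limited to those listed in the table of layers; `parse_raster_name` is a hypothetical helper.

```python
import re

# sentinel12_SENSOR_ID_LAYER.tif, with tokens restricted to the documented values
RASTER_NAME = re.compile(
    r"sentinel12_(?P<sensor>s1|s2|copdem30)_(?P<id>\d+)"
    r"_(?P<layer>img|msk|valid|elevation|slope)\.tif"
)


def parse_raster_name(name):
    """Return (sensor, sample_id, layer), or None for non-raster names such as the metadata JSON."""
    match = RASTER_NAME.fullmatch(name)
    if match is None:
        return None
    return match["sensor"], int(match["id"]), match["layer"]
```

For example, `parse_raster_name("sentinel12_s1_5_img.tif")` yields `("s1", 5, "img")`, while the metadata file name is not matched.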

| Sensor | Layer | Description | Values | Format | Bands |
|---|---|---|---|---|---|
| S1 | IMG | Sentinel-1 image (GRD product) | Unit: dB (scaled by factor 100) | GeoTIFF, 10980 x 10980 px, 2 bands, Int16 | 0: VV; 1: VH |
| S2 | IMG | Sentinel-2 image (L1C product) | Unit: TOA reflectance (scaled by factor 10000) | GeoTIFF, 10980 x 10980 px, 6 bands, UInt16 | 0: Blue; 1: Green; 2: Red; 3: NIR; 4: SWIR1; 5: SWIR2 |
| S1 / S2 | MSK | Annotation mask (hand-labelled water mask) | 0: No Water; 1: Water | GeoTIFF, 10980 x 10980 px, 1 band, UInt8 | 0: Water mask |
| S1 / S2 | VALID | Valid pixel mask (hand-labelled) | 0: Invalid (cloud, cloud-shadow, nodata); 1: Valid | GeoTIFF, 10980 x 10980 px, 1 band, UInt8 | 0: Valid mask |
| COPDEM30 | ELEVATION | Copernicus DEM elevation | Unit: meters | GeoTIFF, 3660 x 3660 px, 1 band, Int16 | 0: Elevation |
| COPDEM30 | SLOPE | Copernicus DEM slope | Unit: degrees | GeoTIFF, 3660 x 3660 px, 1 band, Int16 | 0: Slope |
| N.a. | META | Metadata (STAC metadata item) | | JSON | N.a. |
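The scale factors in the table convert stored integers back into physical units, and the VALID layer can restrict statistics to usable pixels. A minimal sketch, assuming plain linear scaling with no offset (only the factors are stated above); all names are hypothetical. The scaling functions work on scalars and NumPy arrays alike, whereas `water_fraction` iterates flat sequences, so arrays should be passed as `msk.ravel()` and `valid.ravel()`.

```python
# Scale factors from the table above (assumed linear, no offset).
S1_DB_SCALE = 100      # Sentinel-1 IMG: dB * 100 stored as Int16
S2_TOA_SCALE = 10000   # Sentinel-2 IMG: TOA reflectance * 10000 stored as UInt16


def s1_to_db(dn):
    """Convert stored Sentinel-1 values to backscatter in dB."""
    return dn / S1_DB_SCALE


def s2_to_reflectance(dn):
    """Convert stored Sentinel-2 values to TOA reflectance."""
    return dn / S2_TOA_SCALE


def water_fraction(msk, valid):
    """Share of valid pixels (VALID == 1) labelled as water (MSK == 1)."""
    n_valid = n_water = 0
    for m, v in zip(msk, valid):
        if v == 1:
            n_valid += 1
            n_water += int(m == 1)
    return n_water / n_valid if n_valid else 0.0
```

For example, a stored Sentinel-1 value of -1534 corresponds to -15.34 dB, and a stored Sentinel-2 value of 2500 to a reflectance of 0.25.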

Data preparation

Make sure to download the dataset as described above. Clone this repository, adjust settings.toml and run s1s2_water.py to prepare the dataset according to your desired settings.

The following command splits the images and masks of one sensor (Sentinel-1 or Sentinel-2) into training, validation and testing tiles with a predefined tile shape and band combination. Slope information can optionally be appended to the image band stack.

$ python s1s2_water.py --settings settings.toml
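Conceptually, tiling moves a fixed window across each 10980 x 10980 px scene. The edge handling of s1s2_water.py is not described here, so this minimal sketch simply assumes non-overlapping tiles and drops incomplete tiles at the right and bottom edges; `tile_windows` is a hypothetical helper.

```python
def tile_windows(height, width, tile_shape):
    """Yield (row, col, tile_h, tile_w) for non-overlapping full tiles.

    Assumption: partial tiles at the right/bottom border are discarded.
    """
    tile_h, tile_w = tile_shape
    for row in range(0, height - tile_h + 1, tile_h):
        for col in range(0, width - tile_w + 1, tile_w):
            yield row, col, tile_h, tile_w
```

With TILE_SHAPE = [256, 256], this yields 42 full tiles per axis (1764 per scene) and leaves a 228 px strip uncovered on each axis, since 42 x 256 = 10752.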

Data preparation parameters are defined in a TOML settings file passed via --settings:

SENSOR = "s2"                           # prepare Sentinel-1 or Sentinel-2 data ["s1", "s2"]
TILE_SHAPE = [256, 256]                 # desired tile shape in pixels
IMG_BANDS_IDX = [0, 1, 2, 3, 4, 5]      # desired image band combination
SLOPE = true                            # append slope band to image bands
EXCLUDE_NODATA = true                   # exclude tiles with nodata values
DATA_DIR = "/path/to/data_directory"    # data directory that holds the original images
OUT_DIR = "/path/to/output_directory"   # output directory that stores the prepared train, val and test tiles

# Sentinel-1 image bands
# {"VV": 0, "VH": 1}

# Sentinel-2 image bands
# {"Blue": 0, "Green": 1, "Red": 2, "NIR": 3, "SWIR1": 4, "SWIR2": 5}

The preprocessing steps applied to the Sentinel-1 imagery are documented in the SNAP GPT file.

Installation

$ conda env create --file environment.yaml
$ conda activate s1s2_water