S1S2-Water: A global dataset for semantic segmentation of water bodies from Sentinel-1 and Sentinel-2 satellite images

This repository provides tools to work with the S1S2-Water dataset.

The S1S2-Water dataset is a global reference dataset for training, validation and testing of convolutional neural networks for semantic segmentation of surface water bodies in publicly available Sentinel-1 and Sentinel-2 satellite images. The dataset consists of 65 triplets of Sentinel-1 and Sentinel-2 images with quality-checked binary water masks. Samples are drawn globally on the basis of the Sentinel-2 tile grid (100 x 100 km), taking into account the predominant land cover and the availability of water bodies. Each sample is complemented with STAC-compliant metadata and Digital Elevation Model (DEM) rasters from the Copernicus DEM.

The following article describes the dataset:

Wieland, M., Fichtner, F., Martinis, S., Groth, S., Krullikowski, C., Plank, S., Motagh, M. (2023). S1S2-Water: A global dataset for semantic segmentation of water bodies from Sentinel-1 and Sentinel-2 satellite images. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, doi: 10.1109/JSTARS.2023.3333969.

Dataset version update (2024-05-24)

The dataset on Zenodo has been updated to a new version. Sentinel-1 scenes for samples #31 and #59 were missing from v1.0.0 and are now included in v1.0.1, along with all relevant masks and metadata.

Dataset access

The dataset (~170 GB) is available for download at: https://zenodo.org/records/11278238 (v1.0.1)

Download the dataset parts and extract them into a single data directory as follows:

.
└── data/
    ├── 1/
    │   ├── sentinel12_copdem30_1_elevation.tif
    │   ├── sentinel12_copdem30_1_slope.tif
    │   ├── sentinel12_s1_1_img.tif
    │   ├── sentinel12_s1_1_msk.tif
    │   ├── sentinel12_s1_1_valid.tif
    │   ├── sentinel12_s2_1_img.tif
    │   ├── sentinel12_s2_1_msk.tif
    │   ├── sentinel12_s2_1_valid.tif
    │   └── sentinel12_1_meta.json
    ├── 5/
    │   ├── sentinel12_copdem30_5_elevation.tif
    │   ├── sentinel12_copdem30_5_slope.tif
    │   ├── sentinel12_s1_5_img.tif
    │   ├── sentinel12_s1_5_msk.tif
    │   ├── sentinel12_s1_5_valid.tif
    │   ├── sentinel12_s2_5_img.tif
    │   ├── sentinel12_s2_5_msk.tif
    │   ├── sentinel12_s2_5_valid.tif
    │   └── sentinel12_5_meta.json
    ├── .../
    │   └── ...
    └── catalog.json

Dataset information

Each raster file follows the naming scheme sentinel12_SENSOR_ID_LAYER.tif (e.g. sentinel12_s1_5_img.tif); the metadata item of a sample is named sentinel12_ID_meta.json. Raster layers are stored as Cloud Optimized GeoTIFF (COG) and are projected to Universal Transverse Mercator (UTM).
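The naming scheme can be turned into a small path helper. The following is a minimal sketch, not part of this repository: the function names layer_path and meta_path and the LAYERS lookup are hypothetical, and the sensor/layer combinations are taken from the directory listing above.

```python
from pathlib import Path

# Layers available per sensor, as seen in the directory listing.
LAYERS = {
    "s1": ("img", "msk", "valid"),
    "s2": ("img", "msk", "valid"),
    "copdem30": ("elevation", "slope"),
}


def layer_path(data_dir, sample_id, sensor, layer):
    """Build the path of one raster layer of a sample (hypothetical helper)."""
    sensor, layer = sensor.lower(), layer.lower()
    if layer not in LAYERS.get(sensor, ()):
        raise ValueError(f"unknown sensor/layer combination: {sensor}/{layer}")
    name = f"sentinel12_{sensor}_{sample_id}_{layer}.tif"
    return Path(data_dir) / str(sample_id) / name


def meta_path(data_dir, sample_id):
    """Build the path of the STAC metadata item of a sample (hypothetical helper)."""
    return Path(data_dir) / str(sample_id) / f"sentinel12_{sample_id}_meta.json"


print(layer_path("data", 5, "s1", "img"))
print(meta_path("data", 5))
```

Rejecting unknown sensor/layer combinations early avoids silently building paths to files that do not exist in the dataset.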

Sensor	Layer	Description	Values	Format	Bands
S1	IMG	Sentinel-1 image (GRD product)	Unit: dB (scaled by factor 100)	GeoTIFF, 10980 x 10980 px, 2 bands, Int16	0: VV, 1: VH
S2	IMG	Sentinel-2 image (L1C product)	Unit: TOA reflectance (scaled by factor 10000)	GeoTIFF, 10980 x 10980 px, 6 bands, UInt16	0: Blue, 1: Green, 2: Red, 3: NIR, 4: SWIR1, 5: SWIR2
S1 / S2	MSK	Annotation mask (hand-labelled water mask)	0: No Water, 1: Water	GeoTIFF, 10980 x 10980 px, 1 band, UInt8	0: Water mask
S1 / S2	VALID	Valid pixel mask (hand-labelled valid pixel mask)	0: Invalid (cloud, cloud-shadow, nodata), 1: Valid	GeoTIFF, 10980 x 10980 px, 1 band, UInt8	0: Valid mask
COPDEM30	ELEVATION	Copernicus DEM elevation	Unit: Meters	GeoTIFF, 3660 x 3660 px, 1 band, Int16	0: Elevation
COPDEM30	SLOPE	Copernicus DEM slope	Unit: Degrees	GeoTIFF, 3660 x 3660 px, 1 band, Int16	0: Slope
N.a.	META	Metadata	STAC metadata item	JSON	N.a.
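Because the image layers are stored as scaled integers, they usually need to be converted back to physical units before use. The sketch below assumes that "scaled by factor N" means the stored value equals the physical value multiplied by N; the function names are hypothetical. The functions use plain division, so they should work on single numbers as well as on NumPy arrays read from the GeoTIFF files.

```python
# Scale factors as listed in the table above.
S1_SCALE = 100.0      # assumption: stored value = backscatter in dB * 100
S2_SCALE = 10000.0    # assumption: stored value = TOA reflectance * 10000


def s1_to_db(stored):
    """Convert stored Sentinel-1 values to backscatter in dB."""
    return stored / S1_SCALE


def s2_to_reflectance(stored):
    """Convert stored Sentinel-2 values to TOA reflectance."""
    return stored / S2_SCALE


print(s1_to_db(-1234))
print(s2_to_reflectance(2500))
```

Pixels flagged as invalid in the VALID layer should be masked out separately; the conversion itself does not treat nodata values in any special way.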

Data preparation

Make sure to download the dataset as described above. Clone this repository, adjust settings.toml and run s1s2_water.py to prepare the dataset according to your desired settings.

The following command splits the images and masks of a specific sensor (Sentinel-1 or Sentinel-2) into training, validation and testing tiles with a predefined shape and band combination. Slope information can be appended to the image band stack if required.

$ python s1s2_water.py --settings settings.toml
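To give an idea of what tiling means for a full 10980 x 10980 px scene, the following sketch enumerates non-overlapping tile windows. It is an illustration only: tile_windows is a hypothetical helper, and skipping partial tiles at the right and bottom edges is an assumption of this sketch, not necessarily how s1s2_water.py handles them.

```python
def tile_windows(height, width, tile_h, tile_w):
    """Yield (row_off, col_off, tile_h, tile_w) for every full tile.

    Partial tiles at the bottom and right edges are skipped
    (an assumption of this sketch).
    """
    for row in range(0, height - tile_h + 1, tile_h):
        for col in range(0, width - tile_w + 1, tile_w):
            yield row, col, tile_h, tile_w


# One scene of 10980 x 10980 px cut into 256 x 256 px tiles.
windows = list(tile_windows(10980, 10980, 256, 256))
print(len(windows))   # 42 * 42 = 1764 full tiles
print(windows[-1])
```

Under these assumptions a single scene yields 1764 tiles of 256 x 256 px, and the last 228 pixel rows and columns are not covered.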

Data preparation parameters are defined in a settings TOML file (--settings):

SENSOR = "s2"                           # prepare Sentinel-1 or Sentinel-2 data ["s1", "s2"]
TILE_SHAPE = [256, 256]                 # desired tile shape in pixels
IMG_BANDS_IDX = [0, 1, 2, 3, 4, 5]      # desired image band combination
SLOPE = true                            # append slope band to image bands
EXCLUDE_NODATA = true                   # exclude tiles with nodata values
DATA_DIR = "/path/to/data_directory"    # data directory that holds the original images
OUT_DIR = "/path/to/output_directory"   # output directory that stores the prepared train, val and test tiles

# Sentinel-1 image bands
# {"VV": 0, "VH": 1}

# Sentinel-2 image bands
# {"Blue": 0, "Green": 1, "Red": 2, "NIR": 3, "SWIR1": 4, "SWIR2": 5}
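Since IMG_BANDS_IDX expects band indices rather than names, a small lookup can help avoid mistakes when choosing a band combination. This is a minimal sketch built from the two band mappings above; the band_indices helper is hypothetical and not part of this repository.

```python
# Band name to index mappings, copied from the settings comments above.
BANDS = {
    "s1": {"VV": 0, "VH": 1},
    "s2": {"Blue": 0, "Green": 1, "Red": 2, "NIR": 3, "SWIR1": 4, "SWIR2": 5},
}


def band_indices(sensor, names):
    """Translate band names into a list usable as IMG_BANDS_IDX."""
    lookup = BANDS[sensor]
    unknown = [name for name in names if name not in lookup]
    if unknown:
        raise ValueError(f"unknown bands for {sensor}: {unknown}")
    return [lookup[name] for name in names]


# Example: a false-colour combination for Sentinel-2.
idx = band_indices("s2", ["Red", "NIR", "SWIR1"])
print(f"IMG_BANDS_IDX = {idx}")   # IMG_BANDS_IDX = [2, 3, 4]
```

The printed line can be pasted into settings.toml in place of the default band list.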

Information on the preprocessing steps applied to Sentinel-1 imagery can be found in the SNAP GPT file.

Installation

$ conda env create --file environment.yaml
$ conda activate s1s2_water
