This project implements change detection on a sequence of multi-temporal RGB images. It processes the time series, applies Fourier-based low-frequency masking to suppress high-frequency noise, and then uses a deep learning model to generate a binary change map indicating regions of change over time. The results are evaluated against ground truth data using several performance metrics.
-
Loading and Stacking Images
The first part of the code loads a sequence of multi-temporal RGB images from a specified dataset. The images are stacked along a new trailing dimension, producing a 4D array whose last axis represents the temporal component of the sequence.
Input: File paths of RGB images.
Output: Stacked image array with the shape (height, width, 3, time_steps).
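A minimal sketch of this step, assuming PNG files whose sorted names follow the temporal order; `load_image_stack` and the `data/images` path are illustrative, not the project's actual names:

```python
import glob
import numpy as np
from PIL import Image

def load_image_stack(image_dir):
    """Load a time-ordered sequence of RGB images and stack them
    along a trailing temporal axis: (height, width, 3, time_steps)."""
    paths = sorted(glob.glob(f"{image_dir}/*.png"))  # assumes names sort chronologically
    frames = [np.asarray(Image.open(p).convert("RGB")) for p in paths]
    return np.stack(frames, axis=-1)  # (H, W, 3, T)

stack = load_image_stack("data/images")  # hypothetical path
print(stack.shape)  # e.g. (256, 256, 3, 10)
```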
-
Applying Frequency Domain Transformation
After stacking, the code applies a low-pass (low-frequency) mask to each image in the sequence using the Fast Fourier Transform (FFT). This step removes high-frequency noise and highlights the dominant low-frequency features for the subsequent change detection task.
Input: Stacked image array.
Output: Low-frequency filtered image sequence.
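The masking itself can be sketched as follows; the circular mask and its `radius` are assumptions, since this summary does not specify the cutoff:

```python
import numpy as np

def low_frequency_filter(stack, radius=30):
    """Apply a circular low-pass mask in the Fourier domain to every
    channel of every time step. stack: (H, W, 3, T)."""
    h, w = stack.shape[:2]
    yy, xx = np.ogrid[:h, :w]
    # Keep frequencies within `radius` pixels of the centered DC component.
    mask = (yy - h / 2) ** 2 + (xx - w / 2) ** 2 <= radius ** 2

    out = np.empty_like(stack, dtype=np.float64)
    for t in range(stack.shape[3]):
        for c in range(3):
            spectrum = np.fft.fftshift(np.fft.fft2(stack[:, :, c, t]))
            filtered = np.fft.ifft2(np.fft.ifftshift(spectrum * mask))
            out[:, :, c, t] = np.abs(filtered)
    return out
```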
-
Normalization
The filtered images are normalized to a [0, 1] scale by dividing by 255.
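Given 8-bit inputs, this is a single array operation (the stand-in array below substitutes for the filtered output above):

```python
import numpy as np

filtered_stack = np.random.uniform(0, 255, size=(64, 64, 3, 4))  # stand-in for the FFT output
# Fixed 255 divisor maps 8-bit values into [0, 1]; clip guards against FFT overshoot.
normalized = np.clip(filtered_stack / 255.0, 0.0, 1.0)
```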
-
CNN-Based Decoder Model
A Convolutional Neural Network (CNN)-based decoder model is created to predict change maps from the low-frequency images. This model processes the input images and generates a binary change map that highlights the areas of change between consecutive images.
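A minimal PyTorch sketch of such a decoder; the real network's depth, filter counts, and input layout are not given in this summary, so everything below (including the two-frame, six-channel input) is a placeholder:

```python
import torch
import torch.nn as nn

class ChangeDecoder(nn.Module):
    """Maps a pair of low-frequency RGB frames (concatenated on the channel
    axis) to a one-channel change probability map of the same spatial size."""
    def __init__(self, in_channels=6):  # assumed: two RGB frames concatenated
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_channels, 32, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(64, 32, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(32, 1, 1),
            nn.Sigmoid(),  # probabilities; threshold at 0.5 for a binary map
        )

    def forward(self, x):  # x: (N, in_channels, H, W)
        return self.net(x)
```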
-
Discriminator for Adversarial Training
A discriminator model is implemented as part of an adversarial training setup. It tries to distinguish between real change maps (ground truth) and the predicted change maps generated by the decoder.
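A matching sketch of a small convolutional discriminator; its architecture is likewise an assumption:

```python
import torch.nn as nn

class ChangeDiscriminator(nn.Module):
    """Scores whether a one-channel change map looks like a real
    ground-truth map (patch-style logits, no final sigmoid)."""
    def __init__(self, in_channels=1):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_channels, 32, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
            nn.Conv2d(32, 64, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
            nn.Conv2d(64, 1, 4, stride=2, padding=1),  # patch realism logits
        )

    def forward(self, change_map):  # (N, 1, H, W)
        return self.net(change_map)
```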
-
Loss Functions
Training combines four loss terms (sketched in code after this list):
Focal Loss: Helps the model focus more on difficult samples by penalizing incorrect predictions with higher weights.
Weighted Binary Cross-Entropy: Applies class weights based on the number of changed vs. unchanged pixels in the ground truth.
Contrastive Loss: Penalizes large differences between the predicted change maps and the ground truth.
Adversarial Loss: Encourages the decoder to generate realistic change maps that can fool the discriminator.
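Possible implementations of the four terms, assuming sigmoid probability outputs and float {0, 1} targets; `gamma`, `alpha`, and the exact form of the contrastive term are placeholders, since the summary only states that it penalizes prediction/ground-truth differences:

```python
import torch
import torch.nn.functional as F

def focal_loss(pred, target, gamma=2.0, alpha=0.25, eps=1e-7):
    """Binary focal loss on sigmoid probabilities; gamma/alpha are placeholders."""
    pred = pred.clamp(eps, 1 - eps)
    pt = pred * target + (1 - pred) * (1 - target)   # probability of the true class
    w = alpha * target + (1 - alpha) * (1 - target)  # class balance term
    return (-w * (1 - pt) ** gamma * pt.log()).mean()

def weighted_bce(pred, target, eps=1e-7):
    """BCE with a positive-class weight from the changed/unchanged pixel ratio."""
    pos = target.sum()
    neg = target.numel() - pos
    w_pos = neg / (pos + eps)                # up-weight the rarer 'changed' class
    weights = 1.0 + (w_pos - 1.0) * target
    return F.binary_cross_entropy(pred.clamp(eps, 1 - eps), target, weight=weights)

def contrastive_loss(pred, target):
    """As described above: a distance penalty between prediction and ground truth."""
    return (pred - target).pow(2).mean()

def adversarial_loss(fake_logits):
    """Generator-side loss: push the discriminator to label predictions as real."""
    return F.binary_cross_entropy_with_logits(fake_logits, torch.ones_like(fake_logits))
```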
-
Training Loop
The models are trained using adversarial training, with the decoder and discriminator updated in alternating steps (a single step is sketched after this list):
Train the Discriminator: Using real and predicted change maps.
Train the Decoder: Minimize the combined loss, including contrastive, weighted binary cross-entropy, and adversarial losses.
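One way the alternating step could look, reusing the model and loss sketches above; learning rates and the adversarial weight `lam_adv` are placeholders:

```python
import torch

# Assumes ChangeDecoder, ChangeDiscriminator, and the loss sketches above.
decoder, disc = ChangeDecoder(), ChangeDiscriminator()
opt_g = torch.optim.Adam(decoder.parameters(), lr=2e-4)  # lr values are placeholders
opt_d = torch.optim.Adam(disc.parameters(), lr=2e-4)
bce = torch.nn.BCEWithLogitsLoss()

def train_step(images, gt_map, lam_adv=0.1):
    """One alternating update; images: (N, 6, H, W), gt_map: (N, 1, H, W) float."""
    pred_map = decoder(images)

    # Step 1: discriminator on real ground truth vs. detached predictions.
    d_real, d_fake = disc(gt_map), disc(pred_map.detach())
    d_loss = bce(d_real, torch.ones_like(d_real)) + bce(d_fake, torch.zeros_like(d_fake))
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()

    # Step 2: decoder on the combined supervised + adversarial objective
    # (focal_loss could be added here as a further term).
    g_loss = (weighted_bce(pred_map, gt_map)
              + contrastive_loss(pred_map, gt_map)
              + lam_adv * adversarial_loss(disc(pred_map)))
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()
    return d_loss.item(), g_loss.item()
```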
-
Ground Truth Loading and Processing
The ground truth change map is loaded and processed into a binary format, where pixels corresponding to changes are marked as 1 and unchanged areas are marked as 0.
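A sketch of the binarization, assuming a grayscale label image in which high values mark change; the path and threshold are illustrative:

```python
import numpy as np
from PIL import Image

gt = np.asarray(Image.open("data/ground_truth.png").convert("L"))  # hypothetical path
gt_binary = (gt > 127).astype(np.uint8)  # 1 = changed, 0 = unchanged
```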
-
Evaluation Metrics
After training, the predicted change maps are compared against the ground truth using the following metrics (a computation sketch follows the list):
Accuracy: Measures the percentage of correctly predicted pixels.
Precision: The ratio of true positive predictions to the total predicted positives.
Recall: The ratio of true positives to the total actual positives.
F1 Score: Harmonic mean of precision and recall.
Kappa: Cohen's kappa; evaluates the agreement between predicted and actual labels beyond chance.
IoU (Intersection over Union): Evaluates the overlap between predicted and actual change regions.
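All six metrics can be computed with scikit-learn on the flattened binary maps; the 0.5 threshold is an assumption:

```python
import numpy as np
from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                             f1_score, cohen_kappa_score, jaccard_score)

def evaluate(pred_map, gt_map, threshold=0.5):
    """Compare a predicted probability map against a binary ground-truth map."""
    y_pred = (pred_map.ravel() > threshold).astype(np.uint8)
    y_true = gt_map.ravel().astype(np.uint8)
    return {
        "accuracy": accuracy_score(y_true, y_pred),
        "precision": precision_score(y_true, y_pred, zero_division=0),
        "recall": recall_score(y_true, y_pred, zero_division=0),
        "f1": f1_score(y_true, y_pred, zero_division=0),
        "kappa": cohen_kappa_score(y_true, y_pred),
        "iou": jaccard_score(y_true, y_pred, zero_division=0),
    }
```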
-
Data Source: https://github.com/thebinyang/UTRNet