The official repo for the paper "HyperSIGMA: Hyperspectral Intelligence Comprehension Foundation Model"

WHU-Sigma/HyperSIGMA

HyperSIGMA: Hyperspectral Intelligence Comprehension Foundation Model

Di Wang1 ∗, Meiqi Hu1 ∗, Yao Jin1 ∗, Yuchun Miao1 ∗, Jiaqi Yang1 ∗, Yichu Xu1 ∗, Xiaolei Qin1 ∗, Jiaqi Ma1 ∗, Lingyu Sun1 ∗, Chenxing Li1 ∗, Chuan Fu2, Hongruixuan Chen3, Chengxi Han1 †, Naoto Yokoya3, Jing Zhang1 †, Minqiang Xu4, Lin Liu4, Lefei Zhang1, Chen Wu1 †, Bo Du1 †, Dacheng Tao5, Liangpei Zhang1 †

1 Wuhan University, 2 Chongqing University, 3 The University of Tokyo, 4 National Engineering Research Center of Speech and Language Information Processing, 5 Nanyang Technological University.

∗ Equal contribution, † Corresponding author

Update | Overview | Datasets | Pretrained Models | Usage | Statement

🔥 Update

2024.10.22

2024.07.18

2024.06.18

🌞 Overview

HyperSIGMA is the first billion-level foundation model specifically designed for HSI interpretation. To tackle the spectral and spatial redundancy challenges in HSIs, we introduce a novel sparse sampling attention (SSA) mechanism, which effectively promotes the learning of diverse contextual features and serves as the basic block of HyperSIGMA. HyperSIGMA integrates spatial and spectral features using a specially designed spectral enhancement module.

Figure 1. Framework of HyperSIGMA.
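
To make the idea of sparse sampling attention more concrete, here is a minimal sketch in plain Python. It is not the repository's implementation: the function name `sparse_sampling_attention`, the sizes `N`, `D`, `K`, and the use of random indices as a stand-in for however the model chooses its sampling positions are all assumptions for illustration. The only point it shows is that each query attends to a small set of `K` sampled tokens rather than to all `N` tokens.

```python
import math
import random


def sparse_sampling_attention(queries, keys, values, sample_idx):
    """Attention where query i only sees the tokens listed in sample_idx[i].

    queries, keys, values: lists of N vectors (lists of floats).
    sample_idx: list of N lists, each holding K token indices.
    """
    dim = len(queries[0])
    scale = 1.0 / math.sqrt(dim)
    outputs = []
    for q, idx in zip(queries, sample_idx):
        # Scaled dot-product scores against the K sampled keys only.
        scores = [scale * sum(a * b for a, b in zip(q, keys[j])) for j in idx]
        # Numerically stable softmax over the K scores.
        top = max(scores)
        exps = [math.exp(s - top) for s in scores]
        total = sum(exps)
        weights = [e / total for e in exps]
        # Weighted sum of the K sampled values.
        out = [
            sum(w * values[j][d] for w, j in zip(weights, idx))
            for d in range(len(values[0]))
        ]
        outputs.append(out)
    return outputs


# Hypothetical sizes: 16 tokens, 8 channels, 4 sampled positions per query.
rng = random.Random(0)
N, D, K = 16, 8, 4
tokens = [[rng.gauss(0.0, 1.0) for _ in range(D)] for _ in range(N)]
# Placeholder sampling: random indices instead of model-chosen positions.
sample_idx = [rng.sample(range(N), K) for _ in range(N)]
out = sparse_sampling_attention(tokens, tokens, tokens, sample_idx)
print(len(out), len(out[0]))  # 16 8
```

With `K` much smaller than `N`, the cost per query scales with `K` instead of `N`, which is the usual motivation for sparse attention on redundant inputs.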


Extensive experiments on various high-level and low-level HSI tasks demonstrate HyperSIGMA’s versatility and superior representational capability compared to current state-of-the-art methods. It outperforms advanced models such as SpectralGPT, as well as methods specifically designed for these tasks.

Figure 2. HyperSIGMA demonstrates superior performance across 16 datasets and 7 tasks, including both high-level and low-level hyperspectral tasks, as well as multispectral scenes.

📖 Datasets

To train the foundation model, we collected hyperspectral remote sensing image samples from around the globe, constructing a large-scale hyperspectral dataset named HyperGlobal-450K for pre-training. HyperGlobal-450K contains over 20 million three-band images, far exceeding the scale of existing hyperspectral datasets.

Figure 3. The distribution of HyperGlobal-450K samples across the globe, comprising 1,701 images (1,486 EO-1 and 215 GF-5B) with hundreds of spectral bands.

🚀 Pretrained Models

| Pretrain | Backbone | Model Weights |
| --- | --- | --- |
| Spatial_MAE | ViT-B | Baidu Drive & Hugging Face |
| Spatial_MAE | ViT-L | Baidu Drive & Hugging Face |
| Spatial_MAE | ViT-H | Baidu Drive & Hugging Face |
| Spectral_MAE | ViT-B | Baidu Drive & Hugging Face |
| Spectral_MAE | ViT-L | Baidu Drive & Hugging Face |
| Spectral_MAE | ViT-H | Baidu Drive & Hugging Face |

🔨 Usage

Pretraining

We pretrain HyperSIGMA with SLURM. The following example pretrains the large version of Spatial ViT:

srun -J spatmae -p xahdnormal --gres=dcu:4 --ntasks=64 --ntasks-per-node=4 --cpus-per-task=8 --kill-on-bad-exit=1 \
python main_pretrain_Spat.py \
--model 'spat_mae_l' --norm_pix_loss \
--data_path [pretrain data path] \
--output_dir [model saved path] \
--log_dir [log saved path] \
--blr 1.5e-4 --batch_size 32 --gpu_num 64 --port 60001
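
The flags `--blr`, `--batch_size`, and `--gpu_num` interact. The sketch below assumes the scripts follow the linear scaling rule used by the original MAE code (absolute learning rate = base learning rate × effective batch size / 256) and that `--batch_size` is the per-device batch size. Neither assumption is confirmed here, and the helper names and the `accum_iter` parameter are hypothetical.

```python
def effective_batch_size(batch_size_per_gpu: int, gpu_num: int, accum_iter: int = 1) -> int:
    """Total samples per optimizer step across all devices (assumed convention)."""
    return batch_size_per_gpu * gpu_num * accum_iter


def absolute_lr(blr: float, eff_batch: int, base_batch: int = 256) -> float:
    """MAE-style linear scaling rule; assumed, not verified for this repo."""
    return blr * eff_batch / base_batch


# Values taken from the command above: --blr 1.5e-4 --batch_size 32 --gpu_num 64
eff = effective_batch_size(batch_size_per_gpu=32, gpu_num=64)
lr = absolute_lr(blr=1.5e-4, eff_batch=eff)
print(eff)  # 2048
print(lr)   # approximately 1.2e-3
```

Under the same assumptions, halving `--batch_size` while doubling `--gpu_num` leaves the effective batch size, and therefore the absolute learning rate, unchanged.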

Another example of pretraining the huge version of Spectral ViT:

srun -J specmae -p xahdnormal --gres=dcu:4 --ntasks=128 --ntasks-per-node=4 --cpus-per-task=8 --kill-on-bad-exit=1 \
python main_pretrain_Spec.py \
--model 'spec_mae_h' --norm_pix_loss \
--data_path [pretrain data path] \
--output_dir [model saved path] \
--log_dir [log saved path] \
--blr 1.5e-4 --batch_size 16 --gpu_num 128 --port 60004  --epochs 1600 --mask_ratio 0.75 \
--use_ckpt 'True'
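
For readers unfamiliar with `--mask_ratio`, the following is a minimal sketch of MAE-style random masking in plain Python. It illustrates the general technique only and is not taken from this repository; the function name `random_masking` and the token count of 196 are assumptions chosen for the example.

```python
import random


def random_masking(num_tokens: int, mask_ratio: float, seed=None):
    """Randomly split token indices into a kept set and a masked set."""
    rng = random.Random(seed)
    num_keep = int(num_tokens * (1 - mask_ratio))
    order = list(range(num_tokens))
    rng.shuffle(order)  # random permutation of token indices
    keep = sorted(order[:num_keep])    # tokens passed to the encoder
    masked = sorted(order[num_keep:])  # tokens to be reconstructed
    return keep, masked


# Hypothetical input of 196 tokens with the mask ratio from the command above.
keep, masked = random_masking(num_tokens=196, mask_ratio=0.75, seed=0)
print(len(keep), len(masked))  # 49 147
```

With a mask ratio of 0.75, the encoder in an MAE-style setup processes only a quarter of the tokens, which is what keeps pretraining of large backbones tractable.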

Training can be resumed from a saved checkpoint by setting --resume:

--resume [path of saved model]

Finetuning

Image Classification:

Please refer to ImageClassification-README.

Target Detection & Anomaly Detection:

Please refer to HyperspectralDetection-README.

Change Detection:

Please refer to ChangeDetection-README.

Spectral Unmixing:

Please refer to HyperspectralUnmixing-README.

Denoising:

Please refer to Denoising-README.

Super-Resolution:

Please refer to SR-README.

Multispectral Change Detection:

Please refer to MultispectralCD-README.

⭐ Citation

If you find HyperSIGMA helpful, please consider giving this repo a ⭐ and citing:

@article{hypersigma,
  title={HyperSIGMA: Hyperspectral Intelligence Comprehension Foundation Model},
  author={Wang, Di and Hu, Meiqi and Jin, Yao and Miao, Yuchun and Yang, Jiaqi and Xu, Yichu and Qin, Xiaolei and Ma, Jiaqi and Sun, Lingyu and Li, Chenxing and Fu, Chuan and Chen, Hongruixuan and Han, Chengxi and Yokoya, Naoto and Zhang, Jing and Xu, Minqiang and Liu, Lin and Zhang, Lefei and Wu, Chen and Du, Bo and Tao, Dacheng and Zhang, Liangpei},
  journal={arXiv preprint arXiv:2406.11519},
  year={2024}
}

🎺 Statement

For any other questions, please contact di.wang at gmail.com or whu.edu.cn, and chengxi.han at whu.edu.cn.

💖 Thanks

This project is based on MMCV, MAE, Swin Transformer, VSA, RVSA, DAT, HTD-IRN, GT-HAD, MSDformer, SST-Former, SST, CNNAEU and DeepTrans. Thanks for their wonderful work!