Reimplementation of 4x SR3 (Image Super-Resolution via Iterative Refinement): https://arxiv.org/abs/2104.07636
The UNet structure is almost the same as in vanilla DDPM, except that self-attention is applied at the last depth and the depth right before it, group normalization uses 8 groups instead of 32, and the linear scale of the embedding generation module is changed from 10,000 to 5,000. As described in the paper, the gamma value is sampled from a uniform distribution between the two alpha values at t-1 and t, and the square root of gamma is fed directly into the embedding generation module.
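A minimal sketch of this noise-level conditioning is shown below, assuming PyTorch and reading the "alpha values at t-1 and t" as the cumulative products gamma_(t-1) and gamma_t from the SR3 paper; the function and variable names are illustrative, not this repo's actual identifiers.

```python
import math

import torch

# Hypothetical sketch; the schedule matches the "Train Beta Schedule" row
# in the settings table further down.
T = 1000
betas = torch.linspace(1e-4, 0.005, T)
gammas = torch.cumprod(1.0 - betas, dim=0)      # gamma_t = prod_{i<=t} alpha_i

def sample_sqrt_gamma(batch_size):
    """Sample gamma uniformly between the cumulative values at t-1 and t,
    then return sqrt(gamma), which conditions the UNet instead of the
    integer timestep used in vanilla DDPM."""
    t = torch.randint(1, T, (batch_size,))
    low, high = gammas[t], gammas[t - 1]         # gammas decrease with t
    gamma = low + torch.rand(batch_size) * (high - low)
    return torch.sqrt(gamma)

def noise_level_embedding(sqrt_gamma, dim=128, scale=5_000):
    """Sinusoidal embedding with linear scale 5,000 instead of 10,000."""
    half = dim // 2
    freqs = torch.exp(-math.log(scale) * torch.arange(half) / half)
    args = sqrt_gamma[:, None] * freqs[None, :]
    return torch.cat([torch.sin(args), torch.cos(args)], dim=-1)

emb = noise_level_embedding(sample_sqrt_gamma(4))   # shape: (4, 128)
```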
Tag | Setting |
---|---|
Base Channel | 56 |
Train Batch Size | 4 |
Train Iterations | 500K |
Train Data | DIV2K Train Set + Flickr2K Train Set (images 1001 to 2650) |
Validation Data | DIV2K Validation Set |
Test Data | Flickr2K Train Set (images 1 to 1000) |
Train Data Augmentation | Random Crop, Random Flip, Random Rotation |
Test Data Augmentation | Center Crop |
Train Learning Rate Schedule | Cosine Annealing Schedule from 1e-5 to 1e-7 |
Train Beta Schedule | Linear Schedule from 1e-4 to 0.005 (see the schedule sketch below this table) |
Sample Gamma Schedule | Linear Schedule from 1e-4 to 0.1 |
Train Steps | 1000 |
Sample Steps | 100 |
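The two noise schedules in the table could be built as in the sketch below; treating the "Sample Gamma Schedule" values as per-step betas from which cumulative gammas are derived is an assumption, not something stated in the table.

```python
import torch

# Sketch of the schedules in the settings table; the exact way the repo turns
# the 100-step "sample gamma schedule" into cumulative gammas is assumed here.
train_betas = torch.linspace(1e-4, 0.005, 1000)
train_gammas = torch.cumprod(1.0 - train_betas, dim=0)      # used when sampling gamma during training

sample_betas = torch.linspace(1e-4, 0.1, 100)               # "Sample Gamma Schedule" row, read as per-step betas
sample_gammas = torch.cumprod(1.0 - sample_betas, dim=0)    # 100 cumulative noise levels for inference
```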
Dataset | IS (Mean, Std.) | FID | PSNR | SSIM |
---|---|---|---|---|
center crop 64x64 to 256x256 | (12.829, 0.992) | 3.642 | 23.185 | 0.564 |
center crop 256x256 to 1024x1024 | (21.305, 2.290) | 0.312 | 23.819 | 0.617 |
Note that this model was not trained on the 256x256 to 1024x1024 setting.
The Inception Score is low because the cropped patches are hard to recognize as objects; as the crop size grows, the Inception Score increases as well.
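For reference, PSNR and SSIM values of this kind can be computed with scikit-image (0.19+ for `channel_axis`); the array names below are placeholders, not this repo's evaluation code.

```python
import numpy as np
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

def evaluate_pair(sr: np.ndarray, hr: np.ndarray):
    """sr, hr: uint8 HxWx3 arrays of a super-resolved image and its ground truth."""
    psnr = peak_signal_noise_ratio(hr, sr, data_range=255)
    ssim = structural_similarity(hr, sr, channel_axis=-1, data_range=255)
    return psnr, ssim
```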
Note that the LR images below are upsampled with bicubic interpolation.
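One way to produce such bicubic previews is with `torch.nn.functional.interpolate`; the `lr` tensor below is only a placeholder.

```python
import torch
import torch.nn.functional as F

lr = torch.rand(1, 3, 64, 64)   # placeholder batch of LR crops in [0, 1]
lr_up = F.interpolate(lr, scale_factor=4, mode='bicubic', align_corners=False).clamp(0, 1)
```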
Tag | Image |
---|---|
LR | |
Sample | |
HR | |
Tag | Image |
---|---|
LR | |
Sample | |
HR | |
Tag | Image |
---|---|
LR | |
Sample | |
HR | |
Dataset | IS (Mean, Std.) | FID | PSNR | SSIM |
---|---|---|---|---|
center crop 32x32 to 128x128 | (7.159, 0.437) | 8.177 | 23.609 | 0.563 |
Tag | Setting |
---|---|
Base Channel | 64 |
Train Batch Size | 12 |
Train Iterations | 500K |
Train Data | DIV2K Train Set + Flickr2K Train Set (images 1001 to 2650) |
Validation Data | DIV2K Validation Set |
Test Data | Flickr2K Train Set (images 1 to 1000) |
Train Data Augmentation | Random Crop, Random Flip, Random Rotation |
Test Data Augmentation | Center Crop |
Train Learning Rate Schedule | Cosine Annealing Schedule from 1e-5 to 1e-7 |
Train Beta Schedule | Linear Schedule from 1e-4 to 0.005 |
Sample Gamma Schedule | Linear Schedule from 1e-6 to 0.05 (see the sampling sketch below this table) |
Train Steps | 1000 |
Sample Steps | 100 |
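The 100 sampling steps above can be run with the reverse update from the SR3 paper, sketched below under two assumptions: the model predicts the added noise, and the per-step alpha is recovered as the ratio of consecutive cumulative gammas. The repo's actual sampling loop may differ.

```python
import torch

@torch.no_grad()
def sr3_sample(model, x_lr_up, gammas):
    """Sketch of the SR3 reverse process.

    model(x, y, sqrt_gamma) is assumed to predict the added noise;
    gammas holds the cumulative sample-schedule values gamma_1..gamma_T.
    """
    y = torch.randn_like(x_lr_up)                       # start from pure noise
    T = len(gammas)
    for t in range(T, 0, -1):
        gamma_t = gammas[t - 1]
        gamma_prev = gammas[t - 2] if t > 1 else gammas.new_tensor(1.0)
        alpha_t = gamma_t / gamma_prev                   # assumed per-step alpha
        sqrt_gamma = torch.sqrt(gamma_t).expand(y.shape[0])
        eps = model(x_lr_up, y, sqrt_gamma)              # predicted noise
        y = (y - (1 - alpha_t) / torch.sqrt(1 - gamma_t) * eps) / torch.sqrt(alpha_t)
        if t > 1:
            y = y + torch.sqrt(1 - alpha_t) * torch.randn_like(y)
    return y.clamp(-1, 1)                                # assuming images scaled to [-1, 1]
```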
Note that the LR images below are upsampled with bicubic interpolation.
Tag | Image |
---|---|
LR | |
Sample | |
HR | |
Tag | Image |
---|---|
LR | |
Sample | |
HR | |