someone was kind enough to implement ViT-VQGAN and RQ-VAE here: https://github.com/thuanz123/enhancing-transformers
let's train some benchmark checkpoints for both models, starting with the datasets that repo provides dataloaders for, and then moving up to LAION etc.
- ImageNet
- LSUN
- COCO
- CC3M
and of course... add to PyTTI
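
for tracking purposes, here's a rough sketch of the checkpoint matrix implied by the plan above (two models times four starter datasets). the run names are made-up labels for bookkeeping, not config names from the enhancing-transformers repo, and LAION is left out until the starter runs are done:

```python
from itertools import product

# Hypothetical labels for this plan only; they are NOT identifiers
# taken from the enhancing-transformers repo.
MODELS = ["vit-vqgan", "rqvae"]
STARTER_DATASETS = ["imagenet", "lsun", "coco", "cc3m"]


def benchmark_runs(models=MODELS, datasets=STARTER_DATASETS):
    """Return one run name per (model, dataset) pair, models varying slowest."""
    return [f"{model}_{dataset}" for model, dataset in product(models, datasets)]


if __name__ == "__main__":
    # Print the checklist of checkpoints to train.
    for run in benchmark_runs():
        print(f"[ ] {run}")
```

that's 8 checkpoints before moving on to LAION.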