Kindly navigate to the original repo
Train a model as described above or download a pre-trained model:
- Open Images: a 1-billion-parameter model trained for 100 epochs. At 256x256 pixels: FID 41.48±0.21, SceneFID 14.60±0.15, Inception Score 18.47±0.27. The model was trained on 2D crops of images and is therefore well suited to generating high-resolution images, e.g. 512x512.
- Open Images, distilled: a 125-million-parameter version of the above model that allows sampling on smaller GPUs (4 GB is enough for sampling 256x256 px images). The model was trained for 60 epochs with 10% soft loss and 90% hard loss. At 256x256 pixels: FID 43.07±0.40, SceneFID 15.93±0.19, Inception Score 17.23±0.11.
- COCO 30 epochs
- COCO 60 epochs (find model statistics for both COCO versions in assets/coco_scene_images_training.svg)
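The 10% soft / 90% hard weighting quoted for the distilled Open Images model can be illustrated with a small sketch. Only the two weights come from the text above. The function names and the choice of cross-entropy against the teacher's softmax output as the soft loss are assumptions made for illustration, not the repository's actual training code.

```python
import math

def log_softmax(logits):
    """Numerically stable log-softmax over a list of floats."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - lse for x in logits]

def cross_entropy(target_probs, student_logits):
    """Cross-entropy between a target distribution and the student's softmax."""
    log_probs = log_softmax(student_logits)
    return -sum(t * lp for t, lp in zip(target_probs, log_probs))

def distillation_loss(student_logits, teacher_logits, target_index,
                      soft_weight=0.1, hard_weight=0.9):
    """Weighted sum of a soft (teacher) loss and a hard (ground-truth) loss.

    Assumption: the soft loss is cross-entropy against the teacher's
    predicted distribution; the hard loss is ordinary cross-entropy
    against the ground-truth token.
    """
    one_hot = [1.0 if i == target_index else 0.0
               for i in range(len(student_logits))]
    teacher_probs = [math.exp(lp) for lp in log_softmax(teacher_logits)]
    hard = cross_entropy(one_hot, student_logits)
    soft = cross_entropy(teacher_probs, student_logits)
    return soft_weight * soft + hard_weight * hard
```

With the default weights, the ground-truth term dominates and the teacher's distribution acts as a mild regularizer on the student's predictions.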
When downloading a pre-trained model, remember to change ckpt_path in configs/*project.yaml to point to your downloaded first-stage model (see the Training section).
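If you prefer to script the ckpt_path change instead of editing the file by hand, a minimal sketch could look like the following. The helper name set_ckpt_path and the YAML nesting in the example are hypothetical; the helper rewrites every ckpt_path key it finds, so it assumes the config contains only the entry you want to change, and it discards any trailing comment on that line.

```python
import re

def set_ckpt_path(config_text: str, new_path: str) -> str:
    """Return config_text with the value of every `ckpt_path:` key replaced.

    Works on the raw text so that the rest of the file (ordering,
    indentation, other comments) is left untouched.
    """
    pattern = re.compile(r"^([ \t]*ckpt_path:[ \t]*).*$", flags=re.MULTILINE)
    # A function replacement avoids backslash-escape surprises in new_path.
    return pattern.sub(lambda m: m.group(1) + new_path, config_text)

# Illustrative config fragment; the real nesting in your file may differ.
example = (
    "model:\n"
    "  params:\n"
    "    first_stage_config:\n"
    "      params:\n"
    "        ckpt_path: /path/to/old.ckpt\n"
)
updated = set_ckpt_path(example, "/path/to/downloaded/first_stage.ckpt")
print(updated)
```

To apply it to a real file, read the config with pathlib.Path.read_text(), pass the text through set_ckpt_path, and write the result back.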
To generate scene images, run:
python scripts/make_scene_samples.py --outdir=/some/outdir -r /path/to/pretrained/model --resolution=512,512
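When sampling at several resolutions or from several checkpoints, it can help to assemble the command programmatically. The sketch below only builds the argument list and uses the three flags shown above (--outdir, -r, --resolution); the helper name scene_sample_command is hypothetical, and the two resolution values are passed through in the order given.

```python
import sys

def scene_sample_command(outdir, model_dir, resolution=(512, 512)):
    """Build the argument list for scripts/make_scene_samples.py.

    Only the flags shown in the command above are used; the two
    resolution values are joined with a comma, as in that example.
    """
    return [
        sys.executable,
        "scripts/make_scene_samples.py",
        f"--outdir={outdir}",
        "-r", str(model_dir),
        f"--resolution={resolution[0]},{resolution[1]}",
    ]

cmd = scene_sample_command("/some/outdir", "/path/to/pretrained/model")
print(" ".join(cmd))
```

The resulting list can be handed to subprocess.run(cmd, check=True) from the repository root, inside the activated conda environment.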
Training on your own dataset can be beneficial to get better tokens and hence better images for your domain. These are the steps to follow:
- install the repo with conda env create -f environment.yaml, conda activate taming and pip install -e .
- put your .jpg files in a folder your_folder
- create 2 text files, xx_train.txt and xx_test.txt, that point to the files in your training and test set respectively (for example find $(pwd)/your_folder -name "*.jpg" > xx_train.txt)
- adapt configs/custom_vqgan.yaml to point to these 2 files
- run python main.py --base configs/custom_vqgan.yaml -t True --gpus 0,1 to train on two GPUs. Use --gpus 0, (with a trailing comma) to train on a single GPU.
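The find command in the steps above writes every image into a single list. As an alternative, the two list files can be produced in one pass with a short script. This is a minimal sketch: the helper name write_file_lists, the 10% test fraction, and the fixed shuffle seed are assumptions for illustration, not part of the repository.

```python
import random
from pathlib import Path

def write_file_lists(image_dir, train_list, test_list,
                     test_fraction=0.1, seed=0):
    """Write absolute .jpg paths, one per line, into two list files.

    Files are found recursively (like `find -name "*.jpg"`), shuffled
    with a fixed seed, and split so that train and test never overlap.
    Returns (number of training files, number of test files).
    """
    paths = sorted(str(p.resolve()) for p in Path(image_dir).rglob("*.jpg"))
    random.Random(seed).shuffle(paths)
    # Keep at least one test image whenever there is more than one file.
    n_test = max(1, int(len(paths) * test_fraction)) if len(paths) > 1 else 0
    test, train = paths[:n_test], paths[n_test:]
    Path(train_list).write_text("".join(p + "\n" for p in train))
    Path(test_list).write_text("".join(p + "\n" for p in test))
    return len(train), len(test)

if __name__ == "__main__":
    counts = write_file_lists("your_folder", "xx_train.txt", "xx_test.txt")
    print("train/test files:", counts)
```

Writing absolute paths mirrors the $(pwd) prefix in the find example, so the lists stay valid regardless of the directory from which training is launched.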
- A video summary by Two Minute Papers.
- A video summary by Gradient Dude.
- A Weights & Biases report summarizing the paper, by ayulockin.
- A video summary by What's AI.
- Take a look at ak9250's notebook if you want to run the streamlit demos on Colab.
VQGAN has been successfully used as an image generator guided by the CLIP model, both for pure image generation from scratch and image-to-image translation. We recommend the following notebooks/videos/resources:
- Advadnoun's Patreon and corresponding LatentVision notebooks: https://www.patreon.com/patronizeme
- The notebook of Rivers Have Wings.
- A video explanation by Dot CSV (in Spanish, but English subtitles are available).