Modified Taming Transformers for gen.ggpht Image Synthesis

For the original README

Sampling

Train a model as described above or download a pre-trained model:

Open Images, 1 billion parameters: trained for 100 epochs. At 256x256 pixels: FID 41.48±0.21, SceneFID 14.60±0.15, Inception Score 18.47±0.27. The model was trained on 2D crops of images and is therefore well suited to generating high-resolution images, e.g. 512x512.
Open Images, distilled: a 125 million parameter version of the model above that allows sampling on smaller GPUs (4 GB is enough for sampling 256x256 px images). Trained for 60 epochs with 10% soft loss and 90% hard loss. At 256x256 pixels: FID 43.07±0.40, SceneFID 15.93±0.19, Inception Score 17.23±0.11.
COCO, 30 epochs
COCO, 60 epochs (model statistics for both COCO versions are in assets/coco_scene_images_training.svg)
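
The "10% soft loss, 90% hard loss" weighting of the distilled model can be illustrated with a minimal sketch. This is not the training code of this repo: the function names, the use of cross-entropy for both terms, and the per-token formulation are assumptions made for illustration only.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(target_probs, student_logits):
    """Cross-entropy between a target distribution and the student prediction."""
    probs = softmax(student_logits)
    return -sum(t * math.log(p) for t, p in zip(target_probs, probs) if t > 0)


def distillation_loss(student_logits, teacher_logits, target_index, soft_weight=0.1):
    """Weighted sum of a soft (teacher) and a hard (ground-truth) loss.

    soft_weight=0.1 mirrors the 10% / 90% split quoted above; the exact
    form of each term in the real training run is an assumption.
    """
    hard_target = [1.0 if i == target_index else 0.0
                   for i in range(len(student_logits))]
    soft_target = softmax(teacher_logits)
    hard = cross_entropy(hard_target, student_logits)
    soft = cross_entropy(soft_target, student_logits)
    return soft_weight * soft + (1.0 - soft_weight) * hard
```

The soft term pulls the student towards the teacher's full token distribution, while the hard term keeps it anchored to the ground-truth token.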

When downloading a pre-trained model, remember to change ckpt_path in configs/*project.yaml to point to your downloaded first-stage model (see ->Training).
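
A config may contain more than one ckpt_path entry, so it can help to list them before editing. The sketch below works on a config that has already been loaded into nested dicts (with whichever YAML library you use). The helper names and the example layout are hypothetical; check your own configs/*project.yaml for the actual nesting.

```python
def find_key_paths(node, key, prefix=()):
    """Yield the key path (a tuple) of every occurrence of `key` in nested dicts."""
    if isinstance(node, dict):
        for name, value in node.items():
            if name == key:
                yield prefix + (name,)
            else:
                yield from find_key_paths(value, key, prefix + (name,))


def set_by_path(node, path, value):
    """Set the entry addressed by `path` (a tuple of keys) to `value`."""
    for name in path[:-1]:
        node = node[name]
    node[path[-1]] = value


# Hypothetical layout, for illustration only.
config = {
    "model": {
        "params": {
            "first_stage_config": {"params": {"ckpt_path": "old/first_stage.ckpt"}},
        }
    }
}

paths = list(find_key_paths(config, "ckpt_path"))
# Pick the path that belongs to the first-stage model and update it.
set_by_path(config, paths[0], "/path/to/downloaded/first_stage.ckpt")
```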

Scene image generation can be run with python scripts/make_scene_samples.py --outdir=/some/outdir -r /path/to/pretrained/model --resolution=512,512
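
To sample at several resolutions from a script, the command above can be assembled programmatically. This is a minimal sketch: build_sample_command is a hypothetical helper, and it only uses the three options shown in the command above.

```python
import sys


def build_sample_command(outdir, model_dir, resolution=(512, 512)):
    """Return the argument list for scripts/make_scene_samples.py.

    The two resolution values are passed through in the order given,
    joined by a comma as in the example command.
    """
    first, second = resolution
    return [
        sys.executable,  # the current Python interpreter
        "scripts/make_scene_samples.py",
        f"--outdir={outdir}",
        "-r", str(model_dir),
        f"--resolution={first},{second}",
    ]


# To actually run it from the repository root:
#   import subprocess
#   subprocess.run(build_sample_command("/some/outdir", "/path/to/pretrained/model"), check=True)
```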

Training on custom data

Training on your own dataset can yield better tokens, and hence better images, for your domain. Follow these steps:

install the repo with conda env create -f environment.yaml, conda activate taming and pip install -e .
put your .jpg files in a folder your_folder
create 2 text files, xx_train.txt and xx_test.txt, that list the files in your training and test set respectively (for example find $(pwd)/your_folder -name "*.jpg" > xx_train.txt)
adapt configs/custom_vqgan.yaml to point to these 2 files
run python main.py --base configs/custom_vqgan.yaml -t True --gpus 0,1 to train on two GPUs. Use --gpus 0, (with a trailing comma) to train on a single GPU.
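
The file lists from the steps above can also be produced in Python. This is a sketch under assumptions: the 10% test fraction, the fixed seed, and both helper names are illustrative choices, not something this repo prescribes. gpus_flag just reproduces the trailing-comma rule for a single GPU.

```python
import random
from pathlib import Path


def write_file_lists(image_dir, train_list, test_list, test_fraction=0.1, seed=0):
    """Write absolute .jpg paths from image_dir into a train and a test list.

    Returns (number of training files, number of test files).
    """
    images = sorted(str(p.resolve()) for p in Path(image_dir).rglob("*.jpg"))
    if not images:
        raise ValueError(f"no .jpg files found under {image_dir}")
    random.Random(seed).shuffle(images)  # reproducible split
    n_test = max(1, int(len(images) * test_fraction)) if len(images) > 1 else 0
    test, train = images[:n_test], images[n_test:]
    Path(train_list).write_text("".join(path + "\n" for path in train))
    Path(test_list).write_text("".join(path + "\n" for path in test))
    return len(train), len(test)


def gpus_flag(gpu_ids):
    """Format the value for --gpus; a single GPU needs a trailing comma."""
    joined = ",".join(str(i) for i in gpu_ids)
    return joined + "," if len(gpu_ids) == 1 else joined
```

For example, write_file_lists("your_folder", "xx_train.txt", "xx_test.txt") creates both lists, and gpus_flag([0]) gives the "0," form needed for single-GPU training.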

Other

A video summary by Two Minute Papers.
A video summary by Gradient Dude.
A Weights & Biases report by ayulockin summarizing the paper.
A video summary by What's AI.
Take a look at ak9250's notebook if you want to run the streamlit demos on Colab.

Text-to-Image Optimization via CLIP

VQGAN has been successfully used as an image generator guided by the CLIP model, both for pure image generation from scratch and image-to-image translation. We recommend the following notebooks/videos/resources:

Advadnoun's Patreon and corresponding LatentVision notebooks: https://www.patreon.com/patronizeme
The notebook of Rivers Have Wings.
A video explanation by Dot CSV (in Spanish, but English subtitles are available).
