This project was set as the final project assignment for the 2023 2nd OUTTA AI Bootcamp, where I served as the overall leader. This bootcamp is operated by OUTTA, a non-profit AI education organization where I hold the position of president.
This project generates images from text input. Its goal is to implement a simple T2I (Text-to-Image) generation model using Conditional GANs.
Together with the OUTTA members, I created this project and evaluated the teams' submissions to select the top-performing teams.
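In a conditional GAN, both the generator $G$ and the discriminator $D$ receive the condition. Taking the condition to be an embedding $e(t)$ of a caption $t$, and $z$ to be a random noise vector, the textbook conditional GAN objective is shown below. This is a generic formulation for orientation only; the losses actually implemented in 'train.py' may use different variants.

```latex
\min_G \max_D \;
  \mathbb{E}_{(x,t) \sim p_{\text{data}}}\!\left[\log D\big(x, e(t)\big)\right]
+ \mathbb{E}_{z \sim p_z,\; t \sim p_{\text{data}}}\!\left[\log\Big(1 - D\big(G(z, e(t)), e(t)\big)\Big)\right]
```

The discriminator is pushed to score real image/caption pairs high and generated pairs low, while the generator is pushed to produce images that the discriminator accepts as matching the given caption.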
If you're interested in undertaking this project yourself, you can download the skeleton code from here.
This repository contains the solution for the project.
For a more detailed explanation about this project, please refer to the uploaded '2023_final_project_guideline.pdf'.
To complete this project, you need to modify the 'network.py' and 'train.py' files; it is recommended not to change the other files.
A brief explanatory video about this project is available at the following link.
This project is designed to run primarily in the Google Colab environment.
The dataset can be downloaded from here.
You can see the source of the dataset at the following link.
Command for data preprocessing:
python preproc_datasets_celeba_zip_train.py --source=./multimodal_celeba_hq.zip \
    --dest train_data_6cap.zip --width 256 --height 256 \
    --transform center-crop --emb_dim 512
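The `--transform center-crop --width 256 --height 256` options suggest that each image is cropped to a centered square and then resized to 256×256. The sketch below only computes the crop box under that assumption; the exact behavior of 'preproc_datasets_celeba_zip_train.py' is not documented here, and `center_crop_box` is a hypothetical helper, not part of the repository.

```python
from typing import Tuple


def center_crop_box(width: int, height: int) -> Tuple[int, int, int, int]:
    """Return (left, top, right, bottom) of the largest square centered in a
    width x height image (assumed behavior of '--transform center-crop')."""
    side = min(width, height)
    left = (width - side) // 2
    top = (height - side) // 2
    return left, top, left + side, top + side


# A square 1024x1024 image is kept whole; it would only be resized to 256x256.
print(center_crop_box(1024, 1024))  # (0, 0, 1024, 1024)
# A portrait 178x218 image loses 20 rows at the top and 20 at the bottom.
print(center_crop_box(178, 218))    # (0, 20, 178, 198)
```

With Pillow, the returned box could be passed to `Image.crop` and the result resized with `resize((256, 256))`.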
The zip file ./multimodal_celeba_hq.zip should be structured as follows:
./multimodal_celeba_hq.zip
├── image
│   ├── 0.jpg
│   ├── 1.jpg
│   ├── 2.jpg
│   └── ...
└── celeba-caption
    ├── 0.txt
    ├── 1.txt
    ├── 2.txt
    └── ...
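Before running the preprocessing command, it can help to confirm that every image has a matching caption file. The sketch below assumes the layout shown above (images as `<n>.jpg` under an `image` folder, captions as `<n>.txt` under a caption folder) and uses only Python's standard `zipfile` module. `pair_images_with_captions` is a hypothetical helper, and the caption folder name is a parameter so it can be set to whatever name your archive actually uses.

```python
import zipfile
from pathlib import PurePosixPath


def pair_images_with_captions(zip_path, image_dir="image", caption_dir="celeba-caption"):
    """Pair every <stem>.jpg under image_dir with <stem>.txt under caption_dir.

    Raises ValueError if any image lacks a caption or vice versa.
    """
    images, captions = {}, {}
    with zipfile.ZipFile(zip_path) as zf:
        for name in zf.namelist():
            path = PurePosixPath(name)
            # Match on the immediate parent folder, so a top-level wrapper
            # directory inside the zip is tolerated.
            if path.parent.name == image_dir and path.suffix == ".jpg":
                images[path.stem] = name
            elif path.parent.name == caption_dir and path.suffix == ".txt":
                captions[path.stem] = name
    unpaired = sorted(set(images) ^ set(captions))
    if unpaired:
        raise ValueError(f"unpaired entries: {unpaired[:10]}")
    return {stem: (images[stem], captions[stem]) for stem in images}
```

Usage: `pairs = pair_images_with_captions("./multimodal_celeba_hq.zip")`, then `len(pairs)` gives the number of complete image/caption pairs.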
To measure FID (Fréchet Inception Distance) and IS (Inception Score), run the notebook 'Evaluate_FID_and_IS.ipynb'.
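For reference, FID compares the mean $\mu$ and covariance $\Sigma$ of Inception-v3 features of real (r) and generated (g) images, $\mathrm{FID} = \lVert\mu_r - \mu_g\rVert^2 + \mathrm{Tr}\big(\Sigma_r + \Sigma_g - 2(\Sigma_r \Sigma_g)^{1/2}\big)$, where lower is better. IS uses the Inception-v3 class probabilities $p(y \mid x)$ of generated images, $\mathrm{IS} = \exp\big(\mathbb{E}_x\, \mathrm{KL}(p(y \mid x) \,\Vert\, p(y))\big)$, where higher is better. The standard-library sketch below only illustrates both formulas on toy numbers and is not the notebook's implementation: `inception_score` and `frechet_distance_diag` are hypothetical helpers, and the FID function assumes diagonal covariances to avoid a matrix square root.

```python
import math
from typing import Sequence


def inception_score(probs: Sequence[Sequence[float]]) -> float:
    """IS = exp(mean over x of KL(p(y|x) || p(y))), where p(y) is the
    average of the per-image class distributions p(y|x)."""
    n = len(probs)
    num_classes = len(probs[0])
    # Marginal class distribution p(y) over all generated images.
    marginal = [sum(p[k] for p in probs) / n for k in range(num_classes)]
    # Mean KL divergence; terms with p(y|x) = 0 contribute nothing.
    mean_kl = sum(
        sum(pk * math.log(pk / mk) for pk, mk in zip(p, marginal) if pk > 0)
        for p in probs
    ) / n
    return math.exp(mean_kl)


def frechet_distance_diag(mu1, var1, mu2, var2) -> float:
    """Fréchet distance between two Gaussians with DIAGONAL covariances.

    Simplification: real FID uses full covariance matrices, where the
    trace term needs the matrix square root of Sigma1 @ Sigma2.
    """
    mean_term = sum((a - b) ** 2 for a, b in zip(mu1, mu2))
    cov_term = sum(v1 + v2 - 2.0 * math.sqrt(v1 * v2) for v1, v2 in zip(var1, var2))
    return mean_term + cov_term


# Confident, diverse predictions give a high IS; identical predictions give IS = 1.
print(inception_score([[1.0, 0.0], [0.0, 1.0]]))  # approximately 2.0
print(inception_score([[0.5, 0.5], [0.5, 0.5]]))  # 1.0
print(frechet_distance_diag([0.0, 0.0], [1.0, 1.0], [1.0, 0.0], [4.0, 1.0]))  # 2.0
```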
This repository builds on LAFITE, StackGAN++, and AttnGAN.