Source code of my semester project for the course Redes Neuronales Artificiales (Artificial Neural Networks).
As a self-chosen objective, I decided to explore generative models such as DCGAN and Wasserstein GAN (WGAN) to generate photo-realistic dog images, using the modelling techniques proposed by [Radford2015] and [Arjovsky2017].
Using the Wikipedia page on dog breeds, it was possible to create a comprehensive list of dog breeds. From there, the workflow displayed in Figure 1 was set up to create different model iterations. For detailed information on the model architectures, check the project report. The pipeline executes the following steps:
- Use the purpose-built ImageDatasetCompiler to download a hundred images per keyword (here, per dog breed) from Google Image Search.
- Crop, resize and filter the results to create an image dataset.
- Employ GPU-accelerated Jupyter Notebooks to train and evaluate chosen model variants.
- Sample images from the trained models.
Figure 1: Data acquisition and modelling pipeline.
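
The crop and filter step of the pipeline could look roughly like the sketch below. It is only an illustration, not the code used in this project: the function names and the 64-pixel minimum side are assumptions. The returned box uses the (left, upper, right, lower) convention that Pillow's `Image.crop` accepts.

```python
def center_crop_box(width, height):
    """Return (left, upper, right, lower) of the largest centered square."""
    side = min(width, height)
    left = (width - side) // 2
    top = (height - side) // 2
    return (left, top, left + side, top + side)


def keep_image(width, height, min_side=64):
    """Hypothetical size filter: reject images whose shorter side is below
    min_side, since they would have to be upscaled to the training size."""
    return min(width, height) >= min_side


if __name__ == "__main__":
    # A 640x480 landscape image is cropped to its central 480x480 square.
    print(center_crop_box(640, 480))
    # A 50x200 thumbnail is rejected by the size filter.
    print(keep_image(50, 200))
```

Cropping to a square before resizing avoids distorting the aspect ratio of the dogs, which matters when the network expects fixed-size square inputs.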
Figure 2 shows samples from the obtained training dataset. The dataset was filtered using explicit heuristics to omit cartoon-like images as well as shots taken in a studio environment (white backdrop).
Figure 2: Samples from the acquired dataset.
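
One possible heuristic for spotting white-backdrop studio shots is to check how much of the image border is nearly white. The sketch below is an assumption about how such a filter might work, not the heuristic actually used here; the function names and both thresholds (`white_level`, `max_white_fraction`) are made up for illustration. Images are represented as row-major nested lists of RGB tuples to keep the example dependency-free.

```python
def border_pixels(image):
    """Yield the pixels on the outer one-pixel frame of a row-major image."""
    height = len(image)
    width = len(image[0])
    for y in range(height):
        for x in range(width):
            if y in (0, height - 1) or x in (0, width - 1):
                yield image[y][x]


def looks_like_studio_shot(image, white_level=240, max_white_fraction=0.8):
    """Flag an image when most of its border pixels are close to white.

    Both thresholds are illustrative assumptions and would need tuning
    on real data.
    """
    pixels = list(border_pixels(image))
    white = sum(1 for r, g, b in pixels if min(r, g, b) >= white_level)
    return white / len(pixels) > max_white_fraction


if __name__ == "__main__":
    white = (255, 255, 255)
    brown = (120, 80, 40)
    # A brown "dog" on a white backdrop: the border is entirely white.
    studio = [[white] * 4, [white, brown, brown, white],
              [white, brown, brown, white], [white] * 4]
    print(looks_like_studio_shot(studio))
```

Looking only at the border keeps the check cheap and avoids rejecting photos of white dogs in natural surroundings.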
The results are discussed in detail in my report, which is available in both English and Spanish. Figure 3 shows samples from the DCGAN model after training. Further training resulted in mode collapse of the model.
Figure 3: Samples from the obtained DCGAN model.
Training the DCGAN and WGAN models did not yield a generative model that can fully reproduce photo-realistic dog images, but the models are able to infer certain traits of the targeted species. Characteristics like silhouettes, parts of faces, legs and different types of fur are learned and can be generated. A promising next step to further improve model performance could be the use of spectral normalization of the weights (see [Miyato2018]) to increase stability over longer periods of training, or the improved WGAN architecture described in [Gulrajani2017], which avoids clipping the critic's weights. Since training was in general very computationally expensive, even using the free GPU of Google Colab, investigating these models was a rather slow process, limiting my ability to fully search the hyperparameter and model space.
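
For reference, the difference between weight clipping and the improved WGAN can be written down as follows. This is a sketch in the notation of the cited papers rather than something taken from the project report: $D$ is the critic, $G$ the generator, $P_r$ and $P_g$ the real and generated distributions, $c$ the clipping threshold and $\lambda$ the penalty coefficient.

```latex
% Original WGAN [Arjovsky2017]: the Lipschitz constraint is approximated
% by clipping every critic weight to the interval [-c, c].
\min_G \max_{w \in [-c, c]^d} \;
  \mathbb{E}_{x \sim P_r}\left[ D_w(x) \right]
  - \mathbb{E}_{z \sim p(z)}\left[ D_w(G(z)) \right]

% Improved WGAN [Gulrajani2017]: clipping is replaced by a gradient penalty
% evaluated on random interpolates between real and generated samples.
L_D = \mathbb{E}_{\tilde{x} \sim P_g}\left[ D(\tilde{x}) \right]
    - \mathbb{E}_{x \sim P_r}\left[ D(x) \right]
    + \lambda \, \mathbb{E}_{\hat{x} \sim P_{\hat{x}}}
      \left[ \left( \lVert \nabla_{\hat{x}} D(\hat{x}) \rVert_2 - 1 \right)^2 \right],
\qquad \hat{x} = \epsilon x + (1 - \epsilon)\, \tilde{x},
\quad \epsilon \sim U[0, 1]
```

The penalty term pushes the critic's gradient norm toward 1 on the interpolated points, so the Lipschitz constraint is encouraged softly instead of being forced through hard weight limits.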