The general idea behind the VAE architecture is to build a model, termed a debiasing variational autoencoder (DB-VAE), to remove unknown biases present within the training data. We'll train our DB-VAE model on the facial detection task, run the debiasing operation during training, evaluate it on the PPB dataset, and compare its accuracy to that of our original, biased CNN model.
Recall that we want to apply our DB-VAE to a supervised classification problem: the facial detection task. Importantly, note how the encoder portion of the DB-VAE architecture also outputs a single supervised variable $z_o$, corresponding to the class prediction (face or not face). Usually, VAEs are not trained to output any supervised variables (such as a class prediction)! This is another key distinction between the DB-VAE and a traditional VAE.
Keep in mind that, even though we are training the model on a binary classification problem, we only want to learn the latent representation of faces, as that is what we are ultimately debiasing against. We'll need to ensure that, for faces, our DB-VAE model both learns a representation of the unsupervised latent variables, captured by the distribution $q_\phi(z|x)$, and outputs a supervised class prediction $z_o$, but that, for negative examples, it only outputs a class prediction $z_o$.
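One common way to realize this dual output is to have the encoder emit a single flat vector per image and slice it into the class logit $z_o$ and the parameters of $q_\phi(z|x)$. The sketch below illustrates that slicing in NumPy; the function name, `latent_dim` value, and output layout are illustrative assumptions, not the lab's actual code.

```python
import numpy as np

latent_dim = 6  # hypothetical latent dimensionality

def split_encoder_output(encoder_output):
    """Split a raw encoder output of shape (batch, 2*latent_dim + 1) into
    the supervised class logit z_o and the parameters (mu, logsigma)
    of the latent distribution q(z|x)."""
    z_o = encoder_output[:, 0]                    # supervised class prediction (logit)
    mu = encoder_output[:, 1:latent_dim + 1]      # means of q(z|x)
    logsigma = encoder_output[:, latent_dim + 1:] # log-variances of q(z|x)
    return z_o, mu, logsigma

# Example: a random "encoder output" for a batch of 4 images.
batch_output = np.random.randn(4, 2 * latent_dim + 1)
z_o, mu, logsigma = split_encoder_output(batch_output)
```

With this layout, the classifier head and the VAE head share the entire encoder, which is what lets the latent distribution inform the debiasing of the classification task.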
For face images, our loss function will have two components:
1. VAE loss: consists of the latent loss and the reconstruction loss.
2. Classification loss: standard cross-entropy loss for a binary classification problem.
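A minimal NumPy sketch of how these two components might combine is shown below. Every example contributes the classification loss, while a mask restricts the VAE loss to face examples only, matching the requirement above. All function names, the L1 reconstruction term, and the `kl_weight` value are illustrative assumptions rather than the lab's actual implementation.

```python
import numpy as np

def sigmoid_cross_entropy(labels, logits):
    """Numerically stable binary cross-entropy computed from logits."""
    return np.maximum(logits, 0) - logits * labels + np.log1p(np.exp(-np.abs(logits)))

def vae_loss(x, x_recon, mu, logsigma, kl_weight=0.0005):
    """Per-example VAE loss: KL divergence of q(z|x) from a unit Gaussian
    (latent loss) plus an L1 reconstruction loss.
    kl_weight is an assumed hyperparameter balancing the two terms."""
    latent_loss = 0.5 * np.sum(np.exp(logsigma) + mu**2 - 1.0 - logsigma, axis=1)
    recon_loss = np.mean(np.abs(x - x_recon), axis=(1, 2, 3))
    return kl_weight * latent_loss + recon_loss

def db_vae_loss(x, x_recon, y, y_logit, mu, logsigma):
    """Total DB-VAE loss: all examples incur classification loss, but only
    face examples (y == 1) additionally incur the VAE loss."""
    cls_loss = sigmoid_cross_entropy(y, y_logit)
    face_mask = (y == 1).astype(np.float64)  # 1 for faces, 0 for non-faces
    return np.mean(cls_loss + face_mask * vae_loss(x, x_recon, mu, logsigma))

# Tiny synthetic batch: four 8x8 RGB images, two faces and two non-faces.
rng = np.random.default_rng(0)
x = rng.random((4, 8, 8, 3))
x_recon = rng.random((4, 8, 8, 3))
y = np.array([1.0, 0.0, 1.0, 0.0])
y_logit = rng.standard_normal(4)
mu = rng.standard_normal((4, 6))
logsigma = rng.standard_normal((4, 6))
total = db_vae_loss(x, x_recon, y, y_logit, mu, logsigma)
```

The face mask is the key design choice: it keeps the latent space from being shaped by non-face images, so the learned $q_\phi(z|x)$ describes only the face distribution being debiased.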