Gregory Way and Casey Greene 2017
We performed a parameter sweep for three distinct architectures (a minimal sketch of the first architecture is given after the list):

- Compression with one hidden layer into 100 features
  - 5000 -> 100 -> 5000
- Compression with two hidden layers into 100 hidden units and 100 features
  - 5000 -> 100 -> 100 -> 100 -> 5000
- Compression with two hidden layers into 300 hidden units and 100 features
  - 5000 -> 300 -> 100 -> 300 -> 5000
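
The sketch below instantiates the one-hidden-layer architecture using the standard Keras VAE pattern (reparameterization trick, reconstruction plus KL loss). It is a minimal illustration rather than the repository's exact implementation; the layer names, sigmoid output activation, and 'adam' optimizer are assumptions.

```python
# Minimal sketch of the 5000 -> 100 -> 5000 VAE, assuming the standard Keras VAE pattern.
# Layer names and activations are illustrative, not the repository's exact code.
import keras
from keras import backend as K
from keras.layers import Input, Dense, Lambda
from keras.models import Model

original_dim = 5000  # number of input genes
latent_dim = 100     # number of compressed features

# Encoder: map the input to the parameters of the latent distribution
rnaseq_input = Input(shape=(original_dim,))
z_mean = Dense(latent_dim)(rnaseq_input)
z_log_var = Dense(latent_dim)(rnaseq_input)

# Reparameterization trick: sample z ~ N(z_mean, exp(z_log_var))
def sample_z(args):
    mean, log_var = args
    epsilon = K.random_normal(shape=(K.shape(mean)[0], latent_dim))
    return mean + K.exp(0.5 * log_var) * epsilon

z = Lambda(sample_z)([z_mean, z_log_var])

# Decoder: reconstruct the input from the 100 latent features
reconstruction = Dense(original_dim, activation='sigmoid')(z)

vae = Model(rnaseq_input, reconstruction)

# Loss: reconstruction cross-entropy plus KL divergence to the standard normal prior
recon_loss = original_dim * keras.losses.binary_crossentropy(rnaseq_input, reconstruction)
kl_loss = -0.5 * K.sum(1 + z_log_var - K.square(z_mean) - K.exp(z_log_var), axis=-1)
vae.add_loss(K.mean(recon_loss + kl_loss))
vae.compile(optimizer='adam')
```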
## One Hidden Layer
Based on optimal validation loss, we selected learning_rate = 0.0005, batch_size = 50, and epochs = 100.

We performed the parameter sweep over a small grid of possible values:
| parameter | sweep |
| --- | --- |
| learning_rate | 0.0005, 0.001, 0.0015, 0.002, 0.0025 |
| batch_size | 50, 100, 128, 200 |
| epochs | 10, 25, 50, 100 |
| kappa | 0.01, 0.05, 0.1, 1 |
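
As a rough illustration, the grid above can be enumerated with itertools.product; the train_vae call in the comment is a hypothetical training entry point, not a function defined in this repository.

```python
# Illustrative enumeration of the hyperparameter grid above.
# The commented-out train_vae() call is a hypothetical training entry point.
import itertools

param_grid = {
    'learning_rate': [0.0005, 0.001, 0.0015, 0.002, 0.0025],
    'batch_size': [50, 100, 128, 200],
    'epochs': [10, 25, 50, 100],
    'kappa': [0.01, 0.05, 0.1, 1],
}

for values in itertools.product(*param_grid.values()):
    params = dict(zip(param_grid.keys(), values))
    # validation_loss = train_vae(**params)  # hypothetical
    print(params)
```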
In general, we observed little to no difference across many parameter settings, indicating that training the VAE on these data with this architecture is relatively robust to hyperparameter choice. This is particularly true for different settings of kappa.

kappa controls the warmup period that transitions the model from a deterministic autoencoder to a variational model: it linearly increases the KL divergence loss penalty until it is weighted evenly with the reconstruction cost. We did not observe this parameter influencing training time or optimal loss.
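
One way to implement such a warmup is a Keras callback that increments a KL weight until it reaches 1. The sketch below assumes a backend variable beta that multiplies the KL term in the loss and a per-epoch update schedule; the names KLWarmUp and beta are illustrative, not necessarily those used in the code.

```python
# Sketch of a linear KL warmup driven by kappa, assuming a Keras callback and a
# backend variable `beta` that weights the KL term in the VAE loss (illustrative names).
from keras import backend as K
from keras.callbacks import Callback

class KLWarmUp(Callback):
    """Increase the KL weight `beta` by `kappa` at the end of each epoch, capped at 1."""

    def __init__(self, beta, kappa):
        super().__init__()
        self.beta = beta    # e.g. beta = K.variable(0.0), used as beta * kl_loss in the loss
        self.kappa = kappa  # per-epoch increment, e.g. 0.01

    def on_epoch_end(self, epoch, logs=None):
        new_beta = min(K.get_value(self.beta) + self.kappa, 1.0)
        K.set_value(self.beta, new_beta)

# Hypothetical usage with the earlier sketch:
#   beta = K.variable(0.0)
#   vae.add_loss(K.mean(recon_loss + beta * kl_loss))
#   vae.fit(x_train, epochs=100, batch_size=50, callbacks=[KLWarmUp(beta, kappa=0.01)])
```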
## Two Hidden Layers
Based on optimal validation loss, the two hidden layer model has optimal hyperparameters at learning_rate = 0.001, batch_size = 100, and epochs = 100.
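
For reference, the two-hidden-layer variant adds a 100-unit layer on each side of the latent code. The sketch below extends the one-layer sketch above; the relu activations on the intermediate layers are an assumption.

```python
# Sketch of the 5000 -> 100 -> 100 -> 100 -> 5000 variant, extending the one-layer sketch.
# The relu activations on the intermediate layers are an assumption.
from keras import backend as K
from keras.layers import Input, Dense, Lambda
from keras.models import Model

original_dim, hidden_dim, latent_dim = 5000, 100, 100

rnaseq_input = Input(shape=(original_dim,))
hidden = Dense(hidden_dim, activation='relu')(rnaseq_input)  # first compression layer

z_mean = Dense(latent_dim)(hidden)
z_log_var = Dense(latent_dim)(hidden)

def sample_z(args):
    mean, log_var = args
    epsilon = K.random_normal(shape=(K.shape(mean)[0], latent_dim))
    return mean + K.exp(0.5 * log_var) * epsilon

z = Lambda(sample_z)([z_mean, z_log_var])

hidden_decoded = Dense(hidden_dim, activation='relu')(z)     # mirrored decoder layer
reconstruction = Dense(original_dim, activation='sigmoid')(hidden_decoded)

vae_two_layer = Model(rnaseq_input, reconstruction)
# Reconstruction and KL losses would be added exactly as in the one-layer sketch.
```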
Again, training was relatively stable, with comparable performance over a large grid. With two layers, kappa made a larger difference: the kappa burn-in period actually penalized model performance, with kappa < 1 performing consistently worse.
We also trained a model with an alternative two layer architecture with 300 hidden units.

Two hidden layers do not improve performance as much as initially expected, and there is little additional benefit from two compression layers. Shown below are the three optimal models described above.

We have yet to compare the biology learned by each model.