Gregory Way 2018
Compression algorithms reduce the dimensionality of input data by forcing it through a bottleneck with a fixed number of dimensions. A common problem is deciding how many "useful" latent space features are present in the data. The solution is optimized differently for different problems or goals. For example, when visualizing large differences between groups of samples, a highly restrictive bottleneck of usually 2 or 3 features is required. However, when the goal is to extract meaningful patterns in the data that may have more subtle relationships across samples, the recommendations are less clear. As the bottleneck relaxes, the patterns become harder to interpret and the possibility of false positives increases.
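For intuition, the sketch below (illustrative only; the toy matrix and the use of PCA are placeholders, not this repository's code) compresses an expression-like matrix to k latent features and shows that the bottleneck is simply the number of features retained:

```python
import numpy as np
from sklearn.decomposition import PCA

# Toy stand-in for an expression matrix: 500 samples x 2,000 genes
expression = np.random.rand(500, 2000)

for k in (5, 25, 50, 75, 100, 125):
    # The bottleneck width k is the number of latent features kept
    latent = PCA(n_components=k).fit_transform(expression)
    print(k, latent.shape)  # -> (500, k)
```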
To determine an optimal range of compression dimensions, we propose the following approach. We first sweep over several different dimensions (results provided below) and then perform a series of evaluations (described later).
Before sweeping over a large number of different dimensions, we perform a hyperparameter sweep at a select set of dimensions. The aim is to minimize the extent to which poor hyperparameter combinations, rather than dimensionality itself, drive performance differences across dimensions. In other words, we want to isolate the effect of changing dimensionality on the observed patterns and solutions. Therefore, we perform a parameter sweep over several hyperparameters for the two unsupervised neural network models: a variational autoencoder (VAE; Tybalt) and a denoising autoencoder (DAE; ADAGE).
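As a rough orientation (this is not the repository's training code; the layer sizes, activations, optimizer, and data below are placeholders), the Keras-style sketch shows where the swept DAE hyperparameters enter an ADAGE-like model: the noise level corrupts the input, sparsity is an L1 penalty on the latent code, and k sets the bottleneck width. For the VAE (Tybalt), kappa instead scales the warm-up of the KL-divergence term in the loss.

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers, regularizers

input_dim, k = 2000, 50            # number of genes, bottleneck dimensionality
noise, sparsity = 0.1, 1e-6        # swept DAE hyperparameters
learning_rate, batch_size, epochs = 0.0005, 50, 100

# Denoising autoencoder sketch (weights left untied here for brevity;
# the swept ADAGE models use tied weights, discussed below)
model = keras.Sequential([
    keras.Input(shape=(input_dim,)),
    layers.GaussianNoise(noise),                                   # "Noise": corrupt the input
    layers.Dense(k, activation="relu",
                 activity_regularizer=regularizers.L1(sparsity)),  # "Sparsity": L1 on the latent code
    layers.Dense(input_dim, activation="sigmoid"),                 # reconstruct the input
])
model.compile(optimizer=keras.optimizers.Adam(learning_rate), loss="mse")

# Placeholder data; the real sweep trains on TCGA, GTEx, and TARGET expression matrices
X = np.random.rand(500, input_dim)
model.fit(X, X, batch_size=batch_size, epochs=epochs, verbose=0)
```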
The full analysis is provided, with results visualized, in visualize-parameter-sweep.ipynb.
We perform a hyperparameter grid search across three different training datasets: TCGA, GTEx, and TARGET. For more details about these datasets, refer to 0.expression-download/README.md.
Previously, we used a latent space dimensionality of 100 (Way and Greene 2018). Here, we sweep over dimensions 5, 25, 50, 75, 100, and 125.
To reproduce the data for this analysis, run the following commands:
```bash
# From the top directory
conda activate biobombe

# Navigate into the z-sweep directory
cd 1.initial-z-sweep
bash analysis.sh
```
We sweep over the following parameter combinations for Tybalt and ADAGE models in the TCGA dataset:
Variable | Tybalt Values | ADAGE Values |
---|---|---|
Dimensions (k) | 5, 25, 50, 75, 100, 125 | 5, 25, 50, 75, 100, 125 |
Learning Rate | 0.0005, 0.001, 0.0015, 0.002, 0.0025 | 0.00005, 0.00001, 0.0005, 0.001, 0.0015, 0.002 |
Batch Size | 50, 100, 150 | 50, 100 |
Epochs | 50, 100 | 100 |
Kappa | 0, 0.5, 1 | |
Sparsity | | 0, 0.000001, 0.001 |
Noise | | 0, 0.1, 0.5 |
Weights | | tied |
This resulted in the training of 540 Tybalt models and 648 ADAGE models.
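The counts follow directly from the grid sizes above; a quick sketch (not part of the repository) confirms the arithmetic:

```python
from itertools import product

tybalt_grid = {
    "dimensions": [5, 25, 50, 75, 100, 125],
    "learning_rate": [0.0005, 0.001, 0.0015, 0.002, 0.0025],
    "batch_size": [50, 100, 150],
    "epochs": [50, 100],
    "kappa": [0, 0.5, 1],
}
adage_grid = {
    "dimensions": [5, 25, 50, 75, 100, 125],
    "learning_rate": [0.00005, 0.00001, 0.0005, 0.001, 0.0015, 0.002],
    "batch_size": [50, 100],
    "epochs": [100],
    "sparsity": [0, 0.000001, 0.001],
    "noise": [0, 0.1, 0.5],
}

print(len(list(product(*tybalt_grid.values()))))  # 540
print(len(list(product(*adage_grid.values()))))   # 648
```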
Note that we also tested ADAGE models with untied weights in the TCGA dataset (data not shown). In that setting, performance was worse than with tied-weight models. For all downstream applications, we use ADAGE models with tied weights.
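"Tied weights" here means the decoder reuses the transpose of the encoder's weight matrix rather than learning a separate one, roughly halving the number of parameters. The NumPy sketch below is purely illustrative of the idea (the array names are not from this repository):

```python
import numpy as np

rng = np.random.default_rng(0)
n_genes, k = 2000, 50

W = rng.normal(size=(n_genes, k))   # encoder weight matrix (shared with the decoder)
b_h = np.zeros(k)                   # hidden-layer bias
b_o = np.zeros(n_genes)             # output-layer bias

def reconstruct_tied(x):
    # Encoder: project the sample into the k-dimensional bottleneck
    h = np.maximum(x @ W + b_h, 0)
    # Decoder: reuse W.T instead of learning a second (n_genes x k) matrix
    return h @ W.T + b_o

x = rng.random(n_genes)
print(reconstruct_tied(x).shape)  # (2000,)
```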
Our goal was to determine optimal hyperparameter combinations for both models across various bottleneck dimensionalities in each of the three datasets.
All other hyperparameter combinations used across models and datasets are located in the config/ folder.
All results can be viewed in 1.initial-z-sweep/visualize-parameter-sweep.ipynb.
In that notebook, we report the results in a series of visualizations and tables for Tybalt and ADAGE across TCGA, GTEx, and TARGET.
Note that the notebook was converted to an R script with:

```bash
# Convert the R notebook to an R script for execution
jupyter nbconvert --to=script --FilesWriter.build_directory=scripts/nbconverted visualize-parameter-sweep.ipynb
```
Hyperparameter selection across the different latent space dimensionalities behaved as expected: loss was higher at lower dimensions, and lower dimensions benefited most from increased regularization and higher learning rates. Nevertheless, we obtained a broad set of optimal hyperparameters to use in a larger and more targeted sweep of dimensionality for each of the three analyzed datasets.
The analysis allowed us to select optimal hyperparameters for each dataset and algorithm combination. We report the results below, listing the optimal Tybalt and ADAGE settings for TCGA, GTEx, and TARGET in turn.

TCGA - Tybalt:

Dimensions | Kappa | Epochs | Batch Size | Learning Rate |
---|---|---|---|---|
5 | 0 | 100 | 50 | 0.002 |
25 | 0 | 100 | 50 | 0.0015 |
50 | 0 | 100 | 100 | 0.0015 |
75 | 0 | 100 | 150 | 0.0015 |
100 | 0 | 100 | 150 | 0.001 |
125 | 0 | 100 | 150 | 0.0005 |

TCGA - ADAGE:

Dimensions | Sparsity | Noise | Epochs | Batch Size | Learning Rate |
---|---|---|---|---|---|
5 | 0 | 0.0 | 100 | 50 | 0.0015 |
25 | 0 | 0.0 | 100 | 50 | 0.0015 |
50 | 0 | 0.0 | 100 | 50 | 0.0005 |
75 | 0 | 0.0 | 100 | 50 | 0.0005 |
100 | 0 | 0.0 | 100 | 50 | 0.0005 |
125 | 0 | 0.0 | 100 | 50 | 0.0005 |

GTEx - Tybalt:

Dimensions | Kappa | Epochs | Batch Size | Learning Rate |
---|---|---|---|---|
5 | 0.5 | 100 | 100 | 0.0025 |
25 | 0.5 | 100 | 100 | 0.0025 |
50 | 0.5 | 100 | 100 | 0.002 |
75 | 0.5 | 100 | 50 | 0.002 |
100 | 0.5 | 100 | 50 | 0.0015 |
125 | 0.5 | 100 | 50 | 0.0015 |

GTEx - ADAGE:

Dimensions | Sparsity | Noise | Epochs | Batch Size | Learning Rate |
---|---|---|---|---|---|
5 | 0 | 0.1 | 100 | 50 | 0.001 |
25 | 0 | 0.0 | 100 | 50 | 0.001 |
50 | 0 | 0.0 | 100 | 50 | 0.0005 |
75 | 0 | 0.0 | 100 | 50 | 0.0005 |
100 | 0 | 0.0 | 100 | 50 | 0.0005 |
125 | 0 | 0.0 | 100 | 50 | 0.0005 |

TARGET - Tybalt:

Dimensions | Kappa | Epochs | Batch Size | Learning Rate |
---|---|---|---|---|
5 | 0.5 | 100 | 25 | 0.0015 |
25 | 0.5 | 100 | 25 | 0.0015 |
50 | 0.5 | 100 | 25 | 0.0015 |
75 | 0.5 | 100 | 25 | 0.0015 |
100 | 0.5 | 100 | 25 | 0.0015 |
125 | 0.5 | 100 | 25 | 0.0005 |

TARGET - ADAGE:

Dimensions | Sparsity | Noise | Epochs | Batch Size | Learning Rate |
---|---|---|---|---|---|
5 | 0 | 0.1 | 100 | 50 | 0.0005 |
25 | 0 | 0.1 | 100 | 50 | 0.0005 |
50 | 0 | 0.1 | 100 | 50 | 0.0005 |
75 | 0 | 0.1 | 100 | 50 | 0.0005 |
100 | 0 | 0.1 | 100 | 50 | 0.0005 |
125 | 0 | 0.1 | 100 | 50 | 0.0005 |
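
For downstream use, the selected settings could be encoded as a simple lookup; the dictionary below is a hand-transcribed convenience keyed on the first (TCGA Tybalt) table above, not how the repository itself stores them (the actual settings live in the config/ folder):

```python
# Hypothetical lookup transcribed from the TCGA Tybalt table above;
# the repository stores these settings in the config/ folder instead.
TCGA_TYBALT_OPTIMA = {
    5:   {"kappa": 0, "epochs": 100, "batch_size": 50,  "learning_rate": 0.002},
    25:  {"kappa": 0, "epochs": 100, "batch_size": 50,  "learning_rate": 0.0015},
    50:  {"kappa": 0, "epochs": 100, "batch_size": 100, "learning_rate": 0.0015},
    75:  {"kappa": 0, "epochs": 100, "batch_size": 150, "learning_rate": 0.0015},
    100: {"kappa": 0, "epochs": 100, "batch_size": 150, "learning_rate": 0.001},
    125: {"kappa": 0, "epochs": 100, "batch_size": 150, "learning_rate": 0.0005},
}

print(TCGA_TYBALT_OPTIMA[100])  # {'kappa': 0, 'epochs': 100, 'batch_size': 150, 'learning_rate': 0.001}
```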