This README describes how to run a TensorFlow autoencoder model on an IPU for both training and inference, and how to interpret the results. At least 10 GB of free disk space is required for caching when running this example.
Autoencoder models can be used to perform collaborative filtering in recommender systems in order to provide useful predictions, for example recommending films for Netflix users based on their (and other users') previous experiences. This autoencoder model shows a significant improvement in results compared to previous models on a Netflix dataset. The model architecture is a deep autoencoder with 6 fully connected layers and a constrained decoder. Dense re-feeding is used during training to overcome the sparseness of data in collaborative filtering. It is implemented in TensorFlow with a model size of about 10 million parameters. The model is taken from the paper "Training Deep AutoEncoders for Collaborative Filtering" (https://arxiv.org/pdf/1708.01715.pdf).
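The sketch below is a minimal illustration of the two ideas just mentioned: a constrained decoder (the decoder weights are the transposes of the encoder weights, so only the encoder matrices are trained) and dense re-feeding (the dense output of the first pass is fed back through the network as a new training example). It is not the code in this directory; the layer sizes, SELU activation, masked loss and SGD settings are illustrative assumptions.

```python
import tensorflow as tf

class ConstrainedAutoencoder(tf.Module):
    """Autoencoder whose decoder reuses the transposed encoder weights."""
    def __init__(self, num_items, hidden=(512, 512, 1024)):
        super().__init__()
        dims = [num_items] + list(hidden)
        self.enc_weights = [
            tf.Variable(tf.random.normal([d_in, d_out], stddev=0.02))
            for d_in, d_out in zip(dims[:-1], dims[1:])
        ]
        self.enc_biases = [tf.Variable(tf.zeros([d])) for d in dims[1:]]
        self.dec_biases = [tf.Variable(tf.zeros([d])) for d in reversed(dims[:-1])]

    def __call__(self, x):
        h = x
        for w, b in zip(self.enc_weights, self.enc_biases):          # encoder
            h = tf.nn.selu(tf.matmul(h, w) + b)
        dec_weights = list(reversed(self.enc_weights))
        for i, (w, b) in enumerate(zip(dec_weights, self.dec_biases)):
            h = tf.matmul(h, w, transpose_b=True) + b                # tied decoder
            if i < len(dec_weights) - 1:
                h = tf.nn.selu(h)
        return h

def masked_mse(targets, outputs):
    """Compute the loss only over items the user actually rated (non-zero)."""
    mask = tf.cast(tf.not_equal(targets, 0.0), outputs.dtype)
    return tf.reduce_sum(mask * tf.square(targets - outputs)) / tf.reduce_sum(mask)

model = ConstrainedAutoencoder(num_items=1000)
optimizer = tf.keras.optimizers.SGD(learning_rate=1e-3, momentum=0.9)

# Toy sparse rating vectors: most entries are zero (unrated).
x = tf.random.uniform([8, 1000]) * tf.cast(tf.random.uniform([8, 1000]) < 0.05, tf.float32)

with tf.GradientTape() as tape:
    dense = model(x)                        # first pass on the sparse input
    loss = masked_mse(x, dense)
    new_example = tf.stop_gradient(dense)   # treat the dense output as a new data point
    refeed = model(new_example)             # dense re-feeding: second pass
    loss += masked_mse(new_example, refeed)
grads = tape.gradient(loss, model.trainable_variables)
optimizer.apply_gradients(zip(grads, model.trainable_variables))
```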
Regularisation techniques such as dropout are used to prevent over-fitting. The model implementation also uses loss scaling and mixed precision. Stochastic rounding techniques on the IPU enable the use of low precision fp16.16 for optimisation.
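For readers unfamiliar with loss scaling, the following generic sketch (again, not the IPU-specific implementation used here) shows the basic idea: the loss is multiplied by a constant before the backward pass so that small fp16 gradients do not underflow, and the gradients are divided by the same constant before they are applied.

```python
import tensorflow as tf

LOSS_SCALE = 1024.0  # illustrative constant; real values are often tuned or dynamic

def scaled_train_step(model, optimizer, loss_fn, x):
    """One training step with static loss scaling (generic sketch only)."""
    with tf.GradientTape() as tape:
        scaled_loss = loss_fn(x, model(x)) * LOSS_SCALE
    scaled_grads = tape.gradient(scaled_loss, model.trainable_variables)
    grads = [g / LOSS_SCALE for g in scaled_grads]   # undo the scaling
    optimizer.apply_gradients(zip(grads, model.trainable_variables))
```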
The autoencoder model uses the Netflix Prize training dataset, as shown in the table below. Please note that the dataset can be downloaded separately from this web page: Netflix Prize Data Set. If the path to the training data file is not specified, the scripts will use random data generated on the host.
| | Full | 3 months | 6 months | 1 year |
|---|---|---|---|---|
| Training | 12/99 - 11/05 | 09/05 - 11/05 | 06/05 - 11/05 | 06/04 - 05/05 |
| Users | 477,412 | 311,315 | 390,795 | 345,855 |
| Ratings | 98,074,901 | 13,675,402 | 29,179,009 | 41,451,832 |
| Testing | 12/05 | 12/05 | 12/05 | 06/05 |
| Users | 173,482 | 160,906 | 169,541 | 197,951 |
| Ratings | 2,250,481 | 2,082,559 | 2,175,535 | 3,888,684 |
| Autoencoder files | Description |
|---|---|
| `README.md` | This file, describing how to run the model |
| `autoencoder_benchmark.py` | Script to perform a hyperparameter scan over learning rate values and benchmark training throughput across numerous IPU devices |
| `autoencoder_data.py` | Autoencoder end-to-end data pipeline |
| `autoencoder_main.py` | Main training and validation loop |
| `autoencoder_model.py` | Autoencoder model definition |
| `util.py` | Helper functions for running on the IPU |
- Prepare the TensorFlow environment. Install the Poplar SDK following the instructions in the Getting Started guide for your IPU system. Make sure to source the `enable.sh` scripts for Poplar.
- Run the training program on random data:

  `python3 autoencoder_main.py`
NOTE: If you have downloaded the dataset, you can use the `--training-data-file` and `--validation-data-file` options to specify a path to the data:

`python3 autoencoder_main.py --training-data-file netflix_data/3m_train.txt --validation-data-file netflix_data/3m_valid.txt --size 128`
The autoencoder can be run with standard TensorFlow and Poplar on 1xIPU or 2xIPU. Different batch sizes and either mixed precision (fp16.32) or half precision (fp16.16) can be used to test the IPU. The original paper suggests using different network layer sizes for different datasets, as described below.
The following options are available:
- `--size` defines the neural network architecture, following the original paper. Accepted values are 128, 256, or 512. Use 128 for the 3-month dataset, 256 for the 6-month and 1-year datasets, and 512 for the full dataset. For size 256 with the 1-year dataset, specify `--batch-size 32 --validation-batch-size 32`. For size 512, also specify `--batch-size 8 --validation-batch-size 8`.
- `--batch-size` sets the batch size of the training graph.

  Note: `--batch-size` and `--size` will be set automatically to the values recommended above if not specified manually, as long as the file names of the datasets obtained using `get_data.sh` are unchanged. Manually specifying these values overrides the automatic selection.
- `--validation-batch-size` sets the batch size of the validation graph. This is set equal to `--batch-size` if not specified manually.
- `--precision` selects between full, mixed, or half precision. Accepted values are `32.32`, `16.32`, or `16.16`.
- `--multiprocessing` is recommended when running both training and validation. It will run training and validation concurrently in separate processes.
- `--base-learning-rate` specifies the exponent of the base learning rate. The learning rate is set by `lr = blr * batch-size`; see the short worked example after this list, and https://arxiv.org/abs/1804.07612 for more details.
- `--no-prng` disables stochastic rounding.
- `--help` shows the available options.
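A short worked example of the learning-rate rule above, with an illustrative batch size (the exponent corresponds to a base learning rate of 2^-16):

```python
# lr = blr * batch-size, with the base learning rate given as a power-of-two exponent.
batch_size = 64                 # illustrative value, not a recommended setting
exponent = -16                  # i.e. a base learning rate of 2^-16
blr = 2.0 ** exponent
lr = blr * batch_size
print(lr)                       # 0.0009765625
```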
Some examples of running the model on real data:
`python3 autoencoder_main.py --training-data-file netflix_data/3m_train.txt --validation-data-file netflix_data/3m_valid.txt --size 128`

`python3 autoencoder_main.py --training-data-file netflix_data/6m_train.txt --validation-data-file netflix_data/6m_valid.txt --size 256`

`python3 autoencoder_main.py --training-data-file netflix_data/1y_train.txt --validation-data-file netflix_data/1y_valid.txt --size 256 --batch-size 32 --validation-batch-size 32`

`python3 autoencoder_main.py --training-data-file netflix_data/full_train.txt --validation-data-file netflix_data/full_valid.txt --size 512 --batch-size 8 --validation-batch-size 8`
The script `autoencoder_benchmark.py` performs a hyperparameter search over different learning rate values across multiple IPU devices in parallel, in order to optimise accuracy and loss, and benchmarks the aggregate throughput (users/second).
Usage is similar to the main autoencoder script, with added options:
- `--num-ipus` sets the number of IPU devices (default is 16, a full chassis).
- `--base-learning-rates` is a string containing the base learning rates; there must be as many learning rates as IPUs requested. It can be specified either as comma-separated values (e.g. '16,17,18') or as a range (e.g. '16..20'). Due to parsing constraints, the values given are the moduli of the exponents and their signs are flipped, as sketched below.
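As an illustration of how such a string might be expanded into per-device exponents, here is a hedged sketch (not the repository's parsing code; the function name and the 0.5 range step are assumptions, the step matching the default sweep described below):

```python
def parse_base_learning_rates(spec, step=0.5):
    """'16,17' -> [-16.0, -17.0]; '16..17' -> [-16.0, -16.5, -17.0]."""
    if ".." in spec:
        low, high = (float(v) for v in spec.split(".."))
        count = int(round((high - low) / step)) + 1
        exponents = [low + i * step for i in range(count)]
    else:
        exponents = [float(v) for v in spec.split(",")]
    # The string holds the moduli of the exponents; flip the signs.
    return [-e for e in exponents]

print(parse_base_learning_rates("16,17,18"))  # [-16.0, -17.0, -18.0]
print(parse_base_learning_rates("12.5..14"))  # [-12.5, -13.0, -13.5, -14.0]
```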
For example, to benchmark throughput on one IPU device over the 3-month Netflix training data, with a base learning rate of 2^-16 (batch and first-layer sizes are set automatically):
`python3 autoencoder_benchmark.py --training-data-file netflix_data/3m_train.txt --num-ipus 1 --base-learning-rates '16'`
To benchmark on 2 IPU devices with base learning rates 2^{-16, -17}:
`python3 autoencoder_benchmark.py --training-data-file netflix_data/3m_train.txt --num-ipus 2 --base-learning-rates '16,17'`
To benchmark across 16 devices with default learning rates of 2^{-12.5, ..., -20} (varying in steps of 0.5):
`python3 autoencoder_benchmark.py --training-data-file netflix_data/3m_train.txt`