Code for the paper "Learning Inductive Biases with Simple Neural Networks" (Feinman & Lake, 2018).
paper: https://arxiv.org/pdf/1802.02745.pdf
poster: http://www.cns.nyu.edu/~reuben/files/Poster-CogSci18.pdf
This code repository requires Keras and TensorFlow, and Keras must be configured to use the TensorFlow backend. A full list of requirements can be found in requirements.txt. After cloning this repository, it is recommended that you add the path to the repository to your PYTHONPATH environment variable to enable imports from any folder:

export PYTHONPATH="/path/to/learning-to-learn:$PYTHONPATH"
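For example, assuming a standard pip setup, the dependencies listed in requirements.txt can typically be installed with:

pip install -r requirements.txt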
The repository contains 5 subfolders:
- a source folder containing the core reusable source code for the project;
- a scripts folder containing short Python scripts for running the experiments, including scripts for training the neural network models and evaluating their performance;
- a notebooks folder containing a collection of Jupyter notebooks for various small tasks, such as plotting results and performing parametric sensitivity tests;
- a placeholder folder for image and model data, where the Brodatz texture dataset is stored;
- a results folder, where experiment results are saved to and loaded from.
To train the MLP of Experiment 1 on all dataset sizes, i.e. all pairs of {# categories, # examples}, run the following command using mlp_loop.py from the scripts folder:

python mlp_loop.py -ep=200 -r=10 -b=32 -s=</path/to/save/folder>

where </path/to/save/folder> is a string containing the folder name you'd like to use for the results (the folder should not already exist; if it does, it will be overwritten). It defaults to ../results/mlp_results if left unspecified. Results of the 1st-order and 2nd-order generalization tests are recorded for all 10 trials of each dataset size.
The parameter -ep=200 indicates that you'd like to train for 200 epochs, -r=10 indicates that you'd like to train 10 model runs for each dataset size, and -b=32 indicates that you'd like to use a batch size of 32 (although, for a training set with N samples, the batch size will be min(N/5, 32) to ensure that at least 5 batches are used in SGD). This model is trained on the CPU, as it is too small to benefit from a GPU.
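To make the batch-size rule concrete, here is a minimal sketch of the rule described above (not the script's actual code):

```python
# Effective batch size rule: use at most the requested batch size (32),
# but never more than N/5, so SGD always sees at least 5 batches per epoch.
def effective_batch_size(n_samples, requested=32):
    return min(n_samples // 5, requested)

print(effective_batch_size(50))    # -> 10, since 50/5 = 10 < 32
print(effective_batch_size(1000))  # -> 32
```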
Once training is complete, you can plot heatmaps & contours of the results using the notebook plot_results_experiments1and2.ipynb.
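The notebook contains the full plotting code; as a rough sketch, a heatmap over the {# categories, # examples} grid might be produced like this (the grid values and accuracy array below are placeholders, not the saved results):

```python
# Rough sketch of a heatmap over the {# categories, # examples} grid.
# 'accuracy' is a placeholder array; real values come from the saved results.
import numpy as np
import matplotlib.pyplot as plt

categories = [2, 4, 8, 16, 32]   # placeholder grid values
examples = [2, 4, 8, 16, 32]
accuracy = np.random.rand(len(categories), len(examples))  # stand-in data

fig, ax = plt.subplots()
im = ax.imshow(accuracy, origin='lower', cmap='viridis', vmin=0.0, vmax=1.0)
ax.set_xticks(range(len(examples)))
ax.set_xticklabels([str(e) for e in examples])
ax.set_yticks(range(len(categories)))
ax.set_yticklabels([str(c) for c in categories])
ax.set_xlabel('# examples per category')
ax.set_ylabel('# categories')
fig.colorbar(im, ax=ax, label='generalization accuracy')
plt.show()
```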
To perform the parametric sensitivity tests with the MLP, see the notebook parametric_tests_mlp.ipynb for a walk-through.
To train the CNN of Experiment 2 on all dataset sizes, i.e. all pairs of {# categories, # examples}, run the following command using cnn_loop.py from the scripts folder:

python cnn_loop.py -ep=400 -r=10 -b=32 -s=</path/to/save/folder> -g=0

where </path/to/save/folder> is again a string containing the folder name you'd like to use for the results. It defaults to ../results/cnn_results if left unspecified. Note that we are using 400 epochs as opposed to the 200 of Experiment 1. The additional parameter -g=0 indicates which GPU you would like to use for training, as this experiment benefits significantly from GPU speedup; it defaults to the system default if left unspecified.
A bottleneck of this experiment is building the image datasets used for the generalization tests. These datasets each contain 1000 trials (1000 x 4 = 4000 images). I have parallelized the code using multiprocessing, selecting the number of processes based on the available resources, so you will see a significant speedup on a machine with a larger CPU count.
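For intuition, the parallelization pattern is roughly the following (a simplified sketch, not the repository's dataset-building code; make_trial is a hypothetical stand-in for the per-trial image generator):

```python
# Simplified sketch: generate the 1000 generalization-test trials in
# parallel, one worker call per trial.
import multiprocessing as mp

def make_trial(seed):
    # hypothetical stand-in for building one 4-image trial
    return seed

if __name__ == '__main__':
    n_workers = mp.cpu_count()          # scale with the available CPUs
    with mp.Pool(processes=n_workers) as pool:
        trials = pool.map(make_trial, range(1000))
```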
Once training is complete, you can again plot heatmaps & contours of the results using the notebook plot_results_experiments1and2.ipynb.
To perform the parametric sensitivity tests with the CNN, see the notebook parametric_tests_cnn.ipynb for a walk-through.
arXiv: There are 2 additional experiments in the arXiv version of the paper under the section "Experiment 2." First, we repeated the cnn_loop above, but this time training the CNN to label objects according to their color name. To run this experiment, follow the same steps as above but use the script cnn_loop_color.py. Second, we visualized the filters of a shape-trained and a color-trained CNN. This can be reproduced using the notebook notebooks/visualize_cnn_filters.ipynb.
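The notebook does the heavy lifting; for intuition only, a minimal sketch of visualizing the first-layer filters of a trained Keras CNN might look like the following (the model filename, layer lookup, and 3-channel RGB assumption are mine, not the repository's code):

```python
# Minimal sketch of first-layer filter visualization with Keras + matplotlib.
import numpy as np
import matplotlib.pyplot as plt
from keras.models import load_model

model = load_model('shape_trained_cnn.h5')  # hypothetical saved model file

# Kernels of the first convolutional layer: shape (h, w, in_channels, n_filters)
conv_layer = next(l for l in model.layers if 'conv' in l.name)
kernels = conv_layer.get_weights()[0]

# Normalize to [0, 1] for display and show the first few filters as RGB images
kernels = (kernels - kernels.min()) / (kernels.max() - kernels.min() + 1e-8)
n_show = min(8, kernels.shape[-1])
fig, axes = plt.subplots(1, n_show, figsize=(2 * n_show, 2))
for i, ax in enumerate(np.atleast_1d(axes)):
    ax.imshow(kernels[..., i])  # assumes 3 input channels (RGB)
    ax.axis('off')
plt.show()
```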
To train the 20 models of Experiment 3, run the following command using vocabulary_acceleration_multi.py from the scripts folder:

python vocabulary_acceleration_multi.py -ep=70 -sf=0.6 -cf=0.2 -ca=60 -ex=10 -b=10 -r=20 -t=500 -g=0 -sp=</path/to/save/folder>

where </path/to/save/folder> is again a string containing the folder name you'd like to use for the results. For each model, the cumulative vocabulary size and the 2nd-order generalization test results at each epoch are stored in a file called run%i.csv, where %i is the index of the particular model.
Once training is complete, you can analyze the results using the notebook analyze_vocabulary_acceleration.ipynb.
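If you prefer to work outside the notebook, a hedged sketch of loading the per-run CSV files with pandas might look like this (the folder path below points at the repository's included vocab_accel results; the column layout of run%i.csv is not documented here, so inspect the columns after loading):

```python
# Load all run%i.csv files from a save folder into one pandas DataFrame.
import glob
import os
import pandas as pd

save_folder = '../results/vocab_accel'  # or wherever -sp pointed
runs = []
for path in sorted(glob.glob(os.path.join(save_folder, 'run*.csv'))):
    df = pd.read_csv(path)
    df['run'] = int(os.path.basename(path)[3:-4])  # 'run7.csv' -> 7
    runs.append(df)

results = pd.concat(runs, ignore_index=True)
print(results.columns)  # check the per-epoch fields actually recorded
```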
You are welcome to re-run all experiments using the instructions above. However, for time efficiency, you can also inspect our results from these experiments, which are included in the results/ subfolder. There you will find the results of Experiment 1 (located in mlp_results/), Experiment 2 (located in cnn_results/ and, for the additional arXiv paper experiment, cnn_results_color/), and Experiment 3 (located in vocab_accel/).
Please use the following BibTeX entry when citing this paper:
@article{Feinman2018,
title={Learning inductive biases with simple neural networks},
author={Reuben Feinman and Brenden M. Lake},
journal={arXiv preprint arXiv:1802.02745},
year={2018}
}