Multinode cifar10
This is part of the Multi-node guide. It is assumed you have completed the cluster configuration and Caffe build tutorials.
This tutorial explains the basics behind distributed training using Intel® Distribution of Caffe*. The cifar10 example is a good warm-up, so before you start distributed training on bigger networks, please make sure to try it. First, complete the single-node tutorial for cifar10 at http://caffe.berkeleyvision.org/gathered/examples/cifar10.html. Multi-node distributed training is an extension of the single-node capabilities, therefore understanding the single-node case is necessary to follow what happens here. There you will learn that you need the CIFAR-10 database, which you can get by running:
data/cifar10/get_cifar10.sh
examples/cifar10/create_cifar10.sh
which will download the dataset and prepare it for training.
When you are done testing single-node training, you are ready to try multi-node training on a single machine.
The example here is prepared for an easy start, so it runs on a single machine, spawning multiple processes just as they would run on a cluster.
The basic scenario works like this:
mpirun -host localhost -n <NUMBER_OF_PROCESSES> \
/path/to/built/tools/caffe train --solver=/path/to/proto
mpirun (mpiexec) executes N (-n) processes on a comma-separated list of hosts (-host). The program it executes multiple times is given as the remaining arguments, together with its command line options (much like running a program under Valgrind or GDB). Therefore, /path/to/built/tools/caffe train --solver=/path/to/proto
will be executed <NUMBER_OF_PROCESSES>
times on the single local host.
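The same pattern extends to a cluster by listing more hosts. A minimal sketch, assuming two reachable machines named node1 and node2 (placeholder hostnames) with the same build path on both:
# node1 and node2 are placeholder hostnames; the 4 processes are spread across them
mpirun -host node1,node2 -n 4 \
/path/to/built/tools/caffe train --solver=/path/to/proto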
An example prepared in examples/cifar10/train_full_multinode.sh
looks as follows:
OMP_NUM_THREADS=1 \
mpirun -l -host 127.0.0.1 -n 4 \
./build/tools/caffe train --solver=examples/cifar10/cifar10_full_solver.prototxt
There is no need to change the solver protobuf configuration.
This script runs four processes (one thread per process) using the same database, and in each iteration each of them draws a batch of images. Each process uses the same protobuf configuration, so each retrieves the same number of images to create its batch for a given iteration. Data is accessed in parallel, and images are randomized only if shuffle: true
is specified in the data_param
section of the data layer for the TRAIN
phase (in the network prototxt referenced by the solver). The gradients from all processes are accumulated on the root node (rank 0
) and averaged. This means that gradients are calculated for a total batch size of 400
images (by default the configuration has a batch size of 100
, and four processes run in parallel). You can also shuffle the data yourself in a unique way or split the data into disjoint subsets.
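As a reference, here is a minimal sketch of what the TRAIN-phase data layer could look like with shuffling enabled. The source path, backend, and batch size follow the standard cifar10 example; the shuffle field inside data_param is assumed to be supported by this distribution:
layer {
  name: "cifar"
  type: "Data"
  top: "data"
  top: "label"
  include {
    phase: TRAIN
  }
  # transform_param (mean subtraction) omitted for brevity
  data_param {
    source: "examples/cifar10/cifar10_train_lmdb"
    batch_size: 100
    backend: LMDB
    shuffle: true   # assumed field: randomizes the images each process draws
  }
}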
Now, you can go back to the main multi-node guide and continue, or try the GoogLeNet tutorial linked in the practical part of the guide.
*Other names and brands may be claimed as the property of others.