Multinode cifar10
This is part of the Multi-node guide. It is assumed you have completed the cluster configuration and Caffe build tutorials.
This tutorial explains the basics behind distributed training using Intel® Distribution of Caffe*. The cifar10 example is a good warm-up, so before you start distributed training on bigger networks, please make sure to try it. First, complete the single-node tutorial for cifar10 at http://caffe.berkeleyvision.org/gathered/examples/cifar10.html. Multi-node distributed training is an extension of the single-node capabilities, therefore understanding the single-node case is necessary to follow what happens here. There you will learn that you need the CIFAR-10 database, which you can get by running:
data/cifar10/get_cifar10.sh
examples/cifar10/create_cifar10.sh
which will download the dataset and prepare it for training.
When you are done testing single-node training, you are ready to try multi-node training on a single machine.
The example here is prepared for an easy start, so it runs on a single machine, spawning multiple processes just as they would run on a cluster.
The basic scenario works like this:
mpirun -host localhost -n <NUMBER_OF_PROCESSES> \
/path/to/built/tools/caffe train --solver=/path/to/proto
mpirun (mpiexec) executes N (-n) processes on a comma-separated list of hosts (-host). The program it executes multiple times is given as the remaining arguments, together with its command line options (much like running a program under Valgrind or GDB). Therefore, /path/to/built/tools/caffe train --solver=/path/to/proto
will be executed <NUMBER_OF_PROCESSES>
times on the single local host.
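The same pattern extends to a cluster by listing more hosts. A minimal sketch, assuming two reachable machines named node1 and node2 (placeholder hostnames) with the same build path on both:
# node1 and node2 are placeholder hostnames; the 4 processes are spread across them
mpirun -host node1,node2 -n 4 \
/path/to/built/tools/caffe train --solver=/path/to/proto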
An example prepared in examples/cifar10/train_full_multinode.sh
looks as follows:
OMP_NUM_THREADS=1 \
mpirun -l -host 127.0.0.1 -n 4 \
./build/tools/caffe train --solver=examples/cifar10/cifar10_full_solver.prototxt
There is no need to change the solver protobuf configuration.
This script runs four processes (one thread per process) using the same database, and in each iteration each of them draws a batch of images. Each process uses the same protobuf configuration, so each retrieves the same number of images to create its batch for a given iteration. Data is accessed in parallel, and images are randomized only if shuffle: true
is specified in the data_param
section of the data layer for the TRAIN
phase (in the network prototxt referenced by the solver). The gradients from all processes are accumulated on the root node (rank 0
) and averaged. This means that gradients are calculated for a total batch size of 400
images (by default the configuration has a batch size of 100
, and four processes run in parallel). You can also shuffle the data yourself in a unique way or split the data into disjoint subsets.
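As a reference, here is a minimal sketch of what the TRAIN-phase data layer could look like with shuffling enabled. The source path, backend, and batch size follow the standard cifar10 example; the shuffle field inside data_param is assumed to be supported by this distribution:
layer {
  name: "cifar"
  type: "Data"
  top: "data"
  top: "label"
  include {
    phase: TRAIN
  }
  # transform_param (mean subtraction) omitted for brevity
  data_param {
    source: "examples/cifar10/cifar10_train_lmdb"
    batch_size: 100
    backend: LMDB
    shuffle: true   # assumed field: randomizes the images each process draws
  }
}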
Now, you can go back to the main multi-node guide and continue, or try the GoogLeNet tutorial linked in the practical part of the guide.
*Other names and brands may be claimed as the property of others.