Asynchronous SGD in Intel Caffe

Asynchronous SGD in Intel Caffe is based on the hybrid strategy described in [1]. Nodes/processes are organized into computation groups. Within a group, data parallelism is used and gradients are synchronized between nodes with an MPI all-reduce operation; across groups, communication is asynchronous and goes through a set of parameter servers.
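
To make the two levels of communication concrete, here is a toy single-process sketch in Python. It is only an illustration of the idea, not Intel Caffe's implementation: the names ParameterServer and group_allreduce_mean are hypothetical, the all-reduce is replaced by a plain average, and the gradient values are made up.

```python
# Toy illustration of the hybrid scheme; this is NOT Intel Caffe code.
# All names (ParameterServer, group_allreduce_mean) are hypothetical.


class ParameterServer:
    """Holds the shared weights and applies each update as soon as it arrives."""

    def __init__(self, weights, lr):
        self.weights = list(weights)
        self.lr = lr

    def push(self, grad):
        # Asynchronous across groups: an update is applied immediately,
        # without waiting for the other groups.
        self.weights = [w - self.lr * g for w, g in zip(self.weights, grad)]

    def pull(self):
        return list(self.weights)


def group_allreduce_mean(worker_grads):
    """Stand-in for an MPI all-reduce: average the gradients within one group."""
    n = len(worker_grads)
    return [sum(column) / n for column in zip(*worker_grads)]


server = ParameterServer(weights=[1.0, 1.0], lr=0.5)

# Group 0: its two workers synchronize gradients, then push one update.
g0 = group_allreduce_mean([[0.5, 1.0], [1.5, 0.0]])
server.push(g0)

# Group 1 pushes independently; in a real run its gradients may have been
# computed on older weights than the ones the server holds now.
g1 = group_allreduce_mean([[1.0, 0.5], [0.0, 0.5]])
server.push(g1)

print(server.pull())  # [0.25, 0.5]
```

In this sketch each group contributes a single averaged update, and the server never blocks on a slow group.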

Two additional arguments must be specified on the command line to use this feature:

-n_server: number of nodes/processes used as parameter servers.
           If it is larger than 0, asynchronous SGD is enabled.
-n_group: number of computation groups. The number of nodes in one group
          is (total_nodes - n_server) / n_group, where total_nodes is the
          total number of nodes.

Here is an example:

mpirun -l -np 5 ./build/tools/caffe train -solver \
    examples/cifar10/cifar10_full_solver.prototxt -n_group 2 -n_server 1

-np 5: 5 processes are created.
-n_server 1: 1 process is used as the parameter server.
-n_group 2: the remaining 4 processes are split into 2 groups of 2 processes each, since (5 - 1) / 2 = 2.
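
The group size in this example follows from the formula given for -n_group. A minimal Python sketch of that arithmetic is shown here; the helper name nodes_per_group is hypothetical, and the check that the processes divide evenly is an assumption of this sketch, since this page does not say how a remainder is handled.

```python
def nodes_per_group(total_nodes, n_server, n_group):
    """Return (total_nodes - n_server) / n_group as an integer group size."""
    workers = total_nodes - n_server
    if n_group <= 0 or workers <= 0:
        raise ValueError("need at least one group and one non-server process")
    if workers % n_group != 0:
        # Assumption of this sketch: reject uneven splits rather than guess
        # how leftover processes would be assigned.
        raise ValueError("processes do not divide evenly into groups")
    return workers // n_group


# Values from the mpirun example: -np 5, -n_server 1, -n_group 2
print(nodes_per_group(total_nodes=5, n_server=1, n_group=2))  # 2
```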

So far, the cifar10 topology in Intel Caffe has been verified with asynchronous SGD. We'll verify and support more topologies later.

Reference:

[1] Thorsten Kurth, Jian Zhang, Nadathur Satish, Ioannis Mitliagkas, Evan Racah, Mostofa Ali Patwary, Tareq Malas, Narayanan Sundaram, Wahid Bhimji, Mikhail Smorkalov, Jack Deslippe, Mikhail Shiryaev, Srinivas Sridharan, Prabhat, Pradeep Dubey, Deep Learning at 15PF: Supervised and Semi-Supervised Classification for Scientific Data
