This repository contains a set of benchmarking scripts for evaluating the training performance of popular distributed deep learning methods. It accompanies our paper, which mainly focuses on system-level optimization algorithms for synchronous stochastic gradient descent (SGD) with data parallelism. Currently, it covers:
- Wait-free backpropagation (WFBP), also known as pipelining backward computation with gradient communication, which is a default feature in current distributed deep learning frameworks (a minimal sketch follows this list).
- Tensor fusion, which is integrated in Horovod with a hand-crafted threshold that determines when to fuse tensors; MG-WFBP instead decides which tensors to fuse dynamically (see the sketch below).
- Tensor partitioning and priority scheduling, as proposed in ByteScheduler (sketched below).
- Gradient compression with quantization (i.e., signSGD) and sparsification (i.e., TopK-SGD). These methods are included in the code but are excluded from our paper, which focuses on system-level optimization methods (see the sketch below).
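
As an illustration of WFBP, the following is a minimal PyTorch sketch (not the code used by the scripts in this repository) that launches a non-blocking all-reduce from a gradient hook as soon as each gradient is produced, so its communication overlaps with the backward computation of earlier layers. It assumes `torch.distributed` has already been initialized (e.g., with the NCCL backend); `attach_wfbp_hooks` and `synchronize` are illustrative helper names.

```python
import torch.distributed as dist

def attach_wfbp_hooks(model):
    # Handles of in-flight all-reduce operations, consumed by synchronize().
    handles = []

    def hook(grad):
        # Non-blocking all-reduce launched as soon as this gradient is produced,
        # overlapping its communication with the backward pass of earlier layers.
        work = dist.all_reduce(grad, op=dist.ReduceOp.SUM, async_op=True)
        handles.append((work, grad))
        return grad

    for p in model.parameters():
        if p.requires_grad:
            p.register_hook(hook)
    return handles

def synchronize(handles, world_size):
    # Wait for all outstanding all-reduces, then average the summed gradients.
    for work, grad in handles:
        work.wait()
        grad.div_(world_size)
    handles.clear()
```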
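
Tensor fusion can be sketched as packing several small gradients into one contiguous buffer so that a single all-reduce replaces many small, latency-bound ones; in Horovod the amount of data fused per call is controlled by a byte threshold (HOROVOD_FUSION_THRESHOLD). The snippet below is a simplified illustration under that assumption, not this repository's implementation.

```python
import torch
import torch.distributed as dist

def fused_allreduce(grads, world_size):
    # Pack the gradients into one contiguous buffer...
    flat = torch.cat([g.reshape(-1) for g in grads])
    # ...so a single all-reduce replaces many small ones.
    dist.all_reduce(flat, op=dist.ReduceOp.SUM)
    flat.div_(world_size)
    # Unpack the averaged values back into the original gradient tensors.
    offset = 0
    for g in grads:
        n = g.numel()
        g.copy_(flat[offset:offset + n].view_as(g))
        offset += n
```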
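
Tensor partitioning and priority scheduling can be illustrated as splitting each gradient into fixed-size chunks and communicating chunks of layers closer to the input first, so that the next iteration's forward pass can start earlier. The chunk size, priority assignment, and helper names below are illustrative only, and the write-back/averaging of the reduced chunks is omitted.

```python
import heapq
import itertools
import torch.distributed as dist

def schedule_and_allreduce(layer_grads, chunk_numel=262144):
    # layer_grads: list of (layer_index, gradient tensor); a smaller index means
    # a layer closer to the input, whose parameters are needed first next iteration.
    tie_breaker = itertools.count()   # keeps tensors from ever being compared
    queue = []
    for layer_idx, grad in layer_grads:
        flat = grad.reshape(-1)
        for start in range(0, flat.numel(), chunk_numel):
            chunk = flat[start:start + chunk_numel]
            # Lower layer index = higher priority (popped and communicated first).
            heapq.heappush(queue, (layer_idx, next(tie_breaker), chunk))
    while queue:
        _, _, chunk = heapq.heappop(queue)
        dist.all_reduce(chunk, op=dist.ReduceOp.SUM)
```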
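
For the two compression schemes, minimal sketches of a sign quantizer (signSGD) and a Top-K sparsifier (TopK-SGD) look roughly as follows; error feedback and the decompression/aggregation steps are omitted, and the compression ratio shown is arbitrary.

```python
import torch

def sign_compress(grad):
    # signSGD: keep only the sign of each element (1 bit of information per value).
    return torch.sign(grad)

def topk_compress(grad, ratio=0.001):
    # TopK-SGD: keep the k largest-magnitude elements and drop the rest.
    flat = grad.reshape(-1)
    k = max(1, int(flat.numel() * ratio))
    _, indices = torch.topk(flat.abs(), k)
    return indices, flat[indices]
```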
The benchmarks cover the following models:
- Convolutional neural networks (CNNs) on a fake ImageNet data set (i.e., randomly generated 224*224*3 input images; see the sketch after this list)
- Transformers: BERT-Base and BERT-Large pretraining models.
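
The fake ImageNet data mentioned above can be thought of as a synthetic dataset that returns random 224*224*3 images and random labels, so training throughput is measured without real data loading. A rough PyTorch sketch is shown below; the dataset length and class count are illustrative, mirroring ImageNet-1K.

```python
import torch
from torch.utils.data import Dataset

class FakeImageNet(Dataset):
    # Synthetic stand-in for ImageNet: every item is a random 3x224x224 image
    # with a random label, so no disk I/O or image decoding is involved.
    def __init__(self, length=1281167, num_classes=1000):
        self.length = length
        self.num_classes = num_classes

    def __len__(self):
        return self.length

    def __getitem__(self, idx):
        image = torch.randn(3, 224, 224)
        label = torch.randint(0, self.num_classes, (1,)).item()
        return image, label
```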
The following software is required:
- Python 3.6+
- CUDA-10.+
- NCCL-2.4.+
- PyTorch-1.4.+
- OpenMPI-4.0.+
- Horovod-0.19.+
- BytePS-0.2.+
- ByteScheduler
- bit2byte: optional; only required for running signSGD.
Clone the repository and install the Python dependencies:
$git clone https://github.com/HKBU-HPML/ddl-benchmarks.git
$cd ddl-benchmarks
$pip install -r requirements.txt
Before running the scripts, please carefully edit the configuration files in the configs directory:
- configs/cluster*: the host files for MPI (typically one host name per line with its slot count, e.g., node1 slots=4; the actual node names are site-specific).
- configs/envs.conf: the cluster environment settings.
Create a log folder, e.g.,
$mkdir -p logs/pcie
The benchmarks can be run in either of two modes:
- The batch mode, e.g.,
$python benchmarks.py
- The individual mode, e.g.,
$cd horovod
$dnn=resnet50 bs=64 nworkers=64 ./horovod_mpi_cj.sh
where dnn selects the model, bs the mini-batch size, and nworkers the number of GPU workers.
If you use this repository in your research, please cite our paper:
@article{shi2020ddlsurvey,
  author  = {Shi, Shaohuai and Tang, Zhenheng and Chu, Xiaowen and Liu, Chengjian and Wang, Wei and Li, Bo},
  title   = {Communication-Efficient Distributed Deep Learning: Survey, Evaluation, and Challenges},
  journal = {arXiv preprint arXiv:2005.13247},
  url     = {https://arxiv.org/pdf/2005.13247.pdf},
  year    = {2020}
}