- Install the Anaconda Python environment manager (Miniconda is recommended)
- Create the conda environment and install the required packages
conda env create -n allreduce_env -f environment.yml
- Activate the environment
conda activate allreduce_env
- PyCharm download: https://www.jetbrains.com/pycharm/download/#section=windows
Search for the comment "# Modify gradient allreduce here" and update the code there.
Replace the star-reduce code with ring-allreduce, as described in:
- https://towardsdatascience.com/distributed-deep-learning-with-horovod-2d1eea004cb2
- https://towardsdatascience.com/visual-intuition-on-ring-allreduce-for-distributed-deep-learning-d1f34b4911da
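As a rough illustration of the algorithm those articles describe, the sketch below simulates ring-allreduce in plain Python, with one gradient list per worker. It is not this repository's code and does not use any real communication library; the names `split_chunks`, `ring_allreduce` and `star_allreduce` are hypothetical and exist only for this example. It assumes the reduction is a sum and that every worker holds a gradient of the same length.

```python
from typing import List


def split_chunks(vec: List[float], n: int) -> List[List[float]]:
    """Split vec into n contiguous chunks whose sizes differ by at most one."""
    base, extra = divmod(len(vec), n)
    chunks, start = [], 0
    for i in range(n):
        size = base + (1 if i < extra else 0)
        chunks.append(list(vec[start:start + size]))
        start += size
    return chunks


def star_allreduce(grads: List[List[float]]) -> List[List[float]]:
    """Reference: a central node sums all gradients and sends the result back."""
    total = [sum(vals) for vals in zip(*grads)]
    return [list(total) for _ in grads]


def ring_allreduce(grads: List[List[float]]) -> List[List[float]]:
    """Simulate a sum ring-allreduce; worker r only ever sends to worker r + 1."""
    n = len(grads)
    if n == 1:
        return [list(grads[0])]
    # chunks[r][c] is worker r's local copy of chunk c.
    chunks = [split_chunks(g, n) for g in grads]

    # Phase 1, scatter-reduce: after n - 1 steps, worker r holds the
    # fully summed chunk (r + 1) % n.
    for step in range(n - 1):
        # Snapshot every outgoing message first so all sends in a step
        # happen "simultaneously", as they would on real hardware.
        msgs = [(r, (r - step) % n) for r in range(n)]
        payloads = [list(chunks[r][c]) for r, c in msgs]
        for (r, c), data in zip(msgs, payloads):
            dst = (r + 1) % n
            chunks[dst][c] = [a + b for a, b in zip(chunks[dst][c], data)]

    # Phase 2, allgather: pass the finished chunks around the ring,
    # overwriting the stale partial sums on each receiver.
    for step in range(n - 1):
        msgs = [(r, (r + 1 - step) % n) for r in range(n)]
        payloads = [list(chunks[r][c]) for r, c in msgs]
        for (r, c), data in zip(msgs, payloads):
            chunks[(r + 1) % n][c] = data

    # Reassemble each worker's chunks into one flat gradient.
    return [[x for chunk in worker for x in chunk] for worker in chunks]
```

Both functions should return the same summed gradient on every worker. The difference is the communication pattern: in the star version one node handles all of the traffic, while in the ring version each worker sends and receives only one chunk per step, for 2 * (n - 1) steps in total.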
python test_dataparallel.py
Training runs the reference and data-parallel models simultaneously:
python train_model.py