This document provides advanced instructions for running SSD-ResNet34 BFloat16
training, which give more control over the individual parameters that
are used. For more information on using /benchmarks/launch_benchmark.py,
see the launch benchmark documentation.
Prior to using these instructions, please follow the setup instructions from
the model's README and/or the AI Kit documentation to get your environment
set up (if running on bare metal) and download the dataset, pretrained model, etc.
If you are using AI Kit, please exclude the --docker-image flag from the
commands below, since you will be running in the TensorFlow conda environment
instead of docker.
Any of the launch_benchmark.py commands below can be run on bare metal by
removing the --docker-image arg. Ensure that you have all of the
required prerequisites installed in your environment
before running without the docker container.
If you are new to docker and are running into issues with the container, see this document for troubleshooting tips.
Once your environment is set up, navigate to the benchmarks directory of
the model zoo and set environment variables pointing to the directory for the
COCO training dataset (or validation dataset for evaluation mode), the
TensorFlow models repo, and an output directory where log files will be written.
# cd to the benchmarks directory in the model zoo
cd benchmarks
export TF_MODELS_DIR=<path to the clone of the TensorFlow models repo>
export DATASET_DIR=<path to the COCO dataset directory>
export OUTPUT_DIR=<directory where the log file will be written>
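Since the commands below depend on all three variables, it can help to confirm that each one points to a real directory before launching. The following is a minimal sketch; `check_dir` is a hypothetical helper written for illustration and is not part of the model zoo.

```shell
# Hypothetical helper (not part of the model zoo).
# $1 = label used in messages, $2 = path to check.
# Returns 0 if the path is non-empty and is an existing directory.
check_dir() {
  if [ -z "$2" ]; then
    echo "ERROR: $1 is not set" >&2
    return 1
  fi
  if [ ! -d "$2" ]; then
    echo "ERROR: $1='$2' is not a directory" >&2
    return 1
  fi
  echo "OK: $1=$2"
}

# Example usage after the exports above:
#   check_dir TF_MODELS_DIR "${TF_MODELS_DIR:-}" || exit 1
#   check_dir DATASET_DIR   "${DATASET_DIR:-}"   || exit 1
#   check_dir OUTPUT_DIR    "${OUTPUT_DIR:-}"    || exit 1
```

Passing the value as `"${VAR:-}"` keeps the check safe even when the variable was never exported.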
To do a demo run that tests performance by training the model for a limited
number of steps, use the command below. The DATASET_DIR should point to
the COCO training dataset directory. You can change the number of training
steps, or the number of MPI processes to run multiple instances. If you want
checkpoint files to be saved, specify the --checkpoint <directory path>
flag for the location where files will be written.
Note: for best performance, use the same value for the --num-cores and --num-intra-threads arguments, chosen as follows:
For a single-instance run (mpi_num_processes=1), the value is equal to the number of logical cores per socket.
For a multi-instance run (mpi_num_processes > 1), the value is equal to (#_of_logical_cores_per_socket - 2).
If the --num-cores or --num-intra-threads args are not specified, they will be calculated based on the number of logical cores on your system.
python launch_benchmark.py \
--data-location ${DATASET_DIR} \
--model-source-dir ${TF_MODELS_DIR} \
--model-name ssd-resnet34 \
--framework tensorflow \
--precision bfloat16 \
--mode training \
--num-train-steps 100 \
--num-cores 52 \
--num-inter-threads 1 \
--num-intra-threads 52 \
--batch-size=100 \
--weight_decay=1e-4 \
--num_warmup_batches=20 \
--mpi_num_processes=1 \
--mpi_num_processes_per_socket=1 \
--output-dir ${OUTPUT_DIR} \
--docker-image intel/intel-optimized-tensorflow:latest
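The rule in the note above for choosing --num-cores and --num-intra-threads can be sketched as a small shell function. `suggest_num_cores` is a hypothetical name used only for illustration, and the `lscpu` parsing in the comment is an assumption that holds on Linux systems with English-language `lscpu` output.

```shell
# Hypothetical helper: prints the suggested value for both
# --num-cores and --num-intra-threads.
# $1 = logical cores per socket, $2 = mpi_num_processes
suggest_num_cores() {
  if [ "$2" -gt 1 ]; then
    # Multi-instance run: logical cores per socket minus 2
    echo $(( $1 - 2 ))
  else
    # Single-instance run: all logical cores of one socket
    echo "$1"
  fi
}

# Assumption: on Linux, logical cores per socket can be estimated as
# (cores per socket) x (threads per core), for example:
#   logical_per_socket=$(lscpu | awk -F: \
#     '/^Core\(s\) per socket/ {c=$2} /^Thread\(s\) per core/ {t=$2} END {print c*t}')
#   suggest_num_cores "$logical_per_socket" 1
```

With 52 logical cores per socket, this yields 52 for a single instance and 50 for a multi-instance run, which matches the values used in the commands in this document.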
To run training and achieve convergence, download the backbone model from the links below, then use the command below:
https://storage.googleapis.com/intel-optimized-tensorflow/models/ssd-backbone/checkpoint
https://storage.googleapis.com/intel-optimized-tensorflow/models/ssd-backbone/model.ckpt-28152.data-00000-of-00001
https://storage.googleapis.com/intel-optimized-tensorflow/models/ssd-backbone/model.ckpt-28152.index
https://storage.googleapis.com/intel-optimized-tensorflow/models/ssd-backbone/model.ckpt-28152.meta
Place the above files in one directory, and pass that location below as --backbone-model.
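One way to fetch the four backbone files into a single directory is sketched below. The function and variable names (`backbone_urls`, `download_backbone`, `BACKBONE_DIR`) are hypothetical and chosen for illustration; the sketch also assumes `wget` is installed.

```shell
# Base URL and file names taken from the links above.
BACKBONE_URL_BASE="https://storage.googleapis.com/intel-optimized-tensorflow/models/ssd-backbone"
BACKBONE_FILES="checkpoint model.ckpt-28152.data-00000-of-00001 model.ckpt-28152.index model.ckpt-28152.meta"

# Prints the full URL of each backbone file, one per line.
backbone_urls() {
  for f in $BACKBONE_FILES; do
    echo "$BACKBONE_URL_BASE/$f"
  done
}

# Downloads every backbone file into directory $1 (created if missing).
# Assumes wget is available; stops at the first failed download.
download_backbone() {
  mkdir -p "$1" || return 1
  for url in $(backbone_urls); do
    wget -q -P "$1" "$url" || return 1
  done
}

# Example usage (BACKBONE_DIR is a hypothetical variable name):
#   export BACKBONE_DIR="$HOME/ssd-backbone"
#   download_backbone "$BACKBONE_DIR"
```

The resulting directory is the location to pass as --backbone-model.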
The DATASET_DIR should point to the COCO training dataset directory. To prevent
conflicts with checkpoint files generated by previous model runs, use an empty OUTPUT_DIR.
python launch_benchmark.py \
--data-location ${DATASET_DIR} \
--model-source-dir ${TF_MODELS_DIR} \
--model-name ssd-resnet34 \
--framework tensorflow \
--precision bfloat16 \
--mode training \
--num-cores 50 \
--num-inter-threads 1 \
--num-intra-threads 50 \
--batch-size=100 \
--mpi_num_processes=4 \
--mpi_num_processes_per_socket=1 \
--epochs=60 \
--checkpoint <path to output_train_directory> \
--backbone-model <path to resnet34_backbone_trained_model> \
--output-dir ${OUTPUT_DIR} \
--docker-image intel/intel-optimized-tensorflow:latest
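Because the convergence run expects an empty OUTPUT_DIR, a quick check before launching can catch leftover files from earlier runs. This is a minimal sketch; `is_empty_dir` is a hypothetical helper, not something provided by the model zoo.

```shell
# Hypothetical helper: returns 0 only if $1 is an existing directory
# that contains no entries (hidden files count as entries).
is_empty_dir() {
  [ -d "$1" ] && [ -z "$(ls -A "$1")" ]
}

# Example usage before launching the training command:
#   if ! is_empty_dir "$OUTPUT_DIR"; then
#     echo "OUTPUT_DIR is missing or not empty: $OUTPUT_DIR" >&2
#     exit 1
#   fi
```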
To run in eval mode (to check accuracy) when checkpoints are available, use the command below.
Note that DATASET_DIR should now point to the location of the COCO validation dataset.
python launch_benchmark.py \
--data-location ${DATASET_DIR} \
--model-source-dir ${TF_MODELS_DIR} \
--model-name ssd-resnet34 \
--framework tensorflow \
--precision bfloat16 \
--mode training \
--num-cores 52 \
--num-inter-threads 1 \
--num-intra-threads 52 \
--batch-size=100 \
--mpi_num_processes=1 \
--mpi_num_processes_per_socket=1 \
--accuracy-only \
--checkpoint <path to pretrained_checkpoints> \
--output-dir ${OUTPUT_DIR} \
--docker-image intel/intel-optimized-tensorflow:latest