SOFA: A cross-framework performance profiler for heterogeneous computing systems, especially HPC and distributed machine learning systems.
- Run
./tools/prepare.sh
to install all the necessary packages, including Python packages.
- [OPTIONAL] Run
./tools/empower.py $(whoami) $(which tcpdump)
to make network-related events traceable in SOFA. After this step, you must log out and log back in for the changes to take effect.
- Simply run
./install.sh </PATH/TO/INSTALL>
to install SOFA on your system. Note that sofa
will be appended to the path if its last directory is not sofa.
- Then, run
source </PATH/TO/INSTALL>/sofa/tools/activate.sh
to activate the SOFA environment. (This must be run in each new shell.)
- [ALTERNATIVE] Add
source </PATH/TO/INSTALL>/sofa/tools/activate.sh
to your ~/.bashrc
to make this environment available in every shell.
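The [ALTERNATIVE] step above can be scripted so that the activation line is added only once. This is a minimal sketch, not part of SOFA: `add_sofa_activation` is a hypothetical helper name, its first argument stands for the </PATH/TO/INSTALL> prefix, and the optional second argument is the startup file to edit (assumed to default to ~/.bashrc).

```shell
# Append the SOFA activation line to a shell startup file, but only once.
# add_sofa_activation is a hypothetical helper, not shipped with SOFA.
#   $1 = install prefix (the </PATH/TO/INSTALL> used above)
#   $2 = startup file to edit (defaults to ~/.bashrc)
add_sofa_activation() {
    local prefix="$1"
    local rcfile="${2:-$HOME/.bashrc}"
    local line="source ${prefix}/sofa/tools/activate.sh"

    # Create the file if it is missing so grep does not fail on it.
    touch "$rcfile"

    # -F: fixed string, -x: match the whole line, -q: no output.
    if grep -Fxq -- "$line" "$rcfile"; then
        echo "already present in $rcfile"
    else
        printf '%s\n' "$line" >> "$rcfile"
        echo "added to $rcfile"
    fi
}
```

Calling the function a second time with the same arguments leaves the file unchanged, so it is safe to rerun after reinstalling.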
SOFA supports several usage modes, similar to how perf is used. More details can be found in the following slides:
- slide: https://docs.google.com/presentation/d/1fyNnLlU-0WMIddkI8hgYn0Tg1vbP9i7VuXSPIsXB2L4/edit?usp=sharing
- Profile your program by sampling involved CPUs:
sofa stat "dd if=/dev/zero of=dummy.out bs=100M count=10"
- Profile your program by sampling all CPUs:
sofa stat "dd if=/dev/zero of=dummy.out bs=100M count=10" --profile_all_cpus
sofa record "dd if=/dev/zero of=dummy.out bs=100M count=10"
sofa report [--verbose] [--with-gui]
- If you pass "--with-gui" to "sofa report", you can open the report in a browser to explore different visualizations.
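The record-then-report sequence above can be wrapped in a small function. This is a sketch under stated assumptions: `profile_with_sofa` is a hypothetical name, it uses only the `sofa record` and `sofa report --verbose` forms shown above, and it assumes that `sofa record` exits with a non-zero status on failure. When the sofa executable is not on PATH (for example, because activate.sh has not been sourced), it prints the commands instead of running them.

```shell
# Record a command with SOFA and then generate the report.
# profile_with_sofa is a hypothetical wrapper, not part of SOFA.
profile_with_sofa() {
    local target="$1"

    if [ -z "$target" ]; then
        echo "usage: profile_with_sofa \"<command to profile>\"" >&2
        return 2
    fi

    if command -v sofa >/dev/null 2>&1; then
        # Skip the report if recording exits with a non-zero status.
        sofa record "$target" || return 1
        sofa report --verbose
    else
        # Dry run: SOFA environment is not active in this shell.
        echo "sofa not found; would run:"
        echo "sofa record \"$target\""
        echo "sofa report --verbose"
    fi
}
```

For example, `profile_with_sofa "dd if=/dev/zero of=dummy.out bs=100M count=10"` reproduces the two commands listed above in one call.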
SOFA provides options for advanced usage; some examples are shown below. Run sofa --help
to see more information.
sofa record "python tf_cnn_benchmarks.py" --cpu_filters="idle:black,tensorflow:orange"
sofa record "python tf_cnn_benchmarks.py" --gpu_filters="tensorflow:orange"
sofa record "python3.6 pytorch_dnn_example.py -a resnet50 /mnt/dataset/imagenet/mini-imagenet/raw-data --epochs=1 --batch-size=64"
sofa record "./scout dt-bench ps:resnet50 --hosts='192.168.0.100,192.168.0.101'"
sofa record "~/cuda_samples/1_Utilities/bandwidthTest/bandwidthTest"
sofa record "./scout t-bench resnet50_real"
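In the filter examples above, the value passed to --cpu_filters and --gpu_filters appears to be a comma-separated list of keyword:color pairs. Under that assumption, the list can be assembled from separate arguments as sketched below; `join_filters` is a hypothetical helper, not a SOFA command.

```shell
# Join "keyword:color" pairs into one comma-separated filter value.
# join_filters is a hypothetical helper; the keyword:color format is
# inferred from the examples above, not from SOFA documentation.
join_filters() {
    local result=""
    local pair
    for pair in "$@"; do
        case "$pair" in
            ?*:?*) ;;  # non-empty keyword, colon, non-empty color
            *) echo "bad filter (expected keyword:color): $pair" >&2
               return 1 ;;
        esac
        # Add a comma only when the list already has an entry.
        result="${result:+$result,}$pair"
    done
    printf '%s\n' "$result"
}

cpu_filters="$(join_filters idle:black tensorflow:orange)"
# Print the resulting command line rather than running it.
echo "sofa record \"python tf_cnn_benchmarks.py\" --cpu_filters=\"$cpu_filters\""
```

Building the value this way keeps each pair on its own argument, which is easier to edit than one long quoted string.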
We strongly encourage and appreciate contributions to SOFA that make performance engineering work more comfortable. To maintain code quality, we ask contributors to follow these guidelines:
- Please run
test/test.py
before sending a pull request. If you want to test SOFA on specific platforms, you can run
./test/test.py --dockerfiles Dockerfile.ubuntu.1604,Dockerfile.ubuntu.1804
where the corresponding Dockerfiles must be placed inside the test directory.
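The comma-separated --dockerfiles value can also be generated from the files present in the test directory. This is a rough sketch: `list_dockerfiles` is a hypothetical helper, and it assumes the Dockerfiles follow the Dockerfile.<platform> naming seen in the example above.

```shell
# List Dockerfile.* files in a directory as a comma-separated string.
# list_dockerfiles is a hypothetical helper, not part of SOFA.
#   $1 = directory to scan (defaults to test)
list_dockerfiles() {
    local dir="${1:-test}"
    local result=""
    local path
    for path in "$dir"/Dockerfile.*; do
        # With no match the glob stays literal, so skip non-files.
        [ -f "$path" ] || continue
        # Keep only the file name, as in the example above.
        result="${result:+$result,}${path##*/}"
    done
    printf '%s\n' "$result"
}
```

From the repository root, the result could then be passed as `./test/test.py --dockerfiles "$(list_dockerfiles test)"`.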