Chop Chop Plots [Artifact] [arXiv]
Repository containing the scripts used to extract, aggregate, analyze and visualize the data generated by the benchmarks of Chop Chop and its baselines. The repository also contains the data and plots used in the Chop Chop paper. Organization:
- Prerequisites
- Extract and aggregate relevant data from the benchmarks
- Compute statistics on the aggregated data
- Plot dem stats
- Write a dope paper
Prerequisites:
- Install python3 and its packages: matplotlib, numpy, pandas
- Compile Chop Chop to obtain the heartbeat_statistics binary
- Coffee
- Patience
The following assumes that each run of each system configuration has been executed.
Different systems require different scripts to parse the logs, but they all output .json files for uniformity. Feel free to create directories or use symbolic links to sort the evaluation files generated for the BFT-SMaRt, HotStuff and Bullshark baselines.
For comparison, the archive sorted-results-little-boy.tar.xz (435 MB decompressed) contains the directory tree, the dead symlinks and the aggregated .json files of the baselines, as used in the paper. The archive agg-data.tar.xz (83 MB decompressed) contains all the aggregated .json files needed for the plots (some duplicated from the sorted results).
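To get a quick overview of what an archive holds before decompressing it, something like the following can help. This is a minimal sketch using Python's standard tarfile module; the function name summarize_archive is an illustrative assumption and is not part of this repository.

```python
import tarfile
from collections import Counter


def summarize_archive(path):
    """Count entry kinds in a .tar.xz archive without extracting it."""
    counts = Counter(files=0, symlinks=0, dirs=0, json=0)
    # "r:xz" selects LZMA decompression explicitly.
    with tarfile.open(path, "r:xz") as tar:
        for member in tar:
            if member.issym():
                counts["symlinks"] += 1
            elif member.isdir():
                counts["dirs"] += 1
            elif member.isfile():
                counts["files"] += 1
                # Aggregated results are the regular files ending in .json.
                if member.name.endswith(".json"):
                    counts["json"] += 1
    return dict(counts)
```

For example, summarize_archive("agg-data.tar.xz") returns a dictionary with the number of regular files, symlinks, directories and .json files found in the archive.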
The script extract_chopchop.py aggregates the data from Chop Chop logs into .json files. Please fill in the variables at the top of the script to indicate: (1) the directory containing the compiled code of Chop Chop, in order to access the heartbeat_statistics binary, (2) the directory containing the raw evaluation data, and (3) the name of the Ethernet interface of the servers, used to determine the total throughput of a run.
Inputs: directories containing raw evaluation files (.bin heartbeat files), as indicated in the variable DIR_RESULT at the top and in the main loop at the bottom of the script.
Outputs: one .json file per latency and per throughput per system configuration, and one .json file for line rate measurements.
python3 extract_chopchop.py
The script extract_bftsmart_hotstuff.py must be run once per evaluation run. For instance, it must be run 5 times per system configuration to obtain the data used in the paper, whereas the script extract_chopchop.py needs to be run only once regardless of the number of runs.
Inputs:
- bftsmart or hotstuff, depending on the system configuration to parse;
- One or several directories DIRS that contain in total 80 .out and 80 .err files of a single run (16 honest clients + 64 load clients). The script searches for these files recursively.
Outputs: DIR.json located in the same directory as the first item in the list DIRS.
python3 extract_bftsmart_hotstuff.py bftsmart <DIRS...>
python3 extract_bftsmart_hotstuff.py hotstuff <DIRS...>
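Since the script is invoked once per run, it can be convenient to check the file counts and build the command lines programmatically. The sketch below is only an illustration: the helper names (count_logs, is_complete_run, build_command) are assumptions, and whether the extraction script itself enforces the count of 80 files is not stated here.

```python
from pathlib import Path

EXPECTED_PER_RUN = 80  # 16 honest clients + 64 load clients


def count_logs(dirs):
    """Recursively count .out and .err files under the given directories."""
    counts = {".out": 0, ".err": 0}
    for d in dirs:
        for path in Path(d).rglob("*"):
            if path.is_file() and path.suffix in counts:
                counts[path.suffix] += 1
    return counts


def is_complete_run(dirs):
    """True if the directories hold exactly 80 .out and 80 .err files in total."""
    counts = count_logs(dirs)
    return counts[".out"] == EXPECTED_PER_RUN and counts[".err"] == EXPECTED_PER_RUN


def build_command(system, dirs):
    """Build the argument list for one invocation of the extraction script."""
    if system not in ("bftsmart", "hotstuff"):
        raise ValueError(f"unknown system: {system}")
    return ["python3", "extract_bftsmart_hotstuff.py", system, *map(str, dirs)]
```

Each command returned by build_command can then be passed to subprocess.run(..., check=True), once per evaluation run.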
The script extract_bullshark.py must be run once per evaluation run, as with extract_bftsmart_hotstuff.py. The script is mostly copied from the original Bullshark's easy_log.py. The main changes are around line 375, to generate a .json file containing the latency distribution and to include the logs of the worker nodes.
Inputs: a directory DIR that contains the .out and .err files of a single run. The script searches for these files recursively.
Outputs: DIR.easier-log.json located in the same directory as DIR.
python3 extract_bullshark.py <DIR>
The script stats.py computes statistics on all the aggregated .json files from all the runs of all the system configurations.
Inputs: aggregated .json files in agg-data/; the exact location of each file can be found at the bottom of the script.
Outputs: one statistics .csv file per system configuration, stored in stats/. The stats/ directory is already populated with the files used in the paper.
python3 stats.py
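The actual schema of the aggregated files and the statistics computed are defined in stats.py itself. Purely as an illustration of the general idea (several per-run .json files reduced to one .csv row), here is a minimal sketch. It assumes, hypothetically, that each file is a JSON object with a "latencies" list; that key and the function names are not taken from the repository.

```python
import csv
import json
import statistics


def summarize_runs(json_paths, key="latencies"):
    """Compute per-run means, then the mean and standard deviation across runs.

    Assumes (hypothetically) that each file is a JSON object whose `key`
    entry is a non-empty list of numbers.
    """
    run_means = []
    for path in json_paths:
        with open(path) as f:
            samples = json.load(f)[key]
        run_means.append(statistics.fmean(samples))
    return {
        "runs": len(run_means),
        "mean": statistics.fmean(run_means),
        # The sample standard deviation needs at least two runs.
        "stdev": statistics.stdev(run_means) if len(run_means) > 1 else 0.0,
    }


def write_csv(row, out_path):
    """Write a single summary row preceded by a header line."""
    with open(out_path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=list(row))
        writer.writeheader()
        writer.writerow(row)
```

Averaging per run first and then across runs keeps each run equally weighted, even if runs contain different numbers of samples.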
The script plot.py generates plots from the previously computed statistics, with helper functions and variables in utils.py.
Inputs: statistical data in stats/; the exact location of each .csv file can be found at the bottom of the script.
Outputs: plots in .pdf format stored in figs/; the exact location of each .pdf can be found at the end of each function. Functions that start with an underscore (e.g. _plotCommaSingle) contain older code that generates figures that are not in the paper. See utils.py to generate .eps in addition to .pdf.
python3 plot.py
The directory figs/ is already populated with the figures used in the paper:
- Fig 1: motivation-throughput-services.pdf
- Fig 7: comma-split.pdf
- Fig 8: merged-distillation-payloed-sizes.pdf (let's keep the typo)
- Fig 9: linerate-throughput-log.pdf
- Fig 10: merged-system-sizes-matching-trusted-resources.pdf
- Fig 11: merged-servers-faults-applications.pdf
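After regenerating the plots, a quick check that every paper figure is present can be sketched as follows. The mapping mirrors the list above; the function name missing_figures is an illustrative assumption, not part of the repository.

```python
from pathlib import Path

# File names as listed above (the "payloed" spelling is intentional).
PAPER_FIGURES = {
    "Fig 1": "motivation-throughput-services.pdf",
    "Fig 7": "comma-split.pdf",
    "Fig 8": "merged-distillation-payloed-sizes.pdf",
    "Fig 9": "linerate-throughput-log.pdf",
    "Fig 10": "merged-system-sizes-matching-trusted-resources.pdf",
    "Fig 11": "merged-servers-faults-applications.pdf",
}


def missing_figures(figs_dir="figs"):
    """Return the labels of paper figures whose .pdf is absent from figs_dir."""
    figs_dir = Path(figs_dir)
    return [
        label
        for label, name in PAPER_FIGURES.items()
        if not (figs_dir / name).is_file()
    ]
```

An empty list from missing_figures() means that all six figures were found.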
Write a dope paper: TODO. Input welcome.