Connects to EXAMON via MQTT and reads out any kind of data, e.g., blade energy consumption
- Score-P
- CMake version 3.8 or higher
- make
- some server on which EXAMON is running
- some server on which mosquitto (i.e., an MQTT broker) is running; see this launchpad if you want a more recent version to run on Ubuntu
Other dependencies are included in this Git repository (libmosquitto, scorep_plugin_cxx_wrapper); they should be satisfied when running cmake and make.
BSD-3
Use CMake:
mkdir build
cd build
cmake ../
make
- ENABLE_MPI (default: OFF)
- configure examon to report metrics to the mosquitto server
- compile your application with Score-P
- set the environment variable `LD_LIBRARY_PATH`
- tell Score-P that you want a trace with `SCOREP_ENABLE_TRACING=true`
- specify which plugin you want to run by setting the environment variable `SCOREP_METRIC_PLUGINS` (either `examon_sync_plugin` or `examon_async_plugin`)
- specify which metrics you want to be included in the trace with either the `SCOREP_METRIC_EXAMON_SYNC_PLUGIN` or the `SCOREP_METRIC_EXAMON_ASYNC_PLUGIN` environment variable. They take a comma-separated list
- run your program
In case you don't have scorep, mosquitto and examon running on the same system you might want to set the following environment variables:
environment variable | default | purpose |
---|---|---|
SCOREP_METRIC_EXAMON_SYNC_PLUGIN | none | specify metrics for sync plugin |
SCOREP_METRIC_EXAMON_ASYNC_PLUGIN | none | specify metrics for async plugin |
SCOREP_METRIC_EXAMON_ASYNC_PLUGIN_BROKER / SCOREP_METRIC_EXAMON_SYNC_PLUGIN_BROKER | localhost | the address of the MQTT broker |
SCOREP_METRIC_EXAMON_ASYNC_PLUGIN_EXAMON_HOST / SCOREP_METRIC_EXAMON_SYNC_PLUGIN_EXAMON_HOST | gethostname() | the hostname which examon uses in its topics |
SCOREP_METRIC_EXAMON_ASYNC_PLUGIN_CHANNEL / SCOREP_METRIC_EXAMON_SYNC_PLUGIN_CHANNEL | org/antarex/cluster/testcluster | the default channel configured in examon's `pmu_pub.conf` key `topic` |
SCOREP_METRIC_EXAMON_ASYNC_PLUGIN_INTERVAL / SCOREP_METRIC_EXAMON_SYNC_PLUGIN_INTERVAL | none | asks examon, via its command channel, to use this delay between readouts; unit: seconds (a floating point number is allowed) |
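The defaults in the table can be read as a simple lookup with fallbacks. The following Python sketch only illustrates that reading; it is not the plugin's code, and the function name `resolve_settings` and the returned dictionary keys are hypothetical.

```python
import socket

# Common prefix of all variables in the table; mode is "SYNC" or "ASYNC".
PREFIX = "SCOREP_METRIC_EXAMON_{mode}_PLUGIN"


def resolve_settings(env, mode="ASYNC"):
    """Return the effective settings for one plugin variant.

    env is a mapping such as os.environ; unset variables fall back to the
    defaults listed in the table above.
    """
    base = PREFIX.format(mode=mode)
    interval = env.get(base + "_INTERVAL")
    return {
        "metrics": env.get(base),  # None means no metrics were requested
        "broker": env.get(base + "_BROKER", "localhost"),
        "examon_host": env.get(base + "_EXAMON_HOST", socket.gethostname()),
        "channel": env.get(base + "_CHANNEL", "org/antarex/cluster/testcluster"),
        # Seconds between readouts; None leaves examon's own setting untouched.
        "interval": float(interval) if interval is not None else None,
    }
```

Calling `resolve_settings(os.environ, mode="SYNC")` before starting a run is one way to check which broker and channel a given shell environment would select.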
See Score-P documentation on how to compile your software with instrumentation
# required if plugin was not installed via make install
export LD_LIBRARY_PATH="/my/path/to/the/repository/scorep_plugin_examon/build"
export SCOREP_ENABLE_TRACING=true
export SCOREP_ENABLE_PROFILING=false
export SCOREP_TOTAL_MEMORY=1000M
export SCOREP_METRIC_PLUGINS="examon_async_plugin"
export SCOREP_METRIC_EXAMON_ASYNC_PLUGIN="cpu/0/tsc,core/+/temp;MAX,cpu/0/erg_pkg,cpu/0/erg_dram"
./my-program-which-was-compiled-using-scorep.bin
The shell code above should run your program and produce a trace, which you can then inspect with a tool such as VAMPIR. In VAMPIR, choose the button "Add Counter Data Timeline" (it should be the fourth from the left) to see the recorded stats.
Using either `SCOREP_METRIC_EXAMON_SYNC_PLUGIN` or `SCOREP_METRIC_EXAMON_ASYNC_PLUGIN` you can specify the metrics you want this plugin to report from examon. For an example, look at the shell code above, which uses MQTT topics with cpu and core, like `cpu/0/tsc` and `core/+/temp;MAX`.
parameter = <metric-specification> [',' <metric-specification> ...]
metric-specification = <path> '/' <basename> [';' <option> ...]
path = any character except ','
basename = any character except ',' '/' or ';'
option = <accumulation-strategy> | <output-datatype> | <scaling_multiplicator>
accumulation-strategy = 'MIN' | 'MAX' | 'AVG' | 'SUM'
output-datatype = 'DOUBLE' | 'INT32' | 'UINT32' | 'INT64' | 'UINT64'
scaling_multiplicator = 's=' <floating point number>
NOTE: when specifying multiple accumulation-strategy or output-datatype options on a single metric-specification, the last one applies. E.g. `cpu/+/erg_dram;SUM;MAX;AVG;MIN;UINT64;INT32;DOUBLE` would be accumulated using the `MIN` strategy and be reported as a `double` value to Score-P.
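To make the grammar and the "last one applies" rule concrete, here is a minimal Python sketch of a parser for it. It is not the plugin's implementation: the function names are hypothetical, the default scale of 1.0 is an assumption (the defaults `AVG` and `DOUBLE` are the ones documented further down), and it assumes the path itself contains no `;`.

```python
STRATEGIES = {"MIN", "MAX", "AVG", "SUM"}
DATATYPES = {"DOUBLE", "INT32", "UINT32", "INT64", "UINT64"}


def parse_metric_spec(spec):
    """Parse one metric-specification, e.g. 'core/+/temp;MAX', into a dict."""
    topic, *options = spec.split(";")
    path, sep, basename = topic.rpartition("/")
    if not sep or not basename:
        raise ValueError(f"expected <path>/<basename>, got {topic!r}")
    metric = {
        "path": path,
        "basename": basename,
        "strategy": "AVG",     # documented default
        "datatype": "DOUBLE",  # documented default
        "scale": 1.0,          # assumption: no scaling unless 's=' is given
    }
    for option in options:
        if option in STRATEGIES:
            metric["strategy"] = option  # a later option overrides an earlier one
        elif option in DATATYPES:
            metric["datatype"] = option
        elif option.startswith("s="):
            metric["scale"] = float(option[2:])
        else:
            raise ValueError(f"unknown option {option!r}")
    return metric


def parse_parameter(value):
    """Parse the whole comma-separated environment variable value."""
    return [parse_metric_spec(spec) for spec in value.split(",")]
```

With this sketch, `parse_metric_spec("cpu/+/erg_dram;SUM;MAX;AVG;MIN;UINT64;INT32;DOUBLE")` ends up with strategy `MIN` and datatype `DOUBLE`, matching the note above.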
MQTT understands wildcards in the path: `+` for any amount of characters excluding `/`, and `#` for any amount of characters.
Do note that you may specify an arbitrary topic path that does not end with an MQTT wildcard.
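The wildcard rules can be illustrated with a small matcher. This Python sketch follows the usual MQTT semantics (`+` stands for exactly one topic level, `#` for all remaining levels and only as the last level); the function name `topic_matches` is hypothetical, and special cases such as topics starting with `$` are left out. In practice the broker performs this matching, not the plugin.

```python
def topic_matches(topic_filter, topic):
    """Return True if an MQTT topic filter matches a concrete topic."""
    filter_levels = topic_filter.split("/")
    topic_levels = topic.split("/")
    for i, level in enumerate(filter_levels):
        if level == "#":
            return True  # multi-level wildcard: everything from here on matches
        if i >= len(topic_levels):
            return False  # the filter is longer than the topic
        if level != "+" and level != topic_levels[i]:
            return False  # a literal level differs
    # Without '#', filter and topic must have the same number of levels.
    return len(filter_levels) == len(topic_levels)
```

For example, the filter `core/+/temp` matches `core/0/temp` and `core/1/temp`, but not `core/0/tsc`.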
If you specify a wildcard within the topic path, as in `core/+/temp`, MQTT will match several topics to it, e.g. `core/0/temp`, `core/1/temp`, `core/2/temp`, etc. In that case you might want to specify an accumulation strategy, which tells this plugin how to combine the multiple metrics into one value. Currently the following accumulation strategies are available:
- `MIN`: select the smallest received value for each time index
- `MAX`: select the largest received value for each time index
- `SUM`: for each time index, sum all received values
- `AVG` (default): for each time index, calculate the average value
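As an illustration of what the strategies compute, the following Python sketch groups samples by their timestamp and combines each group. It is a simplified model, not the plugin's code: the name `accumulate` is hypothetical, and grouping by exactly equal timestamps is an assumption about what "time index" means.

```python
from collections import defaultdict

# One combining function per accumulation strategy.
ACCUMULATORS = {
    "MIN": min,
    "MAX": max,
    "SUM": sum,
    "AVG": lambda values: sum(values) / len(values),
}


def accumulate(samples, strategy="AVG"):
    """Combine (timestamp, value) samples from several matched topics.

    Returns a dict mapping each timestamp to one combined value.
    """
    by_time = defaultdict(list)
    for timestamp, value in samples:
        by_time[timestamp].append(value)
    combine = ACCUMULATORS[strategy]
    return {t: combine(values) for t, values in sorted(by_time.items())}
```

If `core/0/temp` reports 30 and `core/1/temp` reports 29 for the same time index, `MAX` yields 30, `MIN` yields 29, `SUM` yields 59 and `AVG` yields 29.5.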
You may also specify an output datatype. The metrics will be parsed to the specified datatype before being reported to Score-P. Available datatypes:
- `DOUBLE` (default)
- `INT32`
- `UINT32`
- `INT64`
- `UINT64`
The path you specify is only the very end of the Examon/MQTT path. If you look at the raw output of Examon using `mosquitto_sub -v -t "#"` you might receive an output like this:
org/antarex/cluster/testcluster/node/mabus/plugin/pmu_pub/chnl/data CK
org/antarex/cluster/testcluster/node/mabus/plugin/pmu_pub/chnl/data/cpu/0/tsc 6514034410521054;1534429383.000
org/antarex/cluster/testcluster/node/mabus/plugin/pmu_pub/chnl/data/cpu/0/temp_pkg 31;1534429383.000
org/antarex/cluster/testcluster/node/mabus/plugin/pmu_pub/chnl/data/cpu/0/erg_dram 1347318602;1534429383.000
org/antarex/cluster/testcluster/node/mabus/plugin/pmu_pub/chnl/data/cpu/0/erg_cores 493158;1534429383.000
org/antarex/cluster/testcluster/node/mabus/plugin/pmu_pub/chnl/data/cpu/0/erg_pkg 4176111766;1534429383.000
org/antarex/cluster/testcluster/node/mabus/plugin/pmu_pub/chnl/data/cpu/0/erg_units 658947;1534429383.000
org/antarex/cluster/testcluster/node/mabus/plugin/pmu_pub/chnl/data/cpu/0/freq_ref 3400000000.000000;1534429383.000
org/antarex/cluster/testcluster/node/mabus/plugin/pmu_pub/chnl/data/cpu/0/C2 6458454251168686;1534429383.000
org/antarex/cluster/testcluster/node/mabus/plugin/pmu_pub/chnl/data/cpu/0/C3 0;1534429383.000
org/antarex/cluster/testcluster/node/mabus/plugin/pmu_pub/chnl/data/cpu/0/C6 0;1534429383.000
org/antarex/cluster/testcluster/node/mabus/plugin/pmu_pub/chnl/data/core/0/tsc 6514034410281892;1534429383.000
org/antarex/cluster/testcluster/node/mabus/plugin/pmu_pub/chnl/data/core/0/temp 30;1534429383.000
org/antarex/cluster/testcluster/node/mabus/plugin/pmu_pub/chnl/data/core/0/instr 1348379485276;1534429383.000
org/antarex/cluster/testcluster/node/mabus/plugin/pmu_pub/chnl/data/core/0/clk_curr 3838437806472;1534429383.000
org/antarex/cluster/testcluster/node/mabus/plugin/pmu_pub/chnl/data/core/0/clk_ref 3962611107782;1534429383.000
org/antarex/cluster/testcluster/node/mabus/plugin/pmu_pub/chnl/data/core/0/C3 528063812196;1534429383.000
org/antarex/cluster/testcluster/node/mabus/plugin/pmu_pub/chnl/data/core/0/C6 144192590938;1534429383.000
org/antarex/cluster/testcluster/node/mabus/plugin/pmu_pub/chnl/data/core/0/aperf 5629924683127;1534429383.000
org/antarex/cluster/testcluster/node/mabus/plugin/pmu_pub/chnl/data/core/0/mperf 5726500245551;1534429383.000
org/antarex/cluster/testcluster/node/mabus/plugin/pmu_pub/chnl/data/core/0/UOPS_RETIRED.RETIRE_SLOTS 140737561474785;1534429383.000
org/antarex/cluster/testcluster/node/mabus/plugin/pmu_pub/chnl/data/core/0/ICACHE.MISSES 140737488408801;1534429383.000
org/antarex/cluster/testcluster/node/mabus/plugin/pmu_pub/chnl/data/core/0/LONGEST_LAT_CACHE.MISS 140737489632218;1534429383.000
org/antarex/cluster/testcluster/node/mabus/plugin/pmu_pub/chnl/data/core/0/MEM_LOAD_UOPS_L3_HIT_RETIRED.XSNP_NONE 5726500245551;1534429383.000
org/antarex/cluster/testcluster/node/mabus/plugin/pmu_pub/chnl/data/core/0/BR_MISP_RETIRED.ALL_BRANCHES 5726500245551;1534429383.000
org/antarex/cluster/testcluster/node/mabus/plugin/pmu_pub/chnl/data/core/0/UOPS_ISSUED.ANY 65535;1534429383.000
org/antarex/cluster/testcluster/node/mabus/plugin/pmu_pub/chnl/data/core/0/IDQ_UOPS_NOT_DELIVERED.CORE 5726500245551;1534429383.000
org/antarex/cluster/testcluster/node/mabus/plugin/pmu_pub/chnl/data/core/0/INT_MISC.RECOVERY_CYCLES 5726500245551;1534429383.000
org/antarex/cluster/testcluster/node/mabus/plugin/pmu_pub/chnl/data/core/1/tsc 6514034410435230;1534429383.000
org/antarex/cluster/testcluster/node/mabus/plugin/pmu_pub/chnl/data/core/1/temp 29;1534429383.000
...
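Judging from the sample output, each line consists of the full topic, a space, and a payload of the form `value;timestamp`. A minimal Python sketch for splitting such a line follows; the name `parse_line` is hypothetical, and reading both fields as floating point numbers is an assumption based on the sample, not a statement about Examon's wire format.

```python
def parse_line(line):
    """Split one `mosquitto_sub -v` output line into (topic, value, timestamp).

    Returns None for lines whose payload is not of the form 'value;timestamp',
    such as the '.../chnl/data CK' line in the sample output.
    """
    topic, _, payload = line.partition(" ")
    value_text, sep, time_text = payload.partition(";")
    if not sep:
        return None
    return topic, float(value_text), float(time_text)
```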
This plugin will assemble the data path using the following schema (use `SYNC` or `ASYNC` in the variable name depending on which you are using):
*SCOREP_METRIC_EXAMON_ASYNC_PLUGIN_CHANNEL* '/node/' *SCOREP_METRIC_EXAMON_ASYNC_PLUGIN_EXAMON_HOST* '/plugin/pmu_pub/chnl/data/'
Your specified path is appended to this prefix. The resulting string is used in an invocation of MQTT's subscribe() to receive metric values from Examon/MQTT.
Thus, if you want to listen to `foo/node/supercomputer/plugin/pmu_pub/chnl/data/core/0/temp`, you might specify the following environment variables:
export SCOREP_METRIC_PLUGINS="examon_async_plugin"
export SCOREP_METRIC_EXAMON_ASYNC_PLUGIN_EXAMON_HOST="supercomputer"
export SCOREP_METRIC_EXAMON_ASYNC_PLUGIN_CHANNEL="foo"
export SCOREP_METRIC_EXAMON_ASYNC_PLUGIN="core/0/temp"
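The assembly schema amounts to plain string concatenation. The Python sketch below reproduces it for illustration only; the function name `build_topic` and its parameter names are hypothetical stand-ins for the channel, host and metric path settings.

```python
def build_topic(channel, examon_host, metric_path):
    """Assemble the MQTT topic from channel, Examon host and metric path."""
    return f"{channel}/node/{examon_host}/plugin/pmu_pub/chnl/data/{metric_path}"


# The values from the environment variables above give the topic from the text.
print(build_topic("foo", "supercomputer", "core/0/temp"))
```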
Note that Examon provides metrics per CPU socket and per core. E.g. the metric `erg_pkg` is only available on a per CPU socket basis.
Note: Metrics whose basename begins with `erg` (except `erg_units`) will trigger this plugin to multiply their output value by the factor derived from `cpu/0/erg_units` to arrive at the correct Joule value.
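How the factor is derived is not spelled out here. On Intel CPUs, RAPL energy counters count in units of 1/2^ESU Joule, where ESU is stored in bits 12:8 of the `MSR_RAPL_POWER_UNIT` register, and the sample value 658947 (0xA0E03) shown for `cpu/0/erg_units` is consistent with that register layout. The Python sketch below assumes that `erg_units` carries the raw register value; this is an assumption, not something confirmed by the plugin, and the function names are hypothetical.

```python
def energy_unit_joules(erg_units):
    """Return the energy of one counter increment in Joule.

    Assumes erg_units is the raw MSR_RAPL_POWER_UNIT value, whose
    bits 12:8 hold the energy status unit exponent (ESU).
    """
    esu = (erg_units >> 8) & 0x1F
    return 1.0 / (1 << esu)


def to_joules(raw_energy, erg_units):
    """Scale a raw erg_* counter value to Joule."""
    return raw_energy * energy_unit_joules(erg_units)
```

Under this assumption, the sample value 658947 gives an exponent of 14, i.e. one counter increment corresponds to 1/16384 Joule.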
list a lot of metrics from cpu socket 0
cpu/0/erg_pkg,cpu/0/erg_dram,cpu/0/tsc,cpu/0/temp_pkg,cpu/0/erg_cores,cpu/0/freq_ref,cpu/0/C2,cpu/0/C3,cpu/0/C6,cpu/0/uclk
compare average temperature of all cores with maximum temperature:
core/+/temp;AVG,core/+/temp;MAX
look at temperature and energy consumption
cpu/+/erg_pkg;AVG,core/+/temp;MAX
I would be thankful for any bug report.
Example host_whitelist (replace the last name with the name from `/etc/hostname`)
[BROKER:] 127.0.0.1 1883
HOSTNAME_OF_THE_SYSTEM_EXAMON_IS_RUNNING_ON
Excerpt from pmu_pub.conf
[MQTT]
brokerHost = 127.0.0.1
brokerPort = 1883
topic = org/antarex/cluster/testcluster
qos = 0
[Daemon]
dT = 1
daemonize = False
...
Wherein `dT` may be a floating point number denoting the delay between readouts of RAPL/MSR.
Note: I could not get Examon running with daemonization; therefore `daemonize` is disabled in my configuration.
Note: On kernels since version 4 (April 2015) it is necessary to enable RDPMC; otherwise there is a segfault.