Runtime Control
The runtime behavior and output can be controlled using a set of environment variables and a config file.
Vftrace checks for two environment variables upon launch of the application.
VFTR_OFF
Setting VFTR_OFF deactivates profiling by vftrace. This option takes precedence over any other setting.
Possible values to switch vftrace off are: “Yes”, “True”, “1”.
Possible values to switch vftrace on are: “No”, “False”, “0”.
Default is: “False”
export VFTR_OFF="yes"
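The accepted values can be read as a small lookup table. The following Python sketch is a hypothetical helper, not vftrace's own code. It assumes case-insensitive matching, as the lowercase "yes" in the example suggests. Because the behavior for other values is not documented here, it rejects them.

```python
import os

# Values documented for VFTR_OFF, stored in lowercase because this sketch
# assumes case-insensitive matching (the example above exports "yes").
OFF_VALUES = {"yes", "true", "1"}
ON_VALUES = {"no", "false", "0"}


def vftrace_is_off(environ=None):
    """Hypothetical helper: does VFTR_OFF switch profiling off?

    environ defaults to the current process environment.
    """
    env = os.environ if environ is None else environ
    value = env.get("VFTR_OFF")
    if value is None:
        return False  # documented default "False": profiling stays on
    normalized = value.lower()
    if normalized in OFF_VALUES:
        return True
    if normalized in ON_VALUES:
        return False
    # Other values are not covered by the documentation above;
    # this sketch rejects them instead of guessing.
    raise ValueError(f"unrecognized VFTR_OFF value: {value!r}")
```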
VFTR_CONFIG
The VFTR_CONFIG variable specifies the path to a config file in JSON format.
The configuration actually used is appended to the logfiles in JSON format (see Application Profiling#Configuration).
A config file with default settings can be found in the manpages for vftrace, or generated with the vftrace_generate_default_config tool (see Tools#vftrace_generate_default_config).
If no config file is given, the default config is used.
A complete explanation of the options in the config file is given in the manpages or in the wiki (see Runtime Control#Config File).
To check a config file for correctness and compatibility with vftrace, use the vftrace_check_config tool (see Tools#vftrace_check_config).
export VFTR_CONFIG=config.json
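When the profiled application is started from a script rather than an interactive shell, the variable can be set only for that child process. The Python sketch below uses a hypothetical helper, env_with_vftr_config. Converting the path to an absolute one is a precaution in case the application changes its working directory, not a documented vftrace requirement.

```python
import os


def env_with_vftr_config(config_path, base_env=None):
    """Return a copy of the environment with VFTR_CONFIG set.

    base_env defaults to the current process environment; it is copied,
    never modified in place.
    """
    env = dict(os.environ if base_env is None else base_env)
    # Absolute path so the file is still found if the application
    # changes its working directory (a precaution, not a requirement).
    env["VFTR_CONFIG"] = os.path.abspath(config_path)
    return env
```

For example, subprocess.run(["./my_app"], env=env_with_vftr_config("config.json"), check=True) would start a hypothetical vftrace-instrumented binary ./my_app with that config, leaving the parent environment untouched.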
Config File
The config file defines the runtime behavior of Vftrace and fine-tunes the contents of the log files.
To use a file as config, its path needs to be exported with the VFTR_CONFIG environment variable (see Runtime Control#VFTR_CONFIG).
The default config is:
{
  "off": false,
  "output_directory": ".",
  "outfile_basename": null,
  "logfile_for_ranks": "none",
  "print_config": true,
  "strip_module_names": false,
  "demangle_cxx": false,
  "profile_table": {
    "show_table": true,
    "show_calltime_imbalances": false,
    "show_callpath": false,
    "show_overhead": false,
    "sort_table": {
      "column": "time_excl",
      "ascending": false
    }
  },
  "name_grouped_profile_table": {
    "show_table": false,
    "max_stack_ids": 8,
    "sort_table": {
      "column": "time_excl",
      "ascending": false
    }
  },
  "sampling": {
    "active": false,
    "sample_interval": 0.005000,
    "outbuffer_size": 8,
    "precise_functions": null
  },
  "mpi": {
    "show_table": true,
    "log_messages": true,
    "only_for_ranks": "all",
    "show_sync_time": false,
    "show_callpath": false,
    "sort_table": {
      "column": "none",
      "ascending": false
    }
  },
  "cuda": {
    "show_table": true,
    "sort_table": {
      "column": "time",
      "ascending": false
    }
  },
  "hardware_scenarios": {
    "active": false
  }
}
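Because the config is plain JSON, it can also be adjusted programmatically. The Python sketch below assumes the default config shown above has been saved to a file, for example with the vftrace_generate_default_config tool (see Tools#vftrace_generate_default_config for its usage). The helper names set_option and customize_config are hypothetical. The sketch only changes keys that already exist, which catches typos early, but vftrace_check_config remains the authoritative check.

```python
import json


def set_option(config, dotted_key, value):
    """Set a possibly nested option, e.g. "profile_table.show_overhead".

    Raises KeyError for unknown sections or options instead of silently
    adding new keys.
    """
    *sections, key = dotted_key.split(".")
    node = config
    for section in sections:
        node = node[section]  # KeyError if the section does not exist
    if key not in node:
        raise KeyError(f"unknown option: {dotted_key}")
    node[key] = value


def customize_config(src_path, dst_path, overrides):
    """Load a full config, apply the overrides and write the result."""
    with open(src_path) as f:
        config = json.load(f)
    for dotted_key, value in overrides.items():
        set_option(config, dotted_key, value)
    with open(dst_path, "w") as f:
        json.dump(config, f, indent=2)
    return config
```

A call such as customize_config("default.json", "config.json", {"profile_table.show_overhead": True, "logfile_for_ranks": "0-3"}) would then produce a file that can be exported via VFTR_CONFIG.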
- off (Boolean): If true, vftrace profiling is switched off.
- output_directory (String): Directory for log and vfd files.
- outfile_basename (String): Basename for log and vfd files (<basename>_all.log). null results in the executable name being used as basename.
- logfile_for_ranks (String): Specifies which ranks get local log files. Valid values are "none", "all", "a-b", "a,b", or a mix of the latter two. E.g. "2,3,5-8,11,13" will generate local log files for ranks 2, 3, 5, 6, 7, 8, 11, 13.
- print_config (Boolean): Whether to append the used config to the logfile.
- strip_module_names (Boolean): Whether to strip module names from Fortran symbols to make them more readable.
  Example: my_lengthy_module_MOD_my_function will be stripped to my_function.
- demangle_cxx (Boolean): Whether to demangle C++ symbols to make them more readable.
  This requires vftrace to be built with libiberty support and the application to be linked with -liberty.
  Example: _Z9quicksortIiEviPT_ and _Z9quicksortIdEviPT_ will both be demangled to quicksort.
- profile_table (Section): This section controls how the runtime profile table is printed in the logfile.
  - show_table (Boolean): Whether to print the runtime profile table.
  - show_calltime_imbalances (Boolean): Whether to include columns on the calltime imbalances across ranks.
    If a routine is executed on multiple ranks and they show vastly different runtimes, this hints at a badly distributed load.
    The Imbalances[%] column shows the largest deviation from the average time among all ranks, in percent (positive value = more than average; negative value = less than average). The on rank column shows the rank on which this largest deviation occurred. In the following example, the routine work2 might need optimization and better load balancing, as rank 3 spends 23.07% more time in it than the average rank.
    +-------+-----------+-----------+-----------+---------------+---------+--------------+--------+------+
    | Calls | t_excl[s] | t_excl[%] | t_incl[s] | Imbalances[%] | on rank | Function     | Caller | STID |
    +-------+-----------+-----------+-----------+---------------+---------+--------------+--------+------+
    |     4 |    52.002 |      55.8 |    52.002 |         23.07 |       3 | work2        | main   |    1 |
    |     4 |    40.003 |      42.9 |    40.003 |          0.00 |       1 | work1        | main   |    3 |
    ...
  - show_callpath (Boolean): Whether to include a column containing the complete call stack as shown in Application Profiling#Global Call Stacks.
    This can make the tables extremely wide and hard to read, but removes the need to look up the stack ID (STID) in the global call stack list.
    +-------+-----------+-----------+-----------+--------------+--------+------+------------------------+
    | Calls | t_excl[s] | t_excl[%] | t_incl[s] | Function     | Caller | STID | Callpath               |
    +-------+-----------+-----------+-----------+--------------+--------+------+------------------------+
    ...
    |     1 |     0.000 |       0.0 |     0.000 | foo          | main   |    3 | foo<main<init          |
    |     1 |     0.000 |       0.0 |     0.000 | bar          | main   |    6 | bar<main<init          |
    |     1 |     0.000 |       0.0 |     0.000 | foo          | bar    |    5 | foo<bar<main<init      |
    ...
  - show_overhead (Boolean): Whether to include a column containing the function-specific vftrace overhead.
    This helps to identify functions whose runtime is small compared to the overhead introduced by vftrace.
    If a function has a high overhead compared to its exclusive time and a high call count, chances are good that the pure call overhead (without a profiler) will be significant as well.
    To reduce the function's impact on the profile, it could be excluded from the instrumentation.
    +----------+-----------+-----------+-----------+-------------+--------------+--------+------+
    | Calls    | t_excl[s] | t_excl[%] | t_incl[s] | overhead[s] | Function     | Caller | STID |
    +----------+-----------+-----------+-----------+-------------+--------------+--------+------+
    | 29860703 |    19.763 |      98.1 |    19.763 |      12.897 | fib_rec      | main   |    1 |
    ...
  - sort_table (Section)
    - column
    - ascending
- name_grouped_profile_table (Section)
  - show_table
  - max_stack_ids
  - sort_table (Section)
    - column
    - ascending
- sampling (Section)
  - active
  - sample_interval
  - outbuffer_size
  - precise_functions
- mpi (Section)
  - show_table
  - log_messages
  - only_for_ranks
  - show_sync_time
  - show_callpath
  - sort_table (Section)
    - column
    - ascending
- cuda (Section)
  - show_table
  - sort_table (Section)
    - column
    - ascending
- hardware_scenarios (Section)
  - active
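The rank-list syntax of logfile_for_ranks is a comma-separated mix of single ranks and inclusive a-b ranges. The hypothetical Python helper below expands it. It is a sketch of the documented format, not vftrace's parser. In particular, returning [] for "none" and None for "all" is an assumption, as is raising ValueError on malformed input.

```python
def expand_rank_list(spec):
    """Expand a logfile_for_ranks string into a sorted list of ranks.

    Returns [] for "none" and None for "all" (meaning every rank),
    since the total number of ranks is not known here.
    """
    spec = spec.strip()
    if spec == "none":
        return []
    if spec == "all":
        return None
    ranks = set()
    for item in spec.split(","):
        if "-" in item:
            # "a-b": both ends are included
            first, last = (int(part) for part in item.split("-"))
            ranks.update(range(first, last + 1))
        else:
            ranks.add(int(item))
    return sorted(ranks)
```

With the example from the option description, expand_rank_list("2,3,5-8,11,13") yields the ranks 2, 3, 5, 6, 7, 8, 11 and 13.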
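The effect of strip_module_names can be pictured as keeping only the part after the "_MOD_" marker. gfortran, for instance, names module procedures __<module>_MOD_<procedure>. The helper strip_module_name below is a hypothetical illustration of that transformation, not vftrace's implementation.

```python
def strip_module_name(symbol):
    """Drop a Fortran module prefix of the form <module>_MOD_<name>.

    Symbols without the "_MOD_" marker are returned unchanged.
    """
    module, marker, name = symbol.partition("_MOD_")
    return name if marker else symbol
```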
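The Imbalances[%] and on rank columns of show_calltime_imbalances can be approximated as follows. This Python sketch assumes that the deviation is measured relative to the mean time over all ranks and that the rank with the largest absolute deviation is reported. vftrace's exact formula may differ. The function name and the per-rank times in the usage note are hypothetical.

```python
def calltime_imbalance(times_per_rank):
    """Return (imbalance_percent, rank) for one profile-table row.

    times_per_rank maps rank -> time spent in the routine on that rank.
    The result is the largest deviation from the mean, in percent of the
    mean and signed: positive means that rank took longer than average.
    """
    mean = sum(times_per_rank.values()) / len(times_per_rank)
    if mean == 0.0:
        return 0.0, min(times_per_rank)
    # Rank whose time is farthest from the mean (first one on ties).
    rank = max(times_per_rank, key=lambda r: abs(times_per_rank[r] - mean))
    return 100.0 * (times_per_rank[rank] - mean) / mean, rank
```

For made-up times of 10, 10, 10 and 14 seconds on ranks 0 to 3, the mean is 11 s and the sketch reports about +27.27% on rank 3.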
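The rule of thumb from show_overhead (high overhead relative to exclusive time combined with a high call count) can be turned into a simple filter over profile-table rows. In the Python sketch below, ProfileRow, exclusion_candidates and both thresholds are hypothetical illustrations, not vftrace defaults. Parsing the logfile into rows is left out.

```python
from dataclasses import dataclass


@dataclass
class ProfileRow:
    """One row of the profile table (only the columns used here)."""
    function: str
    calls: int
    t_excl: float    # exclusive time in seconds
    overhead: float  # vftrace overhead in seconds


def exclusion_candidates(rows, min_ratio=0.5, min_calls=1_000_000):
    """Return frequently called rows whose overhead is large relative to
    their exclusive time, sorted by overhead/t_excl (largest first).

    min_ratio and min_calls are illustrative thresholds only.
    """
    def ratio(row):
        return row.overhead / row.t_excl if row.t_excl > 0 else float("inf")

    hits = [r for r in rows if r.calls >= min_calls and ratio(r) >= min_ratio]
    return sorted(hits, key=ratio, reverse=True)
```

For the fib_rec row in the show_overhead example, the overhead is about 65% of the exclusive time (12.897 s / 19.763 s), or roughly 0.43 µs per call, so the sketch would report it as a candidate for exclusion from instrumentation.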