
Runtime Control

Felix Uhl edited this page Nov 16, 2022 · 13 revisions

The runtime behavior and output can be controlled using a set of environment variables and a config file.

Environment Variables

Vftrace checks for two environment variables upon launch of the application.

VFTR_OFF

Setting VFTR_OFF to a true value deactivates profiling by vftrace. This option takes precedence over any other setting. Values that switch vftrace off are "Yes", "True", and "1"; values that switch it on are "No", "False", and "0". The default is "False".

export VFTR_OFF="yes"
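To switch profiling off for one run only, without changing the surrounding shell session, the variable can also be set as a prefix to a single command. The sketch below relies on plain shell semantics and nothing vftrace-specific; `sh -c` is only a stand-in for an instrumented application.

```shell
# Session-wide setting: profiling stays enabled.
export VFTR_OFF="no"

# Prefix assignment: VFTR_OFF="yes" is visible only to this one child process.
# `sh -c ...` is a placeholder for the instrumented application.
seen_by_child=$(VFTR_OFF="yes" sh -c 'printf "%s" "$VFTR_OFF"')

echo "value seen by the single run: $seen_by_child"
echo "value in the shell afterwards: $VFTR_OFF"
```

The exported value is untouched after the command returns, so later runs in the same session are profiled again.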

VFTR_CONFIG

VFTR_CONFIG specifies the path to a config file in JSON format. If no config file is given, the default config is used. A config file with default settings can be found in the manpages for vftrace or generated with the vftrace_generate_default_config tool (see Tools#vftrace_generate_default_config). The vftrace_check_config tool checks a config file for correctness and compatibility with vftrace (see Tools#vftrace_check_config). The configuration that was actually used is appended to the logfiles in JSON format (see Application Profiling#Configuration). A complete explanation of the options in the config file is given in the manpages or in the wiki (see Runtime Control#Config File).

export VFTR_CONFIG=config.json
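As a minimal sketch, a config file can be written from the shell and then exported. The file name and the chosen values are illustrative; the option names are taken from the default config listed under Config File. The sketch assumes that options left out of the file fall back to their defaults, which this page does not state explicitly. If that turns out not to hold, start from the full default config instead and validate the result with vftrace_check_config.

```shell
# Illustrative file name; any path works.
config_file="vftrace_config.json"

# Write a small config. Option names come from the default config;
# the values are examples only.
# Assumption: omitted options keep their default values.
cat > "$config_file" <<'EOF'
{
   "output_directory": ".",
   "logfile_for_ranks": "0-3",
   "profile_table": {
      "show_calltime_imbalances": true
   }
}
EOF

# Point vftrace at the file for applications launched from this shell.
export VFTR_CONFIG="$PWD/$config_file"
echo "VFTR_CONFIG=$VFTR_CONFIG"
```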

Config File

The config file defines the runtime behavior of Vftrace and fine-tunes the contents of the log files. To use a file as config, its path must be exported in the VFTR_CONFIG environment variable (see Runtime Control#VFTR_CONFIG). If no config file is given, the default config is used. A config file with default settings can be found in the manpages for vftrace or generated with the vftrace_generate_default_config tool (see Tools#vftrace_generate_default_config). The vftrace_check_config tool checks a config file for correctness and compatibility with vftrace (see Tools#vftrace_check_config). The configuration that was actually used is appended to the logfiles in JSON format (see Application Profiling#Configuration). The default config is:

{
   "off": false,
   "output_directory": ".",
   "outfile_basename": null,
   "logfile_for_ranks": "none",
   "print_config": true,
   "strip_module_names": false,
   "demangle_cxx": false,
   "profile_table": {
      "show_table": true,
      "show_calltime_imbalances": false,
      "show_callpath": false,
      "show_overhead": false,
      "sort_table": {
         "column": "time_excl",
         "ascending": false
      }
   },
   "name_grouped_profile_table": {
      "show_table": false,
      "max_stack_ids": 8,
      "sort_table": {
         "column": "time_excl",
         "ascending": false
      }
   },
   "sampling": {
      "active": false,
      "sample_interval": 0.005000,
      "outbuffer_size": 8,
      "precise_functions": null
   },
   "mpi": {
      "show_table": true,
      "log_messages": true,
      "only_for_ranks": "all",
      "show_sync_time": false,
      "show_callpath": false,
      "sort_table": {
         "column": "none",
         "ascending": false
      }
   },
   "cuda": {
      "show_table": true,
      "sort_table": {
         "column": "time",
         "ascending": false
      }
   },
   "hardware_scenarios": {
      "active": false
   }
}
  • off (Boolean): Turn vftrace profiling off (true) or on (false).
  • output_directory (String): Directory for log and vfd files.
  • outfile_basename (String): Basename for log and vfd files (_all.log). null results in the executable name being used as basename.
  • logfile_for_ranks (String): Specifies which ranks get local logfiles. Valid values are "none", "all", "a-b", "a,b", or a mix of the latter two. For example, "2,3,5-8,11,13" generates local logfiles for ranks 2, 3, 5, 6, 7, 8, 11, and 13.
  • print_config (Boolean): Whether to append the used config to the logfile.
  • strip_module_names (Boolean): Whether to strip module names from Fortran symbols to make them more readable. Example: my_lengthy_module_MOD_my_function will be stripped to my_function.
  • demangle_cxx (Boolean): Whether to demangle C++ symbols to make them more readable. This requires vftrace to be built with libiberty support and the application to be linked with -liberty. Example: _Z9quicksortIiEviPT and _Z9quicksortIdEviPT will both be demangled to quicksort.
  • profile_table (Section): This section controls how the runtime profile table is printed in the logfile.
    • show_table (Boolean): Whether to print the runtime profile table.
    • show_calltime_imbalances (Boolean): Whether to include columns on the calltime imbalances across ranks. If a routine is executed on multiple ranks and they show vastly different runtimes, this hints at a badly distributed load. The Imbalances[%] column shows, in percent, the largest deviation from the average time among all ranks (positive value = more than average; negative value = less than average). The on rank column shows the rank number on which this largest deviation occurred. In the following example the routine work2 might need optimization and better load balancing, as rank 3 has too much work in this routine compared to the average rank.
      +-------+-----------+-----------+-----------+---------------+---------+--------------+--------+------+
      | Calls | t_excl[s] | t_excl[%] | t_incl[s] | Imbalances[%] | on rank |   Function   | Caller | STID |
      +-------+-----------+-----------+-----------+---------------+---------+--------------+--------+------+
      |     4 |    52.002 |      55.8 |    52.002 |         23.07 |       3 |        work2 |   main |    1 |
      |     4 |    40.003 |      42.9 |    40.003 |          0.00 |       1 |        work1 |   main |    3 |
      ...
      
    • show_callpath (Boolean): Whether to include a column containing the complete callstack as shown in Application Profiling#Global Call Stacks. This can make the tables extremely wide and hard to read, but removes the need to look up the StackID in the global call stack list.
      +-------+-----------+-----------+-----------+--------------+--------+------+------------------------+
      | Calls | t_excl[s] | t_excl[%] | t_incl[s] |   Function   | Caller | STID |        Callpath        |
      +-------+-----------+-----------+-----------+--------------+--------+------+------------------------+
      ...
      |     1 |     0.000 |       0.0 |     0.000 |          foo |   main |    3 |          foo<main<init |
      |     1 |     0.000 |       0.0 |     0.000 |          bar |   main |    6 |          bar<main<init |
      |     1 |     0.000 |       0.0 |     0.000 |          foo |    bar |    5 |      foo<bar<main<init |
      ...
      
    • show_overhead (Boolean): Whether to include a column containing the function-specific vftrace overhead. This helps to identify functions whose runtime is small compared to the overhead introduced by vftrace. If a function has a high overhead compared to its exclusive time and a high call count, chances are good that the pure call overhead (without a profiler) is significant as well. To reduce the function's impact on the profile, it can be excluded from the instrumentation.
      +----------+-----------+-----------+-----------+-------------+--------------+--------+------+
      |  Calls   | t_excl[s] | t_excl[%] | t_incl[s] | overhead[s] |   Function   | Caller | STID |
      +----------+-----------+-----------+-----------+-------------+--------------+--------+------+
      | 29860703 |    19.763 |      98.1 |    19.763 |      12.897 |      fib_rec |   main |    1 |
      ...
      
    • sort_table (Section)
      • column
      • ascending
  • name_grouped_profile_table
    • show_table
    • max_stack_ids
    • sort_table
      • column
      • ascending
  • sampling
    • active
    • sample_interval
    • outbuffer_size
    • precise_functions
  • mpi
    • show_table
    • log_messages
    • only_for_ranks
    • show_sync_time
    • show_callpath
    • sort_table
      • column
      • ascending
  • cuda
    • show_table
    • sort_table
      • column
      • ascending
  • hardware_scenarios
    • active
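Two of the descriptions above can be illustrated with a short shell sketch: how a logfile_for_ranks value such as "2,3,5-8,11,13" expands into individual ranks, and how an Imbalances[%] / on rank pair can be derived from per-rank times. The function names expand_ranks and max_imbalance are hypothetical and not part of vftrace, the per-rank times are made up, and the imbalance formula (deviation from the average, divided by the average) is an assumption based on the wording of show_calltime_imbalances rather than on the vftrace source.

```shell
# Expand a rank list such as "2,3,5-8,11,13" into individual rank numbers.
# Only numeric lists are handled; "none" and "all" are not covered here.
# The function body runs in a subshell so the IFS change stays local.
expand_ranks() (
   IFS=","
   out=""
   for part in $1; do
      case $part in
         *-*)
            # Range "a-b": count from a up to and including b.
            i=${part%-*}
            last=${part#*-}
            while [ "$i" -le "$last" ]; do
               out="$out $i"
               i=$((i + 1))
            done
            ;;
         *)
            out="$out $part"
            ;;
      esac
   done
   echo "${out# }"
)

# Print the largest deviation from the average time in percent, followed by
# the rank it occurs on. Arguments are per-rank times, rank 0 first.
# Assumption: the deviation is taken relative to the average.
max_imbalance() {
   printf '%s\n' "$@" | awk '
      { t[NR - 1] = $1; sum += $1 }
      END {
         avg = sum / NR
         worst = 0; rank = 0
         for (r = 0; r < NR; r++) {
            dev = (t[r] - avg) / avg * 100
            mag = (dev < 0) ? -dev : dev
            worst_mag = (worst < 0) ? -worst : worst
            if (mag > worst_mag) { worst = dev; rank = r }
         }
         printf "%.2f %d\n", worst, rank
      }'
}

expand_ranks "2,3,5-8,11,13"   # prints: 2 3 5 6 7 8 11 13
max_imbalance 12 13 11 16      # prints: 23.08 3
```

With the made-up times 12, 13, 11, and 16 seconds the average is 13 seconds, so rank 3 lies about 23 % above it, while a negative result would mean the most extreme rank is below the average.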