VTuneFocusedConnector

Si Hammond edited this page Mar 14, 2017 · 8 revisions

Tool Description

VTuneFocusedConnector works very similarly to the VTuneConnector tool. The only difference is that it turns VTune profiling off outside of frames, that is, outside of the kernel calls being tracked. Note that profiling cannot be toggled at a very high frequency; it should usually happen only every few milliseconds. For this reason the tool is often used in conjunction with the KernelFilter.
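The on/off behavior described above can be pictured with a small mock. This is only a sketch under stated assumptions: `MockCollector`, `on_tool_init`, `on_kernel_begin`, and `on_kernel_end` are hypothetical names invented for illustration, and the flag stands in for whatever pause and resume mechanism the real tool uses to talk to VTune.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in for VTune's collection state. A real tool would
// call into VTune's instrumentation API instead of flipping a flag.
struct MockCollector {
  bool collecting = true;           // collection is active until the tool loads
  std::vector<std::string> frames;  // one entry per kernel call (frame)
  void pause() { collecting = false; }
  void resume() { collecting = true; }
};

static MockCollector collector;

// Assumed to run once when the tool library is loaded:
// collection is switched off immediately.
void on_tool_init() { collector.pause(); }

// Assumed to run when a kernel starts: resume collection and record a frame.
void on_kernel_begin(const std::string& name) {
  collector.resume();
  collector.frames.push_back(name);
}

// Assumed to run when the kernel ends: collection is paused again.
void on_kernel_end() { collector.pause(); }
```

Every kernel call costs one resume and one pause in this model, which is why very short, frequently launched kernels are a poor fit unless a filter reduces how many kernels reach the tool.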

The tool is located at: https://github.com/kokkos/kokkos-tools/tree/master/src/tools/vtune-focused-connector

Compilation

The Makefile needs to know where VTune's home directory is. Other than that, simply type "make" inside the source directory. When compiling for specific platforms, modify the simple Makefile to use the correct compiler and link flags.

Usage

This is a standard tool which does not yet support tool chaining. Modify your VTune run environment to include:

KOKKOS_PROFILE_LIBRARY={PATH_TO_TOOL_DIRECTORY}/kp_vtune_focused_connector.so

When using it in conjunction with KernelFilter, a filter file has to be provided:

export KOKKOSP_KERNEL_FILTER=kernels.lst
export KOKKOS_PROFILE_LIBRARY="{PATH_TO_TOOL_DIRECTORY}/kernel-filter/kp_kernel_filter.so;{PATH_TO_TOOL_DIRECTORY}/vtune-focused-connector/kp_vtune_focused_connector.so"
./application COMMANDS

This tool's additional memory footprint is dwarfed by the memory usage of VTune during profiling.

Output

Switch to the domain/frame based view inside of VTune to analyze your application kernel by kernel.

Example Output

Consider the following code:

#include <Kokkos_Core.hpp>
#include <cstdio>

int main(int argc, char* argv[]) {
  Kokkos::initialize(argc,argv);
  {
    // Inner scope: the Views must be destroyed before Kokkos::finalize()
    int N = 100000000;

    Kokkos::View<double*> a("A",N);
    Kokkos::View<double*> b("B",N);
    Kokkos::View<double*> c("C",N);

    // Unnamed initialization kernel (not matched by the filter list)
    Kokkos::parallel_for(N, KOKKOS_LAMBDA (const int& i) {
      a(i) = 1.0*i;
      b(i) = 1.5*i;
      c(i) = 0.0;
    });

    double result = 0.0;
    for(int k = 0; k<50; k++) {

      Kokkos::parallel_for("AXPB", N, KOKKOS_LAMBDA (const int& i) {
        c(i) = 1.0*k*a(i) + b(i);
      });

      double dot;
      Kokkos::parallel_reduce("Dot", N, KOKKOS_LAMBDA (const int& i, double& lsum) {
        lsum += c(i)*c(i);
      },dot);
      result += dot;

    }
    printf("Result: %lf\n",result);
  }
  Kokkos::finalize();
  return 0;
}

We now run VTune with this tool and the KernelFilter, using the following filter list:

AX(.*)
Dot
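To see which kernels such a list selects, the entries can be treated as regular expressions tested against kernel names. The sketch below assumes that a pattern must match the whole kernel name; that is an assumption about KernelFilter's behavior, and `kernel_selected` and `filter_list` are hypothetical names used only for illustration.

```cpp
#include <regex>
#include <string>
#include <vector>

// Returns true if the kernel name fully matches any pattern in the list.
// Full-match semantics are an assumption, not confirmed KernelFilter behavior.
bool kernel_selected(const std::string& name,
                     const std::vector<std::string>& patterns) {
  for (const auto& p : patterns) {
    if (std::regex_match(name, std::regex(p))) return true;
  }
  return false;
}

// The filter list from this example, one pattern per line of the filter file.
const std::vector<std::string> filter_list = {"AX(.*)", "Dot"};
```

Under these assumptions, `kernel_selected("AXPB", filter_list)` and `kernel_selected("Dot", filter_list)` are true, while a kernel with any other name, such as the unlabeled initialization loop in the example code, is not selected.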

Here is a screenshot of the Bottom-up Frame/Domain view in VTune. The kernel names are used for the domains, and individual calls with the same name are frames in that domain. Note how profiling halts after an initial phase: at the beginning of the pause the profiling tool is initialized and VTune profiling is turned off. It is turned back on when the application enters either of the two matching kernels.

[Screenshot: VTuneDomainFrame]