Enabling Detailed Profiling of Graph Nodes in OmniTrace #335

Open
OmarSayedMostafa opened this issue Apr 9, 2024 · 1 comment

OmarSayedMostafa commented Apr 9, 2024

Hi, I am currently profiling vLLM with OmniTrace, and I observed that the tool captures the execution of graph kernels at a high level but does not provide detailed insight into the execution of individual graph nodes.
image
image

My goal is to obtain detailed profiling information on the execution of individual graph nodes, similar to the capabilities offered by NVIDIA Nsight, which allows tracking individual nodes instead of just graph-level execution.
image

I am seeking guidance or a workaround to enable detailed profiling of graph nodes within OmniTrace. Are there any insights or configuration options that would help?

Here is the command I use:
omnitrace-run -c ~/.omnitrace.cfg --enable-categories device-critical-trace device_busy device_hip device_hsa device_memory_usage python rocm_hip rocm_hsa rocm_smi rocprofiler roctracer --roctracer-hip-activity --roctracer-hip-api --roctracer-hsa-activity --roctracer-hsa-api -- python -m omnitrace -- vllm_benchmark.py

Thanks in advance.

OmarSayedMostafa changed the title from "Visualize/trace hip kernels nodes from Launched graph using hipGraphLaunch." to "Enabling Detailed Profiling of Graph Nodes in OmniTrace" on Apr 9, 2024
jrmadsen (Collaborator) commented

Given that the arrows flow from the API functions to multiple kernels, it appears that you are in fact getting the individual graph node execution. The --roctracer-hsa-activity option in your command enables that. You might want to remove the --hip-device-activity option because that is the "high-level" kernel tracing option; enabling both at once might interfere with how the flow events are connected, and could also contribute to none of the kernel function names being resolved beyond "Kernel Execution".
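
For anyone wanting to try this suggestion, here is a minimal sketch that rebuilds the command from the original report with one tracing flag dropped. Note that the reported command does not literally contain --hip-device-activity; the closest flag in it is --roctracer-hip-activity, and treating the two as the same option is an assumption on my part, not something confirmed in this thread. The helper name drop_flags is hypothetical. The script only prints the trimmed command; it does not run OmniTrace.

```python
import shlex

# Command exactly as posted in the original report.
ORIGINAL = (
    "omnitrace-run -c ~/.omnitrace.cfg --enable-categories "
    "device-critical-trace device_busy device_hip device_hsa "
    "device_memory_usage python rocm_hip rocm_hsa rocm_smi rocprofiler "
    "roctracer --roctracer-hip-activity --roctracer-hip-api "
    "--roctracer-hsa-activity --roctracer-hsa-api "
    "-- python -m omnitrace -- vllm_benchmark.py"
)


def drop_flags(argv, flags):
    """Remove the given flags from the part of argv before the first '--'.

    Everything from the first standalone '--' onward belongs to the
    profiled program, so it is passed through untouched.
    """
    split = argv.index("--") if "--" in argv else len(argv)
    head = [arg for arg in argv[:split] if arg not in flags]
    return head + argv[split:]


# ASSUMPTION: "--roctracer-hip-activity" is the flag meant by
# "--hip-device-activity" in the reply above. Adjust if that is wrong.
trimmed = drop_flags(shlex.split(ORIGINAL), {"--roctracer-hip-activity"})

# None of the arguments contain spaces, so a plain join is enough here
# and leaves "~" unquoted for the shell to expand.
print(" ".join(trimmed))
```

If the high-level option is instead set in ~/.omnitrace.cfg rather than on the command line, it would need to be disabled there; the thread does not show that file, so this sketch leaves it alone.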
