profile with kineto for small kernels
amirakb89 committed Dec 30, 2024
1 parent fbf3cd0 commit 933029d
Showing 1 changed file with 3 additions and 0 deletions.
3 changes: 3 additions & 0 deletions fbgemm_gpu/bench/split_table_batched_embeddings_benchmark.py
Expand Up @@ -1331,6 +1331,9 @@ def _kineto_trace_handler(p: profile, phase: str) -> None:
p.export_chrome_trace(
trace_url.format(tbe_type=tbe_type, phase=phase, ospid=os.getpid())
)
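# Averages the summed device time of all kernels per iteration (profiler times are in microseconds).
total_cuda_time = sum(
    event.device_time * event.count / (iters + 1)
    for event in p.key_averages()
    if event.cpu_time == 0.0  # keep device-only (kernel) events
)
print(f"Total CUDA time: {total_cuda_time:.3f}")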

# pyre-ignore[3]
def context_factory(on_trace_ready: Callable[[profile], None]):
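
The arithmetic added in this commit can be checked without a GPU. The sketch below is not part of the change: `EventAvg` is a hypothetical stand-in for one row of `p.key_averages()`, carrying only the three attributes the commit reads, and `avg_device_time_per_iter` is an illustrative helper name. It assumes, as the commit's expression does, that `device_time` is the per-call average and that rows with zero `cpu_time` are kernel rows. The diff does not show why the divisor is `iters + 1`, so the sketch simply mirrors it.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class EventAvg:
    """Hypothetical stand-in for one aggregated profiler row."""

    key: str
    device_time: float  # average device time per call
    cpu_time: float  # average CPU time per call
    count: int  # number of calls aggregated into this row


def avg_device_time_per_iter(events: Iterable[EventAvg], iters: int) -> float:
    """Sum device time over kernel rows, then divide by iters + 1 as the commit does."""
    total = sum(e.device_time * e.count for e in events if e.cpu_time == 0.0)
    return total / (iters + 1)


events = [
    EventAvg("kernel_a", device_time=10.0, cpu_time=0.0, count=11),
    EventAvg("kernel_b", device_time=2.0, cpu_time=0.0, count=22),
    # Non-zero CPU time: skipped by the filter, like host-side operator rows.
    EventAvg("host_op", device_time=12.0, cpu_time=5.0, count=11),
]

total_cuda_time = avg_device_time_per_iter(events, iters=10)
print(f"Total CUDA time: {total_cuda_time:.3f}")  # prints "Total CUDA time: 14.000"
```

Multiplying the per-call average by `count` recovers each kernel's total device time, so a kernel launched twice per iteration (`kernel_b` here) contributes both launches to the per-iteration figure.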
