ROCm component refactoring #49

Merged
merged 26 commits into from
Oct 11, 2023

gcongiu
Contributor

@gcongiu gcongiu commented Jul 13, 2023

Pull Request Description

The rocm component allows multiple versions of the vendor profiling library to work with PAPI. It does this by wrapping the rocprofiler calls in a new, simpler and more generic interface: the component dispatches each PAPI request to the matching rocprofiler version through this wrapper (or dispatch) layer. Currently, the rocprofiler code contains functionality (e.g., acquiring and releasing devices in multithreaded applications) that is not specific to a rocprofiler version and could be reused by future versions. This PR restructures the rocm component code to extract that common functionality and improve the extensibility of the component.
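
To make the wrapper (or dispatch) layer concrete, here is a minimal sketch of the idea using a table of function pointers. All names (`backend_vtable_t`, `dispatch_select`, the `v1_*`/`v2_*` stubs) are hypothetical and are not taken from the PAPI sources; the stubs return fixed values only so that the dispatch is observable.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical per-version interface; the real component may differ. */
typedef struct {
    const char *name;
    int (*init)(void);
    int (*start)(int eventset_id);
} backend_vtable_t;

/* Stub back-ends standing in for two profiler versions. */
static int v1_init(void) { return 1; }
static int v1_start(int id) { (void) id; return 1; }
static int v2_init(void) { return 2; }
static int v2_start(int id) { (void) id; return 2; }

static const backend_vtable_t backends[] = {
    { "rocprofiler_v1", v1_init, v1_start },
    { "rocprofiler_v2", v2_init, v2_start },
};

/* Back-end currently selected; NULL until dispatch_select() succeeds. */
static const backend_vtable_t *active = NULL;

/* Select a back-end by name. Returns 0 on success, -1 if unknown. */
int dispatch_select(const char *name)
{
    size_t i;
    for (i = 0; i < sizeof(backends) / sizeof(backends[0]); ++i) {
        if (strcmp(backends[i].name, name) == 0) {
            active = &backends[i];
            return 0;
        }
    }
    return -1;
}

/* Front-end entry points: forward to whichever back-end is active. */
int dispatch_init(void) { return active ? active->init() : -1; }
int dispatch_start(int id) { return active ? active->start(id) : -1; }
```

With this shape, the front-end only ever calls `dispatch_*` functions, and supporting a further profiler version means adding one more row to the table.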

Author Checklist

  • Description
    Why this PR exists. Reference all relevant information, including background, issues, test failures, etc
  • Commits
    Commits are self contained and only do one thing
    Commits have a header of the form: module: short description
    Commits have a body (whenever relevant) containing a detailed description of the addressed problem and its solution
  • Tests
    The PR needs to pass all the tests

@gcongiu gcongiu force-pushed the 2023.05.12_rocm-cmp-refactor branch 4 times, most recently from c28f680 to 2b37d82 Compare July 25, 2023 09:47
@gcongiu gcongiu requested a review from adanalis August 18, 2023 09:38
@gcongiu gcongiu force-pushed the 2023.05.12_rocm-cmp-refactor branch 6 times, most recently from 35b7337 to f63abb4 Compare September 8, 2023 13:22
Errors returned by rocprofiler calls are assigned PAPI_EMISC, while
errors caused by unexpected user actions (e.g., starting an eventset that
is already running) are assigned PAPI_EINVAL. Memory allocation failures
are assigned PAPI_ENOMEM, and every remaining failure is assigned PAPI_ECMP.
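
The error assignment described in this commit message could be sketched as a single mapping function. The `failure_kind_t` enum and `failure_to_papi_errno` are hypothetical names introduced for illustration, and the sketch assumes PAPI_ECMP is the fallback for everything not otherwise classified. The numeric values are meant to mirror `papi.h` and are defined locally only to keep the snippet self-contained; real code should include `papi.h` instead.

```c
/* Values intended to mirror papi.h; defined here only so the sketch
 * compiles on its own. Real code should include papi.h. */
#ifndef PAPI_OK
#define PAPI_OK       0
#define PAPI_EINVAL  -1
#define PAPI_ENOMEM  -2
#define PAPI_ECMP    -4
#define PAPI_EMISC   -14
#endif

/* Hypothetical classification of failures inside the component. */
typedef enum {
    FAIL_NONE,          /* no error */
    FAIL_PROFILER_CALL, /* a rocprofiler call returned an error */
    FAIL_USER_ACTION,   /* e.g. starting an eventset that is running */
    FAIL_ALLOC,         /* memory allocation failed */
    FAIL_OTHER          /* anything else */
} failure_kind_t;

/* Map a failure category to the PAPI error code described above. */
int failure_to_papi_errno(failure_kind_t kind)
{
    switch (kind) {
    case FAIL_NONE:          return PAPI_OK;
    case FAIL_PROFILER_CALL: return PAPI_EMISC;
    case FAIL_USER_ACTION:   return PAPI_EINVAL;
    case FAIL_ALLOC:         return PAPI_ENOMEM;
    default:                 return PAPI_ECMP;
    }
}
```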

This guard was introduced when rocmtools was planned instead of
rocprofiler V2.

Some functionality can be shared with other profiler versions, if and
when these become available. It therefore makes sense to extract such
functionality from the specific profiler implementation and make it
available to future profiler versions.

The rocm lock and the profiling mode variables need to be shared between
the front-end and the back-end. The lock must be shared because it has
to be initialized by the front-end, which is the only layer with access
to the required information. This lock design in PAPI is flawed because
it is hard to extend.
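
One way to picture the sharing described here is a small piece of common state that the front-end fills in once and the back-end only reads. This is a rough sketch under stated assumptions: the names (`shared_state_t`, `frontend_publish`, `backend_get_lock`, `backend_get_mode`) are hypothetical, and the lock is represented by a plain identifier rather than by PAPI's actual lock type.

```c
/* Hypothetical state shared by front-end and back-end. In a real
 * component this would live in a common translation unit. */
typedef struct {
    int          ready;          /* set once the front-end has initialized */
    unsigned int lock_id;        /* identifier of the component lock */
    unsigned int profiling_mode; /* mode chosen by the front-end */
} shared_state_t;

static shared_state_t shared = { 0, 0, 0 };

/* Front-end: the only layer that knows the lock id and mode. */
void frontend_publish(unsigned int lock_id, unsigned int mode)
{
    shared.lock_id = lock_id;
    shared.profiling_mode = mode;
    shared.ready = 1;
}

/* Back-end: read the lock id; fails (-1) if nothing was published yet. */
int backend_get_lock(unsigned int *lock_id)
{
    if (!shared.ready) {
        return -1;
    }
    *lock_id = shared.lock_id;
    return 0;
}

/* Back-end: read the profiling mode; fails (-1) if not published. */
int backend_get_mode(unsigned int *mode)
{
    if (!shared.ready) {
        return -1;
    }
    *mode = shared.profiling_mode;
    return 0;
}
```

The back-end never initializes the lock itself; it can only observe what the front-end has published.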
@gcongiu gcongiu force-pushed the 2023.05.12_rocm-cmp-refactor branch from 3871cf6 to 60d30cc Compare October 9, 2023 17:38
@gcongiu gcongiu merged commit eb71dfd into icl-utk-edu:master Oct 11, 2023
14 checks passed
2 participants