2024.09.18 intel_gpu bug fixes #253

Merged
merged 3 commits into from
Oct 9, 2024

@dbarry9 dbarry9 commented Oct 3, 2024

Pull Request Description

This PR addresses bugs that were present in the intel_gpu component, namely:

  • Fix mapping between the metric code and metric group ID.
  • Remove check preventing metrics from separate groups from being added to separate event sets.
  • Implement PAPI_cleanup_eventset() functionality.

This Pull Request addresses Issue #227.

These changes have been tested using an OpenCL matrix addition kernel on the Intel Ponte Vecchio architecture.

Author Checklist

  • Description
    Why this PR exists. Reference all relevant information, including background, issues, test failures, etc
  • Commits
    Commits are self contained and only do one thing
    Commits have a header of the form: module: short description
    Commits have a body (whenever relevant) containing a detailed description of the addressed problem and its solution
  • Tests
    The PR needs to pass all the tests

@dbarry9 dbarry9 added the draft Do not merge (yet) label Oct 3, 2024
@dbarry9 dbarry9 removed the draft Do not merge (yet) label Oct 3, 2024
@dbarry9 dbarry9 linked an issue Oct 3, 2024 that may be closed by this pull request
Previously, only 8 bits were allotted to enumerate the metric groups,
limiting the number of groups to 256.
However, there are at least 1433 groups available on the Intel Ponte
Vecchio architecture.

These changes have been tested on the Intel Ponte Vecchio architecture.
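To illustrate why an 8-bit group field is too narrow, here is a minimal sketch of packing a group ID and a metric index into one code. The field widths, the macro names, and the functions `pack_code` and `unpack_group` are hypothetical and chosen only for illustration; the actual layout used by the intel_gpu component may differ.

```c
#include <stdint.h>

/* Hypothetical layout, for illustration only: the real intel_gpu
 * component may pack its metric codes differently. */
#define SKETCH_GROUP_BITS_OLD 8u   /* 2^8 = 256 distinct group IDs */
#define SKETCH_GROUP_BITS_NEW 16u  /* assumed wider field: 65536 group IDs */
#define SKETCH_METRIC_BITS    8u   /* assumed width of the per-group metric index */

/* Pack a group ID and a metric index into one code, keeping only the
 * low group_bits bits of the group ID. */
uint32_t pack_code(uint32_t group, uint32_t metric, unsigned group_bits)
{
    uint32_t group_mask  = (1u << group_bits) - 1u;
    uint32_t metric_mask = (1u << SKETCH_METRIC_BITS) - 1u;
    return ((group & group_mask) << SKETCH_METRIC_BITS) | (metric & metric_mask);
}

/* Recover the group ID from a packed code. */
uint32_t unpack_group(uint32_t code, unsigned group_bits)
{
    return (code >> SKETCH_METRIC_BITS) & ((1u << group_bits) - 1u);
}
```

With the 8-bit field, group 1433 is truncated to 1433 mod 256 = 153, so it becomes indistinguishable from the real group 153. With the assumed 16-bit field, 1433 survives the round trip unchanged.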

The metric group should not be checked when PAPI_add_event() is called.
It should be checked upon PAPI_start() only.

These changes have been tested on the Intel Ponte Vecchio architecture.
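The effect of deferring the check can be modeled with a toy event set. This is a sketch under stated assumptions, not the component's code: the type `toy_eventset` and the functions `toy_add_event` and `toy_start` are hypothetical, and the rule enforced at start (all metrics in one event set must share a group) is an assumption about what the check verifies.

```c
#include <stddef.h>

#define TOY_MAX_METRICS 16

/* Toy event set: records the group ID of each added metric. */
typedef struct {
    unsigned group[TOY_MAX_METRICS];
    size_t   count;
} toy_eventset;

/* Adding never inspects the group; it only fails when the set is full. */
int toy_add_event(toy_eventset *es, unsigned group_id)
{
    if (es->count >= TOY_MAX_METRICS)
        return -1;
    es->group[es->count++] = group_id;
    return 0;
}

/* Assumed rule: starting fails if the set mixes metric groups. */
int toy_start(const toy_eventset *es)
{
    for (size_t i = 1; i < es->count; i++) {
        if (es->group[i] != es->group[0])
            return -1;
    }
    return 0;
}
```

In this model, two event sets that each hold metrics from a different group can both be built and started, because no check at add time compares one set against another. A single set that mixes groups is still accepted at add time and rejected only at start.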

Reset the appropriate fields in the context struct when
PAPI_cleanup_eventset() is called.
This corresponds to intel_gpu_update_control_state() being called with
the value 'count' equal to zero.

Also reset the internal metric counts when PAPI_start() is called, so
that values from a previous run do not carry over into the next one.

These changes have been tested on the Intel Ponte Vecchio architecture.
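The reset behavior described above can be sketched as follows. The struct `sketch_context` and the functions `sketch_update_control_state` and `sketch_start` are hypothetical stand-ins; the real context struct in the intel_gpu component has different fields, and this only models the two resets the commit message describes.

```c
#include <stddef.h>
#include <string.h>

#define SKETCH_MAX_METRICS 16

/* Hypothetical per-event-set context. */
typedef struct {
    unsigned  metric_code[SKETCH_MAX_METRICS];
    long long value[SKETCH_MAX_METRICS];
    size_t    num_metrics;
} sketch_context;

/* Called whenever the event list changes. A count of zero models the
 * cleanup path: every field is reset. */
int sketch_update_control_state(sketch_context *ctx,
                                const unsigned *codes, size_t count)
{
    if (count > SKETCH_MAX_METRICS)
        return -1;
    if (count == 0) {
        memset(ctx, 0, sizeof *ctx);
        return 0;
    }
    memcpy(ctx->metric_code, codes, count * sizeof codes[0]);
    ctx->num_metrics = count;
    return 0;
}

/* Clear accumulated values so a new run starts counting from zero. */
void sketch_start(sketch_context *ctx)
{
    memset(ctx->value, 0, sizeof ctx->value);
}
```

Calling `sketch_update_control_state(&ctx, NULL, 0)` leaves the context empty and ready for reuse, and `sketch_start` discards any values left over from an earlier run while keeping the configured metrics.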
@dbarry9 dbarry9 force-pushed the 2024.09.18_intel-gpu-groups branch from 1942be9 to 0c0ad98 Compare October 9, 2024 18:50
@dbarry9 dbarry9 merged commit 8433c96 into icl-utk-edu:master Oct 9, 2024
26 checks passed
Development

Successfully merging this pull request may close these issues.

intel_gpu: seg fault from second PAPI_stop()
2 participants