enable DirectToVgprA/B step 2 #1204
base: develop
nakajee
commented
Oct 4, 2024
- enable more data types (f8,f64)
- enable DirectToVgprB
- enable DTV + pack
- enable DTV + LocalSplitU
- enable DTV + MatrixInstB
- enable DTV + StreamK
@@ -3432,7 +3444,8 @@ def subCheckLdsBlockSizePerPad(tc, idx):
 state["LdsOffsetB"] = state["LdsOffsetMetadata"] + state["LdsNumElementsAlignedMetadata"]

 offsetBlk = state["LdsOffsetB"] + ldsNumBytesAlignedB
-offsetBlk = int(2**(math.ceil(math.log(offsetBlk, 2))))
+if offsetBlk > 0:
This prevents taking the log of 0 in the DTVA+DTVB case.
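A minimal sketch of the guard, for illustration only. The function name `round_up_to_pow2` is hypothetical and not part of the PR; the rounding expression is the one shown in the hunk. `math.log(0, 2)` raises `ValueError`, so the rounding is skipped when the offset is 0.

```python
import math


def round_up_to_pow2(offset_blk: int) -> int:
    """Hypothetical helper: round a positive offset up to a power of two.

    A value of 0 is returned unchanged, because math.log(0, 2) raises
    ValueError (math domain error).
    """
    if offset_blk > 0:
        # Same expression as in the hunk above, now only reached for offset_blk > 0.
        return int(2 ** (math.ceil(math.log(offset_blk, 2))))
    return offset_blk


if __name__ == "__main__":
    for value in (0, 3, 1000):
        print(value, "->", round_up_to_pow2(value))
```

Without the `if`, calling the rounding expression with an offset of 0 would abort with a `ValueError` instead of producing a block offset.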
@@ -2135,6 +2146,7 @@ def assignDerivedParameters(state):

 if state["StreamK"] != 0:
 state["GlobalSplitU"] = 0 # Cannot enable both Stream-K and GSU
+state["GlobalSplitUAlgorithm"] = "MultipleBuffer" # Set default Algorithm
Combining StreamK with a non-default GlobalSplitUAlgorithm causes a code generation error.
Force GlobalSplitUAlgorithm to MultipleBuffer in the StreamK case.
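The override can be sketched in isolation as follows. This is an illustration only: `apply_streamk_defaults` is a hypothetical name, `state` is a plain dict standing in for the solution state, and the input values in the usage lines are made up for the example.

```python
def apply_streamk_defaults(state: dict) -> dict:
    """Hypothetical helper mirroring the hunk: normalize GSU settings under Stream-K."""
    if state["StreamK"] != 0:
        # Stream-K and GlobalSplitU cannot both be enabled.
        state["GlobalSplitU"] = 0
        # Pin the algorithm to the default so the requested value is ignored.
        state["GlobalSplitUAlgorithm"] = "MultipleBuffer"
    return state


if __name__ == "__main__":
    # Illustrative input, not taken from the PR.
    example = {"StreamK": 1, "GlobalSplitU": 4, "GlobalSplitUAlgorithm": "SingleBuffer"}
    print(apply_streamk_defaults(example))
```

When `StreamK` is 0 the state is left untouched, so non-Stream-K configurations keep whatever GSU algorithm they requested.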
@@ -1916,10 +1922,15 @@ def isDirectToVgprDoable(state, tc):
 return False

 # does not work with UnrollLoopSwapGlobalReadOrder
-if state["UnrollLoopSwapGlobalReadOrder"]>1:
+if state["UnrollLoopSwapGlobalReadOrder"]:
This was a mistake in my previous commit.
Fixed the condition.
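To illustrate why the old condition was wrong, here is a small sketch comparing the two checks. The function names are hypothetical; only the two conditions come from the hunk. With `> 1`, a setting of 1 slips through the check, whereas the truthiness test rejects every nonzero value.

```python
def rejects_dtv_old(swap_order: int) -> bool:
    """Old check: only values greater than 1 disable DirectToVgpr."""
    return swap_order > 1


def rejects_dtv_new(swap_order: int) -> bool:
    """Fixed check: any nonzero value disables DirectToVgpr."""
    return bool(swap_order)


if __name__ == "__main__":
    for value in (0, 1, 2):
        print(value, rejects_dtv_old(value), rejects_dtv_new(value))
```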
@@ -162,7 +162,6 @@ class StateValues:
 lrvwUnrollB: int = 0
 lrvwUnrollMetadata: int = 0 # For Sparse Metadat

-vgprValuDouble: bool = False
vgprValuDouble was never set to true anywhere.
I originally added it in Tensile for DirectToLds.
However, it was only needed to work around a scheduling bug, and it has already been removed from Tensile.
I removed it here as well, to avoid carrying code for an unnecessary feature.
I ran tensilelite tox and hipblaslt-test on a local gfx942 node.
Small update.