Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

enable DirectToVgprA/B step 2 #1204

Open
wants to merge 2 commits into
base: develop
Choose a base branch
from
Open

Conversation

nakajee
Copy link
Contributor

@nakajee nakajee commented Oct 4, 2024

  • enable more data types (f8,f64)
  • enable DirectToVgprB
  • enable DTV + pack
  • enable DTV + LocalSplitU
  • enable DTV + MatrixInstB
  • enable DTV + StreamK

- enable more data types (f8,f64)
- enable DirectToVgprB
- enable DTV + pack
- enable DTV + LocalSplitU
- enable DTV + MatrixInstB
@@ -3432,7 +3444,8 @@ def subCheckLdsBlockSizePerPad(tc, idx):
state["LdsOffsetB"] = state["LdsOffsetMetadata"] + state["LdsNumElementsAlignedMetadata"]

offsetBlk = state["LdsOffsetB"] + ldsNumBytesAlignedB
offsetBlk = int(2**(math.ceil(math.log(offsetBlk, 2))))
if offsetBlk > 0:
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is to prevent log 0 in DTVA+DTVB case.

@@ -2135,6 +2146,7 @@ def assignDerivedParameters(state):

if state["StreamK"] != 0:
state["GlobalSplitU"] = 0 # Cannot enable both Stream-K and GSU
state["GlobalSplitUAlgorithm"] = "MultipleBuffer" # Set default Algorithm
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

StreamK + GlobalSplitUAlgorithm casuses code generation error.
Force to set MultipleBuffer in StreamK case.

@@ -1916,10 +1922,15 @@ def isDirectToVgprDoable(state, tc):
return False

# does not work with UnrollLoopSwapGlobalReadOrder
if state["UnrollLoopSwapGlobalReadOrder"]>1:
if state["UnrollLoopSwapGlobalReadOrder"]:
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was a mistake in my previous commit.
Fixed the condition.

@@ -162,7 +162,6 @@ class StateValues:
lrvwUnrollB: int = 0
lrvwUnrollMetadata: int = 0 # For Sparse Metadat

vgprValuDouble: bool = False
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

vgprValuDouble was not set to true anywhere.
This was originally added by me in Tensile for DirectToLds.
However, this was necessary due to some scheduling bug and this was already removed in Tensile.
I removed it this time (to avoid adding code for unnecessary feature).

@nakajee
Copy link
Contributor Author

nakajee commented Oct 4, 2024

I ran tensilelite tox and hipblaslt-test on local gfx942 node.
All passed.

@nakajee
Copy link
Contributor Author

nakajee commented Oct 4, 2024

Small update.

  • numBytes * VW < 4 should be in if TLU
  • DTV + packing requires ClusterLocalRead. Added code to set 1

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
gfx94x Run CI on gfx94x
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant