Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

split local write instructions. #1180

Open
wants to merge 1 commit into
base: develop
Choose a base branch
from
Open

Conversation

hcman2
Copy link
Contributor

@hcman2 hcman2 commented Sep 30, 2024

ds_store_b128 takes more cycles then 1 mfma latency.
Split b128 into b32's and schedule into different mfma if ppossible.

@jichangjichang
Copy link
Collaborator

would this cause LDS bank conflict?

@hcman2
Copy link
Contributor Author

hcman2 commented Sep 30, 2024

would this cause LDS bank conflict?
I don't think this patch will change the LDS bank behavior.
ds_store_b128 addr, vgpr[0:3], offset:0
is changed into
ds_store_b32 addr, vgpr[0], offset:0
ds_store_b32 addr, vgpr[1], offset:4
ds_store_b32 addr, vgpr[2], offset:8
ds_store_b32 addr, vgpr[3], offset:12

if ds_store_b128 has no bank conflict, ds_store_b32 should has no bank conflict either.

@hcman2
Copy link
Contributor Author

hcman2 commented Sep 30, 2024

[----------] Global test environment tear-down
[==========] 48206 tests from 13 test suites ran. (1332135 ms total)
[ PASSED ] 48206 tests.

@hcman2 hcman2 added the gfx94x Run CI on gfx94x label Sep 30, 2024
@jichangjichang
Copy link
Collaborator

would this cause LDS bank conflict?
I don't think this patch will change the LDS bank behavior.
ds_store_b128 addr, vgpr[0:3], offset:0
is changed into
ds_store_b32 addr, vgpr[0], offset:0
ds_store_b32 addr, vgpr[1], offset:4
ds_store_b32 addr, vgpr[2], offset:8
ds_store_b32 addr, vgpr[3], offset:12

if ds_store_b128 has no bank conflict, ds_store_b32 should has no bank conflict either.

ds_store_b32 operations are issued by 32 threads at the same cycle.
ds_store_b128 operations are issued by 8 threads only at the same cycle.
This PR should change the access pattern of LDS banks for each cycle.

@hcman2
Copy link
Contributor Author

hcman2 commented Oct 1, 2024

gfx90a passed
[----------] Global test environment tear-down
[==========] 13061 tests from 13 test suites ran. (480980 ms total)
[ PASSED ] 13061 tests.

@hcman2
Copy link
Contributor Author

hcman2 commented Oct 1, 2024

========== 76 passed, 32 skipped, 1258 warnings in 8858.51s (2:27:38) ==========
gfx94x tox passed.

@hcman2
Copy link
Contributor Author

hcman2 commented Oct 4, 2024

If we split a b128 into 4 x b32. We will have default 4-way bank conflict.
The extra cycles for 1 4-way bank conflict b32 instruction are 2, 1 for addressing and 1 for data.
The basic ds_store_b32 is about 56 cycles so we have no instruction stall (less then 4 x mfma).
Considering the barrier, we have to make sure that the final LW has more then 56+2 cycles to be waited.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
gfx94x Run CI on gfx94x
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants