Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[LLPC] Scalarize non-uniform loads inside the waterfall loop #2759

Closed

Conversation

kmitropoulou
Copy link
Contributor

@kmitropoulou kmitropoulou commented Oct 12, 2023

This patch refactors the code that pushes the loads inside the waterfall loop:

  1. removes the optimization for scalarized loads
  2. scalarize non-uniform instructions when it is possible.

@jayfoad
Copy link
Member

jayfoad commented Oct 12, 2023

Can you add a description please? Is this PR identical to #2645 or have you fixed something?

@kmitropoulou
Copy link
Contributor Author

kmitropoulou commented Oct 12, 2023

Yes, this patch is an update for #2645

@amdvlk-admin
Copy link
Collaborator

Test summary for commit 7033caf

CTS tests (Failed: 31/139021)
  • Built with version 1.3.5.2
  • Ubuntu navi3x, Srdcvk
    • Passed: 41128/69517 (59.2%)
    • Failed: 16/69517 (0.0%)

      Failures:

      FAILURE: dEQP-VK.descriptor_indexing.sampled_image
      Stack trace: Fail
      
      FAILURE: dEQP-VK.descriptor_indexing.sampled_image_after_bind
      Stack trace: Fail
      
      FAILURE: dEQP-VK.descriptor_indexing.sampled_image_after_bind_in_loop
      Stack trace: Flake
      
      FAILURE: dEQP-VK.descriptor_indexing.sampled_image_after_bind_in_loop_with_lod
      Stack trace: Fail
      
      FAILURE: dEQP-VK.descriptor_indexing.sampled_image_after_bind_with_lod
      Stack trace: Fail
      
      FAILURE: dEQP-VK.descriptor_indexing.sampled_image_in_loop
      Stack trace: Fail
      
      FAILURE: dEQP-VK.descriptor_indexing.sampled_image_in_loop_with_lod
      Stack trace: Fail
      
      FAILURE: dEQP-VK.descriptor_indexing.sampled_image_with_lod
      Stack trace: Fail
      
      FAILURE: dEQP-VK.descriptor_indexing.sampler
      Stack trace: Fail
      
      FAILURE: dEQP-VK.descriptor_indexing.sampler_after_bind
      Stack trace: Fail
      
      FAILURE: dEQP-VK.descriptor_indexing.sampler_after_bind_in_loop
      Stack trace: Fail
      
      FAILURE: dEQP-VK.descriptor_indexing.sampler_after_bind_in_loop_with_lod
      Stack trace: Fail
      
      FAILURE: dEQP-VK.descriptor_indexing.sampler_after_bind_with_lod
      Stack trace: Fail
      
      FAILURE: dEQP-VK.descriptor_indexing.sampler_in_loop
      Stack trace: Fail
      
      FAILURE: dEQP-VK.descriptor_indexing.sampler_in_loop_with_lod
      Stack trace: Fail
      
      FAILURE: dEQP-VK.descriptor_indexing.sampler_with_lod
      Stack trace: Fail
      

    • Not Supported: 28373/69517 (40.8%)
    • Warnings: 0/69517 (0.0%)
    Ubuntu navi2x, Srdcvk
    • Passed: 41163/69504 (59.2%)
    • Failed: 15/69504 (0.0%)

      Failures:

      FAILURE: dEQP-VK.descriptor_indexing.sampled_image
      Stack trace: Fail
      
      FAILURE: dEQP-VK.descriptor_indexing.sampled_image_after_bind
      Stack trace: Fail
      
      FAILURE: dEQP-VK.descriptor_indexing.sampled_image_after_bind_in_loop_with_lod
      Stack trace: Fail
      
      FAILURE: dEQP-VK.descriptor_indexing.sampled_image_after_bind_with_lod
      Stack trace: Fail
      
      FAILURE: dEQP-VK.descriptor_indexing.sampled_image_in_loop
      Stack trace: Fail
      
      FAILURE: dEQP-VK.descriptor_indexing.sampled_image_in_loop_with_lod
      Stack trace: Fail
      
      FAILURE: dEQP-VK.descriptor_indexing.sampled_image_with_lod
      Stack trace: Fail
      
      FAILURE: dEQP-VK.descriptor_indexing.sampler
      Stack trace: Fail
      
      FAILURE: dEQP-VK.descriptor_indexing.sampler_after_bind
      Stack trace: Fail
      
      FAILURE: dEQP-VK.descriptor_indexing.sampler_after_bind_in_loop
      Stack trace: Fail
      
      FAILURE: dEQP-VK.descriptor_indexing.sampler_after_bind_in_loop_with_lod
      Stack trace: Fail
      
      FAILURE: dEQP-VK.descriptor_indexing.sampler_after_bind_with_lod
      Stack trace: Fail
      
      FAILURE: dEQP-VK.descriptor_indexing.sampler_in_loop
      Stack trace: Fail
      
      FAILURE: dEQP-VK.descriptor_indexing.sampler_in_loop_with_lod
      Stack trace: Fail
      
      FAILURE: dEQP-VK.descriptor_indexing.sampler_with_lod
      Stack trace: Fail
      

    • Not Supported: 28326/69504 (40.8%)
    • Warnings: 0/69504 (0.0%)

@kmitropoulou
Copy link
Contributor Author

test this please

@perlfu
Copy link
Contributor

perlfu commented Oct 12, 2023

Technically this is a revert for #2705 as well?
As it brings back the optional_bool parts, etc.

At a high level, GFX 10.3.2 codegen is just modifying the image descriptor after it is loaded as a workaround.
This code seems to explicitly match the sequence of that modification?
Is it possible to make the code more generic so it would be robust to changes in that modification sequence?

@kmitropoulou
Copy link
Contributor Author

Is it possible to make the code more generic so it would be robust to changes in that modification sequence?

As we discussed, this was my intention. Any suggestions are welcome :)

@kmitropoulou
Copy link
Contributor Author

Technically this is a revert for #2705 as well?

#2645 should have never been committed. So, I had to revert it in #2705.

@amdvlk-admin
Copy link
Collaborator

Test summary for commit 7033caf

CTS tests (Failed: 31/139021)
  • Built with version 1.3.5.2
  • Ubuntu navi3x, Srdcvk
    • Passed: 41129/69517 (59.2%)
    • Failed: 15/69517 (0.0%)

      Failures:

      FAILURE: dEQP-VK.descriptor_indexing.sampled_image
      Stack trace: Fail
      
      FAILURE: dEQP-VK.descriptor_indexing.sampled_image_after_bind
      Stack trace: Fail
      
      FAILURE: dEQP-VK.descriptor_indexing.sampled_image_after_bind_in_loop_with_lod
      Stack trace: Fail
      
      FAILURE: dEQP-VK.descriptor_indexing.sampled_image_after_bind_with_lod
      Stack trace: Fail
      
      FAILURE: dEQP-VK.descriptor_indexing.sampled_image_in_loop
      Stack trace: Fail
      
      FAILURE: dEQP-VK.descriptor_indexing.sampled_image_in_loop_with_lod
      Stack trace: Fail
      
      FAILURE: dEQP-VK.descriptor_indexing.sampled_image_with_lod
      Stack trace: Fail
      
      FAILURE: dEQP-VK.descriptor_indexing.sampler
      Stack trace: Fail
      
      FAILURE: dEQP-VK.descriptor_indexing.sampler_after_bind
      Stack trace: Fail
      
      FAILURE: dEQP-VK.descriptor_indexing.sampler_after_bind_in_loop
      Stack trace: Fail
      
      FAILURE: dEQP-VK.descriptor_indexing.sampler_after_bind_in_loop_with_lod
      Stack trace: Fail
      
      FAILURE: dEQP-VK.descriptor_indexing.sampler_after_bind_with_lod
      Stack trace: Fail
      
      FAILURE: dEQP-VK.descriptor_indexing.sampler_in_loop
      Stack trace: Fail
      
      FAILURE: dEQP-VK.descriptor_indexing.sampler_in_loop_with_lod
      Stack trace: Fail
      
      FAILURE: dEQP-VK.descriptor_indexing.sampler_with_lod
      Stack trace: Fail
      

    • Not Supported: 28373/69517 (40.8%)
    • Warnings: 0/69517 (0.0%)
    Ubuntu navi2x, Srdcvk
    • Passed: 41162/69504 (59.2%)
    • Failed: 16/69504 (0.0%)

      Failures:

      FAILURE: dEQP-VK.descriptor_indexing.sampled_image
      Stack trace: Fail
      
      FAILURE: dEQP-VK.descriptor_indexing.sampled_image_after_bind
      Stack trace: Flake
      
      FAILURE: dEQP-VK.descriptor_indexing.sampled_image_after_bind_in_loop
      Stack trace: Fail
      
      FAILURE: dEQP-VK.descriptor_indexing.sampled_image_after_bind_in_loop_with_lod
      Stack trace: Fail
      
      FAILURE: dEQP-VK.descriptor_indexing.sampled_image_after_bind_with_lod
      Stack trace: Fail
      
      FAILURE: dEQP-VK.descriptor_indexing.sampled_image_in_loop
      Stack trace: Fail
      
      FAILURE: dEQP-VK.descriptor_indexing.sampled_image_in_loop_with_lod
      Stack trace: Fail
      
      FAILURE: dEQP-VK.descriptor_indexing.sampled_image_with_lod
      Stack trace: Fail
      
      FAILURE: dEQP-VK.descriptor_indexing.sampler
      Stack trace: Fail
      
      FAILURE: dEQP-VK.descriptor_indexing.sampler_after_bind
      Stack trace: Fail
      
      FAILURE: dEQP-VK.descriptor_indexing.sampler_after_bind_in_loop
      Stack trace: Fail
      
      FAILURE: dEQP-VK.descriptor_indexing.sampler_after_bind_in_loop_with_lod
      Stack trace: Fail
      
      FAILURE: dEQP-VK.descriptor_indexing.sampler_after_bind_with_lod
      Stack trace: Fail
      
      FAILURE: dEQP-VK.descriptor_indexing.sampler_in_loop
      Stack trace: Fail
      
      FAILURE: dEQP-VK.descriptor_indexing.sampler_in_loop_with_lod
      Stack trace: Fail
      
      FAILURE: dEQP-VK.descriptor_indexing.sampler_with_lod
      Stack trace: Fail
      

    • Not Supported: 28326/69504 (40.8%)
    • Warnings: 0/69504 (0.0%)

Copy link
Member

@nhaehnle nhaehnle left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The use of matching feels sort of backwards. If we know the code sequences that should go into the waterfall loops, shouldn't we create them like that directly? I'll try to see if I can expand on this, but I think the gist of it is that we should put the waterfall loop creation upside-down: instead of creating the image instruction and then calling createWaterfallLoop, we should call createWaterfallLoop first, and then insert instructions into the loop.

Value *OrigLoad = nullptr;
Value *OrigAnd = nullptr;
Value *OrigGep = nullptr;
Value *OrigExtract = nullptr;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Coding style in LLPC is to use lowerCamelCase.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done :)

Copy link
Member

@nhaehnle nhaehnle left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I remember now what the difficulty was with just creating waterfall loops directly. The descriptors load instructions are created separately from when the actual image access instructions are created. I wonder whether we shouldn't change this.

Copy link
Member

@nhaehnle nhaehnle left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suppose with some fixes this PR can be at least made correct.

Comment on lines 61 to 65
llvm::Value *emitWaterfallBegin(llvm::Instruction *nonUniformInst, llvm::ArrayRef<unsigned> operandIdxs,
llvm::ArrayRef<llvm::Value *> nonUniformIndices, bool useVgprForOperands = false,
const llvm::Twine &instName = "");

llvm::Value *emitWaterfallBeginForScalarizedLoops(llvm::Instruction *nonUniformInst,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These two methods should be private.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done :)

if (nonUniformIndices.empty())
return nonUniformInst;

if (nonUniformInst->getType()->isVoidTy())
scalarizeDescriptorLoads = false;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am sorry I missed this. This is for the case that we have the following intrinsic: @llvm.amdgcn.struct.buffer.store.format.v4f32()

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you please comment on that?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This still doesn't make sense though. The "scalarization" is a function of the descriptor inputs to the intrinsic, not a function of its output. This should be removed.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I remove it and I added two tests : scalarizationOfDescriptorLoadTest1.ll and scalarizationOfDescriptorLoadTest2.ll

newLoad->insertBefore(nonUniformInst);
nonUniformInst->setOperand(operandIdx, newLoad);
newGep->insertBefore(newLoad);
newGep->setOperand(1, readFirstLane);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Need to check that operand 1 is actually the firstIndexInst.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should not do it in the new version of the code.

auto *loadOp = dyn_cast<LoadInst>(OrigLoad);
auto *gepOp = dyn_cast<GetElementPtrInst>(OrigGep);
auto *AndOp = dyn_cast<Instruction>(OrigAnd);
auto *extractOp = dyn_cast<ExtractElementInst>(OrigExtract);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These should all just be cast. No reason to use dyn_cast after previously doing all the match checks.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for pointing out :) Not needed any more.

@nhaehnle
Copy link
Member

Yet another follow-up to this... wouldn't it be possible to collect the relevant instructions in traceNonUniformIndex and then optionally clone them into the waterfall loop in a generic fashion? That seems preferable over hard-coding the specific code sequences.

@perlfu
Copy link
Contributor

perlfu commented Oct 13, 2023

Yet another follow-up to this... wouldn't it be possible to collect the relevant instructions in traceNonUniformIndex and then optionally clone them into the waterfall loop in a generic fashion? That seems preferable over hard-coding the specific code sequences.

I agree capturing the visited instruction chain from traceNonUniformIndex seems like a good way to avoid this being brittle.

Longer term I have been suggesting that we move this lowering to separate pass later in the compiler and simply annotate the output of non-uniform instructions as %val = call @lgc.create.waterfall(%ret) or similar.
Your point about earlier parts of the compiler knowing potentially knowing which instructions are non-uniform is valid.
We could be preserving those annotations out of the SPIRVReader for use in later lowering -- although I suspect in some cases it might not tell us much more than operandIdxs does already.

@kmitropoulou
Copy link
Contributor Author

@nhaehnle : Thank you for the suggestion. I will try to capture the instruction chain from traceNonUniformIndex .

@@ -35,9 +35,11 @@
#include "lgc/state/TargetInfo.h"
#include "llvm/IR/IntrinsicInst.h"
#include "llvm/IR/IntrinsicsAMDGPU.h"
#include "llvm/IR/PatternMatch.h"
Copy link
Contributor

@tsymalla-AMD tsymalla-AMD Oct 16, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you use the namespace directly into the method emitWaterfallBeginForScalarizedLoops?

@kmitropoulou
Copy link
Contributor Author

@tsymalla-AMD : I removed all the pattern match code :)

@amdvlk-admin
Copy link
Collaborator

Test summary for commit 23cd027

CTS tests (Failed: 32/138994)
  • Built with version 1.3.5.2
  • Ubuntu navi3x, Srdcvk
    • Passed: 41128/69503 (59.2%)
    • Failed: 16/69503 (0.0%)

      Failures:

      FAILURE: dEQP-VK.descriptor_indexing.sampled_image
      Stack trace: Fail
      
      FAILURE: dEQP-VK.descriptor_indexing.sampled_image_after_bind
      Stack trace: Fail
      
      FAILURE: dEQP-VK.descriptor_indexing.sampled_image_after_bind_in_loop
      Stack trace: Fail
      
      FAILURE: dEQP-VK.descriptor_indexing.sampled_image_after_bind_in_loop_with_lod
      Stack trace: Fail
      
      FAILURE: dEQP-VK.descriptor_indexing.sampled_image_after_bind_with_lod
      Stack trace: Fail
      
      FAILURE: dEQP-VK.descriptor_indexing.sampled_image_in_loop
      Stack trace: Fail
      
      FAILURE: dEQP-VK.descriptor_indexing.sampled_image_in_loop_with_lod
      Stack trace: Fail
      
      FAILURE: dEQP-VK.descriptor_indexing.sampled_image_with_lod
      Stack trace: Fail
      
      FAILURE: dEQP-VK.descriptor_indexing.sampler
      Stack trace: Fail
      
      FAILURE: dEQP-VK.descriptor_indexing.sampler_after_bind
      Stack trace: Fail
      
      FAILURE: dEQP-VK.descriptor_indexing.sampler_after_bind_in_loop
      Stack trace: Fail
      
      FAILURE: dEQP-VK.descriptor_indexing.sampler_after_bind_in_loop_with_lod
      Stack trace: Fail
      
      FAILURE: dEQP-VK.descriptor_indexing.sampler_after_bind_with_lod
      Stack trace: Fail
      
      FAILURE: dEQP-VK.descriptor_indexing.sampler_in_loop
      Stack trace: Fail
      
      FAILURE: dEQP-VK.descriptor_indexing.sampler_in_loop_with_lod
      Stack trace: Fail
      
      FAILURE: dEQP-VK.descriptor_indexing.sampler_with_lod
      Stack trace: Fail
      

    • Not Supported: 28359/69503 (40.8%)
    • Warnings: 0/69503 (0.0%)
    Ubuntu navi2x, Srdcvk
    • Passed: 41162/69491 (59.2%)
    • Failed: 16/69491 (0.0%)

      Failures:

      FAILURE: dEQP-VK.descriptor_indexing.sampled_image
      Stack trace: Fail
      
      FAILURE: dEQP-VK.descriptor_indexing.sampled_image_after_bind
      Stack trace: Fail
      
      FAILURE: dEQP-VK.descriptor_indexing.sampled_image_after_bind_in_loop
      Stack trace: Fail
      
      FAILURE: dEQP-VK.descriptor_indexing.sampled_image_after_bind_in_loop_with_lod
      Stack trace: Fail
      
      FAILURE: dEQP-VK.descriptor_indexing.sampled_image_after_bind_with_lod
      Stack trace: Fail
      
      FAILURE: dEQP-VK.descriptor_indexing.sampled_image_in_loop
      Stack trace: Fail
      
      FAILURE: dEQP-VK.descriptor_indexing.sampled_image_in_loop_with_lod
      Stack trace: Fail
      
      FAILURE: dEQP-VK.descriptor_indexing.sampled_image_with_lod
      Stack trace: Fail
      
      FAILURE: dEQP-VK.descriptor_indexing.sampler
      Stack trace: Fail
      
      FAILURE: dEQP-VK.descriptor_indexing.sampler_after_bind
      Stack trace: Fail
      
      FAILURE: dEQP-VK.descriptor_indexing.sampler_after_bind_in_loop
      Stack trace: Fail
      
      FAILURE: dEQP-VK.descriptor_indexing.sampler_after_bind_in_loop_with_lod
      Stack trace: Fail
      
      FAILURE: dEQP-VK.descriptor_indexing.sampler_after_bind_with_lod
      Stack trace: Fail
      
      FAILURE: dEQP-VK.descriptor_indexing.sampler_in_loop
      Stack trace: Fail
      
      FAILURE: dEQP-VK.descriptor_indexing.sampler_in_loop_with_lod
      Stack trace: Fail
      
      FAILURE: dEQP-VK.descriptor_indexing.sampler_with_lod
      Stack trace: Fail
      

    • Not Supported: 28313/69491 (40.7%)
    • Warnings: 0/69491 (0.0%)

Copy link
Contributor

@tsymalla-AMD tsymalla-AMD left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like some CTS are failing, but I did not check if those are related.

lgc/builder/BuilderImpl.cpp Outdated Show resolved Hide resolved
lgc/builder/BuilderImpl.cpp Outdated Show resolved Hide resolved
lgc/builder/BuilderImpl.cpp Outdated Show resolved Hide resolved
lgc/builder/BuilderImpl.cpp Outdated Show resolved Hide resolved
lgc/builder/BuilderImpl.cpp Outdated Show resolved Hide resolved
lgc/builder/BuilderImpl.cpp Outdated Show resolved Hide resolved
lgc/builder/BuilderImpl.cpp Outdated Show resolved Hide resolved
lgc/builder/BuilderImpl.cpp Outdated Show resolved Hide resolved
lgc/builder/BuilderImpl.cpp Outdated Show resolved Hide resolved
lgc/builder/BuilderImpl.cpp Outdated Show resolved Hide resolved
@amdvlk-admin
Copy link
Collaborator

Test summary for commit d180bcf

CTS tests (Failed: 32/138994)
  • Built with version 1.3.5.2
  • Ubuntu navi3x, Srdcvk
    • Passed: 41128/69503 (59.2%)
    • Failed: 16/69503 (0.0%)

      Failures:

      FAILURE: dEQP-VK.descriptor_indexing.sampled_image
      Stack trace: Fail
      
      FAILURE: dEQP-VK.descriptor_indexing.sampled_image_after_bind
      Stack trace: Fail
      
      FAILURE: dEQP-VK.descriptor_indexing.sampled_image_after_bind_in_loop
      Stack trace: Fail
      
      FAILURE: dEQP-VK.descriptor_indexing.sampled_image_after_bind_in_loop_with_lod
      Stack trace: Fail
      
      FAILURE: dEQP-VK.descriptor_indexing.sampled_image_after_bind_with_lod
      Stack trace: Fail
      
      FAILURE: dEQP-VK.descriptor_indexing.sampled_image_in_loop
      Stack trace: Fail
      
      FAILURE: dEQP-VK.descriptor_indexing.sampled_image_in_loop_with_lod
      Stack trace: Fail
      
      FAILURE: dEQP-VK.descriptor_indexing.sampled_image_with_lod
      Stack trace: Fail
      
      FAILURE: dEQP-VK.descriptor_indexing.sampler
      Stack trace: Fail
      
      FAILURE: dEQP-VK.descriptor_indexing.sampler_after_bind
      Stack trace: Fail
      
      FAILURE: dEQP-VK.descriptor_indexing.sampler_after_bind_in_loop
      Stack trace: Fail
      
      FAILURE: dEQP-VK.descriptor_indexing.sampler_after_bind_in_loop_with_lod
      Stack trace: Fail
      
      FAILURE: dEQP-VK.descriptor_indexing.sampler_after_bind_with_lod
      Stack trace: Fail
      
      FAILURE: dEQP-VK.descriptor_indexing.sampler_in_loop
      Stack trace: Fail
      
      FAILURE: dEQP-VK.descriptor_indexing.sampler_in_loop_with_lod
      Stack trace: Fail
      
      FAILURE: dEQP-VK.descriptor_indexing.sampler_with_lod
      Stack trace: Fail
      

    • Not Supported: 28359/69503 (40.8%)
    • Warnings: 0/69503 (0.0%)
    Ubuntu navi2x, Srdcvk
    • Passed: 41162/69491 (59.2%)
    • Failed: 16/69491 (0.0%)

      Failures:

      FAILURE: dEQP-VK.descriptor_indexing.sampled_image
      Stack trace: Fail
      
      FAILURE: dEQP-VK.descriptor_indexing.sampled_image_after_bind
      Stack trace: Fail
      
      FAILURE: dEQP-VK.descriptor_indexing.sampled_image_after_bind_in_loop
      Stack trace: Fail
      
      FAILURE: dEQP-VK.descriptor_indexing.sampled_image_after_bind_in_loop_with_lod
      Stack trace: Fail
      
      FAILURE: dEQP-VK.descriptor_indexing.sampled_image_after_bind_with_lod
      Stack trace: Fail
      
      FAILURE: dEQP-VK.descriptor_indexing.sampled_image_in_loop
      Stack trace: Fail
      
      FAILURE: dEQP-VK.descriptor_indexing.sampled_image_in_loop_with_lod
      Stack trace: Fail
      
      FAILURE: dEQP-VK.descriptor_indexing.sampled_image_with_lod
      Stack trace: Fail
      
      FAILURE: dEQP-VK.descriptor_indexing.sampler
      Stack trace: Fail
      
      FAILURE: dEQP-VK.descriptor_indexing.sampler_after_bind
      Stack trace: Fail
      
      FAILURE: dEQP-VK.descriptor_indexing.sampler_after_bind_in_loop
      Stack trace: Fail
      
      FAILURE: dEQP-VK.descriptor_indexing.sampler_after_bind_in_loop_with_lod
      Stack trace: Fail
      
      FAILURE: dEQP-VK.descriptor_indexing.sampler_after_bind_with_lod
      Stack trace: Fail
      
      FAILURE: dEQP-VK.descriptor_indexing.sampler_in_loop
      Stack trace: Fail
      
      FAILURE: dEQP-VK.descriptor_indexing.sampler_in_loop_with_lod
      Stack trace: Fail
      
      FAILURE: dEQP-VK.descriptor_indexing.sampler_with_lod
      Stack trace: Fail
      

    • Not Supported: 28313/69491 (40.7%)
    • Warnings: 0/69491 (0.0%)

@amdvlk-admin
Copy link
Collaborator

Test summary for commit 43a7a13

CTS tests (Failed: 0/138443)
  • Built with version 1.3.5.2
  • Ubuntu navi3x, Srdcvk
    • Passed: 35211/69228 (50.9%)
    • Failed: 0/69228 (0.0%)
    • Not Supported: 34017/69228 (49.1%)
    • Warnings: 0/69228 (0.0%)
    Ubuntu navi2x, Srdcvk
    • Passed: 35242/69215 (50.9%)
    • Failed: 0/69215 (0.0%)
    • Not Supported: 33973/69215 (49.1%)
    • Warnings: 0/69215 (0.0%)

// Initialization of instrToIndex and indexToInstr.
if (scalarizeDescriptorLoads) {
unsigned cnt = 0;
for (Instruction *I = nonUniformInst->getPrevNode(); I != nullptr && cnt < 64; I = I->getPrevNode(), ++cnt) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the maximum number of steps something we should parametrize (or at least provide it as easily parametrizable value)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

return;
}
assert(instrDeps.contains(currentVisitedInsn) && "The current visited instruction should have been in the map.");
auto it1 = instrDeps.try_emplace(newValue).first;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you use a structured binding so this is a bit cleaner?

Copy link
Contributor Author

@kmitropoulou kmitropoulou Nov 20, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure if it is needed because try_empace() returns a pair of an interator and bool. The latter is true if the newValue was inserted in the map and it is false if it was already existed.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, but you can check for the boolean value directly. However, it is just personal taste.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not using the boolean. That's why I did not add it.

auto &setOfInsns = it1->second;
auto it2 = instrDeps.find(currentVisitedInsn);
const auto &set = it2->second;
for (auto it3 = set.begin(indexToInstr), ite = set.end(indexToInstr); it3 != ite; ++it3) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you use a range based for loop instead?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did not do it because begin() and end() of TinyInstructionSet have an argument. Please let me know if there is a way to do it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see, you originally stored indexToInstr as a member reference, right? With that approach, it would have worked

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Exactly :)

Copy link
Contributor

@tsymalla tsymalla Nov 22, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about not storing the reference to indexToInstr and instead add a separate member function that returns an iterator_range on which you can iterate?

iterator_range<IteratorT> getIteratorRange(IndexToInstrMap &indexToInstr) {
return iterator_range(set.begin(indexToInstr), set.end(indexToInstr));
}

That should allow you to make use of the range based for loop and not increase the storage size of the TinyInstructionSet?

@kmitropoulou kmitropoulou changed the title [WIP][LLPC] Scalarize non-uniform loads inside the waterfall loop [LLPC] Scalarize non-uniform loads inside the waterfall loop Nov 20, 2023
@amdvlk-admin
Copy link
Collaborator

Test summary for commit bbe0a57

CTS tests (Failed: 0/138443)
  • Built with version 1.3.5.2
  • Ubuntu navi3x, Srdcvk
    • Passed: 35211/69228 (50.9%)
    • Failed: 0/69228 (0.0%)
    • Not Supported: 34017/69228 (49.1%)
    • Warnings: 0/69228 (0.0%)
    Ubuntu navi2x, Srdcvk
    • Passed: 35242/69215 (50.9%)
    • Failed: 0/69215 (0.0%)
    • Not Supported: 33973/69215 (49.1%)
    • Warnings: 0/69215 (0.0%)

@kmitropoulou kmitropoulou force-pushed the scalarizeloads branch 2 times, most recently from 504d64b to 8179bc4 Compare November 20, 2023 21:46
@amdvlk-admin
Copy link
Collaborator

Test summary for commit 8179bc4

CTS tests (Failed: 0/138378)
  • Built with version 1.3.5.2
  • Ubuntu navi3x, Srdcvk
    • Passed: 35162/69163 (50.8%)
    • Failed: 0/69163 (0.0%)
    • Not Supported: 34001/69163 (49.2%)
    • Warnings: 0/69163 (0.0%)
    Ubuntu navi2x, Srdcvk
    • Passed: 35242/69215 (50.9%)
    • Failed: 0/69215 (0.0%)
    • Not Supported: 33973/69215 (49.1%)
    • Warnings: 0/69215 (0.0%)

@amdvlk-admin
Copy link
Collaborator

Test summary for commit 504d64b

CTS tests (Failed: 0/138443)
  • Built with version 1.3.5.2
  • Ubuntu navi3x, Srdcvk
    • Passed: 35211/69228 (50.9%)
    • Failed: 0/69228 (0.0%)
    • Not Supported: 34017/69228 (49.1%)
    • Warnings: 0/69228 (0.0%)
    Ubuntu navi2x, Srdcvk
    • Passed: 35241/69215 (50.9%)
    • Failed: 0/69215 (0.0%)
    • Not Supported: 33974/69215 (49.1%)
    • Warnings: 0/69215 (0.0%)

void TraceNonUniformIndex::insertNewValueInInstrDeps(Value *newValue, Instruction *currentVisitedInstr) {
if (!instrToIndex.contains(currentVisitedInstr)) {
// The instruction is either outside of 64 limit or in a different basic block. So, we bail-out scalarization.
scalarizeDescriptorLoads = false;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems setting TraceNonUniformIndex.scalarizeDescriptorLoads to false here means that the next time TraceNonUniformIndex::run is invoked it will not try to scalarize. That means if there are two non-uniform operands of, say, image.sample, if the first one could not be scalarized, then the second one would not be either.

Copy link
Contributor Author

@kmitropoulou kmitropoulou Nov 22, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is how the existing code (before my changes) works. I am working to change it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fair enough. Can you extend the comment to make it clear that this will also affect scalarization of other operands?

Implementation-wise I guess it is fine to address it in a follow-up commit to keep the scope of the changes smaller.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done in #2859


void TraceNonUniformIndex::insertNewValueInInstrDeps(Value *newValue, Instruction *currentVisitedInstr) {
if (!instrToIndex.contains(currentVisitedInstr)) {
// The instruction is either outside of 64 limit or in a different basic block. So, we bail-out scalarization.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

or in a different basic block

I know this is a part of the design of this patch, but do we really have to give up the scalarization if an instruction is in a different block?

A descriptor load in a different basic block than the image op is quite common, and with the changes here we would stop scalarizing it.

Copy link
Contributor Author

@kmitropoulou kmitropoulou Nov 22, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is not part of the design of this patch. This is how the existing code (before my changes) works because there is not backend support for it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, sorry, I misspoke - this is not really related to your patch.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done in #2877

Copy link
Member

@nhaehnle nhaehnle left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the code changes are good. I think Piotr's points are fair but let's address them in separate PRs that you can work on while we're working out the last remaining kinks here.

There's still a thing I noticed inline, and the LGC tests ought to be simplified. As-is, they're touching a bunch of unrelated things like reading VS inputs. Please take a look at other tests, especially in lgc/test/Transforms/{Combine,Lower}CooperativeMatrix for examples of tests that are focused on shorter instruction sequences.

lgc/include/lgc/builder/BuilderImpl.h Outdated Show resolved Hide resolved
@kmitropoulou
Copy link
Contributor Author

@nhaehnle : I am working on refactoring the tests.

@amdvlk-admin
Copy link
Collaborator

Test summary for commit 627712f

CTS tests (Failed: 0/138443)
  • Built with version 1.3.5.2
  • Ubuntu navi3x, Srdcvk
    • Passed: 35211/69228 (50.9%)
    • Failed: 0/69228 (0.0%)
    • Not Supported: 34017/69228 (49.1%)
    • Warnings: 0/69228 (0.0%)
    Ubuntu navi2x, Srdcvk
    • Passed: 35242/69215 (50.9%)
    • Failed: 0/69215 (0.0%)
    • Not Supported: 33973/69215 (49.1%)
    • Warnings: 0/69215 (0.0%)

@kmitropoulou
Copy link
Contributor Author

@nhaehnle : I am working on refactoring the tests.

Done :)

@amdvlk-admin
Copy link
Collaborator

Test summary for commit 195b936

CTS tests (Failed: 0/138443)
  • Built with version 1.3.5.2
  • Ubuntu navi3x, Srdcvk
    • Passed: 35211/69228 (50.9%)
    • Failed: 0/69228 (0.0%)
    • Not Supported: 34017/69228 (49.1%)
    • Warnings: 0/69228 (0.0%)
    Ubuntu navi2x, Srdcvk
    • Passed: 35242/69215 (50.9%)
    • Failed: 0/69215 (0.0%)
    • Not Supported: 33973/69215 (49.1%)
    • Warnings: 0/69215 (0.0%)

@kmitropoulou
Copy link
Contributor Author

I refactored the tests as much as I could.

@kmitropoulou
Copy link
Contributor Author

ping

1 similar comment
@kmitropoulou
Copy link
Contributor Author

ping

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants