Skip to content

Commit

Permalink
Implement shader table patching; Fix issues
Browse files Browse the repository at this point in the history
  • Loading branch information
nyorain committed Dec 20, 2024
1 parent c21587b commit 8868ba5
Show file tree
Hide file tree
Showing 26 changed files with 15,229 additions and 7,837 deletions.
26 changes: 20 additions & 6 deletions docs/own/todo.md
Original file line number Diff line number Diff line change
Expand Up @@ -49,13 +49,18 @@ urgent, bugs:
- [ ] a lot of descriptor code was probably never really tested for array bindings.
Make sure everything works.

- [ ] revisit timestamp queries. Shouldn't the first query be done
*before* the hooked cmd and not after??
- [ ] fix syncval hazards in gui (try out commands, e.g. transfer UpdateBuffer)
- [ ] windows performance is *severely* bottlenecked by system allocations from LinearAllocator.
Increased it temporarily but should probably just roll own block sub-allocator
(try to test with RDR2 again)

new, workstack:
- [ ] use isStateCmd(const Command&) to remove remaining command dynamic casts
- [ ] generate structSize in util/util.cpp, it is incomplete at the moment.
Also problematic that we cannot know the size of some platform-specific
structs. Important for chain copying!
- [ ] handle imgui cursor-to-be-shown and clipboard
- [ ] make sure to pass it via interface
- [ ] with hooked overlay, we have to implement it ourselves
Expand Down Expand Up @@ -267,6 +272,14 @@ patch capture shader debugging:
- [x] alternative positions: mouse cursor/last cliked mouse cursor
- [ ] allow to get there via a "debug this pixel" button
in image viewer?
- [ ] indicate in UI when patch job is running.
For raytracing pipelines, it can be quite a while.
- [ ] improve patching speed: don't insert every instruction into spirv vector.
First gather everything (for every section etc) then do one
patch-build pass.
Current approach copies again and again, problematic for large shaders.
- [ ] potential CRASH: we assume the pipeLayout still has a valid handle.
this might not be the case. We could recreate it, though.
- [ ] separate function arguments and local vars in UI
- [ ] toggle via UI: also capture all local named SSA IDs
- [ ] matrix decoration in captured output
Expand All @@ -275,23 +288,24 @@ patch capture shader debugging:
- [ ] ray tracing debugging
- [x] branch in all shaders via LaunchID
- [ ] additionally: allow to branch via ahs/chs inputs
- [ ] hook shader tables
- [x] hook shader tables
create reverse mapping inside of pipeline
then, when hooking the traceRayCommand, create own shader table
on the fly where the hooked shaders are replaced?
- [ ] fix CRASH potential in PatchJobData::pipe. Should not be
IntrusiveDerivedPtr, something else. Should keep it alive in
a different way.
- [ ] fix terrible pipeline-keepAlive ShaderPatch hack
- [x] fix terrible pipeline-keepAlive ShaderPatch hack
- [ ] additional shader debug selects (vertex/fragment)
- [ ] Layer
- [ ] ViewIndex
- [ ] ViewportIndex
- [ ] improve patch compile time by passing in basePipelineHandle
- [ ] improve patching speed: don't insert every instruction into spirv vector.
First gather everything (for every section etc) then do one
patch-build pass.
Current approach copies again and again, problematic for large shaders.
Difficult: we have to gurantee that the handle stays alive while
the pipeline is created.
- [ ] maybe create the inner handle as an intrusive ptr as well?
can we really always use a basePipelineHandle tho?
What if related resources got destroyed?

spvm:
- [x] Add OpSpecConstant* support
Expand Down
2 changes: 1 addition & 1 deletion src/cb.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -585,7 +585,7 @@ VKAPI_ATTR VkResult VKAPI_CALL BeginCommandBuffer(
dlg_assert(pBeginInfo->pInheritanceInfo->renderPass);
inherit = *pBeginInfo->pInheritanceInfo;

auto* dynRender = LvlFindInChain<
auto* dynRender = lvl_find_in_chain<
VkCommandBufferInheritanceRenderingInfo>(inherit.pNext);
if(pBeginInfo->pInheritanceInfo->renderPass) {
// get render pass
Expand Down
15 changes: 10 additions & 5 deletions src/commandHook/hook.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -20,6 +20,7 @@
#include <util/profiling.hpp>

#include <accelStructVertices.comp.spv.h>
#include <shaderTable.comp.spv.h>

#include <copyTex.comp.1DArray.spv.h>
#include <copyTex.comp.u1DArray.spv.h>
Expand Down Expand Up @@ -118,6 +119,7 @@ CommandHook::CommandHook(Device& dev) {
initVertexCopy(dev);
if(hasAppExt(dev, VK_KHR_ACCELERATION_STRUCTURE_EXTENSION_NAME)) {
initAccelStructCopy(dev);
initShaderTableHook(dev);
}

// init capture
Expand All @@ -127,11 +129,7 @@ CommandHook::CommandHook(Device& dev) {
VK_BUFFER_USAGE_TRANSFER_DST_BIT;
captureBuffer_.ensure(dev, shaderCaptureSize, usage, {},
"CaptureBuf", OwnBuffer::Type::deviceLocal);

VkBufferDeviceAddressInfo info {};
info.sType = VK_STRUCTURE_TYPE_BUFFER_DEVICE_ADDRESS_INFO;
info.buffer = captureBuffer_.buf;
captureAddress_ = dev.dispatch.GetBufferDeviceAddress(dev.handle, &info);
captureAddress_ = captureBuffer_.queryAddress();
}

CommandHook::~CommandHook() {
Expand Down Expand Up @@ -279,6 +277,13 @@ void CommandHook::initImageCopyPipes(Device& dev) {
}
}

void CommandHook::initShaderTableHook(Device& dev) {
hookShaderTable_.init(dev, {{{
shaderTable_comp_spv_data,
VK_SHADER_STAGE_COMPUTE_BIT}}}, vku::PipeCreator::compute({}),
"hookShaderTable", false);
}

void CommandHook::initAccelStructCopy(Device& dev) {
VkPushConstantRange pcrs[1] = {};
pcrs[0].stageFlags = VK_SHADER_STAGE_COMPUTE_BIT;
Expand Down
15 changes: 12 additions & 3 deletions src/commandHook/hook.hpp
Original file line number Diff line number Diff line change
Expand Up @@ -56,6 +56,12 @@ struct CommandHookTarget {
u32 submissionID {u32(-1)};
};

struct ShaderCaptureHook {
vku::Pipeline pipe {};
OwnBuffer shaderTableMapping {};
std::atomic<u32> refCount {};
};

// Defines which data to retrieve
struct CommandHookOps {
// Which operations/state copies to peform.
Expand All @@ -71,14 +77,14 @@ struct CommandHookOps {
bool copyXfb {}; // transform feedback
u32 xfbSizeHint {u32(-1)};

IntrusivePtr<ShaderCaptureHook> shaderCapture {};
Vec3u32 shaderCaptureInput {}; // globalThreadID/vertexID/...

bool copyIndirectCmd {};
std::vector<DescriptorCopyOp> descriptorCopies;
std::vector<AttachmentCopyOp> attachmentCopies; // only for cmd inside renderpass
bool queryTime {};

VkPipeline useCapturePipe {};
Vec3u32 capturePipeInput {}; // globalThreadID/vertexID/...

// transfer
bool copyTransferSrcBefore {};
bool copyTransferSrcAfter {};
Expand Down Expand Up @@ -235,6 +241,7 @@ struct CommandHook {
// Initializes the pipelines and data needed for acceleration
// structure copies
void initAccelStructCopy(Device& dev);
void initShaderTableHook(Device& dev);
void initImageCopyPipes(Device& dev);
void initVertexCopy(Device& dev);

Expand Down Expand Up @@ -311,6 +318,8 @@ struct CommandHook {
vku::DynamicPipe copyVerticesByte32_;
vku::DynamicPipe copyVerticesUint32_;

vku::DynamicPipe hookShaderTable_;

// TODO: merge with Device DescriptorPool somehow
vku::DescriptorAllocator dsAlloc_;
};
Expand Down
136 changes: 122 additions & 14 deletions src/commandHook/record.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -185,9 +185,11 @@ void CommandHookRecord::initState(RecordInfo& info) {
!info.ops.attachmentCopies.empty() ||
!info.ops.descriptorCopies.empty() ||
info.ops.copyIndirectCmd ||
info.ops.useCapturePipe || // only for copying of the data afterwards
info.ops.shaderCapture || // only for copying of the data afterwards
hookClearAttachment;

this->shaderCapture = info.ops.shaderCapture;

auto preEnd = hcommand.end() - 1;
info.hookedSubpass = u32(-1);

Expand Down Expand Up @@ -456,17 +458,18 @@ void CommandHookRecord::hookRecordDst(Command& cmd, RecordInfo& info) {
}
}

if(info.ops.useCapturePipe) {
if(info.ops.shaderCapture) {
if(cmd.category() == CommandCategory::draw) {
dev.dispatch.CmdBindPipeline(cb, VK_PIPELINE_BIND_POINT_GRAPHICS,
info.ops.useCapturePipe);
info.ops.shaderCapture->pipe.vkHandle());
} else if(cmd.category() == CommandCategory::dispatch) {
dev.dispatch.CmdBindPipeline(cb, VK_PIPELINE_BIND_POINT_COMPUTE,
info.ops.useCapturePipe);
info.ops.shaderCapture->pipe.vkHandle());
} else if(cmd.category() == CommandCategory::traceRays) {
dlg_warn("TODO: implement capturePipe for RayTracing via shaderTable patching");
// dev.dispatch.CmdBindPipeline(cb, VK_PIPELINE_BIND_POINT_RAY_TRACING_KHR,
// info.ops.useCapturePipe);
// NOTE: indirect trace rays not supported here
dlg_assert(cmd.type() == CommandType::traceRays);
dev.dispatch.CmdBindPipeline(cb, VK_PIPELINE_BIND_POINT_RAY_TRACING_KHR,
info.ops.shaderCapture->pipe.vkHandle());
} else {
dlg_error("Unexpected hooked command used with capturePipe");
}
Expand Down Expand Up @@ -499,7 +502,11 @@ void CommandHookRecord::hookRecordDst(Command& cmd, RecordInfo& info) {
}
}

dispatchRecord(cmd, info);
if(info.ops.shaderCapture && cmd.category() == CommandCategory::traceRays) {
hookRecordDstHookShaderTable(cmd, info);
} else {
dispatchRecord(cmd, info);
}

auto cmdAsParent = dynamic_cast<const ParentCommand*>(&cmd);
auto nextInfo = info;
Expand Down Expand Up @@ -540,6 +547,107 @@ void CommandHookRecord::hookRecordDst(Command& cmd, RecordInfo& info) {
hookRecordAfterDst(cmd, info);
}

void CommandHookRecord::hookRecordDstHookShaderTable(Command& dst, RecordInfo& info) {
// TODO: this computation is included in the queryTime.

auto& dev = *record->dev;
DebugLabel lbl(dev, cb, "vil:hookRecordDstHookShaderTable");

dlg_assert(info.ops.shaderCapture->shaderTableMapping.buf);
const auto& rtcmd = static_cast<const TraceRaysCmd&>(dst);

const auto shaderTableSize =
rtcmd.raygenBindingTable.size +
rtcmd.callableBindingTable.size +
rtcmd.hitBindingTable.size +
rtcmd.missBindingTable.size;
const auto usage =
VK_BUFFER_USAGE_SHADER_DEVICE_ADDRESS_BIT |
VK_BUFFER_USAGE_STORAGE_BUFFER_BIT |
VK_BUFFER_USAGE_SHADER_BINDING_TABLE_BIT_KHR;

this->shaderTable.ensure(dev, shaderTableSize, usage, {},
"hookedShaderTable");
auto dstAddress = this->shaderTable.queryAddress();
// TODO: cache this!
auto mappingAddress =
info.ops.shaderCapture->shaderTableMapping.queryAddress();

struct TableToPatch {
const VkStridedDeviceAddressRegionKHR& src;
VkStridedDeviceAddressRegionKHR dst {};
};

auto tables = std::array{
TableToPatch{rtcmd.raygenBindingTable},
TableToPatch{rtcmd.missBindingTable},
TableToPatch{rtcmd.hitBindingTable},
TableToPatch{rtcmd.callableBindingTable},
};

// dlg_trace("raygen: {} {}", rtcmd.raygenBindingTable.size, rtcmd.raygenBindingTable.stride);
// dlg_trace("miss: {} {}", rtcmd.missBindingTable.size, rtcmd.missBindingTable.stride);
// dlg_trace("hit: {} {}", rtcmd.hitBindingTable.size, rtcmd.hitBindingTable.stride);
// dlg_trace("call: {} {}", rtcmd.callableBindingTable.size, rtcmd.callableBindingTable.stride);

auto& pipe = dev.commandHook->hookShaderTable_;
dev.dispatch.CmdBindPipeline(cb, VK_PIPELINE_BIND_POINT_COMPUTE,
pipe.pipe());

for(auto& table : tables) {
if(!table.src.size || !table.src.deviceAddress) {
table.dst = {};
continue;
}

struct {
u64 srcTable;
u64 dstTable;
u64 mappings;
u32 stride;
u32 handleSize;
u32 count;
u32 groupCount; // TODO: remove, not needed.
} pcr;

dlg_assert(table.src.size % 4u == 0u);
dlg_assert(table.src.stride % 4u == 0u);
dlg_assert(dev.rtProps.shaderGroupHandleSize % 4u == 0u);

// floor by design
auto maxCount = table.src.size / table.src.stride;
pcr.srcTable = table.src.deviceAddress;
pcr.dstTable = dstAddress;
pcr.mappings = mappingAddress;
pcr.stride = table.src.stride / 4u;
pcr.count = maxCount;
pcr.handleSize = dev.rtProps.shaderGroupHandleSize / 4u;
pcr.groupCount = rtcmd.state->pipe->groups.size();

dev.dispatch.CmdPushConstants(cb, pipe.pipeLayout().vkHandle(),
VK_SHADER_STAGE_COMPUTE_BIT, 0u, sizeof(pcr), &pcr);

auto gx = ceilDivide(maxCount, 64u);
dev.dispatch.CmdDispatch(cb, gx, 1u, 1u);

table.dst.deviceAddress = dstAddress;
table.dst.size = table.src.size;
table.dst.stride = table.src.stride;

dstAddress += table.src.size;
}

vku::cmdBarrier(dev, cb, this->shaderTable.asSpan(),
vku::SyncScope::computeWrite(),
vku::SyncScope::allShaderRead() | vku::SyncScope::readIndirectCommand());

dev.dispatch.CmdTraceRaysKHR(cb,
&tables[0].dst, &tables[1].dst, &tables[2].dst, &tables[3].dst,
rtcmd.width, rtcmd.height, rtcmd.depth);

info.rebindComputeState = true;
}

void CommandHookRecord::hookRecord(Command* cmd, RecordInfo& info) {
*info.maxHookLevel = std::max(*info.maxHookLevel, info.nextHookLevel);

Expand Down Expand Up @@ -1659,7 +1767,7 @@ void CommandHookRecord::beforeDstOutsideRp(Command& bcmd, RecordInfo& info) {
}

// capturePipe
if(info.ops.useCapturePipe) {
if(info.ops.shaderCapture) {
// TODO: only for debugging
dev.dispatch.CmdFillBuffer(cb, hook->shaderCaptureBuffer(),
0u, hook->shaderCaptureSize, 0x0u);
Expand All @@ -1668,9 +1776,9 @@ void CommandHookRecord::beforeDstOutsideRp(Command& bcmd, RecordInfo& info) {
vku::SyncScope::transferWrite(), vku::SyncScope::transferWrite());

Vec4u32 input {};
input[0] = info.ops.capturePipeInput[0];
input[1] = info.ops.capturePipeInput[1];
input[2] = info.ops.capturePipeInput[2];
input[0] = info.ops.shaderCaptureInput[0];
input[1] = info.ops.shaderCaptureInput[1];
input[2] = info.ops.shaderCaptureInput[2];
dev.dispatch.CmdUpdateBuffer(cb, hook->shaderCaptureBuffer(),
0u, sizeof(input), &input);
vku::cmdBarrier(dev, cb, vku::BufferSpan{
Expand Down Expand Up @@ -1731,15 +1839,15 @@ void CommandHookRecord::afterDstOutsideRp(Command& bcmd, RecordInfo& info) {
copyTransfer(bcmd, info, false);
}

if(info.ops.useCapturePipe) {
if(info.ops.shaderCapture) {
if(bcmd.category() == CommandCategory::draw) {
dev.dispatch.CmdBindPipeline(cb, VK_PIPELINE_BIND_POINT_GRAPHICS,
static_cast<const DrawCmdBase&>(bcmd).boundPipe()->handle);
} else if(bcmd.category() == CommandCategory::dispatch) {
dev.dispatch.CmdBindPipeline(cb, VK_PIPELINE_BIND_POINT_COMPUTE,
static_cast<const DispatchCmdBase&>(bcmd).boundPipe()->handle);
} else if(bcmd.category() == CommandCategory::traceRays) {
dev.dispatch.CmdBindPipeline(cb, VK_PIPELINE_BIND_POINT_COMPUTE,
dev.dispatch.CmdBindPipeline(cb, VK_PIPELINE_BIND_POINT_RAY_TRACING_KHR,
static_cast<const TraceRaysCmdBase&>(bcmd).boundPipe()->handle);
} else {
dlg_error("Unexpected hooked command used with capturePipe");
Expand Down
5 changes: 5 additions & 0 deletions src/commandHook/record.hpp
Original file line number Diff line number Diff line change
Expand Up @@ -9,6 +9,8 @@

namespace vil {

struct ShaderCaptureHook;

// Internal representation of a hooked recording of a CommandRecord.
// Is kept alive only as long as the associated Record is referencing this
// (since it might resubmitted again, making this useful) or there are
Expand Down Expand Up @@ -56,6 +58,8 @@ struct CommandHookRecord {
VkRenderPass rp1 {};
VkRenderPass rp2 {};

IntrusivePtr<ShaderCaptureHook> shaderCapture {}; // keep alive

IntrusivePtr<CommandHookState> state {};
OwnBuffer dummyBuf {};

Expand Down Expand Up @@ -179,6 +183,7 @@ struct CommandHookRecord {
// Called when we arrived ath the hooked command itself. Will make sure
// all barriers are set, render passes split correclty and copies are done.
void hookRecordDst(Command& dst, RecordInfo&);
void hookRecordDstHookShaderTable(Command& dst, RecordInfo&);

// Called immediately before recording the hooked command itself.
// Will perform all needed operations.
Expand Down
Loading

0 comments on commit 8868ba5

Please sign in to comment.