diff --git a/.gitmodules b/.gitmodules
index 40411be..5c196ff 100644
--- a/.gitmodules
+++ b/.gitmodules
@@ -5,7 +5,6 @@
[submodule "thirdparty/tinygltf"]
path = thirdparty/tinygltf
url = https://github.com/syoyo/tinygltf.git
- branch = master
[submodule "thirdparty/imgui"]
path = thirdparty/imgui
url = https://github.com/ocornut/imgui.git
diff --git a/ChangeLog.md b/ChangeLog.md
index 731dbe9..c2766d7 100644
--- a/ChangeLog.md
+++ b/ChangeLog.md
@@ -1,5 +1,39 @@
# RTXGI SDK Change Log
+## 1.3.5
+
+### SDK
+- **Improvements**
+ - Adds the new **Probe Variability** feature to the ```DDGIVolume```
+ - This is an optional feature that tracks the [coefficient of variation](https://en.wikipedia.org/wiki/Coefficient_of_variation) of a ```DDGIVolume```
+ - This can be used to estimate how converged the volume's probes are. When the coefficient settles around a small value, the probes likely contain representative irradiance values, and ray tracing and probe updates can be disabled until an event occurs that invalidates the light field
+ - See [Probe Variability](docs/DDGIVolume.md#probe-variability) in the documentation for more details
+ - Updates ```DDGIVolume``` D3D12 resource transitions based on feedback from GitHub Issue #68 (thanks!)
+ - ```UpdateDDGIVolumes()``` can now be safely used on direct *and* compute command lists
+ - Irradiance, Distance, and Probe Data resources are now expected to be in the ```D3D12_RESOURCE_STATE_NON_PIXEL_SHADER_RESOURCE``` state by default
+ - These resources can be transitioned to the required states for each workload using the new ```DDGIVolume::TransitionResources(...)``` function where appropriate (also see ```EDDGIExecutionStage```)
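The intent of the new transition flow can be sketched in plain C++. The enum below mirrors ```EDDGIExecutionStage``` from this change; the stage-to-state mapping is an illustrative assumption based on the default state described above, not the SDK's documented behavior:

```cpp
#include <cassert>
#include <string>

// Mirror of the SDK's EDDGIExecutionStage (names taken from this change).
enum class EDDGIExecutionStage { POST_PROBE_TRACE = 0, PRE_GATHER_CS, PRE_GATHER_PS, POST_GATHER_PS };

// Illustrative sketch: the resource state the irradiance/distance/probe data
// arrays would be transitioned into for each stage. The mapping is an
// assumption for illustration; consult TransitionResources(...) for the
// SDK's actual behavior.
std::string ExpectedState(EDDGIExecutionStage stage)
{
    switch (stage)
    {
        case EDDGIExecutionStage::POST_PROBE_TRACE: return "UNORDERED_ACCESS";          // blending writes via UAV
        case EDDGIExecutionStage::PRE_GATHER_CS:    return "NON_PIXEL_SHADER_RESOURCE"; // compute gather reads via SRV
        case EDDGIExecutionStage::PRE_GATHER_PS:    return "PIXEL_SHADER_RESOURCE";     // pixel shader gather reads via SRV
        case EDDGIExecutionStage::POST_GATHER_PS:   return "NON_PIXEL_SHADER_RESOURCE"; // return to the default state
    }
    return "";
}
```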
+
+### Test Harness
+- **Improvements**
+ - Adds support for the SDK's new [Probe Variability](docs/DDGIVolume.md#probe-variability) feature, including buffer visualization, UI toggles, and checks to disable/enable probe traces based on volume variability
+ - Adds support for Shader Execution Reordering in DDGI probe ray tracing and the reference Path Tracer (D3D12 only). Requires an RTX 4000 series (Ada) GPU.
+ - Adds NVAPI as a new dependency (Test Harness only)
+ - Improves acceleration structure organization
+ - Reorganizes how BLAS are created from GLTF2 Mesh and MeshPrimitives
+ - MeshPrimitives are now geometries of the same BLAS (instead of individual BLAS)
+ - This prevents bad traversal characteristics when MeshPrimitives create substantially overlapping BLAS and increases trace performance by up to 2x
+ - Adds the GeometryData and MeshOffsets indirection buffers for looking up MeshPrimitive information
+ - Updates RGS and Hit Shaders to look up MeshPrimitive information using DXR 1.1 GeometryIndex() and the new indirection buffers
+ - Updates Closest Hit shaders to conform with the [GLTF 2.0 specification](https://registry.khronos.org/glTF/specs/2.0/glTF-2.0.html#metallic-roughness-material) for how albedo values sampled from textures should be combined with ```baseColorFactor```. Fixes GitHub Issue #67.
+ - Updates scene cache serialization/deserialization
+ - Caches additional information and now writes a scene cache file for .glb scenes too
+- **Bug Fixes**
+ - Updates DXC binaries to v1.7.2207 (on Windows) to fix a shader compilation issue
+ - Fixes issues with ```DDGIVolume``` name strings not being handled properly
+ - Fixes D3D12 resource state problems caught by the debug layer
+ - Fixes a handful of other minor issues
+
+
## 1.3.0
### SDK
diff --git a/docs/DDGIVolume.md b/docs/DDGIVolume.md
index 6c8b928..7830c23 100644
--- a/docs/DDGIVolume.md
+++ b/docs/DDGIVolume.md
@@ -138,6 +138,10 @@ struct DDGIVolumeResourceIndices
uint probeDistanceSRVIndex;
uint probeDataUAVIndex;
uint probeDataSRVIndex;
+ uint probeVariabilityUAVIndex;
+ uint probeVariabilitySRVIndex;
+ uint probeVariabilityAverageUAVIndex;
+ uint probeVariabilityAverageSRVIndex;
};
```
@@ -222,6 +226,7 @@ The main workloads executed by a ```DDGIVolume``` are implemented in three shade
* ProbeBlendingCS.hlsl
* ProbeRelocationCS.hlsl
* ProbeClassificationCS.hlsl
+ * ReductionCS.hlsl
To make it possible to directly use the SDK's shader files in your codebase (with or without the RTXGI SDK), shader functionality is configured through shader compiler defines. All shaders support both traditionally bound and bindless resource access methods.
@@ -334,6 +339,10 @@ The following defines provide SDK shaders with the information necessary to unde
* *D3D12:* the UAV shader register ```X``` and space ```Y``` of the DDGIVolume probe data texture array.
* *Vulkan:* the binding slot ```X``` and descriptor set index ```Y``` of the DDGIVolume probe data texture array.
+```PROBE_VARIABILITY_REGISTER [uX|X]``` ```PROBE_VARIABILITY_SPACE [spaceY|Y]```
+ * *D3D12:* the UAV shader register ```X``` and space ```Y``` of the DDGIVolume probe variability texture array.
+ * *Vulkan:* the binding slot ```X``` and descriptor set index ```Y``` of the DDGIVolume probe variability texture array.
+
---
### [```ProbeBlendingCS.hlsl```](../rtxgi-sdk/shaders/ddgi/ProbeBlendingCS.hlsl)
@@ -438,14 +447,40 @@ struct ProbeRelocationBytecode
---
+### [```ReductionCS.hlsl```](../rtxgi-sdk/shaders/ddgi/ReductionCS.hlsl)
+
+This file contains compute shader code that reduces the probe variability texture down to a single value. See [Probe Variability](#probe-variability) for more information.
+
+This shader is used by the ```rtxgi::[d3d12|vulkan]::CalculateDDGIVolumeVariability(...)``` function.
+
+**Compilation Instructions**
+
+This shader file provides two entry points:
+ - ```DDGIReductionCS()``` - performs initial reduction pass on per-probe-texel variability data.
+ - ```DDGIExtraReductionCS()``` - if the probe variability texture is too large to reduce down to one value in a single pass, this shader will perform additional reductions and can be run repeatedly until the output reaches a single value.
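The need for extra reduction passes can be sketched with a small helper that counts how many dispatches collapse an NxM variability texture to a single value. The per-pass reduction factor below is illustrative; the SDK's actual per-pass footprint depends on its thread group layout:

```cpp
#include <cassert>

// Counts reduction dispatches needed to collapse a width x height texture
// to 1x1, assuming each pass shrinks both dimensions by a fixed factor.
// The default factor of 32 is an illustrative assumption.
int ReductionPassCount(int width, int height, int factorPerPass = 32)
{
    int passes = 0;
    while (width > 1 || height > 1)
    {
        width  = (width  + factorPerPass - 1) / factorPerPass; // ceiling divide
        height = (height + factorPerPass - 1) / factorPerPass;
        ++passes;
    }
    return passes;
}
```

Under this assumption, a small texture needs only the initial ```DDGIReductionCS()``` pass, while larger textures need one or more ```DDGIExtraReductionCS()``` passes afterward.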
+
+Pass compiled shader bytecode or pipeline state objects to the `ProbeVariabilityBytecode` or `ProbeVariability[PSO|Pipeline]` structs that correspond to the entry points in the shader file (see below).
+
+```C++
+struct ProbeVariabilityBytecode
+{
+ ShaderBytecode reductionCS; // DDGIReductionCS() entry point
+ ShaderBytecode extraReductionCS; // DDGIExtraReductionCS() entry point
+};
+```
+
+---
+
## Texture Layout
-The ```DDGIVolume``` uses four texture arrays to store its data:
+The ```DDGIVolume``` uses six texture arrays to store its data:
1. Probe Ray Data
2. Probe Irradiance
3. Probe Distance
4. Probe Data
+ 5. Probe Variability
+ 6. Probe Variability Average
### Probe Ray Data
@@ -514,6 +549,32 @@ Within a texel:
Figure 7: A visualization of the Probe Data texture (zoomed) for the Crytek Sponza scene
+### Probe Variability
+
+This texture array stores the [coefficient of variation](https://en.wikipedia.org/wiki/Coefficient_of_variation) for all probe irradiance texels in a volume and is used by [Probe Variability](#probe-variability). The texture dimensions and layout are the same as the irradiance texture array ***with probe border texels omitted***. This texture array has a single channel that stores the scalar coefficient of variation value.
+
+Below is a visualization of the texture array. The visualization defines a threshold value, then marks inactive probes in blue, below-threshold values in red, and above-threshold values in green.
+
+
+
+Figure 8: A visualization of the Probe Variability texture array for the Cornell Box scene
+
+
+### Probe Variability Average
+
+[Probe Variability](#probe-variability) averages the coefficient of variation of all probes in a volume to generate a single variability value. This texture array stores the intermediate values used in the averaging process. The final average variability value is stored in texel (0, 0) when the reduction passes complete.
+
+This texture array has two channels:
+ - The averaged coefficient of variation is stored in the R channel
+ - A weight the reduction shader uses to average contributions from all probes is stored in the G channel
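The two-channel averaging scheme above can be sketched as a weighted mean over (value, weight) pairs. The data and helper below are purely illustrative:

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// One texel of the variability-average array: R = variability, G = weight.
struct Texel { float variability; float weight; };

// Weighted mean of the variability values, as the reduction shader's
// averaging step is described above. Illustrative sketch only.
float WeightedAverage(const std::vector<Texel>& texels)
{
    float sum = 0.0f, weightSum = 0.0f;
    for (const Texel& t : texels)
    {
        sum       += t.variability * t.weight;
        weightSum += t.weight;
    }
    return weightSum > 0.0f ? sum / weightSum : 0.0f;
}
```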
+
+Below is a visualization of the texture array. The visualization defines a threshold value, then marks inactive probes in blue, below-threshold values in red, and above-threshold values in green.
+
+
+
+Figure 9: A visualization of the Probe Variability Average texture for the Cornell Box scene
+
+
### Probe Count Limits
In addition to the available memory of the physical device, the number of probes a volume can contain is bounded by the graphics API's limits on texture (array) resources.
@@ -636,7 +697,7 @@ Critically, instead of adjusting the position of all probes when the active area
-Figure 8: Infinite Scrolling Volume Movement
+Figure 10: Infinite Scrolling Volume Movement
ISVs are also useful when dynamic indirect lighting is desired around the camera view or a player character. Anchor the infinite scrolling volume to the camera view or a player character and use the camera or player's movement to drive the volume's scrolling of the active area.
@@ -661,7 +722,7 @@ Any regular grid of sampling points will struggle to robustly handle all content
-Figure 9: (Left) Probes falling inside wall geometry without probe relocation. (Right) Probes adjusted to a more useful locations with probe relocation enabled.
+Figure 11: (Left) Probes falling inside wall geometry without probe relocation. (Right) Probes adjusted to more useful locations with probe relocation enabled.
To use Probe Relocation:
@@ -692,7 +753,7 @@ Classification is executed in two phases:
-Figure 10: Disabled probes are highlighted with red outlines. Probes inside of geometry or with no surrounding geometry are disabled.
+Figure 12: Disabled probes are highlighted with red outlines. Probes inside of geometry or with no surrounding geometry are disabled.
@@ -707,9 +768,19 @@ The number of fixed rays is specified by the ```RTXGI_DDGI_NUM_FIXED_RAYS``` def
-Figure 11: The default fixed rays distribution used in probe relocation and classification.
+Figure 13: The default fixed rays distribution used in probe relocation and classification.
+# Probe Variability
+
+It is often the case that the irradiance estimates stored in ```DDGIVolume``` probes contain a non-zero level of variance (per octahedral texel), even after a substantial number of samples have been accumulated. In fact, **some (or even all) texels of a given probe may never fully converge**. This results in persistent low frequency noise in indirect lighting estimates computed from a ```DDGIVolume```. While this is not a problem (visually) in a single frame (i.e. the estimate is still reasonable), the low frequency noise *changes randomly* each frame. This produces objectionable temporal artifacts.
+
+To address this problem, a ```DDGIVolume``` is now able to track probe variability. Probe Variability measures an average [coefficient of variation](https://en.wikipedia.org/wiki/Coefficient_of_variation) across the volume's probes. This serves as an estimate of how volatile the volume's estimate of the light field is from one update to the next. As more samples are blended in and probe irradiance estimates improve, the measured variability decreases toward zero.
+
+Importantly, the **variability value may never reach zero**. Instead, probe irradiance estimates eventually settle in a state where the variability stays within a given range. At this point, probes are converged 'enough' and the objectionable low frequency noise can be avoided by pausing probe ray tracing and blending updates for the ```DDGIVolume```. When an event occurs that changes the volume's light field (e.g. a light or object moves, an explosion occurs, weather changes, etc.), ray tracing and blending updates should be re-enabled until the variability measure settles again.
+
+The range and stability of probe variability values depend on several factors, including the extent of the ```DDGIVolume```, the distribution of probes, the number of rays traced per probe, and the light transport characteristics of the scene. As a result, the SDK exposes the measured variability and expects the application to decide how to interpret variability values and when to pause or resume updates.
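The application-side policy the SDK leaves to you can be sketched as a simple gate: pause updates once measured variability settles below a threshold, and resume when a scene event invalidates the light field. The threshold value below is illustrative, not an SDK recommendation:

```cpp
#include <cassert>

// Illustrative application-side gating of DDGIVolume updates based on the
// variability value read back from the SDK. Threshold is a made-up example.
struct VolumeUpdateGate
{
    float threshold      = 0.02f; // application-chosen convergence threshold
    bool  updatesEnabled = true;  // whether probe tracing/blending should run

    void OnVariabilityMeasured(float variability)
    {
        // Converged "enough": pause ray tracing and blending for this volume.
        if (updatesEnabled && variability < threshold) updatesEnabled = false;
    }

    // A light or object moved, an explosion occurred, weather changed, etc.
    void OnLightFieldInvalidated() { updatesEnabled = true; }
};
```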
+
# Rules of Thumb
Below are rules of thumb related to ```DDGIVolume``` configuration and how a volume's settings affect the lighting results and content creation.
diff --git a/docs/Integration.md b/docs/Integration.md
index e1afb26..647dbde 100644
--- a/docs/Integration.md
+++ b/docs/Integration.md
@@ -33,7 +33,9 @@ At Render-Time
- *Tip:* use the SDK's ```RelocateDDGIVolumeProbes()``` function
5. [**Classify Probes (optional)**](DDGIVolume.md#probe-classification) within relevant, active ```DDGIVolume```s to deactivate tracing and blending operations for probes that do not contribute to the final result
- *Tip:* use the SDK's ```ClassifyDDGIVolumeProbes()``` function
-6. [**Query Irradiance**](#querying-irradiance-with-a-ddgivolume) from relevant, active ```DDGIVolume```s to gather indirect lighting in screen-space
+6. [**Calculate Variability (optional)**](DDGIVolume.md#probe-variability) within relevant, active ```DDGIVolume```s to generate variability measurements for the current update, then use these values to determine if the volume should remain active or not
+ - *Tip:* use the SDK's ```CalculateDDGIVolumeVariability()``` and ```ReadbackDDGIVolumeVariability()``` functions
+7. [**Query Irradiance**](#querying-irradiance-with-a-ddgivolume) from relevant, active ```DDGIVolume```s to gather indirect lighting in screen-space
### Implementation Details
diff --git a/docs/images/ddgivolume-textures-probevariability-avg.jpg b/docs/images/ddgivolume-textures-probevariability-avg.jpg
new file mode 100644
index 0000000..9a06c4d
Binary files /dev/null and b/docs/images/ddgivolume-textures-probevariability-avg.jpg differ
diff --git a/docs/images/ddgivolume-textures-probevariability.jpg b/docs/images/ddgivolume-textures-probevariability.jpg
new file mode 100644
index 0000000..03f9796
Binary files /dev/null and b/docs/images/ddgivolume-textures-probevariability.jpg differ
diff --git a/docs/images/integration-ddgi-flow.svg b/docs/images/integration-ddgi-flow.svg
index ee6ea78..376fbb5 100644
--- a/docs/images/integration-ddgi-flow.svg
+++ b/docs/images/integration-ddgi-flow.svg
@@ -2,11 +2,11 @@
+ (SVG hunk body elided: regenerated text-glyph <path> definitions and element ids with no reviewable content)
> 20) & 0x00000007);
output.probeRelocationEnabled = (bool)((input.packed4 >> 23) & 0x00000001);
output.probeClassificationEnabled = (bool)((input.packed4 >> 24) & 0x00000001);
- output.probeScrollClear[0] = (bool)((input.packed4 >> 25) & 0x00000001);
- output.probeScrollClear[1] = (bool)((input.packed4 >> 26) & 0x00000001);
- output.probeScrollClear[2] = (bool)((input.packed4 >> 27) & 0x00000001);
- output.probeScrollDirections[0] = (bool)((input.packed4 >> 28) & 0x00000001);
- output.probeScrollDirections[1] = (bool)((input.packed4 >> 29) & 0x00000001);
- output.probeScrollDirections[2] = (bool)((input.packed4 >> 30) & 0x00000001);
+ output.probeVariabilityEnabled = (bool)((input.packed4 >> 25) & 0x00000001);
+ output.probeScrollClear[0] = (bool)((input.packed4 >> 26) & 0x00000001);
+ output.probeScrollClear[1] = (bool)((input.packed4 >> 27) & 0x00000001);
+ output.probeScrollClear[2] = (bool)((input.packed4 >> 28) & 0x00000001);
+ output.probeScrollDirections[0] = (bool)((input.packed4 >> 29) & 0x00000001);
+ output.probeScrollDirections[1] = (bool)((input.packed4 >> 30) & 0x00000001);
+ output.probeScrollDirections[2] = (bool)((input.packed4 >> 31) & 0x00000001);
return output;
}
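The bit layout change in the hunk above (the variability flag takes bit 25, shifting the scroll flags up by one) can be illustrated with a standalone pack/unpack round trip. The bit indices come from the updated shader code; the packing helper itself is our own illustration:

```cpp
#include <cassert>
#include <cstdint>

// Flags packed into bits 25-31 of packed4, per the updated layout above.
struct Flags
{
    bool probeVariabilityEnabled;   // bit 25 (new in this change)
    bool probeScrollClear[3];       // bits 26-28
    bool probeScrollDirections[3];  // bits 29-31
};

uint32_t Pack(const Flags& f)
{
    uint32_t packed = 0;
    packed |= (uint32_t)f.probeVariabilityEnabled << 25;
    for (int i = 0; i < 3; i++) packed |= (uint32_t)f.probeScrollClear[i] << (26 + i);
    for (int i = 0; i < 3; i++) packed |= (uint32_t)f.probeScrollDirections[i] << (29 + i);
    return packed;
}

Flags Unpack(uint32_t packed)
{
    Flags f{};
    f.probeVariabilityEnabled = (packed >> 25) & 0x00000001;
    for (int i = 0; i < 3; i++) f.probeScrollClear[i] = (packed >> (26 + i)) & 0x00000001;
    for (int i = 0; i < 3; i++) f.probeScrollDirections[i] = (packed >> (29 + i)) & 0x00000001;
    return f;
}
```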
diff --git a/rtxgi-sdk/include/rtxgi/ddgi/gfx/DDGIVolume_D3D12.h b/rtxgi-sdk/include/rtxgi/ddgi/gfx/DDGIVolume_D3D12.h
index 83665bf..a98e71b 100644
--- a/rtxgi-sdk/include/rtxgi/ddgi/gfx/DDGIVolume_D3D12.h
+++ b/rtxgi-sdk/include/rtxgi/ddgi/gfx/DDGIVolume_D3D12.h
@@ -33,6 +33,14 @@ namespace rtxgi
COUNT
};
+ enum class EDDGIExecutionStage
+ {
+ POST_PROBE_TRACE = 0,
+ PRE_GATHER_CS,
+ PRE_GATHER_PS,
+ POST_GATHER_PS,
+ };
+
//------------------------------------------------------------------------
// Managed Resource Mode (SDK manages volume resources)
//------------------------------------------------------------------------
@@ -49,6 +57,12 @@ namespace rtxgi
ShaderBytecode resetCS; // Probe classification reset compute shader bytecode
};
+ struct ProbeVariabilityByteCode
+ {
+ ShaderBytecode reductionCS; // Probe variability reduction compute shader bytecode
+ ShaderBytecode extraReductionCS; // Probe variability reduction extra passes compute shader bytecode
+ };
+
struct DDGIVolumeManagedResourcesDesc
{
bool enabled = false; // Enable or disable managed resources mode
@@ -58,8 +72,10 @@ namespace rtxgi
// Shader bytecode
ShaderBytecode probeBlendingIrradianceCS; // Probe blending (irradiance) compute shader bytecode
ShaderBytecode probeBlendingDistanceCS; // Probe blending (distance) compute shader bytecode
- ProbeRelocationBytecode probeRelocation; // [Optional] Probe Relocation bytecode
- ProbeClassificationBytecode probeClassification; // [Optional] Probe Classification bytecode
+
+ ProbeRelocationBytecode probeRelocation; // Probe Relocation bytecode
+ ProbeClassificationBytecode probeClassification; // Probe Classification bytecode
+ ProbeVariabilityByteCode probeVariability; // Probe Variability bytecode
};
//------------------------------------------------------------------------
@@ -78,6 +94,12 @@ namespace rtxgi
ID3D12PipelineState* resetPSO = nullptr; // Probe classification reset compute PSO
};
+ struct ProbeVariabilityPSO
+ {
+ ID3D12PipelineState* reductionPSO = nullptr; // Probe variability averaging PSO
+ ID3D12PipelineState* extraReductionPSO = nullptr; // Probe variability extra reduction PSO
+ };
+
struct DDGIVolumeUnmanagedResourcesDesc
{
bool enabled = false; // Enable or disable unmanaged resources mode
@@ -96,12 +118,17 @@ namespace rtxgi
ID3D12Resource* probeIrradiance = nullptr; // Probe irradiance texture array - RGB irradiance, encoded with a high gamma curve
ID3D12Resource* probeDistance = nullptr; // Probe distance texture array - R: mean distance | G: mean distance^2
ID3D12Resource* probeData = nullptr; // Probe data texture array - XYZ: world-space relocation offsets | W: classification state
+ ID3D12Resource* probeVariability = nullptr; // Probe variability texture array
+ ID3D12Resource* probeVariabilityAverage = nullptr; // Average of Probe variability for whole volume
+ ID3D12Resource* probeVariabilityReadback = nullptr; // CPU-readable resource containing final Probe variability average
// Pipeline State Objects
ID3D12PipelineState* probeBlendingIrradiancePSO = nullptr; // Probe blending (irradiance) compute PSO
ID3D12PipelineState* probeBlendingDistancePSO = nullptr; // Probe blending (distance) compute PSO
- ProbeRelocationPSO probeRelocation; // [Optional] Probe Relocation PSOs
- ProbeClassificationPSO probeClassification; // [Optional] Probe Classification PSOs
+
+ ProbeRelocationPSO probeRelocation; // Probe Relocation PSOs
+ ProbeClassificationPSO probeClassification; // Probe Classification PSOs
+ ProbeVariabilityPSO probeVariabilityPSOs; // Probe Variability PSOs
};
//------------------------------------------------------------------------
@@ -193,6 +220,11 @@ namespace rtxgi
*/
ERTXGIStatus ClearProbes(ID3D12GraphicsCommandList* cmdList);
+ /**
+ * Transitions volume resources to the appropriate state(s) for the given execution stage
+ */
+ void TransitionResources(ID3D12GraphicsCommandList* cmdList, EDDGIExecutionStage stage) const;
+
/**
* Releases resources owned by the volume
*/
@@ -210,7 +242,7 @@ namespace rtxgi
UINT GetRootParamSlotRootConstants() const { return m_rootParamSlotRootConstants; };
UINT GetRootParamSlotResourceDescriptorTable() const { return m_rootParamSlotResourceDescriptorTable; }
UINT GetRootParamSlotSamplerDescriptorTable() const { return m_rootParamSlotSamplerDescriptorTable; }
- DDGIRootConstants GetRootConstants() const { return { m_desc.index, m_descriptorHeapDesc.constantsIndex, m_descriptorHeapDesc.resourceIndicesIndex, 0 }; };
+ DDGIRootConstants GetRootConstants() const { return { m_desc.index, m_descriptorHeapDesc.constantsIndex, m_descriptorHeapDesc.resourceIndicesIndex, 0, 0, 0, 0 }; };
bool GetBindlessEnabled() const { return m_bindlessResources.enabled; }
EBindlessType GetBindlessType() const { return m_bindlessResources.type; }
@@ -237,12 +269,16 @@ namespace rtxgi
EDDGIVolumeTextureFormat GetIrradianceFormat() const { return m_desc.probeIrradianceFormat; }
EDDGIVolumeTextureFormat GetDistanceFormat() const { return m_desc.probeDistanceFormat; }
EDDGIVolumeTextureFormat GetProbeDataFormat() const { return m_desc.probeDataFormat; }
+ EDDGIVolumeTextureFormat GetProbeVariabilityFormat() const { return m_desc.probeVariabilityFormat; }
// Texture Arrays
ID3D12Resource* GetProbeRayData() const { return m_probeRayData; }
ID3D12Resource* GetProbeIrradiance() const { return m_probeIrradiance; }
ID3D12Resource* GetProbeDistance() const { return m_probeDistance; }
ID3D12Resource* GetProbeData() const { return m_probeData; }
+ ID3D12Resource* GetProbeVariability() const { return m_probeVariability; }
+ ID3D12Resource* GetProbeVariabilityAverage() const { return m_probeVariabilityAverage; }
+ ID3D12Resource* GetProbeVariabilityReadback() const { return m_probeVariabilityReadback; }
// Pipeline State Objects
ID3D12PipelineState* GetProbeBlendingIrradiancePSO() const { return m_probeBlendingIrradiancePSO; }
@@ -251,6 +287,8 @@ namespace rtxgi
ID3D12PipelineState* GetProbeRelocationResetPSO() const { return m_probeRelocationResetPSO; }
ID3D12PipelineState* GetProbeClassificationPSO() const { return m_probeClassificationPSO; }
ID3D12PipelineState* GetProbeClassificationResetPSO() const { return m_probeClassificationResetPSO; }
+ ID3D12PipelineState* GetProbeVariabilityReductionPSO() const { return m_probeVariabilityReductionPSO; }
+ ID3D12PipelineState* GetProbeVariabilityExtraReductionPSO() const { return m_probeVariabilityExtraReductionPSO; }
//------------------------------------------------------------------------
// Resource Setters
@@ -286,12 +324,15 @@ namespace rtxgi
void SetIrradianceFormat(EDDGIVolumeTextureFormat format) { m_desc.probeIrradianceFormat = format; }
void SetDistanceFormat(EDDGIVolumeTextureFormat format) { m_desc.probeDistanceFormat = format; }
void SetProbeDataFormat(EDDGIVolumeTextureFormat format) { m_desc.probeDataFormat = format; }
+ void SetProbeVariabilityFormat(EDDGIVolumeTextureFormat format) { m_desc.probeVariabilityFormat = format; }
#if !RTXGI_DDGI_RESOURCE_MANAGEMENT
void SetProbeRayData(ID3D12Resource* ptr) { m_probeRayData = ptr; }
void SetProbeIrradiance(ID3D12Resource* ptr) { m_probeIrradiance = ptr; }
void SetProbeDistance(ID3D12Resource* ptr) { m_probeDistance = ptr; }
void SetProbeData(ID3D12Resource* ptr) { m_probeData = ptr; }
+ void SetProbeVariability(ID3D12Resource* ptr) { m_probeVariability = ptr; }
+ void SetProbeVariabilityAverage(ID3D12Resource* ptr) { m_probeVariabilityAverage = ptr; }
#endif
private:
@@ -310,6 +351,9 @@ namespace rtxgi
ID3D12Resource* m_probeIrradiance = nullptr; // Probe irradiance texture array - RGB: irradiance, encoded with a high gamma curve
ID3D12Resource* m_probeDistance = nullptr; // Probe distance texture array - R: mean distance | G: mean distance^2
ID3D12Resource* m_probeData = nullptr; // Probe data texture array - XYZ: world-space relocation offsets | W: classification state
+ ID3D12Resource* m_probeVariability = nullptr; // Probe luminance difference from previous update
+ ID3D12Resource* m_probeVariabilityAverage = nullptr; // Average Probe variability for whole volume
+ ID3D12Resource* m_probeVariabilityReadback = nullptr; // CPU-readable buffer with average Probe variability
// Render Target Views
D3D12_CPU_DESCRIPTOR_HANDLE m_probeIrradianceRTV = { 0 }; // Probe irradiance render target view
@@ -334,6 +378,8 @@ namespace rtxgi
ID3D12PipelineState* m_probeRelocationResetPSO = nullptr; // Probe relocation reset compute shader pipeline state object
ID3D12PipelineState* m_probeClassificationPSO = nullptr; // Probe classification compute shader pipeline state object
ID3D12PipelineState* m_probeClassificationResetPSO = nullptr; // Probe classification reset compute shader pipeline state object
+ ID3D12PipelineState* m_probeVariabilityReductionPSO = nullptr; // Probe variability reduction
+ ID3D12PipelineState* m_probeVariabilityExtraReductionPSO = nullptr; // Probe variability extra reduction pass
#if RTXGI_DDGI_RESOURCE_MANAGEMENT
ID3D12DescriptorHeap* m_rtvDescriptorHeap = nullptr; // Descriptor heap for render target views
@@ -349,6 +395,8 @@ namespace rtxgi
bool CreateProbeIrradiance(const DDGIVolumeDesc& desc);
bool CreateProbeDistance(const DDGIVolumeDesc& desc);
bool CreateProbeData(const DDGIVolumeDesc& desc);
+ bool CreateProbeVariability(const DDGIVolumeDesc& desc);
+ bool CreateProbeVariabilityAverage(const DDGIVolumeDesc& desc);
bool IsDeviceChanged(const DDGIVolumeManagedResourcesDesc& desc)
{
@@ -380,20 +428,32 @@ namespace rtxgi
/**
* Updates one or more volume's probes using data in the volume's radiance texture.
* Probe blending and border update workloads are batched together for better performance.
+ * Volume resources are expected to be in the D3D12_RESOURCE_STATE_UNORDERED_ACCESS state.
*/
RTXGI_API ERTXGIStatus UpdateDDGIVolumeProbes(ID3D12GraphicsCommandList* cmdList, UINT numVolumes, DDGIVolume** volumes);
/**
* Adjusts one or more volume's world-space probe positions to avoid them being too close to or inside of geometry.
* If a volume has the reset flag set, all probe relocation offsets are set to zero before relocation occurs.
+ * Volume resources are expected to be in the D3D12_RESOURCE_STATE_UNORDERED_ACCESS state.
*/
RTXGI_API ERTXGIStatus RelocateDDGIVolumeProbes(ID3D12GraphicsCommandList* cmdList, UINT numVolumes, DDGIVolume** volumes);
/**
* Classifies one or more volume's probes as active or inactive based on the hit distance data in the ray data texture.
* If a volume has the reset flag set, all probes are set to active before classification occurs.
+ * Volume resources are expected to be in the D3D12_RESOURCE_STATE_UNORDERED_ACCESS state.
*/
RTXGI_API ERTXGIStatus ClassifyDDGIVolumeProbes(ID3D12GraphicsCommandList* cmdList, UINT numVolumes, DDGIVolume** volumes);
+ /**
+ * Calculates average variability for all probes in each provided volume
+ */
+ RTXGI_API ERTXGIStatus CalculateDDGIVolumeVariability(ID3D12GraphicsCommandList* cmdList, UINT numVolumes, DDGIVolume** volumes);
+
+ /**
+ * Reads back average variability for each provided volume, at the time of the call
+ */
+ RTXGI_API ERTXGIStatus ReadbackDDGIVolumeVariability(UINT numVolumes, DDGIVolume** volumes);
} // namespace d3d12
} // namespace rtxgi
diff --git a/rtxgi-sdk/include/rtxgi/ddgi/gfx/DDGIVolume_VK.h b/rtxgi-sdk/include/rtxgi/ddgi/gfx/DDGIVolume_VK.h
index 51626be..beac895 100644
--- a/rtxgi-sdk/include/rtxgi/ddgi/gfx/DDGIVolume_VK.h
+++ b/rtxgi-sdk/include/rtxgi/ddgi/gfx/DDGIVolume_VK.h
@@ -33,6 +33,8 @@ namespace rtxgi
ProbeIrradiance,
ProbeDistance,
ProbeData,
+ ProbeVariability,
+ ProbeVariabilityAverage
};
//------------------------------------------------------------------------
@@ -51,6 +53,12 @@ namespace rtxgi
ShaderBytecode resetCS; // Probe classification reset compute shader bytecode
};
+ struct ProbeVariabilityByteCode
+ {
+ ShaderBytecode reductionCS; // Probe variability reduction compute shader bytecode
+ ShaderBytecode extraReductionCS; // Probe variability reduction extra passes compute shader bytecode
+ };
+
struct DDGIVolumeManagedResourcesDesc
{
bool enabled = false; // Enable or disable managed resources mode
@@ -62,8 +70,10 @@ namespace rtxgi
// Shader bytecode
ShaderBytecode probeBlendingIrradianceCS; // Probe blending (irradiance) compute shader bytecode
ShaderBytecode probeBlendingDistanceCS; // Probe blending (distance) compute shader bytecode
- ProbeRelocationBytecode probeRelocation; // [Optional] Probe Relocation bytecode
- ProbeClassificationBytecode probeClassification; // [Optional] Probe Classification bytecode
+
+ ProbeRelocationBytecode probeRelocation; // Probe Relocation bytecode
+ ProbeClassificationBytecode probeClassification; // Probe Classification bytecode
+ ProbeVariabilityByteCode probeVariability; // Probe Variability bytecode
};
//------------------------------------------------------------------------
@@ -88,6 +98,15 @@ namespace rtxgi
VkPipeline resetPipeline = nullptr; // Probe classification reset compute pipeline
};
+ struct ProbeVariabilityPipeline
+ {
+ VkShaderModule reductionModule = nullptr; // Probe variability reduction shader module
+ VkShaderModule extraReductionModule = nullptr; // Probe variability reduction extra passes shader module
+
+ VkPipeline reductionPipeline = nullptr; // Probe variability reduction compute pipeline
+ VkPipeline extraReductionPipeline = nullptr; // Probe variability extra reduction compute pipeline
+ };
+
struct DDGIVolumeUnmanagedResourcesDesc
{
bool enabled = false; // Enable or disable unmanaged resources mode
@@ -100,18 +119,26 @@ namespace rtxgi
VkImage probeIrradiance = nullptr; // Probe irradiance texture array - RGB: irradiance, encoded with a high gamma curve
VkImage probeDistance = nullptr; // Probe distance texture array - R: mean distance | G: mean distance^2
VkImage probeData = nullptr; // Probe data texture array - XYZ: world-space relocation offsets | W: classification state
+ VkImage probeVariability = nullptr; // Probe variability texture array
+ VkImage probeVariabilityAverage = nullptr; // Average of probe variability for the whole volume
+ VkBuffer probeVariabilityReadback = nullptr; // CPU-readable buffer containing the final probe variability average
// Texture Memory
VkDeviceMemory probeRayDataMemory = nullptr; // Probe ray data texture array device memory
VkDeviceMemory probeIrradianceMemory = nullptr; // Probe irradiance texture array device memory
VkDeviceMemory probeDistanceMemory = nullptr; // Probe distance texture array device memory
VkDeviceMemory probeDataMemory = nullptr; // Probe data texture array device memory
+ VkDeviceMemory probeVariabilityMemory = nullptr; // Probe variability texture array device memory
+ VkDeviceMemory probeVariabilityAverageMemory = nullptr; // Probe variability average texture device memory
+ VkDeviceMemory probeVariabilityReadbackMemory = nullptr; // Probe variability readback buffer device memory
// Texture Views
VkImageView probeRayDataView = nullptr; // Probe ray data texture array view
VkImageView probeIrradianceView = nullptr; // Probe irradiance texture array view
VkImageView probeDistanceView = nullptr; // Probe distance texture array view
VkImageView probeDataView = nullptr; // Probe data texture array view
+ VkImageView probeVariabilityView = nullptr; // Probe variability texture array view
+ VkImageView probeVariabilityAverageView = nullptr; // Probe variability average texture view
// Shader Modules
VkShaderModule probeBlendingIrradianceModule = nullptr; // Probe blending (irradiance) shader module
@@ -120,8 +147,10 @@ namespace rtxgi
// Pipelines
VkPipeline probeBlendingIrradiancePipeline = nullptr; // Probe blending (irradiance) compute pipeline
VkPipeline probeBlendingDistancePipeline = nullptr; // Probe blending (distance) compute pipeline
- ProbeRelocationPipeline probeRelocation; // [Optional] Probe Relocation pipelines
- ProbeClassificationPipeline probeClassification; // [Optional] Probe Classification pipelines
+
+ ProbeRelocationPipeline probeRelocation; // Probe Relocation pipelines
+ ProbeClassificationPipeline probeClassification; // Probe Classification pipelines
+ ProbeVariabilityPipeline probeVariabilityPipelines; // Probe Variability pipelines
};
//------------------------------------------------------------------------
@@ -238,7 +267,7 @@ namespace rtxgi
// Push Constants
uint32_t GetPushConstantsOffset() const { return m_pushConstantsOffset; }
- DDGIRootConstants GetPushConstants() const { return { m_desc.index, 0, 0 }; }
+ DDGIRootConstants GetPushConstants() const { return { m_desc.index, 0, 0, 0, 0, 0 }; }
// Resource Indices (Bindless)
DDGIVolumeResourceIndices GetResourceIndices() const { return m_bindlessResources.resourceIndices; }
@@ -258,24 +287,33 @@ namespace rtxgi
EDDGIVolumeTextureFormat GetIrradianceFormat() const { return m_desc.probeIrradianceFormat; }
EDDGIVolumeTextureFormat GetDistanceFormat() const { return m_desc.probeDistanceFormat; }
EDDGIVolumeTextureFormat GetProbeDataFormat() const { return m_desc.probeDataFormat; }
+ EDDGIVolumeTextureFormat GetProbeVariabilityFormat() const { return m_desc.probeVariabilityFormat; }
// Texture Arrays
VkImage GetProbeRayData() const { return m_probeRayData; }
VkImage GetProbeIrradiance() const { return m_probeIrradiance; }
VkImage GetProbeDistance() const { return m_probeDistance; }
VkImage GetProbeData() const { return m_probeData; }
+ VkImage GetProbeVariability() const { return m_probeVariability; }
+ VkImage GetProbeVariabilityAverage() const { return m_probeVariabilityAverage; }
+ VkBuffer GetProbeVariabilityReadback() const { return m_probeVariabilityReadback; }
// Texture Array Memory
VkDeviceMemory GetProbeRayDataMemory() const { return m_probeRayDataMemory; }
VkDeviceMemory GetProbeIrradianceMemory() const { return m_probeIrradianceMemory; }
VkDeviceMemory GetProbeDistanceMemory() const { return m_probeDistanceMemory; }
VkDeviceMemory GetProbeDataMemory() const { return m_probeDataMemory; }
+ VkDeviceMemory GetProbeVariabilityMemory() const { return m_probeVariabilityMemory; }
+ VkDeviceMemory GetProbeVariabilityAverageMemory() const { return m_probeVariabilityAverageMemory; }
+ VkDeviceMemory GetProbeVariabilityReadbackMemory() const { return m_probeVariabilityReadbackMemory; }
// Texture Array Views
VkImageView GetProbeRayDataView() const { return m_probeRayDataView; }
VkImageView GetProbeIrradianceView() const { return m_probeIrradianceView; }
VkImageView GetProbeDistanceView() const { return m_probeDistanceView; }
VkImageView GetProbeDataView() const { return m_probeDataView; }
+ VkImageView GetProbeVariabilityView() const { return m_probeVariabilityView; }
+ VkImageView GetProbeVariabilityAverageView() const { return m_probeVariabilityAverageView; }
// Shader Modules
VkShaderModule GetProbeBlendingIrradianceModule() const { return m_probeBlendingIrradianceModule; }
@@ -284,6 +322,8 @@ namespace rtxgi
VkShaderModule GetProbeRelocationResetModule() const { return m_probeRelocationResetModule; }
VkShaderModule GetProbeClassificationModule() const { return m_probeClassificationModule; }
VkShaderModule GetProbeClassificationResetModule() const { return m_probeClassificationResetModule; }
+ VkShaderModule GetProbeVariabilityReductionModule() const { return m_probeVariabilityReductionModule; }
+ VkShaderModule GetProbeVariabilityExtraReductionModule() const { return m_probeVariabilityExtraReductionModule; }
// Pipelines
VkPipeline GetProbeBlendingIrradiancePipeline() const { return m_probeBlendingIrradiancePipeline; }
@@ -292,6 +332,8 @@ namespace rtxgi
VkPipeline GetProbeRelocationResetPipeline() const { return m_probeRelocationResetPipeline; }
VkPipeline GetProbeClassificationPipeline() const { return m_probeClassificationPipeline; }
VkPipeline GetProbeClassificationResetPipeline() const { return m_probeClassificationResetPipeline; }
+ VkPipeline GetProbeVariabilityReductionPipeline() const { return m_probeVariabilityReductionPipeline; }
+ VkPipeline GetProbeVariabilityExtraReductionPipeline() const { return m_probeVariabilityExtraReductionPipeline; }
//------------------------------------------------------------------------
// Resource Setters
@@ -318,12 +360,16 @@ namespace rtxgi
void SetIrradianceFormat(EDDGIVolumeTextureFormat format) { m_desc.probeIrradianceFormat = format; }
void SetDistanceFormat(EDDGIVolumeTextureFormat format) { m_desc.probeDistanceFormat = format; }
void SetProbeDataFormat(EDDGIVolumeTextureFormat format) { m_desc.probeDataFormat = format; }
+ void SetProbeVariabilityFormat(EDDGIVolumeTextureFormat format) { m_desc.probeVariabilityFormat = format; }
#if !RTXGI_DDGI_RESOURCE_MANAGEMENT
void SetProbeRayData(VkImage ptr, VkDeviceMemory memoryPtr, VkImageView viewPtr) { m_probeRayData = ptr; m_probeRayDataMemory = memoryPtr; m_probeRayDataView = viewPtr; }
void SetProbeIrradiance(VkImage ptr, VkDeviceMemory memoryPtr, VkImageView viewPtr) { m_probeIrradiance = ptr; m_probeIrradianceMemory = memoryPtr; m_probeIrradianceView = viewPtr; }
void SetProbeDistance(VkImage ptr, VkDeviceMemory memoryPtr, VkImageView viewPtr) { m_probeDistance = ptr; m_probeDistanceMemory = memoryPtr; m_probeDistanceView = viewPtr; }
void SetProbeData(VkImage ptr, VkDeviceMemory memoryPtr, VkImageView viewPtr) { m_probeData = ptr; m_probeDataMemory = memoryPtr; m_probeDataView = viewPtr; }
+ void SetProbeVariability(VkImage ptr, VkDeviceMemory memoryPtr, VkImageView viewPtr) { m_probeVariability = ptr; m_probeVariabilityMemory = memoryPtr; m_probeVariabilityView = viewPtr; }
+ void SetProbeVariabilityAverage(VkImage ptr, VkDeviceMemory memoryPtr, VkImageView viewPtr) { m_probeVariabilityAverage = ptr; m_probeVariabilityAverageMemory = memoryPtr; m_probeVariabilityAverageView = viewPtr; }
+ void SetProbeVariabilityReadback(VkBuffer ptr, VkDeviceMemory memoryPtr) { m_probeVariabilityReadback = ptr; m_probeVariabilityReadbackMemory = memoryPtr; }
#endif
private:
@@ -345,18 +391,26 @@ namespace rtxgi
VkImage m_probeIrradiance = nullptr; // Probe irradiance texture array - RGB: irradiance, encoded with a high gamma curve
VkImage m_probeDistance = nullptr; // Probe distance texture array - R: mean distance | G: mean distance^2
VkImage m_probeData = nullptr; // Probe data texture array - XYZ: world-space relocation offsets | W: classification state
+ VkImage m_probeVariability = nullptr; // Probe variability texture
+ VkImage m_probeVariabilityAverage = nullptr; // Probe variability average texture
+ VkBuffer m_probeVariabilityReadback = nullptr; // Probe variability readback buffer
// Texture Array Memory
VkDeviceMemory m_probeRayDataMemory = nullptr; // Probe ray data memory
VkDeviceMemory m_probeIrradianceMemory = nullptr; // Probe irradiance memory
VkDeviceMemory m_probeDistanceMemory = nullptr; // Probe distance memory
VkDeviceMemory m_probeDataMemory = nullptr; // Probe data memory
+ VkDeviceMemory m_probeVariabilityMemory = nullptr; // Probe variability memory
+ VkDeviceMemory m_probeVariabilityAverageMemory = nullptr; // Probe variability average memory
+ VkDeviceMemory m_probeVariabilityReadbackMemory = nullptr; // Probe variability readback memory
// Texture Array Views
VkImageView m_probeRayDataView = nullptr; // Probe ray data view
VkImageView m_probeIrradianceView = nullptr; // Probe irradiance view
VkImageView m_probeDistanceView = nullptr; // Probe distance view
VkImageView m_probeDataView = nullptr; // Probe data view
+ VkImageView m_probeVariabilityView = nullptr; // Probe variability view
+ VkImageView m_probeVariabilityAverageView = nullptr; // Probe variability average view
// Pipeline Layout
VkPipelineLayout m_pipelineLayout = nullptr; // Pipeline layout, used for all update compute shaders
@@ -378,6 +432,8 @@ namespace rtxgi
VkShaderModule m_probeRelocationResetModule = nullptr; // Probe relocation reset shader module
VkShaderModule m_probeClassificationModule = nullptr; // Probe classification shader module
VkShaderModule m_probeClassificationResetModule = nullptr; // Probe classification reset shader module
+ VkShaderModule m_probeVariabilityReductionModule = nullptr; // Probe variability reduction shader module
+ VkShaderModule m_probeVariabilityExtraReductionModule = nullptr; // Probe variability reduction extra passes shader module
// Pipelines
VkPipeline m_probeBlendingIrradiancePipeline = nullptr; // Probe blending (irradiance) compute shader pipeline
@@ -386,6 +442,8 @@ namespace rtxgi
VkPipeline m_probeRelocationResetPipeline = nullptr; // Probe relocation reset compute shader pipeline
VkPipeline m_probeClassificationPipeline = nullptr; // Probe classification compute shader pipeline
VkPipeline m_probeClassificationResetPipeline = nullptr; // Probe classification reset compute shader pipeline
+ VkPipeline m_probeVariabilityReductionPipeline = nullptr; // Probe variability reduction compute shader pipeline
+ VkPipeline m_probeVariabilityExtraReductionPipeline = nullptr; // Probe variability reduction extra passes compute shader pipeline
#if RTXGI_DDGI_RESOURCE_MANAGEMENT
ERTXGIStatus CreateManagedResources(const DDGIVolumeDesc& desc, const DDGIVolumeManagedResourcesDesc& managed);
@@ -402,6 +460,8 @@ namespace rtxgi
bool CreateProbeIrradiance(const DDGIVolumeDesc& desc);
bool CreateProbeDistance(const DDGIVolumeDesc& desc);
bool CreateProbeData(const DDGIVolumeDesc& desc);
+ bool CreateProbeVariability(const DDGIVolumeDesc& desc);
+ bool CreateProbeVariabilityAverage(const DDGIVolumeDesc& desc);
bool IsDeviceChanged(const DDGIVolumeManagedResourcesDesc& desc)
{
@@ -448,5 +508,14 @@ namespace rtxgi
*/
RTXGI_API ERTXGIStatus ClassifyDDGIVolumeProbes(VkCommandBuffer cmdBuffer, uint32_t numVolumes, DDGIVolume** volumes);
+ /**
+ * Calculates the average variability of all probes in each provided volume.
+ */
+ RTXGI_API ERTXGIStatus CalculateDDGIVolumeVariability(VkCommandBuffer cmdBuffer, uint32_t numVolumes, DDGIVolume** volumes);
+
+ /**
+ * Reads back the average variability of each provided volume at the time of the call.
+ */
+ RTXGI_API ERTXGIStatus ReadbackDDGIVolumeVariability(VkDevice device, uint32_t numVolumes, DDGIVolume** volumes);
} // namespace vulkan
} // namespace rtxgi
diff --git a/rtxgi-sdk/shaders/Common.hlsl b/rtxgi-sdk/shaders/Common.hlsl
index 26623ad..82d4f02 100644
--- a/rtxgi-sdk/shaders/Common.hlsl
+++ b/rtxgi-sdk/shaders/Common.hlsl
@@ -121,4 +121,17 @@ float4 RTXGIQuaternionConjugate(float4 q)
return float4(-q.xyz, q.w);
}
+//------------------------------------------------------------------------
+// Luminance Helper
+//------------------------------------------------------------------------
+
+/**
+ * Converts a linear RGB value to luminance (Rec. 709 weights)
+ */
+float RTXGILinearRGBToLuminance(float3 rgb)
+{
+ const float3 LuminanceWeights = float3(0.2126, 0.7152, 0.0722);
+ return dot(rgb, LuminanceWeights);
+}
+
#endif // RTXGI_COMMON_HLSL
diff --git a/rtxgi-sdk/shaders/ddgi/ProbeBlendingCS.hlsl b/rtxgi-sdk/shaders/ddgi/ProbeBlendingCS.hlsl
index 064159b..92c6bde 100644
--- a/rtxgi-sdk/shaders/ddgi/ProbeBlendingCS.hlsl
+++ b/rtxgi-sdk/shaders/ddgi/ProbeBlendingCS.hlsl
@@ -32,7 +32,10 @@
#else
#define RAY_DATA_REG_DECL
#define OUTPUT_REG_DECL
- #define PROBE_DATA_REG_DECL
+ #define PROBE_DATA_REG_DECL
+ #if RTXGI_DDGI_BLEND_RADIANCE
+ #define PROBE_VARIABILITY_REG_DECL
+ #endif
#endif
#else
@@ -48,6 +51,9 @@
#define RAY_DATA_REG_DECL : register(RAY_DATA_REGISTER, RAY_DATA_SPACE)
#define OUTPUT_REG_DECL : register(OUTPUT_REGISTER, OUTPUT_SPACE)
#define PROBE_DATA_REG_DECL : register(PROBE_DATA_REGISTER, PROBE_DATA_SPACE)
+ #if RTXGI_DDGI_BLEND_RADIANCE
+ #define PROBE_VARIABILITY_REG_DECL : register(PROBE_VARIABILITY_REGISTER, PROBE_VARIABILITY_SPACE)
+ #endif
#endif // RTXGI_DDGI_BINDLESS_RESOURCES
#endif // RTXGI_DDGI_SHADER_REFLECTION || SPIRV
@@ -95,6 +101,12 @@
RTXGI_VK_BINDING(PROBE_DATA_REGISTER, PROBE_DATA_SPACE)
RWTexture2DArray<float4> ProbeData PROBE_DATA_REG_DECL;
+#if RTXGI_DDGI_BLEND_RADIANCE
+ // Probe variability
+ RTXGI_VK_BINDING(PROBE_VARIABILITY_REGISTER, PROBE_VARIABILITY_SPACE)
+ RWTexture2DArray<float> ProbeVariability PROBE_VARIABILITY_REG_DECL;
+#endif
+
#endif // RTXGI_DDGI_BINDLESS_RESOURCES
// -------- SHARED MEMORY DECLARATIONS ------------------------------------------------------------
@@ -291,6 +303,7 @@ void DDGIProbeBlendingCS(
RWTexture2DArray<float4> RayData = ResourceDescriptorHeap[resourceIndices.rayDataUAVIndex];
#if RTXGI_DDGI_BLEND_RADIANCE
RWTexture2DArray<float4> Output = ResourceDescriptorHeap[resourceIndices.probeIrradianceUAVIndex];
+ RWTexture2DArray<float> ProbeVariability = ResourceDescriptorHeap[resourceIndices.probeVariabilityUAVIndex];
#else
RWTexture2DArray<float4> Output = ResourceDescriptorHeap[resourceIndices.probeDistanceUAVIndex];
#endif
@@ -305,6 +318,7 @@ void DDGIProbeBlendingCS(
RWTexture2DArray<float4> RayData = RWTex2DArray[resourceIndices.rayDataUAVIndex];
#if RTXGI_DDGI_BLEND_RADIANCE
RWTexture2DArray<float4> Output = RWTex2DArray[resourceIndices.probeIrradianceUAVIndex];
+ RWTexture2DArray<float4> ProbeVariability = RWTex2DArray[resourceIndices.probeVariabilityUAVIndex];
#else
RWTexture2DArray<float4> Output = RWTex2DArray[resourceIndices.probeDistanceUAVIndex];
#endif
@@ -374,7 +388,13 @@ void DDGIProbeBlendingCS(
// Early out: don't blend rays for probes that are inactive
int probeState = DDGILoadProbeState(probeIndex, ProbeData, volume);
- if (probeState == RTXGI_DDGI_PROBE_STATE_INACTIVE) return;
+ if (probeState == RTXGI_DDGI_PROBE_STATE_INACTIVE)
+ {
+ #if RTXGI_DDGI_BLEND_RADIANCE
+ ProbeVariability[DispatchThreadID].r = 0.f;
+ #endif
+ return;
+ }
// Get the probe ray direction associated with this thread
float2 probeOctantUV = DDGIGetNormalizedOctahedralCoordinates(int2(threadCoords.xy), RTXGI_DDGI_PROBE_NUM_INTERIOR_TEXELS);
@@ -496,6 +516,9 @@ void DDGIProbeBlendingCS(
float3 delta = (result.rgb - previous.rgb);
+ float3 previousIrradianceMean = previous.rgb;
+ float3 currentIrradianceSample = result.rgb;
+
if (RTXGIMaxComponent(previous.rgb - result.rgb) > volume.probeIrradianceThreshold)
{
// Lower the hysteresis when a large lighting change is detected
@@ -524,6 +547,16 @@ void DDGIProbeBlendingCS(
lerpDelta = min(max(c_threshold, abs(lerpDelta)), abs(delta)) * sign(lerpDelta);
}
result = float4(previous.rgb + lerpDelta, 1.f);
+
+ if (volume.probeVariabilityEnabled)
+ {
+ float3 newIrradianceMean = result.rgb;
+ float3 newIrradianceSigma2 = (currentIrradianceSample - previousIrradianceMean) * (currentIrradianceSample - newIrradianceMean);
+ float newLuminanceSigma2 = RTXGILinearRGBToLuminance(newIrradianceSigma2);
+ float newLuminanceMean = RTXGILinearRGBToLuminance(newIrradianceMean);
+ float coefficientOfVariation = (newLuminanceMean <= c_threshold) ? 0.f : sqrt(newLuminanceSigma2) / newLuminanceMean;
+ ProbeVariability[threadCoords].r = coefficientOfVariation;
+ }
#else
// Interpolate the new filtered distance with the existing filtered distance in the probe.
diff --git a/rtxgi-sdk/shaders/ddgi/ReductionCS.hlsl b/rtxgi-sdk/shaders/ddgi/ReductionCS.hlsl
new file mode 100644
index 0000000..21c37e0
--- /dev/null
+++ b/rtxgi-sdk/shaders/ddgi/ReductionCS.hlsl
@@ -0,0 +1,411 @@
+/*
+* Copyright (c) 2019-2022, NVIDIA CORPORATION. All rights reserved.
+*
+* NVIDIA CORPORATION and its licensors retain all intellectual property
+* and proprietary rights in and to this software, related documentation
+* and any modifications thereto. Any use, reproduction, disclosure or
+* distribution of this software and related documentation without an express
+* license agreement from NVIDIA CORPORATION is strictly prohibited.
+*/
+
+// For example usage, see DDGI_[D3D12|VK].cpp::CompileDDGIVolumeShaders() function.
+
+// -------- CONFIG FILE ---------------------------------------------------------------------------
+
+#if RTXGI_DDGI_USE_SHADER_CONFIG_FILE
+#include <DDGIShaderConfig.h>
+#endif
+
+// -------- DEFINE VALIDATION ---------------------------------------------------------------------
+
+#include "include/validation/ReductionDefines.hlsl"
+
+// -------- REGISTER DECLARATIONS -----------------------------------------------------------------
+
+#if RTXGI_DDGI_SHADER_REFLECTION || defined(__spirv__)
+
+ // Don't declare registers when using reflection or cross-compiling to SPIRV
+ #define VOLUME_CONSTS_REG_DECL
+ #if RTXGI_DDGI_BINDLESS_RESOURCES
+ #define VOLUME_RESOURCES_REG_DECL
+ #define RWTEX2DARRAY_REG_DECL
+ #else
+ #define RAY_DATA_REG_DECL
+ #define PROBE_DATA_REG_DECL
+ #define PROBE_VARIABILITY_REG_DECL
+ #define PROBE_VARIABILITY_AVERAGE_REG_DECL
+ #endif
+
+#else
+
+ // Declare registers and spaces when using D3D without reflection
+ #define VOLUME_CONSTS_REG_DECL : register(VOLUME_CONSTS_REGISTER, VOLUME_CONSTS_SPACE)
+ #if RTXGI_DDGI_BINDLESS_RESOURCES
+ #if RTXGI_BINDLESS_TYPE == RTXGI_BINDLESS_TYPE_RESOURCE_ARRAYS
+ #define VOLUME_RESOURCES_REG_DECL : register(VOLUME_RESOURCES_REGISTER, VOLUME_RESOURCES_SPACE)
+ #define RWTEX2DARRAY_REG_DECL : register(RWTEX2DARRAY_REGISTER, RWTEX2DARRAY_SPACE)
+ #endif
+ #else
+ #define RAY_DATA_REG_DECL : register(RAY_DATA_REGISTER, RAY_DATA_SPACE)
+ #define OUTPUT_REG_DECL : register(OUTPUT_REGISTER, OUTPUT_SPACE)
+ #define PROBE_DATA_REG_DECL : register(PROBE_DATA_REGISTER, PROBE_DATA_SPACE)
+ #define PROBE_VARIABILITY_REG_DECL : register(PROBE_VARIABILITY_REGISTER, PROBE_VARIABILITY_SPACE)
+ #define PROBE_VARIABILITY_AVERAGE_REG_DECL : register(PROBE_VARIABILITY_AVERAGE_REGISTER, PROBE_VARIABILITY_SPACE)
+ #endif // RTXGI_DDGI_BINDLESS_RESOURCES
+
+#endif // RTXGI_DDGI_SHADER_REFLECTION || SPIRV
+
+// -------- ROOT / PUSH CONSTANT DECLARATIONS -----------------------------------------------------
+
+#include "include/ProbeCommon.hlsl"
+#include "include/DDGIRootConstants.hlsl"
+
+// -------- RESOURCE DECLARATIONS -----------------------------------------------------------------
+
+#if RTXGI_DDGI_BINDLESS_RESOURCES
+
+ #if RTXGI_BINDLESS_TYPE == RTXGI_BINDLESS_TYPE_RESOURCE_ARRAYS
+
+ // DDGIVolume constants structured buffer
+ RTXGI_VK_BINDING(VOLUME_CONSTS_REGISTER, VOLUME_CONSTS_SPACE)
+ StructuredBuffer<DDGIVolumeDescGPU> DDGIVolumes VOLUME_CONSTS_REG_DECL;
+
+ // DDGIVolume resource indices structured buffer
+ RTXGI_VK_BINDING(VOLUME_RESOURCES_REGISTER, VOLUME_RESOURCES_SPACE)
+ StructuredBuffer<DDGIVolumeResourceIndices> DDGIVolumeBindless VOLUME_RESOURCES_REG_DECL;
+
+ // DDGIVolume ray data, probe irradiance, probe distance, and probe data
+ RTXGI_VK_BINDING(RWTEX2DARRAY_REGISTER, RWTEX2DARRAY_SPACE)
+ RWTexture2DArray<float4> RWTex2DArray[] RWTEX2DARRAY_REG_DECL;
+
+ #endif
+
+#else
+
+ // DDGIVolume constants structured buffer
+ RTXGI_VK_BINDING(VOLUME_CONSTS_REGISTER, VOLUME_CONSTS_SPACE)
+ StructuredBuffer<DDGIVolumeDescGPU> DDGIVolumes VOLUME_CONSTS_REG_DECL;
+
+ // Probe data (world-space offsets and classification states)
+ RTXGI_VK_BINDING(PROBE_DATA_REGISTER, PROBE_DATA_SPACE)
+ RWTexture2DArray<float4> ProbeData PROBE_DATA_REG_DECL;
+
+ // Probe variability
+ RTXGI_VK_BINDING(PROBE_VARIABILITY_REGISTER, PROBE_VARIABILITY_SPACE)
+ RWTexture2DArray<float> ProbeVariability PROBE_VARIABILITY_REG_DECL;
+
+ // Probe variability average
+ RTXGI_VK_BINDING(PROBE_VARIABILITY_AVERAGE_REGISTER, PROBE_VARIABILITY_SPACE)
+ RWTexture2DArray<float2> ProbeVariabilityAverage PROBE_VARIABILITY_AVERAGE_REG_DECL;
+
+#endif // RTXGI_DDGI_BINDLESS_RESOURCES
+
+// -------- SHARED MEMORY DECLARATIONS ------------------------------------------------------------
+
+#define NUM_THREADS_X 4
+#define NUM_THREADS_Y 8
+#define NUM_THREADS_Z 4
+#define NUM_THREADS (NUM_THREADS_X * NUM_THREADS_Y * NUM_THREADS_Z)
+#define NUM_WAVES (NUM_THREADS / RTXGI_DDGI_WAVE_LANE_COUNT)
+
+groupshared float ThreadGroupSum[NUM_WAVES];
+groupshared uint MaxSumEntry;
+groupshared uint NumTotalSamples;
+
+// -------- HELPER FUNCTIONS ----------------------------------------------------------------------
+
+// Sums values in the ThreadGroupSum shared memory array, from 0 to MaxSumEntry
+// At the end of the function, ThreadGroupSum[0] should have the total of the whole array
+void reduceSharedMemorySum(uint ThreadIndexInGroup, uint waveIndex, uint waveLaneCount)
+{
+ uint numSharedMemoryEntries = MaxSumEntry + 1;
+ uint activeThreads = numSharedMemoryEntries;
+ while (activeThreads > 1)
+ {
+ bool usefulThread = ThreadIndexInGroup < activeThreads;
+ if (usefulThread)
+ {
+ float value = ThreadGroupSum[ThreadIndexInGroup];
+ GroupMemoryBarrierWithGroupSync();
+
+ float warpTotalValue = WaveActiveSum(value);
+
+ if (WaveIsFirstLane())
+ {
+ ThreadGroupSum[waveIndex] = warpTotalValue;
+ }
+ GroupMemoryBarrierWithGroupSync();
+ }
+ // Divide by wave size, rounding up (ceil)
+ activeThreads = (activeThreads + waveLaneCount - 1) / waveLaneCount;
+ }
+}
+
+// -------- ENTRY POINT ---------------------------------------------------------------------------
+
+[numthreads(NUM_THREADS_X, NUM_THREADS_Y, NUM_THREADS_Z)]
+void DDGIReductionCS(uint3 GroupID : SV_GroupID, uint3 GroupThreadID : SV_GroupThreadID, uint ThreadIndexInGroup : SV_GroupIndex)
+{
+ if (ThreadIndexInGroup == 0)
+ {
+ MaxSumEntry = 0;
+ NumTotalSamples = 0;
+ }
+ GroupMemoryBarrierWithGroupSync();
+
+ // Doing 4x2 samples per thread
+ const uint3 ThreadSampleFootprint = uint3(4, 2, 1);
+
+ uint3 groupCoordOffset = GroupID.xyz * uint3(NUM_THREADS_X, NUM_THREADS_Y, NUM_THREADS_Z) * ThreadSampleFootprint;
+ uint3 threadCoordInGroup = GroupThreadID.xyz;
+ uint3 threadCoordGlobal = groupCoordOffset + threadCoordInGroup * ThreadSampleFootprint;
+
+ uint volumeIndex = GetDDGIVolumeIndex();
+
+#if RTXGI_DDGI_BINDLESS_RESOURCES
+ #if RTXGI_BINDLESS_TYPE == RTXGI_BINDLESS_TYPE_DESCRIPTOR_HEAP
+ // Get the DDGIVolume constants structured buffer from the descriptor heap (SM6.6+ only)
+ StructuredBuffer<DDGIVolumeDescGPU> DDGIVolumes = ResourceDescriptorHeap[GetDDGIVolumeConstantsIndex()];
+ #endif
+#endif
+
+ // Get the volume's constants
+ DDGIVolumeDescGPU volume = UnpackDDGIVolumeDescGPU(DDGIVolumes[volumeIndex]);
+
+ // Get the volume's resources
+#if RTXGI_DDGI_BINDLESS_RESOURCES
+ #if RTXGI_BINDLESS_TYPE == RTXGI_BINDLESS_TYPE_DESCRIPTOR_HEAP
+
+ // Get the volume's resource indices from the descriptor heap (SM6.6+ only)
+ StructuredBuffer<DDGIVolumeResourceIndices> DDGIVolumeBindless = ResourceDescriptorHeap[GetDDGIVolumeResourceIndicesIndex()];
+ DDGIVolumeResourceIndices resourceIndices = DDGIVolumeBindless[volumeIndex];
+
+ // Get the volume's texture array UAVs from the descriptor heap (SM6.6+ only)
+ RWTexture2DArray<float> ProbeVariability = ResourceDescriptorHeap[resourceIndices.probeVariabilityUAVIndex];
+ RWTexture2DArray<float2> ProbeVariabilityAverage = ResourceDescriptorHeap[resourceIndices.probeVariabilityAverageUAVIndex];
+ RWTexture2DArray<float4> ProbeData = ResourceDescriptorHeap[resourceIndices.probeDataUAVIndex];
+
+ #elif RTXGI_BINDLESS_TYPE == RTXGI_BINDLESS_TYPE_RESOURCE_ARRAYS
+
+ // Get the volume's resource indices
+ DDGIVolumeResourceIndices resourceIndices = DDGIVolumeBindless[volumeIndex];
+
+ // Get the volume's texture array UAVs
+ RWTexture2DArray<float4> ProbeVariability = RWTex2DArray[resourceIndices.probeVariabilityUAVIndex];
+ RWTexture2DArray<float4> ProbeVariabilityAverage = RWTex2DArray[resourceIndices.probeVariabilityAverageUAVIndex];
+ RWTexture2DArray<float4> ProbeData = RWTex2DArray[resourceIndices.probeDataUAVIndex];
+
+ #endif
+#endif
+
+ uint waveLaneCount = WaveGetLaneCount();
+ uint wavesPerThreadGroup = NUM_THREADS / waveLaneCount;
+ uint waveIndex = ThreadIndexInGroup / waveLaneCount;
+
+ // Total size of the input variability texture
+ uint3 probeVariabilitySize = GetReductionInputSize();
+
+ float sampleSum = 0;
+ uint numSamples = 0;
+ for (uint i = 0; i < ThreadSampleFootprint.x; i++)
+ {
+ for (uint j = 0; j < ThreadSampleFootprint.y; j++)
+ {
+ uint3 sampleCoord = threadCoordGlobal + uint3(i, j, 0);
+ // Iterating over non-border samples of the irradiance texture
+ // Calling GetProbeIndex with NUM_INTERIOR_TEXELS (instead of NUM_TEXELS) to make
+ // sample coordinates line up with probe indices and avoid sampling border texels
+ int probeIndex = DDGIGetProbeIndex(sampleCoord, RTXGI_DDGI_PROBE_NUM_INTERIOR_TEXELS, volume);
+ bool sampleInBounds = all(sampleCoord < probeVariabilitySize);
+ if (sampleInBounds)
+ {
+ float value = ProbeVariability[sampleCoord].r;
+
+ // Skip inactive probes
+ if (volume.probeClassificationEnabled)
+ {
+ uint3 probeDataCoords = DDGIGetProbeTexelCoords(probeIndex, volume);
+ int probeState = ProbeData[probeDataCoords].w;
+ if (probeState == RTXGI_DDGI_PROBE_STATE_INACTIVE)
+ {
+ value = 0.f;
+ continue;
+ }
+ }
+
+ sampleSum += value;
+ numSamples++;
+ }
+ }
+ }
+
+ // Sum up the warp
+ float waveTotalValue = WaveActiveSum(sampleSum);
+ // Sum up useful sample count
+ uint usefulSampleCount = WaveActiveSum(numSamples);
+ // Write sum and sample count for this wave
+ if (WaveIsFirstLane())
+ {
+ ThreadGroupSum[waveIndex] = waveTotalValue;
+ InterlockedMax(MaxSumEntry, waveIndex);
+ InterlockedAdd(NumTotalSamples, usefulSampleCount);
+ }
+ GroupMemoryBarrierWithGroupSync();
+ reduceSharedMemorySum(ThreadIndexInGroup, waveIndex, waveLaneCount);
+
+ if (ThreadIndexInGroup == 0)
+ {
+ float TotalPossibleSamples = NUM_THREADS * ThreadSampleFootprint.x * ThreadSampleFootprint.y;
+ // Average value for the samples we took
+ ProbeVariabilityAverage[GroupID.xyz].r = NumTotalSamples > 0 ? ThreadGroupSum[0] / NumTotalSamples : 0;
+ // Normalizing "weight" factor for this thread group, to allow partial thread groups to average properly with full groups
+ ProbeVariabilityAverage[GroupID.xyz].g = NumTotalSamples / TotalPossibleSamples;
+ }
+}
+
+// -------- SHARED MEMORY DECLARATIONS ------------------------------------------------------------
+
+groupshared float ThreadGroupAverage[NUM_WAVES];
+groupshared uint MaxAverageEntry;
+groupshared float ThreadGroupWeight[NUM_WAVES];
+
+// -------- HELPER FUNCTIONS ----------------------------------------------------------------------
+
+// Computes a weighted average of the ThreadGroupAverage shared memory array, from 0 to MaxAverageEntry
+// At the end of the function, ThreadGroupAverage[0] should have the average of the whole array
+// ThreadGroupWeight[0] will have the total weight of this thread group to be used when averaging with other groups
+void reduceSharedMemoryAverage(uint ThreadIndexInGroup, uint waveIndex, uint waveLaneCount)
+{
+ uint numSharedMemoryEntries = MaxAverageEntry + 1;
+ uint activeThreads = numSharedMemoryEntries;
+ while (activeThreads > 1)
+ {
+ bool usefulThread = ThreadIndexInGroup < activeThreads;
+ if (usefulThread)
+ {
+ float value = ThreadGroupAverage[ThreadIndexInGroup];
+ float weight = ThreadGroupWeight[ThreadIndexInGroup];
+ GroupMemoryBarrierWithGroupSync();
+
+ float waveTotalValue = WaveActiveSum(weight*value);
+ float waveTotalWeight = WaveActiveSum(weight);
+ float TotalPossibleWeight = WaveActiveCountBits(true);
+
+ if (WaveIsFirstLane())
+ {
+ ThreadGroupAverage[waveIndex] = waveTotalValue / waveTotalWeight;
+ ThreadGroupWeight[waveIndex] = waveTotalWeight / TotalPossibleWeight;
+ }
+ GroupMemoryBarrierWithGroupSync();
+ }
+ activeThreads = (activeThreads + waveLaneCount - 1) / waveLaneCount;
+ }
+}
+
+// -------- ENTRY POINT ---------------------------------------------------------------------------
+
+[numthreads(NUM_THREADS_X, NUM_THREADS_Y, NUM_THREADS_Z)]
+void DDGIExtraReductionCS(uint3 GroupID : SV_GroupID, uint3 GroupThreadID : SV_GroupThreadID, uint ThreadIndexInGroup : SV_GroupIndex)
+{
+ if (ThreadIndexInGroup == 0)
+ {
+ MaxAverageEntry = 0;
+ }
+ GroupMemoryBarrierWithGroupSync();
+
+ uint volumeIndex = GetDDGIVolumeIndex();
+#if RTXGI_DDGI_BINDLESS_RESOURCES
+ #if RTXGI_BINDLESS_TYPE == RTXGI_BINDLESS_TYPE_DESCRIPTOR_HEAP
+ // Get the DDGIVolume constants structured buffer from the descriptor heap (SM6.6+ only)
+ StructuredBuffer<DDGIVolumeDescGPU> DDGIVolumes = ResourceDescriptorHeap[GetDDGIVolumeConstantsIndex()];
+ #endif
+#endif
+
+ // Get the volume's constants
+ DDGIVolumeDescGPU volume = UnpackDDGIVolumeDescGPU(DDGIVolumes[volumeIndex]);
+
+ // Get the volume's resources
+#if RTXGI_DDGI_BINDLESS_RESOURCES
+ #if RTXGI_BINDLESS_TYPE == RTXGI_BINDLESS_TYPE_DESCRIPTOR_HEAP
+
+ // Get the volume's resource indices from the descriptor heap (SM6.6+ only)
+ StructuredBuffer<DDGIVolumeResourceIndices> DDGIVolumeBindless = ResourceDescriptorHeap[GetDDGIVolumeResourceIndicesIndex()];
+ DDGIVolumeResourceIndices resourceIndices = DDGIVolumeBindless[volumeIndex];
+
+ // Get the volume's texture array UAVs from the descriptor heap (SM6.6+ only)
+ RWTexture2DArray<float4> ProbeVariability = ResourceDescriptorHeap[resourceIndices.probeVariabilityUAVIndex];
+ RWTexture2DArray<float4> ProbeVariabilityAverage = ResourceDescriptorHeap[resourceIndices.probeVariabilityAverageUAVIndex];
+ RWTexture2DArray<float4> ProbeData = ResourceDescriptorHeap[resourceIndices.probeDataUAVIndex];
+
+ #elif RTXGI_BINDLESS_TYPE == RTXGI_BINDLESS_TYPE_RESOURCE_ARRAYS
+
+ // Get the volume's resource indices
+ DDGIVolumeResourceIndices resourceIndices = DDGIVolumeBindless[volumeIndex];
+
+ // Get the volume's texture array UAVs
+ RWTexture2DArray<float4> ProbeVariability = RWTex2DArray[resourceIndices.probeVariabilityUAVIndex];
+ RWTexture2DArray<float4> ProbeVariabilityAverage = RWTex2DArray[resourceIndices.probeVariabilityAverageUAVIndex];
+ RWTexture2DArray<float4> ProbeData = RWTex2DArray[resourceIndices.probeDataUAVIndex];
+
+ #endif
+#endif
+
+ uint waveLaneCount = WaveGetLaneCount();
+ uint wavesPerThreadGroup = NUM_THREADS / waveLaneCount;
+ uint waveIndex = ThreadIndexInGroup / waveLaneCount;
+
+ // Each thread samples a 4x2 texel footprint
+ const uint3 ThreadSampleFootprint = uint3(4, 2, 1);
+
+ uint3 groupCoordOffset = GroupID.xyz * uint3(NUM_THREADS_X, NUM_THREADS_Y, NUM_THREADS_Z) * ThreadSampleFootprint;
+ uint3 threadCoordInGroup = GroupThreadID.xyz;
+ uint3 threadCoordGlobal = groupCoordOffset + threadCoordInGroup * ThreadSampleFootprint;
+ uint3 inputSize = GetReductionInputSize();
+
+ bool footprintInBounds = all(threadCoordGlobal < inputSize);
+ float threadFootprintValueSum = 0;
+ float threadFootprintWeightSum = 0;
+
+ if (footprintInBounds)
+ {
+ for (uint i = 0; i < ThreadSampleFootprint.x; i++)
+ {
+ for (uint j = 0; j < ThreadSampleFootprint.y; j++)
+ {
+ uint3 sampleCoord = threadCoordGlobal + uint3(i, j, 0);
+ bool sampleInBounds = all(sampleCoord < inputSize);
+ if (sampleInBounds)
+ {
+ float value = ProbeVariabilityAverage[sampleCoord].r;
+ float weight = ProbeVariabilityAverage[sampleCoord].g;
+ threadFootprintValueSum += weight * value;
+ threadFootprintWeightSum += weight;
+ }
+ }
+ }
+ }
+ float threadAverageValue = footprintInBounds ? threadFootprintValueSum / threadFootprintWeightSum : 0;
+ // Per-thread weight will be 1.0 if thread sampled all 4x2 pixels, 0.125 if it only sampled one
+ float ThreadTotalPossibleWeight = ThreadSampleFootprint.x * ThreadSampleFootprint.y;
+ float threadWeight = threadFootprintWeightSum / ThreadTotalPossibleWeight;
+
+ // Sum up the warp
+ float waveTotalValue = WaveActiveSum(threadWeight * threadAverageValue);
+ float waveTotalWeight = WaveActiveSum(threadWeight);
+ float waveTotalPossibleWeight = waveLaneCount * ThreadTotalPossibleWeight;
+
+ if (WaveIsFirstLane() && WaveActiveAnyTrue(footprintInBounds))
+ {
+ ThreadGroupAverage[waveIndex] = waveTotalValue / waveTotalWeight;
+ ThreadGroupWeight[waveIndex] = waveTotalWeight / waveTotalPossibleWeight;
+ InterlockedMax(MaxAverageEntry, waveIndex);
+ }
+
+ GroupMemoryBarrierWithGroupSync();
+ reduceSharedMemoryAverage(ThreadIndexInGroup, waveIndex, waveLaneCount);
+ if (ThreadIndexInGroup == 0)
+ {
+ ProbeVariabilityAverage[GroupID.xyz].r = ThreadGroupAverage[0];
+ ProbeVariabilityAverage[GroupID.xyz].g = ThreadGroupWeight[0];
+ }
+}
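In the entry point above, a thread whose 4x2 footprint hangs off the edge of the input contributes a proportionally smaller weight (down to 0.125 for a single in-bounds texel). A small illustrative helper, with assumed names that are not part of the SDK, shows the same bookkeeping:

```cpp
#include <cassert>
#include <algorithm>

// Illustrative only: computes the per-thread weight for a 4x2 texel
// footprint, matching ThreadSampleFootprint in the shader. A thread whose
// footprint is fully inside the input gets weight 1.0; one that only
// sampled a single texel gets 1/8 = 0.125.
constexpr int kFootprintX = 4;
constexpr int kFootprintY = 2;

float threadFootprintWeight(int coordX, int coordY, int sizeX, int sizeY)
{
    // Count how many footprint texels land inside the input texture
    int inX = std::max(0, std::min(coordX + kFootprintX, sizeX) - coordX);
    int inY = std::max(0, std::min(coordY + kFootprintY, sizeY) - coordY);
    return (inX * inY) / float(kFootprintX * kFootprintY);
}
```

This assumes every in-bounds sample carries unit weight; in the shader the per-sample weight is read from the `.g` channel of the input texture, so the real value can be smaller.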
diff --git a/rtxgi-sdk/shaders/ddgi/include/Common.hlsl b/rtxgi-sdk/shaders/ddgi/include/Common.hlsl
index d60d12e..6b71345 100644
--- a/rtxgi-sdk/shaders/ddgi/include/Common.hlsl
+++ b/rtxgi-sdk/shaders/ddgi/include/Common.hlsl
@@ -27,10 +27,12 @@
// Texture formats (matches EDDGIVolumeTextureFormat)
#define RTXGI_DDGI_VOLUME_TEXTURE_FORMAT_U32 0
-#define RTXGI_DDGI_VOLUME_TEXTURE_FORMAT_F16x2 1
-#define RTXGI_DDGI_VOLUME_TEXTURE_FORMAT_F16x4 2
-#define RTXGI_DDGI_VOLUME_TEXTURE_FORMAT_F32x2 3
-#define RTXGI_DDGI_VOLUME_TEXTURE_FORMAT_F32x4 4
+#define RTXGI_DDGI_VOLUME_TEXTURE_FORMAT_F16 1
+#define RTXGI_DDGI_VOLUME_TEXTURE_FORMAT_F16x2 2
+#define RTXGI_DDGI_VOLUME_TEXTURE_FORMAT_F16x4 3
+#define RTXGI_DDGI_VOLUME_TEXTURE_FORMAT_F32 4
+#define RTXGI_DDGI_VOLUME_TEXTURE_FORMAT_F32x2 5
+#define RTXGI_DDGI_VOLUME_TEXTURE_FORMAT_F32x4 6
// The number of fixed rays that are used by probe relocation and classification.
// These rays directions are always the same to produce temporally stable results.
diff --git a/rtxgi-sdk/shaders/ddgi/include/DDGIRootConstants.hlsl b/rtxgi-sdk/shaders/ddgi/include/DDGIRootConstants.hlsl
index 36594bb..e08f174 100644
--- a/rtxgi-sdk/shaders/ddgi/include/DDGIRootConstants.hlsl
+++ b/rtxgi-sdk/shaders/ddgi/include/DDGIRootConstants.hlsl
@@ -37,6 +37,7 @@
uint GetDDGIVolumeIndex() { return DDGI.volumeIndex; }
uint GetDDGIVolumeConstantsIndex() { return DDGI.volumeConstantsIndex; }
uint GetDDGIVolumeResourceIndicesIndex() { return DDGI.volumeResourceIndicesIndex; }
+ uint3 GetReductionInputSize() { return uint3(DDGI.reductionInputSizeX, DDGI.reductionInputSizeY, DDGI.reductionInputSizeZ); }
#else // VULKAN
@@ -77,22 +78,33 @@
{
// IMPORTANT: insert padding to match the layout of your push constants!
// The padding below matches the size of the Test Harness' "GlobalConstants" struct
- // with 44 float values before the DDGIRootConstants (see test-harness/include/graphics/Types.h)
+ // with 48 float values before the DDGIRootConstants (see test-harness/include/graphics/Types.h)
float4x4 padding0;
float4x4 padding1;
- float4x3 padding2;
+ float4x4 padding2;
uint RTXGI_PUSH_CONSTS_FIELD_DDGI_VOLUME_INDEX_NAME;
- uint3 ddgi_pad;
+ uint2 ddgi_pad0;
+ uint RTXGI_PUSH_CONSTS_FIELD_DDGI_REDUCTION_INPUT_SIZE_X_NAME;
+ uint RTXGI_PUSH_CONSTS_FIELD_DDGI_REDUCTION_INPUT_SIZE_Y_NAME;
+ uint RTXGI_PUSH_CONSTS_FIELD_DDGI_REDUCTION_INPUT_SIZE_Z_NAME;
+ uint2 ddgi_pad1;
};
[[vk::push_constant]] RTXGI_PUSH_CONSTS_STRUCT_NAME RTXGI_PUSH_CONSTS_VARIABLE_NAME;
#endif
uint GetDDGIVolumeIndex() { return RTXGI_PUSH_CONSTS_VARIABLE_NAME.RTXGI_PUSH_CONSTS_FIELD_DDGI_VOLUME_INDEX_NAME; }
+ uint3 GetReductionInputSize()
+ {
+ return uint3(RTXGI_PUSH_CONSTS_VARIABLE_NAME.RTXGI_PUSH_CONSTS_FIELD_DDGI_REDUCTION_INPUT_SIZE_X_NAME,
+ RTXGI_PUSH_CONSTS_VARIABLE_NAME.RTXGI_PUSH_CONSTS_FIELD_DDGI_REDUCTION_INPUT_SIZE_Y_NAME,
+ RTXGI_PUSH_CONSTS_VARIABLE_NAME.RTXGI_PUSH_CONSTS_FIELD_DDGI_REDUCTION_INPUT_SIZE_Z_NAME);
+ }
#elif RTXGI_PUSH_CONSTS_TYPE == RTXGI_PUSH_CONSTS_TYPE_SDK
[[vk::push_constant]] ConstantBuffer<DDGIRootConstants> DDGI;
uint GetDDGIVolumeIndex() { return DDGI.volumeIndex; }
+ uint3 GetReductionInputSize() { return uint3(DDGI.reductionInputSizeX, DDGI.reductionInputSizeY, DDGI.reductionInputSizeZ); }
#endif // RTXGI_PUSH_CONSTS_TYPE
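The padding comment above says 48 floats now precede the DDGI fields (three `float4x4` matrices = 192 bytes). A hypothetical host-side mirror struct (the name is invented, not the SDK's) can verify the offsets the shader layout implies:

```cpp
#include <cassert>
#include <cstdint>
#include <cstddef>

// Hypothetical mirror of the Vulkan push-constants layout above, assuming
// the Test Harness' 48 floats of padding (padding0..padding2) come first.
// volumeIndex should land at byte offset 192 and the whole block at 224.
struct DDGIPushConstsMirror
{
    float padding[48];          // three float4x4 matrices
    uint32_t volumeIndex;       // RTXGI_PUSH_CONSTS_FIELD_DDGI_VOLUME_INDEX_NAME
    uint32_t ddgi_pad0[2];
    uint32_t reductionInputSizeX;
    uint32_t reductionInputSizeY;
    uint32_t reductionInputSizeZ;
    uint32_t ddgi_pad1[2];
};
```

Keeping a mirror like this next to the application's `GlobalConstants` struct makes layout drift (such as the 44-to-48 float change in this diff) fail loudly at compile or test time instead of silently misreading push constants.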
diff --git a/rtxgi-sdk/shaders/ddgi/include/validation/ProbeBlendingDefines.hlsl b/rtxgi-sdk/shaders/ddgi/include/validation/ProbeBlendingDefines.hlsl
index 63296bf..6c60164 100644
--- a/rtxgi-sdk/shaders/ddgi/include/validation/ProbeBlendingDefines.hlsl
+++ b/rtxgi-sdk/shaders/ddgi/include/validation/ProbeBlendingDefines.hlsl
@@ -36,6 +36,8 @@
#define RAY_DATA_SPACE 0
#if RTXGI_DDGI_BLEND_RADIANCE
#define OUTPUT_REGISTER 2
+ #define PROBE_VARIABILITY_REGISTER 5
+ #define PROBE_VARIABILITY_SPACE 0
#else
#define OUTPUT_REGISTER 3
#endif
@@ -51,6 +53,8 @@
#define RAY_DATA_SPACE space1
#if RTXGI_DDGI_BLEND_RADIANCE
#define OUTPUT_REGISTER u1
+ #define PROBE_VARIABILITY_REGISTER u4
+ #define PROBE_VARIABILITY_SPACE space1
#else
#define OUTPUT_REGISTER u2
#endif
@@ -160,6 +164,24 @@
#error Required define PROBE_DATA_SPACE is not defined for ProbeBlendingCS.hlsl!
#endif
+
+ #if RTXGI_DDGI_BLEND_RADIANCE
+ // PROBE_VARIABILITY_REGISTER and PROBE_VARIABILITY_SPACE must be passed in as defines at shader compilation time *when not using reflection*
+ // and when radiance blending is enabled.
+ // These defines specify the shader register and space used for the DDGIVolume probe variability texture array.
+ // Ex: PROBE_VARIABILITY_REGISTER u2
+ // Ex: PROBE_VARIABILITY_SPACE space1
+
+ #ifndef PROBE_VARIABILITY_REGISTER
+ #error Required define PROBE_VARIABILITY_REGISTER is not defined for ProbeBlendingCS.hlsl!
+ #endif
+
+ #ifndef PROBE_VARIABILITY_SPACE
+ #error Required define PROBE_VARIABILITY_SPACE is not defined for ProbeBlendingCS.hlsl!
+ #endif
+
+ #endif // RTXGI_DDGI_BLEND_RADIANCE
+
#endif // RTXGI_DDGI_BINDLESS_RESOURCES
#endif // !RTXGI_DDGI_SHADER_REFLECTION
#endif // RTXGI_DDGI_BINDLESS_RESOURCES
diff --git a/rtxgi-sdk/shaders/ddgi/include/validation/ReductionDefines.hlsl b/rtxgi-sdk/shaders/ddgi/include/validation/ReductionDefines.hlsl
new file mode 100644
index 0000000..9a4c27c
--- /dev/null
+++ b/rtxgi-sdk/shaders/ddgi/include/validation/ReductionDefines.hlsl
@@ -0,0 +1,170 @@
+/*
+* Copyright (c) 2019-2022, NVIDIA CORPORATION. All rights reserved.
+*
+* NVIDIA CORPORATION and its licensors retain all intellectual property
+* and proprietary rights in and to this software, related documentation
+* and any modifications thereto. Any use, reproduction, disclosure or
+* distribution of this software and related documentation without an express
+* license agreement from NVIDIA CORPORATION is strictly prohibited.
+*/
+
+// RTXGI_DDGI_RESOURCE_MANAGEMENT must be passed in as a define at shader compilation time.
+// This define specifies if the shader resources are managed by the SDK (and not the application).
+// Ex: RTXGI_DDGI_RESOURCE_MANAGEMENT [0|1]
+#ifndef RTXGI_DDGI_RESOURCE_MANAGEMENT
+ #error Required define RTXGI_DDGI_RESOURCE_MANAGEMENT is not defined for ReductionCS.hlsl!
+#endif
+
+// -------- SHADER REFLECTION DEFINES -------------------------------------------------------------
+
+// RTXGI_DDGI_SHADER_REFLECTION must be passed in as a define at shader compilation time.
+// This define specifies if the shader resources will be determined using shader reflection.
+// Ex: RTXGI_DDGI_SHADER_REFLECTION [0|1]
+#ifndef RTXGI_DDGI_SHADER_REFLECTION
+ #error Required define RTXGI_DDGI_SHADER_REFLECTION is not defined for ReductionCS.hlsl!
+#else
+ #if !RTXGI_DDGI_SHADER_REFLECTION
+ // REGISTERs AND SPACEs (SHADER REFLECTION DISABLED)
+
+ // MANAGED RESOURCES DEFINES
+ #if RTXGI_DDGI_RESOURCE_MANAGEMENT
+ #ifdef __spirv__
+ #define RTXGI_PUSH_CONSTS_TYPE 1
+ #define VOLUME_CONSTS_REGISTER 0
+ #define VOLUME_CONSTS_SPACE 0
+ #define PROBE_VARIABILITY_REGISTER 5
+ #define PROBE_VARIABILITY_AVERAGE_REGISTER 6
+ #define PROBE_VARIABILITY_SPACE 0
+ #define PROBE_DATA_REGISTER 4
+ #define PROBE_DATA_SPACE 0
+ #else
+ #define CONSTS_REGISTER b0
+ #define CONSTS_SPACE space1
+ #define VOLUME_CONSTS_REGISTER t0
+ #define VOLUME_CONSTS_SPACE space1
+ #define PROBE_VARIABILITY_REGISTER u4
+ #define PROBE_VARIABILITY_AVERAGE_REGISTER u5
+ #define PROBE_VARIABILITY_SPACE space1
+ #define PROBE_DATA_REGISTER u3
+ #define PROBE_DATA_SPACE space1
+ #endif
+ #endif // RTXGI_DDGI_RESOURCE_MANAGEMENT
+
+ // VOLUME_CONSTS_REGISTER and VOLUME_CONSTS_SPACE must be passed in as defines at shader compilation time *when not using reflection*.
+ // These defines specify the shader register and space used for the DDGIVolumeDescGPUPacked structured buffer.
+ // Ex: VOLUME_CONSTS_REGISTER t5
+ // Ex: VOLUME_CONSTS_SPACE space0
+ #ifndef VOLUME_CONSTS_REGISTER
+ #error Required define VOLUME_CONSTS_REGISTER is not defined for ReductionCS.hlsl!
+ #endif
+ #ifndef VOLUME_CONSTS_SPACE
+ #error Required define VOLUME_CONSTS_SPACE is not defined for ReductionCS.hlsl!
+ #endif
+ #endif // !RTXGI_DDGI_SHADER_REFLECTION
+#endif // RTXGI_DDGI_SHADER_REFLECTION
+
+// -------- RESOURCE BINDING DEFINES --------------------------------------------------------------
+
+// RTXGI_DDGI_BINDLESS_RESOURCES must be passed in as a define at shader compilation time.
+// This define specifies whether resources will be accessed bindlessly or not.
+// Ex: RTXGI_DDGI_BINDLESS_RESOURCES [0|1]
+#ifndef RTXGI_DDGI_BINDLESS_RESOURCES
+ #error Required define RTXGI_DDGI_BINDLESS_RESOURCES is not defined for ReductionCS.hlsl!
+#else
+ #if !RTXGI_DDGI_SHADER_REFLECTION
+ // Shader Reflection DISABLED
+ #if RTXGI_DDGI_BINDLESS_RESOURCES
+ // Bindless Resources ENABLED
+
+ // RTXGI_BINDLESS_TYPE must be passed in as a define at shader compilation time when *bindless resources are used*.
+ // This define specifies whether bindless resources will be accessed through bindless resource arrays or the (D3D12) descriptor heap.
+ // Ex: RTXGI_BINDLESS_TYPE [RTXGI_BINDLESS_TYPE_RESOURCE_ARRAYS(0)|RTXGI_BINDLESS_TYPE_DESCRIPTOR_HEAP(1)]
+ #ifndef RTXGI_BINDLESS_TYPE
+ #error Required define RTXGI_BINDLESS_TYPE is not defined for ReductionCS.hlsl!
+ #endif
+
+ #if RTXGI_BINDLESS_TYPE == RTXGI_BINDLESS_TYPE_RESOURCE_ARRAYS
+ // Bindless resources are accessed using SM6.5 and below style resource arrays
+
+ // VOLUME_RESOURCES_REGISTER and VOLUME_RESOURCES_SPACE must be passed in as defines at shader compilation time
+ // when *not* using reflection and when using bindless resource arrays.
+ // These defines specify the shader register and space used for the DDGIVolumeResourceIndices structured buffer.
+ // Ex: VOLUME_RESOURCES_REGISTER t6
+ // Ex: VOLUME_RESOURCES_SPACE space0
+ #ifndef VOLUME_RESOURCES_REGISTER
+ #error Required define VOLUME_RESOURCES_REGISTER is not defined for ReductionCS.hlsl!
+ #endif
+ #ifndef VOLUME_RESOURCES_SPACE
+ #error Required define VOLUME_RESOURCES_SPACE is not defined for ReductionCS.hlsl!
+ #endif
+
+ // RWTEX2DARRAY_REGISTER and RWTEX2DARRAY_SPACE must be passed in as defines at shader compilation time
+ // when *not* using reflection and when using bindless resource arrays.
+ // These defines specify the shader register and space of the RWTexture2DArray resource array that the DDGIVolume's
+ // probe variability, variability average, and probe data texture arrays are retrieved from.
+ // Ex: RWTEX2DARRAY_REGISTER u6
+ // Ex: RWTEX2DARRAY_SPACE space1
+ #ifndef RWTEX2DARRAY_REGISTER
+ #error Required bindless mode define RWTEX2DARRAY_REGISTER is not defined for ReductionCS.hlsl!
+ #endif
+ #ifndef RWTEX2DARRAY_SPACE
+ #error Required bindless mode define RWTEX2DARRAY_SPACE is not defined for ReductionCS.hlsl!
+ #endif
+
+ #endif // RTXGI_BINDLESS_TYPE == RTXGI_BINDLESS_TYPE_RESOURCE_ARRAYS
+
+ #else // RTXGI_DDGI_BINDLESS_RESOURCES
+
+ // Bindless Resources DISABLED (BOUND RESOURCE DEFINES)
+
+ // PROBE_DATA_REGISTER and PROBE_DATA_SPACE must be passed in as defines at shader compilation time *when not using reflection*.
+ // These defines specify the shader register and space used for the DDGIVolume probe data texture array.
+ // Ex: PROBE_DATA_REGISTER u2
+ // Ex: PROBE_DATA_SPACE space1
+ #ifndef PROBE_DATA_REGISTER
+ #error Required define PROBE_DATA_REGISTER is not defined for ReductionCS.hlsl!
+ #endif
+ #ifndef PROBE_DATA_SPACE
+ #error Required define PROBE_DATA_SPACE is not defined for ReductionCS.hlsl!
+ #endif
+
+ // PROBE_VARIABILITY_REGISTER, PROBE_VARIABILITY_AVERAGE_REGISTER, and PROBE_VARIABILITY_SPACE must be passed in as
+ // defines at shader compilation time *when not using reflection*.
+ // These defines specify the shader registers and space used for the DDGIVolume probe variability texture arrays.
+ // Ex: PROBE_VARIABILITY_REGISTER u2
+ // Ex: PROBE_VARIABILITY_AVERAGE_REGISTER u3
+ // Ex: PROBE_VARIABILITY_SPACE space1
+
+ #ifndef PROBE_VARIABILITY_REGISTER
+ #error Required define PROBE_VARIABILITY_REGISTER is not defined for ReductionCS.hlsl!
+ #endif
+
+ #ifndef PROBE_VARIABILITY_AVERAGE_REGISTER
+ #error Required define PROBE_VARIABILITY_AVERAGE_REGISTER is not defined for ReductionCS.hlsl!
+ #endif
+
+ #ifndef PROBE_VARIABILITY_SPACE
+ #error Required define PROBE_VARIABILITY_SPACE is not defined for ReductionCS.hlsl!
+ #endif
+
+ #endif // RTXGI_DDGI_BINDLESS_RESOURCES
+ #endif // !RTXGI_DDGI_SHADER_REFLECTION
+#endif // RTXGI_DDGI_BINDLESS_RESOURCES
+
+// -------- CONFIGURATION DEFINES -----------------------------------------------------------------
+
+// RTXGI_DDGI_PROBE_NUM_INTERIOR_TEXELS must be passed in as a define at shader compilation time.
+// This define specifies the number of texels in a single dimension of a probe *excluding* the 1-texel probe border.
+// Ex: RTXGI_DDGI_PROBE_NUM_INTERIOR_TEXELS 6 => irradiance data is 6x6 texels (for a single probe)
+// Ex: RTXGI_DDGI_PROBE_NUM_INTERIOR_TEXELS 14 => distance data is 14x14 texels (for a single probe)
+#ifndef RTXGI_DDGI_PROBE_NUM_INTERIOR_TEXELS
+ #error Required define RTXGI_DDGI_PROBE_NUM_INTERIOR_TEXELS is not defined for ReductionCS.hlsl!
+#endif
+
+// RTXGI_DDGI_WAVE_LANE_COUNT must be passed in as a define at shader compilation time.
+// This define specifies the number of threads in a wave, which is needed to size the required shared memory.
+// Ex: RTXGI_DDGI_WAVE_LANE_COUNT 32 => 32 threads in a wave
+#ifndef RTXGI_DDGI_WAVE_LANE_COUNT
+ #error Required define RTXGI_DDGI_WAVE_LANE_COUNT is not defined for ReductionCS.hlsl!
+#endif
+
+// -------------------------------------------------------------------------------------------
diff --git a/rtxgi-sdk/src/ddgi/DDGIVolume.cpp b/rtxgi-sdk/src/ddgi/DDGIVolume.cpp
index 4a3ceaa..2f83854 100644
--- a/rtxgi-sdk/src/ddgi/DDGIVolume.cpp
+++ b/rtxgi-sdk/src/ddgi/DDGIVolume.cpp
@@ -25,7 +25,7 @@ namespace rtxgi
void SetInsertPerfMarkers(bool value) { bInsertPerfMarkers = value; }
int GetDDGIVolumeNumRTVDescriptors() { return 2; }
- int GetDDGIVolumeNumTex2DArrayDescriptors() { return 4; }
+ int GetDDGIVolumeNumTex2DArrayDescriptors() { return 6; }
int GetDDGIVolumeNumResourceDescriptors() { return 2 * GetDDGIVolumeNumTex2DArrayDescriptors(); } // Multiplied by 2 to account for UAV *and* SRV descriptors
bool ValidateShaderBytecode(const ShaderBytecode& bytecode)
@@ -74,6 +74,25 @@ namespace rtxgi
width *= (uint32_t)(desc.probeNumDistanceTexels);
height *= (uint32_t)(desc.probeNumDistanceTexels);
}
+ else if (type == EDDGIVolumeTextureType::Variability)
+ {
+ width *= (uint32_t)(desc.probeNumIrradianceInteriorTexels);
+ height *= (uint32_t)(desc.probeNumIrradianceInteriorTexels);
+ }
+ else if (type == EDDGIVolumeTextureType::VariabilityAverage)
+ {
+ // Start with the probe variability texture dimensions
+ width *= (uint32_t)(desc.probeNumIrradianceInteriorTexels);
+ height *= (uint32_t)(desc.probeNumIrradianceInteriorTexels);
+ // Divide by the thread group dimensions; should match NUM_THREADS_XYZ in ReductionCS.hlsl
+ const uint3 NumThreadsInGroup = { 4, 8, 4 };
+ // Also divide by the per-thread sample footprint; should match ThreadSampleFootprint in ReductionCS.hlsl
+ const uint3 DimensionScale = { NumThreadsInGroup.x * 4, NumThreadsInGroup.y * 2, NumThreadsInGroup.z };
+ // The variability average texture is the variability texture divided by the thread group footprint, rounded up
+ width = (width + DimensionScale.x - 1) / DimensionScale.x;
+ height = (height + DimensionScale.y - 1) / DimensionScale.y;
+ arraySize = (arraySize + DimensionScale.z - 1) / DimensionScale.z;
+ }
}
}
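The `VariabilityAverage` sizing added above ceil-divides the variability texture by the block each thread group covers: 4x8x4 threads with a 4x2 per-thread footprint cover a 16x16x4 block. A standalone sketch of that arithmetic (invented names, mirroring the SDK constants):

```cpp
#include <cassert>
#include <cstdint>

// Sketch of the VariabilityAverage sizing: each 4x8x4 thread group, with a
// 4x2 per-thread sample footprint, reduces a 16x16x4 block of the
// variability texture to a single output texel.
struct Dims { uint32_t x, y, z; };

Dims variabilityAverageDims(Dims variability)
{
    const Dims scale = { 4 * 4, 8 * 2, 4 }; // NumThreadsInGroup * ThreadSampleFootprint
    // Ceil-division so partial blocks at the texture edge still get a texel
    return { (variability.x + scale.x - 1) / scale.x,
             (variability.y + scale.y - 1) / scale.y,
             (variability.z + scale.z - 1) / scale.z };
}
```

These constants must stay in lockstep with `NUM_THREADS_XYZ` and `ThreadSampleFootprint` in ReductionCS.hlsl, which is why the SDK comments call that out explicitly.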
@@ -90,6 +109,7 @@ namespace rtxgi
if(m_desc.movementType == EDDGIVolumeMovementType::Scrolling) ComputeScrolling();
}
+#if _DEBUG
void DDGIVolumeBase::ValidatePackedData(const DDGIVolumeDescGPUPacked packed) const
{
DDGIVolumeDescGPU l = UnpackDDGIVolumeDescGPU(packed);
@@ -120,6 +140,7 @@ namespace rtxgi
assert(l.probeIrradianceFormat == r.probeIrradianceFormat);
assert(l.probeRelocationEnabled == r.probeRelocationEnabled);
assert(l.probeClassificationEnabled == r.probeClassificationEnabled);
+ assert(l.probeVariabilityEnabled == r.probeVariabilityEnabled);
assert(l.probeScrollClear[0] == r.probeScrollClear[0]);
assert(l.probeScrollClear[1] == r.probeScrollClear[1]);
assert(l.probeScrollClear[2] == r.probeScrollClear[2]);
@@ -127,6 +148,7 @@ namespace rtxgi
assert(l.probeScrollDirections[1] == r.probeScrollDirections[1]);
assert(l.probeScrollDirections[2] == r.probeScrollDirections[2]);
}
+#endif
//------------------------------------------------------------------------
// Getters
@@ -168,6 +190,7 @@ namespace rtxgi
descGPU.probeIrradianceFormat = static_cast<uint32_t>(m_desc.probeIrradianceFormat);
descGPU.probeRelocationEnabled = m_desc.probeRelocationEnabled;
descGPU.probeClassificationEnabled = m_desc.probeClassificationEnabled;
+ descGPU.probeVariabilityEnabled = m_desc.probeVariabilityEnabled;
descGPU.probeScrollClear[0] = m_probeScrollClear[0];
descGPU.probeScrollClear[1] = m_probeScrollClear[1];
descGPU.probeScrollClear[2] = m_probeScrollClear[2];
@@ -267,6 +290,8 @@ namespace rtxgi
uint32_t numIrradianceBytesPerTexel = 0;
uint32_t numDistanceBytesPerTexel = 0;
uint32_t numProbeDataBytesPerTexel = 0;
+ uint32_t numProbeVariabilityBytesPerTexel = 0;
+ uint32_t numProbeVariabilityAverageBytesPerTexel = 0;
// Compute the number of irradiance and distance texels
uint32_t numIrradianceTexelsPerProbe = (m_desc.probeNumIrradianceTexels * m_desc.probeNumIrradianceTexels);
@@ -289,12 +314,25 @@ namespace rtxgi
if (m_desc.probeDataFormat == EDDGIVolumeTextureFormat::F16x4) numProbeDataBytesPerTexel = 8;
else if (m_desc.probeDataFormat == EDDGIVolumeTextureFormat::F32x4) numProbeDataBytesPerTexel = 16;
+ // Get the number of bytes per probe variability texel
+ if (m_desc.probeVariabilityFormat == EDDGIVolumeTextureFormat::F16) numProbeVariabilityBytesPerTexel = 2;
+ else if (m_desc.probeVariabilityFormat == EDDGIVolumeTextureFormat::F32) numProbeVariabilityBytesPerTexel = 4;
+
+ // Variability average is always F32x2 (8 bytes)
+ numProbeVariabilityAverageBytesPerTexel = 8;
+
// Compute the number of bytes per probe
uint32_t bytesPerProbe = 0;
bytesPerProbe += GetNumRaysPerProbe() * numRayDataBytesPerTexel;
bytesPerProbe += (numIrradianceTexelsPerProbe * numIrradianceBytesPerTexel);
bytesPerProbe += (numDistanceTexelsPerProbe * numDistanceBytesPerTexel);
bytesPerProbe += numProbeDataBytesPerTexel;
+ bytesPerProbe += numProbeVariabilityBytesPerTexel;
+
+ // The coefficient of variation average texture has different (smaller) dimensions than the other textures
+ uint32_t width, height, arraySize;
+ GetDDGIVolumeTextureDimensions(m_desc, EDDGIVolumeTextureType::VariabilityAverage, width, height, arraySize);
+ bytesPerVolume += width * height * arraySize * numProbeVariabilityAverageBytesPerTexel;
// Add the per probe memory use
bytesPerVolume += GetNumProbes() * bytesPerProbe;
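The memory accounting above adds a per-texel cost for the new variability textures. A small hedged sketch (enum and names invented to mirror `EDDGIVolumeTextureFormat`, not SDK code) captures the byte counts used:

```cpp
#include <cassert>
#include <cstdint>

// Sketch of the variability memory accounting: variability texels are
// 2 bytes (F16) or 4 bytes (F32), and the variability average texture is
// always F32x2 (8 bytes per texel).
enum class VariabilityFormat { F16, F32 };

uint32_t variabilityBytesPerTexel(VariabilityFormat f)
{
    return (f == VariabilityFormat::F16) ? 2u : 4u;
}

constexpr uint32_t kVariabilityAverageBytesPerTexel = 8; // F32x2
```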
diff --git a/rtxgi-sdk/src/ddgi/gfx/DDGIVolume_D3D12.cpp b/rtxgi-sdk/src/ddgi/gfx/DDGIVolume_D3D12.cpp
index 29db639..bc601a8 100644
--- a/rtxgi-sdk/src/ddgi/gfx/DDGIVolume_D3D12.cpp
+++ b/rtxgi-sdk/src/ddgi/gfx/DDGIVolume_D3D12.cpp
@@ -38,6 +38,8 @@ namespace rtxgi
if (!ValidateShaderBytecode(desc.probeRelocation.resetCS)) return ERTXGIStatus::ERROR_DDGI_INVALID_BYTECODE_PROBE_RELOCATION_RESET;
if (!ValidateShaderBytecode(desc.probeClassification.updateCS)) return ERTXGIStatus::ERROR_DDGI_INVALID_BYTECODE_PROBE_CLASSIFICATION;
if (!ValidateShaderBytecode(desc.probeClassification.resetCS)) return ERTXGIStatus::ERROR_DDGI_INVALID_BYTECODE_PROBE_CLASSIFICATION_RESET;
+ if (!ValidateShaderBytecode(desc.probeVariability.reductionCS)) return ERTXGIStatus::ERROR_DDGI_INVALID_BYTECODE_PROBE_VARIABILITY_REDUCTION;
+ if (!ValidateShaderBytecode(desc.probeVariability.extraReductionCS)) return ERTXGIStatus::ERROR_DDGI_INVALID_BYTECODE_PROBE_VARIABILITY_EXTRA_REDUCTION;
return ERTXGIStatus::OK;
}
@@ -52,6 +54,9 @@ namespace rtxgi
if (desc.probeIrradiance == nullptr) return ERTXGIStatus::ERROR_DDGI_INVALID_TEXTURE_PROBE_IRRADIANCE;
if (desc.probeDistance == nullptr) return ERTXGIStatus::ERROR_DDGI_INVALID_TEXTURE_PROBE_DISTANCE;
if (desc.probeData == nullptr) return ERTXGIStatus::ERROR_DDGI_INVALID_TEXTURE_PROBE_DATA;
+ if (desc.probeVariability == nullptr) return ERTXGIStatus::ERROR_DDGI_INVALID_TEXTURE_PROBE_VARIABILITY;
+ if (desc.probeVariabilityAverage == nullptr) return ERTXGIStatus::ERROR_DDGI_INVALID_TEXTURE_PROBE_VARIABILITY_AVERAGE;
+ if (desc.probeVariabilityReadback == nullptr) return ERTXGIStatus::ERROR_DDGI_INVALID_TEXTURE_PROBE_VARIABILITY_READBACK;
// Render Target Views
if (desc.probeIrradianceRTV.ptr == 0) return ERTXGIStatus::ERROR_DDGI_D3D12_INVALID_DESCRIPTOR;
@@ -64,6 +69,8 @@ namespace rtxgi
if (desc.probeRelocation.resetPSO == nullptr) return ERTXGIStatus::ERROR_DDGI_D3D12_INVALID_PSO_PROBE_RELOCATION_RESET;
if (desc.probeClassification.updatePSO == nullptr) return ERTXGIStatus::ERROR_DDGI_D3D12_INVALID_PSO_PROBE_CLASSIFICATION;
if (desc.probeClassification.resetPSO == nullptr) return ERTXGIStatus::ERROR_DDGI_D3D12_INVALID_PSO_PROBE_CLASSIFICATION_RESET;
+ if (desc.probeVariabilityPSOs.reductionPSO == nullptr) return ERTXGIStatus::ERROR_DDGI_D3D12_INVALID_PSO_PROBE_REDUCTION;
+ if (desc.probeVariabilityPSOs.extraReductionPSO == nullptr) return ERTXGIStatus::ERROR_DDGI_D3D12_INVALID_PSO_PROBE_EXTRA_REDUCTION;
return ERTXGIStatus::OK;
}
@@ -95,6 +102,15 @@ namespace rtxgi
if (format == EDDGIVolumeTextureFormat::F16x4) return DXGI_FORMAT_R16G16B16A16_FLOAT;
else if (format == EDDGIVolumeTextureFormat::F32x4) return DXGI_FORMAT_R32G32B32A32_FLOAT;
}
+ else if (type == EDDGIVolumeTextureType::Variability)
+ {
+ if (format == EDDGIVolumeTextureFormat::F16) return DXGI_FORMAT_R16_FLOAT;
+ else if(format == EDDGIVolumeTextureFormat::F32) return DXGI_FORMAT_R32_FLOAT;
+ }
+ else if (type == EDDGIVolumeTextureType::VariabilityAverage)
+ {
+ return DXGI_FORMAT_R32G32_FLOAT;
+ }
return DXGI_FORMAT_UNKNOWN;
}
@@ -106,7 +122,9 @@ namespace rtxgi
// 1 UAV for probe irradiance texture array (u1, space1)
// 1 UAV for probe distance texture array (u2, space1)
// 1 UAV for probe data texture array (u3, space1)
- D3D12_DESCRIPTOR_RANGE ranges[5];
+ // 1 UAV for probe variability texture array (u4, space1)
+ // 1 UAV for probe variability average texture array (u5, space1)
+ D3D12_DESCRIPTOR_RANGE ranges[7];
// Volume Constants Structured Buffer (t0, space1)
ranges[0].NumDescriptors = 1;
@@ -143,6 +161,20 @@ namespace rtxgi
ranges[4].RangeType = D3D12_DESCRIPTOR_RANGE_TYPE_UAV;
ranges[4].OffsetInDescriptorsFromTableStart = heapDesc.resourceIndices.probeDataUAVIndex;
+ // Probe Variability Texture Array UAV (u4, space1)
+ ranges[5].NumDescriptors = 1;
+ ranges[5].BaseShaderRegister = 4;
+ ranges[5].RegisterSpace = 1;
+ ranges[5].RangeType = D3D12_DESCRIPTOR_RANGE_TYPE_UAV;
+ ranges[5].OffsetInDescriptorsFromTableStart = heapDesc.resourceIndices.probeVariabilityUAVIndex;
+
+ // Probe Variability Average Texture Array UAV (u5, space1)
+ ranges[6].NumDescriptors = 1;
+ ranges[6].BaseShaderRegister = 5;
+ ranges[6].RegisterSpace = 1;
+ ranges[6].RangeType = D3D12_DESCRIPTOR_RANGE_TYPE_UAV;
+ ranges[6].OffsetInDescriptorsFromTableStart = heapDesc.resourceIndices.probeVariabilityAverageUAVIndex;
+
// Root Parameters
std::vector<D3D12_ROOT_PARAMETER> rootParameters;
@@ -277,33 +309,6 @@ namespace rtxgi
UINT volumeIndex;
std::vector<D3D12_RESOURCE_BARRIER> barriers;
- // Transition volume textures to unordered access for read/write
- D3D12_RESOURCE_BARRIER barrier = {};
- barrier.Type = D3D12_RESOURCE_BARRIER_TYPE_TRANSITION;
- barrier.Transition.StateBefore = D3D12_RESOURCE_STATE_PIXEL_SHADER_RESOURCE;
- barrier.Transition.StateAfter = D3D12_RESOURCE_STATE_UNORDERED_ACCESS;
- barrier.Transition.Subresource = D3D12_RESOURCE_BARRIER_ALL_SUBRESOURCES;
-
- // Transition(s)
- for (volumeIndex = 0; volumeIndex < numVolumes; volumeIndex++)
- {
- const DDGIVolume* volume = volumes[volumeIndex];
-
- // Transition the volume's irradiance and distance textures to unordered access
- barrier.Transition.pResource = volume->GetProbeIrradiance();
- barriers.push_back(barrier);
-
- barrier.Transition.pResource = volume->GetProbeDistance();
- barriers.push_back(barrier);
- }
-
- // Wait for the resource transitions to complete
- if (!barriers.empty()) cmdList->ResourceBarrier(static_cast<UINT>(barriers.size()), barriers.data());
-
- barriers.clear();
- barrier = {};
- barrier.Type = D3D12_RESOURCE_BARRIER_TYPE_UAV;
-
// Irradiance Blending
if (bInsertPerfMarkers) PIXBeginEvent(cmdList, PIX_COLOR(RTXGI_PERF_MARKER_GREEN), "Probe Irradiance");
for (volumeIndex = 0; volumeIndex < numVolumes; volumeIndex++)
@@ -358,8 +363,12 @@ namespace rtxgi
}
// Add a barrier
+ D3D12_RESOURCE_BARRIER barrier = {};
+ barrier.Type = D3D12_RESOURCE_BARRIER_TYPE_UAV;
barrier.UAV.pResource = volume->GetProbeIrradiance();
barriers.push_back(barrier);
+ barrier.UAV.pResource = volume->GetProbeVariability();
+ barriers.push_back(barrier);
}
if (bInsertPerfMarkers) PIXEndEvent(cmdList);
@@ -417,40 +426,15 @@ namespace rtxgi
}
// Add a barrier
+ D3D12_RESOURCE_BARRIER barrier = {};
+ barrier.Type = D3D12_RESOURCE_BARRIER_TYPE_UAV;
barrier.UAV.pResource = volume->GetProbeDistance();
barriers.push_back(barrier);
}
if (bInsertPerfMarkers) PIXEndEvent(cmdList);
// Barrier(s)
- // Wait for the irradiance and distance blending passes
- // to complete before using the textures
- if (!barriers.empty()) cmdList->ResourceBarrier(static_cast<UINT>(barriers.size()), barriers.data());
-
- // Remove previous barriers
- barriers.clear();
-
- // Transition volume textures back to pixel shader resources for read
- barrier = {};
- barrier.Type = D3D12_RESOURCE_BARRIER_TYPE_TRANSITION;
- barrier.Transition.StateBefore = D3D12_RESOURCE_STATE_UNORDERED_ACCESS;
- barrier.Transition.StateAfter = D3D12_RESOURCE_STATE_PIXEL_SHADER_RESOURCE;
- barrier.Transition.Subresource = D3D12_RESOURCE_BARRIER_ALL_SUBRESOURCES;
-
- // Transition(s)
- for (volumeIndex = 0; volumeIndex < numVolumes; volumeIndex++)
- {
- const DDGIVolume* volume = volumes[volumeIndex];
-
- // Transition the volume's irradiance and distance texture arrays to unordered access
- barrier.Transition.pResource = volume->GetProbeIrradiance();
- barriers.push_back(barrier);
-
- barrier.Transition.pResource = volume->GetProbeDistance();
- barriers.push_back(barrier);
- }
-
- // Wait for the resource transitions to complete
+ // Wait for the irradiance and distance blending passes to complete before using the textures
if (!barriers.empty()) cmdList->ResourceBarrier(static_cast<UINT>(barriers.size()), barriers.data());
if (bInsertPerfMarkers) PIXEndEvent(cmdList);
@@ -517,9 +501,11 @@ namespace rtxgi
}
// Probe Relocation Reset Barrier(s)
- if(!barriers.empty()) cmdList->ResourceBarrier(static_cast<UINT>(barriers.size()), barriers.data());
-
- barriers.clear();
+ if (!barriers.empty())
+ {
+ cmdList->ResourceBarrier(static_cast<UINT>(barriers.size()), barriers.data());
+ barriers.clear();
+ }
// Probe Relocation
for(volumeIndex = 0; volumeIndex < numVolumes; volumeIndex++)
@@ -632,9 +618,11 @@ namespace rtxgi
}
// Probe Classification Reset Barrier(s)
- if (!barriers.empty()) cmdList->ResourceBarrier(static_cast<UINT>(barriers.size()), barriers.data());
-
- barriers.clear();
+ if (!barriers.empty())
+ {
+ cmdList->ResourceBarrier(static_cast<UINT>(barriers.size()), barriers.data());
+ barriers.clear();
+ }
// Probe Classification
for (volumeIndex = 0; volumeIndex < numVolumes; volumeIndex++)
@@ -689,6 +677,223 @@ namespace rtxgi
return ERTXGIStatus::OK;
}
+ ERTXGIStatus CalculateDDGIVolumeVariability(ID3D12GraphicsCommandList* cmdList, UINT numVolumes, DDGIVolume** volumes)
+ {
+ if (bInsertPerfMarkers) PIXBeginEvent(cmdList, PIX_COLOR(RTXGI_PERF_MARKER_GREEN), "Probe Variability Calculation");
+
+ UINT volumeIndex;
+
+ // Reduction
+ for (volumeIndex = 0; volumeIndex < numVolumes; volumeIndex++)
+ {
+ const DDGIVolume* volume = volumes[volumeIndex];
+ if (!volume->GetProbeVariabilityEnabled()) continue; // Skip if the volume is not calculating variability
+
+ // Set the descriptor heap(s)
+ std::vector<ID3D12DescriptorHeap*> heaps;
+ heaps.push_back(volume->GetResourceDescriptorHeap());
+ if (volume->GetSamplerDescriptorHeap()) heaps.push_back(volume->GetSamplerDescriptorHeap());
+ cmdList->SetDescriptorHeaps((UINT)heaps.size(), heaps.data());
+
+ // Set root signature and root constants
+ cmdList->SetComputeRootSignature(volume->GetRootSignature());
+ cmdList->SetComputeRoot32BitConstants(volume->GetRootParamSlotRootConstants(), DDGIRootConstants::GetNum32BitValues(), volume->GetRootConstants().GetData(), 0);
+
+ // Set the descriptor tables (when relevant)
+ if (volume->GetBindlessEnabled())
+ {
+ // Bindless resources, using application's root signature
+ if (volume->GetBindlessType() == EBindlessType::RESOURCE_ARRAYS)
+ {
+ // Only need to set descriptor tables when using traditional resource array bindless
+ cmdList->SetComputeRootDescriptorTable(volume->GetRootParamSlotResourceDescriptorTable(), volume->GetResourceDescriptorHeap()->GetGPUDescriptorHandleForHeapStart());
+ if (volume->GetSamplerDescriptorHeap()) cmdList->SetComputeRootDescriptorTable(volume->GetRootParamSlotSamplerDescriptorTable(), volume->GetSamplerDescriptorHeap()->GetGPUDescriptorHandleForHeapStart());
+ }
+ }
+ else
+ {
+ // Bound resources, using the SDK's root signature
+ cmdList->SetComputeRootDescriptorTable(volume->GetRootParamSlotResourceDescriptorTable(), volume->GetResourceDescriptorHeap()->GetGPUDescriptorHandleForHeapStart());
+ }
+
+ // Get the number of probes on the XYZ dimensions of the texture
+ UINT probeCountX, probeCountY, probeCountZ;
+ GetDDGIVolumeProbeCounts(volume->GetDesc(), probeCountX, probeCountY, probeCountZ);
+
+ // Initially, the reduction input is the full variability size (same as irradiance texture without border texels)
+ UINT inputTexelsX = probeCountX * volume->GetDesc().probeNumIrradianceInteriorTexels;
+ UINT inputTexelsY = probeCountY * volume->GetDesc().probeNumIrradianceInteriorTexels;
+ UINT inputTexelsZ = probeCountZ;
+
+ const uint3 NumThreadsInGroup = { 4, 8, 4 }; // Each thread group will have 4x8x4 threads
+ constexpr uint2 ThreadSampleFootprint = { 4, 2 }; // Each thread will sample 4x2 texels
+
+ DDGIRootConstants consts = volume->GetRootConstants();
+
+ // The first reduction pass takes probe irradiance data, calculates variability, and reduces as much as possible
+ {
+ if (bInsertPerfMarkers && volume->GetInsertPerfMarkers())
+ {
+ std::string msg = "Reduction, DDGIVolume[" + std::to_string(volume->GetIndex()) + "] - \"" + volume->GetName() + "\"";
+ PIXBeginEvent(cmdList, PIX_COLOR(RTXGI_PERF_MARKER_GREEN), msg.c_str());
+ }
+
+ // Set the PSO and dispatch threads
+ cmdList->SetPipelineState(volume->GetProbeVariabilityReductionPSO());
+
+ // One thread group per output texel
+ UINT outputTexelsX = (UINT)ceil((float)inputTexelsX / (NumThreadsInGroup.x * ThreadSampleFootprint.x));
+ UINT outputTexelsY = (UINT)ceil((float)inputTexelsY / (NumThreadsInGroup.y * ThreadSampleFootprint.y));
+ UINT outputTexelsZ = (UINT)ceil((float)inputTexelsZ / NumThreadsInGroup.z);
+
+ consts.reductionInputSizeX = inputTexelsX;
+ consts.reductionInputSizeY = inputTexelsY;
+ consts.reductionInputSizeZ = inputTexelsZ;
+ cmdList->SetComputeRoot32BitConstants(volume->GetRootParamSlotRootConstants(), DDGIRootConstants::GetNum32BitValues(), consts.GetData(), 0);
+
+ cmdList->Dispatch(outputTexelsX, outputTexelsY, outputTexelsZ);
+
+ if (bInsertPerfMarkers && volume->GetInsertPerfMarkers()) PIXEndEvent(cmdList);
+
+ // Each thread group will write out a value to the averaging texture
+ // If there is more than one thread group, we will need to do extra averaging passes
+ inputTexelsX = outputTexelsX;
+ inputTexelsY = outputTexelsY;
+ inputTexelsZ = outputTexelsZ;
+ }
+
+ // UAV barrier needed after each reduction pass
+ D3D12_RESOURCE_BARRIER reductionBarrier = {};
+ reductionBarrier.Type = D3D12_RESOURCE_BARRIER_TYPE_UAV;
+ reductionBarrier.UAV.pResource = volume->GetProbeVariabilityAverage();
+ cmdList->ResourceBarrier(1, &reductionBarrier);
+
+ // Extra reduction passes average the values in the variability texture down to a single value
+ while (inputTexelsX > 1 || inputTexelsY > 1 || inputTexelsZ > 1)
+ {
+ if (bInsertPerfMarkers && volume->GetInsertPerfMarkers())
+ {
+ std::string msg = "Extra Reduction, DDGIVolume[" + std::to_string(volume->GetIndex()) + "] - \"" + volume->GetName() + "\"";
+ PIXBeginEvent(cmdList, PIX_COLOR(RTXGI_PERF_MARKER_GREEN), msg.c_str());
+ }
+
+ cmdList->SetPipelineState(volume->GetProbeVariabilityExtraReductionPSO());
+
+ // One thread group per output texel
+ UINT outputTexelsX = (UINT)ceil((float)inputTexelsX / (NumThreadsInGroup.x * ThreadSampleFootprint.x));
+ UINT outputTexelsY = (UINT)ceil((float)inputTexelsY / (NumThreadsInGroup.y * ThreadSampleFootprint.y));
+ UINT outputTexelsZ = (UINT)ceil((float)inputTexelsZ / NumThreadsInGroup.z);
+
+ consts.reductionInputSizeX = inputTexelsX;
+ consts.reductionInputSizeY = inputTexelsY;
+ consts.reductionInputSizeZ = inputTexelsZ;
+ cmdList->SetComputeRoot32BitConstants(volume->GetRootParamSlotRootConstants(), DDGIRootConstants::GetNum32BitValues(), consts.GetData(), 0);
+
+ cmdList->Dispatch(outputTexelsX, outputTexelsY, outputTexelsZ);
+
+ if (bInsertPerfMarkers && volume->GetInsertPerfMarkers()) PIXEndEvent(cmdList);
+
+ inputTexelsX = outputTexelsX;
+ inputTexelsY = outputTexelsY;
+ inputTexelsZ = outputTexelsZ;
+
+ // Need a barrier in between each reduction pass
+ cmdList->ResourceBarrier(1, &reductionBarrier);
+ }
+ }
+
+ if (bInsertPerfMarkers) PIXEndEvent(cmdList);
+
+ // Copy readback buffer
+ std::vector<D3D12_RESOURCE_BARRIER> barriers;
+ if (bInsertPerfMarkers) PIXBeginEvent(cmdList, PIX_COLOR(RTXGI_PERF_MARKER_GREEN), "Probe Variability Readback");
+
+ {
+ D3D12_RESOURCE_BARRIER beforeBarrier = {};
+ beforeBarrier.Type = D3D12_RESOURCE_BARRIER_TYPE_TRANSITION;
+ beforeBarrier.Transition.StateBefore = D3D12_RESOURCE_STATE_UNORDERED_ACCESS;
+ beforeBarrier.Transition.StateAfter = D3D12_RESOURCE_STATE_COPY_SOURCE;
+ beforeBarrier.Transition.Subresource = 0;
+
+ D3D12_RESOURCE_BARRIER afterBarrier = beforeBarrier;
+ afterBarrier.Transition.StateBefore = beforeBarrier.Transition.StateAfter;
+ afterBarrier.Transition.StateAfter = beforeBarrier.Transition.StateBefore;
+
+ for (volumeIndex = 0; volumeIndex < numVolumes; volumeIndex++)
+ {
+ const DDGIVolume* volume = volumes[volumeIndex];
+ if (!volume->GetProbeVariabilityEnabled()) continue; // Skip if the volume is not calculating variability
+
+ beforeBarrier.Transition.pResource = volume->GetProbeVariabilityAverage();
+ barriers.push_back(beforeBarrier);
+ }
+
+ if (!barriers.empty())
+ {
+ cmdList->ResourceBarrier(static_cast<UINT>(barriers.size()), barriers.data());
+ barriers.clear();
+ }
+
+ for (volumeIndex = 0; volumeIndex < numVolumes; volumeIndex++)
+ {
+ const DDGIVolume* volume = volumes[volumeIndex];
+ if (!volume->GetProbeVariabilityEnabled()) continue; // Skip if the volume is not calculating variability
+
+ D3D12_TEXTURE_COPY_LOCATION copyLocSrc = {};
+ copyLocSrc.pResource = volume->GetProbeVariabilityAverage();
+ copyLocSrc.Type = D3D12_TEXTURE_COPY_TYPE_SUBRESOURCE_INDEX;
+ copyLocSrc.SubresourceIndex = 0;
+
+ D3D12_TEXTURE_COPY_LOCATION copyLocDst = {};
+ copyLocDst.pResource = volume->GetProbeVariabilityReadback();
+ copyLocDst.Type = D3D12_TEXTURE_COPY_TYPE_PLACED_FOOTPRINT;
+ copyLocDst.PlacedFootprint.Offset = 0;
+ copyLocDst.PlacedFootprint.Footprint.Width = 1;
+ copyLocDst.PlacedFootprint.Footprint.Height = 1;
+ copyLocDst.PlacedFootprint.Footprint.Depth = 1;
+ copyLocDst.PlacedFootprint.Footprint.Format = GetDDGIVolumeTextureFormat(EDDGIVolumeTextureType::VariabilityAverage, volume->GetDesc().probeVariabilityFormat);
+ copyLocDst.PlacedFootprint.Footprint.RowPitch = D3D12_TEXTURE_DATA_PITCH_ALIGNMENT;
+
+ D3D12_BOX box = { 0, 0, 0, 1, 1, 1};
+ cmdList->CopyTextureRegion(&copyLocDst, 0, 0, 0, &copyLocSrc, &box);
+
+ afterBarrier.Transition.pResource = volume->GetProbeVariabilityAverage();
+ barriers.push_back(afterBarrier);
+ }
+
+ if (!barriers.empty()) cmdList->ResourceBarrier(static_cast<UINT>(barriers.size()), barriers.data());
+ }
+
+ if (bInsertPerfMarkers) PIXEndEvent(cmdList);
+
+ return ERTXGIStatus::OK;
+ }
+
+ ERTXGIStatus ReadbackDDGIVolumeVariability(UINT numVolumes, DDGIVolume** volumes)
+ {
+ for (UINT volumeIndex = 0; volumeIndex < numVolumes; volumeIndex++)
+ {
+ // Get the volume
+ DDGIVolume* volume = volumes[volumeIndex];
+ if (!volume->GetProbeVariabilityEnabled()) continue; // Skip if the volume is not calculating variability
+
+ // Get the probe variability readback buffer
+ ID3D12Resource* readback = volume->GetProbeVariabilityReadback();
+
+ // Read the first 32-bits of the readback buffer
+ float* pMappedMemory = nullptr;
+ D3D12_RANGE readRange = { 0, sizeof(float) };
+ D3D12_RANGE writeRange = {};
+ HRESULT hr = readback->Map(0, &readRange, (void**)&pMappedMemory);
+ if (FAILED(hr)) return ERTXGIStatus::ERROR_DDGI_MAP_FAILURE_VARIABILITY_READBACK_BUFFER;
+ float value = pMappedMemory[0];
+ readback->Unmap(0, &writeRange);
+
+ volume->SetVolumeAverageVariability(value);
+ }
+ return ERTXGIStatus::OK;
+ }
+
//------------------------------------------------------------------------
// Private DDGIVolume Functions
//------------------------------------------------------------------------
@@ -707,6 +912,8 @@ namespace rtxgi
RTXGI_SAFE_RELEASE(m_probeRelocationResetPSO);
RTXGI_SAFE_RELEASE(m_probeClassificationPSO);
RTXGI_SAFE_RELEASE(m_probeClassificationResetPSO);
+ RTXGI_SAFE_RELEASE(m_probeVariabilityReductionPSO);
+ RTXGI_SAFE_RELEASE(m_probeVariabilityExtraReductionPSO);
}
ERTXGIStatus DDGIVolume::CreateManagedResources(const DDGIVolumeDesc& desc, const DDGIVolumeManagedResourcesDesc& managed)
@@ -755,17 +962,29 @@ namespace rtxgi
managed.probeClassification.resetCS,
&m_probeClassificationResetPSO,
"Probe Classification Reset")) return ERTXGIStatus::ERROR_DDGI_D3D12_CREATE_FAILURE_PSO;
+
+ if (!CreateComputePSO(
+ managed.probeVariability.reductionCS,
+ &m_probeVariabilityReductionPSO,
+ "Probe Variability Reduction")) return ERTXGIStatus::ERROR_DDGI_D3D12_CREATE_FAILURE_PSO;
+
+ if (!CreateComputePSO(
+ managed.probeVariability.extraReductionCS,
+ &m_probeVariabilityExtraReductionPSO,
+ "Probe Variability Extra Reduction")) return ERTXGIStatus::ERROR_DDGI_D3D12_CREATE_FAILURE_PSO;
}
// Create the textures
if (deviceChanged || m_desc.ShouldAllocateProbes(desc))
{
// Probe counts have changed. The texture arrays are the wrong size or aren't allocated yet.
- // (Re)allocate the probe ray data, irradiance, distance, and data textures.
+ // (Re)allocate the probe ray data, irradiance, distance, data, and variability textures.
if (!CreateProbeRayData(desc)) return ERTXGIStatus::ERROR_DDGI_ALLOCATE_FAILURE_TEXTURE_PROBE_RAY_DATA;
if (!CreateProbeIrradiance(desc)) return ERTXGIStatus::ERROR_DDGI_ALLOCATE_FAILURE_TEXTURE_PROBE_IRRADIANCE;
if (!CreateProbeDistance(desc)) return ERTXGIStatus::ERROR_DDGI_ALLOCATE_FAILURE_TEXTURE_PROBE_DISTANCE;
if (!CreateProbeData(desc)) return ERTXGIStatus::ERROR_DDGI_ALLOCATE_FAILURE_TEXTURE_PROBE_DATA;
+ if (!CreateProbeVariability(desc)) return ERTXGIStatus::ERROR_DDGI_ALLOCATE_FAILURE_TEXTURE_PROBE_VARIABILITY;
+ if (!CreateProbeVariabilityAverage(desc)) return ERTXGIStatus::ERROR_DDGI_ALLOCATE_FAILURE_TEXTURE_PROBE_VARIABILITY_AVERAGE;
}
else
{
@@ -806,6 +1025,9 @@ namespace rtxgi
m_probeIrradiance = unmanaged.probeIrradiance;
m_probeDistance = unmanaged.probeDistance;
m_probeData = unmanaged.probeData;
+ m_probeVariability = unmanaged.probeVariability;
+ m_probeVariabilityAverage = unmanaged.probeVariabilityAverage;
+ m_probeVariabilityReadback = unmanaged.probeVariabilityReadback;
// Render Target Views
m_probeIrradianceRTV = unmanaged.probeIrradianceRTV;
@@ -818,6 +1040,8 @@ namespace rtxgi
m_probeRelocationResetPSO = unmanaged.probeRelocation.resetPSO;
m_probeClassificationPSO = unmanaged.probeClassification.updatePSO;
m_probeClassificationResetPSO = unmanaged.probeClassification.resetPSO;
+ m_probeVariabilityReductionPSO = unmanaged.probeVariabilityPSOs.reductionPSO;
+ m_probeVariabilityExtraReductionPSO = unmanaged.probeVariabilityPSOs.extraReductionPSO;
}
#endif
@@ -910,12 +1134,12 @@ namespace rtxgi
// Transition the probe textures render targets
D3D12_RESOURCE_BARRIER barriers[2] = {};
barriers[0].Transition.pResource = m_probeIrradiance;
- barriers[0].Transition.StateBefore = D3D12_RESOURCE_STATE_PIXEL_SHADER_RESOURCE;
+ barriers[0].Transition.StateBefore = D3D12_RESOURCE_STATE_NON_PIXEL_SHADER_RESOURCE;
barriers[0].Transition.StateAfter = D3D12_RESOURCE_STATE_RENDER_TARGET;
barriers[0].Transition.Subresource = D3D12_RESOURCE_BARRIER_ALL_SUBRESOURCES;
barriers[1].Transition.pResource = m_probeDistance;
- barriers[1].Transition.StateBefore = D3D12_RESOURCE_STATE_PIXEL_SHADER_RESOURCE;
+ barriers[1].Transition.StateBefore = D3D12_RESOURCE_STATE_NON_PIXEL_SHADER_RESOURCE;
barriers[1].Transition.StateAfter = D3D12_RESOURCE_STATE_RENDER_TARGET;
barriers[1].Transition.Subresource = D3D12_RESOURCE_BARRIER_ALL_SUBRESOURCES;
@@ -930,10 +1154,10 @@ namespace rtxgi
// Transition the probe textures back to unordered access
barriers[0].Transition.StateBefore = D3D12_RESOURCE_STATE_RENDER_TARGET;
- barriers[0].Transition.StateAfter = D3D12_RESOURCE_STATE_PIXEL_SHADER_RESOURCE;
+ barriers[0].Transition.StateAfter = D3D12_RESOURCE_STATE_NON_PIXEL_SHADER_RESOURCE;
barriers[1].Transition.StateBefore = D3D12_RESOURCE_STATE_RENDER_TARGET;
- barriers[1].Transition.StateAfter = D3D12_RESOURCE_STATE_PIXEL_SHADER_RESOURCE;
+ barriers[1].Transition.StateAfter = D3D12_RESOURCE_STATE_NON_PIXEL_SHADER_RESOURCE;
// Wait for the transitions
cmdList->ResourceBarrier(2, barriers);
@@ -943,6 +1167,46 @@ namespace rtxgi
return ERTXGIStatus::OK;
}
+ void DDGIVolume::TransitionResources(ID3D12GraphicsCommandList* cmdList, EDDGIExecutionStage stage) const
+ {
+ std::vector<D3D12_RESOURCE_BARRIER> barriers;
+
+ D3D12_RESOURCE_BARRIER barrier = {};
+ barrier.Type = D3D12_RESOURCE_BARRIER_TYPE_TRANSITION;
+ barrier.Transition.Subresource = D3D12_RESOURCE_BARRIER_ALL_SUBRESOURCES;
+
+ if (stage == EDDGIExecutionStage::POST_PROBE_TRACE)
+ {
+ barrier.Transition.StateBefore = D3D12_RESOURCE_STATE_NON_PIXEL_SHADER_RESOURCE;
+ barrier.Transition.StateAfter = D3D12_RESOURCE_STATE_UNORDERED_ACCESS;
+ }
+ else if (stage == EDDGIExecutionStage::PRE_GATHER_CS)
+ {
+ barrier.Transition.StateBefore = D3D12_RESOURCE_STATE_UNORDERED_ACCESS;
+ barrier.Transition.StateAfter = D3D12_RESOURCE_STATE_NON_PIXEL_SHADER_RESOURCE;
+ }
+ else if (stage == EDDGIExecutionStage::PRE_GATHER_PS)
+ {
+ barrier.Transition.StateBefore = D3D12_RESOURCE_STATE_UNORDERED_ACCESS;
+ barrier.Transition.StateAfter = D3D12_RESOURCE_STATE_PIXEL_SHADER_RESOURCE;
+ }
+ else if (stage == EDDGIExecutionStage::POST_GATHER_PS)
+ {
+ barrier.Transition.StateBefore = D3D12_RESOURCE_STATE_PIXEL_SHADER_RESOURCE;
+ barrier.Transition.StateAfter = D3D12_RESOURCE_STATE_NON_PIXEL_SHADER_RESOURCE;
+ }
+
+ // Add the volume texture array resources
+ barrier.Transition.pResource = m_probeIrradiance;
+ barriers.push_back(barrier);
+ barrier.Transition.pResource = m_probeDistance;
+ barriers.push_back(barrier);
+ barrier.Transition.pResource = m_probeData;
+ barriers.push_back(barrier);
+
+ cmdList->ResourceBarrier(static_cast<UINT>(barriers.size()), barriers.data());
+ }
+
DDGIVolumeResourceIndices DDGIVolume::GetResourceIndices() const
{
if(m_bindlessResources.type == EBindlessType::DESCRIPTOR_HEAP) return m_descriptorHeapDesc.resourceIndices;
@@ -972,6 +1236,16 @@ namespace rtxgi
if (view == EResourceViewType::UAV) return m_descriptorHeapDesc.resourceIndices.probeDataUAVIndex;
if (view == EResourceViewType::SRV) return m_descriptorHeapDesc.resourceIndices.probeDataSRVIndex;
}
+ else if (type == EDDGIVolumeTextureType::Variability)
+ {
+ if (view == EResourceViewType::UAV) return m_descriptorHeapDesc.resourceIndices.probeVariabilityUAVIndex;
+ if (view == EResourceViewType::SRV) return m_descriptorHeapDesc.resourceIndices.probeVariabilitySRVIndex;
+ }
+ else if (type == EDDGIVolumeTextureType::VariabilityAverage)
+ {
+ if (view == EResourceViewType::UAV) return m_descriptorHeapDesc.resourceIndices.probeVariabilityAverageUAVIndex;
+ if (view == EResourceViewType::SRV) return m_descriptorHeapDesc.resourceIndices.probeVariabilityAverageSRVIndex;
+ }
return 0;
}
@@ -998,6 +1272,16 @@ namespace rtxgi
if (view == EResourceViewType::UAV) m_descriptorHeapDesc.resourceIndices.probeDataUAVIndex = index;
if (view == EResourceViewType::SRV) m_descriptorHeapDesc.resourceIndices.probeDataSRVIndex = index;
}
+ else if (type == EDDGIVolumeTextureType::Variability)
+ {
+ if (view == EResourceViewType::UAV) m_descriptorHeapDesc.resourceIndices.probeVariabilityUAVIndex = index;
+ if (view == EResourceViewType::SRV) m_descriptorHeapDesc.resourceIndices.probeVariabilitySRVIndex = index;
+ }
+ else if (type == EDDGIVolumeTextureType::VariabilityAverage)
+ {
+ if (view == EResourceViewType::UAV) m_descriptorHeapDesc.resourceIndices.probeVariabilityAverageUAVIndex = index;
+ if (view == EResourceViewType::SRV) m_descriptorHeapDesc.resourceIndices.probeVariabilityAverageSRVIndex = index;
+ }
}
void DDGIVolume::Destroy()
@@ -1043,6 +1327,9 @@ namespace rtxgi
RTXGI_SAFE_RELEASE(m_probeIrradiance);
RTXGI_SAFE_RELEASE(m_probeDistance);
RTXGI_SAFE_RELEASE(m_probeData);
+ RTXGI_SAFE_RELEASE(m_probeVariability);
+ RTXGI_SAFE_RELEASE(m_probeVariabilityAverage);
+ RTXGI_SAFE_RELEASE(m_probeVariabilityReadback);
RTXGI_SAFE_RELEASE(m_probeBlendingIrradiancePSO);
RTXGI_SAFE_RELEASE(m_probeBlendingDistancePSO);
@@ -1050,6 +1337,8 @@ namespace rtxgi
RTXGI_SAFE_RELEASE(m_probeRelocationResetPSO);
RTXGI_SAFE_RELEASE(m_probeClassificationPSO);
RTXGI_SAFE_RELEASE(m_probeClassificationResetPSO);
+ RTXGI_SAFE_RELEASE(m_probeVariabilityReductionPSO);
+ RTXGI_SAFE_RELEASE(m_probeVariabilityExtraReductionPSO);
#else
m_rootSignature = nullptr;
@@ -1057,6 +1346,9 @@ namespace rtxgi
m_probeIrradiance = nullptr;
m_probeDistance = nullptr;
m_probeData = nullptr;
+ m_probeVariability = nullptr;
+ m_probeVariabilityAverage = nullptr;
+ m_probeVariabilityReadback = nullptr;
m_probeBlendingIrradiancePSO = nullptr;
m_probeBlendingDistancePSO = nullptr;
@@ -1064,6 +1356,8 @@ namespace rtxgi
m_probeRelocationResetPSO = nullptr;
m_probeClassificationPSO = nullptr;
m_probeClassificationResetPSO = nullptr;
+ m_probeVariabilityReductionPSO = nullptr;
+ m_probeVariabilityExtraReductionPSO = nullptr;
#endif;
}
@@ -1158,6 +1452,30 @@ namespace rtxgi
m_device->CreateShaderResourceView(m_probeData, &srvDesc, srvHandle);
}
+ // Probe variability texture descriptors
+ {
+ uavHandle.ptr = heapStart.ptr + (m_descriptorHeapDesc.resourceIndices.probeVariabilityUAVIndex * m_descriptorHeapDesc.entrySize);
+ srvHandle.ptr = heapStart.ptr + (m_descriptorHeapDesc.resourceIndices.probeVariabilitySRVIndex * m_descriptorHeapDesc.entrySize);
+
+ srvDesc.Format = uavDesc.Format = GetDDGIVolumeTextureFormat(EDDGIVolumeTextureType::Variability, m_desc.probeVariabilityFormat);
+ m_device->CreateUnorderedAccessView(m_probeVariability, nullptr, &uavDesc, uavHandle);
+ m_device->CreateShaderResourceView(m_probeVariability, &srvDesc, srvHandle);
+ }
+
+ // Probe variability average texture descriptors
+ {
+ uavHandle.ptr = heapStart.ptr + (m_descriptorHeapDesc.resourceIndices.probeVariabilityAverageUAVIndex * m_descriptorHeapDesc.entrySize);
+ srvHandle.ptr = heapStart.ptr + (m_descriptorHeapDesc.resourceIndices.probeVariabilityAverageSRVIndex * m_descriptorHeapDesc.entrySize);
+
+ UINT variabilityAverageArraySize;
+ GetDDGIVolumeTextureDimensions(m_desc, EDDGIVolumeTextureType::VariabilityAverage, width, height, variabilityAverageArraySize);
+ srvDesc.Format = uavDesc.Format = GetDDGIVolumeTextureFormat(EDDGIVolumeTextureType::VariabilityAverage, m_desc.probeVariabilityFormat);
+ uavDesc.Texture2DArray.ArraySize = variabilityAverageArraySize;
+ srvDesc.Texture2DArray.ArraySize = variabilityAverageArraySize;
+ m_device->CreateUnorderedAccessView(m_probeVariabilityAverage, nullptr, &uavDesc, uavHandle);
+ m_device->CreateShaderResourceView(m_probeVariabilityAverage, &srvDesc, srvHandle);
+ }
+
// Describe the RTV heap
D3D12_DESCRIPTOR_HEAP_DESC heapDesc = {};
heapDesc.NumDescriptors = GetDDGIVolumeNumRTVDescriptors();
@@ -1308,7 +1626,7 @@ namespace rtxgi
// Create the texture resource
D3D12_RESOURCE_FLAGS flags = D3D12_RESOURCE_FLAG_ALLOW_UNORDERED_ACCESS | D3D12_RESOURCE_FLAG_ALLOW_RENDER_TARGET;
- bool result = CreateTexture(width, height, arraySize, format, D3D12_RESOURCE_STATE_PIXEL_SHADER_RESOURCE, flags, &m_probeIrradiance);
+ bool result = CreateTexture(width, height, arraySize, format, D3D12_RESOURCE_STATE_NON_PIXEL_SHADER_RESOURCE, flags, &m_probeIrradiance);
if (!result) return false;
#ifdef RTXGI_GFX_NAME_OBJECTS
std::wstring name = L"DDGIVolume[" + std::to_wstring(desc.index) + L"], Probe Irradiance";
@@ -1334,7 +1652,7 @@ namespace rtxgi
// Create the texture resource
D3D12_RESOURCE_FLAGS flags = D3D12_RESOURCE_FLAG_ALLOW_UNORDERED_ACCESS | D3D12_RESOURCE_FLAG_ALLOW_RENDER_TARGET;
- bool result = CreateTexture(width, height, arraySize, format, D3D12_RESOURCE_STATE_PIXEL_SHADER_RESOURCE, flags, &m_probeDistance);
+ bool result = CreateTexture(width, height, arraySize, format, D3D12_RESOURCE_STATE_NON_PIXEL_SHADER_RESOURCE, flags, &m_probeDistance);
if (!result) return false;
#ifdef RTXGI_GFX_NAME_OBJECTS
std::wstring name = L"DDGIVolume[" + std::to_wstring(desc.index) + L"], Probe Distance";
@@ -1360,7 +1678,7 @@ namespace rtxgi
if (width <= 0 || height <= 0 || arraySize <= 0) return false;
// Create the texture resource
- bool result = CreateTexture(width, height, arraySize, format, D3D12_RESOURCE_STATE_UNORDERED_ACCESS, D3D12_RESOURCE_FLAG_ALLOW_UNORDERED_ACCESS, &m_probeData);
+ bool result = CreateTexture(width, height, arraySize, format, D3D12_RESOURCE_STATE_NON_PIXEL_SHADER_RESOURCE, D3D12_RESOURCE_FLAG_ALLOW_UNORDERED_ACCESS, &m_probeData);
if (!result) return false;
#ifdef RTXGI_GFX_NAME_OBJECTS
std::wstring name = L"DDGIVolume[" + std::to_wstring(desc.index) + L"], Probe Data";
@@ -1370,6 +1688,90 @@ namespace rtxgi
return true;
}
+ bool DDGIVolume::CreateProbeVariability(const DDGIVolumeDesc& desc)
+ {
+ RTXGI_SAFE_RELEASE(m_probeVariability);
+
+ UINT width = 0;
+ UINT height = 0;
+ UINT arraySize = 0;
+ DXGI_FORMAT format = DXGI_FORMAT_UNKNOWN;
+
+ // Get the texture dimensions and format
+ GetDDGIVolumeTextureDimensions(desc, EDDGIVolumeTextureType::Variability, width, height, arraySize);
+ format = GetDDGIVolumeTextureFormat(EDDGIVolumeTextureType::Variability, desc.probeVariabilityFormat);
+
+ // Check for problems
+ if (width <= 0 || height <= 0 || arraySize <= 0) return false;
+
+ // Create the texture resource
+ bool result = CreateTexture(width, height, arraySize, format, D3D12_RESOURCE_STATE_UNORDERED_ACCESS, D3D12_RESOURCE_FLAG_ALLOW_UNORDERED_ACCESS, &m_probeVariability);
+ if (!result) return false;
+ #ifdef RTXGI_GFX_NAME_OBJECTS
+ std::wstring name = L"DDGIVolume[" + std::to_wstring(desc.index) + L"], Probe Variability";
+ m_probeVariability->SetName(name.c_str());
+ #endif
+
+ return true;
+ }
+
+ bool DDGIVolume::CreateProbeVariabilityAverage(const DDGIVolumeDesc& desc)
+ {
+ RTXGI_SAFE_RELEASE(m_probeVariabilityAverage);
+
+ UINT width = 0;
+ UINT height = 0;
+ UINT arraySize = 0;
+ DXGI_FORMAT format = DXGI_FORMAT_UNKNOWN;
+
+ // Get the texture dimensions and format
+ GetDDGIVolumeTextureDimensions(desc, EDDGIVolumeTextureType::VariabilityAverage, width, height, arraySize);
+ format = GetDDGIVolumeTextureFormat(EDDGIVolumeTextureType::VariabilityAverage, desc.probeVariabilityFormat);
+
+ // Check for problems
+ if (width <= 0 || height <= 0 || arraySize <= 0) return false;
+
+ // Create the texture resource
+ bool result = CreateTexture(width, height, arraySize, format, D3D12_RESOURCE_STATE_UNORDERED_ACCESS, D3D12_RESOURCE_FLAG_ALLOW_UNORDERED_ACCESS, &m_probeVariabilityAverage);
+ if (!result) return false;
+ #ifdef RTXGI_GFX_NAME_OBJECTS
+ std::wstring name = L"DDGIVolume[" + std::to_wstring(desc.index) + L"], Probe Variability Average";
+ m_probeVariabilityAverage->SetName(name.c_str());
+ #endif
+
+ // Create the readback texture
+ RTXGI_SAFE_RELEASE(m_probeVariabilityReadback);
+
+ // Readback texture is always in "full" format (R32G32F)
+ format = GetDDGIVolumeTextureFormat(EDDGIVolumeTextureType::VariabilityAverage, desc.probeVariabilityFormat);
+ {
+ D3D12_HEAP_PROPERTIES readbackHeapProperties = {};
+ readbackHeapProperties.Type = D3D12_HEAP_TYPE_READBACK;
+
+ D3D12_RESOURCE_DESC desc = {};
+ desc.Format = DXGI_FORMAT_UNKNOWN;
+ desc.Width = sizeof(float) * 2;
+ desc.Height = 1;
+ desc.MipLevels = 1;
+ desc.DepthOrArraySize = 1;
+ desc.SampleDesc.Count = 1;
+ desc.SampleDesc.Quality = 0;
+ desc.Layout = D3D12_TEXTURE_LAYOUT_ROW_MAJOR;
+ desc.Dimension = D3D12_RESOURCE_DIMENSION_BUFFER;
+ desc.Flags = D3D12_RESOURCE_FLAG_NONE;
+
+ HRESULT hr = m_device->CreateCommittedResource(&readbackHeapProperties, D3D12_HEAP_FLAG_NONE, &desc, D3D12_RESOURCE_STATE_COPY_DEST, nullptr, IID_PPV_ARGS(&m_probeVariabilityReadback));
+ result = SUCCEEDED(hr);
+ }
+ if (!result) return false;
+ #ifdef RTXGI_GFX_NAME_OBJECTS
+ name = L"DDGIVolume[" + std::to_wstring(desc.index) + L"], Probe Variability Readback";
+ m_probeVariabilityReadback->SetName(name.c_str());
+ #endif
+
+ return true;
+ }
+
#endif // RTXGI_DDGI_RESOURCE_MANAGEMENT
} // namespace d3d12
diff --git a/rtxgi-sdk/src/ddgi/gfx/DDGIVolume_VK.cpp b/rtxgi-sdk/src/ddgi/gfx/DDGIVolume_VK.cpp
index 14cfd4a..958a031 100644
--- a/rtxgi-sdk/src/ddgi/gfx/DDGIVolume_VK.cpp
+++ b/rtxgi-sdk/src/ddgi/gfx/DDGIVolume_VK.cpp
@@ -74,6 +74,8 @@ namespace rtxgi
if (!ValidateShaderBytecode(desc.probeRelocation.resetCS)) return ERTXGIStatus::ERROR_DDGI_INVALID_BYTECODE_PROBE_RELOCATION_RESET;
if (!ValidateShaderBytecode(desc.probeClassification.updateCS)) return ERTXGIStatus::ERROR_DDGI_INVALID_BYTECODE_PROBE_CLASSIFICATION;
if (!ValidateShaderBytecode(desc.probeClassification.resetCS)) return ERTXGIStatus::ERROR_DDGI_INVALID_BYTECODE_PROBE_CLASSIFICATION_RESET;
+ if (!ValidateShaderBytecode(desc.probeVariability.reductionCS)) return ERTXGIStatus::ERROR_DDGI_INVALID_BYTECODE_PROBE_VARIABILITY_REDUCTION;
+ if (!ValidateShaderBytecode(desc.probeVariability.extraReductionCS)) return ERTXGIStatus::ERROR_DDGI_INVALID_BYTECODE_PROBE_VARIABILITY_EXTRA_REDUCTION;
return ERTXGIStatus::OK;
}
@@ -89,18 +91,26 @@ namespace rtxgi
if (desc.probeIrradiance == nullptr) return ERTXGIStatus::ERROR_DDGI_INVALID_TEXTURE_PROBE_IRRADIANCE;
if (desc.probeDistance == nullptr) return ERTXGIStatus::ERROR_DDGI_INVALID_TEXTURE_PROBE_DISTANCE;
if (desc.probeData == nullptr) return ERTXGIStatus::ERROR_DDGI_INVALID_TEXTURE_PROBE_DATA;
+ if (desc.probeVariability == nullptr) return ERTXGIStatus::ERROR_DDGI_INVALID_TEXTURE_PROBE_VARIABILITY;
+ if (desc.probeVariabilityAverage == nullptr) return ERTXGIStatus::ERROR_DDGI_INVALID_TEXTURE_PROBE_VARIABILITY_AVERAGE;
+ if (desc.probeVariabilityReadback == nullptr) return ERTXGIStatus::ERROR_DDGI_INVALID_TEXTURE_PROBE_VARIABILITY_READBACK;
// Texture Array Memory
if (desc.probeRayDataMemory == nullptr) return ERTXGIStatus::ERROR_DDGI_VK_INVALID_IMAGE_MEMORY_PROBE_RAY_DATA;
if (desc.probeIrradianceMemory == nullptr) return ERTXGIStatus::ERROR_DDGI_VK_INVALID_IMAGE_MEMORY_PROBE_IRRADIANCE;
if (desc.probeDistanceMemory == nullptr) return ERTXGIStatus::ERROR_DDGI_VK_INVALID_IMAGE_MEMORY_PROBE_DISTANCE;
if (desc.probeDataMemory == nullptr) return ERTXGIStatus::ERROR_DDGI_VK_INVALID_IMAGE_MEMORY_PROBE_DATA;
+ if (desc.probeVariabilityMemory == nullptr) return ERTXGIStatus::ERROR_DDGI_VK_INVALID_IMAGE_MEMORY_PROBE_VARIABILITY;
+ if (desc.probeVariabilityAverageMemory == nullptr) return ERTXGIStatus::ERROR_DDGI_VK_INVALID_IMAGE_MEMORY_PROBE_VARIABILITY_AVERAGE;
+ if (desc.probeVariabilityReadbackMemory == nullptr) return ERTXGIStatus::ERROR_DDGI_VK_INVALID_IMAGE_MEMORY_PROBE_VARIABILITY_READBACK;
// Texture Array Views
if (desc.probeRayDataView == nullptr) return ERTXGIStatus::ERROR_DDGI_VK_INVALID_IMAGE_VIEW_PROBE_RAY_DATA;
if (desc.probeIrradianceView == nullptr) return ERTXGIStatus::ERROR_DDGI_VK_INVALID_IMAGE_VIEW_PROBE_IRRADIANCE;
if (desc.probeDistanceView == nullptr) return ERTXGIStatus::ERROR_DDGI_VK_INVALID_IMAGE_VIEW_PROBE_DISTANCE;
if (desc.probeDataView == nullptr) return ERTXGIStatus::ERROR_DDGI_VK_INVALID_IMAGE_VIEW_PROBE_DATA;
+ if (desc.probeVariabilityView == nullptr) return ERTXGIStatus::ERROR_DDGI_VK_INVALID_IMAGE_VIEW_PROBE_VARIABILITY;
+ if (desc.probeVariabilityAverageView == nullptr) return ERTXGIStatus::ERROR_DDGI_VK_INVALID_IMAGE_VIEW_PROBE_VARIABILITY_AVERAGE;
// Shader Modules
if (desc.probeBlendingIrradianceModule == nullptr) return ERTXGIStatus::ERROR_DDGI_VK_INVALID_SHADER_MODULE_PROBE_BLENDING_IRRADIANCE;
@@ -109,6 +119,8 @@ namespace rtxgi
if (desc.probeRelocation.resetModule == nullptr) return ERTXGIStatus::ERROR_DDGI_VK_INVALID_SHADER_MODULE_PROBE_RELOCATION_RESET;
if (desc.probeClassification.updateModule == nullptr) return ERTXGIStatus::ERROR_DDGI_VK_INVALID_SHADER_MODULE_PROBE_CLASSIFICATION;
if (desc.probeClassification.resetModule == nullptr) return ERTXGIStatus::ERROR_DDGI_VK_INVALID_SHADER_MODULE_PROBE_CLASSIFICATION_RESET;
+ if (desc.probeVariabilityPipelines.reductionModule == nullptr) return ERTXGIStatus::ERROR_DDGI_VK_INVALID_SHADER_MODULE_PROBE_VARIABILITY_REDUCTION;
+ if (desc.probeVariabilityPipelines.extraReductionModule == nullptr) return ERTXGIStatus::ERROR_DDGI_VK_INVALID_SHADER_MODULE_PROBE_VARIABILITY_EXTRA_REDUCTION;
// Pipelines
if (desc.probeBlendingIrradiancePipeline == nullptr) return ERTXGIStatus::ERROR_DDGI_VK_INVALID_PIPELINE_PROBE_BLENDING_IRRADIANCE;
@@ -117,6 +129,8 @@ namespace rtxgi
if (desc.probeRelocation.resetPipeline == nullptr) return ERTXGIStatus::ERROR_DDGI_VK_INVALID_PIPELINE_PROBE_RELOCATION_RESET;
if (desc.probeClassification.updatePipeline == nullptr) return ERTXGIStatus::ERROR_DDGI_VK_INVALID_PIPELINE_PROBE_CLASSIFICATION;
if (desc.probeClassification.resetPipeline == nullptr) return ERTXGIStatus::ERROR_DDGI_VK_INVALID_PIPELINE_PROBE_CLASSIFICATION_RESET;
+ if (desc.probeVariabilityPipelines.reductionPipeline == nullptr) return ERTXGIStatus::ERROR_DDGI_VK_INVALID_PIPELINE_PROBE_VARIABILITY_REDUCTION;
+ if (desc.probeVariabilityPipelines.extraReductionPipeline == nullptr) return ERTXGIStatus::ERROR_DDGI_VK_INVALID_PIPELINE_PROBE_VARIABILITY_EXTRA_REDUCTION;
return ERTXGIStatus::OK;
}
@@ -148,10 +162,19 @@ namespace rtxgi
if (format == EDDGIVolumeTextureFormat::F16x4) return VK_FORMAT_R16G16B16A16_SFLOAT;
else if (format == EDDGIVolumeTextureFormat::F32x4) return VK_FORMAT_R32G32B32A32_SFLOAT;
}
+ else if (type == EDDGIVolumeTextureType::Variability)
+ {
+ if (format == EDDGIVolumeTextureFormat::F16) return VK_FORMAT_R16_SFLOAT;
+ else if (format == EDDGIVolumeTextureFormat::F32) return VK_FORMAT_R32_SFLOAT;
+ }
+ else if (type == EDDGIVolumeTextureType::VariabilityAverage)
+ {
+ return VK_FORMAT_R32G32_SFLOAT;
+ }
return VK_FORMAT_UNDEFINED;
}
- uint32_t GetDDGIVolumeLayoutBindingCount() { return 5; }
+ uint32_t GetDDGIVolumeLayoutBindingCount() { return 7; }
void GetDDGIVolumeLayoutDescs(
VkDescriptorSetLayoutCreateInfo& descriptorSetLayoutCreateInfo,
@@ -165,6 +188,8 @@ namespace rtxgi
// 1 UAV probe irradiance texture array (2)
// 1 UAV probe distance texture array (3)
// 1 UAV probe data texture array (4)
+ // 1 UAV probe variability texture array (5)
+ // 1 UAV probe variability average texture array (6)
// 0: Volume Constants Structured Buffer
VkDescriptorSetLayoutBinding& bind0 = bindings[0];
@@ -201,6 +226,20 @@ namespace rtxgi
bind4.descriptorCount = 1;
bind4.stageFlags = VK_SHADER_STAGE_COMPUTE_BIT;
+ // 5: Probe Variability
+ VkDescriptorSetLayoutBinding& bind5 = bindings[5];
+ bind5.binding = static_cast<uint32_t>(EDDGIVolumeBindings::ProbeVariability);
+ bind5.descriptorType = VK_DESCRIPTOR_TYPE_STORAGE_IMAGE;
+ bind5.descriptorCount = 1;
+ bind5.stageFlags = VK_SHADER_STAGE_COMPUTE_BIT;
+
+ // 6: Probe Variability Average
+ VkDescriptorSetLayoutBinding& bind6 = bindings[6];
+ bind6.binding = static_cast<uint32_t>(EDDGIVolumeBindings::ProbeVariabilityAverage);
+ bind6.descriptorType = VK_DESCRIPTOR_TYPE_STORAGE_IMAGE;
+ bind6.descriptorCount = 1;
+ bind6.stageFlags = VK_SHADER_STAGE_COMPUTE_BIT;
+
// Describe the descriptor set layout
descriptorSetLayoutCreateInfo.sType = VK_STRUCTURE_TYPE_DESCRIPTOR_SET_LAYOUT_CREATE_INFO;
descriptorSetLayoutCreateInfo.bindingCount = GetDDGIVolumeLayoutBindingCount();
@@ -361,6 +400,8 @@ namespace rtxgi
// Add a barrier
barrier.image = volume->GetProbeIrradiance();
barriers.push_back(barrier);
+ barrier.image = volume->GetProbeVariability();
+ barriers.push_back(barrier);
}
if (bInsertPerfMarkers) vkCmdEndDebugUtilsLabelEXT(cmdBuffer);
@@ -402,9 +443,9 @@ namespace rtxgi
}
if (bInsertPerfMarkers) vkCmdEndDebugUtilsLabelEXT(cmdBuffer);
- // Wait for the irradiance and distance blending passes
- // to complete before updating the borders
- if(!barriers.empty())
+ // The irradiance pass must finish generating variability before a possible reduction pass
+ // This also ensures that irradiance and distance blending complete before the border update that follows reduction
+ if (!barriers.empty())
{
vkCmdPipelineBarrier(
cmdBuffer,
@@ -624,6 +665,232 @@ namespace rtxgi
return ERTXGIStatus::OK;
}
+ ERTXGIStatus CalculateDDGIVolumeVariability(VkCommandBuffer cmdBuffer, uint32_t numVolumes, DDGIVolume** volumes)
+ {
+ if (bInsertPerfMarkers) AddPerfMarker(cmdBuffer, RTXGI_PERF_MARKER_GREEN, "Probe Variability Calculation");
+
+ uint32_t volumeIndex;
+ std::vector<VkImageMemoryBarrier> barriers;
+
+ // Reduction
+ for (volumeIndex = 0; volumeIndex < numVolumes; volumeIndex++)
+ {
+ const DDGIVolume* volume = volumes[volumeIndex];
+ if (!volume->GetProbeVariabilityEnabled()) continue; // Skip if the volume is not calculating variability
+
+ // Bind the descriptor set and push constants
+ vkCmdBindDescriptorSets(cmdBuffer, VK_PIPELINE_BIND_POINT_COMPUTE, volume->GetPipelineLayout(), 0, 1, volume->GetDescriptorSetConstPtr(), 0, nullptr);
+
+ // Get the number of probes on the XYZ dimensions of the texture
+ uint32_t probeCountX, probeCountY, probeCountZ;
+ GetDDGIVolumeProbeCounts(volume->GetDesc(), probeCountX, probeCountY, probeCountZ);
+
+ // Initially, the reduction input is the full variability size (same as irradiance texture)
+ uint32_t inputTexelsX = probeCountX * volume->GetDesc().probeNumIrradianceInteriorTexels;
+ uint32_t inputTexelsY = probeCountY * volume->GetDesc().probeNumIrradianceInteriorTexels;
+ uint32_t inputTexelsZ = probeCountZ;
+
+ const uint3 NumThreadsInGroup = { 4, 8, 4 }; // Each thread group has 4x8x4 threads
+ constexpr uint2 ThreadSampleFootprint = { 4, 2 }; // Each thread will sample 4x2 texels
+
+ // Set push constants
+ DDGIRootConstants consts = volume->GetPushConstants();
+
+ // The first reduction pass takes probe irradiance data, calculates variability, and reduces as much as possible
+ {
+ if (bInsertPerfMarkers && volume->GetInsertPerfMarkers())
+ {
+ std::string msg = "Reduction, DDGIVolume[" + std::to_string(volume->GetIndex()) + "] - \"" + volume->GetName() + "\"";
+ AddPerfMarker(cmdBuffer, RTXGI_PERF_MARKER_GREEN, msg.c_str());
+ }
+
+ // Set the PSO and dispatch threads
+ vkCmdBindPipeline(cmdBuffer, VK_PIPELINE_BIND_POINT_COMPUTE, volume->GetProbeVariabilityReductionPipeline());
+
+ // One thread group per output texel
+ uint32_t outputTexelsX = (uint32_t)ceil((float)inputTexelsX / (float)(NumThreadsInGroup.x * ThreadSampleFootprint.x));
+ uint32_t outputTexelsY = (uint32_t)ceil((float)inputTexelsY / (float)(NumThreadsInGroup.y * ThreadSampleFootprint.y));
+ uint32_t outputTexelsZ = (uint32_t)ceil((float)inputTexelsZ / (float)NumThreadsInGroup.z);
+
+ consts.reductionInputSizeX = inputTexelsX;
+ consts.reductionInputSizeY = inputTexelsY;
+ consts.reductionInputSizeZ = inputTexelsZ;
+ vkCmdPushConstants(cmdBuffer, volume->GetPipelineLayout(), VK_SHADER_STAGE_ALL, volume->GetPushConstantsOffset(), DDGIRootConstants::GetSizeInBytes(), consts.GetData());
+
+ vkCmdDispatch(cmdBuffer, outputTexelsX, outputTexelsY, outputTexelsZ);
+
+ if (bInsertPerfMarkers && volume->GetInsertPerfMarkers()) vkCmdEndDebugUtilsLabelEXT(cmdBuffer);
+
+ // Each thread group will write out a value to the averaging texture
+ // If there is more than one thread group, we will need to do extra averaging passes
+ inputTexelsX = outputTexelsX;
+ inputTexelsY = outputTexelsY;
+ inputTexelsZ = outputTexelsZ;
+ }
+
+ // UAV barrier needed after each reduction pass
+ VkImageMemoryBarrier reductionBarrier = {};
+ reductionBarrier.sType = VK_STRUCTURE_TYPE_IMAGE_MEMORY_BARRIER;
+ reductionBarrier.srcAccessMask = VK_ACCESS_SHADER_WRITE_BIT;
+ reductionBarrier.dstAccessMask = VK_ACCESS_SHADER_READ_BIT | VK_ACCESS_SHADER_WRITE_BIT;
+ reductionBarrier.oldLayout = reductionBarrier.newLayout = VK_IMAGE_LAYOUT_GENERAL;
+ reductionBarrier.subresourceRange = { VK_IMAGE_ASPECT_COLOR_BIT, 0, 1, 0, 1 };
+ reductionBarrier.image = volume->GetProbeVariabilityAverage();
+ vkCmdPipelineBarrier(
+ cmdBuffer,
+ VK_PIPELINE_STAGE_COMPUTE_SHADER_BIT,
+ VK_PIPELINE_STAGE_COMPUTE_SHADER_BIT,
+ 0,
+ 0, nullptr,
+ 0, nullptr,
+ 1, &reductionBarrier);
+
+ // Future extra passes (if they run) will reuse the reductionBarrier struct, so update srcAccessMask to match
+ reductionBarrier.srcAccessMask = VK_ACCESS_SHADER_READ_BIT | VK_ACCESS_SHADER_WRITE_BIT;
+
+ // Extra reduction passes average the values in the variability texture down to a single value
+ while (inputTexelsX > 1 || inputTexelsY > 1 || inputTexelsZ > 1)
+ {
+ if (bInsertPerfMarkers && volume->GetInsertPerfMarkers())
+ {
+ std::string msg = "Extra Reduction, DDGIVolume[" + std::to_string(volume->GetIndex()) + "] - \"" + volume->GetName() + "\"";
+ AddPerfMarker(cmdBuffer, RTXGI_PERF_MARKER_GREEN, msg.c_str());
+ }
+
+ vkCmdBindPipeline(cmdBuffer, VK_PIPELINE_BIND_POINT_COMPUTE, volume->GetProbeVariabilityExtraReductionPipeline());
+
+ // One thread group per output texel
+ uint32_t outputTexelsX = (uint32_t)ceil((float)inputTexelsX / (float)(NumThreadsInGroup.x * ThreadSampleFootprint.x));
+ uint32_t outputTexelsY = (uint32_t)ceil((float)inputTexelsY / (float)(NumThreadsInGroup.y * ThreadSampleFootprint.y));
+ uint32_t outputTexelsZ = (uint32_t)ceil((float)inputTexelsZ / (float)NumThreadsInGroup.z);
+
+ consts.reductionInputSizeX = inputTexelsX;
+ consts.reductionInputSizeY = inputTexelsY;
+ consts.reductionInputSizeZ = inputTexelsZ;
+ vkCmdPushConstants(cmdBuffer, volume->GetPipelineLayout(), VK_SHADER_STAGE_ALL, volume->GetPushConstantsOffset(), DDGIRootConstants::GetSizeInBytes(), consts.GetData());
+
+ vkCmdDispatch(cmdBuffer, outputTexelsX, outputTexelsY, outputTexelsZ);
+
+ if (bInsertPerfMarkers && volume->GetInsertPerfMarkers()) vkCmdEndDebugUtilsLabelEXT(cmdBuffer);
+
+ inputTexelsX = outputTexelsX;
+ inputTexelsY = outputTexelsY;
+ inputTexelsZ = outputTexelsZ;
+
+ // Need a barrier in between each reduction pass
+ vkCmdPipelineBarrier(
+ cmdBuffer,
+ VK_PIPELINE_STAGE_COMPUTE_SHADER_BIT,
+ VK_PIPELINE_STAGE_COMPUTE_SHADER_BIT,
+ 0,
+ 0, nullptr,
+ 0, nullptr,
+ 1, &reductionBarrier);
+ }
+ }
+
+ if (bInsertPerfMarkers) vkCmdEndDebugUtilsLabelEXT(cmdBuffer);
+
+ // Copy to the readback buffer
+ if (bInsertPerfMarkers) AddPerfMarker(cmdBuffer, RTXGI_PERF_MARKER_GREEN, "Probe Variability Readback");
+
+ {
+ VkImageMemoryBarrier beforeBarrier = {};
+ beforeBarrier.sType = VK_STRUCTURE_TYPE_IMAGE_MEMORY_BARRIER;
+ beforeBarrier.srcAccessMask = VK_ACCESS_SHADER_READ_BIT | VK_ACCESS_SHADER_WRITE_BIT;
+ beforeBarrier.dstAccessMask = VK_ACCESS_TRANSFER_READ_BIT;
+ beforeBarrier.oldLayout = VK_IMAGE_LAYOUT_GENERAL;
+ beforeBarrier.newLayout = VK_IMAGE_LAYOUT_TRANSFER_SRC_OPTIMAL;
+ beforeBarrier.subresourceRange = { VK_IMAGE_ASPECT_COLOR_BIT, 0, 1, 0, 1 };
+
+ VkImageMemoryBarrier afterBarrier = beforeBarrier;
+ afterBarrier.srcAccessMask = beforeBarrier.dstAccessMask;
+ afterBarrier.dstAccessMask = beforeBarrier.srcAccessMask;
+ afterBarrier.oldLayout = beforeBarrier.newLayout;
+ afterBarrier.newLayout = beforeBarrier.oldLayout;
+
+ for (volumeIndex = 0; volumeIndex < numVolumes; volumeIndex++)
+ {
+ const DDGIVolume* volume = volumes[volumeIndex];
+ if (!volume->GetProbeVariabilityEnabled()) continue; // Skip if the volume is not calculating variability
+
+ beforeBarrier.image = volume->GetProbeVariabilityAverage();
+ barriers.push_back(beforeBarrier);
+ }
+
+ if (!barriers.empty())
+ {
+ vkCmdPipelineBarrier(
+ cmdBuffer,
+ VK_PIPELINE_STAGE_COMPUTE_SHADER_BIT,
+ VK_PIPELINE_STAGE_TRANSFER_BIT,
+ 0,
+ 0, nullptr,
+ 0, nullptr,
+ static_cast<uint32_t>(barriers.size()), barriers.data());
+
+ barriers.clear();
+ }
+
+ for (volumeIndex = 0; volumeIndex < numVolumes; volumeIndex++)
+ {
+ const DDGIVolume* volume = volumes[volumeIndex];
+ if (!volume->GetProbeVariabilityEnabled()) continue; // Skip if the volume is not calculating variability
+
+ VkBufferImageCopy copy = {};
+ copy.imageSubresource = { VK_IMAGE_ASPECT_COLOR_BIT, 0, 0, 1 };
+ copy.imageExtent = { 1, 1, 1 };
+ vkCmdCopyImageToBuffer(cmdBuffer,
+ volume->GetProbeVariabilityAverage(), VK_IMAGE_LAYOUT_TRANSFER_SRC_OPTIMAL,
+ volume->GetProbeVariabilityReadback(),
+ 1, &copy);
+
+ afterBarrier.image = volume->GetProbeVariabilityAverage();
+ barriers.push_back(afterBarrier);
+ }
+
+ if (!barriers.empty())
+ {
+ vkCmdPipelineBarrier(
+ cmdBuffer,
+ VK_PIPELINE_STAGE_TRANSFER_BIT,
+ VK_PIPELINE_STAGE_COMPUTE_SHADER_BIT,
+ 0,
+ 0, nullptr,
+ 0, nullptr,
+ static_cast<uint32_t>(barriers.size()), barriers.data());
+ barriers.clear();
+ }
+ }
+
+ if (bInsertPerfMarkers) vkCmdEndDebugUtilsLabelEXT(cmdBuffer);
+
+ return ERTXGIStatus::OK;
+ }
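The ceil-division dispatch math above determines how many reduction passes run before the average settles into a single texel. A minimal host-side sketch of that sizing, assuming the 4x8x4 thread group and 4x2 per-thread sample footprint used above (`Extent`, `ReduceOnce`, and `CountReductionPasses` are illustrative names, not SDK API):

```cpp
#include <cassert>
#include <cstdint>

// Illustrative sketch of the reduction sizing above; these names are not
// part of the SDK. One thread group covers (4*4) x (8*2) x 4 input texels.
struct Extent { uint32_t x, y, z; };

static Extent ReduceOnce(Extent in)
{
    const uint32_t footprintX = 4 * 4; // NumThreadsInGroup.x * ThreadSampleFootprint.x
    const uint32_t footprintY = 8 * 2; // NumThreadsInGroup.y * ThreadSampleFootprint.y
    const uint32_t footprintZ = 4;     // NumThreadsInGroup.z
    return { (in.x + footprintX - 1) / footprintX,  // ceil division, as in the dispatches
             (in.y + footprintY - 1) / footprintY,
             (in.z + footprintZ - 1) / footprintZ };
}

// Total passes (first reduction plus extra reductions) until one texel remains
static uint32_t CountReductionPasses(Extent e)
{
    uint32_t passes = 0;
    while (e.x > 1 || e.y > 1 || e.z > 1) { e = ReduceOnce(e); passes++; }
    return passes;
}
```

For example, a 9x9x9 probe grid with 6 interior irradiance texels per probe yields a 54x54x9 variability texture: the first pass reduces it to 4x4x3, and a single extra pass reaches 1x1x1.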
+
+ ERTXGIStatus ReadbackDDGIVolumeVariability(VkDevice device, uint32_t numVolumes, DDGIVolume** volumes)
+ {
+ for (uint32_t volumeIndex = 0; volumeIndex < numVolumes; volumeIndex++)
+ {
+ // Get the volume
+ DDGIVolume* volume = volumes[volumeIndex];
+ if (!volume->GetProbeVariabilityEnabled()) continue; // Skip if the volume is not calculating variability
+
+ // Get the probe variability readback buffer's memory
+ VkDeviceMemory readback = volume->GetProbeVariabilityReadbackMemory();
+
+ // Read the first 32 bits of the readback buffer
+ float* pMappedMemory = nullptr;
+ VkResult result = vkMapMemory(device, readback, 0, sizeof(float), 0, (void**)&pMappedMemory);
+ if (VKFAILED(result)) return ERTXGIStatus::ERROR_DDGI_MAP_FAILURE_VARIABILITY_READBACK_BUFFER;
+ float value = pMappedMemory[0];
+ vkUnmapMemory(device, readback);
+
+ volume->SetVolumeAverageVariability(value);
+ }
+ return ERTXGIStatus::OK;
+ }
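Once the host has read the averaged value, callers typically compare it against a per-volume threshold (see the `probeVariability.threshold` settings in the configs below) to decide whether probe tracing and updates can be paused. A minimal sketch of that decision; `ShouldTraceProbes` is a hypothetical helper, not SDK API:

```cpp
#include <cassert>

// Hypothetical helper (not SDK API): pause probe tracing once the volume's
// average coefficient of variation settles below a chosen threshold, and
// resume when an event invalidates the light field.
static bool ShouldTraceProbes(float averageVariability, float threshold, bool lightFieldInvalidated)
{
    if (lightFieldInvalidated) return true; // e.g. lights or geometry moved
    return averageVariability > threshold;  // still converging
}
```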
+
//------------------------------------------------------------------------
// Private DDGIVolume Functions
//------------------------------------------------------------------------
@@ -642,6 +909,8 @@ namespace rtxgi
vkDestroyShaderModule(m_device, m_probeRelocationResetModule, nullptr);
vkDestroyShaderModule(m_device, m_probeClassificationModule, nullptr);
vkDestroyShaderModule(m_device, m_probeClassificationResetModule, nullptr);
+ vkDestroyShaderModule(m_device, m_probeVariabilityReductionModule, nullptr);
+ vkDestroyShaderModule(m_device, m_probeVariabilityExtraReductionModule, nullptr);
// Release the existing compute pipelines
vkDestroyPipeline(m_device, m_probeBlendingIrradiancePipeline, nullptr);
@@ -650,6 +919,8 @@ namespace rtxgi
vkDestroyPipeline(m_device, m_probeRelocationResetPipeline, nullptr);
vkDestroyPipeline(m_device, m_probeClassificationPipeline, nullptr);
vkDestroyPipeline(m_device, m_probeClassificationResetPipeline, nullptr);
+ vkDestroyPipeline(m_device, m_probeVariabilityReductionPipeline, nullptr);
+ vkDestroyPipeline(m_device, m_probeVariabilityExtraReductionPipeline, nullptr);
}
ERTXGIStatus DDGIVolume::CreateManagedResources(const DDGIVolumeDesc& desc, const DDGIVolumeManagedResourcesDesc& managed)
@@ -712,6 +983,20 @@ namespace rtxgi
&m_probeClassificationResetModule,
&m_probeClassificationResetPipeline,
"Probe Classification Reset")) return ERTXGIStatus::ERROR_DDGI_VK_CREATE_FAILURE_PIPELINE;
+
+ if (!CreateComputePipeline(
+ managed.probeVariability.reductionCS,
+ "DDGIReductionCS",
+ &m_probeVariabilityReductionModule,
+ &m_probeVariabilityReductionPipeline,
+ "Probe Variability Reduction")) return ERTXGIStatus::ERROR_DDGI_VK_CREATE_FAILURE_PIPELINE;
+
+ if (!CreateComputePipeline(
+ managed.probeVariability.extraReductionCS,
+ "DDGIExtraReductionCS",
+ &m_probeVariabilityExtraReductionModule,
+ &m_probeVariabilityExtraReductionPipeline,
+ "Probe Variability Extra Reduction")) return ERTXGIStatus::ERROR_DDGI_VK_CREATE_FAILURE_PIPELINE;
}
// Create the textures
@@ -723,6 +1008,8 @@ namespace rtxgi
if (!CreateProbeIrradiance(desc)) return ERTXGIStatus::ERROR_DDGI_ALLOCATE_FAILURE_TEXTURE_PROBE_IRRADIANCE;
if (!CreateProbeDistance(desc)) return ERTXGIStatus::ERROR_DDGI_ALLOCATE_FAILURE_TEXTURE_PROBE_DISTANCE;
if (!CreateProbeData(desc)) return ERTXGIStatus::ERROR_DDGI_ALLOCATE_FAILURE_TEXTURE_PROBE_DATA;
+ if (!CreateProbeVariability(desc)) return ERTXGIStatus::ERROR_DDGI_ALLOCATE_FAILURE_TEXTURE_PROBE_VARIABILITY;
+ if (!CreateProbeVariabilityAverage(desc)) return ERTXGIStatus::ERROR_DDGI_ALLOCATE_FAILURE_TEXTURE_PROBE_VARIABILITY_AVERAGE;
}
else
{
@@ -759,18 +1046,26 @@ namespace rtxgi
m_probeIrradiance = unmanaged.probeIrradiance;
m_probeDistance = unmanaged.probeDistance;
m_probeData = unmanaged.probeData;
+ m_probeVariability = unmanaged.probeVariability;
+ m_probeVariabilityAverage = unmanaged.probeVariabilityAverage;
+ m_probeVariabilityReadback = unmanaged.probeVariabilityReadback;
// Texture Array Memory
m_probeRayDataMemory = unmanaged.probeRayDataMemory;
m_probeIrradianceMemory = unmanaged.probeIrradianceMemory;
m_probeDistanceMemory = unmanaged.probeDistanceMemory;
m_probeDataMemory = unmanaged.probeDataMemory;
+ m_probeVariabilityMemory = unmanaged.probeVariabilityMemory;
+ m_probeVariabilityAverageMemory = unmanaged.probeVariabilityAverageMemory;
+ m_probeVariabilityReadbackMemory = unmanaged.probeVariabilityReadbackMemory;
// Texture Array Views
m_probeRayDataView = unmanaged.probeRayDataView;
m_probeIrradianceView = unmanaged.probeIrradianceView;
m_probeDistanceView = unmanaged.probeDistanceView;
m_probeDataView = unmanaged.probeDataView;
+ m_probeVariabilityView = unmanaged.probeVariabilityView;
+ m_probeVariabilityAverageView = unmanaged.probeVariabilityAverageView;
// Shader Modules
m_probeBlendingIrradianceModule = unmanaged.probeBlendingIrradianceModule;
@@ -779,6 +1074,8 @@ namespace rtxgi
m_probeRelocationResetModule = unmanaged.probeRelocation.resetModule;
m_probeClassificationModule = unmanaged.probeClassification.updateModule;
m_probeClassificationResetModule = unmanaged.probeClassification.resetModule;
+ m_probeVariabilityReductionModule = unmanaged.probeVariabilityPipelines.reductionModule;
+ m_probeVariabilityExtraReductionModule = unmanaged.probeVariabilityPipelines.extraReductionModule;
// Pipelines
m_probeBlendingIrradiancePipeline = unmanaged.probeBlendingIrradiancePipeline;
@@ -787,6 +1084,8 @@ namespace rtxgi
m_probeRelocationResetPipeline = unmanaged.probeRelocation.resetPipeline;
m_probeClassificationPipeline = unmanaged.probeClassification.updatePipeline;
m_probeClassificationResetPipeline = unmanaged.probeClassification.resetPipeline;
+ m_probeVariabilityReductionPipeline = unmanaged.probeVariabilityPipelines.reductionPipeline;
+ m_probeVariabilityExtraReductionPipeline = unmanaged.probeVariabilityPipelines.extraReductionPipeline;
}
#endif
@@ -944,6 +1243,8 @@ namespace rtxgi
vkDestroyShaderModule(m_device, m_probeRelocationResetModule, nullptr);
vkDestroyShaderModule(m_device, m_probeClassificationModule, nullptr);
vkDestroyShaderModule(m_device, m_probeClassificationResetModule, nullptr);
+ vkDestroyShaderModule(m_device, m_probeVariabilityReductionModule, nullptr);
+ vkDestroyShaderModule(m_device, m_probeVariabilityExtraReductionModule, nullptr);
// Pipelines
vkDestroyPipeline(m_device, m_probeBlendingIrradiancePipeline, nullptr);
@@ -952,6 +1253,8 @@ namespace rtxgi
vkDestroyPipeline(m_device, m_probeRelocationResetPipeline, nullptr);
vkDestroyPipeline(m_device, m_probeClassificationPipeline, nullptr);
vkDestroyPipeline(m_device, m_probeClassificationResetPipeline, nullptr);
+ vkDestroyPipeline(m_device, m_probeVariabilityReductionPipeline, nullptr);
+ vkDestroyPipeline(m_device, m_probeVariabilityExtraReductionPipeline, nullptr);
// Texture Arrays
vkDestroyImage(m_device, m_probeRayData, nullptr);
@@ -970,6 +1273,17 @@ namespace rtxgi
vkDestroyImageView(m_device, m_probeDataView, nullptr);
vkFreeMemory(m_device, m_probeDataMemory, nullptr);
+ vkDestroyImage(m_device, m_probeVariability, nullptr);
+ vkDestroyImageView(m_device, m_probeVariabilityView, nullptr);
+ vkFreeMemory(m_device, m_probeVariabilityMemory, nullptr);
+
+ vkDestroyImage(m_device, m_probeVariabilityAverage, nullptr);
+ vkDestroyImageView(m_device, m_probeVariabilityAverageView, nullptr);
+ vkFreeMemory(m_device, m_probeVariabilityAverageMemory, nullptr);
+
+ vkDestroyBuffer(m_device, m_probeVariabilityReadback, nullptr);
+ vkFreeMemory(m_device, m_probeVariabilityReadbackMemory, nullptr);
+
m_descriptorSetLayout = nullptr;
m_descriptorPool = nullptr;
m_device = nullptr;
@@ -992,6 +1306,14 @@ namespace rtxgi
m_probeData = nullptr;
m_probeDataMemory = nullptr;
m_probeDataView = nullptr;
+ m_probeVariability = nullptr;
+ m_probeVariabilityMemory = nullptr;
+ m_probeVariabilityView = nullptr;
+ m_probeVariabilityAverage = nullptr;
+ m_probeVariabilityAverageMemory = nullptr;
+ m_probeVariabilityAverageView = nullptr;
+ m_probeVariabilityReadback = nullptr;
+ m_probeVariabilityReadbackMemory = nullptr;
// Shader Modules
m_probeBlendingIrradianceModule = nullptr;
@@ -1000,6 +1322,8 @@ namespace rtxgi
m_probeRelocationResetModule = nullptr;
m_probeClassificationModule = nullptr;
m_probeClassificationResetModule = nullptr;
+ m_probeVariabilityReductionModule = nullptr;
+ m_probeVariabilityExtraReductionModule = nullptr;
// Pipelines
m_probeBlendingIrradiancePipeline = nullptr;
@@ -1008,6 +1332,8 @@ namespace rtxgi
m_probeRelocationResetPipeline = nullptr;
m_probeClassificationPipeline = nullptr;
m_probeClassificationResetPipeline = nullptr;
+ m_probeVariabilityReductionPipeline = nullptr;
+ m_probeVariabilityExtraReductionPipeline = nullptr;
}
uint32_t DDGIVolume::GetGPUMemoryUsedInBytes() const
@@ -1052,6 +1378,13 @@ namespace rtxgi
barriers.push_back(barrier);
barrier.image = m_probeData;
barriers.push_back(barrier);
+ barrier.image = m_probeVariability;
+ barriers.push_back(barrier);
+
+ GetDDGIVolumeTextureDimensions(m_desc, EDDGIVolumeTextureType::VariabilityAverage, width, height, arraySize);
+ barrier.image = m_probeVariabilityAverage;
+ barrier.subresourceRange.layerCount = arraySize;
+ barriers.push_back(barrier);
vkCmdPipelineBarrier(cmdBuffer, VK_PIPELINE_STAGE_ALL_COMMANDS_BIT, VK_PIPELINE_STAGE_ALL_COMMANDS_BIT, 0, 0, nullptr, 0, nullptr, static_cast<uint32_t>(barriers.size()), barriers.data());
}
@@ -1126,13 +1459,15 @@ namespace rtxgi
descriptor->descriptorType = VK_DESCRIPTOR_TYPE_STORAGE_BUFFER;
descriptor->pBufferInfo = &volumeConstants;
- // 1-4: Volume Texture Array UAVs
+ // 1-6: Volume Texture Array UAVs
VkDescriptorImageInfo rwTex2D[] =
{
{ VK_NULL_HANDLE, m_probeRayDataView, VK_IMAGE_LAYOUT_GENERAL },
{ VK_NULL_HANDLE, m_probeIrradianceView, VK_IMAGE_LAYOUT_GENERAL },
{ VK_NULL_HANDLE, m_probeDistanceView, VK_IMAGE_LAYOUT_GENERAL },
- { VK_NULL_HANDLE, m_probeDataView, VK_IMAGE_LAYOUT_GENERAL }
+ { VK_NULL_HANDLE, m_probeDataView, VK_IMAGE_LAYOUT_GENERAL },
+ { VK_NULL_HANDLE, m_probeVariabilityView, VK_IMAGE_LAYOUT_GENERAL },
+ { VK_NULL_HANDLE, m_probeVariabilityAverageView, VK_IMAGE_LAYOUT_GENERAL }
};
descriptor = &descriptors.emplace_back();
@@ -1144,6 +1479,30 @@ namespace rtxgi
descriptor->descriptorType = VK_DESCRIPTOR_TYPE_STORAGE_IMAGE;
descriptor->pImageInfo = rwTex2D;
+ VkDescriptorImageInfo variabilityInfo = { VK_NULL_HANDLE, m_probeVariabilityView, VK_IMAGE_LAYOUT_GENERAL };
+
+ // Probe Variability
+ descriptor = &descriptors.emplace_back();
+ descriptor->sType = VK_STRUCTURE_TYPE_WRITE_DESCRIPTOR_SET;
+ descriptor->dstSet = m_descriptorSet;
+ descriptor->dstBinding = static_cast<uint32_t>(EDDGIVolumeBindings::ProbeVariability);
+ descriptor->dstArrayElement = 0;
+ descriptor->descriptorCount = 1;
+ descriptor->descriptorType = VK_DESCRIPTOR_TYPE_STORAGE_IMAGE;
+ descriptor->pImageInfo = &variabilityInfo;
+
+ VkDescriptorImageInfo variabilityAverageInfo = { VK_NULL_HANDLE, m_probeVariabilityAverageView, VK_IMAGE_LAYOUT_GENERAL };
+
+ // Probe Variability Average
+ descriptor = &descriptors.emplace_back();
+ descriptor->sType = VK_STRUCTURE_TYPE_WRITE_DESCRIPTOR_SET;
+ descriptor->dstSet = m_descriptorSet;
+ descriptor->dstBinding = static_cast<uint32_t>(EDDGIVolumeBindings::ProbeVariabilityAverage);
+ descriptor->dstArrayElement = 0;
+ descriptor->descriptorCount = 1;
+ descriptor->descriptorType = VK_DESCRIPTOR_TYPE_STORAGE_IMAGE;
+ descriptor->pImageInfo = &variabilityAverageInfo;
+
// Update the descriptor set
vkUpdateDescriptorSets(m_device, static_cast<uint32_t>(descriptors.size()), descriptors.data(), 0, nullptr);
@@ -1399,6 +1758,102 @@ namespace rtxgi
return true;
}
+ bool DDGIVolume::CreateProbeVariability(const DDGIVolumeDesc& desc)
+ {
+ vkDestroyImage(m_device, m_probeVariability, nullptr);
+ vkDestroyImageView(m_device, m_probeVariabilityView, nullptr);
+ vkFreeMemory(m_device, m_probeVariabilityMemory, nullptr);
+
+ uint32_t width = 0;
+ uint32_t height = 0;
+ uint32_t arraySize = 0;
+ GetDDGIVolumeTextureDimensions(desc, EDDGIVolumeTextureType::Variability, width, height, arraySize);
+
+ // Check for problems
+ if (width <= 0 || height <= 0 || arraySize <= 0) return false;
+
+ VkFormat format = GetDDGIVolumeTextureFormat(EDDGIVolumeTextureType::Variability, desc.probeVariabilityFormat);
+ VkImageUsageFlags usage = VK_IMAGE_USAGE_STORAGE_BIT | VK_IMAGE_USAGE_SAMPLED_BIT;
+
+ // Create the texture, allocate memory, and bind the memory
+ bool result = CreateTexture(width, height, arraySize, format, usage, &m_probeVariability, &m_probeVariabilityMemory, &m_probeVariabilityView);
+ if (!result) return false;
+ #ifdef RTXGI_GFX_NAME_OBJECTS
+ std::string name = "DDGIVolume[" + std::to_string(desc.index) + "], Probe Variability";
+ std::string memory = name + " Memory";
+ std::string view = name + " View";
+ SetObjectName(m_device, reinterpret_cast<uint64_t>(m_probeVariability), name.c_str(), VK_OBJECT_TYPE_IMAGE);
+ SetObjectName(m_device, reinterpret_cast<uint64_t>(m_probeVariabilityMemory), memory.c_str(), VK_OBJECT_TYPE_DEVICE_MEMORY);
+ SetObjectName(m_device, reinterpret_cast<uint64_t>(m_probeVariabilityView), view.c_str(), VK_OBJECT_TYPE_IMAGE_VIEW);
+ #endif
+ return true;
+ }
+
+ bool DDGIVolume::CreateProbeVariabilityAverage(const DDGIVolumeDesc& desc)
+ {
+ vkDestroyImage(m_device, m_probeVariabilityAverage, nullptr);
+ vkDestroyImageView(m_device, m_probeVariabilityAverageView, nullptr);
+ vkFreeMemory(m_device, m_probeVariabilityAverageMemory, nullptr);
+
+ uint32_t width = 0;
+ uint32_t height = 0;
+ uint32_t arraySize = 0;
+ GetDDGIVolumeTextureDimensions(desc, EDDGIVolumeTextureType::VariabilityAverage, width, height, arraySize);
+
+ // Check for problems
+ if (width <= 0 || height <= 0 || arraySize <= 0) return false;
+
+ VkFormat format = GetDDGIVolumeTextureFormat(EDDGIVolumeTextureType::VariabilityAverage, desc.probeVariabilityFormat);
+ VkImageUsageFlags usage = VK_IMAGE_USAGE_STORAGE_BIT | VK_IMAGE_USAGE_SAMPLED_BIT | VK_IMAGE_USAGE_TRANSFER_SRC_BIT;
+
+ // Create the texture, allocate memory, and bind the memory
+ bool result = CreateTexture(width, height, arraySize, format, usage, &m_probeVariabilityAverage, &m_probeVariabilityAverageMemory, &m_probeVariabilityAverageView);
+ if (!result) return false;
+ #ifdef RTXGI_GFX_NAME_OBJECTS
+ std::string name = "DDGIVolume[" + std::to_string(desc.index) + "], Probe Variability Average";
+ std::string memory = name + " Memory";
+ std::string view = name + " View";
+ SetObjectName(m_device, reinterpret_cast<uint64_t>(m_probeVariabilityAverage), name.c_str(), VK_OBJECT_TYPE_IMAGE);
+ SetObjectName(m_device, reinterpret_cast<uint64_t>(m_probeVariabilityAverageMemory), memory.c_str(), VK_OBJECT_TYPE_DEVICE_MEMORY);
+ SetObjectName(m_device, reinterpret_cast<uint64_t>(m_probeVariabilityAverageView), view.c_str(), VK_OBJECT_TYPE_IMAGE_VIEW);
+ #endif
+
+ // Create the readback buffer
+ vkDestroyBuffer(m_device, m_probeVariabilityReadback, nullptr);
+
+ // The readback data is always in the "full" format (R32G32_SFLOAT)
+ format = GetDDGIVolumeTextureFormat(EDDGIVolumeTextureType::VariabilityAverage, desc.probeVariabilityFormat);
+ {
+ VkBufferCreateInfo bufferCreateInfo = {};
+ bufferCreateInfo.sType = VK_STRUCTURE_TYPE_BUFFER_CREATE_INFO;
+ bufferCreateInfo.size = sizeof(float) * 2;
+ bufferCreateInfo.usage = VK_BUFFER_USAGE_TRANSFER_DST_BIT;
+
+ // Create the buffer
+ VkResult result = vkCreateBuffer(m_device, &bufferCreateInfo, nullptr, &m_probeVariabilityReadback);
+ if (VKFAILED(result)) return false;
+
+ // Get memory requirements
+ VkMemoryRequirements reqs;
+ vkGetBufferMemoryRequirements(m_device, m_probeVariabilityReadback, &reqs);
+
+ // Allocate memory
+ VkMemoryAllocateFlags flags = 0;
+ VkMemoryPropertyFlags props = VK_MEMORY_PROPERTY_HOST_VISIBLE_BIT;
+ if (!AllocateMemory(reqs, props, flags, &m_probeVariabilityReadbackMemory)) return false;
+
+ vkBindBufferMemory(m_device, m_probeVariabilityReadback, m_probeVariabilityReadbackMemory, 0);
+ }
+ #ifdef RTXGI_GFX_NAME_OBJECTS
+ name = "DDGIVolume[" + std::to_string(desc.index) + "], Probe Variability Readback";
+ memory = name + " Memory";
+ SetObjectName(m_device, reinterpret_cast<uint64_t>(m_probeVariabilityReadback), name.c_str(), VK_OBJECT_TYPE_BUFFER);
+ SetObjectName(m_device, reinterpret_cast<uint64_t>(m_probeVariabilityReadbackMemory), memory.c_str(), VK_OBJECT_TYPE_DEVICE_MEMORY);
+ #endif
+
+ return true;
+ }
+
#endif // RTXGI_MANAGED_RESOURCES
} // namespace vulkan
} // namespace rtxgi
diff --git a/samples/test-harness/CMakeLists.txt b/samples/test-harness/CMakeLists.txt
index 9eaeb5f..b713048 100644
--- a/samples/test-harness/CMakeLists.txt
+++ b/samples/test-harness/CMakeLists.txt
@@ -225,6 +225,7 @@ set_source_files_properties(${TEST_HARNESS_DDGIVIS_SHADER_SOURCE} PROPERTIES VS_
# Test Harness options
option(RTXGISAMPLES_GFX_NAME_OBJECTS "Enable naming of graphics objects (for debugging)" ON)
option(RTXGISAMPLES_GFX_PERF_MARKERS "Enable GPU performance markers" ON)
+option(RTXGISAMPLES_GFX_NVAPI "Enable NVAPI" ON)
# Test Harness bindless options
set(RTXGISAMPLES_TEST_HARNESS_BINDLESS_TYPE "Resource Arrays" CACHE STRING "The bindless resource implementation to use")
@@ -310,6 +311,7 @@ set(GLFW_INCLUDE_PATH "${ROOT_DIR}/thirdparty/glfw/include")
set(IMGUI_INCLUDE_PATH "${ROOT_DIR}/thirdparty/imgui")
set(IMGUI_BACKENDS_INCLUDE_PATH "${ROOT_DIR}/thirdparty/imgui/backends")
set(TINYGLTF_INCLUDE_PATH "${ROOT_DIR}/thirdparty/tinygltf")
+set(NVAPI_INCLUDE_PATH "${ROOT_DIR}/thirdparty/nvapi")
# ---- WINDOWS / D3D12 --------------------------------------------------------------------------------------
@@ -355,11 +357,21 @@ if(RTXGI_API_D3D12_ENABLE)
${GLFW_INCLUDE_PATH}
${IMGUI_INCLUDE_PATH}
${IMGUI_BACKENDS_INCLUDE_PATH}
+ ${NVAPI_INCLUDE_PATH}
${TINYGLTF_INCLUDE_PATH}
)
- # Add statically linked libs
- target_link_libraries(${TARGET_EXE} RTXGI-D3D12 glfw d3d11 d3d12 dxgi)
+ if(RTXGISAMPLES_GFX_NVAPI)
+ target_compile_definitions(${TARGET_EXE} PRIVATE GFX_NVAPI=1)
+
+ # Add statically linked libs
+ target_link_libraries(${TARGET_EXE} RTXGI-D3D12 glfw d3d11 d3d12 dxgi ${ROOT_DIR}/thirdparty/nvapi/amd64/nvapi64.lib)
+ else()
+ target_compile_definitions(${TARGET_EXE} PRIVATE GFX_NVAPI=0)
+
+ # Add statically linked libs
+ target_link_libraries(${TARGET_EXE} RTXGI-D3D12 glfw d3d11 d3d12 dxgi)
+ endif()
# Add common compiler definitions for exposed Test Harness options
SetupOptions(${TARGET_EXE})
diff --git a/samples/test-harness/config/cornell.ini b/samples/test-harness/config/cornell.ini
index c98d745..ea311da 100644
--- a/samples/test-harness/config/cornell.ini
+++ b/samples/test-harness/config/cornell.ini
@@ -13,7 +13,7 @@ app.rtxgiSDK=../../../rtxgi-sdk/
app.title=RTXGI Test Harness
# scene
-scene.name=Cornell Box
+scene.name=Cornell-Box
scene.path=data/gltf/cornell/
scene.file=cornell.glb
scene.screenshotPath=cornell
@@ -76,11 +76,14 @@ ddgi.volume.0.name=Cornell-Box
ddgi.volume.0.probeRelocation.enabled=1
ddgi.volume.0.probeRelocation.minFrontfaceDistance=0.1
ddgi.volume.0.probeClassification.enabled=1
+ddgi.volume.0.probeVariability.enabled=0
+ddgi.volume.0.probeVariability.threshold=0.03
ddgi.volume.0.infiniteScrolling.enabled=1
-ddgi.volume.0.textures.rayData.format=3 # EDDGIVolumeTextureFormat::F32x2
+ddgi.volume.0.textures.rayData.format=5 # EDDGIVolumeTextureFormat::F32x2
ddgi.volume.0.textures.irradiance.format=0 # EDDGIVolumeTextureFormat::U32
-ddgi.volume.0.textures.distance.format=1 # EDDGIVolumeTextureFormat::F16x2
-ddgi.volume.0.textures.data.format=2 # EDDGIVolumeTextureFormat::F16x4
+ddgi.volume.0.textures.distance.format=2 # EDDGIVolumeTextureFormat::F16x2
+ddgi.volume.0.textures.data.format=3 # EDDGIVolumeTextureFormat::F16x4
+ddgi.volume.0.textures.variability.format=1 # EDDGIVolumeTextureFormat::F16
ddgi.volume.0.origin=0.0 1.0 0.0
ddgi.volume.0.probeCounts=9 9 9
ddgi.volume.0.probeSpacing=0.3 0.3 0.3
@@ -101,6 +104,7 @@ ddgi.volume.0.vis.texture.irradianceScale=2
ddgi.volume.0.vis.texture.distanceScale=1
ddgi.volume.0.vis.texture.probeDataScale=10
ddgi.volume.0.vis.texture.rayDataScale=0.56
+ddgi.volume.0.vis.texture.probeVariabilityScale=2.667
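The renumbered format indices in these configs (rayData 3 to 5, distance 1 to 2, data 2 to 3) follow from new single-channel formats being inserted into the enum. The ordering implied by the `.ini` comments is sketched below; `F32 = 4` is inferred from the gap and should be verified against the SDK header, as this is not the header itself:

```cpp
// Ordering implied by the .ini comments in this change (U32=0, F16=1,
// F16x2=2, F16x4=3, F32x2=5); F32=4 is inferred, not confirmed.
enum class EDDGIVolumeTextureFormat : int
{
    U32 = 0, F16, F16x2, F16x4, F32, F32x2, F32x4
};
```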
# ray traced ambient occlusion
rtao.enable=1
diff --git a/samples/test-harness/config/furnace.ini b/samples/test-harness/config/furnace.ini
index fde55c9..31e8f40 100644
--- a/samples/test-harness/config/furnace.ini
+++ b/samples/test-harness/config/furnace.ini
@@ -48,11 +48,14 @@ ddgi.volume.0.name=Scene Volume
ddgi.volume.0.probeRelocation.enabled=0
ddgi.volume.0.probeRelocation.minFrontfaceDistance=0.1
ddgi.volume.0.probeClassification.enabled=0
+ddgi.volume.0.probeVariability.enabled=0
+ddgi.volume.0.probeVariability.threshold=0.01
ddgi.volume.0.infiniteScrolling.enabled=0
-ddgi.volume.0.textures.rayData.format=3 # EDDGIVolumeTextureFormat::F32x2
+ddgi.volume.0.textures.rayData.format=5 # EDDGIVolumeTextureFormat::F32x2
ddgi.volume.0.textures.irradiance.format=0 # EDDGIVolumeTextureFormat::U32
-ddgi.volume.0.textures.distance.format=1 # EDDGIVolumeTextureFormat::F16x2
-ddgi.volume.0.textures.data.format=2 # EDDGIVolumeTextureFormat::F16x4
+ddgi.volume.0.textures.distance.format=2 # EDDGIVolumeTextureFormat::F16x2
+ddgi.volume.0.textures.data.format=3 # EDDGIVolumeTextureFormat::F16x4
+ddgi.volume.0.textures.variability.format=1 # EDDGIVolumeTextureFormat::F16
ddgi.volume.0.origin=0.0 0.5 0.0
ddgi.volume.0.probeCounts=8 3 8
ddgi.volume.0.probeSpacing=2 1 2
@@ -73,6 +76,7 @@ ddgi.volume.0.vis.texture.irradianceScale=2.0
ddgi.volume.0.vis.texture.distanceScale=1.0
ddgi.volume.0.vis.texture.probeDataScale=16
ddgi.volume.0.vis.texture.rayDataScale=0.5
+ddgi.volume.0.vis.texture.probeVariabilityScale=2.667
# ray traced ambient occlusion
rtao.enable=1
diff --git a/samples/test-harness/config/multi-cornell.ini b/samples/test-harness/config/multi-cornell.ini
index eaeec58..c03561b 100644
--- a/samples/test-harness/config/multi-cornell.ini
+++ b/samples/test-harness/config/multi-cornell.ini
@@ -52,11 +52,14 @@ ddgi.volume.0.name=Cornell-Box-1
ddgi.volume.0.probeRelocation.enabled=1
ddgi.volume.0.probeRelocation.minFrontfaceDistance=0.1
ddgi.volume.0.probeClassification.enabled=1
+ddgi.volume.0.probeVariability.enabled=0
+ddgi.volume.0.probeVariability.threshold=0.04
ddgi.volume.0.infiniteScrolling.enabled=0
-ddgi.volume.0.textures.rayData.format=3 # EDDGIVolumeTextureFormat::F32x2
+ddgi.volume.0.textures.rayData.format=5 # EDDGIVolumeTextureFormat::F32x2
ddgi.volume.0.textures.irradiance.format=0 # EDDGIVolumeTextureFormat::U32
-ddgi.volume.0.textures.distance.format=1 # EDDGIVolumeTextureFormat::F16x2
-ddgi.volume.0.textures.data.format=2 # EDDGIVolumeTextureFormat::F32x2
+ddgi.volume.0.textures.distance.format=2 # EDDGIVolumeTextureFormat::F16x2
+ddgi.volume.0.textures.data.format=3 # EDDGIVolumeTextureFormat::F16x4
+ddgi.volume.0.textures.variability.format=1 # EDDGIVolumeTextureFormat::F16
ddgi.volume.0.origin=0.0 2.701 0.0
ddgi.volume.0.probeCounts=9 9 9
ddgi.volume.0.probeSpacing=0.75 0.75 0.75
@@ -77,16 +80,20 @@ ddgi.volume.0.vis.texture.irradianceScale=2.1
ddgi.volume.0.vis.texture.distanceScale=1.05
ddgi.volume.0.vis.texture.probeDataScale=16.81
ddgi.volume.0.vis.texture.rayDataScale=0.59
+ddgi.volume.0.vis.texture.probeVariabilityScale=2.799
ddgi.volume.1.name=Cornell-Box-2
ddgi.volume.1.probeRelocation.enabled=1
ddgi.volume.1.probeRelocation.minFrontfaceDistance=0.1
ddgi.volume.1.probeClassification.enabled=1
+ddgi.volume.1.probeVariability.enabled=0
+ddgi.volume.1.probeVariability.threshold=0.04
ddgi.volume.1.infiniteScrolling.enabled=0
-ddgi.volume.1.textures.rayData.format=3
-ddgi.volume.1.textures.irradiance.format=0
-ddgi.volume.1.textures.distance.format=1
-ddgi.volume.1.textures.data.format=2
+ddgi.volume.1.textures.rayData.format=5 # EDDGIVolumeTextureFormat::F32x2
+ddgi.volume.1.textures.irradiance.format=0 # EDDGIVolumeTextureFormat::U32
+ddgi.volume.1.textures.distance.format=2 # EDDGIVolumeTextureFormat::F16x2
+ddgi.volume.1.textures.data.format=3 # EDDGIVolumeTextureFormat::F16x4
+ddgi.volume.1.textures.variability.format=1 # EDDGIVolumeTextureFormat::F16
ddgi.volume.1.origin=11.662 2.581 -6.0
ddgi.volume.1.rotation=0 43 0
ddgi.volume.1.probeCounts=9 9 9
@@ -108,16 +115,20 @@ ddgi.volume.1.vis.texture.irradianceScale=2.1
ddgi.volume.1.vis.texture.distanceScale=1.05
ddgi.volume.1.vis.texture.probeDataScale=16.81
ddgi.volume.1.vis.texture.rayDataScale=0.59
+ddgi.volume.1.vis.texture.probeVariabilityScale=2.799
ddgi.volume.2.name=Cornell-Box-3
ddgi.volume.2.probeRelocation.enabled=1
ddgi.volume.2.probeRelocation.minFrontfaceDistance=0.1
ddgi.volume.2.probeClassification.enabled=1
+ddgi.volume.2.probeVariability.enabled=0
+ddgi.volume.2.probeVariability.threshold=0.04
ddgi.volume.2.infiniteScrolling.enabled=0
-ddgi.volume.2.textures.rayData.format=3
-ddgi.volume.2.textures.irradiance.format=0
-ddgi.volume.2.textures.distance.format=1
-ddgi.volume.2.textures.data.format=2
+ddgi.volume.2.textures.rayData.format=5 # EDDGIVolumeTextureFormat::F32x2
+ddgi.volume.2.textures.irradiance.format=0 # EDDGIVolumeTextureFormat::U32
+ddgi.volume.2.textures.distance.format=2 # EDDGIVolumeTextureFormat::F16x2
+ddgi.volume.2.textures.data.format=3 # EDDGIVolumeTextureFormat::F16x4
+ddgi.volume.2.textures.variability.format=1 # EDDGIVolumeTextureFormat::F16
ddgi.volume.2.origin=-10.379 2.701 -7.488
ddgi.volume.2.rotation=0 64 0
ddgi.volume.2.probeCounts=9 9 9
@@ -139,16 +150,20 @@ ddgi.volume.2.vis.texture.irradianceScale=2.1
ddgi.volume.2.vis.texture.distanceScale=1.05
ddgi.volume.2.vis.texture.probeDataScale=16.81
ddgi.volume.2.vis.texture.rayDataScale=0.59
+ddgi.volume.2.vis.texture.probeVariabilityScale=2.799
ddgi.volume.3.name=Cornell-Box-4
ddgi.volume.3.probeRelocation.enabled=1
ddgi.volume.3.probeRelocation.minFrontfaceDistance=0.1
ddgi.volume.3.probeClassification.enabled=1
+ddgi.volume.3.probeVariability.enabled=0
+ddgi.volume.3.probeVariability.threshold=0.04
ddgi.volume.3.infiniteScrolling.enabled=0
-ddgi.volume.3.textures.rayData.format=3
-ddgi.volume.3.textures.irradiance.format=0
-ddgi.volume.3.textures.distance.format=1
-ddgi.volume.3.textures.data.format=2
+ddgi.volume.3.textures.rayData.format=5 # EDDGIVolumeTextureFormat::F32x2
+ddgi.volume.3.textures.irradiance.format=0 # EDDGIVolumeTextureFormat::U32
+ddgi.volume.3.textures.distance.format=2 # EDDGIVolumeTextureFormat::F16x2
+ddgi.volume.3.textures.data.format=3 # EDDGIVolumeTextureFormat::F16x4
+ddgi.volume.3.textures.variability.format=1 # EDDGIVolumeTextureFormat::F16
ddgi.volume.3.origin=0.565 2.629 -12.5
ddgi.volume.3.rotation=17 21 26
ddgi.volume.3.probeCounts=9 9 9
@@ -170,16 +185,20 @@ ddgi.volume.3.vis.texture.irradianceScale=2.1
ddgi.volume.3.vis.texture.distanceScale=1.05
ddgi.volume.3.vis.texture.probeDataScale=16.81
ddgi.volume.3.vis.texture.rayDataScale=0.59
+ddgi.volume.3.vis.texture.probeVariabilityScale=2.799
ddgi.volume.4.name=Cornell-Box-5
ddgi.volume.4.probeRelocation.enabled=1
ddgi.volume.4.probeRelocation.minFrontfaceDistance=0.1
ddgi.volume.4.probeClassification.enabled=1
+ddgi.volume.4.probeVariability.enabled=0
+ddgi.volume.4.probeVariability.threshold=0.04
ddgi.volume.4.infiniteScrolling.enabled=0
-ddgi.volume.4.textures.rayData.format=3
-ddgi.volume.4.textures.irradiance.format=0
-ddgi.volume.4.textures.distance.format=1
-ddgi.volume.4.textures.data.format=2
+ddgi.volume.4.textures.rayData.format=5 # EDDGIVolumeTextureFormat::F32x2
+ddgi.volume.4.textures.irradiance.format=0 # EDDGIVolumeTextureFormat::U32
+ddgi.volume.4.textures.distance.format=2 # EDDGIVolumeTextureFormat::F16x2
+ddgi.volume.4.textures.data.format=3 # EDDGIVolumeTextureFormat::F16x4
+ddgi.volume.4.textures.variability.format=1 # EDDGIVolumeTextureFormat::F16
ddgi.volume.4.origin=-4.456 6.386 -19.6
ddgi.volume.4.rotation=-15 22 -22
ddgi.volume.4.probeCounts=9 9 9
@@ -201,16 +220,20 @@ ddgi.volume.4.vis.texture.irradianceScale=2.1
ddgi.volume.4.vis.texture.distanceScale=1.05
ddgi.volume.4.vis.texture.probeDataScale=16.81
ddgi.volume.4.vis.texture.rayDataScale=0.59
+ddgi.volume.4.vis.texture.probeVariabilityScale=2.799
ddgi.volume.5.name=Cornell-Box-6
ddgi.volume.5.probeRelocation.enabled=1
ddgi.volume.5.probeRelocation.minFrontfaceDistance=0.1
ddgi.volume.5.probeClassification.enabled=1
+ddgi.volume.5.probeVariability.enabled=0
+ddgi.volume.5.probeVariability.threshold=0.04
ddgi.volume.5.infiniteScrolling.enabled=0
-ddgi.volume.5.textures.rayData.format=3
-ddgi.volume.5.textures.irradiance.format=0
-ddgi.volume.5.textures.distance.format=1
-ddgi.volume.5.textures.data.format=2
+ddgi.volume.5.textures.rayData.format=5 # EDDGIVolumeTextureFormat::F32x2
+ddgi.volume.5.textures.irradiance.format=0 # EDDGIVolumeTextureFormat::U32
+ddgi.volume.5.textures.distance.format=2 # EDDGIVolumeTextureFormat::F16x2
+ddgi.volume.5.textures.data.format=3 # EDDGIVolumeTextureFormat::F16x4
+ddgi.volume.5.textures.variability.format=1 # EDDGIVolumeTextureFormat::F16
ddgi.volume.5.origin=11.888 2.732 -19.5
ddgi.volume.5.rotation=33 36.5 -3
ddgi.volume.5.probeCounts=9 9 9
@@ -232,6 +255,7 @@ ddgi.volume.5.vis.texture.irradianceScale=2.1
ddgi.volume.5.vis.texture.distanceScale=1.05
ddgi.volume.5.vis.texture.probeDataScale=16.81
ddgi.volume.5.vis.texture.rayDataScale=0.59
+ddgi.volume.5.vis.texture.probeVariabilityScale=2.799
# ray traced ambient occlusion
rtao.enable=1
diff --git a/samples/test-harness/config/sponza.ini b/samples/test-harness/config/sponza.ini
index a535986..d1431f0 100644
--- a/samples/test-harness/config/sponza.ini
+++ b/samples/test-harness/config/sponza.ini
@@ -25,7 +25,7 @@ scene.lights.0.name=Sun
scene.lights.0.type=0
scene.lights.0.direction=0.0 -1.0 0.3
scene.lights.0.color=1.0 1.0 1.0
-scene.lights.0.power=1.45
+scene.lights.0.power=3.14
# scene cameras
scene.cameras.0.name=Upper Floor
@@ -59,11 +59,14 @@ ddgi.volume.0.name=Scene-Volume
ddgi.volume.0.probeRelocation.enabled=1
ddgi.volume.0.probeRelocation.minFrontfaceDistance=0.3 # should be at least as large as probeViewBias!
ddgi.volume.0.probeClassification.enabled=1
+ddgi.volume.0.probeVariability.enabled=1
+ddgi.volume.0.probeVariability.threshold=0.4
ddgi.volume.0.infiniteScrolling.enabled=0
-ddgi.volume.0.textures.rayData.format=3 # EDDGIVolumeTextureFormat::F32x2
-ddgi.volume.0.textures.irradiance.format=0 # EDDGIVolumeTextureFormat::U32
-ddgi.volume.0.textures.distance.format=1 # EDDGIVolumeTextureFormat::F16x2
-ddgi.volume.0.textures.data.format=2 # EDDGIVolumeTextureFormat::F16x4
+ddgi.volume.0.textures.rayData.format=5 # EDDGIVolumeTextureFormat::F32x2
+ddgi.volume.0.textures.irradiance.format=0 # EDDGIVolumeTextureFormat::U32
+ddgi.volume.0.textures.distance.format=2 # EDDGIVolumeTextureFormat::F16x2
+ddgi.volume.0.textures.data.format=3 # EDDGIVolumeTextureFormat::F16x4
+ddgi.volume.0.textures.variability.format=1 # EDDGIVolumeTextureFormat::F16
ddgi.volume.0.origin=-0.4 5.4 -0.25
ddgi.volume.0.probeCounts=22 22 22
ddgi.volume.0.probeSpacing=1.02 0.5 0.45
@@ -84,6 +87,7 @@ ddgi.volume.0.vis.texture.irradianceScale=0.36
ddgi.volume.0.vis.texture.distanceScale=0.18
ddgi.volume.0.vis.texture.probeDataScale=2.88
ddgi.volume.0.vis.texture.rayDataScale=0.247
+ddgi.volume.0.vis.texture.probeVariabilityScale=0.479
# ray traced ambient occlusion
rtao.enable=1
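
As context for the `probeVariability.threshold` keys added above (e.g. `0.4` for sponza.ini): per the change log, the feature tracks the coefficient of variation of the volume, i.e. standard deviation over mean. The sketch below is an illustrative CPU-side model of that convergence test, not SDK code — the SDK computes this on the GPU via a reduction over probe texels.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Illustrative sketch (not SDK code): coefficient of variation = stddev / mean.
// When it settles below the configured probeVariability.threshold, the volume
// is treated as converged and probe ray tracing can be skipped until the
// light field is invalidated.
double CoefficientOfVariation(const std::vector<double>& samples)
{
    double mean = 0.0;
    for (double s : samples) mean += s;
    mean /= double(samples.size());

    double variance = 0.0;
    for (double s : samples) variance += (s - mean) * (s - mean);
    variance /= double(samples.size());

    return std::sqrt(variance) / mean;
}

bool IsVolumeConverged(const std::vector<double>& irradianceSamples, double threshold)
{
    return CoefficientOfVariation(irradianceSamples) < threshold;
}
```

A fully settled set of samples has a coefficient of variation of zero and passes any positive threshold; noisy samples do not.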
diff --git a/samples/test-harness/config/tunnel.ini b/samples/test-harness/config/tunnel.ini
index e0b4b5c..dc71638 100644
--- a/samples/test-harness/config/tunnel.ini
+++ b/samples/test-harness/config/tunnel.ini
@@ -52,11 +52,14 @@ ddgi.volume.0.name=Infinite Scrolling Volume
ddgi.volume.0.probeRelocation.enabled=1
ddgi.volume.0.probeRelocation.minFrontfaceDistance=2.2
ddgi.volume.0.probeClassification.enabled=1
+ddgi.volume.0.probeVariability.enabled=1
+ddgi.volume.0.probeVariability.threshold=0.02
ddgi.volume.0.infiniteScrolling.enabled=1
-ddgi.volume.0.textures.rayData.format=4 # EDDGIVolumeTextureFormat::F32x2
-ddgi.volume.0.textures.irradiance.format=4 # EDDGIVolumeTextureFormat::U32
-ddgi.volume.0.textures.distance.format=1 # EDDGIVolumeTextureFormat::F16x2
-ddgi.volume.0.textures.data.format=2 # EDDGIVolumeTextureFormat::F16x4
+ddgi.volume.0.textures.rayData.format=6 # EDDGIVolumeTextureFormat::F32x4
+ddgi.volume.0.textures.irradiance.format=6 # EDDGIVolumeTextureFormat::F32x4
+ddgi.volume.0.textures.distance.format=2 # EDDGIVolumeTextureFormat::F16x2
+ddgi.volume.0.textures.data.format=3 # EDDGIVolumeTextureFormat::F16x4
+ddgi.volume.0.textures.variability.format=1 # EDDGIVolumeTextureFormat::F16
ddgi.volume.0.origin=128.129 11.62 -13.673
ddgi.volume.0.probeCounts=24 9 12
ddgi.volume.0.probeSpacing=5 2.5 5
@@ -77,6 +80,7 @@ ddgi.volume.0.vis.texture.irradianceScale=0.8
ddgi.volume.0.vis.texture.distanceScale=0.4
ddgi.volume.0.vis.texture.probeDataScale=6.4
ddgi.volume.0.vis.texture.rayDataScale=0.2
+ddgi.volume.0.vis.texture.probeVariabilityScale=1.066
# ray traced ambient occlusion
rtao.enable=1
diff --git a/samples/test-harness/config/two-rooms.ini b/samples/test-harness/config/two-rooms.ini
index 027d90b..0a8cd52 100644
--- a/samples/test-harness/config/two-rooms.ini
+++ b/samples/test-harness/config/two-rooms.ini
@@ -66,11 +66,14 @@ ddgi.volume.0.name=Rooms Volume
ddgi.volume.0.probeRelocation.enabled=1
ddgi.volume.0.probeRelocation.minFrontfaceDistance=0.1
ddgi.volume.0.probeClassification.enabled=1
+ddgi.volume.0.probeVariability.enabled=1
+ddgi.volume.0.probeVariability.threshold=0.035
ddgi.volume.0.infiniteScrolling.enabled=1
-ddgi.volume.0.textures.rayData.format=3 # EDDGIVolumeTextureFormat::F32x2
-ddgi.volume.0.textures.irradiance.format=0 # EDDGIVolumeTextureFormat::U32
-ddgi.volume.0.textures.distance.format=1 # EDDGIVolumeTextureFormat::F16x2
-ddgi.volume.0.textures.data.format=2 # EDDGIVolumeTextureFormat::F16x4
+ddgi.volume.0.textures.rayData.format=5 # EDDGIVolumeTextureFormat::F32x2
+ddgi.volume.0.textures.irradiance.format=0 # EDDGIVolumeTextureFormat::U32
+ddgi.volume.0.textures.distance.format=2 # EDDGIVolumeTextureFormat::F16x2
+ddgi.volume.0.textures.data.format=3 # EDDGIVolumeTextureFormat::F16x4
+ddgi.volume.0.textures.variability.format=1 # EDDGIVolumeTextureFormat::F16
ddgi.volume.0.origin=0.0 24.0 0.0
ddgi.volume.0.probeCounts=32 6 32
ddgi.volume.0.probeSpacing=11 11 11
@@ -91,6 +94,7 @@ ddgi.volume.0.vis.texture.irradianceScale=0.7
ddgi.volume.0.vis.texture.distanceScale=0.35
ddgi.volume.0.vis.texture.probeDataScale=3
ddgi.volume.0.vis.texture.rayDataScale=0.49
+ddgi.volume.0.vis.texture.probeVariabilityScale=0.933
# ray traced ambient occlusion
rtao.enable=1
diff --git a/samples/test-harness/include/Benchmark.h b/samples/test-harness/include/Benchmark.h
index 3afe495..ea0f254 100644
--- a/samples/test-harness/include/Benchmark.h
+++ b/samples/test-harness/include/Benchmark.h
@@ -27,5 +27,5 @@ namespace Benchmark
std::stringstream gpuTimingCsv;
};
void StartBenchmark(BenchmarkRun& benchmarkRun, Instrumentation::Performance& perf, Configs::Config& config, Graphics::Globals& gfx);
- void UpdateBenchmark(BenchmarkRun& benchmarkRun, Instrumentation::Performance& perf, Configs::Config& config, Graphics::Globals& gfx, std::ofstream& log);
-}
\ No newline at end of file
+ bool UpdateBenchmark(BenchmarkRun& benchmarkRun, Instrumentation::Performance& perf, Configs::Config& config, Graphics::Globals& gfx, std::ofstream& log);
+}
diff --git a/samples/test-harness/include/Configs.h b/samples/test-harness/include/Configs.h
index 0bea16e..84f319a 100644
--- a/samples/test-harness/include/Configs.h
+++ b/samples/test-harness/include/Configs.h
@@ -28,6 +28,7 @@ namespace Configs
rtxgi::EDDGIVolumeTextureFormat irradianceFormat;
rtxgi::EDDGIVolumeTextureFormat distanceFormat;
rtxgi::EDDGIVolumeTextureFormat dataFormat;
+ rtxgi::EDDGIVolumeTextureFormat variabilityFormat;
};
struct DDGIVolume
@@ -41,7 +42,9 @@ namespace Configs
bool clearProbes = false;
bool probeRelocationEnabled = false;
bool probeClassificationEnabled = false;
+ bool probeVariabilityEnabled = false;
bool infiniteScrollingEnabled = false;
+ bool clearProbeVariability = false;
DirectX::XMFLOAT3 origin = { 0.f, 0.f, 0.f };
DirectX::XMFLOAT3 eulerAngles = { 0.f, 0.f, 0.f };
@@ -59,6 +62,7 @@ namespace Configs
float probeViewBias = 0.f;
float probeIrradianceThreshold = 0.f;
float probeBrightnessThreshold = 0.f;
+ float probeVariabilityThreshold = 0.f;
float probeMinFrontfaceDistance = 0.f;
@@ -72,6 +76,7 @@ namespace Configs
float probeIrradianceScale = 1.f;
float probeDistanceScale = 1.f;
float probeDataScale = 1.f;
+ float probeVariabilityScale = 1.f;
rtxgi::EDDGIVolumeProbeVisType probeVisType = rtxgi::EDDGIVolumeProbeVisType::Default;
};
@@ -84,6 +89,7 @@ namespace Configs
bool showTextures = false;
bool showIndirect = false;
bool insertPerfMarkers = true;
+ bool shaderExecutionReordering = false;
uint32_t selectedVolume = 0;
std::vector volumes;
};
@@ -138,6 +144,7 @@ namespace Configs
{
bool enabled = false;
bool antialiasing = false;
+ bool shaderExecutionReordering = false;
bool reload = false;
float rayNormalBias = 0.001f;
float rayViewBias = 0.001f;
diff --git a/samples/test-harness/include/Direct3D12.h b/samples/test-harness/include/Direct3D12.h
index 86d9774..32b10bf 100644
--- a/samples/test-harness/include/Direct3D12.h
+++ b/samples/test-harness/include/Direct3D12.h
@@ -89,6 +89,11 @@ namespace Graphics
}
};
+ struct Features
+ {
+ UINT waveLaneCount;
+ };
+
struct Globals
{
IDXGIFactory7* factory = nullptr;
@@ -114,6 +119,8 @@ namespace Graphics
Shaders::ShaderCompiler shaderCompiler;
+ Features features = {};
+
// For Windowed->Fullscreen->Windowed transitions
int x = 0;
int y = 0;
@@ -128,6 +135,7 @@ namespace Graphics
bool fullscreenChanged = false;
bool allowTearing = false;
+ bool supportsShaderExecutionReordering = false;
};
struct RenderTargets
@@ -179,9 +187,13 @@ namespace Graphics
UINT8* materialsSTBPtr = nullptr;
// ByteAddress Buffers
- ID3D12Resource* materialIndicesRB = nullptr;
- ID3D12Resource* materialIndicesRBUpload = nullptr;
- UINT8* materialIndicesRBPtr = nullptr;
+ ID3D12Resource* meshOffsetsRB = nullptr;
+ ID3D12Resource* meshOffsetsRBUpload = nullptr;
+ UINT8* meshOffsetsRBPtr = nullptr;
+
+ ID3D12Resource* geometryDataRB = nullptr;
+ ID3D12Resource* geometryDataRBUpload = nullptr;
+ UINT8* geometryDataRBPtr = nullptr;
// Shared Render Targets
RenderTargets rt;
@@ -209,8 +221,8 @@ namespace Graphics
ID3D12RootSignature* CreateRootSignature(Globals& d3d, const D3D12_ROOT_SIGNATURE_DESC& desc);
bool CreateBuffer(Globals& d3d, const BufferDesc& info, ID3D12Resource** ppResource);
- bool CreateVertexBuffer(Globals& d3d, const Scenes::MeshPrimitive& mesh, ID3D12Resource** device, ID3D12Resource** upload, D3D12_VERTEX_BUFFER_VIEW& view);
- bool CreateIndexBuffer(Globals& d3d, const Scenes::MeshPrimitive& mesh, ID3D12Resource** device, ID3D12Resource** upload, D3D12_INDEX_BUFFER_VIEW& view);
+ bool CreateVertexBuffer(Globals& d3d, const Scenes::Mesh& mesh, ID3D12Resource** device, ID3D12Resource** upload, D3D12_VERTEX_BUFFER_VIEW& view);
+ bool CreateIndexBuffer(Globals& d3d, const Scenes::Mesh& mesh, ID3D12Resource** device, ID3D12Resource** upload, D3D12_INDEX_BUFFER_VIEW& view);
bool CreateTexture(Globals& d3d, const TextureDesc& info, ID3D12Resource** resource);
bool CreateRasterPSO(
@@ -270,33 +282,34 @@ namespace Graphics
// Texture2DArray UAV
const int UAV_TEX2DARRAY_START = UAV_DDGI_OUTPUT + 1; // 16: RWTexture2DArray UAV Start
- const int UAV_DDGI_VOLUME_TEX2DARRAY = UAV_TEX2DARRAY_START; // 16: 24 UAV, 4 for each DDGIVolume (RayData, Irradiance, Distance, Probe Data)
+ const int UAV_DDGI_VOLUME_TEX2DARRAY = UAV_TEX2DARRAY_START; // 16: 36 UAV, 6 for each DDGIVolume (RayData, Irradiance, Distance, Probe Data, Variability, VariabilityAverage)
- // Shader Resource Views // 40: SRV Start
+ // Shader Resource Views // 52: SRV Start
const int SRV_START = UAV_DDGI_VOLUME_TEX2DARRAY + (rtxgi::GetDDGIVolumeNumTex2DArrayDescriptors() * MAX_DDGIVOLUMES);
// RaytracingAccelerationStructure SRV
- const int SRV_TLAS_START = SRV_START; // 40: TLAS SRV Start
- const int SRV_SCENE_TLAS = SRV_TLAS_START; // 40: 1 SRV for the Scene TLAS
- const int SRV_DDGI_PROBE_VIS_TLAS = SRV_SCENE_TLAS + 1; // 41: 1 SRV for the DDGI Probe Vis TLAS
+ const int SRV_TLAS_START = SRV_START; // 52: TLAS SRV Start
+ const int SRV_SCENE_TLAS = SRV_TLAS_START; // 52: 1 SRV for the Scene TLAS
+ const int SRV_DDGI_PROBE_VIS_TLAS = SRV_SCENE_TLAS + 1; // 53: 1 SRV for the DDGI Probe Vis TLAS
// Texture2D SRV
- const int SRV_TEX2D_START = SRV_TLAS_START + MAX_TLAS; // 42: Texture2D SRV Start
- const int SRV_BLUE_NOISE = SRV_TEX2D_START; // 42: 1 SRV for the Blue Noise Texture
- const int SRV_IMGUI_FONTS = SRV_BLUE_NOISE + 1; // 43: 1 SRV for the ImGui Font Texture
- const int SRV_SCENE_TEXTURES = SRV_IMGUI_FONTS + 1; // 44: 300 SRV (max), 1 SRV for each Material Texture
+ const int SRV_TEX2D_START = SRV_TLAS_START + MAX_TLAS; // 54: Texture2D SRV Start
+ const int SRV_BLUE_NOISE = SRV_TEX2D_START; // 54: 1 SRV for the Blue Noise Texture
+ const int SRV_IMGUI_FONTS = SRV_BLUE_NOISE + 1; // 55: 1 SRV for the ImGui Font Texture
+ const int SRV_SCENE_TEXTURES = SRV_IMGUI_FONTS + 1; // 56: 300 SRV (max), 1 SRV for each Material Texture
// Texture2DArray SRV
- const int SRV_TEX2DARRAY_START = SRV_SCENE_TEXTURES + MAX_TEXTURES; // 344: Texture2DArray SRV Start
- const int SRV_DDGI_VOLUME_TEX2DARRAY = SRV_TEX2DARRAY_START; // 344: 24 SRV, 4 for each DDGIVolume (RayData, Irradiance, Distance, Probe Data)
+ const int SRV_TEX2DARRAY_START = SRV_SCENE_TEXTURES + MAX_TEXTURES; // 356: Texture2DArray SRV Start
+ const int SRV_DDGI_VOLUME_TEX2DARRAY = SRV_TEX2DARRAY_START; // 356: 36 SRV, 6 for each DDGIVolume (RayData, Irradiance, Distance, Probe Data, Variability, Variability Average)
- // ByteAddressBuffer SRV // 368: ByteAddressBuffer SRV Start
+ // ByteAddressBuffer SRV // 392: ByteAddressBuffer SRV Start
const int SRV_BYTEADDRESS_START = SRV_TEX2DARRAY_START + (rtxgi::GetDDGIVolumeNumTex2DArrayDescriptors() * MAX_DDGIVOLUMES);
- const int SRV_SPHERE_INDICES = SRV_BYTEADDRESS_START; // 368: 1 SRV for DDGI Probe Vis Sphere Index Buffer
- const int SRV_SPHERE_VERTICES = SRV_SPHERE_INDICES + 1; // 369: 1 SRV for DDGI Probe Vis Sphere Vertex Buffer
- const int SRV_MATERIAL_INDICES = SRV_SPHERE_VERTICES + 1; // 370: 1 SRV for Mesh Primitive Material Indices
- const int SRV_INDICES = SRV_MATERIAL_INDICES + 1; // 371: n SRV for Mesh Primitive Index Buffers
- const int SRV_VERTICES = SRV_INDICES + 1; // 372: n SRV for Mesh Primitive Vertex Buffers
+ const int SRV_SPHERE_INDICES = SRV_BYTEADDRESS_START; // 392: 1 SRV for DDGI Probe Vis Sphere Index Buffer
+ const int SRV_SPHERE_VERTICES = SRV_SPHERE_INDICES + 1; // 393: 1 SRV for DDGI Probe Vis Sphere Vertex Buffer
+ const int SRV_MESH_OFFSETS = SRV_SPHERE_VERTICES + 1; // 394: 1 SRV for Mesh Offsets in the Geometry Data Buffer
+ const int SRV_GEOMETRY_DATA = SRV_MESH_OFFSETS + 1; // 395: 1 SRV for Geometry (Mesh Primitive) Data
+ const int SRV_INDICES = SRV_GEOMETRY_DATA + 1; // 396: n SRV for Mesh Index Buffers
+ const int SRV_VERTICES = SRV_INDICES + 1; // 397: n SRV for Mesh Vertex Buffers
};
}
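
The renumbered comments in the Direct3D12.h descriptor-heap layout above follow from the volume descriptor count growing from 4 to 6 (RayData, Irradiance, Distance, Probe Data, Variability, Variability Average). A quick arithmetic check, assuming MAX_DDGIVOLUMES = 6, MAX_TLAS = 2, and MAX_TEXTURES = 300 (values inferred from the comments, not quoted from the headers):

```cpp
#include <cassert>

// Recompute the annotated heap indices from the per-volume descriptor count.
constexpr int MAX_DDGIVOLUMES        = 6;   // assumed, per "36 UAV, 6 for each DDGIVolume"
constexpr int MAX_TLAS               = 2;
constexpr int MAX_TEXTURES           = 300;
constexpr int DESCRIPTORS_PER_VOLUME = 6;   // was 4 before this change

constexpr int UAV_TEX2DARRAY_START   = 16;
constexpr int SRV_START              = UAV_TEX2DARRAY_START + DESCRIPTORS_PER_VOLUME * MAX_DDGIVOLUMES; // 52
constexpr int SRV_TEX2D_START        = SRV_START + MAX_TLAS;                                            // 54
constexpr int SRV_SCENE_TEXTURES     = SRV_TEX2D_START + 2;                                             // 56
constexpr int SRV_TEX2DARRAY_START   = SRV_SCENE_TEXTURES + MAX_TEXTURES;                               // 356
constexpr int SRV_BYTEADDRESS_START  = SRV_TEX2DARRAY_START + DESCRIPTORS_PER_VOLUME * MAX_DDGIVOLUMES; // 392
```

These reproduce the 52/54/356/392 offsets in the updated comments.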
diff --git a/samples/test-harness/include/Geometry.h b/samples/test-harness/include/Geometry.h
index df03638..470aec4 100644
--- a/samples/test-harness/include/Geometry.h
+++ b/samples/test-harness/include/Geometry.h
@@ -13,6 +13,6 @@
namespace Geometry
{
- void CreateSphere(uint32_t latitudes, uint32_t longitudes, Scenes::MeshPrimitive& mesh);
+ void CreateSphere(uint32_t latitudes, uint32_t longitudes, Scenes::Mesh& mesh);
}
diff --git a/samples/test-harness/include/Scenes.h b/samples/test-harness/include/Scenes.h
index ad2df33..4d80735 100644
--- a/samples/test-harness/include/Scenes.h
+++ b/samples/test-harness/include/Scenes.h
@@ -24,6 +24,8 @@ namespace Scenes
int material = -1;
bool opaque = true;
bool doubleSided = false;
+ uint32_t vertexByteOffset = 0;
+ uint32_t indexByteOffset = 0;
rtxgi::AABB boundingBox; // not instanced transformed
std::vector vertices;
std::vector indices;
@@ -31,7 +33,10 @@ namespace Scenes
struct Mesh
{
+ int index = -1;
std::string name = "";
+ uint32_t numIndices = 0;
+ uint32_t numVertices = 0;
rtxgi::AABB boundingBox; // not instance transformed
std::vector primitives;
};
diff --git a/samples/test-harness/include/Vulkan.h b/samples/test-harness/include/Vulkan.h
index 9863266..f04c9a2 100644
--- a/samples/test-harness/include/Vulkan.h
+++ b/samples/test-harness/include/Vulkan.h
@@ -198,6 +198,11 @@ namespace Graphics
}
};
+ struct Features
+ {
+ uint32_t waveLaneCount;
+ };
+
struct Globals
{
VkInstance instance = nullptr;
@@ -234,6 +239,8 @@ namespace Graphics
Shaders::ShaderCompiler shaderCompiler;
+ Features features = {};
+
// For Windowed->Fullscreen->Windowed transitions
int x = 0;
int y = 0;
@@ -247,12 +254,15 @@ namespace Graphics
int fullscreen = 0;
bool fullscreenChanged = false;
+ bool supportsShaderExecutionReordering = false;
+
VkDebugUtilsMessengerEXT debugUtilsMessenger = nullptr;
VkPhysicalDeviceFeatures deviceFeatures = {};
VkPhysicalDeviceProperties2 deviceProps = {};
VkPhysicalDeviceAccelerationStructurePropertiesKHR deviceASProps = {};
VkPhysicalDeviceRayTracingPipelinePropertiesKHR deviceRTPipelineProps = {};
+ VkPhysicalDeviceSubgroupProperties deviceSubgroupProps = {};
};
struct RenderTargets
@@ -315,11 +325,17 @@ namespace Graphics
uint8_t* materialsSTBPtr = nullptr;
// ByteAddress Buffers
- VkBuffer materialIndicesRB = nullptr;
- VkDeviceMemory materialIndicesRBMemory = nullptr;
- VkBuffer materialIndicesRBUploadBuffer = nullptr;
- VkDeviceMemory materialIndicesRBUploadMemory = nullptr;
- uint8_t* materialIndicesRBPtr = nullptr;
+ VkBuffer meshOffsetsRB = nullptr;
+ VkDeviceMemory meshOffsetsRBMemory = nullptr;
+ VkBuffer meshOffsetsRBUploadBuffer = nullptr;
+ VkDeviceMemory meshOffsetsRBUploadMemory = nullptr;
+ uint8_t* meshOffsetsRBPtr = nullptr;
+
+ VkBuffer geometryDataRB = nullptr;
+ VkDeviceMemory geometryDataRBMemory = nullptr;
+ VkBuffer geometryDataRBUploadBuffer = nullptr;
+ VkDeviceMemory geometryDataRBUploadMemory = nullptr;
+ uint8_t* geometryDataRBPtr = nullptr;
// Shared Render Targets
RenderTargets rt;
@@ -363,8 +379,8 @@ namespace Graphics
void SetImageLayoutBarrier(VkCommandBuffer cmdBuffer, VkImage image, const ImageBarrierDesc info);
bool CreateBuffer(Globals& vk, const BufferDesc& info, VkBuffer* buffer, VkDeviceMemory* memory);
- bool CreateIndexBuffer(Globals& vk, const Scenes::MeshPrimitive& primitive, VkBuffer* ib, VkDeviceMemory* ibMemory, VkBuffer* ibUpload, VkDeviceMemory* ibUploadMemory);
- bool CreateVertexBuffer(Globals& vk, const Scenes::MeshPrimitive& primitive, VkBuffer* vb, VkDeviceMemory* vbMemory, VkBuffer* vbUpload, VkDeviceMemory* vbUploadMemory);
+ bool CreateIndexBuffer(Globals& vk, const Scenes::Mesh& mesh, VkBuffer* ib, VkDeviceMemory* ibMemory, VkBuffer* ibUpload, VkDeviceMemory* ibUploadMemory);
+ bool CreateVertexBuffer(Globals& vk, const Scenes::Mesh& mesh, VkBuffer* vb, VkDeviceMemory* vbMemory, VkBuffer* vbUpload, VkDeviceMemory* vbUploadMemory);
bool CreateTexture(Globals& vk, const TextureDesc& info, VkImage* image, VkDeviceMemory* imageMemory, VkImageView* imageView);
bool CreateShaderModule(VkDevice device, const Shaders::ShaderProgram& shader, VkShaderModule* module);
diff --git a/samples/test-harness/include/graphics/DDGIShaderConfig.h b/samples/test-harness/include/graphics/DDGIShaderConfig.h
index 104e365..9650d5a 100644
--- a/samples/test-harness/include/graphics/DDGIShaderConfig.h
+++ b/samples/test-harness/include/graphics/DDGIShaderConfig.h
@@ -59,6 +59,9 @@
#define RTXGI_PUSH_CONSTS_STRUCT_NAME GlobalConstants
#define RTXGI_PUSH_CONSTS_VARIABLE_NAME GlobalConst
#define RTXGI_PUSH_CONSTS_FIELD_DDGI_VOLUME_INDEX_NAME ddgi_volumeIndex
+ #define RTXGI_PUSH_CONSTS_FIELD_DDGI_REDUCTION_INPUT_SIZE_X_NAME ddgi_reductionInputSizeX
+ #define RTXGI_PUSH_CONSTS_FIELD_DDGI_REDUCTION_INPUT_SIZE_Y_NAME ddgi_reductionInputSizeY
+ #define RTXGI_PUSH_CONSTS_FIELD_DDGI_REDUCTION_INPUT_SIZE_Z_NAME ddgi_reductionInputSizeZ
#define VOLUME_CONSTS_REGISTER 5
#define VOLUME_CONSTS_SPACE 0
#define VOLUME_RESOURCES_REGISTER 6
@@ -79,6 +82,9 @@
#define OUTPUT_SPACE 0
#define PROBE_DATA_REGISTER 4
#define PROBE_DATA_SPACE 0
+ #define PROBE_VARIABILITY_REGISTER 5
+ #define PROBE_VARIABILITY_AVERAGE_REGISTER 6
+ #define PROBE_VARIABILITY_SPACE 0
#endif
#else
#define CONSTS_REGISTER b0
@@ -103,6 +109,9 @@
#define OUTPUT_SPACE space1
#define PROBE_DATA_REGISTER u3
#define PROBE_DATA_SPACE space1
+ #define PROBE_VARIABILITY_REGISTER u4
+ #define PROBE_VARIABILITY_AVERAGE_REGISTER u5
+ #define PROBE_VARIABILITY_SPACE space1
#endif
#endif
#endif
diff --git a/samples/test-harness/include/graphics/DDGIVisualizations_D3D12.h b/samples/test-harness/include/graphics/DDGIVisualizations_D3D12.h
index 2ce4e02..f141156 100644
--- a/samples/test-harness/include/graphics/DDGIVisualizations_D3D12.h
+++ b/samples/test-harness/include/graphics/DDGIVisualizations_D3D12.h
@@ -62,7 +62,7 @@ namespace Graphics
ID3D12Resource* probeIBUpload = nullptr;
D3D12_INDEX_BUFFER_VIEW probeIBView;
- Scenes::MeshPrimitive probe;
+ Scenes::Mesh probe;
AccelerationStructure blas;
AccelerationStructure tlas;
diff --git a/samples/test-harness/include/graphics/DDGIVisualizations_VK.h b/samples/test-harness/include/graphics/DDGIVisualizations_VK.h
index 5382c9a..fa5403e 100644
--- a/samples/test-harness/include/graphics/DDGIVisualizations_VK.h
+++ b/samples/test-harness/include/graphics/DDGIVisualizations_VK.h
@@ -75,7 +75,7 @@ namespace Graphics
VkBuffer probeIBUpload = nullptr;
VkDeviceMemory probeIBUploadMemory = nullptr;
- Scenes::MeshPrimitive probe;
+ Scenes::Mesh probe;
AccelerationStructure blas;
AccelerationStructure tlas;
diff --git a/samples/test-harness/include/graphics/DDGI_D3D12.h b/samples/test-harness/include/graphics/DDGI_D3D12.h
index 82a795f..94199a2 100644
--- a/samples/test-harness/include/graphics/DDGI_D3D12.h
+++ b/samples/test-harness/include/graphics/DDGI_D3D12.h
@@ -48,6 +48,7 @@ namespace Graphics
D3D12_GPU_VIRTUAL_ADDRESS shaderTableHitGroupTableStartAddress = 0;
// DDGI
+ std::vector volumeDescs;
std::vector volumes;
std::vector selectedVolumes;
@@ -61,6 +62,9 @@ namespace Graphics
ID3D12Resource* volumeConstantsSTBUpload = nullptr;
UINT volumeConstantsSTBSizeInBytes = 0;
+ // Variability Tracking
+ std::vector numVolumeVariabilitySamples;
+
// Performance Stats
Instrumentation::Stat* cpuStat = nullptr;
Instrumentation::Stat* gpuStat = nullptr;
@@ -70,6 +74,7 @@ namespace Graphics
Instrumentation::Stat* blendStat = nullptr;
Instrumentation::Stat* relocateStat = nullptr;
Instrumentation::Stat* lightingStat = nullptr;
+ Instrumentation::Stat* variabilityStat = nullptr;
bool enabled = false;
};
diff --git a/samples/test-harness/include/graphics/DDGI_VK.h b/samples/test-harness/include/graphics/DDGI_VK.h
index 527cef9..879fb28 100644
--- a/samples/test-harness/include/graphics/DDGI_VK.h
+++ b/samples/test-harness/include/graphics/DDGI_VK.h
@@ -54,6 +54,7 @@ namespace Graphics
VkDeviceAddress shaderTableHitGroupTableStartAddress = 0;
// DDGI
+ std::vector volumeDescs;
std::vector volumes;
std::vector selectedVolumes;
@@ -75,6 +76,9 @@ namespace Graphics
VkDeviceMemory volumeConstantsSTBUploadMemory = nullptr;
uint64_t volumeConstantsSTBSizeInBytes = 0;
+ // Variability Tracking
+ std::vector numVolumeVariabilitySamples;
+
Instrumentation::Stat* cpuStat = nullptr;
Instrumentation::Stat* gpuStat = nullptr;
@@ -83,6 +87,7 @@ namespace Graphics
Instrumentation::Stat* blendStat = nullptr;
Instrumentation::Stat* relocateStat = nullptr;
Instrumentation::Stat* lightingStat = nullptr;
+ Instrumentation::Stat* variabilityStat = nullptr;
bool enabled = false;
};
diff --git a/samples/test-harness/include/graphics/Types.h b/samples/test-harness/include/graphics/Types.h
index c486764..03f6966 100644
--- a/samples/test-harness/include/graphics/Types.h
+++ b/samples/test-harness/include/graphics/Types.h
@@ -96,6 +96,13 @@ namespace Graphics
float2 uv0;
};
+ struct GeometryData
+ {
+ uint materialIndex;
+ uint indexByteAddress;
+ uint vertexByteAddress;
+ };
+
struct Camera
{
float3 position;
@@ -183,9 +190,15 @@ namespace Graphics
return data;
}
+ // Pack the SER bool into the second-to-last bit of samplesPerPixel
+ void SetShaderExecutionReordering(bool value)
+ {
+ samplesPerPixel |= ((uint)value << 30);
+ }
+
+ // Pack the AA bool into the last bit of samplesPerPixel
void SetAntialiasing(bool value)
{
- // Pack bool into the last bit of samplesPerPixel
samplesPerPixel |= ((uint)value << 31);
}
#endif
@@ -318,12 +331,14 @@ namespace Graphics
float irradianceTextureScale;
float distanceTextureScale;
float probeDataTextureScale;
+ float probeVariabilityTextureScale;
+ float probeVariabilityTextureThreshold;
#ifndef HLSL
- uint32_t data[8];
- static uint32_t GetNum32BitValues() { return 8; }
+ uint32_t data[10];
+ static uint32_t GetNum32BitValues() { return 10; }
static uint32_t GetSizeInBytes() { return GetNum32BitValues() * 4; }
- static uint32_t GetAlignedNum32BitValues() { return 8; }
+ static uint32_t GetAlignedNum32BitValues() { return 12; }
static uint32_t GetAlignedSizeInBytes() { return GetAlignedNum32BitValues() * 4; }
uint32_t* GetData()
{
@@ -335,6 +350,9 @@ namespace Graphics
data[5] = *(uint32_t*)&irradianceTextureScale;
data[6] = *(uint32_t*)&distanceTextureScale;
data[7] = *(uint32_t*)&probeDataTextureScale;
+ data[8] = *(uint32_t*)&probeVariabilityTextureScale;
+ data[9] = *(uint32_t*)&probeVariabilityTextureThreshold;
+ //data[10/11] = 0; // empty, alignment padding
return data;
}
@@ -350,8 +368,8 @@ namespace Graphics
RTAOConsts rtao; // 16 32-bit values, 64 bytes
CompositeConsts composite; // 4 32-bit values, 16 bytes
PostProcessConsts post; // 4 32-bit values, 16 bytes
- DDGIVisConsts ddgivis; // 8 32-bit values, 32 bytes
- // 44 32-bit values, 176 bytes
+ DDGIVisConsts ddgivis; // 12 32-bit values, 48 bytes
+ // 48 32-bit values, 192 bytes
static uint32_t GetNum32BitValues()
{
@@ -453,11 +471,18 @@ namespace Graphics
float ddgivis_irradianceTextureScale;
float ddgivis_distanceTextureScale;
float ddgivis_probeDataTextureScale;
+ float ddgivis_probeVariabilityTextureScale;
+ float ddgivis_probeVariabilityTextureThreshold;
+ uint2 ddgivis_pad;
#ifdef __spirv__
// DDGIRootConstants
uint ddgi_volumeIndex;
- uint3 ddgi_pad;
+ uint2 ddgi_pad0;
+ uint ddgi_reductionInputSizeX;
+ uint ddgi_reductionInputSizeY;
+ uint ddgi_reductionInputSizeZ;
+ uint2 ddgi_pad1;
#endif
#endif // HLSL
};
diff --git a/samples/test-harness/shaders/AHS.hlsl b/samples/test-harness/shaders/AHS.hlsl
index d2e2116..5f9d9f6 100644
--- a/samples/test-harness/shaders/AHS.hlsl
+++ b/samples/test-harness/shaders/AHS.hlsl
@@ -14,18 +14,22 @@
[shader("anyhit")]
void AHS_LOD0(inout PackedPayload packedPayload, BuiltInTriangleIntersectionAttributes attrib)
{
+ // Load the intersected mesh geometry's data
+ GeometryData geometry;
+ GetGeometryData(InstanceID(), GeometryIndex(), geometry);
+
// Load the material
- Material material = GetMaterial(GetMaterialIndex(InstanceID()));
+ Material material = GetMaterial(geometry);
float alpha = material.opacity;
if (material.alphaMode == 2)
{
// Load and interpolate the triangle's texture coordinates
float3 barycentrics = float3((1.f - attrib.barycentrics.x - attrib.barycentrics.y), attrib.barycentrics.x, attrib.barycentrics.y);
- float2 uv0 = LoadAndInterpolateUV0(InstanceID(), PrimitiveIndex(), barycentrics);
+ float2 uv0 = LoadAndInterpolateUV0(InstanceID(), PrimitiveIndex(), geometry, barycentrics);
if (material.albedoTexIdx > -1)
{
- alpha = GetTex2D(material.albedoTexIdx).SampleLevel(GetBilinearWrapSampler(), uv0, 0).a;
+ alpha *= GetTex2D(material.albedoTexIdx).SampleLevel(GetBilinearWrapSampler(), uv0, 0).a;
}
}
@@ -35,15 +39,19 @@ void AHS_LOD0(inout PackedPayload packedPayload, BuiltInTriangleIntersectionAttr
[shader("anyhit")]
void AHS_PRIMARY(inout PackedPayload payload, BuiltInTriangleIntersectionAttributes attrib)
{
+ // Load the intersected mesh geometry's data
+ GeometryData geometry;
+ GetGeometryData(InstanceID(), GeometryIndex(), geometry);
+
// Load the material
- Material material = GetMaterial(GetMaterialIndex(InstanceID()));
+ Material material = GetMaterial(geometry);
float alpha = material.opacity;
if (material.alphaMode == 2)
{
// Load the vertices
Vertex vertices[3];
- LoadVerticesPosUV0(InstanceID(), PrimitiveIndex(), vertices);
+ LoadVerticesPosUV0(InstanceID(), PrimitiveIndex(), geometry, vertices);
// Compute texture coordinate differentials
float2 dUVdx, dUVdy;
@@ -60,7 +68,7 @@ void AHS_PRIMARY(inout PackedPayload payload, BuiltInTriangleIntersectionAttribu
// Sample the texture
if (material.albedoTexIdx > -1)
{
- alpha = GetTex2D(material.albedoTexIdx).SampleGrad(GetAnisoWrapSampler(), v.uv0, dUVdx, dUVdy).a;
+ alpha *= GetTex2D(material.albedoTexIdx).SampleGrad(GetAnisoWrapSampler(), v.uv0, dUVdx, dUVdy).a;
}
}
@@ -70,15 +78,19 @@ void AHS_PRIMARY(inout PackedPayload payload, BuiltInTriangleIntersectionAttribu
[shader("anyhit")]
void AHS_GI(inout PackedPayload payload, BuiltInTriangleIntersectionAttributes attrib)
{
+ // Load the intersected mesh geometry's data
+ GeometryData geometry;
+ GetGeometryData(InstanceID(), GeometryIndex(), geometry);
+
// Load the surface material
- Material material = GetMaterial(GetMaterialIndex(InstanceID()));
+ Material material = GetMaterial(geometry);
float alpha = material.opacity;
if (material.alphaMode == 2)
{
// Load the vertices
Vertex vertices[3];
- LoadVerticesPosUV0(InstanceID(), PrimitiveIndex(), vertices);
+ LoadVerticesPosUV0(InstanceID(), PrimitiveIndex(), geometry, vertices);
// Interpolate the triangle's texture coordinates
float3 barycentrics = float3((1.f - attrib.barycentrics.x - attrib.barycentrics.y), attrib.barycentrics.x, attrib.barycentrics.y);
@@ -92,7 +104,7 @@ void AHS_GI(inout PackedPayload payload, BuiltInTriangleIntersectionAttributes a
GetTex2D(material.albedoTexIdx).GetDimensions(0, width, height, numLevels);
// Sample the texture
- alpha = GetTex2D(material.albedoTexIdx).SampleLevel(GetBilinearWrapSampler(), v.uv0, numLevels * 0.6667f).a;
+ alpha *= GetTex2D(material.albedoTexIdx).SampleLevel(GetBilinearWrapSampler(), v.uv0, numLevels * 0.6667f).a;
}
}
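The `alpha = ...` to `alpha *= ...` changes above are a behavior fix: the sampled texture alpha previously *replaced* `material.opacity`, while it now modulates it. A minimal sketch of the combined rule (`surface_alpha` is a hypothetical helper, not test-harness code):

```python
def surface_alpha(material_opacity, albedo_tex_alpha=None):
    """Combine the material's base opacity with the albedo texture's
    alpha channel, mirroring the AHS change above: texture alpha now
    modulates material.opacity instead of overwriting it."""
    alpha = material_opacity
    if albedo_tex_alpha is not None:   # i.e. material.albedoTexIdx > -1
        alpha *= albedo_tex_alpha
    return alpha
```

With the old assignment, a half-opaque material with a half-opaque texel would have reported 0.5; the multiplied form correctly yields 0.25.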
diff --git a/samples/test-harness/shaders/CHS.hlsl b/samples/test-harness/shaders/CHS.hlsl
index 0557fd8..b01faba 100644
--- a/samples/test-harness/shaders/CHS.hlsl
+++ b/samples/test-harness/shaders/CHS.hlsl
@@ -18,9 +18,13 @@ void CHS_LOD0(inout PackedPayload packedPayload, BuiltInTriangleIntersectionAttr
payload.hitT = RayTCurrent();
payload.hitKind = HitKind();
+ // Load the intersected mesh geometry's data
+ GeometryData geometry;
+ GetGeometryData(InstanceID(), GeometryIndex(), geometry);
+
// Load the triangle's vertices
Vertex vertices[3];
- LoadVertices(InstanceID(), PrimitiveIndex(), vertices);
+ LoadVertices(InstanceID(), PrimitiveIndex(), geometry, vertices);
// Interpolate the triangle's attributes for the hit location (position, normal, tangent, texture coordinates)
float3 barycentrics = float3((1.f - attrib.barycentrics.x - attrib.barycentrics.y), attrib.barycentrics.x, attrib.barycentrics.y);
@@ -36,15 +40,16 @@ void CHS_LOD0(inout PackedPayload packedPayload, BuiltInTriangleIntersectionAttr
payload.shadingNormal = payload.normal;
// Load the surface material
- Material material = GetMaterial(GetMaterialIndex(InstanceID()));
+ Material material = GetMaterial(geometry);
payload.albedo = material.albedo;
+ payload.opacity = material.opacity;
// Albedo and Opacity
if (material.albedoTexIdx > -1)
{
float4 bco = GetTex2D(material.albedoTexIdx).SampleLevel(GetBilinearWrapSampler(), v.uv0, 0);
- payload.albedo = bco.rgb;
- payload.opacity = bco.a;
+ payload.albedo *= bco.rgb;
+ payload.opacity *= bco.a;
}
// Shading normal
@@ -83,9 +88,13 @@ void CHS_PRIMARY(inout PackedPayload packedPayload, BuiltInTriangleIntersectionA
payload.hitT = RayTCurrent();
payload.hitKind = HitKind();
+ // Load the intersected mesh geometry's data
+ GeometryData geometry;
+ GetGeometryData(InstanceID(), GeometryIndex(), geometry);
+
// Load the triangle's vertices
Vertex vertices[3];
- LoadVertices(InstanceID(), PrimitiveIndex(), vertices);
+ LoadVertices(InstanceID(), PrimitiveIndex(), geometry, vertices);
// Interpolate the triangle's attributes for the hit location (position, normal, tangent, texture coordinates)
float3 barycentrics = float3((1.f - attrib.barycentrics.x - attrib.barycentrics.y), attrib.barycentrics.x, attrib.barycentrics.y);
@@ -101,8 +110,9 @@ void CHS_PRIMARY(inout PackedPayload packedPayload, BuiltInTriangleIntersectionA
payload.shadingNormal = payload.normal;
// Load the surface material
- Material material = GetMaterial(GetMaterialIndex(InstanceID()));
+ Material material = GetMaterial(geometry);
payload.albedo = material.albedo;
+ payload.opacity = material.opacity;
// Compute texture coordinate differentials
float2 dUVdx, dUVdy;
@@ -116,8 +126,8 @@ void CHS_PRIMARY(inout PackedPayload packedPayload, BuiltInTriangleIntersectionA
if (material.albedoTexIdx > -1)
{
float4 bco = GetTex2D(material.albedoTexIdx).SampleGrad(GetAnisoWrapSampler(), v.uv0, dUVdx, dUVdy);
- payload.albedo = bco.rgb;
- payload.opacity = bco.a;
+ payload.albedo *= bco.rgb;
+ payload.opacity *= bco.a;
}
// Shading normal
@@ -157,9 +167,13 @@ void CHS_GI(inout PackedPayload packedPayload, BuiltInTriangleIntersectionAttrib
payload.hitT = RayTCurrent();
payload.hitKind = HitKind();
+ // Load the intersected mesh geometry's data
+ GeometryData geometry;
+ GetGeometryData(InstanceID(), GeometryIndex(), geometry);
+
// Load the triangle's vertices
Vertex vertices[3];
- LoadVertices(InstanceID(), PrimitiveIndex(), vertices);
+ LoadVertices(InstanceID(), PrimitiveIndex(), geometry, vertices);
// Interpolate the triangle's attributes for the hit location (position, normal, tangent, texture coordinates)
float3 barycentrics = float3((1.f - attrib.barycentrics.x - attrib.barycentrics.y), attrib.barycentrics.x, attrib.barycentrics.y);
@@ -175,8 +189,9 @@ void CHS_GI(inout PackedPayload packedPayload, BuiltInTriangleIntersectionAttrib
payload.shadingNormal = payload.normal;
// Load the surface material
- Material material = GetMaterial(GetMaterialIndex(InstanceID()));
+ Material material = GetMaterial(geometry);
payload.albedo = material.albedo;
+ payload.opacity = material.opacity;
// Albedo and Opacity
if (material.albedoTexIdx > -1)
@@ -187,8 +202,23 @@ void CHS_GI(inout PackedPayload packedPayload, BuiltInTriangleIntersectionAttrib
// Sample the albedo texture
float4 bco = GetTex2D(material.albedoTexIdx).SampleLevel(GetBilinearWrapSampler(), v.uv0, numLevels / 2.f);
- payload.albedo = bco.rgb;
- payload.opacity = bco.a;
+ payload.albedo *= bco.rgb;
+ payload.opacity *= bco.a;
+ }
+
+ // Shading normal
+ if (material.normalTexIdx > -1)
+ {
+ // Get the number of mip levels
+ uint width, height, numLevels;
+ GetTex2D(material.normalTexIdx).GetDimensions(0, width, height, numLevels);
+
+ float3 tangent = normalize(mul(ObjectToWorld3x4(), float4(v.tangent.xyz, 0.f)).xyz);
+ float3 bitangent = cross(payload.normal, tangent) * v.tangent.w;
+ float3x3 TBN = { tangent, bitangent, payload.normal };
+ payload.shadingNormal = GetTex2D(material.normalTexIdx).SampleLevel(GetBilinearWrapSampler(), v.uv0, numLevels / 2.f).xyz;
+ payload.shadingNormal = (payload.shadingNormal * 2.f) - 1.f; // Transform to [-1, 1]
+ payload.shadingNormal = mul(payload.shadingNormal, TBN); // Transform tangent-space normal to world-space
}
// Pack the payload
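The new shading-normal block in `CHS_GI` above remaps the normal-map sample from [0, 1] to [-1, 1] and transforms it by the TBN basis with a row-vector `mul`. A sketch of that math in plain Python (rows of `tbn` are tangent, bitangent, normal, as in the HLSL; this is an illustration, not the shader):

```python
def decode_tangent_normal(sample_xyz, tangent, bitangent, normal):
    """Decode a tangent-space normal-map sample into world space,
    mirroring the CHS_GI block above: remap [0,1] -> [-1,1], then
    compute mul(rowVector, TBN) as in HLSL."""
    n = [2.0 * c - 1.0 for c in sample_xyz]   # transform to [-1, 1]
    tbn = [tangent, bitangent, normal]
    # HLSL mul(rowVector, matrix): result[j] = sum_i n[i] * tbn[i][j]
    return [sum(n[i] * tbn[i][j] for i in range(3)) for j in range(3)]
```

A "flat" texel (0.5, 0.5, 1.0) with an orthonormal TBN reproduces the geometric normal, which is the expected identity case.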
diff --git a/samples/test-harness/shaders/GBufferRGS.hlsl b/samples/test-harness/shaders/GBufferRGS.hlsl
index 0b46a75..3f338ca 100644
--- a/samples/test-harness/shaders/GBufferRGS.hlsl
+++ b/samples/test-harness/shaders/GBufferRGS.hlsl
@@ -56,7 +56,7 @@ void RayGen()
RAY_FLAG_CULL_BACK_FACING_TRIANGLES,
0xFF,
0,
- 1,
+ 0,
0,
ray,
packedPayload);
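The `1` changed to `0` above is `MultiplierForGeometryContributionToHitGroupIndex`, the fifth `TraceRay` argument. Per the D3D12 raytracing spec, the hit-group record index is computed as below; with the multiplier zeroed, every geometry in an instance resolves to the same record, and per-geometry data is instead fetched in-shader via the new `GetGeometryData()` path. A sketch of the spec formula:

```python
def hit_group_record_index(ray_contribution, multiplier, geometry_index,
                           instance_contribution):
    """DXR shader-table addressing:
    index = RayContributionToHitGroupIndex
          + MultiplierForGeometryContributionToHitGroupIndex
            * GeometryContributionToHitGroupIndex
          + InstanceContributionToHitGroupIndex"""
    return ray_contribution + multiplier * geometry_index + instance_contribution
```

For geometry 5 in an instance with contribution 2, the old multiplier of 1 selected record 7, while the new multiplier of 0 selects record 2 regardless of geometry index.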
diff --git a/samples/test-harness/shaders/PathTraceRGS.hlsl b/samples/test-harness/shaders/PathTraceRGS.hlsl
index 2acfdd6..0ec2cb7 100644
--- a/samples/test-harness/shaders/PathTraceRGS.hlsl
+++ b/samples/test-harness/shaders/PathTraceRGS.hlsl
@@ -30,15 +30,47 @@ float3 TracePath(RayDesc ray, uint seed)
{
// Trace the ray
PackedPayload packedPayload = (PackedPayload)0;
+
+ #if GFX_NVAPI
+ if (GetPTShaderExecutionReordering())
+ {
+ NvHitObject hit;
+ NvTraceRayHitObject(
+ SceneTLAS,
+ RAY_FLAG_CULL_BACK_FACING_TRIANGLES,
+ 0xFF,
+ 0,
+ 0,
+ 0,
+ ray,
+ packedPayload,
+ hit);
+ NvReorderThread(hit, 0, 0);
+ NvInvokeHitObject(SceneTLAS, hit, packedPayload);
+ }
+ else
+ {
+ TraceRay(
+ SceneTLAS,
+ RAY_FLAG_CULL_BACK_FACING_TRIANGLES,
+ 0xFF,
+ 0,
+ 0,
+ 0,
+ ray,
+ packedPayload);
+ }
+ #else
TraceRay(
SceneTLAS,
RAY_FLAG_CULL_BACK_FACING_TRIANGLES,
0xFF,
0,
- 1,
+ 0,
0,
ray,
packedPayload);
+ #endif
// Unpack the payload
Payload payload = UnpackPayload(packedPayload);
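The `NvTraceRayHitObject` / `NvReorderThread` / `NvInvokeHitObject` path above splits traversal from shading so the GPU can regroup divergent threads by what they hit before hit shaders run. As a rough conceptual illustration only (real SER reorders threads across a wave in hardware; this toy just sorts a batch by a coherence key):

```python
def reorder_by_hit(rays_with_hits):
    """Toy sketch of Shader Execution Reordering: group rays whose hits
    invoke the same shader so neighbouring "threads" shade coherently.
    sorted() is stable, so order within a group is preserved."""
    return sorted(rays_with_hits, key=lambda rh: rh["shader_id"])
```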
diff --git a/samples/test-harness/shaders/ProbeTraceRGS.hlsl b/samples/test-harness/shaders/ProbeTraceRGS.hlsl
deleted file mode 100644
index 55730e1..0000000
--- a/samples/test-harness/shaders/ProbeTraceRGS.hlsl
+++ /dev/null
@@ -1,216 +0,0 @@
-/*
-* Copyright (c) 2019-2021, NVIDIA CORPORATION. All rights reserved.
-*
-* NVIDIA CORPORATION and its licensors retain all intellectual property
-* and proprietary rights in and to this software, related documentation
-* and any modifications thereto. Any use, reproduction, disclosure or
-* distribution of this software and related documentation without an express
-* license agreement from NVIDIA CORPORATION is strictly prohibited.
-*/
-
-// -------- FEATURE DEFINES -----------------------------------------------------------------------
-
-// RTXGI_DDGI_PROBE_RELOCATION must be passed in as a define at shader compilation time.
-// This define specifies if probe relocation is enabled or disabled.
-// Ex: RTXGI_DDGI_PROBE_RELOCATION [0|1]
-#ifndef RTXGI_DDGI_PROBE_RELOCATION
-#error Required define RTXGI_DDGI_PROBE_RELOCATION is not defined for ProbeTraceRGS.hlsl!
-#endif
-
-// RTXGI_DDGI_PROBE_CLASSIFICATION must be passed in as a define at shader compilation time.
-// This define specifies if probe classification is enabled or disabled.
-// Ex: RTXGI_DDGI_PROBE_CLASSIFICATION [0|1]
-#ifndef RTXGI_DDGI_PROBE_CLASSIFICATION
-#error Required define RTXGI_DDGI_PROBE_CLASSIFICATION is not defined for ProbeTraceRGS.hlsl!
-#endif
-
-// RTXGI_DDGI_VOLUME_INFINITE_SCROLLING must be passed in as a define at shader compilation time.
-// This define specifies if infinite scrolling volume functionality is enabled or disabled.
-// Ex: RTXGI_DDGI_VOLUME_INFINITE_SCROLLING [0|1]
-#ifndef RTXGI_DDGI_VOLUME_INFINITE_SCROLLING
-#error Required define RTXGI_DDGI_VOLUME_INFINITE_SCROLLING is not defined for ProbeTraceRGS.hlsl!
-#endif
-
-// -------- CONFIGURATION DEFINES -----------------------------------------------------------------
-
-// RTXGI_DDGI_FORMAT_PROBE_RAY_DATA must be passed in as a define at shader compilation time.
-// This define specifies the format of the probe ray data texture.
-// Ex: RTXGI_DDGI_FORMAT_PROBE_RAY_DATA 0 => R32G32_FLOAT
-// Ex: RTXGI_DDGI_FORMAT_PROBE_RAY_DATA 1 => R32G32B32A32_FLOAT
-#ifndef RTXGI_DDGI_FORMAT_PROBE_RAY_DATA
-#error Required define RTXGI_DDGI_FORMAT_PROBE_RAY_DATA is not defined for ProbeTraceRGS.hlsl!
-#endif
-
-// -------------------------------------------------------------------------------------------
-
-#include "../../../rtxgi-sdk/shaders/ddgi/Irradiance.hlsl"
-
-#include "include/Descriptors.hlsl"
-#include "include/Lighting.hlsl"
-#include "include/RayTracing.hlsl"
-
-// ---[ Ray Generation Shader ]---
-
-[shader("raygeneration")]
-void RayGen()
-{
- float4 result = 0.f;
-
- uint2 DispatchIndex = DispatchRaysIndex().xy;
- int rayIndex = DispatchIndex.x; // index of the current probe ray
- int probeIndex = DispatchIndex.y; // index of current probe
-
- // Get the DDGIVolume's constants
- DDGIVolumeDescGPU DDGIVolume = UnpackDDGIVolumeDescGPU(DDGIVolumes[DDGI.volumeIndex]);
-
-#if RTXGI_DDGI_PROBE_RELOCATION || RTXGI_DDGI_PROBE_CLASSIFICATION
- Texture2D ProbeData = GetDDGIVolumeProbeDataSRV(DDGI.volumeIndex);
-#endif
-
-#if RTXGI_DDGI_PROBE_CLASSIFICATION
- #if RTXGI_DDGI_VOLUME_INFINITE_SCROLLING
- int storageProbeIndex = DDGIGetProbeIndexOffset(probeIndex, DDGIVolume.probeCounts, DDGIVolume.probeScrollOffsets);
- #else
- int storageProbeIndex = probeIndex;
- #endif
-
- int2 texelPosition = DDGIGetProbeTexelPosition(storageProbeIndex, DDGIVolume.probeCounts);
- float probeState = ProbeData.Load(int3(texelPosition, 0)).w;
- if (probeState == RTXGI_DDGI_PROBE_STATE_INACTIVE && rayIndex >= RTXGI_DDGI_NUM_FIXED_RAYS)
- {
- // Do not shoot rays when the probe is inactive *unless* it is one of the "fixed" rays used by probe classification
- return;
- }
-#endif
-
-#if RTXGI_DDGI_PROBE_RELOCATION
- #if RTXGI_DDGI_VOLUME_INFINITE_SCROLLING
- float3 probeWorldPosition = DDGIGetProbeWorldPositionWithOffset(probeIndex, DDGIVolume.origin, DDGIVolume.probeCounts, DDGIVolume.probeSpacing, DDGIVolume.probeScrollOffsets, ProbeData);
- #else
- float3 probeWorldPosition = DDGIGetProbeWorldPositionWithOffset(probeIndex, DDGIVolume.origin, DDGIVolume.probeCounts, DDGIVolume.probeSpacing, ProbeData);
- #endif
-#else
- float3 probeWorldPosition = DDGIGetProbeWorldPosition(probeIndex, DDGIVolume.origin, DDGIVolume.probeCounts, DDGIVolume.probeSpacing);
-#endif
-
- float3 probeRayDirection = DDGIGetProbeRayDirection(rayIndex, DDGIVolume.probeNumRays, DDGIVolume.probeRayRotation);
-
- // Setup the probe ray
- RayDesc ray;
- ray.Origin = probeWorldPosition;
- ray.Direction = probeRayDirection;
- ray.TMin = 0.f;
- ray.TMax = DDGIVolume.probeMaxRayDistance;
-
- // Trace the Probe Ray
- PackedPayload packedPayload = (PackedPayload)0;
-
-#if RTXGI_DDGI_PROBE_CLASSIFICATION
- // Pass the probe's state flag to hit shaders through the payload
- packedPayload.packed0.x = probeState;
-#endif
-
- TraceRay(
- SceneBVH,
- RAY_FLAG_NONE,
- 0xFF,
- 0,
- 1,
- 0,
- ray,
- packedPayload);
-
- // Get a reference to the ray data texture
- RWTexture2D RayData = GetDDGIVolumeRayDataUAV(DDGI.volumeIndex);
-
- // The ray missed. Set hit distance to a large value and exit early.
- if (packedPayload.hitT < 0.f)
- {
- #if (RTXGI_DDGI_FORMAT_PROBE_RAY_DATA == 1)
- RayData[DispatchIndex.xy] = float4(GetGlobalConst(app, skyRadiance), 1e27f);
- #else // RTXGI_DDGI_FORMAT_PROBE_RAY_DATA == 0
- RayData[DispatchIndex.xy] = float4(asfloat(RTXGIFloat3ToUint(GetGlobalConst(app, skyRadiance))), 1e27f, 0.f, 0.f);
- #endif
- return;
- }
-
- // Unpack the payload
- Payload payload = UnpackPayload(packedPayload);
-
- // Hit a surface backface.
- if (payload.hitKind == HIT_KIND_TRIANGLE_BACK_FACE)
- {
- // Make hit distance negative to mark a backface hit for blending, probe relocation, and probe classification.
- // Shorten the hit distance on a backface hit by 80% to decrease the influence of the probe during irradiance sampling.
- #if (RTXGI_DDGI_FORMAT_PROBE_RAY_DATA == 1)
- RayData[DispatchIndex.xy].w = -payload.hitT * 0.2f;
- #else // RTXGI_DDGI_FORMAT_PROBE_RAY_DATA == 0
- RayData[DispatchIndex.xy].g = -payload.hitT * 0.2f;
- #endif
- return;
- }
-
-#if RTXGI_DDGI_PROBE_CLASSIFICATION
- if (probeState == RTXGI_DDGI_PROBE_STATE_INACTIVE)
- {
- // Hit a front face, but the probe is inactive. This ray is only used for classification, so don't need to do lighting.
- #if (RTXGI_DDGI_FORMAT_PROBE_RAY_DATA == 1)
- RayData[DispatchIndex.xy].w = payload.hitT;
- #else // RTXGI_DDGI_FORMAT_PROBE_RAY_DATA == 0
- RayData[DispatchIndex.xy].g = payload.hitT;
- #endif
- return;
- }
-#endif
-
- // Direct Lighting and Shadowing
- float3 diffuse = DirectDiffuseLighting(payload, GetGlobalConst(pt, rayNormalBias), GetGlobalConst(pt, rayViewBias), SceneBVH);
-
- // Indirect Lighting (recursive)
- float3 irradiance = 0.f;
- float3 surfaceBias = DDGIGetSurfaceBias(payload.normal, ray.Direction, DDGIVolume);
-
- DDGIVolumeResources resources;
- resources.probeIrradiance = GetDDGIVolumeIrradianceSRV(DDGI.volumeIndex);
- resources.probeDistance = GetDDGIVolumeDistanceSRV(DDGI.volumeIndex);
-#if RTXGI_DDGI_PROBE_RELOCATION || RTXGI_DDGI_PROBE_CLASSIFICATION
- resources.probeData = ProbeData;
-#endif
- resources.bilinearSampler = GetBilinearWrapSampler();
-
- // Compute volume blending weight
- float volumeBlendWeight = DDGIGetVolumeBlendWeight(payload.worldPosition, DDGIVolume);
-
- // Avoid evaluating irradiance when the surface is outside the volume
- if (volumeBlendWeight > 0)
- {
- // Get irradiance from the DDGIVolume
- irradiance = DDGIGetVolumeIrradiance(
- payload.worldPosition,
- surfaceBias,
- payload.normal,
- DDGIVolume,
- resources);
-
- // Attenuate irradiance by the blend weight
- irradiance *= volumeBlendWeight;
- }
-
- // Perfectly diffuse reflectors don't exist in the real world. Limit the BRDF
- // albedo to a maximum value to account for the energy loss at each bounce.
- float maxAlbedo = 0.9f;
-
- // Compute final color
- result = float4(diffuse + ((min(payload.albedo, maxAlbedo) / PI) * irradiance), payload.hitT);
-
-#if (RTXGI_DDGI_FORMAT_PROBE_RAY_DATA == 1)
- // Use R32G32B32A32_FLOAT format. Store color components and hit distance as 32-bit float values.
- RayData[DispatchIndex.xy] = result;
-#else // RTXGI_DDGI_FORMAT_PROBE_RAY_DATA == 0
- // Use R32G32_FLOAT format (don't use R32G32_UINT since hit distance needs to be negative sometimes).
- // Pack color as R10G10B10 in R32 and store hit distance in G32.
- static const float c_threshold = 1.f / 255.f;
- if (RTXGIMaxComponent(result.rgb) <= c_threshold) result.rgb = float3(0.f, 0.f, 0.f);
- RayData[DispatchIndex.xy] = float4(asfloat(RTXGIFloat3ToUint(result.rgb)), payload.hitT, 0.f, 0.f);
-#endif
-}
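The deleted shader above packs probe-ray radiance as R10G10B10 into a single 32-bit value via `RTXGIFloat3ToUint` when the two-channel ray-data format is used. A plausible sketch of that packing (the SDK's exact rounding and channel order may differ; `float3_to_r10g10b10` is an illustrative stand-in):

```python
def float3_to_r10g10b10(rgb):
    """Pack three [0,1] floats into 10 bits each of a 32-bit uint --
    a sketch of what RTXGIFloat3ToUint plausibly does."""
    def quantize(x):
        x = min(max(x, 0.0), 1.0)          # clamp to [0, 1]
        return int(x * 1023.0 + 0.5)       # round to 10-bit integer
    r, g, b = (quantize(c) for c in rgb)
    return r | (g << 10) | (b << 20)
```

This also motivates the shader's small-value clamp: components below roughly 1/255 quantize to near-black, so they are zeroed outright before packing.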
diff --git a/samples/test-harness/shaders/RTAOTraceRGS.hlsl b/samples/test-harness/shaders/RTAOTraceRGS.hlsl
index a65482c..e351f4a 100644
--- a/samples/test-harness/shaders/RTAOTraceRGS.hlsl
+++ b/samples/test-harness/shaders/RTAOTraceRGS.hlsl
@@ -59,7 +59,7 @@ float GetOcclusion(int2 screenPos, float3 worldPos, float3 normal)
RAY_FLAG_CULL_BACK_FACING_TRIANGLES,
0xFF,
0,
- 1,
+ 0,
0,
ray,
packedPayload);
diff --git a/samples/test-harness/shaders/ddgi/ProbeTraceRGS.hlsl b/samples/test-harness/shaders/ddgi/ProbeTraceRGS.hlsl
index f7f50c3..9088675 100644
--- a/samples/test-harness/shaders/ddgi/ProbeTraceRGS.hlsl
+++ b/samples/test-harness/shaders/ddgi/ProbeTraceRGS.hlsl
@@ -83,16 +83,47 @@ void RayGen()
// Get the acceleration structure
RaytracingAccelerationStructure SceneTLAS = GetAccelerationStructure(SCENE_TLAS_INDEX);
+#if GFX_NVAPI
+ if (GetPTShaderExecutionReordering())
+ {
+ NvHitObject hit;
+ NvTraceRayHitObject(
+ SceneTLAS,
+ RAY_FLAG_CULL_BACK_FACING_TRIANGLES,
+ 0xFF,
+ 0,
+ 0,
+ 0,
+ ray,
+ packedPayload,
+ hit);
+ NvReorderThread(hit, 0, 0);
+ NvInvokeHitObject(SceneTLAS, hit, packedPayload);
+ }
+ else
+ {
+ TraceRay(
+ SceneTLAS,
+ RAY_FLAG_CULL_BACK_FACING_TRIANGLES,
+ 0xFF,
+ 0,
+ 0,
+ 0,
+ ray,
+ packedPayload);
+ }
+#else
// Trace the Probe Ray
TraceRay(
SceneTLAS,
RAY_FLAG_NONE,
0xFF,
0,
- 1,
+ 0,
0,
ray,
packedPayload);
+#endif
// Get the ray data texture array
RWTexture2DArray RayData = GetRWTex2DArray(resourceIndices.rayDataUAVIndex);
diff --git a/samples/test-harness/shaders/ddgi/visualizations/VolumeTexturesCS.hlsl b/samples/test-harness/shaders/ddgi/visualizations/VolumeTexturesCS.hlsl
index e22ca7d..059feb0 100644
--- a/samples/test-harness/shaders/ddgi/visualizations/VolumeTexturesCS.hlsl
+++ b/samples/test-harness/shaders/ddgi/visualizations/VolumeTexturesCS.hlsl
@@ -53,6 +53,8 @@ void CS(uint3 DispatchThreadID : SV_DispatchThreadID)
Texture2DArray ProbeIrradiance = GetTex2DArray(resourceIndices.probeIrradianceSRVIndex);
Texture2DArray ProbeDistance = GetTex2DArray(resourceIndices.probeDistanceSRVIndex);
Texture2DArray ProbeData = GetTex2DArray(resourceIndices.probeDataSRVIndex);
+ Texture2DArray ProbeVariability = GetTex2DArray(resourceIndices.probeVariabilitySRVIndex);
+ Texture2DArray ProbeVariabilityAverage = GetTex2DArray(resourceIndices.probeVariabilityAverageSRVIndex);
// Load and unpack the DDGIVolume's constants
DDGIVolumeDescGPU volume = UnpackDDGIVolumeDescGPU(DDGIVolumes[volumeIndex]);
@@ -83,7 +85,7 @@ void CS(uint3 DispatchThreadID : SV_DispatchThreadID)
if(DispatchThreadID.x < irradianceRect.x && DispatchThreadID.y < irradianceRect.y)
{
// Compute the sampling coordinates
- uint2 numScaledTexelsPerSlice = numTexelsPerSlice * irradianceScale;
+ uint2 numScaledTexelsPerSlice = numTexelsPerSlice * irradianceScale;
float2 sliceUV = (float2(0.5f, 0.5f) + float2(DispatchThreadID.xy % numScaledTexelsPerSlice)) / float2(numScaledTexelsPerSlice);
float sliceIndex = float(DispatchThreadID.x / numScaledTexelsPerSlice.x);
float3 coords = float3(sliceUV, sliceIndex);
@@ -126,7 +128,7 @@ void CS(uint3 DispatchThreadID : SV_DispatchThreadID)
if (DispatchThreadID.x < xmax && DispatchThreadID.y >= ymin && DispatchThreadID.y < ymax)
{
// Compute the sampling coordinates
- uint2 numScaledTexelsPerSlice = numTexelsPerSlice * distanceScale;
+ uint2 numScaledTexelsPerSlice = numTexelsPerSlice * distanceScale;
float2 sliceUV = (float2(0.5f, 0.5f) + float2(uint2(DispatchThreadID.x, DispatchThreadID.y - ymin) % numScaledTexelsPerSlice)) / float2(numScaledTexelsPerSlice);
float sliceIndex = float(DispatchThreadID.x / numScaledTexelsPerSlice.x);
float3 coords = float3(sliceUV, sliceIndex);
@@ -142,12 +144,79 @@ void CS(uint3 DispatchThreadID : SV_DispatchThreadID)
return;
}
+ // Variability
+ float variabilityScale = GetGlobalConst(ddgivis, probeVariabilityTextureScale);
+ numTexelsPerSlice = numProbesPerSlice * volume.probeNumIrradianceInteriorTexels;
+ uint2 variabilityRect = uint2(numTexelsPerSlice.x * numSlices, numTexelsPerSlice.y) * variabilityScale;
+ xmax = variabilityRect.x;
+ ymin += distanceRect.y + 5;
+ ymax = (ymin + variabilityRect.y);
+ if (DispatchThreadID.x < xmax.x && DispatchThreadID.y >= ymin && DispatchThreadID.y < ymax)
+ {
+ // Compute the sampling coordinates
+ uint2 numScaledTexelsPerSlice = numTexelsPerSlice * variabilityScale;
+ float2 sliceUV = (float2(0.5f, 0.5f) + float2(uint2(DispatchThreadID.x, DispatchThreadID.y - ymin) % numScaledTexelsPerSlice)) / float2(numScaledTexelsPerSlice);
+ float sliceIndex = float(DispatchThreadID.x / numScaledTexelsPerSlice.x);
+ float3 coords = float3(sliceUV, sliceIndex);
+
+ // Sample the variability texture
+ float diff = ProbeVariability.SampleLevel(GetPointClampSampler(), coords, 0).r;
+
+ // Determine whether the probe is active
+ bool active = true;
+ if (volume.probeClassificationEnabled)
+ {
+ // Sample the probe data texture
+ uint state = ProbeData.SampleLevel(GetPointClampSampler(), coords, 0).a;
+ active = (state == RTXGI_DDGI_PROBE_STATE_ACTIVE);
+ }
+
+ // Disabled = blue, above threshold = green, below = red, NaN = yellow
+ if (!active) color = float3(0.f, 0.f, 1.f);
+ else if (isnan(diff)) color = float3(1.f, 1.f, 0.f);
+ else if (diff > GetGlobalConst(ddgivis, probeVariabilityTextureThreshold)) color = float3(0.f, 1.f, 0.f);
+ else color = float3(1.f, 0.f, 0.f);
+
+ // Overwrite GBufferA's albedo and mark the pixel to not be lit
+ GBufferA[DispatchThreadID.xy] = float4(color, 0.f);
+
+ return;
+ }
+
+ // Variability average
+ // 1/4 number of slices (rounded up) after reduction
+ uint2 variabilityAvgRect = uint2(numTexelsPerSlice.x * ((numSlices + 3)/4), numTexelsPerSlice.y) * variabilityScale;
+ xmax = variabilityAvgRect.x;
+ ymin += variabilityRect.y + 5;
+ ymax = (ymin + variabilityAvgRect.y);
+ if (DispatchThreadID.x < xmax.x && DispatchThreadID.y >= ymin && DispatchThreadID.y < ymax)
+ {
+ // Compute the sampling coordinates
+ uint2 numScaledTexelsPerSlice = numTexelsPerSlice * variabilityScale;
+ float2 sliceUV = (float2(0.5f, 0.5f) + float2(uint2(DispatchThreadID.x, DispatchThreadID.y - ymin) % numScaledTexelsPerSlice)) / float2(numScaledTexelsPerSlice);
+ float sliceIndex = float(DispatchThreadID.x / numScaledTexelsPerSlice.x);
+ float3 coords = float3(sliceUV, sliceIndex);
+
+ // Sample the variability average texture
+ float diff = ProbeVariabilityAverage.SampleLevel(GetPointClampSampler(), coords, 0).r;
+
+ // Above threshold = green, below = red, NaN = yellow
+ if (isnan(diff)) color = float3(1.f, 1.f, 0.f);
+ else if (diff > GetGlobalConst(ddgivis, probeVariabilityTextureThreshold)) color = float3(0.f, 1.f, 0.f);
+ else color = float3(1.f, 0.f, 0.f);
+
+ // Overwrite GBufferA's albedo and mark the pixel to not be lit
+ GBufferA[DispatchThreadID.xy] = float4(color, 0.f);
+
+ return;
+ }
+
// Get the texture scale factor for probe data
float probeDataScale = GetGlobalConst(ddgivis, probeDataTextureScale);
// Relocation Offsets
uint2 offsetRect = 0;
- ymin += distanceRect.y + 5;
+ ymin += variabilityAvgRect.y + 5;
if (volume.probeRelocationEnabled)
{
offsetRect = uint2(numProbesPerSlice.x * numSlices, numProbesPerSlice.y) * probeDataScale;
@@ -157,7 +226,7 @@ void CS(uint3 DispatchThreadID : SV_DispatchThreadID)
if (DispatchThreadID.x < xmax && DispatchThreadID.y >= ymin && DispatchThreadID.y < ymax)
{
// Compute the sampling coordinates
- uint2 numScaledTexelsPerSlice = numProbesPerSlice * probeDataScale;
+ uint2 numScaledTexelsPerSlice = numProbesPerSlice * probeDataScale;
float2 sliceUV = (float2(0.5f, 0.5f) + float2(uint2(DispatchThreadID.x, DispatchThreadID.y - ymin) % numScaledTexelsPerSlice)) / float2(numScaledTexelsPerSlice);
float sliceIndex = float(DispatchThreadID.x / numScaledTexelsPerSlice.x);
float3 coords = float3(sliceUV, sliceIndex);
@@ -184,7 +253,7 @@ void CS(uint3 DispatchThreadID : SV_DispatchThreadID)
if (DispatchThreadID.x < xmax && DispatchThreadID.y >= ymin && DispatchThreadID.y < ymax)
{
// Compute the sampling coordinates
- uint2 numScaledTexelsPerSlice = numProbesPerSlice * probeDataScale;
+ uint2 numScaledTexelsPerSlice = numProbesPerSlice * probeDataScale;
float2 sliceUV = (float2(0.5f, 0.5f) + float2(uint2(DispatchThreadID.x, DispatchThreadID.y - ymin) % numScaledTexelsPerSlice)) / float2(numScaledTexelsPerSlice);
float sliceIndex = float(DispatchThreadID.x / numScaledTexelsPerSlice.x);
float3 coords = float3(sliceUV, sliceIndex);
@@ -212,8 +281,7 @@ void CS(uint3 DispatchThreadID : SV_DispatchThreadID)
if (DispatchThreadID.x <= xmax && DispatchThreadID.y > ymin && DispatchThreadID.y <= ymax)
{
// Compute the sampling coordinates
- uint2 numScaledTexelsPerSlice = numTexelsPerSlice * rayDataScale;
-
+ uint2 numScaledTexelsPerSlice = numTexelsPerSlice * rayDataScale;
float2 sliceUV = (float2(0.5f, 0.5f) + float2(uint2(DispatchThreadID.x, DispatchThreadID.y - ymin) % numScaledTexelsPerSlice)) / float2(numScaledTexelsPerSlice);
float sliceIndex = float(DispatchThreadID.x / numScaledTexelsPerSlice.x);
float3 coords = float3(sliceUV, sliceIndex);
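The variability views added above color texels by comparing against `probeVariabilityTextureThreshold`; per the ChangeLog, the tracked metric is the coefficient of variation of the volume. A sketch of the textbook definition and the convergence test it enables (this is the statistical formula, not the SDK's GPU reduction):

```python
import math

def coefficient_of_variation(samples):
    """Coefficient of variation (stddev / mean) -- the convergence
    metric the Probe Variability feature tracks."""
    mean = sum(samples) / len(samples)
    variance = sum((s - mean) ** 2 for s in samples) / len(samples)
    return math.sqrt(variance) / mean if mean != 0.0 else 0.0

def probes_converged(cv, threshold):
    # Mirrors the visualization: values at or below the threshold
    # indicate the volume has settled, so probe traces can be skipped
    # until lighting changes invalidate the light field.
    return cv <= threshold
```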
diff --git a/samples/test-harness/shaders/include/Descriptors.hlsl b/samples/test-harness/shaders/include/Descriptors.hlsl
index 424474e..f1156e4 100644
--- a/samples/test-harness/shaders/include/Descriptors.hlsl
+++ b/samples/test-harness/shaders/include/Descriptors.hlsl
@@ -38,8 +38,9 @@ VK_PUSH_CONST ConstantBuffer GlobalConst : register(b0, space0)
#define GetGlobalConst(x, y) (GlobalConst.x##_##y)
-uint GetPTSamplesPerPixel() { return (GetGlobalConst(pt, samplesPerPixel) & 0x7FFFFFFF); }
+uint GetPTSamplesPerPixel() { return (GetGlobalConst(pt, samplesPerPixel) & 0x3FFFFFFF); }
uint GetPTAntialiasing() { return (GetGlobalConst(pt, samplesPerPixel) & 0x80000000); }
+uint GetPTShaderExecutionReordering() { return GetGlobalConst(pt, samplesPerPixel) & 0x40000000; }
uint HasDirectionalLight() { return GetGlobalConst(lighting, hasDirectionalLight); }
uint GetNumPointLights() { return GetGlobalConst(lighting, numPointLights); }
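The masks above repurpose the `samplesPerPixel` constant as a packed field: bit 31 carries the antialiasing toggle, bit 30 now carries the Shader Execution Reordering toggle, and the low 30 bits hold the sample count. A sketch of the packing/unpacking implied by those accessors (`pack_pt_flags`/`unpack_pt_flags` are hypothetical helpers):

```python
SPP_MASK = 0x3FFFFFFF   # low 30 bits: samples per pixel
SER_BIT  = 0x40000000   # bit 30: shader execution reordering toggle
AA_BIT   = 0x80000000   # bit 31: antialiasing toggle

def pack_pt_flags(spp, antialiasing, ser):
    assert spp <= SPP_MASK, "spp must fit in 30 bits"
    return spp | (AA_BIT if antialiasing else 0) | (SER_BIT if ser else 0)

def unpack_pt_flags(value):
    # Mirrors GetPTSamplesPerPixel / GetPTAntialiasing /
    # GetPTShaderExecutionReordering above.
    return value & SPP_MASK, bool(value & AA_BIT), bool(value & SER_BIT)
```

Narrowing the sample-count mask from `0x7FFFFFFF` to `0x3FFFFFFF` is what frees bit 30 for the SER flag.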
@@ -97,8 +98,9 @@ VK_BINDING(13, 0) ByteAddressBuffer ByteAddrBuffer[]
#define SPHERE_INDEX_BUFFER_INDEX 0
#define SPHERE_VERTEX_BUFFER_INDEX 1
-#define MATERIAL_INDICES_INDEX 2
-#define GEOMETRY_BUFFERS_INDEX 3
+#define MESH_OFFSETS_INDEX 2
+#define GEOMETRY_DATA_INDEX 3
+#define GEOMETRY_BUFFERS_INDEX 4
// Sampler Accessor Functions ------------------------------------------------------------------------------
@@ -112,8 +114,16 @@ SamplerState GetAnisoWrapSampler() { return Samplers[2]; }
StructuredBuffer GetLights() { return Lights; }
-Material GetMaterial(uint index) { return Materials[index]; }
-uint GetMaterialIndex(uint meshIndex) { return ByteAddrBuffer[MATERIAL_INDICES_INDEX].Load(meshIndex * 4); }
+void GetGeometryData(uint meshIndex, uint geometryIndex, out GeometryData geometry)
+{
+ uint address = ByteAddrBuffer[MESH_OFFSETS_INDEX].Load(meshIndex * 4); // address of the Mesh in the GeometryData buffer
+ address += geometryIndex * 12; // offset to mesh primitive geometry, GeometryData stride is 12 bytes
+
+ geometry.materialIndex = ByteAddrBuffer[GEOMETRY_DATA_INDEX].Load(address);
+ geometry.indexByteAddress = ByteAddrBuffer[GEOMETRY_DATA_INDEX].Load(address + 4);
+ geometry.vertexByteAddress = ByteAddrBuffer[GEOMETRY_DATA_INDEX].Load(address + 8);
+}
+Material GetMaterial(GeometryData geometry) { return Materials[geometry.materialIndex]; }
StructuredBuffer GetDDGIVolumeConstants(uint index) { return DDGIVolumes; }
StructuredBuffer GetDDGIVolumeResourceIndices(uint index) { return DDGIVolumeBindless; }
@@ -156,15 +166,16 @@ Texture2DArray GetTex2DArray(uint index) { return Tex2DArray[index]; }
#define RTAO_RAW_INDEX 14
#define DDGI_OUTPUT_INDEX 15
-#define SCENE_TLAS_INDEX 40
-#define DDGIPROBEVIS_TLAS_INDEX 41
+#define SCENE_TLAS_INDEX 52
+#define DDGIPROBEVIS_TLAS_INDEX 53
-#define BLUE_NOISE_INDEX 42
+#define BLUE_NOISE_INDEX 54
-#define SPHERE_INDEX_BUFFER_INDEX 368
-#define SPHERE_VERTEX_BUFFER_INDEX 369
-#define MATERIAL_INDICES_INDEX 370
-#define GEOMETRY_BUFFERS_INDEX 371
+#define SPHERE_INDEX_BUFFER_INDEX 392
+#define SPHERE_VERTEX_BUFFER_INDEX 393
+#define MESH_OFFSETS_INDEX 394
+#define GEOMETRY_DATA_INDEX 395
+#define GEOMETRY_BUFFERS_INDEX 396
// Sampler Accessor Functions ------------------------------------------------------------------------------
@@ -178,8 +189,17 @@ SamplerState GetAnisoWrapSampler() { return SamplerDescriptorHeap[2]; }
StructuredBuffer GetLights() { return StructuredBuffer(ResourceDescriptorHeap[LIGHTS_INDEX]); }
-Material GetMaterial(uint index) { return StructuredBuffer(ResourceDescriptorHeap[MATERIALS_INDEX]).Load(index); }
-uint GetMaterialIndex(uint meshIndex) { return ByteAddressBuffer(ResourceDescriptorHeap[MATERIAL_INDICES_INDEX]).Load(meshIndex * 4); }
+void GetGeometryData(uint meshIndex, uint geometryIndex, out GeometryData geometry)
+{
+ uint address = ByteAddressBuffer(ResourceDescriptorHeap[MESH_OFFSETS_INDEX]).Load(meshIndex * 4) * 12; // offset to start of mesh, GeometryData is 12 bytes
+ address += geometryIndex * 12; // offset to mesh primitive geometry
+
+ ByteAddressBuffer geometryData = ByteAddressBuffer(ResourceDescriptorHeap[GEOMETRY_DATA_INDEX]);
+ geometry.materialIndex = geometryData.Load(address);
+ geometry.indexByteAddress = geometryData.Load(address + 4);
+ geometry.vertexByteAddress = geometryData.Load(address + 8);
+}
+Material GetMaterial(GeometryData geometry) { return StructuredBuffer(ResourceDescriptorHeap[MATERIALS_INDEX]).Load(geometry.materialIndex); }
StructuredBuffer GetDDGIVolumeConstants(uint index) { return ResourceDescriptorHeap[index]; }
StructuredBuffer GetDDGIVolumeResourceIndices(uint index) { return ResourceDescriptorHeap[index]; }
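The new `GetGeometryData()` above replaces the per-mesh material-index table with a two-level lookup: a per-mesh offset buffer addresses into a packed buffer of 12-byte `GeometryData` records (material index, index byte address, vertex byte address). A CPU-side sketch of the bindful variant's addressing, using raw byte buffers to mimic `ByteAddressBuffer.Load`:

```python
import struct

GEOMETRY_STRIDE = 12  # materialIndex + indexByteAddress + vertexByteAddress

def get_geometry_data(mesh_offsets, geometry_data, mesh_index, geometry_index):
    """Mirror of the bindful GetGeometryData(): mesh_offsets holds one
    uint per mesh (byte address of its first GeometryData record), and
    geometry_data is the packed buffer of 12-byte records."""
    address = struct.unpack_from("<I", mesh_offsets, mesh_index * 4)[0]
    address += geometry_index * GEOMETRY_STRIDE  # offset to this geometry
    material, index_addr, vertex_addr = struct.unpack_from(
        "<III", geometry_data, address)
    return {"materialIndex": material,
            "indexByteAddress": index_addr,
            "vertexByteAddress": vertex_addr}
```

This indirection is what lets the any-hit and closest-hit shaders resolve per-geometry materials now that `GeometryIndex()` no longer influences shader-table addressing.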
diff --git a/samples/test-harness/shaders/include/Lighting.hlsl b/samples/test-harness/shaders/include/Lighting.hlsl
index b7a8984..55f7f15 100644
--- a/samples/test-harness/shaders/include/Lighting.hlsl
+++ b/samples/test-harness/shaders/include/Lighting.hlsl
@@ -57,7 +57,7 @@ float LightVisibility(
RAY_FLAG_ACCEPT_FIRST_HIT_AND_END_SEARCH | RAY_FLAG_SKIP_CLOSEST_HIT_SHADER,
0xFF,
0,
- 1,
+ 0,
0,
ray,
packedPayload);
diff --git a/samples/test-harness/shaders/include/Platform.hlsl b/samples/test-harness/shaders/include/Platform.hlsl
index 010e876..09c5eda 100644
--- a/samples/test-harness/shaders/include/Platform.hlsl
+++ b/samples/test-harness/shaders/include/Platform.hlsl
@@ -11,6 +11,13 @@
#ifndef PLATFORM_HLSL
#define PLATFORM_HLSL
+#if GFX_NVAPI
+#define NV_SHADER_EXTN_SLOT u999999
+#define NV_SHADER_EXTN_REGISTER_SPACE space999999
+#define NV_HITOBJECT_USE_MACRO_API
+#include "nvapi/nvHLSLExtns.h"
+#endif
+
#ifdef __spirv__
#define VK_BINDING(x, y) [[vk::binding(x, y)]]
#define VK_PUSH_CONST [[vk::push_constant]]
diff --git a/samples/test-harness/shaders/include/RayTracing.hlsl b/samples/test-harness/shaders/include/RayTracing.hlsl
index 3837e75..65f0f12 100644
--- a/samples/test-harness/shaders/include/RayTracing.hlsl
+++ b/samples/test-harness/shaders/include/RayTracing.hlsl
@@ -37,7 +37,6 @@ PackedPayload PackPayload(Payload input)
output.packed1.y = f32tof16(input.shadingNormal.z);
output.packed1.y |= f32tof16(input.opacity) << 16;
output.packed1.z = f32tof16(input.hitKind);
- //output.packed1.z = unused
return output;
}
@@ -73,26 +72,26 @@ Payload UnpackPayload(PackedPayload input)
/**
* Load a triangle's indices.
*/
-uint3 LoadIndices(uint meshIndex, uint primitiveIndex)
+uint3 LoadIndices(uint meshIndex, uint primitiveIndex, GeometryData geometry)
{
- uint address = (primitiveIndex * 3) * 4; // 3 indices per primitive, 4 bytes for each index
- return GetIndexBuffer(meshIndex).Load3(address); // Mesh index buffers start at index 3 and alternate with vertex buffer pointers
+ uint address = geometry.indexByteAddress + (primitiveIndex * 3) * 4; // 3 indices per primitive, 4 bytes for each index
+ return GetIndexBuffer(meshIndex).Load3(address); // Mesh index buffers start at index 4 and alternate with vertex buffer pointers
}
/**
* Load a triangle's vertex data (all: position, normal, tangent, uv0).
*/
-void LoadVertices(uint meshIndex, uint primitiveIndex, out Vertex vertices[3])
+void LoadVertices(uint meshIndex, uint primitiveIndex, GeometryData geometry, out Vertex vertices[3])
{
// Get the indices
- uint3 indices = LoadIndices(meshIndex, primitiveIndex);
+ uint3 indices = LoadIndices(meshIndex, primitiveIndex, geometry);
// Load the vertices
uint address;
for (uint i = 0; i < 3; i++)
{
- vertices[i] = (Vertex)0; // Initialize the vertex
- address = (indices[i] * 12) * 4; // Vertices contain 12 floats / 48 bytes
+ vertices[i] = (Vertex)0;
+ address = geometry.vertexByteAddress + (indices[i] * 12) * 4; // Vertices contain 12 floats / 48 bytes
// Load the position
vertices[i].position = asfloat(GetVertexBuffer(meshIndex).Load3(address));
@@ -114,17 +113,17 @@ void LoadVertices(uint meshIndex, uint primitiveIndex, out Vertex vertices[3])
/**
* Load a triangle's vertex data (only position and uv0).
*/
-void LoadVerticesPosUV0(uint meshIndex, uint primitiveIndex, out Vertex vertices[3])
+void LoadVerticesPosUV0(uint meshIndex, uint primitiveIndex, GeometryData geometry, out Vertex vertices[3])
{
// Get the indices
- uint3 indices = LoadIndices(meshIndex, primitiveIndex);
+ uint3 indices = LoadIndices(meshIndex, primitiveIndex, geometry);
// Load the vertices
uint address;
for (uint i = 0; i < 3; i++)
{
- vertices[i] = (Vertex)0; // Initialize the vertex
- address = (indices[i] * 12) * 4; // Vertices contain 12 floats / 48 bytes
+ vertices[i] = (Vertex)0;
+ address = geometry.vertexByteAddress + (indices[i] * 12) * 4; // Vertices contain 12 floats / 48 bytes
// Load the position
vertices[i].position = asfloat(GetVertexBuffer(meshIndex).Load3(address));
@@ -138,18 +137,18 @@ void LoadVerticesPosUV0(uint meshIndex, uint primitiveIndex, out Vertex vertices
/**
* Load (only) a triangle's texture coordinates and return the barycentric interpolated texture coordinates.
*/
-float2 LoadAndInterpolateUV0(uint meshIndex, uint primitiveIndex, float3 barycentrics)
+float2 LoadAndInterpolateUV0(uint meshIndex, uint primitiveIndex, GeometryData geometry, float3 barycentrics)
{
// Get the triangle indices
- uint3 indices = LoadIndices(meshIndex, primitiveIndex);
+ uint3 indices = LoadIndices(meshIndex, primitiveIndex, geometry);
// Interpolate the texture coordinates
int address;
float2 uv0 = float2(0.f, 0.f);
for (uint i = 0; i < 3; i++)
{
- address = (indices[i] * 12) * 4; // 12 floats (3: pos, 3: normals, 4:tangent, 2:uv0)
- address += 40; // 40 bytes (10 * 4): skip position, normal, and tangent
+ address = geometry.vertexByteAddress + (indices[i] * 12) * 4; // 12 floats (3: pos, 3: normals, 4:tangent, 2:uv0)
+ address += 40; // 40 bytes (10 * 4): skip position, normal, and tangent
uv0 += asfloat(GetVertexBuffer(meshIndex).Load2(address)) * barycentrics[i];
}
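The vertex loaders above all assume a 48-byte vertex (12 floats: 3 position, 3 normal, 4 tangent, 2 uv0), with uv0 at byte offset 40 past the start of each vertex. A small Python sketch of the uv0 address computation, using an illustrative one-vertex buffer:

```python
import struct

VERTEX_STRIDE = 48  # 12 floats: 3 position, 3 normal, 4 tangent, 2 uv0
UV0_OFFSET = 40     # skip position (12 bytes) + normal (12 bytes) + tangent (16 bytes)

def load_uv0(vertex_buffer: bytes, vertex_byte_address: int, index: int):
    # Matches the shader: geometry.vertexByteAddress + (index * 12) * 4 + 40
    address = vertex_byte_address + index * VERTEX_STRIDE + UV0_OFFSET
    return struct.unpack_from("<ff", vertex_buffer, address)
```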
diff --git a/samples/test-harness/shaders/include/nvapi/nvHLSLExtns.h b/samples/test-harness/shaders/include/nvapi/nvHLSLExtns.h
new file mode 100644
index 0000000..9394036
--- /dev/null
+++ b/samples/test-harness/shaders/include/nvapi/nvHLSLExtns.h
@@ -0,0 +1,2206 @@
+ /************************************************************************************************************************************\
+|* *|
+|* Copyright © 2012 NVIDIA Corporation. All rights reserved. *|
+|* *|
+|* NOTICE TO USER: *|
+|* *|
+|* This software is subject to NVIDIA ownership rights under U.S. and international Copyright laws. *|
+|* *|
+|* This software and the information contained herein are PROPRIETARY and CONFIDENTIAL to NVIDIA *|
+|* and are being provided solely under the terms and conditions of an NVIDIA software license agreement. *|
+|* Otherwise, you have no rights to use or access this software in any manner. *|
+|* *|
+|* If not covered by the applicable NVIDIA software license agreement: *|
+|* NVIDIA MAKES NO REPRESENTATION ABOUT THE SUITABILITY OF THIS SOFTWARE FOR ANY PURPOSE. *|
+|* IT IS PROVIDED "AS IS" WITHOUT EXPRESS OR IMPLIED WARRANTY OF ANY KIND. *|
+|* NVIDIA DISCLAIMS ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, *|
+|* INCLUDING ALL IMPLIED WARRANTIES OF MERCHANTABILITY, NONINFRINGEMENT, AND FITNESS FOR A PARTICULAR PURPOSE. *|
+|* IN NO EVENT SHALL NVIDIA BE LIABLE FOR ANY SPECIAL, INDIRECT, INCIDENTAL, OR CONSEQUENTIAL DAMAGES, *|
+|* OR ANY DAMAGES WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN ACTION OF CONTRACT, *|
+|* NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOURCE CODE. *|
+|* *|
+|* U.S. Government End Users. *|
+|* This software is a "commercial item" as that term is defined at 48 C.F.R. 2.101 (OCT 1995), *|
+|* consisting of "commercial computer software" and "commercial computer software documentation" *|
+|* as such terms are used in 48 C.F.R. 12.212 (SEPT 1995) and is provided to the U.S. Government only as a commercial end item. *|
+|* Consistent with 48 C.F.R.12.212 and 48 C.F.R. 227.7202-1 through 227.7202-4 (JUNE 1995), *|
+|* all U.S. Government End Users acquire the software with only those rights set forth herein. *|
+|* *|
+|* Any use of this software in individual and commercial software must include, *|
+|* in the user documentation and internal comments to the code, *|
+|* the above Disclaimer (as applicable) and U.S. Government End Users Notice. *|
+|* *|
+ \************************************************************************************************************************************/
+
+////////////////////////// NVIDIA SHADER EXTENSIONS /////////////////
+
+// this file is to be #included in the app HLSL shader code to make
+// use of nvidia shader extensions
+
+
+#include "nvHLSLExtnsInternal.h"
+
+//----------------------------------------------------------------------------//
+//------------------------- Warp Shuffle Functions ---------------------------//
+//----------------------------------------------------------------------------//
+
+// all functions have variants with width parameter which permits sub-division
+// of the warp into segments - for example to exchange data between 4 groups of
+// 8 lanes in a SIMD manner. If width is less than warpSize then each subsection
+// of the warp behaves as a separate entity with a starting logical lane ID of 0.
+// A thread may only exchange data with others in its own subsection. Width must
+// have a value which is a power of 2 so that the warp can be subdivided equally;
+// results are undefined if width is not a power of 2, or is a number greater
+// than warpSize.
+
+//
+// simple variant of SHFL instruction
+// returns val from the specified lane
+// optional width parameter must be a power of two and width <= 32
+//
+int NvShfl(int val, uint srcLane, int width = NV_WARP_SIZE)
+{
+ uint index = g_NvidiaExt.IncrementCounter();
+ g_NvidiaExt[index].src0u.x = val; // variable to be shuffled
+ g_NvidiaExt[index].src0u.y = srcLane; // source lane
+ g_NvidiaExt[index].src0u.z = __NvGetShflMaskFromWidth(width);
+ g_NvidiaExt[index].opcode = NV_EXTN_OP_SHFL;
+
+ // result is returned as the return value of IncrementCounter on fake UAV slot
+ return g_NvidiaExt.IncrementCounter();
+}
+
+int2 NvShfl(int2 val, uint srcLane, int width = NV_WARP_SIZE)
+{
+ int x = NvShfl(val.x, srcLane, width);
+ int y = NvShfl(val.y, srcLane, width);
+ return int2(x, y);
+}
+
+int4 NvShfl(int4 val, uint srcLane, int width = NV_WARP_SIZE)
+{
+ int x = NvShfl(val.x, srcLane, width);
+ int y = NvShfl(val.y, srcLane, width);
+ int z = NvShfl(val.z, srcLane, width);
+ int w = NvShfl(val.w, srcLane, width);
+ return int4(x, y, z, w);
+}
+
+//
+// Copy from a lane with lower ID relative to caller
+//
+int NvShflUp(int val, uint delta, int width = NV_WARP_SIZE)
+{
+ uint index = g_NvidiaExt.IncrementCounter();
+ g_NvidiaExt[index].src0u.x = val; // variable to be shuffled
+ g_NvidiaExt[index].src0u.y = delta; // relative lane offset
+ g_NvidiaExt[index].src0u.z = (NV_WARP_SIZE - width) << 8; // minIndex = maxIndex for shfl_up (src2[4:0] is expected to be 0)
+ g_NvidiaExt[index].opcode = NV_EXTN_OP_SHFL_UP;
+ return g_NvidiaExt.IncrementCounter();
+}
+
+//
+// Copy from a lane with higher ID relative to caller
+//
+int NvShflDown(int val, uint delta, int width = NV_WARP_SIZE)
+{
+ uint index = g_NvidiaExt.IncrementCounter();
+ g_NvidiaExt[index].src0u.x = val; // variable to be shuffled
+ g_NvidiaExt[index].src0u.y = delta; // relative lane offset
+ g_NvidiaExt[index].src0u.z = __NvGetShflMaskFromWidth(width);
+ g_NvidiaExt[index].opcode = NV_EXTN_OP_SHFL_DOWN;
+ return g_NvidiaExt.IncrementCounter();
+}
+
+//
+// Copy from a lane based on bitwise XOR of own lane ID
+//
+int NvShflXor(int val, uint laneMask, int width = NV_WARP_SIZE)
+{
+ uint index = g_NvidiaExt.IncrementCounter();
+ g_NvidiaExt[index].src0u.x = val; // variable to be shuffled
+ g_NvidiaExt[index].src0u.y = laneMask; // laneMask to be XOR'ed with current laneId to get the source lane id
+ g_NvidiaExt[index].src0u.z = __NvGetShflMaskFromWidth(width);
+ g_NvidiaExt[index].opcode = NV_EXTN_OP_SHFL_XOR;
+ return g_NvidiaExt.IncrementCounter();
+}
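`NvShflXor` is the building block for warp-wide butterfly reductions. This is a Python simulation of the lane-exchange semantics, not NVAPI itself: each lane reads the value of lane `laneId ^ mask`, and halving the mask each step leaves every lane holding the full sum.

```python
WARP_SIZE = 32

def shfl_xor(values, lane_mask):
    # Each lane reads the value held by lane (laneId XOR lane_mask)
    return [values[lane ^ lane_mask] for lane in range(len(values))]

def warp_sum(values):
    # Butterfly reduction: after log2(WARP_SIZE) steps every lane holds the total
    assert len(values) == WARP_SIZE
    offset = WARP_SIZE // 2
    while offset > 0:
        values = [v + p for v, p in zip(values, shfl_xor(values, offset))]
        offset //= 2
    return values
```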
+
+
+//----------------------------------------------------------------------------//
+//----------------------------- Warp Vote Functions---------------------------//
+//----------------------------------------------------------------------------//
+
+// returns 0xFFFFFFFF if the predicate is true for any thread in the warp, returns 0 otherwise
+uint NvAny(int predicate)
+{
+ uint index = g_NvidiaExt.IncrementCounter();
+ g_NvidiaExt[index].src0u.x = predicate;
+ g_NvidiaExt[index].opcode = NV_EXTN_OP_VOTE_ANY;
+ return g_NvidiaExt.IncrementCounter();
+}
+
+// returns 0xFFFFFFFF if the predicate is true for ALL threads in the warp, returns 0 otherwise
+uint NvAll(int predicate)
+{
+ uint index = g_NvidiaExt.IncrementCounter();
+ g_NvidiaExt[index].src0u.x = predicate;
+ g_NvidiaExt[index].opcode = NV_EXTN_OP_VOTE_ALL;
+ return g_NvidiaExt.IncrementCounter();
+}
+
+// returns a mask of all threads in the warp with bits set for threads that have predicate true
+uint NvBallot(int predicate)
+{
+ uint index = g_NvidiaExt.IncrementCounter();
+ g_NvidiaExt[index].src0u.x = predicate;
+ g_NvidiaExt[index].opcode = NV_EXTN_OP_VOTE_BALLOT;
+ return g_NvidiaExt.IncrementCounter();
+}
+
+
+//----------------------------------------------------------------------------//
+//----------------------------- Utility Functions ----------------------------//
+//----------------------------------------------------------------------------//
+
+// returns the lane index of the current thread (thread index in warp)
+int NvGetLaneId()
+{
+ uint index = g_NvidiaExt.IncrementCounter();
+ g_NvidiaExt[index].opcode = NV_EXTN_OP_GET_LANE_ID;
+ return g_NvidiaExt.IncrementCounter();
+}
+
+// returns the value of a special register - specify a subopcode from the NV_SPECIALOP_* values in nvShaderExtnEnums.h; other opcodes result in undefined behavior
+uint NvGetSpecial(uint subOpCode)
+{
+ return __NvGetSpecial(subOpCode);
+}
+
+//----------------------------------------------------------------------------//
+//----------------------------- FP16 Atomic Functions ------------------------//
+//----------------------------------------------------------------------------//
+
+// The functions below perform atomic operations on two consecutive fp16
+// values in the given raw UAV.
+// The uint parameter 'fp16x2Val' is treated as two fp16 values; byteAddress must be a multiple of 4.
+// The returned value is the two fp16 values packed into a single uint.
+
+uint NvInterlockedAddFp16x2(RWByteAddressBuffer uav, uint byteAddress, uint fp16x2Val)
+{
+ return __NvAtomicOpFP16x2(uav, byteAddress, fp16x2Val, NV_EXTN_ATOM_ADD);
+}
+
+uint NvInterlockedMinFp16x2(RWByteAddressBuffer uav, uint byteAddress, uint fp16x2Val)
+{
+ return __NvAtomicOpFP16x2(uav, byteAddress, fp16x2Val, NV_EXTN_ATOM_MIN);
+}
+
+uint NvInterlockedMaxFp16x2(RWByteAddressBuffer uav, uint byteAddress, uint fp16x2Val)
+{
+ return __NvAtomicOpFP16x2(uav, byteAddress, fp16x2Val, NV_EXTN_ATOM_MAX);
+}
+
+
+// versions of the above functions taking two fp32 values (internally converted to fp16 values)
+uint NvInterlockedAddFp16x2(RWByteAddressBuffer uav, uint byteAddress, float2 val)
+{
+ return __NvAtomicOpFP16x2(uav, byteAddress, __fp32x2Tofp16x2(val), NV_EXTN_ATOM_ADD);
+}
+
+uint NvInterlockedMinFp16x2(RWByteAddressBuffer uav, uint byteAddress, float2 val)
+{
+ return __NvAtomicOpFP16x2(uav, byteAddress, __fp32x2Tofp16x2(val), NV_EXTN_ATOM_MIN);
+}
+
+uint NvInterlockedMaxFp16x2(RWByteAddressBuffer uav, uint byteAddress, float2 val)
+{
+ return __NvAtomicOpFP16x2(uav, byteAddress, __fp32x2Tofp16x2(val), NV_EXTN_ATOM_MAX);
+}
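These float2 overloads depend on packing two half-precision values into one uint. A hedged Python sketch of that packing, assuming (as the fp16x2 naming suggests) that x lands in the low 16 bits and y in the high 16; `__fp32x2Tofp16x2` itself is internal to the header:

```python
import struct

def fp32x2_to_fp16x2(x: float, y: float) -> int:
    # Convert each float to IEEE 754 half precision ('e' format), then pack:
    # x in the low 16 bits, y in the high 16 bits (assumed ordering)
    (hx,) = struct.unpack("<H", struct.pack("<e", x))
    (hy,) = struct.unpack("<H", struct.pack("<e", y))
    return hx | (hy << 16)
```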
+
+
+//----------------------------------------------------------------------------//
+
+// The functions below perform atomic operations on an R16G16_FLOAT UAV at the given address.
+// The uint parameter 'fp16x2Val' is treated as two fp16 values.
+// The returned value is the two fp16 values (.x and .y components) packed into a single uint.
+// Warning: Behavior of this set of functions is undefined if the UAV is not
+// of R16G16_FLOAT format (might result in an app crash or TDR).
+
+uint NvInterlockedAddFp16x2(RWTexture1D uav, uint address, uint fp16x2Val)
+{
+ return __NvAtomicOpFP16x2(uav, address, fp16x2Val, NV_EXTN_ATOM_ADD);
+}
+
+uint NvInterlockedMinFp16x2(RWTexture1D uav, uint address, uint fp16x2Val)
+{
+ return __NvAtomicOpFP16x2(uav, address, fp16x2Val, NV_EXTN_ATOM_MIN);
+}
+
+uint NvInterlockedMaxFp16x2(RWTexture1D uav, uint address, uint fp16x2Val)
+{
+ return __NvAtomicOpFP16x2(uav, address, fp16x2Val, NV_EXTN_ATOM_MAX);
+}
+
+uint NvInterlockedAddFp16x2(RWTexture2D uav, uint2 address, uint fp16x2Val)
+{
+ return __NvAtomicOpFP16x2(uav, address, fp16x2Val, NV_EXTN_ATOM_ADD);
+}
+
+uint NvInterlockedMinFp16x2(RWTexture2D uav, uint2 address, uint fp16x2Val)
+{
+ return __NvAtomicOpFP16x2(uav, address, fp16x2Val, NV_EXTN_ATOM_MIN);
+}
+
+uint NvInterlockedMaxFp16x2(RWTexture2D uav, uint2 address, uint fp16x2Val)
+{
+ return __NvAtomicOpFP16x2(uav, address, fp16x2Val, NV_EXTN_ATOM_MAX);
+}
+
+uint NvInterlockedAddFp16x2(RWTexture3D uav, uint3 address, uint fp16x2Val)
+{
+ return __NvAtomicOpFP16x2(uav, address, fp16x2Val, NV_EXTN_ATOM_ADD);
+}
+
+uint NvInterlockedMinFp16x2(RWTexture3D uav, uint3 address, uint fp16x2Val)
+{
+ return __NvAtomicOpFP16x2(uav, address, fp16x2Val, NV_EXTN_ATOM_MIN);
+}
+
+uint NvInterlockedMaxFp16x2(RWTexture3D uav, uint3 address, uint fp16x2Val)
+{
+ return __NvAtomicOpFP16x2(uav, address, fp16x2Val, NV_EXTN_ATOM_MAX);
+}
+
+
+// versions taking two fp32 values (internally converted to fp16)
+uint NvInterlockedAddFp16x2(RWTexture1D uav, uint address, float2 val)
+{
+ return __NvAtomicOpFP16x2(uav, address, __fp32x2Tofp16x2(val), NV_EXTN_ATOM_ADD);
+}
+
+uint NvInterlockedMinFp16x2(RWTexture1D uav, uint address, float2 val)
+{
+ return __NvAtomicOpFP16x2(uav, address, __fp32x2Tofp16x2(val), NV_EXTN_ATOM_MIN);
+}
+
+uint NvInterlockedMaxFp16x2(RWTexture1D uav, uint address, float2 val)
+{
+ return __NvAtomicOpFP16x2(uav, address, __fp32x2Tofp16x2(val), NV_EXTN_ATOM_MAX);
+}
+
+uint NvInterlockedAddFp16x2(RWTexture2D uav, uint2 address, float2 val)
+{
+ return __NvAtomicOpFP16x2(uav, address, __fp32x2Tofp16x2(val), NV_EXTN_ATOM_ADD);
+}
+
+uint NvInterlockedMinFp16x2(RWTexture2D uav, uint2 address, float2 val)
+{
+ return __NvAtomicOpFP16x2(uav, address, __fp32x2Tofp16x2(val), NV_EXTN_ATOM_MIN);
+}
+
+uint NvInterlockedMaxFp16x2(RWTexture2D uav, uint2 address, float2 val)
+{
+ return __NvAtomicOpFP16x2(uav, address, __fp32x2Tofp16x2(val), NV_EXTN_ATOM_MAX);
+}
+
+uint NvInterlockedAddFp16x2(RWTexture3D uav, uint3 address, float2 val)
+{
+ return __NvAtomicOpFP16x2(uav, address, __fp32x2Tofp16x2(val), NV_EXTN_ATOM_ADD);
+}
+
+uint NvInterlockedMinFp16x2(RWTexture3D uav, uint3 address, float2 val)
+{
+ return __NvAtomicOpFP16x2(uav, address, __fp32x2Tofp16x2(val), NV_EXTN_ATOM_MIN);
+}
+
+uint NvInterlockedMaxFp16x2(RWTexture3D uav, uint3 address, float2 val)
+{
+ return __NvAtomicOpFP16x2(uav, address, __fp32x2Tofp16x2(val), NV_EXTN_ATOM_MAX);
+}
+
+
+//----------------------------------------------------------------------------//
+
+// The functions below perform atomic operations on an R16G16B16A16_FLOAT UAV at the given address.
+// The uint2 parameter 'fp16x2Val' is treated as four fp16 values,
+// i.e., fp16x2Val.x = uav.xy and fp16x2Val.y = uav.zw.
+// The returned value is the four fp16 values (.xyzw components) packed into a uint2.
+// Warning: Behavior of this set of functions is undefined if the UAV is not
+// of R16G16B16A16_FLOAT format (might result in an app crash or TDR).
+
+uint2 NvInterlockedAddFp16x4(RWTexture1D uav, uint address, uint2 fp16x2Val)
+{
+ return __NvAtomicOpFP16x2(uav, address, fp16x2Val, NV_EXTN_ATOM_ADD);
+}
+
+uint2 NvInterlockedMinFp16x4(RWTexture1D uav, uint address, uint2 fp16x2Val)
+{
+ return __NvAtomicOpFP16x2(uav, address, fp16x2Val, NV_EXTN_ATOM_MIN);
+}
+
+uint2 NvInterlockedMaxFp16x4(RWTexture1D uav, uint address, uint2 fp16x2Val)
+{
+ return __NvAtomicOpFP16x2(uav, address, fp16x2Val, NV_EXTN_ATOM_MAX);
+}
+
+uint2 NvInterlockedAddFp16x4(RWTexture2D uav, uint2 address, uint2 fp16x2Val)
+{
+ return __NvAtomicOpFP16x2(uav, address, fp16x2Val, NV_EXTN_ATOM_ADD);
+}
+
+uint2 NvInterlockedMinFp16x4(RWTexture2D uav, uint2 address, uint2 fp16x2Val)
+{
+ return __NvAtomicOpFP16x2(uav, address, fp16x2Val, NV_EXTN_ATOM_MIN);
+}
+
+uint2 NvInterlockedMaxFp16x4(RWTexture2D uav, uint2 address, uint2 fp16x2Val)
+{
+ return __NvAtomicOpFP16x2(uav, address, fp16x2Val, NV_EXTN_ATOM_MAX);
+}
+
+uint2 NvInterlockedAddFp16x4(RWTexture3D uav, uint3 address, uint2 fp16x2Val)
+{
+ return __NvAtomicOpFP16x2(uav, address, fp16x2Val, NV_EXTN_ATOM_ADD);
+}
+
+uint2 NvInterlockedMinFp16x4(RWTexture3D uav, uint3 address, uint2 fp16x2Val)
+{
+ return __NvAtomicOpFP16x2(uav, address, fp16x2Val, NV_EXTN_ATOM_MIN);
+}
+
+uint2 NvInterlockedMaxFp16x4(RWTexture3D uav, uint3 address, uint2 fp16x2Val)
+{
+ return __NvAtomicOpFP16x2(uav, address, fp16x2Val, NV_EXTN_ATOM_MAX);
+}
+
+// versions taking four fp32 values (internally converted to fp16)
+uint2 NvInterlockedAddFp16x4(RWTexture1D uav, uint address, float4 val)
+{
+ return __NvAtomicOpFP16x2(uav, address, __fp32x4Tofp16x4(val), NV_EXTN_ATOM_ADD);
+}
+
+uint2 NvInterlockedMinFp16x4(RWTexture1D uav, uint address, float4 val)
+{
+ return __NvAtomicOpFP16x2(uav, address, __fp32x4Tofp16x4(val), NV_EXTN_ATOM_MIN);
+}
+
+uint2 NvInterlockedMaxFp16x4(RWTexture1D uav, uint address, float4 val)
+{
+ return __NvAtomicOpFP16x2(uav, address, __fp32x4Tofp16x4(val), NV_EXTN_ATOM_MAX);
+}
+
+uint2 NvInterlockedAddFp16x4(RWTexture2D uav, uint2 address, float4 val)
+{
+ return __NvAtomicOpFP16x2(uav, address, __fp32x4Tofp16x4(val), NV_EXTN_ATOM_ADD);
+}
+
+uint2 NvInterlockedMinFp16x4(RWTexture2D uav, uint2 address, float4 val)
+{
+ return __NvAtomicOpFP16x2(uav, address, __fp32x4Tofp16x4(val), NV_EXTN_ATOM_MIN);
+}
+
+uint2 NvInterlockedMaxFp16x4(RWTexture2D uav, uint2 address, float4 val)
+{
+ return __NvAtomicOpFP16x2(uav, address, __fp32x4Tofp16x4(val), NV_EXTN_ATOM_MAX);
+}
+
+uint2 NvInterlockedAddFp16x4(RWTexture3D uav, uint3 address, float4 val)
+{
+ return __NvAtomicOpFP16x2(uav, address, __fp32x4Tofp16x4(val), NV_EXTN_ATOM_ADD);
+}
+
+uint2 NvInterlockedMinFp16x4(RWTexture3D uav, uint3 address, float4 val)
+{
+ return __NvAtomicOpFP16x2(uav, address, __fp32x4Tofp16x4(val), NV_EXTN_ATOM_MIN);
+}
+
+uint2 NvInterlockedMaxFp16x4(RWTexture3D uav, uint3 address, float4 val)
+{
+ return __NvAtomicOpFP16x2(uav, address, __fp32x4Tofp16x4(val), NV_EXTN_ATOM_MAX);
+}
+
+
+//----------------------------------------------------------------------------//
+//----------------------------- FP32 Atomic Functions ------------------------//
+//----------------------------------------------------------------------------//
+
+// The functions below perform an atomic add on the given UAV, treating the value as a float.
+// byteAddress must be a multiple of 4.
+// The returned value is the value present in the memory location before the atomic add.
+
+float NvInterlockedAddFp32(RWByteAddressBuffer uav, uint byteAddress, float val)
+{
+ return __NvAtomicAddFP32(uav, byteAddress, val);
+}
+
+//----------------------------------------------------------------------------//
+
+// The functions below perform an atomic add on an R32_FLOAT UAV at the given address.
+// The returned value is the value before performing the atomic add.
+// Warning: Behavior of this set of functions is undefined if the UAV is not
+// of R32_FLOAT format (might result in an app crash or TDR).
+
+float NvInterlockedAddFp32(RWTexture1D uav, uint address, float val)
+{
+ return __NvAtomicAddFP32(uav, address, val);
+}
+
+float NvInterlockedAddFp32(RWTexture2D uav, uint2 address, float val)
+{
+ return __NvAtomicAddFP32(uav, address, val);
+}
+
+float NvInterlockedAddFp32(RWTexture3D uav, uint3 address, float val)
+{
+ return __NvAtomicAddFP32(uav, address, val);
+}
+
+
+//----------------------------------------------------------------------------//
+//--------------------------- UINT64 Atomic Functions ------------------------//
+//----------------------------------------------------------------------------//
+
+// The functions below perform atomic operations on the given UAV, treating the value as a uint64.
+// byteAddress must be a multiple of 8.
+// The returned value is the value present in the memory location before the atomic operation.
+// The uint2 vector type represents a single uint64 value, with the x component containing the low 32 bits and the y component the high 32 bits.
+
+uint2 NvInterlockedAddUint64(RWByteAddressBuffer uav, uint byteAddress, uint2 value)
+{
+ return __NvAtomicOpUINT64(uav, byteAddress, value, NV_EXTN_ATOM_ADD);
+}
+
+uint2 NvInterlockedMaxUint64(RWByteAddressBuffer uav, uint byteAddress, uint2 value)
+{
+ return __NvAtomicOpUINT64(uav, byteAddress, value, NV_EXTN_ATOM_MAX);
+}
+
+uint2 NvInterlockedMinUint64(RWByteAddressBuffer uav, uint byteAddress, uint2 value)
+{
+ return __NvAtomicOpUINT64(uav, byteAddress, value, NV_EXTN_ATOM_MIN);
+}
+
+uint2 NvInterlockedAndUint64(RWByteAddressBuffer uav, uint byteAddress, uint2 value)
+{
+ return __NvAtomicOpUINT64(uav, byteAddress, value, NV_EXTN_ATOM_AND);
+}
+
+uint2 NvInterlockedOrUint64(RWByteAddressBuffer uav, uint byteAddress, uint2 value)
+{
+ return __NvAtomicOpUINT64(uav, byteAddress, value, NV_EXTN_ATOM_OR);
+}
+
+uint2 NvInterlockedXorUint64(RWByteAddressBuffer uav, uint byteAddress, uint2 value)
+{
+ return __NvAtomicOpUINT64(uav, byteAddress, value, NV_EXTN_ATOM_XOR);
+}
+
+uint2 NvInterlockedCompareExchangeUint64(RWByteAddressBuffer uav, uint byteAddress, uint2 compare_value, uint2 value)
+{
+ return __NvAtomicCompareExchangeUINT64(uav, byteAddress, compare_value, value);
+}
+
+uint2 NvInterlockedExchangeUint64(RWByteAddressBuffer uav, uint byteAddress, uint2 value)
+{
+ return __NvAtomicOpUINT64(uav, byteAddress, value, NV_EXTN_ATOM_SWAP);
+}
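The uint2 encoding of a 64-bit value used by these functions (x = low 32 bits, y = high 32 bits) can be sketched in a few lines; a Python model of the split and reassembly:

```python
def uint64_to_uint2(value: int):
    # x holds the low 32 bits, y the high 32 bits, matching the convention above
    return (value & 0xFFFFFFFF, (value >> 32) & 0xFFFFFFFF)

def uint2_to_uint64(x: int, y: int) -> int:
    # Reassemble the 64-bit value from its two 32-bit halves
    return (y << 32) | x
```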
+
+//----------------------------------------------------------------------------//
+
+// The functions below perform atomic operations on an R32G32_UINT UAV at the given address, treating the value as a uint64.
+// The returned value is the value before performing the atomic operation.
+// The uint2 vector type represents a single uint64 value, with the x component containing the low 32 bits and the y component the high 32 bits.
+// Warning: Behavior of this set of functions is undefined if the UAV is not of R32G32_UINT format (might result in an app crash or TDR).
+
+uint2 NvInterlockedAddUint64(RWTexture1D uav, uint address, uint2 value)
+{
+ return __NvAtomicOpUINT64(uav, address, value, NV_EXTN_ATOM_ADD);
+}
+
+uint2 NvInterlockedMaxUint64(RWTexture1D uav, uint address, uint2 value)
+{
+ return __NvAtomicOpUINT64(uav, address, value, NV_EXTN_ATOM_MAX);
+}
+
+uint2 NvInterlockedMinUint64(RWTexture1D uav, uint address, uint2 value)
+{
+ return __NvAtomicOpUINT64(uav, address, value, NV_EXTN_ATOM_MIN);
+}
+
+uint2 NvInterlockedAndUint64(RWTexture1D uav, uint address, uint2 value)
+{
+ return __NvAtomicOpUINT64(uav, address, value, NV_EXTN_ATOM_AND);
+}
+
+uint2 NvInterlockedOrUint64(RWTexture1D uav, uint address, uint2 value)
+{
+ return __NvAtomicOpUINT64(uav, address, value, NV_EXTN_ATOM_OR);
+}
+
+uint2 NvInterlockedXorUint64(RWTexture1D uav, uint address, uint2 value)
+{
+ return __NvAtomicOpUINT64(uav, address, value, NV_EXTN_ATOM_XOR);
+}
+
+uint2 NvInterlockedCompareExchangeUint64(RWTexture1D uav, uint address, uint2 compare_value, uint2 value)
+{
+ return __NvAtomicCompareExchangeUINT64(uav, address, compare_value, value);
+}
+
+uint2 NvInterlockedExchangeUint64(RWTexture1D uav, uint address, uint2 value)
+{
+ return __NvAtomicOpUINT64(uav, address, value, NV_EXTN_ATOM_SWAP);
+}
+
+uint2 NvInterlockedAddUint64(RWTexture2D uav, uint2 address, uint2 value)
+{
+ return __NvAtomicOpUINT64(uav, address, value, NV_EXTN_ATOM_ADD);
+}
+
+uint2 NvInterlockedMaxUint64(RWTexture2D uav, uint2 address, uint2 value)
+{
+ return __NvAtomicOpUINT64(uav, address, value, NV_EXTN_ATOM_MAX);
+}
+
+uint2 NvInterlockedMinUint64(RWTexture2D uav, uint2 address, uint2 value)
+{
+ return __NvAtomicOpUINT64(uav, address, value, NV_EXTN_ATOM_MIN);
+}
+
+uint2 NvInterlockedAndUint64(RWTexture2D uav, uint2 address, uint2 value)
+{
+ return __NvAtomicOpUINT64(uav, address, value, NV_EXTN_ATOM_AND);
+}
+
+uint2 NvInterlockedOrUint64(RWTexture2D uav, uint2 address, uint2 value)
+{
+ return __NvAtomicOpUINT64(uav, address, value, NV_EXTN_ATOM_OR);
+}
+
+uint2 NvInterlockedXorUint64(RWTexture2D uav, uint2 address, uint2 value)
+{
+ return __NvAtomicOpUINT64(uav, address, value, NV_EXTN_ATOM_XOR);
+}
+
+uint2 NvInterlockedCompareExchangeUint64(RWTexture2D uav, uint2 address, uint2 compare_value, uint2 value)
+{
+ return __NvAtomicCompareExchangeUINT64(uav, address, compare_value, value);
+}
+
+uint2 NvInterlockedExchangeUint64(RWTexture2D uav, uint2 address, uint2 value)
+{
+ return __NvAtomicOpUINT64(uav, address, value, NV_EXTN_ATOM_SWAP);
+}
+
+uint2 NvInterlockedAddUint64(RWTexture3D uav, uint3 address, uint2 value)
+{
+ return __NvAtomicOpUINT64(uav, address, value, NV_EXTN_ATOM_ADD);
+}
+
+uint2 NvInterlockedMaxUint64(RWTexture3D uav, uint3 address, uint2 value)
+{
+ return __NvAtomicOpUINT64(uav, address, value, NV_EXTN_ATOM_MAX);
+}
+
+uint2 NvInterlockedMinUint64(RWTexture3D uav, uint3 address, uint2 value)
+{
+ return __NvAtomicOpUINT64(uav, address, value, NV_EXTN_ATOM_MIN);
+}
+
+uint2 NvInterlockedAndUint64(RWTexture3D uav, uint3 address, uint2 value)
+{
+ return __NvAtomicOpUINT64(uav, address, value, NV_EXTN_ATOM_AND);
+}
+
+uint2 NvInterlockedOrUint64(RWTexture3D uav, uint3 address, uint2 value)
+{
+ return __NvAtomicOpUINT64(uav, address, value, NV_EXTN_ATOM_OR);
+}
+
+uint2 NvInterlockedXorUint64(RWTexture3D uav, uint3 address, uint2 value)
+{
+ return __NvAtomicOpUINT64(uav, address, value, NV_EXTN_ATOM_XOR);
+}
+
+uint2 NvInterlockedCompareExchangeUint64(RWTexture3D uav, uint3 address, uint2 compare_value, uint2 value)
+{
+ return __NvAtomicCompareExchangeUINT64(uav, address, compare_value, value);
+}
+
+uint2 NvInterlockedExchangeUint64(RWTexture3D uav, uint3 address, uint2 value)
+{
+ return __NvAtomicOpUINT64(uav, address, value, NV_EXTN_ATOM_SWAP);
+}
+
+//----------------------------------------------------------------------------//
+//--------------------------- VPRS functions ---------------------------------//
+//----------------------------------------------------------------------------//
+
+// Returns the shading rate and the number of per-pixel shading passes for the current VPRS pixel
+uint3 NvGetShadingRate()
+{
+ uint3 shadingRate = (uint3)0;
+ uint index = g_NvidiaExt.IncrementCounter();
+ g_NvidiaExt[index].opcode = NV_EXTN_OP_GET_SHADING_RATE;
+ g_NvidiaExt[index].numOutputsForIncCounter = 3;
+ shadingRate.x = g_NvidiaExt.IncrementCounter();
+ shadingRate.y = g_NvidiaExt.IncrementCounter();
+ shadingRate.z = g_NvidiaExt.IncrementCounter();
+ return shadingRate;
+}
+
+float NvEvaluateAttributeAtSampleForVPRS(float attrib, uint sampleIndex, int2 pixelOffset)
+{
+ float value = (float)0;
+ uint ext = g_NvidiaExt.IncrementCounter();
+ g_NvidiaExt[ext].opcode = NV_EXTN_OP_VPRS_EVAL_ATTRIB_AT_SAMPLE;
+ g_NvidiaExt[ext].src0u.x = asuint(attrib.x);
+ g_NvidiaExt[ext].src1u.x = sampleIndex;
+ g_NvidiaExt[ext].src2u.xy = pixelOffset;
+ g_NvidiaExt[ext].numOutputsForIncCounter = 1;
+ value.x = asfloat(g_NvidiaExt.IncrementCounter());
+ return value;
+}
+
+float2 NvEvaluateAttributeAtSampleForVPRS(float2 attrib, uint sampleIndex, int2 pixelOffset)
+{
+ float2 value = (float2)0;
+ uint ext = g_NvidiaExt.IncrementCounter();
+ g_NvidiaExt[ext].opcode = NV_EXTN_OP_VPRS_EVAL_ATTRIB_AT_SAMPLE;
+ g_NvidiaExt[ext].src0u.xy = asuint(attrib.xy);
+ g_NvidiaExt[ext].src1u.x = sampleIndex;
+ g_NvidiaExt[ext].src2u.xy = pixelOffset;
+ g_NvidiaExt[ext].numOutputsForIncCounter = 2;
+ value.x = asfloat(g_NvidiaExt.IncrementCounter());
+ value.y = asfloat(g_NvidiaExt.IncrementCounter());
+ return value;
+}
+
+float3 NvEvaluateAttributeAtSampleForVPRS(float3 attrib, uint sampleIndex, int2 pixelOffset)
+{
+ float3 value = (float3)0;
+ uint ext = g_NvidiaExt.IncrementCounter();
+ g_NvidiaExt[ext].opcode = NV_EXTN_OP_VPRS_EVAL_ATTRIB_AT_SAMPLE;
+ g_NvidiaExt[ext].src0u.xyz = asuint(attrib.xyz);
+ g_NvidiaExt[ext].src1u.x = sampleIndex;
+ g_NvidiaExt[ext].src2u.xy = pixelOffset;
+ g_NvidiaExt[ext].numOutputsForIncCounter = 3;
+ value.x = asfloat(g_NvidiaExt.IncrementCounter());
+ value.y = asfloat(g_NvidiaExt.IncrementCounter());
+ value.z = asfloat(g_NvidiaExt.IncrementCounter());
+ return value;
+}
+
+float4 NvEvaluateAttributeAtSampleForVPRS(float4 attrib, uint sampleIndex, int2 pixelOffset)
+{
+ float4 value = (float4)0;
+ uint ext = g_NvidiaExt.IncrementCounter();
+ g_NvidiaExt[ext].opcode = NV_EXTN_OP_VPRS_EVAL_ATTRIB_AT_SAMPLE;
+ g_NvidiaExt[ext].src0u.xyzw = asuint(attrib.xyzw);
+ g_NvidiaExt[ext].src1u.x = sampleIndex;
+ g_NvidiaExt[ext].src2u.xy = pixelOffset;
+ g_NvidiaExt[ext].numOutputsForIncCounter = 4;
+ value.x = asfloat(g_NvidiaExt.IncrementCounter());
+ value.y = asfloat(g_NvidiaExt.IncrementCounter());
+ value.z = asfloat(g_NvidiaExt.IncrementCounter());
+ value.w = asfloat(g_NvidiaExt.IncrementCounter());
+ return value;
+}
+
+int NvEvaluateAttributeAtSampleForVPRS(int attrib, uint sampleIndex, int2 pixelOffset)
+{
+ int value = (int)0;
+ uint ext = g_NvidiaExt.IncrementCounter();
+ g_NvidiaExt[ext].opcode = NV_EXTN_OP_VPRS_EVAL_ATTRIB_AT_SAMPLE;
+ g_NvidiaExt[ext].src0u.x = asuint(attrib.x);
+ g_NvidiaExt[ext].src1u.x = sampleIndex;
+ g_NvidiaExt[ext].src2u.xy = pixelOffset;
+ g_NvidiaExt[ext].numOutputsForIncCounter = 1;
+ value.x = asint(g_NvidiaExt.IncrementCounter());
+ return value;
+}
+
+int2 NvEvaluateAttributeAtSampleForVPRS(int2 attrib, uint sampleIndex, int2 pixelOffset)
+{
+ int2 value = (int2)0;
+ uint ext = g_NvidiaExt.IncrementCounter();
+ g_NvidiaExt[ext].opcode = NV_EXTN_OP_VPRS_EVAL_ATTRIB_AT_SAMPLE;
+ g_NvidiaExt[ext].src0u.xy = asuint(attrib.xy);
+ g_NvidiaExt[ext].src1u.x = sampleIndex;
+ g_NvidiaExt[ext].src2u.xy = pixelOffset;
+ g_NvidiaExt[ext].numOutputsForIncCounter = 2;
+ value.x = asint(g_NvidiaExt.IncrementCounter());
+ value.y = asint(g_NvidiaExt.IncrementCounter());
+ return value;
+}
+
+int3 NvEvaluateAttributeAtSampleForVPRS(int3 attrib, uint sampleIndex, int2 pixelOffset)
+{
+ int3 value = (int3)0;
+ uint ext = g_NvidiaExt.IncrementCounter();
+ g_NvidiaExt[ext].opcode = NV_EXTN_OP_VPRS_EVAL_ATTRIB_AT_SAMPLE;
+ g_NvidiaExt[ext].src0u.xyz = asuint(attrib.xyz);
+ g_NvidiaExt[ext].src1u.x = sampleIndex;
+ g_NvidiaExt[ext].src2u.xy = pixelOffset;
+ g_NvidiaExt[ext].numOutputsForIncCounter = 3;
+ value.x = asint(g_NvidiaExt.IncrementCounter());
+ value.y = asint(g_NvidiaExt.IncrementCounter());
+ value.z = asint(g_NvidiaExt.IncrementCounter());
+ return value;
+}
+
+int4 NvEvaluateAttributeAtSampleForVPRS(int4 attrib, uint sampleIndex, int2 pixelOffset)
+{
+ int4 value = (int4)0;
+ uint ext = g_NvidiaExt.IncrementCounter();
+ g_NvidiaExt[ext].opcode = NV_EXTN_OP_VPRS_EVAL_ATTRIB_AT_SAMPLE;
+ g_NvidiaExt[ext].src0u.xyzw = asuint(attrib.xyzw);
+ g_NvidiaExt[ext].src1u.x = sampleIndex;
+ g_NvidiaExt[ext].src2u.xy = pixelOffset;
+ g_NvidiaExt[ext].numOutputsForIncCounter = 4;
+ value.x = asint(g_NvidiaExt.IncrementCounter());
+ value.y = asint(g_NvidiaExt.IncrementCounter());
+ value.z = asint(g_NvidiaExt.IncrementCounter());
+ value.w = asint(g_NvidiaExt.IncrementCounter());
+ return value;
+}
+
+uint NvEvaluateAttributeAtSampleForVPRS(uint attrib, uint sampleIndex, int2 pixelOffset)
+{
+ uint value = (uint)0;
+ uint ext = g_NvidiaExt.IncrementCounter();
+ g_NvidiaExt[ext].opcode = NV_EXTN_OP_VPRS_EVAL_ATTRIB_AT_SAMPLE;
+ g_NvidiaExt[ext].src0u.x = asuint(attrib.x);
+ g_NvidiaExt[ext].src1u.x = sampleIndex;
+ g_NvidiaExt[ext].src2u.xy = pixelOffset;
+ g_NvidiaExt[ext].numOutputsForIncCounter = 1;
+ value.x = asuint(g_NvidiaExt.IncrementCounter());
+ return value;
+}
+
+uint2 NvEvaluateAttributeAtSampleForVPRS(uint2 attrib, uint sampleIndex, int2 pixelOffset)
+{
+ uint2 value = (uint2)0;
+ uint ext = g_NvidiaExt.IncrementCounter();
+ g_NvidiaExt[ext].opcode = NV_EXTN_OP_VPRS_EVAL_ATTRIB_AT_SAMPLE;
+ g_NvidiaExt[ext].src0u.xy = asuint(attrib.xy);
+ g_NvidiaExt[ext].src1u.x = sampleIndex;
+ g_NvidiaExt[ext].src2u.xy = pixelOffset;
+ g_NvidiaExt[ext].numOutputsForIncCounter = 2;
+ value.x = asuint(g_NvidiaExt.IncrementCounter());
+ value.y = asuint(g_NvidiaExt.IncrementCounter());
+ return value;
+}
+
+uint3 NvEvaluateAttributeAtSampleForVPRS(uint3 attrib, uint sampleIndex, int2 pixelOffset)
+{
+ uint3 value = (uint3)0;
+ uint ext = g_NvidiaExt.IncrementCounter();
+ g_NvidiaExt[ext].opcode = NV_EXTN_OP_VPRS_EVAL_ATTRIB_AT_SAMPLE;
+ g_NvidiaExt[ext].src0u.xyz = asuint(attrib.xyz);
+ g_NvidiaExt[ext].src1u.x = sampleIndex;
+ g_NvidiaExt[ext].src2u.xy = pixelOffset;
+ g_NvidiaExt[ext].numOutputsForIncCounter = 3;
+ value.x = asuint(g_NvidiaExt.IncrementCounter());
+ value.y = asuint(g_NvidiaExt.IncrementCounter());
+ value.z = asuint(g_NvidiaExt.IncrementCounter());
+ return value;
+}
+
+uint4 NvEvaluateAttributeAtSampleForVPRS(uint4 attrib, uint sampleIndex, int2 pixelOffset)
+{
+ uint4 value = (uint4)0;
+ uint ext = g_NvidiaExt.IncrementCounter();
+ g_NvidiaExt[ext].opcode = NV_EXTN_OP_VPRS_EVAL_ATTRIB_AT_SAMPLE;
+ g_NvidiaExt[ext].src0u.xyzw = asuint(attrib.xyzw);
+ g_NvidiaExt[ext].src1u.x = sampleIndex;
+ g_NvidiaExt[ext].src2u.xy = pixelOffset;
+ g_NvidiaExt[ext].numOutputsForIncCounter = 4;
+ value.x = asuint(g_NvidiaExt.IncrementCounter());
+ value.y = asuint(g_NvidiaExt.IncrementCounter());
+ value.z = asuint(g_NvidiaExt.IncrementCounter());
+ value.w = asuint(g_NvidiaExt.IncrementCounter());
+ return value;
+}
+
+
+float NvEvaluateAttributeSnappedForVPRS(float attrib, uint2 offset)
+{
+ float value = (float)0;
+ uint ext = g_NvidiaExt.IncrementCounter();
+ g_NvidiaExt[ext].opcode = NV_EXTN_OP_VPRS_EVAL_ATTRIB_SNAPPED;
+ g_NvidiaExt[ext].src0u.x = asuint(attrib.x);
+ g_NvidiaExt[ext].src1u.xy = offset;
+ g_NvidiaExt[ext].numOutputsForIncCounter = 1;
+ value.x = asfloat(g_NvidiaExt.IncrementCounter());
+ return value;
+}
+
+float2 NvEvaluateAttributeSnappedForVPRS(float2 attrib, uint2 offset)
+{
+ float2 value = (float2)0;
+ uint ext = g_NvidiaExt.IncrementCounter();
+ g_NvidiaExt[ext].opcode = NV_EXTN_OP_VPRS_EVAL_ATTRIB_SNAPPED;
+ g_NvidiaExt[ext].src0u.xy = asuint(attrib.xy);
+ g_NvidiaExt[ext].src1u.xy = offset;
+ g_NvidiaExt[ext].numOutputsForIncCounter = 2;
+ value.x = asfloat(g_NvidiaExt.IncrementCounter());
+ value.y = asfloat(g_NvidiaExt.IncrementCounter());
+ return value;
+}
+
+float3 NvEvaluateAttributeSnappedForVPRS(float3 attrib, uint2 offset)
+{
+ float3 value = (float3)0;
+ uint ext = g_NvidiaExt.IncrementCounter();
+ g_NvidiaExt[ext].opcode = NV_EXTN_OP_VPRS_EVAL_ATTRIB_SNAPPED;
+ g_NvidiaExt[ext].src0u.xyz = asuint(attrib.xyz);
+ g_NvidiaExt[ext].src1u.xy = offset;
+ g_NvidiaExt[ext].numOutputsForIncCounter = 3;
+ value.x = asfloat(g_NvidiaExt.IncrementCounter());
+ value.y = asfloat(g_NvidiaExt.IncrementCounter());
+ value.z = asfloat(g_NvidiaExt.IncrementCounter());
+ return value;
+}
+
+float4 NvEvaluateAttributeSnappedForVPRS(float4 attrib, uint2 offset)
+{
+ float4 value = (float4)0;
+ uint ext = g_NvidiaExt.IncrementCounter();
+ g_NvidiaExt[ext].opcode = NV_EXTN_OP_VPRS_EVAL_ATTRIB_SNAPPED;
+ g_NvidiaExt[ext].src0u.xyzw = asuint(attrib.xyzw);
+ g_NvidiaExt[ext].src1u.xy = offset;
+ g_NvidiaExt[ext].numOutputsForIncCounter = 4;
+ value.x = asfloat(g_NvidiaExt.IncrementCounter());
+ value.y = asfloat(g_NvidiaExt.IncrementCounter());
+ value.z = asfloat(g_NvidiaExt.IncrementCounter());
+ value.w = asfloat(g_NvidiaExt.IncrementCounter());
+ return value;
+}
+
+int NvEvaluateAttributeSnappedForVPRS(int attrib, uint2 offset)
+{
+ int value = (int)0;
+ uint ext = g_NvidiaExt.IncrementCounter();
+ g_NvidiaExt[ext].opcode = NV_EXTN_OP_VPRS_EVAL_ATTRIB_SNAPPED;
+ g_NvidiaExt[ext].src0u.x = asuint(attrib.x);
+ g_NvidiaExt[ext].src1u.xy = offset;
+ g_NvidiaExt[ext].numOutputsForIncCounter = 1;
+ value.x = asint(g_NvidiaExt.IncrementCounter());
+ return value;
+}
+
+int2 NvEvaluateAttributeSnappedForVPRS(int2 attrib, uint2 offset)
+{
+ int2 value = (int2)0;
+ uint ext = g_NvidiaExt.IncrementCounter();
+ g_NvidiaExt[ext].opcode = NV_EXTN_OP_VPRS_EVAL_ATTRIB_SNAPPED;
+ g_NvidiaExt[ext].src0u.xy = asuint(attrib.xy);
+ g_NvidiaExt[ext].src1u.xy = offset;
+ g_NvidiaExt[ext].numOutputsForIncCounter = 2;
+ value.x = asint(g_NvidiaExt.IncrementCounter());
+ value.y = asint(g_NvidiaExt.IncrementCounter());
+ return value;
+}
+
+int3 NvEvaluateAttributeSnappedForVPRS(int3 attrib, uint2 offset)
+{
+ int3 value = (int3)0;
+ uint ext = g_NvidiaExt.IncrementCounter();
+ g_NvidiaExt[ext].opcode = NV_EXTN_OP_VPRS_EVAL_ATTRIB_SNAPPED;
+ g_NvidiaExt[ext].src0u.xyz = asuint(attrib.xyz);
+ g_NvidiaExt[ext].src1u.xy = offset;
+ g_NvidiaExt[ext].numOutputsForIncCounter = 3;
+ value.x = asint(g_NvidiaExt.IncrementCounter());
+ value.y = asint(g_NvidiaExt.IncrementCounter());
+ value.z = asint(g_NvidiaExt.IncrementCounter());
+ return value;
+}
+
+int4 NvEvaluateAttributeSnappedForVPRS(int4 attrib, uint2 offset)
+{
+ int4 value = (int4)0;
+ uint ext = g_NvidiaExt.IncrementCounter();
+ g_NvidiaExt[ext].opcode = NV_EXTN_OP_VPRS_EVAL_ATTRIB_SNAPPED;
+ g_NvidiaExt[ext].src0u.xyzw = asuint(attrib.xyzw);
+ g_NvidiaExt[ext].src1u.xy = offset;
+ g_NvidiaExt[ext].numOutputsForIncCounter = 4;
+ value.x = asint(g_NvidiaExt.IncrementCounter());
+ value.y = asint(g_NvidiaExt.IncrementCounter());
+ value.z = asint(g_NvidiaExt.IncrementCounter());
+ value.w = asint(g_NvidiaExt.IncrementCounter());
+ return value;
+}
+
+uint NvEvaluateAttributeSnappedForVPRS(uint attrib, uint2 offset)
+{
+ uint value = (uint)0;
+ uint ext = g_NvidiaExt.IncrementCounter();
+ g_NvidiaExt[ext].opcode = NV_EXTN_OP_VPRS_EVAL_ATTRIB_SNAPPED;
+ g_NvidiaExt[ext].src0u.x = asuint(attrib.x);
+ g_NvidiaExt[ext].src1u.xy = offset;
+ g_NvidiaExt[ext].numOutputsForIncCounter = 1;
+ value.x = asuint(g_NvidiaExt.IncrementCounter());
+ return value;
+}
+
+uint2 NvEvaluateAttributeSnappedForVPRS(uint2 attrib, uint2 offset)
+{
+ uint2 value = (uint2)0;
+ uint ext = g_NvidiaExt.IncrementCounter();
+ g_NvidiaExt[ext].opcode = NV_EXTN_OP_VPRS_EVAL_ATTRIB_SNAPPED;
+ g_NvidiaExt[ext].src0u.xy = asuint(attrib.xy);
+ g_NvidiaExt[ext].src1u.xy = offset;
+ g_NvidiaExt[ext].numOutputsForIncCounter = 2;
+ value.x = asuint(g_NvidiaExt.IncrementCounter());
+ value.y = asuint(g_NvidiaExt.IncrementCounter());
+ return value;
+}
+
+uint3 NvEvaluateAttributeSnappedForVPRS(uint3 attrib, uint2 offset)
+{
+ uint3 value = (uint3)0;
+ uint ext = g_NvidiaExt.IncrementCounter();
+ g_NvidiaExt[ext].opcode = NV_EXTN_OP_VPRS_EVAL_ATTRIB_SNAPPED;
+ g_NvidiaExt[ext].src0u.xyz = asuint(attrib.xyz);
+ g_NvidiaExt[ext].src1u.xy = offset;
+ g_NvidiaExt[ext].numOutputsForIncCounter = 3;
+ value.x = asuint(g_NvidiaExt.IncrementCounter());
+ value.y = asuint(g_NvidiaExt.IncrementCounter());
+ value.z = asuint(g_NvidiaExt.IncrementCounter());
+ return value;
+}
+
+uint4 NvEvaluateAttributeSnappedForVPRS(uint4 attrib, uint2 offset)
+{
+ uint4 value = (uint4)0;
+ uint ext = g_NvidiaExt.IncrementCounter();
+ g_NvidiaExt[ext].opcode = NV_EXTN_OP_VPRS_EVAL_ATTRIB_SNAPPED;
+ g_NvidiaExt[ext].src0u.xyzw = asuint(attrib.xyzw);
+ g_NvidiaExt[ext].src1u.xy = offset;
+ g_NvidiaExt[ext].numOutputsForIncCounter = 4;
+ value.x = asuint(g_NvidiaExt.IncrementCounter());
+ value.y = asuint(g_NvidiaExt.IncrementCounter());
+ value.z = asuint(g_NvidiaExt.IncrementCounter());
+ value.w = asuint(g_NvidiaExt.IncrementCounter());
+ return value;
+}
+
+// MATCH instruction variants
+uint NvWaveMatch(uint value)
+{
+ uint index = g_NvidiaExt.IncrementCounter();
+ g_NvidiaExt[index].src0u.x = value;
+ g_NvidiaExt[index].src1u.x = 1;
+ g_NvidiaExt[index].opcode = NV_EXTN_OP_MATCH_ANY;
+ // result is returned as the return value of IncrementCounter on fake UAV slot
+ return g_NvidiaExt.IncrementCounter();
+}
+
+uint NvWaveMatch(uint2 value)
+{
+ uint index = g_NvidiaExt.IncrementCounter();
+ g_NvidiaExt[index].src0u.xy = value.xy;
+ g_NvidiaExt[index].src1u.x = 2;
+ g_NvidiaExt[index].opcode = NV_EXTN_OP_MATCH_ANY;
+ // result is returned as the return value of IncrementCounter on fake UAV slot
+ return g_NvidiaExt.IncrementCounter();
+}
+
+uint NvWaveMatch(uint4 value)
+{
+ uint index = g_NvidiaExt.IncrementCounter();
+ g_NvidiaExt[index].src0u = value;
+ g_NvidiaExt[index].src1u.x = 4;
+ g_NvidiaExt[index].opcode = NV_EXTN_OP_MATCH_ANY;
+ // result is returned as the return value of IncrementCounter on fake UAV slot
+ return g_NvidiaExt.IncrementCounter();
+}
+
+uint NvWaveMatch(float value)
+{
+ uint index = g_NvidiaExt.IncrementCounter();
+ g_NvidiaExt[index].src0u.x = asuint(value);
+ g_NvidiaExt[index].src1u.x = 1;
+ g_NvidiaExt[index].opcode = NV_EXTN_OP_MATCH_ANY;
+ // result is returned as the return value of IncrementCounter on fake UAV slot
+ return g_NvidiaExt.IncrementCounter();
+}
+
+uint NvWaveMatch(float2 value)
+{
+ uint index = g_NvidiaExt.IncrementCounter();
+ g_NvidiaExt[index].src0u.xy = asuint(value);
+ g_NvidiaExt[index].src1u.x = 2;
+ g_NvidiaExt[index].opcode = NV_EXTN_OP_MATCH_ANY;
+ // result is returned as the return value of IncrementCounter on fake UAV slot
+ return g_NvidiaExt.IncrementCounter();
+}
+
+uint NvWaveMatch(float4 value)
+{
+ uint index = g_NvidiaExt.IncrementCounter();
+ g_NvidiaExt[index].src0u = asuint(value);
+ g_NvidiaExt[index].src1u.x = 4;
+ g_NvidiaExt[index].opcode = NV_EXTN_OP_MATCH_ANY;
+ // result is returned as the return value of IncrementCounter on fake UAV slot
+ return g_NvidiaExt.IncrementCounter();
+}
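The `NvWaveMatch` variants above all emit `NV_EXTN_OP_MATCH_ANY`, which (like CUDA's `__match_any_sync` and SM 6.5's `WaveMatch`) returns, for each calling lane, a bitmask of the active lanes whose input value equals that lane's value. A minimal host-side sketch of that semantics (the lane values are illustrative, not from the source):

```python
def wave_match(values):
    """CPU model of the MATCH_ANY semantics: for each lane, return a bitmask
    of the lanes whose value equals that lane's value.

    `values` holds one entry per active lane, indexed by lane ID.
    """
    masks = []
    for i, v in enumerate(values):
        mask = 0
        for j, w in enumerate(values):
            if w == v:
                mask |= 1 << j  # lane j holds the same value as lane i
        masks.append(mask)
    return masks

# Lanes 0 and 2 hold the same value, so each sees the other in its mask.
print(wave_match([7, 3, 7, 5]))  # [5, 2, 5, 8] i.e. [0b0101, 0b0010, 0b0101, 0b1000]
```

Lanes holding a unique value simply get a mask naming only themselves.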
+
+
+//----------------------------------------------------------------------------//
+//------------------------------ Footprint functions -------------------------//
+//----------------------------------------------------------------------------//
+// texSpace and smpSpace must be immediates; texIndex and smpIndex can be variable
+// offset must be an immediate
+// the required components of the location and offset fields can be filled depending on the dimension/type of the texture
+// texType should be one of 2D or 3D as defined in nvShaderExtnEnums.h and should be an immediate literal
+// if the above restrictions are not met, the behaviour of this instruction is undefined
+
+uint4 NvFootprintFine(uint texSpace, uint texIndex, uint smpSpace, uint smpIndex, uint texType, float3 location, uint gran, int3 offset = int3(0, 0, 0))
+{
+ return __NvFootprint(texSpace, texIndex, smpSpace, smpIndex, texType, location, NV_EXTN_FOOTPRINT_MODE_FINE, gran, offset);
+}
+
+uint4 NvFootprintCoarse(uint texSpace, uint texIndex, uint smpSpace, uint smpIndex, uint texType, float3 location, uint gran, int3 offset = int3(0, 0, 0))
+{
+ return __NvFootprint(texSpace, texIndex, smpSpace, smpIndex, texType, location, NV_EXTN_FOOTPRINT_MODE_COARSE, gran, offset);
+}
+
+
+
+uint4 NvFootprintFineBias(uint texSpace, uint texIndex, uint smpSpace, uint smpIndex, uint texType, float3 location, uint gran, float bias, int3 offset = int3(0, 0, 0))
+{
+ return __NvFootprintBias(texSpace, texIndex, smpSpace, smpIndex, texType, location, NV_EXTN_FOOTPRINT_MODE_FINE, gran, bias, offset);
+}
+
+uint4 NvFootprintCoarseBias(uint texSpace, uint texIndex, uint smpSpace, uint smpIndex, uint texType, float3 location, uint gran, float bias, int3 offset = int3(0, 0, 0))
+{
+ return __NvFootprintBias(texSpace, texIndex, smpSpace, smpIndex, texType, location, NV_EXTN_FOOTPRINT_MODE_COARSE, gran, bias, offset);
+}
+
+
+
+uint4 NvFootprintFineLevel(uint texSpace, uint texIndex, uint smpSpace, uint smpIndex, uint texType, float3 location, uint gran, float lodLevel, int3 offset = int3(0, 0, 0))
+{
+ return __NvFootprintLevel(texSpace, texIndex, smpSpace, smpIndex, texType, location, NV_EXTN_FOOTPRINT_MODE_FINE, gran, lodLevel, offset);
+}
+
+uint4 NvFootprintCoarseLevel(uint texSpace, uint texIndex, uint smpSpace, uint smpIndex, uint texType, float3 location, uint gran, float lodLevel, int3 offset = int3(0, 0, 0))
+{
+ return __NvFootprintLevel(texSpace, texIndex, smpSpace, smpIndex, texType, location, NV_EXTN_FOOTPRINT_MODE_COARSE, gran, lodLevel, offset);
+}
+
+
+
+uint4 NvFootprintFineGrad(uint texSpace, uint texIndex, uint smpSpace, uint smpIndex, uint texType, float3 location, uint gran, float3 ddx, float3 ddy, int3 offset = int3(0, 0, 0))
+{
+ return __NvFootprintGrad(texSpace, texIndex, smpSpace, smpIndex, texType, location, NV_EXTN_FOOTPRINT_MODE_FINE, gran, ddx, ddy, offset);
+}
+
+uint4 NvFootprintCoarseGrad(uint texSpace, uint texIndex, uint smpSpace, uint smpIndex, uint texType, float3 location, uint gran, float3 ddx, float3 ddy, int3 offset = int3(0, 0, 0))
+{
+ return __NvFootprintGrad(texSpace, texIndex, smpSpace, smpIndex, texType, location, NV_EXTN_FOOTPRINT_MODE_COARSE, gran, ddx, ddy, offset);
+}
+
+uint NvFootprintExtractLOD(uint4 blob)
+{
+ return ((blob.w & 0xF000) >> 12);
+}
+
+uint NvFootprintExtractReturnGran(uint4 blob)
+{
+ return ((blob.z & 0xF000000) >> 24);
+}
+
+uint2 NvFootprintExtractAnchorTileLoc2D(uint4 blob)
+{
+ uint2 loc;
+ loc.x = (blob.w & 0xFFF);
+ loc.y = (blob.z & 0xFFF);
+ return loc;
+}
+
+uint3 NvFootprintExtractAnchorTileLoc3D(uint4 blob)
+{
+ uint3 loc;
+ loc.x = (blob.w & 0xFFF);
+ loc.y = ((blob.w & 0xFFF0000) >> 16);
+ loc.z = (blob.z & 0x1FFF);
+ return loc;
+}
+
+uint2 NvFootprintExtractOffset2D(uint4 blob)
+{
+ uint2 loc;
+ loc.x = ((blob.z & 0x070000) >> 16);
+ loc.y = ((blob.z & 0x380000) >> 19);
+ return loc;
+}
+
+uint3 NvFootprintExtractOffset3D(uint4 blob)
+{
+ uint3 loc;
+ loc.x = ((blob.z & 0x030000) >> 16);
+ loc.y = ((blob.z & 0x0C0000) >> 18);
+ loc.z = ((blob.z & 0x300000) >> 20);
+ return loc;
+}
+
+uint2 NvFootprintExtractBitmask(uint4 blob)
+{
+ return blob.xy;
+}
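The extraction helpers above decode fixed bitfields from the `uint4` footprint blob. The same masks and shifts, mirrored in a small Python model (the field layout is taken directly from the HLSL above; the sample blob values are illustrative):

```python
def footprint_extract_lod(blob):
    """blob is a 4-tuple (x, y, z, w) of 32-bit words, matching uint4 blob."""
    x, y, z, w = blob
    return (w & 0xF000) >> 12          # 4-bit LOD in bits 12..15 of w

def footprint_extract_anchor_tile_loc_2d(blob):
    x, y, z, w = blob
    return (w & 0xFFF, z & 0xFFF)      # 12-bit tile coordinates

def footprint_extract_offset_2d(blob):
    x, y, z, w = blob
    return ((z & 0x070000) >> 16,      # 3-bit x offset in bits 16..18 of z
            (z & 0x380000) >> 19)      # 3-bit y offset in bits 19..21 of z

def footprint_extract_bitmask(blob):
    x, y, z, w = blob
    return (x, y)                      # 64-bit coverage bitmask in the low two words

# Pack a blob with LOD=5, anchor tile (0xABC, 0x123), offset (3, 5):
blob = (0xDEAD, 0xBEEF, 0x123 | (3 << 16) | (5 << 19), (5 << 12) | 0xABC)
print(footprint_extract_lod(blob))                  # 5
print(footprint_extract_anchor_tile_loc_2d(blob))   # (2748, 291)
```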
+
+
+// Variants of the Footprint extensions which return isSingleLod (out parameter)
+// isSingleLod = true -> this footprint request touched texels from only a single LOD.
+uint4 NvFootprintFine(uint texSpace, uint texIndex, uint smpSpace, uint smpIndex, uint texType, float3 location, uint gran, out uint isSingleLod, int3 offset = int3(0, 0, 0))
+{
+ uint4 res = __NvFootprint(texSpace, texIndex, smpSpace, smpIndex, texType, location, NV_EXTN_FOOTPRINT_MODE_FINE, gran, offset);
+ isSingleLod = __NvGetSpecial(NV_SPECIALOP_FOOTPRINT_SINGLELOD_PRED);
+ return res;
+}
+
+uint4 NvFootprintCoarse(uint texSpace, uint texIndex, uint smpSpace, uint smpIndex, uint texType, float3 location, uint gran, out uint isSingleLod, int3 offset = int3(0, 0, 0))
+{
+ uint4 res = __NvFootprint(texSpace, texIndex, smpSpace, smpIndex, texType, location, NV_EXTN_FOOTPRINT_MODE_COARSE, gran, offset);
+ isSingleLod = __NvGetSpecial(NV_SPECIALOP_FOOTPRINT_SINGLELOD_PRED);
+ return res;
+}
+
+
+
+uint4 NvFootprintFineBias(uint texSpace, uint texIndex, uint smpSpace, uint smpIndex, uint texType, float3 location, uint gran, float bias, out uint isSingleLod, int3 offset = int3(0, 0, 0))
+{
+ uint4 res = __NvFootprintBias(texSpace, texIndex, smpSpace, smpIndex, texType, location, NV_EXTN_FOOTPRINT_MODE_FINE, gran, bias, offset);
+ isSingleLod = __NvGetSpecial(NV_SPECIALOP_FOOTPRINT_SINGLELOD_PRED);
+ return res;
+}
+
+uint4 NvFootprintCoarseBias(uint texSpace, uint texIndex, uint smpSpace, uint smpIndex, uint texType, float3 location, uint gran, float bias, out uint isSingleLod, int3 offset = int3(0, 0, 0))
+{
+ uint4 res = __NvFootprintBias(texSpace, texIndex, smpSpace, smpIndex, texType, location, NV_EXTN_FOOTPRINT_MODE_COARSE, gran, bias, offset);
+ isSingleLod = __NvGetSpecial(NV_SPECIALOP_FOOTPRINT_SINGLELOD_PRED);
+ return res;
+}
+
+
+
+uint4 NvFootprintFineLevel(uint texSpace, uint texIndex, uint smpSpace, uint smpIndex, uint texType, float3 location, uint gran, float lodLevel, out uint isSingleLod, int3 offset = int3(0, 0, 0))
+{
+ uint4 res = __NvFootprintLevel(texSpace, texIndex, smpSpace, smpIndex, texType, location, NV_EXTN_FOOTPRINT_MODE_FINE, gran, lodLevel, offset);
+ isSingleLod = __NvGetSpecial(NV_SPECIALOP_FOOTPRINT_SINGLELOD_PRED);
+ return res;
+}
+
+uint4 NvFootprintCoarseLevel(uint texSpace, uint texIndex, uint smpSpace, uint smpIndex, uint texType, float3 location, uint gran, float lodLevel, out uint isSingleLod, int3 offset = int3(0, 0, 0))
+{
+ uint4 res = __NvFootprintLevel(texSpace, texIndex, smpSpace, smpIndex, texType, location, NV_EXTN_FOOTPRINT_MODE_COARSE, gran, lodLevel, offset);
+ isSingleLod = __NvGetSpecial(NV_SPECIALOP_FOOTPRINT_SINGLELOD_PRED);
+ return res;
+}
+
+
+
+uint4 NvFootprintFineGrad(uint texSpace, uint texIndex, uint smpSpace, uint smpIndex, uint texType, float3 location, uint gran, float3 ddx, float3 ddy, out uint isSingleLod, int3 offset = int3(0, 0, 0))
+{
+ uint4 res = __NvFootprintGrad(texSpace, texIndex, smpSpace, smpIndex, texType, location, NV_EXTN_FOOTPRINT_MODE_FINE, gran, ddx, ddy, offset);
+ isSingleLod = __NvGetSpecial(NV_SPECIALOP_FOOTPRINT_SINGLELOD_PRED);
+ return res;
+}
+
+uint4 NvFootprintCoarseGrad(uint texSpace, uint texIndex, uint smpSpace, uint smpIndex, uint texType, float3 location, uint gran, float3 ddx, float3 ddy, out uint isSingleLod, int3 offset = int3(0, 0, 0))
+{
+ uint4 res = __NvFootprintGrad(texSpace, texIndex, smpSpace, smpIndex, texType, location, NV_EXTN_FOOTPRINT_MODE_COARSE, gran, ddx, ddy, offset);
+ isSingleLod = __NvGetSpecial(NV_SPECIALOP_FOOTPRINT_SINGLELOD_PRED);
+ return res;
+}
+
+
+uint NvActiveThreads()
+{
+ return NvBallot(1);
+}
+
+
+//----------------------------------------------------------------------------//
+//------------------------------ WaveMultiPrefix functions -------------------//
+//----------------------------------------------------------------------------//
+
+// The following are the WaveMultiPrefix functions for different operations (Add, BitAnd, BitOr, BitXor) and datatypes (uint, uint2, uint4).
+// These functions implement multi-prefix operations among the set of active lanes in the current wave (warp).
+// A multi-prefix operation comprises a set of prefix operations, executed in parallel within subsets of lanes identified by the provided bitmasks.
+// These bitmasks represent a partitioning of the set of active lanes in the current wave into N groups (where N is the number of unique masks across all lanes in the wave).
+// N prefix operations are then performed, each within its corresponding group.
+// The groups are assumed to be non-intersecting (that is, a given lane can be a member of one and only one group),
+// and the bitmasks in all lanes belonging to the same group are required to be the same.
+// There are two types of functions: Exclusive and Inclusive prefix operations.
+// e.g. for the NvWaveMultiPrefixInclusiveAdd(val, mask) operation, within each group (the lanes for which the mask input is the same) the expected output is:
+// the i-th thread in a group has value = sum(values of threads 0 to i)
+// For the Exclusive version of the same operation:
+// the i-th thread in a group has value = sum(values of threads 0 to i-1), and the 0th thread in the group has value 0
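The grouping semantics described in the comments above can be modeled on the host. This sketch computes inclusive and exclusive prefix sums within each mask-defined group, relying on the stated requirement that all lanes of a group carry the same mask (the lane values and masks are illustrative):

```python
def multi_prefix_add(values, masks, inclusive=True):
    """Per-group prefix sums over wave lanes.

    values[i] is lane i's input; masks[i] is the bitmask naming lane i's group.
    Lanes sharing a mask form one group, and each group gets an independent
    prefix sum, as the HLSL comments describe.
    """
    out = []
    for i, (v, m) in enumerate(zip(values, masks)):
        # Sum over lower-indexed lanes in the same group (same mask value).
        total = sum(values[j] for j in range(i) if masks[j] == m)
        out.append(total + v if inclusive else total)
    return out

vals  = [1, 2, 3, 4]
masks = [0b0101, 0b1010, 0b0101, 0b1010]   # lanes {0,2} and {1,3} form two groups
print(multi_prefix_add(vals, masks))                   # [1, 2, 4, 6]
print(multi_prefix_add(vals, masks, inclusive=False))  # [0, 0, 1, 2]
```

The hardware versions below reach the same result in NV_WARP_SIZE_LOG2 shuffle steps per lane rather than a serial scan.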
+
+// Extensions for Add
+uint NvWaveMultiPrefixInclusiveAdd(uint val, uint mask)
+{
+ uint temp;
+ uint a = NvActiveThreads();
+ uint remainingThreads = a & __NvGetSpecial(NV_SPECIALOP_THREADLTMASK) & mask;
+ uint nextLane = firstbithigh(remainingThreads);
+ for (uint i = 0; i < NV_WARP_SIZE_LOG2; i++)
+ {
+ temp = NvShfl(val, nextLane);
+ uint laneValid;
+ // remainingThreads only contains threads in the group with smaller thread IDs than this thread's own,
+ // so nextLane can never be 31 for any thread in the group except the smallest one.
+ // For the smallest thread in the group, remainingThreads is 0, so nextLane is ~0 (i.e. 31, considering only the last 5 bits).
+ // Passing maskClampValue=30 to __NvShflGeneric therefore yields laneValid=false for the smallest thread in the group,
+ // so val and nextLane are only updated when laneValid is true.
+ uint newLane = asuint(__NvShflGeneric(nextLane, nextLane, 30, laneValid));
+ if (laneValid) // if nextLane's nextLane is valid
+ {
+ val = val + temp;
+ nextLane = newLane;
+ }
+ }
+ return val;
+}
+
+uint NvWaveMultiPrefixExclusiveAdd(uint val, uint mask)
+{
+ uint temp;
+ uint a = NvActiveThreads();
+ uint remainingThreads = a & __NvGetSpecial(NV_SPECIALOP_THREADLTMASK) & mask;
+ uint lane = firstbithigh(remainingThreads);
+ temp = NvShfl(val, lane);
+ val = remainingThreads != 0 ? temp : 0;
+ return NvWaveMultiPrefixInclusiveAdd(val, mask);
+}
+
+uint2 NvWaveMultiPrefixInclusiveAdd(uint2 val, uint mask)
+{
+ uint2 temp;
+ uint a = NvActiveThreads();
+ uint remainingThreads = a & __NvGetSpecial(NV_SPECIALOP_THREADLTMASK) & mask;
+ uint nextLane = firstbithigh(remainingThreads);
+ for (uint i = 0; i < NV_WARP_SIZE_LOG2; i++)
+ {
+ temp = NvShfl(val, nextLane);
+ uint laneValid;
+ uint newLane = asuint(__NvShflGeneric(nextLane, nextLane, 30, laneValid));
+ if (laneValid) // if nextLane's nextLane is valid
+ {
+ val = val + temp;
+ nextLane = newLane;
+ }
+ }
+ return val;
+}
+
+uint2 NvWaveMultiPrefixExclusiveAdd(uint2 val, uint mask)
+{
+ uint2 temp;
+ uint a = NvActiveThreads();
+ uint remainingThreads = a & __NvGetSpecial(NV_SPECIALOP_THREADLTMASK) & mask;
+ uint lane = firstbithigh(remainingThreads);
+ temp = NvShfl(val, lane);
+ val = remainingThreads != 0 ? temp : uint2(0, 0);
+ return NvWaveMultiPrefixInclusiveAdd(val, mask);
+}
+
+uint4 NvWaveMultiPrefixInclusiveAdd(uint4 val, uint mask)
+{
+ uint4 temp;
+ uint a = NvActiveThreads();
+ uint remainingThreads = a & __NvGetSpecial(NV_SPECIALOP_THREADLTMASK) & mask;
+ uint nextLane = firstbithigh(remainingThreads);
+ for (uint i = 0; i < NV_WARP_SIZE_LOG2; i++)
+ {
+ temp = NvShfl(val, nextLane);
+ uint laneValid;
+ uint newLane = asuint(__NvShflGeneric(nextLane, nextLane, 30, laneValid));
+ if (laneValid) // if nextLane's nextLane is valid
+ {
+ val = val + temp;
+ nextLane = newLane;
+ }
+ }
+ return val;
+}
+
+uint4 NvWaveMultiPrefixExclusiveAdd(uint4 val, uint mask)
+{
+ uint4 temp;
+ uint a = NvActiveThreads();
+ uint remainingThreads = a & __NvGetSpecial(NV_SPECIALOP_THREADLTMASK) & mask;
+ uint lane = firstbithigh(remainingThreads);
+ temp = NvShfl(val, lane);
+ val = remainingThreads != 0 ? temp : uint4(0, 0, 0, 0);
+ return NvWaveMultiPrefixInclusiveAdd(val, mask);
+}
+
+// MultiPrefix extensions for Bitand
+uint NvWaveMultiPrefixInclusiveAnd(uint val, uint mask)
+{
+ uint temp;
+ uint a = NvActiveThreads();
+ uint remainingThreads = a & __NvGetSpecial(NV_SPECIALOP_THREADLTMASK) & mask;
+ uint nextLane = firstbithigh(remainingThreads);
+ for (uint i = 0; i < NV_WARP_SIZE_LOG2; i++)
+ {
+ temp = NvShfl(val, nextLane);
+ uint laneValid;
+ uint newLane = asuint(__NvShflGeneric(nextLane, nextLane, 30, laneValid));
+ if (laneValid) // if nextLane's nextLane is valid
+ {
+ val = val & temp;
+ nextLane = newLane;
+ }
+ }
+ return val;
+}
+
+uint NvWaveMultiPrefixExclusiveAnd(uint val, uint mask)
+{
+ uint temp;
+ uint a = NvActiveThreads();
+ uint remainingThreads = a & __NvGetSpecial(NV_SPECIALOP_THREADLTMASK) & mask;
+ uint lane = firstbithigh(remainingThreads);
+ temp = NvShfl(val, lane);
+ val = remainingThreads != 0 ? temp : ~0;
+ return NvWaveMultiPrefixInclusiveAnd(val, mask);
+}
+
+uint2 NvWaveMultiPrefixInclusiveAnd(uint2 val, uint mask)
+{
+ uint2 temp;
+ uint a = NvActiveThreads();
+ uint remainingThreads = a & __NvGetSpecial(NV_SPECIALOP_THREADLTMASK) & mask;
+ uint nextLane = firstbithigh(remainingThreads);
+ for (uint i = 0; i < NV_WARP_SIZE_LOG2; i++)
+ {
+ temp = NvShfl(val, nextLane);
+ uint laneValid;
+ uint newLane = asuint(__NvShflGeneric(nextLane, nextLane, 30, laneValid));
+ if (laneValid) // if nextLane's nextLane is valid
+ {
+ val = val & temp;
+ nextLane = newLane;
+ }
+ }
+ return val;
+}
+
+uint2 NvWaveMultiPrefixExclusiveAnd(uint2 val, uint mask)
+{
+ uint2 temp;
+ uint a = NvActiveThreads();
+ uint remainingThreads = a & __NvGetSpecial(NV_SPECIALOP_THREADLTMASK) & mask;
+ uint lane = firstbithigh(remainingThreads);
+ temp = NvShfl(val, lane);
+ val = remainingThreads != 0 ? temp : uint2(~0, ~0);
+ return NvWaveMultiPrefixInclusiveAnd(val, mask);
+}
+
+
+uint4 NvWaveMultiPrefixInclusiveAnd(uint4 val, uint mask)
+{
+ uint4 temp;
+ uint a = NvActiveThreads();
+ uint remainingThreads = a & __NvGetSpecial(NV_SPECIALOP_THREADLTMASK) & mask;
+ uint nextLane = firstbithigh(remainingThreads);
+ for (uint i = 0; i < NV_WARP_SIZE_LOG2; i++)
+ {
+ temp = NvShfl(val, nextLane);
+ uint laneValid;
+ uint newLane = asuint(__NvShflGeneric(nextLane, nextLane, 30, laneValid));
+ if (laneValid) // if nextLane's nextLane is valid
+ {
+ val = val & temp;
+ nextLane = newLane;
+ }
+ }
+ return val;
+}
+
+uint4 NvWaveMultiPrefixExclusiveAnd(uint4 val, uint mask)
+{
+ uint4 temp;
+ uint a = NvActiveThreads();
+ uint remainingThreads = a & __NvGetSpecial(NV_SPECIALOP_THREADLTMASK) & mask;
+ uint lane = firstbithigh(remainingThreads);
+ temp = NvShfl(val, lane);
+ val = remainingThreads != 0 ? temp : uint4(~0, ~0, ~0, ~0);
+ return NvWaveMultiPrefixInclusiveAnd(val, mask);
+}
+
+
+// MultiPrefix extensions for BitOr
+uint NvWaveMultiPrefixInclusiveOr(uint val, uint mask)
+{
+ uint temp;
+ uint a = NvActiveThreads();
+ uint remainingThreads = a & __NvGetSpecial(NV_SPECIALOP_THREADLTMASK) & mask;
+ uint nextLane = firstbithigh(remainingThreads);
+ for (uint i = 0; i < NV_WARP_SIZE_LOG2; i++)
+ {
+ temp = NvShfl(val, nextLane);
+ uint laneValid;
+ uint newLane = asuint(__NvShflGeneric(nextLane, nextLane, 30, laneValid));
+ if (laneValid) // if nextLane's nextLane is valid
+ {
+ val = val | temp;
+ nextLane = newLane;
+ }
+ }
+ return val;
+}
+
+uint NvWaveMultiPrefixExclusiveOr(uint val, uint mask)
+{
+ uint temp;
+ uint a = NvActiveThreads();
+ uint remainingThreads = a & __NvGetSpecial(NV_SPECIALOP_THREADLTMASK) & mask;
+ uint lane = firstbithigh(remainingThreads);
+ temp = NvShfl(val, lane);
+ val = remainingThreads != 0 ? temp : 0;
+ return NvWaveMultiPrefixInclusiveOr(val, mask);
+}
+
+uint2 NvWaveMultiPrefixInclusiveOr(uint2 val, uint mask)
+{
+ uint2 temp;
+ uint a = NvActiveThreads();
+ uint remainingThreads = a & __NvGetSpecial(NV_SPECIALOP_THREADLTMASK) & mask;
+ uint nextLane = firstbithigh(remainingThreads);
+ for (uint i = 0; i < NV_WARP_SIZE_LOG2; i++)
+ {
+ temp = NvShfl(val, nextLane);
+ uint laneValid;
+ uint newLane = asuint(__NvShflGeneric(nextLane, nextLane, 30, laneValid));
+ if (laneValid) // if nextLane's nextLane is valid
+ {
+ val = val | temp;
+ nextLane = newLane;
+ }
+ }
+ return val;
+}
+
+uint2 NvWaveMultiPrefixExclusiveOr(uint2 val, uint mask)
+{
+ uint2 temp;
+ uint a = NvActiveThreads();
+ uint remainingThreads = a & __NvGetSpecial(NV_SPECIALOP_THREADLTMASK) & mask;
+ uint lane = firstbithigh(remainingThreads);
+ temp = NvShfl(val, lane);
+ val = remainingThreads != 0 ? temp : uint2(0, 0);
+ return NvWaveMultiPrefixInclusiveOr(val, mask);
+}
+
+
+uint4 NvWaveMultiPrefixInclusiveOr(uint4 val, uint mask)
+{
+ uint4 temp;
+ uint a = NvActiveThreads();
+ uint remainingThreads = a & __NvGetSpecial(NV_SPECIALOP_THREADLTMASK) & mask;
+ uint nextLane = firstbithigh(remainingThreads);
+ for (uint i = 0; i < NV_WARP_SIZE_LOG2; i++)
+ {
+ temp = NvShfl(val, nextLane);
+ uint laneValid;
+ uint newLane = asuint(__NvShflGeneric(nextLane, nextLane, 30, laneValid));
+ if (laneValid) // if nextLane's nextLane is valid
+ {
+ val = val | temp;
+ nextLane = newLane;
+ }
+ }
+ return val;
+}
+
+uint4 NvWaveMultiPrefixExclusiveOr(uint4 val, uint mask)
+{
+ uint4 temp;
+ uint a = NvActiveThreads();
+ uint remainingThreads = a & __NvGetSpecial(NV_SPECIALOP_THREADLTMASK) & mask;
+ uint lane = firstbithigh(remainingThreads);
+ temp = NvShfl(val, lane);
+ val = remainingThreads != 0 ? temp : uint4(0, 0, 0, 0);
+ return NvWaveMultiPrefixInclusiveOr(val, mask);
+}
+
+
+// MultiPrefix extensions for BitXOr
+uint NvWaveMultiPrefixInclusiveXOr(uint val, uint mask)
+{
+ uint temp;
+ uint a = NvActiveThreads();
+ uint remainingThreads = a & __NvGetSpecial(NV_SPECIALOP_THREADLTMASK) & mask;
+ uint nextLane = firstbithigh(remainingThreads);
+ for (uint i = 0; i < NV_WARP_SIZE_LOG2; i++)
+ {
+ temp = NvShfl(val, nextLane);
+ uint laneValid;
+ uint newLane = asuint(__NvShflGeneric(nextLane, nextLane, 30, laneValid));
+ if (laneValid) // if nextLane's nextLane is valid
+ {
+ val = val ^ temp;
+ nextLane = newLane;
+ }
+ }
+ return val;
+}
+
+uint NvWaveMultiPrefixExclusiveXOr(uint val, uint mask)
+{
+ uint temp;
+ uint a = NvActiveThreads();
+ uint remainingThreads = a & __NvGetSpecial(NV_SPECIALOP_THREADLTMASK) & mask;
+ uint lane = firstbithigh(remainingThreads);
+ temp = NvShfl(val, lane);
+ val = remainingThreads != 0 ? temp : 0;
+ return NvWaveMultiPrefixInclusiveXOr(val, mask);
+}
+
+uint2 NvWaveMultiPrefixInclusiveXOr(uint2 val, uint mask)
+{
+ uint2 temp;
+ uint a = NvActiveThreads();
+ uint remainingThreads = a & __NvGetSpecial(NV_SPECIALOP_THREADLTMASK) & mask;
+ uint nextLane = firstbithigh(remainingThreads);
+ for (uint i = 0; i < NV_WARP_SIZE_LOG2; i++)
+ {
+ temp = NvShfl(val, nextLane);
+ uint laneValid;
+ uint newLane = asuint(__NvShflGeneric(nextLane, nextLane, 30, laneValid));
+ if (laneValid) // if nextLane's nextLane is valid
+ {
+ val = val ^ temp;
+ nextLane = newLane;
+ }
+ }
+ return val;
+}
+
+uint2 NvWaveMultiPrefixExclusiveXOr(uint2 val, uint mask)
+{
+ uint2 temp;
+ uint a = NvActiveThreads();
+ uint remainingThreads = a & __NvGetSpecial(NV_SPECIALOP_THREADLTMASK) & mask;
+ uint lane = firstbithigh(remainingThreads);
+ temp = NvShfl(val, lane);
+ val = remainingThreads != 0 ? temp : uint2(0, 0);
+ return NvWaveMultiPrefixInclusiveXOr(val, mask);
+}
+
+
+uint4 NvWaveMultiPrefixInclusiveXOr(uint4 val, uint mask)
+{
+ uint4 temp;
+ uint a = NvActiveThreads();
+ uint remainingThreads = a & __NvGetSpecial(NV_SPECIALOP_THREADLTMASK) & mask;
+ uint nextLane = firstbithigh(remainingThreads);
+ for (uint i = 0; i < NV_WARP_SIZE_LOG2; i++)
+ {
+ temp = NvShfl(val, nextLane);
+ uint laneValid;
+ uint newLane = asuint(__NvShflGeneric(nextLane, nextLane, 30, laneValid));
+ if (laneValid) // if nextLane's nextLane is valid
+ {
+ val = val ^ temp;
+ nextLane = newLane;
+ }
+ }
+ return val;
+}
+
+uint4 NvWaveMultiPrefixExclusiveXOr(uint4 val, uint mask)
+{
+ uint4 temp;
+ uint a = NvActiveThreads();
+ uint remainingThreads = a & __NvGetSpecial(NV_SPECIALOP_THREADLTMASK) & mask;
+ uint lane = firstbithigh(remainingThreads);
+ temp = NvShfl(val, lane);
+ val = remainingThreads != 0 ? temp : uint4(0, 0, 0, 0);
+ return NvWaveMultiPrefixInclusiveXOr(val, mask);
+}
+
+//----------------------------------------------------------------------------//
+//------------------------- DXR HitObject Extension --------------------------//
+//----------------------------------------------------------------------------//
+
+// Support for templates in HLSL requires HLSL 2021+. When using dxc,
+// pass the -HV 2021 command line argument to enable this language version.
+#if defined(__HLSL_VERSION) && (__HLSL_VERSION >= 2021) && !defined(NV_HITOBJECT_USE_MACRO_API)
+
+struct NvHitObject {
+ uint _handle;
+
+ bool IsMiss()
+ {
+ uint index = g_NvidiaExt.IncrementCounter();
+ g_NvidiaExt[index].opcode = NV_EXTN_OP_HIT_OBJECT_IS_MISS;
+ g_NvidiaExt[index].src0u.x = _handle;
+ uint ret = g_NvidiaExt.IncrementCounter();
+ return ret != 0;
+ }
+
+ bool IsHit()
+ {
+ uint index = g_NvidiaExt.IncrementCounter();
+ g_NvidiaExt[index].opcode = NV_EXTN_OP_HIT_OBJECT_IS_HIT;
+ g_NvidiaExt[index].src0u.x = _handle;
+ uint ret = g_NvidiaExt.IncrementCounter();
+ return ret != 0;
+ }
+
+ bool IsNop()
+ {
+ uint index = g_NvidiaExt.IncrementCounter();
+ g_NvidiaExt[index].opcode = NV_EXTN_OP_HIT_OBJECT_IS_NOP;
+ g_NvidiaExt[index].src0u.x = _handle;
+ uint ret = g_NvidiaExt.IncrementCounter();
+ return ret != 0;
+ }
+
+ uint GetInstanceID()
+ {
+ uint index = g_NvidiaExt.IncrementCounter();
+ g_NvidiaExt[index].opcode = NV_EXTN_OP_HIT_OBJECT_GET_INSTANCE_ID;
+ g_NvidiaExt[index].src0u.x = _handle;
+ return g_NvidiaExt.IncrementCounter();
+ }
+
+ uint GetInstanceIndex()
+ {
+ uint index = g_NvidiaExt.IncrementCounter();
+ g_NvidiaExt[index].opcode = NV_EXTN_OP_HIT_OBJECT_GET_INSTANCE_INDEX;
+ g_NvidiaExt[index].src0u.x = _handle;
+ return g_NvidiaExt.IncrementCounter();
+ }
+
+ uint GetPrimitiveIndex()
+ {
+ uint index = g_NvidiaExt.IncrementCounter();
+ g_NvidiaExt[index].opcode = NV_EXTN_OP_HIT_OBJECT_GET_PRIMITIVE_INDEX;
+ g_NvidiaExt[index].src0u.x = _handle;
+ return g_NvidiaExt.IncrementCounter();
+ }
+
+ uint GetGeometryIndex()
+ {
+ uint index = g_NvidiaExt.IncrementCounter();
+ g_NvidiaExt[index].opcode = NV_EXTN_OP_HIT_OBJECT_GET_GEOMETRY_INDEX;
+ g_NvidiaExt[index].src0u.x = _handle;
+ return g_NvidiaExt.IncrementCounter();
+ }
+
+ uint GetHitKind()
+ {
+ uint index = g_NvidiaExt.IncrementCounter();
+ g_NvidiaExt[index].opcode = NV_EXTN_OP_HIT_OBJECT_GET_HIT_KIND;
+ g_NvidiaExt[index].src0u.x = _handle;
+ return g_NvidiaExt.IncrementCounter();
+ }
+
+ RayDesc GetRayDesc()
+ {
+ uint index = g_NvidiaExt.IncrementCounter();
+ g_NvidiaExt[index].opcode = NV_EXTN_OP_HIT_OBJECT_GET_RAY_DESC;
+ g_NvidiaExt[index].src0u.x = _handle;
+
+ uint tmin = g_NvidiaExt.IncrementCounter();
+ uint tmax = g_NvidiaExt.IncrementCounter();
+ uint rayOrgX = g_NvidiaExt.IncrementCounter();
+ uint rayOrgY = g_NvidiaExt.IncrementCounter();
+ uint rayOrgZ = g_NvidiaExt.IncrementCounter();
+ uint rayDirX = g_NvidiaExt.IncrementCounter();
+ uint rayDirY = g_NvidiaExt.IncrementCounter();
+ uint rayDirZ = g_NvidiaExt.IncrementCounter();
+
+ RayDesc ray;
+ ray.TMin = asfloat(tmin);
+ ray.TMax = asfloat(tmax);
+ ray.Origin.x = asfloat(rayOrgX);
+ ray.Origin.y = asfloat(rayOrgY);
+ ray.Origin.z = asfloat(rayOrgZ);
+ ray.Direction.x = asfloat(rayDirX);
+ ray.Direction.y = asfloat(rayDirY);
+ ray.Direction.z = asfloat(rayDirZ);
+
+ return ray;
+ }
+
+ template<typename T>
+ T GetAttributes()
+ {
+ uint index = g_NvidiaExt.IncrementCounter();
+ g_NvidiaExt[index].opcode = NV_EXTN_OP_HIT_OBJECT_GET_ATTRIBUTES;
+ g_NvidiaExt[index].src0u.x = _handle;
+ uint callHandle = g_NvidiaExt.IncrementCounter();
+
+ T attrs;
+ CallShader(callHandle, attrs);
+ return attrs;
+ }
+
+ uint GetShaderTableIndex()
+ {
+ uint index = g_NvidiaExt.IncrementCounter();
+ g_NvidiaExt[index].opcode = NV_EXTN_OP_HIT_OBJECT_GET_SHADER_TABLE_INDEX;
+ g_NvidiaExt[index].src0u.x = _handle;
+ return g_NvidiaExt.IncrementCounter();
+ }
+
+ uint LoadLocalRootTableConstant(uint RootConstantOffsetInBytes)
+ {
+ uint index = g_NvidiaExt.IncrementCounter();
+ g_NvidiaExt[index].opcode = NV_EXTN_OP_HIT_OBJECT_LOAD_LOCAL_ROOT_TABLE_CONSTANT;
+ g_NvidiaExt[index].src0u.x = _handle;
+ g_NvidiaExt[index].src0u.y = RootConstantOffsetInBytes;
+ return g_NvidiaExt.IncrementCounter();
+ }
+};
+
+template<typename T>
+NvHitObject NvTraceRayHitObject(
+ RaytracingAccelerationStructure AccelerationStructure,
+ uint RayFlags,
+ uint InstanceInclusionMask,
+ uint RayContributionToHitGroupIndex,
+ uint MultiplierForGeometryContributionToHitGroupIndex,
+ uint MissShaderIndex,
+ RayDesc Ray,
+ inout T Payload)
+{
+ uint index = g_NvidiaExt.IncrementCounter();
+ g_NvidiaExt[index].opcode = NV_EXTN_OP_HIT_OBJECT_TRACE_RAY;
+ g_NvidiaExt[index].numOutputsForIncCounter = 2;
+ g_NvidiaExt[index].src0u.x = MissShaderIndex;
+ uint hitHandle = g_NvidiaExt.IncrementCounter();
+ uint traceHandle = g_NvidiaExt.IncrementCounter();
+
+ TraceRay(AccelerationStructure, RayFlags, InstanceInclusionMask, RayContributionToHitGroupIndex, MultiplierForGeometryContributionToHitGroupIndex, traceHandle, Ray, Payload);
+
+ NvHitObject hitObj;
+ hitObj._handle = hitHandle;
+ return hitObj;
+}
+
+template<typename T>
+NvHitObject NvMakeHit(
+ RaytracingAccelerationStructure AccelerationStructure,
+ uint InstanceIndex,
+ uint GeometryIndex,
+ uint PrimitiveIndex,
+ uint HitKind,
+ uint RayContributionToHitGroupIndex,
+ uint MultiplierForGeometryContributionToHitGroupIndex,
+ RayDesc Ray,
+ T Attributes)
+{
+ uint index = g_NvidiaExt.IncrementCounter();
+ g_NvidiaExt[index].opcode = NV_EXTN_OP_HIT_OBJECT_MAKE_HIT;
+ g_NvidiaExt[index].numOutputsForIncCounter = 2;
+ g_NvidiaExt[index].src0u.x = InstanceIndex;
+ g_NvidiaExt[index].src0u.y = GeometryIndex;
+ g_NvidiaExt[index].src0u.z = PrimitiveIndex;
+ g_NvidiaExt[index].src0u.w = HitKind;
+ g_NvidiaExt[index].src1u.x = RayContributionToHitGroupIndex;
+ g_NvidiaExt[index].src1u.y = MultiplierForGeometryContributionToHitGroupIndex;
+ uint hitHandle = g_NvidiaExt.IncrementCounter();
+ uint traceHandle = g_NvidiaExt.IncrementCounter();
+
+ struct AttrWrapper { T Attrs; };
+ AttrWrapper wrapper;
+ wrapper.Attrs = Attributes;
+ CallShader(traceHandle, wrapper);
+
+ struct DummyPayload { int a; };
+ DummyPayload payload;
+ TraceRay(AccelerationStructure, 0, 0, 0, 0, traceHandle, Ray, payload);
+
+ NvHitObject hitObj;
+ hitObj._handle = hitHandle;
+ return hitObj;
+}
+
+template<typename T>
+NvHitObject NvMakeHitWithRecordIndex(
+ uint HitGroupRecordIndex,
+ RaytracingAccelerationStructure AccelerationStructure,
+ uint InstanceIndex,
+ uint GeometryIndex,
+ uint PrimitiveIndex,
+ uint HitKind,
+ RayDesc Ray,
+ T Attributes)
+{
+ uint index = g_NvidiaExt.IncrementCounter();
+ g_NvidiaExt[index].opcode = NV_EXTN_OP_HIT_OBJECT_MAKE_HIT_WITH_RECORD_INDEX;
+ g_NvidiaExt[index].numOutputsForIncCounter = 2;
+ g_NvidiaExt[index].src0u.x = InstanceIndex;
+ g_NvidiaExt[index].src0u.y = GeometryIndex;
+ g_NvidiaExt[index].src0u.z = PrimitiveIndex;
+ g_NvidiaExt[index].src0u.w = HitKind;
+ g_NvidiaExt[index].src1u.x = HitGroupRecordIndex;
+ uint hitHandle = g_NvidiaExt.IncrementCounter();
+ uint traceHandle = g_NvidiaExt.IncrementCounter();
+
+ struct AttrWrapper { T Attrs; };
+ AttrWrapper wrapper;
+ wrapper.Attrs = Attributes;
+ CallShader(traceHandle, wrapper);
+
+ struct DummyPayload { int a; };
+ DummyPayload payload;
+ TraceRay(AccelerationStructure, 0, 0, 0, 0, traceHandle, Ray, payload);
+
+ NvHitObject hitObj;
+ hitObj._handle = hitHandle;
+ return hitObj;
+}
+
+NvHitObject NvMakeMiss(
+ uint MissShaderIndex,
+ RayDesc Ray)
+{
+ uint index = g_NvidiaExt.IncrementCounter();
+ g_NvidiaExt[index].opcode = NV_EXTN_OP_HIT_OBJECT_MAKE_MISS;
+ g_NvidiaExt[index].src0u.x = MissShaderIndex;
+ g_NvidiaExt[index].src0u.y = asuint(Ray.TMin);
+ g_NvidiaExt[index].src0u.z = asuint(Ray.TMax);
+ g_NvidiaExt[index].src1u.x = asuint(Ray.Origin.x);
+ g_NvidiaExt[index].src1u.y = asuint(Ray.Origin.y);
+ g_NvidiaExt[index].src1u.z = asuint(Ray.Origin.z);
+ g_NvidiaExt[index].src2u.x = asuint(Ray.Direction.x);
+ g_NvidiaExt[index].src2u.y = asuint(Ray.Direction.y);
+ g_NvidiaExt[index].src2u.z = asuint(Ray.Direction.z);
+ uint hitHandle = g_NvidiaExt.IncrementCounter();
+
+ NvHitObject hitObj;
+ hitObj._handle = hitHandle;
+ return hitObj;
+}
+
+NvHitObject NvMakeNop()
+{
+ uint index = g_NvidiaExt.IncrementCounter();
+ g_NvidiaExt[index].opcode = NV_EXTN_OP_HIT_OBJECT_MAKE_NOP;
+ uint hitHandle = g_NvidiaExt.IncrementCounter();
+
+ NvHitObject hitObj;
+ hitObj._handle = hitHandle;
+ return hitObj;
+}
+
+void NvReorderThread(uint CoherenceHint, uint NumCoherenceHintBits)
+{
+ uint index = g_NvidiaExt.IncrementCounter();
+ g_NvidiaExt[index].opcode = NV_EXTN_OP_HIT_OBJECT_REORDER_THREAD;
+ g_NvidiaExt[index].src0u.x = 0;
+ g_NvidiaExt[index].src0u.y = 0;
+ g_NvidiaExt[index].src0u.z = CoherenceHint;
+ g_NvidiaExt[index].src0u.w = NumCoherenceHintBits;
+ g_NvidiaExt.IncrementCounter();
+}
+
+void NvReorderThread(NvHitObject HitObj, uint CoherenceHint, uint NumCoherenceHintBits)
+{
+ uint index = g_NvidiaExt.IncrementCounter();
+ g_NvidiaExt[index].opcode = NV_EXTN_OP_HIT_OBJECT_REORDER_THREAD;
+ g_NvidiaExt[index].src0u.x = 1;
+ g_NvidiaExt[index].src0u.y = HitObj._handle;
+ g_NvidiaExt[index].src0u.z = CoherenceHint;
+ g_NvidiaExt[index].src0u.w = NumCoherenceHintBits;
+ g_NvidiaExt.IncrementCounter();
+}
+
+void NvReorderThread(NvHitObject HitObj)
+{
+ NvReorderThread(HitObj, 0, 0);
+}
+
+template<typename T>
+void NvInvokeHitObject(
+ RaytracingAccelerationStructure AccelerationStructure,
+ NvHitObject HitObj,
+ inout T Payload)
+{
+ uint index = g_NvidiaExt.IncrementCounter();
+ g_NvidiaExt[index].opcode = NV_EXTN_OP_HIT_OBJECT_INVOKE;
+ g_NvidiaExt[index].src0u.x = HitObj._handle;
+ uint handle = g_NvidiaExt.IncrementCounter();
+
+ TraceRay(AccelerationStructure, 0, 0, 0, 0, handle, (RayDesc)0, Payload);
+}
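+
+// Illustrative usage sketch of the template API above (SceneBVH, ray, payload,
+// and materialID are hypothetical application values, not part of the API):
+// trace into a hit object, reorder threads by hit state plus a 4-bit coherence
+// hint, then invoke the hit object's shader:
+//
+//   NvHitObject hitObj = NvTraceRayHitObject(SceneBVH, RAY_FLAG_NONE, 0xFF,
+//                                            0, 1, 0, ray, payload);
+//   NvReorderThread(hitObj, materialID, 4);
+//   NvInvokeHitObject(SceneBVH, hitObj, payload);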
+
+// Macro-based version of the HitObject API. Use this when HLSL 2021 is not available.
+// Enable by specifying #define NV_HITOBJECT_USE_MACRO_API before including this header.
+#elif defined(NV_HITOBJECT_USE_MACRO_API)
+
+struct NvHitObject {
+ uint _handle;
+
+ bool IsMiss()
+ {
+ uint index = g_NvidiaExt.IncrementCounter();
+ g_NvidiaExt[index].opcode = NV_EXTN_OP_HIT_OBJECT_IS_MISS;
+ g_NvidiaExt[index].src0u.x = _handle;
+ uint ret = g_NvidiaExt.IncrementCounter();
+ return ret != 0;
+ }
+
+ bool IsHit()
+ {
+ uint index = g_NvidiaExt.IncrementCounter();
+ g_NvidiaExt[index].opcode = NV_EXTN_OP_HIT_OBJECT_IS_HIT;
+ g_NvidiaExt[index].src0u.x = _handle;
+ uint ret = g_NvidiaExt.IncrementCounter();
+ return ret != 0;
+ }
+
+ bool IsNop()
+ {
+ uint index = g_NvidiaExt.IncrementCounter();
+ g_NvidiaExt[index].opcode = NV_EXTN_OP_HIT_OBJECT_IS_NOP;
+ g_NvidiaExt[index].src0u.x = _handle;
+ uint ret = g_NvidiaExt.IncrementCounter();
+ return ret != 0;
+ }
+
+ uint GetInstanceID()
+ {
+ uint index = g_NvidiaExt.IncrementCounter();
+ g_NvidiaExt[index].opcode = NV_EXTN_OP_HIT_OBJECT_GET_INSTANCE_ID;
+ g_NvidiaExt[index].src0u.x = _handle;
+ return g_NvidiaExt.IncrementCounter();
+ }
+
+ uint GetInstanceIndex()
+ {
+ uint index = g_NvidiaExt.IncrementCounter();
+ g_NvidiaExt[index].opcode = NV_EXTN_OP_HIT_OBJECT_GET_INSTANCE_INDEX;
+ g_NvidiaExt[index].src0u.x = _handle;
+ return g_NvidiaExt.IncrementCounter();
+ }
+
+ uint GetPrimitiveIndex()
+ {
+ uint index = g_NvidiaExt.IncrementCounter();
+ g_NvidiaExt[index].opcode = NV_EXTN_OP_HIT_OBJECT_GET_PRIMITIVE_INDEX;
+ g_NvidiaExt[index].src0u.x = _handle;
+ return g_NvidiaExt.IncrementCounter();
+ }
+
+ uint GetGeometryIndex()
+ {
+ uint index = g_NvidiaExt.IncrementCounter();
+ g_NvidiaExt[index].opcode = NV_EXTN_OP_HIT_OBJECT_GET_GEOMETRY_INDEX;
+ g_NvidiaExt[index].src0u.x = _handle;
+ return g_NvidiaExt.IncrementCounter();
+ }
+
+ uint GetHitKind()
+ {
+ uint index = g_NvidiaExt.IncrementCounter();
+ g_NvidiaExt[index].opcode = NV_EXTN_OP_HIT_OBJECT_GET_HIT_KIND;
+ g_NvidiaExt[index].src0u.x = _handle;
+ return g_NvidiaExt.IncrementCounter();
+ }
+
+ RayDesc GetRayDesc()
+ {
+ uint index = g_NvidiaExt.IncrementCounter();
+ g_NvidiaExt[index].opcode = NV_EXTN_OP_HIT_OBJECT_GET_RAY_DESC;
+ g_NvidiaExt[index].src0u.x = _handle;
+
+ uint tmin = g_NvidiaExt.IncrementCounter();
+ uint tmax = g_NvidiaExt.IncrementCounter();
+ uint rayOrgX = g_NvidiaExt.IncrementCounter();
+ uint rayOrgY = g_NvidiaExt.IncrementCounter();
+ uint rayOrgZ = g_NvidiaExt.IncrementCounter();
+ uint rayDirX = g_NvidiaExt.IncrementCounter();
+ uint rayDirY = g_NvidiaExt.IncrementCounter();
+ uint rayDirZ = g_NvidiaExt.IncrementCounter();
+
+ RayDesc ray;
+ ray.TMin = asfloat(tmin);
+ ray.TMax = asfloat(tmax);
+ ray.Origin.x = asfloat(rayOrgX);
+ ray.Origin.y = asfloat(rayOrgY);
+ ray.Origin.z = asfloat(rayOrgZ);
+ ray.Direction.x = asfloat(rayDirX);
+ ray.Direction.y = asfloat(rayDirY);
+ ray.Direction.z = asfloat(rayDirZ);
+
+ return ray;
+ }
+
+ uint GetShaderTableIndex()
+ {
+ uint index = g_NvidiaExt.IncrementCounter();
+ g_NvidiaExt[index].opcode = NV_EXTN_OP_HIT_OBJECT_GET_SHADER_TABLE_INDEX;
+ g_NvidiaExt[index].src0u.x = _handle;
+ return g_NvidiaExt.IncrementCounter();
+ }
+
+ uint LoadLocalRootTableConstant(uint RootConstantOffsetInBytes)
+ {
+ uint index = g_NvidiaExt.IncrementCounter();
+ g_NvidiaExt[index].opcode = NV_EXTN_OP_HIT_OBJECT_LOAD_LOCAL_ROOT_TABLE_CONSTANT;
+ g_NvidiaExt[index].src0u.x = _handle;
+ g_NvidiaExt[index].src0u.y = RootConstantOffsetInBytes;
+ return g_NvidiaExt.IncrementCounter();
+ }
+};
+
+#define NvTraceRayHitObject(AccelerationStructure,RayFlags,InstanceInclusionMask,RayContributionToHitGroupIndex,MultiplierForGeometryContributionToHitGroupIndex,MissShaderIndex,Ray,Payload,ResultHitObj) \
+do { \
+ uint _rayFlags = RayFlags; \
+ uint _instanceInclusionMask = InstanceInclusionMask; \
+ uint _rayContributionToHitGroupIndex = RayContributionToHitGroupIndex; \
+ uint _multiplierForGeometryContributionToHitGroupIndex = MultiplierForGeometryContributionToHitGroupIndex; \
+ uint _missShaderIndex = MissShaderIndex; \
+ RayDesc _ray = Ray; \
+ uint _index = g_NvidiaExt.IncrementCounter(); \
+ g_NvidiaExt[_index].opcode = NV_EXTN_OP_HIT_OBJECT_TRACE_RAY; \
+ g_NvidiaExt[_index].numOutputsForIncCounter = 2; \
+ g_NvidiaExt[_index].src0u.x = _missShaderIndex; \
+ uint _hitHandle = g_NvidiaExt.IncrementCounter(); \
+ uint _traceHandle = g_NvidiaExt.IncrementCounter(); \
+ TraceRay(AccelerationStructure, _rayFlags, _instanceInclusionMask, _rayContributionToHitGroupIndex, _multiplierForGeometryContributionToHitGroupIndex, _traceHandle, _ray, Payload); \
+ ResultHitObj._handle = _hitHandle; \
+} while(0)
+
+struct NvHitObjectMacroDummyPayloadType { int a; };
+
+#define NvMakeHit(AccelerationStructure,InstanceIndex,GeometryIndex,PrimitiveIndex,HitKind,RayContributionToHitGroupIndex,MultiplierForGeometryContributionToHitGroupIndex,Ray,Attributes,ResultHitObj) \
+do { \
+ uint _instanceIndex = InstanceIndex; \
+ uint _geometryIndex = GeometryIndex; \
+ uint _primitiveIndex = PrimitiveIndex; \
+ uint _hitKind = HitKind; \
+ uint _rayContributionToHitGroupIndex = RayContributionToHitGroupIndex; \
+ uint _multiplierForGeometryContributionToHitGroupIndex = MultiplierForGeometryContributionToHitGroupIndex; \
+ RayDesc _ray = Ray; \
+ uint _index = g_NvidiaExt.IncrementCounter(); \
+ g_NvidiaExt[_index].opcode = NV_EXTN_OP_HIT_OBJECT_MAKE_HIT; \
+ g_NvidiaExt[_index].numOutputsForIncCounter = 2; \
+ g_NvidiaExt[_index].src0u.x = _instanceIndex; \
+ g_NvidiaExt[_index].src0u.y = _geometryIndex; \
+ g_NvidiaExt[_index].src0u.z = _primitiveIndex; \
+ g_NvidiaExt[_index].src0u.w = _hitKind; \
+ g_NvidiaExt[_index].src1u.x = _rayContributionToHitGroupIndex; \
+ g_NvidiaExt[_index].src1u.y = _multiplierForGeometryContributionToHitGroupIndex; \
+ uint _hitHandle = g_NvidiaExt.IncrementCounter(); \
+ uint _traceHandle = g_NvidiaExt.IncrementCounter(); \
+ CallShader(_traceHandle, Attributes); \
+ NvHitObjectMacroDummyPayloadType _payload; \
+ TraceRay(AccelerationStructure, 0, 0, 0, 0, _traceHandle, _ray, _payload); \
+ ResultHitObj._handle = _hitHandle; \
+} while(0)
+
+#define NvMakeHitWithRecordIndex(HitGroupRecordIndex,AccelerationStructure,InstanceIndex,GeometryIndex,PrimitiveIndex,HitKind,Ray,Attributes,ResultHitObj) \
+do { \
+ uint _hitGroupRecordIndex = HitGroupRecordIndex; \
+ uint _instanceIndex = InstanceIndex; \
+ uint _geometryIndex = GeometryIndex; \
+ uint _primitiveIndex = PrimitiveIndex; \
+ uint _hitKind = HitKind; \
+ RayDesc _ray = Ray; \
+ uint _index = g_NvidiaExt.IncrementCounter(); \
+ g_NvidiaExt[_index].opcode = NV_EXTN_OP_HIT_OBJECT_MAKE_HIT_WITH_RECORD_INDEX; \
+ g_NvidiaExt[_index].numOutputsForIncCounter = 2; \
+ g_NvidiaExt[_index].src0u.x = _instanceIndex; \
+ g_NvidiaExt[_index].src0u.y = _geometryIndex; \
+ g_NvidiaExt[_index].src0u.z = _primitiveIndex; \
+ g_NvidiaExt[_index].src0u.w = _hitKind; \
+ g_NvidiaExt[_index].src1u.x = _hitGroupRecordIndex; \
+ uint _hitHandle = g_NvidiaExt.IncrementCounter(); \
+ uint _traceHandle = g_NvidiaExt.IncrementCounter(); \
+ CallShader(_traceHandle, Attributes); \
+ NvHitObjectMacroDummyPayloadType _payload; \
+ TraceRay(AccelerationStructure, 0, 0, 0, 0, _traceHandle, _ray, _payload); \
+ ResultHitObj._handle = _hitHandle; \
+} while(0)
+
+NvHitObject NvMakeMiss(
+ uint MissShaderIndex,
+ RayDesc Ray)
+{
+ uint index = g_NvidiaExt.IncrementCounter();
+ g_NvidiaExt[index].opcode = NV_EXTN_OP_HIT_OBJECT_MAKE_MISS;
+ g_NvidiaExt[index].src0u.x = MissShaderIndex;
+ g_NvidiaExt[index].src0u.y = asuint(Ray.TMin);
+ g_NvidiaExt[index].src0u.z = asuint(Ray.TMax);
+ g_NvidiaExt[index].src1u.x = asuint(Ray.Origin.x);
+ g_NvidiaExt[index].src1u.y = asuint(Ray.Origin.y);
+ g_NvidiaExt[index].src1u.z = asuint(Ray.Origin.z);
+ g_NvidiaExt[index].src2u.x = asuint(Ray.Direction.x);
+ g_NvidiaExt[index].src2u.y = asuint(Ray.Direction.y);
+ g_NvidiaExt[index].src2u.z = asuint(Ray.Direction.z);
+ uint hitHandle = g_NvidiaExt.IncrementCounter();
+
+ NvHitObject hitObj;
+ hitObj._handle = hitHandle;
+ return hitObj;
+}
+
+NvHitObject NvMakeNop()
+{
+ uint index = g_NvidiaExt.IncrementCounter();
+ g_NvidiaExt[index].opcode = NV_EXTN_OP_HIT_OBJECT_MAKE_NOP;
+ uint hitHandle = g_NvidiaExt.IncrementCounter();
+
+ NvHitObject hitObj;
+ hitObj._handle = hitHandle;
+ return hitObj;
+}
+
+#define NvGetAttributesFromHitObject(HitObj,ResultAttributes) \
+do { \
+ uint _index = g_NvidiaExt.IncrementCounter(); \
+ g_NvidiaExt[_index].opcode = NV_EXTN_OP_HIT_OBJECT_GET_ATTRIBUTES; \
+ g_NvidiaExt[_index].src0u.x = HitObj._handle; \
+ uint _callHandle = g_NvidiaExt.IncrementCounter(); \
+ CallShader(_callHandle, ResultAttributes); \
+} while(0)
+
+void NvReorderThread(uint CoherenceHint, uint NumCoherenceHintBits)
+{
+ uint index = g_NvidiaExt.IncrementCounter();
+ g_NvidiaExt[index].opcode = NV_EXTN_OP_HIT_OBJECT_REORDER_THREAD;
+ g_NvidiaExt[index].src0u.x = 0;
+ g_NvidiaExt[index].src0u.y = 0;
+ g_NvidiaExt[index].src0u.z = CoherenceHint;
+ g_NvidiaExt[index].src0u.w = NumCoherenceHintBits;
+ g_NvidiaExt.IncrementCounter();
+}
+
+void NvReorderThread(NvHitObject HitObj, uint CoherenceHint, uint NumCoherenceHintBits)
+{
+ uint index = g_NvidiaExt.IncrementCounter();
+ g_NvidiaExt[index].opcode = NV_EXTN_OP_HIT_OBJECT_REORDER_THREAD;
+ g_NvidiaExt[index].src0u.x = 1;
+ g_NvidiaExt[index].src0u.y = HitObj._handle;
+ g_NvidiaExt[index].src0u.z = CoherenceHint;
+ g_NvidiaExt[index].src0u.w = NumCoherenceHintBits;
+ g_NvidiaExt.IncrementCounter();
+}
+
+void NvReorderThread(NvHitObject HitObj)
+{
+ NvReorderThread(HitObj, 0, 0);
+}
+
+#define NvInvokeHitObject(AccelerationStructure,HitObj,Payload) \
+do { \
+ uint _index = g_NvidiaExt.IncrementCounter(); \
+ g_NvidiaExt[_index].opcode = NV_EXTN_OP_HIT_OBJECT_INVOKE; \
+ g_NvidiaExt[_index].src0u.x = HitObj._handle; \
+ uint _handle = g_NvidiaExt.IncrementCounter(); \
+ TraceRay(AccelerationStructure, 0, 0, 0, 0, _handle, (RayDesc)0, Payload); \
+} while(0)
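+
+// Illustrative usage sketch of the macro API above (SceneBVH, ray, and payload
+// are hypothetical application values). Unlike the template API, the macros
+// write the result into an NvHitObject the caller declares up front:
+//
+//   NvHitObject hitObj;
+//   NvTraceRayHitObject(SceneBVH, RAY_FLAG_NONE, 0xFF, 0, 1, 0, ray, payload, hitObj);
+//   NvReorderThread(hitObj, 0, 0);
+//   NvInvokeHitObject(SceneBVH, hitObj, payload);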
+
+#endif
diff --git a/samples/test-harness/shaders/include/nvapi/nvHLSLExtnsInternal.h b/samples/test-harness/shaders/include/nvapi/nvHLSLExtnsInternal.h
new file mode 100644
index 0000000..c5936a5
--- /dev/null
+++ b/samples/test-harness/shaders/include/nvapi/nvHLSLExtnsInternal.h
@@ -0,0 +1,767 @@
+ /************************************************************************************************************************************\
+|* *|
+|* Copyright © 2012 NVIDIA Corporation. All rights reserved. *|
+|* *|
+|* NOTICE TO USER: *|
+|* *|
+|* This software is subject to NVIDIA ownership rights under U.S. and international Copyright laws. *|
+|* *|
+|* This software and the information contained herein are PROPRIETARY and CONFIDENTIAL to NVIDIA *|
+|* and are being provided solely under the terms and conditions of an NVIDIA software license agreement. *|
+|* Otherwise, you have no rights to use or access this software in any manner. *|
+|* *|
+|* If not covered by the applicable NVIDIA software license agreement: *|
+|* NVIDIA MAKES NO REPRESENTATION ABOUT THE SUITABILITY OF THIS SOFTWARE FOR ANY PURPOSE. *|
+|* IT IS PROVIDED "AS IS" WITHOUT EXPRESS OR IMPLIED WARRANTY OF ANY KIND. *|
+|* NVIDIA DISCLAIMS ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, *|
+|* INCLUDING ALL IMPLIED WARRANTIES OF MERCHANTABILITY, NONINFRINGEMENT, AND FITNESS FOR A PARTICULAR PURPOSE. *|
+|* IN NO EVENT SHALL NVIDIA BE LIABLE FOR ANY SPECIAL, INDIRECT, INCIDENTAL, OR CONSEQUENTIAL DAMAGES, *|
+|* OR ANY DAMAGES WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN ACTION OF CONTRACT, *|
+|* NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOURCE CODE. *|
+|* *|
+|* U.S. Government End Users. *|
+|* This software is a "commercial item" as that term is defined at 48 C.F.R. 2.101 (OCT 1995), *|
+|* consisting of "commercial computer software" and "commercial computer software documentation" *|
+|* as such terms are used in 48 C.F.R. 12.212 (SEPT 1995) and is provided to the U.S. Government only as a commercial end item. *|
+|* Consistent with 48 C.F.R.12.212 and 48 C.F.R. 227.7202-1 through 227.7202-4 (JUNE 1995), *|
+|* all U.S. Government End Users acquire the software with only those rights set forth herein. *|
+|* *|
+|* Any use of this software in individual and commercial software must include, *|
+|* in the user documentation and internal comments to the code, *|
+|* the above Disclaimer (as applicable) and U.S. Government End Users Notice. *|
+|* *|
+ \************************************************************************************************************************************/
+
+////////////////////////// NVIDIA SHADER EXTENSIONS /////////////////
+// internal functions
+// Functions in this file are not expected to be called by apps directly
+
+#include "nvShaderExtnEnums.h"
+
+struct NvShaderExtnStruct
+{
+ uint opcode; // opcode
+ uint rid; // resource ID
+ uint sid; // sampler ID
+
+ uint4 dst1u; // destination operand 1 (for instructions that need extra destination operands)
+ uint4 src3u; // source operand 3
+ uint4 src4u; // source operand 4
+ uint4 src5u; // source operand 5
+
+ uint4 src0u; // uint source operand 0
+ uint4 src1u; // uint source operand 1
+ uint4 src2u; // uint source operand 2
+ uint4 dst0u; // uint destination operand
+
+ uint markUavRef; // the next store to UAV is fake and is used only to identify the uav slot
+ uint numOutputsForIncCounter; // Used for output to IncrementCounter
+ float padding1[27]; // struct size: 256 bytes
+};
+
+// RW structured buffer for Nvidia shader extensions
+
+// The application needs to define NV_SHADER_EXTN_SLOT as an unused slot, which should be
+// set using the NvAPI_D3D11_SetNvShaderExtnSlot() call before creating the first shader that
+// uses NVIDIA shader extensions. E.g. before including this file in the shader, define it as:
+// #define NV_SHADER_EXTN_SLOT u7
+
+// For SM5.1, the application also needs to define NV_SHADER_EXTN_REGISTER_SPACE as the register space.
+// E.g. before including this file in the shader, define it as:
+// #define NV_SHADER_EXTN_REGISTER_SPACE space2
+
+// Note that other operations on this UAV will be ignored, so the application
+// should bind a null resource
+
+#ifdef NV_SHADER_EXTN_REGISTER_SPACE
+RWStructuredBuffer<NvShaderExtnStruct> g_NvidiaExt : register( NV_SHADER_EXTN_SLOT, NV_SHADER_EXTN_REGISTER_SPACE );
+#else
+RWStructuredBuffer<NvShaderExtnStruct> g_NvidiaExt : register( NV_SHADER_EXTN_SLOT );
+#endif
+
+//----------------------------------------------------------------------------//
+// the exposed SHFL instructions accept a mask parameter in src2
+// To compute lane mask from width of segment:
+// minLaneID : currentLaneId & src2[12:8]
+// maxLaneID : minLaneId | (src2[4:0] & ~src2[12:8])
+// where [minLaneId, maxLaneId] defines the segment where currentLaneId belongs
+// we always set src2[4:0] to 11111 (0x1F), and set src2[12:8] as (32 - width)
+int __NvGetShflMaskFromWidth(uint width)
+{
+ return ((NV_WARP_SIZE - width) << 8) | 0x1F;
+}
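+
+// Worked example of the encoding described above: for a segment width of 8 lanes,
+//   __NvGetShflMaskFromWidth(8) = ((32 - 8) << 8) | 0x1F = 0x181F
+// i.e. src2[12:8] = 24 and src2[4:0] = 0x1F, so each aligned group of 8 lanes
+// forms an independent shuffle segment.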
+
+//----------------------------------------------------------------------------//
+
+void __NvReferenceUAVForOp(RWByteAddressBuffer uav)
+{
+ uint index = g_NvidiaExt.IncrementCounter();
+ g_NvidiaExt[index].markUavRef = 1;
+ uav.Store(index, 0);
+}
+
+void __NvReferenceUAVForOp(RWTexture1D<float2> uav)
+{
+ uint index = g_NvidiaExt.IncrementCounter();
+ g_NvidiaExt[index].markUavRef = 1;
+ uav[index] = float2(0,0);
+}
+
+void __NvReferenceUAVForOp(RWTexture2D<float2> uav)
+{
+ uint index = g_NvidiaExt.IncrementCounter();
+ g_NvidiaExt[index].markUavRef = 1;
+ uav[uint2(index,index)] = float2(0,0);
+}
+
+void __NvReferenceUAVForOp(RWTexture3D<float2> uav)
+{
+ uint index = g_NvidiaExt.IncrementCounter();
+ g_NvidiaExt[index].markUavRef = 1;
+ uav[uint3(index,index,index)] = float2(0,0);
+}
+
+void __NvReferenceUAVForOp(RWTexture1D<float4> uav)
+{
+ uint index = g_NvidiaExt.IncrementCounter();
+ g_NvidiaExt[index].markUavRef = 1;
+ uav[index] = float4(0,0,0,0);
+}
+
+void __NvReferenceUAVForOp(RWTexture2D<float4> uav)
+{
+ uint index = g_NvidiaExt.IncrementCounter();
+ g_NvidiaExt[index].markUavRef = 1;
+ uav[uint2(index,index)] = float4(0,0,0,0);
+}
+
+void __NvReferenceUAVForOp(RWTexture3D<float4> uav)
+{
+ uint index = g_NvidiaExt.IncrementCounter();
+ g_NvidiaExt[index].markUavRef = 1;
+ uav[uint3(index,index,index)] = float4(0,0,0,0);
+}
+
+void __NvReferenceUAVForOp(RWTexture1D<float> uav)
+{
+ uint index = g_NvidiaExt.IncrementCounter();
+ g_NvidiaExt[index].markUavRef = 1;
+ uav[index] = 0.0f;
+}
+
+void __NvReferenceUAVForOp(RWTexture2D<float> uav)
+{
+ uint index = g_NvidiaExt.IncrementCounter();
+ g_NvidiaExt[index].markUavRef = 1;
+ uav[uint2(index,index)] = 0.0f;
+}
+
+void __NvReferenceUAVForOp(RWTexture3D<float> uav)
+{
+ uint index = g_NvidiaExt.IncrementCounter();
+ g_NvidiaExt[index].markUavRef = 1;
+ uav[uint3(index,index,index)] = 0.0f;
+}
+
+
+void __NvReferenceUAVForOp(RWTexture1D<uint2> uav)
+{
+ uint index = g_NvidiaExt.IncrementCounter();
+ g_NvidiaExt[index].markUavRef = 1;
+ uav[index] = uint2(0,0);
+}
+
+void __NvReferenceUAVForOp(RWTexture2D<uint2> uav)
+{
+ uint index = g_NvidiaExt.IncrementCounter();
+ g_NvidiaExt[index].markUavRef = 1;
+ uav[uint2(index,index)] = uint2(0,0);
+}
+
+void __NvReferenceUAVForOp(RWTexture3D<uint2> uav)
+{
+ uint index = g_NvidiaExt.IncrementCounter();
+ g_NvidiaExt[index].markUavRef = 1;
+ uav[uint3(index,index,index)] = uint2(0,0);
+}
+
+void __NvReferenceUAVForOp(RWTexture1D<uint4> uav)
+{
+ uint index = g_NvidiaExt.IncrementCounter();
+ g_NvidiaExt[index].markUavRef = 1;
+ uav[index] = uint4(0,0,0,0);
+}
+
+void __NvReferenceUAVForOp(RWTexture2D<uint4> uav)
+{
+ uint index = g_NvidiaExt.IncrementCounter();
+ g_NvidiaExt[index].markUavRef = 1;
+ uav[uint2(index,index)] = uint4(0,0,0,0);
+}
+
+void __NvReferenceUAVForOp(RWTexture3D<uint4> uav)
+{
+ uint index = g_NvidiaExt.IncrementCounter();
+ g_NvidiaExt[index].markUavRef = 1;
+ uav[uint3(index,index,index)] = uint4(0,0,0,0);
+}
+
+void __NvReferenceUAVForOp(RWTexture1D<uint> uav)
+{
+ uint index = g_NvidiaExt.IncrementCounter();
+ g_NvidiaExt[index].markUavRef = 1;
+ uav[index] = 0;
+}
+
+void __NvReferenceUAVForOp(RWTexture2D<uint> uav)
+{
+ uint index = g_NvidiaExt.IncrementCounter();
+ g_NvidiaExt[index].markUavRef = 1;
+ uav[uint2(index,index)] = 0;
+}
+
+void __NvReferenceUAVForOp(RWTexture3D<uint> uav)
+{
+ uint index = g_NvidiaExt.IncrementCounter();
+ g_NvidiaExt[index].markUavRef = 1;
+ uav[uint3(index,index,index)] = 0;
+}
+
+void __NvReferenceUAVForOp(RWTexture1D<int2> uav)
+{
+ uint index = g_NvidiaExt.IncrementCounter();
+ g_NvidiaExt[index].markUavRef = 1;
+ uav[index] = int2(0,0);
+}
+
+void __NvReferenceUAVForOp(RWTexture2D<int2> uav)
+{
+ uint index = g_NvidiaExt.IncrementCounter();
+ g_NvidiaExt[index].markUavRef = 1;
+ uav[uint2(index,index)] = int2(0,0);
+}
+
+void __NvReferenceUAVForOp(RWTexture3D<int2> uav)
+{
+ uint index = g_NvidiaExt.IncrementCounter();
+ g_NvidiaExt[index].markUavRef = 1;
+ uav[uint3(index,index,index)] = int2(0,0);
+}
+
+void __NvReferenceUAVForOp(RWTexture1D<int4> uav)
+{
+ uint index = g_NvidiaExt.IncrementCounter();
+ g_NvidiaExt[index].markUavRef = 1;
+ uav[index] = int4(0,0,0,0);
+}
+
+void __NvReferenceUAVForOp(RWTexture2D<int4> uav)
+{
+ uint index = g_NvidiaExt.IncrementCounter();
+ g_NvidiaExt[index].markUavRef = 1;
+ uav[uint2(index,index)] = int4(0,0,0,0);
+}
+
+void __NvReferenceUAVForOp(RWTexture3D<int4> uav)
+{
+ uint index = g_NvidiaExt.IncrementCounter();
+ g_NvidiaExt[index].markUavRef = 1;
+ uav[uint3(index,index,index)] = int4(0,0,0,0);
+}
+
+void __NvReferenceUAVForOp(RWTexture1D<int> uav)
+{
+ uint index = g_NvidiaExt.IncrementCounter();
+ g_NvidiaExt[index].markUavRef = 1;
+ uav[index] = 0;
+}
+
+void __NvReferenceUAVForOp(RWTexture2D<int> uav)
+{
+ uint index = g_NvidiaExt.IncrementCounter();
+ g_NvidiaExt[index].markUavRef = 1;
+ uav[uint2(index,index)] = 0;
+}
+
+void __NvReferenceUAVForOp(RWTexture3D<int> uav)
+{
+ uint index = g_NvidiaExt.IncrementCounter();
+ g_NvidiaExt[index].markUavRef = 1;
+ uav[uint3(index,index,index)] = 0;
+}
+
+//----------------------------------------------------------------------------//
+// ATOMIC op sub-opcodes
+#define NV_EXTN_ATOM_AND 0
+#define NV_EXTN_ATOM_OR 1
+#define NV_EXTN_ATOM_XOR 2
+
+#define NV_EXTN_ATOM_ADD 3
+#define NV_EXTN_ATOM_MAX 6
+#define NV_EXTN_ATOM_MIN 7
+
+#define NV_EXTN_ATOM_SWAP 8
+#define NV_EXTN_ATOM_CAS 9
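These sub-opcodes select the read-modify-write operation performed by the atomic helpers below. As a reference for their semantics, here is a standalone C++ emulation on 32-bit values (the enum and function names are illustrative, not part of NVAPI; only the numbering mirrors the defines above):

```cpp
#include <cstdint>

// Illustrative only: mirrors the NV_EXTN_ATOM_* sub-opcode numbering.
enum NvAtomOp : uint32_t {
    ATOM_AND = 0, ATOM_OR = 1, ATOM_XOR = 2,
    ATOM_ADD = 3, ATOM_MAX = 6, ATOM_MIN = 7,
    ATOM_SWAP = 8, ATOM_CAS = 9
};

// Applies the sub-opcode to *dest and returns the prior value, matching
// the "returns the old value" convention of the HLSL helpers.
// 'src' is the operand; 'cmp' is only consulted for ATOM_CAS.
uint32_t EmulateAtomicOp(uint32_t* dest, uint32_t src, uint32_t cmp, NvAtomOp op)
{
    uint32_t old = *dest;
    switch (op) {
        case ATOM_AND:  *dest = old & src; break;
        case ATOM_OR:   *dest = old | src; break;
        case ATOM_XOR:  *dest = old ^ src; break;
        case ATOM_ADD:  *dest = old + src; break;
        case ATOM_MAX:  *dest = old > src ? old : src; break;
        case ATOM_MIN:  *dest = old < src ? old : src; break;
        case ATOM_SWAP: *dest = src; break;
        case ATOM_CAS:  if (old == cmp) *dest = src; break;
    }
    return old;
}
```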
+
+//----------------------------------------------------------------------------//
+
+// performs an atomic operation on two consecutive fp16 values in the given UAV
+// the uint parameter 'fp16x2Val' is treated as two fp16 values
+// the sub-opcode 'atomicOpType' should be an immediate constant
+// byteAddress must be a multiple of 4
+// the returned value is the two fp16 values packed into a single uint
+uint __NvAtomicOpFP16x2(RWByteAddressBuffer uav, uint byteAddress, uint fp16x2Val, uint atomicOpType)
+{
+ __NvReferenceUAVForOp(uav);
+ uint index = g_NvidiaExt.IncrementCounter();
+ g_NvidiaExt[index].src0u.x = byteAddress;
+ g_NvidiaExt[index].src1u.x = fp16x2Val;
+ g_NvidiaExt[index].src2u.x = atomicOpType;
+ g_NvidiaExt[index].opcode = NV_EXTN_OP_FP16_ATOMIC;
+
+ return g_NvidiaExt[index].dst0u.x;
+}
+
+//----------------------------------------------------------------------------//
+
+// performs an atomic operation on a R16G16_FLOAT UAV at the given address
+// the uint parameter 'fp16x2Val' is treated as two fp16 values
+// the sub-opcode 'atomicOpType' should be an immediate constant
+// the returned value is the two fp16 values (.x and .y components) packed into a single uint
+// Warning: behavior of this set of functions is undefined if the UAV is not
+// of R16G16_FLOAT format (might result in an app crash or TDR)
+
+uint __NvAtomicOpFP16x2(RWTexture1D<float2> uav, uint address, uint fp16x2Val, uint atomicOpType)
+{
+ __NvReferenceUAVForOp(uav);
+ uint index = g_NvidiaExt.IncrementCounter();
+ g_NvidiaExt[index].src0u.x = address;
+ g_NvidiaExt[index].src1u.x = fp16x2Val;
+ g_NvidiaExt[index].src2u.x = atomicOpType;
+ g_NvidiaExt[index].opcode = NV_EXTN_OP_FP16_ATOMIC;
+
+ return g_NvidiaExt[index].dst0u.x;
+}
+
+uint __NvAtomicOpFP16x2(RWTexture2D<float2> uav, uint2 address, uint fp16x2Val, uint atomicOpType)
+{
+ __NvReferenceUAVForOp(uav);
+ uint index = g_NvidiaExt.IncrementCounter();
+ g_NvidiaExt[index].src0u.xy = address;
+ g_NvidiaExt[index].src1u.x = fp16x2Val;
+ g_NvidiaExt[index].src2u.x = atomicOpType;
+ g_NvidiaExt[index].opcode = NV_EXTN_OP_FP16_ATOMIC;
+
+ return g_NvidiaExt[index].dst0u.x;
+}
+
+uint __NvAtomicOpFP16x2(RWTexture3D<float2> uav, uint3 address, uint fp16x2Val, uint atomicOpType)
+{
+ __NvReferenceUAVForOp(uav);
+ uint index = g_NvidiaExt.IncrementCounter();
+ g_NvidiaExt[index].src0u.xyz = address;
+ g_NvidiaExt[index].src1u.x = fp16x2Val;
+ g_NvidiaExt[index].src2u.x = atomicOpType;
+ g_NvidiaExt[index].opcode = NV_EXTN_OP_FP16_ATOMIC;
+
+ return g_NvidiaExt[index].dst0u.x;
+}
+
+//----------------------------------------------------------------------------//
+
+// performs an atomic operation on a R16G16B16A16_FLOAT UAV at the given address
+// the uint2 parameter 'fp16x2Val' is treated as four fp16 values,
+// i.e., fp16x2Val.x = uav.xy and fp16x2Val.y = uav.zw
+// the sub-opcode 'atomicOpType' should be an immediate constant
+// the returned value is the four fp16 values (.xyzw components) packed into a uint2
+// Warning: behavior of this set of functions is undefined if the UAV is not
+// of R16G16B16A16_FLOAT format (might result in an app crash or TDR)
+
+uint2 __NvAtomicOpFP16x2(RWTexture1D<float4> uav, uint address, uint2 fp16x2Val, uint atomicOpType)
+{
+ __NvReferenceUAVForOp(uav);
+
+ // break it down into two fp16x2 atomic ops
+ uint2 retVal;
+
+ // first op has x-coordinate = x * 2
+ uint index = g_NvidiaExt.IncrementCounter();
+ g_NvidiaExt[index].src0u.x = address * 2;
+ g_NvidiaExt[index].src1u.x = fp16x2Val.x;
+ g_NvidiaExt[index].src2u.x = atomicOpType;
+ g_NvidiaExt[index].opcode = NV_EXTN_OP_FP16_ATOMIC;
+ retVal.x = g_NvidiaExt[index].dst0u.x;
+
+ // second op has x-coordinate = x * 2 + 1
+ index = g_NvidiaExt.IncrementCounter();
+ g_NvidiaExt[index].src0u.x = address * 2 + 1;
+ g_NvidiaExt[index].src1u.x = fp16x2Val.y;
+ g_NvidiaExt[index].src2u.x = atomicOpType;
+ g_NvidiaExt[index].opcode = NV_EXTN_OP_FP16_ATOMIC;
+ retVal.y = g_NvidiaExt[index].dst0u.x;
+
+ return retVal;
+}
+
+uint2 __NvAtomicOpFP16x2(RWTexture2D<float4> uav, uint2 address, uint2 fp16x2Val, uint atomicOpType)
+{
+ __NvReferenceUAVForOp(uav);
+
+ // break it down into two fp16x2 atomic ops
+ uint2 retVal;
+
+ // first op has x-coordinate = x * 2
+ uint2 addressTemp = uint2(address.x * 2, address.y);
+ uint index = g_NvidiaExt.IncrementCounter();
+ g_NvidiaExt[index].src0u.xy = addressTemp;
+ g_NvidiaExt[index].src1u.x = fp16x2Val.x;
+ g_NvidiaExt[index].src2u.x = atomicOpType;
+ g_NvidiaExt[index].opcode = NV_EXTN_OP_FP16_ATOMIC;
+ retVal.x = g_NvidiaExt[index].dst0u.x;
+
+ // second op has x-coordinate = x * 2 + 1
+ addressTemp.x++;
+ index = g_NvidiaExt.IncrementCounter();
+ g_NvidiaExt[index].src0u.xy = addressTemp;
+ g_NvidiaExt[index].src1u.x = fp16x2Val.y;
+ g_NvidiaExt[index].src2u.x = atomicOpType;
+ g_NvidiaExt[index].opcode = NV_EXTN_OP_FP16_ATOMIC;
+ retVal.y = g_NvidiaExt[index].dst0u.x;
+
+ return retVal;
+}
+
+uint2 __NvAtomicOpFP16x2(RWTexture3D<float4> uav, uint3 address, uint2 fp16x2Val, uint atomicOpType)
+{
+ __NvReferenceUAVForOp(uav);
+
+ // break it down into two fp16x2 atomic ops
+ uint2 retVal;
+
+ // first op has x-coordinate = x * 2
+ uint3 addressTemp = uint3(address.x * 2, address.y, address.z);
+ uint index = g_NvidiaExt.IncrementCounter();
+ g_NvidiaExt[index].src0u.xyz = addressTemp;
+ g_NvidiaExt[index].src1u.x = fp16x2Val.x;
+ g_NvidiaExt[index].src2u.x = atomicOpType;
+ g_NvidiaExt[index].opcode = NV_EXTN_OP_FP16_ATOMIC;
+ retVal.x = g_NvidiaExt[index].dst0u.x;
+
+ // second op has x-coordinate = x * 2 + 1
+ addressTemp.x++;
+ index = g_NvidiaExt.IncrementCounter();
+ g_NvidiaExt[index].src0u.xyz = addressTemp;
+ g_NvidiaExt[index].src1u.x = fp16x2Val.y;
+ g_NvidiaExt[index].src2u.x = atomicOpType;
+ g_NvidiaExt[index].opcode = NV_EXTN_OP_FP16_ATOMIC;
+ retVal.y = g_NvidiaExt[index].dst0u.x;
+
+ return retVal;
+}
+
+uint __fp32x2Tofp16x2(float2 val)
+{
+ return (f32tof16(val.y)<<16) | f32tof16(val.x);
+}
+
+uint2 __fp32x4Tofp16x4(float4 val)
+{
+ return uint2( (f32tof16(val.y)<<16) | f32tof16(val.x), (f32tof16(val.w)<<16) | f32tof16(val.z) );
+}
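The two packing helpers simply place IEEE-754 half bit patterns into uint lanes, low half first. A standalone C++ sketch of the same layout, with a deliberately simplified float-to-half conversion (truncating, normal finite values only — an assumption for illustration; the HLSL `f32tof16` intrinsic handles the full value range):

```cpp
#include <cstdint>
#include <cstring>

// Simplified f32tof16: truncating, normal finite inputs only (illustrative).
static uint32_t F32ToF16(float f)
{
    uint32_t bits;
    std::memcpy(&bits, &f, sizeof(bits));
    uint32_t sign = (bits >> 16) & 0x8000u;
    uint32_t exp  = ((bits >> 23) & 0xFFu) - 127u + 15u; // rebias 8-bit -> 5-bit exponent
    uint32_t mant = (bits >> 13) & 0x3FFu;               // keep top 10 mantissa bits
    return sign | (exp << 10) | mant;
}

// Same layout as __fp32x2Tofp16x2: x in the low 16 bits, y in the high 16 bits.
uint32_t PackFP16x2(float x, float y)
{
    return (F32ToF16(y) << 16) | F32ToF16(x);
}
```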
+
+//----------------------------------------------------------------------------//
+
+// FP32 atomic functions
+// performs an atomic add treating the UAV contents as float (fp32) values
+// byteAddress must be a multiple of 4
+float __NvAtomicAddFP32(RWByteAddressBuffer uav, uint byteAddress, float val)
+{
+ __NvReferenceUAVForOp(uav);
+ uint index = g_NvidiaExt.IncrementCounter();
+ g_NvidiaExt[index].src0u.x = byteAddress;
+ g_NvidiaExt[index].src1u.x = asuint(val); // passing as uint to make it more convenient for the driver to translate
+ g_NvidiaExt[index].src2u.x = NV_EXTN_ATOM_ADD;
+ g_NvidiaExt[index].opcode = NV_EXTN_OP_FP32_ATOMIC;
+
+ return asfloat(g_NvidiaExt[index].dst0u.x);
+}
+
+float __NvAtomicAddFP32(RWTexture1D<float> uav, uint address, float val)
+{
+ __NvReferenceUAVForOp(uav);
+ uint index = g_NvidiaExt.IncrementCounter();
+ g_NvidiaExt[index].src0u.x = address;
+ g_NvidiaExt[index].src1u.x = asuint(val);
+ g_NvidiaExt[index].src2u.x = NV_EXTN_ATOM_ADD;
+ g_NvidiaExt[index].opcode = NV_EXTN_OP_FP32_ATOMIC;
+
+ return asfloat(g_NvidiaExt[index].dst0u.x);
+}
+
+float __NvAtomicAddFP32(RWTexture2D<float> uav, uint2 address, float val)
+{
+ __NvReferenceUAVForOp(uav);
+ uint index = g_NvidiaExt.IncrementCounter();
+ g_NvidiaExt[index].src0u.xy = address;
+ g_NvidiaExt[index].src1u.x = asuint(val);
+ g_NvidiaExt[index].src2u.x = NV_EXTN_ATOM_ADD;
+ g_NvidiaExt[index].opcode = NV_EXTN_OP_FP32_ATOMIC;
+
+ return asfloat(g_NvidiaExt[index].dst0u.x);
+}
+
+float __NvAtomicAddFP32(RWTexture3D<float> uav, uint3 address, float val)
+{
+ __NvReferenceUAVForOp(uav);
+ uint index = g_NvidiaExt.IncrementCounter();
+ g_NvidiaExt[index].src0u.xyz = address;
+ g_NvidiaExt[index].src1u.x = asuint(val);
+ g_NvidiaExt[index].src2u.x = NV_EXTN_ATOM_ADD;
+ g_NvidiaExt[index].opcode = NV_EXTN_OP_FP32_ATOMIC;
+
+ return asfloat(g_NvidiaExt[index].dst0u.x);
+}
+
+//----------------------------------------------------------------------------//
+
+// UINT64 atomic functions
+// The functions below perform atomic operations on the given UAV, treating the value as uint64
+// byteAddress must be a multiple of 8
+// The returned value is the value present at the memory location before the atomic operation
+// A uint2 vector represents a single uint64 value, with the x component holding the low 32 bits and the y component the high 32 bits
+
+uint2 __NvAtomicCompareExchangeUINT64(RWByteAddressBuffer uav, uint byteAddress, uint2 compareValue, uint2 value)
+{
+ __NvReferenceUAVForOp(uav);
+
+ uint index = g_NvidiaExt.IncrementCounter();
+ g_NvidiaExt[index].src0u.x = byteAddress;
+ g_NvidiaExt[index].src1u.xy = compareValue;
+ g_NvidiaExt[index].src1u.zw = value;
+ g_NvidiaExt[index].src2u.x = NV_EXTN_ATOM_CAS;
+ g_NvidiaExt[index].opcode = NV_EXTN_OP_UINT64_ATOMIC;
+
+ return g_NvidiaExt[index].dst0u.xy;
+}
+
+uint2 __NvAtomicOpUINT64(RWByteAddressBuffer uav, uint byteAddress, uint2 value, uint atomicOpType)
+{
+ __NvReferenceUAVForOp(uav);
+
+ uint index = g_NvidiaExt.IncrementCounter();
+ g_NvidiaExt[index].src0u.x = byteAddress;
+ g_NvidiaExt[index].src1u.xy = value;
+ g_NvidiaExt[index].src2u.x = atomicOpType;
+ g_NvidiaExt[index].opcode = NV_EXTN_OP_UINT64_ATOMIC;
+
+ return g_NvidiaExt[index].dst0u.xy;
+}
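The uint2 low/high convention and the CAS semantics requested with NV_EXTN_ATOM_CAS can be checked outside HLSL. A hypothetical C++ sketch (the struct and function names are illustrative, not SDK code):

```cpp
#include <cstdint>

struct Uint2 { uint32_t x, y; };  // x = low 32 bits, y = high 32 bits

Uint2    SplitU64(uint64_t v) { return { (uint32_t)v, (uint32_t)(v >> 32) }; }
uint64_t JoinU64(Uint2 v)     { return ((uint64_t)v.y << 32) | v.x; }

// Emulates the CAS semantics: writes 'value' only if *dest equals 'compare',
// and always returns the prior contents (as the HLSL helper does via dst0u.xy).
Uint2 EmulateCas64(uint64_t* dest, Uint2 compare, Uint2 value)
{
    Uint2 old = SplitU64(*dest);
    if (*dest == JoinU64(compare)) *dest = JoinU64(value);
    return old;
}
```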
+
+uint2 __NvAtomicCompareExchangeUINT64(RWTexture1D<uint2> uav, uint address, uint2 compareValue, uint2 value)
+{
+ __NvReferenceUAVForOp(uav);
+
+ uint index = g_NvidiaExt.IncrementCounter();
+ g_NvidiaExt[index].src0u.x = address;
+ g_NvidiaExt[index].src1u.xy = compareValue;
+ g_NvidiaExt[index].src1u.zw = value;
+ g_NvidiaExt[index].src2u.x = NV_EXTN_ATOM_CAS;
+ g_NvidiaExt[index].opcode = NV_EXTN_OP_UINT64_ATOMIC;
+
+ return g_NvidiaExt[index].dst0u.xy;
+}
+
+uint2 __NvAtomicOpUINT64(RWTexture1D<uint2> uav, uint address, uint2 value, uint atomicOpType)
+{
+ __NvReferenceUAVForOp(uav);
+
+ uint index = g_NvidiaExt.IncrementCounter();
+ g_NvidiaExt[index].src0u.x = address;
+ g_NvidiaExt[index].src1u.xy = value;
+ g_NvidiaExt[index].src2u.x = atomicOpType;
+ g_NvidiaExt[index].opcode = NV_EXTN_OP_UINT64_ATOMIC;
+
+ return g_NvidiaExt[index].dst0u.xy;
+}
+
+uint2 __NvAtomicCompareExchangeUINT64(RWTexture2D<uint2> uav, uint2 address, uint2 compareValue, uint2 value)
+{
+ __NvReferenceUAVForOp(uav);
+
+ uint index = g_NvidiaExt.IncrementCounter();
+ g_NvidiaExt[index].src0u.xy = address;
+ g_NvidiaExt[index].src1u.xy = compareValue;
+ g_NvidiaExt[index].src1u.zw = value;
+ g_NvidiaExt[index].src2u.x = NV_EXTN_ATOM_CAS;
+ g_NvidiaExt[index].opcode = NV_EXTN_OP_UINT64_ATOMIC;
+
+ return g_NvidiaExt[index].dst0u.xy;
+}
+
+uint2 __NvAtomicOpUINT64(RWTexture2D<uint2> uav, uint2 address, uint2 value, uint atomicOpType)
+{
+ __NvReferenceUAVForOp(uav);
+
+ uint index = g_NvidiaExt.IncrementCounter();
+ g_NvidiaExt[index].src0u.xy = address;
+ g_NvidiaExt[index].src1u.xy = value;
+ g_NvidiaExt[index].src2u.x = atomicOpType;
+ g_NvidiaExt[index].opcode = NV_EXTN_OP_UINT64_ATOMIC;
+
+ return g_NvidiaExt[index].dst0u.xy;
+}
+
+uint2 __NvAtomicCompareExchangeUINT64(RWTexture3D<uint2> uav, uint3 address, uint2 compareValue, uint2 value)
+{
+ __NvReferenceUAVForOp(uav);
+
+ uint index = g_NvidiaExt.IncrementCounter();
+ g_NvidiaExt[index].src0u.xyz = address;
+ g_NvidiaExt[index].src1u.xy = compareValue;
+ g_NvidiaExt[index].src1u.zw = value;
+ g_NvidiaExt[index].src2u.x = NV_EXTN_ATOM_CAS;
+ g_NvidiaExt[index].opcode = NV_EXTN_OP_UINT64_ATOMIC;
+
+ return g_NvidiaExt[index].dst0u.xy;
+}
+
+uint2 __NvAtomicOpUINT64(RWTexture3D<uint2> uav, uint3 address, uint2 value, uint atomicOpType)
+{
+ __NvReferenceUAVForOp(uav);
+
+ uint index = g_NvidiaExt.IncrementCounter();
+ g_NvidiaExt[index].src0u.xyz = address;
+ g_NvidiaExt[index].src1u.xy = value;
+ g_NvidiaExt[index].src2u.x = atomicOpType;
+ g_NvidiaExt[index].opcode = NV_EXTN_OP_UINT64_ATOMIC;
+
+ return g_NvidiaExt[index].dst0u.xy;
+}
+
+
+uint4 __NvFootprint(uint texSpace, uint texIndex, uint smpSpace, uint smpIndex, uint texType, float3 location, uint footprintmode, uint gran, int3 offset = int3(0, 0, 0))
+{
+ uint index = g_NvidiaExt.IncrementCounter();
+ g_NvidiaExt[index].src0u.x = texIndex;
+ g_NvidiaExt[index].src0u.y = smpIndex;
+ g_NvidiaExt[index].src1u.xyz = asuint(location);
+ g_NvidiaExt[index].src1u.w = gran;
+ g_NvidiaExt[index].src3u.x = texSpace;
+ g_NvidiaExt[index].src3u.y = smpSpace;
+ g_NvidiaExt[index].src3u.z = texType;
+ g_NvidiaExt[index].src3u.w = footprintmode;
+ g_NvidiaExt[index].src4u.xyz = asuint(offset);
+
+ g_NvidiaExt[index].opcode = NV_EXTN_OP_FOOTPRINT;
+ g_NvidiaExt[index].numOutputsForIncCounter = 4;
+
+ // result is returned as the return value of IncrementCounter on fake UAV slot
+ uint4 op;
+ op.x = g_NvidiaExt.IncrementCounter();
+ op.y = g_NvidiaExt.IncrementCounter();
+ op.z = g_NvidiaExt.IncrementCounter();
+ op.w = g_NvidiaExt.IncrementCounter();
+ return op;
+}
+
+uint4 __NvFootprintBias(uint texSpace, uint texIndex, uint smpSpace, uint smpIndex, uint texType, float3 location, uint footprintmode, uint gran, float bias, int3 offset = int3(0, 0, 0))
+{
+ uint index = g_NvidiaExt.IncrementCounter();
+ g_NvidiaExt[index].src0u.x = texIndex;
+ g_NvidiaExt[index].src0u.y = smpIndex;
+ g_NvidiaExt[index].src1u.xyz = asuint(location);
+ g_NvidiaExt[index].src1u.w = gran;
+ g_NvidiaExt[index].src2u.x = asuint(bias);
+ g_NvidiaExt[index].src3u.x = texSpace;
+ g_NvidiaExt[index].src3u.y = smpSpace;
+ g_NvidiaExt[index].src3u.z = texType;
+ g_NvidiaExt[index].src3u.w = footprintmode;
+ g_NvidiaExt[index].src4u.xyz = asuint(offset);
+
+ g_NvidiaExt[index].opcode = NV_EXTN_OP_FOOTPRINT_BIAS;
+ g_NvidiaExt[index].numOutputsForIncCounter = 4;
+
+ // result is returned as the return value of IncrementCounter on fake UAV slot
+ uint4 op;
+ op.x = g_NvidiaExt.IncrementCounter();
+ op.y = g_NvidiaExt.IncrementCounter();
+ op.z = g_NvidiaExt.IncrementCounter();
+ op.w = g_NvidiaExt.IncrementCounter();
+ return op;
+}
+
+uint4 __NvFootprintLevel(uint texSpace, uint texIndex, uint smpSpace, uint smpIndex, uint texType, float3 location, uint footprintmode, uint gran, float lodLevel, int3 offset = int3(0, 0, 0))
+{
+ uint index = g_NvidiaExt.IncrementCounter();
+ g_NvidiaExt[index].src0u.x = texIndex;
+ g_NvidiaExt[index].src0u.y = smpIndex;
+ g_NvidiaExt[index].src1u.xyz = asuint(location);
+ g_NvidiaExt[index].src1u.w = gran;
+ g_NvidiaExt[index].src2u.x = asuint(lodLevel);
+ g_NvidiaExt[index].src3u.x = texSpace;
+ g_NvidiaExt[index].src3u.y = smpSpace;
+ g_NvidiaExt[index].src3u.z = texType;
+ g_NvidiaExt[index].src3u.w = footprintmode;
+ g_NvidiaExt[index].src4u.xyz = asuint(offset);
+
+ g_NvidiaExt[index].opcode = NV_EXTN_OP_FOOTPRINT_LEVEL;
+ g_NvidiaExt[index].numOutputsForIncCounter = 4;
+
+ // result is returned as the return value of IncrementCounter on fake UAV slot
+ uint4 op;
+ op.x = g_NvidiaExt.IncrementCounter();
+ op.y = g_NvidiaExt.IncrementCounter();
+ op.z = g_NvidiaExt.IncrementCounter();
+ op.w = g_NvidiaExt.IncrementCounter();
+ return op;
+}
+
+uint4 __NvFootprintGrad(uint texSpace, uint texIndex, uint smpSpace, uint smpIndex, uint texType, float3 location, uint footprintmode, uint gran, float3 ddx, float3 ddy, int3 offset = int3(0, 0, 0))
+{
+ uint index = g_NvidiaExt.IncrementCounter();
+ g_NvidiaExt[index].src0u.x = texIndex;
+ g_NvidiaExt[index].src0u.y = smpIndex;
+ g_NvidiaExt[index].src1u.xyz = asuint(location);
+ g_NvidiaExt[index].src1u.w = gran;
+ g_NvidiaExt[index].src2u.xyz = asuint(ddx);
+ g_NvidiaExt[index].src5u.xyz = asuint(ddy);
+ g_NvidiaExt[index].src3u.x = texSpace;
+ g_NvidiaExt[index].src3u.y = smpSpace;
+ g_NvidiaExt[index].src3u.z = texType;
+ g_NvidiaExt[index].src3u.w = footprintmode;
+ g_NvidiaExt[index].src4u.xyz = asuint(offset);
+ g_NvidiaExt[index].opcode = NV_EXTN_OP_FOOTPRINT_GRAD;
+ g_NvidiaExt[index].numOutputsForIncCounter = 4;
+
+ // result is returned as the return value of IncrementCounter on fake UAV slot
+ uint4 op;
+ op.x = g_NvidiaExt.IncrementCounter();
+ op.y = g_NvidiaExt.IncrementCounter();
+ op.z = g_NvidiaExt.IncrementCounter();
+ op.w = g_NvidiaExt.IncrementCounter();
+ return op;
+}
+
+// returns the value of a special register - specify a subopcode from the NV_SPECIALOP_* values in nvShaderExtnEnums.h - other subopcodes result in undefined behavior
+uint __NvGetSpecial(uint subOpCode)
+{
+ uint index = g_NvidiaExt.IncrementCounter();
+ g_NvidiaExt[index].opcode = NV_EXTN_OP_GET_SPECIAL;
+ g_NvidiaExt[index].src0u.x = subOpCode;
+ return g_NvidiaExt.IncrementCounter();
+}
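The NV_SPECIALOP_GLOBAL_TIMER_LO/HI subopcodes each return one 32-bit half of a 64-bit counter, so a coherent 64-bit read needs the usual hi/lo/hi pattern to guard against rollover between the two reads. A generic C++ sketch of that pattern (the function name and the callback-based reader are illustrative stand-ins for `__NvGetSpecial` calls):

```cpp
#include <cstdint>
#include <functional>

// Generic hi/lo/hi read of a 64-bit counter exposed as two 32-bit halves.
// readLo/readHi stand in for __NvGetSpecial(NV_SPECIALOP_GLOBAL_TIMER_LO/HI).
uint64_t ReadTimer64(const std::function<uint32_t()>& readLo,
                     const std::function<uint32_t()>& readHi)
{
    uint32_t hi = readHi();
    for (;;) {
        uint32_t lo  = readLo();
        uint32_t hi2 = readHi();
        if (hi2 == hi) return ((uint64_t)hi << 32) | lo;  // no rollover between reads
        hi = hi2;  // low half wrapped; retry with the new high half
    }
}
```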
+
+// a predicate is returned in laneValid indicating whether srcLane is in range; the value of val from the specified lane is returned.
+int __NvShflGeneric(int val, uint srcLane, uint maskClampVal, out uint laneValid)
+{
+ uint index = g_NvidiaExt.IncrementCounter();
+ g_NvidiaExt[index].src0u.x = val; // variable to be shuffled
+ g_NvidiaExt[index].src0u.y = srcLane; // source lane
+ g_NvidiaExt[index].src0u.z = maskClampVal;
+ g_NvidiaExt[index].opcode = NV_EXTN_OP_SHFL_GENERIC;
+ g_NvidiaExt[index].numOutputsForIncCounter = 2;
+
+ laneValid = asuint(g_NvidiaExt.IncrementCounter());
+ return g_NvidiaExt.IncrementCounter();
+}
\ No newline at end of file
diff --git a/samples/test-harness/shaders/include/nvapi/nvShaderExtnEnums.h b/samples/test-harness/shaders/include/nvapi/nvShaderExtnEnums.h
new file mode 100644
index 0000000..cfa918b
--- /dev/null
+++ b/samples/test-harness/shaders/include/nvapi/nvShaderExtnEnums.h
@@ -0,0 +1,141 @@
+ /************************************************************************************************************************************\
+|* *|
+|* Copyright © 2012 NVIDIA Corporation. All rights reserved. *|
+|* *|
+|* NOTICE TO USER: *|
+|* *|
+|* This software is subject to NVIDIA ownership rights under U.S. and international Copyright laws. *|
+|* *|
+|* This software and the information contained herein are PROPRIETARY and CONFIDENTIAL to NVIDIA *|
+|* and are being provided solely under the terms and conditions of an NVIDIA software license agreement. *|
+|* Otherwise, you have no rights to use or access this software in any manner. *|
+|* *|
+|* If not covered by the applicable NVIDIA software license agreement: *|
+|* NVIDIA MAKES NO REPRESENTATION ABOUT THE SUITABILITY OF THIS SOFTWARE FOR ANY PURPOSE. *|
+|* IT IS PROVIDED "AS IS" WITHOUT EXPRESS OR IMPLIED WARRANTY OF ANY KIND. *|
+|* NVIDIA DISCLAIMS ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, *|
+|* INCLUDING ALL IMPLIED WARRANTIES OF MERCHANTABILITY, NONINFRINGEMENT, AND FITNESS FOR A PARTICULAR PURPOSE. *|
+|* IN NO EVENT SHALL NVIDIA BE LIABLE FOR ANY SPECIAL, INDIRECT, INCIDENTAL, OR CONSEQUENTIAL DAMAGES, *|
+|* OR ANY DAMAGES WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN ACTION OF CONTRACT, *|
+|* NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOURCE CODE. *|
+|* *|
+|* U.S. Government End Users. *|
+|* This software is a "commercial item" as that term is defined at 48 C.F.R. 2.101 (OCT 1995), *|
+|* consisting of "commercial computer software" and "commercial computer software documentation" *|
+|* as such terms are used in 48 C.F.R. 12.212 (SEPT 1995) and is provided to the U.S. Government only as a commercial end item. *|
+|* Consistent with 48 C.F.R.12.212 and 48 C.F.R. 227.7202-1 through 227.7202-4 (JUNE 1995), *|
+|* all U.S. Government End Users acquire the software with only those rights set forth herein. *|
+|* *|
+|* Any use of this software in individual and commercial software must include, *|
+|* in the user documentation and internal comments to the code, *|
+|* the above Disclaimer (as applicable) and U.S. Government End Users Notice. *|
+|* *|
+ \************************************************************************************************************************************/
+
+////////////////////////////////////////////////////////////////////////////////
+////////////////////////// NVIDIA SHADER EXTENSIONS ////////////////////////////
+////////////////////////////////////////////////////////////////////////////////
+
+// This file can be included both from HLSL shader code as well as C++ code.
+// The app should call NvAPI_D3D11_IsNvShaderExtnOpCodeSupported() / NvAPI_D3D12_IsNvShaderExtnOpCodeSupported()
+// to check for support for every nv shader extension opcode it plans to use
+
+
+
+//----------------------------------------------------------------------------//
+//---------------------------- NV Shader Extn Version -----------------------//
+//----------------------------------------------------------------------------//
+#define NV_SHADER_EXTN_VERSION 1
+
+//----------------------------------------------------------------------------//
+//---------------------------- Misc constants --------------------------------//
+//----------------------------------------------------------------------------//
+#define NV_WARP_SIZE 32
+#define NV_WARP_SIZE_LOG2 5
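NV_WARP_SIZE_LOG2 exists so warp/lane decomposition of a flat thread index can use shifts and masks. A small C++ sketch of that arithmetic (constants restated locally to mirror the defines above):

```cpp
#include <cstdint>

// Mirror of NV_WARP_SIZE / NV_WARP_SIZE_LOG2 above.
constexpr uint32_t kWarpSize     = 32;
constexpr uint32_t kWarpSizeLog2 = 5;
static_assert(kWarpSize == (1u << kWarpSizeLog2), "log2 constant must match warp size");

// Decomposes a flat thread index into (warp index, lane index).
void WarpAndLane(uint32_t threadIndex, uint32_t& warp, uint32_t& lane)
{
    warp = threadIndex >> kWarpSizeLog2;   // divide by warp size
    lane = threadIndex & (kWarpSize - 1);  // modulo warp size
}
```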
+
+//----------------------------------------------------------------------------//
+//---------------------------- opCode constants ------------------------------//
+//----------------------------------------------------------------------------//
+
+
+#define NV_EXTN_OP_SHFL 1
+#define NV_EXTN_OP_SHFL_UP 2
+#define NV_EXTN_OP_SHFL_DOWN 3
+#define NV_EXTN_OP_SHFL_XOR 4
+
+#define NV_EXTN_OP_VOTE_ALL 5
+#define NV_EXTN_OP_VOTE_ANY 6
+#define NV_EXTN_OP_VOTE_BALLOT 7
+
+#define NV_EXTN_OP_GET_LANE_ID 8
+#define NV_EXTN_OP_FP16_ATOMIC 12
+#define NV_EXTN_OP_FP32_ATOMIC 13
+
+#define NV_EXTN_OP_GET_SPECIAL 19
+
+#define NV_EXTN_OP_UINT64_ATOMIC 20
+
+#define NV_EXTN_OP_MATCH_ANY 21
+
+// FOOTPRINT - For Sample and SampleBias
+#define NV_EXTN_OP_FOOTPRINT 28
+#define NV_EXTN_OP_FOOTPRINT_BIAS 29
+
+#define NV_EXTN_OP_GET_SHADING_RATE 30
+
+// FOOTPRINT - For SampleLevel and SampleGrad
+#define NV_EXTN_OP_FOOTPRINT_LEVEL 31
+#define NV_EXTN_OP_FOOTPRINT_GRAD 32
+
+// SHFL Generic
+#define NV_EXTN_OP_SHFL_GENERIC 33
+
+#define NV_EXTN_OP_VPRS_EVAL_ATTRIB_AT_SAMPLE 51
+#define NV_EXTN_OP_VPRS_EVAL_ATTRIB_SNAPPED 52
+
+// HitObject API
+#define NV_EXTN_OP_HIT_OBJECT_TRACE_RAY 67
+#define NV_EXTN_OP_HIT_OBJECT_MAKE_HIT 68
+#define NV_EXTN_OP_HIT_OBJECT_MAKE_HIT_WITH_RECORD_INDEX 69
+#define NV_EXTN_OP_HIT_OBJECT_MAKE_MISS 70
+#define NV_EXTN_OP_HIT_OBJECT_REORDER_THREAD 71
+#define NV_EXTN_OP_HIT_OBJECT_INVOKE 72
+#define NV_EXTN_OP_HIT_OBJECT_IS_MISS 73
+#define NV_EXTN_OP_HIT_OBJECT_GET_INSTANCE_ID 74
+#define NV_EXTN_OP_HIT_OBJECT_GET_INSTANCE_INDEX 75
+#define NV_EXTN_OP_HIT_OBJECT_GET_PRIMITIVE_INDEX 76
+#define NV_EXTN_OP_HIT_OBJECT_GET_GEOMETRY_INDEX 77
+#define NV_EXTN_OP_HIT_OBJECT_GET_HIT_KIND 78
+#define NV_EXTN_OP_HIT_OBJECT_GET_RAY_DESC 79
+#define NV_EXTN_OP_HIT_OBJECT_GET_ATTRIBUTES 80
+#define NV_EXTN_OP_HIT_OBJECT_GET_SHADER_TABLE_INDEX 81
+#define NV_EXTN_OP_HIT_OBJECT_LOAD_LOCAL_ROOT_TABLE_CONSTANT 82
+#define NV_EXTN_OP_HIT_OBJECT_IS_HIT 83
+#define NV_EXTN_OP_HIT_OBJECT_IS_NOP 84
+#define NV_EXTN_OP_HIT_OBJECT_MAKE_NOP 85
+
+//----------------------------------------------------------------------------//
+//-------------------- GET_SPECIAL subOpCode constants -----------------------//
+//----------------------------------------------------------------------------//
+#define NV_SPECIALOP_THREADLTMASK 4
+#define NV_SPECIALOP_FOOTPRINT_SINGLELOD_PRED 5
+#define NV_SPECIALOP_GLOBAL_TIMER_LO 9
+#define NV_SPECIALOP_GLOBAL_TIMER_HI 10
+
+//----------------------------------------------------------------------------//
+//----------------------------- Texture Types -------------------------------//
+//----------------------------------------------------------------------------//
+#define NV_EXTN_TEXTURE_1D 2
+#define NV_EXTN_TEXTURE_1D_ARRAY 3
+#define NV_EXTN_TEXTURE_2D 4
+#define NV_EXTN_TEXTURE_2D_ARRAY 5
+#define NV_EXTN_TEXTURE_3D 6
+#define NV_EXTN_TEXTURE_CUBE 7
+#define NV_EXTN_TEXTURE_CUBE_ARRAY 8
+
+
+//---------------------------------------------------------------------------//
+//----------------FOOTPRINT Enums for NvFootprint* extns---------------------//
+//---------------------------------------------------------------------------//
+#define NV_EXTN_FOOTPRINT_MODE_FINE 0
+#define NV_EXTN_FOOTPRINT_MODE_COARSE 1
diff --git a/samples/test-harness/src/Benchmark.cpp b/samples/test-harness/src/Benchmark.cpp
index e6919ae..02a08ae 100644
--- a/samples/test-harness/src/Benchmark.cpp
+++ b/samples/test-harness/src/Benchmark.cpp
@@ -33,7 +33,7 @@ namespace Benchmark
config.app.benchmarkRunning = true;
}
- void UpdateBenchmark(BenchmarkRun& benchmarkRun, Instrumentation::Performance& perf, Configs::Config& config, Graphics::Globals& gfx, std::ofstream& log)
+ bool UpdateBenchmark(BenchmarkRun& benchmarkRun, Instrumentation::Performance& perf, Configs::Config& config, Graphics::Globals& gfx, std::ofstream& log)
{
config.app.benchmarkProgress = (uint32_t)(((float)benchmarkRun.numFramesBenched / (float)NumBenchmarkFrames) * 100.f);
@@ -119,8 +119,10 @@ namespace Benchmark
}
config.app.benchmarkRunning = false;
+ return true;
}
benchmarkRun.numFramesBenched++;
+ return false;
}
-}
\ No newline at end of file
+}
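The new bool return from `UpdateBenchmark` lets callers detect the frame on which the benchmark completes instead of polling `config.app.benchmarkRunning`. A hypothetical call-site sketch (the stub and loop below are illustrative, not Test Harness code):

```cpp
#include <cstdint>

// Stand-in for Benchmark::UpdateBenchmark: returns true on the frame
// the benchmark finishes (illustrative).
bool UpdateBenchmarkStub(uint32_t& framesBenched, uint32_t numBenchmarkFrames)
{
    if (framesBenched >= numBenchmarkFrames) return true;  // done
    framesBenched++;
    return false;
}

// Drives the stub until completion, returning the number of frames benched.
uint32_t RunBenchmarkLoop(uint32_t numBenchmarkFrames)
{
    uint32_t frames = 0;
    while (!UpdateBenchmarkStub(frames, numBenchmarkFrames)) { /* render a frame */ }
    return frames;
}
```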
diff --git a/samples/test-harness/src/Caches.cpp b/samples/test-harness/src/Caches.cpp
index 800f1ed..f3d425c 100644
--- a/samples/test-harness/src/Caches.cpp
+++ b/samples/test-harness/src/Caches.cpp
@@ -12,7 +12,7 @@
using namespace DirectX;
-#define SCENE_CACHE_VERSION 3
+#define SCENE_CACHE_VERSION 4
namespace Caches
{
@@ -95,6 +95,10 @@ namespace Caches
mesh.name = std::string(buffer);
delete[] buffer;
+ Read(in, &mesh.index, sizeof(uint32_t));
+ Read(in, &mesh.numIndices, sizeof(uint32_t));
+ Read(in, &mesh.numVertices, sizeof(uint32_t));
+
// Read mesh bounding box
Read(in, &mesh.boundingBox, sizeof(rtxgi::AABB));
@@ -114,6 +118,8 @@ namespace Caches
Read(in, &mp.material, sizeof(int));
Read(in, &mp.opaque, sizeof(bool));
Read(in, &mp.doubleSided, sizeof(bool));
+ Read(in, &mp.indexByteOffset, sizeof(uint32_t));
+ Read(in, &mp.vertexByteOffset, sizeof(uint32_t));
Read(in, &mp.boundingBox, sizeof(rtxgi::AABB)); // post-transform bounding box
Read(in, &numVertices);
@@ -245,6 +251,10 @@ namespace Caches
out.write(mesh.name.c_str(), numChars);
out.seekp(out.tellp());
+ Write(out, &mesh.index, sizeof(uint32_t));
+ Write(out, &mesh.numIndices, sizeof(uint32_t));
+ Write(out, &mesh.numVertices, sizeof(uint32_t));
+
// Mesh bounding box
Write(out, &mesh.boundingBox, sizeof(rtxgi::AABB));
@@ -260,6 +270,8 @@ namespace Caches
Write(out, &primitive.material, sizeof(int));
Write(out, &primitive.opaque, sizeof(bool));
Write(out, &primitive.doubleSided, sizeof(bool));
+ Write(out, &primitive.indexByteOffset, sizeof(uint32_t));
+ Write(out, &primitive.vertexByteOffset, sizeof(uint32_t));
Write(out, &primitive.boundingBox, sizeof(rtxgi::AABB));
Write(out, &numVertices);
Write(out, primitive.vertices.data(), sizeof(Graphics::Vertex) * numVertices);
@@ -436,7 +448,7 @@ namespace Caches
Read(in, &cacheVersion, sizeof(uint32_t));
if(cacheVersion != SCENE_CACHE_VERSION)
{
- log << "\n\tWarning: scene cache version '" << cacheVersion << "' does not match expected version '" << SCENE_CACHE_VERSION << "'\n";
+ log << "\n\tWarning: scene cache version '" << cacheVersion << "' does not match expected version '" << SCENE_CACHE_VERSION << "'";
log << "\n\tRebuilding scene cache...";
return false;
}
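Bumping SCENE_CACHE_VERSION whenever the serialized layout changes (as the new mesh/primitive fields here do) is what forces stale caches to rebuild on load. A minimal C++ sketch of that version gate over an in-memory stream (names are illustrative, not the Caches.cpp API):

```cpp
#include <cstdint>
#include <sstream>

constexpr uint32_t kSceneCacheVersion = 4;

// Writes the version header followed by the payload.
void WriteCache(std::ostream& out, uint32_t payload)
{
    out.write(reinterpret_cast<const char*>(&kSceneCacheVersion), sizeof(uint32_t));
    out.write(reinterpret_cast<const char*>(&payload), sizeof(uint32_t));
}

// Returns false (forcing a rebuild) when the stored version does not match.
bool ReadCache(std::istream& in, uint32_t& payload)
{
    uint32_t version = 0;
    in.read(reinterpret_cast<char*>(&version), sizeof(uint32_t));
    if (version != kSceneCacheVersion) return false;  // stale cache: rebuild
    in.read(reinterpret_cast<char*>(&payload), sizeof(uint32_t));
    return true;
}
```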
diff --git a/samples/test-harness/src/Configs.cpp b/samples/test-harness/src/Configs.cpp
index 55a34de..5d841be 100644
--- a/samples/test-harness/src/Configs.cpp
+++ b/samples/test-harness/src/Configs.cpp
@@ -156,9 +156,9 @@ namespace Configs
destination = (rtxgi::EDDGIVolumeProbeVisType)stoi(source);
}
- /*
- * Parse a post process configuration entry.
- */
+ /**
+ * Parse a post process configuration entry.
+ */
bool ParseConfigPostProcessEntry(const std::vector<std::string>& tokens, const std::string& rhs, Config& config, uint32_t lineNumber, std::ofstream& log)
{
// Post process entries have no more than 3 tokens
@@ -199,9 +199,9 @@ namespace Configs
return false;
}
- /*
- * Parse a DDGI configuration entry.
- */
+ /**
+ * Parse a DDGI configuration entry.
+ */
bool ParseConfigDDGIEntry(const std::vector<std::string>& tokens, const std::string& rhs, Config& config, uint32_t lineNumber, std::ofstream& log)
{
// DDGI entries have no more than 6 tokens
@@ -257,7 +257,19 @@ namespace Configs
Store(data, config.ddgi.volumes[volumeIndex].probeClassificationEnabled); return true;
}
}
-
+
+ if (tokens[3].compare("probeVariability") == 0)
+ {
+ if (tokens.size() == 5 && tokens[4].compare("enabled") == 0)
+ {
+ Store(data, config.ddgi.volumes[volumeIndex].probeVariabilityEnabled); return true;
+ }
+ else if (tokens.size() == 5 && tokens[4].compare("threshold") == 0)
+ {
+ Store(data, config.ddgi.volumes[volumeIndex].probeVariabilityThreshold); return true;
+ }
+ }
+
if (tokens[3].compare("infiniteScrolling") == 0)
{
if (tokens.size() == 5 && tokens[4].compare("enabled") == 0)
@@ -289,6 +301,11 @@ namespace Configs
Store(data, config.ddgi.volumes[volumeIndex].textureFormats.dataFormat);
return true;
}
+ else if (tokens[4].compare("variability") == 0 && tokens[5].compare("format") == 0)
+ {
+ Store(data, config.ddgi.volumes[volumeIndex].textureFormats.variabilityFormat);
+ return true;
+ }
}
if (tokens[3].compare("vis") == 0)
@@ -342,6 +359,12 @@ namespace Configs
Store(data, config.ddgi.volumes[volumeIndex].probeDataScale);
return true;
}
+
+ if (tokens[5].compare("probeVariabilityScale") == 0)
+ {
+ Store(data, config.ddgi.volumes[volumeIndex].probeVariabilityScale);
+ return true;
+ }
}
}
}
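The new `probeVariability` branches above follow the parser's positional-token convention: entries like `ddgi volume 0 probeVariability enabled` are split on whitespace and compared by index. A small C++ sketch of that tokenization (illustrative; the real parser's splitting lives elsewhere in Configs.cpp):

```cpp
#include <sstream>
#include <string>
#include <vector>

// Splits a config key like "ddgi volume 0 probeVariability enabled" on
// whitespace, yielding the token positions the parser compares by index.
std::vector<std::string> Tokenize(const std::string& key)
{
    std::vector<std::string> tokens;
    std::istringstream stream(key);
    std::string token;
    while (stream >> token) tokens.push_back(token);
    return tokens;
}
```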
diff --git a/samples/test-harness/src/Direct3D12.cpp b/samples/test-harness/src/Direct3D12.cpp
index 5b685a3..8f1a7cd 100644
--- a/samples/test-harness/src/Direct3D12.cpp
+++ b/samples/test-harness/src/Direct3D12.cpp
@@ -12,6 +12,14 @@
#include "UI.h"
#include "ImageCapture.h"
+#if GFX_NVAPI
+#include "nvapi.h"
+#include "nvShaderExtnEnums.h"
+
+#define NV_SHADER_EXTN_SLOT 999999
+#define NV_SHADER_EXTN_REGISTER_SPACE 999999
+#endif
+
namespace Graphics
{
using namespace DirectX;
@@ -45,6 +53,9 @@ namespace Graphics
return true;
}
+ /**
+ * Convert wide strings to narrow strings.
+ */
void ConvertWideStringToNarrow(std::wstring& wide, std::string& narrow)
{
narrow.resize(wide.size());
@@ -59,11 +70,11 @@ namespace Graphics
/**
* Device creation helper.
*/
- bool CreateDeviceInternal(ID3D12Device6*& device, IDXGIFactory7*& factory, Configs::Config& config)
+ bool CreateDeviceInternal(Globals& d3d, Configs::Config& config)
{
// Create the device
IDXGIAdapter1* adapter = nullptr;
- for (UINT adapterIndex = 0; DXGI_ERROR_NOT_FOUND != factory->EnumAdapters1(adapterIndex, &adapter); ++adapterIndex)
+ for (UINT adapterIndex = 0; DXGI_ERROR_NOT_FOUND != d3d.factory->EnumAdapters1(adapterIndex, &adapter); ++adapterIndex)
{
DXGI_ADAPTER_DESC1 adapterDesc;
adapter->GetDesc1(&adapterDesc);
@@ -72,39 +83,70 @@ namespace Graphics
continue; // Don't select the Basic Render Driver adapter
}
- if (SUCCEEDED(D3D12CreateDevice(adapter, D3D_FEATURE_LEVEL_12_0, _uuidof(ID3D12Device6), (void**)&device)))
+ if (SUCCEEDED(D3D12CreateDevice(adapter, D3D_FEATURE_LEVEL_12_0, _uuidof(ID3D12Device6), (void**)&d3d.device)))
{
// Check if the device supports ray tracing
- D3D12_FEATURE_DATA_D3D12_OPTIONS5 features5;
- HRESULT hr = device->CheckFeatureSupport(D3D12_FEATURE_D3D12_OPTIONS5, &features5, sizeof(D3D12_FEATURE_DATA_D3D12_OPTIONS5));
+ D3D12_FEATURE_DATA_D3D12_OPTIONS5 features5 = {};
+ HRESULT hr = d3d.device->CheckFeatureSupport(D3D12_FEATURE_D3D12_OPTIONS5, &features5, sizeof(D3D12_FEATURE_DATA_D3D12_OPTIONS5));
if (FAILED(hr) || features5.RaytracingTier < D3D12_RAYTRACING_TIER_1_0)
{
- SAFE_RELEASE(device);
- device = nullptr;
+ SAFE_RELEASE(d3d.device);
+ d3d.device = nullptr;
continue;
}
// Check if the device supports SM6.6
- D3D12_FEATURE_DATA_SHADER_MODEL shaderModel;
+ D3D12_FEATURE_DATA_SHADER_MODEL shaderModel = {};
shaderModel.HighestShaderModel = D3D_SHADER_MODEL_6_6;
- hr = device->CheckFeatureSupport(D3D12_FEATURE_SHADER_MODEL, &shaderModel, sizeof(D3D12_FEATURE_DATA_SHADER_MODEL));
+ hr = d3d.device->CheckFeatureSupport(D3D12_FEATURE_SHADER_MODEL, &shaderModel, sizeof(D3D12_FEATURE_DATA_SHADER_MODEL));
if (FAILED(hr))
{
- SAFE_RELEASE(device);
- device = nullptr;
+ SAFE_RELEASE(d3d.device);
+ d3d.device = nullptr;
continue;
}
// Resource binding tier 3 is required for SM6.6 dynamic resources
- D3D12_FEATURE_DATA_D3D12_OPTIONS features;
- hr = device->CheckFeatureSupport(D3D12_FEATURE_D3D12_OPTIONS, &features, sizeof(D3D12_FEATURE_DATA_D3D12_OPTIONS));
+ D3D12_FEATURE_DATA_D3D12_OPTIONS features = {};
+ hr = d3d.device->CheckFeatureSupport(D3D12_FEATURE_D3D12_OPTIONS, &features, sizeof(D3D12_FEATURE_DATA_D3D12_OPTIONS));
if (FAILED(hr) || features.ResourceBindingTier < D3D12_RESOURCE_BINDING_TIER_3)
{
- SAFE_RELEASE(device);
- device = nullptr;
+ SAFE_RELEASE(d3d.device);
+ d3d.device = nullptr;
continue;
}
+ #if GFX_NVAPI
+ // Check for SER HLSL extension support
+ NvAPI_Status status = NvAPI_D3D12_IsNvShaderExtnOpCodeSupported(
+ d3d.device,
+ NV_EXTN_OP_HIT_OBJECT_REORDER_THREAD,
+ &d3d.supportsShaderExecutionReordering);
+
+ if (status == NVAPI_OK && d3d.supportsShaderExecutionReordering)
+ {
+ // Check for SER device support
+ NVAPI_D3D12_RAYTRACING_THREAD_REORDERING_CAPS ReorderCaps = NVAPI_D3D12_RAYTRACING_THREAD_REORDERING_CAP_NONE;
+ status = NvAPI_D3D12_GetRaytracingCaps(
+ d3d.device,
+ NVAPI_D3D12_RAYTRACING_CAPS_TYPE_THREAD_REORDERING,
+ &ReorderCaps,
+ sizeof(ReorderCaps));
+
+ if (status != NVAPI_OK || ReorderCaps == NVAPI_D3D12_RAYTRACING_THREAD_REORDERING_CAP_NONE)
+ {
+ d3d.supportsShaderExecutionReordering = false;
+ }
+ }
+ #endif
+
+ D3D12_FEATURE_DATA_D3D12_OPTIONS1 waveFeatures = {};
+ hr = d3d.device->CheckFeatureSupport(D3D12_FEATURE_D3D12_OPTIONS1, &waveFeatures, sizeof(waveFeatures));
+ if (SUCCEEDED(hr))
+ {
+ d3d.features.waveLaneCount = waveFeatures.WaveLaneCountMin;
+ }
+
// Set the graphics API name
config.app.api = "Direct3D 12";
@@ -112,12 +154,12 @@ namespace Graphics
std::wstring name(adapterDesc.Description);
ConvertWideStringToNarrow(name, config.app.gpuName);
#ifdef GFX_NAME_OBJECTS
- device->SetName(name.c_str());
+ d3d.device->SetName(name.c_str());
#endif
break;
}
- if (device == nullptr)
+ if (d3d.device == nullptr)
{
return false; // Didn't find a device that supports ray tracing
}
@@ -426,33 +468,42 @@ namespace Graphics
}
/**
- * Create the index buffer for a mesh primitive.
+ * Create the index buffer for a mesh.
+ * Copy the index data to the upload buffer and schedule a copy to the device buffer.
*/
- bool CreateIndexBuffer(Globals& d3d, const Scenes::MeshPrimitive& primitive, ID3D12Resource** device, ID3D12Resource** upload, D3D12_INDEX_BUFFER_VIEW& view)
+ bool CreateIndexBuffer(Globals& d3d, const Scenes::Mesh& mesh, ID3D12Resource** device, ID3D12Resource** upload, D3D12_INDEX_BUFFER_VIEW& view)
{
// Create the index buffer upload resource
- UINT size = static_cast<UINT>(primitive.indices.size()) * sizeof(UINT);
- BufferDesc desc = { size, 0, EHeapType::UPLOAD, D3D12_RESOURCE_STATE_GENERIC_READ, D3D12_RESOURCE_FLAG_NONE };
+ UINT sizeInBytes = mesh.numIndices * sizeof(UINT);
+ BufferDesc desc = { sizeInBytes, 0, EHeapType::UPLOAD, D3D12_RESOURCE_STATE_GENERIC_READ, D3D12_RESOURCE_FLAG_NONE };
if (!CreateBuffer(d3d, desc, upload)) return false;
// Create the index buffer device resource
- desc = { size, 0, EHeapType::DEFAULT, D3D12_RESOURCE_STATE_COMMON, D3D12_RESOURCE_FLAG_NONE };
+ desc = { sizeInBytes, 0, EHeapType::DEFAULT, D3D12_RESOURCE_STATE_COMMON, D3D12_RESOURCE_FLAG_NONE };
if (!CreateBuffer(d3d, desc, device)) return false;
// Initialize the index buffer view
view.Format = DXGI_FORMAT_R32_UINT;
- view.SizeInBytes = size;
+ view.SizeInBytes = sizeInBytes;
view.BufferLocation = (*device)->GetGPUVirtualAddress();
- // Copy the index data to the upload buffer
+ // Copy the index data of each mesh primitive to the upload buffer
UINT8* pData = nullptr;
D3D12_RANGE readRange = {};
D3DCHECK((*upload)->Map(0, &readRange, reinterpret_cast<void**>(&pData)));
- memcpy(pData, primitive.indices.data(), size);
+
+ for (UINT primitiveIndex = 0; primitiveIndex < static_cast<UINT>(mesh.primitives.size()); primitiveIndex++)
+ {
+ // Get the mesh primitive and copy its indices to the upload buffer
+ const Scenes::MeshPrimitive& primitive = mesh.primitives[primitiveIndex];
+
+ UINT size = static_cast<UINT>(primitive.indices.size()) * sizeof(UINT);
+ memcpy(pData + primitive.indexByteOffset, primitive.indices.data(), size);
+ }
(*upload)->Unmap(0, nullptr);
// Schedule a copy of the upload buffer to the device buffer
- d3d.cmdList->CopyBufferRegion(*device, 0, *upload, 0, size);
+ d3d.cmdList->CopyBufferRegion(*device, 0, *upload, 0, sizeInBytes);
// Transition the default heap resource to generic read after the copy is complete
D3D12_RESOURCE_BARRIER barrier = {};
@@ -468,34 +519,43 @@ namespace Graphics
}
/**
- * Create the vertex buffer for a mesh primitive.
+ * Create the vertex buffer for a mesh.
+ * Copy the vertex data to the upload buffer and schedule a copy to the device buffer.
*/
- bool CreateVertexBuffer(Globals& d3d, const Scenes::MeshPrimitive& primitive, ID3D12Resource** device, ID3D12Resource** upload, D3D12_VERTEX_BUFFER_VIEW& view)
+ bool CreateVertexBuffer(Globals& d3d, const Scenes::Mesh& mesh, ID3D12Resource** device, ID3D12Resource** upload, D3D12_VERTEX_BUFFER_VIEW& view)
{
- // Create the vertex buffer resource
+ // Create the vertex buffer upload resource
UINT stride = sizeof(Vertex);
- UINT size = static_cast<UINT>(primitive.vertices.size()) * stride;
- BufferDesc desc = { size, 0, EHeapType::UPLOAD, D3D12_RESOURCE_STATE_GENERIC_READ, D3D12_RESOURCE_FLAG_NONE };
+ UINT sizeInBytes = mesh.numVertices * stride;
+ BufferDesc desc = { sizeInBytes, 0, EHeapType::UPLOAD, D3D12_RESOURCE_STATE_GENERIC_READ, D3D12_RESOURCE_FLAG_NONE };
if (!CreateBuffer(d3d, desc, upload)) return false;
// Create the vertex buffer device resource
- desc = { size, 0, EHeapType::DEFAULT, D3D12_RESOURCE_STATE_COMMON, D3D12_RESOURCE_FLAG_NONE };
+ desc = { sizeInBytes, 0, EHeapType::DEFAULT, D3D12_RESOURCE_STATE_COMMON, D3D12_RESOURCE_FLAG_NONE };
if (!CreateBuffer(d3d, desc, device)) return false;
// Initialize the vertex buffer view
view.StrideInBytes = stride;
- view.SizeInBytes = size;
+ view.SizeInBytes = sizeInBytes;
view.BufferLocation = (*device)->GetGPUVirtualAddress();
- // Copy the vertex data to the upload buffer
+ // Copy the vertex data of each mesh primitive to the upload buffer
UINT8* pData = nullptr;
D3D12_RANGE readRange = {};
D3DCHECK((*upload)->Map(0, &readRange, reinterpret_cast<void**>(&pData)));
- memcpy(pData, primitive.vertices.data(), size);
+
+ for (UINT primitiveIndex = 0; primitiveIndex < static_cast<UINT>(mesh.primitives.size()); primitiveIndex++)
+ {
+ // Get the mesh primitive and copy its vertices to the upload buffer
+ const Scenes::MeshPrimitive& primitive = mesh.primitives[primitiveIndex];
+
+ UINT size = static_cast<UINT>(primitive.vertices.size()) * stride;
+ memcpy(pData + primitive.vertexByteOffset, primitive.vertices.data(), size);
+ }
(*upload)->Unmap(0, nullptr);
// Schedule a copy of the upload buffer to the device buffer
- d3d.cmdList->CopyBufferRegion(*device, 0, *upload, 0, size);
+ d3d.cmdList->CopyBufferRegion(*device, 0, *upload, 0, sizeInBytes);
// Transition the default heap resource to generic read after the copy is complete
D3D12_RESOURCE_BARRIER barrier = {};
@@ -511,30 +571,40 @@ namespace Graphics
}
/**
- * Create a bottom level acceleration structure for a mesh primitive.
+ * Create a bottom level acceleration structure for a mesh.
*/
- bool CreateBLAS(Globals& d3d, Resources& resources, const Scenes::MeshPrimitive& primitive, const std::string debugName = "")
+ bool CreateBLAS(Globals& d3d, Resources& resources, const Scenes::Mesh& mesh)
{
- D3D12_RAYTRACING_ACCELERATION_STRUCTURE_BUILD_FLAGS buildFlags = D3D12_RAYTRACING_ACCELERATION_STRUCTURE_BUILD_FLAG_PREFER_FAST_TRACE;
+ // Describe the mesh primitives
+ std::vector<D3D12_RAYTRACING_GEOMETRY_DESC> primitives;
- // Describe the mesh primitive geometry
D3D12_RAYTRACING_GEOMETRY_DESC desc = {};
desc.Type = D3D12_RAYTRACING_GEOMETRY_TYPE_TRIANGLES;
- desc.Triangles.VertexBuffer.StartAddress = resources.sceneVBs[primitive.index]->GetGPUVirtualAddress();
- desc.Triangles.VertexBuffer.StrideInBytes = resources.sceneVBViews[primitive.index].StrideInBytes;
- desc.Triangles.VertexCount = static_cast<UINT>(primitive.vertices.size());
- desc.Triangles.VertexFormat = DXGI_FORMAT_R32G32B32_FLOAT;
- desc.Triangles.IndexBuffer = resources.sceneIBs[primitive.index]->GetGPUVirtualAddress();
- desc.Triangles.IndexFormat = resources.sceneIBViews[primitive.index].Format;
- desc.Triangles.IndexCount = static_cast<UINT>(primitive.indices.size());
- desc.Flags = primitive.opaque ? D3D12_RAYTRACING_GEOMETRY_FLAG_OPAQUE : D3D12_RAYTRACING_GEOMETRY_FLAG_NONE;
+ for (UINT primitiveIndex = 0; primitiveIndex < static_cast<UINT>(mesh.primitives.size()); primitiveIndex++)
+ {
+ // Get the mesh primitive
+ const Scenes::MeshPrimitive& primitive = mesh.primitives[primitiveIndex];
+
+ desc.Triangles.VertexBuffer.StartAddress = resources.sceneVBs[mesh.index]->GetGPUVirtualAddress() + primitive.vertexByteOffset;
+ desc.Triangles.VertexBuffer.StrideInBytes = resources.sceneVBViews[mesh.index].StrideInBytes;
+ desc.Triangles.VertexCount = static_cast<UINT>(primitive.vertices.size());
+ desc.Triangles.VertexFormat = DXGI_FORMAT_R32G32B32_FLOAT;
+ desc.Triangles.IndexBuffer = resources.sceneIBs[mesh.index]->GetGPUVirtualAddress() + primitive.indexByteOffset;
+ desc.Triangles.IndexFormat = resources.sceneIBViews[mesh.index].Format;
+ desc.Triangles.IndexCount = static_cast<UINT>(primitive.indices.size());
+ desc.Flags = primitive.opaque ? D3D12_RAYTRACING_GEOMETRY_FLAG_OPAQUE : D3D12_RAYTRACING_GEOMETRY_FLAG_NONE;
+
+ primitives.push_back(desc);
+ }
+
+ D3D12_RAYTRACING_ACCELERATION_STRUCTURE_BUILD_FLAGS buildFlags = D3D12_RAYTRACING_ACCELERATION_STRUCTURE_BUILD_FLAG_PREFER_FAST_TRACE;
// Describe the bottom level acceleration structure inputs
D3D12_BUILD_RAYTRACING_ACCELERATION_STRUCTURE_INPUTS asInputs = {};
asInputs.Type = D3D12_RAYTRACING_ACCELERATION_STRUCTURE_TYPE_BOTTOM_LEVEL;
asInputs.DescsLayout = D3D12_ELEMENTS_LAYOUT_ARRAY;
- asInputs.NumDescs = 1;
- asInputs.pGeometryDescs = &desc;
+ asInputs.NumDescs = static_cast<UINT>(primitives.size());
+ asInputs.pGeometryDescs = primitives.data();
asInputs.Flags = buildFlags;
// Get the size requirements for the BLAS buffer
@@ -552,11 +622,10 @@ namespace Graphics
D3D12_RESOURCE_STATE_COMMON,
D3D12_RESOURCE_FLAG_ALLOW_UNORDERED_ACCESS,
};
- if (!CreateBuffer(d3d, blasScratchDesc, &resources.blas[primitive.index].scratch)) return false;
+ if (!CreateBuffer(d3d, blasScratchDesc, &resources.blas[mesh.index].scratch)) return false;
#ifdef GFX_NAME_OBJECTS
- std::wstring name = std::wstring(debugName.begin(), debugName.end());
- name.append(L" (scratch)");
- resources.blas[primitive.index].scratch->SetName(name.c_str());
+ std::wstring name = L"BLAS: " + std::wstring(mesh.name.begin(), mesh.name.end()) + L" (scratch)";
+ resources.blas[mesh.index].scratch->SetName(name.c_str());
#endif
// Create the BLAS buffer
@@ -568,17 +637,17 @@ namespace Graphics
D3D12_RESOURCE_STATE_RAYTRACING_ACCELERATION_STRUCTURE,
D3D12_RESOURCE_FLAG_ALLOW_UNORDERED_ACCESS,
};
- if (!CreateBuffer(d3d, blasDesc, &resources.blas[primitive.index].as)) return false;
+ if (!CreateBuffer(d3d, blasDesc, &resources.blas[mesh.index].as)) return false;
#ifdef GFX_NAME_OBJECTS
- name = std::wstring(debugName.begin(), debugName.end());
- resources.blas[primitive.index].as->SetName(name.c_str());
+ name = L"BLAS: " + std::wstring(mesh.name.begin(), mesh.name.end());
+ resources.blas[mesh.index].as->SetName(name.c_str());
#endif
// Describe and build the BLAS
D3D12_BUILD_RAYTRACING_ACCELERATION_STRUCTURE_DESC buildDesc = {};
buildDesc.Inputs = asInputs;
- buildDesc.ScratchAccelerationStructureData = resources.blas[primitive.index].scratch->GetGPUVirtualAddress();
- buildDesc.DestAccelerationStructureData = resources.blas[primitive.index].as->GetGPUVirtualAddress();
+ buildDesc.ScratchAccelerationStructureData = resources.blas[mesh.index].scratch->GetGPUVirtualAddress();
+ buildDesc.DestAccelerationStructureData = resources.blas[mesh.index].as->GetGPUVirtualAddress();
d3d.cmdList->BuildRaytracingAccelerationStructure(&buildDesc, 0, nullptr);
@@ -999,6 +1068,24 @@ namespace Graphics
}
#endif
+ #if GFX_NVAPI
+ // Fake UAV for NVAPI
+ D3D12_DESCRIPTOR_RANGE nvapiRange = {};
+ nvapiRange.BaseShaderRegister = NV_SHADER_EXTN_SLOT;
+ nvapiRange.NumDescriptors = 1;
+ nvapiRange.RegisterSpace = NV_SHADER_EXTN_REGISTER_SPACE;
+ nvapiRange.RangeType = D3D12_DESCRIPTOR_RANGE_TYPE_UAV;
+ nvapiRange.OffsetInDescriptorsFromTableStart = D3D12_DESCRIPTOR_RANGE_OFFSET_APPEND;
+
+ // Root Parameter 2 (or 4): NVAPI
+ D3D12_ROOT_PARAMETER param = {};
+ param.ParameterType = D3D12_ROOT_PARAMETER_TYPE_DESCRIPTOR_TABLE;
+ param.ShaderVisibility = D3D12_SHADER_VISIBILITY_ALL;
+ param.DescriptorTable.NumDescriptorRanges = 1;
+ param.DescriptorTable.pDescriptorRanges = &nvapiRange;
+ rootParameters.push_back(param);
+ #endif
+
// Describe the root signature
D3D12_ROOT_SIGNATURE_DESC desc = {};
desc.NumParameters = static_cast<UINT>(rootParameters.size());
@@ -1105,11 +1192,13 @@ namespace Graphics
SAFE_RELEASE(resources.lightsSTB);
SAFE_RELEASE(resources.lightsSTBUpload);
SAFE_RELEASE(resources.materialsSTB);
- SAFE_RELEASE(resources.materialIndicesRB);
+ SAFE_RELEASE(resources.meshOffsetsRB);
+ SAFE_RELEASE(resources.geometryDataRB);
resources.cameraCBPtr = nullptr;
resources.lightsSTBPtr = nullptr;
resources.materialsSTBPtr = nullptr;
- resources.materialIndicesRBPtr = nullptr;
+ resources.meshOffsetsRBPtr = nullptr;
+ resources.geometryDataRBPtr = nullptr;
// Render Targets
SAFE_RELEASE(resources.rt.GBufferA);
@@ -1172,6 +1261,10 @@ namespace Graphics
SAFE_RELEASE(d3d.cmdQueue);
SAFE_RELEASE(d3d.device);
SAFE_RELEASE(d3d.factory);
+
+ #if GFX_NVAPI
+ NvAPI_Unload();
+ #endif
}
//----------------------------------------------------------------------------------------------------------
@@ -1212,7 +1305,7 @@ namespace Graphics
*/
bool CreateSceneLightsBuffer(Globals& d3d, Resources& resources, const Scenes::Scene& scene)
{
- UINT size = ALIGN(D3D12_CONSTANT_BUFFER_DATA_PLACEMENT_ALIGNMENT, Scenes::Light::GetGPUDataSize() * static_cast<UINT>(scene.lights.size()));
+ UINT size = ALIGN(D3D12_RAW_UAV_SRV_BYTE_ALIGNMENT, Scenes::Light::GetGPUDataSize() * static_cast<UINT>(scene.lights.size()));
if (size == 0) return true; // scenes with no lights are valid
// Create the lights upload buffer resource
@@ -1269,12 +1362,12 @@ namespace Graphics
}
/**
- * Create the scene materials buffers.
+ * Create the scene materials buffer.
*/
- bool CreateSceneMaterialsBuffers(Globals& d3d, Resources& resources, const Scenes::Scene& scene)
+ bool CreateSceneMaterialsBuffer(Globals& d3d, Resources& resources, const Scenes::Scene& scene)
{
// Create the materials buffer upload resource
- UINT size = ALIGN(D3D12_CONSTANT_BUFFER_DATA_PLACEMENT_ALIGNMENT, Scenes::Material::GetGPUDataSize() * static_cast<UINT>(scene.materials.size()));
+ UINT size = ALIGN(D3D12_RAW_UAV_SRV_BYTE_ALIGNMENT, Scenes::Material::GetGPUDataSize() * static_cast<UINT>(scene.materials.size()));
BufferDesc desc = { size, 0, EHeapType::UPLOAD, D3D12_RESOURCE_STATE_GENERIC_READ, D3D12_RESOURCE_FLAG_NONE };
if (!CreateBuffer(d3d, desc, &resources.materialsSTBUpload)) return false;
#ifdef GFX_NAME_OBJECTS
@@ -1343,56 +1436,120 @@ namespace Graphics
handle.ptr = resources.srvDescHeapStart.ptr + (DescriptorHeapOffsets::STB_MATERIALS * resources.srvDescHeapEntrySize);
d3d.device->CreateShaderResourceView(resources.materialsSTB, &srvDesc, handle);
- // Material Indices
+ return true;
+ }
+
+ /**
+ * Create the scene material indexing buffers.
+ */
+ bool CreateSceneMaterialIndexingBuffers(Globals& d3d, Resources& resources, const Scenes::Scene& scene)
+ {
+ // Mesh Offsets
- // Create the material indices upload buffer resource
- size = ALIGN(D3D12_CONSTANT_BUFFER_DATA_PLACEMENT_ALIGNMENT, sizeof(UINT) * scene.numMeshPrimitives);
- desc = { size, 0, EHeapType::UPLOAD, D3D12_RESOURCE_STATE_GENERIC_READ, D3D12_RESOURCE_FLAG_NONE };
- if (!CreateBuffer(d3d, desc, &resources.materialIndicesRBUpload)) return false;
+ // Create the mesh offsets upload buffer resource
+ UINT meshOffsetsSize = ALIGN(D3D12_RAW_UAV_SRV_BYTE_ALIGNMENT, sizeof(UINT) * static_cast<UINT>(scene.meshes.size()));
+ BufferDesc desc = { meshOffsetsSize, 0, EHeapType::UPLOAD, D3D12_RESOURCE_STATE_GENERIC_READ, D3D12_RESOURCE_FLAG_NONE };
+ if (!CreateBuffer(d3d, desc, &resources.meshOffsetsRBUpload)) return false;
#ifdef GFX_NAME_OBJECTS
- resources.materialIndicesRBUpload->SetName(L"Material Indices Upload Raw Buffer");
+ resources.meshOffsetsRBUpload->SetName(L"Mesh Offsets Upload ByteAddressBuffer");
#endif
- // Create the material indices device buffer resource
- desc = { size, 0, EHeapType::DEFAULT, D3D12_RESOURCE_STATE_COMMON, D3D12_RESOURCE_FLAG_NONE };
- if (!CreateBuffer(d3d, desc, &resources.materialIndicesRB)) return false;
+ // Create the mesh offsets device buffer resource
+ desc = { meshOffsetsSize, 0, EHeapType::DEFAULT, D3D12_RESOURCE_STATE_COMMON, D3D12_RESOURCE_FLAG_NONE };
+ if (!CreateBuffer(d3d, desc, &resources.meshOffsetsRB)) return false;
+ #ifdef GFX_NAME_OBJECTS
+ resources.meshOffsetsRB->SetName(L"Mesh Offsets ByteAddressBuffer");
+ #endif
+
+ // Geometry Data
+
+ // Create the geometry (mesh primitive) data upload buffer resource
+ UINT geometryDataSize = ALIGN(D3D12_RAW_UAV_SRV_BYTE_ALIGNMENT, sizeof(GeometryData) * scene.numMeshPrimitives);
+ desc = { geometryDataSize, 0, EHeapType::UPLOAD, D3D12_RESOURCE_STATE_GENERIC_READ, D3D12_RESOURCE_FLAG_NONE };
+ if (!CreateBuffer(d3d, desc, &resources.geometryDataRBUpload)) return false;
+ #ifdef GFX_NAME_OBJECTS
+ resources.geometryDataRBUpload->SetName(L"Geometry Data Upload ByteAddressBuffer");
+ #endif
+
+ // Create the geometry (mesh primitive) data device buffer resource
+ desc = { geometryDataSize, 0, EHeapType::DEFAULT, D3D12_RESOURCE_STATE_COMMON, D3D12_RESOURCE_FLAG_NONE };
+ if (!CreateBuffer(d3d, desc, &resources.geometryDataRB)) return false;
#ifdef GFX_NAME_OBJECTS
- resources.materialIndicesRB->SetName(L"Material Indices Raw Buffer");
+ resources.geometryDataRB->SetName(L"Geometry Data ByteAddressBuffer");
#endif
- // Copy the material indices to the upload buffer
- offset = 0;
- D3DCHECK(resources.materialIndicesRBUpload->Map(0, &readRange, reinterpret_cast<void**>(&resources.materialIndicesRBPtr)));
+ // Copy the mesh offsets and geometry data to the upload buffers
+ UINT primitiveOffset = 0;
+ D3D12_RANGE readRange = {};
+ D3DCHECK(resources.meshOffsetsRBUpload->Map(0, &readRange, reinterpret_cast<void**>(&resources.meshOffsetsRBPtr)));
+ D3DCHECK(resources.geometryDataRBUpload->Map(0, &readRange, reinterpret_cast<void**>(&resources.geometryDataRBPtr)));
+
+ UINT8* meshOffsetsAddress = resources.meshOffsetsRBPtr;
+ UINT8* geometryDataAddress = resources.geometryDataRBPtr;
for (UINT meshIndex = 0; meshIndex < static_cast<UINT>(scene.meshes.size()); meshIndex++)
{
- const Scenes::Mesh mesh = scene.meshes[meshIndex];
- for (UINT primitiveIndex = 0; primitiveIndex < static_cast<UINT>(scene.meshes[meshIndex].primitives.size()); primitiveIndex++)
+ // Get the mesh
+ const Scenes::Mesh& mesh = scene.meshes[meshIndex];
+
+ // Copy the mesh offset to the upload buffer
+ UINT meshOffset = primitiveOffset * sizeof(GeometryData);
+ memcpy(meshOffsetsAddress, &meshOffset, sizeof(UINT));
+ meshOffsetsAddress += sizeof(UINT);
+
+ for (UINT primitiveIndex = 0; primitiveIndex < static_cast<UINT>(mesh.primitives.size()); primitiveIndex++)
{
- const Scenes::MeshPrimitive& primitive = scene.meshes[meshIndex].primitives[primitiveIndex];
- memcpy(resources.materialIndicesRBPtr + offset, &primitive.material, sizeof(UINT));
- offset += sizeof(UINT);
+ // Get the mesh primitive and copy its material index to the upload buffer
+ const Scenes::MeshPrimitive& primitive = mesh.primitives[primitiveIndex];
+
+ GeometryData data;
+ data.materialIndex = primitive.material;
+ data.indexByteAddress = primitive.indexByteOffset;
+ data.vertexByteAddress = primitive.vertexByteOffset;
+ memcpy(geometryDataAddress, &data, sizeof(GeometryData));
+
+ geometryDataAddress += sizeof(GeometryData);
+ primitiveOffset++;
}
}
- resources.materialIndicesRBUpload->Unmap(0, nullptr);
+ resources.meshOffsetsRBUpload->Unmap(0, nullptr);
+ resources.geometryDataRBUpload->Unmap(0, nullptr);
- // Schedule a copy of the upload buffer to the device buffer
- d3d.cmdList->CopyBufferRegion(resources.materialIndicesRB, 0, resources.materialIndicesRBUpload, 0, size);
+ // Schedule a copy of the upload buffers to the device buffers
+ d3d.cmdList->CopyBufferRegion(resources.meshOffsetsRB, 0, resources.meshOffsetsRBUpload, 0, meshOffsetsSize);
+ d3d.cmdList->CopyBufferRegion(resources.geometryDataRB, 0, resources.geometryDataRBUpload, 0, geometryDataSize);
- // Transition the default heap resource to generic read after the copy is complete
- barrier.Transition.pResource = resources.materialIndicesRB;
+ // Transition the default heap resources to generic read after the copies are complete
+ std::vector<D3D12_RESOURCE_BARRIER> barriers;
- d3d.cmdList->ResourceBarrier(1, &barrier);
+ D3D12_RESOURCE_BARRIER barrier = {};
+ barrier.Type = D3D12_RESOURCE_BARRIER_TYPE_TRANSITION;
+ barrier.Transition.StateBefore = D3D12_RESOURCE_STATE_COPY_DEST;
+ barrier.Transition.StateAfter = D3D12_RESOURCE_STATE_GENERIC_READ;
+ barrier.Transition.Subresource = D3D12_RESOURCE_BARRIER_ALL_SUBRESOURCES;
+
+ barrier.Transition.pResource = resources.meshOffsetsRB;
+ barriers.push_back(barrier);
+ barrier.Transition.pResource = resources.geometryDataRB;
+ barriers.push_back(barrier);
+
+ d3d.cmdList->ResourceBarrier(static_cast<UINT>(barriers.size()), barriers.data());
- // Add the material indices ByteAddressBuffer SRV to the descriptor heap
- srvDesc = {};
+ // Add the mesh offsets ByteAddressBuffer SRV to the descriptor heap
+ D3D12_SHADER_RESOURCE_VIEW_DESC srvDesc = {};
srvDesc.Format = DXGI_FORMAT_R32_TYPELESS;
srvDesc.ViewDimension = D3D12_SRV_DIMENSION_BUFFER;
- srvDesc.Buffer.NumElements = scene.numMeshPrimitives;
srvDesc.Buffer.Flags = D3D12_BUFFER_SRV_FLAG_RAW;
srvDesc.Shader4ComponentMapping = D3D12_DEFAULT_SHADER_4_COMPONENT_MAPPING;
- handle.ptr = resources.srvDescHeapStart.ptr + (DescriptorHeapOffsets::SRV_MATERIAL_INDICES * resources.srvDescHeapEntrySize);
- d3d.device->CreateShaderResourceView(resources.materialIndicesRB, &srvDesc, handle);
+ D3D12_CPU_DESCRIPTOR_HANDLE handle;
+ srvDesc.Buffer.NumElements = static_cast<UINT>(scene.meshes.size());
+ handle.ptr = resources.srvDescHeapStart.ptr + (DescriptorHeapOffsets::SRV_MESH_OFFSETS * resources.srvDescHeapEntrySize);
+ d3d.device->CreateShaderResourceView(resources.meshOffsetsRB, &srvDesc, handle);
+
+ // Add the geometry (mesh primitive) data ByteAddressBuffer SRV to the descriptor heap
+ srvDesc.Buffer.NumElements = scene.numMeshPrimitives * (sizeof(GeometryData) / sizeof(UINT));
+ handle.ptr = resources.srvDescHeapStart.ptr + (DescriptorHeapOffsets::SRV_GEOMETRY_DATA * resources.srvDescHeapEntrySize);
+ d3d.device->CreateShaderResourceView(resources.geometryDataRB, &srvDesc, handle);
return true;
}
@@ -1403,7 +1560,7 @@ namespace Graphics
bool CreateSceneInstancesBuffer(Globals& d3d, Resources& resources, const std::vector<D3D12_RAYTRACING_INSTANCE_DESC>& instances)
{
// Create the TLAS instance upload buffer resource
- UINT size = static_cast