The main goal of the Mesh shader is to increase the flexibility and performance of the geometry pipeline. Mesh shaders subsume most aspects of Vertex and Geometry shaders into one shader stage by processing batches of vertices and primitives before the rasterizer. They are additionally capable of amplifying and culling geometry.
There will additionally be a new Amplification shader stage, which enables current tessellation scenarios. Eventually the entire vertex pipeline will be two stages: an Amplification shader followed by a Mesh shader.
In recent years, developers have proposed geometry pipelines that process index buffers with a compute shader before vertex shaders. This compels us to revisit geometry pipelines and move towards accommodating this type of pipeline as part of the API.
Mesh shaders require developers to refactor their geometry toolchain by separating geometry into compressed batches (meshlets) that fit well into the Mesh shader model and turn off hardware index reads altogether. Titles can convert meshlets into regular index buffers for vertex shader fallback.
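The fallback conversion mentioned above can be sketched in a few lines of C++. This is a minimal illustration, not a prescribed format: the `Meshlet` struct, its field names, and `MeshletsToIndexBuffer` are hypothetical, and the sketch assumes each meshlet stores a list of indices into the mesh's vertex buffer plus small meshlet-local indices for its triangles.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical meshlet layout, for illustration only.
struct Meshlet {
    std::vector<uint32_t> vertices;   // meshlet-local slot -> index into the mesh vertex buffer
    std::vector<uint8_t>  triangles;  // three meshlet-local slots per triangle
};

// Expands meshlets into one regular triangle-list index buffer that a
// vertex shader path can consume.
std::vector<uint32_t> MeshletsToIndexBuffer(const std::vector<Meshlet>& meshlets) {
    std::vector<uint32_t> indices;
    for (const Meshlet& m : meshlets) {
        assert(m.triangles.size() % 3 == 0);  // whole triangles only
        for (uint8_t local : m.triangles) {
            assert(static_cast<std::size_t>(local) < m.vertices.size());
            indices.push_back(m.vertices[local]);  // resolve local slot to global index
        }
    }
    return indices;
}
```

Because the output is an ordinary index buffer over the original vertex buffer, the same vertex data can serve both the Mesh shader path and the vertex shader fallback.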
A Mesh shader is a new type of shader that combines vertex and primitive processing. VS, HS, DS, and GS shader stages are replaced with Amplification Shader and Mesh Shader. Roughly, Mesh shaders replace VS+GS or DS+GS shaders and Amplification shaders replace VS+HS.
The Amplification shader allows users to decide how many Mesh shader groups to run and passes data to those groups. The intent for the Amplification shader is to eventually replace hardware tessellators.
Mesh shader threadgroups have access to groupshared memory like compute shaders. Shader writers are free to declare groupshared arrays up to the maximum allowed size. The groupshared limit for mesh shaders is reduced to 28k, slightly smaller than the 32k limit which applies to compute shaders.
Mesh shader outputs comprise vertices and primitives. Unlike, say, a vertex shader, there is no implicit association between a threadgroup thread and an output vertex or primitive. For example, a threadgroup may have 3 threads, each thread outputting 6 vertices and 2 primitives, for a total of 18 vertices and 6 primitives per threadgroup. This gives the shader writer sufficient freedom to balance the ALU load across threads and avoid wasting lanes.
The number of output vertices and primitives must be specified at runtime by the shader by calling SetMeshOutputCounts. Otherwise, the shader will not output any mesh, and writing to vertex or primitive attributes or indices results in undefined behavior.
When a Pixel shader is present in the Mesh shader to Pixel shader pipeline, signature elements are no longer aligned between stages by packing location. Instead, Pixel shader input elements must be matched to output elements from the mesh shader by the semantic name, system value type and semantic index. Attributes that are uniform for a primitive, including system value attributes such as SV_RenderTargetArrayIndex, SV_ViewportArrayIndex, and SV_CullPrimitive, should now be placed in the attribute structure for primitives, rather than the attribute structure for vertices. Attributes used with GetAttributeAtVertex should be placed in the attribute structure for vertices, and marked with the nointerpolation modifier. If a pixel shader input is aligned with a mesh shader per-primitive output and that attribute is not marked as nointerpolation, the driver will still force the attribute to nointerpolation in order to maintain functionality.
Vertex order is determined by the order of the vertex indices for the primitive. The first vertex is the one referenced by the first index in this vector. When the term provoking vertex is used in other feature descriptions, for the mesh shader pipeline, it means the first vertex. This order applies to the component order of SV_Barycentrics and the index passed to GetAttributeAtVertex. If a nointerpolation attribute in the vertices is read directly in the pixel shader, its value comes from the first vertex specified in the vertex indices for this primitive. Primitive attributes do not require any interpolation modifiers to be specified, nor do they have any effect.
To support the ViewID feature of D3D12, there is a system value input SV_ViewID which specifies the current view being computed by the Mesh shader group. The model exposed to the user is the same as with the shader stages of the existing vertex pipeline. You write your mesh shader as if the group is computing only one view, with some constraints.
In order to enable single-pass multi-view implementation for Tier 3 View Instancing, certain constraints will be enforced on what is allowed to be dependent on SV_ViewID. This ensures that vertices and primitives produced for each view align across views, while vertex and primitive attributes can vary per-view. A new primitive attribute SV_CullPrimitive allows you to cull primitives on a per-view basis, which translates to a view mask on multi-view implementations. The compiler will track SV_ViewID dependent attributes and groupshared memory so that the D3D runtime can validate the shader against the attribute and groupshared limits for the view count on Tier 3 View Instancing. More details are in the SV_ViewID section.
Programmable amplification support is done using an Amplification shader. This shader stage can be used to replace the hardware tessellator. The idea is to be able to launch a variable number of child Mesh shaders to enable amplification factors larger than a single Mesh shader can support. An Amplification shader is bound with a Mesh shader in a Pipeline State Object, and therefore an Amplification shader can only launch one type of child Mesh shader, though it can launch many instances of it if needed.
Each child Mesh shader has access to the data structure created by the parent Amplification shader. This is similar to how per-patch attributes are currently passed into the domain shader.
Pixel shader invocations (indeed, any shader invocations) are not guaranteed to execute in order. Rather, only the resulting rendertarget/depth/stencil buffer accesses must honor any specified ordering over any given target location, e.g. a rendertarget/depth sample. If the application needs UAV accesses during pixel shader invocation to be ordered over any given target location, the RasterizerOrderedViews (ROV) feature needs to be used in the pixel shader.
If the pipeline state includes a mesh shader but no amplification shader, the outputs of each thread group are retired by the rasterizer sequentially with respect to neighboring thread groups. So all primitives generated by the DispatchMesh() API call are retired by the rasterizer in a fully defined order. The ordering of thread groups is defined as:
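As an illustration only, the sketch below enumerates thread group IDs under the assumption that the order is a nested loop over z, then y, then x, so that x varies fastest. That nesting and the names `GroupId` and `RetireOrder` are assumptions made for this example, not text taken from this section.

```cpp
#include <array>
#include <cstdint>
#include <vector>

using GroupId = std::array<uint32_t, 3>;  // {x, y, z}

// Lists thread group IDs in the ASSUMED retire order:
// x varies fastest, then y, then z.
std::vector<GroupId> RetireOrder(uint32_t countX, uint32_t countY, uint32_t countZ) {
    std::vector<GroupId> order;
    for (uint32_t z = 0; z < countZ; ++z) {
        for (uint32_t y = 0; y < countY; ++y) {
            for (uint32_t x = 0; x < countX; ++x) {
                order.push_back(GroupId{x, y, z});
            }
        }
    }
    return order;
}
```

Under this assumption, a 2 x 2 x 1 dispatch would retire groups (0,0,0), (1,0,0), (0,1,0), (1,1,0) in that sequence.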
If the pipeline includes an amplification shader, only partial ordering of rasterization output is guaranteed. Individual amplification shader thread groups retire their rasterized output, produced via child mesh shader thread groups, in sequential order with respect to neighboring amplification shader thread groups. The ordering of thread groups is the (x,y,z) order described above, applied to the amplification shader stage in this case.
However, the child mesh shader thread groups produced by any individual amplification shader thread group may retire rasterized output in any order with respect to other children of the parent amplification shader thread group.
In order to track the number of invocations of Mesh Shaders and Amplification shaders in a given program, as well as the number of primitives output by a mesh shader, applications can use the Pipeline Statistics feature.
This means versioning the D3D12_QUERY_DATA_PIPELINE_STATISTICS struct to include ASInvocations and MSInvocations, which track the number of amplification and mesh shaders that are invoked, and MSPrimitives, which tracks the number of primitives output by a mesh shader. This exists as follows:
Since some drivers have Mesh Shader support without support in pipeline statistics, it is important to use CheckFeatureSupport to query D3D12_FEATURE_DATA_D3D12_OPTIONS8 for MeshShaderPipelineStatsSupported before using Pipeline Statistics to evaluate mesh shader or amplification shader data.
At the DDI level, drivers do not return UNKNOWN. At the application level, a capability value of UNKNOWN may be returned if the capability cannot be queried from the driver. This could be relevant when, for example, using a driver which supports the mesh shader feature but has not been updated to support the 0086 DDI version.
Some target hardware has limitations where mesh shaders that write to the SV_RenderTargetArrayIndex attribute are limited to outputting a 3 bit value. This allows for selecting between at most 8 render targets. On such hardware, outputting an index value larger than 7 produces undefined behavior.
Note that D3D12_FEATURE_DATA_D3D12_OPTIONS9 was not present in the first Windows OS release that introduced support for mesh shaders. Applications running on an operating system version where D3D12_FEATURE_DATA_D3D12_OPTIONS9 is not available must assume that mesh shaders only support a maximum render target array index of 7 and going beyond this produces undefined results.
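The rule in the two paragraphs above can be written as a small guard. This is a minimal sketch: the function name and its parameters are hypothetical, and it assumes the application has already determined whether D3D12_FEATURE_DATA_D3D12_OPTIONS9 could be queried and, if so, read its MeshShaderSupportsFullRangeRenderTargetArrayIndex member. It checks only the mesh shader specific restriction, not any other limit on render target array size.

```cpp
#include <cstdint>

// Hypothetical helper: returns true when `index` stays within the range a mesh
// shader may write to SV_RenderTargetArrayIndex under the rules described above.
//   options9Available  - whether D3D12_FEATURE_DATA_D3D12_OPTIONS9 can be queried
//   fullRangeSupported - the reported full-range capability; ignored when
//                        options9Available is false
bool MeshShaderRtArrayIndexAllowed(uint32_t index,
                                   bool options9Available,
                                   bool fullRangeSupported) {
    constexpr uint32_t kThreeBitMax = 7;  // largest value representable in 3 bits
    if (options9Available && fullRangeSupported) {
        return true;  // no additional mesh shader restriction applies
    }
    // Limited hardware, or an OS without OPTIONS9: assume the 3-bit limit.
    return index <= kThreeBitMax;
}
```

Treating "cannot query" the same as "not supported" keeps the application on the conservative path, which matches the assumption the text requires on older operating system versions.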
Drivers report their mesh shader render target array index capability by setting the MeshShaderSupportsFullRangeRenderTargetArrayIndex field accordingly. For drivers which set the field to false or are of an older DDI version, the capability value is assumed to be false.
To ensure that mesh and amplification shaders are supported, after calling CheckFeatureSupport, check that the MeshShaderTier is not D3D12_MESH_SHADER_TIER_NOT_SUPPORTED. The following code demonstrates this:
DispatchMesh launches the threadgroups for the amplification shader or the mesh shader in a case where no amplification shader is attached. Each of the three thread group counts must be less than 64k and the product of ThreadGroupCountX*ThreadGroupCountY*ThreadGroupCountZ must not exceed 2^22.
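A caller can check these limits before recording the dispatch. The sketch below is a hypothetical helper, not part of the API; it reads "64k" as 65536 and uses 64-bit arithmetic so the product cannot overflow.

```cpp
#include <cstdint>

// Hypothetical pre-check for DispatchMesh arguments, based on the limits
// stated above ("64k" is taken to mean 65536).
bool IsValidDispatchMeshSize(uint32_t countX, uint32_t countY, uint32_t countZ) {
    constexpr uint32_t kPerDimensionLimit = 64u * 1024u;  // each count must be less than this
    constexpr uint64_t kMaxTotalGroups = 1ull << 22;      // product must not exceed 2^22

    if (countX >= kPerDimensionLimit ||
        countY >= kPerDimensionLimit ||
        countZ >= kPerDimensionLimit) {
        return false;
    }
    // Each factor is below 2^16, so the 64-bit product is below 2^48.
    const uint64_t total = static_cast<uint64_t>(countX) * countY * countZ;
    return total <= kMaxTotalGroups;
}
```

For example, a 2048 x 2048 x 1 dispatch sits exactly at the 2^22 limit and passes, while 2048 x 2048 x 2 fails even though every individual count is well under 64k.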
This is a mandatory function attribute on the entry point of the Mesh shader. It specifies the launch size of the threadgroup of the Mesh shader, just like with a compute shader. The total number of threads, X * Y * Z, cannot exceed 128.