shaders.txt
   1  // Copyright (c) 2015-2019 Khronos Group. This work is licensed under a
   2  // Creative Commons Attribution 4.0 International License; see
   3  // http://creativecommons.org/licenses/by/4.0/
   4  
   5  [[shaders]]
   6  = Shaders
   7  
   8  A shader specifies programmable operations that execute for each vertex,
   9  control point, tessellated vertex, primitive, fragment, or workgroup in the
  10  corresponding stage(s) of the graphics and compute pipelines.
  11  
  12  Graphics pipelines include vertex shader execution as a result of
  13  <<drawing,primitive assembly>>, followed, if enabled, by tessellation
  14  control and evaluation shaders operating on <<drawing-patch-lists,patches>>,
  15  geometry shaders, if enabled, operating on primitives, and fragment shaders,
  16  if present, operating on fragments generated by <<primsrast,Rasterization>>.
  17  In this specification, vertex, tessellation control, tessellation evaluation
  18  and geometry shaders are collectively referred to as vertex processing
  19  stages and occur in the logical pipeline before rasterization.
  20  The fragment shader occurs logically after rasterization.
  21  
  22  Only the compute shader stage is included in a compute pipeline.
  23  Compute shaders operate on compute invocations in a workgroup.
  24  
  25  Shaders can: read from input variables, and read from and write to output
  26  variables.
  27  Input and output variables can: be used to transfer data between shader
  28  stages, or to allow the shader to interact with values that exist in the
  29  execution environment.
  30  Similarly, the execution environment provides constants that describe
  31  capabilities.
  32  
  33  Shader variables are associated with execution environment-provided inputs
  34  and outputs using _built-in_ decorations in the shader.
  35  The available decorations for each stage are documented in the following
  36  subsections.
  37  
  38  
  39  [[shader-modules]]
  40  == Shader Modules
  41  
  42  [open,refpage='VkShaderModule',desc='Opaque handle to a shader module object',type='handles']
  43  --
  44  
  45  _Shader modules_ contain _shader code_ and one or more entry points.
  46  Shaders are selected from a shader module by specifying an entry point as
  47  part of <<pipelines,pipeline>> creation.
  48  The stages of a pipeline can: use shaders that come from different modules.
  49  The shader code defining a shader module must: be in the SPIR-V format, as
  50  described by the <<spirvenv,Vulkan Environment for SPIR-V>> appendix.
  51  
  52  Shader modules are represented by sname:VkShaderModule handles:
  53  
  54  include::{generated}/api/handles/VkShaderModule.txt[]
  55  
  56  --
  57  
  58  [open,refpage='vkCreateShaderModule',desc='Creates a new shader module object',type='protos']
  59  --
  60  
  61  To create a shader module, call:
  62  
  63  include::{generated}/api/protos/vkCreateShaderModule.txt[]
  64  
  65    * pname:device is the logical device that creates the shader module.
  66    * pname:pCreateInfo is a pointer to an instance of the
  67      sname:VkShaderModuleCreateInfo structure.
  68    * pname:pAllocator controls host memory allocation as described in the
  69      <<memory-allocation, Memory Allocation>> chapter.
  70    * pname:pShaderModule points to a slink:VkShaderModule handle in which the
  71      resulting shader module object is returned.
  72  
  73  Once a shader module has been created, any entry points it contains can: be
  74  used in pipeline shader stages as described in <<pipelines-compute,Compute
  75  Pipelines>> and <<pipelines-graphics,Graphics Pipelines>>.
  76  
  77  ifdef::VK_NV_glsl_shader[]
  78  If the shader stage fails to compile, ename:VK_ERROR_INVALID_SHADER_NV will
  79  be generated and the compile log will be reported back to the application by
  80  `<<VK_EXT_debug_report>>` if enabled.
  81  endif::VK_NV_glsl_shader[]
  82  
  83  include::{generated}/validity/protos/vkCreateShaderModule.txt[]
  84  --
  85  
  86  [open,refpage='VkShaderModuleCreateInfo',desc='Structure specifying parameters of a newly created shader module',type='structs']
  87  --
  88  
  89  The sname:VkShaderModuleCreateInfo structure is defined as:
  90  
  91  include::{generated}/api/structs/VkShaderModuleCreateInfo.txt[]
  92  
  93    * pname:sType is the type of this structure.
  94    * pname:pNext is `NULL` or a pointer to an extension-specific structure.
  95    * pname:flags is reserved for future use.
  96    * pname:codeSize is the size, in bytes, of the code pointed to by
  97      pname:pCode.
  98    * pname:pCode points to code that is used to create the shader module.
  99      The type and format of the code is determined from the content of the
 100      memory addressed by pname:pCode.
 101  
 102  .Valid Usage
 103  ****
 104    * [[VUID-VkShaderModuleCreateInfo-codeSize-01085]]
 105      pname:codeSize must: be greater than 0
 106  ifndef::VK_NV_glsl_shader[]
 107    * [[VUID-VkShaderModuleCreateInfo-codeSize-01086]]
 108      pname:codeSize must: be a multiple of 4
 109    * [[VUID-VkShaderModuleCreateInfo-pCode-01087]]
 110      pname:pCode must: point to valid SPIR-V code, formatted and packed as
 111      described by the <<spirv-spec,Khronos SPIR-V Specification>>
 112    * [[VUID-VkShaderModuleCreateInfo-pCode-01088]]
 113      pname:pCode must: adhere to the validation rules described by the
 114      <<spirvenv-module-validation, Validation Rules within a Module>> section
 115      of the <<spirvenv-capabilities,SPIR-V Environment>> appendix
 116  endif::VK_NV_glsl_shader[]
 117  ifdef::VK_NV_glsl_shader[]
 118    * [[VUID-VkShaderModuleCreateInfo-pCode-01376]]
 119      If pname:pCode points to SPIR-V code, pname:codeSize must: be a multiple
 120      of 4
 121    * [[VUID-VkShaderModuleCreateInfo-pCode-01377]]
 122      pname:pCode must: point to either valid SPIR-V code, formatted and
 123      packed as described by the <<spirv-spec,Khronos SPIR-V Specification>>
 124      or valid GLSL code which must: be written to the `GL_KHR_vulkan_glsl`
 125      extension specification
 126    * [[VUID-VkShaderModuleCreateInfo-pCode-01378]]
 127      If pname:pCode points to SPIR-V code, that code must: adhere to the
 128      validation rules described by the <<spirvenv-module-validation,
 129      Validation Rules within a Module>> section of the
 130      <<spirvenv-capabilities,SPIR-V Environment>> appendix
 131    * [[VUID-VkShaderModuleCreateInfo-pCode-01379]]
 132      If pname:pCode points to GLSL code, it must: be valid GLSL code written
 133      to the `GL_KHR_vulkan_glsl` GLSL extension specification
 134  endif::VK_NV_glsl_shader[]
 135    * [[VUID-VkShaderModuleCreateInfo-pCode-01089]]
 136      pname:pCode must: declare the code:Shader capability for SPIR-V code
 137    * [[VUID-VkShaderModuleCreateInfo-pCode-01090]]
 138      pname:pCode must: not declare any capability that is not supported by
 139      the API, as described by the <<spirvenv-module-validation,
 140      Capabilities>> section of the <<spirvenv-capabilities,SPIR-V
 141      Environment>> appendix
 142    * [[VUID-VkShaderModuleCreateInfo-pCode-01091]]
 143      If pname:pCode declares any of the capabilities listed as optional: in
 144      the <<spirvenv-capabilities-table,SPIR-V Environment>> appendix, the
 145      corresponding feature(s) must: be enabled.
 146  ****
 147  
 148  include::{generated}/validity/structs/VkShaderModuleCreateInfo.txt[]
 149  --
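
As a non-normative illustration, an application could run a cheap pre-check on a binary before filling in sname:VkShaderModuleCreateInfo.
The sketch below assumes SPIR-V input (that is, `VK_NV_glsl_shader` is not in use) and mirrors only the pname:codeSize rules listed above.
The helper name `spirv_blob_looks_valid` is hypothetical, and the magic number value is taken from the Khronos SPIR-V Specification rather than from this chapter.
It does not replace validation of the module contents.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Magic number that begins every SPIR-V module (per the SPIR-V specification). */
#define SPIRV_MAGIC 0x07230203u

/*
 * Hypothetical helper: returns 1 if the blob passes checks that mirror the
 * codeSize rules (greater than 0, multiple of 4) and its first word is the
 * SPIR-V magic number in host byte order; returns 0 otherwise.
 * It does not inspect anything beyond the first word.
 */
int spirv_blob_looks_valid(const void *code, size_t code_size)
{
    uint32_t first_word;

    if (code == NULL || code_size == 0 || code_size % 4 != 0)
        return 0;

    /* memcpy avoids alignment assumptions about the caller's buffer. */
    memcpy(&first_word, code, sizeof first_word);
    return first_word == SPIRV_MAGIC;
}
```

An application might call such a helper on the bytes it loaded from disk and only then pass the same pointer and size as pname:pCode and pname:codeSize.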
 150  
 151  [open,refpage='VkShaderModuleCreateFlags',desc='Reserved for future use',type='flags']
 152  --
 153  include::{generated}/api/flags/VkShaderModuleCreateFlags.txt[]
 154  
 155  tname:VkShaderModuleCreateFlags is a bitmask type for setting a mask, but is
 156  currently reserved for future use.
 157  --
 158  
 159  ifdef::VK_EXT_validation_cache[]
 160  include::VK_EXT_validation_cache/shader-module-validation-cache.txt[]
 161  endif::VK_EXT_validation_cache[]
 162  
 163  
 164  [open,refpage='vkDestroyShaderModule',desc='Destroy a shader module',type='protos']
 165  --
 166  
 167  To destroy a shader module, call:
 168  
 169  include::{generated}/api/protos/vkDestroyShaderModule.txt[]
 170  
 171    * pname:device is the logical device that destroys the shader module.
 172    * pname:shaderModule is the handle of the shader module to destroy.
 173    * pname:pAllocator controls host memory allocation as described in the
 174      <<memory-allocation, Memory Allocation>> chapter.
 175  
 176  A shader module can: be destroyed while pipelines created using its shaders
 177  are still in use.
 178  
 179  .Valid Usage
 180  ****
 181    * [[VUID-vkDestroyShaderModule-shaderModule-01092]]
 182      If sname:VkAllocationCallbacks were provided when pname:shaderModule was
 183      created, a compatible set of callbacks must: be provided here
 184    * [[VUID-vkDestroyShaderModule-shaderModule-01093]]
 185      If no sname:VkAllocationCallbacks were provided when pname:shaderModule
 186      was created, pname:pAllocator must: be `NULL`
 187  ****
 188  
 189  include::{generated}/validity/protos/vkDestroyShaderModule.txt[]
 190  --
 191  
 192  
 193  [[shaders-execution]]
 194  == Shader Execution
 195  
 196  At each stage of the pipeline, multiple invocations of a shader may: execute
 197  simultaneously.
 198  Further, invocations of a single shader produced as the result of different
 199  commands may: execute simultaneously.
 200  The relative execution order of invocations of the same shader type is
 201  undefined:.
 202  Shader invocations may: complete in a different order than that in which the
 203  primitives they originated from were drawn or dispatched by the application.
 204  However, fragment shader outputs are written to attachments in
 205  <<primrast-order,rasterization order>>.
 206  
 207  The relative execution order of invocations of different shader types is
 208  largely undefined:.
 209  However, when invoking a shader whose inputs are generated from a previous
 210  pipeline stage, the shader invocations from the previous stage are
 211  guaranteed to have executed far enough to generate input values for all
 212  required inputs.
 213  
 214  
 215  [[shaders-execution-memory-ordering]]
 216  == Shader Memory Access Ordering
 217  
 218  The order in which image or buffer memory is read or written by shaders is
 219  largely undefined:.
 220  For some shader types (vertex, tessellation evaluation, and in some cases,
 221  fragment), even the number of shader invocations that may: perform loads and
 222  stores is undefined:.
 223  
 224  In particular, the following rules apply:
 225  
 226    * <<shaders-vertex-execution,Vertex>> and
 227      <<shaders-tessellation-evaluation-execution,tessellation evaluation>>
 228      shaders will be invoked at least once for each unique vertex, as defined
 229      in those sections.
 230    * <<shaders-fragment-execution,Fragment>> shaders will be invoked zero or
 231      more times, as defined in that section.
 232    * The relative execution order of invocations of the same shader type is
 233      undefined:.
 234      A store issued by a shader when working on primitive B might complete
 235      prior to a store for primitive A, even if primitive A is specified prior
 236      to primitive B. This applies even to fragment shaders; while fragment
 237      shader outputs are always written to the framebuffer in
 238      <<primrast-order, rasterization order>>, stores executed by fragment
 239      shader invocations are not.
 240    * The relative execution order of invocations of different shader types is
 241      largely undefined:.
 242  
 243  [NOTE]
 244  .Note
 245  ====
 246  The above limitations on shader invocation order make some forms of
 247  synchronization between shader invocations within a single set of primitives
 248  unimplementable.
 249  For example, having one invocation poll memory written by another invocation
 250  assumes that the other invocation has been launched and will complete its
 251  writes in finite time.
 252  ====
 253  
 254  ifdef::VK_KHR_vulkan_memory_model[]
 255  
 256  The <<memory-model,Memory Model>> appendix defines the terminology and rules
 257  for how to correctly communicate between shader invocations, such as when a
 258  write is <<memory-model-visible-to,Visible-To>> a read, and what constitutes
 259  a <<memory-model-access-data-race,Data Race>>.
 260  
 261  Applications must: not cause a data race.
 262  
 263  endif::VK_KHR_vulkan_memory_model[]
 264  
 265  ifndef::VK_KHR_vulkan_memory_model[]
 266  
 267  Stores issued to different memory locations within a single shader
 268  invocation may: not be visible to other invocations, or may: not become
 269  visible in the order they were performed.
 270  
 271  The code:OpMemoryBarrier instruction can: be used to provide stronger
 272  ordering of reads and writes performed by a single invocation.
 273  code:OpMemoryBarrier guarantees that any memory transactions issued by the
 274  shader invocation prior to the instruction complete prior to the memory
 275  transactions issued after the instruction.
 276  Memory barriers are needed for algorithms that require multiple invocations
 277  to access the same memory and require the operations to be performed in a
 278  partially-defined relative order.
 279  For example, if one shader invocation does a series of writes, followed by
 280  an code:OpMemoryBarrier instruction, followed by another write, then the
 281  results of the series of writes before the barrier become visible to other
 282  shader invocations at a time earlier or equal to when the results of the
 283  final write become visible to those invocations.
 284  In practice it means that another invocation that sees the results of the
 285  final write would also see the previous writes.
 286  Without the memory barrier, the final write may: be visible before the
 287  previous writes.
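
The write-barrier-write pattern described above can be sketched with a host-side analogy.
The following is C11 code, not SPIR-V: `atomic_thread_fence` stands in for code:OpMemoryBarrier, and all names (`payload`, `final_write`, `writer_publish`, `reader_try_read`) are hypothetical.
Note that the C11 memory model also requires an acquire fence on the reading side, which is an assumption of this analogy rather than something stated in this section.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical shared state, standing in for memory visible to two invocations. */
static int payload[3];
static atomic_int final_write; /* zero-initialized: nothing published yet */

/* Writer: a series of writes, then a barrier, then one final write. */
void writer_publish(int a, int b, int c)
{
    payload[0] = a;
    payload[1] = b;
    payload[2] = c;
    /* Host-side stand-in for the barrier between the series and the final write. */
    atomic_thread_fence(memory_order_release);
    atomic_store_explicit(&final_write, 1, memory_order_relaxed);
}

/* Reader: if it observes the final write, it also observes the earlier writes. */
bool reader_try_read(int out[3])
{
    if (atomic_load_explicit(&final_write, memory_order_relaxed) == 0)
        return false; /* final write not visible yet; earlier writes unknown */

    atomic_thread_fence(memory_order_acquire);
    out[0] = payload[0];
    out[1] = payload[1];
    out[2] = payload[2];
    return true;
}
```

Without the release fence in `writer_publish`, the C11 model would allow the flag to become visible before the payload, which corresponds to the last sentence of the paragraph above.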
 288  
 289  Writes that are the result of shader stores through a variable decorated
 290  with code:Coherent automatically have available writes to the same buffer,
 291  buffer view, or image view made visible to them, and are themselves
 292  automatically made available to access by the same buffer, buffer view, or
 293  image view.
 294  Reads that are the result of shader loads through a variable decorated with
 295  code:Coherent automatically have available writes to the same buffer, buffer
 296  view, or image view made visible to them.
 297  The order that coherent writes to different locations become available is
 298  undefined:, unless enforced by a memory barrier instruction or other memory
 299  dependency.
 300  
 301  [NOTE]
 302  .Note
 303  ====
 304  Explicit memory dependencies must: still be used to guarantee availability
 305  and visibility for access via other buffers, buffer views, or image views.
 306  ====
 307  
 308  The built-in atomic memory transaction instructions can: be used to read and
 309  write a given memory address atomically.
 310  While built-in atomic functions issued by multiple shader invocations are
 311  executed in undefined: order relative to each other, these functions perform
 312  both a read and a write of a memory address and guarantee that no other
 313  memory transaction will write to the underlying memory between the read and
 314  write.
 315  Atomic operations ensure automatic availability and visibility for writes
 316  and reads in the same way as those to code:Coherent variables.
 317  
 318  [NOTE]
 319  .Note
 320  ====
 321  Memory accesses performed on different resource descriptors with the same
 322  memory backing may: not be well-defined even with the code:Coherent
 323  decoration or via atomics, due to things such as image layouts or ownership
 324  of the resource, as described in the <<synchronization, Synchronization and
 325  Cache Control>> chapter.
 326  ====
 327  
 328  [NOTE]
 329  .Note
 330  ====
 331  Atomics allow shaders to use shared global addresses for mutual exclusion or
 332  as counters, among other uses.
 333  ====
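
The counter use case can be illustrated with another host-side analogy in C11.
This is a sketch only: `next_slot` and `claim_slot` are hypothetical names, and `atomic_fetch_add_explicit` stands in for a shader atomic add on a shared address.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical shared counter, standing in for a shared global address. */
static atomic_uint next_slot; /* zero-initialized */

/*
 * Returns a slot index that no other caller receives. The read and the write
 * happen as one indivisible step, so two callers never get the same value,
 * although the order in which callers are served is not defined.
 */
uint32_t claim_slot(void)
{
    return (uint32_t)atomic_fetch_add_explicit(&next_slot, 1u,
                                               memory_order_relaxed);
}
```

Each caller can then write its result into its own slot of an output array without further coordination.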
 334  
 335  endif::VK_KHR_vulkan_memory_model[]
 336  
 337  [[shaders-inputs]]
 338  == Shader Inputs and Outputs
 339  
 340  Data is passed into and out of shaders using variables with input or output
 341  storage class, respectively.
 342  User-defined inputs and outputs are connected between stages by matching
 343  their code:Location decorations.
 344  Additionally, data can: be provided by or communicated to special functions
 345  provided by the execution environment using code:BuiltIn decorations.
 346  
 347  In many cases, the same code:BuiltIn decoration can: be used in multiple
 348  shader stages with similar meaning.
 349  The specific behavior of variables decorated as code:BuiltIn is documented
 350  in the following sections.
 351  
 352  ifdef::VK_NV_mesh_shader[]
 353  [[shaders-task]]
 354  == Task Shaders
 355  
 356  Task shaders operate in conjunction with the mesh shaders to produce a
 357  collection of primitives that will be processed by subsequent stages of the
 358  graphics pipeline.
 359  Their primary purpose is to create a variable number of subsequent mesh
 360  shader invocations.
 361  
 362  Task shaders are invoked via the execution of the
 363  <<drawing-mesh-shading,programmable mesh shading>> pipeline.
 364  
 365  The task shader has no fixed-function inputs other than variables
 366  identifying the specific workgroup and invocation.
 367  The only fixed output of the task shader is a task count, identifying the
 368  number of mesh shader workgroups to create.
 369  The task shader can: write additional outputs to task memory, which can: be
 370  read by all of the mesh shader workgroups it created.
 371  
 372  === Task Shader Execution
 373  
 374  Task workloads are formed from groups of work items called workgroups and
 375  processed by the task shader in the current graphics pipeline.
 376  A workgroup is a collection of shader invocations that execute the same
 377  shader, potentially in parallel.
 378  Task shaders execute in _global workgroups_ which are divided into a number
 379  of _local workgroups_ with a size that can: be set by assigning a value to
 380  the code:LocalSize execution mode or via an object decorated by the
 381  code:WorkgroupSize decoration.
 382  An invocation within a local workgroup can: share data with other members of
 383  the local workgroup through shared variables and issue memory and control
 384  flow barriers to synchronize with other members of the local workgroup.
 385  
 386  [[shaders-mesh]]
 387  == Mesh Shaders
 388  
 389  Mesh shaders operate in workgroups to produce a collection of primitives
 390  that will be processed by subsequent stages of the graphics pipeline.
 391  Each workgroup emits zero or more output primitives and the group of
 392  vertices and their associated data required for each output primitive.
 393  
 394  Mesh shaders are invoked via the execution of the
 395  <<drawing-mesh-shading,programmable mesh shading>> pipeline.
 396  
 397  The only inputs available to the mesh shader are variables identifying the
 398  specific workgroup and invocation and, if applicable, any outputs written to
 399  task memory by the task shader that spawned the mesh shader's workgroup.
 400  The mesh shader can: operate without a task shader as well.
 401  
 402  The invocations of the mesh shader workgroup write an output mesh,
 403  comprising a set of primitives with per-primitive attributes, a set of
 404  vertices with per-vertex attributes, and an array of indices identifying the
 405  mesh vertices that belong to each primitive.
 406  The primitives of this mesh are then processed by subsequent graphics
 407  pipeline stages, where the outputs of the mesh shader form an interface with
 408  the fragment shader.
 409  
 410  === Mesh Shader Execution
 411  
 412  Mesh workloads are formed from groups of work items called workgroups and
 413  processed by the mesh shader in the current graphics pipeline.
 414  A workgroup is a collection of shader invocations that execute the same
 415  shader, potentially in parallel.
 416  Mesh shaders execute in _global workgroups_ which are divided into a number
 417  of _local workgroups_ with a size that can: be set by assigning a value to
 418  the code:LocalSize execution mode or via an object decorated by the
 419  code:WorkgroupSize decoration.
 420  An invocation within a local workgroup can: share data with other members of
 421  the local workgroup through shared variables and issue memory and control
 422  flow barriers to synchronize with other members of the local workgroup.
 423  
 424  The _global workgroups_ may: be generated explicitly via the API, or
 425  implicitly through the task shader's work creation mechanism.
 426  endif::VK_NV_mesh_shader[]
 427  
 428  [[shaders-vertex]]
 429  == Vertex Shaders
 430  
 431  Each vertex shader invocation operates on one vertex and its associated
 432  <<fxvertex-attrib,vertex attribute>> data, and outputs one vertex and
 433  associated data.
 434  ifndef::VK_NV_mesh_shader[]
 435  Graphics pipelines must: include a vertex shader, and the vertex shader
 436  stage is always the first shader stage in the graphics pipeline.
 437  endif::VK_NV_mesh_shader[]
 438  ifdef::VK_NV_mesh_shader[]
 439  Graphics pipelines using primitive shading must: include a vertex shader,
 440  and the vertex shader stage is always the first shader stage in the graphics
 441  pipeline.
 442  endif::VK_NV_mesh_shader[]
 443  
 444  [[shaders-vertex-execution]]
 445  === Vertex Shader Execution
 446  
 447  A vertex shader must: be executed at least once for each vertex specified by
 448  a draw command.
 449  ifdef::VK_VERSION_1_1,VK_KHR_multiview[]
 450  If the subpass includes multiple views in its view mask, the shader may: be
 451  invoked separately for each view.
 452  endif::VK_VERSION_1_1,VK_KHR_multiview[]
 453  During execution, the shader is presented with the index of the vertex and
 454  instance for which it has been invoked.
 455  Input variables declared in the vertex shader are filled by the
 456  implementation with the values of vertex attributes associated with the
 457  invocation being executed.
 458  
 459  If the same vertex is specified multiple times in a draw command (e.g. by
 460  including the same index value multiple times in an index buffer) the
 461  implementation may: reuse the results of vertex shading if it can statically
 462  determine that the vertex shader invocations will produce identical results.
 463  
 464  [NOTE]
 465  .Note
 466  ====
 467  It is implementation-dependent when and if results of vertex shading are
 468  reused, and thus how many times the vertex shader will be executed.
 469  This is true also if the vertex shader contains stores or atomic operations
 470  (see <<features-vertexPipelineStoresAndAtomics,
 471  pname:vertexPipelineStoresAndAtomics>>).
 472  ====
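
To make the effect of vertex reuse concrete, the sketch below counts the distinct index values in an index buffer.
Under the assumptions of a single instance, a single view, and no primitive restart, that count is a lower bound on the number of vertex shader invocations for an indexed draw; the actual number is implementation-dependent.
The function name `count_unique_indices` is hypothetical, and the quadratic scan is chosen only for brevity.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: counts the distinct values in an index buffer.
 * Each index is compared against all earlier ones, so the cost is O(n^2).
 */
size_t count_unique_indices(const uint32_t *indices, size_t count)
{
    size_t unique = 0;

    for (size_t i = 0; i < count; ++i) {
        int seen_before = 0;

        for (size_t j = 0; j < i; ++j) {
            if (indices[j] == indices[i]) {
                seen_before = 1;
                break;
            }
        }
        if (!seen_before)
            ++unique;
    }
    return unique;
}
```

For a quad drawn as two triangles with the indices 0, 1, 2, 2, 1, 3, the helper reports four distinct vertices out of six indices, so an implementation that reuses results could shade as few as four vertices.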
 473  
 474  
 475  [[shaders-tessellation-control]]
 476  == Tessellation Control Shaders
 477  
 478  The tessellation control shader is used to read an input patch provided by
 479  the application and to produce an output patch.
 480  Each tessellation control shader invocation operates on an input patch
 481  (after all control points in the patch are processed by a vertex shader) and
 482  its associated data, and outputs a single control point of the output patch
 483  and its associated data, and can: also output additional per-patch data.
 484  The input patch is sized according to the pname:patchControlPoints member of
 485  slink:VkPipelineTessellationStateCreateInfo, as part of input assembly.
 486  The size of the output patch is controlled by the code:OpExecutionMode
 487  code:OutputVertices specified in the tessellation control or tessellation
 488  evaluation shaders, which must: be specified in at least one of the shaders.
 489  The size of the input and output patches must: each be greater than zero and
 490  less than or equal to
 491  sname:VkPhysicalDeviceLimits::pname:maxTessellationPatchSize.
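
The size constraint above amounts to a simple range check, sketched here as a non-normative helper.
The name `patch_size_is_valid` and its parameters are hypothetical; an application would pass the limit it queried from sname:VkPhysicalDeviceLimits.

```c
#include <stdint.h>

/*
 * Hypothetical helper: returns 1 if a patch size (input or output) is
 * greater than zero and does not exceed the device limit, 0 otherwise.
 */
int patch_size_is_valid(uint32_t patch_size,
                        uint32_t max_tessellation_patch_size)
{
    return patch_size > 0 && patch_size <= max_tessellation_patch_size;
}
```

The same check applies to both the pname:patchControlPoints value and the code:OutputVertices value.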
 492  
 493  
 494  [[shaders-tessellation-control-execution]]
 495  === Tessellation Control Shader Execution
 496  
 497  A tessellation control shader is invoked at least once for each _output_
 498  vertex in a patch.
 499  ifdef::VK_VERSION_1_1,VK_KHR_multiview[]
 500  If the subpass includes multiple views in its view mask, the shader may: be
 501  invoked separately for each view.
 502  endif::VK_VERSION_1_1,VK_KHR_multiview[]
 503  
 504  Inputs to the tessellation control shader are generated by the vertex
 505  shader.
 506  Each invocation of the tessellation control shader can: read the attributes
 507  of any incoming vertices and their associated data.
 508  The invocations corresponding to a given patch execute logically in
 509  parallel, with undefined: relative execution order.
 510  However, the code:OpControlBarrier instruction can: be used to provide
 511  limited control of the execution order by synchronizing invocations within a
 512  patch, effectively dividing tessellation control shader execution into a set
 513  of phases.
 514  Tessellation control shaders will read undefined: values if one invocation
 515  reads a per-vertex or per-patch attribute written by another invocation at
 516  any point during the same phase, or if two invocations attempt to write
 517  different values to the same per-patch output in a single phase.
 518  
 519  
 520  [[shaders-tessellation-evaluation]]
 521  == Tessellation Evaluation Shaders
 522  
 523  The tessellation evaluation shader operates on an input patch of control
 524  points and their associated data, and a single input barycentric coordinate
 525  indicating the invocation's relative position within the subdivided patch,
 526  and outputs a single vertex and its associated data.
 527  
 528  
 529  [[shaders-tessellation-evaluation-execution]]
 530  === Tessellation Evaluation Shader Execution
 531  
 532  A tessellation evaluation shader is invoked at least once for each unique
 533  vertex generated by the tessellator.
 534  ifdef::VK_VERSION_1_1,VK_KHR_multiview[]
 535  If the subpass includes multiple views in its view mask, the shader may: be
 536  invoked separately for each view.
 537  endif::VK_VERSION_1_1,VK_KHR_multiview[]
 538  
 539  
 540  [[shaders-geometry]]
 541  == Geometry Shaders
 542  
 543  The geometry shader operates on a group of vertices and their associated
 544  data assembled from a single input primitive, and emits zero or more output
 545  primitives and the group of vertices and their associated data required for
 546  each output primitive.
 547  
 548  
 549  [[shaders-geometry-execution]]
 550  === Geometry Shader Execution
 551  
 552  A geometry shader is invoked at least once for each primitive produced by
 553  the tessellation stages, or at least once for each primitive generated by
 554  <<drawing,primitive assembly>> when tessellation is not in use.
 555  A shader can: request that the geometry shader run multiple
 556  <<geometry-invocations, instances>>.
 557  A geometry shader is invoked at least once for each instance.
 558  ifdef::VK_VERSION_1_1,VK_KHR_multiview[]
 559  If the subpass includes multiple views in its view mask, the shader may: be
 560  invoked separately for each view.
 561  endif::VK_VERSION_1_1,VK_KHR_multiview[]
 562  
 563  
 564  [[shaders-fragment]]
 565  == Fragment Shaders
 566  
 567  Fragment shaders are invoked as the result of rasterization in a graphics
 568  pipeline.
 569  Each fragment shader invocation operates on a single fragment and its
 570  associated data.
 571  With few exceptions, fragment shaders do not have access to any data
 572  associated with other fragments and are considered to execute in isolation
 573  of fragment shader invocations associated with other fragments.
 574  
 575  
 576  [[shaders-fragment-execution]]
 577  === Fragment Shader Execution
 578  
 579  For each fragment generated by rasterization, a fragment shader may: be
 580  invoked.
 581  A fragment shader must: not be invoked if the <<fragops-early,Early
 582  Per-Fragment Tests>> cause it to have no coverage.
 583  ifdef::VK_VERSION_1_1,VK_KHR_multiview[]
 584  If the subpass includes multiple views in its view mask, the shader may: be
 585  invoked separately for each view.
 586  endif::VK_VERSION_1_1,VK_KHR_multiview[]
 587  
 588  Furthermore, if it is determined that a fragment generated as the result of
 589  rasterizing a first primitive will have its outputs entirely overwritten by
 590  a fragment generated as the result of rasterizing a second primitive in the
 591  same subpass, and the fragment shader used for the fragment has no other
 592  side effects, then the fragment shader may: not be executed for the fragment
 593  from the first primitive.
 594  
 595  Relative ordering of execution of different fragment shader invocations is
 596  not defined.
 597  
 598  For each fragment generated by a primitive, the number of times the fragment
 599  shader is invoked is implementation-dependent, but must: obey the following
 600  constraints:
 601  
 602    * Each covered sample is included in a single fragment shader invocation.
 603    * When sample shading is not enabled, there is at least one fragment
 604      shader invocation.
 605    * When sample shading is enabled, the minimum number of fragment shader
 606      invocations is as defined in
 607  ifdef::VK_NV_shading_rate_image[]
 608      <<primsrast-shading-rate-image,Shading Rate Image>> and
 609  endif::VK_NV_shading_rate_image[]
 610      <<primsrast-sampleshading,Sample Shading>>.
 611  
 612  When there is more than one fragment shader invocation per fragment, the
 613  association of samples to invocations is implementation-dependent.
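
The first two constraints can be expressed as a check over sample masks.
The sketch below is illustrative only: the function name and the representation of each invocation as a bitmask of the samples it shades are assumptions, and it ignores helper invocations and any additional invocations produced by extensions.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical checker. `coverage` is the bitmask of covered samples of one
 * fragment; invocation_masks[i] is the set of samples shaded by invocation i.
 * Returns 1 if there is at least one invocation and every covered sample
 * belongs to exactly one invocation, 0 otherwise.
 */
int invocations_partition_coverage(uint32_t coverage,
                                   const uint32_t *invocation_masks,
                                   size_t invocation_count)
{
    uint32_t seen = 0;

    if (invocation_count == 0)
        return 0;

    for (size_t i = 0; i < invocation_count; ++i) {
        uint32_t covered = invocation_masks[i] & coverage;

        if (covered & seen)
            return 0; /* a covered sample appears in two invocations */
        seen |= covered;
    }
    return seen == coverage; /* no covered sample was left out */
}
```

With four covered samples, both one invocation shading all four and four invocations shading one sample each satisfy the check, which reflects that the number of invocations is implementation-dependent within these bounds.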
 614  
 615  In addition to the conditions outlined above for the invocation of a
 616  fragment shader, a fragment shader invocation may: be produced as a _helper
 617  invocation_.
 618  A helper invocation is a fragment shader invocation that is created solely
 619  for the purposes of evaluating derivatives for use in non-helper fragment
 620  shader invocations.
 621  Stores and atomics performed by helper invocations must: not have any effect
 622  on memory, and values returned by atomic instructions in helper invocations
 623  are undefined:.
 624  
 625  ifdef::VK_EXT_fragment_density_map[]
 626  If the render pass has a fragment density map attachment, more than one
 627  fragment shader invocation may: be invoked for each covered sample.
 628  Stores and atomics performed by these additional invocations have the normal
 629  effect.
 630  Such additional invocations are only produced if
 631  sname:VkPhysicalDeviceFragmentDensityMapPropertiesEXT::pname:fragmentDensityInvocations
 632  is ename:VK_TRUE.
 633  
 634  [NOTE]
 635  .Note
 636  ====
 637  Implementations may: generate these additional fragment shader invocations
 638  in order to make transitions between fragment areas with different fragment
 639  densities smoother.
 640  ====
 641  endif::VK_EXT_fragment_density_map[]
 642  
 643  [[shaders-fragment-earlytest]]
 644  === Early Fragment Tests
 645  
 646  An explicit control is provided to allow fragment shaders to enable early
 647  fragment tests.
 648  If the fragment shader specifies the code:EarlyFragmentTests
 649  code:OpExecutionMode, the per-fragment tests described in
 650  <<fragops-early-mode,Early Fragment Test Mode>> are performed prior to
 651  fragment shader execution.
 652  Otherwise, they are performed after fragment shader execution.
 653  
 654  ifdef::VK_EXT_post_depth_coverage[]
 655  [[shaders-fragment-earlytest-postdepthcoverage]]
 656  If the fragment shader additionally specifies the code:PostDepthCoverage
 657  code:OpExecutionMode, the value of a variable decorated with the
 658  <<interfaces-builtin-variables-samplemask,code:SampleMask>> built-in
 659  reflects the coverage after the early fragment tests.
 660  Otherwise, it reflects the coverage before the early fragment tests.
 661  endif::VK_EXT_post_depth_coverage[]
 662  
 663  ifdef::VK_EXT_fragment_shader_interlock[]
 664  
 665  [[shaders-fragment-shader-interlock]]
 666  === Fragment Shader Interlock
 667  
 668  In normal operation, it is possible for more than one fragment shader
 669  invocation to be executed simultaneously for the same pixel if there are
 670  overlapping primitives.
 671  If the <<features-features-fragmentShaderSampleInterlock,
 672  fragmentShaderSampleInterlock>>,
 673  <<features-features-fragmentShaderPixelInterlock,
 674  fragmentShaderPixelInterlock>>, or
 675  <<features-features-fragmentShaderShadingRateInterlock,
 676  fragmentShaderShadingRateInterlock>> features are enabled, it is possible to
 677  define a critical section within the fragment shader that is guaranteed to
 678  not run simultaneously with another fragment shader invocation for the same
 679  sample(s) or pixel(s).
 680  It is also possible to control the relative ordering of execution of these
 681  critical sections across different fragment shader invocations.
 682  
 683  If the <<spirvenv-capabilities-table-fragmentShaderInterlock,
 684  code:FragmentShaderSampleInterlockEXT, code:FragmentShaderPixelInterlockEXT,
 685  or code:FragmentShaderShadingRateInterlockEXT>> capabilities are declared in
 686  the fragment shader, the code:OpBeginInvocationInterlockEXT and
 687  code:OpEndInvocationInterlockEXT instructions must: be used to delimit a
 688  critical section of fragment shader code.
 689  
 690  To ensure each invocation of the critical section is executed in
 691  <<drawing-primitive-order, primitive order>>, declare one of the
 692  code:PixelInterlockOrderedEXT, code:SampleInterlockOrderedEXT, or
 693  code:ShadingRateInterlockOrderedEXT execution modes.
 694  If the order of execution of each invocation of the critical section does
 695  not matter, declare one of the code:PixelInterlockUnorderedEXT,
 696  code:SampleInterlockUnorderedEXT, or code:ShadingRateInterlockUnorderedEXT
 697  execution modes.
 698  
 699  The code:PixelInterlockOrderedEXT and code:PixelInterlockUnorderedEXT
 700  execution modes provide mutual exclusion in the critical section for any
 701  pair of fragments corresponding to the same pixel, or pixels if the fragment
 702  covers more than one pixel.
 703  With sample shading enabled, these execution modes are treated like
 704  code:SampleInterlockOrderedEXT or code:SampleInterlockUnorderedEXT
 705  respectively.
 706  
 707  The code:SampleInterlockOrderedEXT and code:SampleInterlockUnorderedEXT
 708  execution modes only provide mutual exclusion for pairs of fragments that
 709  both cover at least one common sample in the same pixel; these are
 710  recommended for performance if shaders use per-sample data structures.
 711  If these execution modes are used in single-sample mode they are treated
 712  like code:PixelInterlockOrderedEXT or code:PixelInterlockUnorderedEXT
 713  respectively.
 714  
 715  ifdef::VK_NV_shading_rate_image[]
 716  The code:ShadingRateInterlockOrderedEXT and
 717  code:ShadingRateInterlockUnorderedEXT execution modes provide mutual
 718  exclusion for pairs of fragments that both have at least one common sample
 719  in the same pixel, even if none of the common samples are covered by both
 720  fragments.
 721  With sample shading enabled, these execution modes are treated like
 722  code:SampleInterlockOrderedEXT or code:SampleInterlockUnorderedEXT
 723  respectively.
 724  endif::VK_NV_shading_rate_image[]
 725  ifndef::VK_NV_shading_rate_image[]
 726  The code:ShadingRateInterlockOrderedEXT and
 727  code:ShadingRateInterlockUnorderedEXT execution modes are not supported.
 728  endif::VK_NV_shading_rate_image[]
 729  
 730  endif::VK_EXT_fragment_shader_interlock[]
 731  
 732  [[shaders-compute]]
 733  == Compute Shaders
 734  
 735  Compute shaders are invoked via flink:vkCmdDispatch and
 736  flink:vkCmdDispatchIndirect commands.
 737  In general, they have access to resources similar to those available to
 738  shader stages executing as part of a graphics pipeline.
 739  
 740  Compute workloads are formed from groups of work items called workgroups and
 741  processed by the compute shader in the current compute pipeline.
 742  A workgroup is a collection of shader invocations that execute the same
 743  shader, potentially in parallel.
 744  Compute shaders execute in _global workgroups_ which are divided into a
 745  number of _local workgroups_ with a size that can: be set by assigning a
 746  value to the code:LocalSize execution mode or via an object decorated by the
 747  code:WorkgroupSize decoration.
 748  An invocation within a local workgroup can: share data with other members of
 749  the local workgroup through shared variables and issue memory and control
 750  flow barriers to synchronize with other members of the local workgroup.
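
As a rough illustration of how a global workgroup is divided into local workgroups, the following C sketch computes how many local workgroups are needed to cover a one-dimensional problem, and the global index of a single invocation.
The helper names are hypothetical and are not part of the Vulkan API; the index relation follows the SPIR-V definition of the code:GlobalInvocationId built-in, restricted to one dimension.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: number of local workgroups needed so that
 * group_count * local_size >= item_count (rounds up). */
static uint32_t workgroup_count(uint32_t item_count, uint32_t local_size)
{
    assert(local_size > 0);
    return item_count / local_size + (item_count % local_size != 0 ? 1u : 0u);
}

/* Hypothetical helper: global index of an invocation along one dimension,
 * i.e. WorkgroupId * WorkgroupSize + LocalInvocationId. */
static uint32_t global_invocation_index(uint32_t workgroup_id,
                                        uint32_t local_size,
                                        uint32_t local_invocation_id)
{
    assert(local_invocation_id < local_size);
    return workgroup_id * local_size + local_invocation_id;
}
```

Because the workgroup count is rounded up, the last local workgroup can contain invocations whose global index is past the end of the data, so a shader written this way would typically compare its index against the item count before touching memory.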
 751  
 752  
 753  [[shaders-interpolation-decorations]]
 754  == Interpolation Decorations
 755  
 756  Interpolation decorations control the behavior of attribute interpolation in
 757  the fragment shader stage.
 758  Interpolation decorations can: be applied to code:Input storage class
 759  variables in the fragment shader stage's interface, and control the
 760  interpolation behavior of those variables.
 761  
 762  Inputs that could be interpolated can: be decorated by at most one of the
 763  following decorations:
 764  
 765    * code:Flat: no interpolation
 766    * code:NoPerspective: linear interpolation (for
 767      <<line_linear_interpolation,lines>> and
 768      <<triangle_linear_interpolation,polygons>>)
 769  ifdef::VK_NV_fragment_shader_barycentric[]
 770    * code:PerVertexNV: values fetched from shader-specified primitive vertex
 771  endif::VK_NV_fragment_shader_barycentric[]
 772  
 773  Fragment input variables decorated with neither code:Flat nor
 774  code:NoPerspective use perspective-correct interpolation (for
 775  <<line_perspective_interpolation,lines>> and
 776  <<triangle_perspective_interpolation,polygons>>).
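
The difference between the three behaviors can be sketched in C for the simple case of a line segment with endpoints `a` and `b`.
This is an illustrative model only: the function names are hypothetical, `t` is assumed to be the interpolation parameter along the segment, and `wa` and `wb` are assumed to be the clip-space w coordinates of the two vertices.

```c
#include <assert.h>

/* Flat: no interpolation; the provoking vertex's value is used as-is. */
static double interp_flat(double provoking_value)
{
    return provoking_value;
}

/* NoPerspective: plain linear interpolation between the vertex values. */
static double interp_linear(double fa, double fb, double t)
{
    assert(t >= 0.0 && t <= 1.0);
    return (1.0 - t) * fa + t * fb;
}

/* Perspective-correct: each value is divided by its vertex's w,
 * interpolated linearly, then renormalized by the interpolated 1/w. */
static double interp_perspective(double fa, double wa,
                                 double fb, double wb, double t)
{
    assert(t >= 0.0 && t <= 1.0);
    assert(wa != 0.0 && wb != 0.0);
    double numerator = (1.0 - t) * fa / wa + t * fb / wb;
    double denominator = (1.0 - t) / wa + t / wb;
    return numerator / denominator;
}
```

When both vertices have the same w, the perspective-correct result matches the linear one; the two diverge as the difference in w grows.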
 777  
 778  The presence and type of interpolation are controlled by the above
 779  interpolation decorations as well as the auxiliary decorations code:Centroid
 780  and code:Sample.
 781  
 782  A variable decorated with code:Flat will not be interpolated.
 783  Instead, it will have the same value for every fragment within a triangle.
 784  This value will come from a single <<vertexpostproc-flatshading,provoking
 785  vertex>>.
 786  A variable decorated with code:Flat can: also be decorated with
 787  code:Centroid or code:Sample, which will mean the same thing as decorating
 788  it only as code:Flat.
 789  
 790  For fragment shader input variables decorated with neither code:Centroid nor
 791  code:Sample, the assigned variable may: be interpolated anywhere within the
 792  fragment and a single value may: be assigned to each sample within the
 793  fragment.
 794  
 795  If a fragment shader input is decorated with code:Centroid, a single value
 796  may: be assigned to that variable for all samples in the fragment, but that
 797  value must: be interpolated to a location that lies in both the fragment and
 798  in the primitive being rendered, including any of the fragment's samples
 799  covered by the primitive.
 800  Because the location at which the variable is interpolated may: be different
 801  in neighboring fragments, and derivatives may: be computed by computing
 802  differences between neighboring fragments, derivatives of centroid-sampled
 803  inputs may: be less accurate than those for non-centroid interpolated
 804  variables.
 805  ifdef::VK_NV_shading_rate_image[]
 806  If
 807  slink:VkPipelineViewportShadingRateImageStateCreateInfoNV::pname:shadingRateImageEnable
 808  is enabled, implementations may: estimate derivatives using differencing
 809  without dividing by the distance between adjacent sample locations when the
 810  fragment size is larger than one pixel.
 811  endif::VK_NV_shading_rate_image[]
 812  ifdef::VK_EXT_post_depth_coverage[]
 813  The <<shaders-fragment-earlytest-postdepthcoverage,code:PostDepthCoverage>>
 814  execution mode does not affect the determination of the centroid location.
 815  endif::VK_EXT_post_depth_coverage[]
 816  
 817  If a fragment shader input is decorated with code:Sample, a separate value
 818  must: be assigned to that variable for each covered sample in the fragment,
 819  and that value must: be sampled at the location of the individual sample.
 820  When pname:rasterizationSamples is ename:VK_SAMPLE_COUNT_1_BIT, the fragment
 821  center must: be used for code:Centroid, code:Sample, and undecorated
 822  attribute interpolation.
 823  
 824  Fragment shader inputs that are signed or unsigned integers, integer
 825  vectors, or any double-precision floating-point type must: be decorated with
 826  code:Flat.
 827  
 828  ifdef::VK_AMD_shader_explicit_vertex_parameter[]
 829  When the `<<VK_AMD_shader_explicit_vertex_parameter>>` device extension is
 830  enabled, inputs can: also be decorated with the code:CustomInterpAMD
 831  interpolation decoration, including fragment shader inputs that are signed
 832  or unsigned integers, integer vectors, or any double-precision
 833  floating-point type.
 834  Inputs decorated with code:CustomInterpAMD can: only be accessed by the
 835  extended instruction code:InterpolateAtVertexAMD, which allows accessing the
 836  value of the input for individual vertices of the primitive.
 837  endif::VK_AMD_shader_explicit_vertex_parameter[]
 838  
 839  ifdef::VK_NV_fragment_shader_barycentric[]
 840  [[shaders-interpolation-decorations-pervertexnv]]
 841  When the pname:fragmentShaderBarycentric feature is enabled, inputs can:
 842  also be decorated with the code:PerVertexNV interpolation decoration, including
 843  fragment shader inputs that are signed or unsigned integers, integer
 844  vectors, or any double-precision floating-point type.
 845  Inputs decorated with code:PerVertexNV can: only be accessed using an extra
 846  array dimension, where the extra index identifies one of the vertices of the
 847  primitive that produced the fragment.
 848  endif::VK_NV_fragment_shader_barycentric[]
 849  
 850  ifdef::VK_NV_ray_tracing[]
 851  include::VK_NV_ray_tracing/raytracing-shaders.txt[]
 852  endif::VK_NV_ray_tracing[]
 853  
 854  [[shaders-staticuse]]
 855  == Static Use
 856  
 857  A SPIR-V module declares a global object in memory using the code:OpVariable
 858  instruction, which results in a pointer code:x to that object.
 859  A specific entry point in a SPIR-V module is said to _statically use_ that
 860  object if that entry point's call tree contains a function that contains a
 861  memory instruction or image instruction with code:x as an code:id operand.
 862  See the "`Memory Instructions`" and "`Image Instructions`" subsections of
 863  section 3 "`Binary Form`" of the SPIR-V specification for the complete lists
 864  of SPIR-V memory and image instructions.
 865  
 866  Static use is not used to control the behavior of variables with code:Input
 867  and code:Output storage.
 868  The effects of those variables are applied based only on whether they are
 869  present in a shader entry point's interface.
 870  
 871  [[shaders-invocationgroups]]
 872  == Invocation and Derivative Groups
 873  
 874  An _invocation group_ (see the subsection "`Control Flow`" of section 2 of
 875  the SPIR-V specification) for a compute shader is the set of invocations in
 876  a single local workgroup.
 877  For graphics shaders, an invocation group is an implementation-dependent
 878  subset of the set of shader invocations of a given shader stage which are
 879  produced by a single drawing command.
 880  For indirect drawing commands with pname:drawCount greater than one,
 881  invocations from separate draws are in distinct invocation groups.
 882  
 883  [NOTE]
 884  .Note
 885  ====
 886  Because the partitioning of invocations into invocation groups is
 887  implementation-dependent and not observable, applications generally need to
 888  assume the worst case of all invocations in a draw belonging to a single
 889  invocation group.
 890  ====
 891  
 892  A _derivative group_ (see the subsection "`Control Flow`" of section 2 of
 893  the SPIR-V 1.00 Revision 4 specification) is a set of invocations which are
 894  used together to compute a derivative.
 895  ifdef::VK_VERSION_1_1[]
 896  For a fragment shader, a derivative group is generated by a single primitive
 897  (point, line, or triangle) and includes any helper invocations needed to
 898  compute derivatives.
 899  If the pname:subgroupSize field of slink:VkPhysicalDeviceSubgroupProperties
 900  is at least 4, a derivative group for a fragment shader corresponds to a
 901  single subgroup quad.
 902  Otherwise, a derivative group is the set of invocations generated by a
 903  single primitive.
 904  endif::VK_VERSION_1_1[]
 905  ifndef::VK_VERSION_1_1[]
 906  For a fragment shader, a derivative group is the set of invocations
 907  generated by a single primitive.
 908  endif::VK_VERSION_1_1[]
 909  ifdef::VK_NV_compute_shader_derivatives[]
 910  A derivative group for a compute shader is a single local workgroup.
 911  endif::VK_NV_compute_shader_derivatives[]
 912  
 913  Derivative values are undefined: for a sampled image instruction if the
 914  instruction is in flow control that is not uniform across the derivative
 915  group.
 916  
 917  ifdef::VK_VERSION_1_1[]
 918  [[shaders-subgroup]]
 919  == Subgroups
 920  
 921  A _subgroup_ (see the subsection "`Control Flow`" of section 2 of the SPIR-V
 922  1.3 Revision 1 specification) is a set of invocations that can synchronize
 923  and share data with each other efficiently.
 924  An invocation group is partitioned into one or more subgroups.
 925  
 926  Subgroup operations are divided into various categories as described in
 927  elink:VkSubgroupFeatureFlagBits.
 928  
 929  [[shaders-subgroup-basic]]
 930  === Basic Subgroup Operations
 931  
 932  The basic subgroup operations allow two classes of functionality within
 933  shaders: elect and barrier.
 935  Invocations within a subgroup can: choose a single invocation to perform
 936  some task for the subgroup as a whole using elect.
 937  Invocations within a subgroup can: perform a subgroup barrier to ensure the
 938  ordering of execution or memory accesses within a subgroup.
 939  Barriers can: be performed on buffer memory accesses, code:Workgroup
 940  memory accesses, and image memory accesses to ensure that any results
 941  written are visible to other invocations within the subgroup.
 942  An code:OpControlBarrier can: also be used to perform a full execution
 943  control barrier.
 944  A full execution control barrier will ensure that each active invocation
 945  within the subgroup reaches a point of execution before any are allowed to
 946  continue.
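
The elect operation can be modelled on the host with a small C sketch.
It assumes a subgroup of at most 32 invocations whose active set is given as a bitmask, and it follows the SPIR-V rule that the elected invocation is the active invocation with the lowest index; the function name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Returns the index of the elected invocation: the active invocation with
 * the lowest index. Bit i of active_mask is set if invocation i is active. */
static int subgroup_elect(uint32_t active_mask)
{
    assert(active_mask != 0); /* at least one invocation must be active */
    int index = 0;
    while (((active_mask >> index) & 1u) == 0)
        ++index;
    return index;
}
```

In a shader, only the elected invocation would then perform the shared task, for example a single atomic update on behalf of the whole subgroup.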
 947  
 948  [[shaders-subgroup-vote]]
 949  === Vote Subgroup Operations
 950  
 951  The vote subgroup operations allow invocations within a subgroup to compare
 952  values across a subgroup.
 953  The types of votes enabled are:
 954  
 955    * Do all active subgroup invocations agree that an expression is true?
 956    * Do any active subgroup invocations evaluate an expression to true?
 957    * Do all active subgroup invocations have the same value of an expression?
 958  
 959  [NOTE]
 960  .Note
 961  ====
 962  These operations are useful in combination with control flow in that they
 963  allow for developers to check whether conditions match across the subgroup
 964  and choose potentially faster code-paths in these cases.
 965  ====
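
The three vote types listed above can be sketched in C as follows.
This is a host-side model under stated assumptions, not shader code: the subgroup size of 8, the bitmask representation of active invocations, and all function names are chosen for illustration only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SUBGROUP_SIZE 8 /* assumed subgroup size for this sketch */

/* True if invocation i is marked active in the mask. */
static bool is_active(uint32_t active_mask, int i)
{
    return ((active_mask >> i) & 1u) != 0;
}

/* Do all active invocations evaluate the predicate to true? */
static bool vote_all(const bool *pred, uint32_t active_mask)
{
    assert((active_mask >> SUBGROUP_SIZE) == 0);
    for (int i = 0; i < SUBGROUP_SIZE; ++i)
        if (is_active(active_mask, i) && !pred[i])
            return false;
    return true;
}

/* Does any active invocation evaluate the predicate to true? */
static bool vote_any(const bool *pred, uint32_t active_mask)
{
    assert((active_mask >> SUBGROUP_SIZE) == 0);
    for (int i = 0; i < SUBGROUP_SIZE; ++i)
        if (is_active(active_mask, i) && pred[i])
            return true;
    return false;
}

/* Do all active invocations hold the same value? */
static bool vote_all_equal(const int *value, uint32_t active_mask)
{
    assert((active_mask >> SUBGROUP_SIZE) == 0);
    bool have_first = false;
    int first = 0;
    for (int i = 0; i < SUBGROUP_SIZE; ++i) {
        if (!is_active(active_mask, i))
            continue;
        if (!have_first) {
            first = value[i];
            have_first = true;
        } else if (value[i] != first) {
            return false;
        }
    }
    return true;
}
```

Inactive invocations do not take part in any of the votes, which is why each function skips indices whose bit is clear in the mask.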
 966  
 967  [[shaders-subgroup-arithmetic]]
 968  === Arithmetic Subgroup Operations
 969  
 970  The arithmetic subgroup operations allow invocations to perform scan and
 971  reduction operations across a subgroup.
 972  For reduction operations, each invocation in a subgroup will obtain the same
 973  result of these arithmetic operations applied across the subgroup.
 974  For scan operations, each invocation in the subgroup will perform an
 975  inclusive or exclusive scan, cumulatively applying the operation across the
 976  invocations in a subgroup in an implementation-defined order.
 977  The operations supported are add, mul, min, max, and, or, xor.
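
To make the distinction between reductions and scans concrete, the C sketch below computes the result that one invocation would observe for the add operation.
It assumes that all invocations are active and that the scan proceeds in order of increasing invocation index, which is only one possible ordering; the function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Reduction: every invocation obtains the same total. */
static int reduce_add(const int *value, size_t count)
{
    assert(count > 0);
    int total = 0;
    for (size_t i = 0; i < count; ++i)
        total += value[i];
    return total;
}

/* Inclusive scan as seen by invocation `self`:
 * value[0] + ... + value[self]. */
static int inclusive_add_at(const int *value, size_t count, size_t self)
{
    assert(self < count);
    int total = 0;
    for (size_t i = 0; i <= self; ++i)
        total += value[i];
    return total;
}

/* Exclusive scan as seen by invocation `self`:
 * value[0] + ... + value[self - 1]; invocation 0 gets the identity, 0. */
static int exclusive_add_at(const int *value, size_t count, size_t self)
{
    assert(self < count);
    int total = 0;
    for (size_t i = 0; i < self; ++i)
        total += value[i];
    return total;
}
```

The same structure applies to the other supported operations by replacing the addition and its identity value, for example multiplication with identity 1.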
 978  
 979  [[shaders-subgroup-ballot]]
 980  === Ballot Subgroup Operations
 981  
 982  The ballot subgroup operations allow invocations to perform more complex
 983  votes across the subgroup.
 984  The ballot functionality allows all invocations within a subgroup to provide
 985  a boolean value and get as a result what each invocation provided as their
 986  boolean value.
 987  The broadcast functionality allows values to be broadcast from an invocation
 988  to all other invocations within the subgroup, given that the invocation to
 989  be broadcast from is known at pipeline creation time.
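
A ballot and a broadcast can be modelled in C as shown below.
The sketch assumes a subgroup of 8 invocations and represents the ballot result as a bitmask with one bit per invocation; these choices and the function names are illustrative assumptions rather than the representation any implementation uses.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SUBGROUP_SIZE 8 /* assumed subgroup size for this sketch */

/* Collects one boolean per active invocation into a bitmask.
 * Bit i of the result is set if invocation i is active and voted true. */
static uint32_t subgroup_ballot(const bool *pred, uint32_t active_mask)
{
    assert((active_mask >> SUBGROUP_SIZE) == 0);
    uint32_t result = 0;
    for (int i = 0; i < SUBGROUP_SIZE; ++i)
        if (((active_mask >> i) & 1u) && pred[i])
            result |= 1u << i;
    return result;
}

/* Every invocation receives the value held by invocation source_index.
 * source_index stands in for an index fixed at pipeline creation time. */
static int subgroup_broadcast(const int *value, int source_index)
{
    assert(source_index >= 0 && source_index < SUBGROUP_SIZE);
    return value[source_index];
}
```

Since every invocation receives the same bitmask, each one can inspect how any other invocation voted, which is what makes ballots more expressive than the plain vote operations.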
 990  
 991  [[shaders-subgroup-shuffle]]
 992  === Shuffle Subgroup Operations
 993  
 994  The shuffle subgroup operations allow invocations to read values from other
 995  invocations within a subgroup.
 996  
 997  [[shaders-subgroup-shuffle-relative]]
 998  === Shuffle Relative Subgroup Operations
 999  
1000  The shuffle relative subgroup operations allow invocations to read values
1001  from other invocations within the subgroup relative to the current
1002  invocation in the group.
1003  The relative operations supported allow data to be shifted up and down
1004  through the invocations within a subgroup.
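
Shifting data up and down through a subgroup can be sketched in C from the point of view of a single invocation.
The subgroup size of 8 and the function names are assumptions made for illustration.
When the source index falls outside the subgroup, this sketch returns the invocation's own value purely to stay well defined; the corresponding SPIR-V instructions leave that result undefined.

```c
#include <assert.h>

#define SUBGROUP_SIZE 8 /* assumed subgroup size for this sketch */

/* Value read by invocation `self` when shuffling up by `delta`:
 * the value held by invocation self - delta. */
static int shuffle_up_at(const int *value, int self, int delta)
{
    assert(self >= 0 && self < SUBGROUP_SIZE);
    assert(delta >= 0 && delta <= SUBGROUP_SIZE);
    int source = self - delta;
    /* Out of range: undefined in SPIR-V; this sketch keeps the own value. */
    return source >= 0 ? value[source] : value[self];
}

/* Value read by invocation `self` when shuffling down by `delta`:
 * the value held by invocation self + delta. */
static int shuffle_down_at(const int *value, int self, int delta)
{
    assert(self >= 0 && self < SUBGROUP_SIZE);
    assert(delta >= 0 && delta <= SUBGROUP_SIZE);
    int source = self + delta;
    /* Out of range: undefined in SPIR-V; this sketch keeps the own value. */
    return source < SUBGROUP_SIZE ? value[source] : value[self];
}
```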
1005  
1006  [[shaders-subgroup-clustered]]
1007  === Clustered Subgroup Operations
1008  
1009  The clustered subgroup operations allow invocations to perform an operation
1010  among partitions of a subgroup, such that the operation is performed only
1011  among the invocations within each partition.
1012  The partitions for clustered subgroup operations are consecutive
1013  power-of-two size groups of invocations and the cluster size must: be known
1014  at pipeline creation time.
1015  The operations supported are add, mul, min, max, and, or, xor.
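
A clustered add can be sketched in C as follows.
The sketch assumes a subgroup of 8 invocations, all active, and returns the result seen by one invocation; the function name is hypothetical.

```c
#include <assert.h>

#define SUBGROUP_SIZE 8 /* assumed subgroup size for this sketch */

/* Sum over the cluster containing invocation `self`. Clusters are
 * consecutive groups of cluster_size invocations, starting at index 0. */
static int clustered_add_at(const int *value, int self, int cluster_size)
{
    /* The cluster size must be a power of two no larger than the subgroup. */
    assert(cluster_size > 0 && (cluster_size & (cluster_size - 1)) == 0);
    assert(cluster_size <= SUBGROUP_SIZE);
    assert(self >= 0 && self < SUBGROUP_SIZE);

    int first = (self / cluster_size) * cluster_size;
    int total = 0;
    for (int i = first; i < first + cluster_size; ++i)
        total += value[i];
    return total;
}
```

With a cluster size equal to the subgroup size, the result is the same as an ordinary subgroup reduction.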
1016  
1017  [[shaders-subgroup-quad]]
1018  === Quad Subgroup Operations
1019  
1020  The quad subgroup operations allow clusters of 4 invocations (a quad) to
1021  share data efficiently with each other.
1022  ifdef::VK_VERSION_1_1[]
1023  For fragment shaders, if the pname:subgroupSize field of
1024  slink:VkPhysicalDeviceSubgroupProperties is at least 4, each quad
1025  corresponds to one of the groups of four shader invocations used for
1026  <<texture-derivatives,derivatives>>.
1027  endif::VK_VERSION_1_1[]
1028  ifdef::VK_NV_compute_shader_derivatives[]
1029  For compute shaders using the code:DerivativeGroupQuadsNV or
1030  code:DerivativeGroupLinearNV execution modes, each quad corresponds to one
1031  of the groups of four shader invocations used for
1032  <<texture-derivatives-compute,derivatives>>.
1033  The invocations in each quad are ordered to have attribute values of
1034  P~i0,j0~, P~i1,j0~, P~i0,j1~, and P~i1,j1~, respectively.
1035  endif::VK_NV_compute_shader_derivatives[]
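
The data sharing within a quad can be sketched in C using quad-local indices 0 to 3.
The sketch assumes the four invocations are ordered as P(i0,j0), P(i1,j0), P(i0,j1), P(i1,j1), and models the horizontal, vertical, and diagonal swaps defined for quad operations in SPIR-V; the function names are hypothetical.

```c
#include <assert.h>

/* Partner index for a horizontal swap: 0 <-> 1 and 2 <-> 3. */
static int quad_swap_horizontal(int quad_index)
{
    assert(quad_index >= 0 && quad_index < 4);
    return quad_index ^ 1;
}

/* Partner index for a vertical swap: 0 <-> 2 and 1 <-> 3. */
static int quad_swap_vertical(int quad_index)
{
    assert(quad_index >= 0 && quad_index < 4);
    return quad_index ^ 2;
}

/* Partner index for a diagonal swap: 0 <-> 3 and 1 <-> 2. */
static int quad_swap_diagonal(int quad_index)
{
    assert(quad_index >= 0 && quad_index < 4);
    return quad_index ^ 3;
}

/* One way to estimate an x derivative from a quad's four values:
 * the difference between the two invocations in the first row. */
static float quad_ddx(const float *p)
{
    return p[1] - p[0];
}
```

The derivative helper shows why quads matter for fragment shaders: a horizontal neighbor's value is all that is needed to form a finite difference.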
1036  
1037  ifdef::VK_NV_shader_subgroup_partitioned[]
1038  
1039  [[shaders-subgroup-partitioned]]
1040  === Partitioned Subgroup Operations
1041  
1042  The partitioned subgroup operations allow a subgroup to partition its
1043  invocations into disjoint subsets and to perform scan and reduce operations
1044  among invocations belonging to the same subset.
1045  The partitions for partitioned subgroup operations are specified by a ballot
1046  operation and can: be computed at runtime.
1047  The operations supported are add, mul, min, max, and, or, xor.
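
A partitioned reduction can be sketched in C by treating the ballot that describes a subset as a bitmask.
The subgroup size of 8, the bitmask representation, and the function name are assumptions made for illustration.

```c
#include <assert.h>
#include <stdint.h>

#define SUBGROUP_SIZE 8 /* assumed subgroup size for this sketch */

/* Sum of the values held by the invocations in one subset.
 * Bit i of partition_mask is set if invocation i belongs to the subset. */
static int partitioned_add(const int *value, uint32_t partition_mask)
{
    assert(partition_mask != 0);
    assert((partition_mask >> SUBGROUP_SIZE) == 0);
    int total = 0;
    for (int i = 0; i < SUBGROUP_SIZE; ++i)
        if ((partition_mask >> i) & 1u)
            total += value[i];
    return total;
}
```

Unlike clusters, the subsets need not be contiguous or of equal size; they only need to be disjoint, so that every invocation belongs to exactly one subset.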
1048  
1049  endif::VK_NV_shader_subgroup_partitioned[]
1050  
1051  endif::VK_VERSION_1_1[]
1052  
1053  ifdef::VK_NV_cooperative_matrix[]
1054  == Cooperative Matrices
1055  
1056  A _cooperative matrix_ type is a SPIR-V type where the storage for and
1057  computations performed on the matrix are spread across a set of invocations
1058  such as a subgroup.
1059  These types give the implementation freedom in how to optimize matrix
1060  multiplies.
1061  
1062  SPIR-V defines the types and instructions, but does not specify rules about
1063  what sizes/combinations are valid, and it is expected that different
1064  implementations may: support different sizes.
1065  
1066  [open,refpage='vkGetPhysicalDeviceCooperativeMatrixPropertiesNV',desc='Returns properties describing what cooperative matrix types are supported',type='protos']
1067  --
1068  
1069  To enumerate the supported cooperative matrix types and operations, call:
1070  
1071  include::{generated}/api/protos/vkGetPhysicalDeviceCooperativeMatrixPropertiesNV.txt[]
1072  
1073    * pname:physicalDevice is the physical device.
1074    * pname:pPropertyCount is a pointer to an integer related to the number of
1075      cooperative matrix properties available or queried.
1076    * pname:pProperties is either `NULL` or a pointer to an array of
1077      slink:VkCooperativeMatrixPropertiesNV structures.
1078  
1079  If pname:pProperties is `NULL`, then the number of cooperative matrix
1080  properties available is returned in pname:pPropertyCount.
1081  Otherwise, pname:pPropertyCount must: point to a variable set by the user to
1082  the number of elements in the pname:pProperties array, and on return the
1083  variable is overwritten with the number of structures actually written to
1084  pname:pProperties.
1085  If the value referenced by pname:pPropertyCount is less than the number of
1086  cooperative matrix properties available, at most that many structures will
1087  be written, and ename:VK_INCOMPLETE will be returned instead of
1088  ename:VK_SUCCESS to indicate that not all the available cooperative matrix
1089  properties were returned.
1092  
1093  include::{generated}/validity/protos/vkGetPhysicalDeviceCooperativeMatrixPropertiesNV.txt[]
1094  --
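
The count-then-fill behavior described for this command can be modelled with a stand-in function in plain C.
Everything in the sketch is hypothetical: `mock_enumerate`, the `Mock` types, and the three table entries are made up so that the example compiles without the Vulkan headers, and they only imitate the calling convention of the real command.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef enum { MOCK_SUCCESS, MOCK_INCOMPLETE } MockResult;
typedef struct { uint32_t MSize, NSize, KSize; } MockProperties;

/* Made-up entries standing in for what a device might report. */
static const MockProperties k_available[] = {
    {16, 16, 16}, {16, 8, 8}, {8, 8, 16},
};

/* Stand-in that follows the same count/array convention. */
static MockResult mock_enumerate(uint32_t *pPropertyCount,
                                 MockProperties *pProperties)
{
    assert(pPropertyCount != NULL);
    const uint32_t available =
        (uint32_t)(sizeof k_available / sizeof k_available[0]);

    if (pProperties == NULL) {
        /* First call: report how many entries exist. */
        *pPropertyCount = available;
        return MOCK_SUCCESS;
    }

    /* Second call: write at most *pPropertyCount entries. */
    uint32_t written =
        *pPropertyCount < available ? *pPropertyCount : available;
    for (uint32_t i = 0; i < written; ++i)
        pProperties[i] = k_available[i];
    *pPropertyCount = written;
    return written < available ? MOCK_INCOMPLETE : MOCK_SUCCESS;
}
```

A caller would typically call once with a `NULL` array to learn the count, allocate that many structures, and call again to fill them.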
1095  
1096  [open,refpage='VkCooperativeMatrixPropertiesNV',desc='Structure specifying cooperative matrix properties',type='structs']
1097  --
1098  
1099  Each sname:VkCooperativeMatrixPropertiesNV structure describes a single
1100  supported combination of types for a matrix multiply/add operation
1101  (code:OpCooperativeMatrixMulAddNV).
1102  The multiply can: be described in terms of the following variables and types
1103  (in SPIR-V pseudocode):
1104  
1105  [source,c]
1106  ---------------------------------------------------
1107      %A is of type OpTypeCooperativeMatrixNV %AType %scope %MSize %KSize
1108      %B is of type OpTypeCooperativeMatrixNV %BType %scope %KSize %NSize
1109      %C is of type OpTypeCooperativeMatrixNV %CType %scope %MSize %NSize
1110      %D is of type OpTypeCooperativeMatrixNV %DType %scope %MSize %NSize
1111  
1112      %D = %A * %B + %C // using OpCooperativeMatrixMulAddNV
1113  ---------------------------------------------------
1114  
1115  A matrix multiply with these dimensions is known as an _MxNxK_ matrix
1116  multiply.
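
For reference, the arithmetic of an _MxNxK_ multiply/add can be written as a scalar C loop.
This is only a sketch of the mathematical result: it assumes row-major `float` storage and a hypothetical function name, and it says nothing about how an implementation distributes the matrices across invocations.

```c
#include <assert.h>
#include <stddef.h>

/* Computes D = A * B + C with row-major storage.
 * A is M x K, B is K x N, and C and D are M x N. */
static void matrix_mul_add(size_t M, size_t N, size_t K,
                           const float *A, const float *B,
                           const float *C, float *D)
{
    assert(M > 0 && N > 0 && K > 0);
    for (size_t i = 0; i < M; ++i) {
        for (size_t j = 0; j < N; ++j) {
            float acc = C[i * N + j];
            for (size_t k = 0; k < K; ++k)
                acc += A[i * K + k] * B[k * N + j];
            D[i * N + j] = acc;
        }
    }
}
```

The shared dimension K is consumed by the inner loop, which is why it appears in the sizes of A and B but not in those of C and D.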
1117  
1118  The sname:VkCooperativeMatrixPropertiesNV structure is defined as:
1119  
1120  include::{generated}/api/structs/VkCooperativeMatrixPropertiesNV.txt[]
1121  
1122    * pname:sType is the type of this structure.
1123    * pname:pNext is `NULL` or a pointer to an extension-specific structure.
1124    * pname:MSize is the number of rows in matrices A, C, and D.
1125    * pname:KSize is the number of columns in matrix A and rows in matrix B.
1126    * pname:NSize is the number of columns in matrices B, C, and D.
1127    * pname:AType is the component type of matrix A, of type
1128      elink:VkComponentTypeNV.
1129    * pname:BType is the component type of matrix B, of type
1130      elink:VkComponentTypeNV.
1131    * pname:CType is the component type of matrix C, of type
1132      elink:VkComponentTypeNV.
1133    * pname:DType is the component type of matrix D, of type
1134      elink:VkComponentTypeNV.
1135    * pname:scope is the scope of all the matrix types, of type
1136      elink:VkScopeNV.
1137  
1138  If some types are preferred over other types (e.g. for performance), they
1139  should: appear earlier in the list enumerated by
1140  flink:vkGetPhysicalDeviceCooperativeMatrixPropertiesNV.
1141  
1142  At least one entry in the list must: have power of two values for all of
1143  pname:MSize, pname:KSize, and pname:NSize.
1144  
1145  include::{generated}/validity/structs/VkCooperativeMatrixPropertiesNV.txt[]
1146  --
1147  
1148  [open,refpage='VkScopeNV',desc='Specify SPIR-V scope',type='enums']
1149  --
1150  
1151  Possible values for elink:VkScopeNV include:
1152  
1153  include::{generated}/api/enums/VkScopeNV.txt[]
1154  
1155    * ename:VK_SCOPE_DEVICE_NV corresponds to SPIR-V code:Device scope.
1156    * ename:VK_SCOPE_WORKGROUP_NV corresponds to SPIR-V code:Workgroup scope.
1157    * ename:VK_SCOPE_SUBGROUP_NV corresponds to SPIR-V code:Subgroup scope.
1158    * ename:VK_SCOPE_QUEUE_FAMILY_NV corresponds to SPIR-V code:QueueFamilyKHR
1159      scope.
1160  
1161  All enum values match the corresponding SPIR-V value.
1162  --
1163  
1164  [open,refpage='VkComponentTypeNV',desc='Specify SPIR-V cooperative matrix component type',type='enums']
1165  --
1166  
1167  Possible values for elink:VkComponentTypeNV include:
1168  
1169  include::{generated}/api/enums/VkComponentTypeNV.txt[]
1170  
1171    * ename:VK_COMPONENT_TYPE_FLOAT16_NV corresponds to SPIR-V
1172      code:OpTypeFloat 16.
1173    * ename:VK_COMPONENT_TYPE_FLOAT32_NV corresponds to SPIR-V
1174      code:OpTypeFloat 32.
1175    * ename:VK_COMPONENT_TYPE_FLOAT64_NV corresponds to SPIR-V
1176      code:OpTypeFloat 64.
1177    * ename:VK_COMPONENT_TYPE_SINT8_NV corresponds to SPIR-V code:OpTypeInt 8
1178      1.
1179    * ename:VK_COMPONENT_TYPE_SINT16_NV corresponds to SPIR-V code:OpTypeInt
1180      16 1.
1181    * ename:VK_COMPONENT_TYPE_SINT32_NV corresponds to SPIR-V code:OpTypeInt
1182      32 1.
1183    * ename:VK_COMPONENT_TYPE_SINT64_NV corresponds to SPIR-V code:OpTypeInt
1184      64 1.
1185    * ename:VK_COMPONENT_TYPE_UINT8_NV corresponds to SPIR-V code:OpTypeInt 8
1186      0.
1187    * ename:VK_COMPONENT_TYPE_UINT16_NV corresponds to SPIR-V code:OpTypeInt
1188      16 0.
1189    * ename:VK_COMPONENT_TYPE_UINT32_NV corresponds to SPIR-V code:OpTypeInt
1190      32 0.
1191    * ename:VK_COMPONENT_TYPE_UINT64_NV corresponds to SPIR-V code:OpTypeInt
1192      64 0.
1193  --
1194  
1195  endif::VK_NV_cooperative_matrix[]
1196  
1197  ifdef::VK_EXT_validation_cache[]
1198  [[shaders-validation-cache]]
1199  == Validation Cache
1200  
1201  [open,refpage='VkValidationCacheEXT',desc='Opaque handle to a validation cache object',type='handles']
1202  --
1203  
1204  Validation cache objects allow the result of internal validation to be
1205  reused, both within a single application run and between multiple runs.
1206  Reuse within a single run is achieved by passing the same validation cache
1207  object when creating supported Vulkan objects.
1208  Reuse across runs of an application is achieved by retrieving validation
1209  cache contents in one run of an application, saving the contents, and using
1210  them to preinitialize a validation cache on a subsequent run.
1211  The contents of the validation cache objects are managed by the validation
1212  layers.
1213  Applications can: manage the host memory consumed by a validation cache
1214  object and control the amount of data retrieved from a validation cache
1215  object.
1216  
1217  Validation cache objects are represented by sname:VkValidationCacheEXT
1218  handles:
1219  
1220  include::{generated}/api/handles/VkValidationCacheEXT.txt[]
1221  
1222  --
1223  
1224  [open,refpage='vkCreateValidationCacheEXT',desc='Creates a new validation cache',type='protos']
1225  --
1226  
1227  To create validation cache objects, call:
1228  
1229  include::{generated}/api/protos/vkCreateValidationCacheEXT.txt[]
1230  
1231    * pname:device is the logical device that creates the validation cache
1232      object.
1233    * pname:pCreateInfo is a pointer to a slink:VkValidationCacheCreateInfoEXT
1234      structure that contains the initial parameters for the validation cache
1235      object.
1236    * pname:pAllocator controls host memory allocation as described in the
1237      <<memory-allocation, Memory Allocation>> chapter.
1238    * pname:pValidationCache is a pointer to a slink:VkValidationCacheEXT
1239      handle in which the resulting validation cache object is returned.
1240  
1241  [NOTE]
1242  .Note
1243  ====
1244  Applications can: track and manage the total host memory size of a
1245  validation cache object using the pname:pAllocator.
1246  Applications can: limit the amount of data retrieved from a validation cache
1247  object in fname:vkGetValidationCacheDataEXT.
1248  Implementations should: not internally limit the total number of entries
1249  added to a validation cache object or the total host memory consumed.
1250  ====
1251  
1252  Once created, a validation cache can: be passed to the
1253  fname:vkCreateShaderModule command as part of the
1254  sname:VkShaderModuleCreateInfo pname:pNext chain.
1255  If a sname:VkShaderModuleValidationCacheCreateInfoEXT object is part of the
1256  sname:VkShaderModuleCreateInfo::pname:pNext chain, and its
1257  pname:validationCache field is not dlink:VK_NULL_HANDLE, the implementation
1258  will query it for possible reuse opportunities and update it with new
1259  content.
1260  The use of the validation cache object in these commands is internally
1261  synchronized, and the same validation cache object can: be used in multiple
1262  threads simultaneously.
1263  
1264  [NOTE]
1265  .Note
1266  ====
1267  Implementations should: make every effort to limit any critical sections to
1268  the actual accesses to the cache, which is expected to be significantly
1269  shorter than the duration of the fname:vkCreateShaderModule command.
1270  ====
1271  
1272  include::{generated}/validity/protos/vkCreateValidationCacheEXT.txt[]
1273  --
1274  
1275  [open,refpage='VkValidationCacheCreateInfoEXT',desc='Structure specifying parameters of a newly created validation cache',type='structs']
1276  --
1277  
1278  The sname:VkValidationCacheCreateInfoEXT structure is defined as:
1279  
1280  include::{generated}/api/structs/VkValidationCacheCreateInfoEXT.txt[]
1281  
1282    * pname:sType is the type of this structure.
1283    * pname:pNext is `NULL` or a pointer to an extension-specific structure.
1284    * pname:flags is reserved for future use.
1285    * pname:initialDataSize is the number of bytes in pname:pInitialData.
1286      If pname:initialDataSize is zero, the validation cache will initially be
1287      empty.
1288    * pname:pInitialData is a pointer to previously retrieved validation cache
1289      data.
1290      If the validation cache data is incompatible (as defined below) with the
1291      device, the validation cache will be initially empty.
1292      If pname:initialDataSize is zero, pname:pInitialData is ignored.
1293  
1294  .Valid Usage
1295  ****
1296    * [[VUID-VkValidationCacheCreateInfoEXT-initialDataSize-01534]]
1297      If pname:initialDataSize is not `0`, it must: be equal to the size of
1298      pname:pInitialData, as returned by fname:vkGetValidationCacheDataEXT
1299      when pname:pInitialData was originally retrieved
1300    * [[VUID-VkValidationCacheCreateInfoEXT-initialDataSize-01535]]
1301      If pname:initialDataSize is not `0`, pname:pInitialData must: have been
1302      retrieved from a previous call to fname:vkGetValidationCacheDataEXT
1303  ****
1304  
1305  include::{generated}/validity/structs/VkValidationCacheCreateInfoEXT.txt[]
1306  --
1307  
1308  [open,refpage='VkValidationCacheCreateFlagsEXT',desc='Reserved for future use',type='flags']
1309  --
1310  include::{generated}/api/flags/VkValidationCacheCreateFlagsEXT.txt[]
1311  
1312  tname:VkValidationCacheCreateFlagsEXT is a bitmask type for setting a mask,
1313  but is currently reserved for future use.
1314  --
1315  
1316  [open,refpage='vkMergeValidationCachesEXT',desc='Combine the data stores of validation caches',type='protos']
1317  --
1318  
1319  Validation cache objects can: be merged using the command:
1320  
1321  include::{generated}/api/protos/vkMergeValidationCachesEXT.txt[]
1322  
1323    * pname:device is the logical device that owns the validation cache
1324      objects.
1325    * pname:dstCache is the handle of the validation cache to merge results
1326      into.
1327    * pname:srcCacheCount is the length of the pname:pSrcCaches array.
1328    * pname:pSrcCaches is an array of validation cache handles, which will be
1329      merged into pname:dstCache.
1330      The previous contents of pname:dstCache are included after the merge.
1331  
1332  [NOTE]
1333  .Note
1334  ====
1335  The details of the merge operation are implementation dependent, but
1336  implementations should: merge the contents of the specified validation
1337  caches and prune duplicate entries.
1338  ====
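
As a non-normative illustration of the merge behavior suggested in the note
above, the following C sketch merges several source caches into a
destination while keeping the destination's previous entries and skipping
duplicates.
The `ToyCache` type, its fixed capacity, and the use of 32-bit keys as cache
entries are assumptions made only for this example; an implementation is
free to store and merge entries differently.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical cache: a fixed-capacity set of 32-bit entry keys. */
#define TOY_CAPACITY 64u

typedef struct ToyCache {
    uint32_t keys[TOY_CAPACITY];
    size_t count;
} ToyCache;

/* Linear search; adequate for a small illustrative cache. */
static int toy_contains(const ToyCache *c, uint32_t key)
{
    for (size_t i = 0; i < c->count; ++i) {
        if (c->keys[i] == key)
            return 1;
    }
    return 0;
}

/* Insert key unless already present; returns 0 only when the cache is full. */
static int toy_insert(ToyCache *c, uint32_t key)
{
    if (toy_contains(c, key))
        return 1;
    if (c->count == TOY_CAPACITY)
        return 0;
    c->keys[c->count++] = key;
    return 1;
}

/* Merge every source into dst. Existing dst entries are kept and
 * duplicate keys are pruned. Returns 0 if dst runs out of space. */
static int toy_merge(ToyCache *dst, size_t srcCount,
                     const ToyCache *const *srcs)
{
    assert(dst != NULL);
    for (size_t s = 0; s < srcCount; ++s) {
        /* Mirrors the rule that dstCache is not one of the sources. */
        assert(srcs[s] != dst);
        for (size_t i = 0; i < srcs[s]->count; ++i) {
            if (!toy_insert(dst, srcs[s]->keys[i]))
                return 0;
        }
    }
    return 1;
}
```

The linear duplicate check keeps the sketch short; a hash set or sorted
array would be the more likely choice for caches of realistic size.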
1339  
1340  .Valid Usage
1341  ****
1342    * [[VUID-vkMergeValidationCachesEXT-dstCache-01536]]
1343      pname:dstCache must: not appear in the list of source caches
1344  ****
1345  
1346  include::{generated}/validity/protos/vkMergeValidationCachesEXT.txt[]
1347  --
1348  
1349  [open,refpage='vkGetValidationCacheDataEXT',desc='Get the data store from a validation cache',type='protos']
1350  --
1351  
1352  Data can: be retrieved from a validation cache object using the command:
1353  
1354  include::{generated}/api/protos/vkGetValidationCacheDataEXT.txt[]
1355  
1356    * pname:device is the logical device that owns the validation cache.
1357    * pname:validationCache is the validation cache to retrieve data from.
1358    * pname:pDataSize is a pointer to a value related to the amount of data in
1359      the validation cache, as described below.
1360    * pname:pData is either `NULL` or a pointer to a buffer.
1361  
1362  If pname:pData is `NULL`, then the maximum size of the data that can: be
1363  retrieved from the validation cache, in bytes, is returned in
1364  pname:pDataSize.
1365  Otherwise, pname:pDataSize must: point to a variable set by the user to the
1366  size of the buffer, in bytes, pointed to by pname:pData, and on return the
1367  variable is overwritten with the amount of data actually written to
1368  pname:pData.
1369  
1370  If the value pointed to by pname:pDataSize is less than the maximum size
1371  that can: be retrieved from the validation cache, at most that many bytes
1372  will be written to pname:pData, and fname:vkGetValidationCacheDataEXT will
1373  return ename:VK_INCOMPLETE.
1374  Any data written to pname:pData is valid and can: be provided as the
1375  pname:pInitialData member of the sname:VkValidationCacheCreateInfoEXT
1376  structure passed to fname:vkCreateValidationCacheEXT.
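
The size-query and retrieval behavior described above is sketched below as a
small, non-normative C model.
It is not the Vulkan API: `ModelCache`, `model_get_cache_data`, and
`model_fetch_all` are hypothetical stand-ins that compile without the Vulkan
headers, and `MODEL_SUCCESS` and `MODEL_INCOMPLETE` stand in for
ename:VK_SUCCESS and ename:VK_INCOMPLETE.
The model deliberately ignores the separate rule for buffers smaller than
the cache header.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Stand-ins for VK_SUCCESS and VK_INCOMPLETE. */
typedef enum ModelResult {
    MODEL_SUCCESS = 0,
    MODEL_INCOMPLETE = 5
} ModelResult;

/* Hypothetical cache: a byte array holding the serialized data. */
typedef struct ModelCache {
    const unsigned char *bytes;
    size_t size;
} ModelCache;

/* Models the pDataSize / pData contract of the retrieval command. */
static ModelResult model_get_cache_data(const ModelCache *cache,
                                        size_t *pDataSize, void *pData)
{
    assert(cache != NULL && pDataSize != NULL);
    if (pData == NULL) {
        /* Size query: report the maximum retrievable size. */
        *pDataSize = cache->size;
        return MODEL_SUCCESS;
    }
    /* Copy no more than the caller's buffer can hold. */
    size_t n = *pDataSize < cache->size ? *pDataSize : cache->size;
    if (n > 0)
        memcpy(pData, cache->bytes, n);
    *pDataSize = n; /* overwritten with the amount actually written */
    return n < cache->size ? MODEL_INCOMPLETE : MODEL_SUCCESS;
}

/* Two-call idiom: query the size, allocate, then retrieve.
 * Returns NULL on failure or when the cache is empty; caller frees. */
static void *model_fetch_all(const ModelCache *cache, size_t *outSize)
{
    size_t size = 0;
    if (model_get_cache_data(cache, &size, NULL) != MODEL_SUCCESS || size == 0)
        return NULL;
    void *buf = malloc(size);
    if (buf == NULL)
        return NULL;
    if (model_get_cache_data(cache, &size, buf) != MODEL_SUCCESS) {
        free(buf);
        return NULL;
    }
    *outSize = size;
    return buf;
}
```

An application following this pattern against the real command would treat
an incomplete result on the second call as a sign that the cache grew
between the two calls, and either retry or keep the partial data.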
1377  
1378  Two calls to fname:vkGetValidationCacheDataEXT with the same parameters
1379  must: retrieve the same data unless a command that modifies the contents of
1380  the cache is called between them.
1381  
1382  [[validation-cache-header]]
1383  Applications can: store the data retrieved from the validation cache, and
1384  use that data, possibly in a future run of the application, to populate new
1385  validation cache objects.
1386  The results of validation, however, may: depend on the vendor ID, device ID,
1387  driver version, and other details of the device.
1388  To enable applications to detect when previously retrieved data is
1389  incompatible with the device, the initial bytes written to pname:pData must:
1390  be a header consisting of the following members:
1391  
1392  .Layout for validation cache header version ename:VK_VALIDATION_CACHE_HEADER_VERSION_ONE_EXT
1393  [width="85%",cols="8%,21%,71%",options="header"]
1394  |====
1395  | Offset | Size | Meaning
1396  | 0 | 4                    | length in bytes of the entire validation cache header
1397                               written as a stream of bytes, with the least
1398                               significant byte first
1399  | 4 | 4                    | a elink:VkValidationCacheHeaderVersionEXT value
1400                               written as a stream of bytes, with the least
1401                               significant byte first
1402  | 8 | ename:VK_UUID_SIZE   | a layer commit ID expressed as a UUID, which uniquely
1403                               identifies the version of the validation layers used
1404                               to generate these validation results
1405  |====
1406  
1407  The first four bytes encode the length of the entire validation cache
1408  header, in bytes.
1409  This value includes all fields in the header, including the validation cache
1410  version field and the length field itself.
1411  
1412  The next four bytes encode the validation cache version, as described for
1413  elink:VkValidationCacheHeaderVersionEXT.
1414  A consumer of the validation cache should: use the cache version to
1415  interpret the remainder of the cache header.
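
The header layout above can be decoded with a few lines of C, shown here as
a non-normative sketch.
The names `SketchCacheHeader`, `sketch_read_le32`, and
`sketch_parse_cache_header` are hypothetical, and the two constants are
local mirrors of ename:VK_UUID_SIZE and
ename:VK_VALIDATION_CACHE_HEADER_VERSION_ONE_EXT so that the example
compiles without the Vulkan headers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Local mirrors of the Vulkan values, for a header-free sketch. */
#define SKETCH_UUID_SIZE 16u          /* VK_UUID_SIZE */
#define SKETCH_HEADER_VERSION_ONE 1u  /* ..._HEADER_VERSION_ONE_EXT */

/* Hypothetical in-memory view of the version-one header. */
typedef struct SketchCacheHeader {
    uint32_t headerLength;
    uint32_t headerVersion;
    uint8_t layerUuid[SKETCH_UUID_SIZE];
} SketchCacheHeader;

/* Decode a 32-bit value stored least significant byte first. */
static uint32_t sketch_read_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Returns 1 and fills *out if data begins with a well-formed
 * version-one header; returns 0 otherwise. */
static int sketch_parse_cache_header(const uint8_t *data, size_t size,
                                     SketchCacheHeader *out)
{
    /* Two 4-byte fields followed by the UUID. */
    const size_t minSize = 8u + SKETCH_UUID_SIZE;

    assert(out != NULL);
    if (data == NULL || size < minSize)
        return 0;

    out->headerLength = sketch_read_le32(data);
    out->headerVersion = sketch_read_le32(data + 4);

    /* Only version one is understood by this sketch. */
    if (out->headerVersion != SKETCH_HEADER_VERSION_ONE)
        return 0;
    /* The stated length covers the whole header and fits the blob. */
    if (out->headerLength < minSize || out->headerLength > size)
        return 0;

    memcpy(out->layerUuid, data + 8, SKETCH_UUID_SIZE);
    return 1;
}
```

Reading the fields byte by byte, rather than casting the buffer to a
structure, keeps the sketch independent of host endianness and alignment.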
1416  
1417  If the value pointed to by pname:pDataSize is less than the size of this
1418  header, nothing will be written to pname:pData and zero will be written to
1419  the variable pointed to by pname:pDataSize.
1420  
1421  include::{generated}/validity/protos/vkGetValidationCacheDataEXT.txt[]
1422  --
1423  
1424  [open,refpage='VkValidationCacheHeaderVersionEXT',desc='Encode validation cache version',type='enums',xrefs='vkCreateValidationCacheEXT vkGetValidationCacheDataEXT']
1425  --
1426  Possible values of the second group of four bytes in the header returned by
1427  flink:vkGetValidationCacheDataEXT, encoding the validation cache version,
1428  are:
1429  
1430  include::{generated}/api/enums/VkValidationCacheHeaderVersionEXT.txt[]
1431  
1432    * ename:VK_VALIDATION_CACHE_HEADER_VERSION_ONE_EXT specifies version one
1433      of the validation cache.
1434  --
1435  
1436  [open,refpage='vkDestroyValidationCacheEXT',desc='Destroy a validation cache object',type='protos']
1437  --
1438  
1439  To destroy a validation cache, call:
1440  
1441  include::{generated}/api/protos/vkDestroyValidationCacheEXT.txt[]
1442  
1443    * pname:device is the logical device that destroys the validation cache
1444      object.
1445    * pname:validationCache is the handle of the validation cache to destroy.
1446    * pname:pAllocator controls host memory allocation as described in the
1447      <<memory-allocation, Memory Allocation>> chapter.
1448  
1449  .Valid Usage
1450  ****
1451    * [[VUID-vkDestroyValidationCacheEXT-validationCache-01537]]
1452      If sname:VkAllocationCallbacks were provided when pname:validationCache
1453      was created, a compatible set of callbacks must: be provided here
1454    * [[VUID-vkDestroyValidationCacheEXT-validationCache-01538]]
1455      If no sname:VkAllocationCallbacks were provided when
1456      pname:validationCache was created, pname:pAllocator must: be `NULL`
1457  ****
1458  
1459  include::{generated}/validity/protos/vkDestroyValidationCacheEXT.txt[]
1460  --
1461  endif::VK_EXT_validation_cache[]