shaders.txt
// Copyright (c) 2015-2019 Khronos Group. This work is licensed under a
// Creative Commons Attribution 4.0 International License; see
// http://creativecommons.org/licenses/by/4.0/

[[shaders]]
= Shaders

A shader specifies programmable operations that execute for each vertex,
control point, tessellated vertex, primitive, fragment, or workgroup in the
corresponding stage(s) of the graphics and compute pipelines.

Graphics pipelines include vertex shader execution as a result of
<<drawing,primitive assembly>>, followed, if enabled, by tessellation
control and evaluation shaders operating on <<drawing-patch-lists,patches>>,
geometry shaders, if enabled, operating on primitives, and fragment shaders,
if present, operating on fragments generated by <<primsrast,Rasterization>>.
In this specification, vertex, tessellation control, tessellation evaluation
and geometry shaders are collectively referred to as vertex processing
stages and occur in the logical pipeline before rasterization.
The fragment shader occurs logically after rasterization.

Only the compute shader stage is included in a compute pipeline.
Compute shaders operate on compute invocations in a workgroup.

Shaders can: read from input variables, and read from and write to output
variables.
Input and output variables can: be used to transfer data between shader
stages, or to allow the shader to interact with values that exist in the
execution environment.
Similarly, the execution environment provides constants that describe
capabilities.

Shader variables are associated with execution environment-provided inputs
and outputs using _built-in_ decorations in the shader.
The available decorations for each stage are documented in the following
subsections.


[[shader-modules]]
== Shader Modules

[open,refpage='VkShaderModule',desc='Opaque handle to a shader module object',type='handles']
--

_Shader modules_ contain _shader code_ and one or more entry points.
Shaders are selected from a shader module by specifying an entry point as
part of <<pipelines,pipeline>> creation.
The stages of a pipeline can: use shaders that come from different modules.
The shader code defining a shader module must: be in the SPIR-V format, as
described by the <<spirvenv,Vulkan Environment for SPIR-V>> appendix.

Shader modules are represented by sname:VkShaderModule handles:

include::{generated}/api/handles/VkShaderModule.txt[]

--

[open,refpage='vkCreateShaderModule',desc='Creates a new shader module object',type='protos']
--

To create a shader module, call:

include::{generated}/api/protos/vkCreateShaderModule.txt[]

  * pname:device is the logical device that creates the shader module.
  * pname:pCreateInfo is a pointer to an instance of the
    sname:VkShaderModuleCreateInfo structure.
  * pname:pAllocator controls host memory allocation as described in the
    <<memory-allocation, Memory Allocation>> chapter.
  * pname:pShaderModule points to a slink:VkShaderModule handle in which the
    resulting shader module object is returned.

Once a shader module has been created, any entry points it contains can: be
used in pipeline shader stages as described in <<pipelines-compute,Compute
Pipelines>> and <<pipelines-graphics,Graphics Pipelines>>.

ifdef::VK_NV_glsl_shader[]
If the shader stage fails to compile, ename:VK_ERROR_INVALID_SHADER_NV will
be generated and the compile log will be reported back to the application by
`<<VK_EXT_debug_report>>` if enabled.
endif::VK_NV_glsl_shader[]

include::{generated}/validity/protos/vkCreateShaderModule.txt[]
--

[open,refpage='VkShaderModuleCreateInfo',desc='Structure specifying parameters of a newly created shader module',type='structs']
--

The sname:VkShaderModuleCreateInfo structure is defined as:

include::{generated}/api/structs/VkShaderModuleCreateInfo.txt[]

  * pname:sType is the type of this structure.
  * pname:pNext is `NULL` or a pointer to an extension-specific structure.
  * pname:flags is reserved for future use.
  * pname:codeSize is the size, in bytes, of the code pointed to by
    pname:pCode.
  * pname:pCode points to code that is used to create the shader module.
    The type and format of the code is determined from the content of the
    memory addressed by pname:pCode.

.Valid Usage
****
  * [[VUID-VkShaderModuleCreateInfo-codeSize-01085]]
    pname:codeSize must: be greater than 0
ifndef::VK_NV_glsl_shader[]
  * [[VUID-VkShaderModuleCreateInfo-codeSize-01086]]
    pname:codeSize must: be a multiple of 4
  * [[VUID-VkShaderModuleCreateInfo-pCode-01087]]
    pname:pCode must: point to valid SPIR-V code, formatted and packed as
    described by the <<spirv-spec,Khronos SPIR-V Specification>>
  * [[VUID-VkShaderModuleCreateInfo-pCode-01088]]
    pname:pCode must: adhere to the validation rules described by the
    <<spirvenv-module-validation, Validation Rules within a Module>> section
    of the <<spirvenv-capabilities,SPIR-V Environment>> appendix
endif::VK_NV_glsl_shader[]
ifdef::VK_NV_glsl_shader[]
  * [[VUID-VkShaderModuleCreateInfo-pCode-01376]]
    If pname:pCode points to SPIR-V code, pname:codeSize must: be a multiple
    of 4
  * [[VUID-VkShaderModuleCreateInfo-pCode-01377]]
    pname:pCode must: point to either valid SPIR-V code, formatted and
    packed as described by the <<spirv-spec,Khronos SPIR-V Specification>>
    or valid GLSL code which must: be written to the
    `GL_KHR_vulkan_glsl`
    extension specification
  * [[VUID-VkShaderModuleCreateInfo-pCode-01378]]
    If pname:pCode points to SPIR-V code, that code must: adhere to the
    validation rules described by the <<spirvenv-module-validation,
    Validation Rules within a Module>> section of the
    <<spirvenv-capabilities,SPIR-V Environment>> appendix
  * [[VUID-VkShaderModuleCreateInfo-pCode-01379]]
    If pname:pCode points to GLSL code, it must: be valid GLSL code written
    to the `GL_KHR_vulkan_glsl` GLSL extension specification
endif::VK_NV_glsl_shader[]
  * [[VUID-VkShaderModuleCreateInfo-pCode-01089]]
    pname:pCode must: declare the code:Shader capability for SPIR-V code
  * [[VUID-VkShaderModuleCreateInfo-pCode-01090]]
    pname:pCode must: not declare any capability that is not supported by
    the API, as described by the <<spirvenv-module-validation,
    Capabilities>> section of the <<spirvenv-capabilities,SPIR-V
    Environment>> appendix
  * [[VUID-VkShaderModuleCreateInfo-pCode-01091]]
    If pname:pCode declares any of the capabilities listed as optional: in
    the <<spirvenv-capabilities-table,SPIR-V Environment>> appendix, the
    corresponding feature(s) must: be enabled.
****

include::{generated}/validity/structs/VkShaderModuleCreateInfo.txt[]
--

[open,refpage='VkShaderModuleCreateFlags',desc='Reserved for future use',type='flags']
--
include::{generated}/api/flags/VkShaderModuleCreateFlags.txt[]

tname:VkShaderModuleCreateFlags is a bitmask type for setting a mask, but is
currently reserved for future use.
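
As a non-normative illustration, a few of the valid usage rules listed for
sname:VkShaderModuleCreateInfo can be pre-checked on the host before the
structure is passed to flink:vkCreateShaderModule.
The C sketch below assumes SPIR-V input only (no `VK_NV_glsl_shader`) and
words that are already in host endianness.
The helper name `precheck_spirv_blob` is hypothetical and is not part of the
Vulkan API.
It only tests that pname:codeSize is non-zero and a multiple of 4, and that
the first word is the SPIR-V magic number; it does not validate the module.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* SPIR-V magic number: the first 32-bit word of every SPIR-V module. */
#define SPIRV_MAGIC 0x07230203u

/* Hypothetical helper, not a Vulkan entry point.
 * pCode and codeSize mirror the VkShaderModuleCreateInfo members.
 * Returns 1 if the cheap structural checks pass, 0 otherwise. */
static int precheck_spirv_blob(const uint32_t *pCode, size_t codeSize)
{
    assert(pCode != NULL);          /* precondition of this sketch */

    if (codeSize == 0)
        return 0;                   /* codeSize must be greater than 0 */
    if (codeSize % 4 != 0)
        return 0;                   /* codeSize must be a multiple of 4 */
    if (pCode[0] != SPIRV_MAGIC)
        return 0;                   /* not SPIR-V in host byte order */
    return 1;
}
```

A blob that passes this check can still be invalid SPIR-V.
The remaining rules (declared capabilities, module validation rules) need a
real SPIR-V validator or the validation layers.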
--

ifdef::VK_EXT_validation_cache[]
include::VK_EXT_validation_cache/shader-module-validation-cache.txt[]
endif::VK_EXT_validation_cache[]


[open,refpage='vkDestroyShaderModule',desc='Destroy a shader module',type='protos']
--

To destroy a shader module, call:

include::{generated}/api/protos/vkDestroyShaderModule.txt[]

  * pname:device is the logical device that destroys the shader module.
  * pname:shaderModule is the handle of the shader module to destroy.
  * pname:pAllocator controls host memory allocation as described in the
    <<memory-allocation, Memory Allocation>> chapter.

A shader module can: be destroyed while pipelines created using its shaders
are still in use.

.Valid Usage
****
  * [[VUID-vkDestroyShaderModule-shaderModule-01092]]
    If sname:VkAllocationCallbacks were provided when pname:shaderModule was
    created, a compatible set of callbacks must: be provided here
  * [[VUID-vkDestroyShaderModule-shaderModule-01093]]
    If no sname:VkAllocationCallbacks were provided when pname:shaderModule
    was created, pname:pAllocator must: be `NULL`
****

include::{generated}/validity/protos/vkDestroyShaderModule.txt[]
--


[[shaders-execution]]
== Shader Execution

At each stage of the pipeline, multiple invocations of a shader may: execute
simultaneously.
Further, invocations of a single shader produced as the result of different
commands may: execute simultaneously.
The relative execution order of invocations of the same shader type is
undefined:.
Shader invocations may: complete in a different order than that in which the
primitives they originated from were drawn or dispatched by the application.
However, fragment shader outputs are written to attachments in
<<primrast-order,rasterization order>>.

The relative execution order of invocations of different shader types is
largely undefined:.
However, when invoking a shader whose inputs are generated from a previous
pipeline stage, the shader invocations from the previous stage are
guaranteed to have executed far enough to generate input values for all
required inputs.


[[shaders-execution-memory-ordering]]
== Shader Memory Access Ordering

The order in which image or buffer memory is read or written by shaders is
largely undefined:.
For some shader types (vertex, tessellation evaluation, and in some cases,
fragment), even the number of shader invocations that may: perform loads and
stores is undefined:.

In particular, the following rules apply:

  * <<shaders-vertex-execution,Vertex>> and
    <<shaders-tessellation-evaluation-execution,tessellation evaluation>>
    shaders will be invoked at least once for each unique vertex, as defined
    in those sections.
  * <<shaders-fragment-execution,Fragment>> shaders will be invoked zero or
    more times, as defined in that section.
  * The relative execution order of invocations of the same shader type is
    undefined:.
    A store issued by a shader when working on primitive B might complete
    prior to a store for primitive A, even if primitive A is specified prior
    to primitive B. This applies even to fragment shaders; while fragment
    shader outputs are always written to the framebuffer in
    <<primrast-order, rasterization order>>, stores executed by fragment
    shader invocations are not.
  * The relative execution order of invocations of different shader types is
    largely undefined:.

[NOTE]
.Note
====
The above limitations on shader invocation order make some forms of
synchronization between shader invocations within a single set of primitives
unimplementable.
For example, having one invocation poll memory written by another invocation
assumes that the other invocation has been launched and will complete its
writes in finite time.
====

ifdef::VK_KHR_vulkan_memory_model[]

The <<memory-model,Memory Model>> appendix defines the terminology and rules
for how to correctly communicate between shader invocations, such as when a
write is <<memory-model-visible-to,Visible-To>> a read, and what constitutes
a <<memory-model-access-data-race,Data Race>>.

Applications must: not cause a data race.

endif::VK_KHR_vulkan_memory_model[]

ifndef::VK_KHR_vulkan_memory_model[]

Stores issued to different memory locations within a single shader
invocation may: not be visible to other invocations, or may: not become
visible in the order they were performed.

The code:OpMemoryBarrier instruction can: be used to provide stronger
ordering of reads and writes performed by a single invocation.
code:OpMemoryBarrier guarantees that any memory transactions issued by the
shader invocation prior to the instruction complete prior to the memory
transactions issued after the instruction.
Memory barriers are needed for algorithms that require multiple invocations
to access the same memory and require the operations to be performed in a
partially-defined relative order.
For example, if one shader invocation does a series of writes, followed by
an code:OpMemoryBarrier instruction, followed by another write, then the
results of the series of writes before the barrier become visible to other
shader invocations at a time earlier or equal to when the results of the
final write become visible to those invocations.
In practice it means that another invocation that sees the results of the
final write would also see the previous writes.
Without the memory barrier, the final write may: be visible before the
previous writes.

Writes that are the result of shader stores through a variable decorated
with code:Coherent automatically have available writes to the same buffer,
buffer view, or image view made visible to them, and are themselves
automatically made available to access by the same buffer, buffer view, or
image view.
Reads that are the result of shader loads through a variable decorated with
code:Coherent automatically have available writes to the same buffer, buffer
view, or image view made visible to them.
The order that coherent writes to different locations become available is
undefined:, unless enforced by a memory barrier instruction or other memory
dependency.

[NOTE]
.Note
====
Explicit memory dependencies must: still be used to guarantee availability
and visibility for access via other buffers, buffer views, or image views.
====

The built-in atomic memory transaction instructions can: be used to read and
write a given memory address atomically.
While built-in atomic functions issued by multiple shader invocations are
executed in undefined: order relative to each other, these functions perform
both a read and a write of a memory address and guarantee that no other
memory transaction will write to the underlying memory between the read and
write.
Atomic operations ensure automatic availability and visibility for writes
and reads in the same way as those to code:Coherent variables.

[NOTE]
.Note
====
Memory accesses performed on different resource descriptors with the same
memory backing may: not be well-defined even with the code:Coherent
decoration or via atomics, due to things such as image layouts or ownership
of the resource - as described in the <<synchronization, Synchronization and
Cache Control>> chapter.
====

[NOTE]
.Note
====
Atomics allow shaders to use shared global addresses for mutual exclusion or
as counters, among other uses.
====

endif::VK_KHR_vulkan_memory_model[]

[[shaders-inputs]]
== Shader Inputs and Outputs

Data is passed into and out of shaders using variables with input or output
storage class, respectively.
User-defined inputs and outputs are connected between stages by matching
their code:Location decorations.
Additionally, data can: be provided by or communicated to special functions
provided by the execution environment using code:BuiltIn decorations.

In many cases, the same code:BuiltIn decoration can: be used in multiple
shader stages with similar meaning.
The specific behavior of variables decorated as code:BuiltIn is documented
in the following sections.

ifdef::VK_NV_mesh_shader[]
[[shaders-task]]
== Task Shaders

Task shaders operate in conjunction with the mesh shaders to produce a
collection of primitives that will be processed by subsequent stages of the
graphics pipeline.
Their primary purpose is to create a variable number of subsequent mesh
shader invocations.

Task shaders are invoked via the execution of the
<<drawing-mesh-shading,programmable mesh shading>> pipeline.

The task shader has no fixed-function inputs other than variables
identifying the specific workgroup and invocation.
The only fixed output of the task shader is a task count, identifying the
number of mesh shader workgroups to create.
The task shader can write additional outputs to task memory, which can be
read by all of the mesh shader workgroups it created.

=== Task Shader Execution

Task workloads are formed from groups of work items called workgroups and
processed by the task shader in the current graphics pipeline.
A workgroup is a collection of shader invocations that execute the same
shader, potentially in parallel.
Task shaders execute in _global workgroups_ which are divided into a number
of _local workgroups_ with a size that can: be set by assigning a value to
the code:LocalSize execution mode or via an object decorated by the
code:WorkgroupSize decoration.
An invocation within a local workgroup can: share data with other members of
the local workgroup through shared variables and issue memory and control
flow barriers to synchronize with other members of the local workgroup.

[[shaders-mesh]]
== Mesh Shaders

Mesh shaders operate in workgroups to produce a collection of primitives
that will be processed by subsequent stages of the graphics pipeline.
Each workgroup emits zero or more output primitives and the group of
vertices and their associated data required for each output primitive.

Mesh shaders are invoked via the execution of the
<<drawing-mesh-shading,programmable mesh shading>> pipeline.

The only inputs available to the mesh shader are variables identifying the
specific workgroup and invocation and, if applicable, any outputs written to
task memory by the task shader that spawned the mesh shader's workgroup.
The mesh shader can operate without a task shader as well.

The invocations of the mesh shader workgroup write an output mesh,
comprising a set of primitives with per-primitive attributes, a set of
vertices with per-vertex attributes, and an array of indices identifying the
mesh vertices that belong to each primitive.
The primitives of this mesh are then processed by subsequent graphics
pipeline stages, where the outputs of the mesh shader form an interface with
the fragment shader.

=== Mesh Shader Execution

Mesh workloads are formed from groups of work items called workgroups and
processed by the mesh shader in the current graphics pipeline.
A workgroup is a collection of shader invocations that execute the same
shader, potentially in parallel.
Mesh shaders execute in _global workgroups_ which are divided into a number
of _local workgroups_ with a size that can: be set by assigning a value to
the code:LocalSize execution mode or via an object decorated by the
code:WorkgroupSize decoration.
An invocation within a local workgroup can: share data with other members of
the local workgroup through shared variables and issue memory and control
flow barriers to synchronize with other members of the local workgroup.

The _global workgroups_ may be generated explicitly via the API, or
implicitly through the task shader's work creation mechanism.
endif::VK_NV_mesh_shader[]

[[shaders-vertex]]
== Vertex Shaders

Each vertex shader invocation operates on one vertex and its associated
<<fxvertex-attrib,vertex attribute>> data, and outputs one vertex and
associated data.
ifndef::VK_NV_mesh_shader[]
Graphics pipelines must: include a vertex shader, and the vertex shader
stage is always the first shader stage in the graphics pipeline.
endif::VK_NV_mesh_shader[]
ifdef::VK_NV_mesh_shader[]
Graphics pipelines using primitive shading must: include a vertex shader,
and the vertex shader stage is always the first shader stage in the graphics
pipeline.
endif::VK_NV_mesh_shader[]

[[shaders-vertex-execution]]
=== Vertex Shader Execution

A vertex shader must: be executed at least once for each vertex specified by
a draw command.
ifdef::VK_VERSION_1_1,VK_KHR_multiview[]
If the subpass includes multiple views in its view mask, the shader may: be
invoked separately for each view.
endif::VK_VERSION_1_1,VK_KHR_multiview[]
During execution, the shader is presented with the index of the vertex and
instance for which it has been invoked.
Input variables declared in the vertex shader are filled by the
implementation with the values of vertex attributes associated with the
invocation being executed.

If the same vertex is specified multiple times in a draw command (e.g. by
including the same index value multiple times in an index buffer) the
implementation may: reuse the results of vertex shading if it can statically
determine that the vertex shader invocations will produce identical results.

[NOTE]
.Note
====
It is implementation-dependent when and if results of vertex shading are
reused, and thus how many times the vertex shader will be executed.
This is true also if the vertex shader contains stores or atomic operations
(see <<features-vertexPipelineStoresAndAtomics,
pname:vertexPipelineStoresAndAtomics>>).
====


[[shaders-tessellation-control]]
== Tessellation Control Shaders

The tessellation control shader is used to read an input patch provided by
the application and to produce an output patch.
Each tessellation control shader invocation operates on an input patch
(after all control points in the patch are processed by a vertex shader) and
its associated data, and outputs a single control point of the output patch
and its associated data, and can: also output additional per-patch data.
The input patch is sized according to the pname:patchControlPoints member of
slink:VkPipelineTessellationStateCreateInfo, as part of input assembly.
The size of the output patch is controlled by the code:OpExecutionMode
code:OutputVertices specified in the tessellation control or tessellation
evaluation shaders, which must: be specified in at least one of the shaders.
The size of the input and output patches must: each be greater than zero and
less than or equal to
sname:VkPhysicalDeviceLimits::pname:maxTessellationPatchSize.


[[shaders-tessellation-control-execution]]
=== Tessellation Control Shader Execution

A tessellation control shader is invoked at least once for each _output_
vertex in a patch.
ifdef::VK_VERSION_1_1,VK_KHR_multiview[]
If the subpass includes multiple views in its view mask, the shader may: be
invoked separately for each view.
endif::VK_VERSION_1_1,VK_KHR_multiview[]

Inputs to the tessellation control shader are generated by the vertex
shader.
Each invocation of the tessellation control shader can: read the attributes
of any incoming vertices and their associated data.
The invocations corresponding to a given patch execute logically in
parallel, with undefined: relative execution order.
However, the code:OpControlBarrier instruction can: be used to provide
limited control of the execution order by synchronizing invocations within a
patch, effectively dividing tessellation control shader execution into a set
of phases.
Tessellation control shaders will read undefined: values if one invocation
reads a per-vertex or per-patch attribute written by another invocation at
any point during the same phase, or if two invocations attempt to write
different values to the same per-patch output in a single phase.


[[shaders-tessellation-evaluation]]
== Tessellation Evaluation Shaders

The tessellation evaluation shader operates on an input patch of control
points and their associated data, and a single input barycentric coordinate
indicating the invocation's relative position within the subdivided patch,
and outputs a single vertex and its associated data.


[[shaders-tessellation-evaluation-execution]]
=== Tessellation Evaluation Shader Execution

A tessellation evaluation shader is invoked at least once for each unique
vertex generated by the tessellator.
ifdef::VK_VERSION_1_1,VK_KHR_multiview[]
If the subpass includes multiple views in its view mask, the shader may: be
invoked separately for each view.
endif::VK_VERSION_1_1,VK_KHR_multiview[]


[[shaders-geometry]]
== Geometry Shaders

The geometry shader operates on a group of vertices and their associated
data assembled from a single input primitive, and emits zero or more output
primitives and the group of vertices and their associated data required for
each output primitive.


[[shaders-geometry-execution]]
=== Geometry Shader Execution

A geometry shader is invoked at least once for each primitive produced by
the tessellation stages, or at least once for each primitive generated by
<<drawing,primitive assembly>> when tessellation is not in use.
A shader can request that the geometry shader runs multiple
<<geometry-invocations, instances>>.
A geometry shader is invoked at least once for each instance.
ifdef::VK_VERSION_1_1,VK_KHR_multiview[]
If the subpass includes multiple views in its view mask, the shader may: be
invoked separately for each view.
endif::VK_VERSION_1_1,VK_KHR_multiview[]


[[shaders-fragment]]
== Fragment Shaders

Fragment shaders are invoked as the result of rasterization in a graphics
pipeline.
Each fragment shader invocation operates on a single fragment and its
associated data.
With few exceptions, fragment shaders do not have access to any data
associated with other fragments and are considered to execute in isolation
of fragment shader invocations associated with other fragments.


[[shaders-fragment-execution]]
=== Fragment Shader Execution

For each fragment generated by rasterization, a fragment shader may: be
invoked.
A fragment shader must: not be invoked if the <<fragops-early,Early
Per-Fragment Tests>> cause it to have no coverage.
ifdef::VK_VERSION_1_1,VK_KHR_multiview[]
If the subpass includes multiple views in its view mask, the shader may: be
invoked separately for each view.
endif::VK_VERSION_1_1,VK_KHR_multiview[]

Furthermore, if it is determined that a fragment generated as the result of
rasterizing a first primitive will have its outputs entirely overwritten by
a fragment generated as the result of rasterizing a second primitive in the
same subpass, and the fragment shader used for the fragment has no other
side effects, then the fragment shader may: not be executed for the fragment
from the first primitive.

Relative ordering of execution of different fragment shader invocations is
not defined.

For each fragment generated by a primitive, the number of times the fragment
shader is invoked is implementation-dependent, but must: obey the following
constraints:

  * Each covered sample is included in a single fragment shader invocation.
  * When sample shading is not enabled, there is at least one fragment
    shader invocation.
  * When sample shading is enabled, the minimum number of fragment shader
    invocations is as defined in
ifdef::VK_NV_shading_rate_image[]
    <<primsrast-shading-rate-image,Shading Rate Image>> and
endif::VK_NV_shading_rate_image[]
    <<primsrast-sampleshading,Sample Shading>>.

When there is more than one fragment shader invocation per fragment, the
association of samples to invocations is implementation-dependent.
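
As a non-normative illustration, the lower bound described by the
constraints above can be sketched in C.
The sketch assumes a fully covered fragment and assumes that the bound for
the sample shading case is max(ceil(pname:minSampleShading {times}
pname:rasterizationSamples), 1), as given in the
<<primsrast-sampleshading,Sample Shading>> section; it ignores shading rate
images and fragment density maps.
The function name is hypothetical, and its parameters mirror members of
slink:VkPipelineMultisampleStateCreateInfo.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: lower bound on fragment shader invocations for one
 * fully covered fragment. An implementation may run more invocations. */
static uint32_t min_fragment_shader_invocations(int sampleShadingEnable,
                                                float minSampleShading,
                                                uint32_t rasterizationSamples)
{
    assert(rasterizationSamples > 0);
    assert(minSampleShading >= 0.0f && minSampleShading <= 1.0f);

    /* Without sample shading there is at least one invocation. */
    if (!sampleShadingEnable)
        return 1u;

    /* ceil(minSampleShading * rasterizationSamples), without <math.h>. */
    float scaled = minSampleShading * (float)rasterizationSamples;
    uint32_t count = (uint32_t)scaled;      /* truncates toward zero */
    if ((float)count < scaled)
        count += 1u;

    /* The bound is never below one invocation. */
    return count > 0u ? count : 1u;
}
```

For example, with four rasterization samples and a pname:minSampleShading of
0.5, the sketch yields a lower bound of two invocations.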

In addition to the conditions outlined above for the invocation of a
fragment shader, a fragment shader invocation may: be produced as a _helper
invocation_.
A helper invocation is a fragment shader invocation that is created solely
for the purposes of evaluating derivatives for use in non-helper fragment
shader invocations.
Stores and atomics performed by helper invocations must: not have any effect
on memory, and values returned by atomic instructions in helper invocations
are undefined:.

ifdef::VK_EXT_fragment_density_map[]
If the render pass has a fragment density map attachment, more than one
fragment shader invocation may: be invoked for each covered sample.
Stores and atomics performed by these additional invocations have the normal
effect.
Such additional invocations are only produced if
sname:VkPhysicalDeviceFragmentDensityMapPropertiesEXT::pname:fragmentDensityInvocations
is ename:VK_TRUE.

[NOTE]
.Note
====
Implementations may: generate these additional fragment shader invocations
in order to make transitions between fragment areas with different fragment
densities smoother.
====
endif::VK_EXT_fragment_density_map[]

[[shaders-fragment-earlytest]]
=== Early Fragment Tests

An explicit control is provided to allow fragment shaders to enable early
fragment tests.
If the fragment shader specifies the code:EarlyFragmentTests
code:OpExecutionMode, the per-fragment tests described in
<<fragops-early-mode,Early Fragment Test Mode>> are performed prior to
fragment shader execution.
Otherwise, they are performed after fragment shader execution.

ifdef::VK_EXT_post_depth_coverage[]
[[shaders-fragment-earlytest-postdepthcoverage]]
If the fragment shader additionally specifies the code:PostDepthCoverage
code:OpExecutionMode, the value of a variable decorated with the
<<interfaces-builtin-variables-samplemask,code:SampleMask>> built-in
reflects the coverage after the early fragment tests.
Otherwise, it reflects the coverage before the early fragment tests.
endif::VK_EXT_post_depth_coverage[]

ifdef::VK_EXT_fragment_shader_interlock[]

[[shaders-fragment-shader-interlock]]
=== Fragment Shader Interlock

In normal operation, it is possible for more than one fragment shader
invocation to be executed simultaneously for the same pixel if there are
overlapping primitives.
If the <<features-features-fragmentShaderSampleInterlock,
fragmentShaderSampleInterlock>>,
<<features-features-fragmentShaderPixelInterlock,
fragmentShaderPixelInterlock>>, or
<<features-features-fragmentShaderShadingRateInterlock,
fragmentShaderShadingRateInterlock>> features are enabled, it is possible to
define a critical section within the fragment shader that is guaranteed to
not run simultaneously with another fragment shader invocation for the same
sample(s) or pixel(s).
It is also possible to control the relative ordering of execution of these
critical sections across different fragment shader invocations.

If the <<spirvenv-capabilities-table-fragmentShaderInterlock,
code:FragmentShaderSampleInterlockEXT, code:FragmentShaderPixelInterlockEXT,
or code:FragmentShaderShadingRateInterlockEXT>> capabilities are declared in
the fragment shader, the code:OpBeginInvocationInterlockEXT and
code:OpEndInvocationInterlockEXT instructions must: be used to delimit a
critical section of fragment shader code.

To ensure each invocation of the critical section is executed in
<<drawing-primitive-order, primitive order>>, declare one of the
code:PixelInterlockOrderedEXT, code:SampleInterlockOrderedEXT, or
code:ShadingRateInterlockOrderedEXT execution modes.
If the order of execution of each invocation of the critical section does
not matter, declare one of the code:PixelInterlockUnorderedEXT,
code:SampleInterlockUnorderedEXT, or code:ShadingRateInterlockUnorderedEXT
execution modes.

The code:PixelInterlockOrderedEXT and code:PixelInterlockUnorderedEXT
execution modes provide mutual exclusion in the critical section for any
pair of fragments corresponding to the same pixel, or pixels if the fragment
covers more than one pixel.
With sample shading enabled, these execution modes are treated like
code:SampleInterlockOrderedEXT or code:SampleInterlockUnorderedEXT
respectively.

The code:SampleInterlockOrderedEXT and code:SampleInterlockUnorderedEXT
execution modes only provide mutual exclusion for pairs of fragments that
both cover at least one common sample in the same pixel; these are
recommended for performance if shaders use per-sample data structures.
If these execution modes are used in single-sample mode they are treated
like code:PixelInterlockOrderedEXT or code:PixelInterlockUnorderedEXT
respectively.

ifdef::VK_NV_shading_rate_image[]
The code:ShadingRateInterlockOrderedEXT and
code:ShadingRateInterlockUnorderedEXT execution modes provide mutual
exclusion for pairs of fragments that both have at least one common sample
in the same pixel, even if none of the common samples are covered by both
fragments.
With sample shading enabled, these execution modes are treated like
code:SampleInterlockOrderedEXT or code:SampleInterlockUnorderedEXT
respectively.
724 endif::VK_NV_shading_rate_image[] 725 ifndef::VK_NV_shading_rate_image[] 726 The code:ShadingRateInterlockOrderedEXT and 727 code:ShadingRateInterlockUnorderedEXT execution modes are not supported. 728 endif::VK_NV_shading_rate_image[] 729 730 endif::VK_EXT_fragment_shader_interlock[] 731 732 [[shaders-compute]] 733 == Compute Shaders 734 735 Compute shaders are invoked via flink:vkCmdDispatch and 736 flink:vkCmdDispatchIndirect commands. 737 In general, they have access to similar resources as shader stages executing 738 as part of a graphics pipeline. 739 740 Compute workloads are formed from groups of work items called workgroups and 741 processed by the compute shader in the current compute pipeline. 742 A workgroup is a collection of shader invocations that execute the same 743 shader, potentially in parallel. 744 Compute shaders execute in _global workgroups_ which are divided into a 745 number of _local workgroups_ with a size that can: be set by assigning a 746 value to the code:LocalSize execution mode or via an object decorated by the 747 code:WorkgroupSize decoration. 748 An invocation within a local workgroup can: share data with other members of 749 the local workgroup through shared variables and issue memory and control 750 flow barriers to synchronize with other members of the local workgroup. 751 752 753 [[shaders-interpolation-decorations]] 754 == Interpolation Decorations 755 756 Interpolation decorations control the behavior of attribute interpolation in 757 the fragment shader stage. 758 Interpolation decorations can: be applied to code:Input storage class 759 variables in the fragment shader stage's interface, and control the 760 interpolation behavior of those variables. 
761 762 Inputs that could be interpolated can: be decorated by at most one of the 763 following decorations: 764 765 * code:Flat: no interpolation 766 * code:NoPerspective: linear interpolation (for 767 <<line_linear_interpolation,lines>> and 768 <<triangle_linear_interpolation,polygons>>) 769 ifdef::VK_NV_fragment_shader_barycentric[] 770 * code:PerVertexNV: values fetched from shader-specified primitive vertex 771 endif::VK_NV_fragment_shader_barycentric[] 772 773 Fragment input variables decorated with neither code:Flat nor 774 code:NoPerspective use perspective-correct interpolation (for 775 <<line_perspective_interpolation,lines>> and 776 <<triangle_perspective_interpolation,polygons>>). 777 778 The presence and type of interpolation are controlled by the above 779 interpolation decorations as well as the auxiliary decorations code:Centroid 780 and code:Sample. 781 782 A variable decorated with code:Flat will not be interpolated. 783 Instead, it will have the same value for every fragment within a triangle. 784 This value will come from a single <<vertexpostproc-flatshading,provoking 785 vertex>>. 786 A variable decorated with code:Flat can: also be decorated with 787 code:Centroid or code:Sample, which will mean the same thing as decorating 788 it only as code:Flat. 789 790 For fragment shader input variables decorated with neither code:Centroid nor 791 code:Sample, the assigned variable may: be interpolated anywhere within the 792 fragment and a single value may: be assigned to each sample within the 793 fragment. 794 795 If a fragment shader input is decorated with code:Centroid, a single value 796 may: be assigned to that variable for all samples in the fragment, but that 797 value must: be interpolated to a location that lies in both the fragment and 798 in the primitive being rendered, including any of the fragment's samples 799 covered by the primitive.
800 Because the location at which the variable is interpolated may: be different 801 in neighboring fragments, and derivatives may: be computed by computing 802 differences between neighboring fragments, derivatives of centroid-sampled 803 inputs may: be less accurate than those for non-centroid interpolated 804 variables. 805 ifdef::VK_NV_shading_rate_image[] 806 If 807 slink:VkPipelineViewportShadingRateImageStateCreateInfoNV::pname:shadingRateImageEnable 808 is enabled, implementations may: estimate derivatives using differencing 809 without dividing by the distance between adjacent sample locations when the 810 fragment size is larger than one pixel. 811 endif::VK_NV_shading_rate_image[] 812 ifdef::VK_EXT_post_depth_coverage[] 813 The <<shaders-fragment-earlytest-postdepthcoverage,code:PostDepthCoverage>> 814 execution mode does not affect the determination of the centroid location. 815 endif::VK_EXT_post_depth_coverage[] 816 817 If a fragment shader input is decorated with code:Sample, a separate value 818 must: be assigned to that variable for each covered sample in the fragment, 819 and that value must: be sampled at the location of the individual sample. 820 When pname:rasterizationSamples is ename:VK_SAMPLE_COUNT_1_BIT, the fragment 821 center must: be used for code:Centroid, code:Sample, and undecorated 822 attribute interpolation. 823 824 Fragment shader inputs that are signed or unsigned integers, integer 825 vectors, or any double-precision floating-point type must: be decorated with 826 code:Flat. 827 828 ifdef::VK_AMD_shader_explicit_vertex_parameter[] 829 When the `<<VK_AMD_shader_explicit_vertex_parameter>>` device extension is 830 enabled inputs can: be also decorated with the code:CustomInterpAMD 831 interpolation decoration, including fragment shader inputs that are signed 832 or unsigned integers, integer vectors, or any double-precision 833 floating-point type. 
834 Inputs decorated with code:CustomInterpAMD can: only be accessed by the 835 extended instruction code:InterpolateAtVertexAMD and allows accessing the 836 value of the input for individual vertices of the primitive. 837 endif::VK_AMD_shader_explicit_vertex_parameter[] 838 839 ifdef::VK_NV_fragment_shader_barycentric[] 840 [[shaders-interpolation-decorations-pervertexnv]] 841 When the pname:fragmentShaderBarycentric feature is enabled, inputs can: be 842 also decorated with the code:PerVertexNV interpolation decoration, including 843 fragment shader inputs that are signed or unsigned integers, integer 844 vectors, or any double-precision floating-point type. 845 Inputs decorated with code:PerVertexNV can: only be accessed using an extra 846 array dimension, where the extra index identifies one of the vertices of the 847 primitive that produced the fragment. 848 endif::VK_NV_fragment_shader_barycentric[] 849 850 ifdef::VK_NV_ray_tracing[] 851 include::VK_NV_ray_tracing/raytracing-shaders.txt[] 852 endif::VK_NV_ray_tracing[] 853 854 [[shaders-staticuse]] 855 == Static Use 856 857 A SPIR-V module declares a global object in memory using the code:OpVariable 858 instruction, which results in a pointer code:x to that object. 859 A specific entry point in a SPIR-V module is said to _statically use_ that 860 object if that entry point's call tree contains a function that contains a 861 memory instruction or image instruction with code:x as an code:id operand. 862 See the "`Memory Instructions`" and "`Image Instructions`" subsections of 863 section 3 "`Binary Form`" of the SPIR-V specification for the complete list 864 of SPIR-V memory instructions. 865 866 Static use is not used to control the behavior of variables with code:Input 867 and code:Output storage. 868 The effects of those variables are applied based only on whether they are 869 present in a shader entry point's interface. 
870 871 [[shaders-invocationgroups]] 872 == Invocation and Derivative Groups 873 874 An _invocation group_ (see the subsection "`Control Flow`" of section 2 of 875 the SPIR-V specification) for a compute shader is the set of invocations in 876 a single local workgroup. 877 For graphics shaders, an invocation group is an implementation-dependent 878 subset of the set of shader invocations of a given shader stage which are 879 produced by a single drawing command. 880 For indirect drawing commands with pname:drawCount greater than one, 881 invocations from separate draws are in distinct invocation groups. 882 883 [NOTE] 884 .Note 885 ==== 886 Because the partitioning of invocations into invocation groups is 887 implementation-dependent and not observable, applications generally need to 888 assume the worst case of all invocations in a draw belonging to a single 889 invocation group. 890 ==== 891 892 A _derivative group_ (see the subsection "`Control Flow`" of section 2 of 893 the SPIR-V 1.00 Revision 4 specification) is a set of invocations which are 894 used together to compute a derivative. 895 ifdef::VK_VERSION_1_1[] 896 For a fragment shader, a derivative group is generated by a single primitive 897 (point, line, or triangle) and includes any helper invocations needed to 898 compute derivatives. 899 If the pname:subgroupSize field of slink:VkPhysicalDeviceSubgroupProperties 900 is at least 4, a derivative group for a fragment shader corresponds to a 901 single subgroup quad. 902 Otherwise, a derivative group is the set of invocations generated by a 903 single primitive. 904 endif::VK_VERSION_1_1[] 905 ifndef::VK_VERSION_1_1[] 906 For a fragment shader, a derivative group is the set of invocations 907 generated by a single primitive. 908 endif::VK_VERSION_1_1[] 909 ifdef::VK_NV_compute_shader_derivatives[] 910 A derivative group for a compute shader is a single local workgroup. 
911 endif::VK_NV_compute_shader_derivatives[] 912 913 Derivative values are undefined: for a sampled image instruction if the 914 instruction is in flow control that is not uniform across the derivative 915 group. 916 917 ifdef::VK_VERSION_1_1[] 918 [[shaders-subgroup]] 919 == Subgroups 920 921 A _subgroup_ (see the subsection "`Control Flow`" of section 2 of the SPIR-V 922 1.3 Revision 1 specification) is a set of invocations that can synchronize 923 and share data with each other efficiently. 924 An invocation group is partitioned into one or more subgroups. 925 926 Subgroup operations are divided into various categories as described in 927 elink:VkSubgroupFeatureFlagBits. 928 929 [[shaders-subgroup-basic]] 930 === Basic Subgroup Operations 931 932 The basic subgroup operations allow two classes of functionality within 933 shaders: 934 elect and barrier. 935 Invocations within a subgroup can: choose a single invocation to perform 936 some task for the subgroup as a whole using elect. 937 Invocations within a subgroup can: perform a subgroup barrier to ensure the 938 ordering of execution or memory accesses within a subgroup. 939 Barriers can: be performed on buffer memory accesses, code:WorkgroupLocal 940 memory accesses, and image memory accesses to ensure that any results 941 written are visible to other invocations within the subgroup. 942 An code:OpControlBarrier can: also be used to perform a full execution 943 control barrier. 944 A full execution control barrier will ensure that each active invocation 945 within the subgroup reaches a point of execution before any are allowed to 946 continue. 947 948 [[shaders-subgroup-vote]] 949 === Vote Subgroup Operations 950 951 The vote subgroup operations allow invocations within a subgroup to compare 952 values across a subgroup. 953 The types of votes enabled are: 954 955 * Do all active subgroup invocations agree that an expression is true? 956 * Do any active subgroup invocations evaluate an expression to true?
957 * Do all active subgroup invocations have the same value of an expression? 958 959 [NOTE] 960 .Note 961 ==== 962 These operations are useful in combination with control flow in that they 963 allow for developers to check whether conditions match across the subgroup 964 and choose potentially faster code-paths in these cases. 965 ==== 966 967 [[shaders-subgroup-arithmetic]] 968 === Arithmetic Subgroup Operations 969 970 The arithmetic subgroup operations allow invocations to perform scan and 971 reduction operations across a subgroup. 972 For reduction operations, each invocation in a subgroup will obtain the same 973 result of these arithmetic operations applied across the subgroup. 974 For scan operations, each invocation in the subgroup will perform an 975 inclusive or exclusive scan, cumulatively applying the operation across the 976 invocations in a subgroup in an implementation-defined order. 977 The operations supported are add, mul, min, max, and, or, xor. 978 979 [[shaders-subgroup-ballot]] 980 === Ballot Subgroup Operations 981 982 The ballot subgroup operations allow invocations to perform more complex 983 votes across the subgroup. 984 The ballot functionality allows all invocations within a subgroup to provide 985 a boolean value and get as a result what each invocation provided as their 986 boolean value. 987 The broadcast functionality allows values to be broadcast from an invocation 988 to all other invocations within the subgroup, given that the invocation to 989 be broadcast from is known at pipeline creation time. 990 991 [[shaders-subgroup-shuffle]] 992 === Shuffle Subgroup Operations 993 994 The shuffle subgroup operations allow invocations to read values from other 995 invocations within a subgroup. 
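
As a rough host-side illustration of the shuffle operation just described, the
sketch below models a subgroup as a plain C array with one slot per invocation.
The names `SUBGROUP_SIZE` and `model_shuffle`, and the fixed size of 8, are
assumptions made for this example only; they are not part of Vulkan or SPIR-V,
and a real shuffle executes inside a shader rather than in host code.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical subgroup size, chosen for illustration only. */
#define SUBGROUP_SIZE 8

/*
 * Host-side model of a subgroup shuffle: invocation i receives the value
 * held by the invocation whose index is source[i].
 * An out-of-range source index yields 0 in this model; that is a choice
 * made for the sketch, not a statement about shader behavior.
 */
static void model_shuffle(const uint32_t value[SUBGROUP_SIZE],
                          const uint32_t source[SUBGROUP_SIZE],
                          uint32_t result[SUBGROUP_SIZE])
{
    for (size_t i = 0; i < SUBGROUP_SIZE; ++i) {
        result[i] = (source[i] < SUBGROUP_SIZE) ? value[source[i]] : 0u;
    }
}
```

With `source[i]` set to `SUBGROUP_SIZE - 1 - i`, for instance, the model
reverses the values held across the subgroup.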
996 997 [[shaders-subgroup-shuffle-relative]] 998 === Shuffle Relative Subgroup Operations 999 1000 The shuffle relative subgroup operations allow invocations to read values 1001 from other invocations within the subgroup relative to the current 1002 invocation in the group. 1003 The relative operations supported allow data to be shifted up and down 1004 through the invocations within a subgroup. 1005 1006 [[shaders-subgroup-clustered]] 1007 === Clustered Subgroup Operations 1008 1009 The clustered subgroup operations allow invocations to perform an operation 1010 among partitions of a subgroup, such that the operation is only performed 1011 within the subgroup invocations within a partition. 1012 The partitions for clustered subgroup operations are consecutive 1013 power-of-two size groups of invocations and the cluster size must: be known 1014 at pipeline creation time. 1015 The operations supported are add, mul, min, max, and, or, xor. 1016 1017 [[shaders-subgroup-quad]] 1018 === Quad Subgroup Operations 1019 1020 The quad subgroup operations allow clusters of 4 invocations (a quad), to 1021 share data efficiently with each other. 1022 ifdef::VK_VERSION_1_1[] 1023 For fragment shaders, if the pname:subgroupSize field of 1024 slink:VkPhysicalDeviceSubgroupProperties is at least 4, each quad 1025 corresponds to one of the groups of four shader invocations used for 1026 <<texture-derivatives,derivatives>>. 1027 endif::VK_VERSION_1_1[] 1028 ifdef::VK_NV_compute_shader_derivatives[] 1029 For compute shaders using the code:DerivativeGroupQuadsNV or 1030 code:DerivativeGroupLinearNV execution modes, each quad corresponds to one 1031 of the groups of four shader invocations used for 1032 <<texture-derivatives-compute,derivatives>>. 1033 The invocations in each quad are ordered to have attribute values of 1034 P~i0,j0~, P~i1,j0~, P~i0,j1~, and P~i1,j1~, respectively. 
1035 endif::VK_NV_compute_shader_derivatives[] 1036 1037 ifdef::VK_NV_shader_subgroup_partitioned[] 1038 1039 [[shaders-subgroup-partitioned]] 1040 === Partitioned Subgroup Operations 1041 1042 The partitioned subgroup operations allow a subgroup to partition its 1043 invocations into disjoint subsets and to perform scan and reduce operations 1044 among invocations belonging to the same subset. 1045 The partitions for partitioned subgroup operations are specified by a ballot 1046 operation and can: be computed at runtime. 1047 The operations supported are add, mul, min, max, and, or, xor. 1048 1049 endif::VK_NV_shader_subgroup_partitioned[] 1050 1051 endif::VK_VERSION_1_1[] 1052 1053 ifdef::VK_NV_cooperative_matrix[] 1054 == Cooperative Matrices 1055 1056 A _cooperative matrix_ type is a SPIR-V type where the storage for and 1057 computations performed on the matrix are spread across a set of invocations 1058 such as a subgroup. 1059 These types give the implementation freedom in how to optimize matrix 1060 multiplies. 1061 1062 SPIR-V defines the types and instructions, but does not specify rules about 1063 what sizes/combinations are valid, and it is expected that different 1064 implementations may: support different sizes. 1065 1066 [open,refpage='vkGetPhysicalDeviceCooperativeMatrixPropertiesNV',desc='Returns properties describing what cooperative matrix types are supported',type='protos'] 1067 -- 1068 1069 To enumerate the supported cooperative matrix types and operations, call: 1070 1071 include::{generated}/api/protos/vkGetPhysicalDeviceCooperativeMatrixPropertiesNV.txt[] 1072 1073 * pname:physicalDevice is the physical device. 1074 * pname:pPropertyCount is a pointer to an integer related to the number of 1075 cooperative matrix properties available or queried. 1076 * pname:pProperties is either `NULL` or a pointer to an array of 1077 slink:VkCooperativeMatrixPropertiesNV structures. 
1078 1079 If pname:pProperties is `NULL`, then the number of cooperative matrix 1080 properties available is returned in pname:pPropertyCount. 1081 Otherwise, pname:pPropertyCount must: point to a variable set by the user to 1082 the number of elements in the pname:pProperties array, and on return the 1083 variable is overwritten with the number of structures actually written to 1084 pname:pProperties. 1085 If pname:pPropertyCount is less than the number of cooperative matrix 1086 properties available, at most pname:pPropertyCount structures will be 1087 written. 1088 If pname:pPropertyCount is smaller than the number of cooperative matrix 1089 properties available, ename:VK_INCOMPLETE will be returned instead of 1090 ename:VK_SUCCESS, to indicate that not all the available cooperative matrix 1091 properties were returned. 1092 1093 include::{generated}/validity/protos/vkGetPhysicalDeviceCooperativeMatrixPropertiesNV.txt[] 1094 -- 1095 1096 [open,refpage='VkCooperativeMatrixPropertiesNV',desc='Structure specifying cooperative matrix properties',type='structs'] 1097 -- 1098 1099 Each sname:VkCooperativeMatrixPropertiesNV structure describes a single 1100 supported combination of types for a matrix multiply/add operation 1101 (code:OpCooperativeMatrixMulAddNV). 1102 The multiply can: be described in terms of the following variables and types 1103 (in SPIR-V pseudocode): 1104 1105 [source,c] 1106 --------------------------------------------------- 1107 %A is of type OpTypeCooperativeMatrixNV %AType %scope %MSize %KSize 1108 %B is of type OpTypeCooperativeMatrixNV %BType %scope %KSize %NSize 1109 %C is of type OpTypeCooperativeMatrixNV %CType %scope %MSize %NSize 1110 %D is of type OpTypeCooperativeMatrixNV %DType %scope %MSize %NSize 1111 1112 %D = %A * %B + %C // using OpCooperativeMatrixMulAddNV 1113 --------------------------------------------------- 1114 1115 A matrix multiply with these dimensions is known as an _MxNxK_ matrix 1116 multiply. 
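
To make the dimensions concrete, here is a minimal scalar model of the same
computation in plain C.
It is only a sketch: the function name `model_mul_add`, the use of `float` for
all four component types, and the row-major storage are assumptions made for
illustration.
An actual cooperative matrix operation is spread across the invocations in the
matrix scope and uses whichever component types the implementation reports.

```c
#include <stddef.h>

/*
 * Scalar reference model of D = A * B + C.
 * A is m x k, B is k x n, C and D are m x n, all stored row-major.
 */
static void model_mul_add(size_t m, size_t n, size_t k,
                          const float *a, const float *b,
                          const float *c, float *d)
{
    for (size_t row = 0; row < m; ++row) {
        for (size_t col = 0; col < n; ++col) {
            float acc = c[row * n + col];      /* start from C */
            for (size_t i = 0; i < k; ++i) {
                acc += a[row * k + i] * b[i * n + col];
            }
            d[row * n + col] = acc;
        }
    }
}
```

Here `m`, `n`, and `k` play the roles of pname:MSize, pname:NSize, and
pname:KSize respectively.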
1117 1118 The sname:VkCooperativeMatrixPropertiesNV structure is defined as: 1119 1120 include::{generated}/api/structs/VkCooperativeMatrixPropertiesNV.txt[] 1121 1122 * pname:sType is the type of this structure. 1123 * pname:pNext is `NULL` or a pointer to an extension-specific structure. 1124 * pname:MSize is the number of rows in matrices A, C, and D. 1125 * pname:KSize is the number of columns in matrix A and rows in matrix B. 1126 * pname:NSize is the number of columns in matrices B, C, and D. 1127 * pname:AType is the component type of matrix A, of type 1128 elink:VkComponentTypeNV. 1129 * pname:BType is the component type of matrix B, of type 1130 elink:VkComponentTypeNV. 1131 * pname:CType is the component type of matrix C, of type 1132 elink:VkComponentTypeNV. 1133 * pname:DType is the component type of matrix D, of type 1134 elink:VkComponentTypeNV. 1135 * pname:scope is the scope of all the matrix types, of type 1136 elink:VkScopeNV. 1137 1138 If some types are preferred over other types (e.g. for performance), they 1139 should: appear earlier in the list enumerated by 1140 flink:vkGetPhysicalDeviceCooperativeMatrixPropertiesNV. 1141 1142 At least one entry in the list must: have power-of-two values for all of 1143 pname:MSize, pname:KSize, and pname:NSize. 1144 1145 include::{generated}/validity/structs/VkCooperativeMatrixPropertiesNV.txt[] 1146 -- 1147 1148 [open,refpage='VkScopeNV',desc='Specify SPIR-V scope',type='enums'] 1149 -- 1150 1151 Possible values for elink:VkScopeNV include: 1152 1153 include::{generated}/api/enums/VkScopeNV.txt[] 1154 1155 * ename:VK_SCOPE_DEVICE_NV corresponds to SPIR-V code:Device scope. 1156 * ename:VK_SCOPE_WORKGROUP_NV corresponds to SPIR-V code:Workgroup scope. 1157 * ename:VK_SCOPE_SUBGROUP_NV corresponds to SPIR-V code:Subgroup scope. 1158 * ename:VK_SCOPE_QUEUE_FAMILY_NV corresponds to SPIR-V code:QueueFamilyKHR 1159 scope. 1160 1161 All enum values match the corresponding SPIR-V value.
1162 -- 1163 1164 [open,refpage='VkComponentTypeNV',desc='Specify SPIR-V cooperative matrix component type',type='enums'] 1165 -- 1166 1167 Possible values for elink:VkComponentTypeNV include: 1168 1169 include::{generated}/api/enums/VkComponentTypeNV.txt[] 1170 1171 * ename:VK_COMPONENT_TYPE_FLOAT16_NV corresponds to SPIR-V 1172 code:OpTypeFloat 16. 1173 * ename:VK_COMPONENT_TYPE_FLOAT32_NV corresponds to SPIR-V 1174 code:OpTypeFloat 32. 1175 * ename:VK_COMPONENT_TYPE_FLOAT64_NV corresponds to SPIR-V 1176 code:OpTypeFloat 64. 1177 * ename:VK_COMPONENT_TYPE_SINT8_NV corresponds to SPIR-V code:OpTypeInt 8 1178 1. 1179 * ename:VK_COMPONENT_TYPE_SINT16_NV corresponds to SPIR-V code:OpTypeInt 1180 16 1. 1181 * ename:VK_COMPONENT_TYPE_SINT32_NV corresponds to SPIR-V code:OpTypeInt 1182 32 1. 1183 * ename:VK_COMPONENT_TYPE_SINT64_NV corresponds to SPIR-V code:OpTypeInt 1184 64 1. 1185 * ename:VK_COMPONENT_TYPE_UINT8_NV corresponds to SPIR-V code:OpTypeInt 8 1186 0. 1187 * ename:VK_COMPONENT_TYPE_UINT16_NV corresponds to SPIR-V code:OpTypeInt 1188 16 0. 1189 * ename:VK_COMPONENT_TYPE_UINT32_NV corresponds to SPIR-V code:OpTypeInt 1190 32 0. 1191 * ename:VK_COMPONENT_TYPE_UINT64_NV corresponds to SPIR-V code:OpTypeInt 1192 64 0. 1193 -- 1194 1195 endif::VK_NV_cooperative_matrix[] 1196 1197 ifdef::VK_EXT_validation_cache[] 1198 [[shaders-validation-cache]] 1199 == Validation Cache 1200 1201 [open,refpage='VkValidationCacheEXT',desc='Opaque handle to a validation cache object',type='handles'] 1202 -- 1203 1204 Validation cache objects allow the result of internal validation to be 1205 reused, both within a single application run and between multiple runs. 1206 Reuse within a single run is achieved by passing the same validation cache 1207 object when creating supported Vulkan objects. 
1208 Reuse across runs of an application is achieved by retrieving validation 1209 cache contents in one run of an application, saving the contents, and using 1210 them to preinitialize a validation cache on a subsequent run. 1211 The contents of the validation cache objects are managed by the validation 1212 layers. 1213 Applications can: manage the host memory consumed by a validation cache 1214 object and control the amount of data retrieved from a validation cache 1215 object. 1216 1217 Validation cache objects are represented by sname:VkValidationCacheEXT 1218 handles: 1219 1220 include::{generated}/api/handles/VkValidationCacheEXT.txt[] 1221 1222 -- 1223 1224 [open,refpage='vkCreateValidationCacheEXT',desc='Creates a new validation cache',type='protos'] 1225 -- 1226 1227 To create validation cache objects, call: 1228 1229 include::{generated}/api/protos/vkCreateValidationCacheEXT.txt[] 1230 1231 * pname:device is the logical device that creates the validation cache 1232 object. 1233 * pname:pCreateInfo is a pointer to a slink:VkValidationCacheCreateInfoEXT 1234 structure that contains the initial parameters for the validation cache 1235 object. 1236 * pname:pAllocator controls host memory allocation as described in the 1237 <<memory-allocation, Memory Allocation>> chapter. 1238 * pname:pValidationCache is a pointer to a slink:VkValidationCacheEXT 1239 handle in which the resulting validation cache object is returned. 1240 1241 [NOTE] 1242 .Note 1243 ==== 1244 Applications can: track and manage the total host memory size of a 1245 validation cache object using the pname:pAllocator. 1246 Applications can: limit the amount of data retrieved from a validation cache 1247 object in fname:vkGetValidationCacheDataEXT. 1248 Implementations should: not internally limit the total number of entries 1249 added to a validation cache object or the total host memory consumed. 
1250 ==== 1251 1252 Once created, a validation cache can: be passed to the 1253 fname:vkCreateShaderModule command as part of the 1254 sname:VkShaderModuleCreateInfo pname:pNext chain. 1255 If a sname:VkShaderModuleValidationCacheCreateInfoEXT object is part of the 1256 sname:VkShaderModuleCreateInfo::pname:pNext chain, and its 1257 pname:validationCache field is not dlink:VK_NULL_HANDLE, the implementation 1258 will query it for possible reuse opportunities and update it with new 1259 content. 1260 The use of the validation cache object in these commands is internally 1261 synchronized, and the same validation cache object can: be used in multiple 1262 threads simultaneously. 1263 1264 [NOTE] 1265 .Note 1266 ==== 1267 Implementations should: make every effort to limit any critical sections to 1268 the actual accesses to the cache, which is expected to be significantly 1269 shorter than the duration of the fname:vkCreateShaderModule command. 1270 ==== 1271 1272 include::{generated}/validity/protos/vkCreateValidationCacheEXT.txt[] 1273 -- 1274 1275 [open,refpage='VkValidationCacheCreateInfoEXT',desc='Structure specifying parameters of a newly created validation cache',type='structs'] 1276 -- 1277 1278 The sname:VkValidationCacheCreateInfoEXT structure is defined as: 1279 1280 include::{generated}/api/structs/VkValidationCacheCreateInfoEXT.txt[] 1281 1282 * pname:sType is the type of this structure. 1283 * pname:pNext is `NULL` or a pointer to an extension-specific structure. 1284 * pname:flags is reserved for future use. 1285 * pname:initialDataSize is the number of bytes in pname:pInitialData. 1286 If pname:initialDataSize is zero, the validation cache will initially be 1287 empty. 1288 * pname:pInitialData is a pointer to previously retrieved validation cache 1289 data. 1290 If the validation cache data is incompatible (as defined below) with the 1291 device, the validation cache will be initially empty. 
1292 If pname:initialDataSize is zero, pname:pInitialData is ignored. 1293 1294 .Valid Usage 1295 **** 1296 * [[VUID-VkValidationCacheCreateInfoEXT-initialDataSize-01534]] 1297 If pname:initialDataSize is not `0`, it must: be equal to the size of 1298 pname:pInitialData, as returned by fname:vkGetValidationCacheDataEXT 1299 when pname:pInitialData was originally retrieved 1300 * [[VUID-VkValidationCacheCreateInfoEXT-initialDataSize-01535]] 1301 If pname:initialDataSize is not `0`, pname:pInitialData must: have been 1302 retrieved from a previous call to fname:vkGetValidationCacheDataEXT 1303 **** 1304 1305 include::{generated}/validity/structs/VkValidationCacheCreateInfoEXT.txt[] 1306 -- 1307 1308 [open,refpage='VkValidationCacheCreateFlagsEXT',desc='Reserved for future use',type='flags'] 1309 -- 1310 include::{generated}/api/flags/VkValidationCacheCreateFlagsEXT.txt[] 1311 1312 tname:VkValidationCacheCreateFlagsEXT is a bitmask type for setting a mask, 1313 but is currently reserved for future use. 1314 -- 1315 1316 [open,refpage='vkMergeValidationCachesEXT',desc='Combine the data stores of validation caches',type='protos'] 1317 -- 1318 1319 Validation cache objects can: be merged using the command: 1320 1321 include::{generated}/api/protos/vkMergeValidationCachesEXT.txt[] 1322 1323 * pname:device is the logical device that owns the validation cache 1324 objects. 1325 * pname:dstCache is the handle of the validation cache to merge results 1326 into. 1327 * pname:srcCacheCount is the length of the pname:pSrcCaches array. 1328 * pname:pSrcCaches is an array of validation cache handles, which will be 1329 merged into pname:dstCache. 1330 The previous contents of pname:dstCache are included after the merge. 1331 1332 [NOTE] 1333 .Note 1334 ==== 1335 The details of the merge operation are implementation dependent, but 1336 implementations should: merge the contents of the specified validation 1337 caches and prune duplicate entries. 
1338 ==== 1339 1340 .Valid Usage 1341 **** 1342 * [[VUID-vkMergeValidationCachesEXT-dstCache-01536]] 1343 pname:dstCache must: not appear in the list of source caches 1344 **** 1345 1346 include::{generated}/validity/protos/vkMergeValidationCachesEXT.txt[] 1347 -- 1348 1349 [open,refpage='vkGetValidationCacheDataEXT',desc='Get the data store from a validation cache',type='protos'] 1350 -- 1351 1352 Data can: be retrieved from a validation cache object using the command: 1353 1354 include::{generated}/api/protos/vkGetValidationCacheDataEXT.txt[] 1355 1356 * pname:device is the logical device that owns the validation cache. 1357 * pname:validationCache is the validation cache to retrieve data from. 1358 * pname:pDataSize is a pointer to a value related to the amount of data in 1359 the validation cache, as described below. 1360 * pname:pData is either `NULL` or a pointer to a buffer. 1361 1362 If pname:pData is `NULL`, then the maximum size of the data that can: be 1363 retrieved from the validation cache, in bytes, is returned in 1364 pname:pDataSize. 1365 Otherwise, pname:pDataSize must: point to a variable set by the user to the 1366 size of the buffer, in bytes, pointed to by pname:pData, and on return the 1367 variable is overwritten with the amount of data actually written to 1368 pname:pData. 1369 1370 If pname:pDataSize is less than the maximum size that can: be retrieved by 1371 the validation cache, at most pname:pDataSize bytes will be written to 1372 pname:pData, and fname:vkGetValidationCacheDataEXT will return 1373 ename:VK_INCOMPLETE. 1374 Any data written to pname:pData is valid and can: be provided as the 1375 pname:pInitialData member of the sname:VkValidationCacheCreateInfoEXT 1376 structure passed to fname:vkCreateValidationCacheEXT. 1377 1378 Two calls to fname:vkGetValidationCacheDataEXT with the same parameters 1379 must: retrieve the same data unless a command that modifies the contents of 1380 the cache is called between them. 
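
The size-query-then-fetch pattern described above can be sketched without a
Vulkan device by using a stand-in.
In the sketch below, `MockResult`, `mock_get_cache_data`, and `fetch_all` are
hypothetical names; the mock only imitates, in simplified form, the size and
data contract documented for fname:vkGetValidationCacheDataEXT over a fixed
blob, and it is not the real entry point.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical result codes standing in for VK_SUCCESS / VK_INCOMPLETE. */
typedef enum { MOCK_SUCCESS = 0, MOCK_INCOMPLETE = 1 } MockResult;

/* Pretend cache contents; a real cache is opaque to the application. */
static const unsigned char mock_blob[] = { 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 };

/* Stand-in imitating the documented size/data contract (simplified). */
static MockResult mock_get_cache_data(size_t *pDataSize, void *pData)
{
    const size_t available = sizeof(mock_blob);
    if (pData == NULL) {
        *pDataSize = available;              /* size query */
        return MOCK_SUCCESS;
    }
    const size_t count = (*pDataSize < available) ? *pDataSize : available;
    memcpy(pData, mock_blob, count);
    *pDataSize = count;                      /* bytes actually written */
    return (count < available) ? MOCK_INCOMPLETE : MOCK_SUCCESS;
}

/*
 * Two-call idiom: query the size, allocate, then fetch.
 * Returns a malloc'd buffer that the caller frees, or NULL on failure.
 */
static void *fetch_all(size_t *outSize)
{
    size_t size = 0;
    if (mock_get_cache_data(&size, NULL) != MOCK_SUCCESS || size == 0) {
        return NULL;
    }
    void *buffer = malloc(size);
    if (buffer == NULL) {
        return NULL;
    }
    if (mock_get_cache_data(&size, buffer) != MOCK_SUCCESS) {
        free(buffer);                        /* data no longer fits */
        return NULL;
    }
    *outSize = size;
    return buffer;
}
```

Treating an incomplete result on the second call as a failure is one possible
policy; an application could equally retry with a fresh size query.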
1381 1382 [[validation-cache-header]] 1383 Applications can: store the data retrieved from the validation cache, and 1384 use these data, possibly in a future run of the application, to populate new 1385 validation cache objects. 1386 The results of validation, however, may: depend on the vendor ID, device ID, 1387 driver version, and other details of the device. 1388 To enable applications to detect when previously retrieved data is 1389 incompatible with the device, the initial bytes written to pname:pData must: 1390 be a header consisting of the following members: 1391 1392 .Layout for validation cache header version ename:VK_VALIDATION_CACHE_HEADER_VERSION_ONE_EXT 1393 [width="85%",cols="8%,21%,71%",options="header"] 1394 |==== 1395 | Offset | Size | Meaning 1396 | 0 | 4 | length in bytes of the entire validation cache header 1397 written as a stream of bytes, with the least 1398 significant byte first 1399 | 4 | 4 | a elink:VkValidationCacheHeaderVersionEXT value 1400 written as a stream of bytes, with the least 1401 significant byte first 1402 | 8 | ename:VK_UUID_SIZE | a layer commit ID expressed as a UUID, which uniquely 1403 identifies the version of the validation layers used 1404 to generate these validation results 1405 |==== 1406 1407 The first four bytes encode the length of the entire validation cache 1408 header, in bytes. 1409 This value includes all fields in the header including the validation cache 1410 version field and the size of the length field. 1411 1412 The next four bytes encode the validation cache version, as described for 1413 elink:VkValidationCacheHeaderVersionEXT. 1414 A consumer of the validation cache should: use the cache version to 1415 interpret the remainder of the cache header. 1416 1417 If pname:pDataSize is less than what is necessary to store this header, 1418 nothing will be written to pname:pData and zero will be written to 1419 pname:pDataSize. 
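
As an illustration of the layout above, the following sketch decodes the header
from a retrieved blob.
The names `CacheHeader`, `read_le32`, and `parse_cache_header` are hypothetical,
and `UUID_SIZE` is hard-coded to 16 on the assumption that it matches
ename:VK_UUID_SIZE.
The byte-by-byte reads follow the least-significant-byte-first encoding
regardless of host endianness.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define UUID_SIZE 16u  /* assumed equal to VK_UUID_SIZE */

typedef struct {
    uint32_t headerLength;          /* offset 0: length of the whole header */
    uint32_t headerVersion;         /* offset 4: header version value */
    uint8_t  layerUuid[UUID_SIZE];  /* offset 8: layer commit ID */
} CacheHeader;

/* Reads a 32-bit value stored least significant byte first. */
static uint32_t read_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Returns 1 on success, 0 if the blob is too short or inconsistent. */
static int parse_cache_header(const uint8_t *data, size_t size,
                              CacheHeader *out)
{
    const size_t minimum = 8u + UUID_SIZE;
    if (data == NULL || out == NULL || size < minimum) {
        return 0;
    }
    out->headerLength = read_le32(data);
    out->headerVersion = read_le32(data + 4);
    if (out->headerLength < minimum || out->headerLength > size) {
        return 0;
    }
    memcpy(out->layerUuid, data + 8, UUID_SIZE);
    return 1;
}
```

An application might compare the decoded version and UUID against the values it
expects before supplying a stored blob as pname:pInitialData.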
1420 1421 include::{generated}/validity/protos/vkGetValidationCacheDataEXT.txt[] 1422 -- 1423 1424 [open,refpage='VkValidationCacheHeaderVersionEXT',desc='Encode validation cache version',type='enums',xrefs='vkCreateValidationCacheEXT vkGetValidationCacheDataEXT'] 1425 -- 1426 Possible values of the second group of four bytes in the header returned by 1427 flink:vkGetValidationCacheDataEXT, encoding the validation cache version, 1428 are: 1429 1430 include::{generated}/api/enums/VkValidationCacheHeaderVersionEXT.txt[] 1431 1432 * ename:VK_VALIDATION_CACHE_HEADER_VERSION_ONE_EXT specifies version one 1433 of the validation cache. 1434 -- 1435 1436 [open,refpage='vkDestroyValidationCacheEXT',desc='Destroy a validation cache object',type='protos'] 1437 -- 1438 1439 To destroy a validation cache, call: 1440 1441 include::{generated}/api/protos/vkDestroyValidationCacheEXT.txt[] 1442 1443 * pname:device is the logical device that destroys the validation cache 1444 object. 1445 * pname:validationCache is the handle of the validation cache to destroy. 1446 * pname:pAllocator controls host memory allocation as described in the 1447 <<memory-allocation, Memory Allocation>> chapter. 1448 1449 .Valid Usage 1450 **** 1451 * [[VUID-vkDestroyValidationCacheEXT-validationCache-01537]] 1452 If sname:VkAllocationCallbacks were provided when pname:validationCache 1453 was created, a compatible set of callbacks must: be provided here 1454 * [[VUID-vkDestroyValidationCacheEXT-validationCache-01538]] 1455 If no sname:VkAllocationCallbacks were provided when 1456 pname:validationCache was created, pname:pAllocator must: be `NULL` 1457 **** 1458 1459 include::{generated}/validity/protos/vkDestroyValidationCacheEXT.txt[] 1460 -- 1461 endif::VK_EXT_validation_cache[]