  1  include::meta/VK_NV_shader_image_footprint.txt[]
  2  
  3  *Last Modified Date*::
  4      2018-09-13
  5  *IP Status*::
  6      No known IP claims.
  7  *Contributors*::
  8    - Pat Brown, NVIDIA
  9    - Chris Lentini, NVIDIA
 10    - Daniel Koch, NVIDIA
 11    - Jeff Bolz, NVIDIA
 12  
 13  This extension adds Vulkan support for the `SPV_NV_shader_image_footprint`
 14  SPIR-V extension.
 15  That SPIR-V extension provides a new instruction
 16  code:OpImageSampleFootprintNV allowing shaders to determine the set of
 17  texels that would be accessed by an equivalent filtered texture lookup.
 18  
 19  Instead of returning a filtered texture value, the instruction returns a
 20  structure that can be interpreted by shader code to determine the footprint
 21  of a filtered texture lookup.
 22  This structure includes integer values that identify a small neighborhood of
 23  texels in the image being accessed and a bitfield that indicates which
 24  texels in that neighborhood would be used.
 25  The structure also includes a bitfield where each bit identifies whether any
 26  texel in a small aligned block of texels would be fetched by the texture
 27  lookup.
 28  The size of each block is specified by an access _granularity_ provided by
 29  the shader.
The minimum granularity supported by this extension is 2x2 (for 2D textures)
or 2x2x2 (for 3D textures); the maximum granularity is 256x256 (for 2D
textures) or 64x32x32 (for 3D textures).
 33  Each footprint query returns the footprint from a single texture level.
 34  When using minification filters that combine accesses from multiple mipmap
 35  levels, shaders must perform separate queries for the two levels accessed
 36  ("`fine`" and "`coarse`").
 37  The footprint query also returns a flag indicating if the texture lookup
 38  would access texels from only one mipmap level or from two neighboring
 39  levels.
 40  
 41  This extension should be useful for multi-pass rendering operations that do
 42  an initial expensive rendering pass to produce a first image that is then
 43  used as a texture for a second pass.
 44  If the second pass ends up accessing only portions of the first image (e.g.,
due to visibility), the work spent rendering the non-accessed portion of the
 46  first image was wasted.
 47  With this feature, an application can limit this waste using an initial pass
 48  over the geometry in the second image that performs a footprint query for
 49  each visible pixel to determine the set of pixels that it needs from the
 50  first image.
 51  This pass would accumulate an aggregate footprint of all visible pixels into
 52  a separate "`footprint image`" using shader atomics.
 53  Then, when rendering the first image, the application can kill all shading
 54  work for pixels not in this aggregate footprint.
 55  
 56  This extension has a number of limitations.
The code:OpImageSampleFootprintNV instruction only supports two- and
three-dimensional textures.
 59  Footprint evaluation only supports the CLAMP_TO_EDGE wrap mode; results are
 60  undefined for all other wrap modes.
Only a limited set of granularity values is supported, and that set does not
allow separate coverage information for each texel in the original image.
 63  
 64  When using SPIR-V generated from the OpenGL Shading Language, the new
 65  instruction will be generated from code using the new
 66  code:textureFootprint*NV built-in functions from the
 67  `GL_NV_shader_texture_footprint` shading language extension.
 68  
 69  === New Object Types
 70  
 71  None.
 72  
 73  === New Enum Constants
 74  
 75    * Extending elink:VkStructureType:
 76    ** ename:VK_STRUCTURE_TYPE_PHYSICAL_DEVICE_SHADER_IMAGE_FOOTPRINT_FEATURES_NV
 77  
 78  === New Enums
 79  
 80  None.
 81  
 82  === New Structures
 83  
 84    * slink:VkPhysicalDeviceShaderImageFootprintFeaturesNV
 85  
 86  === New Functions
 87  
 88  None.
 89  
 90  === New SPIR-V Capability
 91  
 92    * <<spirvenv-capabilities-table-imagefootprint,ImageFootprintNV>>
 93  
 94  === Issues
 95  
 96  (1) The footprint returned by the SPIR-V instruction is a structure that
    includes an anchor, an offset, and a mask that represents an 8x8 or 4x4x4
 98      neighborhood of texel groups.
 99      But the bits of the mask are not stored in simple pitch order.
100      Why is the footprint built this way?
101  
102  *RESOLVED*: We expect that applications using this feature will want to use
103  a fixed granularity and accumulate coverage information from the returned
104  footprints into an aggregate "`footprint image`" that tracks the portions of
105  an image that would be needed by regular texture filtering.
106  If an application is using a two-dimensional image with 4x4 pixel
107  granularity, we expect that the footprint image will use 64-bit texels where
108  each bit in an 8x8 array of bits corresponds to coverage for a 4x4 block in
109  the original image.
110  Texel (0,0) in the footprint image would correspond to texels (0,0) through
111  (31,31) in the original image.
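
The correspondence between original texels and footprint bits can be sketched on the CPU as follows.
This is only an illustration: the names `FootprintTexelRef` and `coverageBitFor` are hypothetical, the granularity is fixed at 4x4, and the row-major bit order (bit = row * 8 + column) is an assumption chosen to match the bit numbering in the diagram later in this issue.

```cpp
#include <cstdint>

// Hypothetical helper type; not part of the extension.
struct FootprintTexelRef {
    uint32_t texelX;  // x coordinate of the 64-bit texel in the footprint image
    uint32_t texelY;  // y coordinate of the 64-bit texel in the footprint image
    uint32_t bit;     // bit index (0..63) inside that texel
};

// Maps texel (x, y) of the original image to its coverage bit, assuming
// 4x4 granularity and a row-major 8x8 bit layout in each footprint texel.
inline FootprintTexelRef coverageBitFor(uint32_t x, uint32_t y) {
    const uint32_t blockX = x / 4;  // 4x4 block column in the original image
    const uint32_t blockY = y / 4;  // 4x4 block row in the original image
    FootprintTexelRef ref;
    ref.texelX = blockX / 8;        // 8 block columns per footprint texel
    ref.texelY = blockY / 8;        // 8 block rows per footprint texel
    ref.bit    = (blockY % 8) * 8 + (blockX % 8);
    return ref;
}
```

Under these assumptions, every texel from (0,0) through (31,31) maps to footprint texel (0,0), and texel (32,0) is the first one that maps to footprint texel (1,0).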
112  
In the usual case, the footprint for a single access will be fully contained in
114  a 32x32 aligned region of the original texture, which corresponds to a
115  single 64-bit texel in the footprint image.
116  In that case, the implementation will return an anchor coordinate pointing
117  at the single footprint image texel, an offset vector of (0,0), and a mask
118  whose bits are aligned with the bits in the footprint texel.
119  For this case, the shader can simply atomically OR the mask bits into the
120  contents of the footprint texel to accumulate footprint coverage.
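
A minimal CPU-side model of this aligned case is sketched below.
The `FootprintImage` class is hypothetical and merely stands in for a storage image with 64-bit texels; in a shader the same step would be a single 64-bit image atomic OR.

```cpp
#include <atomic>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for the footprint image: one 64-bit word per texel.
class FootprintImage {
public:
    FootprintImage(uint32_t width, uint32_t height)
        : width_(width), texels_(static_cast<std::size_t>(width) * height) {}

    // Aligned case (offset == (0,0)): OR the returned mask into one texel.
    void accumulate(uint32_t x, uint32_t y, uint64_t mask) {
        texels_[index(x, y)].fetch_or(mask, std::memory_order_relaxed);
    }

    uint64_t load(uint32_t x, uint32_t y) const {
        return texels_[index(x, y)].load(std::memory_order_relaxed);
    }

private:
    std::size_t index(uint32_t x, uint32_t y) const {
        return static_cast<std::size_t>(y) * width_ + x;
    }

    uint32_t width_;
    std::vector<std::atomic<uint64_t>> texels_;  // value-initialized to zero
};
```

Because OR is commutative and idempotent, the order in which invocations contribute their masks does not affect the accumulated coverage.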
121  
122  In the worst case, the footprint for a single access spans multiple 32x32
123  aligned regions and may require updates to four separate footprint image
124  texels.
125  In this case, the implementation will return an anchor coordinate pointing
at the lower right footprint image texel and an offset that identifies how
127  many "`columns`" and "`rows`" of the returned 8x8 mask correspond to
128  footprint texels to the left and above the anchor texel.
If the offset is (2,3), the 64 bits of the returned mask are arranged
130  spatially as follows, where each 4x4 block is assigned a bit number that
131  matches its bit number in the footprint image texels:
132  
133  ----
134      +-------------------------+-------------------------+
135      | -- -- -- -- -- -- -- -- | -- -- -- -- -- -- -- -- |
136      | -- -- -- -- -- -- -- -- | -- -- -- -- -- -- -- -- |
137      | -- -- -- -- -- -- -- -- | -- -- -- -- -- -- -- -- |
138      | -- -- -- -- -- -- -- -- | -- -- -- -- -- -- -- -- |
139      | -- -- -- -- -- -- -- -- | -- -- -- -- -- -- -- -- |
140      | -- -- -- -- -- -- 46 47 | 40 41 42 43 44 45 -- -- |
141      | -- -- -- -- -- -- 54 55 | 48 49 50 51 52 53 -- -- |
142      | -- -- -- -- -- -- 62 63 | 56 57 58 59 60 61 -- -- |
143      +-------------------------+-------------------------+
144      | -- -- -- -- -- -- 06 07 | 00 01 02 03 04 05 -- -- |
145      | -- -- -- -- -- -- 14 15 | 08 09 10 11 12 13 -- -- |
146      | -- -- -- -- -- -- 22 23 | 16 17 18 19 20 21 -- -- |
147      | -- -- -- -- -- -- 30 31 | 24 25 26 27 28 29 -- -- |
148      | -- -- -- -- -- -- 38 39 | 32 33 34 35 36 37 -- -- |
149      | -- -- -- -- -- -- -- -- | -- -- -- -- -- -- -- -- |
150      | -- -- -- -- -- -- -- -- | -- -- -- -- -- -- -- -- |
151      | -- -- -- -- -- -- -- -- | -- -- -- -- -- -- -- -- |
152      +-------------------------+-------------------------+
153  ----
154  
155  To accumulate coverage for each of the four footprint image texels, a shader
156  can AND the returned mask with simple masks derived from the x and y offset
157  values and then atomically OR the updated mask bits into the contents of the
158  corresponding footprint texel.
159  
160  [source,c++]
161  ----
    // Combine the two 32-bit halves of the returned mask.
    uint64_t returnedMask = (uint64_t(footprint.mask.x) | (uint64_t(footprint.mask.y) << 32));
    // Bits in columns belonging to the footprint texels on the right.
    uint64_t rightMask    = ((0xFF >> footprint.offset.x) * 0x0101010101010101UL);
    // Bits in rows belonging to the footprint texels on the bottom.
    uint64_t bottomMask   = 0xFFFFFFFFFFFFFFFFUL >> (8 * footprint.offset.y);
    // One mask per footprint image texel touched by this access.
    uint64_t bottomRight  = returnedMask & bottomMask & rightMask;
    uint64_t bottomLeft   = returnedMask & bottomMask & (~rightMask);
    uint64_t topRight     = returnedMask & (~bottomMask) & rightMask;
    uint64_t topLeft      = returnedMask & (~bottomMask) & (~rightMask);
169  ----
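
As a worked check, the sketch below wraps the same mask arithmetic in a function that also names the footprint image texel each partial mask belongs to.
The `FootprintResult` and `TexelUpdate` types and the `splitFootprint` function are hypothetical, and the sketch assumes that "`left`" and "`above`" mean x - 1 and y - 1 relative to the anchor.

```cpp
#include <array>
#include <cstdint>

// Hypothetical CPU-side mirror of the values returned by a footprint query.
struct FootprintResult {
    int32_t  anchorX, anchorY;  // lower right footprint image texel
    uint32_t offsetX, offsetY;  // columns/rows left of / above the anchor (0..7)
    uint64_t mask;              // 8x8 coverage mask
};

// One pending OR into the footprint image.
struct TexelUpdate {
    int32_t  x, y;
    uint64_t mask;
};

// Returns updates in the order topLeft, topRight, bottomLeft, bottomRight.
// Assumes "left" is x - 1 and "above" is y - 1 relative to the anchor.
inline std::array<TexelUpdate, 4> splitFootprint(const FootprintResult& f) {
    const uint64_t rightMask  = (0xFFull >> f.offsetX) * 0x0101010101010101ull;
    const uint64_t bottomMask = ~0ull >> (8u * f.offsetY);
    return {{
        { f.anchorX - 1, f.anchorY - 1, f.mask & ~bottomMask & ~rightMask },
        { f.anchorX,     f.anchorY - 1, f.mask & ~bottomMask &  rightMask },
        { f.anchorX - 1, f.anchorY,     f.mask &  bottomMask & ~rightMask },
        { f.anchorX,     f.anchorY,     f.mask &  bottomMask &  rightMask },
    }};
}
```

For the (2,3) case drawn in the diagram above and a fully set mask, the top left update receives exactly bits 46, 47, 54, 55, 62, and 63.
With an offset of (0,0), all bits land in the anchor texel and the other three masks are zero, so those updates can be skipped.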
170  
171  (2) What should an application do to ensure maximum performance when
172      accumulating footprints into an aggregate footprint image?
173  
174  *RESOLVED*: We expect that the most common usage of this feature will be to
175  accumulate aggregate footprint coverage, as described in the previous issue.
176  Even if you ignore the anisotropic filtering case where the implementation
177  may return a granularity larger than that requested by the caller, each
178  shader invocation will need to use atomic functions to update up to four
179  footprint image texels for each level of detail accessed.
180  Having each active shader invocation perform multiple atomic operations can
181  be expensive, particularly when neighboring invocations will want to update
182  the same footprint image texels.
183  
Techniques that can be used to reduce the number of atomic operations
performed when accumulating coverage include:
186  
187    * Have logic that detects returned footprints where all components of the
188      returned offset vector are zero.
189      In that case, the mask returned by the footprint function is guaranteed
190      to be aligned with the footprint image texels and affects only a single
191      footprint image texel.
192    * Have fragment shaders communicate using built-in functions from the
193      `VK_NV_shader_subgroup_partitioned` extension or other shader subgroup
194      extensions.
195      If you have multiple invocations in a subgroup that need to update the
196      same texel (x,y) in the footprint image, compute an aggregate footprint
197      mask across all invocations in the subgroup updating that texel and have
198      a single invocation perform an atomic operation using that aggregate
199      mask.
200    * When the returned footprint spans multiple texels in the footprint
    image, each invocation needs to perform four atomic operations.
202      In the previous issue, we had an example that computed separate masks
203      for "`topLeft`", "`topRight`", "`bottomLeft`", and "`bottomRight`".
204      When the invocations in a subgroup have good locality, it might be the
    case that the "`top left`" for some invocations might refer to footprint
206      image texel (10,10), while neighbors might have their "`top left`"
207      texels at (11,10), (10,11), and (11,11).
208      If you compute separate masks for even/odd x and y values instead of
209      left/right or top/bottom, the "`odd/odd`" mask for all invocations in
    the subgroup holds coverage for footprint image texel (11,11), which can
211      be updated by a single atomic operation for the entire subgroup.
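
The effect of aggregating updates across a subgroup can be modeled on the CPU as sketched below.
This is not shader code: the `PendingUpdate` type and `mergeSubgroupUpdates` function are hypothetical, and the loop over a vector only stands in for what subgroup operations would do across invocations.

```cpp
#include <cstdint>
#include <map>
#include <utility>
#include <vector>

// Hypothetical record of one invocation's update to one footprint texel.
struct PendingUpdate {
    uint32_t texelX, texelY;
    uint64_t mask;
};

// Merges all updates that target the same footprint image texel, so that
// only one atomic OR per distinct texel is needed for the whole group.
// Updates with an empty mask are dropped entirely.
inline std::vector<PendingUpdate>
mergeSubgroupUpdates(const std::vector<PendingUpdate>& updates) {
    std::map<std::pair<uint32_t, uint32_t>, uint64_t> merged;  // key: (y, x)
    for (const PendingUpdate& u : updates) {
        if (u.mask != 0) {
            merged[std::make_pair(u.texelY, u.texelX)] |= u.mask;
        }
    }
    std::vector<PendingUpdate> result;
    for (const auto& entry : merged) {
        result.push_back({entry.first.second, entry.first.first, entry.second});
    }
    return result;
}
```

If four neighboring invocations each contribute a different bit to footprint image texel (11,11), the merged list contains a single entry for that texel, replacing four atomic operations with one.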
212  
213  === Examples
214  
215  TBD
216  
217  === Version History
218  
219   * Revision 2, 2018-09-13 (Pat Brown)
220     - Add issue (2) with performance tips.
221  
222   * Revision 1, 2018-08-12 (Pat Brown)
223     - Initial draft