This article is an excerpt from the book, "
Figure 6.1 – A meshlet subdivision example
These vertices can make up an arbitrary number of triangles, but we usually tune this value according to the hardware we are running on. In Vulkan, the recommended value is 126 (as written in
Figure 6.2 – A meshlet bounding spheres example; some of the larger spheres have been hidden for clarity
Some of you might ask: why not AABBs? AABBs require at least two vec3 of data: one for the center and one for the half-size vector. Another encoding could be to store the minimum and maximum corners. Instead, spheres can be encoded with a single vec4: a vec3 for the center plus the radius.
Given that we might need to process millions of meshlets, each saved byte counts! Spheres can also be more easily tested for frustum and occlusion culling, as we will describe later in the chapter.
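To make the size argument concrete, here is a minimal sketch of the two encodings together with a conservative sphere-versus-frustum test. The `Vec3`, `BoundsAabb`, `BoundsSphere`, and `Plane` types and both helper functions are hypothetical names introduced for illustration only. The plane convention (unit normals pointing into the frustum) is also an assumption.

```cpp
#include <cstddef>

// Hypothetical types for illustration; not taken from the book's code base.
struct Vec3 { float x, y, z; };

// AABB stored as center + half-size: two vec3, 24 bytes.
struct BoundsAabb {
    Vec3 center;
    Vec3 half_size;
};

// Sphere packed like a single vec4: xyz = center, w = radius, 16 bytes.
struct BoundsSphere {
    float x, y, z, radius;
};

static_assert( sizeof( BoundsAabb ) == 24, "two vec3" );
static_assert( sizeof( BoundsSphere ) == 16, "one vec4" );

// Plane in the form dot( normal, p ) + d = 0. Assumption: the normal has
// unit length and points towards the inside of the frustum.
struct Plane {
    Vec3 normal;
    float d;
};

// Returns true if the sphere lies entirely on the outer side of the plane.
inline bool sphere_outside_plane( const BoundsSphere& s, const Plane& p ) {
    const float signed_distance =
        p.normal.x * s.x + p.normal.y * s.y + p.normal.z * s.z + p.d;
    return signed_distance < -s.radius;
}

// Conservative test: the sphere is rejected only if it is fully outside
// at least one plane, so it may keep a few spheres that are not visible.
inline bool sphere_visible( const BoundsSphere& s, const Plane* planes,
                            std::size_t plane_count ) {
    for ( std::size_t i = 0; i < plane_count; ++i ) {
        if ( sphere_outside_plane( s, planes[ i ] ) ) {
            return false;
        }
    }
    return true;
}
```

Each plane costs one dot product and one comparison per sphere. The equivalent AABB test also has to fold the half-size vector into the comparison, which is part of why spheres are the cheaper choice here.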
The next additional piece of data that we’re going to use is the meshlet cone, as shown in the following screenshot:
https://github.com/JarkkoPFC/meshlete) and we encourage you to try both to find the one that best suits your needs.
After we have loaded the data (vertices and indices) for a given mesh, we are going to generate the list of meshlets. First, we determine the maximum number of meshlets that could be generated for our mesh and allocate memory for the vertices and indices arrays that will describe the meshlets:
// max_vertices and max_triangles are the per-meshlet limits
// defined in the next listing.
const sizet max_meshlets = meshopt_buildMeshletsBound(
    indices_accessor.count, max_vertices, max_triangles );

Array<meshopt_Meshlet> local_meshlets;
local_meshlets.init( temp_allocator, max_meshlets,
    max_meshlets );

// Indices into the original vertex buffer.
Array<u32> meshlet_vertex_indices;
meshlet_vertex_indices.init( temp_allocator, max_meshlets *
    max_vertices, max_meshlets * max_vertices );

// Meshlet-local triangle indices, three per triangle.
Array<u8> meshlet_triangles;
meshlet_triangles.init( temp_allocator, max_meshlets *
    max_triangles * 3, max_meshlets * max_triangles * 3 );
Notice the types for the indices and triangle arrays. We are not modifying the original vertex or index buffer, but only generating a list of indices into the original buffers. Another interesting aspect is that we only need 1 byte to store each triangle index: these indices are local to the meshlet, and a meshlet never contains more than max_vertices vertices. Again, saving memory is very important to keep meshlet processing efficient!
The next step is to generate our meshlets:
const sizet max_vertices = 64;
const sizet max_triangles = 124;
const f32 cone_weight = 0.0f;

sizet meshlet_count = meshopt_buildMeshlets(
    local_meshlets.data,
    meshlet_vertex_indices.data,
    meshlet_triangles.data,
    indices,
    indices_accessor.count,
    vertices,
    position_buffer_accessor.count,
    sizeof( vec3s ),
    max_vertices,
    max_triangles,
    cone_weight );
As mentioned in the preceding step, we need to tell the library the maximum number of vertices and triangles that a meshlet can contain. In our case, we are using the recommended values for the Vulkan API. The other parameters include the original vertex and index buffer, and the arrays we have just created that will contain the data for the meshlets.
Let’s have a better look at the data structure of each meshlet:
struct meshopt_Meshlet
{
    unsigned int vertex_offset;
    unsigned int triangle_offset;
    unsigned int vertex_count;
    unsigned int triangle_count;
};
Each meshlet is described by two offsets and two counts: one pair for the vertex indices and one pair for the triangle indices. Note that these offsets refer to meshlet_vertex_indices and meshlet_triangles, which are populated by the library, not to the original vertex and index buffers of the mesh.
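The double indirection is easier to follow in code. The following is a minimal sketch, assuming the offsets index directly into the two arrays filled by the library. The `Meshlet` struct mirrors meshopt_Meshlet so that the example compiles without the MeshOptimizer headers, and `expand_meshlet` is a hypothetical helper, not part of the library.

```cpp
#include <cstdint>
#include <vector>

// Mirrors the layout of meshopt_Meshlet shown above, redeclared here so
// the sketch compiles without the MeshOptimizer headers.
struct Meshlet {
    unsigned int vertex_offset;
    unsigned int triangle_offset;
    unsigned int vertex_count;
    unsigned int triangle_count;
};

// Hypothetical helper: expands one meshlet back into indices of the
// original vertex buffer, three per triangle.
inline std::vector<std::uint32_t> expand_meshlet(
    const Meshlet& meshlet,
    const std::vector<std::uint32_t>& meshlet_vertex_indices,
    const std::vector<std::uint8_t>& meshlet_triangles ) {

    std::vector<std::uint32_t> indices;
    indices.reserve( meshlet.triangle_count * 3 );

    for ( unsigned int i = 0; i < meshlet.triangle_count * 3; ++i ) {
        // 1-byte index, local to the meshlet: 0 .. vertex_count - 1.
        const std::uint8_t local_index =
            meshlet_triangles[ meshlet.triangle_offset + i ];
        // Look up the matching index into the original vertex buffer.
        indices.push_back(
            meshlet_vertex_indices[ meshlet.vertex_offset + local_index ] );
    }
    return indices;
}
```

A mesh shader performs the same two lookups per vertex, which is why the local indices can stay at 1 byte while the vertex indices remain 32-bit.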
Now that we have the meshlet data, we need to upload it to the GPU. To keep the data size to a minimum, we store the positions at full resolution while we compress the normals to 1 byte for each dimension and UV coordinates to half-float for each dimension. In pseudocode, this is as follows:
meshlet_vertex_data.normal = ( normal + 1.0 ) * 127.0;
meshlet_vertex_data.uv_coords = quantize_half( uv_coords );
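As a minimal sketch of the normal compression, the two helpers below encode one component into a byte and decode it again. Both function names are hypothetical. Clamping the input and rounding to the nearest integer are assumptions on our side, since the pseudocode does not specify either. For the UV coordinates, MeshOptimizer provides a meshopt_quantizeHalf helper that could play the role of quantize_half.

```cpp
#include <cmath>
#include <cstdint>

// Hypothetical helper: maps a normal component in [-1, 1] to a byte in
// [0, 254], following the pseudocode ( normal + 1.0 ) * 127.0.
// Assumption: out-of-range inputs are clamped and the result is rounded
// to the nearest integer.
inline std::uint8_t quantize_normal_component( float n ) {
    const float clamped = std::fmax( -1.0f, std::fmin( 1.0f, n ) );
    return static_cast<std::uint8_t>(
        std::lround( ( clamped + 1.0f ) * 127.0f ) );
}

// Inverse mapping, as it would be applied when the vertex is read back.
inline float dequantize_normal_component( std::uint8_t q ) {
    return static_cast<float>( q ) / 127.0f - 1.0f;
}
```

With this mapping a normal takes 3 bytes instead of 12, at the cost of an error of at most about 1/254 per component.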
The next step is to extract the additional data (bounding sphere and cone) for each meshlet:
for ( u32 m = 0; m < meshlet_count; ++m ) {
    meshopt_Meshlet& local_meshlet = local_meshlets[ m ];

    meshopt_Bounds meshlet_bounds = meshopt_computeMeshletBounds(
        meshlet_vertex_indices.data + local_meshlet.vertex_offset,
        meshlet_triangles.data + local_meshlet.triangle_offset,
        local_meshlet.triangle_count,
        vertices,
        position_buffer_accessor.count,
        sizeof( vec3s ) );
    ...
}
We loop over all the meshlets and call the MeshOptimizer API that computes the bounds for each of them. Let’s look in more detail at the structure of the data that is returned:
struct meshopt_Bounds
{
    float center[3];
    float radius;
    float cone_apex[3];
    float cone_axis[3];
    float cone_cutoff;
    signed char cone_axis_s8[3];
    signed char cone_cutoff_s8;
};
The first four floats represent the bounding sphere. Next, we have the cone definition, which is made up of the cone direction (cone_axis) and the cone angle (cone_cutoff). We are not using the cone_apex value, as it makes the back-face culling computation more expensive; using it, however, can lead to better results.
Once again, notice that quantized values (cone_axis_s8 and cone_cutoff_s8) help us reduce the size of the data required for each meshlet.
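To show how the cone can be used without cone_apex, here is a minimal sketch of a back-face test that relies on the bounding sphere instead. The formula follows the apex-free variant described in the comments of the MeshOptimizer header, as far as we are aware, where cone_cutoff is stored as the cosine of half the cone angle; check it against the version of the library you are using. `Vec3`, `MeshletCullData`, and `cone_back_facing` are hypothetical names.

```cpp
#include <cmath>

// Hypothetical math helpers for illustration.
struct Vec3 { float x, y, z; };

inline Vec3 sub( const Vec3& a, const Vec3& b ) {
    return { a.x - b.x, a.y - b.y, a.z - b.z };
}
inline float dot( const Vec3& a, const Vec3& b ) {
    return a.x * b.x + a.y * b.y + a.z * b.z;
}
inline float length( const Vec3& v ) {
    return std::sqrt( dot( v, v ) );
}

// Subset of meshopt_Bounds needed for the test (cone_apex is not used).
struct MeshletCullData {
    Vec3 center;
    float radius;
    Vec3 cone_axis;
    float cone_cutoff;
};

// Returns true if every triangle of the meshlet faces away from the
// camera, so the whole meshlet can be skipped.
inline bool cone_back_facing( const MeshletCullData& m,
                              const Vec3& camera_position ) {
    const Vec3 to_center = sub( m.center, camera_position );
    // Adding the radius keeps the test conservative: it accounts for the
    // apex lying anywhere inside the bounding sphere.
    return dot( to_center, m.cone_axis ) >=
           m.cone_cutoff * length( to_center ) + m.radius;
}
```

The trade-off matches the one described above: this version needs no extra data beyond the sphere, but it rejects fewer meshlets than a test based on cone_apex.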
Finally, the meshlet data is copied into GPU buffers, where it will be used during the execution of task and mesh shaders.
For each processed mesh, we will also save an offset and count of meshlets to add a coarse culling based on the parent mesh: if the mesh is visible, then its meshlets will be added.
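The coarse culling step can be sketched on the CPU as follows. This is only an illustration of the offset-and-count idea; the `MeshMeshletRange` struct, the `gather_candidate_meshlets` function, and the visibility flags passed in are all hypothetical, and in practice this work would more likely run on the GPU.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical per-mesh record: the range of meshlets that belongs to
// the mesh inside the global meshlet array.
struct MeshMeshletRange {
    std::uint32_t meshlet_offset;
    std::uint32_t meshlet_count;
};

// Collects the global indices of all meshlets whose parent mesh passed
// the coarse visibility test. Meshlets of hidden meshes are never
// considered for the finer per-meshlet culling.
inline std::vector<std::uint32_t> gather_candidate_meshlets(
    const std::vector<MeshMeshletRange>& meshes,
    const std::vector<bool>& mesh_visible ) {

    std::vector<std::uint32_t> meshlet_indices;
    for ( std::size_t i = 0; i < meshes.size(); ++i ) {
        if ( !mesh_visible[ i ] ) {
            continue;
        }
        for ( std::uint32_t m = 0; m < meshes[ i ].meshlet_count; ++m ) {
            meshlet_indices.push_back( meshes[ i ].meshlet_offset + m );
        }
    }
    return meshlet_indices;
}
```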
In this article, we have described what meshlets are and why they are useful to improve the culling of geometry on the GPU.
Conclusion
Meshlets represent a powerful tool for optimizing the rendering of complex geometries. By subdividing meshes into small, efficient chunks and incorporating additional data like bounding spheres and cones, we can achieve finer-grained control over visibility and culling processes. Whether you're leveraging advanced shader technologies or applying these concepts with compute shaders, adopting meshlets can lead to significant performance improvements in your graphics pipeline. With libraries like MeshOptimizer at your disposal, implementing this technique has never been more accessible.
Author Bio
Marco Castorina first became familiar with Vulkan while working as a driver developer at Samsung. Later, he developed a 2D and 3D renderer in Vulkan from scratch for a leading media server company. He recently joined the games graphics performance team at AMD. In his spare time, he keeps up to date with the latest techniques in real-time graphics. He also likes cooking and playing guitar.
Gabriel Sassone is a rendering enthusiast currently working as a principal rendering engineer at The Multiplayer Group. He previously worked for Avalanche Studios, where he first encountered Vulkan and developed the Vulkan layer for the proprietary Apex Engine and its Google Stadia port. Before that, he worked at ReadyAtDawn, Codemasters, FrameStudios, and some other non-gaming tech companies. His spare time is filled with music and rendering, gaming, and outdoor activities.