Hi All,
One of the constraints you have to accommodate when working with Vulkan is that you can't create an RGB vkImage/vsg::Image on the GPU; you can only create R, RG and RGBA vkImages. It's possible that this constraint is relaxed on some hardware, but for the NVidia cards I've been using thus far, it's part of life when using Vulkan.
To avoid problems you have to convert RGB images to RGBA before copying the data to the vkImage on the GPU. This conversion can be done in the image loader, or by the VulkanSceneGraph when it uploads the image data to the staging buffer before finally copying to the vkImage on the GPU. This conveniently hides the complexity of conversion, allowing you to use RGB data on the CPU, but it doesn't hide the cost of conversion - it's a relatively expensive CPU operation due to the memory bandwidth cost of reading and writing every texel.
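To make that cost concrete, here is a minimal sketch of what a CPU-side RGB to RGBA expansion looks like. It assumes tightly packed 32-bit float channels and a fixed alpha of 1.0; expandRgbToRgba is a hypothetical helper written for illustration, not a VSG function, and the VSG's own conversion may well be organised differently.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper, not part of the VSG API: expands tightly packed
// RGB floats into RGBA floats, filling alpha with a fixed value.
std::vector<float> expandRgbToRgba(const std::vector<float>& rgb, float alpha = 1.0f)
{
    const std::size_t texelCount = rgb.size() / 3;
    std::vector<float> rgba(texelCount * 4);

    for (std::size_t i = 0; i < texelCount; ++i)
    {
        // Each texel costs three reads and four writes.
        rgba[i * 4 + 0] = rgb[i * 3 + 0];
        rgba[i * 4 + 1] = rgb[i * 3 + 1];
        rgba[i * 4 + 2] = rgb[i * 3 + 2];
        rgba[i * 4 + 3] = alpha;
    }
    return rgba;
}
```

With float channels that is 12 bytes read and 16 bytes written per texel, on every upload, before the data even reaches the staging buffer.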
For one-off operations, like loading an RGB image and then uploading it to an RGBA vkImage, the cost is negligible. But for tasks where you have a stream of RGB data, such as from a camera or video stream, that you need to render as a texture, you pay the conversion penalty on every copy to the GPU. The topic of this email is one technique for avoiding this penalty: using a compute shader that takes the RGB data, converts it to RGBA, and then writes it to the vkImage.
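As a rough illustration of the idea, the following C++ emulates on the CPU what each compute shader invocation would do. This is a sketch under assumptions, not the shader used in the example: the flat float source buffer, the Vec4 struct, the function names and the square workgroup size are all hypothetical. The source is modelled as a flat float array because an array of vec3 in a std430 storage buffer has a 16 byte stride, so tightly packed RGB data can't be read directly as vec3 elements.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

struct Vec4 { float r, g, b, a; };

// Emulates one shader invocation: reads three floats from the flat RGB
// buffer and writes one RGBA texel. On the GPU, x and y would come from
// gl_GlobalInvocationID.
void convertTexel(const std::vector<float>& rgb, std::vector<Vec4>& image,
                  std::uint32_t width, std::uint32_t height,
                  std::uint32_t x, std::uint32_t y)
{
    // Invocations in the padding of the last workgroup do nothing.
    if (x >= width || y >= height) return;

    const std::size_t i = static_cast<std::size_t>(y) * width + x;
    image[i] = Vec4{rgb[i * 3 + 0], rgb[i * 3 + 1], rgb[i * 3 + 2], 1.0f};
}

// Emulates the dispatch: enough localSize x localSize workgroups to cover
// the whole image, rounding the group counts up.
void dispatchConversion(const std::vector<float>& rgb, std::vector<Vec4>& image,
                        std::uint32_t width, std::uint32_t height,
                        std::uint32_t localSize)
{
    const std::uint32_t groupsX = (width + localSize - 1) / localSize;
    const std::uint32_t groupsY = (height + localSize - 1) / localSize;

    for (std::uint32_t gy = 0; gy < groupsY; ++gy)
        for (std::uint32_t gx = 0; gx < groupsX; ++gx)
            for (std::uint32_t ly = 0; ly < localSize; ++ly)
                for (std::uint32_t lx = 0; lx < localSize; ++lx)
                    convertTexel(rgb, image, width, height,
                                 gx * localSize + lx, gy * localSize + ly);
}
```

On the GPU the four loops disappear, since the invocations run in parallel, and the CPU only has to copy the untouched RGB data into a buffer the shader can read.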
The vsgExamples project already had a vsgdynamictexture example that updates a vsg::vec3Array2D on the CPU and then copies this to the GPU using a vsg::CopyAndReleaseImage command, so I used this as a base and created a new compute shader version, vsgdynamictexture_cs.
The compute shader that does the conversion is:
The vsgdynamictexture_cs.cpp is 82 lines longer than the original vsgdynamictexture.cpp, and the compute shader is 30 lines long, so it requires more setup work than just letting the VSG handle the RGB->RGBA conversion under the hood for you. The benefit is:
vsgdynamictexture : 1730fps
vsgdynamictexture_cs : 3340fps
So nearly twice as fast. The framerate stats are also far more stable in the compute shader case: the non compute shader example records as little as 1200fps in some runs, suggesting that CPU contention can be a real issue with the CPU conversion.
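Frame rates this high are easier to compare as per-frame times. The small helper below (frameTimeMs is a hypothetical name, and this is arithmetic on the figures above rather than a new measurement) does the conversion:

```cpp
// Converts a frame rate in frames per second into the time spent on
// each frame, in milliseconds.
constexpr double frameTimeMs(double fps)
{
    return 1000.0 / fps;
}
```

Applied to the numbers above, 1730fps is roughly 0.58ms per frame and 3340fps is roughly 0.30ms per frame, so the compute shader path saves about 0.28ms on every frame. The occasional 1200fps runs correspond to about 0.83ms per frame.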
As another test, I added an --rgba option to vsgdynamictexture to allow you to select the use of a vsg::vec4Array2D rather than a vsg::vec3Array2D, and performance goes up from 1730fps to 3115fps, coming very close to the compute shader case. The use of vec4 avoids the vec3 to vec4 conversion step, so despite requiring 1/3 more memory in the source array it is much faster.
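The "1/3 more memory" figure follows directly from the texel sizes: three floats against four. A minimal sketch of the arithmetic, assuming 32-bit float channels (imageBytes is a hypothetical helper, and the 256x256 size used below is only an example, not the size the vsgdynamictexture example uses):

```cpp
#include <cstddef>

// Bytes needed for a width x height array of float texels with the
// given number of channels (3 for a vec3 array, 4 for a vec4 array).
constexpr std::size_t imageBytes(std::size_t width, std::size_t height,
                                 std::size_t channels)
{
    return width * height * channels * sizeof(float);
}
```

For a 256x256 array with 4 byte floats this gives 786432 bytes for vec3 data and 1048576 bytes for vec4 data, a ratio of 4/3. The extra memory is the price for being able to copy the array to the GPU without touching each texel on the CPU.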
All the required changes are checked into VulkanSceneGraph and vsgExamples master.
The new vsgdynamictexture_cs example is our first example where a compute shader populates data that is directly used by graphics shaders within the same frame, so it is the forerunner of much more complex compute/graphics integration down the line.
Cheers,
Robert.