Using compute shaders to convert RGB to RGBA vk/vsg::Image


Robert Osfield

Apr 15, 2021, 3:44:20 AM

to vsg-users : VulkanSceneGraph Developer Discussion Group

Hi All,

One of the constraints you have to accommodate when working with Vulkan is that you can't create an RGB vkImage/vsg::Image on the GPU; you can only create R, RG and RGBA vkImage. It's possible that this constraint is relaxed on some hardware, but for the NVidia cards I've been using thus far, it's part of life when using Vulkan.

To avoid problems you have to convert RGB images to RGBA before copying the data to the vkImage on the GPU. This conversion can be done in the image loader, or by the VulkanSceneGraph when it uploads the image data to the staging buffer before finally copying to the vkImage on the GPU. This conveniently hides the complexity of conversion, allowing you to use RGB data on the CPU, but it doesn't hide the cost of conversion - it's a relatively expensive CPU operation due to the memory bandwidth cost of reading and writing every texel.
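To make the cost concrete, here is a minimal sketch of what a CPU-side RGB to RGBA expansion looks like. It is not the VSG's actual implementation: convertRGBtoRGBA is a hypothetical helper, and the flat float layout and the alpha default of 1.0 are assumptions chosen to mirror a vec3 to vec4 conversion.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper, not part of the VSG API: expands tightly packed
// float RGB texels into RGBA, filling alpha with a caller-supplied value.
std::vector<float> convertRGBtoRGBA(const std::vector<float>& rgb, float alpha = 1.0f)
{
    assert(rgb.size() % 3 == 0); // input must hold whole RGB texels

    const std::size_t texelCount = rgb.size() / 3;
    std::vector<float> rgba(texelCount * 4);

    // Each texel is read once (3 floats) and written once (4 floats), so the
    // loop is dominated by memory traffic rather than arithmetic.
    for (std::size_t i = 0; i < texelCount; ++i)
    {
        rgba[i * 4 + 0] = rgb[i * 3 + 0];
        rgba[i * 4 + 1] = rgb[i * 3 + 1];
        rgba[i * 4 + 2] = rgb[i * 3 + 2];
        rgba[i * 4 + 3] = alpha;
    }
    return rgba;
}
```

Calling convertRGBtoRGBA on every frame of a video stream repeats this full read/write pass over the image each time, which is the per-copy penalty discussed here.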

For one-off operations, like loading an RGB image and then uploading it to an RGBA vkImage, the cost is negligible, but for tasks where you have a stream of RGB data, such as from a camera or video stream, that you need to render as a texture, you pay the conversion penalty on every copy to the GPU. The topic of this email is one technique for avoiding this penalty: using a compute shader that takes the RGB data, converts it to RGBA, then writes it to the vkImage.

The vsgExamples project already had a vsgdynamictexture example that updates a vsg::vec3Array2D on the CPU and then copies this to the GPU using a vsg::CopyAndReleaseImage command, so I used this as a base and created a new compute shader version, vsgdynamictexture_cs:

https://github.com/vsg-dev/vsgExamples/blob/master/examples/state/vsgdynamictexture_cs/vsgdynamictexture_cs.cpp

The compute shader that does the conversion is:

https://github.com/vsg-dev/vsgExamples/blob/master/data/shaders/RGBtoRGBA.comp
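When a compute shader like this is run over an image, the application has to choose how many workgroups to dispatch. The sketch below shows the usual round-up calculation, assuming one invocation per texel. workgroupsForImage is a hypothetical helper, and the workgroup sizes passed to it are assumptions, not values taken from RGBtoRGBA.comp or the example.

```cpp
#include <cassert>
#include <cstdint>

struct DispatchSize
{
    std::uint32_t x, y, z;
};

// Hypothetical helper: number of workgroups needed to cover a
// width x height image when each workgroup handles localX x localY texels.
// Rounds up so that partial tiles at the right/bottom edges are covered.
DispatchSize workgroupsForImage(std::uint32_t width, std::uint32_t height,
                                std::uint32_t localX, std::uint32_t localY)
{
    assert(localX > 0 && localY > 0);
    return {(width + localX - 1) / localX,
            (height + localY - 1) / localY,
            1};
}
```

Because of the round-up, a shader dispatched this way normally needs a bounds check so that invocations falling outside the image do nothing.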

vsgdynamictexture_cs.cpp is 82 lines longer than the original vsgdynamictexture.cpp, and the compute shader is 30 lines long, so it requires more setup work than just letting the VSG handle the RGB->RGBA conversion under the hood for you. The benefit is:

vsgdynamictexture : 1730fps

vsgdynamictexture_cs : 3340fps

So nearly twice as fast. The framerate stats are also far more stable in the compute shader case; the non compute shader example sometimes records as little as 1200fps in some runs, suggesting that CPU contention can be a real issue with the CPU conversion.

As another test, I added an --rgba option to vsgdynamictexture to allow you to select the use of a vsg::vec4Array2D rather than a vsg::vec3Array2D, and performance goes up from 1730fps to 3115fps, coming very close to the compute shader case. The use of vec4 avoids the vec3 to vec4 conversion step, so despite requiring 1/3 more memory in the source array it is much faster.
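The "1/3 more memory" figure follows from the texel sizes: three floats versus four. A small sketch of the arithmetic, using stand-in structs rather than the real vsg::vec3 and vsg::vec4 types, and a hypothetical imageBytes helper:

```cpp
#include <cassert>
#include <cstddef>

// Stand-ins for float RGB and RGBA texels (not the actual vsg types).
struct Vec3 { float r, g, b; };
struct Vec4 { float r, g, b, a; };

static_assert(sizeof(Vec3) == 3 * sizeof(float), "Vec3 is expected to be tightly packed");
static_assert(sizeof(Vec4) == 4 * sizeof(float), "Vec4 is expected to be tightly packed");

// Hypothetical helper: bytes needed for a tightly packed 2D array of texels.
std::size_t imageBytes(std::size_t width, std::size_t height, std::size_t texelBytes)
{
    assert(texelBytes > 0);
    return width * height * texelBytes;
}
```

For an illustrative 256x256 image (a size picked for this example, not taken from vsgdynamictexture), imageBytes(256, 256, sizeof(Vec3)) is 786432 bytes and imageBytes(256, 256, sizeof(Vec4)) is 1048576 bytes, a ratio of exactly 4/3.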

All the required changes are checked into VulkanSceneGraph and vsgExamples master.

The new vsgdynamictexture_cs example is our first example where the compute shader is populating data that is directly used by graphics shaders within the same frame, so is the forerunner of doing much more complex compute/graphics integration down the line.

Cheers,

Robert.

Robert Osfield

Apr 15, 2021, 7:06:27 AM

to vsg-users : VulkanSceneGraph Developer Discussion Group

I have just refined the vsg::Buffer allocation in the new vsgdynamictexture_cs example so that it uses VK_MEMORY_PROPERTY_DEVICE_LOCAL_BIT rather than the VK_MEMORY_PROPERTY_HOST_VISIBLE_BIT | VK_MEMORY_PROPERTY_HOST_COHERENT_BIT it was using before, and I see a further improvement in performance that varies across the three GPUs on my system. My motherboard shares the PCIExpress, so I'm only seeing 8x, 4x and 8x for my 2060, 1650 and 1650 respectively.

The performance I get is:

Memory flags    2060 (8x)    1650 (4x)    1650 (8x)
HOST_VISIBLE    3379fps      699fps       2400fps
DEVICE_LOCAL    3608fps      1287fps      3483fps

The PCIExpress bottleneck is very clear with my middle card stuck on 4x PCIExpress: almost double the performance from changing just one flag!
