Optimize data transfer to GPU

Daniel Trstenjak

Jun 23, 2021, 5:27:30 AM
to osg-...@googlegroups.com
Hi,

for an animation we have to change the coordinates per frame. The draw
time for the model without a coordinate change is about 3ms. With the
coordinate change it can go up to 200ms.

With a model of about half the size I see pretty much the same draw
time without the coordinate change. With the coordinate change it goes
up to 25ms.

Does the difference between 25ms and 200ms mean that the bottleneck is
the data transfer to the GPU? How can I check on Linux whether this is
the case?

We're using VBOs for the coordinates. The update is pretty much getting
the data pointer of the osg::Array, setting the new coordinates and
then calling dirty on the array and the geometry.
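
For concreteness, the per-frame update is roughly the following sketch
(illustrative names, not our real code):

#include <osg/Geometry>

#include <algorithm>
#include <vector>

// Illustrative sketch of an update step: overwrite the coordinates of an
// osg::Geometry in place and mark the array and the bound as dirty.
// Assumes the vertex array really is an osg::Vec3Array of matching size.
void updateCoordinates(osg::Geometry& geometry, const std::vector<osg::Vec3>& newCoords)
{
    osg::Vec3Array* vertices = static_cast<osg::Vec3Array*>(geometry.getVertexArray());

    // write the new coordinates directly into the existing array
    std::copy(newCoords.begin(), newCoords.end(), vertices->begin());

    vertices->dirty();      // array contents changed, VBO needs re-uploading
    geometry.dirtyBound();  // the bounding volume may have changed
}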

Thanks for any hints!

Greetings,
Daniel

Robert Osfield

Jun 23, 2021, 6:03:06 AM
to OpenSceneGraph Users
Hi Daniel,

With the details provided it's not possible to know exactly what is causing the bottleneck in your case. It may be connected to the transfer of data, but 200ms is a big stall, so something other than just the data transfer across the bus may be going wrong.

How much of your scene graph is static and how much is dynamic? 

How much dynamic data needs to be transferred when it's updated?

How are you assigning/updating this data? 

Do your meshes use display lists?  How are you telling the OSG that the data is updated and needs to be transferred?

What type of hardware are you working on?  What OS and drivers?

Cheers,
Robert.

Daniel Trstenjak

Jun 23, 2021, 10:32:47 AM
to OpenSceneGraph Users
Hi Robert,

the 200ms seem to be very rare spikes and the median draw time for an
animation step for the bigger model is 40ms and for the smaller model 13ms.


> How much dynamic data needs to be transferred when it's updated?

The bigger model has 110MB of coordinates and normals to be transferred
and the smaller model 55MB. For both models the coordinates and
normals are split across 42 'osg::Geometry' objects.


> How much of your scene graph is static and how much is dynamic?

If you mean the setting 'osg::Object::DataVariance', then we don't have
any explicit setting of it. But pretty much all of the big geometry data in the
application is in a way dynamic.


> How are you assigning/updating this data?

Getting the data pointer with 'osg::Array::getDataPointer()' and directly
writing to it.


> Do your meshes use display lists?

No, only VBOs.
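
The geometries are set up essentially like this sketch (illustrative code;
the explicit GL_DYNAMIC_DRAW_ARB usage hint at the end is just an idea I'm
considering, not something we currently do):

#include <osg/BufferObject>
#include <osg/Geometry>
#include <osg/ref_ptr>

// Sketch of configuring a geometry for VBO rendering without display lists.
void setupGeometry(osg::Geometry& geometry, osg::Vec3Array* vertices, osg::Vec3Array* normals)
{
    geometry.setUseDisplayList(false);
    geometry.setUseVertexBufferObjects(true);

    geometry.setVertexArray(vertices);
    geometry.setNormalArray(normals, osg::Array::BIND_PER_VERTEX);

    // Idea, not yet tried: hint to the driver that the buffers get rewritten
    // every frame by giving the arrays a VBO with dynamic usage.
    osg::ref_ptr<osg::VertexBufferObject> vbo = new osg::VertexBufferObject;
    vbo->setUsage(GL_DYNAMIC_DRAW_ARB);
    vertices->setVertexBufferObject(vbo.get());
    normals->setVertexBufferObject(vbo.get());
}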


> How are you telling the OSG that the data is updated and needs to be transferred?

Calling 'osg::Array::dirty()' and 'osg::Geometry::dirtyBound()'.


> What type of hardware are you working on?

CPU: Intel i7-5960X, 8x 3.0 GHz
Mainboard: Fujitsu D3348-B Intel C612, PCI Express x16 Slots 3*Gen3 and 1*Gen2
RAM: 32GB DDR4-2133 Crucial
GPU: Nvidia GTX970 ASUS STRIX 4GB


> What OS and drivers?

OS: Ubuntu Linux 20.04
Driver: NVIDIA 390.143


Thanks a lot for your help!


Greetings,
Daniel

Robert Osfield

Jun 24, 2021, 3:52:30 AM
to OpenSceneGraph Users
Hi Daniel,

On Wednesday, 23 June 2021 at 15:32:47 UTC+1 daniel.t...@gmail.com wrote:
> the 200ms seem to be very rare spikes and the median draw time for an
> animation step for the bigger model is 40ms and for the smaller model 13ms.

The spike could be an indication of a couple of different things:

  1) It might be the OS or another application/process periodically running at the same time; make sure you have as few other apps and processes running as possible during testing.
  2) The OpenGL FIFO might be very close to filling on each frame and sometimes fills completely, blocking your application/OSG from putting more data into the FIFO and causing the stall.


> > How much dynamic data needs to be transferred when it's updated?
>
> The bigger model has 110MB of coordinates and normals to be transferred
> and the smaller model 55MB. For both models the coordinates and
> normals are split across 42 'osg::Geometry' objects.

What happens with the stalls if you reduce the amount of data?  Is there a trigger point where the amount of data trips the system and it begins to exhibit the stalls?

 
> > How much of your scene graph is static and how much is dynamic?
>
> If you mean the setting 'osg::Object::DataVariance', then we don't have
> any explicit setting of it. But pretty much all of the big geometry data in the
> application is in a way dynamic.

DataVariance is only used as a hint for things like the optimizer, and for holding back the start of the next frame to ensure you don't try to render something at the same time as writing to it.
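
If you do want to set it, it's just a flag on the objects that get rewritten,
something along these lines (sketch):

#include <osg/Geometry>

// Sketch: mark frequently rewritten objects as DYNAMIC so the previous frame's
// draw traversal isn't still reading them while the update writes new data.
void markDynamic(osg::Geometry& geometry)
{
    geometry.setDataVariance(osg::Object::DYNAMIC);

    if (osg::Array* vertices = geometry.getVertexArray())
        vertices->setDataVariance(osg::Object::DYNAMIC);

    if (osg::Array* normals = geometry.getNormalArray())
        normals->setDataVariance(osg::Object::DYNAMIC);
}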
 


> > How are you assigning/updating this data?
>
> Getting the data pointer with 'osg::Array::getDataPointer()' and directly
> writing to it.


> > Do your meshes use display lists?
>
> No, only VBOs.

Have you tried just using standard vertex arrays, so no VBO and no display lists?  This would force it to copy on each new frame, but sometimes can help.
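
That is, something like this sketch:

#include <osg/Geometry>

// Sketch: fall back to plain client-side vertex arrays (no display lists, no
// VBOs), so the vertex data is streamed from application memory each frame.
void useClientSideArrays(osg::Geometry& geometry)
{
    geometry.setUseDisplayList(false);
    geometry.setUseVertexBufferObjects(false);
}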

 

> > How are you telling the OSG that the data is updated and needs to be transferred?
>
> Calling 'osg::Array::dirty()' and 'osg::Geometry::dirtyBound()'.

How do the update, cull, draw and GPU draw stats typically look when things are working OK and when the stalls happen?
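
If you don't already have them up, the easiest way to see those numbers is the
standard on-screen stats, assuming an osgViewer based application (sketch):

#include <osgViewer/Viewer>
#include <osgViewer/ViewerEventHandlers>

// Sketch: the on-screen stats (toggled with the 's' key by default) show the
// per-frame update, cull, draw and GPU times.
void addStatsHandler(osgViewer::Viewer& viewer)
{
    viewer.addEventHandler(new osgViewer::StatsHandler);
}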
 
--

There may be completely different ways of achieving the end result you want, from using VulkanSceneGraph (Vulkan is just far better designed for handling data management) through to refactoring to use shaders to do more of the heavy lifting.

Could you provide a screenshot or a high-level description of what you are doing with all this vertex data?

Cheers,
Robert

Daniel Trstenjak

Jun 24, 2021, 5:39:37 AM
to OpenSceneGraph Users
Hi Robert,

> 1) It might be the OS or another application/process periodically running at the same time; make sure you have as
> few other apps and processes running as possible during testing.
> 2) The OpenGL FIFO might be very close to filling on each frame and sometimes fills completely, blocking your
> application/OSG from putting more data into the FIFO and causing the stall.

There's no other major application running at the same time. The system
monitor doesn't show anything happening at the rate of the stalls.

Besides the application I have only some consoles open, so there's no other
graphical application that might be using OpenGL.


> What happens with the stalls if you reduce the amount of data? Is there a trigger point where the amount of data trips
> the system and it begins to exhibit the stalls?

With the 55MB of coordinate/normal data the draw times are very stable,
between 14-17ms. With the 110MB of coordinate/normal data most draw times
are between 40-50ms, with stalls of up to 200ms roughly every 10 frames,
though the interval isn't strictly deterministic.

I just talked to a colleague who tested it on a newer system with the
110MB coordinate/normal data. He has no stalls at all and the draw times
are quite stable at 20ms. I read that the NVIDIA GTX 970 doesn't really
take advantage of PCI-E 3.0 and runs with almost the same performance on
PCI-E 2.0. So I suspect that it might really be about the memory performance.


> Have you tried just using standard vertex arrays, so no VBO and no display lists? This would force it to copy on each
> new frame, but sometimes can help.

It's a lot slower with standard vertex arrays.


> How do the update, cull, draw and GPU draw stats typically look when things are working OK and when the stalls happen?

The update and cull times are quite low (<0.5ms) in all cases, with or
without the stall and regardless of the size of the coordinate/normal data.


> There may be completely different ways of achieving the end result you want, from using VulkanSceneGraph (Vulkan is just
> far better designed for handling data management) through to refactoring to use shaders to do more of the heavy lifting.

The problem is that it's a big graphical application based on the OpenGL
fixed-function pipeline, so even using shaders just for this isn't an option
at the moment, because a lot of other graphical features rely on the fixed
pipeline.

Thanks a lot for your time!


Greetings,
Daniel