[osg-users] Best practice for dynamic StateSets & Geometry

Jannik Heller

Apr 14, 2015, 1:27:54 PM

to osg-...@lists.openscenegraph.org

Hi OSG friends,

A common challenge for OSG users is dealing with the implications of the viewer threading model: by default, viewer.frame() returns before the draw dispatch is complete, meaning users (and the OSG) can start preparing the next frame before the current frame has completed. However, if you change a StateSet or Drawable in the frame update, you risk modifying data that the OSG is still working with in a background thread, resulting in crashes.
Often you will see code deal with this by setting the DataVariance of the object to DYNAMIC. Unfortunately, as a result the draw dispatch has to complete before frame() returns; for me this cut the frame rate in half.
Recently I developed a more efficient solution for dealing with this and would like to hear your thoughts.
The idea is similar to "double buffering" with the framebuffer: you create two copies of the data at startup, one copy is write-only and the other is read-only, and when the frame completes the roles are swapped. You can implement this idea for both Drawables and StateSets:
- Dynamic Drawables (RigGeometry, MorphGeometry, etc.): create a deep copy of the Drawable, then decorate both Drawables with a FrameSwitch node. A FrameSwitch node is a variant of Group that traverses only the even or only the odd children, based on the current FrameStamp. Code (https://github.com/OpenMW/openmw/blob/f7da9796692e14c79632cb85fa75a90b082cd863/components/nifosg/nifloader.cpp#L179)
- Dynamic StateSets: Create two copies of the StateSet on start, then every frame in a NodeCallback swap the roles of these StateSets, apply changes to the first StateSet, then set the currently active StateSet on the Node. Code (https://github.com/scrawl/openmw/blob/osg/components/sceneutil/statesetupdater.cpp#L8)
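The swap-then-write scheme for StateSets and the even/odd child selection of the FrameSwitch can be sketched without any OSG types. In this minimal sketch, State stands in for an osg::StateSet, and DoubleBufferedState and frameSwitchTraverses are hypothetical names that are not part of OSG or of the linked code; it only illustrates the idea under those assumptions.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Stand-in for an osg::StateSet: just the data a draw thread would read.
struct State {
    float alpha = 1.0f;
};

// Keeps two copies of a State. Each frame the roles swap: the copy that
// the draw thread may still be reading from the previous frame is left
// alone, and the other copy is written and then published.
class DoubleBufferedState {
public:
    explicit DoubleBufferedState(const State& initial)
        : copies_{{initial, initial}} {}

    // Call once per frame from the update side. `apply` receives the
    // writable copy; the returned reference is the copy that should be
    // attached to the node for this frame.
    template <typename Apply>
    const State& update(Apply&& apply) {
        writeIndex_ = 1 - writeIndex_;   // swap roles
        apply(copies_[writeIndex_]);     // modify the copy nobody is drawing
        return copies_[writeIndex_];     // publish it for this frame
    }

private:
    std::array<State, 2> copies_;
    std::size_t writeIndex_ = 0;
};

// Child selection rule of a FrameSwitch-like group: child i is traversed
// only when its parity matches the parity of the frame number.
inline bool frameSwitchTraverses(std::size_t childIndex, unsigned frameNumber) {
    return childIndex % 2 == frameNumber % 2;
}
```

Because each copy is written only every other frame, a change that should persist has to reach both copies, either by applying it on two consecutive frames or by writing the full state in every update.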

There are some downsides to this approach (mostly that for data that changes only rarely, you have to apply every change twice), but other than that it works beautifully and now I've got 2x the framerate again.

I'm curious how the OSG veterans are dealing with this. Anything I've missed?

Cheers
Jannik

------------------
Read this topic online here:
http://forum.openscenegraph.org/viewtopic.php?p=63390#63390

_______________________________________________
osg-users mailing list
osg-...@lists.openscenegraph.org
http://lists.openscenegraph.org/listinfo.cgi/osg-users-openscenegraph.org

Robert Osfield

Apr 15, 2015, 5:53:19 AM

to OpenSceneGraph Users

Hi Jannik,

General purpose double buffering of data is something I've considered in the past but not attempted to implement into the core OSG as it introduces a range of complexities in implementation and API. There are places in the OSG where buffering is done locally but in general these are areas where the implementation is lightweight and there is a clear benefit.

In your case it's cool that you've found a way of implementing it and got a performance benefit. It's a complexity that most users haven't tried to tackle before, but I suspect this is partly down to them not having the same bottlenecks that you are seeing.

I don't know the bottlenecks in your app, but for DrawThreadPerContext to make such a big difference in performance would suggest that you are CPU bound. Are you update, cull, draw dispatch or draw GPU bound? Use the osgViewer::StatsHandler to show the relative load on screen. Most scene graph apps will have a pretty light update, and a modest cull and draw dispatch. If any of these phases is the bottleneck, then resolving it might be the key to getting the best performance, rather than adding double buffering.

Also, make sure that you are using a release build when profiling performance: debug builds make a huge difference to the results and can totally distort the relative cost of the different phases.

Cheers,

Robert.

Jannik Heller

Apr 15, 2015, 6:54:24 AM

to osg-...@lists.openscenegraph.org

Hi Robert,

Thanks for the hints - I am using a release build, and I already disabled double precision from cmake which gave me another nice boost.

In the stats handler I am seeing roughly the same amount of time spent in the cull, draw and GPU threads. After adding the double buffering the 3 threads all run in parallel so performance is decent now.
I know that my app is CPU bound but there's not much I can do about it.
Some of the time in the cull thread is spent updating vertex animations, and some on organizing light lists. I have scenes with a lot more than 8 lights, so I have to check which lights are closest to a given sub-graph before rendering it. This system was really slow to begin with but I already optimized it a lot. Nonetheless, setting the lights still has a noticeable overhead.
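One possible way to pick the lights for a sub-graph is a partial sort by squared distance. This is a sketch under assumptions, not the code used in the application: Light, Vec3 and selectNearestLights are hypothetical names, the sub-graph is reduced to a single center point, and the default limit of 8 matches the light count mentioned above.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

struct Vec3 { float x, y, z; };

// Hypothetical light record; only the position matters for this sketch.
struct Light {
    int id;
    Vec3 position;
};

inline float distanceSquared(const Vec3& a, const Vec3& b) {
    const float dx = a.x - b.x, dy = a.y - b.y, dz = a.z - b.z;
    return dx * dx + dy * dy + dz * dz;
}

// Returns the ids of at most `maxLights` lights closest to `center`
// (for example the bounding sphere center of a sub-graph), nearest first.
inline std::vector<int> selectNearestLights(std::vector<Light> lights,
                                            const Vec3& center,
                                            std::size_t maxLights = 8) {
    const std::size_t count = std::min(maxLights, lights.size());
    // partial_sort orders only the first `count` elements, which is
    // cheaper than a full sort when the scene has many more lights.
    std::partial_sort(lights.begin(),
                      lights.begin() + static_cast<std::ptrdiff_t>(count),
                      lights.end(),
                      [&center](const Light& a, const Light& b) {
                          return distanceSquared(a.position, center) <
                                 distanceSquared(b.position, center);
                      });
    std::vector<int> ids;
    ids.reserve(count);
    for (std::size_t i = 0; i < count; ++i) ids.push_back(lights[i].id);
    return ids;
}
```

Comparing squared distances avoids a square root per light; a spatial index over the lights would be the next step if even this per-sub-graph scan turned out to be too slow.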
The next problem is the sheer number of objects - often thousands per scene. I tried batching before, but the scenes I am working with are scripted, so objects can be moved or removed at any time. Batching objects would also interfere with the light lists - i.e. with too many objects sharing a large batch I cannot set fine-grained light lists on them.
I'm looking forward to switching the light system to deferred shading in the future - I'm sure then it will be GPU bound. I still need a forward rendering fallback in place though.

Cheers,

Jannik

------------------
Read this topic online here:

http://forum.openscenegraph.org/viewtopic.php?p=63417#63417

Robert Osfield

Apr 15, 2015, 9:20:37 AM

to OpenSceneGraph Users

Hi Jannik,

On 15 April 2015 at 11:55, Jannik Heller <scr...@baseoftrash.de> wrote:

> Thanks for the hints - I am using a release build, and I already disabled double precision from cmake which gave me another nice boost.

I am surprised you saw a boost from disabling double precision. What specific element did you change w.r.t. double precision?

If changing double precision makes a difference, it's a hint that you have a scene that is poorly balanced.

> In the stats handler I am seeing roughly the same amount of time spent in the cull, draw and GPU threads. After adding the double buffering the 3 threads all run in parallel so performance is decent now.
> I know that my app is CPU bound but there's not much I can do about it.
> Some of the time in the cull thread is spent updating vertex animations, and some on organizing light lists. I have scenes with a lot more than 8 lights, so I have to check which lights are closest to a given sub-graph before rendering it. This system was really slow to begin with but I already optimized it a lot. Nonetheless, setting the lights still has a noticeable overhead.
> The next problem is the sheer number of objects - often thousands per scene. I tried batching before, but the scenes I am working with are scripted, so objects can be moved or removed at any time. Batching objects would also interfere with the light lists - i.e. with too many objects sharing a large batch I cannot set fine-grained light lists on them.
> I'm looking forward to switching the light system to deferred shading in the future - I'm sure then it will be GPU bound. I still need a forward rendering fallback in place though.

Thousands of objects per frame is not unusual for a scene graph application and should work fine rendering at 60 Hz on modern hardware. If you are hitting CPU limits prematurely then it's another hint that the scene graph is poorly balanced.

There are lots of different things you can do to create a more efficient scene graph; exactly what to advise is hard to say without knowing more specifics about the application and the types of data being handled. Batching might be one thing to try, but only if it's established as the main bottleneck. From what you've written I wonder if the animation element of your scene is what is slowing things down. Are you using CPU based animation? Could you shift it onto the GPU?

Robert.

Jannik Heller

Apr 15, 2015, 10:56:28 AM

to osg-...@lists.openscenegraph.org

> I am surprised you saw a boost from disabling double precision. What specific element did you change w.r.t. double precision?

I enabled OSG_USE_FLOAT_MATRIX and OSG_USE_FLOAT_PLANE and observed a 10% framerate improvement.

Might be related to particle systems, which I forgot to mention I'm also using.

>
> If changing double precision makes a difference, it's a hint that you have a scene that is poorly balanced.
>

W.r.t. balancing, for exterior environments I have objects organized in a grid of 3x3 cells. A quad tree might yield better results. The main problem though is that I am restricted to runtime optimizations - the scene itself comes from a third party source that I am not at liberty to modify or distribute. The scenes were created for a game in 2002, so it is not surprising that they're CPU bound on modern hardware now.
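For comparison with a fixed grid, here is a rough sketch of a point quad tree that subdivides only where objects are dense. Bounds, Point and QuadTreeNode are hypothetical names, objects are reduced to 2D points, and capacity and maxDepth are illustrative parameters; a real scene graph would store nodes with bounding volumes instead.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

struct Point { float x, y; };

// A square region on the ground plane.
struct Bounds {
    float minX, minY, size;
    bool contains(const Point& p) const {
        return p.x >= minX && p.x < minX + size &&
               p.y >= minY && p.y < minY + size;
    }
};

class QuadTreeNode {
public:
    QuadTreeNode(const Bounds& bounds, std::size_t capacity, int maxDepth)
        : bounds_(bounds), capacity_(capacity), maxDepth_(maxDepth) {
        assert(capacity > 0);
    }

    // Inserts a point; returns false if it lies outside this node.
    bool insert(const Point& p) {
        if (!bounds_.contains(p)) return false;
        add(p);
        return true;
    }

    // Number of levels below this node (0 for a leaf).
    int depth() const {
        if (isLeaf()) return 0;
        int deepest = 0;
        for (const auto& c : children_) deepest = std::max(deepest, c->depth());
        return deepest + 1;
    }

    // Total number of points stored in this node and its descendants.
    std::size_t count() const {
        std::size_t n = points_.size();
        if (!isLeaf())
            for (const auto& c : children_) n += c->count();
        return n;
    }

private:
    bool isLeaf() const { return !children_[0]; }

    void add(const Point& p) {
        if (isLeaf()) {
            // A leaf keeps points until it is full; at maxDepth 0 it
            // stops splitting and simply grows.
            if (points_.size() < capacity_ || maxDepth_ == 0) {
                points_.push_back(p);
                return;
            }
            split();
        }
        childFor(p).add(p);
    }

    // Picks the quadrant by comparing against the midpoint.
    QuadTreeNode& childFor(const Point& p) {
        const float half = bounds_.size * 0.5f;
        const std::size_t ix = p.x >= bounds_.minX + half ? 1 : 0;
        const std::size_t iy = p.y >= bounds_.minY + half ? 1 : 0;
        return *children_[iy * 2 + ix];
    }

    void split() {
        const float half = bounds_.size * 0.5f;
        for (std::size_t i = 0; i < 4; ++i) {
            const Bounds b{bounds_.minX + static_cast<float>(i % 2) * half,
                           bounds_.minY + static_cast<float>(i / 2) * half,
                           half};
            children_[i].reset(new QuadTreeNode(b, capacity_, maxDepth_ - 1));
        }
        // Push the points held by this former leaf down into the children.
        std::vector<Point> old;
        old.swap(points_);
        for (const Point& q : old) childFor(q).add(q);
    }

    Bounds bounds_;
    std::size_t capacity_;
    int maxDepth_;
    std::vector<Point> points_;
    std::array<std::unique_ptr<QuadTreeNode>, 4> children_;
};
```

Unlike a 3x3 grid, the tree stays shallow in empty areas and gets deeper where many objects cluster, which is what lets a cull traversal reject large empty or off-screen regions early.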

I plan on trying the osgUtil::Optimizer at some point, though I'm afraid I can only run it on individual models and not the whole scene, as the scene is tightly connected to game logic, scripting, physics, etc.

>
> Are you using CPU based animation? Could you shift it onto the GPU?
>

Yes and yes - I am using software skinning at the moment just because it works. I will try and move to hardware skinning later, but for now I am focused on porting the rest of my rendering code to OSG before optimizing more.

Thanks,
Jannik