Cull threads and frame pipelining


James Hogan

Jun 14, 2022, 5:51:35 PM
to OpenSceneGraph Users
Hi,

In trying to understand the performance of an OSG app (FlightGear with VR), I
can see from the performance timings overlay that even with
CullThreadPerCameraDrawThreadPerContext a lot of time is spent culling the
main cameras each frame before drawing can start, and that GPU utilisation is
around 50%.

I've had a play around with multiple near/middle/far cameras. If they are ranged
right, with increasing amounts of cull work (culls happening in parallel), some
drawing can start sooner and each camera is culled in time for when it's ready
to draw, increasing FPS.

However, ideally it would be possible to start the next frame going (event,
update, & cull) as soon as culling of the current frame is complete, so that
it's ready to start drawing the next frame as soon as drawing of the current
frame is complete. This would also fit nicely into how OpenXR allows frame
pipelining with predicted frame times etc.

Is this something that would be theoretically possible with OSG?

Are the dependencies between the stages documented anywhere? Looking at the
code I can see the StateGraph and RenderLeaf stuff produced by cull stages, but
it's unclear to me right now whether update can take place safely before these
have been used for drawing.

Mads Sandvei mentioned this about a year ago:
> For example, if a user uses DrawThreadPerContext, the main thread can
> continue to the update phase of the next frame immediately when the last of
> slave cameras have begun its draw traversals.

So my current understanding/guess of the dependencies (where [n] refers to the
frame):
cull[n-1] < event[n] < update[n] < cull[n]
per camera: cull < draw
with single GL context: draws must be sequential

And to avoid having more than x=2 frames in the pipeline, maybe an artificial
dependency:
draw[n-x] < event[n]
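
As a rough way to sanity-check those constraints, here is a minimal standalone sketch in plain C++ (not OSG code) that works out the earliest time each stage could start under that ordering. FrameTimes, schedule() and the cost parameters are hypothetical names made up for illustration. It assumes fixed per-stage costs, that event/update/cull for a frame run back to back, and that a single GL context serialises the draws.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical record of when each stage of one frame starts/ends.
struct FrameTimes {
    double eventStart;
    double cullEnd;
    double drawStart;
    double drawEnd;
};

// Earliest-start schedule under the guessed dependencies:
//   cull[n-1] < event[n] < update[n] < cull[n]
//   cull[n] < draw[n], draw[n-1] < draw[n] (single GL context)
//   draw[n-x] < event[n], with x = maxInFlight
std::vector<FrameTimes> schedule(int frames, int maxInFlight,
                                 double updateCost, double cullCost,
                                 double drawCost)
{
    assert(maxInFlight >= 1);
    std::vector<FrameTimes> t(frames);
    for (int n = 0; n < frames; ++n) {
        double start = 0.0;
        if (n >= 1)                       // cull[n-1] < event[n]
            start = std::max(start, t[n - 1].cullEnd);
        if (n >= maxInFlight)             // draw[n-x] < event[n]
            start = std::max(start, t[n - maxInFlight].drawEnd);
        t[n].eventStart = start;
        // event + update, then cull, back to back
        t[n].cullEnd = start + updateCost + cullCost;

        double drawStart = t[n].cullEnd;  // cull[n] < draw[n]
        if (n >= 1)                       // draws are sequential
            drawStart = std::max(drawStart, t[n - 1].drawEnd);
        t[n].drawStart = drawStart;
        t[n].drawEnd = drawStart + drawCost;
    }
    return t;
}
```

For example, with updateCost=1, cullCost=4, drawCost=5 and x=2, draw[1] can start the moment draw[0] finishes, whereas with x=1 the draw for the next frame has to wait for a full event/update/cull pass after the previous draw completes.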

Any thoughts?

Thanks
James


Robert Osfield

Jun 29, 2022, 5:33:43 AM
to OpenSceneGraph Users
Hi James,

For performance improvements with VR applications it may be best to have a look at the OpenGL MultiView extension. Support for it is integrated in the MultiView branch of the OSG, which is built off the 3.6 branch.

The MultiView functionality requires use of shaders, but if your scene graph is already shader based these could be tweaked to include the extra uniforms required for the multiview. Since most OSG/OpenGL applications are CPU limited, it's possible to add stereo rendering with very low overhead. In the tests I've done with my GeForce 2060 system it's possible to do stereo with negligible slowdown over normal rendering.

There is also some work I've done on the VSG that could potentially be backported to the OSG that would improve CPU performance, such as the vsg::Allocator work.

These approaches will likely give far better performance improvements than trying to be clever about traversals.

Cheers,
Robert.

James Hogan

Jul 2, 2022, 6:47:48 AM
to osg-...@googlegroups.com

Hi Robert,


On Wednesday, 29 June 2022 10:33:43 BST Robert Osfield wrote:

> Hi James,
>
> For performance improvements with VR applications it may be best to have a
> look at the OpenGL MultiView extension. Support for it is integrated in the
> MultiView branch of the OSG, which is built off the 3.6 branch.


Thanks. Yes, I need to have a proper look at this.


> The MultiView functionality requires use of shaders, but if your scene
> graph is already shader based these could be tweaked to include the extra
> uniforms required for the multiview. Since most OSG/OpenGL applications
> are CPU limited, it's possible to add stereo rendering with very low
> overhead. In the tests I've done with my GeForce 2060 system it's possible
> to do stereo with negligible slowdown over normal rendering.
>
> There is also some work I've done on the VSG that could potentially be
> backported to the OSG that would improve CPU performance, such as the
> vsg::Allocator work.
>
> These approaches will likely give far better performance improvements than
> trying to be clever about traversals.


Also, I eventually figured out that my FlightGear LOD distances were dumb, which seemed to be the cause of most of the cull traversal time. Resetting those to default did improve things a bit and I was able to get it playable on a newish PC, but I'm sure there are plenty more improvements to make.


I think I understand a bit more about the dependencies between frames now too. The issue I think is that there are quite a few dynamic objects which delay the next frame from starting until all dynamic objects have been drawn. Probably quite invasive to fix so I'll drop that idea for now.


I wonder whether VSG takes the same approach to them? I imagine a buffered / RCU style approach to dynamic draw data would be ideal for parallelism, but perhaps it's not necessary since command buffers can be generated in parallel anyway.


Cheers

James


Chris Djali / AnyOldName3

Jul 2, 2022, 1:10:48 PM
to OpenSceneGraph Users
Hi James,

While dynamic drawables and statesets are usually the easiest option, there are other approaches that can get the same results without blocking the next frame. For example, OpenMW double buffers drawables and statesets that would otherwise need to be dynamic, and selects alternating ones for each frame via cull/update callbacks etc.
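
The alternating-slot idea can be sketched in plain C++ as below. This is not real OSG or OpenMW code: VertexData, FrameBuffered, writeSlot and readSlot are hypothetical names for illustration, and in an actual scene graph the slot selection would live in the update/cull callbacks. It assumes no more frames are in flight than there are slots (two by default).

```cpp
#include <array>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the data a dynamic drawable would normally own.
struct VertexData {
    std::vector<float> positions;
};

// Keeps N copies of T and picks one by frame number, so the update for
// frame n+1 never touches the copy that the draw for frame n may still
// be reading.
template <typename T, std::size_t N = 2>
class FrameBuffered {
    static_assert(N >= 2, "need at least two slots to overlap update and draw");

public:
    // Called from the update/cull side of frame `frameNumber`.
    T& writeSlot(unsigned frameNumber) { return slots_[frameNumber % N]; }

    // Called from the draw side of the same frame; returns the same slot
    // that was written for that frame.
    const T& readSlot(unsigned frameNumber) const
    {
        return slots_[frameNumber % N];
    }

private:
    std::array<T, N> slots_{};
};
```

Writing frame 1 through writeSlot(1) leaves the data that readSlot(0) returns untouched, which is what lets the next frame's update proceed while the previous frame is still drawing. The trade-off is extra memory and the need to refill each slot every time it comes round.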

Hope this helps,

Chris