Re: [osg-users] Performance drop in 3.6.4 vs 3.5.1


Robert Osfield

Jan 8, 2020, 11:04:50 AM
to OpenSceneGraph Users
Hi Anders,

Try exporting the .obj file to .osgb or .osgt from 3.5.1 and then compare the performance between 3.5.1 and 3.6.4.  This would check whether the .obj loader is a variable.
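For example, with the stock osgconv tool (the model name here is just a placeholder):

> osgconv model.obj model.osgt

Loading the resulting .osgt into both versions then takes the .obj plugin out of the equation.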

As a general comment, frame rates many times higher than vsync should be treated carefully. The frame time is so small that tiny overheads elsewhere can lead to large percentage changes that appear significant: at 2500 fps a frame takes 0.4 ms, so a 0.1 ms overhead is a 25% change, while at 60 fps (16.7 ms per frame) the same overhead is under 1%. Once you move to normal workloads these small differences no longer have an outsized effect.

I would check other OS's to see if the differences apply there too.

Robert.



On Wed, 8 Jan 2020 at 15:56, Anders Backman <and...@cs.umu.se> wrote:
Hi all.

Windows 10.
NVIDIA GeForce RTX 2080.


I recently switched to 3.6.4 from 3.5.1 and noticed a huge drop in performance, especially when running with two separate windows (two applications).

1. I use SingleThreaded mode
2. I use m_viewer->setReleaseContextAtEndOfFrameHint(false);
3. I use window->setSyncToVBlank(false);

The above attributes are quite tightly connected to my issues.
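For reference, a minimal sketch of how these three settings fit together in code (window creation simplified; the model path is a placeholder):

#include <osgDB/ReadFile>
#include <osgViewer/Viewer>

int main()
{
    osgViewer::Viewer viewer;

    // 1. Run the whole frame loop in the calling thread.
    viewer.setThreadingModel(osgViewer::ViewerBase::SingleThreaded);

    // 2. Keep the graphics context current between frames.
    viewer.setReleaseContextAtEndOfFrameHint(false);

    viewer.setSceneData(osgDB::readNodeFile("model.obj"));
    viewer.setUpViewInWindow(0, 0, 1280, 720);
    viewer.realize();

    // 3. Disable vsync on every realized window.
    osgViewer::ViewerBase::Windows windows;
    viewer.getWindows(windows);
    for (osgViewer::ViewerBase::Windows::iterator it = windows.begin();
         it != windows.end(); ++it)
    {
        (*it)->setSyncToVBlank(false);
    }

    return viewer.run();
}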

But first, here is what I get running osgViewer with those settings and a simple .obj file (a couple of hundred triangles):

> osgViewer --window 0 0 1280 720

3.5.1: 1900 fps, Draw 0.08 ms
3.6.4: 2500 fps, Draw 0.05 ms

Now this already shows something different between the two versions, although it is perhaps not so relevant.


But if I start two instances of the viewer at the same time, I get a HUGE difference.

3.5.1: Two windows, both run at ~2000 fps, smoothly.
3.6.4: Two windows, the frame rate varies between 80 and 1200 fps. Animations are not smooth at all (when spinning the model).

I also have a few more issues, but I have not been able to pin them down yet:

I get small objects culled at a certain distance although they were not culled in the previous version.
This might be a change in how bounding volume updates are handled in 3.6.4; I will know more later.

Has anyone else noticed the performance drop/change in 3.6.4?

/Anders



--
__________________________________________
Anders Backman, HPC2N
90187 Umeå University, Sweden
and...@cs.umu.se http://www.hpc2n.umu.se
Cell: +46-70-392 64 67

Robert Osfield

Jan 9, 2020, 7:07:26 AM
to OpenSceneGraph Users
Hi Anders,

I have just run the program and model on my Linux system and it works OK with the 3.6 branch.  I haven't compared to 3.5.1 yet, as this would require a rebuild, and there are other things to look at first.

I had a look at the o.osgt and noticed that there are a large number of triangle strip primitive sets in the geometry.  This is inefficient for modern hardware: display lists hide the inefficiency, but when using VBOs, or VAOs as required by modern OpenGL, performance will be poor, with a high CPU overhead for the amount of vertex/polygon data.

One change between 3.5.1 and 3.6.x was the switch to using VBOs by default, to fit better with OpenGL ES and OpenGL 3.x usage.  This means that VBOs are now preferred over display lists where supported, so poorly optimized meshes become more obvious.
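One way to test whether the VBO default is the variable is to force display lists back on for the loaded scene.  A minimal sketch (the visitor name is made up for illustration):

#include <osg/Drawable>
#include <osg/NodeVisitor>

// Hypothetical visitor: switch every drawable back to display lists so the
// 3.5.1-style defaults can be compared against the 3.6.x VBO defaults.
class ForceDisplayListsVisitor : public osg::NodeVisitor
{
public:
    ForceDisplayListsVisitor() : osg::NodeVisitor(TRAVERSE_ALL_CHILDREN) {}

    virtual void apply(osg::Drawable& drawable)
    {
        drawable.setUseVertexBufferObjects(false);
        drawable.setUseDisplayList(true);
        traverse(drawable);
    }
};

// Usage, on a loaded scene:  ForceDisplayListsVisitor fdlv; node->accept(fdlv);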

To improve performance with modern hardware/OpenGL, the best thing is to run the osgUtil::MeshOptimizers to sort out the meshes.
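A minimal sketch of doing this on a loaded model, assuming the optimizeMesh() helper from the osgUtil/MeshOptimizers header (file names are placeholders):

#include <osgDB/ReadFile>
#include <osgDB/WriteFile>
#include <osgUtil/MeshOptimizers>

int main()
{
    osg::ref_ptr<osg::Node> node = osgDB::readNodeFile("model.obj");
    if (!node) return 1;

    // Converts the many small triangle strips to indexed triangles and
    // reorders vertices for the post-transform cache, which suits VBO/VAO
    // rendering far better than strip-heavy primitive sets.
    osgUtil::optimizeMesh(*node);

    osgDB::writeNodeFile(*node, "model_optimized.osgt");
    return 0;
}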

A snippet of the differences below:

> git diff OpenSceneGraph-3.5.1 Drawable.cpp

Drawable::Drawable()
 {
-    _boundingBoxComputed = false;
-
     // Note, if your are defining a subclass from drawable which is
     // dynamically updated then you should set both the following to
     // to false in your constructor.  This will prevent any display
@@ -286,33 +226,58 @@ Drawable::Drawable()
     _useDisplayList = false;
 #endif
 
+#if 0
     _supportsVertexBufferObjects = false;
+    //_useVertexBufferObjects = false;
     _useVertexBufferObjects = false;
-    // _useVertexBufferObjects = true;
+#else
+    _supportsVertexBufferObjects = true;
+    _useVertexBufferObjects = true;
+#endif
+
+    _useVertexArrayObject = false;
 }



On Thu, 9 Jan 2020 at 11:45, Anders Backman <and...@cs.umu.se> wrote:
The biggest issue here is that two windows (without vsync) now heavily affect each other, which they did not in OSG 3.5.1.  That is certainly a big difference between the two versions.
The performance difference remains after writing an .osgt file in 3.5.1.

Anders Backman

Aug 2, 2021, 8:55:47 AM
to OpenSceneGraph Users
I recently discovered this when running OSG on a multi-GPU machine:

  _affinity = OpenThreads::Affinity(availableProcessors[availableProcessor]);

https://github.com/openscenegraph/OpenSceneGraph/blob/master/src/osgViewer/ViewerBase.cpp#L115

It turns out that even if we tell OSG to use a different GPU for each instance of the application, the app will set affinity to the first CPU.
So even if the computer has 32 cores and 8 GPUs, CPU 0 will be the bottleneck!  Running 8 OSG applications on this monster machine completely kills performance, as core 0 is thrashed by context switching.

This is a side effect we did not expect, and it has really affected performance for us for quite some time.
viewerBaseInit() is a protected method called from the constructor, so we would have to do some workaround to manage this.

Any suggestions on how to turn this into a more generally viable solution?

Cheers,
Anders

"François Cami"

Aug 2, 2021, 11:57:38 AM
to osg-...@googlegroups.com, Anders Backman
On Mon, Aug 2, 2021 at 2:55 PM Anders Backman <backm...@gmail.com> wrote:
>
> I recently discovered this when running OSG on a multi-GPU machine:
>
> _affinity = OpenThreads::Affinity(availableProcessors[availableProcessor]);
>
> https://github.com/openscenegraph/OpenSceneGraph/blob/master/src/osgViewer/ViewerBase.cpp#L115
> [...]
> Any suggestions on how to turn this into a more generally viable solution?

Short of hacking osgViewer, I think you could try setting CPU core affinity from the CLI:

$ taskset -c X myOSGapp

(replace X with the core you want to run on; see TASKSET(1) for more examples).

If this computer has NUMA, maybe NUMACTL(8) will work better for you.
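For example (binding both CPU and memory to NUMA node 0; the node number is a placeholder):

$ numactl --cpunodebind=0 --membind=0 myOSGapp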

Could you report if this works for you?

Regards,
François

Robert Osfield

Aug 3, 2021, 3:50:12 AM
to OpenSceneGraph Users
Hi Anders,

The Affinity class is meant to help control thread affinity and make it possible to override the default behavior.  It's quite a few years since I worked on it, so I can't recall the details.  I will have discussed this work here on osg-users, so searching for osg::Affinity in the archives as well as the code base will likely be helpful.

Robert.

Chris Djali / AnyOldName3

Aug 4, 2021, 6:39:44 PM
to OpenSceneGraph Users
Hi everyone,

In OpenMW we were having issues where fixed affinity gave much worse results than the OS scheduler (e.g. because it stops the process being moved to a colder core when one heats up, and therefore hurts dynamic frequency scaling).  As of OSG 3.5.5, osgViewer::ViewerBase has had the setUseConfigureAffinity function available to disable setting the affinity, and using it works for us.  You need to call it on the viewer before it sets up threading; otherwise it just works.
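A minimal sketch of that call, based on the function named above (everything else is standard viewer setup):

#include <osgViewer/Viewer>

int main()
{
    osgViewer::Viewer viewer;

    // Must be called before the viewer sets up its threads (realize()/run()),
    // so the OS scheduler keeps control of which cores the threads run on.
    viewer.setUseConfigureAffinity(false);

    viewer.setUpViewInWindow(0, 0, 1280, 720);
    return viewer.run();
}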

Hope this helps,

Chris