[osg-users] Rendering performance issues


Bob Balfour

Jul 7, 2008, 6:37:23 PM
to OpenSceneGraph Users
I'm experiencing some performance peculiarities that I don't quite understand. My platform is an HP Blackbird, Vista OS, 4-CPU, 2 NVidia 8800, OSG2.4.

I have attached 4 timing images:

The first image (vsyncOFF-1GPU.jpg) shows the timing rendering 1 slave camera on an extended (2880x900) dual-monitor on a single Nvidia card.  Note the 211Hz rendering rate with VSync off and the substantial Draw/GPU overlap.  [Note that in all of these timing tests none of the CPUs appeared more than 50% loaded.]  This is the type of performance I was anticipating.

Now all I did was turn on VSync.  The second image (vsyncON-app-long.jpg) now shows a 60Hz rendering rate as expected (with a delay after the GPU waiting for VSync), but no Draw/GPU overlap; there's even a slight gap between the two. Why is this different now?

Now I flew through the scene to a point where the Draw became quite small. The third image (vsyncON-app-short.jpg) shows Cull, Draw and GPU all < 1.0 ms, but I'm only achieving a 46Hz rendering rate because of the considerable gap between Draw and GPU.  Why is the GPU now being delayed so much after Draw?? It's almost like the GPU is stuck starting out there mid-frame?

Now I turned VSync back OFF, but this time rendered TWO slave cameras (each 2880x900, one on each Nvidia card), and the same effect appeared (see the fourth image, vsyncOFF-2GPUs.jpg): the GPUs are delayed to mid-frame, and there is quite a delay after the GPUs complete (which should not be there with VSync OFF).  Because of these substantial delays, which I don't understand, performance has degraded considerably from my initial test image, which had good overlap and no delays.

Is there any explanation for this, and can the delays be eliminated and performance improved?

Also, is there a purpose/need for VSync using LCD flatpanel monitors or DLP projectors?

Thanks.

Bob.
--
Robert E. Balfour, Ph.D.
Exec. V.P. & CTO,  BALFOUR Technologies LLC
960 So. Broadway, Suite 108, Hicksville NY 11801
Phone: (516)513-0030  Fax: (516)513-0027  email: b...@BAL4.com
"Solutions in four dimensions" with fourDscape®
vsyncOFF-1GPU.jpg
vsyncON-app-long.jpg
vsyncON-app-short.jpg
vsyncOFF-2GPUs.jpg

James Killian

Jul 7, 2008, 7:12:07 PM
to OpenSceneGraph Users

I'm glad someone else is getting the same results we are.  I've seen this same kind of gap with our game; it comes and goes, and I have not been able to nail down the cause yet.  My guess is that there is some external CPU I/O activity causing it, especially if the graph display works like a profiler class.  The last time I profiled (I think around revision 8482) the threads looked very clean, especially with minimal critical-section usage.

I guess the question for you is: can you profile your external CPU activity?  I use QueryPerformanceCounter() and clock the time used by external code.  I know, for instance, that collision detection has been expensive and thus causes the gaps.  Aside from this, if you are on a Win32 platform you can use Sysinternals tools like Process Monitor (the successor to Filemon) to ensure your app is not doing some unexpected I/O operation.  I know, for instance, that the trailed-smoke particle effect will continuously reload the image file every time it is used; I wouldn't have found this if it weren't for Filemon.  Perhaps some other implicit I/O surprise is happening to you.

I hope you can rule out the external workload, because perhaps there is some performance change that we can all benefit from. :)

_______________________________________________
osg-users mailing list
osg-...@lists.openscenegraph.org
http://lists.openscenegraph.org/listinfo.cgi/osg-users-openscenegraph.org

Paul Martz

Jul 7, 2008, 10:00:26 PM
to OpenSceneGraph Users
> The first image (vsyncOFF-1GPU.jpg) shows the timing rendering 1 slave camera on an extended (2880x900) dual-monitor on a single Nvidia card.  Note the 211Hz rendering rate with VSync off and the substantial Draw/GPU overlap.  [Note that in all of these timing tests none of the CPUs appeared more than 50% loaded.]  This is the type of performance I was anticipating.
>
> Now all I did was turn on VSync.  The second image (vsyncON-app-long.jpg) now shows a 60Hz rendering rate as expected (with a delay after the GPU waiting for VSync), but no Draw/GPU overlap; there's even a slight gap between the two. Why is this different now?
I suspect this issue is caused by the graphics card having an upper limit on the number of buffer swaps queued up. Once the limit of queued swaps is reached, the card won't start processing the next frame until it performs the swap for the current frame, thus causing the "gap" that you see. This is, of course, not an issue with vsync disabled.
 
Not sure about the "gap" in the rest of your post; perhaps Robert has more info.
 
I hope that helps,
 
Paul Martz
Skew Matrix Software LLC
+1 303 859 9466

Adrian Egli OpenSceneGraph (3D)

Jul 8, 2008, 2:37:40 AM
to OpenSceneGraph Users
Hi all

I just tested osgviewer with a PagedLOD scene and got some interesting behaviour; maybe you can retest something similar: change the threading mode with 'm'.  (I am working under Windows Vista.)

Under SingleThreaded, the gap becomes much smaller.

adegli

--
********************************************
Adrian Egli
Im001.PNG
Im002.PNG
Im003.jpg

Robert Osfield

Jul 8, 2008, 4:29:22 AM
to OpenSceneGraph Users
Hi Guys,

The gap between draw dispatch (the yellow bar) and draw GPU (the
orange bar) is by its nature variable.  The reason it is so variable
is that I have to estimate the beginning of the draw GPU bar: the
OpenGL GPU stats extension only provides elapsed times, not absolute
times, while the CPU code is all in absolute timings.  As GPU stats
are really useful, the best I could do is take the time at which the
GPU stats were collected (on the next frame), which gives us a time we
know the work was complete by, and then estimate when the GPU work
happened; I can't recall the exact maths off hand.  The bottom line is
that the bar's position is an estimate, but its length is a measurement.
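The estimate Robert describes can be sketched arithmetically. This is a hypothetical reconstruction (he says above that he can't recall the exact maths, so this is not necessarily what osgViewer actually does): given only an elapsed GPU time and the absolute time the stats were collected on the next frame, the latest possible start of the GPU work is the collection time minus the elapsed time.

```cpp
// Hypothetical reconstruction of the GPU-bar placement described above:
// the timer-query extension gives only elapsed GPU time, and the one
// absolute timestamp available is when the stats were collected on the
// next frame. Anchoring the bar so it ends at the collection time gives
// the latest time the work could have started. The bar's length
// (gpuElapsed) is measured; only its position is estimated.
double estimateGpuStart(double statsCollectTime, double gpuElapsed) {
    return statsCollectTime - gpuElapsed;  // latest possible start
}
```

So, for example, a stats-collection time of 33.2 ms with 3.9 ms of measured GPU work would place the bar's start no later than 29.3 ms into the timeline.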

As for the frame rate being awful when running with two GPUs, this is
likely to be a driver/OS issue.

W.r.t. whether you should use vsync with projectors/LCDs: yes, vsync
is still required.  They still read their data via a scan line, so if
the buffer swap happens during the scan then some of the screen will
be from the previous frame and some from the next, or it can even be
several frames' worth in a single refresh if your frame rate is high.
What you'll see is tearing across the screen, especially visible when
you are turning the eye point quickly.

Robert.

Bob Balfour

Jul 8, 2008, 11:28:56 AM
to OpenSceneGraph Users
My main frustration is demonstrated by the second and third timing images I had posted, both with VSync on (so 60Hz max frame rate), with a single GPU.  Both Event/Update timings are trivial.

1. One timing had Cull(2.22)+Draw(3.81)+GPU(3.86)=9.89ms @ 59.99Hz  (good)

Now all I did was trackball the eyepoint to another location in the scene, and got:
2. Cull(0.64)+Draw(0.90)+GPU(0.92)=2.46ms @ 46.46Hz frame rate.

This makes no sense to me.  With 4 times more rendering time(#1) we can achieve max frame rate, but with a very light rendering load(#2), our frame rate is substantially degraded?  How frustrating is that!

W.r.t. Paul's suggestion about queued buffer swaps being the cause of the Draw/GPU gap: I'm not sure I quite understand how or why that would happen, nor, if it is a potential cause, why moving the eyepoint in my example above to a lighter rendering load would trigger it.

Clearly something is causing a large delay, but I'm still quite confused as to what it might be, how to track it down, and how to eliminate/avoid it.

Bob.


Paul Martz

Jul 8, 2008, 12:01:04 PM
to OpenSceneGraph Users
 
> My main frustration is demonstrated by the second and third timing images I had posted, both with VSync on (so 60Hz max frame rate), with a single GPU.  Both Event/Update timings are trivial.
>
> 1. One timing had Cull(2.22)+Draw(3.81)+GPU(3.86)=9.89ms @ 59.99Hz  (good)
>
> Now all I did was trackball the eyepoint to another location in the scene, and got:
> 2. Cull(0.64)+Draw(0.90)+GPU(0.92)=2.46ms @ 46.46Hz frame rate.
>
> This makes no sense to me.  With 4 times more rendering time(#1) we can achieve max frame rate, but with a very light rendering load(#2), our frame rate is substantially degraded?  How frustrating is that!
 
Have you tried adding your own timing code to verify that you really are no longer getting 60Hz framerate? ("Trust but verify" as Reagan once said.)
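A minimal independent meter is enough for that check. This is a sketch of my own (the class name and shape are invented, not an OSG API): feed it one timestamp per frame from any monotonic clock and compare its average against the stats HUD.

```cpp
#include <cstddef>

// Minimal frame-rate meter for cross-checking a stats display.
// Call tick() once per frame with a timestamp in seconds taken from
// any monotonic clock (e.g. std::chrono::steady_clock).
class FrameRateMeter {
public:
    void tick(double seconds) {
        if (count_ > 0) elapsed_ += seconds - last_;
        last_ = seconds;
        ++count_;
    }
    // Average frames per second over all samples seen so far.
    double fps() const {
        return (count_ > 1 && elapsed_ > 0.0)
                   ? static_cast<double>(count_ - 1) / elapsed_
                   : 0.0;
    }
private:
    double last_ = 0.0;
    double elapsed_ = 0.0;
    std::size_t count_ = 0;
};
```

Printing fps() once a second or so gives a number that does not depend on how the OSG stats estimate and place their bars.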
 
> W.r.t. Paul's suggestion about queued buffer swaps being the cause of the Draw/GPU gap: I'm not sure I quite understand how or why that would happen, nor, if it is a potential cause, why moving the eyepoint in my example above to a lighter rendering load would trigger it.
I can try to explain it better: the graphics hardware has a FIFO for receiving input.  When the upper limit on queued swaps is reached, the hardware blocks the OS from putting more into the FIFO until one of the swaps already in the FIFO has been processed.  Only then will the FIFO accept new data.
 
Illustration: Assume the hardware has a max queued swap limit of 2 swaps, and the application is running at an ungodly fast pace...
 
  App sends frame 0 data
    Hardware starts processing frame 0 data
  App issues swap for frame 0
  App sends frame 1 data
  App issues swap for frame 1
  App is now blocked because 2 swaps are queued
    Hardware executes swap to display frame 0
    Hardware starts processing frame 1 data
  App sends frame 2 data
  App issues swap for frame 2
  App is now blocked because 2 swaps are queued
    Hardware executes swap to display frame 1
    Hardware starts processing frame 2 data
  App sends frame 3 data
  App issues swap for frame 3
  App is now blocked because 2 swaps are queued
    Hardware executes swap to display frame 2
    Hardware starts processing frame 3 data
 
Etc.
Thus, there will be a gap between when OSG sends the data, and the hardware begins processing it.
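The walkthrough above can also be written as a toy model (purely illustrative; the real behaviour lives inside the driver/OS, and the type and names here are invented): the app blocks on the (N+1)th swap until the hardware retires one.

```cpp
#include <cstddef>
#include <deque>
#include <vector>

// Toy model of the queued-swap behaviour: the app may have at most
// maxQueuedSwaps swaps outstanding; issuing one more forces it to wait
// until the hardware retires a swap. Invented names, not driver code.
struct SwapQueueModel {
    std::size_t maxQueuedSwaps;
    std::deque<int> pending;     // frames whose swap is queued
    std::vector<int> displayed;  // frames the hardware has shown
    int blockedIssues = 0;       // how often the app had to stall

    void issueSwap(int frame) {
        if (pending.size() == maxQueuedSwaps) {
            ++blockedIssues;    // "App is now blocked" in the trace above
            retireOneSwap();    // wait for the hardware to catch up
        }
        pending.push_back(frame);
    }
    // Hardware side: execute the oldest queued swap.
    void retireOneSwap() {
        displayed.push_back(pending.front());
        pending.pop_front();
    }
};
```

Issuing swaps for frames 0-3 against a 2-deep queue blocks twice, matching the trace: each block is the gap between when the app sends data and when the hardware starts on it.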
 
I'm not sure this has anything to do with your third case, where the rendering load is lighter yet the framerate appears to drop. I believe these are two separate issues.

Todd J. Furlong

Jul 8, 2008, 1:03:36 PM
to OpenSceneGraph Users
Not sure if this is related, but...

When timing an OpenGL application, we always make sure to do a
glFinish() before recording the time at the end of a draw.  glFinish()
waits for the OpenGL commands in the pipeline to complete before
returning, and so gives a truer measurement of the time.  Without
waiting, draw timing tends to be reported as lower than it really is
until the FIFO buffer gets filled.
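The pattern looks roughly like this as a self-contained sketch. The finish callable stands in for glFinish() so the snippet does not need a GL context; in real code you would pass glFinish itself.

```cpp
#include <chrono>
#include <functional>

// Time a draw, forcing completion before stopping the clock. In real
// OpenGL code 'finish' would be glFinish(); it is a parameter here so
// this illustration stays self-contained and context-free.
double timedDrawMs(const std::function<void()>& draw,
                   const std::function<void()>& finish) {
    const auto t0 = std::chrono::steady_clock::now();
    draw();    // dispatch: may return before the GPU is done
    finish();  // drain the pipeline so queued work is charged here
    const auto t1 = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(t1 - t0).count();
}
```

Passing a no-op as finish reproduces the optimistic timing described above; passing glFinish gives the truer number.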

-Todd


--
Todd J. Furlong
Inv3rsion, LLC
http://www.inv3rsion.com

Robert Osfield

Jul 8, 2008, 1:10:17 PM
to OpenSceneGraph Users
Hi Todd,

On Tue, Jul 8, 2008 at 6:03 PM, Todd J. Furlong <to...@inv3rsion.com> wrote:
> Not sure if this is related, but...
>
> When timing an OpenGL application, we always make sure to do a glFinish()
> before recording the time at the end of a draw. The glFinish waits for the
> OpenGL commands in the pipeline to complete before returning and gives a
> truer measurement of the time. Without waiting, draw timing tends to be
> reported as lower than it really is until the FIFO buffer gets filled.

This is typically true when trying to time OpenGL, but when the
OpenGL timing stats extension is supported you can put timing markers
directly into the FIFO and get back the actual elapsed time between
points for what is happening down on the GPU.  This extension means
you can get a clear picture of what's happening on the GPU without
needing to flush/finish.  osgViewer supports this extension, and the
stats that this thread is covering have these GPU stats in place.

Robert.

Alf Inge Iversen

Jul 18, 2008, 11:54:48 AM
to OpenSceneGraph Users
Bob Balfour wrote:

> Why is the GPU now being delayed so much after
> Draw?? It's almost like the GPU is stuck starting out there mid-frame?

Hi,
We are dealing with what looks like a closely related problem, though we would actually prefer a kind of delay after Draw:

Our application is a real-time visualization process, and a constant frame rate of 60 Hz is an absolute necessity.  Most of the time the load is light enough that 60 Hz should be well maintained, but we still observe severe glitches in the visualization, especially when running a mirror view as a second channel embedded in the main channel.

Closer investigation (visually observing the OSG stats, together with printouts to the console window to detect lost real-time position data) shows that glitches occur even when no datasets are missing and the computing time is well inside the available time slot.  When observing the stats, we realized that the GPU execution sometimes happens within the current frame (see attached image GPU-early.jpg) and sometimes in the next frame (see GPU-late.jpg).  We could not identify why the GPU execution is sometimes postponed.  As you can see from the images, the GPU may be postponed even when the load is minimal.  We observed significant glitches in the visualization at the moments the execution shifted from one frame to the other or vice versa.

Some latency can be accepted, so to ensure a stable image it should be possible to lock the start of the GPU processing to the vertical sync.  It would then process the data delivered by the Draw of the previous frame.  This would give a stable image, in addition to increasing the available processing time (a full 16 ms for Update, Cull and Draw, and another full 16 ms for the GPU).

Is there a way to control the execution start time of the GPU from the application?


-Alf Inge

----------------------------------------
Mr. Alf Inge Iversen, VP Engineering
AutoSim AS, Strandveien 106, 9006 Tromsø
Visit us at http://www.autosim.no

GPU-early.JPG
GPU-late.JPG

Robert Osfield

Jul 18, 2008, 1:31:29 PM
to OpenSceneGraph Users
Hi Alfe,

Is there a reason why you are running single threaded?  This would be
the obvious improvement to make.
Robert.

Alf Inge Iversen

Jul 18, 2008, 2:08:54 PM
to OpenSceneGraph Users
Hi Robert,

We upgraded our software to OSG 2.4 just recently (three weeks ago); in 2.2 we were not able to get multithreading working properly.  It will still be some time until we release the new version to our customers, as the release has to be synchronized with some other modules.

However, even with multithreading, how can we ensure that the GPU execution does not shift between frames?  If you look at the two images I attached to my previous post, you will see that in the low-load situation the GPU processing is done in the next frame even though there is more than enough time within the current frame, while in the more heavily loaded situation the GPU processing is done in the current frame even though there is barely time.

We need to be sure that the GPU executes every frame, and the only way I can think of is to synchronize the GPU to the vertical sync signal.  It is not a good idea to wait until culling has ended and then trust that there will be enough time to complete the GPU processing within the same frame.


Thanks,
-Alf Inge

Robert Osfield

Jul 18, 2008, 4:03:00 PM
to OpenSceneGraph Users
Hi Alfe,

Multi-threading is really out of scope for this thread, so I suggest
you start a thread on that particular topic.  Also, please specify
viewer usage, OS version, OSG version, hardware available, etc.

Robert.

Alf Inge Iversen

Jul 18, 2008, 5:01:37 PM
to OpenSceneGraph Users
Hi Robert,

The question I have is not about multithreading, but about how to
ensure regular execution of the GPU.  Fluctuating GPU execution is our
problem, which is closely related to the problem first described in
this thread.  You mentioned multithreading as a possible solution,
which I think is not enough to ensure stable GPU execution.

But you are right, I will start another thread on this particular issue.

Regards,
Alf Inge


Robert Osfield wrote:

> Multi-threading is really out of scope for this thread, so I suggest
> you start a thread on that particular topic.  Also, please specify
> viewer usage, OS version, OSG version, hardware available, etc.
>
> Robert.
>
