Robert E. Balfour, Ph.D. Exec. V.P. & CTO, BALFOUR Technologies LLC 960 So. Broadway, Suite 108, Hicksville NY 11801 Phone: (516)513-0030 Fax: (516)513-0027 email: b...@BAL4.com "Solutions in four dimensions" with fourDscape®
I guess the question for you is: can you profile your external CPU activity?
I use QueryPerformanceCounter() and clock the time spent in external code. I
know, for instance, that collision detection has been expensive and thus
causes the gaps. Aside from this, if you are on a Win32 platform you can use
Sysinternals tools like Process Monitor (the successor to Filemon) to
ensure your app is not doing some unexpected I/O operation. I know, for
instance, that the trailed smoke particle effect will continuously reload the
image file every time it is used... I wouldn't have found this if it wasn't
for Filemon. Perhaps some other implicit I/O surprise is
happening to you.
I hope you can rule out the external workload, because perhaps there is some
performance change that we all can benefit from. :)
_______________________________________________
osg-users mailing list
osg-...@lists.openscenegraph.org
http://lists.openscenegraph.org/listinfo.cgi/osg-users-openscenegraph.org
The first image (vsyncOFF-1GPU.jpg) shows the timing rendering 1 slave camera on an extended (2880x900) dual-monitor on a single Nvidia card. Note the 211Hz rendering rate with VSync off and the substantial Draw/GPU overlap. [Note that in all of these timing tests none of the CPUs appeared more than 50% loaded.] This is the type of performance I was anticipating.
Now all I did was turn on VSync. The second image (vsyncON-app-long.jpg) now shows a 60Hz rendering rate as expected (with a delay after GPU waiting for VSync), but no Draw/GPU overlap; there is even a slight gap between the two. Why is this different now?
The gap between draw dispatch (the yellow bar) and draw GPU (the
orange bar) is by its nature variable. The reason it is so variable
is that I have to estimate the beginning of the draw GPU bar: the OpenGL
GPU stats extension only provides elapsed times, it doesn't provide any
absolute times, while the CPU code is all timed in absolute
terms. As the GPU stats are really useful, the best I could do was take
the time at which the GPU stats were collected (on the next frame), which
gives us a time we know the work was complete by, and then estimate when the
GPU work happened; I can't recall the exact maths offhand. The bottom
line is that the bar's position is an estimate, but its length is measured.
As for the frame rate being awful when running with two GPUs, this is
likely to be a driver/OS issue.
W.r.t. whether you should use vsync when you are using
projectors/LCDs: yes, vsync is still required. They still read their
data via a scan line, so if the buffer swap happens during the scan then
some of the screen will be from the previous frame and some from the
next, or a single frame can even show several frames' worth if your
frame rate is high. What you'll see is tearing across the screen,
especially visible when you are turning the eye point quickly.
Robert.
My main frustration is demonstrated by the second and third timing images I posted, both with VSync on (so a 60Hz max frame rate) and a single GPU. Both Event/Update timings are trivial.
1. One timing had Cull(2.22)+Draw(3.81)+GPU(3.86)=9.89ms @ 59.99Hz (good)
Now all I did was trackball the eyepoint to another location in the scene, and got:
2. Cull(0.64)+Draw(0.90)+GPU(0.92)=2.46ms @ 46.46Hz frame rate.
This makes no sense to me. With 4 times more rendering time (#1) we can achieve the max frame rate, but with a very light rendering load (#2) our frame rate is substantially degraded? How frustrating is that!
W.r.t. Paul's suggestion that queued buffer swaps cause the Draw/GPU gap: I'm not sure I quite understand how/why that would happen, and if that is a potential cause, I don't see why moving the eye point in my example above to a lighter rendering load would trigger it.
When timing an OpenGL application, we always make sure to do a
glFinish() before recording the time at the end of a draw. The glFinish
waits for the OpenGL commands in the pipeline to complete before
returning and gives a truer measurement of the time. Without waiting,
draw timing tends to be reported as higher than it really is until the
FIFO buffer gets filled.
-Todd
--
Todd J. Furlong
Inv3rsion, LLC
http://www.inv3rsion.com
On Tue, Jul 8, 2008 at 6:03 PM, Todd J. Furlong <to...@inv3rsion.com> wrote:
> Not sure if this is related, but...
>
> When timing an OpenGL application, we always make sure to do a glFinish()
> before recording the time at the end of a draw. The glFinish waits for the
> OpenGL commands in the pipeline to complete before returning and gives a
> truer measurement of the time. Without waiting, draw timing tends to be
> reported as higher than it really is until the FIFO buffer gets filled.
This is typically true when trying to time OpenGL, but when the
OpenGL timing stats extension is supported you can put timing markers
directly into the FIFO and get back the actual elapsed time between points
for what is happening down on the GPU. This extension means you can
get a clear picture of what's happening on the GPU without needing to
flush/finish. osgViewer supports this extension, and the stats that
this thread is covering have these GPU stats in place.
Robert.
> Why is the GPU now being delayed so much after
> Draw?? It's almost like the GPU is stuck starting out there mid-frame?
Hi,
We are dealing with a problem that looks closely related, though we would prefer a kind of delay after draw:
Our application is a real-time visualization process, and a constant frame rate of 60 Hz is an absolute necessity. Most of the time the load is not big, so 60 Hz should be well maintained, but we still observe severe glitches in the visualization, especially when running a mirror view as a second channel embedded in the main channel.
Closer investigation (visually observing the OSG stats, together with printouts to the console window to detect lost real-time position data) shows that glitches occur even when no datasets are missing and the computing time is well inside the available time slot. When observing the stats, we realized that the GPU execution is sometimes performed within the current frame (see attached image GPU-early.jpg) and sometimes in the next frame (see GPU-late.jpg). We could not identify why the GPU execution is sometimes postponed. As you can see from the images, the GPU may be postponed even when the load is minimal. We observed significant glitches in the visualization at the moments the execution shifted from one frame to the other or vice versa.
Some latency can be accepted, so to ensure a stable image, it should be possible to lock the start of the GPU processing to the vertical sync. It would then process the data delivered from the Draw of the previous frame. This would give a stable image, in addition to increasing the available processing time (a full 16 ms for Update, Cull and Draw, and another full 16 ms for the GPU).
Is there a way to control the execution start time of the GPU from the application?
-Alf Inge
----------------------------------------
Mr. Alf Inge Iversen, VP Engineering
AutoSim AS, Strandveien 106, 9006 Tromsø
Visit us at http://www.autosim.no
Is there a reason why you are running single threaded? This would be the
obvious improvement you could make.
Robert.
We just recently (three weeks ago) upgraded our software to OSG 2.4; in 2.2 we were not able to get multithreading to work properly. There will still be some time until we release the new version to our customers, as the release has to be synchronized with some other modules.
However, even with multithreading, how can we ensure that the GPU execution is not shifted between frames? If you look at the two images I attached to my previous post, you will see that in the low-load situation the GPU processing is done on the next frame even though there is more than enough time within the current frame, while in the more loaded situation the GPU processing is done on the current frame even though there is barely time.
We need to be sure that the GPU is executed every frame, and the only way I can think of is to synchronize the GPU to the vertical sync signal. It is not a good idea to wait until culling has ended and then trust that there will be enough time to complete the GPU processing within the same frame.
Thanks,
-Alf Inge
Multi-threading is really out of scope of this thread so I suggest you
start up a thread about this particular topic. Also please specify
viewer usage/OS version/OSG version/hardware available etc.
Robert.
The question I have is not about multithreading, but how to ensure
regular execution of the GPU. Fluctuating GPU execution is our problem,
which is closely related to the problem first described in this thread.
You mentioned multithreading as a possible solution, which I think is
not enough on its own to ensure stable GPU execution.
But you are right, I will start another thread on this particular issue.
Regards,
Alf Inge