[osg-users] RTT slave views and multi-threading

Tugkan Calapoglu

Jan 13, 2010, 6:34:11 AM
to OpenSceneGraph Users
Hi All,

I am using a slave view to render the scene to a texture. Initially
I tried a camera node, but this did not work well due to a problem with
LiSPSM shadows, and it was suggested that I use RTT slave views instead.

My setup is as follows: There is a single main view and I attach a slave
view to it. This slave view is attached with addSlave( slave , false );
so that it does *not* automatically use the master scene.

I attach a texture to the slave view and make my scene a child of this
view. I attach a screen-aligned quad to the main view. This quad
displays the RTT texture from the slave view.

Now I have a threading problem which can be seen on the snapshot I
attached. There are two issues:
1- The main view (cam1) has a very large draw time even though it only
renders the screen aligned quad. I double checked to see whether it also
renders the actual scene but this is not the case.

2- Slave view does not run cull and draw in parallel. Cull and draw do
run in parallel if they are not rendered with the slave view. Moreover,
if I change the render order of the slave camera from PRE_RENDER to
POST_RENDER it is ok.

I could simply use POST_RENDER, but I am afraid it introduces an extra
frame of latency. If I render the screen-aligned quad first and the
scene later, then what I see on the quad is the texture from the previous
frame (right?).

Any ideas?

cheers,
tugkan

[Attachment: RTTSlaveView1.jpg]

Robert Osfield

Jan 13, 2010, 7:04:14 AM
to OpenSceneGraph Users
Hi Tugkan,

The osgdistortion example works a bit like what you are describing.
Could you try it to see what performance it's getting?

As for general notes about threading: if you are working on the same
graphics context, as you are, then all the draw dispatch and the GPU
draw can only be done by a single graphics thread, so there is little
opportunity to make it more parallel without using another graphics
card/graphics context and interleaving of frames.

As for why the second camera is very expensive on draw dispatch, this
suggests to me that it's blocking, either because the OpenGL FIFO is
full or because it contains a GL read-back operation of some kind.

Robert.

_______________________________________________
osg-users mailing list
osg-...@lists.openscenegraph.org
http://lists.openscenegraph.org/listinfo.cgi/osg-users-openscenegraph.org

Tugkan Calapoglu

Jan 13, 2010, 7:20:49 AM
to OpenSceneGraph Users
Hi Robert,

> Hi Tugkan,
>
> The osgdistortion example works a bit like what you are describing,
> could you try this to see what performance it's getting.
>
osgdistortion's threading model is set to SingleThreaded in the code. I
changed it to DrawThreadPerContext and now I can see that draw starts
after cull, i.e. they do not run in parallel.

> As for general notes about threading, if you are working on the same
> graphics context as you are then all the draw dispatch and the draw
> GPU can only be done by a single graphics thread so there is little
> opportunity to make it more parallel without using another graphics
> card/graphics context and interleaving of frames.
>

Sure. I do not expect two cameras to render in parallel onto a single
window, but cull and draw of a given camera should run in parallel.
Indeed, they normally do so with the exact same scene and application. It
breaks only if the second camera (the slave) has PRE_RENDER render order.


tugkan


--
Tugkan Calapoglu

-------------------------------------
VIRES Simulationstechnologie GmbH
Oberaustrasse 34
83026 Rosenheim
Germany
phone +49.8031.463641
fax +49.8031.463645
email tug...@vires.com
internet www.vires.com
-------------------------------------
Sitz der Gesellschaft: Rosenheim
Handelsregister Traunstein HRB 10410
Geschaeftsfuehrer: Marius Dupuis
Wunibald Karl
-------------------------------------

Robert Osfield

Jan 13, 2010, 7:52:57 AM
to OpenSceneGraph Users
Hi Tugkan,

On Wed, Jan 13, 2010 at 12:20 PM, Tugkan Calapoglu <tug...@vires.com> wrote:
> Sure. I do not expect that two cameras render in parallel onto a single
> window, but cull and draw of a certain camera should run parallel.
> Indeed they do so normally with the exact same scene and application. It
> breaks only if the second camera (the slave) has PRE_RENDER render order.

Cull and draw can only run in parallel once all the dynamic geometry
has been dispatched; otherwise the draw would be dispatching data that
is being modified by the next frame's update and cull traversals.
Perhaps you have some dynamic geometry or StateSets that are holding
back the next frame.

Regardless of the threading of cull, your problem is draw dispatch, not
cull. You need to look into why the draw dispatch on the second draw
is taking so long. Please look at my last email.

Robert.

Wojciech Lewandowski

Jan 13, 2010, 8:35:25 AM
to OpenSceneGraph Users
Hi Tugkan,

Robert mentioned a lengthy read operation. It may be related to the read buffer
operation that is used to compute the shadow volume in
LightSpacePerspectiveShadowMapDB. If your slave view uses
osgShadow::LightSpacePerspectiveShadowMapDB, then you could check whether
osgShadow::LightSpacePerspectiveShadowMapCB (the cull bounds flavour) has the
same problem.

I am aware of the LightSpacePerspectiveShadowMapDB glReadBuffer limitation, but
I could not find a quick and easy-to-implement workaround that would do this
without scanning the image on the CPU. I allocate a small 64x64 texture and render
the scene there, then read it into CPU memory and use the CPU to scan the pixels to
optimize the shadow volume from the depths and pixel locations stored in this
prerender image.
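
A minimal sketch of the kind of CPU-side scan described above, assuming the
read-back image is a row-major array of depth values in [0, 1] where 1.0 means
nothing was drawn. DepthBounds and scanDepthImage are hypothetical names used
for illustration only; they are not taken from osgShadow, whose actual
implementation may differ.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical result type: pixel-space bounds and depth range of all
// pixels that were covered by geometry.
struct DepthBounds {
    bool  valid = false;                 // false if no pixel was covered
    int   minX = 0, maxX = 0, minY = 0, maxY = 0;
    float minDepth = 0.0f, maxDepth = 0.0f;
};

// Scan a width x height depth image stored row by row.
// Pixels with depth >= background are treated as empty (assumption).
DepthBounds scanDepthImage(const std::vector<float>& depth,
                           int width, int height,
                           float background = 1.0f)
{
    DepthBounds b;
    for (int y = 0; y < height; ++y) {
        for (int x = 0; x < width; ++x) {
            const float d = depth[static_cast<std::size_t>(y) * width + x];
            if (d >= background) continue;   // nothing rendered here

            if (!b.valid) {
                // First covered pixel initialises every bound.
                b.valid = true;
                b.minX = b.maxX = x;
                b.minY = b.maxY = y;
                b.minDepth = b.maxDepth = d;
            } else {
                b.minX = std::min(b.minX, x);
                b.maxX = std::max(b.maxX, x);
                b.minY = std::min(b.minY, y);
                b.maxY = std::max(b.maxY, y);
                b.minDepth = std::min(b.minDepth, d);
                b.maxDepth = std::max(b.maxDepth, d);
            }
        }
    }
    return b;
}
```

At 64x64 the loop visits only 4096 pixels, so the scan itself is likely to be
cheap compared with the read back discussed in this thread.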

Wojtek

Tugkan Calapoglu

Jan 14, 2010, 6:39:58 AM
to OpenSceneGraph Users
Hi Robert, Wojciech

my initial guess was that the lengthy draw dispatch of the master view
and the failing cull & draw parallelism were the result of the same problem.

However, they actually seem to be different problems and I'll focus
first on the draw dispatch.

The master camera draws only a screen aligned quad and nothing else
(scene with shadows is rendered by the slave camera). Also no dynamic
geometry. But, I indeed have a read buffer operation: a glGetTexImage
call in the postdraw callback of the master camera. This call takes ~12 ms.

I read back a small texture that is rendered by a camera in the current
frame. The camera uses FRAME_BUFFER_OBJECT as render target implementation.

It looks like using glReadPixels to read directly from the FBO is the
advised method for getting data back to the system memory. How do I get
the FBO that the camera is rendering to? Or, is there a better method to
get the texture data back to the sysmem?


cheers,
tugkan

J.P. Delport

Jan 14, 2010, 6:57:03 AM
to OpenSceneGraph Users
Hi Tugkan,

Tugkan Calapoglu wrote:
> Hi Robert, Wojciech
>
> my initial guess was that the lengthy draw dispatch of the master view
> and failing cull & draw parallelism was the result of the same problem.
>
> However, they actually seem to be different problems and I'll focus
> first on the draw dispatch.
>
> The master camera draws only a screen aligned quad and nothing else
> (scene with shadows is rendered by the slave camera). Also no dynamic
> geometry. But, I indeed have a read buffer operation: a glGetTexImage
> call in the postdraw callback of the master camera. This call takes ~12 ms.
>
> I read back a small texture that is rendered by a camera in the current
> frame. The camera uses FRAME_BUFFER_OBJECT as render target implementation.
>
> It looks like using glReadPixels to read directly from the FBO is the
> advised method for getting data back to the system memory. How do I get
> the FBO that the camera is rendering to? Or, is there a better method to
> get the texture data back to the sysmem?
>

The simplest approach is to attach an osg::Image to the RTT (FBO) camera. See
the attach method of osg::Camera. I think there is an example in osgprerender.

Also see here:
http://thread.gmane.org/gmane.comp.graphics.openscenegraph.user/52651
and
http://thread.gmane.org/gmane.comp.graphics.openscenegraph.user/53432

rgds
jp

--
This message is subject to the CSIR's copyright terms and conditions, e-mail legal notice, and implemented Open Document Format (ODF) standard.
The full disclaimer details can be found at http://www.csir.co.za/disclaimer.html.

This message has been scanned for viruses and dangerous content by MailScanner,
and is believed to be clean. MailScanner thanks Transtec Computers for their support.

Tugkan Calapoglu

Jan 14, 2010, 7:00:38 AM
to OpenSceneGraph Users
hi Jp,

unfortunately that method is easy but very slow. I think it also uses
glGetTexImage.

cheers,
tugkan

Robert Osfield

Jan 14, 2010, 7:13:21 AM
to OpenSceneGraph Users
Hi Tugkan,

On Thu, Jan 14, 2010 at 12:00 PM, Tugkan Calapoglu <tug...@vires.com> wrote:
> unfortunately that method is easy but very slow. I think it also uses
> glGetTexImage.

Operations like glReadPixels and glGetTexImage involve the FIFO
being flushed and the data being copied back into main memory. These two
things together make them slow, and there isn't much you can do about it
directly.

The best way to deal with the high cost of these operations is to
avoid them completely. Try to use algorithms that render to
texture using FBOs and read these textures directly in other
shaders. Never try to copy the results back to the CPU/main memory.
This does force you to do more work on the GPU and rely on more
complex shaders, but in the end it means that you don't have to force a
round trip to the GPU.

Robert.

J.P. Delport

Jan 14, 2010, 7:21:59 AM
to OpenSceneGraph Users
Hi,

Tugkan Calapoglu wrote:
> hi Jp,
>
> unfortunately that method is easy but very slow. I think it also uses
> glGetTexImage.

You might be surprised. Have you read the threads I linked to? Attach
uses glReadPixels (while doing the FBO rendering, so you don't have to
bind anything yourself later) and in many cases this is the fastest. If
you want something more elaborate, such as async PBO use, see the
osgscreencapture example. Also, test whatever you use for your setup,
all sorts of things can change the efficiency of reading data back to
CPU. YMMV.

Like Robert said tho, not reading anything back to CPU if you can help
it is the best.

rgds
jp

Tugkan Calapoglu

Jan 14, 2010, 7:31:00 AM
to OpenSceneGraph Users
Hi Robert,

I am working on an HDR implementation which should work on multiple
channels. The method I use requires the average luminance of the scene. If I
use different average luminances for different channels, the colors will
simply not match. E.g. in a tunnel, the front channel will see the tunnel
exit and have a higher average luminance than the side channels, which
only see the dark tunnel walls.

So, I do need a way to collect current average luminances of all
channels and compute a single average that can be used for all (by
channel I mean separate computers that are connected to separate
projectors).
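
One possible way to combine the per-channel values is sketched here, under the
assumption that each channel reports its own average luminance together with
the number of pixels it averaged over. ChannelLuminance and
combinedAverageLuminance are hypothetical names, and how the values travel
between the computers is left out. Some tone-mapping methods use a log-average
rather than the arithmetic mean used in this sketch.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical per-channel report (one per rendering computer).
struct ChannelLuminance {
    double      average;     // average luminance measured on this channel
    std::size_t pixelCount;  // number of pixels that went into the average
};

// Pixel-weighted mean of all channel averages, so that every channel
// can tone-map with the same value. Returns 0 if there are no pixels.
double combinedAverageLuminance(const std::vector<ChannelLuminance>& channels)
{
    double      weightedSum = 0.0;
    std::size_t totalPixels = 0;
    for (const ChannelLuminance& c : channels) {
        weightedSum += c.average * static_cast<double>(c.pixelCount);
        totalPixels += c.pixelCount;
    }
    if (totalPixels == 0) return 0.0;
    return weightedSum / static_cast<double>(totalPixels);
}
```

In the tunnel case, a bright front channel and two dark side channels would
then all use the same combined value instead of three different ones.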

I know that getting data back from GPU is slow but 12ms for a 4x4
texture seems extreme.

glReadPixels seems to be faster, because we are able to make full-screen
grabs (800x600) and still keep 60 Hz (even without PBOs). Some GPGPU people
suggest using glReadPixels to read directly from an FBO rather than
glGetTexImage, so I was wondering if there is a way to obtain the
osg::FBO pointer from the camera?

cheers,
tugkan

Robert Osfield

Jan 14, 2010, 7:44:14 AM
to OpenSceneGraph Users
Hi Tugkan,

On Thu, Jan 14, 2010 at 12:31 PM, Tugkan Calapoglu <tug...@vires.com> wrote:
> I know that getting data back from GPU is slow but 12ms for a 4x4
> texture seems extreme.

It's the flushing of the FIFO that is the problem; that's why it's so
slow, not the data transfer itself. Once you flush the FIFO you lose
the parallelism between the CPU and GPU.

The only way to hide this is to use PBOs to do the read back and do
the actual read back on the next frame rather than in the current
frame. In your case you might be able to get away with this: a frame's
latency might not be a big issue if you can keep to a solid 60Hz and
the values you are reading back aren't changing drastically between
frames.
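
The effect of reading back one frame late can be illustrated without any
OpenGL. This is only a sketch: the two slots stand in for a pair of PBOs,
DelayedReadback is a hypothetical name, and a real implementation would start
an asynchronous glReadPixels into one PBO while mapping the other.

```cpp
#include <cstddef>

// Two-slot ping-pong buffer. Each frame writes into one slot and reads
// the slot that was written on the previous frame.
class DelayedReadback {
public:
    // Call once per frame with the value produced this frame.
    // Returns true and stores the previous frame's value in `out`
    // when one is available; on the very first frame it returns false
    // and leaves `out` untouched.
    bool submit(float current, float& out)
    {
        const std::size_t writeIndex = frame_ % 2;
        const std::size_t readIndex  = 1 - writeIndex;

        value_[writeIndex]  = current;   // stands in for starting the async read
        filled_[writeIndex] = true;

        const bool available = filled_[readIndex];
        if (available) out = value_[readIndex];  // stands in for mapping last frame's PBO

        ++frame_;
        return available;
    }

private:
    float       value_[2]  = {0.0f, 0.0f};
    bool        filled_[2] = {false, false};
    std::size_t frame_     = 0;
};
```

The value obtained in frame N is always the one submitted in frame N-1, which
is the one-frame latency mentioned above.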

Tugkan Calapoglu

Jan 14, 2010, 7:47:13 AM
to OpenSceneGraph Users
Hi Jp,

my initial implementation used an osg::Image attached to a camera and it was
just as slow.

I will see what I can do with PBOs.


regards,
tugkan

Tugkan Calapoglu

Jan 14, 2010, 7:51:53 AM
to OpenSceneGraph Users
Hi Robert,

yes, one frame of latency is OK. Is there an example of PBO usage?
osgscreencapture seems to be about getting the data from the frame buffer,
not from an RTT texture.

tugkan

J.P. Delport

Jan 14, 2010, 7:55:34 AM
to OpenSceneGraph Users
Hi,

Tugkan Calapoglu wrote:
> Hi Jp,
>
> my initial implementation used osg:Image attached to a camera and it was
> just as slow.

OK.

>
> I will see what I can do with PBO's.

There is some code in the threads I linked to earlier that shows how to
get data into a PBO using osg's PixelBufferDataObject. It does not do
the async reading, but see here for more details:
http://www.songho.ca/opengl/gl_pbo.html

regards
jp

Robert Osfield

Jan 14, 2010, 7:57:22 AM
to OpenSceneGraph Users
Hi Tugkan,

On Thu, Jan 14, 2010 at 12:51 PM, Tugkan Calapoglu <tug...@vires.com> wrote:
> yes one frame latency is OK. Is there an example about the PBO usage?
> osgscreencapture seems to be about getting the data from frame buffer
> not from an RTT texture.

osgscreencapture uses a frame of latency when it double buffers the
PBOs. It doesn't matter whether it's the frame buffer or an FBO; the PBO is
only related to memory management.
