[osg-users] Optimizing scene structure and geometry

Jean-Sébastien Guay

Jan 21, 2011, 4:13:03 PM
to OpenSceneGraph Users
Hi all,

I thought I had a pretty firm grasp on what to optimize given a certain
set of scene stats, but I've optimized what I can and I'm still getting
little improvement in results. So I'll explain my situation here and
hope you guys have some good suggestions. Sorry if this is a long
message, but I prefer to give all the relevant data now rather than get
asked later.

The whole scene is about a 200m x 200m square (apart from the ocean and
skydome but these are not significant, I have removed them and confirmed
that the situation is the same). The worst case viewpoint is a flying
view where the whole scene could be visible at once. So I need to
balance culling cost with draw cost, since in some views we will see
only part of the scene (so we should be able to cull away at least part
of what's not visible) and in the flying view everything is visible so
we shouldn't waste too much time doing cull tests which we know will not
cull anything.

The other thing is that there are a lot of dynamic objects, so there are
a lot of transforms. But I can't change this, it's part of our simulation.

So, after doing some optimization (removing redundant groups, building
texture atlases where possible, merging geodes and geometry, generating
triangle strips, most of which I did with the osgUtil::Optimizer), I get
the following stats, which I'll talk about a bit later:

Scene stats:
StateSets 1345
Groups 392
Transforms 672
Geodes 992
Geometry 992
Vertices 139859
Primitives 87444

Camera stats:
State graphs 1282
Drawables 2151
PrimitiveSets 73953
Triangles 3538
Tri. Strips 211091
Tri. Fans 16
Quads 11526
Quad Strips 534
Total primitives 226705

And, both in our simulator and in osgViewer, for the same scene and same
viewpoint, I get:

FPS: ~35
Cull: 5.4ms
Draw: 19ms
GPU: 19ms

This is on a pretty good machine: Core i7 920, GeForce GTX 260.

First of all, the stats above tell me that the "Primitives" part of the
scene stats refers to primitive sets, not just primitives... Since the
camera stats tell me there are over 226000 primitives in the current view.

As you can see, the number of primitiveSets is very high. If I
understand correctly, each PrimitiveSet will result in an OpenGL draw
call, and since my draw time is what's high now, I would want to reduce
that (since I'm currently at about 3 primitives per primitiveSet on
average). If I remove triangle strip generation from the optimizer
options, the stats become:

Scene stats:
StateSets 1345
Groups 392
Transforms 672
Geodes 992
Geometry 992
Vertices 190392
Primitives 51197

Camera stats:
State graphs 1254
Drawables 2117
PrimitiveSets 4899
Triangles 17122
Tri. Strips 191
Tri. Fans 7212
Quads 106464
Quad Strips 534
Total primitives 131523

This indicates to me that the tristrip visitor in the optimizer does a
pretty bad job. I looked at an .osg dump, and it seems to generate a
separate strip for each quad (so one strip for 4 vertices) which is
ridiculous... But that's a subject for another day.

When I disabled the tristripper, you can see a massive decrease in the
number of primitiveSets (and even in the number of primitives), however
there was no significant change in the frame rate and timings. I don't
understand this. I would have expected, with more primitives per
primitiveSet (I'm now at about 26 prims per primSet on average, as
opposed to around 3 before) and far fewer draw calls, that the draw time
would have been much lower. That's not what happens in practice.
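
For reference, the per-set averages quoted above follow directly from the camera stats. A minimal sketch of the arithmetic in plain C++ (no OSG dependency; `averagePrimsPerSet` is a made-up helper name, not an OSG function):

```cpp
#include <cassert>

// Average number of primitives per PrimitiveSet, computed from the
// "Total primitives" and "PrimitiveSets" rows of the camera stats.
// Hypothetical helper for illustration; not part of OSG.
double averagePrimsPerSet(long totalPrimitives, long primitiveSets)
{
    assert(primitiveSets > 0);  // the ratio is undefined for an empty view
    return static_cast<double>(totalPrimitives) / static_cast<double>(primitiveSets);
}
```

With the numbers above, `averagePrimsPerSet(226705, 73953)` comes out at roughly 3.1 with tristripping and `averagePrimsPerSet(131523, 4899)` at roughly 26.8 without it.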

My previous attempts at optimizing (using the osgUtil::Optimizer) were
also centered around lowering the number of primitives (by creating
texture atlases and sharing state so the merging of geodes and geometry
objects gave good results). And even though that also lowered the
numbers (I started at around 2215 Geodes and 2521 Geometry objects in
the same scene, compare that to 992 each now), it also had underwhelming
results in practice.

Clearly there is more than one primitiveSet per Geometry in the above
stats. What I see in the dumped .osg file is that there are often things like:

PrimitiveSets 4
{
DrawArrays TRIANGLES 0 12
DrawArrays QUADS 12 152
DrawArrays TRIANGLES 164 12
DrawArrays QUADS 176 152
}

I would expect, by reordering the vertex/color/normal/texCoord data, I
would be able to get only 2 primitiveSets there, one TRIANGLES and one
QUADS. Am I wrong? Why does the osgUtil::Optimizer not do this already
when merging Geometry objects? I expect because it's easier not to do
it, but still, it gives sub-optimal results...
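
The reordering idea above can be sketched as follows. This is a standalone model in plain C++, not OSG code: `Mode`, `DrawArrays`, `MergeResult` and `mergeByMode` are names invented for illustration, and it only handles list-type modes (TRIANGLES and QUADS), where ranges can simply be concatenated.

```cpp
#include <cassert>
#include <map>
#include <vector>

// List-type modes only; strips and fans cannot be concatenated this way.
enum class Mode { Triangles, Quads };

// Stand-in for one "DrawArrays <mode> <first> <count>" line of the .osg dump.
struct DrawArrays { Mode mode; int first; int count; };

struct MergeResult {
    std::vector<int> vertexOrder;    // vertexOrder[newIndex] = oldIndex
    std::vector<DrawArrays> merged;  // at most one entry per mode
};

// Groups the vertex ranges by mode and emits one merged range per mode.
MergeResult mergeByMode(const std::vector<DrawArrays>& sets)
{
    // Collect the old vertex indices used by each mode, in original order.
    std::map<Mode, std::vector<int>> perMode;
    for (const DrawArrays& da : sets) {
        assert(da.first >= 0 && da.count >= 0);
        for (int i = 0; i < da.count; ++i)
            perMode[da.mode].push_back(da.first + i);
    }

    // Lay the groups out back to back and record where each one starts.
    MergeResult result;
    for (const auto& entry : perMode) {
        const int first = static_cast<int>(result.vertexOrder.size());
        result.vertexOrder.insert(result.vertexOrder.end(),
                                  entry.second.begin(), entry.second.end());
        result.merged.push_back({entry.first, first,
                                 static_cast<int>(entry.second.size())});
    }
    return result;
}
```

For the four sets shown above this gives TRIANGLES 0 24 and QUADS 24 304. Every per-vertex array (vertices, colors, normals, texcoords) would then have to be permuted with `vertexOrder` for the merged ranges to draw the same geometry.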

Of course I can't do that for strips or fans, unless I insert new
vertices to restart the strip. Again this is something that could be
done, but might bring diminishing returns in my case given that my own
scene contains many more triangles and quads than strips and fans (when
I turn off tristripping).

So, first of all, am I on the right track trying to reduce the number of
primitiveSets? Do you think on current hardware, disabling tristripping
is a good idea?

Why, when disabling tristripping which reduced the number of
primitiveSets from 73953 to 4899, didn't I see an increase in performance?

Is there some other way to find out what's going on and seeing what I
can improve to increase the performance? I've tried running our app in
gDEBugger, which tipped me off that I was batching poorly when using
triangle strips (about 3 prims per primitiveSet as I said above).
Turning off triangle strips improved the situation (as gDEBugger sees
it), but not by that much, which is probably consistent with what I'm
seeing in practice, but I'm no closer to finding out what to improve
next. What is not mergeable now is like that because of different
settings in StateSets (backface culling on vs off, can't use texture
atlas because the wrap mode is set to REPEAT, etc.), so I don't think
osgUtil::Optimizer can help me improve the situation further...

I have looked at video memory usage by the way, and I'm fine in that
respect, so I don't think I'm getting any thrashing or paging between
video RAM and main RAM at runtime. Also, I'm using display lists for
most of the objects in the scene; I tried using Vertex Buffer Objects
and it actually slowed things down.

I should also mention that these results are obtained using
osgShadow::LightSpacePerspectiveShadowMap. I can run the dumped .osg
file with

osgshadow --lispsm --noUpdate --mapres 2048 <dumped_file>.osg

and I get the results above, which are pretty similar to our simulator.
If I run the same data file in plain osgViewer without shadows, it runs
at a solid 60Hz, with stats and timings:

Scene stats:
StateSets 1345
Groups 392
Transforms 672
Geodes 992
Geometry 992
Vertices 190392
Primitives 51197

Camera stats:
State graphs 321
Drawables 810
PrimitiveSets 1774
Triangles 7243
Tri. Strips 85
Tri. Fans 2508
Quads 39370
Quad Strips 178
Total primitives 49384

FPS: 60
Cull: 1.7ms
Draw: 8ms
GPU: 6.8ms

(that's the no tristrips version, so compare these stats to the second
set of stats from the top, not the first)

I would have expected most numbers there to be half what they were with
shadows enabled, but as you can see they're consistently less than half,
so shadows added more than 100% overhead... Note that even if it added
exactly 100% overhead, I would still be at 16ms draw, which is too much,
but I'm just mentioning it in case it may prompt some other suggestions.
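
The "more than 100% overhead" observation can be checked with a one-line helper. This is a sketch in plain C++; `overheadPercent` is a hypothetical name, and the inputs are simply the figures quoted in this message.

```cpp
#include <cassert>

// Relative overhead, in percent, of a measurement taken with shadows
// compared to the same measurement taken without them.
// Hypothetical helper for illustration; not part of OSG.
double overheadPercent(double withShadows, double withoutShadows)
{
    assert(withoutShadows > 0.0);  // a zero baseline has no meaningful overhead
    return (withShadows - withoutShadows) / withoutShadows * 100.0;
}
```

Using the roughly 19ms draw time with shadows against 8ms without, `overheadPercent(19.0, 8.0)` is 137.5. The same helper applied to the PrimitiveSets counts (4899 with shadows, 1774 without) gives about 176.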

I'm not sure I could send my whole scene to everyone on the list, but I
might be able to send it to someone if they want to see firsthand. Just
the bare .osg file without any textures and without ocean and skydome
shows the problem adequately well.

Thanks in advance for any suggestions you might have. I really need to
improve this, and I've been working for a while already with only a
small improvement to show for my time...

J-S
--
______________________________________________________
Jean-Sebastien Guay jean-seba...@cm-labs.com
http://www.cm-labs.com/
http://whitestar02.webhop.org/
_______________________________________________
osg-users mailing list
osg-...@lists.openscenegraph.org
http://lists.openscenegraph.org/listinfo.cgi/osg-users-openscenegraph.org

Tim Moore

Jan 22, 2011, 4:41:25 AM
to OpenSceneGraph Users
Interesting problem. It's similar to the performance issues that we see in FlightGear.

On Fri, Jan 21, 2011 at 10:13 PM, Jean-Sébastien Guay <jean-seba...@cm-labs.com> wrote:
> Hi all,
>
> I thought I had a pretty firm grasp on what to optimize given a certain set of scene stats, but I've optimized what I can and I'm still getting little improvement in results. So I'll explain my situation here and hope you guys have some good suggestions. Sorry if this is a long message, but I prefer to give all the relevant data now rather than get asked later.
>
> The whole scene is about a 200m x 200m square (apart from the ocean and skydome but these are not significant, I have
> ...
> The other thing is that there are a lot of dynamic objects, so there are a lot of transforms. But I can't change this, it's part of our simulation.

As an aside, you may need to change your approach to the dynamic objects from a "naive" scene graph to one where the geometry of the dynamic objects is instanced and / or coalesced.

> So, after doing some optimization (removing redundant groups, building texture atlases where possible, merging geodes and geometry, generating triangle strips, most of which I did with the osgUtil::Optimizer), I get the following stats, which I'll talk about a bit later:
>
> Scene stats:
> StateSets     1345
> Groups         392
> Transforms     672
> Geodes         992
> Geometry       992
> Vertices    139859
> Primitives   87444

This is a big scene graph, especially compared to the small amount of geometry that is in the scene.

> Camera stats:
> State graphs       1282
> Drawables          2151
> PrimitiveSets     73953
> Triangles          3538
> Tri. Strips      211091
> Tri. Fans            16
> Quads             11526
> Quad Strips         534
> Total primitives 226705

In addition to the geometry problems you attack later on, the state graph number is very high, as is the drawable number. "State graphs" is a measure of the number of unique graphics states set up during the rendering traversal.

> And, both in our simulator and in osgViewer, for the same scene and same viewpoint, I get:
>
> FPS: ~35
> Cull: 5.4ms
> Draw: 19ms
> GPU: 19ms
>
> This is on a pretty good machine: Core i7 920, GeForce GTX 260.

The cull time seems high to me. I think this is caused by the expense of traversing a big scene graph every frame.

Draw time is very long, and GPU time probably suffers as a result. Ideally you want to shoot for a tiny draw time (< 5ms) and let GPU time extend well past the end of the draw traversal, even overlapping with the next frame's update and cull traversals. The relationship between draw and GPU time can be hard to predict. I've found that it is very rare to have a GPU time that is less than draw time, and I think this is caused by the GPU stalling, waiting on OpenGL commands from the CPU.

> First of all, the stats above tell me that the "Primitives" part of the scene stats refers to primitive sets, not just primitives... Since the camera stats tell me there are over 226000 primitives in the current view.
>
> As you can see, the number of primitiveSets is very high. If I understand correctly, each PrimitiveSet will result in an OpenGL draw call, and since my draw time is what's high now, I would want to reduce that (since I'm currently at about 3 primitives per primitiveSet on average). If I remove triangle strip generation from the optimizer options, the stats become:

That is basically true. When you use display lists, the draw calls for a Drawable are all included in one display list, but this still results in OpenGL command overhead, as far as I can tell.

> Scene stats:
> StateSets     1345
> Groups         392
> Transforms     672
> Geodes         992
> Geometry       992
> Vertices    190392
> Primitives   51197
>
> Camera stats:
> State graphs       1254
> Drawables          2117
> PrimitiveSets      4899
> Triangles         17122
> Tri. Strips         191
> Tri. Fans          7212
> Quads            106464
> Quad Strips         534
> Total primitives 131523
>
> This indicates to me that the tristrip visitor in the optimizer does a pretty bad job. I looked at an .osg dump, and it seems to generate a separate strip for each quad (so one strip for 4 vertices) which is ridiculous... But that's a subject for another day.

Yeah, I think we're advising against the tristripper these days.

> When I disabled the tristripper, you can see a massive decrease in the number of primitiveSets (and even in the number of primitives), however there was no significant change in the frame rate and timings. I don't understand this. I would have expected, with more primitives per primitiveSet (I'm now at about 26 prims per primSet on average, as opposed to around 3 before) and much less draw calls, that the draw time would have been much lower. That's not what happens in practice.

There are two factors here. 26 prims is still about 2 orders of magnitude below the ideal. You have all those drawables, and you will need to do a lot of work to reduce that number, both with the Optimizer and perhaps reorganizing your graph.

But there's also that big state graph number; that could be the bottleneck.

> My previous attempts at optimizing (using the osgUtil::Optimizer) were also centered around lowering the number of primitives (by creating texture atlases and sharing state so the merging of geodes and geometry objects gave good results). And even though that also lowered the numbers (I started at around 2215 Geodes and 2521 Geometry objects in the same scene, compare that to 992 each now), it also had underwhelming results in practice.
>
> Clearly there are more than one primitiveSet per Geometry in the above stats. What I see in the dumped .osg file, is there is often things like:
>
>          PrimitiveSets 4
>          {
>            DrawArrays TRIANGLES 0 12
>            DrawArrays QUADS 12 152
>            DrawArrays TRIANGLES 164 12
>            DrawArrays QUADS 176 152
>          }
>
> I would expect, by reordering the vertex/color/normal/texCoord data, I would be able to get only 2 primitiveSets there, one TRIANGLES and one QUADS. Am I wrong? Why does the osgUtil::Optimizer not do this already when merging Geometry objects? I expect because it's easier not to do it, but still, it gives sub-optimal results...

You can get one primitive set there; quads are just two triangles. A happy side effect of the INDEX_MESH | VERTEX_POSTTRANSFORM | VERTEX_PRETRANSFORM combo of optimizers, which are relatively new and not part of the default set, is that they combine multiple primitive sets into one. You could try those, even if their primary purpose -- optimizing cache use -- won't do much for these small meshes.
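
The "quads are just two triangles" point can be illustrated with a small index-generation sketch. This is plain C++ rather than OSG code, the function name is invented for illustration, and it is not a description of what the INDEX_MESH optimizer does internally.

```cpp
#include <cassert>
#include <vector>

// Expands a QUADS-style DrawArrays range into triangle-list indices.
// Each quad (v0, v1, v2, v3) becomes triangles (v0, v1, v2) and (v0, v2, v3),
// which keeps the original winding order.
std::vector<unsigned int> quadsToTriangleIndices(unsigned int first, unsigned int count)
{
    assert(count % 4 == 0);  // QUADS consume vertices four at a time
    std::vector<unsigned int> indices;
    indices.reserve(count / 4 * 6);
    for (unsigned int q = first; q < first + count; q += 4) {
        indices.insert(indices.end(), {q, q + 1, q + 2, q, q + 2, q + 3});
    }
    return indices;
}
```

Applied to `DrawArrays QUADS 12 152`, this produces 228 indices (38 quads, two triangles each), which could then sit in the same indexed triangle list as the indices of the TRIANGLES ranges.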

> Of course I can't do that for strips or fans, unless I insert new vertices to restart the strip. Again this is something that could be done, but might bring diminishing returns in my case given that my own scene contains many more triangles and quads than strips and fans (when I turn off tristripping).

Dumping all the strips and fans into one big indexed DrawElements gives better results these days, and the new optimizers do that.

> So, first of all, am I on the right track trying to reduce the number of primitiveSets? Do you think on current hardware, disabling tristripping is a good idea?
>
> Why, when disabling tristripping which reduced the number of primitiveSets from 73953 to 4899, didn't I see an increase in performance?

That number is still very big. But there's also the large state graph number.

> Is there some other way to find out what's going on and seeing what I can improve to increase the performance? I've tried running our app in gDEBugger, which tipped me off that I was batching poorly when using triangle strips (about 3 prims per primitiveSet as I said above). Turning off triangle strips improved the situation (as gDEBugger sees it), but not by that much, which is probably coherent with what I'm seeing in practice, but I'm no closer to finding out what to improve next. What is not mergeable now is like that because of different settings in StateSets (backface culling on vs off, can't use texture atlas because the wrap mode is set to REPEAT, etc.), so I don't think osgUtil::Optimizer can help me improve the situation further...

I haven't found good tools on Linux for attacking this sort of thing. On Windows the NVidia PerfTools, or PerfDisplay, or whatever it's called, should give you a clue. I usually proceed by dumping the scene graph to a .osg, like you have done, and poring over it with a text editor :/.

> I have looked at video memory usage by the way, and I'm fine in that respect, so I don't think I'm getting any thrashing or paging between video RAM and main RAM at runtime. Also, I'm using display lists for most of the objects in the scene, I tried using Vertex Buffer Objects and it actually slowed it down.
>
> I should also mention that these results are obtained using osgShadow::LightSpacePerspectiveShadowMap. I can run the dumped .osg file with
>
>  osgshadow --lispsm --noUpdate --mapres 2048 <dumped_file>.osg
>
> and I get the results above, which are pretty similar to our simulator. If I run the same data file in plain osgViewer without shadows, it runs at a solid 60Hz, with stats and timings:

Hah, you saved the best for last :)

Some final thoughts:

You are traversing the scene twice, so reducing overhead in the traversal is doubly important.

Don't forget about the cull time. Draw can't start until cull is finished, so if you can knock 1ms off the cull time you are making good headway. Reducing the number of groups, geodes, drawables to the minimum should be the goal there.

You need to attack the state graph number. This may be challenging. By way of an example, look at the textures that are repeating, and therefore defeating your atlas efforts, and think about expanding them.

Keep the figure of 1000 verts per drawable (and primitive set) in mind. It might be hard to get there, but it's a worthy goal.

Tim 

Robert Osfield

Jan 22, 2011, 7:54:10 AM
to OpenSceneGraph Users
Hi J-S,

Tim's analysis and suggestions are spot on.

I'll qualify a bit more with a bit of history on tri stripping. Tri
stripping used to be the most efficient way to batch up geometry for sending
to the graphics card, this was in the era when display listing was
king, and there was very little vertex caching down on the GPU.
Display lists are great in that they allow the graphics driver to
efficiently repackage all the vertex and primitive data in the way the
driver/GPU can most efficiently manage it.

Fast forward to today: VBOs are in and display lists are on their way
out. The cost of passing vertex data has gone down, but we don't hide
the cost of dispatching the primitives like we could with display
lists. This means that these days it's generally more efficient to
send primitives as a single block of GL_TRIANGLES than lots of
separate tri strips + fans.
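
To make the "single block of GL_TRIANGLES" idea concrete, here is a hedged sketch of how strips and fans can be unrolled into one triangle index list. It is plain C++ with invented function names, and it is not taken from OSG's optimizers; it only follows the standard OpenGL vertex ordering for strips and fans.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Converts one triangle strip (given as vertex indices) into a triangle list.
// Every second triangle has its first two indices swapped so that all
// triangles keep the same winding, matching OpenGL's strip ordering.
std::vector<unsigned int> stripToTriangleIndices(const std::vector<unsigned int>& strip)
{
    std::vector<unsigned int> out;
    for (std::size_t i = 2; i < strip.size(); ++i) {
        const bool even = (i % 2 == 0);
        out.push_back(even ? strip[i - 2] : strip[i - 1]);
        out.push_back(even ? strip[i - 1] : strip[i - 2]);
        out.push_back(strip[i]);
    }
    assert(out.size() % 3 == 0);  // a triangle list holds whole triangles only
    return out;
}

// Converts one triangle fan into a triangle list: every triangle shares
// the first index of the fan.
std::vector<unsigned int> fanToTriangleIndices(const std::vector<unsigned int>& fan)
{
    std::vector<unsigned int> out;
    for (std::size_t i = 2; i < fan.size(); ++i) {
        out.push_back(fan[0]);
        out.push_back(fan[i - 1]);
        out.push_back(fan[i]);
    }
    assert(out.size() % 3 == 0);  // a triangle list holds whole triangles only
    return out;
}
```

The outputs of several strips and fans can be appended to one index vector and drawn as a single indexed triangle list, at the cost of more indices than the strips used.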

Tim's mesh optimizers directly take into account the modern design of
GPUs and their preference for coarse-grained primitive sets. So
tri-stripping is out of favour, mesh optimization in :-)

As for cull, I don't think that's too high, and the draw-thread-per-context
threading model can hide this small cost anyway, so it's not an issue. More
of an issue is the number of batches of state that you have. Also work on
reducing the number of transforms; geometry instancing and shaders
might be useful here.

Robert.

Wojciech Lewandowski

Jan 22, 2011, 4:04:04 PM
to OpenSceneGraph Users
J-S,

You have not mentioned which LispSM flavour is used, but if that's DrawBounds
it does 2 extra cull & render passes. The shadow map pass is preceded by a
depth buffer render pass used to compute the DrawBounds. I should also
mention that this computation is made using ReadPixels and scanning the
picture on the CPU. But the picture is small (64x64), and I once tested the
performance penalty of ReadPixels on a GF 8800 and it did not seem to be a
bottleneck. I turned off reading after the first successful ReadPixels and
the framerate did not change. But I guess this situation may change with
different GPUs.

Our DBs are also made from small batches. My observation was that not only
the size of the primitive sets but also the small state attribute changes
between them were hitting hard. I once did an experiment. We had a DB that
was suffering from the small batches problem. I built texture atlases, then
put all the scene textures into a single Texture2DArray (yes, it was huge).
Then I removed the StateSets and created only one StateSet at the scene root,
with shaders to use the Texture2DArray I had built. I don't think I did
anything to the primitive sets; they were the same as before, only the
StateSets were gone. And the framerate went up 2 times.

Cheers,
Wojtek

Полищук Сергей

Jan 23, 2011, 11:26:13 AM
to OpenSceneGraph Users
Hi,

I think you can reduce cull time with the build-KdTrees option in the osgDB registry or the OSG_BUILD_KDTREES env var (if you're not already using it). As for draw, I believe it's related to the large number of state changes, so you should try to merge statesets. A large number of primitive sets is kind of bad, but with display lists (at least on NVIDIA hardware) it doesn't actually hurt that much.

Robert Osfield

Jan 24, 2011, 3:50:03 AM
to OpenSceneGraph Users
Hi,

The KdTrees in the OSG do not affect the cull traversal in any way whatsoever.

The KdTrees we have only affect intersection performance.

Robert.

Jean-Sébastien Guay

Jan 24, 2011, 9:35:40 AM
to osg-...@lists.openscenegraph.org
Hi Wojtek,

> You have not mentioned which lispsm is used but if thats DrawBounds
> (flavour) it does 2 extra cull & render passes.

No, we use the ViewBounds type, and I forgot to mention in my arguments
for reproducing the results in the osgshadow example that I also used
ViewBounds. Sorry about that, seems even with my very long post I was
still missing some specifics :-)

> Our DBs are also made from small batches. My observations were that not
> only the size of primitive sets but small state attribute changes
> between them were hitting hard as well.

Yeah, I had also noticed the large number of state sets and state
graphs, and Tim remarked on this in his reply too. I'll have a look at
reducing that in the next few days.

> I once did an experiment. We had
> a DB that was suffering from small batches problem. I have built Texture
> Atlases, then put all the scene Textures into single Texture2DArray (yes
> it was huge). Then I removed Statesets and created only one StateSet at
> scene root with Shaders to use Texture2DArray I built. I think I have
> not done anything to primitve sets they were the same as before only
> StateSets were gone. And framerate went up 2 times.

Very interesting results. I guess I could make a simple test of
traversing our scene graph removing all statesets after loading the
scene, and seeing if the framerate improves significantly then. That
would at least tell me I'm on the right track with attacking the number
of statesets.
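
The test described above can be modelled on a toy tree. In a real application this would presumably be an osg::NodeVisitor that clears the StateSet on each node and drawable; the `Node` and `StateSet` types below are simplified stand-ins written for this sketch, not OSG classes.

```cpp
#include <cassert>
#include <memory>
#include <vector>

struct StateSet {};  // placeholder for the real render state

// Toy scene graph node: an optional state set plus child nodes.
struct Node {
    std::shared_ptr<StateSet> stateSet;
    std::vector<std::unique_ptr<Node>> children;
};

// Recursively clears every state set below (and including) `node`.
// Returns how many state set references were removed, which is handy
// for comparing against the "StateSets" row of the scene stats.
int stripStateSets(Node& node)
{
    int removed = node.stateSet ? 1 : 0;
    node.stateSet.reset();
    for (auto& child : node.children) {
        assert(child);  // this model never stores null children
        removed += stripStateSets(*child);
    }
    return removed;
}
```

Running the scene once with and once without the stripped state gives an upper bound on what any amount of StateSet merging could buy, since the stripped version renders incorrectly but pays no state-change cost.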

One problem unfortunately is that some textures could not be included in
the atlases (because they are set to REPEAT), and sometimes not only
textures change, but also some render state such as cull face (some
fences made with a single polygon set to render both sides, for
example). I'll have to get our artist to model these kinds of things as
boxes, and replace the textures set to REPEAT.

Thanks for your insight,

Jean-Sébastien Guay

Jan 24, 2011, 9:35:46 AM
to osg-...@lists.openscenegraph.org
Hi Tim and Robert,

OK, a few great suggestions, and a possible answer to why the work I've
done hasn't sped things up that much.

First, I've been focusing mostly on automatic optimizations that I can
code up in some processing tool since we already do that now (we use
osgconv to convert all our models to ive, so in the past week I've been
writing my own tool that uses parts of osgconv, the osgUtil::Optimizer
and some slightly modified classes to give better results on our data).
My goal was to be minimally intrusive on my artist's work. But I think
I'm at a point now that I'll have to get him to modify some things...

For the number of transforms, I doubt I'll be able to do much to lower
it, since it's a central part of our infrastructure that physics objects
are attached to transforms (with the physics object's transformation
being copied to the transform node each step). We might be able to get
some other kind of relationship, but not in the short term. So I'll have
to work on some other areas.

Reducing the number of statesets is a worthy goal, I've already tried
(building texture atlases mostly). I've gone from over 2500 statesets to
1345 in the stats I gave in my OP. The next steps will be to get my
modeler to remove most parts of models set to render both sides (cull
face off) and as Tim suggested removing textures set to REPEAT since
those are two things that defeat atlas building sometimes.

Using the INDEX_MESH | VERTEX_POSTTRANSFORM | VERTEX_PRETRANSFORM
optimizers is a good idea, up until now I was compiling my tool against
the same version of OSG as our simulator uses (2.8.3) though, so I
couldn't use them. I'll see if I can backport them to that version and
see if they give better results.

About the state graph number. I guess that's pretty much proportional to
the number of unique statesets in the graph? I'm at 1345 now, I guess if
I get that number down it should reduce the number of state graphs?

Tim mentions keeping a figure of about 1000 verts per primitive set in
mind. That's good info. It's really hard to find numbers that you can
relate to the stats you see in the OSG stats display.

Thanks a lot, I'll keep working on it and keep you updated.

J-S

Tomlinson, Gordon

Jan 24, 2011, 10:13:45 AM
to OpenSceneGraph Users
Hi JS

Just a little FYI: you can build texture atlases that repeat, the caveat being that they can only repeat in one direction, so you can build vertical and horizontal sets :)
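
One way to read the single-direction repeat idea: textures that repeat horizontally are stacked into a vertical strip, each one spanning the full atlas width, so REPEAT wrapping along u keeps working while v is squeezed into the texture's own row. The sketch below assumes exactly that layout; the type and function names are hypothetical and not taken from OSG or any modelling tool.

```cpp
#include <cassert>

struct TexCoord { float u; float v; };

// Remaps a texture coordinate into an atlas made of `rows` textures stacked
// vertically, each spanning the full atlas width. u is left untouched, so
// horizontal repetition still works; v is scaled into the slot for `row`.
TexCoord remapIntoHorizontalRepeatAtlas(TexCoord tc, int row, int rows)
{
    assert(rows > 0 && row >= 0 && row < rows);
    assert(tc.v >= 0.0f && tc.v <= 1.0f);  // v must not repeat in this layout
    return { tc.u, (static_cast<float>(row) + tc.v) / static_cast<float>(rows) };
}
```

A texture that repeats vertically would go into the mirrored layout (a horizontal strip with v left untouched). Filtering and mipmapping can bleed between neighbouring rows, so some padding between slots is likely needed in practice.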

I would also say that 1k verts per primitive set is lowish; you can look at 10k or more on modern hardware (assuming your model(s) have that many).


Gordon Tomlinson
3D Technology
System Engineering Consultant
Overwatch®
An Operating Unit of Textron Systems

Wojciech Lewandowski

Jan 24, 2011, 10:14:47 AM
to osg-...@lists.openscenegraph.org
Hi J-S,

> No, we use the ViewBounds type. and I forgot to mention in my arguments to
> reproducing the results in the osgshadow example that I also used
> ViewBounds. Sorry about that, seems even with my very long post I was
> still missing some specifics :-)

The osgshadow command line did not indicate this. The default is DrawBounds,
so I assumed you were using it.

> One problem unfortunately is that some textures could not be included in
> the atlases (because they are set to REPEAT), and sometimes not only
> textures change, but also some render state such as cull face (some fences
> made with a single polygon set to render both sides, for example). I'll
> have to get our artist to model these kinds of things as boxes, and
> replace the textures set to REPEAT.

Yeah, but if we can remove the texture state attributes, we can usually get
rid of the majority of StateSets and end up with a really small number of
unique StateSets created by combinations of the remaining attributes. Textures
cannot always be put into texture atlases; that's why we finally built a
Texture2DArray. Btw, I just noticed you mentioned that you can send your DB
for others to check. If you want, you may send it to me; I can try to run the
same tool on your DB and report what I get.
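
The effect of taking textures out of the state can be illustrated by counting unique states with and without the texture binding. This is a toy model in plain C++: `RenderState` keeps only two attributes (a texture id and the cull face mode), whereas a real osg::StateSet holds many more, and all names are invented for the sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <utility>
#include <vector>

// Simplified stand-in for the attributes that make two StateSets differ.
struct RenderState {
    int  textureId;   // which texture is bound (non-negative in this model)
    bool cullFaceOn;  // backface culling enabled or not
};

// Counts distinct states. With ignoreTexture set, all texture ids collapse
// into one key, as if every texture lived in a single shared texture array.
std::size_t countUniqueStates(const std::vector<RenderState>& states, bool ignoreTexture)
{
    std::set<std::pair<int, bool>> unique;
    for (const RenderState& s : states) {
        assert(s.textureId >= 0);  // -1 is reserved for the collapsed key
        const int textureKey = ignoreTexture ? -1 : s.textureId;
        unique.insert(std::make_pair(textureKey, s.cullFaceOn));
    }
    return unique.size();
}
```

With five states that differ mostly by texture, the count drops from 5 to 2 once textures are ignored, leaving only the cull-face-on and cull-face-off combinations.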

Cheers,
Wojtek

Jean-Sébastien Guay

Jan 24, 2011, 10:35:01 AM
to OpenSceneGraph Users
Hi Gordon,

> Just little FYI , you can build texture atlases that repeat the caveat being they can only repeat in one direction so you can build vertical and horizontal sets :)

Yeah, but OSG's TextureAtlasVisitor just rejects textures as soon as
they repeat in any direction. If there's another better tool that I can
use on an OSG database I'd like to hear about it :-)

> I would also say the 1k verts per primitive set is lowish I would say you can look at 10k or more on modern hardware ( assuming you model(s) that many)

Most of our models are pretty low poly, as you can see in the stats I
sent the total number of polys is pretty low compared to the number of
transforms. I might be able to get close to 1k but apart from the static
scenery, I doubt I'll be able to go higher than that. Still, it's
another data point to consider.

Thanks,

Tomlinson, Gordon

Jan 24, 2011, 10:44:17 AM
to OpenSceneGraph Users

-----Original Message-----
From: osg-user...@lists.openscenegraph.org [mailto:osg-user...@lists.openscenegraph.org] On Behalf Of Jean-Sébastien Guay
Sent: Monday, January 24, 2011 10:35 AM
To: OpenSceneGraph Users
Subject: Re: [osg-users] Optimizing scene structure and geometry

> Hi Gordon,
>
> > Just little FYI , you can build texture atlases that repeat the caveat being they can only repeat in one direction so you can build vertical and horizontal sets :)
>
> Yeah, but OSG's TextureAtlasVisitor just rejects textures as soon as
> they repeat in any direction. If there's another better tool that I can
> use on an OSG database I'd like to hear about it :-)

GT: That's a shame :(. I know MultiGen Creator has had tools (since around 3.0) that can create composite textures with a single repeat in the vertical or horizontal direction, as well as no repeat, but that's probably not much help to you.

Jean-Sébastien Guay

Jan 25, 2011, 4:48:15 PM
to osg-...@lists.openscenegraph.org
Hi again,

So I've been able to at least partially implement some suggestions that
I was given in the previous posts by Tim and Robert. Here are some
results in before-after style.

Scene stats haven't changed significantly.

Camera stats:        Before    After
State graphs           1254      846
Drawables              2117     1669
Vertices             485531   368844
PrimitiveSets          4899     1694
Triangles             17122   161448
Tri. Strips             191       32
Tri. Fans              7212       36
Quads                106464     9578
Quad Strips             534      356
Total primitives     131523   171450
Vertices per pset      99.1    217.7

This was achieved by doing 2 things:

* Uniformizing even more state, even if it leads to graphical artifacts
(our artist can go in and fix those later), for instance cull face
settings. This made texture atlas generation more effective which
lowered number of state sets / state graphs.

* Backporting the INDEX_MESH, VERTEX_PRETRANSFORM and
VERTEX_POSTTRANSFORM optimizers into my optimization tool, which uses OSG
2.8.3. This lowered the number of PrimitiveSets by combining primitive
sets and converting them all to indexed triangle lists.
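
For readers unfamiliar with what that conversion involves, here is a minimal standalone sketch of turning strips, fans and quads into triangle-list indices so that they can all share one indexed primitive set. It does not use OSG and is not the Optimizer's actual implementation; the names (Indices, stripToTriangles, fanToTriangles, quadsToTriangles, appendPrimitive) are made up for illustration.

```cpp
#include <vector>

typedef std::vector<unsigned> Indices;

// Triangle strip over vertices [first, first + count): every second
// triangle swaps its first two indices so the winding stays consistent.
Indices stripToTriangles(unsigned first, unsigned count)
{
    Indices out;
    for (unsigned i = 0; i + 2 < count; ++i) {
        unsigned a = first + i, b = a + 1, c = a + 2;
        if (i % 2 == 1) { unsigned t = a; a = b; b = t; }
        out.push_back(a); out.push_back(b); out.push_back(c);
    }
    return out;
}

// Triangle fan: every triangle shares the first vertex.
Indices fanToTriangles(unsigned first, unsigned count)
{
    Indices out;
    for (unsigned i = 1; i + 1 < count; ++i) {
        out.push_back(first);
        out.push_back(first + i);
        out.push_back(first + i + 1);
    }
    return out;
}

// Independent quads: each quad (a, b, c, d) becomes (a, b, c) and (a, c, d).
Indices quadsToTriangles(unsigned first, unsigned count)
{
    Indices out;
    for (unsigned i = 0; i + 3 < count; i += 4) {
        unsigned a = first + i;
        unsigned tri[6] = { a, a + 1, a + 2, a, a + 2, a + 3 };
        out.insert(out.end(), tri, tri + 6);
    }
    return out;
}

// Append one converted primitive to the shared index list.
void appendPrimitive(Indices& merged, const Indices& part)
{
    merged.insert(merged.end(), part.begin(), part.end());
}
```

With this, a 4-vertex strip followed by a 4-vertex fan at offset 4 ends up as four triangles in a single index list, which is the kind of merge that brings the primitive set count down.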

However, in our simulator itself as well as when testing with a dumped
scene in the osgshadow example, even though these numbers are lower,
they did not result in a large cull/draw time reduction, and so the
frame rate is still largely the same (improved from ~35 to ~40 fps). So
I guess even though I more than doubled the number of vertices per
primitiveset, the numbers (state graphs and primitive sets) are still
much too high.

My next step is eliminating textures that repeat, to even further
improve texture atlas generation. This will again lead to some graphical
artifacts unfortunately...

I did the test of removing almost all statesets after scene load. The
result was that most of the time, the frame rate was close to or at 60.
In the worst case scenario (whole scene visible, which is also what I'm
giving you stats for since the beginning) it went to about 50 fps in our
simulator, so it's not just statesets that are a problem. I think I
really need to lower statesets and primitivesets in combination to be
able to get good results.

I'll also look at decreasing the cull time, since in practice that
should pay off in double (because of the shadow pass) and even quadruple
or more sometimes (ocean can have reflection turned on, for now we've
turned it off; also we often have extra picture-in-picture view(s) to
render in addition to the main view).

Any further comments would be appreciated. Also any suggestions as to
what else I can do would be appreciated as well. At least I'm starting
to see some results, which is encouraging since up until now nothing I
had done had had a significant effect (I was scrounging 1 fps here, 0.5
fps there...).

Thanks in advance,

Wojciech Lewandowski

Jan 26, 2011, 5:49:38 AM
to osg-...@lists.openscenegraph.org
Hi J-S,

[..]

> My next step is eliminating textures that repeat, to even further improve
> texture atlas generation. This will again lead to some graphical artifacts
> unfortunately...

> I did the test of removing almost all statesets after scene load. The
> result was that most of the time, the frame rate was close to or at 60. In
> the worst case scenario (whole scene visible, which is also what I'm
> giving you stats for since the beginning) it went to about 50 fps in our
> simulator, so it's not just statesets that are a problem. I think I really
> need to lower statesets and primitivesets in combination to be able to get
> good results.

I agree. I must admit I have lied to you (unintentionally;-) when I said
that our framerate increase was due to stateset removal. I checked the code
and we later run the optimizer again, merging meshes. I also checked the
toolset on my new PC and the speed increase does not seem to be as high on
the GF 280 GTX (150%) as it used to be on the GF 8800 (200%).

> I'll also look at decreasing the cull time, since in practice that should
> pay off in double (because of the shadow pass) and even quadruple or more
> sometimes (ocean can have reflection turned on, for now we've turned it
> off; also we often have extra picture-in-picture view(s) to render in
> addition to the main view).

I think I may have a hint for you. Check if ComputeBoundsVisitor is used in
MinimalShadowMap::computeShadowReceivingCoarseBounds(). If it is, it will
add to the overall cull time. With complex scenes it may take some time. It's
only used to compute rough bounds of the scene, so in many cases the bounds
can be computed once or replaced with a constant bounding box. You may need
to override the technique and the above method for this, though.
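
The "compute once" idea can be sketched independently of osgShadow. The code below is plain C++ and does not use the real MinimalShadowMap classes; Box, CachedBounds and the callback are hypothetical names, and it assumes the coarse bounds of the scene really do stay valid from frame to frame.

```cpp
#include <functional>

// Axis-aligned box, standing in for a scene bounding box.
struct Box {
    float xMin, yMin, zMin, xMax, yMax, zMax;
};

// Runs the expensive bounds computation on first use and then reuses
// the stored result until invalidate() is called.
class CachedBounds {
public:
    explicit CachedBounds(std::function<Box()> compute)
        : _compute(compute), _valid(false), _box() {}

    const Box& get()
    {
        if (!_valid) {
            _box = _compute();   // e.g. a full traversal of the scene
            _valid = true;
        }
        return _box;
    }

    // Call when the static part of the scene changes.
    void invalidate() { _valid = false; }

private:
    std::function<Box()> _compute;
    bool _valid;
    Box _box;
};
```

A constant bounding box fits the same shape: the callback simply returns fixed extents instead of traversing anything.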

Cheers,
Wojtek

Jean-Sébastien Guay

Jan 26, 2011, 3:37:25 PM
to OpenSceneGraph Users
Hello Wojtek,

> I agree. I must admit I have lied to you (unintentionally;-) when I said
> that our framerate increase was due to stateset removal. I checked the
> code and we later run the optimizer again, merging meshes. I also checked
> the toolset on my new PC and the speed increase does not seem to be as
> high on the GF 280 GTX (150%) as it used to be on the GF 8800 (200%).

Thanks for the update, it makes me more confident that we'll be able to
get good results with our current approach.

> I think I may have a hint for you. Check if ComputeBoundsVisitor is used
> in MinimalShadowMap::computeShadowReceivingCoarseBounds(). If it is, it
> will add to the overall cull time. With complex scenes it may take some
> time. It's only used to compute rough bounds of the scene, so in many
> cases the bounds can be computed once or replaced with a constant
> bounding box. You may need to override the technique and the above
> method for this, though.

Interesting, thanks for the hint, I'll have a look at whether that takes
a significant amount of time for us (I guess it might since we have a
lot of hierarchy to traverse).

Thanks,

J-S
