A few questions about performance

Luca Bruno

Sep 25, 2011, 12:34:46 PM

to cog...@googlegroups.com

Hello,
I'm really interested in the cogl project and I'm trying to dig into reading the code and looking toward cogl2.
I have a couple of questions about some design choices and performance (I don't have deep knowledge of either 3D graphics or cogl):
1) Cogl path is an interesting component, but I see it does not call underlying opengl. For example you implement your own evaluator for bezier curves so that you can take advantage of the journal. How much does it affect/improve the performance?
2) I see cogl transforms the vertices of the journal against the modelview matrix with the CPU. Would it be possible/efficient to let the GPU do this?
3) There are in general many parts that are implemented at CPU level without taking advantage of any opengl extension or driver in particular. Is that proven to be overall more efficient?
4) Is there any document comparing the performance of straight opengl over cogl?

Thank you for all the great work.

Robert Bragg

Sep 25, 2011, 5:36:47 PM

to cog...@googlegroups.com

Hi,

On Sunday, September 25, 2011 5:34:46 PM UTC+1, Lethalman wrote:

Hello,
I'm really interested in the cogl project and I'm trying to dig into reading the code and looking toward cogl2.

that's great

I have a couple of questions about some design choices and performance (I don't have deep knowledge of either 3D graphics or cogl):
1) Cogl path is an interesting component, but I see it does not call underlying opengl. For example you implement your own evaluator for bezier curves so that you can take advantage of the journal. How much does it affect/improve the performance?

OpenGL evaluators are a deprecated OpenGL utility API (see here for an overview of deprecated functionality in OpenGL: http://www.opengl.org/wiki/History_of_OpenGL#Deprecation_Model) and they are also not available with GLES 1/2, which is used on mobile platforms. Also, since they are only a utility that is almost certainly handled on the CPU by most (all?) existing drivers, there wouldn't really be much value in depending on multiple hardware vendors for this functionality. It would add to the variability between platforms compared to just re-using a single portable solution across all platforms.

For the performance, I can't say I've ever tried using the OpenGL evaluator API, but from what I vaguely recall from comparing with Cairo, I think the recursive subdivision approach we use works quite well, with very similar performance to Cairo. From my experience, although the bezier flattening can certainly show up in profiles, the cost of tessellation is typically far higher. The reason we decided to expose retained path support with the CoglPath API is so we can retain the tessellated triangle mesh for a given path, so ideally you only have to worry about the cost of bezier evaluation and tessellation once.
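To make the recursive subdivision idea concrete, here is a minimal, self-contained sketch of flattening a cubic bezier into line segments. It is not taken from Cogl or Cairo; the names (`Point`, `Polyline`, `flatten_cubic_bezier`), the flatness test and the depth limit are all assumptions chosen for illustration.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_DEPTH 9      /* at most 2^9 = 512 segments, i.e. 513 points */
#define MAX_POINTS 1024

typedef struct { float x, y; } Point;

typedef struct {
  Point points[MAX_POINTS];
  size_t n_points;
} Polyline;

static Point midpoint (Point a, Point b)
{
  Point m = { (a.x + b.x) * 0.5f, (a.y + b.y) * 0.5f };
  return m;
}

/* Simple flatness test (an assumption of this sketch): both control points
 * lie within `tolerance` of the infinite line through p0 and p3. Squared
 * values are compared so no sqrt is needed. It does not catch control
 * points that sit on that line but beyond the chord's ends. */
static int is_flat (Point p0, Point p1, Point p2, Point p3, float tolerance)
{
  float dx = p3.x - p0.x, dy = p3.y - p0.y;
  float len_sq = dx * dx + dy * dy;
  float tol_sq = tolerance * tolerance;
  float c1, c2;

  if (len_sq == 0.0f)
    {
      /* Degenerate chord (p0 == p3): measure distance to p0 instead. */
      float d1 = (p1.x - p0.x) * (p1.x - p0.x) + (p1.y - p0.y) * (p1.y - p0.y);
      float d2 = (p2.x - p0.x) * (p2.x - p0.x) + (p2.y - p0.y) * (p2.y - p0.y);
      return d1 <= tol_sq && d2 <= tol_sq;
    }

  c1 = (p1.x - p0.x) * dy - (p1.y - p0.y) * dx;
  c2 = (p2.x - p0.x) * dy - (p2.y - p0.y) * dx;
  return c1 * c1 <= tol_sq * len_sq && c2 * c2 <= tol_sq * len_sq;
}

static void flatten_recursive (Polyline *out, Point p0, Point p1, Point p2,
                               Point p3, float tolerance, int depth)
{
  Point p01, p12, p23, p012, p123, mid;

  if (depth == 0 || is_flat (p0, p1, p2, p3, tolerance))
    {
      /* Flat enough: emit the end point of this piece. */
      assert (out->n_points < MAX_POINTS);
      out->points[out->n_points++] = p3;
      return;
    }

  /* de Casteljau split at t = 0.5 into two smaller cubics. */
  p01 = midpoint (p0, p1);
  p12 = midpoint (p1, p2);
  p23 = midpoint (p2, p3);
  p012 = midpoint (p01, p12);
  p123 = midpoint (p12, p23);
  mid = midpoint (p012, p123);

  flatten_recursive (out, p0, p01, p012, mid, tolerance, depth - 1);
  flatten_recursive (out, mid, p123, p23, p3, tolerance, depth - 1);
}

/* Writes the start point followed by the end point of every segment. */
void flatten_cubic_bezier (Polyline *out, Point p0, Point p1, Point p2,
                           Point p3, float tolerance)
{
  out->n_points = 0;
  out->points[out->n_points++] = p0;
  flatten_recursive (out, p0, p1, p2, p3, tolerance, MAX_DEPTH);
}
```

The resulting polyline would then be handed to a tessellator, which is the step described above as typically costing far more than the flattening itself.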

Since you note that indeed CoglPath doesn't use OpenGL directly internally I can add a note to say that we've actually considered splitting the CoglPath api out into a cogl-2d sub-library (like we have cogl-pango) because it is very orthogonal utility code with relation to the rest of Cogl. Actually all the drawing done for CoglPath is basically done with public Cogl API.

With regards to using the journal, I think there is some misunderstanding here because the journal is only currently used for rendering rectangles so unless your path is a rectangle then CoglPath drawing doesn't go through the journal.

2) I see cogl transforms the vertices of the journal against the modelview matrix with the CPU. Would it be possible/efficient to let the GPU do this?

The thing to note here is that only rectangles currently go through the journal. Anything else bypasses it and the vertex transform is done fully by the GPU. The problem with trying to use the GPU to do the vertex transform here is that we'd have to update the modelview matrix between each draw command, and just the process of updating the modelview would be many, many times slower than simply transforming 4 vertices for a rectangle, not to mention the cost of uploading/mapping the vertices for the GPU and emitting the command itself. There can be a surprising amount of driver code involved in emitting a separate draw command, but it's not that many instructions to transform 4 points through one matrix.

There is a threshold of geometry complexity to consider here, whereby we can safely say it's too costly to program the GPU to transform one rectangle at a time, but you could imagine that once you start feeding more complex geometry meshes to the GPU, the cost of updating the modelview becomes worthwhile. One of the main aims of the journal is to target primitives that would be categorized as too simple to justify having separate draw calls. The journal tries to minimize the cost of uploading small primitives to the GPU and tries to batch them together so we do less work programming the GPU.
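As a rough illustration of the trade-off described above, the following sketch transforms each rectangle's 4 corners on the CPU and appends them to a shared batch, so rectangles with different modelview matrices can still be submitted in a single draw. It is not the journal's real code: `RectBatch`, `batch_add_rect` and the draw-call counter are hypothetical stand-ins, and the actual upload and draw are only indicated by a comment.

```c
#include <assert.h>
#include <stddef.h>

#define BATCH_MAX_RECTS 256

typedef struct { float x, y, z, w; } Vertex;

/* Column-major 4x4 matrix, as in OpenGL: m[column * 4 + row]. */
typedef struct { float m[16]; } Matrix;

typedef struct {
  Vertex vertices[BATCH_MAX_RECTS * 4];
  size_t n_rects;
  unsigned n_draw_calls; /* stands in for real GPU submissions */
} RectBatch;

Matrix matrix_translation (float tx, float ty)
{
  Matrix t = {{ 1, 0, 0, 0,  0, 1, 0, 0,  0, 0, 1, 0,  tx, ty, 0, 1 }};
  return t;
}

/* Transform the point (x, y, 0, 1): a handful of multiply-adds. */
static Vertex transform_point (const Matrix *modelview, float x, float y)
{
  const float *m = modelview->m;
  Vertex v;
  v.x = m[0] * x + m[4] * y + m[12];
  v.y = m[1] * x + m[5] * y + m[13];
  v.z = m[2] * x + m[6] * y + m[14];
  v.w = m[3] * x + m[7] * y + m[15];
  return v;
}

void batch_init (RectBatch *batch)
{
  batch->n_rects = 0;
  batch->n_draw_calls = 0;
}

void batch_flush (RectBatch *batch)
{
  if (batch->n_rects == 0)
    return;
  /* A real renderer would upload the vertices and emit ONE draw call
   * for all queued rectangles here, with an identity modelview. */
  batch->n_draw_calls++;
  batch->n_rects = 0;
}

void batch_add_rect (RectBatch *batch, const Matrix *modelview,
                     float x1, float y1, float x2, float y2)
{
  Vertex *v;

  if (batch->n_rects == BATCH_MAX_RECTS)
    batch_flush (batch);
  assert (batch->n_rects < BATCH_MAX_RECTS);

  /* Software transform: no matrix state has to be sent to the GPU. */
  v = &batch->vertices[batch->n_rects * 4];
  v[0] = transform_point (modelview, x1, y1);
  v[1] = transform_point (modelview, x2, y1);
  v[2] = transform_point (modelview, x2, y2);
  v[3] = transform_point (modelview, x1, y2);
  batch->n_rects++;
}
```

With this shape, two rectangles drawn under different translations cost eight small transforms and one flush, instead of two matrix updates and two draw commands.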

Although the journal is currently strictly limited to tracking rectangles, the aim is that it will also start to track small CoglPrimitives too, and once we do that we will have to do a more explicit evaluation of this threshold to decide when software transform should be skipped for more complex primitives.

3) There are in general many parts that are implemented at CPU level without taking advantage of any opengl extension or driver in particular. Is that proven to be overall more efficient?

Certainly all the various tricks we play in the journal, such as software transform, software clipping and software read_pixels, have been shown to have a big performance benefit for a good range of Clutter applications on a range of GPUs and platforms. If you are interested in evaluating some of these things more carefully you might want to take a look at some of the COGL_DEBUG options we have. COGL_DEBUG=help will give a fairly descriptive list of options you can play with, but you might be interested in COGL_DEBUG=disable-software-transform, COGL_DEBUG=disable-software-clip and COGL_DEBUG=disable-batching. (I can't remember when I last tried disabling the software transform though, so there is a chance that it might not work any more, sorry if that's the case.) COGL_DEBUG=journal might also be interesting to see what kind of batching is happening in the journal.

As with the notes above about doing software transforms, a key thing to note here is that we only aim to play these tricks and use the CPU over the GPU in cases where it would be too inefficient to be programming the GPU for such tiny primitives, and we would always aim to use the GPU when dealing with more substantial meshes of geometry. Currently all drawing for the CoglAttribute/CoglPrimitive APIs bypasses the journal entirely and it's only if you use the cogl_rectangle_* API that the journal is used.

4) Is there any document comparing the performance of straight opengl over cogl?

No documents, sorry, and it's a bit tricky to evaluate fairly since the programming models have diverged a bit. GL has a global state machine design where it can sometimes be awkward to avoid redundant re-assertion of state when you have many orthogonal components, whereas Cogl encapsulates full state descriptions in objects, but there's work involved with managing those objects.

Recently I've been working on a Cogl backend for Cairo and something that's interesting for me is that there are two existing pure GL backends (one is the upstream cairo-gl and there is also a cairogles backend on code.google.com) that I can compare against. It's still really early days for my work but I'm already seeing really promising results and out-performing both backends on the things I support so far. To be fair though, the backends aren't really rendering in a like-for-like way, although I would say currently I'm quite similar to the cairogles backend. So far I've been quite enjoying the process of developing a Cogl backend for Cairo since it's the first substantial project I've tried to tackle using Cogl beyond supporting Clutter and, touch wood, it seems to be working out pretty nicely for me so far.

I think quite a bit of the complexity that Cogl is able to simplify is the interaction with the window system. So for example dealing with partial updates of the front-buffer which for some applications can have a huge impact on performance. Since there are multiple different extensions to juggle with OpenGL/GLX/EGL etc to achieve this and get the throttling right you probably wouldn't bother trying to support this in a basic GL application, but with Cogl we've started providing a nicely unified framebuffer API with just one feature to query and we'll figure out how to make it work with the right extension internally. Enabling clipped redraws in Clutter can make a huge performance difference for some apps so I think this is quite a good example where Cogl makes it easier to access the performance of your platform.

I think if you follow a pattern with Cogl of creating template pipelines up-front, and then whenever you need to draw something you create a pipeline by copying a template and draw with that, Cogl should stack up well against GL. It's easy to abuse OpenGL (and no doubt without much Cogl documentation it's easy to abuse Cogl too) and those factors are more likely to impact overall performance. E.g. I don't think it would really be possible to measure the cost of Cogl being layered on top of OpenGL, since there's going to be so much code underneath OpenGL that the layering is a drop in the ocean. The bigger concern is that you don't abuse the GPU by asking it to do costly things, and how you structure higher level components.

Cogl is certainly a much younger technology than OpenGL and it's only recently that we've started trying to use it outside of Clutter, so no doubt we have some optimizations here and there we'll need to figure out. Generally speaking though, I think the design we have for tracking state in sparse objects is a pretty decent approach for a GPU API, since I think it'll give us more opportunities than OpenGL has to associate expensive-to-derive state with those objects, and being able to efficiently compare arbitrary state objects to know what GPU state needs updating is perhaps better than the typical dirty-state flags that many GL drivers rely on internally.
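The template-and-copy pattern and the idea of sparse state objects can be sketched in a few lines. This is a toy model under my own assumptions, not how CoglPipeline is actually implemented: the `Pipeline` struct, the two state groups and `pipeline_differences` are hypothetical, and there is no reference counting, so a parent must outlive its copies.

```c
#include <assert.h>
#include <stdlib.h>

enum {
  STATE_COLOR = 1 << 0,
  STATE_BLEND = 1 << 1,
  STATE_ALL   = STATE_COLOR | STATE_BLEND
};

typedef struct Pipeline Pipeline;
struct Pipeline {
  const Pipeline *parent; /* NULL for a root pipeline */
  unsigned overrides;     /* which state groups this object stores itself */
  unsigned color;         /* meaningful only if STATE_COLOR is overridden */
  int blend_enabled;      /* meaningful only if STATE_BLEND is overridden */
};

/* A root pipeline defines every state group with a default value. */
Pipeline *pipeline_new (void)
{
  Pipeline *p = malloc (sizeof *p);
  if (p == NULL)
    return NULL;
  p->parent = NULL;
  p->overrides = STATE_ALL;
  p->color = 0xffffffffu;
  p->blend_enabled = 0;
  return p;
}

/* Copying is cheap: the copy stores nothing but a link to its source. */
Pipeline *pipeline_copy (const Pipeline *source)
{
  Pipeline *p = malloc (sizeof *p);
  if (p == NULL)
    return NULL;
  p->parent = source;
  p->overrides = 0;
  p->color = 0;
  p->blend_enabled = 0;
  return p;
}

/* Find the nearest ancestor (or self) that defines the given state. */
static const Pipeline *authority (const Pipeline *p, unsigned state)
{
  while (!(p->overrides & state))
    {
      assert (p->parent != NULL); /* roots define all state */
      p = p->parent;
    }
  return p;
}

void pipeline_set_color (Pipeline *p, unsigned rgba)
{
  p->color = rgba;
  p->overrides |= STATE_COLOR;
}

void pipeline_set_blend_enabled (Pipeline *p, int enabled)
{
  p->blend_enabled = enabled;
  p->overrides |= STATE_BLEND;
}

unsigned pipeline_get_color (const Pipeline *p)
{
  return authority (p, STATE_COLOR)->color;
}

/* Returns a mask of state groups that differ between a and b. If both
 * resolve a group to the same authority, it is equal without looking
 * at the values at all. */
unsigned pipeline_differences (const Pipeline *a, const Pipeline *b)
{
  const Pipeline *ca = authority (a, STATE_COLOR);
  const Pipeline *cb = authority (b, STATE_COLOR);
  const Pipeline *ba = authority (a, STATE_BLEND);
  const Pipeline *bb = authority (b, STATE_BLEND);
  unsigned diff = 0;

  if (ca != cb && ca->color != cb->color)
    diff |= STATE_COLOR;
  if (ba != bb && ba->blend_enabled != bb->blend_enabled)
    diff |= STATE_BLEND;
  return diff;
}
```

Two copies of the same template compare as identical immediately, and a copy that only overrides its color reports only the color group as needing a GPU update.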

Thank you for all the great work.

Thank you for taking the time to look at Cogl, if you come up with any more questions or any of my reply wasn't clear please say and we'll try our best to help.

kind regards,
- Robert

Luca Bruno

Sep 25, 2011, 6:19:39 PM

to cog...@googlegroups.com

On Sun, Sep 25, 2011 at 11:36 PM, Robert Bragg <robert...@gmail.com> wrote:

Thank you for taking the time to look at Cogl, if you come up with any more questions or any of my reply wasn't clear please say and we'll try our best to help.

Thanks for the quick reply. You totally addressed all of my doubts and cleared up the whole point of cogl for me. I too believe that the OpenGL state machine is somewhat limiting and it's impossible to make something complex without layering an abstraction on top of it. Cogl indeed does a great job as a higher-level and modern GPU API that is still low-level enough to be a backend for clutter and cairo. It's not a simple task to achieve such an API.

About the journal, I've read the code more deeply and finally figured out that only rectangles are batched. That's indeed a killer feature for a toolkit like clutter where allocations are rectangles most of the time. It probably wouldn't be the same for cairo as it doesn't need to read pixels. By the way, as you said, it would be nice (but probably quite difficult) to batch other primitives and evaluate an adaptive threshold for them.
Initially I thought most of the primitives went through the journal, including complex meshes; that's why I naively asked about the performance of the transformations.

In other words (if I understood it correctly), you're trying to stay tied to the new programmable pipeline model: a bunch of vertices, textures and shaders whose state is encapsulated in objects, provide uniform access to the underlying hardware by exploiting the already existing drivers through OpenGL, provide obvious facilities while trying to keep compatibility with older systems (or exploiting certain extensions), then optimize when possible for the real use cases.

Rather than asking more questions for now, I will keep reading the commit log in the hope of contributing something. Thanks again.

Best regards,

--
www.debian.org - The Universal Operating System

Luca Bruno

Sep 26, 2011, 5:01:29 PM

to cog...@googlegroups.com

Hi,
I have another question, about CoglObject. I've noticed it closely resembles GTypeInstance. If cogl doesn't want to use full GObject, why not use something like GstMiniObject?

Robert Bragg

Sep 27, 2011, 4:55:33 PM

to cog...@googlegroups.com

Heh, I think that's quite a loaded question ;-)

I know you skipped asking about gobject exactly but I guess it's worth covering the whole topic to give background to the question...

At least for GObject, I'd have to say, I've come to feel these days that it's somewhat over engineered for our requirements and as a consequence it makes performance compromises. E.g. although signals and properties are an optional aspect of GObject their flexibility does lead to very bad performance when you have to emit signals at the frequency Clutter does let alone the even higher frequency that something like Cogl might desire.

So much of the profiling I end up doing of Clutter apps these days just boils down to the complexity of GObject, sadly, and even though I'm working with a colleague on some patches to optimize GObject signaling, I think generally it's going to take a long time to evolve and optimize GObject to be suitable for Cogl.

Without getting too carried away with attacking gobject performance - which theoretically can be incrementally addressed - I also have some general reservations as to the value of the hierarchical GType system for Cogl...

So far Cogl has been influenced by several existing APIs (most notably, I'd have to say, Cairo, OpenVG, OpenGL and D3D) and for all that we have taken from those and all that we've implemented so far in Cogl, I don't think we've ever really missed having a hierarchical or strictly object oriented type system. Actually I think the minimal constraints we are able to work within afford us quite a lot of freedom to optimize things internally, because our public contracts are perhaps simpler than they would be if we adopted a general purpose object model.

The programming model we have been steering Cogl towards I would rather describe as "Interface Oriented" instead of "Object Oriented" whereby we do have various object types in Cogl but the object types aren't public or important; what is more important is the set of interfaces that an object implements. The most common interface we have is the Object interface which enables ref counting and attaching user data to objects but then we can talk about mostly all other Cogl APIs as interfaces too such as CoglTexture and CoglFramebuffer and CoglBuffer in particular.

One of the key differences internally I think this makes is that abstracting things in terms of interfaces doesn't force you to structure code or state in a particular way, or to share the same implementation of an interface between different objects if it doesn't make sense. For a typical hierarchical, object oriented design the types encapsulate actual code as well as interfaces and state, but that code may compromise performance if it has to be generalized to suit all sub-classes, and often that hierarchy leaks into the public contract, limiting the ability to re-factor code internally.
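As a small illustration of the "Object interface" idea (ref counting plus attached user data, with concrete types kept private), here is a self-contained sketch. It does not reproduce CoglObject: the `Object` struct, the single user-data slot, the `Texture` type and every function name are hypothetical simplifications.

```c
#include <assert.h>
#include <stdlib.h>

typedef void (*DestroyFunc) (void *data);

/* Common header embedded as the FIRST member of every object type. */
typedef struct {
  unsigned ref_count;
  void (*finalize) (void *object); /* type-specific cleanup */
  const void *user_key;            /* one user-data slot, for brevity */
  void *user_data;
  DestroyFunc user_destroy;
} Object;

static void object_init (Object *o, void (*finalize) (void *))
{
  o->ref_count = 1;
  o->finalize = finalize;
  o->user_key = NULL;
  o->user_data = NULL;
  o->user_destroy = NULL;
}

static void drop_user_data (Object *o)
{
  if (o->user_destroy != NULL)
    o->user_destroy (o->user_data);
  o->user_key = NULL;
  o->user_data = NULL;
  o->user_destroy = NULL;
}

void *object_ref (void *object)
{
  ((Object *) object)->ref_count++;
  return object;
}

void object_unref (void *object)
{
  Object *o = object;
  assert (o->ref_count > 0);
  if (--o->ref_count == 0)
    {
      drop_user_data (o);
      o->finalize (o);
    }
}

/* Replaces any previously attached data, destroying it first. */
void object_set_user_data (void *object, const void *key, void *data,
                           DestroyFunc destroy)
{
  Object *o = object;
  drop_user_data (o);
  o->user_key = key;
  o->user_data = data;
  o->user_destroy = destroy;
}

void *object_get_user_data (void *object, const void *key)
{
  Object *o = object;
  return o->user_key == key ? o->user_data : NULL;
}

/* A concrete type: callers only ever see it through functions. */
typedef struct {
  Object base; /* must stay first so a Texture* is also an Object* */
  int width, height;
} Texture;

static void texture_finalize (void *object)
{
  free (object);
}

Texture *texture_new (int width, int height)
{
  Texture *t = malloc (sizeof *t);
  if (t == NULL)
    return NULL;
  object_init (&t->base, texture_finalize);
  t->width = width;
  t->height = height;
  return t;
}

/* Example destroy callback: counts how many times it has run. */
void count_destroy (void *data)
{
  ++*(int *) data;
}
```

Any type that embeds the header first can be passed to `object_ref`, `object_unref` and the user-data functions, without a class hierarchy and without its struct layout becoming part of the public contract.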

Many of the objects we have in Cogl need to be instantiated at an extremely high frequency compared to typical uses of GObject, so we really can't afford to lose control over the entire allocation cycle, and we may even want to use specialized allocators at times. For example, pipelines, primitives and attributes may be allocated on a per-primitive basis in some cases (i.e. per draw command), but since they have fixed sizes and in some cases we can predict their lifetime, we might at times prefer to stack allocate objects in some fashion, which might be awkward to do if we don't fully control the details of our object model.
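One way to picture "stack allocating objects in some fashion" is a per-frame bump allocator: short-lived, fixed-size objects are carved out of one block and all released together. The sketch below is only an assumption about what such an allocator could look like; `FrameArena` and its functions are hypothetical and nothing here is taken from Cogl.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Assumed to be a sufficient alignment for the objects stored. */
#define ARENA_ALIGN 16u

typedef struct {
  unsigned char *base;
  size_t size;
  size_t used;
} FrameArena;

/* Returns 1 on success, 0 if the backing block could not be allocated. */
int arena_init (FrameArena *arena, size_t size)
{
  arena->base = malloc (size);
  arena->size = arena->base != NULL ? size : 0;
  arena->used = 0;
  return arena->base != NULL;
}

/* Bump allocation: round the size up and advance an offset.
 * Returns NULL when the arena is full. */
void *arena_alloc (FrameArena *arena, size_t n)
{
  size_t rounded;
  void *p;

  assert (arena->used <= arena->size);
  if (n > SIZE_MAX - (ARENA_ALIGN - 1))
    return NULL;
  rounded = (n + (ARENA_ALIGN - 1)) & ~(size_t) (ARENA_ALIGN - 1);
  if (rounded > arena->size - arena->used)
    return NULL;

  p = arena->base + arena->used;
  arena->used += rounded;
  return p;
}

/* Releases every object at once, e.g. at the end of a frame. */
void arena_reset (FrameArena *arena)
{
  arena->used = 0;
}

void arena_destroy (FrameArena *arena)
{
  free (arena->base);
  arena->base = NULL;
  arena->size = 0;
  arena->used = 0;
}
```

An allocator like this only works when the library knows each object's size and lifetime, which is the kind of control over the object model being argued for above.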

To be honest I've only ever skimmed through the GstMiniObject code a few times, quite some time ago, so I don't remember all the details of what it offers. So far at least I haven't really felt like our approach has been a limiting factor, and I'd be skeptical about building on something aiming to be a general purpose object model, concerned that compromises may come as a result. The amount of code that makes up GstMiniObject should presumably be small enough that if there's anything neat it does, it wouldn't really be a big deal to cherry-pick the ideas that are useful to us instead of literally sharing the same code.

One thing I've been particularly mindful of when considering Cogl's design is that I don't want to expose too complex of an object model through the public API or even really expose complex data types from glib publicly. I think one of the biggest mistakes imho made with Clutter was to make GObject and a lot of glib a public part of the API, which imho contributes to making it an extremely complex API to learn, because before you can really do anything interesting with Clutter you first have to learn a lot of things about GObject and glib. I think this is something that Cairo really got right and I think Cairo is much more approachable to newcomers than say Gtk or Clutter because it avoids being a leaky abstraction, so you don't have to learn lots of other APIs first. It has been gutting for me to talk with 2 of the biggest software vendors in the world (not referring to Nokia and MeeGo here, which is a different story) who've tried to evaluate Clutter but basically given up due to the complexity, including GObject, but I can definitely sympathize and so I'd like to avoid being burned again. That isn't to say I don't think GObject should ever be used, but personally I would prefer to keep it an implementation detail where possible.

Ok, this has turned into a fairly long and rambling reply to your innocent question :-P

At the end of the day I don't want to rule anything out, such as for example inheriting (maybe optionally) from GTypeInstance to perhaps make it easier for us to take advantage of GObject introspection, and maybe actually GstMiniObject does have some compelling features to warrant considering that too, but I figured I'd give a broader summary of how we've been discussing this topic up until now, since you kind-of asked :-)

If you do have some ideas for what we might get from something like GstMiniObject please do say!

kind regards,
- Robert

Luca Bruno

Sep 27, 2011, 5:16:53 PM

to cog...@googlegroups.com

On Tue, Sep 27, 2011 at 10:55 PM, Robert Bragg <robert...@gmail.com> wrote:

Heh, I think that's quite a loaded question ;-)

I know you skipped asking about gobject exactly but I guess it's worth covering the whole topic to give background to the question...

Ahah, no I understand that GObject isn't suitable for such a low-level library :-)
My question was exactly referring to GstMiniObject, i.e. a cheap GTypeInstance.

One thing I've been particularly mindful of when considering Cogl's design is that I don't want to expose too complex of an object model through the public API or even really expose complex data types from glib publicly.

That is reasonable, in fact a possible GTypeInstance could in theory (not tested) still be internal, i.e. keep the structs private while allowing subclasses internally and at the same time have a gobject-introspectable API.

I think this is something that Cairo really got right and I think Cairo is much more approachable to newcomers than say Gtk or Clutter because it avoids being a leaky abstraction, so you don't have to learn lots of other APIs first. It has been gutting for me to talk with 2 of the biggest software vendors in the world (not referring to Nokia and MeeGo here, which is a different story) who've tried to evaluate Clutter but basically given up due to the complexity, including GObject, but I can definitely sympathize and so I'd like to avoid being burned again. That isn't to say I don't think GObject should ever be used, but personally I would prefer to keep it an implementation detail where possible.

That's sad, but I agree to some extent that GObject is more suited to higher-level code, although I don't think learning its basics is that hard.

At the end of the day I don't want to rule anything out, such as for example inheriting (maybe optionally) from GTypeInstance to perhaps make it easier for us to take advantage of GObject introspection, and maybe actually GstMiniObject does have some compelling features to warrant considering that too, but I figured I'd give a broader summary of how we've been discussing this topic up until now, since you kind-of asked :-)

If you do have some ideas for what we might get from something like GstMiniObject please do say!

I got the point that cogl, at least at this stage of development (and probably in the future), is not meant to expose a public API for subclassing. My point about GTypeInstance is that it's basically what you already do with cogl objects, except that you could share the framework if GType is not that big an overhead, while still keeping structs private and allowing subclasses internally.
While GStreamer has lots of subclasses and a public API for that, cogl indeed doesn't need it, but I thought that using GTypeInstance without GObject wouldn't be any slower.

Thanks a lot for the answer.

Robert Bragg

Sep 28, 2011, 9:25:53 AM

to cog...@googlegroups.com

Yeah, it's certainly something that's been suggested a few times and it seems like it could be worthwhile for the sake of being able to take advantage of GObject Introspection more than we do currently, and I can't see that GTypeInstance could really have any impact.

Probably at the moment there would be a few fiddly details surrounding CoglTexture, where we know we have work to do to clean up how we handle different backend texture types, but otherwise it should be fairly straightforward work.

kind regards,
- Robert
