Hi,
On Sunday, September 25, 2011 5:34:46 PM UTC+1, Lethalman wrote:
Hello,
I'm really interested in the cogl project and I'm trying to dig into reading the code and looking toward cogl2.
that's great
I have a couple of questions about some design choices and performance (I don't have deep knowledge of either 3d graphics or cogl):
1) Cogl path is an interesting component, but I see it does not call underlying opengl. For example you implement your own evaluator for bezier curves so that you can take advantage of the journal. How much does it affect/improve the performance?
OpenGL evaluators are a deprecated OpenGL utility API (for an overview of deprecated functionality in OpenGL, see http://www.opengl.org/wiki/History_of_OpenGL#Deprecation_Model) and they are also not available with GLES 1/2, which is used on mobile platforms. Also, since they are only a utility that is almost certainly handled on the CPU by most, if not all, existing drivers, there wouldn't really be much value in depending on multiple hardware vendors for this functionality. It would add to the variability between platforms compared to just re-using a single portable solution across all platforms.
For the performance, I can't say I've ever tried using the OpenGL evaluator API, but from what I vaguely recall from comparing with Cairo, I think the recursive division approach we use works quite well, with very similar performance to Cairo. From my experience, although the bezier flattening can certainly show up in profiles, the cost of tessellation is typically far higher. The reason we decided to expose retained path support with the CoglPath API is so we can retain the tessellated triangle mesh for a given path, so ideally you only have to pay the cost of bezier evaluation and tessellation once.
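To make the recursive division idea concrete, here is a minimal sketch in Python. It is not Cogl's actual code: the function names, the flatness test and the default tolerance are all assumptions chosen for illustration. It splits a cubic bezier in half with de Casteljau's construction until each piece is close enough to a straight line.

```python
import math


def _mid(a, b):
    """Midpoint of two 2D points."""
    return ((a[0] + b[0]) / 2.0, (a[1] + b[1]) / 2.0)


def _distance_from_chord(p, a, b):
    """Distance of p from the infinite line through a and b.

    This is a simplification: it ignores control points that overshoot
    the ends of the chord.
    """
    dx, dy = b[0] - a[0], b[1] - a[1]
    length = math.hypot(dx, dy)
    if length == 0.0:
        return math.hypot(p[0] - a[0], p[1] - a[1])
    return abs(dx * (p[1] - a[1]) - dy * (p[0] - a[0])) / length


def flatten_cubic(p0, p1, p2, p3, tolerance=0.25, max_depth=16):
    """Return the points of a polyline approximating the curve (p0 excluded).

    The tolerance and depth limit are illustrative values, not Cogl's.
    """
    flat = (_distance_from_chord(p1, p0, p3) <= tolerance and
            _distance_from_chord(p2, p0, p3) <= tolerance)
    if flat or max_depth == 0:
        return [p3]
    # de Casteljau split at t = 0.5 into two smaller cubic curves.
    p01, p12, p23 = _mid(p0, p1), _mid(p1, p2), _mid(p2, p3)
    p012, p123 = _mid(p01, p12), _mid(p12, p23)
    p0123 = _mid(p012, p123)
    return (flatten_cubic(p0, p01, p012, p0123, tolerance, max_depth - 1) +
            flatten_cubic(p0123, p123, p23, p3, tolerance, max_depth - 1))
```

The resulting polyline is what would then be handed to a tessellator, which is the step that typically dominates the cost and the one worth caching.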
Since you note that CoglPath doesn't use OpenGL directly internally, I can add that we've actually considered splitting the CoglPath API out into a cogl-2d sub-library (like we have cogl-pango) because it is utility code that is very orthogonal to the rest of Cogl. In fact, all the drawing done for CoglPath is basically done with public Cogl API.
With regard to using the journal, I think there is some misunderstanding here: the journal is currently only used for rendering rectangles, so unless your path is a rectangle, CoglPath drawing doesn't go through the journal.
2) I see cogl transforms the vertices of the journal against the modelview matrix with the CPU. Would it be possible/efficient to let the GPU do this?
The thing to note here is that only rectangles currently go through the journal - anything else bypasses it and the vertex transform is done fully by the GPU. The problem with trying to use the GPU to do the vertex transform here is that we'd have to update the modelview matrix between each draw command, and just the process of updating the modelview would be many, many times slower than simply transforming 4 vertices for a rectangle, not to mention the cost of uploading/mapping the vertices for the GPU and emitting the command itself. There can be a surprising amount of driver code involved in emitting a separate draw command, but it's not that many instructions to transform 4 points through one matrix.
There is a threshold of geometry complexity to consider here: we can safely say it's too costly to program the GPU to transform one rectangle at a time, but you could imagine that once you start feeding more complex geometry meshes to the GPU, the cost of updating the modelview becomes worthwhile. One of the main aims of the journal is to target primitives that would be categorized as too simple to justify having separate draw calls. The journal tries to minimize the cost of uploading small primitives to the GPU and tries to batch them together so we do less work programming the GPU.
Although the journal is currently strictly limited to tracking rectangles, the aim is that it will also start to track small CoglPrimitives, and once we do that we will have to do a more explicit evaluation of this threshold to decide when software transform should be skipped for more complex primitives.
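As a rough illustration of that trade-off, here is a toy model in Python. It is only a sketch of the idea, not how the journal is implemented: the `Journal` class, its methods and the `draw` callback are hypothetical names. Each logged rectangle has its 4 corners transformed on the CPU, so rectangles with different modelview matrices can share one vertex array and one draw call.

```python
def translation(tx, ty):
    """Build a 4x4 row-major translation matrix (helper for the example)."""
    return [[1, 0, 0, tx],
            [0, 1, 0, ty],
            [0, 0, 1, 0],
            [0, 0, 0, 1]]


def transform_point(m, x, y):
    """Apply a 4x4 row-major matrix to the point (x, y, 0, 1)."""
    return tuple(m[r][0] * x + m[r][1] * y + m[r][3] for r in range(4))


class Journal:
    """Hypothetical batcher: stores pre-transformed rectangle corners."""

    def __init__(self):
        self.vertices = []

    def log_rectangle(self, modelview, x1, y1, x2, y2):
        # Software transform: 4 matrix-vector products per rectangle,
        # done here instead of re-programming the modelview per draw.
        for x, y in ((x1, y1), (x1, y2), (x2, y2), (x2, y1)):
            self.vertices.append(transform_point(modelview, x, y))

    def flush(self, draw):
        # One draw call for everything logged since the last flush.
        if self.vertices:
            draw(self.vertices)
            self.vertices = []


draw_calls = []
journal = Journal()
journal.log_rectangle(translation(10, 20), 0, 0, 5, 5)
journal.log_rectangle(translation(100, 0), 0, 0, 5, 5)
journal.flush(draw_calls.append)
```

Here two rectangles with different transforms end up in a single batch of 8 vertices. A real implementation also has to split batches when other state (such as the texture or blend mode) changes, which this sketch leaves out.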
3) There are in general many parts that are implemented at CPU level without taking advantage of any opengl extension or driver in particular. Is that proven to be overall more efficient?
Certainly the various tricks we play in the journal, such as software transform, software clipping and software read_pixels, have been shown to have a big performance benefit for a good range of Clutter applications on a range of GPUs and platforms. If you are interested in evaluating some of these things more carefully, you might want to take a look at some of the COGL_DEBUG options we have. COGL_DEBUG=help will give a fairly descriptive list of options you can play with, but you might be interested in COGL_DEBUG=disable-software-transform, COGL_DEBUG=disable-software-clip and COGL_DEBUG=disable-batching. (I can't remember when I last tried disabling the software transform though, so there is a chance that it might not work any more; sorry if that's the case.) COGL_DEBUG=journal might also be interesting, to see what kind of batching is happening in the journal.
As with the notes above about doing software transforms, a key thing to note here is that we only aim to play these tricks and use the CPU over the GPU in cases where it would be too inefficient to program the GPU for such tiny primitives, and we would always aim to use the GPU when dealing with more substantial meshes of geometry. Currently all drawing for the CoglAttribute/CoglPrimitive APIs bypasses the journal entirely, and it's only if you use the cogl_rectangle_* API that the journal is used.
4) Is there any document comparing the performance of straight opengl over cogl?
No documents, sorry, and it's a bit tricky to evaluate fairly since the programming models have diverged a bit. GL has a global state machine design, where it can sometimes be awkward to avoid redundant re-assertion of state when you have many orthogonal components, whereas Cogl encapsulates full state descriptions in objects, but there's work involved with managing those objects.
Recently I've been working on a Cogl backend for Cairo, and something that's interesting for me is that there are two existing pure GL backends (one is the upstream cairo-gl and there is also a cairogles backend on code.google.com) that I can compare against. It's still really early days for my work but I'm already seeing really promising results and out-performing both backends on the things I support so far. To be fair though, the backends aren't really rendering in a like-for-like way, although I would say currently I'm quite similar to the cairogles backend. So far I've been quite enjoying the process of developing a Cogl backend for Cairo since it's the first substantial project I've tried to tackle using Cogl beyond supporting Clutter and, touch wood, it seems to be working out pretty nicely for me so far.
I think quite a bit of the complexity that Cogl is able to simplify is the interaction with the window system, for example dealing with partial updates of the front-buffer, which for some applications can have a huge impact on performance. Since there are multiple different extensions to juggle with OpenGL/GLX/EGL etc. to achieve this and get the throttling right, you probably wouldn't bother trying to support this in a basic GL application, but with Cogl we've started providing a nicely unified framebuffer API with just one feature to query, and we'll figure out how to make it work with the right extension internally. Enabling clipped redraws in Clutter can make a huge performance difference for some apps, so I think this is quite a good example of where Cogl makes it easier to access the performance of your platform.
I think if you follow a pattern with Cogl of creating template pipelines up-front, and then whenever you need to draw something you create a pipeline by copying a template and drawing with that, Cogl should stack up well against GL. It's easy to abuse OpenGL (and no doubt without much Cogl documentation it's easy to abuse Cogl too) and those factors are more likely to impact overall performance. I don't think it would really be possible to measure the cost of Cogl being layered on top of OpenGL, since there's going to be so much code underneath OpenGL that the layering is a drop in the ocean. The bigger concerns are that you don't abuse the GPU by asking it to do costly things, and how you structure higher level components.
Cogl is certainly a much younger technology than OpenGL and it's only recently that we've started trying to use it outside of Clutter, so no doubt we have some optimizations here and there we'll need to figure out. Generally speaking though, I think the design we have for tracking state in sparse objects is a pretty decent approach for a GPU API, since I think it'll give us more opportunities than OpenGL has to associate expensive-to-derive state with those objects, and being able to efficiently compare arbitrary state objects to know what GPU state needs updating is perhaps better than the typical dirty-state flags that many GL drivers rely on internally.
Thank you for all the great work.
Thank you for taking the time to look at Cogl. If you come up with any more questions, or if any part of my reply wasn't clear, please say so and we'll try our best to help.
kind regards,
- Robert