Minimizing glBegin/glEnd not always optimal, was: serious optimization


Joachim Schröer

May 15, 2002, 7:08:40 AM
Hello Anders,

I'm a hobby OpenGL programmer. I have a nice spare-time project where I
try to optimize rendering performance: my own version of XEarth.
Noticing that normal OpenGL texturing limits the size of the bitmap
you can map (gluScaleImage may use a bitmap of at most 1024 x 1024
pixels), I wrote a simple texturing routine for a sphere in order to use
my really nice earth bitmap of 1060 x 2180 pixels in full detail. I
generate a quad for each pixel (except for a triangle strip around the
poles). I render only the visible pixels, which makes up about 1.2
million polygons.

There are two modes: one shows the earth as seen from the current position
of the sun, and one uses a fixed position and shows the current
day/night zones using a light source for the sun. In the mode without
lighting your observation is perfectly true: minimizing the number of
glBegin/glEnd calls, e.g. by using quad strips, significantly reduces
frame time. But the opposite is true when lighting is used. I have no idea
why. Here is my configuration:

NT 4.0, Intel Celeron / Pentium III, 1 GHz, 512 MB mem.
Nvidia TNT2 M128, 16 MB

Here are 2 Ada95 code fragments:

---------------------------------------------------------------------------
procedure Render_Quad_Strip(Col : in Positive) is
   -- This is faster on an unlit sphere.
   Last : constant Positive := Positive'Min(The_Model.Last_Lat(Col),
                                            The_Model.Vertices(1)'Last);
begin
   if Last >= The_Model.First_Lat(Col) then
      Gl.Glbegin(Gl.Gl_Quad_Strip);
      for I in The_Model.First_Lat(Col) .. Last loop
         -- Render only visible pixels.
         Set_Color(The_Model.Pixels(I + 1, Col));
         Set_Vertex(The_Model.Vertices(2)(I));
         Set_Vertex(The_Model.Vertices(1)(I));
      end loop;
      Gl.Glend;
   end if;
end Render_Quad_Strip;
---------------------------------------------------------------------------

procedure Render_Quads(Col : in Positive) is
   -- This is faster on a lit sphere.
begin
   -- Gl.Glbegin(Gl.Gl_Quads);
   for I in Positive'Max(The_Model.First_Lat(Col), 2) ..
            Positive'Min(The_Model.Last_Lat(Col),
                         The_Model.Pixels'Last(1) - 1) loop
      -- Render only visible pixels.
      Set_Normal  (The_Model.Normals(I));     -- Glnormal3fv
      Set_Material(The_Model.Pixels(I, Col)); -- Glmaterialfv
      Set_Color   (The_Model.Pixels(I, Col)); -- Glcolor3ub

      Gl.Glbegin(Gl.Gl_Quads);
      Set_Vertex(The_Model.Vertices(1)(I - 1)); -- Glvertex3fv
      Set_Vertex(The_Model.Vertices(1)(I));
      Set_Vertex(The_Model.Vertices(2)(I));
      Set_Vertex(The_Model.Vertices(2)(I - 1));
      Gl.Glend;
   end loop;
   -- Gl.Glend;
end Render_Quads;
---------------------------------------------------------------------------

These routines render a slice of up to 1080 quads from north to south
pole with a width of 360° / 2160. They are called 2160 times to build
the globe.
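
As a rough sanity check on these figures, the quad count can be worked out from the slice geometry. The sketch below assumes that about half of the sphere faces the viewer; that is an assumption for illustration, not something stated in the post, and the function names are made up.

```c
/* Total quads on the full globe: number of slices times quads per slice. */
long total_quads(long slices, long quads_per_slice)
{
    return slices * quads_per_slice;
}

/* Assumption: roughly half of a sphere is visible from any one viewpoint. */
long visible_quads(long slices, long quads_per_slice)
{
    return total_quads(slices, quads_per_slice) / 2;
}
```

For 2160 slices of 1080 quads this gives 2,332,800 quads in total and 1,166,400 visible ones, which is in line with the "about 1.2 million" quoted above.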
After reading your post I tried a version of Render_Quads with
Glbegin/Glend outside the loop. The frame times for the complete globe
consisting of ~1.2 million quads are:

glBegin/glEnd outside loop: 6000 msec.
glBegin/glEnd inside loop: 4150 msec.

P.S. Display lists cannot be used. With this number of polygons they
consume more memory than I have (512 MB).

Best regards
J. Schröer


--------------------------------------
Hmm. This is great. He's asking for a solution to a problem vertex
arrays didn't solve, and for which display lists are not an option. And
still the first two replies are: use display lists and use vertex arrays!
That's really helpful [;-)]

Anyway, back to the real problem:
Doing a glBegin / glEnd for every polygon is extremely slow. You need to
first draw all triangles with _one_ glBegin / glEnd, and then all quads
the same way.
A simple fix would be like this:

glBegin (GL_TRIANGLES);    /* one glBegin/glEnd pair for all triangles */
for (i = 0; i < numPolys; i++)
{
    if (Polys[i].type == QUAD)
    {
        continue;
    }

    for (j = 0; j < 3; j++)
    {
        temp_index = Polys[i].indices[j];
        glColor3f (Verts[temp_index].r, Verts[temp_index].g,
                   Verts[temp_index].b);
        glVertex3f (Verts[temp_index].x, Verts[temp_index].y,
                    Verts[temp_index].z);
    }
}
glEnd();

glBegin (GL_QUADS);        /* one glBegin/glEnd pair for all quads */
for (i = 0; i < numPolys; i++)
{
    if (Polys[i].type == TRI)
    {
        continue;
    }
    for (j = 0; j < 4; j++)
    {
        temp_index = Polys[i].indices[j];
        glColor3f (Verts[temp_index].r, Verts[temp_index].g,
                   Verts[temp_index].b);
        glVertex3f (Verts[temp_index].x, Verts[temp_index].y,
                    Verts[temp_index].z);
    }
}
glEnd();

If this isn't enough, unroll the inner loops, or even better, make two
vertex arrays: one with the quads, and one with the triangles, then draw
the entire triangle array and then the entire quad array.

Anders
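
The "two vertex arrays" idea can be sketched as a one-time pass that sorts the polygon indices into a triangle index list and a quad index list. This is only a minimal sketch: the `struct poly` layout, the `enum poly_type` values and the name `split_indices` are hypothetical stand-ins, since the thread does not show the real declarations.

```c
#include <stddef.h>

/* Hypothetical stand-ins for the thread's data; the real layout is not shown. */
enum poly_type { TRI, QUAD };

struct poly {
    enum poly_type type;
    unsigned int indices[4];    /* only the first 3 are used for TRI */
};

/*
 * Copies the vertex indices of all triangles into tri_out and those of all
 * quads into quad_out. The caller must size tri_out for 3 * num_polys and
 * quad_out for 4 * num_polys entries (worst case). The number of indices
 * written is returned through tri_count and quad_count.
 */
void split_indices(const struct poly *polys, size_t num_polys,
                   unsigned int *tri_out, size_t *tri_count,
                   unsigned int *quad_out, size_t *quad_count)
{
    size_t i, j, nt = 0, nq = 0;

    for (i = 0; i < num_polys; i++) {
        if (polys[i].type == TRI) {
            for (j = 0; j < 3; j++)
                tri_out[nt++] = polys[i].indices[j];
        } else {
            for (j = 0; j < 4; j++)
                quad_out[nq++] = polys[i].indices[j];
        }
    }
    *tri_count = nt;
    *quad_count = nq;
}
```

With vertex and color arrays set up through glVertexPointer and glColorPointer, the two lists could then be drawn with one glDrawElements call each (GL_TRIANGLES and GL_QUADS, index type GL_UNSIGNED_INT). Changing vertex colors interactively would only touch the color array, not the index lists.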


Jason Allen wrote:
Use vertex arrays.

"A L" <zaina...@hotmail.com> wrote in message
news:b0f8767d.02051...@posting.google.com...

Hey everyone, I'm experiencing unacceptable (or unjustifiable)
framerates with a 70,000 poly scene. I'm running an Athlon XP 2000, 1
gig RAM, GeForce 4 Ti, so this model should fly. It certainly does in
the 3d app I used to make it. I'm pretty green with OpenGL, so if you
could suggest any improvements (and I'm sure you can), please, please
let me know...

I'm reading in vertex info/colors and poly vertex indices from a file,
then running through as follows:


for (i = 0; i < numPolys; i++) {

    if (Polys[i].type == TRI) {
        tempint = 3;
        glBegin (GL_TRIANGLES);
    } else if (Polys[i].type == QUAD) {
        tempint = 4;
        glBegin (GL_QUADS);
    }

    for (j = 0; j < tempint; j++) {

        temp_index = Polys[i].indices[j];

        glColor3f (Verts[temp_index].r, Verts[temp_index].g,
                   Verts[temp_index].b);

        glVertex3f (Verts[temp_index].x, Verts[temp_index].y,
                    Verts[temp_index].z);
    }

    glEnd();
}

I'm also using GLUT_DOUBLE, DEPTH_TEST, CULL_FACE, and SwapBuffers.
I've tried using two vertex arrays instead (one for coords, one for
color), but got no improvement. Display lists are not an option,
since I need to change the color of the vertices interactively. Any
ideas?

Much thanks!!

Anders Brodersen

May 15, 2002, 7:56:42 AM
Interesting. I must admit I find it very hard to believe that calling
glBegin and glEnd 1.2 million times each frame can reduce rendering
time. Unless your driver somehow optimizes calls to glNormal, glColor or
glMaterial outside glBegin/glEnd blocks.
That is sort of the only explanation I can come up with, although I
don't really believe in it myself. Could you do me a favor and move the
Gl.Glbegin (the one inside the loop) up before Set_Normal, and then try
again? It shouldn't be any different, but please try anyway, and let me
know.

Does anyone else have a better explanation?

Anders

Joachim Schröer

May 15, 2002, 8:27:55 AM
Anders Brodersen wrote:

> Interesting. I must admit I find it very hard to believe that calling
> glBegin and glEnd 1.2 million times each frame can reduce rendering
> time. Unless your driver somehow optimizes calls to glNormal, glColor or
> glMaterial outside glBegin/glEnd blocks.
> That is sort of the only explanation I can come up with, although I
> don't really believe in it myself. Could you do me a favor and move the
> Gl.Glbegin (the one inside the loop) up before Set_Normal, and then try
> again? It shouldn't be any different, but please try anyway, and let me
> know.
>
> Does anyone else have a better explanation?
>
> Anders
>


OK, I've tried your variant and rerun the 2 other versions. The numbers:

glBegin/glEnd outside loop                           : 6000 msec.
glBegin/glEnd in loop,glNormal,glColor,glMaterial out: 4150 msec.
glBegin/glEnd in loop,glNormal,glColor,glMaterial in : 5200 msec.

It has an influence. It's amazing how large these differences are.
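
To put these frame times in perspective, they can be converted into an average cost per quad. The helpers below are a small sketch with made-up names; the quad count of 1.2 million is the approximate figure from the thread, so the results are rough.

```c
/* Average time per quad in microseconds, for a frame time given in msec. */
double usec_per_quad(double frame_msec, double quads)
{
    return frame_msec * 1000.0 / quads;
}

/* How many times slower one variant is than another. */
double slowdown(double slow_msec, double fast_msec)
{
    return slow_msec / fast_msec;
}
```

With 6000 msec and about 1.2 million quads this is roughly 5.0 microseconds per quad, against about 3.5 microseconds for the 4150 msec variant, a factor of about 1.45.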

Achim

>> -- Gl.Glbegin(Gl.Gl_Quads); -- 3. option

Anders Brodersen

May 15, 2002, 8:47:25 AM
I must admit that I'm getting very confused. It seems to me that the
more Gl commands you put in a glBegin/glEnd block, the slower it gets,
and that the increase is more than linear! Could this somehow be a
cache problem? Or maybe a driver issue? How old are your drivers?

Anders

Joachim Schröer

May 15, 2002, 11:28:11 AM
Anders Brodersen wrote:

> I must admit that I'm getting very confused. It seems to me that the
> more Gl commands you put in a glBegin/glEnd block, the slower it gets,
> and that the increase is more than linear! Could this somehow be a
> cache problem? Or maybe a driver issue? How old are your drivers?
>


This is my driver info (no idea what it means):
Version: 4.00.1381.0631, 4.0.0
nv4_mini.sys, nv4_disp.dll

This is my PC at work, where I run the program as a kind of screen saver.
The PC is an 8-month-old Dell.

The SW was developed at home on a slightly slower PC (1 year old) running
Win2k. I noticed this runtime behaviour there, and I'm sure your third
variant will also fall between the 2 others on that machine.

As I said, I assume this must have something to do with lighting and
material properties.

Achim

Anders Brodersen

May 15, 2002, 11:45:39 AM
Well, at least I can tell that your drivers are very old. You should try
getting new ones from the nVidia web site. (As you might have guessed, I
do not like this situation at all. I would really like it to have a
different explanation from what we've come up with so far!)

Ruud van Gaal

May 15, 2002, 12:29:18 PM
On Wed, 15 May 2002 17:45:39 +0200, Anders Brodersen <r...@daimi.au.dk>
wrote:

>Well, at least I can tell that your drivers are very old.

...


>> This is my driver info (no idea what it means):
>> Version: 4.00.1381.0631, 4.0.0

I'm at 6.13.10.2311 (23.11 drivers). They may not be official,
although I think they are. Otherwise you can get 21.83 or something at
least.

Although most improvements only relate to Geforce* cards.


Ruud van Gaal
Free car sim: http://www.racer.nl/
Pencil art : http://www.marketgraph.nl/gallery/

Joachim Schröer

May 15, 2002, 5:45:19 PM
Hello,

did you see the Champions League final? Leverkusen again won the prize for
nice play but lost the game, and Zidane is really great.

Ok, here is my home PC configuration for reference:

Intel Pentium 3, 866 MHz, 124 MB memory, Win2k
Graphics card: MSI MS-8808 (Nvidia TNT2 M64)
Chip: Riva TNT2 Model 64, 32 MB
Version: 2.05.2002
Driver version: 5.3.2.0
nv4_disp.dll, display driver, Version 5.12.01.0650
nvoglnt.dll, OpenGL client driver, Version ditto.
nv4_mini.sys, nvcpl.dll, nvqtwk.dll, nvdesk.dll, all the same Version.

Here are the frame times for the 3 code variants:

glBegin/glEnd outside loop                           : 8720 msec.
glBegin/glEnd in loop,glNormal,glColor,glMaterial out: 5800 msec.
glBegin/glEnd in loop,glNormal,glColor,glMaterial in : 6820 msec.

By the way, what's so strange about this behaviour? I was only surprised by the
fact that with lighting enabled, quads are faster than quad strips.
For quads I first tested the case with as much as possible outside the
glBegin/glEnd pair, and was not surprised that this was faster.
Achim

P.S. If you want you can have the sources and/or the exe. It's based on a
complete Ada95 OO-framework above OpenGL. (In a former job I developed
OpenGL based instrument display SW for flight simulators.) I will put the
code on the Adapower server anyway in 2-3 weeks time. I'm still playing with
some great-circle algorithms. A one year old version without this self made
sphere texturing is on
www.adapower.com/schroer

"Anders Brodersen" <r...@daimi.au.dk> schrieb im Newsbeitrag
news:3CE282A3...@daimi.au.dk...

fungus

May 16, 2002, 12:17:30 AM
Anders Brodersen wrote:
>
> Interesting. I must admit I find it very hard to believe that calling
> glBegin and glEnd 1.2 million times each frame can reduce rendering
> time.

Why?

Imagine you have a graphics card which can draw 25 million
triangles/sec, and a 1GHz CPU (an average GeForce2 machine).

For each triangle drawn, the CPU has 40 clock cycles.

40 clock cycles isn't much for calling several functions
(glBegin(), three calls to glNormal()/glVertex(), glEnd())
and also setting up all those data transfers to the graphics
card.

People still expect it to happen though....because
it says "25 million" on the back of the box.
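
The cycle budget above is simple division, sketched here for reference. The function names are made up, and the count of 8 immediate-mode calls per triangle (glBegin, three glNormal, three glVertex, glEnd) is one reading of the list in the post, so treat it as an assumption.

```c
/* CPU clock cycles available per triangle if the CPU has to feed the card
   at its advertised peak triangle rate. */
long cycles_per_triangle(long cpu_hz, long triangles_per_sec)
{
    return cpu_hz / triangles_per_sec;
}

/* Cycles left for each API call, given a budget and a call count. */
long cycles_per_call(long cycle_budget, long calls_per_triangle)
{
    return cycle_budget / calls_per_triangle;
}
```

A 1 GHz CPU and 25 million triangles per second give 40 cycles per triangle; spread over 8 calls that leaves 5 cycles per call, before any data is actually transferred.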

--
<\___/>
/ O O \
\_____/ FTB.

Amit Rao kt

May 16, 2002, 3:18:45 AM
Hi all,
the explanation of why one variant is optimised and the other is not is
really simple. For that, let's go over some of the basics...

1. glBegin and glEnd mark the start and end of your input to the video
card and hence keep a sort of direct link open to the card. This time
should be kept as short as possible to keep the card working at max
performance.

2. For optimization purposes OpenGL doesn't consider the glNormal and
glMaterial properties when in unlit mode.

3. Calling glBegin and glEnd over and over would also cause a
performance loss.

So, putting all three together...
glBegin/glEnd outside the loop for the lit sphere is not a good idea, as
the additional calculations for glNormal and glMaterial properties
with lighting cause the time between glBegin and glEnd to be
quite large. This causes the card to wait pointlessly for the channel
to close before rendering and thus causes a slowdown.

On the other hand, for the unlit sphere the glNormal and glMaterial calls
don't come into the picture and the solution is hence faster.

The rest I believe is self-explanatory.

(I recently gave a seminar on OpenGL and 3D realtime optimization. If
you want, you can mail me for the slides, though I am hoping to put
them up on the web soon.)

hope that helps,
amit rao kt


"Joachim Schröer" <joachim....@web.de> wrote in message news:<abuks7$lb56u$1...@ID-76083.news.dfncis.de>...