glDrawArray vs glVertex data when compiling a displaylist


Red15

Feb 8, 2009, 9:18:51 PM
to pyglet-users
This might be a bit out of place here, but I was wondering: when you
compile a display list, does the number of calls to glVertex3f (or
other vertex data calls) have an impact on the time it takes to execute
that display list afterwards?

And if so, would using glDrawArrays significantly reduce that time?

I'm having an issue where calling a display list (containing about 70
faces) 100 times drops my frame rate into the low 20s on a
GeForce 9800X2. CPU usage is also a measly 40-50%, so the processor
shouldn't be the bottleneck, and since I'm using glCallList it
shouldn't send all the vertex data over the pipeline each frame,
right?

Tristam MacDonald

Feb 8, 2009, 10:29:34 PM
to pyglet...@googlegroups.com
On Sun, Feb 8, 2009 at 9:18 PM, Red15 <red1...@gmail.com> wrote:

> Might be a bit out of place here but I was wondering when you are
> compiling a displaylist does the amount of calls to glVertex3f (or
> other data) have an impact on the time it takes to execute that
> display list afterwards ?

A display list just executes whatever commands it was compiled with (although it may do a few driver-specific optimisations under the hood).

> And if so would using glDrawArray make a significant reduction in that
> time ?

The immediate-mode functions (glBegin/glEnd/glVertex and related calls) are deprecated, and the newer interfaces, in particular glDrawElements, perform much better these days.
 
> I'm having an issue where calling a displaylist (containing about 70
> faces) a 100 times is dropping my framerate in the low 20's on a
> GeForce 9800X2 .... Cpu usage is also a measly 40-50% so it shouldn't
> be bottlenecking the processor and since I'm using glCallList it
> shouldn't send all the vertex data over the pipeline each frame
> right?

Depending on driver optimisations, glCallList may well send all that data over the bus - I would suggest moving to vertex buffers and glDrawElements. Pyglet provides a handy wrapper for this: vertex_list_indexed.

--
Tristam MacDonald
http://swiftcoder.wordpress.com/

Red15

Feb 9, 2009, 9:26:19 AM
to pyglet-users
On 9 feb, 04:29, Tristam MacDonald <swiftco...@gmail.com> wrote:
> On Sun, Feb 8, 2009 at 9:18 PM, Red15 <red15...@gmail.com> wrote:
>
> > Might be a bit out of place here but I was wondering when you are
> > compiling a displaylist does the amount of calls to glVertex3f (or
> > other data) have an impact on the time it takes to execute that
> > display list afterwards ?
>
> A display list just executes whatever commands it was compiled with
> (although it may do a few driver-specific optimisations under the hood).
>
> And if so would using glDrawArray make a significant reduction in that
>
> > time ?
>
> The immediate mode (glBegin/glEnd/glVertex and related functions) are
> deprecated, and the newer interfaces, in particular glDrawElements, perform
> much better these days.
>

I guess it will be worth a shot then, but unfortunately it will
require a rather difficult rewrite to make sure I can group
all triangles and quads into their own respective arrays.
The data I'm reading comes from a Wavefront OBJ file, and thus
triangles and quads are mixed without much regard.

> > I'm having an issue where calling a displaylist (containing about 70
> > faces) a 100 times is dropping my framerate in the low 20's on a
> > GeForce 9800X2 .... Cpu usage is also a measly 40-50% so it shouldn't
> > be bottlenecking the processor and since I'm using glCallList it
> > shouldn't send all the vertex data over the pipeline each frame
> > right?
>
> Depending on driver optimisations, glCallList may well send all that data
> over the bus - I would suggest moving to vertex buffers and glDrawElements.
> Pyglet provides a handy wrapper for this: vertex_list_indexed.

Would there be any performance reason to use pyglet's vertex_list
over the regular glDrawArrays/glDrawElements?

Thanks for the reply. I will report back on performance once I get the
work done.

Tristam MacDonald

Feb 9, 2009, 9:42:53 AM
to pyglet...@googlegroups.com
On Mon, Feb 9, 2009 at 9:26 AM, Red15 <red1...@gmail.com> wrote:
> I guess it will be worth a shot then, but unfortunately it will
> require a rather difficult rewrite process to make sure I can group
> all triangles and quads into their own respective array.
> The data I'm reading comes from a Wavefront OBJ file and thus
> triangles and quads are mixed without much regard.

Graphics cards don't draw quads; the driver splits them into triangles anyway. You are probably better off pre-triangulating (most exporters and modelling packages have this option), and then rendering everything as triangles.
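
If the exporter option is not available, the same thing can be done at load time. The following is only a minimal sketch: it assumes every OBJ face is convex and has already been parsed into a list of vertex indices, and the helper names (`triangulate_face`, `triangulate_mesh`) are made up for illustration, not part of pyglet.

```python
def triangulate_face(face):
    """Split a convex polygon (list of vertex indices) into a triangle fan."""
    if len(face) < 3:
        raise ValueError("a face needs at least three vertices")
    first = face[0]
    # Every triangle shares the first vertex of the polygon.
    return [(first, face[i], face[i + 1]) for i in range(1, len(face) - 1)]


def triangulate_mesh(faces):
    """Flatten a mix of triangles and quads into one list of triangles."""
    triangles = []
    for face in faces:
        triangles.extend(triangulate_face(face))
    return triangles


mixed_faces = [[0, 1, 2], [2, 3, 4, 5]]  # one triangle, one quad
print(triangulate_mesh(mixed_faces))
# prints [(0, 1, 2), (2, 3, 4), (2, 4, 5)]
```

A triangle passes through unchanged and a quad becomes two triangles, so everything can go into a single GL_TRIANGLES array.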
 
> Would there be any performance-wise reason to use pyglet's vertex_list
> over the regular glDrawArray/DrawElements?

Pyglet's vertex_list is considerably simpler to use. Beyond that, vertex_list will use VBOs (Vertex Buffer Objects) if available, to ensure your vertex data is actually in video memory.
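
As a rough illustration of how little code this takes, here is a sketch that builds one quad as two indexed triangles. It assumes the pyglet 1.x API (`pyglet.graphics.vertex_list_indexed(count, indices, *data)` with format strings such as `'v3f'`) and an already-created window, since the vertex list needs a GL context; the function names are hypothetical.

```python
def quad_geometry():
    """Return (positions, indices) for a unit quad drawn as two triangles."""
    positions = (
        0.0, 0.0, 0.0,
        1.0, 0.0, 0.0,
        1.0, 1.0, 0.0,
        0.0, 1.0, 0.0,
    )
    indices = [0, 1, 2, 0, 2, 3]  # four vertices shared by two triangles
    return positions, indices


def make_quad_vertex_list():
    """Build an indexed vertex list (assumes pyglet 1.x and a live GL context)."""
    import pyglet  # imported lazily so quad_geometry() works without pyglet

    positions, indices = quad_geometry()
    vertex_count = len(positions) // 3
    return pyglet.graphics.vertex_list_indexed(
        vertex_count, indices, ('v3f', positions))
```

In a draw handler the result would then be drawn with something like `vlist.draw(pyglet.gl.GL_TRIANGLES)`.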

Alex Holkner

Feb 9, 2009, 4:18:17 PM
to pyglet...@googlegroups.com
On Mon, Feb 9, 2009 at 1:18 PM, Red15 <red1...@gmail.com> wrote:
>
> Might be a bit out of place here but I was wondering when you are
> compiling a displaylist does the amount of calls to glVertex3f (or
> other data) have an impact on the time it takes to execute that
> display list afterwards ?
>
> And if so would using glDrawArray make a significant reduction in that
> time ?

This is basically an alternative opinion to Tristam's -- I haven't
measured the performance difference. It's my understanding that
whether you use vertex arrays or immediate mode (glVertex3f, etc)
should make no difference once the display list is compiled (of
course, it will take longer to specify the vertex data while compiling
the display list, but we usually don't care about load times much).

>
> I'm having an issue where calling a displaylist (containing about 70
> faces) a 100 times is dropping my framerate in the low 20's on a
> GeForce 9800X2 .... Cpu usage is also a measly 40-50% so it shouldn't
> be bottlenecking the processor and since I'm using glCallList it
> shouldn't send all the vertex data over the pipeline each frame
> right ?

You should use NVProf or a similar tool to discover if data is being
sent over the bus. In this case though, I'm pretty sure that even if
it is, that's not the performance bottleneck. Calling glCallList 100
times is actually a moderate amount of work for pyglet, especially if
you're also calling various transform functions as well -- the Python
ctypes function call overhead is not insignificant.

A very quick way to check this is to run Python with the -O option.
When you do this, pyglet will not call glGetError after every GL call,
which should at least double your frame rate -- if it does, you know
that the function call overhead is the bottleneck, and you should look
into reducing the number of calls; if not, it may well be some other
problem.
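
For reference, the interpreter flag that -O flips is `__debug__`. The small sketch below only demonstrates that flag by starting child interpreters; it does not touch pyglet. As far as I know, pyglet 1.x uses this flag for the default value of `pyglet.options['debug_gl']`, but treat that as an assumption to check against your version. The helper name `debug_flag` is made up for this example.

```python
import subprocess
import sys


def debug_flag(*interpreter_args):
    """Run a child interpreter and return its __debug__ value as text."""
    result = subprocess.run(
        [sys.executable, *interpreter_args, "-c", "print(__debug__)"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()


print("default:", debug_flag())
print("with -O:", debug_flag("-O"))
```

Comparing the frame rate of `python game.py` against `python -O game.py` (where `game.py` stands for your own script) is then the actual experiment.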

Alex.


Tristam MacDonald

Feb 9, 2009, 5:17:43 PM
to pyglet...@googlegroups.com
On Mon, Feb 9, 2009 at 4:18 PM, Alex Holkner <alex.h...@gmail.com> wrote:
> This is basically an alternative opinion to Tristam's -- I haven't
> measured the performance difference.  It's my understanding that
> whether you use vertex arrays or immediate mode (glVertex3f, etc)
> should make no difference once the display list is compiled (of
> course, it will take longer to specify the vertex data while compiling
> the display list, but we usually don't care about load times much).

This is true if a) your driver correctly optimises display lists, and b) you specify *exactly* the same components per vertex - skip even a single normal, and your driver will typically fall back to immediate mode. Also, display lists don't generally store their elements in video memory unless you source their elements from a VBO, so the data goes across the bus every frame.

Red15

Feb 9, 2009, 7:32:15 PM
to pyglet-users
> Graphics cards don't draw quads, they will be split up into triangles by the
> driver anyway. You are probably better off pre-triangulating (most exporters
> and modelling packages have this option), and then rendering everything as
> triangles.

Done, a bit of a downer to this imo is the disk size but I guess in a
world where google throws gigabytes at your head you shouldn't bicker
about 1-2k more or less :)

> Pyglet's vertex_list is considerably simpler to use. Beyond that,
> vertex_list will use VBOs (Vertex Buffer Objects) if available, to ensure
> your vertex data is actually in video memory.

After implementing it I was still having trouble with performance
from as few as 100-200 entities.
I was playing around a bit, and when I disabled the material setting on
each model my frame rate suddenly jumped to around 1000 fps.

What is the way to efficiently switch between materials as this
apparently is causing major slowdown:

    def compile(self):
        self.gllist = glGenLists(1)
        glNewList(self.gllist, GL_COMPILE)
        # ivlists is a dict: key = material name, value = IndexedVertexList
        for mat, vlist in self.ivlists.items():
            # Disabling the next statement gives me a frame boost
            glMaterialfv(GL_FRONT, GL_DIFFUSE, materials[mat]['diffuse'])
            # And all other material properties here one by one, as well as
            # texture binding
            vlist.draw(GL_TRIANGLES)
        glEndList()

Note that this is while compiling the list; apparently the list
also issues the glMaterialfv and glBindTexture calls each frame,
which (I suspect) pushes the texture down the pipeline each frame
(Ouch, poor PCIe bus ;)

I guess I have to use something like glPushAttrib or glPushState but
what is appropriate for material and texture calls ?

A simple RTFM pointing to a good tutorial that anyone has found
particularly useful would be appreciated.

Regards,
Niels

Gary Herron

Feb 9, 2009, 7:58:11 PM
to pyglet...@googlegroups.com
Red15 wrote:
>> Graphics cards don't draw quads, they will be split up into triangles by the
>> driver anyway. You are probably better off pre-triangulating (most exporters
>> and modelling packages have this option), and then rendering everything as
>> triangles.
>>
>
> Done, a bit of a downer to this imo is the disk size but I guess in a
> world where google throws gigabytes at your head you shouldn't bicker
> about 1-2k more or less :)
>
>
>> Pyglet's vertex_list is considerably simpler to use. Beyond that,
>> vertex_list will use VBOs (Vertex Buffer Objects) if available, to ensure
>> your vertex data is actually in video memory.
>>
>
> After implementing it I was still having trouble with the performance
> from as little as 100-200 entities.
> I was playing around a bit and when I disabled the material setting on
> each model suddenly my frames jump into the 1000fps.
>
> What is the way to efficiently switch between materials as this
> apparently is causing major slowdown:
>

Look up glColorMaterial. It allows you to change one material parameter
per vertex quickly using glColor.

Normally glColor specifies values for use when lighting is *off*,
and glMaterial specifies values for use when lighting is *on*.

However, glColorMaterial is a kind of cross-over between the two.

Specify all your glMaterial parameters as normal.

Choose one parameter for cross-over. Examples:
glColorMaterial(GL_FRONT, GL_DIFFUSE)
glColorMaterial(GL_FRONT_AND_BACK, GL_DIFFUSE)
glColorMaterial(GL_FRONT_AND_BACK, GL_AMBIENT_AND_DIFFUSE)

Enable lighting and color material features:
glEnable(GL_LIGHTING)
glEnable(GL_COLOR_MATERIAL)

Then draw your geometry with frequent changes to glColor:
    for each vertex:
        glColor3fv(color)      # becomes the material's diffuse value (for instance)
        glNormal3fv(normal)
        glVertex3fv(position)

If you are using vertex arrays (and you ought to), throw in a
glColorPointer(...)
along with the other arrays
glTexCoordPointer(...)
glNormalPointer(...)
glVertexPointer(...)
and get the same effect with the colors specified that way.
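
To make the vertex-array variant concrete, here is a sketch of building the colour array from per-face materials. It assumes the mesh is drawn as non-indexed triangles (three array entries per triangle) and that each material's diffuse colour is an (r, g, b) tuple; the function name and the sample materials are invented for this example.

```python
def per_vertex_colors(triangles, face_materials, diffuse):
    """Return a flat RGB list with one colour per triangle corner.

    triangles      -- list of (i, j, k) vertex-index tuples
    face_materials -- material name for each triangle, in the same order
    diffuse        -- dict mapping material name to an (r, g, b) tuple
    """
    colors = []
    for triangle, material in zip(triangles, face_materials):
        for _corner in triangle:
            # Every corner of a face gets that face's diffuse colour.
            colors.extend(diffuse[material])
    return colors


triangles = [(0, 1, 2), (0, 2, 3)]
face_materials = ["red", "blue"]
diffuse = {"red": (1.0, 0.0, 0.0), "blue": (0.0, 0.0, 1.0)}
colors = per_vertex_colors(triangles, face_materials, diffuse)
print(len(colors))  # prints 18: 2 triangles * 3 corners * 3 components
```

The resulting list is what would be handed to glColorPointer (or to a colour attribute of a pyglet vertex list), replacing the per-material glMaterial calls for the diffuse term.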



Enjoy,
Gary Herron

Greg Ewing

Feb 9, 2009, 8:02:04 PM
to pyglet...@googlegroups.com
Red15 wrote:
> does the amount of calls to glVertex3f (or
> other data) have an impact on the time it takes to execute that
> display list afterwards ?
>
> And if so would using glDrawArray make a significant reduction in that
> time ?

Probably not. I expect they both end up putting vertex
data in the display list in the same format.

--
Greg

Tristam MacDonald

Feb 9, 2009, 9:39:22 PM
to pyglet...@googlegroups.com
On Mon, Feb 9, 2009 at 7:32 PM, Red15 <red1...@gmail.com> wrote:

>> Graphics cards don't draw quads, they will be split up into triangles by the
>> driver anyway. You are probably better off pre-triangulating (most exporters
>> and modelling packages have this option), and then rendering everything as
>> triangles.
>
> Done, a bit of a downer to this imo is the disk size but I guess in a
> world where google throws gigabytes at your head you shouldn't bicker
> about 1-2k more or less :)
 
The next step is to index the triangles, which improves performance immensely and also reduces disk size a little.
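
A minimal sketch of what indexing means, assuming the mesh is a "triangle soup" of (x, y, z) tuples with three entries per triangle. It only merges positions that compare exactly equal; a real loader would also have to compare normals and texture coordinates before sharing a vertex. The function name is hypothetical.

```python
def index_triangles(corner_positions):
    """Deduplicate the vertices of a triangle soup.

    corner_positions -- list of (x, y, z) tuples, three per triangle
    Returns (unique_vertices, indices).
    """
    unique_vertices = []
    index_of = {}  # position -> index into unique_vertices
    indices = []
    for position in corner_positions:
        if position not in index_of:
            index_of[position] = len(unique_vertices)
            unique_vertices.append(position)
        indices.append(index_of[position])
    return unique_vertices, indices


quad_soup = [
    (0, 0, 0), (1, 0, 0), (1, 1, 0),
    (0, 0, 0), (1, 1, 0), (0, 1, 0),
]
vertices, indices = index_triangles(quad_soup)
print(len(vertices), indices)
# prints 4 [0, 1, 2, 0, 2, 3]
```

Six stored vertices shrink to four plus a small index list, which is where both the disk saving and the vertex reuse on the GPU come from.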

> def compile(self):
>     self.gllist = glGenLists(1)
>     glNewList(self.gllist, GL_COMPILE)
>     # ivlists are dicts where key = material name and value = IndexedVertexList
>     for mat, vlist in self.ivlists:
>
>         # Disabling next statements gives me frame boost
>         glMaterialfv(GL_FRONT, GL_DIFFUSE, materials[mat]['diffuse'])

I am pretty sure (but can't find a reference right now) that any glMaterial* calls inside a display list will disable major optimisation.
 
> Do notice this is while compiling the list, apparently the list
> actually also does the glMaterialfv and glBindTexture call each frame
> which (I suspect) is pushing the texture down the pipeline each frame
> (Ouch, poor PCIe bus ;)

The texture won't be pushed through the pipeline every frame, as textures live in video memory (well, mostly), and all you are sending is a bind command. However, texture binds and material changes are expensive, so you are advised to batch your models by texture and material. If a single model uses multiple textures or materials, break it up into sub meshes that only use one of each, and render them separately.

Again, if you move to using pyglet's vertex lists and batches, batching by textures will be handled for you.
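
If you end up doing the batching by hand, the idea can be sketched in plain Python as below. The tuple layout (texture, material, mesh) and the sample names are assumptions made for the example, not anything pyglet prescribes.

```python
from itertools import groupby


def batch_by_state(submeshes):
    """Group submeshes so each (texture, material) pair is bound only once.

    submeshes -- list of (texture, material, mesh) tuples
    Returns a list of ((texture, material), [meshes]) in sorted state order.
    """
    def state_key(item):
        return (item[0], item[1])

    ordered = sorted(submeshes, key=state_key)  # stable sort keeps mesh order
    return [(state, [mesh for _, _, mesh in group])
            for state, group in groupby(ordered, key=state_key)]


scene = [
    ("wood.png", "matte", "crate_a"),
    ("rock.png", "matte", "boulder"),
    ("wood.png", "matte", "crate_b"),
]
for state, meshes in batch_by_state(scene):
    print(state, meshes)
# prints ('rock.png', 'matte') ['boulder']
# prints ('wood.png', 'matte') ['crate_a', 'crate_b']
```

The render loop then binds the texture and material once per group and draws every mesh in it, so three submeshes here cost two state changes instead of three.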

Red15

Feb 9, 2009, 10:55:30 PM
to pyglet-users
> I am pretty sure (but can't find a reference right now) that any glMaterial*
> calls inside a display list will disable major optimisation.

I've tracked down an error in my code which seems to be at the root of
the whole fps issue.

Always running with -O has the downside of not catching
errors :)

So far I have been able to deduce that glMaterial calls aren't really
that heavy (unless you make a mistake in them, in which case they will
be slow).

Furthermore, using glCallList() actually proved to degrade performance
as well.
I had compiled a display list containing the draw calls for the pyglet
vertex_lists, and calling that display list on each frame
reduced my fps to an unbearably low level.

I finally decided to iterate over my list of vertex_lists and draw them
manually inside my draw function (1 call per frame), and lo and behold,
the frame rate jumps up to reasonable values.

You would not expect manual (in Python) iteration over objects to go
faster than a glCallList, would you?

So now the only thing I have yet to implement is TextureGroup, to see
what kind of performance boost I get.
Right now I'm setting the same material for each of the 200 identical
objects I'm rendering, and I guess I could save time by ensuring the
material is only set once.
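
One simple way to get that "set once" behaviour is to remember the last material applied and skip repeats. This is only a sketch: `MaterialCache` is a made-up class, and the callable passed in stands in for whatever function issues the real glMaterial calls.

```python
class MaterialCache:
    """Skip material changes that would re-apply the current material."""

    def __init__(self, apply_material):
        self._apply_material = apply_material  # callable doing the real GL work
        self._current = None
        self.changes = 0  # how many times the material was actually applied

    def use(self, material):
        if material == self._current:
            return  # already active, no GL call needed
        self._apply_material(material)
        self._current = material
        self.changes += 1


applied = []  # stands in for the GL calls in this example
cache = MaterialCache(applied.append)
for _ in range(200):  # 200 identical objects sharing one material
    cache.use("hull")
print(cache.changes)
# prints 1
```

Combined with sorting objects by material, this keeps the number of real state changes equal to the number of distinct materials per frame.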

Regards,
Niels

Tristam MacDonald

Feb 9, 2009, 11:58:41 PM
to pyglet...@googlegroups.com
On Mon, Feb 9, 2009 at 10:55 PM, Red15 <red1...@gmail.com> wrote:
> You would not expect manual (in python) iteration over objects to go
> faster than a glCallList would you ?

A huge majority of the OpenGL information floating around the web is very outdated. Display lists, immediate mode, vertex arrays and the whole fixed-function pipeline are all deprecated, and slated for removal in a later OpenGL version (OpenGL ES has already shed these - it has less need for backwards compatibility).

If you hunt around, you will see that these deprecated features have been given very low priority for optimisation in recent drivers, so the performance continues to slide.

Alex Holkner

Feb 10, 2009, 12:06:38 AM
to pyglet...@googlegroups.com
On Tue, Feb 10, 2009 at 3:58 PM, Tristam MacDonald <swift...@gmail.com> wrote:
> On Mon, Feb 9, 2009 at 10:55 PM, Red15 <red1...@gmail.com> wrote:
>>
>> You would not expect manual (in python) iteration over objects to go
>> faster than a glCallList would you ?
>
> A huge majority of the OpenGL information floating the web is very outdated.
> Display lists, immediate mode, vertex arrays and the whole fixed-function
> pipeline are all deprecated, and slated for removal in a later OpenGL
> version (OpenGL ES has already shed these - it has less need for backwards
> compatibility).

While vertex arrays are deprecated in GL 3, they are still present in GL ES.

Alex.

Casey Duncan

Feb 10, 2009, 11:59:46 AM
to pyglet...@googlegroups.com
On Mon, Feb 9, 2009 at 9:58 PM, Tristam MacDonald <swift...@gmail.com> wrote:
> On Mon, Feb 9, 2009 at 10:55 PM, Red15 <red1...@gmail.com> wrote:
>>
>> You would not expect manual (in python) iteration over objects to go
>> faster than a glCallList would you ?
>
> A huge majority of the OpenGL information floating the web is very outdated.
> Display lists, immediate mode, vertex arrays and the whole fixed-function
> pipeline are all deprecated, and slated for removal in a later OpenGL
> version (OpenGL ES has already shed these - it has less need for backwards
> compatibility).

WRT fixed-functionality pipeline deprecation, unfortunately I have
found that there are still many many machines out there (for example
the prev gen iBooks, etc) that do not have shader support. Unless you
are writing hardcore games targeted for the latest hardware (which
seems highly unlikely in the hobby sphere given their big content
budgets), or simply don't care about supporting a large fraction of
the installed hardware, fixed-function coding is a fact of life. Of
course this is changing but it will be a few years before shader
support reaches a critical mass IMO for the hobby "market".

As that is happening I imagine better shader support will appear in
projects like Pyglet, and other high-level gaming/graphics libraries
(though Pyglet's 2D aspirations make this somewhat less compelling).

-Casey

Tristam MacDonald

Feb 10, 2009, 12:18:40 PM
to pyglet...@googlegroups.com
On Tue, Feb 10, 2009 at 11:59 AM, Casey Duncan <casey....@gmail.com> wrote:

On Mon, Feb 9, 2009 at 9:58 PM, Tristam MacDonald <swift...@gmail.com> wrote: 
 
> WRT fixed-functionality pipeline deprecation, unfortunately I have
> found that there are still many many machines out there (for example
> the prev gen iBooks, etc) that do not have shader support. Unless you
> are writing hardcore games targeted for the latest hardware (which
> seems highly unlikely in the hobby sphere given their big content
> budgets), or simply don't care about supporting a large fraction of
> the installed hardware, fixed-function coding is a fact of life. Of
> course this is changing but it will be a few years before shader
> support reaches a critical mass IMO for the hobby "market".

Right, my point here was not to discourage the use of the fixed-function pipeline in particular. Note, though, that it can carry a performance hit on newer cards - for instance, my Intel Integrated X3100 (MacBook) performs considerably better with shaders than without.

However, immediate mode and display lists were deprecated for very good reason - they don't provide the sort of performance that VBOs and indexed triangle lists do, even on legacy hardware.