
Real-time PC texture mapping


Chaz Bocock

Dec 16, 1994, 9:40:39 PM
In article <3cr3ap$e...@ixnews2.ix.netcom.com>
cgr...@ix.netcom.com "eric pitzer" writes:

> >
> >I don't think so. 3000 texture mapped, shaded polys on a 486 anything with
> >local bus video is quite an accomplishment. We managed to get 4000 flat
> >shaded polys on a 486DX/33 with an incredible amount of tweaking and
> >pre-setup of the polygons. Adding texture mapping, or even gouraud shading
> >dropped the frame rate pitifully low.
> >
> >Curt
> >
>
> You know, texture mapping and environment mapping needn't be
> really slow if one avoids high frequencies in the maps. Vast
> increases in rasterizing rates are possible when superpixel
> sampling of the maps is used, trading detail for speed; I won't
> go into the aliasing & hidden-surface problems you'll have however.
>
> Even without this, I was getting 16000 texture- and environment-
> mapped polygons per second on DX2 (Xform & Rasterizing); twice
> that for rasterizing.
>

I'm sorry, but I fail to believe that you were getting 16000 texture-mapped
polys on a DX2. I have slaved for months over my fully-asm coded poly
fill (flat shaded) and it uses NO variables, only registers. It is the
best algorithm I could come up with, after inventing many. I managed
just over 45000/second on my friend's DX2, so texture at 1/3 speed is
not possible. Even with texture-smearing you couldn't get this rate. Far
too many memory accesses for the 486.
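For anyone curious what this kind of inner loop looks like before it gets hand-tuned into asm, here's a minimal C sketch of a scanline flat fill (my own illustration, not Chaz's code; `SCREEN_W`, `hspan`, `flat_tri` and the 16.16 edge stepping are all my assumptions):

```c
#include <string.h>
#include <assert.h>

#define SCREEN_W 320
#define SCREEN_H 200

static unsigned char framebuf[SCREEN_W * SCREEN_H];

/* Fill one horizontal span; this is the hot path that asm versions
   keep entirely in registers (e.g. a REP STOSB/STOSD loop). */
static void hspan(int y, int x0, int x1, unsigned char color)
{
    if (x1 < x0) { int t = x0; x0 = x1; x1 = t; }
    if (y < 0 || y >= SCREEN_H) return;
    if (x0 < 0) x0 = 0;
    if (x1 >= SCREEN_W) x1 = SCREEN_W - 1;
    if (x0 > x1) return;
    memset(&framebuf[(size_t)y * SCREEN_W + x0], color,
           (size_t)(x1 - x0 + 1));
}

/* Flat-shade a triangle half with a flat top or bottom by stepping
   the left/right edges in 16.16 fixed point, one span per scanline. */
static void flat_tri(int ytop, int ybot, long xl, long xr,
                     long dxl, long dxr, unsigned char color)
{
    for (int y = ytop; y <= ybot; y++) {
        hspan(y, (int)(xl >> 16), (int)(xr >> 16), color);
        xl += dxl;
        xr += dxr;
    }
}
```

The per-pixel work is just a store, which is why flat shading is so much faster than anything that interpolates per pixel.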

Chaz Bocock (Coder / Eschaton Enterprises)
email ch...@eschaton.demon.co.uk

--
. . . . . . . . . . . . . . . .
. . . . . . . . . . . . . . . .
. . . . A VERY MERRY CHRISTMAS TO EVERYONE FROM . . . .
___ ___ ____ _ _ ___ ____ ___ ___
E |OOOO |OOOO |OOOO |O O |OOOO |OOOOO _OOOO |OOOO E
N |O_ |O |O____ |O |O |O____ |O____ |O |O |O |O |O |O |O N
T |OOO |OOOOOO |O |OOOOOO |OOOOOO |O |O |O |O |O T
E |O___|O ___|O |O___|O |O |O |O |O |O |O___|O |O |O E
R |OOOO |OOOO |OOOO |O |O |O |O |O |OOOO |O |O R
P P
R "Sects, Sects, Sects... is that all monks think about?" R
I I
S FINGER OR EMAIL ch...@eschaton.demon.co.uk FOR MORE INFO S
E E
S .ESCHATON...ESCHATON...ESCHATON....ESCHATON...ESCHATON...ESCHATON. S


Jez San

Dec 17, 1994, 3:58:21 PM


Chaz -

We are getting about 90,000 gouraud shaded z buffered polygons per second
using our BRender 3d library on a 486 DX4 100 MHz PC and even more on a Pentium.

Texture mapping slows us down a small amount, but not significantly.

If you want more details on BRender and are in the uk, email pa...@argonaut.com
or in the usa email ri...@argonaut.com

-- Jez.

eric pitzer

Dec 17, 1994, 7:27:37 PM


>We are getting about 90,000 gouraud shaded z buffered polygons per second
>using our BRender 3d library on a 486 DX4 100 MHz PC and even more on a Pentium.
>
>Texture mapping slows us down a small amount, but not significantly.
>
>If you want more details on BRender and are in the uk, email pa...@argonaut.com
>or in the usa email ri...@argonaut.com
>
>-- Jez.
>


-----How many of these polygons actually underwent a true view Xform?
-----How can true texture mapping, which is more expensive than
-----intensity and Z interpolation together, be cheap?

Kirk A. Baum

Dec 19, 1994, 12:08:00 AM
Does BRender have a low-level interface to the 2D rasterizer?
If so, how many 2D polys does it render per second? Do you
get the additional speed through your object hierarchy, clipping,
culling, etc.? Does BRender have an interface that is binary,
rather than text? I have looked at the package some and am
wondering how it compares to RenderMorphics' Reality Lab?

Thanks,

Kirk Baum
SingleTrac Entertainment

Brian Hook

Dec 19, 1994, 6:51:55 PM
In article <3d4pef$d...@panix.com> ji...@panix.com (Jim Hourihan) writes:

> : I highly doubt they use a 4x4 matrix, most likely a 4x3 matrix since
> : the last column can be optimized out.

> What part of the pipeline are you doing this in? Are your matrices
> row or column major? Can any general 4x4 be supplied to your pipeline?

Doesn't matter if your matrices are row or column major -- depending
on which one you are using, you can use a 3x4 or a 4x3 -- the last
column (or row, depending) of a typical 4x4 matrix is 0,0,0,1 and can
thus be optimized out. This basically means that you are only worried
about scaling, rotation, and translation -- you can't do perspective
with a 4x3 matrix, but this is irrelevant since a software-only
pipeline does perspective in a different step.
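To make the 4x3 point concrete, here's a quick C sketch (my own, with made-up names; nobody in this thread posted actual code for this):

```c
#include <assert.h>

/* A 4x3 affine transform: a 3x3 rotation/scale block plus a
   translation row. The implicit fourth column is (0,0,0,1), so w
   stays 1 and is never computed; perspective is a later, separate
   divide, exactly as described above. */
typedef struct { float m[4][3]; } Mat43;
typedef struct { float x, y, z; } Vec3;

static Vec3 xform(const Mat43 *a, Vec3 v)
{
    Vec3 r;
    /* 9 multiplies + 9 adds, vs. 16 multiplies + 12 adds for 4x4 */
    r.x = v.x * a->m[0][0] + v.y * a->m[1][0] + v.z * a->m[2][0] + a->m[3][0];
    r.y = v.x * a->m[0][1] + v.y * a->m[1][1] + v.z * a->m[2][1] + a->m[3][1];
    r.z = v.x * a->m[0][2] + v.y * a->m[1][2] + v.z * a->m[2][2] + a->m[3][2];
    return r;
}
```

Whether you call it 4x3 or 3x4 is just the row-major vs. column-major convention; the saved arithmetic is the same.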

: Assumptions assumptions assumptions. Fine, 30 pixel polygons. I'll
: grant you the Z-buffer stuff, even though they could be doing
: span checking, obviating the necessity for a compare and add
: at each pixel, and their texture mapping could be table based (which
: would mean two adds and a table look up per pixel). Now, another
: thing Argonaut may be doing is caching edge values for polygons,
: as this would save a LOT of time and prevent them from doing lots
: of redundant calculations. How does this figure in to your
: totals?

> These "texture maps" do not sound perspective correct. Are you interpolating
> texture coordinates linearly in screen-space or in eye-coordinates? Are these
> textures filtered? Do your textures appear to be swimming? Do you subdivide
> your geometry in screen-space to remove artifacts?

Who is 'you'? I'm not speaking for Argonaut, and I don't know how they
implement things, but I am stating that you can APPROXIMATE a perspective
correct texture map either using a cubic (third order) or quadratic
(second order) approximation of a typical texture mapping curve or,
alternatively, store successive curve U/V values in multiple tables, one
for each of a range of angles at which the polygon is oriented away from
the eye.

Obviously, interpolating values linearly in screen space is going to
cause artifacts, unless you compensate for Z, but that's still far
more efficient than doing things in eye-coordinates. How do
filtered texture maps affect performance? I doubt the Argonaut
textures swim -- my engine doesn't have any swimming textures. And
I would rather use a quadratic approximation than multiple subdivisions
with bilinear interpolation, since the more subdivisions you get the
less performance you achieve.

Generally, for very small polys I would use bilinear interpolation where
artifacts don't have enough space to become apparent. For larger polys
you could either A.) do a decent approximation or B.) adaptively choose
which interpolation algorithm to use based on the angle between the
polygon and the viewer.
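For reference, the "exact" version all of these schemes are approximating interpolates u/z, v/z and 1/z linearly across a span and divides back per pixel. A rough C sketch (mine, not Brian's engine; `persp_span` is a made-up name):

```c
#include <assert.h>

/* Perspective-correct texture coordinates across one span of n steps:
   u/z, v/z and 1/z are linear in screen space even though u and v are
   not, so interpolate those and recover u,v with one divide per pixel.
   This divide is exactly the cost the quadratic/cubic approximations
   discussed above are trying to avoid. */
static void persp_span(float u0, float v0, float z0,
                       float u1, float v1, float z1,
                       int n, float *u_out, float *v_out)
{
    float uoz = u0 / z0, voz = v0 / z0, ooz = 1.0f / z0;
    float duoz = (u1 / z1 - uoz) / n;   /* n >= 1 assumed */
    float dvoz = (v1 / z1 - voz) / n;
    float dooz = (1.0f / z1 - ooz) / n;
    for (int i = 0; i <= n; i++) {
        float z = 1.0f / ooz;           /* the per-pixel divide */
        u_out[i] = uoz * z;
        v_out[i] = voz * z;
        uoz += duoz; voz += dvoz; ooz += dooz;
    }
}
```

An affine mapper interpolates u and v directly and skips the divide; that's where the "swimming" comes from on large, tilted polys.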

Brian

--
+---------------------------------------------------------------------+
| Brian Hook | Mail me if you want a copy of the Great 3D |
| | Programming Book List with Reviews |
+- "Style distinguishes excellence from accomplishment" - J. Coplien -+

Jim Hourihan

Dec 20, 1994, 9:16:22 PM
Brian Hook (b...@netcom.netcom.com) wrote:

: Doesn't matter if your matrices are row or column major -- depending
: on which one you are using, you can use a 3x4 or a 4x3 -- the last
: column (or row, depending) of a typical 4x4 matrix is 0,0,0,1 and can
: thus be optimized out. This basically means that you are only worried
: about scaling, rotation, and translation -- you can't do perspective
: with a 4x3 matrix, but this is irrelevant since a software-only
: pipeline does perspective in a different step.

My question about row vs column major was to find out if you were
removing the translations or the perspective. Certainly if you maintain
a separate projection matrix or stack of matrices you can remove
the column/row, I do not doubt this. Perhaps an even more optimized pipeline
would restrict transformations to a small set such as translate/rotate/scale.
When you multiply them together you can make assumptions if you store
them procedurally instead of as a 4x3/3x4 matrix. For example: translations
add and scales multiply, combined translations and scales do an
element-by-element multiply, Translation^-1 is -Translation, Rotation^-1 is
Transpose(Rotation), etc. This type of transformation "engine" far outperforms
a matrix pipeline. Store the projection separately; this is best kept as a
4x4 matrix.
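Jim's idea can be sketched like this (my own toy version, not his code; rotation is omitted for brevity and all names are hypothetical):

```c
#include <assert.h>

/* A procedural transform "engine": store translate/scale as separate
   components instead of a composed matrix, so composition and
   inversion follow the cheap rules above (translations accumulate,
   scales multiply, Translation^-1 = -t, Scale^-1 = 1/s). */
typedef struct { float t[3]; float s[3]; } TS;

static TS ts_compose(TS a, TS b)   /* apply a first, then b */
{
    TS r;
    for (int i = 0; i < 3; i++) {
        r.s[i] = a.s[i] * b.s[i];            /* scales multiply */
        r.t[i] = a.t[i] * b.s[i] + b.t[i];   /* translations accumulate */
    }
    return r;
}

static TS ts_inverse(TS a)
{
    TS r;
    for (int i = 0; i < 3; i++) {
        r.s[i] = 1.0f / a.s[i];              /* Scale^-1 = 1/s */
        r.t[i] = -a.t[i] * r.s[i];           /* Translation^-1 = -t, scaled */
    }
    return r;
}
```

Adding rotations as their own component (with transpose-as-inverse) keeps every inverse closed-form and multiply-free compared to a general matrix invert.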

: > These "texture maps" do not sound perspective correct. Are you interpolating
: > texture coordinates linearly in screen-space or in eye-coordinates? Are these
: > textures filtered? Do your textures appear to be swimming? Do you subdivide
: > your geometry in screen-space to remove artifacts?

: Who is 'you'? I'm not speaking for Argonaut, and I don't know how they
: implement things, but I am stating that you can APPROXIMATE a perspective
: correct texture map either using a cubic (third order) or quadratic
: (second order) approximation of a typical texture mapping curve or,
: alternatively, store successive curve U/V values in multiple tables, one
: for each of a range of angles at which the polygon is oriented away from
: the eye.

: Obviously, interpolating values linearly in screen space is going to
: cause artifacts, unless you compensate for Z, but that's still far
: more efficient than doing things in eye-coordinates. How do
: filtered texture maps affect performance? I doubt the Argonaut
: textures swim -- my engine doesn't have any swimming textures. And
: I would rather use a quadratic approximation than multiple subdivisions
: with bilinear interpolation, since the more subdivisions you get the
: less performance you achieve.

By swimming, I mean that the texture does not appear completely "fixed"
to the surface. If you are not doing perspective-correct texture coordinate
interpolation your texture will swim. If you are doing higher order
interpolation it will swim less, but it will still swim.

The filtering I am referring to is pre-computation of coverage on your
texture map. If you are doing mip-mapping there are a number of
color approximation methods you can use for various degrees of quality
in the texture. This will affect your performance greatly.

None of this is to say that you need to make perspective correct textures,
especially for purely interactive graphics where speed is god. There is
a lot of talk in this group about FAST_TEXTURES but little discussion about
the quality of these "textures."

: Generally, for very small polys I would use bilinear interpolation where
: artifacts don't have enough space to become apparent. For larger polys
: you could either A.) do a decent approximation or B.) adaptively choose
: which interpolation algorithm to use based on the angle between the
: polygon and the viewer.


Sounds good.

-Jim

Kendall Bennett

Dec 21, 1994, 8:52:42 AM
bro...@coulomb.uwaterloo.ca (Bernie Roehl) writes:

>> Now we get to draw. Let's do a backface cull, which is 4
>> multiplies for a polygon.

>Of course, that's the wrong stage of the pipeline to be doing backface
>culling. As I pointed out over a *year* ago (and I'm sure other people
>have realized this as well) you should be doing your backface culling
>in *object* space, so that you never have to transform the normals.

Hi Bernie. I have been thinking about this problem myself just recently.
In my own renderer, I do backface culling of polygons in screen space
after they have been transformed (but before any lighting calculations
are performed). This works very effectively, but I have been thinking about
how you would actually do backface culling in object space. If the
application knows information about the object it is rendering, it can
handle the backface culling itself (it may know the real polygon normal
for the face, and can test it quickly).

However the problems that I see are as follows:

1. For an OpenGL style rendering pipeline (which ours is), each vertex
is sent down the pipeline one after the other, so you would need to
at least save the first three in order to test for backface culling.

2. If you have more than a single global transformation matrix (such as
if you have a matrix stack for hierarchical viewing like we do)
then you are still going to need to transform from object space into
view space before the culling can be applied. At the moment, we only
do a single transform step to get from object space to screen space,
so we save nothing in this case. Clipping for us is done in screen
space - when we add 3D clipping it might change things.

3. Unless you know the 'real' polygon for the face (not the user supplied
lighting normal) you will need to do the backfacing computations in
floating point. When you are in screen space coords you can simply
do it with a faster integer routine.
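Point 3 in C, roughly (my own sketch of the standard screen-space test, not SciTech's code): the sign of the 2D cross product of two projected edges gives the winding, in pure integer arithmetic.

```c
#include <assert.h>

/* Screen-space backface test on projected integer vertices: the
   z-component of (v1-v0) x (v2-v0) gives the winding. Which sign is
   "back" depends on your winding convention and on screen y pointing
   down; here counter-clockwise is taken as front-facing. */
static int is_backfacing(int x0, int y0, int x1, int y1, int x2, int y2)
{
    long cross = (long)(x1 - x0) * (y2 - y0)
               - (long)(x2 - x0) * (y1 - y0);
    return cross <= 0;   /* two integer multiplies, done */
}
```

The object-space version Bernie advocates trades these two multiplies for a dot product per face, but saves transforming and projecting the vertices of culled faces at all.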

+-------------------------------------+------------------------------------+
| Kendall Bennett | Email: rc...@minyos.xx.rmit.edu.au |
| Software Engineer, SciTech Software | CIS: 100237,2213 |
| Unit 5, 106 Southbank Boulevard | Fax: +61 3 690 2137 |
| South Melbourne 3205 AUSTRALIA | |
+-------------------------------------+------------------------------------+

Bernie Roehl

Dec 21, 1994, 9:22:19 AM
In article <3d9bva$c...@aggedor.rmit.edu.au>,
Kendall Bennett <rc...@minyos.xx.rmit.EDU.AU> wrote:
>
>Hi Bernie.

Hi, Kendall!

>In my own renderer, I do backface culling of polygons in screen space
>after they have been transformed (but before any lighting calculations
>are performed). This works very effectively, but I have been thinking about
>how you would actually do backface culling in object space. If the
>application knows information about the object it is rendering, it can
>handle the backface culling itself (it may know the real polygon normal
>for the face, and can test it quickly).

Right, you definitely need to know the object-space normal in order to
do the backface culling there. (So far as I know.)

>However the problems that I see are as follows:
>
> 1. For an OpenGL style rendering pipeline (which ours is)

Actually, that's *the* problem. OpenGL is derived from GL, which was
designed to work very well with a specific rendering pipeline (the one
that SGI had implemented in silicon). It's very hard to improve the
performance of OpenGL past a certain point, because it's impossible
to make any optimizations that involve altering the flow of the pipeline
(such as deciding to do your lighting or backface removal at a different
stage).

I have to admit, I'm no expert on OpenGL; if someone out there knows of
a way of getting around this, I'm eager to hear it.

> each vertex
> is sent down the pipeline one after the other, so you would need to
> at least save the first three in order to test for backface culling.

Right. OpenGL.

> 2. If you have more than a single global tranformation matrix (such as
> if you have a matrix stack for hierarchical viewing like we do)
> then you are still going to need to transform from object space into
> view space before the culling can be applied.

Right. OpenGL.

> 3. Unless you know the 'real' polygon for the face (not the user supplied
> lighting normal) you will need to do the backfacing computations in
> floating point. When you are in screen space coords you can simply
> do it with a faster integer routine.

You could, of course, do fixed point all the way down through the pipeline.

It sounds like the OpenGL compliance requirement is kind of a heavy burden.
I'm not sure what your application is, but can you move away from OpenGL?

--
Bernie Roehl
University of Waterloo Dept of Electrical and Computer Engineering
Mail: bro...@UWaterloo.ca
Voice: (519) 888-4567 x 2607 [work]

Gerald Gutierrez

Dec 21, 1994, 11:22:28 AM
I'd like to get into all this heavy discussion going on, but I'm only
beginning in this branch of computer graphics. Some on this thread had
mentioned that there was a fountain of information on the internet about
various things such as Gouraud shading, 3D, image processing, etc... all
sound very interesting.

So to help someone get started, could someone please refer me to some
telnet/ftp sites where there is such information?

Any help would be appreciated. Thanks.

Roger Cain

Dec 21, 1994, 11:53:21 AM
Hi,
I'm wondering if anyone has implemented a shareware/freeware
z-buffered rendering program for the PC (DOS or Windows). There
appear to be oodles of ray tracers, but it seems that as soon as
anyone develops a renderer based on another algorithm, they go
commercial with it, rather than releasing freely distributable
software. It would be nice to find one that does textures,
environment mapping, bump mapping, etc., but even a basic one that
just does flat/gouraud/phong shading with specular highlighting would
be nice. Oh, and if it included source code, that would be even
better.

Please respond via e-mail to pr...@geomag.gly.fsu.edu.

Thanks,
---Prem

Allen Akin

Dec 21, 1994, 5:19:03 PM
In article <D15zx...@watserv2.uwaterloo.ca>,
Bernie Roehl <bro...@coulomb.uwaterloo.ca> wrote:
| ... it's impossible
| to make any optimizations that involve altering the flow of the pipeline
| (such as deciding to do your lighting or backface removal at a different
| stage).

Why is it impossible? The OpenGL specification is written in terms of
lighting in eye coordinates, but you're free to implement lighting in
any way that yields the same general effect (as measured by the
conformance tests).

Similar observations can be made about culling, but with an interesting
exception. An OpenGL user can supply vertex normals, but not facet
normals, so using a facet normal for culling is not always going to be
the most efficient approach for an OpenGL implementation. This is an
example of an API design issue, rather than an implementation issue.
Most of this thread seems to be concerned with the latter, but the two
are frequently confused.

Allen

John Foust

Dec 21, 1994, 12:53:47 AM
In article <BWH.94De...@netcom20.netcom.com>, b...@netcom20.netcom.com (Brian Hook) says:
>Alright Jez, you've sucked me in on this one. Backface culling
...
>is it possible to get around this fact? If this is a trade
>secret, that's fine, but if you AREN'T doing a dot product
>then I'd at least like to know that.

Uhm, don't expect Jez to start posting his secrets. :-)
There's a lot of money to be made from good drop-in 3D engines
these days, from obvious sells to game developers, to kiosk
makers who want talking chimps in their GUIs.

I've seen demos of BRender, and it's amazing. Maybe Argonaut
could post a simple Windows binary demo somewhere?

Kendall Bennett

Dec 21, 1994, 8:18:16 PM
bro...@coulomb.uwaterloo.ca (Bernie Roehl) writes:

>>However the problems that I see are as follows:
>>
>> 1. For an OpenGL style rendering pipeline (which ours is)

>Actually, that's *the* problem. OpenGL is derived from GL, which was
>designed to work very well with a specific rendering pipeline (the one
>that SGI had implemented in silicon). It's very hard to improve the
>performance of OpenGL past a certain point, because it's impossible
>to make any optimizations that involve altering the flow of the pipeline
>(such as deciding to do your lighting or backface removal at a different
>stage).

Well I guess one of the things about OpenGL is that it is designed for
systems that do Zbuffering in either software or hardware, and where
the geometry processing is relatively cheap (very fast hardware or a CPU
with fast floating point). In fact, the current bottleneck in high end
systems that have hardware Zbuffering is actually rendering the pixels
to the display, not the geometry processing, so doing culling before
rasterisation is a big benefit (even on a software-only implementation).

The way I see it, if you want to do backface culling in object space, the
application program can actually do this itself before sending polygons
to the rendering engine. If the app knows the object space normals for
the faces, then it can do quick and simple backface culling with
a simple dot product between the transformed normal vector and the eye vector.
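One common way to set this up (my sketch of the general idea, not SciTech's code, and the names are made up) is to bring the eye point into object space once per object, so the stored normals never need transforming at all:

```c
#include <assert.h>

typedef struct { float x, y, z; } V3;

/* Object-space backface cull: eye_obj is the eye position already
   transformed into this object's space (one inverse transform per
   object). Each face then costs a single dot product against its
   stored, untransformed normal. */
static int face_visible(V3 normal, V3 vert_on_face, V3 eye_obj)
{
    V3 to_eye = { eye_obj.x - vert_on_face.x,
                  eye_obj.y - vert_on_face.y,
                  eye_obj.z - vert_on_face.z };
    /* visible iff the eye is on the front side of the face plane */
    return normal.x * to_eye.x + normal.y * to_eye.y
         + normal.z * to_eye.z > 0.0f;
}
```

The win is that culled faces never have their vertices transformed, projected, or lit.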

My personal opinion of OpenGL is that the API is very very well designed,
even for a software only rendering engine. The basic premise that OpenGL
is slow is primarily based on the fact that the current software only
implementations are written entirely in C, which is bound to be slow. If
you take the OpenGL interface and code up a highly optimised software
only geometry pipeline behind it (with fast assembler based Zbuffering
routines) I think the performance would be extremely good.

>> 2. If you have more than a single global tranformation matrix (such as
>> if you have a matrix stack for hierarchical viewing like we do)
>> then you are still going to need to transform from object space into
>> view space before the culling can be applied.

>Right. OpenGl.

Well this isn't an OpenGL problem specifically. Just about any useful 3D
rendering API is going to have some kind of mechanism for doing
hierarchical modelling, and a matrix stack is generally used for this
purpose.

>> 3. Unless you know the 'real' polygon for the face (not the user supplied
>> lighting normal) you will need to do the backfacing computations in
>> floating point. When you are in screen space coords you can simply
>> do it with a faster integer routine.

>You could, of course, do fixed point all the way down through the pipeline.

Well that won't solve the problem either ;-) In screen space coords you do
the culling with fast integer multiplies. If you are in fixed point you
still need to do slower fixed point multiplies (still integer, but a fixed
point multiply on a 486 runs at about the same speed as a floating point
multiply).

WU DAVID S

Dec 21, 1994, 11:12:59 PM
I recently took a dabble into 3D rendering and I came up with a
few ideas. If anyone knows if any of these are bad, please let me know.

First of all, after rendering a torus with Gouraud shading (only about
800 faces) I discovered that I was spending most of my time translating
verts to screen space (I was doing all of them for each object). I was
also trying to clip everything.

What I've done:

Every polyhedron has a vertex list, a BSP-sorted face list, a center coordinate,
a bounding radius, and a direction cosine matrix.

First the polyhedron's center is translated into camera coords. If z is negative, it is
thrown away. Comparing y vs z and x vs z finds which clipping planes apply
(using the bounding radius).

Then the camera is translated into obj space, the clipping planes that apply are
also made in obj space.

For each face, if it is culled, it is ignored. At this point I recursively draw the
face in front and then the one behind (since the camera is behind and I want to draw
back to front). Is this sort of recursion evil or slow in any way? (I haven't looked at
the asm yet. Asm usually frightens me.)
Then I check the object for clipping against the planes that apply. Either a)
the bounding sphere is outside (throw away the poly), b) the sphere is inside (keep
the poly and flag each of its vertices as drawn), or c) the poly is clipped (send the
poly to an alternate pipeline for 3D clipping; don't flag verts).

I use floats since, for some reason, a 3x3 xform is 2x as fast with floats compared to
longs on my 486-50DX with the DJGPP 32-bit compiler.

Then draw.

I sort objects rather than faces (since there are usually around 16 objects per screen)
but this means that objects should be mostly rounded in shape. With this approach I
guess I could use any sorting algorithm that I wanted.
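A sketch of the bounding-sphere-vs-clip-plane test being described (my own code, hypothetical names, assuming camera-space planes stored as a unit normal n and offset d):

```c
#include <assert.h>

typedef struct { float x, y, z; } P3;

/* Test a bounding sphere against the plane n.p + d = 0, with the
   visible half-space where n.p + d > 0.
   Returns +1 fully inside  (keep whole object, no clipping needed),
           -1 fully outside (cull the whole object),
            0 straddling    (send its polys to the 3D clipper). */
static int sphere_vs_plane(P3 center, float radius, P3 n, float d)
{
    float dist = n.x * center.x + n.y * center.y + n.z * center.z + d;
    if (dist >  radius) return  1;
    if (dist < -radius) return -1;
    return 0;
}
```

One dot product per object per plane, which is exactly why whole-object tests like this pay off compared to clipping every polygon.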

Some Ideas:
Maybe use a lookup table for the (scale = x_scale/z) based on Z.
Maybe draw front to back (with a has_been_drawn buffer) as either a Warnock-type thing
or a scan line thing.
Maybe use some sort of Z-buffer.

David Wui

Brian Hook

Dec 21, 1994, 1:50:32 AM
In article <3d8ftb$b...@beta.inc.net> synd...@beta.inc.net (John Foust) writes:

> Uhm, don't expect Jez to start posting his secrets. :-)
> There's a lot of money to be made from good drop-in 3D engines
> these days, from obvious sells to game developers, to kiosk
> makers who want talking chimps in their GUIs.

I didn't say I expected him to post his secrets -- I simply asked
if he had gotten around the need for a dot product during culling
and shading. And obviously, he didn't need to respond if he didn't
want to -- many others are forthcoming about how they do things
in 3D graphics, so I was also hoping that someone else would post
a possible answer to the above question.

One of the major tradeoffs in engineering a fast 3D graphics pipeline
is generality vs. speed. OpenGL is very general, but horribly slow
for a game engine. BRender seems very fast -- so I'm simply trying
to find out what trade offs have to exist in order to attain such
speed, and if they would exact a toll in the product's usefulness.

Brian Hook

Dec 22, 1994, 4:00:40 AM

> I'm sorry, but I fail to believe that you were getting 16000 texture-mapped
> polys on a DX2. I have slaved for months over my fully-asm coded poly
> fill (flat shaded) and it uses NO variables, only registers. It is the
> best algorithm I could come up with, after inventing many. I managed
> just over 45000 /second on my friends DX2, so texture at 1/3 speed is
> not possible. Even with texture-smearing you couldn't get this rate. Far
> too many memory accesses for the 486.

You know, I wasn't the original poster, however this kind of posting
is just so absolutely ridiculous I had to say something that I had
hoped would've been quite obvious -- JUST BECAUSE YOU COULDN'T
DO SOMETHING DOESN'T MEAN THAT SOMEONE ELSE CAN'T.

16000 polygons/second, even texture mapped, is not that hard to
believe on a DX2/66. If BRender claims 90000 on a DX4/100, drop
this to 60000 on a DX2/66. This is for Gouraud shading. Assuming
a 50% speed hit for texture mapping (which I find to be VERY
unlikely since Gouraud shading only has one less interpolant) then
that's still about 30K/sec on a DX2/66 with BRender. Worst
case.

My own UNOPTIMIZED rendering engine, with a whopping 30 lines of asm code,
does about 68K on a P90; figure 25K on a DX2/66, dropping to about
12-15K for texture mapping. With Z-buffering. Let me do depth
sorting and I gain about 3-4K/sec.

So speaking from experience I would say 15K is what a reasonable
but not incredible engine would get doing texture mapping on
a DX2/66 with depth sorting, maybe even with Z-buffering.

(Jez, what ARE your quoted rates for 320x200 texture mapped?)

Brian Hook

Dec 21, 1994, 4:23:23 PM
In article <3d9bva$c...@aggedor.rmit.EDU.AU> rc...@minyos.xx.rmit.EDU.AU (Kendall Bennett) writes:

> 1. For an OpenGL style rendering pipeline (which ours is), each vertex
> is sent down the pipeline one after the other, so you would need to
> at least save the first three in order to test for backface culling.

Most optimizations discussed so far I believe are counter to
a generalized 3D pipeline such as OpenGL's. For example, if you look
at the OpenGL state machine backface culling is done VERY late in
the pipeline, which in a game engine isn't going to cut it.

OpenGL is very polygon based, whereas most game engines are 'object'
or 'actor' based, depending on whose nomenclature you are using. This
is for one major reason -- culling out entire objects (and clumps of
objects) is one of the single greatest performance boosts you can do.
Another major reason is that when dealing with rendering an object at
a time, you can take advantage of vertex and edge coherency and avoid
transforming and projecting vertices more than once.

> 2. If you have more than a single global tranformation matrix (such as
> if you have a matrix stack for hierarchical viewing like we do)
> then you are still going to need to transform from object space into
> view space before the culling can be applied. At the moment, we only
> do a single tranform step to get from object space to screen space,
> so we save nothing in this case. Clipping for us is done in screen
> space - when we add 3D clipping it might change things.

This does cause a LOT of tricky, ugly spaghetti coding, because
every object must have an aggregate transformation matrix (which is
the concatenation of all of its parents' and its own local transform
matrices). This matrix needs to be calculated in one of two instances:
a.) the object is the viewer, and thus you need a world->viewer
matrix, and b.) you need to be doing culling/lighting calculations
in object space. For b.) you only want to compute the aggregate
when necessary, i.e. when the entire object's hierarchy is in front
of the near Z clip plane.

It ain't pretty, but it works.

> 3. Unless you know the 'real' polygon for the face (not the user supplied
> lighting normal) you will need to do the backfacing computations in
> floating point. When you are in screen space coords you can simply
> do it with a faster integer routine.

As mentioned before, doing culling in screen space means that you have
projected points (and transformed points) that don't need to be
touched in any way. And floating point ain't so bad (and you can use
fixed point if you really want to) on a 486DX even. Some of the
biggest advantages of culling in object space include the fact that
you never have to normalize or transform any vertex or surface normals
(this is the main reason I went to object space culling and lighting).

Alexander Enzmann

Dec 22, 1994, 9:24:58 AM
In article <3d9mi1$p...@mailer.fsu.edu>, pr...@geomag.gly.fsu.edu (Roger
Cain) wrote:

Try Polyray. It has raytracing, zbuffered polys, etc. It does all of the
normal sorts of texturing things (all of the ones you listed are
handled). You describe your scenes with an ASCII description language
(similar to POV-Ray, not the same).

It is Shareware, available from Compuserve in the GRAPHDEV forum. I'm not
sure where you would find it on the net, I have very limited access and
don't know my way around. You can also find it in several trade books;
the only book with the up-to-date version is PC Graphics Unleashed, and there
are several with the previous version.

Note that Polyray is not intended to be a super fast rendering engine for
doing games or the like. I've quite deliberately spent the time on making
it possible to do high quality renders rather than spend time on
optimizing the speed of polygon rendering.

Dave Baldwin

Dec 23, 1994, 5:12:31 AM
In article <BWH.94De...@netcom20.netcom.com>,
b...@netcom20.netcom.com (Brian Hook) wrote:

>
> My question, then, is if I transform my viewer into object space for
> each object, and each object TYPE has a BSP tree associated with
> it, would I see reasonable performance gains over just doing a
> dot product per polygon in object space?
>
> Brian

The simple answer is yes. The more complex answer is maybe. As you
traverse the bsp tree you still need to find out which side of a plane you
are on (i.e. back to a dot product) at each node to control the traversal
order. *If* the tree is well balanced then you will test far fewer planes
than polygons, so you should win. Again there are always other
assumptions such as the cost of traversing the bsp tree, what effect it
has on your cache behaviour, how complex the object is, etc. which need to
be taken into account.
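For illustration, here's a minimal back-to-front BSP traversal showing the one dot product per node Dave mentions (my own sketch, not BRender or anyone's engine; the `Node` layout and `draw_poly` are made-up):

```c
#include <stddef.h>
#include <assert.h>

typedef struct Node {
    float nx, ny, nz, d;          /* splitting plane: n.p + d = 0 */
    struct Node *front, *back;
    int poly_id;                  /* polygon lying in this plane */
} Node;

/* Toy "renderer": record the draw order so it can be inspected. */
static int draw_order[64], draw_count;
static void draw_poly(int id) { draw_order[draw_count++] = id; }

/* Back-to-front traversal: one dot product per node classifies the
   eye against the splitting plane and picks which subtree is drawn
   first (the far side), giving painter's-algorithm order for free. */
static void bsp_draw(const Node *n, float ex, float ey, float ez)
{
    if (!n) return;
    float side = n->nx * ex + n->ny * ey + n->nz * ez + n->d;
    if (side >= 0) {              /* eye in front: far side is back */
        bsp_draw(n->back, ex, ey, ez);
        draw_poly(n->poly_id);
        bsp_draw(n->front, ex, ey, ez);
    } else {
        bsp_draw(n->front, ex, ey, ez);
        draw_poly(n->poly_id);
        bsp_draw(n->back, ex, ey, ez);
    }
}
```

This is where the "dot product at each node to control the traversal order" shows up, and why a balanced tree tests far fewer planes than a per-polygon cull tests polygons.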

Dave.

Amanda Walker

Dec 22, 1994, 4:56:54 PM
b...@netcom20.netcom.com (Brian Hook) writes:
> I didn't say I expected him to post his secrets -- I simply asked if
> he had gotten around the need for a dot product during culling and
> shading.

One interesting way (at the cost of more memory usage) would be to take
polygon orientation into account in a preprocessing stage (like constructing a
BSP-tree). Visibility trees of various sorts can give you amazing performance
wins for complex but static environments/objects where you can only see a
section at any one time. They do eat memory, though.


Amanda Walker
InterCon Systems Corporation


Brian Hook

unread,
Dec 22, 1994, 5:12:23 PM12/22/94
to
In article <941222165...@chaos.intercon.com> ama...@intercon.com (Amanda Walker) writes:

>One interesting way (at the cost of more memory usage) would be to take
>polygon orientation into account in a preprocessing stage (like constructing a
>BSP-tree). Visibility trees of various sorts can give you amazing performance
>wins for complex but static environments/objects where you can only see a
>section at any one time. They do eat memory, though.

This is something that is very interesting. I haven't really messed
with BSPs that much since I'm interested in very dynamic scenery
and objects, and BSPs have traditionally been considered somewhat
antithetical to this.

My question, then, is if I transform my viewer into object space for
each object, and each object TYPE has a BSP tree associated with
it, would I see reasonable performance gains over just doing a
dot product per polygon in object space?

Brian

Jez San

unread,
Dec 24, 1994, 8:39:55 AM12/24/94
to
Brian -

Don't hold me to this, but for texture mapping (same test suite as
the gouraud shaded demo) it was about 75k polys/sec, and with texture
and gouraud it was about 68k. Environment mapped was about 55k.

the newer fast figures were about 110k polys/sec gouraud and 118k/sec
for flat.

-- Jez.

eric pitzer

unread,
Dec 24, 1994, 4:02:27 PM12/24/94
to

>
>In article <3cr3ap$e...@ixnews2.ix.netcom.com>
> cgr...@ix.netcom.com "eric pitzer" writes:
>
>> >
>> >I don't think so. 3000 texture mapped, shaded polys on a 486 anything with
>> >local bus video is quite an accomplishment. We managed to get 4000 flat
>> >shaded polys on a 486DX/33 with an incredible amount of tweaking and
>> >pre-setup of the polygons. Adding texture mapping, or even gouraud shading
>> >dropped the frame rate pitifully low.
>> >
>> >Curt
>> >
>>
>> You know, texture mapping and environment mapping needn't be
>> really slow if one avoids high frequencies in the maps. Vast
>> increases in rasterizing rates are possible when superpixel
>> sampling of the maps is used, trading detail for speed; I won't
>> go into the aliasing & hidden-surface problems you'll have however.
>>
>> Even without this, I was getting 16000 texture- and environment-
>> mapped polygons per second on DX2 (Xform & Rasterizing); twice
>> that for rasterizing.
>>
>

>I'm sorry, but I fail to believe that you were getting 16000 texture-mapped
>polys on a DX2. I have slaved for months over my fully-asm coded poly
>fill (flat shaded) and it uses NO variables, only registers. It is the
>best algorithm I could come up with, after inventing many. I managed
>just over 45000 /second on my friends DX2, so texture at 1/3 speed is
>not possible. Even with texture-smearing you couldn't get this rate. Far
>too many memory accesses for the 486.
>


Actually it's over 20000 photorealistic PPS on a DX2. The
"superpixel sampling" algorithm (many, many problems) is
2-4 times faster.

David Wilson

unread,
Dec 18, 1994, 9:25:31 PM12/18/94
to
js...@cix.compulink.co.uk ("Jez San") shamelessly claims:

>We have tested the BRender 3d API on a 486 DX4 100 MHz PC (Dell),
>with a regular vga (no hardware acceleration) using a standard
>benchmark shape; a 4,000 polygon tesselation of the utah teapot.
>The benchmark is from object space into image space, ie: all of the
>3d rendering pipeline is being performed. We are NOT pre-storing
>any results (eg: doing the lighting only once, then simply
>reading it from a table later). The benchmark included: 4x4 matrix
>transform, lighting, clipping, projection, rendering with gouraud
>shading and z buffering. This achieved about 90,000 polygons per
>second (derived from the number of polygons in the scene multiplied
>by the frame rate). I should add that this was an older version
>of BRender and that the latest version is probably faster, and is
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>due for a retest.

Shameless plug?

>When we switch on additional options, like texture mapping, environment
>mapping, etc... things go slower, but not significantly slower.
>There was talk by other folk on this group of a third of the speed
>for texture mapping, and our results do not agree with this. Approximated
>texture mapping runs only a few percent slower, and genuine perspective
^^^^^^^^^^^^^^^^^^^^
Specifics?

>texture mapping (still with gouraud shading and z buffering) is still
>faster than 1/3 (probably close to a half). In general we use texture
>approximations for rounded shapes that have a lot of polygons, and we
>use perspective texturing for large flat shapes that have very large
>polygons easily visible close up in the scene. There is no reason to
>use perspective texturing for ALL polygons in the scene since most
>people would not know the difference. When we quote a benchmark, however
>we are quoting ALL polygons, and are not relying on any approximation
>or dropping of detail. That would be unfair.
^^^^^^^^^^^^^^^^^^^^^

Of course it would be.

90,000 Gouraud shaded, Z buffered polygons/sec on DX4-100?
Read on....

Teapot obtained from avalon.chinalake.navy.mil:/pub/objects/off.
According to teapot.geom first line: 1976 3751 11253
                                vertices^ ^polygons

Ok, let's look at the figures...90,000 gouraud shaded,
z-buffered polygons per second including full 4x4 matrix
xform, project, culling, lighting, etc. On a 486DX4-100.

Given that the Utah teapot has 3,751 polygons that means a
frame rate of 24 fps. There are 1,976 vertices. Therefore
47,424 transformations & projections per second *before you
start drawing*. Given a 4x4 transform this is a total of 9
fixed point multiplies at 20 clocks each on a 486 (assuming
16-bit IMUL, 13-26 clocks: average 20), plus 12 adds and at
least 15 memory references. Let's say 9*20 + 12 + 15 = 207
clocks/point, total 9,816,768. Division (16-bit) requires 27
clocks, 2 per point, total 27*2*47,424 = 2,560,896. In 32-bit
all this would probably take another 30-40% longer. Let's say
12.5 million clocks so far (allowing for code fetching, etc.)



Now we get to draw. Let's do a backface cull, which is 4
multiplies for a polygon. 3,751 * 24 * 4 = 360,000 at 20
clocks each is another 7,201,920 clocks. Up to 20 million
clocks. Let's assume half the polygons are culled. Let's
assume also that each polygon covers about 30 pixels. A
Z-buffer requires one addition and one comparison per pixel,
Gouraud shading is one addition, and texture-mapping is (let's
go with these "new algorithms") two additions and one division
per pixel. Add in code fetching and about 4 memory
accesses/pixel (5 if the texture map is Gouraud shaded), and
the *absolute minimum* you are looking at is about 50
clocks/pixel. 45,000 polygons, let's say you need about 4
divisions to set up (Gouraud & texture) which is 27*4*45000 =
5e6, 50 clocks/pixel * 30 * 45,000 = 67.5e6.

(I think that since approximated texture mapping is "only a few percent
slower" it should be included in the benchmark.)

GRAND TOTAL:

Transform, project 12.5M
BF cull 7.5M
Setup cost 5.0M
Drawing 67.5M
------
92.5M

This very conservative estimate *does not* take into account
the shifting needed for fixed point multiplies and divides
(each shift being 2 clocks, which would be probably 10M). It
also assumes that display memory writes are the same as a
memory access (which on a "standard VGA" is wrong by a factor
of 5 or more) and the Z-buffer comparison always succeeds.
Also forget instruction decode and the lighting dot products,
which would add about 5-10 million clocks on. *Also* don't
forget that everything is probably 32-bit not 16-bit.

Conclusion: 90,000 is the "advertising figure" for BRender
3D. Although I haven't seen it (nor have I heard of it before
now, sigh) my estimations make it look...too good to be true.

Just my $0.02. (If you want to correct my assumptions, feel
free.) *Ouch! That 486 is sizzling!*

____
Dave
e310...@mailbox.uq.oz.au

Chaz Bocock

unread,
Dec 23, 1994, 9:52:42 AM12/23/94
to
In article <3dak4o$p...@aggedor.rmit.EDU.AU>
rc...@minyos.xx.rmit.EDU.AU "Kendall Bennett" writes:

> >> 2. If you have more than a single global tranformation matrix (such as
> >> if you have a matrix stack for hierarchical viewing like we do)
> >> then you are still going to need to transform from object space into
> >> view space before the culling can be applied.
>
> >Right. OpenGl.
>
> Well this isn't an OpenGL problem specially. Just about any useful 3D
> rendering API is going to have some kind of mechanisam for doing
> hierarchical modelling, and a matrix stack is generally used for this
> purpose.
>
> >> 3. Unless you know the 'real' polygon for the face (not the user supplied
> >> lighting normal) you will need to do the backfacing computations in
> >> floating point. When you are in screen space coords you can simply
> >> do it with a faster integer routine.
>
> >You could, of course, do fixed point all the way down through the pipeline.
>
> Well that wont solve the problem either ;-) In screen space coords you do
> the culling with fast integer multiplies. If you are in fixed point you
> still need to do slower fixed point multiplies (still integer, but a fixed
> point multiply on a 486 runs at about the same speed as a floating point
> multiply).

I missed the original post for this (the Jez San one?). Could you tell
me more about these 90000/sec polys? What machine/what size polys?

I have read the previous articles on backface removal, and on my
3d engine (I started coding a couple of months back) I use the
determinant of matrix solution; ie I check whether the screen-projected
co-ords are in cw/anti-cw direction. The routine is implemented in the
triangle display routine, because of the hieracial object structure,
I don't think that anywhere else would be suitable.

Whilst I was coding this routine, I wondered if there was a way to get
rid of all the sub/add/muls in the formula, being an assembler coder
striving for max. speed! (I sat in the bath this morning and thought
of a way to clip lines to the screen without using any divs/muls etc;
it was sooo obvious, yet no one thinks of these things.)

I thought that there may be a way of
determining orientation of a poly (specifically triangles) by comparing
relative positions of screen co-ords. ie

          1
         / \        <<--- Triangle before rotation.
        /   \
    2  /_____\  3

then,

          1
         / \        <<--- Triangle after rotation, and facing the
        /   \             other way, ie a backface.
    3  /_____\  2


Now, in the first example, point 2 had a lower x coord than 1, but in
the second, it had a greater one.

So, perhaps you could compare the X and Y coords of all the vertices,
against their starting relatives. ie 2 is less X, but more Y than 1.
Then use this at projection time to determine if it's a backface or not??
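For what it's worth, the per-vertex comparison idea above usually collapses into the standard signed-area (2D cross product) test: two multiplies per triangle, done on the already-projected screen coordinates. A minimal sketch (which sign means "backface" depends on your winding convention and on whether y grows downward on screen):

```c
/* Screen-space winding test for a triangle: the sign of twice its
 * signed area tells you which way the vertices wind after projection.
 * Cost: 2 multiplies and 5 subtractions per triangle. */
typedef struct { int x, y; } Vec2i;

/* Twice the signed area of triangle (a,b,c). */
long signed_area2(Vec2i a, Vec2i b, Vec2i c)
{
    return (long)(b.x - a.x) * (c.y - a.y)
         - (long)(b.y - a.y) * (c.x - a.x);
}

/* Cull when the winding has flipped from the stored front-facing
 * order.  Which sign is "back" is a convention you pick once. */
int is_backface(Vec2i a, Vec2i b, Vec2i c)
{
    return signed_area2(a, b, c) <= 0;
}
```

Note this is still a screen-space test, with the drawback Brian raises elsewhere in the thread: you pay the projection cost for polygons you then throw away.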

Hope I'm not spouting rubbish,

Chaz Bocock (Coder / Eschaton Enterprises)

--
. . . A VERY MERRY CHRISTMAS TO EVERYONE FROM . . .

  E S C H A T O N   E N T E R P R I S E S

"Sects, Sects, Sects... is that all monks think about?"

FINGER or EMAIL ch...@eschaton.demon.co.uk FOR MORE INFO
(sorry about the massive sig, it will return to normal size after xmas)

Brian Hook

unread,
Dec 23, 1994, 9:41:45 PM12/23/94
to

> I missed the original post for this (Jez San one?). Could you tell
> me about this 90000/sec polys? What machine/what size polys?

Jez has stated that it is now up to 100K/sec, and that this is
for a 4000 polygon teapot on a DX4/100 in 320x200x8 mode. This
is Z-buffered, but I'm not sure if it's for Gouraud shaded or
flat shaded. Having seen the demos, Argonaut is not lying.

> I have read the previous articles on backface removal, and on my
> 3d engine (I started coding a couple of months back) I use the
> determinant of matrix solution; ie I check whether the screen-projected
> co-ords are in cw/anti-cw direction. The routine is implemented in the
> triangle display routine, because of the hieracial object structure,
> I don't think that anywhere else would be suitable.

Are you the same poster who said it is IMPOSSIBLE to get 15K/sec
texture mapped polygons because if you couldn't then no one else
could? If the above is how you do backface culling -- well, that
rather answers a lot of questions.

'because of the hieracial (sic.) object structure, I don't think
anywhere else would be suitable' Sounds like you haven't put that much
thought into it. Forgive my brusqueness, but for some reason it miffs
me when someone gets on a newsgroup like comp.graphics.algorithms and
screams that he has a 'MEGA-tweaked' polygon rendering algorithm and
that no one could have faster speed than he because he wrote it all in
assembly. And then goes on to state that he backface culls in screen
space.

> Whilst I was coding this routine, I wondered if there was a way to get
> rid of all the sub/add/muls in the formula, being an assembler coder
> striving for max. speed! ( I sat in the bath this morning, and thought
> of a way to clip lines to the screen without using any div/muls etc,
> it was sooo obvious, yet no one thinks of these things.)

Sure, you can do horizontal scan line clipping, that doesn't involve
any divs/muls (unless you're Gouraud shading or texture mapping or
Z-buffering or doing any other type of interpolated rendering
algorithm). But I'd rather pay the price once (during the original
clipping) than pay it over and over in the innermost loop of my
polygon renderer.

> I thought that there may be a way of
> determining orientation of a poly (specifically triangles) by comparing
> relative positions of screen co-ords. ie

Yes, you can do this. It becomes a very ugly set of if statements,
but it's faster (on some architectures) if you are actually doing
things in screen space (which you shouldn't be).

eric pitzer

unread,
Dec 15, 1994, 10:59:21 PM12/15/94
to

>
>I don't think so. 3000 texture mapped, shaded polys on a 486 anything with
>local bus video is quite an accomplishment. We managed to get 4000 flat
>shaded polys on a 486DX/33 with an incredible amount of tweaking and
>pre-setup of the polygons. Adding texture mapping, or even gouraud shading
>dropped the frame rate pitifully low.
>
>Curt
>

You know, texture mapping and environment mapping needn't be
really slow if one avoids high frequencies in the maps. Vast
increases in rasterizing rates are possible when superpixel
sampling of the maps is used, trading detail for speed; I won't
go into the aliasing & hidden-surface problems you'll have however.

Even without this, I was getting 16000 texture- and environment-
mapped polygons per second on DX2 (Xform & Rasterizing); twice
that for rasterizing.

I would suggest to everyone that they read Graham J. Dunnett's
article "Image Chip" and his use of Edge Functions, in
IEEE CG&A, Nov. 92.

You can also do parallel arithmetic by allocating 2 or even 3
variables per register.
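Eric doesn't spell the trick out, but "2 or even 3 variables per register" presumably means something like the following SWAR-style packing: two independent 16-bit interpolants stepped by a single 32-bit add. A sketch, with the usual caveat that the low field must never carry into the high one:

```c
#include <stdint.h>

/* Two independent 16-bit values packed into one 32-bit word.
 * One 32-bit add then steps both at once -- provided the low field
 * never overflows into the high field (caller keeps headroom). */
uint32_t pack2(uint16_t hi, uint16_t lo)
{
    return ((uint32_t)hi << 16) | lo;
}

uint16_t hi16(uint32_t p) { return (uint16_t)(p >> 16); }
uint16_t lo16(uint32_t p) { return (uint16_t)(p & 0xFFFFu); }
```

Usage: pack, say, the u and v texture steps once per span, then `p += step;` inside the pixel loop instead of two separate adds.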

eric pitzer

unread,
Dec 18, 1994, 7:40:14 PM12/18/94
to
In <D10H...@cix.compulink.co.uk> js...@cix.compulink.co.uk ("Jez San") writes:

>

>BRender uses some new texture mapping algorithms to achieve very high
>performance for texture mapping with full perspective. There are also
>approximations of less accurate texture mapping which are faster. BRender
>supports both types with selection in the model data as to the accuracy
>of the texture mapping required for each polygon.
>
>-- Jez.
>


-- Does this BRender run in 32-bit protected mode?
-- How does one avoid at least 1 texel lookup per pixel and
multiplication by the diffuse intensity? Literature?
-- Is BRender marketed? How? --CGRAPH

Bernie Roehl

unread,
Dec 19, 1994, 9:29:18 AM12/19/94
to
>David Wilson (e310...@dingo.cc.uq.oz.au) wrote:
> Now we get to draw. Let's do a backface cull, which is 4
> multiplies for a polygon.

Of course, that's the wrong stage of the pipeline to be doing backface
culling. As I pointed out over a *year* ago (and I'm sure other people
have realized this as well) you should be doing your backface culling
in *object* space, so that you never have to transform the normals.

As Jouni points out,

>With coding, you should never assume anything.

Agreed!

Robert Schmidt

unread,
Dec 26, 1994, 7:35:19 PM12/26/94
to

I'm not particularly impressed by such figures -- because they are
too scenario-sensitive, and thus mean nothing to me.

What is a polygon anyway? How many pixels? Vertically or horizontally
dominant? 3, 4 or n vertices?

Alternatively, when supplying us with the speeds of your engines, tell us
the number of frames per second, when the *entire screen* (320x200) is
filled by 1 polygon (not perpendicular to any axes). This gives the speed
of the actual texture mapping process. Interpolating (i.e. approximating)
engines will stand out clearly as the fastest, but give the worst looks.
Exact engines will look good, but are slow. Hybrids (exact, but still
interpolating) will be in between.

--
Robert Schmidt - rob...@idt.unit.no

All work and no play makes Robert a dull boy


Brian Hook

unread,
Dec 26, 1994, 9:45:48 PM12/26/94
to
In article <3dnng7$p...@due.unit.no> rob...@idt.unit.no (Robert Schmidt) writes:

> I'm not particularly impressed by such figures -- because they are
> too scenario-sensitive, and thus mean nothing to me.

> What is a polygon anyway? How many pixels? Vertically or horizontally
> dominant? 3, 4 or n vertices?

These are valid points, but you need to understand that a general idea
of an engine's OVERALL performance can be stated as how many fps are
managed with an N polygon object at a specified screen resolution.
This is better than nothing.

> Alternatively, when supplying us with the speeds of your engines, tell us
> the number of frames per second, when the *entire screen* (320x200) is
> filled by 1 polygon (not perpendicular to any axes).

Great, this tells you how fast your scan converter is, but what about
transformation, clipping, and lighting? You can have the world's
fastest polygon renderer and absolutely hobble it by coupling it to a
brain dead transformation pipeline.

Brian Hook

unread,
Dec 19, 1994, 12:05:04 PM12/19/94
to
In article <3d2qur$l...@bunyip.cc.uq.oz.au> e310...@dingo.cc.uq.oz.au (David Wilson) writes:

> 90,000 Gouraud shaded, Z buffered polygons/sec on DX4-100?
> Read on....

While I, of all people, would not defend a blatantly commercial
posting, I think Jez San's posting was borderline, if for
no other reason than he was responding to someone else's post
stating that real-time Z-buffering, texture mapping, Gouraud
shading, etc. on a PC was difficult to achieve -- BRender is
one of the FEW graphics libraries that generally has reliable
benchmark figures.

> Teapot obtained from avalon.chinalake.navy.mil:/pub/objects/off.
> According to teapot.geom first line: 1976 3751 11253
> vertices^ ^polygons

The teapot that BRender uses has 4000 polygons, not 3751.

> Ok, let's look at the figures...90,000 gouraud shaded,
> z-buffered polygons per second including full 4x4 matrix
> xform, project, culling, lighting, etc. On a 486DX4-100.

I highly doubt they use a 4x4 matrix, most likely a 4x3 matrix since
the last column can be optimized out.

> Given that the Utah teapot has 3,751 polygons that means a
> frame rate of 24 fps. There are 1,976 vertices. Therefore
> 47,424 transformations & projections per second *before you
> start drawing*.

Okay, say we have 22fps with a 4000 poly, 2000 vtx. teapot. This is
NOT 44K transformations/projections per second. This should be about
22K transformations/projections per second. Any decent graphics
library will only transform, clip, project, etc. vertices that are
actually used. A-ha! Suddenly things become more reasonable.

> Now we get to draw. Let's do a backface cull, which is 4
> multiplies for a polygon. 3,751 * 24 * 4 = 360,000 at 20
> clocks each is another 7,201,920 clocks. Up to 20 million
> clocks. Let's assume half the polygons are culled.

A reasonable backface cull should be 3 multiplies and an add for
each polygon (plane equation check with prestored plane constants).
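For reference, the prestored-plane-constant check Brian describes can be sketched like this. It assumes the eye point has already been transformed into the object's space (so the plane constants never need retransforming); names are illustrative:

```c
/* Object-space backface cull with prestored plane constants.
 * A polygon's plane is n.p = d; precompute d once at modeling time. */
typedef struct { float nx, ny, nz, d; } Plane;

/* Once per polygon, offline: d from the normal n and any vertex p0. */
Plane make_plane(float nx, float ny, float nz,
                 float px, float py, float pz)
{
    Plane pl;
    pl.nx = nx; pl.ny = ny; pl.nz = nz;
    pl.d  = nx * px + ny * py + nz * pz;
    return pl;
}

/* Per frame: 3 multiplies and an add per polygon.  (ex,ey,ez) is the
 * viewer position transformed into this object's space. */
int is_backfacing(const Plane *pl, float ex, float ey, float ez)
{
    return pl->nx * ex + pl->ny * ey + pl->nz * ez - pl->d <= 0.0f;
}
```

The win over screen-space culling is that rejected polygons never get transformed or projected at all.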

> Let's
> assume also that each polygon covers about 30 pixels. A
> Z-buffer requires one addition and one comparison per pixel,
> Gouraud shading is one addition, and texture-mapping is (let's
> go with these "new algorithms") two additions and one division
> per pixel. Add in code fetching and about 4 memory
> accesses/pixel (5 if the texture map is Gouraud shaded), and
> the *absolute minimum* you are looking at is about 50
> clocks/pixel. 45,000 polygons, let's say you need about 4
> divisions to set up (Gouraud & texture) which is 27*4*45000 =
> 5e6, 50 clocks/pixel * 30 * 45,000 = 67.5e6.

Assumptions assumptions assumptions. Fine, 30 pixel polygons. I'll
grant you the Z-buffer stuff, even though they could be doing
span checking, obviating the necessity for a compare and add
at each pixel, and their texture mapping could be table based (which
would mean two adds and a table look up per pixel). Now, another
thing Argonaut may be doing is caching edge values for polygons,
as this would save a LOT of time and prevent them from doing lots
of redundant calculations. How does this figure in to your
totals?

> (I think that since approximated texture mapping is "only a few percent
> slower" it should be included in the benchmark.)

Why?

> Transform, project 12.5M
> BF cull 7.5M
> Setup cost 5.0M
> Drawing 67.5M

> It

> also assumes that display memory writes are the same as a
> memory access (which on a "standard VGA" is wrong by a factor
> of 5 or more) and the Z-buffer comparison always succeeds.
> Also forget instruction decode and the lighting dot products,
> which would add about 5-10 million clocks on. *Also* don't
> forget that everything is probably 32-bit not 16-bit.

I don't believe Jez said anything about 'standard VGA', I'd bet
anything that he used a PCI DX4/100 with a reasonably fast video card.
Clue: video writes these days on many video cards are actually FASTER
than writes to main memory; not only that, they don't affect your
cache coherency nearly as badly as a write to system memory; they are
typically never even done since you render to an offscreen buffer
under VGA mode 0x13 then blit all at once; etc.
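The render-offscreen-then-blit approach Brian mentions looks roughly like this. A sketch only: under DOS mode 0x13 `screen` would be the real framebuffer at 0xA0000, here an ordinary array stands in for it so the idea is portable:

```c
#include <string.h>

#define SCREEN_W 320
#define SCREEN_H 200

/* Under mode 0x13, 'screen' would be the VGA framebuffer at 0xA0000;
 * an ordinary array stands in for it here. */
static unsigned char screen[SCREEN_W * SCREEN_H];
static unsigned char back_buffer[SCREEN_W * SCREEN_H];

/* All drawing goes to the off-screen buffer... */
void put_pixel(int x, int y, unsigned char color)
{
    back_buffer[y * SCREEN_W + x] = color;
}

/* ...and the frame reaches video memory as one bulk copy, instead of
 * many scattered (and, on slow cards, expensive) VGA writes. */
void present(void)
{
    memcpy(screen, back_buffer, sizeof back_buffer);
}
```

This is also why "standard VGA" write penalties matter less than David's estimate assumes: scattered per-pixel writes never touch the card.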

> Conclusion: 90,000 is the "advertising figure" for BRender
> 3D. Although I haven't seen it (nor have I heard of it before
> now, sigh) my estimations make it look...too good to be true.

Given your brute force algorithms I could understand your reluctance
to believe the statistics, but the fact is one of the keys to fast
computer graphics is avoiding a lot of the things you mention.

As I've stated, I have no affiliation with BRender or Argonaut,
but their stuff looks real good (they had demos at the Game
Developers Conference) and they seem to have a good design and
good programming staff. They don't toot their own horns as
brashly or frequently as the folks at Criterion, who use ANY
excuse to talk about their library, which IMO is a piece of
garbage. I don't know if the optimizations I described above
are reasonable, but I would think that, since they are for the
most part obvious, at the LEAST BRender has implemented them
and probably some fancy stuff we haven't considered.

> Just my $0.02. (If you want to correct my assumptions, feel
> free.) *Ouch! That 486 is sizzling!*

Done.

Billy Zelsnack

unread,
Dec 16, 1994, 12:49:27 PM12/16/94
to
>Chaz Bocock (Coder / Eschaton Enterprises)
>email ch...@eschaton.demon.co.uk
>writes:

>I'm sorry, but I fail to believe that you were getting 16000 texture-mapped
>polys on a DX2. I have slaved for months over my fully-asm coded poly
>fill (flat shaded) and it uses NO variables, only registers. It is the
>best algorithm I could come up with, after inventing many. I managed
>just over 45000 /second on my friends DX2, so texture at 1/3 speed is
>not possible. Even with texture-smearing you couldn't get this rate. Far
>too many memory accesses for the 486.

It depends what their definition of a polygon is.
10x10 pixel triangle or 50x50 pixel quad? Whatever.
People always tend to leave this out.

Here is my super fast 1x1 pixel texture mapped, polygon drawer.

memcpy(VIDEO_MEMORY,(*texturePtr),64000);

On my 486, I get over 12,800,000 texture mapped polygons a second.
(approximation from off the top of my head)


Jez San

unread,
Dec 27, 1994, 8:53:18 AM12/27/94
to
In article <3dnng7$p...@due.unit.no> rob...@idt.unit.no (Robert Schmidt) writes:
>In article <D1BHy...@cix.compulink.co.uk>, js...@cix.compulink.co.uk ("Jez San") writes:
>|> In article <BWH.94De...@netcom20.netcom.com> b...@netcom20.netcom.com (Brian Hook) writes:
>
>|> Dont hold me to this, but for texture mapping (same test suite as
>|> the gouraud shaded demo) it was about 75k polys/sec, and with texture
>|> and gouraud it was about 68k. Environment mapped was about 55k.
>|>
>|> the newer fast figures were about 110k polys/sec gouraud and 118k/sec
>|> for flat.
>
>I'm not particularly impressed by such figures -- because they are
>too scenario-sensitive, and thus mean nothing to me.
>
>What is a polygon anyway? How many pixels? Vertically or horizontally
>dominant? 3, 4 or n vertices?
>
In another message I detailed exactly what our benchmark was and how we
achieved it. It's impossible to please everybody with a benchmark, and I
apologise.

>Alternatively, when supplying us with the speeds of your engines, tell us
>the number of frames per second, when the *entire screen* (320x200) is
>filled by 1 polygon (not perpendicular to any axes). This gives the speed
>of the actual texture mapping process. Interpolating (i.e. approximating)
>engines will stand out clearly as the fastest, but give the worst looks.
>Exact engines will look good, but are slow. Hybrids (exact, but still
>interpolating) will be inbetween.
>

I think you will find that the inner pixel loop is incredibly fast on most
3d graphics engines, and that for most scenes the 'setup math' swamps the
results, up until hi-res screens that is. If the entire screen were filled
by just one polygon, then the inner pixel loop would be doing millions
of pixels per second; with few other overheads involved this would give
you an extremely unrepresentative figure of performance that would probably
appear much faster than the true figure.

-- Jez.

Ed Michalski

unread,
Dec 27, 1994, 9:59:45 AM12/27/94
to
Can somebody enlighten me: when someone says 70k polygons/sec, what size
polys are they talking about? How many points are being filled, or are
they simply talking about just the edge points of the poly?

Ed Michalski / WTS / GTS / TCS
Programmer / Analyst / 'C' Chef/ EDI ANSI X12
Safelite Glass Corp /_____ ovec...@infinet.com "I may not own it. But,

Chris Schoeneman

unread,
Dec 27, 1994, 1:59:03 PM12/27/94
to
In article <BWH.94De...@netcom20.netcom.com>, b...@netcom20.netcom.com (Brian Hook) writes:
> In article <3dnng7$p...@due.unit.no> rob...@idt.unit.no (Robert Schmidt) writes:
>
> > Alternatively, when supplying us with the speeds of your engines, tell us
> > the number of frames per second, when the *entire screen* (320x200) is
> > filled by 1 polygon (not perpendicular to any axes).
>
> Great, this tells you have fast your scan converter is, but what about
> transformation, clipping, and lighting? You can have the world's
> fastest polygon renderer and absolutely hobble it by coupling it to a
> brain dead transformation pipeline.

I'd like to see at least two numbers: transform rate and fill rate.

The transform rate should be the number of 3D vertices per second that can
be transformed into screen space using an arbitrary modeling transform (i.e.
no speedups because some row or column has lots of zeros) and a perspective
projection. I don't care how many polygons are involved or how many vertices
got shared.

Fill rate is the number of pixels per second that can be filled. This
depends on your drawing style naturally so that must be given. If a pixel
isn't ultimately drawn then it doesn't count. It might not be drawn because
of a z-buffer or because it's off the edge of the screen. Obviously, for
top speed you'd want all pixels to be drawn, which means all the work for
that pixel must be done. As Robert suggested, fill rate should be based
on filling arbitrary polygons (i.e. no speedups because the region is
rectangular or has constant z, etc.)
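A minimal harness for the fill-rate number Chris describes might look like this. The inner loop is only a stand-in for a real span renderer, and the frame count and buffer size are illustrative; the point is simply pixels-actually-written divided by elapsed time:

```c
#include <time.h>

/* Fill a 320x200 off-screen buffer 'frames' times.
 * Returns the number of pixels actually written. */
long fill_frames(unsigned char *buf, int frames)
{
    long f, i, total = 0;
    for (f = 0; f < frames; f++) {
        for (i = 0; i < 320L * 200; i++)
            buf[i] = (unsigned char)i;  /* stand-in for the span loop */
        total += 320L * 200;
    }
    return total;
}

/* Pixels per second for the fill above; 0.0 if the timer resolution
 * was too coarse to measure anything. */
double fill_rate_pixels_per_sec(int frames)
{
    static unsigned char buf[320 * 200];
    clock_t t0 = clock();
    long pixels = fill_frames(buf, frames);
    double secs = (double)(clock() - t0) / CLOCKS_PER_SEC;
    return secs > 0.0 ? (double)pixels / secs : 0.0;
}
```

Per Chris's rules, a real benchmark would count only pixels that survive the z-test and clipping, and would report a separate number per drawing style.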

Having said that, I should note that a single number each for transform
and fill rate isn't enough. I'd like to see the transform speed for
unlit and lighted vertices (with one infinite light). Fill rates should
include: flat; Gouraud; z-buffered Gouraud; z-buffered textured, and
screen clear. Also useful are 2D screen aligned fill rate and memory-
to-screen and screen-to-memory transfer rate.

If lighting is true Phong, then I guess there should be a fill rate for
lit, z-buffered, Phong shaded polygons.

Cheers,
-chris

Chris Schoeneman
c...@lightscape.com
Lightscape Technologies

Jez San

unread,
Dec 28, 1994, 6:53:37 PM12/28/94
to
In article <3dsgjr$6...@nntpd.lkg.dec.com> pa...@eng.pko.dec.com (Samuel S. Paik) writes:

>Jez San <js...@cix.compulink.co.uk> wrote:
>>Dont hold me to this, but for texture mapping (same test suite as
>>the gouraud shaded demo) it was about 75k polys/sec, and with texture
>>and gouraud it was about 68k. Environment mapped was about 55k.
>>
>>the newer fast figures were about 110k polys/sec gouraud and 118k/sec
>>for flat.
>
>So, have you tried porting and running any PLBs?
>
>Sam
>--

Nope, I haven't tried that... where can I find out explicit details of
these PLBs?

-- Jez.

Brian Hook

unread,
Dec 20, 1994, 3:49:48 AM12/20/94
to
In article <D137C...@cix.compulink.co.uk> js...@cix.compulink.co.uk ("Jez San") writes:

> (b) he is assuming that you need lots of muls to backface cull.

Alright Jez, you've sucked me in on this one. Backface culling
implies that you need to determine which side of a polygon the
viewer is on. This can be done either by checking the angle between
the viewer and the polygon or by seeing which side of the polygon's
plane the viewer is on (the same thing, stated differently).
Either way, you need a dot product, which is 3 multiplies. How
is it possible to get around this fact? If this is a trade
secret, that's fine, but if you AREN'T doing a dot product
then I'd at least like to know that.

On another note, how does BRender handle color? Specifically, how
does it try to resolve the issue of assigning colors for different
shades of different colors along with texture maps and things like
that?

Jouni Mannonen

unread,
Dec 19, 1994, 12:53:20 AM12/19/94
to
David Wilson (e310...@dingo.cc.uq.oz.au) wrote:
: 90,000 Gouraud shaded, Z buffered polygons/sec on DX4-100?

: Read on....
:
: Teapot obtained from avalon.chinalake.navy.mil:/pub/objects/off.
: According to teapot.geom first line: 1976 3751 11253
:                                 vertices^ ^polygons
:
: Ok, let's look at the figures...90,000 gouraud shaded,
: z-buffered polygons per second including full 4x4 matrix
: xform, project, culling, lighting, etc. On a 486DX4-100.
:
: Given that the Utah teapot has 3,751 polygons that means a
: frame rate of 24 fps. There are 1,976 vertices. Therefore
: 47,424 transformations & projections per second *before you
: start drawing*. Given a 4x4 transform this is a total of 9
: fixed point multiplies at 20 clocks each on a 486 (assuming
: 16-bit IMUL, 13-26 clocks: average 20), plus 12 adds and at
: least 15 memory references. Let's say 9*20 + 12 + 15 = 207
: clocks/point, total 9,816,768. Division (16-bit) requires 27
: clocks, 2 per point, total 27*2*47,424 = 2,560,896. In 32-bit
: all this would probably take another 30-40% longer. Let's say
: 12.5 million clocks so far (allowing for code fetching, etc.)

Slight miscalculation: you never took into account that the
integer multiply on the DX4 processor takes fewer clocks than
on a DX2; I recall the numbers being 5/12 clocks. (If I
am mistaken about this, someone please post the correct
values. In any case, read on.)
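For reference, the arithmetic in the quoted estimate checks out with the
numbers it assumes (486DX2-era instruction timings, and 90,000/3,751 polys
rounded to 24 fps):

```python
# Reproduce the clock budget from the quoted post (486DX2-era timings).
vertices, polygons = 1976, 3751
fps = 24                                # 90,000 / 3,751 polys, rounded
xforms_per_sec = vertices * fps         # transforms + projections per second

# 9 fixed-point muls at ~20 clocks, 12 adds, 15 memory references:
clocks_per_point = 9 * 20 + 12 + 15     # = 207
xform_clocks = clocks_per_point * xforms_per_sec

# Two 16-bit divides (27 clocks each) per projected point:
div_clocks = 27 * 2 * xforms_per_sec

print(xforms_per_sec, xform_clocks, div_clocks)
# 47424 9816768 2560896
```

Rerunning this with DX4 multiply timings (and 32-bit overheads) is what the
rest of the thread argues about.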

: Now we get to draw. Let's do a backface cull, which is 4
: multiplies for a polygon.

4 multiplies? The common method is to use two, but you can get
around even that when you're only drawing triangles.

: Gouraud shading is one addition, and texture-mapping is (let's

: go with these "new algorithms") two additions and one division
: per pixel.

There are better algorithms out there than doing a division per pixel,
for example a Bresenham-style approach that gives accurate
perspective mapping without ever having to divide.
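One widely used compromise (which may or may not be the scheme meant here)
interpolates u/z, v/z, and 1/z linearly across the span and divides only
every N pixels, running a pure add-per-pixel affine loop in between. A
floating-point sketch; a real inner loop would use fixed point:

```python
def textured_span(x0, x1, uz0, vz0, iz0, uz1, vz1, iz1, step=16):
    """Perspective texture span with one divide every `step` pixels.

    u/z, v/z and 1/z interpolate linearly in screen space, so we
    divide to recover (u, v) only at subspan boundaries and run a
    plain affine (add-only) inner loop in between.
    """
    n = x1 - x0
    duz, dvz, diz = (uz1 - uz0) / n, (vz1 - vz0) / n, (iz1 - iz0) / n
    u0, v0 = uz0 / iz0, vz0 / iz0
    out, x = [], 0
    while x < n:
        nx = min(x + step, n)
        izn = iz0 + diz * nx
        u1 = (uz0 + duz * nx) / izn      # the only per-subspan
        v1 = (vz0 + dvz * nx) / izn      # perspective divides
        span = nx - x
        du, dv = (u1 - u0) / span, (v1 - v0) / span
        u, v = u0, v0
        for _ in range(span):            # affine inner loop: adds only
            out.append((u, v))
            u += du
            v += dv
        u0, v0, x = u1, v1, nx
    return out

# Example: a 32-pixel span from depth z=1 to z=2, u running 0..1
texels = textured_span(0, 32, 0.0, 0.0, 1.0, 0.5, 0.0, 0.5)
print(len(texels))  # 32
```

With step=16 that is one divide pair per 16 pixels instead of one per pixel,
at the cost of slight curvature error inside each subspan.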

: (I think that since approximated texture mapping is "only a few percent

: slower" it should be included in the benchmark.)

I doubt they are approximating with a divide per pixel.

: Conclusion: 90,000 is the "advertising figure" for BRender

: 3D. Although I haven't seen it (nor have I heard of it before
: now, sigh) my estimations make it look...too good to be true.

Not at all too good for a DX4 in my opinion. My own routines,
which I am currently working on, can already push over 20K tri-
mesh gouraud shaded polygons per second (with a similar pipeline
to the one listed for BRender, although I never did find a Z-buffer
useful) on a 386, and a 486DX4 easily makes the needed 90K/sec.
My routines are probably not as polished as BRender, nor created
for ease of development, but in time they will be used for a game I
am currently working on. I have no real intention to sell my
engine, although if there are interested parties out there, do
let me know. I am still pushing the limits of the code.

(For people who might mail me to ask information on HOW to do
this, I can only recommend to read things archived on the net,
think over the algorithms and just _do_ it. Anything that has
been done can be done again.)

I have never seen BRender. I have never heard of it. But I have
no need to dismiss something another person has coded just
because it initially sounds too fast to be true. There
is no such thing as maximal optimization.

: Just my $0.02. (If you want to correct my assumptions, feel

: free.) *Ouch! That 486 is sizzling!*

With coding, you should never assume anything.

I know 90K gouraud shaded polygons per second can be done on a
486dx4 at 100Mhz and believe anyone with the patience and the
access to the information on the Internet can optimize his
routines to do the same. I know I did.


: Dave
: e310...@mailbox.uq.oz.au


Jouni Mannonen - freelance game programmer from Finland
wh...@hybrid.kaapeli.fi (6502/680x0/80x86 is a way of life!)

Robert Glidden

unread,
Dec 28, 1994, 2:50:43 PM12/28/94
to
Fascinating thread! I am wondering about some apparent underlying
assumptions:

BRender et al are _special_ APIs (ie games) and therefore faster. Hey, maybe
they are just faster because they are better. After all, the
SGI market has long been based on the premium-for-greater-performance
model, and it's not hard to imagine that some fat has gotten into the
system. As 3D goes more mainstream, maybe the low end is going to drive
performance, and the higher end will become the specialized end.
("We're slower because we have more features" sounds like feature bloat
to a consumer's ear.) After all, games are already
driving performance on PCs (WinG, multimedia, Borland v Watcom C, etc).

Granted, there are things you can do with OpenGL that you
can't with the 1.0 versions of some of the
Game APIs, but I would say that, based on the newer APIs' performance so far, the
OpenGL crowd ought to look at these early 3D Game APIs as a wake-up call
that 3D (like so many consumerized technologies before it) will not remain a
high-price specialty product forever.

Rob Glidden
Soft Press

Samuel S. Paik

unread,
Dec 28, 1994, 3:07:07 PM12/28/94
to
Ed Michalski <love...@infinet.com> wrote:
>Can somebody enlighten me, when someone says 70k polygons/sec what size
>polys are they talking about, how many points are being filled, or are
>they simply talking about just the edge points of the poly?

The "standard" in the workstation world is 10-pixel lines, 50-pixel
triangles, and 100-pixel quads. Note that there isn't a standard benchmark
program for this, so you need to take SGI/DEC/IBM/HP/Sun ads with a large
grain of salt. This includes transform, cull test, clip test, possibly
lighting, and rasterization.

The other standard is GPC PLB (Picture Level Benchmark) numbers. These
use metafiles which are supposed to characterise how typical applications
actually use graphics. The one that probably most closely approximates
the "spin a teapot" is the head plb, which uses some data generated from
a Cyberware 3D scanner converted into triangle strips, and spins that
for a while (with four vector lights!). Other PLBs include panning
across a 2D PC board, changing the visible layers, an architectural
walkthrough, and an animated "movie".

Sam
--
Samuel Paik / Digital Equipment Corporation / 3D Device Support
pa...@avalon.eng.pko.dec.com / 508-493-4048 / I speak only for myself

The yin and yang of programming languages (due to Andrew Koenig)

Pure object-oriented programming:
everything is an object, even your program
Pure functional programming:
everything is a program, even your data.

Samuel S. Paik

unread,
Dec 28, 1994, 3:08:27 PM12/28/94
to
Jez San <js...@cix.compulink.co.uk> wrote:
>Dont hold me to this, but for texture mapping (same test suite as
>the gouraud shaded demo) it was about 75k polys/sec, and with texture
>and gouraud it was about 68k. Environment mapped was about 55k.
>
>the newer fast figures were about 110k polys/sec gouraud and 118k/sec
>for flat.

So, have you tried porting and running any PLBs?

Sam

QAZ1111111

unread,
Dec 29, 1994, 11:49:54 AM12/29/94
to
I would like more information about BRender.

Does this software support either GLINT or Matrox MGA?

BH

Jim Hourihan

unread,
Dec 19, 1994, 3:11:59 PM12/19/94
to
Brian Hook (b...@netcom20.netcom.com) wrote:

: I highly doubt they use a 4x4 matrix, most likely a 4x3 matrix since


: the last column can be optimized out.

What part of the pipeline are you doing this in? Are your matrices
row or column major? Can any general 4x4 be supplied to your pipeline?


: > Let's

: > assume also that each polygon covers about 30 pixels. A
: > Z-buffer requires one addition and one comparison per pixel,
: > Gouraud shading is one addition, and texture-mapping is (let's
: > go with these "new algorithms") two additions and one division
: > per pixel. Add in code fetching and about 4 memory
: > accesses/pixel (5 if the texture map is Gouraud shaded), and
: > the *absolute minimum* you are looking at is about 50
: > clocks/pixel. 45,000 polygons, let's say you need about 4
: > divisions to set up (Gouraud & texture) which is 27*4*45000 =
: > 5e6, 50 clocks/pixel * 30 * 45,000 = 67.5e6.

: Assumptions assumptions assumptions. Fine, 30 pixel polygons. I'll
: grant you the Z-buffer stuff, even though they could be doing
: span checking, obviating the necessity for a compare and add
: at each pixel, and their texture mapping could be table based (which
: would mean two adds and a table look up per pixel). Now, another
: thing Argonaut may be doing is caching edge values for polygons,
: as this would save a LOT of time and prevent them from doing lots
: of redundant calculations. How does this figure in to your
: totals?
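The "one addition per pixel" figure for Gouraud shading comes from
fixed-point forward differencing along each span: the divide happens once
per span to compute the gradient, and the inner loop is a single add. A
minimal sketch (16.16 fixed point; names are illustrative, not anyone's
actual engine):

```python
def gouraud_span(i_left, i_right, width):
    """Fill one scanline span with intensities interpolated by a
    single fixed-point add per pixel (16.16 forward differencing).

    i_left / i_right are the 0..255 intensities at the span ends.
    """
    FP = 16                                     # 16.16 fixed point
    i = i_left << FP
    di = ((i_right - i_left) << FP) // width    # one divide per span
    pixels = []
    for _ in range(width):
        pixels.append(i >> FP)                  # integer intensity
        i += di                                 # the single add per pixel
    return pixels

print(gouraud_span(0, 64, 8))  # [0, 8, 16, 24, 32, 40, 48, 56]
```

Caching edge values, as suggested above, is the same idea one level up:
compute each shared edge's gradients once and reuse them for both polygons.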

These "texture maps" do not sound perspective correct. Are you interpolating
texture coordinates linearly in screen space or in eye coordinates? Are these
textures filtered? Do your textures appear to swim? Do you subdivide
your geometry in screen space to remove artifacts?

Just curious.

-Jim

Jez San

unread,
Dec 19, 1994, 9:09:41 PM12/19/94
to


Agreed Bernie. David Wilson is assuming that (a) you would
do backface culling very late in the pipeline, thus wasting
lots of cycles doing math on polygons that aren't visible, and
(b) you need lots of muls to backface cull.

We're not saying there is any magic code in BRender. It runs
regular compiled C code (and a bit of assembler) just like everyone
else. What we ARE saying is that it's a nice clean solid implementation
of a proper 3d pipeline, with full functionality and no artifacts,
and it's fast. Just good software engineering.

-- Jez.

Nathan Hand

unread,
Dec 19, 1994, 8:41:23 PM12/19/94
to
: Brian Hook (b...@netcom20.netcom.com) wrote:
: : I highly doubt they use a 4x4 matrix, most likely a 4x3 matrix since
: : the last column can be optimized out.

: What part of the pipeline are you doing this in? Are your matrices
: row or column major? Can any general 4x4 be supplied to your pipeline?

He might mean the algorithm can be optimised, not the compilation. A
typical matrix calculation looks like

| a b c 0 |
| d e f 0 |
| g h i 0 |
| x y z 1 |

This falls out naturally from matrix axioms and 3D basics, so the
entire right column gets "optimised" away. The result is that you have
very specific routines, but you save about 20% of the calculations.
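As a sketch of what that buys per vertex (this is the generic affine case,
not any particular engine's code): with the right column constant, a point
transform is nine multiplies and nine adds instead of the full sixteen and
twelve.

```python
def transform_4x3(m, p):
    """Apply the 4x3 affine matrix above to point p = (x, y, z).

    Row-vector convention, matching the layout in the post: the top
    3x3 block (a..i) rotates/scales, the bottom row (x, y, z)
    translates, and the constant right column is never stored or
    multiplied -- 9 muls + 9 adds per vertex.
    """
    x, y, z = p
    return (
        x * m[0][0] + y * m[1][0] + z * m[2][0] + m[3][0],
        x * m[0][1] + y * m[1][1] + z * m[2][1] + m[3][1],
        x * m[0][2] + y * m[1][2] + z * m[2][2] + m[3][2],
    )

# Identity rotation plus a translation of (5, 0, 0):
m = [[1, 0, 0], [0, 1, 0], [0, 0, 1], [5, 0, 0]]
print(transform_4x3(m, (1, 2, 3)))  # (6, 2, 3)
```

The trade-off noted above still holds: such routines can no longer accept an
arbitrary 4x4 (e.g. one carrying the projection), so the divide-by-w step has
to live elsewhere in the pipeline.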

Jez San

unread,
Dec 18, 1994, 10:03:05 AM12/18/94
to
In article <3cvvlp$7...@ixnews2.ix.netcom.com> cgr...@ix.netcom.com (eric pitzer) writes:
>In <D0z3L...@cix.compulink.co.uk> js...@cix.compulink.co.uk ("Jez San") writes:
>
>
>>We are getting about 90,000 gouraud shaded z buffered polygons per second
>>using our BRender 3d library on a 486 DX4 100 MHz PC and even more on a Pentium.
>>
>>Texture mapping slows us down a small amount, but not significantly.
>>
>>If you want more details on BRender and are in the uk, email pa...@argonaut.com
>>or in the usa email ri...@argonaut.com
>>
>>-- Jez.
>>
>
>
>-----How many of these polygons actually underwent a true view Xform?
>-----How can true texture mapping, which is more expensive than
>-----intensity and Z interpolation together, be cheap?


All of the polygons were in the scene (the test was the usual one
ie: 4,000 polygon utah teapot). Naturally some are back face culled,
and the rest are rendered. The figures include Transform, Clip,
Project, and rendering with Gouraud Shading AND z buffering.

Jez San

unread,
Dec 18, 1994, 10:16:16 AM12/18/94
to
In article <3cvvlp$7...@ixnews2.ix.netcom.com> cgr...@ix.netcom.com (eric pitzer) writes:
>In <D0z3L...@cix.compulink.co.uk> js...@cix.compulink.co.uk ("Jez San") writes:
>
>
>>We are getting about 90,000 gouraud shaded z buffered polygons per second
>>using our BRender 3d library on a 486 DX4 100 MHz PC and even more on a Pentium.
>>
>>Texture mapping slows us down a small amount, but not significantly.
>>
>>If you want more details on BRender and are in the uk, email pa...@argonaut.com
>>or in the usa email ri...@argonaut.com
>>
>>-- Jez.
>>
>
>
>-----How many of these polygons actually underwent a true view Xform?
>-----How can true texture mapping, which is more expensive than
>-----intensity and Z interpolation together, be cheap?


I forgot to mention, the benchmark we were quoting for BRender is
easily misinterpreted or re-worded, so perhaps I should be even more detailed
in what was being performed.

We have tested the BRender 3d API on a 486 DX4 100 MHz PC (Dell),
with a regular vga (no hardware acceleration) using a standard
benchmark shape; a 4,000 polygon tesselation of the utah teapot.
The benchmark is from object space into image space, ie: all of the
3d rendering pipeline is being performed. We are NOT pre-storing
any results (eg: doing the lighting only once, then simply
reading it from a table later). The benchmark included: 4x4 matrix
transform, lighting, clipping, projection, rendering with gouraud
shading and z buffering. This achieved about 90,000 polygons per
second (derived from the number of polygons in the scene multiplied
by the frame rate). I should add that this was an older version
of BRender and that the latest version is probably faster, and is
due for a retest.

When we switch on additional options, like texture mapping, environment
mapping, etc... things go slower, but not significantly slower.
There was talk by other folk on this group of a third of the speed
for texture mapping, and our results do not agree with this. Approximated
texture mapping runs only a few percent slower, and genuine perspective
texture mapping (still with gouraud shading and z buffering) still runs
at better than 1/3 of full speed (probably closer to a half). In general we use texture
approximations for rounded shapes that have a lot of polygons, and we
use perspective texturing for large flat shapes that have very large
polygons easily visible close up in the scene. There is no reason to
use perspective texturing for ALL polygons in the scene since most
people would not know the difference. When we quote a benchmark, however
we are quoting ALL polygons, and are not relying on any approximation
or dropping of detail. That would be unfair.

-- Jez.

Jez San

unread,
Jan 1, 1995, 3:24:30 PM1/1/95
to


BRender is a 3d rendering library that works on a wide variety of different
platforms including PC compatible and Sony PlayStation, with others
following soon. It supports hardware accelerators like GLiNT
and has been architected to support full 3d math and rendering pipelines
beyond what is currently available today.

BRender is probably the fastest 3d rendering library currently available
for the PC. The current version has been tested at over 100,000 polygons
per second. This was tested on a Dell 486/100 MHz PC, with no hardware
acceleration. Standard VGA mode 13h was used (320 x 200 x 256 colours)
and the test included fully gouraud shaded, z buffered images of a
4,000 polygon teapot being rotated, clipped, lit (1 directional light),
perspective projected, and rendered.

For more details, email ri...@argonaut.com in the usa, or pa...@argonaut.com
in Europe.

-- Jez San,
(Argonaut Software Limited)

Incognito

unread,
Jan 1, 1995, 5:16:36 PM1/1/95
to
Jez San (js...@cix.compulink.co.uk) wrote:
: BRender is probably the fastest 3d rendering library currently available
: for the PC. The current version has been tested at over 100,000 polygons
: per second. This was tested on a Dell 486/100 MHz PC, with no hardware
: acceleration. Standard VGA mode 13h was used (320 x 200 x 256 colours)
: and the test included fully gouraud shaded, z buffered images of a
: 4,000 polygon teapot being rotated, clipped, lit (1 directional light),
: perspective projected, and rendered.

This still lacks several important details to be an appropriate
benchmark figure. Firstly, I'd like to know the size of the object as it
is rotating (320 pixels across?), the speed of the VGA write in bytes per
second, and whether the graphics acceleration features (blitting) of the VGA
card were exploited, and also a comparative benchmark for a 66 MHz 486
DX2, as the multiplication opcode on the 486 DX4 is significantly faster
in execution and thus makes comparison more difficult. To be picky, it was
never even stated whether the object was rotating about all 3 axes.

I would believe that the performance figures for my SurRender engine would
be notably better, but not knowing the exact situation of the benchmark,
won't make such claims. I'm simply interested in pursuing the state of
the art, even though the raw brute force of a 3D engine is still secondary
to the quality of a game, to design and sheer playability. Optimization
simply helps to get the rest of the game right.


Jouni Mannonen
Freelance game developer

Brian Hook

unread,
Jan 3, 1995, 10:15:19 PM1/3/95
to
In article <3e79k4$g...@transu.cute.fi> sta...@hetero.cute.fi (Incognito) writes:

> This still lacks several important details to be an appropriate
> benchmark figure. Firstly, I'd like to know the size of the object as it
> is rotating (320 pixels across?),

This is important, obviously, but you can usually assume nearly full
screen with most benchmarks folks quote.

>the speed of the VGA write in bytes per second,

This is not a rendering issue, since a fast VGA write is constant
across all rendering architectures. This is a hardware issue.

>and if the graphic acceleration features (blitting) of the vga
> card were exploited,

Given Jez has posted that his benchmarks are in 320x200, this would
usually mean no acceleration. However, VERY few video cards support
accelerated blits from DRAM->VRAM.

> in execution and thus makes comparison more difficult. To be picky, it was
> never even stated if the object was rotating about all 3 axis.

This is REAL picky and irrelevant to boot. Those extra rotations
should have ZERO effect on performance.
