From Bjorn's 3D-World:
Rendition about VQuake and VHexen II
Ok, all the other stuff is taken care of. Then it's time to take on this
hot potato that the Stealth II has become... First, Stefan mailed me back
an answer to the question about the "slow" (no performance gain) speed
people have been getting in VQuake and VHexen II. This is what he said:
I'm not sure how Golden Egg was able to get good frame rates on
VQuake, either. All of our experiments had shown that VQuake wouldn't
go much faster on the V2200. The reason is simple: the way the game
draws is very CPU-dependent for its performance. You get much bigger
performance increases with a faster CPU than you do with a faster
renderer. (I can give you the relevant details of the graphics engine
if you'd really like that.)
So, yes it's true that VHexenII goes the same speed (at least on my
machine) on a V2K as on a V1K (about 51fps timedemo demo1 at
320x240). On the other hand, running it on a P2-300 on a V2K (also
with the benefit of the faster AGP bus, but I'm not sure how much that
matters yet) I get about 70fps in the same test.
This is why, by the way, we have another programmer working on Hexen2
and Quake2 for the V2K, that will not be so CPU bound, and will, if
things go as planned, give better performance.
For now, in Quake 2, I do get much better numbers on the V2K than on
the V1K, but that is using timerefresh, so I can't guarantee that
those numbers will carry over once timedemo gets added to Q2. (BTW,
the P2-300 gets much better Q2 timerefresh scores than my P166MMX.)
So again, I don't know why Golden Egg got better performance then
compared to now. Let me know if you'd like more details on why CPU
power is so crucial to Quake-engine games.
Well, wait a minute... The only games that seem to have this "slow
native" problem are VQuake and VHexenII. Both VHexenII and VQuakeII will
have versions optimized for the V2200, and with a little tweaking it
seems possible to get better VQuake performance too (33 fps timedemo2 on
a lowly iP166 reportedly). And all reports say that D3D performance is
For those who care about 2D performance, that is another issue though.
It seems to show very little improvement over the V1000.
_ /| .·´¯`·.¸¸.·´¯`·.¸¸.·´¯`·.¸¸.·´¯`·.¸¸.·´¯`·.¸¸.·´¯`·.¸¸.·´¯`·.
\`o.O' Johannes Weiman | f94jowe-@-dd.chalmers.se | +46-(0)31 188681
=(___)= ICQ 3351499 | Remove ^ ^ to reply! |
U BLIND GUARDIAN & QUAKE -> http://www.dd.chalmers.se/~f94jowe
>Forget about the Verite...The I3D Voodoo will not be returned to EB
You're writing off the V2100/2200 because it doesn't significantly
improve performance in two games -- VQuake and VHexen II? What about
the literally dozens of Direct3D games which show a twofold
performance increase with the new chip? What about the native games
like Nascar 2, SODA, and Psygnosis F1 which also show significant
improvement? What about the fact that VQuake 2 runs twice as fast on
the 2100/2200 as on the 1000? What about the fact that the $120
Stealth II outperforms the $220 I3D Voodoo in Direct3D?
> jlw...@mindspring.com wrote:
> >On Thu, 30 Oct 1997 05:29:46 GMT, Ed wrote:
> >>jlw...@mindspring.com wrote:
> >What about the fact that your "D3D Outperformance" is based on the FPS
> >of *one* game. VQuake didn't speed up...*everyone* said it would.
> >Eat those words.
> Diamond claims 187 3D Winmarks for the Stealth II on a P2/266 (Diamond
> website). Intergraph claims 176 3D Winmarks for the Intense 3D Voodoo
> on a P2/266 (Intergraph print ad). JPA reports 129 3D Winmarks for a
> reference V2200 card on a P200MMX. JPA reports 121 3D Winmarks for
> an Intense 3D Voodoo on a P200MMX
The Intergraph Winmarks look very low for a PII266. My OR3d scores 215+ on
my PII266. That's using the reference 3dfx drivers, no overclocking.
My real e-mail address is purpled (at) primenet (dot) com
> In article <3458F033...@spam.this.primenet.com>,
> pur...@spam.this.primenet.com wrote:
> > Ed wrote:
> > > jlw...@mindspring.com wrote:
> > >
> > > >On Thu, 30 Oct 1997 05:29:46 GMT, Ed wrote:
> > >
> > > >>jlw...@mindspring.com wrote:
> > >
> > > >What about the fact that your "D3D Outperformance" is based on the FPS
> > > >of *one* game. VQuake didn't speed up...*everyone* said it would.
> > > >Eat those words.
> > >
> > > Diamond claims 187 3D Winmarks for the Stealth II on a P2/266 (Diamond
> > > website). Intergraph claims 176 3D Winmarks for the Intense 3D Voodoo
> > > on a P2/266 (Intergraph print ad). JPA reports 129 3D Winmarks for a
> > > > reference V2200 card on a P200MMX. JPA reports 121 3D Winmarks for
> > > an Intense 3D Voodoo on a P200MMX
> > > (http://www.jpa.com/benchmarks/index.html).
> > > The Intergraph Winmarks look very low for a PII266. My OR3d scores 215+ on
> > > my PII266. That's using the reference 3dfx drivers, no overclocking.
> You're comparing a Rush board to a standalone 3dfx. The Rush boards are slower.
Oops, you're right. Too many 3dfx variants to keep track of nowadays :)
Björn Tidal wrote:
> I think it's a great board for that price and am a bit confused why
> people are slamming the whole V2x00 chipset family because of the
> initial problems with the first V2x00 card.
Most of the people who actively participate in this group already own a 3d
accelerator, so I don't think you are seeing slams, rather indifference.
Think about it. If you are a new buyer the V2x00 cards sound like a great
buy. If you are a hard core gamer you bought the best a year ago and still
have no legitimate reason to upgrade and hence chipsets like the V2x00 don't
> It does run D3D-games very fast. It does run native games very fast
> (yes, there are quite a few native games and more to come). It
> hopefully soon will run GLxxx games (can't say anything about speed).
> The 2D isn't lousy, just not better than the V1000. And all this for a
> price of 110$. I think it's quite a good deal.
What I like about all these second gen chips is the 3d support they'll
eventually inspire because they are finally reaching the baseline
>On Thu, 30 Oct 1997 05:29:46 GMT, Ed wrote:
>What about the fact that your "D3D Outperformance" is based on the FPS
>of *one* game. VQuake didn't speed up...*everyone* said it would.
>Eat those words.
Diamond claims 187 3D Winmarks for the Stealth II on a P2/266 (Diamond
website). Intergraph claims 176 3D Winmarks for the Intense 3D Voodoo
on a P2/266 (Intergraph print ad). JPA reports 129 3D Winmarks for a
reference V2200 card on a P200MMX. JPA reports 121 3D Winmarks for
an Intense 3D Voodoo on a P200MMX
Not everyone said that VQuake would speed up. Rendition sure didn't.
Besides, when Quake 2 is released in a month or so, who's going to care
about VQuake performance?
I wonder why you are so extremely hostile? I don't know if the Stealth
II outperforms the I3D Voodoo as I haven't seen any scores from it.
But the D3D-performance for the Stealth II is very good and right now
seems at least as good as either my Monster 3D OR my Miro Hiscore 3D
(but I need a card in my computer to make a direct comparison)
Currently people are reporting fps in games like JK, Motoracer etc
etc. and in those games that have no fps they are reporting how the
game feels (very subjective I know). And until now the reports have
only been positive. Reports on games like Nascar 2, Descent II,
Formula F1 etc. have also been very positive. The only 2 games
currently NOT being accelerated are VQuake and VHexen II. Stefan Podell
from Rendition posted an explanation today on my webboard. A very
compact summary is that the VQuake engine was done in a way to use the
CPU for many tasks the V1000 was too slow to handle. This means that a
faster renderer such as the V2100 or the V2200 won't do much good after a
certain point. Yes, we all believed that VQuake and VHexen II would go
faster and Rendition must have been blind not to see it. However this
was just taken for granted without anyone knowing how the VQuake
engine works. Rendition IS working on a new VHexen II patch that will
let the V2100 drive the game faster. A new VQuake patch seems however
unlikely as they want to concentrate on VQuake II. But if enough
people nag them about this they might reconsider.
Currently the only 2 negative things with the Stealth II seem to be
1/ 2D-performance. Definitely not up to Riva's standard and strangely
enough doesn't even seem to be much better than the V1000. This can
however be a driver issue (not the first time a shipping driver would
be slow). I've never felt the 2D-performance of my Total 3D to be a
bottleneck in my system (P225 MMX, 1024x768, 16 bit) in any of the
non-3D games I play or the apps I use but I know people have different
opinions on this.
2/ No accelerated OpenGL-drivers in NT. Yes, you heard right.
Rendition HAS reference OpenGL NT drivers that they have shipped to
Diamond but according to a Rendition guy at my board who phoned
Diamond and spoke with them, Diamond admitted they hadn't shipped
these drivers in the first batch of cards. Apparently Rendition is
working with Diamond to get them to release them on the web. This is a
temporary problem but an irritating problem nonetheless.
I think it's a great board for that price and am a bit confused why
people are slamming the whole V2x00 chipset family because of the
initial problems with the first V2x00 card.
It does run D3D-games very fast. It does run native games very fast
(yes, there are quite a few native games and more to come). It
hopefully soon will run GLxxx games (can't say anything about speed).
The 2D isn't lousy, just not better than the V1000. And all this for a
price of 110$. I think it's quite a good deal.
Webmaster of Björn's 3D-World
- A resource for Verite owners
Do you want more games for the Verite??? Then join
The Rendition Initiative at my site
>Currently people are reporting fps in games like JK, Motoracer etc
>etc. and in those games that have no fps they are reporting how the
>game feels (very subjective I know).
To get the framerate in MotoRacer, add the following text at the
command line (like this):
Then, while playing the game, press CTRL-F1, and a counter will
You can use the counter during the replays too.
In order to respond by email, expand the "pdx"
to "portland" in my address.
He's comparing a 2D/3D board to another 2D/3D board. Are there any 2D/3D
boards based on Voodoo Graphics? No.
>I wonder why you are so extremely hostile? I don't know if the Stealth
>II outperforms the I3D Voodoo as I haven't seen any scores from it.
>But the D3D-performance for the Stealth II is very good and right now
>seems at least as good as either my Monster 3D OR my Miro Hiscore 3D
Actually, I apologize for the harsh words. In fact, I decided to try
out the Stealth II and it's not nearly as bad as people are saying on
>Forget about the Verite...The I3D Voodoo will not be returned to EB
Posted by Stefan Podell on October 30, 1997 at 04:34:56:
Okay, with my reputation tarnished by VQuake and VHexen2
not going any faster on the V2x00, I'm here to explain
why this is the case.
First, the VQuake/VHexen2 engine can be roughly broken
down into four parts (actually, most games are like
this):
* Software setup of geometry commands
* Software creation of texture maps
* Transfer of textures and commands to the renderer
* Rendering (drawing the pixels)
The first two parts are totally CPU dependent, the
third is bus dependent, and the fourth, in my case,
is Verite dependent.
Now, some history on the design of VQuake.
When we first started working with Id on accelerated
Quake, Id's design was much like their design for Quake
2: that there would be a "driver" part of the code,
somewhat separated from the "main" game. (In Quake's
case, though, since it is a DOS game, this would be
accomplished with different executables rather than the
more elegant DLL model employed by Quake 2.)
They had two 3D engine paths through the game. One was
a traditional triangle/polygon based engine, which is
what they predicted all 3D accelerators would use, and
one was a fairly elaborate "span sorting" scheme which
the software renderer used.
So we set out accelerating the game using the polygon
interface. It looked great with filtering, but it
wasn't terribly fast. Even after doing polygon sorting
on the world polygons and turning off the Z buffer for
those, the performance was not great. (You must
remember that the V1000 wasn't really designed to do
Z-buffering as its primary rendering style.) So Walt
Donovan, then with Rendition, and Michael Abrash, then
with Id, talked about using the software engine's span
For those who are interested, there have
been a few articles published by Michael on the guts
of the engine. I think Dr Dobb's Journal had the best,
most detailed one. To the best of my understanding,
the engine sorts all world surfaces (floors, walls,
ceilings, and sky) on each scanline of the screen,
and keeps track of the edges of each span. When it
goes to draw a particular surface (polygon), it is
then guaranteed that every pixel it is drawing is the
only world pixel that will be drawn at those
coordinates. (This is hard to explain...) Suffice to
say that when the world surfaces are drawn, exactly
the minimum possible number of pixels is drawn (i.e.,
a depth complexity of 1). Meanwhile, the Z buffer
has been filled with the correct Z values, but Z
comparison is not necessary, since we know the depth
complexity is one. Next, when we start drawing objects
(monsters, weapons, etc), we turn on the Z comparison
so that the objects are properly hidden by the world.
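To make the "depth complexity of 1" idea concrete, here is a minimal sketch for a single scanline. It is not the engine's actual algorithm (the description above says the engine tracks sorted span edges; this toy version just tests every pixel), and the function name and tuple layout are made up for illustration. The point is only the output: non-overlapping spans, so each world pixel gets drawn exactly once.

```python
def visible_spans(spans, width):
    """Resolve overlapping world spans on one scanline into visible ones.

    spans: list of (x_start, x_end, depth, surface_id), x_end exclusive,
           smaller depth = closer to the viewer (layout is an assumption).
    Returns non-overlapping (x_start, x_end, surface_id) tuples.
    """
    nearest = [None] * width  # per pixel: (depth, surface_id) of closest span
    for x0, x1, depth, sid in spans:
        for x in range(max(x0, 0), min(x1, width)):
            if nearest[x] is None or depth < nearest[x][0]:
                nearest[x] = (depth, sid)

    # Run-length encode the per-pixel winners back into spans.
    out = []
    x = 0
    while x < width:
        if nearest[x] is None:
            x += 1
            continue
        start, sid = x, nearest[x][1]
        while x < width and nearest[x] is not None and nearest[x][1] == sid:
            x += 1
        out.append((start, x, sid))
    return out


# A wall covering the whole line with a closer pillar in front of part of it.
spans = [(0, 10, 5.0, "wall"), (4, 8, 2.0, "pillar")]
print(visible_spans(spans, 10))
```

The input covers 14 pixels (depth complexity 1.4), while the output covers exactly 10, which is the property that lets the Z compare be switched off for the world.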
The reasoning behind this was that the Pentium was
pretty slow at drawing pixels, but fast at floating
point operations. By doing more of what the Pentium
was good at, Quake was able to do less of what the
Pentium was bad at. Overall, a performance win, with
the side benefit of allowing much more interesting
Walt and Michael decided that, since the Verite 1000
wasn't terribly good at Z-buffered pixels, if we let
the Pentium take care of this span sorting, we
could reduce the number of pixels the Verite would
draw. Furthermore, we'd be able to turn off the Z
compare function on the Verite. So now, rather
than having a depth complexity of around 1.5
(about 450000 pixels at 640x480) with pixels that draw
at the peak rate of about 10MP/s on the V1000, we
had a depth complexity of 1 (300000 pixels) that
draw at a peak rate of about 17MP/s.
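The numbers above can be turned into a rough per-frame fill time. This is only back-of-the-envelope arithmetic using the quoted peak rates and depth complexities; real frames would not hit peak rate, and the function name is just for illustration.

```python
def fill_time_ms(width, height, depth_complexity, megapixels_per_sec):
    """Time to draw the world pixels of one frame at a given peak fill rate."""
    pixels = width * height * depth_complexity
    return pixels / (megapixels_per_sec * 1e6) * 1000.0


# Polygon path: Z-buffered, depth complexity ~1.5, ~10 MP/s peak on the V1000.
polygon_ms = fill_time_ms(640, 480, 1.5, 10)
# Span path: Z compare off, depth complexity 1, ~17 MP/s peak.
span_ms = fill_time_ms(640, 480, 1.0, 17)

print(f"polygon path: {polygon_ms:.1f} ms, span path: {span_ms:.1f} ms")
```

At 640x480 that works out to roughly 46 ms versus 18 ms of peak-rate world drawing per frame, before any objects are drawn.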
As with the software renderer, we could then turn on
the Z compare and draw all the more interesting
objects. Yes, these pixels would be as slow as always,
but there are generally far fewer of them.
So we set out writing new microcode to support Quake's
span data format. When we finally got this working,
sure enough, the performance was way better than the
original polygon-style engine.
Back to my four-part description of Quake's engine, we
traded more CPU work in stage one for less Verite work
in stage four, which ended up getting us a big win.
Note that when you increase the vertical resolution in
the game, the engine must sort more scanlines. And if
you increase the resolution in either direction, the
renderer must draw more pixels.
Also note that when Quake was written, P133's were
pretty top-notch, and the software frame rates were
low enough that the span sorting was adequate to
keep up. With the Verite rendering much faster than
the Pentium could, suddenly the span sorting was the
bottleneck. It wasn't until we got to P200's that
the Verite was busy most of the time.
So whether you're on a V1000 or V2x00, the CPU has
a lot of work to do.
Okay, that's part one.
Part two is texture maps. (This will be much shorter.)
The way the software renderer in Quake works is to take
a small texture tile (like a couple of bricks) and
duplicate it into a larger texture map for the world
surface while it is applying the light map (dynamic or
static). It then caches that texture map and draws
with it. When it needs to draw that surface again, it
checks to see if the lights have changed (like when
you fire a weapon). If they have, it must regenerate
the texture map and recache it.
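As a rough illustration of that cache-and-regenerate behaviour, here is a small sketch. All names are hypothetical, the light map is simplified to one value (0..255) per texel, and a plain version counter stands in for "the lights have changed"; the real code is hand-tuned assembly and surely differs in the details.

```python
def build_surface(tile, lightmap):
    """Repeat the small tile over the surface and scale each texel by its light."""
    th, tw = len(tile), len(tile[0])
    return [
        [tile[y % th][x % tw] * lightmap[y][x] // 255
         for x in range(len(lightmap[0]))]
        for y in range(len(lightmap))
    ]


class SurfaceCache:
    """Keeps lit surfaces around until their lighting changes."""

    def __init__(self):
        self.entries = {}  # surface_id -> (light_version, lit surface)
        self.rebuilds = 0  # how often the CPU had to regenerate a surface

    def get(self, surface_id, tile, lightmap, light_version):
        entry = self.entries.get(surface_id)
        if entry is None or entry[0] != light_version:
            # Cache miss or lights changed: regenerate and recache.
            self.rebuilds += 1
            entry = (light_version, build_surface(tile, lightmap))
            self.entries[surface_id] = entry
        return entry[1]


cache = SurfaceCache()
tile = [[100, 200]]
dim_right_half = [[255, 255, 128, 128]]
cache.get("wall_1", tile, dim_right_half, light_version=0)
cache.get("wall_1", tile, dim_right_half, light_version=0)  # served from cache
cache.get("wall_1", tile, dim_right_half, light_version=1)  # e.g. weapon fired
print(cache.rebuilds)
```

Every rebuild is CPU time, which is why lots of dynamic lighting makes the texture stage expensive regardless of the renderer.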
The VQuake engine does the same thing, with the extra
step of having to download the texture map to video memory.
Quake also mipmaps these surfaces. The mipmap level
is chosen based on the size of the polygon (in pixels)
relative to the size of the texture map.
In VQuake, the texture cache is kept in video memory
along with the display buffers and Z buffer. The
quick equation for how much memory the display and Z
buffers take is Width * Height * 6 (3 buffers, each
16-bits deep). The rest is for texture maps (minus
about 128K for microcode).
So when you increase the resolution, two things
happen that increase the demands on the CPU for
texture map generation. First, you have less texture
memory, so textures will fall out of the cache more
often, requiring regeneration. Second, higher
resolution mipmaps will be chosen, further straining
the texture cache.
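The quick equation above is easy to play with. The sketch below assumes a board with 4MB of video memory purely for illustration (the post does not say how much memory any particular board has), and the function name is made up.

```python
MICROCODE_BYTES = 128 * 1024  # "about 128K for microcode"


def texture_memory_bytes(total_bytes, width, height):
    """Video memory left for the texture cache at a given resolution."""
    # Three buffers (two display buffers and Z), each 16 bits (2 bytes) deep.
    buffer_bytes = width * height * 6
    return total_bytes - buffer_bytes - MICROCODE_BYTES


ASSUMED_BOARD_BYTES = 4 * 1024 * 1024  # assumption, not from the post

for width, height in [(320, 200), (512, 384), (640, 480)]:
    left = texture_memory_bytes(ASSUMED_BOARD_BYTES, width, height)
    print(f"{width}x{height}: {left / 1024:.0f}K left for textures")
```

Under that 4MB assumption, going from 320x200 to 640x480 costs well over a megabyte of texture cache, at the same moment that larger mipmaps are being requested.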
The assembly code for generating the textures is
darn near as good as it can be. I certainly can't
think of any instructions to remove, and Michael
Abrash, who wrote it, is a genius at this stuff.
We considered doing two pass lighting on the Verite,
but after some experiments decided the CPU could do
So again, no matter the Verite chip, the CPU will be
very busy. (The texture mapping, by the way, is the
primary reason that timedemo works so much better as
a real benchmark than timerefresh. In the demo
sequences, there's lots of combat going on, which
pushes the system much harder.)
Alright. That's part two.
Part three is the bus. As you know, the currently
available Verites use the PCI bus, and are able to
use DMA asynchronously, which does not use the
CPU. The bus activity will steal some cycles from
the CPU, but not an appreciable amount.
Finally, the renderer. The V2x00 chips are *much*
faster at drawing than the V1000. The fastest the
V1000 could go was 25MP/s. The V2100 goes 40MP/s
and the V2200 goes 50MP/s. Adding features (Z,
alpha, fog, etc.) would slow the V1000 down a lot,
while having minimal impact on the V2x00 chips.
So I understand why people were expecting VQuake
and VHexen2 to go much faster on the V2x00. But
the fact of the matter is that a faster renderer
doesn't necessarily buy you much given the
architecture of this engine. And because of that,
we're working on a V2x00-specific version of the
engine, to take advantage of the extra pixel
power, while lightening the load on the CPU.
I must admit that I was beginning to wonder if
my beliefs about the engine's behavior were
really true. So I just did a weird hack on
VHexen2 to test something:
I put in a check at the time drawing commands
and texture maps are sent to the Verite to see
if the game was in "timedemo mode". If it was,
I just threw away the commands and continued.
Then, when timedemo was over, drawing would
kick back in and I would see the results. The
purpose of this was to simulate an *infinitely
fast* renderer and bus (something we'd all
like to have :-) (This is also known as a
"speed of light" test.)
Here are the results of running timedemo demo1
on my current VHexen2 build (beta 3 candidate).
I ran all the tests at 320x200, 512x384, and
640x480, with antialiasing set to 0 and to 7
at all resolutions.
The first three tests are with rendering turned
on, in other words, the numbers everyone has
been responding to so far. The last set of
numbers is my "infinitely fast" renderer.
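For anyone wondering what the "infinitely fast" hack amounts to, here is a minimal sketch of the idea. The class, function, and flag names are all invented for illustration and have nothing to do with the actual VHexen2 source: the game does all of its CPU work as usual, and the final hand-off to the renderer is simply skipped while the timedemo is running.

```python
class Renderer:
    """Stand-in for the board: just counts what it was asked to draw."""

    def __init__(self):
        self.submitted = 0

    def submit(self, command):
        self.submitted += 1


def send_to_renderer(renderer, commands, timedemo_active, null_render):
    """Hand finished drawing commands to the renderer, or drop them.

    With null_render set, commands built during a timedemo are thrown away,
    so the measured frame rate reflects CPU work only.
    """
    if timedemo_active and null_render:
        return 0  # pretend the renderer and bus took no time at all
    for command in commands:
        renderer.submit(command)
    return len(commands)


board = Renderer()
frame = ["span", "span", "texture"]
send_to_renderer(board, frame, timedemo_active=True, null_render=True)
send_to_renderer(board, frame, timedemo_active=False, null_render=True)
print(board.submitted)
```

Only the second call reaches the board, which matches the behaviour described: nothing is drawn during the timedemo and drawing kicks back in afterwards.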
(fps are antialias = 0/antialias = 7)
The first test was on my P166MMX (64MB RAM)
with my V1000 reference board, which runs at
the same speed as an Intergraph 3D 100.
(These seem slower to me than what I
remember, but I can't find where I wrote
down my old numbers, so I just re-ran them.)
320x200 (51.0 / 43.1)
512x384 (26.7 / 24.2)
640x480 (16.9 / 15.5)
Next, the same PC with a V2200. The first
thing you'll notice is that it's a little
slower at low resolutions. I think this is
because the span microcode has a little extra
overhead on the 2200. It also doesn't
currently interleave buffers. I'm looking
into fixing those things.
320x200 (48.1 / 41.9)
512x384 (32.9 / 28.5)
640x480 (22.0 / 20.1)
Next, a V2200 in a P2-300 (this computer has
a pretty sucky hard drive in it and 32MB RAM,
so there was more swapping going on than
on my 166MMX)
320x200 (71.0 / 63.4)
512x384 (43.3 / 36.1)
640x480 (27.9 / 23.5)
And finally, my P166MMX with my "infinitely
fast" renderer/bus test
320x200 (56.0 / 48.2)
512x384 (44.1 / 37.8)
640x480 (32.4 / 29.7)
So you can see that at low resolutions, a
faster CPU gets you much more by way of
performance than a faster renderer. And as
resolution increases, infinitely fast
rendering is a good thing :-)
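To put numbers on that conclusion, the antialias = 0 figures from the four runs above can be compared directly. The dictionary labels are just shorthand for the four test setups; the fps values are copied from the results.

```python
# timedemo demo1 fps, antialias = 0, from the four test runs above
fps = {
    "P166MMX + V1000": {"320x200": 51.0, "512x384": 26.7, "640x480": 16.9},
    "P166MMX + V2200": {"320x200": 48.1, "512x384": 32.9, "640x480": 22.0},
    "P2-300 + V2200": {"320x200": 71.0, "512x384": 43.3, "640x480": 27.9},
    "P166MMX + null renderer": {"320x200": 56.0, "512x384": 44.1, "640x480": 32.4},
}


def speedup(config, baseline, resolution):
    """How many times faster `config` is than `baseline` at a resolution."""
    return fps[config][resolution] / fps[baseline][resolution]


BASE = "P166MMX + V2200"
for res in ("320x200", "512x384", "640x480"):
    cpu = speedup("P2-300 + V2200", BASE, res)
    gpu = speedup("P166MMX + null renderer", BASE, res)
    print(f"{res}: faster CPU x{cpu:.2f}, infinitely fast renderer x{gpu:.2f}")
```

At 320x200 the faster CPU is worth about 1.48x against about 1.16x for a perfect renderer; by 640x480 the order flips, with about 1.27x for the CPU and about 1.47x for the renderer.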
Again, all this adds up to us working on
a V2x00 version of this engine. We'll tell
you more as we know more about its progress.
This is not to say that there's no room for
improvement in my current version of the
game. So if I figure out any cool way to make
this go faster on a V2x00, I'll definitely
put it in. There is one thing Quake 2
does that I want to try in VHexen 2. I'll
let you know.
Stefan (maybe I should use a .plan file) Podell