With all respect to Mr. Carmack, I don't think everyone should hold his
word as the truth just because he creates great games (I don't know if
he has actually said that D3D sucks). Like other game programmers, he is
probably used to the straightforward way of handling computer graphics
that OpenGL provides, but in the long term he may change his opinion. I
think everyone should give D3D developers time to prove that D3D isn't
that bad, and with the introduction of Talisman only D3D (TM) will do
(?). Don't underestimate the power of Microsoft.
/ Per, Sweden
Why not some comparison D3D, OGL, RenderWare, BRender, QD3D?????
********* Happy new year !!!!! *********************************
This discussion is about which API we will all *have* to use in the long
run. I for one think it's worth trying to influence that choice *now*
while we still have a chance.
It's also about whether Microsoft gains control of the PC games market,
whether we use an open or a closed standard. Put another way, it's about
whether you will use a Microsoft PC running Microsoft software in the
future, not because it's best, but because it's the only choice.
The sheer quantity of petty bickering and shit throwing may be annoying,
but this is a very important issue for all of us, and needs discussing.
---
Paul Shirley: shuffle chocolat before foobar for my real email address
Furthermore, some features that are supported according to the documentation
work only in special cases (or fail in special cases). (I read this
in this group, so I am only repeating what someone else said.)
I would call D3D a beta release. A final release would not have this many
bugs and problems, and all the D3D developers are beta testers (including
Microsoft's own developers programming decent clones).
So basically, when developers blame D3D, most of them are not talking about
speed. They are talking about the development time for a D3D program. (If a
certain program works with one graphics card, why doesn't it work with
another? The customer will call whoever wrote the program first, and that
person has to find out that the problem is not his program but a bad D3D
driver for the graphics card. Sometimes it ends with telling the customer
that he must use an older or newer version x.y of the driver to
run the program.)
Debugging is also very difficult. Say you are hunting a bug that corrupts
your memory. If you cannot find the bug in your code (or you resort to a
dirty hack like allocating static memory as a buffer so that the bug only
corrupts memory in that area, or allocating more memory than needed to
prevent overwriting the next memory segment...), you can never be sure
that D3D isn't the one doing the corrupting.
If it is a big project you are working on, you will get nightmares about this.
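Just to illustrate what I mean by these dirty hacks, here is a rough C sketch
(guard bytes around a buffer; invented names, not from any real project):

/* Sketch of the "guard area" hack: allocate extra padding around the buffer
   and check it later to see whether someone is trampling memory. */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define GUARD_SIZE 4096
#define GUARD_BYTE 0xA5

static unsigned char *alloc_guarded(size_t size)
{
    unsigned char *block = malloc(size + 2 * GUARD_SIZE);
    if (!block) return NULL;
    memset(block, GUARD_BYTE, GUARD_SIZE);                      /* front guard */
    memset(block + GUARD_SIZE + size, GUARD_BYTE, GUARD_SIZE);  /* back guard  */
    return block + GUARD_SIZE;   /* caller only uses the middle part */
}

static void check_guards(unsigned char *buf, size_t size)
{
    unsigned char *front = buf - GUARD_SIZE;
    unsigned char *back  = buf + size;
    size_t i;
    for (i = 0; i < GUARD_SIZE; i++) {
        if (front[i] != GUARD_BYTE || back[i] != GUARD_BYTE) {
            printf("guard bytes overwritten -- something is corrupting memory\n");
            return;
        }
    }
}

It tells you *that* memory gets corrupted, but it still can't tell you whether
your code or the D3D driver did it.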
If D3D were tested with test suites, you could look at those suites to
see whether you are using D3D correctly or are relying on a feature of D3D
that is not covered by the tests. That would make development much easier.
There is no intention to offend any D3D developer. The intention is only
to show the current problems D3D poses for serious developers. All the
problems I described have appeared in many previous postings, so this is
only a summary of sorts.
(That doesn't include me, because I don't use D3D in commercial projects.)
Aurel Balmosan
per.burstrom (per.bu...@swipnet.se) wrote:
: Isn't this OGL vs D3D discussion a bit like a discussion about C vs C++:
: programs written in C are often faster and result in less code than the
: same program written in C++. But when your code grows more complex you
: may realize that in the long term the C++ path will result in not less
: code but less work. I think this also can be true with D3D. I don't have
: a lot of experience in D3D (a bit more in OGL), and I can't see why so
: many think D3D is so bad. In speed it may not be very good (the now dead 3DR
: is faster, and free), but this should improve over time.
: With all respect to Mr. Carmack, I don't think everyone should hold his
: word as the truth just because he creates great games (I don't know if
: he has actually said that D3D sucks). Like other game programmers, he is
: probably used to the straightforward way of handling computer graphics
: that OpenGL provides, but in the long term he may change his opinion. I
: think everyone should give D3D developers time to prove that D3D isn't
: that bad, and with the introduction of Talisman only D3D (TM) will do
: (?). Don't underestimate the power of Microsoft.
:
: / Per, Sweden
: Why not some comparison D3D, OGL, RenderWare, BRender, QD3D?????
: ********* Happy new year !!!!! *********************************
--
-----------------------------------------------------------------------------
Aurel Balmosan au...@xylo.owl.de
Xylo: The way things are going.
I love my job and I don't want to mess with **SHIT**. I don't like using
anybody else's libraries.. and if I have to, then I want to use something
that is made by humans for humans.
Everyone is glad that Carmack spoke, because his words might carry some
weight in the market.
D3D IS SHIT; if you like it then go for it.. but don't tell me I shouldn't
complain !! I want to enjoy my job !!! I have always refused to work on
D3D and I think everybody should do the same.. just REFUSE TO WORK
ON D3D.
These are all my personal opinions, INDEPENDENT of what Carmack thinks and does.
ciaox
P.S. Sorry for the tone; I'm not against you, I'm just very concerned
about the whole question.
//=====================================================
//= Davide Pasca - http://www.netcom.com/~dpasca =
//= email: dpa...@ix.netcom.com || dpa...@val.net =
//=====================================================
Okay, here's the finger from JC's account. There seem to be a lot of people
here who are arguing about something they haven't even read.
---
airdmhor:simon> finger @idsoftware.com
[idsoftware.com]
Login Name TTY Idle When Office
american American McGee p1 23: Mon 10:05
johnc John Carmack p2 1d Sun 01:19
mikeab Michael Abrash p3 4:02 Tue 10:41
twillits Tim Willits p4 Sun 23:17
airdmhor:simon> finger jo...@idsoftware.com
[idsoftware.com]
Login name: johnc In real life: John Carmack
Directory: /raid/nardo/johnc Shell: /bin/csh
On since Dec 15 01:19:05 1 day 19 hours Idle Time
on ttyp2 from idnewt
Plan:
I am going to use this installment of my .plan file to get up on a soapbox
about an important issue to me: 3D API. I get asked for my opinions about
this often enough that it is time I just made a public statement. So here
it is, my current position as of December '96...
While the rest of Id works on Quake 2, most of my effort is now focused on
developing the next generation of game technology. This new generation of
technology will be used by Id and other companies all the way through the
year 2000, so there are some very important long term decisions to be made.
There are two viable contenders for low level 3D programming on win32:
Direct-3D Immediate Mode, the new, designed for games API, and OpenGL, the
workstation graphics API originally developed by SGI. They are both
supported by Microsoft, but D3D has been evangelized as the one true
solution for games.
I have been using OpenGL for about six months now, and I have been very
impressed by the design of the API, and especially its ease of use. A month
ago, I ported Quake to OpenGL. It was an extremely pleasant experience. It
didn't take long, the code was clean and simple, and it gave me a great
testbed to rapidly try out new research ideas.
I started porting glquake to Direct-3D IM with the intent of learning the
API and doing a fair comparison.
Well, I have learned enough about it. I'm not going to finish the port. I
have better things to do with my time.
I am hoping that the vendors shipping second generation cards in the coming
year can be convinced to support OpenGL. If this doesn't happen early on
and there are capable cards that glquake does not run on, then I apologize,
but I am taking a little stand in my little corner of the world with the
hope of having some small influence on things that are going to affect us
for many years to come.
Direct-3D IM is a horribly broken API. It inflicts great pain and suffering
on the programmers using it, without returning any significant advantages.
I don't think there is ANY market segment that D3D is appropriate for; OpenGL
seems to work just fine for everything from Quake to Softimage. There is no
good technical reason for the existence of D3D.
I'm sure D3D will suck less with each forthcoming version, but this is an
opportunity to just bypass dragging the entire development community through
the messy evolution of an ill-birthed API.
Best case: Microsoft integrates OpenGL with direct-x (probably calling it
Direct-GL or something), ports D3D retained mode on top of GL, and tells
everyone to forget they ever heard of D3D immediate mode. Programmers have
one good api, vendors have one driver to write, and the world is a better
place.
To elaborate a bit:
"OpenGL" is either OpenGL 1.1 or OpenGL 1.0 with the common extensions. Raw
OpenGL 1.0 has several holes in functionality.
"D3D" is Direct-3D Immediate Mode. D3D retained mode is a seperate issue.
Retained mode has very valid reasons for existance. It is a good thing to
have an api that lets you just load in model files and fly around without
sweating the polygon details. Retained mode is going to be used by at least
ten times as many programmers as immediate mode. On the other hand, the
world class applications that really step to new levels are going to be done
in an immediate mode graphics API. D3D-RM doesn't even really have to be
tied to D3D-IM. It could be implemented to emit OpenGL code instead.
I don't particularly care about the software only implementations of either
D3D or OpenGL. I haven't done serious research here, but I think D3D has a
real edge, because it was originally designed for software rendering and
much optimization effort has been focused there. COSMO GL is attempting to
compete there, but I feel the effort is misguided. Software rasterizers
will still exist to support the lowest common denominator, but soon all game
development will be targeted at hardware rasterization, so that's where
effort should be focused.
The primary importance of a 3D API to game developers is as an interface to
the wide variety of 3D hardware that is emerging. If there was one
compatible line of hardware that did what we wanted and covered 90+ percent
of the target market, I wouldn't even want a 3D API for production use, I
would be writing straight to the metal, just like I always have with pure
software schemes. I would still want a 3D API for research and tool
development, but it wouldn't matter if it wasn't a mainstream solution.
Because I am expecting the 3D accelerator market to be fairly fragmented for
the foreseeable future, I need an API to write to, with individual drivers
for each brand of hardware. OpenGL has been maturing in the workstation
market for many years now, always with a hardware focus. We have existing
proof that it scales just great from a $300 Permedia card all the way to a
$250,000 loaded InfiniteReality system.
All of the game oriented PC 3D hardware basically came into existence in the
last year. Because of the frantic nature of the PC world, we may be getting
stuck with a first guess API and driver model which isn't all that good.
The things that matter with an API are: functionality, performance, driver
coverage, and ease of use.
Both APIs cover the important functionality. There shouldn't be any real
argument about that. GL supports some additional esoteric features that I
am unlikely to use (or are unlikely to be supported by hardware -- same
effect). D3D actually has a couple nice features that I would like to see
moved to GL (specular blend at each vertex, color key transparency, and no
clipping hints), which brings up the extensions issue. GL can be extended
by the driver, but because D3D imposes a layer between the driver and the
API, Microsoft is the only one that can extend D3D.
My conclusion about performance is that there is not going to be any
significant performance difference (< 10%) between properly written OpenGL
and D3D drivers for several years at least. There are some arguments that
gl will scale better to very high end hardware because it doesn't need to
build any intermediate structures, but you could use tiny sub cache sized
execute buffers in d3d and achieve reasonably similar results (or build
complex hardware just to suit D3D -- ack!). There are also arguments from
the other side that the vertex pools in d3d will save work on geometry bound
applications, but you can do the same thing with vertex arrays in GL.
Currently, there are more drivers available for D3D than OpenGL on the
consumer level boards. I hope we can change this. A serious problem is
that there are no D3D conformance tests, and the documentation is very poor,
so the existing drivers aren't exactly uniform in their functionality.
OpenGL has an established set of conformance tests, so there is no argument
about exactly how things are supposed to work. OpenGL offers two levels of
drivers that can be written: mini client drivers and installable client
drivers. A MCD is a simple, robust exporting of hardware rasterization
capabilities. An ICD is basically a full replacement for the API that lets
hardware accelerate or extend any piece of GL without any overhead.
The overriding reason why GL is so much better than D3D has to do with ease
of use. GL is easy to use and fun to experiment with. D3D is not (ahem).
You can make sample GL programs with a single page of code. I think D3D has
managed to make the worst possible interface choice at every opportunity.
COM. Expandable structs passed to functions. Execute buffers. Some of
these choices were made so that the API would be able to gracefully expand
in the future, but who cares about having an API that can grow if you have
forced it to be painful to use now and forever after? Many things that are
a single line of GL code require half a page of D3D code to allocate a
structure, set a size, fill something in, call a COM routine, then extract
the result.
Ease of use is damn important. If you can program something in half the
time, you can ship earlier or explore more approaches. A clean, readable
coding interface also makes it easier to find / prevent bugs.
GL's interface is procedural: You perform operations by calling gl
functions to pass vertex data and specify primitives.
glBegin (GL_TRIANGLES);
glVertex3f (0, 0, 0);
glVertex3f (1, 1, 0);
glVertex3f (2, 0, 0);
glEnd ();
D3D's interface is by execute buffers: You build a structure containing
vertex data and commands, and pass the entire thing with a single call. On
the surface, this appears to be an efficiency improvement for D3D, because it
gets rid of a lot of procedure call overhead. In reality, it is a gigantic
pain-in-the-ass.
(pseudocode, and incomplete)
v = &buffer.vertexes[0];
v->x = 0; v->y = 0; v->z = 0;
v++;
v->x = 1; v->y = 1; v->z = 0;
v++;
v->x = 2; v->y = 0; v->z = 0;
c = &buffer.commands;
c->operation = DRAW_TRIANGLE;
c->vertexes[0] = 0;
c->vertexes[1] = 1;
c->vertexes[2] = 2;
IssueExecuteBuffer (buffer);
If I included the complete code to actually lock, build, and issue an
execute buffer here, you would think I was choosing some pathologically
slanted case to make D3D look bad.
You wouldn't actually make an execute buffer with a single triangle in it,
or your performance would be dreadful. The idea is to build up a large
batch of commands so that you pass lots of work to D3D with a single
procedure call.
A problem with that is that the optimal definition of "large" and "lots"
varies depending on what hardware you are using, but instead of leaving that
up to the driver, the application programmer has to know what is best for
every hardware situation.
You can cover some of the messy work with macros, but that brings its own
set of problems. The only way I can see to make D3D generally usable is to
create your own procedural interface that buffers commands up into one or
more execute buffers and flushes when needed. But why bother, when there is
this other nifty procedural API already there...
With OpenGL, you can get something working with simple, straightforward
code, then if it is warranted, you can convert to display lists or vertex
arrays for max performance (although the difference usually isn't that
large). This is the right way of doing things -- like converting your
crucial functions to assembly language after doing all your development in
C.
With D3D, you have to do everything the painful way from the beginning.
Like writing a complete program in assembly language, taking many times
longer, missing chances for algorithmic improvements, etc. And then finding
out it doesn't even go faster.
I am going to be programming with a 3D API every day for many years to come.
I want something that helps me, rather than gets in my way.
John Carmack
Id Software
--- end transcript
HTH, Simon
--
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
Simon Roberts : "All the taxes paid over a lifetime by the average
Linux - the choice of : American are spent by the government in less than a
a GNU generation : second" -- Jim Fiebig
There is no basis to assume that D3D will be the only 3d library for
Talisman hardware. We are looking at implementing an OpenGL extension for
it, for instance. Also, OpenGL and D3D have exactly the same technical
problems with regards to Talisman... technically speaking, there's nothing
"special" about D3D that makes a Talisman implementation any easier than
OpenGL.
Now, I just hope that politically speaking, we can all arrive at the same
solutions with regards to Talisman.
Cheers,
--
Brandon J. Van Every | Free3d: old code never dies! :-)
| Starter code for GNU Copyleft projects.
DEC Graphics & Multimedia |
Windows NT Alpha OpenGL | vane...@blarg.net www.blarg.net/~vanevery
In contrast to most in this newsgroup, HE has PROVEN that he can write
great games while others only dream of creating games. I am quite curious
how many on this newsgroup have actually SOLD a game.
: he actually has said that D3D sucks). Like other game programmers, he is
: probably used to the straightforward way of handling computer graphics
: that OpenGL provides, but in the long term he may change his opinion. I
: think everyone should give D3D-developers time to prove that D3D isn't
: that bad,
We now have three versions of DirectX in half a year, and I don't see
substantial improvements (except in file size :-)).
: and with the introduction of Talisman only D3D (TM) will do
If I got the Talisman concept right, it will never be as good as the
PlayStation.
: (?). Don't underestimate the power of Microsoft.
The power of Microsoft has nothing to do with innovation; they are basically
reinventing the wheel all the time, and making it incompatible with existing
wheels so that everybody has to buy Microsoft wheels (see the Internet and D3D).
: Why not some comparison D3D, OGL, RenderWare, BRender, QD3D?????
Because nobody forces you to use RenderWare or BRender. If you don't like
them, you don't buy them. But every time Microsoft makes a new standard,
everybody thinks there is no other way than to support it (and because
everybody thinks this, it might even become the truth). So if Microsoft
puts another burden on our already hard work of creating games, then we
at least have the right to cry out "shit".
dierk ohlerich
chaos/feb
> Isn't this OGL vs D3D discussion a bit like a discussion about C vs C++:
> programs written in C are often faster and result in less code than the
> same program written in C++. But when your code grows more complex you
> may realize that in the long term the C++ path will result in not less
> code but less work. [stuff about d3d deleted]
^^^^^^^^^
While I really hate it when people needlessly correct small typos on
the net, I think you meant to say "code that will work less often."
--
Blair MacIntyre (b...@cs.columbia.edu,bl...@acm.org), Gradual Student, CUCS
smail: Dept. of Computer Science, 1214 Amsterdam Ave, Mail Code 0401
Columbia University, New York, NY 10027-7003
Actually, no. Let me explain why.
Your choice of C versus C++ has little effect on the underlying
processor's instruction set choice or that instruction set's
implementation. Indeed, both C and C++ can be supported by basically
all general purpose CPUs. An architecture well suited to C would be
similarly well suited to C++, and vice versa. A debate over two
fairly similar programming languages is about what is convenient and
productive for programmers and does not appreciably influence
hardware design.
This is not true of OpenGL vs. Direct3D.
OpenGL and Direct3D are hardware abstraction layers (HALs). Because
these layers expose the capabilities of 3D hardware, they have a
great deal of influence over the performance and capabilities of
3D hardware. The long term choice of standard 3D HAL has a tremendous
influence on the direction of hardware architectures for
interactive 3D graphics.
While OpenGL has demonstrated that it can be implemented at varying
price/performance levels through a broad variety of architectures,
Direct3D will cap the performance of Direct3D implementations because
of the memory locality issues inherent in Direct3D's requirement to
transfer 3D commands to main memory before their eventual transfer
to 3D hardware.
With respect to specific hardware capabilities, OpenGL's model is
readily adaptable to new graphics capabilities like volume rendering,
image processing, and video. Indeed, these capabilities exist today.
On the other hand, Direct3D has no such capabilities and its limited
model makes it difficult to evolve these capabilities.
Additionally, OpenGL's programming model is easier to use and it lets
programmers be more productive developing 3D applications. I would
hesitate to make this point if it were not such a widely accepted
view that Direct3D's API is greatly inferior to OpenGL.
- Mark
Ooo, Return of the Jedi flashback:
Vader to Skywalker: "Don't underestimate the POWER of the Dark Side!"
--
###################################
# Brett Grimes #
# Sr. Software Engineer ASEC #
##gri...@asec.dayton.oh.us#########
Oh yes it is (as they say in Christmas Pantomimes). Both APIs ultimately
produce bursts of low-level protocol to drive the graphics hardware in
exactly the same manner as a C compiler produces instruction bursts for
a particular processor.
>
> While OpenGL has demonstrated that it can be implemented at varying
> price/performance levels through a broad variety of architectures,
> Direct3D will cap the performance of Direct3D implementations because
> of the memory locality issues inherent in Direct3D's requirement to
> transfer 3D commands to main memory before their eventual transfer
> to 3D hardware.
You keep banging on about the execute buffer issue you think you've
discovered, yet it's largely a non-issue. Firstly, the PC architecture
is quite different from your SGI boxes, so execute buffers make a lot
more sense. In particular, the graphics h/w sits on the other side of
the PCI bus, so buffering up graphics commands makes a lot of sense,
particularly in light of bus mastering boards and AGP - your SGI boxes
buffer commands in some manner as well. Second, the fact that you
generate protocol by procedure call (e.g. glVertex()) in OGL rather than
building an IM buffer in D3D is an implementation detail - they're
equivalent. Ask yourself this. Could I slap a procedure-call interface
on the back end of an execute buffer architecture? Yes. Could I slap an
execute buffer interface on the back end of a procedure-call
architecture? Yes. Lastly, double-buffered execute buffers largely
resolve contention issues between writers (the app) and readers (the
h/w).
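To make the first case concrete, here is a rough C sketch (names invented;
FlushBuffer() stands in for whatever actually submits the batch to the API or
hardware):

/* Sketch only: a glVertex-style call that appends into a command buffer and
   hands the whole thing over in one go when it fills up. */
#define MAX_VERTS 1024

typedef struct { float x, y, z; } Vertex;

static Vertex vert_buf[MAX_VERTS];
static int    vert_count = 0;

extern void FlushBuffer(const Vertex *verts, int count);  /* submit the batch */

void BufferedVertex(float x, float y, float z)
{
    vert_buf[vert_count].x = x;
    vert_buf[vert_count].y = y;
    vert_buf[vert_count].z = z;
    if (++vert_count == MAX_VERTS) {
        FlushBuffer(vert_buf, vert_count);   /* one big submission */
        vert_count = 0;
    }
}

The app sees procedure calls; the hardware still sees large batches.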
>
> With respect to specific hardware capabilities, OpenGL's model is
> readily adaptable to new graphics capabilities like volume rendering,
> image processing, and video.
Jack of all trades... Having everything but the kitchen sink in an API
is not a "good think"(TM). Small, modular design is what you need for
games. Oh but that's right, OpenGL has the "fastest demonstrated
performance" doesn't it - on a machine not hugely different in size from
my kitchen.
> Additionally, OpenGL's programming model is easier to use and it lets
> programmers be more productive developing 3D applications. I would
> hesitate to make this point if it were not such a widely accepted
> view that Direct3D's API is greatly inferior to OpenGL.
>
But thanks for repeating this for our benefit. I'd clean forgotten from
yesterday. :-)
ads
--
-------------
Adam Billyard,
Criterion Software Limited,
Westbury Court, Buryfields, Guildford, Surrey, GU25AZ, UK.
Tel: +44 (0) 1483 406203 Fax: +44 (0) 1483 406211
Adam,
Salient points are made much better sans sarcasm. Your posts come
across as PC bigotry. Please remember that your choice of hardware is
NOT everyone else's. Frankly, I wish ALL of you would take this
discussion SOLELY into the rec.games.programmer group and flame each
other there.
Regards,
Brett
No. Something critical is happening here.
If you want to consider it a battle between OpenGL and D3D, consider
this:
OpenGL has years of experience and evolution behind it, and is well
established in the 'high end'. It is now well poised to move into
the 'low end', except for this silly thing called D3D.
If OpenGL makes it into the low end, something wonderful will
happen. The years and years of high end experience will finally
make it into volume manufacturing. I would assert that the biggest
difference between today's $1000 OpenGL hardware, and today's $300
3D game hardware is simply anticipated market volume.
In other words, if OpenGL makes it into games, the low end benefits
from years of high end experience, and the high end benefits from
low end volume. The bottom of the high end will be a game card,
and as silicon advances, game cards will creep toward the high end.
On the other hand, choosing D3D for games throws away those years of
experience. It twists low end hardware architectures toward a model
that is 'performance capped', at least until an incompatible version
comes out. It also confines OpenGL to the high end at high prices.
Finally, it muddies the path of growth for gaming cards as silicon
does become cheaper.
We could be on the verge of 'The Golden Age of 3D'.
We could also blow it.
Dale Pontius
(NOT speaking for IBM)
Judging by what I've heard from processor designers, I think the
growing use of C++ rather than C *is* having some significant effects
on modern instruction sets and their implementations. (And reportedly
C itself was a major influence on several of the instruction sets in
use today.) So I'm not sure I buy Mark's original analogy.
On the other hand, I think Adam might be challenging a bit too much of
Mark's point. Today there's a lot more inertia in CPU instruction sets
than there is in 3D graphics hardware feature sets. One consequence is
that 3D graphics hardware designs are fairly responsive to ``pushes''
from the 3D graphics APIs. That's relevant to the OpenGL vs. D3D
debate.
| You keep banging on about the execute buffer issue you think you've
| discovered, yet it's largely a non-issue. ...
Based on past experience with execute-buffer-style APIs and GL/OpenGL,
I don't think this is true. Both types of APIs present implementation
problems, of course, but to me it seems a lot harder to work around the
problems in an execute-buffer-style API. Building the buffer adds
unavoidable memory accesses and latency, and exposing the format of the
buffer adds long-term compatibility constraints to the hardware design
process. Those are generic problems with buffering, and there are
other issues with the D3D execute buffer in particular. All this is
discussed in more detail in this document:
http://www.sgi.com/Technology/openGL/direct-3d.html
so I'll avoid repeating it here.
| ... Firstly, the PC architecture
| is quite different from your SGI boxes so execute buffers make a lot
| more sense. In particular, the graphics h/w sits on the other side of
| the PCI bus so buffering up graphics commands makes a lot of sense,
I'm not convinced there's very much difference nowadays. (I understand
there are software issues, but those all seem fixable, and a lot of
progress has been made in that respect.) At SGI we're always fighting
I/O bus bottlenecks analogous to the PCI bottleneck. Memory access
patterns and cache behavior affect our machines in much the same way
they affect modern PCs. And so on. Did you have something particular
in mind?
| ...Second, the fact that you
| generate protocol by procedure call (eg glVertex()) in OGL rather than
| building an IM buffer in D3D is an implementation detail - they're
| equivalent. Ask yourself this. Could I slap a procedure-call interface
| on the back end of an execute buffer architecture? Yes. Could I slap an
| execute buffer interface on the back end of a procedure-call
| architecture? Yes. ...
This is an interesting question. I believe you're correct that the two
interface styles can be made semantically equivalent. I don't think
they're equivalent in performance, though. First, there are the
memory-reference and latency issues mentioned in the document above.
Second, there's the problem of handling varying amounts of data for
each vertex. Consider layering OpenGL (where some vertices may have a
color or a normal, while others don't) on D3D (where all vertices in an
execute buffer must have the same format). How expensive is it to keep
track of what's been changed from vertex to vertex, or to speculatively
copy all the vertex data from one vertex structure to the next,
assuming most of it will be reused? I'd be inclined to handle such
stuff at a lower (machine-dependent) level in the implementation, but I
don't know of any efficient way to do it with D3D execute buffers.
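To make the bookkeeping concrete, here's a rough sketch of the speculative-copy
approach (hypothetical structures, not actual OpenGL or D3D internals):

/* Sketch: layering a GL-style interface (per-vertex attributes optional) on a
   fixed-format buffer means carrying "current" attribute state and copying all
   of it into every emitted vertex, whether it changed or not. */
typedef struct {
    float x, y, z;
    float r, g, b;      /* always present in the fixed format */
    float nx, ny, nz;   /* always present in the fixed format */
} FixedVertex;

static FixedVertex current;   /* GL-style current color/normal state */

void SetColor(float r, float g, float b)
{
    current.r = r; current.g = g; current.b = b;
}

void EmitVertex(FixedVertex *buf, int *count, float x, float y, float z)
{
    current.x = x; current.y = y; current.z = z;
    buf[(*count)++] = current;   /* speculative copy of every attribute */
}

Every vertex pays for all the attributes in the fixed format, which is exactly
the sort of overhead an implementation would rather handle at a lower level.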
This might be a good topic for further conversation, since the general
principles would be useful to applications developers as well as
graphics API implementors. It touches on the relative cost of
procedure calls vs. memory references, and minimizing host-to-graphics
bandwidth requirements, among other things.
| ...Oh but that's right, OpenGL has the "fastest demonstrated
| performance" doesn't it - on a machine not hugely different in size from
| my kitchen.
You can buy an InfiniteReality in a deskside enclosure that's about the
size of two AT-style tower cases side-by-side. I suspect there's an
E&S Harmony configuration that's comparable in size. You must have a
small kitchen. :-)
Allen
: Jack of all trades... Having everything but the kitchen sink in an API
: is not a "good think"(TM). Small, modular design is what you need for
: games. Oh but that's right, OpenGL has the "fastest demonstrated
: performance" doesn't it - on a machine not hugely different in size from
: my kitchen.
Hey, 80 million poly/sec can't be wrong. ;-)
I honestly love the way some people "pull the wool over their own eyes."
Didn't anyone in the pro-D3D camp ever stop to think, "SGI has been in the
high-end 3D market since 1982. Maybe, just Maybe, they have a little more
experience with 3D APIs"? Maybe, just Maybe, OpenGL has just a little
better design, and yes of course, Maybe, just Maybe, your game company
will go bankrupt because your competitors used OpenGL and you used D3D.
When your graphics look half as good as your competitor's OpenGL-based game,
you'll just say, "shit!" Ask yourself this question: what's better,
120,000 untextured flat-shaded polys/sec or 30,000 mipmapped Phong-shaded
polys/sec?
Ken
Actually, I think the execute buffer issue is the _only_ interesting issue
in this whole thread. I don't know enough about 3D HW to know, but I
think there are inherent advantages and disadvantages to both execute
buffers and the function-call based design like OpenGL, and it's a shame
that the two never fought it out fair and square on a technical playing
field. Initially D3D had the momentum due solely to marketing, and now
it looks like OpenGL is gaining momentum, but again, due mostly to talk,
not actual performance data. It's a shame Carmack didn't finish the
port so we could compare both on a 3Dfx or something.
Whether you can emulate one on top of the other isn't really relevant
when inherent performance is the issue.
Now if the rumors are true that Microsoft is going to punt the execute
buffers and go to a triangle based API there really won't be any
technical difference left between OGL and D3D, and these discussions
can finally degenerate completely into marketing spam...
Chris
been there, it really sucks rocks.
Sam
--
408-749-8798 / pa...@webnexus.com
I speak for xyne KS since I AM xyne KS.
Resisitance is not futile! http://www.be.com/
I'm not aware of any instruction set changes motivated chiefly by
C++, though I'd be interested in hearing about them if
anyone knows of such cases. I think you may be confusing changes
in the _implementation_ of CPU instruction sets with actual changes
in the instruction set itself.
I don't doubt that many C++ programs behave differently than
traditional C programs, but I think the effects are more due to
cache and memory subsystem stresses than their instruction set
usage. Note that cache and memory subsystem design
changes are typically implementation changes, not architecture
changes.
Your instruction set should be considered an architecture. I
certainly agree that there are many ways to implement a given
CPU instruction set architecture that will result in greatly
varying price/performance. The architecture stays the same;
the implementation changes.
Consider OpenGL to be an architecture. OpenGL's API is analogous
to a CPU instruction set to the extent a given class of microprocessor
(defined by its common instruction set) is an architecture. Just
like microprocessors, you could implement OpenGL's architecture
at greatly varying price-performance levels. OpenGL actually allows
a great deal of leeway in implementation schemes.
You can think of Direct3D as an architecture too, with its API
being analogous to a CPU instruction set. Unfortunately, because
Direct3D exposes poorly conceived things like execute buffers as
part of its API, Direct3D implementations have much less leeway
for variations and are generally held back because of this.
Architectures change slowly and if you standardize on a poor
architecture, you tend to get stuck with the bad decision for years.
The fact that Direct3D is such a bad architecture, and the risk of
being "stuck with it", is why I speak out against it so loudly.
- Mark
Good points, but we also have to remember that SGI has been primarily
selling full-blown full 3d pipeline hardware acceleration since 1982. Sure
some of their machines had software portions for their pipelines, but
OpenGL is an API primarily born of full-blown 3D hardware acceleration.
As a matter of historical accident (and marketroid priorities) we have not
yet seen equivalent performance from "software geometry pipe + hardware
scan converter" implementations on PCs. So in fairness to D3D advocates,
it does require something of a "leap of faith" to believe that OpenGL
performance will really become that good for PCs. It simply hasn't been
demonstrated yet.
However: D3D hasn't been demonstrated to be any faster than OpenGL at this
point. SGI's demo at SIGGRAPH proved that.
I don't think any of the software-side problems of OpenGL are
insurmountable, they just take a lot of engineering work. It's really only
a question of how long you think that work will take. And 3d vendors have
to be interested in performing the engineering work. It's easy for DEC to
be interested in the work, since our markets are historically
OpenGL-driven.
That's not quite true. SGI did a comparo at SIGGRAPH between their Cosmo
OpenGL and D3D. They concluded that performance was nearly identical.
Having studied the interfaces of D3D, it's plainly apparent that D3D is no
"super API"... it has no magic ingredients which make it inherently faster
than OpenGL or anything else. It's very straightforward 3d graphics, and
more than anything it looks like an OpenGL with all the interface names
changed to protect the innocent. :-)
> It's a shame Carmack didn't finish the
> port so we could compare both on a 3Dfx or something.
>
> Now if the rumors are true that Microsoft is going to punt the execute
> buffers and go to a triangle based API there really won't be any
> technical difference left between OGL and D3D, and these discussions
> can finally degenerate completely into marketing spam...
But you are perhaps missing the point as to why Carmack didn't finish the
port. Frankly, because the D3D API sucks. That's not a question of
marketing spam, it's a question of programmer productivity. Brian Hook
already sold me on the "D3D sucks" point a few months back. Carmack
ditching D3D is a pretty darn strong confirmation of his thinking. :-)
Also, would you rather trust your standard to a neutral arbitration body
(OpenGL ARB), or just to Microsoft? Generally, you opt for the proprietary
solution when you think they're doing a better job than everyone else. My
own humble opinion about D3D, is that clearly they are not. The kindest
thing you could say about D3D, is that they're doing the same job with the
names and faces changed.
Sure, we're all disappointed that the "OpenGL vs. D3D" debate isn't being
resolved on raw speed. But there are other legitimate technical issues
besides raw speed.
I think either I wasn't clear in my post or you missed my point. I
don't care about the API or who makes it, I am interested in the
inherent performance advantages and disadvantages on hardware of
execute buffers versus a more immediate function based thing like
OpenGL. I don't care about software implementations, and I don't care
about talk--I wanted some actual data, and there isn't any.
Chris
> > While OpenGL has demonstrated that it can be implemented at varying
> > price/performance levels through a broad variety of architectures,
> > Direct3D will cap the performance of Direct3D implementations because
> > of the memory locality issues inherent in Direct3D's requirement to
> > transfer 3D commands to main memory before their eventual transfer
> > to 3D hardware.
>
> You keep banging on about the execute buffer issue you think you've
> discovered yet its largely a none issue. Firstly, the PC architecture
> is quite different from your SGI boxes so execute buffers make a lot
> more sense. In particular, the graphics h/w sits on the other side of
Are Criterion still talking about optionally layering Renderware on top of
D3DIM?
- - --=-==< gco...@ea.com, Head of R&D Bullfrog Productions >==-=-- - -
Possibly dumb question, hopefully one of the hardware people will
answer it:
OK, in terms of functionality, OpenGL is a superset of D3D, but
mostly, they overlap. Also, we have the MCD architecture instead of
ICD (which is more closely tied to OpenGL).
Why couldn't a generic 3d device driver layer be designed for use by
_many_ 3d rendering APIs instead of having one per API?
For now it doesn't look like any specific API will take over
completely, and that means multiple drivers, something PC video card
companies have shown they are only able to do ineptly even at the
operating system level.
I guess the obvious answer is that OpenGL is as low level as you want
to go, and Inventor or RM or others (if they didn't try and provide
services that are low-level, which in such a design they shouldn't)
could be layered on that.
Anyone?
Also, does anyone know if the SGI Cosmo OpenGL DLL will work with MCDs
written for Microsoft's OpenGL 1.1? What happens when a user with
Cosmo installed goes out and buys a card that ships with an MCD for
95? How is Cosmo implemented -- does it replace the MS DLL or do we
have to link to it separately (in which case, how do we configure at
runtime for hardware vs. no hardware)?
rsr b a l a n c e
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
please discontinue use of r...@msn.com
Chris Hecker (che...@netcom.com) writes:
>
> Now if the rumors are true that Microsoft is going to punt the execute
> buffers and go to a triangle based API there really won't be any
> technical difference left between OGL and D3D, and these discussions
> can finally degenerate completely into marketing spam...
How long will it be before we find out whether these rumours are true?
--
|-------------------| The Law of the Computer Industry:
| Anis Ahmad | - The best products will never have commercial
|-------------------| success. The commercially successful products
will be mediocre and difficult to improve.
Phew. Go easy there on the lithium.
When you say "full-blown full 3d pipeline" I interpret you to mean
doing both vertex processing and rasterization in specialized hardware.
SGI's low-end machines (for about 6 years now, since the original
R3000 Indigo Entry) perform their vertex processing on the host CPU.
While I'm not privy to how current Direct3D accelerators work, I'm
guessing that they are pretty likely to be interpolated scan
converters just like the R3000 Indigo Entry. Some probably do
full triangle setup. Some also do texturing with hardware support
(the Indigo Entry lacked anything special for hardware texture mapping).
|> Sure some of their machines had software portions for their pipelines, but
|> OpenGL is an API primarily borne from full-blown 3d hardware acceleration.
The initial OpenGL implementation was done for the R3000 Indigo
Entry. Current PC 3D hardware accelerators are about where SGI was when SGI
first implemented OpenGL. Admittedly, that was several SGI generations ago.
|> As a matter of historical accident (and marketroid priorities) we have not
|> yet seen equivalent performance from "software geometry pipe + hardware
|> scan converter" implementations on PCs. So in fairness to D3D advocates,
|> it does require something of a "leap of faith" to believe that OpenGL
|> performance will really become that good for PC's. It simply hasn't been
|> demonstrated yet.
I would guess that Intergraph might beg to differ with respect to their
Windows NT workstations. Maybe you just meant Windows NT.
- Mark
> Architectures change slowly and if you standardize on a poor
> architecture, you tend to get stuck with the bad decision for years.
> The fact that Direct3D is such a bad architecture and the risk of
> being "stuck with it" is why I speak out against it so loudly
> against it.
>
Fair enough, but you might equally argue that such a piece of crap
encourages quantum leaps in architecture rather than sticking with
something which is inoffensive. Besides, I thought it was understood by
everyone that Microsoft *always* gets it wrong the first time, but it does,
admittedly after lots of iteration, finally get it 'right'. I guess the
same will happen this time. The only way OpenGL will 'win' is for SGI
to bury D3D on this first iteration.
ads
PS I just remembered a sad computer joke: What's the difference between a
declarative language and procedural language? In a declarative language
you say "you're an arsehole", in a procedural language you say "piss
off".
Many graphics chips support display lists in offscreen memory. In a
PC, because the PCI bus is so slow, it may be advantageous to write
chip commands to the display list, and let the graphics chip handle
them asynchronously. So, for example, if I have an API that draws a
triangle, the underlying implementation of that API might be just
to pack all data in a buffer and DMA it to the chip's display list.
In the case of Direct 3D, it's even easier: the command buffer could
be allocated in the chip's offscreen memory, and if the chip is
designed to execute commands directly from Microsoft's command
structs, implementation is really simple: the user, as he/she writes
commands to the command buffer, is actually directing the chip at
the lowest level, by writing to the chip's display list. A real good
chip might even support multiple display lists.
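Roughly like this (a sketch with made-up command words and registers; the real
layout is chip-specific):

/* The driver packs triangle commands into a queue and asks the chip to DMA
   the batch into its display list, where it is executed asynchronously. */
typedef struct { float x, y, z, r, g, b; } DLVertex;

#define CMD_DRAW_TRIANGLE 0x0001   /* hypothetical opcode */
#define DL_MAX_CMDS       256

typedef struct {
    unsigned opcode;
    DLVertex v[3];
} DLTriangleCmd;

static DLTriangleCmd dl_queue[DL_MAX_CMDS];
static int           dl_count = 0;

extern void StartDMA(const void *src, unsigned bytes);  /* kick the transfer */

void QueueTriangle(const DLVertex *a, const DLVertex *b, const DLVertex *c)
{
    DLTriangleCmd *cmd = &dl_queue[dl_count++];
    cmd->opcode = CMD_DRAW_TRIANGLE;
    cmd->v[0] = *a; cmd->v[1] = *b; cmd->v[2] = *c;
    if (dl_count == DL_MAX_CMDS) {
        StartDMA(dl_queue, sizeof(dl_queue));  /* chip consumes it while the CPU moves on */
        dl_count = 0;  /* real code would double-buffer rather than reuse immediately */
    }
}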
So, the way I see it, it might be the case that, underneath the OpenGL
API, the lowest level of software is actually a display list manager.
Alberto.
Just to stir the waters, the reason I went with OpenGL was:
* Technical support through newsgroups like this.
* The small/easy setup and implementation footprint.
D3D's development and runtime footprint are bigger/more involved.
More so than my current project needs.
Each has its own strengths and weaknesses. D3D contains more
application level support functionality. It is more object oriented
and will track/maintain object relationships. In OpenGL object
relationships must be tracked/maintained by the application. This is
no big deal for simple projects, but it gets harder to maintain/debug for
larger projects.
Games are a category of application that deal with many constantly
moving and changing objects. So for games D3D seems to make more
sense.
I'm not an expert on any of these APIs, but an API choice should be
well balanced and made on a case-by-case basis.
Happy New Year,
Michael.
>Also, would you rather trust your standard to a neutral arbitration body
>(OpenGL ARB), or just to Microsoft? Generally, you opt for the proprietary
>solution when you think they're doing a better job than everyone else. My
>own humble opinion about D3D, is that clearly they are not. The kindest
>thing you could say about D3D, is that they're doing the same job with the
>names and faces changed.
Another point along the same lines, is would you, the game programming
community, rather have Microsoft compete with you using an open API
(level playing field), or using an API that they control lock, stock,
and barrel? Many of you may remember the rumors of "Windows isn't
done until Lotus won't run...". Microsoft is getting into the game
development business in a big way.
Also, the potential for cross-platform code re-use seems to be getting
lost in the shuffle. Be, Apple (I believe) and several other
platforms support OpenGL. If accepted as a game development API, it
could mean easier ports to more platforms.
--
Terry Sikes | Software Developer
tsi...@netcom.com | C++, Delphi, Java, Win32 (in alphabetical order ;)
finger for PGP pub key | Objective objects objectify objectivity.
My opinions - mine only! | http://members.aol.com/tsikes
Don't be silly.
We have D3D because RealityLab was running acceptably on DX2-66
systems and OpenGL *still* doesn't run well (software rendering) on a P_166_.
MS was shipping OpenGL for two years before D3D, and nobody cared but
Fujitsu, 3Dlabs and a few others with cards in the $1k+ range. It's
only since D3D that cheap cards have flooded the market, and you can't
say that this would have happened otherwise: consider what happened to
the preliminary non-D3D cards: both the Nvidia NV1 and the 3D Blaster
[pre-Permedia] flopped *hard*.
*That's* why we have D3D. It's why we *still* have D3D. It's why
companies are *still* using D3D *despite* the fact that coding D3D IM
apps is like chewing on glass & aluminum foil.
Market issues aside, most of what MS is doing today wrt D3D comes
from inertia. If you guys want things to change, forget plots and
conspiracies. Give us a fast OpenGL that works fullscreen-exclusive.
| Also, the potential for cross-platform code re-use seems to be getting
| lost in the shuffle. Be, Apple (I believe) and several other
| platforms support OpenGL. If accepted as a game development API, it
| could mean easier ports to more platforms.
This, to me, is the main reason why we have D3D. The last thing Microsoft
wants is portability to another platform.
--
Michael I. Gold Silicon Graphics Inc. http://reality.sgi.com/gold
And my mama cried, "Nanook a no no! Don't be a naughty eskimo! Save your
money, don't go to the show!" Well I turned around and I said, "Ho! Ho!"
How is this advantageous? Whether you use DMA or PIO the data still has
to get across the bus at some time, so the performance implications are
there no matter what. You can use DMA to try and hide some of this, but
DMA is NOT free -- DMA transfers can lock out access to L2 cache in
certain situations, for example.
You also have to consider the cache implications of writing to an
intermediate buffer.
> In the case of Direct 3D, it's even easier: the command buffer could
> be allocated in the chip's offscreen memory, and if the chip is
> designed to execute commands directly from the Microsoft's command
> structs, implementation is really simple: the user, as he/she writes
> commands to the command buffer, is actually directing the chip at
> the lowest level, by writing to the chip's display list. A real good
> chip might even support multiple display lists.
How is this any different than writing to an on-chip FIFO via PIO?
Brian
--
+-----------------------------------------------------------------+
+ Brian Hook, b...@wksoftware.com +
+ WK Software, http://www.wksoftware.com +
+ Consultants specializing in 3D graphics hardware and software +
+ --------------------------------------------------------------- +
+ For a list of publications on 3D graphics programming, +
+ including Direct3D, OpenGL, and book reviews: +
+ http://www.wksoftware.com/publications.html +
+-----------------------------------------------------------------+
I'm assuming you're talking about D3D RM?
> In OpenGL object
> relationships must be tracked/maintained by the application. This is
> no big deal for simple projects but, gets harder to maintain/debug for
> larger projects.
I don't see this being the case -- a tree to contain all your objects
doesn't sound like a huge programming task, and I (and many hundreds of
others) have done it before.
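The whole thing is on the order of this (a rough sketch, names invented):

/* Rough sketch of the kind of object tree I mean: each node carries a local
   transform and links to its children; drawing is a trivial recursive walk. */
typedef struct SceneNode {
    float             local[16];      /* local transform matrix            */
    struct SceneNode *child;          /* first child                       */
    struct SceneNode *sibling;        /* next node at the same level       */
    void            (*draw)(struct SceneNode *self);  /* object-specific draw */
} SceneNode;

void DrawTree(SceneNode *node)
{
    for (; node; node = node->sibling) {
        /* a real version would push/concatenate node->local here */
        if (node->draw)
            node->draw(node);
        DrawTree(node->child);
    }
}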
> Games are a category of application that deal with many constantly
> moving and changing objects. So for games D3D seems to make more
> sense.
Doubtful. Games are an area of development where people want to manage
objects themselves. More control for the developer.
Brian
> I'm not an expert on any of these APIs but, an API choice should be
> well balanced and made on a case by case basis.
Definitely.
And this transferred to...what? I think if RL had shipped as part of
Win95 2.5 years ago that things would've worked out great. But
Microsoft tried to "Directify" it and make it into an IM library, and
that was what caused its collapse.
> Fujitsu, 3dLabs and a few others with cards in the $1k+ range. It's
> only since d3d that cheap cards have flooded the market and you can't
> say that this would have happened otherwise: consider what happened to
> the preliminary non d3d cards: both Nvidia and the 3d blaster
> [pre-permedia] flopped *hard*.
For starters, the premier 3D companies (3Dfx and Rendition) were
working on 3D cards LONG before D3D was a twinkle in anyone's eye. They
had hardware designs finished before 3D-DDI was even finished (the
precursor to D3D in some ways). They were concentrating on DOS titles,
not Windows titles, and added support for D3D after it started becoming
an issue.
The NV1 and 3DBlaster failed for a simple reason -- they sucked. D3D or
not, those boards didn't have the performance necessary to be
interesting.
> conspiracies. Give us a fast OpenGL that works fullscreen-exclusive.
Agreed, this needs to happen.
Brian Hook <b...@wksoftware.com> wrote in article <32D1A8...@wksoftware.com>...
> bro...@aecd.gov.ab.ca wrote:
> > Each has their own strengths and weaknesses. D3D contains more
> > application level support functionality. It is more object oriented
> > and will track/maintain object relationships.
>
> I'm assuming you're talking about D3D RM?
>
> > In OpenGL object
> > relationships must be tracked/maintained by the application. This is
> > no big deal for simple projects but, gets harder to maintain/debug for
> > larger projects.
>
> I don't see this being the case -- a tree to contain all your objects
> doesn't sound like a huge programming task, and I (and many hundreds of
> otherS) have done it before.
>
but RM does it pretty well, and scene management is very useful. If it weren't,
SGI wouldn't have Inventor and Performer. Arguing against RM because
"you can do it yourself anyway" sounds a little biased.
>
> > Games are a category of application that deal with many constantly
> > moving and changing objects. So for games D3D seems to make more
> > sense.
>
> Doubtful. Games is an area of development where people want to manage
> objects themselves. More control for the developer.
>
Since ease-of-use has been an issue that has been bandied back and forth
in the OpenGL vs. D3D IM discussion, it doesn't make sense to dismiss
it so quickly here. RM has its advantages over IM in most cases. Unless
you are using dynamic geometry or your own scene manager, it
wouldn't be a bad choice for a game. Admittedly, if you're pushing the
envelope with effects, IM or OpenGL will give you more range, but there's a
good portion of things that can be done in RM.
--
Clinton Keith
Angel Studios
(W) cl...@angel.com
(H) clinto...@worldnet.att.net
Most PC chips do not do geometry processing, but do have full hardware
support for texture mapping, including perspective correction, bilinear
and trilinear filtering, mipmapping, etc. etc. So they are much more
sophisticated in their rasterization than the Indigo Entry.
> |> Sure some of their machines had software portions for their pipelines, but
> |> OpenGL is an API primarily borne from full-blown 3d hardware acceleration.
>
> The first initial OpenGL implementation was implemented for the R3000 Indigo
> Entry. Current PC 3D hardware accelerators are about where SGI was when SGI
> first implemented OpenGL. Admittedly, that was several SGI generations ago.
I think you are underestimating 3D on a PC. Like I said above, there are
quite a few PC chips that do most of the OpenGL rasterization pipeline in
hardware, which is quite a bit more advanced than an RGB span engine with
no Z or alpha support as in the Indigo Entry.
> Alberto C Moreira wrote:
> > Many graphics chips support display lists in offscreen memory. In a
> > PC, because the PCI bus is so slow, it may be advantageous to write
> > chip commands to the display list, and let the graphics chip handle
> > them asynchronously. So, for example, if I have an API that draws a
> > triangle, the underlying implementation of that API might be just
> > to pack all data in a buffer and DMA it to the chip's display list.
> How is this advantageous? Whether you use DMA or PIO the data still
> has to get across the bus at some time, so the performance implications
> are there no matter what. You can use DMA to try and hide some of
> this, but DMA is NOT free -- DMA transfers can lock out access to L2
> cache in certain situations, for example.
Say you want to render a Gouraud-shaded triangle; only vertex data needs
to get across. The rest is negotiated by the chip's setup engine,
asynchronously to the CPU. So, the chip may get a triangle request,
queue it, and render it to the video memory while the graphics app is
busy computing the next object.
DMA isn't free, and nothing is free. But it isn't a good idea to make
a CPU wait till the graphics chip finishes rendering. Highest throughput
is achieved when the chip works in parallel with the CPU, with an
asynchronous communication channel in between.
> You also have to consider the cache implications of writing to an
> intermediate buffer.
If the command buffer is the chip's display list, it resides in video
RAM, which is not cacheable. If not, it means that the app is writing
to a secondary command buffer, and this isn't any different from
doing a memory-to-memory bitblt. The app can queue command sequences
to the secondary buffer and go off and do something more productive;
some other task, at the right time, can DMA that buffer to video memory.
> How is this any different than writing to an on-chip FIFO via PIO?
In the sense that the display list is your FIFO, and you can have as
big a FIFO as your offscreen video memory allows. That gives a
very large buffering capacity between the application and the graphics chip,
increasing the chance that the chip will function at maximum throughput.
It also avoids the need to constantly check whether a 16-deep-or-so
FIFO is full before issuing a PIO command, or to suffer PCI bus
retries because the FIFO is full.
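For comparison, the PIO path looks roughly like this (made-up register names;
every chip is different):

/* Sketch of the PIO path: before each command word the driver has to poll a
   (hypothetical) FIFO_FREE register, stalling the CPU whenever the small
   on-chip FIFO is full.  A big offscreen display list makes this go away. */
extern volatile unsigned *FIFO_FREE;   /* made-up status register */
extern volatile unsigned *FIFO_PORT;   /* made-up command port    */

static void fifo_write(const unsigned *words, int count)
{
    int i;
    for (i = 0; i < count; i++) {
        while (*FIFO_FREE == 0)
            ;                          /* CPU spins until the chip drains */
        *FIFO_PORT = words[i];
    }
}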
We have a display list capability in our Imagine 2 chip, and it's a
great performance increaser.
Alberto.
1) Because we're all speed freaks and want to hardwire everything. Layers
dilute performance.
2) Because the 3d APIs implement non-trivial quantities of functionality.
In practice, you don't have time to deal with a zillion different APIs.
OpenGL *itself* is like trying to optimize for a zillion different APIs.
If I ever reach a point where I feel like I've optimized *everything*
possible in OpenGL, I will have brought the science of software engineering
to a completely new level. (In other words, it would take Perl to lay out
the software in some kind of automated fashion.)
> For now it doesn't look like any specific API will take over
> completely, and that means multiple drivers, something PC video card
> companies have shown they are only able to do ineptly even at the
> operating system level.
I disagree. I'm more and more willing to put my money on OpenGL, because
it's very apparent to me that it's a superior API with superior
ease-of-use. It also has a superior evolutionary mechanism: the OpenGL
Architecture Review Board (the ARB) is not controlled by any one vendor.
So OpenGL can grow in competitive directions... it's not limited to
Microsoft's monopolistic agendas the way D3D is.
Performance between OpenGL and D3D is a non-issue: they both do the same
things, they can both have the same performance, and it's only a question
of who's going to deliver the performance. We're going to be putting it on
Alpha NT boxes that cost $3000, and people will buy them. As the Intel ->
Alpha FX!32 translation software improves, that's actually going to matter
to games folks. First to developers who need to number-crunch their
animations and etc. Then later, to home users. The only thing missing for
OpenGL to "win," is the marketing. And I think the marketing of OpenGL is
gaining momentum. Compaq joined the OpenGL ARB, for instance.
At commodity price points, i.e. $100-$200.
Incidentally, OpenGL 1.1 also has "vertex arrays" as an input method. You
create about 5 separate arrays for colors, normals, edge flags, etc. It is
thus possible to read data into OpenGL without the per-vertex function call
overhead that's typical of immediate mode. It's not quite as good as
having a unique memory packing type for every single kind of vertex, but
then again, it solves the combinatorial problem that the latter implies.
You could probably write pretty reasonable OpenGL drivers under such an
interface. But OpenGL 1.1 is so new, that I don't know that a lot of work
has been done by either 3d vendors or application developers with vertex
arrays. At least the Microsoft implementation is OpenGL 1.1, though.
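For anyone who hasn't seen the 1.1 interface yet, a minimal sketch of
vertex arrays (the triangle data is just placeholder values):

    #include <GL/gl.h>

    /* One triangle, described by plain application arrays. */
    static GLfloat positions[9] = {
         0.0f,  1.0f, 0.0f,
        -1.0f, -1.0f, 0.0f,
         1.0f, -1.0f, 0.0f
    };
    static GLfloat colors[9] = {
        1.0f, 0.0f, 0.0f,
        0.0f, 1.0f, 0.0f,
        0.0f, 0.0f, 1.0f
    };

    void draw_with_arrays(void)
    {
        glEnableClientState(GL_VERTEX_ARRAY);
        glEnableClientState(GL_COLOR_ARRAY);
        glVertexPointer(3, GL_FLOAT, 0, positions);
        glColorPointer(3, GL_FLOAT, 0, colors);
        glDrawArrays(GL_TRIANGLES, 0, 3);  /* one call, no per-vertex overhead */
        glDisableClientState(GL_COLOR_ARRAY);
        glDisableClientState(GL_VERTEX_ARRAY);
    }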
> Many graphics chips support display lists in offscreen memory. In a
> PC, because the PCI bus is so slow, it may be advantageous to write
> chip commands to the display list, and let the graphics chip handle
> them asynchronously. So, for example, if I have an API that draws a
> triangle, the underlying implementation of that API might be just
> to pack all data in a buffer and DMA it to the chip's display list.
>
> In the case of Direct 3D, it's even easier: the command buffer could
> be allocated in the chip's offscreen memory,
Yes but the whole problem is that those D3D execute buffers allow you to
point to any arbitrary chunk of memory. Meaning that the whole "memory
packing problem" is completely non-contiguous. You'd have to allocate a
*lot* of memory on the 3d chip's side of things, and transfer lotsa big
chunks of memory across the bus before you can start getting any results.
Since 3d card-side RAM ain't exactly free or cheap yet, that makes execute
buffers an unpalatable $$$$ hardware design. The alternative is to spend a
lot of time on the CPU side translating the execute buffer into something
that has better memory contiguity - like a triangle strip. Then you don't
need much bus bandwidth to start your rendering, and you don't need much 3d
card-side RAM. But translating the execute buffer begs a question: why the
hell are you using an execute buffer in the first place??!?
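To put rough numbers on the contiguity point: a strip of N triangles needs
only N+2 vertices instead of 3N, and they sit in one contiguous array. In
OpenGL terms (a sketch; strip_verts stands in for the app's own data):

    #include <GL/gl.h>

    /* 8 triangles from 10 vertices rather than 24, all contiguous. */
    void draw_strip(const GLfloat strip_verts[10][3])
    {
        int i;
        glBegin(GL_TRIANGLE_STRIP);
        for (i = 0; i < 10; i++)
            glVertex3fv(strip_verts[i]);
        glEnd();
    }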
The major difference between a $1k card and a $300 card is economy
of scale. If the current $1k cards were to sell in the volumes
expected for $300 cards, they'd be priced a lot closer to $300.
It's kind of a chicken-and-egg thing. Volume vs. price.
The key behind OpenGL is that it turns 3D graphics into a continuous
spectrum, and gets rid of the low-end/high-end dichotomy. That means
that a performance path for future games is clearly defined. It also
likely means that low-cost performance improvements will be made in
the low end and make their way to boost the high end, as well.
Dale Pontius
(NOT speaking for IBM)
How much scale does it take to make an $800 price difference? That's
the difference between a Monster3d and the closest OpenGL competitor
(*today*). How much economy of scale did Diamond and Orchid have with
the first shipment of their 3d add-on cards? $800 per unit?
That doesn't seem like a reasonable explanation for the prices we're
seeing _today_.
The real reason for the totally different pricing is that the OpenGL
card vendors weren't of a mind to price their cards on the low-end.
They wanted to charge big bucks, and they did.
A nasty side effect of this was that the pricing of high-end cards and
the lack of OpenGL on 95 pretty much destroyed the market for low-end
cards prior to Direct3d, and frankly, there weren't any OpenGL cards
coming. [The Matrox cards were never taken seriously as OpenGL cards
(and were useless if the app requested a 32bit Z, as most did)]
>of scale. If the current $1k cards were to sell in the volumes
>expected for $300 cards, they'd be priced a lot closer to $300.
>
>It's kind of a chicken-an-egg thing. Volume vs Price.
I'd be more willing to grant this point if the 3d vendors who are
targeting OpenGL (e.g., those selling GLiNT+Delta&c) were actually
pricing their cards aggressively, but they aren't. The high-end cards
clearly have the performance to sell in the same kind of volume as the
low-end Verite and 3dfx cards, and their pricing is not the result of
hardware costs (we aren't talking 16MB or 32MB cards, here, and if
3dLabs is pricing themselves out, same point applies), so why aren't
they being priced as such?
They have one other clear advantage: they work with the GUI, meaning
they aren't restricted to gaming purposes.
>The key behind OpenGL is that it turns 3D graphics into a continuous
>spectrum, and gets rid of the low-end/high-end dichotomy. That means
>that a performance path for future games is clearly defined. It also
>likely means that low-cost performance improvements will be made in
>the low end and make their way to boost the high end, as well.
Maybe. When has this ever worked in practice? How many games do you
know that really scale well?
In any case, my original point was about why D3d exists _on the PC_
[stress this].
We have D3d because OpenGL was going absolutely nowhere, fast. Nobody
was selling low end cards. Nobody was developing them. Nobody even
laughingly considered it an acceptable API in which to write games.
If D3d hadn't come along, the PC low end 3d hardware market would, at
best, still be a mess with one or two hardware standards taking over
much as the SoundBlaster did. There would be no Cosmo OpenGL because
SGI never saw fit to tune the reference implementation prior to MS
pushing Direct3d.
I agree that in a few years, when OpenGL hardware has settled out
(IMHO, cards which only support fullscreen operation will never make
adequate OpenGL cards) it will be the obvious choice. I also agree
that those starting on a lengthy project should probably look into
OpenGL (even Inventor) because that's where the industry is going.
Rather, I think we need to understand why the market is the way it is
so we can see where it's going instead of rallying to one standard or
another for non-market related reasons.
Right now, things could go either way. Everything depends on Cosmo
OpenGL, how suitable it is for games, how hardware support is going to
work, and so on. If it's not suitable for games, then Direct3d is
what we end up with, like it or not (and since nobody responded with a
yes to my question if anyone actually liked d3d, I think the answer is
"not" for everyone in this newsgroup).
I'm not saying that RM doesn't have its place, I'm simply stating that
RM isn't what world class games are going to be written with.
> Since ease-of-use has been a issue that has been bandied back and forth
> in the OpenGL vs. D3D IM discussion, it doesn't make sense to dismiss
> it so quickly here.
Ease-of-use isn't necessarily the issue, it's "ease-of-use vs.
flexibility". With RM you lose some flexibility in how you manage your
scene in exchange for better ease of use. The OGL vs. D3D debate
concentrates on ease of use vs. each other.
> RM has it's advantages over IM in most cases. Unless
> you are using dynamic geometry or using your own scene manager, it
> wouldn't be a bad choice for a game.
Except that most games do use their own scene managers and have dynamic
geometry. The exception probably being space flight sims (even then,
explosions wouldn't necessarily be easy).
> Admittedly, if you're pushing the
> envelope with effects, IM or OpenGL will give you more range, but theres a
> good portion of things that can be done in RM.
Right -- and I think that world class games are going to be doing just
that -- pushing the envelope.
This is what deep FIFOs are for. The advantage is that you can achieve
MORE parallelism with a deep FIFO than with DMA, because every command
write to the hardware can IMMEDIATELY be followed up by an action on the
hardware's part. If you wait until you've batched up the commands,
there is some "idle time". Not only that, but the batching process
itself involves writing to main memory, which will affect cache
coherency.
> If the command buffer is the chip's display list, it resides in video
> ram which is not cacheable.
If the command buffer is off-host, then you don't need DMA at all!
> In the sense that the display list is your FIFO, and you can have as
> big a FIFO as your offscreen video memory allows us. That gives a
> very large buffering capacity between application and graphics chip,
> increasing the chance that the chip will function at maximum throughput.
> It also prevents the need for constant checking to see if a 16-deep
> or so FIFO is full before issuing a PIO command, or suffering PCI bus
> retries because the FIFO is full.
Such busy waiting is extremely poor practice on the part of chip
designers -- there are ways of getting around this. PCI bus retries are
bad, but that means your FIFO isn't deep enough (or your hardware isn't
flushing it fast enough).
> We have a display list capability in our Imagine 2 chip, and it's a
> great performance increaser.
3Dfx doesn't use DMA, and they have the fastest 3D acceleration on the
market today. Proof is in the pudding.
And many of these cards were using early (expensive) processes -- .8u
and .6u. And many of these early chips were designed to accelerate
large portions of OpenGL (including stencil, overlay, underlay, etc.)
And these cards often shipped with extremely large amounts of DRAM and
VRAM (for large Z-buffers, high resolution, and lots of textures) -- and
this was back when RAM was $50/MB.
> They wanted to charge big bucks, and they did.
I don't think "wanted" is necessarily true. Granted, Glint based boards
had a near monopoly due to lack of competition, but there were other
solutions (Fujitsu Sapphire).
> A nasty side effect of this was that the pricing of high-end cards and
> the lack of OpenGL on 95 pretty much destroyed the market for low-end
> cards prior to Direct3d, and frankly, there weren't any OpenGL cards
> coming. [The Matrox cards were never taken seriously as OpenGL cards
> (and were useless if the app requested a 32bit Z, as most did)]
This is incorrect. There were chips in the design stage long before
Direct3D was announced. These chips (Verite, Voodoo, NV1, Yamaha RPV,
3DBlaster, etc.) were designed to be accessed via proprietary DOS APIs.
Some of these never even saw D3D drivers.
> Maybe. When has this ever worked in practice? How many games do you
> know that really scale well?
Not many games are written for OpenGL (yet), but the one reference
example, GLQuake, scales from a 3DLabs Permedia board (lackluster
performance) to Voodoo Graphics (good performance) to Intergraph to SGI
O2 all the way up to SGI Onyx2/IR (awesome performance). Same code base
for all of them (except for GUI issues). I would dare say that's an
impressive amount of scalability for a single app.
> We have D3d because OpenGL was going absolutely nowhere, fast. Nobody
> was selling low end cards. Nobody was developing them.
Once again, this is factually incorrect. By the time D3D was announced
development of 3D chips was going on in earnest at many major 2D chip
companies (Cirrus Logic Mondello, S3 Virge, ATI Rage), and several
startups specializing in 3D had designs either mostly completed or
actually done (NVidia, Rendition, 3Dfx Interactive, etc.).
> Nobody even
> laughingly considered it an acceptable API in which to write games.
Granted, no one was looking at OpenGL as the API for games, but then
again, most of these companies were planning on supporting DOS games via
proprietary APIs. 3Dfx had Glide, ATI had Rage3D (or something like
that), Rendition had SPD3D, Nvidia had its own development tools, S3 had
its S3 DTK, Creative had its CGL, etc. etc.
3D-DDI support (precursor in some ways to D3D) was an afterthought for
most of these companies, and this extended into D3D eventually.
> If D3d hadn't come along, the PC low end 3d hardware market would, at
> best, still be a mess with one or two hardware standards taking over
> much as the SoundBlaster did.
In all likelihood if D3D hadn't come along OpenGL's driver model would
have been improved and one day it would be the standard, and until then
folks would be using proprietary libraries for A.) the coolest cards or
B.) the card companies who paid the most money. And I'm sure that a 3D
analog to HMI/Miles Audio would have come out and written a
rasterization library that sits on top of most everyone else's
proprietary libraries.
I doubt we would have seen hardware standards -- reverse engineering a
3D part is nearly impossible.
> I agree that in a few years, when OpenGL hardware has settled out
> (IMHO, cards which only support fullscreen operation will never make
> adequate OpenGL cards) it will be the obvious choice.
There is only one full screen only card out there right now, and it's
already "old hat" and will be supplanted by a second generation probably
within a year. And you can bet that second generation will not be full
screen only (there is actually an intermediate, Voodoo Rush, that will
address many of these problems).
> Right now, things could go either way. Everything depends on Cosmo
> OpenGL, how suitable it is for games, how hardware support is going to
> work, and so on. If it's not suitable for games, then Direct3d is
> what we end up with, like it or not (and since nobody responded with a
> yes to my question if anyone actually liked d3d, I think the answer is
> "not" for everyone in this newsgroup).
I think you're overestimating the importance of Cosmo OpenGL -- it's an
interesting project, but I don't think it's in the critical path for
OpenGL development. Microsoft's OpenGL implementation for Win95 is
pretty damned good (I know, since I'm constantly trying to match its
speed), and once they get their MCD for Win95 out things are going to be
a lot clearer.
The sample implementation was designed to be a porting base, for use by
a large number of vendors with a huge variety of machines. A tuned
x86-based implementation of OpenGL for PC graphics cards is a fine
thing, but it doesn't meet the same set of needs that the sample
implementation meets. In the early days, the sample implementation was
more important.
SGI's position is that responsibility for tuning must belong to the
OpenGL licensees, because only the licensees have the necessary
knowledge and expertise for their machines. SGI had no expertise in
Windows and x86 systems, so it made sense to leave that tuning task to
Microsoft. As you noted, the aggressive misinformation campaign for
D3D eventually made it necessary for SGI to move onto Microsoft's
turf. We'll see how things develop. Having a single high-quality
implementation is probably a better answer in the long run.
| Right now, things could go either way. Everything depends on Cosmo
| OpenGL, how suitable it is for games, how hardware support is going to
| work, and so on. If it's not suitable for games, then Direct3d is
| what we end up with, like it or not (and since nobody responded with a
| yes to my question if anyone actually liked d3d, I think the answer is
| "not" for everyone in this newsgroup).
At the moment, I'd say Cosmo OpenGL is an important part of the
solution, but I wouldn't say that *everything* depends on it. Hardware
vendor support for OpenGL is more critical, if OpenGL is to be used by
game developers.
Allen
> Yes but the whole problem is that those D3D execute buffers allow you
> to point to any arbitrary chunk of memory. Meaning that the whole
> "memory packing problem" is completely non-contiguous. You'd have to
> allocate a *lot* of memory on the 3d chip's side of things, and
> transfer lotsa big chunks of memory across the bus before you can start
> getting any results.
Direct Draw always had the concept of "primary" and "secondary" surfaces.
Prior to that, Windows always supported the concept of memory bitmaps.
The concept is nice, because we can set up, say, a big bitmap in memory
composed of a number of smaller ones - which render faster in
memory than out to the graphics chip - and then blt the big bitmap out,
taking advantage of the fact that graphics chips render big spans faster
than small ones.
Having a memory-resident execute buffer is an application prerogative.
For example, it may be a good idea to keep a couple of memory buffers
for different animated objects, having these buffers fed into the
graphics subsystem via on-going DMA transactions. The Microsoft
design reflects the desire to give the user this flexibility.
> Since 3d card-side RAM ain't exactly free or cheap yet, that makes
> execute buffers an unpalatable $$$$ hardware design.
It depends. Today we have 1, 2, 4 and 8MB boards. Say you have an 8MB
board and you're running 1024x768 resolution. There will be a fair
amount of video memory available to host a display list. I can see
boards going up to 16MB or even higher, to host textures and display
lists on board. As an alternative, keeping them in host memory allows
for example an AGP machine to send them down real fast using DMA.
> The alternative is to spend a lot of time on the CPU side translating
> the execute buffer into something that has better memory contiguity -
> like a triangle strip. Then you don't need much bus bandwidth to start
> your rendering, and you don't need much 3d card-side RAM. But
> translating the execute buffer begs a question: why the hell are you
> using an execute buffer in the first place??!?
This sort of translation is better left to the hardware. The way I see
it, the only data that needs to go across the bus is vertex X, Y, Z,
U, V, W, and a few others. The chip has hardware enough to do all that's
needed to render the triangle. RAM isn't that expensive either.
The fact that all of us need to face is this: graphics functionality is
migrating fast from the host into the chip. Soon we'll be in a situation
similar to printers, where we send them PostScript: that is, a source
program. The printer does the rest itself; even translation of source
into executable is done by the printer and away from the host.
So, graphics APIs will increasingly migrate into the graphics subsystem,
and whole API calls will be handled just by packing parameter data and
shipping it out through the bus. The only thing the host will worry about
is to get the input data fast across the bus, and that's where display
lists help.
In fact, even today, I bet it's possible to build up quite a respectable
implementation of OpenGL on top of Direct 3D immediate mode.
Alberto.
Brett,
Agreed. However, I sent no such message ("tat"). No idea who did.
Perhaps you should check before casting around phrases like "bigot".
Besides, if you so completely misunderstand the central core of this
thread as to think its about hardware platforms, you're wasting your
time reading it.
>
> In fact, even today, I bet it's possible to build up quite a respectable
> implementation of OpenGL on top of Direct 3D immediate mode.
You're a cruel man!
ads
Who says it has to be on-chip?
> previously queued get handled. Moreover, an on-chip FIFO fills
> rather quickly, meaning that the software cannot write to the FIFO
> without asking for the FIFO depth.
I'd rather have a very deep memory-backed FIFO that occasionally issues
a PCI retry than having to poll FIFO depth. I know architectures that
do this and get phenomenal performance.
> If you have a command and the FIFO is full, there's basically two
> things you can do: you can wait or you can block.
The point is that you avoid getting full! Make sure you have a REAL
deep FIFO (a la 3Dfx) and/or make sure you can process triangles at a
pretty good clip.
> A display list in offscreen memory doesn't delay anything, and
> isn't cached anyway. An app can push data onto the display list
> and then go on do something more profitable in the meantime.
This sounds a lot like just doing a memory backed FIFO. Why not,
instead of a display list, just write commands to the card's RAM and
give the chip a pointer to it?
> DMA frees the host to do something more profitable, and DMA is usually
> faster. The PCI bus has a pretty decent DMA capability, why not use it
> to one's best advantage ?
Because it (supposedly) can lock out access to L2 cache during large DMA
transfers. DMA is NOT as free as we would like.
> Global 3D performance depends on other things too; but feeding triangle
> data fast enough into the pipeline is one most important thing.
Granted, but the 3Dfx can sink data faster than any other board out
there.
> I don't know the 3Dfx, but remember, we're in the Windows world too.
> It isn't enough to do fast 3D; you also have to do fast BitBlt and
> text rendering. And even 3D rendering will profit from a display list.
Check into the 3Dfx. My benchmarks have shown it to soundly destroy
everyone in terms of throughput and fill rate. No bus mastering.
Wow, sounds a lot like high-end SGI machines. :-)
> The only thing the host will worry about
> is to get the input data fast accross the bus, and that's where display
> lists help.
For static data, display lists are fine...but for dynamic data, I'll
pass.
> In fact, even today, I bet it's possible to build up quite a respectable
> implementation of OpenGL on top of Direct 3D immediate mode.
Try a sample implementation then come back and say this. I've actually
looked into this, and it simply is not feasible. VERY simple things are
difficult to do -- glColor3f vs. glNormal3f, for example.
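For anyone wondering why something as trivial as glColor3f is a problem: in
immediate mode, color and normal are current state that may or may not be
respecified between vertices, so a layer sitting on fixed-format execute
buffers has to latch that state and expand it into every packed vertex, for
each combination of attributes. A minimal illustration:

    /* Perfectly legal OpenGL: the current color "sticks" across vertices
       and the normal is set only once.  A back end built on fixed-format
       execute buffers has to track this state and replicate it into every
       packed vertex, for every possible attribute combination. */
    glBegin(GL_TRIANGLES);
        glNormal3f(0.0f, 0.0f, 1.0f);
        glColor3f(1.0f, 0.0f, 0.0f);   /* applies to the next vertices...  */
        glVertex3f( 0.0f,  1.0f, 0.0f);
        glVertex3f(-1.0f, -1.0f, 0.0f);
        glColor3f(0.0f, 1.0f, 0.0f);   /* ...until changed mid-primitive   */
        glVertex3f( 1.0f, -1.0f, 0.0f);
    glEnd();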
> This is what deep FIFOs are for. The advantage is that you can achieve
> MORE parallelism with a deep FIFO than with DMA, because every command
> write to the hardware can IMMEDIATELY be followed up an action on the
> hardware's part. If you wait until you've batched up the commands,
> there is some "idle time". Not only that, but the batching process
> itself involves writing to main memory, which will affect cache
> coherency.
How deep a FIFO can we bury inside a chip ? If I'm rendering to a frame
buffer that fits in, say, 4MB, and my display card has 8MB of memory,
I have 4MB I can dedicate to a display list. A command written to
a FIFO won't be handled immediately, but only when the command words
previously queued get handled. Moreover, an on-chip FIFO fills
rather quickly, meaning that the software cannot write to the FIFO
without asking for the FIFO depth.
If you have a command and the FIFO is full, there's basically two
things you can do: you can wait or you can block. Neither of these
helps your performance, because your app or your bus will be idle
until you have space in the FIFO. Moreover, you can't know whether the
FIFO has free space until you check it; therefore, if you don't want to
keep retrying, you'll have to issue one additional read to the
bus, and that's slow.
A display list in offscreen memory doesn't delay anything, and
isn't cached anyway. An app can push data onto the display list
and then go on do something more profitable in the meantime.
> If the command buffer is off-host, then you don't need DMA at all!
DMA frees the host to do something more profitable, and DMA is usually
faster. The PCI bus has a pretty decent DMA capability, why not use it
to one's best advantage ?
> Such busy waiting is extremely poor practice on the part of chip
> designers -- there are ways of getting around this. PCI bus retries
> are bad, but that means your FIFO isn't deep enough (or your hardware
> isn't flushing it fast enough).
An on-chip FIFO won't ever be deep enough, and no graphics hardware is
fast enough to always empty the FIFO before a new request comes out.
There are basically three ways to handle a FIFO-full condition: the
software can poll a status bit, or the graphics board can generate an
interrupt, or the bus can keep retrying until there's space. None
of them is palatable; polling eats CPU time, interrupts are slow and
messy and need special software handling, and retries hog the bus.
Having a display list means having megabytes of FIFO space available,
no need to check anything.
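A rough sketch of the difference, with invented register names and offsets
(no real chip implied):

    #define FIFO_FREE_SLOTS_REG  0x0010   /* hypothetical status port   */
    #define FIFO_DATA_REG        0x0014   /* hypothetical command port  */

    void fifo_write(volatile unsigned int *mmio, unsigned int cmd)
    {
        /* Polling: the driver has to *read* a status register across the
           PCI bus before it can safely write, or risk retries.          */
        while (mmio[FIFO_FREE_SLOTS_REG / 4] == 0)
            ;                             /* busy-wait; CPU and bus idle */
        mmio[FIFO_DATA_REG / 4] = cmd;
    }

    /* With a display list in off-screen memory the driver simply appends
       to a multi-megabyte buffer and bumps a write pointer -- no status
       reads in the inner loop. */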
> 3Dfx doesn't use DMA, and they have the fastest 3D acceleration on the
> market today. Proof is in the pudding.
Global 3D performance depends on other things too; but feeding triangle
data fast enough into the pipeline is one of the most important things.
I don't know the 3Dfx, but remember, we're in the Windows world too.
It isn't enough to do fast 3D; you also have to do fast BitBlt and
text rendering. And even 3D rendering will profit from a display list.
Alberto.
One of the features of the i860 is the ability to bypass the cache by flipping
an address bit (one that was not brought out to the external memory interface).
Does x86 support anything like this? At least you can avoid trashing the
cache if so, though you may still incur idle time on the accelerator.
Jon
__@/
: > Alberto C Moreira wrote:
[Alberto's informed commentary regretfully snipped]
: We have a display list capability in our Imagine 2 chip, and it's a
: great performance increaser.
Alberto has a very thorough understanding of the performance issues
in a modern PC. I quite agree with his comments. DMA, while not
entirely free, is certainly a large performance boost vs. PIO.
A PCI bus master capable of burst memory transactions not only offloads
the CPU but is several times faster than PIO. The CPU can continue
relatively unhindered from the L1 and L2 caches with some minor
overhead for cache snooping/updating if the PCI bus master happens
to be writing to cached locations or reading from cache locations
that are dirty and not yet written back to main memory. Moreover,
the external bus master will not hold off the CPU for long even in
the case where it needs access to main memory... the core logic
bus arbiter will interleave cycles between all devices requesting
transactions on the PCI bus.
All the latest generation of PCI peripheral devices that need to
move large amounts of data are incorporating PCI burst bus mastering
functionality. This includes SCSI, network controllers, IDE,
video capture, and plain old video controllers.
David Springer
Laptop BIOS Programmer
Dell Computer Corporation
: The major difference between a $1k card and a $300 card is economny
: of scale. If the current $1k cards were to sell in the volumes
: expected for $300 cards, they'd be priced a lot closer to $300.
Not entirely. High end cards typically use gobs of dual-ported
VRAM while low end cards use standard DRAM. VRAM costs more than
DRAM no matter what economy of scale is considered.
: The key behind OpenGL is that it turns 3D graphics into a continuous
: spectrum, and gets rid of the low-end/high-end dichotomy. That means
: that a performance path for future games is clearly defined. It also
: likely means that low-cost performance improvements will be made in
: the low end and make their way to boost the high end, as well.
If that was the case why aren't we all playing games on workstation
class PC's ?
Economy of scale isn't a universal concept and it doesn't work in
all situations. Also, low-end/high-end dichotomies sometimes exist
for a good reason... and handily explains why I'm not using a CRAY
XMP to write this post.
David Springer
: Yes but the whole problem is that those D3D execute buffers allow you to
: point to any arbitrary chunk of memory. Meaning that the whole "memory
: packing problem" is completely non-contiguous. You'd have to allocate a
Non-contiguous memory isn't really a problem with a modern PCI burst
bus master. Scatter/gather DMA is the key as it allows the DMA controller
to either "gather" a chunk of memory from disparate locations or
"scatter" a chunk out to disparate locations.
The immediate need, and first real use that I know of for scatter/gather,
is in network controllers. A packet being sent out to the network is
composed of various header segments and data that are each typically
supplied by one of the OSI software layers. Instead of each layer
adding its piece of information to the packet and passing it along
to the next layer it simply puts the data in a known local space and
the NIC, using scatter gather DMA, will round up all the pieces right
before the packet is sent. The reverse process occurs on received
packets. This greatly increases the efficiency of the NIC driver
and protocol stacks.
A similar method can be done for graphics information when the
graphics controller has scatter/gather ability.
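For the curious, a generic scatter/gather descriptor chain looks something
like this; the field names are illustrative, not any particular
controller's layout:

    struct sg_descriptor {
        unsigned int phys_addr;   /* physical address of this fragment    */
        unsigned int length;      /* bytes in this fragment               */
        unsigned int next;        /* physical address of next descriptor,
                                     or 0 to terminate the chain          */
    };

    /* Each layer (or, for graphics, each separate chunk of vertex/state
       data) leaves its bytes where they are; the driver builds one
       descriptor per fragment and hands the head of the chain to the bus
       master, which "gathers" the pieces on its way across the bus.      */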
David Springer
Laptop BIOS Programmer
Dell Computer Corporation
>
>I don't think "wanted" is necessarily true. Granted, Glint based boards
>had a near monopoly due to lack of competition, but there were other
>solutions (Fujitsu Sapphire).
>
Minor correction: the Fujitsu Sapphire was a Glint board too (I've
got one; still don't know what their chipset plans are now), but there
were other CAD oriented solutions which didn't seem to work very well
(like Evans & Sutherland PC Freedom). In any case, your argument is
correct: "high" prices for PC CAD boards have historically been due
to the obvious: increased silicon requirements and low market volume.
Eric Hoppe
Lead Engineer, Visualization Team
Xerox PARC
Sorry for the late reply.
>> Each has their own strengths and weaknesses. D3D contains more
>> application level support functionality. It is more object oriented
>> and will track/maintain object relationships.
>I'm assuming you're talking about D3D RM?
Yes, I was referring to RM.
>> In OpenGL object
>> relationships must be tracked/maintained by the application. This is
>> no big deal for simple projects but, gets harder to maintain/debug for
>> larger projects.
>I don't see this being the case -- a tree to contain all your objects
>doesn't sound like a huge programming task, and I (and many hundreds of
>otherS) have done it before.
I'm not saying we should stop trying to "improve the state of the art"
but, do something better with our time. If "many hundreds of others"
have done it before, then there must be a need, in many cases, for
generic storage/hierarchy functions in the standard implementation.
>> Games are a category of application that deal with many constantly
>> moving and changing objects. So for games D3D seems to make more
>> sense.
>Doubtful. Games is an area of development where people want to manage
>objects themselves. More control for the developer.
D3D's RM manages/stores not only the objects but, all the transforms
applied to them. As I understand this, when an object is rotated
or translated the transforms are applied independently to the object.
In other words, transforms are applied to an object, not
cumulatively to the scene or MODELVIEW. In this way, a turret on
a tank may be rotated with a single RM statement without affecting
the rotation of the tank body. This seems equivalent to:
glPushMatrix();
glTranslatef(tx, ty, tz);             /* placeholder turret offset   */
glRotatef(angle, 0.0f, 1.0f, 0.0f);   /* placeholder turret rotation */
glCallList(object);                   /* the turret's display list   */
glPopMatrix();
in the window repaint of an OPENGL application. Because RM manages
most of this for me, a single statement in the window repaint places
the object. Sure, I do have the flexibility to optimize the above
OPENGL by removing unneeded calls to glTranslate, glRotate, etc.. but,
who says that D3D doesn't do that internally for me anyway.
I prefer having a set of functions to construct/store objects and a
set of functions to place objects. OpenGL is a little too much in my
face and doesn't have a RM style option in the standard
implementation.
What did I mean by optimized two paragraphs ago? In theory, because
the RM and low level rendering routines are part of a single package,
the low level routines could look through the object hierarchy to
perform additional scene optimizations. This may or may not be
happening in D3D but, with a segregated implementation like
Inventor -> OpenGL it is much less likely to happen.
At least D3D RM provides a generic set of application-oriented
functions in the default install, yet still provides IM access for
those who wish to do it themselves.
An RM style interface fulfills the development community's need to
produce applications, not more function libraries. I don't believe
using RM precludes allowing models, like explosions, to be maintained
in the application. If you need or believe you can manage things
better yourself then you can use the IM level.
I don't know a lot about the details of Cosmo OpenGL but, it should
make the playing field more level. If it delivers more RM level
functionality while still allowing access to the IM level then I
can't see many programming reasons to go with D3D.
My biggest complaint with OpenGL has been that it has been built up
to be more than it is. This can easily be said of anything, including
D3D. I would have thought that such a mature implementation would
have more default functions (in an RM layer), so that the first cut
can get up and running quickly. Leaving the optimization, including
more use of an IM level, for cut #2.
Dump trucks and sports cars have different objectives. Pick what
suits your needs without getting in your way. I chose OpenGL for my
current project but, I may use D3D for my next one. There is no
substitute for personal experience to allow someone to decide when to
choose each. I believe OpenGL still has an edge as far as image
quality goes. However, this edge may only be a learning curve issue
with D3D.
Unfortunately, because e-mail has not advanced to the Vulcan mind-meld
level yet, please understand the intent, not the letter, of my
discussion.
I wish you the best of luck in all your endeavors,
Michael.
PS. I didn't think much of D3D until I saw EarthSeige running
as well on my 486 under Windows 95 with textures as
MechWarrior 2 did in DOS without textures.
>The immediate need, and first real use that I know of for scatter/gather,
>is in network controllers. A packet being sent out to the network is
>composed of various header segments and data that are each typically
>supplied by one of the OSI software layers. Instead of each layer
>adding its piece of information to the packet and passing it along
>to the next layer it simply puts the data in a known local space and
>the NIC, using scatter gather DMA, will round up all the pieces right
>before the packet is sent. The reverse process occurs on received
>packets. This greatly increases the efficiency of the NIC driver
>and protocol stacks.
David, nice hardware solution to a software problem, but writing the
software in the first place to handle the packet within a single fixed
memory location is faster than the scatter/burst DMA handling of a packet.
The same would go for graphics after a time. If you want real performance,
then the system should not be doing any sort of bus transfer. It should be
able to directly access the system memory, as in dual-ported RAM, etc. I
have to wonder what sort of performance we're going to have shortly with
multiprocessor machines and shared memory. I don't believe that 3D chips
have that much of a life span.
Wilbur
<stuff deleted>
Just to be clear: This isn't an issue of missing functionality in
OpenGL because scene graph management is available via higher
level API's that sit on top of OpenGL (eg. Inventor, Performer)
So the question is: Is there a benefit to having the low level rendering
routines "reach up" and perform optimizations on the scene graph?
I don't think so. You really want the scene graph library to
present the data in an optimal way to the low level API. (For example,
the data can be organized to minimize state changes and texture downloads.)
In other words, it is the implementation of the scene graph library that
should make these optimizations depending on what the underlying hardware
can do.
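As a concrete example of the kind of optimization meant here, a scene graph
library might sort opaque draw items by texture before handing them to the
immediate-mode API, so each texture gets bound once per frame. A sketch,
with the draw_item type and the per-item draw step as placeholders:

    #include <GL/gl.h>
    #include <stdlib.h>

    struct draw_item {
        unsigned int texture_id;
        const void  *geometry;
    };

    static int by_texture(const void *a, const void *b)
    {
        const struct draw_item *da = a, *db = b;
        return (da->texture_id > db->texture_id) -
               (da->texture_id < db->texture_id);
    }

    void draw_sorted(struct draw_item *items, size_t n)
    {
        size_t i;
        unsigned int bound = (unsigned int)-1;     /* nothing bound yet */

        qsort(items, n, sizeof items[0], by_texture);
        for (i = 0; i < n; i++) {
            if (items[i].texture_id != bound) {    /* one bind per texture */
                glBindTexture(GL_TEXTURE_2D, items[i].texture_id);
                bound = items[i].texture_id;
            }
            /* ...submit items[i].geometry with immediate-mode or
               vertex-array calls here... */
        }
    }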
Also, for accelerated machines, you need a good hardware abstraction
layer. (OpenGL is the hardware abstraction layer for Inventor and
Performer. D3D IM is the hardware abstraction layer for D3D RM.) This
is probably the most important reason for decoupling the scene graph
API and the immediate mode API.
>At least D3D RM provides a generic set of application oriented
>functions in the default install. Yet still provides IM access for
>those who wish to do it themselves.
>
> An RM style interface fulfills the development communities need to
>produce applications not more functions libraries. I don't believe
>using RM precludes allowing model, like explosions, to be maintained
>in the application. If you need or believe you can manage things
>better yourself then you can use the IM level.
>
>I don't know a lot about the details of Cosmo OpenGL but, it should
>make the playing field more level. If it delivers more RM level
>functionality while still allowing access to the IM level then I
>can't see many programming reasons to go with D3D.
>
Cosmo OpenGL is just an implementation of the OpenGL API on Pentium.
Cosmo3D is a scene graph API that combines the best of Performer and
Inventor.
>My biggest complaint with OpenGL has been that it has been build up
>to be more than it is. This can easily be said of anything, including
>D3D. I would have thought that such a mature implementation would
>have more default functions (in an RM layer), so that the first cut
>can get up an running quickly. Leaving the optimization, including
>more use of an IM level, for cut #2.
>
The goal for OpenGL is to expose those features that CAN be
accelerated in hardware. (This is what makes it such a good hardware
abstraction layer.) Utility functions are relegated to higher
level API's (eg., Inventor and Performer do scene graph management.
GLU provides matrix utilities)
--
======================================================
wom...@sgi.com 415-933-1946 MS 8U-590
(note the new prefix ^^^)
2011 N. Shoreline Blvd, Mountain View, CA 94039-7311
You're putting an awful lot of faith into memory-mapped IO.
Historically, very true. And I think you are right to point fingers at
people who had no priority/interest in selling OpenGL in the commodity
space. D3D was indeed necessary to create free market pressure.
> I agree that in a few years, when OpenGL hardware has settled out
> (IMHO, cards which only support fullscreen operation will never make
> adequate OpenGL cards) it will be the obvious choice. I also agree
> that those starting on a lengthly project should probably look into
> OpenGL (even Inventor) because that's where the industry is going.
>
> Right now, things could go either way. Everything depends on Cosmo
> OpenGL, how suitable it is for games, how hardware support is going to
> work, and so on. If it's not suitable for games, then Direct3d is
> what we end up with, like it or not (and since nobody responded with a
> yes to my question if anyone actually liked d3d, I think the answer is
> "not" for everyone in this newsgroup).
Pardon while I reveal my biases, but everything doesn't depend on just
SGI's Cosmo OpenGL. People seem to forget that Microsoft offers a decent
OpenGL right now, and then of course there's our work at DEC. Since DEC is
shooting for Alpha workstations at a $3000 price point in the very near
future, and since we're working on those $200 graphics cards same as
everyone else, that puts additional pressure on the Intel commodity space.
Nowadays, there is enough free market competition in the low-end OpenGL
arena to force good performance. It's not going to take years to happen,
it's going to take 1 year. 1 year from now, I see no reason why D3D will
have any speed advantage over OpenGL. Sure, some 3d vendors will fail to
invest their development resources in OpenGL. But they'll pay the price,
and will probably go under.
I think SGI could have provided a better "superstructure" in their SI,
regardless of CPU platform. Sure, you could leave it to individual vendors
to develop CPU-specific code. But the superstructure of the SI was
absolutely terrible. (Haven't looked closely at the latest offerings.)
"if..then" spaghetti going all the way down to the pixel level. Horrors!
Small wonder it ran badly on Intel - and everything else, for that matter.
SGI could have done the work to get the function call architecture and
state selection mechanisms "right". This is pretty much a
platform-independent issue. You pick something with the granularity of a
32-register RISC processor, let the Intel folks worry about narrowing it
down further, etc.
That may be fine from the PCI side of things, but it doesn't do anything
for the cache (in)coherency on the CPU and ASM scheduling side of things.
Also, I wonder how much "scatter/gathering" these bus masters are really
capable of. Especially on garden-variety PC motherboards, using commodity
3d graphics cards. An execute buffer is asking for trouble....
> David, nice hardware solution to a software problem, but writing the
> software in the first place to handle the packet within a single fixed
> memory location is faster than the scatter/burst DMA handling of a
> packet.
Scatter/Gather DMA is great because drivers don't need to bother
about collecting data in one single point: the hardware does it.
> The same would go for graphics after a time. If you want real
> performance, than the system should not be doing any sort of bus
> transfer. It should be able to directly access the system memory, as
> in dual ported ram, etc.
The video memory sits across a bus. It is good in some cases that
the CPU can access video memory directly, but normally it's way
too slow: general purpose processors cannot possibly compete with
dedicated graphics engines when it comes to graphics performance.
>I have to wonder what sort of performance we're going to have shortly
>with multi processor machines and shared memory. I don't believe that
>3D chips have that much of a life span.
On the contrary, even supercomputers with lots of CPUs use specialized
graphics hardware to render 3D. Also, properly rendering 3D takes a lot
of iron, look for example at SGI's Reality Engine. One Command Processor,
Twelve Geometry Engines, 20 Fragment Generators, 320 Image Engines, in
its largest configuration. That's 353 processors for you, right there.
Alberto.
Brandon Van Every <vane...@blarg.net> wrote in article
<01bbfecf$a861fce0$1c90...@hammurabi.blarg.net>...
> Pardon while I reveal my biases, but everything doesn't depend on just
> SGI's Cosmo OpenGL. People seem to forget that Microsoft offers a decent
> OpenGL right now, and then of course there's our work at DEC. Since DEC
is
Thanks for the good word! Just to reiterate ...
Microsoft OpenGL is still the only 1.1 implementation on the market (though
we believe other vendors will finally be releasing implementations in a
month or so). It's also the fastest software implementation on the market,
going by the published SPEC OPC benchmarks. And the variety and strength
of hardware implementations ranging from system prices of $2K up to $40K is
virtually unmatched. Those implementations include ...
> shooting for Alpha workstations at a $3000 price point in the very near
> future, and since we're working on those $200 graphics cards same as
> everyone else, that puts additional pressure on the Intel commodity
space.
Very exciting!
Steve Wright
OpenGL and DirectX on NT Program Manager
Microsoft Corporation
[snip]
: This sounds a lot like just doing a memory backed FIFO. Why not,
: instead of a display list, just write commands to the card's RAM and
: give the chip a pointer to it?
A display list *is* a memory cache of graphics commands. Once you
put it in random access memory it ceases being a FIFO and becomes
a display list.
Moreover, once you've taken the step of moving the FIFO into memory
why restrict its operation to First In First Out ? Why not have
commonly used display list fragments resident in memory that can be
quickly modified by changing one or two critical parameters and then
called like a subroutine by another display list fragment ?
Indeed, what Alberto is describing is the traditional model for
a graphics co-processor such as the Intel 80786 or the TI 32020.
These chips have their own small native language and data structures
and limiting them to a FIFO for their instruction stream is ludicrous.
Indeed, the Intel 80786 could not only execute program store from
its own memory but could also use system memory.
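A host-side sketch of the "display list fragment as subroutine" idea; the
opcodes and structures are invented for illustration, and a real
co-processor would of course consume a packed byte stream in its own
memory rather than C structs:

    enum { OP_TRANSLATE, OP_ROTATE, OP_DRAW_MESH,
           OP_CALL_LIST, OP_RETURN, OP_END };

    struct dl_op {
        unsigned int  opcode;
        int           arg;        /* immediate parameter, if any */
        struct dl_op *target;     /* used only by OP_CALL_LIST   */
    };

    static struct dl_op tank_turret[] = {   /* reusable fragment */
        { OP_ROTATE,    30, 0 },            /* degrees about Y    */
        { OP_DRAW_MESH, 42, 0 },            /* mesh id            */
        { OP_RETURN,     0, 0 }
    };

    static struct dl_op frame[] = {
        { OP_TRANSLATE, 10, 0 },            /* move the whole tank */
        { OP_CALL_LIST,  0, tank_turret },  /* "call" the fragment */
        { OP_END,        0, 0 }
    };

    /* Poking a new angle into tank_turret[0].arg re-poses the turret for
       the next frame without rebuilding either list. */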
Video memory, including that used for program store, should be
simply mapped into CPU address space and the FIFO, if any, merely
used for the traditional purpose of caching a single read/write
cycle so that no wait penalties are involved in synchronizing the
video memory bus to the PCI bus.
The Intel 80786 graphics co-processor also had a video memory arbiter
which could be tweaked in order to change priority of accesses to
video memory from the BIU (Bus Interface Unit), GDE (Graphics Drawing
Engine), and the DRU (Display Refresh Unit).
Using a PCI burst bus master in the video chip to increase flexibility
and performance in a graphics co-processor makes infinite sense.
: Because it (supposedly) can lock out access to L2 cache during large DMA
: transfers. DMA is NOT as free as we would like.
So what ? Using the x86 to move data instead of a PCI bus master
absolutely without a doubt ties up not only the L2 bus but it
uses up the L1 bus as well. It's also going to muck up the execution
pipelines in the CPU. PCI burst bus mastering is there to free up
the x86 from the mundane chore of pushing data around.
Depending on the core logic architecture the L2 may or may not be locked
out during a PCI transaction - usually the only thing that happens is
a cache snoop and there's little overhead unless the snoop cycle
indicates that an L1 or L2 cache location was hit during the memory
read/write. A snoop cycle requires only a single access to cache
tag ram - 10 or 15 nanoseconds, and is usually quite transparent,
unless of course there's a hit and then anything from an invalidation
cycle to a write back to a restart of the PCI cycle occurs... again
depending on the core logic configuration, multiple processors or no,
and a few other factors.
But once again, NOT using a bus master to make the transaction is
GUARANTEED to completely tie up the CPU, the L1 cache, and the L2
for the duration of the transaction. It also ruins any parallel
processing going on in the Pentium's dual execution units, fills
the pre-fetch queue with junk, and god only knows what other
performance hits like putting useless MOV instructions into the
L1 code cache, filling the L1 data cache and L2 cache with crud
that will probably never get hit again and thus cause cache
thrashing...
:
: > Global 3D performance depends on other things too; but feeding triangle
: > data fast enough into the pipeline is one most important thing.
: Granted, but the 3Dfx can sink data faster than any other board out
: there.
:
That doesn't mean it can't be improved upon. I fail to see this as
any kind of valid argument. A year ago something else was the
fastest and next year something else will be faster.
: > I don't know the 3Dfx, but remember, we're in the Windows world too.
: > It isn't enough to do fast 3D; you also have to do fast BitBlt and
: > text rendering. And even 3D rendering will profit from a display list.
: Check into the 3Dfx. My benchmarks have shown it to soundly destroy
: everyone in terms of throughput and fill rate. No bus mastering.
What bus mastering cards did you compare it against ? If you did
compare to a bus mastering card what leads you to believe that
presence/absence of bus mastering was the critical difference ?
Let's have some facts about these "comparisons" supporting why they
are valid indicators of whether or not PCI bus mastering and display
list processing is detrimental.
While there's no doubt it could have been improved (can't everything?),
I think the SI infrastructure serves pretty well on a wide variety of
platforms.
Probably what we should have done was focus on a more highly-optimized
x86-specific version earlier. At the time, neither Microsoft nor SGI
felt that was the right way to go. Water under the bridge, I guess...
Allen
Yes all very impressive and all very specialised, and all not suited to
procedural shading.
: Wow, sounds a lot like high-end SGI machines. :-)
: > The only thing the host will worry about
: > is to get the input data fast accross the bus, and that's where display
: > lists help.
: For static data, display lists are fine...but for dynamic data, I'll
: pass.
Well, I've written display list processor drivers and it worked extremely
well for me with very dynamic data. In fact display lists are FOR
dynamic data. Static data you can shove into a dumb bitmap and forget
about it.
: >The immediate need, and first real use that I know of for scatter/gather,
: >is in network controllers. A packet being sent out to the network is
: >composed of various header segments and data that are each typically
: >supplied by one of the OSI software layers. Instead of each layer
: >adding its piece of information to the packet and passing it along
: >to the next layer it simply puts the data in a known local space and
: >the NIC, using scatter gather DMA, will round up all the pieces right
: >before the packet is sent. The reverse process occurs on received
: >packets. This greatly increases the efficiency of the NIC driver
: >and protocol stacks.
: David, nice hardware solution to a software problem, but writing the
: software in the first place to handle the packet within a single fixed
: memory location is faster than the scatter/burst DMA handling of a packet.
Unfortunately that can't always be done. For instance how would you
incorporate the protocol stack into every network aware application ?
Layered software hierarchies have too many upsides in the increasingly
complex environment of the IBM PC "compatible". Non-layered approaches
will always offer better performance but the time to develop a proprietary
non-layered implementation of anything moderately complex by today's
standards will effectively keep you out of the market. Product lifespans
are too short in this industry to spend much more time than your
competition is spending in development. Delay too long and you miss
the market window of opportunity.
: The same would go for graphics after a time. If you want real performance,
: than the system should not be doing any sort of bus transfer. It should be
: able to directly access the system memory, as in dual ported ram, etc. I
: have to wonder what sort of performance we're going to have shortly with
: multi processor machines and shared memory. I don't believe that 3D chips
: that that much of a life span.
Graphics co-processors come and go. If history is any indication 3D
chips will win the performance race for a while then back to CPU/dumb
bitmap for a while, then back to co-processor... ad infinitum.
Every phase of the above cycle will find someone supporting the current
winner as the winner for all time.
David Springer
: have any speed advantage over OpenGL. Sure, some 3d vendors will fail to
: invest their development resources in OpenGL. But they'll pay the price,
: and will probably go under.
Hardly... their first priority is to invest their development resources
in the performance of MAINSTREAM windows applications. How many
mainstream applications use OpenGL _or_ Direct3D ? Zero... we're talking
Microsoft Office, Lotus Notes, Netscape Navigator, Norton Anti-virus...
you know, the apps that account for 90% of the money people spend on
software...
Now if you're talking about a graphics card company that is going to
try hitting a niche market for games and depend on compelling enough
performance in that arena to convince people to throw over the
graphics card originally bundled with their PC - well, unless they
win, and win BIG, in that niche they'll be the one paying the price
and going under.
David Springer
Laptop BIOS Programmer
Dell Computer Corporation
: That may be fine from the PCI side of things, but it doesn't do anything
: for the cache (in)coherency on the CPU and ASM scheduling side of things.
: Also, I wonder how much "scatter/gathering" these bus masters are really
: capable of. Especially on garden-variety PC motherboards, using commodity
: 3d graphics cards. An execute buffer is asking for trouble....
How much do you know about how cache coherency is handled ? A snoop
cycle is pretty painless - 15 nanoseconds to check the tag rams - and
is handled in a nearly completely transparent manner unless and until
there's a hit. There's no problem associated with cache coherency and
PCI bus masters nor is there any significant penalty associated with
maintaining cache coherency.
The penalty in NOT using a bus master when you could is obvious - it
DEFINITELY ties up the CPU, the L1, the L2, the main memory subsystem,
AND the PCI bus. Isn't it nicer to tie up only the PCI bus and main
memory - and most of time leave the CPU, L1, and L2 to continue in
parallel ?
David Springer
Laptop BIOS Programmer
Dell Computer Corporation
Alberto C Moreira <amor...@nine.com> wrote in article
<MPLANET.32d6573...@news.tiac.net>...
> In article <32d59e08...@news.monmouth.com>,
> WStr...@shell.monmouth.com says...
>
>
> >I have to wonder what sort of performance we're going to have shortly
> >with multi processor machines and shared memory. I don't believe that
> >3D chips that that much of a life span.
>
> On the contrary, even supercomputers with lots of CPUs use specialized
> graphics hardware to render 3D. Also, properly rendering 3D takes a lot
> of iron, look for example at SGI's Reality Engine. One Command Processor,
> Twelve Geometry Engines, 20 Fragment Generators, 320 Image Engines, in
> its largest configuration. That's 353 processors for you, right there.
>
>
> Alberto.
>
I agree.
If I need to draw a full screen simulation (games), and I want to use
about 1024x768 pixels, that is about a million pixels. Now, say I touch
each pixel 2 times per frame. Now say I want AT LEAST 30 frames/second.
Now, finally, say it takes 20 clocks/pixel. (That sounds reasonable if
a user is texturing, z-buffering, and alpha blending, right? What IS
correct?) That is about 1200 million clocks/second, for a minimum
system. I believe that is still above where most processors will be in
the next few years. And the only way to get 20 clocks/pixel will be
with MMX or comparable hardware. And we haven't even done any lighting,
uv-texture calculations, fogging, transforms, scissoring, etc. And when
host CPUs can do this, let's increase the complexity of the scene!
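Spelling that arithmetic out (a rough back-of-the-envelope sketch in C,
nothing more; the 20 clocks/pixel figure is the post's guess, not a
measured number):

  /* Rough pixel-fill budget using the numbers above. */
  #include <stdio.h>

  int main(void)
  {
      double pixels_per_frame = 1024.0 * 768.0;  /* ~0.79 million */
      double overdraw         = 2.0;             /* each pixel touched twice */
      double frames_per_sec   = 30.0;
      double clocks_per_pixel = 20.0;            /* texture + Z + blend guess */

      double clocks_per_sec = pixels_per_frame * overdraw *
                              frames_per_sec * clocks_per_pixel;

      printf("%.0f million clocks/second\n", clocks_per_sec / 1e6);
      /* ~944 million with exact 1024x768; ~1200 million if the frame is
         rounded up to a full million pixels, as above. */
      return 0;
  }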
Andy Grove said in an interview down at Comdex that 3D graphics is the
only field in which Intel sees that they will still need dedicated
support for the foreseeable future. Future meaning the next decade, not
just the next few months.
Of course, with PC cards coming on line this year, "Graphics Hardware"
may take up a bit less space in the office and budget!
alan scott
What's the 32020? Was this a precursor to the 34020 or 34010?
> So what ? Using the x86 to move data instead of a PCI bus master
> absolutely without a doubt ties up not only the L2 bus but it
> uses up the L1 bus as well. It's also going to muck up the execution
> pipelines in the CPU. PCI burst bus mastering is there to free up
> the x86 from the mundane chore of pushing data around.
Valid point. The point I was trying to make was that while DMA is neat,
it isn't exactly the wonder technology that it is often positioned for.
People often claim that with bus mastering you can achieve perfect
parallelism or something very close, so my comment on locking L2 was
simply intended to point out that the CPU's operation may be affected
until the bus master operation is complete.
Bus mastering has the added neat attribute of bursting a lot more
consistently than PIO, but that comes at a cost either way.
Right now you have three choices for writing command data to a graphics
accelerator. First off, you can write to a dumb FIFO. This isn't a
good idea, since you poll the FIFO slots and often end up issuing PCI
retries. Alternatively, you can write to a main memory command buffer
and DMA this across. This has several nice advantages others have
noted, but also has some significant disadvantages (the biggest of which
is cache thrashing). Finally, you can write data directly into a
board's RAM via PIO (memory mapped), which is what would seem to make
the most sense. I'm certain variations on the preceding themes can, do,
and will exist, addressing each of their various disadvantages.
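To make the first option concrete, here is a hedged sketch of what
polling a dumb FIFO tends to look like from the host side (FIFO_STATUS,
FIFO_DATA, and the free-slot mask are made-up names, not any particular
chip's register layout):

  #include <stdint.h>

  /* Hypothetical memory-mapped aperture of a dumb-FIFO board. */
  extern volatile uint32_t *regs;        /* mapped over PCI elsewhere */
  #define FIFO_STATUS   0                /* made-up register indices  */
  #define FIFO_DATA     1
  #define FREE_SLOTS(s) ((s) & 0xffff)   /* low bits = free FIFO slots */

  /* Push one command word.  Every poll is a PCI read, and when the
     FIFO is full the CPU just spins - which is exactly where the
     stalls and PCI retries come from. */
  static void fifo_put(uint32_t word)
  {
      while (FREE_SLOTS(regs[FIFO_STATUS]) == 0)
          ;                              /* CPU and PCI bus both busy-wait */
      regs[FIFO_DATA] = word;
  }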
> Depending on the core logic architecture the L2 may or may not be locked
> out during a PCI transaction - usually the only thing that happens is
> a cache snoop and there's little overhead unless the snoop cycle
> indicates that an L1 or L2 cache location was hit during the memory
> read/write. A snoop cycle requires only a single access to cache
> tag ram - 10 or 15 nanoseconds, and is usually quite transparent,
> unless of course there's a hit and then anything from an invalidation
> cycle to a write back to a restart of the PCI cycle occurs... again
> depending on the core logic configuration, multiple processors or no,
> and a few other factors.
Very informative and a good description.
> But once again, NOT using a bus master to make the transaction is
> GUARANTEED to completely tie up the CPU, the L1 cache, and the L2
> for the duration of the transaction.
But using a bus master implies writing command data to main memory,
which affects cache coherency, which in turn affects overall
performance...
> L1 code cache, filling the L1 data cache and L2 cache with crud
My point exactly -- with command buffers you are constantly going to be
hitting the same addresses over and over.
> : Granted, but the 3Dfx can sink data faster than any other board out
> : there.
> :
>
> That doesn't mean it can't be improved upon. I fail to see this as
> any kind of valid argument. A year ago something else was the
> fastest and next year something else will be faster.
I didn't say it CAN'T be improved upon -- as a matter of fact, I know it
can. It wasn't an argument, it was a statement of empirical fact -- so
far the fastest 3D accelerator on the market is a PIO, non-bus-mastering
board.
> : Check into the 3Dfx. My benchmarks have shown it to soundly destroy
> : everyone in terms of throughput and fill rate. No bus mastering.
>
> What bus mastering cards did you compare it against ? If you did
> compare to a bus mastering card what leads you to believe that
> presence/absence of bus mastering was the critical difference ?
I did NOT state that I thought bus mastering was the critical
difference. I simply stated that the 3Dfx Voodoo chipset, a
non-bus-mastering chipset, had higher performance than the other chips
I've tested. These other chips include ATI Rage II, Matrox Mystique,
Cirrus Logic Laguna, 3DLabs Permedia/DELTA, Rendition Verite, etc. Most
of these boards support bus mastering.
> Let's have some facts about these "comparisons" supporting why they
> are valid indicators of whether or not PCI bus mastering and display
> list processing is detrimental.
I never said such a thing.
> > In fact, even today, I bet it's possible to build up quite a respectable
> > implementation of OpenGL on top of Direct 3D immediate mode.
Give me a break.
This may work for 2D, but not for 3D unless you perform geometry in the
chipset. The purpose of 3D is to move things; if things move, their
coordinates change. If their coordinates change, the display list changes.
Most display lists are mostly coordinates (textures etc. are already
cached in local memory). Therefore, the display lists change (entirely)
every frame.
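A hedged illustration of that point: with rasterization-only hardware
the host re-runs the transform every frame (projection and clipping are
omitted for brevity, and the structures below are made up for the
example), so every coordinate in the command stream is new each frame:

  /* Vertices as fed to a rasterization-only chip: the host owns the
     transform, so these change whenever anything moves. */
  struct vec3 { float x, y, z; };

  /* Apply a 3x4 affine transform (rotation + translation) to n vertices.
     Re-run per frame; the resulting coordinates - i.e. most of the
     "display list" - differ from last frame's. */
  void transform(struct vec3 *out, const struct vec3 *in, int n,
                 const float m[3][4])
  {
      for (int i = 0; i < n; i++) {
          out[i].x = m[0][0]*in[i].x + m[0][1]*in[i].y + m[0][2]*in[i].z + m[0][3];
          out[i].y = m[1][0]*in[i].x + m[1][1]*in[i].y + m[1][2]*in[i].z + m[1][3];
          out[i].z = m[2][0]*in[i].x + m[2][1]*in[i].y + m[2][2]*in[i].z + m[2][3];
      }
  }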
>
> : Because it (supposedly) can lock out access to L2 cache during large DMA
> : transfers. DMA is NOT as free as we would like.
>
> So what ? Using the x86 to move data instead of a PCI bus master
> absolutely without a doubt ties up not only the L2 bus but it
> uses up the L1 bus as well. It's also going to muck up the execution
> pipelines in the CPU. PCI burst bus mastering is there to free up
> the x86 from the mundane chore of pushing data around.
It doesn't muck up the execution pipelines if you use memory mapped IO.
> But once again, NOT using a bus master to make the transaction is
> GUARANTEED to completely tie up the CPU, the L1 cache, and the L2
> for the duration of the transaction. It also ruins any parallel
> processing going on in the Pentium's dual execution units, fills
> the pre-fetch queue with junk, and god only knows what other
> performance hits like putting useless MOV instructions into the
> L1 code cache, filling the L1 data cache and L2 cache with crud
> that will probably never get hit again and thus cause cache
> thrashing...
There is no more code to execute copying things to a FIFO than there
would be copying things to a DMA buffer. Therefore, there is no difference.
Actually, if the data is changing every frame - it's often quicker to
write it directly to the PCI bus rather than to memory and then DMA it.
Writing to memory is often slower than writing to PCI. Note that this is
for data that changes every frame, not things like bitmaps or textures
which are static.
> : > Global 3D performance depends on other things too; but feeding triangle
> : > data fast enough into the pipeline is one of the most important things.
>
> : Granted, but the 3Dfx can sink data faster than any other board out
> : there.
> :
>
> That doesn't mean it can't be improved upon. I fail to see this as
> any kind of valid argument. A year ago something else was the
> fastest and next year something else will be faster.
You are right, it can be improved upon. But Brian's point is that it's
really a lot faster than people give it credit for.
You seem to be thinking of a different type of "display list". Brian is
talking about 3D that is affected by global state, e.g. a rigid object can
be put in a list but different views produced by rotating the model matrix
and invoking the list. You appear to be talking about 2D commands in
whatever the hardware command format is. You cannot put a static 3D object
into a bitmap (modulo image-warping, but that's a different issue).
Jon
__@/
Not necessarily. High end uses commodity parts whenever possible so that
they can reap price benefits associated with those parts. They use more
expensive parts when it gives better price/performance and there is a
market for that performance. Engineering is about being clever enough
about the architecture to allow more choices (and then making the final
choice actually work :). Making any absolute statements about VRAM, DRAM,
EDO, SDRAM, RDRAM, etc is worthless. I mention it because its not uncommon
for the 'great unwashed' to decide whether a card is good or not by looking
at the kind of memory on it without understanding how it is used.
Economy of scale makes a big difference in cost. What kind of pricing do
you think Nintendo gets for their RDRAMs? What kind of pricing do you
think they get for their other chips? I'd bet 3dlabs and 3dfx could do
more interesting stuff if they could get that kind of pricing.
Your statement seems to imply that for a given cost that vendor X
has produced the best possible card, therefore no one can compete.
That is seldom true, especially in areas of developing technology.
-db
[snip]
: It doesn't muck up the execution pipelines if you use memory mapped IO.
Pardon me ? Please explain that. Under what circumstance do MOV
instructions not affect parallel instruction execution in a Pentium ?
Indeed, under what conditions does memory mapped I/O not tie up the
processor's external bus ? If the BIU is tied up with a string move
operation what happens to the execution pipelines if they can't
continue to be fed or finish instructions that require the BIU ?
: > But once again, NOT using a bus master to make the transaction is
: > GUARANTEED to completely tie up the CPU, the L1 cache, and the L2
: > for the duration of the transaction. It also ruins any parallel
: > processing going on in the Pentium's dual execution units, fills
: > the pre-fetch queue with junk, and god only knows what other
: > performance hits like putting useless MOV instructions into the
: > L1 code cache, filling the L1 data cache and L2 cache with crud
: > that will probably never get hit again and thus cause cache
: > thrashing...
: There is no more code to execute copying things to a FIFO than there
: would be copying things to a DMA buffer. Therefore, there is no difference.
: Actually, if the data is changing every frame - it's often quicker to
: write it directly to the PCI bus rather than to memory and then DMA it.
: Writing to memory is often slower than writing to PCI. Note that this is
: for data that changes every frame, not things like bitmaps or textures
: which are static.
I guess I'm assuming a display list which is composed of high level
graphics instructions in addition to raw data and I'm also assuming
that the graphics chip is able to offload computational tasks. In
my experience with graphics co-processors and display lists, which
is limited to 2D, there were a lot of primitives which required
both data and graphics CPU instructions which were maintained and
manipulated as required in system memory, then fed to the graphics
engine. This scenario would be aided by scatter/gather DMA: instead of
the main CPU feeding chunks of display processing instructions and data
to the graphics co-proc, it could simply feed it a list of the sizes &
addresses of those chunks and then go start preparing the next batch.
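For what it's worth, a minimal sketch of such a scatter/gather list as
the host might build it (the field names and the 64-entry limit are
illustrative only, not any real chip's descriptor format):

  #include <stdint.h>

  /* One descriptor per chunk of display-list instructions/data already
     sitting in system memory. */
  struct sg_entry {
      uint32_t phys_addr;    /* physical address of the chunk */
      uint32_t byte_count;   /* length of the chunk in bytes  */
  };

  /* The host fills this in, hands the graphics engine just the list's
     address and entry count, and goes off to prepare the next batch
     while the bus master walks the list and fetches each chunk. */
  struct sg_list {
      struct sg_entry entry[64];   /* arbitrary limit for the sketch */
      uint32_t        count;
  };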
: > : > Global 3D performance depends on other things too; but feeding triangle
: > : > data fast enough into the pipeline is one of the most important things.
: >
: > : Granted, but the 3Dfx can sink data faster than any other board out
: > : there.
: > :
: >
: > That doesn't mean it can't be improved upon. I fail to see this as
: > any kind of valid argument. A year ago something else was the
: > fastest and next year something else will be faster.
: You are right, it can be improved upon. But Brian's point is that it's
: really a lot faster than people give it credit for.
How much credit do people give it ?
David Springer
You can execute two MOV instructions in parallel all day long. There's
no problem. MOV pairs with all the other simple instructions, 2 per clock.
Am I misunderstanding you?
> I guess I'm assuming a display list which is composed of high level
> graphics instructions in addition to raw data and I'm also assuming
> that the graphics chip is able to offload computational tasks. In
> my experience with graphics co-processors and display lists, which
> is limited to 2D, there were a lot of primitives which required
> both data and graphics CPU instructions which were maintained and
> manipulated as required in system memory, then fed to the graphics
> engine. This scenario would be aided by scatter/gather DMA: instead of
> the main CPU feeding chunks of display processing instructions and data
> to the graphics co-proc, it could simply feed it a list of the sizes &
> addresses of those chunks and then go start preparing the next batch.
Correct, but 3D is very different. Unless you do the entire 3D pipeline
on the graphics chip, 3D display lists need to be kept (and processed)
on the CPU.
> : You are right, it can be improved upon. But Brian's point is that it's
> : really a lot faster than people give it credit for.
>
> How much credit do people give it ?
About $.02, :-) The problem is most people just outright assume a card
with DMA is better than a card without DMA because the card without DMA
is slow. This is usually the case, but not always. We (3Dfx) do not
support DMA and so far, no one has higher performance in the same price
class.
There are a lot of applications that use OpenGL... way too many to
list here. You may not consider these applications "mainstream" but some
of them (eg., PTC) sell a lot of seats. Also, you might be interested
to know that many applications use OpenGL for 2D rendering and image
operations.
>Now if you're talking about a graphics card company that is going to
>try hitting a niche market for games and depend on compelling enough
>performance in that arena to convince people to throw over the
>graphics card originally bundled with their PC - well, unless they
>win, and win BIG, in that niche they'll be the one paying the price
>and going under.
>
>David Springer
>Laptop BIOS Programmer
>Dell Computer Corporation
>
>: Cheers,
>: --
>: Brandon J. Van Every | Free3d: old code never dies! :-)
>: | Starter code for GNU Copyleft projects.
>: DEC Graphics & Multimedia |
>: Windows NT Alpha OpenGL | vane...@blarg.net www.blarg.net/~vanevery
>
>
>Just to be clear: This isn't an issue of missing functionality in
>OpenGL because scene graph management is available via higher
>level API's that sit on top of OpenGL (eg. Inventor, Performer)
Yes, but at what cost. D3D includes both RM and IM functionality for
free. Why belittle other APIs (D3D and QD3D) by ignoring their RM
level so that we can comfortably compare them to OpenGL. Comparisons
should be made between the functionality of each API's generic
install.
Very good points. Again, dump trucks and sports cars have different
objectives. OpenGL's roots have assumed that the target machine will
have hardware to assist in the process. D3D on the other hand assumes
that the target machine will not have hardware assistance.
I'll be glad when Cosmo3D comes out and I can start worrying about
application functionality, not wrapper functions.
Time will tell. Thank you for the clarification,
Michael.
> Moreover, once you've taken the step of moving the FIFO into memory
> why restrict its operation to First In First Out ? Why not have
> commonly used display list fragments resident in memory that can be
> quickly modified by changing one or two critical parameters and then
> called like a subroutine by another display list fragment ?
Because this isn't the way normal 3D graphics works in a
rasterization-only piece of hardware. If you perform transformations in
hardware then yes, it makes sense to do all this, but effectively this
is useless for rasterization hardware since the geometry (2D triangles)
will change every frame.
> Video memory, including that used for program store, should be
> simply mapped into CPU address space and the FIFO, if any, merely
> used for the traditional purpose of caching a single read/write
> cycle so that no wait penalties are involved in synchronizing the
> video memory bus to the PCI bus.
Where a memory backed FIFO wins is that it can automatically begin
sinking graphics data as you write it. What you propose is a display
list that you fill then execute, which effectively reduces parallelism
by a huge amount. While in practice a random access display list could
be used like a FIFO, you would have contention issues between the CPU
and the graphics processor as they determined their own ideas of "the
current command pointer".
A memory backed FIFO, on the other hand, is completely maintained by the
graphics processor. The CPU writes to a fixed address space and
commands and data are buffered into the FIFO behind the scenes. The CPU
is not responsible for managing any command pointers and, more
importantly, the graphics processor is free to process data IMMEDIATELY.
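As a hedged sketch of the difference (the names are illustrative, not
any vendor's interface): a memory-backed FIFO is essentially a ring
buffer whose read pointer belongs to the graphics processor, so the CPU
only ever appends and the chip can start draining immediately:

  #include <stdint.h>

  #define RING_WORDS 1024                    /* arbitrary size for the sketch */

  struct mem_fifo {
      uint32_t          ring[RING_WORDS];
      volatile uint32_t write_ptr;           /* advanced only by the CPU  */
      volatile uint32_t read_ptr;            /* advanced only by the chip */
  };

  /* CPU side: append one word.  The graphics processor consumes from
     read_ptr on its own, so there is no fight over "the current
     command pointer" and no separate fill-then-execute step. */
  static void ring_put(struct mem_fifo *f, uint32_t word)
  {
      uint32_t next = (f->write_ptr + 1) % RING_WORDS;
      while (next == f->read_ptr)
          ;                                  /* full: wait for the chip */
      f->ring[f->write_ptr] = word;
      f->write_ptr = next;
  }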
> So what ? Using the x86 to move data instead of a PCI bus master
> absolutely without a doubt ties up not only the L2 bus but it
> uses up the L1 bus as well.
I'm sorry, but at what point does DMA prevent you from ever having to
write data? Whether you use DMA or PIO you are going to write command
and data traffic.
PIO = CPU->graphics chip
DMA = CPU->memory->graphics chip
One makes a stop in main memory (a Bad Idea), the other doesn't.
Here, for your education, is how a typical ASM code sequence would
look for some PIO writes:
mov edi, reg_base       ; edi -> memory-mapped register aperture
mov [edi+REG0], data    ; each store goes straight to the board
mov [edi+REG1], data
mov [edi+REG2], data
(Obviously "data" is going to be some register that needs to be loaded,
but this would be the case with filling a DMA buffer, so I'm leaving
that out for brevity and clarity).
Now, to do this with DMA, you could do something like:
mov edi, dmabuffer      ; edi -> command buffer in system memory
mov [edi+0], command    ; tag each entry so the chip knows what follows
mov [edi+4], data
mov [edi+8], command
mov [edi+12], data
Many DMA architectures require that address information be sent within
the DMA buffer, or alternatively that you tag the data with a header
indicating what kind of data is going to follow. This is extra
bandwidth in either case.
Fundamentally the two look about the same -- lots of MOV commands, the
difference being the addresses being written to and the content of the
data.
In practice, the DMA buffer suffers from several drawbacks, the biggest
of which is going to be cache coherency. Every time you write something
out to the command buffer it's probably going to stop in L1 cache
temporarily then eventually get kicked out to main memory at some point
(on Pentium, with its write-through architecture, whether writes stop in
the cache or not is determined by whether you've prefetched the data or
not).
So with DMA buffers of, say, 4K size, you are eating up half of your L1
cache with DMA writes.
PIO, on the other hand, doesn't thrash your cache. You can comfortably
feel that your 8K data cache is going to be filled with the data that
you want it to be (within reason).
> It's also going to muck up the execution
> pipelines in the CPU. PCI burst bus mastering is there to free up
> the x86 from the mundane chore of pushing data around.
How do you figure this? Given the above examples, the execution
pipelines are going to be affected the same whether you are DMAing or
using PIO -- the instruction sequences are nearly identical.
Data must be pushed around regardless of DMA or PIO, the only difference
is the extra level of indirection that is required for DMA. This is not
a winning situation. If writes to main memory were SIGNIFICANTLY faster
than writes to PCI, then I would almost lean towards DMA for command
traffic if for no other reason than you could stream data faster to main
memory than PCI.
However, today's PCI implementations are pretty damned good, and the
Pentium is still suffering pretty badly from poor main memory bandwidth,
so PIO to PCI is still the faster of the two.
> It also ruins any parallel
> processing going on in the Pentium's dual execution units
Can you elaborate on this? Last time I checked you can execute mov
instructions in parallel so long as there are not any existing bank
conflicts.
Note that this is from experience -- I've written my fair share of code
to get to graphics devices via PIO, and I've measured all these timings
down to the clock cycle.
At no point has PIO ruined parallel execution for me, at least not in a
way that wouldn't cause the same effect writing to a DMA buffer.
Note also that DMA has another significant problem, namely that Win95
doesn't cleanly allow you to set up the TLBs to lock down physical
addresses for DMA. This is yet more complexity that PIO eschews.
> , fills
> the pre-fetch queue with junk, and god only knows what other
> performance hits like putting useless MOV instructions into the
> L1 code cache
Once again, are you saying that filling a DMA buffer doesn't involve MOV
instructions? THAT would be a pretty neat trick.
> , filling the L1 data cache and L2 cache with crud
> that will probably never get hit again and thus cause cache
> thrashing...
Actually, PIO is SIGNIFICANTLY better than DMA in terms of the L1 data
cache since it isn't inadvertently flushing it all the time. On a
Pentium either the writes are going to be through the cache (and
extremely slow), or written back into the cache and flushed out a line
at a time to main memory (which is faster than not caching), but if that
happens you can kiss cache coherency goodbye.
> OpenGL's roots have assumed that the target machine will
> have hardware to assist in the process. D3D on the other hand assumes
> that the target machine will not have hardware assistance.
Bingo. So if you want hardware acceleration, OpenGL is a better API.
Although I still think a software only version of OpenGL is probably
going to be close to D3D's performance (and better in some areas).
But there will always be limits on how much can be scattered or gathered.
Thus, you have pretty much the same software design problem anyways.
> The video memory sits accross a bus. It is good in some cases that
> the CPU can access video memory directly, but normally it's way
> too slow: general purpose processors cannot possibly compete with
> dedicated graphics engines when it comes to graphics performance.
I'm willing to wager differently on that. Use DejaNews for my past posts
on superscalar ASM algorithms for details. I think maximum superscalar CPU
performance is an area that's pretty much unexplored.
> On the contrary, even supercomputers with lots of CPUs use specialized
> graphics hardware to render 3D. Also, properly rendering 3D takes a lot
> of iron, look for example at SGI's Reality Engine. One Command Processor,
> Twelve Geometry Engines, 20 Fragment Generators, 320 Image Engines, in
> its largest configuration. That's 353 processors for you, right there.
Ever thought that SGI uses so much iron, mainly because they're in the
business of selling that iron?
There is no 3Dfx support though, except for the promise of a skeleton
GL driver.
Too bad Intergraph (or Rendition) hasn't released an OpenGL driver for
the Verite, and too bad Carmack doesn't provide the cpu-lit textures
as an option...
The Verite is such a good design, I hope Rendition's next chip will be
similar. What I REALLY REALLY REALLY hope is that someone puts out a
GL driver for Win95. I sold my Millennium for a Reactor, and now I'm
doing OpenGL development and Vquake has become boring. Of course
knowing Matrox's drivers, I'm probably not missing much.
>Now if the rumors are true that Microsoft is going to punt the execute
>buffers and go to a triangle based API there really won't be any
>technical difference left between OGL and D3D, and these discussions
>can finally degenerate completely into marketing spam...
Direct3D's position is too tenuous for a complete overhaul.
----
Brought to you by the "Vote 7.62x51 for Real Change" Committee.
Don't stick anything through my front door you don't plan on losing.
Writing directly to the bus however is not a SCALABLE design. It
utterly fails to take advantage of potential performance increases in
new SYSTEMs. The speed of your graphics card is fixed. The speed of
system memory and transparency of DMA may change with any and every
new chipset.
If on some system DMA is transparent, and access to system memory is
MUCH MUCH MUCH faster than access to your card via the PCI bus, your
card completely fails to take advantage of this. That's virtually a
design flaw, as PCI DMA is very cheap to implement.
>You are right, it can be improved upon. But Brian's point is that it's
>really a lot faster than people give it credit for.
A high fill rate is meaningless to me. I'd be very interested to see
the voodoo under OpenGL with POLYGONs, LOTS of them. Perhaps it will
do very well under those circumstances, but that is my measure of
utility.
Certainly the vast majority of people use applications for which 3D
acceleration is irrelevant today. All graphics hardware vendors must
pay due diligence to those customers, and accelerate the kind of
applications you mentioned (Word, Netscape, etc.). If they don't do a
decent job of it, they won't survive in the long run.
However, as Brandon pointed out, vendors of *3D* accelerators must also
invest in support for one or more 3D APIs, and OpenGL is a critical
one. One reason for this is that OpenGL is used by applications
required by the current buyers of 3D accelerators, as Paula noted.
There are enough customers for this sort of functionality, and enough
chance that the market will expand, that a lot of companies feel the
business is worthwhile -- even though it's small compared to the
word-processor or database businesses.
Someone else (maybe it was Mike Chapman? I forget) predicted that
entertainment applications eventually will be a much larger business
than current mainstream applications. I don't know whether that's
true, but it's clear that some major players expect the use of 3D
graphics to become much more widespread. For example, Intel announced
a deal with Lockheed-Martin to provide 3D acceleration on Intel
motherboards sometime later this year. I don't think they would have
been able to justify the added cost if they didn't expect a long-term
demand for higher 3D performance.
Allen
CosmoGL is going to be 100% compatible with Microsoft's OpenGL from
what I've seen.
I don't see what it matters anyhow. 3D hardware will rule the market.
A general purpose CPU is just UTTERLY inappropriate for realtime 3D.
>If it's not suitable for games, then Direct3d is
>what we end up with, like it or not (and since nobody responded with a
>yes to my question if anyone actually liked d3d, I think the answer is
>"not" for everyone in this newsgroup).
It'll take another year for any substantial number of direct3d games
to come out. If there are lots of OpenGL drivers in 10 months, other
developers will be able to crank out more cool games with accurate
visual effects in the remaining 2 months. Bet on it.
Direct3D is constantly changing. The API has a LONG learning curve.
It's silly crap. It has no legitimate reason to exist by any measure.
CosmoGL is irrelevant. Software rendering will be a painful memory in
another year.
People will buy a fast 3D card for $200 like they'll buy a N64 for the
same. In fact, they'd do well to buy something like an N64 on a card.
This could drop into any PCI computer and offer stellar performance.
End of story. 3D hardware is the deal.
>Brandon Van Every (vane...@blarg.net) wrote:
>
>: have any speed advantage over OpenGL. Sure, some 3d vendors will fail to
>: invest their development resources in OpenGL. But they'll pay the price,
>: and will probably go under.
>
>Hardly... their first priority is to invest their development resources
>in the performance of MAINSTREAM windows applications. How many
>mainstream applications use OpenGL _or_ Direct3D ? Zero... we're talking
>Microsoft Office, Lotus Notes, Netscape Navigator, Norton Anti-virus...
>you know, the apps that account for 90% of the money people spend on
>software...
MAINSTREAM software is going to be made to look like flea market wares
in comparison to consumer entertainment electronics.
That's all the free investment advice you're getting today. :-)
>Now if you're talking about a graphics card company that is going to
>try hitting a niche market for games and depend on compelling enough
>performance in that arena to convince people to throw over the
>graphics card originally bundled with their PC - well, unless they
>win, and win BIG, in that niche they'll be the one paying the price
>and going under.
"niche market for games"
Heheheh. Ok buddy, whatever you say.
: There are a lot of applications that use OpenGL... way too many to
: list here. You may not consider these applications "mainstream" but some
: of them (eg., PTC) sell a lot of seats. Also, you might be interested
: to know that many applications use OpenGL for 2D rendering and image
: operations.
Windows NT sells a lot of seats if you don't compare it to Windows 3.1
or Windows 95. Relatively speaking there are zero mainstream applications
that use OpenGL.
Reply with a list of all the OpenGL applications that have sold a
million or more copies.
David Springer
It's all in their fill rate. No proof of anything beyond that.
You said before that under some circumstances PCI DMA can interfere
with the L2 cache. When is that? Is that with all the chipsets and
configurations?
With such a high fill rate on the Voodoo DMA is probably not as
critical as a chip like the Verite. On the Verite, the CPU *needs* to
be doing stuff like lighting calcs while the card grabs its textures.
The lighting/other geom calcs will stay neatly enough in the cache to
make DMA a big advantage.
The same DMA PCI card in a computer with a smarter arbitration and
more sophisticated bus will see a performance gain - if parallelism is
of benefit. PCI PIO is PCI PIO. You have to load textures at some
point... More sophisticated games will have LOTS more textures. With
a DMA hard disk controller and a DMA graphics board, the benefits are,
well, self-evident.
In other words, if I buy a $1000 PCI graphics board and it uses PIO,
it can never take advantage of that new multi-pathway, multi-processor
PowerPC system with 128 megs of quad-ported RAM and UBERKLUGE
arbitration chip to its full potential.
You have a lot of faith in incredible advancements in the very nature
of general purpose chips. Further, what ever technology would bring
those advancements would likely improve the performance of 3D chips as
well.
The sort of chip which TRULY can perform ANY logic on ANY data as
efficiently as technologically possible is a long way off. It's a
nice idea. It will probably happen some day. Expect to see a LOT of
custom 3D hardware before then. Expect to see direct neural links
before then. Expect to see artificial intelligence around the same
time.
We used that much hardware on RealityEngine because (at the time) it
was needed to deliver the performance and feature-set required by
high-end markets. Technology marches on; no one would use that many
chips to build a RealityEngine-class machine today.
High-end machines still need a lot of silicon, though. If I've counted
correctly, a single maxed-out InfiniteReality subsystem contains about
156 ASICs. It has more transistors than RealityEngine, despite the
lower ASIC count. A surprisingly large number of people have
performance requirements so demanding that they need to lash together
several such subsystems.
Allen
Are copies sold or dollars sold more relevant? You'd be VERY surprised
at the sheer amount of dollars funneled into CAD/CAM, MCAD, MI, vis-sim,
and GIS applications.
> If on some system DMA is transparent, and access to system memory is
> MUCH MUCH MUCH faster than access to your card via the PCI bus, your
> card completely fails to take advantage of this. That's virtually a
> design flaw, as PCI DMA is very cheap to implement.
That's a mighty big "if". Show me a system where DMA is transparent and
where access to system memory is MUCH MUCH MUCH faster than access to
PCI and I'll take the above comment with a bit more seriousness. The
only systems that I'm aware of that MIGHT qualify for the "fast memory,
slow bus" would be some older DEC Alpha systems -- and these are only
like that because their PCI implementations were so bad.
In general real PCI bandwidth is increasing MUCH faster than memory
bandwidth. Witness the evolution of Intel's chipsets from Neptune to
Triton to Triton II to Orion -- the Neptune was lucky if it could do
30MB/sec, and now the Orion can sustain some 80MB/sec.
On the vast majority of systems, that being Wintel boxes, sustained PCI
bandwidth is far higher than sustained main memory copy bandwidth.
> A high fill rate is meaningless to me. I'd be very interested to see
> the voodoo under OpenGL with POLYGONs, LOTS of them. Perhaps it will
> do very well under those circumstances, but that is my measure of
> utility.
The Voodoo does just fine with lots of polygons. Compared to Verite or
even 3DLabs Permedia/Delta it can sustain a far higher number of
triangles/second under Direct3D.
Keep in mind that the chief architect of the Voodoo is someone with a
LOT more experience with OpenGL than Direct3D, and that the Voodoo, in
many ways, is an ideal OpenGL accelerator.
This is NOT true. 3Dfx has posted faster triangle throughput than any
other accelerator I've tested, with the exception of the Permedia/Delta
but only when using heavily meshed triangle sets.
Other companies argue that the 3Dfx does not allow significant overlap,
which has also proven untrue -- physics intensive games still run faster
on the 3Dfx than on bus-master-based systems.
I think that's germane to the point I was trying to make: you need to keep
track of the price/performance curves as well as the absolute performance
curves. At the high-end, everyone does dedicated silicon because the
generic processors aren't "there yet." Historically, SGI has staked out
the high-end market. That doesn't make 3d graphics an intrinsically
silicon-heavy problem, though. It just means you need a lot of silicon to
do things that are way way farther ahead than what everyone else is doing.
Also, as I've said in the past I don't think the possibilities of good
superscalar software implementations have been fully exploited yet. Hope
to do so in the coming year.