
Shader Model 3 Problems


Jim Henriksen

Mar 8, 2008, 4:38:01 PM
Background:

1. I have code that takes advantage of Model 3 vertex shaders, when
present, to efficiently process large numbers of instances of common
geometry; i.e., I use SetStreamSourceFreq(). (A rough sketch of this setup
follows the list below.)

2. I use a number of self-constructed vertex shaders (written in HLSL), but
no pixel shaders.

3. This code has worked for quite some time with a variety of video
hardware; however, it fails on ATI 2400XT hardware.

4. If I use software vertex processing with the 2400XT, everything works.
Hardware vertex processing fails silently. All the DirectX API calls return
D3D_OK, but RenderPrimitive calls have no visible effect. My application
window appears, and I'm able to clear the screen to any color I choose, but
"drawn" geometry simply does not appear.

My questions are as follows:

1. The DirectX SDK documentation contains the following statement:

If you are implementing shaders in hardware, you may not use vs_3_0 or
ps_3_0 with any other shader versions, and you may not use either shader type
with the fixed function pipeline.

Does this mean that I *must* provide my own pixel shader(s) if I want to use
Shader Model 3 with custom vertex shaders? Until very recently, all the
hardware I had encountered did support Model 3 vertex shaders without custom
pixel shaders.

2. The relationship between what's implemented in drivers and what's
implemented in hardware is not clear to me. On hardware that supports Model
3, I can compile my vertex shaders, restricting the shader model to VS_2_0,
and still use SetStreamSourceFreq() instancing, which requires Model 3
support. This approach has worked on the handful of cards I have tested.
Does this approach violate any rules?
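
For what it's worth, compiling against the 2_0 profile on an SM3-capable
device amounts to something like the sketch below; the file name and entry
point are placeholders, not taken from my actual project, and an ANSI build
is assumed.

// Compile an HLSL vertex shader against vs_2_0 with D3DX (placeholder names).
#include <windows.h>
#include <d3d9.h>
#include <d3dx9.h>

IDirect3DVertexShader9* CompileVS20(IDirect3DDevice9* dev)
{
    ID3DXBuffer* code = NULL;
    ID3DXBuffer* errors = NULL;
    IDirect3DVertexShader9* shader = NULL;

    HRESULT hr = D3DXCompileShaderFromFile(
        "shaders.fx",   // source file (placeholder)
        NULL, NULL,     // no macros, no include handler
        "VSMain",       // entry point (placeholder)
        "vs_2_0",       // SM2 compile target rather than vs_3_0
        0,              // compile flags
        &code, &errors, NULL);

    if (SUCCEEDED(hr))
        dev->CreateVertexShader((const DWORD*)code->GetBufferPointer(), &shader);
    else if (errors)
        OutputDebugStringA((const char*)errors->GetBufferPointer());

    if (code)   code->Release();
    if (errors) errors->Release();
    return shader;   // NULL on failure
}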

3. The absence of good documentation really hurts. The DirectX SDK
documentation on shaders in general, and Model 3 in particular, is woefully
inadequate. Can anyone recommend a good source of Model 3 documentation with
examples?

TIA
--
Jim Henriksen

Richard [Microsoft Direct3D MVP]

Mar 8, 2008, 7:14:14 PM
[Please do not mail me a copy of your followup]

Jim Henriksen <JimHen...@discussions.microsoft.com> spake the secret code
<78FA1CD1-6A64-4559...@microsoft.com> thusly:

>1. The DirectX SDK documentation contains the following statement:
>
>If you are implementing shaders in hardware, you may not use vs_3_0 or
>ps_3_0 with any other shader versions, and you may not use either shader type
>with the fixed function pipeline.
>
>Does this mean that I *must* provide my own pixel shader(s) if I want to use
>Shader Model 3 with custom vertex shaders?

Yes.

>2. The relationship between what's implemented in drivers and what's
>implemented in hardware is not clear to me. On hardware that supports Model
>3, I can compile my vertex shaders, restricting the shader model to VS_2_0,
>and still use SetStreamSourceFreq() instancing, which requires Model 3
>support. This approach has worked on the handful of cards I have tested.
>Does this approach violate any rules?

No.

>3. The absence of good documentation really hurts. The DirectX SDK
>documentation on shaders in general, and Model 3 in particular, is woefully
>inadequate. Can anyone recommend a good source of Model 3 documentation with
>examples?

Sorry, I don't have any particular pointers here.
--
"The Direct3D Graphics Pipeline" -- DirectX 9 draft available for download
<http://www.xmission.com/~legalize/book/download/index.html>

Legalize Adulthood! <http://blogs.xmission.com/legalize/>

Jim Henriksen

Mar 8, 2008, 10:20:01 PM
Dear Richard:

Thank you for your reply. Please allow me a brief follow-up.

1. Your answer is what I suspected. My misfortune is that for me the
"mixed" approach worked on lots of cards before I encountered problems. The
obfuscatory SDK prose leaves a great deal to be desired. Warnings to the
effect that "it's all or nothing-at-all with respect to Model 3" would have
saved me a couple days of work and a 160-mile round trip to a customer's
site. Warnings about consistency requirements should also be placed in the
documentation of the various D3DXCompileShader... functions. To the
extent that you have contact with MS, please implore them to improve their
docs.

2. Are you sure about your answer to my second question? The documentation
for "Efficiently Drawing Multiple Instances of Geometry" states that

"This technique [SetStreatSourceFreq] requires a device that supports the
3_0 vertex shader model. This technique works with any programmable shader
but not with the fixed function pipeline."

My interpretation of your answer is that as long as the *device* supports
Model 3, I'm OK, even though I compile my shaders using Model 2. I tried
this out of desperation and was mildly surprised that it worked. I worry
whether this behavior will be consistent across all video hardware. Is it
the case that drivers "feed" the video hardware looping and indexing
information independent of my vertex shaders?

Thanks.
--
Jim Henriksen

Richard [Microsoft Direct3D MVP]

Mar 10, 2008, 6:00:36 PM
[Please do not mail me a copy of your followup]

Jim Henriksen <JimHen...@discussions.microsoft.com> spake the secret code

<93806033-8BC7-427A...@microsoft.com> thusly:

>2. Are you sure about your answer to my second question? The documentation
>for "Efficiently Drawing Multiple Instances of Geometry" states that
>
>"This technique [SetStreatSourceFreq] requires a device that supports the
>3_0 vertex shader model. This technique works with any programmable shader
>but not with the fixed function pipeline."

Oh, I misread your earlier post; yes, this is SM 3 specific as the
docs state.

Jim Henriksen

Mar 10, 2008, 9:57:01 PM
Dear Richard:

I took a look at the SDK example that shows instancing, and the
hardware-assisted approach that "requires a device that supports the 3.0
vertex shader model" uses a vertex shader that's compiled at the 2.0 level.

Thus it would appear that the 3.0 requirement is hardware/driver only. This
is the way my work-around approach is implemented, and since it agrees with
the SDK example, I guess I'm OK, although the inconsistency seems odd.

Thanks for your help.

Regards
--
Jim Henriksen


"Richard [Microsoft Direct3D MVP]" wrote:

Richard [Microsoft Direct3D MVP]

Mar 11, 2008, 5:41:27 PM
[Please do not mail me a copy of your followup]

Jim Henriksen <JimHen...@discussions.microsoft.com> spake the secret code

<666A9B33-B63D-440F...@microsoft.com> thusly:

>Thus it would appear that the 3.0 requirement is hardware/driver only. This
>is the way my work-around approach is implemented, and since it agrees with
>the SDK example, I guess I'm OK, although the inconsistency seems odd.

Check it against the reference rasterizer; if no errors come out then
the docs are incorrect.
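
For anyone following along, the reference rasterizer is selected at device
creation time. Roughly, assuming pD3D, hWnd, and the present parameters are
set up the same way as for the normal device:

// Create the REF (reference rasterizer) device instead of the HAL device.
// It is extremely slow, but it behaves the way the runtime and docs intend,
// so it is useful for checking whether a technique is actually legal.
IDirect3DDevice9* refDevice = NULL;
HRESULT hr = pD3D->CreateDevice(
    D3DADAPTER_DEFAULT,
    D3DDEVTYPE_REF,                       // reference rasterizer
    hWnd,
    D3DCREATE_SOFTWARE_VERTEXPROCESSING,  // REF is a software device
    &pp,                                  // D3DPRESENT_PARAMETERS filled in elsewhere
    &refDevice);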

Jim Henriksen

Mar 11, 2008, 6:27:07 PM
Richard [Microsoft Direct3D MVP] wrote:
> [Please do not mail me a copy of your followup]
>
> =?Utf-8?B?SmltIEhlbnJpa3Nlbg==?= <JimHen...@discussions.microsoft.com> spake the secret code
> <666A9B33-B63D-440F...@microsoft.com> thusly:
>
>> Thus it would appear that the 3.0 requirement is hardware/driver only. This
>> is the way my work-around approach is implemented, and since it agrees with
>> the SDK example, I guess I'm OK, although the inconsistency seems odd.
>
> Check it against the reference rasterizer; if no errors come out then
> the docs are incorrect.

Dear Richard:

Thanks for your help.

I tried using the reference device, but that was a bust, because the vertex
buffers it created were aligned on multiples of 8, but not multiples of 16.
This caused alignment exceptions in my SSE code. My code failed on a
MOVAPD instruction, which requires 16-byte alignment.
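
For context, the failure boils down to something like the sketch below; the
locking code and the fallback shown are illustrative, not my actual code.

// MOVAPD (generated by _mm_load_pd) faults unless its source is 16-byte
// aligned; the REF device handed back Lock() pointers that were only
// 8-byte aligned. A defensive version would test the pointer and fall
// back to unaligned loads.
#include <windows.h>
#include <d3d9.h>
#include <emmintrin.h>

void TouchFirstElement(IDirect3DVertexBuffer9* vb, UINT sizeBytes)
{
    void* p = NULL;
    if (FAILED(vb->Lock(0, sizeBytes, &p, 0)))
        return;

    const double* data = (const double*)p;
    __m128d v;
    if (((UINT_PTR)data & 0xF) == 0)
        v = _mm_load_pd(data);    // aligned load: MOVAPD
    else
        v = _mm_loadu_pd(data);   // unaligned load: MOVUPD, slower but safe
    (void)v;                      // real code would process the whole buffer

    vb->Unlock();
}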

This leaves me high and dry as far as using the reference device for
confirmation. What a can of worms this whole thing is! I may burn one
of my MSDN tech support calls to get an answer.

Regards,
Jim Henriksen

Jan Bruns

Mar 15, 2008, 11:37:57 PM

"Jim Henriksen":

> I may burn one of my MSDN tech support calls to get an answer.

Ah, cool.
What I don't get is why this question is so important to you. If your
project already requires targeting sm3 hardware, it's not a big deal
to also target sm3 in the shaders. Ok, some gpu optimization guides recommend
always using the lowest possible sm version for performance reasons, but I
personally haven't seen any noticeable difference in performance when I
compared sm3 vs sm2 (on my computer).

Also, when targeting windows platforms, you'd better expect many of the
operating system's details to behave exactly the opposite of what the
corresponding SDK defines (apart from details that aren't even mentioned).

So in my opinion, you should probably do the same thing that Microsoft
does: invest more into testing than into planning. Even more so if this issue
is turning out to be really relevant. It won't help your customers if
MS support tells you "yes, we meant that you can use sm2 with instancing on
sm3 hardware" and some driver maintainers then decide to drop support for this
combination somewhere in the future (for example caused by a misinterpretation
of an imaginary "is_sm3" flag).

I guess the only situation where providing separate sm2/sm3 paths (quite easy
to do) would matter is content-less engine stuff, distributed over the web to
customers using dial-up modems (in this case, the additional code size could
mean up to doubled download times). Normally not a real problem.

Gruss

Jan Bruns


Jim Henriksen

Mar 17, 2008, 6:12:26 PM

Dear Jan:

The unfortunate thing about testing is that you rarely can prove that
things work by testing; you can only prove that they don't work.

My animations frequently have large numbers of moving objects. For
example, an air traffic control animation can easily have 5,000
airplanes. Typically only one or two generic, low-triangle-count
objects are used to represent airplanes. My goal is to run such
animations at hardware refresh rates, e.g., 60 Hz or 70 Hz, so
notwithstanding the low per-object triangle counts, I need to extract
maximum performance from the hardware.

Hardware-implemented instancing makes a large difference in performance.
The question is "what *is* the precise definition of shader model 3?"
The SDK docs are clear as mud.

One rule that Richard has confirmed is that with shader model 3, you
can't replace part of the fixed function pipeline; you have to replace
it *all*. If you compile HLSL shaders using model 3, you must provide
pixel shaders, as well. Some video hardware/driver combinations do not
enforce this rule; i.e., on some hardware, you *can* supply a model 3
vertex shader without providing your own pixel shaders. In fact, I only
recently encountered hardware that enforces the rules.
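
In practice that means the vs_3_0 path has to bind a ps_3_0 pixel shader as
well, even if it does nothing more than pass a color through. Roughly, with
the shader objects assumed to be compiled and created elsewhere:

// With a vs_3_0 vertex shader bound, a ps_3_0 pixel shader must be bound too;
// leaving the pixel stage on the fixed-function pipeline is what the SDK
// statement forbids. 'vs30' and 'ps30' are assumed to exist already.
dev->SetVertexShader(vs30);   // vs_3_0 vertex shader
dev->SetPixelShader(ps30);    // matching ps_3_0 pixel shader (must not be NULL)
// ...draw calls follow...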

Since I have quite a number of custom HLSL vertex shaders, I would have
to implement custom pixel shaders to go along with them, and I want to
avoid that if possible.

The instancing example in the SDK uses hardware instancing, which requires
model 3 hardware support, but the vertex shader it uses is compiled as
model 2.

Right now, I'm using the same approach as the SDK example:

1. If shader model 3 is supported, use hardware instancing. I call
GetDeviceCaps to see whether PixelShaderVersion is greater than or equal to
model 3 (see the sketch below).

2. I compile the vertex shaders using model 2, avoiding the
all-or-nothing-at-all requirement that comes with model 3.

This seems to work just fine, but I'm seeking confirmation as to whether
this approach is truly legal.
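
In code, the caps check in step 1 comes down to something like the following
sketch; it reads both shader-version caps from the same D3DCAPS9 structure.

// Query device caps once at startup and pick the instancing path.
// 'pD3D' is the IDirect3D9 interface used to create the device.
D3DCAPS9 caps;
pD3D->GetDeviceCaps(D3DADAPTER_DEFAULT, D3DDEVTYPE_HAL, &caps);

bool vs3 = caps.VertexShaderVersion >= D3DVS_VERSION(3, 0);
bool ps3 = caps.PixelShaderVersion  >= D3DPS_VERSION(3, 0);

// SetStreamSourceFreq()-style instancing needs SM3-class hardware, even
// though the shaders themselves are compiled against vs_2_0 (step 2).
bool useHardwareInstancing = vs3 && ps3;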

Richard's suggestion was to test my code against the reference device.
This didn't work because the reference device can create things such as
vertex buffers that are 8-byte, but not 16-byte aligned. Real
devices/drivers always return 16-byte aligned buffers. Since I use SSE
instructions requiring 16-byte alignment, the reference device is
worthless to me.

The consequence of all of this is that, like so many of the rest of you,
I've had to burn days tracking down an answer because the MS docs really
suck.

Regards,
Jim
