Hi All,
This post complements our previous post "[angleproject] Solving slow compilation (and eventual fail) in complex shaders (with patch)" with a related but different problem. Previously we discussed a solution for the slow compilation of long loops: preventing their unrolling. It has been said that Chrome 12 will compile shaders without unrolling to improve this. We have tested Chrome 12 and it certainly improves fractal.io as much as our solution with "[loop] [fastopt] for (...)".
Now, there are other cases, such as that reported by John Davis (http://www.pcprogramming.com/flight.html), and others, where loops need to do texture sampling (i.e. texture2D(tex, uv); ).
The problem here seems to be the following, or something similar: texture2D() translates to tex2D() in HLSL. tex2D is said to be a "gradient instruction" because it needs screen-space derivatives to select a mipmap level (even if we set MIN_FILTER to LINEAR!). HLSL does not allow gradient instructions in true loops (at least in loops with "break", which I can't find in that demo (?)), so upon seeing that call, DX forces an unrolling, even if Chrome 12 is trying to avoid that. The result is a very long compile time, and possibly an error if the loop is too long to be unrolled.
In HLSL, mipmapping can be avoided by using tex2Dlod(tex, float4(uv, 0, 0)) [the last component, w, being the mipmap level chosen]. Great. Is there a similar function in GLSL? Yes: texture2DLod(). Can the shader developer just switch to it? No: GLSL ES only allows texture2DLod() in vertex shaders, not fragment shaders.
A solution would be for ANGLE to emit an HLSL "tex2Dlod(...)" instruction with the LOD fixed to 0 from a source GLSL "texture2D(...)". We tested this, again in a custom-built ANGLE library, and it worked great for our application, which does heavy texture sampling inside long loops.
Can that translation always be done, as in our naive fix? Not really. Plenty of applications need correct mipmapped sampling for good minification of textures. The translation then needs to be done selectively, only where necessary. The rule could be to check whether the texture2D call is inside a loop, or inside a loop of more than N iterations [the first seems easier to do]:
texture2D(a,b); out of loop => tex2D(a,b);
texture2D(a,b); inside loop => tex2Dlod(a, float4(b, 0, 0));
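The mapping above can be sketched in C++ as follows. This is only an illustration of the "inside a loop" rule, not ANGLE's actual translator code: the function name translateTexture2D and the loopDepth parameter are hypothetical, and we assume the translator can tell how many loops enclose the call. Note that HLSL's tex2Dlod takes the coordinate as a float4 whose w component is the mipmap level.

```cpp
#include <string>

// Hypothetical helper, NOT ANGLE's real translator code.
// loopDepth: number of loops enclosing the texture2D() call (0 = not in a loop).
std::string translateTexture2D(const std::string &sampler,
                               const std::string &coord,
                               int loopDepth)
{
    if (loopDepth > 0)
    {
        // Explicit LOD 0 needs no screen-space gradient, so the HLSL
        // compiler is not forced to unroll the enclosing loop.
        return "tex2Dlod(" + sampler + ", float4(" + coord + ", 0, 0))";
    }
    // Outside loops, keep the regular mipmapped lookup.
    return "tex2D(" + sampler + ", " + coord + ")";
}
```

Calling translateTexture2D("a", "b", 0) gives the regular tex2D form, while any loopDepth greater than 0 gives the tex2Dlod form.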
We propose this change or equivalent fixes as we definitely need sampling in loops, following the necessary testing so it does not break anything. Pure OpenGL works with no problems, BTW.
For reference, we are using ANGLE SVN trunk (rev 598), compiled under VS2008, and then replacing the resulting DLL in FF4. I have upgraded the MS Platform SDK and the DX SDK to the latest versions, on a Win7 64-bit PC with an nVidia GTX485. BTW, what is different in Chrome 12? Doesn't it use regular SVN-trunk ANGLE?
The GLSL ES specification does not guarantee that all loops with dynamic
indexing will compile/link successfully. See sections 10.25, 10.35, and
Appendix A item 5. So strictly speaking ANGLE is conformant, and even certain
code which does compile successfully with ANGLE may not work as expected on
other platforms.
That said, the reason HLSL does not allow functions which compute
screen-space gradients inside dynamic loops is because these gradients would
be unpredictable when neighboring pixels take different branches. Unrolled
loops don't suffer from this because even if logically different branches
are taken, all paths are still guaranteed to be executed so there are
meaningful gradients.
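To make the point concrete, here is a toy C++ model of two horizontally
adjacent pixels running the same loop in lockstep. Everything in it is an
assumption for illustration: real hardware works on 2x2 quads, and the names
Pixel, coordAt and horizontalGradient are made up. It assumes a pixel that has
left the loop simply keeps its last coordinate.

```cpp
#include <algorithm>

// Toy model only; not how any particular GPU is specified to behave.
struct Pixel
{
    float u;        // texture coordinate before the loop
    int tripCount;  // iterations this pixel executes before leaving the loop
};

// Coordinate a pixel holds when the lockstep loop reaches `iteration`.
// Once a pixel has left the loop its coordinate stops advancing.
float coordAt(const Pixel &p, int iteration, float step)
{
    return p.u + step * static_cast<float>(std::min(iteration, p.tripCount));
}

// Finite-difference gradient between two neighboring pixels.
float horizontalGradient(const Pixel &left, const Pixel &right,
                         int iteration, float step)
{
    return coordAt(right, iteration, step) - coordAt(left, iteration, step);
}
```

With equal trip counts the gradient stays at the initial spacing between the
two pixels. If the right-hand pixel leaves the loop early, the gradient seen
by the left-hand pixel drifts with every further iteration, which is the
unpredictability described above.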
The GLSL ES spec doesn't say how gradients should be computed when an
implementation does support dynamic loops, but the desktop GLSL spec
(section 8.10.1) states that "derivatives within nonuniform
control flow are undefined". This is a significant caveat because even if
your shader does compile successfully, you're not guaranteed to get the same
result on different platforms. Therefore it's questionable how useful this
extended ability is in the first place. Note that HLSL really does a best
effort at creating repeatable results, but you simply might run out of
instruction slots supported by the hardware.
In theory non-mipmapped texture lookups could be supported by implementing
them using tex2Dlod, but this would significantly complicate things because
there can be many different versions of the same shader. In my humble
opinion the advantage does not outweigh the effort. As far as I can tell
only convoluted tech demos might use this functionality. I sincerely doubt
that any Shader Model 3.0 game on the market had to resort to using tex2Dlod
because it required texture lookups in dynamic loops...
Anyhow, there might be a way around the HLSL limitation to get the same
undefined behavior as GLSL, by manipulating the shader binary. As far as I
know, assembly shaders don't impose any limits on the use of instructions
which compute gradients. So you could use tex2Dlod (texldl) everywhere, and
then after the shader binary has been generated, change them back to regular
texld instructions. The binary format is well defined:
http://msdn.microsoft.com/en-us/library/ff552891(v=VS.85).aspx. If you can
get this to work without drawbacks, we might consider exposing it as an
extension. It has a high hackiness level though, so no guarantees.
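A rough, untested-on-real-shaders sketch of that patching step in C++ could
look like the code below. The assumptions are loud ones: the opcode values 66
(texld) and 95 (texldl) are what I recall from d3d9types.h and must be
checked against the headers, the token layout (opcode in bits 0-15,
instruction length in bits 24-27, comment length in bits 16-30) is assumed to
be the Shader Model 2+ one from the documentation linked above, and the
function name patchTexldlToTexld is made up.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// ASSUMED values, recalled from d3d9types.h; verify before relying on them.
constexpr std::uint32_t kOpTexld   = 66;          // D3DSIO_TEX
constexpr std::uint32_t kOpTexldl  = 95;          // D3DSIO_TEXLDL
constexpr std::uint32_t kOpComment = 0xFFFE;      // comment token opcode
constexpr std::uint32_t kEndToken  = 0x0000FFFF;  // end-of-shader token

// Rewrites every texldl opcode to texld in place (hypothetical helper).
// Returns the number of instructions patched.
int patchTexldlToTexld(std::vector<std::uint32_t> &tokens)
{
    int patched = 0;
    std::size_t i = 1;  // token 0 is the version token
    while (i < tokens.size() && tokens[i] != kEndToken)
    {
        const std::uint32_t token  = tokens[i];
        const std::uint32_t opcode = token & 0xFFFFu;
        if (opcode == kOpComment)
        {
            // Skip the comment body; its DWORD count sits in bits 16-30.
            i += 1 + ((token >> 16) & 0x7FFFu);
            continue;
        }
        if (opcode == kOpTexldl)
        {
            // Keep the upper 16 bits (length, flags), swap only the opcode.
            tokens[i] = (token & 0xFFFF0000u) | kOpTexld;
            ++patched;
        }
        // Skip this instruction's operand tokens (count in bits 24-27).
        i += 1 + ((token >> 24) & 0xFu);
    }
    return patched;
}
```

Walking instruction by instruction, rather than scanning for the opcode
value, avoids touching comment data or float constants that happen to contain
the same bits.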
Cheers,
Nicolas
> That said, the reason HLSL does not allow functions which compute
> screen-space gradients inside dynamic loops is because these gradients would
> be unpredictable when neighboring pixels take different branches.
> [...]
> I sincerely doubt that any Shader Model 3.0 game on the market had to resort to using tex2Dlod
> because it required texture lookups in dynamic loops...
You are right. That's why we propose to find a way to apply a smart
selective change.
Let me state the problem in a different way:
We are not talking about saving several seconds as in the flight.html
demo. In fact, we faced this problem more severely. Our application,
with moderate iteration settings, takes more than 1 minute trying to
compile (with a few "your script is taking too long" alerts) and
finally fails. I don't have the numbers here, but I think we have to
hit "stop script" after several minutes.
Our fix makes compilation possible, and in only a couple of seconds,
the same as native GLSL. Rendering itself is acceptably fast after
that for such a shader.
So, yes, a rule has to be chosen that applies the tex2Dlod() trick
only when necessary, so that it does not break other cases. Some
suggestions:
- texture reads inside loops
- texture reads inside loops longer than N iterations (N=10?)
- texture reads inside loops when the texture has a MIN_FILTER=LINEAR
or NEAREST, ...
That last idea might be the safest. Is it possible in Angle to know
the texture filtering mode? (texParameteri TEXTURE_MIN_FILTER) If the
mode for the relevant texture is LINEAR or NEAREST (not _MIPMAP_) then
using tex2Dlod is just fine, right?
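As a sketch of that last rule in C++ (the function names are made up
and this is not existing ANGLE code; the enum values are the standard
GL_NEAREST and GL_LINEAR constants from the GL headers):

```cpp
#include <cstdint>

constexpr std::uint32_t kGLNearest = 0x2600;  // GL_NEAREST
constexpr std::uint32_t kGLLinear  = 0x2601;  // GL_LINEAR

// True for the four *_MIPMAP_* minification filters (hypothetical helper).
bool minFilterUsesMipmaps(std::uint32_t minFilter)
{
    return minFilter != kGLNearest && minFilter != kGLLinear;
}

// Use the tex2Dlod form only for lookups inside a loop on textures whose
// TEXTURE_MIN_FILTER never reads any mipmap level other than level 0.
bool canUseExplicitLod0(std::uint32_t minFilter, int loopDepth)
{
    return loopDepth > 0 && !minFilterUsesMipmaps(minFilter);
}
```

One caveat: the filter is texture state that is only known at draw
time, not when the shader is compiled, so this rule probably means
keeping more than one compiled version of the same shader.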
This sounds like the best approach to me, and given the severity of
the problem, I think this is a workaround we should implement in ANGLE
as soon as possible.
Alvaro, would it be possible for you to provide a patch for your
changes to ANGLE implementing this alternate translation? It would be
ideal if you could produce and upload your patch using gclient / gcl
so that it can be easily reviewed.
Thanks,
-Ken