Hi!
We have noticed that complex shaders with relatively long loops
take a very long time to compile. An example is
http://fractal.io/. As
the number of iterations increases, compile times grow very long.
Using the default values (Menger Sponge with 8 iterations and 60 max
steps) on Chrome or FF4 with Angle as backend is significantly slower
than the same example with Angle disabled (pure GLSL) [1]. Compilation
without Angle is also very fast.
Increasing iterations to 30 leads to a compilation failure (Error
1281) with Angle, but it works fine and runs smoothly without Angle.
Actually, the problem occurs when the shader program is linked
[gl.linkProgram()], which triggers the conversion to DirectX and the
actual compilation of the HLSL. We suspect that this final compilation
by the DX runtime is what takes so much time.
It seems DX is trying to unroll all loops, which for long loops is
apparently a lot of work. For very long loops it simply fails with the
message "unable to unroll loop". This is odd, because our hardware
(common today) is perfectly able to run real loops with an iteration
counter.
Therefore, we inspected the Angle source to find out how the HLSL
code is generated, and we found that prepending "[fastopt] [loop]" [2]
to the "for" statement in the GLSL->HLSL translation solves this
issue. We made this change, built the Angle libs (libEGL*.dll) and
replaced Firefox's copies to test.
[loop] tells the HLSL compiler that we want real loops, and prevents
unrolling. This alone clearly improved compile times, but not as much
as pure GL. [fastopt] tells the compiler not to attempt optimizations.
Adding it as well improves compile time very significantly, to nearly
the same as GL.
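
To make the effect on the generated HLSL concrete, here is a minimal
C++ sketch of how a loop header could be built with or without the two
attributes. hlslForHeader and its parameters are hypothetical names
used only for illustration; they are not part of Angle.

```cpp
#include <string>

// Hypothetical helper (not an Angle function): builds the text of an
// HLSL "for" header, optionally prefixed with the attributes above.
std::string hlslForHeader(const std::string &init,
                          const std::string &cond,
                          const std::string &step,
                          bool keepLoop,  // emit [loop]: ask for a real loop
                          bool fastOpt)   // emit [fastopt]: favor compile time
{
    std::string out;
    if (fastOpt)
        out += "[fastopt] ";
    if (keepLoop)
        out += "[loop] ";
    out += "for(" + init + "; " + cond + "; " + step + ")";
    return out;
}
```

With both flags set, a 60-step loop header would come out as
"[fastopt] [loop] for(int i = 0; i < 60; i++)", which is the form the
patch below makes the translator emit.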
We could open a new issue in the bug tracker if you think it is
worthwhile. The change works for this problem, but its implications
should probably be studied.
Runtime performance is very good and doesn't seem to be affected by
the lack of unrolling or optimizations. We are not sure whether this
would prevent the code from running on simpler hardware that does not
support real loops (complex shaders will fail there anyway).
The suggestion would be either to always use "[fastopt] [loop] for
(...)" or to do so only if the number of iterations is known to be
quite high. Keep in mind that loops can be nested, so total iteration
counts multiply.
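
The second option could be sketched roughly as follows. This is only
an illustration of the idea: the function names and the threshold
value are assumptions of ours, not something taken from the Angle
source, and a sensible threshold would have to be measured.

```cpp
#include <cstdint>
#include <limits>
#include <vector>

// Hypothetical threshold, chosen arbitrarily for this sketch.
const std::uint64_t kMaxUnrolledIterations = 64;

// Multiplies the trip counts of a loop nest (outermost first),
// saturating at the maximum value instead of overflowing.
std::uint64_t totalIterations(const std::vector<std::uint64_t> &tripCounts)
{
    const std::uint64_t maxValue = std::numeric_limits<std::uint64_t>::max();
    std::uint64_t total = 1;
    for (std::uint64_t n : tripCounts)
    {
        if (n != 0 && total > maxValue / n)
            return maxValue;
        total *= n;
    }
    return total;
}

// True when the nest is long enough that the attributes should be emitted.
bool shouldPreventUnrolling(const std::vector<std::uint64_t> &tripCounts)
{
    return totalIterations(tripCounts) > kMaxUnrolledIterations;
}
```

For example, if the default fractal.io settings produced an 8-iteration
loop nested inside a 60-step loop (the actual shader structure may
differ), the total would be 480 iterations, well above the threshold
assumed here.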
We include the patch here, but we are not sure whether it will break
other things, so consider it a quick and specific solution for the
http://fractal.io/ site and similar long-loop shaders, although the
slow compilation issue can be found with other examples.
Index: OutputHLSL.cpp
===================================================================
--- OutputHLSL.cpp (revision 594)
+++ OutputHLSL.cpp (working copy)
@@ -1478,7 +1478,7 @@
mUnfoldSelect->traverse(node->getExpression());
}
- out << "for(";
+ out << "[fastopt] [loop] for(";
if (node->getInit())
{
@@ -1720,7 +1720,7 @@
// for(int index = initial; index < clampedLimit; index += increment)
- out << "for(int ";
+ out << "[fastopt] [loop] for(int ";
index->traverse(this);
out << " = ";
out << initial;
[1] Angle is disabled by setting webgl.prefer-native-gl to TRUE in
about:config
[2] HLSL for statement:
http://msdn.microsoft.com/en-us/library/bb509602%28v=vs.85%29.aspx