I need to ask permission from my employer before I can post the shader; however, I can give some immediate info without divulging anything sensitive. I've managed to narrow down the offending vertex shader, which has some complicated logic to decide if something should be collapsed. What makes the story more... messy is that commenting out a particular block of shader code makes it work on the Direct3D backend in all situations, while in another situation the shader works anyway with that same block left in. The ugly part is that there is quite a bit of control flow and dynamic looping going on, but the difference between where it works and where it does not is in code that comes entirely after the block in question and is not nested in it in any way.
The offending shader is also quite large (as it is partially machine assembled); I can try to look at the debug shader value (though a simple print of it might flood the console).
As for the UBO dynamic index access, I do this in the shader code:
    void loadFoo(in uint location, out Foo foo)
    {
        uvec4 tmp;
        tmp = rawLoad(location);
        foo.bar1 = uintBitsToFloat(tmp.x); // get a float
        foo.bar2 = tmp.y; // get a uint
        // grab from tmp.z and tmp.w
        // if it is more than 16 bytes,
        tmp = rawLoad(location + 1u);
        foo.bar5 = // take from tmp
    }
Then, when using UBOs, rawLoad() is just this:
    uniform UBO
    {
        uvec4 data[N]; // N chosen so that the UBO is smaller than the max size allowed for a UBO
    };
    uvec4 rawLoad(uint L) { return data[L]; }
whereas when I use textures instead I do this little nightmare:
    uniform highp usampler2D data;
    uvec4 rawLoad(uint L)
    {
        uvec2 xy;
        xy.x = L & 2047u; // column: low 11 bits
        xy.y = L >> 11u;  // row: remaining high bits
        return texelFetch(data, ivec2(xy), 0);
    }
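To make the addressing explicit, here is the same lookup as a rough sketch with the width assumption spelled out. The constant names are made up for illustration, and the 2048-texel width is only inferred from the mask and shift above. As a worked case, L = 5000 lands at x = 904, y = 2, since 2 * 2048 + 904 = 5000.

```glsl
// Sketch only: kLog2Width and kWidthMask are hypothetical names.
// Assumes the backing texture is exactly 2048 texels wide.
const uint kLog2Width = 11u;                      // 2^11 = 2048
const uint kWidthMask = (1u << kLog2Width) - 1u;  // 2047u

uniform highp usampler2D data;

uvec4 rawLoad(uint L)
{
    // Linear index -> (column, row) in the 2048-wide texture.
    ivec2 xy = ivec2(int(L & kWidthMask), int(L >> kLog2Width));
    return texelFetch(data, xy, 0);
}
```

The host side would have to upload the uvec4 array row by row at the same width for the two to agree.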
Only the Metal backend requires the texture path; the Direct3D backend does not. As to why I load the values directly this way even with UBOs working: a long time ago I was examining shader assemblies on Intel GPUs and found that the compiler issued far more load messages when I used formatted UBO data than when I did the loads myself, up to 16 bytes at a time.
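For contrast, this is roughly what I mean by "formatted" UBO data: declaring the struct members directly in the block and letting the compiler generate the loads. It is only a sketch. The names bar3, bar4, FormattedUBO, foos, loadFooFormatted and the length M are placeholders, and it covers just the first 16 bytes of a Foo.

```glsl
// Hypothetical formatted layout, for comparison with rawLoad() only.
#define M 1024  // placeholder array length

struct Foo
{
    float bar1;
    uint  bar2;
    uint  bar3;  // placeholder name
    uint  bar4;  // placeholder name
};

layout(std140) uniform FormattedUBO
{
    Foo foos[M];  // four 4-byte scalars, so each element is 16 bytes under std140
};

void loadFooFormatted(in uint location, out Foo foo)
{
    // The compiler decides how to fetch each member here,
    // instead of one explicit 16-byte load as in rawLoad().
    foo = foos[location];
}
```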
The place where performance seems to suffer is when I use uber-shading. Because the scenes I am drawing are 2D SVG content, I cannot reorder the draws by shader, and there are zillions of items to draw, with what and how to draw changing constantly. Using uber-shading on native OpenGL for these scenes often gives an improvement of over 20%, so it is worthwhile... and because the cross-process nature of WebGL2 implementations makes GL calls even heavier, it is potentially more worthwhile in browsers too.