Direct3D backend performs poorly or incorrectly with complicated shaders

Kevin Rogovin

Apr 8, 2022, 7:07:35 AM
to angleproject
Hello all,

I am developing a library whose shaders have complicated flow control, including dynamic loops and nested conditionals (though not terribly deep). I have the following results on various platforms:

  1. MacOS (M1 GPU and AMD GPU)
    1. native OpenGL (runs perfectly)
    2. Chrome's WebGL2 via OpenGL (runs perfectly)
    3. Chrome's WebGL2 via Metal (works if I use a texture instead of UBO's that contain arrays)
  2. Linux (Intel GPU, AMD GPU and NVIDIA GPU)
    1. native OpenGL (runs perfectly)
    2. Chrome's WebGL2 via OpenGL (runs perfectly)
  3. MS-Windows (NVIDIA GPU)
    1. native OpenGL (runs perfectly)
    2. Chrome's WebGL2 via OpenGL (runs perfectly)
    3. Chrome's WebGL2 via Direct3D (works with poor performance mostly if I use a texture instead of UBO's that contain arrays)
I do not (yet) have tests on AMD or Intel GPU on MS-Windows, but I anticipate the same results, since the Linux drivers for Intel GPU are usually slower than the MS-Windows drivers, and the situation is the same for AMD.

What I would like to advocate is that ANGLE on desktop MS-Windows move to OpenGL as its default backend. OpenGL drivers have come an incredibly long way since ANGLE was first written, and I have found that the OpenGL backend outperforms (and works more reliably than) the translation to Direct3D.

Jamie Madill

Apr 8, 2022, 9:58:09 AM
to kevinr...@invisionapp.com, angleproject
Curious how we perform using the Vulkan back-end. Can you give it a try on Windows and/or Linux and let us know how it compares? Note that MacOS would need MoltenVK, which we don't currently support.

--
You received this message because you are subscribed to the Google Groups "angleproject" group.
To unsubscribe from this group and stop receiving emails from it, send an email to angleproject...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/angleproject/ec6ed4b8-eb7e-4db7-872c-90380fc820cen%40googlegroups.com.

Kevin Rogovin

Apr 8, 2022, 11:12:40 AM
to angleproject
Even after enabling Vulkan in chrome://flags, there was no option to use Vulkan backend in ANGLE. Is there something I am missing to get Chrome to use Vulkan for ANGLE?

Shahbaz Youssefi

Apr 8, 2022, 11:36:42 AM
to angleproject
I believe the chrome://flags one is for Chrome to use Vulkan for compositing, rather than ANGLE using Vulkan for WebGL. You can run Chrome with `--use-angle=vulkan` to try this.

Regarding GL on Windows, it was deemed not worth the effort, with a potential eventual switch to something even better (i.e. Vulkan). Context: crbug.com/693090

Kevin Rogovin

Apr 8, 2022, 12:14:48 PM
to angleproject
I have to admit that for drivers and GPU's on MS-Windows that are no more than 5 years old (and really likely 10), OpenGL drivers have been very solid (at the very least more solid than translating to Direct3D). As for emulating GL on top of Vulkan, there are a fair number of heartaches that will harm performance as well (the worst one that comes to mind is which vertex of a primitive supplies the value for flat varyings), and a fair amount of pain is needed for so many pipeline state objects. I advocate that at least enabling GL as the ANGLE default on NVIDIA cards (of really any generation) on MS-Windows is best. I strongly suspect that using it for Intel GPU's (at least since Gen9, possibly as far back as Gen7) is also fine, and that AMD cards as far back as 7 years ago are very safe.

I have a massive compatibility headache on my hands from this. I have valid shaders that run fine elsewhere, but fail to compile in the Direct3D backend or produce utter weirdness in their output. Also, dynamically indexing into an array in a UBO is currently quite unreliable in the Direct3D backend (and in the Metal backend too).

I find myself writing workarounds and taking a performance hit from this too....

I'll give Vulkan a whirl shortly (on MS-Windows and Linux), but I know that on Intel GPU's Mesa's shader compiler for GL has more optimisations enabled than the SPIR-V side, so I am not very hopeful about the Vulkan path either.

Jasper St. Pierre

Apr 8, 2022, 12:29:35 PM
to Kevin Rogovin, angleproject
Have you looked into why the D3D11 backend is slower? Switching to the
OpenGL backend for the default is going to be a non-starter on many
systems; it does not integrate well with DirectComposition and
multimedia playback.

I'm fully prepared to believe there's a bug in the shader translator,
or the D3D11 runtime, or your shaders are using features which are
just going to perform poorly on other modern GPUs. Using
WEBGL_debug_shaders, you should be able to look at the translated
shader source and start to profile it. Would you be able to post a
shader here for investigation?

--
Jasper

Kevin Rogovin

Apr 8, 2022, 2:23:04 PM
to angleproject
I need to ask permission from my employer before I can post the shader; however, I can give some immediate info without divulging anything sensitive. I've managed to narrow it down to the offending vertex shader, which has some complicated logic to decide if something should be collapsed. What makes the story more... messy is that commenting out a particular block of shader code makes it work on the Direct3D backend in all situations, while in another situation the same block of shader code works anyway. The ugly part is that there is quite a bit of control flow and dynamic loops going on, but the difference between where it works and where it does not is in code that comes entirely after (and is not nested in any way with) the block that has to be commented out.

The offending shader is also quite large (as it is partially machine assembled); I can try to look at the debug shader value (though a simple print of it might flood the console).

As for the UBO dynamic index access, I do this in the shader code:

void loadFoo(in uint location, out Foo foo)
{
  uvec4 tmp;

  tmp = rawLoad(location);
  foo.bar1 = uintBitsToFloat(tmp.x); // get a float
  foo.bar2 = tmp.y; // get a uint
  // grab from tmp.z and tmp.w

  // if it is more than 16 bytes,
  tmp = rawLoad(location + 1u);
  // foo.bar5 = (take from tmp)
}

then using UBO's the rawLoad() is just this:

uniform UBO
{
  uvec4 data[N]; //N so that UBO is smaller than max size allowed for a UBO
};
uvec4 rawLoad(uint L) { return data[L]; }

whereas when I use textures instead I do this little nightmare:

uniform highp usampler2D data;
uvec4 rawLoad(uint L)
{
  uvec2 xy;

  xy.x = L & 2047u;
  xy.y = L >> 11u;

  return texelFetch(data, ivec2(xy), 0);
}

Only the Metal backend requires this; the Direct3D backend does not. As to why I load the values directly this way even with UBO's working: a long time ago I was examining shader assemblies on Intel GPU's and found that the driver issued far more load messages when I used formatted UBO data than when I did the loads myself, up to 16 bytes at a time.
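
For readers following along, the texture-backed rawLoad() above can be mirrored on the CPU side. This is only a minimal sketch, assuming the data texture is 2048 texels wide (which is what the & 2047 mask and >> 11 shift imply). The names texelCoord, linearIndex, floatToBits and bitsToFloat are hypothetical helpers and not part of the library under discussion; the last two play the role that uintBitsToFloat() plays in the shader.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Assumption: the data texture is 2048 texels wide, matching the
// `& 2047` mask and `>> 11` shift used by the shader's rawLoad().
constexpr std::uint32_t kLog2Width = 11u;
constexpr std::uint32_t kWidth = 1u << kLog2Width;  // 2048

struct TexelCoord {
    std::uint32_t x;
    std::uint32_t y;
};

// Map a linear uvec4 index to the texel that holds it.
inline TexelCoord texelCoord(std::uint32_t location) {
    return TexelCoord{location & (kWidth - 1u), location >> kLog2Width};
}

// Inverse mapping: texel coordinate back to the linear uvec4 index.
inline std::uint32_t linearIndex(TexelCoord c) {
    assert(c.x < kWidth);
    return (c.y << kLog2Width) | c.x;
}

// Store a float's bit pattern in a 32-bit word, so that the shader can
// recover the value with uintBitsToFloat().
inline std::uint32_t floatToBits(float v) {
    static_assert(sizeof(float) == sizeof(std::uint32_t),
                  "this sketch assumes 32-bit floats");
    std::uint32_t bits;
    std::memcpy(&bits, &v, sizeof bits);
    return bits;
}

// CPU-side counterpart of uintBitsToFloat().
inline float bitsToFloat(std::uint32_t bits) {
    float v;
    std::memcpy(&v, &bits, sizeof v);
    return v;
}
```

An uploader could call floatToBits() when filling the .x word of a packed Foo and texelCoord() to find where each uvec4 lands in the texture; the same packed words work unchanged for the UBO path, where the linear index is used directly.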

The place where performance seems to suffer is when I use uber-shading. The scenes I am drawing are 2D SVG content, so I cannot reorder draws by shader, and there are zillions of items to draw with what/how to draw changing excessively. Using uber-shading on native OpenGL for these scenes often gives an improvement of over 20%, so it is worthwhile... and because the cross-process nature of WebGL2 implementations makes GL calls even heavier, it is potentially even more worthwhile in browsers.

Kevin Rogovin

Apr 8, 2022, 2:25:36 PM
to angleproject
One last comment: I strongly suspect that the trigger is complicated control flow in the shaders, this is my hunch.

Kevin Rogovin

Apr 8, 2022, 6:22:43 PM
to angleproject
Um, this is somewhat embarrassing, but how do I get the translated shaders from C++ code compiled via Emscripten?

When I build for native and link against ANGLE, I fetch the symbol glGetTranslatedShaderSourceANGLE and call it. However, trying to fetch that function via emscripten_webgl_get_proc_address() fails, even though the extension GL_WEBGL_debug_shaders is listed in the extension list. Out of desperation I also tried glGetTranslatedShaderSource. In addition, I tried skipping the function pointer dance by adding #define GL_GLEXT_PROTOTYPES before including GLES3/gl3.h and GLES2/gl2ext.h; although it compiles, Emscripten fails to link, saying the symbol glGetTranslatedShaderSourceANGLE does not exist.

Anyone have hints on this?

Shahbaz Youssefi

Apr 9, 2022, 1:48:51 AM
to angleproject
On Friday, April 8, 2022 at 2:23:04 PM UTC-4 kevinr...@invisionapp.com wrote:
> As to why I load the values directly this way even with UBO's working: a long time ago I was examining shader assemblies on Intel GPU's and found that the driver issued far more load messages when I used formatted UBO data than when I did the loads myself, up to 16 bytes at a time.

Are you familiar with the std140 and std430 layouts? It very much sounds like you were using the std140 layout with float/int/etc types.


> Um, this is somewhat embarrassing, but how do I get the translated shaders from C++ code compiled via Emscripten?

Sorry not familiar with any of this. My suggestion would be to put the slow shader in an ANGLE end2end test like in GLSLTests.cpp, and debug that way. Then you can just printf the translated shader if all else fails.

Kevin Rogovin

Apr 9, 2022, 4:54:51 PM
to angleproject
std430 is not available in WebGL2/GLES3, which is a shame since its packing rules result in tighter packing; std140, however, is available. The packing rules for std140 can be subtle at times: in times past GL drivers sometimes got them wrong, and even now they are easy to get wrong by hand (though the usual tag line of making stuff 16-byte aligned is an easy rule of thumb). That is another reason why I just made the UBO's arrays of uvec4's and do the unpacking of data by hand. Checking my shaders, I do declare them as:

layout(std140) uniform UBO { uvec4 data[N]; };
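
To make the 16-byte rule of thumb concrete, here is a small C++ sketch of the std140 array stride for scalars and vectors of 32-bit components. It is a simplification under stated assumptions: matrices, structs and nested arrays have further rules that are not modelled, and the function names (std140BaseAlignment, std140ArrayStride, std140ArrayBytes, tightlyPackedBytes) are hypothetical helpers made up for this illustration.

```cpp
#include <cassert>
#include <cstddef>

// Base alignment in bytes of a scalar (1 component) or vector
// (2 to 4 components) with 32-bit components under std140.
inline std::size_t std140BaseAlignment(std::size_t components) {
    assert(components >= 1 && components <= 4);
    if (components == 1) return 4;   // float, int, uint
    if (components == 2) return 8;   // vec2 and friends
    return 16;                       // vec3 and vec4 both align to 16
}

// In std140 the stride of an array is the element's base alignment
// rounded up to the base alignment of a vec4 (16 bytes).
inline std::size_t std140ArrayStride(std::size_t components) {
    const std::size_t alignment = std140BaseAlignment(components);
    return (alignment + 15) / 16 * 16;
}

// Bytes occupied by an array of `count` elements under std140.
inline std::size_t std140ArrayBytes(std::size_t components, std::size_t count) {
    return std140ArrayStride(components) * count;
}

// Bytes the same data would take with no padding at all.
inline std::size_t tightlyPackedBytes(std::size_t components, std::size_t count) {
    return components * 4 * count;
}
```

Under these rules a float array pays 16 bytes per element, while an array of uvec4 has no padding at all, which is one way to read the choice of uvec4 data[N] with manual unpacking.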

The issue is that I need my stuff to work in -browsers-, which means I need to see how the ANGLE in current browsers behaves. I hope that as bugs in ANGLE are fixed those updates find their way into browsers, but I know there can be a long lead time. My hope is that the OpenGL backend will become the default since the translation is more reliable... though it looks like that is a thin hope because of the better integration for video and compositing with Direct3D. In that light, a Vulkan translation does not look like it will be helpful either.

I think I have a way to get access to the WebGL shaders objects in Emscripten by accessing GL.shaders[] in https://github.com/emscripten-core/emscripten/blob/main/src/library_webgl.js#L192 and using the WebGL extension directly.

I do admit I am very frustrated as a developer in this case. The shaders I am writing are involved, and in addition to checking that they work with all drivers I can get ahold of (and they have so far), I now also need to check how the shaders are translated to Direct3D11 and Metal and how those translations run on different drivers. It just multiplies my workload, and because of the translation it can be quite tricky to work around bugs/issues hit by my shaders.

Kevin Rogovin

Apr 11, 2022, 9:22:44 AM
to angleproject
I managed to extract the translated source code. I tested on both current Chrome and current Edge. For the shaders that fail, the translated code is empty, and the compile log and link log are also empty.

For posterity, this touch of C code will let one get the translated shader code for an application compiled by emscripten:

EM_JS(char*, get_translated_shader_source, (int32_t nm), {
    // GL.shaders[] maps GL shader names to WebGLShader objects.
    var ext = GLctx.getExtension('WEBGL_debug_shaders');
    if (ext) {
      var jsString = ext.getTranslatedShaderSource(GL.shaders[nm]);
      // Copy the JS string onto the wasm heap; the caller must free() it.
      var lengthBytes = lengthBytesUTF8(jsString) + 1;
      var stringOnWasmHeap = _malloc(lengthBytes);
      stringToUTF8(jsString, stringOnWasmHeap, lengthBytes);
      return stringOnWasmHeap;
    } else {
      // Extension not available.
      return 0;
    }
  });

The above uses the undocumented but very stable GL variable in Emscripten's GL-binding to WebGL to go from GL-names to WebGLShader objects.

The use of the function get_translated_shader_source() is this:

char *translated_code = get_translated_shader_source(shader_name);
if (translated_code) {
    // do something with the C-style string

    // remember to free it too
    free(translated_code);
}

I suppose the next step would be to get the version of ANGLE that makes the bad code and feed the shaders to it. Is there a way to get the Direct3D translation without building (or, for that matter, running) on MS-Windows? My development environment is heavily MacOS and Linux.

Best Regards,
 -Kevin

Shahbaz Youssefi

Apr 12, 2022, 1:45:04 AM
to angleproject
On Monday, April 11, 2022 at 9:22:44 AM UTC-4 kevinr...@invisionapp.com wrote:
> I suppose the next step would be to get the version of ANGLE that makes the bad code and feed the shaders to it. Is there a way to get the Direct3D translation without building (or, for that matter, running) on MS-Windows? My development environment is heavily MacOS and Linux.

I recall having built with HLSL support on Linux in the past by adding this to args.gn: `angle_enable_hlsl=true`. You would need to hack the code to actually use it though, because without the d3d backend, no one is calling into the HLSL parts of the translator. Alternatively, and this is totally untested, you might find success by cross-compiling for windows on Linux, then run it under wine! To do that, you need to add `target_os = [ 'win' ]` to the bottom of the .gclient file, call gclient sync and add this to args.gn: `target_os = "win"`.

Kevin Rogovin

Apr 12, 2022, 2:40:54 AM
to angleproject
Does ANGLE build correctly with the Msys2 tool chain? I can try that....

Kevin Rogovin

Apr 12, 2022, 8:27:09 AM
to angleproject
So, I've managed to make a diff to make the thing not crash the Direct3D backend of ANGLE (in both Edge and Chrome):

- if (tmp > 0.0 && dash.m_first_interval >= 0.0 && float(int(tmp)) == tmp)
+ if (tmp > 0.0 && dash.m_first_interval >= 0.0 && floor(tmp) == tmp)

This code is part of a function and there are other shaders that do NOT crash the Direct3D backend that use this function without the above change.
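
For reference, the two forms of the whole-number test in the diff can be compared on the CPU. This is a minimal C++ sketch with hypothetical function names; it says nothing about how ANGLE or the HLSL compiler treat either expression, and it assumes the value is finite and within the range of int (outside that range the float-to-int cast is undefined behaviour in C++).

```cpp
#include <cassert>
#include <cmath>

// Mirrors `float(int(tmp)) == tmp` from the original shader line.
// Precondition: tmp is finite and its magnitude is below 2^31, so that
// the conversion to int is well defined in C++.
inline bool isWholeViaIntCast(float tmp) {
    assert(std::fabs(tmp) < 2147483648.0f);
    return static_cast<float>(static_cast<int>(tmp)) == tmp;
}

// Mirrors `floor(tmp) == tmp` from the replacement line.
inline bool isWholeViaFloor(float tmp) {
    return std::floor(tmp) == tmp;
}
```

Within that range both functions simply ask whether tmp is a whole number, so the diff would not be expected to change the shader's result; that is consistent with the difference lying in how the expression is handled further down the toolchain rather than in its meaning.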

I don't know if the above gives enough hints. I will know in less than 48 hours if I can post the shader that crashes the Direct3D backend of ANGLE (at least the version of ANGLE that is in current Chrome and Edge).

-Kevin.

P.S. Fun fact: if the Direct3D backend of Edge crashes enough (without restarting Edge), Edge will switch over to the GL backend.

Ken Russell

Apr 12, 2022, 2:26:26 PM
to Kevin Rogovin, angleproject
Can you go to Chrome's about:crashes page and tell us any uploaded crash IDs from there? It would be good to know whether this crash (if it is in fact a crash and not something else like a shader compilation timeout) is coming from ANGLE's shader translator, Microsoft's fxc which is invoked to compile the translated HLSL, etc.


Kevin Rogovin

Apr 12, 2022, 4:03:04 PM
to angleproject
Hi,

 Do the crash reports include the shader source code? If so, I need to get permission to allow others to see it (since the project is not yet released to the world).

Ken Russell

Apr 12, 2022, 4:19:18 PM
to Kevin Rogovin, angleproject
They do not. They only contain stack traces of Chrome's internal code.


Kevin Rogovin

Apr 12, 2022, 4:25:36 PM
to angleproject
Here is the latest unhappiness:

Crash from Tuesday, April 12, 2022 at 11:03:44 PM
Status: Uploaded
Uploaded Crash Report ID: 5951117c4170eb39

Kevin Rogovin

Apr 12, 2022, 5:25:03 PM
to angleproject
Comment: the crash report 5951117c4170eb39 is an uber-shader with that peculiar diff earlier in the thread applied to one of its sub-units. I can, if you wish, also provide a crash from a shader which is not an uber-shader without that peculiar diff. Just let me know.

Ken Russell

Apr 12, 2022, 6:25:31 PM
to Kevin Rogovin, angleproject, Geoff Lang
Thanks for the crash ID. The stack trace is a little surprising.

Thread 0 (id: 0x0000268c), main thread, crashed
Exception info: EXCEPTION_ACCESS_VIOLATION_READ @0x00000000
Stack quality: 100%
0x00007ff8a840e04f (chrome.dll -memcpy.asm:254) memcpy
0x00007ff8ac089170 (chrome.dll -gles2_cmd_decoder_passthrough_doers.cc:4242) gpu::gles2::GLES2DecoderPassthroughImpl::DoGetUniformBlocksCHROMIUM(unsigned int,std::__1::vector<unsigned char,std::__1::allocator<unsigned char> > *)
0x00007ff8a5096808 (chrome.dll -gles2_cmd_decoder_passthrough.cc:2573) gpu::gles2::GLES2DecoderPassthroughImpl::ProcessQueries(bool)
0x00007ff8a4a1ed4d (chrome.dll -command_buffer_stub.cc:767) gpu::CommandBufferStub::ScopedContextOperation::~ScopedContextOperation
0x00007ff8a4a1ed4d (chrome.dll -command_buffer_stub.cc:169) gpu::CommandBufferStub::ExecuteDeferredRequest(gpu::mojom::DeferredCommandBufferRequestParams &)
0x00007ff8a4a1e957 (chrome.dll -gpu_channel.cc:669) gpu::GpuChannel::ExecuteDeferredRequest
...

It'd be great if you could provide us a test case. This looks like an internal bookkeeping bug we'd like to track down.

-Ken



Kevin Rogovin

Apr 12, 2022, 8:24:21 PM
to angleproject
It is really surprising from my side too, because all but one of the UBO's in the code are of the form layout(std140) uniform UBO { uvec4 data[N]; }, and the remaining UBO is just 8 floats (including padding) and not an array. In addition, I was able to tweak things so that I did manage to get non-empty translated HLSL shader code (or at least an initial transformation).

I will make a few more crashes, as follows:
  1. nearly the same as 5951117c4170eb39, but instead of several UBO's of the form layout(std140) uniform UBO { uvec4 data[N]; }, it would use a usampler2D.
  2. a non-uber shader that crashes because it lacks the peculiar patch from earlier in the thread

I'll do this tomorrow though, it is the middle of the night here and I will also know tomorrow if I can post code. Making a small test case is a touch trickier though....



Message has been deleted

Kevin Rogovin

Apr 13, 2022, 3:02:45 AM
to angleproject
I thought I had a workaround by avoiding parallel compile, but I did not...

Geoff Lang

Apr 13, 2022, 2:55:21 PM
to kevinr...@invisionapp.com, angleproject
Both these crashes have the same stack in Chrome, something to do with gathering uniform blocks. It does seem like possibly an ordering issue if you saw different results depending on the program being bound or not.

The crash appears to happen when appending a uniform block name to a string. A null dereference from a std::vector::data() means that the vector was likely size 0 which implies that querying GL_ACTIVE_UNIFORM_BLOCK_MAX_NAME_LENGTH returned 0 and GL_ACTIVE_UNIFORM_BLOCKS was >0. I'm not sure how we got into this state yet but it's likely dependent on the shader or order of operations before compiling.
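
The failure mode described above can be illustrated with a small C++ sketch. This is not Chrome's or ANGLE's code: appendBlockName is a hypothetical stand-in, its maxNameLength parameter plays the role of the value returned for GL_ACTIVE_UNIFORM_BLOCK_MAX_NAME_LENGTH, and the string copy stands in for the driver filling the name buffer.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical stand-in for appending one uniform block name to a string.
// Returns false instead of reading from the buffer when the reported
// maximum name length is zero.
inline bool appendBlockName(std::string& out, const char* name,
                            std::size_t maxNameLength) {
    assert(name != nullptr);
    std::vector<char> buffer(maxNameLength);
    if (buffer.empty()) {
        // data() of an empty vector may be nullptr; copying from it is
        // the kind of null read seen in the stack trace.
        return false;
    }
    // Stand-in for the GL call that writes the block name into the buffer.
    std::strncpy(buffer.data(), name, buffer.size() - 1);
    buffer.back() = '\0';
    out.append(buffer.data());
    return true;
}
```

The guard only shows where a zero length would have to be caught; why the length and the block count would disagree in the first place is the open question in this thread.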

Geoff

Kevin Rogovin

Apr 13, 2022, 6:25:58 PM
to angleproject
I .. don't know. The difference between working and not working is spooky:
  1. float(int(tmp)) == tmp triggers a crash reliably inside of a function when the function is called in a very, very specific place (but other places are ok), but floor(tmp) == tmp works fine.
  2. in other situations, it crashes from an uber-shader, whereas elements of the uber-shader when compiled separately all need access to the same uniforms. In addition, it crashes even if there are just two UBO's which are small (one is 7 floats and the other is 12 floats); both are declared with layout(std140).
In addition, I can grab the translated source if I am very careful about how I fetch status and logs. It looks like the crash is triggered when checking the value of GL_COMPILE_STATUS for a shader or GL_LINK_STATUS for a program, or when fetching the logs. So I grab the translated code before querying the compile status, link status or their logs. The translated code lists the entire GLSL shader code, and the resulting HLSL code looks more or less the same (though the formatting lacks any indentation, so it is amusing, and the names make it easy to see what came from what). It also defines a bunch of variants of various structs (I guess for the case where they are members of a UBO) that are never used. I also see that it initializes all variables to 0.

What it smells like is that something goes off the rails and is actually corrupting memory... is there an issue where bad things happen when if-logic depth gets too deep? As a side note, the current Metal backend in the current Safari does NOT crash, neither does the GL backend in Chrome or Edge. It is only the Direct3D backend that is grumpy... Is that code from the link active regardless of the backend?

Here are two more crash dumps:
  • 29606bcf88af4d09 --> non-uber, use texture, float(int(tmp)) == tmp
  • a6221d10b789c0c4 --> uber, use texture, floor(tmp) == tmp
I was told that I need to wait for an answer from beyond my immediate manager about whether I can share the shader code. I very much wish to share it, but I need to wait for an answer.

-Kevin

Ken Russell

Apr 13, 2022, 7:05:47 PM
to Kevin Rogovin, angleproject
Both 29606bcf88af4d09 and a6221d10b789c0c4 are the same basic crash in DoGetUniformBlocksCHROMIUM.

Somewhat unrelated, but I recommend you stop checking COMPILE_STATUS of individual shaders and only check the LINK_STATUS of the linked program. If that failed, you can go back and check the compile status of the individual shaders. Chrome doesn't currently optimize the shader compilation status checks, but KHR_parallel_shader_compile does allow compiles+links to be fully asynchronous if you only check the LINK_STATUS of the complete program.

-Ken



Kevin Rogovin

Apr 14, 2022, 2:20:43 AM
to angleproject
These dumps are not what I usually do; what I usually do is the following:

create:
  1. vs = glCreateShader(GL_VERTEX_SHADER); glShaderSource(vs...); glCompileShader(vs);
  2. fs = glCreateShader(GL_FRAGMENT_SHADER); glShaderSource(fs...); glCompileShader(fs);
  3. p = glCreateProgram(); glAttachShader(p, vs); glAttachShader(p, fs); glLinkProgram(p);
rendering is then just:

  if (p already used)
    glUseProgram(p);
  else {
    GLint v = GL_FALSE;
    glGetProgramiv(p, GL_COMPLETION_STATUS_KHR, &v);
    if (v == GL_TRUE)
      glUseProgram(p);
    else
      glUseProgram(FallBackProgram);
  }

i.e. when in release mode I do not even bother checking the shaders or link status....   

Ken Russell

Apr 18, 2022, 10:32:49 PM
to Kevin Rogovin, angleproject
OK, good to know that your product generally doesn't block waiting for either compile or link status.

Regarding the actual crashes - our team needs some sort of test case in order to make progress.

-Ken


