Strange program switching under Metal on macOS

David Holtkamp

Mar 25, 2025, 8:00:49 PM
to angleproject
I am in the process of porting an OpenGL ES 3.0 game to Mac using ANGLE's Metal backend. Almost everything works the same as on mobile at this point, but I am having trouble with a fairly simple shader whose program seems to be unbound automatically at certain points. I have spent many hours on this issue, and this is what I've found:

- The shader program builds and links correctly with no errors.
- The game runs single-threaded and is stable.
- The uniforms exist as written; nothing is being unexpectedly optimized out by the Metal compiler.
- In my engine's "UseShader" function, I call glUseProgram. It reports no errors, and calling glGetIntegerv(GL_CURRENT_PROGRAM, &program) correctly shows the shader as current.
- This makes no sense to me, but when the function returns (it is a member of a singleton) and I immediately call glGetIntegerv(GL_CURRENT_PROGRAM, &program), a different shader is bound. With the wrong program bound, glUniform calls naturally fail. Calling glGetIntegerv on the line before the return shows the correct result.
- If I call glUseProgram() again after the function returns to rebind the correct shader, glGetIntegerv(GL_CURRENT_PROGRAM, &program) shows the correct shader, BUT setting the uniform with glUniformMatrix3fv still fails with error 0x0502 (GL_INVALID_OPERATION).
- This fails in exactly the same way with different uniforms, in different places, on a couple of specific shaders.
- The shader itself compiles correctly and renders visually, although it is not very usable without access to the uniforms it needs.


This behavior really has me stumped: I have no idea how the ANGLE implementation could even know that the function had returned, but I have tested it many times. Any ideas on what could cause this kind of behavior would be appreciated. As I said, the shaders compile with no errors, they report the uniforms when queried, and everything works perfectly under native OpenGL ES 3.0 on iOS.

David Holtkamp

Mar 26, 2025, 2:03:10 AM
to angleproject
I should probably add that my engine's useShader function creates no manager objects in the function scope that could reset the program on return. I double-checked, since this is the only reason I could think of for the state to change on return. The only local variables declared are two integers, and the class itself is a singleton that is never deleted. As I said, it works fine under native OpenGL ES 3.0 on iOS as well.

Geoff Lang

Mar 26, 2025, 11:13:06 AM
to holt...@gmail.com, angleproject
I would hook up a debugger and add a breakpoint to ANGLE's glUseProgram call, GL_UseProgram.

--
To view this discussion visit https://groups.google.com/d/msgid/angleproject/beee6095-ff32-4210-90fd-71ea31c3cbe4n%40googlegroups.com.

David Holtkamp

Mar 26, 2025, 12:35:01 PM
to angleproject
Thanks for that symbol name. I set the breakpoint and confirmed that it is hit when calling

glUseProgram(current_program_index);


But the same strange issue still occurred. I called

    glGetIntegerv(GL_CURRENT_PROGRAM, &program);


at the end of the function, and it returned 27 (the correct program number). Right after the function returned, I made the exact same call and got 5 (the previously bound shader program), even though GL_UseProgram was never called between those two queries.

David Holtkamp

Mar 26, 2025, 12:36:20 PM
to angleproject
I should add that I also called

glGetIntegerv(GL_CURRENT_PROGRAM, &program);

twice at the end of the function, in case changing programs had some side effect. Both calls correctly returned 27.

David Holtkamp

Mar 26, 2025, 1:08:47 PM
to angleproject
Another interesting point: at some points in the draw loop, I can correctly bind this program and update the uniform. During frame setup, I can put valid values in both the vector and the matrix so that it renders, but in the draw call, when I try to update them to match the scene, the weirdness described above starts.

Ken Russell

Mar 26, 2025, 1:08:53 PM
to holt...@gmail.com, angleproject
Could a new program be synthesized behind the scenes, in response to some draw call with slightly different state, in between the glGetIntegerv(GL_CURRENT_PROGRAM, &program); calls? And could that synthetic program not be fully abstracted away in the Metal backend?

Can you provide a small test case that shows the problem?


Ken Russell

Mar 26, 2025, 1:48:37 PM
to holt...@gmail.com, angleproject
Again, can you please provide a test case? It's difficult or impossible for us to guess what is happening without one.


David Holtkamp

Mar 26, 2025, 1:59:30 PM
to angleproject
All shader programs are compiled when the program starts; none are created or recompiled dynamically. With an entire engine and game in the source, it will be somewhat difficult to reduce it fully, and the fact that ANGLE behaves as expected in some edge cases makes me believe that some piece of state is not handled correctly by ANGLE. I will keep reducing it and will upload a sample if I can isolate it to something small enough.

David Holtkamp

Mar 27, 2025, 2:37:44 PM
to angleproject

I have isolated the issue further so that it occurs right at the beginning of the program, and I think I may be able to reduce it to something simple.

The hard part is that, for some reason, it depends heavily on stack position, which makes no sense.

I would appreciate knowing whether any of this makes sense:
- It is not shader-dependent. When it gets into this strange state, every shader binding reverts to a single program number.
- In this state, textures seem to bind to sampler2D uniforms without errors, but setting vec3 uniforms fails with a GL error.
- The state seems to be triggered at certain stack depths. After returning from a function call, glGetIntegerv(GL_CURRENT_PROGRAM, &program); always returns the same program. Calling into the manager to set a different program makes glGetIntegerv(GL_CURRENT_PROGRAM, &program); return the correct value until that function returns; after that, calls return the same value again.
- The functions do not cause any state changes through manager objects on the stack. Additionally, a breakpoint in ANGLE shows that the program is never switched on return.
- glGetError() reports no errors from ANGLE until I try to set the uniform with glUniform3f.

At this point, I feel there must be some strange edge case in a shader or in the compilation process, because I am not doing much between setup and this strange, seemingly stack-based behavior.

David Holtkamp

Mar 28, 2025, 1:59:13 AM
to angleproject
OK, after far more time than I would have liked, I figured out what was going on. My game engine is compiled alongside the game as an embedded framework on Mac. I had included ANGLE in the engine, but for some reason the library was loaded two separate times, so calls into ANGLE from the engine and from the game went to two different addresses. This caused the strange, instantaneous state changes I described earlier whenever a function returned out of the engine.
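For anyone hitting the same symptom, here is a toy C model of what was going on. It is not ANGLE code: the struct and function names are hypothetical, and it assumes (as the earlier messages suggest) that program 5 had been bound through the game's copy while the engine bound program 27 through its own copy. Each loaded copy keeps its own current-program state, so the query made inside the engine and the one made after returning to the game are answered by different copies.

```c
#include <stdio.h>

/* Toy model, not ANGLE code: every loaded copy of a GL library carries its
 * own global state, including the program that GL_CURRENT_PROGRAM reports. */
typedef struct {
    const char *name;
    unsigned current_program;
} gl_library_copy;

/* Two copies of the "same" library: one serving the engine framework's calls
 * and one serving the game executable's calls (hypothetical names). */
static gl_library_copy engine_side = {"engine's copy", 0};
static gl_library_copy game_side = {"game's copy", 0};

/* Stand-ins for glUseProgram and glGetIntegerv(GL_CURRENT_PROGRAM, ...). */
static void use_program(gl_library_copy *lib, unsigned program)
{
    lib->current_program = program;
}

static unsigned query_current_program(const gl_library_copy *lib)
{
    return lib->current_program;
}

/* Engine-side UseShader: both calls resolve to the engine's copy, so the
 * query inside the function sees the program it just bound. */
static unsigned engine_use_shader(unsigned program)
{
    use_program(&engine_side, program);
    return query_current_program(&engine_side);
}

/* Replays the sequence from the thread. Assumption: program 5 was bound
 * earlier through the game's copy. */
static void replay(unsigned *seen_in_engine, unsigned *seen_in_game)
{
    use_program(&game_side, 5);
    *seen_in_engine = engine_use_shader(27);           /* 27 inside the engine */
    *seen_in_game = query_current_program(&game_side); /* still 5 in the game */
    printf("%s: %u\n%s: %u\n", engine_side.name, *seen_in_engine,
           game_side.name, *seen_in_game);
}
```

Nothing is switched on return in this model; the two queries simply reach different copies. The same split would plausibly explain the 0x0502 (GL_INVALID_OPERATION) errors as well, since a glUniform* call from the game is validated only against the program state held by the game's copy.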

With that said, I would love to link ANGLE as a static library, since that is what we do for all other third-party dependencies of our engine. Is there a way to get clean libEGL.a and libGLESv2.a files out of the Ninja build?

I currently use:
gn gen out/Debug --args='is_debug=true angle_enable_metal=true is_component_build=false'
ninja -C out/Debug 

This gives me the dynamically linked libraries, but the static libraries under the obj folder are all thin archives. Is there any option to generate full self contained static libraries?

Ken Russell

Mar 28, 2025, 7:03:25 PM
to holt...@gmail.com, angleproject
Glad you figured it out. That reminds me of all the C runtime options on Windows, where you could have different C heaps in different DLLs and couldn't free a pointer in one DLL that was malloc'ed in another one.

Sorry, I don't know how you would build ANGLE's targets as static libraries. ANGLE's Android build, at least, has some support for this:

There's this part of the GN docs; maybe it helps?


Geoff Lang

Apr 1, 2025, 8:10:55 AM
to holt...@gmail.com, angleproject
Glad you figured it out. We have libGLESv2_static and libEGL_static targets as well.

Geoff Lang

Apr 1, 2025, 8:11:08 AM
to k...@chromium.org, holt...@gmail.com, angleproject
There will be many static libraries for different components of ANGLE. libGLESv2_static and libEGL_static are the root ones that should pull everything else in but since you're not using GN, you may need to link them manually. 

Also try Ken's suggestion of complete_static_lib. Add it to the libEGL_static target here: https://source.chromium.org/chromium/chromium/src/+/main:third_party/angle/BUILD.gn;l=1594;drc=19e456805e72897843890e90cb344f384c33dce5

Normally I'd suggest using dlmopen to load multiple instances of the shared objects, but it looks like macOS doesn't support it.

David Holtkamp

Apr 1, 2025, 7:01:04 PM
to angleproject
Hey everyone, I finally got everything working, so I wanted to come back, say thank you for the help, and share this for any poor soul who attempts this on Mac in the future.

Once I figured out that the issue was caused by loading two copies of the library, I looked at both fixing the dynamic library setup and building a static library.

The static library was a very big pain. Ken's complete_static_lib suggestion may have helped, but it does not solve the problem of the output being a thin archive, which references external object files instead of including them. This is a huge problem because the Mac toolchain (which I think may be BSD-based) does not support thin archives and fails at link time. After an exceptional amount of searching, I finally found a reference to this in compiler/BUILD.gn, where there is a thin_archive config that can be turned off. Once I turned it off and recompiled, the library was 1.5 GB, which seemed strangely big, but I tried to link it anyway. It failed with some sort of namespace issue in the C++ library, which I never had time to investigate further.

Next, I went back to the dynamic linking issue that caused the double load. I eventually figured out that the install name built into the dylib files needed to be @rpath/libEGL.dylib instead of ./libEGL.dylib. I changed it with an external tool:

install_name_tool -id @rpath/libEGL.dylib libEGL.dylib 


but perhaps this can easily be changed at build time too. Once I corrected that, added the libraries to both the framework and the program, and made sure both had run paths that could find the library in its final location, the library loaded as a single unified copy. Apparently the macOS dynamic loader will load a second copy if the paths do not resolve to look the same, even if it is the same file.
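A quick way to check for a double load, sketched here under assumptions rather than taken from this thread, is to ask the dynamic loader which image a function address belongs to. dladdr() is available on macOS and on glibc-based systems; image_for_address() and same_image() are hypothetical helper names. If the engine and the game report different load bases for glUseProgram, two copies of the library are loaded.

```c
#define _GNU_SOURCE /* exposes dladdr() and Dl_info on glibc; harmless on macOS */
#include <dlfcn.h>
#include <stdio.h>

/* Ask the dynamic loader which image (executable or shared library) contains
 * addr. Returns 1 and fills *path / *base on success, 0 if addr is not inside
 * any loaded image. */
static int image_for_address(const void *addr, const char **path, const void **base)
{
    Dl_info info;
    if (dladdr(addr, &info) == 0)
        return 0;
    *path = info.dli_fname;
    *base = info.dli_fbase;
    return 1;
}

/* Log where two addresses resolve to and report whether they live in the
 * same loaded image, judged by load base rather than by path string. */
static int same_image(const char *label_a, const void *a,
                      const char *label_b, const void *b)
{
    const char *path_a, *path_b;
    const void *base_a, *base_b;
    if (!image_for_address(a, &path_a, &base_a) ||
        !image_for_address(b, &path_b, &base_b))
        return 0;
    printf("%s -> %s (base %p)\n", label_a, path_a, base_a);
    printf("%s -> %s (base %p)\n", label_b, path_b, base_b);
    return base_a == base_b;
}
```

For example, the engine framework could expose a (hypothetical) function returning (const void *)&glUseProgram, and the game could pass that together with its own (const void *)&glUseProgram to same_image(). Comparing load bases rather than path strings is deliberate here, since two copies of the same file can be loaded under paths that look different.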


Thanks again,

David
