The mechanism for parallel compile/link can be leveraged to put these tasks in an UnlockedTailCall. This would allow GL threads whose sole purpose is to compile shaders and link programs to be run in parallel with the main thread.
There are a couple of subtasks:
- Parallel link should be supported (even if disabled) in the Vulkan backend - The job management should be hoisted to the front-end such that it can decide to post a job, or schedule it as an UnlockedTailCall * Most of the front-end link can also then be placed in the same task and parallelized - The tail of the call (Context::onProgramLink) should be managed such that it's no longer the tail! Given that it stops parallel link in some cases, the front-end can just decided not to make the link parallel in the first place in that case, and only do parallel/UnlockedTailCall in the other case (which is the common one) - UnlockedTailCall for shader compilation is separate task.
-- You received this message because: 1. The project was configured to send all issue notifications to this address
This change moves pipeline warm up to a parallelizable task, mostly as an exercise to put in the infrastructure for parallel link in the Vulkan backend. Follow up changes will move more of the link step to this task.
The end goal is to be able to make the link task independent of ContextVk, which would allow it to be run as an UnlockedTailCall, even if not using a worker thread.
Prevent shaders from recompiling while a link job is in progress
This will prevent a guarantee that link jobs can take further advantage of. In particular, a good chunk of the link job is done serially and under the share group lock due to this recompile-while-link issue. After this change, that is no longer a problem, and most of the link can be made lockless/parallelized.
The descriptor set layout set up for XFB emulation lived in TransfromFeedbackVk, but did not access anything in that object at all. More importantly, it doesn't make any sense to look at the current XFB object when linking a program.
Prior to this change, Program* objects held references to Shader* objects. This poses a problem where a shader recompile can race with a program link, if the program link is done in parallel.
As a result, a good chunk of the link job is done serially and under the share group lock. After this change, that is no longer a problem, and
most of the link can be made lockless/parallelized.
This change separates out the "compiled state" from the rest of the shader state. This was already done for the front-end state (for the sake of caching), but is also now done for the backends that need it. The compiled state in turn is placed in a shared_ptr, and is shared with the program. When a shader is compiled, its own shared_ptr is replaced with a new object, leaving all programs currently compiling unaffected and using the previous compilation results.
Once a program is linked, its references to compiled shader states is updated.
However, this does not seem to be an issue anymore. New tests are specifically added for this, and they, along with every other test, pass.
If this needs to be reintroduced, it needs to be rethought. It does not play well with parallel link as it changes the executable pointer while link is in progress (and it was done on the assumption that everything needing the executable is linked serially). A better solution would likely be an `mLastSuccessfullyLinkedExecutable` that normally points to `mExecutable`, but not during link. On `resolveLink`, it would either make `mExecutable` point back to `mLastSuccessfullyLinkedExecutable`, or the other way around based on whether the link was successful or not.
The dependence on ContextMtl is not entirely removed, though it seems feasible. In particular, ContextDevice is passed around while only the real underlying device is needed. Some refactoring is needed here to address that.
Additionally, I haven't verified if all that is accessed through the display is thread-safe, which would be a pre-requisite to moving the entire link job to a worker thread on this backend.
Vulkan: Make sure ProgramVk has no members other than executable
The program is really just an interface to the executable. It should not hold on to any data on its own. The SpvProgramInterfaceInfo member did not even need to be a member, and was left over from previous refactorings.
Previously, only things that needed sharing with ProgramPipeline was moved to ProgramExecutable, and in particular only state that the Vulkan backend needed to access. In truth, everything that's a result of link needs moving.
The program hash should be independent of link output. The fragment shader output locations were being hashed in the program hash, but that should be unnecessary, as they are derived from the link input, which is already entirely present in the program hash.
The info log doesn't really belong to ProgramExecutable; it belongs to ProgramState. However, it is placed there for convenience since many functions access it.
This change cleans up usage of InfoLog so the one in ProgramExecutable is consistently used, but also that is turned into a reference to ProgramState's InfoLog.
This is necessary for a follow up change that restores the previous executable on link failure (and would thus otherwise lose the info log of the failing link).
Move transform feedback buffer mode to program state
This is link input and should persist between relinks. A copy of it is still left in ProgramExecutable as it is used later after the executable is installed (and the program might get relinked; a future change reworks things so that the installed executable and the originating program are independent).
Cache transform feedback varying names in the executable
Currently, ANGLE actually does a full link of the programs inside PPOs. This was never the intention of the spec (hence why an explicit link doesn't exist). During this link operation, the transform feedback varying names are used, and they are retrieved from the program itself.
This is not correct, because the transform feedback varyings may have changed, the program may have failed to relink, and the program pipeline is expected to continue functioning using the "installed" executable.
Metal: Move setupDraw and associated code to executable
Last bits of state left in ProgramMtl that are now moved to ProgramExecutableMtl, and now ContextMtl does not reference the program at all.
This fix was necessary for a follow up change that allows the program to be modified while the executable is installed, and the metal backend was crashing after a failed relink due its direct access to the program.
- The program has an executable - The executable is overwritten during link. - After a failed link, queries of the executable may return half-linked information - On glUseProgram, the executable is installed in the context - On glUseProgramStages, the executable is installed in the program pipeline - After a successful link, the executable is updated wherever the previous executable of the program was installed.
This change implements exactly the above:
- The program's and the program pipeline's executables are now shared_ptr. References to an executable in the context and PPO are also through a shared_ptr. Installing an executable thus translates to sharing the executable. - The context and PPOs are made to not reference the program directly, but work solely through the executable. As a result, the program is free to create a new executable for link.
With this change, the link job will be free to modify the executable as necessary because that will not be accessed until the link is done. Note that previous changes made the backend executable accessed through the frontend one, and moved all link results to the frontend and backend executables as appropriate.
Revert "Verify shader link correctness after looking up in program cache"
This reverts commit 60b56591dee59bc0bc770577f43d90be4b18863c.
Reason for revert: Flakiness if the attached shaders are not compiled
Original change's description:
> Verify shader link correctness after looking up in program cache > > Since the program cache takes shader sources into account, if the cache > is hit the shaders should have been correct for link. > > Bug: angleproject:8297 > Change-Id: I115c7eb807ed620d30b9e30669c99fffb5c7912d > Reviewed-on: https://chromium-review.googlesource.com/c/angle/angle/+/4817830
D3D: Remove more usages of program instead of executable
Only usage of getProgram() and getLinkedProgram() in the d3d backend remains for multidraw, which will be fixed separately as it involves setting uniforms (currently done through the program).
- Front-end link - Back-end link - Independent back-end link subtasks (typically native driver compile jobs) - Post-link finalization
Each step depends on the previous. These steps are executed as such:
1. Program::link calls into ProgramImpl::link - ProgramImpl::link runs whatever needs the Context, such as releasing resources - ProgramImpl::link returns a LinkTask 2. Program::link implements a closure that calls the front-end link and passes the results to the backend's LinkTask. 3. The LinkTask potentially returns a set of LinkSubTasks to be scheduled by the worker pool 4. Once the link is resolved, the post-link finalization is run
In the above, steps 1 and 4 are done under the share group lock. Steps 2 and 3 can be done in threads or without holding the share group lock if the backend supports it. Step 2 is not yet made independent of the Context on some backends, and a frontend feature is used to make that step either run on the main thread or as a worker thread.
Move set/get uniform machinery to ProgramExecutable
This is done because some uniforms are internally added by the compiler (draw ID, base vertex, and base instance) and are automatically set **on the installed executable**.
This change fixes scenarios where a draw is done after a program has failed a relink, and therefore is unable to correctly set the uniforms (as it does not have access to the executable that is installed).
It also fixes draws that use those uniforms in a PPO.
This cleans up the multiple compilation task implementations in the backends and consolidates them in the front-end. The front-end is then able to do the compilation in an unlocked tail call instead if desired (in a future change).
This change is in preparation for having the program link tasks directly wait on the shader compilation tasks. As a result, the "shader resolve" should not be needed to access the shader compilation results; it should be enough to wait for the compilation job. This change therefore moves post-processing of results to the compilation job itself as they did not need to actually be done after compilation is done (merely after translation is done). As a side effect, shader substition and other debug features should now work for the GL backend as they are now done before back-end compilation.
GL: Fix compilation on drivers sensitive to initial comment
Prior to https://chromium-review.googlesource.com/c/angle/angle/+/4994655, debug info was **prepended** to the translated source in the form of a comment after the compilation was done. In that change, this was done after translation and before the translated source was handed to the backend compiler.
Some GL drivers however failed to compile the shader after this change because the #version line was no longer the first one in the shader.
This is fixed by **appending** the debug info instead.
commit 93b97a59aa8f7c193cdb78363ccc6e64a96b82df Author: Shahbaz Youssefi <syou...@chromium.org> Date: Sat Nov 04 02:07:23 2023
Make link job directly wait on compile job
Previously, program link waited on the compile job on the calling thread before launching the link job. As a result, sequences of intermixed compile and link would get largely serialized as such:
With this change, the main thread no longer waits for compilation to finish. It's the link job itself that does the waiting. This allows the main thread to go through Compile and Link commands without blocking, generating as many jobs as needed. The above scenario therefore becomes:
Main T1 T2 T3 T4 T5 T6 T7 T8 T9 C ----> C C ------|----> C L ------|------|----> L C ------|------|-------W---> C C ------|------|-------|-----|----> C L ------|------|-------|-----|------|----> L C ------|------|-------|-----|------|-------W---> C C ------|------|-------|-----|------|-------|-----|----> C L ------|------|-------|-----|------|-------|-----|------|----> L . \-----\------>/ | | | | | W . | \-----\------>/ | | | . | | \-----\------>/ . | | | . | | |
This greatly improves the amount of parallelism compile and link jobs get.
The careful observer may note that the link job being blocked on the compile job is now wasting a thread from the thread pool. While this change is strictly an improvement, parallelism can be further improved if the link job is just not assigned to a thread until the corresponding compile jobs are finished. This is currently not possible, but may be if:
- Instead of a thread pool, the operating system's FIFO scheduler is used. Then the operating system would automatically put blocking tasks to sleep and pick up another task. This has the downside of requiring threads to be created for each task. - The thread pool work scheduler is enhanced to be made aware of relationship between tasks and avoid scheduling jobs whose dependencies are not yet met.
Alternatively, the number of threads in the pool can be increased by 30% and hope for the best.
Do compile/link unlocked if not threaded (but thread-safe)
If GL_KHR_parallel_shader_compile is not supported, or it is not used to do threaded compilation and link, this change lets the compile and link jobs be done after releasing the share group lock. With multithreaded/multi-context applications, this allows the other context (typically the main context) to make progress in the meantime.
A typical scenario where this optimization matters is games seamlessly loading a new area of the game and performing compilation and link in a separate context. Before this change, the game would stutter as the compile/link jobs prevent the main thread from drawing anything. With this change, the hitching is removed.
Calling this done. See potential follow ups mentioned in Comment #49, which _may_ be helpful for the other backends if the apps did links without immediately resolving them (another idea is to prepend compile jobs while appending link jobs as a poor man's priority system). For Vulkan, the change in Comment #50 is all I really wanted to do all along.
syous… via monorail
unread,
Nov 15, 2023, 10:22:28 AM11/15/23
Reply to author
Sign in to reply to author
Forward
Sign in to forward
Delete
You do not have permission to delete messages in this group
Copy link
Report message
Show original message
Either email addresses are anonymous for this group or you need the view member email addresses permission to view the original message
This reverts commit 00eb6edba074a22389b09990ab856adfd417dd64.
Reason for revert: Sub tasks are not actually post-link tasks for all the other backends (other than Vulkan), but they are a real part of the link job.
Backends have the option to request frontend to run their LinkSubTask post-link. Do not wait for such sub tasks during most glProgram APIs. Note that we do wait for these "post-link sub tasks" incase of a program relink or first use by a draw call.
Blob cache tests rely on implicit caching of programs. Account for disableProgramCaching frontend feature when saving and retrieving from cache. Also skip EGLBlobCache tests if the feature is enabled.
commit ad13fec33da34c4bac8ef99441924e9386dd47c8 Author: Mohan Maiya <m.m...@samsung.com> Date: Sat Mar 30 22:31:49 2024
Vulkan: warmUpGraphicsPipelineCache(...) shouldn't set state
The prepareForWarmUpPipelineCache(...) method would have already setup all necessary state for the warm up task. Make that intent explicit by calling into a method that sets no state.
Wait for post-link tasks before resetting ProgramExecutableVk. Otherwise mGraphicsProgramInfos, which post-link tasks use to create pipeline, can be prematurely invalidated.
1. An early return caused compute warm up tasks to not be scheduled 2. There are usecases where a program is marked separable post-link. Perform waitForPostLinkTasksImpl unconditionally during save(...)
std::unordered_map doesn't support simultaneous read and write. Cache placeholder PipelineHelper in WarmUpGraphicsTask and std::move the newly created PipelineHelper when warm up is complete.
The last bit of responsibility still left in Program::syncState was to wait for post-link tasks for the sake of EGLBlobCacheTest tests. A new extension, GL_ANGLE_program_binary_readiness_query is created so that the wait can be done in the test itself.
This extension is ultimately useful for applications as well, so they can avoid blocking the CPU by calling glGetProgramBinary prematurely.
Vulkan: Improve pipeline warmup hit rate without GPL
When VK_EXT_graphics_pipeline_library is not in use, warmup would practically always hit a perf warning about it mismatching the graphics desc of the current draw call. This is because the color mask is not set appropriately, which is what this change does.
Since the backend wait for post-link tasks was made possible [1], the front-end may never get a chance to internally cache its binary. This is because the decision to cache the binary was based on the existence of post-link tasks, and the backend clearing that list made the front-end confused.
In this change, a specific bool tracks whether binary is cached, and that is checked at various times to make sure the binary cache is eventually done.
1. don't reset pipeline cache hit / miss counters after every frame 2. bugfix in LinkTaskVk::getResult(...) 3. accumulate counters in WarmUpTaskCommon::getResultImpl(...)