GPU Main Recorder Ordering Check Causes Rendering Issues


shengjie hu

Jan 3, 2026, 8:50:58 PM
to Graphics-dev

Under Skia's Graphite backend, the Recorder enforces strict ordering checks by default so that it can apply certain performance optimizations. According to the discussion at https://issues.chromium.org/issues/406292843?pli=1, inconsistent ordering can cause rendering issues in some cases. In version 136, the ordering check was relaxed for the Viz Recorder but kept strict for the GPU Main Recorder.

However, I'm hitting the same issue in Chromium 137.0.7151.133: the entire browser UI renders incorrectly, and chrome://gpu reports "Recordings are expected to be replayed in order". After debugging, I found that the problem is not with the Viz Recorder but with the GPU Main Recorder; setting require_ordered_recordings to false for the GPU Main Recorder resolves it. Sorry, I cannot provide a reproduction URL because it's an internal website. I tested version 144 and cannot reproduce the issue there, but I cannot upgrade our Chromium version to the latest release in the short term.

Are there other solutions? What would the performance impact be of setting require_ordered_recordings to false by default for the GPU Main Recorder? If there's a commit that fixes this, please let me know. Thank you very much!
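To make the ordering check concrete, here is a toy model of a queue that requires Recordings in order. It is a minimal sketch under stated assumptions, not Skia's QueueManager: `ToyQueue`, `InsertResult`, the single sequential ID counter, and the rule that only a successful insert advances the expected ID are all hypothetical, chosen to loosely mirror what require_ordered_recordings controls. It shows why, once one Recording never reaches the queue, every later Recording fails the strict check, while a relaxed queue keeps accepting them.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical result of inserting a Recording into the toy queue.
enum class InsertResult { kSuccess, kOutOfOrder };

// Toy queue (not Skia code). When fRequireOrdered is true, each Recording's
// sequential ID must be exactly one past the last accepted Recording.
class ToyQueue {
 public:
  explicit ToyQueue(bool requireOrdered) : fRequireOrdered(requireOrdered) {}

  InsertResult insert(uint32_t recordingID) {
    if (fRequireOrdered && recordingID != fNextExpectedID) {
      fLastError = "Recordings are expected to be replayed in order";
      return InsertResult::kOutOfOrder;
    }
    // Assumption of this model: only an accepted Recording advances the ID.
    fNextExpectedID = recordingID + 1;
    return InsertResult::kSuccess;
  }

  const std::string& lastError() const { return fLastError; }

 private:
  bool fRequireOrdered;
  uint32_t fNextExpectedID = 0;
  std::string fLastError;
};
```

For example, with a strict queue, inserting Recording 0 and then Recording 2 (Recording 1 was dropped before insertion) rejects 2 and every Recording after it; with a relaxed queue, the same sequence is accepted.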

shengjie hu

Jan 4, 2026, 4:01:41 AM
to Graphics-dev, shengjie hu
After extensive debugging, I identified the cause: a cascading failure of the TextAtlas TextureProxy, caused by the proxy being cached across Recordings.

The failure chain is as follows:

1. Recording N is created and activates a new TextAtlas page, which creates a TextureProxy (uninstantiated, waiting for an UploadTask to instantiate it).

2. During Recording N's prepareResources() phase, the UploadTask fails for some reason (resource exhaustion, GPU state issue, etc.), leaving the TextureProxy uninstantiated.

3. Recording N fails and is discarded. invalidateAtlases() is called, but evictAllPlots() only clears the Plot cache data; the TextureProxy remains cached in an uninstantiated state.

4. Recording N+1 is built and reuses the same TextAtlas TextureProxy.

5. In DrawPass::prepareResources(), the check detects that the TextureProxy is uninstantiated, so Recording N+1 also fails.

6. Recording N+2 attempts insertRecording(); the QueueManager detects a Recording ID mismatch and reports "Recordings are expected to be replayed in order", so Recording N+2 is also rejected.

7. This cascade continues, causing a complete UI rendering failure.
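Steps 1 to 5 of this chain can be reproduced in a small toy model. It is a hypothetical sketch, not Skia's implementation: `ToyProxy`, `ToyAtlas`, `prepareRecording()`, and `simulateCascade()` are invented names, and the model assumes that only the Recording which activated the atlas page carries the upload that instantiates its proxy. Because eviction clears plot data but keeps the cached proxy, a single failed upload makes every later Recording that samples the atlas fail.

```cpp
#include <cassert>
#include <memory>

// Toy stand-in for a texture proxy: it only tracks instantiation.
struct ToyProxy {
  bool instantiated = false;
};

// Toy stand-in for a text atlas page.
struct ToyAtlas {
  // The proxy is cached and reused across Recordings (step 4).
  std::shared_ptr<ToyProxy> proxy = std::make_shared<ToyProxy>();
  bool hasPlotData = false;

  // Step 3: clears plot data only; the cached proxy is left untouched.
  void evictAllPlots() { hasPlotData = false; }
};

// Models the prepare step of one Recording. Only the Recording that
// activated the page carries the upload that would instantiate the proxy.
bool prepareRecording(ToyAtlas& atlas, bool carriesPageUpload,
                      bool uploadSucceeds) {
  if (carriesPageUpload && uploadSucceeds) {
    atlas.proxy->instantiated = true;
    atlas.hasPlotData = true;
  }
  // Sampling an uninstantiated proxy fails the Recording (step 5).
  return atlas.proxy->instantiated;
}

// Runs a sequence of Recordings against one atlas; returns how many failed.
int simulateCascade(int recordingCount, bool pageUploadSucceeds) {
  ToyAtlas atlas;
  int failures = 0;
  for (int i = 0; i < recordingCount; ++i) {
    const bool ok = prepareRecording(atlas, /*carriesPageUpload=*/i == 0,
                                     pageUploadSucceeds);
    if (!ok) {
      ++failures;
      atlas.evictAllPlots();  // invalidation does not reset the proxy
    }
  }
  return failures;
}
```

In this model, `simulateCascade(5, false)` fails all five Recordings, while `simulateCascade(5, true)` fails none.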


Workaround Fix:

I applied a fix in DrawPass::prepareResources() that attempts to re-instantiate uninstantiated TextureProxies, which breaks the failure cascade:

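// Sampled textures are normally instantiated by an earlier UploadTask, or are lazy.
// If an earlier upload failed, try to instantiate the proxy here instead of failing
// every later Recording that samples it.
if (!fSampledTextures[i]->isInstantiated() && !fSampledTextures[i]->isLazy()) {
    if (!TextureProxy::InstantiateIfNotLazy(resourceProvider, fSampledTextures[i].get())) {
        SKGPU_LOG_E("DrawPass: Cannot sample from uninstantiated TextureProxy[%d], label: %s",
                    i, fSampledTextures[i]->label());
        return false;
    }
}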

This fix resolved the issue in my testing. I'm curious whether a similar fix exists in version 143/144, or whether there is an upstream commit addressing the related TODO() that I could backport.
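The trade-off in this workaround can be shown with a toy model. This is a hypothetical sketch, not Skia code: `ToyProxy`, `toyInstantiateIfNotLazy()` (a stand-in for TextureProxy::InstantiateIfNotLazy()), `toyPrepareSampledTexture()`, and the `allocationSucceeds` parameter are assumptions made for illustration. Re-instantiating the proxy lets later Recordings pass the prepare check, but the new texture is empty because the failed atlas upload is never replayed.

```cpp
#include <cassert>

// Toy proxy state; hasUploadedContents is set only by a successful upload.
struct ToyProxy {
  bool instantiated = false;
  bool lazy = false;
  bool hasUploadedContents = false;
};

// Stand-in for instantiation: lazy or already-instantiated proxies succeed
// trivially; otherwise an empty backing texture is allocated. No earlier
// upload is replayed, so hasUploadedContents is left unchanged.
bool toyInstantiateIfNotLazy(ToyProxy& proxy, bool allocationSucceeds) {
  if (proxy.lazy || proxy.instantiated) {
    return true;
  }
  if (!allocationSucceeds) {
    return false;
  }
  proxy.instantiated = true;
  return true;
}

// Same shape as the patched check in DrawPass::prepareResources().
bool toyPrepareSampledTexture(ToyProxy& proxy, bool allocationSucceeds) {
  if (!proxy.instantiated && !proxy.lazy) {
    if (!toyInstantiateIfNotLazy(proxy, allocationSucceeds)) {
      return false;  // the patch logs an error and fails the Recording here
    }
  }
  return true;
}
```

A proxy left uninstantiated by a failed upload passes the check after this call, yet still reports no uploaded contents, so the Recording can draw from stale or empty atlas data.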

Colin Blundell

Jan 5, 2026, 8:23:07 AM
to shengjie hu, Michael Ludwig, Graphics-dev
Thanks for the report and the detailed investigation! +Michael Ludwig, do you have thoughts here?

Michael Ludwig

Jan 5, 2026, 10:47:28 AM
to Colin Blundell, shengjie hu, Graphics-dev
1. That is very impressive debugging, and I really appreciate how much work went into it.
2. Graphite's texture update system is generally at risk of getting into an invalid state when a Recording fails, or fails to snap, due to runtime issues such as resource creation failures. Text atlases are particularly prone to this because their state changes over time and they are reused often, but in theory it could happen even to static images.
3. As part of the work to make Recordings replayable in any order, these uploads need to track whether or not they succeeded, and Recordings need to be able to share an upload task until it has executed successfully. This would solve the issue, but it is a larger engineering effort.
4. In https://b.corp.google.com/issues/433845560, we originally crashed the GPU process when kAddCommandsFailed occurred, but removed that because we thought we should perhaps crash silently instead of doing a CHECK.
5. In the reported failure cascade, kInvalidRecording would be the failing InsertStatus. Until #3 is addressed, we should crash the GPU process on any of these failures to short-circuit the Recording failures. While instantiating these lazy proxies would allow the Recording to be inserted, the contents of the proxy would not be accurate because the original uploads were never executed. Resetting the GPU process is heavy-handed, but it ensures we are in a consistent state. Once Graphite tracks upload tasks better internally so that past failures carry over, we can relax that.
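Point 3 describes the longer-term direction. The sketch below is a hypothetical illustration of that idea, not Graphite's actual design: `ToyUploadTask`, `ToyRecording`, and `ToyUploadTracker` are invented names. An upload task remembers whether it has executed successfully, and the tracker hands every unfinished task to each newly snapped Recording, so a failure in Recording N is retried by N+1 instead of leaving the atlas permanently uninstantiated.

```cpp
#include <cassert>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical upload task that records whether it has succeeded.
struct ToyUploadTask {
  bool executed = false;

  // Attempts the upload if it has not succeeded yet; returns true once the
  // upload has succeeded at least once.
  bool execute(bool gpuOk) {
    if (!executed && gpuOk) {
      executed = true;
    }
    return executed;
  }
};

// Hypothetical Recording that shares references to unfinished uploads.
struct ToyRecording {
  std::vector<std::shared_ptr<ToyUploadTask>> pendingUploads;

  // Runs every pending upload; returns false if any of them still failed.
  bool prepare(bool gpuOk) {
    bool allOk = true;
    for (const auto& task : pendingUploads) {
      if (!task->execute(gpuOk)) {
        allOk = false;
      }
    }
    return allOk;
  }
};

// Hypothetical owner that carries unfinished uploads into each new Recording.
class ToyUploadTracker {
 public:
  void add(std::shared_ptr<ToyUploadTask> task) {
    fUnfinished.push_back(std::move(task));
  }

  ToyRecording snap() {
    // Drop uploads that already succeeded; keep sharing the rest.
    std::vector<std::shared_ptr<ToyUploadTask>> stillPending;
    for (const auto& task : fUnfinished) {
      if (!task->executed) {
        stillPending.push_back(task);
      }
    }
    fUnfinished = std::move(stillPending);
    return ToyRecording{fUnfinished};
  }

 private:
  std::vector<std::shared_ptr<ToyUploadTask>> fUnfinished;
};
```

Sharing the task by reference rather than copying it is what lets a later Recording observe, and retry, an earlier failure.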

shengjie hu

Jan 7, 2026, 11:03:23 AM
to Graphics-dev, Michael Ludwig, shengjie hu, blun...@chromium.org
Hi Michael,

Thank you very much for your detailed response and valuable suggestions. I really appreciate the time you took to review our debugging work and provide guidance.

I understand your recommendation clearly. For Chromium 137, we will implement the approach you suggested: in the `insertRecording` path, we will detect when the order check fails (specifically the `kInvalidRecording` case you mentioned) and proactively reset the GPU process to restore a consistent state.
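The policy discussed here amounts to a simple rule at the insert site. A minimal sketch under stated assumptions: `ToyInsertStatus` models only the statuses named in this thread, and `requestGpuProcessReset` is a hypothetical hook standing in for whatever mechanism actually tears down and restarts the GPU process (for example a CHECK-style crash); neither is a real Chromium or Skia API. Following Michael's point 5, it resets on any failed insert, not only on kInvalidRecording.

```cpp
#include <cassert>
#include <functional>

// Hypothetical subset of insert statuses, limited to those named above.
enum class ToyInsertStatus { kSuccess, kInvalidRecording, kAddCommandsFailed };

// Returns true if the Recording was inserted. On any failure, asks the
// embedder to reset the GPU process so no later Recording replays against
// stale atlas state.
bool handleInsertStatus(ToyInsertStatus status,
                        const std::function<void()>& requestGpuProcessReset) {
  if (status == ToyInsertStatus::kSuccess) {
    return true;
  }
  requestGpuProcessReset();
  return false;
}
```

Narrowing the condition to `kInvalidRecording` alone would match the plan above more literally, at the cost of leaving other insert failures unhandled.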

Best regards,