Very cool experiment, and happy to see positive results!
Feedback RE: "no-op frames often occupy the main thread during page load or input handling"...
My past experience with interactions and smoothness is that no-op rAF loops can have subtler effects on UX than you would predict from main thread occupancy alone. Perhaps some pages have expensive rAF callbacks, but what I commonly saw in the past was something more like the following:
- User input arrives. A HIGHEST priority task is scheduled, jumping the task queue.
- Some default action (or custom event listeners) runs. The page is updated and requires rendering.
- If you haven't had a rAF this vsync yet, we will schedule BMF as soon as possible, and at highest priority, typically fast enough to get scheduled immediately after the event.
- But if you have already had a rAF scheduled for this animation frame (as would always be true with a rAF loop), we will now wait for the next vsync first.
- This extra wait means we have some idle time after the event, and will schedule some other normal/background/idle priority tasks. It was these that would block the main thread. (This could even include work that you explicitly deferred from the event listener, expecting it to run after rendering, and you thought you could delay a long-task until after INP triggers...)
In other words: in the presence of rAF loops, extra normal/idle priority tasks tend to sneak in after input and before rendering (assuming the event handler doesn't itself run long and into the next vsync). In what I observed, that was often the worst culprit for latency.
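To make the sequence above concrete, here is a rough timing model. It is only a sketch under my own simplifying assumptions, not how the compositor or scheduler is actually implemented: the rAF always runs at the very start of its vsync interval, there is exactly one deferred normal-priority task waiting, and all names (`Scenario`, `renderStartDelayMs`, `fractionDelayed`) are made up for illustration.

```typescript
// Hypothetical model of "input -> handler -> BMF" timing. All names and
// numbers here are illustrative assumptions, not browser APIs.
interface Scenario {
  vsyncMs: number;          // vsync interval length
  rafEveryNVsyncs: number;  // 1 = rAF loop on every vsync, 2 = half rate, 0 = no rAF loop
  handlerMs: number;        // duration of the input event handler
  deferredTaskMs: number;   // one normal-priority task that is ready after the handler
}

// Time from input arrival until the BMF task can start.
function renderStartDelayMs(s: Scenario, inputAtMs: number): number {
  const vsyncIndex = Math.floor(inputAtMs / s.vsyncMs);
  // Assumption: if a rAF was due in this vsync, it already ran before the input arrived.
  const rafRanThisVsync =
    s.rafEveryNVsyncs > 0 && vsyncIndex % s.rafEveryNVsyncs === 0;
  const handlerEnd = inputAtMs + s.handlerMs;
  const nextVsync = (vsyncIndex + 1) * s.vsyncMs;

  // No rAF yet this vsync, or the handler itself ran into the next vsync:
  // BMF is scheduled right after the event.
  if (!rafRanThisVsync || handlerEnd >= nextVsync) {
    return s.handlerMs;
  }
  // Otherwise BMF waits for the next vsync, and the deferred task
  // fills the idle gap and may overrun it.
  const deferredEnd = handlerEnd + s.deferredTaskMs;
  return Math.max(nextVsync, deferredEnd) - inputAtMs;
}

// Fraction of evenly spaced input arrival times that get the extra delay.
function fractionDelayed(s: Scenario, samples = 1000): number {
  const windowMs = Math.max(1, s.rafEveryNVsyncs) * s.vsyncMs;
  let delayed = 0;
  for (let i = 0; i < samples; i++) {
    const inputAtMs = ((i + 0.5) / samples) * windowMs;
    if (renderStartDelayMs(s, inputAtMs) > s.handlerMs) delayed++;
  }
  return delayed / samples;
}

const base = { vsyncMs: 16, handlerMs: 2, deferredTaskMs: 50 };
console.log(fractionDelayed({ ...base, rafEveryNVsyncs: 1 })); // full-rate rAF loop
console.log(fractionDelayed({ ...base, rafEveryNVsyncs: 2 })); // half-rate rAF loop
console.log(fractionDelayed({ ...base, rafEveryNVsyncs: 0 })); // no rAF loop
```

In this toy model a half-rate loop delays about half as many inputs as a full-rate loop, and no loop delays none. The cost of a delayed input is dominated by the deferred task, not by the rAF callback itself.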
One specific example I recall: fetch() calls from an event listener. If the resource was already cached, the .then() handler would get scheduled either before or after the BMF, depending on whether we had to wait for vsync (e.g. because of animations, or the presence of rAF-loop based monitoring). The fetch().then() handler might be very expensive, yet the developer expects it to always be scheduled well after the input event, and so doesn't think it affects responsiveness. (For example, several SPA routers historically would do this by default, and route prefetching on hover made INP performance worse!)
(Note: Some of the above is changing with an experiment Scott Hasely is running to defer task scheduling between input and rendering.)
---
All that said, if I'm right about the above, it seems to me that decreasing frame rates would only fix the latency issue ~50% of the time. That assumes the rAF, when it does trigger, still gets scheduled at the start of the vsync cycle, just at half the frame rate. That's a good step, but I wonder if we could use the signal (4 rAF frames without pixel updates) in the future to go even further?
- Could we also defer the scheduling of the BMF task to the end of the vsync cycle (i.e. by also delaying the BMF task for 1 vsync duration)?
  - This way, if input does arrive mid-frame, even for a frame that wasn't throttled, the event would still be scheduled first.
  - Something like this could still hit 30hz, though each frame has a bit more latency. Since these frames will likely not commit an update anyway, perhaps frame deadlines don't matter?
- And what about the opposite: if we know that rAFs are producing pixel updates consistently, could we schedule BMF with higher priority?
  - vsync frame rates are one thing, but today we also have a policy of defaulting the BMF task to NORMAL priority (for 100ms, then boosting to HIGHEST).
  - On some pages we see this policy decreasing main thread rendering rates to 8-10hz if other tasks are already queued in the scheduler.
  - Perhaps this signal should be used to drive rendering task priority?
  - I've heard that the current scheduling policy works well for early loading use cases to balance latency and throughput, and that we've wanted to experiment with a higher priority rendering mode post-load but worried about over-scheduling no-op rAF loops... maybe this feature helps alleviate that risk?
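For what it's worth, the 8-10hz figure is roughly what a back-of-the-envelope model of the NORMAL-then-HIGHEST policy predicts. The sketch below is only my simplification (one FIFO queue, non-preemptible tasks, a backlog that refills to the same level before every frame); the 100ms threshold comes from the policy described above, and every other name and number is a hypothetical.

```typescript
// Toy model, not the real scheduler: one FIFO queue of normal-priority
// tasks, plus a BMF task posted at NORMAL priority and boosted to HIGHEST
// once it has waited `boostAfterMs`.
interface PolicyModel {
  vsyncMs: number;       // frames cannot start more often than once per vsync
  backlogMs: number;     // normal-priority work already queued ahead of the BMF task
  taskMs: number;        // length of each queued task (tasks are not preempted)
  frameMs: number;       // main-thread cost of the BMF task itself
  boostAfterMs: number;  // wait before the BMF task is boosted to HIGHEST
}

// How long the BMF task waits in the queue before it starts.
function bmfWaitMs(m: PolicyModel): number {
  // Once boosted, the BMF task runs at the next task boundary.
  const boostedStartMs = Math.ceil(m.boostAfterMs / m.taskMs) * m.taskMs;
  // Without enough backlog it simply runs when the queue drains (FIFO).
  return Math.min(m.backlogMs, boostedStartMs);
}

// Steady-state main-thread rendering rate, assuming the next BMF task is
// posted as soon as the previous one finishes.
function renderRateHz(m: PolicyModel): number {
  const intervalMs = Math.max(m.vsyncMs, bmfWaitMs(m) + m.frameMs);
  return 1000 / intervalMs;
}

const policy = { vsyncMs: 16, taskMs: 10, frameMs: 5, boostAfterMs: 100 };
console.log(renderRateHz({ ...policy, backlogMs: 500 })); // saturated queue
console.log(renderRateHz({ ...policy, backlogMs: 0 }));   // empty queue
```

With a saturated queue the model lands at roughly 9.5hz, because every frame pays the full 100ms wait plus its own cost. With an empty queue it is limited only by vsync.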
Cheers!