Intent to Implement: Cooperative Scheduling

Taiju Tsuiki

Feb 1, 2018, 6:48:20 AM
to blink-dev

Contact emails

tz...@chromium.org, schedu...@chromium.org, platform-arc...@chromium.org


Explainer

Cooperative Scheduling makes long-running third-party JS tasks yield the main thread so that the renderer stays responsive. The change itself doesn’t add or update any API or spec; it changes the internal task execution semantics, which is within the UA’s discretion.


Design doc/Spec

Design doc: Cooperative Scheduling.


Summary

Stop V8 execution at certain safepoints and run a nested message loop when a long-running task from a third-party iframe blocks the renderer main thread, so that the renderer stays responsive. To do this safely and in a spec-compliant way, we’ll reorganize schedulers and task queues into an EventLoop and introduce an opt-in scope to manage reentrancy, as described in the design doc.
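
As a rough illustration of the control flow described in this summary, here is a toy Python model. It is not Chromium code: `ToyEventLoop`, `safepoint`, and the origin labels are hypothetical names, and the sketch yields at every safepoint rather than only after a task has run for a long time.

```python
from collections import deque


class ToyEventLoop:
    """Toy single-threaded loop; all names are hypothetical, not Chromium's."""

    def __init__(self):
        self.queue = deque()  # pending (origin, callable) tasks
        self.log = []         # execution trace, for inspection
        self.nested = False   # True while a nested loop is running

    def post(self, origin, fn):
        self.queue.append((origin, fn))

    def run(self):
        while self.queue:
            _origin, fn = self.queue.popleft()
            fn(self)

    def safepoint(self, origin):
        """Called by a long-running third-party task at a safe point."""
        # Only third-party work yields, and only one level deep.
        if origin != "third-party" or self.nested:
            return
        self.nested = True
        try:
            # Nested loop: drain the tasks that were waiting behind this one.
            self.run()
        finally:
            self.nested = False


def long_ad_task(loop):
    loop.log.append("ad: start")
    for chunk in range(3):
        loop.log.append(f"ad: chunk {chunk}")
        loop.safepoint("third-party")
    loop.log.append("ad: end")


def input_handler(loop):
    loop.log.append("page: handle input")


loop = ToyEventLoop()
loop.post("third-party", long_ad_task)
loop.post("first-party", input_handler)
loop.run()
print(loop.log)
```

In this model the first-party input handler runs after the first chunk of the third-party task instead of waiting for the whole task to finish.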


Motivation

The motivation is to improve renderer responsiveness. A long-running task blocks the main thread, which stalls all other content sharing that main thread.


Risks

Interoperability and Compatibility

We see no risk here: Cooperative Scheduling modifies no spec or API, and the change is within the UA’s discretion.


Ergonomics

Cooperative Scheduling introduces an opt-in scope for running a nested message loop. We have to keep the call path around the scope reentrant.
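
One way to picture such an opt-in scope is a guard that must be active before any nested loop may run. The sketch below is an assumption-laden toy in Python, not the design doc's API: `Scheduler`, `allow_nested_loop`, and `maybe_yield` are hypothetical names.

```python
from contextlib import contextmanager


class Scheduler:
    """Toy scheduler; every name here is hypothetical."""

    def __init__(self):
        self.allow_depth = 0         # > 0 while inside an opt-in scope
        self.in_nested_loop = False  # prevents nesting more than one level
        self.pending = []            # callables waiting for the main thread
        self.ran = []                # results of tasks run in the nested loop

    @contextmanager
    def allow_nested_loop(self):
        """Opt-in scope: the caller promises its call path is reentrant."""
        self.allow_depth += 1
        try:
            yield
        finally:
            self.allow_depth -= 1

    def maybe_yield(self):
        """Run pending tasks in a nested loop, but only inside an opt-in scope."""
        if self.allow_depth == 0 or self.in_nested_loop:
            return False
        self.in_nested_loop = True
        try:
            while self.pending:
                task = self.pending.pop(0)
                self.ran.append(task())
        finally:
            self.in_nested_loop = False
        return True


sched = Scheduler()
sched.pending.append(lambda: "paint")

outside = sched.maybe_yield()      # not opted in: nothing runs
ran_outside = list(sched.ran)

with sched.allow_nested_loop():
    inside = sched.maybe_yield()   # opted in: the pending task runs
```

Code paths that never enter the scope never observe a nested loop, so only the paths that opt in need to be audited for reentrancy.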


Activation

N/A. Web developers need no action for this.


Debuggability

N/A.


Will this feature be supported on all six Blink platforms (Windows, Mac, Linux, Chrome OS, Android, and Android WebView)?

No. This feature targets Android only, at least initially.

On desktop browsers, OOPIF (out-of-process iframes) should cover this case instead of Cooperative Scheduling.


Link to entry on the feature dashboard

N/A.


Requesting approval to ship?

No for now. We’ll implement it behind a flag.


Kentaro Hara

Feb 2, 2018, 12:56:48 AM
to Taiju Tsuiki, blink-dev
I'd emphasize that cooperative scheduling provides a mechanism to execute all third-party iframes in a jank-free manner, which should be a huge win for responsiveness :) According to tzik's experiment, it reduces the number of >100 ms tasks on http://www.cricbuzz.com by 30% on low-end Android.

Mozilla already shipped the cooperative scheduling as part of the Quantum project.

I'm super excited about it!




--
You received this message because you are subscribed to the Google Groups "blink-dev" group.
To view this discussion on the web visit https://groups.google.com/a/chromium.org/d/msgid/blink-dev/CAFK_eqQOzc7cU0QuRMbiJ_Ywcgn2TYaQNdOv6szWAbMFeMAFow%40mail.gmail.com.



--
Kentaro Hara, Tokyo, Japan

Rick Byers

Feb 2, 2018, 3:39:32 AM
to Kentaro Hara, Taiju Tsuiki, blink-dev
I'm also really excited about this - not just because it'll improve the user experience on many sites, but because (like site isolation) it provides performance isolation between frames - in some cases giving developers more control over (and ability to reason about changes in) the user experience.

For example, a developer may be able to control all the code that runs in their frame, and optimize it for a short TTI.  But as soon as they need to host a 3rd party iframe, their performance metrics can be hurt in a way they can't directly control (and the third party may feel little incentive to invest in improving the metric for their own code - incentives based on data like the Chrome UX report apply only to whole pages).  Worse, for a metric like TTI, any improvements or regressions they have in their frame may be completely masked by the impact of the worst iframe on the page.

Of course much of the performance problem on the mobile web is not cleanly confined to iframes (even ad networks often run much of their expensive code in the top document). But I think enabling the separation is an important pre-requisite.  Once we're able to announce the real-world impact of this and site isolation (hopefully including data in the Chrome UX report) I'm hopeful that we'll see more sites choose to leverage iframes to insulate themselves from code whose performance they don't trust without having to give up on the composition and monetization models that have allowed the web to thrive.

Boris Zbarsky

Feb 2, 2018, 9:54:18 AM
to blink-dev
On 2/2/18 12:56 AM, Kentaro Hara wrote:
> Mozilla already shipped
> <https://hacks.mozilla.org/2017/06/an-inside-look-at-quantum-dom-scheduling/>
> the cooperative scheduling as part of the Quantum project.

Just to be clear, that blog post describes proposed work in progress.
We (Mozilla) are not in fact shipping cooperative scheduling so far, and
it's unclear that we will be at all, given initial measurements.

Not that this should affect what Chrome does, but I figured I'd set the
record straight on what Mozilla is doing. ;)

-Boris

Kentaro Hara

Feb 2, 2018, 10:16:14 AM
to Boris Zbarsky, blink-dev
On Fri, Feb 2, 2018 at 11:54 PM, Boris Zbarsky <bzba...@mit.edu> wrote:
> On 2/2/18 12:56 AM, Kentaro Hara wrote:
> > Mozilla already shipped <https://hacks.mozilla.org/2017/06/an-inside-look-at-quantum-dom-scheduling/> the cooperative scheduling as part of the Quantum project.
>
> Just to be clear, that blog post describes proposed work in progress. We (Mozilla) are not in fact shipping cooperative scheduling so far, and it's unclear that we will be at all, given initial measurements.

I apologize for my misunderstanding. And thanks for the info!

> given initial measurements.

I'm just curious but does this mean that you didn't get a clear performance win (if you don't mind sharing it)?

> Not that this should affect what Chrome does, but I figured I'd set the record straight on what Mozilla is doing.  ;)
>
> -Boris

Boris Zbarsky

Feb 2, 2018, 11:20:54 AM
to Kentaro Hara, blink-dev
On 2/2/18 10:15 AM, Kentaro Hara wrote:
> I'm just curious but does this mean that you didn't get a clear
> performance win (if you don't mind sharing it)?

I haven't been following this closely, but my impression is that at
least for the moment the ratio of potential wins (that would be the
initial measurements) to engineering effort is less than for other
things we can work on, so we're focusing on those.

And once site isolation happens, the potential wins here need to be
reevaluated, of course.

-Boris

Kentaro Hara

Feb 2, 2018, 11:38:59 AM
to Boris Zbarsky, blink-dev
Thanks Boris!

> And once site isolation happens, the potential wins here need to be reevaluated, of course.

Yeah, our plan is to have both OOPIF and cooperative scheduling. For example, it would be hard to enable OOPIF on low-memory mobile devices. Even on desktops it's not uncommon that one renderer process is shared by multiple tabs. My assumption is that the cooperative scheduling would be useful for those scenarios. In other words, the cooperative scheduling is expected to solve jank issues that cannot be solved by OOPIF :)

Alexander Timin

Feb 2, 2018, 1:50:37 PM
to Kentaro Hara, Taiju Tsuiki, Boris Zbarsky, blink-dev, scheduler-dev, Gabriel Charette, Alex Clarke, Sami Kyostila
Cooperative scheduling sounds like a great way to localize performance on mobile!

However, I'm very scared of nested message loops (we've had a great deal of problems with them already in the scheduler) and I'd like to avoid increasing their usage in Chromium by an order of magnitude.

What we want to do here is implement user-space execution context switching. I wonder if, instead of using message loops, we could implement it with a coroutine-like approach: saving the stack state using setjmp/longjmp and yielding control back to the scheduler?

The benefits include fewer reentrancy problems and explicit control of the execution flow from the scheduler (e.g. better control over when we can resume execution of an interrupted task).
The other issue that this approach addresses is security -- this way we won't end up with frames from both the main frame and a cross-origin iframe on the same stack.
+alexclarke@, skyostil@, gab@ and scheduler-dev@.
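
To make the coroutine-like control flow concrete, here is a minimal Python sketch using generators. It only illustrates the scheduling shape (a task suspends at a safepoint and the scheduler chooses what runs next); it says nothing about how stack saving would work in C++, and `third_party_task`, `first_party_task`, and `run` are hypothetical names.

```python
from collections import deque


def third_party_task(log):
    for chunk in range(3):
        log.append(f"iframe chunk {chunk}")
        yield  # safepoint: hand control back to the scheduler


def first_party_task(log):
    log.append("main frame task")
    yield  # single safepoint before finishing


def run(tasks, log):
    """Round-robin scheduler: resumes each task until its next safepoint."""
    ready = deque(tasks)
    while ready:
        task = ready.popleft()
        try:
            next(task)          # resume until the next yield
        except StopIteration:
            continue            # task finished, drop it
        ready.append(task)      # the scheduler decides when it resumes


log = []
run([third_party_task(log), first_party_task(log)], log)
print(log)
```

Because each suspended task keeps its own frame, the main-frame task never runs on top of the iframe task's stack; the scheduler's loop is the only caller of both.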

P. S. I'd say that this has low to medium compat risks -- at the moment web developers do not expect that their scripts can be interrupted in the middle and this can break some metrics and analytics. It's likely that we'll have to expose some information about preemption to web devs.



Daniel Cheng

Feb 2, 2018, 2:20:05 PM
to Alexander Timin, Kentaro Hara, Taiju Tsuiki, Boris Zbarsky, blink-dev, scheduler-dev, Gabriel Charette, Alex Clarke, Sami Kyostila
On Fri, Feb 2, 2018 at 10:50 AM Alexander Timin <alt...@chromium.org> wrote:
> Cooperative scheduling sounds like a great way to localize performance on mobile!
>
> However, I'm very scared of nested message loops (we've had a great deal of problems with them already in the scheduler) and I'd like to avoid increasing their usage in Chromium by an order of magnitude.
>
> What we want to do here is implement user-space execution context switching. I wonder if, instead of using message loops, we could implement it with a coroutine-like approach: saving the stack state using setjmp/longjmp and yielding control back to the scheduler?
>
> The benefits include fewer reentrancy problems and explicit control of the execution flow from the scheduler (e.g. better control over when we can resume execution of an interrupted task).

From my understanding of the proposal, we don't allow unlimited re-entrancy. I'd also be quite nervous about setjmp() and longjmp() and how they interact with things like RAII and destructors.

> The other issue that this approach addresses is security -- this way we won't end up with frames from both the main frame and a cross-origin iframe on the same stack.
> +alexclarke@, skyostil@, gab@ and scheduler-dev@.

I'm not sure how setjmp() and longjmp() would help with this: the state I'd be most concerned about is global state (e.g. what is the current execution context).

> P. S. I'd say that this has low to medium compat risks -- at the moment web developers do not expect that their scripts can be interrupted in the middle and this can break some metrics and analytics. It's likely that we'll have to expose some information about preemption to web devs.

I don't think this would break things any more than OOPIF would, since a context can only yield to a cross-site context.

Daniel
 



Kentaro Hara

Feb 4, 2018, 8:44:49 PM
to Daniel Cheng, Alexander Timin, Taiju Tsuiki, Boris Zbarsky, blink-dev, scheduler-dev, Gabriel Charette, Alex Clarke, Sami Kyostila
> However, I'm very scared of nested message loops (we've had a great deal of problems with them already in the scheduler) and I'd like to avoid increasing their usage in Chromium by an order of magnitude.

Thanks Alexander -- this is a valid concern.

To mitigate the concern, in the short term I'm planning to enable cooperative scheduling only for the following stack:

  (no Blink C++ stack) => cross-origin V8 => (yield) => main thread task

From the performance perspective, I think this is already enough to dramatically reduce jank caused by cross-origin frames.

Do you think it will help?

========

If we want to support more cases, we can rewrite Blink so that a cross-origin V8 execution starts without having a Blink C++ stack. For example, the proposed cooperative scheduling cannot support a case where Blink runs a parser-blocking script of a cross-origin frame, because it has the following stack:

  (Blink C++ stack) => cross-origin V8 => (yield) => main thread task

Then we rewrite Blink as follows:

  (Blink C++ stack) => (post a task to run the parser-blocking script)
  (no Blink C++ stack) => cross-origin V8 => (yield) => main thread task

Then the cooperative scheduling can be enabled :)
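
The rewrite above can be modeled with a small Python toy. This is only a sketch under stated assumptions: `MainThread`, `blink_depth`, `parse_sync`, and `parse_posting` are hypothetical names, the Blink C++ stack is reduced to a counter, and the sketch ignores that the parser must stay paused until a parser-blocking script has run.

```python
from collections import deque


class MainThread:
    """Toy model of the main thread; all names are hypothetical."""

    def __init__(self):
        self.tasks = deque()
        self.blink_depth = 0  # modeled number of Blink C++ frames on the stack
        self.log = []

    def post(self, fn):
        self.tasks.append(fn)

    def run_cross_origin_script(self, name):
        # Yielding is only allowed when no Blink C++ frame sits below the script.
        can_yield = self.blink_depth == 0
        self.log.append((name, "yield allowed" if can_yield else "yield blocked"))

    def parse_sync(self, name):
        # Before the rewrite: the parser calls into the script directly.
        self.blink_depth += 1
        try:
            self.run_cross_origin_script(name)
        finally:
            self.blink_depth -= 1

    def parse_posting(self, name):
        # After the rewrite: the parser only posts a task, then unwinds.
        self.blink_depth += 1
        try:
            self.post(lambda: self.run_cross_origin_script(name))
        finally:
            self.blink_depth -= 1

    def run(self):
        while self.tasks:
            self.tasks.popleft()()


mt = MainThread()
mt.post(lambda: mt.parse_sync("script-a"))
mt.post(lambda: mt.parse_posting("script-b"))
mt.run()
print(mt.log)
```

Here `script-a` starts with a modeled Blink frame beneath it, so it may not yield, while `script-b` starts from a fresh task and may.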




On Sat, Feb 3, 2018 at 4:19 AM, Daniel Cheng <dch...@chromium.org> wrote:
> On Fri, Feb 2, 2018 at 10:50 AM Alexander Timin <alt...@chromium.org> wrote:
> > Cooperative scheduling sounds like a great way to localize performance on mobile!
> >
> > However, I'm very scared of nested message loops (we've had a great deal of problems with them already in the scheduler) and I'd like to avoid increasing their usage in Chromium by an order of magnitude.
> >
> > What we want to do here is implement user-space execution context switching. I wonder if, instead of using message loops, we could implement it with a coroutine-like approach: saving the stack state using setjmp/longjmp and yielding control back to the scheduler?
> >
> > The benefits include fewer reentrancy problems and explicit control of the execution flow from the scheduler (e.g. better control over when we can resume execution of an interrupted task).
>
> From my understanding of the proposal, we don't allow unlimited re-entrancy. I'd also be quite nervous about setjmp() and longjmp() and how they interact with things like RAII and destructors.

Yeah, at first I was thinking about introducing user-level context switching (or green threads) but realized that it's pretty complex.

> > The other issue that this approach addresses is security -- this way we won't end up with frames from both the main frame and a cross-origin iframe on the same stack.
> > +alexclarke@, skyostil@, gab@ and scheduler-dev@.
>
> I'm not sure how setjmp() and longjmp() would help with this: the state I'd be most concerned about is global state (e.g. what is the current execution context).

Agreed.

> > P. S. I'd say that this has low to medium compat risks -- at the moment web developers do not expect that their scripts can be interrupted in the middle and this can break some metrics and analytics. It's likely that we'll have to expose some information about preemption to web devs.
>
> I don't think this would break things any more than OOPIF would, since a context can only yield to a cross-site context.
>
> Daniel
 

