ScopedFrame/AgentPauser

1,132 views
Skip to first unread message

Kentaro Hara

unread,
Aug 8, 2023, 3:50:55 AM8/8/23
to platform-architecture-dev, yangguo, Charlie Reis, Alex Moshchuk, Kenichi Ishibashi (via Google Docs)
Hi

Context: Recently bashi@ experimented with ProcessPerSite (TL;DR: Let multiple same-origin tabs share one renderer process) and it produced promising results to improve the global LCP. There are a couple of blockers to make ProcessPerSite launchable and we are resolving it one by one.

Problem: One blocker is ScopedPagePauser used by DevTools (see this bug for the detail). DevTools uses the ScopedPagePauser to pause the execution and debug. This causes a problem with ProcessPerSite because the ScopedPagePauser is a singleton in the renderer process and pauses all pages inside the process. e.g., when you pause an execution of example.com in one tab, example.com in the second tab is paused too. In theory, this is an existing bug because Chrome may put multiple tabs in one renderer process when it exceeds the soft process limit (see the "Process Reuse" section of this doc). However, it is rare enough. ProcessPerSite makes the problem happen much more often.

Proposed solution: MBI implemented a per-frame scheduler / per-agent scheduler. My proposal is to implement ScopedFramePauser / ScopedAgentPauser, which pauses tasks of the specific frames / agents. One caveat is that not 100% of the tasks go through the per-frame / per-agent schedulers (some of the tasks go through the global page scheduler -- we should fix them) but I think the coverage is close to 100%.

Will it solve the problem of DevTools? :)

Also, I think we should replace other usages of ScopedPagePauser (e.g., OpenJavaScriptDialog() and Print()) with ScopedFrame/AgentPauser.


--
Kentaro Hara, Tokyo

Yuki Shiino

unread,
Aug 8, 2023, 4:30:01 AM8/8/23
to Kentaro Hara, platform-architecture-dev, yangguo, Charlie Reis, Alex Moshchuk, Kenichi Ishibashi (via Google Docs), Lei Zhang
+cc: thestig@ just FYI,

Probably this will unblock https://crbug.com/956832 and https://crbug.com/1316315 . A long wanted feature.


2023年8月8日(火) 16:50 Kentaro Hara <har...@chromium.org>:
--
You received this message because you are subscribed to the Google Groups "platform-architecture-dev" group.
To unsubscribe from this group and stop receiving emails from it, send an email to platform-architect...@chromium.org.
To view this discussion on the web visit https://groups.google.com/a/chromium.org/d/msgid/platform-architecture-dev/CABg10jyDoc2r4-JyZtSUKqPvhnTG%3D3TkC6wF%3D%3DqPXRuZH9TUTw%40mail.gmail.com.

Kentaro Hara

unread,
Aug 8, 2023, 4:36:40 AM8/8/23
to Yuki Shiino, Kouhei Ueno, platform-architecture-dev, yangguo, Charlie Reis, Alex Moshchuk, Kenichi Ishibashi (via Google Docs), Lei Zhang
@Kouhei Ueno  pointed out that Charlie is already discussing this issue in this comment.

Charlie wrote:
I don't think changing ScopedPagePauser from process-wide to page or frame specific is a viable option.  There could be other pages in the same BrowsingContextGroup and process (e.g., a same-origin opener) which must also be paused, or else the opener can interact with a page that is stuck on a breakpoint.

In Blink terms, I think this means it would need to be PageGroup-wide, which is the representation of a BrowsingContextGroup in a given process.  (In browser process terms, DevTools could probably allow a different BrowsingInstance in the same renderer process to keep running during a breakpoint.)

Would it be hard for Blink to get a set of Pages in the same BrowsingContextGroup? Let's call it a PageGroup.

The browser process has the information. For example, can the browser process send an UnguessableToken of the BrowsingContextGroup when creating a Page in Blink? Then Blink can create the PageGroup and update ScopedPagePauser::SetPaused() to:

for (auto* page : page_group) {
  page->SetPaused(true);
}

As Yukishiino mentioned, this will fix the problems of OpenJavaScriptDialog(), Print() etc :)





--
Kentaro Hara, Tokyo

Dominic Farolino

unread,
Aug 8, 2023, 10:19:06 AM8/8/23
to Kentaro Hara, Yuki Shiino, Kouhei Ueno, platform-architecture-dev, yangguo, Charlie Reis, Alex Moshchuk, Kenichi Ishibashi (via Google Docs), Lei Zhang
Doesn't AgentGroupScheduler already cover multiple blink::Pages in the same browsing context group? In the same-origin opener case that Charlie mentioned, I believe the newly-created blink::Page/blink::WebView is governed by the opener's AgentGroupScheduler, which would mean we already have the primitives required to achieve this. At least I'm pretty sure that was the motivation with MBI. Consider this flow:

  old_page->CreateNewWindow()

I'll admit this is just a cursory code search, but it's motivated by my hunch that trying to introduce something like PageGroup, specifically for scheduling, is redundant over the existing MBI primitives. And separately:

Would it be hard for Blink to get a set of Pages in the same BrowsingContextGroup? Let's call it a PageGroup.

I think this might already be covered by the concept of "related" pages in Blink? See blink::Page::RelatedPages(), which consults a "circular, double-linked list of pages that are related to the current browsing context". We should confirm whether that gives us all top-level browsing contexts in a particular BCG, but I think it might.

Kentaro Hara

unread,
Aug 8, 2023, 11:42:39 AM8/8/23
to Dominic Farolino, Yuki Shiino, Kouhei Ueno, platform-architecture-dev, yangguo, Charlie Reis, Alex Moshchuk, Kenichi Ishibashi (via Google Docs), Lei Zhang
Thanks Dominic! We're getting close :)

I agree that blink::Page::RelatedPages() will *mostly* give us what we need now *if* an AgentGroupScheduler is 1:1 with a SiteInstanceGroup.

A few things:

- An AgentGroupScheduler is 1:1 with a SiteInstanceGroup when the MBI mode is on. Currently the MBI mode is off, and the AgentGroupScheduler is created per renderer process (so blink::Page::RelatedPages() is equal to blink::Page::OrdinaryPages()). Is my understanding correct?

- Enabling the MBI mode will require more engineering work. In the meantime, I think we can just implement blink::Page::RelatedPages() so that it returns the PageGroup regardless of whether the MBI mode is on or off. It should be easy. This unblocks us.

- I wrote "mostly" because I think that blink::Page::RelatedPages() should return pages in the SiteInstance, not the SiteInstanceGroup. The SiteInstanceGroup is a performance isolation unit and may contain an arbitrary number of SiteInstances in the BrowsingContextGroup. If blink::Page::RelatedPages() returns pages in the SiteInstanceGroup, what SiteInstances are paused by the ScopedPagePauser depends on the site isolation policy. To make the behavior deterministic, blink::Page::RelatedPages() should return pages in the SiteInstance (not the SiteInstanceGroup). It won't be hard to implement this.

Overall my proposal is: Regardless of whether the MBI mode is enabled or not, implement blink::Page::PagesInTheSameSiteInstance(). And make the ScopedPagePauser use it.

Does it make sense? :)

--
Kentaro Hara, Tokyo

Charlie Reis

unread,
Aug 8, 2023, 7:25:50 PM8/8/23
to Kentaro Hara, Dominic Farolino, Yuki Shiino, Kouhei Ueno, platform-architecture-dev, yangguo, Alex Moshchuk, Kenichi Ishibashi (via Google Docs), Lei Zhang, Chromium Site Isolation, Daniel Cheng, Camille Lamy
[+site-isolation-dev, dcheng, clamy]

At a high level, I'm excited to see work to improve ScopedPagePauser to be at the granularity of Browsing Context Groups, which is useful whether or not ProcessPerSite mode is used more often or not.  (I think there are some big caveats in the experiment results that we still need to discuss before decisions about ProcessPerSite are made.)

blink::Page::RelatedPages() is the current approximation of Browsing Context Groups in Blink, and ahemery@ added Page::BrowsingContextGroupToken() as well in r1145881 for the COOP:RP work.  He filed https://crbug.com/1464622 to create an actual BrowsingContextGroup class, as an improvement over tracking the tokens.

I don't think you want to use SiteInstanceGroup or AgentSchedulingGroup for any of this-- both of those can be tuned and change their behavior over time, as haraken@ notes.  SiteInstance is also not the right granularity-- DevTools needs to pause everything in the Browsing Context Group in a given process, not just the SiteInstance.

So, we should be aiming to pause all pages with the same BrowsingContextGroupToken(), which I understand to be the pages in blink::Page::RelatedPages().

Charlie

Kentaro Hara

unread,
Aug 8, 2023, 8:55:15 PM8/8/23
to Charlie Reis, Dominic Farolino, Yuki Shiino, Kouhei Ueno, platform-architecture-dev, yangguo, Alex Moshchuk, Kenichi Ishibashi (via Google Docs), Lei Zhang, Chromium Site Isolation, Daniel Cheng, Camille Lamy
I don't think you want to use SiteInstanceGroup or AgentSchedulingGroup for any of this-- both of those can be tuned and change their behavior over time, as haraken@ notes.  SiteInstance is also not the right granularity-- DevTools needs to pause everything in the Browsing Context Group in a given process, not just the SiteInstance.

SiteInstance defines a group of documents that are scriptable with each other. Would you help me understand why DevTools needs to pause things that are not scriptable from the main frame?

If DevTools pauses everything in the BrowsingContextGroup, I was concerned that it will lead to a non-deterministic behavior. Imagine you have A(B). A is the main frame from a.com and B is a subframe from b.com. A and B have different SiteInstances. A and B are in the same BrowsingContextGroup. A and B may belong to a different renderer process depending on the performance policy.

If DevTools pauses only the SiteInstance of A, A is paused and B is not paused. This is deterministic.

DevTools pauses everything in the BrowsingContextGroup, A is paused and B may or may not be paused depending on whether A and B belong to the same renderer process or not. This is non-deterministic.

Am I missing something...?
 
So, we should be aiming to pause all pages with the same BrowsingContextGroupToken(), which I understand to be the pages in blink::Page::RelatedPages().

I'm very happy with this simple conclusion though :)

--
Kentaro Hara, Tokyo

Yang Guo

unread,
Aug 8, 2023, 9:10:42 PM8/8/23
to Kentaro Hara, Benedikt Meurer, Danil Somsikov, Jaroslav Sevcik, platform-architecture-dev, Charlie Reis, Alex Moshchuk, Kenichi Ishibashi (via Google Docs)
Hi,

Thanks for seriously considering concerns from the DevTools team! It's great however that the experiment produced promising results, enough to pursue this idea further.

I'll leave it up to +Benedikt Meurer , +Danil Somsikov , and +Jaroslav Sevcik to decide whether the proposed solution works, and comment on the details of the proposal.

Cheers,

Yang




--

Charlie Reis

unread,
Aug 9, 2023, 12:15:11 PM8/9/23
to Yang Guo, Kentaro Hara, Benedikt Meurer, Danil Somsikov, Jaroslav Sevcik, platform-architecture-dev, Alex Moshchuk, Kenichi Ishibashi (via Google Docs), Rakina Zata Amni
On Tue, Aug 8, 2023 at 5:55 PM Kentaro Hara <har...@chromium.org> wrote:
I don't think you want to use SiteInstanceGroup or AgentSchedulingGroup for any of this-- both of those can be tuned and change their behavior over time, as haraken@ notes.  SiteInstance is also not the right granularity-- DevTools needs to pause everything in the Browsing Context Group in a given process, not just the SiteInstance.

SiteInstance defines a group of documents that are scriptable with each other. Would you help me understand why DevTools needs to pause things that are not scriptable from the main frame?

If DevTools pauses everything in the BrowsingContextGroup, I was concerned that it will lead to a non-deterministic behavior. Imagine you have A(B). A is the main frame from a.com and B is a subframe from b.com. A and B have different SiteInstances. A and B are in the same BrowsingContextGroup. A and B may belong to a different renderer process depending on the performance policy.

If DevTools pauses only the SiteInstance of A, A is paused and B is not paused. This is deterministic.

DevTools pauses everything in the BrowsingContextGroup, A is paused and B may or may not be paused depending on whether A and B belong to the same renderer process or not. This is non-deterministic.

Am I missing something...?

This is a good question, and it has more to do with Chrome's implementation than the basic concept.  In theory, SiteInstances are independent of each other and we could pause one without pausing the others.  But in practice, the code in a given renderer process doesn't run them fully independently.

One example: functions like focus, close, and postMessage do different things based on whether it's LocalFrame or RemoteFrame.  If A(B) are in two different SiteInstances in different processes (e.g., on desktop), then focus calls between them will use the RemoteFrame version, and send an IPC without blocking.  However, if A(B) are two different SiteInstances in the same SiteInstanceGroup and process (e.g., what we're trying to build for Android), they'll use the LocalFrame versions when calling focus on each other.  I suspect those will try to do the operation immediately and end up blocked if one of those SiteInstances were frozen by DevTools.  (I haven't verified this.)

A second example: if A(B) are in the same SiteInstanceGroup and process, they will share the same RenderWidget.  This means any painting/input work done by either one affects the other, and it seems pretty unlikely to me that we can leave the RenderWidget unblocked for B while A is frozen in DevTools.

I could be wrong, and it might be possible to make these things work, but I suspect we'll need to freeze any same-BCG frames within the renderer process.  (I do agree it has the downside that a different number of frames will be frozen depending on how many frames are in the process, though there's probably other ways to observe behavior differences with OOPIFs in DevTools anyway.)

Adding rakina@, since this point may also matter for any plans to allow bfcache to work for cross-SiteInstance cases within a BrowsingContextGroup.  (That may have to avoid cases where two SiteInstances share a process and thus a RenderWidget, etc.)

Charlie

Kentaro Hara

unread,
Aug 9, 2023, 8:10:14 PM8/9/23
to Charlie Reis, Yang Guo, Benedikt Meurer, Danil Somsikov, Jaroslav Sevcik, platform-architecture-dev, Alex Moshchuk, Kenichi Ishibashi (via Google Docs), Rakina Zata Amni
Thanks Charlie! I'm convinced.

Maybe the next step is to update ScopedPagePauser to use blink::Page::RelatedPages() and investigate problems it may cause?

It's actually a one-line change :D
--
Kentaro Hara, Tokyo

Kenichi Ishibashi

unread,
Aug 28, 2023, 1:26:00 AM8/28/23
to Kentaro Hara, Charlie Reis, Yang Guo, Benedikt Meurer, Danil Somsikov, Jaroslav Sevcik, platform-architecture-dev, Alex Moshchuk, Rakina Zata Amni
Sorry for reviving this, but I have a question.

ScopedPagePauser holds MainThreadScheduler::RendererPauseHandle, which is also a singleton in a renderer effectively. Do we also need to make it per browsing context group? If so, how do we do that? Could something like `map<BrowsingContextGroupToken, int>` work?

Kentaro Hara

unread,
Aug 28, 2023, 2:16:49 AM8/28/23
to Kenichi Ishibashi, Charlie Reis, Yang Guo, Benedikt Meurer, Danil Somsikov, Jaroslav Sevcik, platform-architecture-dev, Alex Moshchuk, Rakina Zata Amni
MainThreadScheduler is a thing for the main thread. Thus our BrowsingContextGroupPagePauser should NOT pause it. If we pause this, a BrowsingContextGroupPagePauser on one tab ends up pausing the execution of a different tab because the main thread stops working.

I think you can just stop calling MainThreadScheduler::RendererPauseHandle. In theory :)

 
--
Kentaro Hara, Tokyo

Dave Tapuska

unread,
Sep 18, 2023, 12:17:23 PM9/18/23
to Kentaro Hara, Kenichi Ishibashi, Charlie Reis, Yang Guo, Benedikt Meurer, Danil Somsikov, Jaroslav Sevcik, platform-architecture-dev, Alex Moshchuk, Rakina Zata Amni
Maybe I am missing something here. But the whole IPC channel still serializes across the same pipe and we can't really run some things on that serialized channel and not others. That was the goal of MBI: that we could separate the IPC pipe per blink::AgentGroupScheduler and then enforce that everything in that group was able to be paused.

I thought AgentGroupScheduler from an spec point of view is intended to match the BrowsingContextGroup. (in the ideal scenario). Specifically, if we would have two tabs (and they are unrelated, we'd have two AgentGroupSchedulers (two separate IPC pipes, two v8::Isolates, etc...)).

dave.

Kentaro Hara

unread,
Sep 18, 2023, 8:50:43 PM9/18/23
to Dave Tapuska, Kenichi Ishibashi, Charlie Reis, Yang Guo, Benedikt Meurer, Danil Somsikov, Jaroslav Sevcik, platform-architecture-dev, Alex Moshchuk, Rakina Zata Amni
Yes, that's the goal of MBI. We want to create an IPC channel per AgentGroupScheduler.

My point is that ProcessPerSite is not blocked by MBI because:

- Sharing one renderer process among multiple tabs is already happening (e.g., after reaching a soft limit on Desktop).
- DevTools only freezes task schedulers, not the IPC channel. As long as we implement a mechanism to pause task schedulers of RelatedPages() + prevent developers from running DevTools in a nested loop, things will work.

Thoughts? :)
--
Kentaro Hara, Tokyo

Dave Tapuska

unread,
Sep 18, 2023, 10:00:01 PM9/18/23
to Kentaro Hara, Kenichi Ishibashi, Charlie Reis, Yang Guo, Benedikt Meurer, Danil Somsikov, Jaroslav Sevcik, platform-architecture-dev, Alex Moshchuk, Rakina Zata Amni
There still are some ipcs & tasks that touch every frame (like extensions and devtools protocol messages).

Dave

Kentaro Hara

unread,
Sep 18, 2023, 11:53:27 PM9/18/23
to Dave Tapuska, Kenichi Ishibashi, Charlie Reis, Yang Guo, Benedikt Meurer, Danil Somsikov, Jaroslav Sevcik, platform-architecture-dev, Alex Moshchuk, Rakina Zata Amni
I think you are talking about a per-thread task (which runs through the per-thread scheduler).

Currently ScopedPagePauser pauses the per-thread task because it pauses all the tasks in AllPages(). This is problematic when the renderer process is shared between multiple tabs (common: the main frame of one tab + sub frames of another tab, rare: the main frame of one tab + the main frame of another tab). This is broken and we need to fix it.

ProcessPerSite changes it to RelatedPages() and does not pause the per-thread task. This will fix the problem in the above but have a risk of running tasks that should not run.

So both may result in broken behaviors. The right solution is to move the tasks to the per-frame schedulers maybe?


--
Kentaro Hara, Tokyo

Dominic Farolino

unread,
Sep 19, 2023, 8:33:28 AM9/19/23
to Kentaro Hara, Dave Tapuska, Kenichi Ishibashi, Charlie Reis, Yang Guo, Benedikt Meurer, Danil Somsikov, Jaroslav Sevcik, platform-architecture-dev, Alex Moshchuk, Rakina Zata Amni
So both may result in broken behaviors. The right solution is to move the tasks to the per-frame schedulers maybe?

This, I thought, was precisely the goal with MBI. Move everything to per-frame TaskRunners, and then cluster the frames based on ASG. This involves migrating process-wide IPCs (which result in renderer-global tasks) and also  other renderer-wide non-IPC tasks.

Dave Tapuska

unread,
Sep 19, 2023, 8:52:08 AM9/19/23
to Kentaro Hara, Kenichi Ishibashi, Charlie Reis, Yang Guo, Benedikt Meurer, Danil Somsikov, Jaroslav Sevcik, platform-architecture-dev, Alex Moshchuk, Rakina Zata Amni
I was specifically concerned with the Agent group scheduler default task runner. 

Somethings fall back to this task runner (as opposed to the thread task runner (which there are few now) and if we don't pause the agent group scheduler it will be problematic. 

This gets back to Charlie's point saying doing it at the AgentGroupScheduler is the wrong spot. In the ideal world I'd think pausing the agent group scheduler might be the right spot. I'd still have to think hard about what scenarios are problematic for Charlie. 

Secondly there are some assumptions in the MainThreadDebugger which a assume it pauses for the associated isolate. These would be much easier if we had the isolate allocated per agent group scheduler and a main thread debugger per isolate.  

I feel some of this is much simpler with mbi. I've been slowly making progress on that.

But yes my opinion is that moving this page pauses to related pages could be problematic without mbi. 

Dave

Dave Tapuska

unread,
Sep 21, 2023, 11:41:14 AM9/21/23
to Kentaro Hara, Kenichi Ishibashi, Charlie Reis, Yang Guo, Benedikt Meurer, Danil Somsikov, Jaroslav Sevcik, platform-architecture-dev, Alex Moshchuk, Rakina Zata Amni
So I chatted with domfarolino@ and haraken@ today.

I explained that the AgentGroupScheduler has a default task runner and the main thread is currently caused by the PauseHandle. Switching to the group of related pages and removing the pause handle is dangerous. If we do not pause the AgentGroupScheduler we are going to run into a number of issues with IPCs and other tasks running inside the group of related pages that should be paused.

The current thinking is that we should continue working on MBI in order to unblock ProcessPerSite and Cooperative Scheduling, with ProcessPerSite having higher priority.

I explained that we still have some legacy IPC mechanisms that still exist that must be removed:
1) GIN java bridge IPC (Android only, no impact on ProcessPerSite)
2) Android legacy CDM key IPC (Android only, no impact on ProcessPerSite)
3) Extension IPCs (Impacts ProcessPerSite, non trivial, 23+2 messages remain)
4) RenderThread Channel bound IPCs (hopefully can migrate these to be ASG channel bound requests)
5) NACL/PPAPI legacy IPC (verify that there is no impact, I believe these don't communicate via any RenderFrameHost and we should be able to turn legacy IPC off for all renderers other than Nacl/PPAPI if we fix the above).

I'll dedicate some time to #3 and see how we can progress. #2 seems in hand, a CL was posted in June but didn't get reviewed so I've pinged the author.

dave.

Dominic Farolino

unread,
Sep 25, 2023, 11:04:48 AM9/25/23
to Dave Tapuska, Kentaro Hara, Kenichi Ishibashi, Charlie Reis, Yang Guo, Benedikt Meurer, Danil Somsikov, Jaroslav Sevcik, platform-architecture-dev, Alex Moshchuk, Rakina Zata Amni
Thanks for the summary Dave!

On Thu, Sep 21, 2023 at 11:41 AM Dave Tapuska <dtap...@chromium.org> wrote:
So I chatted with domfarolino@ and haraken@ today.

I explained that the AgentGroupScheduler has a default task runner and the main thread is currently caused by the PauseHandle. Switching to the group of related pages and removing the pause handle is dangerous. If we do not pause the AgentGroupScheduler we are going to run into a number of issues with IPCs and other tasks running inside the group of related pages that should be paused.

The current thinking is that we should continue working on MBI in order to unblock ProcessPerSite and Cooperative Scheduling, with ProcessPerSite having higher priority.

I explained that we still have some legacy IPC mechanisms that still exist that must be removed:
1) GIN java bridge IPC (Android only, no impact on ProcessPerSite)

I do wonder if this actually needs to be migrated further after https://chromium-review.googlesource.com/c/chromium/src/+/2681232, which already seems to make the Gin Java Bridge 1:1 with ASG. Maybe that's sufficient (and we can simply migrate the interface from legacy IPC => mojo separately?). Not sure...
 
2) Android legacy CDM key IPC (Android only, no impact on ProcessPerSite)
3) Extension IPCs (Impacts ProcessPerSite, non trivial, 23+2 messages remain)
4) RenderThread Channel bound IPCs (hopefully can migrate these to be ASG channel bound requests)

As discussed in our meeting, the spreadsheet I used to use to track all of these was https://docs.google.com/spreadsheets/d/15gGAwsLNkpECSSz9znsViRxSV-0fYq1GAu_0u25qX5U/edit#gid=0. It's probably a little outdated at this point, but it was a good reference for us when we were working on MBI, to keep track of the various process-global interfaces that were associated with the legacy IPC channel.
 

Kentaro Hara

unread,
Oct 12, 2023, 10:48:00 PM10/12/23
to Dominic Farolino, Dave Tapuska, Kenichi Ishibashi, Charlie Reis, Yang Guo, Benedikt Meurer, Danil Somsikov, Jaroslav Sevcik, platform-architecture-dev, Alex Moshchuk, Rakina Zata Amni
Hi

ProcessPerSite is currently causing some issues with DevTools. If DevTools sets a breakpoint in one tab, DevTools cannot be attached to another tab hosted by the same process (i.e., we cannot support cases where DevTools is attached in a nested message loop). This is because ProcessPerSite creates only one V8 isolate.

Currently we are working on a mitigation plan (and my hope is that we can go with the plan but it requires a product decision) but I'm curious about this question: How much work will be needed to support multiple V8 isolates? More specifically, how much work will be needed to create one V8 isolate per AgentSchedulingGroup?

@Dave Tapuska is removing the usage of v8::Isolate::GetCurrent() from the codebase and we are close to completion (thanks!). My understanding is that this is not enough to create multiple V8 isolates. The biggest challenge is to implement memory isolation for Oilpan. Since Oilpan runs on a per-isolate basis, we need to ban Persistent / Member handles crossing the isolate boundary. We need to replace them with CrossThreadHandle or something else, which will require a lot of engineering work.

Is my understanding correct? Is there any other work needed to create multiple V8 isolates?

(I'm not proposing we do this now. I'm trying to figure out what work is needed.)
--
Kentaro Hara, Tokyo

Dave Tapuska

unread,
Oct 12, 2023, 11:07:04 PM10/12/23
to Kentaro Hara, Dominic Farolino, Kenichi Ishibashi, Charlie Reis, Yang Guo, Benedikt Meurer, Danil Somsikov, Jaroslav Sevcik, platform-architecture-dev, Alex Moshchuk, Rakina Zata Amni
I'll try to prepare a list of things. I believe there were some spreadsheets that had looked at static locals that needed to move to per isolate data.  The isolate is very available so this likely isn't difficult.

The main thing I'm working on now rewriting the ipc channel for extensions. Once we do this then we won't have an iteration across all v8::contexts per thread. I hope that this will only take a couple more weeks, I have is mostly all prototyped but have to land it. 

I told Kouhei when he was visiting Waterloo it would be nice if we had a switch for devtools like Safari (and Android (adb)) does. Then the majority of our customers could have process per site if we had devtools disabled. But I guess that goes against Chrome being easy for developers. 

Dave

Kentaro Hara

unread,
Oct 12, 2023, 11:44:02 PM10/12/23
to Dave Tapuska, Dominic Farolino, Kenichi Ishibashi, Charlie Reis, Yang Guo, Benedikt Meurer, Danil Somsikov, Jaroslav Sevcik, platform-architecture-dev, Alex Moshchuk, Rakina Zata Amni
Thanks Dave!

Just to clarify, I'm curious to understand the work needed to create one V8 isolate per AgentSchedulingGroup while hosting all V8 isolates on the (one) main thread (i.e., I'm not (yet) talking about multiple main threads) :)


--
Kentaro Hara, Tokyo

Michael Lippautz

unread,
Oct 13, 2023, 3:58:53 AM10/13/23
to Kentaro Hara, Dave Tapuska, Dominic Farolino, Kenichi Ishibashi, Charlie Reis, Yang Guo, Benedikt Meurer, Danil Somsikov, Jaroslav Sevcik, platform-architecture-dev, Alex Moshchuk, Rakina Zata Amni
On Fri, Oct 13, 2023 at 5:44 AM Kentaro Hara <har...@chromium.org> wrote:
Thanks Dave!

Just to clarify, I'm curious to understand the work needed to create one V8 isolate per AgentSchedulingGroup while hosting all V8 isolates on the (one) main thread (i.e., I'm not (yet) talking about multiple main threads) :)



On Fri, Oct 13, 2023 at 12:07 PM Dave Tapuska <dtap...@chromium.org> wrote:
I'll try to prepare a list of things. I believe there were some spreadsheets that had looked at static locals that needed to move to per isolate data.  The isolate is very available so this likely isn't difficult.

The main thing I'm working on now rewriting the ipc channel for extensions. Once we do this then we won't have an iteration across all v8::contexts per thread. I hope that this will only take a couple more weeks, I have is mostly all prototyped but have to land it. 

I told Kouhei when he was visiting Waterloo it would be nice if we had a switch for devtools like Safari (and Android (adb)) does. Then the majority of our customers could have process per site if we had devtools disabled. But I guess that goes against Chrome being easy for developers. 

Dave


On Thu, Oct 12, 2023, 10:47 PM Kentaro Hara <har...@chromium.org> wrote:
Hi

ProcessPerSite is currently causing some issues with DevTools. If DevTools sets a breakpoint in one tab, DevTools cannot be attached to another tab hosted by the same process (i.e., we cannot support cases where DevTools is attached in a nested message loop). This is because ProcessPerSite creates only one V8 isolate.

Currently we are working on a mitigation plan (and my hope is that we can go with the plan but it requires a product decision) but I'm curious about this question: How much work will be needed to support multiple V8 isolates? More specifically, how much work will be needed to create one V8 isolate per AgentSchedulingGroup?

@Dave Tapuska is removing the usage of v8::Isolate::GetCurrent() from the codebase and we are close to completion (thanks!). My understanding is that this is not enough to create multiple V8 isolates. The biggest challenge is to implement memory isolation for Oilpan. Since Oilpan runs on a per-isolate basis, we need to ban Persistent / Member handles crossing the isolate boundary. We need to replace them with CrossThreadHandle or something else, which will require a lot of engineering work.


CrossThread* is not a replacement for Member per se, as they are treated as roots whereas Member is a regular traced reference. If you compose them left and right the chance for creating a memory leak is very high. On top of that the performance profiles (time + memory) are different with Member being more lightweight.

The long term roadmap for us is working towards a global heap where multiple V8 Isolates can be attached to. This is relatively far out though as it is not work that's currently being prioritized.
 

Dave Tapuska

unread,
Oct 13, 2023, 10:36:29 AM10/13/23
to Kentaro Hara, Dominic Farolino, Kenichi Ishibashi, Charlie Reis, Yang Guo, Benedikt Meurer, Danil Somsikov, Jaroslav Sevcik, platform-architecture-dev, Alex Moshchuk, Rakina Zata Amni

Kentaro Hara

unread,
Oct 13, 2023, 3:58:31 PM10/13/23
to Dave Tapuska, Dominic Farolino, Kenichi Ishibashi, Charlie Reis, Yang Guo, Benedikt Meurer, Danil Somsikov, Jaroslav Sevcik, platform-architecture-dev, Alex Moshchuk, Rakina Zata Amni
We need to take care of cross-isolate references somehow. It requires either using CrossThreadHandles for the cross-isolate references or as Michael said, implementing a multi-isolate GC in V8.

I'm not proposing we do this now. I was trying to figure out whether ProcessPerSite can realistically depend on multiple V8 isolates. It looks like it requires too much work :)


--
Kentaro Hara, Tokyo

Charlie Reis

unread,
Oct 19, 2023, 9:56:33 PM10/19/23
to Yang Guo, Dominic Farolino, Dave Tapuska, Kentaro Hara, Kenichi Ishibashi, Benedikt Meurer, Danil Somsikov, Jaroslav Sevcik, platform-architecture-dev, Alex Moshchuk, Rakina Zata Amni
The CSA team was discussing the ideas in ProcessPerSite and DevTools: User journeys and problems for how to make ProcessPerSite less disruptive for DevTools, if it's unlikely that more granular pausing will happen soon.

We're curious if another approach might be considered, either alongside or instead of some of the options.  Today, DevTools shows a "Paused in debugger" translucent overlay when a frame hits a breakpoint:
paused-in-debugger.png

However, DevTools doesn't show that overlay on other frozen frames in the same process (whether they're sibling frames in the same tab, a same-origin popup tab that is required to be in the same process, or an unrelated tab that happens to share the same process because of either process reuse or the proposed ProcessPerSite mode).  Could we show a similar overlay on all affected frames in the same process, maybe with a button to jump to the DevTools instance where the breakpoint is?

That seems like it has a few benefits:
1) It explains why various tabs and frames are frozen, connecting them to something understandable.
2) It gives the user something actionable to do (i.e., unpausing the frozen frame if needed).
3) It helps in cases that are fundamental and can't be removed (e.g., a same-origin popup).
4) It could make proposed modes like ProcessPerSite marginally less of a problem as well.

Thoughts?
Charlie

Kentaro Hara

unread,
Oct 20, 2023, 1:27:08 AM10/20/23
to Charlie Reis, Yang Guo, Dominic Farolino, Dave Tapuska, Kenichi Ishibashi, Benedikt Meurer, Danil Somsikov, Jaroslav Sevcik, platform-architecture-dev, Alex Moshchuk, Rakina Zata Amni
Thanks Charlie for the idea!

I might want to add that regardless of whether we do ProcessPerSite or not, once MBI is complete, I think we are planning to make ScopedPagePauser pause only pages in the same BrowsingContextGroup (instead of all pages). In that world, it will not happen that a DevTools breakpoint pauses "an unrelated tab that happens to share the same process because of either process reuse or the proposed ProcessPerSite mode".

I like your proposal but just wanted to note that it might not help ProcessPerSite :)

--
Kentaro Hara, Tokyo

Benedikt Meurer

unread,
Oct 20, 2023, 9:27:41 AM10/20/23
to Kentaro Hara, Charlie Reis, Yang Guo, Dominic Farolino, Dave Tapuska, Kenichi Ishibashi, Danil Somsikov, Jaroslav Sevcik, platform-architecture-dev, Alex Moshchuk, Rakina Zata Amni
FWIW I'm very much onboard with Charlies idea. While it's surely not ideal, it's definitely better to allow developers to debug whenever they have to, and worst case stop a few other tabs meanwhile.

cheers,
Benedikt
--

Benedikt Meurer

Chrome DevTools Manager

bme...@google.com


Google Germany GmbH

Erika-Mann-Straße 33

80636 München


Geschäftsführer: Paul Manicle, Liana Sebastian

Registergericht und -nummer: Hamburg, HRB 86891

Sitz der Gesellschaft: Hamburg


Diese E-Mail ist vertraulich. Falls Sie diese fälschlicherweise erhalten haben sollten, leiten Sie diese bitte nicht an jemand anderes weiter, löschen Sie alle Kopien und Anhänge davon und lassen Sie mich bitte wissen, dass die E-Mail an die falsche Person gesendet wurde. 

     

This e-mail is confidential. If you received this communication by mistake, please don't forward it to anyone else, please erase all copies and attachments, and please let me know that it has gone to the wrong person.

Charlie Reis

unread,
Oct 20, 2023, 1:43:38 PM10/20/23
to Benedikt Meurer, Kentaro Hara, Yang Guo, Dominic Farolino, Dave Tapuska, Kenichi Ishibashi, Danil Somsikov, Jaroslav Sevcik, platform-architecture-dev, Alex Moshchuk, Rakina Zata Amni
Great!  I've filed https://crbug.com/1494434 for it, which I'm guessing would involve DevTools UI folks.

I might want to add that regardless of whether we do ProcessPerSite or not, once MBI is complete, I think we are planning to make ScopedPagePauser pause only pages in the same BrowsingContextGroup (instead of all pages). In that world, it will not happen that a DevTools breakpoint pauses "an unrelated tab that happens to share the same process because of either process reuse or the proposed ProcessPerSite mode".

Totally agree-- MBI and having separate V8 isolates should allow more granular pausing and would reduce the number of frames that are frozen and need overlays (likely to just those in the same SiteInstance).

Charlie

Yang Guo

unread,
Oct 23, 2023, 4:33:06 AM10/23/23
to Charlie Reis, Benedikt Meurer, Kentaro Hara, Dominic Farolino, Dave Tapuska, Kenichi Ishibashi, Danil Somsikov, Jaroslav Sevcik, platform-architecture-dev, Alex Moshchuk, Rakina Zata Amni
I think after discussing all other options, I also like Charlie's proposal as it offers transparency to developers, while giving them agency on how to best proceed, without taking away choices (e.g. not allowing pausing at all). We should definitely keep this in mind as an option, while also exploring others.

Yang


--
You received this message because you are subscribed to the Google Groups "platform-architecture-dev" group.
To unsubscribe from this group and stop receiving emails from it, send an email to platform-architect...@chromium.org.

Kentaro Hara

unread,
Oct 23, 2023, 4:57:37 AM10/23/23
to Yang Guo, Charlie Reis, Benedikt Meurer, Dominic Farolino, Dave Tapuska, Kenichi Ishibashi, Danil Somsikov, Jaroslav Sevcik, platform-architecture-dev, Alex Moshchuk, Rakina Zata Amni
I think after discussing all other options, I also like Charlie's proposal as it offers transparency to developers, while giving them agency on how to best proceed, without taking away choices (e.g. not allowing pausing at all).

I like Charlie's proposal too as a general improvement of DevTools but would you help me understand how it helps ProcessPerSite?

With ProcessPerSite, the second tab from the same origin keeps running when a developer pauses the first tab. How will Charlie's proposal help prevent developers from using DevTools on the second tab? (If we pause the second tab, that will create more problems, as was reported by multiple developers when we ran the Finch experiment a few months ago.)
--
Kentaro Hara, Tokyo

Yang Guo

unread,
Oct 23, 2023, 5:02:43 AM10/23/23
to Kentaro Hara, Charlie Reis, Benedikt Meurer, Dominic Farolino, Dave Tapuska, Kenichi Ishibashi, Danil Somsikov, Jaroslav Sevcik, platform-architecture-dev, Alex Moshchuk, Rakina Zata Amni
From what I understand from Charlie, the proposal is to display an overlay in all tabs from the same process that is affected by pausing one tab hosted by this process. This way, if one tab is paused, the developer understands that other tabs hosted in the same process are also affected. If they want to use these other tabs, they can choose to resume the paused tab. If they want to debug these other tabs, they can choose to resume the paused tab and then start a debugging session for the tab they want to debug.

That will not solve the issue of debugging two tabs hosted by the same process simultaneously. Hopefully the UMA numbers can give us an indication how likely this is. And the other idea of disabling ProcessPerSite based on some heuristic could also reduce the likelihood of this becoming an issue.

WDYT?

Yang

Kentaro Hara

unread,
Oct 23, 2023, 8:02:36 AM10/23/23
to Yang Guo, Charlie Reis, Benedikt Meurer, Dominic Farolino, Dave Tapuska, Kenichi Ishibashi, Danil Somsikov, Jaroslav Sevcik, platform-architecture-dev, Alex Moshchuk, Rakina Zata Amni
Ah, that makes a lot of sense. Thanks for the clarification!





--
Kentaro Hara, Tokyo
Reply all
Reply to author
Forward
0 new messages