Re: Background Processes on ChromeOS

François Doray

Aug 3, 2022, 9:25:12 PM
to Youssef Esmat, Gabriel Charette, chrome-catan, Chrome Scheduler, scheduler-dev, Zheng, Hong, Chen, Zheda
[ This discussion does not contain confidential information, I'm adding schedu...@chromium.org to open it to external contributors. ]

On Wed, Aug 3, 2022 at 9:05 PM François Doray <fdo...@google.com> wrote:
Hi Youssef,

Sandbox restrictions that exist today should be preserved.

If a thread is running at priority PT in a process with priority PP1, it should also be running at priority PT after changing the process' priority to PP2 and then back to PP1. The OS can support this by keeping track of priorities set for each thread independently from the effective priority of each thread. 

Concrete example:

Thread T1 in Process P1, [t1 requested priority = normal, t1 effective priority = normal, p1 priority = normal]
Priority is set to Background for Process P1, [t1 requested priority = normal, t1 effective priority = background, p1 priority = background]
Priority is set to Normal for Process P1, [t1 requested priority = normal, t1 effective priority = normal, p1 priority = normal]
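
A minimal sketch of the bookkeeping in this example, written in Python for brevity. The `Process` class, its method names, and the "process priority caps thread priority" rule are assumptions made for illustration; they are not an existing OS or Chrome API.

```python
from dataclasses import dataclass, field
from enum import IntEnum
from typing import Dict


class Priority(IntEnum):
    BACKGROUND = 0
    NORMAL = 1


@dataclass
class Process:
    """Hypothetical model: requested thread priorities survive process changes."""

    priority: Priority = Priority.NORMAL
    requested: Dict[str, Priority] = field(default_factory=dict)

    def set_thread_priority(self, thread: str, priority: Priority) -> None:
        # Only the request is recorded; the effective value is derived on demand.
        self.requested[thread] = priority

    def set_process_priority(self, priority: Priority) -> None:
        # Changing the process priority never touches the stored requests.
        self.priority = priority

    def effective(self, thread: str) -> Priority:
        # Assumed combination rule: the process priority caps each thread.
        return min(self.requested[thread], self.priority)


p1 = Process()
p1.set_thread_priority("t1", Priority.NORMAL)
p1.set_process_priority(Priority.BACKGROUND)
backgrounded = p1.effective("t1")
p1.set_process_priority(Priority.NORMAL)
restored = p1.effective("t1")
```

Because the requested priority is stored separately, going PP1 -> PP2 -> PP1 needs no per-thread fix-up: the effective priority is simply recomputed.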

Do you think this should be implemented in Chrome OS instead of Chrome Browser?

Thanks,

François

On Wed, Aug 3, 2022 at 4:00 PM Youssef Esmat <yousse...@google.com> wrote:
Hi All,

Picking this up again. When you say "kernel do this instead", what do you mean by "this"?
I do agree that ideally the browser process should not have to manage the different cgroups (cpuset/cpu) and the nice values. That could translate to a different set of APIs exposed by the OS where the browser process could tell the OS that the renderer process should be in "foreground" or "background" and the OS would place the process in the correct cgroups. We could also have a similar API at the thread level allowing some threads to be higher priority than others.

However, the second part of the problem is, which process is allowed to make these calls to the OS? The renderer process is currently sandboxed. So should the browser process make the calls on the behalf of the renderer process or should the renderer process threads be allowed to change their own priorities?

Thanks,
Youssef

On Tue, Jul 5, 2022 at 3:27 PM Gabriel Charette <g...@google.com> wrote:
Why can't the kernel do this instead of the browser process though?

i.e. SetCurrentThreadPriority() should merely be a hint to the OS, which it should then apply based on context (process state -- set by the browser process).

On Wed, Jun 29, 2022 at 12:52 PM Youssef Esmat <yousse...@google.com> wrote:
The problem today is that the cpuset cgroup is tied to thread priority and is not a property of the process. If the entire process were either urgent or non-urgent, this would be easy: write the process PID to the cgroup.procs file. However, a high-priority thread is placed in the urgent cgroup whereas a low-priority thread is placed in the non-urgent cgroup. So we have two factors that determine a thread's cpuset cgroup:
  1. The thread priority
  2. The process state (renderer is foreground or background).
So we need some entity to keep track of both the thread priority and the process state and set the cgroup accordingly.
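
One way to picture such an entity is the Python sketch below. The `CpusetTracker` class, the boolean "high priority" flag, and the rule that only high-priority threads of a foreground process are urgent are illustrative assumptions; the returned dictionaries stand in for the cgroup writes that would be needed.

```python
from typing import Dict


class CpusetTracker:
    """Hypothetical single owner of both factors for one renderer."""

    def __init__(self) -> None:
        self.foreground = True
        self.high_priority: Dict[int, bool] = {}  # tid -> is high priority

    def _cgroup(self, tid: int) -> str:
        # Assumed rule: only high-priority threads of a foreground
        # process belong in the urgent cpuset.
        if self.foreground and self.high_priority[tid]:
            return "urgent"
        return "non-urgent"

    def set_thread_priority(self, tid: int, high: bool) -> Dict[int, str]:
        # Factor 1 changed: only this thread needs to be re-evaluated.
        self.high_priority[tid] = high
        return {tid: self._cgroup(tid)}

    def set_process_state(self, foreground: bool) -> Dict[int, str]:
        # Factor 2 changed: every known thread needs to be re-evaluated.
        self.foreground = foreground
        return {tid: self._cgroup(tid) for tid in self.high_priority}


tracker = CpusetTracker()
tracker.set_thread_priority(101, True)
tracker.set_thread_priority(102, False)
after_background = tracker.set_process_state(False)
after_foreground = tracker.set_process_state(True)
```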

In the past, we tried setting the priority from the child process, but it didn't work well because a background child process would take too much time to process the Mojo message asking it to increase its priority.

Today this rule does not exactly hold in all cases. The thread pool in the renderer process sets thread priorities by calling SetCurrentThreadPriority, which adjusts the nice values but fails to adjust the cgroup because of the sandbox.
I think this is the part we need to fix. Only one entity can be in charge of thread priority and process state. For now that should be the browser process, which means that when a new thread is created in the renderer, the browser process needs to be called to adjust the priority and cgroup value.

On Wed, Jun 29, 2022 at 8:19 AM François Doray <fdo...@google.com> wrote:
Hi Youssef,

Today, on all platforms, the browser process is responsible for setting the priority of child processes [code]. In the past, we tried setting the priority from the child process, but it didn't work well because a background child process would take too much time to process the Mojo message asking it to increase its priority. Therefore, I suggest that we keep the call to set the priority of child processes in the browser process.

Ideally, the existing API to set the c-group of a process by writing to /sys/fs/cgroup/cpuset/ would be adjusted such that moving a process from the "normal" to the "non-urgent" group and moving it back from the "non-urgent" to the "normal" group would be inverse operations (applying both operations brings all threads back to their initial state).

We can adjust thread properties from Chrome to compensate for the fact that moving a process from "normal" > "non-urgent" and moving a process from "non-urgent" > "normal" aren't inverse operations. However, since nothing prevents thread creation, destruction or priority adjustment while Chrome is adjusting thread properties, it's hard to be convinced that this solution will be correct in all cases today and in the future. This is why I would like to understand what's required to fix the /sys/fs/cgroup/cpuset/ API instead of implementing thread priority adjustments inside Chrome.

Have a nice day,

François 

On Tue, Jun 28, 2022 at 4:30 PM Youssef Esmat <yousse...@google.com> wrote:
Thanks for starting this thread Francois!

I agree that a lot of this should be pushed to the OS wherever possible. Digging deeper on your suggestion, when you say "handled by the OS", do you mean the OS will handle the call from the renderer process, from the browser, or both?

On Tue, Jun 28, 2022 at 12:45 PM François Doray <fdo...@google.com> wrote:
Hi!

On ChromeOS, PlatformThread::SetCurrentThreadPriority() doesn't work from renderers due to restrictions on setting c-groups and nice values. This dry run on this CL highlights the failures. Priority is set correctly only when the operation is proxied through the browser process via RenderMessageFilter.SetThreadPriority.

A contributor from Intel proposes using c-groups to throttle background renderers [CL]. Under the hood, changing the c-group of a process changes the c-group of all its threads, effectively overriding any c-group previously set via RenderMessageFilter.SetThreadPriority. The contributor suggests re-applying c-groups previously set via RenderMessageFilter.SetThreadPriority when foregrounding a process.

Example:
Initial => t1 [normal], t2 [display], t3 [background] *
Process is backgrounded => t1 [background], t2 [background], t3 [background] 
Process is foregrounded => t1 [normal], t2 [normal], t3 [normal]
Thread priorities are re-applied => t1 [normal], t2 [display], t3 [background]
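
The four states above can be reproduced with a small Python simulation. The helper names `move_process` and `reapply` and the dictionary model of c-groups are hypothetical; the sketch only illustrates why a re-apply step is needed after foregrounding.

```python
from typing import Dict

Groups = Dict[str, str]  # thread name -> c-group


def move_process(threads: Groups, group: str) -> Groups:
    # Moving the process overrides the c-group of every thread.
    return {name: group for name in threads}


def reapply(threads: Groups, requested: Groups) -> Groups:
    # Restore each thread's previously requested c-group, if one was recorded.
    return {name: requested.get(name, group) for name, group in threads.items()}


initial = {"t1": "normal", "t2": "display", "t3": "background"}
requested = dict(initial)  # what was set via SetThreadPriority

backgrounded = move_process(initial, "background")
foregrounded = move_process(backgrounded, "normal")
restored = reapply(foregrounded, requested)
```

Without the final `reapply` call, t2 and t3 would stay in the "normal" group after foregrounding.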

It is possible to implement this solution without a race by performing all process and thread priority changes from the same sequence in the browser process.

However, it seems that the correct layer to implement this is inside the OS, not in Chrome. @Youssef Esmat as ChromeOS expert: What are the implications of implementing this in the OS instead of in Chrome?

Have a nice day,

François

Gabriel Charette

Aug 4, 2022, 11:02:51 AM
to Youssef Esmat, François Doray, chrome-catan, Chrome Scheduler, scheduler-dev
+1 to what Francois said about decoupling the priority hint from the effective priority.

On Wed, Aug 3, 2022 at 4:00 PM Youssef Esmat <yousse...@google.com> wrote:
Hi All,

Picking this up again. When you say "kernel do this instead", what do you mean by "this"?

"this" == "honor the process priority hint"
 
I do agree that ideally the browser process should not have to manage the different cgroups (cpuset/cpu) and the nice values. That could translate to a different set of APIs exposed by the OS where the browser process could tell the OS that the renderer process should be in "foreground" or "background" and the OS would place the process in the correct cgroups. We could also have a similar API at the thread level allowing some threads to be higher priority than others.

However, the second part of the problem is, which process is allowed to make these calls to the OS? The renderer process is currently sandboxed. So should the browser process make the calls on the behalf of the renderer process or should the renderer process threads be allowed to change their own priorities?

The browser process is always the one managing the priority of the renderers (otherwise there's a priority inversion when foregrounding a tab where a background renderer needs to be told it's now foreground).

Youssef Esmat

Aug 8, 2022, 1:21:58 PM
to Gabriel Charette, François Doray, chrome-catan, Chrome Scheduler, scheduler-dev
Thanks for clarifying, and I think we are on the same page. My feeling right now is we should move this logic to the OS (for ChromeOS this looks like resourced because of other work going on with cgroups over there).

This will mean adding new APIs to manage process/thread priorities. This API will not be available to sandboxed processes, meaning that calling this API from the renderer will result in a no-op. Today, setting the thread priority in the renderer is broken because it's a semi no-op: the nice values take effect and the cgroup changes do not. Moving forward the whole API will be a no-op. Do we agree there?

Thanks,
Youssef

Youssef Esmat

Aug 8, 2022, 1:56:55 PM
to Gabriel Charette, François Doray, chrome-catan, Chrome Scheduler, scheduler-dev
Started b/241794894.

François Doray

Aug 8, 2022, 4:16:11 PM
to Youssef Esmat, Gabriel Charette, chrome-catan, Chrome Scheduler, scheduler-dev
Thanks for filing this issue! To confirm, this means that we are pausing the review of https://chromium-review.googlesource.com/c/chromium/src/+/3318660 ?

Youssef Esmat

Aug 8, 2022, 4:26:01 PM
to François Doray, Gabriel Charette, chrome-catan, Chrome Scheduler, scheduler-dev
I think that depends on the urgency of the change.

Zheda Chen

Aug 9, 2022, 5:25:14 AM
to scheduler-dev, Youssef Esmat, Gabriel Charette, chrome-catan, Chrome Scheduler, scheduler-dev, François Doray
Thanks for pushing to get this issue fixed at the Chrome OS level! Introducing an API to change the scheduling policy of processes/threads is the eventual solution and is critical to hybrid scheduling on Chrome OS. I hope we can work together to land both the API and the background process scheduling power optimization soon : )

François Doray

Aug 9, 2022, 9:39:12 AM
to Zheda Chen, scheduler-dev, Youssef Esmat, Gabriel Charette, chrome-catan, Chrome Scheduler
I suggest we wait until the issue is fixed at the ChromeOS level before doing further changes in Chrome. This will avoid introducing technical debt in Chrome and consuming engineering time when we know that a more refined solution is coming. Let me know if you think otherwise.

Zheda Chen

Aug 12, 2022, 3:38:07 AM
to scheduler-dev, François Doray, scheduler-dev, Youssef Esmat, Gabriel Charette, chrome-catan, Chrome Scheduler, Zheda Chen
Okay, agreed. 
Let's focus on the API implementation at the Chrome OS level (b/241794894) first and then update the Chrome CL when b/241794894 is fixed.

Gabriel Charette

Aug 15, 2022, 3:15:08 PM
to Youssef Esmat, François Doray, chrome-catan, Chrome Scheduler, scheduler-dev


On Mon, Aug 8, 2022 at 1:21 PM Youssef Esmat <yousse...@google.com> wrote:
Thanks for clarifying, and I think we are on the same page. My feeling right now is we should move this logic to the OS (for ChromeOS this looks like resourced because of other work going on with cgroups over there).

This will mean adding new APIs to manage process/thread priorities. This API will not be available to sandboxed processes, meaning that calling this API from the renderer will result in a no-op. Today, setting the thread priority in the renderer is broken because it's a semi no-op: the nice values take effect and the cgroup changes do not. Moving forward the whole API will be a no-op. Do we agree there?

Ideally, as on other platforms, we need to be able to set thread priority/QoS from sandboxed renderers (especially BACKGROUND priority for background ThreadPool workers).

Because the priority of a thread only affects its share of the process's allotted CPU time (per process priority), it is not a problem to do this from sandboxed processes (on other platforms).

- Gab (from plane before going OOO for a few weeks...)

Youssef Esmat

Aug 16, 2022, 1:15:16 PM
to Gabriel Charette, François Doray, chrome-catan, Chrome Scheduler, scheduler-dev
I am not worried about the nice value. However, do you see harm in allowing a thread to increase the set of CPUs that it can run on?

Youssef Esmat

Aug 18, 2022, 1:27:19 PM
to Gabriel Charette, Rom Lemarchand, François Doray, chrome-catan, Chrome Scheduler, scheduler-dev

Coming back to this to set expectations.

While in general I agree with this approach and I think that Chrome does not need to manage these platform details, ChromeOS is a little different from other platforms. The main reason is that ChromeOS is not really an open platform that allows native apps to be written. So if we think about the customers of this API, it's not clear this is a big win. The main customer of this API will be Chrome/Lacros. Other subsystems that could potentially use the API are the VM subsystems (ARC, Crostini, Parallels), but there are a lot more details that need to be solved in that space first.

That said, if we want this API to be generic, we also have to define a way to make the choice of cgroup generic. For example, the concept of foreground on ChromeOS maps to a cgroup today, but the cgroup is very specific to Chrome. The cgroup is /chrome_renderers/foreground or /chrome_renderers/{RENDERER_TOKEN} (in the case of per-renderer cgroups). This is very Chrome-specific and does not translate generically: first, because all renderers are limited by the top-level cgroup "/chrome_renderers"; second, because of Chrome-specific features like the per-renderer cgroup feature. Translating that to a single /foreground cgroup that can be shared by all customers of this API will change the behavior.

The cgroup structure above seems to indicate there is also value in having a user mode process like chrome be able to fine tune its use. It has a better understanding of the details of the processes/threads that are running on the system. 

Of course, all the problems above are solvable. And moving the API to a system level entity does have benefits. However, doing it correctly from the beginning will take time. We need to understand all the customers of this API and how that will map to system level constructs and what the impact of that will be on perf and power.

We have a few options:
  1. Take our time and fully understand this API, its customers, and how it will map to system level constructs. But this will be on the order of weeks, not days.
    1. If we take this route, are we OK blocking Zheda's proposed change?
  2. Start a first iteration of this API that is specific to Chrome renderers and see how we can expand this later.
Thanks,
Youssef


Youssef Esmat

Aug 18, 2022, 2:10:48 PM
to Gabriel Charette, Rom Lemarchand, François Doray, chrome-catan, Chrome Scheduler, scheduler-dev
There is also a third option: do nothing for now.

Also, I do think we should explore allowing the renderer to change thread priority from within the sandbox, and perhaps this should be done before implementing the new API regardless of the option we choose. Today, changing the priority from the renderer leaves the thread in a broken state (nice value changed but the cpuset is not updated).

Hong Zheng

Aug 18, 2022, 10:03:44 PM
to scheduler-dev, Youssef Esmat, François Doray, chrome-catan, Chrome Scheduler, scheduler-dev, Gabriel Charette, Rom Lemarchand
Thanks Youssef for your detailed clarification. Zheda's optimization may need trials to evaluate its power and performance impact in the real world. To push it forward efficiently, is it possible to split the whole task into two stages:
Stage 1: choose option 2/3, then review Zheda's patch and run trials. During the trials, some exploration/development work can be done simultaneously.
Stage 2: if the trial result is acceptable and the development work is done, Zheda can update his change with the latest solution and finally ship the optimization.

Zheda Chen

Aug 24, 2022, 4:23:07 AM
to scheduler-dev, Hong Zheng, Youssef Esmat, François Doray, chrome-catan, Chrome Scheduler, scheduler-dev, Gabriel Charette, Rom Lemarchand
@Francois @Gabriel,

What do you think about the proposed option 2 / 3: starting a first iteration of this API specific to Chrome renderers, or fixing the broken state of priority changes from within the sandbox?
If we choose option 3, we can update the Chrome CL, explore more scheduling scenarios based on the API's first iteration, and launch the trial experiments earlier.

François Doray

Aug 26, 2022, 12:15:06 PM
to Zheda Chen, scheduler-dev, Hong Zheng, Youssef Esmat, chrome-catan, Chrome Scheduler, Gabriel Charette, Rom Lemarchand
Hi everyone! 

I had a meeting with Youssef yesterday. We discussed 2 viable options which both LGTM:

Option A:
Change the operating system such that changing the c-group of a process from X -> Y and then back from Y -> X brings all threads back to their original c-group.

Advantage:  The operating system already manipulates thread c-groups when the c-group of a process changes. It is preferable to set the right thread c-group at that time, instead of setting incorrect c-groups and letting the userspace application adjust them at a later time.

Option B:
Modify base::SetCurrentThreadType() to allow the request to be forwarded to a custom handler. In Chrome renderers, the handler would send a Mojo message to the browser process to request a thread type change [1]. The browser process would handle that request on the same sequence that adjusts renderer process priority. To change the priority of a renderer, the browser process would set the process c-group and then fix individual thread c-groups. Because thread and process priorities are manipulated from the same sequence, the browser process can derive a thread's desired c-group from its nice value, without any risk of data race.

Advantage: This is presumably simpler to implement than option A.

[1] Due to sandbox restrictions, base::SetCurrentThreadType() doesn't fully work in renderers. This change would incidentally make base::SetCurrentThreadType() work correctly in renderers.
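
The forwarding hook in Option B could look roughly like the Python sketch below. Everything here is hypothetical: `set_thread_type_handler`, the boolean "handled" return value, and the `sent_to_browser` list (a stand-in for the Mojo message) are illustrative names, not the actual base:: API.

```python
from typing import Callable, List, Optional, Tuple

Request = Tuple[int, str]  # (thread id, thread type name)
Handler = Callable[[int, str], bool]

_handler: Optional[Handler] = None
applied_locally: List[Request] = []


def set_thread_type_handler(handler: Optional[Handler]) -> None:
    """Install (or clear) the hypothetical custom handler."""
    global _handler
    _handler = handler


def set_current_thread_type(tid: int, thread_type: str) -> None:
    # Give the handler the first chance; fall back to the local path
    # when no handler is installed or it declines the request.
    if _handler is not None and _handler(tid, thread_type):
        return
    applied_locally.append((tid, thread_type))


# Stand-in for the Mojo message to the browser process.
sent_to_browser: List[Request] = []


def renderer_handler(tid: int, thread_type: str) -> bool:
    sent_to_browser.append((tid, thread_type))
    return True


set_current_thread_type(1, "default")      # no handler: local path
set_thread_type_handler(renderer_handler)
set_current_thread_type(2, "background")   # forwarded to the "browser"
```

Processes that never install a handler keep the existing local behavior, so only sandboxed renderers would take the forwarding path.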

Zheda Chen

Aug 30, 2022, 8:09:01 AM
to scheduler-dev, François Doray, scheduler-dev, Hong Zheng, Youssef Esmat, chrome-catan, Chrome Scheduler, Gabriel Charette, Rom Lemarchand, Zheda Chen
When you say option A, do you mean that Chrome OS would save the original c-groups of all threads before the process c-group changes from X (e.g. normal) to Y (e.g. non-urgent), so that when the process c-group switches back from Y to X, Chrome OS restores them from the saved list?

So if option B is implemented, all thread and process priority changes for a renderer process (including RenderMessageFilter::SetThreadType) will be handled by the browser process on the same sequence.

Do you mean both A and B need to be done?

Youssef Esmat

Aug 30, 2022, 10:50:04 AM
to Zheda Chen, scheduler-dev, François Doray, Hong Zheng, chrome-catan, Chrome Scheduler, Gabriel Charette, Rom Lemarchand
If we went with the OS API, then both the browser and the renderer would call into the OS.
On the other hand, if we used the browser, then all paths in the renderer that set thread priority would need to call into the browser to set the thread priority.

My opinion is that we should start with option B since it will have user impact and help reduce power consumption. At the same time we should start defining the API.

Zheda Chen

Aug 31, 2022, 1:27:30 AM
to scheduler-dev, Youssef Esmat, scheduler-dev, François Doray, Hong Zheng, chrome-catan, Chrome Scheduler, Gabriel Charette, Rom Lemarchand, Zheda Chen
Thanks for the explanation. I will take a look at option B, to make the browser process handle process/thread c-group changes for the renderer process.

François Doray

Aug 31, 2022, 9:19:15 AM
to Zheda Chen, scheduler-dev, Youssef Esmat, Hong Zheng, chrome-catan, Chrome Scheduler, Gabriel Charette, Rom Lemarchand
Ack, let's go with Option B.

To be clear, with Option B, all process/thread nice/c-group manipulations would be performed from the "process launcher" TaskRunner in the browser process. As a result, the browser process doesn't need to keep track of the ThreadTypes of each thread. It can simply derive it from a thread's nice value when needed. Since all process/thread nice/c-group manipulations are performed from the same sequence, there is no risk that the nice value of a thread would change between when it is read by the browser process and when an action is taken based on that information.
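
A rough Python sketch of this single-sequence arrangement follows. A one-worker `ThreadPoolExecutor` stands in for the "process launcher" TaskRunner; the class name, the in-memory dictionaries (instead of real nice/c-group calls), and the rule that negative nice values mean "urgent" are assumptions made for illustration.

```python
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Dict


class RendererPriorityManager:
    """Hypothetical browser-side owner of one renderer's priorities."""

    def __init__(self) -> None:
        # One worker thread behaves like a sequence: tasks run one at a
        # time, in submission order.
        self._sequence = ThreadPoolExecutor(max_workers=1)
        self._backgrounded = False
        self._nice: Dict[int, int] = {}
        self._cgroup: Dict[int, str] = {}

    def _derive_cgroup(self, nice: int) -> str:
        # Assumed rule: negative nice values mean "urgent" when foregrounded.
        if self._backgrounded or nice >= 0:
            return "non-urgent"
        return "urgent"

    def _set_thread_nice(self, tid: int, nice: int) -> None:
        self._nice[tid] = nice
        self._cgroup[tid] = self._derive_cgroup(nice)

    def _set_backgrounded(self, backgrounded: bool) -> None:
        self._backgrounded = backgrounded
        # Read each nice value and act on it within the same task, so no
        # other priority change can run in between.
        for tid, nice in self._nice.items():
            self._cgroup[tid] = self._derive_cgroup(nice)

    def set_thread_nice(self, tid: int, nice: int) -> Future:
        return self._sequence.submit(self._set_thread_nice, tid, nice)

    def set_backgrounded(self, backgrounded: bool) -> Future:
        return self._sequence.submit(self._set_backgrounded, backgrounded)

    def cgroups(self) -> Dict[int, str]:
        # Queued behind all earlier tasks, so it observes their effects.
        return self._sequence.submit(lambda: dict(self._cgroup)).result()

    def shutdown(self) -> None:
        self._sequence.shutdown(wait=True)


manager = RendererPriorityManager()
manager.set_thread_nice(1, -8)
manager.set_thread_nice(2, 10)
manager.set_backgrounded(True)
while_background = manager.cgroups()
manager.set_backgrounded(False)
after_foreground = manager.cgroups()
manager.shutdown()
```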

Zheda Chen

Sep 26, 2022, 10:05:44 AM
to scheduler-dev, François Doray, scheduler-dev, Youssef Esmat, Hong Zheng, chrome-catan, Chrome Scheduler, Gabriel Charette, Rom Lemarchand, Zheda Chen
I'm currently working on the code implementation of Option B, and will submit a CL soon.

I have prepared a one page design doc, please take a look. Your comments are welcome.

Gabriel Charette

Sep 26, 2022, 3:37:32 PM
to François Doray, Zheda Chen, scheduler-dev, Youssef Esmat, Hong Zheng, chrome-catan, Chrome Scheduler, Rom Lemarchand
Oops, looks like my last reply only went to Francois... 🤦‍♂️

"""
Catching up post-vacation, thanks for driving this, I'm really happy to see CrOS moving on this.

Caveat for Option B, re: Francois's last message: ThreadType cannot always be derived from the nice value, as some ThreadTypes may map to the same nice value. This is why it's stored in TLS, but that makes GetCurrentThreadType only callable from the current thread. Maybe we'll need to keep a thread type map on the "process launcher" sequence?
"""

Also added some comments on the doc.
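
To illustrate the caveat, here is a small Python sketch. The thread type names and nice values in `NICE_FOR_TYPE` are made up for the example (they are not Chrome's actual table), and `ThreadTypeMap` is a hypothetical stand-in for a map kept on the "process launcher" sequence.

```python
from typing import Dict, List

# Hypothetical thread-type -> nice mapping; the values are illustrative.
NICE_FOR_TYPE: Dict[str, int] = {
    "background": 10,
    "default": 0,
    "compositing": -8,
    "display_critical": -8,
}


def types_for_nice(nice: int) -> List[str]:
    # Inverting the mapping can yield several candidates.
    return sorted(t for t, n in NICE_FOR_TYPE.items() if n == nice)


class ThreadTypeMap:
    """Meant to be used from one sequence, so no locking is shown here."""

    def __init__(self) -> None:
        self._types: Dict[int, str] = {}

    def set_thread_type(self, tid: int, thread_type: str) -> int:
        # Remember the exact type and return the nice value to apply.
        self._types[tid] = thread_type
        return NICE_FOR_TYPE[thread_type]

    def thread_type(self, tid: int) -> str:
        return self._types[tid]


ambiguous = types_for_nice(-8)
type_map = ThreadTypeMap()
applied_nice = type_map.set_thread_type(42, "display_critical")
```

With two types sharing a nice value, only the explicit map can tell which type thread 42 asked for.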