Could we use system capability to clear dead threads for the map ? Youssef (+ youssefesmat@) said that as for dead process, we might be able to use the filewatcher API in base to watch /proc/{pid}/tasks directory. If so we might also record pid into the map, like [ tid ---> <pid, thread_type>]?
Add some replies/comments to Bruce and Francois below.
From: Bruce Dawson <bruce...@google.com>
Sent: Saturday, November 5, 2022 8:27 AM
To: François Doray <fdo...@google.com>
Cc: Chen, Zheda <zheda...@intel.com>; Gabriel Charette <g...@google.com>; Chrome Scheduler <chrome-s...@google.com>; Markus Handell <hand...@google.com>; Etienne Pierre-doray <etie...@google.com>; fdoray-...@google.com
Subject: Re: nice value of kResourceEfficient on ChromeOS
Maintaining a list of threads owned by other processes or other code seems like a really bad idea. It is fundamentally racy (as soon as a TID is freed it can be reused), and a small delay makes this worse. Plus the implementation sounds very fragile.
[Zheda] When we use the [tid->thread_type] map, we traverse all live threads of the foregrounding renderer process via ForEachProcessTask to get their thread types [CL code ] on process launcher task runner of browser process. And renderer thread type is only set on same task runner of browser process [CL code]. So I think there’s no race here, and the browser process won’t read a dead thread.
If tid is reused by another renderer process, SetCurrentThreadType would be invoked after thread creation, and the map get refreshed.
The only way I can think of to do this which would be safe would be that threads that we own could register themselves at startup and deregister at shutdown. To avoid raciness they should not finish terminating until the browser process has definitely received the message and removed the entry, otherwise the browser process will have a reference to a dead thread which is a use-after-free of a different sort.
The raciness could be avoided (on Windows) by duplicating the thread handle and giving a copy to the browser process. However the whole thing still makes me very nervous and I fear a future bug farm. And, I don't know if other platforms could support these semantics.
If the goal is to support foregrounding/backgrounding processes then doesn't changing the process priority does that? It doesn't give us full control, but it feels like the right thing to do (work with the platform) and is simpler and less bug prone.
[Zheda] We had discussion on scheduler-dev groups [link] in August whether we can define a Chrome OS API, which introduces changes to OS that changing the c-group of a process from X->Y and then back from Y->X brings all threads back to their original c-group. The answer from Chrome OS team (youssefesmat@) is that defining such OS API is a long term solution. We need to fully understand the API customers and how it will map to system level constructs [link]. Since we want to ship background process scheduling feature soon, we decide to do option B user space level solution first. This is the CL [link] for user space solution.
Apologies if I'm misunderstanding the goals here.
On Fri, Nov 4, 2022 at 11:44 AM François Doray <fdo...@google.com> wrote:
Hi!
Yes, let's maintain a [thread_id->thread_type] map on the browser side.
I can think of two possible solution to detect a thread going away in a child process:
- Detection code at the end of ThreadFunc() [code]
- Pro: Arbitrary code execution allowed.
- Con: Only works for threads created via //base APIs.
- Detection code in the destructor of an object stored in ThreadLocalOwnedPointer for any thread that sets a thread type.
- Pro: Works for any thread, even if not created via //base APIs.
- Con: We need to be careful about code that runs late in the life of a thread.
The thread destruction detection code could:
- Send a Mojo message to notify the browser process.
- Pro: Browser process immediately notified of thread destruction.
- Con: The overhead of a Mojo message per thread destruction may be too high.
- Add the thread's tid to a process-global locked data structure. A Mojo message with the list of destructed threads is sent to the browser process only when it's big enough.
- Pro: Minimal overhead if total number of threads created in a process remains below a threshold.
- Cons:
- Small delay to clean up the map in the browser process (but keeping a few integers in memory is not too concerning).
- We need to think about how this works if a tid is reused?
[Zheda] If a tid is reused by another process (a different pid), we might be able to identify it if we also record pid in the map? Is it possible to be used by same process?
What do you think?
François
On Thu, Nov 3, 2022 at 10:59 PM Chen, Zheda <zheda...@intel.com> wrote:
Okay, let’s try with alternative solution, probably to maintain a [thread_id->thread_type] map on browser side but need to fix the problem of map infinitely grow. Please let me know if any better proposal.
From: François Doray <fdo...@google.com>
Sent: Thursday, November 3, 2022 2:53 AM
To: Gabriel Charette <g...@google.com>
Cc: Chen, Zheda <zheda...@intel.com>; Bruce Dawson <bruce...@google.com>; Chrome Scheduler <chrome-s...@google.com>; Markus Handell <hand...@google.com>; Etienne Pierre-doray <etie...@google.com>; fdoray-...@google.com
Subject: Re: nice value of kResourceEfficient on ChromeOS
I understand that we want to keep supporting arbitrary mappings from "ThreadType" to "nice value" and "c-groups" (not add a constraint that there must be a 1:1 mapping between "nice value" and "c-group"). In that case, I'll work with Zheda Chen on a solution to restore c-groups when a process is foregrounded that doesn't depend on deriving c-groups from nice value.
On Mon, Oct 31, 2022 at 5:24 PM Gabriel Charette <g...@google.com> wrote:
Replied with more thoughts in a longer discussion labeled "Thread priority discuss: kUtility vs kResourceEfficient vs kDisplay".
With that in mind, I don't think we want to have a different nice-value on CrOS for kResourceEfficient than we do kDefault..?
Although, that is what Apple does for QOS_CLASS_UTILITY IIUC?
CC +Bruce Dawson +Chrome Scheduler (same as other discussion but focused here on how to recall that a thread was kResourceEfficient when re-foregrounding a renderer process; mapping the nice value was a proposal if we can have different nice values per thread types).
On Fri, Oct 28, 2022 at 7:43 AM Chen, Zheda <zheda...@intel.com> wrote:
This is the thread nice value to c-group mapping on CrOS below per [code]
kBackground (10), kResourceEfficient (0), kUtility (1, proposed in CL #3898731)
Non-urgent c-group
kDefault (0)
Normal c-group
kCompositing (-8), kDisplayCritical (-8), kRealtimeAudio (-10)
Urgent c-group
Per this [CL] design, when backgrouding a renderer process, the browser process moves all threads of the renderer to non-urgent c-group. And when foregrounding a process, the browser needs to re-apply original thread c-group previously set via RenderMessageFilter::SetThreadType. It makes process level hybrid scheduling on Chrome OS workable and we might consider more processes in the future [design doc].
To make [derive thread c-group from nice value in browser process] workable, except for lowering kResourceEfficient nice value to 1, probably there’s another short-term alternative which is to map kResourceEfficient [0] to normal c-group on CrOS ? Currently kResourceEfficient cannot be move to non-urgent c-group via SetCurrentThreadType on Linux/CrOS due to renderer sandboxed restriction. The result of this short term solution is consistent with current behavior.
Re Markus’s question “What’s the ETA for that work”: the [CL] to apply C-Group updates from the browser process is in review.
Regards
Zheda
From: Markus Handell <hand...@google.com>
Sent: Friday, October 28, 2022 4:31 PM
To: Gabriel Charette <g...@google.com>
Cc: François Doray <fdo...@google.com>; Chen, Zheda <zheda...@intel.com>; Etienne Pierre-doray <etie...@google.com>; fdoray-...@google.com
Subject: Re: nice value of kResourceEfficient on ChromeOS
If we don't mind running a thread slower to save resources (kResourceEfficient), I think this implies that its priority is slightly lower than a kDefault thread? And as such, changing the nice value from "0" to "1" is ok?
> It's also my opinion that kResourceEfficient could be lower priority, I forget why @Markus Handell felt strongly otherwise for WebRTC? This is why we added kUtility FWIW.
Because belating scheduling means increasing end-to-end transmission delay by the same amount, and results in increased size of remote jitter buffers.
Priority is orthogonal to energy use, and matters only when there's more than 1 choice of a thread to run for a processor. Currently all WebRTC thread workloads need to be regarded as latency-sensitive.
WebRTC threads are not CPU intensive, but they do need to run timely.
Side note: Today, the only use of kResourceEfficient is by WebRTC, from renderers [code]. Since the nice value for kResourceEfficient is the same as for kDefault [code] and changing C-group from a renderer is prevented by the sandbox, the "WebRtcThreadsUseResourceEfficientType" feature is a no-op on ChromeOS. With @Chen, Zheda's work to apply C-Group updates from the browser process, this feature will no longer be a no-op on ChromeOS.
Hm interesting... I was not aware. What's the ETA for that work?
The experiment (50% beta since a while) includes very few data points for Intel 12th gen (5k on CrOS, ~650 (!) on Windows) and I was about to roll to stable to get more.
On Thu, Oct 27, 2022 at 10:54 PM Gabriel Charette <g...@google.com> wrote:
Le jeu. 27 oct. 2022, 15 h 05, François Doray <fdo...@google.com> a écrit :
If we don't mind running a thread slower to save resources (kResourceEfficient), I think this implies that its priority is slightly lower than a kDefault thread? And as such, changing the nice value from "0" to "1" is ok?
It's also my opinion that kResourceEfficient could be lower priority, I forget why @Markus Handell felt strongly otherwise for WebRTC? This is why we added kUtility FWIW.
Perhaps it's because we're okay delaying scheduling but then once it's running we want a normal priority thread (i.e. equivalent of base::ThreadPolicy::USE_FOREGROUND_THREADS)?
Side note: Today, the only use of kResourceEfficient is by WebRTC, from renderers [code]. Since the nice value for kResourceEfficient is the same as for kDefault [code] and changing C-group from a renderer is prevented by the sandbox, the "WebRtcThreadsUseResourceEfficientType" feature is a no-op on ChromeOS. With @Chen, Zheda's work to apply C-Group updates from the browser process, this feature will no longer be a no-op on ChromeOS.
Nice catch! We do analyze this experiment beyond CrOS though so it's still relevant in today's chrome I think?
On Mon, Oct 24, 2022 at 1:20 PM Gabriel Charette <g...@google.com> wrote:
Sounds like a reasonable experiment to me but +Markus Handell was mentioning in our ongoing discussion @ https://chromium-review.googlesource.com/c/chromium/src/+/3898731/comment/f63d4e70_136c1e7c/
that Meet would expect kResourceEfficient threads to be running at normal priority (and nice value I assume). We haven't yet settled on this but this information contributes to the set of things to consider.
On Mon, Oct 24, 2022 at 10:55 AM François Doray <fdo...@google.com> wrote:
Hi Gab/Etienne,
Background:
- On ChromeOS, the sandbox prevents setting a thread's C-Group from a renderer. On the other hand, setting a thread's nice value works fine.
- Zheda (Intel) works on a "Thread Type Change Delegate" which will make a Mojo call from the renderer to the browser on Thread Type change, to let the browser change the C-Group.
- When a process is backgrounded, the C-Group of all threads in the process is set to "non-urgent". When a process is foregrounded, the C-Group of each of its threads must be restored explicitly (otherwise, it stays "non-urgent").
Zheda proposed to maintain a [tid -> ThreadType] map in the browser process and use it to restore correct C-Groups. Problem: There is no simple way to observe all thread deaths, so the map may grow indefinitely.
I proposed to read a thread's nice value (not affected by process backgrounding) and derive the C-Group to restore from it. This is non-racy if all Thread C-Group and Nice value manipulations happen from the same sequence in the browser process. Problem: kResourceEfficient and kDefault have the same nice value, but different C-Groups.
Question:
- What do you think of changing the nice value of kResourceEfficient from "0" to "1" to allow deriving C-Groups from Nice values (and eliminate the need to maintain per-thread state in the browser process)?
Thanks,
François
--
Bruce Dawson, he/him
To determine if this work is desired, I would really like to get an ETA for youssefesmat@'s work (see above)
Yes, that work is still desired. CrOS is the only platform which doesn't support this.
Incorporating all comments from Gabriel, Youssef and Francois, it's desired to have the workaround solution in Chrome browser before CrOS API is ready. Will continue this work.I'm also interested in the work of stage 3 ---- thread level API to resourced, but I guess it would be several months later? If I have bandwidth at that time, I'm glad to help and contribute.With regard to priority on Windows, I find SetProcessBackgrounded will also set idle priority for background process on Windows, apart from EcoQoS setting. Looks like it's already done ?
The workaround solution in Chrome browser is proposed below. Please take a look and give any comment.
Target: implement a workaround to restore correct thread c-groups from Chrome when the c-group of a process is moved from: foreground->background->foreground
Currently, the thread type behaviours on Chrome OS are listed below
1) kRealtimeAudio, kDisplayCritical, kCompositing: nice values < 0, placed in the urgent c-group
2) kDefault: nice value = 0, placed in normal c-group
3) kResourceEfficient: nice value = 0, placed in non-urgent c-group
4) kUtility, kBackground: nice values > 0, placed in non-urgent c-group
Since it's a temporary solution before CrOS priority API is fully ready, probably the design don't need to be very elaborate. So the proposal to recall original thread c-groups
when re-foregrounding a renderer process might be:
Step 1) Derive the correct thread c-groups from nice value, if the thread nice value < 0 or > 0.
nice value < 0 -> urgent c-group
nice value > 0 -> non-urgent c-group
Step 2) Restore the thread c-group for kResourceEfficient type threads (nice value is 0)
Option 2.A: hack approach. Encode kResourceEfficient type in the name of the thread.
AFAIK currently only WebRTC threads (webrtc_network, webrtc_signaling, webrtc_worker) use this type. We can identify this type of threads through thread names.
Option 2.B: maintain a data structure in browser process to store kResourceEfficient type, and clear dead threads timely.
storage[pid][tid] = [kRE_thread_type]
When the renderer process exits (RenderProcessHostObserver::RenderProcessExited), remove all values of corresponding pid
When storage[pid][*] elements exceeds the threshold, watch for changes in /proc/{pid}/tasks to remove dead threads
Option A is much simpler. Since it's for temporary solution, probably option A might be worth it ?
Step 3) kNormal type threads (nice value is 0), no action since no need of extra thread c-group manipulation.SetProcessBackgrounded(bool), thread type setting, thread c-group manipulation are all done in process launcher task runner of browser process. So we can guarantee the sequences.
Thanks for your comments ! Let's go with 2B and I will implement the solution per this proposal.
I read Joel's patch and make a table below. In October 2022, there were some discussions (Gabriel, Francois, Bruce, Markus, and me) about whether should change nice value of kResourceEfficient type. At that time we wanted to support arbitrary mappings from "ThreadType" to "nice value" and "c-group". Probably also ask Bruce / Markus to take a look at this idea ?Besides, a few comments about the code:1. There's one small problem that kCompositing and kDisplayCritical have the same nice value (and cpuset) on ChromeOS, so GetThreadType (nice value -> thread type) never returns kDisplayCritical type (type = 5). But it doesn't affect nice value transition between foreground and background.2. In the patch, background -> foreground transition code doesn't cover kRealtimeAudio yet.
Re Markus's question.
1. Can you clarify the meaning of the "non-urgent"/"normal" and "urgent" c-groups? The names convey priority to me, not power efficiency profile.
Actually the names are a bit confusing here, probably need re-naming. The cpuset means to bind process / thread in a c-group to a specified set of CPUs and NUMA nodes. On ChromeOS, only CPU binding is used. Chrome uses cpuset to control the core placement of urgent / normal / non-urgent threads. On hybrid core systems, non-urgent threads are placed on E-cores, while normal / urgent threads able to run on all logical cores.
2. Maybe let's redo WebRtcThreadsUseResourceEfficientType with a new ThreadType where we can map the desired behavior later, and define?
Per current logic, if WebRTC threads use kResourceEfficient type, it has same ratio of CPU time as normal priority threads, and it runs only on E-cores to save power. Both behaviors are desired?If we cannot use unique nice value per thread type, probably have to use nice value + cpuset to identify a type during foreground <---> background transition for the patch https://hastebin.com/share/eperajejor.diff.
> That's why I think we should add a "kRtc" ThreadType to which we can assign them. In the frame of the proposal with unique nice values, they could be at for example FG nice -4 but live in "non-urgent". Wdyt?
Make FG nice slightly less than 0 + live in "non-urgent" for kRTC looks good to our web media team. If we add more thread types in the future, we might also need to make their nice unique.Looking forward to inputs from other experts.
In large meetings on lower end (non-big/little) machines we see significant keypress latencies. Those latencies were significantly reduced by using kCompositing (nice -8) for the renderer main. I worry that if we go with nice -4 those latencies would come back.
Ultimately kDisplayCricital and kCompositing are the same, the difference is mostly historical,
Meanwhile, they also have different QoS behaviors on MacOS. I think it would take some time for experiments.
Ultimately kDisplayCricital and kCompositing are the same, the difference is mostly historical,
Agreed, I'd plan on merging them; and perhaps eventually make a different split that might make more sense.
Meanwhile, they also have different QoS behaviors on MacOS. I think it would take some time for experiments.I started UserInteractiveCompositingMac for this.
I don't know if it's worth doing an experiment on Linux; population is small and presumably the fact that it was a good choice on ChromeOs means it should be on Linux as well.
Thanks Gabriel and Etienne for sharing the context and status of two thread types.When thread types evolve (either merge or meaningful split over time), we will also need to update the logic of foreground <----> background nice change.
> Here is the updated CL (mainly added more error handling and handled
> the issue you mentioned):
> https://chromium-review.googlesource.com/c/chromium/src/+/4605124
> Would you be willing to give this CL a try on your side?
Yes I'm glad to have a try, and will comment on the CL soon.