Hi,
To give context on who’s who, Kevin and Rossen are experts on accessibility (among other topics :) ), while Stefan, CJ, and I are experts on cross-process window parenting & input processing, and process model in general. We’ve been meaning to open this thread – thank you – but wanted to bring Rossen and Kevin in on the discussion. They have knowledge of some solid issues specific to accessibility that need to be worked through, that are large in their own right.
I’m going to give a long answer to your question. I’m not sure where I’m giving too much or too little detail, so I’m open to feedback. I 100% agree with Raymond’s assertion that this is like juggling chainsaws, but if I do say so myself, we have a few experienced chainsaw jugglers :).
Most threads in Windows have a win32k message queue. There are a number of equivalent ways of looking at that queue, and the way I like to look at it is as a queue of queues. When you call GetMessage or PeekMessage, you pull messages out of the queue in this order:
NonQueued > Posted > Input > Generated
The NonQueued “queue” is essentially all the other threads that are trying to interrupt your thread, e.g. with a SendMessage operation that would be processed before GetMessage() or PeekMessage() actually returns. Posted messages come from PostMessage and many APIs that ultimately wrap PostMessage. Input messages are generally keyboard & pointer (mouse/touch/pen etc.), including messages related to keyboard focus and activation. Generated messages are things like timers, paint messages, etc., that are generated when there’s no other work in the queue but a certain flag is set on the input queue. WM_MOUSEMOVE / WM_POINTERMOVE is a special case that’s sometimes an input message, sometimes a generated message, due to how mouse coalescing works.
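To make the ordering concrete, here is a toy Python model of the "queue of queues". It is only a sketch: the class and method names are made up for illustration, real win32k queues are kernel structures, and in the real system non-queued (sent) messages are dispatched inside GetMessage rather than returned to the caller.

```python
from collections import deque

# Toy model only, not Win32 code. Retrieval order as described above:
# NonQueued > Posted > Input > Generated.
PRIORITY = ("nonqueued", "posted", "input", "generated")


class ThreadQueue:
    """Hypothetical stand-in for one thread's queue of queues."""

    def __init__(self):
        self.queues = {name: deque() for name in PRIORITY}

    def add(self, kind, message):
        self.queues[kind].append(message)

    def get_message(self):
        """Return the next message in priority order, or None if all are empty."""
        for name in PRIORITY:
            if self.queues[name]:
                return self.queues[name].popleft()
        return None


q = ThreadQueue()
# Messages arrive in the "wrong" order on purpose.
q.add("generated", "WM_PAINT")
q.add("input", "WM_KEYDOWN")
q.add("posted", "WM_APP")
q.add("nonqueued", "sent WM_SETTEXT")

order = [q.get_message() for _ in range(4)]
print(order)
```

Regardless of arrival order, the sent message comes out first and the paint message last.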
For the most part, every thread has a completely independent queue of queues. But when you attach input queues, you have two threads that share the same Input Queue, and many of the same flags and member variables for the Input Queue’s State, such as this thread’s currently focused window. I’m going to attempt a little ascii art, not sure if it’ll pull through:
Thread A: NonQueued > Posted         Generated
                             \     /
                              Input
                             /     \
Thread B: NonQueued > Posted         Generated
If that diagram made any sense, the immediately obvious problem is that a deadlock in one thread can cause another idle thread to never process messages. Consider if thread A receives a click (mousedown followed by mouseup), then B receives a click, then A receives a click again. A and B are both idle. The Input Queue looks like:
Adown Aup Bdown Bup Adown Aup
Then suppose that A can process messages just fine, but processing Bdown triggers an infinite loop. Now the queue looks like:
Bup Adown Aup
Bdown has already been removed from the queue, but the damage is done. Bup cannot be processed because thread B is wedged. Adown can’t be processed because Bup is ahead of it in queueing order. So even though thread A is completely idle, it cannot service any input whatsoever.
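The click scenario above can be replayed with a small Python sketch. This is a toy model under simplifying assumptions (one strictly FIFO shared queue, a thread that never recovers once it hangs); `run` and `hangs_on` are names invented for the illustration.

```python
from collections import deque


def run(events, hangs_on):
    """Drain a shared input queue in strict FIFO order.

    events:   list of (owner_thread, event_name) tuples.
    hangs_on: set of events whose processing wedges the owning thread.
    Returns (processed, stuck) as lists of labels like "Adown".
    """
    queue = deque(events)
    wedged = set()
    processed = []
    while queue:
        owner, name = queue[0]
        if owner in wedged:
            # Head-of-line blocking: nothing behind this event can be retrieved.
            break
        queue.popleft()
        processed.append(owner + name)
        if (owner, name) in hangs_on:
            wedged.add(owner)  # the event is already off the queue, damage done
    return processed, [owner + name for owner, name in queue]


events = [("A", "down"), ("A", "up"), ("B", "down"),
          ("B", "up"), ("A", "down"), ("A", "up")]

processed, stuck = run(events, hangs_on={("B", "down")})
print(processed)  # events handled before the wedge
print(stuck)      # events nobody can reach, including A's
```

In this model thread A's second click is stuck behind Bup even though A itself never hangs.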
This can be particularly challenging when you call a function like SetFocus, including implicit calls when you call DefWindowProc on a mouse-down message. Focus is a synchronous queue state. If you move focus from one window on the current input queue to another, they both get a synchronous message. So moving focus to or from a thread that is processing messages will block the other thread.
There are a number of other subtle interactions. This also leads to developers seeing a hang, attaching a debugger, and being confused when confronted with a completely idle thread that seemed to be frozen.
The good news is, if you can detect this, you can detach input queues just-in-time, and then A can proceed with processing messages.
Interestingly, it looks like the chromium source is already dealing with this to some degree in MessagePumpForUI::WaitForWork, so it looks like it’s something you already encountered in some cases, probably having to do with dialog windows. Also this blog post from a Mozilla dev concerning the same code https://dblohm7.ca/blog/2015/03/12/waitmessage-considered-harmful/. At first glance, I’m not sure the existing solution is really effective, but maybe I just need to sit down in the debugger to see what’s going on live. It’s a bit different from the workarounds and solutions I’ve seen before.
The fundamental problem is, in the above case, suppose thread A checks its queue for messages. It has pending messages, but none it can process. There are two basic ways to let the thread go to sleep in MsgWaitForMultipleObjectsEx(), hinging on whether you pass MWMO_INPUTAVAILABLE or not. With no flag, it says “wake me up when there’s new messages anywhere in the queue”. If thread B eventually processes its message, thread A might never wake up even though it still has pending work because there’s no new pending work, until something “pokes” it – experienced by the user as a window that hangs until you just wave your mouse over the window and it clears up. With the flag, it says “wake me up when there’s any input messages, new or not, for me to process”. Then it wakes up immediately because there’s already input messages for it to process. But it can’t process them yet due to the attached queue. So you spin in a tight loop. And this tight loop occupies CPU time that could be spent working on whatever is blocking thread B. All the while suffering from increased battery usage.
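The two wait behaviors can be contrasted with a toy Python predicate. This is a sketch of the semantics as described above, not of the real API: `wait_would_return` and `wakeups_while_polling` are made-up names, and the model assumes no new input arrives during the experiment.

```python
def wait_would_return(pending_input, new_since_last_check, input_available):
    """Toy predicate: would the wait wake the thread right now?

    input_available=True models passing MWMO_INPUTAVAILABLE:
    wake if any input is in the queue, new or not.
    input_available=False models no flag: wake only on new input.
    """
    if input_available:
        return pending_input > 0
    return new_since_last_check


def wakeups_while_polling(pending_input, blocked, input_available, max_polls=5):
    """Count how often the thread wakes over max_polls wait attempts."""
    wakeups = 0
    for _ in range(max_polls):
        # Assumption: nothing new arrives while we poll.
        if not wait_would_return(pending_input, False, input_available):
            break
        wakeups += 1
        if not blocked:
            pending_input -= 1  # the thread can actually consume a message
    return wakeups


# Two already-seen input messages are pending for thread A.
sleeps_without_flag = not wait_would_return(2, False, input_available=False)
spins = wakeups_while_polling(2, blocked=True, input_available=True)
drains = wakeups_while_polling(2, blocked=False, input_available=True)
print(sleeps_without_flag, spins, drains)
```

Without the flag the thread sleeps despite pending work; with the flag and a blocked queue it wakes on every poll without making progress, which is the tight loop described above.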
Suppose you, or an API you call, create a top-level window based on the current window – whether modal or modeless. Top-level windows don’t have parents but they often have owners, such that when you click the owner, the owned dialog comes to the top, or vice-versa. When you attempt to make a child window an owner, the ownership is basically transferred to the ancestor of that child which is a top-level window. This means you can implicitly create a cross-process Owner/Owned relationship. This, first of all, implicitly attaches input queues again, even if you detached them before. But secondly, this creates a synchronous dependency risk of its own. When the mouse clicks down on a background window, the Windows kernel does a synchronous message to all owned windows to tell them to join that window in the Z-order. That means you have a sudden synchronous dependency from one thread to another, which cannot be detected ahead of time because the kernel made the call before usermode code even sees it happening. This one is particularly insidious because unlike the Implicit Synchronous Dependency of attached input queues, when you’re in this state, there’s no way out other than to either have one thread be unlocked on its own, or to kill that process. Un-ownering the window after-the-fact will not repair the situation.
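The ownership transfer described above can be sketched as a walk up the parent chain. This is a toy Python model, not Win32 code: the `Window` class, its `process` field, and `effective_owner` are invented for the illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Window:
    """Hypothetical stand-in for an HWND; top-level windows have no parent."""
    name: str
    process: str
    parent: Optional["Window"] = None


def effective_owner(requested: Window) -> Window:
    """Model of the rule above: a child cannot be an owner, so ownership
    lands on the top-level ancestor of the requested window."""
    window = requested
    while window.parent is not None:
        window = window.parent
    return window


# A child window in one process, parented under another process's top-level window.
host_top = Window("host top-level", process="host")
content = Window("content child", process="browser", parent=host_top)

# The browser process asks for a dialog owned by its own child window...
dialog_owner = effective_owner(content)
# ...and ends up with an owner that lives in a different process.
cross_process = dialog_owner.process != content.process
print(dialog_owner.name, cross_process)
```

The caller never names the other process's window, yet the resulting Owner/Owned relationship crosses the process boundary.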
You can “feel” the answer if you create an app with multiprocess UI and call AttachThreadInput with fAttach=FALSE, in a loop, to fully detach input queues of the windows. In short, keyboard focus is messed up (sometimes it looks like something should have focus, but typing reveals that keyboard input is still going to the wrong window), clicking the child window doesn’t automatically bring the parent window to the top, etc. Win32 is generally built with the assumption that all windows that share a common ancestor top-level window share the same input queue, and some very basic things break down when you subvert that.
More literally, the input queues are attached because the Windows kernel automatically attaches queues between threads that have windows in a hierarchy relationship, and various triggers can cause Windows to re-evaluate that relationship and attempt to re-establish input queue attachment. So even if you do detach input queues, if you’re not careful, they can be re-attached for you behind your back. We have a decent survey of conditions that reattach input queues. One of the trickiest to deal with has to do with IMEs.
Every previous Microsoft browser has included cross-process UI, at least since IE8.
In IE and earlier versions of the non-Chromium Edge, we spent a great deal of effort on dealing with this imperfectly. Feel free to skip this section; the point I’m raising here is some concrete information on what “juggling chainsaws” really means. The key points were:
Overall, this leads to an experience where user interactions can be janky for about 1 second after some code enters a tight loop, but otherwise the browser is mostly resilient to problems from other threads.
Newer versions of the pre-Chromium Edge browser used a different technique, just solving all the issues that input queue attachment solved, without the actual input queue attachment. This technique gives much more symmetric independence (neither thread depends on the other, for the most part), and was fundamentally reliable with no hairy race conditions. I believe this solution worked out great, and I’d like to take inspiration from it. Unfortunately, we can’t import it as-is, because the precise implementation is tied pretty closely to UWP. I’m not recommending that chromium switch to being a UWP on Windows, and I know it would at best take several releases of Windows to bring the whole implementation to win32 compatibility (we had looked into it in the past).
However, that’s for the entire “Component UI” API we designed. Many fundamental implementation details could be generalized to win32 more easily as isolated features. We can experiment internally with them and work with the various teams within the Windows organization to make these underlying APIs (or some wrappers thereof) stable and supported so we could feel comfortable using them going forward. In particular, there is an API that allows you to give an HWND a false parent or a false child (just 1:1) for accessibility purposes only, without impacting input queue attachment, queue flags, or window ownership. This API also allows us to fake out the dimensions and visibility state of that window.
The other obvious solution is, if no attached thread ever does anything that risks taking nontrivial time, then we don’t have a problem. We use MWMO_INPUTAVAILABLE, even though it effectively doubles CPU usage, because spin time is guaranteed to be so short that it makes no real difference. We can simply accept all the other synchronous dependencies because we really don’t need to run both threads in parallel at any point, their workload is so sparse. No dialog will pop up because launching a dialog would be nontrivial code.
I know the browser process UI thread already has a strict policy avoiding IO on the UI thread and that’s a great step here. A big risk is that something we don’t control sneaks in, such as a dialog added from a thread we don’t own, because it ruins everything. One of the vectors by which things could sneak in is through the very accessibility tools we are trying to keep out of proc in the first place. Here I want to defer to Kevin and Rossen as to the best way to do UIAutomation endpoints. Another is the window manager injectors you’re describing below.
I actually think most of the experimentation would be purely on the chromium side. For instance, the accessibility items ultimately just come down to calls to SetProp on the HWND in question with some well-defined internal property names. We wouldn’t want to rely on that implementation detail being stable in the long-term, but I don’t think it would block prototyping. The properties of interest are:
UIA_HWNDXOffset
UIA_HWNDYOffset
UIA_HWNDWidth
UIA_HWNDHeight
UIA_WindowVisibilityOverridden
CrossProcessChildHWND
CrossProcessParentHWND
UIA_UseSiblingAsChildForHitTesting
I can get back to you later on their meaning, it’s subtle and a bit weirder at points than you might expect because it’s only expected to be used internally. This does not constitute documentation :).
In terms of your question – yes, we can reparent the window relatively simply, which makes a proof of concept for #1 not too difficult at least as far as windowing goes. Basically, just do what chromium does now, then launch a new process that creates a new top-level window and returns the handle (the HWND). Then have the Browser Process call SetWindowLongPtr to convert the existing top-level window from WS_POPUP to WS_CHILD, and SetParent to parent it to the HWND from the utility process. That’s almost how old Internet Explorer works, except in IE it’s the other way around, where the child process’ child windows are reparented to the top-level window. But regardless of who is parent, we need to perform the operations in the medium-IL process. The vast majority of input handling does not necessarily need to move to the new process, but things like the size/move modal loop (dragging by the titlebar) and closing the window from the taskbar are going to get in touch with this new top-level window first. That means, if we don’t move it, then we have the condition I warned about where both processes will receive input and potentially keyboard focus, so it opens a whole lot of landmines. But it can be done. Windows will automatically route your pointer (mouse, touch, etc.) input to the child window just as it would any other child window, and there will be one keyboard focus window across both UI threads.
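The WS_POPUP to WS_CHILD conversion is just bit arithmetic on the window style. The Python sketch below shows only that arithmetic, using the standard style constants from winuser.h; it does not call SetWindowLongPtr or SetParent, and `popup_to_child` is a made-up helper name.

```python
# Window style bits as defined in winuser.h.
WS_POPUP = 0x80000000
WS_CHILD = 0x40000000
WS_VISIBLE = 0x10000000


def popup_to_child(style):
    """Clear WS_POPUP and set WS_CHILD, leaving all other style bits alone.

    In real code the result would be written back with SetWindowLongPtr
    (GWL_STYLE) before calling SetParent; this sketch only computes it.
    """
    return (style & ~WS_POPUP & 0xFFFFFFFF) | WS_CHILD


old_style = WS_POPUP | WS_VISIBLE
new_style = popup_to_child(old_style)
print(hex(old_style), hex(new_style))
```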
We’d probably set things up so that whenever this “host” process owning the top level window goes down, then the browser process self-terminates, at least to begin with. The long-haul part is creating careful mitigations for potential queue issues, and also adjusting crashpad etc. to be aware of the new reality that a hang in one thread could be due to something in a different process entirely that doesn’t seem to be waiting from usermode.
Then the next step is changing the accessibility objects themselves and serializing them out to this process that contains the top-level window. I know Kevin and Rossen think this will take a good deal of time and have big rocks, and apparently they fell off the email, so I’m putting them back :).
Apologies for the long delay in getting back. To update, I spent some time looking into this and I'm going to summarize my understanding of the issue and recommend some options.
The main goal for this discussion seems to be about how to apply the ProcessExtensionPointDisablePolicy mitigation to the Browser process. This mitigation isn't enough to prevent all third party code injection, but it prevents many legacy injection techniques:
557798 - Block legacy hooking mechanisms on Win8+ - chromium
The mitigation was attempted by Chromium in the past but some issues came up where some third party software injecting into the Browser process stopped working properly, which was reported by users, most notably in old abandonware IMEs that use IMM32 rather than the newer TSF. A few of these are still popular and I found this bug with more details:
1017694 - after update 78.0.3904.70 cannot input a win7 chinese - chromium
I also found a bug tracking other non-accessibility software that intentionally injects into the Browser process using accessibility APIs (hooks) that was broken by this mitigation:
1018714 - Breaks Windows system hooks - chromium
I do not think of either of these as "accessibility" software, whereas in previous replies this discussion was led to how we solved the accessibility problem with non-Chromium Edge. One point of clarification is that the legacy IMEs that don't work when this ProcessExtensionPointDisablePolicy policy is applied to Chromium also don't work in the non-Chromium Edge, along with other UWP-based UI like the in-box Windows 10 settings app.
I didn't find a great explanation anywhere of exactly what ProcessExtensionPointDisablePolicy does, so I did some research and poking around to build a list:
As mentioned in the last 2 bullet points, accessibility software is still allowed to hook via some common entry points as long as they have the UIAccess capability, which is required for hooking system UI, so I expect most (if not all) accessibility software in use today already has this capability for that reason. Given that, I don't think the ideas from earlier about inventing a way to connect accessibility trees to another process (via HWND parenting or props) where third party accessibility binaries get loaded make much sense as a way to prevent accessibility tools from injecting into the browser process, since they would still be loaded in the browser process (where the hooks are fired by Windows from HWND input) anyway.
Somewhat related, newer IME binaries use a different mechanism to load into the browser process which isn't prevented by ProcessExtensionPointDisablePolicy.
To actually have accessibility and IME software run out of the browser process, we would need to avoid triggering the hooks used to inject the software. In the cases I looked at in depth, that means not having HWND Focus / Foreground in the Browser process, which requires getting rid of all HWND keyboard and mouse input use in the Browser process, which in turn seems possible if we delegate those HWNDs to a separate (UI) process. In this case we may be able to get to a point where all of the relevant application events a third party software needs to listen to are triggered by a new utility process that manages all of the HWNDs and input.
To summarize, I can think of two main approaches that let us move forward to enable this policy in the Browser process without Windows changes:
Interested in hearing thoughts from others on these ideas.
-Stefan