(I sent this previously as an email reply to a few people about a particular patch, but as this continues to come up on a monthly basis, here is a slightly edited copy of this rant as a reference of my biased position on this topic.)We should *not* make layout tests use threaded compositing.We shouldn't do it because it's a rat hole. jamesr and I both tried to make this happen on separate occasions. We spent a lot of time and fought with even more flakiness. I'm sure somebody could heroically burn a lot of time making this happen and succeed where we couldn't, but in my opinion, it's a waste of time. Threaded compositing increases flakiness and loses determinism in terms of when events happen, how repaints line up with frames, and how animation happens, especially. We already have a ton of problems with flakiness in layout tests, so why make this worse?If there are features that are only enabled with threaded compositing (compositor animations?) that somehow still need Blink layout tests, we should enable those features for non-threaded compositing. I would support removing the virtual/threaded directory if somebody made it possible to test those features in other ways. If there are tests that have different behavior in threaded mode vs not threaded mode, then they need to be deflaked or tested more narrowly via unit tests or esprehn's simulation testing framework.Single vs threaded compositing is generally well-tested within cc unit tests. There should be very little difference at the Blink level, and I would question the need to test it there. We have ironed out a small number of message ordering differences over time in cc. We should consider these differences bugs and add cc unit tests for them to guarantee the same ordering of messages. In the long term, I think the Blimp/Mandoline work has the potential to merge single/thread proxy with an abstracted glue layer between them.So, sure, we should test what we ship, but on the other hand every test should not be an integration test. I have have seen exactly zero bugs ever that would have been caught if a layout test had been using threaded compositing instead of single threaded compositing. Really, zero.Personally, I think our problems as a team lie more on the "tests are flaky and hard to reason about because they're integration tests" end of the spectrum and far less on the "we are shipping huge hidden regressions because of holes in our testing".
Currently, no!There are some things (e.g. composited animations) that have behavior differences in code depending on threaded vs single thread. It seems to me that these are wrinkles that could be ironed out. It seems like there's no reason we couldn't create and run composited animations in single-threaded mode.It looks like other things have been added there over time (e.g. idle callbacks), that I don't understand as well. Naively, it seems like fundamentally nothing should depend on threaded compositing.
Resurrecting this thread, because yotha@ are launching the first step in a project that completelyignores the previous consensus (i.e., that we should *not* enable threaded compositing for allweb_tests).
I will paraphrase the high-level arguments made previously by enne@ and ojan@, neither of whomare still active on chromium:- Enabling threaded compositing will increase test flakiness- Previous efforts to fix the flakiness have been really difficult and largely unsuccessful- Buggy behavior in the blink/compositor thread boundary is best covered by focused unit tests- There are vanishingly few bugs in that area; maybe none?My reponse...web_tests, by their very nature, are a massive set of integration tests. That is not the most efficientorganization of test coverage. But the beauty of web_tests is that they catch a *ton* of bugs andprevent many bad patches from landing. The large majority of the bugs are in core blink areas,but not all of them. web_tests provide effective, if inefficient, coverage of almost the entire chromiumcode base. The lack of coverage for threaded compositing is an anomaly and a blink spot. "It's toohard" is not, in my opinion, a convincing justification.Yes, turning on threaded compositing *does* make web_tests flakier. And that is evidence of real bugs!It's oxymoronic to acknowledge the flakiness but claim that there are few or no bugs in theblink/compositor thread boundary. There are definitely bugs in there! They can be excruciatingly hardto root-cause, but they definitely exist. A couple of examples...Once these bugs are root-caused, I certainly agree that we should prefer unit tests for regressiontesting. But they key point is that we only discovered these bugs because they caused failures inweb_tests.Again: these are real bugs!
--
You received this message because you are subscribed to the Google Groups "blink-dev" group.
To unsubscribe from this group and stop receiving emails from it, send an email to blink-dev+...@chromium.org.
To view this discussion on the web visit https://groups.google.com/a/chromium.org/d/msgid/blink-dev/4d8dd468-b3c4-4f65-8ac7-1b4ca194e08an%40chromium.org.
On Fri, Jun 9, 2023 at 11:25 AM Stefan Zager <sza...@chromium.org> wrote:Resurrecting this thread, because yotha@ are launching the first step in a project that completelyignores the previous consensus (i.e., that we should *not* enable threaded compositing for allweb_tests).This is exciting to hear! Is there a good place / bug to follow progress?
Woah 2015 super necro thread!There were a couple meta angles here:- Explicitly testing non-web exposed behavior is probably better done with a SimTest or other unit test. In the past folks kept adding more methods to Internals, or special behavior to the compositor, to make it easier to write web_tests. So not having the threaded compositor discourages some classes of that. I see folks are still adding new internals methods unfortunately, but I still think SimTests (or something else) is a better option.
- The more "complete" the browser run over web_tests, the more bugs are caught, but (at least at the time) browser_tests were very flaky and there was a strong desire to avoid that fate within blink. For example you could run the full chrome binary for web_tests and catch more bugs too. Why stop at just the threaded compositor? The thinking was making web_tests very stable was better than catching a small number of real bugs at the cost of a flake tax across the hundreds of contributors.
To view this discussion on the web visit https://groups.google.com/a/chromium.org/d/msgid/blink-dev/CAHOQ7J-0kFrERxC_PZdyfA6yoh_53%2Bu50gm_vuS-hzd1pAXUTg%40mail.gmail.com.