site_per_process_webkit_layout_tests


Roger Tawa

Jul 31, 2018, 11:02:32 AM
to chromium-dev, Markus Heintz
Hi all,

As sheriff over the last two days, I've noticed a lot of flaky tests from site_per_process_webkit_layout_tests.  Here is the flakiness dashboard for that suite:


As a comparison, the flakiness dashboard for browser_tests:


When I look at this, I ask myself why site_per_process_webkit_layout_tests is even part of our build process if it is that flaky.  And if it really needs to be, why does it generate work for sheriffs?

Thanks,
Roger


Dominic Mazzoni

Jul 31, 2018, 11:46:29 AM
to rog...@chromium.org, chromium-dev, Markus Heintz
I think a more reasonable comparison would be to webkit_layout_tests. Is it site isolation that's causing the problems, or are there just a lot of flaky layout tests?

I'm having a hard time figuring out how to get the equivalent flakiness dashboard output for webkit_layout_tests, can anyone post a working link?

Also, it looks like the vast majority of tests on that list start with virtual/outofblink-cors-ns, which means the problem is potentially with --enable-features=OutOfBlinkCORS,NetworkService rather than with the specific tests.
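For anyone who wants to check this locally, here is a rough command sketch (it requires a Chromium checkout; the runner path, flag names, and test path are assumptions based on the virtual suite's name, not verified against this exact revision):

```shell
# Run one suspect test repeatedly under the virtual suite's feature flags,
# to see whether the flags (rather than the test itself) cause the flake.
# The test path here is purely illustrative.
third_party/blink/tools/run_web_tests.py \
    --additional-driver-flag=--enable-features=OutOfBlinkCORS,NetworkService \
    --repeat-each=10 \
    external/wpt/fetch/api/cors/cors-basic.any.html
```

Comparing the pass rate of the same invocation without --additional-driver-flag would isolate the feature flags as the variable.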


--
--
Chromium Developers mailing list: chromi...@chromium.org
View archives, change email options, or unsubscribe:
http://groups.google.com/a/chromium.org/group/chromium-dev
---
You received this message because you are subscribed to the Google Groups "Chromium-dev" group.
To view this discussion on the web visit https://groups.google.com/a/chromium.org/d/msgid/chromium-dev/CAFv%2B6UpnUwfPQv8ADXAWO1k2jvDz%3Dz2e5ErqHg69kJCaY0yp%2Bg%40mail.gmail.com.

Roger Tawa

Jul 31, 2018, 1:42:39 PM
to Dominic Mazzoni, chromium-dev, Markus Heintz, Fabrice de Gans-Riberi
Thanks for looking Dominic.

Is the best option to disable all those tests, or maybe to disable the feature?  Is this where they are declared:


Roger


Stephen Chenney

Jul 31, 2018, 5:51:21 PM
to rog...@chromium.org, dmaz...@chromium.org, chromi...@chromium.org, markus...@google.com, fde...@google.com
We do not want to disable the entire suite. The goal is to drive toward fixing the tests.

Are there a lot of distinct tests? If it's a big deal for you to mark them as flaky, then send me the list and I'll do it, and add them to the tracking bug: https://bugs.chromium.org/p/chromium/issues/detail?id=667551
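For reference, marking a layout test as flaky typically means adding a line to Blink's TestExpectations file (or to the suite's FlagExpectations file), annotated with the tracking bug. A sketch of what such an entry looks like; the test path here is hypothetical:

```
# Hypothetical entry; the test path is illustrative only.
crbug.com/667551 virtual/outofblink-cors-ns/external/wpt/fetch/api/cors/cors-basic.any.html [ Pass Failure Timeout ]
```

Listing multiple outcomes ([ Pass Failure Timeout ]) tells the runner that any of those results is acceptable, which keeps the flake from turning the bot red without disabling the test outright.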

Stephen.

Roger Tawa

Aug 14, 2018, 10:28:17 AM
to sche...@chromium.org, Dominic Mazzoni, chromium-dev, Markus Heintz, Fabrice de Gans-Riberi
Hi Stephen,

I was on vacation after my last sheriff shift, sorry for the delay.  I don't know what the state is today, but two weeks ago it was generating a lot of noise for sheriffs.  I think the Chrome policy is to disable such tests if they are very flaky.

Thanks,
Roger


Stephen Chenney

Aug 14, 2018, 11:04:53 AM
to Roger Tawa, Vladimir Levin, Dominic Mazzoni, chromi...@chromium.org, Markus Heintz, Fabrice de Gans-Riberi
It's fine to disable certain tests, but I want to avoid disabling the entire suite. If every test is flaky sometimes, then the issue is with the test infrastructure and we should fix that.

Looping in vmpstr@ who wrote the handling for OOPIF in layout tests.

Cheers,
Stephen.

Dirk Pranke

Aug 14, 2018, 12:53:40 PM
to Stephen Chenney, Roger Tawa, Vladimir Levin, Dominic Mazzoni, chromium-dev, Markus Heintz, Fabrice de Gans-Riberi
I also would prefer not to disable the entire test suite, but if it is too flaky and disrupting the flow for everyone, it's still the right thing to do.

Things need to be off the waterfall until proven stable, not otherwise. We haven't historically done this well or consistently, but we are trying to get better about it, and that means getting stricter about it as well.

I don't know what the right thing to do in this particular case is; I haven't looked at the data recently. I am happy to discuss whether there should be exceptions if we have a plan for dealing with things.

-- Dirk



Łukasz Anforowicz

Aug 15, 2018, 8:00:42 PM
to Chromium-dev, sche...@chromium.org, rog...@chromium.org, vmp...@chromium.org, dmaz...@chromium.org, markus...@google.com, fde...@google.com
If the flakiness is not limited to a handful of individual tests, then I'd rather avoid disabling a whole swath of tests without first understanding the root cause of the flakiness.  I also don't think that disabling site-per-process (or the site_per_process_webkit_layout_tests test step) is a good option, since, as of M67, 99% of Chrome users run with site-per-process.

That said, it does indeed seem that site-per-process makes tests more flaky and we should try to get to the bottom of this (not sure exactly how yet...).  I've opened https://crbug.com/874695 to track this problem.

FWIW, I don't think OOB-CORS (vs. OOPIF) bears any significant share of the blame for the difference: OOB-CORS accounts for only 167 of the 3958 flaky tests (about 4%).

One thing I did notice is that quite a few tests that are flaky with site-per-process had quite a slow "slowest run" (around 1100 tests had a "slowest run" of 3s or more).  I don't know if "slowest run" only counts passes (and so ignores timeouts), but if so, then it would probably mean that site-per-process makes things slightly slower and possibly takes quite a few tests over a timeout cliff.  I was not able to figure out how to limit the flakiness dashboard to only show timeouts (and working with the dashboard was fairly painful in general - it crashes quite often due to OOMs).
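A quick sketch of the kind of filter meant here, over made-up timing data (the real flakiness dashboard export format differs, and the 6s timeout is an assumption):

```python
# Sketch: given per-test timing data, count tests whose slowest observed
# run approaches the per-test timeout -- prime timeout-flake suspects.
TIMEOUT_S = 6.0   # assumed per-test timeout
SUSPECT_S = 3.0   # the "slowest run of 3s or more" threshold from above

# Hypothetical sample: test name -> slowest observed run in seconds.
slowest_runs = {
    "fast/dom/a.html": 0.4,
    "fast/dom/b.html": 3.2,
    "external/wpt/c.html": 5.9,
}

# Tests within striking distance of the timeout; a small site-per-process
# slowdown could push these over the cliff intermittently.
suspects = sorted(
    name for name, secs in slowest_runs.items() if secs >= SUSPECT_S
)
print(suspects)
```

Run against a real dump of per-test timings, this would give the ~1100-test list mentioned above, which could then be cross-checked against which of those tests actually flake with timeouts.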

Dirk Pranke

Aug 15, 2018, 9:03:38 PM
to Łukasz Anforowicz, Chromium-dev, Stephen Chenney, Roger Tawa, Vladimir Levin, Dominic Mazzoni, Markus Heintz, Fabrice de Gans-Riberi
On Wed, Aug 15, 2018 at 5:00 PM, Łukasz Anforowicz <luk...@chromium.org> wrote:
> If the flakiness is not limited to a handful of individual tests, then I'd rather avoid disabling a whole swath of tests without first understanding the root cause of the flakiness.

That is understandable, but also not going to happen unless someone is actively working on this and we have some hope of an ETA on a fix. AFAIK, we don't have either of these things, though maybe you've started looking at it now?

> I also don't think that disabling site-per-process (or the site_per_process_webkit_layout_tests test step) is a good option, since, as of M67, 99% of Chrome users run with site-per-process.

I never said it was a good option :). But it is our policy to not run flaky tests on the waterfall, period. The burden of proof is on devs to show that their tests are stable before they get to run in the CQ and affect everyone. We're not enforcing this perfectly, but that doesn't mean that things get free passes. Just because you've shipped something doesn't mean that, either.

[ In an ideal world, we would've caught this before shipping, of course, and it would've been part of the launch discussion. ]

> That said, it does indeed seem that site-per-process makes tests more flaky and we should try to get to the bottom of this (not sure exactly how yet...).  I've opened https://crbug.com/874695 to track this problem.

Thank you. I hope we can make some significant progress on this, because I don't want to turn the step off, either.

> FWIW, I don't think OOB-CORS (vs. OOPIF) bears any significant share of the blame for the difference: OOB-CORS accounts for only 167 of the 3958 flaky tests.
>
> One thing I did notice is that quite a few tests that are flaky with site-per-process had quite a slow "slowest run" (around 1100 tests had a "slowest run" of 3s or more).  I don't know if "slowest run" only counts passes (and so ignores timeouts), but if so, then it would probably mean that site-per-process makes things slightly slower and possibly takes quite a few tests over a timeout cliff.  I was not able to figure out how to limit the flakiness dashboard to only show timeouts (and working with the dashboard was fairly painful in general - it crashes quite often due to OOMs).

Yes, I've added some comments on the bug, and we should continue the discussion there. I suspect the bots are even more oversubscribed than they were before (since we intentionally run the bots heavily loaded), and that's not going to help.

-- Dirk


Łukasz Anforowicz

Aug 20, 2018, 6:34:16 PM
to Chromium-dev, luk...@chromium.org, sche...@chromium.org, rog...@chromium.org, vmp...@chromium.org, dmaz...@chromium.org, markus...@google.com, fde...@google.com
On Wednesday, August 15, 2018 at 6:03:38 PM UTC-7, Dirk Pranke wrote:
> On Wed, Aug 15, 2018 at 5:00 PM, Łukasz Anforowicz <luk...@chromium.org> wrote:
>> If the flakiness is not limited to a handful of individual tests, then I'd rather avoid disabling a whole swath of tests without first understanding the root cause of the flakiness.
>
> That is understandable, but also not going to happen unless someone is actively working on this and we have some hope of an ETA on a fix. AFAIK, we don't have either of these things, though maybe you've started looking at it now?

I did manage to identify one source of flakiness in https://crbug.com/834185.  Let's see how much things improve after fixing it.

>> I also don't think that disabling site-per-process (or the site_per_process_webkit_layout_tests test step) is a good option, since, as of M67, 99% of Chrome users run with site-per-process.
>
> I never said it was a good option :). But it is our policy to not run flaky tests on the waterfall, period. The burden of proof is on devs to show that their tests are stable before they get to run in the CQ and affect everyone.

FWIW, according to https://crbug.com/874695#c6, site_per_process_webkit_layout_tests actually contributes less to flakiness than the default webkit_layout_tests.  This is a bit surprising (I'd expect no difference at all), but hopefully it means that site_per_process_webkit_layout_tests is in reasonably good shape to be CQ-worthy.

> We're not enforcing this perfectly, but that doesn't mean that things get free passes. Just because you've shipped something doesn't mean that, either.

My apologies; I didn't mean to imply that a shipped feature should get a free pass.