On Tue, Nov 24, 2015 at 3:19 AM, Adrian Kuegel <aku...@chromium.org> wrote:

It seems the problem in this case was that the Sheriffs didn't realize how badly these flaky tests were affecting the Tryserver. I added a link to our Monitoring Page, with a short explanation, to https://sites.google.com/a/chromium.org/dev/developers/tree-sheriffs/sheriffing-bug-queues. We are also considering linking to the monitoring page from Sheriff-o-matic directly.

In any case, for this specific incident our tooling already worked: the monitoring pipeline alerted us that the success rate of linux_chromium_chromeos_rel_ng had dropped (also clearly visible in the builder success rate graph), automatic flakiness bugs were filed, and we got alerts from buildbucket. What didn't work: the trooper didn't notify the Sheriffs about the alerts, so the Sheriff only saw the flakiness bugs, and probably didn't know how to investigate the effect of that flakiness. Also, the instructions for Sheriffs regarding flakiness just said to disable tests after "a couple of hours", which is maybe not clear enough. I added:

"If a builder has dropped to a low success rate because of flaky tests, those tests should be disabled as soon as possible."
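As a concrete reference for what "disabling" means here: in gtest-based suites like Chromium's, a flaky test is typically disabled by prefixing its name with DISABLED_, together with a comment pointing at the tracking bug. A minimal sketch (the suite and test names below are illustrative, not the actual flaky downloads tests from this incident):

```cpp
#include <gtest/gtest.h>

// Flaky; see crbug.com/560329. Re-enable once the culprit CL is found and
// reverted, or the flakiness is otherwise diagnosed.
TEST(DownloadTest, DISABLED_ResumesInterruptedDownload) {
  // Test body left unchanged. gtest compiles but skips any test whose name
  // starts with DISABLED_; it can still be run manually with
  // --gtest_also_run_disabled_tests.
}
```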
On Tue, Nov 24, 2015 at 2:07 PM, Julie Parent <jpa...@chromium.org> wrote:

I don't think it makes sense to ask sheriffs to look at the viceroy graphs. Sheriffs use s-o-m as a one-stop shop.
On Tue, Nov 24, 2015 at 10:52 PM, Kenneth Russell <k...@chromium.org> wrote:

On Tue, Nov 24, 2015 at 3:19 AM, Adrian Kuegel <aku...@chromium.org> wrote:
> [...] "If a builder has dropped to a low success rate because of flaky tests, those tests should be disabled as soon as possible."

If previously-reliable tests suddenly become flaky, then investigation should be done immediately to see whether a recently committed CL is likely the cause, and if one is found, it should be reverted.
On Wed, Nov 25, 2015 at 1:07 AM, Ojan Vafai <oj...@chromium.org> wrote:

+chromium-dev

Context: the blink CQ was >20% flaky for 16 hours yesterday because of flaky downloads gtests.

On Tue, Nov 24, 2015 at 2:07 PM, Julie Parent <jpa...@chromium.org> wrote:
> I don't think it makes sense to ask sheriffs to look at the viceroy graphs. Sheriffs use s-o-m as a one-stop shop.

+1. The sheriff shouldn't care how bad the flake is. The flake should be handled the same way regardless: revert the offending patch if it's straightforward to figure out, and otherwise disable the test and assign appropriate owners to the bug.
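For concreteness, the percentages in this thread (">20%" here, "24.1%" in Paweł's message below) are CQ false-rejection rates - roughly, of the CQ attempts on CLs that were actually fine (they eventually landed), the fraction that the CQ nevertheless rejected. A toy calculation, assuming that definition and with illustrative numbers rather than the actual day's data:

```cpp
#include <cstdio>

int main() {
  // Illustrative numbers, not the actual day's data.
  const double attempts_on_good_cls = 1000.0;  // attempts on CLs that later landed
  const double rejected_good_attempts = 241.0; // rejected anyway, e.g. by flakes
  const double false_rejection_rate =
      rejected_good_attempts / attempts_on_good_cls;
  std::printf("false rejection rate: %.1f%%\n", false_rejection_rate * 100.0);
  return 0;
}
```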
On Wed, Nov 25, 2015 at 1:15 AM, Adrian Kuegel <aku...@chromium.org> wrote:

On Tue, Nov 24, 2015 at 10:52 PM, Kenneth Russell <k...@chromium.org> wrote:
> [...] If previously-reliable tests suddenly become flaky, then investigation should be done immediately to see whether a recently committed CL is likely the cause, and if one is found, it should be reverted.

Of course; but I would argue that if the tryserver is broken, a test should be disabled first, before spending hours investigating what might be causing this. If you look at crbug.com/560329, you can see that the flaky test was known 15 hours before it was disabled. If you think back to the times when we still closed the tree for failing tests, a sheriff would also try to get the tree green again as soon as possible.
On Wed, Nov 25, 2015 at 2:31 PM, Kenneth Russell <k...@chromium.org> wrote:

On Wed, Nov 25, 2015 at 1:15 AM, Adrian Kuegel <aku...@chromium.org> wrote:
> [...] I would argue that if the tryserver is broken, a test should be disabled first, before spending hours investigating what might be causing this.

Agree that tryservers should not be left broken for hours, but an initial attempt should be made to find a CL that made a test flaky rather than reflexively disabling previously working tests. In particular, our team's tests are among the few that launch the entire browser rather than a smaller test harness, and it has been the case several times in the past that intermittent crashes affecting the browser were caught only by our tests. In situations like these an attempt should be made to find the cause of the crashes and revert it, rather than immediately disabling the tests and pushing the effort of diagnosis onto the team owning the tests.
On Tue, Nov 24, 2015 at 10:41 AM, Paweł Hajdan, Jr. <phajd...@chromium.org> wrote:

> I've heard the Flakiness sub-team of CQ FA starts investigating chromium flakiness exceeding 10% and blink exceeding 20%. Today's blink false rejection rate is 24.1%, and so I'm looking forward to your insights.

This seems easy - cycle time has also regressed, and I've noticed we had a series of flakes on linux_chromium_chromeos_rel_ng (https://code.google.com/p/chromium/issues/detail?id=560329). Automated systems did react; see e.g. https://code.google.com/p/chromium/issues/detail?id=560264.

Action seems to have happened with a rather large delay (8-18 hours) - see https://codereview.chromium.org/1467183004 disabling the tests.

I'm working on a CL for the test launcher not to bail out early in the run without the patch (i.e. to calculate the threshold for broken tests differently when we only run a subset of them), as happened in e.g. http://build.chromium.org/p/tryserver.chromium.linux/builders/linux_chromium_chromeos_rel_ng/builds/133462 .

I wouldn't like to exaggerate the importance of this regression. On the other hand, it gives us a specific case to study and improve on.

Paweł
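Background on the launcher behavior Paweł refers to: the test launcher aborts a run early once "too many" tests have failed, on the theory that the build itself is broken; a threshold tuned for the full suite trips almost immediately when a retry step runs only the small subset of previously failing tests. A minimal sketch of the fix he describes - scaling the threshold to the number of tests actually scheduled - under the assumption that this is the shape of the change (names and constants are illustrative, not the actual base/test/launcher code):

```cpp
#include <algorithm>
#include <cstddef>

// How many broken (failed/crashed/timed-out) tests to tolerate before
// aborting the run as "build is broken".
size_t MaxBrokenTestsAllowed(size_t num_tests_scheduled) {
  constexpr size_t kMinAllowed = 5;        // small absolute floor
  constexpr double kAllowedFraction = 0.1; // plus a share of what we run
  return std::max(kMinAllowed, static_cast<size_t>(
                                   num_tests_scheduled * kAllowedFraction));
}

bool ShouldBailOutEarly(size_t num_broken_so_far, size_t num_tests_scheduled) {
  // Keyed to the subset actually scheduled in this run (e.g. a
  // retry-without-patch step), not the size of the full suite.
  return num_broken_so_far > MaxBrokenTestsAllowed(num_tests_scheduled);
}
```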
On Wed, Nov 25, 2015 at 2:43 PM, Dirk Pranke <dpr...@chromium.org> wrote:

On Wed, Nov 25, 2015 at 2:31 PM, Kenneth Russell <k...@chromium.org> wrote:
> [...] In situations like these an attempt should be made to find the cause of the crashes and revert it, rather than immediately disabling the tests and pushing the effort of diagnosis onto the team owning the tests.

Right. If a failure isn't severe enough to close the tree (and test flakes by definition aren't), then standard operating procedure should be to spend at least some time trying to diagnose and fix the problem, rather than simply disabling things. I would tend to define "some time" as something in the 15-60 minute range, depending on the severity of the problem. We should also pretty much always favor reverting a CL over suppressing test failures, and usually you can get a pretty good guess as to which CL might be the culprit within that 15-60 minute window.

Clearly (to me, at least), in this case the sheriffs fell down on the job and should have done something much earlier.

-- Dirk
On Wed, Nov 25, 2015 at 2:43 PM, Dirk Pranke <dpr...@chromium.org> wrote:
> [...] We should also pretty much always favor reverting a CL over suppressing test failures, and usually you can get a pretty good guess as to which CL might be the culprit within that 15-60 minute window. Clearly (to me, at least), in this case the sheriffs fell down on the job and should have done something much earlier.

https://www.chromium.org/developers/tree-sheriffs/sheriff-details-chromium has a shorter, less eloquent summary of the above. Perhaps Dirk/Ken can update it with this.
On Wed, Nov 25, 2015 at 1:07 AM, Ojan Vafai <oj...@chromium.org> wrote:
> [...] +1. The sheriff shouldn't care how bad the flake is. The flake should be handled the same way regardless: revert the offending patch if it's straightforward to figure out, and otherwise disable the test and assign appropriate owners to the bug.

Good point. Even if the flake occurs only infrequently, it still hurts developers who want to get their patches landed. So should I remove the link to the viceroy graphs, or just change the text to say that they can be used as an FYI if a Sheriff is curious?
It appears that the sheriff noticed the bug almost immediately, but no one disabled the tests for 14 hours, until jam@ asked for the tests to be disabled. We need to figure out why this happened. Can you follow up with the Chromium sheriffs on duty at that time to see what went wrong?