sheriff's not disabling flaky tests WAS: [blink-dev] Re: blink CQ false rejections for today (Nov 24) exceeded 20%

16 views
Skip to first unread message

Ojan Vafai

unread,
Nov 24, 2015, 7:08:07 PM11/24/15
to Julie Parent, Kenneth Russell, Adrian Kuegel, Paweł Hajdan, Jr., infr...@chromium.org, blink-dev, chromium-dev
+chromium-dev

Context: The blink CQ was >20% flaky for 16 hours yesterday because flaky downloads gtests. 

On Tue, Nov 24, 2015 at 2:07 PM Julie Parent <jpa...@chromium.org> wrote:
I don't think it makes sense to ask sheriffs to look at the viceroy graphs.  Sheriffs use s-o-m as a one stop shop.

+1. The sheriff shouldn't care how bad the flake is. The flake should be handled the same way regardless, which is to revert the offending patch if it's straightforward to figure out and disable the test + assign appropriate owners to the bug otherwise.

It appears that the sheriff noticed the bug almost immediately, but noone disabled the tests for 14 hours until jam@ asked for the tests to be disabled. We need to figure out why this happened. Can you followup with the chromium sheriffs on duty at that time to see what went wrong?

Also, the Sheriff-Chromium label is still on this even after the tests were disabled, which means that the sheriffing bug queue has bugs that are not the sheriff's responsibility any more.

Is there a better way we can communicate expectations around the bug queue? Maybe the on-duty sheriff's should be nagged if they have left bugs that are >4 hours old in the queue or something?

Once we figure that out, you can see what the best solution on your end is. For example, maybe sheriff-o-matic needs to explicitly say in the UI in big red text: "If you can't find the culprit patch quickly, disable the tests first and then continue investigation."
 

You mention the trooper failing to notify the sheriff of the builder success rate dropping low as a core problem.  I agree, but we don't want to depend on humans to do this.  This is why we are building tooling.   We should make it either dead simple, or automatic, for it to happen.  What if the alert for linux_chromium_chromeos_rel_ng success rate had been added directly to the sheriff's view in s-o-m, no trooper involvement needed?  This seems like the sort of issue that should be surfaced for both trooper and sheriff.  Or, if we still want troopers to vet it first, make it incredibly easy to verify and send it to s-o-m for the sheriff. 

On Tue, Nov 24, 2015 at 1:52 PM, Kenneth Russell <k...@chromium.org> wrote:
On Tue, Nov 24, 2015 at 3:19 AM, Adrian Kuegel <aku...@chromium.org> wrote:
It seems the problem in this case was that the Sheriffs didn't realize how bad these flaky tests were affecting the Tryserver. I added a link to our Monitoring Page with some small explanation to https://sites.google.com/a/chromium.org/dev/developers/tree-sheriffs/sheriffing-bug-queues
We are also considering linking to the monitoring page from Sheriff-o-matic directly.

In any case, for this specific case our tooling already worked: we got alerts by the monitoring pipeline that the success rate of linux_chromium_chromeos_rel_ng had dropped (also clearly visible in the  Builder success rate Graph), automatic flakiness bugs were filed, and we got alerts from buildbucket. What didn't work: the trooper didn't notify the Sheriffs about the alerts, so the Sheriff only saw the flakiness bugs, and probably didn't know about how to investigate the effect of that flakiness. Also, the instructions for Sheriffs regarding flakiness just said to disable tests after "a couple of hours" (which is maybe not clear enough). I added:

If a builder has dropped to a low success rate because of flaky tests, those tests should be disabled as soon as possible.

If previously-reliable tests suddenly become flaky, then investigation should be done immediately to see whether a recently committed CL is likely the cause, and if one is found, it should be reverted.

 


On Tue, Nov 24, 2015 at 10:41 AM, Paweł Hajdan, Jr. <phajd...@chromium.org> wrote:
I've heard the Flakiness sub-team of CQ FA starts investigating chromium flakiness exceeding 10% and blink exceeding 20%. Today's blink false rejection rate is 24.1%, and so I'm looking forward to your insights.

This seems easy - cycle time has also regressed, and I've noticed we had a series of flakes on linux_chromium_chromeos_rel_ng (https://code.google.com/p/chromium/issues/detail?id=560329). Automated systems did react, see e.g. https://code.google.com/p/chromium/issues/detail?id=560264 .

Action seems to have happened with a rather large delay (8-18 hours) - see https://codereview.chromium.org/1467183004 disabling the tests.

I'm working on a CL for the test launcher not to bail out early without the patch (i.e. calculate the threshold for broken tests differently when we only run a subset of them), as happened in e.g. http://build.chromium.org/p/tryserver.chromium.linux/builders/linux_chromium_chromeos_rel_ng/builds/133462 .

I wouldn't like to exaggerate importance of this regression. On the other hand, it gives us a specific case to study and improve on.

Paweł

--
You received this message because you are subscribed to the Google Groups "infra-dev" group.
To unsubscribe from this group and stop receiving emails from it, send an email to infra-dev+...@chromium.org.
To post to this group, send email to infr...@chromium.org.
To view this discussion on the web visit https://groups.google.com/a/chromium.org/d/msgid/infra-dev/CAATLsPZ-%3Day7e0%2BV6y4JigD68wLUBybQKs38Mjq6x8rG1BX%3DaQ%40mail.gmail.com.


--
You received this message because you are subscribed to the Google Groups "infra-dev" group.
To unsubscribe from this group and stop receiving emails from it, send an email to infra-dev+...@chromium.org.
To post to this group, send email to infr...@chromium.org.

Bruce

unread,
Nov 24, 2015, 8:24:13 PM11/24/15
to blink-dev, jpa...@chromium.org, k...@chromium.org, aku...@chromium.org, phajd...@chromium.org, infr...@chromium.org, chromi...@chromium.org
On a related note, there has been some long-term flakiness of blink tests on one of the try bots and this has stopped some CLs from landing. It was first reported in crbug.com/521124 and more recently I filed crbug.com/558574.

I have a patch that I believe should disable the two tests that have been failing consistently lately (crrev.com/1472863005) but I don't have the context to know whether this solution is correct.

Any thoughts on the overall issues, or reviewers for the specific patch?
To unsubscribe from this group and stop receiving emails from it, send an email to infra-dev+unsubscribe@chromium.org.

--
You received this message because you are subscribed to the Google Groups "infra-dev" group.
To unsubscribe from this group and stop receiving emails from it, send an email to infra-dev+unsubscribe@chromium.org.

To post to this group, send email to infr...@chromium.org.

Dirk Pranke

unread,
Nov 24, 2015, 8:35:41 PM11/24/15
to Bruce, blink-dev, Julie Parent, Kenneth Russell, Adrian Kuegel, Paweł Hajdan, Jr., infr...@chromium.org, chromium-dev
As has been noted in threads before regarding those tests, it is a long-standing 
principle of the project that flaky tests should be disabled rather than being 
allowed to persist in causing flaky failures.

Thank you for being the first person to actually get around to doing so :).

I LGTM'ed the patch.

(Ideally someone would've taken the time to figure out why those tests are only failing
on some of the tryservers, but thus far no one has been willing to do so).

-- Dirk

To unsubscribe from this group and stop receiving emails from it, send an email to infra-dev+...@chromium.org.

--
You received this message because you are subscribed to the Google Groups "infra-dev" group.
To unsubscribe from this group and stop receiving emails from it, send an email to infra-dev+...@chromium.org.

To post to this group, send email to infr...@chromium.org.

--
You received this message because you are subscribed to the Google Groups "infra-dev" group.
To unsubscribe from this group and stop receiving emails from it, send an email to infra-dev+...@chromium.org.

To post to this group, send email to infr...@chromium.org.

Adrian Kuegel

unread,
Nov 25, 2015, 4:31:15 AM11/25/15
to Ojan Vafai, Julie Parent, Kenneth Russell, Paweł Hajdan, Jr., infr...@chromium.org, blink-dev, chromium-dev
On Wed, Nov 25, 2015 at 1:07 AM, Ojan Vafai <oj...@chromium.org> wrote:
+chromium-dev

Context: The blink CQ was >20% flaky for 16 hours yesterday because flaky downloads gtests. 

On Tue, Nov 24, 2015 at 2:07 PM Julie Parent <jpa...@chromium.org> wrote:
I don't think it makes sense to ask sheriffs to look at the viceroy graphs.  Sheriffs use s-o-m as a one stop shop.

+1. The sheriff shouldn't care how bad the flake is. The flake should be handled the same way regardless, which is to revert the offending patch if it's straightforward to figure out and disable the test + assign appropriate owners to the bug otherwise.


Good point. Even if the flake occurs not so frequently, it is still hurting some developers who want to get their patch landed. So should I remove the link to the viceroy graphs, or just change the text that this can be used as a FYI if a Sheriff is curious?

Ojan Vafai

unread,
Nov 25, 2015, 10:54:58 PM11/25/15
to Adrian Kuegel, Julie Parent, Kenneth Russell, Paweł Hajdan, Jr., infr...@chromium.org, blink-dev, chromium-dev
On Wed, Nov 25, 2015 at 1:30 AM Adrian Kuegel <aku...@chromium.org> wrote:
On Wed, Nov 25, 2015 at 1:07 AM, Ojan Vafai <oj...@chromium.org> wrote:
+chromium-dev

Context: The blink CQ was >20% flaky for 16 hours yesterday because flaky downloads gtests. 

On Tue, Nov 24, 2015 at 2:07 PM Julie Parent <jpa...@chromium.org> wrote:
I don't think it makes sense to ask sheriffs to look at the viceroy graphs.  Sheriffs use s-o-m as a one stop shop.

+1. The sheriff shouldn't care how bad the flake is. The flake should be handled the same way regardless, which is to revert the offending patch if it's straightforward to figure out and disable the test + assign appropriate owners to the bug otherwise.


Good point. Even if the flake occurs not so frequently, it is still hurting some developers who want to get their patch landed. So should I remove the link to the viceroy graphs, or just change the text that this can be used as a FYI if a Sheriff is curious?

I think the best thing is to remove it. We need to keep documentation like this as short as possible so that people actually read it. As per the other thread, we are clearly having a problem where people aren't even reading all of what's there now.
  
It appears that the sheriff noticed the bug almost immediately, but noone disabled the tests for 14 hours until jam@ asked for the tests to be disabled. We need to figure out why this happened. Can you followup with the chromium sheriffs on duty at that time to see what went wrong?

You didn't respond to this part. Did you contact the sheriff? In order to reduce the flakiness numbers, we have to get sheriff's to address these bugs quickly. If they're not doing so, we need to understand why so we can fix it. If we really try and can't get the sheriff's to act, then we'll need to consider other solutions (e.g. temporarily auto-disabling tests). But first we need to make sure we've done all the work to get sheriff's to pay attention since there are people very opposed to auto-disabling of tests.

John Abd-El-Malek

unread,
Nov 25, 2015, 11:26:41 PM11/25/15
to Ojan Vafai, Adrian Kuegel, Julie Parent, Kenneth Russell, Paweł Hajdan, Jr., infr...@chromium.org, blink-dev, chromium-dev
(excuse the cross-post)

Can we put a concise version of the sheriff instructions in the reminder emails that get sent to sheriffs? I can understand how in such a large team many folks might not have heard what the accepted wisdom is in sheriffing.

Ojan Vafai

unread,
Nov 25, 2015, 11:41:14 PM11/25/15
to John Abd-El-Malek, Adrian Kuegel, Julie Parent, Kenneth Russell, Paweł Hajdan, Jr., infr...@chromium.org, blink-dev, chromium-dev
There's certainly no harm in doing so. I doubt it will be enough to definitively fix the problem though. I suspect people who have sheriff'ed before see the reminder email and immediately archive it without reading it. That's what I do.

Adrian/Sergiy: Mind do so?
Reply all
Reply to author
Forward
0 new messages