Cannot understand flakiness dashboard


Mikel Astiz

Oct 29, 2018, 9:46:12 AM
to chromi...@chromium.org, Sergiy Belozorov
Hi all,

I'm wondering who maintains or knows the flakiness dashboard in detail.

I'm investigating certain reported flakes (example), where tryserver.chromium.android is marked as VERYFLAKY, and hovering over a black cell says something like "PASS PASS PASS .", and I fail to find an example of a test failure.
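
To make the puzzle concrete, here is a minimal sketch of the usual working definition of a flake: the same test both passes and fails across attempts within a single build. This is an assumption for illustration only, not the dashboard's actual classification logic, which this thread does not describe; `looks_flaky` and the "PASS"/"FAIL" strings are hypothetical names.

```python
from typing import Sequence


def looks_flaky(attempts: Sequence[str]) -> bool:
    """Return True if one test both passed and failed within a single build.

    Hypothetical helper: the real dashboard may use different rules.
    """
    outcomes = set(attempts)
    return "PASS" in outcomes and "FAIL" in outcomes


# The hover text quoted above lists only passes, so under this simple
# definition the cell would not count as flaky at all.
print(looks_flaky(["PASS", "PASS", "PASS"]))  # False
print(looks_flaky(["FAIL", "PASS"]))          # True
```

Under that definition, a cell reading "PASS PASS PASS" should not be marked VERYFLAKY, which is why the report looks suspicious.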

Build logs (example) show failures (and flakes) for steps "with patch", but that doesn't seem like a strong indicator of flakiness on trunk (without patch), does it?

The test (org.chromium.chrome.browser.sync.SyncCustomizationFragmentTest#testSetupOnly) does however get listed as failure in "retry summary" (with and without patch). Where does this come from? Is this WAI?

Thanks!
Mikel

Yuke Liao

Oct 29, 2018, 11:03:02 AM
to mas...@chromium.org, erik...@chromium.org, chromi...@chromium.org, Sergiy Belozorov
I'm not sure who maintains the flakiness dashboard either, but feel free to file a bug using component: Infra>Flakiness>Dashboard.

+Erik Chen, I think this might be related to your 'retry with patch' change. 

--
--
Chromium Developers mailing list: chromi...@chromium.org
View archives, change email options, or unsubscribe:
http://groups.google.com/a/chromium.org/group/chromium-dev
---
You received this message because you are subscribed to the Google Groups "Chromium-dev" group.
To view this discussion on the web visit https://groups.google.com/a/chromium.org/d/msgid/chromium-dev/CAAy826erMjQnhBHZ0wY3QFqwNFihAVRSDrY7afWAHvi7-mNVow%40mail.gmail.com.

Erik Chen

Oct 29, 2018, 11:35:40 AM
to Yuke Liao, mas...@chromium.org, chromi...@chromium.org, Sergiy Belozorov
I also don't know how to interpret this dashboard. :(

In the example you linked, there are a lot of tests with "TEST RESULTS WERE INVALID". That might also be causing problems?

Dirk Pranke

Oct 30, 2018, 6:22:05 PM
to Erik Chen, Yuke Liao, Mikel Astiz, chromi...@chromium.org, Sergiy Belozorov, Sean McCullough
The app is owned by Sean McCullough and his team in Ops, but it's been very badly understaffed for a long time, and doesn't work that well in some scenarios. 

I know how it works and can answer some questions, as can a few other people. 

Other comments inline ...

On Mon, Oct 29, 2018 at 6:45 AM Mikel Astiz <mas...@chromium.org> wrote:
I'm investigating certain reported flakes (example), where tryserver.chromium.android is marked as VERYFLAKY, and hovering over a black cell says something like "PASS PASS PASS .", and I fail to find an example of a test failure.

The second most recent black/VERYFLAKY cell I see is from:


Looking at the log for that test, I can't make heads or tails of it, since it looks like it's running the test somewhere between 4 and 12 times even though it passes every time. So, someone who is more familiar with Android testing should probably take a look.

 

Build logs (example) show failures (and flakes) for steps "with patch", but that doesn't seem like a strong indicator of flakiness on trunk (without patch), does it?

Most of the time, test flakiness on a try job is unrelated to the patch in question, and so flakiness on a "with patch" run can be reflective of flakiness on trunk, though it's not a guarantee.
 

The test (org.chromium.chrome.browser.sync.SyncCustomizationFragmentTest#testSetupOnly) does however get listed as failure in "retry summary" (with and without patch). Where does this come from? Is this WAI?

I'm not sure, as per my comment above. Something seems wrong somewhere :). But I'm not sure which things.

Can you file a bug and we can get someone to look into it in more detail? Or, link us to an existing bug if that's what triggered your investigation?

-- Dirk
 


Mikel Astiz

Oct 31, 2018, 2:52:49 AM
to dpr...@chromium.org, erik...@chromium.org, liao...@chromium.org, Mikel Astiz, chromi...@chromium.org, Sergiy Belozorov, seanmcc...@google.com
Hi,

Thanks for your replies, answers inline.

On Tue, Oct 30, 2018 at 11:20 PM Dirk Pranke <dpr...@chromium.org> wrote:
On Mon, Oct 29, 2018 at 6:45 AM Mikel Astiz <mas...@chromium.org> wrote:
Build logs (example) show failures (and flakes) for steps "with patch", but that doesn't seem like a strong indicator of flakiness on trunk (without patch), does it?

Most of the time, test flakiness on a try job is unrelated to the patch in question, and so flakiness on a "with patch" run can be reflective of flakiness on trunk, though it's not a guarantee.

That makes sense. My question was whether the flakiness dashboard and related tools would report it as a flake.
 
 

The test (org.chromium.chrome.browser.sync.SyncCustomizationFragmentTest#testSetupOnly) does however get listed as failure in "retry summary" (with and without patch). Where does this come from? Is this WAI?

I'm not sure, as per my comment above. Something seems wrong somewhere :). But I'm not sure which things.

Can you file a bug and we can get someone to look into it in more detail? Or, link us to an existing bug if that's what triggered your investigation?

I filed a new bug as a blocker for our original bug.

Thanks,
Mikel

Dirk Pranke

Oct 31, 2018, 3:23:10 PM
to Mikel Astiz, Erik Chen, Yuke Liao, chromi...@chromium.org, Sergiy Belozorov, Sean McCullough, Shuotao Gao
On Tue, Oct 30, 2018 at 11:51 PM, Mikel Astiz <mas...@chromium.org> wrote:
On Tue, Oct 30, 2018 at 11:20 PM Dirk Pranke <dpr...@chromium.org> wrote:
On Mon, Oct 29, 2018 at 6:45 AM Mikel Astiz <mas...@chromium.org> wrote:
Build logs (example) show failures (and flakes) for steps "with patch", but that doesn't seem like a strong indicator of flakiness on trunk (without patch), does it?

Most of the time, test flakiness on a try job is unrelated to the patch in question, and so flakiness on a "with patch" run can be reflective of flakiness on trunk, though it's not a guarantee.

That makes sense. My question was whether the flakiness dashboard and related tools would report it as a flake.

Good question. I'm not sure of the answer but maybe liaoyuke@ or stgao@ knows. 

-- Dirk

Yuke Liao

Nov 1, 2018, 6:33:45 PM
to Dirk Pranke, Mikel Astiz, Erik Chen, chromi...@chromium.org, Sergiy Belozorov, Sean McCullough, Shuotao Gao
Hi Mikel,

FindIt's flake detector did detect this flake; please see:

Unfortunately, there is no easy way to navigate to this link as of now. The https://findit-for-me.appspot.com/ranked-flakes dashboard only surfaces flakes that meet certain criteria, and it seems this test doesn't meet them. We're still improving the tool and will fix that, or at least make tests that don't meet the criteria searchable. Please let us know if you have any other feedback.

Shuotao Gao

Nov 1, 2018, 6:53:20 PM
to Yuke Liao, mas...@chromium.org, Dirk Pranke, erik...@chromium.org, chromi...@chromium.org, Sergiy Belozorov, Sean McCullough
Unfortunately, Flake Detector failed to detect the flake in the particular build (built on 10/26/2018) mentioned in this email thread.
Flake Detector was broken for an extended period of time by "retry with patch". We have fixed the majority of cases there, but this is another edge case that is not handled yet.
I filed https://crbug.com/901158 to follow up.