android_blink_rel's usefulness in WPT imports


Raphael Kubo da Costa

Aug 10, 2017, 4:48:08 AM
to ecosyst...@chromium.org
Hey everyone,

I was looking at the most recent import failures today, and was
wondering if it makes sense to take android_blink_rel's results into
consideration at all.

* It doesn't seem to actually test what we're importing. When the bot
  does manage to finish running successfully, the results it reports
  always look like:

  => Results: 2/1013 tests passed (0.2%)
  => Tests to be fixed (1011):
  => Tests that will only be fixed if they crash (WONTFIX) (2):

  so when the bot does run successfully, it always goes green and never
  catches any of the test failures the other bots do.

* It gets stuck and slows down/cancels a working import. When the bot
  does not run successfully, it gets stuck for > 40min in the
  "webkit_tests" step until it either finishes with success or goes
  purple. When the latter happens, the whole import job is cancelled
  even if all the other bots reported meaningful results -- see
  https://chromium-review.googlesource.com/c/608944 for example.
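
To make the first point concrete, here is a minimal Python sketch of how a summary line like the one above could be parsed to flag a green run that tells us very little. The function names and the 0.5 threshold are hypothetical and are not part of any Chromium tooling; only the line format is taken from the bot output quoted above.

```python
import re

# Matches a summary line such as "=> Results: 2/1013 tests passed (0.2%)".
# Only the line format comes from the bot output; the names are made up.
RESULTS_RE = re.compile(
    r"=> Results: (\d+)/(\d+) tests passed \((\d+(?:\.\d+)?)%\)")


def parse_results_line(line):
    """Return (passed, total) from a summary line, or None if it does not match."""
    match = RESULTS_RE.search(line)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def is_uninformative_run(line, min_pass_ratio=0.5):
    """Flag a run whose pass ratio falls below an assumed threshold.

    The threshold is an arbitrary illustration, not a value used by any bot.
    """
    parsed = parse_results_line(line)
    if parsed is None:
        return True  # No summary line at all is treated as uninformative.
    passed, total = parsed
    return total == 0 or passed / total < min_pass_ratio
```

With the line quoted above, parse_results_line returns 2 passed out of 1013, so a check like this would flag the run even though the bot itself reported green.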

Philip Jägenstedt

Aug 10, 2017, 4:59:20 AM
to Raphael Kubo da Costa, ecosyst...@chromium.org
Quinten, I guess this bot is triggered automatically by the rebaseline scripts? Should it be?

--
You received this message because you are subscribed to the Google Groups "ecosystem-infra" group.
To unsubscribe from this group and stop receiving emails from it, send an email to ecosystem-inf...@chromium.org.
To post to this group, send email to ecosyst...@chromium.org.
To view this discussion on the web visit https://groups.google.com/a/chromium.org/d/msgid/ecosystem-infra/87mv77n709.fsf%40rkubodac-desk.ger.corp.intel.com.

Rick Byers

Aug 10, 2017, 10:10:38 AM
to Philip Jägenstedt, Raphael Kubo da Costa, ecosyst...@chromium.org
There's a long history of trying to get LayoutTests (which include WPT) running reliably on the Android bots. IIRC there's a very short 'SmokeTests' list (which in principle should support WPT tests) and otherwise we just rely on our Linux coverage (which is almost identical code to Android). Not an ideal state (IMHO if we had to pick one platform to run WPT against, I'd actually prefer Android over all others), but it doesn't seem to be a huge problem in practice. But I don't know all the details/history here.

Philip Jägenstedt

Aug 10, 2017, 10:12:31 AM
to Rick Byers, qyea...@chromium.org, Raphael Kubo da Costa, ecosyst...@chromium.org
+Quinten Yearsley in case he filters this list :)

Quinten Yearsley

Aug 10, 2017, 12:22:02 PM
to Rick Byers, Philip Jägenstedt, Raphael Kubo da Costa, ecosyst...@chromium.org
That's right, on Android we run tests in the SmokeTests list, which includes some wpt tests.

The problem with it not running any tests at all recently is crbug.com/753702.

The recent-ish history with regard to layout tests on Android is: we currently run layout tests on very old versions of Android (K), and we want to run them on newer versions of Android and on newer devices, but this is blocked on crbug.com/567947.

We still want to trigger android_blink_rel for imports because some wpt tests are (supposed to be) run on the waterfall (WebKit Android Nexus4), and it's possible for updates to those tests to cause waterfall failures if we don't take them into consideration.

I think that the case of https://chromium-review.googlesource.com/c/608944 is actually related to crbug.com/754169 (adding another comment there). In general, when there's a purple android job and green everything else, the importer should be able to continue (except of course if one of the few tests run on android was actually affected).
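
As a rough illustration of that last rule, here is a minimal Python sketch. It is not the importer's actual code: the function name, the status strings, and the shape of the inputs are all assumptions made for the example.

```python
ANDROID_BOT = "android_blink_rel"


def purple_android_can_be_ignored(job_statuses, android_tests_affected):
    """Return True if a purple Android try job should not block the import.

    job_statuses: dict mapping bot name to "green", "red" or "purple"
        (hypothetical status strings).
    android_tests_affected: True if the import touches any of the few
        tests that are run on Android.
    """
    if job_statuses.get(ANDROID_BOT) != "purple":
        return False  # The Android job is not purple; nothing to ignore.
    others_green = all(
        status == "green"
        for bot, status in job_statuses.items()
        if bot != ANDROID_BOT
    )
    return others_green and not android_tests_affected
```

This only covers the one case described above (purple Android job, green everything else); how red jobs on other bots are handled is deliberately left out.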

Philip Jägenstedt

Aug 11, 2017, 4:32:35 AM
to Quinten Yearsley, Rick Byers, Raphael Kubo da Costa, ecosyst...@chromium.org
Which bots to run seems like a bit of a balancing act. We could get away with only running CQ and say it's no worse than for changes within Chromium, or we could run as many bots as possible to absolutely minimize breakage, likely at the cost of slower and more failing imports.

Of all the bots that wpt-importer currently starts, do you know which are the slowest and most frequently failing? Disabling some of those seems worth discussing if they aren't also catching many problems that would otherwise break the waterfall.

Raphael Kubo da Costa

Aug 11, 2017, 6:28:57 AM
to Philip Jägenstedt, Quinten Yearsley, Rick Byers, ecosyst...@chromium.org
Philip Jägenstedt <foo...@chromium.org> writes:

> Of all the bots that wpt-importer currently starts, do you know which are the
> slowest and most frequently failing? Seems like disabling some of those is
> worth discussing if they aren't also catching many problems that would
> otherwise break the waterfall.

I'd say android_blink_rel is the least reliable, even though its pass
rate has improved compared to a few weeks ago when we started the
ecosystem infra rotation. I wonder if it makes sense to skip it at least
until crbug.com/753702 is resolved.

Surprisingly, I'd say the second least reliable bots are the mac ones,
mostly due to crbug.com/750594.

Philip Jägenstedt

Aug 11, 2017, 7:36:00 AM
to Raphael Kubo da Costa, Quinten Yearsley, Rick Byers, ecosyst...@chromium.org
Thanks for your insights, Raphael, it's great to have you around :)

Raphael Kubo da Costa

Aug 16, 2017, 6:09:03 AM
to Philip Jägenstedt, Quinten Yearsley, Rick Byers, ecosyst...@chromium.org
Raphael Kubo da Costa <raphael.ku...@intel.com> writes:
> I'd say android_blink_rel is the least reliable, even though its pass
> rate has improved compared to a few weeks ago when we started the
> ecosystem infra rotation. I wonder if it makes sense to skip it at least
> until crbug.com/753702 is resolved.

Lately, the bot does seem to have started running some layout tests
again, so hooray for that :-)

It still seems quite flaky though: looking at
https://luci-milo.appspot.com/buildbot/tryserver.chromium.android/android_blink_rel/,
it's possible to see that while there have been some successful, valid
runs where it takes around 15min to run the layout tests, there are a
lot more red and purple entries.

The red ones always appear to stem from the
"BaseAudioContextAutoplayTest/BaseAudioContextAutoplayTest.AutoplayMetrics_CreateGesture_Child/1"
unit test being flaky (see
https://luci-milo.appspot.com/buildbot/tryserver.chromium.android/android_blink_rel/3323,
for example).

The purple statuses are the most annoying ones: the layout tests finish
running after about 15-20min, but the "webkit_tests" step keeps running
until it's automatically killed after about 60min and the entire tryjob
goes purple.

Does anyone know if those 2 problems are being worked on?

Philip Jägenstedt

Aug 16, 2017, 7:10:05 AM
to Raphael Kubo da Costa, Quinten Yearsley, Rick Byers, ecosyst...@chromium.org
I found https://bugs.chromium.org/p/chromium/issues/detail?id=752511 for BaseAudioContextAutoplayTest. If it's still being flaky, just disabling the test on Android would be appropriate.

Raphael Kubo da Costa

Aug 16, 2017, 7:40:48 AM
to Philip Jägenstedt, Quinten Yearsley, Rick Byers, ecosyst...@chromium.org
Philip Jägenstedt <foo...@chromium.org> writes:

> I found https://bugs.chromium.org/p/chromium/issues/detail?id=752511 for
> BaseAudioContextAutoplayTest. If it's still being flaky, just disabling the
> test on Android would be appropriate.

I don't have access to that bug :(

Philip Jägenstedt

Aug 16, 2017, 8:19:56 AM
to Raphael Kubo da Costa, Quinten Yearsley, Rick Byers, ecosyst...@chromium.org
Uh, apparently it links to official builders and their logs and is therefore internal. All it says is that the test is timing out and that https://chromium-review.googlesource.com/585358 is the suspected cause.