New policy on adding Tast tests


Daniel Erat

Apr 24, 2019, 11:40:35 PM
to tast-users, Chromium OS dev
TL;DR: Please don't add new regression tests before stabilizing existing informational tests and promoting them to the CQ.

There are nearly 150 informational Tast tests listed at http://tastboard/test (indicated by an "I" at the left side of their row). None of these tests run on the Commit Queue, and many of them have been failing repeatedly since they were introduced. I would estimate that fifty or fewer of these tests are close to being reliable enough for their failures to be actionable. Almost all of them require stabilization work before they can be promoted to the CQ.

I'm heartened that so many developers are interested in writing tests, but we need follow-through to avoid ending up in the same state as Autotest, with hundreds or thousands of broken and unmaintained tests. Regression tests are only useful if people care about failures. People generally don't care about failures if they're only displayed on an easy-to-ignore dashboard.

There are costs associated with a test beyond the time and effort that is spent writing it. In particular, all test code is currently reviewed by one or more core Tast developers. This has greatly reduced the amount of time that we have to:
  • work on our top priority of porting Autotest tests that are running on the CQ and PFQs
  • improve the framework
  • add new features (e.g. Servo support or hardware dependencies)
  • write documentation and tools to help test authors
As such, we are instituting a new policy that teams cannot add additional regression tests until their existing informational tests have been stabilized and promoted to the CQ. I would also strongly urge any new test authors to start out by trying to stabilize an existing informational test. Doing so will expose you to Tast coding conventions and make it easier to write new code in the future.

Any new test being added must have a clear plan for being promoted to the CQ. I would recommend filing an issue to track the test's promotion and using it to track the flakiness issues that need to be resolved first. While investigating flakiness in tests, I've often found subtle races in the code that is being tested.

nya@ mentioned that there are several known issues that make it difficult to promote some existing informational tests to the CQ:
The number of tests affected by these is relatively small, though, and we are working on solving these issues.

Thank you for your understanding, and please let me know if you have any questions.

Dan

Brian Norris

Apr 25, 2019, 8:51:11 PM
to Daniel Erat, tast-users, Chromium OS dev
On Wed, Apr 24, 2019 at 8:40 PM Daniel Erat <de...@chromium.org> wrote:
nya@ mentioned that there are several known issues that make it difficult to promote some existing informational tests to the CQ:

How about https://crbug.com/934090? I just spent a few minutes looking through one Tast failure, and all of the logs were trimmed absurdly, such that I had no clue what was going on.

Brian

Stéphane Marchesin

Apr 25, 2019, 9:34:15 PM
to Daniel Erat, tast-users, Chromium OS dev
On Wed, Apr 24, 2019 at 8:40 PM Daniel Erat <de...@chromium.org> wrote:
TL;DR: Please don't add new regression tests before stabilizing existing informational tests and promoting them to the CQ.

There are nearly 150 informational Tast tests listed at http://tastboard/test (indicated by an "I" at the left side of their row). None of these tests run on the Commit Queue, and many of them have been failing repeatedly since they were introduced. I would estimate that fifty or fewer of these tests are close to being reliable enough for their failures to be actionable. Almost all of them require stabilization work before they can be promoted to the CQ.

I'm heartened that so many developers are interested in writing tests, but we need follow-through to avoid ending up in the same state as Autotest, with hundreds or thousands of broken and unmaintained tests. Regression tests are only useful if people care about failures. People generally don't care about failures if they're only displayed on an easy-to-ignore dashboard.


For graphics tests, our model is a little different: we can't run them in the CQ because they are too long, so we run them off-band and track them through an alert system, which we have a rotation for.

 
There are costs associated with a test beyond the time and effort that is spent writing it. In particular, all test code is currently reviewed by one or more core Tast developers. This has greatly reduced the amount of time that we have to:
  • work on our top priority of porting Autotest tests that are running on the CQ and PFQs
  • improve the framework
  • add new features (e.g. Servo support or hardware dependencies)
  • write documentation and tools to help test authors
As such, we are instituting a new policy that teams cannot add additional regression tests until their existing informational tests have been stabilized and promoted to the CQ. I would also strongly urge any new test authors to start out by trying to stabilize an existing informational test. Doing so will expose you to Tast coding conventions and make it easier to write new code in the future.

Any new test being added must have a clear plan for being promoted to the CQ. I would recommend filing an issue to track the test's promotion and using it to track the flakiness issues that need to be resolved first. While investigating flakiness in tests, I've often found subtle races in the code that is being tested.


What about tests which are too long to run in the CQ, like CTS or deqp for example? As it is, this seems to conflict with our mandate to keep CTS passing.

 
nya@ mentioned that there are several known issues that make it difficult to promote some existing informational tests to the CQ:
The number of tests affected by these is relatively small, though, and we are working on solving these issues.

Thank you for your understanding, and please let me know if you have any questions.


Some of these flakes seem to be unrelated to the test itself or the subsystem being tested, and are instead caused by other factors. For example, network problems seem to be fairly common. What is the policy here?

Stéphane


 

--
--
Chromium OS Developers mailing list: chromiu...@chromium.org
View archives, change email options, or unsubscribe:
https://groups.google.com/a/chromium.org/group/chromium-os-dev
---
You received this message because you are subscribed to the Google Groups "Chromium OS Development" group.
To unsubscribe from this group and stop receiving emails from it, send an email to chromium-os-d...@chromium.org.

Bhaskar Janakiraman

Apr 26, 2019, 12:42:41 AM
to Stéphane Marchesin, Daniel Erat, tast-users, Chromium OS dev
On Thu, Apr 25, 2019 at 6:34 PM 'Stéphane Marchesin' via Chromium OS Development <chromiu...@chromium.org> wrote:


For graphics tests, our model is a little different: we can't run them in the CQ because they are too long, so we run them off-band and track them through an alert system, which we have a rotation for.



Hi,
I think the intent here was to make sure all these tests are stable. Yes, we don't want to be promoting long-running tests to the CQ :-)
Bhaskar 

Daniel Erat

Apr 26, 2019, 1:02:02 AM
to Brian Norris, tast-users, Chromium OS dev
On Thu, Apr 25, 2019 at 5:51 PM Brian Norris <brian...@chromium.org> wrote:
On Wed, Apr 24, 2019 at 8:40 PM Daniel Erat <de...@chromium.org> wrote:
nya@ mentioned that there are several known issues that make it difficult to promote some existing informational tests to the CQ:

How about https://crbug.com/934090? I just spent a few minutes looking through 1 Tast failure and all of the logs were trimmed absurdly, such that I had no clue what was going on.

There are plenty of other bugs that need to be fixed; I was just trying to list ones that may make it difficult to promote Tast tests to the CQ. :-)

As far as I'm aware, 934090 is an autoserv log-collection issue that affects both Tast and Autotest tests when they run in the lab. Tast is copying the complete log updates from the DUT, but the files get truncated later by autoserv. Bhaskar, do you know if there's anyone on the infra team with cycles to look at this now?
 

--
You received this message because you are subscribed to the Google Groups "tast-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to tast-users+...@chromium.org.
To post to this group, send email to tast-...@chromium.org.
To view this discussion on the web visit https://groups.google.com/a/chromium.org/d/msgid/tast-users/CA%2BASDXMWUh7p_c73XrmeN_Hb%2Bo_LAkaBap18eS1zj9b1Yhfy8A%40mail.gmail.com.

Daniel Erat

Apr 26, 2019, 1:08:10 AM
to Bhaskar Janakiraman, Stéphane Marchesin, tast-users, Chromium OS dev
On Thu, Apr 25, 2019 at 6:34 PM 'Stéphane Marchesin' via Chromium OS Development <chromiu...@chromium.org> wrote:

What about tests which are too long to run in the CQ, like CTS or deqp for example? As it is, this seems to conflict with our mandate to keep CTS passing.

There are no immediate plans to move CTS to Tast. I think we can discuss the limited number of other tests that aren't a good fit for the CQ on a case-by-case basis. We may want to add a separate test group for lengthy graphics tests that runs as part of bvt-tast-informational, for instance.
  

Some of these flakes seem to be unrelated to the test itself or the subsystem being tested, but instead caused by other factors. For example network problems seem to be fairly common. What is the policy here?

If a test's only failures are caused by outside factors like DUT or network flakiness, I have no objections to it being promoted to the CQ.

Ilja Friedel

Apr 26, 2019, 4:38:33 PM
to Daniel Erat, Bhaskar Janakiraman, Stéphane Marchesin, tast-users, Chromium OS dev
On Thu, Apr 25, 2019 at 10:08 PM Daniel Erat <de...@chromium.org> wrote:


There are no immediate plans to move CTS to Tast. I think we can discuss the limited number of other tests that aren't a good fit for the CQ on a case-by-case basis. We may want to add a separate test group for lengthy graphics tests that runs as part of bvt-tast-informational, for instance.

We have graphics_dEQP, which (when not on the CQ) runs 500k EGL/GLES/Vulkan unit tests. The tests can start failing or become flaky after a driver uprev, and it can take a long time to stabilize them. Is dEQP a good match for Tast?

Also, we would like to add API traces that replay games and other apps. Each trace could run for 1-10 minutes, and there might eventually be hundreds of them. We would be interested in monitoring crashes, GPU hangs, rendering corruption, and performance regressions (and fixing those one at a time). Are such high-level (and likely noisy) tests a good match for Tast? Note that we don't expect the traces to run in the CQ, but we would want them to run automatically several times a week.


Daniel Erat

Apr 26, 2019, 11:15:05 PM
to Ilja Friedel, Bhaskar Janakiraman, Stéphane Marchesin, tast-users, Chromium OS dev
On Fri, Apr 26, 2019 at 1:38 PM Ilja Friedel <i...@chromium.org> wrote:


We have graphics_dEQP, which (when not on the CQ) runs 500k EGL/GLES/Vulkan unit tests. The tests can start failing or become flaky after a driver uprev, and it can take a long time to stabilize them. Is dEQP a good match for Tast?

I don't see any impediments to using Tast to run dEQP outside of the CQ. Like I said, we'd probably want to put it in a new group so it runs in its own job.

Also, we would like to add API traces that replay games and other apps. Each trace could run for 1-10 minutes, and there might eventually be hundreds of them. We would be interested in monitoring crashes, GPU hangs, rendering corruption, and performance regressions (and fixing those one at a time). Are such high-level (and likely noisy) tests a good match for Tast? Note that we don't expect the traces to run in the CQ, but we would want them to run automatically several times a week.

This feels similar to dEQP to me, and I don't see any obvious issues in the way it's described here. I'm arguing against flaky informational tests that nobody pays attention to -- as I view it, the CQ is mainly used to prevent regressions in code that's expected to be working across all platforms and that can be reliably verified. If some teams have their own workflows necessitated by doing development across a bunch of sometimes-flaky platforms, I want to find a way to support that.

Jasmine Chen

Apr 29, 2019, 1:18:52 PM
to Daniel Erat, Ilja Friedel, Bhaskar Janakiraman, Stéphane Marchesin, tast-users, Chromium OS dev
> As such, we are instituting a new policy that teams cannot add additional regression tests until their existing informational tests have been stabilized and promoted to the CQ.

For the case of Google Camera App (GCA), this is difficult to do.
GCA is a full-fledged app that has many dependencies (ARC++ WM, App platform, UI, graphics stack, camera stack, video stack, audio stack... etc.). If any one of these breaks, GCA breaks. It's hard to keep GCA tests stable because it essentially sits on a moving platform.
During the rapid development of Nocturne/ARC++ P, GCA was broken almost on a weekly basis. It's getting better as P is much more stable now, but still it's really hard to keep GCA functional all the time.
In our case, we just want to have tests in place so that we can catch regressions in a semi-timely fashion when tests start failing.



Daniel Erat

Apr 29, 2019, 3:00:35 PM
to Jasmine Chen, tast-users, Chromium OS dev
[note: public mailing lists]

I'd expect all of the code that can break GCA to be gated by the Chrome OS CQ, Chrome CQ/PFQ, and Android PFQ. If that's the case, then core GCA tests should be stabilized so they can run in all of those queues and prevent regressions from being committed or integrated in the first place. Is the concern that the tests won't run in a place where failures can block Android commits, resulting in the Android PFQ frequently being broken?

Ricky Liang

Apr 29, 2019, 9:48:14 PM
to Daniel Erat, Jasmine Chen, tast-users, Chromium OS dev
On Tue, Apr 30, 2019 at 3:00 AM Daniel Erat <de...@chromium.org> wrote:
[note: public mailing lists]

I'd expect all of the code that can break GCA to be gated by the Chrome OS CQ, Chrome CQ/PFQ, and Android PFQ. If that's the case, then core GCA tests should be stabilized so they can run in all of those queues and prevent regressions from being committed or integrated in the first place. Is the concern that the tests won't run in a place where failures can block Android commits, resulting in the Android PFQ frequently being broken?

Yes, this is our major concern with running GCA tests on the Android PFQ. It should be okay to run the GCA tests on the stabilized ARC++ branches (N and P), but it can be tricky when we want to bring up new ARC++ desserts (R, for example). We can enable the GCA tests on the Android PFQ and see how it goes.

If there's a way for us to run the tests in the Android/ARC++ CQ, then it'd be even better. Will we support running Tast tests on the Android TreeHugger presubmit/postsubmit?

Daniel Erat

Apr 30, 2019, 9:39:50 PM
to Ricky Liang, Jasmine Chen, tast-users, Chromium OS dev, Shuhei Takahashi (nya), Hidehiko Abe, Laurent Chavey
On Mon, Apr 29, 2019 at 6:48 PM Ricky Liang <jcl...@chromium.org> wrote:

Yes, this is our major concern with running GCA tests on the Android PFQ. It should be okay to run the GCA tests on the stabilized ARC++ branches (N and P), but it can be tricky when we want to bring up new ARC++ desserts (R, for example). We can enable the GCA tests on the Android PFQ and see how it goes.

If there's a way for us to run the tests in the Android/ARC++ CQ, then it'd be even better. Will we support running Tast tests on the Android TreeHugger presubmit/postsubmit?

I think it'd be great to run Tast tests at that point, but I don't know much about TreeHugger and am not sure how feasible it is. I've filed https://crbug.com/958278 to track this. You'll hopefully get a more-knowledgeable answer after the Japan holidays end. :-)

rica...@google.com

May 2, 2019, 12:44:34 PM
to tast-users, jcl...@chromium.org, lni...@chromium.org, chromiu...@chromium.org, n...@chromium.org, hide...@chromium.org, cha...@google.com, arth...@chromium.org
+arthurhsu (who has been experimenting with Tast + Android PFQ).


Stéphane


 