TL;DR: Please don't add new regression tests before stabilizing existing informational tests and promoting them to the CQ.

There are nearly 150 informational Tast tests listed at http://tastboard/test (indicated by an "I" at the left side of their row). None of these tests run on the Commit Queue, and many of them have been failing repeatedly since they were introduced. I would estimate that fifty or fewer of these tests are close to being reliable enough for their failures to be actionable. Almost all of them require stabilization work before they can be promoted to the CQ.

I'm heartened that so many developers are interested in writing tests, but we need follow-through to avoid ending up in the same state as Autotest, with hundreds or thousands of broken and unmaintained tests. Regression tests are only useful if people care about failures. People generally don't care about failures if they're only displayed on an easy-to-ignore dashboard.

There are costs associated with a test beyond the time and effort that is spent writing it. In particular, all test code is currently reviewed by one or more core Tast developers. This has greatly reduced the amount of time that we have to:
- work on our top priority of porting Autotest tests that are running on the CQ and PFQs
- improve the framework
- add new features (e.g. Servo support or hardware dependencies)
- write documentation and tools to help test authors
As such, we are instituting a new policy that teams cannot add additional regression tests until their existing informational tests have been stabilized and promoted to the CQ. I would also strongly urge any new test authors to start out by trying to stabilize an existing informational test. Doing so will expose you to Tast coding conventions and make it easier to write new code in the future.

Any new test being added must have a clear plan for being promoted to the CQ. I would recommend filing an issue to track promoting the test and using it to track flakiness issues that need to be resolved first. While investigating flakiness in tests, I've often found subtle races in the code that is being tested.

nya@ mentioned that there are several known issues that make it difficult to promote some existing informational tests to the CQ:
- https://crbug.com/929021: UIAutomator tests are flaky
- https://crbug.com/933110: Add feature to skip tests only on specific boards in Tast
The number of tests affected by these is relatively small, though, and we are working on solving these issues.

Thank you for your understanding, and please let me know if you have any questions.

--Dan
--
Chromium OS Developers mailing list: chromiu...@chromium.org
View archives, change email options, or unsubscribe:
https://groups.google.com/a/chromium.org/group/chromium-os-dev
---
You received this message because you are subscribed to the Google Groups "Chromium OS Development" group.
To unsubscribe from this group and stop receiving emails from it, send an email to chromium-os-d...@chromium.org.
On Wed, Apr 24, 2019 at 8:40 PM Daniel Erat <de...@chromium.org> wrote:
> Regression tests are only useful if people care about failures. People generally don't care about failures if they're only displayed on an easy-to-ignore dashboard.

For graphics tests, our model is a little different: we can't run them in the CQ because they are too long, so we run them off-band and track them through an alert system, which we have a rotation for.

On Wed, Apr 24, 2019 at 8:40 PM Daniel Erat <de...@chromium.org> wrote:
> nya@ mentioned that there are several known issues that make it difficult to promote some existing informational tests to the CQ:
> - https://crbug.com/929021: UIAutomator tests are flaky
> - https://crbug.com/933110: Add feature to skip tests only on specific boards in Tast

How about https://crbug.com/934090? I just spent a few minutes looking through one Tast failure, and all of the logs were trimmed absurdly, such that I had no clue what was going on.

Brian
--
You received this message because you are subscribed to the Google Groups "tast-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to tast-users+...@chromium.org.
To post to this group, send email to tast-...@chromium.org.
To view this discussion on the web visit https://groups.google.com/a/chromium.org/d/msgid/tast-users/CA%2BASDXMWUh7p_c73XrmeN_Hb%2Bo_LAkaBap18eS1zj9b1Yhfy8A%40mail.gmail.com.
On Thu, Apr 25, 2019 at 6:34 PM 'Stéphane Marchesin' via Chromium OS Development <chromiu...@chromium.org> wrote:
> For graphics tests, our model is a little different: we can't run them in the CQ because they are too long, so we run them off-band and track them through an alert system, which we have a rotation for.

Hi,

I think the intent here was to make sure all these tests are stable. Yes, we don't want to be promoting long-running tests to the CQ :-)

Bhaskar

> What about tests which are too long to run in the CQ, like CTS or deqp for example? As it is, this seems to conflict with our mandate to keep CTS passing.
>
> Some of these flakes seem to be unrelated to the test itself or the subsystem being tested, but instead caused by other factors. For example, network problems seem to be fairly common. What is the policy here?

On Thu, Apr 25, 2019 at 9:42 PM Bhaskar Janakiraman <bjanak...@google.com> wrote:
> On Thu, Apr 25, 2019 at 6:34 PM 'Stéphane Marchesin' via Chromium OS Development <chromiu...@chromium.org> wrote:
>> What about tests which are too long to run in the CQ, like CTS or deqp for example? As it is, this seems to conflict with our mandate to keep CTS passing.

There are no immediate plans to move CTS to Tast. I think we can discuss the limited number of other tests that aren't a good fit for the CQ on a case-by-case basis. We may want to add a separate test group for lengthy graphics tests that runs as part of bvt-tast-informational, for instance.

On Thu, Apr 25, 2019 at 10:08 PM Daniel Erat <de...@chromium.org> wrote:
> There are no immediate plans to move CTS to Tast. I think we can discuss the limited number of other tests that aren't a good fit for the CQ on a case-by-case basis. We may want to add a separate test group for lengthy graphics tests that runs as part of bvt-tast-informational, for instance.

We have graphics_dEQP, which (when not on the CQ) runs 500k EGL/GLES/Vulkan unit tests. These tests can start failing or become flaky after a driver uprev, and it can take a long time to stabilize them. Is dEQP a good match for Tast?

Also, we would like to add API traces that replay games and other apps. Each trace could run for 1-10 minutes, and there might eventually be hundreds of them. We would be interested in monitoring crashes, GPU hangs, rendering corruptions, and performance regressions (and fixing those one at a time). Are such high-level (and likely noisy) tests a good match for Tast? Note that we don't expect the traces to run in the CQ, but we would want them to run automatically several times a week.

[note: public mailing lists]

I'd expect all of the code that can break GCA to be gated by the Chrome OS CQ, Chrome CQ/PFQ, and Android PFQ. If that's the case, then core GCA tests should be stabilized so they can run in all of those queues and prevent regressions from being committed or integrated in the first place. Is the concern that the tests won't run in a place where failures can block Android commits, resulting in the Android PFQ frequently being broken?

On Tue, Apr 30, 2019 at 3:00 AM Daniel Erat <de...@chromium.org> wrote:
> Is the concern that the tests won't run in a place where failures can block Android commits, resulting in the Android PFQ frequently being broken?

Yes, this is our major concern with running GCA tests on the Android PFQ. It should be okay to run the GCA tests on the stabilized ARC++ branches (N and P), but it can be tricky when we want to bring up new ARC++ desserts (R, for example). We can enable the GCA tests on the Android PFQ and see how it goes.

If there's a way for us to run the tests in the Android/ARC++ CQ, then it'd be even better. Will we support running Tast tests on the Android TreeHugger presubmit/postsubmit?