--
You received this message because you are subscribed to the Google Groups "infra-dev" group.
To unsubscribe from this group and stop receiving emails from it, send an email to infra-dev+...@chromium.org.
To post to this group, send email to infr...@chromium.org.
To view this discussion on the web visit https://groups.google.com/a/chromium.org/d/msgid/infra-dev/CAATLsPaVmFwG9JnmJsajYfm1D1%2B2UL0-pyJTWWurBnXQA2UaPA%40mail.gmail.com.
My first question is: what exactly are "invalid test results"? I have seen that for both buggy patches and infra failures.

For infra failures, I can see the argument for stopping immediately. Do we have monitoring on these failures that is looked at by troopers?

For dev failures, I don't see why we'd want to stop immediately instead of continuing to run the other tests. It would be frustrating if a patch broke multiple test suites and one had to find this out serially.
On Wed, Nov 25, 2015 at 6:22 PM, John Abd-El-Malek <j...@chromium.org> wrote:
> My first question is: what exactly are "invalid test results"? I have seen that for both buggy patches and infra failures.

My understanding of invalid is: if the same test suite failed both with and without the patch, that's an invalid result. It doesn't mean the patch doesn't cause issues, but it does mean that that run can't prove whether it does or not. And it might not necessarily be related to infra; it could also be that ToT is in a bad state.

And to clarify: what I meant by "solely invalid" is a bot run where all suite failures were considered invalid.

> For infra failures, I can see the argument for stopping immediately. Do we have monitoring on these failures that is looked at by troopers?
>
> For dev failures, I don't see why we'd want to stop immediately instead of continuing to run the other tests. It would be frustrating if a patch broke multiple test suites and one had to find this out serially.

Your second point made me think interrupting is in fact counterproductive: it is useful to know about other actually failing tests, as I'd be able to keep working on them even though there are invalid ones (which I would not work on).
On Wed, Nov 25, 2015 at 9:37 AM, Carlos Knippschild <car...@chromium.org> wrote:
> My understanding of invalid is: if the same test suite failed both with and without the patch, that's an invalid result. It doesn't mean the patch doesn't cause issues, but it does mean that that run can't prove whether it does or not. And it might not necessarily be related to infra; it could also be that ToT is in a bad state.
>
> And to clarify: what I meant by "solely invalid" is a bot run where all suite failures were considered invalid.

Actually, I believe Paweł is referring to a very specific type of failure that occurs when the executed test doesn't return a result in the format we're expecting. Most test steps pass structured JSON back, and the recipe validates the data; if the data is invalid for some reason, then we report that and fail the test.

This failure can occur either because of a bug in infra or a bug in a src-side script, as John says.

> Your second point made me think interrupting is in fact counterproductive: it is useful to know about other actually failing tests, as I'd be able to keep working on them even though there are invalid ones (which I would not work on).

Right. It's rare that an invalid test result on one step means that *every* step will be invalid. So, from a developer's point of view there can still be value in running the remaining tests and retrying without the patch, to help figure out what else might break.

From the CQ's point of view, though, Paweł is correct that we shouldn't ever land a patch that has failed in that way, because something's wrong somewhere :).

In this case I would err on the side of the developer (run the other tests), but make sure we send up enough alerts so that people can triage and see whether the infrastructure is broken, there are unrelated bugs in the src-side scripts, or the patch itself is at fault.

Make sense?

-- Dirk
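To make the "structured JSON back, recipe validates it" idea concrete, here is a minimal hypothetical sketch (not the actual recipe code; all field and function names are invented): the step output is parsed, and anything that cannot be turned into a per-test failure list is reported as an invalid result rather than a pass or fail.

```python
import json

class InvalidResults(Exception):
    """Raised when we cannot determine a list of individual failing tests."""

def parse_test_results(raw_output):
    """Parse a test step's JSON output into a list of failed test names.

    A harness crash (e.g. an ImportError traceback printed instead of
    JSON) surfaces as InvalidResults, not as a pass or a concrete failure.
    """
    try:
        data = json.loads(raw_output)
    except ValueError as e:
        raise InvalidResults('output is not valid JSON: %s' % e)
    if not isinstance(data, dict) or 'per_test_results' not in data:
        raise InvalidResults('missing per_test_results field')
    # Anything that is not an explicit PASS counts as a failure here.
    return [name for name, status in data['per_test_results'].items()
            if status != 'PASS']
```

Under this scheme, a well-formed run yields a (possibly empty) failure list, while a crashed harness raises instead, which is the signal the thread calls an "invalid test result".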
Thanks for the feedback so far. There are good points, and there's also a fair amount of confusion. Let me try to clarify.

1. We do not retry invalid test results without the patch.

2. I'd define invalid test results as "couldn't determine a list of individual failing tests".

3. I don't think we're hitting issues with something 100% misconfigured between the harness and the recipe. However, it seems to me that some test harnesses don't handle various edge cases well, and just bail out.

4. Make sure to see the example build http://build.chromium.org/p/tryserver.chromium.linux/builders/linux_android_rel_ng/builds/100812 . Test results are invalid because the telemetry test harness crashed ("ImportError: No module named pkg_resources"). I think the recipe is handling this as expected, but let me know if you see room for improvement.

5. Failures with and without the patch are not considered invalid results. In fact, we ignore failures that happened with the patch if they also happened without the patch. This doesn't happen at the entire-step level, but at the individual-test level. If we can't get a list of individual failed tests, it doesn't make sense to retry without the patch.

6. I'm not aware of a working, actively used monitoring solution for infra failures, but I've been active on an internal email thread about it.

7. Yes, there's a tradeoff here: this could help reduce cycle time, but it could also make individual try jobs less useful to developers. On one hand I can understand the reluctance to make this change. On the other hand, I'm just looking for ways to avoid CQ cycle time regressions. One possibility might be to try making the change and revert it if it makes debugging harder.

7a. If you can provide examples of try runs that had invalid test results but where the other test results were still helpful to you, please let me know.

Paweł

On Wed, Nov 25, 2015 at 8:28 PM, Marc-Antoine Ruel <mar...@chromium.org> wrote:
> 2015-11-25 13:39 GMT-05:00 Dirk Pranke <dpr...@chromium.org>:
> > This failure can occur either because of a bug in infra or a bug in a src-side script, as John says.
>
> It's tricky; for example a DLL could be missing, so the executable didn't start at all. You don't want this to get committed. Yet that's listed as a missing shard.
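Point 5 above can be sketched in a few lines (a hypothetical illustration, not the actual recipe logic; the function name is invented): failures are compared per test, and a failure is only blamed on the patch if the same test did not also fail without it.

```python
def failures_caused_by_patch(failed_with_patch, failed_without_patch):
    """Return only the tests whose failure can be attributed to the patch.

    Tests that fail both with and without the patch (flaky tests, or a
    broken ToT) are ignored at the individual-test level.
    """
    return sorted(set(failed_with_patch) - set(failed_without_patch))

# SuiteA.test1 fails both ways, so only SuiteA.test2 is blamed on the patch.
blamed = failures_caused_by_patch(
    failed_with_patch=['SuiteA.test1', 'SuiteA.test2'],
    failed_without_patch=['SuiteA.test1'])
```

This is also why an invalid result short-circuits the comparison: without a per-test failure list from the with-patch run, there is nothing to subtract, so retrying without the patch proves nothing.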
On Wed, Nov 25, 2015 at 11:35 AM, Paweł Hajdan, Jr. <phajd...@chromium.org> wrote:
> 1. We do not retry invalid test results without the patch.
>
> 2. I'd define invalid test results as "couldn't determine a list of individual failing tests".

I'm still not sure I know the answer to my question: can this happen because of both infra and dev bugs? My experience has been yes.
On Wed, Nov 25, 2015 at 2:17 PM, John Abd-El-Malek <j...@chromium.org> wrote:
> I'm still not sure I know the answer to my question: can this happen because of both infra and dev bugs? My experience has been yes.

I'm sorry, I thought I answered this. Yes, this can happen because of both infra and dev bugs.
On Wed, Nov 25, 2015 at 2:28 PM, Dirk Pranke <dpr...@chromium.org> wrote:
> I'm sorry, I thought I answered this. Yes, this can happen because of both infra and dev bugs.
>
> -- Dirk

Sorry I missed that.

I thought I'd seen this from my own patches (ones that didn't involve changes to src-side scripts), i.e. a patch causing a test suite to not return results because it crashes immediately.