--
You received this message because you are subscribed to the Google Groups "Chromium Hackability Code Yellow" group.
To unsubscribe from this group and stop receiving emails from it, send an email to hackability-c...@chromium.org.
To post to this group, send email to hackabi...@chromium.org.
To view this discussion on the web visit https://groups.google.com/a/chromium.org/d/msgid/hackability-cy/CAJ8%2BGoge4y1fu-ZCoF%3DLvDYYSr7vbSDrJBm_tbGDTHjS_sJa5A%40mail.gmail.com.
They were (I hadn't noticed that, as I was only looking at trybots).
Oh, I see. It's super flaky. This would be a really hard thing to detect and close the tree for. The only solution I can see for this is to expose flakiness in sheriff-o-matic. That work isn't progressing super quickly. We could use more help on that work.
Let's not look at this as binary: either old policy or keeping tree open always. Both have issues.
We have switched from a situation where everyone pays price of closed tree through slower CQ, which sucked for obvious reasons. However at least that suckiness incentivized some people to reopen tree. How can we incentivize people now?
Just saying welcome to new world & volunteers welcome to undo the suckiness from the new system is somewhat lacking.
On Mon, Sep 8, 2014 at 8:38 PM, Ojan Vafai <oj...@chromium.org> wrote:
> Have a lot of flaky tests been disabled in the past week and a half?
On Mon, Sep 8, 2014 at 9:32 PM, John Abd-El-Malek <j...@chromium.org> wrote:
> Let's not look at this as binary: either old policy or keeping tree open always. Both have issues.

I think we can keep the new policy and address these sorts of issues by building better tooling. For example, if we really wanted, we could have gatekeeper change the tree status message and require the sheriff to fix it without closing the tree. That's a bit silly, but it would achieve the same goal without stopping chromium development in the process.

> We have switched from a situation where everyone pays price of closed tree through slower CQ, which sucked for obvious reasons. However at least that suckiness incentivized some people to reopen tree. How can we incentivize people now?

We have a good plan for showing flakiness in sheriff-o-matic. The problem is just that bad flakiness can now go unnoticed (whereas before only mild flakiness went unnoticed). If we get to a point where we expect sheriffs to use sheriff-o-matic and we expose this in sheriff-o-matic as something the sheriffs need to address in order to consider the tree green, then I don't think we need further incentives or policy changes. We're not there yet obviously.

If we just got to a point where sheriffs used sheriff-o-matic without the dedicated flakiness UI, I think a lot of problems like this would get caught because sheriffs would see the same failure repeating and would address it similar to how they did before when the tree would close.

> Just saying welcome to new world & volunteers welcome to undo the suckiness from the new system is somewhat lacking.

Who is saying that? That's certainly not what I said. To clarify, right now I'm focusing my efforts on making sheriff-o-matic better so that sheriffs actually use it for non-flaky failures. Once that's working well, I'll focus on better-exposing flaky failures. We could do the work in parallel and do both parts of this faster if we had more help.
On Mon, Sep 8, 2014 at 10:08 PM, Ojan Vafai <oj...@chromium.org> wrote:
> On Mon, Sep 8, 2014 at 9:32 PM, John Abd-El-Malek <j...@chromium.org> wrote:
>> Let's not look at this as binary: either old policy or keeping tree open always. Both have issues.
>
> I think we can keep the new policy and address these sorts of issues by building better tooling.

I agree with this. What worries me is that we switched tree policy without this tooling being done. If we keep having issues like the one we had over the last 5 days, then that'll be a big time sink that isn't measured.

> For example, if we really wanted, we could have gatekeeper change the tree status message and require the sheriff to fix it without closing the tree. That's a bit silly, but it would achieve the same goal without stopping chromium development in the process.
>
>> We have switched from a situation where everyone pays price of closed tree through slower CQ, which sucked for obvious reasons. However at least that suckiness incentivized some people to reopen tree. How can we incentivize people now?
>
> We have a good plan for showing flakiness in sheriff-o-matic. The problem is just that bad flakiness can now go unnoticed (whereas before only mild flakiness went unnoticed). If we get to a point where we expect sheriffs to use sheriff-o-matic and we expose this in sheriff-o-matic as something the sheriffs need to address in order to consider the tree green, then I don't think we need further incentives or policy changes. We're not there yet obviously.
>
> If we just got to a point where sheriffs used sheriff-o-matic without the dedicated flakiness UI, I think a lot of problems like this would get caught because sheriffs would see the same failure repeating and would address it similar to how they did before when the tree would close.

btw I often hear that all the new tooling will be in sheriff-o-matic, however anecdotal evidence from chromium sheriffs is that most aren't using it. Do we have stats on what percentage of sheriffs are using it? Is anything being done to encourage sheriffs to try this out and collect feedback about what can be done to make it their preferred tool?

>> Just saying welcome to new world & volunteers welcome to undo the suckiness from the new system is somewhat lacking.
>
> Who is saying that? That's certainly not what I said. To clarify, right now I'm focusing my efforts on making sheriff-o-matic better so that sheriffs actually use it for non-flaky failures. Once that's working well, I'll focus on better-exposing flaky failures. We could do the work in parallel and do both parts of this faster if we had more help.

That probably came off as not exactly what I meant, sorry. It's just not clear to me (either way, I'm unsure myself) if we should change the tree opening policy before we have ways to cope with the resulting tragedy of the commons.
Re: engaging with sheriffs.
When the policy change was made, we discussed having Ojan/Karen reach out directly to sheriffs to remind them of the new tooling, ask for feedback during/after their shifts, and, more importantly, stress that it is still their *job* to keep the tree green, even if it is open. Is that still being done? A cultural change like this will take some reminding to make happen. The first few sheriffs post-change gave a lot of great feedback.
I'm not sure if anyone reads the email, but we should add a note to the sheriff reminder e-mail with a link to the sheriff-o-matic documentation page and a link to easily file bugs.
And, at least until we feel confident that sheriffs have switched, we should follow up directly with sheriffs after their shifts and ask for feedback.
On Mon, Sep 8, 2014 at 10:25 PM, John Abd-El-Malek <j...@chromium.org> wrote:
> On Mon, Sep 8, 2014 at 10:08 PM, Ojan Vafai <oj...@chromium.org> wrote:
>> On Mon, Sep 8, 2014 at 9:32 PM, John Abd-El-Malek <j...@chromium.org> wrote:
>>> Let's not look at this as binary: either old policy or keeping tree open always. Both have issues.
>>
>> I think we can keep the new policy and address these sorts of issues by building better tooling.
>
> I agree with this. What worries me is that we switched tree policy without this tooling being done. If we keep having issues like the one we had over the last 5 days, then that'll be a big time sink that isn't measured.
>
>> For example, if we really wanted, we could have gatekeeper change the tree status message and require the sheriff to fix it without closing the tree. That's a bit silly, but it would achieve the same goal without stopping chromium development in the process.
>>
>>> We have switched from a situation where everyone pays price of closed tree through slower CQ, which sucked for obvious reasons. However at least that suckiness incentivized some people to reopen tree. How can we incentivize people now?
>>
>> We have a good plan for showing flakiness in sheriff-o-matic. The problem is just that bad flakiness can now go unnoticed (whereas before only mild flakiness went unnoticed). If we get to a point where we expect sheriffs to use sheriff-o-matic and we expose this in sheriff-o-matic as something the sheriffs need to address in order to consider the tree green, then I don't think we need further incentives or policy changes. We're not there yet obviously.
>>
>> If we just got to a point where sheriffs used sheriff-o-matic without the dedicated flakiness UI, I think a lot of problems like this would get caught because sheriffs would see the same failure repeating and would address it similar to how they did before when the tree would close.
>
> btw I often hear that all the new tooling will be in sheriff-o-matic, however anecdotal evidence from chromium sheriffs is that most aren't using it. Do we have stats on what percentage of sheriffs are using it? Is anything being done to encourage sheriffs to try this out and collect feedback about what can be done to make it their preferred tool?
>
>>> Just saying welcome to new world & volunteers welcome to undo the suckiness from the new system is somewhat lacking.
>>
>> Who is saying that? That's certainly not what I said. To clarify, right now I'm focusing my efforts on making sheriff-o-matic better so that sheriffs actually use it for non-flaky failures. Once that's working well, I'll focus on better-exposing flaky failures. We could do the work in parallel and do both parts of this faster if we had more help.
>
> That probably came off as not exactly what I meant, sorry. It's just not clear to me (either way, I'm unsure myself) if we should change the tree opening policy before we have ways to cope with the resulting tragedy of the commons.
It seems to me the main downside of having the tree open while the failures persisted is that more changes landed in the meantime. It wasn't clear to me from Arv's initial note that that actually made his job harder.