Stop using NOTRY=true, stop LGTMing changes with NOTRY=true!

222 views
Skip to first unread message

John Abd-El-Malek

unread,
Jan 29, 2015, 8:58:24 PM1/29/15
to chromium-dev
A lot of the causes of tree breakage now is when the CQ is skipped. So we should try to drastically reduce the number of times this is used.

Please don't use NOTRY=true. Please don't LGTM changes with it. 

Skipping the trybots should only be done for immediate reverts. Likewise, there are very few scenarios where manual landing without green try runs is OK. We had gotten used to using NOTRY because the CQ was slow or we thought that the tests weren't necessary for a "trivial" change. However, the CQ is fast now. Also, it figures out the bare minimum builds and tests to run based on the affected files and build rules, so no need to guess. It's nearly impossible to know how a change could affect our large number of test suites, platforms, and configurations anyways. And if each of us mis-judges just once a month and breaks the tree, it will be closed always.

Some example of usage of NOTRY from recent usage that doesn't need to be done:
-OWNERS files change: try bots will early exit, finishing in a few minutes
-changing scripts that are used by builders not on CQ: ditto
-disabling tests: trybots will only build that test target and run it (breaking build on test disabling is more common than one would think)

NOTRY=true was used about 35 times last week, excluding reverts. Let's bring that close to 0 :)

Thanks


PS And yes, adding presubmit checks and/or having the CQ reject NOTRY=true other than for reverts would be great.

Ilya Sherman

unread,
Jan 29, 2015, 9:04:36 PM1/29/15
to John Abd-El-Malek, chromium-dev
The main reason that I have found myself using NOTRY=true is that this is the only way that I know to land CLs that need to override presubmit warnings.  In an ideal world, we wouldn't ever have spurious presubmit warnings.  But, given that this is not the case, what is the recommended way to override presubmit warnings?

--
--
Chromium Developers mailing list: chromi...@chromium.org
View archives, change email options, or unsubscribe:
http://groups.google.com/a/chromium.org/group/chromium-dev

Mark Seaborn

unread,
Jan 29, 2015, 9:11:16 PM1/29/15
to John Abd-El-Malek, chromium-dev
On 29 January 2015 at 17:57, John Abd-El-Malek <j...@chromium.org> wrote:
A lot of the causes of tree breakage now is when the CQ is skipped. So we should try to drastically reduce the number of times this is used.

Please don't use NOTRY=true. Please don't LGTM changes with it. 

Skipping the trybots should only be done for immediate reverts.

I agree.

By the way, I recently accidentally committed a non-immediate revert with NOTRY=true + NOTREECHECKS=true, because Rietveld's "Revert Patchset" button creates reverts that way by default.  See https://codereview.chromium.org/847413002 -- I was reverting a day-old change.

It would be nice if "Revert Patchset" omitted NOTRY/NOTREECHECKS/TBR when the change being reverted is no longer recent -- e.g. when it was committed >6 hours ago (to pick an arbitrary threshold).

Cheers,
Mark

Daniel Cheng

unread,
Jan 29, 2015, 9:15:30 PM1/29/15
to msea...@chromium.org, John Abd-El-Malek, chromium-dev
NOPRESUBMIT=true should skip presubmit checks.

Daniel

Ilya Sherman

unread,
Jan 29, 2015, 9:29:29 PM1/29/15
to Daniel Cheng, msea...@chromium.org, John Abd-El-Malek, chromium-dev
On Thu, Jan 29, 2015 at 6:14 PM, Daniel Cheng <dch...@chromium.org> wrote:
NOPRESUBMIT=true should skip presubmit checks.

Great!  I'll try to use that instead from now on.  Thanks for the info!
 

On Thu, Jan 29, 2015 at 6:12 PM, Dana Jansens <dan...@google.com> wrote:

Scott Hess

unread,
Jan 29, 2015, 11:17:47 PM1/29/15
to Ilya Sherman, Daniel Cheng, Mark Seaborn, John Abd-El-Malek, chromium-dev
Is there some sort of docs for the set of magical things I can say to
accomplish stuff, in a place I'm likely to be able to find when
needed? Right now I just kind of cargo-cult things until someone
yells at me, then I search around on chromium-dev or in the sheriff
docs looking for hints.

-scott

John Abd-El-Malek

unread,
Jan 29, 2015, 11:50:04 PM1/29/15
to Mark Seaborn, Paweł Hajdan Jr., chromium-dev
I think it already does this, but the threshold is longer. Adding Pawel to get the answer.
 

Cheers,
Mark


John Abd-El-Malek

unread,
Jan 29, 2015, 11:50:52 PM1/29/15
to Scott Hess, Ilya Sherman, Daniel Cheng, Mark Seaborn, chromium-dev
If you're referring to things to say to the CQ, there's https://sites.google.com/a/chromium.org/dev/developers/testing/commit-queue?pli=1#TOC-Options. I just edited it to strongly discourage NOTRY=true

Paweł Hajdan, Jr.

unread,
Jan 30, 2015, 7:31:26 AM1/30/15
to John Abd-El-Malek, infr...@chromium.org, Jason Robbins, Ravi, Scott Hess, Ilya Sherman, Daniel Cheng, Mark Seaborn, chromium-dev
+infra-dev,jrobbins,rmistry

I don't know the answer to the question about "Revert Patchset" behavior. Adding Jason and Ravi that should know it.

Thanks John for starting this thread, I totally +1 it.

By the way, instead of bypassing presubmit checks why don't we fix them? I find it annoying that it's apparently accepted to have checks with false positives.

Paweł

Ravi

unread,
Jan 30, 2015, 8:04:19 AM1/30/15
to chromi...@chromium.org, j...@chromium.org, infr...@chromium.org, jrob...@google.com, rmi...@chromium.org, sh...@chromium.org, ishe...@chromium.org, dch...@chromium.org, msea...@chromium.org, phajd...@chromium.org


On Friday, January 30, 2015 at 7:31:26 AM UTC-5, Paweł Hajdan, Jr. wrote:
+infra-dev,jrobbins,rmistry

I don't know the answer to the question about "Revert Patchset" behavior. Adding Jason and Ravi that should know it.

The one-click revert button used to leave off "TBR=" for CLs older than 14 days, but this functionality was removed in crrev/124470043 because it was found to be confusing (chromium:401513). I agreed with the change and personally did not like the idea of the button automatically behaving differently based on how old the CL is.

How about instead putting in a checkbox below the "Automatically send Revert CL to CQ" that says "Commit immediately". If the CL is X days old then you get an extra prompt "This CL is X days old, are you sure you want to commit immediately?" or maybe the "Commit immediately" can be unchecked by default for old CLs instead.

John Abd-El-Malek

unread,
Jan 30, 2015, 10:42:17 AM1/30/15
to Ravi, chromium-dev, infr...@chromium.org, Jason Robbins, Scott Hess, Ilya Sherman, Daniel Cheng, Mark Seaborn, Paweł Hajdan Jr.
On Fri, Jan 30, 2015 at 5:04 AM, Ravi <rmi...@chromium.org> wrote:


On Friday, January 30, 2015 at 7:31:26 AM UTC-5, Paweł Hajdan, Jr. wrote:
+infra-dev,jrobbins,rmistry

I don't know the answer to the question about "Revert Patchset" behavior. Adding Jason and Ravi that should know it.

The one-click revert button used to leave off "TBR=" for CLs older than 14 days, but this functionality was removed in crrev/124470043 because it was found to be confusing (chromium:401513). I agreed with the change and personally did not like the idea of the button automatically behaving differently based on how old the CL is. 

How about instead putting in a checkbox below the "Automatically send Revert CL to CQ" that says "Commit immediately". If the CL is X days old then you get an extra prompt "This CL is X days old, are you sure you want to commit immediately?" or maybe the "Commit immediately" can be unchecked by default for old CLs instead.

I think we should use sane defaults, and the user can (always) override. It seems that this old reverted behavior was slightly different though, in that it left out TBR so the change wouldn't let.

Proposal: for commits <  6 hours, current behavior stays the same. The reason is that something in the tree is broken and  it might have taken a while to track it down during sheriff-switchover. Since it's less than 6 hours, the chance that it'll collide with another landed change is low.

For commits longer than 6 hours, then we leave TBR as is, but we don't add NOTRY=true. The older a change is, the more chance that its revert will collide with other commits. Also, if a change is old then there's no urgent rush to revert it and it can wait an hour.

Ravi

unread,
Jan 30, 2015, 11:29:25 AM1/30/15
to chromi...@chromium.org, rmi...@chromium.org, infr...@chromium.org, jrob...@google.com, sh...@chromium.org, ishe...@chromium.org, dch...@chromium.org, msea...@chromium.org, phajd...@chromium.org


On Friday, January 30, 2015 at 10:42:17 AM UTC-5, John Abd-El-Malek wrote:


On Fri, Jan 30, 2015 at 5:04 AM, Ravi <rmi...@chromium.org> wrote:


On Friday, January 30, 2015 at 7:31:26 AM UTC-5, Paweł Hajdan, Jr. wrote:
+infra-dev,jrobbins,rmistry

I don't know the answer to the question about "Revert Patchset" behavior. Adding Jason and Ravi that should know it.

The one-click revert button used to leave off "TBR=" for CLs older than 14 days, but this functionality was removed in crrev/124470043 because it was found to be confusing (chromium:401513). I agreed with the change and personally did not like the idea of the button automatically behaving differently based on how old the CL is. 

How about instead putting in a checkbox below the "Automatically send Revert CL to CQ" that says "Commit immediately". If the CL is X days old then you get an extra prompt "This CL is X days old, are you sure you want to commit immediately?" or maybe the "Commit immediately" can be unchecked by default for old CLs instead.

I think we should use sane defaults, and the user can (always) override. It seems that this old reverted behavior was slightly different though, in that it left out TBR so the change wouldn't let.

Proposal: for commits <  6 hours, current behavior stays the same. The reason is that something in the tree is broken and  it might have taken a while to track it down during sheriff-switchover. Since it's less than 6 hours, the chance that it'll collide with another landed change is low.

For commits longer than 6 hours, then we leave TBR as is, but we don't add NOTRY=true. The older a change is, the more chance that its revert will collide with other commits. Also, if a change is old then there's no urgent rush to revert it and it can wait an hour.

My 2c:
* 6 hours is IMHO too short a timeframe, some projects correctness/perf tests may reliably identify a CL that needs to be reverted after that. We could make the timeframe project specific with a way to specify whether a project would like to opt out of this.
* The problem with the previous removing TBR= implementation was that it was never clear to the revert'er what was going to happen. If we do implement this NOTRY proposal, then lets make it crystal clear what is going to happen either with a new dialog or a descriptive message in the revert popup.

John Abd-El-Malek

unread,
Jan 30, 2015, 11:37:39 AM1/30/15
to Ravi, chromium-dev, infr...@chromium.org, Jason Robbins, Scott Hess, Ilya Sherman, Daniel Cheng, Mark Seaborn, Paweł Hajdan Jr.
On Fri, Jan 30, 2015 at 8:29 AM, Ravi <rmi...@chromium.org> wrote:


On Friday, January 30, 2015 at 10:42:17 AM UTC-5, John Abd-El-Malek wrote:


On Fri, Jan 30, 2015 at 5:04 AM, Ravi <rmi...@chromium.org> wrote:


On Friday, January 30, 2015 at 7:31:26 AM UTC-5, Paweł Hajdan, Jr. wrote:
+infra-dev,jrobbins,rmistry

I don't know the answer to the question about "Revert Patchset" behavior. Adding Jason and Ravi that should know it.

The one-click revert button used to leave off "TBR=" for CLs older than 14 days, but this functionality was removed in crrev/124470043 because it was found to be confusing (chromium:401513). I agreed with the change and personally did not like the idea of the button automatically behaving differently based on how old the CL is. 

How about instead putting in a checkbox below the "Automatically send Revert CL to CQ" that says "Commit immediately". If the CL is X days old then you get an extra prompt "This CL is X days old, are you sure you want to commit immediately?" or maybe the "Commit immediately" can be unchecked by default for old CLs instead.

I think we should use sane defaults, and the user can (always) override. It seems that this old reverted behavior was slightly different though, in that it left out TBR so the change wouldn't let.

Proposal: for commits <  6 hours, current behavior stays the same. The reason is that something in the tree is broken and  it might have taken a while to track it down during sheriff-switchover. Since it's less than 6 hours, the chance that it'll collide with another landed change is low.

For commits longer than 6 hours, then we leave TBR as is, but we don't add NOTRY=true. The older a change is, the more chance that its revert will collide with other commits. Also, if a change is old then there's no urgent rush to revert it and it can wait an hour.

My 2c:
* 6 hours is IMHO too short a timeframe, some projects correctness/perf tests may reliably identify a CL that needs to be reverted after that. We could make the timeframe project specific with a way to specify whether a project would like to opt out of this.
* The problem with the previous removing TBR= implementation was that it was never clear to the revert'er what was going to happen. If we do implement this NOTRY proposal, then lets make it crystal clear what is going to happen either with a new dialog or a descriptive message in the revert popup.

In general I prefer to avoid extra dialogs or new checkboxes to avoid cluttering up the UI.

With the previous removal of TBR=, it was confusing because people hit "revert" and might close the page and the revert never lands. With the proposal above, I don't think there's much confusion. The reverter has the same flow (just hit one button). It's just sometimes it takes an hour while other times it takes a few minutes. Devs are already trained that CQ takes variable amount of time, depending on how queued it is, if the tree is closed, how many files their change touches etc..

--
You received this message because you are subscribed to the Google Groups "infra-dev" group.
To unsubscribe from this group and stop receiving emails from it, send an email to infra-dev+...@chromium.org.
To post to this group, send email to infr...@chromium.org.
To view this discussion on the web visit https://groups.google.com/a/chromium.org/d/msgid/infra-dev/d1bd3786-5096-4036-bfd6-825eb7f4a88a%40chromium.org.

Jeffrey Yasskin

unread,
Jan 30, 2015, 11:40:51 AM1/30/15
to John Abd-El-Malek, chromium-dev
The memory sheriffs have a tradition of landing suppression changes with NOTRY=true: http://www.chromium.org/developers/tree-sheriffs/sheriff-details-chromium/memory-sheriff#TOC-Submitting-a-patch

Are the bots now smart enough to avoid compiling the world when the changed file isn't compiled?

On Thu, Jan 29, 2015 at 5:57 PM, John Abd-El-Malek <j...@chromium.org> wrote:

--

John Abd-El-Malek

unread,
Jan 30, 2015, 11:44:49 AM1/30/15
to Jeffrey Yasskin, chromium-dev
On Fri, Jan 30, 2015 at 8:40 AM, Jeffrey Yasskin <jyas...@chromium.org> wrote:
The memory sheriffs have a tradition of landing suppression changes with NOTRY=true: http://www.chromium.org/developers/tree-sheriffs/sheriff-details-chromium/memory-sheriff#TOC-Submitting-a-patch 

Thanks, memory sheriffs: I've removed this section for now since it's not needed anymore (thanks to the analyze.py step).
 

Are the bots now smart enough to avoid compiling the world when the changed file isn't compiled?

Correct. A change to a suppression file wouldn't lead to any compilation or tests run.

Ravi

unread,
Jan 30, 2015, 12:38:35 PM1/30/15
to chromi...@chromium.org, rmi...@chromium.org, infr...@chromium.org, jrob...@google.com, sh...@chromium.org, ishe...@chromium.org, dch...@chromium.org, msea...@chromium.org, phajd...@chromium.org


On Friday, January 30, 2015 at 11:37:39 AM UTC-5, John Abd-El-Malek wrote:


On Fri, Jan 30, 2015 at 8:29 AM, Ravi <rmi...@chromium.org> wrote:


On Friday, January 30, 2015 at 10:42:17 AM UTC-5, John Abd-El-Malek wrote:


On Fri, Jan 30, 2015 at 5:04 AM, Ravi <rmi...@chromium.org> wrote:


On Friday, January 30, 2015 at 7:31:26 AM UTC-5, Paweł Hajdan, Jr. wrote:
+infra-dev,jrobbins,rmistry

I don't know the answer to the question about "Revert Patchset" behavior. Adding Jason and Ravi that should know it.

The one-click revert button used to leave off "TBR=" for CLs older than 14 days, but this functionality was removed in crrev/124470043 because it was found to be confusing (chromium:401513). I agreed with the change and personally did not like the idea of the button automatically behaving differently based on how old the CL is. 

How about instead putting in a checkbox below the "Automatically send Revert CL to CQ" that says "Commit immediately". If the CL is X days old then you get an extra prompt "This CL is X days old, are you sure you want to commit immediately?" or maybe the "Commit immediately" can be unchecked by default for old CLs instead.

I think we should use sane defaults, and the user can (always) override. It seems that this old reverted behavior was slightly different though, in that it left out TBR so the change wouldn't let.

Proposal: for commits <  6 hours, current behavior stays the same. The reason is that something in the tree is broken and  it might have taken a while to track it down during sheriff-switchover. Since it's less than 6 hours, the chance that it'll collide with another landed change is low.

For commits longer than 6 hours, then we leave TBR as is, but we don't add NOTRY=true. The older a change is, the more chance that its revert will collide with other commits. Also, if a change is old then there's no urgent rush to revert it and it can wait an hour.

My 2c:
* 6 hours is IMHO too short a timeframe, some projects correctness/perf tests may reliably identify a CL that needs to be reverted after that. We could make the timeframe project specific with a way to specify whether a project would like to opt out of this.
* The problem with the previous removing TBR= implementation was that it was never clear to the revert'er what was going to happen. If we do implement this NOTRY proposal, then lets make it crystal clear what is going to happen either with a new dialog or a descriptive message in the revert popup.

In general I prefer to avoid extra dialogs or new checkboxes to avoid cluttering up the UI.

With the previous removal of TBR=, it was confusing because people hit "revert" and might close the page and the revert never lands. With the proposal above, I don't think there's much confusion. The reverter has the same flow (just hit one button). It's just sometimes it takes an hour while other times it takes a few minutes. Devs are already trained that CQ takes variable amount of time, depending on how queued it is, if the tree is closed, how many files their change touches etc..

The confusion would result from the revert CL not landing immediately. I personally would rather click through an extra dialog or read an info message telling me what is going to happen, than having my expected keywords be automatically missing and spend time trying to figure out where they went. Leaving the decision to Jason.


 
To unsubscribe from this group and stop receiving emails from it, send an email to infra-dev+unsubscribe@chromium.org.

Sergiy Byelozyorov

unread,
Feb 1, 2015, 1:48:44 AM2/1/15
to Ravi, chromi...@chromium.org, infr...@chromium.org, jrob...@google.com, sh...@chromium.org, ishe...@chromium.org, dch...@chromium.org, msea...@chromium.org, phajd...@chromium.org
On Fri Jan 30 2015 at 6:38:36 PM Ravi <rmi...@chromium.org> wrote:
The confusion would result from the revert CL not landing immediately. I personally would rather click through an extra dialog or read an info message telling me what is going to happen, than having my expected keywords be automatically missing and spend time trying to figure out where they went. Leaving the decision to Jason.

+1
 
To unsubscribe from this group and stop receiving emails from it, send an email to infra-dev+...@chromium.org.

To post to this group, send email to infr...@chromium.org.

Timur Iskhodzhanov

unread,
Feb 2, 2015, 5:56:37 AM2/2/15
to j...@chromium.org, Jeffrey Yasskin, thestig, zha...@chromium.org, chromium-dev
Hi Lei, Qin,

Can you please confirm that avoiding NOTRY=true doesn't make memory sheriffs less productive?
--
Tim

Timur Iskhodzhanov

unread,
Feb 2, 2015, 9:03:42 AM2/2/15
to j...@chromium.org, Jeffrey Yasskin, thestig, zha...@chromium.org, chromium-dev
Out of curiosity, I didn't put NOTRY=true on my recent change
https://codereview.chromium.org/886073003
and it took 35 minutes to get through CQ.
It only affects one FYI configuration (ASan for Windows), yet CQ has made a build on Win8 builder and ran all tests on Linux-ASan.

A similar change with NOTRY=true makes it through CQ in less than a minute, e.g.

That being said, I don't agree with
"-OWNERS files change: try bots will early exit, finishing in a few minutes
-changing scripts that are used by builders not on CQ: ditto"

John Abd-El-Malek

unread,
Feb 2, 2015, 10:45:09 AM2/2/15
to Timur Iskhodzhanov, Jeffrey Yasskin, thestig, zha...@chromium.org, chromium-dev
On Mon, Feb 2, 2015 at 6:03 AM, Timur Iskhodzhanov <timu...@chromium.org> wrote:
Out of curiosity, I didn't put NOTRY=true on my recent change
https://codereview.chromium.org/886073003
and it took 35 minutes to get through CQ.
It only affects one FYI configuration (ASan for Windows), yet CQ has made a build on Win8 builder and ran all tests on Linux-ASan.

Hey Timur, thanks for checking. I have filed bugs on all these issues (some were already known).



Is 35 minutes really slowing down the memory sheriffs? Compared with 6 months ago, median CQ time was 4 hours. Now Linux ASAN runs on CQ, and the other ASAN bots on the main waterfall cycle in < 30 minutes compared to up 4 hours before.

There's a lot to be gained to having everyone use the CQ without skipping it, even if in some cases things are slowed down by half an hour.

Timur Iskhodzhanov

unread,
Feb 2, 2015, 10:49:52 AM2/2/15
to John Abd-El-Malek, Jeffrey Yasskin, thestig, zha...@chromium.org, chromium-dev
On Mon Feb 02 2015 at 6:44:34 PM John Abd-El-Malek <j...@chromium.org> wrote:
On Mon, Feb 2, 2015 at 6:03 AM, Timur Iskhodzhanov <timu...@chromium.org> wrote:
Out of curiosity, I didn't put NOTRY=true on my recent change
https://codereview.chromium.org/886073003
and it took 35 minutes to get through CQ.
It only affects one FYI configuration (ASan for Windows), yet CQ has made a build on Win8 builder and ran all tests on Linux-ASan.


Thanks!
 
Is 35 minutes really slowing down the memory sheriffs? Compared with 6 months ago, median CQ time was 4 hours. Now Linux ASAN runs on CQ, and the other ASAN bots on the main waterfall cycle in < 30 minutes compared to up 4 hours before.

Sorry for the naming confusion [I was against this naming], but memory sheriffs are not looking at the Memory waterfall [ASan,LSan].

Instead, they look at Valgrind, MSan and TSan bots.  They are much slower than ASan and there's no point in waiting even 30 minutes CQ when you can locally try new suppressions with a cycle time of 1 second [search for waterfall.sh here:

John Abd-El-Malek

unread,
Feb 2, 2015, 10:53:26 AM2/2/15
to Timur Iskhodzhanov, Jeffrey Yasskin, thestig, zha...@chromium.org, chromium-dev
On Mon, Feb 2, 2015 at 7:49 AM, Timur Iskhodzhanov <timu...@chromium.org> wrote:


On Mon Feb 02 2015 at 6:44:34 PM John Abd-El-Malek <j...@chromium.org> wrote:
On Mon, Feb 2, 2015 at 6:03 AM, Timur Iskhodzhanov <timu...@chromium.org> wrote:
Out of curiosity, I didn't put NOTRY=true on my recent change
https://codereview.chromium.org/886073003
and it took 35 minutes to get through CQ.
It only affects one FYI configuration (ASan for Windows), yet CQ has made a build on Win8 builder and ran all tests on Linux-ASan.


Thanks!
 
Is 35 minutes really slowing down the memory sheriffs? Compared with 6 months ago, median CQ time was 4 hours. Now Linux ASAN runs on CQ, and the other ASAN bots on the main waterfall cycle in < 30 minutes compared to up 4 hours before.

Sorry for the naming confusion [I was against this naming], but memory sheriffs are not looking at the Memory waterfall [ASan,LSan].

Instead, they look at Valgrind, MSan and TSan bots.  They are much slower than ASan and there's no point

The point is that there's a large advantage to having devs trained to never skip the CQ's tests.
 
in waiting even 30 minutes CQ when you can locally try new suppressions with a cycle time of 1 second [search for waterfall.sh here:

Well, if they're much slower than 30 minutes, then by the time a memory sheriff notices a failure and makes a cl, it would have missed the next build anyways. Since that takes >> 30 minutes, it seems a change landing properly through the CQ will end up in the same buildrun as if it had skipped the CQ.

Nico Weber

unread,
Feb 2, 2015, 11:26:22 AM2/2/15
to Timur Iskhodzhanov, John Abd-El-Malek, Jeffrey Yasskin, thestig, zha...@chromium.org, chromium-dev
On Mon, Feb 2, 2015 at 7:49 AM, Timur Iskhodzhanov <timu...@chromium.org> wrote:


On Mon Feb 02 2015 at 6:44:34 PM John Abd-El-Malek <j...@chromium.org> wrote:
On Mon, Feb 2, 2015 at 6:03 AM, Timur Iskhodzhanov <timu...@chromium.org> wrote:
Out of curiosity, I didn't put NOTRY=true on my recent change
https://codereview.chromium.org/886073003
and it took 35 minutes to get through CQ.
It only affects one FYI configuration (ASan for Windows), yet CQ has made a build on Win8 builder and ran all tests on Linux-ASan.


Thanks!
 
Is 35 minutes really slowing down the memory sheriffs? Compared with 6 months ago, median CQ time was 4 hours. Now Linux ASAN runs on CQ, and the other ASAN bots on the main waterfall cycle in < 30 minutes compared to up 4 hours before.

Sorry for the naming confusion [I was against this naming], but memory sheriffs are not looking at the Memory waterfall [ASan,LSan].

Instead, they look at Valgrind, MSan and TSan bots.  They are much slower than ASan and there's no point in waiting even 30 minutes CQ when you can locally try new suppressions with a cycle time of 1 second [search for waterfall.sh here:

I want to second this. Memory suppression changes only touch text files that can't affect any main waterfall bots, and it's work you do to get bots back to green. Adding 30 minutes of overhead for something that adds 0 value doesn't sound like a good trade.

John Abd-El-Malek

unread,
Feb 2, 2015, 11:36:53 AM2/2/15
to Nico Weber, Timur Iskhodzhanov, Jeffrey Yasskin, thestig, zha...@chromium.org, chromium-dev
To be clear, the fact that the change took 30 minutes is a regression. The system should automatically do minimal work for a change like this. If there's a bug, we should fix that instead of working around it by skipping tests. Also per my previous email that may have crossed with this, given how slow the memory bots are, the difference between 0 and 30 minutes won't change things.

Sergey Matveev

unread,
Feb 2, 2015, 11:39:06 AM2/2/15
to Nico Weber, Timur Iskhodzhanov, John Abd-El-Malek, Jeffrey Yasskin, thestig, zha...@chromium.org, chromium-dev
+1. If we don't want to encourage NOTRY=true then we should make suppression files exempt from CQ.

--

Nico Weber

unread,
Feb 2, 2015, 12:19:49 PM2/2/15
to John Abd-El-Malek, Timur Iskhodzhanov, Jeffrey Yasskin, thestig, zha...@chromium.org, chromium-dev
Since the suppression files affect nothing but the memory bots, there are no tests that are skipped. The only thing this skips are bot config regressions. (And these aren't all that rare, as we all know.)
 
Also per my previous email that may have crossed with this, given how slow the memory bots are, the difference between 0 and 30 minutes won't change things.

For me, this is more about cognitive overhead than about latency: If I come in and there are 5 things broken, it's a good workflow to me to look at the first thing, file bug, land suppression, repeat until I think everthing's addressed, then check back in a few hours if this actually worked. If there's uncertainty about my suppressions landing, I now have to check up on the suppressions, and maybe they won't make it all in when the next memory bot build kicks off, then I'll have to wait two cycles to see if I addressed everything, etc. All this for no benefit to anyone, as suppression files aren't used by anything but the memory bots.

On the other hand, since reverting is still instant, I suppose this encourages reverting on memory errors instead of landing suppressions, which is probably a good thing anyhow.

Thiago Farina

unread,
Feb 2, 2015, 9:53:58 PM2/2/15
to John Abd-El-Malek, Nico Weber, Timur Iskhodzhanov, Jeffrey Yasskin, thestig, zha...@chromium.org, chromium-dev
Is analyze.py an internal script? Because I couldn't find it with cs.chromium.org.

The analyze step just call 'build/gyp_chromium --analyzer'.

-- 
Thiago Farina

Nico Weber

unread,
Feb 2, 2015, 10:06:04 PM2/2/15
to Thiago Farina, John Abd-El-Malek, Timur Iskhodzhanov, Jeffrey Yasskin, thestig, zha...@chromium.org, chromium-dev
The analyze step is implemented by this function https://code.google.com/p/chromium/codesearch#chromium/build/scripts/slave/recipe_modules/chromium_tests/api.py&l=540 calling https://code.google.com/p/chromium/codesearch#chromium/build/scripts/slave/recipe_modules/filter/api.py&l=73 which runs build/gyp_chromium --analyzer (as you say). I don't think there's another analyze.py script.

Derek Bruening

unread,
Feb 3, 2015, 11:40:40 AM2/3/15
to chromi...@chromium.org, timu...@chromium.org, j...@chromium.org, jyas...@chromium.org, the...@chromium.org, zha...@chromium.org
On Monday, February 2, 2015 at 11:26:22 AM UTC-5, Nico Weber wrote:
On Mon, Feb 2, 2015 at 7:49 AM, Timur Iskhodzhanov wrote:

On Mon Feb 02 2015 at 6:44:34 PM John Abd-El-Malek wrote:

On Mon, Feb 2, 2015 at 6:03 AM, Timur Iskhodzhanov wrote:
Out of curiosity, I didn't put NOTRY=true on my recent change
https://codereview.chromium.org/886073003
and it took 35 minutes to get through CQ.
It only affects one FYI configuration (ASan for Windows), yet CQ has made a build on Win8 builder and ran all tests on Linux-ASan.


Thanks!
 
Is 35 minutes really slowing down the memory sheriffs? Compared with 6 months ago, median CQ time was 4 hours. Now Linux ASAN runs on CQ, and the other ASAN bots on the main waterfall cycle in < 30 minutes compared to up 4 hours before.

Sorry for the naming confusion [I was against this naming], but memory sheriffs are not looking at the Memory waterfall [ASan,LSan].

Instead, they look at Valgrind, MSan and TSan bots.  They are much slower than ASan and there's no point in waiting even 30 minutes CQ when you can locally try new suppressions with a cycle time of 1 second [search for waterfall.sh here:
I want to second this. Memory suppression changes only touch text files that can't affect any main waterfall bots, and it's work you do to get bots back to green. Adding 30 minutes of overhead for something that adds 0 value doesn't sound like a good trade.

 +1.  I don't buy the argument that memory sheriffs using NOTRY=true on suppression changes somehow makes them or anyone else more likely to use NOTRY=true on other code changes.  Suppression changes are part of a distinct workflow.

John Abd-El-Malek

unread,
Feb 3, 2015, 12:34:59 PM2/3/15
to Derek Bruening, chromium-dev, Timur Iskhodzhanov, Jeffrey Yasskin, Lei Zhang, zha...@chromium.org
Eventually, we could get rid of NOTRY=true except through the revert UI.

If suppression changes land in 5 or 10 minutes, would any of the memory sheriffs care? It used to be 5 minutes for no-op changes, but there have been some regressions and they'll be fixed.

Dan Beam

unread,
Feb 3, 2015, 2:07:16 PM2/3/15
to brue...@chromium.org, Chromium-dev, timu...@chromium.org, John Abd-El-Malek, Jeffrey Yasskin, Lei Zhang, zha...@chromium.org
Just to clarify: we're only proposing using NOTRY=true for *adding* suppressions, right?  Because removing them can certainly still break stuff.

-- Dan 

Jason Robbins

unread,
Feb 3, 2015, 4:23:11 PM2/3/15
to Sergiy Byelozyorov, Ravi, Chromium-dev, infr...@chromium.org, Scott Hess, Ilya Sherman, dch...@chromium.org, msea...@chromium.org, Paweł Hajdan, Jr.
My first thought about it is that I agree that users should not be uncertain about the affect of a revert, and that the dialog for doing a revert has some room to add information that could clarify what will happen.  

I am thinking that we could just make all the revert options more explicit.  E.g., in ASCII-art:

Reason for revert:
  [text entry box]

NOPRESUBMIT       [X]
NOTREECHECKS   [X]
NOTRY                    [_]
BUG:                        [________]


I'm thinking that the rule for NOTRY on reverts should just be that it defaults to false because (ideally) a revert should still pass tests. The user doing the revert can get in the habit of checking it by clicking on it if they need it (e.g., to quickly roll back a series of changes).  If we later put in a more tricky rule for the NOTRY default, the user will still be able to see it and override it before pressing the submit button.

Thanks,
jason!

Timur Iskhodzhanov

unread,
Feb 4, 2015, 5:27:02 AM2/4/15
to Dan Beam, brue...@chromium.org, Chromium-dev, John Abd-El-Malek, Jeffrey Yasskin, Lei Zhang, zha...@chromium.org
Even though by removing suppressions one may break an MFYI bot, such bots are still not on CQ.  Using NOTRY=true doesn't seem to help here either.  Please note we (memory sheriffs) check if a suppression is dead by using http://chromium-build-logs.appspot.com/, so it's highly unlikely a change that only removes a suppression would break anything.
Changes that both fix a bug and remove a corresponding suppression are run through CQ as they change code.

Nico Weber

unread,
Feb 4, 2015, 11:55:03 AM2/4/15
to John Abd-El-Malek, Derek Bruening, chromium-dev, Timur Iskhodzhanov, Jeffrey Yasskin, Lei Zhang, zha...@chromium.org
It'd still be distracting for benefit imho. (33% of memory sheriffs cared enough that they bothered to chime in on this thread.)
 
It used to be 5 minutes for no-op changes, but there have been some regressions and they'll be fixed.

What's the plan for regular sheriffs? If someone checks in something that causes a failure on a main waterfall builder not covered by the cq that has an obvious fix, will fixing them (instead of reverting) no longer be allowed?

John Abd-El-Malek

unread,
Feb 4, 2015, 12:17:01 PM2/4/15
to Nico Weber, Derek Bruening, chromium-dev, Timur Iskhodzhanov, Jeffrey Yasskin, Lei Zhang, zha...@chromium.org
Well to be fair they chimed in that 30m vs a a minute is a lot. I'm asking if it's 5.

 
It used to be 5 minutes for no-op changes, but there have been some regressions and they'll be fixed.

What's the plan for regular sheriffs? If someone checks in something that causes a failure on a main waterfall builder not covered by the cq that has an obvious fix, will fixing them (instead of reverting) no longer be allowed?

I don't think no one has stated this; just avoiding using NOTRY=true. In the case you describe, since it's not on the CQ there's no harm in having a fix go through the normal trybots to ensure it doesn't break anything else. Even trivial fixes sometimes have unintended consequences.

To give a picture of the state of things, since even I was surprised by this data. Basically improvements to closely matching the CQ and main waterfall have had a lot of positive effects. I manually went through a week's worth of data (all I could find by looking at last 200 builds) on main waterfall breakages. I didn't find any breakages because of debug vs release. XP broke once because of a new dependency, which is the most common breakage. There's a bug filed on adding one telemetry test on CQ for it, which would catch these breakages. Vista broke twice, which is much more than normal, but that was because of new tests being swarmed. Mac 10.6/10.9, Linux 32 bit, Ozone didn't break. chromium waterfall broke once, and that is being addressed by adding a Linux clobber builder to CQ. chromium.chrome is planned to be removed once internal waterfall shows up on sheriff-o-matic. On chromium.memory, mac_asan broke once while linux_asan broke twice.

Granted, this is one week's worth of data, and some folks might hit problems more than others because they touch build-specific stuff. However, I think even then landing fixes to configs not on the main waterfall should go through the CQ. By definition, the breakage isn't affecting almost all devs. However, there's a chance that a trivial fix could affect them all if it's not tested, so it's not worth the risk.

Nico Weber

unread,
Feb 4, 2015, 1:35:23 PM2/4/15
to John Abd-El-Malek, Derek Bruening, chromium-dev, Timur Iskhodzhanov, Jeffrey Yasskin, Lei Zhang, zha...@chromium.org
Ok, if we think that no NOTRY=true for main waterfall sheriffs should work, then it makes sense to me to use the same bar for other sheriffs. I hope the "5-10 minutes to land" will regress less often once than I'd intuitively expect.

Daniel Bratell

unread,
Feb 5, 2015, 5:02:44 AM2/5/15
to Ravi, chromi...@chromium.org, Sergiy Byelozyorov, infr...@chromium.org, jrob...@google.com, sh...@chromium.org, ishe...@chromium.org, dch...@chromium.org, msea...@chromium.org, phajd...@chromium.org
On Sun, 01 Feb 2015 07:48:05 +0100, Sergiy Byelozyorov <ser...@chromium.org> wrote:



On Fri Jan 30 2015 at 6:38:36 PM Ravi <rmi...@chromium.org> wrote:
The confusion would result from the revert CL not landing immediately. I personally would rather click through an extra dialog or read an info message telling me what is going to happen, than having my expected keywords be automatically missing and spend time trying to figure out where they went. Leaving the decision to Jason.

+1

It sounds to me like the problem you (rmistry and sergeiyb and most likely many others) are seeing is that you don't trust the CQ and don't know how to check what it's working on and prioritizing at the moment?

Is there a summary page listing ongoing integrations? I think I might have seen one at some point but I couldn't possibly remember where and when.

Is it possible to boost one integration so that it happens faster at the expense of other integrations?

/Daniel

Ravi

unread,
Feb 5, 2015, 7:28:08 AM2/5/15
to chromi...@chromium.org, rmi...@chromium.org, ser...@chromium.org, infr...@chromium.org, jrob...@google.com, sh...@chromium.org, ishe...@chromium.org, dch...@chromium.org, msea...@chromium.org, phajd...@chromium.org


On Thursday, February 5, 2015 at 5:02:44 AM UTC-5, Daniel Bratell wrote:
On Sun, 01 Feb 2015 07:48:05 +0100, Sergiy Byelozyorov <ser...@chromium.org> wrote:



On Fri Jan 30 2015 at 6:38:36 PM Ravi <rmi...@chromium.org> wrote:
The confusion would result from the revert CL not landing immediately. I personally would rather click through an extra dialog or read an info message telling me what is going to happen, than having my expected keywords be automatically missing and spend time trying to figure out where they went. Leaving the decision to Jason.

+1

It sounds to me like the problem you (rmistry and sergeiyb and most likely many others) are seeing is that you don't trust the CQ and don't know how to check what it's working on and prioritizing at the moment?

My comments were to ask for transparency into what the revert button was going to do if we started taking away expected CQ keywords like NOTRY. Jason's proposal sufficiently addressed my concerns.
There is no distrust of the CQ :) the CQ has been fast and reliable because of the great work done by the infra CQ team.

These are the pages I use to see what the CQ is doing (there may be other useful ones that the infra team can point to):
You can also substitute chromium in the links above for another project to see its CQ (eg: skia).

John Abd-El-Malek

unread,
Feb 5, 2015, 10:19:47 AM2/5/15
to Jason Robbins, Sergiy Byelozyorov, Ravi, Chromium-dev, infr...@chromium.org, Scott Hess, Ilya Sherman, Daniel Cheng, Mark Seaborn, Paweł Hajdan, Jr.
On Tue, Feb 3, 2015 at 1:22 PM, 'Jason Robbins' via infra-dev <infr...@chromium.org> wrote:
My first thought about it is that I agree that users should not be uncertain about the affect of a revert, and that the dialog for doing a revert has some room to add information that could clarify what will happen.  

I am thinking that we could just make all the revert options more explicit.  E.g., in ASCII-art:

Reason for revert:
  [text entry box]

NOPRESUBMIT       [X]
NOTREECHECKS   [X]
NOTRY                    [_]
BUG:                        [________]


I'm thinking that the rule for NOTRY on reverts should just be that it defaults to false because (ideally) a revert should still pass tests. The user doing the revert can get in the habit of checking it by clicking on it if they need it (e.g., to quickly roll back a series of changes).  If we later put in a more tricky rule for the NOTRY default, the user will still be able to see it and override it before pressing the submit button.

Can we leave the 3 checkboxes unchecked and disable them if a commit is older than a certain time (6 or 12 hours?). We can put a red message saying something to the effect that a change is too old to be reverted without going through CQ.
 

Nico Weber

unread,
Feb 10, 2015, 10:43:36 AM2/10/15
to John Abd-El-Malek, Jason Robbins, Sergiy Byelozyorov, Ravi, Chromium-dev, infr...@chromium.org, Scott Hess, Ilya Sherman, Daniel Cheng, Mark Seaborn, Paweł Hajdan, Jr.
I have a question on how the analyze stuff works. Someone on this thread might know the answer to that. https://codereview.chromium.org/898183005/ (an OWNERS file change) got committed almost immediately after I hit "cq" – apparently from the presubmit bot, not by the cq (?). Very fast, cool. However, hitting cq still kicked off builds on 25 bots, and they all think that they should actually compile things (which seems wasteful).

How does the presubmit bot know that it can land this CL, if all the other bots can't even figure out that there's no need to compile anything?

John Abd-El-Malek

unread,
Feb 10, 2015, 10:51:08 AM2/10/15
to Nico Weber, she...@chromium.org, Sergey Berezin, Jason Robbins, Sergiy Byelozyorov, Ravi, Chromium-dev, infr...@chromium.org, Scott Hess, Ilya Sherman, Daniel Cheng, Mark Seaborn, Paweł Hajdan, Jr.
On Tue, Feb 10, 2015 at 7:42 AM, Nico Weber <tha...@chromium.org> wrote:
I have a question on how the analyze stuff works. Someone on this thread might know the answer to that. https://codereview.chromium.org/898183005/ (an OWNERS file change) got committed almost immediately after I hit "cq" – apparently from the presubmit bot, not by the cq (?).

Sheyang/Sergey/Pawel: this is new to me; I would expect that CQ would need the same bots to pass (the full list) independent of how the try was triggered?

I guess I have seen this before where a "git cl try", on Android only file, might only trigger the android boy because of the presubmit filters. However it seems that the CQ verifier should trigger any other trybot in the full list that's not there, instead of just looking that the ones in Rietveld are green.
 
Very fast, cool. However, hitting cq still kicked off builds on 25 bots, and they all think that they should actually compile things (which seems wasteful).

Note that this happened because we have a blacklist of paths that if changed, avoid analyze.py shortcuts. See the "read filter exclusion spec" step in your cl, which lists the list (that contains your changed file).

John Abd-El-Malek

unread,
Feb 10, 2015, 10:57:59 AM2/10/15
to Nico Weber, she...@chromium.org, Sergey Berezin, Jason Robbins, Sergiy Byelozyorov, Ravi, Chromium-dev, infr...@chromium.org, Scott Hess, Ilya Sherman, Daniel Cheng, Mark Seaborn, Paweł Hajdan, Jr.
On Tue, Feb 10, 2015 at 7:50 AM, John Abd-El-Malek <j...@chromium.org> wrote:


On Tue, Feb 10, 2015 at 7:42 AM, Nico Weber <tha...@chromium.org> wrote:
I have a question on how the analyze stuff works. Someone on this thread might know the answer to that. https://codereview.chromium.org/898183005/ (an OWNERS file change) got committed almost immediately after I hit "cq" – apparently from the presubmit bot, not by the cq (?).

Sheyang/Sergey/Pawel: this is new to me; I would expect that CQ would need the same bots to pass (the full list) independent of how the try was triggered?

I guess I have seen this before where a "git cl try", on Android only file, might only trigger the android boy because of the presubmit filters. However it seems that the CQ verifier should trigger any other trybot in the full list that's not there, instead of just looking that the ones in Rietveld are green.

Paweł Hajdan, Jr.

unread,
Feb 16, 2015, 6:05:25 AM2/16/15
to John Abd-El-Malek, Nico Weber, she...@chromium.org, Sergey Berezin, Jason Robbins, Sergiy Byelozyorov, Ravi, Chromium-dev, infr...@chromium.org, Scott Hess, Ilya Sherman, Daniel Cheng, Mark Seaborn
On Tue, Feb 10, 2015 at 4:50 PM, John Abd-El-Malek <j...@chromium.org> wrote:
On Tue, Feb 10, 2015 at 7:42 AM, Nico Weber <tha...@chromium.org> wrote:
I have a question on how the analyze stuff works. Someone on this thread might know the answer to that. https://codereview.chromium.org/898183005/ (an OWNERS file change) got committed almost immediately after I hit "cq" – apparently from the presubmit bot, not by the cq (?).

Sheyang/Sergey/Pawel: this is new to me; I would expect that CQ would need the same bots to pass (the full list) independent of how the try was triggered?

Maybe something else is going on, but all sources available to me say https://codereview.chromium.org/898183005/ was landed manually and not using CQ:

1. Committed patchset #1 (id:1) manually as b2648b98a6826b6e5d0fa7fa072984546901e0ad (presubmit successful).

2. https://chromium-cq-status.appspot.com/patch-status/898183005/1 shows CQ triggered bots but stopped processing the CL very shortly afterwards since the CQ checkbox has been unchecked.

3. Raw CQ logs confirm the above.

I hope that addresses the surprise, and for bots compiling on that CL - consider adding OWNERS to "analyze" blacklist. Note that we still benefit from CQ in that case to run PRESUBMIT on the OWNERS changes.

Paweł

Christian Biesinger

unread,
Mar 4, 2015, 9:45:06 PM3/4/15
to Paweł Hajdan, Jr., John Abd-El-Malek, Nico Weber, she...@chromium.org, Sergey Berezin, Jason Robbins, Sergiy Byelozyorov, Ravi, Chromium-dev, infr...@chromium.org, Scott Hess, Ilya Sherman, Daniel Cheng, Mark Seaborn
Hi there,

so I just did use NOTRY=true for
https://codereview.chromium.org/978123002 because it seemed like the
analyze step wasn't working -- gdb.py is not part of the build in any
way, yet bots like
http://build.chromium.org/p/tryserver.blink/builders/win_blink_rel/builds/53612
were compiling everything and
http://build.chromium.org/p/tryserver.blink/builders/linux_blink_rel/builds/50825
was running tests. Is that expected?

-christian

Dirk Pranke

unread,
Mar 4, 2015, 9:54:18 PM3/4/15
to Christian Biesinger, Paweł Hajdan, Jr., John Abd-El-Malek, Nico Weber, she...@chromium.org, Sergey Berezin, Jason Robbins, Sergiy Byelozyorov, Ravi, Chromium-dev, infr...@chromium.org, Scott Hess, Ilya Sherman, Daniel Cheng, Mark Seaborn
That's a Blink change, and I don't think analyze is wired up on any of the Blink trybots yet, so yes, I would expect to see what you're seeing. John's logic therefore doesn't apply so much to Blink changes.

However, for your change I also see no compelling reason to do NOTRY=true; it's true that your change won't break anything, but there's also no reason to rush the change into landing and no real reason not to skip the normal workflow. We' re not so short on resources that it's worth trying to optimize things here.

-- Dirk

-- Dirk

Reply all
Reply to author
Forward
0 new messages