Have the CQ commit when the tree is "throttled"

77 views
Skip to first unread message

Marc-Antoine Ruel

unread,
Nov 15, 2012, 8:36:47 PM11/15/12
to chromium-dev
https://codereview.chromium.org/11280022 makes the CQ commits when the tree is throttled.

Since the CQ already checks the patches, the chances of further breaking a partially sick tree is much lower so it is an useful fallback mode when nobody is sheriff but the tree is a partially broken state or things are simply out of control for too long.

Note that the CQ already limits its rate of commits: it burst 3 commits every 4 minutes*.

* These twos numbers where carefully selected after a fair dice roll.

Thanks,

M-A

Dirk Pranke

unread,
Nov 15, 2012, 8:52:31 PM11/15/12
to maruel...@google.com, chromium-dev
It's been a while since I was a sheriff, but it seemed to me that the
whole point of throttling was to (a) give the bots a chance to catch
up, so that fewer changes were in flight at once and (b) help limit or
control which parts of the tree might be in flux, in order to better
triage certain kinds of failures.

It may be that the CQ already checks the patches, but does it really
give us more safety than the fact that all changes are supposed to
have been run through the try servers first? (This is a real question,
maybe it does).

It seems like this defeats the point of throttling. Are there
advantages to making this change I'm not seeing?

-- Dirk
> --
> Chromium Developers mailing list: chromi...@chromium.org
> View archives, change email options, or unsubscribe:
> http://groups.google.com/a/chromium.org/group/chromium-dev

Dana Jansens

unread,
Nov 15, 2012, 8:54:18 PM11/15/12
to dpr...@google.com, maruel...@google.com, chromium-dev
On Thu, Nov 15, 2012 at 8:52 PM, Dirk Pranke <dpr...@chromium.org> wrote:
> It's been a while since I was a sheriff, but it seemed to me that the
> whole point of throttling was to (a) give the bots a chance to catch
> up, so that fewer changes were in flight at once and (b) help limit or
> control which parts of the tree might be in flux, in order to better
> triage certain kinds of failures.
>
> It may be that the CQ already checks the patches, but does it really
> give us more safety than the fact that all changes are supposed to
> have been run through the try servers first? (This is a real question,
> maybe it does).

Especially since CQ does not run all the bots on the waterfall, such
as win_aura, linux_aura.

Marc-Antoine Ruel

unread,
Nov 15, 2012, 8:56:48 PM11/15/12
to Dana Jansens, dpr...@google.com, maruel...@google.com, chromium-dev


2012/11/15 Dana Jansens <dan...@chromium.org>

On Thu, Nov 15, 2012 at 8:52 PM, Dirk Pranke <dpr...@chromium.org> wrote:
> It's been a while since I was a sheriff, but it seemed to me that the
> whole point of throttling was to (a) give the bots a chance to catch
> up, so that fewer changes were in flight at once and (b) help limit or
> control which parts of the tree might be in flux, in order to better
> triage certain kinds of failures.
 
(a) is keeping the tree closed.
(b) is also keeping the tree closed and having people commit manually.

> It may be that the CQ already checks the patches, but does it really
> give us more safety than the fact that all changes are supposed to
> have been run through the try servers first? (This is a real question,
> maybe it does).

It does.

Especially since CQ does not run all the bots on the waterfall, such
as win_aura, linux_aura.

They'll be added real soon.
 
> It seems like this defeats the point of throttling. Are there
> advantages to making this change I'm not seeing?

Yes, when the tree is not closely monitored. Think of the holidays for example.

M-A

John Abd-El-Malek

unread,
Nov 15, 2012, 8:57:47 PM11/15/12
to dpr...@google.com, maruel...@google.com, chromium-dev
On Thu, Nov 15, 2012 at 5:52 PM, Dirk Pranke <dpr...@chromium.org> wrote:
It's been a while since I was a sheriff, but it seemed to me that the
whole point of throttling was to (a) give the bots a chance to catch
up, so that fewer changes were in flight at once and (b) help limit or
control which parts of the tree might be in flux, in order to better
triage certain kinds of failures.

It may be that the CQ already checks the patches, but does it really
give us more safety than the fact that all changes are supposed to
have been run through the try servers first? (This is a real question,
maybe it does).

This would be my concern as well. If we want to allow changes that have passed the tests, then why would a developer who has run trybots successfully not be able to commit directly as well?

My experience sheriffing and seeing other sheriff is that when the tree is throttled, the sheriff wants to vet every change that goes in first. This would go against that. Especially since trybots/CQ don't run every configuration that can break the tree, which is why some human judgement of whether a change is safe to land or not is helpful.

Dirk Pranke

unread,
Nov 15, 2012, 9:03:57 PM11/15/12
to Marc-Antoine Ruel, Dana Jansens, maruel...@google.com, chromium-dev
On Thu, Nov 15, 2012 at 5:56 PM, Marc-Antoine Ruel <mar...@google.com> wrote:
>
>
> 2012/11/15 Dana Jansens <dan...@chromium.org>
>>
>> On Thu, Nov 15, 2012 at 8:52 PM, Dirk Pranke <dpr...@chromium.org> wrote:
>> > It's been a while since I was a sheriff, but it seemed to me that the
>> > whole point of throttling was to (a) give the bots a chance to catch
>> > up, so that fewer changes were in flight at once and (b) help limit or
>> > control which parts of the tree might be in flux, in order to better
>> > triage certain kinds of failures.
>
>
> (a) is keeping the tree closed.
> (b) is also keeping the tree closed and having people commit manually.
>

I disagree. If the tree is closed, I normally interpret that as
nothing should be landed unless it is fixing the tree; that is a
higher bar than just throttling.

>> > It may be that the CQ already checks the patches, but does it really
>> > give us more safety than the fact that all changes are supposed to
>> > have been run through the try servers first? (This is a real question,
>> > maybe it does).
>
>
> It does.
>

Perhaps you could say what they are?

>> Especially since CQ does not run all the bots on the waterfall, such
>> as win_aura, linux_aura.
>
>
> They'll be added real soon.
>
>>
>> > It seems like this defeats the point of throttling. Are there
>> > advantages to making this change I'm not seeing?
>
>
> Yes, when the tree is not closely monitored. Think of the holidays for
> example.
>

It seems to me you're describing an open tree, not a throttled tree.

-- Dirk

Ojan Vafai

unread,
Nov 16, 2012, 12:23:43 AM11/16/12
to Dirk Pranke, Marc-Antoine Ruel, Dana Jansens, maruel...@google.com, chromium-dev
Could we walk a middle ground to start with here and still avoid committing to a closed tree during peak commit hours (i.e. weekdays 11am-7pm PST)? During less busy hours, the extra commits are less likely to make cleaning up the tree more difficult.


John Abd-El-Malek

unread,
Nov 16, 2012, 2:06:18 AM11/16/12
to ojan+...@google.com, Dirk Pranke, Marc-Antoine Ruel, Dana Jansens, maruel...@google.com, chromium-dev
I don't see how off-peak changes this. There'll have to be a sheriff who's throttling the tree anyways, so why wouldn't they want to see which changes land?

Isaac Levy

unread,
Nov 16, 2012, 2:11:56 AM11/16/12
to jabde...@google.com, ojan+...@google.com, Dirk Pranke, Marc-Antoine Ruel, Dana Jansens, maruel...@google.com, chromium-dev
How about adding a special annotation to the CL description to mean
sheriff approved?

NOTRY=true <- CQ runs no trybots (only presubmit, essentially a dcommit)
NOTRY=approved <- all of the above + commit on throttled tree

Note the former already exists.
-Isaac

John Abd-El-Malek

unread,
Nov 16, 2012, 2:51:22 AM11/16/12
to Isaac Levy, ojan+...@google.com, Dirk Pranke, Marc-Antoine Ruel, Dana Jansens, maruel...@google.com, chromium-dev
Is this s

Viet-Trung Luu

unread,
Nov 16, 2012, 2:55:22 AM11/16/12
to dpr...@google.com, Marc-Antoine Ruel, Dana Jansens, maruel...@google.com, chromium-dev
+1 to Dirk's concerns.

As sheriff, I'd throttle to let some bots catch up. I might entertain commit requests to a throttled tree, but I'd vet them (e.g., if I'm waiting for Win bots to catch up, I might let a Mac-only change in).

I don't see why the CQ should get more privileges with respect to a throttled tree. A manual commit may/should also come with a complete set of try jobs, and moreover the committer is more likely to be online and responsive!

That we throttle the tree at all (instead of closing) is essentially informative; for all intents and purposes, the tree is closed to commits that aren't approved. If the meaning of "throttled" is unclear, we could always just go back to having only two statuses: open or closed.

(Also, I've never seen the tree left throttled during off hours or over the holidays. The typical situation is either an ugly open tree, or a tree that's been closed for a long time. So I don't see how the proposed change helps.)

- Trung

On Thu, Nov 15, 2012 at 6:03 PM, Dirk Pranke <dpr...@google.com> wrote:

John Abd-El-Malek

unread,
Nov 16, 2012, 2:55:51 AM11/16/12
to Isaac Levy, ojan+...@google.com, Dirk Pranke, Marc-Antoine Ruel, Dana Jansens, maruel...@google.com, chromium-dev
(hit send accidentally)

Is this a common enough issue that it needs a solution? My feeling so far is that tree throttling is something that should be minimized, since it just leads to a whole slew of checkins after the tree is opened. IMO the biggest thing we can do to help the tree stay green is ensuring no untested changes land (happens too often unfortunately. if each developer breaks the tree once a week because of this, then the tree would always be closed) and that try/CQ jobs test everything that can break the tree.

On Thu, Nov 15, 2012 at 11:11 PM, Isaac Levy <il...@chromium.org> wrote:

Ojan Vafai

unread,
Nov 16, 2012, 12:19:23 PM11/16/12
to John Abd-El-Malek, ojan+...@google.com, Dirk Pranke, Marc-Antoine Ruel, Dana Jansens, maruel...@google.com, chromium-dev
One of the problems with committing to a closed tree is that it makes it hard to diagnose where the failures are coming from when the bots aren't caught up. During off-peak hours, the bots are more likely to be caught up, so the commits are less likely to cause extra noise that makes diagnosing the failures more difficult.

Better than looking at time of day would be to look at how many pending jobs the bots have and use that to decide whether it's OK to commit.

There's no sheriff on weekends and we don't actually have 24/5 coverage during the week, right?

On Thu, Nov 15, 2012 at 11:06 PM, John Abd-El-Malek <j...@chromium.org> wrote:

John Abd-El-Malek

unread,
Nov 16, 2012, 2:07:27 PM11/16/12
to Ojan Vafai, ojan+...@google.com, Dirk Pranke, Marc-Antoine Ruel, Dana Jansens, maruel...@google.com, chromium-dev
On Fri, Nov 16, 2012 at 9:19 AM, Ojan Vafai <oj...@google.com> wrote:
One of the problems with committing to a closed tree is that it makes it hard to diagnose where the failures are coming from when the bots aren't caught up. During off-peak hours, the bots are more likely to be caught up,

In that case, why would the tree be throttled?
 
so the commits are less likely to cause extra noise that makes diagnosing the failures more difficult.

Better than looking at time of day would be to look at how many pending jobs the bots have and use that to decide whether it's OK to commit.

There's no sheriff on weekends and we don't actually have 24/5 coverage during the week, right?

Right. That's why during those times, the tree shouldn't really be throttled. If it is, then there's an acting sheriff who actively throttled it, and they would want to control what goes in.

Jochen Eisinger

unread,
Nov 16, 2012, 2:26:58 PM11/16/12
to jabde...@google.com, Ojan Vafai, ojan+...@google.com, Dirk Pranke, Marc-Antoine Ruel, Dana Jansens, maruel...@google.com, chromium-dev
On Fri, Nov 16, 2012 at 8:07 PM, John Abd-El-Malek <j...@chromium.org> wrote:
>
>
> On Fri, Nov 16, 2012 at 9:19 AM, Ojan Vafai <oj...@google.com> wrote:
>>
>> One of the problems with committing to a closed tree is that it makes it
>> hard to diagnose where the failures are coming from when the bots aren't
>> caught up. During off-peak hours, the bots are more likely to be caught up,
>
>
> In that case, why would the tree be throttled?
>
>>
>> so the commits are less likely to cause extra noise that makes diagnosing
>> the failures more difficult.
>>
>> Better than looking at time of day would be to look at how many pending
>> jobs the bots have and use that to decide whether it's OK to commit.
>>
>> There's no sheriff on weekends and we don't actually have 24/5 coverage
>> during the week, right?
>
>
> Right. That's why during those times, the tree shouldn't really be
> throttled. If it is, then there's an acting sheriff who actively throttled
> it, and they would want to control what goes in.

We're actually pretty close to 24/5 - even if there's no sheriff
scheduled, we have always troopers scheduled, and committers working
that want to get their patches in, so they keep the tree green and
open.

However, that still means the tree shouldn't be throttled without a
sheriff that throttles it.

-jochen
Reply all
Reply to author
Forward
0 new messages