Proposal: add slimming paint v2 trybot to presubmit for changes in cc

13 views
Skip to first unread message

wko...@chromium.org

unread,
Apr 3, 2017, 3:03:08 PM4/3/17
to Graphics-dev, Philip Rogers, Chris harrelson
Proposal for discussion: update cc PRESUBMIT.py to auto-add the Linux Slimming Paint v2 trybot to CQ_INCLUDE_TRYBOTS for changes under cc/.

Specific bot: master.tryserver.chromium.linux:linux_layout_tests_slimming_paint_v2

Bot capacity increase required before we do this: http://crbug.com/703478

Backing rationale: http://crbug.com/707281 burned a few days tracking down a cc change that broke composited animations in SPv2. As we work on the last ~20% of functionality for SPv2 to achieve parity I expect we'll see an increasing risk of breaking changes that aren't directly within Blink paint directories where we add this trybot today.

dan...@chromium.org

unread,
Apr 3, 2017, 3:07:50 PM4/3/17
to wko...@chromium.org, Graphics-dev, Philip Rogers, Chris harrelson
I like the idea of running tests.

First thought: Would failing this bot be an FYI or would it block the patch landing?

Second thought: Is the slimming paint bot not running on the waterfall? It's unfortunate to have bots on the waterfall that don't run in the CQ since that can break the waterfall. But then, it's also a problematic situation to have CQ blocking on bots (for some patches only) that don't run on the waterfall, since this allows patches to land and then block CQ for other things.

Third thought: Why is this through presubmit instead of just adding it to the CQ and having analyze decide of the patch is relevant to the bot?

wko...@chromium.org

unread,
Apr 3, 2017, 3:29:26 PM4/3/17
to Graphics-dev, wko...@chromium.org, p...@chromium.org, chri...@chromium.org
On Monday, April 3, 2017 at 12:07:50 PM UTC-7, danakj wrote:
> First thought: Would failing this bot be an FYI or would it block the patch landing?

Thought: start as FYI but to prevent regression we would want to move to blocking once we have gained confidence that that would be a reasonable thing to do. So, what would we need to see to feel confident?

Given we are only adding it for certain subdirectories Blink-side for example, we see occasional regressions today. For example there were multicol changes under WebKit/Source/core/layout that broke the SPv2 trybot builds, we had to notice it was red manually and mark those tests as failing. http://crrev.com/2765013002

pdr@ set up our current try bot and overall test plan, here is the historical doc:

https://docs.google.com/document/d/1QCM912Dr6u38DqyQqd7pxQxDy8FFOoWMMDq7uAXqKdA/view#heading=h.scmfc1jwpawo

In there he mentions setting up add'l SPv2 specific trybots: "When spv2 is closer to launch, additional spv2-specific trybots can be added." I'm not sure exactly what he had in mind. What would be best?



> Second thought: Is the slimming paint bot not running on the waterfall? It's unfortunate to have bots on the waterfall that don't run in the CQ since that can break the waterfall. But then, it's also a problematic situation to have CQ blocking on bots (for some patches only) that don't run on the waterfall, since this allows patches to land and then block CQ for other things.

It's on the tryserver.chromium.linux waterfall here:

https://build.chromium.org/p/tryserver.chromium.linux/waterfall

I think we haven't added it to the fuller CQ because we've been building up more compatibility. We're just now approaching point where broadening likely makes sense, and cc would be an obvious initial candidate, so a bit of a mini capybara type of situation.

> Third thought: Why is this through presubmit instead of just adding it to the CQ and having analyze decide of the patch is relevant to the bot?

See above re: compatibility. I think this would allow us more direct control until we have greater confidence, but also, I know how to add it through presubmit and I don't know what we'd need to do for CQ, so open to thoughts here as well.

Chris Harrelson

unread,
Apr 3, 2017, 3:33:01 PM4/3/17
to Walter, Graphics-dev, Philip Rogers
On Mon, Apr 3, 2017 at 12:29 PM, <wko...@chromium.org> wrote:
On Monday, April 3, 2017 at 12:07:50 PM UTC-7, danakj wrote:
> First thought: Would failing this bot be an FYI or would it block the patch landing?

Thought: start as FYI but to prevent regression we would want to move to blocking once we have gained confidence that that would be a reasonable thing to do. So, what would we need to see to feel confident?

Given we are only adding it for certain subdirectories Blink-side for example, we see occasional regressions today. For example there were multicol changes under WebKit/Source/core/layout that broke the SPv2 trybot builds, we had to notice it was red manually and mark those tests as failing. http://crrev.com/2765013002

pdr@ set up our current try bot and overall test plan, here is the historical doc:

https://docs.google.com/document/d/1QCM912Dr6u38DqyQqd7pxQxDy8FFOoWMMDq7uAXqKdA/view#heading=h.scmfc1jwpawo

In there he mentions setting up add'l SPv2 specific trybots: "When spv2 is closer to launch, additional spv2-specific trybots can be added." I'm not sure exactly what he had in mind. What would be best?

The only reason we have a specific bot is to avoid slowdown of the regular CQ with a lot of extra tests. But since it runs in parallel with the other bots, I'm not sure if it ends up slowing down any commits (assuming we have enough capacity).
 

> Second thought: Is the slimming paint bot not running on the waterfall? It's unfortunate to have bots on the waterfall that don't run in the CQ since that can break the waterfall. But then, it's also a problematic situation to have CQ blocking on bots (for some patches only) that don't run on the waterfall, since this allows patches to land and then block CQ for other things.

It's on the tryserver.chromium.linux waterfall here:

https://build.chromium.org/p/tryserver.chromium.linux/waterfall

I think we haven't added it to the fuller CQ because we've been building up more compatibility. We're just now approaching point where broadening likely makes sense, and cc would be an obvious initial candidate, so a bit of a mini capybara type of situation.

Clarification: it is on the CQ for code touching core/paint/ or platform/graphics/, but not otherwise.

dan...@chromium.org

unread,
Apr 3, 2017, 3:42:55 PM4/3/17
to wko...@chromium.org, Graphics-dev, Philip Rogers, Chris Harrelson
On Mon, Apr 3, 2017 at 3:29 PM, <wko...@chromium.org> wrote:
On Monday, April 3, 2017 at 12:07:50 PM UTC-7, danakj wrote:
> First thought: Would failing this bot be an FYI or would it block the patch landing?

Thought: start as FYI but to prevent regression we would want to move to blocking once we have gained confidence that that would be a reasonable thing to do. So, what would we need to see to feel confident?

Given we are only adding it for certain subdirectories Blink-side for example, we see occasional regressions today. For example there were multicol changes under WebKit/Source/core/layout that broke the SPv2 trybot builds, we had to notice it was red manually and mark those tests as failing. http://crrev.com/2765013002

pdr@ set up our current try bot and overall test plan, here is the historical doc:

https://docs.google.com/document/d/1QCM912Dr6u38DqyQqd7pxQxDy8FFOoWMMDq7uAXqKdA/view#heading=h.scmfc1jwpawo

In there he mentions setting up add'l SPv2 specific trybots: "When spv2 is closer to launch, additional spv2-specific trybots can be added." I'm not sure exactly what he had in mind. What would be best?

> Second thought: Is the slimming paint bot not running on the waterfall? It's unfortunate to have bots on the waterfall that don't run in the CQ since that can break the waterfall. But then, it's also a problematic situation to have CQ blocking on bots (for some patches only) that don't run on the waterfall, since this allows patches to land and then block CQ for other things.

It's on the tryserver.chromium.linux waterfall here:

https://build.chromium.org/p/tryserver.chromium.linux/waterfall

I think we haven't added it to the fuller CQ because we've been building up more compatibility. We're just now approaching point where broadening likely makes sense, and cc would be an obvious initial candidate, so a bit of a mini capybara type of situation.

Cool, given it's on the waterfall, blocking landing if your patch breaks something on this bot seems appropriate. It also looks like the bot only turns red if your patch breaks something that isn't broken without your patch since it's got green runs with failing tests, so that is helpful to making this block CQ.
 

> Third thought: Why is this through presubmit instead of just adding it to the CQ and having analyze decide of the patch is relevant to the bot?

See above re: compatibility. I think this would allow us more direct control until we have greater confidence, but also, I know how to add it through presubmit and I don't know what we'd need to do for CQ, so open to thoughts here as well.

I think cc/PRESUBMIT has the blink tests because analyze didn't exist before. I'm open to adding this bot there too but it's a difficult-to-scale solution. It'd be awesome to learn how to add this to analyze and move both spv2 and blink_rel over to analyze instead of in cc/PRESUBMIT.

+1 to adding the bot to cc/PRESUBMIT from me. Thanks :)

Dirk Pranke

unread,
Apr 3, 2017, 3:50:32 PM4/3/17
to Chris Harrelson, Walter, Graphics-dev, Philip Rogers
A few things ...

First, there is currently *one* machine for that builder, so, yeah, that's way under capacity for anything that you want to be sending builds to at all regularly, let alone as part of a presubmit or on the CQ. We'd normally need to either move it to a bigger pool or the CQ pool itself. But ...

Second, we shouldn't have anything being run in the CQ that isn't also a waterfall (continuous) builder, because it's too hard to expect to keep a tryserver-only config green. So, you need to add a waterfall configuration. But ...

Third, I'm pretty sure I asked this a year ago but I'll ask again: do you really need to run *all* of the layout tests? Because that is indeed quite expensive. And ...

Fourth, adding an extra builder just to run an additional test step is also expensive (since you have to do a full build even though SPv2 is just a runtime flag) so we should try to avoid that if we can. But ...

Fifth, all of this is complicated by the fact that we can't run the layout tests under swarming and hence it's much harder to parallelize that test execution with anything else. *And* part of the reason we use particular directories instead of just relying on `analyze` is because we don't rely on analyze even for the normal layout test case, because the test suites are too slow, because we're not running under swarming. But! ...

Sixth, we're about to start running layout tests under swarming on linux, literally hopefully with CLs landing in the next day or two (so, really, in the next week or two before things stick).

So all of this will hopefully become *much* easier Real Soon Now, and you'll be able to just add a test step into the CQ like any other step. Can we wait for that?

-- Dirk


 

dan...@chromium.org

unread,
Apr 3, 2017, 3:54:57 PM4/3/17
to wko...@chromium.org, Graphics-dev, Philip Rogers, Chris Harrelson
FWIW apparently the way to learn how to change the analyzer is to read https://cs.chromium.org/chromium/src/tools/gn/analyzer.h and/or ask dpranke :)

Dirk Pranke

unread,
Apr 3, 2017, 4:29:51 PM4/3/17
to Dana Jansens, Walter Korman, Graphics-dev, Philip Rogers, Chris Harrelson, Paweł Hajdan Jr.
+phajdan for recipe / what-test-to-run-when-related relevance ...

Well, yeah, but more importantly the answer in this case is that I don't want to change the analyzer, I want to get to a point where we can just make things work Correctly.

Though, OTOH, there is an argument to say that we will always want the functionality to only run a particular test or set of tests on a subset of the changes that we really should run them on, and we should figure out the Right way to do that. And, PRESUBMIT, since it isn't really integrated into the CQ or the rest of the builder configurations, probably isn't the right way to do that :).

-- Dirk

Nico Weber

unread,
Apr 11, 2017, 12:57:33 PM4/11/17
to Dirk Pranke, Dana Jansens, Walter Korman, Graphics-dev, Philip Rogers, Chris Harrelson, Paweł Hajdan Jr.
One of my CLs currently had to wait 10 minutes to get scheduled on this new bot, and it currently has three more pending builds (https://build.chromium.org/p/tryserver.chromium.linux/builders/linux_layout_tests_slimming_paint_v2https://build.chromium.org/p/tryserver.chromium.linux/stats/linux_layout_tests_slimming_paint_v2 claims that the bot takes about 20 minutes to cycle, so it'll add a full 1h20min delay to the third CL that's currently pending. I don't think this bot is backed by enough slaves.

Philip Rogers

unread,
Apr 11, 2017, 1:42:08 PM4/11/17
to Nico Weber, Dirk Pranke, Dana Jansens, Walter Korman, Graphics-dev, Chris Harrelson, Paweł Hajdan Jr.
Hi Nico,

This should have been fixed with https://crbug.com/703478 but it looks like the new capacity wasn't picked up. I've reopened.

--
You received this message because you are subscribed to the Google Groups "Graphics-dev" group.
To unsubscribe from this group and stop receiving emails from it, send an email to graphics-dev+unsubscribe@chromium.org.

Dirk Pranke

unread,
Apr 11, 2017, 3:31:19 PM4/11/17
to Philip Rogers, Nico Weber, Dana Jansens, Walter Korman, Graphics-dev, Chris Harrelson, Paweł Hajdan Jr.
No one actually responded to my earlier note in this thread (apart from not doing anything, which I suppose is a response to my can we wait question ;).

What do you actually want to do with these tests? I don't like running them in the CQ if we don't have a matching main waterfall builder.

-- Dirk

wko...@chromium.org

unread,
Apr 11, 2017, 4:03:27 PM4/11/17
to Graphics-dev, p...@chromium.org, tha...@chromium.org, dan...@chromium.org, wko...@chromium.org, chri...@chromium.org, phajd...@chromium.org
On Tuesday, April 11, 2017 at 12:31:19 PM UTC-7, Dirk Pranke wrote:
> No one actually responded to my earlier note in this thread (apart from not doing anything, which I suppose is a response to my can we wait question ;).

Yes, I was waiting a bit to see if the stuff you mentioned in flight had landed.

>
> What do you actually want to do with these tests? I don't like running them in the CQ if we don't have a matching main waterfall builder.

Replying specifically:

> First, there is currently *one* machine for that builder, so, yeah, that's way under capacity for anything that you want to be sending builds to at all regularly, let alone as part of a presubmit or on the CQ. We'd normally need to either move it to a bigger pool or the CQ pool itself. But ...

We have that bug above aiming to add capacity. It seems to not have been fully enabled. I had assumed from reading the estimation in the bug (but perhaps I am incorrect?) that the resourcing we scoped is sufficient to add it to just cc/... changes for presubmit and/or CQ or both.

> Second, we shouldn't have anything being run in the CQ that isn't also a waterfall (continuous) builder, because it's too hard to expect to keep a tryserver-only config green. So, you need to add a waterfall configuration. But ...

I'm not super familiar with these builder setups. I see our bot listed on tryserver.chromium.linux on the waterfall page. Is that not sufficient?

https://build.chromium.org/p/tryserver.chromium.linux/waterfall

> Third, I'm pretty sure I asked this a year ago but I'll ask again: do you really need to run *all* of the layout tests? Because that is indeed quite expensive. And ...

IMO we get much of the value from running some specific sub-directories. Probably they add up to about 1/3 of total tests? You can see based on what is enabled in FlagExpectations/enable-slimming-paint-v2 for rough sense.

> Fourth, adding an extra builder just to run an additional test step is also expensive (since you have to do a full build even though SPv2 is just a runtime flag) so we should try to avoid that if we can. But ...

Hmm, this seems a bummer, really no way around having to build as well given it's runtime only?

> Fifth, all of this is complicated by the fact that we can't run the layout tests under swarming and hence it's much harder to parallelize that test execution with anything else. *And* part of the reason we use particular directories instead of just relying on `analyze` is because we don't rely on analyze even for the normal layout test case, because the test suites are too slow, because we're not running under swarming. But! ...

I don't fully grok everything above, but it sounds like it's changing with the below, so focusing there.

> Sixth, we're about to start running layout tests under swarming on linux, literally hopefully with CLs landing in the next day or two (so, really, in the next week or two before things stick).

This sounds good! Did it happen? Does it make it more feasible resource-wise for us to enable SPv2 layout test runs in some manner for cc/... changes?

Open to thoughts on best next steps. My impression from discussion to date is that adding it to analyze is preferred approach? We obviously need to get the resourcing fully addressed first.
> To unsubscribe from this group and stop receiving emails from it, send an email to graphics-dev...@chromium.org.

Philip Rogers

unread,
Apr 11, 2017, 4:52:26 PM4/11/17
to Dirk Pranke, Chris Harrelson, Walter, Graphics-dev
On Mon, Apr 3, 2017 at 12:50 PM, Dirk Pranke <dpr...@chromium.org> wrote:
A few things ...

First, there is currently *one* machine for that builder, so, yeah, that's way under capacity for anything that you want to be sending builds to at all regularly, let alone as part of a presubmit or on the CQ. We'd normally need to either move it to a bigger pool or the CQ pool itself. But ...

Second, we shouldn't have anything being run in the CQ that isn't also a waterfall (continuous) builder, because it's too hard to expect to keep a tryserver-only config green. So, you need to add a waterfall configuration. But ...

Third, I'm pretty sure I asked this a year ago but I'll ask again: do you really need to run *all* of the layout tests? Because that is indeed quite expensive. And ...

Fourth, adding an extra builder just to run an additional test step is also expensive (since you have to do a full build even though SPv2 is just a runtime flag) so we should try to avoid that if we can. But ...

Fifth, all of this is complicated by the fact that we can't run the layout tests under swarming and hence it's much harder to parallelize that test execution with anything else. *And* part of the reason we use particular directories instead of just relying on `analyze` is because we don't rely on analyze even for the normal layout test case, because the test suites are too slow, because we're not running under swarming. But! ...

Sixth, we're about to start running layout tests under swarming on linux, literally hopefully with CLs landing in the next day or two (so, really, in the next week or two before things stick).

So all of this will hopefully become *much* easier Real Soon Now, and you'll be able to just add a test step into the CQ like any other step. Can we wait for that?

Sorry, I missed these questions. I think Walter addressed most of them but I wanted to add that the spv2 trybot has been working surprisingly well with the exception of capacity issues. For example, I haven't seen folks complaining about it blocking their patch unnecessarily because of how it passes the patch along as long as no new test failures are introduced. Once a week I've been going through and triaging any new failures that were introduced. This has been a nice tradeoff between not slowing folks down and still catching regressions.
 

-- Dirk


 

On Mon, Apr 3, 2017 at 12:32 PM, Chris Harrelson <chri...@chromium.org> wrote:


On Mon, Apr 3, 2017 at 12:29 PM, <wko...@chromium.org> wrote:
On Monday, April 3, 2017 at 12:07:50 PM UTC-7, danakj wrote:
> First thought: Would failing this bot be an FYI or would it block the patch landing?

Thought: start as FYI but to prevent regression we would want to move to blocking once we have gained confidence that that would be a reasonable thing to do. So, what would we need to see to feel confident?

Given we are only adding it for certain subdirectories Blink-side for example, we see occasional regressions today. For example there were multicol changes under WebKit/Source/core/layout that broke the SPv2 trybot builds, we had to notice it was red manually and mark those tests as failing. http://crrev.com/2765013002

pdr@ set up our current try bot and overall test plan, here is the historical doc:

https://docs.google.com/document/d/1QCM912Dr6u38DqyQqd7pxQxDy8FFOoWMMDq7uAXqKdA/view#heading=h.scmfc1jwpawo

In there he mentions setting up add'l SPv2 specific trybots: "When spv2 is closer to launch, additional spv2-specific trybots can be added." I'm not sure exactly what he had in mind. What would be best?

The only reason we have a specific bot is to avoid slowdown of the regular CQ with a lot of extra tests. But since it runs in parallel with the other bots, I'm not sure if it ends up slowing down any commits (assuming we have enough capacity).
 

> Second thought: Is the slimming paint bot not running on the waterfall? It's unfortunate to have bots on the waterfall that don't run in the CQ since that can break the waterfall. But then, it's also a problematic situation to have CQ blocking on bots (for some patches only) that don't run on the waterfall, since this allows patches to land and then block CQ for other things.

It's on the tryserver.chromium.linux waterfall here:

https://build.chromium.org/p/tryserver.chromium.linux/waterfall

I think we haven't added it to the fuller CQ because we've been building up more compatibility. We're just now approaching point where broadening likely makes sense, and cc would be an obvious initial candidate, so a bit of a mini capybara type of situation.

Clarification: it is on the CQ for code touching core/paint/ or platform/graphics/, but not otherwise.
 

> Third thought: Why is this through presubmit instead of just adding it to the CQ and having analyze decide of the patch is relevant to the bot?

See above re: compatibility. I think this would allow us more direct control until we have greater confidence, but also, I know how to add it through presubmit and I don't know what we'd need to do for CQ, so open to thoughts here as well.



--
You received this message because you are subscribed to the Google Groups "Graphics-dev" group.
To unsubscribe from this group and stop receiving emails from it, send an email to graphics-dev+unsubscribe@chromium.org.

Dirk Pranke

unread,
Apr 11, 2017, 5:25:53 PM4/11/17
to Walter Korman, Graphics-dev, Philip Rogers, Nico Weber, Dana Jansens, Chris Harrelson, Paweł Hajdan Jr.
On Tue, Apr 11, 2017 at 1:03 PM, <wko...@chromium.org> wrote:
On Tuesday, April 11, 2017 at 12:31:19 PM UTC-7, Dirk Pranke wrote:
> No one actually responded to my earlier note in this thread (apart from not doing anything, which I suppose is a response to my can we wait question ;).

Yes, I was waiting a bit to see if the stuff you mentioned in flight had landed.

>
> What do you actually want to do with these tests? I don't like running them in the CQ if we don't have a matching main waterfall builder.

Replying specifically:

> First, there is currently *one* machine for that builder, so, yeah, that's way under capacity for anything that you want to be sending builds to at all regularly, let alone as part of a presubmit or on the CQ. We'd normally need to either move it to a bigger pool or the CQ pool itself. But ...

We have that bug above aiming to add capacity. It seems to not have been fully enabled. I had assumed from reading the estimation in the bug (but perhaps I am incorrect?) that the resourcing we scoped is sufficient to add it to just cc/... changes for presubmit and/or CQ or both.

3 may be enough for the current load but it gives us very little headroom.
 

> Second, we shouldn't have anything being run in the CQ that isn't also a waterfall (continuous) builder, because it's too hard to expect to keep a tryserver-only config green. So, you need to add a waterfall configuration. But ...

I'm not super familiar with these builder setups. I see our bot listed on tryserver.chromium.linux on the waterfall page. Is that not sufficient?

https://build.chromium.org/p/tryserver.chromium.linux/waterfall

The terminolgy is confusing, but the "continuous" is the important part, i.e., a non-tryserver builder that is building every revision.

> Third, I'm pretty sure I asked this a year ago but I'll ask again: do you really need to run *all* of the layout tests? Because that is indeed quite expensive. And ...

IMO we get much of the value from running some specific sub-directories. Probably they add up to about 1/3 of total tests? You can see based on what is enabled in FlagExpectations/enable-slimming-paint-v2 for rough sense.

> Fourth, adding an extra builder just to run an additional test step is also expensive (since you have to do a full build even though SPv2 is just a runtime flag) so we should try to avoid that if we can. But ...

Hmm, this seems a bummer, really no way around having to build as well given it's runtime only?

There are various things we can do to work around this.
 

> Fifth, all of this is complicated by the fact that we can't run the layout tests under swarming and hence it's much harder to parallelize that test execution with anything else. *And* part of the reason we use particular directories instead of just relying on `analyze` is because we don't rely on analyze even for the normal layout test case, because the test suites are too slow, because we're not running under swarming. But! ...

I don't fully grok everything above, but it sounds like it's changing with the below, so focusing there.

> Sixth, we're about to start running layout tests under swarming on linux, literally hopefully with CLs landing in the next day or two (so, really, in the next week or two before things stick).

This sounds good! Did it happen? Does it make it more feasible resource-wise for us to enable SPv2 layout test runs in some manner for cc/... changes?

Not yet. "any day now ..." :).
 
Open to thoughts on best next steps. My impression from discussion to date is that adding it to analyze is preferred approach? We obviously need to get the resourcing fully addressed first.

Correct, adding it to analyze and getting the adequate resources would be ideal.

-- Dirk

wko...@chromium.org

unread,
Apr 11, 2017, 5:45:36 PM4/11/17
to Graphics-dev, wko...@chromium.org, p...@chromium.org, tha...@chromium.org, dan...@chromium.org, chri...@chromium.org, phajd...@chromium.org
Filed http://crbug.com/710627 to track planned work to update analyze config.
Reply all
Reply to author
Forward
0 new messages