RFC: simplifying Blink CQ, switching fully to SimpleTryJobVerifier

90 views
Skip to first unread message

Paweł Hajdan, Jr.

unread,
Feb 18, 2014, 7:15:01 PM2/18/14
to blink-dev
Hello Blink Developers,

I'm working on the commit queue (CQ) and would like to hear your feedback about some possible changes to the Blink CQ intended to improve it.

1. Would it be fine to move presubmit check from a dedicated blink_presubmit bot to {linux,win,mac}_blink{,_rel} bots? Note that blink_presubmit bot can stay for possible manually submitted try jobs - it'd be just CQ relying on the *_blink* bots to run it instead of having logic to handle the presubmit bot and only then launch other jobs.

2. Should mac_layout and win_layout bots stay on the CQ? Looks like they are intended to be compile-only, but the other *_blink* bots already provide compile coverage, and they do run webkit_lint.

3. What do you think about switching Blink CQ to a simpler commit queue verifier (that's an implementation detail)? It's like 1500 lines of code vs. 150, and it's much easier to reason about and debug in case of problems. A possible drawback is that it relies on the recipe to handle test retries (all Blink trybots except the _layout bots - see #2 above - are on recipes), and so it doesn't do retries itself (which also helps keep things simple and seems to work well in practice for Chromium CQ).

Comments are welcome - note that if any of these changes goes really bad, it'll be easy to revert it and quickly restart CQ. This is an operation that can be done within minutes.

The changes I'm recommending here would really help the health of the CQ. If possible, please bear with me while I'm working on this.

Paweł

Adam Barth

unread,
Feb 18, 2014, 7:55:09 PM2/18/14
to Paweł Hajdan, Jr., blink-dev
On Tue, Feb 18, 2014 at 4:15 PM, Paweł Hajdan, Jr. <phajd...@chromium.org> wrote:
Hello Blink Developers,

I'm working on the commit queue (CQ) and would like to hear your feedback about some possible changes to the Blink CQ intended to improve it.

1. Would it be fine to move presubmit check from a dedicated blink_presubmit bot to {linux,win,mac}_blink{,_rel} bots? Note that blink_presubmit bot can stay for possible manually submitted try jobs - it'd be just CQ relying on the *_blink* bots to run it instead of having logic to handle the presubmit bot and only then launch other jobs.

Yes, that's fine.  I think Robbie moved it from the master to it's own bot at some point, but there's no reason it couldn't be merged with the other bots.

2. Should mac_layout and win_layout bots stay on the CQ? Looks like they are intended to be compile-only, but the other *_blink* bots already provide compile coverage, and they do run webkit_lint.

We should probably keep them.  They make sure we don't break the debug compile on Mac and Win.  For example, if a Windows specific file contains an ASSERT, win_layout is the only bot that will check that the code inside the ASSERT actually builds.

3. What do you think about switching Blink CQ to a simpler commit queue verifier (that's an implementation detail)? It's like 1500 lines of code vs. 150, and it's much easier to reason about and debug in case of problems. A possible drawback is that it relies on the recipe to handle test retries (all Blink trybots except the _layout bots - see #2 above - are on recipes), and so it doesn't do retries itself (which also helps keep things simple and seems to work well in practice for Chromium CQ).

That seems fine to me, but I don't really understand the difference between the two.

Comments are welcome - note that if any of these changes goes really bad, it'll be easy to revert it and quickly restart CQ. This is an operation that can be done within minutes.

Sounds good.
 
The changes I'm recommending here would really help the health of the CQ. If possible, please bear with me while I'm working on this.

I'm definitely in favor of improving the health of the CQ.  :)

Adam

Dirk Pranke

unread,
Feb 18, 2014, 7:57:21 PM2/18/14
to Paweł Hajdan, Jr., blink-dev, Robbie Iannucci
On Tue, Feb 18, 2014 at 4:15 PM, Paweł Hajdan, Jr. <phajd...@chromium.org> wrote:
Hello Blink Developers,

I'm working on the commit queue (CQ) and would like to hear your feedback about some possible changes to the Blink CQ intended to improve it.

1. Would it be fine to move presubmit check from a dedicated blink_presubmit bot to {linux,win,mac}_blink{,_rel} bots? Note that blink_presubmit bot can stay for possible manually submitted try jobs - it'd be just CQ relying on the *_blink* bots to run it instead of having logic to handle the presubmit bot and only then launch other jobs.


I thought there was a reason we were running it on its own bot, but I don't remember what it was. Perhaps +iannucci remembers.

Are you thinking that we'd run the presubmit N times, one for each other bot? That seems a bit redundant ...
 
2. Should mac_layout and win_layout bots stay on the CQ? Looks like they are intended to be compile-only, but the other *_blink* bots already provide compile coverage, and they do run webkit_lint.


Adam covered this; I think we want to keep them to keep compile coverage.
 
3. What do you think about switching Blink CQ to a simpler commit queue verifier (that's an implementation detail)? It's like 1500 lines of code vs. 150, and it's much easier to reason about and debug in case of problems. A possible drawback is that it relies on the recipe to handle test retries (all Blink trybots except the _layout bots - see #2 above - are on recipes), and so it doesn't do retries itself (which also helps keep things simple and seems to work well in practice for Chromium CQ).


I don't understand the difference here either. It's probably fine, and I doubt most of the people on blink-dev have any idea, but given that I do at least roughly understand the code, I'd be happy to follow up w/ you off-list to understand and answer better.

Otherwise, I am of course in favor of whatever makes the CQ work better :).

-- Dirk

Robert Iannucci

unread,
Feb 19, 2014, 10:46:49 PM2/19/14
to Dirk Pranke, Paweł Hajdan, Jr., blink-dev
I think the reason we were running it separately was because CQ was previously running the presubmit directly before launching any jobs. It seemed best to preserve the 'fail fast once on presubmit before tying up other builder resources' behavior.

Paweł Hajdan, Jr.

unread,
Feb 20, 2014, 9:24:19 PM2/20/14
to Adam Barth, blink-dev
On Tue, Feb 18, 2014 at 4:55 PM, Adam Barth <aba...@chromium.org> wrote:
2. Should mac_layout and win_layout bots stay on the CQ? Looks like they are intended to be compile-only, but the other *_blink* bots already provide compile coverage, and they do run webkit_lint.

We should probably keep them.  They make sure we don't break the debug compile on Mac and Win.  For example, if a Windows specific file contains an ASSERT, win_layout is the only bot that will check that the code inside the ASSERT actually builds.

I see. Can I just add recipe-based builders that compile in debug mode? There should be no change in behavior observable to you other than different bot name.

At least for now I intend to keep the old bots around on the tryserver, it'd just be CQ only using recipe-based bots.

Thank you for very positive feedback so far, looks like it'll be pretty easy to make these changes. At any point feel free to alert me to problems or suggest a better way to configure the Blink CQ.

Paweł

Dirk Pranke

unread,
Feb 20, 2014, 9:46:16 PM2/20/14
to Paweł Hajdan, Jr., Adam Barth, blink-dev
On Thu, Feb 20, 2014 at 6:24 PM, Paweł Hajdan, Jr. <phajd...@chromium.org> wrote:
On Tue, Feb 18, 2014 at 4:55 PM, Adam Barth <aba...@chromium.org> wrote:
2. Should mac_layout and win_layout bots stay on the CQ? Looks like they are intended to be compile-only, but the other *_blink* bots already provide compile coverage, and they do run webkit_lint.

We should probably keep them.  They make sure we don't break the debug compile on Mac and Win.  For example, if a Windows specific file contains an ASSERT, win_layout is the only bot that will check that the code inside the ASSERT actually builds.

I see. Can I just add recipe-based builders that compile in debug mode? There should be no change in behavior observable to you other than different bot name.


Yes, please do so :). The more things we switch to recipes, the better. If you felt like switching the main bots over, that would be even better (and should be easy, given that it's just the try server recipes w/o retries in most cases).

-- Dirk

Daniel Cheng

unread,
Feb 23, 2014, 12:27:56 AM2/23/14
to Dirk Pranke, Paweł Hajdan, Jr., Adam Barth, blink-dev
Would these changes have affected the normal blink tryservers? I get a lot of try "failures" now when I send out tryjobs due to the fact that I haven't yet received a LGTM. Can we make the presubmit for non-CQ tryjobs skip that particular check?

Daniel

Dirk Pranke

unread,
Feb 23, 2014, 6:15:26 PM2/23/14
to Daniel Cheng, Paweł Hajdan, Jr., Adam Barth, blink-dev, Robbie Iannucci
Oh, yes, that would probably be the case :(. Paweł, it's not obvious to me how we'd change the recipe to do that, though; any ideas?

Robert Iannucci

unread,
Feb 24, 2014, 1:46:28 PM2/24/14
to Dirk Pranke, Daniel Cheng, Paweł Hajdan, Jr., Adam Barth, blink-dev
We could/should probably skip the LGTM presubmit check when the tryjob requestor is commit-bot.

Dirk Pranke

unread,
Feb 24, 2014, 1:53:26 PM2/24/14
to Robert Iannucci, Daniel Cheng, Paweł Hajdan, Jr., Adam Barth, blink-dev
Presumably you mean *not* commit-bot ?

-- Dirk

Robert Iannucci

unread,
Feb 24, 2014, 2:09:27 PM2/24/14
to Dirk Pranke, Daniel Cheng, Paweł Hajdan, Jr., Adam Barth, blink-dev
Oh, yeah, right :)

James Robinson

unread,
Feb 24, 2014, 5:24:32 PM2/24/14
to Robert Iannucci, Dirk Pranke, Daniel Cheng, Paweł Hajdan, Jr., Adam Barth, blink-dev
I managed to hit a filelock while running the presubmit checks on one of the windows bots:


** Presubmit ERRORS **
run-bindings-tests (0.22s) failed
Traceback (most recent call last):
  File "E:\b\build\slave\win_layout\build\src\third_party\WebKit\Source\bindings\..\..\Tools\Scripts\run-bindings-tests", line 53, in <module>
    sys.exit(main(sys.argv))
  File "E:\b\build\slave\win_layout\build\src\third_party\WebKit\Source\bindings\..\..\Tools\Scripts\run-bindings-tests", line 49, in main
    return BindingsTests(reset_results, test_python, verbose, executive.Executive()).main()
  File "E:\b\build\slave\win_layout\build\src\third_party\WebKit\Tools\Scripts\webkitpy\bindings\main.py", line 301, in main
    current_scm = detect_scm_system(os.curdir)
  File "E:\b\build\slave\win_layout\build\src\third_party\WebKit\Tools\Scripts\webkitpy\common\checkout\scm\detection.py", line 83, in detect_scm_system
    return SCMDetector(FileSystem(), Executive()).detect_scm_system(path, patch_directories)
  File "E:\b\build\slave\win_layout\build\src\third_party\WebKit\Tools\Scripts\webkitpy\common\checkout\scm\detection.py", line 72, in detect_scm_system
    return SVN(cwd=absolute_path, patch_directories=patch_directories, filesystem=self._filesystem, executive=self._executive)
  File "E:\b\build\slave\win_layout\build\src\third_party\WebKit\Tools\Scripts\webkitpy\common\checkout\scm\svn.py", line 54, in __init__
    SCM.__init__(self, cwd, **kwargs)
  File "E:\b\build\slave\win_layout\build\src\third_party\WebKit\Tools\Scripts\webkitpy\common\checkout\scm\scm.py", line 48, in __init__
    self.checkout_root = self.find_checkout_root(self.cwd)
  File "E:\b\build\slave\win_layout\build\src\third_party\WebKit\Tools\Scripts\webkitpy\common\checkout\scm\svn.py", line 91, in find_checkout_root
    uuid = self._find_uuid(path)
  File "E:\b\build\slave\win_layout\build\src\third_party\WebKit\Tools\Scripts\webkitpy\common\checkout\scm\svn.py", line 78, in _find_uuid
    return self.value_from_svn_info(path, 'Repository UUID')
  File "E:\b\build\slave\win_layout\build\src\third_party\WebKit\Tools\Scripts\webkitpy\common\checkout\scm\svn.py", line 84, in value_from_svn_info
    info_output = Executive().run_command(svn_info_args, cwd=path).rstrip()
  File "E:\b\build\slave\win_layout\build\src\third_party\WebKit\Tools\Scripts\webkitpy\common\system\executive.py", line 394, in run_command
    close_fds=self._should_close_fds())
  File "E:\b\build\slave\win_layout\build\src\third_party\WebKit\Tools\Scripts\webkitpy\common\system\executive.py", line 466, in popen
    return subprocess.Popen(string_args, **kwargs)
  File "E:\b\depot_tools\python276_bin\lib\subprocess.py", line 709, in __init__
    errread, errwrite)
  File "E:\b\depot_tools\python276_bin\lib\subprocess.py", line 957, in _execute_child
    startupinfo)
WindowsError: [Error 2] The system cannot find the file specified
Exception WindowsError: (32, 'The process cannot access the file because it is being used by another process', 'c:\\users\\chrome~2\\appdata\\local\\temp\\tmpvsl5xs') in <bound method ScopedTempFileProvider.__del__ of <webkitpy.bindings.main.ScopedTempFileProvider object at 0x02323510>> ignored

 BUILDBOT_BUILDERNAME: win_blink_rel
 BUILDBOT_BUILDNUMBER: 28707
 BUILDBOT_CLOBBER: 
 BUILDBOT_GOT_REVISION: None
 BUILDBOT_REVISION: HEAD
 BUILDBOT_SCHEDULER: try_job_rietveld
 BUILDBOT_SLAVENAME: vm524-m4

The failure hasn't generated an email (yet), but if it does I'll reply.

- James

Scott Graham

unread,
Feb 24, 2014, 6:07:42 PM2/24/14
to James Robinson, Robert Iannucci, Dirk Pranke, Daniel Cheng, Paweł Hajdan, Jr., Adam Barth, blink-dev
Just in case it's not obvious, note that there's 2 separate errors here, the first the [Error 2] is something to do with running that SVN command (looks like just not finding the SVN binary?)

The second (WindowsError 32) and below is likely a bug in https://code.google.com/p/chromium/codesearch#chromium/src/third_party/WebKit/Tools/Scripts/webkitpy/bindings/main.py&l=63 which probably should be using __enter__/__exit__ rather than __del__.

http://docs.python.org/2/reference/datamodel.html#object.__del__ in particular "It is not guaranteed that__del__() methods are called for objects that still exist when the interpreter exits." so there's a fair chance those temp dirs are leaking.

Nils Barth

unread,
Feb 24, 2014, 8:00:41 PM2/24/14
to Scott Graham, James Robinson, Robert Iannucci, Dirk Pranke, Daniel Cheng, Paweł Hajdan, Jr., Adam Barth, blink-dev
Scott:
Just in case it's not obvious, note that there's 2 separate errors here, the first the [Error 2] is something to do with running that SVN command (looks like just not finding the SVN binary?)

The second (WindowsError 32) and below is likely a bug in https://code.google.com/p/chromium/codesearch#chromium/src/third_party/WebKit/Tools/Scripts/webkitpy/bindings/main.py&l=63 which probably should be using __enter__/__exit__ rather than __del__

Thanks Scott; this is bindings generator test code, so I'll see about fixing it ASAP!​​

Nils Barth

unread,
Feb 24, 2014, 8:50:19 PM2/24/14
to Scott Graham, James Robinson, Robert Iannucci, Dirk Pranke, Daniel Cheng, Paweł Hajdan, Jr., Adam Barth, blink-dev
Fix posted and landing; will dcommit if win_blink_rel fails (as is likely):
179253002Fix resource leak (temporary files and directories) in run-bindings-tests

James Robinson

unread,
Feb 24, 2014, 9:13:23 PM2/24/14
to Nils Barth, Scott Graham, Robert Iannucci, Dirk Pranke, Daniel Cheng, Paweł Hajdan, Jr., Adam Barth, blink-dev
The presubmit step is currently disabled for win_blink_rel, FWI.

- James

Eric Seidel

unread,
Mar 10, 2014, 7:13:24 PM3/10/14
to James Robinson, Nils Barth, Scott Graham, Robert Iannucci, Dirk Pranke, Daniel Cheng, Paweł Hajdan, Jr., Adam Barth, blink-dev
I'm noticing blink_presubmit failing as a result of "git cl try"
telling me that I'm missing an LGTM. Is this a known regression?
https://codereview.chromium.org/193413002/

Paweł Hajdan, Jr.

unread,
Mar 10, 2014, 8:06:11 PM3/10/14
to Eric Seidel, James Robinson, Nils Barth, Scott Graham, Robert Iannucci, Dirk Pranke, Daniel Cheng, Adam Barth, blink-dev
That's before Adam's LGTM, right?

If so, I think that's expected.

Obviously question is - should git cl try run blink_presubmit then. Maybe it shouldn't.

Paweł

Eric Seidel

unread,
Mar 10, 2014, 8:12:49 PM3/10/14
to Paweł Hajdan, Jr., Eric Seidel, James Robinson, Nils Barth, Scott Graham, Robert Iannucci, Dirk Pranke, Daniel Cheng, Adam Barth, blink-dev
Do we need to split blink_presubmit into two pieces?  One of which is only run just before submit?

Paweł Hajdan, Jr.

unread,
Apr 1, 2014, 1:18:08 PM4/1/14
to Eric Seidel, James Robinson, Nils Barth, Scott Graham, Robert Iannucci, Dirk Pranke, Daniel Cheng, Adam Barth, blink-dev
We could do that, although it may seem a bit cumbersome.

I'd be interested in hearing other opinions from the Blink community.

Paweł
Reply all
Reply to author
Forward
0 new messages