Luca Milanesio <luca.mi...@gmail.com> writes:
>> On 22 Feb 2021, at 03:22, James E. Blair <j...@acmegating.com> wrote:
>>
>> Hi,
>>
>> I think we're ready to continue moving forward with Zuul for Gerrit.
>
> +1, looking forward to it :-)
>
>>
>> We've been running Zuul on plugins for a while, and I think that's been
>> going relatively well. This past week I did manage to cause a couple of
>> outages, for which I apologize. In both cases we made improvements to
>> the system that should make it more robust.
>>
>> We do already have a job for Gerrit ready to run. It only runs the
>> build, no tests yet, and it currently takes about 12-14 minutes to run,
>> which seems like a workable starting point to me.
>
> Do you have the pointer for it?
> I believe that with caching and RBE execution enabled, that could be
> brought down massively, to a couple of minutes.

Here's an example run of the build-only job:

https://ci.gerritcodereview.com/t/gerrit/build/d0e62ce4190c4b85b1ac2f576af3da63

>> I believe we can
>> improve on that time, but it seems short enough that we could go ahead
>> and start running it in parallel with the current CI system and not
>> inconvenience anyone.
>
> Sure, we should get started ASAP with something working.
>
>>
>> Of that run time, about 6 minutes is setup, and about 6 minutes is the
>> Bazel build. Monty and I have started work on improving the setup time
>> and using a Bazel cache to reduce the build time.
>
> You can just point to the existing cache
> (https://gerrit-ci.gerritforge.com/cache) and it should work out of
> the box.

Ah okay. I also set up a cache as a bucket in GCS. It reduced the
build time from 6 minutes to 3 minutes in my experiments. Monty's work
may shave off even more time.

I tried running the tests with
"--test_tag_filters=-flaky,-elastic,-git-protocol-v2" (thanks David for
the correction). The test run took 37 minutes on a very large test
node, and a number of tests timed out and/or failed. So it may take
some iteration to work that out.
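To make the effect of that correction concrete, here is a rough Python sketch of how I understand Bazel's tag filter semantics: a test is selected if it carries none of the excluded ("-" prefixed) tags and, when any included tags are listed, at least one of those. This is only an illustration, not Bazel's code; the function names and the sample tag lists are made up.

```python
def parse_tag_filters(spec):
    """Split a --test_tag_filters value into (required, excluded) tag sets."""
    required, excluded = set(), set()
    for raw in spec.split(","):
        tag = raw.strip()
        if not tag:
            continue
        if tag.startswith("-"):
            excluded.add(tag[1:])   # "-flaky" means: skip tests tagged flaky
        else:
            required.add(tag)       # "elastic" means: only tests tagged elastic
    return required, excluded


def is_selected(test_tags, spec):
    """Return True if a test with the given tags passes the filter."""
    required, excluded = parse_tag_filters(spec)
    tags = set(test_tags)
    if tags & excluded:
        return False
    # With no required tags, everything not excluded is selected.
    return not required or bool(tags & required)


ORIGINAL = "-flaky,elastic,git-protocol-v2"     # as first quoted
CORRECTED = "-flaky,-elastic,-git-protocol-v2"  # after the correction

# Hypothetical tests: one untagged, one tagged "elastic".
for name, tags in [("untagged", []), ("elastic", ["elastic"])]:
    print(name, is_selected(tags, ORIGINAL), is_selected(tags, CORRECTED))
```

Under that reading, the uncorrected string would run only the elastic and git-protocol-v2 tests, while the corrected one runs everything except those and the flaky ones.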

>> Meanwhile, we should also start running tests. If I'm reading the
>> job definitions correctly, it looks like we're running:
>>
>> --test_tag_filters=-flaky,elastic,git-protocol-v2
>>
>> I propose the following iterative approach:
>>
>> 1) Add a checker to the gerrit repo so that we start running the current
>> build-only job on changes to Gerrit.
>
> We do not currently have a check for “build only” but just “build + test”.
> Can we also add tests to it so that we can reuse the existing check?

We will need a new checker for Zuul, but we'll only need one -- we won't
have separate checkers for build only and build+test. When we add it,
we'll see 5 checks on Gerrit changes instead of 4. If we later turn off
some of the Jenkins checks, we can disable those checkers and they will
no longer appear.

>> 2) Update that job to start running the above tests.
>
> 1) + 2) are the same phase currently, unless you propose to break them
> down, which would increase the overall build time though, as the setup
> phase would be executed twice, wouldn’t it?

Sorry, what I was trying to say is that there is a job already defined
and ready to go that builds Gerrit but doesn't run tests. I propose we
start running it, then add in the "bazelisk test" command to that job
shortly afterwards. I agree, we should not run separate jobs for
{build} and {build+test}.

>> 3) Add any additional tests.
>
> Not sure we can do that now, as we would need Docker for the extra tests.
> I would prefer to park this phase for now and, instead, assess whether
> and how we can get rid of the Docker-based tests.
>
> ES support is currently in doubt and should be either dropped or moved
> to a libModule: Docker may NOT be needed anymore in Gerrit tests IMHO.

Okay, step 3 can be a no-op. I mostly just wanted to establish the
iterative approach, starting with running the build job.

>> And as they are ready, we can modify the job to use the Bazel cache and
>> other improvements to the setup time.
>>
>> How does that sound? If the Gerrit maintainers are okay with this, I
>> can go ahead and add the checker.
>
> We already have the Checker for that, can we just have Zuul use it?
> I wouldn’t like to add extra complexity, as we do know that Zuul works
> and we can just use it!
>
> We can run the:
> - Code Style
> - RBE Build/Test

You may recall there are extra considerations with using RBE due to
Zuul's speculative execution capability. Using RBE without exposing the
credential is not a solved problem (though I'm sure we can solve it).
If it's critical that we use RBE, then let's stop now and design for
that. I'm sure it's possible, but it's going to take some commitment
from a Maintainer with good knowledge of the test system, a Googler with
admin access to the cloud account and more of my time.

It's a matter of priorities. We could start running the build job now,
add on locally-executed tests, and work on addressing any shortfalls.
We'll all be able to see immediate results and iterate in the open. And
we'd know that tests can run in a clean environment.

Or, if we don't want to waste time on trying to get tests to work
outside of an RBE environment, then let's stop and design for that.
We'll need commitment to work on it, and it will probably be a while
before we see visible progress on it.

Thoughts on how we should proceed?

-Jim