Gerrit CI warning: master is currently broken

45 views
Skip to first unread message

lucamilanesio

unread,
Apr 21, 2020, 6:00:13 AM4/21/20
to Repo and Gerrit Discussion
Hi all,
the Gerrit master branch is currently broken because of PolyGerrit test failures.

The first red build is:

and the changes that possibly have broken it are:
  1. Fix flaky gr-edit-controls tests (detail)
  2. Improve polygerrit test performance (detail)

I am investigating on why that breakage wasn't picked up by the change validation.
In the meantime, bear in mind that any Verified -1 due to the master being broken are potentially not associated with your change but rather with the master being red.

Luca.

David Ostrovsky

unread,
Apr 21, 2020, 8:57:56 AM4/21/20
to Repo and Gerrit Discussion

Am Dienstag, 21. April 2020 12:00:13 UTC+2 schrieb lucamilanesio:
Hi all,
the Gerrit master branch is currently broken because of PolyGerrit test failures.

The first red build is:

and the changes that possibly have broken it are:
  1. Fix flaky gr-edit-controls tests (detail)
  2. Improve polygerrit test performance (detail)
Given that for both changes the PG tests were passing on GerritForge CI,
can we bisect the problem?



Luca Milanesio

unread,
Apr 21, 2020, 9:52:59 AM4/21/20
to David Ostrovsky, Luca Milanesio, Repo and Gerrit Discussion
Actually, Dimitrii has found the issue and that tells me that when the Bazel rules are modified, we should run the tests on *everything*.
I’ll leave Dimitrii describe the details :-) because the Bazel rules are still half of “black magic” to me, I need to study a lot more on it.

Luca.

David Ostrovsky

unread,
Apr 21, 2020, 10:14:14 AM4/21/20
to Repo and Gerrit Discussion
I think I understand Bazel rules. So let me try to summarize what happened.

In this CL: [1] that changed only PG related are of code, PG test rules
were changed/added. One new test rule wasn't tagged with "manual" label.

That means, that wildcard target execution:

  $ bazel test //...

would trigger that test. However, PG test cannot be executed in that way,
but must be started from shell script that initializes the environment.

Why this breakage was not detected in: [1]? Because of the CI optimization
machinery we are skipping server check entirely wenn only PG area of code
was touched. That code is here: [2].

One possible solution would be to always trigger PG and server tests if any
Bazel related files changed (Thomas started to work on this in: [3]), or just remove
the whole fuzzy logic with check skipping and always run all tests.

I think that with the switch to RBE and with activated remote caching we should
be fine running more test, not less tests.


Luca Milanesio

unread,
Apr 21, 2020, 10:22:26 AM4/21/20
to David Ostrovsky, Luca Milanesio, Repo and Gerrit Discussion

On 21 Apr 2020, at 15:14, David Ostrovsky <david.o...@gmail.com> wrote:


Am Dienstag, 21. April 2020 15:52:59 UTC+2 schrieb lucamilanesio:


On 21 Apr 2020, at 13:57, David Ostrovsky <david.o...@gmail.com> wrote:


Am Dienstag, 21. April 2020 12:00:13 UTC+2 schrieb lucamilanesio:
Hi all,
the Gerrit master branch is currently broken because of PolyGerrit test failures.

The first red build is:

and the changes that possibly have broken it are:
  1. Fix flaky gr-edit-controls tests (detail)
  2. Improve polygerrit test performance (detail)
Given that for both changes the PG tests were passing on GerritForge CI,
can we bisect the problem?

Actually, Dimitrii has found the issue and that tells me that when the Bazel rules are modified, we should run the tests on *everything*.
I’ll leave Dimitrii describe the details :-) because the Bazel rules are still half of “black magic” to me, I need to study a lot more on it.

I think I understand Bazel rules. So let me try to summarize what happened.

In this CL: [1] that changed only PG related are of code, PG test rules
were changed/added. One new test rule wasn't tagged with "manual" label.

That means, that wildcard target execution:

  $ bazel test //...

would trigger that test. However, PG test cannot be executed in that way,
but must be started from shell script that initializes the environment.

Thanks for the “less magical than mine” explanation :-)


Why this breakage was not detected in: [1]? Because of the CI optimization
machinery we are skipping server check entirely wenn only PG area of code
was touched. That code is here: [2].

Yes, exactly.


One possible solution would be to always trigger PG and server tests if any
Bazel related files changed (Thomas started to work on this in: [3]),

Already +2 that change.

or just remove
the whole fuzzy logic with check skipping and always run all tests.

Yes, but we would need first to have RBE fully enabled to run PolyGerrit tests.
We’ll get there eventually, but we’re not there yet :-(

Luca.


I think that with the switch to RBE and with activated remote caching we should
be fine running more test, not less tests.



--
--
To unsubscribe, email repo-discuss...@googlegroups.com
More info at http://groups.google.com/group/repo-discuss?hl=en

---
You received this message because you are subscribed to the Google Groups "Repo and Gerrit Discussion" group.
To unsubscribe from this group and stop receiving emails from it, send an email to repo-discuss...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/repo-discuss/56f2385f-007c-43fe-b2ce-f35331d2b824%40googlegroups.com.

Dmitrii Filippov

unread,
Apr 21, 2020, 12:34:15 PM4/21/20
to Repo and Gerrit Discussion
Hi,
David, thanks for good explanation, you did my job :) I don't know what to add.

About RBE - not sure, how good it works with node modules.
Some times ago I tried to enable remote cache for polygerrit rules (tests and build) and it was very slow because of huge amount of files in node_modules (https://tsh.io/wp-content/uploads/2019/06/node-modules-app-performance_.png)

Btw, if you need help with RBE and polygerrit - let me know.

Best regards,
Dmitrii
Reply all
Reply to author
Forward
0 new messages