Gerrit integration tests flakiness

lucamilanesio

Dec 30, 2016, 3:22:58 AM

to Repo and Gerrit Discussion

After the successful resumption of our CI ... I started investigating why some of the test suites are failing intermittently.

One recurring failure is:

//gerrit-acceptance-tests/src/test/java/com/google/gerrit/acceptance/ssh:ssh

If you run the following sample script locally:

while true; do buck test --no-results-cache //gerrit-acceptance-tests/src/test/java/com/google/gerrit/acceptance/ssh:ssh; done

it will not take long to hit a genuine deadlock!

The thread stack trace is always the same:

"main" #1 prio=5 os_prio=31 tid=0x00007fd663803800 nid=0x1c03 in Object.wait() [0x0000700007f7d000]
   java.lang.Thread.State: TIMED_WAITING (on object monitor)
        at java.lang.Object.wait(Native Method)
        at java.io.PipedInputStream.read(PipedInputStream.java:326)
        - locked <0x00000007bb5fac40> (a com.jcraft.jsch.Channel$MyPipedInputStream)
        at java.io.PipedInputStream.read(PipedInputStream.java:377)
        - locked <0x00000007bb5fac40> (a com.jcraft.jsch.Channel$MyPipedInputStream)
        at sun.nio.cs.StreamDecoder.readBytes(StreamDecoder.java:284)
        at sun.nio.cs.StreamDecoder.implRead(StreamDecoder.java:326)
        at sun.nio.cs.StreamDecoder.read(StreamDecoder.java:178)
        - locked <0x00000007bb5fb098> (a java.io.InputStreamReader)
        at java.io.InputStreamReader.read(InputStreamReader.java:184)
        at java.io.Reader.read(Reader.java:100)
        at java.util.Scanner.readInput(Scanner.java:804)
        at java.util.Scanner.hasNext(Scanner.java:1339)
        at com.google.gerrit.acceptance.SshSession.exec(SshSession.java:61)

I tried submitting a fix that sets a socket timeout on SshSession; however, it didn't address the problem.
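One possible reading of the trace: the blocked read sits on an in-memory PipedInputStream (JSch's Channel$MyPipedInputStream), not directly on the socket, which may be why a socket timeout does not release it. A minimal sketch, using only JDK classes, of bounding such a read from the outside. The class and method names are hypothetical and this is not the code of SshSession or of the submitted fix:

```java
import java.io.InputStream;
import java.io.PipedInputStream;
import java.io.PipedOutputStream;
import java.util.Scanner;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

class BoundedRead {
  /** Reads everything from {@code in}, giving up after {@code timeoutMs} milliseconds. */
  static String readAll(InputStream in, long timeoutMs) throws Exception {
    ExecutorService reader = Executors.newSingleThreadExecutor();
    try {
      Future<String> result = reader.submit(() -> {
        // Same pattern as in the trace: a Scanner blocking on the stream.
        try (Scanner scanner = new Scanner(in, "UTF-8").useDelimiter("\\A")) {
          return scanner.hasNext() ? scanner.next() : "";
        }
      });
      // Throws TimeoutException instead of waiting forever.
      return result.get(timeoutMs, TimeUnit.MILLISECONDS);
    } finally {
      // Interrupts the reader thread if it is still blocked in read().
      reader.shutdownNow();
    }
  }

  public static void main(String[] args) throws Exception {
    // A connected pipe that nobody ever writes to or closes: read() blocks.
    PipedOutputStream neverWritten = new PipedOutputStream();
    PipedInputStream pipe = new PipedInputStream(neverWritten);
    try {
      readAll(pipe, 500);
    } catch (TimeoutException e) {
      System.out.println("read timed out instead of hanging");
    }
  }
}
```

This only turns a hang into a test failure; it does not remove whatever keeps the writer side from closing the pipe.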

I then started digging into the JSch code ... and realized that it is actually much worse than I thought!

Our tests in the ssh suite rely on the JSch stack which, unfortunately, is *really unstable* and buggy.

If you dive into the code, you realize how messy it is and you wonder ... what am I actually testing?

Am I testing Gerrit or just JSch?

When the tests are more unstable than the code you would like to test ... it is like measuring the exact length of a ruler using your thumbs!!!

Should we rely on something more stable? Should we just give up testing "in-process" and use an external, more stable SSH client?

Suggestions and ideas are more than welcome :-)

Luca.

Edwin Kempin

Dec 30, 2016, 3:28:50 AM

to lucamilanesio, Repo and Gerrit Discussion

If unstable SSH tests are a problem right now, we might disable them on the CI by setting GERRIT_USE_SSH to NO, see [1].

Of course this is rather a short-term solution, as many people rely on SSH and we shouldn't leave this untested for long.

[1] https://gerrit-review.googlesource.com/93301

--
--
To unsubscribe, email repo-discuss+unsubscribe@googlegroups.com
More info at http://groups.google.com/group/repo-discuss?hl=en

---
You received this message because you are subscribed to the Google Groups "Repo and Gerrit Discussion" group.
To unsubscribe from this group and stop receiving emails from it, send an email to repo-discuss+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

lucamilanesio

Dec 30, 2016, 3:35:56 AM

to Repo and Gerrit Discussion, luca.mi...@gmail.com

Yes, that's a very good short-term solution! Thanks for sharing it.

Let me add this to the CI scripts and look into the JSch problems in the background.

Luca.

lucamilanesio

Dec 30, 2016, 3:42:29 AM

to Repo and Gerrit Discussion, luca.mi...@gmail.com

Done:

https://gerrit-review.googlesource.com/93461

lucamilanesio

Dec 30, 2016, 4:35:06 AM

to Repo and Gerrit Discussion, luca.mi...@gmail.com

It seems that the tests that use SSH connections are skipped now:

--- Before ---

PASS     13.7s   2 Passed   0 Skipped  0 Failed  com.google.gerrit.acceptance.ssh.AbandonRestoreIT
PASS      9.5s   1 Passed   0 Skipped  0 Failed  com.google.gerrit.acceptance.ssh.BanCommitIT
PASS     20.7s   2 Passed   0 Skipped  0 Failed  com.google.gerrit.acceptance.ssh.CreateProjectIT
PASS     58.2s   4 Passed   0 Skipped  0 Failed  com.google.gerrit.acceptance.ssh.GarbageCollectionIT
PASS    194.5s  14 Passed   0 Skipped  0 Failed  com.google.gerrit.acceptance.ssh.QueryIT
PASS     18.9s   3 Passed   0 Skipped  0 Failed  com.google.gerrit.acceptance.ssh.UploadArchiveIT

--- After ---

ASSUME    5.4s   0 Passed   2 Skipped  0 Failed  com.google.gerrit.acceptance.ssh.AbandonRestoreIT
ASSUME    4.1s   0 Passed   1 Skipped  0 Failed  com.google.gerrit.acceptance.ssh.BanCommitIT
ASSUME    8.4s   0 Passed   2 Skipped  0 Failed  com.google.gerrit.acceptance.ssh.CreateProjectIT
ASSUME   31.6s   0 Passed   4 Skipped  0 Failed  com.google.gerrit.acceptance.ssh.GarbageCollectionIT
ASSUME   35.6s   0 Passed  14 Skipped  0 Failed  com.google.gerrit.acceptance.ssh.QueryIT
ASSUME   19.3s   0 Passed   3 Skipped  0 Failed  com.google.gerrit.acceptance.ssh.UploadArchiveIT
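The ASSUME status suggests that the SSH tests are skipped through an unmet assumption rather than failing. A minimal sketch of that idea in plain Java, without JUnit: the GERRIT_USE_SSH variable and the value NO come from this thread, while the class, the other accepted values and the "unset means enabled" default are assumptions made for illustration and are not taken from the Gerrit sources.

```java
import java.util.Locale;

class SshSwitch {
  /** Interprets a GERRIT_USE_SSH-style value; unset or empty keeps SSH tests enabled (assumed default). */
  static boolean useSsh(String value) {
    if (value == null || value.trim().isEmpty()) {
      return true;
    }
    switch (value.trim().toUpperCase(Locale.ROOT)) {
      case "NO":
      case "FALSE":
        return false;
      case "YES":
      case "TRUE":
        return true;
      default:
        throw new IllegalArgumentException("Unsupported GERRIT_USE_SSH value: " + value);
    }
  }

  /** Reports how a test would be counted: skipped when it needs SSH and SSH is switched off. */
  static String outcome(boolean testNeedsSsh, String envValue, Runnable testBody) {
    if (testNeedsSsh && !useSsh(envValue)) {
      return "SKIPPED"; // the body never runs, so it cannot hang or fail
    }
    try {
      testBody.run();
      return "PASSED";
    } catch (RuntimeException | AssertionError e) {
      return "FAILED";
    }
  }

  public static void main(String[] args) {
    String env = System.getenv("GERRIT_USE_SSH");
    System.out.println(outcome(true, env, () -> {}));
  }
}
```

Note that a skipped suite still pays its setup cost, which would match the non-zero durations in the "After" list.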

As for the Bazel build, unfortunately it doesn't provide this level of detail ... however, I saw that the build was much faster, so I take the speed difference as a sign that the tests are skipped there too :-)

Luca.

lucamilanesio

Dec 30, 2016, 6:40:33 AM

to Repo and Gerrit Discussion, luca.mi...@gmail.com

I may actually have found the deadlock in JSch ... and it is caused by rotten code and side effects, not a surprise :-(

I've uploaded a workaround in our code that prevents the deadlock from occurring:

https://gerrit-review.googlesource.com/93477

I've been running it locally and it seems to work.

Luca.

Edwin Kempin

Jan 2, 2017, 8:54:54 AM

to lucamilanesio, Repo and Gerrit Discussion

It seems that the Bazel build still fails often because some tests run into a timeout?

E.g. https://gerrit-ci.gerritforge.com/job/Gerrit-verifier-bazel/495/console

David Ostrovsky

Jan 2, 2017, 11:44:36 AM

to Repo and Gerrit Discussion, luca.mi...@gmail.com

On Monday, January 2, 2017 at 2:54:54 PM UTC+1, Edwin Kempin wrote:

> It seems that the Bazel build still fails often because some tests run into a timeout?
> E.g. https://gerrit-ci.gerritforge.com/job/Gerrit-verifier-bazel/495/console

This should be addressed by [1]. We would need to rebase these changes to trigger a new verification attempt.

[1] https://gerrit-review.googlesource.com/93490

luca.mi...@gmail.com

Jan 2, 2017, 12:11:48 PM

to David Ostrovsky, Repo and Gerrit Discussion

Yes, this wasn't flakiness, only a test size that was too small :-)

The build should be fairly stable now!

Luca


Edwin Kempin

Jan 2, 2017, 12:15:18 PM

to Luca Milanesio, Repo and Gerrit Discussion, David Ostrovsky

Thanks, I wasn't aware that this change was not merged yet.

Luca Milanesio

Jan 2, 2017, 6:15:11 PM

to Edwin Kempin, Repo and Gerrit Discussion, David Ostrovsky

I'll keep an eye on the CI this week to identify other potential "points of flakiness".

Ideally, we should remove the retry() loop in the build, which is responsible for the huge delay of some of the verifications, which at times take up to 4h :-(

Without retries, the set of 4 builds needed to verify a change, (bazel + buck) x (reviewdb, notedb), should take no more than 20-30' as they are all executed in parallel.
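As a rough worked example of that arithmetic: with parallel builds the wall-clock time is set by the slowest build, and each retry multiplies that build's share. The per-build minutes and the retry count below are hypothetical numbers chosen to land on the 4h mentioned above; the thread does not state the actual retry limit.

```java
class VerificationTime {
  /** Wall-clock minutes for builds running in parallel: the slowest build, including its retries, decides. */
  static int wallClockMinutes(int[] minutesPerAttempt, int[] attempts) {
    if (minutesPerAttempt.length != attempts.length) {
      throw new IllegalArgumentException("one attempt count per build is required");
    }
    int slowest = 0;
    for (int i = 0; i < minutesPerAttempt.length; i++) {
      slowest = Math.max(slowest, minutesPerAttempt[i] * attempts[i]);
    }
    return slowest;
  }

  public static void main(String[] args) {
    // Hypothetical durations for the 4 builds of one change.
    int[] minutes = {25, 25, 30, 30};
    // No retries: bounded by the slowest single build.
    System.out.println(wallClockMinutes(minutes, new int[] {1, 1, 1, 1})); // 30
    // One flaky build attempted 8 times (hypothetical): 30 x 8 minutes.
    System.out.println(wallClockMinutes(minutes, new int[] {1, 1, 8, 1})); // 240
  }
}
```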

Luca.

Edwin Kempin

Jan 3, 2017, 5:58:57 AM

to Luca Milanesio, Repo and Gerrit Discussion, David Ostrovsky

Looks like there are more tests that run into a timeout with Bazel [1]:

TIMEOUT: //gerrit-acceptance-tests/src/test/java/com/google/gerrit/acceptance/rest/project:rest_project (see /home/jenkins/.cache/bazel/_bazel_jenkins/3239551e333dc09ba2b5ef07ff4549b6/execroot/gerrit/bazel-out/local-fastbuild/testlogs/gerrit-acceptance-tests/src/test/java/com/google/gerrit/acceptance/rest/project/rest_project/test.log)

[1] https://gerrit-ci.gerritforge.com/job/Gerrit-verifier-bazel/571/consoleFull

Edwin Kempin

Jan 3, 2017, 6:00:48 AM

to Luca Milanesio, Repo and Gerrit Discussion, David Ostrovsky

Sorry for the noise. Looks like this was a vote on an old patch set which wasn't rebased yet...

Luca Milanesio

Jan 3, 2017, 7:53:17 AM

to Edwin Kempin, David Ostrovsky, Repo and Gerrit Discussion

I have noticed that the Bazel builds tend to be slower and more flaky than the equivalent Buck ones.

Buck build trends (mostly green - take on average 25-30')

Bazel build trends (mostly red - take on average 30-35')

Is there any reason for that? The code is the same, tests are the same :-O

P.S. The Bazel builds even have more CPU and build hosts allocated (3 x Bazel vs. 1 x Buck), so on the same size and number of nodes they would be even slower!

Luca.

Luca Milanesio

Jan 3, 2017, 3:48:49 PM

to Edwin Kempin, David Ostrovsky, Repo and Gerrit Discussion

And in some builds (e.g. https://gerrit-ci.gerritforge.com/job/Gerrit-verifier-change/5164/console) the Bazel build even has JUnit instantiation errors.

See:

3) withStart(com.google.gerrit.server.query.group.AbstractQueryGroupsTest)
java.lang.InstantiationException
        at sun.reflect.InstantiationExceptionConstructorAccessorImpl.newInstance(InstantiationExceptionConstructorAccessorImpl.java:48)
        at java.lang.reflect.Constructor.newInstance(Constructor.java:422)
        at java.lang.Class.newInstance(Class.java:442)
        at com.google.gerrit.testutil.ConfigSuite$ConfigRunner.createTest(ConfigSuite.java:127)
        at org.junit.runners.BlockJUnit4ClassRunner$1.runReflectiveCall(BlockJUnit4ClassRunner.java:244)
        at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
        at org.junit.runners.BlockJUnit4ClassRunner.methodBlock(BlockJUnit4ClassRunner.java:241)
        at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:70)
        at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50)
        at org.junit.runners.ParentRunner$3.run(ParentRunner.java:238)
        at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:63)
        at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:236)
        at org.junit.runners.ParentRunner.access$000(ParentRunner.java:53)
        at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:229)
        at org.junit.runners.ParentRunner.run(ParentRunner.java:309)
        at org.junit.runners.Suite.runChild(Suite.java:127)
        at org.junit.runners.Suite.runChild(Suite.java:26)
        at org.junit.runners.ParentRunner$3.run(ParentRunner.java:238)
        at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:63)
        at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:236)
        at org.junit.runners.ParentRunner.access$000(ParentRunner.java:53)
        at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:229)
        at org.junit.runners.ParentRunner.run(ParentRunner.java:309)
        at org.junit.runners.Suite.runChild(Suite.java:127)
        at org.junit.runners.Suite.runChild(Suite.java:26)
        at org.junit.runners.ParentRunner$3.run(ParentRunner.java:238)
        at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:63)
        at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:236)
        at org.junit.runners.ParentRunner.access$000(ParentRunner.java:53)
        at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:229)
        at org.junit.runners.ParentRunner.run(ParentRunner.java:309)
        at com.google.testing.junit.runner.internal.junit4.CancellableRequestFactory$CancellableRunner.run(CancellableRequestFactory.java:89)
        at org.junit.runner.JUnitCore.run(JUnitCore.java:160)
        at org.junit.runner.JUnitCore.run(JUnitCore.java:138)
        at com.google.testing.junit.runner.junit4.JUnit4Runner.run(JUnit4Runner.java:112)
        at com.google.testing.junit.runner.BazelTestRunner.runTestsInSuite(BazelTestRunner.java:140)
        at com.google.testing.junit.runner.BazelTestRunner.main(BazelTestRunner.java:79)

If you look at the stack trace, it seems that Bazel uses a custom JUnit test runner that possibly provides a different runtime environment to the tests.

Can we try to fall back to a standard JUnit runner and see if this flakiness / instability / slowness goes away?

Can we report this to the Bazel team? Possibly they could give us some hints on how to identify the problem?

At the moment, the Bazel builds always lag behind because of the multiple retries needed to get a successful one :-(

Other ideas are more than welcome :-)

Luca.

David Ostrovsky

Jan 3, 2017, 4:35:41 PM

to Repo and Gerrit Discussion, eke...@google.com

On Tuesday, January 3, 2017 at 9:48:49 PM UTC+1, lucamilanesio wrote:

> And in some builds (e.g. https://gerrit-ci.gerritforge.com/job/Gerrit-verifier-change/5164/console) the Bazel one has even Junit instantiation errors.
>
> See:
> 3) withStart(com.google.gerrit.server.query.group.AbstractQueryGroupsTest)
> java.lang.InstantiationException
> at sun.reflect.InstantiationExceptionConstructorAccessorImpl.newInstance(InstantiationExceptionConstructorAccessorImpl.java:48)

Well, let us not blame Bazel for not being able to instantiate an abstract class ;-)

The @Ignore annotation is missing on the newly introduced AbstractQueryGroupsTest class, while it is there on the old AbstractQueryAccountsTest class.
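The InstantiationException in the quoted trace is what reflection reports for any abstract class, independent of the build tool. A small self-contained sketch, in which the two nested classes are stand-ins and not the Gerrit test classes:

```java
class AbstractInstantiation {
  // Stand-ins for an abstract test base class and a concrete subclass.
  abstract static class AbstractQueryTest {}

  static class ConcreteQueryTest extends AbstractQueryTest {}

  /** Tries to create an instance reflectively, roughly as a test runner would. */
  static String tryCreate(Class<?> testClass) {
    try {
      testClass.getDeclaredConstructor().newInstance();
      return "created";
    } catch (InstantiationException e) {
      // Thrown when the class is abstract.
      return "InstantiationException";
    } catch (ReflectiveOperationException e) {
      return e.getClass().getSimpleName();
    }
  }

  public static void main(String[] args) {
    System.out.println(tryCreate(ConcreteQueryTest.class)); // created
    System.out.println(tryCreate(AbstractQueryTest.class)); // InstantiationException
  }
}
```

The trace above goes through Class.newInstance() instead of Constructor.newInstance(), but the failure is the same.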

> Other ideas are more than welcome :-)

I would stop all these retry attempts and only build once. We already verify every single change 4 times:

(Buck | Bazel) x (ReviewDb | NoteDbReadWrite)

This should be more than enough.

Luca Milanesio

Jan 3, 2017, 6:55:31 PM

to David Ostrovsky, Repo and Gerrit Discussion, eke...@google.com

On 3 Jan 2017, at 21:35, David Ostrovsky <david.o...@gmail.com> wrote:

> On Tuesday, January 3, 2017 at 9:48:49 PM UTC+1, lucamilanesio wrote:
>> And in some builds (e.g. https://gerrit-ci.gerritforge.com/job/Gerrit-verifier-change/5164/console) the Bazel one has even Junit instantiation errors.
>>
>> See:
>> 3) withStart(com.google.gerrit.server.query.group.AbstractQueryGroupsTest)
>> java.lang.InstantiationException
>> at sun.reflect.InstantiationExceptionConstructorAccessorImpl.newInstance(InstantiationExceptionConstructorAccessorImpl.java:48)
>
> Well, let us not blame Bazel for not being able to instantiate an
> abstract class ;-)

That's fine, I am not blaming anyone, just noticing differences :-)

Another strange thing: why only sometimes? Why, when I run the exact same build again, do some of those "genuine" failures (like this one) go away? Why only with Bazel and not with Buck?

> @Ignore annotation is missing on new introduced AbstractQueryGroupsTest
> class, while it is there on old AbstractQueryAccountsTest class.
>
>> Other ideas are more than welcome :-)
>
> I would stop all these retrying attempts, and only build once.
> Actually we already verify every single change 4 times:
>
> (Buck | Bazel) x (ReviewDb | NoteDbReadWrite)
>
> This should be more than enough.

I can try to set the retry to zero and see how it goes for a few days.

Agreed, these retry cycles make the builds last *hours*, which is very annoying :-(

Luca Milanesio

Jan 3, 2017, 7:26:29 PM

to David Ostrovsky, Repo and Gerrit Discussion, eke...@google.com

To be fair, the AbstractQueryGroupsTest class should be flagged as @Ignore because it is actually abstract and its tests cannot be instantiated.

But (again) this change did not introduce that class, it just modified some parts of it. Why on earth was the Bazel build working before for that test, ignoring it, and is not doing so anymore in this patch set?

And secondly, why does Buck work?

That's very confusing :-(

Luca.

David Ostrovsky

Jan 4, 2017, 12:54:32 AM

to Repo and Gerrit Discussion, eke...@google.com

On Wednesday, January 4, 2017 at 1:26:29 AM UTC+1, lucamilanesio wrote:

> To be fair, the AbstractQueryGroupsTest class should be flagged as @Ignore because it is actually abstract and its tests cannot be instantiated.
> But (again) this change did not introduce that class, just modified some parts of it. Why on earth Bazel build was working before for that test, ignoring it, and now isn't doing it anymore in this patch-set?

Was it? I don't see any successful verification with Bazel for this specific change until I fixed it in patch set 4.

> And secondly, why Buck works instead?

The difference here is in the java_test rule implementations of the two build toolchains.

Buck derives the actual tests from the sources and, in doing so, checks whether a test class is abstract or an interface and skips it: [1].

Bazel expects a test suite file and, because the Gerrit project doesn't have them, they are generated by this Bazlet: [2]. The generation takes place for each and every file, no matter whether it is abstract or not. That's why the only ways to skip tests in Bazel for now are to leave abstract classes out of the source set provided to the java_test rule, or to annotate them with @Ignore.

* [1] https://github.com/facebook/buck/blob/master/src/com/facebook/buck/testrunner/JUnitRunner.java#L117-L122

* [2] https://github.com/gerrit-review/gerrit/blob/master/tools/bzl/junit.bzl#L56-L63
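To illustrate why a generator that only sees file names cannot skip abstract classes, here is a hedged sketch in Java of turning source paths into a JUnit suite declaration. It is not a port of junit.bzl: the "/java/" source-root marker, the suite class name and the example path are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class SuiteGenerator {
  /** Maps a source path to a class name by cutting at an assumed "/java/" source root. */
  static String toClassName(String path) {
    String marker = "/java/";
    int index = path.indexOf(marker);
    String relative = index >= 0 ? path.substring(index + marker.length()) : path;
    if (relative.endsWith(".java")) {
      relative = relative.substring(0, relative.length() - ".java".length());
    }
    return relative.replace('/', '.');
  }

  /** Emits suite source text that lists every given file, abstract or not. */
  static String generateSuite(List<String> sources) {
    List<String> classes = new ArrayList<>();
    for (String source : sources) {
      // Only the path is known here; nothing says whether the class is abstract.
      classes.add(toClassName(source) + ".class");
    }
    return "@RunWith(Suite.class)\n"
        + "@Suite.SuiteClasses({" + String.join(", ", classes) + "})\n"
        + "public class AllTestsSuite {}\n";
  }

  public static void main(String[] args) {
    // Illustrative path; the real location in the Gerrit tree may differ.
    System.out.print(generateSuite(Arrays.asList(
        "gerrit-server/src/test/java/com/google/gerrit/server/query/group/AbstractQueryGroupsTest.java")));
  }
}
```

A runner that loads the classes, as Buck is described to do above, can inspect modifiers and skip them; a text generator like this one cannot.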

Luca Milanesio

Jan 4, 2017, 2:49:49 AM

to David Ostrovsky, Repo and Gerrit Discussion, eke...@google.com

On 4 Jan 2017, at 05:54, David Ostrovsky <david.o...@gmail.com> wrote:

> On Wednesday, January 4, 2017 at 1:26:29 AM UTC+1, lucamilanesio wrote:
>> To be fair, the AbstractQueryGroupsTest class should be flagged as @Ignore because it is actually abstract and its tests cannot be instantiated.
>> But (again) this change did not introduce that class, just modified some parts of it. Why on earth Bazel build was working before for that test, ignoring it, and now isn't doing it anymore in this patch-set?
>
> Was it? I don't see any successfull verification with Bazel for
> this specific change unless i fixed it in patch set 4.

I thought the class was an existing one, but it was instead created in the previous change, which was failing for exactly the same reason.

It makes sense then, and it is failing systematically, which is much better :-)

>> And secondly, why Buck works instead?
>
> The difference here is in java_test rule implementations for different
> build tool chains.
>
> Buck induces the actual tests from the sources and by doing this
> respects if the test class is abstract or interface and skip it: [1].
>
> Bazel expects test suite file and because Gerrit project doesn't
> have them, they are generated in this Bazlet: [2]. The generation
> takes place for each and every file, no matter if it is abstract or not.
> That why the only way to skip tests in Bazel for now is not to include
> abstract classes in sources set provided to the java_test rule or
> annotate them with @Ignore annotation.
>
> * [1] https://github.com/facebook/buck/blob/master/src/com/facebook/buck/testrunner/JUnitRunner.java#L117-L122
> * [2] https://github.com/gerrit-review/gerrit/blob/master/tools/bzl/junit.bzl#L56-L63

Thanks for the detailed explanation, it makes sense to me now. We should then have a "check" method to fail the build fast in both Buck and Bazel if we find any abstract test class.

Failing immediately on both is *much better* than running for hours on both and getting a partial success, and only on Buck :-(

It shouldn't be difficult to write after all :-)
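A possible shape for such a check, as a sketch only: it flags abstract classes and interfaces among the test candidates unless they carry a skip marker. The @Skipped annotation is a local stand-in for JUnit's @Ignore so that the example runs without JUnit, the nested classes are placeholders, and exempting annotated classes is an assumption about the intended rule.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Modifier;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class AbstractTestCheck {
  /** Local stand-in for JUnit's @Ignore (hypothetical, for illustration only). */
  @Retention(RetentionPolicy.RUNTIME)
  @Target(ElementType.TYPE)
  @interface Skipped {}

  // Placeholder classes named after the ones discussed in this thread.
  @Skipped
  abstract static class AbstractQueryAccountsTest {}

  abstract static class AbstractQueryGroupsTest {}

  static class QueryGroupsTest extends AbstractQueryGroupsTest {}

  /** Returns the simple names of abstract or interface candidates that lack the skip marker. */
  static List<String> offenders(List<Class<?>> candidates) {
    List<String> bad = new ArrayList<>();
    for (Class<?> c : candidates) {
      boolean notInstantiable = c.isInterface() || Modifier.isAbstract(c.getModifiers());
      if (notInstantiable && !c.isAnnotationPresent(Skipped.class)) {
        bad.add(c.getSimpleName());
      }
    }
    return bad;
  }

  /** Fails fast, before any test runs, if an offender is found. */
  static void check(List<Class<?>> candidates) {
    List<String> bad = offenders(candidates);
    if (!bad.isEmpty()) {
      throw new IllegalStateException("Abstract test classes without skip marker: " + bad);
    }
  }

  public static void main(String[] args) {
    System.out.println(offenders(Arrays.asList(
        AbstractQueryAccountsTest.class, AbstractQueryGroupsTest.class, QueryGroupsTest.class)));
  }
}
```

Run as a cheap first step in both builds, a check like this would report the offending class in seconds.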

Luca Milanesio

Jan 4, 2017, 4:28:42 AM

to David Ostrovsky, Edwin Kempin, Repo and Gerrit Discussion

Builds have seemed a lot more stable and consistent over the past 24h, so I am going to disable the retry mechanism NOW and see how it goes.

Ideally, all the remaining (genuinely flaky) tests should just be labelled as such and skipped, rather than going through an expensive and slow retry mechanism.

Luca.

Edwin Kempin

Jan 4, 2017, 4:31:04 AM

to Luca Milanesio, David Ostrovsky, Repo and Gerrit Discussion

On Wed, Jan 4, 2017 at 10:28 AM, Luca Milanesio <luca.mi...@gmail.com> wrote:

> Builds seemed a lot more stable and consistent over the past 24h,

\o/

> I am going to disable NOW the retry mechanism and see how it goes.

+1 sounds good to me :)

Dave Borowitz

Jan 10, 2017, 10:58:29 AM

to Edwin Kempin, Luca Milanesio, David Ostrovsky, Repo and Gerrit Discussion

I'm still seeing a number of flaky test timeouts in the bazel builds. Is this still an outstanding known issue? For example: https://gerrit-review.googlesource.com/93830

David Ostrovsky

Jan 10, 2017, 1:06:18 PM

to Repo and Gerrit Discussion, eke...@google.com, luca.mi...@gmail.com

On Tuesday, January 10, 2017 at 4:58:29 PM UTC+1, Dave Borowitz wrote:

> I'm still seeing a number of flaky test timeouts in the bazel builds. Is this still an outstanding known issue? For example: https://gerrit-review.googlesource.com/93830

Looks like a Jenkins issue to me: [1]. The better question is: why does it only affect the Bazel build, while the Buck build just works? Luca?

* [1] https://issues.jenkins-ci.org/browse/JENKINS-28476

Luca Milanesio

Jan 10, 2017, 6:36:34 PM

to David Ostrovsky, Repo and Gerrit Discussion, eke...@google.com

So, we have:

1. Multiple timeouts in tests

https://gerrit-ci.gerritforge.com/job/Gerrit-verifier-bazel/1023/console

//gerrit-acceptance-tests/src/test/java/com/google/gerrit/acceptance/api/change:api_change TIMEOUT in 906.5s
//gerrit-acceptance-tests/src/test/java/com/google/gerrit/acceptance/git:git TIMEOUT in 905.0s
//gerrit-acceptance-tests/src/test/java/com/google/gerrit/acceptance/rest/change:rest_change_other TIMEOUT in 905.0s
//gerrit-acceptance-tests/src/test/java/com/google/gerrit/acceptance/rest/change:rest_change_submit TIMEOUT in 905.0s
//gerrit-acceptance-tests/src/test/java/com/google/gerrit/acceptance/rest/project:rest_project TIMEOUT in 905.1s
//gerrit-acceptance-tests/src/test/java/com/google/gerrit/acceptance/server/change:server_change TIMEOUT in 908.5s
//gerrit-server:query_tests TIMEOUT in 908.4s

2. Unexpected termination of the channel

https://gerrit-ci.gerritforge.com/job/Gerrit-verifier-bazel/1058/console

FATAL: java.io.IOException: Unexpected termination of the channel
hudson.remoting.RequestAbortedException: java.io.IOException: Unexpected termination of the channel

This second one was running on an Intel NUC that I had on my desk (Building remotely on nuc-1214f55d7ded): I manually switch that on during the day to give some extra boost to the builds.

I may actually have pulled the network plug :-( oops !

Luca.

Han-Wen Nienhuys

Jan 11, 2017, 6:54:12 AM

to Luca Milanesio, David Ostrovsky, Repo and Gerrit Discussion, Edwin Kempin

Does buck support test timeouts? How quickly do these test cases run with Buck?

--

Han-Wen Nienhuys
Google Munich
han...@google.com

Luca Milanesio

Jan 11, 2017, 7:15:41 AM

to Han-Wen Nienhuys, David Ostrovsky, Repo and Gerrit Discussion, Edwin Kempin

I do not see any execution stats in Buck :-(

@David you are the Buck-expert here, any ideas?

Luca.

David Ostrovsky

Jan 11, 2017, 7:59:01 AM

to Repo and Gerrit Discussion, han...@google.com, eke...@google.com

On Wednesday, January 11, 2017 at 1:15:41 PM UTC+1, lucamilanesio wrote:

> I do not see any execution stats in Buck :-(
>
> @David you are the Buck-expert here, any ideas?

Sure, it's all there, but we don't use this facility (the default is no timeout).

There are three places in Buck where a test timeout can be set:

1. per test, in .buckconfig: [1]

    [test]
    timeout = 300000

   The default is no timeout. A JUnit test can override this via the @Test annotation.

2. per java_test() rule, in .buckconfig: [2]

    [test]
    rule_timeout = 1200000

   The number of milliseconds per *_test rule to allow before stopping it and reporting a failure. The default is no timeout.

3. override those settings in the java_test() rule itself: [3]

   test_rule_timeout_ms: If set, specifies the maximum amount of time (in milliseconds) in which all of the tests in this rule should complete. This overrides the default rule_timeout if any has been specified in test.rule_timeout.

[1] https://buckbuild.com/concept/buckconfig.html#test.timeout

[2] https://buckbuild.com/concept/buckconfig.html#test.rule_timeout

[3] https://buckbuild.com/rule/java_test.html
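The precedence between points 2 and 3 can be written down as a tiny sketch: a value set on the rule wins over the .buckconfig default, and if neither is set there is no timeout. The class and method names are hypothetical; this models the rule as described above, not Buck's implementation.

```java
import java.util.OptionalLong;

class BuckRuleTimeout {
  /**
   * Effective per-rule timeout in milliseconds: the rule's test_rule_timeout_ms if set,
   * else the test.rule_timeout default from .buckconfig, else empty (no timeout).
   */
  static OptionalLong effectiveMs(OptionalLong testRuleTimeoutMs, OptionalLong configRuleTimeout) {
    return testRuleTimeoutMs.isPresent() ? testRuleTimeoutMs : configRuleTimeout;
  }

  public static void main(String[] args) {
    OptionalLong configDefault = OptionalLong.of(1_200_000L); // value from the example above
    OptionalLong onRule = OptionalLong.of(600_000L);          // hypothetical per-rule value
    System.out.println(effectiveMs(onRule, configDefault));
    System.out.println(effectiveMs(OptionalLong.empty(), configDefault));
    System.out.println(effectiveMs(OptionalLong.empty(), OptionalLong.empty()).isPresent());
  }
}
```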

Luca Milanesio

Jan 12, 2017, 4:06:04 AM

to Repo and Gerrit Discussion, Han-Wen Nienhuys, Edwin Kempin, David Ostrovsky, Dave Borowitz

I've disabled the Bazel test timeouts for the past 12h, by forcing all the test labels to use a 3600 sec timeout value.
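For context, the TIMEOUT entries at roughly 905 s reported earlier in this thread are consistent with Bazel's default limit for tests of size "large" (900 s), and 3600 s is the default for the largest size. A small sketch of that mapping: the default values are Bazel's documented ones to the best of my knowledge, while the class and helper are illustrative only.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class BazelTimeouts {
  /** Default timeout in seconds per test size, ordered from smallest to largest. */
  static final Map<String, Integer> DEFAULT_TIMEOUT_SECONDS = new LinkedHashMap<>();

  static {
    DEFAULT_TIMEOUT_SECONDS.put("small", 60);
    DEFAULT_TIMEOUT_SECONDS.put("medium", 300);
    DEFAULT_TIMEOUT_SECONDS.put("large", 900);
    DEFAULT_TIMEOUT_SECONDS.put("enormous", 3600);
  }

  /** Returns the smallest size whose default timeout covers the runtime, or null if none does. */
  static String smallestSizeFor(double runtimeSeconds) {
    for (Map.Entry<String, Integer> entry : DEFAULT_TIMEOUT_SECONDS.entrySet()) {
      if (runtimeSeconds <= entry.getValue()) {
        return entry.getKey();
      }
    }
    return null;
  }

  public static void main(String[] args) {
    // A suite needing a hypothetical 1000 s does not fit "large" and needs the next size up.
    System.out.println(smallestSizeFor(1000));
  }
}
```

Forcing 3600 s everywhere hides the symptom; giving each slow suite the size that matches its real runtime would keep the shorter limits useful for the rest.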

I see a lot more green in the Bazel build trends.

I'll go through the reds to see how many of them are still due to timeouts or other flaky causes.

Luca.
