Re: Gerrit CI: java.io.IOException: Backing channel 'docker-21de6187bec93a' is disconnected.


Luca Milanesio

Nov 5, 2018, 4:02:42 AM
to Ole Rehmsen, Luca Milanesio, repo-d...@googlegroups.com, Patrick Hiesel


On 5 Nov 2018, at 08:59, Ole Rehmsen <ol...@google.com> wrote:


Yes, working on that.
For some reason the SSH keys of the slaves are not accepted.

Luca.

Luca Milanesio

Nov 5, 2018, 6:57:10 AM
to Ole Rehmsen, Luca Milanesio, repo-d...@googlegroups.com, Patrick Hiesel


On 5 Nov 2018, at 10:56, Ole Rehmsen <ol...@google.com> wrote:

Great to hear. Is there a bug/thread I can follow, or will you report back here?

Problem is fixed now, builds have been resumed.


Also, it feels like CI is taking much longer for a week now or so

a *week* ??? Can you give me some examples?

The typical turnaround is between 30 minutes and 1 hour (the real maximum).
I'm trying to push for skipping the ReviewDb validation on master (see [1]) so that it could be between 15 and 30 minutes :-)

- used to come back within 1-2 hours

That's slow; it should be half of that time.

, now it's sometimes half a day. Is there a link where one can see the queue?


If you log in (top-right link) with your GitHub account and send me your username, I can grant you more permissions.

Luca.


Ole Rehmsen

Nov 5, 2018, 7:11:17 AM
to Luca Milanesio, repo-d...@googlegroups.com, Patrick Hiesel
Great to hear that CI should work again. Thanks!

It's been a week since it seemed slower to me - not a week that the CI is running. Example: 
The -1 was removed at 8:38 AM UTC+1. The run is still not done, at 1:07 PM UTC+1, which is more than 4 hours. I am seeing the same for basically all other changes in that stack.

I logged into computer, but am not seeing my builds. Maybe I don't know where to look? My GitHub id is rehmsen, if I need permissions or anything.

Again, thanks for your help.

Ole Rehmsen

Nov 5, 2018, 7:11:17 AM
to Luca Milanesio, repo-d...@googlegroups.com, Patrick Hiesel
Great to hear. Is there a bug/thread I can follow, or will you report back here?

Also, it feels like CI is taking much longer for a week now or so - used to come back within 1-2 hours, now it's sometimes half a day. Is there a link where one can see the queue?

On Mon, Nov 5, 2018 at 10:02 AM Luca Milanesio <luca.mi...@gmail.com> wrote:

Luca Milanesio

Nov 5, 2018, 7:46:49 AM
to Ole Rehmsen, Luca Milanesio, repo-d...@googlegroups.com, Patrick Hiesel

On 5 Nov 2018, at 12:09, Ole Rehmsen <ol...@google.com> wrote:

Great to hear that CI should work again. Thanks!

Spoke too early, found another problem :-(

Luca Milanesio

Nov 5, 2018, 7:56:34 AM
to Ole Rehmsen, Luca Milanesio, repo-d...@googlegroups.com, Patrick Hiesel

On 5 Nov 2018, at 12:09, Ole Rehmsen <ol...@google.com> wrote:

Great to hear that CI should work again. Thanks!

It's been a week since it seemed slower to me - not a week that the CI is running. Example: 

Ah gotcha ;-)
During the hackathon the roundtrip is very fast (Google pays for extra nodes on GCloud). However, those instances are not available now, and the ones we use (slave1, slave2, nuc and dockerhost) are all hosted and paid for by GerritForge.

We can of course put more money into it :-) but a 1h delay for a validation seems "good enough" for us.

Luca.

Ole Rehmsen

Nov 5, 2018, 8:38:11 AM
to Luca Milanesio, repo-d...@googlegroups.com, Patrick Hiesel
I see, thanks for the explanation.
One hour would indeed be fine - but this one has been waiting for 6 hours now: https://gerrit-review.googlesource.com/c/gerrit/+/201490/11
If I wanted to find out what is taking so long, how would I do that? When I open https://gerrit-ci.gerritforge.com/computer/, I see "master" and "slave-1-..." both building, but they have supposedly started just minutes ago. I see nothing in the queue. Does that mean the CI run for my change was never started? Is there anything I can do to make it start? Or is this the second issue you just discovered?
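
For anyone with the same question: Jenkins exposes its build queue as JSON under `/queue/api/json`, so the waiting items can be listed without clicking through the UI. The following is only a minimal sketch. The helper names `fetch_queue` and `summarize_queue` are hypothetical, and it is an assumption that the Gerrit CI instance allows this endpoint to be read with your permissions.

```python
import json
import urllib.request


def fetch_queue(base_url):
    """Download the Jenkins build queue from <base_url>/queue/api/json.

    Assumption: the instance allows reading this endpoint with the
    caller's permissions; no authentication is handled here.
    """
    url = base_url.rstrip("/") + "/queue/api/json"
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)


def summarize_queue(payload, now_ms):
    """Return (job name, minutes waiting, reason) tuples, longest wait first.

    `payload` is the decoded JSON of the queue endpoint and `now_ms` is the
    current time in milliseconds since the epoch, the same unit Jenkins
    uses for `inQueueSince`.
    """
    rows = []
    for item in payload.get("items", []):
        name = item.get("task", {}).get("name", "?")
        waited_min = (now_ms - item.get("inQueueSince", now_ms)) // 60000
        rows.append((name, waited_min, item.get("why") or ""))
    return sorted(rows, key=lambda row: row[1], reverse=True)
```

Calling `summarize_queue(fetch_queue("https://gerrit-ci.gerritforge.com"), int(time.time() * 1000))` (with `import time`) would then print nothing but an empty list when the queue is empty, which is consistent with a change whose build was never scheduled rather than one that is stuck waiting.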

luca.mi...@gmail.com

Nov 5, 2018, 8:41:48 AM
to Ole Rehmsen, repo-d...@googlegroups.com, Patrick Hiesel



On 5 Nov 2018, at 13:37, Ole Rehmsen <ol...@google.com> wrote:

I see, thanks for the explanation.
One hour would indeed be fine - but this one has been waiting for 6 hours now: https://gerrit-review.googlesource.com/c/gerrit/+/201490/11
If I wanted to find out what is taking so long, how would I do that? When I open https://gerrit-ci.gerritforge.com/computer/, I see "master" and "slave-1-..." both building, but they have supposedly started just minutes ago. I see nothing in the queue. Does that mean the CI run for my change was never started? Is there anything I can do to make it start? Or is this the second issue you just discovered?

Today we had problems and all the builds were failing. I should finally have fixed the second problem and am just checking whether it is stable.

Then I’ll reopen the flood gates: there is a bit of a backlog though :-(

Today is not the typical day :-)

Luca

Luca Milanesio

Nov 5, 2018, 8:48:47 AM
to Ole Rehmsen, Luca Milanesio, repo-d...@googlegroups.com, Patrick Hiesel

Now it *seems* to work, but builds are still failing ... it seems to be a code issue.
I am going to validate master on CI and locally before declaring the incident resolved.

Luca.

Luca Milanesio

Nov 5, 2018, 9:35:21 AM
to Ole Rehmsen, Luca Milanesio, repo-d...@googlegroups.com, Patrick Hiesel

Gerrit master is green:

Opening the flood gates :-)
Apologies for the inconvenience, I was able to track down the root cause of the problem and revert the commit.

Unfortunately (my bad) I did not test the commit properly when it was merged; that's why it "loitered" for a few weeks before showing up :-(
I will invest more time into a validation system for the CI-related changes.

Luca.

Luca Milanesio

Nov 5, 2018, 9:36:46 AM
to Ole Rehmsen, Luca Milanesio, repo-d...@googlegroups.com, Patrick Hiesel

Gerrit has 27 change(s) since 2018-11-04 12:17:00.420 +0000 [202033, 187786, 202910, 202840, 197112, 202310, 194040, 201732, 202514, 202515, 202513, 202512, 202511, 202673, 202672, 202596, 202595, 202594, 202593, 202290, 196632, 202072, 202010, 201731, 201490, 202850, 202890]
... but I've got bandwidth for only 16 of them at the moment
================================================================================

27 changes to validate ... it will *definitely* take the whole afternoon, if not the night, to catch up :-(

Luca.

Luca Milanesio

Nov 5, 2018, 11:25:04 AM
to Repo and Gerrit Discussion, Luca Milanesio, Patrick Hiesel, Ole Rehmsen

It was a lot faster than I thought: only 1 left in queue ...
I believe the whole backlog should be cleared within 2h at most.

Luca

Ole Rehmsen

Nov 5, 2018, 1:05:13 PM
to Luca Milanesio, Repo and Gerrit Discussion, Patrick Hiesel
Thanks a bunch, Luca!