SSH InterruptedByTimeoutException while pushing a big Git repository in Gerrit 2.14

Makson Lee

May 2, 2017, 6:15:12 AM
to Repo and Gerrit Discussion
Why am I getting a timeout exception, and where do I configure the timeout value?

[2017-05-02 17:56:01,062] [sshd-SshServer[66e218d8]-nio2-thread-5] WARN  org.apache.sshd.server.session.ServerSessionImpl : exceptionCaught(ServerSessionImpl[xxxx@/172.17.113.10:56544])[state=Opened] InterruptedByTimeoutException: null
[2017-05-02 17:59:02,907] [SSH git-receive-pack '/alps/kernel-3.18' (xxxx)] ERROR com.google.gerrit.sshd.BaseCommand : Internal server error (user xxxx account 1000207) during git-receive-pack '/alps/kernel-3.18'
org.apache.sshd.common.SshException: write(ChannelOutputStream[ChannelSession[id=0, recipient=0]-ServerSessionImpl[xxxx@/172.17.113.10:56544]] SSH_MSG_CHANNEL_DATA) len=49 - channel already closed
        at org.apache.sshd.common.channel.ChannelOutputStream.write(ChannelOutputStream.java:106)
        at org.eclipse.jgit.transport.SideBandOutputStream.writeBuffer(SideBandOutputStream.java:171)
        at org.eclipse.jgit.transport.SideBandOutputStream.flushBuffer(SideBandOutputStream.java:127)
        at org.eclipse.jgit.transport.BaseReceivePack.close(BaseReceivePack.java:1790)
        at org.eclipse.jgit.transport.ReceivePack.receive(ReceivePack.java:211)
        at com.google.gerrit.sshd.commands.Receive.runImpl(Receive.java:96)
        at com.google.gerrit.sshd.AbstractGitCommand.service(AbstractGitCommand.java:97)
        at com.google.gerrit.sshd.AbstractGitCommand.access$000(AbstractGitCommand.java:30)
        at com.google.gerrit.sshd.AbstractGitCommand$1.run(AbstractGitCommand.java:63)
        at com.google.gerrit.sshd.BaseCommand$TaskThunk.run(BaseCommand.java:418)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
        at com.google.gerrit.server.git.WorkQueue$Task.run(WorkQueue.java:418)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)

Makson Lee

May 3, 2017, 9:12:16 AM
to Repo and Gerrit Discussion
Switching the sshd backend from the default NIO2 to MINA fixed the problem.
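
For anyone trying the same workaround, here is a minimal sketch of the change. It assumes the setting lives in the [sshd] section of $site_path/etc/gerrit.config, that all other [sshd] values are left as they are, and that Gerrit is restarted afterwards so the SSH daemon picks up the new backend.

```
[sshd]
        # Assumption: every other [sshd] setting stays unchanged.
        # NIO2 is the default backend; MINA selects the older one.
        backend = MINA
```

This only sidesteps the timeout; it does not explain why the NIO2 backend closes the session.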

Luca Milanesio

May 3, 2017, 9:29:49 AM
to Makson Lee, Repo and Gerrit Discussion
That isn't a good sign: it means that the MINA backend does not respect the SSH timeout.

Luca.


Makson Lee

May 3, 2017, 9:41:13 AM
to Repo and Gerrit Discussion, cdle...@gmail.com
So, where can we set the SSH timeout value for the NIO2 and MINA backends? I can't find it in the documentation.

Luca Milanesio

May 3, 2017, 10:10:26 AM
to Makson Lee, Repo and Gerrit Discussion
https://gerrit-documentation.storage.googleapis.com/Documentation/2.14/config-gerrit.html#sshd.idleTimeout

It should be zero (unlimited) by default, but it could be that the NIO2 backend has a different default from the MINA one.

Luca.

Makson Lee

May 3, 2017, 10:16:54 AM
to Repo and Gerrit Discussion, cdle...@gmail.com
Thanks, I also found this earlier, but it says that the default is 0, so I didn't think it was relevant. Anyway, I will try setting it explicitly to 0 to see if that fixes the problem.

Makson Lee

May 3, 2017, 6:39:26 PM
to Repo and Gerrit Discussion, cdle...@gmail.com
Unfortunately, I have set idleTimeout explicitly to zero but still get InterruptedByTimeoutException :-(

[sshd]
        listenAddress = *:29418
        threads = 24
        batchThreads = 4
        rekeyTimeLimit = 0
        rekeyBytesLimit = 1099511627776
        backend = NIO2
        idleTimeout = 0

Luca Milanesio

May 3, 2017, 7:28:34 PM
to Makson Lee, Repo and Gerrit Discussion
Have you tried raising it to a very high value instead (e.g. 30 minutes or so)?

Luca.

Makson Lee

May 3, 2017, 8:56:41 PM
to Repo and Gerrit Discussion, cdle...@gmail.com
That didn't work either, with the following configuration:

[sshd]
        listenAddress = *:29418
        threads = 24
        batchThreads = 4
        rekeyTimeLimit = 0
        rekeyBytesLimit = 1099511627776
        backend = NIO2
        idleTimeout = 1 hr

luca.mi...@gmail.com

May 4, 2017, 2:13:32 AM
to Makson Lee, Repo and Gerrit Discussion
Then it could well be a critical problem with the NIO2 backend in Apache MINA :-(

Funnily enough, we switched to NIO2 a few releases ago because it was far more stable!

Luca


Makson Lee

May 4, 2017, 4:11:28 AM
to Repo and Gerrit Discussion, cdle...@gmail.com
We didn't have any problems with NIO2 in 2.13.7 either. Anyway, as a workaround, we switched the SSHD backend from NIO2 to MINA. Perhaps someone else can help verify whether a long-running push causes an NIO2 InterruptedByTimeoutException in 2.14.

Robert Pearce

May 9, 2017, 9:34:12 PM
to Repo and Gerrit Discussion, cdle...@gmail.com

I can confirm. We are seeing the same behavior and the same error message immediately after bumping our version to 2.14. I actually ended up here while looking for the fix.

Robert Pearce

May 9, 2017, 9:38:07 PM
to Repo and Gerrit Discussion, cdle...@gmail.com
Is this being tracked as an official bug at all? I couldn't find it. We're not specifying any timeout, but I tried using the MINA backend and we're still seeing the timeout issue. For clients it shows up as a hang after pushing the patch, but it looks like the patch does get pushed OK.

David Pursehouse

May 10, 2017, 1:28:01 AM
to Robert Pearce, Repo and Gerrit Discussion, cdle...@gmail.com
On Wed, May 10, 2017 at 10:38 AM Robert Pearce <robert...@totaralearning.com> wrote:
> is this being tracked as an official bug at all?

I just entered it in the tracker now:


What length timeout are you seeing with nio2?  I found this issue upstream on sshd:


and I'm wondering if that's related.

Robert Pearce

May 10, 2017, 4:07:58 PM
to David Pursehouse, Repo and Gerrit Discussion, cdle...@gmail.com

We hadn't set the idle timeout at all (in fact, I believe we had no SSH settings other than the port defined), so it was all stock out of the box and was working prior to the upgrade. I'm not sure what the actual timeout was with NIO2, but it was long enough that no one using the system waited that long to find out.

With MINA it's still doing the same thing, but now the patch is actually getting pushed (albeit the client appears to hang when pushing). So we're on that backend now, as it's better than nothing.

David Pursehouse

May 10, 2017, 7:25:34 PM
to Robert Pearce, Repo and Gerrit Discussion, cdle...@gmail.com
On Thu, May 11, 2017 at 5:07 AM Robert Pearce <robert...@totaralearning.com> wrote:
> We hadn't set the idle timeout at all (in fact, I believe we had no SSH settings other than the port defined), so it was all stock out of the box and was working prior to the upgrade. I'm not sure what the actual timeout was with NIO2, but it was long enough that no one using the system waited that long to find out.


As far as I understand from this:


the default when not explicitly set is 10 minutes, so I was wondering if that's the same timeout you're seeing. 
 
I've pushed a WiP fix here:


Would you be able to build gerrit with that and see if it fixes your problem?

Makson Lee

May 11, 2017, 1:41:58 AM
to Repo and Gerrit Discussion, robert...@totaralearning.com, cdle...@gmail.com
The fix works for me, thanks.

luca.mi...@gmail.com

May 11, 2017, 2:16:06 AM
to David Pursehouse, Robert Pearce, Repo and Gerrit Discussion, cdle...@gmail.com


