We have been having an issue over the last couple of days where gerrits git connections over ssh become unresponsive. There is very little to go on in the logs.
If I try do do any git operation on a repository nothing happens. In the sshd_log I get the following once I ctrl-c the command
In the error log I can see lots of the following traces, however they do not correlate to the fail connection attempts there are entries for NioProcessor1-9 and they vary between CancelledKeyException and ConnectionResetByPeer:
[2016-08-30 23:44:48,914] [NioProcessor-4] WARN com.google.gerrit.sshd.GerritServerSession : Exception caught
java.nio.channels.CancelledKeyException
at sun.nio.ch.SelectionKeyImpl.ensureValid(SelectionKeyImpl.java:73)
at sun.nio.ch.SelectionKeyImpl.interestOps(SelectionKeyImpl.java:77)
at org.apache.mina.transport.socket.nio.NioProcessor.setInterestedInWrite(NioProcessor.java:287)
at org.apache.mina.transport.socket.nio.NioProcessor.setInterestedInWrite(NioProcessor.java:45)
at org.apache.mina.core.polling.AbstractPollingIoProcessor.flushNow(AbstractPollingIoProcessor.java:880)
at org.apache.mina.core.polling.AbstractPollingIoProcessor.flush(AbstractPollingIoProcessor.java:778)
at org.apache.mina.core.polling.AbstractPollingIoProcessor.access$700(AbstractPollingIoProcessor.java:67)
at org.apache.mina.core.polling.AbstractPollingIoProcessor$Processor.run(AbstractPollingIoProcessor.java:1126)
at org.apache.mina.util.NamePreservingRunnable.run(NamePreservingRunnable.java:64)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
[2016-08-30 23:44:52,397] [NioProcessor-5] WARN com.google.gerrit.sshd.GerritServerSession : Exception caught
java.io.IOException: Connection reset by peer
at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
at sun.nio.ch.IOUtil.read(IOUtil.java:197)
at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:384)
at org.apache.mina.transport.socket.nio.NioProcessor.read(NioProcessor.java:302)
at org.apache.mina.transport.socket.nio.NioProcessor.read(NioProcessor.java:45)
at org.apache.mina.core.polling.AbstractPollingIoProcessor.read(AbstractPollingIoProcessor.java:694)
at org.apache.mina.core.polling.AbstractPollingIoProcessor.process(AbstractPollingIoProcessor.java:668)
at org.apache.mina.core.polling.AbstractPollingIoProcessor.process(AbstractPollingIoProcessor.java:657)
at org.apache.mina.core.polling.AbstractPollingIoProcessor.access$600(AbstractPollingIoProcessor.java:67)
at org.apache.mina.core.polling.AbstractPollingIoProcessor$Processor.run(AbstractPollingIoProcessor.java:1121)
at org.apache.mina.util.NamePreservingRunnable.run(NamePreservingRunnable.java:64)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Finally jstat shows the following, though I'll admit my java foo is not really existent and I don't fully know how to interpret that I believe from google that it is ok:
At this point gerrit is unusable, a restart will get us up and running again for a few hours. Any help or advice on where to look next would be appreciated,