Gerrit SSH Connections Hang


Callum Scott

Aug 31, 2016, 4:31:30 AM
to Repo and Gerrit Discussion
We have been having an issue over the last couple of days where Gerrit's git connections over SSH become unresponsive.  There is very little to go on in the logs.

We are running 2.12.2 and:

[root@gerrit logs]# java -version
java version "1.7.0_91"
OpenJDK Runtime Environment (rhel-2.6.2.2.el6_7-x86_64 u91-b00)
OpenJDK 64-Bit Server VM (build 24.91-b01, mixed mode)

I am able to connect via SSH using "ssh user...@gerrit.server.com" and get the message to use git clone.

If I try to do any git operation on a repository, nothing happens.  In the sshd_log I get the following once I Ctrl-C the command:

[2016-08-30 23:19:49,209 +0000] 047c2d24 cscott a/38 LOGIN FROM 92.15.22.14
[2016-08-30 23:34:32,027 +0000] 047c2d24 cscott a/38 git-upload-pack.Servers.git 1ms 882668ms 0
[2016-08-30 23:34:34,054 +0000] 047c2d24 cscott a/38 LOGOUT

In the error log I can see lots of traces like the following, however they do not correlate with the failed connection attempts.  There are entries for NioProcessor threads 1-9, and they vary between CancelledKeyException and Connection reset by peer:

[2016-08-30 23:44:48,914] [NioProcessor-4] WARN  com.google.gerrit.sshd.GerritServerSession : Exception caught
java.nio.channels.CancelledKeyException
        at sun.nio.ch.SelectionKeyImpl.ensureValid(SelectionKeyImpl.java:73)
        at sun.nio.ch.SelectionKeyImpl.interestOps(SelectionKeyImpl.java:77)
        at org.apache.mina.transport.socket.nio.NioProcessor.setInterestedInWrite(NioProcessor.java:287)
        at org.apache.mina.transport.socket.nio.NioProcessor.setInterestedInWrite(NioProcessor.java:45)
        at org.apache.mina.core.polling.AbstractPollingIoProcessor.flushNow(AbstractPollingIoProcessor.java:880)
        at org.apache.mina.core.polling.AbstractPollingIoProcessor.flush(AbstractPollingIoProcessor.java:778)
        at org.apache.mina.core.polling.AbstractPollingIoProcessor.access$700(AbstractPollingIoProcessor.java:67)
        at org.apache.mina.core.polling.AbstractPollingIoProcessor$Processor.run(AbstractPollingIoProcessor.java:1126)
        at org.apache.mina.util.NamePreservingRunnable.run(NamePreservingRunnable.java:64)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)
[2016-08-30 23:44:52,397] [NioProcessor-5] WARN  com.google.gerrit.sshd.GerritServerSession : Exception caught
java.io.IOException: Connection reset by peer
        at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
        at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
        at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
        at sun.nio.ch.IOUtil.read(IOUtil.java:197)
        at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:384)
        at org.apache.mina.transport.socket.nio.NioProcessor.read(NioProcessor.java:302)
        at org.apache.mina.transport.socket.nio.NioProcessor.read(NioProcessor.java:45)
        at org.apache.mina.core.polling.AbstractPollingIoProcessor.read(AbstractPollingIoProcessor.java:694)
        at org.apache.mina.core.polling.AbstractPollingIoProcessor.process(AbstractPollingIoProcessor.java:668)
        at org.apache.mina.core.polling.AbstractPollingIoProcessor.process(AbstractPollingIoProcessor.java:657)
        at org.apache.mina.core.polling.AbstractPollingIoProcessor.access$600(AbstractPollingIoProcessor.java:67)
        at org.apache.mina.core.polling.AbstractPollingIoProcessor$Processor.run(AbstractPollingIoProcessor.java:1121)
        at org.apache.mina.util.NamePreservingRunnable.run(NamePreservingRunnable.java:64)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)

Finally, jstat shows the following, though I'll admit my Java knowledge is close to nonexistent and I don't fully know how to interpret it.  From what I've found on Google, I believe it is OK:

[root@gerrit logs]# jstat -gc 19981
 S0C    S1C    S0U    S1U      EC       EU        OC         OU       PC     PU    YGC     YGCT    FGC    FGCT     GCT
340480.0 45056.0  0.0   44560.9 802304.0 45029.1  1034240.0   814875.0  107520.0 59489.7     23    1.203   5      1.377    2.580
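For what it's worth, the old-generation occupancy can be read off the OC (capacity) and OU (used) columns above, both in KB. A quick sketch using the numbers from that output:

```shell
# OC = old generation capacity (KB), OU = old generation used (KB),
# taken from the jstat -gc output above.
OC=1034240
OU=814875
awk -v oc="$OC" -v ou="$OU" 'BEGIN { printf "old gen %.1f%% full\n", 100*ou/oc }'
# -> old gen 78.8% full
```

With only 5 full GCs totalling about 1.4 seconds, garbage collection does not look like the bottleneck here.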

At this point Gerrit is unusable; a restart will get us up and running again for a few hours.  Any help or advice on where to look next would be appreciated.

Callum

Saša Živkov

Aug 31, 2016, 4:45:31 AM
to Callum Scott, Repo and Gerrit Discussion
On Wed, Aug 31, 2016 at 10:31 AM, Callum Scott <scott....@gmail.com> wrote:
We have been having an issue over the last couple of days where Gerrit's git connections over SSH become unresponsive.  There is very little to go on in the logs.

We are running 2.12.2 and:

[root@gerrit logs]# java -version
java version "1.7.0_91"
OpenJDK Runtime Environment (rhel-2.6.2.2.el6_7-x86_64 u91-b00)
OpenJDK 64-Bit Server VM (build 24.91-b01, mixed mode)

I am able to connect via ssh using:  ssh user...@gerrit.server.com and get the message to use git clone.

If I try to do any git operation on a repository, nothing happens.  In the sshd_log I get the following once I Ctrl-C the command:

[2016-08-30 23:19:49,209 +0000] 047c2d24 cscott a/38 LOGIN FROM 92.15.22.14
[2016-08-30 23:34:32,027 +0000] 047c2d24 cscott a/38 git-upload-pack.Servers.git 1ms 882668ms 0

Looks like a very long processing time (882 seconds).
Do you regularly run "git gc" on your Git repositories in Gerrit?
Take several JVM thread dumps while your SSH command is running and check what the thread is doing.
Run "top -H" during that time; if a thread shows high CPU usage, find that thread in the thread dump.
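To connect the two outputs: top -H reports per-thread CPU with decimal thread IDs, while jstack labels each thread with a hex "nid=0x...". A sketch of the conversion (the thread ID and dump file name are illustrative):

```shell
# Convert a decimal thread ID from top -H into the hex nid that
# jstack prints, so the hot thread can be located in the dump.
tid=7512                        # example TID from the top -H output
nid=$(printf '0x%x' "$tid")
echo "$nid"                     # -> 0x1d58
# grep "nid=$nid" gerrit-stack.txt   # hypothetical dump taken with jstack
```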

 

--
--
To unsubscribe, email repo-discuss+unsubscribe@googlegroups.com
More info at http://groups.google.com/group/repo-discuss?hl=en

---
You received this message because you are subscribed to the Google Groups "Repo and Gerrit Discussion" group.
To unsubscribe from this group and stop receiving emails from it, send an email to repo-discuss+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Callum Scott

Aug 31, 2016, 5:33:02 AM
to Saša Živkov, Repo and Gerrit Discussion


On 31 August 2016 at 09:44, Saša Živkov <ziv...@gmail.com> wrote:

<snip>


Looks like a very long processing time (882 seconds).
Do you regularly run "git gc" on your Git repositories in Gerrit?

Yes, there should be a job that runs regularly to do this.  I'm not confident that it has been running, but I triggered a run manually yesterday, so it has been done in the last 24 hours.
 
Take several JVM thread dumps while your SSH command is running and check what the thread is doing.
Run "top -H" during that time; if a thread shows high CPU usage, find that thread in the thread dump.

I did this, but only got one result from the dump:

[root@gerrit logs]# jstack 7449
7449: Unable to open socket file: target process not responding or HotSpot VM not loaded
The -F option can be used when the target process is not responding
[root@gerrit logs]# jstack -F 7449
Attaching to process ID 7449, please wait...
Debugger attached successfully.
Server compiler detected.
JVM version is 24.91-b01
Deadlock Detection:

java.lang.RuntimeException: Unable to deduce type of thread from address 0x00007f41fc032000 (expected type JavaThread, CompilerThread, ServiceThread, JvmtiAgentThread, or SurrogateLockerThread)
        at sun.jvm.hotspot.runtime.Threads.createJavaThreadWrapper(Threads.java:162)
        at sun.jvm.hotspot.runtime.Threads.first(Threads.java:150)
        at sun.jvm.hotspot.runtime.DeadlockDetector.createThreadTable(DeadlockDetector.java:149)
        at sun.jvm.hotspot.runtime.DeadlockDetector.print(DeadlockDetector.java:56)
        at sun.jvm.hotspot.runtime.DeadlockDetector.print(DeadlockDetector.java:39)
        at sun.jvm.hotspot.tools.StackTrace.run(StackTrace.java:52)
        at sun.jvm.hotspot.tools.StackTrace.run(StackTrace.java:45)
        at sun.jvm.hotspot.tools.JStack.run(JStack.java:60)
        at sun.jvm.hotspot.tools.Tool.start(Tool.java:221)
        at sun.jvm.hotspot.tools.JStack.main(JStack.java:86)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at sun.tools.jstack.JStack.runJStackTool(JStack.java:136)
        at sun.tools.jstack.JStack.main(JStack.java:102)
Caused by: sun.jvm.hotspot.types.WrongTypeException: No suitable match for type of address 0x00007f41fc032000
        at sun.jvm.hotspot.runtime.InstanceConstructor.newWrongTypeException(InstanceConstructor.java:62)
        at sun.jvm.hotspot.runtime.VirtualConstructor.instantiateWrapperFor(VirtualConstructor.java:80)
        at sun.jvm.hotspot.runtime.Threads.createJavaThreadWrapper(Threads.java:158)
        ... 15 more
Can't print deadlocks:Unable to deduce type of thread from address 0x00007f41fc032000 (expected type JavaThread, CompilerThread, ServiceThread, JvmtiAgentThread, or SurrogateLockerThread)
Exception in thread "main" java.lang.reflect.InvocationTargetException
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at sun.tools.jstack.JStack.runJStackTool(JStack.java:136)
        at sun.tools.jstack.JStack.main(JStack.java:102)
Caused by: java.lang.RuntimeException: Unable to deduce type of thread from address 0x00007f41fc032000 (expected type JavaThread, CompilerThread, ServiceThread, JvmtiAgentThread, or SurrogateLockerThread)
        at sun.jvm.hotspot.runtime.Threads.createJavaThreadWrapper(Threads.java:162)
        at sun.jvm.hotspot.runtime.Threads.first(Threads.java:150)
        at sun.jvm.hotspot.tools.StackTrace.run(StackTrace.java:65)
        at sun.jvm.hotspot.tools.StackTrace.run(StackTrace.java:45)
        at sun.jvm.hotspot.tools.JStack.run(JStack.java:60)
        at sun.jvm.hotspot.tools.Tool.start(Tool.java:221)
        at sun.jvm.hotspot.tools.JStack.main(JStack.java:86)
        ... 6 more
Caused by: sun.jvm.hotspot.types.WrongTypeException: No suitable match for type of address 0x00007f41fc032000
        at sun.jvm.hotspot.runtime.InstanceConstructor.newWrongTypeException(InstanceConstructor.java:62)
        at sun.jvm.hotspot.runtime.VirtualConstructor.instantiateWrapperFor(VirtualConstructor.java:80)
        at sun.jvm.hotspot.runtime.Threads.createJavaThreadWrapper(Threads.java:158)
        ... 12 more


I restarted the Gerrit server and can now successfully fetch, pull, etc., for now.  However, when I tried jstack again out of curiosity I got the same as above.

top does not show any massive amount of CPU being used:

7486 gerrit2   20   0 12.7g 496m  17m S  0.3  3.3   0:00.06 java
 7512 gerrit2   20   0 12.7g 496m  17m S  0.3  3.3   0:00.06 java

[root@gerrit logs]# free -h
             total       used       free     shared    buffers     cached
Mem:           14G        12G       2.1G       836K       176M        11G
-/+ buffers/cache:       1.1G        13G
Swap:           0B         0B         0B
--
Callum

Saša Živkov

Aug 31, 2016, 5:40:31 AM
to Callum Scott, Repo and Gerrit Discussion
On Wed, Aug 31, 2016 at 11:32 AM, Callum Scott <scott....@gmail.com> wrote:


On 31 August 2016 at 09:44, Saša Živkov <ziv...@gmail.com> wrote:

<snip>

Looks like a very long processing time (882 seconds).
Do you regularly run "git gc" on your Git repositories in Gerrit?

Yes, there should be a job that runs regularly to do this.  I'm not confident that it has been running, but I triggered a run manually yesterday, so it has been done in the last 24 hours.
 
Take several JVM thread dumps while your SSH command is running and check what the thread is doing.
Run "top -H" during that time; if a thread shows high CPU usage, find that thread in the thread dump.

I did this, but only got one result from the dump:

[root@gerrit logs]# jstack 7449
7449: Unable to open socket file: target process not responding or HotSpot VM not loaded

Which user is used to run the Gerrit process?
Please try using the same user for running jstack.
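A sketch of why the user matters (the PID is from the earlier jstack attempt; the path is the HotSpot attach socket on Linux):

```shell
# jstack attaches through a socket at /tmp/.java_pid<PID>, created and
# owned by the user running the JVM. Running jstack as root against a
# JVM owned by gerrit2 typically fails with "Unable to open socket file".
GERRIT_PID=7449                      # PID from the earlier jstack attempt
sock="/tmp/.java_pid${GERRIT_PID}"
echo "$sock"                         # -> /tmp/.java_pid7449
# ls -l "$sock"                      # check the socket's owner, then:
# sudo -u gerrit2 jstack "$GERRIT_PID" > /tmp/gerrit-stack.txt
```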

Callum Scott

Aug 31, 2016, 6:40:59 AM
to Saša Živkov, Repo and Gerrit Discussion

On 31 August 2016 at 10:39, Saša Živkov <ziv...@gmail.com> wrote:

Which user is used to run the Gerrit process?
Please try using the same user for running jstack.


I ran jstack as the gerrit user at our most recent crash and got some more useful results, which are attached.

Unfortunately I can't interpret them.
--
Callum
Attachments: stack1, stack2, stack3, stack4, stack5

Saša Živkov

Aug 31, 2016, 8:10:42 AM
to Callum Scott, Repo and Gerrit Discussion
On Wed, Aug 31, 2016 at 12:40 PM, Callum Scott <scott....@gmail.com> wrote:


On 31 August 2016 at 10:39, Saša Živkov <ziv...@gmail.com> wrote:

Which user is used to run the Gerrit process?
Please try using the same user for running jstack.


I ran as the gerrit user at our most recent crash and got some more useful results which are attached.

Which operation was hanging during the time you were taking the thread dumps?

Callum Scott

Aug 31, 2016, 11:53:46 AM
to Saša Živkov, Repo and Gerrit Discussion
OK, I think I have tracked down the cause of the issue.  We have a reasonably large number of servers that do a git fetch periodically as the same user.  The timings of the fetches had converged, and when the user reached its maximum number of concurrent connections it crashed Gerrit.  That's the best working theory I have just now.  I upped the number of concurrent connections and tweaked the SSH timeouts, and we haven't fallen over again, so far.
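For anyone finding this later, the relevant knobs live in the sshd section of gerrit.config; the values below are illustrative, not what we actually set:

```ini
[sshd]
  # Default is 64 concurrent SSH sessions per user account; 0 = unlimited.
  maxConnectionsPerUser = 128
  # Default is 0 (no timeout); disconnect sessions idle longer than this.
  idleTimeout = 5 min
```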

--
Callum