SSH connections to Gerrit randomly hang

Alan

Nov 24, 2011, 6:23:10 PM
to Repo and Gerrit Discussion
Hi All,

We came across a strange issue with Gerrit that is quite hard to
debug.
In short: commands that require an SSH connection to Gerrit randomly hang.
I'm not sure whether it's a bug in Gerrit, so I wanted to discuss it here
before filing an issue on the Gerrit project page.

Any help and input is highly appreciated!

1. PROBLEM DESCRIPTION

After upgrading Gerrit from 1.6.1 to 2.2.1, our team started
experiencing problems with SSH connections to Gerrit. Commands
involving SSH operations with Gerrit silently hang at the very beginning.

The problem appears randomly during execution of the following
commands:
* ssh -p 29418 user...@gerrit.repo
* git clone ssh://gerrit.repo/project
* git fetch or git push
* or any other command requiring SSH interaction with Gerrit.

It’s hard to debug the issue, as no error logs were found on either
the client side or the server side.

2. WORKAROUND

Cancel the hung command by pressing Ctrl+C in the terminal window
and repeat the command until it succeeds.
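
The manual workaround can also be scripted. Below is a minimal Python sketch, assuming a hang can be recognised as "the command did not exit within a timeout". The helper name run_with_retry, the 30-second timeout and the retry count are illustrative assumptions, not part of our setup. Because the timeout covers the whole command, this only suits short commands; a long but healthy clone would be killed as well.

```python
import subprocess
import sys

def run_with_retry(cmd, timeout_s=30, attempts=5, runner=subprocess.run):
    """Run cmd, killing and retrying it when it does not finish within timeout_s.

    Returns the 1-based number of the attempt that completed, or None if
    every attempt timed out. `runner` is injectable so the retry logic can
    be exercised without a real Gerrit server.
    """
    for attempt in range(1, attempts + 1):
        try:
            # subprocess.run kills the child process and raises TimeoutExpired
            # when the timeout elapses (the scripted equivalent of Ctrl+C).
            runner(cmd, timeout=timeout_s, check=False)
            return attempt
        except subprocess.TimeoutExpired:
            print(f"attempt {attempt} hung for {timeout_s}s, retrying",
                  file=sys.stderr)
    return None
```

For example, run_with_retry(["git", "ls-remote", "ssh://gerrit.repo/project"]) would retry a short remote query up to five times.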

3. SYMPTOMS

The problem doesn’t appear for a short time after a Gerrit restart.
After some time (30 minutes to 1 hour) the problem first appears, and after
that it may affect about every third SSH connection to Gerrit.
Once a command has started working, it always succeeds. For example, if we
see any output from git clone, it will complete normally. However, if
a command hangs, it hangs at the very beginning, and we do not see any
output from git clone.
The problem occurs on all operating systems: Ubuntu, Mac OS,
Cygwin and Mingw32 (Git Bash for Windows).

4. DEBUG INFORMATION

4.1. CLIENT SIDE

Connecting to Gerrit via SSH hangs:
$ ssh -vv -p 29418 user...@gerrit.repo
OpenSSH_4.6p1, OpenSSL 0.9.8e 23 Feb 2007
debug2: ssh_connect: needpriv 0
debug1: Connecting to gerrit.repo [10.0.10.43] port 29418.
debug1: Connection established.
debug1: identity file /c/Users/username/.ssh/identity type -1
debug2: key_type_from_name: unknown key type '-----BEGIN'
debug2: key_type_from_name: unknown key type 'Proc-Type:'
debug2: key_type_from_name: unknown key type 'DEK-Info:'
debug2: key_type_from_name: unknown key type '-----END'
debug1: identity file /c/Users/username/.ssh/id_rsa type 1
debug1: identity file /c/Users/username/.ssh/id_dsa type -1


Netstat on Windows shows an ESTABLISHED connection:
C:\Windows\system32>netstat -tnb
<removed for readability>
TCP 10.8.11.110:53308 10.0.10.43:29418 ESTABLISHED
[ssh.exe]
<removed for readability>

gerrit show-connections doesn’t list the connection initiated above:
$ ssh -p 29418 user...@gerrit.repo gerrit show-connections;
Enter passphrase for key '/c/Users/username/.ssh/id_rsa':
Session   Start     Idle      User      Remote Host
--------------------------------------------------------------
5e0d9732  01:39:05  00:00:00  username  10.8.11.110
--
I believe the only connection shown is the gerrit show-connections
command itself.


4.2 SERVER SIDE

user...@gerrit.repo:~$ sudo netstat -tnp | grep GerritCodeRev | grep 10.8.11.110
tcp 0 0 ::ffff:10.0.10.43:29418 ::ffff:10.8.11.110:53308 ESTABLISHED 503 20435214 12095/GerritCodeRev

In Gerrit’s error_log we can only see “Connection reset by peer”
exceptions (the result of ^C on the client when a command hangs):
[2011-11-25 01:34:49,957] WARN org.apache.sshd.server.session.ServerSession : Exception caught
java.io.IOException: Connection reset by peer
        at sun.nio.ch.FileDispatcher.read0(Native Method)
        at sun.nio.ch.SocketDispatcher.read(Unknown Source)
        at sun.nio.ch.IOUtil.readIntoNativeBuffer(Unknown Source)
        at sun.nio.ch.IOUtil.read(Unknown Source)
        at sun.nio.ch.SocketChannelImpl.read(Unknown Source)
        at org.apache.mina.transport.socket.nio.NioProcessor.read(NioProcessor.java:214)
        at org.apache.mina.transport.socket.nio.NioProcessor.read(NioProcessor.java:42)
        at org.apache.mina.core.polling.AbstractPollingIoProcessor.read(AbstractPollingIoProcessor.java:673)
        <remainder removed>

In sshd_log:
[2011-11-25 00:40:22,580 +0300] 3e054306 aeneev a/5 LOGIN FROM 10.8.11.110
[2011-11-25 00:40:28,228 +0300] 3e054306 aeneev a/5 'gerrit show-connections' 6ms 8ms 0
[2011-11-25 00:40:29,081 +0300] 3e054306 aeneev a/5 LOGOUT

Matthias Sohn

Nov 24, 2011, 6:54:39 PM
to Alan, Repo and Gerrit Discussion
Capture some thread dumps on the server while you have hanging requests.
The dump's stack traces should help to find out what's going on.

--
Matthias

Alan

Nov 25, 2011, 6:11:59 AM
to Repo and Gerrit Discussion
Hi Matthias,

Thank you for the reply.

I've captured two dumps; both of them report one deadlock.

Dump 1: https://docs.google.com/document/pub?id=1yn3LyTaB1Wwvuvxwxg7pJ00UQai5Dq-KbkiK6qeIG2g
Dump 2: https://docs.google.com/document/pub?id=1UkIqO7B_BCqEPVM7MWIDXMwjRX1tkIDoAblnjMjiKd0

The deadlock-related excerpt follows.

Found one Java-level deadlock:
=============================
"NioProcessor-3":
  waiting to lock monitor 0x085c5530 (object 0x959c9ac0, a com.google.gerrit.sshd.CommandFactoryProvider$Trampoline),
  which is held by "SshCommandStart-1"
"SshCommandStart-1":
  waiting to lock monitor 0x092e9828 (object 0x959c9680, a java.lang.Object),
  which is held by "NioProcessor-3"
Java stack information for the threads listed above:
===================================================
"NioProcessor-3":
        at com.google.gerrit.sshd.CommandFactoryProvider$Trampoline.destroy(CommandFactoryProvider.java:186)
        - waiting to lock <0x959c9ac0> (a com.google.gerrit.sshd.CommandFactoryProvider$Trampoline)
        at org.apache.sshd.server.channel.ChannelSession$1.operationComplete(ChannelSession.java:175)
        at org.apache.sshd.common.future.DefaultSshFuture.notifyListener(DefaultSshFuture.java:339)
        at org.apache.sshd.common.future.DefaultSshFuture.notifyListeners(DefaultSshFuture.java:324)
        at org.apache.sshd.common.future.DefaultSshFuture.setValue(DefaultSshFuture.java:252)
        at org.apache.sshd.common.future.DefaultCloseFuture.setClosed(DefaultCloseFuture.java:44)
        at org.apache.sshd.common.channel.AbstractChannel.handleClose(AbstractChannel.java:111)
        - locked <0x959c9680> (a java.lang.Object)
        at org.apache.sshd.common.session.AbstractSession.channelClose(AbstractSession.java:979)
        at org.apache.sshd.server.session.ServerSession.handleMessage(ServerSession.java:227)
        at org.apache.sshd.common.session.AbstractSession.decode(AbstractSession.java:523)
        at org.apache.sshd.common.session.AbstractSession.messageReceived(AbstractSession.java:226)
        - locked <0x959cd998> (a java.lang.Object)
        at org.apache.sshd.common.AbstractSessionIoHandler.messageReceived(AbstractSessionIoHandler.java:58)
        at org.apache.mina.core.filterchain.DefaultIoFilterChain$TailFilter.messageReceived(DefaultIoFilterChain.java:716)
        at org.apache.mina.core.filterchain.DefaultIoFilterChain.callNextMessageReceived(DefaultIoFilterChain.java:434)
        at org.apache.mina.core.filterchain.DefaultIoFilterChain.access$1200(DefaultIoFilterChain.java:46)
        <truncated>
"SshCommandStart-1":
        at org.apache.sshd.common.future.DefaultSshFuture.addListener(DefaultSshFuture.java:274)
        - waiting to lock <0x959c9680> (a java.lang.Object)
        at org.apache.sshd.server.channel.ChannelSession.close(ChannelSession.java:172)
        at org.apache.sshd.server.channel.ChannelSession.closeShell(ChannelSession.java:542)
        <truncated>
[ Quote end ]

Looks like a bug?

Thanks,
Alan

Martin Fick

Nov 25, 2011, 12:42:58 PM
to Alan, Repo and Gerrit Discussion
Are you sure you have enough DB connections set up (dbpool)? If there are too few, a deadlock can occur. That may be unrelated, but it's worth checking.

I do think there is a bug somewhere in the command start path, because I get null pointer exceptions there every now and then. I have been meaning to debug that section lately.

Employee of Qualcomm Innovation Center,Inc. which is a member of Code Aurora Forum

Alan

Nov 26, 2011, 10:01:54 PM
to Repo and Gerrit Discussion
Martin, thank you for the suggestion.

The problem seems to be gone after I increased the DB connection pool
from the default of 8 to 64.

A bug was filed for this deadlock: http://code.google.com/p/gerrit/issues/detail?id=1162

-Alan

Alan

Nov 28, 2011, 4:55:00 PM
to Repo and Gerrit Discussion
Just to update the thread.

Increasing the DB connection pool did not cure the problem.
It still occurs, but it's harder to reproduce now.

-Alan

Remy Bohmer

Dec 1, 2011, 9:55:58 AM
to Repo and Gerrit Discussion, Alan
Hi All,

>> A bug was filed for this deadlock: http://code.google.com/p/gerrit/issues/detail?id=1162
>>
>> -Alan

We seem to have a similar problem with the Gerrit SSH connection.
In our case it is not every third SSH connection that fails, but every 2nd.

It ran for months without any issue; the problem started when we moved
the Gerrit server from a physical machine to a corporate VMware server
solution last week.
We use the internal H2 database of Gerrit. (We use version 2.2.1 on an
Ubuntu 10.04 LTS x86_64 box.)

Below I have a few stack traces of an aborted session.
I also have a Wireshark trace that shows a good and a bad situation
(see attached). It is clear that Gerrit stops communicating with the
SSH client. The client is waiting for an answer, but it is
never transmitted.

Also strange is that 'netstat -an | grep 29418' on the
old standalone system reports only this line:
tcp6 0 0 :::29418 :::* LISTEN

while the new machine shows this on a freshly booted Gerrit configuration:
tcp6 0 0 :::29418 :::* LISTEN
tcp6 0 0 ???.???.150.82:51739 ???.???.150.82:29418 ESTABLISHED
tcp6 0 0 ???.???.150.82:29418 ???.???.150.82:51739 ESTABLISHED

Kind regards,

Remy

[2011-12-01 13:58:06,412] WARN
org.apache.sshd.server.session.ServerSession : Exception caught
java.io.IOException: Connection reset by peer
at sun.nio.ch.FileDispatcher.read0(Native Method)
at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:251)
at sun.nio.ch.IOUtil.read(IOUtil.java:224)
at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:254)
at org.apache.mina.transport.socket.nio.NioProcessor.read(NioProcessor.java:214)
at org.apache.mina.transport.socket.nio.NioProcessor.read(NioProcessor.java:42)
at org.apache.mina.core.polling.AbstractPollingIoProcessor.read(AbstractPollingIoProcessor.java:673)
at org.apache.mina.core.polling.AbstractPollingIoProcessor.process(AbstractPollingIoProcessor.java:646)
at org.apache.mina.core.polling.AbstractPollingIoProcessor.process(AbstractPollingIoProcessor.java:635)
at org.apache.mina.core.polling.AbstractPollingIoProcessor.access$400(AbstractPollingIoProcessor.java:67)
at org.apache.mina.core.polling.AbstractPollingIoProcessor$Processor.run(AbstractPollingIoProcessor.java:1079)
at org.apache.mina.util.NamePreservingRunnable.run(NamePreservingRunnable.java:64)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
at java.lang.Thread.run(Thread.java:636)
[2011-12-01 14:23:34,901] WARN
org.apache.sshd.server.session.ServerSession : Exception caught
java.io.IOException: Broken pipe
at sun.nio.ch.FileDispatcher.write0(Native Method)
at sun.nio.ch.SocketDispatcher.write(SocketDispatcher.java:47)
at sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:122)
at sun.nio.ch.IOUtil.write(IOUtil.java:93)
at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:352)
at org.apache.mina.transport.socket.nio.NioProcessor.write(NioProcessor.java:221)
at org.apache.mina.transport.socket.nio.NioProcessor.write(NioProcessor.java:42)
at org.apache.mina.core.polling.AbstractPollingIoProcessor.writeBuffer(AbstractPollingIoProcessor.java:928)
at org.apache.mina.core.polling.AbstractPollingIoProcessor.flushNow(AbstractPollingIoProcessor.java:852)
at org.apache.mina.core.polling.AbstractPollingIoProcessor.flush(AbstractPollingIoProcessor.java:777)
at org.apache.mina.core.polling.AbstractPollingIoProcessor.access$500(AbstractPollingIoProcessor.java:67)
at org.apache.mina.core.polling.AbstractPollingIoProcessor$Processor.run(AbstractPollingIoProcessor.java:1084)
at org.apache.mina.util.NamePreservingRunnable.run(NamePreservingRunnable.java:64)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
at java.lang.Thread.run(Thread.java:636)
[2011-12-01 14:36:11,858] WARN
org.apache.sshd.server.session.ServerSession : Exception caught
java.io.IOException: Broken pipe
at sun.nio.ch.FileDispatcher.write0(Native Method)
at sun.nio.ch.SocketDispatcher.write(SocketDispatcher.java:47)
at sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:122)
at sun.nio.ch.IOUtil.write(IOUtil.java:93)
at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:352)
at org.apache.mina.transport.socket.nio.NioProcessor.write(NioProcessor.java:221)
at org.apache.mina.transport.socket.nio.NioProcessor.write(NioProcessor.java:42)
at org.apache.mina.core.polling.AbstractPollingIoProcessor.writeBuffer(AbstractPollingIoProcessor.java:928)
at org.apache.mina.core.polling.AbstractPollingIoProcessor.flushNow(AbstractPollingIoProcessor.java:852)
at org.apache.mina.core.polling.AbstractPollingIoProcessor.flush(AbstractPollingIoProcessor.java:777)
at org.apache.mina.core.polling.AbstractPollingIoProcessor.access$500(AbstractPollingIoProcessor.java:67)
at org.apache.mina.core.polling.AbstractPollingIoProcessor$Processor.run(AbstractPollingIoProcessor.java:1084)
at org.apache.mina.util.NamePreservingRunnable.run(NamePreservingRunnable.java:64)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
at java.lang.Thread.run(Thread.java:636)
[2011-12-01 14:36:12,490] WARN
org.apache.sshd.server.session.ServerSession : Exception caught
java.io.IOException: Broken pipe
at sun.nio.ch.FileDispatcher.write0(Native Method)
at sun.nio.ch.SocketDispatcher.write(SocketDispatcher.java:47)
at sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:122)
at sun.nio.ch.IOUtil.write(IOUtil.java:93)
at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:352)
at org.apache.mina.transport.socket.nio.NioProcessor.write(NioProcessor.java:221)
at org.apache.mina.transport.socket.nio.NioProcessor.write(NioProcessor.java:42)
at org.apache.mina.core.polling.AbstractPollingIoProcessor.writeBuffer(AbstractPollingIoProcessor.java:928)
at org.apache.mina.core.polling.AbstractPollingIoProcessor.flushNow(AbstractPollingIoProcessor.java:852)
at org.apache.mina.core.polling.AbstractPollingIoProcessor.flush(AbstractPollingIoProcessor.java:777)
at org.apache.mina.core.polling.AbstractPollingIoProcessor.access$500(AbstractPollingIoProcessor.java:67)
at org.apache.mina.core.polling.AbstractPollingIoProcessor$Processor.run(AbstractPollingIoProcessor.java:1084)
at org.apache.mina.util.NamePreservingRunnable.run(NamePreservingRunnable.java:64)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
at java.lang.Thread.run(Thread.java:636)

Attachments: gerrit-ok, gerrit-bad

Remy Bohmer

Dec 9, 2011, 9:11:46 AM
to Repo and Gerrit Discussion, Alan
Hello All,

FYI: We moved Jenkins back to an independent system and restored the
Gerrit Trigger plugin to an older revision.
Since then the Gerrit SSH connection has not hung once. Now
we need to investigate whether the cause was the plugin or the fact that
Jenkins and Gerrit were both running on the same system.
Since more people seem to report this hang in Gerrit, and the first
reports started in the same week, the gerrit-trigger plugin is
highly suspect. Maybe the Jenkins plugin is triggering a bug in
Gerrit, or the Jenkins plugin is itself buggy.

Kind regards,

Remy

Karen

Dec 29, 2011, 2:24:54 AM
to Repo and Gerrit Discussion
I also encountered this issue. My Gerrit version is 2.2.1, and I also
have the Gerrit Trigger plugin installed in Jenkins.
BTW, how do I capture thread dumps from the server?

Thanks a lot !

Matthias Sohn

Dec 29, 2011, 11:48:30 AM
to Karen, Repo and Gerrit Discussion
2011/12/29 Karen <wuyanp...@gmail.com>

> I also encountered this issue. My gerrit version is 2.2.1, and also
> install gerrit trigger plugin in jenkins.
> BTW, how to capture some thread dumps from server?

Use jstack [1] or VisualVM [2] to create thread dumps from the server.
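
Once a dump has been saved to a file (for instance the text printed by running jstack against the Gerrit JVM's process id), it can be scanned for the deadlock report automatically. This is only a rough Python sketch: the function name deadlocked_threads is made up for illustration, and it assumes the dump uses the "Found one Java-level deadlock" / "Java stack information" layout seen in the excerpt posted earlier in this thread.

```python
import re

def deadlocked_threads(dump_text):
    """Return the thread names listed in the deadlock summary of a thread dump.

    Only the summary section is inspected, i.e. the lines between a
    'Java-level deadlock' header and the following 'Java stack information'
    header. An empty list means no deadlock report was found.
    """
    names = []
    in_summary = False
    for line in dump_text.splitlines():
        stripped = line.strip()
        if "Java-level deadlock" in stripped:
            in_summary = True
        elif stripped.startswith("Java stack information"):
            in_summary = False
        elif in_summary:
            # Thread entries in the summary look like: "NioProcessor-3":
            match = re.fullmatch(r'"(.+)":', stripped)
            if match and match.group(1) not in names:
                names.append(match.group(1))
    return names
```

Taking several dumps a few seconds apart and comparing the result helps to tell a real deadlock from threads that are merely slow.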

Remy Bohmer

Jan 4, 2012, 11:36:29 AM
to Wu Yanping, Repo and Gerrit Discussion
Hi,

2011/12/30 Wu Yanping <wuyanp...@gmail.com>:
> Thanks Remy.
> Can you also tell me your current plugin version. And also the new version
> you have installed before.
>
> Thanks a lot!

Here are the versions that are working OK now:
* Jenkins: 1.442
* Gerrit Trigger: 2.3.0
* Git plugin: 1.1.9

The setup that gave problems had these versions:
* Jenkins: 1.441
* Gerrit Trigger: 2.3.1
* Git Plugin: 1.1.13

The problems may be related to Gerrit Trigger 2.3.1, but we are not sure
yet, since the working environment is running on a different machine as
well! (Read: different hardware, but the same distro and the same SW
configuration.)

Kind regards,

Remy

Lundh, Gustaf

Jan 5, 2012, 6:32:04 AM
to Remy Bohmer, Wu Yanping, Repo and Gerrit Discussion
Hi,

Just to add some information: I created the original issue 1162, and when this issue was first seen I was running Gerrit on a physical machine, without the Gerrit Trigger plug-in (or even Jenkins, for that matter). The internal H2 DB was used at the time.

So I would not blame the Gerrit Trigger plug-in (or Jenkins) for causing this issue to appear.

Best regards
Gustaf

Remy Bohmer

Jan 5, 2012, 7:02:17 AM
to Lundh, Gustaf, Wu Yanping, Repo and Gerrit Discussion
Hi,

2012/1/5 Lundh, Gustaf <Gustaf...@sonyericsson.com>:


> Hi,
>
> Just to add some additional information; I created the original issue-1162 and when this issue was first seen, I was running Gerrit on a physical machine, without the Gerrit Trigger plug-in (or even Jenkins for that sake). The internal H2-db was used at the time.
>
> So I would not blame the Gerrit Trigger plug-in (or Jenkins) for causing this issue to appear.

Okay, I understand that, but it is not certain that we are all talking
about the same issue here (although I feel it is the same issue).
In your report you talked about every 3rd connection hanging, and in
our case it was every 2nd.
In your case you talked about a random problem; I am talking about a
reproducible problem.

Even so, in the end it is the SSH daemon in Gerrit that hangs, so that is
where the bug is likely to be located.
The reason to look into the Jenkins plugins is that they seem to
_trigger_ the bug. With the old setup/plugins it has been running
for many weeks, while with the newer plugins it cannot run for 5
minutes...

So, first we want to find out which version of the Jenkins plugins
triggers the error condition, then find out what has changed
in those plugins, and then describe a reproducible way to make
Gerrit hang, so that the problem is no longer 'random' but fixable ;-)
In the meantime, until it is fixed, checking those plugins might be a
workaround for others encountering the same issue.

Kind regards,

Remy

Lundh, Gustaf

Jan 5, 2012, 8:28:45 AM
to Remy Bohmer, Wu Yanping, Repo and Gerrit Discussion
I think I skimmed through the thread a bit too quickly. Sorry about that.

Yes. I'm pretty sure we are talking about two quite different issues in this thread.

Looking at the stack traces attached to 1162, I am pretty certain that Alan has stumbled into the same bug as I did when I opened the issue.

However, regarding the Gerrit Trigger plug-in, we have also seen an issue similar to yours, i.e. one where Gerrit's stream-events are no longer being sent to the Gerrit Trigger plug-in. And just like you, we also dumped some packets and noticed that no data was sent from Gerrit at all. However, our Jenkins server and Gerrit server are running on independent physical machines.

If you can help pinpoint the Gerrit Trigger version that caused your issue, Robert (the Gerrit Trigger author) and I can help look into the Gerrit Trigger plug-in. He sits next to me at the office.

Best regards
Gustaf

Remy Bohmer

Jan 5, 2012, 8:57:13 AM
to Lundh, Gustaf, Wu Yanping, Repo and Gerrit Discussion, bartvd...@gmail.com
Hi,

2012/1/5 Lundh, Gustaf <Gustaf...@sonyericsson.com>:


> I think I skimmed through the thread a bit too quickly. Sorry about that.
>
> Yes. I'm pretty sure we are talking about two quite different issues in this thread.
>
> Looking at the stack-traces attached to 1162, I am pretty certain that Alan has stumbled into the same bug as I did, when I opened the issue.
>
> However, regarding the Gerrit-trigger plug-in, we have also seen a issue similar as yours, E.g. where Gerrit's stream-events is no longer being sent to the Gerrit-trigger plug-in.
> And just like you, we also dumped some packets and noticed that no data was sent from Gerrit at all.

Well, when the hang occurs, _all_ kinds of SSH connections are affected,
and only every 2nd SSH connection works.
It is not only that Gerrit stream-events are no longer being sent;
everything via SSH hangs... This also means that even 'repo sync'
commands hang...
As said before, we still need to investigate the real trigger that is
causing this. (2 options: gerrit-trigger or
'gerrit+jenkins-on-same-machine')

> However, both our Jenkins-server and Gerrit-server are running on independent physical machines.

We are not sure yet if this is related to the problem, but it was one
of the differences.

> If you can help out pinpoint the Gerrit Trigger version that caused your issue, I and Robert (the Gerrit Trigger author) can help out looking into the Gerrit-trigger plug-in. He sits next to me at the office.

Great. We will keep you informed.

Kind regards,

Remy

GS

Feb 2, 2012, 8:10:24 AM
to Repo and Gerrit Discussion
We've just hit this same issue. We've been running 2.1.6.1 for
many months, with Jenkins Trigger running quite happily over this time.
Just recently we upgraded to 2.1.8 (don't ask why we didn't go to
2.2.2!) and suddenly our Gerrit server started hanging as per the
original post. It tends to hang within 30 minutes of being started.

We've spent some time looking into why, and I think I have the
answer.
1) gerrit-sshd/src/main/java/com/google/gerrit/sshd/CommandFactoryProvider.java
creates a new thread to run the requested SSH command. It holds the
CommandFactoryProvider lock while it runs the command.
2) The SSH connection is terminated while the above command is
running. Presumably Jenkins Trigger is regularly causing this
termination in some way, but in theory it could be any caller. The
key is that it has to be terminated before the command completes.
3) The network IO thread sees the SSH connection die, so it tries to
destroy the CommandFactoryProvider, but it gets blocked trying to
obtain this class's lock. Note that it also holds the low-level
network object lock.
4) The CommandFactoryProvider completes and runs its atexit functions
and ends up trying to obtain the low-level network object lock ... and
you are now deadlocked.

The code to create a new thread to run the command was put into 2.1.7
by Shawn Pearce: http://code.google.com/p/gerrit/source/detail?r=d6296556c6838754477d972e2fbbc2227a2bbb33
And, indeed, if I remove the new section between lines 111 and 129 of
the new file, which spawns the new thread, and run the command as it was
done previously, the Gerrit server no longer hangs - presumably because it
runs the command in the same thread as the termination is being
handled and hence can no longer deadlock.

I don't see any obvious fix for this issue in 2.2.2, so I assume this
is still a real issue. I'm not sure exactly how to fix it ... clearly
reverting the change as I've just suggested is one option, but
presumably the original commit was made to fix some performance issue.
Maybe just relinquishing the lock while running the command would do
the job.
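
To make the lock ordering in steps 1) to 4) concrete, here is a small self-contained Python sketch. It is not Gerrit code: command_lock and channel_lock are hypothetical stand-ins for the CommandFactoryProvider lock and the low-level network object lock, and the acquire timeouts exist only so the demonstration terminates instead of hanging forever the way the real server does. The release_before_cleanup flag models the "relinquish the lock" idea; whether that is actually safe in the real class is not something this sketch can show.

```python
import threading

def run_scenario(release_before_cleanup, timeout_s=0.5):
    """Return True if both threads got the second lock they wanted, else False."""
    command_lock = threading.Lock()   # stand-in for the command/Trampoline lock
    channel_lock = threading.Lock()   # stand-in for the low-level network lock
    both_hold_first = threading.Barrier(2)
    results = {}

    def command_thread():
        command_lock.acquire()            # step 1: held while the command runs
        both_hold_first.wait()
        if release_before_cleanup:
            command_lock.release()        # proposed remedy: drop the lock first
        # step 4: the exit handling needs the network lock
        got = channel_lock.acquire(timeout=timeout_s)
        if got:
            channel_lock.release()
        if not release_before_cleanup:
            command_lock.release()
        results["command"] = got

    def network_thread():
        channel_lock.acquire()            # step 3: IO thread holds the network lock
        both_hold_first.wait()
        # step 3: destroying the command needs the command lock
        got = command_lock.acquire(timeout=timeout_s)
        if got:
            command_lock.release()
        channel_lock.release()
        results["network"] = got

    threads = [threading.Thread(target=command_thread),
               threading.Thread(target=network_thread)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return all(results.values())
```

Calling run_scenario(False) reproduces the mutual wait (at least one thread times out), while run_scenario(True) lets both threads finish, because the two locks are never held in opposite order at the same time.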

Emmanuel Grumbach

Mar 18, 2012, 3:01:09 AM
to repo-d...@googlegroups.com
Hi all,

so I am suffering from the same issue:

Gerrit 2.2.1
Jenkins 1.444
Gerrit Trigger: 2.3.1

I see a lot of:

[2012-03-16 16:53:37,201] WARN  org.apache.sshd.server.session.ServerSession : Exception caught
java.io.IOException: Connection reset by peer
        at sun.nio.ch.FileDispatcher.read0(Native Method)
        at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:21)
        at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:202)
        at sun.nio.ch.IOUtil.read(IOUtil.java:175)
        at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:243)
        at org.apache.mina.transport.socket.nio.NioProcessor.read(NioProcessor.java:214)
        at org.apache.mina.transport.socket.nio.NioProcessor.read(NioProcessor.java:42)
        at org.apache.mina.core.polling.AbstractPollingIoProcessor.read(AbstractPollingIoProcessor.java:673)
        at org.apache.mina.core.polling.AbstractPollingIoProcessor.process(AbstractPollingIoProcessor.java:646)
        at org.apache.mina.core.polling.AbstractPollingIoProcessor.process(AbstractPollingIoProcessor.java:635)
        at org.apache.mina.core.polling.AbstractPollingIoProcessor.access$400(AbstractPollingIoProcessor.java:67)
        at org.apache.mina.core.polling.AbstractPollingIoProcessor$Processor.run(AbstractPollingIoProcessor.java:1079)
        at org.apache.mina.util.NamePreservingRunnable.run(NamePreservingRunnable.java:64)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
        at java.lang.Thread.run(Thread.java:662)


which happens together with Jenkins's ls-projects command.
When I started to see that in the logs, I also noticed that SSH commands began to hang. Not only Jenkins's, but also pushes etc.
Does anybody have anything new regarding this issue?
Has anybody tested 2.3-X?
Thanks

Saša Živkov

Mar 20, 2012, 5:41:53 PM
to Emmanuel Grumbach, repo-d...@googlegroups.com

Have you created a few thread dumps of the Gerrit server while SSH
commands hang? Looking at what the threads are doing might give a hint
about the root cause.

Run "netstat -a --numeric-ports --numeric-hosts | grep 29418" on the
Gerrit server box to find out how many SSH connections are open at the
moment, and from which IP addresses.

Have you checked CPU usage at during the time when SSH connections hang?
Is it low or high?
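
As a rough sketch of the netstat check suggested above, the helper below summarizes connections on the Gerrit SSH port by TCP state and remote host. The function name `summarize_ssh_conns` is made up for this example; the port 29418 and the netstat flags come from the command above, and the column layout (proto, Recv-Q, Send-Q, local address, remote address, state) is assumed to be the usual Linux net-tools one.

```shell
#!/bin/bash
# Hypothetical helper: reads `netstat -a --numeric-ports --numeric-hosts`
# output on stdin and prints "<count> <state> <remote-host>" for every
# connection whose local address ends in the given port (default 29418).
summarize_ssh_conns() {
    local port="${1:-29418}"
    awk -v port="$port" '
        # Column 4 is the local address, column 5 the remote one,
        # column 6 the TCP state (assumed net-tools layout).
        $1 ~ /^tcp/ && NF >= 6 && $4 ~ (":" port "$") && $6 != "LISTEN" {
            remote = $5
            sub(/:[0-9]+$/, "", remote)   # drop the remote port
            count[$6 " " remote]++
        }
        END {
            for (key in count) print count[key], key
        }
    ' | sort -k2,2 -k3,3
}

# Usage on the Gerrit server box:
#   netstat -a --numeric-ports --numeric-hosts | summarize_ssh_conns 29418
```

Many ESTABLISHED entries from one host that never go away, while new clients hang, would be worth comparing against the show-connections output.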

Saša Živkov

Emmanuel Grumbach

Mar 21, 2012, 4:09:58 AM3/21/12
to Saša Živkov, repo-d...@googlegroups.com
Thanks!

I have a Jenkins master running on the same machine as my Gerrit server.

I ran show-connections from a remote client.
Then I can see the connections (and they look fine):

Session     Start     Idle   User            Remote Host
--------------------------------------------------------------
780cedb3 21:36:52 00:00:06  jenkins         <SERVER_NAME>
32e3f629 10:06:59 00:00:00  MYLOGIN         <CLIENT NAME>

After I run show-connections a few times in a row I can get:

iapp029> netstat -a --numeric-ports --numeric-hosts | grep 29418
tcp        0      0 0.0.0.0:29418          0.0.0.0:*              LISTEN
tcp        0      0 <SERVER_IP>:29418      <SERVER_IP>:42357      ESTABLISHED  <=== Probably Jenkins
tcp        0      0 <SERVER_IP>:42357      <SERVER_IP>:29418      ESTABLISHED  <=== Probably Jenkins
tcp        0      0 <SERVER_IP>:29418      <CLIENT_IP>:42843      TIME_WAIT    <=== disappears after a few seconds
tcp        0      0 <SERVER_IP>:29418      <CLIENT_IP>:42844      TIME_WAIT    <=== disappears after a few seconds
tcp        0      0 <SERVER_IP>:29418      <CLIENT_IP>:42845      TIME_WAIT    <=== disappears after a few seconds
tcp        0      0 <SERVER_IP>:29418      <CLIENT_IP>:42847      TIME_WAIT    <=== disappears after a few seconds

All this is in a normal situation (I haven't been able to reproduce the
issue yet). I will send the same data when the issue occurs.


Emmanuel Grumbach
egru...@gmail.com

Emmanuel Grumbach

Mar 21, 2012, 4:10:27 AM3/21/12
to Saša Živkov, repo-d...@googlegroups.com
And no, I haven't created dumps of the threads yet.

Thanks for your help!

Emmanuel Grumbach
egru...@gmail.com

Emmanuel Grumbach

Mar 22, 2012, 4:19:13 AM3/22/12
to Saša Živkov, repo-d...@googlegroups.com
Here is the output of netstat while it hangs:

tcp        0      0 0.0.0.0:29418          0.0.0.0:*                 LISTEN
tcp        0      0 <SERVER_IP>:29418      <SERVER_IP>:42357         ESTABLISHED
tcp        0      0 <SERVER_IP>:50150      <SERVER_IP>:29418         ESTABLISHED
tcp        0      0 <SERVER_IP>:54558      <SERVER_IP>:29418         ESTABLISHED
tcp        0      0 <SERVER_IP>:42357      <SERVER_IP>:29418         ESTABLISHED
tcp       20      0 <SERVER_IP>:29418      <SERVER_IP>:54558         ESTABLISHED
tcp       20      0 <SERVER_IP>:29418      <SERVER_IP>:50150         ESTABLISHED
tcp        0      0 <SERVER_IP>:29418      <STUCK_CLIENT1>:43509     ESTABLISHED
tcp        0      0 <SERVER_IP>:29418      <STUCK_CLIENT2>:53704     ESTABLISHED
tcp        0      0 <SERVER_IP>:29418      <STUCK_CLIENT3>:35357     ESTABLISHED

show-connections:

Session     Start     Idle   User            Remote Host
--------------------------------------------------------------
780cedb3 21:36:52 00:00:10  jenkins         server
af1e9dd6 01:09:37 09:08:15  jenkins         server
acb0eb89 10:17:52 00:00:00  egrumbac        client from which I ran the show-connections command


I will try to learn how to dump the thread stack trace.

Emmanuel Grumbach
egru...@gmail.com

Matthias Sohn

Mar 22, 2012, 4:49:35 AM3/22/12
to Emmanuel Grumbach, Saša Živkov, repo-d...@googlegroups.com
2012/3/22 Emmanuel Grumbach <egru...@gmail.com>

I will try to learn how to dump the thread stack trace.


To get thread dumps, find the process ID of the server process and simply run
jstack <pid>

--
Matthias

Luthander, Fredrik

Mar 22, 2012, 4:54:58 AM3/22/12
to Emmanuel Grumbach, repo-d...@googlegroups.com
Hi Emmanuel!

I've made this script, and it seems to work:

$ cat bin/gerrit-jstack
#!/bin/bash
#
# Script to capture jstack from running Gerrit process.

OUTPUT=~gerrit2/logs/tomcat-stack-`date +%Y%m%d-%H%M%S`.log
PID=`jps | grep gerrit.war | cut -f 1 -d ' '`

jstack $PID > $OUTPUT
$

Every time I need a stack trace I just call that script, and it puts a file with the stack trace output in a directory I keep for that purpose.
It doesn't have any error handling whatsoever, but I always execute it manually and it's quite trivial in nature, so it's sufficient for me.

Please feel free to copy and modify it to your own needs!

--
Best regards,
    Fredrik Luthander
Sony Mobile Communications AB


Saša Živkov

Mar 22, 2012, 5:45:49 AM3/22/12
to Luthander, Fredrik, Emmanuel Grumbach, repo-d...@googlegroups.com
On Thu, Mar 22, 2012 at 9:54 AM, Luthander, Fredrik
<Fredrik....@sonymobile.com> wrote:
> Hi Emmanuel!
>
> I've made this script, and it seems to work:
>
> $ cat bin/gerrit-jstack
> #!/bin/bash
> #
> # Script to capture jstack from running Gerrit process.
>
> OUTPUT=~gerrit2/logs/tomcat-stack-`date +%Y%m%d-%H%M%S`.log
> PID=`jps | grep gerrit.war | cut -f 1 -d ' '`

Gerrit's PID can be found in the ...review_site/logs/gerrit.pid file.
This should be enough to generate a thread dump of the Gerrit process:
jstack $(cat gerrit.pid)
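
Since a single dump is only a snapshot, it can help to take a few dumps some seconds apart and compare them. The sketch below does that from the pid file. The function name `capture_dumps`, its arguments, the output file naming and the `JSTACK` override variable are all assumptions made for this example; only `jstack <pid>` and the gerrit.pid file come from the thread.

```shell
#!/bin/bash
# Hypothetical helper: take several thread dumps of the process whose PID
# is stored in a pid file, a few seconds apart, one output file per dump.
# Usage: capture_dumps <pidfile> <outdir> [count] [interval-seconds]
# JSTACK may be set to override the dump command (defaults to jstack).
capture_dumps() {
    local pidfile="$1" outdir="$2" count="${3:-3}" interval="${4:-10}"
    local pid i out

    [ -r "$pidfile" ] || { echo "cannot read $pidfile" >&2; return 1; }
    pid=$(tr -d '[:space:]' < "$pidfile")
    [ -n "$pid" ] || { echo "$pidfile is empty" >&2; return 1; }
    mkdir -p "$outdir" || return 1

    for ((i = 1; i <= count; i++)); do
        out="$outdir/jstack-$(date +%Y%m%d-%H%M%S)-$i.log"
        "${JSTACK:-jstack}" "$pid" > "$out" || return 1
        echo "$out"   # report where this dump was written
        # Pause between dumps, but not after the last one.
        if ((i < count)); then sleep "$interval"; fi
    done
}

# Usage, with SITE standing in for your own review_site directory:
#   capture_dumps "$SITE/logs/gerrit.pid" "$SITE/logs/dumps" 3 10
```

A thread that sits in the same BLOCKED frame in every dump is a better lead than one that shows up blocked only once.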

Saša

Emmanuel Grumbach

Mar 22, 2012, 6:31:58 AM3/22/12
to Saša Živkov, Luthander, Fredrik, repo-d...@googlegroups.com
Yep - thanks.
I just need to request from IT to install jstack - sigh -
Emmanuel Grumbach
egru...@gmail.com

seonguk.baek

Mar 28, 2012, 8:41:31 AM3/28/12
to repo-d...@googlegroups.com, Saša Živkov, Luthander, Fredrik
Dear all,

We have the same Gerrit hang problem in Gerrit 2.2.2.1.
It is driving me and all our users crazy...

We use a 32-core server and normally use 70% of the CPUs.
But when the Gerrit server starts hanging, only one CPU core runs at 100% and the others sit at 0% for about 10 to 30 seconds.
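
One way to find out which thread keeps that single core busy, sketched here under the assumption of a Linux server and a HotSpot JVM: list per-thread CPU usage, then convert the busiest thread's decimal ID to hex, which is how jstack prints it in the `nid=0x...` field. The helper name `tid_to_nid` is made up for this example.

```shell
#!/bin/bash
# Hypothetical helper: convert a decimal native thread ID (as shown by
# `top -H` or `ps -L`) to the nid=0x... form used in jstack output.
tid_to_nid() {
    printf 'nid=0x%x\n' "$1"
}

# Example workflow (commands assume Linux procps and a HotSpot JDK):
#   top -H -b -n 1 -p "$PID" | head -n 20     # find the busiest thread ID
#   jstack "$PID" | grep -A 20 "$(tid_to_nid 60754)"
#
# tid_to_nid 60754 prints nid=0xed52
```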

1. Below is the Gerrit error log. (These entries were logged once the hang was gone.)

[2012-03-28 21:25:01,187] WARN  org.apache.sshd.server.session.ServerSession : Exception caught
org.apache.mina.core.write.WriteTimeoutException
        at org.apache.mina.core.session.AbstractIoSession.notifyWriteTimeout(AbstractIoSession.java:1304)
        at org.apache.mina.core.session.AbstractIoSession.notifyIdleSession(AbstractIoSession.java:1286)
        at org.apache.mina.core.session.AbstractIoSession.notifyIdleness(AbstractIoSession.java:1263)
        at org.apache.mina.core.polling.AbstractPollingIoProcessor.notifyIdleSessions(AbstractPollingIoProcessor.java:748)
        at org.apache.mina.core.polling.AbstractPollingIoProcessor.access$700(AbstractPollingIoProcessor.java:67)
        at org.apache.mina.core.polling.AbstractPollingIoProcessor$Processor.run(AbstractPollingIoProcessor.java:1090)
        at org.apache.mina.util.NamePreservingRunnable.run(NamePreservingRunnable.java:64)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
        at java.lang.Thread.run(Thread.java:636)
[2012-03-28 21:25:01,188] WARN  org.apache.sshd.server.session.ServerSession : Exception caught
org.apache.mina.core.write.WriteToClosedSessionException
        at org.apache.mina.core.polling.AbstractPollingIoProcessor.clearWriteRequestQueue(AbstractPollingIoProcessor.java:619)
        at org.apache.mina.core.polling.AbstractPollingIoProcessor.removeNow(AbstractPollingIoProcessor.java:570)
        at org.apache.mina.core.polling.AbstractPollingIoProcessor.removeSessions(AbstractPollingIoProcessor.java:540)
        at org.apache.mina.core.polling.AbstractPollingIoProcessor.access$600(AbstractPollingIoProcessor.java:67)
        at org.apache.mina.core.polling.AbstractPollingIoProcessor$Processor.run(AbstractPollingIoProcessor.java:1087)
        at org.apache.mina.util.NamePreservingRunnable.run(NamePreservingRunnable.java:64)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
        at java.lang.Thread.run(Thread.java:636)
[2012-03-28 21:25:01,197] WARN  org.apache.sshd.server.session.ServerSession : Exception caught
org.apache.mina.core.write.WriteTimeoutException
        at org.apache.mina.core.session.AbstractIoSession.notifyWriteTimeout(AbstractIoSession.java:1304)
        at org.apache.mina.core.session.AbstractIoSession.notifyIdleSession(AbstractIoSession.java:1286)
        at org.apache.mina.core.session.AbstractIoSession.notifyIdleness(AbstractIoSession.java:1263)
        at org.apache.mina.core.polling.AbstractPollingIoProcessor.notifyIdleSessions(AbstractPollingIoProcessor.java:748)
        at org.apache.mina.core.polling.AbstractPollingIoProcessor.access$700(AbstractPollingIoProcessor.java:67)
        at org.apache.mina.core.polling.AbstractPollingIoProcessor$Processor.run(AbstractPollingIoProcessor.java:1090)
        at org.apache.mina.util.NamePreservingRunnable.run(NamePreservingRunnable.java:64)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
        at java.lang.Thread.run(Thread.java:636)
[2012-03-28 21:25:01,197] ERROR com.google.gerrit.sshd.BaseCommand : Internal server error (user xxxx.xxx account 795) during git-upload-pack '/platform/external/v8'
org.apache.sshd.common.SshException
        at org.apache.sshd.common.channel.ChannelOutputStream.flush(ChannelOutputStream.java:128)
        at org.apache.sshd.common.channel.ChannelOutputStream.write(ChannelOutputStream.java:75)
        at org.eclipse.jgit.transport.SideBandOutputStream.writeBuffer(SideBandOutputStream.java:158)
        at org.eclipse.jgit.transport.SideBandOutputStream.write(SideBandOutputStream.java:138)
        at org.eclipse.jgit.storage.pack.PackOutputStream.write(PackOutputStream.java:124)
        at org.eclipse.jgit.storage.file.PackFile.copyAsIs2(PackFile.java:501)
        at org.eclipse.jgit.storage.file.PackFile.copyAsIs(PackFile.java:325)
        at org.eclipse.jgit.storage.file.WindowCursor.copyObjectAsIs(WindowCursor.java:162)
        at org.eclipse.jgit.storage.pack.PackWriter.writeObjectImpl(PackWriter.java:1118)
        at org.eclipse.jgit.storage.pack.PackWriter.writeObject(PackWriter.java:1089)
        at org.eclipse.jgit.storage.pack.PackOutputStream.writeObject(PackOutputStream.java:161)
        at org.eclipse.jgit.storage.file.WindowCursor.writeObjects(WindowCursor.java:168)
        at org.eclipse.jgit.storage.pack.PackWriter.writeObjects(PackWriter.java:1077)
        at org.eclipse.jgit.storage.pack.PackWriter.writeObjects(PackWriter.java:1065)
        at org.eclipse.jgit.storage.pack.PackWriter.writePack(PackWriter.java:662)
        at org.eclipse.jgit.transport.UploadPack.sendPack(UploadPack.java:928)
        at org.eclipse.jgit.transport.UploadPack.sendPack(UploadPack.java:789)
        at org.eclipse.jgit.transport.UploadPack.service(UploadPack.java:449)
        at org.eclipse.jgit.transport.UploadPack.upload(UploadPack.java:369)
        at com.google.gerrit.sshd.commands.Upload.runImpl(Upload.java:53)
        at com.google.gerrit.sshd.AbstractGitCommand.service(AbstractGitCommand.java:103)
        at com.google.gerrit.sshd.AbstractGitCommand.access$000(AbstractGitCommand.java:34)
        at com.google.gerrit.sshd.AbstractGitCommand$1.run(AbstractGitCommand.java:69)
        at com.google.gerrit.sshd.BaseCommand$TaskThunk.run(BaseCommand.java:397)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
        at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
        at java.util.concurrent.FutureTask.run(FutureTask.java:166)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$101(ScheduledThreadPoolExecutor.java:165)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:266)
        at com.google.gerrit.server.git.WorkQueue$Task.run(WorkQueue.java:324)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
        at java.lang.Thread.run(Thread.java:636)
Caused by: java.lang.InterruptedException
        at java.lang.Object.wait(Native Method)
        at java.lang.Object.wait(Object.java:502)
        at org.apache.sshd.common.channel.Window.waitForSpace(Window.java:146)
        at org.apache.sshd.common.channel.ChannelOutputStream.flush(ChannelOutputStream.java:104)
        ... 32 more

2. Below is the jstack dump.

"SSH git-upload-pack '/platform/vendor/frameworks/core' (inho.chung)" prio=10 tid=0x00007f31b831e000 nid=0xed52 waiting for monitor entry [0x00007f31a1e9e000]
   java.lang.Thread.State: BLOCKED (on object monitor)
at org.apache.sshd.common.session.AbstractSession.writePacket(AbstractSession.java:328)
- waiting to lock <0x00007f343bcdd7d8> (a java.lang.Object)
at org.apache.sshd.common.channel.ChannelOutputStream.flush(ChannelOutputStream.java:120)
- locked <0x00007f354a0b7ae0> (a org.apache.sshd.common.channel.ChannelOutputStream)
at org.apache.sshd.common.channel.ChannelOutputStream.write(ChannelOutputStream.java:75)
- locked <0x00007f354a0b7ae0> (a org.apache.sshd.common.channel.ChannelOutputStream)
at org.eclipse.jgit.transport.SideBandOutputStream.writeBuffer(SideBandOutputStream.java:158)
at org.eclipse.jgit.transport.SideBandOutputStream.write(SideBandOutputStream.java:138)
at org.eclipse.jgit.storage.pack.PackOutputStream.write(PackOutputStream.java:124)
at org.eclipse.jgit.storage.file.ByteArrayWindow.write(ByteArrayWindow.java:90)
at org.eclipse.jgit.storage.file.PackFile.copyAsIs2(PackFile.java:470)
at org.eclipse.jgit.storage.file.PackFile.copyAsIs(PackFile.java:325)
at org.eclipse.jgit.storage.file.WindowCursor.copyObjectAsIs(WindowCursor.java:162)
at org.eclipse.jgit.storage.pack.PackWriter.writeObjectImpl(PackWriter.java:1118)
at org.eclipse.jgit.storage.pack.PackWriter.writeObject(PackWriter.java:1089)
at org.eclipse.jgit.storage.pack.PackOutputStream.writeObject(PackOutputStream.java:161)
at org.eclipse.jgit.storage.file.WindowCursor.writeObjects(WindowCursor.java:168)
at org.eclipse.jgit.storage.pack.PackWriter.writeObjects(PackWriter.java:1077)
at org.eclipse.jgit.storage.pack.PackWriter.writeObjects(PackWriter.java:1065)
at org.eclipse.jgit.storage.pack.PackWriter.writePack(PackWriter.java:662)
at org.eclipse.jgit.transport.UploadPack.sendPack(UploadPack.java:928)
at org.eclipse.jgit.transport.UploadPack.sendPack(UploadPack.java:789)
at org.eclipse.jgit.transport.UploadPack.service(UploadPack.java:449)
at org.eclipse.jgit.transport.UploadPack.upload(UploadPack.java:369)
at com.google.gerrit.sshd.commands.Upload.runImpl(Upload.java:53)
at com.google.gerrit.sshd.AbstractGitCommand.service(AbstractGitCommand.java:103)
at com.google.gerrit.sshd.AbstractGitCommand.access$000(AbstractGitCommand.java:34)
at com.google.gerrit.sshd.AbstractGitCommand$1.run(AbstractGitCommand.java:69)
at com.google.gerrit.sshd.BaseCommand$TaskThunk.run(BaseCommand.java:397)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
at java.util.concurrent.FutureTask.run(FutureTask.java:166)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$101(ScheduledThreadPoolExecutor.java:165)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:266)
at com.google.gerrit.server.git.WorkQueue$Task.run(WorkQueue.java:324)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
at java.lang.Thread.run(Thread.java:636)
"SSH git-upload-pack '/apps/android/apps/Richnote' (kyungjin1.park)" prio=10 tid=0x00007f31b8815800 nid=0xed37 waiting for monitor entry [0x00007f31a31b0000]
   java.lang.Thread.State: BLOCKED (on object monitor)
at org.apache.sshd.common.session.AbstractSession.writePacket(AbstractSession.java:328)
- waiting to lock <0x00007f34d3a32bb8> (a java.lang.Object)
at org.apache.sshd.common.channel.ChannelOutputStream.flush(ChannelOutputStream.java:120)
- locked <0x00007f354565afa8> (a org.apache.sshd.common.channel.ChannelOutputStream)
at org.apache.sshd.common.channel.ChannelOutputStream.write(ChannelOutputStream.java:75)
- locked <0x00007f354565afa8> (a org.apache.sshd.common.channel.ChannelOutputStream)
at org.eclipse.jgit.transport.SideBandOutputStream.writeBuffer(SideBandOutputStream.java:158)
at org.eclipse.jgit.transport.SideBandOutputStream.write(SideBandOutputStream.java:138)
at org.eclipse.jgit.storage.pack.PackOutputStream.write(PackOutputStream.java:124)
at org.eclipse.jgit.storage.file.PackFile.copyAsIs2(PackFile.java:501)
at org.eclipse.jgit.storage.file.PackFile.copyAsIs(PackFile.java:325)
at org.eclipse.jgit.storage.file.WindowCursor.copyObjectAsIs(WindowCursor.java:162)
at org.eclipse.jgit.storage.pack.PackWriter.writeObjectImpl(PackWriter.java:1118)
at org.eclipse.jgit.storage.pack.PackWriter.writeObject(PackWriter.java:1089)
at org.eclipse.jgit.storage.pack.PackOutputStream.writeObject(PackOutputStream.java:161)
at org.eclipse.jgit.storage.file.WindowCursor.writeObjects(WindowCursor.java:168)
at org.eclipse.jgit.storage.pack.PackWriter.writeObjects(PackWriter.java:1077)
at org.eclipse.jgit.storage.pack.PackWriter.writeObjects(PackWriter.java:1065)
at org.eclipse.jgit.storage.pack.PackWriter.writePack(PackWriter.java:662)
at org.eclipse.jgit.transport.UploadPack.sendPack(UploadPack.java:928)
at org.eclipse.jgit.transport.UploadPack.sendPack(UploadPack.java:789)
at org.eclipse.jgit.transport.UploadPack.service(UploadPack.java:449)
at org.eclipse.jgit.transport.UploadPack.upload(UploadPack.java:369)
at com.google.gerrit.sshd.commands.Upload.runImpl(Upload.java:53)
at com.google.gerrit.sshd.AbstractGitCommand.service(AbstractGitCommand.java:103)
at com.google.gerrit.sshd.AbstractGitCommand.access$000(AbstractGitCommand.java:34)
at com.google.gerrit.sshd.AbstractGitCommand$1.run(AbstractGitCommand.java:69)
at com.google.gerrit.sshd.BaseCommand$TaskThunk.run(BaseCommand.java:397)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
at java.util.concurrent.FutureTask.run(FutureTask.java:166)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$101(ScheduledThreadPoolExecutor.java:165)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:266)
at com.google.gerrit.server.git.WorkQueue$Task.run(WorkQueue.java:324)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
at java.lang.Thread.run(Thread.java:636)

"SSH git-upload-pack '/platform/system/media' (seonhwi.cho)" prio=10 tid=0x00007f31b81b3000 nid=0xed36 waiting for monitor entry [0x00007f31a32b1000]
   java.lang.Thread.State: BLOCKED (on object monitor)
at org.apache.sshd.common.session.AbstractSession.writePacket(AbstractSession.java:328)
- waiting to lock <0x00007f33d2d12f08> (a java.lang.Object)
at org.apache.sshd.common.channel.ChannelOutputStream.flush(ChannelOutputStream.java:120)
- locked <0x00007f354d16c958> (a org.apache.sshd.common.channel.ChannelOutputStream)
at org.apache.sshd.common.channel.ChannelOutputStream.write(ChannelOutputStream.java:75)
- locked <0x00007f354d16c958> (a org.apache.sshd.common.channel.ChannelOutputStream)
at org.eclipse.jgit.transport.SideBandOutputStream.writeBuffer(SideBandOutputStream.java:158)
at org.eclipse.jgit.transport.SideBandOutputStream.write(SideBandOutputStream.java:138)
at org.eclipse.jgit.storage.pack.PackOutputStream.write(PackOutputStream.java:124)
at org.eclipse.jgit.storage.file.PackFile.copyAsIs2(PackFile.java:501)
at org.eclipse.jgit.storage.file.PackFile.copyAsIs(PackFile.java:325)
at org.eclipse.jgit.storage.file.WindowCursor.copyObjectAsIs(WindowCursor.java:162)
at org.eclipse.jgit.storage.pack.PackWriter.writeObjectImpl(PackWriter.java:1118)
at org.eclipse.jgit.storage.pack.PackWriter.writeObject(PackWriter.java:1089)
at org.eclipse.jgit.storage.pack.PackOutputStream.writeObject(PackOutputStream.java:161)
at org.eclipse.jgit.storage.file.WindowCursor.writeObjects(WindowCursor.java:168)
at org.eclipse.jgit.storage.pack.PackWriter.writeObjects(PackWriter.java:1077)
at org.eclipse.jgit.storage.pack.PackWriter.writeObjects(PackWriter.java:1065)
at org.eclipse.jgit.storage.pack.PackWriter.writePack(PackWriter.java:662)
at org.eclipse.jgit.transport.UploadPack.sendPack(UploadPack.java:928)
at org.eclipse.jgit.transport.UploadPack.sendPack(UploadPack.java:789)
at org.eclipse.jgit.transport.UploadPack.service(UploadPack.java:449)
at org.eclipse.jgit.transport.UploadPack.upload(UploadPack.java:369)
at com.google.gerrit.sshd.commands.Upload.runImpl(Upload.java:53)
at com.google.gerrit.sshd.AbstractGitCommand.service(AbstractGitCommand.java:103)
at com.google.gerrit.sshd.AbstractGitCommand.access$000(AbstractGitCommand.java:34)
at com.google.gerrit.sshd.AbstractGitCommand$1.run(AbstractGitCommand.java:69)
at com.google.gerrit.sshd.BaseCommand$TaskThunk.run(BaseCommand.java:397)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
at java.util.concurrent.FutureTask.run(FutureTask.java:166)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$101(ScheduledThreadPoolExecutor.java:165)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:266)
at com.google.gerrit.server.git.WorkQueue$Task.run(WorkQueue.java:324)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
at java.lang.Thread.run(Thread.java:636)




And no responses.

On Thursday, March 22, 2012 at 7:31:58 PM UTC+9, Emmanuel Grumbach wrote:

Saša Živkov

Mar 28, 2012, 2:54:28 PM3/28/12
to seonguk.baek, repo-d...@googlegroups.com, Luthander, Fredrik
On Wed, Mar 28, 2012 at 2:41 PM, seonguk.baek <baeks...@gmail.com> wrote:
> Dear all....
>
> We have same gerrit hang problem in gerrit 2.2.2.1
> it makes me and all users crazy...
>
> We use 32core server and normally use 70% of CPUs.
> But when gerrit server started hanging, only 1 cpu core work 100% and the
> others 0% during about 10~30 seconds.
>
> 2. below log is jstack dump

This thread dump is not complete; it shows only 3 threads.

All of them are blocked, but we need to see the complete thread dump in order
to find out which thread is holding the locks and try to understand why.
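
With a complete dump, matching waiters to lock holders can be partly automated. The sketch below is one possible approach, not an established tool: it assumes the HotSpot jstack text format seen in this thread (thread entries starting with a quoted name, followed by `- locked <0x...>` and `- waiting to lock <0x...>` lines), and the function name `lock_report` is made up for this example.

```shell
#!/bin/bash
# Hypothetical helper: read a jstack dump on stdin and print, for every
# monitor that at least one thread is waiting for, the number of waiters
# and the thread whose stack shows it as locked (if the dump shows one).
lock_report() {
    awk '
        # Extract the 0x... address between "<" and ">".
        function lock_id(line) {
            sub(/^.*</, "", line)
            sub(/>.*$/, "", line)
            return line
        }
        # A thread entry starts with its quoted name.
        /^"/ {
            thread = $0
            sub(/^"/, "", thread)
            sub(/".*$/, "", thread)
            next
        }
        /- waiting to lock </ {
            waiters[lock_id($0)]++
            next
        }
        /- locked </ {
            holder[lock_id($0)] = thread
        }
        END {
            for (id in waiters) {
                owner = (id in holder) ? holder[id] : "unknown"
                printf "%s waiters=%d holder=%s\n", id, waiters[id], owner
            }
        }
    ' | sort
}

# Usage:
#   jstack "$PID" | lock_report
#   lock_report < jstack.log
```

A monitor with many waiters and a holder that is itself blocked or busy is the place to start reading the full stack traces.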

seonguk.baek

Mar 28, 2012, 7:15:47 PM3/28/12
to repo-d...@googlegroups.com, seonguk.baek, Luthander, Fredrik
Thanks for your reply!!!

I've attached the full jstack dump.
If you need more information, please let me know.

Thanks



On Thursday, March 29, 2012 at 3:54:28 AM UTC+9, zivkov wrote:
jstack.log

Saša Živkov

Mar 29, 2012, 5:38:39 AM3/29/12
to seonguk.baek, repo-d...@googlegroups.com, Luthander, Fredrik
From the thread dump I see that most of the threads are blocked in the
Apache Mina library. One example is where this thread:
"SSH git-upload-pack '/platform/prebuilt' (seonhwi.cho)" prio=10
tid=0x00007f31c87c7000 nid=0xeb1a waiting for monitor entry
[0x00007f31b66e4000]
java.lang.Thread.State: BLOCKED (on object monitor)
at org.apache.sshd.common.session.AbstractSession.writePacket(AbstractSession.java:328)
- locked <0x00007f33d2d12f08> (a java.lang.Object)
blocks the following threads, which are all waiting to lock 0x00007f33d2d12f08:
"SSH git-upload-pack '/LG_apps/android/vendor/lge/apps/Email'
(seonhwi.cho)" prio=10 tid=0x00007f31c90aa800 nid=0xed57 waiting for
monitor entry [0x00007f31a1a99000]

"SSH git-upload-pack '/platform/system/media' (seonhwi.cho)" prio=10
tid=0x00007f31b81b3000 nid=0xed36 waiting for monitor entry
[0x00007f31a32b1000]
"SSH git-upload-pack '/platform/packages/inputmethods/LatinIME'
(seonhwi.cho)" prio=10 tid=0x0000000001651000 nid=0xed10 waiting for
monitor entry [0x00007f31a46c6000]
"SSH git-upload-pack '/LG_apps/android/vendor/lge/apps/LGHome3'
(seonhwi.cho)" prio=10 tid=0x00000000018f8000 nid=0xecff waiting for
monitor entry [0x00007f31a53d3000]
"SSH git-upload-pack '/LG_apps/android/vendor/lge/apps/FmRadio'
(seonhwi.cho)" prio=10 tid=0x00007f31b8900800 nid=0xecf0 waiting for
monitor entry [0x00007f31a5fdf000]
"SSH git-upload-pack '/platform/packages/providers/ContactsProvider'
(seonhwi.cho)" prio=10 tid=0x0000000001745800 nid=0xecb9 waiting for
monitor entry [0x00007f31a7dfd000]
"SSH git-upload-pack '/platform/packages/apps/Launcher2'
(seonhwi.cho)" prio=10 tid=0x00007f31b8100000 nid=0xecb0 waiting for
monitor entry [0x00007f31b1b9a000]
"SSH git-upload-pack
'/LG_apps/android/vendor/lge/apps/ApplicationManager' (seonhwi.cho)"
prio=10 tid=0x00007f31d174a000 nid=0xec80 waiting for monitor entry
[0x00007f31aa01f000]
"SSH git-upload-pack '/device/samsung/tuna' (seonhwi.cho)" prio=10
tid=0x00007f31b893f800 nid=0xec4d waiting for monitor entry
[0x00007f31ab836000]
"SSH git-upload-pack '/LG_apps/android/vendor/lge/apps/Calendar3'
(seonhwi.cho)" prio=10 tid=0x00007f31d0443800 nid=0xec28 waiting for
monitor entry [0x00007f31ad957000]
"SSH git-upload-pack
'/LG_apps/android/vendor/lge/apps/LGDefaultAccount' (seonhwi.cho)"
prio=10 tid=0x00007f31d1322800 nid=0xec11 waiting for monitor entry
[0x00007f31b0f8d000]
"SSH git-upload-pack '/LG_apps/android/vendor/lge/apps/LGEIME'
(seonhwi.cho)" prio=10 tid=0x000000000141a800 nid=0xec0f waiting for
monitor entry [0x00007f31ade5c000]
"SSH git-upload-pack '/platform/packages/apps/Gallery'
(seonhwi.cho)" prio=10 tid=0x00007f31d0668800 nid=0xec00 waiting for
monitor entry [0x00007f31aea68000]
"SSH git-upload-pack '/platform/vendor/lge/apps/HiddenMenu'
(seonhwi.cho)" prio=10 tid=0x00007f31b806f800 nid=0xeb98 waiting for
monitor entry [0x00007f31b26a5000]
"SSH git-upload-pack '/platform/packages/apps/Settings'
(seonhwi.cho)" prio=10 tid=0x00000000019c2800 nid=0xeb7b waiting for
monitor entry [0x00007f31b30af000]
"SSH git-upload-pack '/LG_apps/android/vendor/lge/apps/CameraApp'
(seonhwi.cho)" prio=10 tid=0x00007f31b81f0800 nid=0xeb70 waiting for
monitor entry [0x00007f31b3bba000]
"SSH git-upload-pack '/LG_apps/android/vendor/lge/apps/LGCbReceiver'
(seonhwi.cho)" prio=10 tid=0x00000000016cf000 nid=0xeb42 waiting for
monitor entry [0x00007f31b51d0000]
"SSH git-upload-pack '/platform/system/core' (seonhwi.cho)" prio=10
tid=0x000000000145c800 nid=0xeb33 waiting for monitor entry
[0x00007f31b5fdd000]

The blocking thread is itself waiting for monitor entry [0x00007f31b66e4000],
but I can't tell from the thread dump what that is. And I don't know how
Apache Mina works.

Another example is where this thread:
"SSH git-upload-pack '/LG_apps/android/vendor/lge/apps/LGEIME'
(inho.chung)" prio=10 tid=0x00007f31b88f7800 nid=0xecd0 runnable
[0x00007f31a6dec000]
java.lang.Thread.State: RUNNABLE
at org.bouncycastle.crypto.engines.AESFastEngine.processBlock(Unknown Source)
at org.bouncycastle.crypto.modes.CBCBlockCipher.encryptBlock(Unknown Source)
at org.bouncycastle.crypto.modes.CBCBlockCipher.processBlock(Unknown Source)
at org.bouncycastle.crypto.BufferedBlockCipher.processBytes(Unknown Source)
at org.bouncycastle.jce.provider.JCEBlockCipher$BufferedGenericBlockCipher.processBytes(Unknown Source)
at org.bouncycastle.jce.provider.JCEBlockCipher.engineUpdate(Unknown Source)
at javax.crypto.Cipher.update(Cipher.java:1611)
at org.apache.sshd.common.cipher.BaseCipher.update(BaseCipher.java:71)
at org.apache.sshd.common.session.AbstractSession.encode(AbstractSession.java:428)
at org.apache.sshd.common.session.AbstractSession.writePacket(AbstractSession.java:329)
- locked <0x00007f343bcdd7d8> (a java.lang.Object)

blocks the following threads:
"SSH git-upload-pack
'/LG_apps/android/vendor/lge/appwidget/Weather3' (inho.chung)" prio=10
tid=0x000000000164d800 nid=0xed0a waiting for monitor entry
[0x00007f31a4aca000]
"SSH git-upload-pack '/platform/vendor/lge/frameworks/resource'
(inho.chung)" prio=10 tid=0x0000000001687800 nid=0xebf4 waiting for
monitor entry [0x00007f31af372000]
"SSH git-upload-pack '/platform/vendor/lge/frameworks/media'
(inho.chung)" prio=10 tid=0x0000000001cf6000 nid=0xebbf waiting for
monitor entry [0x00007f31b0988000]
"SSH git-upload-pack '/platform/vendor/lge/apps/test' (inho.chung)"
prio=10 tid=0x00000000028f7800 nid=0xeb47 waiting for monitor entry
[0x00007f31b4dcc000]
"SSH git-upload-pack '/platform/prebuilt' (inho.chung)" prio=10
tid=0x0000000002b99800 nid=0xead0 waiting for monitor entry
[0x00007f31bcde1000]
"SSH git-upload-pack '/platform/vendor/lge/frameworks/core'
(inho.chung)" prio=10 tid=0x00007f31b831e000 nid=0xed52 waiting for
monitor entry [0x00007f31a1e9e000]

Again, one needs to understand how Apache Mina works in order to
understand what is going on here.


There is at least one part I do understand, which is where this thread:
"SSH git-upload-pack '/platform/frameworks/base' (hd.mo)" prio=10
tid=0x00000000016e4800 nid=0xe98f waiting for monitor entry
[0x00007f31c3e50000]


java.lang.Thread.State: BLOCKED (on object monitor)
at org.eclipse.jgit.storage.file.PackFile.idx(PackFile.java:154)
- locked <0x00007f321759dae0> (a org.eclipse.jgit.storage.file.PackFile)

blocks these:
"SSH git-upload-pack '/platform/frameworks/base' (nr.seo)" prio=10
tid=0x00007f31b83a8800 nid=0xed1c waiting for monitor entry
[0x00007f31a3bba000]
"SSH git-upload-pack '/platform/frameworks/base' (kidong0420.kim)"
prio=10 tid=0x000000000164f000 nid=0xed0c waiting for monitor entry
[0x00007f31a49c8000]
"SSH git-upload-pack '/platform/frameworks/base' (woohyuk.byun)"
prio=10 tid=0x00000000018fc800 nid=0xed07 waiting for monitor entry
[0x00007f31a4dcc000]
"SSH git-upload-pack '/platform/frameworks/base' (eunji.seo)"
prio=10 tid=0x000000000289b000 nid=0xed04 waiting for monitor entry
[0x00007f31a4ecd000]
"SSH git-upload-pack '/platform/frameworks/base' (gerrit)" prio=10
tid=0x00007f31c849f800 nid=0xec76 waiting for monitor entry
[0x00007f31aa523000]
"SSH git-upload-pack '/platform/frameworks/base' (hyeongjin.kim)"
prio=10 tid=0x00007f31d10d6000 nid=0xeba1 waiting for monitor entry
[0x00007f31b22a0000]
"SSH git-upload-pack '/platform/frameworks/base' (jay.sim)" prio=10
tid=0x00007f31c9bfb800 nid=0xea97 waiting for monitor entry
[0x00007f31be4f7000]
"SSH git-upload-pack '/platform/frameworks/base' (sohyun.nam)"
prio=10 tid=0x0000000001b28000 nid=0xe90c waiting for monitor entry
[0x00007f31c4fcc000]
"SSH git-upload-pack '/platform/frameworks/base' (jungyub.jee)"
prio=10 tid=0x00000000014c3800 nid=0xe8d8 waiting for monitor entry
[0x00007f31c57d4000]

This happens because all of these threads are trying to fetch from the same
Git repository, and JGit ensures that the pack file index is loaded by one
thread only (and then reused by the other threads).
I don't know whether this is only a transient situation or whether reading the
pack file index really takes a long time. The only way to check is to take
several thread dumps while SSH connections hang and then compare them. If you
see that the blocking thread holds the lock on the pack file index for too
long, this may indicate an I/O performance problem.
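
Comparing several dumps by hand gets tedious, so here is a rough sketch of how the comparison could be automated. It assumes each dump is raw `jstack` output with one header line per thread (not the re-wrapped text pasted in this thread). The function names are made up for illustration. It matches on the lock's class rather than its address, because object addresses may change between dumps when the garbage collector moves objects.

```python
import re

# Assumption: each dump is raw `jstack` output, where a thread header is a single
# line starting with the quoted thread name (unlike a re-wrapped mail paste).
HEADER = re.compile(r'^"(?P<name>.+)"\s')
# "- locked <0x...> (a some.Class)" marks a monitor the current thread holds.
LOCKED = re.compile(r'^\s*- locked <0x[0-9a-fA-F]+> \(a (?P<cls>[^)]+)\)')


def held_locks(dump_text):
    """Return the set of (thread name, lock class) pairs found in one dump."""
    pairs = set()
    current = None
    for line in dump_text.splitlines():
        header = HEADER.match(line)
        if header:
            current = header.group("name")
            continue
        locked = LOCKED.match(line)
        if locked and current is not None:
            pairs.add((current, locked.group("cls")))
    return pairs


def persistent_holders(dumps):
    """Return the (thread, lock class) pairs that appear in every dump."""
    per_dump = [held_locks(text) for text in dumps]
    if not per_dump:
        return set()
    return set.intersection(*per_dump)
```

If you pass in the text of dumps taken, say, ten seconds apart, any pair that survives in the result is a thread that was holding the same kind of lock in all of them, which is the candidate to look at first. It is only a hint: a thread may release and re-acquire a lock between two dumps.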

Saša

seonguk.baek

Mar 29, 2012, 5:52:44 AM3/29/12
to repo-d...@googlegroups.com, seonguk.baek, Luthander, Fredrik
Thanks for your reply.

Martin Fick said the problem looks like JVM garbage collection.

So we checked the JVM GC log: a full GC ran for 20 to 60 seconds, and then the Gerrit service hung.

On another server with a project of similar size, full GC does not occur frequently and takes only 1 to 2 seconds.

What is the problem?

Thanks
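
For anyone who wants to compare two servers in the same way, a small sketch like the following could pull the full GC pause times out of a GC log. It assumes the classic HotSpot log format (before Java 9) in which a full collection line contains "Full GC" and its summary ends with ", <seconds> secs]". Other collectors or logging flags may print something different, so treat the regular expression as an assumption to adjust. The function names and the 10 second threshold are illustrative only.

```python
import re

# Assumption: classic HotSpot GC log lines such as
#   [Full GC 123456K->12345K(234567K), 23.4567890 secs]
# The first ", <number> secs]" after "Full GC" is taken as the pause time.
FULL_GC = re.compile(r'Full GC.*?, (?P<secs>\d+\.\d+) secs\]')


def full_gc_pauses(log_text):
    """Return the pause time in seconds of every full GC line in the log."""
    pauses = []
    for line in log_text.splitlines():
        match = FULL_GC.search(line)
        if match:
            pauses.append(float(match.group("secs")))
    return pauses


def long_pauses(log_text, threshold_seconds=10.0):
    """Return only the full GC pauses longer than the given threshold."""
    return [p for p in full_gc_pauses(log_text) if p > threshold_seconds]
```

Running `long_pauses` over the logs of both servers gives a quick side-by-side view of how often each one stops for a long full collection.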

On Thursday, March 29, 2012 at 6:38:39 PM UTC+9, zivkov wrote:

Martin Fick

Mar 29, 2012, 10:31:18 AM3/29/12
to seonguk.baek, repo-d...@googlegroups.com, Luthander, Fredrik

"seonguk.baek" <baeks...@gmail.com> wrote:
>
>Martin Fick said the problem looks like JVM garbage collection.
>
>So we checked the JVM GC log: a full GC ran for 20 to 60 seconds, and then
>the Gerrit service hung.
>
>On another server with a project of similar size, full GC does not occur
>frequently and takes only 1 to 2 seconds.
>
>What is the problem?

I think you are in the best position to tell us: you can better determine any differences between your machine setups, hardware, and usage patterns. If it isn't the first two, then it is probably the last one. We have told you that you are clearly loading your server more than any of us do. The downloads of kernel/msm are particularly hard on the GC. You may be able to tune or fix Gerrit so that it can handle such loads, but we have not been able to; we scaled by adding infrastructure. If you can improve things, please let us know how you did it so that we can benefit from your experience.

-Martin

Pursehouse, David

Oct 14, 2012, 10:46:36 PM10/14/12
to Albin Joy, repo-d...@googlegroups.com, seonguk.baek, Luthander, Fredrik

A couple of things to try:

1. Is the Jenkins user’s ssh connection actually working? I.e. are you able to connect to Gerrit over ssh on the command line using the Jenkins user’s keys?

2. Do you see the events if you run `gerrit stream-events` on the command line (with the Jenkins user account)?

3. Are there any errors or warnings in the Gerrit Trigger’s logs?
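
Checks 1 and 2 above can be scripted roughly as follows. This is only a sketch: the user name `jenkins` in the usage note and the host `gerrit.repo` are placeholders, the function names are invented, and it assumes the `ssh` client on the machine is already set up with the Jenkins user's key. Since `gerrit stream-events` keeps running until it is killed, hitting the timeout is the normal way out.

```python
import subprocess


def stream_events_command(user, host, port=29418):
    """Build the ssh command line that asks Gerrit to stream its events."""
    return ["ssh", "-p", str(port), f"{user}@{host}", "gerrit", "stream-events"]


def sees_events(user, host, wait_seconds=60, port=29418):
    """Return True if at least some event output arrives within wait_seconds."""
    cmd = stream_events_command(user, host, port)
    try:
        result = subprocess.run(
            cmd, capture_output=True, text=True, timeout=wait_seconds
        )
    except subprocess.TimeoutExpired as exc:
        # The stream never ends by itself, so look at what was captured so far.
        # Depending on the platform this may be bytes, str, or None.
        return bool((exc.stdout or b"").strip())
    # ssh exited early (for example on an authentication failure).
    return bool(result.stdout.strip())
```

Call `sees_events("jenkins", "gerrit.repo")` and then trigger something in Gerrit (for example, add a comment to a change) while it is waiting. A `False` result with no activity in Gerrit proves nothing, because a quiet server produces no events either.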

 

-David

From: repo-d...@googlegroups.com [mailto:repo-d...@googlegroups.com] On Behalf Of Albin Joy
Sent: Saturday, October 13, 2012 11:18 AM
To: repo-d...@googlegroups.com
Cc: seonguk.baek; Luthander, Fredrik
Subject: Re: SSH connections to Gerrit randomly hang

 

Hi All,

I am also facing the same problem with SSH connections hanging. Gerrit Trigger is not receiving any stream-events from Gerrit.

Can anybody please explain whether this is a bug in Gerrit Trigger, a bug in Gerrit, or a configuration mistake?

Please also suggest a solution; it would be very helpful.

We are blocked by this issue.


Albin Joy

Oct 15, 2012, 2:38:23 AM10/15/12
to repo-d...@googlegroups.com, Albin Joy, seonguk.baek, Luthander, Fredrik
Hi David,

There are no errors or warnings in the Gerrit Trigger log.
It seems that only this particular SSH connection hits the problem, and only randomly.
If I use the "Query and Trigger Gerrit Patches" option in Gerrit Trigger, I can see the event in the list and trigger a build from there.
But even after that, Gerrit's stream-events are still not received.
Once I restart Gerrit Trigger, everything works properly again.

Thanks & Regards
Albin Joy

Ragesh Nair

Oct 15, 2012, 3:04:28 AM10/15/12
to Albin Joy, Robert Sandell, repo-d...@googlegroups.com, seonguk.baek, Luthander, Fredrik
We are also seeing this issue, and for the time being we restart Gerrit Trigger to fix it.

My understanding is that, due to load or stability issues between Jenkins and the Gerrit master, the SSH connection thread of Gerrit Trigger gets into a hung state. The Gerrit master then recovers after some time, but Gerrit Trigger can't get out of the hung state, so we have to restart it.

We have observed this randomly on one or more (but not all) of our Jenkins instances connected to our single Gerrit master.
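
To illustrate the idea of noticing such a hung connection instead of waiting for someone to spot missing builds, here is a minimal idle-timeout sketch. It is not how Gerrit Trigger works internally; the class name and the injectable clock are made up for the example. The caller would invoke `record_event()` whenever a stream event arrives and poll `is_stale()` from a separate timer.

```python
import time


class StreamWatchdog:
    """Tracks when the last event was seen and reports long silences."""

    def __init__(self, max_idle_seconds, clock=time.monotonic):
        self.max_idle_seconds = max_idle_seconds
        self._clock = clock  # injectable so the class can be tested without sleeping
        self._last_event = clock()

    def record_event(self):
        """Call this each time an event is read from the stream."""
        self._last_event = self._clock()

    def is_stale(self):
        """Return True if no event has been recorded for too long."""
        return self._clock() - self._last_event > self.max_idle_seconds
```

The weak point is that a hung connection and a genuinely quiet Gerrit look the same from here, so the idle limit has to be chosen with the site's normal event rate in mind.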

#Adding Bobby (Robert Sandell) to the loop for the latest updates.

-Ragesh  Nair

Anushree Ganjam

May 10, 2016, 7:25:41 AM5/10/16
to Repo and Gerrit Discussion, ene...@gmail.com
Hi,
Where can I see the Gerrit Trigger’s logs?
A Jenkins build starts only after I restart the Gerrit Trigger plugin.

Often, when a Gerrit event happens, no build gets triggered.

Please help.
