Replication: Slave out of sync apparently because "channel is not opened"

cbaldacin

Feb 23, 2011, 5:45:45 AM
to Repo and Gerrit Discussion
Hi, I'm getting the stack trace below and I think it must be the
reason the slave is out of sync.

Does anybody know what may be causing it?

I tried to look into the JGit implementation but I'm unable to find
a TransportGitSsh class with at least 456 lines :-(

[2011-02-23 00:04:21,754] ERROR com.google.gerrit.server.git.PushReplication : Cannot replicate to gerrit2@cnbjlnx001:/srv/gerrit2/git/platform/vendor/semc/packages/apps/metadata-cleanup.git
org.eclipse.jgit.errors.TransportException: gerrit2@cnbjlnx001:/srv/gerrit2/git/platform/vendor/semc/packages/apps/metadata-cleanup.git: channel is not opened.
        at org.eclipse.jgit.transport.TransportGitSsh$JschConnection.connect(TransportGitSsh.java:228)
        at org.eclipse.jgit.transport.TransportGitSsh$SshPushConnection.<init>(TransportGitSsh.java:456)
        at org.eclipse.jgit.transport.TransportGitSsh.openPush(TransportGitSsh.java:109)
        at org.eclipse.jgit.transport.PushProcess.execute(PushProcess.java:130)
        at org.eclipse.jgit.transport.Transport.push(Transport.java:962)
        at com.google.gerrit.server.git.PushOp.pushVia(PushOp.java:263)
        at com.google.gerrit.server.git.PushOp.runImpl(PushOp.java:209)
        at com.google.gerrit.server.git.PushOp.run(PushOp.java:162)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
        at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
        at java.util.concurrent.FutureTask.run(FutureTask.java:138)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:98)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:206)
        at com.google.gerrit.server.git.WorkQueue$Task.run(WorkQueue.java:324)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
        at java.lang.Thread.run(Thread.java:662)
Caused by: com.jcraft.jsch.JSchException: channel is not opened.
        at com.jcraft.jsch.Channel.connect(Channel.java:188)
        at org.eclipse.jgit.transport.TransportGitSsh$JschConnection.connect(TransportGitSsh.java:224)
        ... 16 more

Shawn Pearce

Feb 23, 2011, 1:17:18 PM
to cbaldacin, Repo and Gerrit Discussion
On Wed, Feb 23, 2011 at 02:45, cbaldacin
<carloseduar...@sonyericsson.com> wrote:
> Hi, I'm getting the stack trace below and I think it must be the
> reason the slave is out of sync.
>
> Does anybody know what may be causing it?
>
> I tried to look into the JGit implementation but I'm unable to find
> a TransportGitSsh class with at least 456 lines :-(

Are you sure you looked at the right version of JGit? In the current
version it's 467 lines, and according to this trace, it's at least 456
lines in the version that you are running. :-)

> [2011-02-23 00:04:21,754] ERROR com.google.gerrit.server.git.PushReplication : Cannot replicate to gerrit2@cnbjlnx001:/srv/gerrit2/git/platform/vendor/semc/packages/apps/metadata-cleanup.git
> org.eclipse.jgit.errors.TransportException: gerrit2@cnbjlnx001:/srv/gerrit2/git/platform/vendor/semc/packages/apps/metadata-cleanup.git: channel is not opened.

Thanks, JSch:

> Caused by: com.jcraft.jsch.JSchException: channel is not opened.
>        at com.jcraft.jsch.Channel.connect(Channel.java:188)
>        at org.eclipse.jgit.transport.TransportGitSsh$JschConnection.connect(TransportGitSsh.java:224)

JSch is even worse software than Gerrit itself. Probably the only
thing that will fix this is to switch JGit to the MINA SSHD client,
which seems to have fewer bugs and better programmers behind it.

More recent versions of JGit (0.11.1, and maybe some earlier builds)
support the GIT_SSH environment variable. If it is set, JGit uses that
command to open the SSH connection instead of the JSch library. You
might try setting GIT_SSH=ssh in the environment when you launch
Gerrit, assuming you are able to upgrade your Gerrit WAR to use that
newer JGit library.
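
A minimal sketch of launching a server process with GIT_SSH=ssh in its environment, written as a small Java wrapper. The class name, the helper method, and the site path are hypothetical, and the `java -jar gerrit.war daemon -d <site>` command line is an assumption about how Gerrit is started on your host. The same effect comes from exporting the variable in whatever script starts Gerrit. It only helps if the JGit inside the WAR is new enough to honor GIT_SSH.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical wrapper; not part of Gerrit or JGit.
class GitSshLauncher {

    // Builds (but does not start) a child process whose environment
    // contains GIT_SSH=<sshCommand> in addition to the inherited variables.
    static ProcessBuilder withGitSsh(List<String> command, String sshCommand) {
        ProcessBuilder pb = new ProcessBuilder(command);
        pb.environment().put("GIT_SSH", sshCommand);
        pb.inheritIO(); // let the child share this process's stdin/stdout/stderr
        return pb;
    }

    public static void main(String[] args) {
        // Assumed launch command; adjust the WAR location and site path.
        ProcessBuilder pb = withGitSsh(
            Arrays.asList("java", "-jar", "gerrit.war", "daemon", "-d", "/path/to/review_site"),
            "ssh");
        System.out.println("GIT_SSH=" + pb.environment().get("GIT_SSH"));
        // Calling pb.start() here would actually launch the daemon.
    }
}
```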

Luciano Carvalho

Feb 23, 2011, 1:37:24 PM
to Shawn Pearce, cbaldacin, Repo and Gerrit Discussion
Hi Shawn,

I have a lot of problems with JSch as well.
We have a lot of mirrors and more than 100 "Auth fail" or "channel not opened" errors every day.

From which Gerrit version can I use the GIT_SSH=ssh setting?

Thanks,

Luciano.


Shawn Pearce

Feb 23, 2011, 1:50:04 PM
to Luciano Carvalho, cbaldacin, Repo and Gerrit Discussion
On Wed, Feb 23, 2011 at 10:37, Luciano Carvalho <lsca...@gmail.com> wrote:
> I have a lot of problems with JSch as well.
> We have a lot of mirrors and more than 100 "Auth fail" or "channel not
> opened" errors every day.
> From which Gerrit version can I use the GIT_SSH=ssh setting?

None. I haven't upgraded Gerrit to use this newer version of JGit yet.

It *might* be as simple as replacing the JGit JARs with the latest
from the JGit project. But I cannot remember whether there were API
changes. If there are, Gerrit might need a few (minor) source code
edits to run on the latest JARs.

Martin Fick

Feb 23, 2011, 2:01:39 PM
to repo-d...@googlegroups.com
On Wednesday 23 February 2011 11:50:04 am Shawn Pearce wrote:
> It *might* be as simple as replacing the JGit JARs with
> the latest from the JGit project. But I cannot remember
> whether there were API changes. If there are, Gerrit
> might need a few (minor) source code edits to run on
> the latest JARs.

There are API changes in HEAD; I tried it. But I believe
updating to the stable-0.11 version works.

-Martin

cbaldacin

Feb 24, 2011, 3:56:08 AM
to Repo and Gerrit Discussion
After starting this thread, I found this other one:
http://groups.google.com/group/repo-discuss/browse_thread/thread/e01cc549001f8653/303d65342d457935?lnk=gst&q=Gerrit-%3Essh%2Bgit+replication+does+not+work#303d65342d457935

And I found that the JSch running on our server is an old one: 0.1.41.
Updating to a newer JSch might solve it, but I'm not sure!
In my test environment, which points to the latest 0.1.44, I'm not
able to reproduce the same exception.
(The next step will be to try 0.1.41 instead.)

But curiously, after several "channel is not opened" errors, the
project gets replicated somehow after some time.

Carlos

Shawn Pearce

Feb 24, 2011, 10:04:48 AM
to cbaldacin, Repo and Gerrit Discussion
On Thu, Feb 24, 2011 at 00:56, cbaldacin
<carloseduar...@sonyericsson.com> wrote:
> But curiously, after several "channel is not opened" errors, the
> project gets replicated somehow after some time.

I think it's a timing bug in JSch. The JSch authors think the
constructs they are using are thread-safe in Java. They aren't. There
are a number of race conditions between different threads for each SSH
connection, and sometimes things don't work like the authors think
they should.

I've looked through the code and given up on trying to fix it. It's
just too damn ugly. A for loop with explicit sleeps and no locking is
not the right way to share data between two threads in Java. Or any
programming language. It might work on a single-CPU system with only
cooperative threading, but nobody does that anymore (and hasn't for
years!).
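
To make the point about sleeps versus locking concrete, here is a minimal sketch of a safe hand-off between two threads. This is not JSch code; the class and method names are hypothetical. It assumes one thread waits for a "channel opened" confirmation that a second thread delivers. The fragile pattern would poll a plain field in a loop with `Thread.sleep`. Here a `CountDownLatch` blocks the waiter and guarantees it sees the value written by the other thread.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Hypothetical illustration only; not taken from JSch or JGit.
class OpenSignal {
    private final CountDownLatch done = new CountDownLatch(1);
    private volatile boolean opened;

    // Called once by the thread that receives the confirmation.
    void confirm(boolean ok) {
        opened = ok;       // write the result first
        done.countDown();  // then release any waiter
    }

    // Called by the connecting thread. Returns true only if a positive
    // confirmation arrived before the timeout elapsed.
    boolean awaitOpen(long timeoutMillis) throws InterruptedException {
        return done.await(timeoutMillis, TimeUnit.MILLISECONDS) && opened;
    }

    public static void main(String[] args) throws InterruptedException {
        OpenSignal signal = new OpenSignal();
        Thread reader = new Thread(() -> signal.confirm(true));
        reader.start();
        System.out.println("opened: " + signal.awaitOpen(1000));
        reader.join();
    }
}
```

The waiter wakes as soon as the confirmation is delivered rather than on the next poll, and a timeout is reported as a plain `false` instead of depending on how many sleep iterations happened to run.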
