Gerrit->ssh+git replication does not work


Anatol Pomazau

Jan 4, 2011, 5:27:03 PM
to Repo and Gerrit Discussion
Hi,

After I pushed the HEAD version of Gerrit to my server, I found that replication had stopped working. All replication jobs sit in the Gerrit queue forever and are never executed.

I looked into the master Gerrit error logs and see the following messages:

[2011-01-04 00:00:02,247] ERROR com.google.gerrit.server.git.PushReplication : Cannot replicate to ssh+git://slavehost.com/git/yourproject.git
org.eclipse.jgit.errors.TransportException: ssh+git://slavehost.com/git/yourproject.git: session is down
        at org.eclipse.jgit.transport.TransportGitSsh$JschConnection.connect(TransportGitSsh.java:228)
        at org.eclipse.jgit.transport.TransportGitSsh$SshPushConnection.<init>(TransportGitSsh.java:456)
        at org.eclipse.jgit.transport.TransportGitSsh.openPush(TransportGitSsh.java:109)
        at org.eclipse.jgit.transport.PushProcess.execute(PushProcess.java:130)
        at org.eclipse.jgit.transport.Transport.push(Transport.java:962)
        at com.google.gerrit.server.git.PushOp.pushVia(PushOp.java:263)
        at com.google.gerrit.server.git.PushOp.runImpl(PushOp.java:209)
        at com.google.gerrit.server.git.PushOp.run(PushOp.java:162)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
        at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
        at java.util.concurrent.FutureTask.run(FutureTask.java:166)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$101(ScheduledThreadPoolExecutor.java:165)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:266)
        at com.google.gerrit.server.git.WorkQueue$Task.run(WorkQueue.java:324)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
        at java.lang.Thread.run(Thread.java:636)
Caused by: com.jcraft.jsch.JSchException: session is down
        at com.jcraft.jsch.Channel.connect(Channel.java:185)
        at org.eclipse.jgit.transport.TransportGitSsh$JschConnection.connect(TransportGitSsh.java:224)
        ... 16 more


Looking into the slave host logs, I see the following in /var/log/auth.log:

Jan  4 14:16:10 vpbd2.mtv.corp.google.com sshd[5373]: Connection from XX.YY.107.65 port 50419
Jan  4 14:16:10 vpbd2.mtv.corp.google.com sshd[5373]: Failed none for android-git from XX.YY.107.65 port 50419 ssh2
Jan  4 14:16:10 vpbd2.mtv.corp.google.com sshd[5373]: Accepted publickey for git from XX.YY.107.65 port 50419 ssh2
Jan  4 14:16:10 vpbd2.mtv.corp.google.com sshd[5373]: pam_unix(ssh:session): session opened for user git by (uid=0)
Jan  4 14:16:11 vpbd2.mtv.corp.google.com sshd[5373]: User child is on pid 5525
Jan  4 14:16:12 vpbd2.mtv.corp.google.com sshd[5192]: Corrupted MAC on input.
Jan  4 14:16:12 vpbd2.mtv.corp.google.com sshd[5192]: Disconnecting: Packet corrupt
Jan  4 14:16:12 vpbd2.mtv.corp.google.com sshd[5040]: pam_unix(ssh:session): session closed for user git


The only suspicious message here is "Corrupted MAC on input." Have you seen anything like this before? I tend to blame jsch.

Anatol Pomazau

Jan 4, 2011, 5:30:56 PM
to Repo and Gerrit Discussion
> The only suspicious message here is "Corrupted MAC on input." Have you seen anything like this before? I tend to blame jsch.

BTW I can do "ssh slavehost" from the master host.

Shawn Pearce

Jan 4, 2011, 5:34:45 PM
to Anatol Pomazau, Repo and Gerrit Discussion
On Tue, Jan 4, 2011 at 14:27, Anatol Pomazau <ana...@google.com> wrote:
...

> Jan  4 14:16:12 vpbd2.mtv.corp.google.com sshd[5192]: Corrupted MAC on
> input.
> Jan  4 14:16:12 vpbd2.mtv.corp.google.com sshd[5192]: Disconnecting: Packet
> corrupt
> Jan  4 14:16:12 vpbd2.mtv.corp.google.com sshd[5040]: pam_unix(ssh:session):
> session closed for user git
>
> The only suspicious message here is "Corrupted MAC on input." Have you seen
> anything like this before? I tend to blame jsch.

Yea, I agree, that sounds like JSch output some garbage and OpenSSH
rejected it. Thing is, we haven't changed the JSch version in a long,
long time.

Anatol Pomazau

Jan 4, 2011, 6:29:52 PM
to Shawn Pearce, Repo and Gerrit Discussion
I replaced jsch 0.1.41 with 0.1.44 in gerrit.war and deployed it to the server. That seems to have fixed the problem, and replication works now. I can't explain why, though.

Shawn Pearce

Jan 5, 2011, 1:10:49 PM
to Anatol Pomazau, Repo and Gerrit Discussion
On Tue, Jan 4, 2011 at 15:29, Anatol Pomazau <ana...@google.com> wrote:
>> >
>> > The only suspicious message here is "Corrupted MAC on input." Have you
>> > seen
>> > anything like this before? I tend to blame jsch.
>
> I replaced jsch 0.1.41 with 0.1.44 in gerrit.war and deployed it to the
> server. It seems fixed the problem and replication works now. I can't
> explain why though.

Well, .44 contains a bug fix for two of the MAC algorithms:

http://www.mail-archive.com/jsch-...@lists.sourceforge.net/msg00957.html
| Changes since version 0.1.43:
| - bugfix: hmac-md5-96 and hmac-sha1-96 are broken. FIXED.
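
For anyone unfamiliar with the "-96" names: per RFC 4253, hmac-sha1-96 is the ordinary HMAC-SHA1 tag cut down to its first 96 bits (12 bytes), computed over the packet sequence number followed by the unencrypted packet. If either side computes that tag differently, the receiver's check fails, which is the kind of mismatch sshd reports as "Corrupted MAC on input." The sketch below uses only the JDK and is not JSch's code; the changelog does not say what exactly was broken. The class name, method names, all-zero key, and sample packet are made up for illustration.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.util.Arrays;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

// Illustrative only: hypothetical class, not part of JSch or JGit.
class TruncatedMacSketch {

    // Full 20-byte HMAC-SHA1 tag over (sequence number || packet).
    public static byte[] hmacSha1(byte[] key, int sequenceNumber, byte[] packet) {
        try {
            Mac mac = Mac.getInstance("HmacSHA1");
            mac.init(new SecretKeySpec(key, "HmacSHA1"));
            // ByteBuffer is big-endian by default, matching the SSH uint32 encoding.
            mac.update(ByteBuffer.allocate(4).putInt(sequenceNumber).array());
            mac.update(packet);
            return mac.doFinal();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("HmacSHA1 unavailable or key rejected", e);
        }
    }

    // hmac-sha1-96: keep only the first 12 bytes (96 bits) of the full tag.
    public static byte[] hmacSha1Truncated96(byte[] key, int sequenceNumber, byte[] packet) {
        return Arrays.copyOf(hmacSha1(key, sequenceNumber, packet), 12);
    }

    public static void main(String[] args) {
        byte[] key = new byte[20]; // placeholder key (all zeros), for illustration only
        byte[] packet = "example packet".getBytes(StandardCharsets.UTF_8);

        byte[] full = hmacSha1(key, 0, packet);
        byte[] truncated = hmacSha1Truncated96(key, 0, packet);
        System.out.println("full tag: " + full.length
                + " bytes, truncated tag: " + truncated.length + " bytes");
    }
}
```

Both ends must agree on every byte of the 12-byte tag, so even a small implementation error in the truncated variants is enough to make the server drop the connection.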

It's possible the remote server's SSH daemon was upgraded or reconfigured
to a new default MAC algorithm, and the JSch client didn't implement
it correctly. An OpenSSH upgrade could have caused this: a newer
OpenSSH might be using a different default MAC, and that may be how the
JSch guys found out there was a bug here.
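
To make the reasoning concrete: in SSH the MAC actually used is the first algorithm on the client's list that the server also supports (RFC 4253, section 7.1), so a change on the server side alone can switch a client onto a MAC it had never exercised before. Here is a rough sketch of that selection rule. The algorithm lists are invented for illustration and are not the real defaults of any JSch or OpenSSH version; the class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

// Illustrative only: hypothetical class, not part of JSch or OpenSSH.
class MacNegotiationSketch {

    // Pick the first entry in the client's preference list that the server also offers.
    public static Optional<String> negotiate(List<String> clientPrefs, List<String> serverSupported) {
        return clientPrefs.stream().filter(serverSupported::contains).findFirst();
    }

    public static void main(String[] args) {
        // Assumed client preference order (illustrative, not an actual default list).
        List<String> client = Arrays.asList("hmac-md5", "hmac-sha1", "hmac-sha1-96", "hmac-md5-96");

        // Assumed server configurations before and after a hypothetical reconfiguration.
        List<String> serverBefore = Arrays.asList("hmac-md5", "hmac-sha1", "hmac-sha1-96");
        List<String> serverAfter = Arrays.asList("hmac-sha1-96", "hmac-md5-96");

        System.out.println(negotiate(client, serverBefore).orElse("none")); // prints hmac-md5
        System.out.println(negotiate(client, serverAfter).orElse("none"));  // prints hmac-sha1-96
    }
}
```

Under these assumed lists the client never touches a "-96" MAC until the server drops the untruncated ones. It would also be consistent with "ssh slavehost" working from the same host, since the OpenSSH client has its own preference list and its own implementation.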

It might be time to upgrade our pom to use .44.
