Re: ssh hash mismatch error


Robin Rosenberg

Mar 29, 2013, 12:50:40 PM
to Bassem Rabil, repo-d...@googlegroups.com


----- Original message -----
>
> Hi
>
>
> We are experiencing sporadic errors with ssh connections to gerrit
> server running 2.5.1. Normally there is no impact on gerrit users,
> but continuous integration engines like Jenkins constantly fail due
> to this error. The environment we use for our gerrit server is:
>
>
> Red Hat Enterprise Linux Server release 5.3 (Tikanga)
>
> Linux hostname 2.6.18-194.el5 #1 SMP Tue Mar 16 21:52:39 EDT 2010
> x86_64 x86_64 x86_64 GNU/Linux
>
> OpenSSH_4.3p2
> OpenSSL 0.9.8e-fips-rhel5

I've seen it a few times with Gerrit 2.2 on Windows with Windows 1.8 and older clients.

-- robin

Doug Kelly

Sep 11, 2013, 4:08:15 PM
to repo-d...@googlegroups.com
Sorry to resurrect this thread from the dead.... but...

On Thursday, March 21, 2013 3:23:05 PM UTC-5, Bassem Rabil wrote:
> Hi
>
> We are experiencing sporadic errors with ssh connections to gerrit server running 2.5.1. Normally there is no impact on gerrit users, but continuous integration engines like Jenkins constantly fail due to this error. The environment we use for our gerrit server is:
>
> Red Hat Enterprise Linux Server release 5.3 (Tikanga)
> Linux hostname 2.6.18-194.el5 #1 SMP Tue Mar 16 21:52:39 EDT 2010 x86_64 x86_64 x86_64 GNU/Linux
> OpenSSH_4.3p2
> OpenSSL 0.9.8e-fips-rhel5

We're in the process of migrating from Ubuntu to CentOS 6, and I never saw this on Gerrit 2.4 or 2.7 within Ubuntu, however this does seem to be fairly frequent within CentOS 6 running 2.7 (I've not tested 2.4 since we don't plan on ever running 2.4 in production with CentOS).

> We ran a script that loops ssh connection to port 29418 with one second delay for about 30 hours, we got around 200 errors.
>
> The error message:
> hash mismatch
> debug1: ssh_rsa_verify: signature incorrect
> key_verify failed for server_host_key

I should try doing this; the job that fails for me is our script to run garbage collection on repos, which runs it using ssh / gerrit gc under 2.7.  The failure rates are probably pretty comparable, but I don't have any numbers.

> You can see the full ssh output attached.
>
> Investigating this issue led us to a conclusion that this might be related to OpenSSH, OpenSSL, or OS Kernel versions. Anyone experienced such behavior before ? Any hints to resolve this ?

Did you ever find any resolution to this?  Here are my openssh/openssl versions:

Name        : openssh
Arch        : x86_64
Version     : 5.3p1
Release     : 84.1.el6

Name        : openssh-clients
Arch        : x86_64
Version     : 5.3p1
Release     : 84.1.el6

Name        : openssl
Arch        : x86_64
Version     : 1.0.0
Release     : 27.el6_4.2

I'm just as stumped as you, Bassem, so if you have any tips, I'd be more than glad to hear them.

Thanks!

Doug Kelly

Martin Fick

Sep 11, 2013, 4:17:40 PM
to repo-d...@googlegroups.com, Doug Kelly
Are you using control master?

On Wednesday, September 11, 2013 02:08:15 pm Doug Kelly
wrote:
--
The Qualcomm Innovation Center, Inc. is a member of Code
Aurora Forum, hosted by The Linux Foundation

Doug Kelly

Sep 11, 2013, 4:42:17 PM
to repo-d...@googlegroups.com, Doug Kelly


On Wednesday, September 11, 2013 3:17:40 PM UTC-5, MartinFick wrote:
> Are you using control master?

ControlMaster is not in use.  The command I used to replicate this is:
while true; do ssh -vvv user@localhost -p 29418 gerrit query status:submitted; if [ $? != 0 ]; then break; fi; done

I've been able to replicate this going from Ubuntu and CentOS to Gerrit 2.7 running on CentOS.  Almost makes me wonder if 2.7's sshd is having some issue... but that doesn't explain why it worked (seemingly) fine when I ran 2.7 + Ubuntu.  Looks like I'll set up some test machines running CentOS 6: one with Gerrit 2.4 and one with Gerrit 2.7.  Basically, my error looks exactly the same as Bassem's original error.
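
To put a number on the failure rate instead of stopping at the first error, the loop can be turned into a tally. The Python sketch below is only an illustration and was not run in this thread: `run_once` and `count_failures` are made-up names, and `-vvv` is dropped because the output is discarded.

```python
import subprocess
from typing import Callable, Sequence

# Same command as the shell loop above, minus -vvv (user and host are placeholders).
GERRIT_QUERY = ["ssh", "-p", "29418", "user@localhost",
                "gerrit", "query", "status:submitted"]


def run_once(command: Sequence[str]) -> bool:
    """Run one command and report success (exit status 0) as True."""
    result = subprocess.run(command,
                            stdout=subprocess.DEVNULL,
                            stderr=subprocess.DEVNULL)
    return result.returncode == 0


def count_failures(attempt: Callable[[], bool], attempts: int) -> int:
    """Call `attempt` a fixed number of times and count the failed calls."""
    return sum(1 for _ in range(attempts) if not attempt())
```

Calling `count_failures(lambda: run_once(GERRIT_QUERY), 1000)` would then give the number of failed connections out of 1000; the sample size is arbitrary.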

Doug Kelly

Sep 11, 2013, 5:24:33 PM
to repo-d...@googlegroups.com, Doug Kelly
I've just confirmed that I can reproduce the bug on a clean CentOS install, running both Gerrit 2.4.2 and Gerrit 2.7.  Also, since it doesn't seem to matter what SSH client is connecting, almost seems like something deep in the kernel.  Which is definitely less-than-good news.

Doug Kelly

Sep 11, 2013, 7:09:06 PM
to repo-d...@googlegroups.com, Doug Kelly
On Wednesday, September 11, 2013 4:24:33 PM UTC-5, Doug Kelly wrote:
> I've just confirmed that I can reproduce the bug on a clean CentOS install, running both Gerrit 2.4.2 and Gerrit 2.7.  Also, since it doesn't seem to matter what SSH client is connecting, almost seems like something deep in the kernel.  Which is definitely less-than-good news.

A bit more info.  This may be related to the version of OpenJDK that ships with CentOS/RHEL.  I went ahead and installed the Bouncy Castle library, and that appears to resolve the issue.

--Doug 

evlacan

Sep 11, 2013, 11:32:55 PM
to repo-d...@googlegroups.com, Doug Kelly



> A bit more info.  This may be related to the version of OpenJDK that ships with CentOS/RHEL.  I went ahead and installed the Bouncy Castle library, and that appears to resolve the issue.



Doug, could you run more tests, especially when the server is busy? IMHO Bouncy Castle has nothing to do with this issue.

My suspicion is that this comes from jsch library. In our experience the error happens when there are concurrent ssh connections, the busier the server gets the more often we get this error.
For us it started on master instance and we couldn't reproduce the error on any mirror up to a point when some of the mirrors started to be heavily used and this was the moment when the error popped up there as well.

Personally I think this issue is similar to the topic discussed in this thread a while ago:

You can try to use GIT_SSH environment variable as Swan suggests in that topic and this is also described in the Scaling Gerrit Installations Wiki:

--Vlad


Doug Kelly

Sep 11, 2013, 11:48:20 PM
to repo-d...@googlegroups.com, Doug Kelly


On Wednesday, September 11, 2013 10:32:55 PM UTC-5, evlacan wrote:
> Doug, could you run more tests, especially when the server is busy? IMHO Bouncy Castle has nothing to do with this issue.

On the contrary, Bouncy Castle is the only variable in my tests that I've changed.  Command is the same, servers are idle other than the command I run on the server, etc.

> My suspicion is that this comes from jsch library. In our experience the error happens when there are concurrent ssh connections, the busier the server gets the more often we get this error.
> For us it started on master instance and we couldn't reproduce the error on any mirror up to a point when some of the mirrors started to be heavily used and this was the moment when the error popped up there as well.

As I understand it, JSch is only used for *outgoing* SSH connections.  Apache MINA SSHD (and Bouncy Castle, if available) are used to process incoming SSH connections.  This is a problem where incoming SSH connections are rejected because of a host key error on the client side (i.e. server sends its half of the D-H key exchange, client rejects it because of a bad checksum).

> Personally I think this issue is similar to the topic discussed in this thread a while ago:
>
> You can try to use GIT_SSH environment variable as Swan suggests in that topic and this is also described in the Scaling Gerrit Installations Wiki:

Even if this were the problem, it's a client-side issue being caused by a flaky server response (and as stated before, this part of the code isn't being exercised in my case).

Basically, I'm more inclined to believe at this point that something in the JRE is amiss (causing a one-in-a-few-hundred chance of a calculation error), specifically in the crypto libraries that get used by Apache MINA SSHD.  A threading issue isn't as likely, just because all my connections are set up and torn down serially--and the same issue manifests on two different versions of Gerrit, but only in one server environment.  Now, I'm not sure what the fix for this is.  Certainly, installing Bouncy Castle masks the problem, so maybe that's good enough (though it does leave the question of what the pros and cons of having Bouncy Castle available are).  But perhaps the other thing to test here is the specific JRE in use, i.e. whether it's Oracle's JRE or OpenJDK.

--Doug

Doug Kelly

Sep 12, 2013, 10:16:53 AM
to repo-d...@googlegroups.com, Doug Kelly
OK, I just confirmed both boxes are running the Oracle JRE -- the Ubuntu box is running "Java(TM) SE Runtime Environment (build 1.6.0_27-b07)" and the CentOS box is running "Java(TM) SE Runtime Environment (build 1.7.0_25-b15)".  Additionally, I also reproduced this in a VM with a clean CentOS install running the latest version of the OpenJDK that CentOS has packaged (1.7.0_25 -- with a JRE string of "OpenJDK Runtime Environment (rhel-2.3.10.4.el6_4-x86_64)").  So, back to the drawing board with "what in the world did RedHat do?" ;)  Personally, I wouldn't be surprised to see something funny like SELinux playing into this, but anything's possible.

--Doug

Doug Kelly

Sep 12, 2013, 3:30:02 PM
to repo-d...@googlegroups.com, Doug Kelly
On Thursday, September 12, 2013 9:16:53 AM UTC-5, Doug Kelly wrote:
> OK, I just confirmed both boxes are running the Oracle JRE -- the Ubuntu box is running "Java(TM) SE Runtime Environment (build 1.6.0_27-b07)" and the CentOS box is running "Java(TM) SE Runtime Environment (build 1.7.0_25-b15)".  Additionally, I also reproduced this in a VM with a clean CentOS install running the latest version of the OpenJDK that CentOS has packaged (1.7.0_25 -- with a JRE string of "OpenJDK Runtime Environment (rhel-2.3.10.4.el6_4-x86_64)").  So, back to the drawing board with "what in the world did RedHat do?" ;)  Personally, I wouldn't be surprised to see something funny like SELinux playing into this, but anything's possible.

Last follow-up for a while, I think.  Disabled SELinux and re-ran, issue still reproduced on Java 1.7.0_25 (both OpenJDK and the Oracle JDK).  Went and copied the JRE 1.6.0_27 from the Ubuntu box and re-ran on that JRE, and so far, I've not reproduced the issue.  So, maybe it's a bug in the JRE, but since Bouncy Castle doesn't use those routines, all is well, assuming you use Bouncy Castle.  I don't think relying on Java 6 is a good idea (since it's EOL), but at least this narrows things down pretty well.

--Doug

Paul Lutus

Nov 1, 2013, 8:58:17 PM
to repo-d...@googlegroups.com


On Thursday, March 21, 2013 1:23:05 PM UTC-7, Bassem Rabil wrote:
> Hi
>
> We are experiencing sporadic errors with ssh connections to gerrit server running 2.5.1. Normally there is no impact on gerrit users, but continuous integration engines like Jenkins constantly fail due to this error. The environment we use for our gerrit server is:
>
> / ... snip
>
> Investigating this issue led us to a conclusion that this might be related to OpenSSH, OpenSSL, or OS Kernel versions. Anyone experienced such behavior before ? Any hints to resolve this ?

I've recently been rewriting one of my Android applications (SSHelper), replacing Dropbear with OpenSSH, and I am now getting reports from users seeing the "hash mismatch" error message. After some effort to track down the cause I realized it wasn't my app, but most likely a bug in the OpenSSH library.

An online search shows this bug appearing sporadically over the last decade in different circumstances. In one case replacing a router seemed to have solved it. But it appears often enough that IMHO it should be fully investigated.

In the OpenSSH software library and the SSL library on which it depends, the "hash mismatch" error message appears precisely once -- in line 260 of ssh-rsa.c in the OpenSSH source. This might be more misleading than helpful, since the cause of the error could be far removed from that specific location.

I'm using OpenSSH-6.3p1 and SSL 1.0.1e, both the current stable releases at the time of writing, in an app that runs on Android. On that basis I concur with some of the other correspondents in this thread -- I don't think Java plays any part in the bug. Because my app runs on Android, and because I use the stock OpenSSH source compiled for an Arm processor, I think this argues for the bug being located somewhere in the SSH / SSL libraries.

To summarize, this report involves the same OpenSSH and SSL source code others are using, but everything else is different -- different processor, different Java, different operating system. I think this supports the idea that those aren't factors, and the OpenSSH library is the prime suspect.

BTW, based on what I can see so far, one obvious remedy is to avoid RSA keys.

 -- Paul Lutus, www.arachnoid.com

Bassem Rabil

Nov 4, 2013, 7:34:41 AM
to repo-d...@googlegroups.com
For our issue, it has been resolved by adding the bouncy castle library thanks to Doug's suggestion. We tested this with both Jetty and Tomcat.

Doug Kelly

Nov 4, 2013, 4:26:32 PM
to repo-d...@googlegroups.com
On Monday, November 4, 2013 6:34:41 AM UTC-6, Bassem Rabil wrote:
> For our issue, it has been resolved by adding the bouncy castle library thanks to Doug's suggestion. We tested this with both Jetty and Tomcat.

While looking into the JSch side of this problem, I found notes from the JSch developer in the issue tracker saying that something changed in JCE.  Specifically, he added this little note in KeyExchange.java:

+  /*
+   * It seems JCE included in Oracle's Java7u6(and later) has suddenly changed
+   * its behavior.  The secrete generated by KeyAgreement#generateSecret()
+   * may start with 0, even if it is a positive value.
+   */

So, I guess this would have to be something for the Apache Mina developers to take up accordingly... but for now, Bouncy Castle is humming along fine.
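
Why a leading zero would matter can be sketched outside Java. In the SSH key exchange (RFC 4253, section 8) the shared secret K goes into the exchange hash as an `mpint`, and RFC 4251 allows only one encoding for it, with no unnecessary leading zero bytes. The Python sketch below is an illustration, not code from MINA SSHD or JSch: the function names and sample bytes are made up, and it assumes the server length-prefixes the raw bytes from the key agreement without stripping leading zeros, which is one reading of the note above and not something confirmed in this thread.

```python
import hashlib


def mpint(value: int) -> bytes:
    """Encode a non-negative integer as an SSH mpint (RFC 4251, section 5)."""
    if value == 0:
        return (0).to_bytes(4, "big")
    body = value.to_bytes((value.bit_length() + 7) // 8, "big")
    # A set high bit would read as a negative number, so prepend one zero byte.
    if body[0] & 0x80:
        body = b"\x00" + body
    return len(body).to_bytes(4, "big") + body


def unnormalized_mpint(raw: bytes) -> bytes:
    """Hypothetical encoder: handles the sign bit but keeps leading zero bytes."""
    if raw and raw[0] & 0x80:
        raw = b"\x00" + raw
    return len(raw).to_bytes(4, "big") + raw


def exchange_hash(encoded_secret: bytes, rest: bytes = b"other-kex-fields") -> str:
    """Stand-in for the exchange hash H; only the secret's encoding varies."""
    return hashlib.sha1(rest + encoded_secret).hexdigest()


# Made-up 4-byte "shared secret" whose top byte happens to be zero.
padded = bytes([0x00, 0x12, 0x34, 0x56])
secret = int.from_bytes(padded, "big")

client_hash = exchange_hash(mpint(secret))
server_hash = exchange_hash(unnormalized_mpint(padded))
print(client_hash == server_hash)  # prints False
```

The server signs its own hash, so the client's signature check fails whenever the two hashes differ. A roughly uniform secret has a zero top byte about once in 256 exchanges, which is at least the same order of magnitude as the one-in-a-few-hundred failures reported earlier in the thread.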

--Doug

溫啟清

May 29, 2014, 11:35:46 AM
to repo-d...@googlegroups.com
Hi Bassem,

I'm also hitting this error (hash mismatch) on Gerrit 2.8.

Can you describe how you went about "adding the bouncy castle library"?

Did you download http://www.bouncycastle.org/download/bcpkix-jdk15on-150.jar, put bcpkix-jdk15on-150.jar in $GERRIT_HOME/plugins/, and then add and enable it as a Gerrit plugin?

Bassem Rabil

May 30, 2014, 2:27:46 PM
to 溫啟清, repo-d...@googlegroups.com
The bouncycastle library is placed at review_site/lib/bcprov-jdk16-144.jar and you will need to restart the Gerrit instance for this library to take effect. We are using the jdk 1.6 version currently as my example shows.
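
A quick way to confirm the jar is where Gerrit will look for it is to list the provider jars under the site's `lib` directory. This Python sketch is a hypothetical helper, not part of Gerrit; the function name is made up, and the `bcprov-*.jar` pattern is an assumption based on the file name above.

```python
from pathlib import Path
from typing import List


def find_bouncy_castle_jars(site_path: str) -> List[str]:
    """Return the names of bcprov-*.jar files found in <site_path>/lib."""
    lib_dir = Path(site_path) / "lib"
    if not lib_dir.is_dir():
        return []
    return sorted(p.name for p in lib_dir.glob("bcprov-*.jar"))
```

For the setup described above, `find_bouncy_castle_jars("review_site")` should include `bcprov-jdk16-144.jar`; an empty list means the jar is missing or sits in another directory such as `plugins/`.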


Thanks and Regards
Bassem Guendy



--
--
To unsubscribe, email repo-discuss...@googlegroups.com
More info at http://groups.google.com/group/repo-discuss?hl=en

---
You received this message because you are subscribed to a topic in the Google Groups "Repo and Gerrit Discussion" group.
To unsubscribe from this topic, visit https://groups.google.com/d/topic/repo-discuss/JE7OM6o7DMs/unsubscribe.
To unsubscribe from this group and all its topics, send an email to repo-discuss...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

溫啟清

Jun 1, 2014, 2:07:48 AM
to repo-d...@googlegroups.com, wench...@gmail.com
Thank you, Bassem. I put the library in review_site/lib and restarted the Gerrit service, and now it works fine.

Charles O'Farrell

Jun 26, 2014, 9:22:21 AM
to repo-d...@googlegroups.com
FWIW we hit this problem in Stash recently, and this thread came up. Turns out the latest version of Mina (0.11) hadn't made the appropriate change yet.

If anyone is interested the relevant issue is here (fixed in 0.12):



On Tuesday, November 5, 2013 8:26:32 AM UTC+11, Doug Kelly wrote:

Saša Živkov

Jun 26, 2014, 11:28:28 AM
to repo-d...@googlegroups.com
On Thu, Jun 26, 2014 at 3:22 PM, Charles O'Farrell <char...@gmail.com> wrote:
> FWIW we hit this problem in Stash recently, and this thread came up. Turns out the latest version of Mina (0.11) hadn't made the appropriate change yet.
>
> If anyone is interested the relevant issue is here (fixed in 0.12):

Thanks for the info!
We also hit this problem several times in the last couple of weeks... Looks like it started after we switched to Java 7.
 



Matthias Sohn

Jun 27, 2014, 5:32:09 AM
to Saša Živkov, repo-d...@googlegroups.com
On Thu, Jun 26, 2014 at 5:28 PM, Saša Živkov <ziv...@gmail.com> wrote:
> On Thu, Jun 26, 2014 at 3:22 PM, Charles O'Farrell <char...@gmail.com> wrote:
>> FWIW we hit this problem in Stash recently, and this thread came up. Turns out the latest version of Mina (0.11) hadn't made the appropriate change yet.
>>
>> If anyone is interested the relevant issue is here (fixed in 0.12):
>
> Thanks for the info!
> We also hit this problem several times in last couple of weeks... Looks like it started after we switched to Java 7.

Could we cherry-pick this fix for Mina to get out of this mess?

--
Matthias

David Ostrovsky

Jun 28, 2014, 5:18:55 AM
to repo-d...@googlegroups.com, ziv...@gmail.com, Dave Borowitz, David Pursehouse (Sony Mobile)
We should certainly do that and include the custom SSHD version with the fix in the upcoming 2.8.6 release [1].

In the past Shawn did that once. So the way to go would be:

1. Create our custom version of Apache SSHD with this change applied [2]
2. Upload custom SSHD artifact to Google storage bucket
3. Adjust our build to consume SSHD from custom location

I can take care of 3. once 1. and 2. are done. Dave recently did exactly this for the guice-servlet-extensions bug.
Adding him and David P. to this thread.


Charles O'Farrell

Jun 29, 2014, 7:52:49 AM
to David Ostrovsky, repo-d...@googlegroups.com

