Gerrit and Git/SSH (aka Apache Mina SSHD)


lucamilanesio

Oct 5, 2017, 3:36:51 AM
to Repo and Gerrit Discussion
Hi, all,
I wanted to trigger a positive and constructive discussion about Gerrit support for Git/SSH based on Apache Mina SSHD.

If you search in the mailing list with the keywords "SSH" and "Mina" you'll find a long list. I clearly remember that every time we upgraded to a recent version, the old issues were mostly gone but new and more worrying issues were arising. We had at times to upgrade and then downgrade the version of Apache Mina SSHD because of that.

This is not good, and there is not much we can do about it, because Apache Mina SSHD is not under our control.

There are three options:
  1. Do nothing and just document the current status. We need to be far more transparent in our documentation and release notes, incorporating all the known issues and limitations of the Apache Mina SSHD stack.
  2. Fork Apache Mina SSHD and start investing into it to make at least the Gerrit use-cases rock-solid.
  3. Abandon the Apache Mina SSHD route and just use OpenSSH.

I know for sure that some large deployments (e.g. Qualcomm) are running Git/SSH on Apache Mina SSHD without problems. It would be interesting to get feedback on how and why they are not seeing the problems that the rest of the community is experiencing.

Feedback is more than welcome, as usual :-)

thomasmu...@yahoo.com

Oct 5, 2017, 3:41:38 AM
to Repo and Gerrit Discussion
Also, the WMF isn't experiencing any problems with ssh. I've seen no reports on our bug tracker https://phabricator.wikimedia.org/project/view/330/ about problems with ssh. It seems really stable for us, though we are still waiting to upgrade to gain support for ECDSA.

Luca Milanesio

Oct 5, 2017, 3:45:19 AM
to thomasmu...@yahoo.com, Repo and Gerrit Discussion
Hi Paladox,
For small to medium setups, it works fine.

Is your deployment a large one?
- Number of repos
- Size of repos
- Avg number of refs per repo
- Active users
- Number of pulls/day
- Number of pushes/day

Luca.

> On 5 Oct 2017, at 08:41, thomasmulhall410 via Repo and Gerrit Discussion <repo-d...@googlegroups.com> wrote:
>
> Also, the WMF isn't experiencing any problems with ssh. I've seen no reports on our bug tracker https://phabricator.wikimedia.org/project/view/330/ about problems with ssh. It seems really stable for us, though we are still waiting to upgrade to gain support for ECDSA.
>
> --
> --
> To unsubscribe, email repo-discuss...@googlegroups.com
> More info at http://groups.google.com/group/repo-discuss?hl=en
>
> ---
> You received this message because you are subscribed to the Google Groups "Repo and Gerrit Discussion" group.
> To unsubscribe from this group and stop receiving emails from it, send an email to repo-discuss...@googlegroups.com.
> For more options, visit https://groups.google.com/d/optout.

lucamilanesio

Oct 5, 2017, 3:50:27 AM
to Repo and Gerrit Discussion


On Thursday, October 5, 2017 at 8:36:51 AM UTC+1, lucamilanesio wrote:
There are three options:

Four
 
  1. Do nothing and just document the current status. We need to be far more transparent in our documentation and release notes, incorporating all the known issues and limitations of the Apache Mina SSHD stack.
  2. Fork Apache Mina SSHD and start investing into it to make at least the Gerrit use-cases rock-solid.
  3. Abandon the Apache Mina SSHD route and just use OpenSSH.

4. Make the SSHD backend pluggable as a libModule and allow commercial implementations as plugins (e.g. https://www.sshtools.com).

thomasmu...@yahoo.com

Oct 5, 2017, 3:59:23 AM
to Repo and Gerrit Discussion
Yep, it's a large one. I don't currently have an active user list.

This dashboard, though, gives you an idea of how many there are: https://wikimedia.biterg.io/app/kibana#/dashboard/Gerrit

Our biggest repos are mediawiki/core and operations/puppet.

As for the repos, I believe there are over 1,000.

https://github.com/wikimedia shows 2,000+ repos, though some of those are Differential-only repos that are replicating. The rest should be Gerrit.

Luca Milanesio

Oct 5, 2017, 4:13:19 AM
to Paladox, Repo and Gerrit Discussion
You guys have 1796 repos, and the largest is /mediawiki/extensions, with 49K commits and 4K refs.
This is more of a medium-sized installation by Gerrit's standards :-)

Our clients have tens of thousands of repos, some of them with hundreds of thousands of refs.
In these situations, we have seen Gerrit's use of Git/SSH struggling :-(

When we suggested moving to Git/HTTP, all the problems just faded away like snow in the sun.

Luca.

thomasmu...@yahoo.com

Oct 5, 2017, 7:52:03 AM
to Repo and Gerrit Discussion
We have had issues with git over http, whereby a large repo like mediawiki/core failed to clone.

See https://phabricator.wikimedia.org/T152801

About the ssh issue, it depends on how much the server can handle at once.

Luca Milanesio

Oct 5, 2017, 8:35:39 AM
to thomasmu...@yahoo.com, Repo and Gerrit Discussion

On 5 Oct 2017, at 12:52, thomasmulhall410 via Repo and Gerrit Discussion <repo-d...@googlegroups.com> wrote:

We have had issues with git over http where by a large repo like mediawiki/core failed to clone.

See https://phabricator.wikimedia.org/T152801

Looking at the discussion thread I haven't found anything related to problems in the HTTP stack, but rather missing objects, post buffer sizes etc.
The number of issues in Gerrit Git/HTTP related to Jetty (or Apache Tomcat) is next to zero!


About the ssh issue, it depends how much the server can handle at once.

Are you sure? Have you tried the search I was suggesting?

Saša Živkov

Oct 5, 2017, 10:11:52 AM
to lucamilanesio, Repo and Gerrit Discussion
On Thu, Oct 5, 2017 at 9:36 AM, lucamilanesio <luca.mi...@gmail.com> wrote:
I know for sure that some large deployments (e.g. Qualcomm) are running with Git/SSH on Apache Mina SSHD without problems,

We also run Apache Mina SSHD without problems, and we have close to 20K repositories, over 20K registered users,
some large repos with over 400K refs, etc.
I don't really know what we did right, but if you are interested in specific details of our gerrit.config, let me know.
 

Saša Živkov

Oct 5, 2017, 10:13:08 AM
to lucamilanesio, Repo and Gerrit Discussion
On Thu, Oct 5, 2017 at 9:36 AM, lucamilanesio <luca.mi...@gmail.com> wrote:
If you search in the mailing list with the keywords "SSH" and "Mina" you'll find a long list. I clearly remember that every time we upgraded to a recent version, the old issues were mostly gone but new and more worrying issues were arising. We had at times to upgrade and then downgrade the version of Apache Mina SSHD because of that.

Maybe we need more acceptance tests?
 


Martin Fick

Oct 5, 2017, 10:28:51 AM
to repo-d...@googlegroups.com, lucamilanesio
On Thursday, October 05, 2017 12:36:51 AM lucamilanesio
wrote:
> Hi, all,
> I wanted to trigger a *positive and constructive
> discussion about Gerrit support for Git/SSH* based on
> Apache Mina SSHD.
>
> If you search in the mailing list with the keywords "SSH"
> and "Mina" you'll find a long list. I clearly remember
> that every time we upgraded to a recent version, the old
> issues were mostly gone but new and more worrying issues
> were arising. We had at times to upgrade and then
> downgrade the version of Apache Mina SSHD because of
> that.

I am not familiar with the list of issues, and I don't
really have the energy to search a mailing list to try and
figure out what issues exist, which ones are real reports,
and which ones are current. I don't believe that any
discussion would be very productive without a list of actual
current issues. Perhaps you have such a list?

> This is not good and there isn't so much we can do about
> it because Apache Mina SSHD is not under our control or
> will.

I am guessing from the tone of this that you don't consider
"fix issues upstream" to be a valid option. Can you explain
this?

> I know for sure that some large deployments (e.g.
> Qualcomm) are running with Git/SSH on Apache Mina SSHD
> without problems, it would be interesting to get feedback
> on how and why they are not seeing the problems that the
> rest of the community is experiencing.

I would need specifics to help answer such a question. What
specific problems (maybe we do see them and are just used to
them, or have worked around them, or our users don't cause
them)?

Thanks,

-Martin


--
The Qualcomm Innovation Center, Inc. is a member of Code
Aurora Forum, hosted by The Linux Foundation

thomasmu...@yahoo.com

Oct 5, 2017, 11:47:20 AM
to Repo and Gerrit Discussion
I haven't tried the search yet, though it can get busy at times.

Let's also add a 5th option: using either Gerrit's built-in sshd or the server's own.

lucamilanesio

Oct 5, 2017, 12:04:54 PM
to Repo and Gerrit Discussion


On Thursday, October 5, 2017 at 3:11:52 PM UTC+1, zivkov wrote:


We also run Apache Mina SSHD without problems, and we have close to 20K repositories, over 20K registered users,
some large repos with over 400K refs, etc.
I don't really know what we did right, but if you are interested in specific details of our gerrit.config, let me know.

What Gerrit version are you running?
Have you changed / customised the Apache Mina SSHD version you use?
 
 

lucamilanesio

Oct 5, 2017, 12:15:48 PM
to Repo and Gerrit Discussion


On Thursday, October 5, 2017 at 3:28:51 PM UTC+1, MartinFick wrote:
On Thursday, October 05, 2017 12:36:51 AM lucamilanesio
wrote:
> Hi, all,
> I wanted to trigger a *positive and constructive
> discussion about Gerrit support for Git/SSH* based on
> Apache Mina SSHD.
>
> If you search in the mailing list with the keywords "SSH"
> and "Mina" you'll find a long list. I clearly remember
> that every time we upgraded to a recent version, the old
> issues were mostly gone but new and more worrying issues
> were arising. We had at times to upgrade and then
> downgrade the version of Apache Mina SSHD because of
> that.

I am not familiar with the list of issues, and I don't
really have the energy to search a mailing list to try and
figure out what issues exist, which ones are real reports,
and which ones are current.  I don't believe that any
discussion would be very productive without a list of actual
current issues.  Perhaps you have such a list?

True, let me extract them from our Gerrit logs:


Not a short list, is it?

Of course, they apply to different releases of SSHD and different combinations of the settings / configured backend.
It can be that in your Gerrit version with your use-case and configuration, none of them are applicable.

We have clients with a number of different versions and combinations of configuration and volumes, and in one way or another, they are impacted by one or more of those issues.
 

> This is not good and there isn't so much we can do about
> it because Apache Mina SSHD is not under our control or
> will.

I am guessing from the tone of this that you feel that "fix
issues upstream" is an option you don't think is valid, can
you explain this?

It is an option, and we (or mostly Hugo, David, Doug and other contributors) have done it so far. However, as a matter of fact, there are *LOTS* of issues, which is typically a symptom of a lack of robustness in the tests that the Apache Mina SSHD project performs before cutting a release.

I have used Apache Mina SSHD *a lot* in the past and, at the end of the day, we decided to go for a more robust and stable implementation.
That was a long time ago, and we assumed that the project just wasn't mature enough.

However, as the above list is telling us, that is still the case: we can, of course, keep on fixing issues upstream and just hope that the stack becomes much more stable moving forward.
 

> I know for sure that some large deployments (e.g.
> Qualcomm) are running with Git/SSH on Apache Mina SSHD
> without problems, it would be interesting to get feedback
> on how and why they are not seeing the problems that the
> rest of the community is experiencing.

I would need specifics to help answer such a question.  What
specific problems (maybe we do see them and are just used to
them, or have worked around them, or our users don't cause
them)?

If you search for the word 'killed' in your sshd_log, how many entries per day do you see?
How many SSHD-related exceptions do you see in your error_log daily?
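
For anyone who wants to answer these two questions with numbers, here is a minimal sketch in Python. It assumes that each log line starts with a bracketed date such as `[2017-10-05 ...`, which may not match your log layout; the function name `count_per_day` is purely illustrative and not part of Gerrit.

```python
import re
from collections import Counter

# Assumption: each log line starts with "[YYYY-MM-DD ...".
# Adjust the pattern if your sshd_log / error_log layout differs.
DATE_RE = re.compile(r"^\[(\d{4}-\d{2}-\d{2})")


def count_per_day(lines, needle):
    """Count the lines containing `needle`, grouped by the leading date."""
    counts = Counter()
    for line in lines:
        if needle not in line:
            continue
        match = DATE_RE.match(line)
        # Lines without a recognisable date are grouped under "unknown".
        day = match.group(1) if match else "unknown"
        counts[day] += 1
    return dict(counts)
```

Passing an open sshd_log file and the string 'killed' gives the per-day figure asked about above; the same function can be pointed at error_log with a substring such as "Connection reset by peer".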

Alon Bar-Lev

Oct 5, 2017, 12:26:06 PM
to lucamilanesio, Repo and Gerrit Discussion
All were fixed, so it is unfair.
This shows an active and receptive upstream.
Maybe not without a lack of resources, but still.
I used apache mina sshd before, and once it detached from mina there
was a period of instability, which was resolved after a short while with
some assistance from the community. Then a switch in maintainership
caused a bit of instability and longer response times; this was also
resolved.

Luca Milanesio

Oct 5, 2017, 12:44:07 PM
to Alon Bar-Lev, Repo and Gerrit Discussion
Hi Alon, I have just extracted the list of issues we have found during the different releases of Gerrit Code Review.
If you run the same query for searching for Jetty-related issues in Gerrit your list would be much smaller. Those are facts, not opinions or judgements on people or communities.

> This shows an active and receptive upstream.
> Maybe not without a lack of resources, but still.
> I used apache mina sshd before, and once it detached from mina there
> was a period of instability, which was resolved after a short while with
> some assistance from the community. Then a switch in maintainership
> caused a bit of instability and longer response times; this was also
> resolved.

I am only showing data extracted from our logs, without judging the people involved in the project.
My comments are purely on software, not on people's commitment or will to contribute.

We are of course grateful to the Apache Mina community; my point was more on the stability across different versions of SSHD (included in Gerrit) and whether it makes sense to make that layer pluggable.
That would allow us to:

a) Plug different implementations or versions of the same
b) Be more transparent in our release notes about the known issues in the Apache Mina SSHD stack that is included.

Example: I have version X.Y.Z of Gerrit. Which items in the above list apply to me? We don't have that list and our clients are suffering from a mix / combination of those.

I could start writing that list and then contribute back to the Gerrit documentation set :-)

Luca.

Martin Fick

Oct 5, 2017, 1:01:34 PM
to repo-d...@googlegroups.com, lucamilanesio
On Thursday, October 05, 2017 09:15:48 AM lucamilanesio
wrote:
Which ones are current, and concern gerrit master (or the
latest stable that we have no reason to believe would be
fixed on master)?


> We have clients with a number of different versions and
> combinations of configuration and volumes, and in one way
> or another, they are impacted by one or more of those
> issues.

The only way to fix that is to upgrade Gerrit. Any decisions
we make about the future of Gerrit will not affect your
customers running older versions.


> > > This is not good and there isn't so much we can do
> > > about it because Apache Mina SSHD is not under our
> > > control or will.
> >
> > I am guessing from the tone of this that you feel that
> > "fix issues upstream" is an option you don't think is
> > valid, can you explain this?

I didn't see an answer to this, except to say that we are
actively doing this, so I am confused as to why "forking
Mina" was listed as a course of action, but not "continuing
to contribute upstream"?


> I have used *a lot* Apache Mina SSHD in the past, and at
> the end of the day, we decided to go for a more robust
> and stable implementation. That was a long ago, and we
> assumed that the project wasn't just mature enough.

What implementation did you consider to be a "more robust
and stable implementation"?



> If you search for the word 'killed' in your sshd_log, how
> many entries per day you see?

We had 26 yesterday on one master. I am under the
impression that these are users killing their sessions
because they ran too long?

> How many SSHD-related exceptions do you see in your
> error_log daily?

I don't know that I have a good way to identify those, aside
from the obvious "Connection reset by peer"s (also users
disconnecting early) which we had 2640 of yesterday on the
same master?

We generally are not having complaints from users about ssh
(aside from new users not being able to log in) that I am
aware of; it has seemed very robust for many years. Perhaps 2.7
and below works well? :)

Martin Fick

Oct 5, 2017, 1:04:38 PM
to repo-d...@googlegroups.com, Luca Milanesio, Alon Bar-Lev
On Thursday, October 05, 2017 05:44:03 PM Luca Milanesio
wrote:
> We are of course grateful to the Apache Mina community, my
> point was more on the stability across different versions
> of SSHD (included in Gerrit) and if it makes sense to
> make that layer pluggable. That would allow to:

This would be neat. I have no idea how feasible that would
be.

> a) Plug different implementations or versions of the same
> b) Be more transparent in our release notes about the
> known issues in the Apache Mina SSHD stack that is
> included.
>
> Example: I have version X.Y.Z of Gerrit. Which one of the
> above list apply to me or not? We don't have that list
> and our clients are suffering from a mix / combination of
> those.
>
> I could start writing that list and then contribute back
> to the Gerrit documentation set :-)

Yes, that would be great if you have that info. It might
even help us for our eventual upgrade! :)

Alon Bar-Lev

Oct 5, 2017, 1:07:45 PM
to Luca Milanesio, Repo and Gerrit Discussion
On 5 October 2017 at 19:44, Luca Milanesio <luca.mi...@gmail.com> wrote:
<snip>

> Hi Alon, I have just extracted the list of issues we have found during the different releases of Gerrit Code Review.
> If you run the same query for searching for Jetty-related issues in Gerrit your list would be much smaller. Those are facts, not opinions or judgements on people or communities.

With all due respect to Jetty, I believe that the ssh protocol is, by
design and use, much more complex than HTTP. I would have expected more
issues in SSH over the years, as SSH is actually progressing and
growing.

luca.mi...@gmail.com

Oct 5, 2017, 3:35:40 PM
to Alon Bar-Lev, Repo and Gerrit Discussion


This is actually a good point, thanks for sharing it.

Luca

luca.mi...@gmail.com

Oct 6, 2017, 2:17:09 AM
to Martin Fick, repo-d...@googlegroups.com
This is a very good point: if we had a pluggable SSHD module, you could just swap the Apache Mina version and move to a more stable one without having to migrate to Gerrit master (or another version).

Similarly, if you are currently running a very stable combination of Gerrit + SSHD (like Sasha or Martin), then you may want to upgrade Gerrit without touching the SSH layer at all.

At the end of the day, Git/SSH is natively a loose integration where the SSH daemon is standalone and just runs git commands. Only in Gerrit have we embedded it in the JVM.

Saša Živkov

Oct 6, 2017, 8:33:34 AM
to lucamilanesio, Repo and Gerrit Discussion
On Thu, Oct 5, 2017 at 6:04 PM, lucamilanesio <luca.mi...@gmail.com> wrote:


What Gerrit version are you running?

2.12.x with some small customizations specific to SAP, mostly about authentication.

Have you changed / customised the Apache Mina SSHD version you use?
 
No. We use exactly the version specified in lib/mina/BUCK:
  id = 'org.apache.sshd:sshd-core:0.14.0',
  id = 'org.apache.mina:mina-core:2.0.8',

We also use sshd.backend=NIO2.
 
 
 

luca.mi...@gmail.com

Oct 6, 2017, 9:03:14 AM
to Saša Živkov, Repo and Gerrit Discussion



On 6 Oct 2017, at 13:32, Saša Živkov <ziv...@gmail.com> wrote:



What Gerrit version are you running?

2.12.x with some small customizations specific to SAP, mostly about authentication.

Oh, that’s it. I believe the problems started from 2.13 onwards.


Have you changed / customised the Apache Mina SSHD version you use?
 
No. We use exactly the version specified in lib/mina/BUCK:
  id = 'org.apache.sshd:sshd-core:0.14.0',
  id = 'org.apache.mina:mina-core:2.0.8',

Yep, I believe some instability on SSHD started a bit later. Let me find the reference in DavidO's change.

As we said they are possibly fixed on master, but until you upgrade to 2.15 (or 3.0?) you won’t see the benefits :-(


We also use sshd.backend=NIO2.

Interesting, the NIO2 was experimental in that version if I remember correctly. Do you remember why you changed to NIO2 which wasn’t the default?

Doug Luedtke

Oct 6, 2017, 12:46:39 PM
to Repo and Gerrit Discussion

Interesting, the NIO2 was experimental in that version if I remember correctly. Do you remember why you changed to NIO2 which wasn’t the default?

I believe this was the reason for users to change to NIO2.

luca.mi...@gmail.com

Oct 6, 2017, 1:19:34 PM
to Doug Luedtke, ziv...@gmail.com, Repo and Gerrit Discussion


Good catch !

@Saša do you confirm?

Saša Živkov

Oct 9, 2017, 8:12:36 AM
to Luca Milanesio, Repo and Gerrit Discussion
Yes, I do, because we version and review all changes to gerrit.config :-)
Here is the commit message:
Replace MINA by NIO2

Use NIO2 backend instead of MINA. On <a-production-gerrit-server> NIO2 backend proved
to solve ever growing number of stale ssh connections which contain
no user name and "?" instead of IP address.

Hugo Arès

Oct 10, 2017, 10:50:59 AM
to Repo and Gerrit Discussion


On Thursday, October 5, 2017 at 3:36:51 AM UTC-4, lucamilanesio wrote:
I know for sure that some large deployments (e.g. Qualcomm) are running with Git/SSH on Apache Mina SSHD without problems, it would be interesting to get feedback on how and why they are not seeing the problems that the rest of the community is experiencing.

We (Ericsson) are running Gerrit 2.12 with the default sshd-core (0.14.0) and mina (2.0.8) versions
included in 2.12. We use the NIO2 backend. We do have a lot of git traffic over ssh (an average of 12
million transactions per day) and we are not experiencing any major issues.

We have been testing 2.14 for a while now and we did not see any obvious regressions in sshd/mina but
the real test will be next Saturday (October 14th) when 2.14 will be rolled out on our production servers.

I will update this thread after the 2.14 upgrade; hopefully I will have nothing to report.

 

luca.mi...@gmail.com

Oct 10, 2017, 12:28:48 PM
to Hugo Arès, Repo and Gerrit Discussion


I believe there is an issue in the version we integrated starting from Gerrit 2.13, and that’s possibly the reason why you guys haven’t noticed it.

Let me write a bug report later today.

Not sure if it is only related to the Mina backend, let me check.


lucamilanesio

Oct 13, 2017, 11:06:33 AM
to Repo and Gerrit Discussion
Apologies for the delay, see the bug report at:

All Gerrit versions from 2.13.x onwards are impacted because of an issue with the SSHD configuration options.
Feel free to comment on the ticket.

Luca.


Hugo Arès

Oct 19, 2017, 2:50:30 PM
to Repo and Gerrit Discussion

We did upgrade to 2.14 and we had a few issues with ssh:

- Cloning of large repos did not work, as reported in [1]; fixed in 2.14.5.
- Ssh threads deadlock with the NIO2 backend; we worked around it by switching back to MINA, see [2].
- Logs are flooded with "BufferException: Underflow", see [3].

[1] https://bugs.chromium.org/p/gerrit/issues/detail?id=7425
[2] https://bugs.chromium.org/p/gerrit/issues/detail?id=7486
[3] https://bugs.chromium.org/p/gerrit/issues/detail?id=4947
 

lucamilanesio

Oct 19, 2017, 4:40:46 PM
to Repo and Gerrit Discussion
What surprises me is that we moved from MINA to NIO2 because of deadlock issues ... which now show up on NIO2, so we switch back to MINA :-(
What is the *most stable* backend we are supposed to refer to?

We also changed the default to NIO2, which now turns out to be the wrong, unstable default backend in 2.14 :-(

Richard Christie

Oct 28, 2017, 7:15:46 AM
to Repo and Gerrit Discussion
Just weighing in on this thread (late to the party) after Luca pointed me to it from the one I raised.

We've definitely seen problems with ssh from time to time while using Gerrit. We've kept mostly up to date with new releases over the years, since the Gerrit 2.6 days. Now we're pinned at 2.13.9 due to the dropping of digest auth in 2.14: we just have too many groups with HTTP passwords stored in Jenkins setups. So that, along with the renaming of the pre-flight hook, is going to cause a lot of pain for us when we do move forward. We would like to, though. PolyGerrit looks nice and, regardless, being stuck on an old version is bad karma.

We use the default ssh settings in each release, other than a thread cap of 30 and an idle timeout of 5 minutes.
The idle timeout is to deal with problems from Atlassian Bamboo gReview which has a bad habit of leaving open idle ssh connections.

Largely we've seen ssh problems in the following situations:
  • Clones from Windows clients with PuTTY and OpenSSH seem to 'lock up' quite often and trigger idle timeouts
    • This actually happens frequently enough that we wrap git with code which looks for an idle timeout on fetch and simply retries.
  • Cloning large (multi-GB) repositories onto overloaded NFS storage often erroneously triggers the idle timeout, presumably because git cannot flush the data to disk fast enough
  • Cloning via "repo sync" has recently started producing 'random' disconnects with killed messages in the ssh log, but nothing in the error log (which usually reports idle timeouts with stack traces). It tends to happen with non-interactive users when the server is quite busy
This last one has been causing us the most grief recently. More details on the original thread I raised: https://groups.google.com/forum/#!topic/repo-discuss/iLFgxnUjEOI
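
For anyone wanting to try the same workaround, here is a minimal sketch of what a "retry the fetch on idle timeout" wrapper could look like. It is not our actual wrapper: the function name, the retry count and especially the marker strings are assumptions for illustration, since the exact text a client prints on an idle timeout depends on the client and is not quoted in this thread.

```python
import subprocess
import time

# Hypothetical markers: adjust to whatever your ssh/git client really prints
# when the server drops an idle connection.
IDLE_TIMEOUT_MARKERS = ("timeout", "timed out")


def fetch_with_retry(cmd, attempts=3, delay=5.0, run=subprocess.run):
    """Run a git command, retrying only when stderr looks like a timeout.

    `run` is injectable so the logic can be exercised without a real server.
    Returns the result object of the last attempt.
    """
    result = None
    for attempt in range(1, attempts + 1):
        result = run(cmd, capture_output=True, text=True)
        if result.returncode == 0:
            return result
        stderr = (result.stderr or "").lower()
        if not any(marker in stderr for marker in IDLE_TIMEOUT_MARKERS):
            # Not a timeout: return immediately so real errors are not masked.
            return result
        if attempt < attempts:
            time.sleep(delay)
    return result
```

Something like `fetch_with_retry(["git", "fetch", "origin"])` would then replace the plain fetch call. Retrying only on the timeout pattern is deliberate: authentication or missing-repository failures should still fail fast.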

Scale-wise, we have masters dotted geographically around the world (EU, NA, Asia) and most teams use the repositories hosted closest to them. We have slaves/mirrors with the replication plugin in various small offices, and all the major data centres (EU/NA/Asia) have full mirrors of all the others. Master nodes have 24 cores dedicated to Gerrit and 60-200 GB of RAM. Repos are served from local RAID-10 SSD. Most masters are serving around 1k repositories and maybe around 500 real users. Repositories range from small config (KB) to sprawling hardware monstrosities (multi-GB).

CI users generate by far the most traffic though: clusters in the data centres can sometimes trigger hundreds of nearly simultaneous fetch requests from various CI flows during the working day. On average the rate of requests is much lower - 5-10 a minute at night is more normal.

Our observation is that ssh reliability did seem to drop in 2.13, certainly in the early releases, which seems to match the feeling expressed above: unstable. We also saw (and continue to see) Hugo's "BufferException: Underflow" flood in the logs. It's also hard to say, as the number of users has massively exploded over the 2.12-2.13 lifetime, so some stability issues may be down to that.

Luca's comments as to switching between backends worry me a bit as it looks like sshd is seen here to be a continual problem. Our end users (developers) expect ssh to "just work" and largely don't want to know about this.

I'm happy to test backends for comparison in stability, at least whilst things are "bad" at the moment.

thomasmu...@yahoo.com

Oct 28, 2017, 8:00:35 AM
to Repo and Gerrit Discussion
Your "BufferException: Underflow" warnings are fixed in https://github.com/apache/mina-sshd/commit/fb4e1fdf0aafbe39c054a13f08f414730d996cca

I think you may want to build from the head of the stable-2.13 branch for gerrit as it includes this


Which fixes cloning large repos.

For your idle problems, why not raise the limit a little, maybe by 2-3 minutes?

And set sshd.waitTimeout (if you built from the head of the stable-2.13 branch)

Richard Christie

Oct 28, 2017, 10:18:34 AM
to Repo and Gerrit Discussion
If the warnings really are just due to buggy ssh clients, that's fine, we can just ignore them - we've been ignoring them anyway.

Just to be clear, the timeouts are not caused by individual large repositories, they seem to happen because the connection 'randomly' goes idle during (or perhaps between, in repo's case) transfers. I'll try adding the fix for the WAIT_FOR_SPACE_TIMEOUT to see whether it improves things. The repo sync problems are intermittent though, and complain about the longer 5min timeout (when it says "killed") so not sure if it is related. We also have not been able to force reproduce them when they are not happening.

Thanks for the info and pointers!

Shawn Pearce

Oct 31, 2017, 12:39:51 AM
to lucamilanesio, Repo and Gerrit Discussion
On Thu, Oct 5, 2017 at 12:36 AM, lucamilanesio <luca.mi...@gmail.com> wrote:
There are three options:
  1. Do nothing and just document the current status. We need by far more transparent in our documentation and release notes and incorporating all the known issues and limitations of the Apache Mina SSHD stack.
  2. Fork Apache Mina SSHD and start investing into it to make at least the Gerrit use-cases rock-solid.
  3. Abandon the Apache Mina SSHD route and just use OpenSSH.
I just posted this hack that might support using OpenSSH:
https://gerrit-review.googlesource.com/#/c/gerrit/+/137790 RFC Support OpenSSH as external daemon

It's a pain to configure, but I can see how some installations might prefer this over MINA, especially if MINA has been difficult.

luca.mi...@gmail.com

Oct 31, 2017, 8:38:17 AM
to Shawn Pearce, Repo and Gerrit Discussion
Hi Shawn, amazing! Will definitely start testing it :-)

Luca
