Gerrit connection issues

tech....@gmail.com

Jul 18, 2024, 5:53:27 AM
to Repo and Gerrit Discussion
Hi Experts,

We have Gerrit HA with 48 CPUs and 180 RAM.

SSH connections start queuing once each node reaches 70-80+ concurrent connections.

What values should we set to handle 100+ connections with this number of cores? Or is it not possible to serve that many connections with 48 cores per node?

Here is our config:

[sshd]
        listenAddress = *:29418
        idleTimeout = 10m
        backend = MINA
        threads = 110
        batchThreads = 24
        commandStartThreads = 10
        maxConnectionsPerUser = 64
[httpd]
        listenUrl = proxy-https://*:8080/
        maxThreads = 800
        requestLog = true
        acceptorThreads = 48
        minThreads = 49
        maxQueued = 2000

Daniele Sassoli

Jul 18, 2024, 10:35:54 AM
to Repo and Gerrit Discussion
What kind of users are the connections queueing for? Interactive or batch? You have 86 (110 - 24) threads available for interactive users, so what you're seeing makes sense.
In general I would recommend not using Gerrit's SSH stack; you can search the mailing list for plenty of issues with it. If you can get your users to use HTTP calls, I'm sure you'll see stability improvements.
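
A minimal sketch of that arithmetic, assuming (as the reply above does) that `sshd.batchThreads` is carved out of `sshd.threads` rather than added on top of it. The function name is made up for illustration and is not part of Gerrit; rejecting a batch pool larger than the total is a choice of this sketch, not documented Gerrit behavior.

```python
def interactive_ssh_threads(threads: int, batch_threads: int) -> int:
    """Threads left for interactive users once the batch pool is carved out.

    Assumption: batchThreads is subtracted from threads, per the reply above.
    """
    if threads < 0 or batch_threads < 0:
        raise ValueError("thread counts must be non-negative")
    if batch_threads > threads:
        # Sketch-only guard; how Gerrit itself handles this case is not covered here.
        raise ValueError("batch_threads cannot exceed threads in this sketch")
    return threads - batch_threads


# Values from the posted config: threads = 110, batchThreads = 24
print(interactive_ssh_threads(110, 24))  # 86
```

With the posted values, interactive connections beyond the 86th on a node would wait in the queue, which is in the same range as the reported 70-80+.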

tech....@gmail.com

Jul 18, 2024, 1:55:24 PM
to Repo and Gerrit Discussion
We have both service users and interactive SSH users.
We rely on SSH for repo sync or clone of AOSP code, so HTTPS is not ideal for such activities on our dev and build servers.
With 48 cores, what would be the ideal values of threads and batchThreads to support 130+ connections (interactive plus non-interactive) on each node?

Martin Fick

Jul 18, 2024, 4:37:39 PM
to tech....@gmail.com, Repo and Gerrit Discussion
In my experience, if you are serving AOSP, the number of cores is less of a limiting factor than your RAM. If you can handle more threads memory-wise, then go for it. But generally more threads means more memory utilization, which can be problematic if you have several long-running clones of your larger repositories all at the same time. So increase your thread count "at your own risk". Only testing and real-world utilization can tell you what works for your setup, but I suspect 110 threads could already be putting your servers at risk of crashing given just the "wrong" workload.
 

        commandStartThreads = 10

None of my testing ever showed this setting to improve performance; I think we always kept it at 2.
 
        maxConnectionsPerUser = 64
[httpd]
        listenUrl = proxy-https://*:8080/
        maxThreads = 800

This 800 seems excessive and could eventually be a problem if a bunch of them start to do something memory intensive.
 
 
-Martin

tech....@gmail.com

Jul 23, 2024, 9:37:16 AM
to Repo and Gerrit Discussion
Our SSH daemon is active. Are the HTTP threads even used while SSH is active? We assumed they were combined with the SSH threads.

Martin Fick

Jul 23, 2024, 4:01:30 PM
to tech....@gmail.com, Repo and Gerrit Discussion
There are several uses for the HTTP threads. The most obvious is serving your web traffic and your REST APIs. The other use is serving git over HTTP, although this takes up both an HTTP thread and an SSH thread, a design choice which ensures that git over HTTP shows up when observing the queue and lets git operations be throttled via the SSH threads setting.
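
To make the double accounting concrete, here is a toy model of it. This is only an illustration and not how Gerrit is implemented: the class, the request kinds, and the pool sizes are all hypothetical, and where a real server would queue a request, the sketch simply reports that it could not start.

```python
import threading


class ToyServer:
    """Toy model only: two bounded pools, not Gerrit's real implementation."""

    def __init__(self, http_threads: int, ssh_threads: int) -> None:
        self.http_slots = threading.BoundedSemaphore(http_threads)
        self.ssh_slots = threading.BoundedSemaphore(ssh_threads)

    def try_start(self, kind: str) -> bool:
        """Try to start a request without blocking; False means it would queue."""
        needs_http = kind in ("rest", "git-http")
        needs_ssh = kind in ("git-ssh", "git-http")
        if needs_http and not self.http_slots.acquire(blocking=False):
            return False
        if needs_ssh and not self.ssh_slots.acquire(blocking=False):
            if needs_http:
                self.http_slots.release()  # roll back the HTTP slot
            return False
        return True


# Hypothetical pool sizes, chosen small so the limit is easy to see.
server = ToyServer(http_threads=4, ssh_threads=2)
print([server.try_start("git-http") for _ in range(3)])  # [True, True, False]
print(server.try_start("rest"))  # True: REST only needs an HTTP slot
```

In this model the third git-over-HTTP request is held back by the SSH pool even though HTTP slots are still free, which is the throttling effect described above.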

To be honest, this should probably be improved. I'm not sure why we don't allow Gerrit to have separate queues and configs for the git operations; why bother sharing the SSH queue for that? It wouldn't take much to allow us to configure Gerrit to use a separate thread queue for any command(s) that we want to have their own dedicated thread queue. An HTTP vs SSH split is rather limiting and not a generally useful split. I guess it was simple at the time, but it may be time to re-evaluate that and consider making it easy to add more thread pools. This could really help with QoS.

-Martin