Gerrit connection issues

tech....@gmail.com

Jul 18, 2024, 5:53:27 AM

to Repo and Gerrit Discussion

Hi Experts,

We run Gerrit HA with 48 CPU cores and 180 GB of RAM per node.

SSH connections start getting queued once each node reaches 70-80+ concurrent connections.

What values should we set to handle 100+ connections with this number of cores, or is it not possible to serve that many connections with 48 cores per node?

Here is our config:

[sshd]
listenAddress = *:29418
idleTimeout = 10m
backend = MINA
threads = 110
batchThreads = 24
commandStartThreads = 10
maxConnectionsPerUser = 64
[httpd]
listenUrl = proxy-https://*:8080/
maxThreads = 800
requestLog = true
acceptorThreads = 48
minThreads = 49
maxQueued = 2000

Daniele Sassoli

Jul 18, 2024, 10:35:54 AM

to Repo and Gerrit Discussion

What kind of users are the connections queueing for? Interactive or batch? With threads = 110 and batchThreads = 24 you have 86 (110 - 24) threads available for interactive users, so what you're seeing makes sense.
In general I would recommend not using Gerrit's SSH stack; you can search the mailing list for plenty of issues with it. If you can get your users to use HTTP calls, I'm sure you'll see stability improvements.
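
The 110 - 24 arithmetic can be sketched as an annotated fragment of the posted config. The values are the ones from the original post; the comments are illustrative annotations only, and they assume that batchThreads is carved out of the total set by threads, which is worth confirming against the documentation for your Gerrit version.

```
[sshd]
  # Total number of threads for executing SSH commands.
  threads = 110
  # Reserved for service (batch) users; assumed to be subtracted
  # from the total above rather than added to it.
  batchThreads = 24
  # Remaining for interactive users: 110 - 24 = 86.
  # Queueing at around 70-80+ interactive connections per node is
  # therefore close to what this split allows.
```

Under that assumption, raising batchThreads without raising threads shrinks the interactive pool, so the two values need to be changed together.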

tech....@gmail.com

Jul 18, 2024, 1:55:24 PM

to Repo and Gerrit Discussion

We have both service users and interactive SSH users.

We rely on SSH for repo sync or clone of the AOSP code, so HTTPS is not ideal for such activities on our dev and build servers.

With 48 cores, what would be the ideal values of threads and batchThreads to support 130+ connections (interactive plus non-interactive) on each node?

Martin Fick

Jul 18, 2024, 4:37:39 PM

to tech....@gmail.com, Repo and Gerrit Discussion

In my experience, if you are serving AOSP, the number of cores is less of a limiting factor than your RAM. If you can handle more threads memory-wise, then go for it. But generally more threads means more memory utilization, which can be problematic if you have several longer-running clones of your larger repositories all at the same time. So increase your thread count "at your own risk". Only testing and real-world utilization can tell you what works for your setup, but I suspect 110 threads could already be putting your servers at risk of crashing given just the "wrong" workload.

commandStartThreads = 10

None of my testing ever showed this setting to improve performance; I think we always kept it at 2.
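
As a rough sketch of what that would look like in the posted [sshd] section: only commandStartThreads is changed here, to the value of 2 mentioned above, and the other values are copied unchanged from the original post. This is an illustration, not a tested recommendation for this setup.

```
[sshd]
  listenAddress = *:29418
  idleTimeout = 10m
  backend = MINA
  threads = 110
  batchThreads = 24
  # Lowered from 10 to 2, the value reported above; no benefit
  # from higher values was observed in that testing.
  commandStartThreads = 2
  maxConnectionsPerUser = 64
```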

maxConnectionsPerUser = 64
[httpd]
listenUrl = proxy-https://*:8080/
maxThreads = 800

This maxThreads value of 800 seems excessive and could eventually become a problem if a bunch of those threads start to do something memory-intensive.