Gerrit HA and Index failure


lingalugari mohankrishna

May 26, 2025, 12:25:02 PM
to Repo and Gerrit Discussion
Hello Experts,

We have a 2-node Gerrit cluster running 3.6.8. We know it is EOL (an upgrade is planned).

I have been observing that the number of HTTP requests on one node frequently climbs very high. At the same time, I can see in the task queue that index tasks are triggered for certain repositories and fail to sync the changes.

This is seriously impacting access to the server UI. Any idea how to fix this? I suspect HA is not replicating the changes. When I check the queue I see events like the ones below; sometimes there are fewer and sometimes more, but the HTTP queue keeps growing very high.

send event:ref-replication-scheduled => http://x.x.x.:8080 (try #0)
index change:dev%2Fman%2Fsam~517760 => http://x.x.x.:8080 (try #0)

After a service restart everything returns to normal; I had to restart each node twice today.


Before Restart:
Gerrit Code Review        3.6.8                     now    10:07:28   UTC
                                                 uptime    6 days 15 hrs

  Name                          |Entries              |  AvgGet |Hit Ratio|
                                |   Mem   Disk   Space|         |Mem  Disk|
--------------------------------+---------------------+---------+---------+
  adv_bases                     |                     |         |         |
  change_notes                  |   123               |  29.6ms | 97%     |
  changeid_project              |  1024               |         | 85%     |
  changes_by_project            |     1               |         |  0%     |
  default_preferences           |                     |         |         |
  external_ids_map              |     1               |         | 99%     |
  groups                        |                     |         |         |
  groups_bymember               |  1024               | 290.8us | 99%     |
  groups_byname                 |     1               | 823.5us | 99%     |
  groups_bysubgroup             |   357               | 258.0us | 99%     |
  groups_byuuid                 |  1339               |   1.6ms | 99%     |
  groups_external               |     1               |    1.7s | 99%     |
  groups_external_persisted     |                     |    1.6s |  0%     |
  ldap_group_existence          |     7               | 338.3ms | 87%     |
  ldap_groups                   |   653               | 862.9ms | 99%     |
  ldap_groups_byinclude         |  1024               |         | 96%     |
  ldap_usernames                |  1024               | 101.7us | 97%     |
  permission_sort               |  1024               |  91.9us | 97%     |
  plugin_resources              |     7               |         |  2%     |
  project_list                  |     1               |   40.2s | 98%     |
  projects                      |  1024               |   2.9ms | 37%     |
  prolog_rules                  |                     |         |         |
  soy_sauce_compiled_templates  |     1               |  35.5ms | 99%     |
  sshkeys                       |   637               |  60.5ms | 99%     |
  static_content                |    34               |   1.4ms | 85%     |
  lfs-lfs_project_locks         |                     |         |         |
  plugin-manager-plugins_list   |     1               |    3.0s |  0%     |
D accounts                      |  1024  22626  10.95m| 365.1us | 99% 100%|
D change_kind                   | 17049 891328 109.89m|   1.4ms | 97%  99%|
D comment_context               | 17950 251854 158.79m|   6.2ms | 83%  99%|
D conflicts                     | 11388 554225  67.54m|  36.4ms | 75%  99%|
D diff_intraline                |  6516 101892 128.62m|   5.3ms | 18%  98%|
D diff_summary                  |  4087 719529 504.20m|   9.8ms | 77%  99%|
D gerrit_file_diff              | 19139 472656 663.51m|   3.0ms | 52%  97%|
D git_file_diff                 |  8223 342020 505.69m|  38.3ms |  2%  37%|
D git_modified_files            |  4845  43191 174.04m|   3.4ms |  7%  43%|
D git_tags                      |  1024   2108  61.55m|   1.6ms | 66% 100%|
D groups_byuuid_persisted       |         1157   1.04m|         |     100%|
D mergeability                  |  8738 928474 128.28m|  80.7ms | 72%  99%|
D modified_files                |  7893  49839 187.39m|   1.2ms | 95%  92%|
D oauth_tokens                  |                0.00k|         |         |
D persisted_projects            |        13300 661.88m|         |      99%|
D pure_revert                   |   100   7146   1.04m|   5.1ms | 95% 100%|
D web_sessions                  |                0.00k|         |         |

SSH:     29  users, oldest session started    4 days 23 hrs ago
Tasks:  597  total =   74 running +    299 ready +  224 sleeping
Mem: 190.00g total = 125.22g used + 54.78g free + 10.00g buffers
     190.00g max
         707 open files

Threads: 64 CPUs available, 1564 threads
                                    NEW       RUNNABLE        BLOCKED        WAITING  TIMED_WAITING     TERMINATED
  ReceiveCommits                      0              0              0             64              0              0
  SshCommandStart                     0              0              0             24              0              0
  SSH-Interactive-Worker              0              0              0            114              0              0
  SSH git-receive-pack                0              1              0              0              1              0
  H2                                  0              0              0              0             34              0
  HTTP                                0            601            189              1              9              0
  SSH git-upload-pack                 0             39              0              1             40              0
  Other                               0            110              2            211             58              0
  SSH-Stream-Worker                   0              0              0             65              0              0

Luca Milanesio

May 28, 2025, 3:23:14 AM
to Repo and Gerrit Discussion, Luca Milanesio

On 26 May 2025, at 18:25, lingalugari mohankrishna <lingalugari....@gmail.com> wrote:

Hello Experts,

We have a 2-node Gerrit cluster running 3.6.8. We know it is EOL (an upgrade is planned).

Yes, please do upgrade. There are many issues that we fixed in the HA plugin and they just do not exist on your version.

I have been observing that the number of HTTP requests on one node frequently climbs very high. At the same time, I can see in the task queue that index tasks are triggered for certain repositories and fail to sync the changes.

This is seriously impacting access to the server UI. Any idea how to fix this? I suspect HA is not replicating the changes. When I check the queue I see events like the ones below; sometimes there are fewer and sometimes more, but the HTTP queue keeps growing very high.

First: upgrade to get most of the fixes in the HA plugin.
Second: check your change.mergeabilityComputationBehavior setting in gerrit.config and make sure that it is disabled (the default).


send event:ref-replication-scheduled => http://x.x.x.:8080 (try #0) 

I don’t believe you have just two nodes: how can two nodes in HA configuration schedule replication events?
Can you please share with us the full picture?

index change:dev%2Fman%2Fsam~517760 => http://x.x.x.:8080 (try #0)
After a service restart everything returns to normal; I had to restart each node twice today.

Well, after the service restart the node will be back to normal, but you’ll have a lot of stale changes in the index: that’s not exactly *normal*, is it?



Before Restart:
Gerrit Code Review        3.6.8                     now    10:07:28   UTC
                                                 uptime    6 days 15 hrs

  Name                          |Entries              |  AvgGet |Hit Ratio|
                                |   Mem   Disk   Space|         |Mem  Disk|
--------------------------------+---------------------+---------+---------+
  adv_bases                     |                     |         |         |
  change_notes                  |   123               |  29.6ms | 97%     |
  changeid_project              |  1024               |         | 85%     |

Your changeid_project cache is maxed out. You do have more than 1024 projects I believe.

  changes_by_project            |     1               |         |  0%     |
  default_preferences           |                     |         |         |
  external_ids_map              |     1               |         | 99%     |
  groups                        |                     |         |         |
  groups_bymember               |  1024               | 290.8us | 99%     |

Your groups_bymember cache is maxed out. You do have more than 1024 groups.

  groups_byname                 |     1               | 823.5us | 99%     |
  groups_bysubgroup             |   357               | 258.0us | 99%     |
  groups_byuuid                 |  1339               |   1.6ms | 99%     |
  groups_external               |     1               |    1.7s | 99%     |
  groups_external_persisted     |                     |    1.6s |  0%     |
  ldap_group_existence          |     7               | 338.3ms | 87%     |
  ldap_groups                   |   653               | 862.9ms | 99%     |
  ldap_groups_byinclude         |  1024               |         | 96%     |
  ldap_usernames                |  1024               | 101.7us | 97%     |
  permission_sort               |  1024               |  91.9us | 97%     |
  plugin_resources              |     7               |         |  2%     |
  project_list                  |     1               |   40.2s | 98%     |
  projects                      |  1024               |   2.9ms | 37%     |

Same as above.

  prolog_rules                  |                     |         |         |
  soy_sauce_compiled_templates  |     1               |  35.5ms | 99%     |
  sshkeys                       |   637               |  60.5ms | 99%     |
  static_content                |    34               |   1.4ms | 85%     |
  lfs-lfs_project_locks         |                     |         |         |
  plugin-manager-plugins_list   |     1               |    3.0s |  0%     |
D accounts                      |  1024  22626  10.95m| 365.1us | 99% 100%|

Your accounts in-memory cache is maxed out: you have 22k users but only 1k of them are loaded in the in-memory cache.
Bottom line: your setup would need a bit of review and adjustments to become more suitable for production. With over 22k users, you would need a substantial health check of your setup to avoid them going through painful restarts.
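
The maxed-out caches flagged above can also be spotted mechanically. The following is a minimal sketch, not part of any Gerrit tooling: it assumes the fixed-width show-caches layout pasted in this thread (Mem, Disk and Space sub-columns of 6, 7 and 8 characters) and an in-memory limit of 1024 entries per cache, which is an assumption to verify against your own cache settings. The function name maxed_out_caches is hypothetical.

```python
def maxed_out_caches(show_caches: str, limit: int = 1024) -> list[str]:
    """Return names of caches whose in-memory entry count sits exactly at `limit`."""
    names = []
    for line in show_caches.splitlines():
        cols = line.split("|")
        if len(cols) < 4:
            continue  # separator rows and blank lines carry no '|' columns
        name = cols[0][2:].strip()  # first two characters hold the optional 'D' (disk) flag
        mem = cols[1][:6].strip()   # Entries column is fixed width: Mem(6) Disk(7) Space(8)
        if mem.isdigit() and int(mem) == limit:
            names.append(name)
    return names


# A few rows copied from the output in this thread.
SAMPLE = """\
  Name                          |Entries              |  AvgGet |Hit Ratio|
                                |   Mem   Disk   Space|         |Mem  Disk|
--------------------------------+---------------------+---------+---------+
  change_notes                  |   123               |  29.6ms | 97%     |
  changeid_project              |  1024               |         | 85%     |
  groups_byuuid                 |  1339               |   1.6ms | 99%     |
D accounts                      |  1024  22626  10.95m| 365.1us | 99% 100%|
D groups_byuuid_persisted       |         1157   1.04m|         |     100%|
"""

print(maxed_out_caches(SAMPLE))
```

A cache that sits exactly at its limit is only a hint that entries are being evicted; the hit ratio and AvgGet columns still need to be read alongside it.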

HTH

Luca.






lingalugari mohankrishna

May 28, 2025, 8:58:11 AM
to Repo and Gerrit Discussion
Hi Luca,

Thanks for pointing that out. My current Gerrit config is below. Can you help me fine-tune it to improve performance?
GERRIT.CONFIG
[gerrit]
        basePath = /shared/git
[index]
        type = lucene
        ramBufferSize = 4096m
        maxTerms = 8192
        maxBufferedDocs = 3000
        threads = 16
        batchThreads = 16
        maxMergeCount = 100
        reuseExistingDocuments = true
        defaultLimit = 100
        maxLimit = 500
        cacheQueryResultsByChangeNum = true

[receive]
        enableSignedPush = false
        timeout = 150min
[transfer]
        timeout = 120s
[sendemail]
        smtpServer = localhost
        smtpServerPort = 25
        smtpUser = gerrit
[container]
        user = gerrit
        #javaHome = /usr/java/latest
        javaHome = /usr/lib/jvm/jre
        heapLimit = 190g
        javaOptions = "-Dflogger.backend_factory=com.google.common.flogger.backend.log4j.Log4jBackendFactory#getInstance"
        javaOptions = "-Dflogger.logging_context=com.google.gerrit.server.logging.LoggingContext#getInstance"
[sshd]
        listenAddress = *:29418
        idleTimeout = 10m
        backend = MINA
        threads = 200
        batchThreads = 70
        commandStartThreads = 24
        maxConnectionsPerUser = 64
[httpd]
        listenUrl = proxy-https://*:8080/
        maxThreads = 1000
        requestLog = true
        acceptorThreads = 48
        minThreads = 49
        maxQueued = 2000
[cache]
        directory = cache
        threads = 0
[cache "project_list"]
        maxAge = 130s
[gitweb]
        cgi = /var/www/git/gitweb.cgi
        type = gitweb
[core]
        packedGitLimit = 10g
        packedGitWindowSize = 16k
        packedGitOpenFiles = 10240
#[gc]
#        startTime = Sun 00:00
#        interval = 1 w
[pack]
        threads = 24
        windowMemory = 16g
[log "channel.name"]
        level = DEBUG
[plugins]
        allowRemoteAdmin = true
[hooks]
        path = /data/gerrit/hooks
        syncHookTimeout = 900
        commitReceivedHook = commit-received
[user]
        email = ger...@gmail.com
[change]
#        mergeabilityComputationBehavior = API_REF_UPDATED_AND_CHANGE_REINDEX
        cumulativeCommentSizeLimit = 10m
        maxComments = 40000
        maxUpdates = 10000
        conflictsPredicateEnabled = true
        submitWholeTopic = true
        maxPatchSets = 1000000
[lfs]
        plugin = lfs

[accountPatchReviewDb]
        url = jdbc:postgresql://postgres:5432/and_ha?user=mohan&password=krishna
[database "h2"]
        autoServer = true

and
HA.config


[main]
  sharedDirectory = /shared/git

[peerInfo]
  strategy = static

[autoReindex]
  enabled = false
  delay = 10
  pollInterval = 0

[peerInfo "static"] (on Node2)
  url = http://Node1:8080
  url = http://Node2:8080
 

[peerInfo "static"] (on Node1)
  url = http://Node2:8080
  url = http://Node1:8080


[http]
  maxTries = 360
  retryInterval = 10000
  connectionTimeout = 5000
  socketTimeout = 5000

[healthcheck]
  enable = true

[cache]
  threadPoolSize = 8
[index]
  threadPoolSize = 12
  numStripedLocks = 10000
  maxTries = 10
[websession]
  cleanupInterval = 24 hours

Matthias Sohn

May 28, 2025, 10:37:39 AM
to lingalugari mohankrishna, Repo and Gerrit Discussion
On Wed, May 28, 2025 at 2:58 PM lingalugari mohankrishna <lingalugari....@gmail.com> wrote:
Hi Luca,

Thanks for pointing that out. My current Gerrit config is below. Can you help me fine-tune it to improve performance?
GERRIT.CONFIG
[gerrit]
        basePath = /shared/git
[index]
        type = lucene
        ramBufferSize = 4096m

this option doesn't exist, but there is an option for each index called index.<index name>.ramBufferSize
 
        maxTerms = 8192
        maxBufferedDocs = 3000
        threads = 16
        batchThreads = 16
        maxMergeCount = 100
        reuseExistingDocuments = true
        defaultLimit = 100
        maxLimit = 500
        cacheQueryResultsByChangeNum = true

[receive]
        enableSignedPush = false
        timeout = 150min

Why do you think you need a 2.5 hour timeout for receiving push requests ?
 
[transfer]
        timeout = 120s
[sendemail]
        smtpServer = localhost
        smtpServerPort = 25
        smtpUser = gerrit
[container]
        user = gerrit
        #javaHome = /usr/java/latest
        javaHome = /usr/lib/jvm/jre

Which Java version are you using ?
 
        heapLimit = 190g
        javaOptions = "-Dflogger.backend_factory=com.google.common.flogger.backend.log4j.Log4jBackendFactory#getInstance"
        javaOptions = "-Dflogger.logging_context=com.google.gerrit.server.logging.LoggingContext#getInstance"
[sshd]
        listenAddress = *:29418
        idleTimeout = 10m
        backend = MINA
        threads = 200

sshd.threads limits the number of concurrently executed ssh requests and also the concurrently executed git requests (both ssh and http)
see https://gerrit-documentation.storage.googleapis.com/Documentation/3.6.8/config-gerrit.html#sshd.threads
As a rule of thumb fetching from a repo can keep one CPU core busy, hence allowing up to 
200 concurrent git requests on a 64 CPU machine may overload it.
 
        batchThreads = 70
        commandStartThreads = 24
        maxConnectionsPerUser = 64
[httpd]
        listenUrl = proxy-https://*:8080/
        maxThreads = 1000
        requestLog = true
        acceptorThreads = 48
        minThreads = 49
        maxQueued = 2000
[cache]
        directory = cache
        threads = 0
[cache "project_list"]
        maxAge = 130s

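Why do you think you need that ? This means gerrit potentially has to scan the file system to find projects every 2 minutes.

[…]
[change]
#        mergeabilityComputationBehavior = API_REF_UPDATED_AND_CHANGE_REINDEX
        cumulativeCommentSizeLimit = 10m
        maxComments = 40000
        maxUpdates = 10000
        conflictsPredicateEnabled = true
        submitWholeTopic = true
        maxPatchSets = 1000000

These options are meant to protect your gerrit server from a run-away script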
or a crazy user to prevent issues caused by excessive size/number of objects related to a change.
You are effectively switching off these limits. A more cautious approach would be to start with the defaults
and increase limits when a user hits it with a reasonable use case you want to support.

Though it would be an interesting test to push 1 million patchsets for a single change
and see how gerrit can cope with it.
 
[lfs]
        plugin = lfs

[accountPatchReviewDb]
        url = jdbc:postgresql://postgres:5432/and_ha?user=mohan&password=krishna
[database "h2"]

The database section ceased to exist in gerrit 3.x. It was used to configure reviewdb in gerrit 2.x.
 
        autoServer = true

and
HA.config


[main]
  sharedDirectory = /shared/git

Which type of filesystem are you using for the sharedDir ? 

[peerInfo]
  strategy = static

[autoReindex]
  enabled = false
  delay = 10
  pollInterval = 0

[peerInfo "static"] (on Node2)
  url = http://Node1:8080
  url = http://Node2:8080
 

[peerInfo "static"] (on Node1)
  url = http://Node2:8080
  url = http://Node1:8080


[http]
  maxTries = 360
  retryInterval = 10000
  connectionTimeout = 5000
  socketTimeout = 5000

[healthcheck]
  enable = true

[cache]
  threadPoolSize = 8
[index]
  threadPoolSize = 12
  numStripedLocks = 10000

Where did you find this option ?
 
  maxTries = 10
[websession]
  cleanupInterval = 24 hours


On Wednesday, 28 May 2025 at 12:53:14 UTC+5:30 Luca Milanesio wrote:


send event:ref-replication-scheduled => http://x.x.x.:8080 (try #0) 

I don’t believe you have just two nodes: how can two nodes in HA configuration schedule replication events?
Can you please share with us the full picture?

The high-availability plugin relies on a shared file system to let all nodes in the HA cluster access git repositories.
Hence there's no replication involved in that HA setup.
If you want to store the data in a separate storage for each node you should look at a multi-site setup instead.
 

Luca Milanesio

May 29, 2025, 5:33:41 PM
to Repo and Gerrit Discussion, Luca Milanesio

On 28 May 2025, at 16:37, Matthias Sohn <matthi...@gmail.com> wrote:

On Wed, May 28, 2025 at 2:58 PM lingalugari mohankrishna <lingalugari....@gmail.com> wrote:
Hi Luca,

Thanks for pointing that out. My current Gerrit config is below. Can you help me fine-tune it to improve performance?
GERRIT.CONFIG
[gerrit]
        basePath = /shared/git
[index]
        type = lucene
        ramBufferSize = 4096m

this option doesn't exist, but there is an option for each index called index.<index name>.ramBufferSize

I am curious where did you get this suggestion from? What were you trying to achieve? Did you experience some Lucene slowdowns? Do we have some outdated documentation mentioning it somewhere?

 
        maxTerms = 8192
        maxBufferedDocs = 3000
        threads = 16
        batchThreads = 16
        maxMergeCount = 100
        reuseExistingDocuments = true

This option is not available on Gerrit v3.8.x; where did you find it?

        defaultLimit = 100
        maxLimit = 500
        cacheQueryResultsByChangeNum = true

This isn’t needed as the default is ’true’ anyway.


[receive]
        enableSignedPush = false
        timeout = 150min

Why do you think you need a 2.5 hour timeout for receiving push requests ?

+1, you should look at any long-running connections and why they are taking so long.

 
[transfer]
        timeout = 120s
[sendemail]
        smtpServer = localhost
        smtpServerPort = 25
        smtpUser = gerrit
[container]
        user = gerrit
        #javaHome = /usr/java/latest
        javaHome = /usr/lib/jvm/jre

Which Java version are you using ?
 
        heapLimit = 190g
        javaOptions = "-Dflogger.backend_factory=com.google.common.flogger.backend.log4j.Log4jBackendFactory#getInstance"
        javaOptions = "-Dflogger.logging_context=com.google.gerrit.server.logging.LoggingContext#getInstance"
[sshd]
        listenAddress = *:29418
        idleTimeout = 10m
        backend = MINA
        threads = 200

sshd.threads limits the number of concurrently executed ssh requests and also the concurrently executed git requests (both ssh and http)
see https://gerrit-documentation.storage.googleapis.com/Documentation/3.6.8/config-gerrit.html#sshd.threads
As a rule of thumb fetching from a repo can keep one CPU core busy, hence allowing up to 
200 concurrent git requests on a 64 CPU machine may overload it.

+1, you should not exceed 2x the number of CPUs, therefore 128 threads in your case.
 
        batchThreads = 70
        commandStartThreads = 24
        maxConnectionsPerUser = 64
[httpd]
        listenUrl = proxy-https://*:8080/
        maxThreads = 1000

1000 threads??? I doubt your box would be able to manage 1000 concurrent REST API requests.

        requestLog = true
        acceptorThreads = 48
        minThreads = 49
        maxQueued = 2000

How many concurrent users do you have daily? You’re allowing 1000 concurrent requests and 2000 queued to be executed, 3000 in total.
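
The two sizing remarks above (no more than 2x the CPU count for sshd.threads, and maxThreads plus maxQueued as the total number of admitted HTTP requests) reduce to simple arithmetic. This is a minimal sketch with hypothetical helper names; the factor of 2 is the rule of thumb quoted in this thread, not a hard limit enforced by Gerrit.

```python
CPUS = 64  # from the "64 CPUs available" line in the thread dump above


def sshd_thread_ceiling(cpus: int, factor: int = 2) -> int:
    """Rule-of-thumb ceiling for sshd.threads: at most `factor` times the CPU cores."""
    return cpus * factor


def http_admission_capacity(max_threads: int, max_queued: int) -> int:
    """HTTP requests admitted at once: those executing plus those waiting in the queue."""
    return max_threads + max_queued


print(sshd_thread_ceiling(CPUS))            # 128, versus the configured 200
print(http_admission_capacity(1000, 2000))  # 3000
```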

[cache]
        directory = cache
        threads = 0
[cache "project_list"]
        maxAge = 130s

Why do you think you need that ? This means gerrit potentially has to scan the file system to find projects every 2 minutes.

+1

 
[gitweb]
        cgi = /var/www/git/gitweb.cgi
        type = gitweb
[core]
        packedGitLimit = 10g

You have 190g of heap, and just 10g of JGit cache?
Have you analysed how efficient it is and what’s the eviction rate?

        packedGitWindowSize = 16k
        packedGitOpenFiles = 10240

I doubt 10k open files will suffice in your case, unless you have a small number of small repositories.

#[gc]
#        startTime = Sun 00:00
#        interval = 1 w
[pack]
        threads = 24
        windowMemory = 16g

That’s only read as part of the TransferConfig, but not for all the rest of JGit.
I’d recommend putting the JGit-specific configs in $GERRIT_SITE/etc/jgit.config, where they’ll be used everywhere.

[log "channel.name"]
        level = DEBUG

DEBUG? Are you troubleshooting some issues? DEBUG should never be used in production; what is ‘channel.name’?

[plugins]
        allowRemoteAdmin = true
[hooks]
        path = /data/gerrit/hooks
        syncHookTimeout = 900

Waiting for 15 minutes for a synchronous hook to complete means you’re blocking threads for a very very long time.
Why would a sync hook take so long to execute?


        commitReceivedHook = commit-received
[user]
        email = ger...@gmail.com

Do you really own that e-mail?

[change]
#        mergeabilityComputationBehavior = API_REF_UPDATED_AND_CHANGE_REINDEX
        cumulativeCommentSizeLimit = 10m
        maxComments = 40000
        maxUpdates = 10000
        conflictsPredicateEnabled = true
        submitWholeTopic = true
        maxPatchSets = 1000000

These options are meant to protect your gerrit server from a run-away script
or a crazy user to prevent issues caused by excessive size/number of objects related to a change.
You are effectively switching off these limits. A more cautious approach would be to start with the defaults
and increase limits when a user hits it with a reasonable use case you want to support.

Though it would be an interesting test to push 1 million patchsets for a single change
and see how gerrit can cope with it.

+1

 
[lfs]
        plugin = lfs

[accountPatchReviewDb]
        url = jdbc:postgresql://postgres:5432/and_ha?user=mohan&password=krishna

Did you share the production credentials by mistake here?

[database "h2"]

The database section ceased to exist in gerrit 3.x. It was used to configure reviewdb in gerrit 2.x.
 
        autoServer = true

and 
HA.config


[main]
  sharedDirectory = /shared/git

Which type of filesystem are you using for the sharedDir ? 

I believe it should be NFS, correct?

Can you share the NFS version, mount and cache options?
Also, have you checked the $GERRIT_SITE/etc/jgit.config for the relevant NFS-specific options?


[peerInfo]
  strategy = static

[autoReindex]
  enabled = false
  delay = 10
  pollInterval = 0

[peerInfo "static"] (on Node2) 
  url = http://Node1:8080
  url = http://Node2:8080
  

[peerInfo "static"] (on Node1) 
  url = http://Node2:8080
  url = http://Node1:8080


[http]
  maxTries = 360
  retryInterval = 10000
  connectionTimeout = 5000
  socketTimeout = 5000

You are retrying the calls for 360 * 10 = 3,600 seconds (1h).
That means that if one of the nodes goes down, the one left will keep accumulating retries and their associated objects in memory.
As a result, it will run out of memory *if the retries* go on for 1h, and then you’ll end up with a global outage.

Where did you receive the above recommendation for those settings?
They do not seem to be randomly chosen, so there must have been a rationale behind them.
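
The retry window above can be reproduced with a quick calculation. This is a minimal sketch, assuming retryInterval is expressed in milliseconds, as the 360 * 10 arithmetic above implies; the function name is hypothetical.

```python
def retry_window_seconds(max_tries: int, retry_interval_ms: int) -> float:
    """Upper bound on how long a single forwarded call keeps being retried."""
    return max_tries * retry_interval_ms / 1000.0


window = retry_window_seconds(max_tries=360, retry_interval_ms=10_000)
print(f"{window:.0f} s = {window / 3600:.1f} h")  # 3600 s = 1.0 h
```

Every task that is still being retried holds on to its objects for that whole window, so a shorter window bounds how much a surviving node can accumulate while its peer is down.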


[healthcheck]
  enable = true

I do not recommend using the healthcheck included in the high-availability plugin; use the healthcheck plugin instead.
I confirm my final statement: I don’t believe you have *one* problem but a *series of problems*, due mainly to misconfiguration of the system.
Based on the traffic you have, the data on the repositories and the users, it should be pretty straightforward to come to a much more balanced set of settings.

HTH

Luca.

Matthias Sohn

May 29, 2025, 6:19:12 PM
to Luca Milanesio, Repo and Gerrit Discussion
On Thu, May 29, 2025 at 11:33 PM Luca Milanesio <luca.mi...@gmail.com> wrote:


On 28 May 2025, at 16:37, Matthias Sohn <matthi...@gmail.com> wrote:

On Wed, May 28, 2025 at 2:58 PM lingalugari mohankrishna <lingalugari....@gmail.com> wrote:
Hi Luca,

Thanks for pointing that out. My current Gerrit config is below. Can you help me fine-tune it to improve performance?
GERRIT.CONFIG
[gerrit]
        basePath = /shared/git
[index]
        type = lucene
        ramBufferSize = 4096m

this option doesn't exist, but there is an option for each index called index.<index name>.ramBufferSize

I am curious where did you get this suggestion from? What were you trying to achieve? Did you experience some Lucene slowdowns? Do we have some outdated documentation mentioning it somewhere?

 
        maxTerms = 8192
        maxBufferedDocs = 3000
        threads = 16
        batchThreads = 16
        maxMergeCount = 100
        reuseExistingDocuments = true

This option is not available on Gerrit v3.8.x; where did you find it?

This option has been available since 3.10.
Finding the optimal configuration is challenging with hundreds of Gerrit options.
Can we make this easier ?
 

Luca Milanesio

May 29, 2025, 6:28:58 PM
to Repo and Gerrit Discussion, Luca Milanesio
I have been advocating for years for having safer defaults :-) and I’m still all for it.

Some examples:
1) Infinite timeouts (still today, Gerrit waits indefinitely for the SMTP server to respond)
2) Unsafe defaults which may cause repository corruption (we have a warning in the release notes, but I am unsure how many people actually read them)

Luca.

Matthias Sohn

May 31, 2025, 5:44:10 PM
to Luca Milanesio, Repo and Gerrit Discussion

Luca Milanesio

Jun 1, 2025, 3:32:09 AM
to Repo and Gerrit Discussion, Luca Milanesio


[…]
I confirm my final statement: I don’t believe you have *one* problem but a *series of problems*, due mainly to misconfiguration of the system.
Based on the traffic you have, the data on the repositories and the users, it should be pretty straightforward to come to a much more balanced set of settings.

Finding the optimal configuration is challenging with hundreds of Gerrit options.
Can we make this easier ?

I have been advocating for years for having safer defaults :-) and I’m still all for it.

Some examples:
1) Infinite timeouts (still today, Gerrit waits indefinitely for the SMTP server to respond)
2) Unsafe defaults which may cause repository corruption (we have a warning in the release notes, but I am unsure how many people actually read them)

Very good start :-) … reviewed them.

Luca.