k8s-gerrit HA: some index syncs missing


Serena He

Aug 28, 2025, 10:23:14 AM
to Repo and Gerrit Discussion
Hi Experts,

We are using the latest Gerrit operator with Gerrit v3.8.1 (we know it's EOL and we do plan to upgrade). We have 2 primaries and 2 replicas. Recently, we have noticed that the index occasionally does not get synced; two examples are attached at the bottom. Both primaries were up and running when the issue happened. The stale entries can be fixed by reindexing online, but it would be very helpful if you could suggest a possible cause so we can avoid this recurring manual fix. Would upgrading to a newer Gerrit version solve this?

Here is our high-availability.config:
[autoReindex]
  enabled = true
[http]
  socketTimeout = 20s
  threadPoolSize = 10
  retryInterval = 2s
  maxTries = 1800
[index]
  retryInterval = 2s
  maxTries = 30
  threadPoolSize = 10
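
For reference, here is a rough sketch of the worst-case retry windows these settings imply. It assumes the plugin simply retries up to maxTries times with retryInterval between attempts, so the window is about maxTries * retryInterval; that reading is an assumption, not taken from the plugin source, and the helper names are made up for illustration.

```python
import re

# Seconds per unit suffix used in the config values above.
UNIT_SECONDS = {"ms": 0.001, "s": 1, "m": 60, "h": 3600}


def to_seconds(value):
    """Convert a duration such as '2s' or '500ms' to seconds."""
    match = re.fullmatch(r"(\d+)\s*(ms|s|m|h)", value.strip())
    if match is None:
        raise ValueError(f"unsupported duration: {value!r}")
    number, unit = match.groups()
    return int(number) * UNIT_SECONDS[unit]


def retry_window(retry_interval, max_tries):
    """Assumed worst case: max_tries attempts spaced retry_interval apart."""
    return to_seconds(retry_interval) * max_tries


# (retryInterval, maxTries) copied from the [http] and [index] sections.
settings = {"http": ("2s", 1800), "index": ("2s", 30)}
windows = {name: retry_window(*value) for name, value in settings.items()}

for name, seconds in windows.items():
    print(f"{name}: up to {seconds / 60:.0f} min of retries")
```

Under that assumption, forwarded HTTP calls would be retried for about an hour and index operations for about a minute.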

If there is any more debug information you need, please let me know.

Thank You,
Serena

Example 1: A user adds an SSH key (see the gerrit-1 log) but then fails to clone with a key-list-empty error (see the gerrit-0 log). Reindexing the account solves the problem.

[gerrit-0]
127.0.0.6 [HTTP-64271] - - [2025-08-20T13:07:50.641Z] "POST /plugins/high-availability/index/account/1001720 HTTP/1.1" 204 - 734 - "Apache-HttpClient/4.5.2 (Java/11.0.27)" 77 80 34515648 -
127.0.0.6 [HTTP-64498] - - [2025-08-20T13:12:09.814Z] "POST /plugins/high-availability/index/account/1001720 HTTP/1.1" 204 - 217 - "Apache-HttpClient/4.5.2 (Java/11.0.27)" 3 10 480936 -
127.0.0.6 [HTTP-64720] - - [2025-08-20T13:20:20.643Z] "POST /plugins/high-availability/index/account/1001720 HTTP/1.1" 204 - 741 - "Apache-HttpClient/4.5.2 (Java/11.0.27)" 81 60 34539288 -
127.0.0.6 [HTTP-64271] - - [2025-08-20T13:20:48.416Z] "POST /plugins/high-availability/index/account/1001720 HTTP/1.1" 204 - 217 - "Apache-HttpClient/4.5.2 (Java/11.0.27)" 3 - 517128 -
[2025-08-20T13:02:40.716Z] bede0974 [SSHD] username - AUTH FAILURE FROM 127.0.0.6 - - - key-list-empty - - - -
[2025-08-20T13:07:56.247Z] 5b283f8e [SSHD] username - AUTH FAILURE FROM 127.0.0.6 - - - key-list-empty - - - -
[2025-08-20T13:15:04.061Z] b122c149 [SSHD] username - AUTH FAILURE FROM 127.0.0.6 - - - key-list-empty - - - -
[2025-08-20T13:26:42.120Z] 509023bb [SSHD] username - AUTH FAILURE FROM 127.0.0.6 - - - key-list-empty - - - -

[gerrit-1]
[2025-08-20 13:07:49,686] [HTTP POST /accounts/self/sshkeys (username from 10.0.0.43)] INFO  : {"ref_name":"refs/users/20/1001720","project_name":"All-Users","type":"LOCK_ACQUIRE"}
[2025-08-20 13:07:49,728] [HTTP POST /accounts/self/sshkeys (username from 10.0.0.43)] INFO  : {"ref_name":"refs/users/20/1001720","old_id":"c8537b7a9d0b6ca840ec4f1a422bb951c9283f6a","new_id":"04de2be798f25d4ff843670d13c1ee5efc75f19c","committer":{"name":"Gerrit Code Review","email":" ","date":"2025-08-20 13:07:49.000000000","tz":0},"comment":"Updated SSH keys","project_name":"All-Users","type":"UPDATE_REF"}
[2025-08-20 13:07:49,739] [HTTP POST /accounts/self/sshkeys (username from 10.0.0.43)] INFO  : {"ref_name":"refs/users/20/1001720","project_name":"All-Users","type":"LOCK_RELEASE"}
[2025-08-20 13:12:09,373] [HTTP POST /accounts/self/sshkeys (username from 10.0.0.43)] INFO  : {"ref_name":"refs/users/20/1001720","project_name":"All-Users","type":"LOCK_ACQUIRE"}
[2025-08-20 13:12:09,415] [HTTP POST /accounts/self/sshkeys (username from 10.0.0.43)] INFO  : {"ref_name":"refs/users/20/1001720","old_id":"04de2be798f25d4ff843670d13c1ee5efc75f19c","new_id":"478dc6398e80ebedb6dca96a9fc581d3dd4f0d17","committer":{"name":"Gerrit Code Review","email":" ","date":"2025-08-20 13:12:09.000000000","tz":0},"comment":"Updated SSH keys","project_name":"All-Users","type":"UPDATE_REF"}
[2025-08-20 13:12:09,424] [HTTP POST /accounts/self/sshkeys (username from 10.0.0.43)] INFO  : {"ref_name":"refs/users/20/1001720","project_name":"All-Users","type":"LOCK_RELEASE"}
[2025-08-20 13:20:19,672] [HTTP POST /accounts/self/sshkeys (username from 10.0.0.43)] INFO  : {"ref_name":"refs/users/20/1001720","project_name":"All-Users","type":"LOCK_ACQUIRE"}
[2025-08-20 13:20:19,713] [HTTP POST /accounts/self/sshkeys (username from 10.0.0.43)] INFO  : {"ref_name":"refs/users/20/1001720","old_id":"478dc6398e80ebedb6dca96a9fc581d3dd4f0d17","new_id":"49086217a9165951b88c783fd01c1f5783cc456a","committer":{"name":"Gerrit Code Review","email":" ","date":"2025-08-20 13:20:19.000000000","tz":0},"comment":"Updated SSH keys","project_name":"All-Users","type":"UPDATE_REF"}
[2025-08-20 13:20:19,722] [HTTP POST /accounts/self/sshkeys (username from 10.0.0.43)] INFO  : {"ref_name":"refs/users/20/1001720","project_name":"All-Users","type":"LOCK_RELEASE"}
[2025-08-20 13:20:47,966] [HTTP POST /accounts/self/sshkeys (username from 10.0.0.43)] INFO  : {"ref_name":"refs/users/20/1001720","project_name":"All-Users","type":"LOCK_ACQUIRE"}
[2025-08-20 13:20:48,003] [HTTP POST /accounts/self/sshkeys (username from 10.0.0.43)] INFO  : {"ref_name":"refs/users/20/1001720","old_id":"49086217a9165951b88c783fd01c1f5783cc456a","new_id":"acf0a3dc7a7170b1c96ecc6fd573ca331813d5b4","committer":{"name":"Gerrit Code Review","email":" ","date":"2025-08-20 13:20:47.000000000","tz":0},"comment":"Updated SSH keys","project_name":"All-Users","type":"UPDATE_REF"}
[2025-08-20 13:20:48,015] [HTTP POST /accounts/self/sshkeys (username from 10.0.0.43)] INFO  : {"ref_name":"refs/users/20/1001720","project_name":"All-Users","type":"LOCK_RELEASE"}
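
A small sketch for checking whether each ref update on gerrit-1 was followed by a forwarded index request on gerrit-0. The timestamps are copied from the logs above; the 5-second matching window and the function names are arbitrary choices for illustration, and the gerrit-1 timestamps are assumed to be UTC like the gerrit-0 access log.

```python
from datetime import datetime, timedelta

# UPDATE_REF times for refs/users/20/1001720 on gerrit-1 (assumed UTC).
ref_updates = [
    "2025-08-20 13:07:49,728",
    "2025-08-20 13:12:09,415",
    "2025-08-20 13:20:19,713",
    "2025-08-20 13:20:48,003",
]

# POST /plugins/high-availability/index/account/1001720 times on gerrit-0.
forwarded = [
    "2025-08-20T13:07:50.641Z",
    "2025-08-20T13:12:09.814Z",
    "2025-08-20T13:20:20.643Z",
    "2025-08-20T13:20:48.416Z",
]


def parse_update(ts):
    return datetime.strptime(ts, "%Y-%m-%d %H:%M:%S,%f")


def parse_forward(ts):
    return datetime.strptime(ts, "%Y-%m-%dT%H:%M:%S.%fZ")


def pair_events(updates, forwards, window=timedelta(seconds=5)):
    """Pair each update with the first unused forward inside the window."""
    remaining = sorted(forwards)
    pairs = []
    for update in sorted(updates):
        match = next(
            (f for f in remaining if timedelta(0) <= f - update <= window),
            None,
        )
        if match is not None:
            remaining.remove(match)
        pairs.append((update, match))
    return pairs


pairs = pair_events(
    [parse_update(ts) for ts in ref_updates],
    [parse_forward(ts) for ts in forwarded],
)
delays = [
    (forward - update).total_seconds() if forward else None
    for update, forward in pairs
]
for (update, forward), delay in zip(pairs, delays):
    print(update.time(), "->", "MISSING" if forward is None else f"{delay:.3f}s")
```

In this example every ref update has a matching forwarded request within one second, so the requests did reach gerrit-0; the pairing alone does not show what gerrit-0 actually indexed.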

Example 2: A user merges a series of dependent changes, but they are still listed under "status:open". The merged changes disappear from the open list and reappear on refresh, which indicates that the index was updated on only one primary. Although reindexing solves this, it is still unclear what makes the high-availability plugin fail to forward the update to the other primary.

[gerrit-0]
127.0.0.6 [HTTP-98170] - - [2025-07-24T13:56:15.951Z] "POST /plugins/high-availability/index/change/ repo~39735 HTTP/1.1" 204 - 265 - "Apache-HttpClient/4.5.2 (Java/11.0.27)" 9 - 2162512 -
127.0.0.6 [HTTP-97923] - - [2025-07-24T13:56:48.043Z] "POST /plugins/high-availability/index/change/ repo~39735 HTTP/1.1" 204 - 319 - "Apache-HttpClient/4.5.2 (Java/11.0.27)" 10 10 2469728 -

[gerrit-1]
10.-.-.- [HTTP-86159] - [2025-07-24T13:56:15.802Z] "POST /changes/repo~39735/revisions/4/review HTTP/1.1" 200 31 472 - "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/138.0.0.0 Safari/537.36" 36 30 7514768 -
[2025-07-24 13:56:15,394] [HTTP POST /changes/ repo~39735/revisions/4/review] INFO  : {"ref_name":"refs/changes/35/39735/meta","project_name":" repo","type":"LOCK_ACQUIRE"}
[2025-07-24 13:56:15,444] [HTTP POST /changes/ repo~39735/revisions/4/review] INFO  : {"ref_name":"refs/changes/35/39735/meta","old_id":"db70ea1d25e940ecdb2614940bbe9038b670c11d","new_id":"c471e16b9b5778200e30b16089e9915b5186fabe","committer":{"name":"Gerrit Code Review","email":" ","date":"2025-07-24 13:56:15.000000000","tz":0},"comment":"Update patch set 4","project_name":" repo","type":"UPDATE_REF"}
[2025-07-24 13:56:15,454] [HTTP POST /changes/ repo~39735/revisions/4/review] INFO  : {"ref_name":"refs/changes/35/39735/meta","project_name":" repo","type":"LOCK_RELEASE"}
[2025-07-24 13:56:45,434] [HTTP POST /changes/ repo~39740/revisions/18/submit] INFO  : {"ref_name":"refs/changes/35/39735/meta","project_name":" repo","type":"LOCK_ACQUIRE"}
[2025-07-24 13:56:47,116] [HTTP POST /changes/ repo~39740/revisions/18/submit] INFO  : {"ref_name":"refs/changes/35/39735/meta","old_id":"c471e16b9b5778200e30b16089e9915b5186fabe","new_id":"c2bd9eb95f04381c2abb7635d9b0f5102b9bd936","committer":{"name":"Gerrit Code Review","email":" ","date":"2025-07-24 13:56:44.000000000","tz":0},"comment":"Update patch set 4","project_name":" repo","type":"UPDATE_REF"}
[2025-07-24 13:56:47,177] [HTTP POST /changes/ repo~39740/revisions/18/submit] INFO  : {"ref_name":"refs/changes/35/39735/meta","project_name":" repo","type":"LOCK_RELEASE"}
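
A minimal sketch of how the "open on one primary only" symptom could be checked by querying each primary directly and comparing the results. It relies on the Gerrit REST endpoint /changes/?q=... and the ")]}'" prefix on JSON responses; the function names, the way each pod is addressed, and the sample bodies are made up for illustration, and authentication (the /a/ prefix) is left out.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

# Gerrit prefixes JSON responses with this line to prevent XSSI.
XSSI_PREFIX = ")]}'"


def parse_gerrit_json(body):
    if body.startswith(XSSI_PREFIX):
        body = body[len(XSSI_PREFIX):]
    return json.loads(body)


def open_change_numbers(body):
    """Extract change numbers from a /changes/ query response body."""
    return {change["_number"] for change in parse_gerrit_json(body)}


def fetch_open_changes(base_url, project):
    """Query one primary directly; base_url is a hypothetical per-pod URL."""
    query = quote(f"status:open project:{project}")
    with urlopen(f"{base_url}/changes/?q={query}", timeout=10) as response:
        return open_change_numbers(response.read().decode("utf-8"))


def only_on_one_side(first, second):
    """Change numbers reported as open by exactly one of the two primaries."""
    return sorted(first ^ second)


# Made-up response bodies that mimic the symptom described above.
body_gerrit_0 = ")]}'\n" + '[{"_number": 39735}, {"_number": 39740}]'
body_gerrit_1 = ")]}'\n" + "[]"

stale = only_on_one_side(
    open_change_numbers(body_gerrit_0),
    open_change_numbers(body_gerrit_1),
)
print(stale)
```

Against a live setup, calling fetch_open_changes once per primary and passing both results to only_on_one_side gives the list of changes whose index entries differ.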

Matthias Sohn

Aug 28, 2025, 5:04:47 PM
to Serena He, Repo and Gerrit Discussion
If you are using the HA setup, you should upgrade to Gerrit 3.11.5 or, better, 3.12.2 and enable the indexSync feature
of the high-availability plugin. We added it to fix missed index updates, which can happen
if events sent to other primary pods are lost, e.g. due to a pod restart or a network issue.

See 
 

Serena He

Sep 19, 2025, 5:38:24 AM
to Repo and Gerrit Discussion
Hi Matthias,

Thanks for the suggestion. I'm upgrading to 3.12.2-27-gcf7d13a276 and have a question about the high-availability configuration.
I would like to add the following configuration:
            [indexSync]
              enabled = true
              period = 1m
              initialSyncAge = 3hours
              syncAge = 15minutes
However, I find that initialSyncAge and syncAge are not applied; according to the log, the default values are still used. I then looked at [1], and it seems that the following configuration works:
            [indexSync]
              enabled = true
              period = 1m
            [indexSync ""]
              initialSyncAge = 3hours
              syncAge = 15minutes
Is this expected? Am I missing something in the documentation?

Thank you,
Serena