List groups high latency on 3.13.5


Nuno Costa

Apr 21, 2026, 7:22:32 AM
to Repo and Gerrit Discussion
Hi All,

As part of the preparation for our 3.9.11 to 3.13.5 upgrade, we are running Gatling tests against the groups REST endpoint with `?o=INCLUDES&o=MEMBERS&S=0&n=250` and `?o=INCLUDES&o=MEMBERS&S=50&n=250`, and we see much higher latency on 3.13.5.

I can also confirm this happens when running a simple curl command from the same server, so it does not seem specific to the Gatling tests.

When testing 5 curl connections against 3.9.11, each operation takes between 500-600 ms on the client side.
With 3.13.5, it increases to 900-1100 ms.
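For reference, this is a sketch of the kind of curl timing loop used here; host, credentials, and the loop count are placeholders, and the query options match the Gatling test:

```shell
# Time 5 sequential requests against the groups REST endpoint and
# print the client-side total time for each.
HOST=gerrit.example.com
for i in 1 2 3 4 5; do
  curl -s -o /dev/null -u "$USER:$PASS" \
    -w "run $i: %{time_total}s\n" \
    "https://$HOST/a/groups/?o=INCLUDES&o=MEMBERS&S=0&n=250"
done
```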

I ran the same command with tracing enabled (~6 s total) and, for about 5 s of that, it is running the group admin check, the group owner check, and the group visibility checks for that user.
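In case it helps others reproduce: a single request can be traced with Gerrit's `trace` request parameter, along these lines (host and credentials are placeholders; the trace ID is arbitrary):

```shell
# Adding trace=<id> asks the server to write a detailed trace for this
# request, which can then be found in the server logs under that ID.
curl -s -o /dev/null -u "$USER:$PASS" \
  "https://gerrit.example.com/a/groups/?o=INCLUDES&o=MEMBERS&S=0&n=250&trace=groups-latency"
```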

I tested flushing the bymember cache and, after warming it up, reran the curl command without tracing.
The first connection takes around 1500 ms and the next ones drop back to the 900-1100 ms range.

```
$ gerrit show-caches | grep groups
  groups                        |    60               | 580.7us | 40%     |
  groups_bymember               |   502               | 399.3us |  2%     |
  groups_byname                 |   335               | 419.7us | 99%     |
  groups_bysubgroup             | 13135               | 370.5us | 98%     |
  groups_byuuid                 | 22668               | 679.8us | 99%     |
  groups_external               |     1               |    4.7s | 99%     |
  groups_external_persisted     |                     |    4.6s |  0%     |
  ldap_groups                   |   477               | 263.6ms | 99%     |
  ldap_groups_byinclude         |                     |         |         |
D groups_byuuid_persisted       |        22586   6.89m|         |     100%|
```

The 3.11 release notes mention "Change 435960: Don’t allow discovery of non-visible groups".
Could this be the reason for the higher latency I'm seeing?

What can we do to improve this?

Thanks,
Nuno

Luca Milanesio

Apr 21, 2026, 10:04:52 AM
to Repo and Gerrit Discussion, Luca Milanesio, Nuno Costa
Hi Nuno,

Hope you are well :-)
Why don’t you try to reproduce it with Gerrit v3.14 which contains more detailed reporting on performance details?

See the release notes at:
https://www.gerritcodereview.com/3.14.html

Alternatively, you can profile the JVM or just get thread dumps and identify potential hotspots or bottlenecks.
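A minimal way to grab those thread dumps, e.g. (the pgrep pattern assumes the default Gerrit JVM main class name):

```shell
# Take a few thread dumps a couple of seconds apart; a real hotspot
# shows up as the same stack repeating across the dumps.
PID=$(pgrep -f GerritCodeReview)
for i in 1 2 3; do
  jstack "$PID" > "threaddump-$i.txt"
  sleep 2
done
```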

HTH

Luca.

>
> What can we do to improve this?
>
> Thanks,
> Nuno
>

Nuno Costa

Apr 22, 2026, 12:00:55 PM
to Repo and Gerrit Discussion
Hi Luca,

Yes, all good here, hope with you as well :)

On Tuesday, 21 April 2026 at 15:04:52 UTC+1 Luca Milanesio wrote:

> Why don’t you try to reproduce it with Gerrit v3.14 which contains more detailed reporting on performance details?

> See the release notes at:
> https://www.gerritcodereview.com/3.14.html

We will try after we have 3.13 stable in production.

> Alternatively, you can profile the JVM or just get thread dumps and identify potential hotspots or bottlenecks.

With ~1 s latencies, we will try to find something in the thread dumps, but it will probably be difficult.

Looking into the JVM profiling topic, I found this project, which helps visualize where the Gerrit process spends its time:
https://github.com/async-profiler/async-profiler

I initially ran it with the command `asprof -d 30 -f %t-process-%p-flamegraph.html $(pgrep -f GerritCodeReview)`, but to limit the amount of data collected, I started and stopped it manually.
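The manual start/stop looks roughly like this (the CPU event and output file name are just examples):

```shell
# Start CPU sampling, exercise the groups endpoint, then stop and
# write the flame graph for just that window.
PID=$(pgrep -f GerritCodeReview)
asprof start -e cpu "$PID"
# ... run the curl requests against the groups endpoint ...
asprof stop -f flamegraph.html "$PID"
```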

Based on the graph we got, most of the time seems to be spent in account/AccountCacheImpl.get.
At some point in the account cache stack I can see a `NoSuchFileException`, but I can't find any filesystem issues under the All-Users.git directory.

The other operation that also takes significant time is `account/GroupControl.isVisible`, which has two distinct stacks:
one touching `project/ProjectState` and the other touching `metrics/TimerContext`.

I already tested flushing the accounts cache and the groups caches (all of the related ones) and, as expected, the first run always takes longer (populating the cache) while subsequent runs stay at the ~1 s latency.
I also flushed all the caches (it took 15 minutes to complete the flush, yay H2 :p) but the same pattern holds: the first run takes longer and populates the cache, and subsequent runs stay at ~1 s.
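For reference, the per-cache flushes were done over the SSH admin interface along these lines (host and port are placeholders):

```shell
# Flush only the group/account caches rather than using --all, which
# also rewrites the slow persisted H2 caches.
ssh -p 29418 admin@gerrit.example.com gerrit flush-caches --cache groups
ssh -p 29418 admin@gerrit.example.com gerrit flush-caches --cache groups_bymember
ssh -p 29418 admin@gerrit.example.com gerrit flush-caches --cache accounts
```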

Attached are the snippets of the flamegraph for the classes I mentioned above.

Any tips from anyone are welcome :)

Thanks,
Nuno

20260422-161926-000554.png
20260422-162205-000555.png