How to deal with huge repositories?


Daniil Gan’kov

Sep 12, 2022, 11:21:55 AM
to Repo and Gerrit Discussion
Hello everyone,

We have a Gerrit instance with a couple of huge repositories (e.g. a replica of torvalds/linux). Users and CI (service users) clone these repositories several times a day, often simultaneously. With a small thread count, this saturates the sshd/httpd thread pool and builds up a long queue of waiting client requests; with a larger one, the Java heap overflows (no out-of-memory exception is thrown, but the whole service freezes).

We tried tuning the Gerrit configuration following the advice in the Gerrit Performance Tuning Cheat Sheet (available here), but that does not help when we need 10+ concurrent requests to fetch a ~3.8 GiB pack file, even with 32 GiB of RAM (as I understand it, this is because every request thread loads its own copy of the pack file into RAM).
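To make the arithmetic behind this explicit, here is a rough worst-case estimate as a small Python sketch. It assumes, as the post does, that every concurrent fetch holds its own full copy of the pack in the Java heap, and it ignores all other heap use; whether Gerrit/JGit really behaves this way is not established in this thread.

```python
import math

# Assumption taken from the post: each concurrent fetch keeps a full copy
# of the pack file in the Java heap. All other heap consumers are ignored.


def worst_case_heap_gib(concurrent_fetches: int, pack_size_gib: float) -> float:
    """Heap needed if every fetch holds its own copy of the pack."""
    return concurrent_fetches * pack_size_gib


def max_concurrent_fetches(heap_gib: float, pack_size_gib: float) -> int:
    """Largest number of fetches whose copies fit in the heap under that assumption."""
    return math.floor(heap_gib / pack_size_gib)


if __name__ == "__main__":
    print(worst_case_heap_gib(10, 3.8))      # about 38 GiB, more than 32 GiB
    print(max_concurrent_fetches(32, 3.8))   # 8
```

Under this assumption, even a heap spanning the machine's entire 32 GiB of RAM would fit only 8 simultaneous full-pack fetches, so 10+ concurrent clones cannot succeed regardless of thread-pool tuning.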

Which performance options should we rely on to make such peak load possible? Could this problem be solved without increasing the heap size?

Thanks,
Daniil Gan’kov

Erik ht

Sep 16, 2022, 2:04:17 AM
to Repo and Gerrit Discussion
Maybe start from the other end and reduce how much load you put on Gerrit:

* You shouldn't need to clone into an empty directory that many times per day. Advise users and your CI system to work more incrementally, e.g. git clean rather than rm -rf.
* Use shallow clones, and clone only the current branch instead of all branches.
* On CI machines with multiple workspaces, use git alternates to share the git objects across all of them: https://www.git-scm.com/docs/git-clone#Documentation/git-clone.txt---reference-if-ableltrepositorygt (if you use the repo tool, check its --mirror and --reference options).
* Add another read-only Gerrit mirror, synchronized with the replication plugin. An HA installation can also work but is a bit more complex.
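The first three suggestions can be combined in a CI job roughly as in the Python sketch below. It only builds git command lines (git clone --depth/--single-branch/--branch/--reference-if-able, git fetch, git reset --hard, git clean -ffdx); the function names, the example Gerrit URL, and the cache path /srv/git-cache/linux.git are hypothetical, not part of any Gerrit or CI tool. It reuses an existing checkout when there is one, otherwise clones with a local reference repository if one is configured, and falls back to a shallow single-branch clone.

```python
import os
import subprocess
from typing import List, Optional

Command = List[str]


def shallow_clone_cmd(url: str, dest: str, branch: str, depth: int = 1) -> Command:
    """Shallow, single-branch clone: only recent history of `branch` is transferred."""
    return ["git", "clone", "--depth", str(depth), "--single-branch",
            "--branch", branch, url, dest]


def reference_clone_cmd(url: str, dest: str, branch: str, reference: str) -> Command:
    """Single-branch clone that borrows objects from a local repository via
    git alternates; --reference-if-able warns and continues if it is missing."""
    return ["git", "clone", "--single-branch", "--branch", branch,
            "--reference-if-able", reference, url, dest]


def update_workspace_cmds(dest: str, branch: str) -> List[Command]:
    """Reuse an existing checkout instead of `rm -rf` plus a fresh clone."""
    return [
        ["git", "-C", dest, "fetch", "origin", branch],
        ["git", "-C", dest, "reset", "--hard", "FETCH_HEAD"],
        ["git", "-C", dest, "clean", "-ffdx"],  # remove untracked and ignored files
    ]


def prepare_workspace(url: str, dest: str, branch: str,
                      reference: Optional[str] = None) -> List[Command]:
    """Pick the cheapest way (for the server) to get `branch` into `dest`."""
    if os.path.isdir(os.path.join(dest, ".git")):
        return update_workspace_cmds(dest, branch)
    if reference is not None:
        return [reference_clone_cmd(url, dest, branch, reference)]
    return [shallow_clone_cmd(url, dest, branch)]


def run_all(cmds: List[Command]) -> None:
    """Execute the commands in order, stopping at the first failure."""
    for cmd in cmds:
        subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Hypothetical URL and paths; this only prints the plan and runs nothing.
    plan = prepare_workspace("https://gerrit.example.com/linux",
                             "/tmp/ci-workspace/linux", "master",
                             reference="/srv/git-cache/linux.git")
    for cmd in plan:
        print(" ".join(cmd))
```

A workspace cloned with a reference depends on the shared repository through alternates, so that repository should be kept up to date (e.g. by a periodic fetch) and must not be deleted or pruned while workspaces still borrow objects from it.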