Are your repositories well packed? Unpacked (or less well packed)
repositories can take more CPU to serve a given client due
to more thrashing of the file descriptor space as the server cycles
through the various files available within that repository.
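One rough way to answer the "well packed?" question is to look at what `git count-objects -v` reports for each repository. The sketch below parses that output; the helper names and the thresholds in `looks_unpacked` are illustrative assumptions, not values from Gerrit or from this thread.

```python
import subprocess


def parse_count_objects(output: str) -> dict:
    """Parse the `key: value` lines printed by `git count-objects -v`."""
    stats = {}
    for line in output.splitlines():
        key, sep, value = line.partition(":")
        value = value.strip()
        if sep and value.isdigit():
            stats[key.strip()] = int(value)
    return stats


def looks_unpacked(stats: dict, max_loose: int = 1000, max_packs: int = 10) -> bool:
    """Flag a repository with many loose objects or many pack files.

    The default thresholds are assumptions chosen for illustration only.
    """
    return stats.get("count", 0) > max_loose or stats.get("packs", 0) > max_packs


def repository_stats(git_dir: str) -> dict:
    """Run `git count-objects -v` against one repository and parse the result."""
    result = subprocess.run(
        ["git", f"--git-dir={git_dir}", "count-objects", "-v"],
        check=True,
        capture_output=True,
        text=True,
    )
    return parse_count_objects(result.stdout)
```

In that output, `count` is the number of loose objects and `packs` is the number of pack files; a repository where either keeps growing is a likely candidate for a repack.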
However, serving a Git client over SSH is a fundamentally high CPU
task. The canonical C implementation of Git and OpenSSH can easily
use 2 CPUs for any given `git clone` client: one CPU running OpenSSH
for encryption of the data, and the other CPU putting together the
data stream for the Git client.
The Java implementation in Gerrit Code Review uses a single thread
for both activities... so it's time-slicing that thread between the
data stream generation and the data encryption functions.
On a quad CPU system, assuming the clients can keep up, the most
you can really serve is 4 concurrent clients at once. If some (or
most) clients are on slower network connections, you may be able to
serve more clients at once by letting them share the CPUs while the
server is waiting for the network to transfer the buffered data.
But at gigabit ethernet speeds, it's 1 CPU to 1 client.
> We have some fairly sizable repositories; however, I would hope to see
> much better concurrent user performance. Does anyone have any input
> on improving Gerrit performance? I have watched this list closely, and
> tried some of the gerrit-config tweaks, but performance continues to
> be worse than abysmal.
What version of Gerrit Code Review?
What are the variables you have set in gerrit.config? Especially the
ones in the sshd and core sections.
Are you doing initial `repo sync`s into an empty directory? Or are
these incremental syncs where the repositories mostly already exist?
What sort of throughput are your clients getting?
I'm (almost) happily serving hundreds of users off a quad-core
system with an 8 GiB JVM heap. I'll admit, we're getting really
tight on CPU capacity. Most of the working day, we're at 390+%
CPU utilization.
I noticed this page on the wiki:
http://groups.google.com/group/repo-discuss/web/repository-repacking
How safe is it to run that repack script? Do you do it at off-peak
times while no other client is using the repos? Or is it safe to run
while others may be in the midst of using the repos? I ask because our
repo is used by teams in different timezones and there is no real
off-peak period. I guess the same is true about the gerrit instance
used by the android team.
Ishaaq
> --
> To unsubscribe, email repo-discuss...@googlegroups.com
> More info at http://groups.google.com/group/repo-discuss?hl=en
>
Unfortunately we haven't documented this well. :-(
To get acceptable performance out of Gerrit, it's memory, memory,
memory. That is, my gerrit.config uses the following:
  [container]
    heapLimit = 8g
  [sshd]
    threads = 24
    batchThreads = 2
  [core]
    packedGitOpenFiles = 4096
    packedGitLimit = 2g
    packedGitWindowSize = 16k
  [database]
    poolMaxIdle = 16
    poolLimit = 64
I've redacted the non-performance related variables to make it
easier to see what I have overridden from the defaults.
We run an 8 GiB JVM heap, and of that heap we permit 2 GiB of
memory to be used as a buffer cache for packed Git data. By default
that buffer cache is 20 MiB. If you don't raise packedGitLimit,
that JVM heap won't really be utilized and you will be thrashing
the internal buffer cache.
We also permit up to 4,096 pack files to be open at once, which
causes gerrit.sh to set our hard file descriptor limit to 8,192.
That gives us plenty of breathing space for network sockets.
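To sanity check the descriptor budget on a Unix host, something like the following sketch compares the pack file allowance against the process hard limit. The function names are hypothetical, and the 4,096 / 8,192 figures are simply the ones quoted above, not a general rule for how gerrit.sh derives the limit.

```python
import resource  # Unix only


def fd_headroom(packed_git_open_files: int, hard_limit: int) -> int:
    """Descriptors left for sockets and other files if every pack slot is in use."""
    return hard_limit - packed_git_open_files


def current_hard_limit() -> int:
    """Hard RLIMIT_NOFILE of the current process.

    May equal resource.RLIM_INFINITY when the limit is unlimited.
    """
    _soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    return hard
```

With the values from this configuration, `fd_headroom(4096, 8192)` leaves 4,096 descriptors for everything that is not a pack file.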
We have clients that aren't local to the server, so although this
is a quad core system, we set sshd.threads to 24. Clients in remote
offices take longer to download due to a slower network connection,
but block a thread during that download. So we have more threads
than CPUs to better interleave access.
We're also running bleeding edge master, where we have partitioned
our thread pool to at most 2 threads in the 'non-interactive'
category. That's our build servers, so they can't dominate the
system and block out interactive humans. It's still an experimental
feature we're playing around with, but we found we had to throttle
the build farm.
Finally we need a lot of database connections. Every sshd thread
needs at least 1 database connection during its work, but you need
more than that because we sometimes open a second connection to
fill a cache entry during a cache miss. If your database.poolLimit
is too low your server will be waiting around for an available
database connection.
A lot of these variables should be more automatic.
sshd.threads probably should be around 2x the number of CPUs you
have, but may go higher if you have a lot of remote WAN based
connections.
database.poolLimit should be at least 2x sshd.threads.
core.packedGitLimit should be at least 50% of your on disk usage
for your fully packed repositories. For android that's at least in
the 4 GiB range these days, so ~2 GiB core.packedGitLimit. But the
more memory you can spare here, the better off your server will be.
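Those rules of thumb can be written down as a small calculation. This is only a sketch of the arithmetic above; `suggest` and its `wan_factor` parameter are hypothetical names, and `wan_factor` is an assumed knob for "go higher if you have a lot of remote WAN based connections", not a Gerrit setting.

```python
from dataclasses import dataclass

GIB = 1024 ** 3


@dataclass
class Suggestion:
    sshd_threads: int
    database_pool_limit: int
    packed_git_limit_bytes: int


def suggest(cpus: int, packed_repo_bytes: int, wan_factor: float = 1.0) -> Suggestion:
    """Apply the rules of thumb: threads ~ 2x CPUs, pool >= 2x threads,
    packedGitLimit >= 50% of the fully packed on-disk size."""
    threads = max(1, round(2 * cpus * wan_factor))
    return Suggestion(
        sshd_threads=threads,
        database_pool_limit=2 * threads,      # a floor, not a target
        packed_git_limit_bytes=packed_repo_bytes // 2,  # also a floor
    )
```

For a quad-core server with about 4 GiB of packed repositories, `suggest(4, 4 * GIB)` gives 8 threads, a pool limit of 16 and a 2 GiB packedGitLimit; with `wan_factor=3.0` it gives 24 threads and a pool floor of 48, which the poolLimit of 64 shown earlier comfortably exceeds.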
> I noticed this page on the wiki:
> http://groups.google.com/group/repo-discuss/web/repository-repacking
>
> How safe is it to run that repack script?
It's safe. I run it on a live server all of the time, without
regard for clients accessing the system. JGit (the git library
under Gerrit Code Review) knows how to deal with a repository that
was repacked in the middle of an access, assuming you are on a
proper POSIX filesystem.
If you are on Windows, it may get hairy to do a repack while the
server is accessing the repository. JGit leaves files open while
it's using them. Windows won't allow the `git gc` or `git repack`
program to delete a file while it's open by another process. There is
no way to force JGit to close a file... you have to restart the
Gerrit Code Review server to make it release them.
> Or is it safe to run
> while others may be in the midst of using the repos? I ask because our
> repo is used by teams in different timezones and there is no real
> off-peak period. I guess the same is true about the gerrit instance
> used by the android team.
Yup.
You should be fine. We run a repack (that wiki script) every night.
Our users don't even notice.
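For readers without access to the wiki page, a nightly job along these lines could look roughly like the sketch below. This is not the wiki script; it is a minimal illustration that assumes bare repositories live in directories ending in `.git` under one base path, and it uses plain `git gc --quiet`. The function names and the base path in the usage note are hypothetical.

```python
import os
import subprocess


def find_bare_repos(base: str) -> list:
    """Return directories under `base` that look like bare Git repositories."""
    repos = []
    for root, dirs, _files in os.walk(base):
        for name in list(dirs):
            path = os.path.join(root, name)
            if name.endswith(".git") and os.path.isdir(os.path.join(path, "objects")):
                repos.append(path)
                dirs.remove(name)  # do not descend into the repository itself
    return sorted(repos)


def gc_command(repo: str) -> list:
    """Command line used to repack one repository."""
    return ["git", f"--git-dir={repo}", "gc", "--quiet"]


def repack_all(base: str) -> None:
    """Run `git gc` on every repository found, one at a time."""
    for repo in find_bare_repos(base):
        subprocess.run(gc_command(repo), check=True)
```

Calling `repack_all("/path/to/gerrit/git")` from a nightly cron job would then repack every repository in turn; the Windows caveat above about open files still applies.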
You received this message because you are subscribed to the Google Groups "Repo and Gerrit Discussion" group.
To unsubscribe from this group and stop receiving emails from it, send an email to repo-discuss...@googlegroups.com.