Gerrit replication slowness (with ~100K refs)

Martin Fick

Mar 26, 2012, 7:03:40 PM
to repo-d...@googlegroups.com, Shawn Pearce
Shawn,

I have done some further investigations into our gerrit
replication performance problems with some encouraging
results. I could use some input on potential directions to
proceed.

We are replicating to 7 slaves via the git protocol to git-
daemons running on our slaves. When we replicate our
kernel/msm repository on our production servers with 24
cores, each msm replication thread starts chewing up 100%
CPU on a core. With 7 threads for one change, our CPU
utilization rises to 700%; with 3 changes we are almost
maxing out all 24 cores. Each thread ends up taking quite a
while to complete: typically around 10 mins; sometimes, when
there are many changes being replicated, up to an hour each.

As we discussed on Friday on IRC, most of the time seems to
be spent in 2 places in jgit. From what you said, I think I
understood that jgit is trying to determine what the remote
mirror already has in order to decide what to put in the
pack file it sends. It sounds like it is looking through all
the commits on all the refs which it knows the remote has,
in order to reduce the list of commits which it needs to
send to the remote. This last piece, I guess, is an
optimization designed to reduce network bandwidth at the
expense of some CPU. As it turns out, that may be quite a
bit of CPU!

It sounds like you may be aware of an algorithmic problem
which makes this take a lot of CPU, and perhaps you already
have ideas of how to fix this?

If not, I thought about this problem a bit and think that
perhaps this optimization may not really be appropriate for
Gerrit's use case anyway? Specifically, this optimization
seems to make sense when you think of a traditional git
development model where you may have a few branches with
long histories. But in the gerrit model, instead of having
depth, when considering changes we have breadth. So, while
it is possible that knowing that a remote mirror already has
a specific gerrit change might save some network IO, in most
cases this is likely not very significant. Perhaps searching
a ref space which may be on the order of 100K refs just to
possibly avoid sending one or two commits isn't really worth
it? Arguably, this logic may apply to tags also? Even in the
normal git development model, tags are unlikely to point to
commits not already on a branch, so they are unlikely to
help much either?

With this logic in mind, I set out to test my theories. I
set up a test replication on an unloaded Gerrit host to 7
different copies of kernel/msm on an unloaded mirror. When
pushing a change to the Gerrit host, 7 threads would start
replicating, and they would each last on the order of 5:30
mins; not horrible, but not good. If this system were
loaded, we would probably see times like those on our
production system.

What I did next was delete all the tags (~6.5K) from the
mirrors. This took my replication times down into the ~4:30
min range. Next, I deleted all the changes (~100K) from the
mirrors. At this point the replication threads completed
right away! So, interestingly enough, this shows that the
high ref counts on the remotes are what is causing all the
CPU usage in Gerrit.

Obviously, we need to keep the changes on the remotes, but
perhaps Gerrit (jgit) should be changed to simply not
consider them (or the tags) when pushing? This proposal
feels a bit hackish, but it also makes sense if the
algorithm cannot be improved. Do you think that you will
likely be able to improve jgit's algorithm to the point
where this is no longer a problem? What about 10 years from
now with 1M refs? :) Will it still make sense to be
scanning the breadth of such a repo? Is this something that
perhaps should be configurable with a config option?

Thanks for your input,

-Martin

---------------------------------------------
Employee of Qualcomm Innovation Center, Inc. which is a
member of Code Aurora Forum

Shawn Pearce

Mar 27, 2012, 10:06:24 AM
to Martin Fick, repo-d...@googlegroups.com
On Mon, Mar 26, 2012 at 19:03, Martin Fick <mf...@codeaurora.org> wrote:
> I have done some further investigations into our gerrit
> replication performance problems with some encouraging
> results.  I could use some input on potential directions to
> proceed.
>
> We are replicating to 7 slaves via the git protocol to git-
> daemons running on our slaves.  When we replicate our
> kernel/msm repository on our production servers with 24
> cores, each msm replication thread starts chewing up 100%
> CPU on a core.  With 7 threads for one change, our CPU
> utilization rises to 700%; with 3 changes we are almost
> maxing out all 24 cores.  Each thread ends up taking quite a
> while to complete: typically around 10 mins; sometimes, when
> there are many changes being replicated, up to an hour each.

:-(

> As we discussed on Friday on IRC, most of the time seems to
> be spent in 2 places in jgit.  From what you said, I think I
> understood that jgit is trying to determine what the remote
> mirror already has in order to decide what to put in the
> pack file it sends.  It sounds like it is looking through all
> the commits on all the refs which it knows the remote has,
> in order to reduce the list of commits which it needs to
> send to the remote.  This last piece, I guess, is an
> optimization designed to reduce network bandwidth at the
> expense of some CPU.  As it turns out, that may be quite a
> bit of CPU!

Yes. Part of it is a bug in the version of JGit you appear
to be running; JGit is using a List<ObjectId> when it really
means a Set<ObjectId>, because it does a lot of contains
tests against that collection. We fixed this in later
versions of JGit to use a Set here, which may improve
performance for replication. Were you able to try this?
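
Roughly, the cost difference looks like this (a hedged
sketch, not JGit's actual code; ObjectId is JGit's, the rest
is invented for illustration):

  import java.util.List;
  import java.util.Set;

  import org.eclipse.jgit.lib.ObjectId;

  class ContainsCost {
    // List.contains() walks element by element: O(N) per
    // test, so N tests against ~100K advertised refs cost
    // O(N^2) overall.
    static boolean haveInList(List<ObjectId> have, ObjectId id) {
      return have.contains(id);
    }

    // HashSet.contains() hashes once: O(1) expected per test.
    // ObjectId implements hashCode()/equals(), so a Set is safe.
    static boolean haveInSet(Set<ObjectId> have, ObjectId id) {
      return have.contains(id);
    }
  }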

> It sounds like you may be aware of an algorithmic problem
> which makes this take a lot of CPU, and perhaps you already
> have ideas of how to fix this?

There are many problems with this code. One of them was the List->Set
change that already happened in more recent versions of JGit.

Another is that the priority queue used to sort commits by date is
just a List with a linear O(N) insertion. This turns into an O(N^2)
operation as JGit tries to process each relevant commit... which for
Gerrit includes all changes the remote already has.
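
To make that concrete, a linear sorted insertion looks
something like this (a sketch, not the real DateRevQueue;
all names invented):

  class DateSortedQueue {
    static class Node {
      final long when; // commit time; newest kept first
      Node next;
      Node(long when) { this.when = when; }
    }

    private Node head;

    void add(long commitTime) {
      Node n = new Node(commitTime);
      Node prev = null, cur = head;
      // Walk up to N links to find the insertion point, so
      // enqueuing N commits costs O(N^2) overall.
      while (cur != null && cur.when >= n.when) {
        prev = cur;
        cur = cur.next;
      }
      n.next = cur;
      if (prev == null)
        head = n;
      else
        prev.next = n;
    }
  }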

And another is the fact that we even look at all of the refs/changes/*
names from the remote...

> If not, I thought about this problem a bit and think that
> perhaps this optimization may not really be appropriate for
> Gerrit's use case anyway?  Specifically, this optimization
> seems to make sense when you think of a traditional git
> development model where you may have a few branches with
> long histories.  But in the gerrit model, instead of having
> depth, when considering changes we have breadth.  So, while
> it is possible that knowing that a remote mirror already has
> a specific gerrit change might save some network IO, in most
> cases this is likely not very significant.  Perhaps searching
> a ref space which may be on the order of 100K refs just to
> possibly avoid sending one or two commits isn't really worth
> it?  Arguably, this logic may apply to tags also?  Even in the
> normal git development model, tags are unlikely to point to
> commits not already on a branch, so they are unlikely to
> help much either?

Yes. I agree with you here.

> With this logic in mind, I set out to test my theories.  I
> set up a test replication on an unloaded Gerrit host to 7
> different copies of kernel/msm on an unloaded mirror.  When
> pushing a change to the Gerrit host, 7 threads would start
> replicating, and they would each last on the order of 5:30
> mins; not horrible, but not good.  If this system were
> loaded, we would probably see times like those on our
> production system.
>
> What I did next was delete all the tags (~6.5K) from the
> mirrors.  This took my replication times down into the ~4:30
> min range.  Next, I deleted all the changes (~100K) from the
> mirrors.  At this point the replication threads completed
> right away!  So, interestingly enough, this shows that the
> high ref counts on the remotes are what is causing all the
> CPU usage in Gerrit.

Yes. It's because of the way the pushing server is
considering all of those refs/changes/ names. It has to open
their objects, push them into a priority queue using an
O(N^2) algorithm, and then walk through history until it
figures out the priority queue is "empty" and shouldn't be
processed further. This is not fast for 100K commits.

> Obviously, we need to keep the changes on the remotes, but
> perhaps Gerrit (jgit) should be changed to simply not
> consider them (or the tags) when pushing?

So actually... this is already what Gerrit Code Review does for its clients.

When a client connects to Gerrit for the purposes of pushing, Gerrit
*hides* the refs/changes/ names from the client and does not show
them. This allows the client to avoid considering those commits,
enabling faster push. Unfortunately this isn't available between a
Gerrit master and a Gerrit slave, because the slave doesn't accept
pushes. And it really isn't available between a Gerrit master and a
git:// daemon or stock Git, because those can't filter the
refs/changes/ names away from the client.

It would be tricky to teach JGit and Gerrit to have the Gerrit master
ignore the refs/changes/ names from the peer during push. This isn't
normally something that gets done in the Git wire protocol. But it
might be worth doing.
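
For reference, the client-facing hiding amounts to a ref
filter on the serving side. A minimal sketch using JGit's
RefFilter interface (an illustration only, not Gerrit's
actual VisibleRefFilter):

  import java.util.HashMap;
  import java.util.Map;

  import org.eclipse.jgit.lib.Ref;
  import org.eclipse.jgit.transport.RefFilter;

  class HideChangesFilter implements RefFilter {
    public Map<String, Ref> filter(Map<String, Ref> refs) {
      Map<String, Ref> visible = new HashMap<String, Ref>();
      for (Map.Entry<String, Ref> e : refs.entrySet()) {
        // Drop refs/changes/* from the advertisement so the
        // peer never has to consider those commits.
        if (!e.getKey().startsWith("refs/changes/"))
          visible.put(e.getKey(), e.getValue());
      }
      return visible;
    }
  }

A server installs this via UploadPack.setRefFilter(); the
problem above is precisely that a stock git:// daemon has no
such hook, and filtering the remote's advertisement on the
pushing side isn't part of the wire protocol.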

>  This proposal
> feels a bit hackish, but it also makes sense if the
> algorithm cannot be improved.  Do you think that you will
> likely be able to improve jgit's algorithm to the point
> where this is no longer a problem?

Maybe. I might be able to try an alternative algorithm where
JGit first considers only the remote's refs/heads/ and then,
after it has built up the list of what it wants to send in a
map, compares that against the other references we had
previously ignored. If we find another reference might have
been relevant, reprocess the graph with those other
references considered in addition to the relevant refs/heads/
names. This actually should get us a good approximation most
of the time, without the expense of needing to process the
refs/changes/ or refs/tags/ names.

There will be cases where this algorithm fails: the client
prepares too much initially, then discovers it is about to
over-transmit and has to redo its filtering.
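
In outline, the idea might look like this hedged sketch (all
names invented; walkWanted() stands in for the real history
walk):

  import java.util.HashSet;
  import java.util.Map;
  import java.util.Set;

  import org.eclipse.jgit.lib.ObjectId;

  abstract class TwoPassPlanner {
    // Stand-in for the real object walk: returns what must
    // be packed, given what the remote is assumed to have.
    abstract Set<ObjectId> walkWanted(Set<ObjectId> want,
        Set<ObjectId> have);

    Set<ObjectId> plan(Set<ObjectId> want,
        Map<String, ObjectId> remoteRefs) {
      Set<ObjectId> heads = new HashSet<ObjectId>();
      Set<ObjectId> others = new HashSet<ObjectId>();
      for (Map.Entry<String, ObjectId> e : remoteRefs.entrySet()) {
        if (e.getKey().startsWith("refs/heads/"))
          heads.add(e.getValue());
        else
          others.add(e.getValue());
      }
      // Pass 1: pretend the remote only has its branches.
      Set<ObjectId> pack = walkWanted(want, heads);
      // Pass 2: only if an ignored ref points at something we
      // planned to send is the expensive full walk worth redoing.
      for (ObjectId id : others) {
        if (pack.contains(id)) {
          Set<ObjectId> all = new HashSet<ObjectId>(heads);
          all.addAll(others);
          return walkWanted(want, all);
        }
      }
      return pack;
    }
  }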

>  What about 10 years from
> now with 1M refs? :)  Will it still make sense to be
> scanning the breadth of such a repo?  Is this something that
> perhaps should be configurable with a config option?

1M refs is a problem. It just won't scale. I'm out of ideas on how to scale it.

I keep saying this, and we keep not doing it. What we really
have to do is take the refs/changes/ names for any closed
changes and move them into a parallel "historical"
repository. For example, if I have "foo/bar" as a project,
we could create "foo/bar/2010", "foo/bar/2011", and
"foo/bar/2012" repositories and move all refs/changes/ names
for any change that has been closed or abandoned into a
repository based on the year the change was first created.
This would allow the main repository to hold active changes
and perhaps recently closed changes, with the long-term
stuff being shuffled aside.

URLs for long term archived stuff would have to change, to reference
the long term repository.
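
As a small illustration of the naming only (hypothetical
helpers, not an implementation): Gerrit shards change refs
by the last two digits of the change number, and the archive
repository would simply be keyed by creation year:

  import java.util.Calendar;

  class ChangeArchiver {
    // refs/changes/<last 2 digits>/<change>/<patch set>
    static String changeRef(int change, int patchSet) {
      return String.format("refs/changes/%02d/%d/%d",
          change % 100, change, patchSet);
    }

    // e.g. a change created in 2011 in "foo/bar" archives
    // into the "foo/bar/2011" repository.
    static String archiveRepo(String project, Calendar created) {
      return project + "/" + created.get(Calendar.YEAR);
    }
  }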

Martin Fick

Mar 29, 2012, 1:01:33 PM
to Shawn Pearce, repo-d...@googlegroups.com
On Tuesday, March 27, 2012 08:06:24 am Shawn Pearce wrote:
> On Mon, Mar 26, 2012 at 19:03, Martin Fick
<mf...@codeaurora.org> wrote:
> Yes. Part of it is a bug in the version of JGit you
> appear to be running; JGit is using a List<ObjectId>
> when it really means a Set<ObjectId>, because it does a
> lot of contains tests against that collection. We fixed
> this in later versions of JGit to use a Set here, which
> may improve performance for replication. Were you able
> to try this?

Yes, I tried this yesterday. It took the replication time
down from 5:30 min to 3:20 min, so it helped some.

> Another is that the priority queue used to sort commits
> by date is just a List with a linear O(N) insertion.
> This turns into an O(N^2) operation as JGit tries to
> process each relevant commit... which for Gerrit
> includes all changes the remote already has.

OK, I messed with this. By using a java LinkedList (for
sorting simplicity; why did you implement your own linked
list here?) and not sorting on insertion, but rather only
before any retrievals (keeping a sorted flag), I was able to
reduce these times considerably still. Now it takes about
1:10 min to replicate!!! I will experiment with a few other
options too: ArrayList, Linked + Array combo...
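
In sketch form, the experiment amounts to this (a rough
reconstruction, not the actual patch):

  import java.util.Collections;
  import java.util.Comparator;
  import java.util.LinkedList;

  import org.eclipse.jgit.revwalk.RevCommit;

  class LazyDateQueue {
    private final LinkedList<RevCommit> list =
        new LinkedList<RevCommit>();
    private boolean sorted = true;

    void add(RevCommit c) {
      list.add(c); // O(1) append, no per-insert scan
      sorted = false;
    }

    RevCommit next() {
      if (!sorted) {
        // One O(N log N) sort amortized over many adds. The
        // sort is stable, so equal dates keep FIFO order.
        Collections.sort(list, new Comparator<RevCommit>() {
          public int compare(RevCommit a, RevCommit b) {
            // newest (largest commit time) first
            if (a.getCommitTime() != b.getCommitTime())
              return a.getCommitTime() > b.getCommitTime() ? -1 : 1;
            return 0;
          }
        });
        sorted = true;
      }
      return list.isEmpty() ? null : list.removeFirst();
    }
  }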

Also, our upload times (warm) were on the order of 2 mins
with stock 2.3-rc0, and with the jgit mods mentioned above
they are now around 11s!

Maybe these numbers will help you figure out a good way to
move forward in improving jgit (or maybe I'll even submit a
patch to jgit, scary). Do you have any opposition to using
standard java.util Lists in DateRevQueue?

Thanks for your advice so far,

-Martin

Martin Fick

Mar 29, 2012, 2:50:56 PM
to repo-d...@googlegroups.com, Shawn Pearce
On Thursday, March 29, 2012 11:01:33 am Martin Fick wrote:
>
> OK, I messed with this. By using a java LinkedList (for
> sorting simplicity; why did you implement your own linked
> list here?) and not sorting on insertion, but rather only
> before any retrievals (keeping a sorted flag), I was able
> to reduce these times considerably still. Now it takes
> about 1:10 min to replicate!!! I will experiment with a few
> other options too: ArrayList, Linked + Array combo...

After further investigation, that 1:10 min is mostly spent
at 100% CPU (one core) on the mirror. On the gerrit side,
this fix seems to have brought the heavy CPU usage down to
only about 4s! So I don't think that the Gerrit (or jgit)
side needs a lot of further optimization on this (not for
100K refs; maybe at 1M, but who knows by then).


As a note, the git CPU is being used by:

git rev-list --verify-objects --stdin --not --all

Since these are one-line changes that I am pushing, I
suspect that git can still be improved to not take 1 min to
receive them. I am using git daemon with git 1.7.8.3 on the
mirrors.


> Also, our upload times (warm) were on the order of 2 mins
> with stock 2.3-rc0, and with the jgit mods mentioned
> above they are now around 11s!

With an ArrayList, these seem to be a bit closer to ~10s
(11s for LinkedList). Replication is about the same with an
ArrayList.

Dave Borowitz

Mar 29, 2012, 3:45:32 PM
to repo-d...@googlegroups.com, Shawn Pearce
So, we're implementing a priority queue. Presumably we can't use java.util.PriorityQueue because ties need to be broken in FIFO order. Would it be too much overhead to store (commit, counter) in a PriorityQueue, and write the comparator to break ties using the lowest counter?
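
Something like this hedged sketch (hypothetical names, not a
real patch):

  import java.util.PriorityQueue;

  import org.eclipse.jgit.revwalk.RevCommit;

  class CountedDateQueue {
    private static class Entry implements Comparable<Entry> {
      final RevCommit commit;
      final long seq; // insertion order, breaks ties FIFO

      Entry(RevCommit commit, long seq) {
        this.commit = commit;
        this.seq = seq;
      }

      public int compareTo(Entry o) {
        int a = commit.getCommitTime();
        int b = o.commit.getCommitTime();
        if (a != b)
          return a > b ? -1 : 1; // newest first
        // PriorityQueue promises nothing for ties, so the
        // counter restores FIFO order.
        return seq < o.seq ? -1 : seq > o.seq ? 1 : 0;
      }
    }

    private final PriorityQueue<Entry> q =
        new PriorityQueue<Entry>();
    private long counter;

    void add(RevCommit c) {
      q.add(new Entry(c, counter++));
    }

    RevCommit next() {
      Entry e = q.poll();
      return e == null ? null : e.commit;
    }
  }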

Dave Borowitz

Mar 30, 2012, 10:45:44 AM
to repo-d...@googlegroups.com, Shawn Pearce
I wrote a proof-of-concept implementation using PriorityQueue:

Martin tested it and it was noticeably slower than his simple list-with-unsorted-bit implementation. Presumably this is some combination of the extra object overhead and the relative amount of resorting needed in a typical DateRevQueue workload.

Shawn Pearce

Mar 30, 2012, 1:05:02 PM
to Dave Borowitz, repo-d...@googlegroups.com
On Thu, Mar 29, 2012 at 15:45, Dave Borowitz <dbor...@google.com> wrote:
> On Thursday, March 29, 2012 10:01:33 AM UTC-7, MartinFick wrote:
>> On Tuesday, March 27, 2012 08:06:24 am Shawn Pearce wrote:
>> > Another is that the priority queue used to sort commits
>> > by date is just a List with a linear O(N) insertion.
>> > This turns into an O(N^2) operation as JGit tries to
>> > process each relevant commit... which for Gerrit
>> > includes all changes the remote already has.
>>
>> OK, I messed with this. By using a java LinkedList (for
>> sorting simplicity; why did you implement your own linked
>> list here?) and not sorting on insertion, but rather only
>> before any retrievals (keeping a sorted flag), I was able
>> to reduce these times considerably still.  Now it takes
>> about 1:10 min to replicate!!!  I will experiment with a
>> few other options too: ArrayList, Linked + Array combo...
>
> So, we're implementing a priority queue. Presumably we can't use
> java.util.PriorityQueue because ties need to be broken in FIFO order. Would
> it be too much overhead to store (commit, counter) in a PriorityQueue, and
> write the comparator to break ties using the lowest counter?

java.util.PriorityQueue had too much object overhead, and we wanted to
break ties in FIFO order. Making our own wrapper object that also
carried the counter and was stored into the PriorityQueue's own node
object was just going to be a lot of garbage for a simple priority
queue.

Martin Fick

Mar 30, 2012, 7:10:34 PM
to repo-d...@googlegroups.com, Dave Borowitz, Shawn Pearce
On Thursday, March 29, 2012 12:50:56 pm Martin Fick wrote:
> On Thursday, March 29, 2012 11:01:33 am Martin Fick wrote:
> After further investigation, that 1:10 min is mostly spent
> at 100% CPU (one core) on the mirror. On the gerrit side,
> this fix seems to have brought the heavy CPU usage down
> to only about 4s! So I don't think that the Gerrit (or
> jgit) side needs a lot of further optimization on this
> (not for 100K refs; maybe at 1M, but who knows by
> then).

So while heavy CPU for 4s no longer bothered me, upload
times of 11s still did. Also, I noticed that creating a new
branch also takes about 11s for us in kernel/msm. After
further investigation, I noticed that both of these paths do
the equivalent of:

  for (Ref r : repo.getAllRefs().values())
      walk.markUninteresting(walk.parseAny(r.getObjectId()));

and parsing all these objects takes a significant piece of
the time of those commands. It might only be ~5s, but this
time impacts user perception fairly dramatically. So I was
wondering: would it be a bad idea to cache RevWalks? I
noticed that RevWalks have an internal object cache, and if
reset() is called it seems OK to reuse them?

So, if we did cache RevWalks, it would be a classic memory
vs. time tradeoff. Since we have plenty of memory, I would
welcome such a tradeoff for most use cases. I suspect that
perhaps after clones it could be problematic to keep that
much data around? Although, who knows, perhaps caching some
of it could even help prevent the gc from kicking in after
clones?

To see if it would be worth it, I created a small test case:
I added a place to store a RevWalk on a ProjectState object,
and I hacked ReceiveCommits to check whether one is there
for the current project and to store it back when done. It
seems to have helped: after the first push to kernel/msm,
which takes about ~10s, I am able to push consistently in
only ~3.5s!!! This sort of upload time would be phenomenal
to have again on our kernel/msm project!
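
In sketch form, the hack amounts to something like this (a
hypothetical cache class, not what ProjectState actually
looks like):

  import java.util.HashMap;
  import java.util.Map;

  import org.eclipse.jgit.lib.Repository;
  import org.eclipse.jgit.revwalk.RevWalk;

  class RevWalkCache {
    private final Map<String, RevWalk> walks =
        new HashMap<String, RevWalk>();

    // RevWalk is not thread-safe, so hand each walk to only
    // one request at a time by removing it from the map.
    synchronized RevWalk checkout(String project, Repository db) {
      RevWalk rw = walks.remove(project);
      return rw != null ? rw : new RevWalk(db);
    }

    synchronized void checkin(String project, RevWalk rw) {
      rw.reset(); // clears flags but keeps parsed objects
      walks.put(project, rw);
    }
  }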

So, is it reasonable to maybe build a caching infrastructure
for RevWalks? What would be a good approach to doing so:
using ehcache? Should they just be pushed onto a stack in
the ProjectState instead? What sort of configuration options
would be useful? Would there be an easy way to determine the
size of the current RevWalks (to perhaps discard the ones
which are too big)? Should a RevWalk gain a way to flush
some of its objects to trim it down to a reasonable size?
Should they be cacheable on disk (would this still
potentially be a gain)?

I look forward to a responsive kernel/msm! Thanks for any
ideas/direction,
