Gerrit garbage collection and bitmaps generation

Bassem Rabil

Jan 28, 2015, 2:12:45 PM
to repo-d...@googlegroups.com
Hi 
 
We have been using native git garbage collection since Gerrit 2.7, when we stopped using gerrit gc for our daily garbage collection. Back then we had issues running gerrit gc, especially with 2 Gerrit application backends sharing the same filesystem for repositories [1]. Our conclusion was that running gerrit gc on one backend updates that backend's in-memory information about the repository, but the other backend is not aware of the update and thus complains about a missing SHA1, expects a different SHA1, or gets an ACK/NAK error from jgit. Martin Fick described a workaround for the ACK/NAK error in this thread [2], which hacks jgit to get rid of the error. Native git now provides a way to repack the repository while generating the bitmaps [3].
 
My questions here:
 
1. I am assuming the bitmaps generated by git repack are compatible with jgit. Is anyone here using git repack to generate the bitmaps for git repositories hosted by Gerrit?
2. Is anyone here using gerrit gc without sporadic issues on a shared filesystem for repositories?
3. For those who currently use gerrit gc to generate the bitmaps: how often do you run it for busy repositories? Do you notice these sporadic ACK/NAK errors during fetching? Are you using a custom jgit version with any special hack? Did you stop using native git garbage collection entirely?
 
 

Bassem Rabil

Feb 2, 2015, 10:57:45 AM
to repo-d...@googlegroups.com
Any thoughts about generating git bitmaps? Does the silence imply that many in the Gerrit community have dropped them?

Patrick Renaud

Feb 5, 2015, 11:42:26 AM
to repo-d...@googlegroups.com
Hey guys, do we read by the absence of replies that NOBODY is using gerrit-gc out there? Have we all given up on the bitmaps?

Luca Milanesio

Feb 5, 2015, 11:50:08 AM
to Patrick Renaud, Repo and Gerrit Discussion
I’ve recently used bitmaps generated by C git (git repack -b) in Gerrit without issues :-)
I did not notice any problems … was I just lucky?

Luca.

Patrick Renaud

Feb 5, 2015, 2:57:19 PM
to repo-d...@googlegroups.com, pren...@gmail.com
OK, tx for the feedback Luca. 

For very simple cases it works for us too, but we've had a lot of issues with gerrit-gc during concurrent operations against large busy repos in the past, to the point where we had to give up on them and revert back to standard garbage collection with a git client and without bitmaps.

Bassem is now working on bringing gerrit-gc bitmaps back into the picture, considering the much needed performance boost they provided on clone/fetch operations. He's trying to isolate the pathological cases we had and see if those still exist today on the latest Gerrit. And if it's broken then Hugo will try to fix it. But before investing too much effort in this we are trying to get a feel for how bad this is for the rest of the Gerrit community. Are we alone in having had issues (I don't think so)? Or have we all secretly given up on the bitmaps? If so, maybe we need to join forces and fix the issues all together, if issues persist.

This is why Bassem sent his mail in the first place, trying to find out if there is still anyone out there making (successful) use of gerrit-gc bitmaps. 

Voilà!

Matthias Sohn

Feb 5, 2015, 7:56:43 PM
to Patrick Renaud, Repo and Gerrit Discussion
On our internal production instance with >10k users we run git-core's gc with bitmap generation enabled
and this works nicely.

We run gerrit gc every night on our cloud installations of Gerrit, and so far we have had no problems with that.
Though there we don't have much traffic yet, and at the moment we use Gerrit as a Git server only, without
code review, since our custom tenant separation doesn't yet cover the REST API and UI.

If you experience problems with multiple instances working on the same file system, you may try
the new JGit configuration option core.trustfolderstat [1], which we introduced to work around a problem we
faced when multiple JGit-based processes look at the same pack files on NFS. One of the processes
created a new pack and the other one didn't see it, since the pack folder's modification
timestamp didn't change immediately, probably due to some lag caused by NFS.
Setting core.trustfolderstat to false ensures that JGit always checks for new pack files and doesn't
trust the pack directory's last-modified timestamp to find out whether its content changed. This fixed
our problem on NFS.

[1] https://git.eclipse.org/r/#/c/36058/ available since JGit 3.6.0

--
Matthias

Bassem Rabil

Feb 5, 2015, 8:18:37 PM
to repo-d...@googlegroups.com, pren...@gmail.com
Thanks Matthias for pointing out this fix for NFS issues. This option, core.trustfolderstat, should be set in gerrit.config, right?

Regards
Bassem

Matthias Sohn

Feb 6, 2015, 10:08:37 AM
to Bassem Rabil, Repo and Gerrit Discussion, Patrick Renaud
If you set it in the git config of a repository it should work.
I am not sure whether Gerrit forwards it to JGit if it is configured in gerrit.config.
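
A minimal sketch of the per-repository setting described here: the function below writes the option into one repository's git config. JGit spells the key `core.trustfolderstat` (singular). The function name and the path in the usage line are hypothetical.

```shell
# Tell JGit not to trust the pack directory's modification time
# for one repository, so it always rescans for new pack files.
# $1: path to the repository's git directory (for a bare repo, the repo itself).
disable_trust_folder_stat() {
    git --git-dir="$1" config core.trustfolderstat false
}

# Usage (hypothetical path):
#   disable_trust_folder_stat /srv/gerrit/git/my-project.git
```

The value ends up in the repository's own `config` file, which is where Matthias expects JGit to pick it up.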

Dave Borowitz

Feb 6, 2015, 7:48:27 PM
to Matthias Sohn, Bassem Rabil, Repo and Gerrit Discussion, Patrick Renaud
For context, "gerrit gc" with bitmaps predates bitmap support in upstream C git by like a year. Back then, the tradeoff was, in order to get bitmaps, you _had_ to run gc in the gerrit process, or at least on the same machine. With all the performance problems that implies. (We at Google get around that by running GC in a completely different pool of worker machines, but without a fast shared filesystem that's not really an option for most people.)

Now if C git can generate JGit-compatible bitmaps, you can probably go back to regular git gc without much problem.

Bassem Rabil

Feb 16, 2015, 9:30:19 AM
to repo-d...@googlegroups.com, matthi...@gmail.com, bassem.ra...@ericsson.com, pren...@gmail.com
We are seeing odd performance with bitmaps generated by native git. We tried different repacking options with git 2.1 and 2.2, and the clone time almost doubled with native-git bitmaps, compared to a 30-40% clone time reduction with gerrit gc or jgit garbage collection.

For example, a clone --bare of a repository without bitmaps takes 11 mins. After running gerrit/jgit gc, the bare clone takes about 7 mins, as shown below:

$ time git clone --bare ssh://localhost:29418/repository-with-jgit-gc jgit-gc-repo
Cloning into bare repository 'jgit-gc-repo'...
remote: Counting objects: 2, done
remote: Finding sources: 100% (2/2)
remote: Total 799178 (delta 0), reused 799178 (delta 0)
Receiving objects: 100% (799178/799178), 647.80 MiB | 17.22 MiB/s, done.
Resolving deltas: 100% (546459/546459), done.
Checking connectivity... done.

real 7m10.464s

In contrast, after running git gc with the following option set to generate the bitmaps (we also tried different repack/pack options and saw the same degraded performance):
[pack]
        writeBitmaps = true


the bare clone takes 21 mins as shown below:


$ time git clone --bare ssh://localhost:29418/repository-with-git-gc-bmp git-gc-bmp-repo
Cloning into bare repository 'git-gc-bmp-repo'...
remote: Counting objects: 705567, done
remote: Finding sources: 100% (799049/799049)
remote: Getting sizes: 100% (341606/341606)
remote: Compressing objects:  99% (7437047/7437111)  =====> This step of the clone does not show in case of gerrit/jgit gc'ed repo.
remote: Total 799049 (delta 453104), reused 654973 (delta 423286)
Receiving objects: 100% (799049/799049), 880.10 MiB | 14.44 MiB/s, done.
Resolving deltas: 100% (500957/500957), done.
Checking connectivity... done.

real 21m12.827s


It looks to me as though bitmaps generated by native git slow down cloning of the repository. I came across an older discussion in the community [1] about slow clone times for repositories imported from the file system.

Did anyone experience such performance degradation using native git bitmaps? What options do you use for bitmap generation with native git?

My impression is that this might be caused by the extra compression step in the clone. Does anyone know why the compression step shows up for the repository with native git bitmaps, and not for those with gerrit gc?




Thanks and Regards
Bassem Guendy

Luca Milanesio

Feb 16, 2015, 12:49:17 PM
to Bassem Rabil, repo-d...@googlegroups.com, matthi...@gmail.com, pren...@gmail.com
Hi Bassem,
I recently ran some benchmarks using Git and JGit with and without bitmaps, and my results were slightly different.

I should say that my bitmaps were always generated with (C) Git and used by both JGit and Git.

The combinations, ordered from slowest to fastest, were:

Slowest >
Git without Bitmaps
JGit without Bitmaps
Git with Bitmaps
JGit with Bitmaps
Fastest >

Tests were made with a very large repo … actually the Linux kernel :-)

I would need to repeat the test with JGit generated bitmaps: you mentioned you believe that generated format is slightly different?

Luca.

Bassem Rabil

Feb 16, 2015, 1:02:22 PM
to repo-d...@googlegroups.com, bassem.ra...@ericsson.com, matthi...@gmail.com, pren...@gmail.com
The generated bitmaps at my side have significantly different size, e.g. for the sample repository I reported in my previous post
5.1M for the bitmaps generated by native git compared to 45M for the bitmaps generated using gerrit/jgit gc.

I am assuming this indicates a different format and content for the bitmaps.

What options do you currently use for generating the bitmaps with native git, both on the command line and in the repository configuration?

Thanks and Regards
Bassem Guendy

Luca Milanesio

Feb 16, 2015, 1:12:43 PM
to Bassem Rabil, repo-d...@googlegroups.com, matthi...@gmail.com, pren...@gmail.com
I’ve used a simple:
git repack -ab

Luca.
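
For readers who want to reproduce this, here is a hedged sketch of the native-git bitmap generation discussed in this thread. It wraps `git repack -ab` in a function and lists the resulting `.bitmap` file. The `-d` and `-q` flags and the function name are additions not taken from the thread.

```shell
# Repack everything into a single pack and write a reachability bitmap.
# $1: path to the git directory.
repack_with_bitmap() {
    # -a: pack all reachable objects into one pack
    # -d: remove packs made redundant by the new one (an addition, not from the thread)
    # -b: write a .bitmap index next to the new pack
    # -q: suppress progress output
    git --git-dir="$1" repack -a -d -b -q &&
        ls "$1"/objects/pack/*.bitmap
}

# Usage (hypothetical path):
#   repack_with_bitmap /srv/gerrit/git/my-project.git
```

Current git documentation lists `pack.writeBitmaps` as a deprecated synonym for `repack.writeBitmaps`; with either one set, a repack that puts everything into one pack should write the bitmap without an explicit `-b`.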

Shawn Pearce

Feb 16, 2015, 2:57:40 PM
to Bassem Rabil, repo-discuss, Matthias Sohn, Patrick Renaud
On Mon, Feb 16, 2015 at 10:02 AM, Bassem Rabil <bassem.ra...@ericsson.com> wrote:
> The generated bitmaps at my side have significantly different size, e.g. for the sample repository I reported in my previous post
> 5.1M for the bitmaps generated by native git compared to 45M for the bitmaps generated using gerrit/jgit gc.

That is interesting. Your repository has some weird structure to it that is making gerrit/jgit gc produce a lot more bitmaps.

My first guess is you have a lot of refs/heads/* branches? Each one of those gets at least one bitmap from a gerrit/jgit gc.

> I am assuming this indicates a different format and content for the bitmaps.

The formats are very close. git gc writes an extra index at the end of the bitmap file that should be ignored by gerrit/jgit.

As for clone times, what are the sizes of the pack files produced by the resulting gcs? The gerrit/jgit gc should write two pack files, one with the same name as the .bitmap file and another one that is smaller. This two pack format allows `git clone URL` to more or less just dump the bitmap's pack file to the client with little cpu on the server.

I don't think git gc creates two packs. I think it writes one pack. Which might force gerrit into doing more cpu work to slice the pack apart for a client.

Bassem Rabil

Feb 16, 2015, 3:32:23 PM
to repo-d...@googlegroups.com, bassem.ra...@ericsson.com, matthi...@gmail.com, pren...@gmail.com
Hi Shawn 

Here below more information about this repository. If native git cannot be tweaked to get the same pack format as jgit, I think we would go for using jgit to generate bitmaps.

Regards
Bassem

 
> That is interesting. Your repository has some weird structure to it that is making gerrit/jgit gc produce a lot more bitmaps.
>
> My first guess is you have a lot of refs/heads/* branches? Each one of those gets at least one bitmap from a gerrit/jgit gc.

This repository has almost 8k refs/heads/* references

>> I am assuming this indicates a different format and content for the bitmaps.
>
> The formats are very close. git gc writes an extra index at the end of the bitmap file that should be ignored by gerrit/jgit.
>
> As for clone times, what are the sizes of the pack files produced by the resulting gcs? The gerrit/jgit gc should write two pack files, one with the same name as the .bitmap file and another one that is smaller. This two pack format allows `git clone URL` to more or less just dump the bitmap's pack file to the client with little cpu on the server.
With jgit, the .pack file for this repository is 647 MB with a 22 MB .idx file, and the smaller pack is 413 MB with a 38 MB .idx. With native git gc, there appears to be a single pack file of 731 MB with a 59 MB .idx.

Shawn Pearce

Feb 16, 2015, 8:03:11 PM
to Bassem Rabil, repo-discuss, Matthias Sohn, Patrick Renaud
On Mon, Feb 16, 2015 at 12:32 PM, Bassem Rabil <bassem.ra...@ericsson.com> wrote:
> Hi Shawn
>
> Here below more information about this repository. If native git cannot be tweaked to get the same pack format as jgit, I think we would go for using jgit to generate bitmaps.
>
> Regards
> Bassem
>
>> That is interesting. Your repository has some weird structure to it that is making gerrit/jgit gc produce a lot more bitmaps.
>>
>> My first guess is you have a lot of refs/heads/* branches? Each one of those gets at least one bitmap from a gerrit/jgit gc.
>
> This repository has almost 8k refs/heads/* references

OK, that explains the size of the gerrit/jgit created bitmap file. With 8k branches it needs to produce at least 8,000 independent bitmaps to document the individual branches. This is necessary to support clients that ask for only one branch when they clone a repository. When clients ask for all branches the 8,000 bitmaps are quickly OR'd together in memory.

The more divergent the branches are from each other, the bigger the bitmaps have to be to describe them.

I haven't looked at the git gc implementation, but it sounds like they don't make a bitmap for every branch.
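
To illustrate the OR step in miniature: if each branch's bitmap is a bit mask with one bit per object in the pack, then the set of objects needed for a multi-branch clone is the bitwise OR of the per-branch masks. The toy below uses made-up 8-object masks and plain shell arithmetic. It is only an illustration and says nothing about how JGit actually stores or compresses its bitmaps.

```shell
# Toy reachability bitmaps: bit i set means object i is reachable from the branch.
# The three masks are invented for illustration.
branch_a=$((0x0F))   # objects 0-3
branch_b=$((0x3C))   # objects 2-5
branch_c=$((0x81))   # objects 0 and 7

# A client asking for all branches needs the union of the masks.
wanted=$(( branch_a | branch_b | branch_c ))

# Count the set bits to get the number of objects to send.
count_bits() {
    n=$1
    c=0
    while [ "$n" -gt 0 ]; do
        c=$(( c + (n & 1) ))
        n=$(( n >> 1 ))
    done
    echo "$c"
}

printf 'union mask: 0x%X, objects to send: %s\n' "$wanted" "$(count_bits "$wanted")"
# prints: union mask: 0xBF, objects to send: 7
```

A client asking for a single branch would use only that branch's mask, which is why a bitmap per branch is useful.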
 
>>> I am assuming this indicates a different format and content for the bitmaps.
>>
>> The formats are very close. git gc writes an extra index at the end of the bitmap file that should be ignored by gerrit/jgit.
>>
>> As for clone times, what are the sizes of the pack files produced by the resulting gcs? The gerrit/jgit gc should write two pack files, one with the same name as the .bitmap file and another one that is smaller. This two pack format allows `git clone URL` to more or less just dump the bitmap's pack file to the client with little cpu on the server.
> The pack files .pack  in case of jgit for this repository is 647 MB with 22MB .idx file and the smaller one is 413 MB with .idx of 38MB. In case of the native git gc, it looks to me there is one pack file of 731 MB and .idx of 59 MB.

Yuck. If the smaller one is 413M/38M you have a lot of content in this repository that is not reachable from a branch in refs/heads/*.

This is a clear case where git gc creating one file saves a lot of disk, as it can get better delta compression opportunities within the one file. This comes at a penalty during git clone where the server has to do more work to decompose the file into only the parts the client asked for.

With that one file from git gc you should see "Compressing ..." during a clone because the server has to decompress and recompress objects on the fly when they are delta compressed against things the client is not asking for. Again this is why gerrit/jgit gc does not do this and instead consumes disk by writing out a second file.
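
One way to check which layout a repository has, as a sketch: list each pack with its size and note whether a `.bitmap` file sits next to it. Going by the description in this thread, a gerrit/jgit gc should leave two packs with one of them carrying the bitmap, while native git gc leaves one. The function name is hypothetical.

```shell
# Print each pack file's size in bytes and whether it has a bitmap alongside.
# $1: path to the git directory.
list_packs() {
    for pack in "$1"/objects/pack/*.pack; do
        [ -e "$pack" ] || continue            # glob did not match: no packs
        size=$(wc -c < "$pack" | tr -d ' ')
        if [ -e "${pack%.pack}.bitmap" ]; then
            echo "$size bytes, bitmap: yes, $pack"
        else
            echo "$size bytes, bitmap: no, $pack"
        fi
    done
}

# Usage (hypothetical path):
#   list_packs /srv/gerrit/git/my-project.git
```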


>> I don't think git gc creates two packs. I think it writes one pack. Which might force gerrit into doing more cpu work to slice the pack apart for a client.

Bassem Rabil

Feb 16, 2015, 8:53:56 PM
to repo-d...@googlegroups.com, bassem.ra...@ericsson.com, matthi...@gmail.com, pren...@gmail.com
Thanks Shawn for your explanation and clarification. 

Regards
Bassem

On Monday, February 16, 2015 at 8:03:11 PM UTC-5, Shawn Pearce wrote:
> Yuck. If the smaller one is 413M/38M you have a lot of content in this repository that is not reachable from a branch in refs/heads/*.
 
You are right about the large number of refs that are not reachable from refs/heads/*. This repository has 110k references related to Gerrit changes, i.e. refs/changes/*. It is one of our hyperactive repositories and uses code review extensively.
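
Ref counts like the ones mentioned in this thread (almost 8k under refs/heads/* and 110k under refs/changes/*) can be obtained with a sketch like the following, which groups `git for-each-ref` output by its first two path components. The function name is hypothetical.

```shell
# Count refs per top-level namespace (refs/heads, refs/changes, refs/tags, ...).
# $1: path to the git directory.
count_refs_by_namespace() {
    git --git-dir="$1" for-each-ref --format='%(refname)' |
        awk -F/ '{ count[$1 "/" $2]++ } END { for (ns in count) print count[ns], ns }' |
        sort -rn
}

# Usage (hypothetical path):
#   count_refs_by_namespace /srv/gerrit/git/my-project.git
```

Each output line is a count followed by the namespace, largest first.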