JGit's GC can hang on small (and large) repositories


Doug Kelly

Nov 1, 2013, 6:35:11 PM
to repo-d...@googlegroups.com
Not sure how much this will hurt people here, but we just completed our Gerrit 2.7 upgrade this week and, with it, switched to gerrit gc. We had a few issues with large delta-compressed blobs blowing up JGit (thanks, Bassem!), but the other, weirder problem showed up on small 1-2 MB repos.

Basically, on a 32-core machine, I noticed JGit's GC would hang indefinitely; it appears to get stuck in an endless loop. I reported the issue upstream at https://bugs.eclipse.org/bugs/show_bug.cgi?id=420915, but wanted to alert everyone else in case you run into it too.

Also, to add to Bassem's earlier comments about repacking a repo so JGit doesn't end up on the "eternally slow" delta path: I found it was a good idea not only to set core.bigFileThreshold = 10m, but also to mark the file types of large objects as not delta-compressed. For example, if some Windows developer checked in a bunch of .pdb files, you can add "*.pdb -delta" to info/attributes and then run "git repack -a -d -f --window-memory 100m --max-pack-size 20m". Finally, kick off a gerrit gc, and all should be well (or, to test first in case things go south, run "jgit --git-dir . gc" with the all-in-one jgit shell script).
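A minimal sketch of those preparation steps as a POSIX shell function, assuming the path of a bare repository is passed in. The function name prepare_repo_for_gc is hypothetical, and the *.pdb pattern and size limits are just the example values from this post. The attribute line goes into info/attributes, the per-repository attributes file Git reads from $GIT_DIR; it is honored by C Git's repack (pack-objects), not by JGit.

```shell
# Hypothetical helper: prepare a bare repo so large binaries are not
# delta-compressed, then rewrite its packs with C Git.
prepare_repo_for_gc() {
    repo="$1"
    # Blobs larger than this are stored whole rather than delta-compressed.
    git --git-dir="$repo" config core.bigFileThreshold 10m || return 1
    # Mark the example binary type as non-deltable; append only once.
    mkdir -p "$repo/info" || return 1
    grep -qxF '*.pdb -delta' "$repo/info/attributes" 2>/dev/null ||
        echo '*.pdb -delta' >> "$repo/info/attributes"
    # -a: everything into new packs, -d: drop old packs,
    # -f: recompute deltas instead of reusing existing ones.
    git --git-dir="$repo" repack -a -d -f --window-memory 100m --max-pack-size 20m
}
```

For example, `prepare_repo_for_gc /srv/git/project.git` (a hypothetical path), then trigger gerrit gc for that project.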

To find what huge blobs are out there, I used:

Then I matched each blob to its filename using the answers here:
http://stackoverflow.com/questions/223678/which-commit-has-this-blob/12737458#12737458 (see the cmyers version near the bottom; I found its output the most helpful).
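One way to do both steps in a single pass (a sketch, not necessarily the command used above) is to pipe `git rev-list --objects --all`, which prints every reachable object along with a path for trees and blobs, into `git cat-file --batch-check` with a format string that adds the type and size. The function name largest_blobs and its default count of 20 are hypothetical, and it assumes a Git whose cat-file supports the `%(rest)` format atom. Each blob is shown under the first path rev-list found it at.

```shell
# Hypothetical helper: print "<size> <sha> <path>" for the largest blobs
# reachable from any ref, biggest first.
largest_blobs() {
    repo="$1"
    count="${2:-20}"
    # %(rest) carries the path that rev-list printed after each object name.
    git --git-dir="$repo" rev-list --objects --all |
        git --git-dir="$repo" cat-file \
            --batch-check='%(objecttype) %(objectsize) %(objectname) %(rest)' |
        awk '$1 == "blob"' |      # keep blobs only (drop commits and trees)
        cut -d' ' -f2- |          # drop the type column, keep the path intact
        sort -k1,1nr |            # numeric sort on size, descending
        head -n "$count"
}
```

For example, `largest_blobs /srv/git/project.git 10` (a hypothetical path) lists the ten biggest blobs together with a filename for each.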

After doing all of this, I *finally* got a set of repos I could garbage collect fully.

Hopefully these tips help someone out there...

--Doug

Matthias Sohn

Nov 1, 2013, 8:24:43 PM
to Doug Kelly, Repo and Gerrit Discussion
On Fri, Nov 1, 2013 at 11:35 PM, Doug Kelly <doug...@gmail.com> wrote:
> I found that not only was it a good idea to set core.bigFileThreshold = 10m... but also mark the file types of large objects to not use delta compression.  For example, say, some Windows developer checked in a bunch of .pdb files, you can add "*.pdb -delta" to info/gitattributes, and then run "git repack -a -d -f --window-memory 100m --max-pack-size 20m".

Setting attributes shouldn't make a difference, since JGit doesn't support Git attributes yet [1].


--
Matthias 