Help: getting rid of giant git packs

Peter Kasting

Nov 19, 2015, 4:36:47 PM
to Chromium-dev
I'm using the depot_tools git on Windows.

My drive is filled with large git pack files, many of which have old dates, totaling 55 GB.  I ran the following:

git gc --prune=now; git repack -a -d --depth=250 --window=250

...expecting to be left with just one pack file.  But I still seem to have lots.  Did I do something wrong?  How can I slim down my local pack files?

PK

Mike Frysinger

Nov 19, 2015, 4:41:47 PM
to Peter Kasting, Chromium-dev
you probably want to use -A, but keep in mind that your description as-is doesn't make sense.  "old" pack files aren't a thing purely based on timestamps.  they're incremental by default so having multiple packs in your objects tree is perfectly normal.
-mike

--
Chromium Developers mailing list: chromi...@chromium.org
View archives, change email options, or unsubscribe:
http://groups.google.com/a/chromium.org/group/chromium-dev

Peter Kasting

Nov 19, 2015, 4:44:38 PM
to Mike Frysinger, Chromium-dev
On Thu, Nov 19, 2015 at 1:39 PM, Mike Frysinger <vap...@chromium.org> wrote:
you probably want to use -A, but keep in mind that your description as-is doesn't make sense.  "old" pack files aren't a thing purely based on timestamps.  they're incremental by default so having multiple packs in your objects tree is perfectly normal.

The point of -a and -A is to not incrementally pack, and instead pack everything into a single pack (from the git-repack docs).  Does that not actually mean "a single pack on disk"?

As to -A vs. -a, it seems like the only difference there is that if I use -A I have to run an extra git gc pass after the repack.  Is that not true?

PK

Mike Frysinger

Nov 19, 2015, 5:05:14 PM
to Peter Kasting, Chromium-dev
On Thu, Nov 19, 2015 at 1:43 PM, Peter Kasting <pkas...@google.com> wrote:
On Thu, Nov 19, 2015 at 1:39 PM, Mike Frysinger <vap...@chromium.org> wrote:
you probably want to use -A, but keep in mind that your description as-is doesn't make sense.  "old" pack files aren't a thing purely based on timestamps.  they're incremental by default so having multiple packs in your objects tree is perfectly normal.

The point of -a and -A is to not incrementally pack, and instead pack everything into a single pack (from the git-repack docs).  Does that not actually mean "a single pack on disk"?

that isn't what your e-mail said.  it said "i saw packs with old timestamps, therefore i then ran git repack" with the implication that the packs, being old, were useless.  i'm pointing out that logic doesn't make sense by itself.

As to -A vs. -a, it seems like the only difference there is that if I use -A I have to run an extra git gc pass after the repack.  Is that not true?

-A will drop useless objects that are already in a pack.  -a won't do that.  since your stated goal is to minimize, it sounds like you want -A.
-mike

Peter Kasting

Nov 19, 2015, 5:13:08 PM
to Mike Frysinger, Chromium-dev
On Thu, Nov 19, 2015 at 2:03 PM, Mike Frysinger <vap...@chromium.org> wrote:
On Thu, Nov 19, 2015 at 1:43 PM, Peter Kasting <pkas...@google.com> wrote:
On Thu, Nov 19, 2015 at 1:39 PM, Mike Frysinger <vap...@chromium.org> wrote:
you probably want to use -A, but keep in mind that your description as-is doesn't make sense.  "old" pack files aren't a thing purely based on timestamps.  they're incremental by default so having multiple packs in your objects tree is perfectly normal.

The point of -a and -A is to not incrementally pack, and instead pack everything into a single pack (from the git-repack docs).  Does that not actually mean "a single pack on disk"?

that isn't what your e-mail said.  it said "i saw packs with old timestamps, therefore i then ran git repack" with the implication that the packs, being old, were useless.  i'm pointing out that logic doesn't make sense by itself.

Sorry, I was trying not to bore everyone with needless details.  It's more "I have more than half my drive full of dozens and dozens of packs, far larger than the repo size itself, and I need more space, so I wanted to repack, assuming that a lot of the existing objects were cruft".

As to -A vs. -a, it seems like the only difference there is that if I use -A I have to run an extra git gc pass after the repack.  Is that not true?

-A will drop useless objects that are already in a pack.  -a won't do that.  since your stated goal is to minimize, it sounds like you want -A.

I'm confused.  The docs say -a will leave useless objects in their old packs, and then -d will delete those old packs; whereas -A will leave useless objects loose and unreferenced, so -d will no longer delete them, and instead you need an additional git gc run to delete them.  Are the docs wrong?  Am I mistaken that -a -d should just delete all of these outright?
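For concreteness, the two flows the docs describe can be exercised end to end on a throwaway repo (a sketch; the demo repo stands in for a real checkout, and the identity settings are placeholders):

```shell
# Throwaway repo so the commands are safe to run end to end.
tmp=$(mktemp -d) && cd "$tmp" && git init -q demo && cd demo
git -c user.name=demo -c user.email=demo@example.com \
    commit -q --allow-empty -m "initial commit"

# Flow 1: -a writes everything reachable into one new pack, and -d then
# deletes the now-redundant old packs (any unreachable objects in them
# are simply dropped along the way).
git repack -a -d

# Flow 2: -A instead keeps unreachable packed objects as loose,
# unreferenced objects, so a follow-up gc/prune pass is needed to
# actually reclaim that space.
git repack -A -d
git gc --prune=now

# Either way, reachable history ends up in a single pack:
ls .git/objects/pack/*.pack | wc -l
```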

PK

Mike Frysinger

Nov 19, 2015, 5:24:34 PM
to Peter Kasting, Chromium-dev
your description looks accurate.  i thought -a did less work (based on previous invocations server side), but i guess i remembered wrong (based on the docs).
-mike

Peter Kasting

Nov 19, 2015, 6:58:09 PM
to Mike Frysinger, Chromium-dev
Hmm.  I think maybe the issue is that I have another Chromium checkout inside my webrtc checkout.  Unfortunately, running "git gc --prune=now" there eventually gives: "fatal: Out of memory? mmap failed: No error".  Despite this being on a machine with 64 GB of RAM.

The internet is not helping me much on this one :/

PK

Peter Kasting

Nov 19, 2015, 10:46:50 PM
to Mike Frysinger, Chromium-dev
On Thu, Nov 19, 2015 at 3:57 PM, Peter Kasting <pkas...@google.com> wrote:
Hmm.  I think maybe the issue is that I have another Chromium checkout inside my webrtc checkout.  Unfortunately, running "git gc --prune=now" there eventually gives: "fatal: Out of memory? mmap failed: No error".  Despite this being on a machine with 64 GB of RAM.

If anyone else has this issue: I just nuked msysgit 1.9.5 off my system and installed the 64-bit version of whatever the latest version is, and that let me get past this.

PK 

Daniel Bratell

Nov 20, 2015, 2:58:46 AM
to Mike Frysinger, 'Peter Kasting' via Chromium-dev, pkas...@google.com
Were many of the files named tmp*pack*? In that case, they were the results of failed pack operations, left behind for debugging. For instance, if the disk becomes full, git leaves such files behind so that your disk stays full and doesn't accidentally become usable again.


/Daniel

--
/* Opera Software, Linköping, Sweden: CET (UTC+1) */

Peter Kasting

Nov 20, 2015, 3:15:57 AM
to Daniel Bratell, Mike Frysinger, 'Peter Kasting' via Chromium-dev
This one wasn't my issue, I don't think -- but I don't know for sure since I already manually did a git prune, which will clean these up.

I wasn't quite able to stay at work long enough tonight for my repack of the Chromium-in-WebRTC checkout to finish; I'm hoping when I look tomorrow I'll have more free space.

PK 

Ruud van Asseldonk

Nov 20, 2015, 4:38:08 AM
to Chromium-dev, bra...@opera.com, vap...@chromium.org
After the Blink-Chromium merge I wanted to repack my repository, but a git repack takes ages for a repository of this size. The following worked for me and is much faster:

Do a fresh clone. This will ensure that the repo has one big packfile. (And the
server packs it for you, so there's no need to do it locally.)

Restore your local work to this repo (you will lose your stashes):

    $ cd chromium_src
    $ git remote add with_old_packs file:///path/to/chromium/src
    $ git fetch with_old_packs --tags

Swap the fragmented objects and packfiles with the fresh tidy packfiles:

    $ cd /path/to/chromium/src/.git
    $ mv objects objects.bak
    $ mv /path/to/fresh/chromium_src/.git/objects objects

Stop Git from complaining about missing stashes:

    $ git stash clear

If you are convinced that you did not lose any local work, remove `chromium_src`
and `.git/objects.bak`.

You should now have two packfiles, a big one (~4.5 GiB) with the Chromium (and
Blink) history, and a small one with your local work. There should be no loose
objects.
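One way to double-check that end state (an addition of mine, not part of Ruud's steps) is `git count-objects -v`, sketched here on a throwaway repo standing in for the freshly swapped checkout:

```shell
# Throwaway repo standing in for the checkout after the objects swap.
tmp=$(mktemp -d) && cd "$tmp" && git init -q demo && cd demo
git -c user.name=demo -c user.email=demo@example.com \
    commit -q --allow-empty -m "initial commit"
git gc --prune=now

# "count" is the number of loose objects (0 once everything is packed);
# "packs" is the number of packfiles on disk.
git count-objects -v
```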

Ruud 

Primiano Tucci

Nov 25, 2015, 4:10:21 PM
to Ruud van Asseldonk, Chromium-dev, Daniel Bratell, Mike Frysinger
Somehow I missed this thread! 

On Thu, Nov 19, 2015 at 9:34 PM, 'Peter Kasting' via Chromium-dev <chromi...@chromium.org> wrote:
>My drive is filled with large git pack files, many of which have old dates, totaling 55 GB. 
55 GB of pack files is insane. An ideal value is ~4 GB; a reasonable one is around 5-6 GB.
I'd expect "git repack -a -d" to collapse everything into one pack file. Peter, what you are seeing feels like just a git bug.
I wonder if it's a bug in some old version of git on Windows, bailing out without explanation when hitting some odd limit in an old filesystem API?

On Thu, Nov 19, 2015 at 10:03 PM, Mike Frysinger <vap...@chromium.org> wrote:
>-A will drop useless objects that are already in a pack.  -a won't do that.  since your stated goal is to minimize, it sounds like you want -A.
Peter is running a prune before the repack, so really -a and -A should not make any difference. Also, it's very unlikely that unreachable objects contribute that much. The only contributor to unreachable objects I can think of in Chromium is deleted local branches. I'd be surprised if they added up to more than a few hundred MB per year... unless you changed histograms.xml every day :) 

> Unfortunately, running "git gc --prune=now" there eventually gives: "fatal: Out of memory? mmap failed: No error".  Despite this being on a machine with 64 GB of RAM.
This is a combined effect of:
 - depot_tools setting a very high core.deltaBaseCacheLimit (which is a good idea in general)
 - your machine having a lot of cores, which hints git to use a high pack.threads concurrency (also a good idea in general)
So prune ends up spawning lots of threads, each with a huge delta cache budget. The reason the internet is not helping here is that people on the internet usually don't have 40+ core workstations and don't mess around with deltaBaseCacheLimit. You have the privilege of doing both :)
The answer here is to either temporarily lower deltaBaseCacheLimit (say, to 128M) or temporarily reduce pack.threads (I'd probably go for the latter).
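A one-off override along those lines can be done with `git -c`, so the saved configuration is untouched (a sketch; the 128m/4 values are just examples, and the throwaway repo stands in for a real checkout):

```shell
# Throwaway repo; in practice you would run the gc line in your checkout.
tmp=$(mktemp -d) && cd "$tmp" && git init -q demo && cd demo
git -c user.name=demo -c user.email=demo@example.com \
    commit -q --allow-empty -m "initial commit"

# Cap the per-thread delta cache and the thread count for this one gc
# run only; nothing is written to the repo's saved configuration.
git -c core.deltaBaseCacheLimit=128m -c pack.threads=4 gc --prune=now
```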

On top of that, there is a possibility that even if you succeed in repacking everything, the final pack file won't be as small as you expect. The reason is that none of these repack operations rebuild the delta chains, which is what likely bloats your .git/objects/pack the most if you sync ~daily for ~years.
What you really want in this case is to pass -f (no reuse delta) to git repack. This will take ages (a night, if you set the right deltaBaseCacheLimit and pack.threads options), but it is the only thing that will actually bring full sanity to your packs... other than starting from scratch, which honestly might not be such a bad idea ;-)
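That full re-delta pass might look like this (a sketch; the --depth/--window values are the ones from the first message in the thread, and the throwaway repo stands in for a real checkout, where this is the step that takes all night):

```shell
# Throwaway repo; on a real years-old checkout this is the slow step.
tmp=$(mktemp -d) && cd "$tmp" && git init -q demo && cd demo
git -c user.name=demo -c user.email=demo@example.com \
    commit -q --allow-empty -m "initial commit"
git repack -a -d            # first, an ordinary repack that reuses deltas

# -f (--no-reuse-delta) throws away the existing delta chains and
# recomputes them from scratch, which is what actually shrinks packs
# that have accreted through years of incremental syncs.
git repack -a -d -f --depth=250 --window=250
ls .git/objects/pack/*.pack | wc -l
```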


Peter Kasting

Dec 3, 2015, 2:36:26 PM
to Primiano Tucci, Ruud van Asseldonk, Chromium-dev, Daniel Bratell, Mike Frysinger
On Wed, Nov 25, 2015 at 1:08 PM, Primiano Tucci <prim...@chromium.org> wrote:
Somehow I missed this thread! 

On Thu, Nov 19, 2015 at 9:34 PM, 'Peter Kasting' via Chromium-dev <chromi...@chromium.org> wrote:
>My drive is filled with large git pack files, many of which have old dates, totaling 55 GB. 
55 GB of pack files is insane. An ideal value is ~4 GB; a reasonable one is around 5-6 GB.
I'd expect "git repack -a -d" to collapse everything into one pack file. Peter, what you are seeing feels like just a git bug.

It turned out the issue was (as I alluded to before) that there are many different git checkouts, including some inside others.  I hadn't flattened all the checkouts, just some of them.
 
> Unfortunately, running "git gc --prune=now" there eventually gives: "fatal: Out of memory? mmap failed: No error".  Despite this being on a machine with 64 GB of RAM.
This is a combined effect of:
 - depot_tools setting a very high core.deltaBaseCacheLimit (which is a good idea in general)
 - your machine having a lot of cores, which hints git to use a high pack.threads concurrency (also a good idea in general)
So prune ends up spawning lots of threads, each with a huge delta cache budget. The reason the internet is not helping here is that people on the internet usually don't have 40+ core workstations and don't mess around with deltaBaseCacheLimit. You have the privilege of doing both :)
The answer here is to either temporarily lower deltaBaseCacheLimit (say, to 128M) or temporarily reduce pack.threads (I'd probably go for the latter).

When I started using 64-bit Git this problem disappeared, but the above is good to know.

On top of that, there is a possibility that even if you succeed in repacking everything, the final pack file won't be as small as you expect. The reason is that none of these repack operations rebuild the delta chains, which is what likely bloats your .git/objects/pack the most if you sync ~daily for ~years.
What you really want in this case is to pass -f (no reuse delta) to git repack. This will take ages (a night, if you set the right deltaBaseCacheLimit and pack.threads options), but it is the only thing that will actually bring full sanity to your packs... other than starting from scratch, which honestly might not be such a bad idea ;-)

Yeah, I ultimately used -f.

PK 