How do I perform git gc on a repo managed by gitolite?


stardust

Sep 26, 2011, 6:47:15 AM
to gito...@googlegroups.com
Hi!

How do I perform git gc on a repo managed by gitolite?

Since I only interact with the repo from outside via fetch/pull and push, I wonder what I have to do to start garbage collection (git gc) on a certain repo.

Thanks!

stardust

Sep 26, 2011, 7:40:20 AM
to gito...@googlegroups.com
I am talking about the repo on the server side; I know that I can run git gc locally.

Nicolas Morey-Chaisemartin

Sep 26, 2011, 8:52:19 AM
to gito...@googlegroups.com, stardust
Run this as your gitolite user on the server:
git config --global gc.auto 6700

(gc.auto takes a loose-object count rather than a boolean; 6700 is git's default.) git will then run git gc automatically when it decides it is useful (usually after a certain number of pushes).

At work we also have a simple crontab script that runs git gc every night (to limit the number of files before running a backup) and git gc --aggressive during weekends to try to gain some size.
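A nightly job along those lines might be sketched like this (the repository path is an assumption about a typical gitolite layout, not taken from this thread; adjust REPO_BASE to your setup):

```shell
#!/bin/sh
# Nightly-maintenance sketch: run "git gc" in every bare repository
# under the gitolite user's repository directory.
# REPO_BASE is an assumption (gitolite commonly uses ~/repositories).
REPO_BASE="${REPO_BASE:-$HOME/repositories}"

if [ -d "$REPO_BASE" ]; then
    find "$REPO_BASE" -type d -name '*.git' | while read -r repo; do
        git --git-dir="$repo" gc --quiet
    done
fi
# A weekend variant would substitute: git --git-dir="$repo" gc --aggressive --quiet
```

The weekend job would be the same loop with the --aggressive flag, installed as a separate crontab entry.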

Nicolas

Eli Barzilay

Sep 26, 2011, 11:07:54 AM
to deve...@morey-chaisemartin.com, gito...@googlegroups.com, stardust
Two hours ago, Nicolas Morey-Chaisemartin wrote:
> On 09/26/2011 12:47 PM, stardust wrote:
> > Hi!
> >
> > How do I perform git gc on a repo managed by gitolite?
> >
> > Since I only interact with the repo from outside via fetch/pull
> > and push, I wonder what I have to do to start the garbage
> > collection (git gc) on a certain repo.
> >
> Run this as your gitolite user on the server:
> git config --global gc.auto 6700
>
> (gc.auto takes a loose-object count rather than a boolean; 6700 is
> git's default.) git will then run git gc automatically when it
> decides it is useful (usually after a certain number of pushes).
>
> At work we also have a simple crontab script that runs git gc every
> night (to limit the number of files before running a backup) and git
> gc --aggressive during weekends to try to gain some size.

It's probably trivial to write a command for running a gc on some repo
on the server too -- but it looks to me like a repo on a server is
one of the *last* places you want to be running a gc on...
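(For reference, such a command could be sketched as a small shell function to be exposed to users; the "admin-defined command" mechanism and the GL_REPO_BASE_ABS variable are assumptions about a gitolite v2 setup, not anything from this thread, so check the gitolite docs before relying on them.)

```shell
# Hypothetical gitolite "admin-defined command" sketch: a user would run
#   ssh git@server gc <reponame>
# GL_REPO_BASE_ABS is assumed to point at the base repository directory.
gc_adc() {
    repo="$1"
    case "$repo" in
        ''|*..*|/*) echo "invalid repo name: $repo" >&2; return 1 ;;
    esac
    base="${GL_REPO_BASE_ABS:-$HOME/repositories}"
    if [ ! -d "$base/$repo.git" ]; then
        echo "no such repo: $repo" >&2
        return 1
    fi
    git --git-dir="$base/$repo.git" gc --quiet
}
```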

--
((lambda (x) (x x)) (lambda (x) (x x))) Eli Barzilay:
http://barzilay.org/ Maze is Life!

Nicolas Morey-Chaisemartin

Sep 26, 2011, 11:19:20 AM
to Eli Barzilay, gito...@googlegroups.com, stardust
Why not?
OK, if you messed up a branch and have loose objects, they will disappear.
But you need to wait for the two-week grace period before gc prunes any loose objects.

And git does not replace backups, so even if you destroyed a branch and gc pruned it, you can still extract the objects from previous backups...
You can also configure a longer pruneExpire/reflogExpire period.
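For example, the expiry windows can be stretched like this (a sketch: a throwaway bare repo stands in for the server-side one, and the periods are illustrative, not recommendations):

```shell
# Create a throwaway bare repo to stand in for the server-side one.
demo=$(mktemp -d)/demo.git
git init --bare --quiet "$demo"

# Stretch gc's safety windows so pruned material stays recoverable longer.
git --git-dir="$demo" config gc.pruneExpire "90 days"
git --git-dir="$demo" config gc.reflogExpire "180 days"
git --git-dir="$demo" config gc.reflogExpireUnreachable "90 days"
```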

And it is a place where you need to run git gc (at least we do). We work with repos such as gcc, which are absolutely huge and take ages to clone/act upon. Without gc, we'd just spend half the day waiting to clone/pull/push.

Plus it keeps refs tidy, and packs are easier to export for backup purposes: 1 file to rsync is much, much faster than a few thousand (the stat on each file kills performance).

Nicolas

milk

Sep 26, 2011, 11:44:16 AM
to deve...@morey-chaisemartin.com, Eli Barzilay, gito...@googlegroups.com, stardust
On Mon, Sep 26, 2011 at 08:19, Nicolas Morey-Chaisemartin
<deve...@morey-chaisemartin.com> wrote:
> And git does not replace backups, so even if you destroyed a branch, and gc pruned it, you can still extract the objects from previous backups...
> You can also configure a longer pruneExpire/reflogExpire period.
>
> Plus it keeps refs tidy, and packs are easier to export for backup purposes: 1 file to rsync is much, much faster than a few thousand (the stat on each file kills performance).

This is a little contradictory. You will prune before exporting to
backup, but rely on the backup for objects that might have been pruned
before the backup...

-milki

Eli Barzilay

Sep 26, 2011, 11:46:38 AM
to deve...@morey-chaisemartin.com, gito...@googlegroups.com, stardust
20 minutes ago, Nicolas Morey-Chaisemartin wrote:
> And git does not replace backups, so even if you destroyed a branch,
> and gc pruned it, you can still extract the objects from previous
> backups... You can also configure a longer pruneExpire/reflogExpire
> period.

Relying more on backups is exactly why it (gcing frequently) is a bad
idea. It's true that you can configure that -- but what's the point
of running it more frequently then? If you want just the repacking
aspect, then ... run just git repack.
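That repack-only route might look like this (a sketch on a throwaway repo; the point is that repack consolidates without ever expiring unreachable objects):

```shell
# Sketch of the repack-only route, on a throwaway repo for illustration.
repo=$(mktemp -d)
git init --quiet "$repo"
git -C "$repo" -c user.name=demo -c user.email=demo@example.com \
    commit --allow-empty --quiet -m "seed commit"

# Consolidate reachable objects into one pack and drop the now-redundant
# loose copies; unlike "git gc", this never prunes unreachable objects.
git -C "$repo" repack -a -d -q
git -C "$repo" pack-refs --all   # fold loose refs into packed-refs, too
```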


> And it is a place where you need to run git gc (at least we do). We
> work with repos such as gcc, which are absolutely huge and take ages
> to clone/act upon. Without gc, we'd just spend half the day waiting
> to clone/pull/push.

Again, if you don't need the gc aspect, there's no point in running
it. But even with just repacking, I'd be surprised if the difference
is noticeable after the initial setup -- it's true that a gcc clone
would be huge, but that's existing stuff that gets packed once and
then you pay the extra cost only for new content (until it gets
repacked automatically too).


> Plus it keeps refs tidy, and packs are easier to export for backup
> purposes: 1 file to rsync is much, much faster than a few thousand
> (the stat on each file kills performance).

Um, so you want it to be tidy even if you can *lose* stuff, because
you want *backups* to be faster? That's bad cyclic reasoning...

Christopher Fuhrman

Sep 27, 2011, 1:27:58 AM
to Eli Barzilay, deve...@morey-chaisemartin.com, gito...@googlegroups.com, stardust
Howdy,

At ${WORKPLACE}, we give our developers the ability to remove their own branches remotely on the dev repo. I had originally set up a cron job to run 'git gc' nightly on each repo, but found that branches would mysteriously disappear following garbage collection[1] (Eeek! Scary!)

In the end, I disabled garbage collection entirely on remote repositories and the problem disappeared.
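Disabling it can be sketched like this (a throwaway bare repo is used for illustration; on a real gitolite host you would run the config commands inside each bare repo):

```shell
# Sketch: switch automatic gc off on a server-side repo.
demo=$(mktemp -d)/demo.git
git init --bare --quiet "$demo"

git --git-dir="$demo" config gc.auto 0            # "git gc --auto" becomes a no-op
git --git-dir="$demo" config receive.autogc false # don't gc after receiving a push
git --git-dir="$demo" config gc.pruneExpire never # even a manual gc won't prune
```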

Cheers!

Footnotes:

[1] I suspect that this would happen whenever a branch was given the same name as a branch that had already been deleted.

--
Christopher Fuhrman
cfuh...@gmail.com


Nicolas Morey-Chaisemartin

Sep 27, 2011, 2:45:20 AM
to Eli Barzilay, gito...@googlegroups.com, stardust
On 09/26/2011 05:46 PM, Eli Barzilay wrote:
> 20 minutes ago, Nicolas Morey-Chaisemartin wrote:
>> And git does not replace backups, so even if you destroyed a branch,
>> and gc pruned it, you can still extract the objects from previous
>> backups... You can also configure a longer pruneExpire/reflogExpire
>> period.
> Relying more on backups is exactly why it (gcing frequently) is a bad
> idea. It's true that you can configure that -- but what's the point
> of running it more frequently then? If you want just the repacking
> aspect, then ... run just git repack.
>
Because with our workflow we tend to create a lot of loose objects: dev branches rebased, squashed, or commits simply amended so they pass continuous integration.
Also some half-awake users accidentally commit and push useless binary files (I've seen 5GB core files being committed).

git repack is vital for us, as we have a lot of branches and create about a dozen tags for each of our ~50 repos on a good day. But gc gives smaller repos and removes all the unnecessary old crap.

>> And it is a place where you need to run git gc (at least we do). We
>> work with repos such as gcc, which are absolutely huge and take ages
>> to clone/act upon. Without gc, we'd just spend half the day waiting
>> to clone/pull/push.
> Again, if you don't need the gc aspect, there's no point in running
> it. But even with just repacking, I'd be surprised if the difference
> is noticeable after the initial setup -- it's true that a gcc clone
> would be huge, but that's existing stuff that gets packed once and
> then you pay the extra cost only for new content (until it gets
> repacked automatically too).
>

I agree, repack does most of the job, but we still have a lot of loose objects and no reason to keep them forever!


>> Plus it keeps refs tidy, and packs are easier to export for backup
>> purposes: 1 file to rsync is much, much faster than a few thousand
>> (the stat on each file kills performance).
> Um, so you want it to be tidy even if you can *lose* stuff because you
> want *backups* to be faster? That's a bad cyclic reasoning...
>

Well, it may be, but as we keep our weekly backups forever (rsync is a first step before backing up to tape), we don't risk any data loss.
If an object gets removed by gc, it has been in the repo for at least two weeks (usually more, due to the reflog), so it has been in at least one weekly backup!

I agree that running gc is pointless with some workflows and dangerous depending on your backup policy, but it is neither a bad thing nor risky, in my case at least.
And I have to admit, I (or anyone at work) have never needed to retrieve a loose object older than 3 months. It can happen, but it's a tiny corner case, rare enough that we just go digging through backups when it does.

Nicolas

milk

Sep 27, 2011, 3:19:42 AM
to deve...@morey-chaisemartin.com, Eli Barzilay, gito...@googlegroups.com, stardust
Yup. Different workflows support different kinds of behavior. To each
his own. So if it works for you to gc more often, great!

-milki

--
Hope is a dimension of the spirit. It is not outside us, but within
us. When you lose it, you must seek it again within yourself and in
people around you -- not in objects or even events.
-- Vaclav Havel
