Git will run git gc itself when it feels it's useful (usually after a certain number of pushes).
At work we also have a simple crontab script that runs git gc every night (to limit the number of files before running a backup) and git gc --aggressive during weekends to try to reclaim some space.
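Roughly, the crontab looks like this (the paths and times below are illustrative, not our actual setup):

```shell
# nightly gc to cut the file count down before the backup runs
0 3 * * 1-6  for r in /srv/git/*.git; do git --git-dir="$r" gc --quiet; done
# weekend aggressive repack to try to reclaim some size
0 3 * * 0    for r in /srv/git/*.git; do git --git-dir="$r" gc --aggressive --quiet; done
```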
Nicolas
It's probably trivial to write a command for running a gc on some repo
on the server too -- but it looks to me like a repo on a server is
one of the *last* places you want to be running a gc on...
--
((lambda (x) (x x)) (lambda (x) (x x))) Eli Barzilay:
http://barzilay.org/ Maze is Life!
And git does not replace backups, so even if you destroyed a branch and gc pruned it, you can still extract the objects from previous backups...
You can also configure a longer gc.pruneExpire / gc.reflogExpire period.
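For example, to keep unreachable objects for 90 days instead of the two-week default (standard config keys, values here just as an illustration):

```shell
# keep unreachable loose objects for 90 days before gc may prune them
git config gc.pruneExpire "90 days"
# keep reflog entries around longer too
git config gc.reflogExpire "180 days"
git config gc.reflogExpireUnreachable "90 days"
```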
And it is a place where you need to run git gc (at least we do). We work with repos such as gcc, which are absolutely huge and take ages to clone/act upon. Without gc, we'd just spend half the day waiting to clone/pull/push.
Plus it keeps refs tidy, and packs are easier to export for backup purposes: 1 file to rsync is much, much faster than a few thousand (the stat on each file is killing the performance).
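The backup step then boils down to something like this (host and paths made up):

```shell
# a packed repo is a handful of files, so this is one big sequential
# copy instead of a stat()+copy per loose object
rsync -a /srv/git/project.git/ backup-host:/backup/git/project.git/
```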
Nicolas
This is a little contradictory. You will prune before export to
backup, but rely on the backup for objects that might have been pruned
before the backup...
-milki
Relying more on backups is exactly why it (gcing frequently) is a bad
idea. It's true that you can configure that -- but what's the point
of running it more frequently then? If you want just the repacking
aspect, then ... run just git repack.
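To be concrete, repacking alone looks like this -- it packs loose objects but, unlike gc, prunes nothing (throwaway demo, assuming git is in PATH):

```shell
# demo in a throwaway repo (hypothetical paths)
repo=$(mktemp -d)
git init -q "$repo"
git -C "$repo" -c user.name=demo -c user.email=demo@example.com \
    commit -q --allow-empty -m "initial"
# -a: put every reachable object into a single pack
# -d: remove the now-redundant old packs and packed loose objects
git -C "$repo" repack -a -d
ls "$repo"/.git/objects/pack/
```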
> And it is a place where you need to run git gc (at least we do). We
> work with repos such as gcc, which are absolutely huge and take ages
> to clone/act upon. Without gc, we'd just spend half the day waiting
> to clone/pull/push.
Again, if you don't need the gc aspect, there's no point in running
it. But even with just repacking, I'd be surprised if the difference
is noticeable after the initial setup -- it's true that a gcc clone
would be huge, but that's existing stuff that gets packed once and
then you pay the extra cost only for new content (until it gets
repacked automatically too).
> Plus it keeps refs tidy, and packs are easier to export for backup
> purposes: 1 file to rsync is much, much faster than a few thousand
> (the stat on each file is killing the performance).
Um, so you want it to be tidy even if you can *lose* stuff because you
want *backups* to be faster? That's bad cyclic reasoning...
At ${WORKPLACE}, we allow our developers to remove their own branches remotely on the dev repo. I had originally set up a cron job to run 'git gc' nightly on each repo, but found that branches would mysteriously disappear following garbage collection[1] (Eeek! Scary!)
In the end, I disabled garbage collection entirely on remote repositories and the problem disappeared.
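Concretely, that amounts to setting the standard config keys per repository:

```shell
# never trigger automatic gc in this repository
git config gc.auto 0
# and don't run 'git gc --auto' after receiving a push
git config receive.autogc false
```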
Cheers!
Footnotes:
[1] I suspect that this would happen whenever a branch was given the same name as a branch that had already been deleted.
--
Christopher Fuhrman
cfuh...@gmail.com
>> And it is a place where you need to run git gc (at least we do). We
>> work with repos such as gcc, which are absolutely huge and take ages
>> to clone/act upon. Without gc, we'd just spend half the day waiting
>> to clone/pull/push.
> Again, if you don't need the gc aspect, there's no point in running
> it. But even with just repacking, I'd be surprised if the difference
> is noticeable after the initial setup -- it's true that a gcc clone
> would be huge, but that's existing stuff that gets packed once and
> then you pay the extra cost only for new content (until it gets
> repacked automatically too).
>
I agree, repack does most of the job, but we still have a lot of loose objects and no reason to keep them forever!
>> Plus it keeps refs tidy, and packs are easier to export for backup
>> purposes: 1 file to rsync is much, much faster than a few thousand
>> (the stat on each file is killing the performance).
> Um, so you want it to be tidy even if you can *lose* stuff because you
> want *backups* to be faster? That's bad cyclic reasoning...
>
Well, it may be, but since we keep our weekly backups forever (rsync is a first step before backing up to tape), we don't risk any data loss.
If an object gets removed by gc, it has been in the repo for at least two weeks (usually more, due to the reflog), so it has been in at least one weekly backup!
I agree that running gc is pointless with some workflows and dangerous depending on your backup policy, but it is neither a bad thing nor risky, in my case at least.
And I have to admit, I (and everyone else at work) have never needed to retrieve a loose object older than 3 months. It can happen, but it's a tiny corner case rare enough that we can afford to go digging through backups.
Nicolas
-milki
--
Hope is a dimension of the spirit. It is not outside us, but within
us. When you lose it, you must seek it again within yourself and in
people around you -- not in objects or even events. -
-Vaclav Havel