Hi Fabian,
I wouldn't call it a bad reputation; Git was simply designed with small files in mind.
The situation is getting better, but on Windows in particular it is still not great.
I regularly store files of up to 1GB in git repositories, so your sizes are
definitely possible.
> As I understand it, the diffing problem shouldn't affect me because a)
> binary files don't need to be diffed anyway and b) even loading a 400 MB
> file into memory twice for diffing is not going to be a problem in a 32 bit
> process.
Although binary files are not diffed, they are still loaded into memory to
generate the diffstat (the summary printed after a commit).
You can see the effect if you commit a large file with and without the -q
switch: with -q the commit finishes much faster.
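For example (big.iso is just a placeholder for any large binary file):

git add big.iso
time git commit -q -m "add big.iso"   # -q suppresses the summary, so no diffstat is generated
# versus, in an otherwise identical repository:
time git commit -m "add big.iso"      # computes the diffstat, noticeably slower for large files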
There is currently a patch in submission on the git mailing list to correct
that; see
http://www.spinics.net/lists/git/msg232637.html.
> The compression problem can be tackled by setting core.bigFileThreshold to
> something smaller than the default 256 MB because Git won't try to compress
> files larger than this. This, however, does have the disadvantage that, for
> example, the 17 revisions of a 150 MB file would amount to about 2.5 GB of
> data even if the revisions could be compressed.
Does disk space still matter that much?
> The packing problem can be tackled by setting pack.packSizeLimit to
> something small enough to limit the maximum pack size. (The default in Git
> for Windows is 2g.) pack.windowMemory and other pack options seem to play a
> role as well, although I don't understand exactly how. However, these pack
> settings do not affect the size of pack files created for pushing and
> pulling. Therefore, pushing and pulling might remain a problem.
> So, that's the result of my research so far.
>
> Now for my questions:
> 1. Have I got everything right in my analysis above? Am I missing anything
> important, any problems I should expect?
On my remote server (the machine I push to and pull from, running 64-bit git
from Debian wheezy), I use the following settings:
[core]
packedGitLimit = 512m
packedGitWindowSize = 512m
bigFileThreshold = 256m
[pack]
deltaCacheSize = 256m
windowMemory = 256m
These settings ensure that the packs the server creates can still be
decompressed when I clone the repository onto a 32-bit Windows machine.
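If you prefer not to edit the config file by hand, the same settings can be
applied on the server with git config:

git config core.packedGitLimit 512m
git config core.packedGitWindowSize 512m
git config core.bigFileThreshold 256m
git config pack.deltaCacheSize 256m
git config pack.windowMemory 256m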
Additionally, inside the repository itself, I use the .gitattributes feature
of disabling delta compression for certain file suffixes.
For example, the following line in .gitattributes turns delta compression
off for all files with the .zip suffix:
*.zip -delta
This wastes disk space but can greatly improve commit speed.
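You can verify that the attribute is picked up with git check-attr
(firmware.zip is just an example file name):

git check-attr delta -- firmware.zip
# firmware.zip: delta: unset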
> 2. Would you recommend setting core.bigFileThreshold, pack.packSizeLimit or
> other options to non-default values proactively on all clients, or should I
> rather postpone this until (if ever) we're experiencing problems? If I
> don't set these values proactively, is there a chance that the Git
> repository could be ruined?
I would not change the client settings; instead, tweak the server settings
and disable delta compression via .gitattributes as described above.
And no, you cannot ruin the repository. As noted above, the worst that can
happen is that you cannot clone on Windows.
> -- What is a good value for core.bigFileThreshold, given my concrete binary
> files of 10 to 400 MB, some of which have up to 17 revisions?
That's a disk space versus memory/CPU time trade-off. I'd spend the disk
space and use core.bigFileThreshold=32MB.
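In config terms (git accepts the short suffix form):

git config core.bigFileThreshold 32m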
> -- What is a good value for pack.packSizeLimit? Git for Windows defaults it
> to 2g, is there any reason not to leave it at that?
> 3. Since pack.packSizeLimit does not affect the packs created for pulling
> and pushing - what problems can I expect there? How could I tackle them?
> 4. "git repack -afd" and "git gc" currently fail with an out of memory
> error on the migrated repository [1][2]. Should I worry about this?
Yes. The repository should be fsck-able on the client side as well;
otherwise you are really limited in what you can do with it.
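In other words, this should complete without running out of memory, even on
the Windows clients:

git fsck --full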
> -- I can make "git repack -afd" work by passing "--window-memory 750m" to
> the command. After that, git gc works fine again) Again, is setting
> pack.windowMemory to 750m something I should do proactively?
Can you try again with my suggested bigFileThreshold value? With that in
place you shouldn't have to adjust --window-memory.
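A minimal sequence to try, assuming you set the threshold in the repository
you are repacking:

git config core.bigFileThreshold 32m
git repack -afd
git gc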
> [1] $ git repack -afd
> Counting objects: 189121, done.
> Delta compression using up to 8 threads.
> warning: suboptimal pack - out of memory)
> fatal: Out of memory, malloc failed (tried to allocate 331852630 bytes)
You do know that the memory settings are per thread?
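With 8 threads, pack.windowMemory is effectively multiplied by 8. You can
lower the per-thread limit or cap the thread count, e.g. (values are only
illustrative):

git -c pack.threads=2 -c pack.windowMemory=256m repack -afd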
> [2] $ git gc
> Counting objects: 189121, done.
> Delta compression using up to 8 threads.
> fatal: Out of memory, malloc failed (tried to allocate 73267908 bytes)
> error: failed to run repack
>
Hope that helps,
Thomas