
Bug#841414: git "fatal: Out of memory? mmap failed: Cannot allocate memory"


Ian Jackson

Oct 20, 2016, 8:40:02 AM
Control: reassign -1 git
Control: found -1 1:2.1.4-2.1+deb8u2

Hi, git maintainers. I'm having a bit of a conundrum with the dgit
git server and I hope you can help.

I don't know that this is a bug in git. I'm reassigning this bug to
git so as to ask for your input, and I expect that after we are done
with triage this bug will be split and/or reassigned and/or made into
DSA tickets or something.


So, on to the problem. I see this:

mariner:d> git clone https://git.dgit.debian.org/_test_botch.git
Cloning into '_test_botch'...
remote: Counting objects: 1114, done.
remote: warning: suboptimal pack - out of memory
remote: fatal: Out of memory? mmap failed: Cannot allocate memory
remote: aborting due to possible repository corruption on the remote side.
fatal: protocol error: bad pack header
mariner:d>

The server (a VM, cgi-grnet-01.debian.org) has 1G of RAM and 500M of
swap. The repo _test_botch.git is a copy of botch.git, which is the
botch repo from the dgit-repos server for Debian as mirrored to the
git.dgit.d.o server using rsync. The server is running cgit for web
access on browse.dgit.debian.org, and the git http smart transport on
git.dgit.d.o.

I have put a copy of the repo, in tarball form, here:
http://www.chiark.greenend.org.uk/~ijackson/quicksand/2016/_test_botch.git.tar.gz
(NB 210 MB!)

Searching the intertubes for the error messages produced a lot of
people suggesting `oh just run git-repack' (with various options).
I ran `git repack' (with no options) and it made no difference.

On a hunch I ran `git-gc' on the source repo's actual live copy of
botch.git, and re-mirrored it. Now it works. (That is, `git clone
https://git.dgit.debian.org/botch.git' works, so this problem is no
longer affecting the production copy.)

It is possible that there is actually something wrong with the way I'm
handling my repo. Perhaps I need to explicitly run git-gc
occasionally, or something. I was under the impression that git would
do this automatically when it felt it was appropriate. I find
git-gc(1) slightly unclear on the question.

Can you please advise which of the following you think apply:

- This is a bug of some kind in git.

- The server should be provisioned with more RAM and/or swap.

- The dgit repos should be subjected to more background activity to
`tidy them up'. In this case, I would appreciate any advice you
had about: what the appropriate periodic activity is; how often it
should be run; and whether I need to lock against concurrent
updates by other programs.

- I am confused.

- Better documentation in git might help.
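If it is the third of these, I imagine the periodic activity would be
a sweep along the following lines; the path here is illustrative, not
the real dgit-repos layout, and the invocation is my guess at a
sensible default rather than anything we currently run.

```shell
# Hypothetical periodic tidy-up over a tree of bare repos.  The path is
# invented for illustration.  'gc --auto' only does work when git's own
# thresholds say the repo needs it, so running this often is cheap.
for repo in /srv/dgit/repos/*.git; do
    git -C "$repo" gc --auto --quiet
done
```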

I am, of course, happy to supply more information, and I can do tests
etc. as well if that's helpful.

FYI the dgit-repos are handled by the dgit server in a slightly
unusual way. I don't think this is relevant, because ultimately it
means that the way the actual real repo is dealt with is fairly
conventional, at least from the point of view of the object store,
but:

The usual approach by the dgit server is to make a temporary repo
which is a hardlink farm to the real repo, and receive pushes into the
temporary repo. They are then inspected. If the push is considered
bad, the temporary repo is destroyed. If the push is considered good,
the relevant updates are pushed from the temporary repo to the real
repo. This is all achieved with a wrapper for git-receive-pack as
well as some quite exciting hooks. The purpose of this is to avoid
adding objects from bad pushes (which might include unauthorised
pushes of harmful objects) to the real repo's object store.
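In shell terms the scheme is roughly the sketch below; the paths and
names are invented for illustration, and the real implementation is
the wrapper and hooks, not this script.

```shell
# Sketch of the push-quarantine scheme; paths are illustrative only.
real=/srv/dgit/repos/pkg.git
tmp=$(mktemp -d)

# Hardlink farm: the temporary repo shares object files with the real
# one, so creating it is cheap and adds nothing to the real object store.
cp -al "$real" "$tmp/incoming.git"

# ... git-receive-pack runs against $tmp/incoming.git, and the push
# is inspected there ...

# Good push: propagate the updated refs (and only now the new objects)
# into the real repo.  Bad push: skip this step and the pushed objects
# never reach $real at all.
git -C "$tmp/incoming.git" push "$real" 'refs/*:refs/*'

rm -rf "$tmp"
```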

Thanks for your attention.

Ian.

--
Ian Jackson <ijac...@chiark.greenend.org.uk> These opinions are my own.

If I emailed you from an address @fyvzl.net or @evade.org.uk, that is
a private address which bypasses my fierce spamfilter.

Anders Kaseorg

Oct 20, 2016, 3:10:04 PM
Git really is running out of memory. When I try this locally (using
file:// to force Git to go through the same git-upload-pack dance that it
does over a smart transport, and using ulimit -v as a poor but probably
sufficient approximation of a RAM-limited server), it fails with 920 MB of
virtual memory and succeeds with 940 MB:

$ git clone -u 'ulimit -v 920000; git upload-pack' file:///tmp/_test_botch.git
Cloning into '_test_botch'...
remote: Counting objects: 1114, done.
remote: warning: suboptimal pack - out of memory
remote: fatal: mmap failed: Cannot allocate memory
error: git upload-pack: git-pack-objects died with error.
fatal: git upload-pack: aborting due to possible repository corruption on the remote side.
remote: aborting due to possible repository corruption on the remote side.
fatal: early EOF
fatal: index-pack failed
$ git clone -u 'ulimit -v 940000; git upload-pack' file:///tmp/_test_botch.git
Cloning into '_test_botch'...
remote: Counting objects: 1114, done.
remote: warning: suboptimal pack - out of memory
remote: Compressing objects: 100% (599/599), done.
remote: Total 1114 (delta 734), reused 811 (delta 495)
Receiving objects: 100% (1114/1114), 196.51 MiB | 8.58 MiB/s, done.
Resolving deltas: 100% (734/734), done.
Checking out files: 100% (367/367), done.

This shouldn’t be that surprising, as over 1 GB of actual content is
stored in the repository. (Why is botch so gigantic? It looks like there
may be a ton of unnecessary temporary data in tests/*/tmp and
tests/*/out.)
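For what it’s worth, a generic way to see which paths account for the
bulk of a repository (nothing botch-specific about it) is to list all
blobs by size:

```shell
# List the 20 largest blobs in the repository, each with the path it
# was seen at.  rev-list emits "<sha> <path>" lines; cat-file's %(rest)
# carries the path through alongside the object's type and size.
git rev-list --objects --all |
  git cat-file --batch-check='%(objecttype) %(objectname) %(objectsize) %(rest)' |
  awk '$1 == "blob" { print $3, $4 }' |
  sort -rn | head -20
```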

Git automatically runs ‘git gc --auto’ every so often to optimize the
repository. However, in this case, ‘git gc --auto’ does not detect that
anything needs to be done: you have only 172 < 6700 (gc.auto) loose
objects and 4 < 50 (gc.autoPackLimit) packs. Perhaps the packs are of
particularly poor quality due to previous instances of “warning:
suboptimal pack - out of memory”.
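Those counts and thresholds can be checked in any repository with:

```shell
# Loose-object and pack counts, as consulted by 'git gc --auto':
# 'count' is the number of loose objects, 'packs' the number of packs.
git count-objects -v

# The thresholds it compares against; unset means git's built-in
# defaults apply (gc.auto=6700, gc.autoPackLimit=50).
git config --get gc.auto || echo 'gc.auto unset (default 6700)'
git config --get gc.autoPackLimit || echo 'gc.autoPackLimit unset (default 50)'
```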

As you found, manually running ‘git gc’ does significantly improve the
situation: after that, a clone succeeds with just 260 MB because there is
nothing to be recompressed.

Anders

Ian Jackson

Oct 20, 2016, 8:30:02 PM
Anders Kaseorg writes ("Re: Bug#841414: git "fatal: Out of memory? mmap failed: Cannot allocate memory""):
> Git really is running out of memory. When I try this locally (using
> file:// to force Git to go through the same git-upload-pack dance that it
> does over a smart transport, and using ulimit -v as a poor but probably
> sufficient approximation of a RAM-limited server), it fails with 920 MB of
> virtual memory and succeeds with 940 MB:

Thanks for the investigation. You have provided useful facts.

So I think in terms of the questions I asked, you are saying:

* This is not a very unusual level of memory usage[1] so the server
should be provisioned with more swap, at the very least, and
probably more RAM.

* Using git-gc more often might be beneficial.

Many (most?) of the dgit-repos will read much more than they are
written. So that seems to emphasise the second point. I'm still not
entirely sure whether I should just use the default options to git-gc
but that's probably a good starting point.

Does that seem right ?

I have an outstanding question: do I need to lock against concurrent
updates by other programs ? That is, can I run git-gc and (say)
git-receive-pack at the same time, safely ?

Also, I still wonder if better documentation in git might help.

Thanks,
Ian.

[1] Large repos are to be expected occasionally.

Anders Kaseorg

Oct 20, 2016, 9:30:02 PM
On Thu, 20 Oct 2016, Ian Jackson wrote:
> Does that seem right ?

Sounds right to me.

> I have an outstanding question: do I need to lock against concurrent
> updates by other programs ? That is, can I run git-gc and (say)
> git-receive-pack at the same time, safely ?

This is safe by default, as long as you don’t use something like
--prune=all or gc.pruneExpire=all.
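Concretely, the safety margin is the prune expiry grace period, which
you can inspect like this:

```shell
# The grace period that keeps concurrent gc safe: loose objects younger
# than this are never pruned, so a receive-pack in flight cannot have
# its freshly written objects deleted out from under it.
git config --get gc.pruneExpire || echo 'unset (default: 2.weeks.ago)'

# By contrast, this defeats the grace period and is NOT safe while
# pushes may be in flight:
#   git gc --prune=all
```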

Anders

Ian Jackson

Oct 21, 2016, 6:50:02 AM
Control: reassign -1 dgit-infrastructure
Control: found -1 2.4
Control: retitle -1 Should occasionally run git-gc in dgit-repos

Anders Kaseorg writes ("Re: Bug#841414: git "fatal: Out of memory? mmap failed: Cannot allocate memory""):
> On Thu, 20 Oct 2016, Ian Jackson wrote:
> > Does that seem right ?
>
> Sounds right to me.

Right, OK, thanks.

> > I have an outstanding question: do I need to lock against concurrent
> > updates by other programs ? That is, can I run git-gc and (say)
> > git-receive-pack at the same time, safely ?
>
> This is safe by default, as long as you don’t use something like
> --prune=all or gc.pruneExpire=all.

Thanks for your helpful input.

Regards,
Ian.