Huge memory allocation from gitiles' visibility check when using bitmap index

Saša Živkov

Jun 20, 2024, 11:05:07 AM
to Repo and Gerrit Discussion
On a large repository [1] we observe that a gitiles GET request on a commit:
GET https:/.../plugins/gitiles/<repo-name>/+/<commit-id>

can cause memory allocation of >100GB when caches are cold.
And even when caches are not cold, this kind of request can often allocate >20GB.

For those having repositories of similar size: can you check if you have the same issue? Note that the allocated memory per request is available in the httpd_log so it should be easy to check.
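For anyone who wants to run that check, here is a minimal sketch. It assumes the JSON variant of the httpd_log, with one JSON object per line, a `memory` key holding the allocated bytes and a `resource` key holding the request path. Those key names, the function name `heavy_requests` and the 20 GiB threshold are assumptions for illustration; adjust them to whatever your Gerrit version actually writes.

```python
import json
from typing import Iterable, Iterator, Tuple


def heavy_requests(lines: Iterable[str], min_bytes: int) -> Iterator[Tuple[int, str]]:
    """Yield (allocated_bytes, resource) for log entries at or above min_bytes.

    Assumes one JSON object per line with "memory" and "resource" keys
    (assumed key names, verify against your own log).
    """
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            entry = json.loads(line)
            allocated = int(entry["memory"])  # value may be a string or a number
        except (ValueError, KeyError, TypeError):
            continue  # skip malformed lines and entries without a memory value
        if allocated >= min_bytes:
            yield allocated, entry.get("resource", "?")


def report(path: str, min_bytes: int = 20 * 1024**3) -> None:
    """Print the heaviest requests found in the log file at `path`."""
    with open(path, encoding="utf-8") as log:
        for allocated, resource in sorted(heavy_requests(log, min_bytes), reverse=True):
            print(f"{allocated / 1024**3:8.1f} GiB  {resource}")
```

Calling `report()` with the path to your JSON httpd_log lists the requests that allocated at least 20 GiB, largest first.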

We found that this is related to the gitiles visibility check, which ends up reading the bitmap index.
Profiling the JVM showed that this code path [2] allocated >100GB.

If we remove the bitmap index, then with cold caches the memory allocated for the same request drops to around 400MB.

[1]
$ git count-objects -vH
count: 4347
size: 22.98 MiB
in-pack: 46628601
packs: 240
size-pack: 12.91 GiB
prune-packable: 106
garbage: 1
size-garbage: 0 bytes
$ git show-ref | wc -l
2830244


[2]
Stack Trace    Count    Percentage
void com.googlecode.javaewah.LongArray.<init>(int)    78464    99,8 %
void com.googlecode.javaewah.EWAHCompressedBitmap.<init>(int)    78462    99,8 %
EWAHCompressedBitmap com.googlecode.javaewah.EWAHCompressedBitmap.xor(EWAHCompressedBitmap)    78349    99,6 %
EWAHCompressedBitmap org.eclipse.jgit.internal.storage.file.BasePackBitmapIndex$StoredBitmap.getBitmapWithoutCaching()    78349    99,6 %
EWAHCompressedBitmap org.eclipse.jgit.internal.storage.file.BasePackBitmapIndex$StoredBitmap.getBitmap()    78349    99,6 %
EWAHCompressedBitmap org.eclipse.jgit.internal.storage.file.BasePackBitmapIndex.getBitmap(AnyObjectId)    78349    99,6 %
BitmapIndexImpl$CompressedBitmap org.eclipse.jgit.internal.storage.file.BitmapIndexImpl.getBitmap(AnyObjectId)    78349    99,6 %
BitmapIndex$Bitmap org.eclipse.jgit.internal.storage.file.BitmapIndexImpl.getBitmap(AnyObjectId)    78349    99,6 %
boolean org.eclipse.jgit.internal.revwalk.BitmappedReachabilityChecker$ReachedFilter.include(RevWalk, RevCommit)    78349    99,6 %
RevCommit org.eclipse.jgit.revwalk.PendingGenerator.next()    78349    99,6 %
void org.eclipse.jgit.revwalk.RewriteGenerator.applyFilterToParents(RevCommit)    77939    99,1 %
RevCommit org.eclipse.jgit.revwalk.RewriteGenerator.next()    77939    99,1 %
void org.eclipse.jgit.revwalk.TopoSortGenerator.<init>(Generator)    77939    99,1 %
RevCommit org.eclipse.jgit.revwalk.StartGenerator.next()    77939    99,1 %
RevCommit org.eclipse.jgit.revwalk.RevWalk.next()    77939    99,1 %
Optional org.eclipse.jgit.internal.revwalk.BitmappedReachabilityChecker.areAllReachable(Collection, Stream)    77939    99,1 %
Optional org.eclipse.jgit.revwalk.ReachabilityChecker.areAllReachable(Collection, Collection)    77939    99,1 %
boolean com.google.gitiles.VisibilityChecker.isReachableFrom(String, RevWalk, RevCommit, Collection)    77939    99,1 %
boolean com.google.gitiles.VisibilityCache.isReachableFrom(String, RevWalk, RevCommit, Stream)    77939    99,1 %
boolean com.google.gitiles.VisibilityCache.isReachableFromRefs(String, RevWalk, RevCommit, Stream)    77939    99,1 %
boolean com.google.gitiles.VisibilityCache.isVisible(Repository, RevWalk, ObjectId, Collection)    77939    99,1 %
...
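To make the top of this trace easier to read: bitmaps in a pack bitmap index can be stored as an XOR against another stored bitmap, so materializing one bitmap may require XOR-ing along a chain, and each XOR produces a new bitmap object. The toy model below only illustrates that idea with Python integers. It is not JGit's implementation, the names `resolve_bitmap` and `uncompressed_bitmap_bytes` are hypothetical, and the size estimate is an upper bound per bitmap that ignores EWAH compression.

```python
from typing import Dict, Optional, Tuple

# Toy index: commit id -> (id of the XOR base or None, stored bits).
StoredIndex = Dict[str, Tuple[Optional[str], int]]


def resolve_bitmap(stored: StoredIndex, commit_id: str) -> Tuple[int, int]:
    """Return (full_bitmap, xor_steps) for commit_id by following its XOR chain."""
    result = 0
    steps = -1  # the first entry is read as-is; every further entry costs one XOR
    current: Optional[str] = commit_id
    while current is not None:
        base, bits = stored[current]
        result ^= bits  # each XOR creates a new value, like a new bitmap object
        steps += 1
        current = base
    return result, steps


def uncompressed_bitmap_bytes(object_count: int) -> int:
    """Bytes needed for one bit per object, rounded up (no compression)."""
    return (object_count + 7) // 8


# Three bitmaps, each stored as an XOR against the previous one.
toy_index: StoredIndex = {
    "a": (None, 0b1011),
    "b": ("a", 0b0110),
    "c": ("b", 0b0001),
}
full, xor_steps = resolve_bitmap(toy_index, "c")
print(bin(full), xor_steps)                 # 0b1100 2
print(uncompressed_bitmap_bytes(46628601))  # 5828576
```

With one bit for each of the 46,628,601 packed objects, a single uncompressed bitmap would be about 5.8 MB, so a walk that materializes bitmaps for many commits can add up quickly.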

Luca Milanesio

Jun 20, 2024, 11:34:17 AM
to Repo and Gerrit Discussion, Luca Milanesio, Saša Živkov

On 20 Jun 2024, at 16:04, Saša Živkov <ziv...@gmail.com> wrote:

[1]
$ git count-objects -vH
count: 4347
size: 22.98 MiB
in-pack: 46628601

Wow, 46M objects in packs?


packs: 240
size-pack: 12.91 GiB
prune-packable: 106
garbage: 1
size-garbage: 0 bytes
$ git show-ref | wc -l
2830244


2.8M refs is a lot. How many refs does the repo have under refs/heads/*?
(The number of objects and the number of heads determine how big your bitmap is.)

In our experiments and experience, a very large bitmap isn’t effective and would actually cause more issues than benefits.

How large is your bitmap file on disk?
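Both numbers can be collected with a short script. This is a minimal sketch with hypothetical helper names: it counts refs per namespace from the output of `git show-ref` (lines of the form `<sha> <refname>`) and lists the sizes of the `*.bitmap` files under `objects/pack` in the repository's git directory.

```python
from collections import Counter
from pathlib import Path
from typing import Dict, Iterable


def count_refs_by_namespace(show_ref_lines: Iterable[str]) -> Counter:
    """Count refs per top-level namespace such as refs/heads or refs/tags."""
    counts: Counter = Counter()
    for line in show_ref_lines:
        parts = line.split()
        if len(parts) != 2:
            continue  # not a "<sha> <refname>" line
        namespace = "/".join(parts[1].split("/")[:2])
        counts[namespace] += 1
    return counts


def bitmap_sizes(git_dir: str) -> Dict[str, int]:
    """Map each bitmap file name under objects/pack to its size in bytes."""
    pack_dir = Path(git_dir) / "objects" / "pack"
    return {p.name: p.stat().st_size for p in sorted(pack_dir.glob("*.bitmap"))}
```

For example, save the output of `git show-ref` to a file, pass its lines to `count_refs_by_namespace`, and call `bitmap_sizes` with the path of the bare repository.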

Luca.




Matthias Sohn

Jun 21, 2024, 8:19:26 AM
to Luca Milanesio, Repo and Gerrit Discussion, Saša Živkov
This repository currently has 1187 branches and 1724 tags.
 
In our experiments and experience, a very large bitmap isn’t effective and would actually cause more issues than benefits.

How large is your bitmap file on disk?

It was several hundred MB.
So far we have used git to run gc via the k8s-gerrit gc.sh script [1].
I guess that's why the bitmap index was so large.
I am working on enhancing the gc container and this script so that gc can be run with JGit.

We used the following configuration until we switched off repack.writebitmaps to work around this problem:

[repack]
  usedeltabaseoffset = true
  writebitmaps = true
[pack]
  compression = 9
  indexversion = 2
  window = 250
  depth = 50
  threads = 32
[gc]
  autodetach = false
  autopacklimit = 0
  packrefs = true
  reflogexpire = never
  reflogexpireunreachable = never
  auto = 0
  preserveoldpacks = true
  prunepreserved = true
  cruftPacks = true
  pruneExpire = 2.weeks.ago 

-Matthias

Matthias Sohn

Jun 21, 2024, 8:20:14 AM
to Luca Milanesio, Repo and Gerrit Discussion, Saša Živkov
So far we have used git to run gc via the k8s-gerrit gc.sh script [2].
I guess that's why the bitmap index was so large.
I am working on enhancing the gc container and this script so that gc can be run with JGit.

We used the following configuration until we switched off repack.writebitmaps to work around this problem:

[repack]
  usedeltabaseoffset = true
  writebitmaps = true
[pack]
  compression = 9
  indexversion = 2
  window = 250
  depth = 50
  threads = 32
[gc]
  autodetach = false
  autopacklimit = 0
  packrefs = true
  reflogexpire = never
  reflogexpireunreachable = never
  auto = 0
  preserveoldpacks = true
  prunepreserved = true
  cruftPacks = true
  pruneExpire = 2.weeks.ago 

Luca Milanesio

Jun 21, 2024, 8:56:09 AM
to Repo and Gerrit Discussion, Luca Milanesio, Saša Živkov, Matthias Sohn
You may want to try my JGit GC bitmap generation improvement at [2], which allowed me to make a similar mono-repo *a lot more efficient* in terms of bitmap generation and use.

HTH

Luca.
