Potential memory leak in JGit

Luca Milanesio

Jun 11, 2019, 2:31:59 AM6/11/19
to JGit Developers list, Repo and Gerrit Discussion, Luca Milanesio
Hi all,
I am writing to both the JGit and Gerrit mailing lists because the problem could be of interest to both communities.

I am able to systematically reproduce a fatal memory leak that causes the complete collapse of the JVM.

Repos: the Android repos
Test-case: for (all repos); do clone and git GC; done

P.S. The traffic is serialised, so we can be certain that threads have finished their jobs before the next operation starts.

The scenario is quite simple: just generate constant traffic of continuous clones and JGit GC and, after a few hours, the JVM is completely dead.
By using G1GC and tuning it, I managed to avoid almost all the STW GC cycles. However, the memory utilised increases continuously and, in the end, the GC goes into an infinite loop trying to release something without being able to do so.
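
Roughly along these lines, as a simplified JGit-only sketch (project URLs and paths are made up; in the real test the clones are served by Gerrit over the wire and the GC runs inside Gerrit on the server-side repositories):

import java.io.File;
import java.util.Arrays;
import java.util.List;

import org.eclipse.jgit.api.Git;

public class CloneAndGcLoop {
  public static void main(String[] args) throws Exception {
    // Made-up project URLs; the real test iterates over all the Android repos.
    List<String> projects = Arrays.asList(
        "https://gerrit.example.com/platform/build",
        "https://gerrit.example.com/platform/frameworks/base");

    // One pass over all repos; the harness repeats this pass continuously.
    for (String url : projects) {
      File dir = new File("/tmp/clone-test",
          url.replaceAll("[^A-Za-z0-9]", "_") + "-" + System.nanoTime());
      try (Git git = Git.cloneRepository()
          .setURI(url)
          .setDirectory(dir)
          .setBare(true)
          .call()) {
        git.gc().call(); // JGit GC on the fresh clone, serialised with the clone
      }
    }
  }
}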

Sample of the GC log when the JVM is collapsed:
2019-06-11T08:15:27.340+0200: 28506.935: [Full GC (Allocation Failure)  90G->89G(100G), 106.4610300 secs]
   [Eden: 0.0B(35.0G)->0.0B(35.0G) Survivors: 0.0B->0.0B Heap: 90.5G(100.0G)->89.2G(100.0G)], [Metaspace: 61559K->61559K(65536K)]

And the JVM heap utilization:
<Screenshot 2019-06-11 at 07.18.59.png>

With regards to the memory dump, see the following top offenders:
<Screenshot 2019-06-11 at 07.22.05.png>

I have explicitly set low limits for caching packfiles in the gerrit.config:

[core]
   packedGitLimit = 10m
   packedGitOpenFiles = 16000
   packedGitWindowSize = 64k

In theory, JGit shouldn't hold huge amounts of memory for packfiles and should instead continuously read them from the filesystem. However, as you can see from the figures above, we have over 2M byte arrays holding over 22 GBytes of memory.
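
For context, those [core] settings map onto JGit's window cache; a minimal sketch of the programmatic equivalent (Gerrit applies them itself at startup, so this is only to show which JGit knobs are involved):

import org.eclipse.jgit.storage.file.WindowCacheConfig;

class WindowCacheSetup {
  // Sketch: apply the same limits as the gerrit.config [core] section above.
  static void apply() {
    WindowCacheConfig cfg = new WindowCacheConfig();
    cfg.setPackedGitLimit(10L * 1024 * 1024); // core.packedGitLimit = 10m
    cfg.setPackedGitOpenFiles(16000);         // core.packedGitOpenFiles
    cfg.setPackedGitWindowSize(64 * 1024);    // core.packedGitWindowSize = 64k
    cfg.install(); // install these limits as the process-wide window cache
  }
}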

I also have a ~/.gitconfig with a specific [pack] configuration:

[pack]
        maxDeltaDepth=5
        deltaSearchWindowSize=10
        deltaSearchMemoryLimit=0
        deltaCacheSize=52428800
        deltaCacheLimit=100
        compressionLevel=-1
        indexVersion=2
        bigFileThreshold=52428800
        threads=1
        reuseDeltas=true
        reuseObjects=true
        deltaCompress=false
        buildBitmaps=true
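
If I read the code correctly, JGit resolves these values through its PackConfig, built from the merged repository and user config; a small sketch to dump a few of the values actually in effect (accessor names assumed from the PackConfig API):

import org.eclipse.jgit.lib.Repository;
import org.eclipse.jgit.storage.pack.PackConfig;

class PackConfigCheck {
  // Sketch: print a few of the packing parameters JGit resolves from the
  // merged config (repository config plus the user ~/.gitconfig above).
  static void dump(Repository repo) {
    PackConfig pc = new PackConfig(repo); // reads the merged [pack] section
    System.out.println("maxDeltaDepth    = " + pc.getMaxDeltaDepth());
    System.out.println("deltaCacheSize   = " + pc.getDeltaCacheSize());
    System.out.println("bigFileThreshold = " + pc.getBigFileThreshold());
    System.out.println("threads          = " + pc.getThreads());
    System.out.println("buildBitmaps     = " + pc.isBuildBitmaps());
  }
}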

What sounds strange to me is the high number of LocalObjectToPack objects held in memory (9M of them): ObjectToPack should only be a temporary object used by a PackWriter, but I am not a super-expert on that part of the code.

Does this ring any bells for anyone?

Thanks for the feedback.
Luca.


Matthias Sohn

Jun 11, 2019, 3:39:41 AM6/11/19
to Luca Milanesio, JGit Developers list, Repo and Gerrit Discussion
On Tue, Jun 11, 2019 at 8:31 AM Luca Milanesio <luca.mi...@gmail.com> wrote:
Hi all,
I am writing to both the JGit and Gerrit mailing lists because the problem could be of interest to both communities.

I am able to systematically reproduce a fatal memory leak that causes the complete collapse of the JVM.

Repos: the Android repos
Test-case: for (all repos); do clone and git GC; done

Can you specify the exact test setup and share the test case itself ?
  • I guess you are serving from Gerrit ?
  • Which version of Gerrit and JGit are you using ?
  • Which JVM are you using ?
  • Which JVM option settings are you using ?
  • Do you run git gc using native git or JGit (embedded in Gerrit or standalone) ?
 
P.S. The traffic is serialised, so we can be certain that threads have finished their jobs before the next operation starts.

The scenario is quite simple: just generate constant traffic of continuous clones and JGit GC and, after a few hours, the JVM is completely dead.
By using G1GC and tuning it, I managed to avoid almost all the STW GC cycles. However, the memory utilised increases continuously and, in the end, the GC goes into an infinite loop trying to release something without being able to do so.

Sample of the GC log when the JVM is collapsed:
2019-06-11T08:15:27.340+0200: 28506.935: [Full GC (Allocation Failure)  90G->89G(100G), 106.4610300 secs]
   [Eden: 0.0B(35.0G)->0.0B(35.0G) Survivors: 0.0B->0.0B Heap: 90.5G(100.0G)->89.2G(100.0G)], [Metaspace: 61559K->61559K(65536K)]

And the JVM heap utilization:

With regards to the memory dump, see the following top offenders:


I have explicitly set low limits for caching packfiles in the gerrit.config:

[core]
   packedGitLimit = 10m
   packedGitOpenFiles = 16000
   packedGitWindowSize = 64k

In theory, JGit shouldn't hold huge amounts of memory for packfiles and should instead continuously read them from the filesystem. However, as you can see from the figures above, we have over 2M byte arrays holding over 22 GBytes of memory.

I also have a ~/.gitconfig with a specific [pack] configuration:

[pack]
        maxDeltaDepth=5
        deltaSearchWindowSize=10
        deltaSearchMemoryLimit=0
        deltaCacheSize=52428800

this means deltaCacheSize is 5 times larger than the JGit file cache; does that make sense ?
 
        deltaCacheLimit=100
        compressionLevel=-1
        indexVersion=2
        bigFileThreshold=52428800
        threads=1
        reuseDeltas=true
        reuseObjects=true
        deltaCompress=false
        buildBitmaps=true

What sounds strange to me is the high number of LocalObjectToPack objects held in memory (9M of them): ObjectToPack should only be a temporary object used by a PackWriter, but I am not a super-expert on that part of the code.

do these repositories contain large files ?
 
Does this ring any bells for anyone?

Thanks for the feedback.
Luca.



Luca Milanesio

Jun 11, 2019, 4:19:49 AM6/11/19
to Matthias Sohn, Luca Milanesio, JGit Developers list, Repo and Gerrit Discussion
Hi Matthias,
thanks for the quick reply. 

Answers inline.

On 11 Jun 2019, at 08:39, Matthias Sohn <matthi...@gmail.com> wrote:

On Tue, Jun 11, 2019 at 8:31 AM Luca Milanesio <luca.mi...@gmail.com> wrote:
Hi all,
I am writing to both the JGit and Gerrit mailing lists because the problem could be of interest to both communities.

I am able to systematically reproduce a fatal memory leak that causes the complete collapse of the JVM.

Repos: the Android repos
Test-case: for (all repos); do clone and git GC; done

Can you specify the exact test setup and share the test case itself ?
  • I guess you are serving from Gerrit ?

Yes, Gerrit v2.15.13

  • Which version of Gerrit and JGit are you using ?

JGit v4.9.10.201904181027-r

  • Which JVM are you using ?
openjdk full version "1.8.0_212-b04"

  • Which JVM option settings are you using ?
        javaOptions = -server
        javaOptions = -Djava.security.egd=file:/dev/./urandom -Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.port=9011 -Dcom.sun.management.jmxremote.rmi.port=9011 -Dcom.sun.management.jmxremote.local.only=false -Dcom.sun.management.jmxremote.ssl=false -Dcom.sun.management.jmxremote.authenticate=false
        javaOptions = -XX:+UnlockExperimentalVMOptions
        javaOptions = -XX:+UseG1GC
        javaOptions = -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintGCDateStamps -Xloggc:/home/gerrit/gerrit-sshd/logs/jvmgc_log
        javaOptions = -Xms100g
        javaOptions = -Xmx100g
        javaOptions = -XX:G1NewSizePercent=35
        javaOptions = -XX:MaxGCPauseMillis=500
        javaOptions = -XX:ParallelGCThreads=10
        javaOptions = -XX:ConcGCThreads=4
        javaOptions = -XX:InitiatingHeapOccupancyPercent=50
        javaOptions = -XX:G1HeapWastePercent=2
  • Do you run git gc using native git or JGit (embedded in Gerrit or standalone) ?
JGit, embedded in Gerrit

 
P.S. The traffic is serialised, so we can be certain that threads have finished their jobs before the next operation starts.

The scenario is quite simple: just generate constant traffic of continuous clones and JGit GC and, after a few hours, the JVM is completely dead.
By using G1GC and tuning it, I managed to avoid almost all the STW GC cycles. However, the memory utilised increases continuously and, in the end, the GC goes into an infinite loop trying to release something without being able to do so.

Sample of the GC log when the JVM is collapsed:
2019-06-11T08:15:27.340+0200: 28506.935: [Full GC (Allocation Failure)  90G->89G(100G), 106.4610300 secs]
   [Eden: 0.0B(35.0G)->0.0B(35.0G) Survivors: 0.0B->0.0B Heap: 90.5G(100.0G)->89.2G(100.0G)], [Metaspace: 61559K->61559K(65536K)]

And the JVM heap utilization:
<Screenshot 2019-06-11 at 07.18.59.png>

With regards to the memory dump, see the following top offenders:

<Screenshot 2019-06-11 at 07.22.05.png>

I believe there are problems in releasing PackWriters: I see 8k of them allocated :-O

From the code I see that *IF* PackWriter.close() isn't called, then the instance is still referenced from a static Map:

private static final Map<WeakReference<PackWriter>, Boolean> instances =
        new ConcurrentHashMap<>();

The map key is a WeakReference which should, in theory, eventually be reclaimed by the JVM ... but it doesn't seem to happen as expected.
PackWriter holds references to many objects, including the packfiles that are generated "on-the-fly" to serve client Git requests.
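
For code driving PackWriter directly, try-with-resources guarantees close() is called; a minimal, hypothetical sketch (the helper name and parameters are made up; my reading is that close() releases the reader and drops the writer's entry from that static map):

import java.io.OutputStream;
import java.util.Set;

import org.eclipse.jgit.internal.storage.pack.PackWriter;
import org.eclipse.jgit.lib.NullProgressMonitor;
import org.eclipse.jgit.lib.ObjectId;
import org.eclipse.jgit.lib.ProgressMonitor;
import org.eclipse.jgit.lib.Repository;

class PackStreamer {
  // Hypothetical helper: write a pack for 'want' given 'have', always
  // releasing the PackWriter so it doesn't linger in the static registry.
  static void stream(Repository repo, Set<ObjectId> want, Set<ObjectId> have,
      OutputStream out) throws Exception {
    ProgressMonitor pm = NullProgressMonitor.INSTANCE;
    try (PackWriter pw = new PackWriter(repo)) { // close() runs even on error
      pw.preparePack(pm, want, have);            // count and select objects
      pw.writePack(pm, pm, out);                 // stream the pack to the client
    }
  }
}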
 


I have explicitly set low limits for caching packfiles in the gerrit.config:

[core]
   packedGitLimit = 10m
   packedGitOpenFiles = 16000
   packedGitWindowSize = 64k

In theory, JGit shouldn't hold huge amounts of memory for packfiles and should instead continuously read them from the filesystem. However, as you can see from the figures above, we have over 2M byte arrays holding over 22 GBytes of memory.

I also have a ~/.gitconfig with a specific [pack] configuration:

[pack]
        maxDeltaDepth=5
        deltaSearchWindowSize=10
        deltaSearchMemoryLimit=0
        deltaCacheSize=52428800

this means deltaCacheSize is 5 times larger than the JGit file cache; does that make sense ?
 
        deltaCacheLimit=100
        compressionLevel=-1
        indexVersion=2
        bigFileThreshold=52428800
        threads=1
        reuseDeltas=true
        reuseObjects=true
        deltaCompress=false
        buildBitmaps=true

What sounds strange to me is the high number of LocalObjectToPack objects held in memory (9M of them): ObjectToPack should only be a temporary object used by a PackWriter, but I am not a super-expert on that part of the code.

do these repositories contain large files ?

Yes, Android has some large files.

Sven Selberg

Aug 12, 2019, 8:47:56 AM8/12/19
to Repo and Gerrit Discussion
Hi,

I stumbled across this thread while troubleshooting an issue we just had and thought I'd mention it here in case they are related:

All CPUs went full-throttle for about 30 minutes after which everything returned to normal.
From a heap-dump I took during the high load I got (we have a quite modest heap of  16.7 GB):

* LocalObjectToPack - 65M instances
* Byte[]            - 3M instances
* ObjectId          - 3M instances

I cannot however find any evidence of a JGit GC anywhere during the time-frame of the incident.

Gerrit: v2.16.8-106-g6d4cf6d026
JGit:    5.1.7.201904200442-r
Java:  1.8.0_212-b03 (Open JDK)

/Sven

On Tuesday, June 11, 2019 at 10:19:49 AM UTC+2, lucamilanesio wrote:
Hi Matthias,
thanks for the quick reply. 

Answers inline.

On 11 Jun 2019, at 08:39, Matthias Sohn <matthi...@gmail.com> wrote:


Luca Milanesio

Aug 12, 2019, 8:51:19 AM8/12/19
to Sven Selberg, Luca Milanesio, Repo and Gerrit Discussion

On 12 Aug 2019, at 13:47, Sven Selberg <sven.s...@axis.com> wrote:

Hi,

I stumbled across this thread when trouble-shooting an issue we just had and thought I'd mention it here in case they are related:

All CPUs went full-throttle for about 30 minutes after which everything returned to normal.
From a heap-dump I took during the high load I got (we have a quite modest heap of  16.7 GB):

* LocalObjectToPack - 65M instances
* Byte[]            - 3M instances
* ObjectId          - 3M instances

I cannot however find any evidence of a JGit GC anywhere during the time-frame of the incident.

Gerrit: v2.16.8-106-g6d4cf6d026
JGit:    5.1.7.201904200442-r
Java:  1.8.0_212-b03 (Open JDK)

Unfortunately, I did not have a lot of time to invest in it ... but I found a potential problem with the PackWriter.

JGit keeps a full list of PackWriters in a static field until the caller closes them (not sure why).
It looks like some of them are left open and, in the long run, you could end up with lots of PackWriters that are the top-level objects retaining the ones you found.
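
To illustrate the pattern with a simplified mock-up (not JGit's actual code): writers register themselves in a static map keyed by a WeakReference at construction time and are only removed on close(), so an unclosed writer that is still strongly reachable elsewhere keeps its whole object graph alive:

import java.lang.ref.WeakReference;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Simplified mock-up of the registry pattern (not JGit's actual code):
// instances register on construction and deregister only on close().
class TrackedWriter implements AutoCloseable {
  private static final Map<WeakReference<TrackedWriter>, Boolean> INSTANCES =
      new ConcurrentHashMap<>();

  private final WeakReference<TrackedWriter> selfRef = new WeakReference<>(this);

  TrackedWriter() {
    INSTANCES.put(selfRef, Boolean.TRUE); // registered for monitoring
  }

  @Override
  public void close() {
    INSTANCES.remove(selfRef); // only an explicit close() removes the entry
  }

  // The weak key means the registry itself never keeps a writer alive, but a
  // writer still strongly referenced elsewhere (e.g. by a request that never
  // completed) stays here and pins everything it points to.
  static long openCount() {
    long n = 0;
    for (WeakReference<TrackedWriter> ref : INSTANCES.keySet()) {
      if (ref.get() != null) {
        n++;
      }
    }
    return n;
  }
}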

As soon as I have more insights into the problem, I will publish them on the mailing list.

Luca.



Sven Selberg

Aug 12, 2019, 9:07:46 AM8/12/19
to Repo and Gerrit Discussion


On Monday, August 12, 2019 at 2:51:19 PM UTC+2, lucamilanesio wrote:


On 12 Aug 2019, at 13:47, Sven Selberg <sven....@axis.com> wrote:

Hi,

I stumbled across this thread when trouble-shooting an issue we just had and thought I'd mention it here in case they are related:

All CPUs went full-throttle for about 30 minutes after which everything returned to normal.
From a heap-dump I took during the high load I got (we have a quite modest heap of  16.7 GB):

* LocalObjectToPack - 65M instances
* Byte[]            - 3M instances
* ObjectId          - 3M instances

I cannot however find any evidence of a JGit GC anywhere during the time-frame of the incident.

Gerrit: v2.16.8-106-g6d4cf6d026
JGit:    5.1.7.201904200442-r
Java:  1.8.0_212-b03 (Open JDK)

Unfortunately, I did not have a lot of time to invest in it ... but I found a potential problem with the PackWriter.

JGit keeps a full list of PackWriters in a static field until the caller closes them (not sure why).
It looks like some of them are left open and, in the long run, you could end up with lots of PackWriters that are the top-level objects retaining the ones you found.

I had:
97 instances of PackWriter (and PackWriter$MutableState)

Under "normal" conditions I had:
12 instances of PackWriter (and PackWriter$MutableState)

 


Sven Selberg

Aug 12, 2019, 9:09:24 AM8/12/19
to Repo and Gerrit Discussion


On Monday, August 12, 2019 at 3:07:46 PM UTC+2, Sven Selberg wrote:


On Monday, August 12, 2019 at 2:51:19 PM UTC+2, lucamilanesio wrote:


On 12 Aug 2019, at 13:47, Sven Selberg <sven....@axis.com> wrote:

Hi,

I stumbled across this thread when trouble-shooting an issue we just had and thought I'd mention it here in case they are related:

All CPUs went full-throttle for about 30 minutes after which everything returned to normal.
From a heap-dump I took during the high load I got (we have a quite modest heap of  16.7 GB):

* LocalObjectToPack - 65M instances
* Byte[]            - 3M instances
* ObjectId          - 3M instances

I cannot however find any evidence of a JGit GC anywhere during the time-frame of the incident.

Gerrit: v2.16.8-106-g6d4cf6d026
JGit:    5.1.7.201904200442-r
Java:  1.8.0_212-b03 (Open JDK)

Unfortunately, I did not have a lot of time to invest in it ... but I found a potential problem with the PackWriter.

JGit keeps a full list of PackWriters in a static field until the caller closes them (not sure why).
It looks like some of them are left open and, in the long run, you could end up with lots of PackWriters that are the top-level objects retaining the ones you found.

I had:
97 instances of PackWriter (and PackWriter$MutableState)

Under "normal" conditions I had:
12 instances of PackWriter (and PackWriter$MutableState)

Under the same conditions I had ~600k instances of LocalObjectToPack

Luca Milanesio

Aug 12, 2019, 9:11:42 AM8/12/19
to Sven Selberg, Luca Milanesio, Repo and Gerrit Discussion

On 12 Aug 2019, at 14:07, Sven Selberg <sven.s...@axis.com> wrote:



On Monday, August 12, 2019 at 2:51:19 PM UTC+2, lucamilanesio wrote:


On 12 Aug 2019, at 13:47, Sven Selberg <sven....@axis.com> wrote:

Hi,

I stumbled across this thread when trouble-shooting an issue we just had and thought I'd mention it here in case they are related:

All CPUs went full-throttle for about 30 minutes after which everything returned to normal.
From a heap-dump I took during the high load I got (we have a quite modest heap of  16.7 GB):

* LocalObjectToPack - 65M instances
* Byte[]            - 3M instances
* ObjectId          - 3M instances

I cannot however find any evidence of a JGit GC anywhere during the time-frame of the incident.

Gerrit: v2.16.8-106-g6d4cf6d026
JGit:    5.1.7.201904200442-r
Java:  1.8.0_212-b03 (Open JDK)

Unfortunately, I did not have a lot of time to invest in it ... but I found a potential problem with the PackWriter.

JGit keeps a full list of PackWriters in a static field until the caller closes them (not sure why).
It looks like some of them are left open and, in the long run, you could end up with lots of PackWriters that are the top-level objects retaining the ones you found.

I had:
97 instances of PackWriter (and PackWriter$MutableState)

If you have open instances of PackWriter when there are no active receive/upload packs or GCs, then they must be leaks :-(
The PackWriter is just the tip of the iceberg; there are a lot more objects behind it, all eventually pointing back to it.

Luca.


