2.6rc3 issue: one Gerrit thread uses high CPU and "git clone" is very slow


Chunlin Zhang

May 23, 2013, 11:57:03 PM
to repo-d...@googlegroups.com
The details:
- After running for some time, Gerrit becomes very slow for git clone and repo upload. The slow steps are "Counting objects" and "Finding sources", or "remote: Resolving deltas" during repo upload, so users have to wait a long time to get the source code.
- I used top/htop to check the system load, but CPU and disk I/O look unremarkable: most CPUs are idle and the "wa" value in top is low.
- However, I noticed that whenever Gerrit becomes slow, one Gerrit thread is always using a lot of CPU time.
- After restarting Gerrit,

I have seen this issue for some days, since upgrading to 2.6rc0; I never saw it with Gerrit 2.5. So I upgraded to 2.6rc3, but the problem is still the same.

How can I diagnose this issue? I have no idea what the busy Gerrit thread is doing, and I cannot find anything useful in the error log.

Matthias Sohn

May 24, 2013, 4:00:16 AM
to Chunlin Zhang, Repo and Gerrit Discussion
Create a thread dump in order to find out what the busy thread is doing;
kill -3 <processid> is your friend, or use jstack <processid>.

When did you gc last time? Did you use jgit's gc, which creates the shiny new bitmap indexes?

--
Matthias

Chunlin Zhang

May 24, 2013, 9:57:27 AM
to Matthias Sohn, Repo and Gerrit Discussion
I will try jstack, thank you!

> When did you gc last time? Did you use jgit's gc, which creates the shiny new bitmap indexes?

I just run a nightly script that runs "git gc" on the big .git directories (> 15G now). Does "jgit's gc" mean the "gerrit gc" command?


Doug Kelly

May 24, 2013, 11:11:24 AM
to repo-d...@googlegroups.com, Matthias Sohn
15GB? WOW! That's a large repo! Even some of our "larger" repos here only have .git directories on the order of 2-3GB. If you have files that large, I'd say there's a pretty good chance you're running into some of the problems previously mentioned on the list [1]. The receive.maxObjectSizeLimit setting might be relevant to the issue (receiving large binaries can be a problem, from what I've seen). Also, another post [2] discusses similar issues with repositories of this size (though it also discusses some of the performance improvements made to address such concerns about a year ago).

I think what you might find in jstack is that there are threads stuck in receive-pack.  This will eat up tons of CPU, but they won't show up in the work queue or connection list (I believe the underlying ssh connection died, but Gerrit gets "stuck" waiting for data).  A restart of Gerrit will clear the issue as a temporary fix, but like you found--it will come back, and it tells very little of the root cause.  It would be nice to find the root cause, as this has been known to hit us, too, and the best weapon we have is a restart of Gerrit.  Our symptom is the box will appear with a very high load average, but with plenty of CPU and RAM available--but the queue will become extremely long, and it will even take time to accept new items into the queue (the ssh connection will just hang for minutes at a time).

Good luck!

--Doug

Matthias Sohn

May 25, 2013, 4:27:42 AM
to Chunlin Zhang, Repo and Gerrit Discussion
gerrit gc uses jgit's gc implementation, which creates bitmap indexes that native git
doesn't (yet) have. This helps Gerrit serve repos a lot faster.

You can also run it from jgit's command line:
$ jgit debug-gc
Use a recent nightly version [1], which knows how to create bitmap indexes,

or the Kepler M7 version:

--
Matthias

Chunlin Zhang

May 28, 2013, 5:50:51 AM
to Repo and Gerrit Discussion
(Forgot to reply all...)
Today the hang happened again.
I could not find useful information in the output of "jstack <pid>"; see p.txt in the attachment.
I also used kill -3 <pid>, but the thread still uses a lot of CPU time, so kill -3 did not work.
So in the end I had to restart Gerrit as a temporary fix.


Chunlin Zhang

May 28, 2013, 5:55:31 AM
to Martin Fick, Repo and Gerrit Discussion



On Fri, May 24, 2013 at 10:13 PM, Martin Fick <mf...@codeauror.org> wrote:
> Search the list for .noz files.

I cannot find any .noz files in my Gerrit git directory, so is this not the cause of my issue?

Saša Živkov

May 28, 2013, 7:57:40 AM
to Chunlin Zhang, Repo and Gerrit Discussion
On Tue, May 28, 2013 at 11:50 AM, Chunlin Zhang <zhangc...@gmail.com> wrote:



> Today the hang happened again.

Was the CPU usage high at the time when the hang happened?
Did you again identify the thread with the high CPU usage?

> I could not find useful information in the output of "jstack <pid>"; see p.txt in the attachment.

What did you search for there?
You should find the thread with the ID identified in the previous step (top -H).

> I also used kill -3 <pid>, but the thread still uses a lot of CPU time, so kill -3 did not work.

Matthias mentioned "kill -3 <pid>" as an alternative way to create a thread dump.
kill -3 is not a magic tool that will reduce the CPU usage.

> So in the end I had to restart Gerrit as a temporary fix.


--
--
To unsubscribe, email repo-discuss...@googlegroups.com
More info at http://groups.google.com/group/repo-discuss?hl=en
 
---
You received this message because you are subscribed to the Google Groups "Repo and Gerrit Discussion" group.
To unsubscribe from this group and stop receiving emails from it, send an email to repo-discuss...@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.
 
 

Chunlin Zhang

May 28, 2013, 10:25:24 AM
to Saša Živkov, Repo and Gerrit Discussion
On Tue, May 28, 2013 at 7:57 PM, Saša Živkov <ziv...@gmail.com> wrote:



> Was the CPU usage high at the time when the hang happened?
> Did you again identify the thread with the high CPU usage?

Yes, I have observed this several times: once Gerrit becomes slow, there is always one thread with high CPU usage. I use htop to see the thread details; the strange thread has used, for example, 6+ hours of CPU time, while the other threads have used only some minutes or seconds.

> What did you search for there?
> You should find the thread with the ID identified in the previous step (top -H).

Yes. When I ran "jstack <pid>" it printed an error message, so I used "jstack -F <pid>". The pid I used is that of the strange thread I saw in htop.

> Matthias mentioned "kill -3 <pid>" as an alternative way to create a thread dump.
> kill -3 is not a magic tool that will reduce the CPU usage.

I misunderstood this; I will try it next time.

Doug Kelly

May 28, 2013, 10:31:05 AM
to repo-d...@googlegroups.com, Martin Fick
I think your attachments got filtered, so I didn't see the stack dump. However, seeing as this happened to us a few times last week: the files are named "noz*" and "incoming_*.pack" under the repository. Sorry if there was any confusion!

Specifically, the "stuck" state I tend to see is:

"SSH git-receive-pack '/repo' (username)" prio=10 tid=0x0000000043508800 nid=0x57ac runnable [0x00007fefd6cea000]
   java.lang.Thread.State: RUNNABLE
                at java.util.zip.Inflater.inflateBytes(Native Method)
                at java.util.zip.Inflater.inflate(Inflater.java:238)

(and the stack continues for quite a while...)

--Doug
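
For anyone who wants to check a site for such leftovers, here is a minimal sketch that walks a directory tree and lists files matching the two name patterns mentioned above ("noz*" and "incoming_*.pack"). It is only an illustration: the function name find_leftover_files is made up for this example, and it does not tell you whether a file is stale or still being written by a running push.

```python
import fnmatch
import os

# File name patterns taken from the message above.
LEFTOVER_PATTERNS = ("noz*", "incoming_*.pack")


def find_leftover_files(git_root):
    """Walk git_root and return the sorted paths of all files whose
    names match one of LEFTOVER_PATTERNS.

    Hypothetical helper for illustration only; it just matches names.
    """
    matches = []
    for dirpath, _dirnames, filenames in os.walk(git_root):
        for name in filenames:
            if any(fnmatch.fnmatch(name, pat) for pat in LEFTOVER_PATTERNS):
                matches.append(os.path.join(dirpath, name))
    return sorted(matches)
```

You would call it with the directory that holds your repositories, for example find_leftover_files("/path/to/review_site/git") (a placeholder path), and inspect the result by hand before deleting anything.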

Saša Živkov

May 28, 2013, 10:38:50 AM
to Chunlin Zhang, Repo and Gerrit Discussion
On Tue, May 28, 2013 at 4:25 PM, Chunlin Zhang <zhangc...@gmail.com> wrote:



> Yes. When I ran "jstack <pid>" it printed an error message, so I used "jstack -F <pid>". The pid I used is that of the strange thread I saw in htop.

I think you have to use the pid of the Gerrit process, not the pid of that thread.

Chunlin Zhang

May 28, 2013, 11:05:34 AM
to Saša Živkov, Repo and Gerrit Discussion
On Tue, May 28, 2013 at 10:38 PM, Saša Živkov <ziv...@gmail.com> wrote:
> I think you have to use the pid of the Gerrit process, not the pid of that thread.

You are right! I can see useful stack details now. The only problem is that I don't know which "java.lang.Thread.State" entry corresponds to the strange thread I see in htop.

Chunlin Zhang

May 28, 2013, 11:30:44 AM
to Repo and Gerrit Discussion, Matthias Sohn
On Fri, May 24, 2013 at 11:11 PM, Doug Kelly <doug...@gmail.com> wrote:
> 15GB? WOW! That's a large repo! Even some of our "larger" repos here only have .git directories on the order of 2-3GB.

(Sorry, I forgot to "reply to all" again...)
I should explain that I had to run gc because there was once a big .git directory of over 100G. There was not enough memory to handle it, so more than 10G of swap was used and the whole system became very slow; in the end I had to run "shutdown -r".
After git gc, that big .git shrank to a little over 4G.
After that issue was fixed, Gerrit v2.5 worked well for months, until the last 3 or 4 weeks. I think the problem began after I upgraded Gerrit to 2.6.


Saša Živkov

May 28, 2013, 11:47:56 AM
to Chunlin Zhang, Repo and Gerrit Discussion
Convert the pid from htop to a hexadecimal number and then search for that number
in the thread dump.
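
As a rough sketch of that lookup: HotSpot thread dumps print the native thread id as "nid=0x..." in each thread's header line, so the decimal id shown by top -H or htop can be converted to hex and matched against those headers. The helper names and the thread id 7280 below are only example choices, and the sample dump is two header lines in the usual jstack format rather than a real capture.

```python
import re


def tid_to_nid(tid):
    """Format a decimal OS thread id (as shown by top -H / htop) the way
    a thread dump prints it: lowercase hex with a 0x prefix."""
    return hex(tid)


def find_thread_headers(dump_text, tid):
    """Return the lines of dump_text that contain nid=<hex of tid>."""
    pattern = re.compile(r"\bnid=" + re.escape(tid_to_nid(tid)) + r"\b")
    return [line for line in dump_text.splitlines() if pattern.search(line)]


# Two sample header lines in jstack format (illustrative input only).
sample_dump = '''\
"SSH git-receive-pack '/repo' (username)" prio=10 tid=0x0000000043508800 nid=0x57ac runnable [0x00007fefd6cea000]
   java.lang.Thread.State: RUNNABLE
"VM Thread" prio=10 tid=0x00007faf28074000 nid=0x1c70 runnable
'''

print(tid_to_nid(7280))  # 0x1c70
for header in find_thread_headers(sample_dump, 7280):
    print(header)
```

Note that the dump itself must be taken with the pid of the Gerrit JVM process; only the search uses the id of the busy thread.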

Shawn Pearce

May 28, 2013, 11:49:13 AM
to Chunlin Zhang, Martin Fick, Repo and Gerrit Discussion
In <= 2.5 Gerrit created .noz files unless you configured
core.streamFileThreshold in gerrit.config to be sufficiently high
(e.g. 2047m).

In 2.6, if you don't specify core.streamFileThreshold Gerrit defaults
to the smaller of 25% of the heap or 2047M. It logs this setting at
startup with "Defaulting core.streamFileThreshold to ...". I think the
rule here is large enough to never create noz files for most
installations.
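
To make the 2.6 default rule concrete, here is a small sketch of the arithmetic only. It is not Gerrit's code: the function name is made up, "M" is assumed to mean MiB, and the heap size is assumed to equal the -Xmx value (the JVM may report a slightly different maximum). The two heap sizes are example values.

```python
MIB = 1024 * 1024


def default_stream_file_threshold(max_heap_bytes):
    """Smaller of 25% of the heap or 2047 MiB, following the rule
    described above. Hypothetical helper, not taken from Gerrit."""
    return min(max_heap_bytes // 4, 2047 * MIB)


for heap_mib in (7168, 15360):
    threshold = default_stream_file_threshold(heap_mib * MIB)
    print(f"-Xmx{heap_mib}m -> {threshold // MIB}m")
# -Xmx7168m -> 1792m
# -Xmx15360m -> 2047m
```

So with a heap of about 7 GiB the default would be below 2047m, and the cap only applies once a quarter of the heap exceeds it. The value Gerrit actually picked is the one it logs at startup.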

Shawn Pearce

May 28, 2013, 11:50:46 AM
to Doug Kelly, repo-discuss, Martin Fick
On Tue, May 28, 2013 at 7:31 AM, Doug Kelly <doug...@gmail.com> wrote:
> I think your attachments got filtered, so I didn't see the stack dump
> However, seeing as this happened to us a few times last week, the files are
> named "noz*" and "incoming_*.pack" under the repository. Sorry if there was
> any confusion!
>
> Specifically, the "stuck" state I tend to see is:
>
> "SSH git-receive-pack '/repo' (username)" prio=10 tid=0x0000000043508800
> nid=0x57ac runnable [0x00007fefd6cea000]
>
> java.lang.Thread.State: RUNNABLE
>
> at java.util.zip.Inflater.inflateBytes(Native Method)
>
> at java.util.zip.Inflater.inflate(Inflater.java:238)
>

Someone reported on the Git mailing list that JGit has an infinite
loop under certain inflation conditions. They managed to create a
reproduction case, but it requires a 1 GiB Git repository... and I am
still waiting on the details to be posted to the JGit bug tracker. It's
possible that is what this is.

Doug Kelly

May 28, 2013, 12:06:16 PM
to repo-d...@googlegroups.com


On Tuesday, May 28, 2013 10:50:46 AM UTC-5, Shawn Pearce wrote:
> Someone reported on the Git mailing list that JGit has an infinite
> loop under certain inflation conditions. They managed to create a
> reproduction case, but it requires a 1 GiB Git repository... and I am
> still waiting on the details to be posted to the JGit bug tracker. It's
> possible that is what this is.

Not to go off topic too much, but that seems entirely likely.  We have a user here that seems to be able to reproduce the issue reliably for us (as much as we'd rather he didn't) -- I'll have to keep up on the JGit bug tracker and follow that issue as details emerge, but thanks for the heads up.  The repo that most recently caused the infinite loop for us was only 400MB, though, (at least, it was after I cleared out all the incoming .pack files).

Chunlin Zhang

May 28, 2013, 9:28:58 PM
to Repo and Gerrit Discussion
 The "core" part of my gerrit.config is:
'''
[core] 
        packedGitOpenFiles = 4096
        packedGitLimit = 2g
        packedGitWindowSize = 16k
        streamFileThreshold = 2047m
'''
So should I set streamFileThreshold higher?
I will try setting it to 4096m the next time I restart Gerrit.

Chunlin Zhang

May 28, 2013, 9:39:32 PM
to Saša Živkov, Repo and Gerrit Discussion
Thank you for your help. I will try it the next time Gerrit becomes slow.

Chunlin Zhang

Jun 3, 2013, 12:29:45 AM
to Saša Živkov, Doug Kelly, Repo and Gerrit Discussion
After I ran gc on every ".git" larger than 100MB, the issue did not happen for days, and I thought it had been solved.
But today the hang happened again.
I could not find useful information in the Java stack trace, so after collecting the information below I restarted Gerrit again.

1. Java stack trace: see the attachment. From the htop snapshot in #3 we can see that the busy thread ID is 7280, which is 0x1c70 in hex, but there is no stack trace for this thread.

2. .noz file search result: I could not find any .noz file, but I did find one noz9056333443909665782.tmp:
gerrit2@smart:~/review_site/git$ find .|grep noz
./projects/mt6575-pad-ics.git/objects/noz9056333443909665782.tmp
gerrit2@smart:~/review_site/git$ ll ./projects/mt6575-pad-ics.git/objects/noz9056333443909665782.tmp
-rw-r--r-- 1 gerrit2 gerrit2 2 2012-05-29 15:19 ./projects/mt6575-pad-ics.git/objects/noz9056333443909665782.tmp

3. htop snapshot:
(After the issue appeared, I observed that thread 7280's CPU usage is sometimes 80+% and sometimes 160+%, but always high, while the other threads' CPU usage is low most of the time.)

  1  [                                                                0.0%]     Tasks: 424 total, 2 running
  2  [                                                                0.0%]     Load average: 1.82 1.91 1.74
  3  [                                                                0.0%]     Uptime: 109 days(!), 02:47:50
  4  [||                                                              1.0%]
  5  [|                                                               0.2%]
  6  [|                                                               0.2%]
  7  [                                                               -nan%]
  8  [|                                                               0.6%]
  9  [||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||100.0%]
  10 [                                                               -nan%]
  11 [                                                                0.0%]
  12 [                                                                0.0%]
  13 [                                                               -nan%]
  14 [|                                                               0.2%]
  15 [                                                                0.0%]
  16 [|||||                                                           5.2%]
  17 [                                                                0.0%]
  18 [                                                                0.0%]
  19 [                                                                0.0%]
  20 [                                                                0.0%]
  21 [                                                                0.0%]
  22 [                                                                0.0%]
  23 [                                                                0.0%]
  24 [                                                                0.0%]
  Mem[|||||||||||||||||||||||||||||||||||||||||||||||||||||||11445/16002MB]
  Swp[|                                                        123/42332MB]

  PID USER     PRI  NI  VIRT   RES   SHR S CPU% MEM%   TIME+  Command
 7293 gerrit2   20   0 8189M 7383M  4880 S  0.0 46.1  0:00.00  |   `- GerritCodeReview -Xmx7168m -jar /home/gerrit2/review_site/bin/gerrit.war daemon -d /hom
 7287 gerrit2   20   0 8189M 7383M  4880 S  0.0 46.1  3:31.18  |   `- GerritCodeReview -Xmx7168m -jar /home/gerrit2/review_site/bin/gerrit.war daemon -d /hom
 7286 gerrit2   20   0 8189M 7383M  4880 S  0.0 46.1  0:00.00  |   `- GerritCodeReview -Xmx7168m -jar /home/gerrit2/review_site/bin/gerrit.war daemon -d /hom
 7285 gerrit2   20   0 8189M 7383M  4880 S  0.0 46.1  0:59.45  |   `- GerritCodeReview -Xmx7168m -jar /home/gerrit2/review_site/bin/gerrit.war daemon -d /hom
 7284 gerrit2   20   0 8189M 7383M  4880 S  0.0 46.1  1:01.81  |   `- GerritCodeReview -Xmx7168m -jar /home/gerrit2/review_site/bin/gerrit.war daemon -d /hom
 7283 gerrit2   20   0 8189M 7383M  4880 S  0.0 46.1  0:00.00  |   `- GerritCodeReview -Xmx7168m -jar /home/gerrit2/review_site/bin/gerrit.war daemon -d /hom
 7282 gerrit2   20   0 8189M 7383M  4880 S  0.0 46.1  0:24.40  |   `- GerritCodeReview -Xmx7168m -jar /home/gerrit2/review_site/bin/gerrit.war daemon -d /hom
 7281 gerrit2   20   0 8189M 7383M  4880 S  0.0 46.1  0:07.71  |   `- GerritCodeReview -Xmx7168m -jar /home/gerrit2/review_site/bin/gerrit.war daemon -d /hom
 7280 gerrit2   20   0 8189M 7383M  4880 R 87.0 46.1 10h08:00  |   `- GerritCodeReview -Xmx7168m -jar /home/gerrit2/review_site/bin/gerrit.war daemon -d /hom
 7279 gerrit2   20   0 8189M 7383M  4880 S  0.0 46.1 17:50.33  |   `- GerritCodeReview -Xmx7168m -jar /home/gerrit2/review_site/bin/gerrit.war daemon -d /hom
 7278 gerrit2   20   0 8189M 7383M  4880 S  0.0 46.1 17:49.65  |   `- GerritCodeReview -Xmx7168m -jar /home/gerrit2/review_site/bin/gerrit.war daemon -d /hom
 7277 gerrit2   20   0 8189M 7383M  4880 S  0.0 46.1 17:50.05  |   `- GerritCodeReview -Xmx7168m -jar /home/gerrit2/review_site/bin/gerrit.war daemon -d /hom
 7276 gerrit2   20   0 8189M 7383M  4880 S  0.0 46.1 17:50.36  |   `- GerritCodeReview -Xmx7168m -jar /home/gerrit2/review_site/bin/gerrit.war daemon -d /hom
 7275 gerrit2   20   0 8189M 7383M  4880 S  0.0 46.1 17:50.58  |   `- GerritCodeReview -Xmx7168m -jar /home/gerrit2/review_site/bin/gerrit.war daemon -d /hom
p3.7z

Doug Kelly

Jun 3, 2013, 12:58:03 AM
to Chunlin Zhang, Saša Živkov, Repo and Gerrit Discussion
Well, it is an interesting stack trace--not the problem I've seen specifically, though it looks like at the moment you took this snapshot, two users are pushing changes into Gerrit, and a large number more (37) are fetching from repositories.  Unfortunately I don't have any outright suggestions, other than perhaps looking at the output of "gerrit show-connections" (run from the gerrit server's ssh port) and also "gerrit show-queue" and maybe comparing this over time--perhaps it will yield some further ideas.

Lundh, Gustaf

Jun 3, 2013, 5:13:28 AM
to Chunlin Zhang, Saša Živkov, Doug Kelly, Repo and Gerrit Discussion

With large Gits and lots of traffic, I’m wondering if  “-Xmx7168m” is really enough. Could you run jvisualvm (included in the JDK) on the server and take a look at how hard the Java GC is working?

 

You could also use the JVisualvm Sampler feature to see where Gerrit spends most of its CPU-cycles. That could certainly give us a hint on what’s going on.

 

JVisualVM Sampler Hints:

 

Run $JDK/bin/jvisualvm ->
Select application (Gerrit) ->
Sampler Tab ->
Sample: CPU ->
Wait a few minutes ->
Click “Snapshot” ->
Click “Hotspot Tab” at the bottom ->
[Take Screenshot 1] ->
Right click the item with most “Self time” ->
Select “Show Back Traces” ->
[Take Screenshot 2]

 

Upload the screenshots to some other image sharing site (like imgur) and post to this thread.

 

/Gustaf


Saša Živkov

Jun 3, 2013, 8:55:50 AM
to Chunlin Zhang, Doug Kelly, Repo and Gerrit Discussion
On Mon, Jun 3, 2013 at 6:29 AM, Chunlin Zhang <zhangc...@gmail.com> wrote:
> 1. Java stack trace: see the attachment. From the htop snapshot in #3 we can see that the busy thread ID is 7280, which is 0x1c70 in hex, but there is no stack trace for this thread.

There is no stack trace but the thread is listed:

"VM Thread" prio=10 tid=0x00007faf28074000 nid=0x1c70 runnable 

The "VM Thread" could have been running the GC.
Maybe increasing the heap size would help.

Chunlin Zhang

Jun 9, 2013, 5:57:41 AM
to Lundh, Gustaf, Saša Živkov, Doug Kelly, Repo and Gerrit Discussion
The hang happened again just now. I used your method and took 2 screenshots:

callstack.png
hotspot.png

Chunlin Zhang

Aug 12, 2013, 2:17:34 AM
to Lundh, Gustaf, Saša Živkov, Doug Kelly, Repo and Gerrit Discussion
On Mon, Jun 3, 2013 at 5:13 PM, Lundh, Gustaf <Gustaf...@sonymobile.com> wrote:

> With large Gits and lots of traffic, I’m wondering if “-Xmx7168m” is really enough. Could you run jvisualvm (included in the JDK) on the server and take a look at how hard the Java GC is working?

We moved to a new server 2 weeks ago. The hang has happened twice in these 2 weeks, so it is less frequent than before (on the old server it happened once every 1 or 2 days).

When we started using the new server, I configured Gerrit to run with "-Xmx15360m". After the hang happened for the first time, I changed it to "-Xmx20480m", but this morning it happened again, so I have now set it to "-Xmx30720m".

I think memory may be the root cause of the hang: when it happened this morning, I saw in JVisualVM that Gerrit had used all 20G of RAM, so maybe there is not enough memory and the Java GC is working hard.