On 9 Apr 2021, at 18:46, motorhe...@gmail.com <motorhe...@gmail.com> wrote:

We are currently investigating a repeating pattern of events on our Gerrit master running version 3.2.7 as follows:
- increased CPU utilization (load average rises from a normal ~5/5/5 to 20/20/20 for long periods)
- increased Java GC activity
  - JMX GC time >750 ms (normally never >50 ms)
  - JMX GC count (rate) behaves abnormally:
    - PS MarkSweep (old generation collector) >0.05 (normally never >0.004)
    - PS Scavenge (young generation collector) stuck at 0 (normally ~0.05)
    - normally, MarkSweep and Scavenge run concurrently for long periods without incident
- each spike lasts ~45 min and then settles down
- it stays settled for only ~5 min before spiking for another ~45 min
- once started, this cycle continues until the master is restarted
- ssh port 29418 becomes very slow
- replication delay and queue size escalate
- web UI becomes sluggish and later unresponsive
Eventually we must restart the service (gerrit.sh restart)
- after a restart, Gerrit appears to run fine for a few days before this all starts happening again (for this reason we cannot call the restart a fix; it is more of a temporary workaround)
- issue tends to occur on Monday (or after a weekend), but not always
- does not occur at any fixed time of day (01:00 on one occasion, 10:30 the next, etc.)
Has anyone else observed a similar pattern of behavior?
Can anyone suggest methods to make it more stable?

Master server details:
Gerrit version 3.2.7 (jetty container, hosting ~5K projects, ~1.1TB total size)
openjdk version 1.8.0_282
RHEL 7U9 (Maipo)
CPU: 64 cores x86_64
RAM: 128 GB

We originally chose these system specs based on the Gerrit performance tuning guide (http://ctf-dev-environment-vagrant.s3.amazonaws.com/Gerrit-Performance-Tuning-Cheat-Sheet.pdf), which dates back to Gerrit 2.15. Is this document still applicable to Gerrit 3.2?
Regards,
Robert Gregory
Systems Administrator, SW Infrastructure, AMD
--
--
To unsubscribe, email repo-discuss...@googlegroups.com
More info at http://groups.google.com/group/repo-discuss?hl=en
---
You received this message because you are subscribed to the Google Groups "Repo and Gerrit Discussion" group.
To unsubscribe from this group and stop receiving emails from it, send an email to repo-discuss...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/repo-discuss/6fcd0942-af9a-44e6-95fa-d9189df87efan%40googlegroups.com.
Hi,

Could you show me how to get the following values?
- JMX GC time
- JMX GC Count (rate)
- PS MarkSweep (old generation collector)
- PS Scavenge (young generation collector)
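A minimal sketch of one way to collect these numbers, assuming the Gerrit JVM has remote JMX enabled (for example by passing `-Dcom.sun.management.jmxremote.port=...` and the related security flags through `container.javaOptions` in gerrit.config; the port and security settings are your choice). It reads the standard `GarbageCollectorMXBean`s, which on a JDK 8 JVM using the default Parallel collector are named "PS Scavenge" and "PS MarkSweep", samples their cumulative count and time twice, and prints the difference. The class name `GcSampler`, the 60-second default interval, and reading "rate" as collections per second are assumptions for illustration; the original poster's monitoring tool may compute these values differently. The same counters can also be watched interactively in JConsole or VisualVM, and `jstat -gcutil <pid>` reports young and full GC counts and times from the command line.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.management.MBeanServerConnection;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

public class GcSampler {

    /** Cumulative collection count and collection time (ms) of one collector. */
    static final class Snapshot {
        final long count;
        final long timeMs;

        Snapshot(long count, long timeMs) {
            this.count = count;
            this.timeMs = timeMs;
        }
    }

    /** Reads the cumulative counters of every collector, keyed by name (e.g. "PS Scavenge"). */
    static Map<String, Snapshot> read(List<GarbageCollectorMXBean> beans) {
        Map<String, Snapshot> out = new LinkedHashMap<>();
        for (GarbageCollectorMXBean gc : beans) {
            out.put(gc.getName(), new Snapshot(gc.getCollectionCount(), gc.getCollectionTime()));
        }
        return out;
    }

    /** Collections per second between two snapshots taken 'seconds' apart (assumed meaning of "rate"). */
    static double countRate(Snapshot before, Snapshot after, double seconds) {
        return (after.count - before.count) / seconds;
    }

    /** Milliseconds spent collecting between two snapshots. */
    static long timeDeltaMs(Snapshot before, Snapshot after) {
        return after.timeMs - before.timeMs;
    }

    public static void main(String[] args) throws Exception {
        if (args.length < 1) {
            System.err.println("usage: java GcSampler host:port [intervalSeconds]");
            System.exit(2);
        }
        long intervalSec = args.length > 1 ? Long.parseLong(args[1]) : 60;
        // Standard RMI connector URL exposed by -Dcom.sun.management.jmxremote.port
        JMXServiceURL url =
            new JMXServiceURL("service:jmx:rmi:///jndi/rmi://" + args[0] + "/jmxrmi");
        try (JMXConnector connector = JMXConnectorFactory.connect(url)) {
            MBeanServerConnection conn = connector.getMBeanServerConnection();
            // Proxies for the remote JVM's garbage collector MXBeans
            List<GarbageCollectorMXBean> beans =
                ManagementFactory.getPlatformMXBeans(conn, GarbageCollectorMXBean.class);
            Map<String, Snapshot> before = read(beans);
            Thread.sleep(intervalSec * 1000);
            Map<String, Snapshot> after = read(beans);
            for (Map.Entry<String, Snapshot> e : after.entrySet()) {
                Snapshot b = before.get(e.getKey());
                if (b == null) {
                    continue;
                }
                System.out.printf("%-15s count rate %.4f/s, GC time %d ms over %d s%n",
                    e.getKey(), countRate(b, e.getValue(), intervalSec),
                    timeDeltaMs(b, e.getValue()), intervalSec);
            }
        }
    }
}
```

Usage, with a hypothetical host and port: `java GcSampler gerrit.example.com:9010 60`. Sampling deltas rather than printing the raw cumulative totals makes it easier to spot the pattern described above, such as PS Scavenge sitting at 0 while PS MarkSweep climbs.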