Hello Everyone!
My team and I have been struggling with a problem where Jenkins stalls a lot between steps and stages, particularly in change validation (which we do with the Gerrit Trigger plugin). The periodic executions (daily, during the night) do not exhibit the same behaviour, although they do exactly the same work as the change validation.
This happens throughout all stages and all steps, but it is easiest to see in a single-step stage: the step (a bat command) takes less than 30 seconds to execute, yet Jenkins reports over 5 minutes of execution time for the stage.
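To put numbers on these stalls, the gaps between consecutive console log lines can be measured. The following is only a minimal sketch, not something we run in production: it assumes the console log was saved with a timestamp prefix of the form `[2025-05-01T10:15:30.123Z]` on each line (the prefix format, the `find_gaps` name and the 60-second threshold are all illustrative assumptions and would need adjusting to the actual logs).

```python
import re
from datetime import datetime, timedelta

# Assumed line prefix: [2025-05-01T10:15:30.123Z] followed by the log text.
# This format is an assumption; adjust the pattern to match the real log.
TS_PATTERN = re.compile(
    r"^\[(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d{3})Z\]\s?(.*)$"
)


def find_gaps(lines, threshold=timedelta(seconds=60)):
    """Return (gap, line_before, line_after) for every pair of consecutive
    timestamped lines that are further apart than `threshold`."""
    gaps = []
    prev_ts, prev_text = None, None
    for line in lines:
        match = TS_PATTERN.match(line)
        if not match:
            continue  # skip lines without a recognisable timestamp
        ts = datetime.strptime(match.group(1), "%Y-%m-%dT%H:%M:%S.%f")
        if prev_ts is not None and ts - prev_ts > threshold:
            gaps.append((ts - prev_ts, prev_text, match.group(2)))
        prev_ts, prev_text = ts, match.group(2)
    return gaps


if __name__ == "__main__":
    # Hypothetical log excerpt, made up purely for illustration.
    sample = [
        "[2025-05-01T10:00:00.000Z] [Pipeline] bat",
        "[2025-05-01T10:00:25.000Z] done",
        "[2025-05-01T10:05:30.000Z] [Pipeline] }",
    ]
    for gap, before, after in find_gaps(sample):
        print(f"{gap} between {before!r} and {after!r}")
```

Comparing the gaps found in a change-validation build with those of a nightly build of the same pipeline should show whether the time is lost inside the steps or between them.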
A couple of months ago we upgraded to Jenkins 2.492.3 (from 2.319.1), and the problem started to manifest itself.
Recently, we bumped the controller's resources from 4 CPUs and 5 GB of RAM to 8 CPUs and 10 GB of RAM, with no discernible improvement in performance (for this particular case).
We have upwards of 100 declarative pipelines with the occasional script step (some of them multibranch), and we are using the "Performance-optimized" option in the Speed/Durability configuration.
We have 16 agents with a total of 45 executors, and at any given time we have ~20-30 concurrent executions throughout the day (and ~5 concurrent executions throughout the night).
Unfortunately our pipelines are long, with most of them taking ~1h.
We've gathered GC logs and have seen some worrisome patterns.
After the recent increase in resources, this pattern seems to have disappeared, but we are still experiencing the same long pauses, as I've said before.
Our current configurations to the JVM are: "-server -Xms8G -Xmx8G \
-XX:MaxDirectMemorySize=1G -XX:MaxMetaspaceSize=512M \
-XX:+UseG1GC -XX:+AlwaysPreTouch -XX:+ParallelRefProcEnabled -XX:+ExplicitGCInvokesConcurrent \
-XX:+UseStringDeduplication -XX:+UnlockDiagnosticVMOptions -XX:+UnlockExperimentalVMOptions"
With all that being said, I'm not sure how to continue with the analysis.
Pardon me for the long post, but any general advice either on how to fix or how to further debug this problem, would be much appreciated.
PS - I'm not sure if this mailing list is ok with screenshots, pardon me if not.
Best Regards,
Fábio Almeida
Platform Engineering Team Lead
SISCOG - Sistemas Cognitivos, SA
A Campo Grande, 378 - 3º, 1700-097 Lisboa, Portugal
T +351 217 529 100
W www.siscog.pt
Optimising the resources of the world
Thank you for your reply, Maciej. However, I don't think I have the same problem you are describing, since no artifacts are being archived into Jenkins itself; we archive things in Nexus at a later stage.
Additionally, as I said, the nightly jobs, which are identical to the change-validation jobs, aren't affected by this slowdown. At the moment, I'm convinced that this is a problem with either the change discovery done by the Gerrit Trigger plugin or a load problem (the latter is less likely, because doubling our resources had virtually no effect on the problem).