GoCD is slow, several git processes are running with high cpu usage

100 views
Skip to first unread message

Brandon V

unread,
Jan 11, 2023, 1:58:35 AM1/11/23
to go-cd
Hi all,

Our GoCD server is acting slow. We have >1000 pipelines that trigger automatically on updates to various branches of a Git repository. For each branch that needs to be picked up by GoCD, we have a distinct git material with shallow clone set to true. The time to trigger from Git updates is quite slow. The UI is also a bit sluggish in general.

Environment info:
Linux
GoCD server 21.1.0
PostgreSQL 12.8 database

The server was recently recreated updating the OS from Ubuntu 16.04 to Ubuntu 20.04. The most important pieces of configuration were restored from the previous server. However, it's possible some configuration was lost, or some package version was changed, or something like that. I'd appreciate any pointers.

While trying to debug this, I noticed a few things:

CPU utilization on the machine is hovering around 60%, when there are no jobs running.

The top consumers of CPU on the machine are git commands run by the GoCD server. At any given time there are about 8 instances of git processes like "git branch -r --contains <commit-sha>". Each of these git commands can be using a whole CPU.

Looking at things related to git, I noticed:

The GoCD server logs in /var/log/go-server/go-server.log have a lot these messages (referencing the different deployment branches):

2023-01-11 06:44:04,168 WARN  [ThreadPoolTaskScheduler-7] MaterialUpdateService:204 - [Material Update] Skipping update of material GitMaterial{url=g...@github.com:repo/app.git, branch='abranch', shallowClone=true, submoduleFolder='null'} which has been in-progress since Wed Jan 11 06:43:04 UTC 2023

Using jstack to get thread dumps from the server, this seems to be the java stack trace where those git processes are launched:

"130@MessageListener for MaterialUpdateListener" #130 daemon prio=5 os_prio=0 cpu=14618.25ms elapsed=100474.11s tid=0x00007fa2ded9dca0 nid=0xea6f in Object.wait()  [0x00007fa20c5ad000]
   java.lang.Thread.State: WAITING (on object monitor)
        at java.lang.Object.wait(java...@15.0.1/Native Method)
        - waiting on <no object reference available>
        at java.lang.Object.wait(java...@15.0.1/Unknown Source)
        at java.lang.ProcessImpl.waitFor(java...@15.0.1/Unknown Source)
        - locked <0x00000005370e58e0> (a java.lang.ProcessImpl)
        at com.thoughtworks.go.util.ProcessWrapper.waitForExit(ProcessWrapper.java:54)
        at com.thoughtworks.go.util.command.CommandLine.runOrBomb(CommandLine.java:354)
        at com.thoughtworks.go.util.command.CommandLine.runOrBomb(CommandLine.java:378)
        at com.thoughtworks.go.domain.materials.SCMCommand.runOrBomb(SCMCommand.java:38)
        at com.thoughtworks.go.domain.materials.git.GitCommand.containsRevisionInBranch(GitCommand.java:364)
        at com.thoughtworks.go.config.materials.git.GitMaterial.modificationsSince(GitMaterial.java:132)
        at com.thoughtworks.go.server.service.materials.GitPoller.modificationsSince(GitPoller.java:35)
        at com.thoughtworks.go.server.service.materials.GitPoller.modificationsSince(GitPoller.java:26)
        at com.thoughtworks.go.server.service.MaterialService.modificationsSince(MaterialService.java:134)
        at com.thoughtworks.go.server.materials.ScmMaterialUpdater.insertLatestOrNewModifications(ScmMaterialUpdater.java:56)
        at com.thoughtworks.go.server.materials.MaterialDatabaseUpdater.insertLatestOrNewModifications(MaterialDatabaseUpdater.java:157)
        at com.thoughtworks.go.server.materials.MaterialDatabaseUpdater.updateMaterialWithNewRevisions(MaterialDatabaseUpdater.java:149)
        at com.thoughtworks.go.server.materials.MaterialDatabaseUpdater$2.doInTransaction(MaterialDatabaseUpdater.java:108)
        at com.thoughtworks.go.server.transaction.TransactionCallback.doWithExceptionHandling(TransactionCallback.java:23)
        at com.thoughtworks.go.server.transaction.TransactionTemplate.lambda$executeWithExceptionHandling$2(TransactionTemplate.java:43)
        at com.thoughtworks.go.server.transaction.TransactionTemplate$$Lambda$1842/0x00000008045df9c8.doInTransaction(Unknown Source)
        at org.springframework.transaction.support.TransactionTemplate.execute(TransactionTemplate.java:133)
        at com.thoughtworks.go.server.transaction.TransactionTemplate.executeWithExceptionHandling(TransactionTemplate.java:40)
        at com.thoughtworks.go.server.materials.MaterialDatabaseUpdater.updateMaterial(MaterialDatabaseUpdater.java:105)
        - locked <0x0000000537f34488> (a java.lang.String)
        at com.thoughtworks.go.server.materials.MaterialUpdateListener.onMessage(MaterialUpdateListener.java:64)
        at com.thoughtworks.go.server.materials.MaterialUpdateListener.onMessage(MaterialUpdateListener.java:32)
        at com.thoughtworks.go.server.messaging.activemq.JMSMessageListenerAdapter.runImpl(JMSMessageListenerAdapter.java:83)
        at com.thoughtworks.go.server.messaging.activemq.JMSMessageListenerAdapter.run(JMSMessageListenerAdapter.java:63)
        at java.lang.Thread.run(java...@15.0.1/Unknown Source)



Any help is appreciated, thanks!

Ashwanth Kumar

unread,
Jan 11, 2023, 6:27:25 AM1/11/23
to go...@googlegroups.com
Can you check the avg. disk io on the machine?  Something like:

$ iostat -k 2

Also are you using SSDs or HDDs on the GoCD Server? If you're using HDDs see if you can run the same setup on SSDs to see if there is a difference.

Thanks,


--
You received this message because you are subscribed to the Google Groups "go-cd" group.
To unsubscribe from this group and stop receiving emails from it, send an email to go-cd+un...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/go-cd/3c3fe87e-5639-4316-9dfa-bc3dcf68d901n%40googlegroups.com.


--

Ashwanth Kumar / ashwanthkumar.in

Chad Wilson

unread,
Jan 11, 2023, 11:29:04 AM1/11/23
to go...@googlegroups.com
Yeah, I would start by investigating what is different about raw git performance as Ashwanth alludes to. Just to clarify, you don't run agents on the same machine as the server, do you?

These are forked processes from GoCD, and the speed at which a given git operation runs isn't really affected by GoCD in any relevant way I can think of. If there is a difference in the performance of the location where GoCD server is storing its temporary artifacts between new and old (/var/lib-go-server mount, normally), that might be something to look at.

The thread dumps you post are generally just indicative of the material subsystem being very slow and not being able to get through the number requiring material checks/polls in the allotted time. There are many posts and such about slow performance around the place e.g, here, here and elsewhere in GitHub going back a well - perhaps can mine the existing resources for ideas.

As to why things appear to have degraded since your previous server version, it's a bit difficult to tell. What was your rebuild methodology? Did you follow https://docs.gocd.org/current/advanced_usage/one_click_backup.html#restoring-gocd-using-backup? Are you sure you restored the config folder (wrapper-properties.conf and similar) and any overrides to GoCD memory allocation, system properties etc? It's conceptually possible that your previous configuration had tuned the material subsystem to either do more in parallel or to poll less frequently and such, or that you had some global git config which somehow affected git performance - but without more detail, it's probably just idle speculation.

-Chad



Brandon V

unread,
Jan 11, 2023, 1:50:35 PM1/11/23
to go-cd
Ashwanth: looking at `iostat -k 2` I don't see any particularly high volume of disk io. The disk is 99% idle. The disk is an SSD, and it's the same kind that was used on our old GoCD server. The disk is an AWS EBS io2 volume to be specific, and so I've used the AWS metrics to check for any signs of disk latency and the disk seems to be performing smoothly.

Chad: That's correct, we don't run agents on the GoCD server. For extra info about the rebuild methodology, we used the GoCD backup instructions as you linked to backup the old server, then restored from that backup on a new machine. Config files such as wrapper-properties.conf were restored, but it's theoretically possible that there were edits made in a way that wouldn't be captured in this backup/restore (e.g. a global git config file). Thanks for the extra clues about git material polling - I'll look into what config options are available.

A bit more debugging:

I would have to guess this git command is cpu-bound. Using `lsof -u go | grep git` to figure out what git is doing, it looks like the git commands are busy reading files like these:

/var/lib/go-server/pipelines/flyweight/97f4c8ef-1be5-49b6-a47d-05d8db2935b3/.git/objects/pack/pack-47a1b4f9c1abd793f7783d61b3719520e8ee174e.pack
/var/lib/go-server/pipelines/flyweight/97f4c8ef-1be5-49b6-a47d-05d8db2935b3/.git/objects/pack/pack-caef4ebaebbc1fccca5a2d637996f72dce2babb4.pack
/var/lib/go-server/pipelines/flyweight/97f4c8ef-1be5-49b6-a47d-05d8db2935b3/.git/objects/pack/pack-bf49db07ef11d2f696b8b933b3443c7f185fadd4.pack
/var/lib/go-server/pipelines/flyweight/97f4c8ef-1be5-49b6-a47d-05d8db2935b3/.git/objects/pack/pack-7755613d3efe8fdfcd63bfc16a62a8194b508c74.idx
/var/lib/go-server/pipelines/flyweight/97f4c8ef-1be5-49b6-a47d-05d8db2935b3/.git/objects/pack/pack-3fcca0d6c0dd729a6a650b87583ec5424ec7c103.idx
/var/lib/go-server/pipelines/flyweight/97f4c8ef-1be5-49b6-a47d-05d8db2935b3/.git/objects/pack/pack-0c24d29c94e20367351971b219ded2455aa496d3.idx
/var/lib/go-server/pipelines/flyweight/97f4c8ef-1be5-49b6-a47d-05d8db2935b3/.git/objects/pack/pack-bf49db07ef11d2f696b8b933b3443c7f185fadd4.idx
/var/lib/go-server/pipelines/flyweight/97f4c8ef-1be5-49b6-a47d-05d8db2935b3/.git/objects/pack/pack-caef4ebaebbc1fccca5a2d637996f72dce2babb4.idx
/var/lib/go-server/pipelines/flyweight/97f4c8ef-1be5-49b6-a47d-05d8db2935b3/.git/packed-refs

These look like git repos under /var/lib/go-server/pipelines/flyweight/ where GoCD is running these `git branch -r --contains <commit>` commands. I was able to copy one of these git repos and inspect it. Its size is 333M on disk, and has about 130000 commits. I confirmed that running `git branch -r --contains <commit>` with a recent commit id took about 30 seconds. Then I tried `git gc`. After that completed, the same `git branch -r --contains <commit>` took less than 1 second.

Chad Wilson

unread,
Jan 11, 2023, 10:08:37 PM1/11/23
to go...@googlegroups.com
Interesting - those are large repos and do agree with your assessment about it seeming CPU bound.

GoCD does git gc repos, but that's possibly not the case for the "flyweight" ones on the server which are used to monitor for changes, and list out the changes between revisions inside GoCD. I believe you should consider it safe to manually git gc all those repos (as long as the file permissions aren't affected - do so as the correct user).

On the other hand, with a fresh server, I'm surprised those clones would have a large amount of garbage. I'm not a git internals expert, but understood that such garbage accumulates over time rather than starting that way. Did your old server perhaps have some old workaround cron/supervisor job that git gced all of these manually I wonder to avoid you getting into this state?

You could also entirely remove those flyweights and let GoCD recreate them to see if they get recreated in an "efficient" state without needing a git gc. Perhaps test on a couple first. Obviously if you remove all of them it will put a bit of load on your repository manager as GoCD/git re-clones everything for a while.

-Chad

Brandon V

unread,
Jan 12, 2023, 12:30:23 PM1/12/23
to go-cd
Problem solved! Running git gc inside all of the git repos under /var/lib/go-server/pipelines/flyweight/ (as the go user) totally fixed this. Server CPU usage is back down to the expected single-digit range, and the git material updates are fast.

Now I know that running git gc in this case is safe (it's not safe for the config repo, per https://docs.gocd.org/current/advanced_usage/config_repo.html), and now I know what these flyweight directories are (it seems they are the git repos used for polling each distinct git material, though I haven't yet found documentation to confirm).

Next I'll probably look at adding git gc on these repos as a daily or weekly cron.

Thanks for your help and work on GoCD!
Reply all
Reply to author
Forward
0 new messages