Manual triggers extremely delayed


Evan Amstutz

May 27, 2026, 9:30:49 PM
to GoCD Users
Hello, 
We had to restart our GoCD 25.1.0 (20070-8f4b43b7bab6e0666bc9282ef4e25fa80d5b0529) server on a new EC2 instance due to a catastrophic error. We have recovered the majority of the previous server, but we now see extreme delays when manually kicking off a pipeline. Pipelines triggered by changes work as expected, as does rerunning failed slices, but manual triggers take well over 5 minutes to register. Any ideas on what could be causing that?

Thank you,
Evan Amstutz

Chad Wilson

May 27, 2026, 10:35:35 PM
to go...@googlegroups.com
Not sure. What's in the server logs for that time period? Are there git child processes stuck or doing a lot of work? (Assuming you're using git.)

If nothing obvious, you could use jstack or "kill -3 pid" to dump stack traces to logs and see if there is an obvious place something is stuck during the manual trigger.

Otherwise would probably need more details about your deployment and how you did your recovery (perhaps what originally failed) to make any guesses.

-Chad


Evan Amstutz

May 28, 2026, 10:51:05 PM
to GoCD Users
It is getting stuck between the scheduler accepting the job and creating a build-cause. It's probably because our git materials are huge and we have close to 100 pipelines generally trying to run at the same time.
To recover, we rebuilt the EC2 instance from a snapshot. When we got back into the instance, all of our config-as-code pipelines had to be re-pulled from our monorepo, which took about 10 minutes. Triggering by changes and rerunning failed work fine, and I believe that's because the materials are already available. So if I had to guess based on all of this, it would be due to pulling a large material over and over. But we do have everything set to shallow clone, and that doesn't seem to speed anything up.
Thanks,
Evan

Evan Amstutz

Jun 3, 2026, 2:08:27 PM
to GoCD Users
Once the materials caught up, everything works as expected.

Chad Wilson

Jun 3, 2026, 2:27:36 PM
to go...@googlegroups.com
Ok. To look into it further would probably need an example thread dump to see which logic it was stuck in, and some more details on the setup.

Materials for the same repo are generally shared unless you're using something special like the git path material plugin with different subpaths.

If it really was just the raw git clone speed and the amount of data to be cloned, shallow clones won't help, as they are ignored server-side. IIRC it needs the history to give GoCD's fan-in guarantees (and I don't think it has any special logic to use shallow clones if there are no existing pipeline runs with history for that material, as with a fresh database).

GoCD right now doesn't support blobless or treeless clones, which is what would possibly speed this up server-side if that was the root of the issue (since that filters the repo to basically the log, without content).
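
For reference, outside of GoCD these are git's partial clone filters. Below is a minimal Python sketch that only builds the corresponding `git clone` command lines. The function name `partial_clone_command` is made up for this example, and adding `--no-checkout` is an assumption on my part, so that no file content is fetched for a working tree. None of this is something GoCD can be configured to do today.

```python
# Filter specs accepted by `git clone --filter=...` (partial clone).
PARTIAL_CLONE_FILTERS = {
    "blobless": "blob:none",  # commits and trees, file contents fetched on demand
    "treeless": "tree:0",     # commits only, trees and blobs fetched on demand
}


def partial_clone_command(url, dest, mode):
    """Build the argument list for a blobless or treeless git clone."""
    if mode not in PARTIAL_CLONE_FILTERS:
        raise ValueError(f"unknown partial clone mode: {mode!r}")
    return [
        "git",
        "clone",
        f"--filter={PARTIAL_CLONE_FILTERS[mode]}",
        "--no-checkout",  # assumption: skip the working tree so no blobs are pulled
        url,
        dest,
    ]
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` to time such a clone against a full one for a large repo. The remote has to support partial clone for the filter to take effect.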

If you want to help, could open something at https://github.com/gocd/gocd/issues with some more details.

-Chad
