Performance of popups in the GUI

Hans Dampf

Feb 12, 2024, 10:19:32 AM
to go-cd
Hello,

we are noticing slow loads in the GUI when opening the popup of a stage. The loading animation in the popup takes 5-6 seconds for every stage, and we are trying to narrow down why.

What we know so far: if we disable and delete all agents in the GUI, the loading is instant. With the agents registered again (re-imported DB table "agents"), the load time is back to several seconds.

GoCD runs the following SQL statement in the database: 'update agents set deleted=true where uuid in (<a lot of UUIDs>)'

If I execute the statement directly in the database, it does not improve the performance. You have to delete the agents via the GUI.

The database backend is PostgreSQL 14.
We have several other GoCD setups on the same database server without this problem.

The setup consists of 19 agent servers with 5 agents each, 95 agents overall.
Every agent has processed ~15,000 stages so far.

I haven't figured out yet what exactly GoCD does while building the popup, but it must have something to do with the agents and/or the agent runtime history.

Maybe someone can point me in the right direction as to where the bottleneck could be.

Regards

Chad Wilson

Feb 12, 2024, 11:13:09 AM
to go...@googlegroups.com
Hi Hans

Is this slowness specific to builds that are currently running? Or is the slowness (and speed up with deleted agents) the same even with builds/stages that are no longer running (even if the agents are)?

Anything majorly different about this GoCD server setup compared to the others, e.g. # of agents in the DB (deleted or not), # of pipeline runs, or configuration/version - that type of thing?

To narrow things down a bit more, it's worth opening your browser's Inspect tools, looking at the Network tab when clicking the button, and seeing which of the requests are slow:

image.png

On the assumption that it's the server that's slow rather than the UI, that will at least tell us which API is slow, and thus which code path or DB queries might be contributing to the 5-6 seconds (or whether it is all of them).

If it's much faster when the agents are deleted, my guess is that it could be to do with the logic to essentially allow the agents to be linked, but I am not sure intuitively which logic that might be.

-Chad

Wolfgang Achinger

Feb 13, 2024, 6:15:34 AM
to go...@googlegroups.com
In the other setups we have ~60 agents, but they build a lot less frequently (~400 runs per pipeline). In those setups, opening the popup takes maybe 50 ms to load.
I already tried to disable the agents directly in the database via SQL, but this did not work.
It would be nice to know what else happens when you delete agents via the GUI -
maybe a purge of a cache or a reload.

The Popup

popup.png
Timings before deleting all agents:
before.png
Timings after deleting all agents:
after.png


Wolfgang Achinger

Feb 13, 2024, 6:55:10 AM
to go...@googlegroups.com
For comparison, here is the same call on our oldest setup, with 60 agents but far fewer builds per pipeline.
It is a lot faster.
oldsetup.png


Chad Wilson

Feb 13, 2024, 8:08:03 AM
to go...@googlegroups.com
Yes, there is a cache that is notified when you delete an agent via the GUI, but I believe it only removes the individual agent that is deleted, not a full clear of the cache.

This same cache is supposed to be used by the /api/agents call that is slow/stuck - to make it faster (which doesn't seem to be working!) :-)

I wonder what is happening that could make it so slow. Perhaps look at the JSON response from the agents API when it is slow - does the response size and content look intuitively correct, similar to here? How does the response structure compare to the other machine that is relatively fast? (are the # of agents returned and environments per agent similar?) They are both relatively large uncompressed responses (144kB and 112kB but not wildly different).
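
If it's easier to compare the two servers outside the browser, something like the rough sketch below will time the /go/api/agents call and print the response size (this is just plain java.net.http, nothing official - the server URL and token are placeholders you'd need to substitute, and the versioned Accept header follows the agents API docs, so adjust it if your version expects a different one):

    import java.net.URI;
    import java.net.http.HttpClient;
    import java.net.http.HttpRequest;
    import java.net.http.HttpResponse;

    public class TimeAgentsApi {
        public static void main(String[] args) throws Exception {
            // Placeholders: point this at your GoCD server and use a real personal access token.
            String server = "https://gocd.example.com";
            String token = "<personal-access-token>";

            HttpRequest request = HttpRequest.newBuilder()
                    .uri(URI.create(server + "/go/api/agents"))
                    // Versioned Accept header per the agents API docs; adjust if your version expects a different one.
                    .header("Accept", "application/vnd.go.cd.v7+json")
                    .header("Authorization", "Bearer " + token)
                    .GET()
                    .build();

            long start = System.nanoTime();
            HttpResponse<String> response = HttpClient.newHttpClient()
                    .send(request, HttpResponse.BodyHandlers.ofString());
            long elapsedMs = (System.nanoTime() - start) / 1_000_000;

            // Compare these numbers between the slow and the fast server.
            System.out.printf("status=%d elapsed=%dms size=%dkB%n",
                    response.statusCode(), elapsedMs, response.body().length() / 1024);
        }
    }

Running that against both servers should make it obvious whether the time is going into the server-side response rather than the UI rendering, and whether the response sizes are comparable.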

If you're trying to dig through yourself, the entrypoint for the API serving that tab (assuming you are in a recentish version) is here.

-Chad

Wolfgang Achinger

Feb 13, 2024, 11:01:24 AM
to go...@googlegroups.com
Well, we do notice a direct increase in speed if we only remove part of the agents.
The response of the call is what is to be expected from the API, as described here: https://api.gocd.org/current/#get-all-agents

I compared the output from the fast go-server with the slow one, and the structure is the same.

We are currently running version 23.5.0-18179

I tried to look into the code, but I'm not a (Java) developer, so this would be mostly wild guessing.


Down from 95 to 60 agents
60.png


Chad Wilson

Feb 13, 2024, 11:39:30 AM
to go...@googlegroups.com
Unfortunately, I'd probably need some more definitive heuristics about the nature of the response agents and environments, or a carefully redacted example response to guess any further. How many environments and resources do the agents have attached, on average?

Otherwise I am also wildly guessing as to what the main variables might be in how to replicate a similar situation.

Is this a problem that has always been there, or something that has changed with a GoCD version or other change in environment?
Is it faster when the server is restarted, and gets slower over time (or the same after a restart)?
Why do you feel it is the # of jobs/stages the agents have processed that is a key factor, rather than simply the # of agents or some other agent configuration factor?

-Chad

Ketan Padegaonkar

Feb 13, 2024, 11:59:11 AM
to go...@googlegroups.com
IIRC there are some loggers (Hibernate and iBatis) that can be enabled to get better heuristics around the executed SQL/timings, etc.

PostgreSQL also comes with slow-query logging that can be enabled on the server side to capture queries above a particular duration threshold. Be sure not to keep it enabled for too long.

- Ketan



Chad Wilson

Feb 13, 2024, 10:59:07 PM
to go...@googlegroups.com
In theory I think this code isn't supposed to be dependent on the database to serve the API responses (unless there is some "refresh-periodically-and-block-api" that I am missing), but I am certainly not an expert on the specific area and perhaps there is some Hibernate lazy inflation or refresh-from-db going on, as the level of slowness here does look like a "blocked on DB" type of issue.

Wolfgang Achinger

Feb 14, 2024, 3:01:32 AM
to go...@googlegroups.com
So I did some further testing and removed all pipelines but kept all agents.
Without any pipelines it runs as expected - fast.
nopipelines.png

I have to say, we have about 1200 pipelines configured on this setup, but without the agents, access to the popup menu is, as already mentioned, instant.
Does the go-server read the whole pipeline config every time you access the submenu, rather than just the single pipeline whose popup I open? Or is there a cross-check between pipelines and agents in the logic?

Setup: 19 agent servers with 5 agents each
95 agents overall
~1200 pipelines

No agents: fast
No pipelines: fast
Pipelines and agents: slow

Chad Wilson

Feb 14, 2024, 4:22:56 AM
to go...@googlegroups.com
This is useful information. The API response is enriched with environment mappings for ALL agents, which I understand comes from the pipeline config, but which I don't think should have a direct relationship to the # of pipelines - and it should also be cached in memory for this purpose.

That's something to validate though.

Are your pipelines using pipelines-as-code, or are they all GUI- or API-managed (i.e. inside the config.xml)?

Wolfgang Achinger

Feb 14, 2024, 5:29:03 AM
to go...@googlegroups.com

Chad Wilson

Feb 15, 2024, 12:15:09 AM
to go...@googlegroups.com
What % of the 1200 pipelines are defined in YAML? 9x%?

There are different code paths for config 'implied' by pipelines-as-code, so I'm trying to gauge the most likely way to replicate something similar to what you are seeing.

Wolfgang Achinger

Feb 15, 2024, 3:15:51 AM
to go...@googlegroups.com
All 100% of the pipelines are generated that way. We create them using Jinja2 templates and Python scripts to keep them as uniform as possible.

Wolfgang Achinger

Feb 15, 2024, 4:38:16 AM
to go...@googlegroups.com
Additional information:
The pipelines are configured via ~150 YAML files.
I tested it now with one big merged config file containing all pipelines,
but this did not change anything -
performance is still slow.

Chad Wilson

Feb 15, 2024, 9:01:30 AM
to go...@googlegroups.com
How many distinct environments and resources do you have across these 1200 pipelines, roughly?

Wolfgang Achinger

Feb 15, 2024, 9:17:13 AM
to go...@googlegroups.com
1 environment
164 materials
0 elastic agents
2 config repos
0 artifact stores
0 pluggable scms

Chad Wilson

Feb 15, 2024, 9:50:33 AM
to go...@googlegroups.com
And how many resources are defined across the agents?

Can you please answer the earlier questions I asked as well? It's rather difficult to efficiently help if you don't respond to the questions that characterise the problem from a maintainer perspective. :-)

- Is this a problem that has always been there, or something that has changed with a GoCD version or other change in environment?
- Is it faster when the server is restarted, and gets slower over time (or the same after a restart)?
- Why do you feel it is the # of jobs/stages the agents have processed that is a key factor, rather than simply the # of agents or some other agent configuration factor?

Additionally, can you share a redacted output from /go/api/support ? You can enter the URL in the browser when logged in as an admin. Be careful of the "Runtime Information" and "System Health Information" sections when sharing. These are the two main places which might leak unintentional information from your setup. Redact the individual values which feel sensitive to you.

-Chad


Wolfgang Achinger

Feb 15, 2024, 11:29:36 AM
to go...@googlegroups.com
> And how many resources are defined across the agents?
What exactly do you mean here? System resources? Xms/Xmx values of Java?

- Is this a problem that has always been there, or something that has changed with a GoCD version or other change in environment?
No, we have been using this setup for about a year now, and we patch the system on a regular basis, including the latest GoCD stable version.

- Is it faster when the server is restarted, and gets slower over time (or the same after a restart)?
No, a restart does not affect the speed at all. It stays constant.

- Why do you feel it is the # of jobs/stages the agents have processed that is a key factor, rather than simply the # of agents or some other agent configuration factor?
I don't know - it was more of a wild guess. After further testing, I don't think so anymore. I cleaned up some tables and reduced the agent history visible in the GUI, but this did not affect the speed. (Well, it increased the speed of listing the agent history itself, but not the loading time of the popups.)

If it is OK, I will send the support output directly to your mail address so it will not get shared in the thread.

Chad Wilson

Feb 15, 2024, 11:57:41 AM
to go...@googlegroups.com
Cool, thanks! Just trying to gather enough information to see if I can replicate or find the issue in a dedicated chunk of time this weekend.

You can email it to me, and/or encrypt with my GPG key if you'd like (https://github.com/chadlwilson/chadlwilson/blob/main/gpg-public-key.asc)

By 'resources' I am referring to the GoCD functionality where you can tag agents with resources that they offer, which are then matched to pipeline jobs that say they require those resources to run as part of agent assignment.

> No, we have been using this setup for about a year now, and we patch the system on a regular basis, including the latest GoCD stable version.

To make sure I understand you, are you saying that the problem has been there for the last year, perhaps gradually getting worse as you add more agents or pipelines - but not an issue suddenly created after a particular upgrade or change?

-Chad

Wolfgang Achinger

Feb 16, 2024, 2:40:46 AM
to go...@googlegroups.com
> By 'resources' I am referring to the GoCD functionality where you can tag agents with resources that they offer, which are then matched to pipeline jobs that say they require those resources to run as part of agent assignment.
10 Agents have 5 resources attached
85 have 1 resource attached

We use the resources to target different special agents. They do the same as the rest, but they are placed in dedicated networks.

> To make sure I understand you, are you saying that the problem has been there for the last year, perhaps gradually getting worse as you add more agents or pipelines - but not an issue suddenly created after a particular upgrade or change?
That's correct. It's more an over-time issue than a sudden issue.

I sent the additional information out, but not directly - it comes from a different mail address over a secure transfer method.

Chad Wilson

Feb 17, 2024, 12:08:03 PM
to go...@googlegroups.com
Hiya folks

I've been able to replicate this problem and should be able to fix it for a subsequent release - thanks for the help debugging.

The problem appears to arise when there is a large number of pipelines mapped to an environment, and also a large number of agents for that environment. The logic for calculating the agents > environments part of the API response is accidentally very, very inefficient (I think it's O(n^2 x m^2) or something crazy). I replicated something similar to what you describe with 5,000 pipelines and 60 or so agents, all mapped into the same, single, logical environment.

image.png
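
To give a feel for the shape of the problem, here is an illustrative sketch (not the actual GoCD code - the Env type and method names are invented for the example): if the per-agent environment enrichment re-scans the environment's association lists for every agent in the response (and in the real code that repeated work also seems to touch the environment's pipeline associations, which is why the pipeline count matters too), the cost multiplies up across all agents, whereas building a lookup once keeps each agent cheap:

    import java.util.*;

    class AgentEnvironmentSketch {

        // Hypothetical shape for the example: an environment and the agent UUIDs associated with it.
        record Env(String name, List<String> agentUuids) {}

        // Slow shape: for each agent, walk every environment and linearly scan its association
        // list again. The same scanning work is repeated for every single agent in the response.
        static Map<String, List<String>> enrichSlowly(List<String> agents, List<Env> envs) {
            Map<String, List<String>> result = new HashMap<>();
            for (String agent : agents) {
                List<String> names = new ArrayList<>();
                for (Env env : envs) {
                    if (env.agentUuids().contains(agent)) {   // linear scan, repeated per agent
                        names.add(env.name());
                    }
                }
                result.put(agent, names);
            }
            return result;
        }

        // Fast shape: invert the association once, then each agent becomes a constant-time lookup.
        static Map<String, List<String>> enrichQuickly(List<String> agents, List<Env> envs) {
            Map<String, List<String>> uuidToEnvs = new HashMap<>();
            for (Env env : envs) {
                for (String uuid : env.agentUuids()) {
                    uuidToEnvs.computeIfAbsent(uuid, k -> new ArrayList<>()).add(env.name());
                }
            }
            Map<String, List<String>> result = new HashMap<>();
            for (String agent : agents) {
                result.put(agent, uuidToEnvs.getOrDefault(agent, List.of()));
            }
            return result;
        }
    }

The actual fix will be in the GoCD code itself, so there is nothing for you to change here - this is just to explain why removing either the agents or the pipelines makes the popup fast again.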


In your case, if you have all 1,690 pipelines (from your stats below) mapped to a single environment, and all of your 95 agents in the same environment, you'd definitely trigger this issue. I can't tell exactly from what you have shared how the pipelines and agents are mapped to environments, so this is a guess - can you confirm how many agents and pipelines are mapped to the environment below?

"Number of pipelines": 1690,
"Number of environments": 1,
"Number of agents": 95,

If it's the same problem, you will probably find that untagging the agents from the environment also has a similar speed-up effect to deleting all of the agents (although then the pipelines requiring that environment won't schedule either, obviously).

Another workaround in the meantime, if you don't rely on the environment
  • to define environment variables/secure environment variables that apply across all pipelines/jobs
  • to affect whether jobs are scheduled to special agents
... may be to untag all pipelines and agents from the environment you use and just use the default/empty environment.

-Chad

Wolfgang Achinger

Feb 19, 2024, 2:15:36 AM
to go...@googlegroups.com
Hello,

this is actually incredible news for us. We will look forward to the release with the fix.
Thanks for the support.

The workarounds seem not to be viable for us, since we use the environment for a lot of global variables and customizations.

Regards

Wolfgang Achinger

Feb 19, 2024, 2:16:02 AM
to go...@googlegroups.com
> can you confirm how many agents and pipelines are mapped to the environment below?
Yes that is true

Chad Wilson

May 12, 2024, 10:05:12 PM
to go...@googlegroups.com
Apologies for the slow release of this (been rather busy personally) but 24.1.0 is out with what I think should be a fix for this issue.

If you have any feedback it'd be appreciated.

-Chad


Wolfgang Achinger

May 13, 2024, 4:10:25 AM
to go...@googlegroups.com
Dude, the fix is amazing!!!!!

Chad Wilson

May 13, 2024, 4:24:46 AM
to go...@googlegroups.com
Great to hear - back to how it was supposed to behave! I hope it hasn't caused any other regressions 🙏

Now that this is cleaned up, let me know if it exposes any other unexpected weird slowness you can't get to the bottom of and I'll see if I can chip away at the other various niggles.

-Chad

Wolfgang Achinger

May 13, 2024, 4:30:42 AM
to go...@googlegroups.com
I'm currently in the process of upgrading the servers and agents of all our GoCD setups. We will monitor it over the next few days, and I will come back if we notice anything.

Hans Dampf

May 23, 2024, 3:39:52 AM
to go-cd
So we have been running the new version for a couple of days now, and I can't see any further problems with it. The performance is great with >1000 pipelines now.
It is a really great improvement.

Chad Wilson

May 24, 2024, 1:38:47 AM
to go...@googlegroups.com
Great to hear. Really appreciate the feedback!
