Recommended hardware spec for running a GO server or performance tuning advice


Carl Reid

Apr 15, 2016, 5:31:32 AM
to go-cd
We keep hitting performance problems on our GO server. This is now at the point where we are considering moving away from GO as it is affecting productivity.

The issues typically manifest as crashes, hangs, extreme slowness of the user interface, and delays in pipelines picking up and executing.

The user interface problems typically occur when:
  • Clicking from one place to another (from Admin to pipeline view for example)
  • Rendering the main pipeline page and having to wait sometimes a minute before being able to perform a search
  • Saving any changes can take anywhere from 30 seconds up to 5 minutes
  • Editing the XML in the Admin interface is chronically slow (always measured in minutes)
  • Displaying the console output from a pipeline job always causes Chrome to show the "this page is unresponsive" warning
  • Users report general sluggishness in doing anything
We have the following hardware (we moved away from a virtual server due to IO issues):

  • Dell R710
  • 2 x Intel E5520 @ 2.27GHz (4 cores each, hyper-threaded)
  • 24GB RAM
  • OS: Debian 8.3 (jessie)
  • 2 x 146GB RAID-5 (OS)
  • 4 x 500GB RAID-10 (artifacts)

We have played with the heap size and found that a smaller heap improved things a little, which was not quite what we expected.

If anyone can recommend what hardware we need for the number of pipelines we have, or any performance tuning, it would be much appreciated.

Go API Support output attached.
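
For reference, the attached output came from the server's API support endpoint, captured roughly like this (the port and credentials below are placeholders for our setup; adjust for yours):

    # Dump the GoCD server diagnostics (thread dumps, memory stats, config summary)
    curl -u admin:password -o "go api support.txt" "http://localhost:8153/go/api/support"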

Thanks

Carl

go api support.txt

Aravind SV

Apr 15, 2016, 7:28:47 AM
to go...@googlegroups.com
Quick thoughts:

750 pipelines and 2GB Xmx doesn't make sense. Even if it feels faster, it's inadequate. You should go back up to at least 6GB.
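
If you're on the Linux package install, the heap is normally set in /etc/default/go-server; something along these lines should do it (the variable names are from the stock package as far as I recall, so double-check against your install before restarting go-server):

    # /etc/default/go-server -- JVM heap for the GoCD server process
    SERVER_MEM=2048m        # initial heap (-Xms), illustrative value
    SERVER_MAX_MEM=6144m    # maximum heap (-Xmx), the "at least 6GB" above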

About Chrome and slowness, this might be relevant: there is a known issue in Chrome itself that causes some of it.

Dashboard slowness is a known issue that needs to be worked on. You could try hiding some pipelines (via the personalize option) and see if that helps for now.


I've seen much bigger installations have fewer issues. So I know it can work well, but unfortunately I don't have the time to spend on profiling this. Hopefully some memory settings and fixes like this one (coming up in 16.4) will help.

Cheers,
Aravind

Carl Reid

Apr 15, 2016, 8:03:06 AM
to go-cd
Thanks for the reply - can I ask how you determined 750 pipelines should be 6GB? Is there a rule of thumb we can apply?

We had it set to 20GB before; however, we found the GC was using excessive CPU and degrading performance significantly.
Since reducing it to 2GB this has stopped happening, but we still see slow performance and the occasional crash.

Jason D

Apr 15, 2016, 11:22:16 AM
to go-cd
I'd like to echo Carl's comments and concerns. We have a bigger installation (2500+ pipelines) and have likely been tolerating even worse performance. For instance, saving config on almost any screen can take anywhere from 20 to 60+ seconds, which makes building new pipelines a laborious process for those who are not GO admins. Yes, there are workarounds that make it bearable, but the lack of attention to enterprise-level performance has been an ongoing issue for quite a while now.

I'm open to the fact that we may be doing something wrong with our setup, so, Aravind, if you can detail the setup, performance and usage of the "much bigger installations" I'd be very interested in seeing it.

Thanks.

Aravind SV

Apr 18, 2016, 4:12:41 PM
to go...@googlegroups.com
On Fri, Apr 15, 2016 at 8:03 AM, Carl Reid <carland...@gmail.com> wrote:
Thanks for the reply - can I ask how you determined 750 pipelines should be 6GB? Is there a rule of thumb we can apply?

We had it set to 20GB before; however, we found the GC was using excessive CPU and degrading performance significantly.
Since reducing it to 2GB this has stopped happening, but we still see slow performance and the occasional crash.

It's a bit of a rule of thumb and a bit of having seen multiple systems. Without knowing your setup completely, it's hard to do much more. Here's what I'd need to consider to give you a better answer (or, think of these as factors which can affect performance):

1. Number of pipelines (and really number of stages):

This is because scheduling depends on it: the number of stages the server has to consider as downstream / upstream, and run fan-in over to find the right revisions, etc., increases a lot with the number of pipelines. Notice that the number of jobs and the number of tasks have no bearing on this. Tasks especially don't, since they run on the GoCD agent.


2. Number of agents, number of users, and scripts using API calls:


All are relevant because they increase the amount of web traffic, and hence the amount of work the server has to do to service those requests. Having a lot of agents can cause a slowdown because they're all pinging the server for work. Usually this doesn't happen, but it shows up vividly upon a server upgrade, because all the agents will suddenly want to upgrade themselves and will ask the server for the upgraded agent.

The number of users is usually an underestimated factor here. I know of a few installations with so many users and agents that the GoCD server ends up having to serve close to a million requests per hour (a steady few hundred requests per second), apart from everything else it's doing on a big instance like this.
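
If you want a rough feel for the request volume your own server is handling, something like this against an HTTP access log works (this assumes you have request logging enabled and a combined-style log format; the path below is only a placeholder):

    # Requests per minute, busiest minutes first (log path and format are assumptions)
    awk '{ print substr($4, 2, 17) }' /var/log/go-server/request.log | sort | uniq -c | sort -rn | head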


3. Number of materials:

This is because GoCD has to poll all of these materials for changes and if there are hundreds of materials, by the time it finishes one round of polling of all of them, it's time to start another round. Or, if some or all of them are slow, you'll see messages in the logs about skipping material update, because a material has been in the queue for about 60 to 90 seconds and has not been processed. GoCD processes (polls) 10 materials in parallel.
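
A quick way to check whether you're hitting that is to look for those skip messages in the server log, for example (the log path is the usual package-install location, and the exact message text may vary between versions):

    # Count material-update skip warnings per log file (message text is approximate)
    grep -ic "skipping material update" /var/log/go-server/go-server.log*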


4. Number of pipeline instances (runs):

GoCD loads information about all the pipeline runs it knows about (not deep, but shallow - information about the build number, label, etc.). So, if you have 100 pipelines with 30000 runs each, it could load a lot of information (usually at startup). That will perform differently from having 100 pipelines with 300 runs each. This is done for fan-in scheduling, so that it can go back however far it needs to. I feel it could be done differently, but that is the way it is.


5. Speed of your IO (IOPS):

This affects many aspects of the system, most notably the H2 database and publishing and fetching artifacts. It can show up as slow artifact fetches at the agent level and slow artifact uploads. This becomes a bit of a big deal in setups with a VM as the GoCD server. People underestimate IOPS and how it is affected by other VMs on the same host; you can see this show up when you run diagnostics (or monitoring) on the host, rather than the guest. At some point, even with quick IO, H2 can become a bottleneck if there are too many changes happening too quickly, and it can block those threads.
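
To put numbers on the IO side, watching the relevant volumes during a busy period gives a decent picture; for example (the artifact path below is the default for a package install, and fio needs to be installed separately):

    # Live per-device IOPS, throughput and utilisation, sampled every 10 seconds
    iostat -x 10

    # Short random-read benchmark against the artifact volume
    fio --name=artifact-read --directory=/var/lib/go-server/artifacts \
        --rw=randread --bs=4k --size=1g --numjobs=4 --runtime=60 --group_reporting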


6. Number of CPUs:

Usually, the more the merrier. If there are too few (1 or 2), there can be contention, because not all threads will get CPU time and some will be stopped unnecessarily.


7. Amount of memory you've given it (Xmx):

Again, usually more is better, but not too much. I don't have a great rule of thumb for this, but what I've noticed is that for a system of the scale you showed, about 6 to 8GB makes sense; 2GB was too little. With 2GB, I'd expect maybe 50 to 100 pipelines, with 20 to 30 materials and agents each. If you give it too little, it'll spend too much time in garbage collection; if you give it too much, it'll do GC less often, but when it does, it'll take longer.
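
One way to take the guesswork out of that trade-off is to turn on GC logging and watch pause times as you change Xmx. For the Java 7/8 JVMs in use around now, flags roughly like these, appended to the server's JVM options (e.g. via GO_SERVER_SYSTEM_PROPERTIES in /etc/default/go-server, if I have the variable name right), will do it:

    # Classic HotSpot GC logging with rotation; inspect gc.log after a day of normal load
    -Xloggc:/var/log/go-server/gc.log
    -XX:+PrintGCDetails
    -XX:+PrintGCDateStamps
    -XX:+UseGCLogFileRotation
    -XX:NumberOfGCLogFiles=5
    -XX:GCLogFileSize=20M

Long full-GC pauses at 20GB and constant collections at 2GB should both show up clearly in that log.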


Having said that, even if you give me all of those numbers, I don't have a magic formula for this. :) It really depends on you trying out a couple of things and knowing some of this so that you can appropriately size your installation.

Cheers,
Aravind

Jason D

Apr 19, 2016, 1:47:29 PM
to go-cd
Good info Aravind, thank you.  

What we would find particularly helpful are the specs, setup and usage of some of your larger clients that are satisfied with performance. Do you think you could find this information? You wouldn't have to name the client, obviously.




Carl Reid

May 11, 2016, 9:54:07 AM
to go-cd
I second what Jason has said here; any reference implementations in terms of hardware spec, server configuration, number of pipelines etc. would be very useful.
We are considering new hardware to alleviate the issues but at the moment do not know what we should be looking at in terms of spec.