How frequently is an Agent's build_status updated in API?


Ashwanth Kumar

Feb 27, 2016, 7:07:10 AM
to go-c...@googlegroups.com
While working on a tool to autoscale GoCD agents, I happened to notice that BuildAssignmentService assigns a job to an agent, but querying the agents API doesn't reflect that assignment.

GoCD Server logs
2016-02-27 11:08:49,646  INFO [qtp515132998-26] BuildRepositoryRemoteImpl:106 - [Agent [5d0d5422798e, 172.17.0.3, d1130234-e5aa-43dc-a384-1a16f9a4679d, 56bb93ac-7493-4097-b136-8d93e157f926]] is reporting status and result [Completed, Passed] for [Build [test-no-env-long-running/9/defaultStage/1/defaultJob/42]]
2016-02-27 11:08:49,653  INFO [qtp515132998-26] Stage:218 - Stage is being completed by transition id: 239
2016-02-27 11:08:49,721  INFO [73@MessageListener for WorkFinder] BuildAssignmentService:115 - [Agent Assignment] Assigned job [JobIdentifier[test-no-env-short-1, 10, 10, defaultStage, 1, defaultJob, 43]] to agent [Agent [5d0d5422798e, 172.17.0.3, d1130234-e5aa-43dc-a384-1a16f9a4679d]]
2016-02-27 11:08:54,750  INFO [qtp515132998-26] GoConfigDao:194 - Config update request by anonymous is in queue - CompositeConfigCommand{commands=[UpdateAgentApprovalStatus{uuid='d1130234-e5aa-43dc-a384-1a16f9a4679d', denied=true}]}
2016-02-27 11:08:54,751  INFO [qtp515132998-26] GoConfigDao:198 - Config update request by anonymous is being processed
2016-02-27 11:08:54,758  INFO [qtp515132998-26] MagicalGoConfigXmlWriter:103 - [Serializing Config] Generating config partial.
2016-02-27 11:08:54,759  INFO [qtp515132998-26] GoFileConfigDataSource:247 - [Configuration Changed] Saving updated configuration.
2016-02-27 11:08:54,779  INFO [qtp515132998-26] CachedFileGoConfig:235 - About to notify config listeners
2016-02-27 11:08:54,781  INFO [qtp515132998-26] CachedFileGoConfig:243 - About to notify config listeners
2016-02-27 11:08:54,782  INFO [qtp515132998-26] BuildAssignmentService:172 - [Configuration Changed] Removing jobs for pipelines that no longer exist in configuration.
2016-02-27 11:08:54,784  INFO [qtp515132998-26] CachedFileGoConfig:251 - Finished notifying all listeners
2016-02-27 11:08:54,785  INFO [qtp515132998-26] CachedFileGoConfig:245 - Finished notifying all listeners
2016-02-27 11:08:54,786  INFO [qtp515132998-26] GoConfigDao:211 - Config update request by anonymous is completed
2016-02-27 11:08:54,810  INFO [qtp515132998-20] GoConfigDao:194 - Config update request by anonymous is in queue - CompositeConfigCommand{commands=[DeleteAgent{uuid='d1130234-e5aa-43dc-a384-1a16f9a4679d'}]}
2016-02-27 11:08:54,812  INFO [qtp515132998-20] GoConfigDao:198 - Config update request by anonymous is being processed
2016-02-27 11:08:54,819  INFO [qtp515132998-20] MagicalGoConfigXmlWriter:103 - [Serializing Config] Generating config partial.
2016-02-27 11:08:54,821  INFO [qtp515132998-20] GoFileConfigDataSource:247 - [Configuration Changed] Saving updated configuration.
2016-02-27 11:08:54,836  INFO [qtp515132998-20] CachedFileGoConfig:235 - About to notify config listeners
2016-02-27 11:08:54,837  INFO [qtp515132998-20] CachedFileGoConfig:243 - About to notify config listeners


From my tool's logs
16:38:54.750 scalar.go:98#Execute ▶ INFO 0044 Disabling the agent d1130234-e5aa-43dc-a384-1a16f9a4679d on Go Server
16:38:54.807 scalar.go:101#Execute ▶ DEBUG 0045 Checking if the disabled agent d1130234-e5aa-43dc-a384-1a16f9a4679d has started building
16:38:54.817 scalar.go:105#Execute ▶ DEBUG 0046 Disabled agent d1130234-e5aa-43dc-a384-1a16f9a4679d is in Idle state so deleting it
16:38:54.817 scalar.go:106#Execute ▶ INFO 0047 Deleting the agent d1130234-e5aa-43dc-a384-1a16f9a4679d on Go Server
16:38:55.216 docker.go:84#ScaleDown ▶ INFO 0048 Terminating agent d1130234-e5aa-43dc-a384-1a16f9a4679d created via Docker


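Note: the tool's timestamps are 5 hours 30 minutes ahead of the server's time zone, so 16:38:54 in the tool's log corresponds to 11:08:54 in the server log.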

Notes
- I'm using the travix/gocd-server docker image, running GoCD Version - 16.2.1(3027-07834a1f2ce79a13d42b03601698ef48f128a012)
- I poll '/go/api/agents' every 10 seconds.
- The tool first disables the agent, then checks whether the agent has started building; if it has not, it sends a Delete request for the agent.

Any ideas or pointers on why this is happening?
BTW, when I increase the delay to 15 seconds or more, I don't see it happen as often.

--

Ashwanth Kumar / ashwanthkumar.in

Ashwanth Kumar

Feb 27, 2016, 7:28:03 AM
to go-c...@googlegroups.com
Taking back what I said about the 15-second interval. It happens irrespective of the polling interval. If the tool queries the API within some window after the assignment (the longest I have seen so far is 7 seconds), the API still reports the agent as Idle.

My concern is that when this happens, the job that was assigned to that agent never gets rescheduled to another agent and doesn't show up on the /go/api/jobs/scheduled.xml endpoint.

Is there any way to find all the building / stuck jobs like this via any endpoint?

Mahesh Panchakasahriah

Feb 29, 2016, 8:44:55 AM
to go-cd-dev, ashwan...@googlemail.com
Hi Ashwanth,

BuildAssignmentService assigning a job to an agent does not immediately result in the agent processing that job. The agent remains in the Idle state until it actually starts processing the job.

My concern is that when this happens, the job that was assigned to that agent never gets rescheduled to another agent and doesn't show up on the /go/api/jobs/scheduled.xml endpoint.
 
Jobs that are assigned but never picked up (for example, because the agent dies before processing them) will be rescheduled later. The reschedule interval is configured through the property cruise.reschedule.hung.builds.interval, which defaults to 5 minutes.

Is there any way to find all the building / stuck jobs like this via any endpoint?

Currently there is no endpoint that lists building or stuck jobs. Do you think /go/cctray.xml would serve your purpose?
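
For reference, a CCTray feed is a list of `<Project>` elements, each carrying an `activity` attribute. A minimal sketch in Go of filtering such a feed for entries that are currently building might look like the following. The sample XML is made up for illustration and is not output from a real server, and the "pipeline :: stage :: job" naming of entries is an assumption about how GoCD labels them.

```go
package main

import (
	"encoding/xml"
	"fmt"
)

// Project mirrors the attributes of one <Project> element that this sketch uses.
type Project struct {
	Name     string `xml:"name,attr"`
	Activity string `xml:"activity,attr"`
}

// feed maps the root element of the document; its children are <Project> elements.
type feed struct {
	Projects []Project `xml:"Project"`
}

// BuildingProjects returns the names of all entries whose activity is "Building".
func BuildingProjects(data []byte) ([]string, error) {
	var f feed
	if err := xml.Unmarshal(data, &f); err != nil {
		return nil, err
	}
	var names []string
	for _, p := range f.Projects {
		if p.Activity == "Building" {
			names = append(names, p.Name)
		}
	}
	return names, nil
}

// sample is illustrative data in the CCTray format, not real server output.
const sample = `<Projects>
  <Project name="test-no-env-short-1 :: defaultStage" activity="Building" lastBuildStatus="Success"/>
  <Project name="test-no-env-short-1 :: defaultStage :: defaultJob" activity="Building" lastBuildStatus="Success"/>
  <Project name="test-no-env-long-running :: defaultStage" activity="Sleeping" lastBuildStatus="Success"/>
</Projects>`

func main() {
	names, err := BuildingProjects([]byte(sample))
	if err != nil {
		panic(err)
	}
	for _, n := range names {
		fmt.Println(n)
	}
}
```

This only yields names and activity; the feed carries nothing about which agent, environment, or resources a job is tied to.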

Thanks,
Mahesh

Ashwanth Kumar

Feb 29, 2016, 9:31:34 PM
to Mahesh Panchakasahriah, go-cd-dev
Hey Mahesh,

BuildAssignmentService assigning a job to an agent does not immediately result in the agent processing that job. The agent remains in the Idle state until it actually starts processing the job.

In that case, it would really help if the API exposed an "assigned" state in the agent's build_state. Similarly, the go-server logs show a state called "preparing" for the agent, but it isn't exposed via the API either. Is this something you would consider adding to the API? If so, I can contribute a patch.
 
Jobs that are assigned but never picked up (for example, because the agent dies before processing them) will be rescheduled later. The reschedule interval is configured through the property cruise.reschedule.hung.builds.interval, which defaults to 5 minutes.

Thanks. I just need some advice: if I reduce it to 1 minute, does that have any performance impact on the server? In the Docker image I see the rescheduling happen only every 30 minutes, and as far as I can tell the image hasn't overridden the value. If I override it manually through the GO_SERVER_SYSTEM_PROPERTIES environment variable, it seems to work as expected.

Is there any way to find all the building / stuck jobs like this via any endpoint?

Currently there is no endpoint that lists building or stuck jobs. Do you think /go/cctray.xml would serve your purpose?

I was hoping that, similar to scheduled.xml, there would be another endpoint that returns the building jobs along with their environments and resources. Unfortunately, cctray.xml doesn't give me that information :(

Ketan Padegaonkar

Feb 29, 2016, 10:55:06 PM
to go-c...@googlegroups.com
On Tue, Mar 1, 2016 at 8:01 AM 'Ashwanth Kumar' via go-cd-dev <go-c...@googlegroups.com> wrote:
Hey Mahesh,

BuildAssignmentService assigning a job to an agent does not immediately result in the agent processing that job. The agent remains in the Idle state until it actually starts processing the job.

In that case, it would really help if the API exposed an "assigned" state in the agent's build_state. Similarly, the go-server logs show a state called "preparing" for the agent, but it isn't exposed via the API either. Is this something you would consider adding to the API? If so, I can contribute a patch.

Please, we'd love to expose more information in the API. If you have any specific requirements, open an issue and we're happy to help you with getting the PR going.
 
 
Jobs that are assigned but never picked up (for example, because the agent dies before processing them) will be rescheduled later. The reschedule interval is configured through the property cruise.reschedule.hung.builds.interval, which defaults to 5 minutes.

Thanks. I just need some advice: if I reduce it to 1 minute, does that have any performance impact on the server? In the Docker image I see the rescheduling happen only every 30 minutes, and as far as I can tell the image hasn't overridden the value. If I override it manually through the GO_SERVER_SYSTEM_PROPERTIES environment variable, it seems to work as expected.

I don't think it'll cause too much of a problem. That 5-minute interval exists to work around intermittent network issues, where an agent goes away, comes back a few minutes later, and the job continues. If you're sure that your network issues won't cause a delay of more than 1 minute, go ahead. I don't see too much of a performance issue there.

 

Is there any way to find all the building / stuck jobs like this via any endpoint?

Currently there is no endpoint that lists building or stuck jobs. Do you think /go/cctray.xml would serve your purpose?

I was hoping that, similar to scheduled.xml, there would be another endpoint that returns the building jobs along with their environments and resources. Unfortunately, cctray.xml doesn't give me that information :(

CCTray is supposed to provide enough information for just that — CCTray. Please open an issue and we can take this conversation there. 