We use workstations as additional build and test agents because they have the correct software installed (Visual Studio) and are quite powerful machines.
However, since these are workstations, users often turn them off!
When a pipeline job is set to run on all agents, the schedule seems to include agents on machines that are turned off, and the job never completes until those machines are brought back online. The problem is that a new execution of the pipeline will not start until the previous one has completed, so we have to go into each pipeline manually and cancel the execution. This is not a great situation, especially when it occurs on a large number of pipelines.
My questions are:
- Is there a way of preventing GO from scheduling an "all agents" job on a non-responsive or missing agent?
- Is there a way of getting GO to time out and cancel the pipeline scheduling for non-responsive or missing agents?
I have tried setting the job timeout; however, this only seems to apply once a job has started.
Thanks in advance.
Carl
--
You received this message because you are subscribed to the Google Groups "go-cd" group.
To unsubscribe from this group and stop receiving emails from it, send an email to go-cd+un...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
Thank you for the reply. Whilst I am keen to use an API approach to do this, I am, for obvious reasons, less keen on using an API that is going to be deprecated. Is there going to be a replacement for this? Are there new APIs in the works?
Also, how would I find out which stages I should cancel? Is there an API that can show me stages that contain jobs that are assigned but not started (i.e. assigned to agents that are non-responsive)?
Another thought I had to prevent this from happening is to create a scheduled task that runs frequently, looks for agents that have been in the "lost contact" state for more than, say, 10 minutes, and disables them. I can then add a scheduled task to each of our servers and workstations so that they "enable" themselves if they are not currently enabled. This way, agents on machines that are available and have connectivity to the GO Server will always be enabled, and those that are turned off or lose connectivity will be disabled. It won't help pipelines that have already been scheduled; however, it will make the problem smaller.
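The selection step of such a scheduled task might look roughly like the sketch below. It only covers the decision of which agents to disable. Fetching the agent list and issuing the disable call are left out, since that depends on which GO API is available. The record fields (`uuid`, `enabled`, `state`, `last_seen`) and the `"LostContact"` state string are assumptions made for illustration, not the actual shape returned by GO.

```python
from datetime import datetime, timedelta, timezone


def agents_to_disable(agents, now, max_lost=timedelta(minutes=10)):
    """Return the uuids of enabled agents that have been out of contact
    for longer than max_lost.

    Each agent is a dict with hypothetical keys:
      uuid (str), enabled (bool), state (str), last_seen (aware datetime).
    """
    stale = []
    for agent in agents:
        if not agent["enabled"]:
            continue  # already disabled, nothing to do
        if agent["state"] != "LostContact":  # assumed state name
            continue
        if now - agent["last_seen"] > max_lost:
            stale.append(agent["uuid"])
    return stale


# Illustrative data only.
now = datetime(2015, 1, 1, 12, 0, tzinfo=timezone.utc)
sample_agents = [
    {"uuid": "a1", "enabled": True, "state": "Idle",
     "last_seen": now},
    {"uuid": "a2", "enabled": True, "state": "LostContact",
     "last_seen": now - timedelta(minutes=25)},
    {"uuid": "a3", "enabled": True, "state": "LostContact",
     "last_seen": now - timedelta(minutes=3)},
    {"uuid": "a4", "enabled": False, "state": "LostContact",
     "last_seen": now - timedelta(hours=5)},
]
stale_uuids = agents_to_disable(sample_agents, now)
```

If the server does not expose a last-contact time, the task would have to record when it first saw each agent in the lost-contact state and measure the 10 minutes from there.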
Go reschedules the job when the agents are back online. Is that not happening in your case?
Can you please shed some light on what format the "scheduled_date" value is in, in the following API: http://www.go.cd/documentation/user/current/api/stages_api.html
I have tried parsing it in multiple ways using System.DateTime in .NET; however, this is not giving me a valid date. How do I convert this into a date and time?
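One possibility, which nothing in this thread confirms, is that the value is a Unix timestamp in milliseconds (a large integer counting from 1970-01-01 UTC). That would explain why handing the number directly to a date parser fails. Under that assumption, a conversion could be sketched as follows (shown in Python; the function name is made up for illustration):

```python
from datetime import datetime, timedelta, timezone

# Assumption: the value counts milliseconds since the Unix epoch (UTC).
UNIX_EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def parse_epoch_millis(value):
    """Convert an epoch-milliseconds value (int or numeric string)
    into a timezone-aware UTC datetime."""
    return UNIX_EPOCH + timedelta(milliseconds=int(value))


one_day_in = parse_epoch_millis("86400000")  # exactly one day after the epoch
```

If the result lands decades away from the time the stage actually ran, the assumption is wrong and the value is in some other format.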
Unfortunately, there's no API to figure this out. The pipeline history API reports the job/stage status and state as unknown. There is a schedule date field that could be used to assume that the job is stuck, but I'm not sure that's the right way to go about it.
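As a rough illustration of that schedule-date heuristic: treat a job as probably stuck when its result is still unknown and it was scheduled longer ago than some threshold. The field names (`name`, `result`, `scheduled_at`), the `"Unknown"` string and the 30-minute default are assumptions for this sketch, not the actual API response format. A job that is really building for a long time would be flagged as well, so this is a guess, not a reliable test.

```python
from datetime import datetime, timedelta, timezone


def probably_stuck(jobs, now, max_wait=timedelta(minutes=30)):
    """Return names of jobs whose result is still unknown and that were
    scheduled more than max_wait ago.

    Each job is a dict with hypothetical keys:
      name (str), result (str), scheduled_at (aware datetime).
    """
    return [
        job["name"]
        for job in jobs
        if job["result"] == "Unknown" and now - job["scheduled_at"] > max_wait
    ]


# Illustrative data only.
now = datetime(2015, 1, 1, 12, 0, tzinfo=timezone.utc)
sample_jobs = [
    {"name": "build-ws1", "result": "Unknown",
     "scheduled_at": now - timedelta(hours=2)},
    {"name": "build-ws2", "result": "Unknown",
     "scheduled_at": now - timedelta(minutes=5)},
    {"name": "build-srv1", "result": "Passed",
     "scheduled_at": now - timedelta(hours=2)},
]
suspects = probably_stuck(sample_jobs, now)
```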