Pipeline: sometimes slaves just hang

Jonathan Hodgson

Dec 1, 2016, 8:39:33 AM
to Jenkins Users
Hi,

Right now I'm looking at a pipeline job which has been hung on two slaves for over two hours.

Both slaves (one Windows, one OS X) show as connected, and the slave process is still running on both, but both are stalled: one on starting a batch script, the other either in the shell script or just after it.

The job has run without issue before.

This raises a couple of questions:

1) What might be causing this? (I've had mystery hangs before.)
2) How can I investigate it?

and

3) Is there a way I can implement some sort of timeout? If there has been no activity from the slave for a given period of time, I'd prefer it if the build ended and I received an email telling me there had been a problem. As it is, I'm unaware of the problem unless I check (which sort of negates the point of an autobuild), and the build system is effectively useless until I go there and stop the build.

Jonathan Hodgson

Dec 1, 2016, 8:43:29 AM
to Jenkins Users
It seems it was the Windows slave specifically that was blocking things. When I stopped the slave app on Windows, the OS X slave kicked into action.

The questions above still apply.

Peter Teichner

Dec 2, 2016, 6:48:08 AM
to Jenkins Users
You can implement a timeout by following this guide. It is written for user input, but you can adapt it with a try/catch block: https://support.cloudbees.com/hc/en-us/articles/226554067-Pipeline-How-to-add-an-input-step-with-timeout-that-continues-if-timeout-is-reached-using-a-default-value

Jonathan Hodgson

Dec 2, 2016, 8:03:57 AM
to Jenkins Users
Thanks, but I'm not sure this does what I need.

The impression I have is that the timeout step is based on how long the contained step takes to execute.

What I need is a timeout based on when the step last actually did something, such as generating some output.

A full build can take a long time. If I set a timeout to allow for that and it hangs near the beginning, the slave could be hung up for half an hour or more.

But it continuously generates new output to the console while it is working, so being able to time out when that output stops would work.

Peter Teichner

Dec 2, 2016, 8:30:25 AM
to Jenkins Users
I see. It's a bit tricky to understand how your setup works without actually seeing the code you've written. Maybe you can look at the longest build time and set that as the timeout, if the agent execution is happening inside the script rather than as a separate job.

Jonathan Hodgson

Dec 2, 2016, 12:50:07 PM
to Jenkins Users
The code is pretty complex, and getting more complex, but to summarize: the steps in question are batch or shell scripts (depending on the platform) which run a compilation (and later more that will run various tests). How long they take can be quite variable, for example depending on whether a full rebuild has been requested, so setting the timeout to the maximum time for that complete step risks things stalling for a long period.

For that reason it would be useful to have a timeout based on when there was a change in stdout, for example.

Incidentally, it's Jenkins that's having the issues, not the compilation.
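
For reference, the "time out when stdout goes quiet" idea discussed above can be sketched as a small wrapper script that a batch or shell step could invoke in place of the raw build command. This is only an illustration under assumptions, not something proposed in the thread: the function name `run_with_inactivity_timeout` and the `idle_seconds` parameter are hypothetical, and the sketch only catches a wrapped command that stops producing output. It would not detect a hang inside Jenkins itself before the script starts. Later releases of the Pipeline: Basic Steps plugin also document an `activity` option on the `timeout` step that times out on log inactivity, which may be the simpler route where it is available.

```python
import subprocess
import sys
import threading
import time


def run_with_inactivity_timeout(cmd, idle_seconds):
    """Run cmd, echo its output, and kill it if it is silent for idle_seconds.

    Hypothetical helper for illustration. Returns (exit_code, timed_out).
    """
    proc = subprocess.Popen(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # count stderr output as activity too
        text=True,
    )
    # One-element list so the reader thread can update the timestamp.
    last_activity = [time.monotonic()]

    def pump():
        # Forward each line to our own stdout and note when it arrived.
        for line in proc.stdout:
            last_activity[0] = time.monotonic()
            sys.stdout.write(line)
            sys.stdout.flush()

    reader = threading.Thread(target=pump, daemon=True)
    reader.start()

    timed_out = False
    while proc.poll() is None:
        if time.monotonic() - last_activity[0] > idle_seconds:
            timed_out = True
            proc.kill()  # kills the direct child only, not its descendants
            break
        time.sleep(0.1)

    proc.wait()
    reader.join(timeout=1)
    return proc.returncode, timed_out
```

A caller would treat `timed_out` as a build failure and exit non-zero so that the usual failure notification (for example, email) fires. Two caveats: a child that buffers its output when writing to a pipe can look idle while it is still working, so `idle_seconds` needs some slack, and killing the direct child does not necessarily stop any processes it has spawned.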