Reasons for job termination


Fabian Cenedese

Jul 3, 2020, 3:19:22 AM
to jenkins...@googlegroups.com
Hi

We've been using Jenkins for years now. Recently a problem has
come up that I can't explain. Jobs started to get terminated for
no apparent reason. With a signal handler I found that it's
apparently the Jenkins user that is sending the SIGTERM to
the running process.
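The sender of a signal can be identified from the `siginfo_t` data (`si_pid`, `si_uid`) that the kernel delivers with it. Python's `signal` module does not pass that structure to ordinary handlers, but on Linux `signal.sigwaitinfo()` returns it. Below is a minimal sketch, assuming Linux and Python 3: a stand-in process that blocks SIGTERM, waits for it, and reports which process and user sent it. The function names are illustrative; this is not the signal handler used in the original post.

```python
import os
import pwd
import signal


def wait_for_sigterm():
    """Block SIGTERM, wait for it, and return (sender_pid, sender_uid).

    Assumes Linux: signal.sigwaitinfo() is not available on every Unix (e.g. macOS).
    """
    # Blocking keeps SIGTERM pending instead of running the default action (terminate).
    signal.pthread_sigmask(signal.SIG_BLOCK, {signal.SIGTERM})
    info = signal.sigwaitinfo({signal.SIGTERM})
    return info.si_pid, info.si_uid


def describe_sender(pid, uid):
    """Turn the sender's PID/UID into a readable line (user name and command line)."""
    try:
        user = pwd.getpwuid(uid).pw_name
    except KeyError:  # UID without a passwd entry
        user = str(uid)
    try:
        # Linux-specific: argv of the sender, NUL-separated.
        with open(f"/proc/{pid}/cmdline", "rb") as f:
            cmd = f.read().replace(b"\0", b" ").decode(errors="replace").strip()
    except OSError:  # sender already exited or is not readable
        cmd = "<exited or unreadable>"
    return f"SIGTERM from pid {pid} (user {user}): {cmd}"
```

Run as a step in the affected job, `print(describe_sender(*wait_for_sigterm()))` shows whether the sender's command line belongs to the Jenkins agent or to something else running as the same user. SIGKILL cannot be observed this way, since it can be neither blocked nor caught.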

What are reasons for Jenkins to stop a job?

There is no second build being started and it's throttled anyway.
The build timeout plugin is installed but this is a pipeline job
where it doesn't work. And I don't use the timeout options in
the pipeline.
I don't see anything in the Jenkins log at that time.

How can I find out why the job is killed?

Thanks

bye Fabi

Gianluca

Jul 3, 2020, 3:51:23 AM
to Jenkins Users
Hi,
What you describe sounds like something we experienced.
The issue in our case was that the Jenkins agents were VMs running on an overloaded host with network issues.
A combination of network errors, unresponsive agents and IP exhaustion caused Jenkins to terminate the jobs with SIGTERM when it was unable to restore the connection to the agent.
It was hard to track down because the host running the VMs was only overloaded while the agents were doing work, so the sequence was something like:
agent OK -> agent starts building a job -> job spawns other VMs for testing -> host gets overloaded -> agent can no longer run properly -> Jenkins loses the connection to the agent -> job gets terminated -> host is no longer overloaded -> agent OK again -> Jenkins restores the connection to the agent.

fcenedese

Jul 3, 2020, 9:59:33 AM
to Jenkins Users
Thanks for the hint. That's certainly something we can look into. I would have guessed
that a lost connection would show up in the system log, but it might not. At least
I can try to improve the situation now.

Thanks again

fcenedese

Jul 8, 2020, 3:00:36 AM
to Jenkins Users
I just wanted to add my findings in case somebody else is looking for a solution to a similar problem.

It turned out that we have a second Jenkins job running on the same machine, mostly unrelated to
the first job that was getting killed. The second job wants to start a process which can only work
if the process isn't already running. Therefore it looks for processes with a certain name and kills
them if they exist. Unfortunately, this pattern also matched a process of the first job and killed
it, assuming it was its own still-running process. And as this didn't have anything to do with Jenkins,
it also didn't show up in the logs.
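The failure mode described here, a process search that is broader than intended, is easy to reproduce. Below is a minimal sketch, assuming Linux (`/proc`) and Python 3; the function names, paths and PIDs are hypothetical and not taken from the jobs in this thread. `naive_matches` looks for a substring anywhere in the command line, roughly what `pkill -f` does, while `exact_matches` only accepts processes whose executable base name is exactly the expected one.

```python
import os


def list_processes():
    """Return (pid, argv) pairs for all readable processes (Linux /proc only)."""
    procs = []
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue
        try:
            with open(f"/proc/{entry}/cmdline", "rb") as f:
                raw = f.read()
        except OSError:  # process exited meanwhile or is not readable
            continue
        argv = [a.decode(errors="replace") for a in raw.split(b"\0") if a]
        if argv:  # kernel threads have an empty cmdline
            procs.append((int(entry), argv))
    return procs


def naive_matches(procs, pattern):
    """Substring search over the whole command line: easily matches other jobs."""
    return [pid for pid, argv in procs if pattern in " ".join(argv)]


def exact_matches(procs, exe_name):
    """Match only if the executable's base name equals exe_name exactly."""
    return [pid for pid, argv in procs if os.path.basename(argv[0]) == exe_name]


# Hypothetical snapshot: 101 is the tool the second job wants to restart,
# 202 is a script of the first job that merely contains the same word.
SAMPLE = [
    (101, ["/opt/tools/simulator", "--port", "5000"]),
    (202, ["/bin/sh", "/var/lib/jenkins/workspace/jobA/run_simulator_tests.sh"]),
]
```

With `SAMPLE`, `naive_matches(SAMPLE, "simulator")` returns `[101, 202]`, so the first job's script would be killed as well, whereas `exact_matches(SAMPLE, "simulator")` returns only `[101]`. An exact name can still match a legitimate instance started by another job; having the job record the PID of the process it started and kill only that PID is stricter.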

So it wasn't a Jenkins error or a resource problem, but simply human error.

Thanks for any help and sorry for the noise.