Builds aborting randomly

Lukas Rytz

Jul 28, 2012, 9:04:41 AM
to jenkins...@googlegroups.com
Hi all,


Lately we have seen quite a lot of jobs (~10%) abort without any intervention.
Has anybody else had similar problems?

No error message in the console output:

[...]
[partest] testing: [...]/run/reflection-constructormirror-nested-good.scala [ OK ]
[partest] testing: [...]/files/run/viewtest.scala [ OK ]
[partest] testing: [...]/files/run/reify_newimpl_20.scala [ OK ]
Build was aborted
Archiving artifacts
Checking console output
Email was triggered for: Aborted
Sending email for trigger: Aborted

The abort is not because of a timeout (build timeout plugin).
The Jenkins logs say that the abort is due to an uncaught InterruptedException; the stack
trace is below. It always looks the same.

I think the reason is an InterruptedException in master-slave communication. The slaves are
connected over SSH using the "SSH Slaves Plugin".

I don't think that the exception is caused by our testing tool - this is running on the client in
another (JVM) process, so even if it quits with an InterruptedException, that should not abort
the Jenkins build.


Thanks for any pointers!
Lukas



Jenkins Log:

INFO: scala-checkin #6609 aborted
java.lang.InterruptedException
  at java.lang.Object.wait(Native Method)
  at hudson.remoting.Request.call(Request.java:146)
  at hudson.remoting.Channel.call(Channel.java:663)
  at hudson.remoting.RemoteInvocationHandler.invoke(RemoteInvocationHandler.java:158)
  at $Proxy36.join(Unknown Source)
  at hudson.Launcher$RemoteLauncher$ProcImpl.join(Launcher.java:861)
  at hudson.Launcher$ProcStarter.join(Launcher.java:345)
  at hudson.tasks.CommandInterpreter.perform(CommandInterpreter.java:82)
  at hudson.tasks.CommandInterpreter.perform(CommandInterpreter.java:58)
  at hudson.tasks.BuildStepMonitor$1.perform(BuildStepMonitor.java:19)
  at hudson.model.AbstractBuild$AbstractBuildExecution.perform(AbstractBuild.java:717)
  at hudson.model.Build$BuildExecution.build(Build.java:199)
  at hudson.model.Build$BuildExecution.doRun(Build.java:160)
  at hudson.model.AbstractBuild$AbstractBuildExecution.run(AbstractBuild.java:499)
  at hudson.model.Run.execute(Run.java:1488)
  at hudson.model.FreeStyleBuild.run(FreeStyleBuild.java:46)
  at hudson.model.ResourceController.execute(ResourceController.java:88)
  at hudson.model.Executor.run(Executor.java:236)



Lukas Rytz

Jul 29, 2012, 5:55:37 AM
to jenkins...@googlegroups.com
Further observation: it seems to happen only when running multiple concurrent builds
of the same job on the same slave (but not when running multiple builds on separate
slaves, at least it seems that way currently).

Lukas Rytz

Aug 14, 2012, 4:41:21 AM
to jenkins...@googlegroups.com
Well, that's unfortunately not the case. I changed our setup to never run builds of
the same job on the same machine in parallel, but the aborts still happen. Just
less often.

The aborts always come in batches. The last batch was 48 aborts at the same time,
each producing the same message in the Jenkins log (see first post).
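
To put numbers on such a batch, the abort lines in the Jenkins log can be tallied per job. This is only a minimal sketch: it assumes the log is available as plain text, that abort lines look exactly like the "INFO: scala-checkin #6609 aborted" line from the first post, and that job names contain no whitespace. The function name `count_aborts` is made up for illustration.

```python
import re
import sys
from collections import Counter

# Matches lines such as "INFO: scala-checkin #6609 aborted".
# Assumption: job names contain no whitespace.
ABORT_LINE = re.compile(r"^INFO: (?P<job>\S+) #(?P<build>\d+) aborted$")


def count_aborts(log_text):
    """Return a Counter mapping job name -> number of aborted builds."""
    counts = Counter()
    for line in log_text.splitlines():
        match = ABORT_LINE.match(line.strip())
        if match:
            counts[match.group("job")] += 1
    return counts


if __name__ == "__main__":
    # Usage (hypothetical file name): python count_aborts.py jenkins.log
    with open(sys.argv[1], encoding="utf-8", errors="replace") as handle:
        for job, total in count_aborts(handle.read()).most_common():
            print(job, total)
```

Stack-trace lines and all other log entries are simply skipped, so the full log can be fed in unfiltered.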

I'm mostly wondering whether anyone else has ever experienced this problem.

Lukas

Richard Bywater

Aug 14, 2012, 4:45:38 AM
to jenkins...@googlegroups.com
Wild guess but are the builds happening on a Windows based slave and
is someone logging out whilst the builds are running?

I've had problems with this in the past (it's something you can get
around by passing the right argument: -Xrs, I think, from memory).

Might be nowhere near the issue but just in case :)

Cheers
Richard.

Lukas Rytz

Aug 14, 2012, 4:48:28 AM
to jenkins...@googlegroups.com


On Tuesday, August 14, 2012 10:45:38 AM UTC+2, Richard Bywater wrote:
Wild guess but are the builds happening on a Windows based slave and
is someone logging out whilst the builds are running?

Thanks for the pointer! But that cannot be it: they are all Linux slaves running
with the SSH Slaves Plugin, and they are dedicated machines; nobody is
interacting with them.

Joachim Van der Auwera

Aug 14, 2012, 5:01:28 AM
to jenkins...@googlegroups.com
Maybe the machines are running out of memory? I have heard of Linux killing random processes to release memory.

Evgeny Makarov

Oct 11, 2012, 5:13:49 AM
to jenkins...@googlegroups.com
Hi. As far as I can see, no solution has been found yet? I have the same problem with job interruption. Has anyone found a solution?
Thanks

Pawel

Oct 11, 2012, 7:26:22 AM
to jenkins...@googlegroups.com
Have you tried the -Xrs parameter for the JVM?

Lukas Rytz

Mar 18, 2013, 11:34:37 AM
to jenkins...@googlegroups.com
This is embarrassing, but let me post it for reference.
It seems the reason was a simple misconfiguration: we allowed anonymous users to cancel builds.
Search engine crawlers were probably causing the aborts by visiting the "job/id/stop" links.


It would have helped to get a bit of information on the cause of the abort, either in the build log output or in the Jenkins log.
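
For anyone who suspects the same cause, one way to check is to look for requests to the stop links in an HTTP access log and group them by user agent. The sketch below rests on assumptions the thread does not confirm: that an access log in the Combined Log Format exists (for example from a reverse proxy in front of Jenkins), and that the "job/id/stop" links have the shape /job/<name>/<build number>/stop. The function name `stop_requests_by_agent` is hypothetical.

```python
import re
from collections import Counter

# Combined Log Format (assumed):
# host ident user [time] "METHOD path protocol" status bytes "referer" "user-agent"
LOG_LINE = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) \S+ '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)

# Assumed shape of the stop links: /job/<name>/<build number>/stop
STOP_PATH = re.compile(r"/job/[^/]+/\d+/stop/?(\?.*)?$")


def stop_requests_by_agent(lines):
    """Count requests to build stop links, grouped by user agent string."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if match and STOP_PATH.search(match.group("path")):
            counts[match.group("agent")] += 1
    return counts
```

If crawler user agents dominate the result, anonymous cancel permission is a likely culprit; lines that do not parse are ignored rather than reported.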


