java.nio.channels.ClosedByInterruptException when I click cancel button from a running job


Daniel Anechitoaie

Mar 20, 2018, 5:15:48 PM
to Jenkins Developers
Hi,

I'm writing a plugin that implements SimpleBuildStep and I'm seeing strange behaviour that I don't understand.
If I start the build and then click the cancel button while it's in progress, I get a java.nio.channels.ClosedByInterruptException and the job fails with an error instead of ending with the normal interrupted status.

Here's the full stack trace:

java.nio.channels.ClosedByInterruptException
        at java.nio.channels.spi.AbstractInterruptibleChannel.end(AbstractInterruptibleChannel.java:202)
        at sun.nio.ch.FileChannelImpl.write(FileChannelImpl.java:216)
        at hudson.util.FileChannelWriter.write(FileChannelWriter.java:72)
        at java.io.Writer.write(Writer.java:192)
        at hudson.util.AtomicFileWriter.write(AtomicFileWriter.java:162)
        at java.io.Writer.write(Writer.java:157)
        at hudson.XmlFile.write(XmlFile.java:189)
        at hudson.model.Run.save(Run.java:1923)
        at hudson.model.Run.execute(Run.java:1784)


My plugin is a Builder that implements SimpleBuildStep, and I run my code (which zips some files from the workspace and uploads them to a server) in a
"private static class DeployCallable extends MasterToSlaveFileCallable<Void>" instance that is called using "workspace.act(new DeployCallable(...".

The strange thing is that if I restart Jenkins, the build that I canceled also disappears from the build history for this job.


Any idea what could cause this issue? Any idea what I can do to fix it?

Also am I using the right class/method (workspace.act and MasterToSlaveFileCallable) to make my plugin compatible with master/slave setups?



Thank you.

Jesse Glick

Mar 20, 2018, 5:27:48 PM
to Jenkins Dev
Offhand that sounds like a core bug. If you were just unlucky enough
to cancel at the exact moment `Run.save` was being called, then that
is a robustness fix. Possibly some other exception was thrown which
failed to clear the thread interruption flag. Would need a
self-contained reproducible test case to analyze—if you have one, file
a bug report.

Daniel Anechitoaie

Mar 20, 2018, 5:29:52 PM
to Jenkins Developers
This happens all the time for me. So I run the plugin (in dev mode) with "mvn hpi:run", start the build, click cancel, and boom, I get this exception.

Daniel Anechitoaie

Mar 21, 2018, 7:12:28 AM
to Jenkins Developers
My plugin just zips some files and pushes them to a HTTP server so I'm using:
and

Is it possible that one of these libraries throws some kind of exception when Jenkins tries to interrupt the job after I click cancel, and that this bubbles up and somehow crashes the job thread?

I tried a test with just a bunch of Thread.sleep() calls before it gets to my code (which zips the files and HTTP PUTs them), and if I cancel the job there it gets aborted nicely and is handled properly by Jenkins.
So it looks like this issue appears when the job is canceled while it is zipping the files or making those HTTP calls.


Another interesting thing is that even though things have crashed on the Jenkins side (nothing appears in the build log anymore, and I print things step by step as they happen), the files are still being pushed to the server.
(i.e. if I have 10 zip files and I cancel the job while the 3rd file is being pushed, things have crashed on the Jenkins side and nothing more appears in the build log, but on the server I still see the remaining files, 4 to 10, being pushed,
and the job is only marked as failed after everything has finished. At that point I can see the build in the Jenkins web interface, but if I restart Jenkins it disappears from the build history. I guess that is due to the failure at "at hudson.XmlFile.write(XmlFile.java:189)",
as the build record never gets saved to disk).



Jesse Glick

Mar 21, 2018, 11:32:21 AM
to Jenkins Dev
On Wed, Mar 21, 2018 at 7:12 AM, Daniel Anechitoaie
<danie...@gmail.com> wrote:
> [Is it] possible maybe one of these libraries? […]
>
> I tried a test with just a bunch of Thread.sleep() before it gets to my code
> (that zips the files and HTTP puts them) and if I cancel the job there it
> gets aborted nicely and is handled properly by Jenkins.

So sounds like a problem in the interaction between the behavior of
those libraries and Jenkins interrupt handling. Again I suspect that
something is catching interrupts, proceeding without throwing
`InterruptedException`, but then setting the thread interrupt flag and
this gets ignored up until the moment the build record is being
finalized and the NIO calls made from `AtomicFileWriter` check the
flag. If true, the solution would probably be for the code which runs
an individual build step (somewhere in `AbstractBuild` IIRC) to check
the interrupt flag and throw `InterruptedException` at that time.
Could be verified in a `JenkinsRule`-based test by having a
`TestBuilder` which mimics the relevant behavior of these libraries.

I doubt this particular issue could apply to Pipeline builds, though
there might be other ways in which interrupt handling is incorrect
there as well.
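
A minimal sketch of the hypothesis above, using only the JDK and no Jenkins classes: a hypothetical library call catches `InterruptedException`, re-sets the thread interrupt flag, and returns normally, and a later `FileChannel.write` on the same thread then fails with `ClosedByInterruptException`. The class and method names are made up for illustration, and nothing in this thread establishes that the libraries in question actually behave this way.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.ClosedByInterruptException;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

class StaleInterruptDemo {

    // Hypothetical stand-in for a library call that swallows an interrupt.
    static void libraryCallThatSwallowsInterrupt() {
        try {
            Thread.sleep(10_000); // throws at once if the flag is already set
        } catch (InterruptedException e) {
            // Re-sets the flag but returns normally instead of propagating.
            Thread.currentThread().interrupt();
        }
    }

    // Returns true if a later FileChannel write fails because the flag is still set.
    static boolean writeFailsWithStaleFlag() {
        try {
            Path tmp = Files.createTempFile("build", ".xml");
            try {
                Thread.currentThread().interrupt();  // simulate the cancel request
                libraryCallThatSwallowsInterrupt();  // returns normally, flag still set
                try (FileChannel ch = FileChannel.open(tmp, StandardOpenOption.WRITE)) {
                    ch.write(ByteBuffer.wrap("<build/>".getBytes(StandardCharsets.UTF_8)));
                    return false; // write succeeded, so the flag was not set
                } catch (ClosedByInterruptException e) {
                    return true;  // the NIO write noticed the stale flag
                }
            } finally {
                Thread.interrupted();      // clear the flag before cleaning up
                Files.deleteIfExists(tmp);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("ClosedByInterruptException on write: " + writeFailsWithStaleFlag());
    }
}
```

The write fails even though the interrupt was requested long before the channel was opened, which matches the shape of the stack trace ending in `Run.save`.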
Message has been deleted
Message has been deleted

Daniel Anechitoaie

Mar 22, 2018, 6:16:33 AM
to jenkin...@googlegroups.com, jgl...@cloudbees.com
I’ve added two replies to this thread but my emails keep getting deleted and they don’t appear in the group on the web.
Did I do something wrong?




Baptiste Mathus

Mar 22, 2018, 6:19:17 AM
to Jenkins Developers
Nothing wrong, your messages were just in the moderation queue. I just approved your messages and your future posts here. 

Please just take extra care to keep distinguishing between what belongs on the dev list (dev-related topics, as you rightly posted here) and what belongs on the users list.

Cheers

2018-03-21 22:55 GMT+01:00 Daniel Anechitoaie <danie...@gmail.com>:
I just posted a reply to this thread and somehow the message got deleted.
I see

Did I do something wrong?


Jesse Glick

Mar 22, 2018, 2:04:10 PM
to Jenkins Dev
On Wed, Mar 21, 2018 at 5:33 PM, Daniel Anechitoaie
<danie...@gmail.com> wrote:
> java.lang.InterruptedException
> at hudson.model.Build$BuildExecution.build(Build.java:214)

So this is from

https://github.com/jenkinsci/jenkins/blob/3b5c715ae0d00ec3a38c63d8c9bf9de2a76b9e29/core/src/main/java/hudson/model/Build.java#L212-L214

I suspect that code is wrong—see below.

> But in Jenkins build log it just stops and nothing is displayed.

Not sure what would cause that. The stack trace implies that the
`InterruptedException` should be caught here:

https://github.com/jenkinsci/jenkins/blob/3b5c715ae0d00ec3a38c63d8c9bf9de2a76b9e29/core/src/main/java/hudson/model/Run.java#L1740-L1745

as apparently happened in your `Thread.sleep` test builder. Perhaps
the ZIP function is not interruptible, which is too bad, but then the
symptom should merely be that the ZIP proceeds to completion, and then
the build is shown as aborted in the normal way.

> If I cancel during the HTTP upload part there I get the java.nio.channels.ClosedByInterruptException

That would be consistent with my hypothesis that the thread interrupt
flag is not getting cleared. Check if this helps either or both of
your cases (ZIP and HTTP) without regressing the `sleep` case:

diff --git a/core/src/main/java/hudson/model/Build.java b/core/src/main/java/hudson/model/Build.java
index 15aed23050..7f2fbbbc34 100644
--- a/core/src/main/java/hudson/model/Build.java
+++ b/core/src/main/java/hudson/model/Build.java
@@ -208,8 +208,7 @@ public abstract class Build <P extends Project<P,B>,B extends Build<P,B>>
                     return false;
                 }
 
-                Executor executor = getExecutor();
-                if (executor != null && executor.isInterrupted()) {
+                if (Thread.interrupted()) {
                     // someone asked build interruption, let stop the build before trying to run another build step
                     throw new InterruptedException();
                 }
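
For context on why the patch uses `Thread.interrupted()`: unlike `Thread.isInterrupted()`, the static `Thread.interrupted()` clears the calling thread's interrupt status as a side effect, so later interruptible I/O on that thread no longer sees a stale flag. A small JDK-only sketch of the difference follows; the class and method names are hypothetical and it does not use Jenkins' `Executor`.

```java
class InterruptFlagDemo {

    // Returns {flag as seen by isInterrupted(), flag still set afterwards}.
    static boolean[] peek() {
        Thread.currentThread().interrupt();
        boolean seen = Thread.currentThread().isInterrupted();  // reads, does not clear
        boolean after = Thread.currentThread().isInterrupted();
        Thread.interrupted(); // reset so the caller is not affected
        return new boolean[] {seen, after};
    }

    // Returns {flag as seen by Thread.interrupted(), flag still set afterwards}.
    static boolean[] consume() {
        Thread.currentThread().interrupt();
        boolean seen = Thread.interrupted();                    // reads and clears
        boolean after = Thread.currentThread().isInterrupted();
        return new boolean[] {seen, after};
    }

    public static void main(String[] args) {
        boolean[] p = peek();
        boolean[] c = consume();
        System.out.println("isInterrupted(): seen=" + p[0] + ", still set=" + p[1]);
        System.out.println("interrupted():   seen=" + c[0] + ", still set=" + c[1]);
    }
}
```

With `isInterrupted()` the flag is still set after the check; with `Thread.interrupted()` it is cleared, which is what allows an `InterruptedException` thrown right afterwards to be handled without the flag lingering.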

Daniel Anechitoaie

Mar 23, 2018, 8:19:43 AM
to Jenkins Developers
>  Perhaps the ZIP function is not interruptible,  
I think so too.


> which is too bad, but then the symptom should merely be that the ZIP proceeds to completion, and then 
> the build is shown as aborted in the normal way. 
Which is totally fine. I zip things in a loop, so even if it can't be interrupted right away, stopping the next loop iteration from happening is totally fine.


> Check if this helps either or both of 
> your cases (ZIP and HTTP) without regressing the `sleep` case: 
Give me some time to figure out how to compile Jenkins with this patch and then run my plugin in it and I'll come back with feedback.


Some other things I noticed while doing all kinds of tests:
The problem reported in this thread happens when I use:

---
workspace.act(new DeployCallable(...
...
private static class DeployCallable extends MasterToSlaveFileCallable<Void> {..
---

if I use

---
launcher.getChannel().call(new DeployCallable(...
...
private static class DeployCallable extends MasterToSlaveCallable<Void, IOException> {
---

then it's not possible to interrupt at all. If I click cancel, the full build step runs and only at the end is the build marked as interrupted.
So it kind of seems that only MasterToSlaveFileCallable supports being interrupted, while MasterToSlaveCallable does not.

But this is a different issue, so I just bring it up as an FYI. I'll run the tests with your patch and MasterToSlaveFileCallable.




BTW thank you so much for the help. I've been having this issue for a long time, but this is the first time I feel I'm making some progress in figuring out what's going on.

Jesse Glick

Mar 23, 2018, 11:30:03 AM
to Jenkins Dev
On Fri, Mar 23, 2018 at 8:19 AM, Daniel Anechitoaie
<danie...@gmail.com> wrote:
> So it kind of seems that only MasterToSlaveFileCallable supports being
> interrupted while MasterToSlaveCallable it does not.

That would be odd, since file callables are just a bit of sugar,
internally implemented using generic callables. The one thing that
strikes my notice is that the wrapper translates
`InterruptedException` to `TunneledInterruptedException`, though this
ought to be transparent as `FilePath.act` unwraps that and rethrows
the original `InterruptedException`.

Daniel Anechitoaie

Mar 23, 2018, 2:19:19 PM
to Jenkins Developers
I took the latest Jenkins code from master, applied your patch, and ran the plugin.
Now when I click cancel I get:

Mar 23, 2018 8:13:31 PM hudson.model.Run execute
SEVERE: Failed to save build record
java.nio.channels.ClosedByInterruptException
at java.nio.channels.spi.AbstractInterruptibleChannel.end(AbstractInterruptibleChannel.java:202)
at sun.nio.ch.FileChannelImpl.write(FileChannelImpl.java:216)
at hudson.util.FileChannelWriter.write(FileChannelWriter.java:72)
at java.io.Writer.write(Writer.java:192)
at hudson.util.AtomicFileWriter.write(AtomicFileWriter.java:162)
at java.io.Writer.write(Writer.java:157)
at hudson.XmlFile.write(XmlFile.java:189)
at hudson.model.Run.save(Run.java:1923)
at hudson.model.Run.execute(Run.java:1784)
at hudson.model.FreeStyleBuild.run(FreeStyleBuild.java:43)
at hudson.model.ResourceController.execute(ResourceController.java:97)
at hudson.model.Executor.run(Executor.java:429)

Jesse Glick

Mar 23, 2018, 2:44:09 PM
to Jenkins Dev
On Fri, Mar 23, 2018 at 2:19 PM, Daniel Anechitoaie
<danie...@gmail.com> wrote:
> Now when I click cancel I get.

In other words, the same as before?

I think we have reached the limit of what is possible for “debugging
by telegram”. If you have a self-contained, minimal, reproducible test
case, you should file a bug report with those details. Given that this
probably only applies to freestyle and not Pipeline builds, I cannot
foresee anyone taking the time to work on an analysis and fix instead
of other priorities, unless this problem seems applicable to a variety
of plugins and somehow never got reported before.

Daniel Anechitoaie

Mar 23, 2018, 3:21:28 PM
to Jenkins Developers
With Pipeline it seems to be working fine. If I click cancel, the job is properly canceled without any errors.
If I cancel during a long-running process (like zipping a big folder), it even asks me after a while whether I want to force close the job, which also works.

So the problem seems to be only with freestyle builds.



> unless this problem seems applicable to a variety 
> of plugins and somehow never got reported before. 
I think I should've become a QA, as somehow I always have the luck of finding the most unexpected issues in unexpected cases.
I did search for other plugins that implement a build step and actually do some kind of file processing inside those builds, but I haven't found any similar to what I'm doing,
so that I could see how they do it and whether they have the same issues as me.


Anyway, I guess I'll give up on this as I have no idea what else I can do, and be happy that it works OK with Pipeline builds, as freestyle builds are slowly becoming legacy.


Thank you for your help, at least I have a better understanding of what's happening.

Jesse Glick

Mar 23, 2018, 3:37:41 PM
to Jenkins Dev
On Fri, Mar 23, 2018 at 3:21 PM, Daniel Anechitoaie
<danie...@gmail.com> wrote:
> If it happens and it's while a long running process (like zipping a big
> folder) it even asks me after a while if I want to force close the job which
> also works.

This is a sign that your zip operation is uninterruptible, which is
potentially a problem, though perhaps not your first priority for bug
fixing.

> I did search for other plugins that implement a build step and that
> actually try to do any kind of file processing inside those builds

Well, `ArtifactArchiver` in core does something conceptually similar,
for example.
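
On the uninterruptible zip operation: one common way to make a loop like the one described earlier in the thread respond to cancellation is to check the interrupt flag between entries and throw `InterruptedException`. The sketch below assumes plain `java.util.zip` rather than the library the plugin actually uses, writes to memory instead of the workspace, and uses hypothetical class, method, and entry names.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

class InterruptibleZipLoop {

    // Writes one entry per name and returns the count; stops between entries if interrupted.
    static int zipAll(List<String> names, ByteArrayOutputStream sink) throws InterruptedException {
        int written = 0;
        try (ZipOutputStream zip = new ZipOutputStream(sink)) {
            for (String name : names) {
                if (Thread.interrupted()) { // reads and clears the flag
                    throw new InterruptedException("cancelled after " + written + " entries");
                }
                zip.putNextEntry(new ZipEntry(name));
                zip.write(("content of " + name).getBytes(StandardCharsets.UTF_8));
                zip.closeEntry();
                written++;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return written;
    }

    // Demo helper: returns the number of entries written, or -1 if the loop was cancelled.
    static int demo(boolean cancelFirst) {
        if (cancelFirst) {
            Thread.currentThread().interrupt(); // simulate a cancel that arrived earlier
        }
        try {
            return zipAll(Arrays.asList("a.txt", "b.txt", "c.txt"), new ByteArrayOutputStream());
        } catch (InterruptedException e) {
            return -1;
        }
    }

    public static void main(String[] args) {
        System.out.println("normal run: " + demo(false));
        System.out.println("cancelled run: " + demo(true));
    }
}
```

In a real build step the `InterruptedException` would be left to propagate rather than turned into a return value; the `demo` helper only swallows it to keep the example self-contained.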

Daniel Anechitoaie

Apr 2, 2018, 12:07:03 PM
to Jenkins Developers
> This is a sign that your zip operation is uninterruptible, which is
> potentially a problem, though perhaps not your first priority for bug 
> fixing.

I'm totally OK if it's not interruptible. It's a 3rd party lib and you can't expect all libs to work perfectly.
But it should be interrupted after the zip operation finishes, no?
So it should just be delayed a little, which is totally OK.

I've checked all the code in the zip library I use (org.zeroturnaround.zip) and it doesn't seem to do anything with InterruptedException that would cause this behaviour.
Also, if I run my plugin just on master (so not inside a MasterToSlaveCallable), it does get interrupted OK (not instantly, but as soon as the zip operation has finished, and without errors).
So my understanding is that maybe something in how MasterToSlaveCallables are handled causes this strange behaviour, since it works OK on master and with Pipeline, and the problem
seems to occur only with MasterToSlaveCallables.

I even wrapped all my zipping code in a try {} catch (Exception e) {} and nothing is thrown by the zip code.



> Well, `ArtifactArchiver` in core does something conceptually similar, 
> for example. 
I've checked this, and the code there is split into multiple client-to-master operations, plus it doesn't use any 3rd party library.
It first makes a `FilePath.act(` call to get the list of files, and then it calls .archive on them.
Everything seems to be handled with just calls to FilePath methods, so just native internal Jenkins APIs.

Jesse Glick

Apr 2, 2018, 1:27:58 PM
to Jenkins Dev
On Mon, Apr 2, 2018 at 12:07 PM, Daniel Anechitoaie
<danie...@gmail.com> wrote:
> If I run my plugin just on master (so not inside a
> MasterToSlaveCallable) it does get interrupted ok (not instantly but as soon
> as zip operation is finished and without errors).
> So my understanding is that maybe something in how MasterToSlaveCallables
> are handled cause this strange behaviour since on master and on pipelines
> works ok and the problem
> seems to be just for MasterToSlaveCallables.

Not sure offhand. Sounds like it could be a bug in Jenkins core. Would
need to have a minimal self-contained test case that could be run
through a debugger to track down the culprit.

Daniel Anechitoaie

Apr 2, 2018, 1:49:33 PM
to Jenkins Developers
I can provide whatever is needed.
Would a stripped down version of the plugin with just the absolute minimal amount of code needed to replicate the issue be what is needed?
And where should I provide this? Can I put it in a new public GitHub repo?

Jesse Glick

Apr 3, 2018, 1:38:53 PM
to Jenkins Dev
On Mon, Apr 2, 2018 at 1:49 PM, Daniel Anechitoaie
<danie...@gmail.com> wrote:
> Would a stripped down version of the plugin with just the absolute minimal
> amount of code needed to replicate the issue be what is needed?

That would be ideal, yes.

> And where should I provide this? I can put it on a new public GitHub repo?

Sure, or simply a source ZIP attached to an issue.

Again I make no promises about core devs picking this up for
investigation, unless it seems like there might be a broader problem
affecting a lot of people. The backlog is dizzying, I am afraid.