Here is the behavior I have observed:
- A WorkflowJob is created using Job DSL with concurrentBuilds false.
- The job starts running.
- Another build of the WorkflowJob is scheduled (due to an SCM commit) but remains in the queue, blocked on the first build.
- A Job DSL job runs again and updates the above WorkflowJob's job definition.
- At this point, what should be impossible happens: the second build starts running concurrently with the first build (even though concurrentBuilds is false in the DSL!). I can clearly see both running in the Jenkins classic UI.
- The first build fails with the java.io.FileNotFoundException mentioned above (program.dat (No such file or directory)).
Here is some background information. When Job DSL creates a pipeline job with concurrentBuilds false, it emits the following element into the job's flow-definition XML (its config.xml):
<concurrentBuild>false</concurrentBuild>
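When Jenkins loads the job back from disk, XStream maps this element onto the field of the same name in WorkflowJob. Jenkins uses its own preconfigured XStream instance, so the following is only a minimal standalone sketch of that name-to-field mapping, using a stand-in class of my own rather than the real WorkflowJob:

import com.thoughtworks.xstream.XStream;
import com.thoughtworks.xstream.io.xml.DomDriver;

public class DeprecatedFieldMappingDemo {

    // Stand-in for WorkflowJob with just the deprecated field; illustration only.
    static class FlowDefinitionStandIn {
        Boolean concurrentBuild;
    }

    public static void main(String[] args) {
        XStream xs = new XStream(new DomDriver());
        xs.allowTypes(new Class[] { FlowDefinitionStandIn.class });
        xs.alias("flow-definition", FlowDefinitionStandIn.class);

        FlowDefinitionStandIn job = (FlowDefinitionStandIn) xs.fromXML(
                "<flow-definition><concurrentBuild>false</concurrentBuild></flow-definition>");

        // The <concurrentBuild> element lands in the field of the same name.
        System.out.println(job.concurrentBuild); // prints: false
    }
}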
Now, note that this field is deprecated in WorkflowJob:
/** @deprecated replaced by {@link DisableConcurrentBuildsJobProperty} */
private @CheckForNull Boolean concurrentBuild;
In fact, WorkflowJob only keeps this field around for migration purposes: the getter ignores it entirely and just checks for a DisableConcurrentBuildsJobProperty on the job, while the setter clears the field and adds or removes that property:
@Exported
@Override public boolean isConcurrentBuild() {
    return getProperty(DisableConcurrentBuildsJobProperty.class) == null;
}
[...]
public void setConcurrentBuild(boolean b) throws IOException {
    concurrentBuild = null;
    boolean propertyExists = getProperty(DisableConcurrentBuildsJobProperty.class) != null;
    // If the property exists, concurrent builds are disabled. So if the argument here is true and the
    // property exists, we need to remove the property, while if the argument is false and the property
    // does not exist, we need to add the property. Yay for flipping boolean values around!
    if (propertyExists == b) {
        BulkChange bc = new BulkChange(this);
        try {
            removeProperty(DisableConcurrentBuildsJobProperty.class);
            if (!b) {
                addProperty(new DisableConcurrentBuildsJobProperty());
            }
            bc.commit();
        } finally {
            bc.abort();
        }
    }
}
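In other words, whether concurrent builds are allowed is derived solely from the presence or absence of DisableConcurrentBuildsJobProperty; the deprecated field is only a migration shim. Here is a minimal JenkinsRule-based sketch of that behavior (my own test class, assuming the workflow-job plugin's test harness is on the classpath; it is not taken from the plugin's own test suite):

import org.jenkinsci.plugins.workflow.job.WorkflowJob;
import org.jenkinsci.plugins.workflow.job.properties.DisableConcurrentBuildsJobProperty;
import org.junit.Rule;
import org.junit.Test;
import org.jvnet.hudson.test.JenkinsRule;

import static org.junit.Assert.assertFalse;
import static org.junit.Assert.assertNotNull;
import static org.junit.Assert.assertNull;
import static org.junit.Assert.assertTrue;

public class ConcurrentBuildFlagSketchTest {

    @Rule public JenkinsRule r = new JenkinsRule();

    @Test public void getterIsDerivedOnlyFromTheProperty() throws Exception {
        WorkflowJob p = r.createProject(WorkflowJob.class, "p");

        // No DisableConcurrentBuildsJobProperty yet, so concurrent builds are reported as allowed.
        assertNull(p.getProperty(DisableConcurrentBuildsJobProperty.class));
        assertTrue(p.isConcurrentBuild());

        // setConcurrentBuild(false) expresses "disallow" by adding the property...
        p.setConcurrentBuild(false);
        assertNotNull(p.getProperty(DisableConcurrentBuildsJobProperty.class));
        assertFalse(p.isConcurrentBuild());

        // ...and setConcurrentBuild(true) expresses "allow" by removing it again.
        p.setConcurrentBuild(true);
        assertNull(p.getProperty(DisableConcurrentBuildsJobProperty.class));
        assertTrue(p.isConcurrentBuild());
    }
}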
After the XML has been deserialized, the migration from the deprecated field takes place in WorkflowJob#onLoad:
@Override public void onLoad(ItemGroup<? extends Item> parent, String name) throws IOException {
    super.onLoad(parent, name);
    if (buildMixIn == null) {
        buildMixIn = createBuildMixIn();
    }
    buildMixIn.onLoad(parent, name);
    if (triggers != null && !triggers.isEmpty()) {
        setTriggers(triggers.toList());
    }
    if (concurrentBuild != null) {
        setConcurrentBuild(concurrentBuild);
    }
We know that Job DSL writes out the XML with <concurrentBuild>false</concurrentBuild>, so when the job is deserialized, the deprecated concurrentBuild field must be set to false. The onLoad method checks this field, sees that it is not null, and calls WorkflowJob#setConcurrentBuild(false). That call resets the field from false back to null and adds the DisableConcurrentBuildsJobProperty to the job inside a bulk change.

My theory is that while this migration is in progress, another caller concurrently invokes WorkflowJob#isConcurrentBuild. Since the DisableConcurrentBuildsJobProperty has not yet been added to the job, that method returns true, so the queue erroneously releases the blocked build and it starts running concurrently with the first one. Later on, Pipeline ends up in a pathological state and the java.io.FileNotFoundException is thrown.

Next, I will try to prove my theory; a rough sketch of the kind of probe I have in mind is below. I'll post updates in this bug, but I welcome any suggestions.
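The sketch is a JenkinsRule test of my own that repeatedly reconfigures the job from XML containing the deprecated <concurrentBuild> element (which is, as far as I can tell, roughly what Job DSL does when it updates an existing job via updateByXml) while a second thread hammers isConcurrentBuild(). The XML is heavily abbreviated and the window may well be too narrow to catch reliably, so treat this as a starting point rather than a proof:

import java.io.StringReader;
import java.util.concurrent.atomic.AtomicBoolean;
import javax.xml.transform.stream.StreamSource;

import org.jenkinsci.plugins.workflow.job.WorkflowJob;
import org.junit.Rule;
import org.junit.Test;
import org.jvnet.hudson.test.JenkinsRule;

public class ConcurrentBuildRaceProbe {

    @Rule public JenkinsRule r = new JenkinsRule();

    // Heavily abbreviated stand-in for the config.xml that Job DSL generates;
    // only the elements relevant to this theory are included.
    private static final String CONFIG =
            "<flow-definition>"
          + "<properties/>"
          + "<concurrentBuild>false</concurrentBuild>"
          + "</flow-definition>";

    @Test public void probeForTransientlyTrueGetter() throws Exception {
        WorkflowJob p = r.createProject(WorkflowJob.class, "p");
        p.setConcurrentBuild(false); // steady state: property present, getter returns false

        AtomicBoolean sawTrue = new AtomicBoolean();
        AtomicBoolean stop = new AtomicBoolean();
        Thread observer = new Thread(() -> {
            // Hammer the getter, playing the role of the queue/scheduler.
            while (!stop.get()) {
                if (p.isConcurrentBuild()) {
                    sawTrue.set(true);
                }
            }
        });
        observer.start();

        // Repeatedly reconfigure the job from XML containing the deprecated
        // <concurrentBuild> element and see whether the observer ever catches
        // the getter returning true during the onLoad migration.
        for (int i = 0; i < 1000 && !sawTrue.get(); i++) {
            p.updateByXml(new StreamSource(new StringReader(CONFIG)));
        }

        stop.set(true);
        observer.join();
        System.out.println("observed transient isConcurrentBuild() == true: " + sawTrue.get());
    }
}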