[JIRA] (JENKINS-59668) Run wrapper process in the background fails with the latest changes

2 views
Skip to first unread message

VictorMartinezRubio@gmail.com (JIRA)

unread,
Oct 4, 2019, 3:42:03 PM10/4/19
to jenkinsc...@googlegroups.com
Victor Martinez created an issue
 
Jenkins / Bug JENKINS-59668
Run wrapper process in the background fails with the latest changes
Issue Type: Bug Bug
Assignee: Unassigned
Components: durable-task-plugin
Created: 2019-10-04 19:41
Environment: Jenkins 2.197
durable-task 1.30
Priority: Major Major
Reporter: Victor Martinez

Some erratic errors started to happen as a consequence of https://issues.jenkins-ci.org/browse/JENKINS-58290 

 

[2019-09-30T15:00:13.698Z] process apparently never started in /var/lib/jenkins/workspace/ejs_apm-agent-nodejs-mbp_PR-1393@tmp/durable-3a70569b
[2019-09-30T15:00:13.698Z] (running Jenkins temporarily with -Dorg.jenkinsci.plugins.durabletask.BourneShellScript.LAUNCH_DIAGNOSTICS=true might make the problem clearer)
script returned exit code -2  

 

Unfortunately, it seems the error to fail, for some reason, just in one particular PR rather than been affecting the whole CI!

https://github.com/elastic/apm-agent-nodejs/pull/1393

Apparently it happens when running a docker inside of worker.

 

Besides, I'd expect the behavior of using LAUNCH_DIAGNOSTICS as the default one for backward compatibilities rather than the other way around

 

Please let me know if you need further details, my only and big concern is what the heck happens to only being failing in one particular PR of an MBP rather than in all of them... that's really weird.

 

 

Add Comment Add Comment
 
This message was sent by Atlassian Jira (v7.13.6#713006-sha1:cc4451f)
Atlassian logo

VictorMartinezRubio@gmail.com (JIRA)

unread,
Oct 8, 2019, 7:33:03 AM10/8/19
to jenkinsc...@googlegroups.com
Victor Martinez commented on Bug JENKINS-59668
 
Re: Run wrapper process in the background fails with the latest changes

I've just got some interesting stacktrace when running the below snippet

 

      post {
        always {
          sh label: 'Docker ps', script: 'docker ps -a || true'
        }
      } 

 


12:25:19  [Pipeline] sh (Docker ps)
12:30:26  process apparently never started in /var/lib/jenkins/workspace/ejs_apm-agent-nodejs-mbp_PR-1393@tmp/durable-5ddcccd6
12:30:26  (running Jenkins temporarily with -Dorg.jenkinsci.plugins.durabletask.BourneShellScript.LAUNCH_DIAGNOSTICS=true might make the problem clearer)
12:30:26  Error when executing always post condition:
12:30:26  hudson.AbortException: script returned exit code -2
12:30:26  	at org.jenkinsci.plugins.workflow.steps.durable_task.DurableTaskStep$Execution.handleExit(DurableTaskStep.java:658)
12:30:26  	at org.jenkinsci.plugins.workflow.steps.durable_task.DurableTaskStep$Execution.check(DurableTaskStep.java:604)
12:30:26  	at org.jenkinsci.plugins.workflow.steps.durable_task.DurableTaskStep$Execution.run(DurableTaskStep.java:548)
12:30:26  	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
12:30:26  	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
12:30:26  	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
12:30:26  	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
12:30:26  	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
12:30:26  	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
12:30:26  	at java.lang.Thread.run(Thread.java:748)
12:30:26   

VictorMartinezRubio@gmail.com (JIRA)

unread,
Oct 8, 2019, 7:35:01 AM10/8/19
to jenkinsc...@googlegroups.com
Victor Martinez updated an issue
 
Change By: Victor Martinez
Some erratic errors started to happen as a consequence of https://issues.jenkins-ci.org/browse/JENKINS-58290 

 
{code:java}
[2019-09-30T15:00:13.698Z] process apparently never started in /var/lib/jenkins/workspace/ejs_apm-agent-nodejs-mbp_PR-1393@tmp/durable-3a70569b
[2019-09-30T15:00:13.698Z] (running Jenkins temporarily with -Dorg.jenkinsci.plugins.durabletask.BourneShellScript.LAUNCH_DIAGNOSTICS=true might make the problem clearer)
script returned exit code -2  {code}

 

Unfortunately, it seems the error to fail, for some reason, just in one particular PR rather than been affecting the whole CI!

- [https://github.com/elastic/apm-agent-nodejs/pull/1393]

Apparently it happens when running a docker inside of a worker.

 

Besides, I'd expect the behavior of using LAUNCH_DIAGNOSTICS as the default one for backward compatibilities rather than the other way around :(


 

Please let me know if you need further details, my only and big concern is what the heck happens to only being failing in one particular PR of an MBP rather than in all of them... that's really weird.

 

 

VictorMartinezRubio@gmail.com (JIRA)

unread,
Oct 8, 2019, 7:45:02 AM10/8/19
to jenkinsc...@googlegroups.com
 
Re: Run wrapper process in the background fails with the latest changes

Not sure if it helps but in a nutshell the below pseudo-declarative represents the workflow that triggers this particular issue.

pipeline {
   agent linux
   stages {
      stage('foo') {
          steps {
             sh 'docker run .....'
             scripts {
                  node('windows) {
                        bat ....
                  }
             }
          } 
          post {
               clean {
                   sh 'docker ps -a || true'
               }
          }
      }
   } 
} 

VictorMartinezRubio@gmail.com (JIRA)

unread,
Oct 8, 2019, 7:45:03 AM10/8/19
to jenkinsc...@googlegroups.com
Victor Martinez updated an issue
Change By: Victor Martinez
Environment: Jenkins 2.197
durable-task 1.30

Docker version 19.03.1, build 74b1e89e8a

VictorMartinezRubio@gmail.com (JIRA)

unread,
Oct 8, 2019, 11:36:01 AM10/8/19
to jenkinsc...@googlegroups.com
 
Re: Run wrapper process in the background fails with the latest changes

When enabling the flag LAUNCH_DIAGNOSTICS:

16:19:29  Post stage
16:19:29  [Pipeline] sh (Docker ps)
16:19:29  nohup: failed to run command 'sh': No such file or directory
16:24:35  process apparently never started in /var/lib/jenkins/workspace/ejs_apm-agent-nodejs-mbp_PR-1393@tmp/durable-5dd00fe7
16:24:35  Error when executing always post condition:
16:24:35  hudson.AbortException: script returned exit code -2
16:24:35  	at org.jenkinsci.plugins.workflow.steps.durable_task.DurableTaskStep$Execution.handleExit(DurableTaskStep.java:658)
16:24:35  	at org.jenkinsci.plugins.workflow.steps.durable_task.DurableTaskStep$Execution.check(DurableTaskStep.java:604)
16:24:35  	at org.jenkinsci.plugins.workflow.steps.durable_task.DurableTaskStep$Execution.run(DurableTaskStep.java:548)
16:24:35  	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
16:24:35  	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
16:24:35  	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
16:24:35  	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
16:24:35  	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
16:24:35  	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
16:24:35  	at java.lang.Thread.run(Thread.java:748)
16:24:35  
16:24:35  [Pipeline] }
 

 

Apparently the error seems to be related to running a post step within the stage context which uses the top-level agent defined at the very beginning of the declarative pipeline.

VictorMartinezRubio@gmail.com (JIRA)

unread,
Oct 8, 2019, 11:36:03 AM10/8/19
to jenkinsc...@googlegroups.com
Victor Martinez edited a comment on Bug JENKINS-59668
Not sure if it helps but in a nutshell the below pseudo-declarative represents the workflow that triggers this particular issue.


{code:java}
pipeline {
  agent
{ label ' linux ' }
  stages {
     stage('foo') {
         steps {
            sh 'docker run .....'
            scripts {
                 node('windows ' ) {

                       bat ....
                 }
            }
         }
         post {
              clean {
                  sh 'docker ps -a || true'
              }
         }
     }
  }
} {code}

dnusbaum@cloudbees.com (JIRA)

unread,
Oct 11, 2019, 10:22:03 AM10/11/19
to jenkinsc...@googlegroups.com

dnusbaum@cloudbees.com (JIRA)

unread,
Oct 11, 2019, 11:14:03 AM10/11/19
to jenkinsc...@googlegroups.com
Devin Nusbaum commented on Bug JENKINS-59668
 
Re: Run wrapper process in the background fails with the latest changes

IIRC, using LAUNCH_DIAGNOSTICS restores the previous behavior, so if enabling that does not fix the problem, I don't think JENKINS-58290 is the cause. The other change in that release was PR 95, which changes how the default shell is selected. I would check how the default shell is configured on the agents in your examples, since "nohup: failed to run command 'sh': No such file or directory" definitely seems like the problem is related to the shell being used, which I believe previously would have defaulted to /bin/sh, now defaults to sh, so maybe your agents need sh on PATH or something?

jglick@cloudbees.com (JIRA)

unread,
Oct 11, 2019, 3:10:03 PM10/11/19
to jenkinsc...@googlegroups.com

I presume the issue is trying to run sh on Windows and it not existing. BTW the full Pipeline script does not seem to make much sense: you are grabbing a Linux node, then holding an executor lock while also grabbing a Windows node?

VictorMartinezRubio@gmail.com (JIRA)

unread,
Oct 11, 2019, 3:42:02 PM10/11/19
to jenkinsc...@googlegroups.com
Victor Martinez updated an issue
 
Change By: Victor Martinez
Attachment: image-2019-10-11-20-41-24-095.png

VictorMartinezRubio@gmail.com (JIRA)

unread,
Oct 11, 2019, 3:55:02 PM10/11/19
to jenkinsc...@googlegroups.com
Victor Martinez updated an issue
Change By: Victor Martinez
Attachment: image-2019-10-11-20-54-36-634.png

jglick@cloudbees.com (JIRA)

unread,
Oct 11, 2019, 3:59:03 PM10/11/19
to jenkinsc...@googlegroups.com
 
Re: Run wrapper process in the background fails with the latest changes

Sometimes this results from script mistakes like

withEnv(["PATH=${tool 'whatever'}/bin"]) {
  sh 'whatever'
}

when what was meant was

withEnv(["PATH+WHATEVER=${tool 'whatever'}/bin"]) {
  sh 'whatever'
}

VictorMartinezRubio@gmail.com (JIRA)

unread,
Oct 11, 2019, 4:49:02 PM10/11/19
to jenkinsc...@googlegroups.com
Victor Martinez updated an issue
Change By: Victor Martinez
Attachment: image-2019-10-11-21-48-35-471.png

VictorMartinezRubio@gmail.com (JIRA)

unread,
Oct 11, 2019, 5:13:02 PM10/11/19
to jenkinsc...@googlegroups.com
Victor Martinez updated an issue
Change By: Victor Martinez
Attachment: image-2019-10-11-22-12-00-367.png

VictorMartinezRubio@gmail.com (JIRA)

unread,
Oct 11, 2019, 5:25:03 PM10/11/19
to jenkinsc...@googlegroups.com
Victor Martinez commented on Bug JENKINS-59668
 
Re: Run wrapper process in the background fails with the latest changes

Much appreciated for the debugging and answers. Although I've found the current pipeline might be a bit too complicated, but that's likely another story.

 

I was able to narrow down the issue a bit more based on your feedback, for some reason the PATH environment variable get corrupted, so the below snippet produces

 

def forLinux() (
  return {
    node('linux'){ 
      try {
        sh label: 'Pre-Environment', script: 'env | sort'   // only for debugging purposes
        deleteDir()
        unstash 'source'
        retry(2){
          sleep  23
          sh(label: "Run Tests"...
        }
        sh label: 'Post-Environment', script: 'env | sort'  // only for debugging purposes
      } catch(e){ 
        error(e.toString())
      } finally {
        sh label: 'Environment', script: 'env | sort'  // only for debugging purposes
        ...
      }
    }
  }
}

 

 

 

I just managed to add the withEnv to ensure the PATH is defined, although not sure how it gets corrupted though, as that particular behaviour does not happen when the above snippet in a parallel step without any other references but calls to the forLinux

 

For instance, the below snippet shows how the parallel is configured for the issue with the nohup and a similar one which does not fail with the nohup.

 

stage('FailedWithNoHup') {
  environment {
    HOME = "${env.WORKSPACE}"
  }
  steps {
    def parallelTasks = [:]
    parallelTasks["Linux-1"] = forLinux(version: 1) 
    parallelTasks["Linux-2"] = forLinux(version: 2)
    parallelTasks["Windows-1"] = forWindows(version: 1)
  }
}
stage('Works') {
  environment {
    HOME = "${env.WORKSPACE}"
  }
  steps {
    def parallelTasks = [:]
    parallelTasks["Linux-1"] = forLinux(version: 1) 
    parallelTasks["Linux-2"] = forLinux(version: 2)
    parallel(parallelTasks)
  }
}
def forLinux(Map params = [:]){
  def version = params?.version
  return {
    node('linux'){ 
      try {
        sh label: 'Pre-Environment', script: 'env | sort'   // only for debugging purposes
        deleteDir()
        unstash 'source'
        retry(2){
          sleep  23
          sh(label: "Run Tests"...
        }
        sh label: 'Post-Environment', script: 'env | sort'  // only for debugging purposes
      } catch(e){ 
        error(e.toString())
      } finally {
        sh label: 'Environment', script: 'env | sort'  // only for debugging purposes
        ...
      }
    }
  }
}
def forWindows(Map params = [:]){
  return {
    node('windows'){
      ...
    }
  }
}


 

 

jglick@cloudbees.com (JIRA)

unread,
Oct 11, 2019, 5:46:02 PM10/11/19
to jenkinsc...@googlegroups.com

Using

sh 'env | sort'

is obviously not going to help diagnosis if the problem is a broken sh step! You can try

echo "PATH set to $PATH"

which is not quite as convincing but may pinpoint your issue: either something in your build or Jenkins node configuration corrupting $PATH, or an accidental usage of sh inside a Windows node (without a properly configured Cygwin environment or whatever).

Can probably be closed as this looks like a user error, not related to recent changes except to the extent that for technical reasons we had to turn off LAUNCH_DIAGNOSTICS by default to avoid wasting system resources in the normal case that $PATH is OK. Maybe it could be automatically enabled for the rest of the JVM session upon encountering one of these errors, or something like that.

VictorMartinezRubio@gmail.com (JIRA)

unread,
Oct 15, 2019, 11:22:05 AM10/15/19
to jenkinsc...@googlegroups.com
Victor Martinez closed an issue as Not A Defect
 
Change By: Victor Martinez
Status: Open Closed
Resolution: Not A Defect

VictorMartinezRubio@gmail.com (JIRA)

unread,
Oct 15, 2019, 11:22:05 AM10/15/19
to jenkinsc...@googlegroups.com
Victor Martinez commented on Bug JENKINS-59668
 
Re: Run wrapper process in the background fails with the latest changes

Thanks again for the help, I finally managed to find the issue related to the $PATH, which it was related to a manipulation of the env map, the env seems to be a global map therefore when running a parallel step using different OS its behavior might be unpredictable.

 

Thanks again and thanks for caring about LAUNCH_DIAGNOSTICS to be the default approach. 

 

I'll close this ticket now if you don't mind

Reply all
Reply to author
Forward
0 new messages