All Jenkins Pipeline jobs suddenly failed due to script security

danb...@gmail.com

Aug 10, 2016, 1:06:06 PM
to Jenkins Users
I'm going to provide a simplified description of a problem that caused all the jobs on a Jenkins instance to fail. Let me know where more detail would help uncover the root cause.

We have a Jenkins instance with the Pipeline plugin installed. There are a dozen jobs, all scheduled to run at different times overnight. Every job has a single Pipeline script which constructs an object and calls a single method on that object. Except for the parameters, the scripts are identical across all jobs. We wrote the class ourselves, and it performs fairly complex operations under the hood. The jobs were created and configured by administrators; non-administrators have never touched any job configuration. The jobs have run successfully for many weeks. Until today, we had not had to explicitly manage the Script Security plugin, presumably because script security had been behaving as described under "Script Approval" on the Script Security plugin information page. We do not use Groovy sandboxing.

Last night, all of the jobs failed with the following console output:
Started by timer
org.jenkinsci.plugins.scriptsecurity.scripts.UnapprovedUsageException: script not yet approved for use
        at org.jenkinsci.plugins.scriptsecurity.scripts.ScriptApproval.using(ScriptApproval.java:459)
        at org.jenkinsci.plugins.workflow.cps.CpsFlowDefinition.create(CpsFlowDefinition.java:105)
        at org.jenkinsci.plugins.workflow.cps.CpsFlowDefinition.create(CpsFlowDefinition.java:58)
        at org.jenkinsci.plugins.workflow.job.WorkflowRun.run(WorkflowRun.java:206)
        at hudson.model.ResourceController.execute(ResourceController.java:98)
        at hudson.model.Executor.run(Executor.java:410)
Finished: FAILURE

No operations were performed yesterday which we suspect could have caused this error (no job configuration changes, no plugin upgrades, no hitting the clear script authorization button). When the jobs were run manually by an admin, they continued to fail with the same error. We were able to get a job to successfully complete again by saving its configuration without making changes. Saving the configuration of a single job did not fix the error for the rest of the jobs. We were able to get jobs running again by re-saving the configuration for each job individually. Up to this point, nothing appeared in the Script Authorization queue. Once half of the jobs were fixed, we restarted the Jenkins service. Half of the jobs still failed when run, but now the Script Authorization queue was populated with the Pipeline scripts for those broken jobs. Approving the scripts from the queue fixed the remaining jobs.
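A minimal Python sketch of why re-saving one job did not fix the others. It assumes Script Security approves whole scripts by a SHA-1 digest of the language name, a colon, and the full script text (our understanding of the 1.x plugin, not verified against v1.20), and that the per-job parameters are literals inside each script. The job scripts and the `script_hash` helper are hypothetical.

```python
import hashlib


def script_hash(script: str, language: str = "groovy") -> str:
    # Assumption: the approval key is SHA-1 over "<language>:<script text>".
    digest = hashlib.sha1()
    digest.update(language.encode("utf-8"))
    digest.update(b":")
    digest.update(script.encode("utf-8"))
    return digest.hexdigest()


# Two hypothetical job scripts that differ only in one parameter value.
job_a = "new Nightly(env: 'staging').run()"
job_b = "new Nightly(env: 'prod').run()"

# Suppose only job A was re-saved by an admin, approving its exact text.
approved = {script_hash(job_a)}

print(script_hash(job_a) in approved)  # True
print(script_hash(job_b) in approved)  # False: different text, different hash
```

If this assumption holds, changing a single character yields a new hash, so a dozen near-identical scripts need a dozen separate approvals.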

My best guess about what happened is (unexpected behaviors in bold):
  1. The script authorization white list was correctly populated as jobs were added and configured with Pipeline scripts by Jenkins admins
  2. The jobs ran successfully for some time
  3. An Unknown Event caused the script authorization white list to be cleared
  4. The jobs started failing with the UnapprovedUsageException error
  5. Some failure caused the script authorization queue to not be populated with the failed scripts
  6. Having an admin re-save the configuration for a job successfully re-authorized the Pipeline script for that job
  7. Restarting the Jenkins service fixed the error with the script authorization queue
  8. Approving from the script authorization queue worked as expected
Assuming my guess is more or less correct, I am most interested in diagnosing the Unknown Event so that we can take steps to prevent it from happening in the future. If that is not possible, steps to prevent the Script Security plugin from ever blocking jobs would also be appreciated. In our case, job stability is much more valuable than the security benefits provided by the plugin. I am also interested in the strange behavior of the script authorization queue, but that is not critical.
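A toy Python model of the queue behavior in the first guess (steps 1 to 5, 7 and 8). It is hypothetical, not the plugin's code, and rests on assumptions from our reading of Script Security 1.x: a build only checks the whitelist and never adds to the approval queue; the queue is filled only when a script is configured, which we assume also happens when jobs are reloaded at startup; and a reload runs without an admin identity, so it queues rather than auto-approves. The method names `configuring` and `using` are borrowed from `ScriptApproval` (`using` appears in the stack trace above); everything else is illustrative.

```python
class UnapprovedUsageError(Exception):
    """Stand-in for the plugin's UnapprovedUsageException."""


class ApprovalModel:
    """Toy model of whole-script approval (hypothetical, not plugin code)."""

    def __init__(self) -> None:
        self.approved = set()  # the approval whitelist
        self.pending = set()   # what the approval queue would show

    def configuring(self, script: str, auto_approve: bool) -> None:
        # Called when a job is saved or (by assumption) reloaded at startup.
        # auto_approve=True models an admin saving the job; False models a
        # reload that, by assumption, runs without an admin identity.
        if script in self.approved:
            return
        if auto_approve:
            self.approved.add(script)
            self.pending.discard(script)
        else:
            self.pending.add(script)

    def using(self, script: str) -> str:
        # A build only checks the whitelist; it never adds to the queue.
        if script not in self.approved:
            raise UnapprovedUsageError(script)
        return script

    def approve(self, script: str) -> None:
        # Models clicking "Approve" on a queued script.
        self.pending.discard(script)
        self.approved.add(script)


model = ApprovalModel()
script = "new Nightly(env: 'prod').run()"  # hypothetical job script

model.configuring(script, auto_approve=True)  # step 1: admin configures job
model.using(script)                           # step 2: builds succeed
model.approved.clear()                        # step 3: the Unknown Event
try:
    model.using(script)                       # step 4: build fails
except UnapprovedUsageError:
    print("build failed; queue size:", len(model.pending))  # step 5: 0
model.configuring(script, auto_approve=False)  # step 7: reload at restart
print("queue size after reload:", len(model.pending))       # 1
model.approve(script)                          # step 8: approve from queue
model.using(script)                            # builds succeed again
```

Under these assumptions, steps 5 and 7 are not failures at all: the queue stays empty until something re-configures the script, and a restart does exactly that.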

I think it is less plausible, but I'll also suggest another guess which would explain this behavior (unexpected behaviors in bold):
  1. The Script Security plugin had never been running correctly, so its white list was not populated
  2. The jobs ran successfully because the Script Security plugin was broken and did not stop them
  3. An Unknown Event caused the Script Security plugin to suddenly start working
  4. The jobs started failing because their Pipeline scripts were not yet on the white list
  5. Some failure caused the script authorization queue to not be populated with the failed scripts
  6. Having an admin re-save the configuration for a job successfully authorized the Pipeline script for that job for the first time
  7. Restarting the Jenkins service fixed the error with the script authorization queue
  8. Approving from the script authorization queue worked as expected

I would appreciate any help in figuring out what happened and how to prevent recurrences. Please let me know if there is anything I can provide or do to make this easier. Thank you,

Daniel Koverman


danb...@gmail.com

Aug 10, 2016, 4:45:19 PM
to Jenkins Users
I forgot to mention I'm running Jenkins v2.12, Pipeline v2.2, and Script Security Plugin v1.20.