We had just set up a new deployment, and after a few apps deployed successfully, Jenkins suddenly could no longer spin up worker pods. We kept getting an error saying the agents were offline.
I dumped the Jenkins logs, and many of the entries near the end reported a service account issue:
Nov 22, 2016 3:44:02 AM org.csanchez.jenkins.plugins.kubernetes.KubernetesCloud$ProvisioningCallback call
SEVERE: Error in provisioning; slave=KubernetesSlave name: kubernetes-45a3293ad7d84770a6ace9aa571584ab-7acb7b6b299, template=org.csanchez.jenkins.plugins.kubernetes.PodTemplate@b1d042f
io.fabric8.kubernetes.client.KubernetesClientException: Failure executing: POST at: https://kubernetes.default/api/v1/namespaces/default/pods. Message: Forbidden!Configured service account doesn't have access. Service account may have been revoked..
at io.fabric8.kubernetes.client.dsl.base.OperationSupport.requestFailure(OperationSupport.java:310)
at io.fabric8.kubernetes.client.dsl.base.OperationSupport.assertResponseCode(OperationSupport.java:261)
at io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleResponse(OperationSupport.java:232)
at io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleCreate(OperationSupport.java:207)
at io.fabric8.kubernetes.client.dsl.base.BaseOperation.handleCreate(BaseOperation.java:547)
at io.fabric8.kubernetes.client.dsl.base.BaseOperation.create(BaseOperation.java:243)
at org.csanchez.jenkins.plugins.kubernetes.KubernetesCloud$ProvisioningCallback.call(KubernetesCloud.java:573)
at org.csanchez.jenkins.plugins.kubernetes.KubernetesCloud$ProvisioningCallback.call(KubernetesCloud.java:553)
at jenkins.util.ContextResettingExecutorService$2.call(ContextResettingExecutorService.java:46)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
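Because two Jenkins instances share the cluster, it may help to confirm which endpoint and namespace the plugin was actually calling when it failed. Below is a minimal sketch, assuming the log line keeps the `Failure executing: METHOD at: URL. Message: ...` shape shown above. The function name `parse_failure` is hypothetical and not part of Jenkins or the fabric8 client.

```python
import re
from urllib.parse import urlparse

# Matches the fabric8 "Failure executing" line as it appears in the log above.
# The lazy URL group stops at the first ". Message:" separator.
FAILURE_RE = re.compile(
    r"Failure executing: (?P<method>[A-Z]+) at: (?P<url>\S+?)\. Message: (?P<message>.*)"
)


def parse_failure(line):
    """Extract method, URL, namespace, resource and message from one log line.

    Returns None if the line is not a fabric8 "Failure executing" entry.
    """
    m = FAILURE_RE.search(line)
    if m is None:
        return None
    # e.g. "/api/v1/namespaces/default/pods" -> ["api", "v1", "namespaces", "default", "pods"]
    parts = urlparse(m.group("url")).path.strip("/").split("/")
    namespace = resource = None
    if "namespaces" in parts:
        i = parts.index("namespaces")
        if i + 1 < len(parts):
            namespace = parts[i + 1]
        if i + 2 < len(parts):
            resource = parts[i + 2]
    return {
        "method": m.group("method"),
        "url": m.group("url"),
        "namespace": namespace,
        "resource": resource,
        "message": m.group("message"),
    }
```

Applied to the `KubernetesClientException` line above, this reports a `POST` to `pods` in the `default` namespace, which can then be compared against the namespace each Jenkins instance is expected to use.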
I checked, and the jenkins service account was still there, as were its secrets containing the certificates.
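The service account and secret objects existing does not by itself show that the token Jenkins presents belongs to them. A quick check is to decode the token mounted in the Jenkins pod and see which service account, namespace and secret it names. Below is a minimal sketch, assuming a legacy secret-based service account token mounted at the conventional path `/var/run/secrets/kubernetes.io/serviceaccount/token`. It only decodes the JWT payload and does not verify the signature, so it cannot tell you whether the API server still accepts the token. The helper names `decode_jwt_payload` and `token_identity` are hypothetical.

```python
import base64
import json

# Conventional in-pod mount path for the service account token
# (assumption: the Jenkins pod uses the standard token automount).
TOKEN_PATH = "/var/run/secrets/kubernetes.io/serviceaccount/token"


def decode_jwt_payload(token):
    """Decode the payload (middle) segment of a JWT WITHOUT verifying its signature."""
    payload = token.strip().split(".")[1]
    payload += "=" * (-len(payload) % 4)  # restore the base64url padding JWTs strip
    return json.loads(base64.urlsafe_b64decode(payload))


def token_identity(token):
    """Return (namespace, service account name, secret name) from a legacy SA token.

    Claims that are absent (e.g. in newer bound tokens) come back as None.
    """
    claims = decode_jwt_payload(token)
    return (
        claims.get("kubernetes.io/serviceaccount/namespace"),
        claims.get("kubernetes.io/serviceaccount/service-account.name"),
        claims.get("kubernetes.io/serviceaccount/secret.name"),
    )
```

From inside the Jenkins pod, something like `token_identity(open(TOKEN_PATH).read())` would show whether the mounted token still refers to the secret that currently exists. If the secret had been deleted and recreated, the old token would name a secret that is no longer valid.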
I have two namespaces, each with its own Jenkins deployment. The problem affected both, and neither could initialize its agents.
I could not figure out how to recover from this, so I redid the deployment with a fresh install, which resolved the issue. I have not been able to reproduce it since.
Any idea what may have caused this, or why Jenkins would report that the service account may have been revoked? And how might I recover from this in the future?