Hi folks,
My team has been working on an issue on and off since July 23rd.
I think we may have hit the jackpot: we seem to have reproduced
the issue that first affected us on July 23rd. Here’s what happened:
- Once the copy of the Prod Jenkins Home finished, I started Jenkins in quiet mode (I didn’t want a scheduled prod deployment to run in stage by mistake). Jenkins started without issues.
- Then I disabled all the jobs (again, to prevent a job from running by mistake once I took Jenkins out of quiet mode).
- Then, since we were running stage with production’s config, the
stage controller actually connected to the prod AWS account to create
the agents there. Oops.
- Since having stage create its agents in the wrong AWS account is
not ideal, I ran my Ansible configuration playbook in stage. Three restarts
later, Jenkins hadn’t crashed on any of them. Stage configuration was
successful!
- From the UI, I disabled quiet mode, but I noticed the builds were not starting. The log showed:
2021-09-07 20:19:11.628+0000 [id=29] SEVERE hudson.triggers.SafeTimerTask#run: Timer task hudson.model.Queue$MaintainTask@7a94f7bb failed
java.lang.IllegalStateException: The class jenkins.security.QueueItemAuthenticatorConfiguration was not found, potentially not yet loaded
    at hudson.ExtensionList.getInstance(ExtensionList.java:166)
    at jenkins.security.QueueItemAuthenticatorConfiguration.get(QueueItemAuthenticatorConfiguration.java:61)
    at jenkins.security.QueueItemAuthenticatorConfiguration$ProviderImpl.getAuthenticators(QueueItemAuthenticatorConfiguration.java:70)
    at jenkins.security.QueueItemAuthenticatorProvider$IteratorImpl.hasNext(QueueItemAuthenticatorProvider.java:44)
    at hudson.model.Queue$Item.authenticate(Queue.java:2331)
    at hudson.model.Node.canTake(Node.java:401)
    at hudson.model.Queue.makeFlyWeightTaskBuildable(Queue.java:1736)
    at hudson.model.Queue.makeBuildable(Queue.java:1698)
    at hudson.model.Queue.maintain(Queue.java:1546)
    at hudson.model.Queue$MaintainTask.doRun(Queue.java:2902)
    at hudson.triggers.SafeTimerTask.run(SafeTimerTask.java:91)
    at jenkins.security.ImpersonatingScheduledExecutorService$1.run(ImpersonatingScheduledExecutorService.java:58)
    at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
    at java.base/java.util.concurrent.FutureTask.runAndReset(FutureTask.java:305)
    at java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:305)
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
    at java.base/java.lang.Thread.run(Thread.java:829)
- So I restarted Jenkins one more time (again with the same
configuration my playbook had left in place at the previous restart, no changes),
and this time it suddenly failed with:
java.lang.IllegalStateException: Expected 1 instance of jenkins.security.s2m.AdminWhitelistRule but got 0
    at hudson.ExtensionList.lookupSingleton(ExtensionList.java:451)
    at io.jenkins.plugins.casc.core.AdminWhitelistRuleConfigurator.instance(AdminWhitelistRuleConfigurator.java:59)
    at io.jenkins.plugins.casc.core.AdminWhitelistRuleConfigurator.instance(AdminWhitelistRuleConfigurator.java:42)
    at io.jenkins.plugins.casc.BaseConfigurator.check(BaseConfigurator.java:286)
    at io.jenkins.plugins.casc.BaseConfigurator.configure(BaseConfigurator.java:351)
    at io.jenkins.plugins.casc.BaseConfigurator.check(BaseConfigurator.java:287)
    at io.jenkins.plugins.casc.ConfigurationAsCode.lambda$checkWith$8(ConfigurationAsCode.java:777)
    at io.jenkins.plugins.casc.ConfigurationAsCode.invokeWith(ConfigurationAsCode.java:713)
    at io.jenkins.plugins.casc.ConfigurationAsCode.checkWith(ConfigurationAsCode.java:777)
    at io.jenkins.plugins.casc.ConfigurationAsCode.configureWith(ConfigurationAsCode.java:762)
    at io.jenkins.plugins.casc.ConfigurationAsCode.configureWith(ConfigurationAsCode.java:638)
    at io.jenkins.plugins.casc.ConfigurationAsCode.configure(ConfigurationAsCode.java:307)
    at io.jenkins.plugins.casc.ConfigurationAsCode.init(ConfigurationAsCode.java:299)
This error has shown up before. Usually another restart
fixes it, but I’ve now restarted Jenkins about 4 times and it
still shows the same error. I’m hoping this will let us investigate
what’s going on a bit more.
I have the GC logs, the regular logs, thread dumps, and an SOS report from stage. The latest PID is 2058587, so the latest GC log is the file gc-2058587-2021-09-07_16-11-45.log.
Some of those would need to be sanitized before I can share, but let me know if any of that would be useful.
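For reference, here is a rough Python sketch of the kind of sanitizing I have in mind before sharing anything. It assumes the sensitive values are mostly IPv4 addresses and 12-digit AWS account IDs; that is only an assumption, and the real logs may well contain other things (hostnames, usernames, tokens) that need their own patterns. The names sanitize_line and sanitize_file are just illustrative.

```python
import re

# Assumed patterns only: real logs may need more (hostnames, usernames, tokens).
IPV4 = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")
AWS_ACCOUNT_ID = re.compile(r"\b\d{12}\b")


def sanitize_line(line: str) -> str:
    """Replace anything that looks like an IPv4 address or AWS account ID."""
    line = IPV4.sub("<ip>", line)
    line = AWS_ACCOUNT_ID.sub("<aws-account>", line)
    return line


def sanitize_file(src: str, dst: str) -> None:
    """Write a redacted copy of src to dst, one line at a time."""
    with open(src, encoding="utf-8", errors="replace") as fin, \
            open(dst, "w", encoding="utf-8") as fout:
        for line in fin:
            fout.write(sanitize_line(line))
```

The IPv4 pattern is deliberately loose, so it will also redact four-part version strings such as 1.2.3.4. Stack trace lines like the ones above pass through unchanged.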
First and foremost, is there a fix for this? Secondly, is this a known bug?
Best Regards,
Doug Whitfield