[JIRA] (JENKINS-61687) Exceptions due to race condition during build deletion

7 views
Skip to first unread message

dbeck@cloudbees.com (JIRA)

unread,
Apr 18, 2020, 3:43:03 PM4/18/20
to jenkinsc...@googlegroups.com
Daniel Beck updated an issue
 
Jenkins / Bug JENKINS-61687
Exceptions due to race condition during build deletion
Change By: Daniel Beck
Summary: Exceptions due to race condition during build deeltion deletion
Add Comment Add Comment
 
This message was sent by Atlassian Jira (v7.13.12#713012-sha1:6e07c38)
Atlassian logo

Kalle.Niemitalo@procomp.fi (JIRA)

unread,
May 6, 2020, 4:40:08 AM5/6/20
to jenkinsc...@googlegroups.com
Kalle Niemitalo commented on Bug JENKINS-61687
 
Re: Exceptions due to race condition during build deletion

The problem occurs when many builds finish around the same time and is unrelated to the global build discarder.

No, we got "An exception occurred when executing Project Build Discarder" from three consecutive builds even though there were several minutes of idle time between them.

Build Start Get node Finish Build Discarder
JobA #89 08:38:24 08:38:24 08:44:29 08:44:29 NoSuchFileException deleting JobA #39
JobB #579 08:49:18 08:49:19 08:53:32 08:53:53 NoSuchFileException deleting JobB #529
JobA #90 10:18:40 10:18:42 10:24:18 10:24:18 NoSuchFileException deleting JobA #40
JobB #580 10:19:50 10:24:17 10:28:53 No error

Periodic background build discarder ran hourly at **:26:01 or **:26:02, taking less than a second each time.

(I hypothesize that each build runs Project Build Discarder twice in parallel and those race against each other, and that build JobB #580 avoided the error because Periodic background build discarder had already discarded the build and both instances of Project Build Discarder then noticed that.)

The call stack is a bit different from what is in the issue description, because this is on Windows:

toukokuuta 06, 2020 10:24:18 AP. WARNING jenkins.model.BackgroundGlobalBuildDiscarder lambda$processJob$0

An exception occurred when executing Project Build Discarder
Also:   java.nio.file.NoSuchFileException: REDACTED\branches\master\builds\40 -> REDACTED\branches\master\builds\.40
		at sun.nio.fs.WindowsException.translateToIOException(WindowsException.java:79)
		at sun.nio.fs.WindowsException.rethrowAsIOException(WindowsException.java:97)
		at sun.nio.fs.WindowsFileCopy.move(WindowsFileCopy.java:301)
		at sun.nio.fs.WindowsFileSystemProvider.move(WindowsFileSystemProvider.java:287)
		at java.nio.file.Files.move(Files.java:1395)
		at hudson.model.Run.delete(Run.java:1621)
		at hudson.tasks.LogRotator.perform(LogRotator.java:166)
jenkins.util.io.CompositeIOException: Failed to rotate logs for [REDACTED/master #40]
	at hudson.tasks.LogRotator.perform(LogRotator.java:223)
	at hudson.model.Job.logRotate(Job.java:469)
	at jenkins.model.JobGlobalBuildDiscarderStrategy.apply(JobGlobalBuildDiscarderStrategy.java:54)
	at jenkins.model.BackgroundGlobalBuildDiscarder.lambda$processJob$0(BackgroundGlobalBuildDiscarder.java:67)
	at jenkins.model.BackgroundGlobalBuildDiscarder$$Lambda$230/0000000000000000.accept(Unknown Source)
	at java.lang.Iterable.forEach(Iterable.java:75)
	at jenkins.model.BackgroundGlobalBuildDiscarder.processJob(BackgroundGlobalBuildDiscarder.java:61)
	at jenkins.model.GlobalBuildDiscarderListener.onFinalized(GlobalBuildDiscarderListener.java:49)
	at hudson.model.listeners.RunListener.fireFinalized(RunListener.java:255)
	at hudson.model.Run.onEndBuilding(Run.java:2018)
	at org.jenkinsci.plugins.workflow.job.WorkflowRun.finish(WorkflowRun.java:625)
	at org.jenkinsci.plugins.workflow.job.WorkflowRun.access$800(WorkflowRun.java:137)
	at org.jenkinsci.plugins.workflow.job.WorkflowRun$GraphL.onNewHead(WorkflowRun.java:1026)
	at org.jenkinsci.plugins.workflow.cps.CpsFlowExecution.notifyListeners(CpsFlowExecution.java:1463)
	at org.jenkinsci.plugins.workflow.cps.CpsThreadGroup$3.run(CpsThreadGroup.java:488)
	at org.jenkinsci.plugins.workflow.cps.CpsVmExecutorService$1.run(CpsVmExecutorService.java:38)
	at hudson.remoting.SingleLaneExecutorService$1.run(SingleLaneExecutorService.java:131)
	at jenkins.util.ContextResettingExecutorService$1.run(ContextResettingExecutorService.java:28)
	at jenkins.security.ImpersonatingExecutorService$1.run(ImpersonatingExecutorService.java:59)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:821)

dbeck@cloudbees.com (JIRA)

unread,
May 6, 2020, 5:36:02 AM5/6/20
to jenkinsc...@googlegroups.com

(I hypothesize that each build runs Project Build Discarder twice in parallel and those race against each other

I don't remember how far along I was in the investigation when I posted the comment, but IIRC this is correct.

Specifically, the "background" build discarders are also run after a build finishes, in addition to the regular build discarder. The idea being, why should it only run periodically, when it could as well run when a build is finished, and not accumulate a lot of builds for its periodic run?

https://github.com/jenkinsci/jenkins/blob/da90af311587f6c3d37ec4e9c4b4637763924743/core/src/main/java/jenkins/model/GlobalBuildDiscarderListener.java#L46-L50

Now, if you configure the "job specific build discarder" as part of global/background build discarders (the default), there'll be two build discarder instantiations running after a build is finished, both configured to delete the exact same builds. So this configuration basically creates the circumstances under which the (AFAICT) long-standing concurrency bug occurs. (Strictly speaking, any global build discarder could trigger this, but if the configurations are different, it's likely that only one build discarder will ever find builds to delete.)

A workaround for this specific case only could be to skip the "project specific build discarder" when we're triggered by a build finishing (i.e. only run it when triggered by the periodic run), but that seems like a hack, but doesn't consider fixed global build discarders that match the job configuration exactly.

A more reasonable workaround, don't run multiple build discarders on a job in parallel. There's currently no synchronization there.


Ideally however, we'd fix the concurrency issues in build deletion, then neither builds in quick succession, nor background build discarders will be a problem.

Kalle.Niemitalo@procomp.fi (JIRA)

unread,
May 6, 2020, 6:33:04 AM5/6/20
to jenkinsc...@googlegroups.com

Would it be reasonable to specifically catch and swallow java.nio.file.NoSuchFileException from the Files.move call? Or would that just surface more concurrency problems in other parts of hudson.model.Run.delete?

Reply all
Reply to author
Forward
0 new messages