improving WorkspaceCleanupThread in a multi-branch world

Michael Neale

May 11, 2016, 7:38:03 AM
to Jenkins Developers
One thing I have come across with using multibranch pipeline (with GitHub, BTW) is that it generates a LOT of noise in an agent's workspace.

The WorkspaceCleanupThread runs periodically to clean things up, but will only remove workspaces on an agent (slave) machine that haven't been used in a month. Anything younger is kept by default.

In a multibranch world, you have jobs, and thus disposable workspaces, popping into and out of existence like subatomic particles in the quantum foam of outer space (getting carried away with the metaphor here), and this can end up swamping the agent.

Let's say you are lucky enough to have a productive team: they contribute 5 pull requests a day that get merged at the rate of 1 a day (so, 5 branches pop into and out of existence each day). The master works great, but on the slave that means 5 new workspaces created each day.

Say each workspace is about 1 GB of "stuff" (it can happen, not hard with artifact caching, maven, npm). That is 5 GB a day accumulated, or 100 GB of noise over a month of about 20 working days (150 GB over 30 calendar days), before any of it is eligible to be cleaned up.
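A rough sketch of that arithmetic, for anyone who wants to plug in their own numbers. The function name is made up for illustration, and the 20-working-day month is an assumption on my part; nothing in Jenkins computes this.

```python
def stale_workspace_gb(branches_per_day, gb_per_workspace, days_before_cleanup):
    """Disk held by workspaces that are not yet old enough to be cleaned up."""
    return branches_per_day * gb_per_workspace * days_before_cleanup


# 5 short-lived branches a day, about 1 GB per workspace
per_working_month = stale_workspace_gb(5, 1, 20)   # assuming 20 working days
per_calendar_month = stale_workspace_gb(5, 1, 30)  # assuming 30 calendar days
print(per_working_month, per_calendar_month)
```

The figure scales linearly with each input, so a few teams sharing one agent multiplies it directly.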

Multiply this by a few teams, and you can end up using up a lot of disk for what amounts to the steady state of just a few jobs (remember, as branches are removed, Jenkins removes them from the master, but the workspaces on any agents have to be garbage collected). Disk space or inodes could be exhausted before the garbage collector could do its thing, and in any case it is fairly inefficient. 

https://github.com/jenkinsci/jenkins/blob/master/core/src/main/java/hudson/model/WorkspaceCleanupThread.java#L114  - is the code that simply checks for age. Are there better heuristics we could apply in this world? (check for disk space, reduce age before deletion if it is low, inodes if you have a unix filesystem). Checking the ratio of "older" workspaces to current ones could also mean reducing the age at which things could be deleted. 
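To make the "reduce age before deletion if disk is low" idea concrete, here is a minimal sketch. It is not what WorkspaceCleanupThread does today: the function names, the 10%/30% thresholds and the linear scaling are all assumptions picked for illustration.

```python
import shutil

DEFAULT_RETENTION_DAYS = 30  # roughly the "one month" default described above


def retention_days(free_fraction, low_water=0.10, high_water=0.30, min_days=1):
    """Shrink the retention period linearly as free disk space drops.

    The thresholds are hypothetical tuning knobs, not Jenkins settings.
    """
    if free_fraction >= high_water:
        return DEFAULT_RETENTION_DAYS
    if free_fraction <= low_water:
        return min_days
    scale = (free_fraction - low_water) / (high_water - low_water)
    return max(min_days, round(min_days + scale * (DEFAULT_RETENTION_DAYS - min_days)))


def free_fraction(path):
    """Fraction of the filesystem holding `path` that is still free."""
    usage = shutil.disk_usage(path)
    return usage.free / usage.total
```

With plenty of free space the usual month applies; as the disk fills up, workspaces become eligible for deletion sooner. An inode check on a Unix filesystem could feed the same function in place of the byte-based fraction.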

Is it worth exploring smarter heuristics, or is the real solution to have an alternative workspace cleaner that multibranch plugins include, one that cleans up any workspaces when a branch (job) is removed? (Not sure how easy this would be; it may have to be best effort, and supplement the workspace cleanup thread.)

Thoughts? 

Daniel Beck

May 11, 2016, 8:05:28 AM
to jenkin...@googlegroups.com

> On 11.05.2016, at 13:38, Michael Neale <mne...@cloudbees.com> wrote:
>
> Is it worth exploring smarter heuristics, or is the real solution to have an alternative workspace cleaner that multibranch plugins include, that cleans up any workspaces on the event of a branch (job) removal? (not sure how easy this would be, may have to be best effort, and supplement the workspace cleanup thread).
>

WorkspaceCleanupThread should be moved into a plugin, and from there it can be grown, be made configurable and/or get siblings. There are probably already issues in Jira related to all that.

Michael Neale

May 11, 2016, 8:20:13 AM
to Jenkins Developers, m...@beckweb.net


On Wednesday, May 11, 2016 at 10:05:28 PM UTC+10, Daniel Beck wrote:
> WorkspaceCleanupThread should be moved into a plugin, and from there it can be grown, be made configurable and/or get siblings. There are probably already issues in Jira related to all that.


For things like multibranch, do you think it is better they explicitly remove unneeded workspaces on the agents, vs tuning the cleanup thread? (given they know when it can safely be removed) - or they can provide their own supplemental cleanup. 

Oliver Gondža

May 11, 2016, 8:30:08 AM
to jenkin...@googlegroups.com
I do not think it is as easy as that. There are several fundamental
problems that prevent us from implementing what Michael suggests.

- There is no easy way to map a workspace directory to the Job(s) that used
it, much less to the build. (Unless we want to traverse all builds).
- There is no guarantee that $REMOTE_FS_ROOT/workspace/XXX is actually a
Jenkins build workspace. I remember people reporting they store some
other stuff there so we can not purge what we can not prove is a Jenkins
build's workspace (without breaking their experience).

There are a lot of heuristics we could implement if we address those:

- If all jobs that used workspace directory in the past are deleted the
WSs can go immediately. (Current implementation can not prove the
directory belongs to a job so it leaves it there IIRC)
- If the system is running out of space, the oldest workspaces that do not
host a running build can go.
- If we claim that everything in $REMOTE_FS_ROOT/workspace/ belongs to
Jenkins we can remove any clutter created by other tools or jobs
accessing $WORKSPACE/.. accidentally (as people usually put it).
- ...
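The "running out of space" heuristic from the list above could look something like the sketch below. It is a toy model under stated assumptions: the `Workspace` record and `pick_for_deletion` are hypothetical names, and how Jenkins would actually learn the size, last-use time and busy state of a directory is exactly the open problem.

```python
from dataclasses import dataclass


@dataclass
class Workspace:
    path: str
    last_used: float  # seconds since epoch
    size_bytes: int
    in_use: bool      # True while a build is running in it


def pick_for_deletion(workspaces, bytes_needed):
    """Choose the oldest idle workspaces until enough space would be freed."""
    chosen, freed = [], 0
    for ws in sorted(workspaces, key=lambda w: w.last_used):
        if freed >= bytes_needed:
            break
        if ws.in_use:
            continue  # never touch a workspace that hosts a running build
        chosen.append(ws)
        freed += ws.size_bytes
    return chosen
```

Oldest-first keeps the workspaces most likely to be reused, and skipping busy ones matches the "does not host a running build" condition.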

Otherwise, we are doomed to deal with the clutter on slaves.

--
oliver

Jesse Glick

May 11, 2016, 2:47:06 PM
to Jenkins Dev
On Wed, May 11, 2016 at 8:30 AM, Oliver Gondža <ogo...@gmail.com> wrote:
> There is no easy way to map workspace directory to Job(s) that used that

No, but you can easily map `Job` to a workspace on a given `Node`. So
when a job is deleted, it should be possible to immediately scan all
currently connected nodes and delete the associated workspace if there
is one. (You need to look for `@…` variants too.)
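As a plain-filesystem illustration of the lookup (not the Jenkins API, which would go through `getWorkspaceFor` and remote file operations), the sketch below collects a job's workspace directory plus any sibling whose name is the job's directory name followed by `@`. The function name and the flat directory layout are assumptions.

```python
import os


def workspaces_for_job(workspace_root, job_dir_name):
    """Return the job's workspace directory and its '@' suffixed variants.

    Matches `job_dir_name` itself and any sibling named `job_dir_name@<suffix>`.
    """
    matches = []
    for entry in sorted(os.listdir(workspace_root)):
        full = os.path.join(workspace_root, entry)
        if not os.path.isdir(full):
            continue
        if entry == job_dir_name or entry.startswith(job_dir_name + "@"):
            matches.append(full)
    return matches
```

Matching on the exact name or the `name@` prefix avoids catching unrelated jobs that merely share a prefix, such as `feature-x-old` when deleting `feature-x`.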

`WorkspaceCleanupThread` already uses `getWorkspaceFor` so I am not
sure what you think the problem is. Maybe you were thinking of the
two-year-old code prior to JENKINS-21023.

> There is no guarantee that $REMOTE_FS_ROOT/workspace/XXX is actually a
> Jenkins build workspace. I remember people reporting they store some other
> stuff there

That is their problem. They should pick another location.

Oliver Gondža

May 12, 2016, 5:17:37 AM
to jenkin...@googlegroups.com
On 05/11/2016 08:47 PM, Jesse Glick wrote:
> On Wed, May 11, 2016 at 8:30 AM, Oliver Gondža <ogo...@gmail.com> wrote:
>> There is no easy way to map workspace directory to Job(s) that used that
>
> No, but you can easily map `Job` to a workspace on a given `Node`. So
> when a job is deleted, it should be possible to immediately scan all
> currently connected nodes and delete the associated workspace if there
> is one. (You need to look for `@…` variants too.)

The workspace can be used by another job's build via the custom workspace
feature at the time you delete it.

> `WorkspaceCleanupThread` already uses `getWorkspaceFor` so I am not
> sure what you think the problem is. Maybe you were thinking of the
> two-year-old code prior to JENKINS-21023.

The custom workspace feature is one of the problems. Also, it will not cover
any changes to item name or custom workspace value. (After you change
any of that, old workspaces will be left behind).

>> There is no guarantee that $REMOTE_FS_ROOT/workspace/XXX is actually a
>> Jenkins build workspace. I remember people reporting they store some other
>> stuff there
>
> That is their problem. They should pick another location.

Perhaps yes, though we can do a better job documenting that. (Whatever
directory Jenkins does not know about will be considered a disposable
workspace)

--
oliver

Michael Neale

May 12, 2016, 6:19:35 PM
to Jenkins Developers
On Thursday, May 12, 2016 at 7:17:37 PM UTC+10, ogondza wrote:
> On 05/11/2016 08:47 PM, Jesse Glick wrote:
>> On Wed, May 11, 2016 at 8:30 AM, Oliver Gondža <ogo...@gmail.com> wrote:
>>> There is no easy way to map workspace directory to Job(s) that used that
>>
>> No, but you can easily map `Job` to a workspace on a given `Node`. So
>> when a job is deleted, it should be possible to immediately scan all
>> currently connected nodes and delete the associated workspace if there
>> is one. (You need to look for `@…` variants too.)
>
> The workspace can be used by another job's build via the custom workspace
> feature at the time you delete it.

In any case, this is what has been in place for some time; unless custom workspace somehow disables it, there haven't been complaints so far.

In theory the best time to delete is when the job is removed, but even so, this cleanup thread will always be needed, as there will be cases where the workspace cannot be reached at that time.

Jesse Glick

May 12, 2016, 6:36:43 PM
to Jenkins Dev
On Thu, May 12, 2016 at 5:17 AM, Oliver Gondža <ogo...@gmail.com> wrote:
> The workspace can be used by another job's build via the custom workspace feature

The custom workspace (mis)feature should not be abused to reuse a
valid workspace.

Anyway we could always check `WorkspaceList` to see if we can get a
lock on the workspace while we delete it.
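The shape of that check, sketched with a toy lock registry rather than the real `WorkspaceList` (whose API is not reproduced here): `WorkspaceLocks`, `try_acquire` and `delete_if_idle` are hypothetical names for illustration only.

```python
import shutil
import threading


class WorkspaceLocks:
    """Toy stand-in for a per-path workspace lock registry (hypothetical)."""

    def __init__(self):
        self._guard = threading.Lock()
        self._held = set()

    def try_acquire(self, path):
        """Take the lock for `path` without blocking; False if already held."""
        with self._guard:
            if path in self._held:
                return False
            self._held.add(path)
            return True

    def release(self, path):
        with self._guard:
            self._held.discard(path)


def delete_if_idle(locks, path, delete=shutil.rmtree):
    """Delete `path` only if its lock can be taken; return True if deleted."""
    if not locks.try_acquire(path):
        return False  # a build holds the workspace, so leave it alone
    try:
        delete(path)
        return True
    finally:
        locks.release(path)
```

Holding the lock for the duration of the delete means a build cannot start in the directory halfway through, and a busy workspace is simply skipped and left for a later pass.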

> it will not cover any
> changes to item name or custom workspace value. (After you change any of
> that, old workspaces will be left behind).

Sure, but there is only so much we can do.

> Perhaps yes, though we can do a better job documenting that. (Whatever
> directory Jenkins does not know about will be considered a disposable
> workspace)

Anything under the overall workspace directory must be considered disposable.

Jesse Glick

May 12, 2016, 6:37:45 PM
to Jenkins Dev
On Thu, May 12, 2016 at 6:36 PM, Jesse Glick <jgl...@cloudbees.com> wrote:
>> it will not cover any changes to item name
>
> Sure, but there is only so much we can do.

…though we could have an `ItemListener.onLocationChanged` which
deletes workspaces from the old location too.

Michael Neale

May 12, 2016, 11:09:27 PM
to Jenkins Developers
So talking in code instead, this could be done in the computed folder: 


(doing similar work to what the cleanup thread does, but when it removes the orphaned item). 

Michael Neale

Oct 5, 2016, 8:07:54 PM
to Jenkins Developers
Tilting at windmills again - this is still biting and there are no solutions out there. 

The 3 approaches I can see: 

* Improve workspace cleanup thread (garbage collection is always good) - this is a good catch all
* Improve computed folder to be smarter about removed jobs
* Remove workspaces when things are deleted, as much as possible. 

These aren't technically related to pipeline (although Tyler has found a bug with parallel) - but they are magnified by git flow with multibranch and pipeline.

Jesse Glick

Oct 7, 2016, 2:27:40 PM
to Jenkins Dev
On Wed, Oct 5, 2016 at 8:07 PM, Michael Neale <mne...@cloudbees.com> wrote:
> * Improve workspace cleanup thread (garbage collection is always good) -
> this is a good catch all

Yes, though it currently suffers from the issue that there is no
record on a node of what was what. See my proposal:

https://issues.jenkins-ci.org/browse/JENKINS-2111?focusedCommentId=270320&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-270320

> * Improve computed folder to be smarter about removed jobs

Not sure what that means.

> * Remove workspaces when things are deleted, as much as possible.

This now happens for workspaces on online agents corresponding to
branch projects that are deleted (via orphaning).

> Tyler has found a bug
> with parallel

Issue reference?

> they are magnified by git flow with multibranch and
> pipeline.

Agreed.

Michael Neale

Oct 9, 2016, 10:50:57 PM
to Jenkins Developers
On Saturday, October 8, 2016 at 5:27:40 AM UTC+11, Jesse Glick wrote:
> On Wed, Oct 5, 2016 at 8:07 PM, Michael Neale <mne...@cloudbees.com> wrote:
>
> Yes, though it currently suffers from the issue that there is no
> record on a node of what was what. See my proposal:
>
> https://issues.jenkins-ci.org/browse/JENKINS-2111?focusedCommentId=270320&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-270320

That sounds pretty good, and would solve a lot of ills.

>> * Improve computed folder to be smarter about removed jobs
>
> Not sure what that means.

This was in relation to my PR which was on computed folder (seemed easiest place to clean it up) - however JENKINS-2111 proposal (above) seems to cover this.

>> * Remove workspaces when things are deleted, as much as possible.
>
> This now happens for workspaces on online agents corresponding to
> branch projects that are deleted (via orphaning).

Great.

>> Tyler has found a bug
>> with parallel
>
> Issue reference?

Will get Tyler to post it.

Michael Neale

Oct 9, 2016, 11:46:14 PM
to Jenkins Developers
FYI Tyler's ticket on parallel:


"Nodes allocated inside of parallel() should have their workspaces removed immediately"